Granular Material - Granular Material

Mar 31, 2024

Financing Common Crawl

Mozilla recently published an excellent new report out about Common Crawl, the non-profit whose web crawls have played an important role in the development of numerous large language models (LLMs). Written by Stefan Baack and Mozilla Insights, the report is based on both public documents and new interviews with Common Crawl’s current director and crawl engineer, and goes into some detail about the history of the organization, and how its data is being used.

Tags: common-crawl , knowledge-infrastructure , large-language-models , archives , digital-preservation , wayback-machine , data , power

Feb 22, 2024

ChatGPT Prompt Speculations

In a recent tweet that went viral, Dylan Patel claimed to have discovered or revealed the ChatGPT prompt, using a simple hack. The tweet included a link to a text file on pastebin and a screenshot of that same text with newlines removed. More interestingly, the author suggested in a reply that anyone could replicate this finding, and a subsequent tweet included a video of ChatGPT generating text in response to the same trick. That, however, is where things get somewhat strange.

Tags: chatgpt , large-language-models , reproducibility , augmented-reality-games

Jan 28, 2024

Infinitely Wide Culture

A lot of this is still unresolved in my mind, but I think there is something interesting happening at the intersection of generative AI, art, style, and entertainment. Rather than letting it gestate until more fully formed, I figured I’d just post some preliminary thoughts and come back to this at some later date.

The main reason I’m thinking about this now is the ongoing debate about how generative AI will impact creative fields, such as writing and design (as well as white collar jobs more broadly). Arguably there have already been some pretty dramatic effects, such as the sci-fi magazine Clarkesworld being suddenly overwhelmed by spammy submissions. At the same time, it’s hard to know the extent to which these disruptions may end up being transient phenomena that broader systems will adapt to.

Tags: books , art , music , film , fiction , criticism , commentary , culture , stability , instability , large-language-models , generative-ai

Dec 14, 2023

Tightly-woven Cultural Commentary

I recently started listening to the Art of the Score podcast, which does an amazing job of unpacking movie soundtracks in depth, beginning with Raiders of the Lost Ark in the first episode. Like most such explorations of music, however, it ultimately leaves me wanting something that I’ve long thought there should be much more of—namely, more tightly interwoven cultural commentary, especially for music. As I will describe below, I’m broadly interested in different ways of combining context, commentary, and criticism with the thing being commented on or critiqued.

Tags: commentary , criticism , culture , music , art , books , film , museums , podcasts , visualization , science

Nov 27, 2023

Altair vs. Bokeh (part 2)

In the first part of this series, I used a basic bar plot to illustrate the differences between Altair and Bokeh in terms of defaults, chart configuration, and syntactic style. In this post, I’ll get into some of the more substantive differences, including more complicated chart types, combining plots, basic interactivity, and how to deploy the output online. Note that this is not intended to be a systematic comparison, but rather more of a preliminary exploration of the options.

Tags: visualization , altair , bokeh

Oct 25, 2023

Refik Anadol

Last week’s speaker in the Penny Stamps Speaker Series was Refik Anadol, one of the most successful artists working primarily with AI, data, and visualization. Over a decade or so, Anadol has produced work in collaboration with organizations and venues like the LA Philharmonic, Gaudi’s Casa Batlló, the Exosphere in Las Vegas, and the Museum of Modern Art in New York.

The lecture he gave was basically a review of his body of work, and that of his art and design studio, which is comprised of about 20 people. The earlier projects he showed mostly took the form of massive scale light projections on the sides of buildings, which could give the impression of a totally different texture or form. More recent work tended to use data driven visualizations, abstracting some sort of data into visual patterns displayed on a wall, building, or screen. Many of these also made use of what appeared to be procedural animations to create the impression of three dimensionality and motion.

Tags: machine-learning , artificial-intelligence , archives , visualization , memory , art , refik-anadol , zach-leiberman

Sep 17, 2023

Ubi Sunt

There is a duality at the heart of large language models. On the one hand, they are essentially a backwards-looking invention, a “cultural technology”, in the words of Alison Gopnik—algorithms which index and remix a large slice of human culture (though one that is typically heavily biased towards the recent past). On the other hand, they can often seem to be producing something entirely new, and can thereby leave many people with the impression of having a personality or even “sentience” (whatever that means exactly); in the most extreme cases, some people have apparently convinced themselves that such models are a step on the path towards some sort of successor species to humanity, a new regime of algorithmic children that will survive our own human catastrophes. Complicating matters here is the fact that the emergence of and widespread attention to these systems largely overlapped with the Covid-19 pandemic, a time in which we have all had additional reason to reflect on life, death, loss, and creation.

Tags: machine-learning , large-language-models , archives , books , fiction , memory , mortality , digital-preservation

Aug 28, 2023

University of Michigan's New AI Tools

Just before the start of the fall semester, the University of Michigan announced that it was launching a new suite of tools, all focused on “generative AI”, (although so far limited to language models), which will be available to all students, faculty, and staff. This post provides a preliminary exploration of the new offerings.

Tags: large-language-models , chatgpt , reproducibility , evaluation , open-science

Jul 25, 2023

Altair vs. Bokeh (part 1)

This is the first of what I hope will be a series of posts comparing Altair and Bokeh. Both are actively supported python packages for making interactive visualizations. This post will only scratch the surface, but is intended to show the basic differences in how they approach creating visualizations.

Tags: visualization , altair , bokeh

Jun 11, 2023

The Gradual Disappearance of Twitter

It was recently reported by inews.co.uk that Twitter is going to start charging academic researchers and institution $42,000/month if they want to maintain their current level of expansive access to data, and – more significantly – require that they delete all Twitter data from their archives if they do not. I’d heard rumors that this might be happening a few weeks ago, but the iNews article is the first independent reporting that I’ve seen about it.

Tags: twitter , archives , sociotechnical-systems , reproducibility , change , instability

Newer

Older