ChatGPT Dominance

I expect that almost anyone reading this will have heard of ChatGPT by now. Released about a month ago, ChatGPT is a system developed by OpenAI which provides text responses to text input. Although details are scarce, under the hood ChatGPT is basically a large language model, trained with some additional tricks (see Yoav Goldberg’s write-up for a good summary). In other words, it is a model which maps from the text input (treated as a sequence of tokens) to a distribution over possible next tokens. It generates text by making repeated calls to this function, sampling a token from each predicted distribution and appending it to the input.
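To make the generation loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four-word vocabulary, the hand-written probability table, and the names `next_token_distribution` and `generate` are hypothetical stand-ins, and a real LLM would compute the distribution with a neural network conditioned on the whole sequence rather than on the last token alone.

```python
import random

# Hypothetical toy vocabulary; "<eos>" marks the end of the text.
VOCAB = ["the", "cat", "sat", "<eos>"]

# Hand-written next-token probabilities keyed by the previous token.
# This table is an illustrative assumption, not how ChatGPT works internally.
TABLE = {
    None: [0.7, 0.2, 0.1, 0.0],
    "the": [0.0, 0.8, 0.1, 0.1],
    "cat": [0.1, 0.0, 0.7, 0.2],
    "sat": [0.2, 0.0, 0.0, 0.8],
}


def next_token_distribution(tokens):
    """Map a token sequence to a distribution over possible next tokens."""
    last = tokens[-1] if tokens else None
    probs = TABLE.get(last, TABLE[None])
    return dict(zip(VOCAB, probs))


def generate(prompt, max_new_tokens=10, seed=0):
    """Repeatedly call the model, sample a token, and append it."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        choices, weights = zip(*dist.items())
        token = rng.choices(choices, weights=weights, k=1)[0]
        if token == "<eos>":
            break  # stop once the model samples the end marker
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    print(" ".join(generate(["the"], seed=0)))
```

Because each step samples rather than picking the most likely token, different seeds can produce different continuations from the same prompt.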

AI, software, and governance

In a recent article covering the FTX collapse, the New York Times described large language models (LLMs) as “an increasingly powerful breed of A.I. that can write tweets, emails and blog posts and even generate computer programs.” There is a lot that we could pick apart in this definition (e.g., what makes LLMs part of a “breed”, what distinguishes the ability to write an email as opposed to a blog post, etc.), but for the moment I’d like to focus on the term “A.I.” (henceforth “AI”). Referring to LLMs as an example of AI is certainly not atypical. Indeed, it increasingly seems like LLMs have become one of the modern canonical examples of this concept. But why is it that we think of these systems as members of this category? And how much rhetorical work is being done by referring to LLMs as a type of “AI”, as opposed to “models”, “programs”, “systems”, or other similar categories?

Hacking LLM bots

For anyone who missed it, a Twitter account named @mkualquiera recently deployed what seems like a kind of adversarial attack in the wild on a large language model (LLM)-based Twitter bot. I’ll link to the key post below, but it’s worth providing a bit of context, as it wasn’t immediately clear to me what was going on when I first saw the tweet.

Report from FAccT 2022

The fifth iteration of FAccT (the ACM Conference on Fairness, Accountability, and Transparency) was held earlier this month (June 21–24) in Seoul, South Korea. More than just a hybrid conference, this was actually a full in-person conference, combined with a full online conference. These happened in parallel, with virtual sessions starting before and continuing after the in-person component each day. Around 500 people attended in person, with another 500 participating remotely.

Counting Deaths

Although morbid, it’s fascinating to read a recent article in the NYT about efforts in Sierra Leone to use “electronic autopsies” in a large-scale attempt at counting deaths. According to the article, this undertaking is part of a broader effort at data collection, including questions on age, religion, marital status, etc. The novelty, it seems, is in trying to be thorough with respect to what people have died of (including extensive questions about symptoms), even though this information is being collected potentially long after the fact.

Modular Domain Adaptation

Despite their limitations, off-the-shelf models are still quite widely used by computational social science researchers for measuring various properties of text, including both lexicons, like LIWC, and cloud-based APIs, like Perspective API. The approach of using an off-the-shelf model has some definite advantages, including standardization and reproducibility, but such models may not be reliable when applied to a domain that differs from the ones on which they were developed…

Stability and Change

One of the biggest frustrations with software is that things are constantly changing. From operating systems to apps to web interfaces, things rarely remain the same for very long, especially for users of Windows or macOS.

There are many reasons for this, of course. For decades, hardware has continued to improve at a steady rate, and so software is constantly being rewritten to take advantage of the latest capabilities. Moreover, the incredibly sloppy standard for software quality and reliability (compared to traditional engineering disciplines) means that even the most professional software is shipped with massive numbers of bugs and vulnerabilities, which constantly need to be patched. This is a particularly large problem in institutional settings which are not set up for this pace of updates; some of the worst effects of the WannaCry ransomware attack, for example, were on hospitals that were still using hopelessly out-of-date Windows machines.

Confidence in Science

I’ve been thinking recently about the role of confidence in science, and how long beliefs can persist simply because everyone else seems to believe them. Coincidentally, Andrew Gelman posted about this two days ago, responding to comments from a biologist about how the replication crisis had not been a major problem in biology. Her argument was that this was because biology is a “cumulative science”. By this she meant that when something important gets published, it is often the kind of discovery that people want to use immediately. If the original claim was wrong, people will quickly figure it out.

AI Dermatology: Part 2

In the last post, I discussed the possible broader implications of Google’s recent foray into making an AI dermatology tool. In this follow-up post, I want to focus on the research behind the product announcement, bringing a slightly critical eye.

AI Dermatology: Part 1

Midway through last year Google announced a new foray into the medical technology space, sharing that it was developing an “AI-powered dermatology assist tool”—a phone-based app that would allow users to take photos of skin lesions and retrieve information about relevant medical conditions from the web. Similar apps already exist, but it’s fair to say that a comparable effort by Google is likely to have much more significant effects on how people interact with the medical system, their personal data, and even their own bodies.