Language Model Hacking
With the widespread success of language models on many tasks in a zero-shot setting, there has been a huge surge of interest among social scientists in using them to code or classify documents, sometimes in place of human annotators. Given both the freedom to specify prompts and the lack of connection to domain-specific training data, a natural concern arises: how easily can people manipulate these design choices to produce a desired conclusion? This had been on my mind recently, so I was delighted to see a couple of recent papers that take on exactly this question, both of which conclude that there is indeed considerable latitude to produce a desired finding by manipulating the choices involved. These are extremely useful and important results, though they also raise some questions for me, which I wanted to think through here.
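To make this latitude concrete, here is a minimal sketch of how quickly prompt-design choices multiply in a zero-shot classification setup. The instruction wordings, label sets, and document framings below are my own hypothetical examples, not drawn from the papers discussed; the point is only that each independent choice multiplies the space of prompts a researcher could defensibly report.

```python
from itertools import product

# Hypothetical researcher degrees of freedom in zero-shot classification:
# each axis below is a choice the analyst must make, and every combination
# yields a distinct prompt (and, potentially, a distinct classification).
instructions = [
    "Classify the tone of the following statement.",
    "Is the following statement positive or negative?",
    "Rate the sentiment expressed below.",
]
label_sets = [
    ["positive", "negative"],
    ["positive", "negative", "neutral"],
]
framings = [
    "Statement: {doc}",
    'A survey respondent said: "{doc}"',
]

def build_prompts(doc: str) -> list[str]:
    """Enumerate every prompt variant for a single document."""
    prompts = []
    for inst, labels, frame in product(instructions, label_sets, framings):
        prompts.append(
            f"{inst}\n{frame.format(doc=doc)}\n"
            f"Answer with one of: {', '.join(labels)}."
        )
    return prompts

variants = build_prompts("The new policy is a mixed bag.")
print(len(variants))  # 3 instructions x 2 label sets x 2 framings = 12
```

Even this toy setup produces a dozen defensible prompts from three small choices, and real pipelines add many more axes (model, temperature, few-shot examples, output parsing), so the space an analyst could search over grows multiplicatively.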