Financing Common Crawl
Mozilla recently published an excellent new report about Common Crawl, the non-profit whose web scrapes have played an important role in the development of numerous large language models (LLMs). Written by Stefan Baack and Mozilla Insights, the report draws on both public documents and new interviews with Common Crawl’s current director and crawl engineer, and goes into some detail about the history of the organization and how its data is being used.