Examining Crowdsourcing

This post originally appeared on Cody Carpenter’s blog. It’s part 1 in a four-part series examining crowdsourcing in digital cultural heritage work.

Part 1: What is Crowdsourcing?

Crowdsourcing can be applied to anything from this jigsaw to an archive of cultural data – we just have to realize that everybody holds a different piece of the puzzle. Photo credit Hans-Peter Gauster sloppyperfectionist, CC0, via Wikimedia Commons.

“…distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains…It’s not outsourcing; it’s crowdsourcing.”

These words form the backbone of “The Rise of Crowdsourcing,” Jeff Howe’s seminal 2006 Wired article, which helped to popularize the term. Gathering data from the public predates the Internet, but it is hard to overstate how much global connectedness has revolutionized the process. In this five-part blog series, I want to explore the accountability aspect of crowdsourcing – in layman’s terms, how can we sift the good data from the bad if it’s all coming from the public? Before examining that question, though, a quick overview of what exactly crowdsourcing is seems to be in order.

Howe’s article describes crowdsourcing as harnessing the “productive potential of millions of plugged-in enthusiasts” in order to bring together a cohesive whole of anything you might imagine. The article is written from the standpoint of businesses needing labor performed; for our purposes, crowdsourcing means sending a call out to the Internet to gather data on any number of topics. From transcribing digitized shipping lists to contributing your own oral history of a certain event, person, or place, chances are that if you can think of it, someone has a need for it. Or, by virtue of the Internet, you can create the niche if it doesn’t already exist.

This is the power and primary appeal of crowdsourcing: anyone can contribute anything. One doesn’t need to be an expert on labor movements at the turn of the twentieth century to transcribe a pamphlet with the time and location of a secretive union meeting. An oral history passed down from grandparent to parent to child can find a permanent home in an archive, digitized and able to be viewed by countless future generations.

A further example can be seen in a case study published in Digital Humanities Quarterly in mid-2020, “Crowdsourcing Image Extraction and Annotation: Software Development and Case Study”. Ana Jofre and her fellow authors of this study aimed to use “…large, image-heavy corpora, in particular periodical archives, to gain insights into cultural history.” (Jofre et al.) Sorting through such an enormous array of images as can be found in the archives of a periodical – the study used Time magazine, which has been around since 1923 – is labor-intensive to say the least. The authors, thus, turned to the method of crowdsourcing to achieve their aims: “A viable alternative is crowdsourcing, the process of enlisting untrained individuals to perform computationally intensive tasks…” (Jofre et al.)

Using software developed specifically for a task such as this, the authors set the participants in the study the task of scanning through pictures in the Time magazine archive and assigning them different tags. These tags were general identifiers such as race, gender, expression, apparent age, etc. (Jofre et al., Table 1)

A simple task to be sure, but when confronted with the vastness of an archive such as Time’s, a bit daunting. The participants worked with sets of 25 or 50 faces, and the rate of accuracy they returned may be surprising: an “effective accuracy of 80.4%” for the larger sets and “above 87%” for the smaller sets. (Jofre et al.)
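To make the idea of an accuracy rate concrete, here is a minimal sketch of one common way to score crowd tags: take the most frequent tag each image received and compare it against a trusted reference tag. This is an illustration only, not the method Jofre et al. used; the image IDs, tag values, and the function names `majority_tag` and `accuracy` are all hypothetical.

```python
from collections import Counter


def majority_tag(tags):
    """Return the most common tag, or None if there are no tags or a tie for first place."""
    counts = Counter(tags).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear winner among the volunteers
    return counts[0][0]


def accuracy(crowd_tags, reference):
    """Fraction of reference items whose majority crowd tag matches the reference tag."""
    correct = sum(
        1
        for item, truth in reference.items()
        if majority_tag(crowd_tags.get(item, [])) == truth
    )
    return correct / len(reference)


# Hypothetical data: each image was tagged for "expression" by a few volunteers.
crowd_tags = {
    "face_001": ["smiling", "smiling", "neutral"],
    "face_002": ["neutral", "neutral", "neutral"],
    "face_003": ["smiling", "neutral"],             # tie, so no majority
    "face_004": ["neutral", "smiling", "smiling"],
}

# Hypothetical reference tags assigned by a trusted reviewer.
reference = {
    "face_001": "smiling",
    "face_002": "neutral",
    "face_003": "neutral",
    "face_004": "neutral",
}

print(f"Crowd accuracy: {accuracy(crowd_tags, reference):.1%}")
```

In this sketch a tie counts as a miss, which is a conservative choice; a real project might instead send tied images back out for another round of tagging.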

That result gives Joe Q. Public a fairly high accuracy rate. Not perfect to be sure, but imperfections can be dealt with and corrected. Crowdsourcing and the general public don’t seem to be half bad at this.

In her book Democratizing Our Data, Julia Lane exhorts us: “There are new data, new tools, and new technologies that can be combined in new ways to create new evidence…and truly democratize our data.” (Lane, Democratizing Our Data, 9) What can be more democratic than data contributed to, and assessed by, the public at large as crowdsourcing promises to do?

Of course, not all data is genuinely helpful, nor should one expect it to be. Join me in Part Two to further examine this situation, and what can be done about it.

Source: Part 1 – What is Crowdsourcing?

Visit Cody Carpenter’s site for the next parts of the series.