How Data Happened: My Thoughts (Part 1)

Cover art of “How Data Happened”. Image taken from the W.W. Norton website.

That’s right. After writing several novel reviews (and in this case, several does in fact mean seven, if I can count), I give you today another nonfiction review. This one covers the first two chapters of How Data Happened by Chris Wiggins and Matthew L. Jones. I’ve long been fascinated by numbers and analytics, so I naturally had to pick this book up when I was at a local bookstore the other day.

The book opens with an anecdote from Wiggins, who is a professor at Columbia University. He recalls a day in April 2018 when Mark Zuckerberg, the CEO of social media giant Facebook, was called to testify before the United States Senate. The students in Wiggins’s course understood that Facebook used data and algorithms to influence the behavior of its users, which was a key issue in Zuckerberg’s testimony. More specifically, Zuckerberg was asked how the use of this data could influence elections.

Of course, with data comes power. A reflection on this fact reminded me of David Pakman’s book The Echo Machine, which I recently reviewed for this very blog. In that book, Pakman talked about how students in Finland learn in math class how numbers can be manipulated to serve a certain narrative. How Data Happened is not, strictly speaking, about critical thinking; rather, it arguably examines the effects of not thinking critically.

The first proper chapter of the book opens with a quote from one Melvin Kranzberg, who observed in 1986 that technology is neither good nor bad, nor is it neutral. In the decades since, of course, we’ve seen technology’s impact on society, and that impact most certainly cannot be described as neutral.

This thesis seems to be reinforced further every year. Look how many people are stuck in social media echo chambers claiming that climate change is a hoax, or that millions of people are going to die because of the COVID-19 vaccines. Many thinkers, of course, have taken notice.

Wiggins and Jones describe a tech conference in 2014 at which Hanna Wallach, a high-profile researcher at Microsoft, said that social scientists should be involved in analyzing algorithmic data. In 2014, this concept might have seemed revolutionary; by 2025, the more educated among us probably take such a statement’s value for granted. And of course, it bears repeating that this conference occurred before the “Trump era” of politics truly began in the United States. (This book, by the way, apparently focuses mostly on the USA, though the impacts of Big Data are also felt abroad).

Of course, it isn’t just private companies that extensively use societal data, sometimes gleaned from things as simple as Internet searches. The intelligence agencies of various countries (a list that included the United States, United Kingdom, Israel, and China as of 2014, but has no doubt grown immensely since then) also analyze it to no small degree. None of us are truly private online. That’s no secret, and yet there remain endless debates about privacy versus national security.

Some would say these debates echo a common theme in American history: the conflict between individual rights and the well-being of the community. But that’s a conversation, again, for a different article.

Privacy concerns surrounding data collection and analysis have been discussed for more than a decade. In fact, Wiggins and Jones describe how Internet giants such as Google, Meta, and IBM now employ AI ethicists. Given that Google currently pushes AI summaries into your face after almost every search, we can argue about how effective these AI ethics teams truly are. This point does demonstrate, however, just how salient the issue has become.

It’s worth noting that around the turn of the 21st century, many (both among the general public and in academia) believed that the Internet would usher in a digital utopia. By today’s standards, this belief might seem hopelessly naïve, but it was very commonplace at a time that, in the grand scheme of things, wasn’t very long ago. Nowadays, the fact that technology isn’t neutral is taken for granted.

To be clear, as the authors note, data scientists and other scholars are divided between alarmists and optimists. Data has enabled numerous medical breakthroughs, with more still to come. And this should be celebrated: diagnosing diseases from genomic data will no doubt help save and extend lives.

At the same time, for all the talk about how machines are replacing blue-collar laborers, even white-collar jobs are not exempt from this trend. Doctors have spoken out about how AI’s ability to diagnose diseases could render their expertise obsolete. And this reliance on data has grown dramatically in a remarkably short time.

To illustrate just how rapidly things changed, the authors cite Peggy Noonan, a former Republican speechwriter, who saw a hiring notice from Barack Obama’s 2012 presidential campaign calling for several analytics professionals. Noonan remarked that the notice sounded like “politics as done by Martians”. In 2012, such efforts struck many as bizarre, and yet, in the following presidential election cycle four years later, both major political parties had very extensive data operations.

The book’s second chapter discusses Adolphe Quetelet, a nineteenth-century Belgian astronomer. He’s credited with inventing the Body Mass Index, a measure of a given person’s risk of weight-related health problems, and he also popularized the idea of the “statistically average person.” These are two concepts we take for granted today, but they were revolutionary at the time.
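For readers curious about the index itself: it’s simply weight divided by the square of height, which is why it was long known as the “Quetelet index.” A minimal sketch (the function name and example values are my own, not from the book):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Quetelet's index: body weight divided by the square of height."""
    return weight_kg / height_m ** 2

# Example: a 70 kg person who is 1.75 m tall.
print(round(bmi(70, 1.75), 1))  # 70 / 3.0625 ≈ 22.9
```

The elegance (and the controversy) of the measure lies in that same simplicity: one number standing in for a whole person’s health.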

One of the many ironies in the history of science is that even though his ideas would fundamentally reshape the way science is conducted, Quetelet’s inventions were driven by a desire for stability. If we the people of 2025 think the world is chaotic, it was no less chaotic two hundred years ago as European powers fought for dominance at home and abroad. Quetelet wanted an objective measurement of averages.

In the 18th century, European colonial powers needed money to continue waging wars of conquest. For a government to have money, it required taxes, and the bureaucracies needed to collect these taxes required data to function. In fact, the word statistics, another discipline we take for granted today, is derived from the word state because it originally referred to the knowledge of a state’s resources. I’ll admit that I’ve rarely given much thought to etymology, but I guess that word needed to come from somewhere.

To be clear, eighteenth-century European powers were not the first societies to utilize such information. Societies that existed long before then (the authors give the examples of the South American Incas and ancient China) also collected what we might consider statistical data. It’s just that Enlightenment Europe took this use of data to a far greater level.

In any case, our friend Quetelet was an astronomer, and he observed that measurements of star positions tended to vary over time, from observer to observer, and depending on the instrument used. Quetelet took the “bell curve” concept that statisticians today know well and applied it to data about human beings.
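The underlying idea is easy to simulate: repeated noisy measurements of a single true value scatter in a bell-curve shape around it, and Quetelet’s leap was to treat human traits as scattering the same way around an “average.” A small sketch of that intuition (all the numbers here are made up for illustration):

```python
import math
import random
import statistics

# Simulate 1,000 noisy measurements of one true quantity
# (say, a star's position), as an astronomer might collect.
random.seed(42)
true_value = 100.0
measurements = [random.gauss(true_value, 2.0) for _ in range(1000)]

mean = statistics.mean(measurements)
std = statistics.stdev(measurements)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the bell curve with center mu and spread sigma at point x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The sample mean lands very close to the true value, and roughly 68%
# of measurements fall within one standard deviation of it.
within_one_sigma = sum(abs(m - mean) <= std for m in measurements) / len(measurements)
print(round(mean, 1), round(within_one_sigma, 2))
```

Swap “star positions” for “chest circumference” or “height” and you have, in miniature, Quetelet’s move from astronomy to social statistics.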

Today, of course, the concept of the “average person” is instrumental in understanding the world. Governments all over the globe use statistics derived from averages, such as GDP per capita and crime rates, to shape policy, and this is simply taken for granted. Of course we do it this way. But it hasn’t always been this way.

The end of the second chapter hints at some of the ways Quetelet’s successors have used the idea of “average people” for both good and ill. In the modern-day United States, a country that has suffered from the legacy of slavery and lingering systemic racism, predictive policing has generated significant controversy. Its detractors are concerned that racial minorities will be seen as inherently more likely to commit crimes, and that this practice will only exacerbate the disparities in our criminal justice system.

However, we don’t need to go all the way to 2025 to find an example of data being used to justify “scientific racism.” At the close of Chapter 2, the authors mention Sir Francis Galton, the “father of eugenics”, whom it seems the third chapter will focus on. He was a half-cousin of Charles Darwin, who developed the theory of evolution by natural selection, no doubt one of the most important scientific discoveries of the last few centuries given its extensive implications across many fields.

Overall, I am greatly enjoying How Data Happened so far. I probably sound like a broken record saying this, but the current state of affairs with regard to data usage might seem inevitable. If there’s one thing this book has taught me, though, it’s that Big Data was anything but inevitable.

Thank you for reading.
