Data and Justice

An Introductory Lesson

By Jacob Pleasants

This is a lesson that I have used with college-level students to introduce them to issues around data collection and representation. One way to approach this is to focus on human subjects research ethics, and that’s definitely important. Alternatively, you could look at the data-harvesting practices of Big Tech - an example of handling human data in an industry context rather than a research one.

My goal with this lesson is a little bit different. In this inquiry, I want to help students recognize how seemingly “mundane” data practices can have unexpected consequences that are ethically significant. Even if students recognize the importance of treating human subjects data with care, they might conclude that measurements of nonliving entities are essentially neutral. But this isn’t the case. Any time we gather and report data, we are participating in sociotechnical systems and pursuing (intentionally or not) certain values and goals.

This lesson takes a lot of inspiration from Catherine D’Ignazio and Lauren F. Klein’s 2020 book, Data Feminism. The goal of the lesson is to raise (though not necessarily answer) the following questions:

  • Are data neutral?

  • Who “owns” data?

  • Who decides what is done with data?

  • What ethical issues arise when data are collected?

To begin, I present the following scenario:

Imagine that you are working on a project that addresses the issue of PFAS: a group of “forever chemicals” that are used in a wide variety of products, from nonstick cookware to firefighting foams to waterproof clothing. PFAS have lots of useful characteristics, but they also accumulate in the environment and break down very slowly (hence, “forever” chemicals). We have been producing these chemicals for decades, and it was initially thought that they were not harmful. However, several PFAS have now been identified as toxic, and the EPA has introduced regulations that limit their presence in drinking water.

PFAS can accumulate in soil as well as water, and they are everywhere. But like any contaminant, they are more abundant in some places than others. So, imagine that you are part of a team that wants to create a map of PFAS contamination in the city where you live and its surrounding areas. Doing that will require taking samples from around the area you want to map and measuring the PFAS levels. Don’t worry about the technical details of how those measurements are taken - someone else on your team has that responsibility. Your job is to figure out how to actually go about the task of gathering and managing the data.

What are some potential issues that could arise during this project? What are some ways that you might navigate those issues?

At first glance, this data collection effort seems pretty benign. Isn’t it obviously beneficial to know more about potential contamination from these harmful chemicals? I don’t ask my students to extensively discuss this scenario yet. I present it to them and have them simply ponder it. Before we dig in, I present some related examples that illustrate the potential issues.

1. Who owns these data?

Data have become a commodity, and a valuable one at that. Why else would companies invest billions of dollars in creating data centers? There are countless issues of ownership that one could raise, but let’s take a look at one that, on its face, seems rather tame. My home, like most, has an electricity meter on it that supplies hourly usage data. That is, it measures the amount of electricity used in a given hour (even though my overall bill is based on monthly usage). I have access to an online portal where I can look up my hourly usage for whatever day I’d like. Here’s what one of those charts looks like:

So, I can access my usage data. But who actually owns the data? And why would anyone care about that?

Well, here’s an interesting use case. There are all kinds of “smart home” technologies that are designed to utilize your home electricity data. Utility companies will often incentivize you to use electricity at certain hours of the day, and there are technologies that will automate your home systems to take advantage of that. Of course, for those technologies to respond to your electricity usage, they need access to your data. But what if you don’t actually own the data? This, it turns out, is an actual issue (one that is more extensively explored on this episode of Volts). I can access my usage data through the online portal, but only in a limited way. I cannot actually get, for instance, a spreadsheet with all my usage statistics. And the utility will not give the data to me in real time, or let me integrate my meter data with another technology. It’s not that it’s technically difficult to do those things (it’s pretty easy). The issue is that the utility “owns” my usage data, and they don’t really have a strong desire to just give it away (especially if it means I would end up reducing my electricity usage - you know, the thing that the utility sells me).

2. What could be done with these data?

Related to the subject of ownership, who can access my usage data, and under what circumstances? If I don’t own the data, who even gets to decide those kinds of things? Again, at first, you might wonder who would care. But if you think a little more about your electricity usage, you can see how all sorts of things might be revealed by it. So, I ask students:

What could you infer about me (or someone else) if you had full access to their electricity usage data? [I’ll use this formatting for other student-facing questions]

For instance, law enforcement agencies have tried to leverage energy usage data to locate marijuana grow operations. Government regulations have successfully blocked some of those efforts (for now), but it shows what could be done with seemingly mundane data.

Go back to my energy usage chart. What sorts of inferences could you make about me based on that chart? The first thing that comes to mind is that you could pretty easily tell when I am home and when I’m not, which isn’t something I necessarily want to be public knowledge. If you had lots of my usage data, you could probably make other inferences as well. Do I have an electric vehicle, for instance? Do I seem to be running a grow operation?
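
For students who are comfortable with a little code, the "when am I home?" inference can be made concrete. What follows is only a minimal sketch in Python, not a real occupancy-detection method. The hourly readings are invented for illustration (they are not my actual usage), and the function name `likely_active_hours` and its "twice the baseline" rule are assumptions chosen for simplicity.

```python
# Hypothetical hourly electricity readings (kWh) for one day, hour 0 through hour 23.
# These numbers are invented for illustration; they are not real meter data.
hourly_kwh = [0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 1.2, 1.5, 0.4, 0.3, 0.3, 0.3,
              0.3, 0.3, 0.3, 0.3, 0.4, 1.1, 1.8, 2.0, 1.6, 1.0, 0.5, 0.3]


def likely_active_hours(readings, factor=2.0):
    """Return the hours whose usage exceeds `factor` times the day's baseline.

    The baseline is assumed to be the lowest reading of the day, standing in
    for always-on loads such as a refrigerator. This is a crude assumption.
    """
    baseline = min(readings)
    return [hour for hour, kwh in enumerate(readings) if kwh > factor * baseline]


active = likely_active_hours(hourly_kwh)
print(active)  # [6, 7, 17, 18, 19, 20, 21]
```

With these made-up numbers, the flagged hours cluster in the early morning and the evening, which suggests a household that is out (or asleep) for the rest of the day. The rule is deliberately simplistic, and that is part of the point: even a few lines of code applied to "mundane" data can produce a guess about someone's daily routine, whether or not that guess is warranted.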

To build on this idea, I bring up another example of how data that are created for one purpose can be used in unexpected ways. A consequential example is the credit score. The purpose of the credit score is to take a measure of someone’s credit risk for the purposes of financial lending. We could certainly interrogate the way those scores are calculated in the first place, but let’s table that for now. Instead, it’s worth considering the unintended ways that credit scores have been put to use. Just as having access to my electricity usage creates opportunities to make inferences about how I live my life, having access to my credit score opens the door to inferences about me that may or may not be warranted. Two cases are particularly salient: the use of credit scores for hiring and housing decisions.

Why do you suppose an employer or landlord would want to know someone’s credit score? Why is this practice concerning?

3. Whose goals are pursued with these data?

People decide which data to collect, which data to report, and how those data are represented. Through those decisions, certain stories are told, and others are not. Certain objectives are pursued, and others are not. 

The City of Detroit has an “open data portal” that includes a Crime Viewer app. Below is a screenshot of what this looks like.

What story is told by this data representation? 


One impression that is pretty easy to take away from these kinds of maps is that certain parts of a city are dangerous. Certain places would therefore be risky to visit or live in. Heck, maybe some parts of the city are so risky that you wouldn’t want to, say, give someone a mortgage to buy property there. Sounds kind of familiar (image below from Detour Detroit).

In what ways is the Crime Viewer story similar to the one told in the redlining map? In what ways are they different?

Not all stories of danger and risk and hazard necessarily pursue the same kinds of goals and outcomes. It depends on who is making the decisions and setting the priorities. D’Ignazio and Klein (2020) present this alternative example from Detroit (original report here). The intention of this data report was not to stoke fear but to draw attention to a serious problem being experienced by people in a particular community.

Why is that such an important distinction?

4. Data is just the beginning… what actions come next?

The previous examples are related to human conflicts, but what about something more environmental (like the PFAS example)? For environmental situations, isn’t more knowledge a good thing? It seems like it ought to be less politically charged than data about safety and conflict.

An instructive example: what parts of a city are most susceptible to natural hazards such as flooding? The following map comes from the FEMA Flood Map Service Center:

You probably would want to know how susceptible your home is to flooding, right? Nobody wants their home to be in a high-risk area. But better to know than not, so that you can adequately prepare. If you lived in a place that had a risk of earthquakes or tornadoes, you’d similarly want to know such things. You’d also probably want to have confidence that whatever risk appears in a map like the one above is based on sound measurements.

The trouble is that risk is not a stable thing, especially in a changing climate. Places that once had a low risk of flooding may now be at a much higher risk. Those “hundred-year floods” may now be more like “five-year floods.” So, we should probably be updating those risk maps on a regular basis to ensure that they are as accurate as possible, and so that people can take appropriate actions. But let’s slow down a moment.

If we were to suddenly update a flood risk map, what would be the consequences for people who now find themselves in a higher-risk area than they originally believed?

Actions will be taken, to be sure. But those actions are going to cause significant burdens for certain people. Who ought to bear the costs that come from this new knowledge? This gnarly situation is summarized really well in a lovely episode of NPR’s The Indicator.

Suppose that a group of scientists came to a city, drew up some new flood risk maps, handed those maps to the city government, then walked away. In what ways might that actually be a highly unethical thing to do?

The key point here is that if we are going to go out and collect data on natural hazards or environmental risks or contamination or pollution or anything else, we need to think very carefully about what comes next. Who is responsible for taking action based on the knowledge that comes from those efforts?

Back to the PFAS issue…

Imagine that you are working on a project that addresses the issue of PFAS: a group of “forever chemicals” that are used in a wide variety of products, from nonstick cookware to firefighting foams to waterproof clothing. PFAS have lots of useful characteristics, but they also accumulate in the environment and break down very slowly (hence, “forever” chemicals). We have been producing these chemicals for decades, and it was initially thought that they were not harmful. However, several PFAS have now been identified as toxic, and the EPA has introduced regulations that limit their presence in drinking water.

PFAS can accumulate in soil as well as water, and they are everywhere. But like any contaminant, they are more abundant in some places than others. So, imagine that you are part of a team that wants to create a map of PFAS contamination in the city where you live and its surrounding areas. Doing that will require taking samples from around the area you want to map and measuring the PFAS levels. Don’t worry about the technical details of how those measurements are taken - someone else on your team has that responsibility. Your job is to figure out how to actually go about the task of gathering and managing the data.

What are some potential issues that could arise during this project? What are some ways that you might navigate those issues?

At this stage, students are ready to take on this question, and they can usually generate quite a few possible issues. Coming up with ways to navigate those issues is especially productive. I generally have my students work on this problem in pairs before bringing their ideas together as a whole group. Some students suggest that the entire project is ill-advised and that if we want to respond to the issue of PFAS, then this mapping endeavor really doesn’t make all that much sense. That’s a reasonable stance to take, of course, but it’s important to at least consider what could be gained from the project - and whether we are the ones who should judge whether or not it should occur. Often, students will suggest engaging with the community itself. How do community members want this project to be done? Do they want it to occur at all?

Because while this project might be about measuring a nonliving chemical substance, at the end of the day it’s about people. It’s always about people.