In 2012, Brooklyn police officer Michael Rodrigues arrested a burglary gang, the Brower Boys, by adding gang members as friends on Facebook. The arrest itself was like picking the lowest-hanging fruit. “It’s break-in day on the avenue,” one gang member posted in his status message. Officer Rodrigues and colleagues tracked the gang members to the avenue in question. They photographed the young men committing the crime, and then arrested them.

For the past several years, police and prosecutors across the country have been quietly using social media to track criminal networks. Their methods have become more sophisticated: by combining social media APIs, databases, and network analysis tools, police can keep tabs on gang activity. In New York’s Harlem neighborhood, at-risk teens are identified as members of gangs based on their affiliations and are monitored on Instagram, Facebook, and Twitter.

Teens are profiled using various criteria, including the number of followers. “The average teenager has about 300 friends or followers. These kids have thousands,” says Jeffrey Lane, an urban ethnographer at Rutgers and the author of the forthcoming book The Digital Street. Lane spent five years in Harlem hanging out on the corner with kids and with cops to discover how digital technology is woven into the fabric of community life in the inner city.

The police began using social media almost by accident, he says. One officer discovered over the course of ordinary social media use that he could see the status updates of neighborhood kids. Soon, cops and prosecutors were looking at photos to figure out who might be a witness in a particular case. Bystanders could be identified from the background of photos posted on social media sites. If a kid posted a time-stamped photo of himself standing in front of a door, and the cop recognized the doorway, it could be relevant in an investigation.

Today, police across the country regularly use social media data to keep tabs on citizens. Seventy-five percent of them are self-taught, according to a 2014 LexisNexis research report on social media use in law enforcement. “Facebook has helped me by identifying suspects that were friends or associates of other suspects in a crime and all brought in and interviewed and later convicted of theft and drug offenses,” said one respondent interviewed in the report. “My biggest use for social media has been to locate and identify criminals,” said another. “I have started to utilize it to piece together local drug networks.” Only 9 percent of respondents had received training on using social media in investigations from their agency.

Social media can produce evidence in some cases, but it also fails to capture the complexity of human relationships—and can sometimes distort them. For this reason, it is important to take care that social media data is not misused or misinterpreted in the pursuit of justice.

Take the case of Jelani Henry, a young man from Harlem profiled recently in The Verge. As teenagers, Jelani and his older brother Asheem ran with a Harlem crew called Goodfellas. Asheem, who was more heavily involved than Jelani, was arrested during a police gang raid in November 2011 and charged with criminal conspiracy. Jelani was involved in the crew by virtue of being Asheem’s brother, and because he lived in the neighborhood. But except for a few minor scuffles, he steered clear of violence. Most of the time, Jelani told a reporter, “What they was doing, I wasn’t doing.”

But like most teens, Jelani was keenly aware of his social position. “People are looking to see how you respond,” Jelani told a reporter. He explained that if someone from your crew posts a video of a fight to Facebook or YouTube, it is expected that you click the “like” button next to it. If not, he said, “people are gonna ask you why.”

That social media footprint came back to haunt him. Jelani was arrested five months after Asheem and charged with a double shooting. The family says that Jelani was labeled a criminal affiliate because of his social media connections. When a shooting happened on 129th Street, and the description of the shooter was “a tall light-skinned black man in a hoodie,” Jelani was hauled in—because he fit the ambiguous description and because he was labeled in a database as an affiliate of the recently busted Goodfellas. He was held in Rikers Island for two years, including nine months in solitary confinement, all the while protesting his innocence.

It’s easy to imagine two people who are closely linked, yet follow very different paths. Take a pair of siblings, one trying to toe the line and the other a bit wild: It’s an old story that appears throughout Western culture, as in Cain and Abel, Romulus and Remus, or Meg and Jo March. But in computer terms, there is no such nuance. Because of his social connections, a person like Jelani Henry can be entered into a database of suspected criminals and viewed with suspicion for the indefinite future.

The fundamental problem with policing via social-media data is that it misrepresents what social networks actually look like on the ground. Despite what techno-evangelists might wish, not all social relationships can be described using computational logic. The problem is structural and epistemological. Like all computer programs, databases are ultimately based on binary logic. If you want shades of meaning, you have to explicitly build that capability into your system. And building nuance is far harder than it seems.

On Facebook, there are only two options for a post: either you click the like button, or you don’t click the like button. There’s no field for someone like Jelani Henry to indicate “I clicked the like button on this post so I wouldn’t get harassed on my way to school.” A like is simply a number used as a flag, true (1) or false (0). Humans are the ones who invest likes with context and meaning. The computer only displays the results of its computation.

When you click to show the list of people who “liked” a post, you probably think you’re getting a list of people who expressed positive sentiment toward a particular combination of sentences. But computationally, what you’re getting is slightly different. “Show me a list of the people who liked a Facebook post” is actually a command more like “Display a list of FirstName and LastName for usernames where the flag LikedThisPost = True.” There are a lot of assumptions built in there. There’s the assumption that the username corresponds to a single real person, which is not always true—people have multiple Facebook accounts, and some accounts have multiple people posting to them, and some accounts are fake. There’s the assumption that the agent that clicked the like button is the same person referred to by the username—not necessarily true. People often forget to log out of their Facebook accounts on desktops and laptops, or use other people’s phones to browse social media.
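The command described above can be made concrete with a minimal sketch. The table name, column names, and sample accounts here are hypothetical, invented only for illustration; this is not Facebook's actual data model. The sketch uses Python's built-in sqlite3 module to show how little the query itself knows: it returns names attached to accounts whose flag is set, and nothing about who clicked or why.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes ("
    "username TEXT, first_name TEXT, last_name TEXT, liked_this_post INTEGER)"
)
conn.executemany(
    "INSERT INTO likes VALUES (?, ?, ?, ?)",
    [
        ("user_a", "Alex", "Example", 1),
        ("user_b", "Blake", "Sample", 0),
        ("user_c", "Casey", "Placeholder", 1),
    ],
)


def who_liked(connection):
    """Return (first_name, last_name) for every account whose flag is set."""
    rows = connection.execute(
        "SELECT first_name, last_name FROM likes "
        "WHERE liked_this_post = 1 ORDER BY username"
    )
    return rows.fetchall()


likers = who_liked(conn)
print(likers)
```

The result is a list of name pairs tied to accounts. Whether each account maps to one real person, and whether that person did the clicking, is outside anything the query can check.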

There’s also the false assumption that clicking the like button always expresses a positive sentiment. The other day, a Facebook friend of mine posted that his mother had died suddenly. I clicked like on the post. Obviously, I was not expressing positive sentiment toward death; I was clicking the button to express sympathy. (I also followed up with a note, because grief is too devastating and expansive for a click to be sufficient.) From a computational perspective, the flag on my like of my Facebook friend’s post about his mother’s passing looks exactly the same as the flag on Jelani Henry’s like of his Facebook friend’s post about an afterschool fight. To me, the two flags have totally different meanings. To the computer, neither has any “meaning” beyond value = 0 or value = 1.
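The point about identical flags can be sketched in a few lines. The record layout below is an assumption made up for this illustration (the field names username, post_id, and value are hypothetical), not a description of any real platform's storage. It shows that once two very different human acts are stored, the only signal left to compare is the same integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Like:
    username: str  # hypothetical account identifier
    post_id: int   # hypothetical post identifier
    value: int     # 1 = liked, 0 = not liked; no field for intent


# One like expresses sympathy, the other responds to social pressure.
sympathy_like = Like(username="author", post_id=101, value=1)
pressure_like = Like(username="teen", post_id=202, value=1)


def stored_signal(like):
    """Return everything the record says about sentiment: the flag alone."""
    return like.value


same_signal = stored_signal(sympathy_like) == stored_signal(pressure_like)
print(same_signal)
```

Any context that distinguishes the two clicks would have to be added as an explicit field, and someone would have to decide how to fill it in.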

A like button is merely a tool. Humans use tools in breathtakingly creative ways—this is one of the many exciting and inspiring things about social uses of technology. However, the meaning a person imparts to an action on a social-media platform does not always correspond to the actual intent. In real-life social interactions, nuance is everything. On social media, where that nuance is obscured, we ought to be hyper-critical about the ethical ramifications of using social media data for real-world judgments.

There are many known hazards to data-driven policing. “Crime network data in general have limitations and biases,” write sociologists Amir Rostami and Hernan Mondani in a case study about gang databases. An observational study in Arizona showed that police were more aggressive with documented gang members, using excessive force more often than with individuals not documented in a gang database. Listing a teen in a database as a gang affiliate could bias future prosecutions against them. A district attorney or cop looking for a suspect could automatically assume that the kid who’s listed in the gang database is more likely to be involved than the kid who isn’t. This specific bias is embedded in IBM’s CopLink, a software package in use at police departments across the country since 1996. “The premise behind CopLink is that most crime is committed by persons who are already in police records,” writes Meghan S. Strohine in Critical Issues in Policing: Contemporary Readings. Simply creating an entry in a database labeled “criminals” tinkers with the presumption of innocence. Rebecca Rader Brown writes of this issue in the Columbia Journal of Law & Social Problems:

Gang databases may also interfere with an individual’s First Amendment Freedom of Association. Since a person may be documented for affiliating with other known or suspected gang members, he may be targeted as a suspect before committing any criminal act. Using a “guilt by association” standard can have the effect of sweeping entire neighborhoods into a gang database. This effect is felt disproportionately by minority populations due to geographic targeting of anti-gang efforts. In certain localities, police tend to document minorities for behaviors that, if observed among members of the majority population, are considered innocuous.

For the kid listed in a gang database, it can be unclear how to get out of it. In the world of human interaction, we accept change through behavior: the addict can redeem himself by getting clean, or the habitual interrupter can redeem himself by not interrupting. But in the database world, unless someone has permission to delete or amend a database record, no such change is possible. Credit agencies are required to forgive financial sins after seven years. Police are not, at least not consistently. The National Gang Center, in its list of gang-related legislation, shows only 12 states with policies that specifically address gang databases. Most deny the public access to the information in these databases. Only a few of these 12 mention regular purging of information, and some specifically say that a person cannot even find out if they have a record in the database.

This permanence does not necessarily match real-world conditions. Kids cycle in and out of street gangs the way they cycle in and out of any other social group, and many young men age out of violent behavior. Regularly purging the gang database, perhaps on a one-year or two-year cycle, would allow some measure of computational forgiveness. However, few institutions are good at keeping the data in their databases up-to-date. (If you’ve ever been served an ad for a product you just bought, you’re familiar with this problem of information persistence and the clumsiness of predictive algorithms.) The police are no worse and no better than the rest of us. Criminologist Charles Katz found that despite a written department policy in one large Midwestern police gang unit, data was not regularly audited or purged. “The last time that the gang unit purged its files, however, was in 1993—approximately 4 years before this study was conducted,” he wrote. “One clerk who is responsible for data entry and dissemination estimated, ‘At a minimum, 400 to 500 gang members would be deleted off the gang list today if we went through the files.’ Accordingly, Junction City’s gang list of 2,086 gang members was inflated by approximately 20% to 25%.”
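A regular purge of the kind suggested above is not hard to express in code. The following is a minimal sketch under stated assumptions: the record layout, the last_reviewed field, the sample dates, and the two-year retention window are all hypothetical choices made for illustration, not a description of any department's system or policy.

```python
from datetime import date, timedelta

# Assumed retention window: roughly a two-year cycle.
RETENTION = timedelta(days=2 * 365)


def purge_stale(records, today, retention=RETENTION):
    """Keep only records last reviewed within the retention window."""
    cutoff = today - retention
    return [r for r in records if r["last_reviewed"] >= cutoff]


# Hypothetical entries; "last_reviewed" is an invented field name.
records = [
    {"id": 1, "last_reviewed": date(2013, 6, 1)},
    {"id": 2, "last_reviewed": date(2015, 3, 15)},
    {"id": 3, "last_reviewed": date(2016, 1, 10)},
]

kept = purge_stale(records, today=date(2016, 4, 1))
kept_ids = [r["id"] for r in kept]
print(kept_ids)
```

The hard part is institutional rather than technical: someone has to run the purge on schedule and keep the review dates honest.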

Does current technology offer better alternatives? One way to gauge the level of sophistication in software knowledge is by browsing GitHub, a popular code-sharing platform. A search for the term “criminal database” reveals six different free, open-source database applications that anyone can download and use. None of them contains an expiration date, or any regulations about purging, or any kind of guidance on ethical use.

When I talk about the ethical responsibility of software programming, I usually get a question like, “If Amazon can predict what book I want to buy next, hasn’t this problem already been solved?” The answer is inevitably no. When computer scientists were building the Internet in the late 1980s, there weren’t any widely adopted or established ethical guidelines because we were building these systems for the first time in human history. The Association for Computing Machinery, the central professional organization for computer science, does publish ethical guidelines. They are recommendations, not requirements; following the guidelines is left up to individual programmers.

Now that the Internet is thirty years old, the long-term consequences of information permanence are becoming clear. We also need to acknowledge that computer systems are not a panacea. “Your program really does stink, and the sooner you get used to the idea, the better,” writes Nathaniel S. Borenstein in Programming as if People Mattered. “The inadequacies of your software are simply a reflection of your frail, shortsighted, and limited human nature. Every program ever built is doomed to eventual obsolescence.” We need to put people before programs, and if programs don’t reflect our human values, we need to change the code. And if programmers can’t write code that is fair and just, we should consider relying on people instead of programs.

If American law enforcement is going to go deeper into the brave new world of data-driven policing, we need to create systems that have human values embedded in them. If our technological systems are entrapping innocent citizens or tampering with the presumption of innocence, should they be used?