Right now, you’re more than likely spending the vast majority of your time at home. Someday, however, we will all be able to leave the house once again and emerge, blinking, into society to work, travel, eat, play, and congregate in all of humanity’s many bustling crowds.
The world, when we eventually enter it again, is waiting for us with millions of digital eyes—cameras, everywhere, owned by governments and private entities alike. Pretty much every state out there has some entity collecting license plate data from millions of cars—parked or on the road—every day. Meanwhile all kinds of cameras—from police to airlines, retailers, and your neighbors’ doorbells—are watching you every time you step outside, and unscrupulous parties are offering facial recognition services with any footage they get their hands on.
In short, it’s not great out there if you’re a person who cares about privacy, and it’s likely to keep getting worse. In the long run, pressure on state and federal regulators to enact and enforce laws that can limit the collection and use of such data is likely to be the most efficient way to effect change. But in the shorter term, individuals have a conundrum before them: can you go out and exist in the world without being seen?
Systems are dumber than people
You, a person, have one of the best pattern-recognition systems in the entire world lodged firmly inside your head: the human brain.
People are certainly easy to fool in many ways—no argument there. But when it comes to recognizing something as basic as a car, stop sign, or fellow human being—literally the kinds of items that babies and toddlers learn to identify before they can say the words—fooling cameras is in many ways easier than fooling people. We’re simply trained by broad experience to look at things differently than software is.
For example, if you’re driving down the road and see a stop sign at night, you still know it’s supposed to be “red.” And if it has some weird stickers on it, to you it is still fundamentally a stop sign. It’s just one that someone, for some reason, has defaced. A car’s computer vision system, however, may instead “read” that sign as a speed limit sign, indicating the car should go up to 45 miles per hour, with potentially disastrous results.
Similarly, a person looking at another person with a weird hairstyle and splotches of makeup on their face will see a human, sporting a weird hairstyle and with makeup on their face. But projects such as CV Dazzle have shown that, when applied in a certain way, makeup and hair styling can be used to make a person effectively invisible to facial recognition systems.
Heavy, patterned makeup and hair straight out of a JRPG are impractical for daily life, but all of us put on some kind of clothing to leave the house. As Ars’ own Jonathan Gitlin has described, the idea of using the “ugly shirt” to render oneself invisible to cameras has been a part of science fiction for a decade or more. But today, there are indeed computer scientists and artists working to make invisibility as simple as a shirt or a scarf… in theory, at least.
Digital and physical invisibility
Two decades of Harry Potter in the public imagination have cemented for millions the idea that a cloak of invisibility itself should be lightweight and hard to perceive. The reality, on the other hand, is not exactly subtle—and still very much a work in progress.
“If you wanted to do like a Mission Impossible-style heist of the Smithsonian, I don’t think you’d want to rely on this cloak to not be detected by a security system,” computer science professor Tom Goldstein, of the University of Maryland, told Ars in an interview.
Goldstein and a team of students late last year published a paper studying “adversarial attacks on state-of-the-art object detection frameworks.” In short, they looked at how some of the algorithms that detect people in images work, then subverted them, essentially by tricking the code into thinking it was looking at something else.
It turns out, confounding software into not realizing what it’s looking at is a matter of fooling several different smaller systems at once.
Think about a person, for example. Now think of a person who looks nothing like that. And now do it again. Humanity, after all, contains multitudes, and a person can have many different appearances. A machine learning system needs to understand the diverse array of different inputs that, put together, mean “person.” A nose by itself won’t do it; an eye alone will not suffice; anything could have a mouth. But put dozens or hundreds of those priors together, and you’ve got enough for an object detector.
Code does not “think” in terms of facial features, the way a human does, but it does look for and classify features in its own way. To foil it, the “cloaks” need to interfere with most or all of those priors. Simply obscuring some of them is not enough. Facial recognition systems used in China, for example, have been trained to identify people who are wearing medical masks while trying to prevent the spread of COVID-19 or other illnesses.
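To make that concrete, here is a deliberately simplified sketch of why hiding a few cues is not enough. It is not how any real detector works: the feature count, the equal weights, the 0.5 threshold, and the function name `detection_score` are all assumptions chosen for illustration.

```python
import numpy as np

def detection_score(evidence, weights):
    """Weighted sum of per-cue evidence, each value in [0, 1]."""
    return float(np.dot(evidence, weights))

# Hypothetical setup: 100 weak cues, equally weighted, and a detector
# that says "person" when the combined score passes 0.5.
n_features = 100
weights = np.full(n_features, 1.0 / n_features)
threshold = 0.5

# An unobstructed person: every cue fires fairly strongly.
person = np.full(n_features, 0.9)

# A mask hides 30 of the cues entirely; the rest are untouched.
masked = person.copy()
masked[:30] = 0.0

# A "cloak" weakens nearly every cue at once (95 of 100).
cloaked = person.copy()
cloaked[:95] = 0.1

for name, evidence in [("person", person), ("masked", masked), ("cloaked", cloaked)]:
    s = detection_score(evidence, weights)
    print(f"{name}: score={s:.2f} detected={s > threshold}")
```

In this toy model the masked figure still clears the threshold because the remaining cues carry enough evidence, while the cloaked one drops below it only because almost every cue has been suppressed.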
And of course, to make the task even more challenging, different object detection frameworks all use different mechanisms to detect people, Goldstein explained. “We have different cloaks that are designed for different kinds of detectors, and they transfer across detectors, and so a cloak designed for one detector might also work on another detector,” he said.
But even once a cloak works across a number of different systems, making it work consistently is another challenge entirely.
“One of the things we did in our research was to quantify how often these things work and the variability of the scenarios in which they work,” he said. “Before, they got it to work once, and if the lighting conditions were different, for example, maybe it doesn’t work anymore, right?”
People move. We breathe, we turn around, we pass through light and shadow with different backgrounds, patterns, and colors around us. Making something that works when you’re standing in a plain white room, lit for a photo session, is different from making something that works when you’re shopping in a big-box store or walking down the street. Goldstein explained:
“Modifying an image is different than modifying a thing, right? If you give me an image, I can say: we’ll make this pixel intensity over here different, make that pixel intensity over there a little more red, right? Because I have access to the pixels, I can change the individual bits that encode that image file.
“But when I make a cloak, I don’t have that ability. I’m going to make a physical object and then a detector is going to input the image of me and pass the result to a computer. So when you have to make an adversarial attack in the physical world, it has to survive the detection process. And that makes it much more difficult to craft reliable attacks.”
All of the digital simulations run on the cloak worked with 100-percent effectiveness, he added. But in the real world, “the reliability degrades.” The tech has room for improvement.
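The gap between a digital attack and a physical one can be illustrated with a toy example. This is a sketch under loud assumptions, not the researchers' method: the "detector" is a hypothetical linear scorer standing in for a real neural network, the attack is a simple sign-based nudge to each pixel, and the `camera` function is an invented capture model that applies a random brightness change and a little sensor noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "detector": a positive score means "person".
# Real detectors are deep networks; this is only a stand-in.
weights = np.linspace(-1.0, 2.0, 256)

def score(pixels):
    return float(weights @ (pixels - 0.5))

# A toy 256-pixel "image" the detector flags as a person.
image = 0.5 + 0.2 * np.sign(weights)

# Digital attack: nudge every pixel against the sign of its weight
# (for a linear model, the gradient of the score is just the weights).
epsilon = 0.25
adversarial = np.clip(image - epsilon * np.sign(weights), 0.0, 1.0)

def camera(pixels):
    """Assumed capture model: random brightness gain plus sensor noise."""
    gain = rng.uniform(0.6, 1.4)
    noise = rng.normal(0.0, 0.02, size=pixels.shape)
    return np.clip(gain * pixels + noise, 0.0, 1.0)

def evasion_rate(pixels, trials=2000):
    """Fraction of simulated captures in which the detector is fooled."""
    return sum(score(camera(pixels)) <= 0 for _ in range(trials)) / trials

print("original detected:", score(image) > 0)
print("digital attack evades:", score(adversarial) <= 0)
print("evasion rate through the camera:", evasion_rate(adversarial))
```

With exact control over the pixels, the attack fools this toy detector every time. Once the same pixels pass through the simulated camera, a sizable minority of captures (the brighter ones, in this setup) push the score back over the line, which mirrors the kind of degradation Goldstein describes.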
“How good can they get? Right now I think we’re still at the prototype stage,” he told Ars. “You can produce these things that, when you wear them in some situations, they work. It’s just not reliable enough that I would tell people, you know, you can put this on and reliably evade surveillance.”