A few years ago, a computer scientist named Yejin Choi gave a presentation at an artificial-intelligence conference in New Orleans. On a screen, she projected a frame from a television newscast in which two anchors appeared before the headline “STAB CHEESEBURGER.” Choi explained that human beings find it easy to discern the outlines of the story from those two words alone. Did someone stab a cheeseburger? Probably not. Was a cheeseburger used to stab a person? Also unlikely. Did a cheeseburger stab a cheeseburger? Impossible. The only plausible scenario was that someone had stabbed someone else over a cheeseburger. Computers, Choi said, are stumped by this kind of problem. They lack the common sense to rule out the possibility of food crime.
For certain kinds of tasks, such as playing chess or detecting tumors, artificial intelligence can rival or surpass human thought. But the wider world presents endless unforeseen circumstances, and there AI often stumbles. Researchers speak of “corner cases,” which lie on the outskirts of the likely or anticipated; in such situations, human minds can rely on common sense to carry them through, but AI systems, which depend on prescribed rules or learned associations, often fail.
By definition, common sense is something everyone has; that makes it sound unremarkable. But imagine living without it, and its importance becomes clearer. Suppose you are a robot visiting a carnival, and you confront a funhouse mirror; bereft of common sense, you might wonder whether your body has suddenly changed. On the drive home, you see that a fire hydrant has burst, showering the road; you cannot determine whether it is safe to drive through the spray. You park outside a drugstore, and a man on the sidewalk, bleeding profusely, cries out for help. Are you allowed to grab bandages from the store without waiting in line to pay? At home, there is a news report about a cheeseburger stabbing. As a human being, you can draw on a vast reservoir of tacit knowledge to interpret these situations. You do so all the time, because life is full of corners. AIs are likely to get stuck.
Oren Etzioni, the CEO of the Allen Institute for Artificial Intelligence, in Seattle, told me that common sense is “the dark matter” of AI. “It shapes so much of what we do and what we need to do, and yet it’s ineffable,” he added. The Allen Institute is working on the topic with the Defense Advanced Research Projects Agency (DARPA), which launched a four-year, seventy-million-dollar effort called Machine Common Sense in 2019. If computer scientists could endow their AI systems with common sense, many thorny problems would be solved. As one journal article noted, an AI looking at a sliver of wood visible above a table would know that it was probably part of a chair rather than a random plank. A language-translation system could untangle ambiguities and double meanings. A house-cleaning robot would understand that a cat should be neither thrown out nor stowed in a drawer. Such systems would be able to function in the world because they would possess the kind of knowledge we take for granted.
In the 1990s, questions about AI and safety helped drive Etzioni to begin studying common sense. In 1994, he co-authored a paper attempting to formalize the “first law of robotics,” a fictional rule from Isaac Asimov’s science fiction which holds that “a robot may not injure a human being or, through inaction, allow a human being to come to harm.” The problem, he found, was that computers have no concept of harm. That kind of understanding would require a broad, basic comprehension of a person’s needs, values, and priorities; without it, mistakes are nearly inevitable. In 2003, the philosopher Nick Bostrom imagined an AI program tasked with maximizing the production of paper clips; the program realizes that people could turn it off, and so it does away with them in order to accomplish its mission.
Bostrom’s paper-clip AI lacks moral common sense; it might decide that messy, unclipped documents are a form of harm. But perceptual common sense is also a challenge. In recent years, computer scientists have begun cataloguing examples of “adversarial” inputs: small changes to the world that confuse computers trying to make sense of it. In one study, the strategic placement of a few small stickers on a stop sign made a computer-vision system see it as a speed-limit sign. In another study, subtly altering the pattern on a 3-D-printed turtle made an AI program see it as a rifle. An AI with common sense would not be so easily fooled: it would know that rifles don’t have four legs and a shell.
Choi, who teaches at the University of Washington and works with the Allen Institute, told me that, in the 1970s and 1980s, AI researchers thought they were close to programming common sense into computers. “But then they realized ‘Oh, that’s just too hard,’” she said; instead, they turned to “easier” problems, such as object recognition and language translation. Today the picture is different. Many AI systems, such as driverless cars, may soon be working regularly alongside us in the real world; that heightens the need for artificial common sense. And common sense may also be more attainable. Computers are getting better at learning on their own, and researchers are learning to feed them the right kinds of data. AI may soon be covering more corners.
How do human beings acquire common sense? The short answer is that we are multifaceted learners. We try things out and observe the results, read books and listen to instructions, absorb silently and reason on our own. We fall on our faces and watch others make mistakes. AI systems, by contrast, aren’t as well-rounded. They tend to follow one route to the exclusion of all the others.
Early researchers opted for explicit instructions. In 1984, a computer scientist named Doug Lenat began building Cyc, a kind of encyclopedia of common sense based on axioms, or rules, that explain how the world works. One axiom might hold that owning something means owning its parts; another might describe how hard things can damage soft things; a third might explain that flesh is softer than metal. Combine the axioms and you arrive at common-sense conclusions: if the bumper of your driverless car strikes someone’s leg, you are responsible for the injury. “It’s basically representing and reasoning in real time with complicated nested modal expressions,” Lenat told me. Cycorp, the company that owns Cyc, is still in business, and hundreds of logicians have spent decades inputting tens of millions of axioms into the system; the firm’s products are shrouded in secrecy, but Stephen DeAngelis, the CEO of Enterra Solutions, which advises manufacturing and retail companies, told me that its software can be powerful. He offered a culinary example: Cyc, he said, possesses enough common-sense knowledge about the “flavor profiles” of various fruits and vegetables to reason that, even though a tomato is a fruit, it shouldn’t go into a fruit salad.
Academics tend to regard Cyc’s approach as outdated and labor-intensive; they doubt that the nuances of common sense can be captured through axioms. Instead, they focus on machine learning, the technology behind Siri, Alexa, Google Translate, and other services, which works by detecting patterns in vast amounts of data. Instead of reading an instruction manual, machine-learning systems analyze the library. In 2020, the research lab OpenAI revealed a machine-learning algorithm called GPT-3; it examined text from the World Wide Web and discovered linguistic patterns that allowed it to produce plausibly human writing from scratch. GPT-3’s mimicry is stunning in some ways, but underwhelming in others. The system can still produce strange statements, for example: “It takes two rainbows to jump from Hawaii to seventeen.” If GPT-3 had common sense, it would know that rainbows aren’t units of time and that seventeen is not a place.
Choi’s team is trying to use language models like GPT-3 as a stepping stone to common sense. In one line of research, they asked GPT-3 to generate millions of plausible, common-sense statements describing causes, effects, and intentions, for example, “Before Lindsay gets a job offer, Lindsay has to apply.” They then asked a second machine-learning system to analyze a filtered set of those statements, with an eye toward completing fill-in-the-blank questions. (“Alex makes Chris wait. Alex is seen as . . .”) Trained this way, the system showed -three-percent common sense.
Choi’s lab did something similar with short videos. She and her collaborators first created a database of millions of captioned clips, then had a machine-learning system analyze them. Meanwhile, online crowdworkers (Internet users who perform tasks for pay) composed multiple-choice questions about still frames taken from a second set of clips, which the AI had never seen, along with multiple-choice questions asking for justifications of the answers. A typical frame, taken from the film “Swingers,” shows a waitress delivering pancakes to three men in a restaurant, with one of the men pointing at another. In response to the question “Why is [person4] pointing at [person1]?,” the system said that the pointing man was “telling [person3] that [person1] ordered the pancakes.” Asked to justify its answer, the program said that “[person3] is delivering food to the table, and she might not know whose order is whose.” The AI answered questions sensibly seventy-two percent of the time, compared with eighty-six percent for humans. Such systems are impressive; they seem to have enough common sense to understand everyday situations in terms of physics, cause and effect, and even psychology. It’s as if they know that people eat pancakes in restaurants, that each diner has a different order, and that pointing is a way to convey information.