AI2 shows off an open, Q&A-focused rival to GPT-3

Deepak Gupta January 24, 2022
Updated 2022/01/24 at 11:32 PM

OpenAI’s impressive AI language model GPT-3 can do a great many things, but with 175 billion parameters, no one would call it particularly streamlined. The Allen Institute for AI (AI2) has demonstrated a model that performs as well as or better than GPT-3 at answering questions, yet is a tenth its size.

Macaw, AI2’s model, grew out of research at the nonprofit aimed at creating an AI that performs at human levels on standardized tests.

“Once we got a very high score, we moved on to more difficult questions,” said Oren Etzioni, head of AI2. “There’s a paradox where sometimes the easiest questions for people are the hardest for machines, and the biggest gap was in common sense.”

For example, he said, when asked “When did Tom Hanks land on the moon?” GPT-3 answers 1995, since that’s when the movie Apollo 13 came out. Macaw, by contrast, correctly notes that, as far as we know, Tom Hanks has never actually been on the moon.

AI2 has published a collection of examples (“demo” is probably too strong a word) that illustrate the difference, with Macaw often giving the better answer to a variety of tricky and not-so-tricky questions.

It’s fun to compare the two, since both get things wrong or give absurd answers at a fairly high rate. Macaw is good at tracking who did what, but is convinced that snakes have legs. GPT-3’s responses are confident (perhaps overly so) and readable, even when they amount to conspiracy theories, such as that COVID-19 is “a new, experimental, and potentially dangerous drug.”

Some examples:

  • How would you make a marriage stronger?
    Macaw: To communicate
    GPT-3: I would make sure I was the one who was always right.
  • If a bird didn’t have wings, how would it be affected?
    Macaw: Would be unable to fly
    GPT-3: It would be a bad bird.
  • You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of grape juice into it. It looks fine. You try to smell it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you drink it. What happens next?
    Macaw: you get sick
    GPT-3: You will probably die
  • How does a snake move?
    Macaw: Sliding on all fours
    GPT-3: Snakes move by using their muscles to push the ground. They can move in any direction, but cannot move backwards.

Clearly, there’s still some way to go in creating a question-answering AI that doesn’t stumble on basic questions a child could answer. But it’s also worth noting that Macaw achieves a similar level of success with a much, much less data-intensive process. Etzioni made it clear that this is not intended to be a replacement for GPT-3 in any way, just another step in the ongoing research around the world to advance the generation and understanding of language.

“GPT-3 is amazing, but it was only released 18 months ago and access is limited,” he said. The capabilities it demonstrates are remarkable, “but we’re learning that you can do more with less. Sometimes you have to build something with 175 billion parameters to say, well, maybe we can do this with 10 billion.”

Good question-answering AI isn’t just a party trick; it’s critical for things like voice search. A local model that can answer simple questions quickly and correctly without consulting outside sources is fundamentally valuable, and your Amazon Echo is unlikely to run GPT-3; that would be like buying a truck to go to the grocery store. Large-scale models will continue to be useful, but the smaller ones are likely to be the ones that actually get deployed.

One part of Macaw that isn’t on display, but is being actively pursued by the AI2 team, is explaining its answers. Why does Macaw think snakes have legs? If the model can’t explain that, it’s hard to figure out where it went wrong. Etzioni said this is an interesting and difficult problem in its own right.

“The problem with explanations is that they can be really misleading,” he said. He cited the example of Netflix “explaining” why it recommended a show to a viewer: the stated reason isn’t the real explanation, which has to do with complex statistical models. People don’t want to hear what’s relevant to the machine; they want an explanation that makes sense to their own minds.

“Our team is building these genuine explanations,” Etzioni said, noting that they have published some work, but that it is not ready for public consumption.

However, like most of what AI2 builds, Macaw is open source. If you’re curious about it, the code is available for you to play with; go to town.
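For the curious, here is a minimal sketch of how one might query Macaw through the Hugging Face transformers library. The checkpoint name (“allenai/macaw-large”) and the “$answer$ ; $question$” slot format follow AI2’s repository, but treat them as assumptions here rather than a verified recipe:

    # A rough sketch, not an official example: querying Macaw via Hugging Face
    # transformers. The checkpoint name and the "$answer$ ; $question$" slot
    # format are assumptions based on AI2's repository.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

    # Macaw is a T5-style seq2seq model; the question is wrapped in "slots".
    prompt = "$answer$ ; $question$ = When did Tom Hanks land on the moon?"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)

    # The model fills the answer slot, e.g. "$answer$ = ..."
    print(tokenizer.decode(output[0], skip_special_tokens=True))

The full-size macaw-11b checkpoint would be queried the same way but needs far more memory, which is exactly the do-more-with-less tradeoff Etzioni describes.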
