Is Fake News a Machine Learning Problem?

On Friday, Donald J. Trump was sworn in as the 45th president of the United States. The inauguration followed a bruising primary and general election, in which social media played an unprecedented role. In particular, the proliferation of fake news emerged as a dominant storyline. Throughout the campaign, explicitly false stories circulated through the internet’s echo chambers. Some fake stories originated as rumors, others were created for profit and monetized with click-based advertisements, and according to US Director of National Intelligence James Clapper, many fake news stories were orchestrated by the Russian government with the intention of influencing the results. While it is not possible to observe the counterfactual, many believe that the election’s outcome hinged on the influence of these stories.

For context, consider one illustrative case as described by the New York Times. On November 9th, 35-year-old marketer Eric Tucker tweeted a picture of several buses, claiming that they were transporting paid protesters to demonstrate against Trump. The post quickly went viral, receiving over 16,000 shares on Twitter and 350,000 shares on Facebook. Trump and his surrogates joined in, promoting the story through social media. Tucker’s claim turned out to be a fabrication. Nevertheless, it likely reached millions of people, more than many conventional news stories.

A number of critics cast blame on technology companies like Facebook, Twitter, and Google, suggesting that they have a responsibility to address the fake news epidemic because their algorithms influence who sees which stories. Some linked the fake news phenomenon to the idea that personalized search results and news feeds create a filter bubble, a dynamic in which readers only encounter stories that they are likely to click on, comment on, or like. As a consequence, readers might only encounter stories that confirm pre-existing beliefs.

Facebook, in particular, has been strongly criticized for its trending news widget, which operated (at the time) without human intervention, giving viral items a spotlight, however defamatory or false. In September, Facebook’s trending news box promoted a story titled ‘Michele Obama was born a man’. Some have wondered why Facebook, despite its massive investment in artificial intelligence (machine learning), hasn’t developed an automated solution to the problem.

Twitter personality and technology entrepreneur Dean Pomerleau went so far as to publicly harangue Yann LeCun, Facebook’s director of artificial intelligence, over the failure to do something. He also ponied up $2,000 as prize money for a competition to identify fake news. Unfortunately, as Cade Metz of Wired pointed out in an article, the problem is harder than Pomerleau has acknowledged. A cursory look at the contest suggests that it might be underspecified.

According to the contest website, the input consists only of a string representing a claim or headline, e.g. “Climate change is a Martian takeover operation”, and the outputs are a four-tuple:

  1. Boolean fakeness indicator ({0,1})
  2. Confidence score (range unspecified)
  3. Provenance URL to support (from unspecified set, evaluation criteria also unspecified)
  4. A confidence threshold for accepting/rejecting (difference from item 2 unclear)
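To make the ambiguity concrete, here is a minimal sketch of how one might encode a single contest output. It reflects only our reading of the website. The class name, the field names, the assumption that both numeric fields lie in [0, 1], and the `accepted` rule are all our own guesses, not anything the contest specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One contest output as we read the spec; every name here is hypothetical."""

    is_fake: bool        # item 1: Boolean fakeness indicator
    confidence: float    # item 2: range unspecified; we assume [0, 1]
    provenance_url: str  # item 3: supporting URL; the allowed sources are unspecified
    threshold: float     # item 4: accept/reject cutoff; its relation to item 2 is unclear

    def __post_init__(self):
        # Enforce the range we assumed, since the spec gives none.
        for name in ("confidence", "threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    def accepted(self) -> bool:
        # One possible reading of item 4: trust the label only when the
        # confidence clears the threshold.
        return self.confidence >= self.threshold
```

Even this small sketch forces decisions (a numeric range, a rule relating items 2 and 4) that the contest description leaves open.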

The competition suggests that both training data and development data will be provided. It also suggests that part of the development data will consist of part of the training data. What any of this data consists of remains unclear. Will the claims be dated? What if a claim is untrue when it is made in year n but subsequently becomes true in year n + t?

From his Twitter activity, Mr. Pomerleau appears a sub-ideal candidate to lead the AI community’s response to fake news. He frequently communicates through Tweetstorms. At the time of this writing, he has authored 31 tweets in the last 3 hours. Moreover, while his quest seems genuinely altruistic, it is disturbingly unfocused. In response to a gag mocking deep learning hype (the announcement of fake start-up RocketAI), Pomerleau insisted that comedians should append the hashtags #JOKE and #SARC to all satirical posts. The irony of promoting such heavy-handed censorship while proposing to combat propaganda seemed lost on him.

Still, despite these shortcomings, he is trying to do something, investing his time and reputation, with apparently good intentions. It’s an uncommon degree of social consciousness in an industry that often espouses mottos of making the world a better place, while doing little to back it up (see George Packer’s New Yorker article for this critique).

In this post, we’ll take a closer look at fake news detection from a machine learning perspective. In particular, we’ll inspect the following issues:

  • Is fake news a well-defined machine learning problem?
  • What are the inputs?
  • What are the outputs?
  • If we had a working fake news detector, how could it be deployed?

Is fake news a well-defined ML problem?

If we agree that fake news is a scourge afflicting social networks, then automating fake news detection with machine learning might sound like a great idea. Machine learning is already widely used to identify pornographic images on all major social networks. And spam detection has long been a successful application of machine learning.

In a naive sense, automating any task that requires intelligence might be thought of as an artificial intelligence problem. Unfortunately (or fortunately), we don’t at present possess tools to automate all tasks requiring intelligence. Instead we possess a growing set of machine learning techniques that mine patterns from data to maximize well-defined objectives.

At present, product-ready machine learning systems perform supervised learning. Supervised learning algorithms take as input a dataset \mathcal{D} consisting of many examples \{x_1, \ldots, x_n\} and corresponding labels \{y_1, \ldots, y_n\}. The output of the learning algorithm is a function \hat{y}(x) that maps each input x onto a predicted output \hat{y}.

In the case of spam detection, each input x_i is some representation of an email and its metadata. Each label y_i \in \{0,1\} is binary, indicating whether or not the email is spam. For pornography detection, each input is an image x_i \in \mathbb{R}^{h \times w \times 3} and each y_i is again a binary label. In each case, it’s clear what the input should be and what form the output should take. It also seems likely that annotators will have a high level of agreement about what constitutes spam or pornography. On close examination, however, fake news detection appears to be a much harder problem.
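To see how cleanly the spam setting fits the supervised template, consider a toy sketch. Everything in it is an illustrative assumption on our part: the four-email dataset, the bag-of-words representation, and the choice of logistic regression trained by stochastic gradient descent. Production spam filters are far more elaborate.

```python
import math
from collections import Counter


def featurize(text):
    """Represent an email x_i as bag-of-words counts (token -> count)."""
    return Counter(text.lower().split())


def predict_proba(weights, bias, x):
    """Logistic model: estimated probability that the label is 1 (spam)."""
    score = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in x.items())
    return 1.0 / (1.0 + math.exp(-score))


def fit(examples, labels, epochs=200, lr=0.5):
    """Learn a predictor from D = {(x_i, y_i)} by SGD on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            # Gradient of the log loss with respect to the score.
            error = predict_proba(weights, bias, x) - y
            bias -= lr * error
            for tok, cnt in x.items():
                weights[tok] = weights.get(tok, 0.0) - lr * error * cnt
    return weights, bias


# Toy dataset: inputs x_1..x_n and labels y_1..y_n (1 = spam, 0 = not spam).
emails = [
    "win free money now",
    "free prize click now",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
labels = [1, 1, 0, 0]
weights, bias = fit([featurize(e) for e in emails], labels)


def y_hat(text):
    """The learned function: map an input email to a predicted label."""
    return int(predict_proba(weights, bias, featurize(text)) >= 0.5)
```

The point is not the model but the framing: inputs, labels, and objective are all unambiguous, which is exactly what the fake news setting lacks.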

What are the inputs?

To start with, what should the inputs x_i to a fake news detector be? Per the website for Pomerleau’s competition, the input should be a single sentence containing a headline or claim, such as “Christ Turns Down 3-Year, Multimillion Dollar Deal To Coach Notre Dame”. This setup seems problematic. A single sentence does not provide enough context to say much about the document from which it was extracted. While technically, Jesus Christ has not been offered a coaching deal at Notre Dame, “fake news” does not reasonably characterize the satirical article bearing this title.

Even given an entire document, and assuming that each statement in it should be interpreted earnestly, the veracity of a claim depends on the actual truth. The same claim might or might not amount to fake news depending on whether or not it’s true. “Donald Trump eats turtle for dinner” would be true if Donald Trump ate a turtle for dinner. And it wouldn’t be true if the event never occurred.

Implicit in Pomerleau’s competition is access to the entire internet. The organizers suggest that URLs could be combed for corroborating evidence and that each prediction should be submitted with an accompanying provenance URL (they offer no explanation for how the provenance URL would be factored into the competition’s evaluation methodology).

This setup remains problematic. First, what if the article is the first to break the news? The first story to break news cannot possibly be corroborated by pre-existing published information. Second, the state of machine learning techniques for question answering with knowledge bases is fairly primitive. Presently, this work is confined to small, neatly curated databases. Drawing upon the entire internet is a tall order and a $2,000 purse seems unlikely to alter the course of this research.

Moreover, even if URLs supporting the claim can be found, this may not solve fake news so much as punt the problem one hop away. Instead of fake news detection, we would simply be converting the problem into fake provenance URL detection.

What are the outputs?

While it’s convenient to express problems as binary classification, this may not be reasonable for fake news detection. Are there two neatly identifiable categories for fake and not fake news? Could annotators agree upon them?

Some questions jump out at the outset. Does low-quality news count as fake news? When a well-meaning journalist makes a factual mistake in an otherwise correct story, does the piece qualify as fake news? What about when the journalist makes two mistakes? At what point does an article rise to the level of fake news? Is fake news fundamentally too fuzzy a concept to classify? Does fake news status depend upon the intent to deceive? If so, could two people post the same article but for one author it’s fake news and not for the other?

Moreover, how should a fake news detector respond to a claim whose truth value cannot be determined? Opinions are neither facts nor lies. It seems reasonable that even at the sentence level, it might be necessary to formulate a multi-class or multi-label classification task with a more granular set of output categories. At the article level, categories should distinguish between outright fabrications, stories with a few inaccurate claims, stories that reference debunked claims, opinion pieces, and humor pieces, among others.
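As a rough illustration of what a more granular, multi-label output space might look like, consider the sketch below. The label names are hypothetical and simply mirror the categories listed above; a real taxonomy would need to be worked out with annotators.

```python
from enum import Flag, auto


class ArticleLabel(Flag):
    """Hypothetical article-level labels; several may apply to one article."""

    FABRICATION = auto()             # outright fabrication
    SOME_INACCURATE_CLAIMS = auto()  # mostly sound, with a few factual errors
    CITES_DEBUNKED_CLAIMS = auto()   # references claims already debunked
    OPINION = auto()                 # opinion piece
    HUMOR = auto()                   # satire or humor
    UNVERIFIABLE = auto()            # truth value cannot be determined


# Multi-label: an opinion piece that also repeats a debunked claim.
label = ArticleLabel.OPINION | ArticleLabel.CITES_DEBUNKED_CLAIMS
is_satire = ArticleLabel.HUMOR in label
```

Using a flag type rather than a single class makes explicit that the categories are not mutually exclusive, which is one reason binary classification fits poorly here.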

If we had a working automatic fake news detector, how could it be deployed?

Even if we had a solid problem formulation, and reasonably harmonious annotators, it’s not clear what the best path might be to incorporate a fake news detector into existing services. Some possibilities to consider:

  1. Use the fake news detector to block articles: This approach might be the most straightforward. In limited cases, as for Facebook’s Trending widget, it might be appropriate. But outright censorship (as, say, of the News Feed) eradicates misinformation at the expense of free speech. If we think of Facebook as taking the place of the public square, then explicit censorship may not be acceptable.
  2. Warnings to readers: Likely, as fake articles spread through online filter bubbles, many people who share the articles actually believe them to be true. Potentially, a fake news detector could be used to attach a warning to advise readers that an article contains debunked claims or comes from an unreliable source.
  3. Aid human reviewers: Another likely scenario is that in the short run, machine learning won’t be able to sift enough background information, or do enough detective work to conclusively recognize fake news articles. But if the systems were good enough to provide a high recall, low precision filter, they might be useful to assist human annotators by narrowing down the list of links to investigate.
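The third option can be sketched in a few lines. Assume a hypothetical detector that assigns each article a score, with higher meaning more likely fake, and a small held-out set labeled by humans. We then pick the highest score threshold that still meets a target recall and accept whatever precision results. The scores, labels, and function names below are invented for illustration.

```python
def recall_precision(scores, labels, threshold):
    """Recall and precision when every item scoring >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def pick_threshold(scores, labels, target_recall=0.95):
    """Highest threshold whose recall on held-out data meets the target."""
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(scores, labels, t)
        if recall >= target_recall:
            return t
    return min(scores)  # fall back to flagging everything


# Invented held-out data: detector scores and human labels (1 = fake).
scores = [0.95, 0.80, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0]

threshold = pick_threshold(scores, labels, target_recall=1.0)
recall, precision = recall_precision(scores, labels, threshold)
to_review = [s for s in scores if s >= threshold]
```

On this toy data, catching every fake item means reviewers still read six of the eight articles, half of which are legitimate. The filter narrows the list without replacing human judgment.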

The Short-Run Conclusion

Fake news presents a serious problem. The infrastructure of democracy depends upon an informed citizenry. And likely, there’s more that today’s media technology giants can do to keep explicit propaganda from propagating through social networks and dominating search results.

However, it’s not clear that machine learning offers the best hope for near-term solutions. Perhaps crowdsourcing may offer greater hope. The task of distinguishing fact from fiction is not new. Wikipedia may be the most prolific assembler of facts in the world. The encyclopedia is kept up-to-date and maintains surprisingly high overall quality despite many attempts to deface or manipulate its content. While Wikipedia does not explicitly vet each article that is posted around the internet, it does assemble information in real time, and faces the problem of excluding fabricated content. To date, crowdsourcing has demonstrated more short-run potential than machine learning for performing accurate and flexible fact-checking.

Automatic fake news detection warrants further investigation and deeper critical thought. But we should resist the temptation to confuse the sentiments “It would be nice if machine learning could solve this” and “machine learning is the best tool for this job”.

[Following changes to the Fake News Challenge, I posted an updated analysis here]

Author: Zachary C. Lipton

Zachary Chase Lipton is a PhD student in the Computer Science Engineering department at the University of California, San Diego. He is interested in both theoretical foundations and applications of machine learning. In addition to his work at UCSD, he has worked with Microsoft Research Redmond, Microsoft Research Bangalore, and Amazon Core Machine Learning.

6 thoughts on “Is Fake News a Machine Learning Problem?”

    1. This is a funny comment. Interpreting it earnestly, on the surface, throwing fake news to the crowd by any arbitrary process and expecting them to sort it out is a bad idea. But this isn’t how Wikipedia operates. There are some strict guidelines, a well-planned hierarchy of editing privileges, and remarkably the system tends to do a good job.

      I wouldn’t underestimate the importance in crowd-sourcing of having a well-designed process and cultivating a strong, self-regulating community. Look at the difference between Stack-Overflow and Quora (reasonably high quality crowd-sourcing platforms), and Yahoo Answers (if it were any worse, it would be more informative).

  1. Hi Zack,

    I wish you’d talked to me first or looked at our discussions on the Fake News Challenge public slack channel (at FakeNewsChallenge.org) before posting. We’ve discussed ALL the issues you’ve raised and many more in great depth.

    I actually agree with much of what you’ve said here, and much of it is fair criticism of the Fake News Challenge as we ORIGINALLY specified it, and as it is currently defined on our website.

    That’s why we’re about to release the FINAL specification and training data for the Fake News Challenge competition, which is VERY different from what you’ve read on our outdated website. Among the changes is the fact that the ML task we’re giving teams is no longer ‘truth labelling’ of brief claims as originally specified.

    We’ve changed the task because, as you rightfully point out, ‘truth labelling’ is an ill-specified task that (among other things) lacks sufficient training data to be tractable using existing ML techniques.

    I’d love to talk to you more about it either via our slack channel, skype or email. In fact, we’d love to have you become a member of our advisory board if you’d be willing to provide us with feedback, especially on the NEW problem definition and training set we’re about to release, so that teams can get cracking!

    Please contact me.

    Dean Pomerleau
    Co-Organizer & Co-Sponsor
    Fake News Challenge

    1. Hi Dean –

      Thanks for your reply. I won’t comment on the new version of the fake news challenge here as 1) it’s not yet made public, and 2) it wasn’t available at the time of writing. Per your private comments, I’ve taken the conversation offline (email) and am happy to provide some feedback on new iterations of the fake news challenge.

  2. My honest belief is that together you would both bring a lot to the fake news challenge table. Have you considered doing so as a Google 10X tech moonshot?

    I believe it was Salvador Dali who said ~ do not fear perfection for you will never achieve it ~ and Vince Lombardi who suggested we should strive for perfection and settle for excellence!

    I see the fake news challenge as an impossible dream we should strive for nonetheless.
