Fake News Challenge – Revised and Revisited

The organizers of the Fake News Challenge have subjected it to a significant overhaul. In light of these changes, many of my criticisms of the challenge no longer apply.

Some context:

Last month, I posted a critical piece addressing the fake news challenge. Organized by Dean Pomerleau and Delip Rao, the challenge aspires to leverage advances in machine learning to combat the epidemic viral spread of misinformation that plagues social media. The original version of the challenge asked teams to take a claim, such as “Hillary Clinton eats babies”, and output a prediction of its veracity together with supporting documentation (links culled from the internet). Presumably, the hope was that an on-the-fly artificially intelligent fact checker could be integrated into social media services to stop people from unwittingly sharing fake news.

My response criticized the challenge as ill-specified (fake-ness was not defined), circular (how do we know the supporting documents are legitimate?), and infeasible (are teams supposed to comb the entire web?).

Shortly after I posted these complaints, Dean reached out to me via the comment thread and directly by email. He acknowledged many of the problems and informed me that the organizers had already arrived at some of the same conclusions themselves. He then emailed me a mock-up of the new version of the challenge, which launched roughly a week later. While I plan to keep the old post live, it also seems appropriate to update the record on Approximately Correct.

What’s new:

In version 2.0, the challenge pivoted away from the unrealistic goal of solving fake news outright. Instead, the organizers proposed the task of stance detection. Here, the goal is to take a headline (from article 1) and body text (from article 2) and identify whether the body text agrees with, disagrees with, or merely discusses (without taking a position on) the claim made in the headline.
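
To make the task concrete, here is a minimal sketch in Python of what a stance detection system might look like. Everything in it is illustrative: the toy headline/body pairs, the TF-IDF features, and the logistic regression classifier are assumptions made for exposition, not the challenge’s official baseline or dataset.

```python
# Minimal stance-detection sketch (illustrative only, not the official
# Fake News Challenge baseline). Input: (headline, body) pairs; output:
# one of the labels described above: "agree", "disagree", or "discuss".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training examples; a real entry would train on the challenge's
# labeled dataset of headline/body pairs.
pairs = [
    ("Buses brought paid protesters to the rally",
     "Officials confirmed the buses carried a visiting sports team, not protesters."),
    ("Buses brought paid protesters to the rally",
     "Witnesses report that dozens of paid demonstrators arrived by bus before the event."),
    ("Buses brought paid protesters to the rally",
     "The piece surveys several claims about the protests without endorsing any of them."),
]
labels = ["disagree", "agree", "discuss"]

# Represent each pair as concatenated text. TF-IDF plus logistic regression
# is a deliberately simple stand-in for the richer features and models that
# competitive entries would use.
texts = [headline + " " + body for headline, body in pairs]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

new_headline = "Buses brought paid protesters to the rally"
new_body = "Reporters found no evidence that anyone on the buses was paid to protest."
print(model.predict([new_headline + " " + new_body])[0])
```

The point of the sketch is the interface, not the model: the system never judges whether the claim itself is true, only how a given article relates to it.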

This idea is considerably more modest than the original goal of the Fake News Challenge. For starters, stance detection algorithms have no ability to label any article as fake automatically. Instead, the idea is that this might be a useful tool for fact checkers. Given a dubious claim, a human fact checker might use the stance detection system to pull up a long list of articles that either support or dispute the claim. The fact checker can then follow up on these leads and expertly evaluate the quality of the evidence presented.

It is not clear just how much easier such a tool would make the lives of fact checkers. However, the challenge strikes a reasonable compromise between ambition and feasibility. Many moonshot research efforts go nowhere precisely because they are unwilling to patiently endure the incrementalism that progress typically requires. If nothing else:

  1. In running the Fake News Challenge, the organizers are building a large dataset that might be reusable for other fake-news-related challenges.
  2. They seem well-positioned to identify a core community of talented researchers committed to addressing societal challenges with machine learning.

I look forward to seeing how the challenge pans out and where they go next.


Is Fake News a Machine Learning Problem?

On Friday, Donald J. Trump was sworn in as the 45th president of the United States. The inauguration followed a bruising primary and general election, in which social media played an unprecedented role. In particular, the proliferation of fake news emerged as a dominant storyline. Throughout the campaign, explicitly false stories circulated through the internet’s echo chambers. Some fake stories originated as rumors, others were created for profit and monetized with click-based advertisements, and, according to US Director of National Intelligence James Clapper, many fake news stories were orchestrated by the Russian government with the intention of influencing the results. While it is not possible to observe the counterfactual, many believe that the election’s outcome hinged on the influence of these stories.

For context, consider one illustrative case as described by the New York Times. On November 9th, 35-year-old marketer Eric Tucker tweeted a picture of several buses, claiming that they were transporting paid protesters to demonstrate against Trump. The post quickly went viral, receiving over 16,000 shares on Twitter and 350,000 shares on Facebook. Trump and his surrogates joined in, promoting the story through social media. Tucker’s claim turned out to be a fabrication. Nevertheless, it likely reached millions of people, more than many conventional news stories.

A number of critics cast blame on technology companies like Facebook, Twitter, and Google, suggesting that they have a responsibility to address the fake news epidemic because their algorithms influence who sees which stories. Some linked the fake news phenomenon to the idea that personalized search results and news feeds create a filter bubble, a dynamic in which readers encounter only the stories they are likely to click on, comment on, or like. As a consequence, readers may see only stories that confirm their pre-existing beliefs.

Facebook, in particular, has been strongly criticized for its trending news widget, which operated (at the time) without human intervention, giving viral items a spotlight, however defamatory or false. In September, Facebook’s trending news box promoted a story titled “Michelle Obama was born a man”. Some have wondered why Facebook, despite its massive investment in artificial intelligence (machine learning), hasn’t developed an automated solution to the problem.
