Fake News Challenge – Revised and Revisited

The organizers of the Fake News Challenge have subjected it to a significant overhaul. In this light, many of my criticisms of the challenge no longer apply.

Some context:

Last month, I posted a critical piece addressing the Fake News Challenge. Organized by Dean Pomerleau and Delip Rao, the challenge aspires to leverage advances in machine learning to combat the epidemic viral spread of misinformation that plagues social media. The original version of the challenge asked teams to take a claim, such as “Hillary Clinton eats babies”, and output a prediction of its veracity together with supporting documentation (links culled from the internet). Presumably, their hope was that an on-the-fly artificially-intelligent fact checker could be integrated into social media services to stop people from unwittingly sharing fake news.

My response criticized the challenge as ill-specified (fake-ness not defined), circular (how do we know the supporting documents are legit?), and infeasible (are teams supposed to comb the entire web?).

Shortly after I posted these complaints, Dean reached out to me via the comment thread and directly by email. He acknowledged many of the problems and informed me that the organizers had already arrived at some of the same conclusions themselves. He then emailed me a mock-up for the new version of the challenge, which launched roughly a week later. While I plan to keep the old post live, it also seems appropriate to update the record on Approximately Correct.

What’s new:

In version 2.0, the challenge pivoted away from the unrealistic goal of solving fake news outright. Instead, the organizers proposed the task of stance detection. Here, the goal is to take a headline (from article 1) and body text (from article 2) and identify whether the body text agrees with, disagrees with, or discusses (without taking a position on) the claim made in the headline.

This idea is considerably more modest than the original goal of the Fake News Challenge. For starters, stance detection algorithms have no ability to label any article as fake automatically. Instead, their idea is that this might be a useful tool for fact checkers. Given a dubious claim, the human fact checker might use the stance detection system to pull up a long list of articles that either support or disagree with the claim. The fact checker can then follow up on these leads and expertly evaluate the quality of the evidence presented.
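To make the intended workflow concrete, here is a minimal sketch in Python of how a fact checker's tool might group articles by stance toward a claim. Everything in it is an assumption made for illustration: the names `Stance`, `toy_stance`, and `group_by_stance` are hypothetical, and the keyword matcher is only a stand-in for a trained stance detection model, not anything the organizers have proposed.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable, Dict, Iterable, List


class Stance(Enum):
    """The three stances described in the post (hypothetical encoding)."""
    AGREES = "agrees"
    DISAGREES = "disagrees"
    DISCUSSES = "discusses"


def toy_stance(headline: str, body: str) -> Stance:
    """Hypothetical stand-in for a learned model.

    It only looks for a few cue phrases in the body and ignores the
    headline entirely, which a real stance detector could not do.
    """
    text = body.lower()
    if any(cue in text for cue in ("false", "hoax", "debunked", "no evidence")):
        return Stance.DISAGREES
    if any(cue in text for cue in ("confirmed", "verified", "officials say")):
        return Stance.AGREES
    return Stance.DISCUSSES


def group_by_stance(
    headline: str,
    bodies: Iterable[str],
    classify: Callable[[str, str], Stance] = toy_stance,
) -> Dict[Stance, List[str]]:
    """Group article bodies by their predicted stance toward the headline."""
    groups: Dict[Stance, List[str]] = defaultdict(list)
    for body in bodies:
        groups[classify(headline, body)].append(body)
    return dict(groups)


if __name__ == "__main__":
    articles = [
        "The rumor was debunked by reporters.",
        "The report was confirmed on Tuesday.",
        "People are talking about the story.",
    ]
    for stance, matched in group_by_stance("Some dubious claim", articles).items():
        print(stance.value, matched)
```

The human fact checker would still read the articles in each group and judge the evidence; the classifier only sorts the leads.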

It is not clear just how much easier such a tool would make the lives of fact checkers. However, the challenge strikes a reasonable compromise between ambition and feasibility. Many moonshot research efforts go nowhere precisely because they are unwilling to patiently endure the incrementalism that progress typically requires. If nothing else:

  1. In running the Fake News Challenge, the organizers are building a large dataset that might be re-usable for other fake news related challenges.
  2. They seem well-positioned to identify a core community of talented researchers committed to addressing societal challenges with machine learning.

I look forward to seeing how the challenge pans out and where they go next.

 

The Deception of Supervised Learning – V2

[This article is a revised version reposted with permission from KDnuggets]

Imagine you’re a doctor tasked with choosing a cancer therapy. Or a Netflix exec tasked with recommending movies. You have a choice. You could think hard about the problem and come up with some rules. But these rules would be overly simplistic, not personalized to the patient or customer. Alternatively, you could let the data decide what to do!

The ability to programmatically make intelligent decisions by learning complex decision rules from big data is a driving selling point of machine learning. Leaps forward in the predictive accuracy of supervised learning techniques, especially deep learning, now yield classifiers that outperform human predictive accuracy on many tasks. We can guess how an individual will rate a movie, classify images, or recognize speech with jaw-dropping accuracy. So why not make our services smart by letting the data tell us what to do?
