Is Fake News a Machine Learning Problem?

On Friday, Donald J. Trump was sworn in as the 45th president of the United States. The inauguration followed a bruising primary and general election, in which social media played an unprecedented role. In particular, the proliferation of fake news emerged as a dominant storyline. Throughout the campaign, explicitly false stories circulated through the internet’s echo chambers. Some fake stories originated as rumors, others were created for profit and monetized with click-based advertisements, and according to US Director of National Intelligence James Clapper, many fake news stories were orchestrated by the Russian government with the intention of influencing the results. While it is not possible to observe the counterfactual, many believe that the election’s outcome hinged on the influence of these stories.

For context, consider one illustrative case as described by the New York Times. On November 9th, 35-year-old marketer Eric Tucker tweeted a picture of several buses, claiming that they were transporting paid protesters to demonstrate against Trump. The post quickly went viral, receiving over 16,000 shares on Twitter and 350,000 shares on Facebook. Trump and his surrogates joined in, promoting the story through social media. Tucker’s claim turned out to be a fabrication. Nevertheless, it likely reached millions of people, more than many conventional news stories.

A number of critics cast blame on technology companies like Facebook, Twitter, and Google, suggesting that they have a responsibility to address the fake news epidemic because their algorithms influence who sees which stories. Some linked the fake news phenomenon to the idea that personalized search results and news feeds create a filter bubble, a dynamic in which readers only encounter stories that they are likely to click on, comment on, or like. As a consequence, readers might only encounter stories that confirm pre-existing beliefs.

Facebook, in particular, has been strongly criticized for its trending news widget, which operated (at the time) without human intervention, giving viral items a spotlight, however defamatory or false. In September, Facebook’s trending news box promoted a story titled ‘Michele Obama was born a man’. Some have wondered why Facebook, despite its massive investment in artificial intelligence (machine learning), hasn’t developed an automated solution to the problem.
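To make concrete what such an "automated solution" might look like at its crudest, here is a toy bag-of-words headline classifier. This is a minimal sketch for illustration only, not Facebook's method or anything resembling a deployable detector; all headlines, labels, and function names below are invented, and real systems would need far richer features, vastly more data, and careful evaluation.

```python
# A toy bag-of-words headline classifier: a crude Naive Bayes-style model.
# All training headlines and labels here are invented for illustration.
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real systems use far richer features.
    return text.lower().split()

def train(examples):
    # examples: list of (headline, label) pairs; label is "fake" or "real".
    counts = {"fake": Counter(), "real": Counter()}
    for headline, label in examples:
        counts[label].update(tokenize(headline))
    return counts

def classify(counts, headline):
    # Score each label by a product of smoothed per-word frequencies,
    # and return the label with the higher score.
    scores = {}
    for label, words in counts.items():
        total = sum(words.values()) or 1
        score = 1.0
        for token in tokenize(headline):
            score *= (words[token] + 1) / (total + len(words) + 1)
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training data, purely illustrative.
model = train([
    ("shocking secret they don't want you to know", "fake"),
    ("you won't believe what happened next", "fake"),
    ("senate passes budget resolution", "real"),
    ("city council approves transit plan", "real"),
])
print(classify(model, "shocking plan they don't want you to know"))  # prints "fake"
```

Even this sketch hints at why the problem is hard: the model learns stylistic cues ("shocking", "you won't believe") rather than factuality, so a soberly worded fabrication would sail through.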

The Failure of Simple Narratives

Approximately Correct is not a political blog in any traditional sense. The mission is not to prognosticate elections, like FiveThirtyEight, nor to revel in the political circus, like Politico. And common-variety political writing seems antithetical to our goals. Today, political arguments tend to follow an anti-scientific pattern of choosing a perspective first and then selectively reaching for supporting evidence. It’s everything we should hope to avoid.

But, per our mission statement, this blog aims to address the intersection of scientific and technical developments with social issues. And social issues (the economy, the environment, healthcare, news curation, and the like) are necessarily political. Moreover, scientific practice requires dispassionate discourse and the ability to change one’s beliefs given new information. In this light, the abstention of scientists from political discourse seems irresponsible.

[An aside: Not all political issues are scientific or technical. The relative value of free speech vs the danger of hate speech may be an intrinsically subjective judgment. But many issues, such as global warming, explicitly exhibit scientific dimensions.]

Technical developments can necessitate policy shifts. Absent the capacity to warm the planet or the ability to detect such warming, one couldn’t justify strong reforms to energy policy. Additionally, absent scientific understanding of a policy’s likely effects, one cannot argue effectively for or against it. So sober scientific analysis has a role to play not just in evaluating policies, but also in evaluating individual arguments.

Machine learning and data science interact with politics in a third important way. The political landscapes of entire nations are immense. Take last night’s presidential election, for example. Roughly 120 million people voted in 3,007 counties, 435 congressional districts, and 50 states. Hardly any citizens have visited every state. Not even the candidates could possibly visit every county. Thus, our sense of the nation’s pulse and our narratives about the election’s driving forces are ultimately shaped by a mixture of second-hand accounts and data science (such as extensive polling).

Simplistic Narratives

Simplistic narratives and data science play off each other. Narratives influence the questions that pollsters ask. And each poll result invites simplistic analysis. In the remainder of this post, without expressing my personal opinions, I’d like to give a dispassionate analysis of several popular stories that have risen to prominence during this election, sampled from across both the Democratic/Republican and establishment/anti-establishment divides. I choose these narratives neither because they are completely true nor completely false. Each presents a seemingly simple thesis that belies more complex realities. To be as even-handed as possible, I’ve chosen one each from the Clinton-leaning and Trump-leaning narratives.