Do I really have to cite an arXiv paper?

With peak submission season for machine learning conferences just behind us, many in our community have peer-review on the mind. One especially hot topic is the arXiv preprint service. Computer scientists often post papers to arXiv in advance of formal publication to share their ideas and hasten their impact.

Despite the arXiv’s popularity, many authors are peeved, pricked, piqued, and provoked by requests from reviewers that they cite papers that exist only as arXiv preprints.

“Do I really have to cite arXiv papers?” they whine.

“Come on, they’re not even published!” they exclaim.

The conversation is especially testy owing to the increased use (read: misuse) of the arXiv by naifs. The preprint server, like the conferences proper, is awash in low-quality papers submitted by band-wagoners. Now that the tooling for deep learning has become so strong, it’s especially easy to clone a repo, run it on a new dataset, molest a few hyper-parameters, and start writing up a draft.

Of particular worry is the practice of flag-planting. That’s when researchers anticipate that an area will get hot. To avoid getting scooped / to be the first scoopers, authors might hastily throw an unfinished work on the arXiv to stake their territory: we were the first to work on X. All that follow must cite us. In a sublimely cantankerous rant on Medium, NLP/ML researcher Yoav Goldberg blasted the rising use of the (mal)practice. Continue reading “Do I really have to cite an arXiv paper?”

The Futurist’s Dilemma

The following passage is a musing on the futility of futurism. While I present a perspective, I am not married to it.

When I sat down to write this post, I briefly forgot how to spell “dilemma”. Fortunately, Apple’s spell-check magnanimously corrected me. But it seems likely, if I were cast away on an island without any automatic spell-checkers or other people to subject my brain to the cold slap of reality, that my spelling would slowly deteriorate.

And just yesterday, I had a strong intuition about trajectories through weight-space taken by neural networks along an optimization path. For at least ten minutes, I was reasonably confident that a simple trick might substantially lower the number of updates (and thus the time) it takes to train a neural network.

But for the ability to test my idea against an unforgiving reality, I might have become convinced of its truth. I might have written a paper, entitled “NO Need to worry about long training times in neural networks” (see real-life inspiration for farcical clickbait title). Perhaps I might have founded SGD-Trick University, and schooled the next generation of big thinkers on how to optimize neural networks.

Continue reading “The Futurist’s Dilemma”

Notes on Response to “The AI Misinformation Epidemic”

On Monday, I posted an article titled The AI Misinformation Epidemic. The article introduces a series of posts that will critically examine the various sources of misinformation underlying this AI hype cycle.

The post came about for the following reason: While I had contemplated the idea for weeks, I couldn’t choose which among the many factors to focus on and which to exclude. My solution was to break down the issue into several narrower posts. The AI Misinformation Epidemic introduced the problem, sketched an outline for the series, and articulated some preliminary philosophical arguments.

To my surprise, it stirred up a frothy reaction. In a span of three days, the site received over 36,000 readers. To date, the article has received 68 comments on the original post, 274 comments on Hacker News, and 140 comments on the machine learning subreddit.

To ensure that my post contributes as little novel misinformation as possible, I’d like to briefly address the response to the article and some common misconceptions shared by many comments. Continue reading “Notes on Response to “The AI Misinformation Epidemic””

Mission Statement

The aspiration for this blog is to offer a critical perspective on machine learning. We intend to cover both technical issues and the fuzzier problems that emerge when machine learning intersects with society.

For explaining the technical details of machine learning, we enter a lively field. As recent breakthroughs in machine learning have attracted mainstream interest, many blogs have stepped up to provide high quality tutorial content. But at present, critical discussions on the broader effects of machine learning lag behind technical progress.

On one hand, this seems natural. First a technology must exist before it can have an effect. Consider the use of machine learning for face recognition. For many decades, the field has accumulated extensive empirical knowledge. But until recently, with the technology reduced to practice, any consideration of how it might be used could only be speculative.

But the precarious state of the critical discussion owes to more than chronology. It also owes to culture, and to the rarity of the relevant interdisciplinary expertise. The machine learning community traditionally investigates scientific questions. Papers address well-defined theoretical problems, or empirically compare methods with well-defined objectives. Unfortunately, many pressing issues at the intersection of machine learning and society do not admit such crisp formulations. But, with notable exceptions, consideration of social issues within the machine learning community remains too rare.

Conversely, those academics and journalists best equipped to consider economic and social issues rarely possess the requisite understanding of machine learning to anticipate the plausible ways the two arenas might intersect. As a result, coverage in the mainstream consistently misrepresents the state of research, misses many important problems, and hallucinates others. Too many articles address Terminator scenarios, overstate the near-term plausibility of human-like machine consciousness, or assume the existence (at present) of self-motivated machines with their own desiderata. Too few consider the precise ways that machine learning may amplify biases or perturb the job market.

In short, we see this troubling scenario:

  1. Machine learning models increasingly find industrial use, assisting in credit decisions, recognizing faces in police databases, curating the news on social networks, and enabling self-driving cars. 
  2. The majority of knowledgeable people in the machine learning community, with notable exceptions, are not in the habit of considering the relevant economic, social, and other philosophical issues.
  3. Those in the habit of considering the relevant issues rarely possess the relevant machine learning expertise.

Complicating matters, mainstream discussion of AI-related technologies introduces speculative or spurious ideas alongside sober ones without communicating uncertainty clearly. For example, the likelihood of a machine learning classifier making a mistake on a new example, the likelihood of machine learning causing massive unemployment, and the likelihood of the entire universe being a simulation run by agents in some meta-universe are all discussed as though they can be assigned some common form of probability.

Compounding the lack of rigor, we also observe a hype cycle in the media. PR machines actively promote sensational views of the work currently done in machine learning, even as the researchers doing that work view the hype with suspicion. The press also has an incentive to run with the hype: sensational news sells papers. The New York Times has referenced “Terminator” and “artificial intelligence” together in some 5,000 stories. It has referenced “Terminator” and “machine learning” together in roughly 750 stories.

In this blog, we plan to bridge the gap between technical and critical discussions, treating both methodology and consequences as first-class concerns. 

One aspiration of this blog will be to communicate honestly about certainty. We hope to maintain a sober, academic voice, even when writing informally or about issues that aren’t strictly technical. While many posts will express opinions, we aspire to clearly indicate which statements are theoretical facts, which may be speculative but reflect a consensus of experts, and which are wild thought experiments. We also plan to discuss immediate issues, such as employment, alongside more speculative consideration of what future technology we might anticipate. In all cases we hope to clearly indicate scope. In service of this goal, and in reference to the theory of learning, we adopt the name Approximately Correct. We hope to be as correct as possible as often as possible, and to honestly convey our confidence.