Hope Returns to the Machine Learning Universe

If you’re not living under a rock, then you’ve surely encountered the Heroes of Deep Learning, an inspiring, diverse band of Deep Learning all-stars whose sheer grit, determination, and—[dare we say?]—genius, catalyzed the earth-shaking revolution that has brought to market such technological marvels as DeepFakes, GPT-7, and Gary Marcus.

But these are no ordinary times. And as the world contends with a rampaging virus, incendiary wildfires, and smouldering social unrest, no ordinary heroes will suffice. However, you needn’t fear. Hope has returned to the Machine Learning Universe, and boy, oh boy, the timing couldn’t be better.

As confirmed to us by several independent witnesses, the sun, moon, and stars have been joined in the night sky by new, supernatural sights. After a months-long, meticulous investigation, including consultations with NASA, MI6, and Singularity University, we can confirm the presence, on Earth, of the Superheroes of Deep Learning!

Continue reading “Hope Returns to the Machine Learning Universe”

OpenAI Trains Language Model, Mass Hysteria Ensues

On Thursday, OpenAI announced that they had trained a language model. They used a large training dataset and showed that the resulting model was useful for downstream tasks where training data is scarce. They announced the new model with a puffy press release, complete with this animation (below) featuring dancing text. They demonstrated that their model could produce realistic-looking text and warned that they would be keeping the dataset, code, and model weights private. The world promptly lost its mind.

For reference, language models assign probabilities to sequences of words. Typically, they express this probability via the chain rule as the product of probabilities of each word, conditioned on that word’s antecedents: p(w_1,...,w_n) = p(w_1) \cdot p(w_2|w_1) \cdots p(w_n|w_1,...,w_{n-1}). Alternatively, one could train a language model backwards, predicting each previous word given its successors. After training a language model, one typically either 1) uses it to generate text by iteratively decoding from left to right, or 2) fine-tunes it to some downstream supervised learning task.
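To make the chain-rule factorization and left-to-right decoding concrete, here is a minimal sketch using a toy bigram model with made-up probabilities; the vocabulary, numbers, and function names are all hypothetical and bear no relation to OpenAI’s model:

```python
import math

# Toy bigram "language model": p(next word | previous word), with "<s>" as the
# start symbol. The probabilities are invented purely for illustration.
BIGRAMS = {
    "<s>":       {"the": 0.6, "a": 0.4},
    "the":       {"model": 0.5, "text": 0.5},
    "a":         {"model": 0.7, "text": 0.3},
    "model":     {"generates": 1.0},
    "text":      {"generates": 1.0},
    "generates": {"text": 0.6, "words": 0.4},
    "words":     {},
}

def sequence_log_prob(words):
    """Chain rule: log p(w_1,...,w_n) = sum_i log p(w_i | w_1,...,w_{i-1}).
    A bigram model truncates each word's history to the single previous word."""
    log_p, prev = 0.0, "<s>"
    for w in words:
        log_p += math.log(BIGRAMS[prev][w])
        prev = w
    return log_p

def greedy_decode(max_len=5):
    """Generate text left to right, greedily picking the most likely next word."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        candidates = BIGRAMS.get(prev)
        if not candidates:
            break
        prev = max(candidates, key=candidates.get)
        out.append(prev)
    return out

print(sequence_log_prob(["the", "model", "generates", "text"]))  # log-probability of the sequence
print(greedy_decode())  # e.g. ['the', 'model', 'generates', 'text', 'generates']
```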

Training large neural network language models and subsequently applying them to downstream tasks has become an all-consuming pursuit, one that accounts for a devouring share of the research in contemporary natural language processing.

Continue reading “OpenAI Trains Language Model, Mass Hysteria Ensues”

Is This a Paper Review?

With paper submissions rocketing and the pool of experienced researchers stagnant, machine learning conferences, backs to the wall, have made the inevitable choice to inflate the ranks of peer reviewers, in the hopes that a fortified pool might handle the onslaught.

[Infographic: NIPS submissions over time. The red bar plots fabricated data from the future.]

With nearly every professor and senior grad student already reviewing at capacity, conference organizers have gotten creative, finding reviewers in unlikely places. Reached for comment, ICLR’s program chairs declined to reveal their strategy for scouting out untapped reviewing talent, indicating that these trade secrets might be exploited by rivals NeurIPS and ICML. Fortunately, on condition of anonymity, several (less senior) ICLR officials agreed to discuss a few unusual sources they’ve tapped:

  1. All of /r/machinelearning
  2. Twitter users who follow @ylecun
  3. Holders of registered .ai & .ml domains
  4. Commenters from ML articles posted to Hacker News
  5. YouTube commenters on Siraj Raval deep learning rap videos
  6. Employees of entities registered as owners of .ai & .ml domains
  7. Everyone camped within 4° of Andrej Karpathy at Burning Man
  8. GitHub handles forking TensorFlow, PyTorch, or MXNet in the last 6 months
  9. A joint venture with Udacity to make reviewing for ICLR a course project for their Intro to Deep Learning class

Continue reading “Is This a Paper Review?”

Troubling Trends in Machine Learning Scholarship

By Zachary C. Lipton* & Jacob Steinhardt*
*equal authorship

Originally presented at ICML 2018: Machine Learning Debates [arXiv link]
Published in Communications of the ACM

1   Introduction

Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms. In a given paper, researchers might aspire to any subset of the following goals, among others: to theoretically characterize what is learnable, to obtain understanding through empirically rigorous experiments, or to build a working system that has high predictive accuracy. While determining which knowledge warrants inquiry may be subjective, once the topic is fixed, papers are most valuable to the community when they act in service of the reader, creating foundational knowledge and communicating as clearly as possible.

What sort of papers best serve their readers? We can enumerate desirable characteristics: these papers should (i) provide intuition to aid the reader’s understanding, but clearly distinguish it from stronger conclusions supported by evidence; (ii) describe empirical investigations that consider and rule out alternative hypotheses [62]; (iii) make clear the relationship between theoretical analysis and intuitive or empirical claims [64]; and (iv) use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts [56].

Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on the following four patterns that appear to us to be trending in ML scholarship:

  1. Failure to distinguish between explanation and speculation.
  2. Failure to identify the sources of empirical gains, e.g. emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning.
  3. Mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g. by confusing technical and non-technical concepts.
  4. Misuse of language, e.g. by choosing terms of art with colloquial connotations or by overloading established technical terms.

AI Researcher Joins Johnson & Johnson, to Make More than $19 Squillion

Three weeks ago, New York Times reporter Cade Metz sent shockwaves through society with a startling announcement that A.I. researchers were making more than $1 million, even at a nonprofit!

[Image: AI super-hero and newly minted squillionaire Zachary Chase Lipton feeds a wallaby bitcoins while vacationing on Elon Musk’s interplanetary animal preserve on the Martian plains.]

Within hours, I received multiple emails. Parents, friends, old classmates, my girlfriend: all sent emails. Did you see the article? Maybe they wanted me to know what riches a life in private industry had in store for me? Perhaps they were curious whether I was already bathing in Cristal, shopping for yachts, or planning to purchase an atoll among the Maldives? Perhaps the communist sympathizers in my social circles had renewed admiration for my abstention from such extreme opulence.

Continue reading “AI Researcher Joins Johnson & Johnson, to Make More than $19 Squillion”

Portfolio Approach to AI Safety Research

[This article originally appeared on the Deep Safety blog.]

Long-term AI safety is an inherently speculative research area, aiming to ensure the safety of advanced future systems despite uncertainty about their design, algorithms, or objectives. It thus seems particularly important to have different research teams tackle the problems from different perspectives and under different assumptions. While some fraction of the research might not end up being useful, a portfolio approach makes it more likely that at least some of us will be right.

In this post, I look at some dimensions along which assumptions differ, and identify some underexplored reasonable assumptions that might be relevant for prioritizing safety research. In the interest of making this breakdown as comprehensive and useful as possible, please let me know if I got something wrong or missed anything important.

Continue reading “Portfolio Approach to AI Safety Research”

Death Note: Finally, an Anime about Deep Learning

It’s about time someone developed an anime series about deep learning. In the last several years, I’ve paid close attention to deep learning. And while I’m far from an expert on anime, I’ve watched a nonzero number of anime cartoons. And yet through neither route did I encounter even one single anime about deep learning.

There were some close calls. Ghost in the Shell gives a vague pretense of addressing AI. But the character might as well be a body-jumping alien. Nothing in this story speaks to the reality of machine learning research.

In Knights of Sidonia, if you can muster the superhuman endurance required to follow the series past its only interesting season, you’ll eventually find out that the flying space-ship made out of remnants of Earth on which Tanikaze and friends photosynthesize, while taking breaks from fighting space monsters, while wearing space-faring versions of mecha suits … [breath] contains an artificially intelligent brain-emulating parasitic nematode. But no serious consideration of ML appears.

If you were looking to anime for a critical discourse on artificial intelligence, until recently you’d be disappointed.

Continue reading “Death Note: Finally, an Anime about Deep Learning”

Machine Learning Security at ICLR 2017

(This article originally appeared here. Thanks to Janos Kramar for his feedback on this post.)

The overall theme of the ICLR conference setting this year could be summarized as “finger food and ships”. More importantly, there were a lot of interesting papers, especially on machine learning security, which will be the focus of this post. (Here is a great overview of the topic.)

On the attack side, adversarial perturbations now work in physical form (if you print out the image and then take a picture) and they can also interfere with image segmentation. This has some disturbing implications for fooling vision systems in self-driving cars, such as impeding them from recognizing pedestrians. Adversarial examples are also effective at sabotaging neural network policies in reinforcement learning at test time.
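For readers curious how such perturbations are computed, here is a minimal sketch of the fast gradient sign method, one standard way of crafting adversarial examples (not necessarily the attack used in the papers above); the model, image, and epsilon below are stand-ins rather than anything from those papers:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Fast gradient sign method: nudge every pixel by +/- epsilon in the
    direction that increases the classifier's loss on the true label."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Stand-in classifier and random "image" for illustration only; in practice
# the attack targets a trained network and a real input.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
adv = fgsm_perturb(model, image, label)
print((adv - image).abs().max())  # perturbation magnitude is capped at epsilon
```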

Continue reading “Machine Learning Security at ICLR 2017”

DeepMind Solves AGI, Summons Demon

In recent years, the rapid advance of artificial intelligence has evoked cries of alarm from billionaire entrepreneur Elon Musk and legendary physicist Stephen Hawking. Others, including the eccentric futurist Ray Kurzweil, have embraced the coming of true machine intelligence, suggesting that we might merge with the computers, gaining superintelligence and immortality in the process. As it turns out, we may not have to wait much longer.

This morning, a group of research scientists at Google DeepMind announced that they had inadvertently solved the riddle of artificial general intelligence (AGI). Their approach relies upon a beguilingly simple technique called symmetrically toroidal asynchronous bisecting convolutions. By the year’s end, Alphabet executives expect that these neural networks will exhibit fully autonomous self-improvement. What comes next may affect us all.

Continue reading “DeepMind Solves AGI, Summons Demon”