AI safety going mainstream at NIPS 2017

[This article originally appeared on the Deep Safety blog.]


This year’s NIPS gave me a general sense that near-term AI safety is now mainstream and long-term safety is slowly going mainstream. On the near-term side, I particularly enjoyed Kate Crawford’s keynote on neglected problems in AI fairness, the ML security workshops, and the Interpretable ML symposium debate that addressed the “do we even need interpretability?” question in a somewhat sloppy but entertaining way. There was a lot of great content on the long-term side, including several oral / spotlight presentations and the Aligned AI workshop.

Value alignment papers

Inverse Reward Design (Hadfield-Menell et al) defines the problem of an RL agent inferring a human’s true reward function based on the proxy reward function designed by the human. This is different from inverse reinforcement learning, where the agent infers the reward function from human behavior. The paper proposes a method for IRD that models uncertainty about the true reward, assuming that the human chose a proxy reward that leads to the correct behavior in the training environment. For example, if a test environment unexpectedly includes lava, the agent assumes that a lava-avoiding reward function is as likely as a lava-indifferent or lava-seeking reward function, since they lead to the same behavior in the training environment. The agent then follows a risk-averse policy with respect to its uncertainty about the reward function.
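To make the risk-averse step concrete, here is a minimal sketch of the lava example. It is not the paper's algorithm: as a simplification, I assume a linear reward over three hand-picked features, a uniform belief over three hypothetical candidate rewards that all agree with the proxy on the training features, and a plain worst-case rule for risk aversion. All names and numbers below are made up for illustration.

```python
# Feature counts are ordered as (dirt, grass, lava).
# Lava never appeared in training, so the proxy reward says nothing about it.
# Hypothetical candidates for the true reward: all agree with the proxy on
# dirt and grass and differ only in the weight placed on lava.
CANDIDATE_REWARDS = {
    "lava-avoiding": (1.0, -1.0, -10.0),
    "lava-indifferent": (1.0, -1.0, 0.0),
    "lava-seeking": (1.0, -1.0, 10.0),
}

# Two hypothetical test-time paths, described by how often they visit each feature.
PATHS = {
    "short_through_lava": (2.0, 0.0, 2.0),
    "long_around_lava": (3.0, 2.0, 0.0),
}


def path_return(weights, counts):
    """Linear reward: weighted sum of feature counts along a path."""
    return sum(w * c for w, c in zip(weights, counts))


def worst_case_returns(paths, candidates):
    """Lowest return of each path across all candidate reward functions."""
    return {
        name: min(path_return(w, counts) for w in candidates.values())
        for name, counts in paths.items()
    }


def mean_returns(paths, candidates):
    """Average return of each path under a uniform belief over candidates."""
    return {
        name: sum(path_return(w, counts) for w in candidates.values()) / len(candidates)
        for name, counts in paths.items()
    }


def pick(scores):
    """Name of the path with the highest score."""
    return max(scores, key=scores.get)


risk_averse = pick(worst_case_returns(PATHS, CANDIDATE_REWARDS))
risk_neutral = pick(mean_returns(PATHS, CANDIDATE_REWARDS))
print("risk-averse choice:", risk_averse)
print("risk-neutral choice:", risk_neutral)
```

With these toy numbers, an agent that averages over the candidates takes the shortcut through the lava, while the worst-case agent goes around it, since the lava path is the only one whose return depends on the unknown lava weight.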


The paper shows some encouraging results on toy environments for avoiding some types of side effects and reward hacking behavior, though it’s unclear how well they will generalize to more complex settings. For example, the approach to reward hacking relies on noticing disagreements between different sensors / features that agreed in the training environment, which might be much harder to pick up on in a complex environment. The method is also at risk of being overly risk-averse and avoiding anything new, whether it be lava or gold, so it would be great to see some approaches for safe exploration in this setting.

Repeated Inverse RL (Amin et al) defines the problem of inferring intrinsic human preferences that incorporate safety criteria and are invariant across many tasks. The reward function for each task is a combination of the task-invariant intrinsic reward (unobserved by the agent) and a task-specific reward (observed by the agent). This multi-task setup helps address the identifiability problem in IRL, where different reward functions could produce the same behavior.
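Here is a small sketch of why several tasks help with identifiability. It is only an illustration under strong assumptions of my own, not the algorithm from the paper: each task is a one-step choice among three actions, the human always picks the action that maximizes intrinsic plus task reward, and the agent considers just three hypothetical candidates for the intrinsic reward.

```python
# Hypothetical candidates for the unobserved intrinsic reward over three actions.
CANDIDATE_INTRINSIC = {
    "safe": (0.0, 0.0, -2.0),     # penalizes action 2
    "neutral": (0.0, 0.0, 0.0),
    "risky": (0.0, 0.0, 2.0),     # rewards action 2
}
TRUE_INTRINSIC = CANDIDATE_INTRINSIC["safe"]

# Task-specific rewards, which the agent does observe.
TASKS = [
    (1.0, 0.0, 0.0),
    (0.0, 0.5, 1.0),
]


def best_action(intrinsic, task_reward):
    """Action maximizing the combined reward (intrinsic + task-specific)."""
    totals = [i + t for i, t in zip(intrinsic, task_reward)]
    return totals.index(max(totals))


def consistent_candidates(candidates, observations):
    """Candidates whose optimal action matches the human's in every observed task."""
    return sorted(
        name
        for name, intrinsic in candidates.items()
        if all(best_action(intrinsic, task) == action for task, action in observations)
    )


# The human demonstrates the optimal action in each task.
observations = [(task, best_action(TRUE_INTRINSIC, task)) for task in TASKS]

after_one = consistent_candidates(CANDIDATE_INTRINSIC, observations[:1])
after_two = consistent_candidates(CANDIDATE_INTRINSIC, observations)
print(after_one)
print(after_two)
```

In this toy setup the first task cannot tell the "safe" and "neutral" candidates apart, because both lead to the same behavior; the second task separates them.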


The authors propose an algorithm for inferring the intrinsic reward while minimizing the number of mistakes made by the agent. They prove an upper bound on the number of mistakes for the “active learning” case where the agent gets to choose the tasks, and show that a certain number of mistakes is inevitable when the agent cannot choose the tasks (there is no upper bound in that case). Thus, letting the agent choose the tasks that it’s trained on seems like a good idea, though it might also result in a selection of tasks that is less interpretable to humans.

Deep RL from Human Preferences (Christiano et al) uses human feedback to teach deep RL agents about complex objectives that humans can evaluate but might not be able to demonstrate (e.g. a backflip). The human is shown two trajectory snippets of the agent’s behavior and selects which one more closely matches the objective. This method makes very efficient use of limited human feedback, scaling much better than previous methods and enabling the agent to learn much more complex objectives (as shown in MuJoCo and Atari).
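One way to turn such pairwise comparisons into a training signal for a learned reward model is a Bradley-Terry style formulation: the probability that the human prefers a snippet grows with the sum of predicted rewards along it. The sketch below follows my understanding of that general idea and is not a reproduction of the authors' implementation; the function names and the plain lists of per-step rewards are my own placeholders.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Modeled probability that the human prefers snippet A over snippet B.

    Each argument is a list of predicted per-step rewards for one snippet.
    Equivalent to exp(sum_a) / (exp(sum_a) + exp(sum_b)).
    """
    diff = sum(rewards_b) - sum(rewards_a)
    return 1.0 / (1.0 + math.exp(diff))


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the modeled preference and the human's choice."""
    p_a = preference_probability(rewards_a, rewards_b)
    p_chosen = p_a if human_prefers_a else 1.0 - p_a
    return -math.log(p_chosen)


# Hypothetical predicted rewards for two short snippets.
snippet_a = [1.0, 2.0]
snippet_b = [0.0, 0.5]

print(preference_probability(snippet_a, snippet_b))
print(preference_loss(snippet_a, snippet_b, human_prefers_a=True))
print(preference_loss(snippet_a, snippet_b, human_prefers_a=False))
```

The loss is small when the reward model already ranks the snippets the way the human did, and large when it disagrees, so minimizing it over many comparisons pushes the predicted rewards toward the human's judgments.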


Dynamic Safe Interruptibility for Decentralized Multi-Agent RL (El Mhamdi et al) generalizes the safe interruptibility problem to the multi-agent setting. Non-interruptible dynamics can arise in a group of agents even if each agent individually is indifferent to interruptions. This can happen if Agent B is affected by interruptions of Agent A and is thus incentivized to prevent A from being interrupted (e.g. if the agents are self-driving cars and A is in front of B on the road). The multi-agent definition focuses on preserving the system dynamics in the presence of interruptions, rather than on converging to an optimal policy, which is difficult to guarantee in a multi-agent setting.

Aligned AI workshop

This was a more long-term-focused version of the Reliable ML in the Wild workshop held in previous years. There were many great talks and posters there – my favorite talks were Ian Goodfellow’s “Adversarial Robustness for Aligned AI” and Gillian Hadfield’s “Incomplete Contracting and AI Alignment”.

Ian made the case for ML security being important for long-term AI safety. The effectiveness of adversarial examples is problematic not only from the near-term perspective of current ML systems (such as self-driving cars) being fooled by bad actors. It’s also bad news from the long-term perspective of aligning the values of an advanced agent, which could inadvertently seek out adversarial examples for its reward function due to Goodhart’s law. Relying on the agent’s uncertainty about the environment or human preferences is not sufficient to ensure safety, since adversarial examples can cause the agent to have arbitrarily high confidence in the wrong answer.


Gillian approached AI safety from an economics perspective, drawing parallels between specifying objectives for artificial agents and designing contracts for humans. The same issues that make contracts incomplete (the designer’s inability to consider all relevant contingencies or precisely specify the variables involved, and incentives for the parties to game the system) lead to side effects and reward hacking for artificial agents.


The central question of the talk was how we can use insights from incomplete contracting theory to better understand and systematically solve specification problems in AI safety, which is a really interesting research direction. The objective specification problem seems even harder to me than the incomplete contract problem, since the contract design process relies on some level of shared common sense between the humans involved, which artificial agents do not currently possess.

Interpretability for AI safety

I gave a talk at the Interpretable ML symposium on connections between interpretability and long-term safety, which explored what forms of interpretability could help make progress on safety problems (slides, video). Understanding our systems better can help ensure that safe behavior generalizes to new situations, and it can help identify causes of unsafe behavior when it does occur.

For example, if we want to build an agent that’s indifferent to being switched off, it would be helpful to see whether the agent has representations that correspond to an off-switch, and whether they are used in its decisions. Side effects and safe exploration problems would benefit from identifying representations that correspond to irreversible states (like “broken” or “stuck”). While existing work on examining the representations of neural networks focuses on visualizations, safety-relevant concepts are often difficult to visualize.

Local interpretability techniques that explain specific predictions or decisions are also useful for safety. We could examine whether features that are idiosyncratic to the training environment or indicate proximity to dangerous states influence the agent’s decisions. If the agent can produce a natural language explanation of its actions, how does it explain problematic behavior like reward hacking or going out of its way to disable the off-switch?

There are many ways in which interpretability can be useful for safety. Somewhat less obvious is what safety can do for interpretability: serving as grounding for interpretability questions. As exemplified by the final debate of the symposium, there is an ongoing conversation in the ML community trying to pin down the fuzzy idea of interpretability – what is it, do we even need it, what kind of understanding is useful, etc. I think it’s important to keep in mind that our desire for interpretability is to some extent motivated by our systems being fallible – understanding our AI systems would be less important if they were 100% robust and made no mistakes. From the safety perspective, we can define interpretability as the kind of understanding that helps us ensure the safety of our systems.

For those interested in applying the interpretability hammer to the safety nail, or working on other long-term safety questions, FLI has recently announced a new grant program. Now is a great time for the AI field to think deeply about value alignment. As Pieter Abbeel said at the end of his keynote, “Once you build really good AI contraptions, how do you make sure they align their value system with our value system? Because at some point, they might be smarter than us, and it might be important that they actually care about what we care about.”

(Thanks to Janos Kramar for his feedback on this post, and to everyone at DeepMind who gave feedback on the interpretability talk.)

Embracing the Diffusion of AI Research in Yerevan, Armenia

In July of this year, NYU Professor of Psychology Gary Marcus argued in the New York Times that AI is stuck, failing to progress towards a more general, human-like intelligence. To liberate AI from its current stuckness, he proposed a big science initiative. Covetously referencing the thousands of bodies (employed at) and billions of dollars (lavished on) CERN, he wondered whether we ought to launch a concerted international AI mission.

Perhaps owing to my New York upbringing, I admire Gary’s contrarian instincts. With the press pouring forth a fine slurry of real and imagined progress in machine learning, celebrating any story about AI as a major breakthrough, it’s hard to overstate the value of a relentless critical voice reminding the community of our remaining shortcomings.

But despite the seductive flash of big science and Gary’s irresistible chutzpah, I don’t buy this particular recommendation. Billion-dollar price tags and frightening head counts are bugs, not features. Big science requires getting those thousands of heads to agree about what questions are worth asking. A useful heuristic that applies here:

The larger an organization, the simpler its elevator pitch needs to be.

Machine learning research doesn’t yet have an agreed-upon elevator pitch. And trying to coerce one prematurely seems like a waste of resources. Dissent and diversity of viewpoints are valuable. Big science mandates overbearing bureaucracy and some amount of groupthink, and sometimes that’s necessary. If, as in physics, an entire field already agrees about what experiments come next and these happen to be thousand-man jobs costing billions of dollars, then so be it.

But right now, in machine learning research, most recent breakthroughs come from pods of 1-4 researchers working with 1-4 NVIDIA GPUs (graphics cards used to speed up neural network computations) on a single computer. Even within the big labs, most papers come from the concerted efforts of small groups of researchers. We don’t need to collect the community in one place. Even now, when a glut of scientists exists in one place, it’s not clear there’s a significant benefit compared to when they’re dispersed. For example, throughout most of my career at universities and industry labs, my work has usually had deeper connections to projects scattered around the world than to the projects going on in adjacent offices.

Nearly all AI research projects require thousands of dollars of computing resources (not millions or billions). We don’t yet require squillion-dollar microscopes or particle-identification detectors. I suspect that even for AlphaGo, perhaps the most capital-intensive machine learning project in recent memory, the primary costs were employee salaries. Compared with other scientific and engineering disciplines, ML research has little dependence on buried trade secrets. Research papers and code are increasingly shared publicly, and the most interesting experiments in the field can often be reproduced in just hours. The scarcest resources are still salaries, good mentors, free time, and the elusive carte blanche to work on interesting problems.

Reflections from Yerevan

Two weeks ago, I arrived in Yerevan, Armenia to attend the Machine Learning for Discovery Sciences workshop co-sponsored by the National Science Foundation (NSF) and the Foundation for Armenian Science and Technology. The workshop brought invited speakers from around the world to Yerevan to give short talks on their research, participate in panel discussions, and engage in a day of roundtables to discuss recommendations for the development of sciences in Armenia and future collaborations with US researchers and institutions.

YerevaNN co-founder Hrant Khachatrian speaks on deep learning for medicine at the Machine Learning for Discovery Sciences workshop in Yerevan

Admittedly, I arrived fairly ignorant about the country of Armenia. Primarily, I knew the coarse outlines of the atrocities committed against ethnic Armenians during World War I. From time spent in Los Angeles, I had eaten some Armenian food and knew that the Golden State was home to one of the large diaspora communities. I had also been enamored of the music of Tigran Hamasyan, a singular pianist whose music pulls together djent/thrash metal, Armenian folk melodies and ornamentation, and modern jazz improvisation practice into an amalgam that sounds on paper like it should go horribly wrong, but miraculously never does.

But one month ago, I couldn’t have told you the population of Armenia (3M), its income per capita (3600 USD per person), its neighbors (Iran, Turkey, Georgia, Azerbaijan), or the history of its economy, which collapsed following the dissolution of the USSR owing to a strong dependence on the Soviet military-industrial complex.

Over the course of one week in Armenia’s capital, Yerevan, I participated in a crash course on machine learning by day and at all other times (besides those wee hours spent either sleeping or working on papers with Californian collaborators) participated in a crash course on Armenia itself, learning my way around Yerevan’s streets, food, music, and most importantly, getting to know its students – emerging AI researchers searching for and creating opportunities in the Armenian capital.

Armenian cheese, herbs, and flatbread, in the countryside following the NSF-FAST workshop.

I could write an entire post about the workshop proper. Workshop co-chairs Aram Galstyan and Naira Hovakimyan pulled together a terrific group of professors and researchers from theory (Arnak Dalalyan, Negar Kiyavash, Mesrob Ohannessian, Nathan Srebro), NLP (Jerry Hobbs), computational social science (Katy Pearce, Daniel Larremore), medicine (David Kale, Kristina Simonyan), graph mining (Tina Eliassi-Rad, Danai Koutra, Zoran Obradovic), and more. I could write an entire post about the talks, another about the food, and a third about the music. And someone more knowledgeable than me could write about the societal and political significance (the president attended the gala!) of the workshop at the level of governments and institutions. But I suspect that the future of AI research in Armenia has less to do with invited guests and more to do with the next generation of researchers.

A view from the university

Before I came to Armenia, a colleague at Amazon learned I was going and connected with the American University of Armenia (AUA) to arrange a 4-hour hands-on tutorial on our new Gluon interface for deep learning. On October 17th, the day before the FAST workshop began, I arrived at AUA to deliver the talk.

Giving hands-on tutorials is challenging. On any day, half the audience might consist of machine learning PhDs. You could also find yourself in a room full of first-timers looking for a gentle primer on deep learning. Giving the right tutorial on a given day means curating the content on the fly and finding the right pace.

While I started conservatively, shortly after beginning, it was clear that for at least half of the room I was going too slowly. Many attendees had extensive experience in at least one other deep learning framework and most were familiar with core machine learning concepts.

After the talk, Dean of Engineering Aram Hajian took me, friends David Kale and Daniel Moyer, and a handful of students who still had questions after 4 hours of tutorial out to eat Georgian dumplings (khinkali). I walked in a pack of especially inquisitive students who comprise the research staff of YerevaNN, an upstart non-profit research lab based in Yerevan.

When we got to dinner, I asked the students how many machine learning professors are around to guide doctoral research in Armenia. The answer: zero. Despite building Armenia’s ML community up from a somewhat blank slate, these students had forged collaborations across continents. In particular, YerevaNN researchers connected with USC’s Information Sciences Institute through FAST co-chair Aram Galstyan, collaborating on a number of projects, including some work with my frequent collaborator David Kale establishing public benchmarks for predicting diagnoses and outcomes given clinical medical time series data.

The next day, David, Daniel, and I visited YerevaNN’s office. Located across the street from Yerevan State University, they have several rooms, a few whiteboards, a kitchen, and a couple of servers each equipped with an NVIDIA GPU. Per their website, the lab currently consists of Hrant Khachatrian (recent PhD student, with a previous research program investigating graph theory), Hrayr Harutyunyan and Karen Hambardzumyan (master’s students), and Tigran Galstyan (undergrad).

Zachary Lipton, Tigran Galstyan, David Kale, and Karen Hambardzumyan at YerevaNN’s offices

YerevaNN is not (yet) quite OpenAI or Google DeepMind. Its team consists of young researchers who, while driven, are not yet household names in CS departments around the world. While DeepMind doled out 138 million USD in salary last year, I imagine several orders of magnitude separate that figure from the payroll of the striving researchers at YerevaNN. While the lab is starting to put together strong conference submissions, university professors are not going to be cowed by their bibliometrics in the immediate future.

But aside from a handful of big science projects (like AlphaGo, say) it’s hard to point to any given paper that’s coming out of a giant lab that couldn’t just as easily be done by these researchers in Yerevan with just a little more time, a few more GPUs, and perhaps a few Skype calls with some older farts to pick over their papers and provide critical feedback.

The information they need is free. The rent is cheap. At the undergrad and master’s degree levels, the talent is undeniably there. One minor challenge is finding mentorship for all the deserving students. Perhaps the more formidable challenge will be holding on to some of that talent once it is cultivated and tempted by lucrative opportunities abroad.

Seeing small research groups take root without extensive infrastructure and without massive flows of capital, I’m both confident and relieved that AI research has not transitioned into a phase of Big Science. The science itself benefits from the large, diverse community feeding it, and so do aspiring researchers in countries around the world that couldn’t possibly provide the infrastructure for, say, a world-class particle accelerator, or the Human Genome Project.

On my last day in Armenia, David Kale and I visited the TUMO Center. Launched by Lebanese Armenians Sam and Sylva Simonian, the center inhabits an architecturally marvelous home on the outskirts of Yerevan and houses thousands of students. The students follow bespoke software-driven curricula in TUMObiles (mobile iMac-equipped carts powered through the ceiling), attend workshops, and participate in classes with local teachers and visiting scholars in curated Learning Labs. Even with students aged 12-18, TUMO’s curators already seemed keen to familiarize them with machine learning.

Students follow bespoke curricula in their TUMObiles at the TUMO Center in Yerevan. Thousands of students participate in TUMO’s after-school curricula.

Over the next few years, Yerevan’s high-tech educational initiatives and undergraduate universities will churn out thousands of promising students. And as science grows more decentralized, and organizations like FAST and TUMO step up investment in science education, these students may have footing to compete in a global machine learning research ecosystem. Perhaps, with few signs of a thawing in the San Francisco, Seattle, New York, or London real estate markets, they may even have an advantage.