Machine Learning Security at ICLR 2017

(This article originally appeared here. Thanks to Janos Kramar for his feedback on this post.)

The overall theme of the ICLR conference setting this year could be summarized as “finger food and ships”. More importantly, there were a lot of interesting papers, especially on machine learning security, which will be the focus of this post. (Here is a great overview of the topic.)

[Image: finger food and ships at the conference venue]

On the attack side, adversarial perturbations now work in physical form (if you print out the image and then take a picture of it) and they can also interfere with image segmentation. This has some disturbing implications for fooling vision systems in self-driving cars, such as preventing them from recognizing pedestrians. Adversarial examples are also effective at sabotaging neural network policies in reinforcement learning at test time.

[Image: adversarial perturbations applied to a reinforcement learning policy]
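To make the attack idea concrete, here is a minimal sketch of the fast gradient sign method, one standard way to generate adversarial perturbations; the epsilon value and the model interface are illustrative assumptions rather than the exact setup from any of the papers above.

```python
# Minimal FGSM sketch (assumed setup): nudge each pixel in the direction that
# increases the classifier's loss, by a small amount epsilon.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` (a batched tensor)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction of the loss gradient's sign, then clip to the valid pixel range.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

The physical-world result above is surprising precisely because a perturbation like this can survive being printed and re-photographed through a camera pipeline.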

In more encouraging news, adversarial examples are not entirely transferable between different models. For targeted examples, which aim to be misclassified as a specific class, the target class is not preserved when transferring to a different model. For example, if an image of a school bus is classified as a crocodile by the original model, it has at most 4% probability of being seen as a crocodile by another model. The paper introduces an ensemble method for developing adversarial examples whose targets do transfer, but this seems to only work well if the ensemble includes a model with a similar architecture to the new model.
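As a rough illustration of the ensemble idea, one can optimize a targeted perturbation against the averaged loss of several source models and hope it transfers to an unseen model; the step count, learning rate, and perturbation bound below are assumptions for the sketch, not the paper's exact procedure.

```python
# Sketch of an ensemble-based targeted attack (assumed hyperparameters): the
# perturbation is optimized so that all source models classify the image as
# the chosen target class.
import torch
import torch.nn.functional as F

def ensemble_targeted_attack(models, image, target_class, steps=50, lr=0.01, bound=0.05):
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.full((image.shape[0],), target_class, dtype=torch.long)
    for _ in range(steps):
        optimizer.zero_grad()
        # Average the targeted loss over every source model in the ensemble.
        loss = sum(F.cross_entropy(m(image + delta), target) for m in models) / len(models)
        loss.backward()
        optimizer.step()
        # Keep the perturbation small so the image still looks unchanged.
        delta.data.clamp_(-bound, bound)
    return (image + delta).clamp(0.0, 1.0).detach()
```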

On the defense side, there were some new methods for detecting adversarial examples. One method augments neural nets with a detector subnetwork, which works quite well and generalizes to new adversaries (if they are similar to or weaker than the adversary used for training). Another approach analyzes adversarial images using PCA, and finds that they are similar to normal images in the first few thousand principal components, but have a lot more variance in later components. Note that the reverse is not the case – adding arbitrary variation in trailing components does not necessarily encourage misclassification.
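Here is a minimal sketch of that PCA observation, assuming flattened images and a hypothetical threshold: fit PCA on clean images and flag inputs that put unusually much energy into the trailing components.

```python
# Sketch of the PCA-based check (assumed interface): adversarial images tend to
# show more variance in the trailing principal components of clean data.
import numpy as np
from sklearn.decomposition import PCA

def trailing_component_energy(pca, image, leading=3000):
    """Sum of squared PCA coefficients beyond the first `leading` components."""
    coeffs = pca.transform(image.reshape(1, -1))[0]
    return float(np.sum(coeffs[leading:] ** 2))

# Usage sketch (names are placeholders):
#   pca = PCA().fit(clean_images)                # clean_images: (n_samples, n_pixels)
#   suspicious = trailing_component_energy(pca, x) > threshold   # threshold is hypothetical
```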

There has also been progress in scaling adversarial training to larger models and data sets; this work also found that higher-capacity models are more resistant to adversarial examples than lower-capacity ones. My overall impression is that adversarial attacks are still ahead of adversarial defenses, but the defense side is starting to catch up.
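For reference, a bare-bones sketch of what one adversarial training step can look like; the 50/50 loss weighting and epsilon are assumptions for illustration, not the exact recipe from the scaled-up paper.

```python
# Adversarial training sketch (assumed setup): each step trains on an equal mix
# of clean and FGSM-perturbed examples.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.01):
    # Craft FGSM perturbations against the current model parameters.
    images_adv = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(images_adv), labels).backward()
    images_adv = (images_adv + epsilon * images_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Train on a 50/50 mix of clean and adversarial examples.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(images), labels) \
         + 0.5 * F.cross_entropy(model(images_adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```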


Policy Field Notes: NIPS Update

By Jack Clark and Tim Hwang. 

Conversations about the social impact of AI are often very abstract, focusing on broad generalizations about technology rather than on the specific state of the research field. That makes it challenging to have a full conversation about what good public policy regarding AI would look like. In the interest of helping to bridge that gap, Jack Clark and I have been playing around with doing recaps that take a selection of papers from a recent conference and talk about the longer-term policy implications of the work. This one covers papers that appeared at NIPS 2016.

If it’s helpful to the community, we plan to roll out similar recaps throughout 2017, with the next one covering ICLR in April.

Continue reading “Policy Field Notes: NIPS Update”

AI Safety Highlights from NIPS 2016

[This article is cross-posted from my blog. Thanks to Jan Leike, Zachary Lipton, and Janos Kramar for providing feedback on this post.]

This year’s Neural Information Processing Systems conference was larger than ever, with almost 6000 people attending, hosted in a huge convention center in Barcelona, Spain. The conference started off with two exciting announcements on open-sourcing collections of environments for training and testing general AI capabilities – DeepMind Lab and OpenAI Universe. Among other things, this is promising for testing safety properties of ML algorithms. OpenAI has already used the Universe environment to give an entertaining and instructive demonstration of reward hacking that illustrates the challenge of designing robust reward functions.

I was happy to see a lot of AI-safety-related content at NIPS this year. The ML and the Law symposium and Interpretable ML for Complex Systems workshop focused on near-term AI safety issues, while the Reliable ML in the Wild workshop also covered long-term problems. Here are some papers relevant to long-term AI safety:

Continue reading “AI Safety Highlights from NIPS 2016”

Clopen AI: Openness in different aspects of AI development

[This article is cross-posted from my blog. Thanks to Jelena Luketina and Janos Kramar for their detailed feedback on this post.]

[Image: illustration of a clopen set]

There has been a lot of discussion about the appropriate level of openness in AI research in the past year – the OpenAI announcement, the blog post Should AI Be Open?, a response to the latter, and Nick Bostrom’s thorough paper Strategic Implications of Openness in AI Development.

There is disagreement on this question within the AI safety community as well as outside it. Many people are justifiably afraid of concentrating the power to create AGI and determine its values in the hands of a single company or organization. Many others are concerned about the information hazards of open-sourcing AGI and the resulting potential for misuse. In this post, I argue that some sort of compromise between openness and secrecy will be necessary, as both extremes of complete secrecy and complete openness seem really bad. The good news is that there isn’t a single axis of openness vs secrecy – we can make separate judgment calls for different aspects of AGI development, and develop a set of guidelines.

Continue reading “Clopen AI: Openness in different aspects of AI development”