10 Papers from ICML and CVPR


The International Conference on Machine Learning (ICML) and Computer Vision and Pattern Recognition (CVPR) 2016 occurred back-to-back this year. In this post, I’ll cover some of the most exciting advances in research as experienced from an on-the-ground perspective. Unsurprisingly, ICML focused more on fundamental research in an intimate setting while CVPR focused on applications research. Both included copious amounts of deep learning applied to many different areas.

ICML

Although ICML was set in Times Square, it was easy to tune out the constant flood of tourists with so much great research front and center.

In general, ICML felt intimate (despite significant growth) and top researchers were accessible for informal ad-hoc conversations and brainstorming. It’s probably an inflection point before this conference gets too crowded.

Top papers and presentations

Kaiming He’s Deep Residual Networks tutorial revealed tips and tricks for training ultra-deep residual networks. He first presented the work at ICCV 2015, where he took his first victory lap for winning the 2015 ImageNet competition, and he kept the tutorial interesting with a few neat touches, for instance cutting to a video that visualized variance layer by layer. CVPR would later show that ResNets are incredibly important and that experts must put them into practice in order to obtain leading results. Kaiming He’s ResNet Tutorial

As always, David Silver gave one of the best presentations on deep reinforcement learning. It was just the start of a week of great presentations from Google DeepMind. He closed the sessions with details from their AlphaGo work. David Silver’s DRL Tutorial

For instance, the Dueling DQN paper for reinforcement learning was uploaded to arXiv a while back but is still relevant. Saliency maps for the value and advantage streams show the difference between local and global (equivalently, near and far) optimization. One weakness of the approach is games that require leaps in strategy exploration. http://arxiv.org/abs/1511.06581
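To make the two-stream idea concrete, here is a minimal sketch of how a dueling head can combine a state value and per-action advantages into Q-values, using the mean-subtracted aggregation described in the linked paper. The function name and the toy numbers are my own illustration, not code from the authors, and the network layers that would produce the two streams are left out.

```python
def dueling_q_values(value, advantages):
    """Combine a scalar state value V(s) and advantages A(s, a) into Q(s, a).

    Subtracting the mean advantage keeps the value and advantage streams
    identifiable: adding a constant to every advantage leaves Q unchanged.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


# Hypothetical outputs of the two streams for one state with three actions.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
shifted = dueling_q_values(2.0, [11.0, 10.0, 9.0])  # same advantages plus 10
```

Both calls yield the same Q-values, which is the point of the normalization: only the relative advantages matter, while the value stream carries the overall level.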

Léon Bottou’s presentation on accelerating DNN training used curvature information about the optimization surface to speed up optimization, descending more quickly where the landscape allows. He showed numerical results confirming his theorems that natural gradient methods allow scaling proportional to weights instead of neurons. “Three observations on diagonal second order methods applied to deep networks” http://arxiv.org/abs/1606.04838 Related work: http://arxiv.org/abs/1301.3584

G-convs were presented as a drop-in replacement for standard convs, with consistent improvements in accuracy and ties to Lie algebra. Open source: https://github.com/tscohen/GrouPy https://github.com/tscohen/gconv_experiments https://staff.fnwi.uva.nl/m.welling/wp-content/uploads/papers/icml2016_GCNN.pdf

Stacked what-where autoencoders reduce sampling during training and were tested on an open dataset. On GitHub here: https://github.com/YutingZhang/caffe-recon-dec https://arxiv.org/abs/1506.02351

Concatenated ReLUs were presented as an easily implemented, low-cost layer improvement. http://jmlr.org/proceedings/papers/v48/shang16.pdf
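To show why this counts as easy to implement, here is a minimal sketch of the activation on a plain list of floats: apply ReLU to the input and to its negation, then concatenate the two halves. The function name `crelu` and the sample input are mine; in a real network this would run along the channel dimension of a tensor.

```python
def crelu(xs):
    """Concatenated ReLU: [relu(x), relu(-x)] for a list of activations.

    The output is twice as long as the input, so both the positive and
    the negative part of each activation survive the nonlinearity.
    """
    positive_part = [max(x, 0.0) for x in xs]
    negative_part = [max(-x, 0.0) for x in xs]
    return positive_part + negative_part


activations = crelu([1.5, -2.0, 0.5])
```

Because the output doubles in width, the next layer sees twice as many input features, which is worth keeping in mind when sizing the layers that follow.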

Sashank Reddi presented Stochastic Variance Reduction for Nonconvex Optimization (SVRG), which is emerging as the most promising candidate to supersede stochastic gradient descent. http://suvrit.de/papers/nonconvex_svrg.pdf
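For readers who have not seen SVRG before, here is a minimal sketch of the basic update loop. Each epoch takes a snapshot of the weights, computes the full gradient at that snapshot, and then corrects every stochastic gradient with it. To keep the sketch checkable I use a toy one-dimensional convex problem, not the nonconvex setting the paper analyzes; the step size, epoch count, and all names are assumptions of mine.

```python
import random


def svrg(grad_i, n, w0, lr=0.1, epochs=30, inner_steps=None, seed=0):
    """Basic SVRG loop for a scalar parameter.

    grad_i(w, i) returns the gradient of the i-th component function at w.
    """
    rng = random.Random(seed)
    steps = inner_steps or n
    w = w0
    for _ in range(epochs):
        w_snapshot = w
        # Full gradient at the snapshot, computed once per epoch.
        full_grad = sum(grad_i(w_snapshot, i) for i in range(n)) / n
        for _ in range(steps):
            i = rng.randrange(n)
            # Variance-reduced gradient: stochastic gradient, corrected by
            # the same sample's gradient at the snapshot plus the full gradient.
            g = grad_i(w, i) - grad_i(w_snapshot, i) + full_grad
            w = w - lr * g
    return w


# Toy problem: f_i(w) = 0.5 * a_i * (w - t_i)^2, so the minimizer of the
# average is sum(a_i * t_i) / sum(a_i).
curvatures = [0.5, 1.0, 1.5, 1.0]
targets = [1.0, 2.0, 3.0, 6.0]


def toy_grad(w, i):
    return curvatures[i] * (w - targets[i])


w_star = svrg(toy_grad, len(targets), w0=0.0)
expected = sum(a * t for a, t in zip(curvatures, targets)) / sum(curvatures)
```

The correction term has zero mean over the samples, so the update stays an unbiased gradient estimate while its variance shrinks as the weights approach the snapshot.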

Ivo Danihelka presented associative LSTMs, which use complex-valued vectors as a general, low-cost, and parallelizable way of adding memory to LSTMs for longer recall (for instance, improving performance through noise reduction from multiple copies). http://jmlr.org/proceedings/papers/v48/danihelka16.html (Edit: uLSTMs may have better performance, Oriol Vinyals commenting here)
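The complex-valued memory idea can be sketched in a few lines. What follows is my own simplified illustration of key-value binding with unit-modulus complex keys, not the authors' model: there are no LSTM gates and no redundant copies, and every name is hypothetical. A value is stored by multiplying it elementwise with a key, several bound pairs are summed into one memory vector, and retrieval multiplies by the complex conjugate of the key.

```python
import cmath
import random


def random_key(dim, rng):
    """Key whose entries all have modulus 1: exp(i * phase)."""
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi)) for _ in range(dim)]


def bind(key, value):
    """Store a value under a key by elementwise complex multiplication."""
    return [k * v for k, v in zip(key, value)]


def retrieve(key, memory):
    """Multiply by the conjugate key; the key cancels because |k| = 1."""
    return [k.conjugate() * m for k, m in zip(key, memory)]


rng = random.Random(0)
dim = 8
key_a, key_b = random_key(dim, rng), random_key(dim, rng)
val_a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
val_b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]

# One stored pair: retrieval is exact up to floating-point error.
recalled_clean = retrieve(key_a, bind(key_a, val_a))

# Two stored pairs share one memory vector: retrieval returns val_a plus
# interference from the other pair.
memory = [x + y for x, y in zip(bind(key_a, val_a), bind(key_b, val_b))]
recalled_noisy = retrieve(key_a, memory)
```

The interference in `recalled_noisy` is the noise that the multiple-copies trick mentioned above is meant to average away.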

In the meetups surrounding ICML, Leo Dirac from Amazon gave a great talk on DSSTNE at scale. He gave practical guidelines on scaling limits and referenced Nikko Strom’s work on reducing weight updates.

One of the hidden highlights was the systems session at the Friday workshop (for the intrepid few who had enough steam left after a full week). One trend is increasing accessibility for data scientists, e.g. FBLearnerFlow, ModelDB, and KNIME. Soumith Chintala mentioned in his presentation that aggregate community numbers are the ultimate judge of success for an open source project. Databricks emphasized that Spark is maintained under the Apache Foundation, which guarantees stable governance. Poseidon was presented as an offshoot of Bosen and Petuum that addresses distributed training by extending Caffe, with tests on ILSVRC2012 and ImageNet 22K: http://arxiv.org/abs/1512.06216

Well played, well played. Stanford Professor Fei-Fei Li closed her ICML talk with quite the quip at Trump, who spoke earlier in the day in the same venue.

CVPR

CVPR of course has a different focus, namely applications of computer vision, where deep learning techniques lead in image classification, object detection, semantic segmentation, multi-modal analysis, etc. An excellent collection of open source tools released at CVPR is available here