
Amii at NeurIPS 2020

Published Nov 30, 2020

Amii is proud to share the efforts and achievements of our researchers at the 34th Conference on Neural Information Processing Systems (NeurIPS), running online this year from December 6 to 12.

NeurIPS is one of the highest-ranked ML & AI conferences in the world, based on its H5-index and Impact Score (see: Google Scholar and Guide2Research). Of the 8,186 papers reviewed this year, only 1,903 were accepted (20.1%); 20 of those papers were co-authored by Amii researchers. The conference profiles research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.

Accepted papers and workshops from Amii researchers cover a range of topics, including the introduction of CoinDICE, a novel and efficient algorithm for computing confidence intervals; and the exploration of Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.

Amii Fellows and Canada CIFAR AI Chairs – professors at the University of Alberta, Simon Fraser University and the University of British Columbia – are included in the proceedings, alongside other Amii researchers:

Accepted Papers

A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

  • Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris & Dale Schuurmans
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending the existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
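
The maximum-entropy step admits a compact statement. The following is a sketch in generic notation of our own (d is the distribution being estimated, φ the known features, μ̂ the empirical feature expectations), which may differ from the paper's exact formulation:

```latex
\max_{d}\; -\sum_{s,a} d(s,a)\,\log d(s,a)
\quad\text{s.t.}\quad
\sum_{s,a} d(s,a)\,\phi(s,a) = \hat{\mu},
\qquad
\sum_{s,a} d(s,a) = 1 .
```

By Lagrangian duality, the maximizer is an exponential-family distribution d_θ(s,a) ∝ exp(θᵀφ(s,a)) whose sufficient statistics are the features, which is the parallel to maximum-entropy supervised learning mentioned in the abstract.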

An implicit function learning approach for parametric modal regression

  • Yangchen Pan, Ehsan Imani, Amir-massoud Farahmand & Martha White
  • Poster Session: December 9 (10 a.m. - 12 p.m. MST)

For multi-valued functions, such as when the conditional distribution on targets given the inputs is multi-modal, standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. Further, parametric approximators, like neural networks, facilitate learning complex relationships between inputs and targets. In this work, we propose a parametric modal regression algorithm. We use the implicit function theorem to develop an objective for learning a joint function over inputs and targets. We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions. We demonstrate that our method is competitive in a real-world modal regression problem and on two regular regression datasets.

CoinDICE: Off-Policy Confidence Interval Estimation

  • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári & Dale Schuurmans
  • Spotlight Presentation: December 7 (8:10 - 8:20 p.m. MST)
  • Poster Session: December 7 (10 p.m. - 12 a.m. MST)

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the Q-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.

Differentiable Meta-Learning of Bandit Policies

  • Craig Boutilier, Chih-wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvári & Manzil Zaheer
  • Poster Session: December 8 (10 p.m. - 12 a.m. MST)

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form. To do this, we parameterize our policies in a differentiable way and optimize them by policy gradients, an approach that is pleasantly general and easy to implement. We derive effective gradient estimators and propose novel variance reduction techniques. We also analyze and experiment with various bandit policy classes, including neural networks and a novel softmax policy. The latter has regret guarantees and is a natural starting point for our optimization. Our experiments show the versatility of our approach. We also observe that neural network policies can learn implicit biases expressed only through the sampled instances.
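
To give a flavor of the approach, here is a minimal NumPy sketch of meta-learning a softmax exploration policy by score-function (REINFORCE) policy gradients over bandit instances sampled from a prior. All names and constants are ours, and the paper's variance reduction techniques are omitted from this toy:

```python
import numpy as np

rng = np.random.default_rng(0)
K, HORIZON, BATCH, STEPS, LR = 2, 50, 64, 200, 0.5

def run_episode(theta, means):
    """Play one Bernoulli bandit instance with a softmax policy over
    empirical means; theta is a learnable inverse temperature. Returns
    the episode return and the accumulated score d/dtheta log pi."""
    counts, sums = np.ones(K), np.ones(K)   # one pseudo-observation per arm
    ret, score = 0.0, 0.0
    for _ in range(HORIZON):
        mu_hat = sums / counts
        logits = theta * mu_hat
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        score += mu_hat[a] - p @ mu_hat     # d/dtheta log pi(a)
        r = float(rng.random() < means[a])
        counts[a] += 1.0
        sums[a] += r
        ret += r
    return ret, score

theta = 1.0
for _ in range(STEPS):
    grad = 0.0
    for _ in range(BATCH):
        means = rng.beta(1.0, 1.0, size=K)  # instance drawn from the prior P
        ret, score = run_episode(theta, means)
        grad += ret * score                 # REINFORCE estimator (no baseline)
    theta += LR * grad / (BATCH * HORIZON)
print("learned inverse temperature:", theta)
```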

Efficient Planning in Large MDPs with Weak Linear Function Approximation

  • Roshan Shariff & Csaba Szepesvári
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of “core” states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.

Escaping the Gravitational Pull of Softmax

  • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvári & Dale Schuurmans
  • Spotlight Presentation: December 8 (7:15 - 7:30 a.m. MST)
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization ("softmax gravity well"), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence ("softmax damping"). Both findings are based on an analysis of convergence rates using the non-uniform Łojasiewicz (NŁ) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the escort mapping, that demonstrates better optimization properties. The disadvantages of the softmax and the effectiveness of the escort transformation are further explained using the NŁ coefficient. In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation.
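
To make the two transformations concrete, here is a small NumPy sketch contrasting gradient ascent on expected reward under the softmax and under the escort mapping, shown for exponent p = 2 (the general escort is |θ_i|^p / Σ_j |θ_j|^p). The closed-form gradients below are our own derivation for this toy, not code from the paper:

```python
import numpy as np

r = np.array([1.0, 0.9, 0.1])            # true per-action rewards
theta_sm = np.array([0.0, 8.0, 0.0])      # bad init: mass on the 0.9 arm
theta_es = np.array([0.1, 8.0, 0.1])      # escort needs nonzero entries in this toy

for _ in range(5000):
    # softmax policy: pi_i = exp(theta_i) / sum_j exp(theta_j)
    p = np.exp(theta_sm - theta_sm.max()); p /= p.sum()
    theta_sm += 0.1 * p * (r - p @ r)                             # grad of pi^T r
    # escort policy (p = 2): pi_i = theta_i^2 / ||theta||^2
    q = theta_es**2 / (theta_es**2).sum()
    theta_es += 0.1 * 2 * theta_es * (r - q @ r) / (theta_es**2).sum()

# compare how far each ascent has escaped the initialization plateau
p = np.exp(theta_sm - theta_sm.max()); p /= p.sum()
q = theta_es**2 / (theta_es**2).sum()
print("softmax expected reward:", p @ r)
print("escort  expected reward:", q @ r)
```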

Exemplar Guided Active Learning

  • Jason S. Hartford, Kevin Leyton-Brown, Hadas Raviv, Dan Padnos, Shahar Lev & Barak Lenz
  • Poster Session: December 9 (10 p.m. - 12 a.m. MST)

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each “common class” that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from “rare classes” whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew. We describe an active learning approach that (1) explicitly searches for rare classes by leveraging the contextual embedding spaces provided by modern language models, and (2) incorporates a stopping rule that ignores classes once we prove that they occur below our target threshold with high probability. We prove that our algorithm only costs logarithmically more than a hypothetical approach that knows all true label frequencies and show experimentally that incorporating automated search can significantly reduce the number of samples needed to reach target accuracy levels.
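
A simplified picture of step (1), the guided search: embed the unlabeled pool with a language model, then rank candidates by similarity to an exemplar of the class being hunted. Everything below (names, dimensions, random data) is an illustrative stand-in, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 10_000, 64
pool = rng.normal(size=(N, D))     # contextual embeddings of unlabeled examples
exemplar = rng.normal(size=D)      # embedding of an exemplar for a (possibly rare) sense

def query_near_exemplar(pool, exemplar, budget=20):
    """Return the indices of the `budget` pool points most cosine-similar
    to the exemplar: candidates to send to human annotators."""
    sims = pool @ exemplar / (np.linalg.norm(pool, axis=1) * np.linalg.norm(exemplar))
    return np.argsort(-sims)[:budget]

print(query_near_exemplar(pool, exemplar))
```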

ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool

  • Gellert Weisz, András György, Wei-I Lin, Devon Graham, Kevin Leyton-Brown, Csaba Szepesvári & Brendan Lucier
  • Poster Session: December 10 (10 p.m. - 12 a.m. MST)

Algorithm configuration procedures optimize parameters of a given algorithm to perform well over a distribution of inputs. Recent theoretical work has focused on the case of selecting between a small number of alternatives. In practice, parameter spaces are often very large or infinite, and so successful heuristic procedures discard parameters "impatiently", based on very few observations. Inspired by this idea, we introduce ImpatientCapsAndRuns, which quickly discards less promising configurations, significantly speeding up the search procedure compared to previous algorithms with theoretical guarantees, while still achieving optimal runtime up to logarithmic factors under mild assumptions. Experimental results demonstrate a practical improvement.
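
The "impatient" idea can be illustrated in a few lines: probe each configuration on a handful of instances with a runtime cap, and discard it as soon as it times out too often. This is our toy rendering of the intuition, not the actual ImpatientCapsAndRuns procedure or its guarantees:

```python
import numpy as np

rng = np.random.default_rng(2)

def runtime(config, instance):
    """Stand-in for running a solver with parameter `config` on one instance."""
    return rng.exponential(scale=config)

def impatient_filter(configs, instances, cap=2.0, probes=5, max_timeouts=1):
    """Keep only configurations that exceed the cap at most `max_timeouts`
    times on the first `probes` instances; others are discarded early."""
    survivors = []
    for c in configs:
        timeouts = 0
        for inst in instances[:probes]:
            if runtime(c, inst) > cap:
                timeouts += 1
                if timeouts > max_timeouts:
                    break                # impatient: stop probing this config
        else:
            survivors.append(c)          # loop finished without breaking
    return survivors

configs = rng.uniform(0.5, 5.0, size=50)
print(len(impatient_filter(configs, list(range(100)))))
```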

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

  • Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton & Dale Schuurmans
  • Poster Session: December 8 (10 p.m. - 12 a.m. MST)

Discrete structures play an important role in applications like programming language modeling and software engineering. Current approaches to predicting complex structures typically consider autoregressive models for their tractability, with some sacrifice in flexibility. Energy-based models (EBMs) on the other hand offer a more flexible and thus more powerful approach to modeling such distributions, but require partition function estimation. In this paper we propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data, where parameter gradients are estimated using a learned sampler that mimics local search. We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration, achieving a better trade-off between flexibility and tractability. Experimentally, we show that learning local search leads to significant improvements in challenging application domains. Most notably, we present an energy model guided fuzzer for software testing that achieves comparable performance to well-engineered fuzzing engines like libFuzzer.

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

  • Zaheen Ahmad, Levi Lelis & Michael Bowling
  • Poster Session: December 8 (10 p.m. - 12 a.m. MST)

Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
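
Our reading of the objective, in code: the marginal utility of a candidate action is the value it adds over the best action generated so far, so later candidates are only rewarded for covering what earlier ones missed. The function name and Q-value stand-ins are ours:

```python
def marginal_utility(q_new, q_prev):
    """Increase in achievable value from adding a candidate action:
    Q(s, a_new) beyond the best previously generated action's value."""
    if not q_prev:
        return q_new                        # first candidate gets full credit
    return max(0.0, q_new - max(q_prev))

# e.g. values of three sequentially generated candidates at some state
print(marginal_utility(0.7, []))            # 0.7
print(marginal_utility(0.6, [0.7]))         # 0.0: dominated, adds nothing
print(marginal_utility(0.9, [0.7, 0.6]))    # 0.2: only the improvement counts
```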

Model Selection in Contextual Stochastic Bandit Problems

  • Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore & Csaba Szepesvári
  • Poster Session: December 10 (10 a.m. - 12 p.m. MST)

We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal O(√T) model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has O(log T) regret, in general it is impossible to get better than Ω(√T) regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as misspecified linear contextual bandits (Lattimore et al., 2019), linear bandits with unknown dimension (Foster et al., 2019) and reinforcement learning with unknown feature maps. Our algorithm requires knowledge of the optimal base regret to adjust the master learning rate. We show that without such prior knowledge any master can suffer a regret larger than the optimal base regret.
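
For intuition, here is a minimal master-base loop with EXP3 standing in for the adversarial master; the paper's smoothing transformation and learning-rate tuning are omitted, and all class and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(3)

class Exp3Master:
    """EXP3 over base algorithms with importance-weighted reward updates."""
    def __init__(self, n_bases, eta=0.05):
        self.log_w = np.zeros(n_bases)
        self.eta = eta
    def probs(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()
    def select(self):
        p = self.probs()
        i = int(rng.choice(len(p), p=p))
        return i, p[i]
    def update(self, i, p_i, reward):
        self.log_w[i] += self.eta * reward / p_i  # unbiased via importance weighting

# toy: "base 1" earns mean reward 0.6, "base 0" mean 0.4;
# the master's probability mass concentrates on base 1
master = Exp3Master(2)
for _ in range(5000):
    i, p_i = master.select()
    reward = float(rng.random() < (0.4, 0.6)[i])
    master.update(i, p_i, reward)
print(master.probs())
```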

MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

  • Saim Wani, Shivansh Patel, Unnat Jain, Angel Chang & Manolis Savva
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed. We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment. MultiON generalizes the ObjectGoal navigation task and explicitly tests the ability of navigation agents to locate previously observed goal objects. We perform a set of multiON experiments to examine how a variety of agent models perform across a spectrum of navigation task complexities. Our experiments show that: i) navigation performance degrades dramatically with escalating task complexity; ii) a simple semantic map agent performs surprisingly well relative to more complex neural image feature map agents; and iii) even oracle map agents achieve relatively low performance, indicating the potential for future work in training embodied navigation agents using maps.

Off-Policy Evaluation via the Regularized Lagrangian

  • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li & Dale Schuurmans
  • Poster Session: December 7 (10 p.m. - 12 a.m. MST)

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

  • Arun Verma, Manjesh Kumar Hanawal, Csaba Szepesvári & Venkatesh Saligrama
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects arms sequentially up to some depth. The total cost incurred by stopping at an arm is the sum of the fixed costs of the arms selected and the stochastic loss associated with the stopping arm. The learner's goal is to learn a decision rule that maps contexts to arms so as to minimize the total expected loss. The problem is challenging because the setting is unsupervised: the total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called 'Contextual Weak Dominance' (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.
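
The cost structure described above, in code, with hypothetical per-arm fixed costs and losses (the losses are exactly what the learner never directly observes):

```python
def cascade_cost(costs, losses, stop_k):
    """Total cost of going down the cascade and stopping at arm stop_k:
    the fixed costs of every arm used plus the stochastic loss of the
    stopping arm. Regret compares this to the best context-dependent stop."""
    return sum(costs[: stop_k + 1]) + losses[stop_k]

costs = [0.1, 0.3, 0.9]     # e.g. cheap, medium, expensive classifiers
losses = [0.4, 0.2, 0.05]   # unobservable losses for this round's context
print([cascade_cost(costs, losses, k) for k in range(3)])  # [0.5, 0.6, 1.35]
```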

PAC-Bayes Analysis Beyond the Usual Bounds

  • Omar Rivasplata, Ilja Kuzborskij, Csaba Szepesvári & John Shawe-Taylor
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed ‘data-free’ priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.
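
For context, one classical member of this family, the PAC-Bayes-kl bound of Maurer (2004), relies on exactly the three requirements the paper revisits: a data-free prior P, losses in [0, 1], and n i.i.d. examples. It states that with probability at least 1 − δ, simultaneously for all posteriors Q,

```latex
\mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right)
\;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\!\frac{2\sqrt{n}}{\delta}}{n},
```

where L̂(Q) and L(Q) are the empirical and true risks of the randomized predictor and kl is the binary relative entropy.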

Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses

  • Yihan Zhou, Victor Sanches Portella, Mark Schmidt & Nicholas Harvey
  • Poster Session: December 10 (10 a.m. - 12 p.m. MST)

In online convex optimization (OCO), Lipschitz continuity of the functions is commonly assumed in order to obtain sublinear regret. Moreover, many algorithms have only logarithmic regret when these functions are also strongly convex. Recently, researchers from convex optimization proposed the notions of "relative Lipschitz continuity" and "relative strong convexity". Both notions are generalizations of their classical counterparts. It has been shown that subgradient methods in the relative setting have performance analogous to their performance in the classical setting. In this work, we consider OCO for relative Lipschitz and relative strongly convex functions. We extend the known regret bounds for classical OCO algorithms to the relative setting. Specifically, we show regret bounds for follow-the-regularized-leader algorithms and a variant of online mirror descent. Due to the generality of these methods, these results yield regret bounds for a wide variety of OCO algorithms. Finally, we extend the results to algorithms with extra regularization such as regularized dual averaging.
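
For readers unfamiliar with the algorithms being extended, here is a minimal follow-the-regularized-leader sketch with a quadratic regularizer on an unconstrained domain; in the relative setting, the quadratic is replaced by a general reference function h. Constants and names are ours:

```python
import numpy as np

def ftrl_quadratic(grads, eta=0.1):
    """FTRL with regularizer ||x||^2 / (2*eta) on R^d:
    x_{t+1} = argmin_x sum_{s<=t} <g_s, x> + ||x||^2/(2*eta)
            = -eta * sum_{s<=t} g_s   (closed form, unconstrained case)."""
    G = np.zeros_like(grads[0])
    iterates = []
    for g in grads:
        iterates.append(-eta * G)   # play x_t using only past gradients
        G += g
    return iterates

# toy run on linear losses f_t(x) = <g_t, x>
gs = [np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.array([-1.0, 1.0])]
print(ftrl_quadratic(gs))
```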

Shared Space Transfer Learning for analyzing multi-site fMRI data

  • Muhammad Yousefnezhad, Alessandro Selvitella, Daoqiang Zhang, Andrew Greenshaw & Russell Greiner
  • Poster Session: December 10 (10 a.m. - 12 p.m. MST)

Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data, for distinguishing when subjects are performing different cognitive tasks (e.g., watching movies or making decisions). MVPA works best with a well-designed feature set and an adequate sample size. However, most fMRI datasets are noisy, high-dimensional, expensive to collect, and have small sample sizes. Further, training a robust, generalized predictive model that can analyze homogeneous cognitive tasks provided by multi-site fMRI datasets has additional challenges. This paper proposes Shared Space Transfer Learning (SSTL), a novel transfer learning (TL) approach that can functionally align homogeneous multi-site fMRI datasets and thereby improve prediction performance at every site. SSTL first extracts a set of common features for all subjects in each site. It then uses TL to map these site-specific features to a site-independent shared space in order to improve the performance of the MVPA. SSTL uses a scalable optimization procedure that works effectively for high-dimensional fMRI datasets. The optimization procedure extracts the common features for each site by using a single-iteration algorithm and maps these site-specific common features to the site-independent shared space. We evaluate the effectiveness of the proposed method for transferring between various cognitive tasks. Our comprehensive experiments validate that SSTL achieves superior performance to other state-of-the-art analysis techniques.

Towards Safe Policy Improvement for Non-Stationary MDPs

  • Yash Chandak, Scott Jordan, Georgios Theocharous, Martha White & Philip S. Thomas
  • Spotlight Presentation: December 9 (9 - 9:10 a.m. MST)
  • Poster Session: December 9 (10 a.m. - 12 p.m. MST)

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy’s forecasted performance, and confidence intervals are obtained using the wild bootstrap.
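
A toy version of the wild-bootstrap step: fit a trend to past performance, then resample sign-flipped residuals to get a forecast interval. Sign-flipping preserves heteroscedasticity, which is why the wild bootstrap suits non-stationary data. Everything here (the linear trend, the constants) is our simplification, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(4)

def wild_bootstrap_ci(t, y, forecast_t, n_boot=2000, alpha=0.05):
    """Wild-bootstrap confidence interval for a linear trend forecast:
    refit on pseudo-samples whose residuals are reweighted by Rademacher signs."""
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    x_new = np.array([1.0, forecast_t])
    preds = []
    for _ in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=len(y))
        y_star = X @ beta + resid * signs
        b_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        preds.append(x_new @ b_star)
    return np.quantile(preds, [alpha / 2, 1 - alpha / 2])

t = np.arange(20, dtype=float)
y = 0.5 * t + rng.normal(scale=t * 0.1 + 1.0)   # performance drifting over time
print(wild_bootstrap_ci(t, y, forecast_t=25.0))
```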

Unsupervised Text Generation by Learning from Search

  • Jingjing Li, Zichao Li, Lili Mou, Xin Jiang, Michael Lyu & Irwin King
  • Poster Session: December 9 (10 a.m. - 12 p.m. MST)

In this work, we propose TGLS, a novel framework for unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results while smoothing out the noise of the search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, unsupervised paraphrasing and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. In particular, it achieves comparable performance to strong supervised methods for paraphrase generation.
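
The search half of the loop is plain simulated annealing. Here is a generic sketch; in TGLS the states would be candidate sentences and `score` the heuristic quality objective, whereas this toy anneals over real numbers instead:

```python
import math, random

random.seed(0)

def simulated_annealing(init, propose, score, steps=5000, t0=1.0):
    """Hill climb that accepts a worse candidate with probability
    exp(delta / T), with temperature T cooled linearly toward zero."""
    x, s = init, score(init)
    best, best_s = x, s
    for i in range(steps):
        T = t0 * (1.0 - i / steps) + 1e-9
        y = propose(x)
        sy = score(y)
        if sy >= s or random.random() < math.exp((sy - s) / T):
            x, s = y, sy
            if s > best_s:
                best, best_s = x, s
    return best, best_s

# toy objective: find x maximizing -(x - 3)^2 via Gaussian proposals
print(simulated_annealing(0.0, lambda x: x + random.gauss(0, 1),
                          lambda x: -(x - 3.0) ** 2))
```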

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

  • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvári & Mengdi Wang
  • Spotlight Presentation: December 8 (9:10 - 9:20 a.m. MST)
  • Poster Session: December 8 (10 a.m. - 12 p.m. MST)

In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. Since dynamic programming then no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. Further, we prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, and we also establish its rate of convergence, which matches or improves the convergence rate available in the case of RL with cumulative rewards.
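
In generic Fenchel-duality notation, the saddle point arises as follows; this is our paraphrase, and the paper's conventions may differ. Writing λ^{π_θ} for the state-action occupancy measure and F_* for the concave conjugate of the utility F,

```latex
F\!\left(\lambda^{\pi_\theta}\right)
= \min_{z}\Big[\langle z, \lambda^{\pi_\theta}\rangle - F_*(z)\Big],
\qquad
\max_{\theta} F\!\left(\lambda^{\pi_\theta}\right)
= \max_{\theta}\,\min_{z}\Big[\langle z, \lambda^{\pi_\theta}\rangle - F_*(z)\Big].
```

At the inner minimizer z*, the term ⟨z*, λ^{π_θ}⟩ is an ordinary expected return with pseudo-reward z*(s, a), so its θ-gradient is a standard policy gradient; this is the sense in which the gradient is obtained from a stochastic saddle-point problem.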

Workshops

Organizers:

  • OPT2020: Optimization for Machine Learning: Optimization lies at the heart of many machine learning algorithms and enjoys great interest in the community. This intimate relation of optimization with ML is the key motivation for the OPT series of workshops, co-organized by Mark Schmidt, Canada CIFAR AI Chair at Amii.

  • Policy Optimization in Reinforcement Learning: co-organized by Sham Kakade, Martha White (Amii Fellow and Canada CIFAR AI Chair) and Nicolas Le Roux; with help from Amii alum Alan Chan and Amii researchers Shivan Garg, Dhawal Gupta & Abhishek Naik.

Speakers:

  • Talking to Strangers: Zero-Shot Emergent Communication: Communication is one of the most impressive human abilities, but it has historically been studied in ML mainly on confined datasets of natural language. Thanks to deep RL, emergent communication can now be studied in complex multi-agent scenarios. This workshop features a talk by Amii Fellow Michael Bowling entitled “Hindsight Rationality: Alternatives to Nash”, which explores some of the often unstated principles common in multiagent learning research that may be impeding progress, and suggests an alternative set of principles.

  • NewInML: A Workshop for Newcomers to Machine Learning: Is this your first time submitting to a top conference? Have you ever wanted your work recognized by a large and active community? Then the NewInML workshop is for you! Amii Fellow Michael Bowling will be featured in a panel discussion alongside other top NeurIPS reviewers and researchers.

Paper Publications:

The following papers by Amii researchers have been accepted into workshops (please check out the individual sites for presentation times):

Tutorials

In addition, Amii Fellow and Canada CIFAR AI Chair Martha White will be co-presenting the following tutorials:

Policy Optimization in Reinforcement Learning

  • Sham M Kakade, Martha White & Nicolas Le Roux

  • December 7 (12 - 2:30 p.m. MST)

Tutorial Questions and Answers

  • Sham M Kakade, Martha White & Nicolas Le Roux

  • December 10 (2 - 2:50 p.m. MST)

Reviewing

Amii researchers have also received accolades for being in the top 10% of high-scoring reviewers for NeurIPS this year! Congratulations to Dustin Morrill, Sina Ghiassian, Alex Kearney, Eric Graves, Kris De Asis, Amii alum Alan Chan, Amii Fellow Russ Greiner & Canada CIFAR AI Chair at Amii Angel Chang.

Amii Fellows Csaba Szepesvári and Dale Schuurmans also served as Senior Area Chair (SAC) members; only 63 researchers were chosen for this honour. SAC members oversaw the work of Area Chairs and ensured that the reviewing process went smoothly.

Authors

Britt Ayotte

Martha White

Yangchen Pan

Ehsan Imani

Amir-massoud Farahmand

Roshan Shariff

Jincheng Mei

Chenjun Xiao

Zaheen Ahmad

Angel Chang

Muhammad Yousefnezhad

Alan Chan

Shivan Garg

Dhawal Gupta

Abhishek Naik

Varun Ranganathan

Matthew Schlegel

Shibhansh Dohare

Banafsheh Rafiee

Dustin Morrill

Sina Ghiassian

Alex Kearney

Kenny Young

Manan Tomar

Rohan Nuttall

Kristopher De Asis

Levi Lelis

Lili Mou

Mark Schmidt

Richard S. Sutton

Sahir

Rupam Mahmood

Russ Greiner

Kevin Leyton-Brown

Dale Schuurmans

Csaba Szepesvári
