
Amii researcher’s work to speed up machine learning nets prestigious Dorothy Killam Fellowship

Published Mar 19, 2025

Canada CIFAR AI Chair Mark Schmidt has been awarded a 2025 Dorothy Killam Fellowship to support his work on improving the fundamental elements of machine learning. This work could have enormous impacts on the way that we use artificial intelligence.

The fellowships are awarded each year to researchers “whose superior, ground-breaking and transformative research stands to positively improve the lives of Canadians.”

“I'm still pretty shocked. They must get a large number of very good applications,” Schmidt says.

“So, I feel really lucky that mine was chosen out of the bunch.”

In addition to recognizing his work, the fellowship provides funding to support Schmidt’s research, which has recently focused on optimizing machine learning models, and in particular on determining the best learning rates for large models.

“I just think the potential impact is enormous."

-Mark Schmidt

Canada CIFAR AI Chair

Learning how to learn

In machine learning, hyperparameters are settings that can be configured to change how a model learns from data. The learning rate hyperparameter controls how much a model adjusts its parameters with each update during training: effectively, how fast the model learns. There is no universal best learning rate for all models; it depends on many factors, including the dataset, the training methods and other variables.
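As a rough illustration of what that means in practice (a minimal sketch, not drawn from Schmidt's work), here is a plain gradient descent loop in Python where the learning rate is the single knob that decides how big each update step is:

```python
import numpy as np

# Minimal sketch of gradient descent on a simple least-squares problem.
# The learning_rate value below is arbitrary; in practice it has to be tuned.

def gradient(w, X, y):
    # Gradient of the mean squared error of a linear model (X @ w close to y)
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w

w = np.zeros(5)
learning_rate = 0.1   # the hyperparameter: how large each update step is

for _ in range(200):
    w = w - learning_rate * gradient(w, X, y)   # bigger rate = faster, but riskier

print(np.round(w - true_w, 4))  # errors shrink toward zero when the rate is well chosen
```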

Finding the right learning rate for a particular application can be time-consuming and difficult. It often involves trying out many different settings, which is slow and expensive. Schmidt uses the example of a large language model, like ChatGPT. When training such a model, machine learning scientists use heuristics to set an individual learning rate for each of the model’s variables. However, this is a blunt approach: the learning rate for each variable ends up sub-optimal, making training inefficient.
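Adaptive optimizers in the RMSProp/Adam family are a common example of such heuristics: each variable's step is scaled by running statistics of its own gradients, so every parameter gets its own effective learning rate. The sketch below is a generic, simplified illustration of that idea, not the specific heuristics or models Schmidt describes:

```python
import numpy as np

# Illustrative per-variable learning-rate heuristic, in the spirit of
# adaptive optimizers such as RMSProp/Adam: each parameter's step is divided
# by a running estimate of its squared gradient, giving every variable its
# own effective learning rate. A textbook-style example, not Schmidt's method.

def adaptive_step(w, grad, v, base_lr=1e-3, beta=0.999, eps=1e-8):
    v = beta * v + (1 - beta) * grad ** 2          # running second-moment estimate
    effective_lr = base_lr / (np.sqrt(v) + eps)    # one learning rate per variable
    return w - effective_lr * grad, v

w = np.zeros(4)
v = np.zeros(4)
grad = np.array([0.1, 1.0, -0.5, 2.0])  # toy gradient for one step

w, v = adaptive_step(w, grad, v)
print(w)  # variables with larger gradients receive smaller effective steps
```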

It’s possible to monitor learning rates to find out which ones could be adjusted, but that’s a tall order when talking about the billions of variables that make up something like a large language model. To make it even more complicated, those learning rates are not independent: adjusting one variable’s learning rate could change what the optimal rates are for other variables. 

Schmidt is working towards methods that could update the learning rate automatically as the model is trained. This would fundamentally advance how machine learning models are trained, with far-reaching applications in areas like healthcare, engineering, and scientific discovery.
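One existing idea in this general direction, shown here only as an illustration and not as Schmidt's approach, is so-called hypergradient descent, which treats the learning rate itself as something to be updated from the gradients seen during training:

```python
import numpy as np

# Sketch of one known idea for adapting the learning rate during training,
# sometimes called hypergradient descent: the rate is nudged up when consecutive
# gradients agree and down when they conflict. Purely illustrative; this is not
# the method being developed in Schmidt's lab.

def train(grad_fn, w, lr=0.01, hyper_lr=1e-4, steps=200):
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        lr += hyper_lr * np.dot(g, prev_grad)  # adapt the rate from gradient agreement
        w = w - lr * g
        prev_grad = g
    return w, lr

# Toy quadratic objective: f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target
target = np.array([1.0, -2.0, 0.5])
w, final_lr = train(lambda w: w - target, np.zeros(3))
print(np.round(w, 4), round(final_lr, 4))  # w approaches target; lr drifts from 0.01
```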

“I just think the potential impact is enormous,” he said. “If you've heard of DeepSeek … they very cleverly trained a large language model and they showed that you can make it dramatically cheaper. And I think there are more opportunities to do stuff like that. We're hopefully in a good position to do something like that.”

New approach, old math

So far, the work has been promising. Schmidt’s lab published a paper late last year after one of his students developed a method that finds connections between sets of learning rates: if one group among a model’s billions of learning rates needs to be adjusted, the method can suggest other groups that likely need similar tweaks. Schmidt says the student’s method of determining the different rates has roots in the Jacobi method, a mathematical approach developed in the 1840s.
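For context, the classical Jacobi method is an iterative way to solve a linear system Ax = b by updating each unknown from the current estimates of all the others. How the student's method adapts this idea to learning rates is not detailed in the article; the sketch below only shows the original technique:

```python
import numpy as np

# The classical Jacobi method (1840s): solve A x = b iteratively by updating each
# unknown from the current estimates of all the others.

def jacobi(A, b, iterations=50):
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)               # diagonal entries of A
    R = A - np.diagflat(D)       # off-diagonal part of A
    for _ in range(iterations):
        x = (b - R @ x) / D      # every component updated independently
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])  # diagonally dominant, so the iteration converges
b = np.array([6.0, 5.0, 11.0])

print(jacobi(A, b))              # close to np.linalg.solve(A, b)
```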

“So, this is an old topic, and my students made this insight and it led to this new method. I'm really shocked by it. I think it's one of the coolest things I've ever been involved with.”

In addition to funding, Schmidt says the Killam Fellowship provides teaching relief, which will allow him to focus more on research. He notes that he’s already been able to hire new graduate students to help explore methods for optimizing per-variable learning rates.

This spring, Schmidt will present some of his work on machine learning optimization at Amii’s Upper Bound conference in Edmonton, May 20 - 23.
