A partnership between Amii and Canada's National Research Council (NRC) explores how AI and machine learning can help feed a growing global population. The project harnesses machine learning and plant genomics to develop crops that deliver higher yields, use fewer resources and weather the increasing effects of climate change on food production.
"According to the UN's Food and Agriculture Organization, we need to double our food production compared with what we're producing now," says Jubair Sheikh, the Amii machine learning scientist leading the project.
"And we have less agricultural land now, so we need to increase the food production or yield values. Gene editing combined with genomic selection is one of the ways to do it."
The potential of plant genomics
From the moment humans started growing crops, we've selectively bred plants to encourage desirable traits in our food — anything from higher yields to better taste to a pleasing appearance. After thousands of years of selective breeding, most crops we plant have very little in common with their wild ancestors.
However, plant genomics and gene editing mean that food producers can accomplish much more than with selective breeding alone. Genomics can help identify the genetic basis of a trait, and gene editing allows this knowledge to be applied to alter the trait using the same type of mutations that occur in nature.
Amii and the NRC have worked together on several projects surrounding plant genomics. The most recent one involves using a machine-learning model trained on gene sequence and expression differences between species to predict how a gene will be expressed directly from the genetic sequence.
DNA is encoded with a lot of information, some of it only used under certain conditions. Gene expression is the process by which that information is used and turned into a function, which influences the cells of the organism. Sheikh explains that gene expression can be "up-regulated" or "down-regulated," which makes that expression more or less likely.
So, he says, if a particular plant has an up-regulated gene for drought resistance, it means that crop is more likely to withstand dry weather. If it is down-regulated, that trait is less likely to appear.
Adapting to a changing climate
The latest Amii/NRC partnership uses machine learning to compare genes across three related crops (Pea, Faba and Medicago) and to predict the expression level of a gene in one species relative to its ortholog, the corresponding gene inherited from a common ancestor, in another species. Because the machine learning model is trained on the functional properties of the gene sequence, it is expected to capture the language of gene expression and help explain the biology of the process as well.
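To make the idea of comparing orthologous sequences more concrete, here is a minimal sketch of one way a pair of sequences could be turned into numbers a model can learn from. It is an illustration only: the k-mer featurization, the function names and the toy sequences are all assumptions made for this example, and the article does not describe the features or architecture the Amii/NRC model actually uses.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int = 3) -> list[float]:
    """Return normalized frequencies of every k-mer over the alphabet ACGT."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of A, C, G, T; guard against empty input.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def ortholog_pair_features(seq_a: str, seq_b: str, k: int = 3) -> list[float]:
    """Difference in k-mer frequencies between two orthologous sequences."""
    fa, fb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return [a - b for a, b in zip(fa, fb)]


# Made-up toy sequences for illustration, not real Pea or Faba data.
pea_seq = "ATGGCGTACGTTAGC"
faba_seq = "ATGGCGTTCGTTAGC"
features = ortholog_pair_features(pea_seq, faba_seq)
```

In a setup like this, the resulting feature vector for each ortholog pair could be fed to a regression model that predicts the difference in expression between the two species. That downstream step is likewise an assumption rather than a description of the project's method.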
The Amii machine learning model is trained on the NRC's proprietary data for Pea, Faba and Medicago, which gives it strong performance on legume crops. While plant genetics can also be explored with non-ML methods, Sheikh says machine learning is much more effective and takes far less time, since the other methods may require lab or field experiments. That's important, he adds, because traditional genetic methods of identifying genes can be extremely time-consuming and costly.
"Machine learning allows us to add complementary biological information such as impacts on gene expression into the identification of causal variants and genes, and it also allows us to integrate this information into genomic prediction models," said NRC Associate Research Officer David Konkin.
Having more robust predictive frameworks for crops is important for responding to a changing climate. Current strategies usually involve planting fields of test crops and waiting until the plants have grown. It can take multiple growing seasons to get reliable results. But in a world with shifting weather patterns due to climate change, that can mean the data is quickly out of date.
"If you plant test fields, it might take five years [to build new cultivars]. Now, imagine there is a drought, and the plant isn't getting as much water as it did five years ago. You've already lost five years in the field. I'm not saying what you did is obsolete, but it is almost there," Sheikh says.
By using the machine learning model to predict gene expression, researchers can focus their testing on the most promising candidates, saving time and resources.
While the results have been promising, Sheikh says the team is working to improve the models' accuracy. He is also optimistic that the model they designed to examine maize and sorghum could be retrained for use on other crops. There has also been some success, he says, in using the work to predict gene expression in plants that are not related to one another, opening up new possibilities for the future of food production.
Authors
Jubair Sheikh