Scientists from MIPT (Moscow Institute of Physics and Technology) and MSU (Moscow State University), led by Yan Ivanenkov, are the first to have developed a computer model that predicts agrochemical activity: the beneficial influence of simple molecules on plants. Using an independent test set and the results of their own experiments, they showed that the model has high predictive power. The work is published in the scientific journal Phytochemistry.
To construct the model, the authors used machine learning methods, in particular Kohonen self-organizing maps. The training set was a unique sample of 1,800 carefully selected agrochemicals, compiled from patents, scientific publications and specialized databases. Importantly, the model was also able to predict a molecule's activity class (that is, what impact it will have on the plant) with a fairly high accuracy of 87%, while the prediction accuracy for the molecules' activity was 67%.
Molecules of interest to agrochemistry can be divided into two categories: pesticides (which fight insects, weeds and fungi) and plant growth regulators (which stimulate or inhibit plant growth). To discover a new active molecule in either group, scientists conduct costly experiments: they synthesize a large number of different molecules (usually several thousand) and then check their impact on cells or whole plants. However, in a significant percentage of cases such experiments do not produce the desired results; at best, only a few dozen of the molecules turn out to be active. The task now is therefore to use the model to significantly reduce the number of molecules (compared with the initial number) that are passed on to further experiments. This will significantly reduce both the time and the financial cost of searching for active molecules.
In their work, the authors used the concept of a chemical space, in which each molecule is described for modeling by a set of specific parameters (molecular descriptors). The value of each descriptor reflects a particular property of the molecule: its solubility, size, polar surface area, etc. Each molecule in the chemical space is thus defined (coded) by a set of such parameters, as a point with certain coordinates in that space.
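To make the idea of a chemical space concrete, here is a minimal Python sketch of molecules coded as points. The Molecule class, the three descriptors chosen, the rescaling step and all numeric values are illustrative assumptions, not data or code from the study; the article does not say which descriptors or preprocessing the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record: one molecule coded by three descriptors."""
    name: str
    log_solubility: float      # made-up placeholder value
    molecular_weight: float    # made-up placeholder value
    polar_surface_area: float  # made-up placeholder value

    def as_point(self):
        """Coordinates of the molecule in a 3-dimensional chemical space."""
        return (self.log_solubility, self.molecular_weight, self.polar_surface_area)


def min_max_scale(points):
    """Rescale every coordinate to [0, 1] so that no single descriptor
    dominates distance calculations just because of its units."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(point, lows, highs))
        for point in points
    ]


# Placeholder molecules; the numbers are invented for illustration only.
molecules = [
    Molecule("mol_A", -2.0, 200.0, 40.0),
    Molecule("mol_B", -4.0, 300.0, 80.0),
    Molecule("mol_C", -3.0, 250.0, 60.0),
]
points = [m.as_point() for m in molecules]
scaled_points = min_max_scale(points)

for m, p in zip(molecules, scaled_points):
    print(m.name, p)
```

A real descriptor set would have many more dimensions, which is why a dimensionality-reduction step is needed before the data can be inspected visually.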
Using the Kohonen algorithm, which learns without a teacher (that is, unsupervised), one can reduce the dimensionality of this data with the least error (this stage is called training) and visualize the result as a two-dimensional map that is convenient for analysis and on which the areas occupied by molecules of different categories can be highlighted one by one. This map can then be used to evaluate the classification ability of the model. If this ability is high (for example, greater than 70% for such large-scale tasks), the model can be tested on an independent test set of molecules that were not involved in training. That is what the authors achieved, clearly demonstrating that their model is able to predict the specific activity of new molecules by assigning them to one of the commonly accepted categories: herbicides, plant growth regulators, etc.
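The workflow described above (unsupervised training of the map, labelling its areas by category, then checking it on molecules held out of training) can be sketched in a few dozen lines of Python. This is a simplified illustration of a Kohonen self-organizing map, not the authors' implementation: the grid size, learning-rate schedule, function names, category labels and the synthetic two-cluster data are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def best_matching_unit(weights, x):
    """Return the grid cell (row, col) whose weight vector is closest to x."""
    return min(weights,
               key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Unsupervised training: class labels are never used here."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):    # visit samples in random order
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights


def label_cells(weights, data, labels):
    """After training, mark each occupied cell with its majority category."""
    votes = defaultdict(Counter)
    for x, y in zip(data, labels):
        votes[best_matching_unit(weights, x)][y] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


def predict(weights, cell_labels, x):
    """Assign x the category of the nearest labelled cell."""
    labelled = {cell: weights[cell] for cell in cell_labels}
    return cell_labels[best_matching_unit(labelled, x)]


# Synthetic stand-in data: two well-separated clusters of 3-D descriptor points.
data_rng = random.Random(1)


def cluster(center, n):
    return [tuple(c + data_rng.uniform(-0.05, 0.05) for c in center)
            for _ in range(n)]


train = cluster((0.2, 0.2, 0.2), 20) + cluster((0.8, 0.8, 0.8), 20)
train_labels = ["herbicide"] * 20 + ["growth_regulator"] * 20
test = cluster((0.2, 0.2, 0.2), 5) + cluster((0.8, 0.8, 0.8), 5)
test_labels = ["herbicide"] * 5 + ["growth_regulator"] * 5

som = train_som(train)
cell_labels = label_cells(som, train, train_labels)
predictions = [predict(som, cell_labels, x) for x in test]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"held-out accuracy: {accuracy:.0%}")
```

The category labels enter only after training, when the cells of the finished map are marked; the test molecules are never shown to train_som, which mirrors the independent test set described in the article. Real agrochemical data would be far less cleanly separated than these toy clusters.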
"It is important to note that the model has good differential predictive power, and it is the first one in the field of agricultural chemistry built with the use of such an impressive learning sample set. In the course of work, we, together with colleagues from the Laboratory for the Development of Innovative Drugs, were able to test the model using the real test results which we obtained ourselves. In the future, we plan to enhance the learning model and improve its predictive ability — possibly with the use of other machine learning algorithms," commented Yan Ivanenkov, the lead author and head of the MIPT's Laboratory of Medical Chemistry and Bioinformatics, while talking about the main results of their work and their future plans.
In the future, similar computational models will significantly reduce the cost of searching for new active molecules and contribute to the understanding of their mechanisms of action.