Why Machine Learning Can Find a New Material, But Not a Needle in a Haystack

Join us for the FAIRmat seminar with speaker Kevin Jablonka on September 15, 2023, at 11:00, at IRIS Adlershof (Berlin) and online.


The space of possible materials is unimaginably large.

To find our way in this space, it would be nice to have a map that can guide us. In this presentation, we show that machine learning can provide us with such a map [1]. We can use machine learning to encode patterns that are tacit or hidden in the many dimensions of this chemical space and then use it to guide the design of materials.

The simplest application of this navigation system is to predict properties that are hard to predict with conventional quantum chemistry or molecular simulation alone [2, 3].

Once we have this in place, we can use it to gather information about structure-property-function relationships as efficiently as possible. A key difficulty, however, is that we often have to deal with multiple, competing objectives. For instance, increasing the reactivity often decreases the selectivity. Interestingly, a geometric construction makes it possible to use machine learning effectively, and without bias, to dramatically accelerate materials design and discovery in such a multiobjective design space [4].

It is important to realize, however, that machine learning relies on data that a machine can use [5]. Towards this goal, we need to develop infrastructure that captures data without overhead while providing chemists with tools that simplify their daily work [4, 5].

A challenge, however, is that data often cannot be easily collected in the neat tabular form that conventional machine learning models expect.

Recent advances in the application of large language models (LLMs) to chemistry indicate that they might be used to address this challenge.

I will showcase how LLMs can autonomously use tools, leverage structured data as well as soft inductive biases, and in this way transform how we model chemistry [6, 7].


[1] Jablonka, K. M.; Ongari, D.; Moosavi, S. M.; Smit, B. Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. Chem. Rev. 2020, 120 (16), 8066–8129.

[2] Jablonka, K. M.; Ongari, D.; Moosavi, S. M.; Smit, B. Using Collective Knowledge to Assign Oxidation States of Metal Cations in Metal–Organic Frameworks. Nat. Chem. 2021, 13 (8), 771–777.

[3] Jablonka, K. M.; Moosavi, S. M.; Asgari, M.; Ireland, C.; Patiny, L.; Smit, B. A Data-Driven Perspective on the Colours of Metal–Organic Frameworks. Chem. Sci. 2021, 12 (10), 3587–3598.

[4] Jablonka, K. M.; Jothiappan, G. M.; Wang, S.; Smit, B.; Yoo, B. Bias Free Multiobjective Active Learning for Materials Design and Discovery. Nat. Commun. 2021, 12 (1), 2312.

[5] Jablonka, K. M.; Patiny, L.; Smit, B. Making the Collective Knowledge of Chemistry Open and Machine Actionable. Nat. Chem. 2022.

[6] Jablonka, K. M.; Ai, Q.; Al-Feghali, A.; Badhwar, S.; Bran, J. D. B. A. M.; Bringuier, S.; Brinson, L. C.; Choudhary, K.; Circi, D.; Cox, S.; de Jong, W. A.; Evans, M. L.; Gastellu, N.; Genzling, J.; Gil, M. V.; Gupta, A. K.; Hong, Z.; Imran, A.; Kruschwitz, S.; Labarre, A.; Lála, J.; Liu, T.; Ma, S.; Majumdar, S.; Merz, G. W.; Moitessier, N.; Moubarak, E.; Mouriño, B.; Pelkie, B.; Pieler, M.; Ramos, M. C.; Ranković, B.; Rodriques, S. G.; Sanders, J. N.; Schwaller, P.; Schwarting, M.; Shi, J.; Smit, B.; Smith, B. E.; Van Heck, J.; Völker, C.; Ward, L.; Warren, S.; Weiser, B.; Zhang, S.; Zhang, X.; Zia, G. A.; Scourtas, A.; Schmidt, K. J.; Foster, I.; White, A. D.; Blaiszik, B. 14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon. arXiv June 9, 2023.

[7] Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Is GPT-3 All You Need for Low-Data Discovery in Chemistry? 2023. 


Kevin Jablonka obtained his bachelor’s degree in chemistry at TU Munich. He joined EPFL for his master’s studies (and an extended study degree in applied machine learning), after which he joined Berend Smit’s group for a Ph.D. He now leads a research group at the Helmholtz Institute for Polymers in Energy Applications of the University of Jena and the Helmholtz Center Berlin. Kevin’s research interests are in the digitization of chemistry. For this, he has been contributing to the cheminfo electronic lab notebook ecosystem. He also developed a toolbox for digital reticular chemistry. Using tools from this toolbox, he addressed questions from the atom to the pilot-plant scale. Kevin is also interested in using large language models in chemistry and co-leads the ChemNLP project (with support from and Stability.AI).


Registration is closed now!