Development of machine learning algorithms and big data analysis in the field of X-ray absorption spectroscopy

RFBR. Perspektiva. Bugaev. 2019-2021
From 01.10.2019 to 31.10.2022
Grant holder: Aram Bugaev

The development of modern synchrotron sources and beamline equipment makes it possible to measure X-ray absorption spectra with millisecond and microsecond time resolution. This increases the total number of spectra collected during a single experiment and forces researchers to deal with large spectral datasets. This project aims to develop a comprehensive technique for automated and semi-automated analysis of big data in X-ray absorption spectroscopy by applying state-of-the-art algorithms for statistical analysis, including the multivariate curve resolution alternating least-squares (MCR-ALS) method, together with machine learning and artificial intelligence algorithms. As a result, we will develop software that allows online processing of large datasets of experimental spectra during the measurements. In particular, this software will be installed on beamline computers to immediately provide users with qualitative and quantitative information about the sample under investigation.

In the first year of the project, program codes implementing a number of modern statistical approaches to big data analysis, including MCR-ALS, as well as methods based on supervised machine learning algorithms, were developed and successfully tested on various experimental and theoretical spectral datasets. For big data analysis, a wide range of statistical criteria was used to determine the number of independent states of the system during its evolution induced by temperature or pressure changes or by chemical reactions. This approach is especially useful for processing large experimental datasets measured in situ and in operando, as well as with high time resolution, and it does not require reference spectra for the subsequent analysis. The machine learning methods rely on computing large theoretical libraries of spectra, which are used to train an ML algorithm that can then relate each experimental spectrum to the corresponding atomic structure of the sample. The main feature of the method is that it can be applied not only to EXAFS but also to XANES spectra, whose analysis in the literature is usually limited to qualitative comparison with reference spectra. The developed approaches were successfully applied to several problems of practical importance: determination of surface and bulk oxides in palladium nanoparticles formed in the presence of oxygen and hydrogen, revealing the dynamic structure of Ni active sites during the ethylene dimerization reaction, and resolving the 3D atomic structure of a gold complex. The results have been reported at international conferences and prepared for publication in high-impact peer-reviewed journals.
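
To make the statistical part of this workflow more concrete, the following is a minimal sketch in Python (NumPy only), not the project's actual code. It estimates the number of independent components from the singular values of a spectral data matrix and then runs a basic MCR-ALS loop with non-negativity and closure constraints. Everything here is an assumption made for illustration: the function names estimate_n_components and mcr_als, the relative singular-value threshold (a crude stand-in for the statistical criteria mentioned above), the choice of evenly spaced measured spectra as the initial guess, and the synthetic two-component dataset built from Gaussian peaks rather than real XANES or EXAFS data.

```python
import numpy as np


def estimate_n_components(D, rel_tol=1e-2):
    """Count singular values above rel_tol times the largest one.

    Assumption: a simple threshold stands in for more rigorous
    statistical criteria for the number of independent states.
    """
    s = np.linalg.svd(D, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


def mcr_als(D, n_components, n_iter=50):
    """Factorize D (n_spectra x n_energy) as C @ S by alternating least squares.

    C holds concentration profiles (rows sum to 1, non-negative),
    S holds component spectra (non-negative).
    """
    # Initial guess (assumption): evenly spaced measured spectra.
    idx = np.linspace(0, D.shape[0] - 1, n_components).round().astype(int)
    S = np.clip(D[idx], 0.0, None)
    for _ in range(n_iter):
        # Solve S.T @ C.T = D.T for the concentrations, then constrain them.
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)                              # non-negativity
        C /= np.maximum(C.sum(axis=1, keepdims=True), 1e-12)   # closure
        # Solve C @ S = D for the spectra, then constrain them.
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0.0, None)                              # non-negativity
    return C, S


# Synthetic example: state 1 converts linearly into state 2 over 50 spectra.
rng = np.random.default_rng(0)
energy = np.linspace(0.0, 1.0, 200)  # arbitrary units
S_true = np.vstack([
    np.exp(-((energy - 0.35) / 0.1) ** 2),
    np.exp(-((energy - 0.65) / 0.1) ** 2),
])
c1 = np.linspace(1.0, 0.0, 50)
C_true = np.column_stack([c1, 1.0 - c1])
D = C_true @ S_true + rng.normal(0.0, 1e-3, size=(50, 200))

n = estimate_n_components(D)
C, S = mcr_als(D, n)
rel_err = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
print(f"components: {n}, relative reconstruction error: {rel_err:.4f}")
```

In this sketch the constraints are imposed by clipping after an unconstrained least-squares step, which is simpler than a true non-negative least-squares solver but not equivalent to it. MCR solutions are in general not unique (rotational ambiguity), so with real data the recovered profiles depend on the initial guess and on the constraints applied.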