Detecting Organic Compounds Using Visible Light
Scientists from Universidad de Santiago de Chile and the University of Notre Dame have developed a machine learning-based approach to recognize organic compounds by their refractive index at a particular optical wavelength. The method could automate chemical analysis and make it cheaper, safer, and less dependent on specialized expertise, which suggests uses in both research and industry.
Building the organic compound identification system
The scientists published their work in a paper titled “Machine learning identification of organic compounds using visible light” in The Journal of Physical Chemistry A. In it, they describe how they assembled a distinctive dataset and used it to build a prototype organic chemistry sensor.
The scientists trained machine learning models on a publicly accessible database of optical experiments that compiles published data from the scientific literature dating back to 1940. The database contained all the parameters needed to build identification profiles for 61 organic molecules, including group velocity and dispersion, measurement wavelength range, sample state, refractive indices, and extinction coefficients over a broad range of wavelengths. In total, they used 194,816 spectral records of refractive index and extinction curves for the 61 organic compounds and polymers in the database.
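The database lists group velocity and dispersion alongside the refractive index curves. These quantities are linked: the group index is n_g = n - λ·dn/dλ, and the group velocity is c/n_g. The minimal sketch below estimates n_g from a sampled refractive index curve using central finite differences. The function names and the Cauchy-style test curve are hypothetical illustrations, not taken from the study.

```python
# Sketch: estimate the group index n_g = n - λ·dn/dλ from a sampled refractive index curve.
# Assumptions (illustrative only): wavelengths are in nanometres and sorted ascending.

C_M_PER_S = 299_792_458.0  # speed of light in vacuum, m/s


def group_index(wavelengths, n_values):
    """Return n_g at each interior sample, using central finite differences for dn/dλ."""
    if len(wavelengths) != len(n_values) or len(wavelengths) < 3:
        raise ValueError("need at least three matching (wavelength, n) samples")
    result = []
    for i in range(1, len(wavelengths) - 1):
        dn = n_values[i + 1] - n_values[i - 1]
        dlam = wavelengths[i + 1] - wavelengths[i - 1]
        # λ·dn/dλ is dimensionless, so the wavelength unit cancels out.
        result.append(n_values[i] - wavelengths[i] * dn / dlam)
    return result


def group_velocity(n_g):
    """Group velocity in m/s for a given group index."""
    return C_M_PER_S / n_g
```

For a normally dispersive material (n falling with wavelength), dn/dλ is negative, so n_g exceeds n and light pulses travel more slowly than the phase velocity c/n would suggest.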
In a standard infrared (IR) or Raman molecular classification sensor, a molecule is identified by analyzing its absorption or scattering peaks, which form a unique fingerprint that can be matched against a database. By contrast, the static refractive index of an organic compound is a single value that encodes far less information. Likewise, refractive index data at individual wavelengths away from the ultraviolet and infrared absorption resonances carry too little information on their own, which may explain why visible light has not generally been used to classify organic molecules.
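To see why a single refractive index value is a weak fingerprint, consider matching one measured value against a lookup table. The sketch below uses approximate textbook refractive indices at the sodium D line (about 589 nm, near room temperature) purely for illustration; these are not data from the study, and the `candidates` helper is hypothetical.

```python
# Sketch: match one measured refractive index against a small reference table.
# The n_D values are approximate textbook figures used only for illustration.

REFERENCE_N_D = {
    "water": 1.333,
    "methanol": 1.329,
    "ethanol": 1.361,
    "acetone": 1.359,
}


def candidates(measured_n, tolerance=0.005, table=REFERENCE_N_D):
    """Return every compound whose reference index lies within `tolerance`, closest first."""
    hits = [
        (abs(n - measured_n), name)
        for name, n in table.items()
        if abs(n - measured_n) <= tolerance
    ]
    return [name for _, name in sorted(hits)]
```

A reading of 1.360 falls within tolerance of both ethanol and acetone, and 1.331 matches both water and methanol. One number at one wavelength rarely pins down a molecule, which is why richer features are needed.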
The initial tests
Initial tests on the raw data gave an accuracy of 80%, and the scientists set out to improve on it. They found that the original database was not optimized for machine learning, since much of the information came from research conducted before the advent of home computers. The database also held a large volume of data at ultraviolet and infrared wavelengths, and the AI was being trained on all of it rather than on the visible range alone. As a result, the researchers decided to take a more targeted approach.
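A targeted approach starts by restricting the data to the spectral window of interest. The sketch below assumes a hypothetical record layout (compound, wavelength in nm, n, k) and takes the visible band as 380 to 750 nm; the study's exact cut-offs and data schema are not stated in this article.

```python
# Sketch: keep only spectral records whose wavelength lies in the visible band,
# and report how much of each compound's data survives the filter.
# Assumptions: the Record layout and the 380-750 nm band are illustrative choices.

from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    compound: str
    wavelength_nm: float
    n: float  # real part of the refractive index
    k: float  # extinction coefficient


VISIBLE_NM = (380.0, 750.0)


def visible_only(records, band=VISIBLE_NM):
    """Return the records whose wavelength falls inside `band` (inclusive)."""
    lo, hi = band
    return [r for r in records if lo <= r.wavelength_nm <= hi]


def visible_fraction_by_compound(records, band=VISIBLE_NM):
    """Fraction of each compound's records that lie inside `band`."""
    total = Counter(r.compound for r in records)
    kept = Counter(r.compound for r in visible_only(records, band))
    return {c: kept[c] / total[c] for c in total}
```

Reporting the surviving fraction per compound makes it easy to spot molecules whose records sit mostly in the ultraviolet or infrared, which is exactly the imbalance the next preprocessing step has to address.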
The scientists applied several data preprocessing techniques to create better training conditions for the AI. Their goal was a balanced dataset that would keep the AI from favoring particular features simply because more data were available for them. To reduce the influence of IR wavelengths on the overall dataset, they used oversampling, undersampling, and physics-based augmentation. Trained on this balanced, preprocessed data, the classifier reached molecular classification test accuracies above 98% in the visible region.
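The article names oversampling, undersampling, and physics-based augmentation but gives no details. The sketch below shows one plausible reading under stated assumptions: per-compound samples are balanced to a fixed target by plain random resampling, and "physics-based" augmentation is modelled by fitting a two-term Cauchy dispersion law, n = A + B/λ², then drawing synthetic points from the fitted curve. All function names are hypothetical, and the published pipeline may use different models.

```python
# Sketch: balance per-compound sample counts, then add physics-based synthetic points.
# Assumptions (not from the study): samples are (wavelength_nm, n) pairs grouped by
# compound; balancing is random over/undersampling to a fixed target; augmentation
# uses a two-term Cauchy model n = A + B / λ².

import random


def rebalance(groups, target, rng):
    """Undersample compounds above `target`, oversample (with replacement) those below it."""
    balanced = {}
    for compound, samples in groups.items():
        if not samples:
            raise ValueError(f"no samples for {compound!r}")
        if len(samples) >= target:
            balanced[compound] = rng.sample(samples, target)  # without replacement
        else:
            extra = rng.choices(samples, k=target - len(samples))  # with replacement
            balanced[compound] = list(samples) + extra
    return balanced


def fit_cauchy(samples):
    """Ordinary least-squares fit of n = A + B*x with x = 1/λ²; returns (A, B)."""
    xs = [1.0 / (w * w) for w, _ in samples]
    ys = [n for _, n in samples]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)  # needs at least two distinct wavelengths
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def cauchy_augment(samples, count, band, rng, noise=0.0):
    """Draw `count` synthetic (λ, n) points from the fitted Cauchy curve inside `band`."""
    a, b = fit_cauchy(samples)
    lo, hi = band
    out = []
    for _ in range(count):
        w = rng.uniform(lo, hi)
        jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
        out.append((w, a + b / (w * w) + jitter))
    return out
```

Drawing synthetic points from a physical dispersion model, rather than copying existing ones, keeps augmented data consistent with how refractive index actually varies with wavelength in the visible range.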
The researchers say more studies are needed
The scientists acknowledge that further research is needed to broaden and generalize the classifier so that it can recognize the structural and chemical properties of the molecules in the Refractive Index Database. They conclude that their work provides a promising foundation for building remote chemical sensors.
Read the original article on Phys.org