Chemical engineers constantly need reliable property data for process design, development, and optimization. This information comes predominantly from scientific publications. Thousands of papers are published in this field every year, and the publication rate is steadily increasing. The large number of data sources, the wide variety of industrially important chemicals and associated properties, and the absence of commonly accepted machine-readable data formats mean that tremendous human effort is required to acquire the desired information from scientific papers, analyze the quality and reliability of the reported properties, and deliver the data in a well-structured form to the final industrial users. Recent advances in Artificial Intelligence (AI), specifically in Natural Language Processing (NLP) and Machine Learning (ML), are promising for developing large-scale automated processing of scientific publications that extracts property information and all associated metadata with minimal human oversight. To build this complex system, AI techniques can be applied at various stages of the process: search and classification of papers containing relevant information, recognition and processing of tables, extraction of property data from plots, and analysis of paper content and extraction of metadata (substances, description of their samples, experimental methods, uncertainties, etc.). This processing system can also be expanded to assist reviewers in their assessment of scientific manuscripts, a capability that is currently in extremely high demand.
The Thermodynamics Research Center (TRC) at NIST collects, stores, and evaluates published experimental property data from the open literature as part of its activities: specifically, thermophysical and thermochemical property data for pure compounds, binary and ternary mixtures, and chemical reactions. The TRC database, which is used by thousands of researchers and chemical engineers worldwide, currently contains more than 7 million data points for more than 25,000 compounds and 80,000 mixtures. Together with the source documents, it can serve as a training resource for developing AI methods that address the needs indicated above. Initial work on classifying scientific publications by relevance has been done at TRC. A successful applicant is expected to have a strong background in computer science, particularly in AI, NLP, and ML. No specific knowledge of chemical engineering, thermodynamics, or thermophysics is required.
Artificial Intelligence; Natural Language Processing; Machine Learning; Thermophysical and Thermodynamic Property Data