
Text-and-Data-Mining (TDM) Thieme Chemistry content

Success in digital chemistry today depends on data quality. Thieme and its curated knowledge database Science of Synthesis (SoS) provide chemical reaction and structure data in organic synthetic chemistry at an unprecedented level of accuracy. Research organizations, academic institutions and companies are using machine-learning (ML) and artificial intelligence (AI) techniques to explore scientific data, train algorithms and create knowledge from datasets as part of text-and-data mining (TDM). Thieme Chemistry can support you in this task by providing highly standardized and structured organic synthesis information, including XML, .cdx and SDF/RDF files.

Science of Synthesis/Thieme data prove their accelerating potential in a fruitful collaboration with IBM Research

In 2018 IBM launched the RXN for Chemistry cloud platform to help synthetic organic chemists predict the outcome of chemical reactions using an artificial intelligence (AI) model called the Molecular Transformer. Earlier in 2021 IBM Research and Thieme Chemistry incorporated expert synthesis data from Science of Synthesis, Thieme’s curated digital publication source on organic chemistry, into RXN for Chemistry.

Initial results show that, when tested on Science of Synthesis chemistry, Thieme-trained models perform approximately three times better than baseline models in forward prediction and nine times better in retrosynthesis.*

Thieme Chemistry content including Science of Synthesis datasets in key figures

Science of Synthesis contains approximately:

  • 440,500 reactions
  • 2,147,795 molecules
  • 55,000 experimental procedures
  • 2,200 articles
  • 86,000 printed pages covering synthetic methods in organic and organometallic chemistry

Boost results of ML and AI Projects with Thieme Chemistry content

Unleash the full potential of high-quality, curated knowledge databases such as Science of Synthesis by applying ML and AI techniques across a wide range of applications:

  • Evidence-based research
  • The evaluation of reactions
  • Synthesis design
  • Drug design
  • Pattern recognition
  • Pattern analysis
  • Modelling
  • Substructure and similarity searches
  • Discovery and innovation
  • New insights/knowledge

Dr. Teodoro Laino

“The ultimate quality of the data used in model training will determine the future adoption of AI tools in chemical synthesis. Integrating high-quality, curated data from Science of Synthesis provides a once-in-a-lifetime opportunity to boost the performance of RXN for chemistry to unprecedented levels while also unleashing the entire knowledge value contained in hundreds of thousands of high-quality chemical reaction records.”

Dr. Teodoro Laino, IBM Research Europe, Switzerland

Prof. Margaret Brimble

“This innovative SOS/IBM platform provides an efficient tool for synthetic chemistry researchers to provide validation for their own retrosynthetic plans whilst also being presented with alternative solutions. The platform enables a rigorous assessment for the retrosynthetic design phase of a given synthesis which no doubt will pay dividends when the selected synthetic plan is implemented.”

Dame Margaret Brimble, Professor and Director of Medicinal Chemistry at the University of Auckland, New Zealand

Prof. Richmond Sarpong

“It is great to work with IBM/Thieme to refine this important advance in using the vast data that has been gathered over the ages and machine learning tools to streamline the chemical synthesis of complex molecules. A sustainable future for synthesis will include minimizing the number of unproductive strategies that are pursued by running only those reactions that lead to a productive end. This is only possible through the marrying of computer designed and human designed efforts, which makes this collaboration exciting.”

Richmond Sarpong, Professor of Chemistry at the University of California, Berkeley

How Thieme Chemistry content adds to your success

By applying TDM techniques or training AI models on Thieme Chemistry content, you could benefit in many ways. Thieme’s cooperation with IBM RXN provides evidence for the following Science of Synthesis characteristics*:

  • Inspiring: Find greater diversity in reaction coverage than in patent data.
  • Reliable: Science of Synthesis data show the most reliable synthetic transformations available. The data have been curated by expert chemists over a 20-year period.
  • Comprehensive: Science of Synthesis data cover yields and conditions, reactants, products, reagents, and catalysts. Detailed and proven experimental procedures are also available.
  • Consistent: Science of Synthesis data show exceptionally consistent quality and structure because of high-quality, comprehensive scientific editing (use of chemistry nomenclature, detailed reaction schemes including solvents and catalysts across all records).
  • Improved results: AI models retrained on Thieme data give better results when evaluated by top academic natural-product research groups and retrosynthesis experts worldwide.
  • Exclusive: The unique Science of Synthesis dataset is not available in the public domain.
  • Expert: Profit from over 20 years of work in the compilation of synthetic methods by over 2,000 expert authors worldwide.*


Powering Molecular Transformers with High Quality Data: IBM RXN for Chemistry meets Science of Synthesis and Synfacts from Thieme


IBM Research Europe

The results from the IBM Research Europe and Thieme Chemistry collaboration, which connected information from Science of Synthesis and Synfacts to IBM’s Molecular Transformer AI model, showed an increase in chemical reaction prediction accuracy:

Thieme trains IBM RXN for Chemistry with high-quality data


Train your algorithm and advance your company's or institute's research!

Interested in using authoritative Science of Synthesis data?
Please contact:
