
OUR DATA

Where AI Meets Expertise: Uniting Cutting-Edge Large Language Models with Human Expertise for Unparalleled Diversity and Reliability of Data

More than 7,500 scientific papers have been manually reviewed by our team of experts to build our in vitro and in vivo micronucleus datasets!

Regular Updates and Advanced Data Mining

Our databases are systematically updated with data drawn from a comprehensive range of open and private sources.

The chemical space covers pharmaceuticals, industrial chemicals, pesticides, biocides, flavoring agents, food additives, cosmetic ingredients, natural products, and other chemicals used across diverse R&D fields.

We use a customized large language model to meticulously extract relevant information from more than 35 million scientific papers, substantially increasing the size and diversity of our databases.

 

Data quality

Data quality plays a pivotal role in developing robust and reliable QSAR models. All our datasets are manually reviewed and normalized by experts in accordance with the Klimisch criteria. The Klimisch criteria are based on a scoring system designed to evaluate data reliability. The system was originally developed for toxicological and ecotoxicological studies, was later extended to physicochemical studies, and is now recognized and accepted by numerous regulatory authorities and organizations.

After manual expert review, data for endpoints or properties described in regulatory guidelines and generated with standardized test procedures receive a score of 1, placing them in the 'reliable without restriction' category. Score 1 is reserved for data that align with or are comparable to guideline studies, preferably conducted under GLP, whose test procedures follow national standard methods and generally accepted scientific standards and are described in detail. Data receive a score of 2 and are categorized as 'reliable with restrictions' when they come from non-GLP studies conducted comparably to guideline studies with acceptable restrictions, from studies that followed national standard methods with acceptable restrictions, from well-documented studies that adhere to generally accepted scientific principles and are acceptable for assessment, or from accepted calculation methods.
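As a rough illustration of how these two retained categories might be encoded in a curation pipeline, the sketch below assigns a simplified Klimisch score from a handful of review flags and keeps only records scored 1 or 2. The `StudyRecord` fields, the `klimisch_score` and `retain_reliable` functions, and the rule order are hypothetical simplifications, not our actual review procedure; in particular, the sketch treats GLP as required for score 1, whereas the criteria above only state that GLP is preferred. Categories 3 ('not reliable') and 4 ('not assignable') come from the original Klimisch scheme and are included so that rejected records still receive a label.

```python
from dataclasses import dataclass
from enum import IntEnum


class Klimisch(IntEnum):
    """The four reliability categories of the Klimisch scheme."""
    RELIABLE_WITHOUT_RESTRICTION = 1
    RELIABLE_WITH_RESTRICTIONS = 2
    NOT_RELIABLE = 3
    NOT_ASSIGNABLE = 4


@dataclass
class StudyRecord:
    """Hypothetical summary of an expert's review of one study."""
    compound_id: str
    documented: bool = False               # enough detail to judge the study at all
    guideline: bool = False                # follows, or is closely comparable to, a test guideline
    glp: bool = False                      # conducted under Good Laboratory Practice
    acceptable_restrictions: bool = False  # deviations from a guideline or national method are acceptable
    scientifically_sound: bool = False     # follows generally accepted scientific principles
    accepted_calculation: bool = False     # value obtained with an accepted calculation method


def klimisch_score(r: StudyRecord) -> Klimisch:
    """Assign a simplified Klimisch score from the review flags."""
    if not r.documented:
        return Klimisch.NOT_ASSIGNABLE
    if r.guideline and r.glp:
        return Klimisch.RELIABLE_WITHOUT_RESTRICTION
    if (r.guideline or r.acceptable_restrictions
            or r.scientifically_sound or r.accepted_calculation):
        return Klimisch.RELIABLE_WITH_RESTRICTIONS
    return Klimisch.NOT_RELIABLE


def retain_reliable(records: list) -> list:
    """Keep only records scored 1 or 2."""
    return [r for r in records
            if klimisch_score(r) <= Klimisch.RELIABLE_WITH_RESTRICTIONS]


records = [
    StudyRecord("A", documented=True, guideline=True, glp=True),  # score 1
    StudyRecord("B", documented=True, guideline=True),            # non-GLP guideline study: score 2
    StudyRecord("C"),                                             # undocumented: score 4
    StudyRecord("D", documented=True),                            # no acceptable basis: score 3
]
print([r.compound_id for r in retain_reliable(records)])
```

Using an `IntEnum` lets the retention rule be a single ordered comparison, so tightening the filter to score 1 only would be a one-line change.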

Aqueous solubility (logS): 98,450
Mutagenicity (Ames test): 9,166
Genotoxicity (in vitro micronucleus assay): 981
Genotoxicity (in vivo micronucleus assay): 1,222
Carcinogenicity (rat): 1,481
Carcinogenicity (mouse): 1,298
Hepatotoxicity (DILI): 1,283
Neurotoxicity (AChE inhibition): 4,785
Nephrotoxicity: 964
Human intestinal absorption: 883
Human intestinal permeability (Caco-2): 6,146
Plasma protein binding: 4,816
Blood-brain barrier permeability: 3,941
Acute oral toxicity: 13,727
Drug-induced nephrotoxicity: 964
Carcinogenicity (rodent): 1,778
Developmental toxicity: forthcoming
CYP1A2 substrates: 1,187
CYP2C9 substrates: 1,281
CYP2C19 substrates: 1,235
CYP2D6 substrates: 1,317
CYP3A4 substrates: 1,889
CYP1A2 inhibitors: 11,196
CYP2C9 inhibitors: 11,044
CYP2C19 inhibitors: 11,849
CYP2D6 inhibitors: 12,169
CYP3A4 inhibitors: 15,238
Plasma clearance: forthcoming
Microsomal stability: forthcoming
Renal clearance: forthcoming
Estrogen receptor: forthcoming
Androgen receptor: forthcoming
Thyroid toxicity: forthcoming
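The page does not spell out what normalization involves for each endpoint. For aqueous solubility, one common step is converting values reported in mass units into logS, the base-10 logarithm of the molar solubility: logS = log10(S / MW), with S in g/L and MW in g/mol. The sketch below is a minimal illustration under that assumption; the function name `to_log_s` and the set of supported unit strings are hypothetical choices, not a description of our actual curation workflow.

```python
import math
from typing import Optional

# Hypothetical unit set: factors that convert a reported concentration to g/L.
TO_GRAMS_PER_LITRE = {"g/L": 1.0, "mg/mL": 1.0, "mg/L": 1e-3, "ug/mL": 1e-3}


def to_log_s(value: float, unit: str, mol_weight: Optional[float] = None) -> float:
    """Return logS = log10(solubility in mol/L) for one reported measurement.

    mol_weight is the molecular weight in g/mol; it is needed for mass units.
    """
    if value <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    if unit == "mol/L":
        molar = value
    elif unit in TO_GRAMS_PER_LITRE:
        if mol_weight is None or mol_weight <= 0:
            raise ValueError(f"a positive molecular weight is required for {unit}")
        # g/L divided by g/mol gives mol/L
        molar = value * TO_GRAMS_PER_LITRE[unit] / mol_weight
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return math.log10(molar)


# Hypothetical measurement: 1000 mg/L of a compound with MW 100 g/mol is 0.01 mol/L.
print(to_log_s(1000, "mg/L", mol_weight=100.0))
```

Keeping the conversion table explicit means an unfamiliar unit string raises an error instead of being silently misconverted, which matters when records come from many heterogeneous sources.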
