{"id":21346,"date":"2024-05-20T12:08:48","date_gmt":"2024-05-20T12:08:48","guid":{"rendered":"https:\/\/clinlabint.com\/?p=21346"},"modified":"2024-08-22T08:56:36","modified_gmt":"2024-08-22T08:56:36","slug":"new-ai-language-model-nach0-bridges-biomedical-text-and-chemical-data","status":"publish","type":"post","link":"https:\/\/clinlabint.com\/new-ai-language-model-nach0-bridges-biomedical-text-and-chemical-data\/","title":{"rendered":"New AI language model nach0 bridges biomedical text and chemical data"},"content":{"rendered":"
\n

<\/p>\n<\/div><\/section><\/div>

<\/p>\n<\/div><\/section>
\n

New AI language model nach0 bridges biomedical text and chemical data<\/h1>AI<\/a><\/span>, Insilico Medicine<\/a><\/span>, LLM<\/a><\/span>, molecular generation<\/a><\/span>, nach0<\/a><\/span>, E-News<\/a>, Molecular Diagnostics<\/a> <\/span><\/span><\/header>\n<\/div><\/section>
\n

Researchers from Insilico Medicine and NVIDIA have developed a novel large language model (LLM) called nach0 that can understand and generate both biomedical text and chemical structural data.<\/h3>\n

<\/p>\n

\"nach0<\/a>

Novel nach0 AI can understand and generate both biomedical text and chemical structural data<\/p><\/div>\n

This multi-domain, multi-task transformer was trained on a massive dataset combining PubMed abstracts, patent descriptions, and simplified molecular-input line-entry system (SMILES) representations of chemical structures.<\/p>\n

While existing biomedical LLMs such as BioBERT and SciFive excel at natural language processing of biomedical text, they lack integrated abilities to work with chemical structure data. Conversely, chemically aware models to date have had limited training on diverse biomedical text sources. Nach0 is the first LLM designed from the ground up to operate fluently across both domains.<\/p>\n

The model was trained on 100 million biomedical documents from PubMed and patent sources totalling 355 million text tokens, as well as 2.9 billion patent SMILES strings converted into 4.7 billion chemical tokens. Special annotation symbols were used to encode the SMILES representations.<\/p>\n

What tasks can nach0 perform?<\/strong><\/h2>\n

Nach0 can perform a range of tasks: natural language processing tasks such as document classification and question answering; chemistry tasks such as molecular property prediction, molecular generation, and reagent prediction; and cross-domain tasks such as description-guided molecular design.<\/p>\n

In benchmark evaluations, nach0 significantly outperformed general-purpose LLMs such as ChatGPT on molecular tasks, while delivering competitive performance on biomedical text processing compared with models such as FLAN and SciFive.<\/p>\n

Automating molecular discovery<\/strong><\/h2>\n

Case studies demonstrated nach0\u2019s ability to generate molecular structures for potential diabetes drugs from prompts describing the desired biological activity, mechanism of action, synthesis route, and properties. The model produced chemically sensible molecules satisfying these criteria within minutes.<\/p>\n

\u201cNach0 represents a major advance in automating molecular discovery and design through natural language interaction,\u201d said Alex Zhavoronkov, CEO of Insilico Medicine. \u201cWe envision further enhancing it with protein sequence data and using transfer learning to specialize for new applications.\u201d<\/p>\n

The model leverages the NVIDIA BioNeMo<\/a> platform, taking advantage of its data loading, natural language processing, and generative AI capabilities optimized for biology and chemistry workloads.<\/p>\n

As models like nach0 continue to evolve, they may provide powerful molecular design assistance while reducing the need for extensive human supervision compared to traditional computational chemistry methods.<\/p>\n

\u201cWe anticipate that as nach0 evolves, it will require less supervision, and it will be able to simply generate and validate promising therapeutic options for medicinal chemists,\u201d said Maksim Kuznetsov, a senior research scientist at Insilico and one of the paper\u2019s lead authors.<\/p>\n

The nach0 framework is available for research purposes:<\/h4>\n

nach0 base is available via: https:\/\/huggingface.co\/insilicomedicine\/nach0_base<\/a>;
\nnach0 large is available via:
https:\/\/huggingface.co\/insilicomedicine\/nach0_large<\/a>;
\nfor pre-processing scripts, see:
https:\/\/github.com\/insilicomedicine\/nach0<\/a>.<\/p>\n