Dr. Marc Nicklaus

Computer-Aided Drug Design

Research Summary

Computer-Aided Drug Design. The Computer-Aided Drug Design (CADD) Group is a research unit within the Chemical Biology Laboratory (CBL) that employs, analyzes, and develops computer-based methods to aid the drug discovery, design, and development projects of the CBL and other researchers at the NIH. We split our efforts about evenly between support-type projects and research projects initiated and conducted by CADD staff members. We implement many projects, and make available resources developed by the CADD Group, in a Web-based manner. This offers three advantages: (1) it frees all users, including the group members themselves, from platform constraints and the concomitant expenses for specific software and hardware; (2) it makes resources and results immediately available for sharing among all collaborators regardless of their location; and (3) it helps, without additional effort, to further the mission of the NCI as a publicly funded institution by providing data and services directly to the (scientific) public. The following research areas and projects form part of the portfolio of the CADD Group.

Synthetically Accessible Virtual Inventory (SAVI). Aggregated libraries on the order of 100 million on-the-shelf compounds are commercially available as screening samples for in silico screening in computer-aided drug design. Still, this represents only a microscopically small fraction of the drug-like small-molecule space, estimated to comprise on the order of 10^60 possible structures or even more. To tap into this huge chemical space, we have created the SAVI database of 1.75 billion compounds predicted to be easily synthesizable. They were generated by a set of transforms based on an adaptation and extension of the CHMTRN/PATRAN programming languages, which describe chemical synthesis expert knowledge and originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks from Enamine. Only single-step, two-reactant syntheses were calculated for this database, even though the technology can execute multi-step reactions. The ability to incorporate scoring systems in CHMTRN allowed us to subdivide the database into sets according to predicted synthesizability, with the most-synthesizable class comprising 1.09 billion synthetic products. Properties calculated for all SAVI products show that the database should be well suited for drug discovery. It has been made available for free download.
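
The enumeration idea behind such a database can be illustrated with a toy sketch. This is not CHMTRN/PATRAN or CACTVS code: building blocks are reduced to hypothetical functional-group labels, and each hypothetical transform simply names the two groups it joins. The names `BuildingBlock`, `Transform`, and `enumerate_products`, as well as the example reactions, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BuildingBlock:
    name: str
    groups: frozenset  # functional-group labels present in the block


@dataclass(frozen=True)
class Transform:
    name: str
    group_a: str  # group required on the first reactant
    group_b: str  # group required on the second reactant


def enumerate_products(blocks, transforms):
    """Return (transform, reactant A, reactant B) for every single-step,
    two-reactant combination whose required groups are present."""
    results = []
    for t in transforms:
        firsts = [b for b in blocks if t.group_a in b.groups]
        seconds = [b for b in blocks if t.group_b in b.groups]
        for a, b in product(firsts, seconds):
            if a != b:  # a block is never paired with itself
                results.append((t.name, a.name, b.name))
    return results


# Hypothetical toy data, far smaller than the real building-block set.
blocks = [
    BuildingBlock("acid_1", frozenset({"carboxylic_acid"})),
    BuildingBlock("amine_1", frozenset({"primary_amine"})),
    BuildingBlock("amine_2", frozenset({"primary_amine"})),
    BuildingBlock("aryl_bromide_1", frozenset({"aryl_halide"})),
    BuildingBlock("boronic_acid_1", frozenset({"boronic_acid"})),
]
transforms = [
    Transform("amide_coupling", "carboxylic_acid", "primary_amine"),
    Transform("suzuki_coupling", "aryl_halide", "boronic_acid"),
]

products = enumerate_products(blocks, transforms)
for entry in products:
    print(entry)
```

In a real workflow each transform would operate on full chemical structures and carry a synthesizability score; the sketch only shows why a modest number of transforms and building blocks can yield a very large product set.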

Tautomerism. The CADD Group has been doing significant research on tautomerism, the existence of multiple possible forms of the same molecule that are capable of interconverting via an intramolecular movement of atoms and reconfiguration of bonds. We have collected about 90 different transforms of tautomeric interconversions, comprising prototropic, ring-chain, and valence tautomerism. The majority of these rules were extracted from experimental literature. A web tool has been created for users to apply these rules to their molecules. We have analyzed these rules against an aggregated set of over 400 million (non-unique) structures with respect to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by the InChI chemical identifier. These results form the scientific background of the IUPAC InChI project tasked with redesigning the handling of tautomerism for InChI version 2.
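
As a rough illustration of what analyzing rules for occurrence rates and mutual overlap can mean, the sketch below uses made-up data: each hypothetical rule is mapped to the set of structure IDs it matches. The rule names, the IDs, and the choice of the Jaccard index as the overlap measure are assumptions for illustration, not the group's published method.

```python
from itertools import combinations


def occurrence_rates(hits, total):
    """Fraction of all structures matched by each rule."""
    return {rule: len(ids) / total for rule, ids in hits.items()}


def pairwise_overlap(hits):
    """Jaccard index of the matched-structure sets for every pair of rules."""
    overlap = {}
    for a, b in combinations(sorted(hits), 2):
        union = hits[a] | hits[b]
        overlap[(a, b)] = len(hits[a] & hits[b]) / len(union) if union else 0.0
    return overlap


# Hypothetical data: rule name -> IDs of structures the rule applies to.
hits = {
    "keto_enol": {1, 2, 3, 4},
    "imine_enamine": {3, 4, 5},
    "ring_chain": {6},
}
total_structures = 10

rates = occurrence_rates(hits, total_structures)
overlaps = pairwise_overlap(hits)
print(rates)
print(overlaps)
```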

Chemical Identifier Resolver (CIR). CIR works as a resolver for many different chemical structure identifiers (e.g., chemical names, InChI, SMILES) and allows one to convert a given structure identifier into a full structure representation or into another structure identifier, including references to particular databases in which the corresponding structure or structure identifier occurs. CIR offers a simple-to-use application programming interface (API) based on URLs requested over HTTP. This allows easy linking of CIR and its content to other scientific web services and program packages. CIR currently provides access to 120 million structure records.
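
A minimal sketch of how such a URL-based request might be built follows. The URL pattern `https://cactus.nci.nih.gov/chemical/structure/<identifier>/<representation>` and representation names such as `smiles` and `stdinchikey` come from CIR's public documentation rather than from this page, so treat them as assumptions to verify; `cir_url` and `resolve` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public CIR service; verify against current CIR documentation.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR request URL for a structure identifier and a target representation."""
    # Percent-encode both parts so characters such as '#', '=', or '/' in SMILES
    # or InChI strings cannot break the URL path.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


def resolve(identifier, representation, timeout=10.0):
    """Fetch the resolved representation as text (requires network access)."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Only URLs are printed here; call resolve(...) to contact the service.
    print(cir_url("aspirin", "smiles"))
    print(cir_url("c1ccccc1", "stdinchikey"))
```

Because the whole request is a plain URL, the same call can be made from a browser, a shell script, or any other HTTP client.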

Enhanced NCI Database Browser. The Enhanced NCI Database Browser can be used to search the 250,000-compound Open NCI Database. This dataset is the publicly available part of the half-million structure collection assembled by the NCI's Developmental Therapeutics Program during the program's 50+ years of screening compounds against cancer and, more recently, AIDS. Visit the CADD Group's home page or the Enhanced NCI Database Browser service for more information.

Fundamentals of Protein-Ligand Interactions. The non-covalent binding of a drug to the binding site of an enzyme (or other biomacromolecule) is the fundamental process of most drug actions. In spite of the vast body of experimental data available on protein-ligand complexes, mostly obtained by X-ray crystallography, there are still open questions about how this binding process occurs at the atomic and quantitative energetic level. One of the issues is the range of conformational energies one can expect to find for small-molecule ligands bound to proteins, which we found to be higher than generally assumed. This has led us to broader questions regarding X-ray crystallographic methodologies, such as whether quantum-mechanical refinement (or re-refinement) of protein-ligand structures may improve structural quality in various ways.

HIV Integrase. A long-standing interest of our group has been HIV integrase (IN) as a drug development target. This enzyme catalyzes the integration of the viral DNA into the human DNA, which is an essential step in the viral replication cycle. Only a handful of approved drugs so far are based on IN inhibition. We have been utilizing all available experimental results, be they structural, mechanistic, or biochemical, to model and better understand inhibition of IN by small molecules.

Among our main collaborators are Wolf-Dietrich Ihlenfeldt, Xemistry, Germany; Vladimir Poroikov, Russian Academy of Medical Sciences, Moscow; Philip Judson, Lhasa Ltd.; and Raul Cachau, Leidos, FNLCR.

1)  Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV.
Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods.
J Chem Inf Model. 61: 653-663, 2021.

2)  Dhaked Devendra K, Guasch Laura, Nicklaus Marc C.
Tautomer Database: A Comprehensive Resource for Tautomerism Analyses.
J Chem Inf Model. 60: 1090-1100, 2020.

3)  Dhaked Devendra K, Ihlenfeldt Wolf-Dietrich, Patel Hitesh, Delannée Victorien, Nicklaus Marc C.
Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2.
J Chem Inf Model. 60: 1253-1275, 2020.

4)  Judson Philip N, Ihlenfeldt Wolf-Dietrich, Patel Hitesh, Delannée Victorien, Tarasova Nadya, Nicklaus Marc C.
Adapting CHMTRN (CHeMistry TRaNslator) for a New Use.
J Chem Inf Model. 60: 3336-3341, 2020.

5)  Stolbov Leonid, Druzhilovskiy Dmitry, Rudik Anastasia, Filimonov Dmitry, Poroikov Vladimir, Nicklaus Marc.
AntiHIV-Pred: web-resource for in silico prediction of anti-HIV/AIDS activity.
Bioinformatics. 36: 978-979, 2020.