APPLICATIONS OF MACHINE LEARNING FROM CONSTRUCTING THE DATABASE TO THE LEAD DISCOVERY: PRACTICAL APPROACHES ON CANCER-RELATED PROTEINS

Said MOSHAWIH; Long Chaiu MING; Nurolaini KIFLI; Hui Poh GOH

doi:10.29228/jrp.552

Editor-in-Chief: Hatice Kübra Elçioğlu
Vice Editors: Levent Kabasakal, Esra Tatar
Online ISSN: 2630-6344
Publisher: Marmara University
Frequency: Bimonthly (six issues per year)
Abbreviation: J.Res.Pharm.
Former Name: Marmara Pharmaceutical Journal

Journal of Research in Pharmacy 2023, Vol 27, Issue Supp.

Said MOSHAWIH¹, Long Chaiu MING¹, Nurolaini KIFLI¹, Hui Poh GOH¹

¹PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
²School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia

Drug discovery using advanced computational tools such as machine learning has reduced the time and cost required by conventional drug discovery pipelines by about 40% and 60%, respectively. In this study, we aim to build a combinatorial library of anthraquinone and chalcone derivatives, to produce a workflow of different screening and scoring methodologies for finding hits against cancer-related proteins, and to examine those hits using molecular dynamics and molecular mechanics simulations. A combinatorial library of virtual compounds was generated from 20 anthraquinone and 24 chalcone core structures via R-group enumeration. The resulting compounds were optimized toward near drug-like properties, and physicochemical descriptors were calculated for all datasets and compared with available databases such as the FDA, Non-FDA, and natural products (NPs) datasets from ZINC 15. A novel workflow of virtual screening and scoring methods was optimized based on the nature of the protein target. The optimized enumeration resulted in 1,610,268 compounds with mean NP-likeness and synthetic feasibility scores close to those of the FDA, Non-FDA, and NPs datasets. The cheminformatic analysis showed that the chemical space of the generated library overlapped most prominently with that of NPs, with the lowest molecular diversity compared with other natural and synthetic drug databases. Moreover, the consensus scoring methodology that we produced was based on quantitative structure-activity relationship (QSAR), pharmacophore fitness, shape similarity, and docking scores. The optimized virtual screening for the protein targets proved beneficial in retrospective enrichment studies, as it prioritized a high percentage of true positives (area under the ROC curve > 0.9).
Consensus scoring outperformed each of the conventional screening methods applied individually. This multistage virtual screening method was also found to overcome challenges in the training set, such as a limited number of data points and limited diversity of activity. In molecular mechanics simulations, the activity range of the experimental datasets plays a crucial role in the nature of the correlation between experimental activity values and the binding free energy obtained by MM/GBSA calculations. In conclusion, consensus scoring using the z-score fusion method is a beneficial approach to virtual screening, especially when the training dataset is imbalanced.

Keywords: Cheminformatics; virtual screening; machine learning; drug design; consensus scoring
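
As an illustration of the consensus scoring idea described in the abstract, the following is a minimal sketch of z-score fusion followed by a retrospective ROC check. The abstract does not give the exact fusion formula, so several choices here are assumptions: each method's scores are standardized with the population standard deviation, docking scores are negated so that higher always means better, and the four methods are averaged with equal weights. All compound scores and activity labels are invented toy values, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A method that gives every compound the same score carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def zscore_fusion(score_table, higher_is_better):
    """Average the per-method z-scores of each compound (equal weights assumed).

    score_table maps a method name to a list of raw scores, one per compound.
    higher_is_better maps a method name to True or False; scores where lower
    is better (e.g. docking energies) are sign-flipped before averaging.
    """
    n_compounds = len(next(iter(score_table.values())))
    fused = [0.0] * n_compounds
    for method, scores in score_table.items():
        sign = 1.0 if higher_is_better[method] else -1.0
        for i, z in enumerate(zscores(scores)):
            fused[i] += sign * z
    return [total / len(score_table) for total in fused]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in inactives
    )
    return wins / (len(actives) * len(inactives))


# Hypothetical toy scores for six compounds (not data from the study).
raw = {
    "qsar_pIC50": [7.2, 6.8, 5.1, 4.9, 6.0, 4.5],
    "pharmacophore_fit": [2.4, 1.2, 2.0, 0.7, 1.9, 0.5],
    "shape_similarity": [0.80, 0.75, 0.40, 0.60, 0.50, 0.35],
    "docking_score": [-9.5, -8.9, -6.2, -6.0, -8.0, -5.5],
}
higher_is_better = {
    "qsar_pIC50": True,
    "pharmacophore_fit": True,
    "shape_similarity": True,
    "docking_score": False,  # more negative docking scores are better
}
labels = [1, 1, 0, 0, 1, 0]  # 1 = known active, 0 = decoy or inactive

consensus = zscore_fusion(raw, higher_is_better)
auc = roc_auc(consensus, labels)
print(f"consensus ROC AUC on toy data: {auc:.2f}")
```

Standardizing before averaging puts methods with very different scales (predicted pIC50, fitness values, similarity coefficients, docking energies) on a common footing, so no single method dominates the ranking simply because its raw numbers are larger. In practice, weighted averages or rank-based fusion are common alternatives.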
