How to stop worrying and love multiple citation experimental data

Yaroslav Vladislavovich Timofeev 1, 2
Amir M. Mrasov 1, 2
Maria Vyacheslavovna Panova 1
Fedor Nikolaevich Novikov 1
Igor Svitanko 1
Published 2025-02-18
Communication, Volume 35, Issue 2, pp. 224-227

Cite this

GOST
Timofeev Y. V. et al. How to stop worrying and love multiple citation experimental data // Mendeleev Communications. 2025. Vol. 35. No. 2. pp. 224-227.
GOST (all authors, up to 50)
Timofeev Y. V., Mrasov A. M., Panova M. V., Novikov F. N., Svitanko I. How to stop worrying and love multiple citation experimental data // Mendeleev Communications. 2025. Vol. 35. No. 2. pp. 224-227.

RIS
TY  - JOUR
DO  - 10.71267/mencom.7710
UR  - https://mendcomm.colab.ws/publications/10.71267/mencom.7710
TI  - How to stop worrying and love multiple citation experimental data
T2  - Mendeleev Communications
AU  - Timofeev, Yaroslav Vladislavovich
AU  - Mrasov, Amir M.
AU  - Panova, Maria Vyacheslavovna
AU  - Novikov, Fedor Nikolaevich
AU  - Svitanko, Igor
PY  - 2025
DA  - 2025/02/18
PB  - Mendeleev Communications
SP  - 224
EP  - 227
IS  - 2
VL  - 35
ER  -

BibTeX (up to 50 authors)
@article{2025_Timofeev,
author = {Yaroslav Vladislavovich Timofeev and Amir M. Mrasov and Maria Vyacheslavovna Panova and Fedor Nikolaevich Novikov and Igor Svitanko},
title = {How to stop worrying and love multiple citation experimental data},
journal = {Mendeleev Communications},
year = {2025},
volume = {35},
publisher = {Mendeleev Communications},
month = {Feb},
url = {https://mendcomm.colab.ws/publications/10.71267/mencom.7710},
number = {2},
pages = {224--227},
doi = {10.71267/mencom.7710}
}

MLA
Timofeev, Yaroslav Vladislavovich, et al. “How to stop worrying and love multiple citation experimental data.” Mendeleev Communications, vol. 35, no. 2, Feb. 2025, pp. 224-227. https://mendcomm.colab.ws/publications/10.71267/mencom.7710.

Keywords

biological activity
ChEMBL database
machine learning
MEDLINE
NLP
OOD detection

Abstract

Numerous public databases now collect and disseminate biological activity data from the literature and patents, forming the basis for chemogenomics and for novel scoring functions. However, data quality is often compromised because the same values are cited multiple times across different studies that used varying protocols. To address this issue, we combined an XGBoost model with a BERT-based NLP approach and a distance-based out-of-distribution (OOD) detection method to improve classification accuracy and to exclude review articles.
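The abstract names a distance-based OOD detection step but does not describe it on this page. The following is a minimal sketch of one common form of that idea, not the authors' implementation: assuming each document has already been turned into a fixed-length embedding (for example, by a BERT-style encoder), a new input is scored by its mean distance to the k nearest training embeddings and flagged as OOD when that score exceeds a threshold taken from the training set's own leave-one-out scores. The function names, the Euclidean metric, k, the 95th-percentile threshold and the `classify` callable (a stand-in for any trained classifier, such as an XGBoost model) are all illustrative assumptions.

```python
import math
from typing import Callable, Optional, Sequence, TypeVar

Vector = Sequence[float]
T = TypeVar("T")


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(x: Vector, reference: Sequence[Vector], k: int) -> float:
    """Mean distance from x to its k nearest neighbours in `reference` (brute force)."""
    if len(reference) < k:
        raise ValueError("reference set must contain at least k vectors")
    dists = sorted(euclidean(x, r) for r in reference)
    return sum(dists[:k]) / k


def fit_threshold(train: Sequence[Vector], k: int = 3, quantile: float = 0.95) -> float:
    """Empirical quantile of leave-one-out kNN scores on the training embeddings.

    Leaving each point out avoids a zero self-distance that would
    otherwise pull every training score (and the threshold) down.
    """
    scores = sorted(
        knn_score(x, list(train[:i]) + list(train[i + 1:]), k)
        for i, x in enumerate(train)
    )
    idx = min(len(scores) - 1, max(0, math.ceil(quantile * len(scores)) - 1))
    return scores[idx]


def is_ood(x: Vector, train: Sequence[Vector], threshold: float, k: int = 3) -> bool:
    """Flag x as out-of-distribution if its kNN score exceeds the threshold."""
    return knn_score(x, train, k) > threshold


def gated_predict(
    x: Vector,
    classify: Callable[[Vector], T],
    train: Sequence[Vector],
    threshold: float,
    k: int = 3,
) -> Optional[T]:
    """Run the hypothetical `classify` callable only on in-distribution inputs.

    Returns None for OOD inputs so they can be handled separately
    instead of trusting the classifier's output on them.
    """
    if is_ood(x, train, threshold, k):
        return None
    return classify(x)


if __name__ == "__main__":
    # Toy 2-D "embeddings" clustered near the origin (illustrative data only).
    train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    thr = fit_threshold(train, k=2)
    for query in [(0.05, 0.05), (1.0, 1.0)]:
        print(query, gated_predict(query, lambda v: "classified", train, thr, k=2))
```

Run as a script, the toy example prints `classified` for the query at (0.05, 0.05), which lies inside the training cluster, and `None` for (1.0, 1.0), which is flagged as OOD. The brute-force neighbour search keeps the sketch dependency-free; a real pipeline would more likely use a nearest-neighbour index over the embedding matrix.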
