ADR•X: an interpretable, leakage-aware machine learning framework for sertraline adverse drug reaction signal detection using FAERS pharmacovigilance data

Authors

  • Adarsh Dheeraj Dubey, Department of Bioinformatics, Guru Nanak Khalsa College of Arts, Science and Commerce (Autonomous), Matunga, Mumbai, Maharashtra, India
  • Ranjana Mangesh Parab, Department of Bioinformatics, Guru Nanak Khalsa College of Arts, Science and Commerce (Autonomous), Matunga, Mumbai, Maharashtra, India
  • Sermarani Nadar, Department of Bioinformatics, Guru Nanak Khalsa College of Arts, Science and Commerce (Autonomous), Matunga, Mumbai, Maharashtra, India
  • Gursimran Kaur Uppal, Department of Bioinformatics, Guru Nanak Khalsa College of Arts, Science and Commerce (Autonomous), Matunga, Mumbai, Maharashtra, India

DOI:

https://doi.org/10.18203/2319-2003.ijbcp20261968

Keywords:

Adverse drug reactions, Pharmacovigilance, Sertraline, Machine learning, FAERS, Leakage-aware modelling, SHAP, LightGBM

Abstract

Adverse drug reactions (ADRs) are among the leading causes of preventable patient harm globally. The FDA Adverse Event Reporting System (FAERS) offers the most comprehensive post-marketing safety repository available, yet most published machine learning (ML) studies that use it introduce information leakage by including outcome-derived disproportionality metrics, namely proportional reporting ratios (PRR) and reporting odds ratios (ROR), directly as model features. This inflates performance estimates and undermines real-world generalisability. This study presents ADR•X, a LightGBM-based, leakage-aware framework for detecting sertraline ADR signals from FAERS data. It uses a feature space of approximately 208 variables spanning patient demographics, physicochemical molecular descriptors, pharmacogenomic indicators, biology-guided multi-omics proxy variables, and mechanistic interaction terms, with all PRR-, ROR-, and frequency-derived variables explicitly excluded. Two model configurations were evaluated: an unweighted baseline and an inverse class-frequency-weighted variant. The baseline achieved an AUC-ROC of 0.53–0.54, and the imbalance-adjusted model reached 0.55–0.56. Global SHAP analysis identified dose mg, metabolic overload score, and polypharmacy flag as the three most influential predictors, while the contributions of all remaining features clustered near zero, confirming the absence of leakage-driven dominance. The framework was deployed as a reproducible Streamlit research portal and is intended exclusively for population-level hypothesis generation, not individual clinical risk prediction. The modest AUC values reflect the bounded information content of voluntary reporting systems and represent honest signal estimation rather than model inadequacy. ADR•X demonstrates that biologically plausible and interpretable ADR signal detection is achievable from FAERS data without sacrificing methodological integrity.
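
The two design choices named in the abstract, excluding PRR-, ROR-, and frequency-derived variables and weighting classes by inverse frequency, can be illustrated with a minimal sketch. This is not the authors' implementation: the column names, the name-token matching rule, and the exact weight formula n / (k · n_c) are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical name tokens marking outcome-derived columns. The actual
# ADR•X column names are not given in the abstract.
LEAKY_TOKENS = ("prr", "ror", "freq")


def split_leaky(columns):
    """Partition column names into (kept, excluded) lists.

    A column is excluded when any underscore-separated token of its name
    is a leaky token. Matching whole tokens avoids false hits such as
    the substring "ror" inside "error".
    """
    kept, excluded = [], []
    for name in columns:
        tokens = name.lower().replace("-", "_").split("_")
        if any(token in LEAKY_TOKENS for token in tokens):
            excluded.append(name)
        else:
            kept.append(name)
    return kept, excluded


def inverse_class_frequency_weights(labels):
    """Return {class: n / (k * n_c)}, one common inverse-frequency scheme.

    n is the number of samples, k the number of classes, and n_c the
    number of samples in class c (an assumed formula, see lead-in).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Illustrative inputs only; none of these values come from the study.
columns = ["age", "dose_mg", "polypharmacy_flag",
           "prr_value", "ror_lower_ci", "event_freq", "error_rate"]
kept, excluded = split_leaky(columns)
weights = inverse_class_frequency_weights([0] * 8 + [1] * 2)
```

Under these assumptions the rarer class receives the larger weight, so misclassifying it costs more during training. The resulting weights could be passed to a gradient-boosting learner as per-class or per-sample weights.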

Published

2026-06-23

How to Cite

Dubey, A. D., Parab, R. M., Nadar, S., & Uppal, G. K. (2026). ADR•X: an interpretable, leakage-aware machine learning framework for sertraline adverse drug reaction signal detection using FAERS pharmacovigilance data. International Journal of Basic & Clinical Pharmacology, 15(4), 775–780. https://doi.org/10.18203/2319-2003.ijbcp20261968

Section

Short Communication