Biomarker identification and inhibitor discovery for Marburg virus using machine learning driven virtual screening and molecular simulations
* Corresponding author E-mail address: dharmendra30oct@gmail.com (D K Yadav)
Abstract
Marburg virus (MARV) causes severe hemorrhagic fever with high morbidity, and no approved antiviral therapies exist. The present study identifies differentially expressed genes (DEGs) involved in immune response pathways against MARV, using bioinformatics and molecular modeling to explore new therapeutic targets. The microarray datasets GSE58287 and GSE226148 were analyzed. DEGs were screened and then subjected to functional enrichment analysis. Important regulators were identified through the analysis of protein-protein interaction networks. Inhibitors of these targets were predicted using machine learning, and the screened compounds were further evaluated using Lipinski's Rule and absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiling. Molecular docking and dynamics simulations were used to estimate the binding affinities of candidates for STAT1. A total of 1,179 DEGs were identified, most of them associated with immune responses and antiviral defense mechanisms. Functional enrichment analysis indicated that the DEGs are involved in innate immunity and viral pathways, including influenza A, COVID-19, and hepatitis C. Network analysis revealed STAT1, IRF7, and CXCL10 as central regulators of the immune response. The Random Forest models predicted several potential STAT1 inhibitors, which were supported by molecular docking and MD simulations. This study sheds light on the role of immune-related pathways and regulators in MARV infection. STAT1 was identified as a target, with inhibitors discovered through the use of machine learning, docking, and simulations. These findings provide insights into MARV pathogenesis and support the development of targeted therapies for MARV and related immune disorders.
Keywords
Machine learning
Marburg virus
Microarray analysis
Pathway analysis
STAT1
1. Introduction
Marburg disease is caused by MARV, a member of the Filoviridae family that is closely related to the Ebola virus (Ball et al., 2025). MARV is transmitted to humans primarily through direct exposure to infected animals, their bodily fluids, or surfaces and materials contaminated with the virus. Early symptoms include fever and myalgia, which may progress to severe organ failure and death. The incubation period ranges from 2 to 21 days, during which individuals may remain asymptomatic or exhibit mild signs (Fischer & Munster, 2025; Fletcher, 2025). Identifying DEGs in MARV infection is pivotal for elucidating host responses and advancing targeted therapies. Bioinformatics integrates computational and statistical tools to analyze complex biological data, aiding pathogenesis studies and prioritizing biomarker-driven therapeutic strategies to enhance prognosis, improve diagnostics, and mitigate disease progression and associated complications (Tintelnot et al., 2024). Computer-Aided Drug Design (CADD) has revolutionized pharmaceutical research, expediting drug development while reducing costs. Incorporating machine learning (ML) enhances data processing efficiency. Computational approaches such as ML and molecular docking hold promise for identifying innovative therapeutic candidates with precision and speed (Fischer & Munster, 2025; Boora et al. 2024).
MARV infection can lead to a severe and often fatal hemorrhagic fever, affecting both humans and various animal species, with a fatality rate ranging from 24% to 90%. Recognized as a highly hazardous pathogen, MARV poses significant biosafety risks. While inactivated samples can be handled in lower-level laboratories under strict safety protocols, any research involving live viruses must be conducted within Biosafety Level 4 (BSL-4) laboratories due to their extreme pathogenic potential. Clinically, MARV symptoms often resemble those of malaria, typhoid fever, or other viral hemorrhagic diseases, necessitating specific laboratory-based diagnostic tests for accurate identification. However, global diagnostic capabilities remain limited, especially in resource-constrained regions. At present, there are no approved vaccines or antiviral treatments specifically targeting MARV, although several candidates are in clinical trials. Management of the disease is primarily supportive, aimed at alleviating symptoms (Boora et al. 2024). The high fatality rate, lack of targeted therapies, and limited diagnostic infrastructure make outbreak control in endemic regions particularly challenging.
While vaccines and antiviral treatments exist for some viruses, no approved vaccine or effective antiviral drug currently exists for Marburg virus disease (MVD), leaving patients dependent on supportive care. Researchers are actively pursuing several vaccine development strategies. These include inactivated virus vaccines, where MARV is cultured and rendered non-infectious; live-attenuated vaccines using weakened virus strains; and viral vector platforms that introduce MARV genetic material via harmless viruses to stimulate immunity. Protein-based vaccines produced through recombinant DNA technology are also under investigation. Among the most promising candidates are the cAd3-Marburg vaccine, MVA-BN-Filo, and VRC-MARDNA025-00-VP. Of these, cAd3-Marburg has undergone Phase 1 trials, demonstrating safety and immune response, while MVA-BN-Filo is advancing to later-stage trials (Boora et al. 2024). Until a licensed vaccine becomes available, treatment remains limited to managing symptoms and maintaining hydration, either orally or intravenously, to improve survival outcomes.
In the absence of specific vaccines or antiviral drugs, medicinal plants offer a promising alternative. Ethnomedicine, increasingly favored over synthetic treatments, is regarded as safe and therapeutically diverse. Numerous studies have highlighted the antiviral potential of phytochemicals derived from medicinal herbs. Compounds from plants such as Cistus incanus, Illicium verum, and Andrographis paniculata, as well as Borneol and Semicochliodinol B, have shown potential activity against viruses, including MARV and dengue.
2. Materials and Methods
2.1 Data collection and preprocessing
The keyword MARV was searched in the GEO database (http://www.ncbi.nlm.nih.gov/geo/), and two expression profiling datasets were retrieved under accessions GSE58287 and GSE226148. The GSE58287 dataset was downloaded under platform GPL10332 and comprises 15 normal and 15 affected samples (Saeed et al., 2024; Semancik et al., 2024). The GSE226148 dataset was downloaded under platform GPL21697 and comprises three normal and six affected samples.
2.2 Identification of DEGs in MARV and acquisition of overlapped genes
DEGs between the normal and MARV groups were analyzed independently for datasets GSE58287 and GSE226148 using the Limma package in R. Fold changes (FCs) in gene expression were computed, and DEGs were identified based on a p-value threshold of < 0.05. Genes with a log2 fold change ≥ 1.0 were classified as significantly upregulated, while those with a log2 fold change ≤ -1.0 were deemed significantly downregulated (Heindl et al., 2024).
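The thresholding and overlap steps can be sketched in a few lines of Python. This is only an illustration of the cutoffs stated above, not the Limma workflow itself; the gene labels and values are hypothetical placeholders, and `classify_deg` and `overlap` are names introduced here.

```python
def classify_deg(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant) from the stated cutoffs."""
    if p_value >= p_cutoff:
        return "ns"
    if log2_fc >= fc_cutoff:
        return "up"
    if log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


def overlap(degs_a, degs_b):
    """Genes called as DEGs in both datasets, in sorted order."""
    return sorted(set(degs_a) & set(degs_b))


# Hypothetical rows: (gene label, log2 fold change, p-value)
rows = [
    ("GENE_A", 2.3, 0.001),   # strong increase, significant
    ("GENE_B", -1.4, 0.020),  # strong decrease, significant
    ("GENE_C", 1.8, 0.200),   # large change but p-value too high
    ("GENE_D", 0.4, 0.001),   # significant but change too small
]
labels = {gene: classify_deg(fc, p) for gene, fc, p in rows}
shared = overlap(["GENE_A", "GENE_B"], ["GENE_B", "GENE_E"])
```

Applying the sign-aware cutoff to the signed log2 fold change, rather than to its absolute value, is what separates upregulated from downregulated genes.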
2.3 Functional enrichment and pathway analysis
Gene Ontology (GO) and KEGG pathway analyses were performed using the DAVID database to assess the functional roles of DEGs. GO analysis covered the biological process (BP), cellular component (CC), and molecular function (MF) categories, while KEGG analysis identified significant pathways involving overlapping genes (Butera et al., 2024; Callaway, 2024; Camargos et al., 2024). A significance cutoff of P < 0.05 was applied.
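To make the idea of an enrichment p-value concrete, the following sketch computes a plain one-sided hypergeometric test. It is an assumption-laden illustration: DAVID applies its own modified Fisher exact statistic, so this is not the exact procedure used in the study, and the background size and term size below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N background genes,
    K of which are annotated to the term, and k of the drawn genes carry it."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 108 overlapping genes, 20,000 background genes,
# a term annotated to 200 genes, 10 of which appear in the gene list.
p_term = hypergeom_enrichment_p(k=10, n=108, K=200, N=20000)
is_enriched = p_term < 0.05
```

With these assumed numbers only about one annotated gene would be expected by chance, so observing ten gives a very small p-value.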
2.4 Protein-protein interaction (PPI) network construction and hub gene identification
A PPI network places all overlapping genes in a biological network, providing a comprehensive view of the functional organization of the proteome. The STRING database (version) contains information about predicted and experimental connections between proteins (Asadnia et al., 2023; Rodriguez et al., 2021). It was used to identify important protein pairs with a combined score of > 0.4. The PPI network was then constructed using Cytoscape (version). CytoHubba was employed, and nodes with a higher degree of connectivity were considered hub genes.
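Degree-based hub selection amounts to counting, for each protein, how many retained interactions it takes part in. The sketch below illustrates that idea only; it is not the CytoHubba implementation, the interaction scores are hypothetical, and `hub_genes` is a name introduced here.

```python
from collections import Counter


def hub_genes(edges, min_score=0.4, top_n=10):
    """Rank genes by degree, counting only edges whose score exceeds min_score.
    Assumes each undirected pair appears once in `edges`."""
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score > min_score and gene_a != gene_b:
            degree[gene_a] += 1
            degree[gene_b] += 1
    # Highest degree first; ties broken alphabetically for reproducibility
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical edge list: (protein A, protein B, combined score)
edges = [
    ("STAT1", "IRF7", 0.95),
    ("STAT1", "CXCL10", 0.90),
    ("STAT1", "MX1", 0.85),
    ("IRF7", "MX1", 0.70),
    ("CXCL10", "GBP1", 0.30),  # below the 0.4 cutoff, so ignored
]
ranked = hub_genes(edges, top_n=3)
```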
2.5 Compound identification and decoy generation
Compounds related to the hub gene were obtained from the BindingDB database (https://www.bindingdb.org/rwd/bind/index.jsp), while decoy (inactive) molecules were created using DUDE (https://dude.docking.org/generate). Active compounds from BindingDB were labeled 1, and decoy molecules were labeled 0 to represent inactivity (Islam et al., 2024). Initially, the dataset had an imbalanced 1:2 ratio of active to inactive compounds, which required adjustment to ensure balanced training and test sets. To address this, Synthetic Minority Over-sampling Technique (SMOTE) was applied, creating synthetic samples by interpolating existing data points to balance the dataset.
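The interpolation step at the heart of SMOTE can be sketched as follows. This is a minimal standard-library illustration, assuming numeric feature vectors and Euclidean distance; it is not the implementation used in the study, and the function name and example points are hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Other minority samples, nearest first (squared Euclidean distance)
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical two-feature "active" compounds
actives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(actives, n_new=4, k=2, seed=1)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class.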
2.6 Feature calculation and principal component analysis (PCA)
RDKit was used to convert each molecule’s SMILES notation into 10 numerical features (Weisbrod et al., 2024). PCA was applied to extract two principal components, capturing key data variability for subsequent analysis.
2.7 Evaluation of machine learning (ML) algorithms for compound prediction on a new dataset
Multiple ML models, including support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF), were developed to classify compounds as active or inactive (Mahdizadeh Gharakhanlou & Perez, 2024; Manochkumar et al., 2024; Mohammed et al., 2024). Performance was evaluated using recall, precision, accuracy, and AUC-ROC. The optimal model was applied to a dataset of 5,000 phytochemicals, with Lipinski’s Rule of Five used to refine the selection of drug-like compounds (Kolisnik et al., 2023; Konkel et al., 2023).
2.8 Molecular docking
For molecular docking analysis, the 3D structure of the hub gene was retrieved from the RCSB PDB database. UCSF Chimera was used to refine protein structure by removing non-standard amino acid residues and minimizing energy. Potential phytochemical structures, meeting Lipinski’s rule, were generated using RDKit. The hub gene’s binding site was identified with the CASTp tool (Nisar et al., 2023; Wu et al., 2023). Molecular docking was performed using PyRx (Wu et al., 2023), with compounds showing higher binding affinity and lower RMSD considered optimal.
2.9 Molecular dynamics (MD) simulation
Following docking studies, MD simulations of STAT1 with 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol and (2R,3R)-2-(1,3-benzodioxol-5-yl)-5,7-dimethoxy-3,4-dihydro-2H-chromen-3-ol were conducted using Desmond to analyze binding and conformational stability (Dai et al., 2023; Tordecilla et al., 2021). The system was parameterized with the OPLS3e force field and solvated in a TIP3P water model. Energy minimization was performed, followed by 100 ns simulations under NPT ensemble conditions (300 K, 1 atm), using particle mesh Ewald (PME) for electrostatic interactions and a 9 Å cutoff for van der Waals forces.
2.10 Immune infiltration and cancer exploration
TIMER 2.0 (http://timer.cistrome.org/) was used to analyze immune infiltration and biomarker expression. Partial Spearman correlations, adjusted for tumor purity, were estimated between biomarker expression and immune cell subtypes, namely B cells, CD8+ T-cells, CD4+ T-cells, and endothelial cells. Correlations with p-values < 0.05 were considered significant.
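A purity-adjusted partial Spearman correlation can be sketched by rank-transforming the three variables and applying the first-order partial correlation formula. This is an illustrative assumption about the computation, not TIMER 2.0's own code, and the expression, infiltration, and purity values below are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after adjusting for covariate z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-sample values
expression = [1.2, 2.5, 3.1, 4.8, 5.0, 6.3]
infiltration = [0.10, 0.22, 0.18, 0.35, 0.40, 0.38]
purity = [0.9, 0.5, 0.7, 0.6, 0.8, 0.4]
rho_adjusted = partial_spearman(expression, infiltration, purity)
```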
3. Results
3.1 Analysis of microarray datasets and identification of DEGs
On the basis of Limma analysis with a p-value < 0.05 and log FC ≥ 1.0 for upregulated genes or log FC ≤ -1.0 for downregulated genes, 1,179 DEGs overall were identified across the GSE58287 (567) and GSE226148 (1069) datasets (Figs. 1a and b) (Wiechert et al., 2024). Volcano plots illustrate the DEGs, highlighting genes with statistically significant expression changes as candidate MARV biomarkers. The DEGs from the two datasets shared 108 overlapping genes (Fig. 1c). A heatmap of these 108 overlapping genes based on their logFC was generated for both datasets (Fig. 1d).

Fig. 1. (a) Volcano plot of GSE226148, (b) Volcano plot of GSE58287, (c) Venn diagram of overlapping genes, and (d) Heatmap of overlapping genes.
3.2 Functional enrichment and pathway analysis
GO analysis revealed significant enrichment of overlapping genes. BP involved innate immune and antiviral responses, while CC localized to the cytoplasm, cytosol, and nucleoplasm. MF included RNA binding, identical protein binding, and protein homodimerization. Pathway analysis highlighted enrichment in influenza A, COVID-19, herpesvirus, and hepatitis C pathways (Figs. 2a and b).

Fig. 2. (a) Vertical bar chart of GO terms (BP, CC, and MF), and (b) Horizontal bar chart of KEGG pathways.
3.3 PPI and identification of hub genes
Using the STRING database, a PPI network was constructed in Cytoscape for overlapping genes, resulting in a network comprising 93 nodes and 1293 edges (Fig. 3a). The CytoHubba plugin was employed to identify hub genes based on their connectivity degree within the network. Ten hub genes were identified, including STAT1, IRF7, CXCL10, IFIH1, GBP1, IFIT3, MX1, IFIT2, ISG15, and OAS1 (Figs. 3b and c). Fig. 3(d) illustrates high-confidence co-expression relationships among these hub genes. STAT1 emerged as a central hub gene within the PPI network and was prioritized for subsequent analyses (Koo et al., 2017; Zerbe et al., 2016).

Fig. 3. (a) PPI network of 108 overlapped genes, (b) Hub genes identified through CytoHubba, (c) Bar plot of the degree of hub genes, and (d) Co-expression of hub genes.
3.4 Compounds features extraction and PCA
From the BindingDB database, 544 molecules targeting STAT1 were retrieved, supplemented by 1,100 decoy molecules generated via the DUDE platform. This resulted in a dataset of 1,644 compounds, comprising 544 active molecules and 1,100 decoys, which was checked for completeness and duplicates (Wu et al., 2023). During preprocessing, compounds were numerically encoded by converting SMILES notation into molecular descriptors using the RDKit library. Ten key features were generated, with their statistical metrics presented in Table 1. PCA was applied to reduce the dataset to two principal components (PCs) capturing a considerable share of the variance.
| Feature | Description | Min | Max | Mean |
|---|---|---|---|---|
| MolWt | Molecular weight | 159.19 | 1505.11 | 385.21 |
| MolLogP | Partition coefficient between octanol and water | -24.39 | 8.43 | 3.63 |
| MaxPartialCharge | Maximum partial charge | 0.05 | 1 | 0.28 |
| MinPartialCharge | Minimum partial charge | -0.87 | -0.19 | -0.42 |
| MaxEStateIndex | Maximum electrotopological state (E-state) index | 2.23 | 15.32 | 11.68 |
| MinEStateIndex | Minimum electrotopological state (E-state) index | -6.02 | 0.99 | -0.79 |
| FpDensityMorgan1 | Morgan fingerprint density (radius 1) | 0.24 | 1.71 | 1.13 |
| Qed | Quantitative estimate of drug-likeness | 0.02 | 0.94 | 0.61 |
| NumValenceElectrons | Number of valence electrons | 60 | 448 | 137.91 |
| Chi0 | Zero-order molecular connectivity index | 8.55 | 63.39 | 18.92 |
3.5 ML model performance evaluation
ML algorithms, including KNN, SVM, and RF, were used to classify STAT1 inhibitors, with models trained on BindingDB-derived data. Model efficacy was evaluated on the test dataset using accuracy, recall (sensitivity), specificity, and AUC [27-29]. The RF model achieved 84.18% accuracy, 60.74% sensitivity, 95.76% specificity, and an AUC of 0.872, surpassing SVM (79.92%, 50.31%, 94.55%, 0.835) and KNN (77.89%, 63.80%, 84.85%, 0.821) on every metric except sensitivity, where KNN scored highest. Reporting a 95% confidence interval for accuracy (e.g., 81.0%-87.3% for RF) would further strengthen the evaluation.
Fig. 4 displays performance on both training and test sets.

Fig. 4. Performance evaluation of ML models.
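The reported RF metrics and an approximate confidence interval can be reproduced from a confusion matrix, as sketched below. The paper does not report the confusion matrix or the test-set size; the counts used here (99 true positives, 64 false negatives, 316 true negatives, 14 false positives, 493 test compounds) are an assumption, chosen only because they are integers consistent with the reported percentages. The interval uses the simple normal (Wald) approximation.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall), specificity, and precision from counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def wald_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p over n trials."""
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# ASSUMED counts, not taken from the paper (see the note above)
rf = classification_metrics(tp=99, fn=64, tn=316, fp=14)
ci_low, ci_high = wald_interval(rf["accuracy"], n=493)
```

Under these assumed counts the interval is a few percentage points wide on either side of the accuracy, which is the kind of uncertainty a test set of a few hundred compounds implies.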
3.6 Predictions on new dataset
The RF model was employed to predict active phytochemicals targeting STAT1. Among 5,000 compounds, 469 were predicted to be potentially active. Drug-likeness of these compounds was assessed using RDKit's Lipinski module, calculating molecular weight (MW), hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), and LogP values. Based on Lipinski's Rule of Five (MW ≤ 500 Da, ≤ 5 HBD, ≤ 10 HBA, LogP ≤ 5), 42 phytochemicals were retained as drug candidates (Wu et al., 2024).
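Once the four properties have been computed, the Rule of Five filter is a simple conjunction of thresholds. The sketch below assumes the conventional inclusive cutoffs (MW ≤ 500 Da, HBD ≤ 5, HBA ≤ 10, LogP ≤ 5) and precomputed property values; the compound names and numbers are hypothetical, and the descriptor calculation itself (done with RDKit in the study) is not shown.

```python
def passes_lipinski(mw, hbd, hba, logp):
    """True if a compound satisfies all four Rule of Five criteria."""
    return mw <= 500 and hbd <= 5 and hba <= 10 and logp <= 5


# Hypothetical compounds with precomputed properties
candidates = [
    {"name": "cmpd_1", "mw": 286.3, "hbd": 1, "hba": 4, "logp": 3.6},
    {"name": "cmpd_2", "mw": 612.7, "hbd": 6, "hba": 12, "logp": 5.9},
    {"name": "cmpd_3", "mw": 499.9, "hbd": 5, "hba": 10, "logp": 5.0},
]
drug_like = [
    c["name"]
    for c in candidates
    if passes_lipinski(c["mw"], c["hbd"], c["hba"], c["logp"])
]
```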
3.7 Molecular docking
Molecular docking analysis was performed between the STAT1 protein and the 42 compounds to identify the most promising therapeutic candidates. The 3D structure of STAT1 was downloaded from the Protein Data Bank (PDB ID: 3WWT; Table 2, Fig. 5(a)). Compounds with the highest binding affinity and lowest RMSD values were prioritized. Among the compounds, 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol exhibited the highest binding affinity at -5.3 kcal/mol and an RMSD of 0.762 Å, interacting with residues ASP24, PRO27, and MET28 (Figs. 5b and g). Compound (2R,3R)-2-(1,3-benzodioxol-5-yl)-5,7-dimethoxy-3,4-dihydro-2H-chromen-3-ol showed a binding affinity of -5.2 kcal/mol and an RMSD of 0.731 Å, interacting with ASP24, PRO27, MET28, GLU29, ASN89, and PHE94 (Figs. 5c and h). Compound 1-[(3,4-dimethoxyphenyl)methyl]-6-methoxy-2-methyl-3,4-dihydro-1H-isoquinolin-7-ol ranked third, with a binding score of -5.2 kcal/mol and an RMSD of 1.038 Å, interacting with residues TYR22, ASP23, ASP24, PRO27, MET28, GLU29, ASN89, ASN93, and PHE94 (Figs. 5d and i). 3,5-Dichloro-4-phenylmethoxybenzamide, the fourth-ranked compound, demonstrated a binding affinity of -5.2 kcal/mol with an RMSD of 2.128 Å, interacting with residues ASP24, PRO27, MET28, LYS85, and ASN89 (Figs. 5e and j). Lastly, (6aS)-1,2,9,10-tetramethoxy-6-methyl-5,6,6a,7-tetrahydro-4H-dibenzo[de,g]quinolin-6-ium had a binding score of -5.1 kcal/mol with an RMSD of 0.642 Å, interacting with residues ASP23, PRO27, GLU29, and LYS85 (Figs. 5f and k). These docking interactions suggest the therapeutic potential of the selected compounds as novel inhibitors targeting STAT1 (Xu et al., 2016; Yang et al., 2016; Zerbe et al., 2016).
Absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction for the five molecules revealed differing pharmacokinetic profiles, with solubility (LogS) ranging from -5.149 to -2.283 and blood-brain barrier (BBB) permeability ranging from 0.088 to 0.987. Some of them were permeable (MDCK: 3.08E-05 to 4.62E-05). Still, there are risks, such as P-glycoprotein (PGP) inhibition (0.016 to 0.925), that require experimental confirmation, as shown in Table 3; detailed ADMET profiling results are provided in Table S1.
| IUPAC name | Binding Affinity (kcal/mol) | RMSD (Å) |
|---|---|---|
| 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol | -5.3 | 0.762 |
| (2R,3R)-2-(1,3-benzodioxol-5-yl)-5,7-dimethoxy-3,4-dihydro-2H-chromen-3-ol | -5.2 | 0.731 |
| 1-[(3,4-dimethoxyphenyl)methyl]-6-methoxy-2-methyl-3,4-dihydro-1H-isoquinolin-7-ol | -5.2 | 1.038 |
| 3,5-dichloro-4-phenylmethoxybenzamide | -5.2 | 2.128 |
| (6aS)-1,2,9,10-tetramethoxy-6-methyl-5,6,6a,7-tetrahydro-4H-dibenzo[de,g]quinolin-6-ium | -5.1 | 0.642 |

Fig. 5. Top 5 inhibitors of STAT1 predicted through molecular docking.
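The ordering of the five docked compounds is consistent with sorting by binding affinity (most negative first) and breaking ties by lower RMSD. The sketch below applies that assumed rule to the affinity and RMSD values reported for the five compounds; the short labels C1 to C5 are introduced here for brevity and follow the reported rank order.

```python
# (label, binding affinity in kcal/mol, RMSD in angstroms), deliberately unordered
docking_results = [
    ("C4", -5.2, 2.128),
    ("C1", -5.3, 0.762),
    ("C5", -5.1, 0.642),
    ("C3", -5.2, 1.038),
    ("C2", -5.2, 0.731),
]

# More negative affinity ranks first; lower RMSD breaks ties
ranked = sorted(docking_results, key=lambda row: (row[1], row[2]))
ranked_labels = [label for label, _, _ in ranked]
```

Note that the compound with the lowest RMSD overall ranks last because affinity takes priority under this rule.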
| SMILES | LogS | LogD | LogP | Pgp-inh | Pgp-sub | HIA | MDCK | BBB |
|---|---|---|---|---|---|---|---|---|
| COC1CCC2C(C1)OC[C@H](C1CCC(O)CC1OC)C2 | -4.083 | 3.482 | 3.574 | 0.016 | 0.552 | 0.004 | 2.48E-05 | 0.088 |
| COC1CC(OC)C2C(C1)O[C@H](C1CCC3C(C1)OCO3)[C@H](O)C2 | -5.149 | 3.251 | 3.178 | 0.925 | 0.002 | 0.003 | 4.62E-05 | 0.892 |
| COC1CC2C(CC1O)C(CC1CCC(OC)C(OC)C1)N(C)CC2 | -2.283 | 3.038 | 2.316 | 0.538 | 0.69 | 0.004 | 1.90E-05 | 0.987 |
| NC(=O)C1CC(CL)C(OCC2CCCCC2)C(CL)C1 | -4.91 | 3.034 | 3.594 | 0.839 | 0.002 | 0.003 | 1.63E-05 | 0.706 |
| COC1CC2C(CC1OC)-C1C(OC)C(OC)CC3C1[C@H](C2)N(C)CC3 | -3.1 | 3.035 | 2.733 | 0.886 | 0.985 | 0.003 | 3.08E-05 | 0.971 |
3.8 MD simulation
Interaction stability was analyzed based on the RMSD of C-alpha atoms in the protein-ligand complexes. All complexes equilibrated within 10 ns, as shown in Fig. 6. After equilibration, the RMSD for the STAT1 complexes with 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol and (2R,3R)-2-(1,3-benzodioxol-5-yl)-5,7-dimethoxy-3,4-dihydro-2H-chromen-3-ol remained steady at approximately 2.0 Å for 100 ns (Figs. 6a and b). For STAT1 with 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol, the protein RMSD reached ∼3.2 Å at 40 ns and stabilized around ∼6.4 Å by 70 ns. The ligand RMSD increased to ∼5.6 Å by 30 ns, indicating adaptive positioning within the binding pocket (Kumar et al., 2024). The trajectory also shows the protein RMSD beginning at ∼2.0 Å, reflecting initial flexibility post-equilibration; during the first 10 ns it decreases slightly, then stabilizes between 2.7 and 4.3 Å from 10 to 100 ns, indicating equilibrium.
![Fig. 6](/content/185/2025/37/3/img/JKSUS-37-202025-g11.png)
Fig. 6. (a) RMSD of the STAT1 and 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol complex, and (b) RMSD of the STAT1 and (2R,3R)-2-(1,3-benzodioxol-5-yl)-5,7-dimethoxy-3,4-dihydro-2H-chromen-3-ol complex.
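For readers unfamiliar with the metric, RMSD is the root of the mean squared displacement of matched atoms between a reference structure and a trajectory frame. The sketch below assumes the frame has already been superimposed on the reference (simulation packages perform this alignment first) and uses hypothetical coordinates; it is not the Desmond analysis itself.

```python
from math import sqrt


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)
    coordinates, assuming the frame is already aligned to the reference."""
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(reference, frame)
    )
    return sqrt(squared / len(reference))


# Hypothetical C-alpha coordinates in angstroms; the frame is the reference
# shifted by 1 angstrom along x, so every atom moves by exactly 1 angstrom.
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
frame = [(x + 1.0, y, z) for x, y, z in reference]
deviation = rmsd(reference, frame)
```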
3.9 Immune infiltration and cancer exploration
Fig. 7(a) illustrates a marked overexpression of STAT1 across multiple cancer types, notably including breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), and stomach adenocarcinoma (STAD). Expression levels varied across tumor types, suggesting differential roles of STAT1 in oncogenesis. Immune infiltration heatmap (Fig. 7(b)) revealed positive correlations between STAT1 and immune cells, including B cells, CD8+ T-cells, and CD4+ T-cells, with stronger associations in multiple cancers. These findings suggest STAT1 modulates tumor microenvironment immune responses and angiogenesis, indicating dual roles in cancer immunity and progression depending on tumor context.

Fig. 7. STAT1 expression and immune infiltration correlations across cancers: (a) Expression analysis; (b) Heatmap of immune cell partial correlations.
4. Discussion
This study investigated MARV, which causes severe hemorrhagic fever with high mortality rates, through an integrative bioinformatics approach. Using microarray datasets, 1,179 DEGs were identified with thresholds of p-value < 0.05 and logFC ≥ 1.0 (upregulated) or ≤ -1.0 (downregulated), and these were further analyzed for biological significance (Feldmann, 2025; Fletcher, 2025). Pathways associated with innate immune response and viral defense mechanisms highlighted MARV's impact on the host immune system (Lyu et al., 2024). PPI analysis identified STAT1 as a key hub gene among the 108 overlapping DEGs, reinforcing its role in the interferon signaling pathway as an essential mediator of viral defense. STAT1 is notable for its high connectivity and its dual role in immunity and tumor progression (Kumar et al., 2024; Wu et al., 2023). Computational ADMET predictions revealed considerable pharmacokinetic heterogeneity among the five molecules. Solubility (LogS: -5.149 to -2.283) and permeability (Caco-2: -4.944 to -4.615) govern drug absorption, whereas P-glycoprotein interaction influences bioavailability (substrate probability: 0.002 to 0.985). Variable PGP inhibition (ranging from 0.016 to 0.925) may cause metabolic problems (Araújo et al., 2020). Predicted BBB penetration also varied (0.088 to 0.987), implying variable access to the central nervous system (CNS). Predicted oral bioavailability (F30%: 0.002 to 0.805) suggests low systemic exposure for the chosen compounds. Although some of the candidates are drug-like, potential toxicity hazards such as hERG blockade and hepatotoxicity require in vitro and in vivo validation. Machine learning techniques were applied to identify potential inhibitors of STAT1, a key protein in immune regulation. Active compounds (BindingDB) and decoy molecules (DUDE) were used to build the dataset for the ML algorithms (Ahmadieh-Yazdi et al., 2023; Yuan et al., 2022; Zhang et al., 2022).
Various models, including RF, SVM, and KNN, were evaluated for their predictive performance, with RF showing the highest accuracy (84.18%) and an AUC of 0.87, indicating its superiority for classifying STAT1 inhibitors (Brown et al., 2016). Molecular docking was performed on the 42 phytochemicals that comply with Lipinski's Rule of Five. Among these, 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol exhibited the highest binding affinity (-5.3 kcal/mol) and interacted with the binding site residues of STAT1 (ASP24, PRO27, MET28). These interactions were further examined through molecular dynamics simulations using Desmond, which indicated the formation of stable protein-ligand complexes over a 100-ns period, supporting their potential as STAT1 inhibitors. This study implemented a multi-stage screening approach intended to support selectivity profiling. Initially, compounds were screened using machine learning models, followed by filtering based on Lipinski's Rule of Five. Subsequently, ADMET profiling was conducted to assess pharmacokinetic properties. Finally, molecular docking and molecular dynamics (MD) simulations were performed to evaluate binding specificity and address selectivity concerns.
STAT1 plays a central role in mediating interferon signaling, a key component of immune responses, particularly in antiviral defense mechanisms. Its role in cancer is exceptionally multifaceted and context-dependent. Primarily, STAT1 functions as a tumor suppressor by enhancing cell cycle arrest and apoptosis, as well as immune surveillance through the increased expression of Major Histocompatibility Complex (MHC) molecules, which enables the immune system to recognize and eliminate tumor cells. STAT1, however, has tumor-promoting activities in some conditions. In specific models of leukemia, STAT1 has been shown to promote tumorigenesis by enhancing the expression of MHC class I molecules, thereby allowing cancer cells to evade natural killer (NK) cell-mediated killing. In the context of MARV and its possible oncogenic interactions with STAT1, direct evidence is limited (Liang et al., 2021). Given STAT1’s key role in antiviral defense mechanisms, it is likely that MARV may manipulate STAT1 function to evade immune surveillance. Such viral interference may disrupt normal STAT1 signaling pathways, potentially leading to the initiation of oncogenic processes. Additional research would be necessary to elucidate the precise mechanisms by which MARV affects STAT1 function and what this might imply for cancer development.
The TIMER 2.0 immune infiltration analysis showed that STAT1 was highly expressed in BRCA and COAD, where its expression also correlated with infiltration of CD8+ T-cells and B cells. These observations are consistent with STAT1 acting both in immune modulation and in cancer progression (Ren et al., 2016). This study integrates bioinformatics, ML, and molecular simulations to enhance drug discovery accuracy and scalability. STAT1 performs a dual function in antiviral immunity and tumor inhibition. In MARV infection, STAT1 is an essential member of the interferon signaling cascade, aiding in the establishment of antiviral immunity through the activation of genes involved in viral replication inhibition and immune cell activation. Nevertheless, MARV evades this immune response by inhibiting STAT1 signaling, thus diminishing the host's capacity to fight infection. In tumor inhibition, STAT1 plays a role in mediating apoptosis, cell proliferation inhibition, and immune surveillance against tumor cells. Loss or impairment of STAT1 can inhibit these processes, thus facilitating tumorigenesis and tumor growth. This bimodal function highlights STAT1's crucial role both in defense against viral pathogens such as MARV and in the inhibition of tumorigenesis (Najjar & Fagard, 2010).
STAT1 inhibition presents a novel therapeutic strategy against MARV by targeting the host immune response rather than the virus itself. Such a strategy could potentially hinder MARV replication by interfering with the virus's ability to evade immune detection, thereby reducing the risk of viral resistance (Valmas et al., 2010). In contrast, current MARV therapy focuses on either direct inhibition of the virus or symptom management.
Some antiviral treatments, including remdesivir and favipiravir, are active in animal models but lack convincing evidence from human studies (Madelain et al., 2020). Experimental monoclonal antibodies have shown potential by neutralizing MARV and enhancing survival in non-human primates. Existing clinical management involves supporting therapies such as fluid resuscitation and hemodynamic stabilization (Madelain et al., 2020). In comparison to these therapies, STAT1 inhibitors target the host’s signaling cascade, thereby reducing the risk of mutation-based drug resistance. Moreover, this treatment may support current antivirals by conferring a dual mode of action, thus boosting overall therapeutic effectiveness. While inhibition of STAT1 is still in an early investigative phase, it has potential as a novel, host-directed treatment that may advance MARV treatment and enhance patient survival. Further studies are required to validate its clinical safety and effectiveness (Warfield et al., 2017).
The dynamics simulations indicated robust protein-ligand interactions over extended periods, offering deeper insight into STAT1's biological role. Immune infiltration analysis highlights STAT1's potential in cancer therapy, particularly in cancers with significant STAT1 pathway involvement (Kumar et al., 2024). This computational approach accelerates drug discovery, reducing reliance on traditional wet-lab experiments and providing a cost-effective solution for identifying therapeutic targets across various pathogens. The identified STAT1 inhibitors could serve as a foundation for targeted antiviral therapies and have broader applications in immune-related cancers (Islam et al., 2024).
5. Conclusion
The study aimed to identify therapeutic lead compounds targeting biomarkers related to MARV. Gene expression data retrieval, DEG analysis, and enrichment analysis highlighted the pivotal role of STAT1 in immune response and antiviral defense, and the PPI network identified STAT1 as a central hub among 10 key genes. A curated dataset of active and inactive compounds was balanced using SMOTE and processed for machine learning. The RF classifier outperformed the other models, achieving an accuracy of 84.18% and an AUC of 0.87, and predicted 469 active compounds from a library of 5,000 phytochemicals. Molecular docking and dynamics simulations refined compound selection, with 3-methoxy-4-[(3S)-7-methoxy-3,4-dihydro-2H-chromen-3-yl]phenol exhibiting the highest binding affinity and stable binding. The ADMET profiles of the five drug-like molecules were evaluated for solubility (LogS: -5.149 to -2.283), permeability (Caco-2: -4.944 to -4.615), and metabolic interaction (PGP-substrate: 0.002 to 0.985). The computed oral bioavailability (F30%) is not uniform (0.002 to 0.805), and the predicted toxicity risks must be validated experimentally. In vitro and in vivo validation of these compounds could confirm their efficacy against STAT1. Additionally, exploring STAT1's role in immune infiltration and cancer suggests broader therapeutic potential across various diseases.
Acknowledgements
The authors are grateful to their respective affiliated organizations for providing the necessary infrastructural support for this study.
CRediT authorship contribution statement
Fadi S. I. Qashqari: Investigation, Formal analysis, Visualization. Ahmad O. Babalghith: Formal analysis. Ayman K. Johargy: Formal analysis. Hani Faidah: Formal analysis. Abdullah F. Aldairi: Formal analysis. Farkad Bantun: Investigation, Validation, Formal analysis. Shafiul Haque: Writing – review, Funding acquisition. Dharmendra Kumar Yadav: Conceptualization, Methodology, Supervision, Validation, Resources, Writing – original draft, Writing – review & editing, Funding acquisition. All authors have read and agreed to the published version of the manuscript.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Declaration of Generative AI and AI-assisted technologies in the writing process
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.
Supplementary data
Supplementary material to this article can be found online at https://dx.doi.org/10.25259/JKSUS_20_2025.
References
- Using machine learning approach for screening metastatic biomarkers in colorectal cancer and predictive modeling with experimental validation. Sci Rep. 2023;13:19426. https://doi.org/10.1038/s41598-023-46633-8
- Identification of potential COX-2 inhibitors for the treatment of inflammatory diseases using molecular modeling approaches. Molecules. 2020;25:4183. https://doi.org/10.3390/molecules25184183
- The Prognostic value of ASPHD1 and ZBTB12 in colorectal cancer: A machine learning-based integrated bioinformatics approach. Cancers (Basel). 2023;15:4300. https://doi.org/10.3390/cancers15174300
- Detection of serum antibodies targeting the Marburg virus glycoprotein using a multiplex immunoassay platform. Methods Mol Biol. 2025;2877:345-354. https://doi.org/10.1007/978-1-0716-4256-6_23
- Marburg virus is nature’s wake-up call: A bird’s-eye view. Med Microecology. 2024;20:100102. https://doi.org/10.1016/j.medmic.2024.100102
- SUMO Ligase protein inhibitor of activated STAT1 (PIAS1) is a constituent promyelocytic leukemia nuclear body protein that contributes to the intrinsic antiviral immune response to herpes simplex virus 1. J Virol. 2016;90:5939-5952. https://doi.org/10.1128/JVI.00426-16
- Genomic and transmission dynamics of the 2024 Marburg virus outbreak in Rwanda. Nat Med. 2025;31:422-426. https://doi.org/10.1038/s41591-024-03459-9
- Deadly Marburg virus: Scientists race to test vaccines in outbreak. Nature. 2024;634:278. https://doi.org/10.1038/d41586-024-03218-3
- Efficacy and immunogenicity of a recombinant vesicular stomatitis virus-vectored Marburg vaccine in cynomolgus macaques. Viruses. 2024;16:1181. https://doi.org/10.3390/v16081181
- Using image-based machine learning and numerical simulation to predict pesticide inline mixing uniformity. J Sci Food Agric. 2023;103:705-719. https://doi.org/10.1002/jsfa.12182
- Isolation and propagation of marburgviruses. Methods Mol Biol. 2025;2877:47-53. https://doi.org/10.1007/978-1-0716-4256-6_3
- Virus stability on surfaces, in bodily fluids, in wastewater and different environmental conditions. Methods Mol Biol. 2025;2877:439-451. https://doi.org/10.1007/978-1-0716-4256-6_31
- Quantification of neutralizing antibodies in serum using VSV-MARV-GFP. Methods Mol Biol. 2025;2877:355-360. https://doi.org/10.1007/978-1-0716-4256-6_24
- ACE2 acts as a novel regulator of TMPRSS2-catalyzed proteolytic activation of influenza A virus in airway cells. J Virol. 2024;98:e0010224. https://doi.org/10.1128/jvi.00102-24
- Modified oxymatrine as novel therapeutic inhibitors against Monkeypox and Marburg virus through computational drug design approaches. J Cell Mol Med. 2024;28:e70116. https://doi.org/10.1111/jcmm.70116
- Identifying important microbial and genomic biomarkers for differentiating right- versus left-sided colorectal cancer using random forest models. BMC Cancer. 2023;23:647. https://doi.org/10.1186/s12885-023-10848-9
- Using natural language processing to characterize and predict homeopathic product-associated adverse events in consumer reviews: Comparison to reports to FDA adverse event reporting system (FAERS). J Am Med Inform Assoc. 2023;31:70-78. https://doi.org/10.1093/jamia/ocad197
- Oesophageal candidiasis and squamous cell cancer in patients with gain-of-function STAT1 gene mutation. United European Gastroenterol J. 2017;5:625-631. https://doi.org/10.1177/2050640616684404
- Repurposing of SARS-CoV-2 compounds against Marburg virus using MD simulation, MM/GBSA, PCA analysis, and free energy landscape. J Biomol Struct Dyn. 2024;1-20. https://doi.org/10.1080/07391102.2024.2323701
- Proteasome inhibitors restore the STAT1 pathway and enhance the expression of MHC class I on human colon cancer cells. J Biomed Sci. 2021;28:75. https://doi.org/10.1186/s12929-021-00769-9
- Mapping knowledge landscapes and emerging trends of Marburg virus: A text-mining study. Heliyon. 2024;10:e29691. https://doi.org/10.1016/j.heliyon.2024.e29691
- Modeling favipiravir antiviral efficacy against emerging viruses: From animal studies to clinical trials. CPT Pharmacometrics Syst Pharmacol. 2020;9:258-271. https://doi.org/10.1002/psp4.12510
- From data to harvest: Leveraging ensemble machine learning for enhanced crop yield predictions across Canada amidst climate change. Sci Total Environ. 2024;951:175764. https://doi.org/10.1016/j.scitotenv.2024.175764
- Machine learning-based prediction models unleash the enhanced production of fucoxanthin in Isochrysis galbana. Front Plant Sci. 2024;15:1461610. https://doi.org/10.3389/fpls.2024.1461610
- Evaluating machine learning performance in predicting sodium adsorption ratio for sustainable soil-water management in the eastern Mediterranean. J Environ Manage. 2024;370:122640. https://doi.org/10.1016/j.jenvman.2024.122640
- STAT1 and pathogens, not a friendly relationship. Biochimie. 2010;92:425-444. https://doi.org/10.1016/j.biochi.2010.02.009
- Comparative molecular docking analysis for analyzing the inhibitory effect of Anakinra and Ustekinumab against IL17F. J Biomol Struct Dyn. 2023;41:13302-13313. https://doi.org/10.1080/07391102.2023.2173299
- Deubiquitinase USP2a sustains interferons antiviral activity by restricting ubiquitination of activated STAT1 in the nucleus. PLoS Pathog. 2016;12:e1005764. https://doi.org/10.1371/journal.ppat.1005764
- Machine learning identifies candidates for drug repurposing in Alzheimer’s disease. Nat Commun. 2021;12 https://doi.org/10.1038/s41467-021-21330-0
- Identification of novel inhibitors against VP40 protein of Marburg virus by integrating molecular modeling and dynamics approaches. J Biomol Struct Dyn. 2025;43:3942-3955. https://doi.org/10.1080/07391102.2023.2300134
- Prevalence of human filovirus infections in sub-Saharan Africa: A systematic review and meta-analysis protocol. Syst Rev. 2024;13:218. https://doi.org/10.1186/s13643-024-02626-w
- Inflammatory stress determines the need for chemotherapy in patients with HER2-positive esophagogastric adenocarcinoma receiving targeted and immunotherapy
- Marburg virus disease outbreaks, mathematical models, and disease parameters: A systematic review. Lancet Infect Dis. 2024;24:e307-e317. https://doi.org/10.1016/S1473-3099(23)00515-7
- Simulation-optimization methods for designing and assessing resilient supply chain networks under uncertainty scenarios: A review. Simul Model Pract Theory. 2021;106:102166. https://doi.org/10.1016/j.simpat.2020.102166
- Marburg virus evades interferon responses by a mechanism distinct from Ebola virus. PLoS Pathog. 2010;6:e1000721. https://doi.org/10.1371/journal.ppat.1000721
- Assessment of the potential for host-targeted iminosugars UV-4 and UV-5 activity against filovirus infections in vitro and in vivo. Antiviral Res. 2017;138:22-31. https://doi.org/10.1016/j.antiviral.2016.11.019
- FASTMAP-a flexible and scalable immunopeptidomics pipeline for HLA- and antigen-specific T-cell epitope mapping based on artificial antigen-presenting cells. Front Immunol. 2024;15:1386160. https://doi.org/10.3389/fimmu.2024.1386160
- Characteristics and outcomes of patients hospitalized for infection with influenza A, SARS-CoV-2 or respiratory syncytial virus in the season 2023/2024 in a large German primary care centre. Eur J Med Res. 2024;29:509. https://doi.org/10.1186/s40001-024-02096-9
- Transcriptomics based network analyses and molecular docking highlighted potentially therapeutic biomarkers for colon cancer. Biochem Genet. 2023;61:1509-1527. https://doi.org/10.1007/s10528-023-10333-9
- Integrating network pharmacology, molecular docking and experimental verification to explore the therapeutic effect and potential mechanism of nomilin against triple-negative breast cancer. Mol Med. 2024;30:166. https://doi.org/10.1186/s10020-024-00928-2
- IFN regulatory factor 1 restricts hepatitis E virus replication by activating STAT1 to induce antiviral IFN-stimulated genes. FASEB J. 2016;30:3352-3367. https://doi.org/10.1096/fj.201600356R
- Hepatitis B virus antigens impair NK cell function. Int Immunopharmacol. 2016;38:291-297. https://doi.org/10.1016/j.intimp.2016.06.015
- Application of explainable machine learning for real-time safety analysis toward a connected vehicle environment. Accid Anal Prev. 2022;171:106681. https://doi.org/10.1016/j.aap.2022.106681
- Progressive multifocal leukoencephalopathy in primary immune deficiencies: STAT1 gain of function and review of the literature. Clin Infect Dis. 2016;62:986-994. https://doi.org/10.1093/cid/civ1220
- HergSPred: Accurate classification of hERG blockers/nonblockers with machine-learning models. J Chem Inf Model. 2022;62:1830-1839. https://doi.org/10.1021/acs.jcim.2c00256





