Computational Biology with Clicks

A platform bridging machine learning and omics-biology to unveil the encrypted language of life.


Keywords

Graphical Abstract

Abstract

Background

Transcriptomic biomarker discovery remains challenging because of variation across datasets and platforms, the complexity of statistical and computational methods, the need to integrate multiple programming languages, and the intricacy of the ML workflows used to evaluate biomarkers. Standard workflows require several stages (quality control, normalization, differential expression), typically executed in R or Python, which creates bottlenecks for non-experts.

Method

We present omicML, an intuitive graphical user interface (GUI) that combines transcriptomic data analysis with machine learning (ML)-based classification by integrating R and Python packages and libraries. It supports both RNA-Seq and microarray data and automates preprocessing and differential expression analysis. Its ML pipeline supports both supervised and unsupervised learning, integrates multiple datasets based on candidate gene signatures, and systematically finalizes the biomarker algorithm.

Result

In a case study, omicML identified a six-gene diagnostic model that distinguishes Mpox (monkeypox virus) infections from those caused by other viruses, including SARS-CoV-2, HIV, Ebola, and varicella-zoster. These results illustrate omicML's capacity to discern clinically relevant biomarkers from complex transcriptome data.

Conclusion

Integrating data normalization, differential gene expression analysis, annotation, heatmap analysis, dataset integration, batch effect removal, machine learning analysis, and functional analysis into a unified system diminishes technical barriers and accelerates the conversion of expression data into diagnostic insights for clinicians and bench scientists.

Key Features

Core capabilities of the omicML framework

Automated Data Preprocessing

Imputation of missing values and batch effect correction to make the expression matrix suitable for statistical analysis

Detailed information about configuration options, supported data formats, and example workflows is available in the paper.
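As a rough illustration of what imputation and batch effect correction do to an expression matrix, the sketch below fills missing values with the per-gene mean and then centers each batch on the gene's overall mean. These are deliberately simple stand-ins chosen for readability; they are not the methods omicML runs (the package list names mice and sva for these steps), and the function names and toy values are hypothetical.

```python
from statistics import mean


def impute_row_mean(matrix):
    """Replace None entries in each gene row with the mean of that row's observed values."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = mean(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed


def center_by_batch(matrix, batches):
    """Subtract each batch's per-gene mean, then add back the gene's overall mean."""
    corrected = []
    for row in matrix:
        overall = mean(row)
        batch_means = {
            b: mean(v for v, lab in zip(row, batches) if lab == b)
            for b in set(batches)
        }
        corrected.append([v - batch_means[lab] + overall for v, lab in zip(row, batches)])
    return corrected


# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr = [
    [5.0, None, 7.0, 9.0],
    [2.0, 2.5, None, 4.5],
]
batches = ["A", "A", "B", "B"]  # hypothetical batch label for each sample

filled = impute_row_mean(expr)
adjusted = center_by_batch(filled, batches)
```

After centering, every batch has the same mean for each gene, so a purely additive batch shift no longer separates the samples. Model-based methods additionally account for variance differences and for the biological groups of interest.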

Cross-Platform Compatibility

Supports both RNA-Seq and microarray datasets, enabling cross-platform analysis


Annotation

Broad taxonomic coverage for gene level annotation across 367 species


Integrated ML Framework

Data standardization, feature selection, benchmarking, nested cross-validation, hyperparameter tuning, feature importance, single-gene model building, multi-gene model (biomarker algorithm) building

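Several of the steps listed for the ML framework (standardization, feature selection, hyperparameter tuning, nested cross-validation) can be combined in a few lines of scikit-learn, which appears in the package list. The following is a minimal sketch on synthetic data, not omicML's actual pipeline: the choice of classifier, selector, parameter grid, and fold counts are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (samples x genes); not real data.
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=6, random_state=0
)

# Scaling and feature selection live inside the pipeline so they are
# re-fitted on each training fold and never see the held-out samples.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search grid: number of genes kept and regularization strength.
param_grid = {"select__k": [3, 6, 12], "clf__C": [0.1, 1.0, 10.0]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # evaluation

search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="roc_auc")
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")

print("Nested CV ROC AUC per outer fold:", outer_scores)
print("Mean ROC AUC:", outer_scores.mean())
```

The inner loop picks hyperparameters and the outer loop scores the tuned model on data it never saw during tuning, which gives a less optimistic performance estimate than tuning and evaluating on the same folds.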

Biomarker Discovery and Validation

Gene-model-based identification of biomarkers for distinct conditions

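One common way to judge a single-gene model is to ask how well that gene's expression alone separates cases from controls, measured as the area under the ROC curve (AUC). The sketch below computes AUC directly from its rank-based definition and ranks two candidate genes. The gene names, expression values, and labels are invented for illustration, and the page does not state which metric omicML uses.

```python
def auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = condition of interest, 0 = comparison group.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [8.1, 7.9, 8.4, 5.2, 5.0, 6.1],  # clearly higher in cases
    "GENE_B": [4.0, 6.5, 5.1, 5.0, 4.2, 6.0],  # overlapping distributions
}

# Rank candidate genes from most to least discriminative.
ranked = sorted(
    ((auc(values, labels), gene) for gene, values in expression.items()),
    reverse=True,
)
for score, gene in ranked:
    print(f"{gene}: AUC = {score:.3f}")
```

Genes that rank well individually are natural candidates for a multi-gene model, although a combined model should still be validated on independent data.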

Network Analysis and Functional Enrichment

Contextualization of the candidate biomarkers within biological pathways

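Functional enrichment is often framed as an over-representation question: given a gene signature, is the overlap with a pathway larger than chance would predict? A standard answer is the hypergeometric tail probability, sketched below with only the Python standard library. This is a generic illustration, not a description of how omicML computes enrichment, and the gene counts used are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway, selected, overlap):
    """Probability of seeing at least `overlap` pathway genes among `selected`
    genes drawn without replacement from `universe` genes, of which `pathway`
    belong to the pathway (upper tail of the hypergeometric distribution).
    """
    upper = min(pathway, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 annotated genes, a 100-gene pathway,
# and a six-gene signature of which 3 genes fall in the pathway.
p = enrichment_p_value(universe=20000, pathway=100, selected=6, overlap=3)
print(f"Over-representation p-value: {p:.2e}")
```

When many pathways are tested at once, the resulting p-values should be adjusted for multiple testing before any pathway is called enriched.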

Workflow

Workflow Diagram

Our Team

The researchers and developers behind omicML

Supervisors

Tanvir Hossain, Principal Investigator
Saifuddin Sarker, Co-Principal Investigator
Preonath Chondrow Dev, Co-Principal Investigator

Research Students

Joy Prokash Debnath, Research Student
Kabir Hossen, Research Student
Shawon Majid, Research Student
Md. Mehrajul Islam, Research Student
Md. Sayeam Khandaker, Research Student
Siam Arefin, Research Student

Package versions

R packages

WGCNA v1.73
DESeq2 v1.44.0
tidyverse v2.0.0
Rtsne v0.17
umap v0.2.10.0
ggplot2 v3.5.1
readr v2.1.5
limma v3.62.2
ape v5.8.1
mice v3.17.0
dplyr v1.1.4
BiocManager v1.30.25
biomaRt v2.60.1
gplots v3.2.0
ggVennDiagram v1.5.2
pheatmap v1.0.12
RColorBrewer v1.1.3
sva v3.52.0
STRINGdb v2.18.0
stringr v1.5.1

Python packages

annotated-types v0.7.0
anyio v4.4.0
bcrypt v4.0.1
certifi v2024.7.4
cffi v1.17.0
click v8.1.7
contourpy v1.3.1
cryptography v43.0.0
cycler v0.12.1
dnspython v2.6.1
ecdsa v0.19.0
email_validator v2.2.0
fastapi v0.112.0
fastapi-cli v0.0.5
fonttools v4.56.0
h11 v0.14.0
httpcore v1.0.5
httptools v0.6.1
httpx v0.27.0
idna v3.7
Jinja2 v3.1.4
joblib v1.4.2
jose v1.0.0
kiwisolver v1.4.8
llvmlite v0.44.0
markdown-it-py v3.0.0
MarkupSafe v2.1.5
matplotlib v3.7.1
mdurl v0.1.2
numba v0.61.0
numpy v1.24.3
packaging v24.2
pandas v2.2.2
passlib v1.7.4
pillow v11.1.0
pyasn1 v0.6.0
pycparser v2.22
pydantic v2.8.2
pydantic_core v2.20.1
Pygments v2.18.0
PyJWT v2.9.0
pynndescent v0.5.13
pyparsing v3.2.1
python-dateutil v2.9.0.post0
python-dotenv v1.0.1
python-jose v3.3.0
python-multipart v0.0.9
pytz v2024.1
PyYAML v6.0.2
rich v13.7.1
rpy2 v3.5.16
rsa v4.9
scikit-learn v1.6.1
scipy v1.15.2
seaborn v0.13.2
shellingham v1.5.4
six v1.16.0
sniffio v1.3.1
SQLAlchemy v2.0.32
starlette v0.37.2
threadpoolctl v3.5.0
tqdm v4.67.1
typer v0.12.3
typing_extensions v4.12.2
tzdata v2024.1
tzlocal v5.2
umap-learn v0.5.7
uvicorn v0.30.5
uvloop v0.21.0
watchfiles v0.23.0
websockets v12.0
xgboost v2.1.4
dask[dataframe] v2024.12.1

Note: Package versions are periodically updated to ensure compatibility and access to the latest features.

Future Perspectives

To address the gaps in omicML v1.0, future development will prioritize additional modules and data types. Proposed improvements include gene co-expression networks, survival analysis (notably for cancer cohorts), deep-learning frameworks, integration of proteomics data, single-cell RNA-Seq analysis, ChIP-Seq data processing, spatial transcriptomics, multi-omics integration, AI-driven multi-omics modelling, single-cell ATAC-Seq analysis, specialized modules for bulk RNA-Seq, pan-cancer comparative analyses, and AI applications in genomics, primer design, and PCR data analysis. These additions are intended to establish omicML as a versatile and robust tool for translational omics analysis.