Open to Data Science roles

Thanh M. Brown

Data Analyst with 3+ years of bioinformatics experience and an M.S. in Operations Research, using statistical analysis and machine learning to extract insights from complex data and now applying these skills to data science in healthcare and clinical analytics.

Experience
3+ Years
Projects
5 End-to-End
ML Stack
Python · Spark · R
Domain
Health & Genomics

Capabilities

Technical Skills

Full-stack data science from raw data to deployed model.

🐍
Languages
Python · R · SQL
⚙️
Data Science
Machine Learning · Statistical Modeling · Feature Engineering · Hypothesis Testing
Big Data & Distributed Computing
PySpark · HPC (SLURM) · Parallel Processing
🐳
MLOps & Deployment
Docker · Containerized Workflows · Reproducible Pipelines
📈
Data Visualization & Applications
R Shiny · Plotly · Tableau
🛠️
Tools & Environment
Git · Jupyter Notebook · VS Code

Work

Portfolio Projects

Health data, clinical ML, large-scale EDA, and bioinformatics research.

02 — Clinical ML · Python · Scikit-learn

Osteoporosis Risk Prediction with Ensemble Methods

Explored an osteoporosis case-control dataset through demographic-driven EDA and built classification models (Logistic Regression, Random Forest, SVM, Gradient Boosting). The top models achieved strong ROC/AUC scores but consistently favored the negative class, which highlights how hard it is to identify positive cases in imbalanced clinical data.

Classification · EDA · ROC/AUC · Class Imbalance · Scikit-learn
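
The gap between a strong ROC score and weak positive-class detection can be illustrated with a minimal sketch. It is not the project's code: the labels and scores below are synthetic toy values, and the helper names `roc_auc` and `recall_at` are hypothetical. The point is that ROC AUC only measures how well positives are ranked above negatives, so a model can rank well and still miss most positives at the default 0.5 threshold.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(labels, scores, threshold):
    """Fraction of true positives whose score reaches the decision threshold."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    return true_pos / sum(labels)


# Synthetic toy data (an assumption for illustration): 8 negatives, 4 positives.
labels = [0] * 8 + [1] * 4
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.30,  # negatives
          0.25, 0.35, 0.40, 0.60]                          # positives

auc = roc_auc(labels, scores)
recall_default = recall_at(labels, scores, 0.5)    # default decision threshold
recall_lowered = recall_at(labels, scores, 0.25)   # threshold tuned for recall

print(f"AUC={auc:.3f}  recall@0.5={recall_default:.2f}  recall@0.25={recall_lowered:.2f}")
```

In this toy case the ranking is nearly perfect, yet only one of four positives clears the 0.5 cutoff. Lowering the threshold recovers the rest at the cost of one false positive, which is the kind of trade-off imbalanced clinical data forces.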

03 — Public Health · Unsupervised Learning

COVID-19 Vaccine Adverse Symptoms — Association Rule Mining

To explore vaccine hesitancy, I analyzed COVID-19 adverse event data from VAERS (CDC/FDA). Using association rule mining, I identified frequent symptom patterns and compared the adverse events reported for the Moderna and Pfizer vaccines. Key insight: adverse event patterns were broadly similar, suggesting that perceived safety differences may be driven more by reporting frequency than by fundamentally different symptom profiles.

Association Rules · VAERS (CDC/FDA) · Unsupervised · Public Health
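
For readers unfamiliar with association rule mining, here is a minimal sketch of the three standard rule metrics: support, confidence, and lift. It is an illustration only. The reports are made-up toy sets, not VAERS records, the function `pair_rules` is hypothetical, and it scores item pairs only rather than running a full frequent-itemset algorithm such as Apriori or FP-Growth. The source does not say which algorithm the project used.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for every frequent item pair."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of reports containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # above 1: co-occur more than chance
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Toy reports, invented for illustration; each set is one report's symptoms.
reports = [
    {"headache", "fatigue"},
    {"headache", "fatigue", "fever"},
    {"fever", "chills"},
    {"headache", "fatigue"},
    {"chills"},
]

rules = pair_rules(reports, min_support=0.4)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Comparing two groups of reports, such as two vaccines, would then amount to running the same computation on each group and comparing which rules appear and how strong they are.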

04 — FDA Data · R · Interactive App

FDA Medical Device Harm Trends — RShiny Dashboard

Built an interactive visualization app on the 2016 MAUDE dataset (the FDA's passive surveillance data for medical devices). Users can explore temporal harm trends across device categories and manufacturers. The project demonstrates stakeholder-facing data product design.

RShiny FDA · MAUDE Time-series Dashboard R

05 — Bioinformatics Research · Docker Deployment

ISCVAM — Interactive Visual Analytics for Single-Cell Multiomics Research

An interactive visual analytics platform for single-cell multiome data. It integrates scRNA-seq and scATAC-seq data so that transcriptomic and epigenetic profiles can be studied together. It features flexible clustering to identify rare cell populations and supports side-by-side comparison of up to three datasets for reproducibility. Accepted for presentation at AACR 2023.

Multiomics · Reproducible Pipeline · HPC (SLURM) · Docker Deployment · Research

Background

About Me

I am a Data Analyst with 3+ years of experience in bioinformatics and clinical data, specializing in extracting insights from complex, high-dimensional datasets. My work spans machine learning, statistical analysis, and building reproducible workflows for real-world health and genomic applications.

Working in bioinformatics has trained me to navigate ambiguity, where data is messy, sparse, and highly domain-specific, and to turn it into structured analysis and actionable insights. I learn quickly, thrive in challenging environments, and continuously expand my skill set, most recently with ML techniques and scalable data processing in Apache Spark.

I am now pursuing data science roles where I can apply this bioinformatics foundation to build models, uncover insights, and drive data-informed decisions in health, biotech, and beyond.

Contact

Get in Touch