Open to Data Science roles

Thanh M. Brown

Data Analyst transitioning into a Data Scientist role, with 4+ years of experience analyzing complex datasets and building predictive models. Backed by an M.S. in Operations Research, I bring a rigorous, quantitative approach to analytical problems across industries.

Experience: 4+ Years
Projects: 6 End-to-End
ML Stack: Python · Spark · R
Domain: Health & Genomics

Capabilities

Technical Skills

Full-stack data science from raw data to deployed model.

🐍 Languages: Python, R, SQL
⚙️ Data Science: Machine Learning, Statistical Modeling, Feature Engineering, Hypothesis Testing
Big Data & Distributed Computing: PySpark, HPC (SLURM), Parallel Processing
🐳 MLOps & Deployment: Docker, Containerized Workflows, Reproducible Pipelines
📈 Data Visualization & Applications: R Shiny, Plotly, Tableau
🛠️ Tools & Environment: Git, Jupyter Notebook, VS Code

Work

Portfolio Projects

Health data, clinical ML, large-scale EDA, and bioinformatics research.

02 — SQL Data Model · Snowflake · Python New

Neighborhood Economic Risk — U.S. Census at Scale

Built a 3-layer Snowflake SQL model that transforms raw ACS Census data (2019–2020, 220k+ neighborhoods) into income adversity insights. Unemployment in the lowest income tier is 4× higher than in the highest tier. The number of High-Risk neighborhoods grew 57% from 2019 to 2020, reflecting the economic impact of COVID-19. Includes a staged data pipeline, a data dictionary, and a binary classification model (Stage 2 in progress).

SQL, Snowflake, US Census · ACS, Python, scikit-learn, Dash

03 — Clinical ML · Python · Scikit-learn

Osteoporosis Risk Prediction with Ensemble Methods

Explored an osteoporosis case-control dataset through demographic-driven EDA and built classification models (Logistic Regression, Random Forest, SVM, Gradient Boosting). The top models achieved strong ROC AUC but consistently favored the negative class, which highlights the real-world difficulty of identifying positive cases in imbalanced clinical data.

Classification, EDA, ROC/AUC, Class Imbalance, Scikit-learn

04 — Public Health · Unsupervised Learning

COVID-19 Vaccine Adverse Symptoms — Association Rule Mining

To explore vaccine hesitancy, I analyzed COVID-19 adverse event data from VAERS (CDC/FDA). Using association rule mining, I identified frequent symptom patterns and compared the adverse events reported for the Moderna and Pfizer vaccines. Key insight: adverse event patterns were broadly similar, suggesting that perceived safety differences may be driven more by reporting frequency than by fundamentally different symptom profiles.

Association Rules, VAERS · CDC/FDA, Unsupervised, Public Health

05 — FDA Data · R · Interactive App

FDA Medical Device Harm Trends — RShiny Dashboard

Built an interactive visualization app on the 2016 MAUDE dataset (the FDA's passive surveillance data for medical devices). Users can explore temporal harm trends across device categories and manufacturers. Demonstrates stakeholder-facing data product design.

RShiny, FDA · MAUDE, Time-series, Dashboard, R

06 — Bioinformatics Research · Docker Deployment

ISCVAM — Interactive Visual Analytics for Single-Cell Multiomics Research

An interactive visual analytics platform for single-cell multiome data. It integrates scRNA-seq and scATAC-seq data so that transcriptomic and epigenetic profiles can be studied together. Features flexible clustering to identify rare cell populations and supports side-by-side comparison of up to three datasets to check reproducibility. Accepted for presentation at AACR 2023.

Multiomics, Reproducible pipeline, HPC-slurm, Docker Deployment, Research

Background

About Me

I am a Data Analyst with 4+ years of experience in bioinformatics and clinical data, specializing in extracting insights from complex, high-dimensional datasets. My work spans machine learning, statistical analysis, and building reproducible workflows for real-world health and genomic applications.

Working in bioinformatics has trained me to navigate ambiguity, where data is messy, sparse, and highly domain-specific, and to turn it into structured analysis and actionable insights. I'm a fast learner who thrives in challenging environments and continuously expands my skill set, including ML techniques and scalable data processing with Apache Spark.

I am now pursuing data science roles where I can apply this foundation to solve impactful problems, leveraging my experience in bioinformatics to build models, uncover insights, and drive data-informed decisions in health, biotech, and beyond.

Contact

Get in Touch