Thanh M. Brown
Data Analyst transitioning into data science, with 4+ years of experience analyzing complex datasets and building predictive models. Backed by an M.S. in Operations Research, I bring a rigorous, quantitative approach to analytical problems across industries.
Capabilities
Technical Skills
Full-stack data science from raw data to deployed model.
Work
Portfolio Projects
Health data, clinical ML, large-scale EDA, and bioinformatics research.
01 — Big Data · PySpark New
Income Distribution & Healthcare Spending Analysis at Scale
Large-scale EDA on a clinical health dataset using Apache Spark, exploring how healthcare spending varies with income level across populations. The findings may reflect greater healthcare access through employer-provided insurance and higher treatment utilization among middle-income individuals.
02 — SQL Data Model · Snowflake · Python New
Neighborhood Economic Risk — U.S. Census at Scale
Built a 3-layer Snowflake SQL model transforming raw ACS Census data (2019–2020, 220k+ neighborhoods) into income adversity insights. Unemployment in the lowest income tier is 4× higher than in the highest tier. The number of High-Risk neighborhoods grew 57% from 2019→2020, reflecting the economic impact of COVID-19. Includes a staged data pipeline, a data dictionary, and a binary classification model (Stage 2, in progress).
03 — Clinical ML · Python · Scikit-learn
Osteoporosis Risk Prediction with Ensemble Methods
Explored an osteoporosis case-control dataset through demographic-driven EDA and built classification models (Logistic Regression, Random Forest, SVM, Gradient Boosting). Top models achieved strong ROC performance but consistently favored the negative class, highlighting the real-world challenge of identifying positive cases in imbalanced clinical data.
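A minimal sketch of how strong ROC performance and weak positive-class detection can coexist. The scores below are made up for illustration and are not the project's data or code; `roc_auc` and `recall_at` are hypothetical helper names. The model ranks positives well, yet at the default 0.5 cutoff it flags only one of four.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(y_true, scores, threshold):
    """Share of true positives whose score reaches the threshold."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


# Made-up scores for illustration only: 8 negatives followed by 4 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.28, 0.35, 0.40, 0.60]

auc = roc_auc(y_true, scores)                    # ranking quality across all thresholds
recall_default = recall_at(y_true, scores, 0.5)  # default cutoff misses most positives
recall_lowered = recall_at(y_true, scores, 0.25) # lower cutoff recovers them

print(f"AUC={auc:.3f}  recall@0.5={recall_default:.2f}  recall@0.25={recall_lowered:.2f}")
```

Lowering the threshold is only one possible response. Class weighting or resampling are common alternatives, and each trades some false positives for better recall.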
04 — Public Health · Unsupervised Learning
COVID-19 Vaccine Adverse Symptoms — Association Rule Mining
To explore vaccine hesitancy, I analyzed COVID-19 adverse event data from VAERS (CDC/FDA). Using association rule mining, I identified frequent symptom patterns and compared reported adverse events between the Moderna and Pfizer vaccines. Key insight: adverse event patterns were broadly similar, suggesting that perceived safety differences may be driven more by reporting frequency than by fundamentally different symptom profiles.
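For readers unfamiliar with association rule mining, here is a minimal sketch of the three standard rule metrics (support, confidence, lift). The symptom sets are toy values, not VAERS results, and `rule_metrics` is a hypothetical helper rather than the project's code.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one transaction.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)         # reports containing the antecedent
    n_c = sum(c <= t for t in transactions)         # reports containing the consequent
    n_ac = sum((a | c) <= t for t in transactions)  # reports containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Toy reports for illustration only; each set is the symptoms listed in one report.
reports = [
    {"headache", "fatigue"},
    {"headache", "fatigue", "fever"},
    {"fever", "chills"},
    {"headache"},
    {"fatigue", "chills"},
]

support, confidence, lift = rule_metrics(reports, {"headache"}, {"fatigue"})
print(f"support={support:.2f}  confidence={confidence:.2f}  lift={lift:.2f}")
```

A lift above 1 indicates that the two symptoms co-occur more often than independence would predict. Comparing such metrics for the same rule across two groups of reports is one way to contrast symptom patterns.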
05 — FDA Data · R · Interactive App
FDA Medical Device Harm Trends — RShiny Dashboard
Built an interactive visualization app over the 2016 MAUDE (FDA medical device passive surveillance) dataset. Users explore temporal harm trends across device categories and manufacturers. Demonstrates stakeholder-facing data product design.
06 — Bioinformatics Research · Docker Deployment
ISCVAM — Interactive Visual Analytics for Single-Cell Multiomics Research
An interactive visual analytics platform for single-cell multiome data. Integrates scRNA-seq and scATAC-seq data to study transcriptomic and epigenetic profiles simultaneously. Features flexible clustering to identify rare cell populations and supports side-by-side comparison of up to three datasets for reproducibility. Accepted for presentation at AACR 2023.
Background
About Me
I am a Data Analyst with 4+ years of experience in bioinformatics and clinical data, specializing in extracting insights from complex, high-dimensional datasets. My work spans machine learning, statistical analysis, and building reproducible workflows for real-world health and genomic applications.
Working in bioinformatics has trained me to navigate ambiguity, where data is messy, sparse, and highly domain-specific, and to turn it into structured analysis and actionable insights. I learn quickly, thrive in challenging environments, and continue to expand my skill set, including ML techniques and scalable data processing with Apache Spark.
I am now pursuing data science roles where I can apply this foundation to solve impactful problems, leveraging my experience in bioinformatics to build models, uncover insights, and drive data-informed decisions in health, biotech, and beyond.