Thanh M. Brown
Data Analyst with 3+ years of experience in bioinformatics and an M.S. in Operations Research, applying statistical analysis and machine learning to extract insights from complex data and now bringing these skills to data science in healthcare and clinical analytics.
Capabilities
Technical Skills
Full-stack data science from raw data to deployed model.
Work
Portfolio Projects
Health data, clinical ML, large-scale EDA, and bioinformatics research.
01 — Big Data · PySpark
Income Distribution & Healthcare Spending Analysis at Scale
Large-scale EDA on a clinical health dataset using Apache Spark, exploring how income levels relate to healthcare spending patterns across populations. Covers distributed ingestion, schema validation, statistical summaries, and visual insight extraction at a data volume beyond what single-machine pandas can handle.
02 — Clinical ML · Python · Scikit-learn
Osteoporosis Risk Prediction with Ensemble Methods
Explored an osteoporosis case-control dataset through demographic-driven EDA and built classification models (Logistic Regression, Random Forest, SVM, Gradient Boosting). The top models achieved strong ROC scores but consistently favored the negative class, highlighting the real-world difficulty of identifying positive cases in imbalanced clinical data.
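A strong ROC score alongside a model that rarely flags positives can look contradictory. The minimal sketch below, built on made-up scores rather than project results, shows one common way it happens: ROC AUC measures how well a model ranks positives above negatives across all thresholds, while hard predictions at a default 0.5 cutoff can still miss every positive. The helper names `roc_auc` and `recall_at` and the 0.5 and 0.4 thresholds are illustrative assumptions, not the project's code.

```python
from itertools import product

# Hypothetical toy scores for illustration only; NOT results from the osteoporosis project.
labels = [0] * 8 + [1] * 2  # 8 negatives, 2 positives (a 4:1 imbalance)
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.42, 0.48]

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

def recall_at(labels, scores, threshold):
    """Share of true positives whose score meets or exceeds the decision threshold."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    hits = sum(1 for s in pos_scores if s >= threshold)
    return hits / len(pos_scores)

print(roc_auc(labels, scores))         # 1.0: every positive outranks every negative
print(recall_at(labels, scores, 0.5))  # 0.0: no positive clears the default cutoff
print(recall_at(labels, scores, 0.4))  # 1.0: a lower cutoff recovers both positives
```

Threshold tuning, class weighting, and resampling are common ways to trade some specificity for better recall on the minority class.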
03 — Public Health · Unsupervised Learning
COVID-19 Vaccine Adverse Symptoms — Association Rule Mining
To explore vaccine hesitancy, I analyzed COVID-19 adverse event data from VAERS (CDC/FDA). Using association rule mining, I identified frequent symptom patterns and compared the adverse events reported for the Moderna and Pfizer vaccines. Key insight: adverse event patterns were broadly similar, suggesting that perceived safety differences may be driven more by reporting frequency than by fundamentally different symptom profiles.
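Association rules are typically scored with support, confidence, and lift. The minimal sketch below computes all three for a rule such as {headache} → {fatigue} over five invented symptom reports (not VAERS data); the `reports` list and the `support` and `rule_metrics` helpers are hypothetical. Exact fractions keep the arithmetic free of floating-point rounding.

```python
from fractions import Fraction

# Hypothetical symptom reports for illustration only; NOT drawn from VAERS.
reports = [
    {"headache", "fatigue", "fever"},
    {"headache", "fatigue"},
    {"fatigue", "chills"},
    {"headache", "fever"},
    {"headache", "fatigue", "chills"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, transactions)
    confidence = joint / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return joint, confidence, lift

metrics = rule_metrics({"headache"}, {"fatigue"}, reports)
print(tuple(float(m) for m in metrics))  # (0.6, 0.75, 0.9375)
```

A lift below 1, as here, means the two symptoms co-occur slightly less often than independence would predict. Comparing two vaccines could then amount to computing the same metrics separately on each group's reports.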
04 — FDA Data · R · Interactive App
FDA Medical Device Harm Trends — RShiny Dashboard
Built an interactive visualization app on the 2016 MAUDE dataset, the FDA's passive surveillance database of medical device reports. Users can explore harm trends over time across device categories and manufacturers. The project demonstrates stakeholder-facing data product design.
05 — Bioinformatics Research · Docker Deployment
ISCVAM — Interactive Visual Analytics for Single-Cell Multiomics Research
An interactive visual analytics platform for single-cell multiome data. It integrates scRNA-seq and scATAC-seq data to study transcriptomic and epigenetic profiles simultaneously, offers flexible clustering to identify rare cell populations, and supports side-by-side comparison of up to three datasets to assess reproducibility. Accepted for presentation at AACR 2023.
Background
About Me
I am a Data Analyst with 3+ years of experience in bioinformatics and clinical data, specializing in extracting insights from complex, high-dimensional datasets. My work spans machine learning, statistical analysis, and building reproducible workflows for real-world health and genomic applications.
Working in bioinformatics has trained me to navigate ambiguity: the data is often messy, sparse, and highly domain-specific, and my job is to turn it into structured analysis and actionable insights. I'm a fast learner who thrives in challenging environments and continually expands my skill set, including machine learning techniques and scalable data processing with Apache Spark.
I am now pursuing data science roles where I can apply this foundation to impactful problems, drawing on my bioinformatics experience to build models, uncover insights, and drive data-informed decisions in health, biotech, and beyond.