Toby Liu

Hi, I'm Toby Liu

Data Scientist & ML Engineer

Building intelligent systems at the intersection of machine learning and healthcare. Currently pursuing my B.S. in Statistics & Data Science at UC Santa Barbara.

Featured Projects

Machine learning solutions for real-world healthcare challenges

Machine Learning

Cancer Treatment Response Predictor

Built Random Forest and XGBoost models to predict prostate cancer treatment response from gene expression data (TCGA-PRAD), reaching an AUC of 0.70. Implemented deep learning classifiers in TensorFlow for kidney cancer data analysis.

Python TensorFlow XGBoost TCGA

Healthcare Analytics

Hospital Cost Prediction Pipeline

Developed an XGBoost model to predict inpatient procedure costs from SPARCS data, reaching an R² of 0.91 after Optuna hyperparameter tuning. Added fairness audits, drift simulations, and anomaly detection, reducing manual QA time by 50%.

Python XGBoost SHAP Streamlit

MLOps

ML Model Audit Copilot

Built a modular ML audit framework to catch drift, data leakage, fairness issues, and schema mismatches. Automated 7+ audit workflows with CLI + SQL support, integrating SHAP explainability and demographic bias breakdowns.

Python SQL SHAP CLI
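
To illustrate the kind of drift check an audit framework like this might run, here is a minimal sketch using the Population Stability Index (PSI). It is not the project's actual code: the function name, the equal-width binning, and the `eps` floor for empty bins are all assumptions made for the example.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a reference sample and a current sample.

    Hypothetical helper for illustration only. Bins are equal-width and
    derived from the reference sample's range.
    """
    if len(reference) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]        # feature values at training time
    serving = [0.5 + i / 200 for i in range(100)]   # same feature, shifted upward
    print(f"PSI = {population_stability_index(training, serving):.3f}")
```

A common rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, though the right threshold depends on the feature and the application.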

Professional Experience

Driving innovation through data science and machine learning

Machine Learning Research Intern

Houston Methodist's Medical AI Lab Jun 2024 - Sep 2024
  • Built Random Forest and XGBoost models to predict prostate cancer treatment response from gene expression data (TCGA-PRAD), reaching an AUC of 0.70
  • Processed 500+ patient samples with 60K+ gene expression features using dimensionality reduction techniques for high-dimensional data
  • Developed deep learning classifiers in TensorFlow for kidney cancer (TCGA-KIRC, TCGA-KIRP) data analysis
  • Created interactive dashboards using Streamlit for model interpretability and clinical insights
Python TensorFlow XGBoost TCGA Streamlit

Bioinformatics Research Intern

Boston Children's Hospital Jun 2023 - Aug 2023
  • Analyzed 50K+ genomic sequences using Python, identifying 15 novel genetic variants linked to neurodevelopmental disorders
  • Built automated bioinformatics pipelines processing 10GB+ datasets daily, reducing analysis time by 40%
  • Implemented statistical models for variant calling with 95% accuracy using scikit-learn and custom algorithms
  • Collaborated with geneticists to validate findings, contributing to 2 research publications
Python Bioinformatics Genomics Statistical Analysis

Data Science Intern

Healthcare Analytics Startup Sep 2023 - Jun 2024
  • Developed an XGBoost model to predict hospital readmission rates, reaching an R² of 0.91 after hyperparameter optimization
  • Created ML audit framework detecting drift, data leakage, and fairness issues across 7+ production models
  • Built real-time anomaly detection system using isolation forests, flagging 200+ data quality issues monthly
  • Implemented SHAP-based model explainability features, improving stakeholder trust and model adoption by 35%
Python XGBoost MLOps SHAP SQL

About Me

I'm a Data Scientist with a passion for leveraging machine learning to solve complex healthcare challenges. Currently pursuing my B.S. in Statistics & Data Science and B.A. in Economics at UC Santa Barbara, I've had the privilege of working with leading healthcare institutions to develop impactful ML solutions.

My experience in the healthcare analytics and health tech industries has given me insight into how data science can transform patient care. At Houston Methodist's Medical AI Lab, I developed predictive models for cancer treatment response, while my work at Boston Children's Hospital focused on genomic sequence analysis and computational biology.

I specialize in building end-to-end machine learning pipelines, from feature engineering and model development to deployment and monitoring. My projects emphasize not just accuracy, but also interpretability, fairness, and real-world applicability.

91%

Model R²

50K+

Genomic Sequences Analyzed

60%

Audit Time Reduction

Technical Skills

Languages

Python SQL R

ML/DL

TensorFlow PyTorch Scikit-learn XGBoost

Cloud & Tools

GCP AWS Airflow Git

Domains

Healthcare Analytics Bioinformatics NLP Time Series

Get In Touch

Let's discuss how we can work together

Location

San Francisco, CA