Toby Liu

Data Scientist

Resume

About Me

I'm a driven machine learning and data science student at UC Santa Barbara, passionate about building impactful tools that sit at the intersection of AI and real-world decision-making. My experiences span computational biology, bioinformatics, and applied statistical modeling, with hands-on work in both academic research and large-scale ML projects. I'm especially interested in healthcare analytics, causal inference, and end-to-end ML pipelines. Let's build something meaningful.

Experience

Boston Children's Hospital

Data Science Intern - Computational Biology

Analyzed over 10,000 genomic sequences using Python-based DANPOS tools to investigate nucleosome dynamics and protein-DNA occupancy. Developed EDA pipelines and visualizations that made genomic findings easier to communicate and improved pipeline efficiency by 8%.

Baylor College of Medicine

Student Intern - Bioinformatics Lab

Built interactive R dashboards for genetic simulation data. Designed ETL pipelines using Python and R to automate preprocessing, reducing data wrangling time by 10% and accelerating model iterations for hypothesis testing.

Education

University of California - Santa Barbara

Expected June 2026

B.S. Statistics & Data Science, B.A. Economics - College of Letters and Science Honors Program

Coursework includes Statistical Machine Learning (Grad), Econometrics, Design of Experiments, Regression Analysis, and Data Science Principles. Member of the Honors Program; awarded 1st place in Houston Hackathon.

Universitat de Barcelona

Fall 2024

Study Abroad - Economics

Semester abroad focused on international economic models, quantitative methods, and European data infrastructure. Strengthened adaptability and global collaboration skills.

Projects

Hospital Price Prediction Model

Built a supervised ML pipeline on 2.5M hospital records to predict inpatient charges. Combined preprocessing, feature engineering, XGBoost regression, and SHAP explainability. Deployed via Streamlit and optimized with Optuna.

View Project

Clinical Trial Outcome Prediction

Designed a machine learning system to predict clinical trial success using structured trial metadata and biomedical embeddings. Leveraged transformer-based NLP to extract insights from drug indications, trial descriptions, and biomedical literature. Aimed at reducing R&D risk for biotech pipelines.

View Project

Satellite Damage Detection for Disaster Response

Built a deep learning pipeline using CNNs and pretrained vision transformers to identify infrastructure damage in post-disaster satellite imagery. Automated geotagged severity mapping for emergency teams. Trained on open datasets such as xView2 and validated on real flood and fire zones.

View Project

Skills

Get in Touch