Projects

Posters + GitHub

Prediction Model for Damage of Box-Girder Seat-Type Ordinary Bridge Structures

Goal: Predict bridge column Damage Index (DI) from earthquake ground-motion inputs to support risk assessment, retrofit prioritization, and safer design decisions.

Data: Simulated ground motions + column response data (OpenSees). Predictors include: event parameters (magnitude, distance, hazard level), intensity measures (e.g., PGV/PGA), and column properties (e.g., stiffness, geometry, reinforcement).

Method: EDA → LASSO variable selection → Generalized Additive Models (one per hazard level) and Neural Networks (3 hidden layers). Compared prediction quality using error-frequency tables and RMSE, correlation, and R² across locations.
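A minimal sketch of the LASSO variable-selection step, using synthetic stand-ins for the OpenSees outputs (the feature names and the "true" dependence on PGV and stiffness are illustrative assumptions, not the project's actual data):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = ["magnitude", "distance", "PGA", "PGV", "stiffness", "reinf_ratio"]
X = rng.normal(size=(500, len(features)))
# Illustrative ground truth: DI driven mainly by PGV and stiffness.
y = 0.6 * X[:, 3] - 0.4 * X[:, 4] + rng.normal(scale=0.1, size=500)

# Standardize so the L1 penalty treats predictors on a common scale.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)

# Keep predictors whose coefficients survive the L1 shrinkage.
selected = [f for f, c in zip(features, lasso.coef_) if abs(c) > 1e-2]
print(selected)
```

The surviving subset would then feed the per-hazard-level GAMs and the neural networks as a smaller, more interpretable predictor set.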

Key results: GAM at hazard level 225 achieved ~70% of predictions within absolute error ≤ 0.05 and ~92% within ≤ 0.10. Neural nets achieved RMSE ≈ 0.078–0.088 with correlations up to ~0.94 (SF/LA/SD).

Python LASSO GAM Neural Nets Risk Modeling

Exploring Factors Affecting the Diagnosis of Alzheimer’s Disease: A Statistical Machine Learning Approach

Goal: (1) Identify significant demographic & health risk factors associated with Alzheimer’s diagnosis, and (2) build a reliable predictive model for diagnosis.

Data: National Alzheimer’s Coordinating Center (NACC), using de-identified participant records (most recent encounter per participant; 2005–2022). Combined Uniform Data Set (UDS) and (subset) Biomarker Data Set (BDS).

Method: Binomial logistic regression with LASSO-based predictor selection, Random Forests, and a Neural Network (one hidden layer). Built an internal R package (NACCdata) to streamline cleaning, wrangling, and analysis workflows.
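The model-comparison step can be sketched as below: an L1-penalized logistic regression versus a random forest, scored by AUC on a held-out split. The data here are a synthetic stand-in for the de-identified NACC records (the project itself used R and the internal NACCdata package):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-diagnosis data; 15 candidate predictors, 6 informative.
X, y = make_classification(n_samples=2000, n_features=15,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# L1 penalty performs the LASSO-style predictor selection inside the GLM.
logit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

auc_logit = roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1])
auc_rf = roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1])
print(f"logistic AUC={auc_logit:.3f}, forest AUC={auc_rf:.3f}")
```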

Key results: Model comparison showed AUC ≈ 0.73 (logistic), 0.85 (random forest), and 0.90 (neural net), with accuracy ≈ 67% (logistic) vs. ≈ 83% (random forest / neural net). Significant predictors included items like hypertension, depression, stroke history, hypercholesterolemia, smoking intensity, BMI, and education.

R GLM Random Forest Neural Net AUC

K-Means Cluster-Based Classification or Regression

Goal: Make unsupervised clustering evaluation more principled by mapping clusters to true classes optimally (one-to-one) and quantifying stability across repeated runs.

Data: Iris dataset; evaluated accuracy across multiple predictor subsets (petal and sepal measurements) to understand which feature sets drive performance.

Method: Random-restart K-means → build contingency table (clusters × true labels) → solve the Linear Sum Assignment Problem (Hungarian algorithm) for optimal relabeling → bootstrap repeats to assess variability and rank feature sets by mean accuracy and runtime.
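The core of the pipeline above can be sketched in a few lines with scikit-learn and scipy: K-means on the Iris petal features, a contingency table against the true labels, and the Hungarian algorithm (linear sum assignment) to find the optimal one-to-one relabeling before scoring accuracy (the bootstrap repeats are omitted here for brevity):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

iris = load_iris()
X = iris.data[:, 2:4]  # Petal.Length, Petal.Width
y = iris.target

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Contingency table: rows = true labels, columns = cluster labels.
table = confusion_matrix(y, labels)

# Hungarian algorithm maximizes matched counts (negate for a min problem).
row_ind, col_ind = linear_sum_assignment(-table)
mapping = {cluster: true for true, cluster in zip(row_ind, col_ind)}

relabeled = np.array([mapping[c] for c in labels])
accuracy = (relabeled == y).mean()
print(f"accuracy after optimal relabeling: {accuracy:.3f}")
```

Negating the contingency table turns the maximum-agreement matching into the minimization form that `linear_sum_assignment` solves; repeating this over random restarts and bootstrap resamples yields the stability and feature-ranking results reported below.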

Key results: Achieved strong clustering-based classification performance (~89% overall, as reported in the poster abstract), with top feature subsets reaching mean accuracy above 0.98 (e.g., Petal.Length + Petal.Width ≈ 0.986) and fast runtimes.

Clustering K-Means Hungarian (LSAP) Bootstrapping Model Validation