Prediction Model for Damage of Box-Girder Seat-Type Ordinary Bridge Structures
Goal: Predict bridge column Damage Index (DI) from earthquake ground-motion inputs to support
risk assessment, retrofit prioritization, and safer design decisions.
Data: Simulated ground motions + column response data (OpenSees). Predictors include:
event parameters (magnitude, distance, hazard level), intensity measures (e.g., PGV/PGA), and
column properties (e.g., stiffness, geometry, reinforcement).
Method: EDA → LASSO variable selection → Generalized Additive Models (one per hazard level)
and Neural Networks (three hidden layers). Prediction quality was compared using error-frequency
tables and RMSE, correlation, and R² across locations.
Key results: GAM at hazard level 225 achieved ~70% of predictions within absolute error ≤ 0.05 and
~92% within ≤ 0.10. Neural nets achieved RMSE ≈ 0.078–0.088 with correlations up to ~0.94 (SF/LA/SD).
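The LASSO variable-selection step above can be sketched as follows. This is a minimal illustration on synthetic data, not the actual OpenSees simulation output; the feature list in the comment is hypothetical, and `LassoCV` stands in for whatever LASSO implementation the project used.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the ground-motion / column predictors
# (hypothetical ordering: magnitude, distance, PGA, PGV, stiffness, height).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))

# Assume the damage index depends strongly on two predictors; the rest are noise.
y = 0.6 * X[:, 2] + 0.3 * X[:, 3] + 0.1 * rng.normal(size=n)

# Cross-validated LASSO shrinks irrelevant coefficients to exactly zero,
# leaving a sparse predictor set to feed into the GAM / neural-net stage.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("selected feature indices:", selected)
```

The appeal of LASSO here over stepwise selection is that the cross-validated penalty handles correlated intensity measures (e.g., PGA vs. PGV) in one shot rather than via repeated refits.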
Python · LASSO · GAM · Neural Nets · Risk Modeling
Exploring Factors Affecting the Diagnosis of Alzheimer’s Disease: A Statistical Machine Learning Approach
Goal: (1) Identify significant demographic & health risk factors associated with Alzheimer’s diagnosis,
and (2) build a reliable predictive model for diagnosis.
Data: National Alzheimer’s Coordinating Center (NACC), using de-identified participant records
(most recent encounter per participant; 2005–2022). Combined Uniform Data Set (UDS) and (subset)
Biomarker Data Set (BDS).
Method: Binomial logistic regression with LASSO-based predictor selection, Random Forests,
and a Neural Network (one hidden layer). Built an internal R package (NACCdata) to streamline
cleaning, wrangling, and analysis workflows.
Key results: Model comparison showed AUC ≈ 0.73 (logistic), 0.85 (random forest), and 0.90 (neural net),
with accuracy ≈ 67% (logistic) vs. ≈ 83% (random forest / neural net). Significant predictors included
hypertension, depression, stroke history, hypercholesterolemia, smoking intensity, BMI, and education.
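The model-comparison workflow above can be sketched in miniature. This uses a synthetic classification dataset as a stand-in for the NACC records (which are access-controlled), and scikit-learn models in place of the project's R implementations; the AUC values it produces are illustrative, not the reported ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for NACC-style tabular data (not the real records).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and score it by AUC on the held-out split,
# mirroring the logistic-vs-random-forest comparison above.
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(aucs)
```

AUC is the natural comparison metric here because the classes (diagnosed vs. not) need not be balanced, and it is threshold-free, unlike raw accuracy.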
R · GLM · Random Forest · Neural Net · AUC
K-Means Cluster-Based Classification or Regression
Goal: Make unsupervised clustering evaluation more principled by mapping clusters to true classes
optimally (one-to-one) and quantifying stability across repeated runs.
Data: Iris dataset; evaluated accuracy across multiple predictor subsets (petal and sepal measurements)
to understand which feature sets drive performance.
Method: Random-restart K-means → build contingency table (clusters × true labels) → solve the
Linear Sum Assignment Problem (Hungarian algorithm) for optimal relabeling → bootstrap repeats to assess
variability and rank feature sets by mean accuracy and runtime.
Key results: Clustering-based classification reached ~89% overall accuracy (as reported in the poster abstract),
with top feature subsets reaching ~0.98+ mean accuracy (e.g., Petal.Length + Petal.Width ≈ 0.986) and fast runtimes.
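The cluster-to-class relabeling pipeline above can be sketched directly: K-means labels are arbitrary, so a contingency table is built and the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) finds the one-to-one relabeling that maximizes matched counts. This single-run sketch omits the bootstrap-repeat stage; the full-feature Iris accuracy it computes should land near the ~89% figure reported above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Random-restart K-means: n_init controls the number of restarts.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Contingency table: rows = cluster ids, columns = true classes.
k = 3
C = np.zeros((k, k), dtype=int)
for cluster, truth in zip(labels, y):
    C[cluster, truth] += 1

# Linear Sum Assignment maximizes matched counts (minimize -C),
# giving the optimal one-to-one cluster-to-class relabeling.
row, col = linear_sum_assignment(-C)
accuracy = C[row, col].sum() / len(y)
print(f"optimal-relabeling accuracy: {accuracy:.3f}")
```

The one-to-one constraint is what makes the evaluation principled: greedy majority-vote relabeling can assign two clusters to the same class and silently inflate accuracy, whereas the LSAP solution cannot.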
Clustering · K-Means · Hungarian (LSAP) · Bootstrapping · Model Validation