Hi, I'm Jiaran (Jay) 👋

Data-Driven Decision Making

I hold a Master's in Data Science from NYU and am passionate about uncovering patterns, telling stories with code, and building tools that connect data to real-world decisions. Whether designing predictive models, analyzing behavior, or crafting intuitive interfaces, I love solving meaningful problems at the intersection of statistics, technology, and human experience.

Outside of work, you'll find me exploring new cities and cultures, diving, or getting unreasonably excited about lifting heavy weights.

Get in touch:

jp7238@nyu.edu | +1 (201) 423-0970

Connect on LinkedIn | View on GitHub

Skills

Analytics & Business Intelligence

A/B Testing, Exploratory Data Analysis, Segmentation & Cohort Analysis, Funnel Analysis, Business Metrics, Behavioral Analysis, Causal Inference, Decision Modeling, Data Visualization (Tableau, Power BI)

Programming

Python (Pandas, NumPy, Scikit-learn, Dash, Plotly), SQL, R

Data Engineering & Infrastructure

ETL & Data Integration, Data Cleaning & Reconciliation, Distributed Computing (Spark, Hadoop), Cloud Platforms (AWS), Version Control (Git)

Machine Learning & AI

Predictive Modeling, Regression & Classification, Time Series Forecasting, Feature Engineering, Ensemble Methods (XGBoost, LightGBM, CatBoost), NLP & LLMs, Explainable AI (SHAP, LIME), Model Validation

Experience

Education

New York University

MS in Data Science

GPA: 3.9/4.0

2023 - 2025

Southern University of Science and Technology

BE in Computer Science

GPA: 3.8/4.0

2019 - 2023

University of Pennsylvania

Exchange Student

GPA: 3.7/4.0

2022

💡 Many of the projects below have detailed code and results!

Professional Experience

Data Scientist, Product Analytics Intern

ETH Tech

New York, NY

July 2025 – September 2025

Integrated and analyzed 1M+ e-commerce sessions to identify checkout frictions driving a 70% cart abandonment rate; performed segmentation and cohort analyses to pinpoint high-intent users and key drop-off points.
Designed and developed a personalized notification feature, collaborating with design and engineering; ran A/B tests on copy, timing, and CTAs, boosting CTR by 25% and cart-to-purchase conversions by 15%.
Delivered Dash/Plotly dashboards and insights in client-ready format, enabling data-driven product decisions.

Group Leader

Probability of Default Modeling – Applied ML in Finance Project

New York, NY

October 2024 – December 2024

Built an end-to-end probability of default prediction system using a 1M-row SME loan dataset: engineered 14 financial ratios, applied rule-based balance sheet imputation, quantile binning, and one-hot encoding.
Developed a grouped ensemble of 9 LightGBM models, trained in a walk-forward fashion with false negative penalties and weighted predictions to account for class imbalance and credit risk.
Validated model performance on time-ordered out-of-sample test sets; used SHAP plots to interpret key features; achieved 0.875 AUC, outperforming the unsegmented logistic regression baseline (0.78 AUC).

Data Scientist Intern

Guotai Junan Securities

Shanghai, China

June 2024 – August 2024

Built a bond yield prediction pipeline using decision tree models, achieving a 0.73 F1-score on test pricing data.
Developed a stepwise regression model to estimate fund duration, reducing duration volatility by 25%.
Created a real-time repo rate monitoring tool by integrating external data into Excel, enabling downstream modeling and improving fixed income strategy responsiveness.

Quantitative Research Intern

ZADS Fund

Shenzhen, China

February 2023 – April 2023

Engineered 16 high-frequency alpha factors using market microstructure data (Level-2/order book), inspired by academic literature and proprietary research.
For each factor, calculated signal values, evaluated predictive power, and backtested performance using historical intraday data.
Built a machine learning pipeline integrating 10 low-correlation factors to predict T+1 stock returns, achieving an AUC of 0.62 on out-of-sample data.

Data Scientist Intern

China Everbright Bank

Beijing, China

June 2022 – August 2022

Processed 200K+ customer records from 8 relational tables and reduced dimensionality from 700+ to 50 features using WOE encoding and Information Value (IV) selection.
Built and compared binary classification models (logistic regression, decision tree, random forest) to predict customer asset change; selected logistic regression for its interpretability and high AUC (0.85).
Delivered model insights through dashboards, contributing to a 20% lift in conversion and 12% reduction in acquisition cost.

Featured Projects

Probability of Default Modeling

Built a probability of default prediction pipeline on 1M+ loan records—cleaned data, engineered 14 financial ratios as features and applied machine learning to model credit risk.

Achievement: 0.875 AUC (vs. 0.78 for the logistic regression baseline)

LightGBM, Ensemble Models, Financial Ratios

View Code

Bond Yield Prediction

Built a bond yield prediction pipeline using decision tree models for Guotai Junan Securities, and developed stepwise regression models for fund duration estimation.

Achievement: 0.73 F1-score

Decision Trees, Stepwise Regression, Excel Dashboard

View Code

Customer Asset Prediction

Processed 200K+ customer records from 8 relational tables; used WOE and IV to reduce 700+ features to 50, building binary classification models to predict asset change.

Achievement: AUC 0.85, 20% conversion boost, 12% cost reduction

Logistic Regression, Random Forest, Feature Engineering

Quantitative Investment

Engineered 16 high-frequency alpha factors using market microstructure data (Level-2/order book), inspired by academic literature and proprietary research. Built a machine learning pipeline integrating 10 low-correlation factors to predict T+1 stock returns.

Achievement: 0.62 AUC on out-of-sample data

Alpha Factors, Machine Learning, Backtesting

View Code

Statistical Analysis of Movie Ratings

This data analysis project explores movie ratings from over 1,000 participants across 400 films. I used statistical testing to investigate patterns in preferences—like how gender or birth order might influence opinions on Shrek or The Lion King. I also built regression models using Ridge, LASSO, and Elastic Net to predict ratings based on viewer demographics.

Hypothesis Testing, Regression, Ridge, LASSO, Elastic Net

View Code

Beyond the Code

🏋️ Weightlifting

Passionate about strength training and pushing physical limits. I love the discipline and progress that come with consistent training, whether it's powerlifting, progressive overload, or just getting unreasonably excited about lifting heavy weights.

During my undergraduate years, I competed in a 15-event all-around fitness competition — including powerlifting's big three (squat, bench press, deadlift), sprints, standing long jump, endurance runs, and more. I won 9 individual events and took home the overall championship title.

Powerlifting, Progressive Overload, Fitness Competition, Championship Winner

🚗 Solo Road Trip

A cross-country adventure covering 4,000 miles across 14 states, from New York City to San Diego, California. I learned to handle the unexpected on my own and to adapt my plans as needed.

Cross-Country, 4,000 Miles, 14 States, Solo Adventure

🎯 And So Much More