Jiaran (Jay) Peng

Hi, I'm Jiaran (Jay) 👋

Data-Driven Decision Making

Passionate about leveraging data to drive strategic decisions and uncover insights that transform business outcomes.

About Me

Hi, I'm Jiaran — I hold a Master's in Data Science from NYU and am passionate about uncovering patterns, telling stories with code, and building tools that connect data to real-world decisions. Whether designing predictive models, analyzing behavior, or crafting intuitive interfaces, I love solving meaningful problems at the intersection of statistics, technology, and human experience.

Outside of work, you'll find me exploring new cities and cultures, diving, or getting unreasonably excited about lifting a heavy weight.

Data Science
Machine Learning
Financial Modeling

Skills

Programming

Python (NumPy, Pandas, Scikit-learn, LightGBM), SQL, R

Statistics

Regression Analysis, Hypothesis Testing, Time Series Forecasting, A/B Testing

Machine Learning

Classification, Regression, Clustering, PCA, Ensemble Methods, CNN, RNN

Other Tools

Hadoop, Spark, AWS, Tableau, Power BI, JIRA, Git

💡 Many of the projects below have detailed code and results.

Professional Experience

Group Leader

Probability of Default Modeling – Applied ML in Finance Project

New York, NY

October 2024 – December 2024

  • Built an end-to-end probability of default prediction system using a 1M-row SME loan dataset: engineered 14 financial ratios, applied rule-based balance sheet imputation, quantile binning, and one-hot encoding.
  • Developed a grouped ensemble of 9 LightGBM models, trained in a walk-forward fashion with false negative penalties and weighted predictions to account for class imbalance and credit risk.
  • Validated model performance on time-ordered out-of-sample test sets; used SHAP plots to interpret key features; achieved 0.875 AUC, outperforming the unsegmented logistic regression baseline (0.78 AUC).
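
The walk-forward training described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes an expanding training window, and the function name `walk_forward_splits`, the `min_train` parameter, and the yearly period labels are all hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    periods: Sequence[str], min_train: int = 2
) -> Iterator[Tuple[List[str], str]]:
    """Yield (train_periods, test_period) pairs in chronological order.

    Every test period is preceded only by earlier periods, so a model
    fitted on the training part never sees data from the future.
    """
    ordered = sorted(set(periods))
    for i in range(min_train, len(ordered)):
        # Expanding window (an assumption): train on everything before period i.
        yield ordered[:i], ordered[i]


# Hypothetical yearly buckets; the dataset's real time granularity is not stated.
splits = list(walk_forward_splits(["2019", "2020", "2021", "2022"]))
for train, test in splits:
    print(f"train on {train} -> test on {test}")
```

In a setup like this, one model would be fitted per split and scored on its test period, which is what keeps the out-of-sample evaluation time-ordered.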

Data Scientist Intern

Guotai Junan Securities

Shanghai

June 2024 – August 2024

  • Built a bond yield prediction pipeline using decision tree models, achieving a 0.73 F1-score on test pricing data.
  • Developed a stepwise regression model to estimate fund duration, reducing duration volatility by 25%.
  • Created a real-time repo rate monitoring tool by integrating external data into Excel, enabling downstream modeling and improving fixed income strategy responsiveness.

Quantitative Research Intern

ZADS Fund

June 2023 – August 2023

  • Engineered 16 high-frequency alpha factors using market microstructure data (Level-2/order book), inspired by academic literature and proprietary research.
  • For each factor, calculated signal values, evaluated predictive power, and backtested performance using historical intraday data.
  • Built a machine learning pipeline integrating 10 low-correlation factors to predict T+1 stock returns, achieving an AUC of 0.62 on out-of-sample data.
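
To give a flavor of what an order-book factor looks like, here is a generic textbook example: order book imbalance. The 16 factors built during the internship are not listed here, so this is an illustration under stated assumptions rather than one of them; the function name, the `levels` parameter, and the sample sizes are hypothetical.

```python
from typing import Sequence


def order_book_imbalance(
    bid_sizes: Sequence[float], ask_sizes: Sequence[float], levels: int = 5
) -> float:
    """Return (bid volume - ask volume) / (bid volume + ask volume).

    Uses the top `levels` price levels on each side. The result lies in
    [-1, 1]; positive values mean more resting buy interest than sell.
    """
    bid = sum(bid_sizes[:levels])
    ask = sum(ask_sizes[:levels])
    total = bid + ask
    if total == 0:
        return 0.0  # empty book: treat as neutral
    return (bid - ask) / total


# Hypothetical snapshot: sizes at the best two levels on each side.
signal = order_book_imbalance([300, 200], [100, 100])
print(round(signal, 4))
```

A signal like this would be computed per snapshot, then checked for predictive power and backtested on intraday history, as in the bullets above.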

Data Scientist Intern

China Everbright Bank

Beijing

June 2022 – August 2022

  • Processed 200K+ customer records from 8 relational tables and reduced dimensionality from 700+ to 50 features using WOE encoding and Information Value (IV) selection.
  • Built and compared binary classification models (logistic regression, decision tree, random forest) to predict customer asset change; selected logistic regression for its interpretability and high AUC (0.85).
  • Delivered model insights through dashboards, contributing to a 20% lift in conversion and 12% reduction in acquisition cost.
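
The WOE encoding and IV selection mentioned above can be sketched as follows. This is a simplified illustration, not the internship code: it assumes the feature is already binned, uses the WOE = ln(% non-events / % events) convention (some references flip the ratio), and adds a small `smoothing` constant so empty bins do not cause division by zero. All names and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def woe_iv(
    bins: Sequence[str], target: Sequence[int], smoothing: float = 0.5
) -> Tuple[Dict[str, float], float]:
    """Return per-bin Weight of Evidence and the feature's Information Value."""
    events = Counter(b for b, y in zip(bins, target) if y == 1)
    non_events = Counter(b for b, y in zip(bins, target) if y == 0)
    labels = sorted(set(bins))

    # Smoothed totals so that each bin's share sums to 1 across bins.
    total_events = sum(events.values()) + smoothing * len(labels)
    total_non_events = sum(non_events.values()) + smoothing * len(labels)

    woe: Dict[str, float] = {}
    iv = 0.0
    for label in labels:
        pct_event = (events[label] + smoothing) / total_events
        pct_non_event = (non_events[label] + smoothing) / total_non_events
        woe[label] = math.log(pct_non_event / pct_event)
        iv += (pct_non_event - pct_event) * woe[label]
    return woe, iv


# Hypothetical binned feature and 0/1 target.
bins = ["low"] * 4 + ["high"] * 4
target = [0, 0, 0, 1, 1, 1, 1, 0]
woe, iv = woe_iv(bins, target)
print(woe, iv)
```

Features would then be ranked by IV and the strongest kept, with each kept feature replaced by its per-bin WOE value before fitting the logistic regression.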

Featured Projects

Probability of Default Modeling

Built a probability of default prediction pipeline on 1M+ loan records—cleaned data, engineered 14 financial ratios as features and applied machine learning to model credit risk.

Achievement: 0.875 AUC (vs. 0.78 for the logistic regression baseline)
LightGBM, Ensemble Models, Financial Ratios
View Code

Bond Yield Prediction

Built a bond yield prediction pipeline using decision tree models for Guotai Junan Securities, and developed stepwise regression models for fund duration estimation.

Achievement: 0.73 F1-score
Decision Trees, Stepwise Regression, Excel Dashboard
View Code

Customer Asset Prediction

Processed 200K+ customer records from 8 relational tables; used WOE and IV to reduce 700+ features to 50, building binary classification models to predict asset change.

Achievement: AUC 0.85, 20% conversion boost, 12% cost reduction
Logistic Regression, Random Forest, Feature Engineering

Quantitative Investment

Engineered 16 high-frequency alpha factors using market microstructure data (Level-2/order book), inspired by academic literature and proprietary research. Built a machine learning pipeline integrating 10 low-correlation factors to predict T+1 stock returns.

Achievement: 0.62 AUC on out-of-sample data
Alpha Factors, Machine Learning, Backtesting
View Code

Statistical Analysis of Movie Ratings

This data analysis project explores movie ratings from over 1,000 participants across 400 films. I used statistical testing to investigate patterns in preferences—like how gender or birth order might influence opinions on Shrek or The Lion King. I also built regression models using Ridge, LASSO, and Elastic Net to predict ratings based on viewer demographics.

Hypothesis Testing, Regression, Ridge, LASSO, Elastic Net
View Code

Beyond the Code

🏋️ Weightlifting

Passionate about strength training and pushing physical limits. I love the discipline and progress that come with consistent training, whether it's powerlifting, progressive overload, or just getting unreasonably excited about lifting heavy weights.

During my undergraduate years, I competed in a 15-event all-around fitness competition — including powerlifting's big three (squat, bench press, deadlift), sprints, standing long jump, endurance runs, and more. I won 9 individual events and took home the overall championship title.

Powerlifting, Progressive Overload, Fitness Competition, Championship Winner

🚗 Solo Road Trip

A cross-country adventure covering 4,000 miles across 14 states, from New York City to San Diego, California. I learned to deal with the unexpected on my own and to change plans quickly when needed.

Cross-Country, 4,000 Miles, 14 States, Solo Adventure

🎯 And So Much More

Surfing, Motorcycling, Snowboarding