My Portfolio
About Me
Technical Skills
- Programming & Tools: SQL, Python, R/RStudio, Tableau/Power BI, Git/GitHub; MS Office (Excel, Word, PowerPoint, Access, Outlook, Project)
- Cloud & Big Data: AWS (S3, Redshift, Glue, QuickSight), Snowflake, Spark (PySpark)
- Libraries & Frameworks: Pandas, NumPy, Matplotlib, Seaborn, Scikit‑learn, Statsmodels, SciPy, PyTorch
- Techniques: Statistical Analysis, Hypothesis Testing, Predictive Modeling, AI/ML, ETL/ELT Pipelines, Data Integration, Data Visualization, Data Warehousing
- Project Management: Agile/Scrum, SOP Documentation
- Healthcare Knowledge: EMR/EHR Systems, ICD-10/CPT Coding Familiarity, Claims Processing, 340B Drug Pricing Program, HIPAA Compliance, Medicaid/Medicare Regulations
Exams & Certifications
- Actuarial Exams: P, FM, SRM (in progress)
- VEEs: Economics, Accounting and Finance, Mathematical Statistics
Portfolio Projects
Customer Purchase Behavior Analysis
GitHub | Report
Tech Stack: Python (Pandas, Scikit-learn, Matplotlib, Seaborn, Plotly), Statistical Analysis, Git/GitHub
- Performed exploratory data analysis (EDA) on Walmart sales data to analyze spending trends by gender, age, and marital status.
- Applied statistical methods (Central Limit Theorem, confidence intervals) to compare spending patterns across demographic groups.
- Developed visualizations using Matplotlib & Seaborn to showcase purchase behavior insights.
- Generated recommendations to improve customer acquisition, retention, and marketing strategies.
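The confidence-interval comparison in the bullets above can be illustrated with a minimal sketch. It is not the project's code: it uses only Python's standard library to build a CLT-based (normal-approximation) 95% interval for a group's mean spend, and the group labels and amounts are hypothetical. With small groups, a t-based interval would be the safer choice.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Normal-approximation (CLT) confidence interval for a sample mean.

    z=1.96 gives an approximate 95% interval; the approximation relies on
    the sample being reasonably large.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean - z * se, mean + z * se


# Hypothetical purchase amounts per demographic group (illustrative only)
spend_by_group = {
    "group_a": [120.0, 95.5, 143.2, 88.0, 110.7, 132.4],
    "group_b": [101.3, 99.8, 87.6, 115.0, 92.4, 105.1],
}

for group, amounts in spend_by_group.items():
    low, high = mean_confidence_interval(amounts)
    print(f"{group}: 95% CI for mean spend = ({low:.2f}, {high:.2f})")
```

Overlapping intervals suggest the groups' mean spending may not differ meaningfully; clearly separated intervals point to a real difference.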
Ecommerce Data Pipeline and Forecasting Model
GitHub | Report
Tech Stack: Python (Pandas, NumPy, Matplotlib, Scikit-learn), Spark (PySpark), SQL, Git/GitHub
- Built an object-oriented (OOP) inventory tracking system for 1,000+ products to support efficient inventory control.
- Implemented a PySpark retail data pipeline to clean and transform large‑scale order records for analysis.
- Applied Random Forest and regression models to forecast product demand and sales trends.
- Engineered features from time-series data, leveraging weekly sales trends for improved forecasting accuracy.
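Weekly time-series features like those described above are often built from lags and trailing averages. The sketch below is a hypothetical plain-Python illustration, not the project's pipeline: `weekly_features` and its `window` parameter are names chosen here, and the sales figures are made up.

```python
def weekly_features(weekly_sales, window=4):
    """Build simple lag and trailing-mean features from a weekly sales series.

    Returns one dict per week that has a full `window` of history, so the
    first `window` weeks are dropped. Each row holds the previous week's
    sales (lag_1), the mean of the previous `window` weeks (rolling_mean),
    and the current week's sales as the prediction target.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rows = []
    for t in range(window, len(weekly_sales)):
        history = weekly_sales[t - window:t]  # weeks t-window .. t-1, excludes week t
        rows.append({
            "lag_1": weekly_sales[t - 1],
            "rolling_mean": sum(history) / window,
            "target": weekly_sales[t],
        })
    return rows


# Hypothetical weekly unit sales for one product
print(weekly_features([120, 135, 128, 150, 162, 158], window=3))
```

Every feature uses only weeks before the target week, which keeps future information from leaking into the model during training.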
Banking Marketing and Investment Optimization
GitHub | Report
Tech Stack: Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, PyPortfolioOpt), SQL, Git/GitHub
- Developed a credit card approval model with 81.8% accuracy, automating risk assessment for banks.
- Optimized a data pipeline for bank marketing analysis, enabling more precise customer targeting.
- Analyzed financial ratios, identifying industry‑specific risk trends for investment decision‑making.
- Constructed a FAANG stock portfolio using mean-variance optimization to balance risk and return.
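The mean-variance bullet above can be illustrated with its simplest special case, the global minimum-variance portfolio, where the weights are w = Σ⁻¹1 / (1ᵀΣ⁻¹1). This is a minimal NumPy sketch, not the project's code: it does not use PyPortfolioOpt's API, ignores expected returns, allows short positions, and the covariance matrix is hypothetical.

```python
import numpy as np


def min_variance_weights(cov):
    """Global minimum-variance weights: w = inv(S) 1 / (1' inv(S) 1).

    This is the fully invested (weights sum to 1), unconstrained special case
    of mean-variance optimization; negative weights (short positions) are allowed.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve S x = 1 instead of inverting S explicitly (more numerically stable)
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


def portfolio_volatility(weights, cov):
    """Portfolio standard deviation, sqrt(w' S w)."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(w @ np.asarray(cov, dtype=float) @ w))


# Hypothetical annualized covariance matrix for three stocks (illustrative only)
cov = [
    [0.090, 0.030, 0.020],
    [0.030, 0.060, 0.015],
    [0.020, 0.015, 0.040],
]
w = min_variance_weights(cov)
print("weights:", np.round(w, 3), "volatility:", round(portfolio_volatility(w, cov), 4))
```

A full mean-variance optimization would add an expected-return vector and either a target return or a risk-aversion term, usually with constraints such as no short selling.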
Healthcare Employee Attrition Prediction
GitHub | Report
Tech Stack: Python (Pandas, Scikit-learn, Matplotlib, Seaborn), Decision Tree, Logistic Regression, Git/GitHub
- Developed Decision Tree and Logistic Regression models to predict high‑risk employee attrition.
- Achieved classification accuracy of 80%, providing insights for targeted employee retention strategies.
- Engineered demographic, work-related, and compensation features, improving model interpretability.
- Conducted statistical analysis on work‑life balance, job involvement, and salary trends to support HR decisions.
Customer Subscriber Churn Prediction
GitHub | Report
Tech Stack: Python (Pandas, Scikit-learn, Matplotlib, Seaborn), Logistic Regression, Decision Tree, Random Forest, K-Means Clustering, Git/GitHub
University Mental Health Research Study
GitHub | Report
Tech Stack: SQL, Python (Pandas, Seaborn, Scikit‑learn, SciPy, Statsmodels), Git/GitHub
- Conducted statistical analysis of data on 200+ international students, uncovering mental health trends.
- Developed a Random Forest model that predicts student depression risk with 75% accuracy to support early intervention, and provided recommendations to enhance peer-support programs.
Technology: Excel (Pivot Tables), Python (Pandas, Seaborn), Tableau (dashboards)
- Analyzed US pharma sales data to assess market share, pricing, and revenue across regions and therapeutic areas; visualized trends to distinguish volume-driven vs. premium-priced drugs.
- Delivered insights and recommendations on pricing strategy and affordability, supporting market access decisions based on therapeutic value.
Technology: SQL (MySQL), Tableau (dashboards), Excel & PowerPoint (reporting)
- Analyzed 40,000+ admissions records from the university’s factsheets and data tables. Identified 41% growth in undergraduate enrollments and gaps in faculty coverage to support resource allocation for the Finance Department.
- Developed dashboards presenting 1,000+ enrollments and integrated forecasting models, leading to recommendations to hire faculty specializing in risk management to meet industry demand and enhance program prestige.
Technology: Python (Pandas, NumPy, Statsmodels), SQL (PostgreSQL, SQLAlchemy), Git/GitHub
- Designed and implemented an SPC-based monitoring system that flags process deviations beyond control limits, helping reduce manufacturing defects.
- Built a predictive model for car insurance claims, pinpointing driving experience as the strongest predictor with 77.71% accuracy.
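The SPC monitoring bullet above relies on control limits. The sketch below shows one common way to compute them, the individuals (X) chart, where sigma is estimated from the average moving range divided by d2 = 1.128. It is a stdlib-only illustration, not the project's Statsmodels/PostgreSQL implementation, and the readings are hypothetical.

```python
import statistics

D2_FOR_N2 = 1.128  # bias-correction constant d2 for moving ranges of size 2


def individuals_control_limits(baseline):
    """Return (LCL, center line, UCL) for an individuals (X) chart.

    Sigma is estimated as the average moving range divided by d2, the usual
    approach when measurements are taken one at a time.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    center = statistics.fmean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = statistics.fmean(moving_ranges) / D2_FOR_N2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(measurements, lcl, ucl):
    """Indices of measurements that fall outside the control limits."""
    return [i for i, x in enumerate(measurements) if x < lcl or x > ucl]


# Hypothetical part-diameter readings in mm (illustrative only)
baseline = [10.02, 9.98, 10.01, 9.99, 10.03, 10.00, 9.97, 10.02]
lcl, center, ucl = individuals_control_limits(baseline)
new_readings = [10.01, 10.12, 9.99, 9.85]
print(f"LCL={lcl:.3f}  CL={center:.3f}  UCL={ucl:.3f}")
print("out of control at indices:", out_of_control(new_readings, lcl, ucl))
```

Limits are estimated from a stable baseline period and then applied to new measurements, so a drifting process is flagged instead of silently widening its own limits.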
Technology: SQL (MySQL), Tableau (dashboards), Excel & PowerPoint (reporting)
- Designed interactive dashboards analyzing state-level income, expenses, unemployment rates, cost of living, and population trends.
- Integrated and processed datasets covering diverse economic indicators (e.g., median income trends from 2012–2023, cost-of-living indices, and unemployment rates).