My Profile

Remove /mindpalace... to see my resume website! 👀


👋 Hi there, I’m Khoa Pham!

🎓 I’m a Master’s student majoring in Business Analytics and Data Science.
📊 I’m passionate about using data to uncover insights and support strategic decision-making.
📈 My experience in finance and healthcare has enabled me to strengthen my data analysis and research skills.
🧑‍💻 I’m proficient in SQL, Python, R, Tableau, and Git, and I’m actively building my expertise in data engineering and cloud technologies.

Contact Info:
📍 Based in Los Angeles, CA
📞 Contact me at 714-858-7494
📧 kdpham@umass.edu (School)
💌 kdpham1002@gmail.com (Personal)

Education

University of Massachusetts - Amherst | Sep 2019 - May 2024

  • M.B.A. in Business Analytics (Accelerated 4+1 Program)
  • B.S. in Informatics Data Science | Minor: Statistics | Pre‑Med Track
    • GPA: 3.9/4.0
    • Dean’s List (all years), Chancellor’s Award Scholarship
  • Relevant Courses: Database Management (SQL), Machine Learning & Data Science for Business (Python), Applied Statistics (R), Linear Algebra, Discrete Math, Data Structures & Algorithms (Java), Web Programming (HTML/CSS/JavaScript), Project Management, Media Marketing

Skills

  • Programming & Tools: SQL, Python, R/RStudio, Tableau/Power BI, Git/GitHub; MS Office (Excel, Word, PowerPoint, Access, Outlook, Project)
  • Cloud & Big Data: AWS (S3, Redshift, Glue, QuickSight), Snowflake, Spark (PySpark)
  • Libraries & Frameworks: Pandas, NumPy, Matplotlib, Seaborn, Scikit‑learn, Statsmodels, SciPy, PyTorch
  • Techniques: Statistical Analysis, Hypothesis Testing, Predictive Modeling, AI/ML, ETL/ELT Pipelines, Data Integration, Data Visualization, Data Warehousing
  • Project Management: Agile/Scrum, SOP Documentation
  • Healthcare Knowledge: EMR/EHR Systems, ICD-10/CPT Coding Familiarity, Claims Processing, 340B Drug Pricing Program, HIPAA Compliance, Medicaid/Medicare Regulations

Exams & Certifications

  • Actuarial Exams: P, FM, SRM (in progress)
  • VEEs: Economics, Accounting and Finance, Mathematical Statistics

Projects

Customer Purchase Behavior Analysis
GitHub | Report

Tech Stack: Python, Pandas, Scikit-Learn, Matplotlib, Seaborn, Plotly, Statistical Analysis, Git/GitHub

  • Performed EDA on Walmart sales data to analyze spending trends by gender, age, and marital status.
  • Applied statistical methods (CLT, confidence intervals) to compare demographic spending patterns.
  • Developed visualizations using Matplotlib & Seaborn to showcase purchase behavior insights.
  • Generated recommendations to improve customer acquisition, retention, and marketing strategies.

Ecommerce Data Pipeline and Forecasting Model
GitHub | Report

Tech Stack: Python (Pandas, NumPy, Matplotlib, Scikit-learn), Spark (PySpark), SQL, Git/GitHub

  • Built an OOP-based inventory tracking system for 1,000+ products, ensuring efficient inventory control.
  • Implemented a PySpark retail data pipeline to clean and transform large‑scale order records for analysis.
  • Applied Random Forest and Regression models to predict future product demand and sales trends.
  • Engineered features from time-series data, leveraging weekly sales trends for improved forecasting accuracy.

Banking Marketing and Investment Optimization
GitHub | Report

Tech Stack: Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, PyPortfolioOpt), SQL, Git/GitHub

  • Developed a credit card approval model with 81.8% accuracy, automating risk assessment for banks.
  • Optimized data pipeline for bank marketing analysis, enabling more precise customer targeting.
  • Analyzed financial ratios, identifying industry‑specific risk trends for investment decision‑making.
  • Optimized a FAANG stock portfolio using mean‑variance optimization, balancing risk and returns.

Healthcare Employee Attrition Prediction
GitHub | Report

Tech Stack: Python (Pandas, Scikit-Learn, Matplotlib, Seaborn), Decision Tree, Logistic Regression, Git/GitHub

  • Developed Decision Tree and Logistic Regression models to predict high‑risk employee attrition.
  • Achieved classification accuracy of 80%, providing insights for targeted employee retention strategies.
  • Engineered demographic, work-related, and compensation features, improving model interpretability.
  • Conducted statistical analysis on work‑life balance, job involvement, and salary trends to support HR decisions.

Customer Subscriber Churn Prediction
GitHub | Report

Tech Stack: Python (Pandas, Scikit-Learn, Matplotlib, Seaborn), Logistic Regression, Decision Tree, Random Forest, K-Means Clustering, Git/GitHub

University Mental Health Research Study
GitHub | Report

Tech Stack: SQL, Python (Pandas, Seaborn, Scikit‑learn, SciPy, Statsmodels), Git/GitHub

  • Conducted statistical analysis on 200+ international students, uncovering mental health trends.
  • Developed a Random Forest model that predicts student depression risk with 75% accuracy, informing early-intervention recommendations to enhance peer‑support programs.

Pharmaceutical Market Share & Pricing Analysis
Vertex Pharmaceutical OA | Spring 2025

Technology: Excel (Pivot Tables), Python (Pandas, Seaborn), Tableau (dashboards)

  • Analyzed US pharma sales data to assess market share, pricing, and revenue across regions and therapeutic areas; visualized trends to distinguish volume-driven vs. premium-priced drugs.
  • Delivered insights and recommendations on pricing strategy and affordability, supporting market access decisions based on therapeutic value.

Finance Department’s New Hire Request
Isenberg School of Management | Fall 2024

Technology: SQL (MySQL), Tableau (dashboards), Excel & PowerPoint (reporting)

  • Analyzed 40,000+ admissions records from the university’s factsheets and data tables. Identified 41% growth in undergraduate enrollment and gaps in faculty staffing to support resource allocation for the Finance Department.
  • Developed dashboards presenting 1,000+ enrollments and integrated forecasting models, leading to recommendations to hire faculty specializing in risk management to address industry demand and enhance program prestige.

Car Sales, Insurance Claims & Charging Behaviors
College of Information & Computer Science | Summer 2024

Technology: Python (Pandas, NumPy, Statsmodels), SQL (PostgreSQL, SQLAlchemy), Git/GitHub

  • Designed and implemented an SPC‑based monitoring system that flags deviations beyond control limits, helping reduce manufacturing defects.
  • Built a predictive model for car insurance claims with 77.71% accuracy, identifying driving experience as the strongest predictor.

States Economic Dynamics Analysis
MGMT 601: Data Management | Spring 2024

Technology: SQL (MySQL), Tableau (dashboards), Excel & PowerPoint (reporting)

  • Designed interactive dashboards analyzing state-level income, expenses, unemployment rates, cost of living, and population trends.
  • Integrated and processed datasets covering diverse economic indicators (e.g., median income trends from 2012–2023, cost of living indices, and unemployment rates).

Experience

Center for Teaching and Learning | Amherst, MA
Data Management Coordinator | Sep - Dec 2023

  • Managed and cleaned survey data for workshop participation analysis, ensuring data accuracy. Developed automated reports recognizing 300+ professors and lecturers for Distinguished Teaching Awards.

Biomedical NLP Processing Laboratory | Amherst, MA
Biomedical Research Assistant | Feb - May 2022

  • Processed unstructured biomedical data, including EHR notes and scientific articles, to support NLP research. Assisted in developing text‑mining algorithms to identify key medical terms in clinical documentation.
  • Optimized large-scale information retrieval by implementing MapReduce and Spark‑based inverted index tables, enhancing query efficiency.

Data Science Track | Amherst, MA
Program Mentor & Peer Tutor | Sep 2021 - May 2022

  • Guided students in curriculum planning, career development, and technical preparation for data science and analyst roles, and connected them with mentors from the Computer Science Department.

CS326: Web Programming | Amherst, MA
Data Science Interview Prep Platform | Sep 2021 - May 2022

Technology: SQL, Python, HTML, CSS, JavaScript, Ruby, VSCode (Jupyter Notebook), nbconvert, Git/GitHub

  • Developed a Q&A interview practice platform for Python, SQL, and cloud technologies, leveraging the Feynman Technique and active recall to improve retention of technical concepts and problem‑solving skills.

Five College Language Program | Amherst, MA
Teaching Assistant | Sep 2021 - Feb 2022

  • Conducted weekly lessons to improve students’ language proficiency, customized the curriculum to meet diverse learning needs, prepared students for final exams, and provided detailed weekly progress reports.

Millennium Dance Complex | Orange, CA
Digital Media Manager | Summer 2024

  • Filmed, edited, and managed a video database for dance classes, implementing efficient categorization to enhance digital workflows.
  • Analyzed social media performance, delivering weekly insights and recommendations that drove increased audience engagement and social media growth.

Humans of CICS | Amherst, MA
Social Media Outreach | Spring 2020

  • Highlighted professional stories of the CS Department’s faculty, professors, and students. Collaborated with alumni and staff to foster stronger connections within the CICS community.

Interests

Add /mindpalace after github.io to have a peek of another me! 🙋‍♂️