
Arjun Rathod

DATA ANALYST

ABOUT

Data Analyst with 4+ years of experience building analytical systems, 
SQL-based reporting frameworks, and ML models that turn data into 
business decisions. Projects include fraud detection (AUC 0.96, 91% 
precision), CLV modelling (top 10% of customers contributing over 40% of revenue), 
and sales analytics across ~186,000 records using Python, SQL, and 
Power BI.

SKILLS

Areas of Expertise

  • SQL (CTEs, Window Functions, Cohort Analysis, Aggregations)
  • Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
  • Power BI (DAX, Data Modelling, Interactive Dashboards)
  • Machine Learning (XGBoost, Random Forest, Logistic Regression)
  • NLP (TF-IDF, BERT, LSTM, Text Classification)
  • Tableau · Looker Studio · Excel (Advanced)
  • MySQL · BigQuery · Jupyter Notebook · Git
  • Azure AI-900 (Score: 836) · RFM Analysis · A/B Testing

WORK EXPERIENCE

July 2022 – Present

STEAM Program Manager & Data Systems Lead · The Creative School, Bangalore

  • 3 years, 4 months of teaching experience in IGCSE Computer Science and Combined Science

  • Developed and maintained a comprehensive Tinkering Lab with multiple technology platforms (micro:bit, Arduino, Tinkercad)

  • Led teacher training programs covering 3 key areas: coding instruction, project-based learning methodologies, and assessment design

  • Orchestrated multiple science fairs and innovation showcases

Educational Outreach & Volunteer Work

  • 11 months as a Teach For India Facilitator supporting underserved communities (March 2023 – February 2024)

  • 2 months of field experience in urban slum communities with TYCIA Foundation (June – July 2022)

  • 1+ year of research experience studying higher education accessibility for tribal youth (2020 – 2021)

Marketing & Social Media

  • 2 months of virtual internship experience in campaign strategy development and content creation (May – June 2021)

EDUCATION

Master of Arts in Development

Azim Premji University, Bangalore

Bachelor of Engineering in Electrical Engineering

Nagpur University, Maharashtra

CERTIFICATIONS

Microsoft Certified: Azure AI Fundamentals

The Sacred Classroom & Life and Living
 

Joy of Teaching – Pedagogical Frameworks for Alternative Education

PROJECTS

Sales Exploratory Data Analysis

  • Merged and cleaned ~186,000 records from multiple CSV files using Python, ensuring 100% data integrity and consistency.

  • Integrated MySQL and Excel with Power BI, automating sales data flow for real-time updates.

  • Built an interactive Power BI dashboard used to extract key business insights across finance, sales, and supply chain domains.

  • Identified December as the highest-grossing month and USB-C Charging Cable as the top-selling product with a sales share of 8.3%.

  • Delivered actionable insights that could potentially improve gross margin by up to 10% through data-informed advertising and inventory decisions.
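A minimal sketch of how the merge-and-clean step and the monthly revenue roll-up could look, using only the Python standard library. The column names, the `MM/DD/YY HH:MM` date format, and the helper names (`clean_records`, `merge_csv_files`, `revenue_by_month`) are hypothetical assumptions, not the project's actual code.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Hypothetical column names for a monthly sales export; adjust to the real schema.
COLUMNS = ["Order ID", "Product", "Quantity Ordered", "Price Each", "Order Date"]


def clean_records(raw_rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop blank rows, header rows repeated mid-file, and exact duplicate records."""
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for row in raw_rows:
        values = tuple((row.get(col) or "").strip() for col in COLUMNS)
        if not any(values):
            continue  # fully blank line
        if values[0] == COLUMNS[0]:
            continue  # header line repeated inside the data
        if values in seen:
            continue  # exact duplicate, within or across files
        seen.add(values)
        cleaned.append(dict(zip(COLUMNS, values)))
    return cleaned


def merge_csv_files(paths: Iterable[str]) -> List[Dict[str, str]]:
    """Stream every monthly CSV through clean_records as one combined table."""
    def raw_rows() -> Iterator[Dict[str, str]]:
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                yield from csv.DictReader(f)
    return clean_records(raw_rows())


def revenue_by_month(records: List[Dict[str, str]]) -> Dict[int, float]:
    """Sum quantity * unit price per month, assuming 'MM/DD/YY HH:MM' order dates."""
    totals: Dict[int, float] = defaultdict(float)
    for r in records:
        month = int(r["Order Date"].split("/")[0])
        totals[month] += int(r["Quantity Ordered"]) * float(r["Price Each"])
    return dict(totals)
```

`max(totals, key=totals.get)` on the result then gives the highest-grossing month. In pandas, the same flow would typically be `pd.concat` over `pd.read_csv` calls followed by `drop_duplicates`.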

Car Insurance Fraud Detection

  • Analyzed 50,000+ insurance records, identifying fraud patterns with a precision of 91%.

  • Engineered new features such as claim timing analysis, improving fraud detection accuracy by 12%.

  • Trained and validated models like Random Forest and XGBoost, selecting the best-performing model with an AUC of 0.96.

  • Implemented scalable Python scripts for automated anomaly detection in live data feeds.

  • Communicated findings through reports and dashboards, enabling real-time fraud flagging capability.
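The AUC and precision figures above are standard evaluation metrics. A minimal stdlib sketch of how they can be computed from model scores, assuming binary labels (1 = fraud) and a score per claim; in practice scikit-learn's `roc_auc_score` and `precision_score` do this. The function names and the flagging threshold are illustrative assumptions.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: P(fraud score > non-fraud score)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both fraud and non-fraud examples")
    ranks = average_ranks(scores)
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def precision_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of claims flagged (score >= threshold) that are actually fraudulent."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0
```

AUC is threshold-free, so it suits model selection; precision depends on the chosen flagging threshold, so it is usually reported alongside the threshold used.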

Credit Card Spending Habits in India (SQL)

  • Conducted a deep-dive SQL analysis on credit card transactions to uncover consumer behavior patterns.

  • Used CTEs, Window Functions (RANK, ROW_NUMBER, LAG), and Aggregations to answer business questions.

  • Top cities by spend: Delhi and Mumbai.

  • Bangalore reached 500 transactions fastest, indicating strong market adoption.

  • Female cardholders contributed more than 60% of spending in the bills and grocery categories.

  • Gold card usage was lowest in certain cities.

  • Enabled data-driven strategies for marketing, fraud detection, and customer engagement.
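A "which city reached N transactions fastest" question is a typical `ROW_NUMBER` use case. The sketch below runs it on an in-memory SQLite database from Python; the table and column names are hypothetical, SQLite 3.25+ is assumed for window functions, and on MySQL 8 the `julianday` difference would become a date-difference function such as `DATEDIFF`.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical table layout; the real dataset's column names may differ.
SCHEMA = """
CREATE TABLE transactions (
    transaction_id   INTEGER PRIMARY KEY,
    city             TEXT NOT NULL,
    transaction_date TEXT NOT NULL,  -- ISO 'YYYY-MM-DD'
    card_type        TEXT,
    amount           REAL
)
"""

# Number each city's transactions in date order, then measure the days
# between its first transaction and its n-th one.
FASTEST_TO_N = """
WITH numbered AS (
    SELECT city,
           transaction_date,
           ROW_NUMBER() OVER (PARTITION BY city
                              ORDER BY transaction_date, transaction_id) AS rn,
           MIN(transaction_date) OVER (PARTITION BY city) AS first_date
    FROM transactions
)
SELECT city,
       julianday(transaction_date) - julianday(first_date) AS days_to_n
FROM numbered
WHERE rn = :n
ORDER BY days_to_n
LIMIT 1
"""


def fastest_city_to_n(conn: sqlite3.Connection, n: int) -> Optional[Tuple[str, float]]:
    """Return (city, days) for the city that reached n transactions soonest, or None."""
    return conn.execute(FASTEST_TO_N, {"n": n}).fetchone()
```

Cities with fewer than `n` transactions never produce an `rn = n` row, so they drop out of the ranking automatically.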

Spam Mail Classification

  • Processed 5,500+ email samples, implementing tokenization, stemming, lemmatization, and stop word removal.

  • Extracted features using TF-IDF and word embeddings, improving signal-to-noise ratio in the dataset by over 60%.

  • Achieved 94%+ accuracy using models like Naive Bayes, Logistic Regression, and SVM.

  • Trained LSTM and BERT models, improving F1-score by 15% compared to classical ML models.

  • Compared models using precision, recall, and F1 metrics to ensure robustness and deployability.
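For reference, a minimal stdlib sketch of TF-IDF weighting. It roughly mirrors scikit-learn's `TfidfVectorizer` defaults (lower-casing, tokens of two or more word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalisation); the project's own preprocessing (stemming, lemmatization, stop-word removal) is not reproduced, and the function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN = re.compile(r"\b\w\w+\b")  # words of 2+ characters; text is lower-cased first


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one L2-normalised {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many emails each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors
```

Terms that appear in many emails get a smaller IDF, so rarer, more distinctive words dominate each vector before it is passed to a classifier such as Naive Bayes or Logistic Regression.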

NYC Building Energy Performance Analysis (Power BI)

  • Cleaned and structured public benchmarking data from NYC Open Data.

  • Designed a user-friendly Power BI dashboard with dynamic filters for borough, property type, and ENERGY STAR score range.

  • Created KPI cards for average energy intensity, total emissions, and average ENERGY STAR score.

  • Visualized trends by borough, property type, and year built using bar charts, pie charts, line graphs, and emission distributions.

  • Manhattan had the highest average GHG emissions, driven by high-rise offices and hospitals.

  • Medical and education facilities consume significantly more energy per square foot than residential buildings.

  • Over 40% of buildings scored below 50 on the ENERGY STAR scale, indicating poor efficiency.

  • High-performing buildings (score >75) were mostly newer (post-2000) or green-certified.
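A minimal sketch of the KPI arithmetic behind such a dashboard, assuming "energy intensity" means site energy use intensity (annual site energy in kBtu divided by gross floor area in square feet). The `Building` fields are hypothetical stand-ins for the benchmarking columns, not the dataset's actual names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Building:
    borough: str
    site_energy_kbtu: float           # hypothetical: annual site energy use
    floor_area_sqft: float            # hypothetical: gross floor area
    energy_star_score: Optional[int]  # 1-100, None when the building is not scored


def eui(b: Building) -> float:
    """Energy use intensity in kBtu per square foot."""
    return b.site_energy_kbtu / b.floor_area_sqft


def avg_eui_by_borough(buildings: List[Building]) -> Dict[str, float]:
    """Unweighted mean EUI per borough, skipping rows with no floor area."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for b in buildings:
        if b.floor_area_sqft > 0:
            sums[b.borough] += eui(b)
            counts[b.borough] += 1
    return {k: sums[k] / counts[k] for k in sums}


def share_below(buildings: List[Building], cutoff: int = 50) -> float:
    """Fraction of scored buildings with an ENERGY STAR score under `cutoff`."""
    scored = [b.energy_star_score for b in buildings if b.energy_star_score is not None]
    return sum(s < cutoff for s in scored) / len(scored) if scored else 0.0
```

This averages building-level EUIs, which matches a DAX `AVERAGE` over an EUI column; dividing total energy by total floor area instead would weight large buildings more heavily.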

Customer Lifetime Value (CLV) Prediction (RFM · Python)

  • Top 10% of customers contributed over 40% of total revenue, highlighting the importance of retention.

  • Recency and Customer Age were among the most important predictors of future value.

  • XGBoost captured non-linear patterns but overfitted slightly; Linear Regression provided a robust, interpretable baseline.
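A minimal sketch of the RFM feature table and the "top 10% of customers" revenue share, assuming orders arrive as hypothetical `(customer_id, order_date, amount)` tuples. It covers only the feature and concentration step; the Customer Age feature and the XGBoost and Linear Regression models are not reproduced.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Order = Tuple[str, date, float]  # hypothetical layout: (customer_id, order_date, amount)


def rfm_table(orders: List[Order], as_of: date) -> Dict[str, Tuple[int, int, float]]:
    """Per customer: (recency in days, frequency = order count, monetary = total spend)."""
    last: Dict[str, date] = {}
    freq: Dict[str, int] = defaultdict(int)
    spend: Dict[str, float] = defaultdict(float)
    for cust, day, amount in orders:
        last[cust] = max(last.get(cust, day), day)
        freq[cust] += 1
        spend[cust] += amount
    return {c: ((as_of - last[c]).days, freq[c], spend[c]) for c in last}


def top_share(rfm: Dict[str, Tuple[int, int, float]], fraction: float = 0.10) -> float:
    """Share of total revenue contributed by the top `fraction` of customers by spend."""
    spends = sorted((m for _, _, m in rfm.values()), reverse=True)
    k = max(1, round(len(spends) * fraction))
    total = sum(spends)
    return sum(spends[:k]) / total if total else 0.0
```

The `(recency, frequency, monetary)` tuples can be fed directly to a regression model as features, with future spend as the target.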

LET'S WORK TOGETHER

Arjun Rathod

Hyderabad, Telangana · Open to GCC & startup roles
