ABOUT
DATA ANALYST
Data Analyst with 4+ years of experience building analytical systems,
SQL-based reporting frameworks, and ML models that turn data into
business decisions. Projects include fraud detection (AUC 0.96, 91%
precision), CLV modelling (top 10% of customers = 40% of revenue),
and sales analytics across 186,000+ records using Python, SQL, and
Power BI.
SKILLS
Areas of Expertise
• SQL (CTEs, Window Functions, Cohort Analysis, Aggregations)
• Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
• Power BI (DAX, Data Modelling, Interactive Dashboards)
• Machine Learning (XGBoost, Random Forest, Logistic Regression)
• NLP (TF-IDF, BERT, LSTM, Text Classification)
• Tableau · Looker Studio · Excel (Advanced)
• MySQL · BigQuery · Jupyter Notebook · Git
• Azure AI-900 (Score: 836) · RFM Analysis · A/B Testing
WORK EXPERIENCE
July 2022 – Present
STEAM Program Manager & Data Systems Lead, The Creative School, Bangalore
- Taught IGCSE Computer Science and Combined Science for 3 years and 4 months.
- Developed and maintained a Tinkering Lab spanning multiple technology platforms (micro:bit, Arduino, Tinkercad).
- Led teacher training programs covering three areas: coding instruction, project-based learning methodologies, and assessment design.
- Organized multiple science fairs and innovation showcases.
Educational Outreach & Volunteer Work
- Teach for India Facilitator supporting underserved communities for 11 months (March 2023 – February 2024).
- 2 months of field experience in urban slum communities with TYCIA Foundation (June – July 2022).
- 1+ year of research experience studying higher education accessibility for tribal youth (2020 – 2021).
Marketing & Social Media
- 2-month virtual internship in campaign strategy development and content creation (May – June 2021).
EDUCATION
Master of Arts in Development
Azim Premji University, Bangalore
Bachelor of Engineering in Electrical Engineering
Nagpur University, Maharashtra
CERTIFICATIONS
Microsoft Certified: Azure AI Fundamentals
The Sacred Classroom & Life and Living
Joy of Teaching – Pedagogical Frameworks for Alternative Education
PROJECTS
Sales Exploratory Data Analysis
- Merged and cleaned ~186,000 records from multiple CSV files using Python, ensuring 100% data integrity and consistency.
- Integrated MySQL and Excel with Power BI, automating sales data flow for real-time updates.
- Built an interactive Power BI dashboard used to extract key business insights across finance, sales, and supply chain domains.
- Identified December as the highest-grossing month and USB-C Charging Cable as the top-selling product with a sales share of 8.3%.
- Delivered actionable insights that could potentially improve gross margin by up to 10% through data-informed advertising and inventory decisions.
Car Insurance Fraud Detection
- Analyzed 50,000+ insurance records, identifying fraud patterns with a precision of 91%.
- Engineered new features such as claim timing analysis, improving fraud detection accuracy by 12%.
- Trained and validated models like Random Forest and XGBoost, selecting the best-performing model with an AUC of 0.96.
- Implemented scalable Python scripts for automated anomaly detection in live data feeds.
- Communicated findings through reports and dashboards, enabling real-time fraud flagging capability.
Credit Card Spending Habits in India (SQL)
- Conducted a deep-dive SQL analysis on credit card transactions to uncover consumer behavior patterns.
- Used CTEs, Window Functions (RANK, ROW_NUMBER, LAG), and Aggregations to answer business questions.
- Top cities by spend: Delhi and Mumbai.
- Bangalore reached 500 transactions fastest, indicating strong market adoption.
- Women accounted for more than 60% of spending on bills and groceries.
- Gold card usage was lowest in certain cities.
- Enabled data-driven strategies for marketing, fraud detection, and customer engagement.
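Illustrative sketch (not the original project code): one way to find which city reached a transaction milestone fastest is to number each city's transactions by date with ROW_NUMBER and compare the days elapsed since that city's first transaction. The example assumes SQLite through Python's built-in sqlite3 module rather than MySQL, a hypothetical `transactions(city, txn_date, amount)` table, made-up rows, and a threshold of 3 instead of 500 so the toy data stays small.

```python
import sqlite3

# Hypothetical toy data: (city, transaction date, amount). Not from the real dataset.
ROWS = [
    ("Delhi", "2014-01-01", 500), ("Delhi", "2014-01-10", 300),
    ("Delhi", "2014-01-20", 200),
    ("Bangalore", "2014-01-05", 100), ("Bangalore", "2014-01-06", 150),
    ("Bangalore", "2014-01-08", 120),
]

# CTE + window functions: number each city's transactions in date order
# and carry that city's first transaction date on every row.
QUERY = """
WITH numbered AS (
    SELECT city,
           txn_date,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY txn_date) AS txn_no,
           MIN(txn_date) OVER (PARTITION BY city) AS first_date
    FROM transactions
)
SELECT city,
       CAST(julianday(txn_date) - julianday(first_date) AS INTEGER) AS days_taken
FROM numbered
WHERE txn_no = :threshold
ORDER BY days_taken, city
"""

def days_to_reach(rows, threshold):
    """Return [(city, days from first to threshold-th transaction)], fastest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE transactions (city TEXT, txn_date TEXT, amount REAL)")
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, {"threshold": threshold}).fetchall()
    finally:
        conn.close()

result = days_to_reach(ROWS, threshold=3)
print(result)
```

Cities that never reach the threshold simply drop out of the result, so the first row is the fastest city among those that did.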
Spam Mail Classification
- Processed 5,500+ email samples, implementing tokenization, stemming, lemmatization, and stop word removal.
- Extracted features using TF-IDF and word embeddings, improving signal-to-noise ratio in the dataset by over 60%.
- Achieved 94%+ accuracy using models like Naive Bayes, Logistic Regression, and SVM.
- Trained LSTM and BERT models, improving F1-score by 15% compared to classical ML models.
- Compared models using precision, recall, and F1 metrics to ensure robustness and deployability.
NYC Building Energy Performance Analysis (Power BI)
- Cleaned and structured public benchmarking data from NYC Open Data.
- Designed a user-friendly Power BI dashboard with interactive slicers for borough, property type, and ENERGY STAR score range.
- Created KPI cards for average energy intensity, total emissions, and average ENERGY STAR score.
- Visualized trends by borough, property type, and year built using bar charts, pie charts, line graphs, and emission distributions.
- Manhattan had the highest average GHG emissions, driven by high-rise offices and hospitals.
- Medical and education facilities consume significantly more energy per square foot than residential buildings.
- Over 40% of buildings scored below 50 on the ENERGY STAR scale, indicating poor efficiency.
- High-performing buildings (score >75) were mostly newer (post-2000) or green-certified.
Customer Lifetime Value (CLV) Prediction (RFM · Python)
- Top 10% of customers contributed over 40% of total revenue, highlighting the importance of retention.
- Recency and Customer Age were among the most important predictors of future value.
- XGBoost captured non-linear patterns but overfitted slightly; Linear Regression provided a robust, interpretable baseline.
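Illustrative sketch (not the project code): the RFM features and the revenue-concentration figure behind findings like these can be computed in a few lines. The example uses only the Python standard library. The order records, the snapshot date, and the `rfm_table` and `top_share` helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (customer_id, order_date, amount). Not real project data.
ORDERS = [
    ("c1", date(2023, 1, 5), 400.0), ("c1", date(2023, 3, 1), 600.0),
    ("c2", date(2023, 2, 10), 50.0),
    ("c3", date(2023, 1, 20), 30.0), ("c3", date(2023, 2, 25), 20.0),
    ("c4", date(2022, 12, 1), 100.0),
]

def rfm_table(orders, snapshot):
    """Map customer_id -> (recency_days, frequency, monetary)."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, order_date, amount in orders:
        # Track the most recent order date seen for each customer.
        last_order[customer] = max(last_order.get(customer, order_date), order_date)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }

def top_share(rfm, fraction):
    """Share of total revenue from the top `fraction` of customers by spend."""
    spend = sorted((m for _, _, m in rfm.values()), reverse=True)
    n_top = max(1, round(len(spend) * fraction))  # always keep at least one customer
    return sum(spend[:n_top]) / sum(spend)

rfm = rfm_table(ORDERS, snapshot=date(2023, 3, 31))
print(rfm["c1"])
print(round(top_share(rfm, 0.25), 3))
```

Calling `top_share(rfm, 0.10)` on a full customer table gives the kind of "top 10% of customers" revenue share quoted above.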