All Projects
Data engineering pipelines, analytics systems, and machine learning solutions
Sekolah Mana? – Malaysian School Performance Data Pipeline
End-to-end data pipeline processing Malaysian school performance data using a Bronze-Silver-Gold architecture. Orchestrated with Apache Airflow, transformed with dbt, and stored in PostgreSQL. Processes 10,000+ schools and 5M+ student records with a 99.8% data quality score. Implements incremental loading, data quality checks, and dimensional modeling for analytics consumption.
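The incremental loading mentioned above can be illustrated with a high-water-mark approach. This is a minimal sketch, not the project's actual code: the field names `school_id` and `updated_at`, and the in-memory `target` dictionary standing in for a PostgreSQL table, are assumptions made for illustration.

```python
from datetime import datetime


def incremental_load(source_rows, target, watermark):
    """Upsert only rows changed since `watermark`; return (new_watermark, rows_loaded).

    `source_rows` is a list of dicts with hypothetical keys `school_id`
    and `updated_at`; `target` is a dict standing in for the target table.
    """
    # Select only rows modified after the last successful load.
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]

    # Upsert by key so re-delivered rows overwrite rather than duplicate.
    for row in new_rows:
        target[row["school_id"]] = row

    # Advance the watermark only when something was actually loaded.
    if new_rows:
        watermark = max(r["updated_at"] for r in new_rows)
    return watermark, len(new_rows)
```

Running the function a second time with the returned watermark loads nothing, which is the property that keeps repeated runs cheap and idempotent.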
Weather & Flight Data Engineering Pipeline
Real-time data pipeline integrating weather and flight data using Apache Spark for distributed processing. Polls APIs for live weather and flight status, processes 500K+ flight records monthly, and stores partitioned Parquet files in AWS S3. Uses dbt to create aggregated metrics, and achieves a 3x query performance improvement through an optimized partitioning strategy.
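To show why partitioning can speed up queries, here is a minimal sketch of Hive-style partition paths and partition pruning in plain Python. The project description does not state which partition keys were used; the `year`/`month` layout, the `flights` prefix, and the `flight_date` field are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def partition_path(flight_date, prefix="flights"):
    """Build a Hive-style partition path (hypothetical year/month layout)."""
    return f"{prefix}/year={flight_date.year}/month={flight_date.month:02d}"


def group_by_partition(records):
    """Group records by the partition they would be written to."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[partition_path(rec["flight_date"])].append(rec)
    return dict(partitions)


def prune(partitions, year, month):
    """Return only the records in the one partition a date filter needs."""
    return partitions.get(partition_path(date(year, month, 1)), [])
```

A query filtered on a single month touches one partition instead of scanning every file, which is the general mechanism behind gains of this kind.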
Food Price & Food Security Forecasting
End-to-end machine learning pipeline forecasting food price changes and the food security index in Malaysia. Integrated multi-source datasets including monthly food prices, aggregated daily climate data (temperature and rainfall), and food production statistics. Implemented and evaluated LSTM, Random Forest, SVR, XGBoost, and LightGBM models. Developed an interactive Streamlit application that supports manual input and CSV uploads, stores predictions, and visualizes results.
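Joining daily climate data to monthly prices requires bringing both to the same granularity. The sketch below shows one way this could be done. The aggregation choices (mean temperature, total rainfall) and the field names `date`, `temp_c`, and `rain_mm` are assumptions; the project description only says the daily data was aggregated.

```python
from collections import defaultdict
from statistics import mean


def monthly_climate(daily):
    """Aggregate daily readings to months keyed as 'YYYY-MM'.

    `daily` is a list of dicts with hypothetical keys:
    'date' (ISO 'YYYY-MM-DD' string), 'temp_c', and 'rain_mm'.
    """
    buckets = defaultdict(list)
    for reading in daily:
        buckets[reading["date"][:7]].append(reading)
    return {
        month: {
            "mean_temp_c": mean(r["temp_c"] for r in rows),
            "total_rain_mm": sum(r["rain_mm"] for r in rows),
        }
        for month, rows in buckets.items()
    }


def join_features(prices, climate):
    """Inner-join monthly prices (dict of month -> price) with climate features."""
    return [
        {"month": month, "price": price, **climate[month]}
        for month, price in sorted(prices.items())
        if month in climate
    ]
```

Months that appear in only one source are dropped by the inner join, so the resulting rows have no missing features.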
Malaysian Affordability Gap Analysis
Comprehensive data analysis project examining income versus household spending across Malaysian states to quantify affordability gaps and financial pressure. Identified states with high expenditure-to-income ratios, low disposable income, and elevated financial stress levels. Categorized regions by affordability risk to highlight economic disparities. Designed professional statistical visualizations with clear color coding and annotations, translating economic data into actionable insights for policy evaluation and business decision-making.
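The core metrics described above can be sketched in a few lines. The risk thresholds (0.8 and 0.6) and the category labels are hypothetical values chosen for illustration, not the cut-offs used in the analysis.

```python
def affordability(income, expenditure, high=0.8, moderate=0.6):
    """Compute the expenditure-to-income ratio, disposable income, and a risk label.

    The `high` and `moderate` thresholds are illustrative assumptions.
    """
    if income <= 0:
        raise ValueError("income must be positive")

    ratio = expenditure / income
    if ratio >= high:
        risk = "high"
    elif ratio >= moderate:
        risk = "moderate"
    else:
        risk = "low"

    return {
        "ratio": round(ratio, 3),
        "disposable": income - expenditure,
        "risk": risk,
    }
```

Applied per state to monthly household figures, this yields one row per state that can be sorted by ratio or grouped by risk category for plotting.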
dbt Transformation Models - ResMed
Production-grade dbt transformation models built at ResMed to convert Oracle ERP datasets into Snowflake analytics tables. Standardized SQL logic using Jinja templates, improving reusability and maintainability across finance, supply chain, and marketing analytics teams. Participated in Agile sprints with Git-based version control and CI/CD deployment cycles. Explored Dagster-based orchestration for automated pipeline scheduling. Supported cross-functional analytics teams with clean, well-documented dimensional models following the Kimball methodology.
Banking ETL Pipelines - InsiteMY
Robust ETL pipelines built using SSIS and SQL Server for major banking clients including BIMB and AmBank. Delivered critical reporting systems: Data Quality Framework (DQF), STATSMART, and Information Security System (ISS). Translated complex business requirements into efficient SQL queries and Java modules, ensuring high performance and reliability. Contributed to regulatory reporting for Bank Negara Malaysia, maintaining compliance and data accuracy in high-stakes financial environments. Implemented comprehensive error handling and logging mechanisms.