Data Engineering Pipeline

Sekolah Mana? – Malaysian School Performance Data Pipeline

End-to-end data pipeline processing Malaysian school performance data using a Bronze-Silver-Gold architecture. Orchestrated with Apache Airflow, transformed with dbt, and stored in PostgreSQL. Processes 10,000+ schools and 5M+ student records with a 99.8% data quality score. Implements incremental loading, data quality checks, and dimensional modeling for analytics consumption.
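As a rough illustration of incremental loading and a data quality check, the sketch below uses a timestamp watermark and a simple completeness score. It is not the project's actual code: the field names (`school_id`, `state`, `updated_at`), the sample records, and the definition of the score are all assumptions made for the example.

```python
from datetime import datetime


def select_new_rows(rows, watermark):
    """Return rows updated strictly after the watermark, plus the new watermark."""
    fresh = [r for r in rows if r["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark


def quality_score(rows, required_fields):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 1.0
    valid = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in rows
    )
    return valid / len(rows)


# Hypothetical sample records, not real school data.
rows = [
    {"school_id": "S001", "state": "Selangor", "updated_at": datetime(2024, 1, 1)},
    {"school_id": "S002", "state": "", "updated_at": datetime(2024, 1, 5)},
    {"school_id": "S003", "state": "Johor", "updated_at": datetime(2024, 1, 9)},
]

fresh, watermark = select_new_rows(rows, datetime(2024, 1, 3))
score = quality_score(fresh, ["school_id", "state"])
```

Persisting the returned watermark between runs is what lets each run process only the rows that changed since the previous one.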

Apache Airflow dbt PostgreSQL Python Docker AWS S3
Data Engineering Pipeline

Weather & Flight Data Engineering Pipeline

Real-time data pipeline integrating weather and flight data using Apache Spark for distributed processing. Polls APIs for live weather and flight status, processes 500K+ flight records monthly, and stores partitioned Parquet files in AWS S3. Uses dbt to build aggregated metrics and achieves a 3x query performance improvement through an optimized partitioning strategy.
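The description does not say which columns the data is partitioned on. As a minimal sketch, assuming partitioning by flight date with Hive-style `key=value` path segments (a common layout for Parquet in object storage), records could be grouped into partition prefixes like this. The record fields and flight numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_key(flight_date):
    """Build a Hive-style partition prefix from a date (assumed partition column)."""
    return (
        f"year={flight_date.year}"
        f"/month={flight_date.month:02d}"
        f"/day={flight_date.day:02d}"
    )


def group_by_partition(records):
    """Group records by the partition prefix they would be written under."""
    groups = defaultdict(list)
    for rec in records:
        groups[partition_key(rec["flight_date"])].append(rec)
    return dict(groups)


# Hypothetical sample records.
records = [
    {"flight": "MH001", "flight_date": date(2024, 3, 1)},
    {"flight": "AK502", "flight_date": date(2024, 3, 1)},
    {"flight": "OD110", "flight_date": date(2024, 3, 2)},
]

partitions = group_by_partition(records)
```

With a layout like this, a query filtered on date only needs to read the matching prefixes instead of scanning every file, which is the usual source of partition-related speedups.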

Apache Airflow Apache Spark dbt AWS S3 Python Docker
Machine Learning Pipeline

Food Price & Food Security Forecasting

End-to-end machine learning pipeline forecasting food price changes and the food security index in Malaysia. Integrated multi-source datasets including monthly food prices, aggregated daily climate data (temperature and rainfall), and food production statistics. Implemented and evaluated LSTM, Random Forest, SVR, XGBoost, and LightGBM models. Developed an interactive Streamlit application supporting manual input and CSV uploads, with prediction storage and visualization.
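The description does not state how the models were evaluated. One common setup for forecasting is a chronological hold-out scored with MAE and RMSE, sketched below under that assumption. A naive last-value forecast stands in for the real models here, and the price series is made up for illustration.

```python
import math


def chronological_split(series, test_size):
    """Hold out the last `test_size` points; no shuffling, to avoid leaking the future."""
    return series[:-test_size], series[-test_size:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical monthly price series.
prices = [10.0, 11.0, 12.0, 13.0, 15.0, 14.0]
train, test = chronological_split(prices, 2)

# Naive baseline: repeat the last training value for every test month.
naive_forecast = [train[-1]] * len(test)

baseline_mae = mae(test, naive_forecast)
baseline_rmse = rmse(test, naive_forecast)
```

Any candidate model can be scored with the same two functions, and a model that cannot beat the naive baseline is usually not worth deploying.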

Python LSTM Random Forest XGBoost LightGBM Streamlit Snowflake pandas scikit-learn
Data Analytics

Malaysian Affordability Gap Analysis

Data analysis project examining income versus household spending across Malaysian states to quantify affordability gaps and financial pressure. Identified states with high expenditure-to-income ratios, low disposable income, and elevated financial stress levels. Categorized regions by affordability risk to highlight economic disparities. Designed statistical visualizations with clear color coding and annotations, translating economic data into actionable insights for policy evaluation and business decision-making.
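The metrics named above can be written down directly: the expenditure-to-income ratio is expenditure divided by income, and disposable income is income minus expenditure. The sketch below computes both and assigns a risk tier. The thresholds (0.80 and 0.65) and the sample figures are assumptions for illustration, not values from the analysis.

```python
def affordability(income, expenditure, high=0.80, moderate=0.65):
    """Compute affordability metrics and a risk tier (thresholds are assumed)."""
    ratio = expenditure / income
    if ratio >= high:
        risk = "high"
    elif ratio >= moderate:
        risk = "moderate"
    else:
        risk = "low"
    return {
        "ratio": round(ratio, 2),
        "disposable": income - expenditure,
        "risk": risk,
    }


# Placeholder regions with made-up monthly figures (same currency unit).
households = {
    "State A": (5000, 4200),
    "State B": (8000, 5600),
    "State C": (10000, 6000),
}

results = {
    name: affordability(income, expenditure)
    for name, (income, expenditure) in households.items()
}
```

Keeping the thresholds as parameters makes it easy to test how sensitive the risk categories are to where the cut-offs are drawn.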

Python pandas matplotlib seaborn Statistical Analysis
Analytics Engineering

dbt Transformation Models - ResMed

Production-grade dbt transformation models built at ResMed, converting Oracle ERP datasets into Snowflake analytics tables. Standardized SQL logic using Jinja templates, improving reusability and maintainability across finance, supply chain, and marketing analytics teams. Participated in Agile sprints with Git-based version control and CI/CD deployment cycles. Explored Dagster-based orchestration for automated pipeline scheduling. Supported cross-functional analytics teams with clean, well-documented dimensional models following the Kimball methodology.

dbt Snowflake Jinja SQL Oracle Git Dagster Agile
ETL Pipeline

Banking ETL Pipelines - InsiteMY

Robust ETL pipelines built using SSIS and SQL Server for major banking clients including BIMB and AmBank. Delivered critical reporting systems: Data Quality Framework (DQF), STATSMART, and Information Security System (ISS). Translated complex business requirements into efficient SQL queries and Java modules ensuring high performance and reliability. Contributed to regulatory reporting for Bank Negara Malaysia, maintaining compliance and data accuracy in high-stakes financial environments. Implemented comprehensive error handling and logging mechanisms.
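The pipelines themselves were built with SSIS, SQL Server, and Java. Purely to illustrate the general error handling and logging pattern, here is a small Python sketch in which bad rows are logged and set aside rather than aborting the whole load. The function names, row fields, and sample values are all hypothetical.

```python
import logging

logger = logging.getLogger("etl")


def load_rows(rows, transform):
    """Transform each row; log and collect failures instead of stopping the run."""
    loaded, rejected = [], []
    for index, row in enumerate(rows):
        try:
            loaded.append(transform(row))
        except (KeyError, ValueError) as exc:
            # Keep the failing row and its position for later review.
            logger.warning("row %d rejected: %r", index, exc)
            rejected.append((index, row))
    logger.info("loaded=%d rejected=%d", len(loaded), len(rejected))
    return loaded, rejected


def to_balance(row):
    """Hypothetical transform: require an account and a numeric balance."""
    return {"account": row["account"], "balance": float(row["balance"])}


# Made-up input: one clean row, one bad value, one missing field.
source = [
    {"account": "A1", "balance": "100.50"},
    {"account": "A2", "balance": "n/a"},
    {"balance": "5"},
]

loaded, rejected = load_rows(source, to_balance)
```

Catching only the expected exception types keeps genuine bugs visible, while the rejected list gives a reconcilable audit trail of what was not loaded.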

SSIS SQL Server ETL SQL Java Banking Domain Regulatory Compliance