Computational methods for social science, economics, and machine learning.
I am a predoctoral researcher in the Department of Economics at the National University of Singapore. My research sits at the intersection of empirical economics, machine learning, and computational social science, using large-scale data, causal inference, and deep learning to study media, political economy, labor markets, and cultural dynamics. I hold double honours degrees in Data Science & Analytics and Economics from NUS, where I was awarded the Paul Sherwood Memorial Gold Medal for Best Graduate in Economics and the Lijen Industrial Development Medal for Best Honours Project.
My work draws on tools from econometrics, deep learning, natural language processing, and network analysis to study questions in political economy, media, and computational social science. I am interested in how information environments shape political and economic outcomes, and in building the quantitative infrastructure necessary to answer those questions rigorously.
At NUS, I have worked across several domains: constructing novel datasets on international media coverage, labor markets, and judicial decision-making; designing field experiments to study political preferences; and developing Bayesian adaptive methods for clinical trials in collaboration with Procter & Gamble. I also work on deep learning for audio and 3D vision, and have contributed to software that supports statistical pedagogy and automated assessment.
Outside the laboratory, I am committed to open science and education. I have organized workshops on generative AI, computer vision, and network analysis, mentored students in data analytics and programming, and published over 25 articles on data science and statistics with more than 100,000 total reads.
My research weaves together several intellectual threads that share a common ambition: to bring high-resolution data and rigorous quantitative methods to bear on questions about how societies organize, communicate, and make decisions.
How do media institutions frame political events, and what are the downstream effects on public opinion and political outcomes? I construct and analyze large-scale cross-national media datasets to study coverage patterns, coded rhetoric, and information asymmetries.
I use NLP, network analysis, and causal inference to study social phenomena at scale, including music diffusion across cultures, judicial decision-making, charitable giving, and labor market dynamics, building datasets and methods that enable systematic empirical inquiry.
My methods work spans complex-valued neural networks for audio processing, 3D scene reconstruction (NeRF, SLAM), computer vision for surveillance, and LLM-based text classification. I am interested in adapting deep learning architectures to structured, domain-specific problems.
I develop and evaluate Bayesian adaptive trial designs, including normalized power priors and meta-analytic-predictive priors, and contribute statistical software for copula-based dependence modeling and reproducible fairness benchmarking.
I develop signaling and game-theoretic models for settings where information disclosure is strategic, including data markets and urban real estate, deriving equilibrium conditions and analyzing policy levers that improve allocative efficiency.
I routinely apply difference-in-differences, instrumental variables, regression discontinuity, and high-dimensional fixed effects to identify causal effects in observational and quasi-experimental data across domains including finance, labor, and judicial behavior.
Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications (Forthcoming)
To be presented at the IEEE Conference on Artificial Intelligence (CAI 2026), Granada, Spain.
Proposes complex-valued CNN architectures, with complex batch normalization, principled initialization schemes, and a comparative study of complex activation functions, to exploit phase information typically discarded by real-valued networks. Evaluated across three stages: image classification baselines, MFCC-based audio classification, and phase-aware graph neural networks. Results demonstrate that complex architectures can match real-valued baselines while offering improved handling of phase structure in audio pipelines, with a careful analysis of stability–expressivity trade-offs across activation choices.
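To illustrate what a complex-valued convolution involves, the sketch below builds one from four real-valued convolutions, which is the standard algebraic decomposition. It is a minimal 1D illustration, not the paper's architecture; the function names and toy values are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation on plain Python sequences."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Apply a complex kernel to a complex signal using four real convolutions.

    (x_re + i*x_im) * (w_re + i*w_im)
        = (x_re*w_re - x_im*w_im) + i*(x_re*w_im + x_im*w_re)
    """
    rr, ii = conv1d(x_re, w_re), conv1d(x_im, w_im)
    ri, ir = conv1d(x_re, w_im), conv1d(x_im, w_re)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


# Toy input (hypothetical values): real and imaginary parts kept separately.
x_re, x_im = [1, 2, 3, 4], [0, 1, 0, -1]
w_re, w_im = [1, -1], [2, 0]
out_re, out_im = complex_conv1d(x_re, x_im, w_re, w_im)

# Cross-check against Python's built-in complex arithmetic.
x = [complex(a, b) for a, b in zip(x_re, x_im)]
w = [complex(a, b) for a, b in zip(w_re, w_im)]
reference = conv1d(x, w)
```

Keeping real and imaginary parts as separate channels is what lets such a layer run on real-valued tensor libraries while still mixing magnitude and phase.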
Analysis of Student–LLM Interaction in a Software Engineering Project (Published)
IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), ICSE 2025, Ottawa, Canada.
Tracked 126 undergraduate students over a 13-week semester, analyzing conversational logs, 730 LLM-generated code snippets, and code complexity metrics including cyclomatic complexity, control flow graph depth, and Halstead effort. Shows that dialogue-based ChatGPT interactions produce shorter, structurally simpler code than GitHub Copilot while still satisfying project requirements, suggesting that conversational prompting reduces code complexity.
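For readers unfamiliar with cyclomatic complexity, the sketch below approximates it for Python source with the standard `ast` module: one plus the number of decision points. The study's own tooling and target language are not described here, so this is only an assumed illustration, and the set of counted node types is a simplification.

```python
import ast

# Node types that each add one decision point (a simplified McCabe count).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler,
    ast.IfExp, ast.Assert, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


# Two hypothetical snippets: straight-line code versus loop plus compound test.
simple = "def f(x):\n    return x + 1\n"
branchy = (
    "def g(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x > 0 and x % 2 == 0:\n"
    "            total += x\n"
    "    return total\n"
)

scores = {
    "simple": cyclomatic_complexity(simple),
    "branchy": cyclomatic_complexity(branchy),
}
```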
covalchemy: R Package for Constructing Joint Distributions with Control over Statistical Properties
R Package
Provides tools for constructing multivariate distributions that preserve empirical marginals while allowing precise control over correlation structure, mutual information, and higher-order dependence via copula-based methods. Benchmark experiments demonstrate marginal fidelity with less than 1% error. Designed for research applications and statistical pedagogy, including as an engine for reproducible Simpson's paradox benchmarking.
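The core idea of pairing empirical marginals with a chosen dependence structure can be sketched with a Gaussian copula. The example is in Python rather than R and does not reproduce the covalchemy API; the function names and the choice of a Gaussian copula are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def empirical_quantile(sorted_sample, u):
    """Inverse empirical CDF: map u in [0, 1] to an observed value."""
    idx = min(int(u * len(sorted_sample)), len(sorted_sample) - 1)
    return sorted_sample[idx]


def gaussian_copula_sample(x_obs, y_obs, rho, n, seed=0):
    """Draw n pairs whose marginals come from x_obs and y_obs and whose
    dependence comes from a Gaussian copula with latent correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    xs, ys = sorted(x_obs), sorted(y_obs)
    pairs = []
    for _ in range(n):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Map to uniforms, then back through each empirical marginal.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((empirical_quantile(xs, u1), empirical_quantile(ys, u2)))
    return pairs


# Hypothetical observed marginals.
x_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_obs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
pairs = gaussian_copula_sample(x_obs, y_obs, rho=0.9, n=2000)
```

Every sampled value is drawn from the observed data, so the marginals are preserved by construction, while `rho` alone controls how strongly the two coordinates move together.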
autoharp: R Package for Semi-Automatic Grading of R/Rmd Scripts
R Package
A customizable toolkit for assessing student R and R Markdown submissions, offering automated correctness checks, run-time profiling, and static code analysis. Supports educator workflows with functions to render and evaluate scripts, extract code structure, and generate comprehensive diagnostic reports.
Cultural Flows in the Digital Age: Evidence from YouTube's Global Music Charts (Working Paper)
Analyzes 1.26 million weekly entries from YouTube Music's Top-100 charts across 60 countries (2021–2025) to map global music circulation. Graph-based clustering reveals three stable listening blocs: a cohesive Latin American community, a broad global mainstream cluster linking Europe and Asian markets, and a distinct East African group. Bilateral similarity regressions confirm that shared language is the strongest predictor of musical affinity, with asymmetric diffusion networks positioning the United States as the central hub and Mexico as the pivotal conduit for Spanish-language content.
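One simple way to quantify bilateral chart similarity is the Jaccard overlap between two countries' chart entries. The paper's actual similarity measure is not stated here, so Jaccard is an assumption, and the country codes and track identifiers below are placeholders.

```python
def jaccard(a, b):
    """Share of distinct tracks that appear on both charts."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_similarity(charts):
    """Jaccard similarity for every unordered pair of countries."""
    countries = sorted(charts)
    return {
        (c1, c2): jaccard(charts[c1], charts[c2])
        for i, c1 in enumerate(countries)
        for c2 in countries[i + 1:]
    }


# Placeholder weekly charts keyed by placeholder country codes.
charts = {
    "AA": ["t1", "t2", "t3", "t4"],
    "BB": ["t3", "t4", "t5", "t6"],
    "CC": ["t7", "t8", "t1", "t9"],
}
similarity = pairwise_similarity(charts)
```

A matrix of such pairwise scores can serve as the edge weights of a country graph for clustering, or as the dependent variable in a bilateral regression.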
Seventy-Five Years of the Supreme Court of India: An Empirical Analysis of Judicial Decision-Making, Citation Networks, and Institutional Evolution (Working Paper)
Constructs a novel dataset of 42,833 judgments with structured metadata extracted using large language models, covering party characteristics, legal doctrines, constitutional provisions, precedent citations, bench composition, and case outcomes. Estimates determinants of appellant success using OLS, logit, probit, and high-dimensional fixed effects, with causal identification through IV, RDD, and DiD designs. Maps citation networks to identify the 50 most influential precedents and detect doctrinal communities via Louvain clustering.
Crisis, Compassion, and Capital: The Economics of Global Charitable Giving (Working Paper)
Examines nearly 50,000 projects across 201 countries representing over $575 million in charitable donations (2002–2025). Using the Russian invasion of Ukraine as a natural experiment through difference-in-differences and event study designs, finds 300%+ funding surges for Ukraine-linked projects with evidence of both additionality and substitution effects. Text analysis reveals that urgency keywords increase funding by 35–40% and life-saving language by 89%, operating primarily through new donor acquisition. Documents substantial geographic inequality, with North American projects receiving 2–3× more funding than African counterparts.
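The logic of a difference-in-differences comparison can be shown with the basic two-group, two-period estimator. The numbers below are invented for illustration and are not taken from the paper, which uses richer regression and event-study specifications.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Invented funding figures per project: (treated, post, funding).
rows = [
    (True, False, 10), (True, False, 12),   # treated, before the event
    (True, True, 40), (True, True, 44),     # treated, after the event
    (False, False, 9), (False, False, 11),  # control, before the event
    (False, True, 12), (False, True, 14),   # control, after the event
]
effect = did_estimate(rows)
```

Subtracting the control group's change removes any trend common to both groups, which is why the estimate relies on the parallel-trends assumption.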
Copula-Based Approaches to Engineering Simpson's Paradox and Applications to Fairness Evaluation (Working Paper)
Presents a reproducible copula-based pipeline, using quantile inverse-CDF mapping, targeted copula dependence, assignment optimization, and simulated annealing, to generate tunable Simpson's paradox datasets for algorithmic fairness benchmarking. Demonstrates that pooled estimators can mask severe subgroup failures, providing a practical testbed for fairness diagnostics. Implementation is available in the covalchemy R package.
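A small numerical example makes the masking effect concrete. The counts are toy values chosen to produce a reversal, not output from the pipeline, and the model and group names are placeholders.

```python
# Toy counts: (correct predictions, total cases) per model and subgroup.
counts = {
    "model_a": {"group_1": (81, 87), "group_2": (192, 263)},
    "model_b": {"group_1": (234, 270), "group_2": (55, 80)},
}


def subgroup_rates(model):
    """Accuracy within each subgroup."""
    return {g: ok / total for g, (ok, total) in counts[model].items()}


def pooled_rate(model):
    """Accuracy after pooling all subgroups together."""
    ok = sum(c[0] for c in counts[model].values())
    total = sum(c[1] for c in counts[model].values())
    return ok / total


a_groups, b_groups = subgroup_rates("model_a"), subgroup_rates("model_b")
a_pooled, b_pooled = pooled_rate("model_a"), pooled_rate("model_b")

# model_a is better in every subgroup, yet model_b looks better when pooled.
a_wins_everywhere = all(a_groups[g] > b_groups[g] for g in a_groups)
b_wins_pooled = b_pooled > a_pooled
```

The reversal arises because the two models see very different subgroup mixes, so the pooled average weights the easy and hard groups differently for each model.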
Equilibrium Analysis for Strategic Information Revelation in Data Markets (Working Paper)
Develops a tractable signaling model in which data sellers choose how much of a dataset to disclose and buyers update beliefs accordingly. Derives conditions for pooling versus separating equilibria and uses simulations to show how disclosure costs and prior beliefs shape market vulnerability to adverse selection. Highlights policy implications including standardized disclosure requirements, certification mechanisms, and third-party audits.
Rigidity as a Signal: A Game-Theoretic Explanation for Persistent Vacancies in Urban Real Estate Markets (Working Paper)
Models vacancy as a deliberate, costly signal in a two-period game with heterogeneous sellers, deriving multiple equilibrium types: pooling-holdout, pooling-discount, and two separating forms. Analyzes comparative statics and demonstrates how targeted policy levers, including vacancy taxes and price-ratio mandates, can shift equilibria toward more informative outcomes and improve allocative efficiency in urban housing markets.
Department of Economics, National University of Singapore
Working with Dr. Ruben Durante and Dr. Xiaoyue Shan on projects spanning media economics, political economy, and labor markets. I have built a large-scale dataset of 200,000+ newspaper articles from 40+ outlets across 15+ countries on Gaza media coverage (2004–2024), automating scraping, metadata structuring, and regression-based trend analysis. For a labor market study on BDJobs, I developed automation tools for bulk job applications deployed across 10,000+ applications to study how application patterns relate to gender and salary expectations. I have also constructed longitudinal street-name data for all Spanish municipalities (2000–2024) using OpenStreetMap, geospatial matching, named entity recognition, and Wikidata linkage, to study symbolic policy through urban renaming. Additional contributions include field experiment infrastructure on Upwork (20+ task websites, automated profile scraping, political preference surveys), LLM-based detection of dog-whistle political rhetoric in campaign advertisements, and deep learning pipelines for PhD supervisor–student matching and peer effects.
Procter & Gamble & Institute of Mathematical Sciences, Singapore
Collaborated with P&G researchers to develop adaptive algorithms for leveraging historical data in randomized controlled trials, minimizing control-group sample sizes while maintaining Type I and Type II error control. Optimized a suite of Bayesian borrowing priors, including normalized power priors, commensurate priors, elastic priors, and meta-analytic-predictive priors, reducing computational overhead by up to 78% and improving borrowing accuracy in low-congruence trial settings.
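The mechanics of borrowing from historical controls are easiest to see in the conjugate Beta-binomial case with a fixed discount parameter. This is the simplest power prior, not the normalized power prior or the other priors named above, and it is not the project's code; all names and numbers are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, a0, a=1.0, b=1.0):
    """Beta posterior for a response rate under a fixed-a0 power prior.

    y, n   : responders and sample size in the current control arm
    y0, n0 : responders and sample size in the historical control arm
    a0     : discount in [0, 1]; 0 ignores history, 1 pools it fully
    a, b   : parameters of the initial Beta(a, b) prior
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha = a + a0 * y0 + y
    beta = b + a0 * (n0 - y0) + (n - y)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical trial: 12/40 current responders, 30/60 historical responders.
no_borrowing = power_prior_posterior(12, 40, 30, 60, a0=0.0)
half_borrowing = power_prior_posterior(12, 40, 30, 60, a0=0.5)
full_borrowing = power_prior_posterior(12, 40, 30, 60, a0=1.0)
```

Raising `a0` pulls the posterior mean toward the historical response rate and adds effective sample size, which is how borrowing can shrink the required control group when the historical data are congruent.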
School of Computing, National University of Singapore
Examined inter-firm alliance networks and their temporal effects on profitability using Louvain community detection and network centrality measures, providing strategic insights into how dynamic network structures shape long-term firm performance. Also explored ensemble methods and base learners for XGBoost-based financial fraud detection pipelines, achieving 99% classification accuracy, validated through statistical inference and model evaluation.
Singapore Bus Services Transit (SBST), Singapore
Researched and developed a system for detecting image-quality impairments, including blur and overexposure, in surveillance footage using deep convolutional neural networks, achieving 96% classification accuracy. Implemented a Siamese network using SVD maps and Fourier-transform features with OpenCV, TensorFlow, and Keras for robust blur detection across varying video conditions.
Jio AI Centre for Excellence, India
Researched algorithms for Simultaneous Localization and Mapping (SLAM), including attention-based graph feature matching (SuperGlue), 3D reconstruction with Neural Radiance Fields (NeRF), and monocular depth estimation (ZoeDepth). Experimented with alternative training paradigms, including Forward-Forward training and Predictive Coding, achieving a 3× performance improvement through efficient gradient computation and enhanced scalability.
A.P. Moller – Maersk (APM Terminals), Singapore
Developed an Electronic Virtual Management System (EVMS) for APM Terminals, integrating 20+ databases to enable automated report generation, real-time operational dashboards, and systematic KPI tracking across logistics, finance, and risk management functions.
I served as an Undergraduate Teaching Assistant across the Departments of Computer Science and Economics at NUS from July 2022 to July 2025, and received multiple teaching excellence awards during that period.
Workshops organized: Generative AI for Business · Computer Vision with OpenCV · Network Analysis with tidygraph (R User Conference) · Introductory & Advanced LaTeX · NUS SoC Summer Workshops
AI · Finalists, NCS Innovation Challenge
TrafficAI: Generative AI for Urban Traffic
An AI-powered traffic control center using Flask and Gemini to process over one million daily data points from real-time sensors, cameras, GPS feeds, and social media. Implemented an active response system for accidents and disruptions.
Simulation · Software Engineering
OptPark: NUS Car Park Optimization
A parking optimization application in R Shiny using discrete event simulation and statistical modeling, deployed on AWS with Docker. Manages seven university car parks with real-time optimization and a full backend–frontend integration.
Geospatial Analysis
Grab-Posisi Traffic Analysis
Analyzed 80 million GPS pings from the Grab-Posisi dataset using GeoPandas and Folium to identify urban traffic bottlenecks and examine the influence of taxi stands, HDB construction sites, and parking infrastructure on traffic flow.
Empirical Finance · Fixed Income
Yield Curve Fitting under Negative Rates
Empirical evaluation of Nelson–Siegel–Svensson and cubic spline yield curve models using 20+ years of daily government bond data from the Eurozone and Japan under negative interest rate regimes, with out-of-sample forecasting and bootstrapping.
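For reference, the Nelson–Siegel–Svensson functional form can be written as a short function. The parameter values are hypothetical and chosen only to show a negative short rate; the project's estimation procedure is not reproduced here.

```python
import math


def _loading(x):
    """(1 - exp(-x)) / x, computed stably for small positive x."""
    return -math.expm1(-x) / x


def nss_yield(tau, beta0, beta1, beta2, beta3, lambda1, lambda2):
    """Nelson-Siegel-Svensson zero yield at maturity tau > 0 (in years).

    beta0 is the long-run level, beta1 the slope, and beta2 and beta3
    scale two hump terms with decay parameters lambda1 and lambda2.
    """
    x1, x2 = tau / lambda1, tau / lambda2
    slope = _loading(x1)
    hump1 = _loading(x1) - math.exp(-x1)
    hump2 = _loading(x2) - math.exp(-x2)
    return beta0 + beta1 * slope + beta2 * hump1 + beta3 * hump2


# Hypothetical parameters (yields in percent); short rate is beta0 + beta1.
params = dict(beta0=1.5, beta1=-2.0, beta2=1.0, beta3=0.5,
              lambda1=2.0, lambda2=10.0)
curve = {tau: nss_yield(tau, **params) for tau in (0.25, 1, 2, 5, 10, 30)}
```

As maturity approaches zero the yield tends to `beta0 + beta1`, and at very long maturities it tends to `beta0`, so a negative short end with a positive long end only requires `beta1 < -beta0 < 0`.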
| Award | Year |
| --- | --- |
| Paul Sherwood Memorial Gold Medal (Best Graduate in Economics) | 2025 |
| Lijen Industrial Development Medal (Best Honours Project) | 2025 |
| Sugar Industry of Singapore Prize (Best Performance, Faculty of Science) | 2022 |
| NUS Dean's List & Dean's Scholars List | 2022–25 |
| Undergraduate Teaching Excellence Award, Department of Economics | 2024, 2025 |
| School of Computing Honour List of Tutors | 2024, 2025 |
| Department of Statistics & Data Science Teaching Assistant Award | 2024, 2025 |
| Top Student Awards: Data Structures & Algorithms, AI, Database Systems | 2023–25 |
| National Economics Olympiad (SRCC), All India Rank 2 | 2020 |
I welcome inquiries from researchers, collaborators, and prospective students. The best way to reach me is by email.
Department of Economics
National University of Singapore
21 Lower Kent Ridge Road
Singapore 119077