Machine Learning and the Propensity Score

Theory and Application in Fair Trade Coffee Certification Schemes

Author: Mitchell Cameron

Published: August 9, 2024

Abstract

This project explores and demonstrates the estimation of propensity scores and the integration of machine learning techniques to enhance their accuracy. Propensity scores are a crucial tool in observational studies for reducing bias when estimating causal effects. By leveraging machine learning algorithms, this work aims to improve the precision of propensity score estimates, thereby enhancing the reliability of causal inferences. The project first presents a theoretical discussion of propensity score estimation, detailing the limitations of traditional methods and the potential benefits of applying machine learning approaches. In particular, gradient boosting machines have strong theoretical advantages for probability prediction. Second, a practical, coded tutorial offers a step-by-step guide to implementing machine learning techniques for propensity score estimation. This tutorial serves as a resource for researchers and practitioners who wish to apply these methods in their own work. Finally, the project includes a replication study investigating the effects of Fair Trade coffee certification on producers’ incomes. Using the enhanced propensity score methods discussed earlier, the study re-examines previous findings, offering new insights into the impact of Fair Trade certification. The results underscore the importance of accurate propensity score estimation in evaluating causal relationships in observational data.

Preface

This project is submitted in partial fulfillment of the requirements for the Master of Applied Science in Statistics at the University of Otago. My academic background in economics and politics sparked an enduring interest in causal inference, particularly its role in shaping evidence-based policymaking. However, my focus has evolved to include machine learning — a field that, while not traditionally central to economics, offers powerful tools for refining causal analysis.

The motivation for this project stems from my desire to bridge the gap between propensity score methods and machine learning. While the literature is rich with simulation studies, there is a noticeable lack of comprehensive, tutorial-style resources that guide readers through the application of machine learning to propensity score estimation. This project aims to fill that void, providing a practical and accessible exploration of these techniques with approachable theoretical discussion and coded examples in R. As I am a dedicated user of R, all the packages and methodologies discussed in this project come from the R ecosystem, although comparable tools exist in other languages and software.

The intended audience for this work includes individuals with a foundational understanding of causal inference and machine learning, particularly those interested in enhancing propensity score models. However, my discussion extends beyond propensity scores; much of what is covered is relevant to anyone using machine learning for probability prediction.

Acknowledgments

I would like to express my deepest gratitude to several individuals whose support, guidance, and encouragement have been invaluable throughout the course of this project.

First and foremost, I extend my sincere thanks to Associate Professor Matthew Parry, my supervisor. His Socratic method of teaching has fostered not only my academic growth but also my critical thinking abilities. I am particularly grateful for his flexibility and understanding as I navigated through various iterations and changes in the direction of this project. His patience and positivity, even as the project’s focus shifted, were invaluable, and I deeply appreciate his guidance throughout this journey.

I am also grateful to Dr Conor Kresin, whose knowledge of causal inference provided helpful background for this project. Additionally, my thanks go to Dr Falco J. Bargagli Stoffi for his time and correspondence during earlier iterations of this project.

To my fellow students — Bethany, Jess, Shalini, Sarah, Maryam, Anna, Alex, Jackson, Teddy, Steven, and Chris — thank you for the camaraderie, the shared challenges, collective encouragement, and practical advice.

Last, but by no means least, I want to thank my girlfriend, Dani, for her patience, support, and understanding throughout this process. I also want to express my gratitude for her meticulous help with proofreading, which has greatly improved the clarity and quality of this project.

Presentation Tools

Quarto is an open-source publishing system that enables the creation of dynamic documents, reports, presentations, books, and websites using R, Python, or Julia. It integrates code, markdown, and graphics seamlessly, making it ideal for reproducible research and communication. I used the Quarto Book template to create this project, which is hosted on GitHub.

Folding Code

Many coded examples are neatly folded away.

Click: Fold the Code
print("Click the fold button!!")
[1] "Click the fold button!!"

Hover

Many highlighted cross-references, equations, and footnotes will “expand” when you hover over them.[1] The project begins in Chapter 1, Introduction and Background.

Dark Mode

A small toggle on the navigation panel switches this project to dark mode for nighttime reading or eye comfort.


  1. This is the footnote.