Research

Statistics brings together mathematical theory, understanding of real-world processes that generate data, the analysis of that data, and communication of the design, analysis, and results of studies to students, scientists, policymakers, and other statisticians.

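My research is organized around understanding what data can and cannot tell us in different contexts; or, as Whitney Houston would say, “How Will I Know?” It can broadly be categorized into four groups: cluster-randomized trials and quasi-experiments; infectious disease study design; collaborative research; and understanding, teaching, and communicating statistics.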

A full list of my published work can be found at my Google Scholar profile. If you do not have access to any of the articles, please reach out to me. Brief summaries of how selected works fit into my overall research agenda are included below.

Note that nearly all of this work was conducted with other researchers across career stages, including biostatisticians, epidemiologists, virologists, biologists, physicians, and more. Please see the profile or the articles themselves for full author lists.

Cluster-Randomized Trials and Quasi-Experiments

Cluster-randomized trials (CRTs) randomize entire groups to intervention and control conditions, providing opportunities to study interventions and effects that cannot be understood fully through traditional (individually randomized) trials. With these advantages, however, come statistical challenges and tradeoffs: because outcomes within a cluster are correlated, variance is inflated, and biases can arise. We developed new methods for the analysis of stepped-wedge CRTs that provide increased interpretability and protection against Type I error. I am currently developing further methods, based on quasi-experimental methods, to analyze stepped-wedge trials with treatment effect heterogeneity (the preprint and software are now available).

Similar principles apply to understanding quasi-experimental methods for causal inference, such as difference-in-differences (and its staggered-adoption extensions) and synthetic control methods. These have the added complication of a lack of randomization, so further assumptions are needed for valid causal estimates. I have written a primer on the potential and pitfalls of these methods for vaccine study design, and more material designed for biostatisticians and epidemiologists considering these methods is coming soon!
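As a minimal illustration of the simplest difference-in-differences comparison (two groups, two periods), the sketch below computes the textbook estimate. It is not the method from the primer or any of the articles; the function name `did_estimate` and all numbers are hypothetical, and the estimate has a causal interpretation only under the parallel-trends assumption.

```python
def did_estimate(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Two-group, two-period difference-in-differences on group means.

    Subtracts the control group's change over time from the treated
    group's change. Interpreting the result causally assumes both groups
    would have followed parallel trends without the intervention.
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical mean incidence (cases per 1,000) before and after a rollout.
effect = did_estimate(treated_pre=10.0, treated_post=6.0,
                      control_pre=9.0, control_post=8.0)
print(effect)  # (6 - 10) - (8 - 9) = -3.0
```

Staggered-adoption settings need more care than this two-by-two comparison, which is part of why the extensions mentioned above exist.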

CRTs are particularly difficult to power and size. A trial too small will not be able to give meaningful evidence, while an overly large trial can waste time and resources, or appear infeasible. We have developed methods to appropriately estimate sample size for CRTs, described how key assumptions affect power calculations, and provided rules of thumb and simulations for feasible sizing in epidemic contexts.
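To make the variance inflation concrete, here is a rough sketch of the standard textbook design effect for a parallel CRT with equal cluster sizes, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intracluster correlation coefficient. This is a common first approximation, not the sample size methods from the articles above; the function names and example inputs are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1.0) * icc


def clusters_per_arm(n_individual_per_arm: int, cluster_size: int,
                     icc: float) -> int:
    """Clusters needed per arm, given the per-arm sample size that an
    individually randomized trial would require (a hypothetical input)."""
    inflated_n = n_individual_per_arm * design_effect(cluster_size, icc)
    return math.ceil(inflated_n / cluster_size)


# Hypothetical inputs: 200 participants per arm would suffice under
# individual randomization; clusters hold 50 people; ICC is 0.02.
deff = design_effect(cluster_size=50, icc=0.02)
k = clusters_per_arm(n_individual_per_arm=200, cluster_size=50, icc=0.02)
print(round(deff, 2), k)  # 1.98 8
```

Even a small ICC nearly doubles the required sample size here, which is why the assumed ICC and cluster size can drive a power calculation.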

Infectious Disease Study Design

Designing epidemiologic studies and clinical trials for infectious diseases, especially outbreaks and pandemics, demands an understanding of statistical principles of correlation and clustering as well as infectious disease epidemiology and dynamics. Bridging these gaps can improve the design, conduct, and analysis of studies, as well as the interpretation of their results and of the many other data sources that arise in an epidemic. Much of this work has its roots in the COVID-19 pandemic, but it has implications beyond that context.

Statistical and epidemiologic biases are especially prevalent and concerning in infectious contexts. Often these arise from the correlation among individuals that communicable diseases induce, or from the possibility or impossibility of re-infection over time. We have highlighted some biases and interpretation issues specifically for stepped-wedge CRTs and for observational seroprotection studies. Early in the COVID-19 pandemic, we also highlighted some general principles for understanding observational studies and avoiding biases.

Outbreak data streams provide an important way to understand key dynamics and properties of the pathogen, but they must be analyzed sensitively and correctly. We developed methods to use Ct values from PCR testing, an important but underused quantity, to understand epidemic dynamics more quickly and fully than through test positivity alone. We have also begun extending this to variant surveillance, and developed other methods for using contact tracing data to estimate key parameters.

The clustering that arises naturally in an outbreak can also be used to our advantage in designing studies, by borrowing information across transmission groups. We developed designs that account for clustering to improve contact tracing and early studies of the disease.

Infectious diseases challenge us to think about population-level, rather than just individual-level, effects. The design and analysis of studies and models that demonstrate population-level effects can thus improve the deployment of public health resources. This is true in the study of vaccine effects and in the modeling of their allocation. I argue that understanding these effects is crucial for sound public health policy.

Collaborative Research

Statistical methods and study designs must come off the shelf and be put into practice to have a positive impact on health and medicine. In addition to some of the infectious disease studies described above, I have collaborated with researchers in nutrition and thoracic surgery to design studies and analyze data. In particular, understanding correlation and clustering can greatly improve the validity of results in various fields.

Understanding, Teaching, and Communicating Statistics

Data and statistics are used by and communicated to a wide variety of audiences, from scientific researchers to policymakers to the public at large. Properly laying out the uses, benefits, and tradeoffs of study designs and statistical analyses is vital for statistics to play a productive role in building a better society, and avoiding the worst abuses that it has been associated with in the past.

The history of statistics can inform these proper and improper uses and modes of communication. I have written about the history of statistics’ internal controversy around p-values and hypothesis testing, as well as how that paradigm became a key regulatory tool for the U.S. Food and Drug Administration. I have also written about the shameful intertwined history of statistics and eugenics, and the importance of openly discussing and confronting that history. Incorporating this history into our understanding of statistics and statistics education provides an opportunity to improve the use of statistics going forward.

I have also written articles aimed at broader audiences that illuminate various concepts in statistics. These include: understanding percentages and Simpson’s Paradox in the context of election results; quasi-experimental evidence and biases in the context of baseball’s recent rule changes; and estimands and effect measures in the context of vaccine efficacy. See Writing for more details.