I am a PhD candidate in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health, advised by Dr. Tianxi Cai. I’m also part of the Center for a Learning Health System.

I develop reinforcement learning and natural language processing methods that are robust enough for real-world applications while retaining strong theoretical guarantees. I focus on healthcare and biomedical applications, which usually involve challenging, unstructured data such as electronic health records (EHR) that suffer from sampling bias, partially observed rewards, or strong distribution shifts between hospital sites. My PhD dissertation deals with learning optimal dynamic treatment regimes. In particular, I work on the following topics:

  • Semi-supervised reinforcement learning and doubly robust value function estimation
  • Learning domain-specific safe and interpretable policies using hypothesis testing
  • Using a surrogate convex loss function to optimize dynamic treatment regimes

Other topics I enjoy working on include 1) natural language processing methods for phenotyping, automatic diagnosis, and building a medical knowledge graph from clinical data, motivated by the fact that patient labels in EHR data are often unavailable; and 2) scalable methods for semi-parametric Gaussian process regression with provable convergence, applied to detecting the effects of air pollution on the cognitive development of newborns.