Research

My research interests revolve around representation learning, AI safety, mechanistic interpretability, and AI alignment. Modern machine learning models often operate like a black box: my goal is to understand the mechanisms that enable the success of these large-scale models, thereby making them more interpretable and better aligned with human goals.