Xiaobo Sun, PhD
With interdisciplinary training across computer science, bioinformatics, statistics, and biomedical sciences, my research program develops principled statistical methodology, artificial intelligence (AI) methods, and scalable data management and knowledge-mining tools to address key challenges in computational genetics and genomics. The overarching objective is to enable rigorous, data-driven studies of molecular and cellular mechanisms underlying diverse biological states and human diseases by making large-scale, heterogeneous biomedical data more interpretable, integrative, and actionable.
In the near term, my work focuses on causality-aware analytic frameworks for heterogeneous multi-sample, multi-condition, multimodal, and longitudinal omics studies. I am particularly interested in generative and mechanistic modeling that integrates single-cell multi-omics, perturbation signals, and spatial context to build a Virtual Cell: a predictive model for simulating cellular states, trajectories, and responses to perturbations across biological contexts. These approaches are designed to disentangle biological variation from technical and cohort confounding, integrate information across molecular layers, and identify mechanistic drivers, rather than merely associative signals, that explain phenotypic variation across individuals, disease states, and developmental stages.
In the long term, my research interests lie in building computational methodologies and AI systems that integrate generative modeling, causal inference, multimodal representation learning, and graph-based learning with large language models (LLMs) and AI agents to democratize knowledge mining from atlas-scale multi-omics resources. This agenda supports a shift from today’s tool-centric workflows toward a question-centric paradigm, in which scientists can pose high-level biological questions in natural language and receive transparent, reproducible, evidence-grounded answers supported by retrieval from multi-omics atlases and associated literature, thereby accelerating hypothesis generation, mechanistic insight, and translational discovery.
