Jake Yeung

Theory, Experiment, and Analysis for Genome Biology and Drug Discovery

prof_pic.jpg

Principal Scientist.

Genentech, Inc.

1 DNA Way,

South San Francisco, CA 94080, USA

Systems Biology for Drug Discovery

I am a systems biologist at Genentech’s Research & Early Development (gRED), where I combine theory, experiment, and analysis to understand how cells regulate their genomes and to discover drugs that revert disease states to healthy ones. My work pairs top-down observational analysis of complex tissue environments with bottom-up causal perturbations to isolate specific cellular behaviors. By integrating public and in-house datasets, I aim to predict therapeutic efficacy and potential off-target effects across diverse patient contexts.

For my training, I was a Human Frontier Science Program (HFSP) fellow with Alexander van Oudenaarden at the Hubrecht Institute, and completed my PhD at EPFL in computational systems biology advised by Felix Naef.

Research themes

My projects span three approaches — theory, experiment, and statistical analysis — and typically pair two of these at a time (Fig. 1), bringing in both wet and dry lab components to address questions in oncology and genome biology.

Three_Approaches
Fig. 1: Theory, experiment, and statistical analysis for genome biology and drug discovery

(1) Theory-driven experimental design

Theory ↔ Experiment.

Designing informative experiments requires theory: models that predict what can be learned from a given measurement strategy before running it. Rather than experimentally enumerating all possible conditions, we derive theoretical frameworks that quantify information gain, reproducibility limits, and cost-information tradeoffs, then use these to design experiments that maximize signal per dollar.

We have also developed a statistical unmixing method (scChIX-seq) that maps multiple histone modifications in single cells and learns the correlation structure between histone marks (Yeung*, Florescu*, Zeller* et al 2023).

scchix
Fig. 2: Multiplexing and unmixing reveal relationships between chromatin states

(2) Statistical analysis of functional genomics technologies

Experiment ↔ Statistical Analysis.

New functional genomics technologies can measure the regulation and output of every gene across virtually every cell in the body. Making sense of this data requires statistical methods that separate biological signal from technical artifacts in sparse, noisy settings, and that integrate information across experiments performed under vastly different conditions. The goal is inference and prediction that can guide the next experiment or therapeutic hypothesis.

We have shown that combining cell surface markers with histone modification mapping in single cells (sortChIC) can systematically compare genome-wide differences between active (e.g. H3K4me1, H3K4me3) and repressed (e.g. H3K27me3, H3K9me3) chromatin states during blood maturation (Zeller*, Yeung* et al 2022). At Genentech, we extend these approaches to large-scale perturbation screens in oncology, integrating in-house and public datasets to predict therapeutic response across patient contexts.

sortChIC
Fig. 3: Analysis methods that quantify global differences between cell types across chromatin states

(3) Principles of genome regulation

Theory ↔ Statistical Analysis.

Understanding how cells regulate their genomes requires moving from description to mechanism: hypothesizing regulatory principles, articulating them as statistical models, and testing them against carefully crafted null hypotheses. Even well-characterized tissues harbor rich regulatory structure that only becomes visible when the right question is asked of the right data.

We have developed model selection approaches to categorize genes by shared regulatory logic, reducing diverse expression patterns across a tissue to a small number of interpretable principles — for example, quantifying how much tissue-specific dynamics come from individual transcription factors versus cooperation between pairs (Yeung*, Mermet* et al 2018), or disentangling sleep-wake driven from circadian processes in gene expression (Hor*, Yeung* et al 2019). These analyses yield direct predictions testable by experiment, such as CRISPR knockout of enhancers (Mermet*, Yeung* et al 2018), and provide a template for the same regulatory decomposition in disease contexts.

4Cseq
Fig. 4: Changes in chromatin conformation at Cry1 over 24 hours