In vivo genome editing using Staphylococcus aureus Cas9

Article
F. Ann Ran1,2*, Le Cong1,3*, Winston X. Yan1,4,5*, David A. Scott1,6,7, Jonathan S.
Gootenberg1,8, Andrea Kriz3, Bernd Zetsche1, Ophir Shalem1, Xuebing Wu9, Kira Makarova10,
Eugene Koonin10, Phillip A. Sharp3,9, and Feng Zhang1,6,7,11†
1 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
2 Society of Fellows, Harvard University, Cambridge, MA 02138, USA
3 Department of Biology, 6 McGovern Institute for Brain Research, 7 Department of Brain and Cognitive Sciences, 9 David H. Koch Institute for Integrative Cancer Research, 11 Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4 Graduate Program in Biophysics, 5 Harvard-MIT Division of Health Sciences and Technology, 8 Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
10 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
* These authors contributed equally to this work.
† To whom correspondence should be addressed: [email protected].
Abstract
The RNA-guided endonuclease Cas9 has emerged as a versatile genome-editing platform.
However, the size of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits
its utility for basic research and therapeutic applications that employ the highly versatile
adeno-associated virus (AAV) delivery vehicle. Here, we characterize six smaller Cas9
orthologs and show that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome
with efficiencies similar to those of SpCas9, while being >1kb shorter. We packaged SaCas9
and its sgRNA expression cassette into a single AAV vector and targeted the cholesterol
regulatory gene Pcsk9 in the mouse liver. Within one week of injection, we observed >40%
gene modification, accompanied by significant reductions in serum Pcsk9 and total
cholesterol levels. We further demonstrate the power of using BLESS to assess the genome-wide targeting specificity of SaCas9 and SpCas9, and show that SaCas9 can mediate
genome editing in vivo with high specificity.
Introduction
Cas9, an RNA-guided endonuclease derived from the Type II CRISPR-Cas bacterial adaptive
immune system1–7, has been harnessed for genome editing8,9 and holds tremendous promise for
biomedical research. Genome editing of somatic tissue in post-natal animals, however, has been
limited in part by the challenge of delivering Cas9 in vivo. For this purpose, adeno-associated
virus (AAV) vectors are attractive vehicles10 because of their low immunogenic potential,
reduced oncogenic risk from host-genome integration11, and broad range of serotype
specificity12–15. Nevertheless, the restrictive cargo size (~4.5kb) of AAV presents an obstacle for
packaging the commonly used Streptococcus pyogenes Cas9 (SpCas9, ~4.2kb) and its sgRNA in
a single vector; although technically feasible16,17, this approach leaves little room for customized
expression and control elements.
In search of smaller Cas9 enzymes for efficient in vivo delivery by AAV, we have previously
described a short Cas9 from the CRISPR1 locus of Streptococcus thermophilus LMD-9
(St1Cas9, ~3.3kb)8 as well as a rationally designed truncated form of SpCas918 for genome
editing in human cells. However, both systems have important practical drawbacks: the former
requires a complex protospacer adjacent motif (PAM) sequence (NNAGAAW)3, which
restricts the range of accessible targets, whereas the latter exhibits reduced activity. Given the
substantial diversity of CRISPR-Cas systems present in sequenced microbial genomes19, we
therefore sought to interrogate and discover additional Cas9 enzymes that are small, efficient,
and broadly targeting.
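As a rough illustration of the packaging constraint discussed above, the following Python sketch computes how much room remains for promoters, the sgRNA cassette, and control elements once a Cas9 coding sequence is placed in an AAV genome. The sizes are the approximate figures quoted in the text; the function and variable names are illustrative only.

```python
# Approximate sizes in kb, taken from the text; all names are illustrative.
AAV_CARGO_LIMIT_KB = 4.5

CAS9_SIZES_KB = {
    "SpCas9": 4.2,   # Streptococcus pyogenes Cas9
    "St1Cas9": 3.3,  # Streptococcus thermophilus LMD-9 CRISPR1 Cas9
}


def remaining_capacity_kb(cas9_kb, limit_kb=AAV_CARGO_LIMIT_KB):
    """Return the kb left for promoters, sgRNA cassette, and control elements."""
    return round(limit_kb - cas9_kb, 2)


for name, size in CAS9_SIZES_KB.items():
    print(f"{name}: ~{remaining_capacity_kb(size)} kb left")
```

The calculation makes the motivation concrete: a smaller Cas9 coding sequence frees proportionally more of the fixed cargo budget for expression and control elements.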
In vitro cleavage by small Cas9s
Type II CRISPR-Cas systems require only two main components for eukaryotic genome editing:
a Cas9 enzyme, and a chimeric single guide RNA (sgRNA)6 derived from the CRISPR RNA
(crRNA) and the noncoding trans-activating crRNA (tracrRNA)4,20. Analysis of over 600 Cas9
orthologs shows that these enzymes are clustered into two length groups with characteristic
protein sizes of approximately 1,350 and 1,000 amino acid residues, respectively19,21 (Extended Data
Fig. 1a), with shorter Cas9s having significantly truncated REC domains (Fig. 1a). From these
shorter Cas9s, which belong to Type IIA and IIC subtypes, we selected six candidates for
profiling (Fig. 1a and Extended Data Fig. 1b). To determine the cognate crRNA and tracrRNA
for each Cas9, we computationally identified regularly interspaced repeat sequences (direct
repeats) within a 2-kb window flanking the CRISPR locus. We then predicted the tracrRNA by
detecting sequences with strong complementarity to the direct repeat sequence (an anti-repeat
region), at least two predicted stem-loop structures, and a Rho-independent transcriptional
termination signal up to 150-nt downstream of the anti-repeat region. Although a truncated
tracrRNA can support robust DNA cleavage in vitro6, previous reports show that the secondary
structures of the tracrRNA are important for Cas9 activity in mammalian cells8,9,18,22. Therefore,
we designed sgRNA scaffolds for each ortholog by fusing the 3′ end of a truncated direct repeat
with the 5′ end of the corresponding tracrRNA, including the full-length tail, via a 4-nt linker6
(Extended Data Fig. 1b and Supplementary Table 1). To identify the PAM sequence for each
Cas9, we first constructed a library of plasmid DNA containing a constant 20-bp target followed
by a degenerate 7-bp sequence (5′-NNNNNNN). We then incubated cell lysate from human
embryonic kidney 293FT (293FT) cells expressing the Cas9 ortholog with its in vitro transcribed
sgRNA and the plasmid library. By generating a consensus from the 7-bp sequence found on
successfully cleaved DNA plasmids (Fig. 1b), we determined putative PAMs for each Cas9 (Fig.
1c). We observed that, similar to SpCas9, most Cas9 orthologs cleaved targets 3-bp upstream of
the PAM (Extended Data Fig. 2). To validate each putative PAM from the library, we then
incubated a DNA template bearing the consensus PAM with cell lysate and the corresponding
sgRNA. We found that the Cas9 orthologs, in combination with the sgRNA designs, successfully
cleaved the appropriate targets (Fig. 1d and Supplementary Table 2).
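The consensus step described above can be sketched in Python by tallying base frequencies at each of the seven degenerate positions and scoring each position by its information content in bits, the quantity plotted in a sequence logo. This is a minimal sketch under stated assumptions: the four input sequences are toy data standing in for PAMs recovered from cleaved plasmids, the function names are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequencies(pams):
    """Per-position base frequencies for equal-length PAM sequences."""
    length = len(pams[0])
    if any(len(p) != length for p in pams):
        raise ValueError("all PAM sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in pams)
        total = sum(counts[b] for b in BASES)
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def information_content(freq):
    """Information content in bits (2 minus Shannon entropy) for one position."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy


# Toy 7-bp sequences (assumed data, not from the paper).
toy_pams = ["TTGAATA", "ACGGGTC", "GAGAGTT", "CGGGATG"]
for pos, freq in enumerate(position_frequencies(toy_pams), start=1):
    top_base = max(freq, key=freq.get)
    print(pos, top_base, round(information_content(freq), 2))
```

Positions with high information content define the putative PAM, while positions near zero bits behave as N.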
To test whether each Cas9 ortholog can facilitate genome editing in mammalian cells, we co-transfected 293FT cells with individual Cas9s and their respective sgRNAs targeting human
endogenous loci containing the appropriate PAMs. Of the six Cas9 orthologs tested, only the one
from Staphylococcus aureus (SaCas9) produced indels with efficiencies comparable to those of
SpCas9 (Extended Data Fig. 3a, b and Supplementary Table 3), suggesting that DNA cleavage activity in cell-free assays does not necessarily reflect the activity in mammalian cells.
These observations prompted us to focus on harnessing SaCas9 and its sgRNA for in vivo
applications.
SaCas9 sgRNA design and PAM discovery
Although mature crRNAs in S. pyogenes are processed to contain 20-nt spacers (guides) and 19- to 22-nt direct repeats4, RNA sequencing of crRNAs from other organisms reveals that the
spacer and direct repeat sequence lengths can vary4,20,23. We therefore tested sgRNAs for SaCas9
with variable guide lengths and repeat:anti-repeat duplexes. We found that SaCas9 achieves the
highest editing efficiency in mammalian cells with guides 21 to 23 nt long and can
accommodate a range of lengths for the direct repeat:anti-repeat region (Fig. 2a, b, Extended
Data Fig. 4). This notably contrasts with SpCas9, where the natural 20-nt guide length can be
truncated to 17-nt without significantly compromising nuclease activity, while increasing
specificity24. Additionally, replacing the first base of the guide with guanine further improved
SaCas9 activity (Extended Data Fig. 3c).
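The two guide design observations above (a preferred length of 21 to 23 nt and a guanine at the first position) can be expressed as a small helper. The following Python sketch is illustrative only: the function name, its parameters, and the example sequence are hypothetical and not part of the published protocol.

```python
def prepare_sacas9_guide(protospacer, min_len=21, max_len=23, force_5p_g=True):
    """Validate guide length and optionally replace the 5'-most base with G."""
    guide = protospacer.upper()
    if set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, or T")
    if not min_len <= len(guide) <= max_len:
        raise ValueError(
            f"guide length {len(guide)} outside {min_len}-{max_len} nt"
        )
    if force_5p_g and guide[0] != "G":
        # Replace (not prepend) the first base, so the length is unchanged.
        guide = "G" + guide[1:]
    return guide


# Hypothetical 21-nt protospacer used only to exercise the helper.
print(prepare_sacas9_guide("ACGTACGTACGTACGTACGTA"))
```

Replacing rather than prepending the first base keeps the guide within the preferred length range.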
To fully characterize the SaCas9 PAM and the seed region within its guide sequence25, we
performed ChIP using catalytically mutant forms of SaCas9 (dSaCas9, D10A and N580A
mutations, based on homology to SpCas9) or SpCas9 (dSpCas9, D10A and H840A
mutations) and their corresponding sgRNAs. We targeted two loci in the human EMX1 gene with
composite NGGRRT PAMs, which allow targeting by both Cas9 variants. A search for motifs
containing both the guide region and PAM within 50-nt of the ChIP peak summits revealed seed
sequences of 7-8 nt for dSaCas9 (Fig. 2c). In addition, NNGRRT and NGG PAMs were found
adjacent to the seed sequences for dSaCas9 and dSpCas9, respectively (Extended Data Fig. 5).
Although the 6th position of the PAM is predominantly thymine, we did observe low levels of
degeneracy in both the biochemical and ChIP-based PAM discovery assays (Fig. 1c and
Extended Data Fig. 5a). We therefore tested the base preference for this position and
determined that although SaCas9 cleaves genomic targets most efficiently with NNGRRT, all
NNGRR PAMs can be cleaved and should be considered as potential targets, especially in the
context of off-target evaluations (Fig. 2d, Extended Data Fig. 6, and Supplementary Table 4).
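To make the PAM rule concrete, the sketch below scans a DNA sequence for candidate SaCas9 sites and labels each by PAM tier: NNGRRT (most efficiently cleaved) or any other NNGRR PAM (cleavable, and relevant for off-target evaluation). It is a minimal sketch under stated assumptions: only the forward strand is searched, the default 21-nt guide length is one choice within the 21 to 23 nt range, and the function names and tier labels are hypothetical.

```python
import re

# IUPAC codes used in the PAM definitions: R = A or G, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def pam_regex(pam):
    """Translate an IUPAC PAM string such as 'NNGRRT' into a regex fragment."""
    return "".join(IUPAC[ch] for ch in pam)


def find_sacas9_sites(seq, guide_len=21):
    """Forward-strand sites: guide_len-nt protospacer followed by NNGRRN."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(
        rf"(?=([ACGT]{{{guide_len}}})({pam_regex('NNGRRN')}))"
    )
    sites = []
    for m in pattern.finditer(seq):
        protospacer, pam = m.group(1), m.group(2)
        tier = "NNGRRT" if pam.endswith("T") else "NNGRR(N)"
        sites.append((m.start(), protospacer, pam, tier))
    return sites


# Toy sequence (assumed data): 21 nt of filler followed by a TTGAGT PAM.
for site in find_sacas9_sites("A" * 21 + "TTGAGT"):
    print(site)
```

A full search would also scan the reverse complement; that step is omitted here to keep the example short.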
Unbiased profiling of Cas9 specificity
As advances in Cas9 technology promise to enable a broad range of in vivo and therapeutic
applications, accurate, genome-wide identification of off-target nuclease activity has become
increasingly important. Although a number of studies have employed sequence similarity-based
off-target search22,26–30 or dCas9-ChIP31,32 to predict off-target sites for Cas9, such approaches
cannot assess the nuclease activity of Cas9 in a comprehensive and unbiased manner. To directly
measure the genome-wide cleavage activity of SaCas9 and SpCas9, we applied BLESS (direct in
situ breaks labeling, enrichment on streptavidin and next-generation sequencing)33 to capture
Cas9-induced DNA double-stranded breaks (DSBs) in cells. We transfected 293FT cells with
SaCas9 or SpCas9 and the same EMX1 targeting guides used in the previous ChIP experiment, or
pUC19 as negative controls. After cells are fixed, free genomic DNA ends from DSBs are
captured using biotinylated adaptors and analyzed by deep sequencing (Fig. 3a). To identify
candidate Cas9-induced DSB sites genome-wide, we established a three-step analysis pipeline
following alignment of the sequenced BLESS reads to the genome (Extended Data Fig. 7a,
Supplementary Discussion). First, we applied nearest-neighbor clustering on the aligned reads
to identify groups of DSBs (DSB clusters) across the genome. Second, we sought to separate
potential Cas9-induced DSB clusters from background DSB clusters resulting from low
frequency biological processes and technical artifacts, as well as high frequency telomeric and
centromeric DSB hotspots33. From the on-target and a small subset of verified off-target sites
(predicted by sequence similarity using a previously established method22 and sequenced to
detect indels), we found that reads in Cas9-induced DSB clusters mapped to characteristic, well-defined genomic positions compared to the more diffuse alignment pattern at background DSB
clusters. To distinguish between the two types of DSB clusters, we calculated in each cluster the
distance between all possible pairs of forward and reverse-oriented reads (corresponding to 3′
and 5′ ends of DSBs), and filtered out the background DSB clusters based on the distinctive
pairwise-distance distribution of these clusters (Extended Data Fig. 7b, c). Third, the DSB score
for a given locus was calculated by comparing the count of DSBs in the experimental and
negative control samples using a maximum-likelihood estimate (MLE)22 (Supplementary
Discussion). This analysis identified the on-target loci for both SaCas9 and SpCas9 guides as the
top scoring sites, and revealed additional sites with high DSB scores (Fig. 3b-d).
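The first two steps of the pipeline can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: a simple gap rule stands in for the nearest-neighbor clustering, and the `max_gap` threshold, the `window` summary, and the toy reads are all assumptions made for the example. The maximum-likelihood DSB score of the third step is described in the Supplementary Discussion and is not reproduced here.

```python
from itertools import product


def cluster_reads(reads, max_gap=30):
    """Group (position, strand) reads on one chromosome by gaps between neighbors.

    max_gap is a hypothetical threshold, not a value from the paper.
    """
    clusters, current = [], []
    for pos, strand in sorted(reads):
        if current and pos - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append((pos, strand))
    if current:
        clusters.append(current)
    return clusters


def pairwise_distances(cluster):
    """Absolute distances between every forward/reverse read pair in a cluster."""
    fwd = [p for p, s in cluster if s == "+"]
    rev = [p for p, s in cluster if s == "-"]
    return [abs(f - r) for f, r in product(fwd, rev)]


def fraction_tight(cluster, window=5):
    """Share of forward/reverse pairs within `window` bp (hypothetical summary)."""
    distances = pairwise_distances(cluster)
    if not distances:
        return 0.0
    return sum(d <= window for d in distances) / len(distances)


# Toy reads (assumed data): one sharp cluster and one diffuse cluster.
toy_reads = [
    (1000, "+"), (1001, "-"), (1000, "+"), (1002, "-"),
    (5000, "+"), (5020, "-"), (5045, "+"), (5070, "-"),
]
for cluster in cluster_reads(toy_reads):
    print(cluster[0][0], len(cluster), fraction_tight(cluster))
```

In this toy example the sharp cluster has all forward/reverse pairs within a few base pairs of each other, whereas the diffuse cluster has none, which mirrors the distinction between Cas9-induced and background DSB clusters drawn in the text.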
Next, we sought to assess whether DSB scores correlated with indel formation. We used targeted
deep sequencing to detect indel formation on the ~30 top-ranking off-target sites identified by
BLESS for each Cas9 and sgRNA combination. We found that only those sites that contained
PAM and homology to the guide sequence exhibited indels (Extended Data Fig. 8). We
observed a strong linear correlation between DSB scores and indel levels for each Cas9 and
sgRNA pairing (r² = 0.948 and 0.989 for the two EMX1 targets with SaCas9 and r² = 0.941 and
0.753 for those with SpCas9) (Fig. 3c, Extended Data Fig. 9b-d). Furthermore, BLESS identified
additional off-target sites not previously predicted by sequence similarity to target or ChIP
(Extended Data Fig. 7 and 9, Supplementary Tables 5 and 6). These new off-target sites
include not only those containing Watson-Crick base-pairing mismatches to the guide, but also
the recently reported insertion and deletion mismatches in the guide:target heteroduplex (Fig.
3d)29,30. Together, these results highlight the need for more precise understanding of rules
governing Cas9 nuclease activity, a requisite step towards improving the predictive power of
computational guide design programs.
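The correlation between DSB scores and indel levels reported above is a simple linear fit. As an illustration, the Python sketch below computes the slope, intercept, and r² of an ordinary least-squares line without external libraries. The function name is hypothetical and the input values are placeholders chosen to exercise the code, not data from this study.

```python
def ols_r_squared(x, y):
    """Slope, intercept, and r^2 for a least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder values (assumed): hypothetical DSB scores and indel percentages.
dsb_scores = [1.0, 2.0, 3.0, 4.0]
indel_pct = [2.0, 4.0, 6.0, 8.0]
print(ols_r_squared(dsb_scores, indel_pct))
```

For a simple linear regression with one predictor, r² equals the squared Pearson correlation coefficient, which is the form used in the function.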
In vivo genome editing using SaCas9
Following in vitro characterization, we incorporated SaCas9 and its sgRNA into an AAV vector
to test its efficacy and specificity in vivo. The small size of SaCas9 enables packaging of both a
U6-driven sgRNA and a CMV- or TBG-driven SaCas9 expression cassette into a single AAV
vector within the 4.5kb packaging limit. Using hepatocyte-tropic AAV serotype 8, we targeted
the mouse apolipoprotein B (Apob) gene (Extended Data Fig. 10a). One week after intravenous
administration of virus into C57BL/6 mice, we observed ~5% indel formation in liver tissue;
after four weeks, histological analysis with oil red staining showed the hepatic lipid
accumulation characteristic of Apob knockdown34–37 (Extended Data Fig. 10b, c).
We next targeted proprotein convertase subtilisin/kexin type 9 (Pcsk9), a therapeutically relevant
gene involved in cholesterol homeostasis38. Inhibitors of the human convertase PCSK9 have
emerged as a promising new class of cardioprotective drugs after human genetic studies revealed
that loss of PCSK9 is associated with a reduced risk of cardiovascular disease and lower levels of
LDL cholesterol39–41. We designed two Pcsk9-targeting sgRNAs and validated their activity in
vitro. Each sgRNA was packaged into AAV-SaCas9 and injected into mice (2E11 total genome
copies) (Fig. 4a). One week after administration, we observed greater than 40% indel formation
at either locus in whole liver tissue, with similar levels two and four weeks post-injection (Fig.
4b). To determine the effect of Pcsk9-targeting AAV-SaCas9 dosage on serum Pcsk9 and total
cholesterol levels, we administered a range of AAV titers from 0.5E11 to 4E11 total genome
copies. With all titers, we observed a ~95% decrease in serum Pcsk9 and a ~40% decrease in
total cholesterol one week after administration, both of which were sustained throughout the
course of four weeks (Fig. 4c, d).
Given the importance of targeting specificity in a therapeutic context, we next considered
SaCas9 off-target modifications in vivo. To identify candidate off-target cleavage sites for the
two Pcsk9-targeting guides, we transiently transfected an AAV-CMV::SaCas9 vector into mouse
Neuroblastoma-2a (N2a) cells and applied BLESS to detect Cas9-induced DSBs in the genome.
For both guides, we found very low levels of DSB signal across the genome except at the on-target locus (Fig. 4e). Targeted deep sequencing of the candidate off-target sites identified by
BLESS in N2a cells did not reveal appreciable levels of indels in either N2a cells or liver tissue
(4 weeks post injection of 2E11 total genome copies) (Fig. 4e, f and Supplementary Table 8).
We additionally sequenced off-target sites predicted by target sequence similarity, and likewise
did not detect indel formation (Supplementary Table 9).
Finally, we examined the titer-matched Pcsk9-targeting and TBG-GFP cohorts as well as naïve
animals for signs of toxicity or acute immune response. At 1 week post-injection, necropsy and
gross examination of liver tissue of the cohorts revealed no abnormalities; further histological
examination of the liver by hematoxylin and eosin (H&E) staining showed no signs of
inflammation, such as aggregates of lymphocytes or macrophages (Fig. 5a). Throughout the time
course of the experiment, serum ALT, albumin, and total bilirubin were not elevated in any of
the cohorts. We observed a slight trend toward increased AST across all cohorts at
four weeks, including the un-injected animals. These levels did not exceed the upper limit
of normal and are not indicative of hepatocellular injury (Fig. 5b). However, a larger cohort study
should be conducted to further evaluate in vivo toxicity.
Discussion
Here, we develop a small and efficient Cas9 from S. aureus for in vivo genome editing17. The
results of these experiments highlight the power of using comparative genomic analysis19,42 in
expanding the CRISPR-Cas toolbox. Identification of new Cas9 orthologs19,42, in addition to
structure-guided engineering, could yield a repertoire of Cas9 variants with expanded capabilities
and minimized molecular weight, for nucleic acid manipulation to further advance genome and
epigenome engineering.
The AAV-SaCas9 system is able to mediate efficient and rapid editing of Pcsk9 in the mouse
liver, resulting in reductions of serum Pcsk9 and total cholesterol levels. To assess the specificity
of SaCas9, we used an unbiased DSB detection method, BLESS, to identify a list of candidate
off-target cleavage sites in mouse cells. We examined these sites in liver tissue transduced by
AAV-SaCas9 and did not observe any indel formation within the detection limits of targeted
deep sequencing. However, the off-target sites identified in vitro might differ from those in vivo,
which need to be further evaluated by the in vivo applications of BLESS or other unbiased
techniques such as those published during the revision of this work43,44. Finally, we did not
observe any overt signs of acute toxicity at one to four weeks post virus administration. Although
additional studies are needed to improve the SaCas9 system for in vivo genome editing, such
as assessing the long-term impact of Cas9 and sgRNA expression, these findings suggest that in
vivo genome editing using SaCas9 has the potential to be highly efficient, specific, and well-tolerated.
Supplementary Information is available in the online version of the paper.
Acknowledgments
We thank Emmanuelle Charpentier, Ines Fonfara, and Krzysztof Chylinski for discussions;
Abigail Scherer-Hoock, Bailey Clear, and the MIT Division of Comparative Medicine for
assistance with animal experiments; Boston Children's Hospital Viral Core and Ru Xiao at the
Massachusetts Eye & Ear Infirmary Viral Vector Core for assistance with AAV production;
Nicola Crosetto for advice on BLESS; Chie-Yu Lin and Ian Slaymaker for experimental
assistance; and the entire Zhang lab for support and advice. F.A.R. is a Junior Fellow at the
Harvard Society of Fellows. W.X.Y. is supported by T32GM007753 from the National Institute
of General Medical Sciences and a Paul and Daisy Soros Fellowship. J.S.G. is supported by a
D.O.E. Computational Science Graduate Fellowship. F.Z. is supported by the National Institutes
of Health through NIMH (5DP1-MH100706) and NIDDK (5R01DK097768-03), a Waterman
Award from the National Science Foundation, the Keck, New York Stem Cell, Damon Runyon,
Searle Scholars, Merkin, and Vallee Foundations, and Bob Metcalfe. F.Z. is a New York Stem
Cell Foundation Robertson Investigator. The Children's Hospital virus core is supported by NIH
core grant (5P30EY012196-17). The content is solely the responsibility of the authors and does
not necessarily represent the official views of the National Institute of General Medical Sciences
or the National Institutes of Health. CRISPR reagents are available to the academic community
through Addgene, and information about the protocols, plasmids, and reagents can be found at
the Zhang Lab website www.genome-engineering.org.
Author Contributions
F.A.R. and F.Z. conceived this study. F.A.R., L.C., W.X.Y., and F.Z. designed and performed
the experiments with help from all authors. F.A.R., J.S.G., O.S., K.M., E.K., and F.Z. performed
analysis on Cas9 orthologs, crRNA, tracrRNA, and PAM. A.K., F.A.R., and X.W.
performed ChIP and computational analysis and validation. F.A.R., W.X.Y., and L.C. performed
BLESS and targeted sequencing of BLESS-identified off-target sites, and D.A.S. contributed
computational analysis of BLESS data. W.X.Y., F.A.R., L.C., and B.Z. contributed animal data.
W.X.Y., F.A.R., L.C., J.S.G., and F.Z. wrote the manuscript with help from all authors.
Author Information
All reagents described in this manuscript have been deposited with Addgene (plasmid IDs 61591,
61592, and 61593). Source data are available online and deep sequencing data are available at
Sequence Read Archive under BioProject ID PRJNA274149. Reprints and permissions
information is available at www.nature.com/reprints. The authors declare competing financial
interests: details are available in the online version of the paper. Readers are welcome to
comment on the online version of the paper. Correspondence and requests for materials should
be addressed to F.Z. ([email protected]).
References
1. Bolotin, A., Quinquis, B., Sorokin, A. & Ehrlich, S. D. Clustered regularly interspaced short
palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiol. Read.
Engl. 151, 2551–2561 (2005).
2. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes.
Science 315, 1709–1712 (2007).
3. Garneau, J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and
plasmid DNA. Nature 468, 67–71 (2010).
4. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor
RNase III. Nature 471, 602–607 (2011).
5. Sapranauskas, R. et al. The Streptococcus thermophilus CRISPR/Cas system provides
immunity in Escherichia coli. Nucleic Acids Res. 39, 9275–9282 (2011).
6. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial
immunity. Science 337, 816–821 (2012).
7. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein
complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl.
Acad. Sci. U. S. A. 109, E2579–2586 (2012).
8. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339,
819–823 (2013).
9. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826
(2013).
10. Gaudet, D. et al. Review of the clinical development of alipogene tiparvovec gene therapy
for lipoprotein lipase deficiency. Atheroscler. Suppl. 11, 55–60 (2010).
11. Vasileva, A. & Jessberger, R. Precise hit: adeno-associated virus in gene targeting. Nat. Rev.
Microbiol. 3, 837–847 (2005).
12. Mingozzi, F. & High, K. A. Therapeutic in vivo gene transfer for genetic disease using
AAV: progress and challenges. Nat. Rev. Genet. 12, 341–355 (2011).
13. Gao, G., Vandenberghe, L. H. & Wilson, J. M. New recombinant serotypes of AAV vectors.
Curr. Gene Ther. 5, 285–297 (2005).
14. Kay, M. A. State-of-the-art gene-based therapies: the road ahead. Nat. Rev. Genet. 12, 316–
328 (2011).
15. Zincarelli, C., Soltys, S., Rengo, G. & Rabinowitz, J. E. Analysis of AAV serotypes 1-9
mediated gene expression and tropism in mice after systemic injection. Mol. Ther. J. Am. Soc.
Gene Ther. 16, 1073–1080 (2008).
16. Swiech, L. et al. In vivo interrogation of gene function in the mammalian brain using
CRISPR-Cas9. Nat. Biotechnol. 33, 102–106 (2015).
17. Senís, E. et al. CRISPR/Cas9-mediated genome engineering: an adeno-associated viral
(AAV) vector toolbox. Biotechnol. J. 9, 1402–1412 (2014).
18. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA.
Cell 156, 935–949 (2014).
19. Chylinski, K., Makarova, K. S., Charpentier, E. & Koonin, E. V. Classification and evolution
of type II CRISPR-Cas systems. Nucleic Acids Res. 42, 6091–6105 (2014).
20. Chylinski, K., Le Rhun, A. & Charpentier, E. The tracrRNA and Cas9 families of type II
CRISPR-Cas immunity systems. RNA Biol. 10, 726–737 (2013).
21. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for
genome engineering. Cell 157, 1262–1278 (2014).
22. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol.
31, 827–832 (2013).
23. Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from
Neisseria meningitidis. Proc. Natl. Acad. Sci. U. S. A. 110, 15644–15649 (2013).
24. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. K. Improving CRISPR-Cas
nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284 (2014).
25. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat
(CRISPR) RNA is governed by a seed sequence. Proc. Natl. Acad. Sci. U. S. A. 108, 10098–
10103 (2011).
26. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in
human cells. Nat. Biotechnol. 31, 822–826 (2013).
27. Mali, P. et al. CAS9 transcriptional activators for target specificity screening and paired
nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833–838 (2013).
28. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol. 31, 839–843 (2013).
29. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions
between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014).
30. Bae, S., Park, J. & Kim, J.-S. Cas-OFFinder: a fast and versatile algorithm that searches for
potential off-target sites of Cas9 RNA-guided endonucleases. Bioinforma. Oxf. Engl. 30,
1473–1475 (2014).
31. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells.
Nat. Biotechnol. 32, 670–676 (2014).
32. Kuscu, C., Arslan, S., Singh, R., Thorpe, J. & Adli, M. Genome-wide analysis reveals
characteristics of off-target sites bound by the Cas9 endonuclease. Nat. Biotechnol. 32, 677–
683 (2014).
33. Crosetto, N. et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods 10, 361–365 (2013).
34. Young, S. G. Recent progress in understanding apolipoprotein B. Circulation 82, 1574–1594
(1990).
35. Soutschek, J. et al. Therapeutic silencing of an endogenous gene by systemic administration
of modified siRNAs. Nature 432, 173–178 (2004).
36. Rozema, D. B. et al. Dynamic PolyConjugates for targeted in vivo delivery of siRNA to
hepatocytes. Proc. Natl. Acad. Sci. U. S. A. 104, 12982–12987 (2007).
37. Wolfrum, C. et al. Mechanisms and optimization of in vivo delivery of lipophilic siRNAs.
Nat. Biotechnol. 25, 1149–1157 (2007).
38. Fitzgerald, K. et al. Effect of an RNA interference drug on the synthesis of proprotein
convertase subtilisin/kexin type 9 (PCSK9) and the concentration of serum LDL cholesterol
in healthy volunteers: a randomised, single-blind, placebo-controlled, phase 1 trial. Lancet
383, 60–68 (2014).
39. Abifadel, M. et al. Mutations in PCSK9 cause autosomal dominant hypercholesterolemia.
Nat. Genet. 34, 154–156 (2003).
40. Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from
frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).
41. Horton, J. D., Cohen, J. C. & Hobbs, H. H. Molecular biology of PCSK9: its role in LDL
metabolism. Trends Biochem. Sci. 32, 71–77 (2007).
42. Briner, A. E. et al. Guide RNA functional modules direct Cas9 activity and orthogonality.
Mol. Cell 56, 333–339 (2014).
43. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by
CRISPR-Cas nucleases. Nat. Biotechnol. (2014). doi:10.1038/nbt.3117
44. Frock, R. L. et al. Genome-wide detection of DNA double-stranded breaks induced by
engineered nucleases. Nat. Biotechnol. (2014). doi:10.1038/nbt.3101
45. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo
generator. Genome Res. 14, 1188–1190 (2004).
46. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic
Acids Res. 31, 3406–3415 (2003).
47. Gautheret, D. & Lambert, A. Direct RNA motif definition and identification from multiple
sequence alignments using secondary structure profiles. J. Mol. Biol. 313, 1003–1011 (2001).
48. Macke, T. J. et al. RNAMotif, an RNA secondary structure definition and search algorithm.
Nucleic Acids Res. 29, 4724–4735 (2001).
49. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using
MACS. Nat. Protoc. 7, 1728–1740 (2012).
50. Veldwijk, M. R. et al. Development and optimization of a real-time quantitative PCR-based
method for the titration of AAV-2 vector stocks. Mol. Ther. J. Am. Soc. Gene Ther. 6, 272–
278 (2002).
FIGURE LEGENDS
Figure 1 | Biochemical screen for small Cas9 orthologs. a, Phylogenetic tree of selected Cas9
orthologs. Subfamily and sizes (amino acids) are indicated, with nuclease domains highlighted in
colored boxes, and conserved sequences in black. b, Schematic illustration of the in vitro
cleavage-based method used to identify the first seven positions (5′-NNNNNNN) of protospacer
adjacent motifs (PAMs). c, Consensus PAMs for eight Cas9 orthologs from sequencing of
cleaved fragments. Error bars are Bayesian 95% confidence interval45. d, Cleavage using
different orthologs and sgRNAs targeting loci bearing the putative PAMs (consensus shown in
red). Red triangles indicate cleavage fragments.
Figure 2 | Characterization of Staphylococcus aureus Cas9 (SaCas9) in 293FT cells. a,
SaCas9 sgRNA scaffold (red) and guide (blue) base-pairing at target locus (black) immediately 5′
of PAM. b, Box-whisker plot showing indels vary depending on the length of the guide sequence
(n=4). c, dSaCas9-ChIP reveals peaks associated with seed + PAM. Text to the right indicates
the total number of peaks and percentage containing significant (FDR < 0.1) match to the guide
motif followed by NNGRRT PAMs. d, Pooled indel values for NNGRR(A), (C), (G), or (T)
PAM combinations (n=12, 21, 39, and 44 respectively).
Figure 3 | Characterization of genome-wide nuclease activity of SaCas9 and SpCas9. a,
Schematic of BLESS processing steps. b, Manhattan plots of genome-wide DSB clusters
generated by each Cas9 and sgRNA pair, with on-target loci shown above. c, Correlation
between DSB scores and indel levels for top-scoring DSB clusters. Trendlines, r², and p-values
are calculated using ordinary least squares. d, Off-target loci from BLESS with detectable indels
through targeted deep sequencing (n=3) are shown. Heatmaps indicate DSB score (blue), motif
score from ChIP (purple), or sequence similarity score (green) for each locus. Blue triangles
indicate peak positions of BLESS signal.
Figure 4 | AAV-delivery of SaCas9 for in vivo genome editing. a, Single-vector AAV system
and experimental timeline. b, Indels at Pcsk9 targets in liver tissue following injection of AAV at
2E11 total genome copies (n=3 animals). Time course of c, serum Pcsk9 and d, total cholesterol
in animals (n=3 for all titers and time points, error bars show S.E.M.). e, Manhattan plots of
BLESS-identified DSB clusters in N2a cells. Inset indicates indel levels at top DSB scoring loci.
f, Indels in liver tissue (n=3 animals, error bars indicate Wilson intervals) at BLESS-identified
off-target loci. Heatmap indicates DSB scores.
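For readers unfamiliar with the Wilson intervals mentioned in the Figure 4f legend, the Python sketch below computes a Wilson score interval for a proportion, such as the fraction of sequencing reads carrying an indel. It is a minimal sketch under stated assumptions: how the counts were pooled across animals is not specified here, the 95% level (z = 1.96) is assumed, and the function name and example counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("require total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: 0 indel-containing reads out of 100 reads.
print(wilson_interval(0, 100))
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and gives a non-zero upper bound even when no indels are observed, which is why it suits low-frequency off-target measurements.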
Figure 5 | Liver function tests and toxicity examination in injected animals. a, Histological
analysis of the liver at 1 week post-injection by H&E stain. Scale bar = 10μm. b, Liver function
tests in Pcsk9-targeted (both Pcsk9-sg1 and Pcsk9-sg2; 2E11 total genome copies, n ≥ 4),
TBG::EGFP injected (2E11 total genome copies, n=3), and un-injected (n=5) animals. Dashed
lines show the upper and lower ranges of normal value in mice where applicable.
Extended Data Figure 1 | Selection of Type II CRISPR-Cas loci from eight bacterial
species. a, Distribution of lengths for >600 Cas9 orthologs19. b, Schematic of Type II
CRISPR-Cas loci and sgRNA from eight bacterial species. Spacer or “guide” sequences are
shown in blue, followed by direct repeat (gray). Predicted tracrRNAs are shown in red, and
folded based on the Constraint Generation RNA folding model46.
Extended Data Figure 2 | Cas9 ortholog cleavage pattern in vitro. Stacked bar graph indicates
the fraction of targets cleaved at 2, 3, 4, or 5-bp upstream of PAM for each Cas9 ortholog; most
Cas9s cleave stereotypically at 3-bp upstream of PAM (red triangle).
Extended Data Figure 3 | Test of Cas9 ortholog activity in 293FT cells. a, SURVEYOR
assays showing indel formation at human endogenous loci from co-transfection of Cas9
orthologs and sgRNA. PAM sequences for individual targets are shown above each lane, with
the consensus region for each PAM highlighted in red. Red triangles indicate cleaved fragments.
b, SaCas9 generates indels efficiently for multiple targets. c, Box-whisker plot of indel
formation as a function of SaCas9 guide length L, with unaltered guides (perfect match of L
nucleotides, gray bars) or replacement of the 5′-most base of the guide with guanine (G + L − 1
nucleotides, blue bars) (n = 8 guides).
Extended Data Figure 4 | Optimization of SaCas9 sgRNA scaffold in mammalian cells. a,
Schematic of the Staphylococcus aureus subspecies aureus CRISPR locus. b, Schematic of
SaCas9 sgRNA with 21-nt guide, crRNA repeat (gray), tetraloop (black) and tracrRNA (red).
The number of base pairs between the crRNA repeat and the tracrRNA anti-repeat is indicated above the gray
boxes. SaCas9 cleaves targets with varying repeat:anti-repeat lengths in c, HEK 293FT and d,
Hepa1-6 cell lines (n=3, error bars show S.E.M.).
Extended Data Figure 5 | Genome-wide binding by Cas9-chromatin immunoprecipitation
(dCas9-ChIP). a, Unbiased identification of PAM motif for dSaCas9 and dSpCas9. Peaks were
analyzed for the best match by motif score to the guide region only within 50-nt of the peak
summit. The alignment was extended by 10 nt at the 3′ end and visualized using Weblogo. Numbers
in parentheses indicate the number of called peaks. b, Histograms show the distribution of the
peak summit relative to motif for dSaCas9 and dSpCas9. Position 1 on x-axis indicates the first
base of PAM.
Extended Data Figure 6 | Indel measurements at candidate off-target sites based on ChIP.
Indels at top off-target sites predicted by dCas9-ChIP for each Cas9 and sgRNA pair, based on
ChIP peaks ranked by sequence similarity of the genomic loci to the guide motif (heatmap in
purple), or p-value of ChIP enrichment over control (heatmap in red). Lines connect the common
targets (EMX1) and off-targets between the two Cas9s.
Extended Data Figure 7 | Analysis pipeline of sequencing data from BLESS. a, Overview of
the data analysis pipeline starting from the raw sequencing reads. Representative sequencing
read mappings and corresponding histograms of the pairwise distances between all forward-orientation
(red) and reverse-orientation (blue) reads are shown for b, DSB
hotspots and poorly defined DSB sites and c, Cas9-induced DSBs with detectable indels.
The fraction of pairwise distances between reads overlapping by no more than 6 bp (dashed vertical
line) is indicated above each histogram.
Extended Data Figure 8 | Indel measurements at off-target sites based on DSB scores. List
of top off-target sites ranked by DSB scores for each Cas9 and sgRNA pair. Indel levels are
determined by targeted deep sequencing. Blue triangles indicate positions of peak BLESS signal,
and where present, PAMs and targets with sequence homology to the guide are highlighted.
Lines connect the common on-targets (EMX1) and off-targets between the two Cas9s. N.D., not
determined.
Extended Data Figure 9 | Indel measurements of top candidate off-target sites based on
sequence similarity score. Off-targets are predicted based on sequence similarity to on-target,
accounting for number and position of Watson-Crick base-pairing mismatches as previously
described22. NNGRR and NRG are used as potential PAMs for SaCas9 and SpCas9, respectively.
Lines connect the common targets (EMX1) and off-targets between the two Cas9s. Correlation
plots between indel percentages and b, prediction based on sequence similarity, c, ChIP peaks
ranked by motif similarity, or d, DSB scores for top-ranking off-target loci. Trendlines, r², and p-values are calculated using ordinary least squares.
Extended Data Figure 10 | SaCas9 targeting Apob locus in the mouse liver. a, Schematics
illustrating the mouse Apob gene locus and the positions of the three guides tested. b,
Experimental time course and c, SURVEYOR assay showing indel formation at target loci after
intravenous injection of AAV2/8 carrying thyroxine-binding globulin (TBG) promoter-driven
SaCas9 and U6-driven guide at 2E11 total genome copies (n = 1 animal each). d, Oil Red O
staining of liver tissue from AAV- or saline-injected animals. Male C57BL/6 mice were injected
at 8 weeks of age and analyzed 4 weeks post-injection.
Methods
In vitro transcription and cleavage assay
Cas9 orthologs were human codon-optimized and synthesized by GenScript, and transfected into
293FT cells as described below. Whole cell lysates from 293FT cells were prepared with lysis
buffer (20 mM HEPES, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 0.1% Triton X-100) supplemented with Protease Inhibitor Cocktail (Roche). T7-driven sgRNA was transcribed
in vitro using custom oligos (Supplementary Information) and HiScribe T7 In vitro Transcription
Kit (NEB), following the manufacturer's recommended protocol. The in vitro cleavage assay was
carried out as follows: for a 20 μl cleavage reaction, 10 μl of cell lysate was incubated with 2 μl
cleavage buffer (100 mM HEPES, 500 mM KCl, 25 mM MgCl2, 5 mM DTT, 25% glycerol), 1
μg in vitro transcribed RNA and 200 ng EcoRI-linearized pUC19 plasmid DNA or 200 ng
purified PCR amplicons from mammalian genomic DNA containing target sequence. After 30
min incubation, cleavage reactions were purified using QiaQuick Spin Columns, treated with
RNase A at a final concentration of 80 ng/μl for 30 min, and analyzed on a 1% Agarose E-Gel
(Life Technologies).
In vitro PAM screen
Rho-independent transcriptional termination was predicted using the ARNold terminator search
tool47,48. For the PAM library, a degenerate 7-bp sequence was cloned into a pUC19 vector. For
each ortholog, the in vitro cleavage assay was carried out as above with 1 μg T7-transcribed
sgRNA and 400 ng pUC19 with degenerate PAM. Cleaved plasmids were linearized by NheI, gel
extracted, and ligated with Illumina sequencing adaptors. Barcoded and purified DNA libraries
were quantified by Quant-iT PicoGreen dsDNA Assay Kit or Qubit 2.0 Fluorometer (Life
Technologies) and pooled in an equimolar ratio for sequencing using the MiSeq
Personal Sequencer (Illumina). MiSeq reads were filtered by requiring an average
Phred quality (Q score) of at least 23, as well as perfect sequence matches to barcodes. For reads
corresponding to each ortholog, the degenerate region was extracted. All extracted regions were
then grouped and analyzed with Weblogo45.
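The read filtering described above can be illustrated with a minimal Python sketch: a read is kept only if its mean Phred quality is at least 23 and its barcode matches exactly, after which the 7-bp degenerate region is extracted. The read layout (barcode at the start of the read, then a constant flank immediately upstream of the degenerate region), the Phred+33 encoding, and all function names are assumptions made for illustration and are not taken from the analysis code used in this study.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def extract_pam(read, quality, barcode, flank, pam_length=7, min_mean_q=23):
    """Return the degenerate region of a read, or None if the read is filtered out.

    Hypothetical layout: <barcode> ... <flank><degenerate region> ...
    """
    # Filter 1: mean Phred quality of at least min_mean_q.
    if not quality or mean_phred(quality) < min_mean_q:
        return None
    # Filter 2: perfect sequence match to the expected barcode.
    if not read.startswith(barcode):
        return None
    # Locate the constant flank and take the bases immediately after it.
    flank_start = read.find(flank, len(barcode))
    if flank_start < 0:
        return None
    pam_start = flank_start + len(flank)
    pam = read[pam_start:pam_start + pam_length]
    return pam if len(pam) == pam_length else None
```

The extracted regions for each ortholog could then be pooled and passed to a sequence-logo tool.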
Cell culture and transfection
Human embryonic kidney 293FT (Life Technologies), Neuro-2a (N2a), and Hepa1-6 (ATCC)
cell lines were maintained in Dulbecco's modified Eagle's Medium (DMEM) supplemented with
10% FBS (HyClone), 2 mM GlutaMAX (Life Technologies), 100 U/ml penicillin, and 100 μg/ml
streptomycin at 37 °C with 5% CO2 incubation.
Cells were seeded into 24-well plates (Corning) one day prior to transfection at a density of
240,000 cells per well, and transfected at 70-80% confluency using Lipofectamine 2000 (Life
Technologies) following the manufacturer's recommended protocol. For each well of a 24-well
plate, a total of 500 ng DNA was used. For ChIP and BLESS, a total of 4.5 million cells were
seeded the day before transfection into a 100 mm plate, and a total of 20 μg DNA was used.
DNA isolation from cells and tissue
Genomic DNA was extracted using the QuickExtract DNA Extraction Solution (Epicentre).
Briefly, pelleted cells were resuspended in QuickExtract solution and incubated at 65 °C for 15
min, 68 °C for 15 min, and 98 °C for 10 min8. Genomic liver DNA was extracted from bulk
tissue fragments using a microtube bead mill homogenizer (Beadbug, Denville Scientific) by
homogenizing approximately 30-50 mg of tissue in 600 μL of DPBS (Gibco). The homogenate
was then centrifuged at 2,000 to 3,000 × g for 5 minutes at 4 °C and the pellet was resuspended in
300-600 μL QuickExtract DNA Extraction Solution (Epicentre) and incubated as above.
Indel analysis and guide:target basepairing mismatch search
Indel analyses by SURVEYOR assay and targeted deep sequencing were carried out and
analyzed as previously described8,22. The method for identifying potential off-target sites
for SpCas9 based on Watson-Crick base-pairing mismatches between guide RNA and target DNA
has been previously described22, and was adapted for SaCas9 by considering NNGRR as possible
off-target PAMs.
Chromatin immunoprecipitation and analysis
Cells were passaged at 24 hours post-transfection into a 150 mm dish, and fixed for ChIP
processing at 48 hours post-transfection. For each condition, 10 million cells were used for ChIP
input, following experimental protocols and analyses as previously described31 with the
following modification: instead of pairwise peak-calling, ChIP peaks were only required to be
enriched over both 'empty' controls (dSpCas9 only, dSaCas9 only) as well as the other
Cas9/other sgRNA sample (e.g., SpCas9/EMX-sg2 peaks must be enriched over SaCas9/EMX-sg1 peaks in addition to the empty controls). This was done to avoid, as far as possible, filtering out real peaks
present in two related samples.
To identify off-targets ranked by motif or sequence similarity to the guide, motif scores for ChIP
peaks were calculated as follows. For a given ChIP peak, the 100-nt interval around the peak
summit is taken as the target sequence and the sgRNA guide region, of length L, as the query. An
alignment score is calculated for every subsequence of length L in the target, and the subsequence
with the highest score is reported as the best match to the query. For each subsequence alignment, the score
calculation begins at the 5′ end of the query. For each position in the alignment, 1 is added or
subtracted for a match or mismatch between the query and target, respectively. If the score
becomes negative, it is set to 0 and the calculation continues for the remainder of the alignment.
The score at the 3′ end of the query is reported as the final score for the alignment. MACS scores
= -10 log10(p-value relative to the empty control) are determined as previously described49. For
unbiased determination of PAM from ChIP peaks, the peaks were analyzed for the best match by
motif score to the guide region only within 50 nt of the peak summit; the alignment was
extended by 10 nt at the 3′ end and visualized using Weblogo45.
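The motif score described above can be sketched in a few lines of Python. Because the running score is reset to 0 whenever it becomes negative, mismatches near the 5′ end of the query are penalized less than mismatches near the 3′ (PAM-proximal) end. The function names are hypothetical, and the sketch scans only the strand it is given; how both strands were handled is not specified here, so a caller would need to pass the reverse complement separately.

```python
def alignment_score(query, subsequence):
    """Score one ungapped alignment, reading from the 5' end of the query.

    +1 for a match, -1 for a mismatch; a negative running score is reset to 0.
    The value reached at the 3' end is the final score.
    """
    score = 0
    for q_base, t_base in zip(query, subsequence):
        score += 1 if q_base == t_base else -1
        if score < 0:
            score = 0
    return score


def best_motif_match(query, target):
    """Return (best_score, start) over every length-L window of the target.

    Returns (-1, None) if the target is shorter than the query.
    """
    length = len(query)
    best_score, best_start = -1, None
    for start in range(len(target) - length + 1):
        score = alignment_score(query, target[start:start + length])
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

For a 4-nt query ACGT, a mismatch at the first position (TCGT) still scores 3, whereas a mismatch at the last position (ACGA) scores 2.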
To calculate the motif score threshold at which FDR < 0.1 for each sample, 100-nt sequences
centered around peak summits were shuffled while preserving dinucleotide frequency. The best
match by motif score to the guide+PAM (NGG for SpCas9, NNGRRT for SaCas9) in these
shuffled sequences was then found. The score threshold for FDR < 0.1 was defined as the score
such that fewer than 10% of shuffled peaks had a motif score above it.
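The threshold definition above can be made concrete with a short Python sketch. It assumes that the best motif scores of the shuffled sequences have already been computed (the dinucleotide-preserving shuffle itself is not implemented here), and it returns the smallest observed score for which fewer than 10% of shuffled peaks score strictly higher. Choosing the smallest such score, and the function name, are assumptions made for illustration.

```python
def fdr_score_threshold(shuffled_scores, fdr=0.1):
    """Smallest observed score t with fraction(shuffled scores > t) < fdr.

    shuffled_scores: best motif score of each shuffled peak sequence.
    """
    total = len(shuffled_scores)
    if total == 0:
        raise ValueError("at least one shuffled score is required")
    if fdr <= 0:
        raise ValueError("fdr must be positive")
    for threshold in sorted(set(shuffled_scores)):
        above = sum(1 for score in shuffled_scores if score > threshold)
        if above / total < fdr:
            return threshold
    # Not reached: at the maximum score nothing lies above the threshold.
    return max(shuffled_scores)
```

Real peaks whose motif score exceeds the returned threshold would then be counted as containing a significant match.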
BLESS for DSB detection
Cells were harvested at 24 hours post-transfection, then processed as previously described33 with
the following alterations: a total of 10 million cells were fixed for nuclei isolation and
permeabilization, and treated with Proteinase K for 4 min at 37°C before inactivation with PMSF.
All deproteinized nuclei were used for DSB labeling with 100 mM of annealed proximal linkers
overnight. After Proteinase K digestion of labeled nuclei, chromatin was mechanically sheared
with a 26G needle before sonication (BioRuptor, 20 min on High, 50% duty cycle). 20 μg of
sheared chromatin was captured on streptavidin beads, washed, and ligated to 200 mM of distal
linker. Linker hairpins were then cleaved off with I-SceI digestion for 1 hour at 37°C, and products
were PCR-enriched for 18 cycles before proceeding to library preparation with TruSeq Nano LT Kit
(Illumina). For the negative control, cells mock-transfected with Lipofectamine 2000 and pUC19
DNA were processed in parallel through the assay.
BLESS Analysis
Fastq files were demultiplexed, and 30-bp genomic sequences were separated from the BLESS
ligation handles for alignment. Bowtie was used to map the genomic sequences to hg19 or mm9,
allowing for a maximum of 2 mismatches. Following alignment, reads from all bio-replicates for
an individual sample were first pooled, and then nearest neighbor clustering was performed with
a 30-bp moving window to identify regions of enrichment across the genome. Within each
cluster, the pairwise distance was calculated between all forward and reverse read strand
mappings (Extended Data Figure 7b, c). Pairwise distance distributions were used to filter out
wide and poorly-defined DSB clusters from the well-defined DSB clusters characteristically
found at Cas9-induced cleavage sites (see Supplementary Information). Finally, we adjusted the
count of predicted Cas9-induced DSBs at a given locus by using a binomial model to calculate
the maximum-likelihood estimate (MLE) of peak enrichment in the Cas9-sgRNA treated
samples given BLESS measurements from an untreated negative control. After the MLE
calculation, a list of loci ranked by their DSB scores could be obtained and plotted (Figure 3b,
Extended Data Figure 8). Additional descriptions can be found in Supplementary Information.
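The clustering and pairwise-distance steps can be illustrated with a minimal Python sketch. It makes several assumptions that are not specified above: each read is reduced to a single genomic coordinate on one chromosome, reads are chained into a cluster when neighboring coordinates lie within 30 bp of each other, the pairwise distance is the signed difference between a reverse-strand and a forward-strand coordinate, and the 6-bp cutoff shown in Extended Data Figure 7 is applied to the absolute distance. All function names are hypothetical, and the binomial MLE step is not covered.

```python
from itertools import product


def cluster_positions(positions, window=30):
    """Group sorted coordinates into clusters; a gap larger than `window` starts a new one."""
    clusters, current = [], []
    for position in sorted(positions):
        if current and position - current[-1] > window:
            clusters.append(current)
            current = []
        current.append(position)
    if current:
        clusters.append(current)
    return clusters


def pairwise_distances(forward, reverse):
    """Signed distance (reverse - forward) for every forward/reverse read pair."""
    return [rev - fwd for fwd, rev in product(forward, reverse)]


def fraction_within(distances, max_distance=6):
    """Fraction of pairwise distances whose absolute value is at most max_distance."""
    if not distances:
        return 0.0
    close = sum(1 for distance in distances if abs(distance) <= max_distance)
    return close / len(distances)
```

Under these assumptions, a well-defined break would give a distance distribution concentrated near zero, while a broad hotspot would give a wide one.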
The top-ranking ~30 sites from the list of Cas9 induced DSB clusters were sequenced for indel
formation (Extended Data Figure 8; validated targets in Figure 3d). Within these loci, PAMs and
regions of target homology were identified by first searching all PAM sites within a ±50 bp
window around the DSB cluster, then selecting the adjacent sequence with fewest mismatches to
the target sequence.
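The PAM and homology search can be sketched as follows in Python. The sketch scans a window sequence for every position matching a degenerate PAM (for example NNGRR for SaCas9 or NRG for SpCas9), compares the immediately upstream bases with the target, and keeps the candidate with the fewest mismatches. It is an ungapped, single-strand illustration with hypothetical function names; how bulges and the opposite strand were treated is not described above, so those cases are left to the caller.

```python
# IUPAC codes needed for the PAMs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def matches_pam(site, pam):
    """True if `site` is compatible with the degenerate `pam` pattern."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam)
    )


def best_pam_adjacent_site(window, target, pam):
    """Return (mismatches, start, sequence) for the best PAM-adjacent site, or None.

    window: genomic sequence around the DSB cluster (e.g. the +/-50 bp window).
    target: guide-matching target sequence, 5' to 3', without the PAM.
    """
    best = None
    target_length = len(target)
    for pam_start in range(target_length, len(window) - len(pam) + 1):
        if not matches_pam(window[pam_start:pam_start + len(pam)], pam):
            continue
        start = pam_start - target_length
        candidate = window[start:pam_start]
        mismatches = sum(1 for a, b in zip(candidate, target) if a != b)
        if best is None or mismatches < best[0]:
            best = (mismatches, start, candidate)
    return best
```

For example, with the 21-nt EMX1-sg1 SaCas9 target GGCCTCCCCAAAGCCTGGCCA embedded upstream of GGGAGT, the search with the NNGRR pattern recovers the on-target site with zero mismatches.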
Code Availability
BLESS analysis code is available upon request.
Virus Production and Titration
For in-house viral production, 293FT cells (Life Technologies) were maintained as described
above in 150 mm plates. For each transfection, 8 μg of pAAV8 serotype packaging plasmid, 10
μg of pDF6 helper plasmid, and 6 μg of AAV2 plasmid carrying the construct of interest were
added to 1 mL of serum-free DMEM. 125 μL of PEI “Max” solution (1 mg/mL, pH = 7.1) was
then added to the mixture and incubated at room temperature for 5 to 10 seconds. After
incubation, the mixture was added to 20 mL of warm maintenance media and applied to each
dish to replace the old growth media. Cells were harvested between 48h and 72h post
transfection by scraping and pelleting by centrifugation. The AAV2/8 (AAV2 ITR vectors
pseudo-typed with AAV8 capsid) viral particles were then purified from the pellet according to a
previously published protocol50.
High-titer, high-purity viruses were also produced by vector core facilities at Children's Hospital
Boston and Massachusetts Eye and Ear Infirmary (MEEI). These AAV vectors were then titered
by real-time qPCR using a customized TaqMan probe against the transgene, and all viral
preparations were titer-matched across different batches and production facilities prior to
experiments. The purity of AAV vector was further verified by SDS-PAGE.
Animal Injection and Processing
All mouse cohorts were maintained at the animal facility with standard diet and housing following
IRB-approved protocols. AAV vector was delivered to 5-6 week old C57BL/6 mice
intravenously via lateral tail vein injection. All dosages of AAV were adjusted to 100 μL or 200
μL with sterile phosphate buffered saline (PBS), pH 7.4 (Gibco) before the injection. Animals
were not immunosuppressed or otherwise handled differently prior to injection or during the
course of the experiment, except for the pre-bleed fasting noted below.
To track the serum levels of Pcsk9 and total cholesterol, animals were fasted for 12 hours prior to
blood collection by saphenous vein bleeds (no more than 100 μL or 10% of total blood volume
per week). After the blood was allowed to clot at room temperature, the serum was separated by
centrifugation and stored at -20°C for subsequent analysis. For terminal procedures to collect
liver tissue and larger serum volumes for chemistry panels, mice were euthanized by carbon
dioxide inhalation. Subsequently, blood was collected via cardiac puncture. Transcardial
perfusion with 30 mL PBS removed the remaining blood, after which liver samples were
collected. The median lobe of liver was removed and fixed in 10% neutral buffered formalin for
histological analysis, while the remaining lobes were sliced into small blocks of less than
1 × 1 × 3 mm³ and frozen for subsequent DNA or protein extraction.
Histology and serum analysis
Following tissue harvesting as described above, flash-frozen mouse liver samples were
embedded in O.C.T. compound (Tissue Tek, Cat # 4583), snap-frozen, and stored at -80°C prior
to processing. Frozen tissues were cryosectioned at 4 μm thickness and stained with Oil
Red O following the manufacturer's recommended protocol. Liver histology was assessed by H&E
staining of 10% neutral buffered formalin-fixed liver sections.
Serum levels of Pcsk9 were determined by ELISA using the Mouse Proprotein Convertase
9/PCSK9 Quantikine ELISA Kit (MPC-900, R&D Systems), following the manufacturer's
instructions. Total cholesterol levels were measured using the Infinity Cholesterol Reagent
(Thermo Fisher) per the manufacturer's instructions. Serum ALT, AST, albumin and total
bilirubin were measured by an Olympus AU5400 (IDEXX, Memphis, TN).
[Figure 1: Cas9 domain schematic (RuvC I, RuvC II, RuvC III, HNH) and ortholog lengths in amino acids: Type II-A, Streptococcus pyogenes SF370 (1368), Staphylococcus aureus subsp. aureus (1053), Streptococcus thermophilus LMD-9 CRISPR1 (1121), Streptococcus pasteurianus ATCC 43144 (1130); Type II-C, Neisseria cinerea ATCC 14685 (1082), Campylobacter lari CF89-12 (1003), Parvibaculum lavamentivorans DS-1 (1037), Corynebacterium diphtheriae NCTC 13129 (1084). In vitro PAM screen workflow: DNA substrate with randomized 7-bp PAM and 20-nt guide (GGGACUCAACCAAGUCAUUC) with sgRNA scaffold, Cas9 cleavage, purification of cleaved fragments, adapter ligation with barcodes, sequencing, PAM identification. Sequence logos (bits) at PAM positions 1 to 7 for each ortholog. Cleavage of targets by cell lysate for each Cas9 and PAM.]
[Figure 2: a, SaCas9 sgRNA (guide and scaffold) base-paired to a target immediately 5′ of an NNGRRT PAM. b, Guide length (nt), 23 to 18, versus indel (%); axis labels Indel (%, 293FT) and Indel (%, Hepa1-6). c, Sequence logos (bits) over guide positions -20 to -1 and PAM positions 1 to 7, with seed and PAM marked, for EMX1-sg1 (7257 peaks) and EMX1-sg2 (12964 peaks), annotated with the percentage of peaks containing motif+PAM for NNGRRT and NNGRR. d, Preference for NNGRR(N), with (N) = A, C, G, or T.]
[Figure 3: a, BLESS workflow: fix cells, deproteinize nuclei, DSB proximal labeling, shear and capture, label distal end, enrich and sequence. b, DSB score by chromosome (1 to 22, X, Y) for SaCas9 and SpCas9 with EMX1-sg1 and EMX1-sg2, with Sa and Sp on-target loci marked. SaCas9 target and PAM (21-nt guide): EMX1-sg1 GGCCTCCCCAAAGCCTGGCCAGGGAGT; EMX1-sg2 TGGCCAGGCTTTGGGGAGGCCTGGAGT. c, Indel (%, 293FT) versus DSB score for SaCas9 and SpCas9. d, Target and PAM alignments (NNGRRN for SaCas9, NRG for SpCas9) of off-target loci with Indel (%, 293FT) and heatmaps of DSB score (BLESS), motif score (ChIP), and sequence similarity score.]
[Figure 4: a, AAV vector (< 4.7 kb): ITR, thyroxine-binding globulin (TBG) promoter, SaCas9, bGHpA, U6, sgRNA guide, ITR; timeline: in vitro target validation, virus production, tail vein injection, tissue analysis. b, Indel (%, liver) for Pcsk9-sg1 and Pcsk9-sg2 at 2E11. c, % of control serum Pcsk9 and d, serum cholesterol (mg/dL) versus days post injection for Pcsk9-sg1 at AAV titers 0.5E11, 1E11, 2E11, and 4E11, Rosa-sg1 at 2E11, and uninjected animals. e, DSB score (N2a) by chromosome (1 to 19, X, Y) for Pcsk9-sg1 and Pcsk9-sg2 with on-target loci marked; insets, Indel (%, N2a) versus DSB score. f, Indel (%, liver) at BLESS-identified off-target loci for Pcsk9-sg1 and Pcsk9-sg2, with DSB score (BLESS) heatmap.]
[Figure 5: a, H&E-stained liver sections: AAV: EGFP, AAV: Pcsk9-sg1, and uninjected; scale bars, 10 μm. b, ALT (IU/L), AST (IU/L), albumin (g/dL), and total bilirubin (mg/dL) at 7, 14, and 28 days post injection for Pcsk9-targeted, EGFP, and uninjected animals.]