

The CRISPR-Cas systems of bacterial and archaeal adaptive immunity consist of arrays of direct repeats separated by unique spacers and multiple CRISPR-associated (cas) genes encoding proteins that mediate all stages of the CRISPR response. In addition to the relatively small set of core cas genes that are typically present in all CRISPR-Cas systems of a given (sub)type and are essential for the defense function, numerous genes occur in CRISPR-cas loci only sporadically. Some of these have been shown to perform various ancillary roles in CRISPR response, but the functional relevance of most remains unknown. We developed a computational strategy for systematically detecting genes that are likely to be functionally linked to CRISPR-Cas. The approach is based on a “CRISPRicity” metric that measures the strength of CRISPR association for all protein-coding genes from sequenced bacterial and archaeal genomes. Uncharacterized genes with CRISPRicity values comparable to those of cas genes are considered candidate CRISPR-linked genes. We describe additional criteria to predict functional relevance for genes in the candidate set and identify 79 genes as strong candidates for functional association with CRISPR-Cas systems. A substantial majority of these CRISPR-linked genes reside in type III CRISPR-cas loci, which implies exceptional functional versatility of type III systems. Numerous candidate CRISPR-linked genes encode integral membrane proteins suggestive of tight membrane association of CRISPR-Cas systems, whereas many others encode proteins implicated in various signal transduction pathways. These predictions provide ample material for improving annotation of CRISPR-cas loci and experimental characterization of previously unsuspected aspects of CRISPR-Cas system functionality.
Driven largely by the exceptional recent success of Cas9, Cas12, and Cas13 RNA-guided nucleases as the new generation of genome and transcriptome editing tools, comparative genomics, structures, biochemical activities, and biological functions of CRISPR-Cas systems and individual Cas proteins have been studied in exquisite detail (1–5). The CRISPR-Cas immune response is conventionally described in terms of three distinct stages: (i) adaptation, (ii) expression and maturation of CRISPR (cr) RNA, and (iii) interference. At the adaptation stage, a distinct complex of Cas proteins binds to a target DNA and, typically after recognizing a short (2–4 bp) motif known as the protospacer-adjacent motif (PAM), excises a portion of the target DNA (the protospacer) and inserts it into the CRISPR array as a spacer, most often at the beginning of the array, downstream of the leader sequence (6, 7). The adaptation process creates immune memory, that is, “vaccinates” a bacterium or archaeon against subsequent infection with the memorized agent. At the expression-maturation stage, the CRISPR array is typically transcribed into pre-crRNA that is then processed into mature crRNAs, each consisting of a spacer and a portion of an adjacent repeat, by a distinct complex of Cas proteins, a single, large Cas protein, or an external, non-Cas RNase (8). At the final, interference stage, the crRNA bound to Cas proteins is employed as the guide to recognize the protospacer or a closely similar sequence in an invading genome of a virus or plasmid, which is then cleaved and inactivated by Cas nuclease(s) (9, 10).
The biochemical activities and biological functions of the 13 families of core Cas proteins that are essential for each of the three stages of the CRISPR immune response in different types of CRISPR-Cas have been extensively studied, although some notable gaps in knowledge remain (4, 14, 15). Specifically, Cas1 and Cas2 form the adaptation complex that is universal to all autonomous CRISPR-Cas systems (many type III loci and a few in other types lack the adaptation module and apparently rely on the adaptation machinery of other CRISPR-Cas systems in the same organism, which they recruit in trans) (6, 7, 16–22). Cas3 is a helicase that typically also contains a nuclease domain and is involved in target cleavage in type I systems (23–27).
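To make the idea of the “CRISPRicity” metric more concrete, the following is a minimal illustrative sketch and not the procedure actually used in this work. The text above does not give the formula, so the sketch assumes that CRISPRicity is simply the fraction of occurrences of a gene family that fall within CRISPR-cas neighborhoods, and that “comparable to cas genes” means a score at least as high as the lowest-scoring known cas family. The names `GeneHit`, `crispricity`, `candidate_families`, and all family identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Set


class GeneHit(NamedTuple):
    """One occurrence of a protein-coding gene in a genome (hypothetical record)."""
    family: str        # hypothetical protein-family identifier
    near_crispr: bool  # True if the gene lies in a CRISPR-cas neighborhood


def crispricity(hits: Iterable[GeneHit]) -> Dict[str, float]:
    """Assumed definition: CRISPR-linked occurrences / total occurrences per family."""
    total: Counter = Counter()
    linked: Counter = Counter()
    for hit in hits:
        total[hit.family] += 1
        if hit.near_crispr:
            linked[hit.family] += 1
    return {family: linked[family] / count for family, count in total.items()}


def candidate_families(scores: Dict[str, float], known_cas: Set[str]) -> List[str]:
    """Non-cas families scoring at least as high as the weakest known cas family.

    The threshold rule is an assumption made for illustration only.
    """
    threshold = min(scores[f] for f in known_cas if f in scores)
    return sorted(
        family
        for family, score in scores.items()
        if family not in known_cas and score >= threshold
    )


def make_hits(family: str, linked: int, unlinked: int) -> List[GeneHit]:
    """Build toy occurrence records for one family."""
    return [GeneHit(family, True)] * linked + [GeneHit(family, False)] * unlinked


# Toy data, invented for illustration; not taken from any genome.
toy_hits = (
    make_hits("cas1", 9, 1)
    + make_hits("cas3", 4, 1)
    + make_hits("famX", 17, 3)
    + make_hits("famY", 1, 9)
)
toy_scores = crispricity(toy_hits)
toy_candidates = candidate_families(toy_scores, {"cas1", "cas3"})
print(toy_scores)
print(toy_candidates)
```

In this toy setting, a family that occurs almost exclusively next to CRISPR-cas loci passes the threshold, whereas one that is found mostly elsewhere in genomes does not. A realistic analysis would additionally need to correct for uneven genome sampling and apply the further criteria mentioned above.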
