Optimizing CRISPR Genome Editing with AI

Introduction

Over the last decade, CRISPR-Cas9 has emerged as a powerful and accessible genome editing tool, revolutionizing how researchers manipulate DNA in a wide range of organisms. Originally discovered as a bacterial defense mechanism against viruses, CRISPR has since been repurposed to enable targeted modifications of the genome, from basic gene knockouts to precise genetic corrections. (1,2,3) While the underlying principles of CRISPR are becoming more familiar to scientists and the public alike, successfully applying the technology requires more than just understanding its mechanism. A crucial step in any CRISPR experiment is the design of the guide RNA (gRNA), the short RNA sequence that directs Cas9 to the correct location in the genome. (3,4) Not all gRNAs are equally effective or safe. Some efficiently cut only the intended target (“on-target”), while others may cause unintended edits elsewhere in the genome (“off-target”). This makes careful selection and validation of gRNAs essential for any genome editing experiment. (5,6)

In this article, we first explain the fundamentals of how CRISPR-Cas9 works. Then, we walk through a typical workflow for designing and evaluating gRNAs using two modern tools: GuideScan2 (7), for data-driven gRNA design and off-target prediction, and CRISPRon (8), for estimating the on-target cutting efficiency. Together, these tools allow researchers to make informed decisions when designing CRISPR experiments — increasing accuracy and reducing unintended effects.

What is CRISPR-Cas9?

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and Cas9 (CRISPR-associated protein 9) originated as a prokaryotic adaptive immune system. This system enables bacteria to defend themselves against viral infections by capturing short sequences of viral DNA and integrating them into their own genome, where they are stored between palindromic repeat sequences. These integrated DNA fragments, known as “spacers,” serve as a molecular memory of past infections. When the same virus invades again, the bacterial cell can recognize and neutralize it using this stored information, effectively acting like a molecular mugshot system for viruses. (9,10)

Mechanism of Action (10)

  1. Acquisition – Upon viral infection, a segment of the viral DNA is excised and integrated into the bacterial genome at the CRISPR locus.
  2. Expression – This CRISPR DNA is transcribed into a precursor RNA and processed into short CRISPR RNAs (crRNAs). These are then complexed with a trans-activating CRISPR RNA (tracrRNA).
  3. Interference – The crRNA–tracrRNA complex binds to the Cas9 protein, forming a ribonucleoprotein complex. This complex scans the cell’s DNA for sequences complementary to the crRNA and adjacent to a specific motif (the PAM sequence).
  4. Cleavage – Upon finding a match, Cas9 induces a double-stranded break (DSB) in the foreign DNA, neutralizing the threat.

This natural mechanism inspired researchers to repurpose CRISPR-Cas9 as a programmable genome editing tool.

From bacteria to human genome editing

Figure created with BioRender.

While CRISPR-Cas9 originally evolved in bacteria, groundbreaking research in 2012 demonstrated that Cas9 could be programmed with a guide RNA to cut a chosen DNA sequence, and studies published shortly afterwards, in 2013, showed that the system also works in eukaryotic cells, including human cells. This breakthrough opened the door to targeted genome editing across a wide range of organisms and applications.

Applications of CRISPR

  • Medicine & Therapeutics: Targeting genetic diseases, developing gene therapies.
  • Neuroscience: Studying gene function in brain development and disorders.
  • Drug Discovery: Creating cell lines for high-throughput screening.
  • Agricultural Biotechnology: Engineering crops for improved yield or resistance.
  • Animal Model Engineering: Generating transgenic models for biomedical research.

Case: Potato Acrylamide Reduction (11)

In Australia, researchers used CRISPR-Cas9 to silence potato genes linked to the formation of acrylamide, a potentially carcinogenic compound that forms when cold-stored potatoes are fried. This intervention reduced acrylamide formation by 80%, demonstrating the tool’s potential in food safety.

CRISPR Workflow (9)

  • Step 1: gRNA Design

    The targeting portion of the gRNA (the spacer) is a 20-nucleotide sequence that directs Cas9 to the target genomic site. It must match the target DNA, and the target site must lie immediately upstream (5′) of a PAM sequence. This step is crucial, as the gRNA determines specificity and efficiency.

  • Step 2: PAM recognition

    Cas9 from Streptococcus pyogenes recognizes the PAM sequence “NGG” (any nucleotide followed by two guanines). Without this motif, Cas9 will not bind or cleave the DNA — providing an intrinsic safeguard.

  • Step 3: DNA Cleavage

    When the gRNA matches the target and a PAM is present, Cas9 undergoes a conformational change and becomes catalytically active. It creates a double-stranded break (DSB) about 3 base pairs upstream of the PAM.

  • Step 4: Cellular Repair

    • Non-Homologous End Joining (NHEJ): Fast but error-prone, often leading to insertions or deletions (indels) that disrupt gene function → ideal for gene knockouts.
    • Homology-Directed Repair (HDR): Precise but less efficient; requires a donor DNA template to introduce specific changes.
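
Steps 1 to 3 of this workflow can be illustrated with a minimal Python sketch that scans a DNA sequence for 20-nucleotide targets followed by an NGG PAM on both strands. The function names and the returned fields are illustrative assumptions and are not part of GuideScan2 or CRISPRon. The reported cut position assumes a blunt SpCas9 cut 3 bp upstream of the PAM.

```python
import re

# Translation table for complementing uppercase DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_guides(seq, spacer_len=20):
    """List every spacer in `seq` that is directly followed by an NGG PAM.

    Hypothetical helper for illustration only. Each hit is a dict holding
    the strand, the spacer, the PAM, and `cut`: the number of
    forward-strand bases that lie to the left of the expected cut.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    hits = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(template):
            # Assumption: blunt cut 3 bp upstream of the PAM.
            cut_on_template = match.start() + spacer_len - 3
            if strand == "+":
                cut = cut_on_template
            else:
                # Convert the reverse-strand offset to forward coordinates.
                cut = len(seq) - cut_on_template
            hits.append({
                "strand": strand,
                "spacer": match.group(1),
                "pam": match.group(2),
                "cut": cut,
            })
    return hits
```

A sketch like this only enumerates candidates. It says nothing about efficiency or off-target risk, which is why dedicated tools are used for scoring in the rest of this article.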

Optimisation and quality control

Not all gRNAs perform equally. Off-target effects (unintended cuts elsewhere in the genome) are a major concern. Advances in bioinformatics tools, such as GuideScan2 (7), enable data-driven gRNA selection, balancing on-target efficiency with minimal off-target risk. Delivery of CRISPR components into cells (via plasmids, RNP complexes, or viral vectors) is another key step. After editing, researchers must validate the results using:

  • PCR and DNA sequencing: To confirm the intended modification.
  • Computational analysis: To assess editing efficiency and detect off-target effects.
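
To make "editing efficiency" concrete, here is a minimal sketch that summarizes sequencing reads once an indel size has been determined for each read. The input format (one integer per read: 0 for unedited, positive for insertions, negative for deletions) and the function name are assumptions made for illustration. Substitutions are ignored, and real analyses rely on dedicated alignment-based tools.

```python
def summarize_edits(indel_sizes):
    """Summarize editing outcomes from per-read indel sizes.

    `indel_sizes` holds one integer per sequenced read:
    0 = unedited, > 0 = insertion length, < 0 = deletion length.
    """
    total = len(indel_sizes)
    if total == 0:
        raise ValueError("no reads supplied")
    edited = [size for size in indel_sizes if size != 0]
    # Indels whose length is not a multiple of 3 shift the reading frame,
    # which is what typically disrupts a gene after NHEJ repair.
    frameshift = [size for size in edited if size % 3 != 0]
    return {
        "reads": total,
        "editing_efficiency": len(edited) / total,
        "frameshift_fraction": len(frameshift) / total,
    }
```

For a knockout experiment the frameshift fraction is often the more relevant number, since in-frame indels may leave a partially functional protein.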

Use Case: BRCA1 Gene

CRISPR-Cas9 gRNA design/validation workflow

To demonstrate how to design and evaluate guide RNAs (gRNAs) using GuideScan2 (7) and CRISPRon (8), we use the BRCA1 gene as an example. BRCA1 (Breast Cancer 1) is a well-known tumor suppressor gene involved in DNA repair, and mutations in this gene are associated with an increased risk of breast and ovarian cancer. (12) Due to its clinical relevance and well-characterized sequence, BRCA1 is often used as a model in genome editing studies. However, the workflow presented here is not limited to BRCA1. The same pipeline can be applied to any gene of interest in any organism with a reference genome, making this approach widely applicable in both research and biotechnology settings.

Getting a FASTA file from Ensembl

  1. Go to ensembl.org
  2. Enter the desired gene in the search bar at the top of the page
  3. Wait for the search results to appear
  4. Choose the result for the desired species
  5. On the gene page, find “Sequence” in the left-hand menu and click it
  6. Click the Download sequence button and download the FASTA file
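
Before passing the downloaded file to other tools, it can help to check that it contains what you expect. The following is a minimal sketch of a FASTA reader using only the Python standard library. The function names are hypothetical and the file name in the usage note is a placeholder.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict.

    `lines` can be an open file handle or any iterable of strings.
    Sequences are upper-cased and joined across line breaks.
    """
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq):
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Typical use would be to open the downloaded file (for example `with open("<your FASTA file>") as handle:`), call `parse_fasta(handle)`, and print the length and GC content of each record as a quick sanity check.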

GuideScan2 (7) Setup (Summary)

  1. Install Miniconda

  2. Create environment:

    conda create -n guidescan2 python=3.9 -y
  3. Activate:

    conda activate guidescan2
  4. Add channels and install:

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda install guidescan
  5. Clone CLI:

    git clone https://github.com/pritykinlab/guidescan-cli.git
  6. Navigate to the cloned guidescan-cli folder:

    cd guidescan-cli
  7. Confirm your location (the path should end in guidescan-cli):

    pwd

Generate gRNAs

Running the following command writes a CSV file of candidate gRNAs to the guidescan-cli folder:

python scripts/generate_kmers.py <Ensembl FASTA file> > <Name output file>

The output is a CSV file with one candidate gRNA per row. You can customize the PAM, the gRNA length, and the ID prefix:

python scripts/generate_kmers.py <Ensembl FASTA file> --pam NG --kmer-length 23 --prefix gRNA > <Name output file>

With these options, the output reflects the chosen PAM, length, and ID prefix.

Specificity Scores

Next, we want to know the specificity score of each gRNA. This requires an index file for the genome of our species. Building an index is very time-consuming, but the GuideScan2 website provides pre-built indexes for some species.

  • If your species is not in that list, download the genome FASTA file for the species and run the following command:

    guidescan index <Fasta file from the species>

To determine the specificity scores, run:

guidescan enumerate -i <INDEX> -f <gRNAs.csv> --format csv -o <Filename.csv> -n <threads>

For example:

guidescan enumerate -i hg38_index/GCF_00001405_GRCh38.p13_genomix.fna.index -f grnas_brca1.csv --format csv  -o grnas_offtarget_brca1.csv -n 8

This step can take a while to run.
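
Once the off-target file exists, a first pass is often to discard guides with low specificity. The sketch below uses only the standard library. It assumes the CSV has columns named `id` and `specificity` (inferred from the merge script later in this article, not confirmed against the GuideScan2 documentation) and that a guide may appear on several rows, one per reported match. The threshold of 0.2 is an arbitrary illustrative value.

```python
import csv
import io


def load_specificity(handle):
    """Map each guide id to its specificity score.

    Assumes `id` and `specificity` columns. If a guide appears on several
    rows, the first score seen is kept.
    """
    scores = {}
    for row in csv.DictReader(handle):
        guide_id = row.get("id")
        value = row.get("specificity")
        if not guide_id or not value:
            continue  # skip rows without a usable score
        try:
            scores.setdefault(guide_id, float(value))
        except ValueError:
            continue  # skip rows with a non-numeric score
    return scores


def keep_specific(scores, min_specificity=0.2):
    """Return (id, score) pairs at or above the threshold, best first."""
    kept = [(gid, s) for gid, s in scores.items() if s >= min_specificity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy input with made-up values, only to show the expected shape.
_demo = io.StringIO("id,sequence,specificity\ng1,AAAA,0.9\ng2,CCCC,0.1\n")
demo_result = keep_specific(load_specificity(_demo))
```

With a real file, replace the toy input with `open("<Filename.csv>", newline="")` and adjust the column names if your output differs.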

Use CRISPRon (8)

  1. Go to the CRISPRon website
  2. Fill out the form using the Ensembl ID of your gene (ENS…)
  3. After entering your email address and a job name, click Submit
  4. When the results are ready, scroll down, select all targets, and click the download button
  5. Move the CSV file to the guidescan-cli folder

Merge Results

Use the following Python script to merge the GuideScan2 and CRISPRon output:

import sys

import pandas as pd

if len(sys.argv) != 3:
    print("Use: python merge_by_20nt.py <ontarget.csv> <offtarget.csv>")
    sys.exit(1)

ontarget_path = sys.argv[1]
offtarget_path = sys.argv[2]

# Loading CSV-files
ontarget_df = pd.read_csv(ontarget_path)
offtarget_df = pd.read_csv(offtarget_path)

# Extract first 20 nt
ontarget_df["20nt"] = ontarget_df["target+PAM"].astype(str).str[:20]
offtarget_df["20nt"] = offtarget_df["sequence"].astype(str).str[:20]

# Merge 20nt
merged_df = pd.merge(ontarget_df, offtarget_df, on="20nt", suffixes=('_ontarget', '_offtarget'))

# Rename columns
merged_df = merged_df.rename(columns={
    "id_offtarget": "Id",
    "Eff.(%)": "On target eff. (%)"
})

# Desired order (with new names)
desired_columns = [
    "Id",
    "match_chrm",
    "match_position",
    "match_strand",
    "target+PAM",
    "On target eff. (%)",
    "specificity_offtarget",
    "match_distance",
    "match_sequence",
    "rna_bulges",
    "dna_bulges"
]

# Filter and rearrange columns
merged_df = merged_df[[col for col in desired_columns if col in merged_df.columns]]

# Save
output_name = "merged.csv"
merged_df.to_csv(output_name, index=False)

print(f"✅ File saved as: {output_name}")

Run the script with:

python <Name python script> <Name on target csv file> <name off target csv file>

In the newly generated csv file you can organise or sort the gRNAs to your liking.
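
As one possible way to sort the merged file, the sketch below ranks guides by on-target efficiency and uses specificity as a tie-breaker. It relies on the column names written by the merge script above (`Id`, `On target eff. (%)`, `specificity_offtarget`). The function names and both thresholds are illustrative assumptions, not recommended cut-offs.

```python
import csv

EFF_COL = "On target eff. (%)"
SPEC_COL = "specificity_offtarget"


def rank_guides(rows, min_eff=50.0, min_spec=0.2):
    """Return (id, efficiency, specificity) tuples, best guide first.

    `rows` is an iterable of dicts, e.g. from csv.DictReader. Rows with
    missing or non-numeric scores are skipped, and each guide id is
    counted once even if it appears on several rows.
    """
    best = {}
    for row in rows:
        try:
            eff = float(row[EFF_COL])
            spec = float(row[SPEC_COL])
        except (KeyError, TypeError, ValueError):
            continue
        if eff < min_eff or spec < min_spec:
            continue
        best.setdefault(row.get("Id"), (eff, spec))
    # Sort by efficiency first, then specificity, both descending.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(gid, eff, spec) for gid, (eff, spec) in ranked]


def load_and_rank(path, **thresholds):
    """Read a merged CSV file from `path` and rank its guides."""
    with open(path, newline="") as handle:
        return rank_guides(csv.DictReader(handle), **thresholds)
```

Calling `load_and_rank("merged.csv")` would return the shortlisted guides. Whether efficiency or specificity should take priority depends on the experiment, so the sort key is worth adapting.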

Conclusion

CRISPR-Cas9 has redefined the landscape of genome editing, making it faster, more precise, and more accessible than ever before. Yet the true power of this technology lies in thoughtful implementation: designing the right guide RNAs, minimizing off-target risks, and choosing the most effective strategies for each context. By combining tools like GuideScan2 and CRISPRon, researchers and innovators can confidently navigate the complexity of gRNA design, whether working on disease models, therapeutics, or agricultural improvements. At eCellula, we specialize in building tailored, data-driven CRISPR workflows to help you move from concept to results with clarity and efficiency. Whether you’re just starting a project or refining an existing pipeline, we’re happy to think along with you. Do not hesitate to contact us.

References:

  1. Prillaman, M. (2024, June 10). What is CRISPR? A bioengineer explains | Stanford Report. Stanford Report. https://news.stanford.edu/stories/2024/06/stanford-explainer-crispr-gene-editing-and-beyond
  2. Redman, M., King, A., Watson, C., & King, D. (2016). What is CRISPR/Cas9? Archives of Disease in Childhood - Education and Practice, 101(4), 213–215. https://doi.org/10.1136/archdischild-2016-310459
  3. Wang, J. Y., & Doudna, J. A. (2023). CRISPR technology: A decade of genome editing is only the beginning. Science, 379(6629). https://doi.org/10.1126/science.add8643
  4. Riesenberg, S., Helmbrecht, N., Kanis, P., Maricic, T., & Pääbo, S. (2022). Improved gRNA secondary structures allow editing of target sites resistant to CRISPR-Cas9 cleavage. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-28137-7
  5. Anthon, C., Corsi, G. I., & Gorodkin, J. (2022). CRISPRon/off: CRISPR/Cas9 on- and off-target gRNA design. Bioinformatics, 38(24). https://doi.org/10.1093/bioinformatics/btac697
  6. Manghwar, H., Li, B., Ding, X., Hussain, A., Lindsey, K., Zhang, X., & Jin, S. (2020). CRISPR/Cas Systems in Genome Editing: Methodologies and Tools for sgRNA Design, Off-Target Evaluation, and Strategies to Mitigate Off-Target Effects. Advanced Science, 7(6). https://doi.org/10.1002/advs.201902312
  7. Schmidt, H., Zhang, M., Mourelatos, H., Sánchez-Rivera, F. J., Lowe, S. W., Ventura, A., Leslie, C. S., & Pritykin, Y. (2022). Genome-wide CRISPR guide RNA design and specificity analysis with GuideScan2. bioRxiv.
  8. Xiang, X., Corsi, G. I., Anthon, C., Qu, K., Pan, X., Liang, X., Han, P., Dong, Z., Liu, L., Zhong, J., Ma, T., Wang, J., Zhang, X., Jiang, H., Xu, F., Liu, X., Xu, X., Wang, J., Yang, H., … Luo, Y. (2021). Enhancing CRISPR-Cas9 gRNA efficiency prediction by data integration and deep learning. Nature Communications, 12(1). https://doi.org/10.1038/s41467-021-23576-0
  9. Doudna, J. A., & Charpentier, E. (2014). The new frontier of genome engineering with CRISPR-Cas9. Science, 346(6213). https://doi.org/10.1126/science.1258096
  10. Jiang, F., & Doudna, J. A. (2017). CRISPR-Cas9 Structures and Mechanisms. Annual Review of Biophysics, 46. https://doi.org/10.1146/annurev-biophys-062215-010822
  11. Riesenberg, S., Helmbrecht, N., Kanis, P., Maricic, T., & Pääbo, S. (2022). Improved gRNA secondary structures allow editing of target sites resistant to CRISPR-Cas9 cleavage. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-28137-7
  12. Manghwar, H., Li, B., Ding, X., Hussain, A., Lindsey, K., Zhang, X., & Jin, S. (2020). CRISPR/Cas Systems in Genome Editing: Methodologies and Tools for sgRNA Design, Off-Target Evaluation, and Strategies to Mitigate Off-Target Effects. Advanced Science, 7(6). https://doi.org/10.1002/advs.201902312
