
GGRNA Search Examples (DNA sequences)

  • 2012-04-22 (Sun) 16:59

One of the strengths of GGRNA is its ultrafast search of nucleotide and amino-acid sequences. Let’s learn how to search them.

In silico PCR: search for PCR primer binding sites

Have you ever thought “I just want to use the exact primers from this paper for PCR,” or “Oops, I can’t remember which region my primers amplify”? Here’s how GGRNA can help.

Let’s say you want to use the following PCR primers:


Enter [ CTAGCTGCCAAAGAAGGACAT  comp:CAATGAGATGTTGTCGTGCTC ] into the search box (→ GGRNA). Since the reverse primer is designed against the complementary strand, add the comp: operator to make GGRNA search for the complementary sequence.
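To see what the comp: operator is being asked to find, here is a minimal Python sketch that reverse-complements the reverse primer. The helper name reverse_complement is my own, and this is only an illustration of the idea, not how GGRNA is implemented.

```python
# Illustrative helper (not part of GGRNA): reverse-complement a DNA/RNA sequence.
# U is read as T, and the result is written with DNA letters.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, written as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


reverse_primer = "CAATGAGATGTTGTCGTGCTC"
# The transcript (sense strand) carries this sequence at the reverse primer's binding site.
print(reverse_complement(reverse_primer))
```

Applying the function twice to a DNA sequence gives back the original, which is a quick sanity check when preparing primers by hand.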

In this case, two transcript variants of NFκB (NM_001165412.1 and NM_003998.3) are retrieved. Let’s take a look at the first one (NM_001165412.1).

Look at the nucleotides highlighted in green: below the highlighted letters you can see “position 2328 2547,” the start positions of the sequences matched by the two primers. From these numbers you can calculate the size of the amplified product: 2547 – 2328 + 21 = 240 (bp), where 21 is the length of the reverse primer. To the right of “position” you can also see (CDS: 468 – 3374), which gives the corresponding CDS region. Both primers are therefore designed within the CDS.

Now try to work out the length of the product amplified from the second hit (NM_003998.3). Yes, it’s 2550 – 2331 + 21 = 240 (bp)! This NFκB transcript variant gives a product of the same size.
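The arithmetic above can be written as a small Python function. This is just a sketch of the calculation in this entry; the function name amplicon_size is my own, and it assumes that both numbers are match start positions on the same transcript, as shown in the GGRNA result.

```python
def amplicon_size(forward_start: int, reverse_start: int, reverse_primer_len: int) -> int:
    """Product length from the two match start positions reported on one transcript.

    reverse_start is where the reverse primer's binding site begins on the
    transcript, so the product ends reverse_primer_len bases after that point.
    """
    if reverse_start < forward_start:
        raise ValueError("reverse primer site must lie downstream of the forward site")
    return reverse_start - forward_start + reverse_primer_len


REVERSE_PRIMER = "CAATGAGATGTTGTCGTGCTC"  # 21 nt

print(amplicon_size(2328, 2547, len(REVERSE_PRIMER)))  # NM_001165412.1 -> 240
print(amplicon_size(2331, 2550, len(REVERSE_PRIMER)))  # NM_003998.3    -> 240
```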

Let’s take a closer look at the primer binding sites. Click the title of the first hit, “Homo sapiens nuclear factor of kappa …,” and the details of the amplified product are displayed, with the primer binding sites highlighted.

By the way, if you run the same search in UCSC In-Silico PCR, a well-known and long-established web service that searches for fragments amplified by the entered primers, it comes up with only one result, 692 bp in size.

This discrepancy arises because the UCSC service searches against the genome, while GGRNA searches against transcripts. The NFκB primers above are designed to flank a 452 bp intron, so when you use genomic DNA as a template, the product will be 240 + 452 = 692 (bp). Flanking an intron is a good strategy when designing primers: since products derived from a cDNA template and from a genomic DNA template differ in length, you can detect genomic DNA contamination in your template simply by checking whether the product is uniform in size.

Search for nucleotide sequences included in figures

GGRNA works great for searching for nucleotide sequences appearing in figures of articles.

[Rajewsky et al. (2006) microRNA target predictions in animals. Nature Genetics 38, S8 – S13]

The left RNA strand shows part of the 3′ UTR sequence of myotrophin, which is a target of mouse miR-375. Try searching GGRNA with a fragment of it: select Mus musculus (mouse) and enter [ GUUGCAAGA ] into the search box (→ GGRNA). This returns 322 hits, which is too many, so narrow them down by entering a longer fragment [ GUUGCAAGAACAAA ] (→ GGRNA), and you’ll get 1 hit. Note that GGRNA treats U and T identically.

The sequence matched at position 3763, and the CDS region is 279 – 635, meaning that the miR-375 site lies far downstream in the 3′ UTR.
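The comparison of a match position with the CDS range can be sketched in Python as follows. The function name classify_region is my own, and I am assuming that positions are 1-based and that the CDS range includes both of its end points; only the start position of the match is checked.

```python
def classify_region(position: int, cds_start: int, cds_end: int) -> str:
    """Locate a match start position relative to the annotated CDS range.

    Assumes 1-based positions and an inclusive CDS range.
    """
    if position < cds_start:
        return "5' UTR"
    if position <= cds_end:
        return "CDS"
    return "3' UTR"


# Myotrophin hit from this entry: position 3763, CDS 279 - 635.
print(classify_region(3763, 279, 635))
# NFkB forward primer site from the PCR example: position 2328, CDS 468 - 3374.
print(classify_region(2328, 468, 3374))
```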

FYI, the target sequence of miR-375 shown on the right side of the figure can be retrieved by entering about 13 letters, [ UUUGUUCGUUCGG ] (→ GGRNA).


[Yekta et al. (2004) MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596]

Let’s search using the letters on the black background, [ CCAACAACAUGAAACUGCCUA ] (in human, mouse, or rat, as you like) (→ GGRNA). It returns position 1379 of HOXB8 (NM_024016.3) (CDS: 236 – 967), confirming that the sequence does lie in the 3′ UTR.

For your information, the following search operators are available: comp: searches for the complementary sequence, both: searches both strands, and seq1:, seq2:, and seq3: allow 1, 2, or 3 mismatches, respectively.
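To make “allowing mismatches” concrete, here is a naive sliding-window sketch in Python. It only illustrates the idea of tolerating up to N substitutions and treating U and T identically; it says nothing about how GGRNA actually searches. The function names and the toy target sequences are my own inventions.

```python
def normalize(seq: str) -> str:
    """Upper-case and read U as T, so RNA and DNA spellings compare equal."""
    return seq.upper().replace("U", "T")


def find_with_mismatches(query: str, target: str, max_mismatches: int = 0):
    """Yield (1-based start, mismatch count) for every window within the limit."""
    q, t = normalize(query), normalize(target)
    for i in range(len(t) - len(q) + 1):
        mismatches = 0
        for a, b in zip(q, t[i:i + len(q)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions, skip this window
        else:
            # The inner loop finished without exceeding the limit.
            yield i + 1, mismatches


toy_exact = "AAGUUGCAAGAACAAAUU"    # made-up sequence containing GUUGCAAGA
toy_variant = "AAGUUGCUAGAACAAAUU"  # same, with one substitution inside the site

print(list(find_with_mismatches("GUUGCAAGA", toy_exact)))
print(list(find_with_mismatches("GTTGCAAGA", toy_variant, max_mismatches=1)))
```

A scan like this is far too slow for a whole transcriptome, but it is handy for checking a single hit by eye.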

Search for siRNA off-target transcripts

In mammalian RNAi, a 21-nt double-stranded short interfering RNA (siRNA) is used. However, if an siRNA sequence resembles a sequence in an unrelated gene, it may unexpectedly suppress that gene. This is referred to as an off-target effect. siDirect, a website I launched for designing functional siRNAs with reduced off-target effects, takes each designed siRNA sequence (the 19-mer at positions 2 to 20 of the guide strand, counted from the 5′ end), runs a homology search with it, and returns a list of genes homologous to the sequence (with up to 3 mismatches).

I use the 19-mer rather than the full length (positions 1 to 21) because, when RNAi takes place, the nucleotide at the 5′ end of the guide strand sits in a pocket of the Mid domain of the Argonaute protein and the nucleotide at the 3′ end binds to the PAZ domain, so neither is relevant to target mRNA recognition. The number of mismatches needed to avoid off-target effects has not yet been determined: 1 mismatch is definitely not enough, and the risk of off-target effects decreases as the number of mismatches increases. According to bioinformatics analysis, it is possible to design siRNAs that have at least 3 mismatches to every unrelated gene other than the target (for approximately 10% of the entire gene), but it is hardly possible to design ones with at least 4 mismatches.

Here we use the siRNA sequence below to search for homologous genes with GGRNA. This siRNA is designed to target a gene called claudin 17.

What we want to do here is to search for sequences that hybridize with 5′-AGAACUUGCAUUGCAACCG-3′, which is the 19-mer of the siRNA guide strand 5′-UAGAACUUGCAUUGCAACCGG-3′ (both ends removed), so let’s start by entering [ comp:AGAACUUGCAUUGCAACCG ] into the search box (→ GGRNA).

Claudin 17 (CLDN17; NM_012131.2), the target gene of this siRNA, is the only result retrieved. Now let’s try again, this time allowing mismatches.

  • allows up to 1 mismatch → [ comp1:AGAACUUGCAUUGCAACCG ] (→ GGRNA)
  • allows up to 2 mismatches → [ comp2:AGAACUUGCAUUGCAACCG ] (→ GGRNA)
  • allows up to 3 mismatches → [ comp3:AGAACUUGCAUUGCAACCG ] (→ GGRNA)

When we tolerate 3 mismatches, 3 other hits are finally retrieved. siDirect returns the results shown below; the difference between GGRNA and siDirect comes from the versions of the sequence database used (GGRNA uses a newer version). The mismatched positions are better visualized in siDirect than in GGRNA, and I’m planning to update GGRNA in the near future so that it displays mismatched nucleotides in a different color, as siDirect does.

Of course, the passenger strand can cause off-target effects just as the guide strand can, so the passenger strand should be searched as well. From the passenger strand 5′-GGUUGCAAUGCAAGUUCUAUA-3′, remove the nucleotides at both ends and search for sequences hybridizing with 5′-GUUGCAAUGCAAGUUCUAU-3′:

  • perfect match → [ comp:GUUGCAAUGCAAGUUCUAU ] (→ GGRNA): no hit
  • up to 1 mismatch → [ comp1:GUUGCAAUGCAAGUUCUAU ] (→ GGRNA): no hit
  • up to 2 mismatches → [ comp2:GUUGCAAUGCAAGUUCUAU ] (→ GGRNA): no hit
  • up to 3 mismatches → [ comp3:GUUGCAAUGCAAGUUCUAU ] (→ GGRNA): 5 hits

Here 3 mismatches had to be tolerated to return off-target candidates.

It is very rare to get so few hits when you search with a 19-mer and allow 3 mismatches. The siRNA in the above example therefore seems highly specific in terms of nucleotide sequence.
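The query strings used above for the guide and passenger strands all follow one pattern, which can be sketched in Python like this. The function names core_19mer and offtarget_queries are my own; the operators comp:, comp1:, comp2:, and comp3: are the ones used in this entry.

```python
def core_19mer(strand_21: str) -> str:
    """Drop the 5' and 3' terminal nucleotides (keep positions 2 to 20)."""
    if len(strand_21) != 21:
        raise ValueError("expected a 21-nt siRNA strand")
    return strand_21[1:20]


def offtarget_queries(strand_21: str, max_mismatches: int = 3) -> list:
    """Build the comp:, comp1:, ... query strings for one siRNA strand."""
    core = core_19mer(strand_21)
    prefixes = ["comp"] + [f"comp{n}" for n in range(1, max_mismatches + 1)]
    return [f"{prefix}:{core}" for prefix in prefixes]


guide = "UAGAACUUGCAUUGCAACCGG"
passenger = "GGUUGCAAUGCAAGUUCUAUA"

for query in offtarget_queries(guide) + offtarget_queries(passenger):
    print(query)
```

Each printed line can be pasted into the GGRNA search box as is.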

By the way, a ‘seed’ sequence (the 7-mer at positions 2 to 8 of the guide strand) with a higher Tm also carries a risk of off-target effects on genes whose 3′ UTRs contain sequences complementary to the seed. To design siRNAs with fewer off-target effects, you have to start by choosing sequences with a lower seed Tm. Please refer to the paper below or the TogoTV guide for siDirect for further information.

  • Naito et al. (2009) siDirect 2.0: updated software for designing functional siRNA with reduced seed-dependent off-target effect. BMC Bioinformatics 10, 392 → full text
  • TogoTV: Designing siRNA with siDirect: 2011 (in Japanese only)
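As a small illustration of the seed region, the following Python sketch extracts positions 2 to 8 of the guide strand used in this entry and derives the sequence on an mRNA that would base-pair with it. The function names are my own, and the Tm calculation itself is not attempted here; see the siDirect 2.0 paper for that.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_region(guide: str) -> str:
    """Positions 2 to 8 (1-based) of the guide strand, written as RNA."""
    if len(guide) < 8:
        raise ValueError("guide strand is too short to contain a seed")
    return guide.upper().replace("T", "U")[1:8]


def seed_match_site(guide: str) -> str:
    """Sequence (5' to 3', as RNA) that would base-pair with the seed."""
    return seed_region(guide).translate(_RNA_COMPLEMENT)[::-1]


guide = "UAGAACUUGCAUUGCAACCGG"
print(seed_region(guide))      # the 7-mer seed
print(seed_match_site(guide))  # what to look for in a 3' UTR
```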

Search for nucleotide sequences using microarray probe ID

As I showed in the 2 June 2011 entry “Nucleotide search using microarray probe ID” (in Japanese), when you enter microarray probe IDs, GGRNA searches for genes using the nucleotide sequences corresponding to those probe IDs. It also shows the positions where the probes hybridize.

In particular, an Affymetrix microarray uses eleven 25-mer perfect match (PM) probes, collectively called a ‘probeset,’ to recognize a single transcript. As shown below, Affymetrix also provides mismatch (MM) probes at the same positions as the 11 PM probes for background estimation, but they are not used as often as before.

When a probeset ID such as [ 1552311_a_at ] is entered in GGRNA (→ GGRNA), GGRNA converts the probeset ID into nucleotide sequences and searches with them:


The search returns the RAX2 (NM_032753) gene. Click the title of the hit…

…and you will see that the 11 oligonucleotides hybridize with sequences close to the 3′ end. Nucleotides covered by overlapping hits are shown in dark green.

Meanwhile, an Agilent microarray uses one 60-mer probe to recognize a single transcript. For example, searching with [ A_23_P101434 ] returns the following result (→ GGRNA):

Search for binding motifs of RNA-binding protein

Degenerate motifs recognized by RNA-binding proteins can be searched using IUB codes (e.g., N, R, Y). For example, search for mRNAs containing the PUM binding site UGUANAUA by entering [ iub:UGUANAUA ] into the search box (→ GGRNA: this will take about 10 seconds).
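If you want to check a degenerate motif against a single sequence on your own machine, the IUB codes can be expanded into a regular expression. The following Python sketch shows one way to do it; the function names and the toy sequence are my own, and this is not meant to describe how the iub: operator works inside GGRNA.

```python
import re

# IUB nucleotide codes expanded to DNA letters (U is read as T).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iub_to_regex(motif: str) -> str:
    """Turn an IUB motif into a regular expression over A, C, G, T."""
    parts = []
    for code in motif.upper():
        bases = IUB[code]  # raises KeyError for a letter that is not an IUB code
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> list:
    """1-based start positions of all matches, including overlapping ones."""
    seq = sequence.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping occurrences be reported too.
    lookahead = re.compile(f"(?={iub_to_regex(motif)})")
    return [m.start() + 1 for m in lookahead.finditer(seq)]


print(iub_to_regex("UGUANAUA"))
print(find_motif("UGUANAUA", "CCUGUAAAUAGG"))  # toy sequence, made up for this example
```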

The search returns 9,720 hits. You can narrow them down by adding other keywords, or use the tab-delimited format provided at the bottom of the page for further analysis with other software.

