RSmatch: aligning RNA secondary structures and finding RNA motifs


Introduction

Many ribonucleic acids (RNAs), including non-coding RNAs and cis elements in mRNAs, play important roles in gene regulation. Some of their functions are attributable to the structures they adopt; such functional structure elements are also called RNA motifs. Like sequence elements, RNA structure elements can be identified by comparing RNAs that contain similar structures. The RSmatch package provides a lightweight approach to comparing RNA structures and thereby uncovering functional structure elements. Compared with other tools for RNA structure comparison, RSmatch is fast: its running time is quadratic, proportional to the product of the sizes of the two structures being compared.

RSmatch uses two scoring schemes: a position-independent scheme and a position-dependent scheme. The position-independent scheme uses two scoring matrices, one for single-stranded regions and the other for double-stranded regions; it is used in pair-wise comparisons and database searches. The position-dependent scheme, also known as a profile, scores individual structure positions and is used by the multiple structure alignment and iterative database search functions. RSmatch provides both global and local alignment, although local alignment is more useful in most cases. In addition, RSmatch can take pattern-based structures as input. For details, see the following publication:

Liu, J., Wang, J.T., Hu, J., and Tian, B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005, 6:89.
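The sketch below is a deliberately simplified illustration of the position-independent scheme and of global versus local alignment; it is not RSmatch's algorithm, which is described in the publication above. It aligns two structures character by character in dot-bracket form, taking a score from a hypothetical single-stranded matrix when both positions are unpaired, from a hypothetical double-stranded score when both are paired, and a fixed penalty when their structural states differ. All score values, the linear gap penalty and the function names are assumptions. The nested loops fill an (m+1) x (n+1) table for structures of lengths m and n, which is where quadratic running time comes from.

```python
# Simplified illustration only: aligns dot-bracket strings position by
# position and is NOT the RSmatch algorithm. All score values are hypothetical.
BASES = "ACGU"
# Hypothetical single-stranded matrix: reward identical unpaired bases.
SS_MATRIX = {(x, y): 2 if x == y else -1 for x in BASES for y in BASES}
STATE_MISMATCH = -2  # hypothetical: one position paired, the other unpaired
GAP = -2             # hypothetical linear gap penalty (cf. the -g option)


def ds_score(pair_a, pair_b):
    """Hypothetical double-stranded score for two base pairs, e.g. 'GC' vs 'GC'."""
    return 3 if pair_a == pair_b else 1  # any paired-vs-paired column is rewarded


def partners(structure):
    """Map each position to its pairing partner (-1 if unpaired); assumes balance."""
    result, stack = [-1] * len(structure), []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            result[i], result[j] = j, i
    return result


def align(seq_a, ss_a, seq_b, ss_b, local=True):
    """Return the best alignment score by filling an (m+1) x (n+1) table."""
    seq_a = seq_a.upper().replace("T", "U")
    seq_b = seq_b.upper().replace("T", "U")
    pa, pb = partners(ss_a), partners(ss_b)
    m, n = len(seq_a), len(seq_b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    if not local:  # global alignment charges leading gaps
        for i in range(1, m + 1):
            H[i][0] = i * GAP
        for j in range(1, n + 1):
            H[0][j] = j * GAP
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            x, y = i - 1, j - 1
            if pa[x] == -1 and pb[y] == -1:    # both single-stranded
                s = SS_MATRIX[(seq_a[x], seq_b[y])]
            elif pa[x] != -1 and pb[y] != -1:  # both double-stranded
                s = ds_score(seq_a[x] + seq_a[pa[x]], seq_b[y] + seq_b[pb[y]])
            else:                              # structural states differ
                s = STATE_MISMATCH
            v = max(H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if local:
                v = max(v, 0)  # a local alignment may restart anywhere
                best = max(best, v)
            H[i][j] = v
    return best if local else H[m][n]
```

Under these made-up scores, align("GGGAAACCC", "(((...)))", "GGGAAACCC", "(((...)))") returns 24 for both the local and the global variant. The sketch assumes balanced structures and sequences made of A, C, G and U (or T).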

In the current version (1.0), RSmatch provides four functions: (1) regular database search, (2) multiple structure alignment, (3) iterative database search, and (4) pair-wise structure alignment.

  1. For a regular database search, the package finds RNA structures in a database that locally or globally match a given query structure. This function can also be used to detect motif occurrences in an RNA structure database when the query is a known motif with a defined pattern.
  2. For multiple structure alignment, RSmatch progressively constructs a multiple alignment of a given set of RNA structures, adding one structure at a time. This function is useful when a small set of RNAs is functionally related through a shared motif.
  3. For iterative database search, RSmatch repeatedly searches the database using a position-specific scoring matrix and updates the matrix with the latest results. This function can be much more sensitive than the regular database search, at the cost of longer computing time.
  4. For pair-wise structure alignment, RSmatch can take sequences as input; these are folded by the Vienna RNA package and then compared by RSmatch. RSmatch can also fold different regions of the input RNA with a sliding window to enhance sensitivity. In addition, both minimum free energy (MFE) structures and sub-optimal structures can be used for alignment.
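The position-dependent scheme (profile) used by multiple alignment and iterative search scores each alignment position separately. RSmatch's own profile construction is described in the publication cited above; the sketch below only shows the generic idea of a position-specific scoring matrix. It counts, column by column, symbols that combine a base with its structural state, adds pseudocounts, and converts the frequencies to log2-odds scores against a uniform background. The combined alphabet, the pseudocount, the background and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical alphabet: a base combined with its structural state ( ) or .
SYMBOLS = [b + s for b in "ACGU" for s in "()."]


def build_profile(aligned_seqs, aligned_structs, pseudocount=1.0):
    """Return one {symbol: log2-odds score} dict per alignment column.

    Gap characters ('-') in the sequence rows are skipped when counting.
    """
    background = 1.0 / len(SYMBOLS)  # uniform background (assumption)
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(
            seq[col].upper().replace("T", "U") + ss[col]
            for seq, ss in zip(aligned_seqs, aligned_structs)
            if seq[col] != "-"
        )
        total = sum(counts.values()) + pseudocount * len(SYMBOLS)
        profile.append({
            sym: math.log2((counts[sym] + pseudocount) / total / background)
            for sym in SYMBOLS
        })
    return profile


def score_against_profile(profile, seq, struct):
    """Score an ungapped sequence/structure pair of the same length as the profile."""
    return sum(
        column[base.upper().replace("T", "U") + state]
        for column, base, state in zip(profile, seq, struct)
    )
```

In an iterative search as described in item 3, a profile like this would be rebuilt from each round's hits and then used to score the database in the next round.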

As we continue to polish this software, your feedback is highly appreciated. Please contact Bin Tian with comments and suggestions.


Download & Installation

The RSmatch package is implemented in Java and Perl and runs under UNIX/Linux operating systems. It therefore requires a Java environment. To run the program smoothly, make sure your Java version is 1.4 or newer; if it is older, download a newer version of Java from java.sun.com. If the input data are RNA sequences (which must be in FASTA format), you also need to download and install the Vienna RNA package (see [B] below for instructions).

[A]   Install RSmatch

  1. Download the current version of RSmatch (RSmatch1.0), e.g. by right-clicking the download link and choosing "Save Target As".
  2. Create a directory for RSmatch, e.g. /home/RSmatch, place the downloaded tar file there, and extract it by typing tar xvf RSmatch.tar
  3. A directory named "release" under /home/RSmatch will appear. Switch to it by typing cd release
  4. Type RSmatch to run the program.


[B]   Install Vienna RNA v1.4 & RSmatch

  1. Download Vienna RNA package v1.4 and put it under the /home/RNA directory.
  2. Unpack the Vienna RNA package by typing gunzip < ViennaRNA-1.4.tar.gz | tar xvf -
  3. A directory named "ViennaRNA-1.4" under /home/RNA will appear. Switch to it by typing cd ViennaRNA-1.4
  4. Install the Vienna software by typing make all ; make install
  5. Set up the environment variable "VIENNA_HOME". If your command shell is bash, add export VIENNA_HOME=/home/RNA/ViennaRNA-1.4 to your .bashrc file (no spaces around "="). If you use csh, add setenv VIENNA_HOME /home/RNA/ViennaRNA-1.4 to your .cshrc file. Log out and log in again for the change to take effect.
  6. Install and run RSmatch by following the instructions in [A] above. RSmatch will automatically invoke Vienna RNA v1.4 to fold the input sequences into structures and then align the structures.
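Before running RSmatch on sequence input, you may want to confirm that VIENNA_HOME is visible to programs started from your shell. The Python sketch below is a hypothetical helper, not part of RSmatch: it only checks that the variable is set and names an existing directory, and makes no assumption about which files RSmatch expects inside that directory.

```python
import os


def check_vienna_home(env=None):
    """Return the VIENNA_HOME directory, or raise RuntimeError if unusable."""
    env = os.environ if env is None else env
    path = env.get("VIENNA_HOME")
    if not path:
        raise RuntimeError("VIENNA_HOME is not set; see the installation instructions")
    if not os.path.isdir(path):
        raise RuntimeError(f"VIENNA_HOME points to a missing directory: {path}")
    return path
```

After logging in again with the setup above, check_vienna_home() should return /home/RNA/ViennaRNA-1.4.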

Usage instructions

[A]   Input:

RSmatch accepts two types of input data. The first is an RNA secondary structure in nested parenthesized notation, in which matching parentheses mark base pairs and dots mark unpaired bases. Each structure takes three lines: a header line, a primary sequence line and a structure notation line. A sample structure looks like this:
>NM_003234:3394-3493    Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTGTTTTTTTAATAGAATATAATTATCGGAAGC
((((((.((((....)).))...((((.........(((((((.(((((......))))))))))))(((((((......)))))))...))))))))))
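Because a malformed record can spoil a long database search, it may be worth checking structure files beforehand: the structure line should be as long as the sequence line, and every "(" should be closed by a matching ")". The sketch below is a hypothetical pre-check, not part of RSmatch; it assumes that each record is exactly three non-empty lines and that the structure line contains only "(", ")" and ".".

```python
def parse_structures(text):
    """Yield (header, sequence, structure, pairs) from three-line records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3:
        raise ValueError("expected records of exactly three lines")
    for k in range(0, len(lines), 3):
        header, seq, ss = lines[k:k + 3]
        if not header.startswith(">"):
            raise ValueError(f"record {k // 3 + 1}: header must start with '>'")
        if len(seq) != len(ss):
            raise ValueError(f"{header}: sequence and structure lengths differ")
        stack, pairs = [], []
        for i, ch in enumerate(ss):
            if ch == "(":
                stack.append(i)
            elif ch == ")":
                if not stack:
                    raise ValueError(f"{header}: unmatched ')' at position {i + 1}")
                pairs.append((stack.pop() + 1, i + 1))  # 1-based (i, j)
            elif ch != ".":
                raise ValueError(f"{header}: unexpected character {ch!r}")
        if stack:
            raise ValueError(f"{header}: unmatched '(' at position {stack[-1] + 1}")
        yield header[1:], seq, ss, sorted(pairs)
```

Applied to the sample record above, it yields the header without the leading ">", the sequence, the structure, and the base pairs as sorted 1-based (i, j) positions.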

The second type is RNA sequences in FASTA format. For sequence data, RSmatch1.0 automatically invokes Vienna RNA v1.4 to fold the sequences into structures and then aligns the structures. A sample sequence in FASTA format looks like this:

>NM_003234:3394-3493    Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTG
TTTTTTTAATAGAATATAATTATCGGAAGC
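Because FASTA entries may wrap over several lines, as in the sample, a script that prepares input for RSmatch needs to join the lines of each entry. The hypothetical helper below (not part of RSmatch) does this, returning one (header, sequence) pair per entry; joining the two sample lines, for instance, reproduces the single-line sequence of the structure record shown earlier.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples; wrapped lines are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous entry
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:  # close the last entry
        records.append((header, "".join(chunks)))
    return records
```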

[B]   Output:

RSmatch reports detailed alignment information. Multiple structure alignments are displayed in Stockholm format.
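In Stockholm format, an alignment starts with a "# STOCKHOLM 1.0" line and ends with "//"; each sequence line holds a name followed by the aligned residues, and a "#=GC SS_cons" line can carry the consensus secondary structure. Which annotation lines RSmatch actually writes is not specified here, so the minimal reader below (a hypothetical helper, not part of RSmatch) collects the sequence rows and the SS_cons line if present, skips other "#" annotations, and concatenates rows that are split across blocks.

```python
def read_stockholm(text):
    """Return (sequences, ss_cons) from a single Stockholm alignment."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    seqs, ss_cons = {}, ""
    for line in lines[1:]:
        line = line.strip()
        if line == "//":  # end of the alignment
            break
        if not line:
            continue
        if line.startswith("#=GC SS_cons"):
            ss_cons += line.split()[-1]
        elif line.startswith("#"):
            continue  # other annotation (#=GF, #=GS, #=GR, other #=GC)
        else:
            name, residues = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + residues.replace(" ", "")
    return seqs, ss_cons
```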

[C]   Options:

Typing RSmatch displays the general syntax of the command, which is as follows:

RSmatch [options]
General options:
  -p [dsearch | isearch | mrsa | prsa]
     choose a program:
       dsearch        simple database search;
       isearch        iterative database search;
       mrsa           multiple RNA structure alignment;
       prsa           pair-wise RNA structure alignment;
  -D <database>       FASTA-formatted sequence database.
  -d <database>       secondary structure database. 
  -g <penalty>        gap penalty.
  -o <output>         output file; default is 'result.out'.
  -r <range>          range of folding free energy (kcal/mol) used to select alternative RNA structures;
                      default is 0.
  -S <ratio>          sliding step length, expressed as a ratio of <W_length>; default is 0.5.
  -W <W_length>       sliding window size; default is 100 nt.
Options for 'dsearch' and 'isearch':
  -n <topN>           output top 'topN' hits.
  -Q <query>          query sequence in FASTA format.
  -q <query>          query structure.
Options for 'dsearch' and 'prsa':
  -s <score_matrix>   file containing position independent score matrices; default is 'scoreMat.structure'.
Options for 'dsearch':
  -G <global alignment>
     T:     global alignment
     F:     local alignment
     default: F
  -m <query type>     query type:
                        0: real structure without IUB code;
                        1: pattern structure containing IUB code.
                        default: 0
Options for 'isearch':
  -R <repeat>         number of iterations.
Options for 'prsa':
  -F <factor>         the window-size decreasing rate. A series of window sizes is generated for folding sequences;
                      <factor> is the ratio of two consecutive window sizes.
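Two of the folding options reduce to simple arithmetic. With the defaults -W 100 and -S 0.5, consecutive windows start 100 x 0.5 = 50 nt apart, so overlapping 100-nt windows cover the sequence; -r presumably keeps alternative structures whose free energy lies within the given range (kcal/mol) of the minimum free energy, so the default of 0 keeps only the MFE structure. The sketch below illustrates both readings. How RSmatch rounds the step and handles the 3' end are assumptions (here the step is rounded down to at least 1 nt, and one extra window is added so the last nucleotides are covered), and the function names are hypothetical.

```python
def window_starts(seq_len, w_length=100, ratio=0.5):
    """Return 0-based window start positions (assumed behaviour, cf. -W and -S)."""
    if seq_len <= w_length:
        return [0]  # the whole sequence fits in one window
    step = max(1, int(w_length * ratio))
    starts = list(range(0, seq_len - w_length + 1, step))
    if starts[-1] + w_length < seq_len:
        starts.append(seq_len - w_length)  # extra window reaching the 3' end
    return starts


def within_energy_range(structures, energy_range=0.0):
    """Keep (structure, energy) pairs within energy_range kcal/mol of the MFE (cf. -r)."""
    mfe = min(energy for _, energy in structures)
    return [(s, e) for s, e in structures if e <= mfe + energy_range]
```

For example, window_starts(250) gives [0, 50, 100, 150], while window_starts(230) gives [0, 50, 100, 130] because of the extra 3' window.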

[D]   Examples: