Many ribonucleic acids (RNAs) play important roles in gene regulation, including non-coding RNAs and cis elements in mRNAs. Some of their functions are attributable to the structures they adopt, which are also called RNA motifs. Like sequence elements, RNA structure elements can be identified by comparing RNAs that contain similar structures. The RSmatch package is designed to provide a light-weight approach to comparing RNA structures, thereby uncovering functional structure elements. Compared with other tools for RNA structure comparison, RSmatch is fast: its running time is quadratic in the sizes of the two given structures.
RSmatch uses two scoring schemes: a position-independent scheme and a position-dependent scheme. The position-independent scheme entails two scoring matrices, one for single-stranded regions and the other for double-stranded regions. This scoring scheme is used in pair-wise comparisons and database searches. The position-dependent scheme, also known as a profile, scores individual structure positions and is used by the multiple structure alignment and iterative database search functions. RSmatch provides both global and local alignment options, although the latter is more useful in most cases. In addition, RSmatch can take pattern-based structures as input. Please check the following publication for details:
Liu, J., Wang, J.T., Hu, J., and Tian, B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005, 6:89.
In the current version (1.0), RSmatch provides four functions: (1) regular database search, (2) multiple structure alignment, (3) iterative database search, and (4) pair-wise structure alignment.
As we are continuing to polish this software, your feedback will be highly appreciated. Please contact Bin Tian for comments/suggestions.
The RSmatch package is implemented in Java and Perl and runs under a UNIX/Linux operating system. Thus, it needs a Java environment. To run the program smoothly, please make sure your Java version is no older than 1.4. Otherwise, please download a newer version of Java from java.sun.com. If the input data are RNA sequences (which must be in the FASTA format), you also need to download and install the Vienna RNA package (see [B] below for instructions).
[A] Install RSmatch
[B] Install Vienna RNA v1.4 & RSmatch
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTGTTTTTTTAATAGAATATAATTATCGGAAGC
((((((.((((....)).))...((((.........(((((((.(((((......))))))))))))(((((((......)))))))...))))))))))
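The parenthesis-and-dot string in the sample above is in dot-bracket notation: each "(" pairs with a matching ")", and each "." is an unpaired base. As a quick sanity check before handing a structure file to RSmatch, a minimal Python sketch such as the following can confirm that the brackets balance and list the base pairs. The function name parse_dot_bracket is hypothetical and is not part of the RSmatch package.

```python
def parse_dot_bracket(structure):
    """Return the sorted (i, j) base pairs (0-based) of a dot-bracket string.

    Hypothetical helper for checking input files; not part of RSmatch.
    """
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Small hairpin: two base pairs closing a 3-nt loop, then one unpaired base.
    print(parse_dot_bracket("((...))."))  # [(0, 6), (1, 5)]
```

A structure whose length differs from that of its sequence line, or whose brackets do not balance, is worth fixing before it is used as input.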
The second type is the FASTA format for RNA sequences. For sequence data, RSmatch 1.0 will automatically invoke Vienna RNA v1.4 to fold the sequences into structures and then align the structures. A sample sequence in the FASTA format looks like this:
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTG
TTTTTTTAATAGAATATAATTATCGGAAGC
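FASTA files commonly wrap one sequence across several lines, so a record has to be reassembled before its length can be checked. The following is a minimal Python sketch of such a reader, useful for inspecting a sequence file before passing it to RSmatch. The function name read_fasta and the short example record are hypothetical and are not part of the RSmatch package.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA-formatted lines.

    Wrapped sequence lines are joined. Hypothetical helper; not part of RSmatch.
    """
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up record, wrapped over two lines.
    example = [">seq1 example record", "GCUUUC", "UGUCC"]
    for name, seq in read_fasta(example):
        print(name, len(seq), seq)
```

To read from disk, the same function can be given an open file object, for example read_fasta(open("my_sequences.fa")), where the file name is a placeholder.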
[B] Output:
The output of RSmatch gives detailed alignment information. The Stockholm format is adopted to display the output of multiple structure alignment.
You can find the general syntax of the command by typing RSmatch.
The general syntax is as follows:
RSmatch [options]
General options:
-p [dsearch | isearch | mrsa | prsa]
choose a program:
dsearch simple database search;
isearch iterative database search;
mrsa multiple RNA structure alignment;
prsa pair-wise RNA structure alignment;
-D <database> FASTA-formatted sequence database.
-d <database> secondary structure database.
-g <penalty> gap penalty.
-o <output> output file; default is 'result.out'.
-r <range> range of folding free energy (kcal/mol) used to select alternative RNA structures;
default is 0.
-S <ratio> sliding step length, expressed as a ratio of <W_length>; default is 0.5.
-W <W_length> sliding window size; default is 100 nt.
Options for 'dsearch' and 'isearch':
-n <topN> output top 'topN' hits.
-Q <query> query sequence in FASTA format.
-q <query> query structure.
Options for 'dsearch' and 'prsa':
-s <score_matrix> file containing position independent score matrices; default is 'scoreMat.structure'.
Options for 'dsearch':
-G <global alignment>
T: global alignment
F: local alignment
default: F
-m <query type> query type:
0: real structure without IUB code;
1: pattern structure containing IUB code.
default: 0
Options for 'isearch':
-R <repeat> number of iterations.
Options for 'prsa':
-F <factor> the window-size decreasing rate. A series of window sizes are generated for folding sequences.
The <factor> is the ratio of two contiguous window sizes.
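The windowing options above (-W, -S and -F) are easier to reason about with concrete numbers. The Python sketch below shows one plausible reading of them: windows of <W_length> nt start every <ratio> x <W_length> nt, and -F produces a geometric series of shrinking window sizes. It is an illustration under stated assumptions, not the RSmatch implementation. In particular, the handling of the final partial window, the assumption that <factor> lies between 0 and 1, and the "smallest" cutoff are not documented here and are hypothetical.

```python
def window_starts(seq_len, window=100, step_ratio=0.5):
    """0-based start positions of sliding windows over a sequence.

    Defaults mirror the documented -W (100 nt) and -S (0.5) defaults.
    """
    if window <= 0 or step_ratio <= 0:
        raise ValueError("window and step_ratio must be positive")
    step = max(1, int(window * step_ratio))
    if seq_len <= window:
        return [0]  # the whole sequence fits in a single window
    starts = list(range(0, seq_len - window + 1, step))
    # Assumption: add one last window flush with the 3' end so that
    # no trailing bases are left uncovered.
    if starts[-1] + window < seq_len:
        starts.append(seq_len - window)
    return starts


def window_sizes(largest, factor, smallest):
    """Decreasing window sizes where each size is `factor` times the previous.

    `smallest` is a hypothetical cutoff; the manual does not state one.
    """
    if not 0 < factor < 1:
        raise ValueError("this sketch assumes 0 < factor < 1")
    sizes = []
    size = float(largest)
    while size >= smallest:
        sizes.append(int(round(size)))
        size *= factor
    return sizes


if __name__ == "__main__":
    print(window_starts(250))          # [0, 50, 100, 150]
    print(window_starts(230))          # [0, 50, 100, 130]
    print(window_sizes(100, 0.5, 20))  # [100, 50, 25]
```

With the default settings, a 250-nt sequence would be covered by four overlapping 100-nt windows under this reading.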
[D] Examples: