Many ribonucleic acids (RNAs), including non-coding RNAs and cis elements in mRNAs, play important roles in gene regulation. Some of their functions are attributable to the structures they adopt, which are also called RNA motifs. Like sequence elements, RNA structure elements can be identified by comparing RNAs that contain similar structures. The RSmatch package is designed to provide a lightweight approach to comparing RNA structures, thereby uncovering functional structure elements. Compared with other tools for RNA structure comparison, RSmatch is fast, requiring time that is quadratic in the sizes of the two given structures.
RSmatch uses two scoring schemes: a position-independent scheme and a position-dependent scheme. The position-independent scheme entails two scoring matrices, one for single-stranded regions and the other for double-stranded regions. This scoring scheme is used in pair-wise comparisons and database searches. The position-dependent scheme, also known as a profile, scores individual structure positions and is used by the multiple structure alignment and iterative database search functions. RSmatch provides both global and local alignment options, although the latter is more useful in most cases. In addition, RSmatch can take pattern-based structures as input. Please check the following publication for details:
Liu, J., Wang, J.T., Hu, J., and Tian, B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005, 6:89.
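As a rough illustration of the position-independent scheme described above, the following Python sketch keeps one score table for unpaired bases and another for base pairs, and picks the table according to whether both aligned components are single-stranded or double-stranded. All score values, the names SINGLE_MATRIX, PAIRED_MATRIX, MIXED_PENALTY and component_score, and the treatment of a base aligned against a base pair are assumptions made for this example; they are not taken from RSmatch or from its score matrix file.

```python
from itertools import product

BASES = "ACGU"

# Hypothetical single-stranded matrix: +1 for identical bases, -1 otherwise.
SINGLE_MATRIX = {(a, b): (1 if a == b else -1) for a, b in product(BASES, repeat=2)}

# Hypothetical double-stranded matrix, indexed by two base pairs:
# +3 for identical pairs, +1 if one side agrees, -1 otherwise.
PAIRS = list(product(BASES, repeat=2))
PAIRED_MATRIX = {}
for p, q in product(PAIRS, repeat=2):
    if p == q:
        PAIRED_MATRIX[(p, q)] = 3
    elif p[0] == q[0] or p[1] == q[1]:
        PAIRED_MATRIX[(p, q)] = 1
    else:
        PAIRED_MATRIX[(p, q)] = -1

# Assumption: aligning an unpaired base with a base pair is simply penalized.
MIXED_PENALTY = -2


def component_score(x, y):
    """Score two structure components.

    An unpaired base is a one-letter string; a base pair is a 2-tuple of bases.
    """
    x_paired = isinstance(x, tuple)
    y_paired = isinstance(y, tuple)
    if x_paired and y_paired:
        return PAIRED_MATRIX[(x, y)]
    if not x_paired and not y_paired:
        return SINGLE_MATRIX[(x, y)]
    return MIXED_PENALTY
```

Because the score depends only on the two components being compared and not on where they sit in the structure, the same pair of tables can be reused for any query, which is what distinguishes this scheme from a profile.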
In the current version (1.2), RSmatch provides the following functions: (1) regular database search, (2) multiple structure alignment, (3) iterative database search, (4) pair-wise structure alignment, (5) slide folding.
As we are continuing to polish this software, your feedback will be highly appreciated. Please contact Mugdha Khaladkar or Dr. Bin Tian or Dr. Jason T. L. Wang for comments/suggestions/queries.
Version 1.2 of RSmatch is available for download. The RSmatch package is implemented in Java and Perl and runs under a UNIX/Linux operating system. It needs a Java environment to run. Please make sure your Java version is no older than 1.4; otherwise, please download a newer version of Java from java.sun.com. If the input data are RNA sequences (which must be in the FASTA format), you also need to download and install the Vienna RNA package.
If the input data are RNA structures, follow these instructions to install and run RSmatch1.2.
[A] Install RSmatch
If the input data are RNA sequences in the FASTA format, follow these instructions to install RSmatch1.2 and the Vienna RNA package v1.4.
[B] Install Vienna RNA v1.4 & RSmatch1.2
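RSmatch1.2 accepts two types of input data. The first type is RNA structures, each given as a header, the sequence, and the structure in dot-bracket notation. A sample structure is like this:
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTGTTTTTTTAATAGAATATAATTATCGGAAGC
((((((.((((....)).))...((((.........(((((((.(((((......))))))))))))(((((((......)))))))...))))))))))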
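To make the structure record above concrete, here is a minimal Python sketch that splits such a record into header, sequence and structure, and recovers the base pairs from the dot-bracket string with a stack. It assumes the record occupies exactly three lines (header, sequence, structure); the function names parse_structure_record and base_pairs are invented for this example and are not part of RSmatch.

```python
def parse_structure_record(lines):
    """Split a three-line record into (header, sequence, structure)."""
    header, sequence, structure = (line.strip() for line in lines)
    if not header.startswith(">"):
        raise ValueError("header line must start with '>'")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return header[1:], sequence, structure


def base_pairs(structure):
    """Return sorted 0-based (i, j) index pairs for matching '(' and ')'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


# Toy hairpin used only for illustration (not taken from the RSmatch data).
toy = [">toy hairpin", "GGGAAAUCCC", "(((....)))"]
name, seq, struct = parse_structure_record(toy)
pairs = base_pairs(struct)
```

Running a check like this before a search catches records whose brackets are unbalanced or whose structure line does not match the sequence length.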
The second type is the FASTA format for RNA sequences. For the sequence data, RSmatch1.2 will automatically invoke Vienna RNA v1.4 to fold the sequences into structures and then align the structures. A sample sequence in the FASTA format is like this:
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTG
TTTTTTTAATAGAATATAATTATCGGAAGC
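Since a FASTA sequence may be wrapped over several lines, as in the sample above, a reader has to join the lines that follow each header. The Python sketch below shows one way to do this; the function name read_fasta is an assumption for this example, and the code is independent of how RSmatch1.2 itself reads its input.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA-formatted lines."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                     # skip blank lines
        if line.startswith(">"):
            if header is not None:       # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)          # wrapped sequence line
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input for illustration only.
toy_records = read_fasta([">seq1 example", "ACGU", "GGCC", "", ">seq2", "UUUU"])
```

The same function works on an open file handle, because iterating over a text file yields its lines.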
The output of RSmatch1.2 gives detailed alignment information. The Stockholm format is adopted to display the output of multiple structure alignment.
You can find the general syntax of the command by typing RSmatch1.2.
The general syntax is as follows:
General options:
  -p [dsearch | isearch | mrsa | prsa]
                     choose a program:
                       dsearch  simple database search;
                       isearch  iterative database search;
                       mrsa     multiple RNA structure alignment;
                       prsa     pair-wise RNA structure alignment;
                       slide    slide folding RNA sequences;
  -D <database>      FASTA-formatted sequence database.
  -d <database>      secondary structure database.
  -g <penalty>       gap penalty.
  -o <output>        output file; default is 'result.out'.
  -r <range>         range of folding free energy (kcal/mol) used to select
                     alternative RNA structures; default is 0.
  -S <ratio>         sliding step length, expressed as a ratio of <W_length>;
                     default is 0.5.
  -W <W_length>      sliding window size; default is 100 nt.
  -z F               turn off slide folding.

Options for 'dsearch' and 'isearch':
  -n <topN>          output top 'topN' hits.
  -Q <query>         query sequence in FASTA format.
  -q <query>         query structure.

Options for 'dsearch' and 'prsa':
  -s <score_matrix>  file containing position independent score matrices;
                     default is 'scoreMat.structure'.

Options for 'dsearch':
  -G <global alignment>
                     T: global alignment
                     F: local alignment
                     default: F
  -m <query type>    query type:
                       0: real structure without IUB code;
                       1: pattern structure containing IUB code.
                     default: 0

Options for 'isearch':
  -R <repeat>        number of iterations.

Options for 'prsa':
  -F <factor>        the window-size decreasing rate. A series of window sizes
                     are generated for folding sequences. The <factor> is the
                     ratio of two contiguous window sizes.
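The options above can be illustrated with a short Python sketch. The first function shows how the -W and -S defaults combine: a window of 100 nt with a ratio of 0.5 gives a step of 50 nt. The second function assembles an argument list for a simple database search using only flags listed above. The function names, the file names query.txt and db.txt, the handling of the final window, and the assumption that the program is invoked as RSmatch1.2 from the current PATH are all illustrative choices, not documented behaviour.

```python
import shlex


def slide_windows(seq_len, w_length=100, ratio=0.5):
    """Return (start, end) windows, 0-based and end-exclusive."""
    step = max(1, int(w_length * ratio))     # e.g. 100 * 0.5 -> 50 nt
    if seq_len <= w_length:
        return [(0, seq_len)]                # sequence fits in one window
    starts = list(range(0, seq_len - w_length + 1, step))
    # Assumption: add one last window so the 3' end is always covered.
    if starts[-1] + w_length < seq_len:
        starts.append(seq_len - w_length)
    return [(s, s + w_length) for s in starts]


def dsearch_command(query, database, top_n=10, output="result.out", local=True):
    """Build an argument list for a structure-versus-structure database search."""
    return [
        "RSmatch1.2",
        "-p", "dsearch",
        "-q", query,                 # query structure
        "-d", database,              # secondary structure database
        "-n", str(top_n),            # number of top hits to report
        "-G", "F" if local else "T", # F: local alignment, T: global alignment
        "-o", output,
    ]


windows = slide_windows(250)
command = dsearch_command("query.txt", "db.txt")
command_line = shlex.join(command)   # printable form of the command
```

The argument list can be passed to subprocess.run once RSmatch1.2 is installed; building it as a list rather than a single string avoids quoting problems with file names that contain spaces.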