RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. Its output is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked (by default, replaced by Ns). Currently, on average almost 50% of a human genomic DNA sequence is masked by the program.
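To check the masked share on your own data, you can count the masked bases directly in the .masked output. The function below is a minimal sketch. It assumes the default N masking (or X masking with -x), and the name masked_fraction is hypothetical. Any runs of N already in the query, such as assembly gaps, are also counted, so for gapped assemblies the result overestimates repeat content. RepeatMasker's own .tbl file reports repeat densities (compare the -excln option below).

```shell
# Hypothetical helper: percentage of masked (N/X) bases in a .masked FASTA file.
# Usage: masked_fraction query.fa.masked
masked_fraction() {
  awk '
    /^>/ { next }                    # skip FASTA header lines
    {
      gsub(/[ \t\r]/, "")            # ignore stray whitespace and CR characters
      total += length($0)            # count all bases on this line
      masked += gsub(/[NnXx]/, "")   # gsub returns the number of masked bases
    }
    END {
      if (total == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * masked / total
    }' "$1"
}
```

For example, masked_fraction genome.fa.masked prints a percentage with two decimals.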
* Current RepeatMasker library is RepBase13.06.embl.tar.gz
* Version 3.19 available at /usr/local/RepeatMasker.3.19
* Version 3.16 available at /usr/local/RepeatMasker.3.16
* Version 3.15 available at /usr/local/RepeatMasker.3.15
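To run a specific installed version instead of whichever RepeatMasker comes first on PATH, a small helper can map a version number to its install directory. This is a minimal sketch. It assumes each directory listed above contains the RepeatMasker script at its top level, and the function name repeatmasker_for_version is hypothetical.

```shell
# Hypothetical helper: print the path of the RepeatMasker script for an installed version.
# Assumption: the script lives at /usr/local/RepeatMasker.<version>/RepeatMasker.
repeatmasker_for_version() {
  version="${1:-3.19}"   # default to the newest version listed on this page
  case "$version" in
    3.19|3.16|3.15) ;;   # versions installed on this system
    *) echo "unsupported RepeatMasker version: $version" >&2; return 1 ;;
  esac
  echo "/usr/local/RepeatMasker.${version}/RepeatMasker"
}
```

For example, "$(repeatmasker_for_version 3.16)" genome.fa runs version 3.16 on genome.fa.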
NAME
RepeatMasker - Mask repetitive DNA
SYNOPSIS
RepeatMasker [-options] <seqfile(s) in FASTA format>
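The sketch below shows a minimal wrapper that masks several FASTA files in one call. It uses options documented on this page: "-pa" for parallel processors, "-q" for a quick search, and "-dir" for the output directory. The function name mask_fasta_files and the DRY_RUN variable are hypothetical conveniences and are not part of RepeatMasker.

```shell
# Hypothetical helper: mask one or more FASTA files with RepeatMasker.
# Usage: mask_fasta_files OUTDIR NPROC FILE [FILE ...]
# Set DRY_RUN=1 to print the command instead of running it.
mask_fasta_files() {
  if [ "$#" -lt 3 ]; then
    echo "usage: mask_fasta_files OUTDIR NPROC FILE [FILE ...]" >&2
    return 2
  fi
  outdir="$1"
  nproc="$2"
  shift 2
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Show the command that would run (arguments are not shell-quoted here).
    echo "RepeatMasker -pa $nproc -q -dir $outdir $*"
    return 0
  fi
  mkdir -p "$outdir" || return 1
  # -pa: processors to use; -q: quick search; -dir: where output files are written
  RepeatMasker -pa "$nproc" -q -dir "$outdir" "$@"
}
```

For example, DRY_RUN=1 mask_fasta_files results 4 a.fa b.fa prints the command. Drop DRY_RUN to actually run it.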
DESCRIPTION
The options are:
-h(elp)
Detailed help
Default settings are for masking all types of repeats in a primate
sequence.
-w(ublast) **deprecated**
Use WU-BLAST rather than cross_match as the search engine. Use the new
-engine wublast option instead.
-de(cypher) **deprecated**
Use DeCypher rather than cross_match as the search engine. Use the new
-engine decypher option instead.
-e(ngine) [crossmatch|wublast|decypher]
Use an alternative to the default search engine (cross_match).
-pa(rallel) [number]
The number of processors to use in parallel (only works for batch
files or sequences over 50 kb)
-s Slow search; 0-5% more sensitive, 2-3 times slower than default
-q Quick search; 5-10% less sensitive, 2-5 times faster than default
-qq Rush job; about 10% less sensitive, 4 to >10 times faster than default
(quick searches are fine under most circumstances)
Repeat options
-nolow /-low
Does not mask low_complexity DNA or simple repeats
-noint /-int
Only masks low-complexity/simple repeats (no interspersed repeats)
-norna
Does not mask small RNA (pseudo) genes
-alu
Only masks Alus (and 7SLRNA, SVA and LTR5)(only for primate DNA)
-div [number]
Masks only those repeats less than [number] percent diverged from the consensus sequence
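"-div" changes what gets masked during the run. To narrow an existing annotation table after a run instead, you can filter the .out file on its divergence column. This is only a sketch. It assumes the usual .out layout, with three header lines followed by one hit per line and percent divergence in the second column. It does not remask the sequence. The function name filter_out_by_divergence is hypothetical.

```shell
# Hypothetical helper: keep .out hits whose percent divergence is below a threshold.
# Usage: filter_out_by_divergence MAXDIV file.out [file.out ...]
# Assumption: 3 header lines per file, percent divergence in column 2.
filter_out_by_divergence() {
  maxdiv="$1"
  shift
  awk -v max="$maxdiv" '
    FNR <= 3 { print; next }          # keep the (assumed) header lines of each file
    NF >= 2 && $2 + 0 < max + 0       # print hits below the divergence threshold
  ' "$@"
}
```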
-lib [filename]
Allows use of a custom library (e.g. from another species)
-cutoff [number]
Sets cutoff score for masking repeats when using -lib (default 225)
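Before masking with a custom library, it can help to check that the library file looks like FASTA. The sketch below does that check and then builds the "-lib"/"-cutoff" command. If no cutoff is given, it uses the 225 default stated above. The function name run_with_library and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical helper: run RepeatMasker against a custom repeat library.
# Usage: run_with_library LIBRARY QUERY [CUTOFF]
run_with_library() {
  lib="$1"
  query="$2"
  cutoff="${3:-225}"    # 225 is the documented default cutoff when using -lib
  if [ ! -s "$lib" ]; then
    echo "library $lib is missing or empty" >&2
    return 1
  fi
  # Custom libraries are FASTA files; count the sequences as a sanity check.
  nseq=$(grep -c '^>' "$lib" || true)
  if [ "$nseq" -eq 0 ]; then
    echo "library $lib contains no FASTA records" >&2
    return 1
  fi
  echo "using $nseq consensus sequences from $lib" >&2
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "RepeatMasker -lib $lib -cutoff $cutoff $query"
    return 0
  fi
  RepeatMasker -lib "$lib" -cutoff "$cutoff" "$query"
}
```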
-species <query species>
Specify the species or clade of the input sequence. The species name
must be a valid NCBI Taxonomy Database species name and be contained
in the RepeatMasker repeat database. Some examples are:
-is_only
Only clips E. coli insertion elements out of FASTA and .qual files
-is_clip
Clips IS elements before analysis (default: IS only reported)
-no_is
Skips bacterial insertion element check
-rodspec
Only checks for rodent-specific repeats (no RepeatMasker run)
-primspec
Only checks for primate-specific repeats (no RepeatMasker run)
Running options
-gc [number]
Use matrices calculated for 'number' percentage background GC level
-gccalc
RepeatMasker calculates the GC content even for batch files and small
sequences
-frag [number]
Maximum sequence length masked without fragmenting (default 40000,
300000 for DeCypher)
-maxsize [nr]
Maximum length for which IS- or repeat clipped sequences can be
produced (default 4000000). Memory requirements go up with higher
maxsize.
-nocut
Skips the steps in which repeats are excised
-noisy
Prints the search engine progress report to the screen (by default it is
written to a .stderr file)
-nopost
Do not postprocess the results of the run (i.e. do not call
ProcessRepeats). NOTE: This option should only be used when ProcessRepeats
will be run manually on the results.
Output options
-dir [directory name]
Writes output to this directory (default is query file directory,
"-dir ." will write to current directory).
-a(lignments)
Writes alignments in .align output file; (not working with -wublast)
-inv
Alignments are presented in the orientation of the repeat (with
option -a)
-lcambig
Outputs ambiguous DNA transposon fragments using a lower case name.
All other repeats are listed in upper case. Ambiguous fragments
match multiple repeat elements and can only be called based on
flanking repeat information.
-small
Returns complete .masked sequence in lower case
-xsmall
Returns repetitive regions in lowercase (rest capitals) rather than
masked
-x Returns repetitive regions masked with Xs rather than Ns
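"-xsmall" keeps the original bases, so you can convert a soft-masked file to the N (or X) style later without rerunning the search. The sketch below assumes the "-xsmall" convention described above, with repeats in lower case and everything else in capitals. The function name hardmask is hypothetical.

```shell
# Hypothetical helper: hard-mask a soft-masked FASTA (repeats in lower case).
# Usage: hardmask [MASKCHAR] < query.fa.masked > query.hard.fa   (default N)
hardmask() {
  maskchar="${1:-N}"
  # LC_ALL=C makes [a-z] match only lower-case ASCII letters in every awk.
  LC_ALL=C awk -v m="$maskchar" '
    /^>/ { print; next }          # leave FASTA header lines untouched
    { gsub(/[a-z]/, m); print }   # replace each lower-case base with the mask character
  '
}
```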
-poly
Reports simple repeats that may be polymorphic (in file.poly)
-source
Includes for each annotation the HSP "evidence". Currently this
option is only available with the "-html" output format listed
below.
-html
Creates an additional output file in XHTML format.
-ace
Creates an additional output file in ACeDB format
-gff
Creates an additional output file in General Feature Format (GFF)
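The GFF output can be summarized with standard tools. As a sketch, this sums the annotated repeat length per sequence. It relies only on the generic GFF columns: 1 = sequence name, 4 = start, 5 = end, with 1-based inclusive coordinates. Overlapping annotations are counted more than once. The function name gff_repeat_bp is hypothetical.

```shell
# Hypothetical helper: total annotated length per sequence from a GFF file.
# Usage: gff_repeat_bp query.fa.out.gff [more.gff ...]
gff_repeat_bp() {
  awk -F '\t' '
    /^#/ || NF < 5 { next }            # skip comment/directive and malformed lines
    { bp[$1] += $5 - $4 + 1 }          # inclusive coordinates: end - start + 1
    END { for (s in bp) print s "\t" bp[s] }
  ' "$@" | sort
}
```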
-u Creates an additional annotation file not processed by
ProcessRepeats
-xm Creates an additional output file in cross_match format (for
parsing)
-fixed
Creates an (old style) annotation file with fixed width columns
-no_id
Leaves out final column with unique ID for each element (was
default)
-e(xcln)
Calculates repeat densities (in .tbl) excluding runs of >=20 N/Xs in
the query
SEE ALSO
Crossmatch, WUBlast, ProcessRepeats
COPYRIGHT
Copyright 2007 Arian Smit, Institute for Systems Biology
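Example: mask a FASTA file using WU-BLAST as the search engine:
RepeatMasker filename -e wublast
Example: submit the same run to an LSF queue (replace queueName with the name of a
queue on this cluster). bsub options such as -q and -o must come before the command;
anything after RepeatMasker is passed to RepeatMasker itself. Here -o names the file
that receives the LSF job report. The name is chosen so that the report does not
overwrite or append to RepeatMasker's own filename.out annotation file:
bsub -q queueName -o filename.lsf.log RepeatMasker filename -e wublast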
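For repeated or longer runs, it can be easier to keep the LSF settings in a job script. The sketch below writes one and submits it with bsub, which reads the #BSUB lines from standard input. It requests as many slots as "-pa" processes and keeps them on one host with span[hosts=1]. The LSF log is named <query>.lsf.log rather than <query>.out, so it does not collide with RepeatMasker's own .out annotation file. The function name submit_repeatmasker and the job file name are hypothetical.

```shell
# Hypothetical helper: write an LSF job script for a RepeatMasker run and submit it.
# Usage: submit_repeatmasker QUEUE QUERY [NPROC]
submit_repeatmasker() {
  queue="$1"
  query="$2"
  nproc="${3:-1}"
  job="${query}.rm.lsf"
  cat > "$job" <<EOF
#!/bin/sh
#BSUB -q ${queue}
#BSUB -n ${nproc}
#BSUB -R "span[hosts=1]"
#BSUB -o ${query}.lsf.log
RepeatMasker -e wublast -pa ${nproc} ${query}
EOF
  if command -v bsub >/dev/null 2>&1; then
    bsub < "$job"    # LSF reads the #BSUB directives from the script on stdin
  else
    echo "bsub not found; job script written to $job" >&2
  fi
}
```

For example, submit_repeatmasker queueName genome.fa 4 writes genome.fa.rm.lsf and submits it.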