UGA RCC: Research Computing Center
 
 

RepeatMasker


Category(ies): Bioinformatics

Version: 3.25 (3.19, 3.16, and 3.15 also available)

Author / Distributor: A.F.A. Smit, R. Hubley & P. Green, Institute for Systems Biology

Description:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program.
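To check how much of a sequence was masked, one option is to count the N characters in the masked output. The following is a minimal sketch, not part of RepeatMasker: the function name masked_percent is hypothetical, it assumes the default N masking (not -x or -xsmall), and it counts any N in the input, including Ns that were already present before masking.

```shell
# Sketch: report the percentage of N bases in a FASTA stream read from stdin.
# masked_percent is a hypothetical helper name, not a RepeatMasker tool.
masked_percent() {
    awk '!/^>/ {                      # skip FASTA header lines
             total += length($0)      # count all bases on the line
             n += gsub(/[Nn]/, "")    # gsub returns the number of Ns removed
         }
         END {
             if (total > 0) printf "%.1f\n", 100 * n / total
             else print "0.0"
         }'
}

# Small inline example: 9 of 20 bases are N
printf '>seq1\nACGTNNNNAC\nNNNNNACGTA\n' | masked_percent
```

On a real run the input would typically be the .masked file written next to the query, for example: masked_percent < filename.masked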

altix: Not available on altix



pcluster: Not available on pcluster



rcluster,IOB:

Running Program: see also the instructions for submitting jobs to queues on rcluster,IOB

Cross_match and Wublast are configured as search engines.

RepeatMasker filename -e wublast

bsub -q queueName -o filename.out RepeatMasker filename -e wublast

* The current RepeatMasker library is RepBase13.06.embl.tar.gz
* Version 3.19 is available at /usr/local/RepeatMasker.3.19
* Version 3.16 is available at /usr/local/RepeatMasker.3.16
* Version 3.15 is available at /usr/local/RepeatMasker.3.15
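The submission command and the versioned install paths can be combined as in the sketch below. It only prints the command line so it can be inspected before use. Several things here are assumptions rather than facts from this page: the helper name rm_bsub_cmd is hypothetical, each versioned directory is assumed to contain an executable named RepeatMasker, and the log file name is a placeholder. The bsub options are placed before the command, since bsub passes anything after the command name to the command itself.

```shell
# Sketch only: prints a bsub command line instead of submitting it.
# Assumption (not confirmed by this page): each versioned directory holds an
# executable named RepeatMasker.
rm_bsub_cmd() {
    queue=$1          # LSF queue name (placeholder: queueName)
    fasta=$2          # input file in FASTA format
    log=$3            # file that receives the LSF job report (placeholder)
    version=${4:-}    # optional: 3.19, 3.16 or 3.15; empty uses the default
    if [ -n "$version" ]; then
        exe="/usr/local/RepeatMasker.$version/RepeatMasker"
    else
        exe="RepeatMasker"
    fi
    printf 'bsub -q %s -o %s %s %s -e wublast\n' "$queue" "$log" "$exe" "$fasta"
}

rm_bsub_cmd queueName filename filename.log 3.19   # older version by path
rm_bsub_cmd queueName filename filename.log        # default version
```

Once the printed line looks right, it can be pasted into the shell or run with eval.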



NAME
RepeatMasker - Mask repetitive DNA

SYNOPSIS
RepeatMasker [-options] <seqfiles(s) in fasta format>

DESCRIPTION
The options are:

-h(elp)
Detailed help

Default settings are for masking all types of repeats in a primate
sequence.

-w(ublast) **deprecated**
Use WU-blast, rather than cross_match, as engine. Note: use the new
-engine wublast option instead.

-de(cypher) **deprecated**
Use DeCypher, rather than cross_match, as engine. Note: use the new
-engine decypher option instead.

-e(ngine) [crossmatch|wublast|decypher]
Use an alternate search engine to the default.

-pa(rallel) [number]
The number of processors to use in parallel (only works for batch
files or sequences over 50 kb)

-s Slow search; 0-5% more sensitive, 2-3 times slower than default

-q Quick search; 5-10% less sensitive, 2-5 times faster than default

-qq Rush job; about 10% less sensitive, 4->10 times faster than default
(quick searches are fine under most circumstances)

Repeat options

-nolow /-low
Does not mask low_complexity DNA or simple repeats

-noint /-int
Only masks low complex/simple repeats (no interspersed repeats)

-norna
Does not mask small RNA (pseudo) genes

-alu
Only masks Alus (and 7SLRNA, SVA and LTR5)(only for primate DNA)

-div [number]
Masks only those repeats < x percent diverged from consensus seq

-lib [filename]
Allows use of a custom library (e.g. from another species)

-cutoff [number]
Sets cutoff score for masking repeats when using -lib (default 225)

-species <query species>
Specify the species or clade of the input sequence. The species name
must be a valid NCBI Taxonomy Database species name and be contained
in the RepeatMasker repeat database. Some examples are:

-species human
-species mouse
-species rattus
-species "ciona savignyi"
-species arabidopsis

Other commonly used species:

mammal, carnivore, rodentia, rat, cow, pig, cat, dog, chicken, fugu,
danio, "ciona intestinalis", drosophila, anopheles, elegans,
diatoaea, artiodactyl, arabidopsis, rice, wheat, and maize
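Species names that contain a space must reach RepeatMasker as a single argument, which is why the examples above quote "ciona savignyi". A small sketch of this, printing the commands rather than running them; the helper name species_cmd is hypothetical and "filename" is the same placeholder used elsewhere on this page:

```shell
# Sketch: print one RepeatMasker command per species, quoting each name so
# that multi-word names stay a single argument.
species_cmd() {
    printf 'RepeatMasker -species "%s" %s\n' "$1" "$2"
}

for sp in human mouse "ciona savignyi"; do
    species_cmd "$sp" filename
done
```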

Contamination options

-is_only
Only clips E coli insertion elements out of fasta and .qual files

-is_clip
Clips IS elements before analysis (default: IS only reported)

-no_is
Skips bacterial insertion element check

-rodspec
Only checks for rodent specific repeats (no repeatmasker run)

-primspec
Only checks for primate specific repeats (no repeatmasker run)

Running options

-gc [number]
Use matrices calculated for 'number' percentage background GC level

-gccalc
RepeatMasker calculates the GC content even for batch files/small
seqs

-frag [number]
Maximum sequence length masked without fragmenting (default 40000,
300000 for DeCypher)

-maxsize [nr]
Maximum length for which IS- or repeat clipped sequences can be
produced (default 4000000). Memory requirements go up with higher
maxsize.

-nocut
Skips the steps in which repeats are excised

-noisy
Prints search engine progress report to screen (defaults to .stderr
file)

-nopost
Do not postprocess the results of the run (i.e. call ProcessRepeats).
NOTE: This option should only be used when ProcessRepeats will
be run manually on the results.

Output options

-dir [directory name]
Writes output to this directory (default is query file directory,
"-dir ." will write to current directory).

-a(lignments)
Writes alignments in .align output file; (not working with -wublast)

-inv
Alignments are presented in the orientation of the repeat (with
option -a)

-lcambig
Outputs ambiguous DNA transposon fragments using a lower case name.
All other repeats are listed in upper case. Ambiguous fragments
match multiple repeat elements and can only be called based on
flanking repeat information.

-small
Returns complete .masked sequence in lower case

-xsmall
Returns repetitive regions in lowercase (rest capitals) rather than
masked

-x Returns repetitive regions masked with Xs rather than Ns

-poly
Reports simple repeats that may be polymorphic (in file.poly)

-source
Includes for each annotation the HSP "evidence". Currently this
option is only available with the "-html" output format listed
below.

-html
Creates an additional output file in xhtml format.

-ace
Creates an additional output file in ACeDB format

-gff
Creates an additional Gene Feature Finding format output

-u Creates an additional annotation file not processed by
ProcessRepeats

-xm Creates an additional output file in cross_match format (for
parsing)

-fixed
Creates an (old style) annotation file with fixed width columns

-no_id
Leaves out final column with unique ID for each element (was
default)

-e(xcln)
Calculates repeat densities (in .tbl) excluding runs of >=20 N/Xs in
the query
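The three masking styles described above (default Ns, -x, and -xsmall) differ only in how a repeat region is rewritten. The sketch below imitates that on a toy sequence so the difference is easy to see. It does not call RepeatMasker: the function mask_region and its arguments are hypothetical, and the repeat coordinates are supplied by hand.

```shell
# Sketch: rewrite positions start..end (1-based, inclusive) of a sequence the
# way each output style would. Modes: N (default style), X (-x), xsmall (-xsmall).
mask_region() {
    printf '%s\n' "$1" | awk -v s="$2" -v e="$3" -v mode="$4" '{
        pre  = substr($0, 1, s - 1)        # bases before the repeat
        rep  = substr($0, s, e - s + 1)    # the repeat itself
        post = substr($0, e + 1)           # bases after the repeat
        if (mode == "xsmall") {
            # repeat in lower case, everything else in capitals
            print toupper(pre) tolower(rep) toupper(post)
        } else {
            gsub(/./, mode, rep)           # replace every base with N or X
            print pre rep post
        }
    }'
}

mask_region ACGTACGTAC 3 6 N        # default masking
mask_region ACGTACGTAC 3 6 X        # as with -x
mask_region ACGTACGTAC 3 6 xsmall   # as with -xsmall
```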

SEE ALSO
Crossmatch, WUBlast, ProcessRepeats

COPYRIGHT
Copyright 2007 Arian Smit, Institute for Systems Biology

AUTHORS
Arian Smit <asmit@systemsbiology.org>

Robert Hubley <rhubley@systemsbiology.org>

Documentation: Online documentation is available at the RepeatMasker website.

Installation: Source downloaded from the RepeatMasker website; library updated from Repbase.

System(s): Unix
