DNPTrapper Manual and Tutorial
Quick Start
To run DNPTrapper, go to the trapper directory and type
bash$ bin/trapper -d /path/to/project/directory -c config/trapperconf.xml
with correct path to project directory.
Overview
Concepts
DNPTrapper is a
sequence alignment editing tool developed primarily for
finishing and analysis of complicated shotgun sequencing projects. The
main goal of DNPTrapper is to provide more power to the user than other
finishing tools
allow. The user can move sequences around; re-align them; cut, copy and
paste them; run algorithms on them; add and remove features; choose
between view modes; zoom in and out; et cetera. The user interface is a
front end to a database, and changes made using DNPTrapper are
automatically reflected in the database.
Directories
Each DNPTrapper project needs its own project directory. This directory
must be supplied on the command line with the option -d (or
--dbhome-dir). The directory given must be empty or contain an
existing project. See Quick Start above for an example of how to start a DNPTrapper session.
All contigs in one project are located in separate directories under
the project directory. If a new contig is created, it will appear as a
new directory in the project directory.
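The directory rules above can be sketched in a small helper. This is only an illustration: the function names are hypothetical, the default paths are the ones from Quick Start, and it assumes that every subdirectory of the project directory is a contig, which may not hold if the database keeps other directories there.

```python
from pathlib import Path


def trapper_command(project_dir, config="config/trapperconf.xml", binary="bin/trapper"):
    """Build the argument list for starting DNPTrapper on project_dir.

    Only the -d and -c options from the Quick Start section are used.
    """
    project = Path(project_dir)
    if not project.is_dir():
        raise NotADirectoryError(f"project directory does not exist: {project}")
    return [binary, "-d", str(project), "-c", config]


def contig_directories(project_dir):
    """List subdirectories of the project directory (assumed: one per contig)."""
    return sorted(p.name for p in Path(project_dir).iterdir() if p.is_dir())
```

From the trapper directory, the command list could then be passed to `subprocess.run(trapper_command("/path/to/project/directory"), check=True)`.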
Supported file formats
Currently, the only format supported is a native XML format. Utility
programs for converting from ACE-files are included in the distribution.
More formats will be supported in the future.
Views and windows
Multiple contigs can be open simultaneously, and multiple views of a
contig can be open at the same time. This can be useful for e.g. viewing
different parts of the same contig at the same time, for instance at
different zoom levels. Changes made in one view will be reflected in the
other(s), since they represent the same document, i.e. contig. Open
windows can be arranged within the workspace in overlapping or tiling
fashion. The user can switch between windows using the Window menu, or by pressing the Tab
key (Shift + Tab switches the windows in reverse order).
The user interacts with the data (= reads) mainly using the mouse.
Clicking on a read selects it (indicated by a red instead of a black
border). If the user presses Ctrl while clicking on a read, this toggles
the "selectedness" of the read. It's also possible to select reads by
pressing the left mouse button where it doesn't hit a read and dragging
the mouse. A dashed "rubber band" appears, selecting all reads
intersecting it upon release of the button. If Ctrl is pressed during
this operation, previously selected reads remain selected. Right
clicking on selected reads opens a context menu with options for
manipulating data (see below).
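The selection rules described above can be modelled in a few lines. This is a sketch, not DNPTrapper's own code: the Read class and function names are hypothetical, coordinates are simplified to rows and columns, and it assumes that a plain click replaces the current selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    row: int
    start: int  # first column occupied (inclusive)
    end: int    # last column occupied (inclusive)


def click(selected, read, ctrl=False):
    """Plain click selects only `read` (assumption); Ctrl-click toggles it."""
    if not ctrl:
        return {read.name}
    return selected ^ {read.name}


def rubber_band(selected, reads, rows, cols, ctrl=False):
    """Select every read intersecting the band; Ctrl keeps the old selection.

    rows and cols are (low, high) pairs, both ends inclusive.
    """
    (row_lo, row_hi), (col_lo, col_hi) = rows, cols
    hit = {
        r.name
        for r in reads
        if row_lo <= r.row <= row_hi and r.start <= col_hi and r.end >= col_lo
    }
    return selected | hit if ctrl else hit
```

Selections are plain sets of read names, so the Ctrl-click toggle is a symmetric difference and the Ctrl rubber band is a union.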
Moving modes
In the normal moving mode
(indicated by an arrow on the toolbar), selected reads can be moved
vertically within the present contig. No horizontal movement, or
copying reads to other contigs by dragging, is thus allowed in this
mode. In drag mode, reads may be moved horizontally within a contig or
dragged to other contigs, whereby they are copied to the new
contig. In the new contig, the reads end up in the top left
corner.
When moving reads within the same contig, the last move is
undoable. This is done by selecting "Undo last move" from the Edit
menu. NOTA BENE: if the user changes the read selection, or runs an
Operation (see below), the last move is no longer undoable. This is to
make sure that the integrity of the data is kept.
Final notes on moving: DNPTrapper will refuse to let reads end up on negative rows or
columns. If a move would place any read there, the whole move is
cancelled. DNPTrapper will, however, let reads end up on the same positions,
i.e. reads may overlap on the same row. The overlapping part is
indicated by a green color.
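The all-or-nothing rule for moves can be illustrated as follows. The function and its data layout (a dict from read name to row and start column, counted from 0) are assumptions made for the example, not the program's internals.

```python
def move_reads(positions, selected, d_row, d_col=0):
    """Return new positions, or the unchanged ones if the move is refused.

    positions maps read name -> (row, start_column). If any selected read
    would land on a negative row or column, the whole move is cancelled.
    Overlaps between reads are not checked, since they are allowed.
    """
    moved = {
        name: (row + d_row, col + d_col)
        for name, (row, col) in positions.items()
        if name in selected
    }
    if any(row < 0 or col < 0 for row, col in moved.values()):
        return positions  # whole move cancelled
    return {**positions, **moved}
```

In normal moving mode d_col stays 0; a non-zero d_col corresponds to horizontal movement in drag mode.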
Zooming
The user can zoom in and out of the views using toolbar buttons, menu
options or keyboard shortcuts. It is possible to zoom in one direction
only if desired. At a low zoom level ("birdseye view"), certain sequence
features are not displayed as they would be unintelligible at this level
anyway.
Context menu
By right clicking in a view, a context menu opens up. Four different
options are available:
-
Operations
The Operations that have been implemented for DNPTrapper so far. Currently
these are:
- Compact Reads - reorganizes the reads to occupy as little vertical space as possible.
- Export Consensus - writes the consensus (according to majority vote) to file.
- Find DNPs - analyzes selected reads for DNP content.
- Find Mates - finds and selects mate pairs of selected reads, if such data has been imported.
- Quality Trim seqs - marks up low quality ends on reads. Low quality parts will be shaded, and will be excluded from some operations (e.g. Find DNPs).
- ReAlign - locally optimizes alignment of selected reads.
- Remove DNPs - removes DNPs in selected reads.
- Remove Quality trimming - resets low quality mark up.
- Search reads for DNP ID - highlights reads containing a certain DNP.
- Select reads from file - highlights reads specified in a file with readnames.
- Select strand - highlights reads of specified strand (normal or reverse).
- Sort according to DNPs - a DNP grouping algorithm. Naive, but pretty useful as a starting point.
- Write ACE file - writes selected reads to a file in the ACE format.
- Write alignment to file - writes alignment in ASCII format to file "alignment.out" in current directory.
- Write readnames to file - writes the names of selected sequences to a file. Very useful for saving different selections and used in combination with "Select reads from file" above.
- Write selected seqs to file - writes the selected sequences, and their corresponding quality values, to file in FASTA format.
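As an illustration of what "consensus according to majority vote" means in the list above, here is a minimal sketch. It is not the implementation used by "Export Consensus": how DNPTrapper treats ties, gaps, quality values and trimmed regions is not described here, and the function name and input layout are assumptions.

```python
from collections import Counter


def majority_consensus(reads, gap="-"):
    """Column-wise majority vote over aligned reads.

    reads is a list of (start_column, sequence) pairs. Columns inside the
    alignment span that no read covers are filled with `gap`. Ties go to
    the base seen first in that column (a choice made for this sketch).
    """
    columns = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += 1
    if not columns:
        return ""
    consensus = []
    for col in range(min(columns), max(columns) + 1):
        votes = columns.get(col)
        consensus.append(votes.most_common(1)[0][0] if votes else gap)
    return "".join(consensus)
```

For example, three reads "ACGT" at column 0, "CGTA" at column 1 and "ACCT" at column 0 disagree only in the third column, where G wins two votes to one.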
-
Info
Info about
reads and features at the clicked position. If a read, feature etc is
not present at the position, no such info is displayed.
- Read info - read name, strand, quality region, local position in read and list of read's DNPs.
- Feature info - base, quality value and DNP ID, if any.
- General info - row and column in alignment.
-
Switch to viewmode
Selection of different
visualization modes for the data. Sequences can be visualized with
quality as a grey scale behind or above the sequence, DNPs can be
turned on or off etc.
If chromatogram and
phd data have been imported, these options are used to align the trace data at
the desired column.
- Set Chromat center - aligns chromatograms at cursor position and highlights the column with a vertical line. NB: This can also be used when no chromatogram data is present, and can be useful for keeping track of the column currently being investigated.
- Clear Chromat center - clears chromatogram alignment position (i.e. sets it to 0).
-
Scroll here
Scrolls the
view so that the selected position is in the top left corner (if there
are enough rows and columns to make this possible).
Menus
Contig
Open
Opens an
existing contig or creates a new one (Ctrl + O)
Close
Closes current
document and all its views (Ctrl + W)
Exit
Exits the
application. Changes are automatically saved (Ctrl + Q)
Edit
Undo last move
Undoes the last
move if, in the meantime, no Operation has been performed and the read selection
hasn't changed (Ctrl + Z)
Cut
Cuts selected
reads. They are also copied to the clipboard and can be pasted back into
the contig or into another contig (Ctrl + X)
Copy
Copies
selected reads to clipboard (Ctrl + C)
Paste
Pastes
clipboard content into current contig (Ctrl + V)
Select All
Selects all
reads (Ctrl + A)
Select Between Rows
Selects all
reads between rows specified by user
View
Different self-explanatory
choices for zooming, along with some actions that need more explanation:
Enlarge
Adds more
viewable rows and columns to contigview. Does not affect data.
Shrink
Removes
viewable rows and columns from contigview. Does not affect data.
Show statistics
Displays
database statistics from Berkeley DB.
Window
New view
Opens up
another view of current contig.
Cascade
Orders open
windows in a cascading fashion.
Tile
Orders open
windows in a tiling fashion.
Tools
Import project
Imports a
dataset from an XML file
Import Chromatograms
Imports chromatograms
in an XML format (see below)
Import Mates
Imports mate pair
data (see below)
Import phd data
Imports phd data
files (see below)
Normal Mode
In this mode,
read moving is only allowed in the vertical direction
Drag Mode
In this mode,
it is possible to move reads horizontally within a contig, and copy
reads to other contigs using dragging
Chromatograms and phd files
Chromatograms are imported in an XML format. A parser that translates
ABI trace files into this format is provided with the
distribution.
Phred phd files (or equivalent files in the same format) must be
provided in order for the alignment of chromatograms to work
properly. Files must be named READNAME.phd.1 in order to be recognised
by DNPTrapper.
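Before importing, it can be handy to check that every read has a phd file with the expected name. The sketch below assumes the phd files sit together in one directory and that the read names are already known as a list; both helper names are hypothetical.

```python
from pathlib import Path


def phd_filename(read_name):
    """File name DNPTrapper expects for a read's phd data."""
    return f"{read_name}.phd.1"


def missing_phd_files(read_names, phd_dir):
    """Return the read names that have no READNAME.phd.1 file in phd_dir."""
    present = {p.name for p in Path(phd_dir).iterdir()}
    return [name for name in read_names if phd_filename(name) not in present]
```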
Mate Pair data
Mate pair data should be in a file with one pair on each line, with the format
READNAME MATENAME MATELENGTH
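A mate pair file in this layout can be checked before import with a few lines of Python. The sketch assumes that the three fields are separated by whitespace and that MATELENGTH is an integer; neither detail is spelled out above, and the names used are hypothetical.

```python
from typing import NamedTuple


class MatePair(NamedTuple):
    read_name: str
    mate_name: str
    mate_length: int


def parse_mate_pairs(lines):
    """Parse 'READNAME MATENAME MATELENGTH' lines, skipping blank lines."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        read_name, mate_name, length = fields
        pairs.append(MatePair(read_name, mate_name, int(length)))
    return pairs
```

An open file object can be passed directly as `lines`, for instance `parse_mate_pairs(open("mates.txt"))`, where the file name is a placeholder.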
Native file format
DNPTrapper uses a simple XML format, see file "format.txt" in the doc directory.
Safety, backups etc
Two issues are
important to note about DNPTrapper. Firstly, there is currently no "Undo"
function implemented except for the last move. Edit operations made
using DNPTrapper are automatically reflected in the database, and changes
have to be undone manually if the user changes his/her mind. Some
changes made by algorithms are virtually impossible to undo, such as
ReAligning a whole contig.
A way to get around this is to copy reads to new contigs and perform
crucial editing in the new contigs. If the editing doesn't have the
desired effect, the contigs can simply be discarded (by removing the
contig directories AFTER the DNPTrapper session). If the user wants to
keep the changes, the reads can be cut out of the old contig if that's
what's desired. This is actually an intended use of DNPTrapper - a "sand
box" approach where the user can copy data, play around with it and
then decide whether to keep it or not.
Secondly, DNPTrapper is under development, and no warranties are issued
by the developers regarding program performance and stability. It's
probably a good idea to make backups of the contig directory now and
then to be on the safe side. NOTE: backup copying of the project
directory must be performed between runs of the program, NOT during runs!
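A backup between runs can be as simple as copying the whole project directory to a time-stamped location. The helper below is a sketch with hypothetical names; it cannot tell whether DNPTrapper is running, so it is up to the user to call it only after the program has been closed.

```python
import shutil
import time
from pathlib import Path


def backup_project(project_dir, backup_root):
    """Copy the project directory to backup_root/<name>-<timestamp>.

    Run this only between DNPTrapper sessions, never during one.
    """
    project = Path(project_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{project.name}-{stamp}"
    shutil.copytree(project, target)  # fails if target already exists
    return target
```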
Known bugs and problems
- A minor visual bug at the lowest zoom level when scrolling.
- It is possible for the user to change directory names in the
Open File dialog. Dangerous!
- ReAligning currently takes quite a while to perform; a faster implementation is under way.
- Dates etc in exported ACE file are bogus.
Please send bug reports, including instructions on how to invoke the
bug, to (remove caps) erik.arnerNOSPAM@gmail.com.
Coming attractions
Below are some features that should appear in DNPTrapper in the near
future.
- Autoscrolling in normal moving mode and with rubberband
- Choosing filename in "Write to file" algo
- Exporting to ACE file format
- Better support for mate pairs
Please send feature requests, and feedback regarding current
functionality of the program, to (remove caps) erik.arnerNOSPAM@gmail.com.
Tutorial
This is just a brief tutorial for getting started with the DNPTrapper
program. Provided with this version are some files for testing DNPTrapper
functionality, and a utility program (aceparser)
for importing .ace files generated by phrap into trapper.
Test files:
- test.seq - FASTA file with
simulated shotgun reads sampling repeated sequence with 2% mean
difference between any two repeat units.
- test.qual - FASTA file
with corresponding quality values.
- test.ace - ACE file
produced by phrap for this dataset.
Tutorial Start
NOTA BENE: In this simple tutorial, the
sequence reads have been trimmed and low quality sequence has been
removed before assembly with phrap. In the normal case, the user
should run the "Quality Trim seqs" operation before DNP analysis. In
this example, this is not needed.
- Go to the trapper installation directory. Run the aceparser program (bin/aceparser)
without any parameters to see a short help text on how it is used. Run
it again with test.ace and test.qual to generate test.ace.xml. This
might take a short while.
- Make an empty directory called trappertest that will be the new
project directory.
- Open DNPTrapper: bash$ bin/trapper -d /path/to/trappertest -c config/trapperconf.xml
- From the Tools menu,
choose "Import project" and
choose the XML file generated by aceparser (test.ace.xml) in the file dialog.
- From the Contig menu,
choose "Open" (or click the
directory icon on the toolbar). A directory called "Contig0" should
now have appeared in your project directory. Choose it and click
OK.
- You should now see an alignment consisting of black boxes on
white background. They represent the reads in the contig. If you zoom
in, you will see the base sequences and quality values visualized.
Zooming can be performed using toolbar buttons or by choosing from the View menu;
alternatively, you can press the "+" and "-" keys on your keyboard. If
you press Ctrl at the same time, the zooming is only performed in the
X-direction. Similarly, pressing Ctrl + Alt zooms in Y-direction only.
You can scroll using the mouse, or by using the arrow keys. Hitting Ctrl
while using the arrow keys speeds the scrolling up.
- Try selecting and un-selecting some reads by Ctrl-clicking them.
Also try the "rubber band" functionality for read selection by pressing
the left mouse button where it doesn't hit a read and dragging the mouse
pointer. Select all the reads by either a) choosing "Select All" from
the Edit menu, b) hitting Ctrl +
A on the keyboard, or c) pressing the "Select All" button in the
toolbar. If you click where no read is located, all reads are
de-selected. Right click on a read and get familiar with the different
options in the context menu that pops up (see above for more detailed
description of this menu).
- As you notice when you zoom out, each read is located on a
separate row. This makes it hard to get an overview of the contig, so
we'd like to reorganize the reads to occupy less vertical
space. Select all reads, then right click on one of the selected reads
and choose "Compact Reads"
from the Operations entry in
the context menu. You will be asked to decide at what row the
compacted alignment should start - choose row no. 1.
- Look around in the alignment by zooming in and out, scrolling
etc. If you look closely, you'll see that the alignment isn't locally
optimal everywhere. This is because phrap doesn't optimize the
alignment. To optimize the alignment, choose "ReAlign" from the Operations option in the context
menu. This will take a few seconds.
- Now it's time to try separating repeats. Make sure all reads are
selected and run the "Find
DNPs" operation from the context menu. This takes a few
seconds. For more information about DNPs, see the References section below.
- As you see, a lot of dots with different colors appear in the
alignment. They represent DNPs. Let's try some different ways of
resolving the repeats.
- Select one read containing DNPs and drag it below the alignment.
Using your eyes, try to spot other reads containing DNPs with the same
color on the same column. Drag them down too. If you see "new" DNPs to
the left or right of the first one, look for them in the alignment also
and drag the reads down to fit with your current working group. As you
notice, you're only allowed to drag the reads vertically. This makes
sense, since you want to keep the bases aligned properly. Try this for a
while in order to get a feel of how you can organize the reads into
different repeat groups.
- As you notice, this is a slow and tedious way of doing things.
Let's try a faster version of the same method. Compact the reads again
(using "Compact Reads" as in the earlier step). Again, choose a read containing DNPs and drag it below
the alignment. Choose a DNP and find out its ID by right clicking it
and looking it up under "Feature
info" under the Info
subsection of the context menu. If you don't see it, it's because you
didn't hit it, which could be due to the fact that one pixel
corresponds to several positions when you're zoomed out. But not to
worry. If you look under "Read
info", you'll see that all DNPs are listed for the read along
with their positions and IDs. Also listed is the current position in
the read. The DNP you're interested in is probably located quite near
this position. Remember its ID.
- Now select all reads above the DNP you chose to work with, by
dragging the "rubber band" so that it intersects all reads at that
position, or by choosing "Select between rows" from the Edit menu and
typing in 1 and a large number as parameters. Right click on one of
the selected reads and choose "Search reads for DNP ID". In the query
box that pops up, type in the ID for the DNP you're working with. As
you see, the reads that have this DNP will remain selected, while the
others are un-selected. Now you can simply press one of the selected
reads and drag all of them to the read you started with.
- Choose another DNP in your current "working set" by repeating
the two previous steps, but instead of dragging the reads manually to your
group, use the "Compact Reads"
operation in order to put them in a nice compact fashion starting on a
row just below your group.
- Now let's try the fastest method of resolving the
repeats. Again, compact all the reads (using "Compact Reads" as in the earlier step). Make sure all
reads are selected and run "Sort
according to DNPs". You'll be asked for a starting row;
choose no. 1. As you see, the repeats organize themselves into fairly
neat piles with similar DNP content. In some ambiguous cases the algorithm
will make a wrong choice, but that's quite easy to correct manually
using the techniques from the previous steps.
- Finally, it's time to try to connect the different repeat
groups. It is easy to verify that phrap has resolved the repeats
partially. It can be done e.g. by exporting the consensus sequence
("Export Consensus" under
Operations) and viewing a dot
plot of it against itself using a suitable tool (e.g. Dotter).
Actually, phrap has separated the repeats into two units. This means
that some DNPs should be present in two versions, one on the left hand
side of the alignment and one on the right hand side. This is the case
for some DNPs in this example. The DNP colors reflect what type of DNP
we have. For instance a red DNP means that the DNP is an A, while the
consensus on that column is T. See below for complete color code of
DNPs. In our example, some DNP patterns can be detected visually. This
can be a bit hard to see, so you get a hint: find and look at DNP IDs
18 and 23. They correspond to DNP IDs 549 and 550 both in color and in
spacing. Verify this.
- Create a new contig using "Open". In the file dialog, create a new directory and
name it (e.g. NewContig). Choose it. Under the Window menu, choose "Tile". Now you see the new contig
and the old one next to each other. If you think it's more convenient,
you can manually arrange the contigs on top of each other instead. Now
select all reads in the group containing DNPs 18 and 23. Choose "Drag Mode" from the Tools menu. Now you can drag the
selected reads into the new contig. This copies them to the new
contig, so they're still left in the old one - if you want to remove
them, you have to use "Cut".
- Repeat the above step for the group containing DNPs 549 and 550.
Try to align the matching DNPs against each other. If you zoom in,
you'll see that they don't match exactly (there are some gaps in one of the groups that
displace one of the DNPs), but if you ReAlign you'll see that they match perfectly. Presto!
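The effect of "Search reads for DNP ID", as used in the tutorial, is that only the selected reads carrying the given DNP stay selected. A minimal model of that behaviour, with a hypothetical function name and a plain dict standing in for the database:

```python
def search_reads_for_dnp(selected, dnps_by_read, dnp_id):
    """Keep only the selected reads whose DNP list contains dnp_id.

    selected is a set of read names; dnps_by_read maps a read name to the
    collection of DNP IDs found in that read.
    """
    return {name for name in selected if dnp_id in dnps_by_read.get(name, ())}
```

Reads that are not selected beforehand are never added, which is why the tutorial first selects all reads above the working group.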
DNP Colors
DNP->Consensus | Color
A->T | Red
T->A | Dark Red
A->G | Green
G->A | Dark Green
A->C | Blue
C->A | Dark Blue
T->G | Cyan
G->T | Dark Cyan
T->C | Magenta
C->T | Dark Magenta
G->C | Yellow
C->G | Dark Yellow
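For scripts that post-process DNP data, the color code can be written as a lookup table. The reading of each pair as (DNP base, consensus base) follows the tutorial's statement that a red DNP is an A where the consensus is T; the names below are made up for the example.

```python
# (DNP base, consensus base) -> color, transcribed from the table above.
DNP_COLORS = {
    ("A", "T"): "red",     ("T", "A"): "dark red",
    ("A", "G"): "green",   ("G", "A"): "dark green",
    ("A", "C"): "blue",    ("C", "A"): "dark blue",
    ("T", "G"): "cyan",    ("G", "T"): "dark cyan",
    ("T", "C"): "magenta", ("C", "T"): "dark magenta",
    ("G", "C"): "yellow",  ("C", "G"): "dark yellow",
}


def dnp_color(dnp_base, consensus_base):
    """Look up the display color for a DNP base against the consensus base."""
    return DNP_COLORS[(dnp_base.upper(), consensus_base.upper())]
```

Each base pair shares a hue in both directions, with the dark variant marking the reverse substitution.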
References
1: Bioinformatics. 2002 Mar;18(3):379-88.
Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.
Tammi MT, Arner E, Britton T, Andersson B.
2: Comput Methods Programs Biomed. 2003 Jan;70(1):47-59.
TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences.
Tammi MT, Arner E, Andersson B.
3: Nucleic Acids Res. 2003 Aug 1;31(15):4663-72.
Correcting errors in shotgun sequences.
Tammi MT, Arner E, Kindlund E, Andersson B.
4: Bioinformatics. 2004 Mar 22;20(5):803-4. Epub 2004 Jan 29.
ReDiT: Repeat Discrepancy Tagger--a shotgun assembly finishing aid.
Tammi MT, Arner E, Kindlund E, Andersson B.
5: BMC Bioinformatics. 2006 Mar 20;7:155.
DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions.
Arner E, Tammi MT, Tran AN, Kindlund E, Andersson B.