DNPTrapper Manual and Tutorial

Quick Start

To run DNPTrapper, go to the trapper directory and type

bash$ bin/trapper -d /path/to/project/directory -c config/trapperconf.xml

replacing /path/to/project/directory with the path to your project directory.

Overview

Concepts

DNPTrapper is a sequence alignment editing tool developed primarily for finishing and analysis of complicated shotgun sequencing projects. The main goal of DNPTrapper is to give the user more power than other finishing tools allow. The user can move sequences around; re-align them; cut, copy and paste them; run algorithms on them; add and remove features; choose between view modes; zoom in and out; et cetera. The user interface is a front end to a database, and changes made using DNPTrapper are automatically reflected in the database.

Directories

Each DNPTrapper project needs its own project directory, which must be supplied on the command line with the option -d (or --dbhome-dir). The directory must either be empty or contain an existing project. See Quick Start above for an example of how to start a DNPTrapper session.

Each contig in a project is stored in its own directory under the project directory. If a new contig is created, it will appear as a new directory in the project directory.
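Because each contig lives in its own subdirectory, the contigs of a project can be listed from outside DNPTrapper. The following is a minimal Python sketch, assuming (hypothetically) that every subdirectory of the project directory is a contig and that plain files at the top level should be ignored. The function name `list_contigs` is illustrative and not part of DNPTrapper.

```python
from pathlib import Path


def list_contigs(project_dir):
    """Return the sorted names of contig directories in a project.

    Assumption (hypothetical): every subdirectory of the project
    directory is a contig; plain files at the top level are skipped.
    """
    root = Path(project_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

For the tutorial project below, `list_contigs("/path/to/trappertest")` would be expected to include "Contig0" after the import.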

Supported file formats

Currently, the only format supported is a native XML format. Utility programs for converting from ACE files are included in the distribution. More formats will be supported in the future.

Views and windows

Multiple contigs can be open simultaneously, and multiple views of a contig can be open at the same time. This can be useful, for example, for viewing different parts of the same contig at the same time, possibly at different zoom levels. Changes made in one view will be reflected in the other(s), since they represent the same document, i.e. contig. Open windows can be arranged within the workspace in overlapping or tiling fashion. The user can switch between windows using the Window menu, or by pressing the Tab key (Shift + Tab switches the windows in reverse order).

The user interacts with the data (i.e. the reads) mainly using the mouse. Clicking on a read selects it (indicated by a red instead of a black border). Ctrl-clicking a read toggles its selection. Reads can also be selected by pressing the left mouse button where it doesn't hit a read and dragging the mouse: a dashed "rubber band" appears, and all reads intersecting it are selected when the button is released. If Ctrl is pressed during this operation, previously selected reads remain selected. Right-clicking on selected reads opens a context menu with options for manipulating data (see below).
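The selection rules above can be modelled in a few lines. This is an illustrative Python sketch, not DNPTrapper's code: reads are hypothetical `Read` records with a row and an inclusive column range, a selection is a set of read names, and it is assumed that a plain click replaces the previous selection (the manual only says that it selects the read).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    row: int
    start: int  # first column occupied (inclusive)
    end: int    # last column occupied (inclusive)


def click(selected, read, ctrl=False):
    """Plain click selects only `read` (assumption); Ctrl + click toggles it."""
    if not ctrl:
        return {read.name}
    return set(selected) ^ {read.name}


def rubber_band(selected, reads, rows, cols, ctrl=False):
    """Select every read intersecting the rectangle spanned by rows x cols.

    `rows` and `cols` are inclusive (low, high) pairs. With Ctrl held,
    previously selected reads remain selected.
    """
    r_lo, r_hi = rows
    c_lo, c_hi = cols
    hit = {
        r.name for r in reads
        if r_lo <= r.row <= r_hi and r.start <= c_hi and r.end >= c_lo
    }
    return (set(selected) | hit) if ctrl else hit
```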

Moving modes

In the normal moving mode (indicated by an arrow on the toolbar), selected reads can only be moved vertically within the current contig; horizontal movement and copying reads to other contigs by dragging are not allowed in this mode. In drag mode, reads may be moved horizontally within a contig or dragged to other contigs, in which case they are copied to the new contig. In the new contig, the reads end up in the top left corner.

When moving reads within the same contig, the last move can be undone by selecting "Undo last move" from the Edit menu. NOTA BENE: if the user changes the read selection or runs an Operation (see below), the last move can no longer be undone. This restriction preserves the integrity of the data.

Final notes on moving: DNPTrapper will not let reads end up on negative rows or columns; if any moved read would do so, the whole move is cancelled. DNPTrapper does, however, let reads end up on the same positions, i.e. reads may overlap on the same row. The overlapping part is indicated by a green color.
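The two rules above (a move is all-or-nothing, and overlaps are allowed but highlighted) can be expressed as a small Python sketch. It is a model of the described behaviour under stated assumptions, not DNPTrapper's implementation: `Read`, `move_reads` and `overlap` are hypothetical names, and columns are treated as inclusive ranges.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Read:
    name: str
    row: int
    start: int  # first column (inclusive)
    end: int    # last column (inclusive)


def move_reads(reads, d_row, d_col=0):
    """Shift all reads by (d_row, d_col), or cancel the whole move.

    Returns the moved reads, or None if any read would land on a negative
    row or column; in that case no read is moved at all.
    """
    moved = [replace(r, row=r.row + d_row, start=r.start + d_col,
                     end=r.end + d_col) for r in reads]
    if any(r.row < 0 or r.start < 0 for r in moved):
        return None
    return moved


def overlap(a, b):
    """Columns shared by two reads on the same row (shown in green), or None."""
    if a.row != b.row:
        return None
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return (lo, hi) if lo <= hi else None
```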

Zooming

The user can zoom in and out of the views using toolbar buttons, menu options or keyboard shortcuts. It is possible to zoom in one direction only if desired. At a low zoom level ("bird's-eye view"), certain sequence features are not displayed, as they would be unintelligible at this level anyway.

Context menu

Right-clicking in a view opens a context menu. The following options are available:

Operations

The following Operations have been implemented in DNPTrapper so far:

Compact Reads - reorganizes the reads to occupy as little vertical space as possible.
Export Consensus - writes the consensus (according to majority vote) to file.
Find DNPs - analyzes selected reads for DNP content.
Find Mates - finds and selects mate pairs of selected reads, if such data has been imported.
Quality Trim seqs - marks up low quality ends on reads. Low quality parts will be shaded, and will be excluded from some operations (e.g. Find DNPs).
ReAlign - locally optimizes alignment of selected reads.
Remove DNPs - removes DNPs in selected reads.
Remove Quality trimming - resets low quality mark up.
Search reads for DNP ID - highlights reads containing a certain DNP.
Select reads from file - highlights reads specified in a file with readnames.
Select strand - highlights reads of specified strand (normal or reverse).
Sort according to DNPs - a DNP grouping algorithm. Naive, but pretty useful as a starting point.
Write ACE file - writes selected reads to a file in the ACE format.
Write alignment to file - writes alignment in ASCII format to file "alignment.out" in current directory.
Write readnames to file - writes the names of selected sequences to a file. Useful for saving different selections, especially in combination with "Select reads from file" above.
Write selected seqs to file - writes the selected sequences, and their corresponding quality values, to file in FASTA format.
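"Export Consensus" is described as a majority vote over the aligned reads. The Python sketch below shows what a column-wise majority vote means; it is not DNPTrapper's code. Reads are given as hypothetical (start_column, sequence) pairs, and the tie-breaking rule (alphabetical) and the "N" for uncovered columns are assumptions, since the manual does not specify them.

```python
from collections import Counter


def majority_consensus(reads):
    """Column-wise majority-vote consensus over aligned reads.

    `reads` is a list of (start_column, sequence) pairs placing each read's
    bases in alignment columns. Ties are broken alphabetically and columns
    covered by no read become "N" (both are assumptions).
    """
    if not reads:
        return ""
    length = max(start + len(seq) for start, seq in reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base.upper()] += 1
    consensus = []
    for votes in columns:
        if not votes:
            consensus.append("N")
        else:
            # Highest count wins; the (-count, base) key breaks ties alphabetically.
            consensus.append(min(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0])
    return "".join(consensus)
```

For example, `majority_consensus([(0, "ACGT"), (1, "CGTA"), (1, "CATA"), (6, "GG")])` returns "ACGTANGG": column 2 is voted G over A (2 to 1), and column 5 has no coverage.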

Info

Shows information about reads and features at the clicked position. If no read or feature is present at the position, the corresponding info is not displayed.

Read info - read name, strand, quality region, local position in read and list of read's DNPs.
Feature info - base, quality value and DNP ID, if any.
General info - row and column in alignment.

Switch to viewmode

Selection of different visualization modes for the data. Sequences can be visualized with quality as a grey scale behind or above the sequence, DNPs can be turned on or off etc.

Chromatogram options

If chromatogram and phd data have been imported, these options are used to align the trace data at the desired column.

Set Chromat center - aligns chromatograms at cursor position and highlights the column with a vertical line. NB: This can also be used when no chromatogram data is present, as a way of keeping track of the column currently being investigated.
Clear Chromat center - clears chromatogram alignment position (i.e. sets it to 0).

Scroll here

Scrolls the view so that the selected position is in the top left corner (if there are enough rows and columns to make this possible).

Menus

Contig

Open

Opens an existing contig or creates a new one (Ctrl + O)

Close

Closes current document and all its views (Ctrl + W)

Exit

Exits the application. Changes are automatically saved (Ctrl + Q)

Edit

Undo last move

Undoes the last move, provided that no Operation has been performed and the read selection hasn't changed in the meantime (Ctrl + Z)

Cut

Cuts selected reads. They are also copied to the clipboard and can be pasted back into the contig or into another contig (Ctrl + X)

Copy

Copies selected reads to clipboard (Ctrl + C)

Paste

Pastes clipboard content into current contig (Ctrl + V)

Select All

Selects all reads (Ctrl + A)

Select Between Rows

Selects all reads between rows specified by user

View

Different self-explanatory choices for zooming, along with some actions that need more explanation:

Enlarge

Adds more viewable rows and columns to the contig view. Does not affect data.

Shrink

Removes viewable rows and columns from the contig view. Does not affect data.

Show statistics

Displays database statistics from Berkeley DB.

Window

New view

Opens up another view of current contig.

Cascade

Orders open windows in a cascading fashion.

Tile

Orders open windows in a tiling fashion.

Tools

Import project

Imports a dataset from an XML file

Import Chromatograms

Imports chromatograms in an XML format (see below)

Import Mates

Imports mate pair data (see below)

Import phd data

Imports phd data files (see below)

Normal Mode

In this mode, read moving is only allowed in the vertical direction

Drag Mode

In this mode, it is possible to move reads horizontally within a contig, and copy reads to other contigs using dragging

Chromatograms and phd files

Chromatograms are imported in an XML format. A parser that translates ABI trace files into this format is provided with the distribution.

Phred phd files (or equivalent files in the same format) must be provided for the alignment of chromatograms to work properly. Files must be named READNAME.phd.1 to be recognised by DNPTrapper.
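Since phd files are only recognised under the name READNAME.phd.1, it can help to check for missing or misnamed files before importing. The following Python sketch assumes (hypothetically) that all phd files sit in a single directory; the manual does not say where they must be located. `missing_phd_files` is an illustrative helper, not part of DNPTrapper.

```python
from pathlib import Path


def missing_phd_files(read_names, phd_dir):
    """Return the read names that have no file named READNAME.phd.1.

    Assumption (hypothetical): all phd files are in the single directory
    `phd_dir`. A file named e.g. READNAME.phd (without ".1") counts as missing.
    """
    directory = Path(phd_dir)
    return [name for name in read_names
            if not (directory / f"{name}.phd.1").is_file()]
```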

Mate Pair data

Mate pair data should be in a file with one pair on each line, with the format

READNAME MATENAME MATELENGTH
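A mate pair file in this format can be generated or checked with a few lines of Python. The sketch below assumes whitespace-separated fields and an integer MATELENGTH (the manual does not state its units); `parse_mates` and `format_mates` are hypothetical helper names.

```python
def format_mates(pairs):
    """Render (readname, matename, matelength) triples, one pair per line."""
    return "".join(f"{read} {mate} {length}\n" for read, mate, length in pairs)


def parse_mates(lines):
    """Parse lines of the form READNAME MATENAME MATELENGTH.

    Returns {readname: (matename, matelength)}. Blank lines are skipped;
    MATELENGTH is assumed to be an integer. Malformed lines raise ValueError.
    """
    mates = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        read, mate, length = fields
        mates[read] = (mate, int(length))
    return mates
```

For instance, `format_mates([("r1.f", "r1.r", 2000)])` produces the line "r1.f r1.r 2000" (the read names here are made up).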

Native file format

DNPTrapper uses a simple XML format, see file "format.txt" in the doc directory.

Safety, backups etc

Two issues are important to note about DNPTrapper. Firstly, there is currently no "Undo" function implemented except for the last move. Edit operations made using DNPTrapper are automatically reflected in the database, and changes have to be undone manually if the user changes his/her mind. Some changes made by algorithms are virtually impossible to undo, such as ReAligning a whole contig.

A way around this is to copy reads to new contigs and perform crucial editing there. If the editing doesn't have the desired effect, the new contigs can simply be discarded (by removing the contig directories AFTER the DNPTrapper session). If the user wants to keep the changes, the original reads can then be cut out of the old contig if desired. This is actually an intended use of DNPTrapper: a "sand box" approach where the user can copy data, experiment with it and then decide whether to keep it or not.

Secondly, DNPTrapper is under development, and no warranties are issued by the developers regarding program performance and stability. It's probably a good idea to make backups of the contig directory now and then to be on the safe side. NOTE: backup copying of the project directory must be performed between runs of the program, NOT during runs!
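A timestamped copy of the whole project directory is one simple way to make such backups. The Python sketch below uses the standard library's `shutil.copytree`; the naming scheme and the helper name `backup_project` are assumptions. As stated above, run it only between DNPTrapper sessions, never while the program is running.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir, backup_root):
    """Copy the whole project directory to a new timestamped folder.

    Only call this while DNPTrapper is NOT running. The target name
    "<project>-<YYYYmmdd-HHMMSS-microseconds>" is a hypothetical convention.
    """
    src = Path(project_dir).resolve()
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    shutil.copytree(src, dest)  # fails if dest already exists
    return dest
```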

Known bugs and problems

A minor visual bug at the lowest zoom level when scrolling.
It is possible for the user to change directory names in the Open File dialog. Dangerous!
ReAligning currently takes quite a while to perform; a faster implementation is under way.
Dates etc. in exported ACE files are bogus.

Please send bug reports, including instructions on how to invoke the bug, to (remove caps) erik.arnerNOSPAM@gmail.com.

Coming attractions

Below are some features that should appear in DNPTrapper in the near future.

Autoscrolling in normal moving mode and with rubberband
Choosing filename in "Write to file" algo
Exporting to ACE file format
Better support for mate pairs

Please send feature requests, and feedback regarding current functionality of the program, to (remove caps) erik.arnerNOSPAM@gmail.com.

Tutorial

This is just a brief tutorial for getting started with the DNPTrapper program. Provided with this version are some files for testing DNPTrapper functionality, and a utility program (aceparser) for importing .ace files generated by phrap into DNPTrapper.

Test files:

test.seq - FASTA file with simulated shotgun reads sampling repeated sequence with 2% mean difference between any two repeat units.
test.qual - FASTA file with corresponding quality values.
test.ace - ACE file produced by phrap for this dataset.

Tutorial Start

NOTA BENE: In this simple tutorial, the sequence reads have been trimmed and low quality sequence has been removed before assembly with phrap. In the normal case, the user should run the "Quality Trim seqs" operation before DNP analysis. In this example, this is not needed.

Go to the trapper installation directory. Run the aceparser program (bin/aceparser) without any parameters to see a short help text on how it is used. Run it again with test.ace and test.qual to generate test.ace.xml. This might take a short while.
Make an empty directory called trappertest that will be the new project directory.
Open DNPTrapper: bash$ bin/trapper -d /path/to/trappertest -c config/trapperconf.xml
From the Tools menu, choose "Import project" and choose the generated test.ace.xml in the file dialog.
From the Contig menu, choose "Open" (or click the directory icon on the toolbar). A directory called "Contig0" should now have appeared in your project directory. Choose it and click OK.
You should now see an alignment consisting of black boxes on white background. They represent the reads in the contig. If you zoom in, you will see the base sequences and quality values visualized. Zooming can be performed using toolbar buttons or by choosing from the View menu; alternatively, you can press the "+" and "-" keys on your keyboard. If you press Ctrl at the same time, the zooming is only performed in the X-direction. Similarly, pressing Ctrl + Alt zooms in the Y-direction only. You can scroll using the mouse or the arrow keys. Holding Ctrl while using the arrow keys speeds up scrolling.
Try selecting and un-selecting some reads by Ctrl-clicking them. Also try the "rubber band" functionality for read selection by pressing the left mouse button where it doesn't hit a read and dragging the mouse pointer. Select all the reads by either a) choosing "Select All" from the Edit menu, b) hitting Ctrl + A on the keyboard, or c) pressing the "Select All" button in the toolbar. If you click where no read is located, all reads are de-selected. Right-click on a read and get familiar with the different options in the context menu that pops up (see above for a more detailed description of this menu).
As you notice when you zoom out, each read is located on a separate row. This makes it hard to get an overview of the contig, so we'd like to reorganize the reads to occupy less vertical space. Select all reads, then right click on one of the selected reads and choose "Compact Reads" from the Operations entry in the context menu. You will be asked to decide at what row the compacted alignment should start - choose row no. 1.
Look around in the alignment by zooming in and out, scrolling etc. If you look closely, you'll see that the alignment isn't locally optimal everywhere. This is because phrap doesn't optimize the alignment. To optimize the alignment, choose "ReAlign" from the Operations option in the context menu. This will take a few seconds.
Now it's time to try separating repeats. Make sure all reads are selected and run the "Find DNPs" operation from the context menu. This takes a few seconds. For more information about DNPs, see the References section at the end of this manual.
As you see, a lot of dots with different colors appear in the alignment. They represent DNPs. Let's try some different ways of resolving the repeats.
Select one read containing DNPs and drag it below the alignment. Using your eyes, try to spot other reads containing DNPs with the same color in the same column. Drag them down too. If you see "new" DNPs to the left or right of the first one, look for them in the alignment as well and drag those reads down to fit with your current working group. As you notice, you're only allowed to drag the reads vertically. This makes sense, since you want to keep the bases aligned properly. Try this for a while to get a feel for how you can organize the reads into different repeat groups.
As you notice, this is a slow and tedious way of doing things. Let's try a faster version of the same method. Compact the reads again (as in the "Compact Reads" step above). Again, choose a read containing DNPs and drag it below the alignment. Choose a DNP and find out its ID by right-clicking it and looking it up under "Feature info" in the Info subsection of the context menu. If you don't see it, you didn't hit it, which may be because one pixel corresponds to several positions when you're zoomed out. But not to worry: if you look under "Read info", you'll see that all DNPs in the read are listed along with their positions and IDs. Also listed is the current position in the read. The DNP you're interested in is probably located quite near this position. Remember its ID.
Now select all reads above the DNP you chose to work with, by dragging the "rubber band" so that it intersects all reads at that position, or by choosing "Select between rows" from the Edit menu and typing in 1 and a large number as parameters. Right click on one of the selected reads and choose "Search reads for DNP ID". In the query box that pops up, type in the ID for the DNP you're working with. As you see, the reads that have this DNP will remain selected, while the others are un-selected. Now you can simply press one of the selected reads and drag all of them to the read you started with.
Choose another DNP in your current "working set" by repeating the DNP ID lookup and "Search reads for DNP ID" steps above, but instead of dragging the reads manually to your group, use the "Compact Reads" operation to place them compactly, starting on a row just below your group.
Now let's try the fastest method of resolving the repeats. Again, compact all the reads (as in the "Compact Reads" step above). Make sure all reads are selected and run "Sort according to DNPs". You'll be asked for a starting row; choose no. 1. As you see, the repeats organize themselves into fairly neat piles with similar DNP content. In some ambiguous cases the algorithm will make a wrong choice, but that's quite easy to correct manually using the techniques from the previous steps.
Finally, it's time to try connecting the different repeat groups. It is easy to verify that phrap has only partially resolved the repeats, e.g. by exporting the consensus sequence ("Export Consensus" under Operations) and viewing a dot plot of it against itself using a suitable tool (e.g. Dotter). In fact, phrap has separated the repeats into two units. This means that some DNPs should be present in two versions, one on the left-hand side of the alignment and one on the right-hand side. This is the case for some DNPs in this example. The DNP colors reflect the type of DNP: for instance, a red DNP means that the DNP is an A while the consensus on that column is T. See below for the complete DNP color code. In our example, some DNP patterns can be detected visually. As this can be hard to see, here is a hint: find and look at DNP IDs 18 and 23. They correspond to DNP IDs 549 and 550, both in color and in spacing. Verify this.
Create a new contig using "Open". In the file dialog, create a new directory and name it (e.g. NewContig). Choose it. Under the Window menu, choose "Tile". Now you see the new contig and the old one next to each other. If you find it more convenient, you can manually arrange the contigs on top of each other instead. Now select all reads in the group containing DNPs 18 and 23. Choose "Drag Mode" from the Tools menu. Now you can drag the selected reads into the new contig. This copies them to the new contig, so they remain in the old one; to remove them from the old contig, you have to use "Cut".
Repeat the above step for the group containing DNPs 549 and 550. Try to align the matching DNPs against each other. If you zoom in, you'll see that they don't match exactly (there are some gaps in one of the groups that displace one of the DNPs), but if you ReAlign you'll see that they match perfectly. Presto!
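The tutorial relies heavily on two Operations: "Search reads for DNP ID" (keep only reads carrying a given DNP) and "Compact Reads" (pack reads into as few rows as possible, starting at a chosen row). The Python sketch below models both. The greedy first-fit packing is one standard way to minimise rows for interval-shaped reads; it is an assumption for illustration and not necessarily the algorithm DNPTrapper uses. `Read`, `search_dnp_id` and `compact` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Read:
    name: str
    row: int
    start: int                     # first column (inclusive)
    end: int                       # last column (inclusive)
    dnps: frozenset = frozenset()  # IDs of DNPs found in this read


def search_dnp_id(reads, dnp_id):
    """Keep only the reads that carry the given DNP ID."""
    return [r for r in reads if dnp_id in r.dnps]


def compact(reads, start_row):
    """Greedy first-fit packing, processing reads by start column.

    Each read goes on the lowest row (counting from start_row) whose last
    used column lies strictly left of the read's first column.
    """
    row_ends = []  # row_ends[i] = last column used on row start_row + i
    placed = []
    for r in sorted(reads, key=lambda r: (r.start, r.end)):
        for i, last in enumerate(row_ends):
            if last < r.start:
                row_ends[i] = r.end
                placed.append(replace(r, row=start_row + i))
                break
        else:  # no existing row has room: open a new one
            row_ends.append(r.end)
            placed.append(replace(r, row=start_row + len(row_ends) - 1))
    return placed
```

With reads a (columns 0-10), c (5-15), b (12-20) and d (16-30), `compact(reads, 1)` puts a and b on row 1 and c and d on row 2.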

DNP Colors

A->T	Red
T->A	Dark Red
A->G	Green
G->A	Dark Green
A->C	Blue
C->A	Dark Blue
T->G	Cyan
G->T	Dark Cyan
T->C	Magenta
C->T	Dark Magenta
G->C	Yellow
C->G	Dark Yellow

References

1: Bioinformatics. 2002 Mar;18(3):379-88.

Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.

Tammi MT, Arner E, Britton T, Andersson B.

2: Comput Methods Programs Biomed. 2003 Jan;70(1):47-59.

TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences.

Tammi MT, Arner E, Andersson B.

3: Nucleic Acids Res. 2003 Aug 1;31(15):4663-72.

Correcting errors in shotgun sequences.

Tammi MT, Arner E, Kindlund E, Andersson B.

4: Bioinformatics. 2004 Mar 22;20(5):803-4. Epub 2004 Jan 29.

ReDiT: Repeat Discrepancy Tagger--a shotgun assembly finishing aid.

Tammi MT, Arner E, Kindlund E, Andersson B.

5: BMC Bioinformatics. 2006 Mar 20;7:155.

DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions.

Arner E, Tammi MT, Tran AN, Kindlund E, Andersson B.