odgi bin
binning of pangenome sequence and path information in the graph
SYNOPSIS
odgi bin [-i, --idx=FILE] [OPTION]…
DESCRIPTION
The odgi bin(1) command bins a given variation graph. The pangenome
sequence, i.e. the one-time traversal of all nodes from the smallest to
the largest node identifier, is divided into bins of a specified size.
For each bin, the path metainformation is summarized. This enables a
summarized view of gigabase-scale graphs. Each step of a path is a bin,
which is connected to the next bin via a link. A link has a start bin
identifier and an end bin identifier.
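The binning idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the odgi implementation: the toy node lengths are hypothetical, bin identifiers are assumed to start at 1, and deriving the bin width from a requested number of bins by ceiling division is an assumption.

```python
from math import ceil


def bin_width_for(pangenome_length, number_of_bins):
    """Bin width such that number_of_bins bins cover the whole sequence.

    Assumption: ceiling division; the real tool may round differently.
    """
    return ceil(pangenome_length / number_of_bins)


def bins_for_nodes(node_lengths, bin_width):
    """Map each node id to the bin ids (assumed 1-based) it overlaps.

    node_lengths: dict {node_id: sequence length > 0}. Nodes are laid out
    in ascending node-id order to form the pangenome sequence.
    """
    result = {}
    offset = 0  # 0-based start of the current node in the pangenome sequence
    for node_id in sorted(node_lengths):
        length = node_lengths[node_id]
        first_bin = offset // bin_width + 1
        last_bin = (offset + length - 1) // bin_width + 1
        result[node_id] = list(range(first_bin, last_bin + 1))
        offset += length
    return result


node_lengths = {1: 4, 2: 7, 3: 3}  # hypothetical toy graph, 14 bp in total
width = bin_width_for(sum(node_lengths.values()), 3)
node_bins = bins_for_nodes(node_lengths, width)
```

In this toy case a single node can fall into several bins, which is why the per-bin summaries are means over the positions a path covers rather than per-node values.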
The concept of odgi bin is also applied in odgi viz. The Pantograph
project demonstrates how the odgi bin JSON output can be used for an
interactive visualization. By default, odgi bin writes the bins to
stdout in a tab-delimited format: path.name, path.prefix, path.suffix,
bin (bin identifier), mean.cov (mean coverage of the path in
this bin), mean.inv (mean inversion rate of this path in this
bin), mean.pos (mean nucleotide position of this path in this
bin), first.nucl (first nucleotide position of this path in this
bin), last.nucl (last nucleotide position of this path in this
bin). These nucleotide ranges might span positions that are not
present in the bin. Example: A range of 1-100 means that the first
nucleotide has position 1 and the last has position 100, but
nucleotide 45 could be located in another bin. For an exact positional
output, please specify [-j, --json].
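As an illustration of how the tab-delimited output could be consumed, the following sketch reads rows using the column names listed above. The sample rows and path names are made up, and the presence of a header line is an assumption (the reader skips one if it finds it).

```python
import csv
import io

# Column names as listed in the description above.
COLUMNS = ["path.name", "path.prefix", "path.suffix", "bin",
           "mean.cov", "mean.inv", "mean.pos", "first.nucl", "last.nucl"]


def read_bins(handle):
    """Yield one dict per data row with numeric fields converted."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] == COLUMNS[0]:
            continue  # skip blank lines and an optional header line
        rec = dict(zip(COLUMNS, row))
        rec["bin"] = int(rec["bin"])
        for key in ("mean.cov", "mean.inv", "mean.pos"):
            rec[key] = float(rec[key])
        for key in ("first.nucl", "last.nucl"):
            rec[key] = int(rec[key])
        yield rec


# Hypothetical sample output; the values are invented for illustration.
sample = (
    "path.name\tpath.prefix\tpath.suffix\tbin\tmean.cov\tmean.inv\t"
    "mean.pos\tfirst.nucl\tlast.nucl\n"
    "sampleA#chr1\tsampleA\tchr1\t1\t1.0\t0.0\t50.5\t1\t100\n"
    "sampleA#chr1\tsampleA\tchr1\t2\t0.5\t0.0\t120.0\t101\t140\n"
)
rows = list(read_bins(io.StringIO(sample)))
mean_cov_by_bin = {r["bin"]: r["mean.cov"] for r in rows}
```

Keeping first.nucl and last.nucl as a plain pair reflects the caveat above: the pair bounds the path's positions in the bin but does not guarantee that every position in between lies in that bin.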
When running odgi bin in HaploBlocker mode, only the
arguments [-b, --haplo-blocker], [-p[N],
--haplo-blocker-min-paths[N]], and [-c[N],
--haplo-blocker-min-coverage[N]] are required. A TSV is printed to
stdout: Each row corresponds to a node. Each column corresponds to a
path. Each value is the coverage of a specific node of a specific
path.
OPTIONS
Graph Files IO
-i, --idx=FILE
File containing the succinct variation graph to be binned. The file
name usually ends with .og.
FASTA Options
-f, --fasta=FILE
Write the pangenome sequence to FILE in FASTA format.
Bin Options
-n, --number-bins=N
The number of bins the pangenome sequence should be chopped up into.
-w, --bin-width=N
The width of each bin, in nucleotides of the pangenome sequence.
-D, --path-delim=STRING
Annotate rows by prefix and suffix of this delimiter.
-a, --aggregate-delim
Aggregate on the path prefix delimiter. This option depends on [-D,
--path-delim=STRING].
-j, --json
Print bins and links to stdout in a pseudo-JSON format. Each line is a
valid JSON object, but the file as a whole is not valid JSON. First,
each bin, including its pangenome sequence, is printed on its own line.
Second, for each path in the graph, its traversed bins are printed
together with their metainformation: bin (bin identifier), mean.cov
(mean coverage of the path in this bin), mean.inv (mean inversion rate
of this path in this bin), mean.pos (mean nucleotide position of this
path in this bin), and an array of ranges giving the nucleotide
positions of the path in this bin. A range whose first and last
nucleotide are switched represents the reverse-complement orientation
of that particular sequence.
-s, --no-seqs
If [-j, --json] is set, no nucleotide sequences will be printed to
stdout in order to save disk space.
-g, --no-gap-links
We divide links into 2 classes:
1. The links that help to follow complex variations. They need to be drawn; otherwise one could not follow the sequence of a path.
2. The links that help to follow simple variations. These links are called gap-links. Such links, which solely connect a path from left to right, may not be relevant for understanding a path’s traversal through the bins. Therefore, when this option is set, the gap-links are left out, saving disk space.
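The line-oriented output described for [-j, --json] can be read one line at a time, and the orientation rule for ranges can be checked with a simple comparison. The sketch below assumes nothing about the real record layout: the key names bin_id, sequence, path_name, and ranges are hypothetical placeholders chosen for illustration only.

```python
import json


def read_json_lines(lines):
    """Parse each non-empty line as its own JSON object."""
    return [json.loads(line) for line in lines if line.strip()]


def range_orientation(first, last):
    """Return 'reverse' if first and last are switched, else 'forward'.

    A single-nucleotide range (first == last) is reported as 'forward',
    since its orientation cannot be read from the positions alone.
    """
    return "reverse" if first > last else "forward"


# Hypothetical records: the key names are illustrative, not the real schema.
sample_lines = [
    '{"bin_id": 1, "sequence": "ACGT"}',
    '{"path_name": "sampleA", "ranges": [[1, 4], [12, 9]]}',
]
records = read_json_lines(sample_lines)
orientations = [range_orientation(a, b) for a, b in records[1]["ranges"]]
```

Parsing line by line, rather than loading the whole output with a single json.load call, is what the "each line is a valid JSON object" property calls for.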
HaploBlocker Options
-b, --haplo-blocker
Write a TSV to stdout formatted in a way ready for HaploBlocker: Each
row corresponds to a node. Each column corresponds to a path. Each
value is the coverage of a specific node of a specific path.
-p[N], --haplo-blocker-min-paths[N]
Specify the minimum number of paths that need to be present in the bin
to actually report that bin. The default value is 1.
-c[N], --haplo-blocker-min-coverage[N]
Specify the minimum coverage a path needs to have in a bin to actually
report that bin. The default value is 1.
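One way to picture the two HaploBlocker thresholds is as a filter over a bin-by-path coverage table. The sketch below is an interpretation, not the odgi implementation: it assumes a bin is reported when at least min_paths paths each reach min_coverage in it, and the coverage values are invented.

```python
def bins_to_report(coverage, min_paths=1, min_coverage=1):
    """Return the bin ids that pass both thresholds.

    coverage: dict {bin_id: {path_name: coverage}}.
    Assumption: a path counts as present in a bin when its coverage
    there is at least min_coverage, and a bin is reported when at
    least min_paths paths are present.
    """
    kept = []
    for bin_id in sorted(coverage):
        present = sum(1 for c in coverage[bin_id].values() if c >= min_coverage)
        if present >= min_paths:
            kept.append(bin_id)
    return kept


# Hypothetical coverage table for two paths over three bins.
coverage = {
    1: {"a": 1, "b": 0},
    2: {"a": 2, "b": 1},
    3: {"a": 0, "b": 0},
}
default_bins = bins_to_report(coverage)
strict_bins = bins_to_report(coverage, min_paths=2, min_coverage=1)
```

With both thresholds at their documented default of 1, only bins that no path covers are dropped under this interpretation; raising either threshold narrows the output further.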
Program Information
-h, --help
Print a help message for odgi bin.
-P, --progress
Write the current progress to stderr.