.. _odgi bin:

#########
odgi bin
#########

binning of pangenome sequence and path information in the graph

SYNOPSIS
========

**odgi bin** [**-i, --idx**\ =\ *FILE*] [*OPTION*]…

DESCRIPTION
===========

| The odgi bin(1) command bins a given variation graph. The pangenome
  sequence, i.e. the one-time traversal of all nodes from smallest to
  largest node identifier, is divided into bins of a specified size. For
  each bin, the path metainformation is summarized. This enables a
  summarized view of gigabase-scale graphs. Each step of a path is a bin
  and is connected to the next bin via a link. A link has a start bin
  identifier and an end bin identifier.
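
The binning described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not odgi's implementation: the node table, the 1-based bin numbering, and the helper names ``pangenome_bins`` and ``path_bin_links`` are hypothetical, and every path step is treated as forward-oriented.

```python
def pangenome_bins(nodes: dict[int, str], bin_width: int) -> dict[int, str]:
    """Chop the pangenome sequence into bins of ``bin_width`` nucleotides.

    ``nodes`` maps a node identifier to its sequence (hypothetical input).
    Bins are numbered from 1 here, which is an assumption of this sketch.
    """
    # One-time traversal of all nodes from smallest to largest identifier.
    pangenome = "".join(nodes[node_id] for node_id in sorted(nodes))
    return {
        start // bin_width + 1: pangenome[start:start + bin_width]
        for start in range(0, len(pangenome), bin_width)
    }


def path_bin_links(nodes: dict[int, str], path: list[int], bin_width: int):
    """Return the bins a path traverses and the links between them.

    A link is a (start bin, end bin) pair of consecutive bins of the path.
    Assumption: every step traverses its node in forward orientation.
    """
    # Offset of each node within the pangenome sequence.
    offsets, pos = {}, 0
    for node_id in sorted(nodes):
        offsets[node_id] = pos
        pos += len(nodes[node_id])

    visited: list[int] = []
    for node_id in path:
        start = offsets[node_id]
        for p in range(start, start + len(nodes[node_id])):
            b = p // bin_width + 1
            if not visited or visited[-1] != b:  # collapse runs within one bin
                visited.append(b)
    links = list(zip(visited, visited[1:]))
    return visited, links


nodes = {1: "ACGT", 2: "GG", 3: "TTAC"}
print(pangenome_bins(nodes, 4))          # {1: 'ACGT', 2: 'GGTT', 3: 'AC'}
print(path_bin_links(nodes, [3, 1], 4))  # ([2, 3, 1], [(2, 3), (3, 1)])
```

In the second example the path visits node 3 before node 1, so it produces a link that jumps back from bin 3 to bin 1.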

| The concept of odgi bin is also applied in :ref:`odgi viz`. A
  demonstration of how the odgi bin JSON output can be used for an
  interactive visualization is realized in the `Pantograph `__
  project. By default, odgi bin writes the bins to stdout in a
  tab-delimited format: **path.name**, **path.prefix**, **path.suffix**,
  **bin** (bin identifier), **mean.cov** (mean coverage of the path in
  this bin), **mean.inv** (mean inversion rate of this path in this
  bin), **mean.pos** (mean nucleotide position of this path in this
  bin), **first.nucl** (first nucleotide position of this path in this
  bin), and **last.nucl** (last nucleotide position of this path in this
  bin). These nucleotide ranges might span positions that are not
  present in the bin. For example, a range of 1-100 means that the first
  nucleotide has position 1 and the last has position 100, but
  nucleotide 45 could be located in another bin. For exact positional
  output, specify [**-j, --json**].
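
A minimal Python sketch for reading the default tab-delimited output. It assumes that all nine documented columns are present in the documented order and that a header line, if there is one, repeats the column names. ``BinRow`` and ``parse_bin_tsv`` are hypothetical helpers, not part of odgi.

```python
import csv
import io
from dataclasses import dataclass

# Columns of the default output, in the documented order.
COLUMNS = ["path.name", "path.prefix", "path.suffix", "bin", "mean.cov",
           "mean.inv", "mean.pos", "first.nucl", "last.nucl"]


@dataclass
class BinRow:
    path_name: str
    path_prefix: str
    path_suffix: str
    bin: int
    mean_cov: float
    mean_inv: float
    mean_pos: float
    first_nucl: int
    last_nucl: int


def parse_bin_tsv(text: str) -> list[BinRow]:
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields == COLUMNS:
            continue  # skip blank lines and a header line, if present
        name, prefix, suffix, b, cov, inv, pos, first, last = fields
        rows.append(BinRow(name, prefix, suffix, int(b), float(cov),
                           float(inv), float(pos), int(first), int(last)))
    return rows
```

Keep in mind that ``first_nucl`` and ``last_nucl`` only bound the path's positions in a bin. A row with the range 1-100 does not guarantee that position 45 lies in that bin.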

| When odgi bin runs in `HaploBlocker `__ mode, only the
  arguments [**-b, --haplo-blocker**],
  [**-p[N], --haplo-blocker-min-paths[N]**], and
  [**-c[N], --haplo-blocker-min-coverage[N]**] are required. A TSV is
  printed to stdout: each row corresponds to a node, each column
  corresponds to a path, and each value is the coverage of a specific
  node by a specific path.
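
The node-by-path coverage table printed in HaploBlocker mode can be illustrated with a small Python sketch. Several parts of it are assumptions: coverage is taken to be the number of times a path visits a node, the header row and the leading ``node`` column are guesses at the layout, and ``haploblocker_matrix`` is not an odgi function.

```python
from collections import Counter


def haploblocker_matrix(paths: dict[str, list[int]], node_ids: list[int]) -> str:
    """Render a TSV with one row per node and one column per path.

    ``paths`` maps a path name to the node identifiers it visits, in order.
    """
    names = list(paths)  # keep the given path order as the column order
    visits = {name: Counter(steps) for name, steps in paths.items()}
    lines = ["\t".join(["node"] + names)]  # hypothetical header row
    for node_id in sorted(node_ids):
        counts = [str(visits[name][node_id]) for name in names]
        lines.append("\t".join([str(node_id)] + counts))
    return "\n".join(lines)


print(haploblocker_matrix({"a": [1, 2, 2, 3], "b": [1, 3]}, [1, 2, 3]))
```

Path ``a`` visits node 2 twice, so its cell for node 2 is 2. Path ``b`` never visits node 2, so its cell is 0.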

OPTIONS
=======

Graph Files IO
--------------

| **-i, --idx**\ =\ *FILE*
| File containing the succinct variation graph to bin. The file name
  usually ends with *.og*.

FASTA Options
-------------

| **-f, --fasta**\ =\ *FILE*
| Write the pangenome sequence to *FILE* in FASTA format.
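
Writing the pangenome sequence as FASTA amounts to concatenating the node sequences in ascending node-identifier order. The sketch below assumes a record name of ``pangenome`` and line wrapping at a fixed width. odgi's actual FASTA header and line width may differ, and ``pangenome_fasta`` is a hypothetical helper.

```python
def pangenome_fasta(nodes: dict[int, str], name: str = "pangenome",
                    width: int = 80) -> str:
    # One-time traversal of all nodes from smallest to largest identifier.
    seq = "".join(nodes[node_id] for node_id in sorted(nodes))
    # Wrap the sequence into lines of at most `width` nucleotides.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines) + "\n"


print(pangenome_fasta({2: "GG", 1: "ACGT"}, width=4), end="")
```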

Bin Options
-----------

| **-n, --number-bins**\ =\ *N*
| The number of bins the pangenome sequence should be chopped up into.

| **-w, --bin-width**\ =\ *N*
| The width of each bin, i.e. the number of nucleotides of the pangenome
  sequence it covers.
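
The two bin options describe the same quantity from two directions. For a pangenome sequence of length L, a bin width w yields ceil(L / w) bins, and asking for n bins implies a width of ceil(L / n). The Python sketch below spells out this arithmetic. The exact rounding that odgi applies is an assumption here.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b without floating-point rounding."""
    return -(-a // b)


pangenome_length = 10
# -w 4: bins needed to cover the sequence (the last bin may be shorter).
print(ceil_div(pangenome_length, 4))  # 3
# -n 3: smallest width for which 3 bins cover the sequence.
print(ceil_div(pangenome_length, 3))  # 4
```

Integer ceiling division is used instead of ``math.ceil(a / b)`` so that the result stays exact for gigabase-scale lengths.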
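
| **-D, --path-delim**\ =\ *STRING*
| Annotate each row with the prefix and suffix of the path name, split at
  this delimiter (the **path.prefix** and **path.suffix** columns).

| **-a, --aggregate-delim**
| Aggregate paths on their prefix, as defined by the path delimiter.
  Requires [**-D, --path-delim**\ =\ *STRING*].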
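
The sketch below shows one way a path name could be split by [**-D, --path-delim**] and grouped by [**-a, --aggregate-delim**]. Splitting at the first occurrence of the delimiter is an assumption, and so are the helper names. The sketch only groups path names. It does not reproduce how odgi combines the values of aggregated rows.

```python
from collections import defaultdict


def split_path_name(name: str, delim: str) -> tuple[str, str]:
    # Assumption: split at the first occurrence; without it, the suffix is empty.
    prefix, _, suffix = name.partition(delim)
    return prefix, suffix


def group_by_prefix(names: list[str], delim: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[split_path_name(name, delim)[0]].append(name)
    return dict(groups)


print(split_path_name("HG002#1#chr1", "#"))  # ('HG002', '1#chr1')
print(group_by_prefix(["HG002#1", "HG002#2", "HG003#1"], "#"))
```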

| **-j, --json**
| Print bins and links to stdout in pseudo JSON format. Each line is a
  valid JSON object, but the file as a whole is not valid JSON! First,
  each bin, including its pangenome sequence, is printed on its own line.
  Second, for each path in the graph, its traversed bins are printed
  together with their metainformation: **bin** (bin identifier),
  **mean.cov** (mean coverage of the path in this bin), **mean.inv**
  (mean inversion rate of this path in this bin), **mean.pos** (mean
  nucleotide position of this path in this bin), and an array of ranges
  giving the nucleotide positions of the path in this bin. A range whose
  first and last nucleotide positions are swapped (the first is greater
  than the last) represents a reverse-complement orientation of that
  particular sequence.
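
Only individual lines are valid JSON, so the output has to be parsed line by line. The Python sketch below does this with the standard ``json`` module. It flags reverse-complement ranges by comparing a range's first and last positions. The sample lines and the ``bin`` key are illustrative assumptions, not a specification of odgi's JSON schema.

```python
import json


def read_json_lines(text: str) -> list:
    # Parse each non-empty line on its own; the whole text is not valid JSON.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def is_reverse(nucl_range: list[int]) -> bool:
    # A range whose first position exceeds its last is reverse-complemented.
    first, last = nucl_range
    return first > last


sample = '{"bin": 1}\n{"bin": 2}\n'  # illustrative lines only
print([obj["bin"] for obj in read_json_lines(sample)])  # [1, 2]
print(is_reverse([100, 1]), is_reverse([1, 100]))       # True False
```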

| **-s, --no-seqs**
| If [**-j, --json**] is set, no nucleotide sequences are printed to
  stdout, which saves disk space.

| **-g, --no-gap-links**
| Links are divided into two classes:

1. Links that help to follow complex variations. They must be drawn;
   otherwise, the sequence of a path could not be followed.
2. Links that help to follow simple variations. These links are called
   **gap-links**. Such links, which solely connect a path from left to
   right, may not be relevant for understanding a path's traversal
   through the bins. Therefore, when this option is set, gap-links are
   left out, which saves disk space.
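
The split between the two link classes can be sketched as a filter over (start bin, end bin) pairs. The rule used here treats a link that moves strictly left to right (end bin greater than start bin) as a gap-link. That is one possible reading of the description above and not necessarily the criterion odgi applies. ``split_links`` is a hypothetical helper.

```python
def split_links(links: list[tuple[int, int]]):
    """Partition (start_bin, end_bin) links into kept links and gap-links.

    Hypothetical rule: a link that only moves left to right is a gap-link;
    backward jumps and self-links are kept as complex-variation links.
    """
    kept, gap_links = [], []
    for start, end in links:
        if end > start:
            gap_links.append((start, end))
        else:
            kept.append((start, end))
    return kept, gap_links


kept, gaps = split_links([(1, 2), (2, 5), (5, 3)])
print(kept)  # [(5, 3)]
print(gaps)  # [(1, 2), (2, 5)]
```

Under this reading, setting [**-g, --no-gap-links**] would write only the ``kept`` links.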

HaploBlocker Options
--------------------

| **-b, --haplo-blocker**
| Write a TSV to stdout formatted in a way ready for HaploBlocker: each
  row corresponds to a node, each column corresponds to a path, and each
  value is the coverage of a specific node by a specific path.

| **-p[N], --haplo-blocker-min-paths[N]**
| Specify the minimum number of paths that need to be present in the bin
  to actually report that bin. The default value is 1.

| **-c[N], --haplo-blocker-min-coverage[N]**
| Specify the minimum coverage a path needs to have in a bin to actually
  report that bin. The default value is 1.
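
The two thresholds can be read as a simple filter. The Python sketch below assumes that a path counts as present in a bin when its coverage reaches the minimum coverage. It also assumes that the bin is reported when at least the minimum number of paths are present. How odgi combines the two options is not specified above, so treat this as an assumption. ``report_bin`` is a hypothetical helper whose defaults mirror the documented defaults of 1.

```python
def report_bin(coverages: dict[str, float], min_paths: int = 1,
               min_coverage: float = 1) -> bool:
    # Count the paths whose coverage in this bin reaches min_coverage.
    present = sum(1 for cov in coverages.values() if cov >= min_coverage)
    # Report the bin only if enough paths are present.
    return present >= min_paths


print(report_bin({"a": 2, "b": 0}))               # True
print(report_bin({"a": 2, "b": 0}, min_paths=2))  # False
print(report_bin({"a": 0.5}))                     # False
```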

Program Information
-------------------

| **-h, --help**
| Print a help message for **odgi bin**.

| **-P, --progress**
| Write the current progress to stderr.

..

EXIT STATUS
===========

| **0**
| Success.

| **1**
| Failure (syntax or usage error; parameter error; file processing
  failure; unexpected error).

BUGS
====

Refer to the **odgi** issue tracker at
https://github.com/pangenome/odgi/issues.