.. _odgi bin:

#########
odgi bin
#########

binning of pangenome sequence and path information in the graph

SYNOPSIS
========

**odgi bin** [**-i, --idx**\ =\ *FILE*] [*OPTION*]…

DESCRIPTION
===========

| The odgi bin(1) command bins a given variation graph. The pangenome sequence, the one-time traversal of all nodes from smallest to largest node identifier, can be summed up into bins of a specified size. For each bin, the path metainformation is summarized. This enables a summarized view of gigabase-scale graphs. Each step of a path is a bin and is connected to its next bin via a link. A link has a start bin identifier and an end bin identifier.
| The concept of odgi bin is also applied in :ref:`odgi viz`. A demonstration of how the odgi bin JSON output can be used for an interactive visualization is realized in the Pantograph project. By default, odgi bin writes the bins to stdout in a tab-delimited format: **path.name**, **path.prefix**, **path.suffix**, **bin** (bin identifier), **mean.cov** (mean coverage of the path in this bin), **mean.inv** (mean inversion rate of this path in this bin), **mean.pos** (mean nucleotide position of this path in this bin), **first.nucl** (first nucleotide position of this path in this bin), **last.nucl** (last nucleotide position of this path in this bin). These nucleotide ranges might span positions that are not present in the bin. Example: a range of 1-100 means that the first nucleotide has position 1 and the last has position 100, but nucleotide 45 could be located in another bin. For an exact positional output, please specify [**-j, --json**].
| When running odgi bin in HaploBlocker mode, only the arguments [**-b, --haplo-blocker**], [**-p[N], --haplo-blocker-min-paths[N]**], and [**-c[N], --haplo-blocker-min-coverage[N]**] are required. A TSV is printed to stdout: each row corresponds to a node, each column corresponds to a path, and each value is the coverage of a specific node of a specific path.

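
The default tab-delimited output described above can be post-processed with a few lines of Python. The following is a minimal sketch and not part of odgi: it assumes that the output starts with a header row carrying the column names listed above, and the function name ``read_bins`` as well as the sample rows are hypothetical.

.. code-block:: python

    import csv
    import io

    # Column names as listed in the description above.
    COLUMNS = ["path.name", "path.prefix", "path.suffix", "bin",
               "mean.cov", "mean.inv", "mean.pos", "first.nucl", "last.nucl"]


    def read_bins(handle):
        """Yield one dict per data row of the tab-delimited odgi bin output.

        Assumption: the first line is a header row with the names in COLUMNS.
        """
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield {
                "path.name": row["path.name"],
                "bin": int(row["bin"]),
                "mean.cov": float(row["mean.cov"]),
                "mean.inv": float(row["mean.inv"]),
                "mean.pos": float(row["mean.pos"]),
                "first.nucl": int(row["first.nucl"]),
                "last.nucl": int(row["last.nucl"]),
            }


    # Hypothetical sample output: one path crossing two bins.
    SAMPLE = (
        "\t".join(COLUMNS) + "\n"
        "pathA\t\t\t1\t1.0\t0.0\t50.5\t1\t100\n"
        "pathA\t\t\t2\t2.0\t0.5\t150.0\t101\t200\n"
    )

    rows = list(read_bins(io.StringIO(SAMPLE)))

    # Group the traversed bin identifiers by path name.
    bins_by_path = {}
    for r in rows:
        bins_by_path.setdefault(r["path.name"], []).append(r["bin"])

Note that **first.nucl** and **last.nucl** only bound the range of a path in a bin, so a consumer of this format should not treat every position in between as belonging to that bin.
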

OPTIONS
=======

Graph Files IO
--------------

| **-i, --idx**\ =\ *FILE*
| File containing the succinct variation graph to bin. The file name usually ends with *.og*.

FASTA Options
-------------

| **-f, --fasta**\ =\ *FILE*
| Write the pangenome sequence to *FILE* in FASTA format.

Bin Options
-----------

| **-n, --number-bins**\ =\ *N*
| The number of bins the pangenome sequence should be chopped up into.

| **-w, --bin-width**\ =\ *N*
| The bin width specifies the size of each bin.

| **-D, --path-delim**\ =\ *STRING*
| Annotate rows by prefix and suffix of this delimiter.

| **-a, --aggregate-delim**
| Aggregate on path prefix delimiter. Argument depends on [**-D, --path-delim**\ =\ *STRING*].

| **-j, --json**
| Print bins and links to stdout in pseudo JSON format. Each line is a valid JSON object, but the whole file is not valid JSON! First, each bin including its pangenome sequence is printed to stdout, one per line. Second, for each path in the graph, its traversed bins are printed including metainformation: **bin** (bin identifier), **mean.cov** (mean coverage of the path in this bin), **mean.inv** (mean inversion rate of this path in this bin), **mean.pos** (mean nucleotide position of this path in this bin), and an array of ranges determining the nucleotide position of the path in this bin. Switching first and last nucleotide in a range represents a reverse complement orientation of that particular sequence.

| **-s, --no-seqs**
| If [**-j, --json**] is set, no nucleotide sequences will be printed to stdout in order to save disk space.

| **-g, --no-gap-links**
| We divide links into two classes:

1. The links which help to follow complex variations. They need to be drawn, otherwise one could not follow the sequence of a path.
2. The links which help to follow simple variations. These links are called **gap-links**. Such links, which solely connect a path from left to right, may not be relevant to understanding a path's traversal through the bins.

Therefore, when this option is set, the gap-links are left out, which saves disk space.

HaploBlocker Options
--------------------

| **-b, --haplo-blocker**
| Write a TSV to stdout formatted in a way ready for HaploBlocker: each row corresponds to a node, each column corresponds to a path, and each value is the coverage of a specific node of a specific path.

| **-p[N], --haplo-blocker-min-paths[N]**
| Specify the minimum number of paths that need to be present in the bin to actually report that bin. The default value is 1.

| **-c[N], --haplo-blocker-min-coverage[N]**
| Specify the minimum coverage a path needs to have in a bin to actually report that bin. The default value is 1.

Program Information
-------------------

| **-h, --help**
| Print a help message for **odgi bin**.

| **-P, --progress**
| Write the current progress to stderr.

..
    EXIT STATUS
    ===========

    | **0**
    | Success.

    | **1**
    | Failure (syntax or usage error; parameter error; file processing failure; unexpected error).

    BUGS
    ====

    Refer to the **odgi** issue tracker at https://github.com/pangenome/odgi/issues.

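
Because the output of [**-j, --json**] holds one JSON object per line rather than a single JSON document, it has to be parsed line by line. The following is a minimal sketch and not part of odgi: the helper names ``parse_json_lines`` and ``is_reverse`` are hypothetical, and the key names in the sample are purely illustrative, since the actual keys written by odgi bin are not specified in this document.

.. code-block:: python

    import json


    def parse_json_lines(text):
        """Parse pseudo JSON: one independent JSON object per non-empty line."""
        return [json.loads(line) for line in text.splitlines() if line.strip()]


    def is_reverse(first, last):
        """A range whose first position exceeds its last position is read as
        reverse complement, following the description of the ranges array."""
        return first > last


    # Hypothetical sample; the key names are illustrative assumptions only.
    SAMPLE = (
        '{"bin": 1, "ranges": [[1, 100]]}\n'
        '{"bin": 2, "ranges": [[200, 101]]}\n'
    )

    records = parse_json_lines(SAMPLE)

    # One list of orientation flags per record, one flag per range.
    orientations = [
        [is_reverse(first, last) for first, last in rec["ranges"]]
        for rec in records
    ]

Passing the whole output to a single ``json.loads`` call would fail, which is why the sketch splits the text into lines first.
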