odgi

dynamic succinct variation graph tool

SYNOPSIS

odgi bin -i graph.og -j -w 100 -s -g

odgi break -i graph.og -o graph.broken.og -s 100 -d

odgi build -g graph.gfa -o graph.og

odgi chop -i graph.og -o graph.chopped.og -c 1000

odgi cover -i graph.og -o graph.paths.og

odgi degree -i graph.og -S

odgi depth -i graph.og

odgi draw -i graph.og -c coords.lay -p .png -x 1920 -y 1080 -R -t 28

odgi explode -i graph.og -p prefix

odgi extract -i graph.og -p prefix -r path_name:0-17

odgi flatten -i graph.og -f FASTA.fa -b BED.tsv

odgi groom -i graph.og -o graph.groomed.og

odgi kmers -i graph.og -c -k 23 -e 34 -D 50

odgi layout -i graph.og -o graph.og.lay

odgi matrix -i graph.og -e -d

odgi normalize -i graph.og -o graph.normalized.og -I 100 -d

odgi overlap -i graph.og -r path_name

odgi panpos -i graph.og -p Chr1 -n 4

odgi pathindex -i graph.og -o graph.xp

odgi paths -i graph.og -f

odgi position -i target_graph.og -g

odgi prune -i graph.og -o graph.pruned.og -c 3 -C 345 -T

odgi server -i graph.og -p 4000 -ip 192.168.8.9

odgi sort -i graph.og -o graph.sorted.og -p bSnSnS

odgi squeeze -f input_graphs.txt -o graphs.og

odgi stats -i graph.og -S

odgi test

odgi unchop -i graph.og -o graph.unchopped.og

odgi unitig -i graph.og -f -t 1324 -l 120

odgi validate -i graph.og

odgi version

odgi view -i graph.og -g

odgi viz -i graph.og -o graph.og.png -x 1920 -y 1080 -R -t 28

DESCRIPTION

odgi, the Optimized Dynamic (genome) Graph Interface, links a thrifty dynamic in-memory variation graph data model to a set of algorithms designed for scalable sorting, pruning, transformation, and visualization of very large genome graphs. odgi includes Python bindings that can be used to interface directly with its data model. This manual provides detailed information about odgi's features and subcommands, including examples.

COMMANDS

Each command has its own man page, which can be viewed using, e.g., man odgi_build. Below is a brief summary of the syntax and a description of each subcommand.

odgi bin [-i, --idx=FILE] [OPTION]…
The odgi bin(1) command bins a given variation graph. The pangenome sequence, the one-time traversal of all nodes from smallest to largest node identifier, can be summed up into bins of a specified size. For each bin, the path metainformation is summarized. This enables a summarized view of gigabase-scale graphs. Each step of a path is a bin and is connected to its next bin via a link. A link has a start bin identifier and an end bin identifier.
The concept of odgi bin is also applied in odgi viz. The Pantograph project demonstrates how the odgi bin JSON output can be used for an interactive visualization. By default, odgi bin writes the bins to stdout in a tab-delimited format: path.name, path.prefix, path.suffix, bin (bin identifier), mean.cov (mean coverage of the path in this bin), mean.inv (mean inversion rate of this path in this bin), mean.pos (mean nucleotide position of this path in this bin), first.nucl (first nucleotide position of this path in this bin), last.nucl (last nucleotide position of this path in this bin). These nucleotide ranges might span positions that are not present in the bin. Example: A range of 1-100 means that the first nucleotide has position 1 and the last has position 100, but nucleotide 45 could be located in another bin. For an exact positional output, please specify [-j, --json].
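The binning idea described above can be illustrated with a small sketch. It is not odgi's implementation: the inputs (a dict of node lengths and a path given as a list of node identifiers), the function name bin_path, and the way coverage is averaged are assumptions made for illustration. It lays the nodes out by ascending identifier, assigns every base to a fixed-width bin, and records per bin the coverage and the first and last path position.

```python
from collections import defaultdict


def bin_path(node_lengths, path, bin_width):
    """Summarize one path per pangenome bin (illustrative sketch only).

    node_lengths: dict node_id -> sequence length (hypothetical input)
    path: list of node ids in the order the path visits them
    Returns dict bin_id -> (mean_cov, first_nucl, last_nucl), 1-based.
    """
    # Pangenome offset of every node: nodes are laid out by ascending id.
    offset = {}
    total = 0
    for node_id in sorted(node_lengths):
        offset[node_id] = total
        total += node_lengths[node_id]

    # Collect the path positions that land in each bin.
    hits = defaultdict(list)
    path_pos = 0
    for node_id in path:
        for k in range(node_lengths[node_id]):
            path_pos += 1
            bin_id = (offset[node_id] + k) // bin_width + 1
            hits[bin_id].append(path_pos)

    summary = {}
    for bin_id in sorted(hits):
        positions = hits[bin_id]
        # The last bin may be narrower than bin_width.
        width = min(bin_width, total - (bin_id - 1) * bin_width)
        summary[bin_id] = (len(positions) / width, min(positions), max(positions))
    return summary


# A path that visits node 1 twice: bin 1 reports the range 1-12,
# although path positions 5-8 are located in bin 2.
print(bin_path({1: 4, 2: 4}, [1, 2, 1], bin_width=4))
```

The example shows why a first.nucl/last.nucl range can span positions that are not present in the bin.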
odgi break [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi break(1) command finds cycles in a graph via breadth-first search (BFS) and breaks them, also dropping the graph’s paths.
odgi build [-g, --gfa=FILE] [-o, --out=FILE] [OPTION]…
The odgi build(1) command constructs a succinct variation graph from a GFA. Currently, only GFA1 is supported. For details of the format please see https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md.
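As a rough illustration of the GFA1 input that odgi build consumes, the following sketch reads the three record types that matter for a variation graph (S segments, L links, P paths) and reports simple counts. The function name summarize_gfa1 and the returned keys are hypothetical, and the parser ignores optional tags and every other record type.

```python
def summarize_gfa1(text):
    """Count segments, links, and paths in GFA1 text (minimal sketch)."""
    nodes = {}   # segment name -> sequence
    edges = []   # (from, from_orient, to, to_orient)
    paths = {}   # path name -> list of oriented segment names
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            nodes[fields[1]] = fields[2]
        elif fields[0] == "L":
            edges.append((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            paths[fields[1]] = fields[2].split(",")
        # Header (H) and all other record types are skipped in this sketch.
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "paths": len(paths),
        "length": sum(len(seq) for seq in nodes.values()),
    }


example = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGG\nL\t1\t+\t2\t+\t0M\nP\tx\t1+,2+\t*\n"
print(summarize_gfa1(example))
```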
odgi chop [-i, --idx=FILE] [-o, --out=FILE] [-c, --chop-to=N] [OPTION]…
The odgi chop(1) command chops long nodes into short ones while preserving the graph topology.
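Conceptually, chopping replaces one long node by a chain of shorter nodes whose concatenation spells the same sequence. The sketch below shows only that sequence-level step; the function name chop_sequence is hypothetical, and rewiring edges and paths, which odgi must also do to preserve topology, is omitted.

```python
def chop_sequence(seq, chop_to):
    """Split a node sequence into consecutive pieces of at most chop_to bases."""
    if chop_to < 1:
        raise ValueError("chop_to must be at least 1")
    return [seq[i:i + chop_to] for i in range(0, len(seq), chop_to)]


pieces = chop_sequence("ACGTACGTAC", 4)
print(pieces)
# Joining the pieces in order restores the original node sequence.
print("".join(pieces) == "ACGTACGTAC")
```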
odgi cover [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi cover(1) command finds a path cover of a variation graph, with a specified number of paths per component.

odgi degree [-i, --idx=FILE] [OPTION]…
The odgi degree(1) command describes the graph in terms of node degree. For the input graph, it shows the node.count, edge.count, avg.degree, min.degree, and max.degree.
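The five reported values can be reproduced on a toy graph as follows. This is a sketch under an assumption: each edge is counted once at each of its two endpoints. Whether odgi counts degree in exactly this way is not stated here, and the function name degree_summary is hypothetical.

```python
def degree_summary(nodes, edges):
    """Compute degree statistics for an undirected view of the graph."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        # Assumption: an edge contributes one to each endpoint.
        degree[a] += 1
        degree[b] += 1
    values = list(degree.values())
    return {
        "node.count": len(nodes),
        "edge.count": len(edges),
        "avg.degree": sum(values) / len(values),
        "min.degree": min(values),
        "max.degree": max(values),
    }


print(degree_summary([1, 2, 3], [(1, 2), (1, 3)]))
```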

odgi depth [-i, --input=FILE] [OPTION]…
The odgi depth(1) command finds the depth of the graph as defined by query criteria.

odgi draw [-i, --idx=FILE] [-c, --coords-in=FILE] [-p, --png=FILE] [OPTION]…
The odgi draw(1) command draws previously-determined 2D layouts of the graph with diverse annotations.

odgi explode [-i, --idx=FILE] [-p, --prefix=STRING] [OPTION]…
The odgi explode(1) command breaks a graph into connected components, writing each component in its own file.
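Finding connected components is a standard graph traversal. A minimal sketch, assuming the graph is given as a node list and an edge list (both hypothetical inputs), could look like this. It treats edges as undirected and does not write any files.

```python
def connected_components(nodes, edges):
    """Return the connected components as sorted lists of node ids."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(sorted(component))
    return components


print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```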

odgi extract [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi extract(1) command extracts parts of the graph as defined by query criteria.

odgi flatten [-i, --idx=FILE] [OPTION]…
The odgi flatten(1) command projects the graph sequence and paths into FASTA and BED.
odgi kmers [-i, --idx=FILE] [-c, --stdout] [OPTION]…
Given a kmer length, the odgi kmers(1) command can emit all kmers. The output can be refined by setting the maximum number of furcations at edges or by not considering nodes above a given node degree limit.
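To make the notion of kmers in a graph concrete, the sketch below enumerates every kmer that can be spelled by starting at any base and following edges forward. It is a simplification and not odgi's algorithm: it considers the forward strand only, assumes non-empty node sequences, and does not implement the furcation or node degree limits mentioned above. The function name graph_kmers and its inputs are hypothetical.

```python
def graph_kmers(seqs, succ, k):
    """Collect all forward-strand kmers spelled by walks through the graph.

    seqs: dict node_id -> non-empty sequence
    succ: dict node_id -> list of successor node ids
    """
    kmers = set()

    def extend(node, pos, prefix):
        # Take as many bases from this node as are still needed.
        prefix += seqs[node][pos:pos + (k - len(prefix))]
        if len(prefix) == k:
            kmers.add(prefix)
            return
        # The node ended before the kmer was complete: branch at every edge.
        for nxt in succ.get(node, []):
            extend(nxt, 0, prefix)

    for node in seqs:
        for pos in range(len(seqs[node])):
            extend(node, pos, "")
    return kmers


seqs = {1: "AC", 2: "G", 3: "T", 4: "CA"}
succ = {1: [2, 3], 2: [4], 3: [4]}
print(sorted(graph_kmers(seqs, succ, 3)))
```

Each additional furcation multiplies the number of walks, which is why limiting furcations keeps the output manageable on complex graphs.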
odgi layout [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi layout(1) command computes 2D layouts of the graph using stochastic gradient descent (SGD). The input graph must be sorted and id-compacted. The algorithm itself is described in Graph Drawing by Stochastic Gradient Descent. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies SGD to move a single pair of nodes at a time.
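The per-pair update at the heart of SGD-based graph drawing can be sketched as follows. The update rule follows the general scheme of the Graph Drawing by Stochastic Gradient Descent approach (weight equal to the inverse squared target distance, step size capped at 1); the function name sgd_pair_update and the parameter names are hypothetical, and odgi's multi-threaded implementation differs in many details.

```python
import math


def sgd_pair_update(xi, xj, target, eta):
    """Move two 2D points toward their target distance.

    xi, xj: (x, y) tuples; target: desired distance; eta: learning rate.
    Returns the updated (xi, xj).
    """
    dx, dy = xi[0] - xj[0], xi[1] - xj[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return xi, xj  # coincident points have no direction to move along
    weight = target ** -2
    mu = min(weight * eta, 1.0)  # capped step size
    # Each point absorbs half of the distance error along the connecting line.
    r = mu * (dist - target) / 2.0 / dist
    rx, ry = r * dx, r * dy
    return (xi[0] - rx, xi[1] - ry), (xj[0] + rx, xj[1] + ry)


# With a large learning rate the step is capped, and the pair lands
# exactly at the target distance of 2.
print(sgd_pair_update((0, 0), (4, 0), target=2, eta=100))
```

Repeating this update over many randomly chosen pairs with a decreasing learning rate lowers the overall stress of the layout.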
odgi matrix [-i, --idx=FILE] [OPTION]…
The odgi matrix(1) command generates a sparse matrix format out of the graph topology of a given variation graph.
odgi normalize [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi normalize(1) command unchops a given variation graph (see odgi unchop) and simplifies redundant furcations.

odgi overlap [-i, --input=FILE] [OPTION]…
The odgi overlap(1) command finds the paths touched by the input paths.

odgi panpos [-i, --idx=FILE] [-p, --path=STRING] [-n, --nuc-pos=N] [OPTION]…
The odgi panpos(1) command gives a pangenome position for a given path and nucleotide position. It requires a path index, which can be created with odgi pathindex. Going from path:position to pangenome:position is important when navigating large graphs interactively, as in the Pantograph project. All input and output positions are 1-based.
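The translation from a path position to a pangenome position can be illustrated without a path index. In the sketch below, the pangenome sequence is taken to be the nodes laid out by ascending identifier, every path step is assumed to traverse its node in forward orientation, and the function name pangenome_position and its inputs are hypothetical. Positions are 1-based on both sides.

```python
def pangenome_position(node_lengths, path, nuc_pos):
    """Translate a 1-based path position into a 1-based pangenome position."""
    if nuc_pos < 1:
        raise ValueError("positions are 1-based")

    # Pangenome offset of every node: nodes are laid out by ascending id.
    offset = {}
    total = 0
    for node_id in sorted(node_lengths):
        offset[node_id] = total
        total += node_lengths[node_id]

    walked = 0  # number of path bases consumed so far
    for node_id in path:
        length = node_lengths[node_id]
        if nuc_pos <= walked + length:
            return offset[node_id] + (nuc_pos - walked)
        walked += length
    raise ValueError("position lies beyond the end of the path")


# Nodes 1, 2, 3 occupy pangenome positions 1-3, 4-5, and 6-9.
# The path skips node 2, so its 4th base is the first base of node 3.
print(pangenome_position({1: 3, 2: 2, 3: 4}, [1, 3], 4))
```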
odgi pathindex [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi pathindex(1) command generates a path index of a graph. It uses succinct data structures to encode the index. The path index represents a subset of the features of a fully realized xg index. With a path index, odgi panpos can translate path:position to pangenome:position, which is important when navigating large graphs interactively, as in the Pantograph project.
odgi paths [-i, --idx=FILE] [OPTION]…
The odgi paths(1) command allows the investigation of paths of a given variation graph. It can calculate overlap statistics of groupings of paths.

odgi position [-i, --target=FILE] [OPTION]…
The odgi position(1) command positions parts of the graph as defined by query criteria.

odgi prune [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi prune(1) command can remove complex parts of a graph. One can drop paths, nodes by a certain kind of edge coverage, edges and graph tips. Specifying a kmer length and a maximum number of furcations, the graph can be broken at edges not fitting into these conditions.
odgi server [-i, --idx=FILE] [-p, --port=N] [OPTION]…
The odgi server(1) command starts an HTTP server with a given path index as input. The idea is that we can go from path:position to pangenome:position via GET requests to the HTTP server. The server headers do not block cross-origin requests. Example GET request: http://localhost:3000/path_name/nucleotide_position.
The required path index can be created with odgi pathindex. Going from path:position to pangenome:position is important when navigating large graphs interactively, as in the Pantograph project. All input and output positions are 1-based. If no IP address is specified, the server will run on localhost.
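A client only needs to assemble a URL of the form shown in the example GET request. The helper below is a sketch: the function name panpos_url is hypothetical, and the format of the server's response is not described here, so the sketch stops at building the request URL.

```python
from urllib.parse import quote


def panpos_url(host, port, path_name, nuc_pos):
    """Build the GET URL for a path name and a 1-based nucleotide position."""
    if nuc_pos < 1:
        raise ValueError("positions are 1-based")
    # Percent-encode the path name so special characters survive in the URL.
    return f"http://{host}:{port}/{quote(path_name, safe='')}/{nuc_pos}"


print(panpos_url("localhost", 3000, "Chr1", 4))
```

The resulting URL could then be fetched with any HTTP client, for example urllib.request.urlopen, once a server is running.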
odgi sort [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi sort(1) command sorts a succinct variation graph. The command offers a diverse palette of sorting algorithms to determine the node order:
  • A topological sort: A graph can be sorted via breadth-first search (BFS) or depth-first search (DFS). Optionally, a chunk size specifies how much of the graph to grab at once in each topological sorting phase. The sorting algorithm will continue the sort from the next node in the prior graph order that has not yet been sorted. The cycle breaking algorithm applies a DFS sort until a cycle is found. We break and start a new DFS sort phase from where we stopped.

  • A random sort: The graph is randomly sorted. The node order is randomly shuffled using pseudo-random numbers generated by a Mersenne Twister.

  • A sparse matrix Mondriaan sort: We can partition a hypergraph with integer weights and uniform hyperedge costs using the Mondriaan partitioner.

  • A 1D linear SGD sort: odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the Graph Drawing by Stochastic Gradient Descent algorithm. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time.

  • An Eades algorithmic sort: Use Peter Eades’ heuristic for graph drawing.

Sorting the paths in a graph may refine the sorting process. For the users’ convenience, it is possible to specify a whole pipeline of sorts within one parameter.
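As a point of reference for the topological sort listed above, the following sketch implements the textbook BFS-based variant (Kahn's algorithm) on an acyclic graph. It is not odgi's implementation: chunking, cycle breaking, and continuation from the prior graph order are omitted, and the function name topological_order is hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes in a topological order (Kahn's algorithm)."""
    succ = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indegree[b] += 1

    # Start from all nodes without incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# A simple bubble: node 1 branches into 2 and 3, which rejoin at 4.
print(topological_order([3, 1, 2, 4], [(1, 2), (1, 3), (2, 4), (3, 4)]))
```

The plain algorithm fails on cyclic graphs, which is why a cycle breaking strategy such as the one described above is needed for general variation graphs.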

odgi squeeze [-f, --input-graphs=FILE] [-o, --out=FILE] [OPTION]…
The odgi squeeze(1) command squeezes multiple graphs into the same file.

odgi stats [-i, --idx=FILE] [OPTION]…
The odgi stats(1) command produces statistics of a variation graph. Among other metrics, it can calculate the #nodes, #edges, #paths and the total nucleotide length of the graph. Various histogram summary options complement the tool. If [-B, --bed-multicov=BED] is set, the metrics will be produced for the intervals specified in the BED.
odgi test [<TEST NAME|PATTERN|TAGS> …] [OPTION]…
The odgi test(1) command starts all unit tests that are implemented in odgi. For targeted testing, a subset of tests can be selected. odgi test(1) depends on Catch2. In the default setting, all results are printed to stdout.
odgi unchop [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi unchop(1) command merges each unitig into a single node.
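Merging unitigs means concatenating every chain of nodes in which each node has exactly one successor and that successor has exactly one predecessor. The sketch below shows this on a small acyclic graph. It considers the forward strand only, ignores paths, and drops chains that form a closed cycle; the function name unchop and its inputs are assumptions for illustration, not odgi's data model.

```python
def unchop(seqs, edges):
    """Merge linear chains of nodes and return the merged sequences.

    seqs: dict node_id -> sequence; edges: list of (from, to) pairs.
    """
    succ = {n: [] for n in seqs}
    pred = {n: [] for n in seqs}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    def mergeable(a, b):
        # a and b can be fused if the edge a->b is the only way out of a
        # and the only way into b.
        return len(succ[a]) == 1 and len(pred[b]) == 1 and a != b

    merged = []
    for n in seqs:
        # Skip nodes that will be absorbed into their predecessor's chain.
        if len(pred[n]) == 1 and mergeable(pred[n][0], n):
            continue
        chain = [n]
        while len(succ[chain[-1]]) == 1 and mergeable(chain[-1], succ[chain[-1]][0]):
            chain.append(succ[chain[-1]][0])
        merged.append("".join(seqs[c] for c in chain))
    return merged


seqs = {1: "AC", 2: "GT", 3: "A", 4: "C", 5: "TT"}
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]
print(unchop(seqs, edges))
```

Nodes 1 and 2 form a unitig and are merged, while the two branches of the bubble and the node after it stay separate.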
odgi unitig [-i, --idx=FILE] [OPTION]…
The odgi unitig(1) command can print all unitigs of a given odgi graph to standard output in FASTA format. Unitigs can also be emitted in a fixed sequence quality FASTQ format. Various parameters can refine the unitigs to print.

odgi validate [-i, --input=FILE] [OPTION]…
The odgi validate(1) command validates the graph (currently, it checks if the paths are consistent with the graph topology).

odgi version [OPTION]…
The odgi version(1) command prints the current git version with tags and codename to stdout (like v-44-g89d022b “back to old ABI”). Optionally, only the release, version or codename can be printed.
odgi view [-i, --idx=FILE] [OPTION]…
The odgi view(1) command can convert a graph in odgi format to GFAv1. It can reveal a graph’s internal structures for e.g. debugging processes.
odgi viz [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi viz(1) command can produce a linear, static visualization of an odgi variation graph. It aggregates the pangenome into bins and directly renders a raster image. The binning level depends on the target width of the PNG to emit. It can be used to produce visualizations of gigabase-scale pangenomes. For more information about the binning process, please refer to odgi bin. If reverse coloring is selected, only the bins with a reverse rate of at least 0.5 are colored. Currently, no parameter is available to color bins according to their sequence coverage.

BUGS

Refer to the odgi issue tracker at https://github.com/pangenome/odgi/issues.

AUTHORS

Erik Garrison from the University of California Santa Cruz wrote the whole odgi tool.

Andrea Guarracino from the University of Rome Tor Vergata wrote odgi viz, odgi extract, odgi cover, odgi explode, odgi groom, odgi squeeze, odgi depth, odgi layout, odgi sort, odgi stats, odgi overlap, odgi validate, odgi unchop, odgi test, and this documentation.

Simon Heumos from the Quantitative Biology Center Tübingen wrote odgi bin, odgi layout, odgi sort, odgi pathindex, odgi panpos, odgi server, odgi test, odgi version, and this documentation.

RESOURCES

Project web site: https://github.com/pangenome/odgi

Git source repository on GitHub: https://github.com/pangenome/odgi

GitHub organization: https://github.com/pangenome

Discussion list / forum: https://github.com/pangenome/odgi/issues

COPYING

The MIT License (MIT)

Copyright (c) 2019-2021 Erik Garrison

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.