.. _odgi: ######### odgi ######### dynamic succinct variation graph tool SYNOPSIS ======== :ref:`odgi bin` -i graph.og -j -w 100 -s -g :ref:`odgi break` -i graph.og -o graph.broken.og -s 100 -d :ref:`odgi build` -g graph.gfa -o graph.og :ref:`odgi chop` -i graph.og -o graph.choped.og -c 1000 :ref:`odgi cover` -i graph.og -o graph.paths.og :ref:`odgi degree` -i graph.og -S :ref:`odgi depth` -i graph.og :ref:`odgi draw` -i graph.og -c coords.lay -p .png -x 1920 -y 1080 -R -t 28 :ref:`odgi explode` -i graph.og -p prefix :ref:`odgi extract` -i graph.og -p prefix -r path_name:0-17 :ref:`odgi flatten` -i graph.og -f FASTA.fa -b BED.tsv :ref:`odgi groom` -i graph.og -o graph.groomed.og :ref:`odgi kmers` -i graph.og -c -k 23 -e 34 -D 50 :ref:`odgi layout` -i graph.og -o graph.og.lay :ref:`odgi matrix` -i graph.og -e -d :ref:`odgi normalize` -i graph.og -o graph.normalized.og -I 100 -d :ref:`odgi overlap` -i graph.og -r path_name :ref:`odgi panpos` -i graph.og -p Chr1 -n 4 :ref:`odgi pathindex` -i graph.og -o graph.xp :ref:`odgi paths` -i graph.og -f :ref:`odgi position` -i target_graph.og -g :ref:`odgi prune` -i graph.og -o graph.pruned.og -c 3 -C 345 -T :ref:`odgi server` -i graph.og -p 4000 -ip 192.168.8.9 :ref:`odgi sort` -i graph.og -o graph.sorted.og -p bSnSnS :ref:`odgi squeeze` -f input_graphs.txt -o graphs.og :ref:`odgi stats` -i graph.og -S :ref:`odgi test` :ref:`odgi unchop` -i graph.og -o graph.unchopped.og :ref:`odgi unitig` -i graph.og -f -t 1324 -l 120 :ref:`odgi validate` -i graph.og :ref:`odgi version` :ref:`odgi view` -i graph.og -g :ref:`odgi viz` -i graph.og -o graph.og.png -x 1920 -y 1080 -R -t 28 DESCRIPTION =========== **odgi**, the **Optimized Dynamic (genome) Graph Interface**, links a thrifty dynamic in-memory variation graph data model to a set of algorithms designed for scalable sorting, pruning, transformation, and visualization of very large `genome graphs `__. **odgi** includes :ref:`python bindings` that can be used to :ref:`directly interface with its data model `. This **odgi** manual provides detailed information about its features and subcommands, including examples. COMMANDS ======== Each command has its own man page which can be viewed using e.g. **man odgi_build**. Below we have a brief summary of syntax and subcommand description. | **odgi bin** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi bin(1) command bins a given variation graph. The pangenome sequence, the one-time traversal of all nodes from smallest to largest node identifier, can be summed up into bins of a specified size. For each bin, the path metainformation is summarized. This enables a summarized view of gigabase scale graphs. Each step of a path is a bin and connected to its next bin via a link. A link has a start bin identifier and an end bin identifier. | The concept of odgi bin is also applied in :ref:`odgi viz`. A demonstration of how the odgi bin JSON output can be used for an interactive visualization is realized in the `Pantograph `__ project. Per default, odgi bin writes the bins to stdout in a tab-delimited format: **path.name**, **path.prefix**, **path.suffix**, **bin** (bin identifier), **mean.cov** (mean coverage of the path in this bin), **mean.inv** (mean inversion rate of this path in this bin), **mean.pos** (mean nucleotide position of this path in this bin), **first.nucl** (first nucleotide position of this path in this bin), **last.nucl** (last nucleotide position of this path in this bin). These nucleotide ranges might span positions that are not present in the bin. Example: A range of 1-100 means that the first nucleotide has position 1 and the last has position 100, but nucleotide 45 could be located in another bin. For an exact positional output, please specify [**-j, –json**]. | **odgi break** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi break(1) command finds cycles in a graph via `breadth-first search (BFS) `__ and breaks them, also dropping the graph’s paths. | **odgi build** [**-g, –gfa**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi build(1) command constructs a succinct variation graph from a GFA. Currently, only GFA1 is supported. For details of the format please see https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md. | **odgi chop** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [**-c, –chop-to**\ =\ *N*] [*OPTION*]… | The odgi chop(1) command chops long nodes into short ones while preserving the graph topology. | **odgi cover** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi cover(1) command finds a path cover of a variation graph, with a specified number of paths per component. **odgi degree** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… The odgi degree(1) command describes the graph in terms of node degree. For the input graph, it shows the node.count, edge.count, avg.degree, min.degree, and max.degree. **odgi depth** [**-i, –input**\ =\ *FILE*] [*OPTION*]… The odgi depth(1) command finds the depth of graph as defined by query criteria. **odgi draw** [**-i, –idx**\ =\ *FILE*] [**-c, –coords-in**\ =\ *FILE*] [**-p, –png**\ =\ *FILE*] [*OPTION*]… The odgi draw(1) command draws previously-determined 2D layouts of the graph with diverse annotations. | **odgi explode** [**-i, –idx**\ =\ *FILE*] [**-p, –prefix**\ =\ *STRING*] [*OPTION*]… | The odgi explode(1) command breaks a graph into connected components, writing each component in its own file. **odgi extract** [**-f, –input-graphs**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… The odgi extract(1) command extracts parts of the graph as defined by query criteria. | **odgi flatten** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi flatten(1) command projects the graph sequence and paths into FASTA and BED. | **odgi kmers** [**-i, –idx**\ =\ *FILE*] [**-c, –stdout**] [*OPTION*]… | Given a kmer length, the odgi kmers(1) command can emit all kmers. The output can be refined by setting the maximum number of furcations at edges or by not considering nodes above a given node degree limit. | **odgi layout** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi layout(1) command computes 2D layouts of the graph using stochastic gradient descent (SGD). The input graph must be sorted and id-compacted. The algorithm itself is described in `Graph Drawing by Stochastic Gradient Descent `__. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies SGD to move a single pair of nodes at a time. | **odgi matrix** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi matrix(1) command generates a sparse matrix format out of the graph topology of a given variation graph. | **odgi normalize** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi normalize(1) command unchops :ref:`odgi unchop` a given variation graph and simplifies redundant furcations. **odgi overlap** [**-i, –input**\ =\ *FILE*] [*OPTION*]… The odgi overlap(1) command finds the paths touched by the input paths. | **odgi panpos** [**-i, –idx**\ =\ *FILE*] [**-p, –path**\ =\ *STRING*] [**-n, –nuc-pos**\ =\ *N*] [*OPTION*]… | The odgi panpos(1) command give a pangenome position for a given path and nucleotide position. It requires a path index, which can be created with :ref:`odgi pathindex`. Going from **path:position** → **pangenome:position** is important when navigating large graphs in an interactive manner like in the `Pantograph `__ project. All input and output positions are 1-based. | **odgi pathindex** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi pathindex(1) command generates a path index of a graph. It uses succinct data structures to encode the index. The path index represents a subset of the features of a fully realized `xg index `__. Having a path index, we can use :ref:`odgi panpos` to go from **path:position** → **pangenome:position** which is important when navigating large graphs in an interactive manner like in the `Pantograph `__ project. | **odgi paths** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi paths(1) command allows the investigation of paths of a given variation graph. It can calculate overlap statistics of groupings of paths. **odgi position** [**-i, –target**\ =\ *FILE*] [*OPTION*]… The odgi position(1) command position parts of the graph as defined by query criteria. | **odgi prune** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi prune(1) command can remove complex parts of a graph. One can drop paths, nodes by a certain kind of edge coverage, edges and graph tips. Specifying a kmer length and a maximum number of furcations, the graph can be broken at edges not fitting into these conditions. | **odgi server** [**-i, –idx**\ =\ *FILE*] [**-p, –port**\ =\ *N*] [*OPTION*]… | The odgi server(1) command starts an HTTP server with a given path index as input. The idea is that we can go from **path:position** → **pangenome:position** via GET requests to the HTTP server. The server headers do not block cross origin requests. Example GET request: *http://localost:3000/path_name/nucleotide_position*. | The required path index can be created with :ref:`odgi pathindex`. Going from **path:position** → **pangenome:position** is important when navigating large graphs in an interactive manner like in the `Pantograph `__ project. All input and output positions are 1-based. If no IP address is specified, the server will run on localhost. | **odgi sort** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi sort(1) command sorts a succinct variation graph. The command offers a diverse palette of sorting algorithms to determine the node order: - A topological sort: A graph can be sorted via `breadth-first search (BFS) `__ or `depth-first search (DFS) `__. Optionally, a chunk size specifies how much of the graph to grab at once in each topological sorting phase. The sorting algorithm will continue the sort from the next node in the prior graph order that has not been sorted, yet. The cycle breaking algorithm applies a DFS sort until a cycle is found. We break and start a new DFS sort phase from where we stopped. - A random sort: The graph is randomly sorted. The node order is randomly shuffled from `Mersenne Twister pseudo-random `__ generated numbers. - A sparse matrix mondriaan sort: We can partition a hypergraph with integer weights and uniform hyperedge costs using the `Mondriaan `__ partitioner. - A 1D linear SGD sort: Odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the `Graph Drawing by Stochastic Gradient Descent `__ algorithm. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time. - An eades algorithmic sort: Use `Peter Eades’ heuristic for graph drawing `__. Sorting the paths in a graph my refine the sorting process. For the users’ convenience, it is possible to specify a whole pipeline of sorts within one parameter. **odgi squeeze** [**-f, –input-graphs**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… The odgi squeeze(1) command squeezes multiple graphs into the same file. | **odgi stats** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi stats(1) command produces statistics of a variation graph. Among other metrics, it can calculate the #nodes, #edges, #paths and the total nucleotide length of the graph. Various histogram summary options complement the tool. If [**-B, –bed-multicov**\ =\ *BED*] is set, the metrics will be produced for the intervals specified in the BED. | **odgi test** [ …] [*OPTION*]… | The odgi test(1) command starts all unit tests that are implemented in odgi. For targeted testing, a subset of tests can be selected. odgi test(1) depends on `Catch2 `__. In the default setting, all results are printed to stdout. | **odgi unchop** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi unchop(1) command merges each unitig into a single node. | **odgi unitig** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi unitig(1) command can print all unitigs of a given odgi graph to standard output in FASTA format. Unitigs can also be emitted in a fixed sequence quality FASTQ format. Various parameters can refine the unitigs to print. **odgi validate** [**-i, –input**\ =\ *FILE*] [*OPTION*]… The odgi validate(1) command validates the graph (currently, it checks if the paths are consistent with the graph topology). | **odgi version** [*OPTION*]… | The odgi version(1) command prints the current git version with tags and codename to stdout (like *v-44-g89d022b “back to old ABI”*). Optionally, only the release, version or codename can be printed. | **odgi view** [**-i, –idx**\ =\ *FILE*] [*OPTION*]… | The odgi view(1) command can convert a graph in odgi format to GFAv1. It can reveal a graph’s internal structures for e.g. debugging processes. | **odgi viz** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… | The odgi viz(1) command can produce a linear, static visualization of an odgi variation graph. It aggregates the pangenome into bins and directly renders a raster image. The binning level depends on the target width of the PNG to emit. Can be used to produce visualizations for gigabase scale pangenomes. For more information about the binning process, please refer to :ref:`odgi bin`. If reverse coloring was selected, only the bins with a reverse rate of at least 0.5 are colored. Currently, there is no parameter to color according to the sequence coverage in bins available. BUGS ==== Refer to the **odgi** issue tracker at https://github.com/pangenome/odgi/issues. AUTHORS ======= Erik Garrison from the University of California Santa Cruz wrote the whole **odgi** tool. Andrea Guarracino from the University of Rome Tor Vergata wrote **odgi viz**, **odgi extract**, **odgi cover**, **odgi explode**, **odgi groom**, **odgi squeeze**, **odgi depth**, **odgi layout**, **odgi sort**, **odgi stats**, **odgi overlap**, **odgi validate**, **odgi unchop**, **odgi test**, and this documentation. Simon Heumos from the Quantitative Biology Center Tübingen wrote **odgi bin**, **odgi layout**, **odgi sort**, **odgi pathindex**, **odgi panpos**, **odgi server**, **odgi test**, **odgi version**, and this documentation. RESOURCES ========= **Project web site:** https://github.com/pangenome/odgi **Git source repository on GitHub:** https://github.com/pangenome/odgi **GitHub organization:** https://github.com/pangenome **Discussion list / forum:** https://github.com/pangenome/odgi/issues COPYING ======= The MIT License (MIT) Copyright (c) 2019-2021 Erik Garrison Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.