.. _odgi sort: ######### odgi sort ######### sort a variation graph SYNOPSIS ======== **odgi sort** [**-i, –idx**\ =\ *FILE*] [**-o, –out**\ =\ *FILE*] [*OPTION*]… DESCRIPTION =========== The odgi sort(1) command sorts a succinct variation graph. Odgi sort offers a diverse palette of sorting algorithms to determine the node order: - A topological sort: A graph can be sorted via `breadth-first search (BFS) `__ or `depth-first search (DFS) `__. Optionally, a chunk size specifies how much of the graph to grab at once in each topological sorting phase. The sorting algorithm will continue the sort from the next node in the prior graph order that has not been sorted, yet. The cycle breaking algorithm applies a DFS sort until a cycle is found. We break and start a new DFS sort phase from where we stopped. - A random sort: The graph is randomly sorted. The node order is randomly shuffled from `Mersenne Twister pseudo-random `__ generated numbers. - A 1D linear SGD sort: Odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the `Graph Drawing by Stochastic Gradient Descent `__ algorithm. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time. - A path guided, 1D linear SGD sort: Odgi implements a 1D linear, variation graph adjusted, multi-threaded version of the `Graph Drawing by Stochastic Gradient Descent `__ algorithm. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time. The path index is used to pick the terms to move stochastically. If ran with 1 thread only, the resulting order of the graph is deterministic. The seed is adjustable. Sorting the paths in a graph my refine the sorting process. For the users’ convenience, it is possible to specify a whole pipeline of sorts within one parameter. OPTIONS ======= Graph Files IO -------------- | **-i, –idx**\ =\ *FILE* | File containing the succinct variation graph to sort. The file name usually ends with *.og*. | **-o, –out**\ =\ *FILE* | Write the sorted dynamic succinct variation graph to this file. A file ending with *.og* is recommended. | **-s, –sort-order**\ =\ *FILE* | File containing the sort order. Each line contains one node identifier. Topological Sorts ----------------- | **-b, –breadth-first** | Use a (chunked) breadth first topological sort. | **-B, –breadth-first-chunk**\ =\ *N* | Chunk size for breadth first topological sort. Specify how many nucleotides to grap at once in each BFS phase. | **-z, –depth-first** | Use a (chunked) depth first topological sort. | **-Z, –depth-first-chunk**\ =\ *N* | Chunk size for the depth first topological sort. Specify how many nucleotides to grap at once in each DFS phace. | **-w, –two-way** | Use a two-way topological algorithm for sorting. It is a maximum of head-first and tail-first topological sort. | **-n, –no-seeds** | Don’t use heads or tails to seed topological sort. | **-c, –cycle-breaking** | Use a cycle breaking sort. Random Sort ----------- | **-r, –random** | Randomly sort the graph. Path Guided 1D Linear SGD Sort ------------------------------ | **-Y, –path-sgd** | Apply path guided 1D linear SGD algorithm to organize the graph. | **-X, –path-index**\ =\ *FILE* | Load the path index from this *FILE*. | **-f, –path-sgd-use-paths**\ =FILE | Specify a line separated list of paths to sample from for the on the fly term generation process in the path guided linear 1D SGD. The default value are *all paths*. | **-G, –path-sgd-min-term-updates-paths**\ =\ *N* | The minimum number of terms to be updated before a new path guided linear 1D SGD iteration with adjusted learning rate eta starts, expressed as a multiple of total path steps. The default value is *1.0*. Can be overwritten by *-U, -path-sgd-min-term-updates-nodes=N*. | **-U, –path-sgd-min-term-updates-nodes**\ =\ *N* | The minimum number of terms to be updated before a new path guided linear 1D SGD iteration with adjusted learning rate eta starts, expressed as a multiple of the number of nodes. Per default, the argument is not set. The default of *-G, path-sgd-min-term-updates-paths=N* is used). | **-j, –path-sgd-delta**\ =\ *N* | The threshold of maximum displacement approximately in bp at which to stop path guided linear 1D SGD. Default values is *0.0*. | **-g, –path-sgd-eps**\ =\ *N* | The final learning rate for path guided linear 1D SGD model. The default value is *0.01*. | **-v, –path-sgd-eta-max**\ =\ *N* | The first and maximum learning rate for path guided linear 1D SGD model. The default value is *squared steps of longest path in graph*. | **-a, –path-sgd-zipf-theta**\ =\ *N* | The theta value for the Zipfian distribution which is used as the sampling method for the second node of one term in the path guided linear 1D SGD model. The default value is *0.99*. | **-x, –path-sgd-iter-max**\ =\ *N* | The maximum number of iterations for path guided linear 1D SGD model. The default value is *30*. | **-F, –iteration-max-learning-rate**\ =\ *N* | The iteration where the learning rate is max for path guided linear 1D SGD model. The default value is *0*. | **-k, –path-sgd-zipf-space**\ =\ *N* | The maximum space size of the Zipfian distribution which is used as the sampling method for the second node of one term in the path guided linear 1D SGD model. The default value is the *longest path length*. | **-I, –path-sgd-zipf-space-max**\ =\ *N* | The maximum space size of the Zipfian distribution beyond which quantization occurs. Default value is *100*. | **-l, –path-sgd-zipf-space-quantization-step**\ =\ *N* | Quantization step size when the maximum space size of the Zipfian distribution is exceeded. Default value is *100*. | **-y, –path-sgd-zipf-max-num-distributions**\ =\ *N* | Approximate maximum number of Zipfian distributions to calculate. The default value is *100*. | **-q, –path-sgd-seed**\ =\ *N* | Set the seed for the deterministic 1-threaded path guided linear 1D SGD model. The default value is *pangenomic!*. | **-u, –path-sgd-snapshot**\ =\ *STRING* | Set the prefix to which each snapshot graph of a path guided 1D SGD iteration should be written to. This is turned off per default. This argument only works when *-Y, –path-sgd* was specified. Not applicable in a pipeline of sorts. Path Sorting Options -------------------- | **-L, –paths-min** | Sort paths by their lowest contained node identifier. | **-M, –paths-max** | Sort paths by their highest contained node identifier. | **-A, –paths-avg** | Sort paths by their average contained node identifier. | **-R, –paths-avg-rev** | Sort paths in reverse by their average contained node identifier. | **-D, –path-delim**\ =\ *path-delim* | Sort paths in bins by their prefix up to this delimiter. Pipeline Sorting ---------------- | **-p, –pipeline**\ =\ *STRING* | Apply a series of sorts, based on single character command line arguments given to this command. The default sort is *s*. The reverse sort would be specified via *f*. Additional Parameters --------------------- | **-d, –dagify-sort** | Sort on the basis of a DAGified graph. | **-O, –Optimize** | Use the MutableHandleGraph::optimize method to compact the node identifier space. Threading --------- | **-t, –threads**\ =\ *N* | Number of threads to use for the parallel operations. Processing Information ---------------------- | **-P, –progress** | Print sort progress to stdout. Program Information ------------------- | **-h, –help** | Print a help message for **odgi sort**. .. EXIT STATUS =========== | **0** | Success. | **1** | Failure (syntax or usage error; parameter error; file processing failure; unexpected error). BUGS ==== Refer to the **odgi** issue tracker at https://github.com/pangenome/odgi/issues.