Quick Start¶
A pangenome models the full set of genomic elements in a given species or clade. It can efficiently be encoded in the form of a variation graph, a type of sequence graph that embeds the linear sequences as paths in the graphs themselves.
To exchange pangenomes, the community frequently uses a strict subset of the Graphical Fragment Assembly GFA
format
version 1 (GFAv1). To navigate such files efficiently,
odgi
operates on a dynamic succinct variation graph representation, the odgi
format.
Build graph from GFA¶
Assuming that your current working directory is the root of the odgi
project, to construct an odgi
file from a
GFA
file, execute:
odgi build -g test/DRB1-3123.gfa -o DRB1-3123.og
The command creates a file called DRB1-3123.og
, which contains the input graph in odgi
format.
Display graph stats¶
To have basic information on the graph, execute:
odgi stats -i DRB1-3123.og -S | column -t
#length nodes edges paths
21997 4955 6777 12
This graph file has the following properties:
the total number of nucleotides of all nodes is 21997;
it has 4955 nodes, 6777 edges, and 12 paths.
Display path names and extract paths¶
The path’s names are:
odgi paths -i DRB1-3123.og -L
gi|568815592:32578768-32589835
gi|568815529:3998044-4011446
gi|568815551:3814534-3830133
gi|568815561:3988942-4004531
gi|568815567:3779003-3792415
gi|568815569:3979127-3993865
gi|345525392:5000-18402
gi|29124352:124254-137656
gi|28212469:126036-137103
gi|28212470:131613-146345
gi|528476637:32549024-32560088
gi|157702218:147985-163915
We can obtain their sequences in FASTA
format:
odgi paths -i DRB1-3123.og -f > paths.fasta
head paths.fasta -n 2
>gi|568815592:32578768-32589835
ATTTAACTCCATCTTTGAGAAACATTTAATAATGTAATGTGTTTGTCATACAGGGTGAATACAGATGCACGGG...
Generate a 1D visualization¶
To visualize the graph, execute:
odgi viz -i DRB1-3123.og -o DRB1-3123.png -x 500
to obtain the following PNG image:
In this 1-Dimensional visualization:
The graph nodes are arranged from left to right, forming the
pangenome sequence
.The colored bars represent the the paths versus the
pangenome sequences
in a binary matrix.The path names are visualized on the left.
The black lines under the paths are the links, which represent the graph topology.
See the odgi viz documentation for more information.