Vanno manual
Tutorial

1. Introduction

     

Vanno is based on the framework of CPAP (doi: 10.1002/humu.22386), published in Human Mutation in July 2013. The original version was designed for the analysis of Ion AmpliSeq Cancer Panel targeted sequencing datasets generated by the Life Technologies Ion Personal Genome Machine (PGM) sequencer. Owing to the user-friendly interface and ultra-fast computation time of CPAP, more than 15,000 datasets were uploaded to the system in its first year. Advances in sequencing technologies and the emergence of many new disease-targeted and whole-exome sequencing panels since the release of CPAP prompted us to build a completely new system that covers the 24 most commonly used disease-targeted gene panels and 4 whole-exome sequencing panels. Based on users’ feedback, we also implemented many new functions and databases for functional classification and visualization-aided selection of putative disease- or cancer-associated SNVs. To our knowledge, Vanno is the only web-based platform for large-scale comparative analysis of both targeted and whole-exome sequencing datasets from all high-throughput sequencing platforms.

 

The main features of Vanno include:

  • Variant Call Format Converter
  • Pre-indexed, locally stored biomedical databases
    • dbSNP
    • dbNSFP
    • 1000 Genomes Project
    • ESP 6500 exomes
    • COSMIC
    • ClinVar
    • OMIM
    • InterPro
    • RefSeq
    • Pfam
    • Gene Ontology
    • ConsensusPathDB
    • TCGA mutation landscapes from 18 cancer types
    • Functional annotation results from ANNOVAR
  • Visual analytics modules
    • Circos
    • jHeatmap
    • Dynamic Pie chart & Bar chart
    • Protein domain chart
    • JSmol for displaying protein 3D structure
  • Data mining
    • Sorting
    • Mutually exclusive sorting
    • Cascade filters
  • Data download
    • Detailed annotation table (MS Excel or Tab-delimited text)
    • Filtered annotation tables (MS Excel or Tab-delimited text)
    • Mutant cDNA/protein sequence retrieval for experimental validation

A comparison between Vanno and other variant annotation and analysis tools



Tool          Vanno  CPAP  Annotate-it  ANNOVAR       Anntools      KGGSeq        SeqAnt  SVA        Treat
Availability  Web    Web   Web          Command line  Command line  Command line  Web     Graphical  Command line

Features compared:
  • 1000 Genomes
  • ESP 6500 exomes
  • dbSNP & COSMIC
  • dbNSFP
  • ClinVar
  • Customized filter
  • Filter history
  • Multi-sample comparison (Circos)
  • Multi-sample comparison (Heatmap)
  • TCGA comparison (18 cancer types)
  • Dynamic summarized chart
  • Mutant DNA sequence retrieval
  • Mutant protein sequence retrieval
  • Pathway information
  • Mutually exclusive sorting
  • OMIM
  • Gene Ontology
  • Protein domain visualization (mutation spectrum)
  • Protein 3D structure visualization (missense variants mapped to protein 3D structure)
  • Supported commercial gene/whole-exome panels (Vanno: 28; CPAP: 1)








2. Features in Vanno


2-1 Supported NGS Instruments:
  1. IonPGM: Ion Torrent & Ion Proton 
  2. Illumina: MiSeq & HiSeq 


2-2 Supported Variant Callers (Variant File formats)
  1. Torrent Variant Caller (TVC) version 3.2, 3.6 and 4.0+ (.xls or .vcf)
  2. MiSeq Reporter (.gvcf)
  3. GATK (.vcf)
  4. VarScan2 (.vcf)
  5. QIAGEN DNAseq Sequence Variant Analysis package (.xls)
  6. Agilent SureCall (.xls)


2-3 Supported Sequencing Panels:

Targeted Sequencing Panel

Ion AmpliSeq
  1. Cancer Hotspot Panel v1 (45 genes)
  2. Cancer Hotspot Panel v2 (50 genes)
  3. Comprehensive Cancer Panel (409 genes)
  4. Inherited Disease Panel (325 genes)
  5. RNA Apoptosis Panel (267 genes)
  6. RNA Cancer Panel (50 genes)
  7. BRCA1 and BRCA2 Panel (2 genes)
  8. Colon and Lung Cancer Panel (22 genes)
  9. Dementia Research Gene Panel (17 genes)

Illumina TruSeq/TruSight
  1. TruSeq Amplicon Cancer Panel (50 genes)
  2. TruSight Autism (101 genes)
  3. TruSight Cancer (94 genes & 284 SNPs)
  4. TruSight Cardiomyopathy (46 genes)
  5. TruSight Inherited Disease (552 genes)

QIAGEN GeneRead
  1. Breast Cancer (20 genes)
  2. Colon Cancer (20 genes)
  3. Leukemia (20 genes)
  4. Liver Cancer (20 genes)
  5. Lung Cancer (20 genes)
  6. Ovarian Cancer (20 genes)
  7. Prostate Cancer (20 genes)
  8. Gastric Cancer (20 genes)
  9. Comprehensive Cancer (124 genes)

Agilent HaloPlex
  1. Cancer Research Panel (47 genes)

Whole-exome panel

Ion AmpliSeq
  1. IonAmpliSeqExome
Illumina
  1. Illumina TruSeq_Exome
Nimblegen
  1. SeqCapEZ_Exome_v3
Agilent
  1. SureSelect_Exon_V5


2-4 Annotation Resources
  • Population-related DB: dbSNP138, 1000 Genomes Project, NHLBI GO Exome Sequencing Project (ESP6500)
  • Cancer-associated DB: COSMIC v68
  • Disease-related DB: ClinVar (Nucl. Acids Res. 2013 Nov 14), OMIM
  • Functional annotation: ANNOVAR
  • Functional prediction: dbNSFP v2.1; PhyloP, SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, FATHMM
    (Protein 3D structure)
  • Protein Domain: InterPro, Pfam
  • Gene Ontology: GO_Slim biological process, cellular component and molecular function
  • Pathway: ConsensusPathDB


2-5 Data Visualization
  • Amplicon View
  • Gene View
  • Protein Domain View
  • Protein 3D Structure (JSmol)
  • Real-time filter for Circos




2-6 Data Management

2-6-1 Data Comparison

  • Summary chart for a cohort study
  • Cross-sample comparison (Circos)
  • Cross-sample comparison (Heatmap)
  • Group-wise comparison (Circos): 6 groups
  • Group-wise comparison (Heatmap): unlimited groups or clinical features
  • Cross-panel comparison
  • Comparison to TCGA: mutation spectrum from 18 cancer types deposited at The Cancer Genome Atlas (TCGA)


2-6-2 Data Mining

  • Sorting & filtering
  • Comparison table for TCGA
  • Dynamic chart (Google Chart API, JavaScript, and HTML5)
  • Mutually exclusive sorting for identifying driver mutations
  • Nucleotide sequence retrieval (for Sanger validation)
  • Mutant protein sequence retrieval (for MS/MS validation)







3. Vanno Analysis Workflow








4. How To Use The Vanno Web Server

4-1 Input file preparation


4-1-1 Supported variant calling files
Variant Caller                                     File Format
Torrent Variant Caller (TVC)                       .xls or .vcf
MiSeq Reporter                                     .gvcf
QIAGEN DNAseq Sequence Variant Analysis Package    .xls
Agilent SureCall                                   .xls
GATK                                               .vcf
VarScan                                            .vcf

4-1-2 Nomenclature criteria for variant calling files
  • File names may contain only letters [A to Z, a to z], digits [0 to 9] and the underscore “_”.
  • Avoid the dot “.”, hyphen “-”, comma “,”, colon “:” and white space “ ” in file names (the dot before the file extension is the only exception).
  • A file name must start with a letter [A to Z, a to z] (e.g. Sample01.xls, Sample02.xls, Sample_03.xls … or Sample01.vcf, Sample02.vcf, Sample_03.vcf …).
  • A file name must not exceed 30 characters, excluding the file extension (.xls or .vcf).
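The naming rules above can be checked locally before upload. The following is a minimal Python sketch, not part of Vanno itself: the function name is_valid_filename and the restriction to the .xls and .vcf extensions are assumptions made for illustration.

```python
import re

# Assumption: only the two extensions named in the rules above are accepted.
ALLOWED_EXTENSIONS = (".xls", ".vcf")

# One leading letter followed by up to 29 letters, digits or underscores,
# which gives a stem of 1 to 30 characters.
STEM_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,29}")


def is_valid_filename(filename):
    """Return True if the file name follows the naming rules listed above."""
    for extension in ALLOWED_EXTENSIONS:
        if filename.endswith(extension):
            stem = filename[: -len(extension)]
            return STEM_PATTERN.fullmatch(stem) is not None
    return False


# Hypothetical file names used only to illustrate the check.
examples = ["Sample01.xls", "Sample_03.vcf", "01Sample.xls", "Sample-01.vcf"]
results = {name: is_valid_filename(name) for name in examples}
```

Here results maps each example name to True or False, so invalid names can be renamed before the files are compressed.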
4-1-3 How to prepare a compressed file for uploading
    • Compress the variant calling files into a single compressed file before uploading them to the Vanno server.
    • Supported file formats: .tar.bz2, .tar.gz and .zip
    • The compressed file must not contain any directory structure.
    • All files in a single compressed file must have the same format (*.xls or *.vcf).
    • Unix commands for compressing files:
      • tar jcvf upload.tar.bz2 *.xls (or *.vcf)
      • tar czvf upload.tar.gz *.xls (or *.vcf)
      • zip -r upload.zip *.xls (or *.vcf)
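For users who prefer scripting the packaging step, the same requirements (one file format per archive, no directories) can be enforced with Python's standard tarfile module. This is a sketch under assumptions: the helper names make_upload_archive and archive_is_flat are hypothetical, and the demo files are placeholders rather than real variant calls.

```python
import os
import tarfile
import tempfile


def make_upload_archive(file_paths, archive_path):
    """Pack variant calling files into a flat .tar.gz archive."""
    extensions = {os.path.splitext(path)[1].lower() for path in file_paths}
    if len(extensions) != 1 or not extensions <= {".xls", ".vcf"}:
        raise ValueError("all files must share one format: .xls or .vcf")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in file_paths:
            # arcname drops the directory part so the archive stays flat.
            archive.add(path, arcname=os.path.basename(path))
    return archive_path


def archive_is_flat(archive_path):
    """Return True if the archive holds only regular files at the top level."""
    with tarfile.open(archive_path, "r:*") as archive:
        return all(m.isfile() and "/" not in m.name for m in archive.getmembers())


# Demo with two placeholder files written to a temporary directory.
workdir = tempfile.mkdtemp()
samples = []
for name in ("Sample01.vcf", "Sample02.vcf"):
    sample_path = os.path.join(workdir, name)
    with open(sample_path, "w") as handle:
        handle.write("##fileformat=VCFv4.1\n")
    samples.append(sample_path)
upload = make_upload_archive(samples, os.path.join(workdir, "upload.tar.gz"))
```

Passing a mixture of .xls and .vcf files raises a ValueError, which mirrors the one-format-per-archive rule.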



4-2 Select Gene Panel & Variant Caller version
 

4-3 Confirm corresponding parameters before uploading






5. Output



5-1 Summary of Job

5-1-1 Heatmap


5-1-2 Histogram (Circos)

A compact Circos plot provides a simultaneous exploratory view of all alteration events in each gene and
their corresponding population frequencies in tumor samples. Gene mutation frequencies for the different alteration types
in a given set of samples are organized in layered circles.
The outermost circle represents all mutation events (All), with bar height determined by the frequency of altered samples,
followed by circles for silent mutations (Si), missense mutations (Ms), nonsense mutations (Ns) and indels (ID).
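The per-gene values behind such layered circles can be sketched as the fraction of samples that carry at least one alteration of each type. The Python snippet below is an illustration only: the input layout (sample, gene, type) and the function name layer_frequencies are assumptions, and it presumes that every layer, not just the outermost one, is scaled by the fraction of altered samples.

```python
from collections import defaultdict

# Layer labels taken from the description above.
LAYERS = ("All", "Si", "Ms", "Ns", "ID")


def layer_frequencies(calls, n_samples):
    """Return {gene: {layer: fraction of samples altered}} for each layer."""
    altered = defaultdict(lambda: {layer: set() for layer in LAYERS})
    for sample, gene, kind in calls:
        altered[gene]["All"].add(sample)   # any alteration counts for "All"
        altered[gene][kind].add(sample)    # and for its own type layer
    return {
        gene: {layer: len(samples) / n_samples for layer, samples in layers.items()}
        for gene, layers in altered.items()
    }


# Hypothetical variant calls: (sample, gene, alteration type).
calls = [
    ("S1", "KRAS", "Ms"),
    ("S2", "KRAS", "Ms"),
    ("S2", "TP53", "Ns"),
    ("S3", "TP53", "ID"),
    ("S3", "TP53", "Si"),
]
frequencies = layer_frequencies(calls, n_samples=4)
```

Each sample is counted once per gene and layer, so a sample with several variants in the same gene does not inflate the bar height.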




5-2 Filter History, Filters, and Function Classification



5-3 Charts & Table


5-3-1 Pie chart


The distribution of variants is summarized in pie charts by category, such as chromosome, sample name, gene symbol, mutation type and protein domain.



5-3-2 Circos (Gene view & Amplicon view)


5-3-3 Table







6. Example Uses

As a proof-of-principle experiment, we applied Vanno to targeted sequencing data from 11 cancer cell lines, which can be classified into three tissue types: colon, oral and normal lung.
Amplicons were amplified from the genomic DNA of the 11 cell lines following the Ion AmpliSeq Cancer Panel v2 protocol, which targets mutation hotspot regions of 50 oncogenes and tumor suppressor genes,
and were subsequently sequenced on a single Ion 318 chip with the Life Technologies Ion PGM sequencer.


6-1 Group-wise comparison

The variant calling files obtained from the 11 cell lines (4 oral, 2 normal lung and 5 colon) were submitted to the Vanno server as three separate jobs. After the jobs were completed, the genetic variants identified in
the three jobs were displayed in red, blue, and green, respectively, using the group-wise comparison module of Vanno to facilitate the exploration of disease-causing variants.

6-2 Job Summary


As shown in the following figure, 181 variants were identified from the 11 cancer cell lines.
The highly mutated genes across different cell lines (or clinical features)
and the composition of mutation types are depicted in the heatmap and the histogram (Circos), respectively.


[Figure: job summary, heatmap, and histogram (Circos)]


6-3 Visualize mutually exclusive alteration patterns using heatmap

The concept of mutually exclusive alteration patterns has been exploited to distinguish driver mutations that contribute to tumorigenesis from passenger mutations
(Ciriello et al, Genome Research 2011 and Vandin et al., Genome Research 2012). To make use of this property when identifying recurrently altered driver mutations in cancer,
Vanno incorporates a new heatmap feature with an option to sort genomic alterations by mutually exclusive patterns across multiple samples, making it easy to identify
driver genes that are functionally linked in a common pathway or in the same biological process.

Mutation profiles obtained from the TCGA Colon and Rectum Adenocarcinoma study (223 samples) are used as an example to demonstrate this heatmap module.
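One common way to bring mutually exclusive patterns to the front of a heatmap is to order genes by alteration frequency and then order samples by their alteration pattern across those genes. The Python sketch below illustrates that generic ordering; it is not taken from the Vanno source, and the function name and the toy alteration sets are assumptions.

```python
def mutual_exclusivity_sort(alterations, samples):
    """Order genes and samples so that exclusive patterns form a staircase.

    alterations: dict mapping gene -> set of altered sample names.
    samples: list of all sample names, including unaltered ones.
    """
    # Most frequently altered genes first; ties are broken alphabetically.
    genes = sorted(alterations, key=lambda gene: (-len(alterations[gene]), gene))

    def pattern(sample):
        # 0 sorts before 1, so altered samples move to the left for each gene.
        return tuple(0 if sample in alterations[gene] else 1 for gene in genes)

    return genes, sorted(samples, key=pattern)


# Hypothetical alteration calls used only to illustrate the ordering.
alterations = {
    "KRAS": {"S1", "S2", "S3"},
    "BRAF": {"S4", "S5"},
    "NRAS": {"S6"},
    "TP53": {"S2", "S4"},
}
samples = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
gene_order, sample_order = mutual_exclusivity_sort(alterations, samples)

# Text rendering of the sorted heatmap: one row per gene, "#" marks an alteration.
rows = [
    gene.ljust(5) + "".join("#" if s in alterations[gene] else "." for s in sample_order)
    for gene in gene_order
]
```

In the rendered rows, KRAS, BRAF and NRAS alterations fall in non-overlapping column blocks, which is the visual signature of mutual exclusivity, while TP53 overlaps with both KRAS and BRAF.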



6-4 Prioritize candidate targets of interest


The visual analytics module is inherited from CPAP and is based on the architecture of Circos, with interactive filters to visualize the distribution of variants across multiple datasets.
In Vanno, we introduce a new approach that displays the impact of different filter settings on the Circos plot in real time, without the need to regenerate the whole plot.
This is particularly useful in the exploratory stage for grasping the characteristics of a dataset, which can then be applied in the subsequent discovery stage to prioritize candidate targets.

As shown in the following figure, samples can be easily included or excluded from the analysis, providing the opportunity to discover variant composition within and across individual samples.

[Figure: real-time filter for Circos]


6-5 Inspect alterations from different aspects


After a significant set of genes or variants has been identified, the detailed mutational landscape can be inspected at the gene, amplicon, or variant level.
Vanno also supports visualization of gene mutations in the context of protein domains and tertiary structures.
The domain visualization module can be used to clarify the molecular characteristics of mutant genes.
For example, KRAS mutations in codons 12 and 13 are recognized as predictors of non-responsiveness to anti-EGFR therapies in metastatic colorectal cancer.
Detailed molecular analysis of patients could provide a significant clinical benefit and help move toward personalized treatment.
 



6-6 Compare alterations with TCGA data


Moreover, the mutational landscapes from in-house data and from TCGA can be rendered side by side on specific protein domains.
The only requirement is to turn on the "compare with TCGA" button and select the respective TCGA cancer type.


6-7 Cross-panel comparison


Clinical laboratories and biotechnology companies have launched several disease-targeted gene panels that target similar disease-causing gene regions, or the same mutation hotspots, but with different specificity and sensitivity. To select the most suitable gene panel for a specific project, different gene panel tests, such as the Ion AmpliSeq™ Cancer Hotspot Panel v1 and the Ion AmpliSeq™ Cancer Hotspot Panel v2, can be applied to the same cell lines. The genetic alterations identified by the different panels are displayed in the UCSC Genome Browser as custom tracks.
Gene regions targeted by different panels are shown in different colors alongside annotation tracks from dbSNP, COSMIC, ClinVar, OMIM, and RefSeq, providing an easier way to evaluate sensitivity, specificity and reproducibility between panels.



6-8 Integration of genomics and proteomics


Sequence retrieval is a crucial step for the subsequent validation of selected mutations by Sanger sequencing. It is a simple task, but one that application developers usually neglect.
Vanno can retrieve the nucleotide sequence of a genomic region spanning a variant site for convenient PCR primer design.
Moreover, Vanno translates genomic alterations identified from targeted sequencing data into mutated protein entries in FASTA format, according to their functional consequences in transcripts. Aberrant proteins resulting from non-synonymous coding variants and small indels are predicted and output as a FASTA file, which can be used to generate peptide sequence tags for mass spectrometry database searching. Researchers can thus design experiments at multiple layers that link sample-specific variations in DNA, RNA and proteins.
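The idea of turning a coding variant into a mutant protein entry can be illustrated with a short Python sketch. It is not Vanno's implementation: the function names, the FASTA header text and the 17-codon coding fragment are assumptions chosen for illustration, and only a single-nucleotide substitution with the standard genetic code is handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to the first stop codon."""
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snv(cds, position, ref, alt):
    """Apply a substitution at a 1-based coding position after checking ref."""
    if cds[position - 1].upper() != ref.upper():
        raise ValueError("reference base does not match the coding sequence")
    return cds[:position - 1] + alt + cds[position:]


def to_fasta(header, sequence, width=60):
    """Format one FASTA entry with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Illustrative 17-codon coding fragment; a G>A change at coding position 35
# turns codon 12 from GGT (Gly) into GAT (Asp).
reference_cds = "ATGACTGAATATAAACTTGTGGTAGTTGGAGCTGGTGGCGTAGGCAAGAGT"
mutant_cds = apply_snv(reference_cds, 35, "G", "A")
reference_protein = translate(reference_cds)
mutant_protein = translate(mutant_cds)
fasta_entry = to_fasta("example_fragment|c.35G>A|p.G12D", mutant_protein)
```

A full implementation would also need transcript coordinates, strand handling and frameshift logic for indels, none of which are covered by this sketch.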




7. Benchmarking


To keep the server responsive, the scheduling system dynamically assigns CPU cores to each job (maximum: 10 cores per job).

  • Operating System: CentOS 6.2
  • Job Queuing System: Sun Grid Engine
  • Hardware: vCPUs 8*2.0 GHz, vRAM 8GB, vHDD 500GB
  • Programming Languages: Shell Script, Rscript, PHP, JavaScript