Barts Cancer Institute

This User Guide is designed to help you navigate the full suite of options available in SNPnexus v5.

Before entering your data, you must select the appropriate Human Assembly Reference. SNPnexus supports both GRCh37 (hg19) and GRCh38 (hg38).

In SNPnexus, you can enter your variants manually or use the batch option to upload VCF or text files for your query.

Manual Variant Entry: Use the manual entry form to input specific coordinates (Chromosome, Position, Ref/Alt alleles) or a dbSNP rsID. Click Add to move these into your query list.

Important: All manually entered variants must be Validated before you can proceed to the next steps.

Batch Queries: For large-scale analysis, you can drag and drop up to six files into the upload area. SNPnexus supports two primary formats:

  • VCF format: A standard text file with tab-separated columns in the following format:
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO
    1   72662224    .   A   T   .   .   .
    2   103886224    .   CG   C   .   .   .
    2   136322246    .   T   TAA   .   .   .
                                
  • Tab-Separated Text: A text file where the first column specifies the input type (chromosome or dbsnp). See below for examples:
    chromosome  1   72662224    A   T
    chromosome  2   103886224   CG  C
    chromosome  2   136322246   T   TAA
    dbsnp   rs4133590
                                

Important: After adding files to the queue, you must click Upload, followed by Validate. The system will notify you if the format is correct or if errors were detected.
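
If your variants are already in VCF, the tab-separated text format described above can be produced from the first five VCF columns. The following is a minimal sketch, not part of SNPnexus: the function name `vcf_to_snpnexus_text` is hypothetical, chromosome names are passed through unchanged, and multi-allelic ALT values are not split.

```python
def vcf_to_snpnexus_text(vcf_lines):
    """Convert VCF data lines to 'chromosome<TAB>chrom<TAB>pos<TAB>ref<TAB>alt' rows."""
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header/meta lines, which start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"Expected at least 5 tab-separated columns: {line!r}")
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: ALT is copied as-is, even if it lists several alleles.
        rows.append("\t".join(["chromosome", chrom, pos, ref, alt]))
    return rows


# The first two data rows of the VCF example in this guide.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t72662224\t.\tA\tT\t.\t.\t.",
    "2\t103886224\t.\tCG\tC\t.\t.\t.",
]
converted = vcf_to_snpnexus_text(example_vcf)
print("\n".join(converted))
```

The converted rows still need to be uploaded and validated in SNPnexus like any other text file.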

If your query exceeds the 150,000 variant limit, or if you wish to narrow your focus, you can apply pre-processing filters:

  • Gene filter: Select one or multiple genes from the select box. The genes are ordered by most mutated first. You can also paste a list of genes in the Text Box (*)
  • Chromosomal regions: Define specific genomic coordinates
  • Functional focus: Limit analysis to exonic variants only

(*) If pasting a gene list, you must click Validate Filters to ensure they match our internal database.
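
Before uploading, you may want to check locally whether a file is likely to exceed the limit, or trim it to a region of interest. The sketch below is an illustration only and is not the SNPnexus filter itself: the helper names are hypothetical, and it assumes one variant per line in the tab-separated text format.

```python
VARIANT_LIMIT = 150_000  # limit stated in this guide


def count_variants(lines):
    """Count non-empty, non-header lines (assumes one variant per line)."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def in_region(line, chrom, start, end):
    """True if a tab-separated 'chromosome' row falls inside chrom:start-end (inclusive)."""
    fields = line.rstrip("\n").split("\t")
    return (
        len(fields) >= 3
        and fields[0] == "chromosome"
        and fields[1] == chrom
        and start <= int(fields[2]) <= end
    )


rows = [
    "chromosome\t1\t72662224\tA\tT",
    "chromosome\t2\t103886224\tCG\tC",
    "chromosome\t2\t136322246\tT\tTAA",
]
n = count_variants(rows)
print(n, "variants;", "over limit" if n > VARIANT_LIMIT else "within limit")

# Keep only variants on chromosome 2 between 100,000,000 and 110,000,000.
subset = [r for r in rows if in_region(r, "2", 100_000_000, 110_000_000)]
print(subset)
```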

Select the specific biological datasets you wish to query (e.g., Population Frequency, ClinVar, In-Silico scores). Once chosen, click Submit Query.

Note: The Submit button is only active if your query has been successfully validated.

This page provides a real-time view of your analysis progress, divided into three sections:

  • Query Information and Status: Displays the unique query ID, the number of samples and variants submitted, and a dedicated URL for your results. It also tracks the progress of the annotation engine.

  • Valid Variants Table: Appears after pre-processing to show the variants being annotated

  • Plots: An ideogram plot of variant distribution across chromosomes, the top mutated genes, and a summary of variant types.

The Results Page is a dynamic interface designed to help you transition from thousands of raw data points to a shortlist of high-priority variants. Below is a short description of its sections:

  • Query Data and Export Options: This section displays the unique query ID, the precise date and time until which your data will be retained on our servers, and a bulleted list of every filter currently applied to the view. You can also download a ZIP file containing either individual tab-separated text files for every annotation and sample in your query, or one VCF file per sample with all the annotations integrated.

  • Dynamic Filtering Panel: This section is the "command center" for variant prioritization. The available filters depend on the annotation categories you selected during query submission. You can narrow variants by functional consequence, population thresholds, or known mutations. If you uploaded multiple samples, you can also find variants present in a subset of samples or variants unique to each sample.

  • Visual Analytics: Cohort and Sample Plots: SNPnexus provides high-level cohort-wide and sample-specific visualisations to help users spot patterns before exploring the raw data.

  • Interactive Results Tables: The bulk of the data is organised into searchable, sortable tables. Each table can be exported individually as a VCF or text file, respecting any active search or local filters.

  • View per Variant: Clicking the variant ID in the first column of any table opens a dedicated "deep-dive" page. This page presents the variant as a full report, instead of a row in a table, showing its presence across all samples and its complete annotation profile.

Below is a description of the key columns provided in your results:

Genomic Mapping:

  • Variant ID: Identification of the variant in this sample. Usually chromosome:position:ref_allele:alt_allele(s):strand
  • Chrom: Chromosome
  • Position: Variant start position on chromosome
  • Alleles: Ref Allele/Alt Allele(s)
  • dbSNP: dbSNP ID, if variant maps to a known dbSNP
  • Contig Position: Variant contig and contig location
  • Cytoband: Variant Cytogenetic location
  • Overlapping Genes: Name of the gene(s) that the variant overlaps
  • Upstream Gene: If the variant does not overlap any gene, the closest gene upstream.
  • Downstream Gene: If the variant does not overlap any gene, the closest gene downstream.
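
When working with exported tables, it can help to split the Variant ID back into its parts. The following is a small sketch under stated assumptions: the name `parse_variant_id` is hypothetical, the ID is assumed to follow the usual colon-separated form described above (with the strand field optional), the strand value `1` in the example is illustrative, and the alternative allele(s) are kept as a single unparsed string because the guide does not specify how multiple alleles are separated.

```python
from typing import NamedTuple, Optional


class VariantId(NamedTuple):
    chrom: str
    position: int
    ref: str
    alt: str  # kept as-is; may describe more than one allele
    strand: Optional[str]


def parse_variant_id(variant_id):
    """Split 'chrom:pos:ref:alt[:strand]' into named fields."""
    parts = variant_id.split(":")
    if len(parts) not in (4, 5):
        raise ValueError(f"Unexpected variant ID format: {variant_id!r}")
    chrom, pos, ref, alt = parts[:4]
    strand = parts[4] if len(parts) == 5 else None
    return VariantId(chrom, int(pos), ref, alt, strand)


v = parse_variant_id("2:103886224:CG:C:1")
print(v.chrom, v.position, v.ref, v.alt, v.strand)
```

IDs that do not follow this pattern raise a `ValueError`, so they can be handled separately.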

Genomic Consequence:

  • ID: Identification of the variant in this sample. Usually chromosome:position:ref_allele:alt_allele(s)
  • Chrom: Chromosome
  • Position: Variant start position on chromosome
  • Alleles: Ref Allele/Alt Allele(s)
  • Consequence: Predicted effect of the variant on the transcript.
  • Transcript: Transcript name in the corresponding system (Ensembl, NCBI, CCDS)
  • Canonical: True if the transcript is canonical
  • Gene: Gene name in the corresponding system (Ensembl, NCBI, CCDS) and its HGNC symbol (if available)
  • Protein: Protein name in the corresponding system (Ensembl, NCBI, CCDS)
  • HGVS: HGVS nomenclature of the variant on the transcript or chromosome level
  • HGVS Protein: HGVS nomenclature of the variant on the protein
  • cDNA Position: Variant position on the cDNA
  • CDS Position: Variant position on the CDS
  • AA Position: Position of the first amino acid affected in the resultant peptide chain
  • Peptide Mutation: Wild-type peptide > Mutated peptide
  • Splice Distance: Distance to splice junction, if variant is intronic.

Non-Synonymous Single Nucleotide Mutations:

  • ID: Identification of the variant in this sample. Usually chromosome:position:ref_allele:alt_allele(s)
  • Chrom: Chromosome
  • Position: Variant start position on chromosome
  • Ref Allele: Reference Allele
  • Alt Allele: Observed Allele
  • Wild AA: Reference amino acid
  • Mutated AA: Mutated amino acid
  • Transcript: Transcript affected by variant
  • Gene: Gene affected by variant
  • SIFT: SIFT Score and predicted deleteriousness
  • PolyPhen: PolyPhen Score and predicted deleteriousness
  • REVEL: REVEL Score and predicted deleteriousness
  • AlphaMissense: AlphaMissense Score and predicted deleteriousness
  • CADD: CADD Phred Score and predicted deleteriousness
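
As a reading aid for the CADD column: a Phred-scaled score expresses rank on a logarithmic scale, so a score of 10 corresponds to the top 10% of scored variants, 20 to the top 1%, and 30 to the top 0.1%. The sketch below only illustrates that scaling; the function names are hypothetical and it does not reproduce how SNPnexus assigns its deleteriousness label.

```python
import math


def cadd_phred_to_top_fraction(phred):
    """Fraction of scored variants ranked at or above this Phred-scaled score."""
    return 10 ** (-phred / 10)


def top_fraction_to_cadd_phred(fraction):
    """Phred-scaled score corresponding to a top fraction (0 < fraction <= 1)."""
    return -10 * math.log10(fraction)


for score in (10, 20, 30):
    print(score, cadd_phred_to_top_fraction(score))
```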

Non-Coding Single Nucleotide Mutations:

  • ID: Identification of the variant in this sample. Usually chromosome:position:ref_allele:alt_allele(s)
  • Chrom: Chromosome
  • Position: Variant start position on chromosome
  • Ref Allele: Reference Allele
  • Alt Allele: Observed Allele
  • ReMM: ReMM Score and predicted pathogenicity
  • Jarvis: Jarvis Score and predicted pathogenicity
  • CADD: CADD Phred Score and predicted pathogenicity

Population Frequency - ALFA:

  • Total Frequency: Allele Frequency in the global population
  • AFA Frequency: Allele Frequency in Individuals with African Ancestry
  • AFO Frequency: Allele Frequency in African American population
  • AFR Frequency: Allele Frequency in all African individuals (AFA and AFO individuals)
  • ASN Frequency: Allele Frequency in all Asian population (EAS and OAS individuals, excluding South Asians)
  • EAS Frequency: Allele Frequency in the East Asian population
  • EUR Frequency: Allele Frequency in the European population
  • LAC Frequency: Allele Frequency in Latin American individuals with Afro-Caribbean Ancestry
  • LEN Frequency: Allele Frequency in Latin American individuals with mostly European and Native American Ancestry
  • OAS Frequency: Allele Frequency in all Asian individuals excluding South or East Asian
  • OTH Frequency: Allele Frequency in individuals whose self-reported population is inconsistent
  • SAS Frequency: Allele Frequency in the South Asian population

Population Frequency - 1000 Genomes:

  • AFR Frequency: Allele Frequency in the African population
  • AMR Frequency: Allele Frequency in the Latin American population
  • EAS Frequency: Allele Frequency in the East Asian population
  • EUR Frequency: Allele Frequency in the European population
  • SAS Frequency: Allele Frequency in the South Asian population

Population Frequency - gnomAD:

  • Total Frequency: Allele Frequency in the global population
  • XX Frequency: Allele Frequency in Female individuals
  • XY Frequency: Allele Frequency in Male individuals
  • AFR Frequency: Allele Frequency in the African/African American population
  • AMI Frequency: Allele Frequency in the Amish population
  • AMR Frequency: Allele Frequency in the Admixed American population
  • ASJ Frequency: Allele Frequency in the Ashkenazi Jewish population
  • EAS Frequency: Allele Frequency in the East Asian population
  • FIN Frequency: Allele Frequency in the European (Finnish) population
  • MID Frequency: Allele Frequency in the Middle Eastern population
  • NFE Frequency: Allele Frequency in the European (Non-Finnish) population
  • Remaining Frequency: Allele Frequency in the remaining populations
  • SAS Frequency: Allele Frequency in the South Asian population
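
Population frequencies are often used to shortlist rare variants after export. The sketch below shows one way this might be done on a downloaded tab-separated file. It rests on assumptions: the header names `Variant ID` and `Total Frequency` are taken from the column labels in this guide and may differ in the actual export, the frequency values in the example are invented for illustration, and variants with no recorded frequency are kept.

```python
import csv
import io


def rare_variants(tsv_text, freq_column="Total Frequency", max_freq=0.01):
    """Return rows whose frequency is missing or at most max_freq."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        raw = (row.get(freq_column) or "").strip()
        try:
            freq = float(raw)
        except ValueError:
            # Missing or non-numeric frequency: keep the variant.
            kept.append(row)
            continue
        if freq <= max_freq:
            kept.append(row)
    return kept


# Illustrative rows only; the frequencies are not real data.
example = (
    "Variant ID\tTotal Frequency\n"
    "1:72662224:A:T:1\t0.25\n"
    "2:103886224:CG:C:1\t0.0004\n"
    "2:136322246:T:TAA:1\t\n"
)
kept_ids = [row["Variant ID"] for row in rare_variants(example)]
print(kept_ids)
```

The same threshold can be applied interactively with the Dynamic Filtering Panel on the Results Page.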

Conserved Regions - Phast:

  • Region Start: Genomic coordinate of the start of the conserved region
  • Region End: Genomic coordinate of the end of the conserved region
  • Phast name: Name of aligned element
  • Score: Estimated probability score for conservation as determined from PHAST package (nominal range: 0-1000)

Conserved Regions - GERP++:

  • Region Start: Genomic coordinate of the start of the conserved region
  • Region End: Genomic coordinate of the end of the conserved region
  • Element RS Score: Rejected Substitutions score for the conserved element as determined from GERP++ package
  • Base RS Score: Rejected Substitutions score calculated per base as determined from GERP++ package

Biological Context - Reactome Pathways:

  • Pathway: Link to Reactome Pathway
  • Description: Pathway description
  • Parents: Main parent(s) of the Pathway
  • p-Value: Statistical significance of the Pathway calculated using the Fisher's Exact Test for all the genes involved in the original queryset
  • Genes involved: Genes from the original queryset involved in the Pathway
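
To make the pathway p-Value more concrete, the sketch below computes a one-sided Fisher's exact p-value for gene overlap as a hypergeometric tail probability. This is an illustration under assumptions, not the SNPnexus implementation: the function name is hypothetical, the background gene count must be supplied by you, and the guide does not state which background or which test sidedness SNPnexus uses.

```python
from math import comb


def enrichment_p_value(overlap, query_size, pathway_size, background_size):
    """P(X >= overlap) when drawing query_size genes from the background."""
    max_overlap = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    # Sum the probability of every outcome at least as extreme as the observed overlap.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 in the pathway, 4 in the query, 2 shared.
p = enrichment_p_value(overlap=2, query_size=4, pathway_size=5, background_size=20)
print(p)
```

A small p-value indicates that the query genes overlap the pathway more often than expected by chance for a gene set of that size.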

Biological Context - KEGG Pathways:

  • Term Id: Link to KEGG Pathway
  • KEGG Term: Pathway description
  • Overlap: Number of genes in the original query overlapping genes involved in pathway
  • p-Value: Probability that the observed overlap occurred by random chance. Calculated using the GSEApy library.
  • Adjusted p-Value: p-Value corrected for multiple testing
  • Odds ratio: Quantifies the strength of the association between the overlapping genes and the pathway
  • Combined score: Combination of the p-value and odds ratio
  • Genes involved: Genes from the original queryset involved in the Pathway
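
The guide does not name the correction method behind the Adjusted p-Value. As an illustration of what a multiple-testing correction does, the sketch below implements the Benjamini-Hochberg procedure, a common choice for pathway enrichment. Treat the choice of method as an assumption; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Adjusted values are never smaller than the raw p-values, which is why fewer pathways remain significant after correction.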

Biological Context - Genotype/Tissue Expression GTEx:

  • GTEx ID: Link to the variant in the GTEx Portal
  • Gene: Ensembl gene
  • Tissue: Tissue associated with gene
  • p-Value: Probability that the observed association between the variant and gene expression occurred by chance
  • Beta: Regression coefficient for the linear model
  • Normalised Effect Size: Metric to compare effect sizes across different genes and tissues
  • Slope Standard Error: Uncertainty in the slope estimate

ClinVar:

  • Phenotypes: List of phenotypes associated with the variant
  • Significance: Whether identified as Pathogenic, Benign or Uncertain
  • Review Status: Review status recorded in ClinVar
  • ClinVar ID: ID of the Variant in ClinVar

Cosmic:

  • Gene: Gene affected by the variant
  • AA Mutation: HGVS mutation on the protein
  • Cosmic Mutation: Cosmic ID for the mutation
  • PubMed ID: PubMed ID for the publication of the study
  • Somatic Status: Reported somatic status of the mutation
  • Tumour Source: Source of tumour tissue sample e.g. primary, metastasis.
  • Site: Primary site and site subtypes
  • Histology: Primary histology and histology subtypes

GWAS Catalog:

  • Genes: List of genes affected by the variant
  • Disease: Disease or trait examined in the study
  • p-Value: Reported p-Value for strongest SNP risk allele
  • PubMed ID: PubMed ID for the publication of the study
  • Sample: Sample size and ancestry description for stage 1 of GWAS