icon

Note: This is only intended to showcase processing in Gallia, it is not complete nor thoroughly tested at the moment. Use output at your own risk.

See original announcement on BioStars. For more information, see gallia-core documentation, in particular the bioinformatics examples section.

Description

Uses Gallia transformations

to turn VCF INFO values such as:

AC=1;ANN=G|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Met1?|1/918|1/918|1/305||,G-C|start_lost|HIGH|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>G|p.Leu1?|1/918|1/918|1/305||WARNING_REF_DOES_NOT_MATCH_GENOME,C|initiator_codon_variant|LOW|OR4F5|ENSG00000186092|transcript|ENST00000335137|protein_coding|1/1|c.1A>C|p.Met1?|1/918|1/918|1/305||;LOF=(OR4F5|ENSG00000186092|1|1.00)

into objects like:

{
  "AC": 1,
  "LOF": [
    { "Gene_Name": "OR4F5",
      "Gene_ID": "ENSG00000186092",
      "Number_of_transcripts_in_gene": 1,
      "Percent_of_transcripts_affected": 1.0 },
    { "Gene_Name": "OR4F5b",
      "Gene_ID": "ENSG00000186092b",
      "Number_of_transcripts_in_gene": 2,
      "Percent_of_transcripts_affected": 0.5 } ],
  "ANN": [
    {
      "Allele": "G",
      "Annotation": "start_lost",
      "Annotation_Impact": "HIGH",
      "Gene_Name": "OR4F5",
      "Gene_ID": "ENSG00000186092",
      "Feature_Type": "transcript",
      "Feature_ID": "ENST00000335137",
      "Transcript_BioType": "protein_coding",
      "Rank": {
        "value": 1,
        "total": 1 },
      "cDNA": {
        "pos": 1,
        "length": 918 },
      "CDS": {
        "pos": 1,
        "length": 918 },
      "AA": {
        "pos": 1,
        "length": 305 },
      "HGVS": {
        "c": "c.1A>G",
        "p": "p.Met1?" }
    },
    {
      "Allele": "G-C",
      "Annotation": "start_lost",
      "Annotation_Impact": "HIGH",
      "Gene_Name": "OR4F5",
      "Gene_ID": "ENSG00000186092",
      "Feature_Type": "transcript",
      "Feature_ID": "ENST00000335137",
      "Transcript_BioType": "protein_coding",
      "ERRORS_WARNINGS_INFO": "WARNING_REF_DOES_NOT_MATCH_GENOME",
      "Rank": {
        "value": 1,
        "total": 1 },
      "cDNA": {
        "pos": 1,
        "length": 918 },
      "CDS": {
        "pos": 1,
        "length": 918 },
      "AA": {
        "pos": 1,
        "length": 305 },
      "HGVS": {
        "c": "c.1A>G",
        "p": "p.Leu1?" }
    },
    {
      "Allele": "C",
      "Annotation": "initiator_codon_variant",
      "Annotation_Impact": "LOW",
      "Gene_Name": "OR4F5",
      "Gene_ID": "ENSG00000186092",
      "Feature_Type": "transcript",
      "Feature_ID": "ENST00000335137",
      "Transcript_BioType": "protein_coding",
      "Rank": {
        "value": 1,
        "total": 1 },
      "cDNA": {
        "pos": 1,
        "length": 918 },
      "CDS": {
        "pos": 1,
        "length": 918 },
      "AA": {
        "pos": 1,
        "length": 305 },
      "HGVS": {
        "c": "c.1A>C",
        "p": "p.Met1?" }
    }
  ]
}

SnpEff References

  • publication: "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.", Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID: 22728672
  • website: https://pcingola.github.io/SnpEff/