  1. Jalview
  2. JAL-2184

resolve alternative references for dataset before attempting to fetch cross-references




      The introduction of Ensembl as a source of sequence accession and coding relationship data for Jalview revealed a number of problems with the cross-ref heuristics (see e.g. JAL-2154).

      One remaining issue is illustrated by the following hypothetical set of cross-references:
       -> ENA1, ENA1.1
       -> ENST1, ENSP1
       -> UNP1

      After retrieving ENSG1, Jalview will offer to retrieve protein products from both Ensembl and Uniprot. Once retrieved, the protein products will have cross-references to ENA and ENSG.

      However, in reality, UNP1, ENSP1 and ENA1.1 are identical protein products, coded for by identical CDS (the CDS regions on ENST1 and ENA1).

      With 2.10.0's cross-reference retrieval, Jalview will unavoidably retrieve additional sequences unless the user first performs a 'Fetch DB References' operation manually. You can try this as follows:
      1. Retrieve example Ensembl ENSG.
      2. Show Ensembl cross references.
      3. Select 'Fetch DB References' in the 'Peptides for Ensembl' view - this will verify the ENSP sequences against their Uniprot records.
      4. Go back to the original ENSG locus view and select 'Uniprot' under show cross-references - the same ENSP products will be shown.

      NB, as an aside: showing Uniprot cross-references should have resulted in a cross-ref view where the sequence names were changed to their respective Uniprot IDs, rather than the default ENSP IDs originally assigned on import from Ensembl.

      One way to avoid this behaviour: whenever a show-cross-references operation is attempted, Jalview should first verify any alternative references for sequences in the dataset.

      For each sequence:
      Search for any dbrefs to other primary sequence databases of appropriate type which do not test as isPrimary(). If these have no mappings, then Jalview should retrieve the accession and construct a mapping if possible.
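
The per-sequence check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `DbRef`, `Seq`, `PRIMARY_SOURCES` and the `fetchAndMap` callback are hypothetical stand-ins invented for this sketch, not Jalview's actual classes or API, and the "appropriate type" test is reduced to a simple source-name lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; NOT Jalview's real data model.
final class AltRefSketch {

    /** A database cross-reference held on a sequence. */
    static final class DbRef {
        final String source;      // e.g. "UNIPROT" (assumed source names)
        final String accession;   // e.g. "UNP1"
        final boolean primary;    // analogous to a ref that tests as isPrimary()
        boolean mapped;           // true once a sequence mapping has been built

        DbRef(String source, String accession, boolean primary, boolean mapped) {
            this.source = source;
            this.accession = accession;
            this.primary = primary;
            this.mapped = mapped;
        }
    }

    /** A dataset sequence with its cross-references. */
    static final class Seq {
        final String name;
        final List<DbRef> refs = new ArrayList<>();

        Seq(String name) {
            this.name = name;
        }
    }

    // Assumption: which sources count as "primary sequence databases".
    static final Set<String> PRIMARY_SOURCES = Set.of("ENSEMBL", "UNIPROT", "EMBL");

    /** Refs to primary sequence databases that are neither primary nor mapped. */
    static List<DbRef> unresolvedAlternatives(Seq seq) {
        List<DbRef> out = new ArrayList<>();
        for (DbRef ref : seq.refs) {
            if (PRIMARY_SOURCES.contains(ref.source) && !ref.primary && !ref.mapped) {
                out.add(ref);
            }
        }
        return out;
    }

    /**
     * Tries to resolve every unresolved alternative reference in the dataset.
     * fetchAndMap is a caller-supplied (hypothetical) step that retrieves the
     * accession and returns true if a mapping could be constructed.
     * Returns the number of references that gained a mapping.
     */
    static int resolve(List<Seq> dataset, Predicate<DbRef> fetchAndMap) {
        int resolved = 0;
        for (Seq seq : dataset) {
            for (DbRef ref : unresolvedAlternatives(seq)) {
                if (fetchAndMap.test(ref)) {
                    ref.mapped = true;
                    resolved++;
                }
            }
        }
        return resolved;
    }
}
```

Passing the retrieval step in as a callback keeps the selection logic testable without network access; in a real implementation that step would be whatever fetcher Jalview already uses for the source in question.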

      The above could take some time, so this could also be done automatically in the background. We could also provide a high-throughput service.
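
One possible shape for the background variant, again only as a sketch: the resolution pass is handed to a daemon worker thread, and the show-cross-references action continues once it completes. `BackgroundResolveSketch`, `newWorker` and `resolveInBackground` are hypothetical names for illustration; only the `java.util.concurrent` calls are standard library.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Jalview.
final class BackgroundResolveSketch {

    /** Single daemon worker so a pending resolution never blocks application exit. */
    static ExecutorService newWorker() {
        return Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "altref-resolver");
            t.setDaemon(true);
            return t;
        });
    }

    /**
     * Runs the (possibly slow) resolution task off the calling thread.
     * The future completes with the number of references resolved.
     */
    static CompletableFuture<Integer> resolveInBackground(
            Supplier<Integer> resolveTask, ExecutorService worker) {
        return CompletableFuture.supplyAsync(resolveTask, worker);
    }
}
```

A caller could chain the cross-reference view onto the returned future with `thenAccept(...)`, so the view is only built after the alternative references have been checked.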


              jprocter James Procter