[JAL-2184] resolve alternative references for dataset before attempting to fetch cross-references - Jalview

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.10.0
Fix Version/s: 2.11.1
Component/s: data retrieval services, Datamodel, na
Labels:
None

Description

The introduction of Ensembl as a source of sequence accession and coding relationship data for Jalview revealed a number of problems with the cross-ref heuristics (see e.g. ~~JAL-2154~~).

One remaining issue is illustrated by the following hypothetical:
ENSG1
-> ENA1, ENA1.1
-> ENST1, ENSP1
-> UNP1

After retrieving ENSG1, Jalview will offer to retrieve protein products from both Ensembl and Uniprot. Once retrieved, the protein products will have cross-references to ENA and ENSG.

However, in reality, UNP1, ENSP1 and ENA1.1 are identical protein products, coded for by identical CDS (on ENST1 and in ENA1's CDS regions).
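To make the redundancy concrete, the sketch below groups retrieved protein products by their residue string, so that ENSP1, UNP1 and ENA1.1 from the hypothetical collapse into a single group. This is a minimal illustration only: the `Product` record, the `groupIdentical` method and the placeholder residues are hypothetical and not part of Jalview's API, and exact string identity is assumed to be a sufficient test for "identical product".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdenticalProducts {
    // Hypothetical minimal representation of a retrieved protein product.
    public record Product(String accession, String residues) {}

    // Groups accessions whose residue strings are identical, preserving first-seen order.
    public static Map<String, List<String>> groupIdentical(List<Product> products) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Product p : products) {
            groups.computeIfAbsent(p.residues(), k -> new ArrayList<>()).add(p.accession());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder residues: the three records from the hypothetical share one product.
        List<Product> fetched = List.of(
                new Product("ENSP1", "MKTAYIAKQR"),
                new Product("UNP1", "MKTAYIAKQR"),
                new Product("ENA1.1", "MKTAYIAKQR"));
        System.out.println(groupIdentical(fetched)); // {MKTAYIAKQR=[ENSP1, UNP1, ENA1.1]}
    }
}
```

Three retrievals yield one distinct product, which is the redundancy this issue aims to avoid.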

With the cross-reference retrieval in 2.10.0, Jalview will unavoidably retrieve additional (redundant) sequences unless the user manually performs a 'Fetch DB References' operation. You can try this by:
1. Retrieve the example Ensembl ENSG.
2. Show Ensembl cross-references.
3. In the 'Peptides for Ensembl' view, select 'Fetch DB References'; this will verify the ENSP sequences against their Uniprot records.
4. Go back to the original ENSG locus view and select 'Uniprot' from Show Cross-References; the same ENSP products will be shown.

NB: as an aside, showing Uniprot cross-references should have produced a view in which the sequence names were changed to their respective Uniprot IDs, rather than the default ENSP IDs originally assigned on import from Ensembl.

One way to avoid this behaviour is for Jalview to first verify any alternative references for sequences in the dataset whenever a show-cross-references operation is attempted.

For each sequence:
Search for any dbrefs to other primary sequence databases of the appropriate type that do not test as isPrimary(). If these have no mappings, Jalview should retrieve the accession and, where possible, construct a mapping.
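The per-sequence check can be sketched as follows. This is a minimal sketch under stated assumptions, not Jalview's implementation: `DbRef`, `Seq`, `Fetcher` and `verify` are hypothetical stand-ins, the `primary` flag stands in for the isPrimary() test, and the database names in `PRIMARY_DBS` are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AlternativeRefVerifier {
    // Hypothetical molecule type used to pick which databases count as "primary".
    public enum SeqType { NUCLEOTIDE, PROTEIN }

    // Hypothetical stand-in for a database cross-reference on a dataset sequence.
    public static final class DbRef {
        public final String source;
        public final String accession;
        public final boolean primary; // stands in for the isPrimary() test
        public boolean mapped;        // true once a sequence mapping exists

        public DbRef(String source, String accession, boolean primary, boolean mapped) {
            this.source = source;
            this.accession = accession;
            this.primary = primary;
            this.mapped = mapped;
        }
    }

    // Hypothetical dataset sequence carrying its cross-references.
    public static final class Seq {
        public final String name;
        public final SeqType type;
        public final List<DbRef> refs;

        public Seq(String name, SeqType type, List<DbRef> refs) {
            this.name = name;
            this.type = type;
            this.refs = refs;
        }
    }

    // Hypothetical hook: fetch the accession and try to build a mapping to seq.
    @FunctionalInterface
    public interface Fetcher {
        boolean fetchAndMap(Seq seq, DbRef ref);
    }

    // Assumed primary sequence databases per molecule type (illustrative names only).
    static final Map<SeqType, Set<String>> PRIMARY_DBS = Map.of(
            SeqType.PROTEIN, Set.of("UNIPROT", "ENSEMBL"),
            SeqType.NUCLEOTIDE, Set.of("ENA", "ENSEMBL"));

    /** Verifies unmapped alternative references; returns those that could not be mapped. */
    public static List<DbRef> verify(List<Seq> dataset, Fetcher fetcher) {
        List<DbRef> unresolved = new ArrayList<>();
        for (Seq seq : dataset) {
            Set<String> primaryDbs = PRIMARY_DBS.getOrDefault(seq.type, Set.of());
            for (DbRef ref : seq.refs) {
                // Skip primary refs, refs already mapped, and refs to non-primary databases.
                if (ref.primary || ref.mapped || !primaryDbs.contains(ref.source)) {
                    continue;
                }
                if (fetcher.fetchAndMap(seq, ref)) {
                    ref.mapped = true;
                } else {
                    unresolved.add(ref);
                }
            }
        }
        return unresolved;
    }
}
```

Returning the unresolved references, rather than failing outright, leaves the caller free to decide whether a show-cross-references operation should still proceed when some accessions could not be mapped.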

Since this could take some time, it could also be done automatically in the background. We could also provide a high-throughput service.
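A minimal sketch of running the verification in the background, using standard `java.util.concurrent` classes, is shown below. The `BackgroundVerification` class and the per-accession `verifyOne` check are hypothetical and not Jalview methods; the check is assumed to fetch one accession and report whether a mapping could be built.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

public class BackgroundVerification {
    // Single daemon worker thread, so pending verification never prevents JVM shutdown.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "dbref-verifier");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the hypothetical verifyOne check on each accession on the worker thread.
     * The returned future completes with the accessions that could not be mapped.
     */
    public static CompletableFuture<List<String>> verifyInBackground(
            List<String> accessions, Predicate<String> verifyOne) {
        return CompletableFuture.supplyAsync(
                () -> accessions.stream().filter(verifyOne.negate()).toList(),
                WORKER);
    }
}
```

A show-cross-references action could then chain on the returned future (for example with `thenAccept`) so that it only proceeds once verification has finished, without blocking the user interface. The single worker thread is an assumption, chosen here to avoid issuing many simultaneous requests to remote databases.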

Issue Links

related with

JAL-407 normalise new alignments into a dataset

Open

People

Assignee: James Procter

Reporter: James Procter

Votes: 0

Watchers: 1

Dates

Created: 30/Aug/16 4:59 PM

Updated: 13/Nov/17 5:09 PM