[JAL-4654] Faster, more robust and configurable feature/GFF import - Jalview

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.11.5.0, 2.11.5.1
Fix Version/s: 2.12.0
Component/s: annotation, file format issue, sequencefeatures
Labels: None

Description

Jalview's sequence features/GFF files can serve as 'local annotation databases': in earlier versions one could drag and drop a local database of features onto an alignment to annotate its sequences. However, this functionality has degraded:
- GFF3 import now implicitly creates 'THISISAPLACEHOLDER' sequences for all unresolved features. At best these need to be deleted; at worst, millions of additional sequences are created unnecessarily.
- RelaxedIDMatching (JAL-1537 and ~~JAL-753~~) is still a hidden preference, and these days it does not seem to cope with some important use cases, for example:
Sequence name in alignment: UNIPROT|H5DT7|PROT_NAME|FOOO
SequenceID in feature: H5DT7
- Looking at the code, Jalview should recognise this association, but in the 2.12 branch it currently does not.
--> Suggest the ID matcher should create matchings for all words in the name, and then use an ignore list to skip strings that are not expected to be a sequence ID (e.g. a database name, or general English words).
--> Opportunity for a semantics/LLM query here? (What are the appropriate IDs for this protein?)
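
The suggested word-based matching with an ignore list could look roughly like the sketch below. It is written in Python purely for illustration and is not Jalview code: the function names, the separator set (`|` and whitespace) and the contents of `IGNORED_TOKENS` are all assumptions made for this example.

```python
import re

# Hypothetical ignore list: tokens expected to be database names or
# general words rather than sequence IDs. The real list would need curating.
IGNORED_TOKENS = {"uniprot", "sp", "tr", "pdb", "embl", "protein", "fragment"}


def candidate_ids(sequence_name, ignored=IGNORED_TOKENS):
    """Split a sequence name into words and drop those on the ignore list."""
    tokens = re.split(r"[|\s]+", sequence_name)
    return [t for t in tokens if t and t.lower() not in ignored]


def build_index(sequence_names, ignored=IGNORED_TOKENS):
    """Map each candidate ID (lower-cased) to the sequence names containing it."""
    index = {}
    for name in sequence_names:
        for token in candidate_ids(name, ignored):
            names = index.setdefault(token.lower(), [])
            if name not in names:  # avoid duplicates when a word repeats
                names.append(name)
    return index


def match_feature(feature_sequence_id, index):
    """Return the alignment sequence names that a feature's sequence ID resolves to."""
    return index.get(feature_sequence_id.lower(), [])


alignment = ["UNIPROT|H5DT7|PROT_NAME|FOOO", "Q12345"]
index = build_index(alignment)
matches = match_feature("H5DT7", index)
```

With the example from this issue, `matches` holds the single name `UNIPROT|H5DT7|PROT_NAME|FOOO`, while a lookup for `UNIPROT` returns an empty list because it is on the ignore list. An empty result would also give the importer a point at which to skip or report an unresolved feature instead of creating a placeholder sequence.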

People

Assignee: James Procter
Reporter: James Procter
Votes: 0
Watchers: 1

Dates

Created: 18/Feb/26 4:58 PM
Updated: 18/Feb/26 4:58 PM