  1. Jalview
  2. JAL-4654

Faster, more robust and configurable feature/GFF import

    Details

      Description

      Jalview's sequence features/GFF files can be useful as 'local annotation databases': in earlier versions one could drag and drop a local database of features onto an alignment to annotate the sequences. However, this functionality has degraded:
      - GFF3 import now implicitly creates 'THISISAPLACEHOLDER' sequences for all unresolved features. At best these need to be deleted; at worst, millions of additional sequences are created unnecessarily.
      - RelaxedIDMatching (JAL-1537 and JAL-753) is still a hidden preference, but these days it seems not to cope with some important use cases:
      Sequence name in alignment: UNIPROT|H5DT7|PROT_NAME|FOOO
      SequenceID in feature: H5DT7
      - Looking at the code, Jalview should recognise this association, but in the 2.12 branch it currently does not.
      --> Suggest the ID matcher should create matchings for all words in the name, then use an ignore list to skip strings that are not expected to be a sequence ID (e.g. a database name, or general English words).
      --> Opportunity for a semantic/LLM query here? (What are the appropriate IDs for this protein?)
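
      The suggested token-based matching could be sketched roughly as below. This is an illustration only, not Jalview's existing matcher: the class name TokenIdMatcher, its methods, the ignore list contents, the choice of '|' and whitespace as separators, and case-insensitive comparison are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: index every word of a sequence name, minus an ignore list.
final class TokenIdMatcher {
    // Assumed ignore list: database names and generic words that are not sequence IDs.
    private static final Set<String> IGNORE =
            Set.of("uniprot", "sp", "tr", "pdb", "embl", "protein", "the", "of");

    // token (lower-cased) -> names of the alignment sequences containing that token
    private final Map<String, List<String>> index = new HashMap<>();

    /** Split a name on '|' and whitespace, dropping empty and ignored tokens. */
    static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.split("[|\\s]+")) {
            String key = t.toLowerCase(Locale.ROOT);
            if (!key.isEmpty() && !IGNORE.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    /** Register an alignment sequence under each of its candidate ID tokens. */
    void addSequence(String sequenceName) {
        for (String token : tokens(sequenceName)) {
            index.computeIfAbsent(token, k -> new ArrayList<>()).add(sequenceName);
        }
    }

    /** Return the sequence names matching a feature's sequence ID (possibly none). */
    List<String> findMatches(String featureSequenceId) {
        String key = featureSequenceId.toLowerCase(Locale.ROOT);
        return index.getOrDefault(key, List.of());
    }
}
```

      With the example above, registering UNIPROT|H5DT7|PROT_NAME|FOOO and looking up H5DT7 returns that sequence, while looking up UNIPROT returns nothing because it is on the ignore list. An empty result also gives the importer a point at which it could skip an unresolved feature rather than create a placeholder sequence.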

            People

            Assignee:
            jprocter James Procter
            Reporter:
            jprocter James Procter