Jalview / JAL-1804

Show products/fetch xrefs method scales poorly with size of sequence dataset

    Details

    • Urgency:
      Highest

      Description

      Importing a very large sequence alignment can take an excessive amount of time if the file format includes database reference annotation, such as that imported from Pfam Stockholm files.

      The PF00145 full Stockholm alignment (~9MB gzipped), downloaded from the Pfam site's Alignments->Download section, causes Jalview to apparently hang. Debugging showed that the bottleneck is in the dbxrefs routine, which searches for database cross-reference matches.

      Scalability tests need to be created for the CrossRef.findSequenceXrefTypes method, and the routine needs to be optimised, or replaced with another routine, in jalview.gui.AlignFrame.setShowProductsEnabled(), which tests whether there are product/coding region cross-references for the alignment being displayed.
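      As a starting point for such a test, here is a minimal, self-contained sketch of a scaling harness. It does not use any Jalview classes: Seq, makeDataset, countSharedSourcePairs and countSharedSourcePairsIndexed are hypothetical stand-ins, and the pairwise scan is only an assumed model of how a cross-reference search could behave, not the actual CrossRef implementation. The harness times the routine at doubling dataset sizes; if the time roughly quadruples per doubling, the routine is scaling quadratically. An indexed variant is included for comparison, assuming each sequence lists a given database source at most once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongBiFunction;

class XrefScalingSketch {
    /** Hypothetical stand-in for a sequence with database references. */
    static final class Seq {
        final String name;
        final List<String> dbSources;

        Seq(String name, List<String> dbSources) {
            this.name = name;
            this.dbSources = dbSources;
        }
    }

    /** Builds n synthetic sequences, cycling through four made-up sources. */
    static List<Seq> makeDataset(int n) {
        List<Seq> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new Seq("seq" + i, List.of("SRC" + (i % 4))));
        }
        return out;
    }

    /** Assumed model of the slow path: compare every selected sequence
     *  against every dataset sequence (quadratic in dataset size). */
    static long countSharedSourcePairs(List<Seq> selection, List<Seq> dataset) {
        long matches = 0;
        for (Seq a : selection) {
            for (Seq b : dataset) {
                for (String src : a.dbSources) {
                    if (b.dbSources.contains(src)) {
                        matches++;
                    }
                }
            }
        }
        return matches;
    }

    /** Same count, but the dataset is indexed by source once up front,
     *  so each selected sequence costs one map lookup per source. */
    static long countSharedSourcePairsIndexed(List<Seq> selection, List<Seq> dataset) {
        Map<String, Long> perSource = new HashMap<>();
        for (Seq b : dataset) {
            for (String src : b.dbSources) {
                perSource.merge(src, 1L, Long::sum);
            }
        }
        long matches = 0;
        for (Seq a : selection) {
            for (String src : a.dbSources) {
                matches += perSource.getOrDefault(src, 0L);
            }
        }
        return matches;
    }

    /** Wall-clock time of a single call, in nanoseconds. */
    static long timeNanos(ToLongBiFunction<List<Seq>, List<Seq>> routine, List<Seq> data) {
        long start = System.nanoTime();
        routine.applyAsLong(data, data);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Double the dataset size each round and compare the growth in time.
        for (int n : new int[] {500, 1000, 2000, 4000}) {
            List<Seq> data = makeDataset(n);
            long naive = timeNanos(XrefScalingSketch::countSharedSourcePairs, data);
            long indexed = timeNanos(XrefScalingSketch::countSharedSourcePairsIndexed, data);
            System.out.printf("n=%d  pairwise=%.2f ms  indexed=%.2f ms%n",
                    n, naive / 1e6, indexed / 1e6);
        }
    }
}
```

      A real test would call the Jalview routine on generated SequenceI data instead of the stand-ins, and single-shot timings like these are noisy, so repeated runs or a benchmarking harness would give more reliable numbers.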

      Bottleneck code is here:

      public boolean canShowProducts(SequenceI[] selection,
              boolean isRegionSelection, Alignment dataset)
      {
        boolean showp = false;
        try
        {
          showProducts.removeAll();
          final boolean dna = viewport.getAlignment().isNucleotide();
          final Alignment ds = dataset;
          String[] ptypes = (selection == null || selection.length == 0) ? null
                  : CrossRef.findSequenceXrefTypes(dna, selection, dataset);

            People

            Assignee:
            jprocter James Procter
            Reporter:
            jprocter James Procter