Jalview / JAL-323

hashCode collisions for large sequence sets when generating ids for sequences in Jalview Archives


    Details

    • Mantis ID:
      28756

      Description

      With large numbers of sequences, the chance that two sequences share the same Java Object hashCode value increases. Tests with Pfam have shown that collisions can occur for reasonably large alignments (more than a few hundred sequences).

      To reproduce, try the Pfam family Stockholm file PF00072.15.
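One way to check whether a particular set of sequence objects is affected is to group the objects by hash value and report any group with more than one member. The sketch below is a hypothetical stand-alone helper (`HashCollisionCheck.findCollisions` is not part of Jalview). It takes the hash function as a parameter, so it can be pointed at `System.identityHashCode` or at any candidate replacement. The number printed by `main` varies between JVMs and runs, because identity hash codes are not guaranteed to be distinct.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class HashCollisionCheck {

    /**
     * Groups items by the supplied hash function and returns only the
     * groups holding more than one item, i.e. the collisions.
     */
    public static <T> List<List<T>> findCollisions(List<T> items, ToIntFunction<T> hash) {
        Map<Integer, List<T>> byHash = new HashMap<>();
        for (T item : items) {
            byHash.computeIfAbsent(hash.applyAsInt(item), k -> new ArrayList<>()).add(item);
        }
        List<List<T>> collisions = new ArrayList<>();
        for (List<T> group : byHash.values()) {
            if (group.size() > 1) {
                collisions.add(group);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Plain Objects stand in for sequence objects here.
        List<Object> seqs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            seqs.add(new Object());
        }
        List<List<Object>> clashes = findCollisions(seqs, System::identityHashCode);
        System.out.println(clashes.size() + " identity hashCode collisions among "
                + seqs.size() + " objects");
    }
}
```

For a deterministic check, passing `String::hashCode` with the strings "Aa" and "BB" (both hash to 2112) always produces exactly one collision group.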

      ****** ADDITIONAL INFORMATION ******

      Symptoms when a collision occurs:
      The Jalview archive is generated without any exceptions or errors. When the archive is read back into Jalview, however, an ArrayIndexOutOfBoundsException is raised when the code tries to access the Seq element of the Vamsas sequence set that corresponds to one of the last JSeq elements.
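The sketch below shows one plausible way a silent write followed by a failed read can arise. It is a simplified, hypothetical model, not Jalview's archive code: it assumes the writer keeps one Seq entry per distinct id, so a colliding id silently drops a sequence, while the reader assumes that JSeq entry i pairs with Seq element i. With one collision there are fewer Seq elements than JSeq elements, so only the last JSeq lookup runs off the end of the array.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollisionSymptomSketch {

    /**
     * Hypothetical writer: stores one Seq entry per distinct id.
     * putIfAbsent silently ignores a later sequence whose id collides
     * with an earlier one, so nothing is reported at write time.
     */
    public static String[] writeSeqSet(List<String> residues, List<String> ids) {
        Map<String, String> seqById = new LinkedHashMap<>();
        for (int i = 0; i < residues.size(); i++) {
            seqById.putIfAbsent(ids.get(i), residues.get(i));
        }
        return seqById.values().toArray(new String[0]);
    }

    /** Hypothetical reader: assumes JSeq entry i pairs with Seq element i. */
    public static String seqForJSeq(String[] seqSet, int jseqIndex) {
        return seqSet[jseqIndex];
    }

    public static void main(String[] args) {
        List<String> residues = List.of("MKV", "GTA", "PLE");
        List<String> ids = List.of("101", "202", "101"); // third id collides with the first
        String[] seqSet = writeSeqSet(residues, ids);
        System.out.println(seqSet.length + " Seq elements for " + ids.size() + " JSeq elements");
        try {
            for (int i = 0; i < ids.size(); i++) {
                System.out.println("JSeq " + i + " -> " + seqForJSeq(seqSet, i));
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Read failed on the last JSeq: " + e);
        }
    }
}
```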

      Current fix strategy: first, generalise the id attribute of JSeq to a string (this has been done). Second, either implement a better hash generator function (implemented properly for use in a Hashtable), or add a disambiguation routine that checks whether a collision occurs when a seqHash is generated and, if one does, generates a new unique hash.
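The disambiguation option can be sketched as follows. This is a minimal illustration that assumes ids only need to be unique within a single archive write. `SeqIdGenerator` and its `_n` suffix scheme are hypothetical and are not Jalview's seqHash code. The sketch relies on the JSeq id already being a string, so a suffixed id such as `42_1` is valid. The hash function is injectable so that the collision path can be exercised deterministically.

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.ToIntFunction;

/**
 * Hypothetical helper: hands out a unique String id per object during one
 * archive write. Each id starts from a hash value. If that string is
 * already taken, a numeric suffix is appended until the id is unique.
 */
public class SeqIdGenerator {
    private final ToIntFunction<Object> hash;
    // IdentityHashMap compares keys by reference, not by equals()/hashCode().
    private final Map<Object, String> idByObject = new IdentityHashMap<>();
    private final Set<String> usedIds = new HashSet<>();

    public SeqIdGenerator() {
        this(System::identityHashCode);
    }

    public SeqIdGenerator(ToIntFunction<Object> hash) {
        this.hash = hash;
    }

    public String idFor(Object seq) {
        String existing = idByObject.get(seq);
        if (existing != null) {
            return existing; // the same object always gets the same id
        }
        String base = Integer.toString(hash.applyAsInt(seq));
        String candidate = base;
        int suffix = 1;
        // Set.add returns false when the id is already in use, i.e. a collision.
        while (!usedIds.add(candidate)) {
            candidate = base + "_" + suffix++;
        }
        idByObject.put(seq, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        // Force every object onto the same hash to exercise disambiguation.
        SeqIdGenerator gen = new SeqIdGenerator(o -> 42);
        Object a = new Object(), b = new Object(), c = new Object();
        System.out.println(gen.idFor(a) + " " + gen.idFor(b) + " " + gen.idFor(c) + " " + gen.idFor(b));
        // prints "42 42_1 42_2 42_1"
    }
}
```

`Integer.toString` never produces an underscore, so a suffixed id cannot clash with a later object's unsuffixed base id. The reader side would then resolve JSeq elements by id rather than by position.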


            People

            Assignee:
            James Procter (jprocter)
            Reporter:
            James Procter (jprocter)
            Votes:
            0
            Watchers:
            0
