Details
-
Type: Bug
-
Status: Resolved
-
Priority: Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: 2.4
-
Component/s: Datamodel, file format issue, jvdesktop
-
Labels: None
-
Mantis ID: 28756
Description
With large numbers of sequences, the chance of the Java Object hashCode being non-unique increases. Tests with Pfam have shown that collisions can occur for reasonably large alignments (more than a few hundred sequences).
To reproduce, try the Pfam family Stockholm file PF00072.15.
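As a rough way to confirm that a given alignment triggers the problem, the sequences can be grouped by hash value and any group with more than one member reported. The sketch below is illustrative only: the class HashCollisionCheck and the method findCollisions are hypothetical names, not Jalview code. The hash function is passed in so that System::identityHashCode (or whatever is used to derive seqHash) can be supplied. The demo uses String::hashCode only because it gives a repeatable collision.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical diagnostic helper (not part of Jalview). */
final class HashCollisionCheck {

    /**
     * Groups items by the value returned from hasher and keeps only
     * the groups where two or more items share the same hash.
     */
    static <T> Map<Integer, List<T>> findCollisions(List<T> items,
                                                    ToIntFunction<? super T> hasher) {
        Map<Integer, List<T>> byHash = new LinkedHashMap<>();
        for (T item : items) {
            byHash.computeIfAbsent(hasher.applyAsInt(item), k -> new ArrayList<>())
                  .add(item);
        }
        // Drop hashes that map to a single item: those are not collisions.
        byHash.values().removeIf(group -> group.size() < 2);
        return byHash;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings with the same String.hashCode().
        List<String> names = Arrays.asList("Aa", "BB", "C");
        System.out.println(findCollisions(names, String::hashCode));
        // prints {2112=[Aa, BB]}
    }
}
```

For sequence objects, calling findCollisions(sequences, System::identityHashCode) after loading the alignment would list any objects that share an identity hash in that JVM run. The result can differ between runs and JVMs.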
****** ADDITIONAL INFORMATION ******
Symptoms when collision occurs:
The Jalview archive is generated without exceptions or errors. When the archive is read back into Jalview, an ArrayIndexOutOfBoundsException is raised when the code tries to access the corresponding Seq element of the Vamsas sequence set for one of the last JSeq elements.
The current fix strategy has two parts. First, generalise the id attribute of JSeq to a string (done). Second, either implement a better hash generator function (and implement it properly for use in a Hashtable), or add a disambiguation routine that checks for a collision whenever a seqHash is generated and, if one occurs, generates a new unique hash.
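The disambiguation option could look roughly like the following. This is a minimal sketch under stated assumptions, not the fix that went into Jalview. The class UniqueSeqIds and the method idFor are hypothetical. It relies on the id already being a string, so a suffix can be appended when the raw hash has been issued before. An IdentityHashMap remembers which id each object received, so repeated lookups for the same object stay consistent.

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.ToIntFunction;

/** Hypothetical id allocator (not part of Jalview). */
final class UniqueSeqIds {
    private final ToIntFunction<Object> hasher;
    // Keyed by object identity, so equals()/hashCode() overrides do not interfere.
    private final Map<Object, String> idByObject = new IdentityHashMap<>();
    private final Set<String> issued = new HashSet<>();

    UniqueSeqIds(ToIntFunction<Object> hasher) {
        this.hasher = hasher;
    }

    /** Returns a string id that is unique among all ids issued by this instance. */
    String idFor(Object seq) {
        String known = idByObject.get(seq);
        if (known != null) {
            return known; // same object always gets the same id
        }
        String base = Integer.toString(hasher.applyAsInt(seq));
        String candidate = base;
        int suffix = 0;
        // Set.add returns false if the id was already issued: that is a collision.
        while (!issued.add(candidate)) {
            suffix++;
            candidate = base + "_" + suffix;
        }
        idByObject.put(seq, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        // A constant hash forces a collision so the suffixing is visible.
        UniqueSeqIds ids = new UniqueSeqIds(o -> 42);
        Object a = new Object();
        Object b = new Object();
        System.out.println(ids.idFor(a)); // prints 42
        System.out.println(ids.idFor(b)); // prints 42_1
        System.out.println(ids.idFor(a)); // prints 42
    }
}
```

In real use the constructor argument would be System::identityHashCode or whichever function currently produces seqHash. Because a decimal integer never contains an underscore, a suffixed id cannot clash with the unsuffixed id of another hash.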