Details
Description
For sequences A and B, where A and B are aligned strings including one or more gaps: return percent identity = 100 × (1 − (number of nonequivalent symbol pairs in which neither symbol is a gap) / min(length(A), length(B))).
Whilst this heuristic works well for redundancy removal, it does not always yield the expected results when calculating the PID for trees, since gapped columns are scored as 'similar' rather than different (the sequences 'A' and 'AA' are 100% similar). In this case, phylogeneticists prefer to exclude the column from the calculation entirely (the aligned portions of the sequences 'A' and 'AA' are 100% similar), but most biologists would expect a gap to count as different from a residue (i.e. the sequences 'A' and 'AA' are 50% similar).
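The three treatments of a gapped column can be compared on the 'A' versus 'AA' example with a small sketch. This is an illustration only, not the existing implementation: the function name `percent_identity`, the mode names, and the set of gap characters are hypothetical. It also assumes the two sequences are already aligned to equal length, and it divides by the number of columns actually compared rather than by min(length(A), length(B)).

```python
# Hypothetical sketch of three ways to score gapped columns.
# Assumption: '-', '.' and ' ' are treated as gap characters.
GAP_CHARS = "-. "
MODES = ("wildcard", "ignore", "mismatch")


def percent_identity(a: str, b: str, mode: str = "wildcard") -> float:
    """Percent identity of two aligned, equal-length sequences.

    mode="wildcard": a gap paired with a residue counts as a match.
    mode="ignore":   columns containing a gap are left out entirely.
    mode="mismatch": a gap paired with a residue counts as a mismatch.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")

    matches = 0   # columns scored as identical
    compared = 0  # columns that contribute to the denominator
    for x, y in zip(a, b):
        x_gap = x in GAP_CHARS
        y_gap = y in GAP_CHARS
        if x_gap and y_gap:
            # Assumption: a column that is a gap in both sequences
            # carries no information and is skipped in every mode.
            continue
        if x_gap or y_gap:
            if mode == "ignore":
                continue
            compared += 1
            if mode == "wildcard":
                matches += 1
            # In "mismatch" mode the column is compared but not matched.
        else:
            compared += 1
            if x.upper() == y.upper():
                matches += 1

    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # 'A' aligned against 'AA' becomes 'A-' versus 'AA'.
    for m in MODES:
        print(m, percent_identity("A-", "AA", m))
```

Run on 'A-' versus 'AA', the sketch gives 100.0 for wildcard gaps, 100.0 when gapped columns are ignored, and 50.0 when the insertion is counted as a mismatch, matching the three figures quoted above.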
Bugs related to this epic concern extensions to the calculation method, and to the GUI components associated with tree building and percent identity filtering/sorting. These extensions should allow the three distinct modes of percent identity calculation (wildcard gaps, ignore columns with gaps, and include insertions as mismatches) to be used, where sensible and desired, for tree building, similarity matrix computation, sorting and redundancy removal.
