Description
For sequences A and B, where A and B are strings that may include one or more gaps: percent identity = 100 × (L − number of nonequivalent symbol pairs in which neither symbol is a gap) / L, where L = Min(length(A), length(B)).
Whilst this heuristic works well for redundancy removal, it does not always yield the expected results when calculating the PID for trees, since gapped columns are marked as 'similar' rather than different (sequences 'A' and 'AA' are 100% similar). In this case, phylogeneticists prefer to exclude the column from the calculation entirely (the aligned portions of sequences 'A' and 'AA' are 100% similar), but most biologists would expect a gap to count as a mismatch against a residue (i.e. the sequences 'A' and 'AA' are 50% similar).
Bugs related to this epic concern extensions to the calculation method, and to the GUI components associated with tree building and percent identity filtering/sorting, that allow these distinct modes of percent identity calculation (wildcard gaps, ignore columns with gaps, and include insertions as mismatches) to be used, where sensible and desired, for tree building, similarity matrix computation, sorting and redundancy removal.
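To make the difference between the three modes concrete, here is a minimal Python sketch. It is not Jalview's implementation: the function name `percent_identity`, the mode names, and the set of gap characters are all assumptions made for illustration, and comparison is case-sensitive for simplicity. The 'insertions as mismatches' mode is assumed to pad the shorter sequence with gaps, and to skip columns where both sequences have a gap.

```python
# Hypothetical sketch of three PID modes; names and gap characters are assumptions.
GAP_CHARS = frozenset("-. ")

MODES = ("wildcard", "ignore_gapped_columns", "insertions_as_mismatches")


def percent_identity(a: str, b: str, mode: str = "wildcard") -> float:
    """Return percent identity of two aligned sequences under the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    if mode == "insertions_as_mismatches":
        # Compare over the longer sequence; the overhang is treated as gaps.
        n = max(len(a), len(b))
    else:
        # Compare only over the length of the shorter sequence.
        n = min(len(a), len(b))

    matches = 0
    compared = 0
    for i in range(n):
        x = a[i] if i < len(a) else "-"
        y = b[i] if i < len(b) else "-"
        gap_x, gap_y = x in GAP_CHARS, y in GAP_CHARS

        if mode == "wildcard":
            # A gap matches anything, so gapped columns count as similar.
            compared += 1
            if gap_x or gap_y or x == y:
                matches += 1
        elif mode == "ignore_gapped_columns":
            # Columns containing a gap are left out of the calculation.
            if gap_x or gap_y:
                continue
            compared += 1
            if x == y:
                matches += 1
        else:  # insertions_as_mismatches
            # A gap opposite a residue is a mismatch; gap/gap columns are skipped.
            if gap_x and gap_y:
                continue
            compared += 1
            if not gap_x and not gap_y and x == y:
                matches += 1

    return 100.0 * matches / compared if compared else 0.0


for m in MODES:
    print(m, percent_identity("A", "AA", m))
# wildcard 100.0
# ignore_gapped_columns 100.0
# insertions_as_mismatches 50.0
```

For the sequences 'A' and 'AA' this reproduces the figures given in the description: 100% for the first two modes and 50% when the insertion is counted as a mismatch.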
Issue Links
 depends on

JAL790 Ambiguity codes not supported in PID calculation for trees
 In Progress

JAL791 PID calculation for tree calculation includes gapped columns in the alignment length used for %age calculation
 In Progress

JAL788 Generate percent identity scores for one or more pairs of sequences in an alignment
 Open

JAL233 arbitrary measure of %age ID for a set of homologs
 Open

JAL514 remove redundancy using either unaligned or aligned %age identity
 Open

JAL838 extend jalview's PID routine to allow different types of PIDs to be calculated
 In Progress
 related with

JAL1946 Support PID calculation that matches that done by Belvu
 Open

JAL1632 Dialog to create/edit tree calculation settings
 Closed