Description
For sequences A and B, where A and B are strings including one or more gaps, return percent identity = 100 * (1 - (number of non-equivalent symbol pairs in which neither symbol is a gap) / min(length(A), length(B))).
Whilst this heuristic works well for redundancy removal, it does not always yield the expected results when calculating the PID for trees, since gapped columns are marked as 'similar' rather than different (sequences 'A' and 'AA' are 100% similar). In this case, phylogeneticists prefer to exclude the gapped column from the calculation entirely (the aligned portion of sequences 'A' and 'AA' is 100% similar), but most biologists would expect a gap to count as a mismatch against a residue (gap != residue, i.e. the sequences 'A' and 'AA' are 50% similar).
Bugs related to this epic concern extensions to the calculation method, and to the GUI components associated with tree building and percent identity filtering/sorting. These extensions allow the three distinct modes of percent identity calculation (wildcard gaps, ignore columns with gaps, and include insertions as mismatches) to be used, where sensible and desired, for tree building, similarity matrix computation, sorting and redundancy removal.
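To make the difference between the three modes concrete, here is a minimal Python sketch. It is an illustration only, not Jalview's implementation: the function name `percent_identity`, the mode names, and the `GAP_CHARS` set are hypothetical. It also assumes that both inputs are already aligned to the same length, that `length()` in the formula above means the number of non-gap symbols, and that columns where both sequences have a gap are skipped.

```python
GAP_CHARS = "-."  # assumption: which characters count as gaps


def percent_identity(a: str, b: str, mode: str = "wildcard") -> float:
    """Percent identity of two pre-aligned sequences under one of three gap modes."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")

    def is_gap(c: str) -> bool:
        return c in GAP_CHARS

    # Columns where both sequences have a residue.
    residue_pairs = [(x, y) for x, y in zip(a, b)
                     if not is_gap(x) and not is_gap(y)]
    mismatches = sum(1 for x, y in residue_pairs if x != y)

    if mode == "wildcard":
        # A gap matches anything; normalise by the shorter ungapped length.
        len_a = sum(1 for c in a if not is_gap(c))
        len_b = sum(1 for c in b if not is_gap(c))
        denominator = min(len_a, len_b)
    elif mode == "ignore_gapped_columns":
        # Columns containing a gap are left out of the calculation entirely.
        denominator = len(residue_pairs)
    elif mode == "gaps_as_mismatches":
        # A residue paired with a gap counts as a mismatch (gap != residue).
        gap_vs_residue = sum(1 for x, y in zip(a, b)
                             if is_gap(x) != is_gap(y))
        mismatches += gap_vs_residue
        denominator = len(residue_pairs) + gap_vs_residue
    else:
        raise ValueError(f"unknown mode: {mode}")

    if denominator == 0:
        return 0.0  # assumption: nothing comparable is reported as 0%
    return 100.0 * (1 - mismatches / denominator)


# Sequences 'A' and 'AA', aligned as 'A-' and 'AA':
print(percent_identity("A-", "AA", "wildcard"))               # 100.0
print(percent_identity("A-", "AA", "ignore_gapped_columns"))  # 100.0
print(percent_identity("A-", "AA", "gaps_as_mismatches"))     # 50.0
```

The first two modes agree on this pair but differ in general, because the wildcard mode normalises by the shorter sequence length while the second mode normalises by the number of ungapped columns.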
Issue Links
- depends on
  - JAL-790 Ambiguity codes not supported in PID calculation for trees (In Progress)
  - JAL-791 PID calculation for tree calculation includes gapped columns in the alignment length used for %age calculation (In Progress)
  - JAL-788 Generate percent identity scores for one or more pairs of sequences in an alignment (Open)
  - JAL-233 arbitrary measure of %age ID for a set of homologs (Open)
  - JAL-514 remove redundancy using either unaligned or aligned %age identity (Open)
  - JAL-838 extend jalview's PID routine to allow different types of PIDs to be calculated (In Progress)
- related with
  - JAL-1946 Support PID calculation that matches that done by Belvu (Open)
  - JAL-1632 Dialog to create/edit tree calculation settings (Closed)