[JAL-837] improve range of percent identity calculations and range of functions that use them - Jalview

Details

Type: Epic
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 3.0
Component/s: analysis
Labels: None

Description

Jalview has a fairly basic percent identity calculation function that, prior to 2.7.x, was used for calculating trees, sorting sequences according to similarity, and removing redundant sequences. This function used the following percent identity (PID) heuristic:

for sequences A and B, where A and B are strings that may include one or more gaps: return percent identity = 100 * (1 - (number of non-equivalent symbol pairs in which neither symbol is a gap) / Min(length(A), length(B))).

This heuristic works well for redundancy removal, but it does not always yield the expected results when calculating PIDs for trees, because gapped columns are treated as 'similar' rather than different (the sequences 'A' and 'AA' are 100% similar). For trees, phylogeneticists prefer to exclude gapped columns from the calculation entirely (the aligned portions of 'A' and 'AA' are 100% similar), whereas most biologists would expect a gap aligned to a residue to count as a mismatch (i.e. the sequences 'A' and 'AA' are 50% similar).

Issues related to this epic concern extensions to the calculation method, and to the GUI components associated with tree building and percent identity filtering/sorting. The aim is to allow these three distinct modes of percent identity calculation (gaps as wildcards, ignore columns with gaps, and count insertions as mismatches) to be used, where sensible and desired, for tree building, similarity matrix computation, sorting and redundancy removal.
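The three modes can be illustrated with a minimal Python sketch. This is not Jalview's implementation: the function name `percent_identity`, the mode names, the gap symbols, and the choice to pad the shorter sequence with gaps are all assumptions made for illustration.

```python
# Illustrative sketch only; names and gap symbols are assumptions, not Jalview's API.
GAP_CHARS = frozenset("-.")  # assumed gap symbols


def percent_identity(seq_a, seq_b, mode="wildcard"):
    """Percent identity of two aligned sequences under one of three gap modes.

    mode="wildcard":      a gap matches anything; the denominator is the
                          length of the shorter input string.
    mode="ignore_gaps":   columns with a gap in either sequence are skipped.
    mode="gaps_mismatch": a gap aligned to a residue counts as a mismatch.
    """
    # Assumption: a shorter sequence is padded with trailing gaps.
    width = max(len(seq_a), len(seq_b))
    a = seq_a.upper().ljust(width, "-")
    b = seq_b.upper().ljust(width, "-")

    matches = mismatches = gapped = 0
    for x, y in zip(a, b):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # empty in both sequences: ignored in every mode
        if x_gap or y_gap:
            gapped += 1  # gap aligned to a residue
        elif x == y:
            matches += 1
        else:
            mismatches += 1

    if mode == "wildcard":
        denominator = min(len(seq_a), len(seq_b))
        identical = denominator - mismatches
    elif mode == "ignore_gaps":
        denominator = matches + mismatches
        identical = matches
    elif mode == "gaps_mismatch":
        denominator = matches + mismatches + gapped
        identical = matches
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Assumption: report 0.0 when there is nothing to compare.
    return 100.0 * identical / denominator if denominator else 0.0


print(percent_identity("A-", "AA", "wildcard"))       # 100.0
print(percent_identity("A-", "AA", "ignore_gaps"))    # 100.0
print(percent_identity("A-", "AA", "gaps_mismatch"))  # 50.0
```

For the 'A' versus 'AA' example from the description, the first two modes report 100% and only the third reports 50%, which is the difference in expectations that this epic describes.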

Attachments

Issue Links

depends on

JAL-790 Ambiguity codes not supported in PID calculation for trees (In Progress)

JAL-791 PID calculation for tree calculation includes gapped columns in the alignment length used for %age calculation (In Progress)

JAL-788 Generate percent identity scores for one or more pairs of sequences in an alignment (Open)

JAL-233 arbitrary measure of %age ID for a set of homologs (Open)

JAL-514 remove redundancy using either unaligned or aligned %age identity (Open)

JAL-838 extend jalview's PID routine to allow different types of PIDs to be calculated (In Progress)

related with

JAL-1946 Support PID calculation that matches that done by Belvu (Open)

JAL-1632 Dialog to create/edit tree calculation settings (Closed)

Activity

People

Assignee: James Procter

Reporter: James Procter

Votes: 1

Watchers: 2

Dates

Created: 30/May/11 6:58 AM

Updated: 06/Mar/17 1:29 PM