  1. Jalview
  2. JAL-837

improve range of percent identity calculations and range of functions that use them

    Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.0
    • Component/s: analysis
    • Labels:
      None

      Description

      Jalview has a fairly basic percent identity (PID) calculation function that, prior to 2.7.x, was used for calculating trees, sorting sequences by similarity, and removing redundant sequences. This function used the following percent identity heuristic:

      For sequences A and B, where A and B are strings that may include gaps: percent identity = 100 × (1 − (number of non-equivalent symbol pairs in which neither symbol is a gap) / min(length(A), length(B))).
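      As a rough illustration, the heuristic above could be sketched as follows. This is not Jalview's actual code: the function name `legacy_pid`, the set of gap characters, the case-insensitive comparison and the handling of empty sequences are all assumptions, and the formula is read as 100 × (1 − mismatches / shorter length) so that the result is a percentage.

```python
# Hypothetical sketch of the legacy heuristic; not taken from the Jalview source.
GAP_CHARS = frozenset("-. ")  # assumed gap symbols; the issue does not list them


def legacy_pid(a: str, b: str) -> float:
    """Percent identity where any pair containing a gap is never a mismatch."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0  # assumption: empty input scores zero
    # zip() stops at the shorter string, so overhanging residues are ignored.
    mismatches = sum(
        1
        for x, y in zip(a, b)
        if x not in GAP_CHARS and y not in GAP_CHARS and x.upper() != y.upper()
    )
    return 100.0 * (1.0 - mismatches / shortest)
```

      Under these assumptions, `legacy_pid("A", "AA")` gives 100.0, because only the first column is compared and it matches.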

      Whilst this heuristic works well for redundancy removal, it does not always yield the expected results when calculating PID for trees, because columns containing a gap are counted as 'similar' rather than different (sequences 'A' and 'AA' are 100% similar). In this case, phylogeneticists prefer to exclude the gapped column from the calculation entirely (the aligned portions of sequences 'A' and 'AA' are 100% similar), but most biologists would expect a gap aligned with a residue to count as a mismatch (i.e. the sequences 'A' and 'AA' are 50% similar).

      Bugs related to this epic concern extensions to the calculation method, and to the GUI components associated with tree building and percent identity filtering/sorting, so that these three distinct modes of percent identity calculation (wildcard gaps, ignore columns with gaps, and include insertions as mismatches) can be used, where sensible and desired, for tree building, similarity matrix computation, sorting and redundancy removal.
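      To make the three modes concrete, here is a minimal sketch of how they might differ on a pair of already-aligned sequences. The names `GapMode` and `percent_identity` are hypothetical, the gap characters are assumed, and skipping columns that are gaps in both sequences is a design assumption rather than anything stated in this issue.

```python
from enum import Enum

GAP_CHARS = frozenset("-. ")  # assumed gap symbols


class GapMode(Enum):
    """Hypothetical names for the three modes listed in this epic."""
    WILDCARD = "wildcard gaps"
    IGNORE_COLUMNS = "ignore columns with gaps"
    MISMATCH = "include insertions as mismatches"


def percent_identity(a: str, b: str, mode: GapMode) -> float:
    """Percent identity of two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # assumption: gap/gap columns carry no information
        if x_gap or y_gap:
            if mode is GapMode.IGNORE_COLUMNS:
                continue  # drop the column from the calculation
            compared += 1
            if mode is GapMode.WILDCARD:
                matches += 1  # gap matches any residue
            continue  # in MISMATCH mode the column counts against identity
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

      With 'A' and 'AA' aligned as `"A-"` and `"AA"`, this sketch returns 100.0 for the wildcard and ignore-columns modes and 50.0 when insertions count as mismatches, which matches the expectations described above.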

              People

              Assignee:
              jprocter James Procter
              Reporter:
              jprocter James Procter
