Jalview / JAL-374

PID pairwise distance miscalculated as zero for unaligned sequences in UPGMA tree view

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Mantis ID:
      43773

      Description

      See attached example provided by Daron Standley.
      Daron Standley wrote:
      > I recently tried to draw a tree from an MSA in which some of the
      > sequences were completely unaligned (e.g. a sequence fragment that
      > aligned only to the N-terminal region and one that aligned only to the
      > C-terminal region). Oddly, these two sequences were placed together on
      > the (average distance % id) tree, even though their distance would be
      > undefined. This suggests that there is a bug in the Jalview tree code
      > that (perhaps) initializes the distance to zero. Attached is the MSA
      > that causes the problem. Two of the sequences that should not be close
      > together are 1w0tA and MOUSEAkirin2.
      Ah, yes. It does look like something strange is going on there, and this is a good test case. I verified Jalview's UPGMA implementation some time ago, and BLOSUM62 behaves as expected, so the problem is probably in the PID distance function when two sequences are completely unaligned. I'll add it to the bug tracker.

      Jim.
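
The suspected failure mode can be illustrated with a minimal sketch. This is not Jalview's code: the function names, the gap-character set, and the choice to report an undefined distance as NaN are all assumptions made for illustration, and Jalview's actual PID definition may treat gaps differently. The sequences are short toy fragments, not the attached alignment. The sketch contrasts a PID distance that reports "undefined" when two sequences share no aligned column with a hypothetical variant whose distance starts at zero and is never updated, which is the behaviour Daron's report suggests.

```python
import math

GAP_CHARS = "-. "  # assumed gap symbols; not taken from Jalview


def pid_distance(seq_a, seq_b):
    """Return 100 - percent identity over columns where both rows hold a residue.

    Returns NaN when the two rows share no such column, because the
    percent identity is then undefined (0 matches / 0 compared columns).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns where either row is a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return math.nan
    return 100.0 * (1 - matches / compared)


def naive_distance(seq_a, seq_b):
    """Hypothetical buggy variant: the distance is initialised to zero and
    left untouched when no column can be compared."""
    distance = 0.0
    shared = sum(
        1 for a, b in zip(seq_a, seq_b)
        if a not in GAP_CHARS and b not in GAP_CHARS
    )
    if shared > 0:
        distance = pid_distance(seq_a, seq_b)
    return distance


# Toy rows: an N-terminal fragment, a C-terminal fragment, and a full-length row.
n_term = "MACGATLK--------"
c_term = "--------KRQAWLWE"
full = "MAEGATLKKRQAWLWE"

print(pid_distance(n_term, c_term))    # nan: no shared columns
print(naive_distance(n_term, c_term))  # 0.0: the two fragments look identical
print(pid_distance(n_term, full))      # 12.5: 7 of 8 shared columns match
```

A distance of zero is the smallest possible value, so an average-distance (UPGMA) tree would join such a pair first, which matches the reported placement of 1w0tA next to MOUSEAkirin2. How an undefined distance should be handled (for example, substituting the maximum distance or excluding the pair) is a design decision this sketch does not settle.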



      ****** ADDITIONAL INFORMATION ******

      >4=3bqoA/1-202
      -------------------------------------------------------------EDAGLVAEAEA
      VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFEN
      DE-----RITPLESALMIWGSIEKEHD--KLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIF-----HMP
      FKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKR-------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      -------------------------------------------------------
      >7=HUMANTRF1/1-439
      MAEDVSSAAPSPRGCADGRDADPTEEQMAETERNDEEQFECQELLECQVQVGAPEEEEEEEEDAGLVAEAEA
      VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFEN
      DE-----RITPLESALMIWGSIEKEHD--KLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIFGDPNSHMP
      FKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKRT-----RTITSQD
      KPSGNDVEM----ETEANLDTRKSVSDKQSAVTESSEGTVSLLRSHKNLFLSKLQHGTQQQDLNKKERRVGT
      PQSTKKKKESRRAT-E------------SRIPVS--------------KSQPVTPEKH-----RARKRQAWL
      WEEDKNLRSGVRKYGEGNWSKILLHYKFNNRTSVMLKDRWRTMKKLKLISSDSED
      >5=MOUSETRF1/1-421
      MAETVSSAA---------RDAPSREGWTDSDSPEQEEVGDDAELLQCQLQLGTPREM----ENAELVAEVEA
      VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRDSAEAIIHGLHRLTAYQLKTVYICQFLTRVASGKALDAQFEV
      DE-----RITPLESALMIWNSIEKEHD--KLHDEIKNLIKIQAVAVCMEIGSFKEAEEVFERIFGDPEFYTP
      LERKLLKIISQKDVFHSLFQHFSYSCMMEKIQSYVGDVLSEKSSTFLMKAATKVVENEKA-----RTQASKD
      RPDATNTGM----DTEVGLNKEKSVNGQQSTETEPLVDTVSSIRSHKNA-LSQLKHRRAPSDFSRNEARTGT
      LQCETTMERNRRTSGR------------NRLCVS--------------ENQPDTDDKS-----GRRKRQTWL
      WEEDRILKCGVKKYGEGNWAKILSHYKFNNRTSVMLKDRWRTMKRLKLIS-----
      >1=MOUSEAkirin2/1-201
      MACGATLKRTLDFD-PLLSPASPKRR-----------------------------------RCAPLSAPASA
      AASP------AAATAAAAASAAAASPQKYLRMEPSPFGDVSSRLTTEQILYNIKQEYKRMQKRRHLEASFQQ
      ADPGCTSDSQPHAFLISGPASPGTSSATSSPLKKEQPLFTLRQVGM------------ICERLLKEREEKVR
      EEYEEILNTKLAEQYDAFVK-FTHDQIMRRY--------GEQPASYVS------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      -------------------------------------------------------
      >6=1w0tA/1-52
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------KRQAWL
      WEEDKNLRSGVRKYGEGNWSKILLHYKFNNRTSVMLKDRWRTMKKL---------
      >2=MOUSAkirin1/1-191
      MACGATLKRPMEFEAALLSPGSPKRR-----------------------------------RCAPLPGPTPG
      LRPP------DAEPPPLQMQTPPASLQ-----QPAPPGSERRLPTPEQIFQNIKQEYNRYQRWRHLEVVLSQ
      SE-ACTSETQPSSSALTAPGSPG-----AFWMKKDQPTFTLRQVGI------------ICERLLKDYEDKVR
      EEYEQILSTKLAEQYESFVK-FTHDQIMRRY--------GTRPTSYVS------------------------
      ------------------------------------------------------------------------
      ------------------------------------------------------------------------
      -------------------------------------------------------
      >3=XenopusTRF1/1-420
      MEEE---------------------------------------------------------TDGPPFDDTAA
      VATNWMCDFMFASMCFYFREDRTEDFQRSTHMLEWLLEGSQKLDA-HRKTIPIAQFLMRVAEGKNLDSQFDT
      DE-----SLTPLETALMAFNQIEEEEDLKHLHEEIELLLKVQAVVTCMEKGRFKLSAEILDRLFKESGSNKY
      LRMKLTMLIEKKDPYHEFLQNFTYAQMMKKIKSYIALKMKERPSVFLLKAAAKVVEATAKEELDIQSQESED
      CEQQTNESLENKDDNSSSEYEERDVLSLSNINHVENKEDISS-SDYEEA-AEQLK--VCNRDINQNELTNTT
      NIQETTEKSTKRHQRRLFSIAQRTPWNPDKPCTSKRLLSSINIGKNSKENQENVKDSRTEKPLNSKKRQHWT
      WEEDELLKKGVRKFGVGNWSKILLHYEFRNRTGVMLKDRWRTMKRLKIVDSDCDL

              People

              Assignee:
              gmungoc Mungo Carstairs
              Reporter:
              jprocter James Procter