Details
- Type: Bug
- Status: In Progress
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 2.4
- Fix Version/s: None
- Component/s: None
- Labels: None
- Mantis ID: 43773
Description
See attached example provided by Daron Standley.
Daron Standley wrote:
> I recently tried to draw a tree from an MSA in which some of the
> sequences were completely unaligned (e.g. a sequence fragment that
> aligned only to the N-terminal region and one that aligned only to the
> C-terminal region). Oddly, these two sequences were placed together on
> the (average distance % id) tree, even though their distance would be
> undefined. This suggests that there is a bug in the Jalview tree code
> that (perhaps) initializes the distance to zero. Attached is the MSA
> that causes the problem. Two of the sequences that should not be close
> together are 1w0tA and MOUSEAkirin2.
Ah, yes. It does look like something strange is going on there, and this is a good test case. I verified Jalview's UPGMA implementation some time ago, and it looks like BLOSUM62 behaves as expected, so it's probably some strangeness in the PID distance function when two sequences are completely unaligned. I'll add it to the bug tracker.
Jim.
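To make the suspected failure mode concrete, here is a minimal sketch of a percent-identity distance that treats a pair with no shared aligned columns as undefined rather than as zero. This is not Jalview's code: the function name `pid_distance`, the gap character set, and the choice to return `None` for the undefined case are all assumptions made for illustration, and the short sequences in the usage note are toy data, not the attached alignment.

```python
from typing import Optional

# Characters treated as gaps (an assumption; adjust to the alignment format in use).
GAP_CHARS = frozenset("-. ")


def pid_distance(seq_a: str, seq_b: str) -> Optional[float]:
    """Return 1 - (fraction of identical residues) over columns where both
    aligned sequences have a residue, or None if no such column exists."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match

    for a, b in zip(seq_a, seq_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1

    if compared == 0:
        # No overlapping columns: the distance is undefined, not zero.
        return None
    return 1.0 - identical / compared
```

With toy fragments such as `"MACGAT------"` and `"------KRQAWL"`, the function returns `None`, so a tree builder would have to decide explicitly how to handle the pair (for example, by assigning a maximal distance) instead of silently clustering the two sequences together.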
****** ADDITIONAL INFORMATION ******
>4=3bqoA/1-202
-------------------------------------------------------------EDAGLVAEAEA
VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFEN
DE-----RITPLESALMIWGSIEKEHD--KLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIF-----HMP
FKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKR-------------
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------------------------------------
>7=HUMANTRF1/1-439
MAEDVSSAAPSPRGCADGRDADPTEEQMAETERNDEEQFECQELLECQVQVGAPEEEEEEEEDAGLVAEAEA
VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFEN
DE-----RITPLESALMIWGSIEKEHD--KLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIFGDPNSHMP
FKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKRT-----RTITSQD
KPSGNDVEM----ETEANLDTRKSVSDKQSAVTESSEGTVSLLRSHKNLFLSKLQHGTQQQDLNKKERRVGT
PQSTKKKKESRRAT-E------------SRIPVS--------------KSQPVTPEKH-----RARKRQAWL
WEEDKNLRSGVRKYGEGNWSKILLHYKFNNRTSVMLKDRWRTMKKLKLISSDSED
>5=MOUSETRF1/1-421
MAETVSSAA---------RDAPSREGWTDSDSPEQEEVGDDAELLQCQLQLGTPREM----ENAELVAEVEA
VAAGWMLDFLCLSLCRAFRDGRSEDFRRTRDSAEAIIHGLHRLTAYQLKTVYICQFLTRVASGKALDAQFEV
DE-----RITPLESALMIWNSIEKEHD--KLHDEIKNLIKIQAVAVCMEIGSFKEAEEVFERIFGDPEFYTP
LERKLLKIISQKDVFHSLFQHFSYSCMMEKIQSYVGDVLSEKSSTFLMKAATKVVENEKA-----RTQASKD
RPDATNTGM----DTEVGLNKEKSVNGQQSTETEPLVDTVSSIRSHKNA-LSQLKHRRAPSDFSRNEARTGT
LQCETTMERNRRTSGR------------NRLCVS--------------ENQPDTDDKS-----GRRKRQTWL
WEEDRILKCGVKKYGEGNWAKILSHYKFNNRTSVMLKDRWRTMKRLKLIS-----
>1=MOUSEAkirin2/1-201
MACGATLKRTLDFD-PLLSPASPKRR-----------------------------------RCAPLSAPASA
AASP------AAATAAAAASAAAASPQKYLRMEPSPFGDVSSRLTTEQILYNIKQEYKRMQKRRHLEASFQQ
ADPGCTSDSQPHAFLISGPASPGTSSATSSPLKKEQPLFTLRQVGM------------ICERLLKEREEKVR
EEYEEILNTKLAEQYDAFVK-FTHDQIMRRY--------GEQPASYVS------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------------------------------------
>6=1w0tA/1-52
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------KRQAWL
WEEDKNLRSGVRKYGEGNWSKILLHYKFNNRTSVMLKDRWRTMKKL---------
>2=MOUSAkirin1/1-191
MACGATLKRPMEFEAALLSPGSPKRR-----------------------------------RCAPLPGPTPG
LRPP------DAEPPPLQMQTPPASLQ-----QPAPPGSERRLPTPEQIFQNIKQEYNRYQRWRHLEVVLSQ
SE-ACTSETQPSSSALTAPGSPG-----AFWMKKDQPTFTLRQVGI------------ICERLLKDYEDKVR
EEYEQILSTKLAEQYESFVK-FTHDQIMRRY--------GTRPTSYVS------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------------------------------------
>3=XenopusTRF1/1-420
MEEE---------------------------------------------------------TDGPPFDDTAA
VATNWMCDFMFASMCFYFREDRTEDFQRSTHMLEWLLEGSQKLDA-HRKTIPIAQFLMRVAEGKNLDSQFDT
DE-----SLTPLETALMAFNQIEEEEDLKHLHEEIELLLKVQAVVTCMEKGRFKLSAEILDRLFKESGSNKY
LRMKLTMLIEKKDPYHEFLQNFTYAQMMKKIKSYIALKMKERPSVFLLKAAAKVVEATAKEELDIQSQESED
CEQQTNESLENKDDNSSSEYEERDVLSLSNINHVENKEDISS-SDYEEA-AEQLK--VCNRDINQNELTNTT
NIQETTEKSTKRHQRRLFSIAQRTPWNPDKPCTSKRLLSSINIGKNSKENQENVKDSRTEKPLNSKKRQHWT
WEEDELLKKGVRKFGVGNWSKILLHYEFRNRTGVMLKDRWRTMKRLKIVDSDCDL