Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.11.2.6
Fix Version/s: 2.12.0
Component/s: file format issue
Labels: None
Urgency: Urgent
Description
Feature request from tweet: https://twitter.com/villena_francis/status/1634121992562520064
"@Jalview In a future update, could you include support for the "relaxed phylip format"? I've been able to open them in programs like AliView or IQ-TREE, but not in Jalview... Just opens the "strict" format with names limited to 10 characters 🥲"
Relaxed PHYLIP format removes the restriction to 10 characters for sequence names.
Names can be up to 250 characters long. The width of the name field is fixed within a file but varies between files: it is n, the length of the longest sequence name plus one for a separating space.
Sequence names cannot contain spaces.
Sequence names are right-padded to n characters with spaces.
Unfortunately, this means strict PHYLIP format is not simply a special case of relaxed PHYLIP: strict PHYLIP does not require a space between the sequence name and the sequence data, and its names can contain spaces.
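The rules above can be sketched in a few lines. This is only an illustration in Python, not Jalview code: the function names (split_strict, split_relaxed, write_relaxed) are hypothetical, only sequential (one row per sequence) input is handled, and the 250-character limit is taken from this description rather than checked against a specification.

```python
STRICT_NAME_WIDTH = 10   # fixed name field of strict PHYLIP
MAX_RELAXED_NAME = 250   # upper bound quoted in this issue (assumption)


def split_strict(line):
    """Strict PHYLIP: the first 10 columns are the name, the rest is data."""
    name = line[:STRICT_NAME_WIDTH].rstrip()
    data = line[STRICT_NAME_WIDTH:].replace(" ", "")
    return name, data


def split_relaxed(line):
    """Relaxed PHYLIP: the name ends at the first run of whitespace."""
    parts = line.split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected a name, whitespace, then sequence data")
    name, rest = parts
    if len(name) > MAX_RELAXED_NAME:
        raise ValueError("sequence name longer than 250 characters")
    # Drop any spaces used to group the residues into blocks.
    return name, "".join(rest.split())


def write_relaxed(records):
    """Right-pad every name to n = longest name + 1, after a header line."""
    records = list(records)
    for name, _ in records:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("relaxed PHYLIP names cannot contain spaces")
    n = max(len(name) for name, _ in records) + 1
    # Header: number of sequences, then number of columns.
    header = f" {len(records)} {len(records[0][1])}"
    rows = [name.ljust(n) + seq for name, seq in records]
    return "\n".join([header] + rows)
```

A strict row such as "ABCDEFGHIJACGT" (a 10-character name directly followed by data) is split correctly by split_strict but rejected by split_relaxed, which is why a relaxed reader cannot simply replace the strict one.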
Sources of information:
https://bioportal.bioontology.org/ontologies/EDAM?p=classes&conceptid=format_3819
https://bioperl.org/formats/alignment_formats/PHYLIP_multiple_alignment_format.html
http://www.phylo.org/index.php/help/relaxed_phylip
https://biopython.org/docs/1.75/api/Bio.AlignIO.PhylipIO.html
http://www.phylo.org/tools/obsolete/phylip.html
http://scikit-bio.org/docs/0.2.3/generated/skbio.io.phylip.html
Suggest getting reading working first (as a subclass of PHYLIP).
Suggest renaming PHYLIP to "Strict PHYLIP" and adding the new format as "Relaxed PHYLIP" (same file extension).