[JAL-835] support for 'insertion positions' in a3m files (was a2m, in UCSC speak) - Jalview

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.11.5.1
Component/s: file format issue
Labels: None

Epic Link: hmmer3

Description

Updated: the target format is now 'a3m', which is the suffix HH-suite prefers for the same format.

  Jalview can read the aligned FASTA format a2m (see attached), but it does not honour one feature of the format: "insertion positions", which formally mark regions of a sequence not involved in the alignment. This poses problems when a2m files are imported into Jalview to create coloured and annotated alignments for publication purposes.
The following description was provided by Saira Mian of lbl.gov:

  For example, I can use the SAM tool "align2model" to align the set of protein sequences "lmna.pep" to the HMM "lmna.mod" to generate "lmna.a2m" (file attached), and then create the more compact display "lmna.out" (attached). Note that 2XV5 and 3GEF are the sequences of RCSB/PDB records that correspond to different regions of one sequence in lmna.pep (HsapreLMNA_2):

align2model lmna -i lmna.mod -db lmna.pep -db 2XV5_A.fasta -db 3GEF_A.fasta ; prettyalign lmna.a2m -l80 -m4 > ! lmna.out

   As far as I can tell, Jalview can read an a2m file but cannot automatically convert the N-terminal, internal and C-terminal insertions (lowercase letters) into numbers specifying the length of each insertion (see lmna.out). The result is an unwieldy and excessively long alignment.
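
A minimal Python sketch of the requested behaviour, for illustration only. It assumes the usual a2m conventions (uppercase letters and '-' are match columns, lowercase letters are insertions, '.' pads insert columns in other rows). The function name compact_insertions and the "(n)" notation are hypothetical; they are not Jalview code and are not claimed to match the actual prettyalign output in lmna.out.

```python
import re

# A run of insert-state characters: lowercase residues plus '.' padding.
_INSERT_RUN = re.compile(r"[a-z.]+")


def compact_insertions(row: str) -> str:
    """Replace each insertion run in one a2m row with its residue count.

    Hypothetical helper: "(n)" is an illustrative notation only.
    """
    def _count(match: re.Match) -> str:
        # Only lowercase letters are real residues; '.' is padding.
        n = sum(1 for ch in match.group(0) if ch.islower())
        return f"({n})" if n else ""

    return _INSERT_RUN.sub(_count, row)


if __name__ == "__main__":
    # N-terminal, internal and C-terminal insertions in one made-up row.
    print(compact_insertions("mkvADE..fgh-KLqr"))  # (3)ADE(3)-KL(2)
```

Rows compacted this way keep only the match columns at full width, so the alignment is as wide as the model rather than as wide as the longest insertion.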