Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.11.2, 2.11.3.0, 2.11.3.1, 2.11.3.2, 2.11.3.3, 2.11.4.0
Fix Version/s: None
Component/s: file format issue
Labels:
Description
This bug affects the use of Jalview with GenBank/EMBL-format files that include non-coding genes.
Try to import the EMBL file at https://dag.compbio.dundee.ac.uk/daffodils/data/22TAempr.embl
The sequence is imported, and Calculate -> Get Cross-References -> EMBLCDS shows a CDS/protein alignment with 44 sequences, but errors like the following are shown on import:
ERROR - Ignoring CDS feature with no protein_id for EMBL:22TAempr
Opening the overview window shows the sequence has several light grey features corresponding to exons on the DNA.
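To see which CDS features trigger the "Ignoring CDS feature with no protein_id" error, and which other feature keys the file contains, the feature table of the .embl file can be scanned directly. The sketch below is a hypothetical diagnostic helper, not Jalview code and not a reproduction of Jalview's parser. It assumes the standard EMBL flat-file layout (FT lines, with a feature key starting in column 6 and qualifiers starting with `/`). It counts feature keys and lists the locations of CDS features that have no `/protein_id` qualifier.

```python
import sys
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Feature:
    key: str                      # e.g. "CDS", "rRNA", "gene"
    location: str                 # e.g. "join(10..50,60..90)"
    qualifiers: dict = field(default_factory=dict)


def parse_ft_lines(lines: Iterable[str]) -> List[Feature]:
    """Group EMBL 'FT' lines into features (assumes standard column layout)."""
    features: List[Feature] = []
    last_qual = None
    for raw in lines:
        if not raw.startswith("FT"):
            continue  # skip ID, XX, FH, SQ and sequence lines
        body = raw[5:]
        if body[:1].strip():
            # Non-blank character in column 6: a new feature key starts here.
            parts = body.split(None, 1)
            location = parts[1].strip() if len(parts) > 1 else ""
            features.append(Feature(parts[0], location))
            last_qual = None
            continue
        rest = body.strip()
        if not features or not rest:
            continue
        feat = features[-1]
        if rest.startswith("/"):
            name, _, value = rest[1:].partition("=")
            feat.qualifiers[name] = value.strip('"')
            last_qual = name
        elif last_qual is None:
            feat.location += rest  # location wrapped onto another line
        else:
            # Wrapped qualifier value (approximate: joined with a space).
            feat.qualifiers[last_qual] += " " + rest.strip('"')
    return features


def summarise(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count feature keys and list CDS locations lacking /protein_id."""
    features = parse_ft_lines(lines)
    counts = Counter(f.key for f in features)
    missing = [f.location for f in features
               if f.key == "CDS" and "protein_id" not in f.qualifiers]
    return counts, missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python embl_features.py 22TAempr.embl
    with open(sys.argv[1]) as fh:
        counts, missing = summarise(fh)
    for key, n in counts.most_common():
        print(f"{key}\t{n}")
    print(f"CDS features without /protein_id: {len(missing)}")
```

Comparing these counts with what the Overview window shows may help identify which feature types (for example rRNA) are not displayed after the EMBL import.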
However, the same data is also available as a FASTA file plus a GFF3 annotation file:
https://dag.compbio.dundee.ac.uk/daffodils/data/22TAempr.fasta
https://dag.compbio.dundee.ac.uk/daffodils/data/22TAempr.gff3
Loading 22TAempr.fasta and adding 22TAempr.gff3 works without errors. This time, Calculate -> Get Cross-References is not available, but the Overview window shows many more CDS annotations (light grey), as well as some rRNA features (dark blue).
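For a like-for-like comparison with the EMBL import, the feature types in 22TAempr.gff3 can be tallied. This is a hypothetical helper, not part of Jalview. It assumes tab-separated, nine-column GFF3 records with the feature type in column 3, and it stops at a `##FASTA` directive if one is present. A CDS split across exons usually appears as several rows sharing one ID, so these counts are per row rather than per gene.

```python
import sys
from collections import Counter
from typing import Iterable


def count_gff3_types(lines: Iterable[str]) -> Counter:
    """Tally the feature type (column 3) of each GFF3 record."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python gff3_types.py 22TAempr.gff3
    with open(sys.argv[1]) as fh:
        for ftype, n in count_gff3_types(fh).most_common():
            print(f"{ftype}\t{n}")
```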
The .embl file is produced by a gene-finding/annotation-transfer pipeline for DNA. The file may not be entirely correct, but if it is valid, it would be useful for Jalview to extract more of its data: at minimum, showing the locations of the genes and displaying the additional metadata included in the .embl file.