Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 2.11.2
Component/s: file format issue
Labels: None
Urgency: Urgent
Description
The GenBank file parser src/jalview/io/GenBankFile.java, due to be merged onto 2.11.2 and adapted, I think, from EmblFlatFile.java, requires a sequence id before it will open a file. The id is taken from the ACCESSION line, which has been seen to be missing in some files, and in that case the file is not opened at all.
A description of the GenBank file format can be seen at
https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html (as provided by David Martínez in JAL-1260), replicated at
https://www.ncbi.nlm.nih.gov/genbank/samplerecord/
and the original(?) description at https://europepmc.org/article/pmc/147205
These sources do not state explicitly whether the ACCESSION line is mandatory, although I believe it _should_ always be present.
However, the example file attached to JAL-1260
https://issues.jalview.org/plugins/servlet/com.redmoon.jira.documentvault/download-jira-document?issueId=12298&attId=10633
does NOT have an ACCESSION line and so failed to open.
The documentation describes the LOCUS line as giving the Locus ID (often, perhaps always, similar to the Sequence ID) as the first whitespace-delimited value after the "LOCUS" keyword. This should be a suitable fallback when no ACCESSION is available, although a "VERSION" value, if present, should probably be preferred over it. (A file with no ACCESSION line is unlikely to have a VERSION line, though.)
At a minimum, the LOCUS ID can stand in for the missing ACCESSION value so that the file can at least be opened.
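A minimal Java sketch of this fallback, assuming each header keyword (LOCUS, ACCESSION, VERSION) starts at column 1 of its line and that one record's lines are already available as a list. The class and method names (`GenBankIdSketch`, `firstValue`, `chooseSequenceId`) are hypothetical and not taken from Jalview's GenBankFile.java. It reads the order suggested above as ACCESSION first, then VERSION, then the LOCUS name, and it ignores continuation lines and multi-record files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Jalview's GenBankFile implementation
public class GenBankIdSketch
{
  /**
   * Returns the first whitespace-delimited value after a header keyword such
   * as "LOCUS", "ACCESSION" or "VERSION", or null if the line does not start
   * with that keyword or has no value after it.
   */
  static String firstValue(String line, String keyword)
  {
    if (!line.startsWith(keyword))
    {
      return null;
    }
    // the keyword must be followed by whitespace (or end the line), so that
    // "LOCUS" does not match a line starting "LOCUSX"
    if (line.length() > keyword.length()
            && !Character.isWhitespace(line.charAt(keyword.length())))
    {
      return null;
    }
    String rest = line.substring(keyword.length()).trim();
    return rest.isEmpty() ? null : rest.split("\\s+")[0];
  }

  /**
   * Picks a sequence id from one GenBank record's header: ACCESSION if
   * present, otherwise VERSION, otherwise the LOCUS name. Returns null only
   * if none of the three is found before ORIGIN or "//".
   */
  static String chooseSequenceId(List<String> lines)
  {
    String accession = null;
    String version = null;
    String locus = null;
    for (String line : lines)
    {
      if (line.startsWith("ORIGIN") || line.startsWith("//"))
      {
        break; // end of the header: sequence data or end of record follows
      }
      if (accession == null)
      {
        accession = firstValue(line, "ACCESSION");
      }
      if (version == null)
      {
        version = firstValue(line, "VERSION");
      }
      if (locus == null)
      {
        locus = firstValue(line, "LOCUS");
      }
    }
    if (accession != null)
    {
      return accession;
    }
    return version != null ? version : locus;
  }

  public static void main(String[] args)
  {
    // a record with a LOCUS line but no ACCESSION or VERSION line
    List<String> record = Arrays.asList(
            "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999",
            "ORIGIN",
            "        1 gatcctccat atacaacggt");
    System.out.println(chooseSequenceId(record)); // prints SCU49845
  }
}
```

Returning null rather than throwing leaves the caller to decide whether a record with no usable id at all should still be rejected; a record that has only a LOCUS line, like the JAL-1260 example, would then open with the Locus Name as its id.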
Issue Links
- related with: JAL-1260 Import genbank file (Closed)