Error trying to submit genome to ncbi

Emmanuel Nnadi
Hi,

I am trying to submit a genome that I annotated with MAKER to NCBI, and they sent back these errors:

[1] Please remove any N nucleotides from the beginning or end of the sequence.

[2] No feature should begin or end inside a gap.  Instead the feature should
be made partial at the gap boundary.

[3] Coding regions should not be 5' partial if they begin with the start
methionine.  If this is an internal methionine in the translation then
it is fine if they are partial.  Conversely, all coding regions
must have a stop codon or be 3' partial.

[4] You have a large number of gene features that are not associated
with other features.  Please include on these features in the
gene description field some description of what the gene would
have encoded.

A feature table example of this is:

<41156  >40652  gene
                        gene_desc       transposon
                        locus_tag       CR513_45338
                        note    nonfunctional due to frameshift

How can I use MAKER to solve this problem?


Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Error trying to submit genome to ncbi

Daniel Ence-2
Hi, I think you’ve posted before about issues 1 and 2 from NCBI. The note from NCBI about gene features that are not associated with other features sounds like there are gene features that don’t have associated transcript, CDS or exon features. I’m not certain how that could result from MAKER. It might be something that someone else created (manually or with another tool) and then passed to MAKER in a GFF file. In the example included in your email, it looks like the offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the “type” field (column 3) from “gene” to something else, perhaps “transposable_element”.
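
The column 3 change suggested above could be sketched roughly as follows. This is only an illustration, not a MAKER feature: the function name `retype_features` and the gene IDs in the usage note are hypothetical, and it assumes a standard 9-column, tab-separated GFF3 file in which each gene row carries an `ID=` attribute. Any child features that still point at a retyped gene through `Parent=` are left untouched and would need separate review.

```python
def retype_features(gff3_lines, ids_to_retype, new_type="transposable_element"):
    """Yield GFF3 lines, changing column 3 for 'gene' rows whose ID is listed.

    ids_to_retype is a set of gene IDs chosen by the user (hypothetical here).
    Comment lines and malformed rows are passed through unchanged.
    """
    for line in gff3_lines:
        stripped = line.rstrip("\n")
        cols = stripped.split("\t")
        if stripped.startswith("#") or len(cols) != 9:
            yield stripped
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "gene" and attrs.get("ID") in ids_to_retype:
            cols[2] = new_type
        yield "\t".join(cols)
```

It could be applied by reading the GFF3 file line by line, passing a set such as `{"geneA"}` (a made-up ID), and writing the yielded lines to a new file so the original annotation is preserved.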

~Daniel


Re: Error trying to submit genome to ncbi

Daniel Ence-2
Hi, thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you point to a subset of gene features that show the problem NCBI is reporting?

~Daniel




On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi <[hidden email]> wrote:

Hi Daniel, thanks for your reply.

I have attached my .tbl file.

In it you will see another example:

<77753  >77549  gene
                        locus_tag       CR513_00193
                        gene    AtMg00820
                        note    nonfunctional due to frameshift

It's becoming frustrating.

I have not posted these errors before:

[1] Please remove any N nucleotides from the beginning or end of the sequence.

[2] No feature should begin or end inside a gap.  Instead the feature should
be made partial at the gap boundary.

[3] Coding regions should not be 5' partial if they begin with the start
methionine.  If this is an internal methionine in the translation then
it is fine if they are partial.  Conversely, all coding regions
must have a stop codon or be 3' partial.

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.

Re: Error trying to submit genome to ncbi

Daniel Ence-2
These gene features with the “nonfunctional due to frameshift” note indeed do not have other features associated with them in the tbl files. Is this also reflected in the GFF3 files that MAKER produced for these annotations? I’m not certain how MAKER would make a gene without a CDS or mRNA, but identifying those discrepancies would be a good place to start understanding what has happened.
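
One way to check whether the GFF3 contains genes with no associated features is sketched below. It is a rough illustration under stated assumptions: the function name `genes_without_children` is made up for this example, and it assumes a standard 9-column GFF3 in which child features (mRNA and so on) reference their gene through a `Parent=` attribute.

```python
def genes_without_children(gff3_lines):
    """Return IDs of 'gene' rows that no other row names as its Parent."""
    gene_ids = []
    parents = set()
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip blank or malformed rows
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        # Parent may hold several comma-separated IDs.
        parents.update(p for p in attrs.get("Parent", "").split(",") if p)
    return [gene_id for gene_id in gene_ids if gene_id not in parents]
```

If this returns an empty list for the MAKER GFF3 while the tbl file still shows bare gene features, the features are probably being lost or altered during conversion rather than in the original annotation.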



On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi <[hidden email]> wrote:

Hi Daniel,

This is the mail they sent to me:

[1] Please remove any N nucleotides from the beginning or end of the sequence.

[2] No feature should begin or end inside a gap.  Instead the feature should
be made partial at the gap boundary.

[3] Coding regions should not be 5' partial if they begin with the start
methionine.  If this is an internal methionine in the translation then
it is fine if they are partial.  Conversely, all coding regions
must have a stop codon or be 3' partial.

[4] You have a large number of gene features that are not associated
with other features.  Please include on these features in the
gene description field some description of what the gene would
have encoded.

A feature table example of this is:

<41156  >40652  gene
                        gene_desc       transposon
                        locus_tag       CR513_45338
                        note    nonfunctional due to frameshift

[5] Every coding region must have a corresponding mRNA and in
every case the mRNA product name must match exactly that of the
CDS feature.

2 coding regions do not have an mRNA
ORIG/combined_1-5000.sqn:CDS    cytochrome c oxidase subunit 2  (contig_100:<38458-39198, 40429->40623)    CR513_00692
ORIG/combined_1-5000.sqn:CDS    cytochrome c oxidase subunit 1  (contig_100:c>113064-111485, c111245-111221)    CR513_00691

So I went to the .tbl file and searched for "nonfunctional due to frameshift". There are quite a lot of them, and I have two more .tbl files.
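
That search could be scripted so that the same check runs over every .tbl file. The sketch below is only illustrative: the function names are hypothetical, and it assumes the usual feature table layout in which location lines start in the first column ("start end key") and qualifier lines are indented ("qualifier value").

```python
def parse_tbl_features(tbl_lines):
    """Return a list of {'key': ..., 'qualifiers': [(name, value), ...]} dicts."""
    features = []
    for line in tbl_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(">Feature"):
            continue
        if line[0].isspace():
            # Indented qualifier line belonging to the most recent feature.
            parts = line.split(None, 1)
            if features:
                value = parts[1] if len(parts) > 1 else ""
                features[-1]["qualifiers"].append((parts[0], value))
        else:
            # Location line; a third column (the key) starts a new feature.
            cols = line.split()
            if len(cols) >= 3:
                features.append({"key": cols[2], "qualifiers": []})
    return features


def flagged_gene_locus_tags(tbl_lines, phrase="nonfunctional due to frameshift"):
    """Return locus_tags of gene features whose note contains the phrase."""
    tags = []
    for feature in parse_tbl_features(tbl_lines):
        quals = feature["qualifiers"]
        if feature["key"] == "gene" and any(
            name == "note" and phrase in value for name, value in quals
        ):
            tags.extend(value for name, value in quals if name == "locus_tag")
    return tags
```

The resulting locus_tag list gives a concrete subset of genes to look up in the GFF3 file.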

I used GAG to remove the N's and to add start and stop codons, but NCBI still complained.

I have run out of ideas.

Please help me.

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.

Re: Error trying to submit genome to ncbi

Carson Holt-2
If you modified the fasta files to remove N’s etc after they were annotated, then that would generate a mismatch between the GFF3 coordinates and the fasta sequence.

Have you modified or split contigs in the assembly in any way? I seem to remember you posting an issue about the fasta submission to NCBI previously.
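
To illustrate the mismatch: removing N's from the start of a contig moves every downstream base to a smaller coordinate, so the GFF3 positions have to be shifted by the same amount. The sketch below is a minimal illustration with hypothetical function names. It only handles N's at the very ends of a sequence and does not deal with features that overlap the removed bases or with internal gaps.

```python
def trim_terminal_ns(seq):
    """Return (trimmed_sequence, number_of_leading_Ns_removed)."""
    leading = len(seq) - len(seq.lstrip("Nn"))
    return seq.strip("Nn"), leading


def shift_gff3_line(line, leading_removed):
    """Shift start/end (columns 4 and 5) by the leading N count of the contig.

    leading_removed maps contig name -> number of leading Ns trimmed.
    Trailing Ns do not change coordinates, so only the leading count matters.
    """
    stripped = line.rstrip("\n")
    cols = stripped.split("\t")
    if stripped.startswith("#") or len(cols) != 9:
        return stripped
    shift = leading_removed.get(cols[0], 0)
    cols[3] = str(int(cols[3]) - shift)
    cols[4] = str(int(cols[4]) - shift)
    return "\t".join(cols)
```

If the sequence was trimmed but the annotation was not shifted like this, features end up on the wrong bases, which could produce frameshift and partial-CDS complaints of the kind NCBI reported.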

—Carson


Re: Error trying to submit genome to ncbi

Daniel Ence-2
In reply to this post by Daniel Ence-2
Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the FASTA files (transcripts or reference genome) to remove the “NNN” characters and then resubmitted without updating the GFF3 coordinates, that would result in these kinds of errors.

~Daniel

Re: Error trying to submit genome to ncbi

Daniel Ence-2
Hi Emmanuel, please “reply all” in these exchanges so that they stay stored on the maker-devel list for others to find in the future. It also keeps the conversation open so that others can chime in and help out too. :)

I looked at several of the “nonfunctional due to frameshift” genes, and they do have associated features in the GFF3 file. So either there is a frameshift issue in the original annotations, which I doubt, or a frameshift error is being introduced when you convert to the tbl format.


On Nov 2, 2017, at 5:12 PM, Emmanuel Nnadi <[hidden email]> wrote:

Hi Daniel,

NCBI first complained about this even before I had used GAG to remove the N's.

They complained about this on my raw file as well.

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
