maker_gff parameter - problem when gff contains fasta sequences

maker_gff parameter - problem when gff contains fasta sequences

Jacques Dainat-4
Dear Carson,

I’m using maker/3.01.02 with Open MPI.
I realised that the maker_gff option in maker_opts.ctl works well as long as no FASTA sequence is embedded in the GFF3 file.
e.g.:
```
###
##FASTA
>3098|quiver
TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
```
I ended up with GFF3 files in which the FASTA sequences are interleaved with the features: all contig1 features followed by the contig1 sequence, then all contig2 features followed by the contig2 sequence, and so on. (I mention this because there are also GFF3 files in which all the sequences are gathered at the end of the file.) In this case MAKER only takes into account the features that occur before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it.

I did not see any warning or error message, but the resulting annotation was obviously wrong: most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file after encountering a FASTA sequence?


Best regards,

/Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat

Contact — 
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: maker_gff parameter - problem when gff contains fasta sequences

Carson Holt-2
All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like “cat”, however, results in a broken file.
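
For a file that has already been concatenated, a rough repair could look like the sketch below. This is not what gff3_merge does internally; it is only an illustration of moving every FASTA block into one trailing ##FASTA section. The function name `move_fasta_to_end` is hypothetical, and the rule used to decide that annotation has resumed (a line starting with "#" or having nine tab-separated columns) is an assumption, not something taken from MAKER. It does not resolve duplicate IDs or duplicate sequences, so gff3_merge remains the safer route for real merges.

```python
def move_fasta_to_end(lines):
    """Return GFF3 lines with all FASTA blocks gathered in one trailing ##FASTA section.

    Assumption (not from MAKER): inside a FASTA block, a line that starts with
    '#' or has at least nine tab-separated columns means annotation has resumed.
    """
    annotation, fasta = [], []
    in_fasta = False
    seen_version = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "##FASTA":
            in_fasta = True
            continue
        if in_fasta and not line.startswith(">"):
            # Sequence lines contain no tabs and do not start with '#'.
            if line.startswith("#") or line.count("\t") >= 8:
                in_fasta = False
        if in_fasta:
            if line.strip():  # drop blank lines inside FASTA blocks
                fasta.append(line)
            continue
        if line.startswith("##gff-version"):
            # Keep only the first version pragma of a concatenated file.
            if seen_version:
                continue
            seen_version = True
        annotation.append(line)
    if fasta:
        return annotation + ["##FASTA"] + fasta
    return annotation
```

The function takes any iterable of lines, so it can be fed an open file handle and its result written back out with one line per element.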

—Carson 


Re: maker_gff parameter - problem when gff contains fasta sequences

Carson Holt-2
Here is the relevant part of the format specification:
##FASTA

This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed.
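
As an illustration of that rule, here is a minimal sketch of a check that could be run on a GFF3 file before handing it to maker_gff. The function name `check_fasta_placement` is hypothetical, and the test for "not FASTA" (a line that starts with "#" or contains a tab) is a simplifying assumption rather than a full validation of the specification.

```python
def check_fasta_placement(lines):
    """Return (line_number, text) pairs for non-FASTA content after the first ##FASTA.

    Assumption: after ##FASTA, any line starting with '#' or containing a tab
    is treated as annotation content that the specification does not allow there.
    """
    problems = []
    in_fasta = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not in_fasta:
            # Everything before the first ##FASTA directive is annotation.
            in_fasta = line.strip() == "##FASTA"
            continue
        if not line.strip() or line.startswith(">"):
            continue  # blank line or FASTA header
        if line.startswith("#") or "\t" in line:
            problems.append((number, line))
    return problems
```

An empty result means no features or directives were found after the first ##FASTA line; a non-empty result lists the lines that a parser following the specification would be entitled to ignore or reject.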



—Carson


Re: maker_gff parameter - problem when gff contains fasta sequences

Jacques Dainat-4
In reply to this post by Carson Holt-2
Thank you for your quick answer. You are right, I should have read the GFF3 specification more carefully.
I will investigate which step I modified that introduced the problem.
Thanks again.

/Jacques
