Maker GFF output with features of 0 length

Maker GFF output with features of 0 length

Marc Höppner-3
Hi,

I’ve - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. 

For example:

scaffold_2927   maker   CDS     13013   13013   .       +       1       ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1


This occurs seemingly at random for all sorts of feature types, and I have only seen it when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or an NFS problem with syncing data? Maker runs fine on our system, but that doesn’t mean that there aren’t any cryptic issues that only rear their heads on occasions like these… Regarding the frequency: out of 450,000 GFF lines, 270 were affected in the case that I looked into most closely. So it is pretty rare, but still...

I am currently using Maker with openmpi-1.7.4, and the file system is mounted over NFS4 and IPoIB. I have now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.
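
For anyone who wants to run the same check on their own output, here is a minimal Python sketch (not part of Maker) that counts features whose start equals their end and tallies them by feature type. GFF3 coordinates are 1-based and inclusive, so such a feature is one base long. The function names and the shortened IDs in the sample are hypothetical, and the parser assumes nine whitespace-separated columns and stops at a ##FASTA section if one is present.

```python
from collections import Counter


def iter_features(lines):
    """Yield the nine GFF3 columns of every feature line."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):  # sequence section follows; stop here
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 8)  # tolerate tabs or runs of spaces
        if len(cols) == 9:
            yield cols


def summarize_single_base(lines):
    """Return (total feature lines, Counter of single-base features by type)."""
    total = 0
    by_type = Counter()
    for cols in iter_features(lines):
        total += 1
        if int(cols[3]) == int(cols[4]):  # start == end, i.e. one base long
            by_type[cols[2]] += 1
    return total, by_type


# Hypothetical two-feature sample; IDs are shortened for illustration.
sample = [
    "##gff-version 3",
    "scaffold_2927 maker CDS 13013 13013 . + 1 ID=a:cds;Parent=a",
    "scaffold_2927 maker exon 12000 12500 . + . ID=a:exon:1;Parent=a",
]
total, by_type = summarize_single_base(sample)
print(total, dict(by_type))
```

To scan a real file, pass an open file handle instead of the sample list, since the functions only need an iterable of lines.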

Regards,

Marc



_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Maker GFF output with features of 0 length

Carson Holt-2
Were you using the GFF3 pass-through or correct_est_fusion options? When you rerun, do the same features still have lengths of zero (i.e. is it random or is it reproducible)?

--Carson


From: Marc Höppner <[hidden email]>
Date: Wednesday, July 30, 2014 at 4:44 AM
To: <[hidden email]>
Subject: [maker-devel] Maker GFF output with features of 0 length


Re: Maker GFF output with features of 0 length

Carson Holt-2
One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDSs where this happens?
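
The terminal-versus-internal question can be answered mechanically. The following is a rough Python sketch, not a Maker utility: it groups CDS segments by their Parent attribute, sorts them by coordinate, and labels every single-base segment as terminal (first or last in the transcript) or internal. The function names and the sample transcripts tx1 and tx2 are made up for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def classify_single_base_cds(lines):
    """Return (parent, position, 'terminal' or 'internal') for 1 bp CDS segments."""
    cds_by_parent = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        # A feature may list several parents separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                cds_by_parent[parent].append((start, end))

    results = []
    for parent, segments in cds_by_parent.items():
        segments.sort()  # genomic order; first and last are terminal on either strand
        for index, (start, end) in enumerate(segments):
            if start == end:
                is_terminal = index in (0, len(segments) - 1)
                results.append((parent, start, "terminal" if is_terminal else "internal"))
    return results


# Hypothetical sample: tx1 has an internal 1 bp CDS, tx2 a terminal one.
sample = [
    "s1 maker CDS 100 200 . + 0 ID=tx1:cds;Parent=tx1",
    "s1 maker CDS 300 300 . + 1 ID=tx1:cds;Parent=tx1",
    "s1 maker CDS 400 500 . + 2 ID=tx1:cds;Parent=tx1",
    "s1 maker CDS 10 10 . - 0 ID=tx2:cds;Parent=tx2",
    "s1 maker CDS 50 90 . - 1 ID=tx2:cds;Parent=tx2",
]
results = classify_single_base_cds(sample)
print(results)
```

Only the segments reported as internal would be unexpected under the explanation above; terminal ones may simply be what the predictor produced.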

--Carson


From: Carson Holt <[hidden email]>
Date: Tuesday, August 5, 2014 at 2:21 PM
To: Marc Höppner <[hidden email]>, <[hidden email]>
Subject: Re: [maker-devel] Maker GFF output with features of 0 length


Re: Maker GFF output with features of 0 length

Marc Höppner-3
Hi,

I suspect that Augustus plays a role, since the affected features are seeded by Augustus (based on the name, anyway).

What I found is that this seems to happen only when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). It also seems to be quite reproducible, and it doesn’t only affect CDS features.

I have put the Maker output for a test scaffold here:


The problematic lines: 
scaffold_563    maker   five_prime_UTR  38501   38501   .       -       .       ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563    maker   exon    69967   69967   .       -       .       ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563    maker   CDS     69967   69967   .       -       1       ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1


Strange stuff…

Regards,

Marc

On 05 Aug 2014, at 22:49, Carson Holt <[hidden email]> wrote:


Re: Maker GFF output with features of 0 length

Carson Holt-2
In reply to this post by Marc Höppner-3
If it is happening only with GFF3 pass-through, then it may be something I saw and fixed a while ago (there have been some GFF3 pass-through fixes since 2.31.4). Could you check whether it still happens in 2.31.6? Also, if it is only the first or last CDS/exon, then Augustus can do that and it's not actually a bug. Basically it is truncating the model to the start/stop codon, so the first or last exon/CDS may appear short, but it's really just incomplete. If you can find any example of a non-CDS/exon feature, could you send it to me?

Thanks,
Carson



Re: Maker GFF output with features of 0 length

Carson Hinton Holt
In reply to this post by Marc Höppner-3
OK. I took a look and I'm relatively sure the issue you are seeing is caused by GFF3 pass-through combined with correct_est_fusion=1. This only happens when both are used simultaneously, and it should be corrected in the current version of MAKER.
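
A quick way to see whether a run used this combination is to inspect its control file. The Python sketch below is only an illustration and not part of MAKER. It assumes the usual key=value layout of maker_opts.ctl with # comments, and it assumes that est_gff and protein_gff are the pass-through options in question; check the key names against your own control file. The file names in the sample are hypothetical.

```python
# Assumed names of the GFF3 pass-through options for pre-aligned evidence.
PASS_THROUGH_KEYS = ("est_gff", "protein_gff")


def parse_ctl(lines):
    """Parse key=value lines, dropping anything after a '#' comment marker."""
    opts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def uses_risky_combination(opts):
    """True if pre-aligned GFF3 evidence is set together with correct_est_fusion=1."""
    has_gff_evidence = any(opts.get(key) for key in PASS_THROUGH_KEYS)
    return has_gff_evidence and opts.get("correct_est_fusion") == "1"


# Hypothetical control-file excerpt.
sample_ctl = [
    "est_gff=previous_run_est2genome.gff #aligned ESTs from an external GFF3 file",
    "protein_gff= #aligned protein evidence from an external GFF3 file",
    "correct_est_fusion=1 #limits use of ESTs to avoid fusion genes",
]
flagged = uses_risky_combination(parse_ctl(sample_ctl))
print(flagged)
```

If the check flags a run made with an affected version, rerunning with a current MAKER release, as suggested above, is the first thing to try.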

Thanks,
Carson


From: Marc Höppner <[hidden email]>
Date: Wednesday, August 6, 2014 at 12:14 AM
To: Carson Holt <[hidden email]>
Cc: <[hidden email]>
Subject: Re: [maker-devel] Maker GFF output with features of 0 length
