GFF3 Is_circular

GFF3 Is_circular

Andrew McArthur
Hello all,

The definition of GFF3 at the Sequence Ontology site (http://www.sequenceontology.org/gff3.shtml) now has format definitions for supporting circular molecules such as plasmids or bacterial genomes.  This is done using a new Is_circular flag in the GFF3 attributes field.  Notably, "For features that cross the origin of a circular feature (e.g. most bacterial genomes, plasmids, and some viral genomes), the requirement for start to be less than or equal to end is satisfied by making end = the position of the end + the length of the landmark feature."
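
For illustration, the quoted rule could be sketched as below. This is a minimal sketch and not code from the specification or from any GMOD tool; the function name and the 1,000 bp plasmid are hypothetical.

```python
def circular_gff3_coords(start, end, landmark_length):
    """Return (start, end) for a GFF3 line on a circular landmark.

    Coordinates are 1-based and inclusive, as in GFF3. If the feature
    crosses the origin (end < start), the landmark length is added to
    end so that start <= end still holds.
    """
    if not (1 <= start <= landmark_length and 1 <= end <= landmark_length):
        raise ValueError("start and end must lie within the landmark")
    if end < start:  # feature wraps around the origin
        end += landmark_length
    return start, end


# Hypothetical 1,000 bp plasmid with a gene running from 950 round to 50.
print(circular_gff3_coords(950, 50, 1000))   # (950, 1050)
print(circular_gff3_coords(100, 200, 1000))  # (100, 200)
```

A feature that does not cross the origin is returned unchanged, so the same helper can be applied to every row of a file.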

Do Chado 1.1 and gmod_bulk_load_gff3.pl support this change to GFF3, or should I wait before changing my GFF3 files?

Thanks,
Andrew McArthur

------
Andrew G. McArthur, Ph.D.
Bioinformatics Consulting Services
Phone: 905.296.3252, Mobile: 905.745.2794, Fax: 647.439.0829
AIM: [hidden email], Skype: agmcarthur




------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: GFF3 Is_circular

Chris Mungall-3

Chado best practices should follow GFF3 here.

Are any changes to either the schema or the loaders required here?
For the former I think not: the fmin<=fmax constraint is never violated
and the is_circular attribute would become an embedded featureprop
like any other attribute. I'm less familiar with the loaders, and with
any downstream software that assumes fmax <= srcfeature.length.
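
To make the two conditions concrete, here is a rough sketch (not loader code). It assumes that Chado's featureloc stores interbase coordinates, so that GFF3 start/end map to fmin = start - 1 and fmax = end; the function names and the 1,000 bp landmark are invented for illustration.

```python
def gff3_to_featureloc(start, end):
    """Map 1-based inclusive GFF3 coordinates to interbase (fmin, fmax)."""
    fmin, fmax = start - 1, end
    if fmin > fmax:
        raise ValueError("fmin <= fmax would be violated")
    return fmin, fmax


def exceeds_srcfeature(fmax, srcfeature_length):
    """True if the location runs past the end of its source feature."""
    return fmax > srcfeature_length


# Hypothetical origin-spanning feature on a 1,000 bp circular landmark,
# written the GFF3 way (end = 50 + 1000).
fmin, fmax = gff3_to_featureloc(950, 1050)
print(fmin, fmax)                      # 949 1050
print(exceeds_srcfeature(fmax, 1000))  # True
```

The first check passes for an origin-spanning feature, while the second is the assumption that downstream software might trip over.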

On Jul 14, 2010, at 9:17 AM, Andrew McArthur wrote:

> Hello all,
>
> The definition of GFF3 at the Sequence Ontology site (http://www.sequenceontology.org/gff3.shtml 
> ) now has format definitions for supporting circular molecules such  
> as plasmids or bacterial genomes.  This is done using a new  
> Is_circular flag in the GFF3 attributes field.  Notably, "For  
> features that cross the origin of a circular feature (e.g. most  
> bacterial genomes, plasmids, and some viral genomes), the  
> requirement for start to be less than or equal to end is satisfied  
> by making end = the position of the end + the length of the landmark  
> feature."
>
> Are Chado 1.1 and gmod_bulk_load_gff3.pl supporting this change to  
> GFF3 or should I wait before changing my GFF3 files?
>
> Thanks,
> Andrew McArthur
>
> ------
> Andrew G. McArthur, Ph.D.
> Bioinformatics Consulting Services
> Email: [hidden email], Web: http://mcarthurlab.blogspot.com
> Phone: 905.296.3252, Mobile: 905.745.2794, Fax: 647.439.0829
> AIM: [hidden email], Skype: agmcarthur
>
>
>

Re: GFF3 Is_circular

Fields, Christopher J
Not sure on the chado end, but it's still very possible this is problematic on the bioperl end, so any bp-reliant stuff might be affected.  I think there are some current assumptions with Bio::Location that run counter to this ATM, but it needs examples/testing (I think I have a bug report on this somewhere....)

chris

On Jul 14, 2010, at 12:55 PM, Chris Mungall wrote:

>
> Chado best practices should follow GFF3 here.
>
> Are any changes required to either schema or loaders required here?  
> For the former I think no, the fmin<=fmax constraint is never violated  
> and the is_circular attribute would become an embedded featureprop  
> like any other attribute. I'm less familiar with the loaders, and with  
> any downstream software that makes assumptions that fmax <=  
> srcfeature.length
>
>
>

Re: GFF3 Is_circular

lpritc@scri.ac.uk
In reply to this post by Chris Mungall-3
Hi,

That's good news (for me, at least ;)) about the is_circular attribute!

On 14/07/2010 Wednesday, July 14, 18:55, "Chris Mungall"
<[hidden email]> wrote:

> Chado best practices should follow GFF3 here.
>
> Are any changes required to either schema or loaders required here?
> For the former I think no, the fmin<=fmax constraint is never violated

Is that strictly true?

For a concrete example, one problem I've had with CHADO  has been with the
N.equitans NEQ001 feature in NC_005213:

    gene            complement(join(490883..490885,1..879))
                     /locus_tag="NEQ001"
                     /db_xref="GeneID:2732620"

Which is, in the NCBI GFF3 file:

NC_005213.1     RefSeq  gene    1       879     .       -       .       locus_tag=NEQ001;db_xref=GeneID:2732620
NC_005213.1     RefSeq  gene    490883  490885  .       -       .       locus_tag=NEQ001;db_xref=GeneID:2732620
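
As a sketch of how such split features might be spotted in a file, the snippet below groups GFF3 rows by an attribute and reports any value that occurs on more than one row. It is not part of any loader; the function name is made up, and it assumes tab-separated columns and simple key=value attributes.

```python
from collections import defaultdict

gff3_text = """\
NC_005213.1\tRefSeq\tgene\t1\t879\t.\t-\t.\tlocus_tag=NEQ001;db_xref=GeneID:2732620
NC_005213.1\tRefSeq\tgene\t490883\t490885\t.\t-\t.\tlocus_tag=NEQ001;db_xref=GeneID:2732620
"""


def split_features(text, key="locus_tag"):
    """Group GFF3 rows by an attribute and return those with >1 row."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        groups[attrs.get(key)].append((int(cols[3]), int(cols[4])))
    return {k: sorted(v) for k, v in groups.items() if k and len(v) > 1}


print(split_features(gff3_text))
# {'NEQ001': [(1, 879), (490883, 490885)]}
```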

While the GFF3 spec allows for the connection of two features at the same
level of the hierarchy by their ID attributes, my understanding is that the
CHADO schema and the loaders do not respect this, as the feature ID
contributes to a composite key (I have had to work around this issue when
adding annotation features that span exons but do not correspond to
introns).  

In the above example, the individual gene features shown do not violate
fmin<=fmax, but a parent feature - which I believe would currently be
necessary for CHADO to unite them into the single gene that they are -
*would* violate this relationship.

If my understanding is wrong, or there's an alternative solution for uniting
the two features across the origin of replication, I'd be glad of some
pointers.

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:[hidden email]       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify [hidden email] quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


Re: GFF3 Is_circular

Peter-2-2
On Tue, Jul 20, 2010 at 8:50 AM, Leighton Pritchard <[hidden email]> wrote:

>
> Hi,
>
> That's good news (for me, at least ;)) about the is_circular attribute!
>
> On 14/07/2010 Wednesday, July 14, 18:55, "Chris Mungall"
> <[hidden email]> wrote:
>
>> Chado best practices should follow GFF3 here.
>>
>> Are any changes required to either schema or loaders required here?
>> For the former I think no, the fmin<=fmax constraint is never violated
>
> Is that strictly true?
>
> For a concrete example, one problem I've had with CHADO  has been with the
> N.equitans NEQ001 feature in NC_005213:
>
>    gene            complement(join(490883..490885,1..879))
>                     /locus_tag="NEQ001"
>                     /db_xref="GeneID:2732620"
>
> Which is, in the NCBI GFF3 file:
>
> NC_005213.1     RefSeq  gene    1       879     .       -       .
> locus_tag=NEQ001;db_xref=GeneID:2732620
> NC_005213.1     RefSeq  gene    490883  490885  .       -       .
> locus_tag=NEQ001;db_xref=GeneID:2732620
>
> While the GFF3 spec allows for the connection of two features at the same
> level of the hierarchy by their ID attributes, my understanding is that the
> CHADO schema and the loaders do not respect this, as the feature ID
> contributes to a composite key (I have had to work around this issue when
> adding annotation features that span exons but do not correspond to
> introns).
>
> In the above example, the individual gene features shown do not violate
> fmin<=fmax, but a parent feature - which I believe would currently be
> necessary for CHADO to unite them into the single gene that they are -
> *would* violate this relationship.
>
> If my understanding is wrong, or there's an alternative solution for uniting
> the two features across the origin of replication, I'd be glad of some
> pointers.
>

I would guess that this NCBI GFF3 file would have to be updated to follow
the new specification, and do this as a single entry (note 879 + 490885
is 491764) instead:

NC_005213.1     RefSeq  gene    490883  491764  .       -       .       locus_tag=NEQ001;db_xref=GeneID:2732620

Since the end point 491764 is greater than the circular genome length
of 490885, it is implicit that the feature continues round the origin
for another 491764 - 490885 = 879 bases. My counting may be off by
one; I haven't double-checked that.
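
The arithmetic can be checked with a short sketch. It assumes the genome length is 490885, as the join in the GenBank record implies, and the function name is invented for illustration. The merged span should be as long as the two original segments combined.

```python
def merge_across_origin(segments, genome_length):
    """Merge two segments that meet at the origin into one GFF3 span.

    segments: 1-based inclusive (start, end) pairs, one ending at
    genome_length and one starting at 1.
    """
    tail = next(s for s in segments if s[1] == genome_length)
    head = next(s for s in segments if s[0] == 1)
    merged = (tail[0], head[1] + genome_length)
    # Sanity check: merged length must equal the summed segment lengths.
    total = sum(e - s + 1 for s, e in (tail, head))
    assert merged[1] - merged[0] + 1 == total
    return merged


print(merge_across_origin([(1, 879), (490883, 490885)], 490885))
# (490883, 491764)
```

The two segments cover 3 + 879 = 882 bases and the merged span 490883..491764 also covers 882, so under these assumptions the single-line coordinates are not off by one.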

If I am right, someone from the GFF3 committee should ask the NCBI
nicely to update their GFF3 files.

Peter


Re: GFF3 Is_circular

lpritc@scri.ac.uk
Hi,

On 20/07/2010 Tuesday, July 20, 10:48, "Biopython-1"
<[hidden email]> wrote:

> On Tue, Jul 20, 2010 at 8:50 AM, Leighton Pritchard <[hidden email]> wrote:
>> On 14/07/2010 Wednesday, July 14, 18:55, "Chris Mungall"
>> <[hidden email]> wrote:
>>
>>> Chado best practices should follow GFF3 here.
>> For a concrete example, one problem I've had with CHADO  has been with the
>> N.equitans NEQ001 feature in NC_005213:
>>
>>    gene            complement(join(490883..490885,1..879))
>>                     /locus_tag="NEQ001"
>>                     /db_xref="GeneID:2732620"
>>
>> Which is, in the NCBI GFF3 file:
>>
>> NC_005213.1     RefSeq  gene    1       879     .       -       .
>> locus_tag=NEQ001;db_xref=GeneID:2732620
>> NC_005213.1     RefSeq  gene    490883  490885  .       -       .
>> locus_tag=NEQ001;db_xref=GeneID:2732620
[...]
>> In the above example, the individual gene features shown do not violate
>> fmin<=fmax, but a parent feature - which I believe would currently be
>> necessary for CHADO to unite them into the single gene that they are -
>> *would* violate this relationship.

I'm wrong about the above - apologies for my misunderstanding - a new parent
feature would span the origin using the new specification quite nicely, as
Peter outlines below:
 
> I would guess that this NCBI GFF3 file would have to be updated to follow
> the new specification, and do this as a single entry (note 879 + 490885
> is 491764) instead:
>
> NC_005213.1     RefSeq  gene    490883  491764  .       -       .
>  locus_tag=NEQ001;db_xref=GeneID:2732620

[...]

> If I am right, someone from the GFF3 committee should ask the NCBI
> nicely to update their GFF3 files.

That seems like the appropriate fix for my problem (not that it's too hard
to hack a fix together just now).

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:[hidden email]       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



Re: GFF3 Is_circular

Fields, Christopher J
On Jul 20, 2010, at 6:47 AM, Leighton Pritchard wrote:

> Hi,
>
> On 20/07/2010 Tuesday, July 20, 10:48, "Biopython-1"
> <[hidden email]> wrote:
>
> [...]
> [...]
>
>> If I am right, someone from the GFF3 committee should ask the NCBI
>> nicely to update their GFF3 files.
>
> That seems like the appropriate fix for my problem (not that it's too hard
> to hack a fix together just now).
>
> Cheers,
>
> L.

There was supposedly some movement on this around Jan 2010, but I'm not sure how much progress there has been (it seems like very little).  I don't use their autogenerated GFF3; I've been told it's not a good idea to use it, with the possible exception of RefSeq, and even that has issues.

chris



Re: GFF3 Is_circular

Scott Cain
In reply to this post by lpritc@scri.ac.uk
Hi Leighton,

You are correct about the grouping by ID, but as Peter pointed out,
you shouldn't need it in this instance anyway.

Scott


On Tue, Jul 20, 2010 at 3:50 AM, Leighton Pritchard <[hidden email]> wrote:

> Hi,
>
> That's good news (for me, at least ;)) about the is_circular attribute!
>
> On 14/07/2010 Wednesday, July 14, 18:55, "Chris Mungall"
> <[hidden email]> wrote:
>
>> Chado best practices should follow GFF3 here.
>>
>> Are any changes required to either schema or loaders required here?
>> For the former I think no, the fmin<=fmax constraint is never violated
>
> Is that strictly true?
>
> For a concrete example, one problem I've had with CHADO  has been with the
> N.equitans NEQ001 feature in NC_005213:
>
>    gene            complement(join(490883..490885,1..879))
>                     /locus_tag="NEQ001"
>                     /db_xref="GeneID:2732620"
>
> Which is, in the NCBI GFF3 file:
>
> NC_005213.1     RefSeq  gene    1       879     .       -       .
> locus_tag=NEQ001;db_xref=GeneID:2732620
> NC_005213.1     RefSeq  gene    490883  490885  .       -       .
> locus_tag=NEQ001;db_xref=GeneID:2732620
>
> While the GFF3 spec allows for the connection of two features at the same
> level of the hierarchy by their ID attributes, my understanding is that the
> CHADO schema and the loaders do not respect this, as the feature ID
> contributes to a composite key (I have had to work around this issue when
> adding annotation features that span exons but do not correspond to
> introns).
>
> In the above example, the individual gene features shown do not violate
> fmin<=fmax, but a parent feature - which I believe would currently be
> necessary for CHADO to unite them into the single gene that they are -
> *would* violate this relationship.
>
> If my understanding is wrong, or there's an alternative solution for uniting
> the two features across the origin of replication, I'd be glad of some
> pointers.
>
> Cheers,
>
> L.
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: GFF3 Is_circular

Scott Cain
In reply to this post by Fields, Christopher J
Yeah, I have vague concerns about BioPerl as well.  The only way to be
sure is to just try it :-)

Scott


On Wed, Jul 14, 2010 at 2:17 PM, Chris Fields <[hidden email]> wrote:

> Not sure on the chado end, but it's still very possible this is problematic on the bioperl end, so any bp-reliant stuff might be affected.  I think there are some current assumptions with Bio::Location that run counter to this ATM, but it needs examples/testing (I think I have a bug report on this somewhere....)
>
> chris
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: GFF3 Is_circular

Fields, Christopher J
Easy enough to test for.  The phiX174 sequence is a good test case (and it's a classic):

http://www.ncbi.nlm.nih.gov/nuccore/NC_001422.1
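
A test along these lines would probably need the inverse of the GFF3 rule, turning a span that runs past the landmark length back into linear segments. The sketch below is only illustrative. The function name is hypothetical, the 5,386 bp length is assumed for the circular genome, and the coordinates are made up to exercise the wrap-around case.

```python
def unwrap_circular(start, end, landmark_length):
    """Split a GFF3 span on a circular landmark into linear segments."""
    if end <= landmark_length:
        return [(start, end)]
    return [(start, landmark_length), (1, end - landmark_length)]


# Illustrative coordinates on an assumed 5,386 bp circular genome.
print(unwrap_circular(3981, 5522, 5386))  # [(3981, 5386), (1, 136)]
print(unwrap_circular(100, 200, 5386))    # [(100, 200)]
```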

chris

On Jul 23, 2010, at 5:35 PM, Scott Cain wrote:

> Yeah, I have vague concerns about BioPerl as well.  The only way to be
> sure is to just try it :-)
>
> Scott
>
>



Re: GFF3 Is_circular

Jim Hu
In reply to this post by Scott Cain
Note that is_circular has been in BioPerl for years, but I'm not sure that it's implemented.

Jim

On Jul 23, 2010, at 5:35 PM, Scott Cain wrote:

> Yeah, I have vague concerns about BioPerl as well.  The only way to be
> sure is to just try it :-)
>
> Scott


On Wed, Jul 14, 2010 at 2:17 PM, Chris Fields <[hidden email]> wrote:
Not sure on the chado end, but it's still very possible this is problematic on the bioperl end, so any bp-reliant stuff might be affected.  I think there are some current assumptions with Bio::Location that run counter to this ATM, but it needs examples/testing (I think I have a bug report on this somewhere....)

chris

On Jul 14, 2010, at 12:55 PM, Chris Mungall wrote:


Chado best practices should follow GFF3 here.

Are any changes required to either schema or loaders required here?
For the former I think no, the fmin<=fmax constraint is never violated
and the is_circular attribute would become an embedded featureprop
like any other attribute. I'm less familiar with the loaders, and with
any downstream software that makes assumptions that fmax <=
srcfeature.length

On Jul 14, 2010, at 9:17 AM, Andrew McArthur wrote:

Hello all,

The definition of GFF3 at the Sequence Ontology site (http://www.sequenceontology.org/gff3.shtml
) now has format definitions for supporting circular molecules such
as plasmids or bacterial genomes.  This is done using a new
Is_circular flag in the GFF3 attributes field.  Notably, "For
features that cross the origin of a circular feature (e.g. most
bacterial genomes, plasmids, and some viral genomes), the
requirement for start to be less than or equal to end is satisfied
by making end = the position of the end + the length of the landmark
feature."

Are Chado 1.1 and gmod_bulk_load_gff3.pl supporting this change to
GFF3 or should I wait before changing my GFF3 files?

Thanks,
Andrew McArthur

------
Andrew G. McArthur, Ph.D.
Bioinformatics Consulting Services
Email: [hidden email], Web: http://mcarthurlab.blogspot.com
Phone: 905.296.3252, Mobile: 905.745.2794, Fax: 647.439.0829
AIM: [hidden email], Skype: agmcarthur



------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054




------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://ad.doubleclick.net/clk;226879339;13503038;l?
http://clk.atdmt.com/CRS/go/247765532/direct/01/
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: GFF3 Is_circular

Fields, Christopher J
I can firmly establish there are significant problems with circular seqs.  For the phi-x174 sequence (PX1CG.gb), the three CDS that overlap the origin all have start = 1, end = 5386, and the splicing is incorrect.  Truncated SFs from original genbank file (acc J02482):

     CDS             join(3981..5386,1..136)
                     /product="A"
     CDS             join(4497..5386,1..136)
                     /product="A*"
     CDS             join(5075..5386,1..51)
                     /product="B"
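
For illustration only, here is a minimal Python sketch of how a location class can end up reporting start = 1 and end = 5386 for all three features. It assumes the extent is taken as the smallest start and largest end over the sub-locations; that is a guess at the cause, not a description of the actual BioPerl code, and the helper names (parse_join, naive_extent) are hypothetical.

```python
import re

def parse_join(location):
    """Parse a simple GenBank location such as 'join(3981..5386,1..136)'
    into a list of (start, end) tuples (1-based, inclusive)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def naive_extent(segments):
    """Extent as min(start) / max(end) over all segments.
    This ignores the fact that the landmark is circular."""
    return min(s for s, _ in segments), max(e for _, e in segments)

# The three origin-spanning CDS features from J02482, as listed above.
cds = {
    "A": "join(3981..5386,1..136)",
    "A*": "join(4497..5386,1..136)",
    "B": "join(5075..5386,1..51)",
}

for product, loc in cds.items():
    # Every feature collapses to the whole landmark: (1, 5386)
    print(product, naive_extent(parse_join(loc)))
```

With this approach the second segment always contributes start 1 and the first segment always contributes end 5386, so the three features become indistinguishable by extent.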

I'll commit some tests on this.  Given the way generic features are currently implemented (using Bio::LocationI), fixing this is a bit of a hornet's nest, but I will have a look.  As for other SeqFeatureI implementations, anything that uses spliced_seq() is affected if the implementation uses split locations.

chris


Re: GFF3 Is_circular

Fields, Christopher J
To follow up on this, the bug is present when using start, end, and length.  Other methods (spliced_seq, to_FTstring, strand) are not affected; you do get correct splicing across the origin.
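
To make the distinction concrete, here is a small Python sketch (not BioPerl code; the function names and the 10-base toy sequence are made up for illustration). Splicing works segment by segment, so it never needs a single start/end pair, whereas a length derived from one start/end pair depends on how the wrap-around is represented.

```python
def spliced_seq(seq, segments):
    """Concatenate 1-based inclusive segments in the order given."""
    return "".join(seq[start - 1:end] for start, end in segments)

def spliced_length(segments):
    """Total number of bases covered by the segments."""
    return sum(end - start + 1 for start, end in segments)

# Toy circular sequence of 10 bases; the feature is join(8..10,1..3).
toy = "ACGTACGTAC"
toy_segments = [(8, 10), (1, 3)]
print(spliced_seq(toy, toy_segments))   # TACACG
print(spliced_length(toy_segments))     # 6

# Product "A" from J02482: join(3981..5386,1..136)
a_segments = [(3981, 5386), (1, 136)]
print(spliced_length(a_segments))       # 1542
# A length computed from start = 1, end = 5386 would instead be 5386.
```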

What should the start, end, and length of the features quoted below be?  We have always defined fstart < fend, even for split locations.  Not sure how Chado handles it.

chris

On Jul 26, 2010, at 4:58 PM, Chris Fields wrote:

> I can firmly establish there are significant problems with circular seqs.  For the phi-x174 sequence (PX1CG.gb), the three CDS that overlap the origin all have start = 1, end = 5386, and the splicing is incorrect.  Truncated SFs from original genbank file (acc J02482):
>
>    CDS             join(3981..5386,1..136)
>                    /product="A"
>    CDS             join(4497..5386,1..136)
>                    /product="A*"
>    CDS             join(5075..5386,1..51)
>                    /product="B"

Re: GFF3 Is_circular

Scott Cain
Hi Chris,

I think Chado will use the GFF3 spec for these locations, so the start
and end would be:

A:  3981..5522
A*: 4497..5522
B:  5075..5437

Scott
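
The arithmetic behind those numbers can be sketched in Python as follows. This is only an illustration under stated assumptions: the features are on the plus strand, the segments are given in order, and the landmark length is 5386. The function names are hypothetical. The interbase conversion reflects Chado's featureloc convention (fmin is zero-based, fmax is unchanged), and the "region" type and "." source on the landmark line are placeholders, not taken from any real file.

```python
LANDMARK_LENGTH = 5386  # length of the circular landmark J02482

def gff3_circular_extent(segments, landmark_length):
    """Return (start, end) following the GFF3 rule for circular landmarks:
    if the feature crosses the origin, end = end + landmark length.
    Segments are 1-based inclusive, plus strand, in order (assumption)."""
    start = segments[0][0]
    end = segments[-1][1]
    if end < start:  # the feature wraps around the origin
        end += landmark_length
    return start, end

def to_interbase(start, end):
    """Convert 1-based inclusive coordinates to interbase (fmin, fmax)."""
    return start - 1, end

features = {
    "A": [(3981, 5386), (1, 136)],
    "A*": [(4497, 5386), (1, 136)],
    "B": [(5075, 5386), (1, 51)],
}

# Landmark line carrying the Is_circular attribute (type/source are placeholders).
print(f"J02482\t.\tregion\t1\t{LANDMARK_LENGTH}\t.\t+\t.\tID=J02482;Is_circular=true")

for product, segments in features.items():
    start, end = gff3_circular_extent(segments, LANDMARK_LENGTH)
    print(f"{product}: {start}..{end}")  # A: 3981..5522, A*: 4497..5522, B: 5075..5437
```

Note that start <= end holds for every feature, so an fmin <= fmax constraint is satisfied, but end can exceed the landmark length, which is the case downstream code would need to tolerate.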


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: GFF3 Is_circular

Fields, Christopher J
I'm tending to think this is how BioPerl should deal with it, if nothing else just for consistency.

chris

On Jul 27, 2010, at 9:18 AM, Scott Cain wrote:

> Hi Chris,
>
> I think Chado will use the GFF3 spec for these locations, so the start
> and end would be:
>
> A:  3981..5522
> A*: 4497..5522
> B:  5075..5437
>
> Scott
>
>
> On Tue, Jul 27, 2010 at 12:10 AM, Chris Fields <[hidden email]> wrote:
>> To follow up on this, the bug is present when using start, end, and length.  Other methods (spliced_seq, to_FTstring, strand) are not affected; you do get correct splicing across the origin.
>>
>> What should the start, end, and length of the below be?  We have always defined fstart < fend, even for split locations.  Not sure how chado handles it.
>>
>> chris
>>
>> On Jul 26, 2010, at 4:58 PM, Chris Fields wrote:
>>
>>> I can firmly establish there are significant problems with circular seqs.  For the phi-x174 sequence (PX1CG.gb), the three CDS that overlap the origin all have start = 1, end = 5386, and the splicing is incorrect.  Truncated SFs from original genbank file (acc J02482):
>>>
>>>    CDS             join(3981..5386,1..136)
>>>                    /product="A"
>>>    CDS             join(4497..5386,1..136)
>>>                    /product="A*"
>>>    CDS             join(5075..5386,1..51)
>>>                    /product="B"
>>>
>>> I'll commit some tests on this.  It's a bit of a hornet's nest to fix the way generic features are implemented, currently (using Bio::LocationI), but will have a look.  As for other SeqFeatureI, anything that uses spliced_seq() is affected if the implementation uses split locations.
>>>
>>> chris
>>>
>>> On Jul 26, 2010, at 12:09 PM, Jim Hu wrote:
>>>
>>>> Note that is_circular has been in BioPerl for years, but I'm not sure that it's implemented.
>>>>
>>>> Jim
>>>>
>>>> On Jul 23, 2010, at 5:35 PM, Scott Cain wrote:
>>>>
>>>>> Yeah, I have vague concerns about BioPerl as well.  The only way to be
>>>>> sure is to just try it :-)
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Wed, Jul 14, 2010 at 2:17 PM, Chris Fields <[hidden email]> wrote:
>>>>>> Not sure on the chado end, but it's still very possible this is problematic on the bioperl end, so any bp-reliant stuff might be affected.  I think there are some current assumptions with Bio::Location that run counter to this ATM, but it needs examples/testing (I think I have a bug report on this somewhere....)
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jul 14, 2010, at 12:55 PM, Chris Mungall wrote:
>>>>>>
>>>>>>>
>>>>>>> Chado best practices should follow GFF3 here.
>>>>>>>
>>>>>>> Are any changes required to either schema or loaders required here?
>>>>>>> For the former I think no, the fmin<=fmax constraint is never violated
>>>>>>> and the is_circular attribute would become an embedded featureprop
>>>>>>> like any other attribute. I'm less familiar with the loaders, and with
>>>>>>> any downstream software that makes assumptions that fmax <=
>>>>>>> srcfeature.length
>>>>>>>
>>>>>>> On Jul 14, 2010, at 9:17 AM, Andrew McArthur wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> The definition of GFF3 at the Sequence Ontology site (http://www.sequenceontology.org/gff3.shtml
>>>>>>>> ) now has format definitions for supporting circular molecules such
>>>>>>>> as plasmids or bacterial genomes.  This is done using a new
>>>>>>>> Is_circular flag in the GFF3 attributes field.  Notably, "For
>>>>>>>> features that cross the origin of a circular feature (e.g. most
>>>>>>>> bacterial genomes, plasmids, and some viral genomes), the
>>>>>>>> requirement for start to be less than or equal to end is satisfied
>>>>>>>> by making end = the position of the end + the length of the landmark
>>>>>>>> feature."
>>>>>>>>
>>>>>>>> Are Chado 1.1 and gmod_bulk_load_gff3.pl supporting this change to
>>>>>>>> GFF3 or should I wait before changing my GFF3 files?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew McArthur
>>>>>>>>
>>>>>>>> ------
>>>>>>>> Andrew G. McArthur, Ph.D.
>>>>>>>> Bioinformatics Consulting Services
>>>>>>>> Email: [hidden email], Web: http://mcarthurlab.blogspot.com
>>>>>>>> Phone: 905.296.3252, Mobile: 905.745.2794, Fax: 647.439.0829
>>>>>>>> AIM: [hidden email], Skype: agmcarthur
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> This SF.net email is sponsored by Sprint
>>>>>>>> What will you do first with EVO, the first 4G phone?
>>>>>>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first_______________________________________________
>>>>>>>> Gmod-schema mailing list
>>>>>>>> [hidden email]
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> This SF.net email is sponsored by Sprint
>>>>>>> What will you do first with EVO, the first 4G phone?
>>>>>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>>>>>> _______________________________________________
>>>>>>> Gmod-schema mailing list
>>>>>>> [hidden email]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> This SF.net email is sponsored by Sprint
>>>>>> What will you do first with EVO, the first 4G phone?
>>>>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>>>>> _______________________________________________
>>>>>> Gmod-schema mailing list
>>>>>> [hidden email]
>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>>>> Ontario Institute for Cancer Research
>>>>>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research

