Chado polypeptide bug

Chado polypeptide bug

Carson Hinton Holt
Hi Scott,

Recently I’ve been loading large GFF3 files into Chado. I don’t break them up because the machine I’m using has plenty of memory, and loading them in pieces takes much longer.  The problem is that when I load large files, Chado starts to create duplicate polypeptide features.  There should be only one polypeptide per mRNA, but when I load large GFF3 files, for some reason I get two for many mRNAs.  I can get around this by loading in smaller chunks, but it still seems like a bug to me.  Right now my workaround is a script that looks for and deletes duplicate polypeptide features.

Thanks,
Carson
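
[Editor's sketch, not Carson's actual script.] One way to look for the duplicates described above is to count polypeptide features per mRNA. The snippet below is a minimal illustration that runs against a simplified in-memory SQLite stand-in for the Chado `feature`, `cvterm`, and `feature_relationship` tables. It assumes polypeptides are linked to their mRNA by a `derives_from` relationship; the demo feature names and the helper names `find_mrnas_with_multiple_polypeptides` and `build_demo_db` are hypothetical.

```python
import sqlite3

# Simplified stand-in for three Chado tables (assumption: real Chado has
# many more columns and constraints; only the ones used here are modeled).
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type_id INTEGER);
"""

# Count polypeptides per mRNA and keep only mRNAs that have more than one.
# Assumption: the polypeptide is the subject and the mRNA is the object of
# a 'derives_from' relationship.
QUERY = """
SELECT m.uniquename, COUNT(*) AS n
FROM feature_relationship fr
JOIN feature p ON p.feature_id = fr.subject_id
JOIN feature m ON m.feature_id = fr.object_id
JOIN cvterm pt ON pt.cvterm_id = p.type_id
JOIN cvterm mt ON mt.cvterm_id = m.type_id
JOIN cvterm rt ON rt.cvterm_id = fr.type_id
WHERE pt.name = 'polypeptide'
  AND mt.name = 'mRNA'
  AND rt.name = 'derives_from'
GROUP BY m.feature_id, m.uniquename
HAVING COUNT(*) > 1
ORDER BY m.uniquename
"""


def find_mrnas_with_multiple_polypeptides(conn):
    """Return a list of (mRNA uniquename, polypeptide count) tuples."""
    return [(row[0], row[1]) for row in conn.execute(QUERY)]


def build_demo_db():
    """Build a tiny hypothetical data set: mRNA-1 has two polypeptides, mRNA-2 has one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "mRNA"), (2, "polypeptide"), (3, "derives_from")],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            (10, "mRNA-1", 1),
            (11, "mRNA-2", 1),
            (20, "mRNA-1-protein", 2),
            (21, "mRNA-1-protein-dup", 2),
            (22, "mRNA-2-protein", 2),
        ],
    )
    conn.executemany(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        [(20, 10, 3), (21, 10, 3), (22, 11, 3)],
    )
    return conn


if __name__ == "__main__":
    print(find_mrnas_with_multiple_polypeptides(build_demo_db()))
```

Against a real Chado instance the same SELECT could probably be run through a PostgreSQL client instead. Reporting before deleting keeps the check non-destructive.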

------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Chado polypeptide bug

Scott Cain
Hi Carson,

That's an interesting bug.  So a given mRNA would have two polypeptide
features associated with it?  I wonder if there are other mRNA features
that are missing polypeptides.  Are the coordinates of both
polypeptide features the same?  I wonder if you could use the --save
flag and send me (or post somewhere, as they are likely to be big) the
temp files that the loader creates for loading.  With those and the
original GFF, I might be able to figure out where the extra features
are coming from, since I don't think I'll be able to reproduce it
(since I don't have a big beefy machine like you :-)

Scott



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Chado polypeptide bug

Suzanna Lewis-3
Assuming, of course, that it isn't a dicistronic mRNA, in which case 2 polypeptides for that 1 mRNA would be just what you want.
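
[Editor's sketch.] The distinction raised in this thread (true duplicates versus a dicistronic mRNA) can be explored by comparing polypeptide coordinates per mRNA. The function below is a rough illustration only. It assumes rows of `(mrna_id, polypeptide_id, fmin, fmax, strand)` have already been pulled from the database (for example from `featureloc`); the function name, labels, and sample rows are hypothetical. Identical coordinates suggest a loader duplicate, while distinct coordinates may be legitimate and deserve a manual look.

```python
from collections import defaultdict


def classify_polypeptides(rows):
    """Group polypeptides by mRNA and location.

    rows: iterable of (mrna_id, polypeptide_id, fmin, fmax, strand).
    Returns a dict mapping each mRNA with two or more polypeptides to
    'same-coordinates' (at least two polypeptides share a location) or
    'distinct-coordinates' (every polypeptide has its own location).
    """
    by_mrna = defaultdict(lambda: defaultdict(list))
    for mrna_id, pep_id, fmin, fmax, strand in rows:
        by_mrna[mrna_id][(fmin, fmax, strand)].append(pep_id)

    report = {}
    for mrna_id, locations in by_mrna.items():
        total = sum(len(peps) for peps in locations.values())
        if total < 2:
            continue  # one polypeptide per mRNA is the expected case
        if any(len(peps) > 1 for peps in locations.values()):
            report[mrna_id] = "same-coordinates"
        else:
            report[mrna_id] = "distinct-coordinates"
    return report


if __name__ == "__main__":
    sample = [
        ("m1", "p1", 100, 400, 1),
        ("m1", "p2", 100, 400, 1),   # same location as p1: likely duplicate
        ("m2", "p3", 50, 200, 1),
        ("m2", "p4", 300, 600, 1),   # different location: possibly dicistronic
        ("m3", "p5", 10, 90, -1),    # single polypeptide: not reported
    ]
    print(classify_polypeptides(sample))
```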


Re: Chado polypeptide bug

Carson Hinton Holt
In reply to this post by Scott Cain
I’ll try this.  I’ll post the temp files to our server for you to download when they are ready.

Thanks,
Carson

