nested tandem repeats/gff


nested tandem repeats/gff

anja
CONTENTS DELETED
The author has deleted this message.

Re: nested tandem repeats/gff

J.M.P. Alves
Hallo,

A few possibilities I see, although I am not positive they are the
actual problem:

 > 2 NTR program nested_repeat 831 1720 . . . ID=ID1;Name=ID1

Are "nested_repeat" and "repeat_fragment" part of the Sequence Ontology
vocabulary? I think you can use anything you want in the second column
there ("program"), but the third one is restricted.
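One way to answer that kind of question before running the loader is to check each line locally. The sketch below is a hypothetical helper, not part of the GMOD tools. It splits a line on tabs (GFF3 columns are tab-separated), expects nine columns, and checks the type column against a small set of names. The KNOWN_TYPES set is an assumption standing in for the Sequence Ontology; in a real setup the allowed names would come from Chado's cvterm table.

```python
# Minimal GFF3 column check: a sketch, not a full GFF3 validator.
# ASSUMPTION: KNOWN_TYPES is a hand-picked stand-in for the Sequence
# Ontology; in practice the allowed names come from the cvterm table.
KNOWN_TYPES = {"contig", "nested_repeat", "repeat_fragment"}


def check_line(line):
    """Return a list of problems found in one GFF3 feature line."""
    if line.startswith("#") or not line.strip():
        return []  # directives, comments and blank lines are skipped here
    cols = line.rstrip("\n").split("\t")  # GFF3 columns are tab-separated
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    problems = []
    if ftype not in KNOWN_TYPES:
        problems.append(f"type {ftype!r} is not a known SO term")
    # non-numeric coordinates short-circuit before int() is called
    if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
        problems.append(f"bad coordinates {start}-{end}")
    if strand not in {"+", "-", ".", "?"}:
        problems.append(f"bad strand {strand!r}")
    return problems


sample = [
    "NTR\tprogram\tnested_repeat\t831\t1720\t.\t.\t.\tID=ID1;Name=ID1",
    "NTR\tprogram\trepeat_fragement\t473\t483\t.\t+\t.\tParent=ID1",
    "NTR program repeat_fragment 1505 1553 . + . Parent=ID1",  # spaces, not tabs
]
for line in sample:
    print(check_line(line))
```

Run on lines modelled on the sample GFF, it flags the misspelled `repeat_fragement` type and would also flag any line whose columns were separated by spaces instead of tabs.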

NTR is the name of the sequence (as in the FASTA file), right?

Another possible problem:

 > 3 NTR program repeat_fragment 1505 1553 . + . Parent=ID1

I don't know if this is the case, but I thought every line had to have
an ID attribute, e.g.:

NTR program repeat_fragment 1505 1553 . + . ID=rf1;Parent=ID1

I hope some of these ideas help.

J

Anja Friedrich wrote:

> Hi all,
>
> Not sure if my earlier mail got through, because I sent it from a different e-mail address.
>
> I tried to load nested tandem repeats into Chado. As this feature type doesn't exist yet for GFF3, I tried to work around it:
>
> 0  ##gff-version   3
> 1  ##sequence-region   taro 5428 bp    
> 2 NTR program nested_repeat 831 1720 . . . ID=ID1;Name=ID1
> 3 NTR program repeat_fragment 1505 1553 . + . Parent=ID1
> 4 NTR program repeat_fragement 473 483 . + . Parent=ID1
>
> But I get this error message:
>
> anou@anou-laptop:~$ gmod_bulk_load_gff3.pl --organism Taro  --gfffile taro.gff
> Command line argument used for root
> Preparing data for inserting into the chado database
> (This may take a while ...)
>
> --------------------- WARNING ---------------------
> MSG: Calling end without a defined start position
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at /usr/local/bin/gmod_bulk_load_gff3.pl line 808, <GEN0> line 1.
> Use of uninitialized value $featuretype in pattern match (m//) at /usr/local/bin/gmod_bulk_load_gff3.pl line 809, <GEN0> line 1.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4579
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:838
> -----------------------------------------------------------
>
> Does anyone have an idea? Can't I add the feature like this?
>
> Cheers,
> Anja
>      
>
>
> ------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
> Be part of this innovative community and reach millions of netbook users
> worldwide. Take advantage of special opportunities to increase revenue and
> speed time-to-market. Join now, and jumpstart your future.
> http://p.sf.net/sfu/intel-atom-d2d
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
-------------------------------
João Marcelo Pereira Alves (J)
Post-doctoral fellow
MCV / VCU - Richmond, VA
http://bioinfo.lpb.mic.vcu.edu
f. 1-804-828-3897



Re: nested tandem repeats/gff

anja
In reply to this post by anja
CONTENTS DELETED
The author has deleted this message.

Re: nested tandem repeats/gff

anja
In reply to this post by J.M.P. Alves
CONTENTS DELETED
The author has deleted this message.

Re: nested tandem repeats/gff

anja
In reply to this post by J.M.P. Alves
CONTENTS DELETED
The author has deleted this message.

Re: nested tandem repeats/gff

Scott Cain
In reply to this post by J.M.P. Alves
Hi Anja,

I'm going to put comments below in response to both you and J.

Scott


On Thu, Aug 26, 2010 at 11:54 AM, J.M.P. Alves <[hidden email]> wrote:

> Hallo,
>
> A few possibilities I see, although I am not positive they are the
> actual problem:
>
>  > 2 NTR program nested_repeat 831 1720 . . . ID=ID1;Name=ID1
>
> Are "nested_repeat" and "repeat_fragment" part of the sequence ontology
> vocabulary? I think you can use anything you want on the second column
> there ("program"), but the 3rd one is restricted.

In my instance of Chado (a few weeks old at the most), SO has both
terms, so that isn't likely the problem, but it's a good idea to
check.  A query like this would tell you if it's present:

  select cvterm.* from cvterm join cv using (cv_id)
    where cv.name='sequence' and cvterm.name='nested_repeat';
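The query above can be tried out without a Chado install. This sketch uses Python's built-in sqlite3 module with two tiny tables that keep only the columns the query touches. The table layout is a simplified assumption modelled on Chado's cv and cvterm tables (the real ones have more columns), and the rows are made up. Against a real Chado database (PostgreSQL) you would run the SQL above directly, e.g. in psql.

```python
import sqlite3

# ASSUMPTION: a drastically cut-down stand-in for Chado's cv and cvterm
# tables; only the columns used by the query are modelled, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY,
                         cv_id INTEGER REFERENCES cv(cv_id),
                         name TEXT);
    INSERT INTO cv VALUES (1, 'sequence');
    INSERT INTO cvterm VALUES (10, 1, 'nested_repeat'),
                              (11, 1, 'repeat_fragment');
""")


def so_term_exists(conn, term):
    """Same join as the query above, parameterised on the term name."""
    row = conn.execute(
        "SELECT cvterm.cvterm_id FROM cvterm JOIN cv USING (cv_id) "
        "WHERE cv.name = 'sequence' AND cvterm.name = ?",
        (term,),
    ).fetchone()
    return row is not None


print(so_term_exists(conn, "nested_repeat"))     # True
print(so_term_exists(conn, "repeat_fragement"))  # False (misspelled)
```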

>
> NTR is the name of the sequence (as in the FASTA file), right?

Right, if a feature named NTR isn't already in the database, you will
have problems.  From the error message you are getting, I'm guessing
it's already there (or it would have complained about that), but
perhaps I'm misremembering the order of the error messages.  I would
suggest getting rid of the sequence-region directive: unless you
have a fairly recent checkout of bioperl-live it will cause problems,
and in any event it will never be supported for defining a feature (and
this one isn't properly formed anyway: there's no start value, and "bp"
isn't part of the spec).  Instead, add a full GFF line:

NTR    .    contig   1   5428   .   .    .    ID=NTR;Name=NTR

Looking at your sample GFF again, it looks to me like you want these
features to reside on a feature called "taro", is that right?  Or is
there a feature called NTR?  If the contig/chromosome/whatever is
called taro, then you should replace the text in the first column of
the GFF with "taro" and create a GFF line for it, like I did for NTR
above.
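As a minimal sketch of that advice, the hypothetical helpers below build a reference-sequence line with matching ID and Name (in place of the sequence-region directive) and re-point a feature line's first column at it. The name "taro" and the length 5428 are taken from the sequence-region directive in the sample; treating taro as the reference and typing it as a contig are assumptions.

```python
def reference_line(seqid, length, ftype="contig", source="."):
    """Build a GFF3 line that defines the reference sequence itself,
    with ID and Name both set to the sequence name."""
    cols = [seqid, source, ftype, "1", str(length), ".", ".", ".",
            f"ID={seqid};Name={seqid}"]
    return "\t".join(cols)


def move_to_reference(feature_line, seqid):
    """Replace column 1 of a tab-separated feature line with `seqid`."""
    cols = feature_line.split("\t")
    cols[0] = seqid
    return "\t".join(cols)


print(reference_line("taro", 5428))
print(move_to_reference(
    "NTR\tprogram\tnested_repeat\t831\t1720\t.\t.\t.\tID=ID1;Name=ID1", "taro"))
```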

>
> Another possible problem:
>
>  > 3 NTR program repeat_fragment 1505 1553 . + . Parent=ID1
>
> I don't know if this is the case, but I thought every line had to have
> an ID attribute, e.g.:
>
> NTR program repeat_fragment 1505 1553 . + . ID=rf1;Parent=ID1

Not so: ID tags are only needed in two cases:

1. To identify a feature so it can be referred to later to show
parentage (as Anja did in the sample GFF) and

2. To identify a reference sequence so it can be referred to in column
one (this is NOT part of the GFF3 spec, but life will work a lot
better with Chado if reference sequences look like
"ID=chr1;Name=chr1...").

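The two rules above can be checked mechanically. This hypothetical sketch parses column 9 into key/value pairs and reports any Parent value that no line declares as an ID; lines without an ID are accepted, matching the point that ID is only needed on features something refers to. It assumes the lines are already well-formed nine-column, tab-separated GFF3, and it does not decode %-escaped attribute values.

```python
def parse_attributes(col9):
    """Split a GFF3 column 9 such as 'ID=ID1;Name=ID1' into a dict."""
    attrs = {}
    for pair in col9.strip().split(";"):
        if pair:  # tolerate a trailing ';'
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def unresolved_parents(lines):
    """Return Parent values that match no ID anywhere in the file.
    Lines without an ID are fine; only referenced features need one."""
    ids, parents = set(), []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # assumes 9 tab-separated columns; column 9 is index 8
        attrs = parse_attributes(line.rstrip("\n").split("\t")[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # Parent may list several IDs separated by commas
        parents.extend(p for p in attrs.get("Parent", "").split(",") if p)
    return [p for p in parents if p not in ids]


gff = [
    "##gff-version 3",
    "NTR\tprogram\tnested_repeat\t831\t1720\t.\t.\t.\tID=ID1;Name=ID1",
    "NTR\tprogram\trepeat_fragment\t1505\t1553\t.\t+\t.\tParent=ID1",
    "NTR\tprogram\trepeat_fragment\t900\t950\t.\t+\t.\tParent=ID2",
]
print(unresolved_parents(gff))  # ['ID2']
```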
Scott




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: nested tandem repeats/gff

anja
CONTENTS DELETED
The author has deleted this message.

Re: nested tandem repeats/gff

Scott Cain
Hi Anja,

I'm cc'ing this to the JBrowse list so they can chime in too.

The wonderful thing about JBrowse is that it is quite flexible in what
it can display; that is also a downside, though, as it might require
you to create the underlying graphics if native support isn't there.
Since I don't really understand what you would want the image to look
like, would it be possible for you to create a sketch of what you want
to see?  That would make it a lot easier to say whether what you want
to do is possible, easy or hard.

Scott


On Thu, Aug 26, 2010 at 2:21 PM, Anja Friedrich
<[hidden email]> wrote:

> Hi Scott,
>
> thanks for your explanation. I will go through it.
> As I told J, my request to add nested_tandem_repeat to SO was accepted. My
> problem is that I have 3 different repeat regions with 2 motifs. I want to
> display the regions and the motifs in JBrowse, and I was wondering how I
> should write the individual lines. One line for the region and one for each
> motif, for all 3 repeat regions of course? Would that work?
>
> Cheers,
> Anja
>
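A sketch of the layout described above, one parent line per repeat region plus one child line per motif linked with Parent=, might look like this. The helper name, coordinates, seqid and source are hypothetical; the nested_tandem_repeat and repeat_fragment type names are the ones mentioned in this thread and should be checked against the ontology actually loaded into Chado. The helper also refuses a motif that falls outside its region.

```python
def region_lines(seqid, region_id, region, motifs, source="program"):
    """Return GFF3 lines: one nested_tandem_repeat parent for the region and
    one repeat_fragment child per motif, each child pointing back via Parent."""
    start, end = region
    lines = ["\t".join([seqid, source, "nested_tandem_repeat", str(start),
                        str(end), ".", ".", ".",
                        f"ID={region_id};Name={region_id}"])]
    for m_start, m_end, strand in motifs:
        # a child feature should lie inside its parent region
        if not start <= m_start <= m_end <= end:
            raise ValueError(f"motif {m_start}-{m_end} outside {region_id}")
        lines.append("\t".join([seqid, source, "repeat_fragment", str(m_start),
                                str(m_end), ".", strand, ".",
                                f"Parent={region_id}"]))
    return lines


# Hypothetical example: one region with two motifs; repeat the call for
# each of the three regions.
for line in region_lines("taro", "NTR1", (831, 1720),
                         [(900, 950, "+"), (1505, 1553, "+")]):
    print(line)
```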
>
>> Date: Thu, 26 Aug 2010 14:10:58 -0400
>> Subject: Re: [Gmod-schema] nested tandem repeats/gff
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
