[Gmod-ajax] grouping match features with same ID

[Gmod-ajax] grouping match features with same ID

Scott Cain
Hello,

I'm trying to figure out if there is a way to get flatfile-to-json to realize that gff lines that share the same ID are the same feature and should be joined in the display.  I have a large set of EST and mRNA similarity results that look very much like the example in the GFF3 spec:

ctg123 . cDNA_match 1050 1500 5.8e-42 + . ID=match00001
ctg123 . cDNA_match 5000 5500 8.1e-43 + . ID=match00001
ctg123 . cDNA_match 7000 9000 1.4e-40 + . ID=match00001

(I took out the Target attributes to hopefully get these all on one line each).  

If I run flatfile-to-json on this GFF, I get three individual features that aren't linked in any way.  Is this something that flatfile-to-json simply can't do (so that I'll have to write a "parentify" script to "fix" it), or is there something I'm missing?
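
A quick way to see how much of a file is affected is to count the IDs that occur on more than one line. The following is only a minimal sketch: the helper names `parse_attributes` and `multiline_ids` are made up for illustration, the parser assumes tab-separated, nine-column GFF3 and does no URL-unescaping, and the three example rows are the ones shown above with tabs restored.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-unescaping)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def multiline_ids(lines):
    """Return {ID: line_count} for every ID found on more than one line."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments and directives.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        feature_id = parse_attributes(fields[8]).get("ID")
        if feature_id:
            counts[feature_id] += 1
    return {fid: n for fid, n in counts.items() if n > 1}


example = [
    "ctg123\t.\tcDNA_match\t1050\t1500\t5.8e-42\t+\t.\tID=match00001",
    "ctg123\t.\tcDNA_match\t5000\t5500\t8.1e-43\t+\t.\tID=match00001",
    "ctg123\t.\tcDNA_match\t7000\t9000\t1.4e-40\t+\t.\tID=match00001",
]

print(multiline_ids(example))  # {'match00001': 3}
```

To run it over a real file, pass an open file handle instead of the list, e.g. `multiline_ids(open("blat_est_best.gff"))`.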

This is the command I'm running:

bin/flatfile-to-json.pl --gff ../c_elegans_gff/blat_est_best.gff --out data/c_elegans --type expressed_sequence_match:BLAT_EST_BEST --trackType CanvasFeatures --trackLabel "ESTs (best)" --key "ESTs (best)"

I've tried the Alignment and Segments canvas glyphs; is there something else that would work instead?

Thanks much,
Scott



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Slashdot TV.  
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/
_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax

Re: grouping match features with same ID

Scott Cain
Hi again,

After a chat with Colin on IRC, we've come to the conclusion that flatfile-to-json doesn't support GFF structured like I've shown above.  Given that, does anybody happen to have a script handy that will take match features (or any specific type of match feature, like expressed_sequence_match) that share an ID and convert them into GFF with a parent and several child match_part features?  Certainly, I can write my own, but if somebody else has already done it...  :-)
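
For anyone landing on this thread later, the conversion described here could be sketched roughly as follows. This is an illustration under stated assumptions, not any existing tool's implementation and not tested against flatfile-to-json: the function name `parentify`, the `.partN` child ID suffix, and the choice to give the parent a "." score and only an ID attribute are all invented for the example. It assumes tab-separated, nine-column GFF3 and groups rows by seqid plus ID.

```python
def get_id(column9):
    """Return the value of the ID attribute, or None if there is none."""
    for pair in column9.split(";"):
        pair = pair.strip()
        if pair.startswith("ID="):
            return pair[3:]
    return None


def parentify(lines, part_type="match_part"):
    """Turn rows that share an ID into one parent row plus child rows."""
    order = []   # output slots: a raw line (str) or a group key (tuple)
    groups = {}  # (seqid, ID) -> list of split rows
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        is_feature = len(fields) == 9 and not line.startswith("#")
        feature_id = get_id(fields[8]) if is_feature else None
        if feature_id is None:
            order.append(line)  # comments, directives, rows without an ID
            continue
        key = (fields[0], feature_id)
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(fields)

    out = []
    for slot in order:
        if isinstance(slot, str):
            out.append(slot)
            continue
        rows = groups[slot]
        if len(rows) == 1:
            out.append("\t".join(rows[0]))  # already a single-line feature
            continue
        rows.sort(key=lambda r: int(r[3]))
        # Parent spans all parts; it keeps the original type and strand.
        parent = list(rows[0])
        parent[3] = str(min(int(r[3]) for r in rows))
        parent[4] = str(max(int(r[4]) for r in rows))
        parent[5] = "."
        parent[8] = "ID=" + slot[1]
        out.append("\t".join(parent))
        for n, row in enumerate(rows, 1):
            child = list(row)
            child[2] = part_type
            # Keep the other attributes (e.g. Target) on the child.
            others = [p for p in row[8].split(";")
                      if p.strip() and not p.strip().startswith("ID=")]
            child[8] = ";".join(
                ["ID=%s.part%d" % (slot[1], n), "Parent=" + slot[1]] + others)
            out.append("\t".join(child))
    return out


example = [
    "ctg123\t.\tcDNA_match\t1050\t1500\t5.8e-42\t+\t.\tID=match00001",
    "ctg123\t.\tcDNA_match\t5000\t5500\t8.1e-43\t+\t.\tID=match00001",
    "ctg123\t.\tcDNA_match\t7000\t9000\t1.4e-40\t+\t.\tID=match00001",
]

for row in parentify(example):
    print(row)
```

For the three example rows this emits one cDNA_match parent spanning 1050 to 9000, followed by three match_part children that each carry Parent=match00001. The whole input is held in memory, so a very large file may call for a streaming variant.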

Thanks,
Scott

PS: it occurs to me that it might be nice to have a Contrib directory in the JBrowse git repository for things like custom glyphs and helper scripts.  Does anybody have an objection to me creating one?  

PPS: does anybody have an objection to me raising a structural question in a postscript?  I think it's terrible form :-)


On Wed, Aug 20, 2014 at 11:14 AM, Scott Cain <[hidden email]> wrote:
[...]

Re: grouping match features with same ID

vkrishna
Hi Scott,

I have a Python script that does this (it chains the child features to create a parent and transfers a chosen set of attributes from the children to the parent). It is part of a larger Python repository, originally developed by my coworker (Haibao Tang), to which I contribute modules and fixes.
The code repo can be accessed here: https://github.com/tanghaibao/jcvi

The specific module to use in this particular case is called “gff”, available here: https://github.com/tanghaibao/jcvi/blob/master/formats/gff.py#L658-L752.

Within the “gff” module, the “chain” action is the one you want to invoke:
python -m jcvi.formats.gff action match.gff -o match.chained.gff

You can either clone this repository directly from GitHub and add it to your PYTHONPATH, or install it from PyPI (by invoking the command "easy_install jcvi").
Please refer to the README for more info: https://github.com/tanghaibao/jcvi/blob/master/README.rst (its only REQUIRED dependencies are biopython, numpy and matplotlib).

Thank you.
Vivek

On Aug 20, 2014, at 1:47 PM, Scott Cain <[hidden email]> wrote:
[...]

Re: grouping match features with same ID

vkrishna
Sorry, I meant to type “chain” instead of “action”:
python -m jcvi.formats.gff chain match.gff -o match.chained.gff

Vivek

On Aug 20, 2014, at 2:26 PM, Krishnakumar, Vivek <[hidden email]> wrote:

Within the “gff” module, the “chain” action is the one you want to invoke:
python -m jcvi.formats.gff action match.gff -o match.chained.gff


Re: grouping match features with same ID

Scott Cain
In reply to this post by vkrishna
Hi Vivek,

Thanks for this.  I'm not much of a python person, so it feels very foreign.  I'm not working on my own server, so I want to be careful about installing things in a system-wide way (I don't want to step on anybody's toes).  Since we don't already have biopython, I'll have to poke some people to see if it's ok.

Thanks,
Scott


