[biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson
Hello,

Thanks for all your work on BioMart. And your previous explanation on the submain tables helped me clear up an issue with configuring a registry file for a 0.9 mart.

What I have is a simple mart that will allow me to do some basic filter operations and attribute selection. So this part is working.

Now my next task is to get gene/transcript/protein sequence retrieval to work. From what I can see in the code and the documentation, this requires using the sequence plugin and RegionsDino to connect the gene coordinates in one mart to the separate mart with chunks of DNA sequence. The two marts are set up, and I can bring up web pages in the sequence UI. But since I have not specified any input parameters, I cannot actually extract sequence. It is not clear to me how to use MartConfigurator to link the attributes together.

Does someone have an example registry file in which this is done? Or a cookbook on how to get started?

Our DNA chunk mart is a holdover from a biomart-0.7 server. Is the structure of this mart still suitable for a 0.9 server?

Thanks,

Joe Carlson

--
You received this message because you are subscribed to the Google Groups "biomart-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Visit this group at https://groups.google.com/group/biomart-users.
For more options, visit https://groups.google.com/d/optout.

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

Arek Kasprzyk
Hi Joe,
Hope you had a good holiday.

You are right in saying that the sequence plugin is needed for connecting gene coordinates with the sequence. However, as far as I know, RegionsDino is only needed for the Enrichment plugin.
I have a community portal registry with the configuration for sequence retrieval. I'll email this to you separately.

a.





On 24 December 2015 at 23:18, <[hidden email]> wrote:
--


"The universe is made of stories, not of atoms."
 Muriel Rukeyser

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson
Hi Arek,

After a (long) delay I've started working on this again. I'm afraid I'm still having some trouble using MartConfigurator to establish the links between sources when using the sequence plugin.

I have one source with the exon structure, and I know I need to make an attribute list of transcript_id, chromosome name, exon start, exon end and strand to send to the sequence plugin. But I'm not clear on the steps involved to do this in the 0.9 code base. I understand how it was done in the Perl code: create an Exportable (with the names 'cdna',...) in the structure dataset with the attributes, and an Importable in the genomic sequence source with the same fields, name and link version.

With the new code, I have been trying to create an attribute list in the structure data source and put the proper fields in it. But I don't understand how to create the corresponding object in the genomic data source. Do I create pseudo attributes, then a filter list? When making the links in MartConfigurator, what do I drag onto what?

Sorry to bug you. But I just cannot quite get it straightened out. I've gotten it to the point that the CDNAParser code objects that I'm not giving it the proper fields. This is progress of a sort, but I'm missing the final step.

Thanks,

joe Carlson

On 01/06/2016 02:28 AM, Arek Kasprzyk wrote:

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

Arek Kasprzyk
Hi Joe,

No problem. The sequence plugin is a bit tricky in 0.9, so I can relate to your frustrations.
If you like, I can send you a config file (off the list) with this plugin configured, if this is of any help to you.

a.

On 13 May 2016 at 02:57, Joe Carlson <[hidden email]> wrote:

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson
Hello again Arek (and other group members),

I have the sequence plugin working for our new biomart. At this point, I have a lot of styling to do (and actually load data) but I think all my conceptual hurdles are over with.

That said, I'm very concerned about one aspect of the code that I see. The parseLine method in classes that inherit from SequenceParser often contains a block of code (like this one from CodingParse):


        if (!line[transcriptIDField].equals(transcriptID)) {
            //If it's a new transcript, print the current sequence
            if (transcriptID != null) {
                results = getCoding(getHeader(), chr, start, end, codingStartOffset, codingEndOffset, startExonRank, endExonRank, strand, startPhase, codonTableID, seqEdit,isProtein);
            }
            transcriptID = line[transcriptIDField];
            ....

You're processing the rows one at a time and displaying the coding portion of the previously seen transcript as soon as you advance to a new transcript. But there is nothing in the SQL that specifies that the results are ordered by transcript ID, so how can you be assured that when we advance to the next transcript we have captured all exons of the previous transcript? Is there some sorting or ordering that can be specified?
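
To make the concern concrete, here is a minimal, self-contained sketch of the same flush-on-ID-change pattern. It is not BioMart code: the class name FlushOnChangeDemo, the group method and the two-column row layout (transcript ID, exon label) are assumptions made purely for illustration. It shows that the pattern only yields one record per transcript when all rows for a transcript are adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of BioMart.
class FlushOnChangeDemo {
    // Emits one "id:exon,exon,..." record each time the ID in column 0 changes.
    static List<String> group(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        String currentId = null;
        List<String> exons = new ArrayList<>();
        for (String[] row : rows) {
            if (!row[0].equals(currentId)) {
                // A new ID was seen: flush whatever was collected for the previous one.
                if (currentId != null) {
                    out.add(currentId + ":" + String.join(",", exons));
                }
                currentId = row[0];
                exons = new ArrayList<>();
            }
            exons.add(row[1]);
        }
        // Flush the last group after the loop ends.
        if (currentId != null) {
            out.add(currentId + ":" + String.join(",", exons));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> grouped = Arrays.asList(
            new String[]{"T1", "e1"}, new String[]{"T1", "e2"}, new String[]{"T2", "e1"});
        List<String[]> interleaved = Arrays.asList(
            new String[]{"T1", "e1"}, new String[]{"T2", "e1"}, new String[]{"T1", "e2"});
        System.out.println(group(grouped));      // [T1:e1,e2, T2:e1]
        System.out.println(group(interleaved));  // [T1:e1, T2:e1, T1:e2]
    }
}
```

With the interleaved input, T1 is emitted twice, each time with only part of its exons, which is the kind of partially populated transcript described in this post.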

I'm extracting data from a table sequence__metaseq__main as you described to me earlier. When I load that table with rows grouped together by transcript id, everything works perfectly. But if I update a record I see that a subsequent select of multiple transcripts will not keep things grouped by transcript id. This was generating errors for me since I was processing single exons of a multiple-exon transcript; the "start" and "end" TreeMaps were not fully populated and I was getting NullPointerExceptions.

I don't see anything in QueryCompiler that introduces ORDER BY into the SQL. I was wondering if there is some alternate mechanism for getting the results to be properly grouped. Is there some property on the attribute list 'coding' on the sequence__metaseq__main table that would guarantee this?

Thanks,

Joe

On Wednesday, May 18, 2016 at 6:59:20 AM UTC-7, Arek Kasprzyk wrote:

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson

Following up on my own posting....

I saw the SQL that you have given me for generating the metaseq table. It has an "ORDER BY main.stable_id_1023, main.stable_id_1066" at the end. I guess I did not realize the importance of that clause until now. Since I was doing updates on this table, that order was not preserved after I mucked with it.
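
Since the parsers depend on rows staying grouped, one way to confirm that a table is still in a usable order after an update is to scan the transcript IDs in the order a plain SELECT returns them and look for an ID that reappears after a different one. The sketch below assumes the IDs have already been read into a list; the class GroupingCheck and the method firstSplitId are hypothetical names, not anything from BioMart.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of BioMart.
class GroupingCheck {
    // Returns the first ID that reappears after a different ID was seen,
    // or null if every ID occupies one contiguous run.
    static String firstSplitId(List<String> ids) {
        Set<String> finished = new HashSet<>();
        String current = null;
        for (String id : ids) {
            if (id.equals(current)) {
                continue; // still inside the same run
            }
            if (finished.contains(id)) {
                return id; // this ID's run ended earlier, so its rows are split
            }
            if (current != null) {
                finished.add(current);
            }
            current = id;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstSplitId(Arrays.asList("T1", "T1", "T2", "T3"))); // null
        System.out.println(firstSplitId(Arrays.asList("T1", "T2", "T1")));       // T1
    }
}
```

A null result only says the rows came back grouped for that particular query; it does not guarantee the same order for a different query.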

I'm coming from more of a PostgreSQL background and didn't know about the ORDER BY when creating tables in MySQL. I think this alone will solve all of my concerns.

Joe


On Thursday, June 16, 2016 at 4:20:11 PM UTC-7, [hidden email] wrote:

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

Arek Kasprzyk
In reply to this post by joe carlson
Hi Joe,

I do remember having prolonged discussions about this subject but I am not 100% sure what we settled on in the end. As far as I remember we decided against 'order by' because of the performance hit we were getting with that. Instead we decided to order data in the table and drive ordering through that. I am fully aware that the order of the rows depends on the type of query that you issue. However, since we were always asking for the same stuff, the order seemed to be maintained for us.
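
For comparison only, and not something BioMart 0.9 is known to do, a parser can be made independent of row order without ORDER BY by buffering rows per transcript ID and emitting them once the whole result has been read. The sketch below uses the same hypothetical two-column rows (transcript ID, exon label); BufferedGrouping and groupAll are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of BioMart.
class BufferedGrouping {
    // Collects every exon under its transcript ID, keeping IDs in first-seen order.
    static Map<String, List<String>> groupAll(List<String[]> rows) {
        Map<String, List<String>> byId = new LinkedHashMap<>();
        for (String[] row : rows) {
            byId.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<String[]> interleaved = Arrays.asList(
            new String[]{"T1", "e1"}, new String[]{"T2", "e1"}, new String[]{"T1", "e2"});
        System.out.println(groupAll(interleaved)); // {T1=[e1, e2], T2=[e1]}
    }
}
```

The trade-off is memory and latency: nothing can be written out until the last row has arrived, which may be part of why relying on a pre-ordered table was attractive for large results.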

As for your question regarding updates, I wonder if caching may have something to do with that. Maybe it would be worth flushing tables or even restarting the MySQL server to be on the safe side?


a.



On 17 June 2016 at 00:20, <[hidden email]> wrote:

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson
Hi Arek,

Thanks again for your thoughts. I think I agree with you that it ought to work if the tables are stable, even though it seems a bit fragile. It was only an issue for me since I was still trying to set up the backing database and had to correct some off-by-one values in the cds_exon_start fields. This is what forced the update and consequently the reordering. After everything has settled down I should be able to write the tables correctly from the start.

But now I’d like to bring up my next hurdle. As I have it now, I have 2 gui pages, a martform for features and another for sequences. And both of them work well. I can filter the sequences based on the gene or transcript name and based on the location.

But what I’d like to do is to connect the 2 gui pages. Think of the use case of someone who wants to retrieve the protein sequences of everything that contains a specific PFAM domain. This is the type of query that gets used on our 0.7 site, and I want to keep supporting it. I understand how this was implemented with importables and exportables in 0.7 but cannot see how it could be done in 0.9. Of course the results can be retrieved by running the query in the martform, then copying and pasting into the filter for the sequence retrieval, but I was hoping for something more direct.

I had thought that this could be achieved by creating a dataset link in the martconfigurator, but this seems not to be the case. As best I can understand, what I need to do is write my own gui plugin that is a hybrid of the martform and sequence gui. Essentially it boils down to being able to set the processor attribute in the <Query> element and the name/config attributes in the <Dataset> element. I believe I see how this is done, but wanted to run this by you in case there was something I was missing.
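To make that concrete, here is a minimal sketch of assembling such a query document. Only the processor attribute on <Query> and the name/config attributes on <Dataset> come from the discussion above. The <Filter> and <Attribute> child elements, the processor values, and every dataset, config, filter and attribute name are assumptions for illustration, and a real query may carry further attributes that are omitted here.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: builds a query document as a string. All names are hypothetical. */
final class QueryXmlSketch {

    /** Escape characters that are unsafe inside an XML attribute value. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * The same filter and dataset can be sent with a different processor,
     * which is what a hybrid martform/sequence page would need to switch.
     */
    static String buildQuery(String processor, String dataset, String config,
                             String filterName, String filterValue,
                             List<String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<Query processor=\"").append(esc(processor)).append("\">");
        sb.append("<Dataset name=\"").append(esc(dataset))
          .append("\" config=\"").append(esc(config)).append("\">");
        // Assumed child element: one filter restricting the result set.
        sb.append("<Filter name=\"").append(esc(filterName))
          .append("\" value=\"").append(esc(filterValue)).append("\"/>");
        // Assumed child elements: one per requested attribute.
        for (String attribute : attributes) {
            sb.append("<Attribute name=\"").append(esc(attribute)).append("\"/>");
        }
        sb.append("</Dataset></Query>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values: a tabular request and a sequence request
        // that share the same filter.
        List<String> attrs = Arrays.asList("gene_name", "transcript_name");
        System.out.println(buildQuery("TSV", "features", "features_config",
                "pfam_id", "PF00069", attrs));
        System.out.println(buildQuery("Sequence", "features", "features_config",
                "pfam_id", "PF00069", attrs));
    }
}
```

The point of the sketch is only that the two buttons of a hybrid page could share one set of filters and differ in the processor they request.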

Thanks,

Joe

On Jun 22, 2016, at 1:54 AM, Arek Kasprzyk <[hidden email]> wrote:

Hi Joe,

I do remember having prolonged discussions about this subject but I am not 100% sure what we settled on in the end. As far as I remember we decided against 'order by' because of the performance hit we were getting with that. Instead we decided to order data in the table and drive ordering through that. I am fully aware that the order of the rows depends on the type of query that you issue. However, since we always were asking for the same stuff, the order seemed to be maintained for us. 

As for your question regarding updates, I wonder if caching may have something to do with that. Maybe it would be worth flushing tables or even restarting the MySQL server to be on the safe side?


a.



On 17 June 2016 at 00:20, <[hidden email]> wrote:
Hello again Arek (and other group members),

I have the sequence plugin working for our new biomart. At this point, I have a lot of styling to do (and actually load data) but I think all my conceptual hurdles are over with.

That said, I'm very concerned about one aspect of the code that I see. The parseLine method in classes that inherit from SequenceParser often contains a block of code (like this one from CodingParse):


        if (!line[transcriptIDField].equals(transcriptID)) {
            //If it's a new transcript, print the current sequence
            if (transcriptID != null) {
                results = getCoding(getHeader(), chr, start, end, codingStartOffset, codingEndOffset, startExonRank, endExonRank, strand, startPhase, codonTableID, seqEdit,isProtein);
            }
            transcriptID = line[transcriptIDField];
            ....

You're processing the rows one at a time and displaying the coding portion of the previously seen transcript as soon as you advance to a new transcript. But there is nothing in the SQL that specifies that the results are ordered by transcript id, so how can you be assured that when we advance to the next transcript we have captured all exons of the previous transcript? Is there some sorting or ordering that can be specified?

I'm extracting data from a table sequence__metaseq__main as you described to me earlier. When I load that table with rows grouped together by transcript id, everything works perfectly. But if I update a record I see that a subsequent select of multiple transcripts will not keep things grouped by transcript id. This was generating errors for me since I was processing single exons of a multiple-exon transcript; the "start" and "end" TreeMaps were not fully populated and I was getting NullPointerExceptions.

I don't see anything in QueryCompiler that introduces ORDER BY into the SQL. I was wondering if there is some alternate mechanism for getting the results to be properly grouped. Is there some property on the attribute list 'coding' on the sequence__metaseq__main table that would guarantee this?
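For illustration, one way to remove the dependence on row order is to buffer the rows and group them by transcript id before building any sequence. This is a sketch under assumptions, not how the sequence plugin works: the class and method names are hypothetical, and each row is taken to be a String[] with the transcript id at a known index, as in the parseLine excerpt above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: group result rows by transcript id regardless of arrival order. */
final class GroupByTranscriptSketch {

    /**
     * Collects rows per transcript id. A LinkedHashMap keeps transcripts in
     * the order in which each id was first seen.
     */
    static Map<String, List<String[]>> groupByTranscript(List<String[]> rows,
                                                         int transcriptIdField) {
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[transcriptIdField], k -> new ArrayList<>())
                   .add(row);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical rows: transcript id, exon start, exon end.
        // The two exons of T1 are deliberately separated by a T2 row.
        List<String[]> rows = Arrays.asList(
                new String[] {"T1", "100", "200"},
                new String[] {"T2", "500", "650"},
                new String[] {"T1", "300", "400"});

        for (Map.Entry<String, List<String[]>> e : groupByTranscript(rows, 0).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " exon row(s)");
        }
    }
}
```

The trade-off is memory: the whole result set is held before anything is emitted, whereas the streaming approach in parseLine emits each transcript as soon as the id changes and therefore needs the rows to arrive already grouped.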

Thanks,

Joe

On Wednesday, May 18, 2016 at 6:59:20 AM UTC-7, Arek Kasprzyk wrote:
Hi Joe,

No problem. The sequence plugin is a bit tricky in 0.9, so I can relate to your frustrations.
If you like, I can send you a config file (off the list) with this plugin configured, if this is of any help to you.

a.

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

Arek Kasprzyk
Hi Joe,

You will not be able to connect two GUI pages in an 'end user' kind of way, as was possible in 0.7. The only way to do it is to combine attributes from different datasets in one config (using 'import from sources' on the left hand side of your config viewer window) at configuration time. The end user is then presented with a single GUI that contains attributes from different datasets.

This works well in the generic case. However, for sequence retrieval, I believe you need to define all the attributes in the meta_seq database before importing them to a config?

Sorry I could not be of more help here,
a.






Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

joe carlson
Hi Arek,

Thanks for the thoughts. Maybe I didn't make myself too clear on what I was hoping to do. Essentially, I've written my own GUI plugin by combining elements of the martform and the sequence plugin. I have a single page with the usual set of filters and attributes, along with a button that displays the TSV features as in the martform, and a second button that generates the fasta file of sequences. Needless to say, there was extensive copying and pasting of your code into the hybrid (Frankenstein?) jsp and js.

I expect this will be public in a couple of weeks; I'll send along a link when it's up.

There was 1 (hopefully final) issue. You may have seen that I made a pull request to fix a trivial typo in one of the parsers in the sequence plugin. I was also thinking of submitting another pull request to turn a few private methods of Sequence.java into protected methods. For our implementation I wanted to subclass the sequence processor classes. We have a single mart with many (>50) organisms and I believe the easiest way for us to maintain it is with a single mart of features and separate tables of genomic sequence for each organism. The code is very clean if I can subclass Sequence.java, but this requires access to a couple of methods. Do you want this pull request? It would be slightly more complicated for us to manage our deployments by patching the source as part of our build process, but not impossible. (This is actually what we are doing with our 0.7 biomart: we patch the downloaded perl code to enable selecting the organism as part of the sequence retrieval.)
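As a rough illustration of why the visibility change matters, here is a sketch with entirely hypothetical classes. They stand in for the idea only and do not reflect the actual methods of Sequence.java: a base processor asks a hook which sequence table to read, and a subclass swaps in a per-organism table.

```java
/** Sketch only: hypothetical stand-in for a sequence processor base class. */
class BaseProcessorSketch {

    // Hook method. If this were private, a subclass could declare a method
    // with the same name, but describeLookup() below would still call this
    // one, because private methods are not overridden in Java.
    protected String sequenceTable(String organism) {
        return "dna_chunks";
    }

    /** Returns a description of the lookup instead of touching a database. */
    public String describeLookup(String organism, String chr, int start, int end) {
        return sequenceTable(organism) + ":" + chr + ":" + start + "-" + end;
    }
}

/** Sketch only: selects a separate sequence table for each organism. */
class PerOrganismProcessorSketch extends BaseProcessorSketch {
    @Override
    protected String sequenceTable(String organism) {
        return organism + "_dna_chunks";
    }
}

final class SubclassDemo {
    public static void main(String[] args) {
        // Organism and chromosome names are placeholders.
        System.out.println(new BaseProcessorSketch()
                .describeLookup("athaliana", "Chr1", 100, 200));
        System.out.println(new PerOrganismProcessorSketch()
                .describeLookup("athaliana", "Chr1", 100, 200));
    }
}
```

With the hook protected, the per-organism behaviour lives in a small subclass and the upstream source stays unpatched.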

Thanks for your help,

Joe

Re: [biomart-users] Help in configuring a 0.9 BioMart for sequence retrieval

Arek Kasprzyk
Hi Joe,
Thanks for submitting your pull requests. I merged them accordingly.

HTH,
a.

On 7 July 2016 at 23:48, Joe Carlson <[hidden email]> wrote:
Hi Arek,

Thanks for the thoughts. Maybe I didn't make myself too clear on what I was hoping to do. Essentially, I've written my own GUI plugin by combining elements of the martform and the sequence plugin. I have a single page with the usual set of filters and attributes, along with a button that displays the TSV features as in the martform, and a second button that generates the FASTA file of sequences. Needless to say, there was extensive copying and pasting of your code into the hybrid (Frankenstein?) jsp and js.

I expect this will be public in a couple of weeks; I'll send along a link when it's up.

There was 1 (hopefully final) issue. You may have seen that I made a pull request to fix a trivial typo in one of the parsers in the sequence plugin. I was also thinking of submitting another pull request to turn a few private methods of Sequence.java into protected methods. For our implementation I wanted to subclass the sequence processor classes. We have a single mart with many (>50) organisms and I believe the easiest way for us to maintain it is with a single mart of features and separate tables of genomic sequence for each organism. The code is very clean if I can subclass Sequence.java, but this requires access to a couple of methods. Do you want this pull request? It would be slightly more complicated for us to manage our deployments by patching the source as part of our build process, but not impossible. (This is actually what we are doing with our 0.7 biomart: we patch the downloaded perl code to enable selecting the organism as part of the sequence retrieval.)
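
To illustrate why the visibility change matters, here is a minimal sketch. The class and method names are hypothetical stand-ins and are not taken from Sequence.java; the only point is that a protected method can be overridden to pick a per-organism sequence table, whereas a private one cannot.

```java
/** Hypothetical stand-in for a sequence processor; NOT the real Sequence.java. */
class BaseSequenceProcessor {

    // If this method were private, a subclass could not override it.
    // Making it protected is the kind of change the pull request proposes.
    protected String sequenceTableFor(String organism) {
        return "dna_chunks"; // assumed single shared table name
    }

    /** Describes which table a lookup would read from. */
    String describeLookup(String organism, String chromosome) {
        return "table=" + sequenceTableFor(organism) + " chr=" + chromosome;
    }
}

/** Hypothetical subclass: one genomic sequence table per organism. */
class PerOrganismSequenceProcessor extends BaseSequenceProcessor {

    @Override
    protected String sequenceTableFor(String organism) {
        return organism + "_dna_chunks"; // assumed naming scheme
    }

    public static void main(String[] args) {
        BaseSequenceProcessor processor = new PerOrganismSequenceProcessor();
        System.out.println(processor.describeLookup("athaliana", "Chr1"));
    }
}
```

With this layout the shared lookup logic stays in the base class and only the table selection differs per deployment, which avoids patching the source at build time.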

Thanks for your help,

Joe


On 07/06/2016 07:00 AM, Arek Kasprzyk wrote:
Hi Joe,

You will not be able to connect two GUI pages in an 'end user' type of way as was possible in 0.7. The only way to do it is to combine attributes from different datasets in one config (using 'import from sources' on the left hand side of your config viewer window) at configuration time. The end user is then presented with a single GUI that contains attributes from different datasets.

This works well in the generic case. However, for sequence retrieval, I believe you need to define all the attributes in the meta_seq database before importing them into a config?

Sorry I could not be of more help here,
a.






On 22 June 2016 at 17:56, Joe Carlson <[hidden email]> wrote:
Hi Arek,

Thanks again for your thoughts. I think I agree with you that it ought to work if the tables are stable even though it seems a bit fragile. It was only an issue for me since I was still trying to set up the backing database and had to correct some off-by-one values in the cds_exon_start fields. This is what forced the update and consequently the reordering. After everything has settled down I should be able to write the tables correctly initially.

But now I’d like to bring up my next hurdle. As I have it now, I have 2 gui pages, a martform for features and another for sequences. And both of them work well. I can filter the sequences based on the gene or transcript name and based on the location.

But what I’d like to do is to connect the 2 gui pages. Think of the use case of someone who wants to retrieve the protein sequences of everything that contains a specific PFAM domain. This was the type of query that gets used on our 0.7 site and I wanted to continue with this type of case. I understand how this was implemented with importables and exportables in 0.7 but cannot see how it could be done in 0.9. Of course the results can be retrieved by running the query in the martform, then copying and pasting into the filter for the sequence retrieval, but I was hoping for something more direct.

I had thought that this was something that could have been achieved by creating a dataset link in the martconfigurator, but this seems not to be the case. As best I can understand what I need to do is to write my own gui plugin which is a hybrid of the martform and sequence gui. Essentially it boils down to being able to set the processor attribute in the <Query> element and the name/config attributes in the <Dataset> element. I believe I see how this is done; but wanted to pass this by you in case there was something I was missing.
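
As a rough illustration of the kind of query document meant here, the sketch below assembles one as a string. Only the processor attribute on <Query> and the name/config attributes on <Dataset> come from the description above; the <Filter> and <Attribute> child elements, and every value passed in (processor name, dataset, config, filter and attribute names), are assumptions used purely as placeholders.

```java
/** Hypothetical helper, not part of BioMart: assembles a query document as a string. */
class QueryXmlSketch {

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    static String esc(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds a query with one dataset, one filter and one attribute.
     * The Filter/Attribute element layout is an assumption for illustration.
     */
    static String buildQuery(String processor, String dataset, String config,
                             String filterName, String filterValue, String attribute) {
        return "<Query processor=\"" + esc(processor) + "\">"
             + "<Dataset name=\"" + esc(dataset) + "\" config=\"" + esc(config) + "\">"
             + "<Filter name=\"" + esc(filterName) + "\" value=\"" + esc(filterValue) + "\"/>"
             + "<Attribute name=\"" + esc(attribute) + "\"/>"
             + "</Dataset></Query>";
    }

    public static void main(String[] args) {
        // All names below are placeholders, not real dataset or processor names.
        System.out.println(buildQuery("SEQUENCE_PROCESSOR_PLACEHOLDER",
                "my_features", "my_features_config",
                "pfam_id", "PF00069", "protein_sequence"));
    }
}
```

A hybrid GUI plugin would fill these values in from the page: the filters chosen in the martform part, and the processor switched depending on whether the TSV button or the sequence button was pressed.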

Thanks,

Joe

On Jun 22, 2016, at 1:54 AM, Arek Kasprzyk <[hidden email]> wrote:

Hi Joe,

I do remember having prolonged discussions about this subject but I am not 100% sure what we settled on in the end. As far as I remember we decided against 'order by' because of the performance hit we were getting with that. Instead we decided to order data in the table and drive ordering through that. I am fully aware that the order of the rows depends on the type of query that you issue. However, since we always were asking for the same stuff, the order seemed to be maintained for us. 

As for your question regarding updates: I wonder if caching may have something to do with that. Maybe it would be worth flushing tables or even restarting the MySQL server to be on the safe side?


a.



On 17 June 2016 at 00:20, <[hidden email]> wrote:
Hello again Arek (and other group members),

I have the sequence plugin working for our new biomart. At this point, I have a lot of styling to do (and actually load data) but I think all my conceptual hurdles are over with.

That said, I'm very concerned about one aspect of the code that I see. The parseLine method in classes that inherit from SequenceParser often contains a block of code (like this one from CodingParse):


        if (!line[transcriptIDField].equals(transcriptID)) {
            //If it's a new transcript, print the current sequence
            if (transcriptID != null) {
                results = getCoding(getHeader(), chr, start, end, codingStartOffset, codingEndOffset, startExonRank, endExonRank, strand, startPhase, codonTableID, seqEdit,isProtein);
            }
            transcriptID = line[transcriptIDField];
            ....

You're processing the rows one at a time and displaying the coding portion of the previously seen transcript as soon as you advance to a new transcript. But there is nothing in the SQL that specifies that the results are ordered by transcript id; so how can you be assured that when we advance to the next transcript we have captured all exons of the previous transcript? Is there some sorting or ordering that can be specified?

I'm extracting data from a table sequence__metaseq__main as you described to me earlier. When I load that table with rows grouped together by transcript id, everything works perfectly. But if I update a record I see that a subsequent select of multiple transcripts will not keep things grouped by transcript id. This was generating errors for me since I was processing single exons of a multiple-exon transcript; the "start" and "end" TreeMaps were not fully populated and I was getting NullPointerExceptions.

I don't see anything in QueryCompiler that introduces ORDER BY into the SQL. I was wondering if there is some alternate mechanism for getting the results to be properly grouped. Is there some property on the attribute list 'coding' on the sequence__metaseq__main table that would guarantee this?
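
One way to make the parsing independent of row order, sketched here under assumptions and not as what the plugin actually does, is to buffer the rows per transcript id before building any sequence. The class name, method name and row layout below are hypothetical; the trade-off is that the whole result set is held in memory rather than streamed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of BioMart: groups result rows by transcript id. */
class TranscriptRowGrouper {

    /**
     * Collects every row under its transcript id, so all exons of a transcript
     * are seen together even if the rows arrived interleaved.
     * Transcripts keep the order in which they were first encountered.
     */
    static Map<String, List<String[]>> groupByTranscript(List<String[]> rows, int transcriptIdField) {
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[transcriptIdField], id -> new ArrayList<>()).add(row);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Assumed row layout: {transcript_id, exon_start, exon_end}.
        // The two exons of T1 are separated by a row belonging to T2.
        List<String[]> rows = Arrays.asList(
                new String[] {"T1", "100", "200"},
                new String[] {"T2", "500", "600"},
                new String[] {"T1", "300", "400"});
        groupByTranscript(rows, 0).forEach((id, exons) ->
                System.out.println(id + " has " + exons.size() + " exon row(s)"));
    }
}
```

A parser that flushes on every change of transcript id would emit T1 after its first exon only; grouping first means each transcript is processed once with all of its exon rows.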

Thanks,

Joe

On Wednesday, May 18, 2016 at 6:59:20 AM UTC-7, Arek Kasprzyk wrote:
Hi Joe,

No problem. The sequence plugin is a bit tricky in 0.9 so I can relate to your frustrations.
If you like, I can send you a config file (off the list) with this plugin configured, if this is of any help to you.

a.

On 13 May 2016 at 02:57, Joe Carlson <[hidden email]> wrote:
Hi Arek,

After a (long) delay I've started working on this again. I'm afraid I'm still having some troubles using MartConfigurator to establish the links between sources when using the sequence plugin.

I have one source with the exon structure and I know I need to make an attribute list of transcript_id, chromosome name, exon start, exon end and strand to send to the sequence plugin. But I'm not clear on the steps involved to do this in the 0.9 code base. I understand how it was done in the perl code: create an Exportable (with the names 'cdna',...) in the structure dataset with the attributes, and an Importable in the genomic sequence source with the same fields, name and link version.

With the new code, I have been trying to create an attribute list in the structure data source and put the proper fields in it. But I don't understand how to create the corresponding object in the genomic data source. Do I create pseudo attributes, then a filter list? When making the links in MartConfigurator, what do I drag onto what?

Sorry to bug you. But I just cannot quite get it straightened out. I've gotten it to the point that the CDNAParser code objects that I'm not giving it the proper fields. This is progress of a sort, but I'm missing the final step.

Thanks,

Joe Carlson


On 01/06/2016 02:28 AM, Arek Kasprzyk wrote:
Hi Joe,
Hope you had a good holiday.

You are right in saying that the sequence plugin is needed for connecting gene coordinates with the sequence. However, as far as I know the RegionsDino is only needed for the Enrichment plugin.
I have a community portal registry with the configuration for sequence retrieval. I'll email this to you separately.

a.





On 24 December 2015 at 23:18, <[hidden email]> wrote:
Hello,

Thanks for all your work on BioMart. And your previous explanation on the submain tables helped me clear up an issue with configuring a registry file for a 0.9 mart.

What I have is a simple mart that will allow me to do some basic filter operations and attribute selection. So this part is working.

Now my next task is to get gene/transcript/protein sequence retrieval to work. From what I can see in the code and the documentation, this requires using the sequence plugin and RegionsDino for connecting the gene coordinates in 1 mart to the separate mart with chunks of dna sequence. The two marts are set up, and I can bring up web pages in the sequence ui. But since I have not specified any input parameters, I cannot actually extract sequence. It is not clear to me how to use martconfigurator to link the attributes together.

Does someone have an example registry file in which this is done? Or a cookbook on how to get started?

Our dna chunk mart is a holdover from a biomart-0.7 server. Is the structure of this mart still suitable for a 0.9 server?

Thanks,

Joe Carlson