Add strand-specific genomic region search option to main IM repo?


Add strand-specific genomic region search option to main IM repo?

Sam Hokin-3
Hi, devs. I've come up against the need to do strand-specific genomic region searches. I've added a MEME motif analysis to the list
analysis for Gene Flanking Regions, and one of the tasks that I do is to BLAST an interesting motif against the genome, resulting in
a list of regions, some on the + strand (start<end) and some on the - strand (start>end). I'm then using IM to find overlaps of
those BLAST-generated regions with Gene Flanking Regions. But for that to make sense, the search needs to be strand-specific.

So, first of all, I made a minor update to the dev branch to allow regions with start>end. That was meant to be possible, but there
was a bug in one routine and an explicit check disallowing it (for no known reason) in another. So the current dev branch now lets you
search against regions with start>end, but it's still not strand-specific, so it's really no different from before, other than
the convenience of not having to swap coordinates in the input when start>end.
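
To illustrate what accepting start>end without strand awareness amounts to, here is a minimal sketch that parses a region string and swaps the coordinates when they arrive reversed. The class name RegionSpan and the "chr:start..end" input format are assumptions made for this example; this is not the actual InterMine parsing code.

```java
// Hypothetical helper for illustration; not InterMine's GenomicRegion or its parser.
final class RegionSpan {
    final String chromosome;
    final int start; // always <= end after parsing
    final int end;

    private RegionSpan(String chromosome, int start, int end) {
        this.chromosome = chromosome;
        this.start = start;
        this.end = end;
    }

    // Parses "chr:a..b" (format assumed here) and swaps a and b when a > b,
    // so the strand implied by the coordinate order is discarded.
    static RegionSpan parse(String text) {
        int colon = text.lastIndexOf(':');
        int dots = text.indexOf("..", colon + 1);
        if (colon < 1 || dots < 0) {
            throw new IllegalArgumentException("Expected chr:start..end, got: " + text);
        }
        String chromosome = text.substring(0, colon).trim();
        int a = Integer.parseInt(text.substring(colon + 1, dots).trim());
        int b = Integer.parseInt(text.substring(dots + 2).trim());
        return new RegionSpan(chromosome, Math.min(a, b), Math.max(a, b));
    }

    @Override
    public String toString() {
        return chromosome + ":" + start + ".." + end;
    }
}
```

With this kind of normalization, "2L:5000..1000" and "2L:1000..5000" become the same search region, which is why the result is a convenience only and not a strand-specific search.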

I've also done an update locally that adds a checkbox to demand strand-specific searches, presuming, of course, that you have any
regions on the - strand indicated by start>end. I've added a minusStrand boolean to GenomicRegion and a strandSpecific boolean flag
to other code that is set by this checkbox.
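
The idea of a minusStrand boolean plus a strandSpecific flag can be sketched roughly as follows. This is a simplified, in-memory illustration under my own assumptions: the class StrandedInterval and its overlaps method are hypothetical names, and the real search runs as database queries, not as Java comparisons like this.

```java
// Hypothetical illustration; the real change lives in GenomicRegion and the search query code.
final class StrandedInterval {
    final int start;           // smaller coordinate
    final int end;             // larger coordinate
    final boolean minusStrand; // true when the input was given as start > end

    StrandedInterval(int inputStart, int inputEnd) {
        this.minusStrand = inputStart > inputEnd;
        this.start = Math.min(inputStart, inputEnd);
        this.end = Math.max(inputStart, inputEnd);
    }

    // Coordinates must overlap in both modes; when strandSpecific is true,
    // the two intervals must also lie on the same strand.
    boolean overlaps(StrandedInterval other, boolean strandSpecific) {
        boolean coordinatesOverlap = start <= other.end && other.start <= end;
        if (!strandSpecific) {
            return coordinatesOverlap;
        }
        return coordinatesOverlap && minusStrand == other.minusStrand;
    }
}
```

For example, a query region entered as 500..100 (minus strand) overlaps a plus-strand feature at 200..300 when strandSpecific is false, but not when it is true.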

It works, but there is a major problem: if I do a search with strandSpecific=false, I'll get some (correct) results. If I go back
and toggle strandSpecific=true with the same input regions, the queries are cached and I get the same (now incorrect) results as
before. Something is telling the back end to use precomputed tables without being aware of the strandSpecific flag. I'd appreciate
any tips on how I can stop that from happening. The log confirms this, only showing the queries in the first go-round.

Otherwise, I'm wondering how folks feel about this added functionality. I think it's an important feature since there are
situations, like this one, where you really only want to find overlapping regions on the same strand as the query region. Default is
off, of course. I've included a screen grab of the genomic region search form with the added checkbox.

I had to hack eight files for this, so I'd like to PR it soon, if folks are happy with having this new genomic region search option.
Presuming that I can also prevent it from using precomputed tables when the flag is switched!

modified:   bio/webapp/resources/webapp/model/genomicRegionSearchOptionsBase.jsp
modified:   bio/webapp/src/org/intermine/bio/web/logic/GenomicRegionSearchService.java
modified:   bio/webapp/src/org/intermine/bio/web/logic/GenomicRegionSearchUtil.java
modified:   bio/webapp/src/org/intermine/bio/web/model/GenomicRegion.java
modified:   bio/webapp/src/org/intermine/bio/web/model/GenomicRegionSearchConstraint.java
modified:   bio/webapp/src/org/intermine/bio/web/struts/GenomicRegionSearchAction.java
modified:   bio/webapp/src/org/intermine/bio/webservice/GenomicRegionSearchListInput.java
modified:   bio/webapp/src/org/intermine/bio/webservice/GenomicRegionSearchService.java

If not, of course, I'll just roll these updates into my own webapps, suffering the fork off of the dev branch for those files.

Cheers!
Sam



_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

Attachment: strand-specific-checkbox.png (79K)

Re: Add strand-specific genomic region search option to main IM repo?

Sam Hokin-3
OK, I fixed the precomputed tables problem: I needed to add my strandSpecific flag to GenomicRegionSearchConstraint.equals() so the
query-running code knows that the constraints differ between strandSpecific=true and strandSpecific=false.
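
As a rough illustration of why the flag has to take part in equals(), the sketch below uses a plain HashMap as a stand-in for result caching. SearchKey and ResultCache are hypothetical names invented for this example; they are not the real GenomicRegionSearchConstraint or InterMine's caching code. The sketch also overrides hashCode(), which the Java contract requires whenever equals() changes; whether the real class needed that too is not stated in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a search constraint used as a cache key.
final class SearchKey {
    private final String regions;
    private final boolean strandSpecific;

    SearchKey(String regions, boolean strandSpecific) {
        this.regions = regions;
        this.strandSpecific = strandSpecific;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SearchKey)) {
            return false;
        }
        SearchKey other = (SearchKey) o;
        // Leaving strandSpecific out of this comparison would make both
        // modes look like the same search and reuse the cached results.
        return strandSpecific == other.strandSpecific && regions.equals(other.regions);
    }

    @Override
    public int hashCode() {
        return Objects.hash(regions, strandSpecific);
    }
}

// Toy cache that counts how many times a "query" actually runs.
final class ResultCache {
    private final Map<SearchKey, String> cache = new HashMap<>();
    private int queriesRun = 0;

    String search(SearchKey key) {
        return cache.computeIfAbsent(key, k -> {
            queriesRun++;
            return "results#" + queriesRun;
        });
    }

    int queriesRun() {
        return queriesRun;
    }
}
```

Searching the same regions first with strandSpecific=false and then with strandSpecific=true runs two queries here, while repeating either search reuses its own cached entry.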

So, the remaining question is: should this region search tweak be PRed to the main dev repo? Again, it's a bunch of files so I'd
rather do it fairly soon so merging isn't required.

On 09/21/2016 06:41 PM, Sam Hokin wrote:
>
> It works, but there is a major problem: if I do a search with strandSpecific=false, I'll get some (correct) results. If I go back
> and toggle strandSpecific=true with the same input regions, the queries are cached and I get the same (now incorrect) results as
> before. Something is telling the back end to use precomputed tables without being aware of the strandSpecific flag. I'd appreciate
> any tips on how I can stop that from happening. The log confirms this, only showing the queries in the first go-round.

Re: Add strand-specific genomic region search option to main IM repo?

Justin Clark-Casey-2
Hi Sam.  I would say, yes, it would be great to have a PR for this.  If
you want to be absolutely sure, you might want to wait until Monday when
Rachel and Julie are back around, but if generating the PR is not too
much work then I would say do it anyway.  It will put it in front of us
on GitHub to simply merge or discuss further.

On 2016-09-22 17:38, Sam Hokin wrote:

> OK, I fixed the precomputed tables problem - I needed to add my
> strandSpecific flag to GenomicRegionSearchConstraint.equals() so the
> query running code knows that the constraints are different with
> strandSpecific=true or false.
>
> So, the remaining question is: should this region search tweak be PRed
> to the main dev repo? Again, it's a bunch of files so I'd rather do it
> fairly soon so merging isn't required.
>
> On 09/21/2016 06:41 PM, Sam Hokin wrote:
>>
>> It works, but there is a major problem: if I do a search with
>> strandSpecific=false, I'll get some (correct) results. If I go back
>> and toggle strandSpecific=true with the same input regions, the
>> queries are cached and I get the same (now incorrect) results as
>> before. Something is telling the back end to use precomputed tables
>> without being aware of the strandSpecific flag. I'd appreciate
>> any tips on how I can stop that from happening. The log confirms this,
>> only showing the queries in the first go-round.

Re: Add strand-specific genomic region search option to main IM repo?

Sam Hokin-3
Done. I was terribly clunky in making a PR for each of the eight files; I should have combined the eight commits into a single PR,
but I'd gone down the wrong path at the start and figured it was best to go one way or the other. I'm still a git newb but I promise
to get better.

On 09/23/2016 05:34 AM, [hidden email] wrote:
> Hi Sam.  I would say, yes, it would be great to have a PR for this.  If you want to be absolutely sure, you might want to wait until
> Monday when Rachel and Julie are back around, but if generating the PR is not too much work then I would say do it anyway.  It will
> put it in front of us on GitHub to simply merge or discuss further.