How to get search to automatically search text that CONTAINS the search string?

Sam Hokin-3
I've updated LegumeMine and we're using the new gigantic LIS identifiers for genes, like glyma.Wm82.gnm2.ann1.Glyma.08G189800. But
I'd like someone to find that gene if they just type "Glyma.08G189800" into the search without asterisks ... which isn't happening.
I have to search on "*Glyma.08G189800". (The secondaryIdentifier also has a prefix: "glyma.Glyma.08G189800".)

It seems to me there is a setting somewhere that tells the search to return results that _contain_ a given string, without having to
append wildcards. Is that on the IM side, or should I dig into Solr?
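
To make the reported behaviour concrete, here is a minimal Python sketch (not InterMine or Solr code). It assumes each identifier is indexed as a single lowercased token, which is consistent with the behaviour described above but is an assumption about the index. The helper names whole_token_hits and wildcard_hits are hypothetical.

```python
from fnmatch import fnmatchcase

# The two identifiers quoted in the post.
IDENTIFIERS = [
    "glyma.Wm82.gnm2.ann1.Glyma.08G189800",  # primaryIdentifier
    "glyma.Glyma.08G189800",                 # secondaryIdentifier
]


def whole_token_hits(query, values):
    """Return values whose lowercased form equals the lowercased query."""
    q = query.lower()
    return [v for v in values if v.lower() == q]


def wildcard_hits(pattern, values):
    """Return values whose lowercased form matches a '*'-style pattern."""
    p = pattern.lower()
    return [v for v in values if fnmatchcase(v.lower(), p)]


# A bare query only hits when it equals the whole token (assumed behaviour).
print(whole_token_hits("Glyma.08G189800", IDENTIFIERS))
# A leading wildcard lets the prefixed identifiers match.
print(wildcard_hits("*Glyma.08G189800", IDENTIFIERS))
```

Under that assumption the bare query returns nothing, while the wildcard query returns both identifiers, which matches what is described above.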

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Re: How to get search to automatically search text that CONTAINS the search string?

Daniela Butano-2
Hi Sam,
I'm not sure whether InterMine can be configured to change this behaviour.
Arunan, could you help us with this?
Thanks

Daniela

Re: How to get search to automatically search text that CONTAINS the search string?

Paulo Nuin
Hi Sam

I would say Solr is worth exploring, although I don't know how InterMine sets up its indexes there. There might be a way to configure the server to do partial string searches.

I don't know whether that's the solution, but it might be worth exploring. I have been getting errors or long search timeouts on some WormMine terms, but under BlueGenes.

Cheers

Paulo


On Mar 5, 2020, at 10:20 AM, [hidden email] wrote:

Hi Sam,
not sure if you can configure InterMine to change the behaviour
Arunan, could you help us on this?
Thanks

Daniela

Re: How to get search to automatically search text that CONTAINS the search string?

Arunan Sugunakumar
In reply to this post by Daniela Butano-2
Hi all,

Sorry for the delayed response, I missed this email. In Solr there are ways to achieve this, but our current implementation in InterMine does not support it. The Tokenizers and Analyzers we use for indexing the data are currently fixed. This was done to replicate the behaviour of the old Lucene library as closely as possible. Maybe we can do this as an improvement in the future. We have to test other Tokenizers properly before using them, since they would change the search results significantly.
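
As a rough illustration of why swapping tokenizers changes search results, the Python sketch below compares two ways of splitting an identifier into tokens. It is not the InterMine analyzer chain: whitespace_tokens only approximates whitespace tokenization plus lowercasing, and dot_split_tokens is a hypothetical alternative used purely for comparison.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace and lowercase, keeping dotted identifiers whole."""
    return [t.lower() for t in text.split()]


def dot_split_tokens(text):
    """Hypothetical alternative: also split on dots, then lowercase."""
    return [t.lower() for t in re.split(r"[\s.]+", text) if t]


def matches(query, value, tokenize):
    """True when every query token is also an indexed token of the value."""
    indexed = set(tokenize(value))
    return all(t in indexed for t in tokenize(query))


identifier = "glyma.Wm82.gnm2.ann1.Glyma.08G189800"

# Whitespace tokenization: the identifier is one token, so a partial query misses.
print(matches("Glyma.08G189800", identifier, whitespace_tokens))
# Dot splitting: the partial query now hits...
print(matches("Glyma.08G189800", identifier, dot_split_tokens))
# ...but so does the bare prefix, which every identifier with that prefix shares.
print(matches("glyma", identifier, dot_split_tokens))
```

The last call shows the trade-off: a tokenizer that makes partial identifiers searchable can also make very short, common fragments match a large number of records.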

Regards,
Arunan

On Thu, 5 Mar 2020 at 22:50, <[hidden email]> wrote:
Hi Sam,
not sure if you can configure InterMine to change the behaviour
Arunan, could you help us on this?
Thanks

Daniela

Re: How to get search to automatically search text that CONTAINS the search string?

Daniela Butano-2
Thanks Arunan,
I will add a ticket (as an improvement) on GitHub.
Daniela

> Hi all,
>
> Sorry for the delayed response, I missed this email. In Solr there are
> ways to achieve this, but our current implementation in Intermine does
> not support it. The Tokenizers and Analyzers we have used for indexing
> the data are currently fixed [1]. This was done with the intention to
> replicate the behaviour of the old Lucene library as much as possible.
> Maybe we can do this as an improvement in the future. We have to test
> out other Tokenizers properly before using them since it would change
> the search results significantly.
>
> Regards,
> Arunan
>
> Links:
> ------
> [1]
> https://github.com/intermine/intermine/blob/dev/intermine/api/src/main/java/org/intermine/api/searchengine/solr/SolrIndexHandler.java#L295

Re: How to get search to automatically search text that CONTAINS the search string?

Sam Hokin-3
In reply to this post by Paulo Nuin
So the solution Paulo suggested works great. It took me just a little bit to figure out, but it's easy.
----------
1. ADD the following to /var/solr/data/[mine]-search/conf/managed-schema (this example implements it for hits against
Gene.primaryIdentifier and Gene.secondaryIdentifier):

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
    <!-- Index time: split on whitespace, then index every substring of 1 to 50 characters, lowercased. -->
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="50"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- Query time: split on whitespace and lowercase only, so the typed string is compared with the indexed substrings. -->
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
<!-- Fields that should support partial matches use the new type. -->
<field name="gene_primaryidentifier" type="text_ngram" indexed="true" stored="true"/>
<field name="gene_secondaryidentifier" type="text_ngram" indexed="true" stored="true"/>

2. REMOVE the gene_primaryidentifier and gene_secondaryidentifier field definitions from the earlier part of the file. They look
like this:

<field name="gene_primaryidentifier" type="analyzed_string" multiValued="true" indexed="true" required="false" stored="false"/>
<field name="gene_secondaryidentifier" type="analyzed_string" multiValued="true" indexed="true" required="false" stored="false"/>

3. RESTART Solr to load the new config, e.g. under systemd:

# systemctl restart solr

4. REBUILD the search index using the Solr-related postprocesses:

./gradlew postprocess -Pprocess=create-search-index
----------

Partial search hits will now work against the fields that you set with type="text_ngram" in the mine's Solr config.
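
For anyone wondering why the text_ngram field type makes "contains" searches work, the following Python sketch imitates its two analyzers. It is only an approximation under stated assumptions: it is not Solr code, the function names are hypothetical, and it requires every query token to match, which simplifies how Solr combines terms.

```python
def ngrams(token, min_size=1, max_size=50):
    """All substrings of token whose length is between min_size and max_size."""
    grams = set()
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(value):
    """Index analyzer: whitespace split, n-grams of each token, lowercase."""
    terms = set()
    for token in value.split():
        terms |= {gram.lower() for gram in ngrams(token)}
    return terms


def query_terms(query):
    """Query analyzer: whitespace split and lowercase, no n-grams."""
    return [token.lower() for token in query.split()]


def matches(query, value):
    """True when every query term is one of the indexed n-grams (simplified)."""
    indexed = index_terms(value)
    return all(term in indexed for term in query_terms(query))


PRIMARY = "glyma.Wm82.gnm2.ann1.Glyma.08G189800"

print(matches("Glyma.08G189800", PRIMARY))  # partial identifier, no wildcard
print(matches("Glyma.08G189801", PRIMARY))  # a different gene number
```

Two consequences follow from the settings. With minGramSize="1" every single character is indexed, so the index grows and very short queries match almost everything. With maxGramSize="50" a single query token longer than 50 characters has no indexed n-gram to match in this sketch.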

Re: How to get search to automatically search text that CONTAINS the search string?

Paulo Nuin
Hi Sam

Great to know that it worked. I am thinking of applying a similar approach to WormMine.

Cheers

Paulo



> On Mar 16, 2020, at 2:19 PM, Sam Hokin <[hidden email]> wrote:
>
> So the solution Paulo suggested works great. It took me just a little bit to figure out, but it's easy.

Re: How to get search to automatically search text that CONTAINS the search string?

Daniela Butano-2
In reply to this post by Sam Hokin-3
Thanks Sam!
I'm glad it works :)

Re: How to get search to automatically search text that CONTAINS the search string?

Arunan Sugunakumar
In reply to this post by Sam Hokin-3
Hi Sam,

I was under the impression that running the postprocess task would overwrite the field types again. Apparently it does not.
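
One way to confirm this after a rebuild is to look at the field types in managed-schema. The Python sketch below is a hypothetical helper, not part of InterMine: it parses a schema string and lists the fields that do not have the expected type. SAMPLE_SCHEMA is a trimmed, made-up stand-in for the real file, and gene_symbol is only an illustrative field name.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for the contents of managed-schema.
SAMPLE_SCHEMA = """<schema name="example" version="1.6">
  <field name="gene_primaryidentifier" type="text_ngram" indexed="true" stored="true"/>
  <field name="gene_secondaryidentifier" type="text_ngram" indexed="true" stored="true"/>
  <field name="gene_symbol" type="analyzed_string" multiValued="true" indexed="true" required="false" stored="false"/>
</schema>"""


def field_types(schema_xml):
    """Map each <field> name in the schema to its declared type."""
    root = ET.fromstring(schema_xml)
    return {field.get("name"): field.get("type") for field in root.iter("field")}


def fields_not_of_type(schema_xml, names, expected_type):
    """Return the names whose field is missing or has a different type."""
    types = field_types(schema_xml)
    return [name for name in names if types.get(name) != expected_type]


wanted = ["gene_primaryidentifier", "gene_secondaryidentifier"]
# An empty list means both fields kept the n-gram type.
print(fields_not_of_type(SAMPLE_SCHEMA, wanted, "text_ngram"))
```

To check a real mine, the schema text would be read from the managed-schema file mentioned in Sam's post instead of SAMPLE_SCHEMA.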

Glad it worked out :-)

Regards,
Arunan


On Tue, Mar 17, 2020, 01:51 Sam Hokin <[hidden email]> wrote:
So the solution Paulo suggested works great. It took me just a little bit to figure out, but it's easy.