requested indexes for Chado

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

requested indexes for Chado

Scott Cain
Hi Karl,

I'm going through the requested additions for Chado, and I have a question for you about this index:

create index feature_idx1b
 on feature (feature_id, dbxref_id)
 where dbxref_id is not null;


In what context do you imagine this index being helpful?  I wonder for two reasons: one, dbxref_id is defined as not null for the table, so the where clause doesn't appear to do anything, and B, the majority of the time, you would be querying for one item (either the feature_id or the dbxref_id) in order to get a list of the other items, in which case the existing indexes on the individual foreign keys would be used.  When would I be querying for both at the same time such that this index would help?

Thanks,
Scott


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: requested indexes for Chado

Karl O. Pinc
Hi Scott,

On Tue, 19 Jan 2016 11:26:55 -0500
Scott Cain <[hidden email]> wrote:

> I'm going through the requested additions for Chado, and I have a
> question for you about this index:
>
> create index feature_idx1b
>  on feature (feature_id, dbxref_id)
>  where dbxref_id is not null;
>
> In what context do you imagine this index being helpful?  I wonder
> for two reasons: one, dbxref_id is defined as not null for the table,
> so the where clause doesn't appear to do anything,

feature.dbxref_id is allowed to be NULL, as far as I can tell.
It's allowed to be null in our version of chado, and
see line 22 of
https://github.com/GMOD/Chado/blob/master/chado/modules/sequence/sequence.sql

If you don't say "not null" then you're allowed nulls.

> and B, the
> majority of the time, you would be querying for one item (either the
> feature_id or the dbxref_id) in order to get a list of the other
> items, in which case the existing indexes on the individual foreign
> keys would be used.  When would I be querying for both at the same
> time such that this index would help?

I can't tell you off the top of my head.  I'm pretty sure we
had to add the index to get something we do that's important
to perform decently -- or at all.  Our revison control system
says we made the index to support our VCF loading process,
which can take days and days.  For some reason the optimizer
wanted the index.  We spent some time doing this sort of
tuning.

One use case that's off the top of my head is when joining feature
just to get a dbxref_id.  You don't really want anything on feature,
but you've got the feature_id and want something on dbxref.
In our case it's possible that we might have both; having
made a dbxref and having looked up a feature we want to know if they
are paired so that we can make a new feature to go with the
dbxref if necessary.  (With VCF files I think we are creating
features for each baboon sequenced, although not for each
sequencing of each baboon.  But it's been a while since
I've touched this and could be wrong.  Anyway, one
loci could have several features with different dbxrefs --
the dbxref.accession containing the baboon's identifier.)

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: requested indexes for Chado

Scott Cain
Hi Karl,

Sorry, that's what I get for looking at code changes first thing in the morning with very little sleep the night before.  For some reason, I thought this index was for feature_dbxref.  It makes much more sense as an index on feature :-)

Thanks,
Scott


On Tue, Jan 19, 2016 at 12:14 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Scott,

On Tue, 19 Jan 2016 11:26:55 -0500
Scott Cain <[hidden email]> wrote:

> I'm going through the requested additions for Chado, and I have a
> question for you about this index:
>
> create index feature_idx1b
>  on feature (feature_id, dbxref_id)
>  where dbxref_id is not null;
>
> In what context do you imagine this index being helpful?  I wonder
> for two reasons: one, dbxref_id is defined as not null for the table,
> so the where clause doesn't appear to do anything,

feature.dbxref_id is allowed to be NULL, as far as I can tell.
It's allowed to be null in our version of chado, and
see line 22 of
https://github.com/GMOD/Chado/blob/master/chado/modules/sequence/sequence.sql

If you don't say "not null" then you're allowed nulls.

> and B, the
> majority of the time, you would be querying for one item (either the
> feature_id or the dbxref_id) in order to get a list of the other
> items, in which case the existing indexes on the individual foreign
> keys would be used.  When would I be querying for both at the same
> time such that this index would help?

I can't tell you off the top of my head.  I'm pretty sure we
had to add the index to get something we do that's important
to perform decently -- or at all.  Our revison control system
says we made the index to support our VCF loading process,
which can take days and days.  For some reason the optimizer
wanted the index.  We spent some time doing this sort of
tuning.

One use case that's off the top of my head is when joining feature
just to get a dbxref_id.  You don't really want anything on feature,
but you've got the feature_id and want something on dbxref.
In our case it's possible that we might have both; having
made a dbxref and having looked up a feature we want to know if they
are paired so that we can make a new feature to go with the
dbxref if necessary.  (With VCF files I think we are creating
features for each baboon sequenced, although not for each
sequencing of each baboon.  But it's been a while since
I've touched this and could be wrong.  Anyway, one
loci could have several features with different dbxrefs --
the dbxref.accession containing the baboon's identifier.)

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Loading...