64 bit object id's

64 bit object id's

joe carlson
Hi Julie and team,

An old issue is rearing its ugly head.

I'm loading another mine. And the object id has rolled past 2^31 and
gone negative. I do not have 2^31 objects in the db -  it is about 2^30.
There are enough holes in the object id that I've pushed things up a
bit. (It looks a little weird to me that there are this many holes)
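For readers unfamiliar with the rollover described above, here is a minimal sketch of how a signed 32-bit counter wraps in Java. The class and method names are made up for illustration and are not InterMine code.

```java
// Illustration only: a plain int counter wraps silently at 2^31 - 1.
final class IdRolloverDemo {
    /** Returns the counter value that follows the given one. */
    static int next(int current) {
        return current + 1; // int arithmetic wraps on overflow, no exception
    }

    public static void main(String[] args) {
        int lastPositive = Integer.MAX_VALUE;   // 2^31 - 1
        int wrapped = next(lastPositive);       // becomes -2^31
        System.out.println(lastPositive + " + 1 = " + wrapped);

        // The same step done in 64-bit arithmetic does not wrap here.
        long wide = (long) lastPositive + 1L;
        System.out.println("as long: " + wide);
    }
}
```

Holes in the id sequence bring this point closer, since the counter advances even when fewer objects are actually stored.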

I've had negative object id's in the past with no problems.
It looks like there have been some changes in the code such that the
webapp does not like negative id's right now. But I think I can fix that
for now. And for the next build I need to think of what I can store as
simple objects rather than intermine objects.

The reason I'm worried is for our mid to long term plans. We're hoping
to scale our dataset by at least a factor of 5 or so in the near term.
And even with a more efficient use of object id's, the 32 bit counter is
going to be a serious problem. Was a 64 bit counter in your plans?
How tough of a problem do you think that would be?

Thanks,

Joe Carlson

_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Re: 64 bit object id's

Justin Clark-Casey
Hi Joe.  Julie asked me to take an initial think about this.

Basically, I think this probably involves

1) Changing all the object ID columns in the DB to be bigint rather than int
2) Changing all the Java references dealing with object ids to be long rather than int.

Step 2 is the more time-consuming one, I imagine - I don't yet have detailed knowledge of how far internal object IDs propagate in InterMine Java code.  But it
shouldn't be insurmountable.

Regarding storage, I imagine requirements would increase by a non-trivial amount.  But I would hope storage size is not a critical issue, disk being cheap.
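As a rough back-of-envelope for the storage point, the sketch below computes only the raw growth of one ID column when it is widened from a 4-byte integer to an 8-byte bigint. It assumes about 2^30 rows (the figure mentioned earlier in the thread) and ignores alignment padding, indexes, and any other columns that reference IDs, so the real increase would likely be larger. The class name is hypothetical.

```java
// Illustration only: raw per-column growth, not a measurement of a real mine.
final class IdStorageEstimate {
    static final int INT_BYTES = 4;     // size of a PostgreSQL integer
    static final int BIGINT_BYTES = 8;  // size of a PostgreSQL bigint

    /** Extra raw bytes for one ID column widened from integer to bigint. */
    static long extraBytes(long rowCount) {
        return rowCount * (BIGINT_BYTES - INT_BYTES);
    }

    public static void main(String[] args) {
        long rows = 1L << 30; // assumed row count, about 2^30
        long extra = extraBytes(rows);
        System.out.println("extra bytes per ID column: " + extra);
        System.out.println("in GiB: " + extra / (1L << 30));
    }
}
```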

Regarding memory usage, this will also increase.  This might be more of an issue than storage since I know some people have bumped against memory limits.  A
case of suck it and see.

Regarding performance, the PostgreSQL docs themselves say that int is definitely faster [1], though I haven't yet found any detailed information.  I think this
would be another case of suck it and see.

So I think one could either

1) Make the necessary changes for bigint in a branch and do performance testing.  Once done I don't think a branch would be that hard to maintain as the
data-side internals of InterMine are not mutating at a fast rate presently.  If things prove out then integrate the branch down the line.

2) Find and fix the issue with using negative IDs and officially support this in InterMine.  However, relying on rollover makes me distinctly queasy.  It might
be better to start the ID counter at -2^31 rather than 0.
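To make the second option concrete, here is a minimal sketch of a counter that starts at -2^31 and refuses to wrap once the 32-bit space is used up. The class is hypothetical and is not how InterMine allocates IDs; it only illustrates the idea of using the full signed range deliberately instead of relying on rollover.

```java
// Illustration only: a 32-bit ID counter that uses the whole signed range.
final class FullRangeIdCounter {
    private int next = Integer.MIN_VALUE; // start at -2^31 rather than 0
    private boolean exhausted = false;

    /** Returns the next ID, or throws once every 32-bit value has been handed out. */
    synchronized int allocate() {
        if (exhausted) {
            throw new IllegalStateException("32-bit ID space exhausted");
        }
        int id = next;
        if (next == Integer.MAX_VALUE) {
            exhausted = true; // stop here instead of wrapping and reusing IDs
        } else {
            next++;
        }
        return id;
    }

    /** Number of distinct IDs available to a counter starting at the given value. */
    static long capacityFrom(int start) {
        return (long) Integer.MAX_VALUE - start + 1L;
    }
}
```

Here `capacityFrom(0)` is 2^31 and `capacityFrom(Integer.MIN_VALUE)` is 2^32, so this approach doubles the available IDs and no more.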

In the short term, I would lean towards option 2, though this might just be punting the problem of very large warehouses down the road.

[1] http://www.postgresql.org/docs/9.1/static/datatype-numeric.html

--
Justin Clark-Casey, Synbiomine/InterMine Developer
http://synbiomine.org
http://twitter.com/justincc

In reply to the post by Joe Carlson of 17/11/15 18:42

Re: 64 bit object id's

Justin Clark-Casey
Julie pointed me to [1], where this has been looked at before and turned out to be more work than I optimistically wrote in my previous message :).  Probably best to continue
discussion there.

[1] https://github.com/intermine/intermine/issues/533

--
Justin Clark-Casey, Synbiomine/InterMine Developer
http://synbiomine.org
http://twitter.com/justincc

In reply to the post by Justin Clark-Casey of 18/11/15 14:22

Re: 64 bit object id's

joe carlson
In reply to this post by Justin Clark-Casey
Hi Justin,

Thanks for the quick reply.

I was thinking about this a bit myself after sending the email.

I may be incorrect, but aren’t the object id’s all Integer class variables and not primitives? If the performance hit is not too bad, then making a wrapper ObjectId class to capture the type should do it. There conceivably could be some sort of compile-time configuration of what we want or an ant task to generate ObjectId.java based on a setting. At least people would have the choice as to whether they would take the performance/memory hit.

Here we really want to have a C-style typedef and #ifdef. Can this be handled by annotations now? I thought the javanistas were hesitant about putting this in the language since we all know that writing a wrapper class has no performance penalty.

I may try an experiment with a wrapper class. I’ll let you know.
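A minimal sketch of what such a wrapper might look like, assuming a 64-bit backing field. Only the name ObjectId comes from the message above; every method here is a hypothetical illustration and none of it is existing InterMine API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ID wrapper: the width of the ID is decided in this one class. */
final class ObjectId implements Comparable<ObjectId> {
    private final long value; // a 32-bit variant would change this field and its accessors

    private ObjectId(long value) {
        this.value = value;
    }

    static ObjectId of(long value) {
        return new ObjectId(value);
    }

    long asLong() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ObjectId && ((ObjectId) o).value == value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }

    @Override
    public int compareTo(ObjectId other) {
        return Long.compare(value, other.value);
    }

    @Override
    public String toString() {
        return Long.toString(value);
    }
}

final class ObjectIdDemo {
    public static void main(String[] args) {
        // The wrapper works as a map key because equals and hashCode are value-based.
        Map<ObjectId, String> names = new HashMap<>();
        names.put(ObjectId.of(5_000_000_000L), "example"); // beyond the int range
        System.out.println(names.get(ObjectId.of(5_000_000_000L)));
    }
}
```

Every wrapped ID is an extra heap object compared with a primitive, which is the overhead this approach would have to be measured against.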

Joe


Re: 64 bit object id's

Justin Clark-Casey
The IDs do use the Integer wrapper.  I expect this is because they're used in a lot of generics.
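One reason the use of boxed IDs in generics matters for any conversion: an Integer and a Long holding the same number are never equal, and Map.get takes an Object, so a half-converted call site compiles but quietly finds nothing. The sketch below shows this with a plain HashMap; the class and method names are made up for illustration and are not InterMine code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: mixing Integer and Long keys fails silently at runtime.
final class BoxedIdLookupDemo {
    /** Looks up a key of any boxed type in a map keyed by Integer. */
    static String lookup(Map<Integer, String> byId, Object key) {
        return byId.get(key); // Map.get accepts Object, so a Long key still compiles
    }

    public static void main(String[] args) {
        Map<Integer, String> byId = new HashMap<>();
        byId.put(42, "gene");

        System.out.println(lookup(byId, Integer.valueOf(42))); // prints "gene"
        System.out.println(lookup(byId, Long.valueOf(42L)));   // prints "null"
    }
}
```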

I doubt that a further ObjectId wrapper would work out.  Apart from the extra overhead, I think you would still come to the point where you actually had to get
it out as either an Integer/int or a Long/long.  As you say, there's no #ifdef approach in Java that would allow this, so I think you'd have to do something
super-messy.  You would also have to switch between calls such as java.sql.ResultSet.getInt()/getLong().  All this adds up to more complexity and higher codebase
maintenance costs.

Personally, I would favour an approach of simply converting Integer ID references to Longs.  I would be optimistic that the memory and performance tradeoffs are
okay, but one really does have to just try this and see.  One has to overcome the issues that Richard hit in [1] but hopefully these aren't insurmountable.

But I'd be very interested to hear about any experiments you do with a wrapper class.  I may well take a look at the Long conversion approach myself.  If I do,
I'll let you know on [1].

[1] https://github.com/intermine/intermine/issues/533

--
Justin Clark-Casey, Synbiomine/InterMine Developer
http://synbiomine.org
http://twitter.com/justincc

In reply to the post by Joe Carlson of 18/11/15 18:14