I have an obscure question about a minor point in the code.
I'm all about trying to speed up loading. My mine is getting even larger and I'm working on trying to trim time whenever I can. In this latest build I'm trying to avoid using the tracking table whenever I can; this saves some space in addition to time.
What I found is that if I am not tracking an object class and try to do an integration of two data sources that store objects of the same class, I run into a build exception. It is triggered from lines 571 to 574 in org.intermine.dataloader.IntegrationWriterDataTrackingImpl:
Since the object has no information in the tracking table, lastSource is indeed null. In the specific case I see, I've entered the attributes for the proteins in a previous loading step. In a later step I need to make a reference to the protein. I create the new object and specify the primary key. When it's time to merge, the integration loader properly determines the correct object id and merges the newly created object with the existing one. But not having a source for the previous object triggers this exception.
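To make the scenario concrete, here is a toy model of it. This is not the InterMine code: `ToyTracker` and its methods are hypothetical names made up for illustration. It only shows why a lookup for an object whose class was never tracked comes back as null.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical class, not part of InterMine.
final class ToyTracker {
    // Maps "objectId:fieldName" to the name of the source that last wrote the field.
    private final Map<String, String> lastSourceByField = new HashMap<>();

    // Remember which source last wrote a given field of a given object.
    void record(int objectId, String field, String source) {
        lastSourceByField.put(objectId + ":" + field, source);
    }

    // Returns null when nothing was ever recorded for this object and field,
    // which is the situation for objects of a class that is not tracked.
    String lastSource(int objectId, String field) {
        return lastSourceByField.get(objectId + ":" + field);
    }
}
```

In this model, a protein stored in the earlier step without tracking has no entry, so `lastSource` returns null when the later step merges into it.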
I've looked at the history on GitHub, and Richard Smith made a comment that this check was added to prevent the tracking table from entering info if there are nulls in the columns, so that we're not keeping a history of data loaders that do not affect attributes. But that really doesn't seem to be the case here.
I've commented out this check in my own code and successfully did a big integration step. And not dealing with the tracking table certainly helped. But I'm worried that I may be missing something.
Does anyone have any experience with things of this nature?
I've not tried to see what happens if my new object has conflicting information in fields other than the primary key. Or if the field in the new object is a collection. I'm just looking at the case of a simple attribute field.
dev mailing list
I don't have much experience with things of this nature, but have you thought about adding a check for whether the data tracker is enabled, instead of commenting out the code?
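A minimal sketch of such a check, under stated assumptions: `TrackingGuard`, `isTracked`, and `missingSourceIsError` are hypothetical names, and the set of untracked class names is assumed to be available from the build configuration. How InterMine actually exposes that information would need to be checked in the source.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of InterMine: decides whether a missing
// tracker entry should be treated as an error for a given class.
final class TrackingGuard {
    // Names of classes for which tracking has been switched off (assumed input).
    private final Set<String> untrackedClasses;

    TrackingGuard(Set<String> untrackedClasses) {
        this.untrackedClasses = new HashSet<>(untrackedClasses);
    }

    boolean isTracked(String className) {
        return !untrackedClasses.contains(className);
    }

    // A null source is only an error when the class is supposed to be tracked.
    boolean missingSourceIsError(String className, String lastSource) {
        return lastSource == null && isTracked(className);
    }
}
```

With a guard like this, the existing exception would still fire for tracked classes, where a null source really is unexpected, and would be skipped only for classes that were deliberately left out of the tracking table.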
On 28/03/2019 15:42, Joe Carlson wrote: