Duplicate records error, integration based solution?

JD Wong
Hi dev,

When I add an items file to my mine I get this error:
java.lang.RuntimeException: Exception while dataloading - to allow multiple errors, set the property "dataLoader.allowMultipleErrors" to true
Problem while loading item identifier 164_198 because
Duplicate objects from the same data source; o1 = "Gene:89000137" (in database), o2 = "Gene:89000139" (in database), source1 = "<Source: name="fb-transcripts", type="null", skeleton=false>", source2 = "<Source: name="fb-transcripts", type="null", skeleton=false>"

Triggered by this record (which is being loaded):

<item id="164_198" class="Gene">
   <attribute name="primaryIdentifier" value="FBgn0024983"/>
....


... because these had been loaded already by another source (trimmed for simplicity):

<item id="163_69" class="Transcript">
   <attribute name="primaryIdentifier" value="FBtr0070089"/>
   <reference name="gene" ref_id="164_69"/>
</item>
<item id="164_69" class="Gene">
   <attribute name="primaryIdentifier" value="FBgn0024983"/>
</item>
<item id="163_70" class="Transcript">
   <attribute name="primaryIdentifier" value="FBtr0070090"/>
   <reference name="gene" ref_id="164_70"/>
</item>
<item id="164_70" class="Gene">
   <attribute name="primaryIdentifier" value="FBgn0024983"/>
</item>


Is there any way to have InterMine automatically combine items like 164_69 and 164_70 into a single item referenced by both "Transcript" records shown?  A post-processing command, perhaps?

Cheers,
-JD


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Re: Duplicate records error, integration based solution?

Julie Sullivan
No, currently there is no mechanism in InterMine to handle duplicates in an
items XML file.  Any `processing` of the items has to be handled in the code
that accesses the Items API and generates the XML.
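
Since the loader will not merge them, one option is to check an items file for repeated identifiers before loading it. Below is a minimal sketch of such a check, not an InterMine tool: the function name find_duplicate_items is hypothetical, the choice of primaryIdentifier as the key is an assumption (the real keys depend on the source's configuration), and the sample wraps the items from the original post in an assumed <items> root element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_items(xml_text, key_attribute="primaryIdentifier"):
    """Return {(class, key value): [item ids]} for keys shared by several items.

    Hypothetical helper: treats `key_attribute` as the only key, which is an
    assumption about how the source is configured.
    """
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    for item in root.iter("item"):
        for attr in item.findall("attribute"):
            if attr.get("name") == key_attribute:
                key = (item.get("class"), attr.get("value"))
                seen[key].append(item.get("id"))
    # Keep only the keys that more than one item claims.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Sample modelled on the items quoted in the original post.
SAMPLE = """<items>
<item id="163_69" class="Transcript">
   <attribute name="primaryIdentifier" value="FBtr0070089"/>
   <reference name="gene" ref_id="164_69"/>
</item>
<item id="164_69" class="Gene">
   <attribute name="primaryIdentifier" value="FBgn0024983"/>
</item>
<item id="163_70" class="Transcript">
   <attribute name="primaryIdentifier" value="FBtr0070090"/>
   <reference name="gene" ref_id="164_70"/>
</item>
<item id="164_70" class="Gene">
   <attribute name="primaryIdentifier" value="FBgn0024983"/>
</item>
</items>"""

duplicates = find_duplicate_items(SAMPLE)
for (cls, value), ids in duplicates.items():
    print(f"{cls} {value} appears as items {', '.join(ids)}")
```

A check like this only reports the problem; the fix still belongs in the code that writes the file.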

On 07/11/11 21:36, JD Wong wrote:

> Is there any way to have InterMine automatically combine items like 164_69
> and 164_70 into a single item referenced by both "Transcript" records
> shown?  A post-processing command, perhaps?

Re: Duplicate records error, integration based solution?

Richard Smith
It looks like your code creates a new gene item for every transcript, so a
gene with two transcripts ends up as two items.  Instead, it needs to keep a
map from FlyBase identifier to item identifier and reuse the same item each time:

        FBgn0024983 => 164_69

Cheers,
Richard.
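
The map suggested above could look roughly like the following. This is a sketch only, written in plain Python rather than against the real Items API: the ItemWriter class, its method names, and the "0_N" item id scheme are all hypothetical stand-ins for whatever code generates the items XML.

```python
import xml.etree.ElementTree as ET


class ItemWriter:
    """Hypothetical stand-in for the code that builds an items XML file."""

    def __init__(self):
        self.root = ET.Element("items")
        self.counter = 0
        # Map from FlyBase gene identifier to the item id already issued,
        # e.g. FBgn0024983 => 0_2
        self.gene_ids = {}

    def _new_item(self, class_name, primary_identifier):
        # Item ids here are made up for the sketch ("0_1", "0_2", ...).
        self.counter += 1
        item = ET.SubElement(
            self.root, "item", {"id": f"0_{self.counter}", "class": class_name}
        )
        ET.SubElement(
            item, "attribute",
            {"name": "primaryIdentifier", "value": primary_identifier},
        )
        return item

    def gene_item_id(self, fbgn):
        # Create the gene item only the first time this identifier is seen.
        if fbgn not in self.gene_ids:
            self.gene_ids[fbgn] = self._new_item("Gene", fbgn).get("id")
        return self.gene_ids[fbgn]

    def add_transcript(self, fbtr, fbgn):
        transcript = self._new_item("Transcript", fbtr)
        ET.SubElement(
            transcript, "reference",
            {"name": "gene", "ref_id": self.gene_item_id(fbgn)},
        )
        return transcript


writer = ItemWriter()
writer.add_transcript("FBtr0070089", "FBgn0024983")
writer.add_transcript("FBtr0070090", "FBgn0024983")

genes = [i for i in writer.root.iter("item") if i.get("class") == "Gene"]
refs = {r.get("ref_id") for r in writer.root.iter("reference")}
print(ET.tostring(writer.root, encoding="unicode"))
```

Both transcripts end up referencing the one gene item, so the file contains a single Gene with that primaryIdentifier.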





On 08/11/2011 09:42, Julie Sullivan wrote:

> No, currently there is no mechanism in InterMine to handle duplicates in
> an items XML file. Any `processing` of the items has to be handled in
> the code that accesses the Items API and generates the XML.