Handling of gene-designation in UniProt config

Hello IM Devs,

ThaleMine is currently merging in code changes from the upstream InterMine 1.6.6 release. In doing so, we have encountered an issue with the UniProt data-loading step.

In ThaleMine, we use the following UniProt configuration to establish the relationship between "Protein" and "Gene" entities:
3702.uniqueField = primaryIdentifier
3702.primaryIdentifier.gene-designation = EnsemblPlants
3702.gene-designation = gene ID

After a full load of ThaleMine with the new code, we notice that among the proteins from the Swiss-Prot data set, one subset of "Proteins" correctly references "Genes", while another subset is missing that reference.

In our testing, we made sure to use the same version of the UniProt data that was used to build ThaleMine on the InterMine 1.6.5 release (the version running on our current production instance).

Have any other mines using the UniProt converter run into this problem?

Thank you.


dev mailing list