Setting a two part key

JD Wong
Hi all,

I'm having errors loading GFF3 files because two sources refer to chromosomes with identical names. It looks as if InterMine is using these names as its primary key and conflating the two chromosomes.

The error text:
Caused by: java.lang.IllegalArgumentException: Conflicting values for field Chromosome.organism between a_suum-gff3-cds (value "Organism [commonName="null", genus="null", id="1000000", name="null", shortName="null", species="null", taxonId="6253"]" in database with ID 1087636) and c_sp11-gff3-gene (value "Organism [commonName="null", genus="null", id="2000000", name="null", shortName="null", species="null", taxonId="-11"]" being stored). This field needs configuring in the genomic_priorities.properties file

My question is: how do I get InterMine to use a two-part key?


As per https://intermine.readthedocs.org/en/latest/database/database-building/primary-keys/ I've tried setting this in sourcename/resources/sourcename_keys.properties:
Chromosome.key = primaryIdentifier, organism

and, for good measure, the same in dbmodel/resources/genomic_keyDefs.properties:
Chromosome.key = primaryIdentifier, organism

Setting both still causes this conflict. Has anyone else run into this?

Thanks!
-JD
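Each key line above has the shape Class.keyName = field1, field2: the part before the dot names the class, the part after it names the key, and the comma-separated list on the right gives the fields that together should identify an object. As a rough illustration only, here is a small Java sketch that splits such a line into those three parts. KeyDef and parse are hypothetical names; this is not InterMine's own parser.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of InterMine): splits one key-definition line,
 * e.g. "Chromosome.key_primaryidentifier_organism = primaryIdentifier, organism",
 * into its class name, key name and list of fields.
 */
public final class KeyDef {
    final String className;
    final String keyName;
    final List<String> fields;

    KeyDef(String className, String keyName, List<String> fields) {
        this.className = className;
        this.keyName = keyName;
        this.fields = fields;
    }

    static KeyDef parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("no '=' in: " + line);
        }
        String lhs = line.substring(0, eq).trim();  // e.g. "Chromosome.key_primaryidentifier_organism"
        String rhs = line.substring(eq + 1).trim(); // e.g. "primaryIdentifier, organism"
        int dot = lhs.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no '.' in: " + lhs);
        }
        // Fields are separated by commas, with optional whitespace around them.
        List<String> fields = Arrays.asList(rhs.split("\\s*,\\s*"));
        return new KeyDef(lhs.substring(0, dot), lhs.substring(dot + 1), fields);
    }

    public static void main(String[] args) {
        KeyDef k = parse("Chromosome.key_primaryidentifier_organism = primaryIdentifier, organism");
        // prints "Chromosome / key_primaryidentifier_organism -> [primaryIdentifier, organism]"
        System.out.println(k.className + " / " + k.keyName + " -> " + k.fields);
    }
}
```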

_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Re: Setting a two part key

Jayaraman, Pushkala

Have you tried giving the key a unique name, such as

Chromosome.key_primaryIdentifier_organism

instead of just Chromosome.key?

That's what I have in my mine:

Chromosome.key_primaryidentifier_organism=primaryIdentifier, organism

DataSet.key = name
DataSource.key = name
SOTerm.key = name, ontology
Organism.key = taxonId
Ontology.key = name
Publication.key = pubMedId
Gene.key_primaryidentifier=primaryIdentifier
Gene.key_symbol_org_secondaryidentifier=symbol, organism, secondaryIdentifier
#Gene.key_secondaryidentifier=secondaryIdentifier
OMIM.key_primaryidentifier=primaryIdentifier
Protein.key_primaryaccession=primaryAccession
QTL.key_primaryidentifier=primaryIdentifier
SimpleSequenceLengthVariation.key_primaryidentifier=primaryIdentifier
mRNA.key_primaryidentifier=primaryIdentifier
Exon.key_primaryidentifier=primaryIdentifier
FivePrimeUTR.key_primaryidentifier=primaryIdentifier
ThreePrimeUTR.key_primaryidentifier=primaryIdentifier
Congenic.key_primaryidentifier=primaryIdentifier
Chromosome.key_primaryidentifier_organism=primaryIdentifier, organism
Chromosome.key_primaryidentifier=primaryIdentifier
OntologyTerm.key_name_ontology=name, ontology
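The file above declares two keys for Chromosome: key_primaryidentifier_organism (primaryIdentifier plus organism) and key_primaryidentifier (primaryIdentifier alone). The Java sketch below uses a deliberately simplified model, which is an assumption and not InterMine's documented merge logic: two objects count as the same if any one of the class's keys has equal values for all of its fields. Under that model, the two-field key keeps same-named chromosomes from different organisms apart, but adding the single-field key merges them again. The identifier "chrI" is a made-up example; the taxon IDs 6253 and -11 come from the error message in the original post.

```java
import java.util.List;
import java.util.Map;

/**
 * Simplified model (an assumption, not InterMine's integration code):
 * two objects match if ANY of the given keys has equal values for ALL of its fields.
 */
public final class KeyMatch {

    static boolean sameByKey(Map<String, String> a, Map<String, String> b, List<String> keyFields) {
        for (String field : keyFields) {
            String va = a.get(field);
            if (va == null || !va.equals(b.get(field))) {
                return false; // in this model, a missing or differing field means the key does not match
            }
        }
        return true;
    }

    static boolean sameByAnyKey(Map<String, String> a, Map<String, String> b, List<List<String>> keys) {
        for (List<String> key : keys) {
            if (sameByKey(a, b, key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical chromosomes with the same identifier from two organisms.
        Map<String, String> aSuum = Map.of("primaryIdentifier", "chrI", "organism", "6253");
        Map<String, String> cSp11 = Map.of("primaryIdentifier", "chrI", "organism", "-11");

        List<String> idAndOrganism = List.of("primaryIdentifier", "organism");
        List<String> idOnly = List.of("primaryIdentifier");

        System.out.println(sameByAnyKey(aSuum, cSp11, List.of(idAndOrganism)));         // false
        System.out.println(sameByAnyKey(aSuum, cSp11, List.of(idAndOrganism, idOnly))); // true
    }
}
```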

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of JD Wong
Sent: Tuesday, September 24, 2013 6:25 PM
To: Intermine Developer List
Subject: [InterMine Dev] Setting a two part key

Re: Setting a two part key

JD Wong
In reply to this post by JD Wong
Thank you, Pushkala and Steven,

I tried both of your suggestions, but unfortunately neither of them worked. I've included my genomic_keyDefs.properties and source keys file for good measure:

genomic_keyDefs.properties:

# BioEntity is abstract
Chromosome.key_primaryidentifier = primaryIdentifier, organism
DataSet.key_title = name
DataSource.key_name = name
Exon.key_primaryidentifier = primaryIdentifier
Gene.key_primaryidentifier=primaryIdentifier
Gene.key_secondaryidentifier=secondaryIdentifier, organism
Gene.key_symbol_org=symbol, organism
Location.key_all = locatedOn, feature, end, start, strand
Ontology.key_title=name
OntologyRelation.key=parentTerm, childTerm, relationship
OntologyTerm.key_identifier=identifier
OntologyTerm.key_name_ontology=name, ontology
OntologyTermSynonym.key=name, type
Organism.key_taxonid=taxonId
ProteinDomain.key_identifier = primaryIdentifier
Protein.key_md5checksum_taxonid=md5checksum, organism
Protein.key_primaryacc=primaryAccession
Protein.key_primaryidentifier=primaryIdentifier
Protein.key_secondaryidentifier=secondaryIdentifier
Publication.key_paper = title, pages, issue, journal, year
Publication.key_pubmed = pubMedId
SequenceFeature.key_primaryidentifier=primaryIdentifier, organism
Sequence.key_md5checksum=md5checksum
Synonym.key_synonym = subject, value
Transcript.key_primaryidentifier = primaryIdentifier

wormbase-gff3-core_keys.properties:

DataSet.key = name
DataSource.key = name
SOTerm.key = name, ontology
Organism.key = taxonId
Ontology.key = name
Publication.key = pubMedId
BioEntity.key = primaryIdentifier
Gene.key = primaryIdentifier
Transcript.key = primaryIdentifier
Chromosome.key_primaryidentifier_organism = primaryIdentifier, organism
SequenceFeature.key_primaryidentifier=primaryIdentifier, organism

The error:

Caused by: java.lang.IllegalArgumentException: Conflicting values for field Chromosome.organism between a_suum-gff3-cds (value "Organism [commonName="null", genus="null", id="1000000", name="null", shortName="null", species="null", taxonId="6253"]" in database with ID 1087636) and c_sp11-gff3-gene (value "Organism [commonName="null", genus="null", id="2000000", name="null", shortName="null", species="null", taxonId="-11"]" being stored). This field needs configuring in the genomic_priorities.properties file


By all accounts this should work. According to the config, the mine should use organism and primaryIdentifier as the key, which should avoid collisions between species...
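One way to double-check a configuration like this is to list every key line that could apply to Chromosome, including keys declared on its superclasses. The Java sketch below does this for a subset of the lines in wormbase-gff3-core_keys.properties, under two assumptions that should be checked against your own model and InterMine version: that keys declared on a superclass also apply to its subclasses, and that the class hierarchy runs Chromosome, then SequenceFeature, then BioEntity. KeysForClass and keysFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Diagnostic sketch. Assumes (1) keys declared on a superclass also apply to
 * its subclasses and (2) the hierarchy Chromosome -> SequenceFeature -> BioEntity.
 */
public final class KeysForClass {

    static List<String> keysFor(List<String> classChain, List<String> keyLines) {
        List<String> hits = new ArrayList<>();
        for (String line : keyLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int dot = trimmed.indexOf('.');
            if (dot < 0) {
                continue; // not a Class.keyName line
            }
            String cls = trimmed.substring(0, dot);
            if (classChain.contains(cls)) {
                hits.add(trimmed); // key is declared on the class itself or on an assumed superclass
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> chain = List.of("Chromosome", "SequenceFeature", "BioEntity");
        // A subset of the lines from wormbase-gff3-core_keys.properties in this thread.
        List<String> sourceKeys = List.of(
                "DataSet.key = name",
                "BioEntity.key = primaryIdentifier",
                "Gene.key = primaryIdentifier",
                "Chromosome.key_primaryidentifier_organism = primaryIdentifier, organism",
                "SequenceFeature.key_primaryidentifier=primaryIdentifier, organism");
        keysFor(chain, sourceKeys).forEach(System.out::println);
    }
}
```

If both assumptions hold, the output includes the single-field line BioEntity.key = primaryIdentifier alongside the two-field Chromosome and SequenceFeature keys.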




On Wed, Sep 25, 2013 at 9:12 AM, Steven Neuhauser <[hidden email]> wrote:

Hi JD,

I had exactly the same problem loading a GFF file.

I think I resolved the same issue by adding

SequenceFeature.key = primaryIdentifier, organism

to the genomic_keyDefs.properties file.

-Steve

The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.




Re: Setting a two part key

Jayaraman, Pushkala

And why don't you want to add priorities to the genomic_priorities.properties file?

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of JD Wong
Sent: Wednesday, September 25, 2013 10:45 AM
To: Steven Neuhauser; Intermine Developer List
Subject: Re: [InterMine Dev] Setting a two part key

Re: Setting a two part key

JD Wong
That would bypass the error, but it would leave inaccurate data in the mine, with one species annotating another species' chromosome. If InterMine were using the two-part key instead of just the chromosome name, there wouldn't be a collision.


On Wed, Sep 25, 2013 at 7:31 PM, Jayaraman, Pushkala <[hidden email]> wrote:

And why do you not want to add priorities in the genomic_priorities.properties file?

 


Re: Setting a two part key

Julie Sullivan
In reply to this post by JD Wong
Hi JD

Not sure what's wrong; that looks like it should work! We have this issue
in our mines too, and this key works correctly:

https://github.com/intermine/intermine/blob/dev/bio/sources/ensembl/ensembl-core/resources/ensembl-core_keys.properties#L7

And Pushkala is right: the key name needs to be unique, i.e. you can't
have different definitions for the same key name. I assume you dropped the
database to get rid of the old index?

Can you try running just these two sources?

Cheers
Julie
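The point about unique key names is easy to trip over in a .properties file. If a keys file is read with java.util.Properties (an assumption about how InterMine loads it), a repeated property name does not raise an error: the later value silently replaces the earlier one. The Java sketch below is a hypothetical helper that reports repeated names and then shows that overwrite behavior. It only recognizes '=' as the separator, a simplification of the full .properties syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical checker: reports property names that are defined more than once. */
public final class DuplicateKeyCheck {

    /** Returns repeated names, one entry per extra occurrence, in file order. */
    static List<String> duplicateNames(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#") || t.startsWith("!")) {
                continue; // blank lines and comments
            }
            int eq = t.indexOf('=');
            String name = (eq < 0 ? t : t.substring(0, eq)).trim();
            if (!seen.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = List.of(
                "Chromosome.key = primaryIdentifier, organism",
                "Chromosome.key = primaryIdentifier");

        System.out.println(duplicateNames(lines)); // [Chromosome.key]

        // java.util.Properties keeps only the last value for a repeated name.
        Properties props = new Properties();
        props.load(new StringReader(String.join("\n", lines)));
        System.out.println(props.getProperty("Chromosome.key")); // primaryIdentifier
    }
}
```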


Re: Setting a two part key

Joel Richardson-2

Hi JD,

Just curious: what do the entries in your project.xml look like for these
two sources?

Joel
--
Joel E. Richardson, Ph.D.
Sr. Research Scientist
Mouse Genome Informatics
The Jackson Laboratory
600 Main Street
Bar Harbor, Maine 04609
207-288-6435
[hidden email]





On 9/26/13 4:41 AM, "[hidden email]" <[hidden email]> wrote:
