When the build system is storing a record into the database, it first checks to
see if that record is *already* in the database. It uses the keys to determine
this. The only difference between those keys is which field is used for the
comparison. For example:
1. run uniprot source
   . loads proteins
   . Uniprot accession = Protein.primaryAccession (eg. Q96EK7)
   . Uniprot identifier = Protein.primaryIdentifier (eg. F120B_HUMAN)
2. run new-source
   . key is primary accession
   . for new protein:
      a. check database for new protein's primary accession
      b. if value found
         - build system considers new protein to already be in the database
         - merge new protein data with protein record in database
         - ERROR if conflict, resolve in genomic_priorities.properties
         - ERROR if duplicate keys, eg. two proteins have same accession
      c. if primary accession is not found in database, new record added
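If it helps, the steps above can be sketched in a few lines of Python. This is
only an illustration of the logic, not the build system's actual code: the
store() function, the IntegrationError class, the list-of-dicts "database" and
the priorities dict (standing in for genomic_priorities.properties) are all
made up for this example.

```python
class IntegrationError(Exception):
    """Hypothetical error for key conflicts and duplicate keys."""


def store(database, record, key_field, source, priorities=None):
    """Add `record` to `database`, merging on `key_field` if it already exists.

    `database` is a plain list of {"data": ..., "sources": ...} rows and
    `priorities` maps a field name to source names, highest priority first.
    Both are stand-ins invented for this sketch.
    """
    priorities = priorities or {}
    key_value = record[key_field]

    # a. check the database for the new record's key value
    matches = [row for row in database if row["data"].get(key_field) == key_value]

    # ERROR if duplicate keys, e.g. two proteins have the same accession
    if len(matches) > 1:
        raise IntegrationError(f"duplicate key {key_field}={key_value}")

    # c. key not found: a new record is added
    if not matches:
        database.append({
            "data": dict(record),
            "sources": {field: source for field in record},
        })
        return "added"

    # b. key found: merge the new data into the existing record
    existing = matches[0]
    for field, value in record.items():
        current = existing["data"].get(field)
        if current is None:
            existing["data"][field] = value
            existing["sources"][field] = source
        elif current != value:
            # conflict: only a priority entry for this field can resolve it
            order = priorities.get(field, [])
            holder = existing["sources"][field]
            if source not in order or holder not in order:
                raise IntegrationError(f"conflict on {field} for {key_value}")
            if order.index(source) < order.index(holder):
                existing["data"][field] = value
                existing["sources"][field] = source
    return "merged"
```

Calling store() first for the uniprot protein (Q96EK7 / F120B_HUMAN) and then
for a new-source protein with the same primaryAccession returns "added" and
then "merged", leaving a single record in the list.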
The tutorial has a section on the integration system:
> Can you please explain the difference among primary identifier, secondary
> identifier, and primary accession in the keys file?
> Will the primary accession value merge the data with the already existing class?
> Thanks in advance.