InterMine 1.0

Julie Sullivan
Hello InterMiners,

I'm happy to announce the release of InterMine 1.0.  We've written some new data
parsers, fixed some bugs and added lots of new functionality.

  * Query results - you can now sort and filter query results
  * Widgets - you can now add widgets without writing any Java code

See here for the full list:  http://intermine.org/wiki/ReleaseNotes
How to upgrade:  http://intermine.org/wiki/UpgradeInterMine

Please note that we're now using Git for our source control.  See:
http://intermine.org/wiki/Git

Cheers
Julie

_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Uniprot parser and secondary identifiers

James Blackshaw
Hi,

We've noticed a problem with the UniProt parser when it runs across
entries that used to be covered by the same accession number. For
example, A8MRP4 and F4IDU9 both used to be known as Q9LN81, but are now
recorded as two separate proteins. As the UniProt entries both contain
Q9LN81 as a secondary accession, when the parser encounters it again it
considers the record to be a duplicate and does not populate any fields.
The sequences for both records are different; it's only the secondary
identifiers that are the same.

It's only a handful of records in our dataset, which is why we didn't
notice it until now. The problem is that if we take out the checking for
duplicates, of course we wind up with a lot of duplicate entries. Is
there a way to make the parser only check against the first UniProt
accession in a file? With any luck, this is the last thing I'll ever
have to do with InterMine 0.97, and I can do the upgrade when I've got
this release out.

-James

_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Uniprot parser and secondary identifiers

Julie Sullivan
Hi James,

Try this:

http://intrac.flymine.org/changeset/28194/trunk/bio/sources/uniprot/main/src/org/intermine/bio/dataconversion/UniprotConverter.java

That change only checks the primary accession for duplicates.  I think that will
get you what you need.
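
To illustrate the difference, here is a minimal sketch of the two duplicate
checks. It is not the code from the changeset or from UniprotConverter; the
Entry class and both method names are hypothetical, and it assumes the first
accession listed for an entry is its primary one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrimaryAccessionDedup {

    /** Hypothetical stand-in for a parsed UniProt entry; accession 0 is assumed to be the primary one. */
    static final class Entry {
        final List<String> accessions;

        Entry(String... accessions) {
            this.accessions = Arrays.asList(accessions);
        }

        String primaryAccession() {
            return accessions.get(0);
        }
    }

    /** Skips an entry if ANY of its accessions was seen before (the behaviour reported in this thread). */
    static List<Entry> dedupOnAnyAccession(List<Entry> entries) {
        Set<String> seen = new HashSet<>();
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            boolean duplicate = false;
            for (String accession : entry.accessions) {
                if (seen.contains(accession)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(entry);
            }
            seen.addAll(entry.accessions);
        }
        return kept;
    }

    /** Skips an entry only if its PRIMARY accession was seen before. */
    static List<Entry> dedupOnPrimaryAccession(List<Entry> entries) {
        Set<String> seenPrimary = new HashSet<>();
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            // Set.add returns false when the value is already present.
            if (seenPrimary.add(entry.primaryAccession())) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("A8MRP4", "Q9LN81"),
                new Entry("F4IDU9", "Q9LN81"),   // distinct protein, shared secondary accession
                new Entry("A8MRP4", "Q9LN81"));  // genuine repeat of the first entry
        System.out.println(dedupOnAnyAccession(entries).size());     // F4IDU9 is dropped
        System.out.println(dedupOnPrimaryAccession(entries).size()); // F4IDU9 is kept
    }
}
```

With the primary-only check, F4IDU9 is kept even though it shares Q9LN81
with A8MRP4, while a true repeat of A8MRP4 is still skipped.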

On 02/08/12 11:59, James Blackshaw wrote:

> Hi,
>
> We've noticed a problem with the UniProt parser when it runs across entries that
> used to be covered by the same accession number. For example, A8MRP4 and F4IDU9
> both used to be known as Q9LN81, but are now recorded as two separate proteins.
> As the UniProt entries both contain Q9LN81 as a secondary accession, when the
> parser encounters it again it considers the record to be a duplicate and does
> not populate any fields. The sequences for both records are different; it's only
> the secondary identifiers that are the same.
>
> It's only a handful of records in our dataset, which is why we didn't notice it
> until now. The problem is that if we take out the checking for duplicates, of
> course we wind up with a lot of duplicate entries. Is there a way to make the
> parser only check against the first UniProt accession in a file? With any luck,
> this is the last thing I'll ever have to do with InterMine 0.97, and I can do
> the upgrade when I've got this release out.

I hope so!  I think you will really like 1.0.
