|
Hello InterMiners,
I'm happy to announce the release of InterMine 1.0. We've written some new data parsers, fixed some bugs and added lots of new functionality.

* Query results - you can now sort and filter query results
* Widgets - you can now add widgets without writing any Java code

See here for the full list: http://intermine.org/wiki/ReleaseNotes

How to upgrade: http://intermine.org/wiki/UpgradeInterMine

Please note that we're now using Git for our source control. See: http://intermine.org/wiki/Git

Cheers
Julie

_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
|
Hi,
We've noticed a problem with the UniProt parser when it comes across entries that used to be covered by the same accession number. For example, A8MRP4 and F4IDU9 both used to be known as Q9LN81, but are now recorded as two separate proteins. As the UniProt entries both contain Q9LN81 as a secondary accession, when the parser encounters it again it considers the record to be a duplicate and does not populate any fields. The sequences for both records are different; it's only the secondary identifiers that are the same.

It's only a handful of records in our dataset, which is why we didn't notice it until now. The problem is that if we take out the checking for duplicates, of course we wind up with a lot of duplicate entries. Is there a way to make the parser only check against the first UniProt accession in a file?

With any luck, this is the last thing I'll ever have to do with InterMine 0.97, and I can do the upgrade when I've got this release out.

-James
|
Hi James,
Try this:

http://intrac.flymine.org/changeset/28194/trunk/bio/sources/uniprot/main/src/org/intermine/bio/dataconversion/UniprotConverter.java

That change only checks the primary accession for duplicates. I think that will get you what you need.

On 02/08/12 11:59, James Blackshaw wrote:
> Hi,
>
> We've noticed a problem with the Uniprot parser when it runs across entries that
> used to be covered by the same accession number. For example A8MRP4 and F4IDU9
> both used to be known as Q9LN81, but are now recorded as two separate proteins.
> As the uniprot entries both contain Q9LN81 as a secondary accession, when the
> parser encounters it again it considers the record to be a duplicate and does
> not populate any fields. The sequences for both records are different, it's only
> the secondary identifiers that are the same
>
> It's only a handful of records in our dataset, which is why we didn't notice it
> until now. The problem is that if we take out the checking for duplicates, of
> course we wind up with a lot of duplicate entries. Is there a way to make the
> parser only check against the first UniProt accession in a file? With any luck,
> this is the last thing I'll ever have to do with Intermine 0.97, and I can do
> the upgrade when I've got this release out.

I hope so! I think you will really like 1.0.
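
For anyone reading the archive later, here is a minimal sketch of the idea discussed in this thread. It is not the code from the changeset above: the class `PrimaryAccessionDedup`, the `Entry` type and both method names are made up for illustration, and it assumes the first accession listed for an entry is its primary one. It contrasts the behaviour James describes (any shared accession marks a record as a duplicate) with a check on the primary accession only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrimaryAccessionDedup {

    /** Hypothetical stand-in for a parsed UniProt entry; the first accession is assumed to be primary. */
    static final class Entry {
        final List<String> accessions;

        Entry(String... accessions) {
            this.accessions = Arrays.asList(accessions);
        }

        String primary() {
            return accessions.get(0);
        }
    }

    /** Behaviour as described in the thread: an entry sharing ANY accession with an earlier one is skipped. */
    static List<String> keepByAnyAccession(List<Entry> entries) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Entry e : entries) {
            boolean duplicate = false;
            for (String acc : e.accessions) {
                if (seen.contains(acc)) {
                    duplicate = true;
                }
            }
            seen.addAll(e.accessions);
            if (!duplicate) {
                kept.add(e.primary());
            }
        }
        return kept;
    }

    /** Sketch of the suggested approach: only the primary accession is compared. */
    static List<String> keepByPrimaryAccession(List<Entry> entries) {
        Set<String> seenPrimary = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Entry e : entries) {
            // Set.add returns false when the primary accession was already seen.
            if (seenPrimary.add(e.primary())) {
                kept.add(e.primary());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("A8MRP4", "Q9LN81"),
                new Entry("F4IDU9", "Q9LN81"),   // shares only the secondary accession
                new Entry("A8MRP4", "Q9LN81"));  // a true repeat of the first entry
        System.out.println(keepByAnyAccession(entries));     // [A8MRP4]
        System.out.println(keepByPrimaryAccession(entries)); // [A8MRP4, F4IDU9]
    }
}
```

With the primary-only check, F4IDU9 is kept even though it shares Q9LN81 with A8MRP4, while a genuine repeat of A8MRP4 is still dropped.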
