entity references to reduce redundancy in things like project.xml? Generated project.xml?


Todd Harris-2
Hi devs -

Our project.xml file is growing frighteningly large.  We've got 16 species now. I want to scale this to 100s if not 1000s of species.

Going forward, I'm concerned about the possibility of typos in duplicated entries in project.xml.  For example, typos could easily and silently associate data with the wrong organism.

Entities are an easy answer, defining oft-repeated semi-static data in one place.  Better to have a typo in an entity that blocks build than a silent typo in a taxon ID.  They work well; any reason NOT to use them in project.xml?

   <!ENTITY celegans_taxon_id "6239">

   <!-- and reference -->
   &celegans_taxon_id;
   <property name="fasta.taxonId" value="&celegans_taxon_id;"/>
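For a fuller picture, here is a minimal sketch of how the entity could sit in an internal DTD subset, and of how a misspelled reference fails at parse time. It assumes a trimmed-down, hypothetical project.xml (the source name "celegans-fasta" is made up; only the taxon ID and the fasta.taxonId property come from the snippet above) and uses Python's standard XML parser as a stand-in for whatever parser the build uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down project.xml. The entity is declared once in the
# internal DTD subset and referenced wherever the taxon ID is needed.
PROJECT_XML = """<?xml version="1.0"?>
<!DOCTYPE project [
  <!ENTITY celegans_taxon_id "6239">
]>
<project>
  <sources>
    <source name="celegans-fasta" type="fasta">
      <property name="fasta.taxonId" value="&celegans_taxon_id;"/>
    </source>
  </sources>
</project>
"""


def taxon_ids(xml_text):
    """Parse the document and return every fasta.taxonId value after entity expansion."""
    root = ET.fromstring(xml_text)
    return [
        prop.get("value")
        for prop in root.iter("property")
        if prop.get("name") == "fasta.taxonId"
    ]


print(taxon_ids(PROJECT_XML))

# A misspelled entity name is a parse error rather than a silently wrong value.
broken = PROJECT_XML.replace("&celegans_taxon_id;", "&celegans_taxon_idd;")
try:
    taxon_ids(broken)
except ET.ParseError as err:
    print("parse failed:", err)
```

Whether the real build behaves the same way depends on its XML parser honoring the DOCTYPE, so this is worth confirming against the actual build before relying on it.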

This also raises the possibility of programmatically generating large portions of the project.xml.  Has anybody traversed that path?
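As a rough sketch of what generation might look like: keep one small species table and emit the repetitive source entries from it. Everything here is an assumption for illustration (the table contents, the "<name>-fasta" naming scheme, and the build_sources helper); a real project.xml entry carries more properties than the single one shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical species table: (short name, NCBI taxon ID). In practice this
# might be read from a TSV file or a database instead of being hard-coded.
SPECIES = [
    ("celegans", "6239"),
    ("cbriggsae", "6238"),
]


def build_sources(species):
    """Return a <sources> element with one FASTA <source> per species row."""
    sources = ET.Element("sources")
    for short_name, taxon_id in species:
        # Fail loudly on a malformed ID instead of writing it into the file.
        if not taxon_id.isdigit():
            raise ValueError(f"taxon ID for {short_name} is not numeric: {taxon_id!r}")
        source = ET.SubElement(sources, "source", name=f"{short_name}-fasta", type="fasta")
        ET.SubElement(source, "property", name="fasta.taxonId", value=taxon_id)
    return sources


sources = build_sources(SPECIES)
if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
    ET.indent(sources)
print(ET.tostring(sources, encoding="unicode"))
```

Each taxon ID then lives in exactly one row of the table, which addresses the same duplication concern as the entities do.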

Todd


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
Re: entity references to reduce redundancy in things like project.xml? Generated project.xml?

JD Wong
Not for project.xml, but I use XSLT, which is very convenient and fast.

-JD

On Thu, Jan 19, 2012 at 3:01 PM, Todd Harris <[hidden email]> wrote:
[...]

Re: entity references to reduce redundancy in things like project.xml? Generated project.xml?

Todd Harris-2
Excellent suggestion. Thanks, JD.  I think I might explore that (x)path.

Todd

On Jan 19, 2012, at 1:19 PM, JD Wong wrote:

[...]

Re: entity references to reduce redundancy in things like project.xml? Generated project.xml?

Julie Sullivan
In reply to this post by Todd Harris-2
Do your fasta files have the organism in the header?

If so, you could extend the FASTA loader and override the getOrganism() method. You can then parse the header and get the organism information from the file instead of from the project XML attributes.

Then you would only need one project XML entry for each feature type.
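To illustrate only the header-parsing step: the actual override would be Java inside the loader, so the sketch below is a language-neutral outline written in Python, not the loader's API. It assumes UniProt-style headers, which carry an OX= field holding the NCBI taxon ID; the function names and the sample header are made up, and other header formats would need a different pattern.

```python
import re

# Assumption: UniProt-style headers with an OX=<NCBI taxon ID> field.
OX_PATTERN = re.compile(r"\bOX=(\d+)\b")


def taxon_id_from_header(header):
    """Return the NCBI taxon ID found in a FASTA header line, or None if absent."""
    match = OX_PATTERN.search(header)
    return match.group(1) if match else None


def taxon_ids_in_fasta(lines):
    """Yield (header, taxon_id) for each record header in an iterable of FASTA lines."""
    for line in lines:
        if line.startswith(">"):
            yield line.rstrip("\n"), taxon_id_from_header(line)


# Made-up record for illustration; the accession and protein name are not real.
sample = [
    ">sp|X00000|EXAMPLE_CAEEL Example protein OS=Caenorhabditis elegans OX=6239 PE=1 SV=1\n",
    "MKTAYIAKQR\n",
]
for header, taxon_id in taxon_ids_in_fasta(sample):
    print(taxon_id, header)
```

Records whose header lacks the field come back as None, so the loader can reject them instead of guessing an organism.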

I'd be happy to help you with this.

On 19/01/12 20:01, Todd Harris wrote:

> [...]