Post-processors should delete items they create before re-creating them


Post-processors should delete items they create before re-creating them

Sam Hokin
Something I run into a lot: I'd like to re-run a post-processor (because I'm still coding it, or because I've added new data), but many of the stock feature-creation post-processors simply append new records on every run. So I've slowly been hacking post-processors to delete the items they create before adding them again, using a code chunk like the one below (verbose because the bulk delete function is broken, a submitted issue that Justin has looked at a bit).

I think we should have a _requirement_ in some sort of post-processor _spec/guide_ that says: "Before creating anything, delete all existing items of the types this post-processor creates." IMO a post-processor should be able to run multiple times without creating duplicates. A fix to the bulk delete method would be helpful too, of course. Here's what I do:

         // delete existing GeneFlankingRegion objects by first loading them into a collection...
         LOG.info("Grabbing existing gene flanking regions...");
         Set<GeneFlankingRegion> gfrSet = new HashSet<GeneFlankingRegion>();
         Query qGFR = new Query();
         QueryClass qcGFR = new QueryClass(GeneFlankingRegion.class);
         qGFR.addToSelect(qcGFR);
         qGFR.addFrom(qcGFR);
         Results gfrResults = osw.getObjectStore().execute(qGFR);
         Iterator<?> gfrIter = gfrResults.iterator();
         while (gfrIter.hasNext()) {
             ResultsRow<?> rr = (ResultsRow<?>) gfrIter.next();
             gfrSet.add((GeneFlankingRegion)rr.get(0));
         }
         // ...and then deleting them
         LOG.info("Deleting existing GeneFlankingRegion records...");
         osw.beginTransaction();
         for (GeneFlankingRegion gfr : gfrSet) {
             osw.delete(gfr);
         }
         osw.commitTransaction();
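
For reference, the one-liner that this chunk works around would look something like the sketch below. This is illustrative only: it assumes the two-argument ObjectStoreWriter.delete(QueryClass, constraint) accepts null to mean "every stored object of this class", and that is exactly the bulk delete reported broken in this thread.

         // Hypothetical equivalent of the loop above, if bulk delete worked:
         // pass the QueryClass and a null constraint to remove every stored
         // GeneFlankingRegion in one call.
         osw.delete(new QueryClass(GeneFlankingRegion.class), null);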
_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

Re: Post-processors should delete items they create before re-creating them

Justin Clark-Casey
On 20/02/18 20:54, Sam Hokin wrote:
> Something I run into a lot: I'd like to re-run a post-processor (because I'm still coding it, or because I've added new data), but many of the stock feature-creation post-processors simply append new records on every run. So I've slowly been hacking post-processors to delete the items they create before adding them again, using a code chunk like the one below (verbose because the bulk delete function is broken, a submitted issue that Justin has looked at a bit).

Sam, could you remind me of the issue number? I recently looked at bulk delete during the Gradle transition, because some unit tests ended up running in a different order and were making assumptions about the initial state of the database, when they should instead wipe it before they start. Any fix would only be available from 2.0 onwards, though.
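
As a rough illustration of the "wipe before they start" idea, the same query-then-delete pattern from Sam's snippet could sit in a test fixture. This is a hypothetical sketch, not actual InterMine test code; the "osw.unittest" alias, the choice of class and the method name are assumptions.

     // Hypothetical JUnit setup: clear out the class a test will create,
     // reusing the collect-then-delete workaround from Sam's snippet, so the
     // test no longer depends on what earlier tests left in the database.
     @Before
     public void wipeGeneFlankingRegions() throws Exception {
         ObjectStoreWriter osw = ObjectStoreWriterFactory.getObjectStoreWriter("osw.unittest");
         Query q = new Query();
         QueryClass qc = new QueryClass(GeneFlankingRegion.class);
         q.addToSelect(qc);
         q.addFrom(qc);
         // collect first, then delete, mirroring the workaround above
         Set<InterMineObject> toDelete = new HashSet<InterMineObject>();
         Iterator<?> iter = osw.getObjectStore().execute(q).iterator();
         while (iter.hasNext()) {
             toDelete.add((InterMineObject) ((ResultsRow<?>) iter.next()).get(0));
         }
         osw.beginTransaction();
         for (InterMineObject obj : toDelete) {
             osw.delete(obj);
         }
         osw.commitTransaction();
         osw.close();
     }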

>
> I think we should have a _requirement_ in some sort of post-processor _spec/guide_ that says: "Before creating anything, delete all existing items of the types this post-processor creates." IMO a post-processor should be able to run multiple times without creating duplicates. A fix to the bulk delete method would be helpful too, of course.

I haven't thought about the issue in huge detail, but I'm inclined to agree with you. I would also be inclined to say that post-processors should not fail if they encounter an already-added item, rather than deleting beforehand, in case some other post-processor is operating on the same dataset/classes. However, either requirement would be hard to enforce.
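
A minimal sketch of that alternative, reusing the query classes from Sam's snippet: a coarse guard at the top of a post-processor that skips re-creation when objects of the class already exist. Whether a per-class or per-item check is appropriate is left open; this is illustrative only.

         // Hypothetical guard: if any GeneFlankingRegion objects are already
         // stored, assume an earlier run created them and skip re-creation
         // rather than failing or appending duplicates.
         Query qCheck = new Query();
         QueryClass qcCheck = new QueryClass(GeneFlankingRegion.class);
         qCheck.addToSelect(qcCheck);
         qCheck.addFrom(qcCheck);
         if (osw.getObjectStore().execute(qCheck).iterator().hasNext()) {
             LOG.info("GeneFlankingRegion objects already present; skipping re-creation.");
             return;
         }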


Re: Post-processors should delete items they create before re-creating them

Sam Hokin
Sorry, Justin, apparently I never submitted it as an issue; we only discussed it here on the list. The subject line was:

[InterMine Dev] Why can't I use ObjectStoreWriter.delete(QueryClass qc, null) ?

Happy to turn it into an issue, or you can. No biggie before v2, just reporting stuff, and the workaround works.

On 02/21/2018 06:22 AM, Justin Clark-Casey wrote:
> Sam, could you remind me of the issue number? I recently looked at bulk delete during the Gradle transition, because some unit tests ended up running in a different order and were making assumptions about the initial state of the database, when they should instead wipe it before they start. Any fix would only be available from 2.0 onwards, though.