bulk upload with nested data

bulk upload with nested data

sanjuro
Hi all,

I'm trying to upload a set of field collections that contain population information for each sample, with multiple samples per population. If printed to a single file, the data would have this pattern, where the numbers are the population fields and the letters are the sample fields:
1 2 3 a b c
1 2 3 d e f
4 5 6 j k l
4 5 6 m n o

The population fields are going into nd_experiment, nd_experimentprop, nd_geolocation, and nd_geolocationprop.  The sample fields are going into organism, stock, stockprop, and nd_experiment_stockprop, with nd_experiment_stock to join everything.

I've tried uploading a file like the one above using records with "insert or select if duplicate" as the record action, but I still get multiple records in the database for the population data. I've also tried using two files that follow the pattern below, where uniqX is a unique identifier for the experiment that is stored in the nd_experimentprop table.
1 2 3 uniq1
4 5 6 uniq2

a b c uniq1
d e f uniq1
j k l uniq2
m n o uniq2
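
As a rough sketch of how the two files above could be derived from the combined layout, the Python below groups rows by their population fields and generates a shared key for each group. The function name split_combined and the generated "uniqN" keys are illustrative assumptions, not part of the Tripal bulk loader.

```python
def split_combined(rows, n_population_fields):
    """Split combined rows into population rows and sample rows.

    Each distinct set of population fields gets one generated key, which is
    appended to the population row and to every sample row that belongs to it.
    """
    population_ids = {}   # population field tuple -> generated key
    population_rows = []
    sample_rows = []
    for row in rows:
        population = tuple(row[:n_population_fields])
        sample = list(row[n_population_fields:])
        if population not in population_ids:
            # First time this population is seen: assign the next key.
            population_ids[population] = f"uniq{len(population_ids) + 1}"
            population_rows.append(list(population) + [population_ids[population]])
        sample_rows.append(sample + [population_ids[population]])
    return population_rows, sample_rows


combined = [
    ["1", "2", "3", "a", "b", "c"],
    ["1", "2", "3", "d", "e", "f"],
    ["4", "5", "6", "j", "k", "l"],
    ["4", "5", "6", "m", "n", "o"],
]
population_rows, sample_rows = split_combined(combined, 3)
```

Here population_rows holds one row per population and sample_rows holds one row per sample, each ending in the key of its population.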

The population file loads fine, but when I try to use the uniqX value in the sample file to select the nd_experiment_id, I get the following error:
"WD tripal_core: tripal_core_chado_select: There is no value for nd_experiment_id thus we cannot check if this record is unique    [error]"

I'm assuming this is because the nd_experimentprop table has three keys and I'm trying to use just one to select a record. 
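
The guess above can be illustrated with a small sketch. It uses an in-memory SQLite table as a simplified stand-in for nd_experimentprop, assuming the unique key is (nd_experiment_id, type_id, rank); the column list, the type_id value of 1, and the helper experiment_id_for are assumptions for illustration, and the thread does not confirm that this is the actual cause of the error. The point is only that a property value is not part of that key, so finding an experiment from a uniqX value means filtering on type_id and value and checking that exactly one row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for the Chado table; the real schema has more detail.
    CREATE TABLE nd_experimentprop (
        nd_experimentprop_id INTEGER PRIMARY KEY,
        nd_experiment_id INTEGER NOT NULL,
        type_id INTEGER NOT NULL,
        value TEXT,
        rank INTEGER NOT NULL DEFAULT 0,
        UNIQUE (nd_experiment_id, type_id, rank)
    );
    INSERT INTO nd_experimentprop (nd_experiment_id, type_id, value) VALUES
        (10, 1, 'uniq1'),
        (11, 1, 'uniq2');
    """
)


def experiment_id_for(conn, type_id, value):
    """Return the nd_experiment_id whose property of this type has this value."""
    rows = conn.execute(
        "SELECT nd_experiment_id FROM nd_experimentprop "
        "WHERE type_id = ? AND value = ?",
        (type_id, value),
    ).fetchall()
    if len(rows) != 1:
        # The value is not covered by the unique key, so uniqueness is checked here.
        raise LookupError(f"expected one match for {value!r}, found {len(rows)}")
    return rows[0][0]


found = experiment_id_for(conn, 1, "uniq2")
```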

Can anyone suggest how I can adjust my templates to work the way I want them to? Or adjust the tables I'm using, or the way I'm using them? Or anything else I might do?

Thanks!

Sanjuro

------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today.
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-tripal mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal

Re: bulk upload with nested data

Stephen Ficklin-2
Hi Sanjuro,

My apologies that you have not received a quicker response. Have you been able to work this out? If not, would you be able to export your loader and send it along with a snippet of your input file so I can take a look? I'm not quite sure I understand the problem, and seeing the data and loader might help.

Thanks,
Stephen

On 1/15/2014 7:51 PM, Sanjuro Jogdeo wrote:

Re: bulk upload with nested data

sanjuro
No worries, I figured everyone was busy at the conference. 

It turns out that when I look at the database, everything does seem to be uploaded, in spite of the error message, when I split the files into population and sample.

I've attached a zipped folder containing sample data files and exported bulk upload templates.  Unfortunately, the sample data for the attempt to load as a single file is different than the sample data for the two-file upload, but I think you will get the picture. 

While testing this again, I ran into an issue with missing field data.  You will notice that in the sample_data.tab file, some of the values in the note field are missing.  When I try to upload this file, I get an error about the PREPARE statement (image attached separately).  This does stop the upload of the data.  When I fill those fields with dummy values, the file is loaded successfully.  I tried using an optional insert type but that didn't make any difference.
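
A minimal sketch of how rows with empty fields could be found before an upload is shown below. It assumes a tab-delimited file; the helper name rows_with_missing_values and the three-column example data are invented for illustration and are not the contents of sample_data.tab.

```python
import csv
import io


def rows_with_missing_values(handle, delimiter="\t"):
    """Return (line_number, [empty column numbers]) for each row with empty fields.

    Line and column numbers are 1-based. A field counts as empty when it
    contains nothing but whitespace.
    """
    problems = []
    reader = csv.reader(handle, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        empty_columns = [
            column for column, value in enumerate(row, start=1) if not value.strip()
        ]
        if empty_columns:
            problems.append((line_number, empty_columns))
    return problems


# Hypothetical data: the note column (third field) is empty on the second line.
example = io.StringIO("s1\tred\tfirst note\ns2\tblue\t\ns3\tgreen\tthird note\n")
missing = rows_with_missing_values(example)
```

For a real file, the same function could be given an open file handle instead of the in-memory example.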

Thanks for looking at this!

Sanjuro


On Wed, Jan 22, 2014 at 5:36 AM, Stephen Ficklin <[hidden email]> wrote:

Attachments:
sql prepare error.png (198K)
bulkuploadfiles.zip (10K)

Re: bulk upload with nested data

Stephen Ficklin-2
Hi Sanjuro,

Thanks for your patience on this one. It turns out that there's a bit of a bug in the bulk loader when it comes to empty values in NOT NULL fields. It has to do with the way prepared statements are implemented for Drupal 6. I believe the problem will be corrected in Tripal 2.0 for Drupal 7, and we will be releasing an alpha version soon. Until then, there are two ways to handle the problem:

1)  Add a value for the missing data, such as 'N/A' or a space. If you use a space, it will still add an entry to nd_experiment_stockprop, but it probably won't look right on the website because it won't have a visible value, so 'N/A' may be a better choice.
2)  Move your properties into a new file just for properties, so that missing values are not included, and create a template just for loading properties.

I realize both choices have some unwanted consequences. The second will give you a cleaner dataset but will require a bit more work.
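
The two workarounds above could be applied to the input file ahead of time, roughly as sketched below. The helpers fill_missing and split_properties and the two-column example rows are hypothetical; the sketch only prepares data files and does not touch the bulk loader templates, which would still need to be set up to match.

```python
def fill_missing(rows, placeholder="N/A"):
    """Option 1: replace empty fields with a visible placeholder value."""
    return [
        [value if value.strip() else placeholder for value in row]
        for row in rows
    ]


def split_properties(rows, key_index, property_index):
    """Option 2: move one property column into its own set of rows.

    Returns (core_rows, property_rows). core_rows keeps every input row minus
    the property column. property_rows holds [key, property] pairs only for
    rows where the property actually has a value.
    """
    core_rows = []
    property_rows = []
    for row in rows:
        core_rows.append(
            [value for index, value in enumerate(row) if index != property_index]
        )
        if row[property_index].strip():
            property_rows.append([row[key_index], row[property_index]])
    return core_rows, property_rows


# Hypothetical sample rows: a sample name followed by an optional note.
rows = [["s1", "first note"], ["s2", ""], ["s3", "third note"]]
filled = fill_missing(rows)
core_rows, property_rows = split_properties(rows, key_index=0, property_index=1)
```

With the first option every row still produces a property record, while the second option yields a separate properties file that simply has no line for samples without a note.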

Stephen

On 1/22/2014 5:46 PM, Sanjuro Jogdeo wrote: