Not releasing connections during data loading.

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Not releasing connections during data loading.

joe carlson
Hi folks,

I've finally worked out all my issues with adapting my data loaders to
the gradle build system, but now I see a new issue which I'm pretty sure
is gradle specific. I'm not releasing database connections after an
individual data source is loaded. Eventually, I run into a situation in
which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver
42.2.2 for phytomine at
jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30:
org.postgresql.util.PSQLException: FATAL: remaining connection slots are
reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection
properties on the production and items databases.

A simple way to reproduce the error is to monitor the database
connections with this xml:

   <source name="uniprot-malaria-1" type="uniprot">
       <property name="uniprot.organisms" value="1"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-2" type="uniprot">
       <property name="uniprot.organisms" value="2"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-3" type="uniprot">
       <property name="uniprot.organisms" value="3"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-4" type="uniprot">
       <property name="uniprot.organisms" value="4"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-5" type="uniprot">
       <property name="uniprot.organisms" value="5"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-6" type="uniprot">
       <property name="uniprot.organisms" value="6"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-7" type="uniprot">
       <property name="uniprot.organisms" value="7"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-8" type="uniprot">
       <property name="uniprot.organisms" value="8"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>
     <source name="uniprot-malaria-9" type="uniprot">
       <property name="uniprot.organisms" value="9"/>
       <property name="creatego" value="true"/>
       <property name="src.data.dir"
location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
     </source>

The taxon_id is bogus, so each step will not find a file and no data is
loaded. I monitor the db connections from another terminal window with
"select application_name,datname,count(*) from pg_stat_activity group by
1,2 order by 1,2". Here is what I see:

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |     4

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    10

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    20
  PostgreSQL JDBC Driver | phytomine-13.1  |    20

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    16
  PostgreSQL JDBC Driver | phytomine-13.1  |    20

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    30
  PostgreSQL JDBC Driver | phytomine-13.1  |    21

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    36
  PostgreSQL JDBC Driver | phytomine-13.1  |    40

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    35
  PostgreSQL JDBC Driver | phytomine-13.1  |    40

     application_name    |     datname     | count
------------------------+-----------------+-------
  PostgreSQL JDBC Driver | items-phytomine |    46
  PostgreSQL JDBC Driver | phytomine-13.1  |    51

     application_name    |    datname     | count
------------------------+----------------+-------
  PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections
and each load into the production database opens ~ 20. None of these get
closed. I have max_connections set to 100 in my postgres server and this
is when it fails. Obviously, I could tweak this parameter and delay the
error. Or, I could split the integration problem into individual
sources., But that will only hide the problem for now. Has anyone else
seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an
issue in postprocessing in the ant build process if there were too many
postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

Daniela Butano-2

Hi Joe,

do you have you gradle daemon enabled?

Try with:

export GRADLE_OPTS="-Dorg.gradle.daemon=false"
let me know if it does not work.
Daniela

On 17/12/2018 22:45, Joe Carlson wrote:
Hi folks,

I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually, I run into a situation in which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.

A simple way to reproduce the error is to monitor the database connections with this xml:

  <source name="uniprot-malaria-1" type="uniprot">
      <property name="uniprot.organisms" value="1"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-2" type="uniprot">
      <property name="uniprot.organisms" value="2"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-3" type="uniprot">
      <property name="uniprot.organisms" value="3"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-4" type="uniprot">
      <property name="uniprot.organisms" value="4"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-5" type="uniprot">
      <property name="uniprot.organisms" value="5"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-6" type="uniprot">
      <property name="uniprot.organisms" value="6"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-7" type="uniprot">
      <property name="uniprot.organisms" value="7"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-8" type="uniprot">
      <property name="uniprot.organisms" value="8"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-9" type="uniprot">
      <property name="uniprot.organisms" value="9"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>

The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |     4

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    10

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    20
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    16
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    30
 PostgreSQL JDBC Driver | phytomine-13.1  |    21

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    36
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    35
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    46
 PostgreSQL JDBC Driver | phytomine-13.1  |    51

    application_name    |    datname     | count
------------------------+----------------+-------
 PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20. None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only hide the problem for now. Has anyone else seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there were too many postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

Sam Hokin-3
This is generally true of both loaders (when you bulk load with ./gradlew integrate) and postprocessors (when you bulk post-process
with ./gradlew postprocess). There's a ticket about it methinks. I may have chimed in on this before. It has nothing to do with the
org.gradle.daemon=false setting, I've always had that.

[shokin@shokin-mines ~]$ set | grep gradle
GRADLE_OPTS='-server -Xmx8g -XX:+UseParallelGC -Xms2g -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99 -Dorg.gradle.daemon=false'

On 12/18/18 2:37 AM, Daniela Butano wrote:

> Hi Joe,
>
> do you have you gradle daemon enabled?
>
> Try with:
>
> export GRADLE_OPTS="-Dorg.gradle.daemon=false"
>
> let me know if it does not work.
> Daniela
>
> On 17/12/2018 22:45, Joe Carlson wrote:
>> Hi folks,
>>
>> I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which
>> I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually,
>> I run into a situation in which there are no available slots:
>>
>> Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at
>> jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining
>> connection slots are reserved for non-replication superuser connections
>>
>> I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.
>>
>> A simple way to reproduce the error is to monitor the database connections with this xml:
>>
>>   <source name="uniprot-malaria-1" type="uniprot">
>>       <property name="uniprot.organisms" value="1"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-2" type="uniprot">
>>       <property name="uniprot.organisms" value="2"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-3" type="uniprot">
>>       <property name="uniprot.organisms" value="3"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-4" type="uniprot">
>>       <property name="uniprot.organisms" value="4"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-5" type="uniprot">
>>       <property name="uniprot.organisms" value="5"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-6" type="uniprot">
>>       <property name="uniprot.organisms" value="6"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-7" type="uniprot">
>>       <property name="uniprot.organisms" value="7"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-8" type="uniprot">
>>       <property name="uniprot.organisms" value="8"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>     <source name="uniprot-malaria-9" type="uniprot">
>>       <property name="uniprot.organisms" value="9"/>
>>       <property name="creatego" value="true"/>
>>       <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>     </source>
>>
>> The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal
>> window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |     4
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    10
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    20
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    16
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    30
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    21
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    36
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    35
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>
>>     application_name    |     datname     | count
>> ------------------------+-----------------+-------
>>  PostgreSQL JDBC Driver | items-phytomine |    46
>>  PostgreSQL JDBC Driver | phytomine-13.1  |    51
>>
>>     application_name    |    datname     | count
>> ------------------------+----------------+-------
>>  PostgreSQL JDBC Driver | phytomine-13.1 |    13
>>
>> It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20.
>> None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could
>> tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only
>> hide the problem for now. Has anyone else seen this problem and have a fix?
>>
>> Thanks,
>>
>> Joe
>>
>> (This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there
>> were too many postprocessing steps at once, as I recall).
>>
>> _______________________________________________
>> dev mailing list
>> [hidden email]
>> https://lists.intermine.org/mailman/listinfo/dev
>
> _______________________________________________
> dev mailing list
> [hidden email]
> https://lists.intermine.org/mailman/listinfo/dev
>
_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

joe carlson
In reply to this post by Daniela Butano-2

Ciao Daniela,

Thanks for the reply. I'd been running with the daemon off. I just tried it with the daemon, and with using db superuser credentials. I still got the error.

Do you get the error with my example project? I have parameters

db.production.datasource.maxConnections=20
db.common-tgt-items.datasource.maxConnections=5

and max_connections=100 for the server.

Thanks,

Joe

On 12/18/18 1:37 AM, Daniela Butano wrote:

Hi Joe,

do you have you gradle daemon enabled?

Try with:

export GRADLE_OPTS="-Dorg.gradle.daemon=false"
let me know if it does not work.
Daniela

On 17/12/2018 22:45, Joe Carlson wrote:
Hi folks,

I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually, I run into a situation in which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.

A simple way to reproduce the error is to monitor the database connections with this xml:

  <source name="uniprot-malaria-1" type="uniprot">
      <property name="uniprot.organisms" value="1"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-2" type="uniprot">
      <property name="uniprot.organisms" value="2"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-3" type="uniprot">
      <property name="uniprot.organisms" value="3"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-4" type="uniprot">
      <property name="uniprot.organisms" value="4"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-5" type="uniprot">
      <property name="uniprot.organisms" value="5"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-6" type="uniprot">
      <property name="uniprot.organisms" value="6"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-7" type="uniprot">
      <property name="uniprot.organisms" value="7"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-8" type="uniprot">
      <property name="uniprot.organisms" value="8"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-9" type="uniprot">
      <property name="uniprot.organisms" value="9"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>

The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |     4

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    10

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    20
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    16
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    30
 PostgreSQL JDBC Driver | phytomine-13.1  |    21

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    36
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    35
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    46
 PostgreSQL JDBC Driver | phytomine-13.1  |    51

    application_name    |    datname     | count
------------------------+----------------+-------
 PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20. None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only hide the problem for now. Has anyone else seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there were too many postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

Daniela Butano-2

Hi Joe,

I haven't tried yet. I will have a look tomorrow

On 18/12/2018 16:32, Joe Carlson wrote:

Ciao Daniela,

Thanks for the reply. I'd been running with the daemon off. I just tried it with the daemon, and with using db superuser credentials. I still got the error.

Do you get the error with my example project? I have parameters

db.production.datasource.maxConnections=20
db.common-tgt-items.datasource.maxConnections=5

and max_connections=100 for the server.

Thanks,

Joe

On 12/18/18 1:37 AM, Daniela Butano wrote:

Hi Joe,

do you have you gradle daemon enabled?

Try with:

export GRADLE_OPTS="-Dorg.gradle.daemon=false"
let me know if it does not work.
Daniela

On 17/12/2018 22:45, Joe Carlson wrote:
Hi folks,

I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually, I run into a situation in which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.

A simple way to reproduce the error is to monitor the database connections with this xml:

  <source name="uniprot-malaria-1" type="uniprot">
      <property name="uniprot.organisms" value="1"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-2" type="uniprot">
      <property name="uniprot.organisms" value="2"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-3" type="uniprot">
      <property name="uniprot.organisms" value="3"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-4" type="uniprot">
      <property name="uniprot.organisms" value="4"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-5" type="uniprot">
      <property name="uniprot.organisms" value="5"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-6" type="uniprot">
      <property name="uniprot.organisms" value="6"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-7" type="uniprot">
      <property name="uniprot.organisms" value="7"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-8" type="uniprot">
      <property name="uniprot.organisms" value="8"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-9" type="uniprot">
      <property name="uniprot.organisms" value="9"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>

The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |     4

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    10

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    20
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    16
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    30
 PostgreSQL JDBC Driver | phytomine-13.1  |    21

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    36
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    35
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    46
 PostgreSQL JDBC Driver | phytomine-13.1  |    51

    application_name    |    datname     | count
------------------------+----------------+-------
 PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20. None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only hide the problem for now. Has anyone else seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there were too many postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

joe carlson
In reply to this post by Sam Hokin-3
Hi Sam and Daniela,

Here is (at least) a partial solution. One of the problems is that
org.intermine.dataloader.ObjectStoreDataLoaderTask does not close the
hikari connection properly. It looks like the ProxyConnection is closed,
but that's not enough. You need to close the HikariDataSource.

Some other tasks (StoreMetadataTask and BuildDbTask) call
org.intermine.sql.Database.shutdown(). This code has some logic in it
that checks if it's a Hikari connection and calls close the
HikariDataSource correctly.

Here is my little patch that (almost) enables my example uniprot loading
to work:

 > git diff
integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
diff --git
a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
index f093f05635..10b7fe6f69 100644
---
a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
+++
b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
@@ -15,6 +15,7 @@ import org.apache.tools.ant.Task;

  import org.intermine.model.FastPathObject;
  import org.intermine.objectstore.ObjectStoreFactory;
+import org.intermine.objectstore.intermine.ObjectStoreInterMineImpl;

  /**
   * Uses an IntegrationWriter to load data from another ObjectStore.
@@ -127,6 +128,7 @@ public class ObjectStoreDataLoaderTask extends Task
                  new
ObjectStoreDataLoader(iw).process(ObjectStoreFactory.getObjectStore(source),
iw.getMainSource(sourceName, sourceType),
iw.getSkeletonSource(sourceName, sourceType));
+
((ObjectStoreInterMineImpl)(iw.getObjectStore())).getDatabase().shutdown();
              }
          } catch (Exception e) {
              throw new BuildException(e);

I say is it "almost" enables my example to load since there are still
connections to the items database that hang around. It's only 5
connections per source, so it takes a bit longer, but it still runs into
my limit eventually. I'm searching for where these need to be cleaned up.

Sam, there is probably a similar issue with the postprocessors.

Is there any danger that the class cast I'm doing here will throw an
exception?

Joe

On 12/18/18 7:45 AM, Sam Hokin wrote:

> This is generally true of both loaders (when you bulk load with
> ./gradlew integrate) and postprocessors (when you bulk post-process
> with ./gradlew postprocess). There's a ticket about it methinks. I may
> have chimed in on this before. It has nothing to do with the
> org.gradle.daemon=false setting, I've always had that.
>
> [shokin@shokin-mines ~]$ set | grep gradle
> GRADLE_OPTS='-server -Xmx8g -XX:+UseParallelGC -Xms2g
> -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99
> -Dorg.gradle.daemon=false'
>
> On 12/18/18 2:37 AM, Daniela Butano wrote:
>> Hi Joe,
>>
>> do you have you gradle daemon enabled?
>>
>> Try with:
>>
>> export GRADLE_OPTS="-Dorg.gradle.daemon=false"
>>
>> let me know if it does not work.
>> Daniela
>>
>> On 17/12/2018 22:45, Joe Carlson wrote:
>>> Hi folks,
>>>
>>> I've finally worked out all my issues with adapting my data loaders
>>> to the gradle build system, but now I see a new issue which I'm
>>> pretty sure is gradle specific. I'm not releasing database
>>> connections after an individual data source is loaded. Eventually, I
>>> run into a situation in which there are no available slots:
>>>
>>> Failed to create a Non-Pooling DataSource from PostgreSQL JDBC
>>> Driver 42.2.2 for phytomine at
>>> jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30:
>>> org.postgresql.util.PSQLException: FATAL: remaining connection slots
>>> are reserved for non-replication superuser connections
>>>
>>> I'm using the hikari connection pool with (I think) default
>>> connection properties on the production and items databases.
>>>
>>> A simple way to reproduce the error is to monitor the database
>>> connections with this xml:
>>>
>>>   <source name="uniprot-malaria-1" type="uniprot">
>>>       <property name="uniprot.organisms" value="1"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-2" type="uniprot">
>>>       <property name="uniprot.organisms" value="2"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-3" type="uniprot">
>>>       <property name="uniprot.organisms" value="3"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-4" type="uniprot">
>>>       <property name="uniprot.organisms" value="4"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-5" type="uniprot">
>>>       <property name="uniprot.organisms" value="5"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-6" type="uniprot">
>>>       <property name="uniprot.organisms" value="6"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-7" type="uniprot">
>>>       <property name="uniprot.organisms" value="7"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-8" type="uniprot">
>>>       <property name="uniprot.organisms" value="8"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>     <source name="uniprot-malaria-9" type="uniprot">
>>>       <property name="uniprot.organisms" value="9"/>
>>>       <property name="creatego" value="true"/>
>>>       <property name="src.data.dir"
>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>     </source>
>>>
>>> The taxon_id is bogus, so each step will not find a file and no data
>>> is loaded. I monitor the db connections from another terminal window
>>> with "select application_name,datname,count(*) from pg_stat_activity
>>> group by 1,2 order by 1,2". Here is what I see:
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |     4
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    10
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    20
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    16
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    30
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    21
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    36
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    35
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>>
>>>     application_name    |     datname     | count
>>> ------------------------+-----------------+-------
>>>  PostgreSQL JDBC Driver | items-phytomine |    46
>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    51
>>>
>>>     application_name    |    datname     | count
>>> ------------------------+----------------+-------
>>>  PostgreSQL JDBC Driver | phytomine-13.1 |    13
>>>
>>> It looks like each connection to the items database open ~ 5
>>> connections and each load into the production database opens ~ 20.
>>> None of these get closed. I have max_connections set to 100 in my
>>> postgres server and this is when it fails. Obviously, I could tweak
>>> this parameter and delay the error. Or, I could split the
>>> integration problem into individual sources., But that will only
>>> hide the problem for now. Has anyone else seen this problem and have
>>> a fix?
>>>
>>> Thanks,
>>>
>>> Joe
>>>
>>> (This was not an issue with ant builds before. But it may have been
>>> an issue in postprocessing in the ant build process if there were
>>> too many postprocessing steps at once, as I recall).
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [hidden email]
>>> https://lists.intermine.org/mailman/listinfo/dev
>>
>> _______________________________________________
>> dev mailing list
>> [hidden email]
>> https://lists.intermine.org/mailman/listinfo/dev
>>
> _______________________________________________
> dev mailing list
> [hidden email]
> https://lists.intermine.org/mailman/listinfo/dev
_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

Daniela Butano-2
Thank you Joe to investigate on the issue and find a solution!

I have run .gradlew integrate on biotestmine and experimented the same
issue.

I have created a ticket
https://github.com/intermine/intermine/issues/1973 linked to a similar
issue documented by Sam (on postprocess task).

I'm testing your fix and then I will create a PR. As you said, this
fixes partially the issue, there are still connections left open to the
items db.

I think we have to check all the converters....

Thank you!!!

Daniela

On 19/12/2018 02:33, Joe Carlson wrote:

> Hi Sam and Daniela,
>
> Here is (at least) a partial solution. One of the problems is that
> org.intermine.dataloader.ObjectStoreDataLoaderTask does not close the
> hikari connection properly. It looks like the ProxyConnection is
> closed, but that's not enough. You need to close the HikariDataSource.
>
> Some other tasks (StoreMetadataTask and BuildDbTask) call
> org.intermine.sql.Database.shutdown(). This code has some logic in it
> that checks if it's a Hikari connection and calls close the
> HikariDataSource correctly.
>
> Here is my little patch that (almost) enables my example uniprot
> loading to work:
>
> > git diff
> integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
> diff --git
> a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
> b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
>
> index f093f05635..10b7fe6f69 100644
> ---
> a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
> +++
> b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
> @@ -15,6 +15,7 @@ import org.apache.tools.ant.Task;
>
>  import org.intermine.model.FastPathObject;
>  import org.intermine.objectstore.ObjectStoreFactory;
> +import org.intermine.objectstore.intermine.ObjectStoreInterMineImpl;
>
>  /**
>   * Uses an IntegrationWriter to load data from another ObjectStore.
> @@ -127,6 +128,7 @@ public class ObjectStoreDataLoaderTask extends Task
>                  new
> ObjectStoreDataLoader(iw).process(ObjectStoreFactory.getObjectStore(source),
> iw.getMainSource(sourceName, sourceType),
> iw.getSkeletonSource(sourceName, sourceType));
> +
> ((ObjectStoreInterMineImpl)(iw.getObjectStore())).getDatabase().shutdown();
>              }
>          } catch (Exception e) {
>              throw new BuildException(e);
>
> I say is it "almost" enables my example to load since there are still
> connections to the items database that hang around. It's only 5
> connections per source, so it takes a bit longer, but it still runs
> into my limit eventually. I'm searching for where these need to be
> cleaned up.
>
> Sam, there is probably a similar issue with the postprocessors.
>
> Is there any danger that the class cast I'm doing here will throw an
> exception?
>
> Joe
>
> On 12/18/18 7:45 AM, Sam Hokin wrote:
>> This is generally true of both loaders (when you bulk load with
>> ./gradlew integrate) and postprocessors (when you bulk post-process
>> with ./gradlew postprocess). There's a ticket about it methinks. I
>> may have chimed in on this before. It has nothing to do with the
>> org.gradle.daemon=false setting, I've always had that.
>>
>> [shokin@shokin-mines ~]$ set | grep gradle
>> GRADLE_OPTS='-server -Xmx8g -XX:+UseParallelGC -Xms2g
>> -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99
>> -Dorg.gradle.daemon=false'
>>
>> On 12/18/18 2:37 AM, Daniela Butano wrote:
>>> Hi Joe,
>>>
>>> do you have you gradle daemon enabled?
>>>
>>> Try with:
>>>
>>> export GRADLE_OPTS="-Dorg.gradle.daemon=false"
>>>
>>> let me know if it does not work.
>>> Daniela
>>>
>>> On 17/12/2018 22:45, Joe Carlson wrote:
>>>> Hi folks,
>>>>
>>>> I've finally worked out all my issues with adapting my data loaders
>>>> to the gradle build system, but now I see a new issue which I'm
>>>> pretty sure is gradle specific. I'm not releasing database
>>>> connections after an individual data source is loaded. Eventually,
>>>> I run into a situation in which there are no available slots:
>>>>
>>>> Failed to create a Non-Pooling DataSource from PostgreSQL JDBC
>>>> Driver 42.2.2 for phytomine at
>>>> jdbc:postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30:
>>>> org.postgresql.util.PSQLException: FATAL: remaining connection
>>>> slots are reserved for non-replication superuser connections
>>>>
>>>> I'm using the hikari connection pool with (I think) default
>>>> connection properties on the production and items databases.
>>>>
>>>> A simple way to reproduce the error is to monitor the database
>>>> connections with this xml:
>>>>
>>>>   <source name="uniprot-malaria-1" type="uniprot">
>>>>       <property name="uniprot.organisms" value="1"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-2" type="uniprot">
>>>>       <property name="uniprot.organisms" value="2"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-3" type="uniprot">
>>>>       <property name="uniprot.organisms" value="3"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-4" type="uniprot">
>>>>       <property name="uniprot.organisms" value="4"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-5" type="uniprot">
>>>>       <property name="uniprot.organisms" value="5"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-6" type="uniprot">
>>>>       <property name="uniprot.organisms" value="6"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-7" type="uniprot">
>>>>       <property name="uniprot.organisms" value="7"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-8" type="uniprot">
>>>>       <property name="uniprot.organisms" value="8"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>     <source name="uniprot-malaria-9" type="uniprot">
>>>>       <property name="uniprot.organisms" value="9"/>
>>>>       <property name="creatego" value="true"/>
>>>>       <property name="src.data.dir"
>>>> location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
>>>>     </source>
>>>>
>>>> The taxon_id is bogus, so each step will not find a file and no
>>>> data is loaded. I monitor the db connections from another terminal
>>>> window with "select application_name,datname,count(*) from
>>>> pg_stat_activity group by 1,2 order by 1,2". Here is what I see:
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |     4
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    10
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    20
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    16
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    20
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    30
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    21
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    36
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    35
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    40
>>>>
>>>>     application_name    |     datname     | count
>>>> ------------------------+-----------------+-------
>>>>  PostgreSQL JDBC Driver | items-phytomine |    46
>>>>  PostgreSQL JDBC Driver | phytomine-13.1  |    51
>>>>
>>>>     application_name    |    datname     | count
>>>> ------------------------+----------------+-------
>>>>  PostgreSQL JDBC Driver | phytomine-13.1 |    13
>>>>
>>>> It looks like each connection to the items database open ~ 5
>>>> connections and each load into the production database opens ~ 20.
>>>> None of these get closed. I have max_connections set to 100 in my
>>>> postgres server and this is when it fails. Obviously, I could tweak
>>>> this parameter and delay the error. Or, I could split the
>>>> integration problem into individual sources., But that will only
>>>> hide the problem for now. Has anyone else seen this problem and
>>>> have a fix?
>>>>
>>>> Thanks,
>>>>
>>>> Joe
>>>>
>>>> (This was not an issue with ant builds before. But it may have been
>>>> an issue in postprocessing in the ant build process if there were
>>>> too many postprocessing steps at once, as I recall).
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [hidden email]
>>>> https://lists.intermine.org/mailman/listinfo/dev
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [hidden email]
>>> https://lists.intermine.org/mailman/listinfo/dev
>>>
>> _______________________________________________
>> dev mailing list
>> [hidden email]
>> https://lists.intermine.org/mailman/listinfo/dev
_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

joe carlson
Here is what I learned this morning:

I wrote a simple routine to connect to postgres and sleep for 5 seconds, then close the Hikari ProxyConnection and sleep another 5 seconds. I set maximumPoolSize=5 and no minimum. In another window I monitored the connections every 2 seconds. (select usename,backend_start,state_change,query_start from pg_stat_activity where datname='phytomine-13.1’ ). This is what shows up:


  usename  |         backend_start         |         state_change          |          query_start          
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:02.619748-08 | 2018-12-19 08:20:02.619694-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:02.619748-08 | 2018-12-19 08:20:02.619694-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

There are 5 connections opened at 8:20:02. The third query shows that one of them changed state at 8:20:07; this was the one that was closed, but then immediately reopened. If I close the HikariDataSource, all 5 connections disappear.

An alternate is to set minimumIdle to 0. This will prevent the connection from reopening after it is closed. But, then again, that kinda defeats the purposes of pooling.

Joe

On Dec 19, 2018, at 6:34 AM, Daniela Butano <[hidden email]> wrote:

Thank you Joe to investigate on the issue and find a solution!

I have run .gradlew integrate on biotestmine and experimented the same issue.

I have created a ticket https://github.com/intermine/intermine/issues/1973 linked to a similar issue documented by Sam (on postprocess task).

I'm testing your fix and then I will create a PR. As you said, this fixes partially the issue, there are still connections left open to the items db.

I think we have to check all the converters....

Thank you!!!

Daniela

On 19/12/2018 02:33, Joe Carlson wrote:
Hi Sam and Daniela,

Here is (at least) a partial solution. One of the problems is that org.intermine.dataloader.ObjectStoreDataLoaderTask does not close the hikari connection properly. It looks like the ProxyConnection is closed, but that's not enough. You need to close the HikariDataSource.

Some other tasks (StoreMetadataTask and BuildDbTask) call org.intermine.sql.Database.shutdown(). This code has some logic in it that checks if it's a Hikari connection and calls close the HikariDataSource correctly.

Here is my little patch that (almost) enables my example uniprot loading to work:

> git diff integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
diff --git a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
index f093f05635..10b7fe6f69 100644
--- a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
+++ b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
@@ -15,6 +15,7 @@ import org.apache.tools.ant.Task;

 import org.intermine.model.FastPathObject;
 import org.intermine.objectstore.ObjectStoreFactory;
+import org.intermine.objectstore.intermine.ObjectStoreInterMineImpl;

 /**
  * Uses an IntegrationWriter to load data from another ObjectStore.
@@ -127,6 +128,7 @@ public class ObjectStoreDataLoaderTask extends Task
                 new ObjectStoreDataLoader(iw).process(ObjectStoreFactory.getObjectStore(source),
iw.getMainSource(sourceName, sourceType),
iw.getSkeletonSource(sourceName, sourceType));
+ ((ObjectStoreInterMineImpl)(iw.getObjectStore())).getDatabase().shutdown();
             }
         } catch (Exception e) {
             throw new BuildException(e);

I say is it "almost" enables my example to load since there are still connections to the items database that hang around. It's only 5 connections per source, so it takes a bit longer, but it still runs into my limit eventually. I'm searching for where these need to be cleaned up.

Sam, there is probably a similar issue with the postprocessors.

Is there any danger that the class cast I'm doing here will throw an exception?

Joe

On 12/18/18 7:45 AM, Sam Hokin wrote:
This is generally true of both loaders (when you bulk load with ./gradlew integrate) and postprocessors (when you bulk post-process with ./gradlew postprocess). There's a ticket about it methinks. I may have chimed in on this before. It has nothing to do with the org.gradle.daemon=false setting, I've always had that.

[shokin@shokin-mines ~]$ set | grep gradle
GRADLE_OPTS='-server -Xmx8g -XX:+UseParallelGC -Xms2g -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99 -Dorg.gradle.daemon=false'

On 12/18/18 2:37 AM, Daniela Butano wrote:
Hi Joe,

do you have you gradle daemon enabled?

Try with:

export GRADLE_OPTS="-Dorg.gradle.daemon=false"

let me know if it does not work.
Daniela

On 17/12/2018 22:45, Joe Carlson wrote:
Hi folks,

I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually, I run into a situation in which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at jdbc:<a href="postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30:" class="">postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.

A simple way to reproduce the error is to monitor the database connections with this xml:

  <source name="uniprot-malaria-1" type="uniprot">
      <property name="uniprot.organisms" value="1"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-2" type="uniprot">
      <property name="uniprot.organisms" value="2"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-3" type="uniprot">
      <property name="uniprot.organisms" value="3"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-4" type="uniprot">
      <property name="uniprot.organisms" value="4"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-5" type="uniprot">
      <property name="uniprot.organisms" value="5"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-6" type="uniprot">
      <property name="uniprot.organisms" value="6"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-7" type="uniprot">
      <property name="uniprot.organisms" value="7"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-8" type="uniprot">
      <property name="uniprot.organisms" value="8"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-9" type="uniprot">
      <property name="uniprot.organisms" value="9"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>

The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |     4

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    10

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    20
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    16
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    30
 PostgreSQL JDBC Driver | phytomine-13.1  |    21

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    36
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    35
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    46
 PostgreSQL JDBC Driver | phytomine-13.1  |    51

    application_name    |    datname     | count
------------------------+----------------+-------
 PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20. None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only hide the problem for now. Has anyone else seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there were too many postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev


_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Not releasing connections during data loading.

Paulo Nuin
Hi Joe

I use this https://github.com/julmon/pg_activity to monitor connections to my Postgres instance. It is pretty good.

Cheers
Paulo

On Dec 19, 2018, at 9:31 AM, Joe Carlson <[hidden email]> wrote:

Here is what I learned this morning:

I wrote a simple routine to connect to postgres and sleep for 5 seconds, then close the Hikari ProxyConnection and sleep another 5 seconds. I set maximumPoolSize=5 and no minimum. In another window I monitored the connections every 2 seconds. (select usename,backend_start,state_change,query_start from pg_stat_activity where datname='phytomine-13.1’ ). This is what shows up:


  usename  |         backend_start         |         state_change          |          query_start          
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:02.619748-08 | 2018-12-19 08:20:02.619694-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:02.619748-08 | 2018-12-19 08:20:02.619694-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

  usename  |         backend_start         |         state_change          |          query_start
-----------+-------------------------------+-------------------------------+-------------------------------
 phytomine | 2018-12-19 08:20:02.410797-08 | 2018-12-19 08:20:02.488477-08 | 2018-12-19 08:20:02.488439-08
 phytomine | 2018-12-19 08:20:02.606717-08 | 2018-12-19 08:20:07.503808-08 | 2018-12-19 08:20:07.503733-08
 phytomine | 2018-12-19 08:20:02.629504-08 | 2018-12-19 08:20:02.642065-08 | 2018-12-19 08:20:02.642011-08
 phytomine | 2018-12-19 08:20:02.651777-08 | 2018-12-19 08:20:02.664404-08 | 2018-12-19 08:20:02.664351-08
 phytomine | 2018-12-19 08:20:02.673785-08 | 2018-12-19 08:20:02.686363-08 | 2018-12-19 08:20:02.686314-08
(5 rows)

There are 5 connections opened at 8:20:02. The third query shows that one of them changed state at 8:20:07; this was the one that was closed, but then immediately reopened. If I close the HikariDataSource, all 5 connections disappear.

An alternate is to set minimumIdle to 0. This will prevent the connection from reopening after it is closed. But, then again, that kinda defeats the purposes of pooling.

Joe

On Dec 19, 2018, at 6:34 AM, Daniela Butano <[hidden email]> wrote:

Thank you Joe to investigate on the issue and find a solution!

I have run .gradlew integrate on biotestmine and experimented the same issue.

I have created a ticket https://github.com/intermine/intermine/issues/1973 linked to a similar issue documented by Sam (on postprocess task).

I'm testing your fix and then I will create a PR. As you said, this fixes partially the issue, there are still connections left open to the items db.

I think we have to check all the converters....

Thank you!!!

Daniela

On 19/12/2018 02:33, Joe Carlson wrote:
Hi Sam and Daniela,

Here is (at least) a partial solution. One of the problems is that org.intermine.dataloader.ObjectStoreDataLoaderTask does not close the hikari connection properly. It looks like the ProxyConnection is closed, but that's not enough. You need to close the HikariDataSource.

Some other tasks (StoreMetadataTask and BuildDbTask) call org.intermine.sql.Database.shutdown(). This code has some logic in it that checks if it's a Hikari connection and calls close the HikariDataSource correctly.

Here is my little patch that (almost) enables my example uniprot loading to work:

> git diff integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
diff --git a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
index f093f05635..10b7fe6f69 100644
--- a/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
+++ b/intermine/integrate/src/main/java/org/intermine/dataloader/ObjectStoreDataLoaderTask.java
@@ -15,6 +15,7 @@ import org.apache.tools.ant.Task;

 import org.intermine.model.FastPathObject;
 import org.intermine.objectstore.ObjectStoreFactory;
+import org.intermine.objectstore.intermine.ObjectStoreInterMineImpl;

 /**
  * Uses an IntegrationWriter to load data from another ObjectStore.
@@ -127,6 +128,7 @@ public class ObjectStoreDataLoaderTask extends Task
                 new ObjectStoreDataLoader(iw).process(ObjectStoreFactory.getObjectStore(source),
iw.getMainSource(sourceName, sourceType),
iw.getSkeletonSource(sourceName, sourceType));
+ ((ObjectStoreInterMineImpl)(iw.getObjectStore())).getDatabase().shutdown();
             }
         } catch (Exception e) {
             throw new BuildException(e);

I say is it "almost" enables my example to load since there are still connections to the items database that hang around. It's only 5 connections per source, so it takes a bit longer, but it still runs into my limit eventually. I'm searching for where these need to be cleaned up.

Sam, there is probably a similar issue with the postprocessors.

Is there any danger that the class cast I'm doing here will throw an exception?

Joe

On 12/18/18 7:45 AM, Sam Hokin wrote:
This is generally true of both loaders (when you bulk load with ./gradlew integrate) and postprocessors (when you bulk post-process with ./gradlew postprocess). There's a ticket about it methinks. I may have chimed in on this before. It has nothing to do with the org.gradle.daemon=false setting, I've always had that.

[shokin@shokin-mines ~]$ set | grep gradle
GRADLE_OPTS='-server -Xmx8g -XX:+UseParallelGC -Xms2g -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99 -Dorg.gradle.daemon=false'

On 12/18/18 2:37 AM, Daniela Butano wrote:
Hi Joe,

do you have you gradle daemon enabled?

Try with:

export GRADLE_OPTS="-Dorg.gradle.daemon=false"

let me know if it does not work.
Daniela

On 17/12/2018 22:45, Joe Carlson wrote:
Hi folks,

I've finally worked out all my issues with adapting my data loaders to the gradle build system, but now I see a new issue which I'm pretty sure is gradle specific. I'm not releasing database connections after an individual data source is loaded. Eventually, I run into a situation in which there are no available slots:

Failed to create a Non-Pooling DataSource from PostgreSQL JDBC Driver 42.2.2 for phytomine at jdbc:<a href="postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30:" class="">postgresql://jgi-plant-db-2.nersc.gov/phytomine-13.1?loginTimeout=30: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using the hikari connection pool with (I think) default connection properties on the production and items databases.

A simple way to reproduce the error is to monitor the database connections with this xml:

  <source name="uniprot-malaria-1" type="uniprot">
      <property name="uniprot.organisms" value="1"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-2" type="uniprot">
      <property name="uniprot.organisms" value="2"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-3" type="uniprot">
      <property name="uniprot.organisms" value="3"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-4" type="uniprot">
      <property name="uniprot.organisms" value="4"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-5" type="uniprot">
      <property name="uniprot.organisms" value="5"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-6" type="uniprot">
      <property name="uniprot.organisms" value="6"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-7" type="uniprot">
      <property name="uniprot.organisms" value="7"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-8" type="uniprot">
      <property name="uniprot.organisms" value="8"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>
    <source name="uniprot-malaria-9" type="uniprot">
      <property name="uniprot.organisms" value="9"/>
      <property name="creatego" value="true"/>
      <property name="src.data.dir" location="/home/jcarlson/biotestmine-sample-data/malaria/uniprot/"/>
    </source>

The taxon_id is bogus, so each step will not find a file and no data is loaded. I monitor the db connections from another terminal window with "select application_name,datname,count(*) from pg_stat_activity group by 1,2 order by 1,2". Here is what I see:

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |     4

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    10

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    20
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    16
 PostgreSQL JDBC Driver | phytomine-13.1  |    20

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    30
 PostgreSQL JDBC Driver | phytomine-13.1  |    21

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    36
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    35
 PostgreSQL JDBC Driver | phytomine-13.1  |    40

    application_name    |     datname     | count
------------------------+-----------------+-------
 PostgreSQL JDBC Driver | items-phytomine |    46
 PostgreSQL JDBC Driver | phytomine-13.1  |    51

    application_name    |    datname     | count
------------------------+----------------+-------
 PostgreSQL JDBC Driver | phytomine-13.1 |    13

It looks like each connection to the items database open ~ 5 connections and each load into the production database opens ~ 20. None of these get closed. I have max_connections set to 100 in my postgres server and this is when it fails. Obviously, I could tweak this parameter and delay the error. Or, I could split the integration problem into individual sources., But that will only hide the problem for now. Has anyone else seen this problem and have a fix?

Thanks,

Joe

(This was not an issue with ant builds before. But it may have been an issue in postprocessing in the ant build process if there were too many postprocessing steps at once, as I recall).

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev


_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev