Debugging Entrez species step

Paulo Nuin
Hi everyone

We are having issues (again) with the Entrez step that loads species features into the organism DB. It’s our last step of the build. Earlier, when we had the same problem, we fixed it by adding to our XML the species/taxon IDs that were loaded in the DB but not listed in it. This time, it seems that all species are listed and loaded from our XML, but the Entrez step is still failing after the build is complete.

Running the step by itself doesn’t report any problems; it only fails at the end of the build. Is there a straightforward way to debug the Entrez step?

Thanks

Paulo

_______________________________________________
dev mailing list
[hidden email]
https://lists.intermine.org/mailman/listinfo/dev

Re: Debugging Entrez species step

sergio contrino-2
hi paul, could you please send the logs of when it failed?
the entrez step involves calls to ncbi, so there could be external
factors as well.
thanks
sergio

On 18/12/2018 22:47, Paulo Nuin wrote:

> Hi everyone
>
> We are having issues (again) with the Entrez step that loads species features into the organism DB. It’s our last step of the build. Earlier, when we had the same problem, we fixed it by adding to our XML the species/taxon IDs that were loaded in the DB but not listed in it. This time, it seems that all species are listed and loaded from our XML, but the Entrez step is still failing after the build is complete.
>
> Running the step by itself doesn’t report any problems; it only fails at the end of the build. Is there a straightforward way to debug the Entrez step?
>
> Thanks
>
> Paulo

--
sergio contrino                  InterMine, University of Cambridge
https://sergiocontrino.github.io           http://www.intermine.org

Re: Debugging Entrez species step

Paulo Nuin
Hi Sergio

I don’t have the log anymore, but it basically showed that the step failed. The XML file created by it has just

<item>

in it, nothing else.

Cheers
Paulo



> On Dec 19, 2018, at 5:29 AM, sergio contrino <[hidden email]> wrote:
>
> hi paul, could you please send the logs of when it failed?
> the entrez step involves calls to ncbi, so there could be external factors as well.
> thanks
> sergio

Re: Debugging Entrez species step

sergio contrino-2
hi paulo,
would it be possible for you to run it again?
i would suggest you add an intermediate dump just before the
entrez-organism step, so you can retry the step without losing all the
build up to that moment (see below for example)
then you could share the integration log file (and also test if it is a
temporary network issue)
thanks!
sergio


in your project.xml file:
[..]
<source name="ensembl-compara-yeast" type="ensembl-compara" dump="true">
...
<source name="entrez-organism" type="entrez-organism">
[..]

you can then restart your build after failure from the entrez-organism
step (project_build script with -r switch)
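
To confirm that the dump really sits on the source listed right before entrez-organism, a small check along these lines may help. It is only a sketch: the <project>/<sources> wrapper, the sample XML and the helper name source_before are assumptions for illustration, not taken from the WormMine project.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down project.xml. Only the two sources from the
# example above are shown; the <project>/<sources> wrapper is an assumption.
SAMPLE_PROJECT_XML = """
<project type="bio">
  <sources>
    <source name="ensembl-compara-yeast" type="ensembl-compara" dump="true"/>
    <source name="entrez-organism" type="entrez-organism"/>
  </sources>
</project>
"""


def source_before(project_xml, target):
    """Return (name, has_dump) for the source listed right before `target`.

    Returns None if `target` is not listed or is the first source.
    """
    root = ET.fromstring(project_xml)
    sources = root.findall("./sources/source")
    names = [s.get("name") for s in sources]
    if target not in names:
        return None
    idx = names.index(target)
    if idx == 0:
        return None
    prev = sources[idx - 1]
    return prev.get("name"), prev.get("dump") == "true"


if __name__ == "__main__":
    print(source_before(SAMPLE_PROJECT_XML, "entrez-organism"))
```

If the second element of the result is False, a failure in entrez-organism would leave no dump to restart from.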


On 19/12/2018 18:28, Paulo Nuin wrote:

> Hi Sergio
>
> I don’t have the log anymore, but it basically showed that the step failed. The XML file created by it has just
>
> <item>
>
> in it, nothing else.
>
> Cheers
> Paulo

Re: Debugging Entrez species step

Paulo Nuin
Hi Sergio

I think I found the error in the step (building with 2.x now). It seems that one species that has items in our sources doesn’t have a species name on NCBI. The species is Caenorhabditis inopinata, which is in our XML initially with taxon ID 1978547. The problem is that inopinata doesn’t exist as such on NCBI, as it doesn’t have a species name there:

https://www.ncbi.nlm.nih.gov/bioproject/382947

I think when the Entrez step tries to get the information there, it doesn’t find a match. Is there any way to solve this on our side?

Thanks

Paulo




> On Dec 20, 2018, at 3:47 AM, sergio contrino <[hidden email]> wrote:
>
> hi paulo,
> would it be possible for you to run it again?
> i would suggest you add an intermediate dump just before the entrez-organism step, so you can retry the step without losing all the build up to that moment (see below for example)
> then you could share the integration log file (and also test if it is a temporary network issue)
> thanks!
> sergio
>
>
> in your project.xml file:
> [..]
> <source name="ensembl-compara-yeast" type="ensembl-compara" dump="true">
> ...
> <source name="entrez-organism" type="entrez-organism">
> [..]
>
> you can then restart your build after failure from the entrez-organism step (project_build script with -r switch)



Re: Debugging Entrez species step

sergio contrino-2
hi paulo,
i think the entrez step uses the taxid, and with the taxonomy browser
you can find the species (but the name is Caenorhabditis sp. 34 TK-2017).
should you conform to ncbi naming? or ask them for a correction?
still i think it should work.
can you send the error message, or the log?
thanks!
sergio
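
One way to see what NCBI returns for a given taxon ID, outside the build, is to query the E-utilities efetch endpoint for the taxonomy database directly. The sketch below is not how the entrez-organism source itself is implemented. The helper names and the sample reply are made up for illustration; the sample only mimics the TaxaSet/Taxon/TaxId/ScientificName layout of an efetch reply and reuses the name quoted above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hand-written sample shaped like an efetch taxonomy reply (not a real response).
SAMPLE_REPLY = """
<TaxaSet>
  <Taxon>
    <TaxId>1978547</TaxId>
    <ScientificName>Caenorhabditis sp. 34 TK-2017</ScientificName>
  </Taxon>
</TaxaSet>
"""


def build_efetch_url(taxon_ids):
    """Build an efetch URL asking the taxonomy database for the given IDs."""
    query = urllib.parse.urlencode(
        {"db": "taxonomy", "id": ",".join(str(t) for t in taxon_ids), "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def parse_names(xml_payload):
    """Map each top-level TaxId in the reply to its ScientificName."""
    root = ET.fromstring(xml_payload)
    return {
        taxon.findtext("TaxId"): taxon.findtext("ScientificName")
        for taxon in root.findall("./Taxon")
    }


def missing_ids(requested, names):
    """Return the requested IDs that the reply did not contain."""
    return [str(t) for t in requested if str(t) not in names]


def fetch_names(taxon_ids):
    """Query NCBI over the network and return {taxon ID: scientific name}."""
    with urllib.request.urlopen(build_efetch_url(taxon_ids), timeout=30) as resp:
        return parse_names(resp.read())


if __name__ == "__main__":
    # Offline demo on the sample; call fetch_names(["1978547"]) for a live query.
    print(parse_names(SAMPLE_REPLY))
```

Comparing the IDs in the organism table against missing_ids(...) would show whether NCBI simply returns nothing for one of them.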


On 08/02/2019 05:03, Paulo Nuin wrote:

> Hi Sergio
>
> I think I found the error in the step (building with 2.x now). It seems
> that one species that has items in our sources doesn’t have a species name
> on NCBI. The species is Caenorhabditis inopinata, which is in our XML
> initially with taxon ID 1978547. The problem is that inopinata doesn’t
> exist as such on NCBI, as it doesn’t have a species name there:
>
> https://www.ncbi.nlm.nih.gov/bioproject/382947
>
> I think when the Entrez step tries to get the information there, it
> doesn’t find a match. Is there any way to solve this on our side?
>
> Thanks
>
> Paulo

Re: Debugging Entrez species step

Paulo Nuin
Hi

It really fails on the undetermined sequence on NCBI. My organism table has all taxon IDs but that one. Message and log are below

Paulo

gerBinder.class]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-jdk14/1.7.10/13675a57601d9106fe82314fe9eebb9b2d9b13c6/slf4j-jdk14-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Starting...
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Start completed.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Shutdown initiated...
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Shutdown completed.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] :dbmodel:preRetrieveSingleSource
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] Pre-retrieving entrez-organism
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Class path contains multiple SLF4J bindings.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.7/8095d0b9f7e0a9cd79a663c740e0f8fb31d0e2c8/slf4j-simple-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-jdk14/1.7.10/13675a57601d9106fe82314fe9eebb9b2d9b13c6/slf4j-jdk14-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.production - Starting...
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.production - Start completed.
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] :dbmodel:preRetrieveSingleSource FAILED
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] 10 actionable tasks: 6 executed, 4 up-to-date
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [Thread-29] INFO com.zaxxer.hikari.HikariDataSource - db.production - Shutdown initiated...
  [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [Thread-29] INFO com.zaxxer.hikari.HikariDataSource - db.production - Shutdown completed.
Sat Feb  9 03:51:36 UTC 2019

finished


failed with exit code 0: /mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon --stacktrace integrate -Psource=entrez-organism




* What went wrong:
Execution failed for task ':dbmodel:preRetrieveSingleSource'.
> exception while retrieving organisms

* Try:
Run with --info or --debug option to get more log output.

* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':dbmodel:preRetrieveSingleSource'.
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:100)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:70)
        at org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter.execute(SkipUpToDateTaskExecuter.java:63)
        at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:54)
        at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
        at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:88)
        at org.gradle.api.internal.tasks.execution.ResolveTaskArtifactStateTaskExecuter.execute(ResolveTaskArtifactStateTaskExecuter.java:52)
        at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:52)
        at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:54)
        at org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter.execute(ExecuteAtMostOnceTaskExecuter.java:43)
        at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:34)
        at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker$1.run(DefaultTaskGraphExecuter.java:248)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:336)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:328)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:197)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:107)
        at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker.execute(DefaultTaskGraphExecuter.java:241)
        at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker.execute(DefaultTaskGraphExecuter.java:230)
        at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.processTask(DefaultTaskPlanExecutor.java:124)
        at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.access$200(DefaultTaskPlanExecutor.java:80)
        at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker$1.execute(DefaultTaskPlanExecutor.java:105)
        at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker$1.execute(DefaultTaskPlanExecutor.java:99)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionPlan.execute(DefaultTaskExecutionPlan.java:625)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionPlan.executeWithTask(DefaultTaskExecutionPlan.java:580)
        at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.run(DefaultTaskPlanExecutor.java:99)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
Caused by: : exception while retrieving organisms
        at org.intermine.bio.dataconversion.EntrezOrganismRetriever.execute(EntrezOrganismRetriever.java:127)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.intermine.plugin.integrate.IntegrateUtils$_closure1.doCall(IntegrateUtils.groovy:39)
        at org.intermine.plugin.integrate.IntegratePlugin$_apply_closure6$_closure18.doCall(IntegratePlugin.groovy:150)
        at org.gradle.api.internal.AbstractTask$ClosureTaskAction.execute(AbstractTask.java:681)
        at org.gradle.api.internal.AbstractTask$ClosureTaskAction.execute(AbstractTask.java:656)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$1.run(ExecuteActionsTaskExecuter.java:122)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:336)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:328)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:197)
        at org.gradle.internal.progress.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:107)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:111)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:92)
        ... 27 more
Caused by: java.lang.NullPointerException
        at org.intermine.metadata.StringUtil.join(StringUtil.java:103)
        at org.intermine.bio.dataconversion.EntrezOrganismRetriever.getReader(EntrezOrganismRetriever.java:171)
        at org.intermine.bio.dataconversion.EntrezOrganismRetriever.execute(EntrezOrganismRetriever.java:115)
        ... 40 more
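
The trace bottoms out in a NullPointerException inside StringUtil.join, called from EntrezOrganismRetriever.getReader. One plausible reading, and it is only an assumption since the source of those methods is not shown in this thread, is that a null value ends up among the IDs being joined. The Python sketch below is an analogue of that failure mode, not InterMine code: join_taxon_ids is a made-up helper, and 6239 (C. elegans) is used purely as a second example ID.

```python
def join_taxon_ids(taxon_ids, sep=","):
    """Join taxon IDs into one string, skipping missing (None) entries.

    Returns the joined string and the number of entries skipped, so the
    caller can log or fail loudly instead of hitting an opaque error.
    """
    present = [str(t) for t in taxon_ids if t is not None]
    skipped = len(taxon_ids) - len(present)
    return sep.join(present), skipped


if __name__ == "__main__":
    # None stands in for an organism row without a taxon ID.
    ids = ["6239", None, "1978547"]
    try:
        ",".join(ids)  # the unguarded join fails on the None entry
    except TypeError as exc:
        print("unguarded join failed:", exc)
    print(join_taxon_ids(ids))
```

A non-zero skipped count would point at a row in the organism table rather than at NCBI.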

> On Feb 8, 2019, at 5:28 AM, sergio contrino <[hidden email]> wrote:
>
> hi paulo,
> i think the entrez step uses the taxid, and with the taxonomy browser you can find the species (but the name is Caenorhabditis sp. 34 TK-2017).
> should you conform to ncbi naming? or ask them for a correction?
> still i think it should work.
> can you send the error message, or the log?
> thanks!
> sergio

Re: Debugging Entrez species step

sergio contrino-2
hi paulo,
sorry for the slow reply.
can you clarify whether your inopinata taxid is in the organism table before
the entrez organism step? you can easily check by querying the dump db that
you should have created.
it seems you have a record in the organism table with a null taxid.
could you send me the result of a select * from organism query?
thanks
sergio
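
The check for a null taxid could be sketched as below. This is a stand-in, not a recipe for the real mine: it uses an in-memory SQLite database instead of the PostgreSQL dump, the column names (id, name, taxonid) are assumptions about the organism table, and the demo rows are made up.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the organism table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT, taxonid TEXT)"
    )
    conn.executemany(
        "INSERT INTO organism (id, name, taxonid) VALUES (?, ?, ?)",
        [
            (1, "Caenorhabditis elegans", "6239"),
            (2, "Caenorhabditis inopinata", None),  # hypothetical row lacking a taxon ID
        ],
    )
    return conn


def organisms_missing_taxon(conn):
    """Return (id, name) for every organism row whose taxon ID is NULL."""
    cur = conn.execute(
        "SELECT id, name FROM organism WHERE taxonid IS NULL ORDER BY id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(organisms_missing_taxon(make_demo_db()))
```

Against the real dump, the same WHERE taxonid IS NULL condition could be run in psql once the actual column name is confirmed.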

On 09/02/2019 03:56, Paulo Nuin wrote:

> Hi
>
> It really fails on the undetermined sequence on NCBI. My organism table has all taxon IDs but that one. Message and log are below
>
> Paulo
>
> gerBinder.class]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-jdk14/1.7.10/13675a57601d9106fe82314fe9eebb9b2d9b13c6/slf4j-jdk14-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Starting...
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Start completed.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Shutdown initiated...
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [insertModel] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.common-tgt-items - Shutdown completed.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] :dbmodel:preRetrieveSingleSource
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] Pre-retrieving entrez-organism
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Class path contains multiple SLF4J bindings.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.7/8095d0b9f7e0a9cd79a663c740e0f8fb31d0e2c8/slf4j-simple-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Found binding in [jar:file:/home/nuin/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-jdk14/1.7.10/13675a57601d9106fe82314fe9eebb9b2d9b13c6/slf4j-jdk14-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.production - Starting...
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [preRetrieve] [Task worker for ':' Thread 6] INFO com.zaxxer.hikari.HikariDataSource - db.production - Start completed.
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] :dbmodel:preRetrieveSingleSource FAILED
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] 10 actionable tasks: 6 executed, 4 up-to-date
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [Thread-29] INFO com.zaxxer.hikari.HikariDataSource - db.production - Shutdown initiated...
>    [/mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon] [Thread-29] INFO com.zaxxer.hikari.HikariDataSource - db.production - Shutdown completed.
> Sat Feb  9 03:51:36 UTC 2019
>
> finished
>
>
> failed with exit code 0: /mnt/data2/2.0/mine/WormMine/gradlew --stacktrace --no-daemon --stacktrace integrate -Psource=entrez-organism
>
>
>
>
> * What went wrong:
> Execution failed for task ':dbmodel:preRetrieveSingleSource'.
>> exception while retrieving organisms
>
> * Try:
> Run with --info or --debug option to get more log output.
>
> * Exception is:
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':dbmodel:preRetrieveSingleSource'.
>          at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:100)
>          at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:70)
>          at org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter.execute(SkipUpToDateTaskExecuter.java:63)
>          at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:54)
>          at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
>          at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:88)
>          at org.gradle.api.internal.tasks.execution.ResolveTaskArtifactStateTaskExecuter.execute(ResolveTaskArtifactStateTaskExecuter.java:52)
>          at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:52)
>          at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:54)
>          at org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter.execute(ExecuteAtMostOnceTaskExecuter.java:43)
>          at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:34)
>          at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker$1.run(DefaultTaskGraphExecuter.java:248)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:336)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:328)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:197)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:107)
>          at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker.execute(DefaultTaskGraphExecuter.java:241)
>          at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter$EventFiringTaskWorker.execute(DefaultTaskGraphExecuter.java:230)
>          at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.processTask(DefaultTaskPlanExecutor.java:124)
>          at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.access$200(DefaultTaskPlanExecutor.java:80)
>          at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker$1.execute(DefaultTaskPlanExecutor.java:105)
>          at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker$1.execute(DefaultTaskPlanExecutor.java:99)
>          at org.gradle.execution.taskgraph.DefaultTaskExecutionPlan.execute(DefaultTaskExecutionPlan.java:625)
>          at org.gradle.execution.taskgraph.DefaultTaskExecutionPlan.executeWithTask(DefaultTaskExecutionPlan.java:580)
>          at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor$TaskExecutorWorker.run(DefaultTaskPlanExecutor.java:99)
>          at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>          at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>          at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
> Caused by: : exception while retrieving organisms
>          at org.intermine.bio.dataconversion.EntrezOrganismRetriever.execute(EntrezOrganismRetriever.java:127)
>          at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
>          at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>          at org.intermine.plugin.integrate.IntegrateUtils$_closure1.doCall(IntegrateUtils.groovy:39)
>          at org.intermine.plugin.integrate.IntegratePlugin$_apply_closure6$_closure18.doCall(IntegratePlugin.groovy:150)
>          at org.gradle.api.internal.AbstractTask$ClosureTaskAction.execute(AbstractTask.java:681)
>          at org.gradle.api.internal.AbstractTask$ClosureTaskAction.execute(AbstractTask.java:656)
>          at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$1.run(ExecuteActionsTaskExecuter.java:122)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:336)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:328)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:197)
>          at org.gradle.internal.progress.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:107)
>          at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:111)
>          at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:92)
>          ... 27 more
> Caused by: java.lang.NullPointerException
>          at org.intermine.metadata.StringUtil.join(StringUtil.java:103)
>          at org.intermine.bio.dataconversion.EntrezOrganismRetriever.getReader(EntrezOrganismRetriever.java:171)
>          at org.intermine.bio.dataconversion.EntrezOrganismRetriever.execute(EntrezOrganismRetriever.java:115)
>          ... 40 more
>
>> On Feb 8, 2019, at 5:28 AM, sergio contrino <[hidden email]> wrote:
>>
>> hi paulo,
>> i think the entrez step uses the taxid, and with the taxonomy browser you can find the species (but the name is Caenorhabditis sp. 34 TK-2017).
>> should you conform to ncbi naming? or ask them for a correction?
>> still i think it should work.
>> can you send the error message, or the log?
>> thanks!
>> sergio
>>
>>
>> On 08/02/2019 05:03, Paulo Nuin wrote:
>>> Hi Sergio
>>> I think I found the error in the step (building with 2.x now). It seems that one species that has items in our sources doesn’t have a species name on NCBI. The species is Caenorhabditis inopinata, which is in our XML with taxon ID 1978547. The problem is that inopinata doesn’t exist as such on NCBI, since the entry there has no species name:
>>> https://www.ncbi.nlm.nih.gov/bioproject/382947
>>> I think that when the Entrez step tries to get the information there, it doesn’t find a match. Is there any way to solve this on our side?
>>> Thanks
>>> Paulo
>>>> On Dec 20, 2018, at 3:47 AM, sergio contrino <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>> hi paulo,
>>>> would it be possible for you to run it again?
>>>> i would suggest you add an intermediate dump just before the entrez-organism step, so you can retry the step without losing all the build up to that moment (see below for example)
>>>> then you could share the integration log file (and also test if it is a temporary network issue)
>>>> thanks!
>>>> sergio
>>>>
>>>>
>>>> in your project.xml file:
>>>> [..]
>>>> <source name="ensembl-compara-yeast" type="ensembl-compara" dump="true">
>>>> ...
>>>> <source name="entrez-organism" type="entrez-organism">
>>>> [..]
>>>>
>>>> you can then restart your build after failure from the entrez-organism step (project_build script with -r switch)
>>>>
>>>>
>>>> On 19/12/2018 18:28, Paulo Nuin wrote:
>>>>> Hi Sergio
>>>>> I don’t have the log anymore, but it basically showed that the step failed. The XML file created by it has just
>>>>> <item>
>>>>> in it, nothing else.
>>>>> Cheers
>>>>> Paulo
>
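
in case it helps to check the taxon ids from outside the build, here is a rough sketch in python. it is not what the entrez-organism source does internally: it only asks the public ncbi esummary service (db=taxonomy) for a list of ids and prints the scientific name returned for each. the trace ends in StringUtil.join, which might mean a null value among the things being joined, so the sketch also drops blank or missing ids before building the request; that reading of the trace is a guess. the function names are made up for illustration, and the reply layout (DocSum / Id / Item Name="ScientificName") is assumed to be the default esummary xml.

```python
"""Check which taxon IDs NCBI's taxonomy esummary can resolve (sketch)."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public NCBI E-utilities endpoint. Whether the entrez-organism source
# builds its request exactly like this is an assumption.
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def clean_taxon_ids(raw_ids):
    """Return (unique non-blank IDs in input order, count of blank/None IDs)."""
    kept, dropped = [], 0
    for tid in raw_ids:
        text = "" if tid is None else str(tid).strip()
        if not text:
            dropped += 1          # a null here is the kind of value a join may choke on
        elif text not in kept:
            kept.append(text)
    return kept, dropped


def esummary_url(taxon_ids):
    """Build one esummary request for a comma-separated list of taxon IDs."""
    query = urllib.parse.urlencode({"db": "taxonomy", "id": ",".join(taxon_ids)})
    return f"{ESUMMARY}?{query}"


def parse_scientific_names(xml_text):
    """Map taxon ID -> ScientificName (None when the name is absent or empty)."""
    names = {}
    for doc in ET.fromstring(xml_text).iter("DocSum"):
        tid = doc.findtext("Id")
        name = doc.findtext("Item[@Name='ScientificName']")
        names[tid] = name or None
    return names


def check(raw_ids):
    """Query NCBI (network access needed) and print one line per taxon ID."""
    ids, dropped = clean_taxon_ids(raw_ids)
    with urllib.request.urlopen(esummary_url(ids), timeout=30) as resp:
        names = parse_scientific_names(resp.read().decode("utf-8"))
    for tid in ids:
        print(tid, names.get(tid) or "NO NAME RETURNED")
    print(f"{dropped} blank/missing ID(s) skipped")
```

calling check(["6239", None, "1978547"]) would query C. elegans (6239) and the taxon id from this thread; any id printed without a name, or any skipped id, would be a candidate to look at first.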

--
sergio contrino                  InterMine, University of Cambridge
https://sergiocontrino.github.io           http://www.intermine.org