Problem loading match features into CHADO with gmod_bulk_load_gff3.pl


lpritc@scri.ac.uk
Hi,

I have a set of EST matches (generated elsewhere) in GFF3 format that I'm
trying to get into a local CHADO db.  The matches are in match/match_part
format, but some matches throw an error on upload while others do not, and I
can't see what's going wrong.  A working example is:

#gff-version 3
SuperContig693 BLAT_SangerEST_spore match 5764 6388 72.18 - . Note=BLAT%20match%20to%20EST%20Hp_ENSC_38P24%2C%20Sanger%20EST%20from%20spore;Target=Hp_ENSC_38P24%20231%20834%20%2B;Name=Hp_ENSC_38P24;ID=SuperContig693:est35
SuperContig693 BLAT_SangerEST_spore match_part 5764 6222 . - . Target=Hp_ENSC_38P24%20515%20604%20%2B;Name=Hp_ENSC_38P24;Parent=SuperContig693:est35;ID=SuperContig693:est35:1
SuperContig693 BLAT_SangerEST_spore match_part 6233 6285 . - . Target=Hp_ENSC_38P24%20461%20513%20%2B;Name=Hp_ENSC_38P24;Parent=SuperContig693:est35;ID=SuperContig693:est35:2
SuperContig693 BLAT_SangerEST_spore match_part 6299 6388 . - . Target=Hp_ENSC_38P24%201%20459%20%2B;Name=Hp_ENSC_38P24;Parent=SuperContig693:est35;ID=SuperContig693:est35:3

$ gmod_bulk_load_gff3.pl --gfffile test2.gff3 \
    --organism "Hyaloperonospora arabidopsidis EMOY2" \
    --dbname oomycete_reference --dbuser ****** --dbpass ****** \
    --dbhost localhost --analysis --score identity
Preparing data for inserting into the oomycete_reference database
(This may take a while ...)
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Skipping synonym table since the load file is empty...
Loading data into feature_synonym table ...
Skipping dbxref table since the load file is empty...
Loading data into feature_dbxref table ...
Loading data into analysisfeature table ...
Skipping cvterm table since the load file is empty...
Skipping db table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping organism table since the load file is empty...
Loading sequences (if any) ...

Done.

However, the following failing example doesn't appear to be significantly
different in terms of its content:

#gff-version 3
VelvetSuperContig2944 BLAT_454EST_3dpi match 1 39 60.00 + . Note=BLAT%20match%20to%20EST%20EIVZC0C01A6QJP%2C%20454%20EST%203dpi;Target=EIVZC0C01A6QJP%2027%2065%20%2B;Name=EIVZC0C01A6QJP;ID=VelvetSuperContig2944:est1
VelvetSuperContig2944 BLAT_454EST_3dpi match_part 1 39 . + . Target=EIVZC0C01A6QJP%2027%2065%20%2B;Name=EIVZC0C01A6QJP;Parent=VelvetSuperContig2944:est1;ID=VelvetSuperContig2944:est1:1

But it throws this error:

$ gmod_bulk_load_gff3.pl --gfffile test.gff3 \
    --organism "Hyaloperonospora arabidopsidis EMOY2" \
    --dbname oomycete_reference --dbuser ****** --dbpass ****** \
    --dbhost localhost --analysis --score identity
Preparing data for inserting into the oomycete_reference database
(This may take a while ...)
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Skipping synonym table since the load file is empty...
Loading data into feature_synonym table ...
Skipping dbxref table since the load file is empty...
Loading data into feature_dbxref table ...
Loading data into analysisfeature table ...
Skipping cvterm table since the load file is empty...
Skipping db table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping organism table since the load file is empty...
DBD::Pg::db commit failed: ERROR:  insert or update on table "featureloc"
violates foreign key constraint "featureloc_srcfeature_id_fkey"
DETAIL:  Key (srcfeature_id)=(1990175) is not present in table "feature". at
/usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3151, <$fh> line
4.
commit failed: ERROR:  insert or update on table "featureloc" violates
foreign key constraint "featureloc_srcfeature_id_fkey"
DETAIL:  Key (srcfeature_id)=(1990175) is not present in table "feature". at
/usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3151, <$fh> line
4.

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache
won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...


There is no feature with id 1990175 in the database, but
VelvetSuperContig2944 does exist with a different id:

oomycete_reference=> SELECT feature_id, name FROM feature WHERE feature_id = 1990175;
 feature_id | name
------------+------
(0 rows)

oomycete_reference=> SELECT feature_id, name FROM feature WHERE name = 'VelvetSuperContig2944';
 feature_id |         name
------------+-----------------------
    1993085 | VelvetSuperContig2944


There is no entry with the name of the Target sequence in either case (until
the successful upload), and the target sequence is not in the dataset at all
for either EST.  When the Target attribute is deleted from the match and
match_part features of the failing EST example, that entry is uploaded into
the database without error, so I guess that's where the problem lies - I
just can't see what it is.
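
To make the Target values easier to inspect, here is a minimal Python sketch
that decodes them.  It assumes the whole value is percent-encoded, as in the
examples above, and that the target ID itself contains no spaces.
parse_target is a hypothetical helper name, not part of GMOD or BioPerl.

```python
from urllib.parse import unquote


def parse_target(value):
    """Decode a percent-encoded Target value into (id, start, end, strand).

    Assumption: the target ID contains no spaces, so splitting the decoded
    string on whitespace is safe.
    """
    fields = unquote(value).split()
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected Target value: {value!r}")
    target_id, start, end = fields[0], int(fields[1]), int(fields[2])
    # The strand is optional in a Target value.
    strand = fields[3] if len(fields) == 4 else None
    return target_id, start, end, strand


# Target value from the failing match feature.
print(parse_target("EIVZC0C01A6QJP%2027%2065%20%2B"))
# ('EIVZC0C01A6QJP', 27, 65, '+')
```

This only shows what the loader is being asked to read; it does not explain
why one record loads and the other fails.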


Does anyone have any suggestions for what might be going wrong here?  I can
work around the issue by stripping the Target attribute from all 250000+
ESTs, but I'm reluctant to make that our long-term solution.
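
The workaround could look roughly like the following Python sketch.  It
assumes a tab-delimited GFF3 file with nine columns per feature line;
strip_attribute and strip_file are hypothetical names chosen for
illustration, and the file names in the usage note are only examples.

```python
def strip_attribute(line, key="Target"):
    """Return a GFF3 line with the `key` attribute removed from column 9.

    Comment lines, blank lines, and anything that is not a nine-column
    tab-delimited feature line are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    kept = [a for a in cols[8].split(";")
            if a and not a.startswith(key + "=")]
    # Use "." as the placeholder if no attributes remain.
    cols[8] = ";".join(kept) if kept else "."
    return "\t".join(cols) + newline


def strip_file(in_path, out_path, key="Target"):
    """Copy in_path to out_path, dropping `key` from every feature line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_attribute(line, key))
```

For example, strip_file("test.gff3", "test.notarget.gff3") would write a
copy of the failing file without any Target attributes, at the cost of
losing the EST coordinates of each match.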

Thanks,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:[hidden email]       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema