Error from Pseudo gene identification scripts

Error from Pseudo gene identification scripts

Quanwei Zhang
Hello:

I am trying to identify pseudogenes following http://shiulab.plantbiology.msu.edu/index.php/Protocol:Pseudogene

After getting the BLAST result, I tried to scan for pseudogenes with the command "python pseudo_wrap.py parameter", but I got the following errors. Do you have any ideas or suggestions about these errors? Thanks.

##below shows reported errors
Traceback (most recent call last):
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg//ParseBlast.py", line 4330, in <module>
    parse.get_qualified4(blast,fasta,E,I,L,P,Q)
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg//ParseBlast.py", line 3050, in get_qualified4
    N = sizes[L[0]]
KeyError: 'maker-Contig2656-snap-gene-1.9-mRNA-1'
Traceback (most recent call last):
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg/script_step3b.py", line 98, in <module>
    oup.write("%s\t%s\t%s\t%s\n" % (all_contigs[i][0][0],
IndexError: list index out of range
Done!

Best
Quanwei

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Error from Pseudo gene identification scripts

Carson Holt-2
I’m going to CC Michael Campbell on this. I wasn’t really involved with any of the pseudogene accessory scripts and protocols that went with the MAKER-P publication, nor have I really been involved with pseudogene annotation in general, so Michael might have more insight here.

—Carson

In reply to Quanwei Zhang's post of Dec 7, 2017, 2:44 PM.

Re: Error from Pseudo gene identification scripts

Quanwei Zhang
Thank you Carson and Michael.

Best
Quanwei

In reply to Carson Holt's post of 2017-12-07 23:42 GMT-05:00.

Re: Error from Pseudo gene identification scripts

Michael Campbell
Hi Quanwei,

My guess would be a file format issue, but the code has evolved since I worked with it. The last time I ran it, the FASTA header had to contain only the sequence ID, with no space after it. That was the big gotcha that I remember.

I’ve CC’d Shin-Han Shiu on this one. The pipeline was developed in his lab.

Thanks,
Mike
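
A minimal sketch of the header cleanup Mike describes, assuming the requirement is simply that each ">" line carries nothing but the sequence ID. The function name strip_fasta_descriptions is made up for illustration and is not part of the pseudogene pipeline; the example header is the MAKER protein header quoted later in this thread.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with each header cut down to its first whitespace-delimited token."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            # An empty header is left as ">" rather than guessed at.
            yield ">" + (parts[0] if parts else "")
        else:
            # Sequence lines pass through unchanged.
            yield line


example = [
    ">maker-Contig2656-snap-gene-1.9-mRNA-1 protein AED:0.04 eAED:0.04 QI:43|1|1|1|0.85|0.87|8|1768|297\n",
    "MGTSLDIKIKRANKVYHAGEMLSGVVVISSKDSVQHQGMSLTMEGTVNLQLSAKSVGVFE\n",
]
cleaned = list(strip_fasta_descriptions(example))
print("\n".join(cleaned))
```

Writing the cleaned lines to a new file, rather than overwriting the MAKER output, keeps the original annotations (AED, QI) available if they are needed later.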

In reply to Quanwei Zhang's post of Dec 8, 2017, 8:46 AM.

Re: Error from Pseudo gene identification scripts

Shin-Han Shiu

Hi Mike and Carson, we will take over from here. Thanks for referring the message to us.

Quanwei, it looks like for some reason your input sequence file is missing "maker-Contig2656-snap-gene-1.9-mRNA-1". This can be an issue with the sequence name, since the code uses a space as a delimiter in places. Can you check your sequence file for this sequence and let us know what the name after ">" looks like?

Nick, sorry for bugging you. Do you have any input on this?

Shinhan
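
The check suggested above can be sketched roughly as follows. This is only an illustration of the suspected failure mode, under the assumption that the BLAST table is tab-delimited with the query ID in the first column and that names are compared as exact strings; it does not reproduce how ParseBlast.py actually builds its lookup. All function names and the toy records (seqA, seqB, ctg1) are hypothetical.

```python
def fasta_names(lines):
    """Return the full header text (everything after '>') of each FASTA record."""
    return [line[1:].rstrip("\n") for line in lines if line.startswith(">")]


def blast_query_ids(lines):
    """Return the set of query IDs (first tab-delimited column) in a tabular BLAST file."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def missing_queries(fasta_lines, blast_lines):
    """Query IDs in the BLAST table that do not exactly match any FASTA header."""
    names = set(fasta_names(fasta_lines))
    return sorted(blast_query_ids(blast_lines) - names)


# Toy data: seqA has a description after its ID, seqB does not.
fasta = [">seqA extra description\n", "MKV\n", ">seqB\n", "MLL\n"]
blast = ["seqA\tctg1\t100.0\n", "seqB\tctg1\t90.0\n"]
missing = missing_queries(fasta, blast)
print(missing)
```

Any ID reported here is present in the BLAST output but has no exact match among the FASTA headers, which is the kind of mismatch that would surface as a KeyError in a dictionary lookup keyed by header.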


In reply to Michael Campbell's post of 12/10/2017 8:37 PM.

-- 
--------------------------------------
Shin-Han Shiu
Michigan State University
Department of Plant Biology
2265 Mol Plant Sci Bldg
(TEL) +1-517-353-7196
http://goo.gl/keiHZX
--------------------------------------

Re: Error from Pseudo gene identification scripts

Quanwei Zhang
Hello Shinhan and Michael:

Thanks for your help. The sequence is shown below; it is the protein sequence reported by Maker2. The error occurs when I run "pseudo_wrap.py". With the BLAST results and the protein sequences predicted by Maker2, I am trying to predict pseudogenes in the whole assembly (both those in intergenic regions and those among the predicted proteins).

>maker-Contig2656-snap-gene-1.9-mRNA-1 protein AED:0.04 eAED:0.04 QI:43|1|1|1|0.85|0.87|8|1768|297
MGTSLDIKIKRANKVYHAGEMLSGVVVISSKDSVQHQGMSLTMEGTVNLQLSAKSVGVFE
AFYNSVKPIQIINSTIEMVKPGKFPSGKTEIPFEFPLHVKGNKVLYETYHGVFVNIQYVL
RCDMRRSLLAKDLTKTCEFIVHSVPQKGKLTPSPVDFTITPETLQNVKERALLPKFLIRG
HLNSTNCAITQPLTGELVVEHSDAAIRSIELQLVRVETCGCAEGYARDATEIQNIQIADG
DVCRSLSVPIYMVFPRLFTCPTLETTNFKVEFEINVVVLLHADHLITENFPLKLCRT


# BLAST results are below
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    100.000    51    0    0    220    270    424151    424303    3.23e-25    111
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    93.103    58    4    0    170    227    423367    423540    4.19e-24    108
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    85.000    60    7    1    66    123    404001    404180    5.67e-24    107
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    100.000    48    0    0    20    67    402613    402756    3.47e-20    96.7
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    50.725    69    25    1    238    297    426022    426228    6.48e-09    63.2
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    67.308    52    15    2    118    168    417125    417277    2.07e-08    61.6
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    100.000    25    0    0    145    169    419825    419899    5.54e-06    53.9
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    76.667    30    5    1    1    30    382922    383005    0.012    43.5
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig3808    22.545    275    175    9    15    283    112218    111490    1.13e-07    59.3
maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2791    26.667    60    43    1    236    295    20108374    20108550    9.6    34.3
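
For readers unfamiliar with the table above, here is a small sketch of how one row could be read, assuming it follows the standard 12-column tabular BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The thread does not state the output format, so that column mapping is an assumption, and BlastHit and parse_hit are illustrative names only.

```python
from collections import namedtuple

# Assumed column order: the standard 12-column tabular BLAST layout.
BlastHit = namedtuple(
    "BlastHit",
    "qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore",
)


def parse_hit(line):
    """Parse one whitespace-delimited tabular BLAST row into a BlastHit."""
    f = line.split()
    return BlastHit(
        f[0], f[1], float(f[2]),
        int(f[3]), int(f[4]), int(f[5]),
        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
        float(f[10]), float(f[11]),
    )


row = ("maker-Contig2656-snap-gene-1.9-mRNA-1    Contig2656    100.000    51    0    0"
       "    220    270    424151    424303    3.23e-25    111")
hit = parse_hit(row)

# Length of the subject interval in nucleotides (abs() also covers reverse-strand hits).
subject_span = abs(hit.send - hit.sstart) + 1
print(hit.qseqid, hit.sseqid, subject_span, 3 * hit.length)
```

For this first row the subject interval is exactly three times the alignment length, which is consistent with a protein query aligned against translated genomic DNA. The query ID in column 1 is the bare ID, without the description that follows it in the FASTA header.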


Many thanks

Best
Quanwei

In reply to Shin-Han Shiu's post of 2017-12-11 8:13 GMT-05:00.

Re: Error from Pseudo gene identification scripts

panchyni
In reply to this post by Shin-Han Shiu

Quanwei, in addition to checking the sequence file, can you also send
me the standard output of the run (this is the text that normally prints
to the terminal unless you pipe it somewhere else)? This would help
in diagnosing the problem, but Shin-Han is likely correct that it is an
issue with the name formatting.

Nick
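
One way to collect that output is to run the command through a small wrapper that writes both standard output and standard error to a log file. The sketch below is a generic illustration, not part of the pipeline; run_and_log is a made-up helper, and a stand-in child command is used so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(cmd, log_path):
    """Run cmd, send its stdout and stderr to log_path, and return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in child process that writes one line to each stream.
child_code = "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"
log_path = os.path.join(tempfile.mkdtemp(), "run.log")
exit_code = run_and_log([sys.executable, "-c", child_code], log_path)

with open(log_path) as handle:
    captured = handle.read()
print(exit_code)
print(captured)
```

For the real run, cmd would be the command quoted at the start of this thread, for example ["python", "pseudo_wrap.py", "parameter"]. Because tracebacks go to standard error, capturing both streams keeps them in the same file as the progress messages.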

Re: Error from Pseudo gene identification scripts

Quanwei Zhang
Thank you Nick. The output is shown below.

Traceback (most recent call last):
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg//ParseBlast.py", line 4330, in <module>
    parse.get_qualified4(blast,fasta,E,I,L,P,Q)
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg//ParseBlast.py", line 3050, in get_qualified4
    N = sizes[L[0]]
KeyError: 'maker-Contig2656-snap-gene-1.9-mRNA-1'
Traceback (most recent call last):
  File "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/pseudo_pkg/script_step3b.py", line 98, in <module>
    oup.write("%s\t%s\t%s\t%s\n" % (all_contigs[i][0][0],
IndexError: list index out of range
Done!

NOTE: DatabaseOp not imported
Program  : tfasty34
Pair list: ../maker2.blastn.m_parsed_G500.PE_I500.PS1.pairs
Fasta1   : ../../prediction2_final.proteins.fasta
Fasta2   : ../maker2.blastn.m_parsed_G500.PE_I500.PS1.subj_coord.fa
Fasta dir: /gs/gsfs0/users/qzhang/tools/maker2_pseudogene/fasta36.3.8e/bin/
Working d:
Flags    : -A -m 3 -q
E thres  : 1.0
Read gene pairs...
 0 pairs
Read fasta files...
Do sw...
Done!
Read BLOSUM50 matrix...
Read the sw.out file...
Compare sequences:
 total: 0 alignments
Done!
Check parameter file...
 default: ml_t=30
 default: ev_t=5
 default: ml_p=0.05
 default: id_t=40
Filter ../maker2.blastn.m...
 E:5 I:40 L:30 P:0.05
Get pseudoexons...
 pseudoexon file: ../maker2.blastn.m_parsed_G500.PE
Get phase 1 pseudogene...
 phase1 ps file: ../maker2.blastn.m_parsed_G500.PE_I500.PS1
Get pair file and subject coordinates...
 pair file  : ../maker2.blastn.m_parsed_G500.PE_I500.PS1.pairs
 coordinates: ../maker2.blastn.m_parsed_G500.PE_I500.PS1.subj_coord
Get phase 1 pseudogene sequences...
 phase 1 ps sequence: ../maker2.blastn.m_parsed_G500.PE_I500.PS1.subj_coord.fa
Find stop and framshifts...
 Smith-Waterman outputs: ../maker2.blastn.m_parsed_G500.PE_I500.PS1_pairs.sw.*
 Final output: ../maker2.blastn.m_parsed_G500.PE_I500.PS1_pairs.sw.out.disable_count

The pseudogene pipeline has finished!


Best
Quanwei

In reply to panchyni's post of 2017-12-11 10:01 GMT-05:00.

Re: Error from Pseudo gene identification scripts

Quanwei Zhang
Hi Nick:

Thank you, Nick. It seems the error really was due to the sequence names. "pseudo_wrap.py" now runs after I changed the names, but a few things still seem strange.

(1) The file "log_step1" shows the following lines at the top. Does it matter?

NOTE: DatabaseOp not imported
NOTE: DatabaseOp not imported
...

(2) It also shows the following:
Done!
sh: /gs/gsfs0/users/qzhang/tools/maker2_pseudogene/fasta36.3.8e/bin//tfasty34: No such file or directory
sh: /gs/gsfs0/users/qzhang/tools/maker2_pseudogene/fasta36.3.8e/bin//tfasty34: No such file or directory
sh: /gs/gsfs0/users/qzhang/tools/maker2_pseudogene/fasta36.3.8e/bin//tfasty34: No such file or directory
....

However, when I look in the folder "/gs/gsfs0/users/qzhang/tools/maker2_pseudogene/fasta36.3.8e/bin/", there is no file named "tfasty34". Instead, there is a file named "tfasty36":
fasta36  fastf36  fastm36  fasts36  fastx36  fasty36  ggsearch36  glsearch36  lalign36  map_db  README  ssearch36  tfastf36  tfastm36  tfasts36  tfastx36  tfasty36

Do you have any suggestions on this?

Best
Quanwei
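
A quick way to confirm which tfasty binaries a FASTA installation actually provides is sketched below. It only reports what is on disk. How the pipeline chooses the program name "tfasty34", and whether it can be pointed at "tfasty36" instead, is not answered in this thread. find_programs is a hypothetical helper, and the demo runs on a throwaway directory that mimics part of the listing above.

```python
import os
import tempfile


def find_programs(fasta_dir, prefix="tfasty"):
    """List executable files in fasta_dir whose names start with prefix."""
    found = []
    for name in sorted(os.listdir(fasta_dir)):
        path = os.path.join(fasta_dir, name)
        if name.startswith(prefix) and os.path.isfile(path) and os.access(path, os.X_OK):
            found.append(name)
    return found


# Demo on a temporary directory holding a few of the file names from the listing.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("fasta36", "tfastx36", "tfasty36"):
        path = os.path.join(demo_dir, name)
        open(path, "w").close()
        os.chmod(path, 0o755)
    available = find_programs(demo_dir)

expected = "tfasty34"
print(expected in available, available)
```

Pointing find_programs at the real "Fasta dir" from the log would show at a glance whether the program name the pipeline asks for matches any installed binary.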

In reply to Quanwei Zhang's post of 2017-12-11 10:18 GMT-05:00.