Re: maker-devel post from christopher.keeling.1@ulaval.ca requires approval

Carson Holt-2
Hi Chris,

Sorry for the slow reply. Actually, the desired behavior is to capture the entire name from the FASTA header and not try to split it on the pipes. NCBI BLAST has historically done this splitting, but only for NCBI-sourced data (it won't do it for Swiss-Prot, for example, or at least it wouldn't in any previous version). If it is doing that now, that is a rather big behavior change, but it can be turned off by adding -show_gis to the BLAST command line.

Thanks,
Carson




From: Christopher Keeling <[hidden email]>
Subject: Re: Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
Date: July 7, 2020 at 6:12:37 PM MDT


Hi Carson,

I’m now using Maker 3.01.03, and I’m finding that maker_functional_gff and maker_functional_fasta still are not behaving as they should. I’m getting an error:

Can't use string ("") as a HASH ref while "strict refs" in use at /usr/local/bin/maker/bin/maker_functional_gff line 55, <$IN> line 167.

Version 2020_03 of uniprot_sprot.fasta starts like this:

>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1

Based on your scripts, this header is the example for your first condition. However, I find that I need to change the regex as follows to get it to work as I understand it should:

                #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
                if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
                        $id   = $1;
                        $desc = $2;
                        $org  = $3;
                        $name = $5 || '';
                }

Compared to what is in 3.01.03:
                #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
                if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
                        $id   = $1;
                        $desc = $2;
                        $org  = $3;
                        $name = $6 || '';
                }
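
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the example header. The subs parse_stock and parse_edited are hypothetical wrappers written for this illustration and are not part of MAKER; only the two regexes are taken from the snippets above. The ID is the only field that differs: the 3.01.03 pattern keeps the whole first token, while the edited pattern keeps only the accession between the pipes.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the 3.01.03 pattern quoted above.
sub parse_stock {
    my ($header) = @_;
    if ($header =~ />(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
        # GN= is optional, so $6 is undef when the header has no gene name.
        return { id => $1, desc => $2, org => $3, name => defined $6 ? $6 : '' };
    }
    return undef;
}

# Hypothetical wrapper around the edited pattern quoted above.
sub parse_edited {
    my ($header) = @_;
    if ($header =~ />sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
        # OX= is no longer captured, so the gene name moves from $6 to $5.
        return { id => $1, desc => $2, org => $3, name => defined $5 ? $5 : '' };
    }
    return undef;
}

my $header = '>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1';

my $stock  = parse_stock($header);
my $edited = parse_edited($header);

print "stock  id: $stock->{id}\n";     # stock  id: sp|Q6GZX4|001R_FRG3G
print "edited id: $edited->{id}\n";    # edited id: Q6GZX4
print "name     : $edited->{name}\n";  # name     : FV3-001R
```

Which ID is correct depends on what the BLAST report contains: if the report carries the full sp|...|... name, the lookup has to be keyed on the full name, and if it carries only the accession, the short form is the one that matches.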

Thus, with my edits:
>sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2

maker_functional_gff would result in:
...Note=Similar to Ift52: Intraflagellar transport protein 52 homolog (Mus musculus);

And maker_functional_fasta would result in:
Name:"Similar to Ift52 Intraflagellar transport protein 52 homolog (Mus musculus)"

Are these the expected behaviours?

Cheers,

Chris

On Mar 14, 2020, at 1:24 PM, Christopher Keeling <[hidden email]> wrote:

Hello,

In sub parse_blast, during parsing of the UniProt FASTA file:

if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+(GN=(.*?)\s+)?PE=/) {

should be changed to:

if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {

to avoid "Can't use string ("") as a HASH ref while "strict refs" in use at…" errors.


Cheers,
Chris







_______________________________________________
maker-devel mailing list
[hidden email]
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org