Re: [maker] transcripts doesn't provide any help

Michael Campbell
Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don’t think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get better training than you would with CEGMA output alone.

I’ve cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I’m also cc'ing the devlist. There may be others in the community who can comment on the run times.

Thanks,
Mike
On May 17, 2016, at 10:10 PM, Pei-Ying Huang <[hidden email]> wrote:

Hi Mike,

My plant genome is about 4 Gb in 93,789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated.
Is this normal? I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~2 Gb vertebrate genome with ~150,000 scaffolds and protein evidence.
Based on that post, I expected to get the result within two weeks. However, it looks like it will take me more than three months.
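
As a rough sanity check on the numbers above, the reference run can be scaled by genome size and core count. This is only a back-of-the-envelope sketch: it assumes runtime grows linearly with genome size and shrinks linearly with cores, which a real MAKER run need not follow (the amount of evidence and the number of scaffolds matter too). The function name is made up for illustration.

```python
def scaled_days(ref_days, ref_gb, ref_cores, gb, cores):
    """Naive linear scaling of a reference runtime to a new genome and core count."""
    return ref_days * (gb / ref_gb) * (ref_cores / cores)

# Reference from the post mentioned above: ~6 days, ~2 Gb, 16 processors.
# Target: ~4 Gb genome on 64 cores.
estimate = scaled_days(ref_days=6, ref_gb=2, ref_cores=16, gb=4, cores=64)
print(f"naive estimate: {estimate:.1f} days")
```

Under these assumptions the estimate is about 3 days, so a projection of three months suggests that something other than genome size alone is slowing the run.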

I also want to get a trained parameter set for Augustus. Right now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff.
Then I auto-train Augustus; here is my command:
autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log 


But I saw someone else's method, quoted below, so I wonder if I am doing something wrong?

"We get the genome.gff3 training set from the output of a first-pass run of MAKER using: 
1. EST data
2. Proteins from related species 
3. a SNAP model trained using CEGMA 
4. a GeneMark model (obtained by running GeneMark.ES on the draft genome) 
5. Running maker2zff on the output of MAKER, and converting that to GFF3
Once done, we run MAKER a second time using the Augustus model and more stringent settings."

Thank you.
Pei-Ying

 



2016-05-18 9:16 GMT+08:00 Michael Campbell <[hidden email]>:
Hi Pei-Ying,

One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.
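
To make the idea of a per-read quality check concrete, here is a minimal sketch that computes the mean Phred score of each read in a FASTQ file. It is not what FastQC or the trimming tools do internally; it assumes 4-line FASTQ records with Phred+33 encoding, and the threshold of 20 is an arbitrary illustrative cutoff.

```python
import io

def mean_read_qualities(fastq_handle, offset=33):
    """Yield (read_id, mean Phred score) for each 4-line FASTQ record."""
    while True:
        header = fastq_handle.readline().strip()
        if not header:
            break                          # end of file
        fastq_handle.readline()            # sequence line (unused here)
        fastq_handle.readline()            # '+' separator line
        qual = fastq_handle.readline().strip()
        scores = [ord(ch) - offset for ch in qual]
        yield header[1:], sum(scores) / len(scores)

# Two synthetic reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n")
for read_id, mean_q in mean_read_qualities(example):
    flag = "low" if mean_q < 20 else "ok"  # 20 is an assumed cutoff
    print(read_id, mean_q, flag)
```

For real data the dedicated tools are the better choice; the sketch only shows what "bad quality" means at the level of a single read.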

I would only redo the sequencing if you are convinced that the original sequencing is bad.

Mike


On May 16, 2016, at 8:42 PM, Pei-Ying Huang <[hidden email]> wrote:

Hi Mike,

As you said, the reason I only get one gene with transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

If the problem is RNA-seq data quality, how can I identify the low-quality RNA-seq data and trim it out?
If the problem is the expression profiles of the tissues used for mRNA-seq, should we extract RNA from the plant again and redo the sequencing?
Thank you.

Pei-Ying

2016-05-09 22:18 GMT+08:00 Michael Campbell <[hidden email]>:
I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. 

What you described is what I would do, followed by training Augustus. The exception is if est2genome=1 and protein2genome=0 doesn’t generate enough gene models to train the gene finders; in that case I would set est2genome=1 and protein2genome=1 for the first round instead.
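
The round-by-round settings discussed here can be sketched as a small helper that rewrites key=value lines in the text of maker_opts.ctl. Only est2genome and protein2genome come from this thread; the round labels, the helper name, and the abbreviated comments in the sample text are made up for illustration, and the sketch assumes each option sits on its own line with an optional trailing # comment.

```python
import re

# Settings per round as discussed above; the dictionary keys are hypothetical labels.
ROUNDS = {
    "round1_est_only": {"est2genome": "1", "protein2genome": "0"},
    "round1_fallback": {"est2genome": "1", "protein2genome": "1"},
    "later_rounds": {"est2genome": "0", "protein2genome": "0"},
}

def set_options(ctl_text, options):
    """Return ctl_text with matching key=value lines updated, keeping trailing comments."""
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in options:
            key, comment = match.group(1), match.group(3) or ""
            line = f"{key}={options[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out)

# Abbreviated, made-up sample of two control-file lines.
sample = "est2genome=0 #infer predictions from ESTs\nprotein2genome=0 #infer predictions from proteins"
print(set_options(sample, ROUNDS["round1_est_only"]))
```

Keeping the settings in one table makes it easy to see which round used which evidence when comparing outputs later.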
 
Thanks,
Mike
On May 8, 2016, at 10:08 AM, Pei-Ying Huang <[hidden email]> wrote:

Have you finished all of the tests?
How would you suggest I run my data?

My plan is to get an ab initio model by setting est2genome=1 and protein2genome=0,
then train a SNAP model with est2genome=0 and protein2genome=0,
then train a second SNAP model with est2genome=0 and protein2genome=0.

Thank you.

2016-05-07 0:30 GMT+08:00 Michael Campbell <[hidden email]>:
So far in the tests that I’ve done I get the same first exon as 5 prime UTR and part of the last exon in 3 prime UTR for that gene.
Mike
On May 5, 2016, at 10:18 PM, Pei-Ying Huang <[hidden email]> wrote:

Hi Mike,

I found one five_prime_UTR feature, but it is the only one on scaff0001.
Does that mean there are no more five_prime_UTR features on this scaffold, or that MAKER didn't find the others?
Thank you.

GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
GULI.scaff0001 maker mRNA 3190189 3192302 1262 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308
GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191938 3191967 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
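
The relationship between the exon, CDS, and five_prime_UTR lines in the listing above can be checked with a little arithmetic. The sketch below assumes 1-based inclusive GFF3 coordinates and a UTR that lies entirely within the first exon (a UTR spanning several exons is not handled); the function name is made up for illustration.

```python
def five_prime_utr(first_exon, first_cds, strand):
    """Return (start, end) of the 5' UTR inside the first exon, or None.

    first_exon and first_cds are the exon and CDS segment at the 5' end
    of the transcript, as 1-based inclusive (start, end) pairs.
    """
    exon_start, exon_end = first_exon
    cds_start, cds_end = first_cds
    if strand == "-":
        # On the minus strand the 5' end is the high-coordinate end.
        utr = (cds_end + 1, exon_end)
    else:
        utr = (exon_start, cds_start - 1)
    return utr if utr[0] <= utr[1] else None

# exon:0 and its CDS segment from the listing above (minus strand).
utr = five_prime_utr((3191938, 3192302), (3191938, 3191967), "-")
print(utr, utr[1] - utr[0] + 1)
```

This gives 3191968..3192302, a length of 335 bases, which agrees with the five_prime_UTR line and with the first _QI field (335) on the mRNA line.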

Pei-Ying

2016-05-06 8:31 GMT+08:00 Pei-Ying Huang <[hidden email]>:
Hi Mike,

Any clue about the problem?
Or is my reasoning wrong? I judge whether the transcript data helps MAKER by checking whether est2genome appears in column 2 of the MAKER output GFF file.
Thank you.

Pei-Ying


2016-05-05 1:22 GMT+08:00 Pei-Ying Huang <[hidden email]>:
Hi Mike,

The attached file is the folder I use to run MAKER. Thank you.
Pei-Ying

2016-05-04 22:54 GMT+08:00 Michael Campbell <[hidden email]>:
Hi Pei-Ying,

If the sample data didn’t produce est2genome lines, then it may be that exonerate is not being called. Could you send me the maker_exe.ctl file?

Your maker_opts.ctl file looks fine.

If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.

Thanks,
Mike
On May 4, 2016, at 12:33 AM, Pei-Ying Huang <[hidden email]> wrote:

Hi Mike,

basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER.

I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.
I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
Since I have three conditions (root, stem, leaf), I have Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs.
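
The check described above (which sources appear in column 2 and which feature types appear in column 3) can be done over the whole file rather than by eye. The following is a minimal sketch that assumes tab-separated GFF3 with nine columns; the demo lines are synthetic, not taken from real output.

```python
from collections import Counter

def summarize_gff(lines):
    """Count GFF3 source (column 2) and feature type (column 3) values."""
    sources, types = Counter(), Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break                      # sequence data may follow this marker
        if not line.strip() or line.startswith("#"):
            continue                   # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue                   # skip malformed lines
        sources[cols[1]] += 1
        types[cols[2]] += 1
    return sources, types

# Synthetic demo lines for illustration only.
demo = [
    "##gff-version 3\n",
    "s1\tmaker\tgene\t1\t100\t.\t-\t.\tID=g1\n",
    "s1\tmaker\tfive_prime_UTR\t90\t100\t.\t-\t.\tID=u1;Parent=m1\n",
    "s1\test2genome\texpressed_sequence_match\t1\t100\t50\t-\t.\tID=e1\n",
]
sources, types = summarize_gff(demo)
print(sources["est2genome"], types["five_prime_UTR"])
```

On a real run the same function could be called with an open file handle, for example the A_guli_1.all.gff file shown further down, to see every source and feature type at once.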

Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER?
When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, I get the error messages below.

/home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff

Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221531.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221532.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221533.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221534.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221535.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221536.
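
The warnings above name the input line numbers (221531 to 221536), so one way to investigate is to pull exactly those records out of the GTF and look at them. The warnings do not say why $score is undefined, and this sketch does not try to guess; it only prints the lines involved along with their column counts. The helper names are made up for illustration.

```python
from itertools import islice

def lines_in_range(path, first, last):
    """Return (line_number, text) pairs for 1-based lines first..last of a file."""
    with open(path) as handle:
        window = islice(enumerate(handle, start=1), first - 1, last)
        return [(n, text.rstrip("\n")) for n, text in window]

def field_count(gtf_line):
    """Number of tab-separated columns; a well-formed GTF record has 9."""
    return len(gtf_line.split("\t"))

if __name__ == "__main__":
    import os
    gtf = "strtie_merge.gtf"  # file name from the command above
    if os.path.exists(gtf):
        for n, text in lines_in_range(gtf, 221531, 221536):
            print(n, field_count(text), text)
```

Comparing these records with a few lines that did not trigger a warning may show what differs between them.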
less A_guli_1.all.gff
GULI.scaff0001  maker   gene    1750118 1755997 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37
GULI.scaff0001  maker   mRNA    1750118 1755997 5292    -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764
GULI.scaff0001  maker   exon    1750118 1750214 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1750304 1750815 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1750896 1751717 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1751849 1752373 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1752515 1753488 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1753554 1754406 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   exon    1754489 1755997 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1754489 1755997 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1753554 1754406 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1752515 1753488 .       -       2       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1751849 1752373 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1750896 1751717 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1750304 1750815 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001  maker   CDS     1750118 1750214 .       -       1       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1

Thank you.
Pei-Ying




2016-04-14 21:09 GMT+08:00 Michael Campbell <[hidden email]>:
It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there.

It would also be good if you could run MAKER on the Drosophila sample data in the /data folder in MAKER. This way we can see if the problem is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the Drosophila data bundled with MAKER.

Aligning with HISAT2 and using Cufflinks to make transcripts should work. StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.

Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes.

The attached protocol paper also addresses your quality question to an extent.


<basic_protocol_1.tar.gz>












_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Carson Hinton Holt
Yes. Use top to check CPU usage. If it’s not 100% for the machine (or 6400% summed over all processes, i.e. 64 CPUs * 100%), then we can look at whether you are launching the command correctly or have other issues.
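
The arithmetic behind that check is simple enough to write down. This sketch takes the per-process %CPU values one would read off top and expresses them as a fraction of the whole machine; the readings below are hypothetical, and the function name is made up for illustration.

```python
def utilization(per_process_cpu, n_cores):
    """Fraction of the machine in use, given top-style %CPU values per process."""
    return sum(per_process_cpu) / (n_cores * 100.0)

# Hypothetical readings on a 64-core server.
busy = utilization([100.0] * 64, n_cores=64)                 # 6400% in total
mostly_idle = utilization([100.0] + [0.0] * 63, n_cores=64)  # one busy process
print(f"{busy:.0%} vs {mostly_idle:.1%}")
```

A single busy process on 64 cores comes out at roughly 1.6% of the machine, which is the kind of reading that would point to a problem with how the job was launched.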

—Carson


On May 18, 2016, at 7:16 AM, Michael Campbell <[hidden email]> wrote:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don’t think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get better training than you would with CEGMA output alone.

I’ve cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I’m also cc'ing the devlist. There may be others in the community who can comment on the run times.

Thanks,
Mike