Unable to reproduce MAKER blastn results


Lior Glick
Hello,

I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
I noticed that some of my transcripts are not included in the annotation output, and when I looked at the blastn results the reason was clear: they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the MAKER log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced fewer HSPs than my blastn run, resulting in a lower total coverage. The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)?
Please find attached the blastn results from the MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. In my.blastn there are 8 HSPs, while MAKER.blastn has only 3 of them.
Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence?

Just to make sure you have all the details:
Relevant maker_bopts parameters:
pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)

Blastn command:
blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out

Thank you!

_______________________________________________
maker-devel mailing list
[hidden email]
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

Attachments: my.blastn (588K), MAKER.blastn (1M)

Re: Unable to reproduce MAKER blastn results

Carson Holt-2
The only downstream change to the blast results would be the removal of HSPs that do not meet the bit_blastn minimum bit score. Also note that pcov is not a blast parameter; it is a post-blast filter. The HSPs are tiled and flattened, then the percent coverage against the original query is calculated (i.e., if every base of the query is represented at least once in the result, then coverage is 100%). The blast results are used only to identify the rough region a model overlaps, which is then passed to exonerate. The exonerate alignment is used to generate the splice-aware est2genome model. Many good blastn alignments will produce poor exonerate alignments and no est2genome results.
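
To make the tile-and-flatten coverage calculation described above concrete, here is a minimal Python sketch. It is not MAKER's actual code: the function names, the (start, end, bit_score) tuple layout, and the example coordinates are all hypothetical, and it assumes 1-based inclusive coordinates on the sequence whose coverage is being measured. HSPs below the bit score cutoff are dropped first, the remaining intervals are merged so that overlapping bases are counted once, and the covered fraction is then compared to a pcov-style threshold.

```python
from typing import Iterable, List, Tuple

# Hypothetical HSP record: (start, end, bit_score), 1-based inclusive coordinates
HSP = Tuple[int, int, float]


def flatten_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent intervals into a sorted, flat list."""
    merged: List[Tuple[int, int]] = []
    # Normalize each interval so start <= end (minus-strand hits may be reversed)
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def percent_coverage(hsps: Iterable[HSP], seq_length: int, min_bit: float = 40.0) -> float:
    """Fraction of the sequence covered at least once by HSPs passing the bit cutoff."""
    kept = [(s, e) for s, e, bit in hsps if bit >= min_bit]
    covered = sum(e - s + 1 for s, e in flatten_intervals(kept))
    return covered / seq_length


if __name__ == "__main__":
    # Made-up example: two overlapping HSPs plus one below the bit score cutoff
    example_hsps = [(1, 300, 250.0), (250, 500, 180.0), (700, 800, 35.0)]
    cov = percent_coverage(example_hsps, seq_length=1000)
    print(f"coverage = {cov:.2f}, passes 0.7 cutoff: {cov >= 0.7}")
```

In this made-up example the third HSP is discarded by the bit score cutoff and the first two merge into a single 500-base interval, so the sequence ends up half covered and would fail a 0.7 coverage threshold even though each retained HSP looks good on its own.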

—Carson



> On May 5, 2020, at 3:46 AM, Lior Glick <[hidden email]> wrote:
> [...]

