How does the MAKER2 pipeline deal with runs of Ns in a sequence?

Quanwei Zhang
Hello:

I wonder how the MAKER2 pipeline deals with a 100 bp or 1 kb run of Ns in a scaffold, especially when such a run falls within a gene region?

Recently, we improved a rodent genome assembly. However, in a few regions we filled the gaps with a mix of real sequence and runs of Ns. We found that some genes predicted in the old assembly are lost in our new assembly. Looking closely at one such region, the new assembly closely matches the old assembly at both ends of the gene, and those ends cover most of the coding sequence predicted in the old assembly. The only difference is in the middle of the gene, where the new assembly has some new sequence and runs of Ns. Attached are the alignments between the old and new assemblies in the gene region (the Y coordinate shows the new assembly).

Many thanks and Merry Christmas

Best
Quanwei  

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Attachment: hit_matrix.png (22K)
Re: How does the MAKER2 pipeline deal with runs of Ns in a sequence?

Carson Holt-2
MAKER doesn’t do anything special for stretches of N’s. Those regions are still run through each step; most steps simply produce empty results there. Programs that MAKER uses, like BLAST, will not align to N’s. For short stretches, any alignment must therefore happen around the N’s; for long stretches, no result is produced across them (results are still produced in the regions where the N’s stop). Gene predictors also will not include N’s in their models, so the N’s will either fall within an intron or be skipped over entirely.

—Carson
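
Since N's can only end up inside an intron or be skipped over, one way to investigate a lost gene is to check whether the N runs in the new assembly sit where the old assembly's exons were. What follows is a minimal sketch in plain Python, not part of MAKER. The function names `find_n_runs` and `classify_runs` are hypothetical, and the sketch assumes the exon coordinates have already been converted to 0-based, half-open intervals on the new assembly.

```python
import re


def find_n_runs(seq, min_len=1):
    """Return 0-based, half-open (start, end) intervals of consecutive N/n.

    min_len is the shortest run to report (hypothetical parameter).
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def classify_runs(n_runs, exons):
    """Label each N run by whether it overlaps any exon interval.

    exons is a list of 0-based, half-open (start, end) intervals, assumed
    to be in the same coordinate system as n_runs.
    """
    labels = []
    for start, end in n_runs:
        overlaps = any(start < e_end and e_start < end for e_start, e_end in exons)
        labels.append(((start, end), "overlaps_exon" if overlaps else "no_exon_overlap"))
    return labels


if __name__ == "__main__":
    # Toy sequence: two exons flanking a gap that was filled with Ns.
    seq = "ACGT" * 5 + "N" * 100 + "ACGT" * 5
    exons = [(0, 20), (120, 140)]  # made-up coordinates for illustration
    for interval, label in classify_runs(find_n_runs(seq, min_len=10), exons):
        print(interval, label)
```

An N run labeled `overlaps_exon` marks a place where a predictor cannot reproduce the old exon as it was, which is one possible reason for a gene model to change or disappear.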


> On Dec 21, 2017, at 8:06 AM, Quanwei Zhang <[hidden email]> wrote:
>
> [...]

