About loss of Histone H2A, H2B, H4

Quanwei Zhang
Hello:

We have annotated a new rodent genome using MAKER2. Based on the MAKER2 gene set, we ran a gene family expansion/contraction analysis using CAFE and found that the histone H2A, H2B, and H4 gene families are contracted. Are there known biases in how MAKER2 predicts these gene families? For example, could this be due to repeat masking of the genome? I used RepeatMasker and generated species-specific repeat libraries following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic.

Thanks

Best
Quanwei

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: About loss of Histone H2A, H2B, H4

Carson Holt-2
No known biases, but if you are concerned, you can collect known histone H2A, H2B, and H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to check whether any models you find are pseudogenes. Run InterProScan on the results as well to make sure they contain known InterPro domains for that gene family. Running without repeat masking will increase sensitivity but will also increase false positives derived from low-homology alignments to simple repeats, which is why you need to evaluate the results using something like InterProScan.

Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly.

—Carson
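
Editor's sketch: the unmasked rerun suggested above might be prepared by rewriting a copy of the MAKER options file. This is a minimal Python sketch, not a MAKER tool. Only protein= and altest= are named in the thread; treating model_org=, rmlib= and repeat_protein= as the options to blank in order to disable masking is an assumption that should be checked against the control file of your MAKER version, and the FASTA file names are hypothetical.

```python
import re


def set_ctl_options(ctl_text, overrides):
    """Return ctl_text with every `key=value` line named in overrides rewritten.

    A trailing `#comment` on the line is kept. Keys that never appear in
    ctl_text raise KeyError so that a misspelled option does not pass silently.
    """
    lines = ctl_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            comment = match.group(3) or ""
            lines[i] = f"{key}={overrides[key]} {comment}".rstrip()
            seen.add(key)
    missing = set(overrides) - seen
    if missing:
        raise KeyError(f"options not found in control file: {sorted(missing)}")
    return "\n".join(lines) + "\n"


# Hypothetical file names. Blanking the three repeat options is an ASSUMPTION
# about how masking is switched off; verify it for your MAKER version.
UNMASKED_OVERRIDES = {
    "protein": "histone_proteins.fasta",
    "altest": "histone_transcripts.fasta",
    "model_org": "",
    "rmlib": "",
    "repeat_protein": "",
}

# A made-up excerpt in the `key=value #comment` layout of a control file.
example = (
    "protein=  #protein sequence file in fasta format\n"
    "altest= #EST/cDNA sequence file from an alternate organism\n"
    "model_org=all #select a model organism for RepBase masking\n"
    "rmlib=my_repeats.fasta #organism specific repeat library\n"
    "repeat_protein=te_proteins.fasta #transposable element proteins\n"
)

print(set_ctl_options(example, UNMASKED_OVERRIDES))
```

Writing the result to a separate options file and output directory keeps the masked and unmasked runs side by side, so any histone models gained can be compared directly.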


On Nov 16, 2017, at 12:46 PM, Quanwei Zhang <[hidden email]> wrote:

Re: About loss of Histone H2A, H2B, H4

Quanwei Zhang
Dear Carson:

Thank you for your comments and suggestions. SNAP was trained on the repeat-masked genome; is it necessary to retrain the predictor without repeat masking?
The BUSCO analysis of the genome gives the completeness shown below. I am currently doing the analysis with the default output of MAKER2 (i.e., gene models with evidence support, the default build). For the gene loss question, besides your suggestions I am also considering repeating the analysis with the evidence-supported gene models plus those with detected domains (i.e., the standard build). What do you think?


C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
  3902  Complete BUSCOs (C)
  3806  Complete and single-copy BUSCOs (S)
  96  Complete and duplicated BUSCOs (D)
  92  Fragmented BUSCOs (F)
  110  Missing BUSCOs (M)

Thanks
Best
Quanwei
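
Editor's sketch: the one-line BUSCO summary above can be parsed and the per-category counts cross-checked with a few lines of Python. The function names are made up for illustration; only the summary string and the counts come from the message.

```python
import re


def parse_busco_summary(line):
    """Parse a one-line BUSCO summary into a dict of percentages plus n."""
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(match.group("n"))  # total number of BUSCO groups searched
    return result


def counts_are_consistent(complete, single, duplicated, fragmented, missing, total):
    """Check that C = S + D and that C + F + M adds up to the total."""
    return (complete == single + duplicated
            and complete + fragmented + missing == total)


summary = parse_busco_summary("C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104")
print(summary)
print(counts_are_consistent(3902, 3806, 96, 92, 110, 4104))
```

The counts reported here do add up: 3806 + 96 = 3902 complete, and 3902 + 92 + 110 = 4104 in total.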
 

2017-11-21 11:19 GMT-05:00 Carson Holt <[hidden email]>:

Re: About loss of Histone H2A, H2B, H4

Carson Holt-2
You should not have to retrain SNAP on unmasked sequence, and I do believe that adding back genes that were rejected for lack of support but contain an identifiable domain may help. These will be in the FASTA files labeled non-overlapping in the datastore.

—Carson
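
Editor's sketch: once InterProScan has been run on the rejected (non-overlapping) protein models, the ones carrying a domain of interest could be picked out from its tab-separated output as below. This assumes the standard InterProScan TSV layout (protein ID in column 1, signature description in column 6, InterPro description in column 13) and matches on a keyword in the description rather than on specific accessions; the function name and the example rows are made up.

```python
import csv
import io


def proteins_with_domain(tsv_handle, keyword):
    """Return the IDs of proteins with a hit whose description contains keyword."""
    keyword = keyword.lower()
    hits = set()
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 6:
            continue  # skip blank or truncated lines
        protein_id = row[0]
        signature_desc = row[5]
        # The InterPro description column is only present for integrated hits.
        interpro_desc = row[12] if len(row) > 12 else ""
        if keyword in signature_desc.lower() or keyword in interpro_desc.lower():
            hits.add(protein_id)
    return hits


# Made-up rows in the assumed column layout; accessions are placeholders.
example_tsv = (
    "model_A\tmd5a\t130\tPfam\tSIG_1\tExample histone domain\t"
    "1\t100\t1e-20\tT\t01-01-2018\tIPR_X\tExample entry\n"
    "model_B\tmd5b\t300\tPfam\tSIG_2\tExample kinase domain\t"
    "5\t250\t1e-30\tT\t01-01-2018\n"
)

print(proteins_with_domain(io.StringIO(example_tsv), "histone"))
```

A keyword match is deliberately loose; restricting the filter to the InterPro accessions of the H2A, H2B and H4 families would be stricter, and surviving models would still need the pseudogene check mentioned earlier in the thread.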

On Nov 21, 2017, at 10:42 AM, Quanwei Zhang <[hidden email]> wrote:

Re: About loss of Histone H2A, H2B, H4

Quanwei Zhang
Dear Carson:

Thank you!

Best
Quanwei

2017-11-27 14:56 GMT-05:00 Carson Holt <[hidden email]>: