Repeats annotation

Quanwei Zhang
Dear Carson:

We generated a species-specific repeat library following your pipeline (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic), and then annotated the genome with maker2 using both the species-specific repeat library and the mammalian repeat library.

Now we want to compare the repeat content among different species. I therefore want to generate species-specific repeat libraries for the other species and mask each genome with both its species-specific library and the mammalian repeat library. However, I found that I can provide RepeatMasker with only one of the two (either the species-specific library or the mammalian library), not both. Can I run maker2 on those genomes for repeat masking only?

BTW, running RepeatMasker gives a summary report (as below). Is there any script in maker2 to analyze repeat elements, or any other tool to process the maker2 output?

Many thanks


file name: test_scaffold31.fasta   
sequences:             1
total length:     863590 bp  (858757 bp excl N/X-runs)
GC level:         37.02 %
bases masked:     301634 bp ( 34.93 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:               134        14362 bp    1.66 %
      Alu/B1          28         2183 bp    0.25 %
      MIRs            21         2860 bp    0.33 %

LINEs:               188       129104 bp   14.95 %
      LINE1          168       124633 bp   14.43 %
      LINE2           16         4266 bp    0.49 %
      L3/CR1           4          205 bp    0.02 %
      RTE              0            0 bp    0.00 %

LTR elements:        127       101129 bp   11.71 %
      ERVL            10         3057 bp    0.35 %
      ERVL-MaLRs      22         6902 bp    0.80 %
      ERV_classI      66        80258 bp    9.29 %
      ERV_classII     29        10912 bp    1.26 %

DNA elements:         27         4402 bp    0.51 %
      hAT-Charlie     13         1836 bp    0.21 %
      TcMar-Tigger     8         1651 bp    0.19 %

Unclassified:          4         1590 bp    0.18 %

Total interspersed repeats:    250587 bp   29.02 %


Small RNA:             9          616 bp    0.07 %

Satellites:           66        40820 bp    4.73 %
Simple repeats:      159         7235 bp    0.84 %
Low complexity:       50         2766 bp    0.32 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
                                                     

The query species was assumed to be mammalia     
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
       
run with rmblastn version 2.2.27+


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: Repeats annotation

Carson Holt-2
I don’t know of any tool to analyze the repeat info. MAKER really only focuses on getting the masking done for the gene prediction, and while it does keep the repeats as features in the GFF3, it does not do any kind of analysis. You would have to do that outside of MAKER.

—Carson
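
As a rough illustration of doing this analysis outside of MAKER, here is a minimal Python sketch that tallies repeat features from a GFF3 file by top-level class. It is not part of MAKER. It assumes the repeat features have source `repeatmasker` and type `match`, and that the `Name` attribute looks like `species:<repeat>|genus:<class/family>` with URL-encoded characters; check a few lines of your own GFF3 before relying on it. The function name `summarize_repeats` is made up for this example.

```python
from collections import defaultdict
from urllib.parse import unquote


def summarize_repeats(gff3_lines):
    """Tally element count and occupied bp per top-level repeat class.

    Assumptions (verify against your own file):
      * repeat features have source 'repeatmasker' and type 'match'
      * the Name attribute looks like 'species:X|genus:CLASS/FAMILY'
    """
    summary = defaultdict(lambda: {"count": 0, "bp": 0})
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        source, ftype, start, end = cols[1], cols[2], cols[3], cols[4]
        if source != "repeatmasker" or ftype != "match":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        name = unquote(attrs.get("Name", ""))
        rclass = "Unknown"
        for part in name.split("|"):
            if part.startswith("genus:"):
                rclass = part[len("genus:"):] or "Unknown"
        top = rclass.split("/")[0]  # e.g. 'LINE/L1' -> 'LINE'
        summary[top]["count"] += 1
        # GFF3 coordinates are 1-based and inclusive.
        summary[top]["bp"] += int(end) - int(start) + 1
    return dict(summary)
```

To use it, open the GFF3 file and pass the file handle to `summarize_repeats`, then print the resulting dictionary. Note that the bp totals are simple sums over features, so overlapping repeats are counted twice; the numbers will not exactly match the RepeatMasker summary table, which also merges fragmented elements.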


On Sep 13, 2017, at 8:51 AM, Quanwei Zhang <[hidden email]> wrote:

