advanced repeat libraries



Xabier Vázquez-Campos
I'm dealing with a fungal genome that is at least 40% repeats, so I'm trying to follow the advanced repeat library construction protocol.
So far, so good, but I have doubts about how to build the protein database as explained at the end of the page.

In summary, as I understand it:
1. Get the SwissProt and RefSeq fungal proteins.
2. Run tblastn with the proteins from 1 against the NCBI EST database and keep the proteins with matches.
3. Run blastp with the output from 2 against the transposase protein db and remove the proteins with matches.
But from here on I'm a bit lost...

"Finally, the rice protein sequences were compared with verified transposons (such as Pack-MULEs) in the rice genome. If the protein sequence matched a transposon perfectly and was the only perfect match in the genome, the relevant protein sequence was excluded. Although elements such as Pack-MULEs contain true gene sequences, the annotation (the protein sequence in the database) often extends to non-gene sequences such as terminal inverted repeat or sub-terminal repeat, which are not true plant proteins and would cause great complications. As a result, it is essential to exclude them."

Are the proteins kept at the end of step 3 the 'protein database'?
Could you provide a bit more detail on how to tackle this?
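
To make my reading of steps 2 and 3 concrete, here is a minimal Python sketch of the filtering as I understand it. It assumes both BLAST runs were saved in the default 12-column tabular format (-outfmt 6), where column 1 is the query ID and column 11 is the e-value. The function names and the 1e-10 cutoff are my own placeholders, not something taken from the protocol.

```python
def query_ids_with_hits(tabular_lines, max_evalue=1e-10):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes default BLAST tabular output (-outfmt 6): tab-separated,
    query ID in column 1 and e-value in column 11.
    The 1e-10 cutoff is an arbitrary placeholder, not a protocol value.
    """
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            ids.add(fields[0])
    return ids


def candidate_proteins(est_hit_ids, transposase_hit_ids):
    """Step 2 keeps proteins with EST support; step 3 drops transposase matches."""
    return est_hit_ids - transposase_hit_ids
```

With the tblastn and blastp result files opened as `est_file` and `tpase_file` (hypothetical names), the surviving IDs would be `candidate_proteins(query_ids_with_hits(est_file), query_ids_with_hits(tpase_file))`. This only covers steps 2 and 3; it does not attempt the final comparison against verified transposons described in the quoted paragraph.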

Thank you in advance,

Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales

maker-devel mailing list
[hidden email]