Advice on handling large sequences in Apollo editor



Gareth Maslen
Hello Nathan et al,

VectorBase is moving to the new Aedes aegypti L5 assembly in our December 2017 release, and we would like to get an Apollo database built for community use as quickly as possible.

The new L5 assembly has chromosome-level sequences up to ~600 Mb long, significantly longer than any sequences we have previously tried to use in Apollo. Assuming you have worked with sequences of this size before, do you have any advice or tips for handling them?

Any pointers would be much appreciated.

Best regards,

Gareth



This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.


Re: Advice on handling large sequences in Apollo editor

nathandunn

I don’t anticipate issues. I think the hard limit would be roughly 2 billion bases (2^31 − 1), since we store chromosome lengths as ints, not longs.
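As a rough illustration of that limit, here is a minimal Python sketch that reads a FASTA index and flags any sequence longer than a signed 32-bit int can hold. It assumes a samtools-style .fai file (tab-separated NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) and takes the 2^31 − 1 ceiling from the remark above; the helper names and sample records are hypothetical, not part of Apollo.

```python
from typing import Iterable, List, Tuple

# Largest value a signed 32-bit int can hold (assumed to be the relevant limit).
INT32_MAX = 2**31 - 1


def parse_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (name, length) pairs from the lines of a samtools-style .fai index."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        records.append((fields[0], int(fields[1])))
    return records


def oversized(records: Iterable[Tuple[str, int]], limit: int = INT32_MAX) -> List[str]:
    """Return the names of sequences whose length exceeds `limit` bases."""
    return [name for name, length in records if length > limit]


# Hypothetical index lines: a ~600 Mb sequence (the size from the question)
# and a made-up 3 Gb sequence that would exceed the limit.
sample = [
    "seqA\t600000000\t6\t60\t61\n",
    "seqB\t3000000000\t610000008\t60\t61\n",
]
print(oversized(parse_fai(sample)))  # ['seqB']
```

In practice you would pass the lines of a real index file, e.g. `parse_fai(open("genome.fa.fai"))`, where the file name is only a placeholder.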

That being said, in 2.0.8 we added the ability to use indexed FASTA for sequences (an option JBrowse has had for a while).


This might be a better option for a file that large, as the other option creates a very large number of tiny files, proportional to the size of the chromosome. Some groups have had problems with backups and other file I/O, though I think it’s unlikely to hurt performance within Apollo itself.
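To make the trade-off concrete, here is a minimal Python sketch, not taken from Apollo or JBrowse source. It shows how an indexed FASTA lets a reader seek straight to a region of one file, using the standard samtools faidx offset arithmetic, and how the file count of a chunked layout grows with chromosome length. The 20,000-base chunk size and all function names are hypothetical, chosen only for illustration.

```python
import io
import math


def byte_offset(pos: int, offset: int, linebases: int, linewidth: int) -> int:
    """Byte position of 0-based base `pos`, given .fai OFFSET/LINEBASES/LINEWIDTH."""
    return offset + (pos // linebases) * linewidth + pos % linebases


def fetch(handle, start: int, end: int, offset: int, linebases: int, linewidth: int) -> str:
    """Return bases [start, end) by seeking in a binary file handle (no full scan)."""
    first = byte_offset(start, offset, linebases, linewidth)
    last = byte_offset(end, offset, linebases, linewidth)
    handle.seek(first)
    raw = handle.read(last - first)
    # Drop line breaks that fall inside the requested slice.
    return raw.replace(b"\r", b"").replace(b"\n", b"").decode("ascii")


def chunk_file_count(seq_length: int, chunk_size: int) -> int:
    """Number of fixed-size chunk files needed to store one sequence."""
    return math.ceil(seq_length / chunk_size)


# Tiny in-memory FASTA: the header ">seqA\n" is 6 bytes, lines hold 10 bases (11 bytes).
fasta = io.BytesIO(b">seqA\nACGTACGTAC\nGGGGCCCCTT\nAA\n")
print(fetch(fasta, 8, 13, offset=6, linebases=10, linewidth=11))  # ACGGG

# With a hypothetical 20,000-base chunk size, a ~600 Mb chromosome needs
# 30,000 chunk files, versus one FASTA file plus a small index.
print(chunk_file_count(600_000_000, 20_000))  # 30000
```

The point of the comparison is only the scaling: chunk count grows linearly with sequence length, while the indexed layout stays at two files regardless of size.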

Nathan


In reply to Gareth Maslen’s message of Dec 4, 2017, 5:09 AM.