[Ergatis-devel] patches to prokaryotic annotation components

[Ergatis-devel] patches to prokaryotic annotation components

Chris Hemmerich

We have four modified Ergatis components (made for ISGA) that are more
than bugfixes, but I believe they are safe to commit without breaking other
people's pipelines. I've listed them below; do any of them raise
objections?

1: start_site_curation.pl

In trunk, this script loads every evidence file for each contig, which makes
the script very slow for poorly assembled draft genomes. Our version only
opens evidence files for the polypeptides on a given contig.

2: overlap_analysis.pl

We made the ncRNA input optional, since we allow users to disable the
ncRNA components.

3 & 4:  bsml2fasta.pl, split_multifasta.pl

We run sequence headers through BSML::BsmlElement->getCleanID() so that
headers are consistently scrubbed throughout the pipeline.

Thanks,
  Chris

_______________________________________________
Ergatis-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/ergatis-devel
Re: [Ergatis-devel] patches to prokaryotic annotation components

Kevin Galens

Chris,

Here are some comments on the proposed changes below.

1. start_site_curation.pl

Without opening the evidence bsml files, how do you determine which
evidence file contains information for which polypeptide? Originally, when
I wrote this script I relied on the filenames. This worked in most cases,
but for some other applications (outside of the prok pipeline) this
assumption did not hold true. For example, one ber bsml file might contain
alignment information for multiple polypeptides.

2. overlap_analysis.pl

I can't think of any issues with this. Surprised we made that required in
the first place.

3 & 4. bsml2fasta.pl, split_multifasta.pl

This would need to be an optional parameter. I can see cases where we
would not want the sequence headers to be changed when using these
components. Was this causing issues within the pipeline? Or just general
confusion?

Kevin

Re: [Ergatis-devel] patches to prokaryotic annotation components

Chris Hemmerich

Thanks, Kevin. Responses are below.

On Thu, 15 Dec 2011, Kevin Galens wrote:

> Chris,
>
> Here are some comments on the proposed changes below.
>
> 1. start_site_curation.pl
>
> Without opening the evidence bsml files, how do you determine which
> evidence file contains information for which polypeptide? Originally, when
> I wrote this script I relied on the filenames. This worked in most cases,
> but for some other applications (outside of the prok pipeline) this
> assumption did not hold true. For example, one ber bsml file might contain
> alignment information for multiple polypeptides.
>

We read in the defined peptides from INPUT_FILE and then match on the ber
evidence file name, so it would break on the example above.  We can
continue to maintain this patch for ISGA, or if the slowdown is a general
problem, a solution might be to accept an optional mapping file from id ->
file, and then modify the ber component to produce such a mapping.
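A minimal sketch of how such an optional id -> file mapping could be consumed, written in Python for brevity rather than the Perl of the actual component. The two-column, tab-delimited layout and the function names load_mapping and evidence_for_contig are assumptions made for illustration; nothing in Ergatis defines them.

```python
import csv
from collections import defaultdict


def load_mapping(path):
    """Read a hypothetical mapping file: polypeptide id <TAB> evidence file."""
    mapping = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, short rows and comment lines.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            mapping[row[0]].append(row[1])
    return mapping


def evidence_for_contig(polypeptide_ids, mapping):
    """Return the distinct evidence files for one contig, in first-seen order."""
    files = []
    for pid in polypeptide_ids:
        for path in mapping.get(pid, []):
            if path not in files:
                files.append(path)
    return files
```

Because one id may list several files and one file may appear under several ids, this layout also covers the case of a single ber bsml file holding alignments for multiple polypeptides.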

> 2. overlap_analysis.pl
>
> I can't think of any issues with this. Surprised we made that required in
> the first place.
>

Ok, if no one else objects, I'll commit this.

> 3 & 4. bsml2fasta.pl, split_multifasta.pl
>
> This would need to be an optional parameter. I can see cases where we
> would not want the sequence headers to be changed when using these
> components. Was this causing issues within the pipeline? Or just general
> confusion?
>

Vanilla split_multifasta.pl cleans the header so that it is a safe file name.
But if that name differs from what getCleanID() produces, the prok pipeline
breaks because it cannot map the fasta header onto the BSML entry. I'll run
a test on v16 and get back to you with where the pipeline fails; maybe there
is a better solution than these patches.

Thanks,
  Chris

Re: [Ergatis-devel] patches to prokaryotic annotation components

Kevin Galens

Chris,

1. Yeah, I've thought about the mapping file as well. Although, I'd hate
to add this functionality to a general component like BER with the only
application for the change being the Prok Pipeline. A better solution
would be to pre-parse the BER bsml files within the start_site_curation
component and create the mapping file just before the analyze overlaps
step. I'll take a look into changing this.
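The pre-parse idea can be sketched as a single streaming pass over the BER files that records which file mentions which polypeptide. This is Python for illustration only, not the Perl of the component. It assumes the alignments are stored as Seq-pair-alignment elements whose refseq attribute names the query polypeptide, and that the files use no XML namespace; index_ber_files is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_ber_files(paths):
    """Map each query polypeptide id to the set of files that align it."""
    mapping = defaultdict(set)
    for path in paths:
        # iterparse streams the file, so large BSML documents are not
        # held in memory as a whole tree.
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "Seq-pair-alignment":
                refseq = elem.get("refseq")  # assumed attribute name
                if refseq:
                    mapping[refseq].add(path)
                elem.clear()  # release the children we no longer need
    return mapping
```

Each file is read once up front, so the per-contig work reduces to dictionary lookups instead of reopening every evidence file.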

2. Sounds good.

3. Thanks for looking into this.

Re: [Ergatis-devel] patches to prokaryotic annotation components

Chris Hemmerich

Kevin,

I ran the header

> 1

through a v16 (glimmer, non-pseudo) prok pipeline, and the pipeline dies
with mapping problems between '1' and '_1' as a sequence header. Rather
than modify these sequences at the beginning, I'm looking at injecting
getCleanID where a sequence id extracted from a BSML file is mapped
against external sequence ids - which will avoid the problem of changing
headers when not necessary.
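A small sketch of cleaning only at the comparison point, in Python for illustration. The clean_id function is a hypothetical stand-in for getCleanID(): the only behaviour taken from this thread is that '1' becomes '_1'; the rest of its rules are guesses. The external headers themselves are never rewritten.

```python
import re


def clean_id(raw):
    """Hypothetical stand-in for getCleanID(); not the real BSML code."""
    # Assumed rule: replace characters outside a conservative set.
    cleaned = re.sub(r"[^A-Za-z0-9_.\-]", "_", raw)
    # Assumed rule: ids must start with a letter or underscore.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def match_external_ids(bsml_ids, fasta_headers):
    """Pair each BSML id with the untouched header whose cleaned form equals it."""
    by_clean = {clean_id(header): header for header in fasta_headers}
    return {bsml_id: by_clean.get(bsml_id) for bsml_id in bsml_ids}
```

With this approach a header of '1' stays '1' in every fasta file, and only the lookup treats it as equivalent to '_1'.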

The first two scripts to fail are bsml2fasta.pl and translate_sequence.pl.
Patching bsml2fasta.pl is straightforward, but translate_sequence.pl is
more complicated as it can take either a fasta file or bsml file as input
and then calls Fasta::SimpleIndexer to index the fasta file. My plan is to
add an optional 'use_bsml' named parameter to Fasta::SimpleIndexer::new
that causes getCleanID to be called on the header names and then modify
translate_sequence.pl to use that parameter when it detects a BSML input.
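The plan above could look roughly like the following, sketched in Python rather than as a patch to Fasta::SimpleIndexer. The class name SimpleFastaIndex and its methods are hypothetical, and instead of a boolean 'use_bsml' flag this version takes an optional normalize callable, which is a design substitution made here to keep the cleaning rule outside the indexer.

```python
class SimpleFastaIndex:
    """Index a fasta file by byte offset of each header line."""

    def __init__(self, path, normalize=None):
        self.path = path
        self.offsets = {}
        with open(path, "rb") as handle:
            while True:
                pos = handle.tell()
                line = handle.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    # The key is the first whitespace-delimited token,
                    # so both ">1" and "> 1" yield "1".
                    fields = line[1:].decode("ascii", "replace").split()
                    key = fields[0] if fields else ""
                    if normalize is not None:
                        key = normalize(key)
                    self.offsets[key] = pos

    def fetch(self, key):
        """Return the sequence stored under key as a single string."""
        parts = []
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[key])
            handle.readline()  # skip the header line
            for line in handle:
                if line.startswith(b">"):
                    break
                parts.append(line.strip().decode("ascii"))
        return "".join(parts)
```

A caller that detects BSML input would pass its id-cleaning function as normalize; a caller with plain fasta input passes nothing and gets the headers unchanged.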

Cheers,
  Chris

Re: [Ergatis-devel] patches to prokaryotic annotation components

Chris Hemmerich

Kevin,

I committed revision 7535, which fixes this problem for our pipeline via a
bugfix to bsml2fasta.pl and a small addition to translate_sequence.pl. Can
you please look at translate_sequence.pl in particular to make sure I
haven't broken some other corner case while fixing this one?

bsml2fasta.pl - This script already addressed my problem by using the
'identifier' attr of 'Seq-data-import' to map to the fasta header. However,
a bug caused it to attempt this mapping twice. I fixed this and it works.

translate_sequence.pl - I used the same method as bsml2fasta.pl, and it now
uses the fasta identifier when it is available from the source bsml.
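As an illustration of the lookup described for both scripts, here is a Python sketch (not the committed Perl from revision 7535). It assumes each Sequence element carries an id attribute and may contain a Seq-data-import child with an identifier attribute, with no XML namespace in use; fasta_ids_from_bsml is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def fasta_ids_from_bsml(bsml_text):
    """Map each Sequence id to the fasta header it should be looked up under."""
    root = ET.fromstring(bsml_text)
    ids = {}
    for seq in root.iter("Sequence"):
        seq_id = seq.get("id")
        if seq_id is None:
            continue
        imp = seq.find("Seq-data-import")
        identifier = imp.get("identifier") if imp is not None else None
        # Prefer the recorded fasta identifier; fall back to the BSML id.
        ids[seq_id] = identifier or seq_id
    return ids
```

Performing this lookup once per sequence avoids the double mapping that the bsml2fasta.pl bug caused.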

Thanks,
  Chris

