annotating assembly errors in apollo => fixing the underlying genome


annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Hello,

 

I am seeking a workflow that would allow me to

  1. capture curators' judgements about assembly errors using Apollo
  2. systematically apply those judgements, as edits, to the coordinates of a set of tracks (e.g. GFF, GFF3, BED) on the genome
  3. systematically apply those judgements, as edits, to the sequence of the underlying genome

 

We are currently using Apollo version 2.0.6, which covers some aspects of #1 through its ability to "Create genomic insertion|deletion|substitution".

 

I believe that Apollo does not provide for #2 or #3, but I am hoping to find that I am wrong, or that others have a working approach they might be willing to share.

 

I can imagine converting the genomic indels from the exported GFF3 to a .chain file for use with the liftOver tool, which would accomplish #2.  Has anyone made this, or something similar, work?
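A minimal sketch of the indel-to-chain idea, assuming the indels have already been pulled out of the GFF3 as (position, deleted length, inserted length) tuples in 0-based coordinates on the old sequence. The function name and the input tuple layout are hypothetical, not part of Apollo or liftOver. Equal-length substitutions do not shift coordinates and can simply be left out. The old assembly is written as the chain's target and the edited assembly as its query, and the score is a nominal value (the number of aligned bases).

```python
def indels_to_chain(name, old_size, indels, chain_id=1):
    """Build a UCSC chain with the old assembly as target, edited assembly as query.

    indels: iterable of (pos, deleted_len, inserted_len) in 0-based
    coordinates on the OLD sequence; `pos` is where the edit starts.
    """
    lines = []
    cursor = 0            # first old-sequence base not yet covered
    new_size = old_size
    aligned = 0           # total ungapped bases, used here as a nominal score
    for pos, deleted_len, inserted_len in sorted(indels):
        block = pos - cursor
        if block <= 0:
            # Simplification in this sketch: every gap needs a non-empty
            # aligned block in front of it.
            raise ValueError("edits must not overlap, touch, or start at position 0")
        # One alignment line per edit: ungapped size, gap in old, gap in new.
        lines.append(f"{block}\t{deleted_len}\t{inserted_len}")
        aligned += block
        cursor = pos + deleted_len
        new_size += inserted_len - deleted_len
    last = old_size - cursor
    if last <= 0:
        raise ValueError("an edit runs to or past the end of the sequence")
    aligned += last
    header = (f"chain {aligned} {name} {old_size} + 0 {old_size} "
              f"{name} {new_size} + 0 {new_size} {chain_id}")
    # A chain ends with the final block size followed by a blank line.
    return "\n".join([header, *lines, str(last)]) + "\n\n"


if __name__ == "__main__":
    # A 3-base insertion at position 10 and a 5-base deletion starting at 50.
    print(indels_to_chain("chr1", 100, [(10, 0, 3), (50, 5, 0)]), end="")
```

The resulting text could be saved and passed to liftOver or CrossMap as the chain file; edits at the very start or end of a sequence would need extra handling that this sketch deliberately rejects.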

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.
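The "edit decision list" idea can be sketched as follows, assuming each edit has been reduced to a (start, end, replacement) triple in 0-based, half-open coordinates on the original sequence. This representation and the function name are hypothetical; they are not what Apollo exports. An insertion is an empty interval with a non-empty replacement, a deletion is a non-empty interval with an empty replacement.

```python
def apply_edits(seq, edits):
    """Apply non-overlapping edits to seq and return the edited sequence.

    edits: iterable of (start, end, replacement), 0-based half-open
    intervals on the ORIGINAL sequence.
    """
    pieces = []
    cursor = 0  # first original base not yet copied or replaced
    for start, end, replacement in sorted(edits):
        if start < cursor or end < start:
            raise ValueError("edits must be well-formed and non-overlapping")
        pieces.append(seq[cursor:start])   # untouched stretch before the edit
        pieces.append(replacement)         # inserted / substituted bases
        cursor = end                       # skip the replaced or deleted bases
    pieces.append(seq[cursor:])            # untouched tail
    return "".join(pieces)


if __name__ == "__main__":
    # Insert "TT" at 2, delete bases 4..5, substitute the last base with "A".
    print(apply_edits("ACGTACGT", [(2, 2, "TT"), (4, 6, ""), (7, 8, "A")]))
```

Working from the original coordinates in a single left-to-right pass avoids having to re-shift later edits after each earlier one is applied.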

 

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

 

Thanks,

 

-- 

Malcolm Cook

Computation Biology Core

Stowers Institute for Medical Research

Kansas City, Missouri





This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.


Re: annotating assembly errors in apollo => fixing the underlying genome

Suzanna Lewis-3
Hi Malcolm,

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before the sequence shown in the sequence window is determined. Likewise, if you export FASTA of either of these, the corrections are in place. (If they aren't, that is a bug; please file a ticket on the issue tracker.)

However, the GFF3 export contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one, we need to clarify the origin of the genomic variation from the reference, because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I opened just yesterday; they said they'd look at it over the weekend).

#3: I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly-error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta?

Hope this helps, Suzanna


On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:


Re: annotating assembly errors in apollo => fixing the underlying genome

Deepak Unni
Hi Malcolm,

To expand on Suzi's email, there is another tool, the Locus Specific Alternate Assembly (LSAA), being developed in the Elsik Lab at the University of Missouri. It is a plugin to Apollo that aims to address the 2nd and 3rd use cases. However, this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed on this email) is the lead PI on LSAA, and I am the developer working on this tool.

It is still a work in progress, but feel free to contact Chris and me if you have any questions.

I hope this helps!

Cheers,

Deepak Unni


On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:
--
Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia





Re: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Hi Suzanna, Deepak, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo's export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is a way to create an entirely new version of the genome, and a corresponding new version of any genome annotations, that reflects the edits and their consequences.

 

After further study, I believe my aim can be achieved as follows:

  1. extract the insertions/deletions/substitutions as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)”, as defined by PacBio in their spec for variant.gff
  2. convert the extracted GFF to conform to PacBio’s variant.gff (a trivial Perl one-liner)
  3. convert the resulting variant.gff into a well-formed VCF file using gffToVcf, also from PacBio
  4. convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)
  5. use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any BED, GFF, WIG, etc. tracks, using coordinates appropriate to the updated, error-free genome
  6. use the VCF file to apply the edits to the reference sequence using one of
    • vcf-consensus (part of VCFtools), which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK), which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
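The reason the "reference" bases matter for the VCF step can be shown with a small sketch. VCF represents an insertion or deletion together with the reference base just before it, so the converter needs access to the reference sequence, not only to the edited bases. The function below is hypothetical (it is not gffToVcf, and the way an edit is described here is an assumption rather than Apollo's or PacBio's format); it only illustrates the anchoring convention.

```python
def edit_to_vcf_line(chrom, ref_seq, kind, pos, ref="", alt=""):
    """Return one VCF data line (8 tab-separated columns) for a curated edit.

    pos is 1-based. Assumed conventions for this sketch:
      insertion:    `alt` is inserted immediately after base `pos`
      deletion:     `ref` is the deleted run, starting at base `pos`
      substitution: `ref` is replaced by `alt`, starting at base `pos`
    """
    if kind == "insertion":
        anchor = ref_seq[pos - 1]                 # base the insertion follows
        vcf_pos, vcf_ref, vcf_alt = pos, anchor, anchor + alt
    elif kind == "deletion":
        if pos < 2:
            raise ValueError("sketch does not handle a deletion at the first base")
        if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
            raise ValueError("deleted bases do not match the reference")
        anchor = ref_seq[pos - 2]                 # base just before the deletion
        vcf_pos, vcf_ref, vcf_alt = pos - 1, anchor + ref, anchor
    elif kind == "substitution":
        if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
            raise ValueError("substituted bases do not match the reference")
        vcf_pos, vcf_ref, vcf_alt = pos, ref, alt
    else:
        raise ValueError(f"unknown edit kind: {kind}")
    # CHROM POS ID REF ALT QUAL FILTER INFO, with '.' for unknown fields
    return "\t".join([chrom, str(vcf_pos), ".", vcf_ref, vcf_alt, ".", ".", "."])


if __name__ == "__main__":
    genome = "ACGTACGT"
    print(edit_to_vcf_line("chr1", genome, "insertion", 2, alt="TT"))
    print(edit_to_vcf_line("chr1", genome, "deletion", 5, ref="AC"))
```

Checking the stated reference bases against the actual sequence at conversion time is a cheap way to catch an off-by-one in the exported coordinates before the edits reach the downstream tools.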

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter could further use the term “variantSeq” where it currently uses the term “residues”, my Perl one-liner to convert Apollo's exported GFF to PacBio’s variant.gff would be that much shorter.
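For illustration, the renaming itself could look like the sketch below, written in Python rather than Perl. It assumes the exported line is standard nine-column, tab-separated GFF3 whose column 9 holds `key=value` pairs separated by semicolons, with the edited bases under `residues`; the sample line and the function name are made up for the example.

```python
def rename_attribute(gff_line, old="residues", new="variantSeq"):
    """Rename one attribute key in column 9 of a GFF3 feature line."""
    # Leave comments, directives, blank lines and malformed rows untouched.
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line
    newline = "\n" if gff_line.endswith("\n") else ""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return gff_line
    renamed = []
    for field in cols[8].split(";"):
        key, sep, value = field.partition("=")
        # Only the key is compared, so values containing the word are safe.
        renamed.append((new if key == old else key) + sep + value)
    cols[8] = ";".join(renamed)
    return "\t".join(cols) + newline


if __name__ == "__main__":
    sample = "chr1\t.\tinsertion\t10\t10\t.\t+\t.\tID=edit1;residues=TT\n"
    print(rename_attribute(sample), end="")
```

Splitting column 9 into fields before comparing keys avoids the accidental matches that a plain text substitution on the whole line could produce.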

 

If Apollo could export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step from my pipeline above.  This is arguably a reasonable thing to do: since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia), there is little reason to require GFF for these features.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 Add residues to the output of the GFF3 for insertions

  Write GFF3 adaptors to meet specific needs

  Add sequence alterations and other feature types to GFF3+FASTA export

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

  Export curation track using command-line tool

  Can annotation track be exported as gff3 on command line?

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 


Re: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Hello All,

 

So, I think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

https://github.com/malcook/Apollo/commit/10292ba4e941d0e35d861ba8bb3f44ebadd6e7b6

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 


Re: annotating assembly errors in apollo => fixing the underlying genome

nathandunn

Malcolm,

The code itself looks fine if you want to initiate a PR. 

The Apollo dev team will talk later this week and see if it's something we want to add to the mainline.

Thanks,

Nathan

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

Hello All,
 
So, think I have this figured out… I’ll deploy and test tomorrow…
 
But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:
 
 
Thanks for following,
 
Malcolm
 
From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Suzanne, Deepak and Unni, and Apollo Dev team,
 
Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.
 
While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.
 
After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)”, as defined by PacBio in their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (trivial Perl one-liner)

·         convert the resulting variant.GFF into a well-formed VCF file using gffToVcf, also from PacBio

·         convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)

·         use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any BED, GFF, wig, etc. tracks, using coordinates appropriate to the updated, error-free genome

·         use the VCF file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
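
To make the "edit decision list" idea behind the steps above concrete, here is a minimal Python sketch. It is not Apollo code and it does not reproduce what vcf-consensus, liftOver or CrossMap do internally; it only illustrates the two operations the pipeline needs: applying curated edits to a sequence, and shifting a coordinate from the original reference onto the edited one. The Edit record, its 0-based half-open coordinates, and the names apply_edits and lift are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    # Hypothetical record for one curated genomic edit (not an Apollo type).
    kind: str            # "insertion", "deletion" or "substitution"
    start: int           # 0-based start on the ORIGINAL reference
    end: int             # exclusive end; equal to start for an insertion
    residues: str = ""   # inserted/substituted bases; empty for a deletion


def _ordered(edits: List[Edit]) -> List[Edit]:
    return sorted(edits, key=lambda e: (e.start, e.end))


def apply_edits(reference: str, edits: List[Edit]) -> str:
    """Return a copy of `reference` with every edit applied."""
    pieces = []
    cursor = 0
    for e in _ordered(edits):
        if e.start < cursor:
            raise ValueError(f"overlapping edit at {e.start}")
        pieces.append(reference[cursor:e.start])  # untouched stretch
        pieces.append(e.residues)                 # new bases (may be empty)
        cursor = e.end                            # skip replaced/deleted bases
    pieces.append(reference[cursor:])
    return "".join(pieces)


def lift(pos: int, edits: List[Edit]) -> Optional[int]:
    """Map a 0-based position on the original reference to the edited one.

    Returns None when the base no longer exists (it was deleted, or it sits
    inside a substitution that changes length).
    """
    shift = 0
    for e in _ordered(edits):
        if e.end <= pos:
            # Edit lies entirely before `pos`: accumulate its length change.
            shift += len(e.residues) - (e.end - e.start)
        elif e.start <= pos:
            # `pos` falls inside the edited span.
            same_length = len(e.residues) == e.end - e.start
            if e.kind == "substitution" and same_length:
                return pos + shift
            return None
        else:
            break
    return pos + shift


if __name__ == "__main__":
    edits = [
        Edit("insertion", 2, 2, "TT"),
        Edit("deletion", 4, 6),
        Edit("substitution", 7, 8, "A"),
    ]
    print(apply_edits("ACGTACGT", edits))
    print([lift(p, edits) for p in range(8)])
```

A real pipeline would still go through VCF and .chain files as described above, since those are what the existing tools consume; the sketch is only meant to show that both outputs derive from the same ordered list of edits.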

Apollo Devs,
 
If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 
 
If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my Perl one-liner to convert Apollo-exported GFF to PacBio’s variant.GFF is that much shorter.
 
If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia), there is little use in requiring it as GFF.
 
Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
 
I have been trying to come up to speed with some of the related issues, such as these:
 
 
As well as issues related to doing all this from the command line:
 
  #425 – wherein export_annotations_to_gff3.p was removed
 
I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:
 
  convinces me that my effort is misguided in the first place
  agrees to incorporate this into Apollo post haste
 
So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…
 
Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.
 
FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.
 
Thanks for reading and the encouragement, but, most of all,
 
Thanks for Apollo!
 
Malcolm Cook
 
From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Malcolm,
 
To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 
 
It is still a work in progress, but feel free to contact Chris and me if you have any questions.
 
I hope this helps!
 
Cheers,
 
Deepak Unni
 
 
On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:
Hi Malcolm,
 
In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.
 
Within Apollo itself, #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before determining the resulting transcript or peptide sequence that you see in the sequence window. Likewise, if you export FASTA of either of these, the corrections are in place. (If they aren't, that is a bug; please file a ticket on the issue tracker.)
 
However, the GFF3 export contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one, we need to clarify the origin of the genomic variation from the reference, because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday; they said they'd look at it over the weekend).
 
As for #3, I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic sequence as FASTA?
 
Hope this helps, Suzanna
 
 
On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

 

I am seeking a workflow that allows me to

  1. capture curators' judgements as to assembly errors using Apollo
  2. systematically apply those judgements, as edits, to the coordinates of a set of tracks (e.g. GFF, GFF3, BED, etc.) on the genome
  3. systematically apply those judgements, as edits, to the sequence of the underlying genome

 

We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  

 

I believe that Apollo does not provide for #2 or #3, but am hoping to find I am wrong, or that others have a working approach to them that they might be willing to share.

 

I can imagine converting the genomic indels from exported GFF3 format to a .chain format for use with the liftOver tool, allowing me to effect #2.  Has anyone made this, or something similar, work?

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.

 

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

 

Thanks,

 

-- 

Malcolm Cook

Computation Biology Core

Stowers Institute for Medical Research

Kansas City, Missouri









 
--
Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia




Re: annotating assembly errors in apollo => fixing the underlying genome

Suzanna Lewis-3
I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to GFF column 9, I'd much rather see direct export of VCF. This would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF."

Direct export of VCF has my strong vote.

With that one change, I like the plan.
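
For anyone weighing what a direct VCF export would involve, the following Python sketch shows how one curated edit could be written as a VCF data line. It is an illustration only, not Apollo's exporter: the function name vcf_record and its 0-based, half-open start/end arguments are assumptions. What is standard VCF is the 1-based POS and the rule that insertions and deletions carry the base before the event in both REF and ALT, which is why the reference sequence has to be available at export time.

```python
def vcf_record(chrom: str, reference: str, kind: str,
               start: int, end: int, residues: str = "") -> str:
    """Format one edit as a tab-separated VCF data line (8 fixed columns).

    `start`/`end` are 0-based, half-open coordinates on `reference`
    (an assumption of this sketch); `end == start` for an insertion.
    """
    if kind == "substitution":
        pos = start + 1                      # VCF positions are 1-based
        ref = reference[start:end]
        alt = residues
    elif kind in ("insertion", "deletion"):
        if start == 0:
            # VCF anchors such events on the FOLLOWING base instead;
            # that case is left out to keep the sketch short.
            raise ValueError("edit at first base is not handled here")
        anchor = reference[start - 1]        # base before the event
        pos = start                          # 1-based position of the anchor
        if kind == "insertion":
            ref, alt = anchor, anchor + residues
        else:
            ref, alt = anchor + reference[start:end], anchor
    else:
        raise ValueError(f"unknown edit kind: {kind}")
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])


if __name__ == "__main__":
    seq = "ACGTACGT"
    print("##fileformat=VCFv4.2")
    print("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")
    print(vcf_record("scaffold_1", seq, "insertion", 2, 2, "TT"))
    print(vcf_record("scaffold_1", seq, "deletion", 4, 6))
    print(vcf_record("scaffold_1", seq, "substitution", 7, 8, "A"))
```

The need for the anchor base is the same reason the GFF route requires a "reference" attribute: the exported record must state what the original sequence was, not just what replaces it.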

-S


RE: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Thanks for the encouragement. 

 

Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

 

Nathan, I will submit a PR as soon as I have confirmed a few more cases and demonstrated that it works with my pipeline (i.e., it has value to at least one person).

 

However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, as discussed in the issue “Write GFF3 adaptors to meet specific needs”.

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, with the addition that we also rename the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it's something).  Any thoughts about that?  Should I roll it into my PR?
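
Until such a rename lands in the exporter, the same effect can be had downstream. The Python sketch below renames one attribute key in column 9 of a GFF3 line; the attribute names ‘residues’ and ‘variantSeq’ come from the discussion above, while the function name rename_attribute and the sample line are hypothetical and the line is not real Apollo output.

```python
def rename_attribute(gff_line: str, old: str = "residues",
                     new: str = "variantSeq") -> str:
    """Rename the attribute key `old` to `new` in column 9 of one GFF3 line.

    Comment lines, blank lines, and lines without nine tab-separated
    columns are returned unchanged.
    """
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line
    newline = "\n" if gff_line.endswith("\n") else ""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return gff_line
    renamed = []
    for field in cols[8].split(";"):
        key, sep, value = field.partition("=")
        if key == old:
            key = new
        renamed.append(key + sep + value)
    cols[8] = ";".join(renamed)
    return "\t".join(cols) + newline


if __name__ == "__main__":
    # Hypothetical example line, for illustration only.
    line = "scaffold_1\t.\tinsertion\t101\t101\t.\t+\t.\tID=abc;residues=TT\n"
    print(rename_attribute(line), end="")
```

Matching on the key alone, rather than doing a blind text substitution, avoids touching an ID or Name value that happens to contain the word "residues".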

 

Cheers,

 

Malcolm

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

I definitely like the idea of using vcf as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to gff column 9 I'd much rather see direct export of VCF. The would be useful for other applications as well. This is what you say here "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF." 

 

Direct export of VCF has my strong vote.

 

With that one change, I like the plan.

 

-S

 

 

On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:

 

Malcolm,

 

The code itself looks fine if you want to initiate a PR. 

 

The Apollo dev team will talk later this week and see if its something we want to add to the mainline.  

 

Thanks,

 

Nathan

 

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

 

Hello All,

 

So, think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Suzanne, Deepak and Unni, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.

 

After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution as gff using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 annotation of “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio by their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (perl trivial one-liner)

·         convert the resulting variant.GFF into well-formed VCF file using gffToVcf, also from PacBio.

·         convert the VCF to a .chain file, possibly using vcf2chain - (part of g2gtools)

·         use the .chain file with either UCSC’s liftover or CrossMap to create updated versions of any bed,gff,wig,etc tracks using coordinates appropriate to the updated, error-free genome

·         use the vcf file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.

 

If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think using VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re even in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Malcolm,

 

To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 

 

It is still a work in progress but feel free to contact Chris and I if you have any questions.

 

I hope this helps!

 

Cheers,

 

Deepak Unni

 

 

On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:

Hi Malcolm,

 

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

 

Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences the genome edits are applied before determining the resulting transcript or peptide sequence you can see in the sequence window. Likewise if you export fasta of either of these the corrections are in place. (If they aren't then that is a bug and please file an ticket on the issue tracker). 

 

However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).

 

#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta? 

 

Hope this helps, Suzanna

 

 

On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

I am seeking a workflow that allows us to

  1. capture curators' judgements as to assembly errors using Apollo
  2. systematically apply those judgements, as edits, to the coordinates of a set of tracks (e.g. GFF, GFF3, BED, etc.) on the genome
  3. systematically apply those judgements, as edits, to the sequence of the underlying genome

We are currently using Apollo version 2.0.6, which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".

I believe that Apollo does not provide for #2 or #3, but I am hoping to find that I am wrong, or that others have a working approach to them that they might be willing to share.

I can imagine converting the genomic indels from the exported GFF3 format to a .chain format for use with the liftOver tool, thereby achieving #2. Has anyone made this, or something similar, work?
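
To make the coordinate-lifting idea concrete, here is a minimal Python sketch. It is not Apollo or liftOver code. It assumes each curated edit has already been reduced to a hypothetical (pos, ref_len, alt_len) triple in 0-based reference coordinates and that edits do not overlap; `build_offsets` and `lift` are illustrative names. A real .chain file carries the same information as aligned blocks separated by gaps.

```python
from bisect import bisect_right

def build_offsets(edits):
    """Turn (pos, ref_len, alt_len) edits into breakpoints with cumulative shifts.

    pos is 0-based on the original reference; the ref_len bases starting at
    pos are replaced by alt_len bases. Edits are assumed not to overlap.
    """
    breakpoints, shifts, total = [], [], 0
    for pos, ref_len, alt_len in sorted(edits):
        total += alt_len - ref_len
        breakpoints.append(pos + ref_len)  # first reference base after the edit
        shifts.append(total)
    return breakpoints, shifts

def lift(coord, breakpoints, shifts):
    """Map a 0-based reference coordinate lying outside every edit to the edited genome."""
    i = bisect_right(breakpoints, coord)
    return coord + (shifts[i - 1] if i else 0)

# Hypothetical edits: a 3-base insertion before base 10, a 5-base deletion at 50..54.
edits = [(10, 0, 3), (50, 5, 0)]
bp, sh = build_offsets(edits)
print(lift(5, bp, sh), lift(20, bp, sh), lift(60, bp, sh))  # prints: 5 23 58
```

Coordinates that fall inside a deleted or substituted span have no unique image in the edited genome; this sketch does not handle them, and a real tool would report them as unmapped.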

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome. But I have never seen this done. Any pointers or suggestions are very welcome.

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

Thanks,

--
Malcolm Cook
Computation Biology Core
Stowers Institute for Medical Research
Kansas City, Missouri







 

--

Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia





Re: annotating assembly errors in apollo => fixing the underlying genome

nathandunn

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it's something).  Any thoughts about that?  Should I roll it into my PR?

Malcolm,

Just make it so it works the best for you and create the PR off of that.  

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

Either way, thanks for the PR and the detailed analysis.

Nathan


On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

Thanks for the encouragement. 
 
Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

Nathan, I will submit a PR as soon as I have confirmed a few more cases and demonstrated that it works with my pipeline (i.e. it has value to at least one person).

However, I expect your team might find that it would need to be implemented as a (TBW) CustomGff3HandlerService, as discussed in “Write GFF3 adaptors to meet specific needs”.

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it's something).  Any thoughts about that?  Should I roll it into my PR?
 
Cheers,
 
Malcolm
 
From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to GFF column 9, I'd much rather see direct export of VCF. That would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF."
 
Direct export of VCF has my strong vote.
 
With that one change, I like the plan.
 
-S
 
 
On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:
 
Malcolm,
 
The code itself looks fine if you want to initiate a PR. 
 
The Apollo dev team will talk later this week and see if it's something we want to add to the mainline.
 
Thanks,
 
Nathan
 
On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:
 
Hello All,
 
So, I think I have this figured out… I’ll deploy and test tomorrow…
 
But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:
 
 
Thanks for following,
 
Malcolm
 
From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Suzanna, Deepak and Unni, and Apollo Dev team,
 
Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.
 
While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.
 
After further study, I believe my aim can be achieved as follows:

  · extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)”, as defined by PacBio in their variant.gff spec

  · convert the extracted GFF to comport with PacBio’s variant.GFF (a trivial Perl one-liner)

  · convert the resulting variant.GFF into a well-formed VCF file using gffToVcf, also from PacBio

  · convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)

  · use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any BED, GFF, WIG, etc. tracks using coordinates appropriate to the updated, error-free genome

  · use the VCF file to apply the edits to the reference sequence using one of

      • vcf-consensus (part of VCFtools), which can "Apply VCF variants to a fasta file to create consensus sequence."
      • FastaAlternateReferenceMaker (part of the Genome Analysis Toolkit, GATK), which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."

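
The last step of the list, applying edits to the reference, can be sketched in a few lines of Python. This is an illustration of the idea only, not what vcf-consensus or FastaAlternateReferenceMaker actually do internally. It assumes each edit is a hypothetical (start, ref, alt) triple with a 0-based start, and it shows why carrying the reference bases with each edit is useful: they let the edit be checked against the genome before it is applied.

```python
def apply_edits(reference, edits):
    """Apply non-overlapping edits to a reference string.

    Each edit is (start, ref, alt): a 0-based start, the reference bases being
    replaced ("" for an insertion) and their replacement ("" for a deletion).
    """
    out, cursor = [], 0
    for start, ref, alt in sorted(edits):
        if start < cursor:
            raise ValueError(f"overlapping edit at {start}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"reference mismatch at {start}")
        out.append(reference[cursor:start])  # untouched bases before the edit
        out.append(alt)
        cursor = start + len(ref)            # skip the replaced reference bases
    out.append(reference[cursor:])
    return "".join(out)

genome = "ACGTACGTAC"
# Hypothetical edits: insert TT before base 2, substitute A->G at 4, delete TA at 7.
edits = [(2, "", "TT"), (4, "A", "G"), (7, "TA", "")]
print(apply_edits(genome, edits))  # prints: ACTTGTGCGC
```

Because the edits are applied left to right against original coordinates, no edit needs to know about the length changes introduced by the edits before it.
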
Apollo Devs,
 
If I can get the proposed “reference” attribute added to column 9 of the current GFF export, I have a path to achieving my mission.
 
If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my Perl one-liner to convert Apollo-exported GFF to PacBio’s variant.GFF is that much shorter.
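
The one-liner itself is not shown in the thread. As a rough Python equivalent of the renaming step only, the sketch below rewrites one attribute key in column 9 of a GFF3 line. The function name, the sample line, and the assumption that the exporter writes a plain `residues=...` pair are all hypothetical.

```python
def rename_attribute(gff_line, old="residues", new="variantSeq"):
    """Rename one attribute key in column 9 of a GFF3 feature line."""
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line  # leave comments, directives and blank lines untouched
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return gff_line  # not a feature line; pass it through unchanged
    attrs = []
    for pair in cols[8].split(";"):
        key, sep, value = pair.partition("=")
        attrs.append((new if key == old else key) + sep + value)
    cols[8] = ";".join(attrs)
    return "\t".join(cols) + ("\n" if gff_line.endswith("\n") else "")

# A made-up feature line, for illustration only.
line = "scaffold_1\t.\tinsertion\t101\t101\t.\t+\t.\tID=ins1;residues=ACG\n"
print(rename_attribute(line), end="")
```

Keeping the old and new names as parameters means the same helper covers any other attribute name the exporter might settle on.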
 
If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.
 
Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
 
I have been trying to come up to speed with some of the related issues, such as these:
 
 
As well as issues related to doing all this from the command line:
 
  #425 – wherein export_annotations_to_gff3.p was removed
 
I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:
 
  convinces me that my effort is misguided in the first place
  agrees to incorporate this into Apollo post haste
 
So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…
 
Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.
 
FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.
 
Thanks for reading and the encouragement, but, most of all,
 
Thanks for Apollo!
 
Malcolm Cook
 
From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Malcolm, 
 
To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 
 
It is still a work in progress, but feel free to contact Chris and me if you have any questions.
 
I hope this helps!
 
Cheers,
 
Deepak Unni
 
 





Re: annotating assembly errors in apollo => fixing the underlying genome

nathandunn

Malcolm,

We discussed this and felt that it is a good interim solution and definitely more comprehensive. We had one caveat: we should try to follow the spec and use “Variant_seq” instead of “residues” or “variantSeq”, to correlate more tightly with the official Sequence Ontology GVF spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md

Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.


Thanks and looking forward to your PR!

Thanks,

Nathan

On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:


On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it's something).  Any thoughts about that?  Should I roll it into my PR?

Malcolm,

Just make it so it works the best for you and create the PR off of that.  

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

Either way, thanks for the PR and the detailed analysis.

Nathan


On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

Thanks for the encouragement. 
 
Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have a approach that meets my immediate needs.
 
Nathan, I will submit a PR as soon as I have confirmed a few more cases, and demonstrated that it works with my pipeline (i.e it has value to at least one person).
 
However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, such as discussed in  Write GFF3 adaptors to meet specific needs
 
On a related note, since Add residues to the output of the GFF3 for insertions is so recent and not yet documented (AFAIK), if I were on team, I might we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to be ‘variantSeq’, so as to be instantly compatible with variants.gff File Format (Version 2.1) (at least its something).  Any thoughts about that?  Should I roll it into my PR?
 
Cheers,
 
Malcolm
 
From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
I definitely like the idea of using vcf as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to gff column 9 I'd much rather see direct export of VCF. The would be useful for other applications as well. This is what you say here "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF." 
 
Direct export of VCF has my strong vote.
 
With that one change, I like the plan.
 
-S
 
 
On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:
 
Malcolm,
 
The code itself looks fine if you want to initiate a PR. 
 
The Apollo dev team will talk later this week and see if its something we want to add to the mainline.  
 
Thanks,
 
Nathan
 
On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:
 
Hello All,
 
So, think I have this figured out… I’ll deploy and test tomorrow…
 
But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:
 
 
Thanks for following,
 
Malcolm
 
From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Suzanne, Deepak and Unni, and Apollo Dev team,
 
Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.
 
While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.
 
After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution as gff using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 annotation of “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio by their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (perl trivial one-liner)

·         convert the resulting variant.GFF into well-formed VCF file using gffToVcf, also from PacBio.

·         convert the VCF to a .chain file, possibly using vcf2chain - (part of g2gtools)

·         use the .chain file with either UCSC’s liftover or CrossMap to create updated versions of any bed,gff,wig,etc tracks using coordinates appropriate to the updated, error-free genome

·         use the vcf file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
Apollo Devs,
 
If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 
 
If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.
 
If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.
 
Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think using VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
 
I have been trying to come up to speed with some of the related issues, such as these:
 
 
As well as issues related to doing all this from the command line:
 
  #425 – wherein export_annotations_to_gff3.p was removed
 
I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:
 
  convinces me that my effort is misguided in the first place
  agrees to incorporate this into Apollo post haste
 
So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…
 
Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re even in KC.
 
FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.
 
Thanks for reading and the encouragement, but, most of all,
 
Thanks for Apollo!
 
Malcolm Cook
 
From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Malcolm, 
 
To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 
 
It is still a work in progress but feel free to contact Chris and I if you have any questions.
 
I hope this helps!
 
Cheers,
 
Deepak Unni
 
 
On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:
Hi Malcolm, 
 
In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.
 
Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences the genome edits are applied before determining the resulting transcript or peptide sequence you can see in the sequence window. Likewise if you export fasta of either of these the corrections are in place. (If they aren't then that is a bug and please file an ticket on the issue tracker). 
 
However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).
 
#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta? 
 
Hope this helps, Suzanna
 
 
On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:
Hello,
 
I am seeking a workflow allowing to
  1. capture curator's judgements as to assembly error using Apollo
  2. systematically apply those  judgements, as edits, to the coordinates of a set of tracks (i.e. GFF GFF3 bed, etc) on the genome
  3. systematically apply those  judgements, as edits, to the sequence of the underlying genome
 
We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  
 
I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have working approach to them they might be willing to share.
 
I can image converting the genomic indels from exported GFF3 format to a .chain format for use with the liftOver tool, allowing to effect #2.  Has anyone made this, or similar work?
 
Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.
 
Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?
 
Thanks,
 
-- 
Malcolm Cook
Computation Biology Core
Stowers Institute for Medical Research
Kansas City, Missouri




This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.

 





This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.



 
-- 
Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia






Re: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Nathan,

 

Excellent.

 

I am now arranging for the GFF export of insertions|deletions|substitutions to be, in effect, well-formed GVF by using the “Reference_seq” and “Variant_seq” attributes, which take the value ‘-’ for insertions and deletions, respectively.  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.

 

Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!

 

GVF features so exported can in turn be converted to VCF4.3, which may then be “applied” (using the vcf-consensus script from VCFtools) to create an edited reference genome.
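
To make the idea of “applying” a VCF concrete, here is a minimal in-memory sketch of the kind of edit that a tool like vcf-consensus performs. It is not the VCFtools implementation; the function name `apply_edits` and the `(pos, ref, alt)` tuple layout are assumptions made only for illustration. Edits are applied from right to left so that earlier coordinates stay valid.

```python
def apply_edits(reference, records):
    """Apply VCF-style edits to a reference string.

    `records` is a list of (pos, ref, alt) tuples with 1-based `pos`,
    as in VCF.  This is a hypothetical helper, not part of any tool.
    Overlapping records are not detected in this sketch.
    """
    seq = reference
    # Work from the highest position down so that length changes
    # never shift a coordinate we have yet to visit.
    for pos, ref, alt in sorted(records, reverse=True):
        start = pos - 1  # convert to 0-based index
        found = seq[start:start + len(ref)]
        if found != ref:
            raise ValueError(
                f"REF mismatch at {pos}: expected {ref!r}, found {found!r}"
            )
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


if __name__ == "__main__":
    edits = [(2, "C", "CTT"), (5, "AC", "A"), (8, "T", "G")]
    print(apply_edits("ACGTACGT", edits))
```

The REF check is the useful part: it fails loudly if the edit list and the genome FASTA are out of sync, which is the main risk when curated edits are exported from one place and applied in another.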

 

However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence” which is required by VCF4.x spec.

 

For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.

 

With this new attribute, the GVF to VCF4.1 conversion does not require access to the genome fasta.
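
A rough sketch of how such a conversion might look follows. It assumes, for illustration only, that “VCF_anchor_seq” holds the reference base at `start - 1` for both insertions and deletions, and that an insertion lands between `start - 1` and `start`; the actual coordinate convention of the Apollo export may differ. The function names are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3/GVF column-9 string into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def gvf_line_to_vcf_fields(line):
    """Turn one GVF feature line into the 8 fixed VCF columns.

    Assumption: VCF_anchor_seq is the single reference base at start - 1.
    """
    cols = line.rstrip("\n").split("\t")
    seqid, start = cols[0], int(cols[3])
    attrs = parse_attributes(cols[8])
    ref = attrs["Reference_seq"]
    alt = attrs["Variant_seq"]
    if ref == "-" or alt == "-":
        # Indel: VCF wants the preceding base prepended to REF and ALT,
        # and POS moved back onto that anchor base.
        if start < 2:
            raise ValueError("no upstream anchor base at sequence start")
        anchor = attrs["VCF_anchor_seq"]
        pos = start - 1
        ref = anchor + ("" if ref == "-" else ref)
        alt = anchor + ("" if alt == "-" else alt)
    else:
        # Substitution: no anchor needed.
        pos = start
    return (seqid, pos, attrs.get("ID", "."), ref, alt, ".", ".", ".")


if __name__ == "__main__":
    example = ("chr1\tapollo\tdeletion\t201\t203\t.\t+\t.\t"
               "ID=del1;Reference_seq=CTT;Variant_seq=-;VCF_anchor_seq=G")
    print("\t".join(str(f) for f in gvf_line_to_vcf_fields(example)))
```

Because the anchor base travels with the feature, nothing here opens the genome fasta, which is the point of the extra attribute.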

 

I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.

 

Thanks All,

 

Malcolm

 

From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

Malcolm,

 

We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:

 

“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md

 

 

Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.

 

 

Thanks and looking forward to your PR!

 

Thanks,

 

Nathan

 

On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:

 

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also rename the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Malcolm,

 

Just make it so it works the best for you and create the PR off of that.  

 

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

 

Either way, thanks for the PR and the detailed analysis.

 

Nathan

 

 

On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

 

Thanks for the encouragement. 

 

Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

 

Nathan, I will submit a PR as soon as I have confirmed a few more cases and demonstrated that it works with my pipeline (i.e., it has value to at least one person).

 

However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, such as discussed in  Write GFF3 adaptors to meet specific needs

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also rename the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Cheers,

 

Malcolm

 

From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to gff column 9, I'd much rather see direct export of VCF. This would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF."

 

Direct export of VCF has my strong vote.

 

With that one change, I like the plan.

 

-S

 

 

On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:

 

Malcolm,

 

The code itself looks fine if you want to initiate a PR. 

 

The Apollo dev team will talk later this week and see if it’s something we want to add to the mainline.

 

Thanks,

 

Nathan

 

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

 

Hello All,

 

So, I think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Suzanna, Deepak, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.

 

After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio in their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (a trivial perl one-liner)

·         convert the resulting variant.GFF into a well-formed VCF file using gffToVcf, also from PacBio

·         convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)

·         use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any bed, gff, wig, etc. tracks using coordinates appropriate to the updated, error-free genome

·         use the VCF file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools), which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK), which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
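
As a rough illustration of what the .chain step in the list above accomplishes, the sketch below lifts a single 1-based coordinate from the original reference onto the edited genome, given VCF-style `(pos, ref, alt)` edits. It does not read or write the chain format, and it is no substitute for liftOver or CrossMap; `lift_position` and the tuple layout are hypothetical names chosen for this example.

```python
def lift_position(pos, records):
    """Map a 1-based reference position onto the edited genome.

    `records` is a list of non-overlapping (pos, ref, alt) edits in
    VCF style (indels carry a leading anchor base).  Returns the new
    1-based position, or None if the base was deleted.
    """
    offset = 0  # net length change contributed by edits left of `pos`
    for rec_pos, ref, alt in sorted(records):
        ref_end = rec_pos + len(ref) - 1
        if pos < rec_pos:
            break  # remaining edits lie to the right; they cannot shift us
        if pos <= ref_end:
            # `pos` falls inside this edit's reference span.
            if pos - rec_pos < len(alt):
                return pos + offset  # base still has a counterpart
            return None  # base was removed by a deletion
        offset += len(alt) - len(ref)
    return pos + offset


if __name__ == "__main__":
    edits = [(2, "C", "CTT"), (5, "AC", "A"), (8, "T", "G")]
    for p in range(1, 9):
        print(p, "->", lift_position(p, edits))
```

A chain file stores essentially this running offset as a series of aligned blocks and gaps, which is why one file can re-coordinate every bed, gff, or wig track in a single pass.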

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.

 

If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Malcolm, 

 

To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 

 

It is still a work in progress, but feel free to contact Chris and me if you have any questions.

 

I hope this helps!

 

Cheers,

 

Deepak Unni

 

 

On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:

Hi Malcolm, 

 

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

 

Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before determining the resulting transcript or peptide sequence you can see in the sequence window. Likewise, if you export fasta of either of these, the corrections are in place. (If they aren't, then that is a bug; please file a ticket on the issue tracker.)

 

However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).

 

#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta?

 

Hope this helps, Suzanna

 

 

On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

 

I am seeking a workflow allowing to

  1. capture curator's judgements as to assembly error using Apollo
  2. systematically apply those  judgements, as edits, to the coordinates of a set of tracks (i.e. GFF GFF3 bed, etc) on the genome
  3. systematically apply those  judgements, as edits, to the sequence of the underlying genome

 

We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  

 

I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have a working approach to them that they might be willing to share.

 

I can imagine converting the genomic indels from the exported GFF3 format to a .chain format for use with the liftOver tool, to achieve #2.  Has anyone made this, or something similar, work?

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.

 

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

 

Thanks,

 

-- 

Malcolm Cook

Computation Biology Core

Stowers Institute for Medical Research

Kansas City, Missouri









 

-- 

Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia






Re: annotating assembly errors in apollo => fixing the underlying genome

nathandunn

This sounds great Malcolm.  Looking forward to the PR. 

Nathan

On Feb 12, 2017, at 2:34 AM, Cook, Malcolm <[hidden email]> wrote:

Nathan,
 
Excellent.
 
I am now contriving then for the GFF export of insertions|deletions|substitutions to in effect be well-formed GVF by using “Reference_seq” and “Variant_seq” attributes, which will take the value ‘-‘ for insertions and deletions, appropriately.  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.
 
Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!
 
GVF features so-exported can in turn be converted to VCF4.3, which may then in turn be “applied” (using vcf-consensus script from VCFTools) to create an edited reference genome.
 
However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence” which is required by VCF4.x spec.
 
For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.
 
With this new attribute, the GVF to VCF4.1 conversion does not  to require to access the genome fasta.
 
I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.
 
Thanks All,
 
Malcolm
 
From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
 
Malcom, 
 
We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:
 
“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md
 
 
Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.
 
 
Thanks and looking forward to your PR!
 
Thanks,
 
Nathan
 
On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:
 
 
On a related note, since Add residues to the output of the GFF3 for insertions is so recent and not yet documented (AFAIK), if I were on team, I might we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to be ‘variantSeq’, so as to be instantly compatible with variants.gff File Format (Version 2.1) (at least its something).  Any thoughts about that?  Should I roll it into my PR?
 
Malcolm,
 
Just make it so it works the best for you and create the PR off of that.  
 
If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  
 
Either way, thanks for the PR and the detailed analysis.
 
Nathan
 
 
On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:
 
Thanks for the encouragement. 
 
Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have a approach that meets my immediate needs.
 
Nathan, I will submit a PR as soon as I have confirmed a few more cases, and demonstrated that it works with my pipeline (i.e it has value to at least one person).
 
However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, such as discussed in  Write GFF3 adaptors to meet specific needs
 
On a related note, since Add residues to the output of the GFF3 for insertions is so recent and not yet documented (AFAIK), if I were on team, I might we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to be ‘variantSeq’, so as to be instantly compatible with variants.gff File Format (Version 2.1) (at least its something).  Any thoughts about that?  Should I roll it into my PR?
 
Cheers,
 
Malcolm
 
From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
I definitely like the idea of using vcf as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to gff column 9 I'd much rather see direct export of VCF. The would be useful for other applications as well. This is what you say here "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF." 
 
Direct export of VCF has my strong vote.
 
With that one change, I like the plan.
 
-S
 
 
On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:
 
Malcolm,
 
The code itself looks fine if you want to initiate a PR. 
 
The Apollo dev team will talk later this week and see if its something we want to add to the mainline.  
 
Thanks,
 
Nathan
 
On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:
 
Hello All,
 
So, think I have this figured out… I’ll deploy and test tomorrow…
 
But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:
 
 
Thanks for following,
 
Malcolm
 
From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Suzanne, Deepak and Unni, and Apollo Dev team,
 
Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.
 
While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.
 
After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution as gff using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 annotation of “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio by their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (perl trivial one-liner)

·         convert the resulting variant.GFF into well-formed VCF file using gffToVcf, also from PacBio.

·         convert the VCF to a .chain file, possibly using vcf2chain - (part of g2gtools)

·         use the .chain file with either UCSC’s liftover or CrossMap to create updated versions of any bed,gff,wig,etc tracks using coordinates appropriate to the updated, error-free genome

·         use the vcf file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
Apollo Devs,
 
If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 
 
If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.
 
If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.
 
Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think using VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
 
I have been trying to come up to speed with some of the related issues, such as these:
 
 
As well as issues related to doing all this from the command line:
 
  #425 – wherein export_annotations_to_gff3.p was removed
 
I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:
 
  convinces me that my effort is misguided in the first place
  agrees to incorporate this into Apollo post haste
 
So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…
 
Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re even in KC.
 
FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.
 
Thanks for reading and the encouragement, but, most of all,
 
Thanks for Apollo!
 
Malcolm Cook
 
From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Malcolm, 
 
To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 
 
It is still a work in progress but feel free to contact Chris and I if you have any questions.
 
I hope this helps!
 
Cheers,
 
Deepak Unni
 
 
On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:
Hi Malcolm, 
 
In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.
 
Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences the genome edits are applied before determining the resulting transcript or peptide sequence you can see in the sequence window. Likewise if you export fasta of either of these the corrections are in place. (If they aren't then that is a bug and please file an ticket on the issue tracker). 
 
However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).
 
#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta? 
 
Hope this helps, Suzanna
 
 
On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:
Hello,
 
I am seeking a workflow allowing to
  1. capture curator's judgements as to assembly error using Apollo
  2. systematically apply those  judgements, as edits, to the coordinates of a set of tracks (i.e. GFF GFF3 bed, etc) on the genome
  3. systematically apply those  judgements, as edits, to the sequence of the underlying genome
 
We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  
 
I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have working approach to them they might be willing to share.
 
I can image converting the genomic indels from exported GFF3 format to a .chain format for use with the liftOver tool, allowing to effect #2.  Has anyone made this, or similar work?
 
Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.
 
Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?
 
Thanks,
 
-- 
Malcolm Cook
Computation Biology Core
Stowers Institute for Medical Research
Kansas City, Missouri









 
-- 
Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia






RE: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Nathan,

 

Could I get some help?  I’m coding by example.  When my new changes run, I get this error:

 

Feb 13, 2017 12:25:38 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [/apollo] threw exception [org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.codehaus.groovy.grails.web.pages.exceptions.GroovyPagesException: Error processing GroovyPageView: getOutputStream() has already been called for this response] with root cause
java.lang.IllegalStateException: getOutputStream() has already been called for this response
        at org.apache.catalina.connector.Response.getWriter(Response.java:636)
        at org.apache.catalina.connector.ResponseFacade.getWriter(ResponseFacade.java:213)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:158)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:156)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsRoutablePrintWriter.activateDestination(GrailsRoutablePrintWriter.java:75)
… (full log attached)

 

 

Can I ask you to take a quick look at

 

https://github.com/malcook/Apollo/commit/2e10a756f0480062b89d778c239b7a768a2ee975

 

especially where I attempt to pull the single base upstream of the insertion|deletion with:

 

sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)
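
As a rough illustration of the coordinate arithmetic in that call (a Python sketch, not Apollo code): assuming fmin is a 0-based interbase coordinate, as the fmin-1..fmin range suggests, the upstream base is the half-open slice [fmin-1, fmin). The function name and the guard clauses are hypothetical. One edge case worth guarding is fmin = 0, where no upstream base exists.

```python
def upstream_anchor_base(sequence: str, fmin: int) -> str:
    """Return the single base immediately upstream of interbase position fmin.

    Assumption: coordinates are 0-based and half-open (interbase), so the
    base just before fmin is sequence[fmin - 1:fmin].
    """
    if fmin <= 0:
        # At the very start of the sequence there is no upstream base.
        # (A naive slice sequence[-1:0] would silently return "".)
        raise ValueError("no upstream base: feature starts at position 0")
    if fmin > len(sequence):
        raise ValueError("fmin lies beyond the end of the sequence")
    return sequence[fmin - 1:fmin]
```

For a feature whose fmin is 3 on the sequence ACGTAC, this returns the base at 0-based index 2.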

 

I understand if this is out of scope for you… so, don’t hesitate to decline.

 

Thanks for your help. I think we can get this working together in fairly short order if you are willing to show me the ropes.

 

Cheers,

 

Malcolm

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Nathan Dunn
Sent: Monday, February 13, 2017 10:47 AM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

This sounds great Malcolm.  Looking forward to the PR. 

 

Nathan

 

On Feb 12, 2017, at 2:34 AM, Cook, Malcolm <[hidden email]> wrote:

 

Nathan,

 

Excellent.

 

I am now arranging for the GFF export of insertions|deletions|substitutions to be, in effect, well-formed GVF by using the “Reference_seq” and “Variant_seq” attributes, which take the value ‘-’ for insertions and deletions, respectively.  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.

 

Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!

 

GVF features so-exported can in turn be converted to VCF4.3, which may then in turn be “applied” (using vcf-consensus script from VCFTools) to create an edited reference genome.

 

However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence” which is required by VCF4.x spec.

 

For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.

 

With this new attribute, the GVF->VCF4.3 conversion does not require access to the genome fasta.
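
A minimal Python sketch of such a conversion, under stated assumptions rather than as the actual exporter or converter: the input is one tab-separated GVF line carrying the “Reference_seq”, “Variant_seq” and (non-standard) “VCF_anchor_seq” attributes; ‘-’ stands for an empty sequence; and column 4 (start) is assumed to be the 1-based position of the first base after the anchor, so the anchor sits at start - 1. The function names are hypothetical.

```python
def parse_attributes(column9: str) -> dict:
    """Split a GFF3-style attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gvf_to_vcf_record(gvf_line: str) -> str:
    """Build one VCF data line (8 fixed columns) from one GVF feature line."""
    cols = gvf_line.rstrip("\n").split("\t")
    seqid, start = cols[0], int(cols[3])
    attrs = parse_attributes(cols[8])
    # '-' is taken to mean "no sequence" on that side of the variant.
    ref = attrs["Reference_seq"].replace("-", "")
    alt = attrs["Variant_seq"].replace("-", "")
    if ref and alt:
        # Substitution: neither allele is empty, so no anchor base is needed.
        pos = start
    else:
        # Insertion or deletion: VCF requires the preceding reference base
        # to be prepended to both alleles, and POS points at that base.
        anchor = attrs["VCF_anchor_seq"]
        pos, ref, alt = start - 1, anchor + ref, anchor + alt
    return "\t".join([seqid, str(pos), attrs.get("ID", "."), ref, alt, ".", ".", "."])
```

Because the anchor base travels with the feature, the converter never has to open the genome; the trade-off is one extra, non-standard attribute in the export.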

 

I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.

 

Thanks All,

 

Malcolm

 

From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

Malcolm,

 

We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:

 

“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md

 

 

Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.

 

 

Thanks and looking forward to your PR!

 

Thanks,

 

Nathan

 

On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:

 

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Malcolm,

 

Just make it so it works the best for you and create the PR off of that.  

 

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

 

Either way, thanks for the PR and the detailed analysis.

 

Nathan

 

 

On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

 

Thanks for the encouragement. 

 

Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

Nathan, I will submit a PR as soon as I have confirmed a few more cases, and demonstrated that it works with my pipeline (i.e., it has value to at least one person).

However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, as discussed in “Write GFF3 adaptors to meet specific needs”.

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Cheers,

 

Malcolm

 

From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to GFF column 9, I'd much rather see direct export of VCF. This would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF."

 

Direct export of VCF has my strong vote.

 

With that one change, I like the plan.

 

-S

 

 

On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:

 

Malcolm,

 

The code itself looks fine if you want to initiate a PR. 

 

The Apollo dev team will talk later this week and see if it’s something we want to add to the mainline.

 

Thanks,

 

Nathan

 

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

 

Hello All,

 

So, I think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Suzanne, Deepak and Unni, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.

 

After further study, I believe my aim can be achieved as follows:

  • extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)”, as defined by PacBio in their variants.gff spec
  • convert the extracted GFF to comport with PacBio’s variants.gff (a trivial perl one-liner)
  • convert the resulting variants.gff into a well-formed VCF file using gffToVcf, also from PacBio
  • convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)
  • use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any bed, gff, wig, etc. tracks, using coordinates appropriate to the updated, error-free genome
  • use the VCF file to apply the edits to the reference sequence using one of:
      • vcf-consensus (part of VCFtools), which can "Apply VCF variants to a fasta file to create consensus sequence."
      • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK), which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
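
To make the last step concrete, here is a minimal Python sketch of what "applying the edits to the reference sequence" amounts to. It is not how vcf-consensus or FastaAlternateReferenceMaker are implemented; the edit representation (0-based, half-open reference coordinates plus a replacement string) and the function name are assumptions made for illustration.

```python
from __future__ import annotations


def apply_edits(reference: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (fmin, fmax, replacement) edits given in reference coordinates.

    insertion:    fmin == fmax, replacement is the inserted sequence
    deletion:     replacement is the empty string
    substitution: replacement has the same length as the replaced span
    """
    ordered = sorted(edits, key=lambda edit: (edit[0], edit[1]))
    # Overlapping edits have no single well-defined result, so reject them.
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("overlapping edits are ambiguous")
    edited = reference
    # Work from the right-most edit leftwards so that the coordinates of
    # edits not yet applied are still valid in the partially edited string.
    for fmin, fmax, replacement in reversed(ordered):
        edited = edited[:fmin] + replacement + edited[fmax:]
    return edited
```

Applying edits right to left avoids having to track a running offset; the same offset bookkeeping is what a coordinate-conversion file has to record explicitly.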

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.

 

If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think using VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
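
To illustrate the coordinate side of the problem, the following Python sketch maps a single reference position onto the edited genome. It only shows the bookkeeping that a .chain file captures in block form; it does not read or write the chain format, and the edit representation (0-based, half-open fmin/fmax plus a replacement string, non-overlapping) and the function name are assumptions.

```python
from typing import List, Optional, Tuple

Edit = Tuple[int, int, str]  # (fmin, fmax, replacement), reference coordinates


def lift_position(pos: int, edits: List[Edit]) -> Optional[int]:
    """Map a 0-based reference base position onto the edited sequence.

    Returns None when the base at pos was removed by a deletion.
    """
    shift = 0
    for fmin, fmax, replacement in edits:
        ref_len = fmax - fmin
        if fmax <= pos:
            # Edit lies entirely upstream: it shifts everything after it by
            # the difference between inserted and removed lengths.
            shift += len(replacement) - ref_len
        elif fmin <= pos and len(replacement) != ref_len:
            # pos falls inside a span whose length changed: no counterpart.
            return None
        # Same-length substitutions covering pos, and edits downstream of
        # pos, leave its coordinate untouched.
    return pos + shift
```

Features that straddle an edit need a policy of their own (drop, split, or clip), which is one reason to leave that job to liftOver or CrossMap rather than to ad hoc code.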

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Malcolm, 

 

To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 

 

It is still a work in progress, but feel free to contact Chris and me if you have any questions.

 

I hope this helps!

 

Cheers,

 

Deepak Unni

 

 

On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:

Hi Malcolm, 

 

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

 

Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before determining the resulting transcript or peptide sequence that you see in the sequence window. Likewise, if you export fasta of either of these, the corrections are in place. (If they aren't, then that is a bug; please file a ticket on the issue tracker.)

 

However, the exported GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one thing, we need to clarify the origin of each genomic variation from the reference, because we (the Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I opened just yesterday; they said they'd look at it over the weekend).

As for #3, I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta?

 

Hope this helps, Suzanna

 

 

On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

 

I am seeking a workflow allowing to

  1. capture curator's judgements as to assembly error using Apollo
  2. systematically apply those  judgements, as edits, to the coordinates of a set of tracks (i.e. GFF GFF3 bed, etc) on the genome
  3. systematically apply those  judgements, as edits, to the sequence of the underlying genome

 

We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  

 

I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have working approach to them they might be willing to share.

 

I can imagine converting the genomic indels from exported GFF3 format to a .chain format for use with the liftOver tool, allowing to effect #2.  Has anyone made this, or something similar, work?

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.

 

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

 

Thanks,

 

-- 

Malcolm Cook

Computation Biology Core

Stowers Institute for Medical Research

Kansas City, Missouri









 










 

-- 

Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia






localhost.2017-02-13.log (33K) Download Attachment

RE: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

Hi Nathan,

 

I tried adding semicolons to my Groovy, which I had at first assumed were optional, and adding them seems to have cleared up the earlier “Error processing GroovyPageView: getOutputStream() has already been called for this response” error.

 

Still in need of help, though…

 

I’m still pretty sure my call to

 

sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)

 

is the culprit based on the following appearing in my stacktrace.log:

 

org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.bbop.apollo.UserOrganismPreference#79805]
        at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismPreference(PreferenceService.groovy:183)
        at org.bbop.apollo.PreferenceService.$tt__getOrganismFromPreferences(PreferenceService.groovy:252)
        at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismForCurrentUser(PreferenceService.groovy:16)
        at org.bbop.apollo.SequenceController.$tt__getSequences(SequenceController.groovy:185)
        at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
        at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
        at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
        at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
        at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
        at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

 

Help from any quarter is very welcome.

 

Thx,

 

Malcolm

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Cook, Malcolm
Sent: Monday, February 13, 2017 12:48 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: RE: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Nathan,

 

Could I get some help?  I’m coding by example.  When my new changes run, I get this error:

 

Feb 13, 2017 12:25:38 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [/apollo] threw exception [org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.codehaus.groovy.grails.web.pages.exceptions.GroovyPagesException: Error processing GroovyPageView: getOutputStream() has already been called for this response] with root cause
java.lang.IllegalStateException: getOutputStream() has already been called for this response
        at org.apache.catalina.connector.Response.getWriter(Response.java:636)
        at org.apache.catalina.connector.ResponseFacade.getWriter(ResponseFacade.java:213)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:158)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:156)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsRoutablePrintWriter.activateDestination(GrailsRoutablePrintWriter.java:75)
… (full log attached)

 

 

Can I ask you to take a quick look at

 

https://github.com/malcook/Apollo/commit/2e10a756f0480062b89d778c239b7a768a2ee975

 

especially where I attempt to pull the single base upstream of the insertion|deletion with:

 

sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)

 

I understand if this is out of scope for you… so, don’t hesitate to decline.

 

Thanks for your help. I think we can get this working together in fairly short order if you are willing to show me the ropes.

 

Cheers,

 

Malcolm

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Nathan Dunn
Sent: Monday, February 13, 2017 10:47 AM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

This sounds great Malcolm.  Looking forward to the PR. 

 

Nathan

 

On Feb 12, 2017, at 2:34 AM, Cook, Malcolm <[hidden email]> wrote:

 

Nathan,

 

Excellent.

 

I am now arranging for the GFF export of insertions|deletions|substitutions to be, in effect, well-formed GVF by using the “Reference_seq” and “Variant_seq” attributes, which take the value ‘-’ for insertions and deletions, respectively.  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.

 

Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!

 

GVF features so-exported can in turn be converted to VCF4.3, which may then in turn be “applied” (using vcf-consensus script from VCFTools) to create an edited reference genome.

 

However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence” which is required by VCF4.x spec.

 

For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.

 

With this new attribute, the GVF->VCF4.3 conversion does not require access to the genome fasta.

 

I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.

 

Thanks All,

 

Malcolm

 

From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

Malcolm,

 

We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:

 

“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md

 

 

Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.

 

 

Thanks and looking forward to your PR!

 

Thanks,

 

Nathan

 

On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:

 

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Malcolm,

 

Just make it so it works the best for you and create the PR off of that.  

 

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

 

Either way, thanks for the PR and the detailed analysis.

 

Nathan

 

 

On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

 

Thanks for the encouragement. 

 

Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

Nathan, I will submit a PR as soon as I have confirmed a few more cases, and demonstrated that it works with my pipeline (i.e., it has value to at least one person).

However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, as discussed in “Write GFF3 adaptors to meet specific needs”.

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Cheers,

 

Malcolm

 

From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to GFF column 9, I'd much rather see direct export of VCF. This would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF." 

 

Direct export of VCF has my strong vote.

 

With that one change, I like the plan.

 

-S

 

 

On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:

 

Malcolm,

 

The code itself looks fine if you want to initiate a PR. 

 

The Apollo dev team will talk later this week and see if it’s something we want to add to the mainline.  

 

Thanks,

 

Nathan

 

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

 

Hello All,

 

So, I think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Suzanna, Deepak Unni, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.

 

After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)”, as defined by PacBio in their variants.gff spec

·         convert the extracted GFF to comport with PacBio’s variants.gff (a trivial Perl one-liner)

·         convert the resulting variants.gff into a well-formed VCF file using gffToVcf, also from PacBio

·         convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)

·         use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any BED, GFF, WIG, etc. tracks using coordinates appropriate to the updated, error-free genome

·         use the vcf file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
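[Editor's sketch] The last step above, applying curated edits to the reference sequence, can be illustrated in a few lines of Python. This is only a minimal sketch of the idea, not how vcf-consensus or FastaAlternateReferenceMaker are implemented. The `Edit` record, its zero-based coordinates, and the function name `apply_edits` are assumptions made for illustration; they are not part of Apollo or of any tool named in this thread.

```python
from typing import List, NamedTuple


class Edit(NamedTuple):
    """One curated edit (hypothetical record, not an Apollo type)."""
    start: int  # zero-based start on the ORIGINAL sequence (assumed convention)
    ref: str    # bases removed from the original ("" for a pure insertion)
    alt: str    # bases written in their place ("" for a pure deletion)


def apply_edits(reference: str, edits: List[Edit]) -> str:
    """Return a copy of `reference` with all non-overlapping edits applied."""
    pieces = []
    cursor = 0  # first original base not yet copied to the output
    for edit in sorted(edits, key=lambda e: e.start):
        if edit.start < cursor:
            raise ValueError(f"overlapping edit at {edit.start}")
        end = edit.start + len(edit.ref)
        # Refuse to edit if the curator's stated reference bases do not match.
        if reference[edit.start:end] != edit.ref:
            raise ValueError(f"reference mismatch at {edit.start}")
        pieces.append(reference[cursor:edit.start])  # untouched stretch
        pieces.append(edit.alt)                      # replacement bases
        cursor = end
    pieces.append(reference[cursor:])                # untouched tail
    return "".join(pieces)


if __name__ == "__main__":
    edits = [
        Edit(start=1, ref="C", alt="T"),   # substitution
        Edit(start=4, ref="", alt="GG"),   # insertion
        Edit(start=6, ref="GT", alt=""),   # deletion
    ]
    print(apply_edits("ACGTACGT", edits))
```

Checking the stated reference bases before each edit is a deliberate choice: it catches edits that were curated against a different version of the assembly.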

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my Perl one-liner to convert Apollo-exported GFF to PacBio’s variants.gff is that much shorter.
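[Editor's sketch] The attribute rename described here is simple enough to show. The Perl one-liner itself is not given in the thread, so the following Python is a stand-in under stated assumptions: the function name `rename_attribute` and the example feature line are hypothetical, and only the attribute names ‘residues’ and ‘variantSeq’ come from the message.

```python
def rename_attribute(gff_line: str, old: str, new: str) -> str:
    """Rename one attribute key in column 9 of a GFF3 feature line.

    Comment lines, blank lines, and lines without exactly nine
    tab-separated columns are returned unchanged.
    """
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line
    newline = "\n" if gff_line.endswith("\n") else ""
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return gff_line
    renamed = []
    for item in columns[8].split(";"):
        key, sep, value = item.partition("=")
        if key == old:
            key = new
        renamed.append(key + sep + value)
    columns[8] = ";".join(renamed)
    return "\t".join(columns) + newline


if __name__ == "__main__":
    # Hypothetical exported line; real Apollo output may differ.
    line = "chr1\tapollo\tinsertion\t10\t10\t.\t+\t.\tID=ins1;residues=GG"
    print(rename_attribute(line, "residues", "variantSeq"))
```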

 

If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia), there is little use in requiring it as GFF.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
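[Editor's sketch] To make concrete what a .chain file encodes for this use case, here is a minimal Python sketch of lifting a single position from the original genome to the edited one. It does not read or write the UCSC chain format and is not how liftOver or CrossMap work internally. The function name `lift_position` and the `(start, ref_len, alt_len)` tuples with zero-based starts are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (start, ref_len, alt_len): zero-based start on the old sequence, number of
# old bases replaced, number of new bases written (hypothetical convention).
EditSpan = Tuple[int, int, int]


def lift_position(pos: int, edits: List[EditSpan]) -> Optional[int]:
    """Map a zero-based position on the old sequence to the edited sequence.

    Returns None when the position lies inside bases that were deleted or
    replaced by a different number of bases, since it has no unique image.
    """
    shift = 0  # net bases gained (or lost) upstream of `pos`
    for start, ref_len, alt_len in sorted(edits):
        if pos < start:
            break  # this and all later edits are downstream of pos
        if pos < start + ref_len:
            # Inside the replaced span: only an equal-length substitution
            # keeps a one-to-one mapping.
            return pos + shift if ref_len == alt_len else None
        shift += alt_len - ref_len
    return pos + shift


if __name__ == "__main__":
    # Substitution at 1, 2-base insertion at 4, 2-base deletion at 6.
    edits = [(1, 1, 1), (4, 0, 2), (6, 2, 0)]
    for old in range(8):
        print(old, lift_position(old, edits))
```

A real chain file stores the same information as alternating aligned blocks and gaps, which is why indel edits translate into it naturally.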

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Malcolm, 

 

To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 

 

It is still a work in progress, but feel free to contact Chris and me if you have any questions.

 

I hope this helps!

 

Cheers,

 

Deepak Unni

 

 

On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:

Hi Malcolm, 

 

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

 

Within Apollo itself, #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before determining the resulting transcript or peptide sequence that you see in the sequence window. Likewise, if you export FASTA of either of these, the corrections are in place. (If they aren't, then that is a bug; please file a ticket on the issue tracker.)

 

However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).

 

#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly-error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic sequence as FASTA? 

 

Hope this helps, Suzanna

 

 

On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

 

I am seeking a workflow that allows me to

  1. capture curators' judgements as to assembly error using Apollo
  2. systematically apply those judgements, as edits, to the coordinates of a set of tracks (GFF, GFF3, BED, etc.) on the genome
  3. systematically apply those judgements, as edits, to the sequence of the underlying genome

 

We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  

 

I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have a working approach to them that they might be willing to share.

 

I can imagine converting the genomic indels from exported GFF3 format to a .chain format for use with the liftOver tool, allowing me to effect #2.  Has anyone made this, or something similar, work?

 

Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.

 

Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?

 

Thanks,

 

-- 

Malcolm Cook

Computation Biology Core

Stowers Institute for Medical Research

Kansas City, Missouri





This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.


 









 

-- 

Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia





Re: annotating assembly errors in apollo => fixing the underlying genome

nathandunn

Malcolm,

The getOutputStream() error was probably due to output being rendered twice.  The changes in your code would probably not have triggered that, nor would adding the semicolons, as far as I can tell.

When you are making changes on the server (running “./apollo run-local”), you should see it trigger “compilation” a few seconds after any change.  I usually use an IDE like IntelliJ or NetBeans for this type of work.

For this error below, I would:

A - delete all contents from your preference table.   2.0.6 fixed a number of problems with this, so hopefully you won’t be seeing this again
B - reload and try again

If this is blocking your code, you can create a PR and we can take a look. 

Nathan

On Feb 13, 2017, at 11:27 AM, Cook, Malcolm <[hidden email]> wrote:

Hi Nathan,
 
I tried adding semicolons to my Groovy, which I had first assumed were optional, and adding them seems to have cleared up the earlier “Error processing GroovyPageView: getOutputStream() has already been called for this response”.
 
Still in need of help, though…
 
I’m still pretty sure my call to
 
sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)
 
is the culprit based on the following appearing in my stacktrace.log:
 
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.bbop.apollo.UserOrganismPreference#79805]
                at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismPreference(PreferenceService.groovy:183)
                at org.bbop.apollo.PreferenceService.$tt__getOrganismFromPreferences(PreferenceService.groovy:252)
                at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismForCurrentUser(PreferenceService.groovy:16)
                at org.bbop.apollo.SequenceController.$tt__getSequences(SequenceController.groovy:185)
                at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
                at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
                at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
                at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
                at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
                at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
                at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
                at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
                at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)
 
All help from all quarters very welcome.
 
Thx,
 
Malcolm
 
From: [hidden email] [[hidden email]] On Behalf Of Cook, Malcolm
Sent: Monday, February 13, 2017 12:48 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: RE: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Nathan,
 
Can I get some help?  I’m coding by example.   When my new changes run, I’m getting this error:
 
Feb 13, 2017 12:25:38 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [/apollo] threw exception [org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.codehaus.groovy.grails.web.pages.exceptions.GroovyPagesException: Error processing GroovyPageView: getOutputStream() has already been called for this response] with root cause
java.lang.IllegalStateException: getOutputStream() has already been called for this response
        at org.apache.catalina.connector.Response.getWriter(Response.java:636)
        at org.apache.catalina.connector.ResponseFacade.getWriter(ResponseFacade.java:213)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:158)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:156)
        at org.codehaus.groovy.grails.web.sitemesh.GrailsRoutablePrintWriter.activateDestination(GrailsRoutablePrintWriter.java:75)
… (full log attached)
 
 
Can I ask you to take a quick look at
 
 
especially where I attempt to pull the single base upstream to the insertion|deletion with:
 
sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)
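[Editor's sketch] The intent of that call, fetching the single base immediately upstream of the edit, can be stated in Python as follows. This assumes that fmin is a zero-based interbase coordinate and that the two bounds passed to getRawResiduesFromSequence are half-open; neither is confirmed in this thread. The function name `upstream_anchor_base` is hypothetical.

```python
def upstream_anchor_base(sequence: str, fmin: int) -> str:
    """Return the single base immediately upstream of interbase position fmin.

    With zero-based, half-open coordinates the slice [fmin - 1, fmin) holds
    exactly one base. An edit at the very start of the sequence (fmin == 0)
    has no upstream base, so it is rejected rather than silently returning
    an empty string.
    """
    if fmin <= 0 or fmin > len(sequence):
        raise ValueError(f"no upstream base for fmin={fmin}")
    return sequence[fmin - 1:fmin]


if __name__ == "__main__":
    print(upstream_anchor_base("ACGT", 2))
```

The explicit check matters because an edit at position 0 would otherwise ask for the range [-1, 0), which is worth handling deliberately in any language.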
 
I understand if this is out of scope for you… so, don’t hesitate to decline.
 
Thanks for your help, and I think we can get this working together in pretty short order if you have a mind to show me the ropes…
 
Cheers,
 
Malcolm
 
 
 
From: [hidden email] [[hidden email]] On Behalf Of Nathan Dunn
Sent: Monday, February 13, 2017 10:47 AM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
 
This sounds great Malcolm.  Looking forward to the PR. 
 
Nathan
 
On Feb 12, 2017, at 2:34 AM, Cook, Malcolm <[hidden email]> wrote:
 
Nathan,
 
Excellent.
 
I am now arranging for the GFF export of insertions|deletions|substitutions to be, in effect, well-formed GVF by using the “Reference_seq” and “Variant_seq” attributes, which take the value ‘-’ where there is no sequence (Reference_seq for insertions, Variant_seq for deletions).  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.
 
Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!
 
GVF features so exported can in turn be converted to VCF4.3, which may then be “applied” (using the vcf-consensus script from VCFtools) to create an edited reference genome.
 
However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence” which is required by VCF4.x spec.
 
For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.
 
With this new attribute, the GVF to VCF4.1 conversion does not require access to the genome FASTA.
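[Editor's sketch] A minimal Python illustration of why the anchor base removes the need for the genome during conversion. The attribute names Reference_seq, Variant_seq and VCF_anchor_seq come from the message above; everything else is an assumption made for illustration, in particular the function names and the `anchor_pos` argument, taken here to be the 1-based position of the anchor base. How that position is derived from the exported feature's coordinates is not shown and would need checking against the actual export.

```python
from typing import Dict, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3/GVF column 9 string into a dict of key=value attributes."""
    items = [item for item in column9.strip().split(";") if item]
    return dict(item.split("=", 1) for item in items)


def vcf_fields(anchor_pos: int, attrs: Dict[str, str]) -> Tuple[int, str, str]:
    """Return (POS, REF, ALT) for one edit, with no genome lookup needed."""
    ref = attrs["Reference_seq"].replace("-", "")  # '-' means no bases
    alt = attrs["Variant_seq"].replace("-", "")
    if ref and alt and len(ref) == len(alt):
        # Equal-length substitution: no anchor base; POS is the first
        # substituted base, one past the anchor.
        return anchor_pos + 1, ref, alt
    # Insertions and deletions are written with the preceding base
    # prepended to both REF and ALT.
    anchor = attrs["VCF_anchor_seq"]
    return anchor_pos, anchor + ref, anchor + alt


def vcf_line(chrom: str, anchor_pos: int, column9: str) -> str:
    """Format one VCF data line; ID, QUAL, FILTER and INFO are left as '.'."""
    pos, ref, alt = vcf_fields(anchor_pos, parse_attributes(column9))
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])


if __name__ == "__main__":
    # Hypothetical attribute strings; real exports may carry more attributes.
    print(vcf_line("chr1", 100, "Reference_seq=-;Variant_seq=GG;VCF_anchor_seq=T"))
    print(vcf_line("chr1", 100, "Reference_seq=AC;Variant_seq=-;VCF_anchor_seq=T"))
    print(vcf_line("chr1", 100, "Reference_seq=A;Variant_seq=G;VCF_anchor_seq=T"))
```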
 
I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.
 
Thanks All,
 
Malcolm
 
From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
 
Malcolm, 
 
We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:
 
“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md
 
 
Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.
 
 
Thanks and looking forward to your PR!
 
Thanks,
 
Nathan
 
On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:
 
 
On a related note, since Add residues to the output of the GFF3 for insertions is so recent and not yet documented (AFAIK), if I were on team, I might we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to be ‘variantSeq’, so as to be instantly compatible with variants.gff File Format (Version 2.1) (at least its something).  Any thoughts about that?  Should I roll it into my PR?
 
Malcolm,
 
Just make it so it works the best for you and create the PR off of that.  
 
If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  
 
Either way, thanks for the PR and the detailed analysis.
 
Nathan
 
 
On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:
 
Thanks for the encouragement. 
 
Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have a approach that meets my immediate needs.
 
Nathan, I will submit a PR as soon as I have confirmed a few more cases, and demonstrated that it works with my pipeline (i.e it has value to at least one person).
 
However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, such as discussed in  Write GFF3 adaptors to meet specific needs
 
On a related note, since Add residues to the output of the GFF3 for insertions is so recent and not yet documented (AFAIK), if I were on team, I might we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to be ‘variantSeq’, so as to be instantly compatible with variants.gff File Format (Version 2.1) (at least its something).  Any thoughts about that?  Should I roll it into my PR?
 
Cheers,
 
Malcolm
 
From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
I definitely like the idea of using vcf as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to gff column 9 I'd much rather see direct export of VCF. The would be useful for other applications as well. This is what you say here "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF." 
 
Direct export of VCF has my strong vote.
 
With that one change, I like the plan.
 
-S
 
 
On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:
 
Malcolm,
 
The code itself looks fine if you want to initiate a PR. 
 
The Apollo dev team will talk later this week and see if its something we want to add to the mainline.  
 
Thanks,
 
Nathan
 
On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:
 
Hello All,
 
So, think I have this figured out… I’ll deploy and test tomorrow…
 
But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:
 
 
Thanks for following,
 
Malcolm
 
From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Suzanne, Deepak and Unni, and Apollo Dev team,
 
Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.
 
While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.
 
After further study, I believe my aim can be achieved as follows:

·         extract the insertion/deletion/substitution as gff using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 annotation of “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio by their spec of variant.gff

·         convert the extracted GFF to comport with PacBio’s variant.GFF (perl trivial one-liner)

·         convert the resulting variant.GFF into well-formed VCF file using gffToVcf, also from PacBio.

·         convert the VCF to a .chain file, possibly using vcf2chain - (part of g2gtools)

·         use the .chain file with either UCSC’s liftover or CrossMap to create updated versions of any bed,gff,wig,etc tracks using coordinates appropriate to the updated, error-free genome

·         use the vcf file to apply the edits to the reference sequence using one of

    • vcf-consensus (part of VCFtools)- which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK) which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
Apollo Devs,
 
If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 
 
If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo exported GFF to pacBio’s variant.GFF is that much shorter.
 
If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.
 
Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think using VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
 
I have been trying to come up to speed with some of the related issues, such as these:
 
 
As well as issues related to doing all this from the command line:
 
  #425 – wherein export_annotations_to_gff3.p was removed
 
I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:
 
  convinces me that my effort is misguided in the first place
  agrees to incorporate this into Apollo post haste
 
So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…
 
Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re even in KC.
 
FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.
 
Thanks for reading and the encouragement, but, most of all,
 
Thanks for Apollo!
 
Malcolm Cook
 
From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome
 
Hi Malcolm, 
 
To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 
 
It is still a work in progress but feel free to contact Chris and I if you have any questions.
 
I hope this helps!
 
Cheers,
 
Deepak Unni
 
 
On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:
Hi Malcolm, 
 
In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.
 
Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences the genome edits are applied before determining the resulting transcript or peptide sequence you can see in the sequence window. Likewise if you export fasta of either of these the corrections are in place. (If they aren't then that is a bug and please file an ticket on the issue tracker). 
 
However the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one we need to clarify the origin of the genomic variation from the reference because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I just happened to open yesterday, they said they'd look at it over the weekend).
 
#3 I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta? 
 
Hope this helps, Suzanna
 
 
On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:
Hello,
 
I am seeking a workflow that allows us to
  1. capture curators' judgements as to assembly errors using Apollo
  2. systematically apply those judgements, as edits, to the coordinates of a set of tracks (e.g., GFF, GFF3, BED, etc.) on the genome
  3. systematically apply those judgements, as edits, to the sequence of the underlying genome
 
We are currently using Apollo Version: 2.0.6 which provides for some aspects of #1 with its ability to "Create genomic insertion|deletion|substitution".  
 
I believe that Apollo does not provide for #2 or #3 but am hoping to find I am wrong, or that others have a working approach to them that they might be willing to share.
 
I can imagine converting the genomic indels from the exported GFF3 format to a .chain format for use with the liftOver tool, allowing us to effect #2.  Has anyone made this, or something similar, work?
 
Similarly, the exported GFF3 should be interpretable as an “edit decision list” of changes to be applied to the underlying genome.  But I have never seen this done.  Any pointers / suggestions are very welcome.
 
Finally, if this has not already been done elsewhere, is this of sufficient relevance to the Apollo project that implementation of #2 and #3 would be a welcome addition to the project?
 
Thanks,
 
-- 
Malcolm Cook
Computation Biology Core
Stowers Institute for Medical Research
Kansas City, Missouri









 
-- 
Research Analyst
S104A Animal Science Research Center,
University of Missouri, Columbia






RE: annotating assembly errors in apollo => fixing the underlying genome

Cook, Malcolm

I figured it out!!    I was confused as to how to use the sequenceService calls to pull the single upstream base from the feature.  I am now using getRawResiduesFromSequence correctly.
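
For anyone following along, the coordinate arithmetic behind that single upstream base can be sketched in Python. This is only an illustration, assuming that fmin is a 0-based interbase coordinate and that the range passed to getRawResiduesFromSequence is half-open; the function name upstream_anchor_base is hypothetical and is not part of Apollo.

```python
def upstream_anchor_base(sequence, fmin):
    """Return the single base immediately upstream of a feature.

    Assumption: `fmin` is a 0-based interbase coordinate, so the
    half-open slice [fmin - 1, fmin) covers exactly one base.
    """
    if fmin < 1:
        # A feature that starts at the very first base has no upstream base.
        raise ValueError("no upstream base exists for fmin < 1")
    return sequence[fmin - 1:fmin]


genome = "ACGTTGCA"
# A feature with fmin == 4 begins at the fifth base; the base before it is the fourth.
print(upstream_anchor_base(genome, 4))
```

A feature at the very start of a sequence has no upstream base, which is why the sketch raises an error there instead of returning an empty string.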

 

I’ve been doing my edits in emacs, committing to my GitHub fork, and Sofia has been deploying for me.

 

Very long winded… Obviously there is much for us to learn if we are to do more of this.

 

Anyway, I am now onto testing the downstream of this GVF generation.

 

Hoping for a PR by Wednesday (other things, etc).

 

Thanks for your help!

 

If you’re interested, my changes begin at:

 

https://github.com/malcook/Apollo/blob/40d6554e6171388dfbf3caed0e976b8a304b40ce/grails-app/services/org/bbop/apollo/Gff3HandlerService.groovy#L394

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Nathan Dunn
Sent: Monday, February 13, 2017 1:57 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

Malcolm,

 

The getOutputStream() error was probably due to the output being rendered twice.  The changes in your code probably would not have triggered that, nor would the semi-colon changes, as far as I can tell.

 

When you are making changes on the server (running “./apollo run-local”), you should see it trigger “compilation” within a few seconds of any change.  I usually use an IDE like IntelliJ or NetBeans for this type of work.

 

For this error below, I would:

 

A - delete all contents from your preference table.   2.0.6 fixed a number of problems with this, so hopefully you won’t be seeing this

B - reload and try again

 

If this is blocking your code, you can create a PR and we can take a look. 

 

Nathan

 

On Feb 13, 2017, at 11:27 AM, Cook, Malcolm <[hidden email]> wrote:

 

Hi Nathan,

 

I tried adding semi-colons to my Groovy, which I had first assumed were optional, and adding them seems to have cleared up the earlier “Error processing GroovyPageView: getOutputStream() has already been called for this response” error.

 

Still in need of help though…

 

I’m still pretty sure my call to

 

sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)

 

is the culprit based on the following appearing in my stacktrace.log:

 

org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.bbop.apollo.UserOrganismPreference#79805]

                at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismPreference(PreferenceService.groovy:183)

                at org.bbop.apollo.PreferenceService.$tt__getOrganismFromPreferences(PreferenceService.groovy:252)

                at org.bbop.apollo.PreferenceService.$tt__getCurrentOrganismForCurrentUser(PreferenceService.groovy:16)

                at org.bbop.apollo.SequenceController.$tt__getSequences(SequenceController.groovy:185)

                at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)

                at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)

                at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)

                at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)

                at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)

                at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)

                at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)

                at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)

                at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

 

All help from all quarters is very welcome.

 

Thx,

 

Malcolm

 

From: [hidden email] [[hidden email]] On Behalf Of Cook, Malcolm
Sent: Monday, February 13, 2017 12:48 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: RE: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Nathan,

 

Can I get some help?  I’m coding by example.   When my new changes run, I’m getting this error:

 

Feb 13, 2017 12:25:38 PM org.apache.catalina.core.StandardWrapperValve invoke

SEVERE: Servlet.service() for servlet [default] in context with path [/apollo] threw exception [org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.codehaus.groovy.grails.web.pages.exceptions.GroovyPagesException: Error processing GroovyPageView: getOutputStream() has already been called for this response] with root cause

java.lang.IllegalStateException: getOutputStream() has already been called for this response

        at org.apache.catalina.connector.Response.getWriter(Response.java:636)

        at org.apache.catalina.connector.ResponseFacade.getWriter(ResponseFacade.java:213)

        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:158)

        at org.codehaus.groovy.grails.web.sitemesh.GrailsPageResponseWrapper$5.activateDestination(GrailsPageResponseWrapper.java:156)

        at org.codehaus.groovy.grails.web.sitemesh.GrailsRoutablePrintWriter.activateDestination(GrailsRoutablePrintWriter.java:75)

… (full log attached)

 

 

Can I ask you to take a quick look at

 

 

especially where I attempt to pull the single base upstream to the insertion|deletion with:

 

sequenceService.getRawResiduesFromSequence(featureLocation.sequence,featureLocation.fmin-1,featureLocation.fmin)

 

I understand if this is out of scope for you… so, don’t hesitate to decline.

 

Thanks for your help. I think we can get this working together in pretty short order if you have a mind to show me the ropes…

 

Cheers,

 

Malcolm

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Nathan Dunn
Sent: Monday, February 13, 2017 10:47 AM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

This sounds great Malcolm.  Looking forward to the PR. 

 

Nathan

 

On Feb 12, 2017, at 2:34 AM, Cook, Malcolm <[hidden email]> wrote:

 

Nathan,

 

Excellent.

 

I am now arranging for the GFF export of insertions|deletions|substitutions to be, in effect, well-formed GVF by using the “Reference_seq” and “Variant_seq” attributes, which take the value ‘-’ where appropriate (Reference_seq for insertions, Variant_seq for deletions).  GVF is a better format to adopt than PacBio’s take on the same problem (VariantsGffSpecification), especially considering the present company.  Heh.

 

Also, since GVF adheres to SO, it requires ‘Reference_seq’ instead of simply ‘reference’.  So be it!

 

GVF features so exported can then be converted to VCF4.3, which may in turn be “applied” (using the vcf-consensus script from VCFtools) to create an edited reference genome.

 

However, this GVF->VCF4.3 conversion requires accessing the genome to look up the single-base upstream “anchor sequence”, which is required by the VCF4.x spec.

 

For this reason, I have contrived to augment the Apollo-exported GVF with a new, non-standard, attribute of “VCF_anchor_seq”, being this single upstream reference base.

 

With this new attribute, the GVF to VCF4.1 conversion does not require access to the genome fasta.
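
To illustrate the idea, here is a rough Python sketch of converting one such GVF line into a VCF data line without touching the genome. It is not the code in the PR. It assumes that the GFF start of an indel is the first affected base (1-based), that “VCF_anchor_seq” holds the base at start-1, and that the ID attribute is present; the helper names parse_attributes and gvf_line_to_vcf_record are made up for this example.

```python
def parse_attributes(column9):
    """Split a GFF3 column 9 string into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def gvf_line_to_vcf_record(line):
    """Build a minimal VCF data line (8 columns) from one GVF feature line.

    Assumptions (not confirmed by any spec text quoted in this thread):
    the GFF start of an indel is the first affected base, and
    VCF_anchor_seq is the reference base at start - 1.
    """
    cols = line.rstrip("\n").split("\t")
    seqid, start = cols[0], int(cols[3])
    attrs = parse_attributes(cols[8])
    ref = attrs["Reference_seq"]
    alt = attrs["Variant_seq"]
    if ref == "-" or alt == "-":
        # Indel: VCF wants the upstream anchor base prepended to both alleles,
        # and POS pointing at that anchor base.
        anchor = attrs["VCF_anchor_seq"]
        pos = start - 1
        ref = anchor + ("" if ref == "-" else ref)
        alt = anchor + ("" if alt == "-" else alt)
    else:
        # Substitution: alleles and position carry over unchanged.
        pos = start
    return "\t".join([seqid, str(pos), attrs.get("ID", "."), ref, alt, ".", ".", "."])


deletion = "chr1\tapollo\tdeletion\t101\t103\t.\t+\t.\tID=d1;Reference_seq=ACG;Variant_seq=-;VCF_anchor_seq=T"
print(gvf_line_to_vcf_record(deletion))
```

Carrying the anchor base in the GVF itself is what lets the converter stay a simple line-by-line filter.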

 

I have coded these changes, and will be testing the whole process in the week to come and will issue a PR upon completion.

 

Thanks All,

 

Malcolm

 

From: <[hidden email]> on behalf of Nathan Dunn <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, February 9, 2017 at 1:34 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

 

Malcolm,

 

We discussed this and felt this was a good interim solution and is definitely more comprehensive.   We had one caveat, which is that we should try to follow the spec and use:

 

“Variant_seq” instead of “residues” or “variantSeq” to more tightly correlate with the official sequence ontology spec: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gvf.md

 

 

Suzi had added a more comprehensive solution (which subsumes what you have proposed here), but that will be implemented longer-term.

 

 

Thanks and looking forward to your PR!

 

Thanks,

 

Nathan

 

On Feb 7, 2017, at 8:49 AM, Nathan Dunn <[hidden email]> wrote:

 

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Malcolm,

 

Just make it so it works the best for you and create the PR off of that.  

 

If, for some reason, there is a portion of the change we don’t want / need we can make the change when we pull the change in.  

 

Either way, thanks for the PR and the detailed analysis.

 

Nathan

 

 

On Feb 6, 2017, at 12:58 PM, Cook, Malcolm <[hidden email]> wrote:

 

Thanks for the encouragement. 

 

Nonetheless, I’m happy to say that my changes to the GFF exporter worked as I hoped upon testing just an hour ago.  So I have an approach that meets my immediate needs.

 

Nathan, I will submit a PR as soon as I have confirmed a few more cases and demonstrated that it works with my pipeline (i.e., it has value to at least one person).

 

However, I expect your team might find that it would need implementation as a (TBW) CustomGff3HandlerService, such as discussed in “Write GFF3 adaptors to meet specific needs”.

 

On a related note, since “Add residues to the output of the GFF3 for insertions” is so recent and not yet documented (AFAIK), if I were on the team, I might suggest we just take the PR as written, but with the addition that we also change the name of the ‘residues’ attribute to ‘variantSeq’, so as to be instantly compatible with the variants.gff File Format (Version 2.1) (at least it’s something).  Any thoughts about that?  Should I roll it into my PR?

 

Cheers,

 

Malcolm

 

From: [hidden email] [[hidden email]] On Behalf Of Suzanna Lewis
Sent: Monday, February 06, 2017 2:29 PM
To: [hidden email]
Cc: Robb, Sofia <[hidden email]>; Parker, Hugo <[hidden email]>; Wiedemann, Leanne <[hidden email]>; Chris Elsik <[hidden email]>; Colin Diesh <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

I definitely like the idea of using VCF as the export format for variants (whether biological or artifactual, but not mixed). Ideally this capability should be an option directly within Apollo. Rather than add yet another hacky piece of information to GFF column 9, I'd much rather see direct export of VCF. This would be useful for other applications as well. This is what you say here: "If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF."

 

Direct export of VCF has my strong vote.

 

With that one change, I like the plan.

 

-S

 

 

On Mon, Feb 6, 2017 at 7:15 AM, Nathan Dunn <[hidden email]> wrote:

 

Malcolm,

 

The code itself looks fine if you want to initiate a PR. 

 

The Apollo dev team will talk later this week and see if it's something we want to add to the mainline.

 

Thanks,

 

Nathan

 

On Feb 5, 2017, at 7:43 PM, Cook, Malcolm <[hidden email]> wrote:

 

Hello All,

 

So, I think I have this figured out… I’ll deploy and test tomorrow…

 

But, I’d welcome a code review in advance of testing if any of you devs are working late on a Sunday and care to chime in. Here’s the commit to my fork:

 

 

Thanks for following,

 

Malcolm

 

From: Malcolm Cook <[hidden email]>
Date: Sunday, February 5, 2017 at 7:09 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Suzanna, Deepak, and Apollo Dev team,

 

Yes, Suzi, your remark re: #3 “Or perhaps what you're after is simply the modified genomic as fasta” is exactly the case.  I want errors in the reference sequence, as identified by curators, to be understood as “edits” to be applied to the reference sequence, thereby producing an updated error-free version.

 

While it is good to know that Apollo export of transcript or peptide sequences can respect curated genomic edits, what I am seeking is to allow creation of an entire new version of the genome and corresponding new version of any genome annotations which observe the edits and their consequence.

 

After further study, I believe my aim can be achieved as follows:

  • extract the insertion/deletion/substitution features as GFF using a version of Apollo’s Gff3HandlerService slightly modified to include a new column 9 attribute, “reference”, being “the reference base or bases for the variant site. May be . to represent a zero-length substring (for insertion events)” as defined by PacBio in their variants.gff spec
  • convert the extracted GFF to comport with PacBio’s variants.gff (a trivial perl one-liner)
  • convert the resulting variants.gff into a well-formed VCF file using gffToVcf, also from PacBio
  • convert the VCF to a .chain file, possibly using vcf2chain (part of g2gtools)
  • use the .chain file with either UCSC’s liftOver or CrossMap to create updated versions of any bed, gff, wig, etc. tracks using coordinates appropriate to the updated, error-free genome
  • use the VCF file to apply the edits to the reference sequence using one of
    • vcf-consensus (part of VCFtools), which can "Apply VCF variants to a fasta file to create consensus sequence."
    • FastaAlternateReferenceMaker (part of Genome Analysis Toolkit - GATK), which "Given a variant callset, this tool replaces the reference bases at variation sites with the bases supplied in the corresponding callset records. Additionally, it allows for one or more "snpmask" VCFs to set overlapping bases to 'N'."
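
To make the last step concrete, here is a small Python sketch of what “applying” VCF-style edits to a sequence amounts to. It is not how vcf-consensus or FastaAlternateReferenceMaker are implemented, only an illustration of the idea. It assumes each edit is a (POS, REF, ALT) tuple with 1-based POS and that edits do not overlap; the name apply_vcf_edits is hypothetical.

```python
def apply_vcf_edits(sequence, records):
    """Apply non-overlapping (pos, ref, alt) edits to a sequence string.

    `pos` is 1-based, as in VCF. Edits are applied from right to left so
    that a length change never shifts the coordinates of edits still to come.
    """
    edited = sequence
    for pos, ref, alt in sorted(records, key=lambda r: r[0], reverse=True):
        start = pos - 1
        found = edited[start:start + len(ref)]
        if found != ref:
            # Refuse to edit if the reference allele does not match the genome.
            raise ValueError(f"REF mismatch at {pos}: expected {ref!r}, found {found!r}")
        edited = edited[:start] + alt + edited[start + len(ref):]
    return edited


edits = [
    (2, "CG", "C"),   # deletion of G, anchored on the upstream C
    (5, "T", "TAA"),  # insertion of AA after the anchor T
    (8, "A", "G"),    # substitution
]
print(apply_vcf_edits("ACGTTGCA", edits))
```

Checking REF against the sequence before each edit is the “care must be taken” part: it catches a VCF that was built against a different version of the genome.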

Apollo Devs,

 

If I can get the proposed “reference” column added to current GFF export, I have a path to achieving my mission. 

 

If the Apollo GFF exporter can further use the term “variantSeq” where it currently uses the term “residues”, my perl one-liner to convert Apollo-exported GFF to PacBio’s variants.gff is that much shorter.

 

If Apollo were enabled to export the substitution/insertion/deletion features as VCF directly, well, that would very happily remove another step in my pipeline above.  This is arguably a reasonable thing to do; since VCF has been picked up by other browsers as a track format (IGV, IGB, UCSC, inter alia) there is little use of requiring it as GFF.

 

Is all this another hacky solution?  I don’t think so.  It is a complex problem, and I think VCF and .chain files are the appropriate representations and allow a good separation of concerns between the tools involved.  Of course care must be taken, or else!  As noted.
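
As a rough illustration of what a .chain file encodes for this use case, the Python sketch below maps a single 1-based position from the original genome to the edited one. It is not liftOver or CrossMap and does not read the .chain format. It assumes simple, non-overlapping, anchor-based (POS, REF, ALT) edits; the name lift_position is hypothetical.

```python
def lift_position(pos, records):
    """Map a 1-based reference position onto the edited genome.

    `records` are non-overlapping (vpos, ref, alt) tuples in VCF style.
    Returns None when the base at `pos` no longer exists after editing.
    """
    shift = 0
    for vpos, ref, alt in sorted(records):
        if pos < vpos:
            break  # all remaining edits lie downstream of `pos`
        offset = pos - vpos
        if offset >= len(ref):
            # Edit lies entirely upstream: accumulate its length change.
            shift += len(alt) - len(ref)
        elif offset < len(alt):
            # Inside the edit, on a base that is kept or replaced in place.
            return pos + shift
        else:
            # Inside the edit, on a deleted base.
            return None
    return pos + shift


edits = [(2, "CG", "C"), (5, "T", "TAA"), (8, "A", "G")]
for old in (1, 3, 4, 6, 8):
    print(old, "->", lift_position(old, edits))
```

Positions that fall on deleted bases have no image in the new genome, which is one reason features spanning an edit need review after lifting rather than blind acceptance.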

 

I have been trying to come up to speed with some of the related issues, such as these:

 

 

As well as issues related to doing all this from the command line:

 

  #425 – wherein export_annotations_to_gff3.p was removed

 

I see that the change to add residues to the GFF in column 9 is very recent (thanks Colin), and it guides me to where a change might arguably be made.  I am poised to try and undertake this myself but will gladly refrain if someone either:

 

  convinces me that my effort is misguided in the first place

  agrees to incorporate this into Apollo post haste

 

So… I welcome further guidance, suggestions, contrary opinions, hey, I’ll take a mocking sneer if you’ve read this far…

 

Chris & Deepak, I am unsure how closely my use case matches your mission.  We are close by and I would welcome a visit at the Stowers Institute if you’re ever in KC.

 

FWIW: my inquiry arises in the context of curation of the Sea Lamprey genome.

 

Thanks for reading and the encouragement, but, most of all,

 

Thanks for Apollo!

 

Malcolm Cook

 

From: <[hidden email]> on behalf of Deepak Unni <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Friday, February 3, 2017 at 2:26 PM
To: "[hidden email]" <[hidden email]>
Cc: Sofia Robb <[hidden email]>, "Parker, Hugo" <[hidden email]>, Leanne Wiedemann/Krumlauf <[hidden email]>, Chris Elsik <[hidden email]>
Subject: Re: [apollo] annotating assembly errors in apollo => fixing the underlying genome

 

Hi Malcolm, 

 

To extend on Suzi's email, there is another tool called the Locus Specific Alternate Assembly (LSAA), being developed in Elsik Lab at University of Missouri, which is a plugin to Apollo that aims at addressing the 2nd and 3rd use case. But this tool is being designed as part of a separately funded project and is not part of Apollo. Chris Elsik (CCed in this email) is the lead PI on LSAA and I am the developer working on this tool. 

 

It is still a work in progress, but feel free to contact Chris and me if you have any questions.

 

I hope this helps!

 

Cheers,

 

Deepak Unni

 

 

On Fri, Feb 3, 2017 at 2:14 PM, Suzanna Lewis <[hidden email]> wrote:

Hi Malcolm, 

 

In answer to your broader question, this is very much of relevance to the Apollo project. It's a problem that we've faced since the days of desktop Apollo and have never had anything more than a hacky solution.

 

Within Apollo itself #2 is handled. That is, if you view the transcript or peptide sequences, the genome edits are applied before determining the resulting transcript or peptide sequence that you see in the sequence window. Likewise, if you export fasta of either of these, the corrections are in place. (If they aren't, that is a bug; please file a ticket on the issue tracker.)

 

However, the export of the GFF3 contains the feature coordinates on the original reference genome sequence. Where and how to include this information in the GFF3 in a non-hacky way is an open question. For one, we need to clarify the origin of the genomic variation from the reference, because we (Apollo developers) want to enable curators to annotate both technical artefacts and biological variations (see https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/388, which by chance I opened just yesterday; they said they'd look at it over the weekend).

 

#3: I'm uncertain what precisely you might be after. I believe that Chris Elsik in Missouri was doing some work to take curators' assembly error edits and feed these back to NCBI to improve the public reference sequence. Or perhaps what you're after is simply the modified genomic as fasta?

 

Hope this helps, Suzanna

 

 

On Fri, Feb 3, 2017 at 7:31 AM, Cook, Malcolm <[hidden email]> wrote:

Hello,

 

I am seeking a workflow allowing to

  1. capture curator's judgements as to assembly error using Apollo
  2. systematically apply those &