Re: [galaxy-user] Filename extension in new tool


Peter Cock
On Thu, Feb 17, 2011 at 3:00 AM, Sean Davis <[hidden email]> wrote:
> I have a tool that takes a pdb file as input.  The authors of the *compiled*
> code require that the suffix be either ".pdb" or ".ent".  When I upload a
> .pdb file, the filename that gets fed to the tool now ends in .dat.  What is
> the best way to get the original file extension stored in the file database?
>
> Thanks,
> Sean

Once in Galaxy all the data files have the extension .dat on disk, so
I would try using a wrapper script that creates a symbolic link from the
input.dat file to something like input.pdb or input.ent (and if that doesn't
work, copy the file) before running the compiled code and then remove
it afterwards.
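A minimal sketch of the wrapper described above. It assumes Python 3.7+ on a POSIX system. The function names, the `.pdb` suffix in the usage line and the use of a fresh temporary directory are illustrative assumptions, not Galaxy or SymD conventions. The link goes into a temporary directory rather than next to the `.dat` file in case the dataset directory is not writable.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def link_or_copy(src, dst):
    """Make src available under the name dst: symlink if possible, else copy."""
    try:
        os.symlink(os.path.abspath(src), dst)
    except OSError:
        # Fall back to a real copy, e.g. on filesystems without symlink support.
        shutil.copyfile(src, dst)


def run_with_extension(dat_path, ext, command, **run_kwargs):
    """Run `command` with dat_path exposed as <basename><ext> as its last argument.

    The link (or copy) lives in a private temporary directory that is removed
    afterwards, even if the command fails.
    """
    workdir = tempfile.mkdtemp()
    renamed = os.path.join(workdir, os.path.basename(dat_path) + ext)
    try:
        link_or_copy(dat_path, renamed)
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(command + [renamed], check=True, **run_kwargs)
    finally:
        # rmtree unlinks the symlink itself, never the .dat file it points to.
        shutil.rmtree(workdir)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python wrapper.py input.dat SymD
    run_with_extension(sys.argv[1], ".pdb", sys.argv[2:])
```

The copy fallback costs disk space for large inputs, but it keeps the wrapper working where symlinks are not available.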

Separately from this, you may need to extend Galaxy to define pdb
as a new file format (ideally with a data type sniffer).
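A rough sketch of the kind of check a PDB sniffer could make. It is written as a plain Python function over lines of text. As far as I know, Galaxy datatype classes expose a `sniff` method for this, but its signature has changed between Galaxy releases, so wiring it in is left out. The record list is a subset drawn from the PDB format specification (older format versions used a few more names), and the 100-line window is an assumption. A file that passes is only *probably* PDB.

```python
from itertools import islice

# Record names (columns 1-6, left-justified) from the PDB format specification.
# Illustrative, not guaranteed exhaustive for every format version.
KNOWN_RECORDS = {
    "HEADER", "OBSLTE", "TITLE", "SPLIT", "CAVEAT", "COMPND", "SOURCE",
    "KEYWDS", "EXPDTA", "NUMMDL", "MDLTYP", "AUTHOR", "REVDAT", "SPRSDE",
    "JRNL", "REMARK", "DBREF", "DBREF1", "DBREF2", "SEQADV", "SEQRES",
    "MODRES", "HET", "HETNAM", "HETSYN", "FORMUL", "HELIX", "SHEET",
    "SSBOND", "LINK", "CISPEP", "SITE", "CRYST1", "ORIGX1", "ORIGX2",
    "ORIGX3", "SCALE1", "SCALE2", "SCALE3", "MTRIX1", "MTRIX2", "MTRIX3",
    "MODEL", "ATOM", "ANISOU", "TER", "HETATM", "ENDMDL", "CONECT",
    "MASTER", "END",
}


def looks_like_pdb(lines, max_lines=100):
    """Heuristic sniffer: True if the first non-blank lines all start with a
    known PDB record name. `lines` is any iterable of text lines, such as an
    open file handle."""
    checked = 0
    for line in islice(lines, max_lines):
        if not line.strip():
            continue  # tolerate blank lines
        if line[:6].rstrip() not in KNOWN_RECORDS:
            return False
        checked += 1
    return checked > 0
```

Typical use would be `with open(path) as handle: looks_like_pdb(handle)`. An empty or all-blank file is rejected rather than guessed at.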

This kind of question is better asked on the dev list (CC'd).

Peter

_______________________________________________
To manage your subscriptions to this and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Sean Davis


On Thu, Feb 17, 2011 at 5:48 AM, Peter Cock <[hidden email]> wrote:
> Once in Galaxy all the data files have the extension .dat on disk, so
> I would try using a wrapper script that creates a symbolic link from the
> input.dat file to something like input.pdb or input.ent (and if that doesn't
> work, copy the file) before running the compiled code and then remove
> it afterwards.

Hi, Peter.  I ended up doing just that.  The hack in all its messiness is here:
https://gist.github.com/831017

> Separately from this, you may need to extend Galaxy to define pdb
> as a new file format (ideally with a data type sniffer).
>
> This kind of question is better asked on the dev list (CC'd).

Thanks.  That is the next step.

Sean



Peter Cock
On Thu, Feb 17, 2011 at 12:37 PM, Sean Davis wrote:

>
> On Thu, Feb 17, 2011 at 5:48 AM, Peter wrote:
>>
>> Once in Galaxy all the data files have the extension .dat on disk, so
>> I would try using a wrapper script that creates a symbolic link from the
>> input.dat file to something like input.pdb or input.ent (and if that doesn't
>> work, copy the file) before running the compiled code and then remove
>> it afterwards.
>>
>
> Hi, Peter.  I ended up doing just that.  The hack in all its messiness is
> here:
> https://gist.github.com/831017

I would be wary of using ${input.name} like that - test with things
like renaming the dataset in Galaxy, and pasting in a PDB file
rather than uploading one. Also I suspect you can get filenames
with spaces in them, which will probably cause trouble. You'll
notice that Galaxy generates its own *.dat filenames, which avoid
spaces.

Personally I would generate the *.pdb or *.ent filename within
the wrapper script based on the input file name (*.dat). Try:

os.symlink(fname,fname+".pdb")
...
symdcmd = "SymD %s.pdb" % fname
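This illustrates the point about spaces. If the command string is handed to a shell (for example via `os.system`), a path containing spaces is split into several arguments. Below is a small sketch of two standard-library ways around that. `SymD` is the program named in this thread; the function names are hypothetical.

```python
import shlex


def symd_argv(dat_path):
    """Argument list for subprocess.run(): no shell is involved, so spaces
    in the path cannot split it into several arguments."""
    return ["SymD", dat_path + ".pdb"]


def symd_shell_command(dat_path):
    """Shell-string version of "SymD %s.pdb" % fname, with the path quoted."""
    return "SymD " + shlex.quote(dat_path + ".pdb")
```

Passing the list form to `subprocess.run(symd_argv(fname), check=True)` is generally the safer choice. `shlex.quote` only matters if a shell string is unavoidable.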


>>
>> Separately from this, you may need to extend Galaxy to define pdb
>> as a new file format (ideally with a data type sniffer).
>>
>> This kind of question is better asked on the dev list (CC'd).
>>
>
> Thanks.  That is the next step.

I haven't done this myself yet (but I may well need to before long).

Peter


Sean Davis


On Thu, Feb 17, 2011 at 8:07 AM, Peter Cock <[hidden email]> wrote:
> On Thu, Feb 17, 2011 at 12:37 PM, Sean Davis wrote:
>> Hi, Peter.  I ended up doing just that.  The hack in all its messiness is
>> here:
>> https://gist.github.com/831017
>
> I would be wary of using ${input.name} like that - test with things
> like renaming the dataset in Galaxy, and pasting in a PDB file
> rather than uploading one. Also I suspect you can get filenames
> with spaces in them, which will probably cause trouble. You'll
> notice that Galaxy generates its own *.dat filenames, which avoid
> spaces.
>
> Personally I would generate the *.pdb or *.ent filename within
> the wrapper script based on the input file name (*.dat). Try:


Unfortunately, the command-line executable assumes that the filename contains the ID of the PDB record, so for now I actually need the original name.  I'm going to have a chat with the developer of the command-line tool about designing a more robust interface.

 
> os.symlink(fname,fname+".pdb")
> ...
> symdcmd = "SymD %s.pdb" % fname
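Since the executable wants the PDB ID in the filename, one alternative to trusting the dataset name is to read the ID from the file itself. In the PDB format, the HEADER record carries a four-character idCode in columns 63-66. The sketch below relies on that and assumes the input has a HEADER record, which pasted or hand-edited files may lack. The thread does not say whether SymD expects `1ABC.pdb`, `1abc.pdb` or another spelling, so the `<ID>.pdb` naming and the function names are hypothetical.

```python
import os
import tempfile


def pdb_id_from_header(lines):
    """Return the idCode (columns 63-66) of the HEADER record, or None."""
    for line in lines:
        if line.startswith("HEADER"):
            return line[62:66].strip() or None
        if line.startswith(("ATOM", "HETATM")):
            break  # HEADER, if present, comes before any coordinates
    return None


def link_as_pdb_id(dat_path, workdir=None):
    """Symlink dat_path as <ID>.pdb inside workdir (a new temp dir by default)."""
    with open(dat_path, "r", errors="replace") as handle:
        pdb_id = pdb_id_from_header(handle)
    if pdb_id is None:
        raise ValueError("no HEADER idCode found in %s" % dat_path)
    if workdir is None:
        workdir = tempfile.mkdtemp()
    link = os.path.join(workdir, pdb_id + ".pdb")
    os.symlink(os.path.abspath(dat_path), link)
    return link
```

Because the name comes from the file contents, renaming the dataset in Galaxy or pasting the data in no longer changes it. The caller is still responsible for removing the temporary directory afterwards.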


>>> Separately from this, you may need to extend Galaxy to define pdb
>>> as a new file format (ideally with a data type sniffer).
>>
>> Thanks.  That is the next step.
>
> I haven't done this myself yet (but I may well need to before long).


I extended Galaxy based on the filename extension and added the datatype to data.py.  This works like a charm, but it obviously isn't foolproof (no sniffer yet).  The PDB format isn't too complicated, but it is flexible, so I need to find out exactly what is required as opposed to what is merely "possible".  I see that Biopython has a class and parser for it, so I might be able to use that rather directly.
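On "required as opposed to possible": Biopython's `Bio.PDB.PDBParser` is the fuller route mentioned above. As a dependency-free illustration of one hard requirement, ATOM and HETATM records are fixed-column, with x, y and z in columns 31-38, 39-46 and 47-54. The sketch below only checks those columns. Treating "at least one parseable coordinate line" as the test is an assumption, and the function names are hypothetical.

```python
def atom_coordinates(line):
    """Return (x, y, z) from an ATOM/HETATM record, or None.

    Uses the fixed columns of the PDB format: x in 31-38, y in 39-46,
    z in 47-54 (1-based, inclusive), i.e. Python slices [30:38], [38:46], [46:54].
    """
    if line[:6] not in ("ATOM  ", "HETATM"):
        return None
    try:
        return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    except ValueError:
        # Short, truncated, or non-numeric coordinate fields.
        return None


def count_atoms(lines):
    """Count ATOM/HETATM lines whose coordinate columns parse as numbers."""
    return sum(1 for line in lines if atom_coordinates(line) is not None)
```

A sniffer could require `count_atoms(handle) > 0` on top of a record-name check. That still accepts coordinate-only files, which may or may not be what the compiled tool tolerates.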

Sean

