CWL workflow as Debian package

CWL workflow as Debian package

Steffen Möller
Hello,

I had an exchange with Stian yesterday about which CWL workflow from his
database he would propose as an experience-gathering example. He
proposed the GATK workflow by Farah Zaib Khan et al., as it is a good
one to cite on workflows and reproducibility.

https://doi.org/10.1186/s12859-017-1747-0
https://github.com/skanwal/GATK-CaseStudy/tree/master/CWL

From what I understand, BWA, GATK and the Picard toolkit are already in
Debian (I am not sure about the state of GATK). Stian had also pointed to
https://github.com/h3abionet/h3agatk/blob/master/workflows/GATK/GATK-complete-WES-Workflow-h3abionet.cwl
as a current variant of the same workflow, but then again, I would not
mind starting with a smaller one. Any comments?

The main point for me is to have a small test case for running this
workflow repeatedly. We would hence also need to decide on appropriate
test data at some point. Should we also introduce a package like
"genome-human"?
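For the "run this workflow repeatedly" test case, one simple check is whether two runs produce byte-identical outputs. A minimal sketch, assuming each run writes its outputs into its own directory (e.g. via cwltool's --outdir); the helper names are hypothetical. Checksums use the "sha1$<hex>" form that the CWL specification prescribes for File checksums.

```python
import hashlib
from pathlib import Path


def sha1_checksum(path: Path) -> str:
    """Return a checksum in the 'sha1$<hex>' form used by CWL File objects."""
    h = hashlib.sha1()
    with path.open("rb") as fh:
        # Read in 64 KiB chunks so large BAM/VCF files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return "sha1$" + h.hexdigest()


def output_checksums(run_dir: Path) -> dict:
    """Map each file's path (relative to run_dir) to its checksum."""
    return {
        str(p.relative_to(run_dir)): sha1_checksum(p)
        for p in sorted(run_dir.rglob("*"))
        if p.is_file()
    }


def compare_runs(dir_a: Path, dir_b: Path) -> list:
    """Return relative paths that differ or exist in only one run.

    An empty list means both runs produced identical output trees.
    """
    a, b = output_checksums(dir_a), output_checksums(dir_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Byte identity is a strict criterion: outputs that embed dates or command lines in their headers may differ between runs even when the analysis itself is reproducible, so such files may need normalizing before comparison.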

Best,

Steffen


Re: CWL workflow as Debian package

Michael Crusoe
GATK4 is DFSG-compatible; GATK3 still has a non-DFSG core.

https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl is built entirely on free software, but its reference libraries are quite large.





(In reply to Steffen Möller's message of 2017-08-11 18:57 GMT+03:00.)

--
Michael R. Crusoe
Community Engineer & Co-founder
Common Workflow Language project
https://impactstory.org/u/0000-0002-2961-9670
[hidden email]
+1 480 627 9108

Re: CWL workflow as Debian package

Fabian Klötzl
In reply to this post by Steffen Möller
Hi all,

On 11.08.2017 17:57, Steffen Möller wrote:
> I had an exchange with Stian yesterday about which CWL workflow from his
> database he would propose as an experience-gathering example. He
> proposed the GATK workflow by Farah Zaib Khan et al., as it is a good
> one to cite on workflows and reproducibility.

I think providing CWL workflows in Debian is a great idea. However, I
have another use case in mind: we know that bioinformatics is mostly
converting from one file format to another. Given the EDAM annotations
that have already been added to various packages, one can suggest to the
user tools that do the conversion. If we had CWL tool descriptions, one
would be able to tell the user how to call the tool to achieve the
desired effect. With a fitting workflow, one would be able to do the
conversion automatically.
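The conversion idea above amounts to a path search over formats: each tool is an edge from its input format to its output format, and the shortest chain of edges is a candidate workflow. A minimal sketch, assuming a hypothetical registry that maps made-up tool names to (input, output) EDAM format identifiers; format_1929, format_1930, format_2572 and format_2573 are the EDAM terms for FASTA, FASTQ, BAM and SAM. In practice such a registry would have to be extracted from CWL tool descriptions whose input/output format fields carry EDAM terms.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# EDAM format identifiers.
EDAM_FASTA = "format_1929"
EDAM_FASTQ = "format_1930"
EDAM_BAM = "format_2572"
EDAM_SAM = "format_2573"

# Hypothetical registry: tool name -> (input format, output format).
# Tool names are illustrative only, not real Debian packages.
CONVERTERS: Dict[str, Tuple[str, str]] = {
    "sam-to-bam": (EDAM_SAM, EDAM_BAM),
    "bam-to-fastq": (EDAM_BAM, EDAM_FASTQ),
    "fastq-to-fasta": (EDAM_FASTQ, EDAM_FASTA),
}


def conversion_chain(
    src: str, dst: str, converters: Dict[str, Tuple[str, str]]
) -> Optional[List[str]]:
    """Breadth-first search for the shortest tool sequence turning src into dst.

    Returns a list of tool names, [] if src == dst, or None if no chain exists.
    """
    if src == dst:
        return []
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, path = queue.popleft()
        # Sorted iteration keeps the result deterministic when ties exist.
        for tool, (inp, out) in sorted(converters.items()):
            if inp == fmt and out not in seen:
                if out == dst:
                    return path + [tool]
                seen.add(out)
                queue.append((out, path + [tool]))
    return None
```

For example, conversion_chain(EDAM_SAM, EDAM_FASTA, CONVERTERS) yields the three-step chain sam-to-bam, bam-to-fastq, fastq-to-fasta. Breadth-first search minimizes the number of tool invocations; a real implementation would also need to handle tools with several inputs and to prefer lossless conversions.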

The combination of CWL and EDAM provides some very nice synergies.
However, the lack of annotated tools might make achieving the
functionality described above a big effort. Neither bio.tools nor the
CWL repo contains enough entries to be really useful. Borrowing from the
other discussion, AppStream might be a way to let the author of a tool
provide this metadata, relieving the packagers of some work.

Hope this text was somewhat comprehensible; my thoughts on this are
still in a rough state.

Best,
Fabian

Re: CWL workflow as Debian package

Michael Crusoe
Hey Fabian,

Thank you for your enthusiasm, I agree!

FYI: the easiest way to find a CWL description is to do a Google **and** GitHub search for the tool name along with "cwlVersion" (which is required in all CWL v1+ documents).
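The same "cwlVersion" marker can be used to spot CWL documents in a local checkout. A rough heuristic sketch, assuming one key per line (typical YAML or pretty-printed JSON); it does not parse YAML, and find_cwl_documents is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Matches a YAML key (cwlVersion: v1.0) or a JSON key ("cwlVersion": "v1.0")
# at the start of a line; ':' is allowed in the value for draft-style versions.
_CWL_VERSION = re.compile(
    r'^\s*"?cwlVersion"?\s*:\s*"?([A-Za-z0-9._:-]+)"?', re.MULTILINE
)


def cwl_version(text: str) -> Optional[str]:
    """Return the declared cwlVersion, or None if the text does not declare one."""
    m = _CWL_VERSION.search(text)
    return m.group(1) if m else None


def find_cwl_documents(root: Path) -> Dict[str, str]:
    """Map paths (relative to root) of likely CWL documents to their cwlVersion."""
    found = {}
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.suffix in {".cwl", ".yml", ".yaml", ".json"}:
            version = cwl_version(p.read_text(errors="replace"))
            if version:
                found[str(p.relative_to(root))] = version
    return found
```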

Cheers,

(In reply to Fabian Klötzl's message of 15 Aug 2017, 1:10 p.m.)