This guide shows you how to add a new method to the pipeline.
A method is a technique for solving a specific problem when analysing omics data. Its performance is assessed by comparing it to other methods and to control methods.
Each method is added as a new Viash component, and the examples below cover both Python and R. The Task template repository is used throughout the guide, so replace any occurrence of "task_template" with your task of interest.
Step 1: Create a new component
Use the create_*_method.sh script found in the scripts directory (scripts/create_component) to start creating a new method. Open the script and set the --name parameter to the desired name of the method.
[notice] Checking if Docker image is available at 'ghcr.io/openproblems-bio/core/project/create_component:build_main'
Check inputs
Check language
Check API file
Read API file
Create output dir
Create config
Create script
Done!
This creates a new folder at src/methods/my_python_method containing a Viash config and a script.
tree src/methods/my_python_method
├── script.py Script for running the method.
├── config.vsh.yaml Config file for method.
└── ... Optional additional resources.
scripts/create_component/create_r_method.sh
Check inputs
Check language
Check API file
Read API file
Create output dir
Create config
Create script
Done!
Contents of scripts/create_component/create_r_method.sh
common/scripts/create_component \
  --name my_r_method \
  --language r \
  --type method
This creates a new folder at src/methods/my_r_method containing a Viash config and a script.
tree src/methods/my_r_method
├── script.R Script for running the method.
├── config.vsh.yaml Config file for method.
└── ... Optional additional resources.
Change the value of --name to a unique name for your method. It must match the regex [a-z][a-z0-9_]* (snake_case).
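To check a candidate name before running the script, you can test it against the regex quoted above. The following is a minimal sketch; the helper name is_valid_component_name is hypothetical and not part of the OpenProblems tooling.

```python
import re

# Pattern quoted in the guide: a lowercase letter followed by
# lowercase letters, digits or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_component_name(name: str) -> bool:
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my_python_method", "knn2", "MyMethod", "2fast", "my-method"]:
    print(candidate, is_valid_component_name(candidate))
```

Using fullmatch rather than match matters here: match would accept "my-method" because the prefix "my" already satisfies the pattern.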
A config file contains metadata of the component and the dependencies required to run it. In steps 2 and 3 we will fill in the required information.
A script contains the code to run the method. In step 4 we will edit the script.
Tip
Some tasks have multiple method subtypes (e.g. batch_integration), which will require you to use a different value for --type corresponding to the desired method subtype.
Step 2: Fill in metadata
The Viash config contains metadata of your method, which script is used to run it, and the required dependencies.
Generated config file
This is what the config.vsh.yaml generated by the create_component component looks like:
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_method.yaml

# A unique identifier for your component (required).
# Can contain only lowercase letters or underscores.
name: my_python_method
# A relatively short label, used when rendering visualisations (required)
label: My Python Method
# A one sentence summary of how this method works (required). Used when
# rendering summary tables.
summary: "FILL IN: A one sentence summary of this method."
# A multi-line description of how this component works (required). Used
# when rendering reference documentation.
description: |
  FILL IN: A (multi-line) description of how this method works.
# references:
#   doi:
#     - 10.1000/xx.123456.789
#   bibtex:
#     - |
#       @article{foo,
#         title={Foo},
#         author={Bar},
#         journal={Baz},
#         year={2024}
#       }
links:
  # URL to the documentation for this method (required).
  documentation: https://url.to/the/documentation
  # URL to the code repository for this method (required).
  repository: https://github.com/organisation/repository

# Metadata for your component
info:
  # Which normalisation method this component prefers to use (required).
  preferred_normalization: log_cp10k

# Component-specific parameters (optional)
# arguments:
#   - name: "--n_neighbors"
#     type: "integer"
#     default: 5
#     description: Number of neighbors to use.

# Resources required to run the component
resources:
  # The script of your component (required)
  - type: python_script
    path: script.py
  # Additional resources your script needs (optional)
  # - type: file
  #   path: weights.pt

engines:
  # Specifications for the Docker image for this component.
  - type: docker
    image: openproblems/base_python:1.0.0
    # Add custom dependencies here (optional). For more information, see
    # https://viash.io/reference/config/engines/docker/#setup .
    # setup:
    #   - type: python
    #     packages: numpy<2

runners:
  # This platform allows running the component natively
  - type: executable
  # Allows turning the component into a Nextflow module / pipeline.
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
Contents of config.vsh.yaml
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_method.yaml

# A unique identifier for your component (required).
# Can contain only lowercase letters or underscores.
name: my_r_method
# A relatively short label, used when rendering visualisations (required)
label: My R Method
# A one sentence summary of how this method works (required). Used when
# rendering summary tables.
summary: "FILL IN: A one sentence summary of this method."
# A multi-line description of how this component works (required). Used
# when rendering reference documentation.
description: |
  FILL IN: A (multi-line) description of how this method works.
# references:
#   doi:
#     - 10.1000/xx.123456.789
#   bibtex:
#     - |
#       @article{foo,
#         title={Foo},
#         author={Bar},
#         journal={Baz},
#         year={2024}
#       }
links:
  # URL to the documentation for this method (required).
  documentation: https://url.to/the/documentation
  # URL to the code repository for this method (required).
  repository: https://github.com/organisation/repository

# Metadata for your component
info:
  # Which normalisation method this component prefers to use (required).
  preferred_normalization: log_cp10k

# Component-specific parameters (optional)
# arguments:
#   - name: "--n_neighbors"
#     type: "integer"
#     default: 5
#     description: Number of neighbors to use.

# Resources required to run the component
resources:
  # The script of your component (required)
  - type: r_script
    path: script.R
  # Additional resources your script needs (optional)
  # - type: file
  #   path: weights.pt

engines:
  # Specifications for the Docker image for this component.
  - type: docker
    image: openproblems/base_r:1.0.0
    # Add custom dependencies here (optional). For more information, see
    # https://viash.io/reference/config/engines/docker/#setup .
    # setup:
    #   - type: r
    #     packages: tibble

runners:
  # This platform allows running the component natively
  - type: executable
  # Allows turning the component into a Nextflow module / pipeline.
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
Required metadata fields
Please edit the following fields in the config file to fill in the necessary metadata.
.__merge__: The API specifies which type of component this is. It contains specifications for:
The input/output files
Common parameters
A unit test
.name: A unique identifier. Can only contain lowercase letters, numbers or underscores.
.label: A unique, human-readable, short label. Used for creating summary tables and visualisations.
.summary: A one sentence summary of purpose and methodology. Used for creating overview tables.
.description: A longer description (one or more paragraphs). Used for creating reference documentation and supplementary information.
Step 3: Add dependencies
Each component has its own set of dependencies, because different components might have conflicting dependencies.
base images
For your convenience, we have created several base images that can be used for Python or R scripts. These images can be found in the OpenProblems Docker repository. Click on the packages to view the URL you need to use. You are not required to use these images, but if you use a different image, install the required packages to make sure OpenProblems works properly.
openproblems/base_python: Base image for Python scripts.
openproblems/base_r: Base image for R scripts.
openproblems/base_pytorch_nvidia: Base image for scripts that use PyTorch with NVIDIA GPU support.
openproblems/base_tensorflow_nvidia: Base image for scripts that use TensorFlow with NVIDIA GPU support.
custom image
Update the setup definition in the engines section of the config file. This definition lists the packages that need to be installed in the Docker image for your method to run.
If you’re using a custom image, use the following minimum setup:
Step 4: Edit script
The required sections of the script are explained here in more detail:
a. Imports and libraries
In the top section of the script you can define which packages/libraries the method needs. If you add a new or different package, also add the dependency to config.vsh.yaml in the setup field (see above).
b. Argument block
The Viash code block is designed to facilitate prototyping by letting you execute the script directly with python script.py (or Rscript script.R for R users). Note that anything between “VIASH START” and “VIASH END” will be removed and replaced with a CLI argument parser when the component is built by Viash.
Here, the par dictionary contains all the arguments defined in the config.vsh.yaml file (including those from the file referenced in __merge__). When adding an argument to the par dict, also add it to the arguments section of config.vsh.yaml.
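As an illustration, an argument block in a Python script could look like the sketch below. The keys input_train, input_test and output mirror the flags of the viash run command at the end of this guide, and n_neighbors is the optional example argument from the generated config; whether your task uses these exact names depends on its API file, so treat them as assumptions.

```python
## VIASH START
# Everything between the VIASH START and VIASH END markers is replaced
# by a generated argument parser when Viash builds the component.
# The values below are placeholders for local prototyping only.
par = {
    "input_train": "resources_test/task_template/cxg_mouse_pancreas_atlas/train.h5ad",
    "input_test": "resources_test/task_template/cxg_mouse_pancreas_atlas/test.h5ad",
    "output": "output.h5ad",
    # Assumed component-specific argument; it would also need an entry
    # in the arguments section of config.vsh.yaml.
    "n_neighbors": 5,
}
## VIASH END

# The rest of the script only reads values from par.
print(f"Training data: {par['input_train']}")
print(f"Writing results to: {par['output']}")
```

Keeping every tunable value in par, rather than hard-coding it further down, means the same script runs unchanged both locally and as a built component.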
c. Read input data
This section reads any input AnnData files passed to the component.
d. Generate results
This is the most important section of your script, as it defines the core functionality provided by the component. It processes the input data to create results for the particular task at hand.
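What goes in this section depends entirely on the task. As a purely illustrative sketch, assume a label-prediction task like the one in the test output in this guide, where the method must produce a label_pred value for each test observation. The function below is a hypothetical majority-vote baseline on plain Python lists, not the method used in the task template; a real component would operate on the AnnData objects read in the previous section.

```python
from collections import Counter


def predict_majority_label(train_labels, n_test):
    """Predict the most frequent training label for every test observation."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    # most_common(1) returns [(label, count)] for the top label.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_test


# Toy data standing in for the labels of the training set.
train_labels = ["B cell", "T cell", "T cell", "NK cell", "T cell"]
label_pred = predict_majority_label(train_labels, n_test=3)
print(label_pred)
```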
e. Write output data to file
The output is stored in an AnnData object and then written to an .h5ad file. The format is defined by the API file referenced in the __merge__ field of the config file.
Step 5: Add resources (optional)
It is possible to add additional resources such as a file containing helper functions or other resources. Please visit this page for more information on how to do this.
Step 6: Try component
Your component’s API file contains the necessary unit tests to check whether your component works and the output is in the correct format.
You can test your component by using the following command:
viash test src/methods/my_python_method/config.vsh.yaml
Output
Running tests in temporary directory: '/tmp/viash_test_logistic_regression_6709186558598111404'
====================================================================
+/tmp/viash_test_logistic_regression_6709186558598111404/build_engine_environment/logistic_regression ---verbosity 6 ---setup cachedbuild ---engine docker
[notice] Building container 'ghcr.io/openproblems-bio/task_template/methods/logistic_regression:test' with Dockerfile
[info] docker build -t 'ghcr.io/openproblems-bio/task_template/methods/logistic_regression:test' '/tmp/viash_test_logistic_regression_6709186558598111404/build_engine_environment' -f '/tmp/viash_test_logistic_regression_6709186558598111404/build_engine_environment/tmp/dockerbuild-logistic_regression-TP2Ilp/Dockerfile'
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 571B done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/openproblems/base_python:1.0.0
#2 DONE 0.1s
#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s
#4 [1/2] FROM docker.io/openproblems/base_python:1.0.0@sha256:6c094ecee24215a2aa6e268beec6a1473a80e86c18ea3468e4357c004f343e15
#4 DONE 0.0s
#5 [2/2] RUN pip install --upgrade pip && pip install --upgrade --no-cache-dir "scikit-learn"
#5 CACHED
#6 exporting to image
#6 exporting layers done
#6 writing image sha256:c6dd3882c29aac46222318f46c9f0213ebeb46840ee0c72949eb2d5694c707a5 done
#6 naming to ghcr.io/openproblems-bio/task_template/methods/logistic_regression:test done
#6 DONE 0.0s
====================================================================
+/tmp/viash_test_logistic_regression_6709186558598111404/test_run_and_check_output/test_executable
>> Running test 'run'
>> Checking whether input files exist
>> Running script as test
Reading input files
Preprocess data
Train model
Generate predictions
Write output AnnData to file
>> Checking whether output file exists
>> Reading h5ad files and checking formats
Reading and checking output
AnnData object with n_obs × n_vars = 123 × 0
obs: 'label_pred'
uns: 'dataset_id', 'method_id', 'normalization_id'
All checks succeeded!
====================================================================
+/tmp/viash_test_logistic_regression_6709186558598111404/test_check_config/test_executable
Load config data
Check .namespace
Check .info.type
Check component metadata
Check references fields
Checking contents of .info.preferred_normalization
Check Nextflow runner
All checks succeeded!
====================================================================
SUCCESS! All 2 out of 2 test scripts succeeded!
Cleaning up temporary directory
Visit “Run tests” for more information on running unit tests and how to interpret common error messages.
You can also run your component on local files using the viash run command. For example:
viash run src/methods/my_python_method/config.vsh.yaml -- \
  --input_train resources_test/task_template/cxg_mouse_pancreas_atlas/train.h5ad \
  --input_test resources_test/task_template/cxg_mouse_pancreas_atlas/test.h5ad \
  --output output.h5ad
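If you prefer to launch the run from a Python session, the same command can be assembled as an argument list. This is a minimal sketch that reuses the paths from the command above; it only prints the command, and actually executing it assumes viash is installed and on your PATH.

```python
import shlex

config = "src/methods/my_python_method/config.vsh.yaml"
data_dir = "resources_test/task_template/cxg_mouse_pancreas_atlas"

# Arguments before "--" go to viash itself; arguments after it go to the component.
command = [
    "viash", "run", config, "--",
    "--input_train", f"{data_dir}/train.h5ad",
    "--input_test", f"{data_dir}/test.h5ad",
    "--output", "output.h5ad",
]

# Print the shell-quoted command. To execute it, pass the list to
# subprocess.run(command, check=True) on a machine where viash is available.
print(shlex.join(command))
```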