Features

Feature Overview

1. Argo & Kubeflow Support

TIR Pipelines support two industry-standard template formats:

  • Argo Workflows — Define multi-step workflows using Argo's YAML specification. Supports DAG-based and step-based execution.
  • Kubeflow Pipelines — Use Kubeflow's pipeline SDK to define ML workflows.

When you upload a YAML file, TIR automatically detects the pipeline type (Argo, Kubeflow, or generic). The YAML is validated on upload — containerSet templates are not supported.
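As an illustration, a minimal Argo Workflow that TIR would detect as an Argo pipeline might look like the sketch below. The workflow name, template name, and image are placeholders, not TIR-specific requirements:

```yaml
# Hypothetical minimal Argo Workflow; names and image are illustrative only.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-pipeline-    # placeholder name
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: python:3.11-slim    # any public image
        command: [python, -c]
        args: ["print('hello from a TIR pipeline')"]
```

Note that this uses a plain container template; containerSet templates are rejected at upload.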

2. Pipeline Versioning

Each pipeline supports multiple versions, allowing you to track iterative changes to your workflow definition.

How versioning works:

  • When you create a pipeline, the first uploaded YAML becomes the default version.
  • You can upload additional versions under the same pipeline by clicking CREATE VERSION in the Pipeline Versions tab.
  • Each version is an independent YAML upload with its own version_id.
  • One version is marked as the default version for the pipeline.
  • Runs can be created against any specific version.

Managing versions:

  • View versions: Select a pipeline and go to the Pipeline Versions tab.
  • Create a version: Click CREATE VERSION, upload a .yaml file, and click UPLOAD.
  • Delete a version: Click the delete icon next to a version. This terminates all active runs for that version before deletion.

3. Pipeline Actions

From the pipeline listing page, the Actions menu provides:

  • Create Version — Upload a new .yaml version for the selected pipeline.
  • Delete — Permanently delete the pipeline and all associated versions and runs.
Warning: Deleting a pipeline deletes all associated versions and runs. This action cannot be undone.

4. Runs & Experiments

A Run is a single execution of a pipeline version. Each run uses a selected resource plan (CPU or GPU) and executes the workflow defined in the YAML.

Creating a run:

  1. Navigate to Pipelines > Run, or click Create Run from a specific pipeline or version.
  2. Select or create an Experiment to organize the run.
  3. Choose the pipeline version and configure any run parameters.
  4. Select a resource plan and click FINISH.

Experiments are containers that group related pipelines and their run histories. Use them to organize runs by project phase, model type, or any logical grouping.

Run actions:

  • Retry — Restart a failed run without losing completed work. Uses the PUT /runs/{run_id}/?action=retry endpoint.
  • Terminate — Stop a running execution immediately. Uses the PUT /runs/{run_id}/?action=terminate endpoint.
  • Delete — Remove a run. Terminates it first if still active, then releases allocated resources.

Viewing run details:

  • Click the run name to see execution details, workflow manifests, pod status, and progress.
  • Run states include: pending, running, succeeded, failed, and terminated.

5. Scheduled Runs

Scheduled runs automate pipeline execution at specific times or recurring intervals.

Creating a scheduled run:

  1. During run creation, enable the Schedule Run toggle.
  2. Configure the schedule:
    • Cron expression — Define a recurring pattern (e.g., 0 0 * * * for daily at midnight).
    • Start time / End time — Optional time boundaries for the schedule.
    • Max concurrency — Limit the number of simultaneous runs from this schedule.
  3. Select a resource plan and click CREATE.
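For comparison, Argo's native CronWorkflow spec expresses the same scheduling concepts (cron expression, concurrency limit). This is shown only to illustrate the fields; TIR configures these through the console rather than raw YAML, and all names here are placeholders:

```yaml
# Illustrative Argo CronWorkflow; TIR's console schedule maps to similar fields.
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-batch          # placeholder name
spec:
  schedule: "0 0 * * *"        # daily at midnight
  concurrencyPolicy: Forbid    # skip a run if the previous one is still active
  workflowSpec:
    entrypoint: main
    templates:
      - name: main
        container:
          image: python:3.11-slim
          command: [python, -c]
          args: ["print('scheduled run')"]
```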

Managing scheduled runs:

  • Navigate to Pipelines > Scheduled Run to view all scheduled jobs.
  • Enable/Disable — Toggle a schedule on or off without deleting it.
  • Delete — Permanently remove a scheduled job. The schedule is disabled first, then deleted.
  • View related runs — See all runs triggered by a specific scheduled job.

6. Docker Image Execution

Run custom Docker images as pipeline workflows by defining an Argo Workflow YAML with your container image, command, and arguments.

  • Supports both public and private images (private images require imagePullSecrets).
  • Customize the entrypoint and arguments for your container.

For complete instructions, YAML templates, and ImagePullSecret setup, see the Docker Run Guide.
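A sketch of such a workflow for a private image is shown below. The registry, image name, and secret name (`regcred`) are assumptions for illustration; the Docker Run Guide has the exact template:

```yaml
# Sketch: run a private Docker image as an Argo Workflow step.
# registry.example.com, my-app, and regcred are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: docker-run-
spec:
  entrypoint: main
  imagePullSecrets:
    - name: regcred                 # secret holding registry credentials
  templates:
    - name: main
      container:
        image: registry.example.com/my-app:latest
        command: ["/bin/sh", "-c"]  # custom entrypoint
        args: ["./run.sh --mode batch"]
```

Public images need no `imagePullSecrets`; the rest of the template is the same.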

7. Data Transfer (EOS/PFS)

Transfer data between EOS object storage and the Parallel File System (PFS) using pre-built Argo Workflow templates.

  • PFS to EOS — Upload files from your filesystem to an EOS bucket.
  • EOS to PFS — Download data from an EOS bucket into your PFS filesystem.

Both workflows are available as downloadable YAML files that you can upload as pipelines.

For step-by-step instructions and YAML downloads, see the Data Transfer Guide.
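As a rough sketch of what a PFS-to-EOS template does — the CLI tool, bucket name, endpoint, mount path, and secret name below are all assumptions; use the downloadable YAML from the guide for actual transfers:

```yaml
# Sketch: copy files from a PFS mount to an EOS bucket.
# The aws CLI, bucket, endpoint, /mnt/pfs path, and eos-creds secret are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pfs-to-eos-
spec:
  entrypoint: transfer
  templates:
    - name: transfer
      container:
        image: amazon/aws-cli:latest
        command: ["/bin/sh", "-c"]
        args:
          - aws s3 cp /mnt/pfs/output s3://my-eos-bucket/output
            --recursive --endpoint-url https://eos.example.com
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef: {name: eos-creds, key: access_key}
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef: {name: eos-creds, key: secret_key}
```

The EOS-to-PFS direction reverses the source and destination of the copy.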

8. Scalable & Reliable Execution

TIR Pipelines are built for production ML workloads:

  • Serverless execution — No infrastructure to manage; runs execute on demand.
  • Asynchronous processing — Pipelines run in the background; monitor progress via the dashboard or API.
  • Best-in-class retry — Restart failed jobs without losing completed work.
  • Unlimited re-runs — Execute a pipeline version as many times as needed.
  • Stored results — Run artifacts and logs are stored in EOS buckets.
  • CPU & GPU plans — Choose the right resource plan for each step of your workflow.

Best Practices

Pipeline Design

  • Keep YAML definitions clean and simple — avoid introducing extra nodes or commands that may conflict.
  • Use pipeline versioning to track iterative changes rather than overwriting existing pipelines.
  • Use experiments to organize related runs by project phase or model type.

Resource Optimization

  • Choose the right resource plan (CPU vs GPU) for your workload.
  • Use CPU plans for data preprocessing steps; reserve GPU plans for training.
  • Set appropriate max_concurrency on scheduled runs to avoid resource contention.

Reliability

  • Leverage the retry mechanism to resume failed jobs without restarting from scratch.
  • Store intermediate results in EOS buckets to avoid recomputation.
  • Use scheduled runs for recurring batch jobs to automate execution.
