---
title: "Features"
description: "Explore Pipeline features and capabilities"
---

import { PipelineFeaturesNav, PipelineBestPracticesCard } from './PipelineFeaturesCards'

# Features

## Feature Overview

### 1. Argo & Kubeflow Support

TIR Pipelines support two industry-standard template formats:

- **Argo Workflows** — Define multi-step workflows using Argo's YAML specification. Supports both DAG-based and step-based execution.
- **Kubeflow Pipelines** — Use the Kubeflow Pipelines SDK to define ML workflows.

When you upload a YAML file, TIR automatically detects the pipeline type (Argo, Kubeflow, or generic). The YAML is validated on upload, and templates that use `containerSet` are not supported.

### 2. Pipeline Versioning

Each pipeline supports multiple versions, so you can track iterative changes to your workflow definition.

**How versioning works:**

- When you create a pipeline, the first uploaded YAML becomes the default version.
- You can upload additional versions under the same pipeline by clicking **CREATE VERSION** in the **Pipeline Versions** tab.
- Each version is an independent YAML upload with its own `version_id`.
- One version is marked as the **default version** for the pipeline.
- Runs can be created against any specific version, not only the default.

**Managing versions:**

- **View versions:** Select a pipeline and go to the **Pipeline Versions** tab.
- **Create a version:** Click **CREATE VERSION**, upload a `.yaml` file, and click **UPLOAD**.
- **Delete a version:** Click the delete icon next to a version. All active runs for that version are terminated before the version is deleted.

### 3. Pipeline Actions

From the pipeline listing page, the **Actions** menu provides:

- **Create Version** — Upload a new `.yaml` version for the selected pipeline.
- **Delete** — Permanently delete the pipeline and all associated versions and runs.

:::warning
Deleting a pipeline soft-deletes all associated versions and runs. This action cannot be undone.
:::

### 4. Runs & Experiments

A **Run** is a single execution of a pipeline version. Each run uses a selected resource plan (CPU or GPU) and executes the workflow defined in that version's YAML.

**Creating a run:**

1. Navigate to **Pipelines** > **Run**, or click **Create Run** from a specific pipeline or version.
2. Select or create an **Experiment** to organize the run.
3. Choose the pipeline version and configure any run parameters.
4. Select a resource plan and click **FINISH**.

**Experiments** are containers that group related pipelines and their run histories. Use them to organize runs by project phase, model type, or any other logical grouping.

**Run actions:**

- **Retry** — Restart a failed run without losing completed work. Uses the `PUT /runs/{run_id}/?action=retry` endpoint.
- **Terminate** — Stop a running execution immediately. Uses the `PUT /runs/{run_id}/?action=terminate` endpoint.
- **Delete** — Remove a run. If the run is still active, it is terminated first, and its allocated resources are then released.

**Viewing run details:**

- Click the run name to see execution details, workflow manifests, pod status, and progress.
- Run states include pending, running, succeeded, failed, and terminated.

### 5. Scheduled Runs

Scheduled runs automate pipeline execution at specific times or at recurring intervals.

**Creating a scheduled run:**

1. During run creation, enable the **Schedule Run** toggle.
2. Configure the schedule:
   - **Cron expression** — Define a recurring pattern (e.g., `0 0 * * *` for daily at midnight).
   - **Start time / End time** — Optional time boundaries for the schedule.
   - **Max concurrency** — Limit the number of simultaneous runs from this schedule.
3. Select a resource plan and click **CREATE**.

**Managing scheduled runs:**

- Navigate to **Pipelines** > **Scheduled Run** to view all scheduled jobs.
- **Enable/Disable** — Toggle a schedule on or off without deleting it.
- **Delete** — Permanently remove a scheduled job. The schedule is disabled first, then deleted.
- **View related runs** — See all runs triggered by a specific scheduled job.

### 6. Docker Image Execution

Run custom Docker images as pipeline workflows by defining an Argo Workflow YAML with your container image, command, and arguments.

- Supports both public and private images (private images require `imagePullSecrets`).
- Customize the entrypoint and arguments for your container.

For complete instructions, YAML templates, and ImagePullSecret setup, see the [Docker Run Guide](/docs/tir/Pipeline/DockerRun).

### 7. Data Transfer (EOS/PFS)

Transfer data between EOS object storage and a Parallel File System (PFS) using pre-built Argo Workflow templates.

- **PFS to EOS** — Upload files from your filesystem to an EOS bucket.
- **EOS to PFS** — Download data from an EOS bucket into your PFS filesystem.

Both workflows are available as downloadable YAML files that you can upload as pipelines. For step-by-step instructions and YAML downloads, see the [Data Transfer Guide](/docs/tir/Pipeline/DataTransferGuide).

### 8. Scalable & Reliable Execution

TIR Pipelines are built for production ML workloads:

| Capability | Description |
| :--- | :--- |
| **Serverless execution** | No infrastructure to manage; runs execute on demand. |
| **Asynchronous processing** | Pipelines run in the background; monitor them via the dashboard or API. |
| **Reliable retry** | Restart failed jobs without losing completed work. |
| **Unlimited re-runs** | Execute a pipeline version as many times as needed. |
| **Stored results** | Run artifacts and logs are stored in EOS buckets. |
| **CPU & GPU plans** | Choose the right resource plan for each step of your workflow. |

## Best Practices

#### Pipeline Design

- Keep YAML definitions clean and simple; avoid adding extra nodes or commands that may conflict with each other.
- Use pipeline versioning to track iterative changes rather than overwriting existing pipelines.
- Use experiments to organize related runs by project phase or model type.
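Because TIR validates uploads and does not support `containerSet` templates, a quick local check before uploading can keep an Argo definition simple and catch that problem early. The Python sketch below builds a minimal single-step Argo Workflow as a plain dictionary and lists any templates that use `containerSet`. The helper names `minimal_workflow` and `find_container_set_templates`, the image, and the `generateName` prefix are hypothetical; the manifest fields follow the public Argo Workflow spec rather than any TIR-specific format.

```python
import json
from typing import Any


def minimal_workflow(image: str, command: list[str], args: list[str]) -> dict[str, Any]:
    """Build a single-step Argo Workflow manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "example-pipeline-"},  # hypothetical name prefix
        "spec": {
            "entrypoint": "main",  # must match a template name below
            "templates": [
                {
                    "name": "main",
                    "container": {"image": image, "command": command, "args": args},
                }
            ],
        },
    }


def find_container_set_templates(workflow: dict[str, Any]) -> list[str]:
    """Return the names of templates that use `containerSet`, which TIR does not support."""
    templates = workflow.get("spec", {}).get("templates", [])
    return [t.get("name", "<unnamed>") for t in templates if "containerSet" in t]


if __name__ == "__main__":
    wf = minimal_workflow("python:3.11-slim", ["python", "-c"], ["print('hello')"])
    unsupported = find_container_set_templates(wf)
    if unsupported:
        raise SystemExit(f"Remove containerSet templates before upload: {unsupported}")
    # JSON is a subset of YAML 1.2; whether TIR accepts JSON-style YAML is an assumption.
    print(json.dumps(wf, indent=2))
```

Printing JSON avoids a third-party dependency. If the uploader expects block-style YAML, converting the dictionary with a YAML library (for example PyYAML's `yaml.safe_dump`) before saving the `.yaml` file may be the safer choice.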
#### Resource Optimization

- Choose the right resource plan (CPU vs. GPU) for your workload.
- Use CPU plans for data preprocessing steps; reserve GPU plans for training.
- Set an appropriate `max_concurrency` on scheduled runs to avoid resource contention.

#### Reliability

- Use the retry mechanism to resume failed jobs without restarting from scratch.
- Store intermediate results in EOS buckets to avoid recomputation.
- Use scheduled runs to automate recurring batch jobs.
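The Retry and Terminate actions map to `PUT /runs/{run_id}/?action=retry` and `PUT /runs/{run_id}/?action=terminate`. The following standard-library Python sketch builds such a request without sending it and adds a simple policy that retries only runs in the `failed` state, up to a fixed limit. The base URL, the bearer-token `Authorization` header, the lowercase state strings, and the helper names are all assumptions for illustration; check the TIR API reference for the real host, path prefix, and authentication scheme.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL: the real TIR API host and path prefix are not given here.
BASE_URL = "https://api.example.com/pipelines"
# Actions documented for PUT /runs/{run_id}/?action=...
RUN_ACTIONS = {"retry", "terminate"}


def build_run_action_request(run_id: str, action: str, token: str) -> Request:
    """Build (but do not send) a PUT request for a run action."""
    if action not in RUN_ACTIONS:
        raise ValueError(f"unsupported run action: {action!r}")
    # Escape the run ID so it is safe to place in a single URL path segment.
    path = "/runs/" + quote(run_id, safe="") + "/"
    query = urlencode({"action": action})
    # Bearer-token auth is an assumption; substitute TIR's actual scheme.
    return Request(
        f"{BASE_URL}{path}?{query}",
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


def should_retry(state: str, attempts: int, max_attempts: int = 3) -> bool:
    """Retry only failed runs, and only while under the attempt limit."""
    return state.lower() == "failed" and attempts < max_attempts


if __name__ == "__main__":
    req = build_run_action_request("run-123", "retry", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Capping attempts keeps a persistently failing job from consuming resources indefinitely; succeeded, terminated, and in-progress runs are left alone.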