---
title: Quick Start Guide
---

# Quick Start Guide

> Create your first Model Evaluation job and score your model's outputs using an automated LLM-as-judge approach. This guide walks through the complete setup flow.

---

## What you need before you start

| Requirement | Details |
|-------------|---------|
| **Account** | Active E2E AI Cloud account with access to Foundation Studio |
| **Dataset** | A dataset containing model outputs to evaluate, in EOS or on Hugging Face |
| **Column names** | The names of the columns that hold the input, the model output, and (optionally) the reference or ground-truth answer |
| **Hugging Face token** | Required only if your Hugging Face dataset is private |

<a href="/data/sample.json" download="dataset.json" style={{ textDecoration: "none" }}>Download a sample dataset</a> to follow along.

---

## Step 1: Navigate to Model Evaluation

1. In the TIR Dashboard sidebar, click **Foundation Studio**.
2. From the dropdown, select **Model Evaluation**.
3. You will land on the **Manage Evaluation Jobs** page.

---

## Step 2: Create an evaluation job

On the **Manage Evaluation Jobs** page, click **Create Job** or the **Click Here** button.

---

## Step 3: Configure the input dataset

On the **Input Dataset** page, fill in the following:

| Field | Description | Example |
|-------|-------------|---------|
| **Job Name** | A clear, descriptive name for the job | `tir-job-12181052011` |
| **Input Column** | Column containing input prompts or questions | `question` |
| **Output Column** | Column containing the model's predicted outputs | `answer` |
| **Reference Answer Column** | *(Optional)* Ground-truth answers for comparison | `expected_answer` |
| **Num Rows Limit** | Maximum rows to evaluate. Use `-1` for no limit | `500` or `-1` |
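
Before creating the job, it can help to confirm locally that your file contains the columns you plan to enter above. The following is a minimal sketch, not part of the platform: it assumes the dataset is a JSON array of objects, uses the example column names from the table (`question`, `answer`, `expected_answer`), and introduces two hypothetical helpers, `missing_columns` and `apply_row_limit`. The `-1` handling mirrors the **Num Rows Limit** description; how the platform itself selects rows is not specified here.

```python
import json


def apply_row_limit(rows, num_rows_limit):
    """Mirror the Num Rows Limit field: -1 keeps every row, otherwise keep the first N."""
    if num_rows_limit == -1:
        return list(rows)
    if num_rows_limit < 0:
        raise ValueError("num_rows_limit must be -1 or a non-negative integer")
    return list(rows[:num_rows_limit])


def missing_columns(rows, required):
    """Return the required column names absent from at least one row, sorted."""
    missing = set()
    for row in rows:
        missing.update(col for col in required if col not in row)
    return sorted(missing)


# Hypothetical records for illustration. With a real file you would load it
# instead, for example: rows = json.load(open("dataset.json"))
sample_json = """
[
  {"question": "What is 2 + 2?", "answer": "4", "expected_answer": "4"},
  {"question": "Capital of France?", "answer": "Paris"}
]
"""
rows = json.loads(sample_json)

# Input and output columns are present in every row.
print(missing_columns(rows, ["question", "answer"]))
# The optional reference column is missing from the second row.
print(missing_columns(rows, ["question", "answer", "expected_answer"]))
# -1 keeps all rows; a positive limit keeps only the first N.
print(len(apply_row_limit(rows, -1)), len(apply_row_limit(rows, 1)))
```

If the check reports a missing column, correct the file or the column names before you continue, rather than discovering the mismatch after the job has launched.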

### Dataset type: EOS Dataset

1. Select **EOS Dataset** as the dataset type.
2. Click **Choose** to browse available datasets.
3. Select the dataset and the specific file to use.

### Dataset type: Hugging Face

1. Select **Hugging Face** as the dataset type.
2. Enter the Hugging Face dataset name.
3. *(Optional)* If the dataset is private, select an existing Hugging Face integration or click **Click Here** to create one. Paste your token and click **Create**.

Click **Next** to proceed.

---

## Step 4: Select the evaluator model

On the **Model Selection** page, configure the evaluator:

| Field | Description |
|-------|-------------|
| **Evaluator Model** | The LLM that will judge your model's outputs |
| **Temperature** | Controls output randomness (range: `0.0–1.0`) |
| **Top-P** | Nucleus sampling probability (range: `0.001–1.0`) |
| **Max Tokens** | Token limit for the evaluator's scoring output |

**Available evaluator models:**

| Model | Best for |
|-------|----------|
| **Llama 3.1 8B Instruct** | General-purpose evaluation with strong instruction-following |

**Parameter guidance:**

| Parameter | Conservative | Creative |
|-----------|-------------|----------|
| Temperature | `0.2` (more consistent, repeatable scores) | `1.0` (varied) |
| Top-P | `0.1` (focused) | `1.0` (all tokens) |
| Max Tokens | `512` (short) | `1024` (detailed) |
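
If you record your chosen settings outside the UI (for example, in a runbook), a small range check can catch typos before you enter them. The sketch below is illustrative only: `EvaluatorParams` is a hypothetical class, not a platform API. The temperature and Top-P ranges and the two presets are copied from the tables above; the requirement that Max Tokens be positive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorParams:
    """Hypothetical container for the sampling fields on the Model Selection page."""

    temperature: float
    top_p: float
    max_tokens: int

    def __post_init__(self):
        # Ranges taken from the field table in this guide.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError(f"temperature {self.temperature} is outside 0.0 to 1.0")
        if not 0.001 <= self.top_p <= 1.0:
            raise ValueError(f"top_p {self.top_p} is outside 0.001 to 1.0")
        # Assumption: the guide gives no explicit range for Max Tokens.
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")


# Presets copied from the parameter guidance table.
CONSERVATIVE = EvaluatorParams(temperature=0.2, top_p=0.1, max_tokens=512)
CREATIVE = EvaluatorParams(temperature=1.0, top_p=1.0, max_tokens=1024)

print(CONSERVATIVE)
print(CREATIVE)
```

For scoring tasks, the conservative preset is usually the safer starting point, since lower temperature tends to make repeated evaluations of the same row agree more closely.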

Click **Next** to proceed.

---

## Step 5: Select the evaluation framework

On the **Framework Selection** page:

1. **Choose an evaluation framework** that matches your task:

   | Framework | Use when |
   |-----------|----------|
   | **Text Summarization** | Your model generates summaries of documents |
   | **General Assistant** | Your model handles general conversation or instruction-following |
   | **Question Answering** | Your model answers factual or context-based questions |
   | **Text Classification** | Your model classifies text into predefined categories |

2. **Model Evaluation Prompt (optional):** Provide additional context or instructions for the evaluator.
   - Example: `Please ensure that the summarization does not introduce fabricated details.`

3. **Select a result dataset** where scores will be stored. Results are saved to a folder named after your job at the root of the selected EOS bucket.

---

## Step 6: Review and launch

Review your configuration on the **Summary** page, then click **Launch**.

The job appears in the **Manage Evaluation Jobs** list. Processing time depends on the dataset size and the evaluator model you selected.

---

## Step 7: Monitor and review results

Click on the job to view details:

| Tab | What it shows |
|-----|---------------|
| **Overview** | Job configuration, status, and resource details |
| **Events** | Pod scheduling and container start events |
| **Logs** | Real-time job logs to monitor progress or diagnose issues |
| **Evaluation Results** | Scores for the four metrics specific to the selected framework |

---

## Next steps

- [Features](../Features) — Explore all evaluation capabilities and framework metrics in detail.
- [FAQs](../FAQs) — Troubleshoot common issues.


---