---
title: Quick Start Guide
---

# Quick Start Guide

> Create your first Model Evaluation job and score your model's outputs using an automated LLM-as-judge approach. This guide walks through the complete setup flow.

---

## What you need before you start

| Requirement | Details |
|-------------|---------|
| **Account** | Active E2E AI Cloud account with access to Foundation Studio |
| **Dataset** | A dataset containing model outputs to evaluate, in EOS or on Hugging Face |
| **Column names** | Know the column names for input, model output, and optionally the reference/ground-truth answer |
| **Hugging Face token** | Required only if your Hugging Face dataset is private |

Download a sample dataset: here

---

## Step 1: Navigate to Model Evaluation

1. In the TIR Dashboard sidebar, click **Foundation Studio**.
2. From the dropdown, select **Model Evaluation**.
3. You will land on the **Manage Evaluation Jobs** page.

---

## Step 2: Create an evaluation job

Click **Create Job** or the **Click Here** button.

---

## Step 3: Configure the input dataset

On the **Input Dataset** page, fill in the following:

| Field | Description | Example |
|-------|-------------|---------|
| **Job Name** | A clear, descriptive name for the job | `tir-job-12181052011` |
| **Input Column** | Column containing input prompts or questions | `question` |
| **Output Column** | Column containing the model's predicted outputs | `answer` |
| **Reference Answer Column** | *(Optional)* Ground-truth answers for comparison | `expected_answer` |
| **Num Rows Limit** | Maximum rows to evaluate. Use `-1` for no limit | `500` or `-1` |

### Dataset type: EOS Dataset

1. Select **EOS Dataset** as the dataset type.
2. Click **Choose** to browse available datasets.
3. Select the dataset and the specific file to use.

### Dataset type: Hugging Face

1. Select **Hugging Face** as the dataset type.
2. Enter the Hugging Face dataset name.
3. *(Optional)* If the dataset is private, select an existing Hugging Face integration or click **Click Here** to create one. Paste your token and click **Create**.

Click **Next** to proceed.

---

## Step 4: Select the evaluator model

On the **Model Selection** page, configure the evaluator:

| Field | Description |
|-------|-------------|
| **Evaluator Model** | The LLM that will judge your model's outputs |
| **Temperature** | Controls output randomness (range: `0.0–1.0`) |
| **Top-P** | Nucleus sampling probability (range: `0.001–1.0`) |
| **Max Tokens** | Token limit for the evaluator's scoring output |

**Available evaluator models:**

| Model | Best for |
|-------|----------|
| **Llama 3.1 8B Instruct** | General-purpose evaluation with strong instruction-following |

**Parameter guidance:**

| Parameter | Conservative | Creative |
|-----------|-------------|----------|
| Temperature | `0.2` (deterministic) | `1.0` (varied) |
| Top-P | `0.1` (focused) | `1.0` (all tokens) |
| Max Tokens | `512` (short) | `1024` (detailed) |

Click **Next** to proceed.

---

## Step 5: Select the evaluation framework

On the **Framework Selection** page:

1. **Choose an evaluation framework** that matches your task:

   | Framework | Use when |
   |-----------|----------|
   | **Text Summarization** | Your model generates summaries of documents |
   | **General Assistant** | Your model handles general conversation or instruction-following |
   | **Question Answering** | Your model answers factual or context-based questions |
   | **Text Classification** | Your model classifies text into predefined categories |

2. **Model Evaluation Prompt (optional):** Provide additional context or instructions for the evaluator.
   - Example: `Please ensure that the summarization does not introduce fabricated details.`

3. **Select a result dataset** where scores will be stored. Results are saved to a folder named after your job at the root of the selected EOS bucket.
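
Before launching, it can help to check locally that the dataset contains the columns named in Step 3 and that the evaluator parameters fall inside the ranges listed in Step 4. The sketch below is one way to do that in plain Python. It is not part of the platform: the function names (`missing_columns`, `apply_row_limit`, `sampling_errors`) are hypothetical, the rows are assumed to be already loaded as a list of dictionaries, and the sample rows are made up for illustration.

```python
from typing import Dict, List, Optional

# Ranges copied from the Step 4 table.
TEMPERATURE_RANGE = (0.0, 1.0)
TOP_P_RANGE = (0.001, 1.0)

Row = Dict[str, str]


def missing_columns(rows: List[Row], input_col: str, output_col: str,
                    reference_col: Optional[str] = None) -> List[str]:
    """Return configured column names that the first row does not contain."""
    if not rows:
        raise ValueError("dataset has no rows")
    wanted = [input_col, output_col]
    if reference_col is not None:  # the reference column is optional
        wanted.append(reference_col)
    return [name for name in wanted if name not in rows[0]]


def apply_row_limit(rows: List[Row], num_rows_limit: int) -> List[Row]:
    """Mimic the Num Rows Limit field locally: -1 keeps every row."""
    if num_rows_limit == -1:
        return list(rows)
    if num_rows_limit < 0:
        # Assumption of this sketch: other negative values are mistakes.
        raise ValueError("use -1 for no limit, or a non-negative row count")
    return rows[:num_rows_limit]


def sampling_errors(temperature: float, top_p: float) -> List[str]:
    """Return one message per parameter outside its documented range."""
    errors = []
    if not TEMPERATURE_RANGE[0] <= temperature <= TEMPERATURE_RANGE[1]:
        errors.append(f"temperature {temperature} is outside {TEMPERATURE_RANGE}")
    if not TOP_P_RANGE[0] <= top_p <= TOP_P_RANGE[1]:
        errors.append(f"top_p {top_p} is outside {TOP_P_RANGE}")
    return errors


# Illustrative rows using the example column names from Step 3.
sample_rows = [
    {"question": "What is 2 + 2?", "answer": "4", "expected_answer": "4"},
    {"question": "Capital of France?", "answer": "Paris", "expected_answer": "Paris"},
]

print(missing_columns(sample_rows, "question", "answer", "expected_answer"))
print(len(apply_row_limit(sample_rows, -1)))
print(sampling_errors(temperature=0.2, top_p=0.1))
```

An empty list from `missing_columns` and `sampling_errors` suggests the configuration matches the dataset and the documented ranges. How the service itself handles out-of-range values is not covered here.
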
---

## Step 6: Review and launch

Review your configuration on the **Summary** page, then click **Launch**.

The job appears in the **Manage Evaluation Jobs** list. Processing time depends on the dataset size and the evaluator model selected.

---

## Step 7: Monitor and review results

Click the job to view its details:

| Tab | What it shows |
|-----|---------------|
| **Overview** | Job configuration, status, and resource details |
| **Events** | Pod scheduling and container start events |
| **Logs** | Real-time job logs to monitor progress or diagnose issues |
| **Evaluation Results** | Scores across the four framework-specific metrics |

---

## Next steps

- [Features](../Features): Explore all evaluation capabilities and framework metrics in detail.
- [FAQs](../FAQs): Troubleshoot common issues.

---