
Model Evaluation

Assess the quality of any LLM's outputs using automated LLM-as-judge scoring. Choose an evaluation framework, connect your dataset, and get structured quality metrics — no manual annotation required.

LLM-as-Judge · Text Summarization · Question Answering · General Assistant · Text Classification

Quick Start


What can you do with Model Evaluation?

Evaluate model outputs from any LLM using an automated LLM-as-judge approach

Score outputs on four framework-specific metrics per evaluation run

Use datasets from EOS storage or Hugging Face as input

Use Llama 3.1 8B Instruct as the evaluator model (LLM-as-judge)

Store structured results in your EOS bucket for download and analysis

Manage jobs with Retry, Terminate, and Delete actions

Key Characteristics

Approach

LLM-as-Judge Scoring

A capable evaluator LLM scores each output against framework-specific criteria — no manual annotation required at scale.
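To make the LLM-as-judge approach concrete, here is a minimal sketch of the pattern: build a prompt asking the evaluator model to rate one output against a single criterion, then parse the numeric score from its reply. The prompt wording, the 1-5 scale, and the `Score: <n>` reply format are illustrative assumptions, not TIR's actual judge prompt.

```python
import re
from typing import Optional

def build_judge_prompt(criterion: str, question: str, answer: str) -> str:
    """Assemble a minimal judge prompt asking the evaluator LLM to rate
    one output against a single criterion on a 1-5 scale (illustrative)."""
    return (
        f"You are an impartial evaluator. Rate the ANSWER on the "
        f"criterion '{criterion}' from 1 (poor) to 5 (excellent).\n"
        f"QUESTION: {question}\n"
        f"ANSWER: {answer}\n"
        f"Reply with 'Score: <n>' and a one-line justification."
    )

def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the numeric score from the judge's reply, if present."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(match.group(1)) if match else None

# Parse a hypothetical judge reply
print(parse_score("Score: 4 - accurate but slightly verbose"))  # prints 4
```

In TIR this loop runs automatically: the evaluator model scores every row of your dataset, so no manual annotation is needed.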

Data

Flexible Dataset Input

Use EOS datasets or Hugging Face datasets. Specify input, output, and an optional reference answer column.
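As an illustration of the column mapping, here is a hedged sketch of a minimal CSV dataset. The column names `input`, `output`, and `reference` are placeholders; in the job configuration you map whichever columns your dataset actually uses, and the reference column may be left out entirely.

```python
import csv
import io

# Hypothetical column names -- you map your own columns when creating the job.
rows = [
    {"input": "What is the capital of France?",
     "output": "Paris is the capital of France.",
     "reference": "Paris"},
    {"input": "Name the largest planet in the solar system.",
     "output": "Jupiter is the largest planet.",
     "reference": ""},  # reference answers are optional
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["input", "output", "reference"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The same three-way mapping (input, output, optional reference) applies whether the dataset lives in an EOS bucket or on Hugging Face.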

Evaluator

Llama 3.1 8B Instruct as Judge

Evaluation jobs use Llama 3.1 8B Instruct as the LLM judge to score model outputs against framework-specific criteria.


Best Practices

Best Practices for Model Evaluation

Match the framework to your task

Choose the evaluation framework that reflects your model's actual use case. Using the wrong framework produces irrelevant scores.

Use a row limit for quick validation

Set Num Rows Limit to 100–500 rows to validate your dataset and column configuration before running the full evaluation.

Use lower temperature for consistent scoring

Set Temperature to 0.0–0.2 for deterministic, reproducible scores across repeated evaluation runs.

Include a reference answer column when available

Providing ground-truth answers enables comparison-based scoring, which produces more accurate results for QA and classification tasks.
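The practices above might combine into a job configuration like the following sketch. All field names here are assumptions for illustration; consult the TIR console or API reference for the exact schema.

```python
# Hypothetical evaluation-job configuration; field names are illustrative.
eval_job_config = {
    "framework": "question_answering",    # match the framework to your task
    "dataset": {
        "source": "huggingface",          # or an EOS dataset
        "input_column": "question",
        "output_column": "model_answer",
        "reference_column": "answers",    # optional ground truth
    },
    "num_rows_limit": 200,                # quick validation run (100-500 rows)
    "temperature": 0.1,                   # low temperature for reproducible scores
}

print(eval_job_config["framework"])
```

Once a limited run confirms the column mapping and scores look sensible, remove the row limit and rerun on the full dataset.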


API Reference

REST API

Model Evaluation API Reference

Programmatically create, list, manage, and delete model evaluation jobs in TIR.

The API reference covers authentication, endpoints, and request/response schemas.

Base URL: tir.e2enetworks.com/api/v1
GET     /teams/{Team_Id}/projects/{Project_Id}/evaluation/jobs/            List evaluation jobs
POST    /teams/{Team_Id}/projects/{Project_Id}/evaluation/jobs/            Create an evaluation job
GET     /teams/{Team_Id}/projects/{Project_Id}/evaluation/jobs/{job_id}/   Get evaluation job details
PUT     /teams/{Team_Id}/projects/{Project_Id}/evaluation/jobs/{job_id}/   Retry or terminate a job
DELETE  /teams/{Team_Id}/projects/{Project_Id}/evaluation/jobs/{job_id}/   Delete an evaluation job
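As a sketch of calling the create-job endpoint, the snippet below builds a POST request with Python's standard library. The endpoint path comes from the reference above, but the payload fields and the bearer-token header are assumptions; check the API reference for the actual request schema.

```python
import json
from urllib import request

# Hypothetical team and project identifiers.
TEAM_ID, PROJECT_ID = "my-team", "my-project"
url = (f"https://tir.e2enetworks.com/api/v1/teams/{TEAM_ID}"
       f"/projects/{PROJECT_ID}/evaluation/jobs/")

# Payload fields below are assumed for illustration, not a confirmed schema.
payload = {
    "framework": "text_summarization",
    "dataset_source": "eos",
    "temperature": 0.0,
}

req = request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer <API_TOKEN>",  # your TIR API token
             "Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
# request.urlopen(req) would submit the job; not executed here.
```

The same URL pattern with `PUT` and a job ID covers Retry/Terminate, and `DELETE` removes a job, matching the actions available in the console.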