
LLM Eval & Testing

9 companies in this category

Total: 9
Median ARR: $3M
Top ARR: $25M
Free tier: 78%

Arize AI


Arize AI provides an end-to-end ML observability platform that has expanded to include LLM evaluation and monitoring capabilities.

ARR: $25M · Subscription · USA · Founded 2019

Vellum AI


Vellum provides a platform for prompt engineering, LLM deployment, and evaluation with built-in analytics and monitoring.

ARR: $8M · Free tier · Usage-based · USA · Founded 2022

Humanloop


Humanloop offers tools for fine-tuning, evaluating, and deploying large language models with human feedback.

ARR: $5M · Free tier · Usage-based · UK · Founded 2020

Langfuse


Langfuse is an open-source observability and evaluation platform for LLM applications, capturing traces, metrics, and user feedback.

ARR: $4M · Free tier · Usage-based · Germany · Founded 2023

Promptfoo


Promptfoo is an open-source tool and platform for testing and evaluating LLM prompts and models.

ARR: $3M · Free tier · Subscription · USA · Founded 2023

PromptLayer


PromptLayer acts as a wrapper around LLM API calls, providing logging, analytics, and prompt management for developers.

ARR: $3M · Free tier · Usage-based · USA · Founded 2022

Patronus AI


Patronus AI offers an automated LLM evaluation platform to detect flaws such as hallucinations, toxicity, and bias before deployment.

ARR: $2M · Subscription · USA · Founded 2023

Helicone (by Braintrust)


Helicone (an open-source project supported by Braintrust) provides an observability platform for LLMs, including logging, caching, and analytics.

ARR: $2M · Free tier · Usage-based · USA · Founded 2022

Giskard


Giskard is an open-source platform for ML model testing, with a growing focus on evaluating and debugging LLMs for security and robustness.

ARR: $1M · Free tier · Usage-based · France · Founded 2021