Quality observability

Score, alert and diagnose your agents in production.

aferiq continuously evaluates every agent your team runs: WhatsApp chatbot, internal copilot, legal RAG, customer support. The CTO sees a consolidated quality dashboard, and the engineer gets an alert before the client complains.

Start free · View on GitHub · See dashboard demo

BYOK · 10k traces/month free · LGPD-native (DPA + SCC) · BR-aware (Lei, CNPJ, INSS)

Evaluated traces (30d)

1.4M

Hallucinations caught

8,230

Agencies on aferiq

p50 judge latency

1.8s

Try in 30s, no signup

Paste a chatbot answer and see if it made things up

Question + retrieved context + generated answer → a PT-BR LLM judge detects hallucinations and categorizes them (invented laws, fake tax IDs, fictional gov agencies, etc.).

Want more? 500 free traces on a real account

Create free account

2-minute setup

1 line. No boilerplate. Traces flowing.

Paste the key in your .env and run the app. Brazilian PII (CPF/CNPJ/RG/CEP/email/phone) is redacted before any network call. The regexes are readable in the SDK source, so redaction is not a magic flag.

.env + main.py

# .env — 1 env var (Sentry-style DSN)
AFERIQ_DSN=https://rg_pk_live_xxx@your-deploy.com.br/api/v1/traces

# main.py: ONE line, any framework
import openai
import aferiq
aferiq.start()  # reads AFERIQ_DSN + auto-patches openai/anthropic
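
For readers new to Sentry-style DSNs: the URL carries the public key in its userinfo part and the ingest endpoint in the rest. The sketch below shows that split with the standard library only. `parse_dsn` is a hypothetical helper for illustration; how the aferiq SDK parses the value internally is not shown on this page.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn: str) -> tuple[str, str]:
    """Split a Sentry-style DSN into (public_key, endpoint_url)."""
    parts = urlsplit(dsn)
    if not parts.username or not parts.hostname:
        raise ValueError("DSN must look like https://<key>@<host>/<path>")
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    # Rebuild the URL without the key so it never ends up in a logged endpoint
    return parts.username, f"{parts.scheme}://{host}{parts.path}"


key, endpoint = parse_dsn("https://rg_pk_live_xxx@your-deploy.com.br/api/v1/traces")
print(key)       # rg_pk_live_xxx
print(endpoint)  # https://your-deploy.com.br/api/v1/traces
```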

Supported stacks

Plug into any stack you already use.

LangChain, LangGraph, LlamaIndex via callback. Python decorator for any function. CLI auto-instruments. CrewAI, Haystack, AutoGen, Pydantic AI in /dashboard/integrate.

# 1. App boot — once:
import aferiq
aferiq.start()  # reads AFERIQ_DSN from env

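# 2. On EVERY chain (RetrievalQA, ConversationalRetrievalChain, LCEL):
from langchain.chains import RetrievalQA

# llm and vectorstore are the objects your app already builds
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    callbacks=[aferiq.handler()],   # ← one line
)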

chain.invoke({"query": "..."})
# Trace is in the dashboard, with PII redacted (CPF/CNPJ/email).
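
For plain functions outside LangChain, the copy above mentions a Python decorator without showing it. The general pattern could look roughly like this. `traced` and `TRACES` are hypothetical names used only for illustration and are not aferiq's API; a real SDK would ship the record to the ingest endpoint instead of keeping it in a list.

```python
import functools
import time

TRACES: list[dict] = []  # hypothetical in-memory sink, stands in for the network call


def traced(fn):
    """Record the inputs, output and latency of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": result,
            "latency_s": time.perf_counter() - started,
        })
        return result
    return wrapper


@traced
def answer(question: str) -> str:
    return f"echo: {question}"  # placeholder for a real LLM call


answer("Which law covers data protection?")
print(TRACES[0]["name"], "->", TRACES[0]["output"])
```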

For any team running agents

A panel for every role.

Solo engineer, agency serving clients, or 200-person company with internal agents: aferiq serves the people who build agents, the people who operate them, and the people who decide the budget.

engineering

Solo eng or technical lead

Decorator on app boot, Slack alert at 3am when hallucinations spike in production. Categorized diagnosis (LAW, ID_NUMBER, NUMBER_DATE), 10-min fix.
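
To make "alert when hallucinations spike" concrete, here is one simple way such a rule could be expressed: compare the flagged rate in a recent window against a baseline rate. The function names, the 2x factor and the 20-trace minimum are assumptions chosen for the sketch, not aferiq's actual alerting logic.

```python
from collections import Counter


def should_alert(recent: list[bool], baseline_rate: float,
                 factor: float = 2.0, min_traces: int = 20) -> bool:
    """True when the recent hallucination rate is at least `factor` times the baseline."""
    if len(recent) < min_traces:
        return False  # too few traces to call it a spike
    rate = sum(recent) / len(recent)
    return rate >= factor * baseline_rate


def top_categories(labels: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Most frequent hallucination categories among the flagged traces."""
    return Counter(labels).most_common(n)


recent = [True] * 6 + [False] * 24  # 6 of the last 30 traces flagged (20%)
print(should_alert(recent, baseline_rate=0.05))  # True: 20% is above 2 x 5%
print(top_categories(["LAW", "LAW", "ID_NUMBER", "NUMBER_DATE", "LAW", "ID_NUMBER"]))
```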

product / pm

PM and Product Owner

Compare week to week with regression datasets. Run pre-deploy evals. Show clients a quality dashboard that doubles as upsell material, not just logs.

leadership

CTO, VP Eng. and founder

Consolidated view across every agent in the team. Quality score, cost estimate, incidents, top hallucinations. Board reporting, budget decisions, regulatory risk.

Why vertical for Brazil

Details that only matter here.

Global tools don't speak BR. Building from scratch doesn't scale.

Instead of Langfuse

USD pricing, English-only onboarding, no n8n. Great for Bay Area Series A; wrong fit for Brazilian operations.

Instead of Ragas

Library, not product. No cloud, no alerts, no exec dashboard. You install — then build the rest.

Instead of building from scratch

200h of senior dev time for v1 + 20% ongoing maintenance. Multiply that by every agent your team adds.

3 metrics, hand-tuned

Focus. Not 12 metrics nobody reads.

Generic hallucination + BR-specific patterns + per-claim diagnosis.

faithfulness

Does the answer follow from the retrieved context?

The default metric. Always run it.

citation_accuracy

Does each atomic claim map to a cited chunk?

Useful for debugging bad retrieval.
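
As a rough illustration of the question "does each atomic claim map to a cited chunk?", the score can be read as a simple ratio. The `Claim` shape and the `citation_accuracy` function below are assumptions made for this sketch; the per-claim `supported` verdict would come from a judge and is just an input here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    cited_chunk: Optional[str]  # id of the chunk the answer cites, if any
    supported: bool             # judge verdict: does that chunk back the claim?


def citation_accuracy(claims: list[Claim]) -> float:
    """Fraction of atomic claims that cite a chunk which actually supports them."""
    if not claims:
        return 1.0  # assumption: an answer with no claims has nothing to miscite
    ok = sum(1 for c in claims if c.cited_chunk is not None and c.supported)
    return ok / len(claims)


claims = [
    Claim("The deadline is 30 days.", "chunk-2", True),
    Claim("The fee is R$ 120.", "chunk-5", False),   # cited, but the chunk says otherwise
    Claim("Appeals go to the board.", None, False),  # no citation at all
]
print(round(citation_accuracy(claims), 2))  # 0.33
```

A low score with high faithfulness usually points at the retriever or the citation step rather than at the generator, which is why the metric helps when debugging retrieval.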

hallucination

Claim-level detection in PT-BR.

Categorizes invented laws, fabricated CNPJ/CPF, fictional Receita Federal/INSS references, fake government programs. Actionable diagnosis.
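
The hallucination metric described here is judge-based. A cheap deterministic signal that could sit alongside it is the public CPF check-digit rule: a number whose two check digits do not match cannot be a real CPF. The sketch below implements that rule; `cpf_is_valid` is a hypothetical helper, and nothing on this page states that aferiq performs this check.

```python
import re


def cpf_is_valid(cpf: str) -> bool:
    """Validate the two check digits of a CPF (punctuation is ignored)."""
    digits = [int(d) for d in re.sub(r"\D", "", cpf)]
    # Must have 11 digits; sequences like 111.111.111-11 pass the math but are not valid
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n+1 down to 2 over the first n digits
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        check = (total * 10) % 11 % 10  # a remainder of 10 maps to 0
        if check != digits[n]:
            return False
    return True


print(cpf_is_valid("529.982.247-25"))  # True
print(cpf_is_valid("529.982.247-26"))  # False
```

A passing checksum does not prove the number belongs to anyone, so this only catches a subset of fabricated IDs.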

Built for Brazil

Not Langfuse with auto-translation.

The difference is in the details that only matter here: BR-specific judge prompts, auth in SP + LGPD via DPA + SCC, BRL billing, Portuguese support.

PT-BR judge prompts hand-tuned for BR patterns (Lei nº, CNPJ/CPF, Receita Federal, INSS, BACEN, ANPD)
BR-labeled adversarial dataset to validate prompts continuously
Auth + metadata in São Paulo (Supabase) · traces with SCC + DPA · LGPD-native
BRL billing via Stripe Brasil — no FX, no IOF
Portuguese support in BR timezone — not Bay Area SLA
WhatsApp alerts via Z-API on Pro tier — agencies live on WhatsApp

Security & LGPD

BR enterprise won't buy AI without these answers.

Regulated sectors (legal, finance, health) demand DPO + DPA before they sign. We deliver the checklist upfront — no Phase-2 wait.

PII redacted locally before the wire

SDK runs on your machine. redact_pii=True flag strips CPF, CNPJ, RG, CEP, phone and email via BR regex before any POST. What can't leave your environment, doesn't.
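
A minimal sketch of what regex-based local redaction can look like, assuming only the punctuated forms of each identifier. The patterns, the `redact` name and the placeholder labels are illustrative assumptions; the SDK's real regexes live in its source and may differ (RG and phone are left out here because their formats vary widely).

```python
import re

# Assumed patterns for punctuated forms only; not copied from the SDK.
PATTERNS = [
    ("CNPJ", re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b")),
    ("CPF", re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")),
    ("CEP", re.compile(r"\b\d{5}-\d{3}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact(text: str) -> str:
    """Replace each match with a bracketed label before the text leaves the process."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("CPF 529.982.247-25, CNPJ 12.345.678/0001-90, CEP 01310-100, ana@example.com.br"))
# CPF [CPF], CNPJ [CNPJ], CEP [CEP], [EMAIL]
```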

API keys encrypted (AES-256-GCM)

OpenAI/Anthropic keys live in workspace_settings encrypted with per-row IV + auth tag. Plaintext only in memory during a request. Never logged.

Bearer auth + bcrypt + RLS

Ingest API keys hashed with bcrypt. Cross-tenant access blocked via Postgres Row Level Security. Auth callback uses timing-safe comparison.
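
On the timing-safe comparison: a plain `==` on strings can return as soon as it finds the first differing character, which leaks a little information through response time. Python's standard library offers `hmac.compare_digest` for this purpose. The wrapper below is a generic sketch with a hypothetical name, not the code used in aferiq's auth callback.

```python
import hmac


def tokens_match(presented: str, expected: str) -> bool:
    """Compare two secrets without revealing where they first differ."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


print(tokens_match("s3cret-token", "s3cret-token"))  # True
print(tokens_match("s3cret-token", "s3cret-tokem"))  # False
```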

Hardened browser headers

Strict CSP, X-Frame-Options DENY, HSTS preload, minimal Permissions-Policy. Cookie consent gates PostHog/Sentry per LGPD Art. 7.

DPA + audit on hand

DPA template, SCC equivalents for cross-border flows, public subprocessor list at /legal/lgpd. Right-to-erasure requests are handled by the DPO within 15 business days.

Open where it matters

aferiq-eval lib on PyPI ships with judge prompts visible. Fork, audit, run self-hosted. No black-box judge.

Technical controls · LGPD + subprocessors

Free to start. Pay when the bot scales.

10,000 traces/month on the free tier with full coverage of all 3 PT-BR metrics.

See pricing in BRL · Portuguese version