TraceCrowd beta
Features

The boring parts of monitoring, done well.

No “AIOps” buzzword soup. Just the things that matter when your API starts throwing 500s at 3am.

6 core features · 3 alert channels · 0 per-seat fees

Minute-resolution checks

Pick 1, 5, 10, or 15 min. Status-code ranges, keyword match, SSL verify, follow-redirects, custom headers — the things you’d check by hand if you had time.
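
In code terms, a monitor is a handful of fields. A rough sketch (field names here are illustrative, not our actual API schema):

```typescript
// Hypothetical monitor definition. Field names are illustrative,
// not our actual API schema.
const monitor = {
  name: "api-checkout",
  url: "https://api.acme.com/checkout/health",
  method: "GET",
  intervalMinutes: 5,                   // 1, 5, 10, or 15
  expectStatus: { min: 200, max: 299 }, // status-code range
  expectKeyword: '"status":"ok"',       // body must contain this
  verifySsl: true,
  followRedirects: true,
  headers: { Authorization: "Bearer <token>" },
};
```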


Alerts that actually arrive

Email, Slack, and webhook destinations. Per-monitor routing. Silent monitors get flagged — no accidental “everything’s fine” while your site burns.

Multi-user from day one

Magic-link sign in. Team-scoped monitors, contacts, and incidents. Roles coming. No per-seat surprise invoices.


Percentiles, not just averages

24h / 7d / 30d uptime. p50, p95, p99 response time. Hourly breakdown. Because “average” is a lie for long-tail latency.
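
The math is nothing exotic. Here's a nearest-rank sketch, if you want to sanity-check our numbers against your own logs (not our exact implementation):

```typescript
// Nearest-rank percentile over response-time samples (ms).
// A sketch of the idea, not our exact implementation.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [42, 51, 48, 1120, 47, 53, 45, 2090, 49, 50];
console.log(percentile(latencies, 50)); // 49: the typical request
console.log(percentile(latencies, 99)); // 2090: the long tail
// The mean of this set is ~360ms. Neither user's experience looks like that.
```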


Incidents with a paper trail

Auto-opened on first failure, auto-resolved on recovery. Acknowledge from the UI. Full history. Perfect for post-mortems — and receipts.


Retries before it wakes you

A single timeout doesn’t page anyone. We retry with backoff — only sustained failures trigger an alert. Blips from flaky CDNs stay quiet, so you can trust the alerts that do arrive.
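
If you sketched that logic yourself, it would look something like this (delays and retry counts are illustrative, not our production values):

```typescript
// Simplified retry-with-backoff before anyone gets paged.
// Delays and retry counts are illustrative, not our production values.
async function checkWithRetries(probe: () => Promise<boolean>): Promise<boolean> {
  const delaysMs = [5_000, 15_000, 45_000]; // backoff between retries
  if (await probe()) return true;           // healthy: nothing to do
  for (const delayMs of delaysMs) {
    await new Promise((r) => setTimeout(r, delayMs));
    if (await probe()) return true;         // recovered: a blip, stay quiet
  }
  return false;                             // sustained failure: open incident, alert
}
```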

What we’re not

A monitoring suite pretending to be a platform.

TraceCrowd does one thing well. We leave the rest to tools that already do it right — and save you the enterprise contract.

  • No status pages. Use Statuspage or an OSS page; ours would be worse.
  • No synthetic transactions. If you need Playwright flows, you need Checkly or Datadog.
  • No AI root-cause panels. We’ll tell you what failed. The why is your job.
  • No SDR calling you next Tuesday. Sign up, try it, leave if it’s not for you.
Anatomy of an alert

When something breaks, the right people know within 60s.

A check fails. The incident opens. Every contact and channel wired up to that monitor gets pinged — no duplicates, no silent failures, no paging the whole company by accident.

  • Down and Up transitions fire exactly once per incident
  • Fan out to email, Slack, and your own tools in parallel (webhook payload sketched below)
  • Retries with backoff, so a blip doesn’t wake anyone
  • Recovery fires an “all clear” — no manual reset
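
The webhook leg of that fan-out is a plain HTTP POST with a JSON body. Roughly like this (field names and values are made up here, not a documented schema):

```typescript
// Illustrative webhook body for a "monitor down" event.
// Field names and values are made up here, not a documented schema.
const payload = {
  event: "incident.opened",
  incident: { id: 142, openedAt: "2026-02-03T04:23:58Z" },
  monitor: { name: "api-checkout", url: "https://api.acme.com/checkout/health" },
  failure: { kind: "status", expected: "2xx", got: 500 },
};

await fetch("https://hooks.example.com/tracecrowd", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify(payload),
});
```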

[Preview: alert center, live · Incident #142 “api-checkout is down”, HTTP 500, opened 04:23:58 UTC · Slack #eng-oncall delivered in 1.2s · email to sarah@acme.com (2.4s) and marcus@acme.com (2.7s) · acknowledged by Sarah at 04:26:12 UTC, 2m 14s after opening]

Incidents

Every outage, on the record.

Opens the moment a check fails. Closes when it recovers. Durations, timestamps, and who acknowledged — all captured. Your post-mortem half-writes itself.

  • Time-to-detect and time-to-recover, no calculator required (sketched below)
  • Ack from the UI so the rest of the team sees someone’s on it
  • Full timeline of what failed, when, and for how long
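
Those numbers fall straight out of the timestamps on each incident. A sketch (property names and times are illustrative):

```typescript
// Recovery time falls straight out of two stored timestamps.
// Property names and times are illustrative.
const incident = {
  failedAt:   Date.parse("2026-02-03T04:23:58Z"), // first failed check
  resolvedAt: Date.parse("2026-02-03T04:31:10Z"), // first passing check
};
const ttrSeconds = (incident.resolvedAt - incident.failedAt) / 1000;
console.log(`${Math.floor(ttrSeconds / 60)}m ${ttrSeconds % 60}s`); // "7m 12s"
```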

[Preview: app.tracecrowd.com/incidents · Active: api-checkout, unexpected status 500, started 04:23 UTC, 12m ago · Resolved: www-shop connection timeout (4m 31s), docs SSL handshake failed (1m 47s), billing-svc keyword missing (12m 05s) · 1 active, 14 resolved this month, avg recovery 5m 22s]

Fleet overview

All your URLs, at a glance.

One row per monitor. Sort by worst uptime or slowest p95. Tag by environment, team, or service and filter to just what you own.

  • Silent monitors (no alerts wired up) get flagged, loud
  • Group by tag: env, team, service — whatever you use
  • Outlier lists: slowest, flakiest, SSL expiring soon

[Preview: app.tracecrowd.com/monitors · All 41: Up 38, Down 1, Slow 1, Paused 2 · fleet p95 287ms · rows like GET api-checkout (Down, 99.83%), GET www-shop (Up, 142ms, 100.00%), GET docs (Up, 89ms, 99.99%), POST billing-svc (Slow, 1,214ms, 99.91%), GET auth (Up, 64ms, 100.00%), GET staging-cdn (Paused)]

See it running on your URLs.

Less than a minute to your first check. Free during beta.