TraceCrowd beta
Features

The boring parts of monitoring, done well.

No “AIOps” buzzword soup. Just the things that matter when your API starts throwing 500s at 3am.

6 core features · 3 alert channels · 0 per-seat fees

Minute-resolution checks

Pick 1, 5, 10, or 15 min. Status-code ranges, keyword match, SSL verify, follow-redirects, custom headers — the things you’d check by hand if you had time.
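
In code terms, a monitor is a handful of fields. A rough sketch (field names here are illustrative, not our actual API schema):

```typescript
// Hypothetical monitor definition. Field names are illustrative,
// not our actual API schema.
const monitor = {
  name: "api-checkout",
  url: "https://api.acme.com/checkout/health",
  method: "GET",
  intervalMinutes: 5,                   // 1, 5, 10, or 15
  expectStatus: { min: 200, max: 299 }, // status-code range
  expectKeyword: '"status":"ok"',       // body must contain this
  verifySsl: true,
  followRedirects: true,
  headers: { Authorization: "Bearer <token>" },
};
```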


Alerts that actually arrive

Email, Slack, and webhook destinations. Per-monitor routing. Silent monitors get flagged — no accidental “everything’s fine” while your site burns.

Multi-user from day one

Magic-link sign in. Team-scoped monitors, contacts, and incidents. Roles coming. No per-seat surprise invoices.


Percentiles, not just averages

24h / 7d / 30d uptime. p50, p95, p99 response time. Hourly breakdown. Because “average” is a lie for long-tail latency.
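
The math is nothing exotic. Here's a nearest-rank sketch, if you want to sanity-check our numbers against your own logs (not our exact implementation):

```typescript
// Nearest-rank percentile over response-time samples (ms).
// A sketch of the idea, not our exact implementation.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [42, 51, 48, 1120, 47, 53, 45, 2090, 49, 50];
console.log(percentile(latencies, 50)); // 49: the typical request
console.log(percentile(latencies, 99)); // 2090: the long tail
// The mean of this set is ~360ms. Neither user's experience looks like that.
```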


Incidents with a paper trail

Auto-opened on first failure, auto-resolved on recovery. Acknowledge from the UI. Full history. Perfect for post-mortems — and receipts.


Retries before it wakes you

A single timeout doesn’t page anyone. We retry with backoff — only sustained failures trigger an alert. Blips from flaky CDNs stay quiet, so you can trust the alerts that do arrive.
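
If you sketched that logic yourself, it would look something like this (delays and retry counts are illustrative, not our production values):

```typescript
// Simplified retry-with-backoff before anyone gets paged.
// Delays and retry counts are illustrative, not our production values.
async function checkWithRetries(probe: () => Promise<boolean>): Promise<boolean> {
  const delaysMs = [5_000, 15_000, 45_000]; // backoff between retries
  if (await probe()) return true;           // healthy: nothing to do
  for (const delayMs of delaysMs) {
    await new Promise((r) => setTimeout(r, delayMs));
    if (await probe()) return true;         // recovered: a blip, stay quiet
  }
  return false;                             // sustained failure: open incident, alert
}
```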

What we’re not

A monitoring suite pretending to be a platform.

TraceCrowd does one thing well. We leave the rest to tools that already do it right — and save you the enterprise contract.

  • No status pages. Use Statuspage or an OSS page; ours would be worse.
  • No synthetic transactions. If you need Playwright flows, you need Checkly or Datadog.
  • No AI root-cause panels. We’ll tell you what failed. The why is your job.
  • No SDR calling you next Tuesday. Sign up, try it, leave if it’s not for you.
Anatomy of an alert

When something breaks, the right people know within 60s.

A check fails. The incident opens. Every contact and channel wired up to that monitor gets pinged — no duplicates, no silent failures, no paging the whole company by accident.

  • Down and Up transitions fire exactly once per incident
  • Fan out to email, Slack, and your own tools in parallel (webhook payload sketched below)
  • Retries with backoff, so a blip doesn’t wake anyone
  • Recovery fires an “all clear” — no manual reset
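
The webhook leg of that fan-out is a plain HTTP POST with a JSON body. Roughly like this (field names and values are made up here, not a documented schema):

```typescript
// Illustrative webhook body for a "monitor down" event.
// Field names and values are made up here, not a documented schema.
const payload = {
  event: "incident.opened",
  incident: { id: 142, openedAt: "2026-02-03T04:23:58Z" },
  monitor: { name: "api-checkout", url: "https://api.acme.com/checkout/health" },
  failure: { kind: "status", expected: "2xx", got: 500 },
};

await fetch("https://hooks.example.com/tracecrowd", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify(payload),
});
```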

[Preview: alert center, live · Incident #142 “api-checkout is down”, HTTP 500, opened 04:23:58 UTC · Slack #eng-oncall delivered in 1.2s · email to sarah@acme.com (2.4s) and marcus@acme.com (2.7s) · acknowledged by Sarah at 04:26:12 UTC, 2m 14s after opening]

Incidents

Every outage, on the record.

Opens the moment a check fails. Closes when it recovers. Durations, timestamps, and who acknowledged — all captured. Your post-mortem half-writes itself.

  • Time-to-detect and time-to-recover, no calculator required (sketched below)
  • Ack from the UI so the rest of the team sees someone’s on it
  • Full timeline of what failed, when, and for how long
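
Those numbers fall straight out of the timestamps on each incident. A sketch (property names and times are illustrative):

```typescript
// Recovery time falls straight out of two stored timestamps.
// Property names and times are illustrative.
const incident = {
  failedAt:   Date.parse("2026-02-03T04:23:58Z"), // first failed check
  resolvedAt: Date.parse("2026-02-03T04:31:10Z"), // first passing check
};
const ttrSeconds = (incident.resolvedAt - incident.failedAt) / 1000;
console.log(`${Math.floor(ttrSeconds / 60)}m ${ttrSeconds % 60}s`); // "7m 12s"
```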

[Preview: app.tracecrowd.com/incidents · Active: api-checkout, unexpected status 500, started 04:23 UTC, 12m ago · Resolved: www-shop connection timeout (4m 31s), docs SSL handshake failed (1m 47s), billing-svc keyword missing (12m 05s) · 1 active, 14 resolved this month, avg recovery 5m 22s]

Fleet overview

All your URLs, at a glance.

One row per monitor. Sort by worst uptime or slowest p95. Tag by environment, team, or service and filter to just what you own.

  • Silent monitors (no alerts wired up) get flagged, loud
  • Group by tag: env, team, service — whatever you use
  • Outlier lists: slowest, flakiest, SSL expiring soon

[Preview: app.tracecrowd.com/monitors · All 41: Up 38, Down 1, Slow 1, Paused 2 · fleet p95 287ms · rows like GET api-checkout (Down, 99.83%), GET www-shop (Up, 142ms, 100.00%), GET docs (Up, 89ms, 99.99%), POST billing-svc (Slow, 1,214ms, 99.91%), GET auth (Up, 64ms, 100.00%), GET staging-cdn (Paused)]

See it running on your URLs.

Less than a minute to your first check. Free during beta.