How it works
Blify Mod doesn't keyword-match. Every potentially-moderated message goes through a multi-stage pipeline designed to be cheap on easy cases and thorough on hard ones.
#The pipeline
       message
          │
          ▼
    ┌─────────────┐  trivially safe / rate-limited / immune
    │ pre-filter  ├─────────────────────────────────────────► skip
    └─────┬───────┘
          │
          ▼
    ┌─────────────┐  "no" (clearly doesn't match any rule)
    │   screen    ├─────────────────────────────────────────► skip
    └─────┬───────┘
          │ "yes" (worth a closer look)
          ▼
    ┌─────────────┐
    │    judge    │  full reasoning against the rule list
    └─────┬───────┘
          │ context_sufficient = false
          ▼
    ┌─────────────┐
    │ reconsider  │  pulls replied-to + prior author messages
    └─────┬───────┘
          │
          ▼
    apply action + log + DM the offender
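
The control flow in the diagram can be sketched in a few lines. This is a minimal illustration and not the bot's actual code: `moderate`, `Verdict`, and the three callables are hypothetical names, and the sketch assumes a verdict with sufficient context goes straight to the apply step.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    violation: bool
    action: str = "none"
    context_sufficient: bool = True


def moderate(
    message: str,
    prefilter_skips: Callable[[str], bool],
    screen: Callable[[str], bool],
    judge: Callable[[str, bool], Verdict],
) -> Optional[Verdict]:
    """Return the final verdict, or None if the message was skipped early."""
    if prefilter_skips(message):   # stage 1: no AI call at all
        return None
    if not screen(message):        # stage 2: cheap yes/no gate
        return None
    verdict = judge(message, False)          # stage 3: full judgment
    if not verdict.context_sufficient:
        verdict = judge(message, True)       # stage 4: re-run with extra context
    return verdict                           # stage 5 (apply, log, DM) happens elsewhere
```

Each early `return None` corresponds to a "skip" arrow in the diagram, so most messages never reach the expensive judge call.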
#1. Pre-filter
Before any AI call, the bot drops:
- Trivial content (very short, mostly emoji, etc.)
- Messages from users currently inside a per-user screen rate-limit
- Messages from immune users or users with an immune role
- Messages already flagged by Discord's native AutoMod (no double-action)
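
The four checks above amount to a single boolean test. The sketch below is one way to express it; the `Message` shape, the `is_trivial` heuristic, and its three-character threshold are assumptions for illustration, since the page does not define what counts as trivial.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    text: str
    author_id: int
    author_roles: set = field(default_factory=set)
    flagged_by_automod: bool = False


def is_trivial(text: str, min_chars: int = 3) -> bool:
    """Hypothetical heuristic: too few letters or digits to be worth judging."""
    meaningful = [c for c in text if c.isalnum()]
    return len(meaningful) < min_chars


def should_skip(msg: Message, immune_users: set, immune_roles: set,
                rate_limited_users: set) -> bool:
    """True if the message should be dropped before any AI call."""
    return (
        is_trivial(msg.text)
        or msg.author_id in rate_limited_users
        or msg.author_id in immune_users
        or bool(msg.author_roles & immune_roles)
        or msg.flagged_by_automod
    )
```

All of these checks are local lookups, which is why the pre-filter costs nothing in model calls.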
#2. Screen
A small, fast model is asked one yes/no question: "Could this plausibly violate one of THIS server's listed rules?" On every tier this is OpenAI's gpt-5-nano, with Claude Haiku 4.5 as the fallback when OpenAI is rate-limiting. The screen is only a binary gate, so nano is sufficient for it, and the savings keep the bot affordable. Tier matters at the judge stage, where the actual judgment happens.
The screen is intentionally conservative — it answers "yes" on anything ambiguous so the judge gets the final call.
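
A rough sketch of the screen step with its fallback follows. The `primary` and `fallback` callables stand in for the two model clients, and `RateLimited` is a hypothetical exception; real provider SDKs raise their own error types. Treating any reply other than an explicit "no" as "yes" is an assumption that mirrors the conservative behavior described above.

```python
from typing import Callable, List


class RateLimited(Exception):
    """Hypothetical error a provider client raises when it is throttled."""


def screen(message: str, rules: List[str],
           primary: Callable[[str], str],
           fallback: Callable[[str], str]) -> bool:
    """Return True if the message deserves a full judge call."""
    prompt = (
        "Could this plausibly violate one of THIS server's listed rules? "
        "Answer yes or no.\n"
        "Rules:\n" + "\n".join(f"- {r}" for r in rules) + "\n"
        "Message:\n" + message
    )
    try:
        answer = primary(prompt)
    except RateLimited:
        answer = fallback(prompt)
    # Conservative: only an explicit "no" lets the message skip the judge.
    return answer.strip().lower() != "no"
```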
#3. Judge
If the screen says "yes", the bot calls the judge model for your tier with the full system prompt: your rule list, the target message, and a slice of channel context (8–20 messages on Free, 15–100 on Pro). The model returns structured JSON:
    {
      "violation": true,
      "rule_broken": "No harassment or hate speech",
      "severity": 4,
      "action": "timeout",
      "timeout_minutes": 60,
      "reason": "Targeted slur at another member.",
      "context_sufficient": true
    }
- Free uses OpenAI gpt-5-mini — fast, cheap, accurate on the vast majority of cases. Claude Haiku 4.5 is the automatic fallback if OpenAI is unavailable.
- Pro/Premium uses Claude Sonnet 4.6 — noticeably better on subtle context, sarcasm, and adversarial framing. OpenAI gpt-5-mini is the automatic fallback if Anthropic is unavailable.
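
Because the judge replies with JSON, the caller has to parse and sanity-check it before acting. The following is a minimal sketch under stated assumptions: the field names come from the example above, but the set of allowed action strings (other than "timeout" and "none") and the defaults for missing fields are guesses, not documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed action vocabulary; only "timeout" and "none" appear on this page.
ALLOWED_ACTIONS = {"none", "warn", "timeout", "kick", "ban"}


@dataclass
class Verdict:
    violation: bool
    rule_broken: Optional[str]
    severity: int
    action: str
    timeout_minutes: Optional[int]
    reason: str
    context_sufficient: bool


def parse_verdict(raw: str) -> Verdict:
    """Parse the judge's JSON reply, rejecting unknown actions."""
    data = json.loads(raw)
    action = data.get("action", "none")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    return Verdict(
        violation=bool(data["violation"]),
        rule_broken=data.get("rule_broken"),
        severity=int(data.get("severity", 0)),
        action=action,
        timeout_minutes=data.get("timeout_minutes"),
        reason=data.get("reason", ""),
        context_sufficient=bool(data.get("context_sufficient", True)),
    )
```

Rejecting unknown actions up front means a malformed or manipulated reply fails loudly instead of reaching the apply step.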
#4. Reconsider (only when needed)
If the model says context_sufficient: false — i.e. "I can't be sure without seeing what this was a reply to" — the bot pulls the replied-to message and the user's prior consecutive messages, then re-runs the judge with that targeted context. This is what lets the bot correctly handle:
- Sarcasm: "yeah do it" is harmless after a joke but an endorsement after a threat.
- Callbacks: "same" referring to a slur used three messages back.
- Ongoing thoughts: a single sentence that's part of a longer paragraph the author was typing.
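
Gathering that targeted context could look like the sketch below. It assumes channel history is available as `(author_id, text)` pairs ordered oldest first and ending just before the target message; the function names and the bracketed labels are hypothetical.

```python
from typing import List, Optional, Tuple


def prior_consecutive(history: List[Tuple[int, str]], author_id: int) -> List[str]:
    """Messages the author sent in an unbroken run right before the target."""
    run: List[str] = []
    for sender, text in reversed(history):
        if sender != author_id:
            break  # someone else spoke, so the run ends here
        run.append(text)
    run.reverse()  # restore oldest-first order
    return run


def reconsider_context(history: List[Tuple[int, str]], author_id: int,
                       replied_to: Optional[str]) -> List[str]:
    """Build the extra context lines for the second judge call."""
    context: List[str] = []
    if replied_to is not None:
        context.append(f"[replied-to] {replied_to}")
    context.extend(f"[same author] {t}" for t in prior_consecutive(history, author_id))
    return context
```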
#5. Apply, log, and DM
The action is applied (or skipped, if dry-run is on), a log embed is posted to your log channel, and the offender gets a DM explaining what happened, the rule, and the appeal link if you've set one.
#Strictness caps
After the model decides on an action, the configured strictness level can downgrade it before it's applied. See /setstrictness:
| Level | What gets capped |
|---|---|
| Low | Timeouts, kicks, and bans are all converted to warnings |
| Medium | Kicks and bans are converted to timeouts |
| High | No cap — full action ladder (default) |
Strictness only ever softens the bot's decision; it never escalates one.
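
The table reduces to a ceiling on an ordered action ladder. Here is a small sketch of that idea; the action strings and the `apply_cap` name are illustrative assumptions, and the page does not say what timeout duration a downgraded kick or ban receives, so that detail is left out.

```python
# Actions ordered from mildest to harshest (string values are assumed).
LADDER = ["none", "warn", "timeout", "kick", "ban"]

# Harshest action each strictness level allows.
CAPS = {"low": "warn", "medium": "timeout", "high": "ban"}


def apply_cap(action: str, strictness: str) -> str:
    """Downgrade the action to the level's ceiling; never escalate it."""
    cap = CAPS[strictness]
    if LADDER.index(action) > LADDER.index(cap):
        return cap
    return action


print(apply_cap("ban", "low"))      # warn
print(apply_cap("kick", "medium"))  # timeout
print(apply_cap("warn", "medium"))  # warn (unchanged)
```

Since the function only ever returns the original action or something milder, it cannot escalate a decision.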
#What the bot will not do
- Enforce rules you didn't list. If a message contains content you find awful but no listed rule covers it, the bot returns action: "none". Servers choose what to moderate.
- Follow instructions inside messages. Every user message is sent inside a fenced "untrusted data" block, and the system prompt explicitly forbids treating message content as instructions. Prompt-injection attempts ("ignore previous instructions", fake JSON, fake admin claims) are themselves treated as evidence of bad-faith behavior.
- Action the server owner. Discord doesn't allow it.
- Action anyone whose top role is ≥ the bot's role. To let the bot moderate them, move the bot's role up.
- Punish the user who @mentioned it. Manual scans only judge the content being scanned.
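
Two of these guarantees are simple enough to sketch. The marker strings and function names below are hypothetical (the real prompt format is not documented here). The role check assumes Discord's convention that a higher role position means a higher rank.

```python
END_MARKER = "<<<END_UNTRUSTED>>>"  # hypothetical delimiter


def wrap_untrusted(text: str) -> str:
    """Fence user content so the model can tell data from instructions."""
    # Strip the closing marker so content cannot break out of the block.
    safe = text.replace(END_MARKER, "[removed marker]")
    return f"<<<UNTRUSTED_DATA>>>\n{safe}\n{END_MARKER}"


def can_action(target_id: int, owner_id: int,
               target_top_role_pos: int, bot_top_role_pos: int) -> bool:
    """True only if Discord would let the bot act on this member."""
    if target_id == owner_id:
        return False  # the server owner can never be actioned
    return target_top_role_pos < bot_top_role_pos
```

Checking the hierarchy before calling the API avoids a failed request and lets the log explain why no action was taken.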
#Caching and deduplication
- Cheap-judge cache: identical message text + identical rule set returns the cached "clean" verdict for a few minutes, so re-pasted spam doesn't pay for repeated full-judge calls.
- AutoMod dedup: if Discord's native AutoMod has already flagged a message, Blify Mod skips it.
- Per-user rate limits: rapid-fire messages from a single author share a screen result for a short window.
- Per-server mention cooldown: manual @mention scans are limited to one every 5 minutes per server.
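
The cheap-judge cache can be pictured as a time-limited set keyed by a hash of the message text and rule set. This is a sketch under assumptions: the class name, the SHA-256 key, and the 300-second lifetime are illustrative, since the page only says "a few minutes".

```python
import hashlib
import time
from typing import Callable, Dict, List


class CleanVerdictCache:
    """Remembers which (message text, rule set) pairs were judged clean."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires: Dict[str, float] = {}

    @staticmethod
    def _key(text: str, rules: List[str]) -> str:
        h = hashlib.sha256()
        h.update(text.encode("utf-8"))
        for rule in rules:
            h.update(b"\x00" + rule.encode("utf-8"))  # separator between parts
        return h.hexdigest()

    def remember_clean(self, text: str, rules: List[str]) -> None:
        self._expires[self._key(text, rules)] = self.clock() + self.ttl

    def is_known_clean(self, text: str, rules: List[str]) -> bool:
        key = self._key(text, rules)
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]  # entry is stale, drop it
            return False
        return True
```

Including the rule set in the key means that editing the server's rules invalidates earlier "clean" verdicts without any extra bookkeeping.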