As the surface area for attacks on the web increases, Cloudflare’s Web Application Firewall (WAF) provides a wide range of solutions to mitigate these attacks. This is great for our customers, but the sheer variety of the millions of requests we service means that generating false positives is inevitable. This means that the default configuration we provide for our customers has to be fine-tuned.
Fine-tuning isn’t an opaque process: customers have to get some data points and then decide what works for them. This post explains the technologies we offer to enable customers to see why the WAF takes certain actions — and the improvements that have been made to reduce noise and increase signal.
Cloudflare’s WAF protects origin servers from different kinds of layer 7 attacks, which are attacks that target the application layer. Protection is provided with various tools like:
Managed rules, which security analysts at Cloudflare write to address Common Vulnerabilities and Exposures (CVEs), OWASP security risks, and vulnerabilities like Log4Shell.
Custom rules, where customers can write rules with the expressive Rules language.
Rate limiting rules, malicious uploads detection, leaked credentials detection, etc.
These tools are built on the Rulesets engine. When there is a match on a Rule expression, the engine executes an action.
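For example, a custom rule pairs an expression with an action. As a rough sketch, the shape in the Rulesets API looks something like this (the JSON here is simplified for illustration):

{
    "rules": [
        {
            "description": "Log requests to admin paths",
            "expression": "http.request.uri.path contains \"/admin\"",
            "action": "log"
        }
    ]
}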
The Log action is used to simulate the behaviour of rules. This action proves that a rule expression is matched by the engine and emits a log event which can be accessed via Security Analytics, Security Events, Logpush or Edge Log Delivery.
Logs are great at validating that a rule matches the traffic it was designed to match, but showing that the rule matches isn’t sufficient, especially when a rule expression can take many code paths. In pseudocode, an expression can look like:
If any of the http request headers contains an “authorization” key OR the lowercased representation of the http host header starts with “cloudflare” THEN log
In the Rules language, the syntax is:
any(http.request.headers[*] contains "authorization") or starts_with(lower(http.host), "cloudflare")
Debugging this expression poses a couple of problems. Is it the left-hand side (LHS) or right-hand side (RHS) of the OR expression above that matches? Functions such as Base64 decoding, URL decoding, and in this case lowercasing can apply transformations to the original representation of these fields, which leads to further ambiguity as to which characteristics of the request led to a match.
To further complicate this, many rules in a ruleset can register matches. Rulesets like Cloudflare OWASP use a cumulative score of different rules to trigger an action when the score crosses a set threshold.
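To make the scoring model concrete, here is a minimal Rust sketch of cumulative scoring. The rule IDs, scores, and threshold are invented for illustration; this is not Cloudflare’s implementation:

struct RuleMatch {
    rule_id: &'static str,
    score: u32,
}

// The action fires only once the cumulative score across all matched
// rules crosses the configured threshold.
fn should_block(matches: &[RuleMatch], threshold: u32) -> bool {
    let total: u32 = matches.iter().map(|m| m.score).sum();
    total >= threshold
}

fn main() {
    // Hypothetical matches from an OWASP-style ruleset evaluation.
    let matches = [
        RuleMatch { rule_id: "sqli-comment", score: 20 },
        RuleMatch { rule_id: "sqli-union-select", score: 30 },
    ];
    // Neither rule is decisive on its own, but together they cross
    // the threshold of 40 and trigger the action.
    assert!(should_block(&matches, 40));
}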
Additionally, the expressions of the Cloudflare Managed and OWASP rules are private. This increases our security posture – but it also means that customers can only guess what these rules do from their titles, tags and descriptions. For instance, one might be labeled “SonicWall SMA - Remote Code Execution - CVE:CVE-2025-32819.”
Which raises questions: What part of my request led to a match in the Rulesets engine? Are these false positives?
This is where payload logging shines. It can help us drill down to the specific fields, and their post-transformation values, that led to a rule match.
Payload logging is a feature that logs which fields in the request are associated with a rule that led to the WAF taking an action. This reduces ambiguity and provides useful information that can help spot-check false positives, guarantee correctness, and aid in fine-tuning these rules for better performance.
From the example above, a payload log entry will contain either the LHS or RHS of the expression, but not both.
The payload logging and Rulesets engines are built on Wirefilter, which has been explained extensively.
Fundamentally, these engines are objects written in Rust which implement a compiler trait. This trait drives the compilation of the abstract syntax trees (ASTs) derived from these expressions.
struct PayloadLoggingCompiler {
    // Cache compiled regexes so identical patterns are compiled only once.
    regex_cache: HashMap<String, Arc<Regex>>,
}

impl wirefilter::Compiler for PayloadLoggingCompiler {
    type U = PayloadLoggingUserData;

    fn compile_logical_expr(&mut self, node: LogicalExpr) -> CompiledExpr {
        // ...
        let regex = self
            .regex_cache
            .entry(regex_pattern)
            .or_insert_with(|| Arc::new(regex));
        // ...
    }
}
The Rulesets Engine executes an expression and if it evaluates to true, the expression and its execution context are sent to the payload logging compiler for re-evaluation. The execution context provides all the runtime values needed to evaluate the expression.
After re-evaluation is done, the fields involved in branches of the expression that evaluate to true are logged.
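To illustrate, here is a heavily simplified sketch of that branch-level collection. Wirefilter’s real AST and evaluation machinery are far richer; the types below are invented for illustration:

use std::collections::HashMap;

// A drastically simplified expression tree.
enum Expr {
    Or(Box<Expr>, Box<Expr>),
    // A leaf predicate over a named field, e.g. a contains or starts_with check.
    Predicate { field: &'static str, test: fn(&str) -> bool },
}

// Re-evaluate the expression against the execution context, collecting
// the field of every branch that evaluates to true.
fn matched_fields(expr: &Expr, ctx: &HashMap<&str, String>, out: &mut Vec<&'static str>) -> bool {
    match expr {
        Expr::Or(lhs, rhs) => {
            // Evaluate both sides (no short-circuiting) so every matching
            // branch is logged.
            let l = matched_fields(lhs, ctx, out);
            let r = matched_fields(rhs, ctx, out);
            l || r
        }
        Expr::Predicate { field, test } => {
            let matched = ctx.get(*field).map_or(false, |v| test(v.as_str()));
            if matched {
                out.push(*field);
            }
            matched
        }
    }
}

fn main() {
    // any(http.request.headers[*] contains "authorization")
    //     or starts_with(lower(http.host), "cloudflare")
    let expr = Expr::Or(
        Box::new(Expr::Predicate {
            field: "http.request.headers",
            test: |v| v.contains("authorization"),
        }),
        Box::new(Expr::Predicate {
            field: "http.host",
            test: |v| v.to_lowercase().starts_with("cloudflare"),
        }),
    );

    let mut ctx = HashMap::new();
    ctx.insert("http.request.headers", "accept: text/html".to_string());
    ctx.insert("http.host", "cloudflare.com".to_string());

    let mut fields = Vec::new();
    matched_fields(&expr, &ctx, &mut fields);
    // Only the right-hand side matched, so only http.host is logged.
    assert_eq!(fields, vec!["http.host"]);
}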
The structure of the log is a map of wirefilter fields to their values:
{
    "http.host": "cloudflare.com",
    "http.method": "get",
    "http.user_agent": "mozilla"
}
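In Rust terms, the log is just a map from field names to the values observed at execution time. A minimal sketch of building and serializing one, using the serde_json crate purely for illustration:

use std::collections::BTreeMap;

fn main() {
    // Map of wirefilter field names to the values seen for this request.
    let mut payload_log = BTreeMap::new();
    payload_log.insert("http.host", "cloudflare.com");
    payload_log.insert("http.method", "get");
    payload_log.insert("http.user_agent", "mozilla");

    // Produces the JSON shape shown above.
    println!("{}", serde_json::to_string_pretty(&payload_log).unwrap());
}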
Note: These logs are encrypted with the public key provided by the customer.
These logs go through our logging pipeline and can be read in different ways. Customers can configure a Logpush job to write to a custom Worker we built that uses the customer’s private key to automatically decrypt these logs. The Payload logging CLI tool, Worker, or the Cloudflare dashboard can also be used for decryption.
In wirefilter, some fields are array types. The field http.request.headers.names is an array of all the header names in a request. For example:
["content-type", "content-length", "authorization", "host"]
An expression that reads any(http.request.headers.names[*] contains "c") will evaluate to true because at least one of the header names contains the letter "c". With the previous version of the payload logging compiler, all the headers in the http.request.headers.names field would be logged, since the field is part of an expression that evaluates to true.
Payload log (previous)
http.request.headers.names[*] = ["content-type", "content-length", "authorization", "host"]
Now, we partially evaluate array fields and log only the indexes that satisfy the expression's constraint. In this case, it’ll be just the header names that contain a "c"!
Payload log (new)
http.request.headers.names[0,1] = ["content-type", "content-length"]
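The index selection itself boils down to filtering the array against the expression’s constraint and keeping the positions that match. A minimal sketch of the idea (not the actual compiler code):

fn main() {
    let header_names = ["content-type", "content-length", "authorization", "host"];

    // Partially evaluate any(http.request.headers.names[*] contains "c"):
    // keep only the indexes whose values satisfy the constraint.
    let matches: Vec<(usize, &str)> = header_names
        .iter()
        .enumerate()
        .filter(|(_, name)| name.contains('c'))
        .map(|(i, name)| (i, *name))
        .collect();

    // Only indexes 0 and 1 match, so only they are logged.
    assert_eq!(matches, vec![(0, "content-type"), (1, "content-length")]);
}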
This brings us to operators in wirefilter. Some operators like "eq" result in exact matches, e.g. http.host eq "a.com". There are other operators that result in "partial" matches – like "in", "contains", and "matches", the last of which works with regexes.
The expression in this example, any(http.request.headers.names[*] contains "c"), uses a "contains" operator which produces a partial match. It also uses the "any" function, which we can say produces a partial match too, because if at least one of the header names contains a "c", then we should log that header – not all the headers as we did in the previous version.
With the improvements to the payload logging compiler, when these expressions are evaluated, we log just the partial matches. In this case, the new payload logging compiler handles the "contains" operator similarly to the "find" method for bytes in the Rust standard library. This improves our payload log to:
http.request.headers.names[0,1] = ["c", "c"]
This makes things a lot clearer. It also saves our logging pipeline from processing millions of bytes. For example, a field that is analyzed a lot is the request body — http.request.body.raw — which can be tens of kilobytes in size. Sometimes the expressions are checking for a regex pattern that should match three characters. In this case we’ll be logging 3 bytes instead of kilobytes!
I know, I know, ["c", "c"] doesn’t really mean much. Even if we’ve provided the exact reason for the match and are significantly saving on the volume of bytes written to our customers’ storage destinations, the key goal is to provide useful debugging information to the customer. As part of the payload logging improvements, the compiler now also logs a "before" and "after" (if applicable) for partial matches. These buffers are currently 15 bytes each. This means our payload log now looks like:
http.request.headers.names[0,1] = [
    {
        before: null, // isn't included in the final log
        content: "c",
        after: "ontent-type"
    },
    {
        before: null, // isn't included in the final log
        content: "c",
        after: "ontent-length"
    }
]
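The context windows can be computed directly from the match offsets. Here is a sketch using the 15-byte window size mentioned above; the type and function names are invented for illustration (and, for simplicity, it assumes the offsets fall on character boundaries):

// The matched content plus up to `window` bytes of surrounding context.
struct PartialMatch<'a> {
    before: Option<&'a str>,
    content: &'a str,
    after: Option<&'a str>,
}

fn with_context(haystack: &str, start: usize, end: usize, window: usize) -> PartialMatch<'_> {
    let before_start = start.saturating_sub(window);
    let after_end = (end + window).min(haystack.len());

    PartialMatch {
        // None when the match starts at the beginning of the value.
        before: (before_start < start).then(|| &haystack[before_start..start]),
        content: &haystack[start..end],
        // None when the match runs to the end of the value.
        after: (end < after_end).then(|| &haystack[end..after_end]),
    }
}

fn main() {
    let header = "content-length";
    // contains "c" matches at byte 0, so there is no `before` context.
    let start = header.find('c').unwrap();
    let m = with_context(header, start, start + 1, 15);

    assert_eq!(m.before, None); // omitted from the final log
    assert_eq!(m.content, "c");
    assert_eq!(m.after, Some("ontent-length"));
}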
Example of payload log (previous)
Example of payload log (new)
In the previous log, we have all the header values. In the new log, we have only the 8th index, which contains a malicious script in an HTTP header; the match is on the malicious portion of that header value.