Evaluation is the process of sending each instruction in a document to the LLM for analysis. The LLM reads the instruction text alongside the active criteria set and, if available, the project’s product context. It returns a structured result for each instruction: a color that summarizes overall quality, an ok, warning, or error result for every criterion, and written recommendations for anything that needs improvement. Results stream in as they complete, so you do not have to wait for the full document to finish before reviewing individual instructions.
Running an evaluation
To evaluate a document, open it in the Evaluate view and click Evaluate. Doc Reviewer sends all instruction and possible sections that are marked as included. Sections classified as non-instruction, and any instructions you have manually excluded, are skipped.
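The selection rule above can be sketched as a small filter. This is only an illustration: the `Section` class, its `kind` and `included` fields, and the `sections_to_evaluate` function are hypothetical names, not part of Doc Reviewer.

```python
from dataclasses import dataclass


@dataclass
class Section:
    # Hypothetical model of a document section; field names are assumptions.
    title: str
    kind: str              # "instruction", "possible", or "non-instruction"
    included: bool = True  # False when the user has manually excluded it


def sections_to_evaluate(sections):
    """Keep instruction and possible sections that are still marked as included."""
    return [
        s for s in sections
        if s.kind in {"instruction", "possible"} and s.included
    ]


doc = [
    Section("Install the CLI", "instruction"),
    Section("Release notes", "non-instruction"),
    Section("Maybe a how-to", "possible"),
    Section("Old setup steps", "instruction", included=False),
]
selected = [s.title for s in sections_to_evaluate(doc)]
```

Here `selected` contains only the first and third titles; the non-instruction section and the manually excluded instruction are skipped.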
Real-time streaming progress
As each instruction finishes, its result appears immediately in the document tree and in the results panel, so you can start reviewing early results while the remaining instructions are still being evaluated.
Doc Reviewer evaluates instructions one at a time. On transient errors such as network timeouts, it retries automatically before reporting a failure for that instruction.
The color scale
Every evaluated instruction receives one of four colors based on how many criteria it failed and how severely:
Green
No errors, at most one warning. The instruction meets all criteria or has only minor issues that do not affect usability.
Yellow
Has warnings or at most one error. Non-critical criteria failed. The instruction has gaps but is still functional for most readers.
Orange
Two or three errors. Important criteria failed. The instruction has notable problems that are likely to cause confusion or errors for readers.
Red
Four or more errors. The instruction is significantly incomplete. Critical structural elements are missing.
Each criterion result is ok, warning, or error. The number of error values determines the color tier.
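The tiers above can be read as a mapping from result counts to a color. The sketch below is one interpretation, not Doc Reviewer's actual code: the function name `color_for` is hypothetical, and it assumes that a single error yields yellow regardless of how many warnings accompany it.

```python
from collections import Counter


def color_for(results):
    """Map a list of 'ok' / 'warning' / 'error' values to a color tier."""
    counts = Counter(results)
    errors, warnings = counts["error"], counts["warning"]
    if errors >= 4:
        return "red"      # four or more errors
    if errors >= 2:
        return "orange"   # two or three errors
    if errors == 1 or warnings > 1:
        return "yellow"   # one error, or no errors but several warnings
    return "green"        # no errors and at most one warning
```

For example, `color_for(["ok", "warning"])` gives green under these assumptions, while `color_for(["error", "error", "ok"])` gives orange.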
Per-criterion results
For each evaluated instruction, Doc Reviewer shows a result for every criterion in your active criteria set. You can expand an instruction to see a breakdown that lists each criterion with its result:
- ok: the criterion is fully met
- warning: the criterion is partially met or has minor issues
- error: the criterion is not met
ok automatically.
Recommendations
When the LLM gives a criterion a warning or error result, it also writes a recommendation explaining what is missing or needs improvement. Recommendations appear below the per-criterion results for each instruction. Each recommendation includes a description of the problem and, where helpful, a brief example showing what the corrected content should look like.
False positives
Sometimes the LLM flags a criterion as failed when the instruction actually satisfies it, for example a prerequisite section that is phrased unconventionally, or a result description that uses a valid alternative structure. You can mark individual criterion results as false positives to override the LLM’s assessment. To mark a false positive, click the flag icon next to the criterion result in the instruction detail panel. The override is stored separately from the LLM result and is preserved across re-evaluations.
False positive overrides are never reset when you re-evaluate. When you run evaluation again, Doc Reviewer updates the color, criterion results, and recommendations from the new LLM response, but any overrides you have set remain in place.
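One way to picture overrides being stored separately from LLM results is the sketch below. It is a hypothetical model, not Doc Reviewer's data schema: the `InstructionState` class and its members are invented for illustration, and treating an overridden criterion as ok is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionState:
    # Hypothetical state for one instruction; names are assumptions.
    llm_results: dict = field(default_factory=dict)     # criterion -> "ok" / "warning" / "error"
    false_positives: set = field(default_factory=set)   # criteria the user has flagged

    def re_evaluate(self, new_results):
        """Replace the LLM results; the override set is deliberately left untouched."""
        self.llm_results = dict(new_results)

    def effective_results(self):
        """Results as shown to the user, with overridden criteria treated as ok (assumption)."""
        return {
            criterion: "ok" if criterion in self.false_positives else result
            for criterion, result in self.llm_results.items()
        }


state = InstructionState()
state.re_evaluate({"prerequisites": "error", "result": "ok"})
state.false_positives.add("prerequisites")   # user clicks the flag icon
state.re_evaluate({"prerequisites": "warning", "result": "error"})
```

After the second run, `state.llm_results` reflects the new response, while `state.false_positives` still contains the flagged criterion.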
Re-evaluation
You can re-run evaluation on a document at any time. This is useful when:
- You have updated your criteria set and want to apply the new rules
- You have regenerated or manually edited the product context and want more accurate results
- The LLM returned an unexpected result on a previous run and you want a fresh assessment