I built PRScope after noticing that large pull requests often get merged without anyone fully understanding their real risk surface.
The idea is simple (there's a rough code sketch of the flow after this list):
• Take the raw unified diff from a GitHub PR
• Parse and structure it
• Feed it into an LLM with a deterministic prompt
• Generate a structured Markdown review that includes:
  • Severity levels
  • Risk assessment
  • Suggested improvements
  • Positive observations
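To make that concrete, here's a minimal sketch of the shape of the flow. It's illustrative, not PRScope's actual code: the helper names are made up, `gh pr diff` and the OpenAI Python SDK are stand-ins for however you fetch the diff and call the model, and the model name and PR number are placeholders.

```python
# Illustrative sketch only, not PRScope's actual implementation.
# Assumes the GitHub CLI (`gh`) is authenticated and the OpenAI Python SDK is installed.
import subprocess
from openai import OpenAI

def fetch_diff(pr_number: int) -> str:
    # Pull the raw unified diff for a PR via the GitHub CLI.
    return subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=True,
    ).stdout

def parse_diff(diff: str) -> list[dict]:
    # Split the unified diff into per-file chunks keyed by path.
    files: list[dict] = []
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git"):
            current = {"path": line.split(" b/")[-1], "lines": []}
            files.append(current)
        elif current is not None:
            current["lines"].append(line)
    return files

def review(files: list[dict]) -> str:
    # One call with a fixed prompt and temperature 0 to keep runs as repeatable as possible.
    client = OpenAI()
    prompt = (
        "Review this diff. For each file report severity, risk, "
        "suggested improvements, and positive observations. "
        "Only comment on code that appears in the diff.\n\n"
    )
    prompt += "\n\n".join(
        "### " + f["path"] + "\n" + "\n".join(f["lines"]) for f in files
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; could be any hosted or local model
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(review(parse_diff(fetch_diff(1234))))  # 1234 is a made-up PR number
```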
The hard parts weren’t “using AI” — they were:
• Handling large diffs without blowing token limits (see the chunking sketch after this list)
• Keeping output consistent across runs
• Avoiding hallucinated issues
• Making scoring feel rational instead of arbitrary
• Supporting both hosted APIs and local inference (Ollama)
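For the token-limit problem, the basic move is to group per-file chunks into batches under a budget and review batch by batch. A rough sketch, reusing the per-file structure from the earlier snippet; the budget and the chars-per-token heuristic are illustrative numbers, not tuned values:

```python
# Rough sketch of batching per-file diff chunks under a token budget.
MAX_TOKENS_PER_CALL = 6000  # illustrative budget, not a tuned value

def estimate_tokens(text: str) -> int:
    # Crude 4-chars-per-token heuristic; a real tokenizer (e.g. tiktoken) is more accurate.
    return len(text) // 4

def batch_files(files: list[dict]) -> list[list[dict]]:
    batches: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for f in files:
        cost = estimate_tokens("\n".join(f["lines"]))
        if current and used + cost > MAX_TOKENS_PER_CALL:
            # Close the current batch before it exceeds the budget.
            batches.append(current)
            current, used = [], 0
        current.append(f)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch then gets its own review call and the per-file findings are merged afterwards. A single oversized file still needs truncation or hunk-level splitting on top of this.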
One thing that improved reliability significantly was separating the prompt into two phases (sketched after this list):
• Analysis phase
• Structured output phase
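In sketch form (again illustrative: the prompts, model name, and client are placeholders, not the actual prompts PRScope ships):

```python
# Illustrative two-phase split: free-form analysis first, then formatting.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

def two_phase_review(diff_text: str) -> str:
    # Phase 1: analysis. Let the model find issues without worrying about format,
    # constrained to things that actually appear in the diff.
    analysis = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "List concrete issues in this diff. Cite the file and hunk for each, "
                "and only mention code that appears in the diff.\n\n" + diff_text
            ),
        }],
    ).choices[0].message.content

    # Phase 2: structured output. Rewrite the findings into the fixed review format.
    return client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Rewrite these findings as a Markdown review with the sections "
                "Severity, Risk Assessment, Suggested Improvements, and "
                "Positive Observations. Do not add new findings.\n\n" + analysis
            ),
        }],
    ).choices[0].message.content
```

The split keeps formatting concerns out of the analysis step; the second call only reshapes what phase 1 found, which is where the consistency gain comes from.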
It currently works as a CLI and GitHub Action.
I’m especially curious about:
• Deterministic scoring approaches
• Handling monorepo-scale PRs
• Preventing false positives in AI review
• CI performance tradeoffs
Repo: https://github.com/KinanNasri/PRScope
Happy to answer questions.