Planned private beta surface
Flock GitHub App
Run synthetic review checks on pull requests and post evidence-backed findings where teams already review code.
This surface is on the launch path. The current target shape is repo-scoped configuration, preview URL resolution, check runs, inline comments, and summary updates.
Evidence-backed by default
The app should post a finding only when the issue is observable through at least one of: readable UI copy, a visible state mismatch, screenshot evidence, a console or runtime error, or a network failure. No evidence means no claim is posted.
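To make the rule concrete, here is a minimal sketch of an evidence gate. The Finding type, the kind identifiers, and the postable function are hypothetical names chosen for illustration; they are not part of a published Flock API.

```python
from dataclasses import dataclass, field

# Evidence kinds named in the text above. The identifiers are illustrative.
OBSERVABLE_KINDS = {
    "ui_copy",
    "state_mismatch",
    "screenshot",
    "console_error",
    "network_failure",
}


@dataclass
class Finding:
    claim: str
    # Each item is a dict such as {"kind": "screenshot", "link": "..."}.
    evidence: list = field(default_factory=list)


def postable(findings):
    """Keep only findings backed by at least one linked, observable evidence item."""
    return [
        f
        for f in findings
        if any(
            e.get("kind") in OBSERVABLE_KINDS and e.get("link")
            for e in f.evidence
        )
    ]
```

A finding with an empty evidence list, or with only unrecognized evidence kinds, is dropped before anything reaches the pull request.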
How the PR loop works
Install + scope
Install the app on selected repos, then map each repo to its preview or staging target before any checks run.
Run on PR events
On opened, reopened, or synchronized PRs, Flock will resolve the target URL and run scoped synthetic journeys.
Post grounded feedback
The app will post or update inline comments and a summary only when claims have attached evidence artifacts.
Review with clarity
A check run will report status with severity thresholds and direct links to artifacts.
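The trigger and URL-resolution steps of the loop could look roughly like the sketch below. It assumes the behavior implied by the illustrative configuration on this page (prefer a deployment URL, fall back to a URL pattern, skip when neither exists). The function name and parameters are hypothetical.

```python
# Pull request webhook actions that should start a run.
TRIGGER_ACTIONS = {"opened", "reopened", "synchronize"}


def resolve_target_url(
    action,
    pr_number,
    deployment_url=None,
    url_pattern="https://pr-{number}.preview.example.com",
):
    """Return the URL to test, or None when the run should be skipped."""
    if action not in TRIGGER_ACTIONS:
        return None  # not a PR event this app reacts to
    if deployment_url:
        return deployment_url  # preferred source: a reported deployment
    if url_pattern:
        return url_pattern.format(number=pr_number)  # fallback: pattern
    return None  # nothing to test, so skip rather than guess
```

Returning None instead of raising keeps a missing preview from failing the check, which matches the skip-on-missing intent.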
Artifacts in pull requests
Inline finding comment (example)
Flock finding (Major, 92% confidence)
Claim:
The shipping method labels are visually identical, and novice personas selected the wrong option 3 out of 4 times.
Evidence:
- Screenshot: artifacts://runs/run_013/step-07.png
- DOM text: "Standard" and "Priority" appear without delivery-time context
- Replay event: timeline://run_013/events/183
Suggested change:
Add helper text under each method (e.g., "3-5 days" vs "1-2 days") and increase spacing between radio cards.
Check run summary (example)
check: flock/user-friction
status: completed
conclusion: action_required
critical: 0
major: 2
moderate: 3
needs_review:
- checkout/shipping-method labels ambiguous (evidence: screenshot + DOM)
- payment tab-switch drops field state (evidence: console error + replay trace)
Target GitHub outputs
- Inline comments anchored to changed files
- PR summary with severity totals and top findings
- Check run status for PR review
- Artifact deep links (screenshots, DOM snippets, errors, traces)
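As a rough illustration of the summary output, the sketch below builds severity totals and a short list of top findings in the same layout as the check run summary example above. The finding dictionaries and the render_summary function are assumptions made for this example.

```python
from collections import Counter

# Severity levels in descending order, as used in the example summary.
SEVERITIES = ("critical", "major", "moderate")


def render_summary(findings, top_n=2):
    """Render severity totals followed by the top_n most severe findings."""
    counts = Counter(f["severity"] for f in findings)
    lines = [f"{s}: {counts[s]}" for s in SEVERITIES]
    # Stable sort: most severe first, original order kept within a level.
    ranked = sorted(findings, key=lambda f: SEVERITIES.index(f["severity"]))
    lines.append("needs_review:")
    for f in ranked[:top_n]:
        evidence = " + ".join(f["evidence"])
        lines.append(f"- {f['title']} (evidence: {evidence})")
    return "\n".join(lines)
```

This sketch expects every finding to carry one of the three listed severities; an unknown level raises a ValueError instead of being silently miscounted.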
Configuration shape
This draft shows the launch target. PR triggers, preview URL resolution, evidence requirements, and review thresholds should all stay explicit in the file.
# .flock.yml (illustrative target shape)
version: 1
github:
  pull_request:
    events:
      - opened
      - synchronize
      - reopened
target_url:
  prefer: deployment_status
  url_pattern: "https://pr-{number}.preview.example.com"
  on_missing: skip
policy:
  minimum_evidence_per_claim: 1
  require_artifact_link: true
  fail_check_on:
    - severity: critical
    - severity: major
      confidence_gte: 0.85
evidence:
  include:
    - screenshot
    - dom_snapshot
    - console_error
    - network_failure
output:
  inline_comments: true
  summary_comment: true
  check_run: true
  update_existing_comments: true
Tell us how strict the PR check should be
We are deciding which evidence types and severity levels should warn, fail, or require review by default. Feedback now directly shapes the product.
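As a starting point for that discussion, here is one way the fail_check_on rules from the illustrative configuration could be evaluated. It assumes that confidence_gte applies only to the major rule and that a match yields the action_required conclusion shown in the check run example. Both are assumptions, not settled behavior.

```python
# Mirrors policy.fail_check_on from the illustrative .flock.yml.
FAIL_CHECK_ON = [
    {"severity": "critical"},
    {"severity": "major", "confidence_gte": 0.85},
]


def conclusion(findings, rules=FAIL_CHECK_ON):
    """Return a check run conclusion for a list of scored findings."""
    for finding in findings:
        for rule in rules:
            # A rule without confidence_gte matches at any confidence.
            threshold = rule.get("confidence_gte", 0.0)
            if (
                finding["severity"] == rule["severity"]
                and finding["confidence"] >= threshold
            ):
                return "action_required"
    return "success"
```

Under these rules, a major finding at 92% confidence blocks the check, while the same finding at 50% confidence does not.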
Request Access