Planned private beta surface

Flock GitHub App

Run synthetic review checks on pull requests and post evidence-backed findings where teams already review code.

This surface is planned for launch. The current target covers repo-scoped configuration, preview URL resolution, check runs, inline comments, and summary updates.

Evidence-backed by default

The app should only post findings when an issue is observable: readable UI copy, visible state mismatch, screenshot evidence, console/runtime error, or network failure. No evidence means no claim is posted.
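
The evidence rule above can be sketched as a simple filter. This is a minimal illustration, not Flock's implementation: the Evidence and Finding types, the kind names, and the postable function are all hypothetical, chosen only to mirror the observable signals listed above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical evidence kinds, mirroring the observable signals named in the text.
OBSERVABLE_KINDS = {
    "ui_copy",
    "state_mismatch",
    "screenshot",
    "console_error",
    "network_failure",
}


@dataclass
class Evidence:
    kind: str          # one of OBSERVABLE_KINDS
    artifact_url: str  # link to the stored artifact


@dataclass
class Finding:
    claim: str
    evidence: List[Evidence] = field(default_factory=list)


def postable(findings: List[Finding]) -> List[Finding]:
    """Keep only findings backed by at least one observable, linked evidence item."""
    return [
        f
        for f in findings
        if any(e.kind in OBSERVABLE_KINDS and e.artifact_url for e in f.evidence)
    ]


findings = [
    Finding(
        "Shipping method labels are ambiguous",
        [Evidence("screenshot", "artifacts://runs/run_013/step-07.png")],
    ),
    Finding("Checkout feels slow"),  # no evidence attached, so it is dropped
]
kept = postable(findings)
print([f.claim for f in kept])
```

Filtering before anything is posted keeps the "no evidence, no claim" rule in one place instead of scattering it across the comment and summary writers.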

How the PR loop works

01

Install + scope

Install the app on selected repos, then map each repo to its preview or staging target before any checks run.

02

Run on PR events

On opened, reopened, or synchronized PRs, Flock will resolve the target URL and run scoped synthetic journeys.

03

Post grounded feedback

The app will post or update inline comments and a summary only when claims have attached evidence artifacts.

04

Review with clarity

A check run will report status with severity thresholds and direct links to artifacts.
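
Steps 01 and 02 can be sketched as a small decision function. This is a hedged illustration under stated assumptions: plan_run and resolve_target_url are hypothetical names, and the fallback order (deployment URL first, then a URL pattern, otherwise skip) is an assumption modeled on the draft configuration on this page. The action names are the ones GitHub sends on pull_request webhooks.

```python
from typing import Optional

# GitHub pull_request webhook actions that should trigger a run.
TRIGGER_ACTIONS = {"opened", "reopened", "synchronize"}


def resolve_target_url(
    pr_number: int,
    deployment_url: Optional[str],
    url_pattern: Optional[str],
) -> Optional[str]:
    """Prefer a reported deployment URL, then fall back to a pattern (assumed order)."""
    if deployment_url:
        return deployment_url
    if url_pattern:
        return url_pattern.replace("{number}", str(pr_number))
    return None


def plan_run(
    action: str,
    pr_number: int,
    deployment_url: Optional[str] = None,
    url_pattern: Optional[str] = None,
) -> dict:
    """Decide whether a PR event should start a synthetic run, and against which URL."""
    if action not in TRIGGER_ACTIONS:
        return {"run": False, "reason": "ignored_action"}
    url = resolve_target_url(pr_number, deployment_url, url_pattern)
    if url is None:
        # Mirrors an "on_missing: skip" style policy: no target, no run.
        return {"run": False, "reason": "no_target_url"}
    return {"run": True, "target_url": url}


plan = plan_run(
    "synchronize", 42, url_pattern="https://pr-{number}.preview.example.com"
)
print(plan)
```

Returning a reason when a run is skipped makes it easy to surface why a PR received no check, rather than failing silently.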

Artifacts in pull requests

Inline finding comment (example)

Flock finding (Major, 92% confidence)

Claim:
The shipping method labels are visually identical, and novice personas selected the wrong option in 3 of 4 runs.

Evidence:
- Screenshot: artifacts://runs/run_013/step-07.png
- DOM text: "Standard" and "Priority" appear without delivery-time context
- Replay event: timeline://run_013/events/183

Suggested change:
Add helper text under each method (e.g., "3-5 days" vs "1-2 days") and increase spacing between radio cards.

Check run summary (example)

check: flock/user-friction
status: completed
conclusion: action_required

critical: 0
major: 2
moderate: 3

needs_review:
- checkout/shipping-method labels ambiguous (evidence: screenshot + DOM)
- payment tab-switch drops field state (evidence: console error + replay trace)
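
A summary like the one above could be rendered from a list of findings. The sketch below is illustrative only: render_summary and the finding dictionary keys (severity, title, evidence) are hypothetical, and the choice to list only critical and major findings under needs_review is an assumption.

```python
from collections import Counter

SEVERITIES = ("critical", "major", "moderate")


def render_summary(check_name: str, conclusion: str, findings: list) -> str:
    """Render severity totals plus the findings that need human review."""
    counts = Counter(f["severity"] for f in findings)
    lines = [
        f"check: {check_name}",
        "status: completed",
        f"conclusion: {conclusion}",
        "",
    ]
    # Counter returns 0 for severities with no findings.
    lines += [f"{level}: {counts[level]}" for level in SEVERITIES]

    # Assumption: only critical and major findings are listed for review.
    flagged = [f for f in findings if f["severity"] in ("critical", "major")]
    if flagged:
        lines += ["", "needs_review:"]
        lines += [
            f"- {f['title']} (evidence: {' + '.join(f['evidence'])})"
            for f in flagged
        ]
    return "\n".join(lines)


sample_findings = [
    {"severity": "major", "title": "checkout/shipping-method labels ambiguous",
     "evidence": ["screenshot", "DOM"]},
    {"severity": "major", "title": "payment tab-switch drops field state",
     "evidence": ["console error", "replay trace"]},
    {"severity": "moderate", "title": "cart badge lags", "evidence": ["screenshot"]},
    {"severity": "moderate", "title": "promo field unlabeled", "evidence": ["DOM"]},
    {"severity": "moderate", "title": "slow address lookup", "evidence": ["replay trace"]},
]
summary = render_summary("flock/user-friction", "action_required", sample_findings)
print(summary)
```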

Target GitHub outputs

  • Inline comments anchored to changed files
  • PR summary with severity totals and top findings
  • Check run status for PR review
  • Artifact deep links (screenshots, DOM snippets, errors, traces)

Configuration shape

This draft shows the launch target. PR triggers, preview URL resolution, evidence requirements, and review thresholds should all be explicit in the file rather than implied by defaults.

# .flock.yml (illustrative target shape)
version: 1

github:
  pull_request:
    events:
      - opened
      - synchronize
      - reopened
    target_url:
      prefer: deployment_status
      url_pattern: "https://pr-{number}.preview.example.com"
      on_missing: skip

policy:
  minimum_evidence_per_claim: 1
  require_artifact_link: true
  fail_check_on:
    - severity: critical
    - severity: major
      confidence_gte: 0.85

evidence:
  include:
    - screenshot
    - dom_snapshot
    - console_error
    - network_failure

output:
  inline_comments: true
  summary_comment: true
  check_run: true
  update_existing_comments: true
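
One way to read the fail_check_on rules in the draft above is sketched here. This is an interpretation, not confirmed behavior: should_fail_check and rule_matches are hypothetical names, and treating a rule without confidence_gte as matching at any confidence is an assumption.

```python
# Mirrors policy.fail_check_on from the illustrative .flock.yml above.
FAIL_CHECK_ON = [
    {"severity": "critical"},
    {"severity": "major", "confidence_gte": 0.85},
]


def rule_matches(rule: dict, finding: dict) -> bool:
    """A rule matches on severity and, if present, a minimum confidence."""
    if finding["severity"] != rule["severity"]:
        return False
    # Assumption: a rule with no confidence_gte matches at any confidence.
    return finding["confidence"] >= rule.get("confidence_gte", 0.0)


def should_fail_check(findings: list, rules: list = FAIL_CHECK_ON) -> bool:
    """Fail the check if any finding matches any rule."""
    return any(rule_matches(r, f) for f in findings for r in rules)


print(should_fail_check([{"severity": "major", "confidence": 0.92}]))  # True
print(should_fail_check([{"severity": "major", "confidence": 0.80}]))  # False
```

Under this reading, the example inline finding (Major, 92% confidence) would fail the check, while a major finding at 80% confidence would only appear in the summary.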

Tell us how strict the PR check should be

We are deciding which evidence types and severity levels should warn, fail, or require review by default. Feedback at this stage directly shapes those defaults.

Request Access