# prperf Manual

prperf is a thin GitHub App that checks each PR for performance regressions. Measurement runs inside your CI via the open-source Ruby sampling profiler [rperf](https://github.com/ko1/rperf); prperf just compares the base (e.g. main) against this PR and reports on the PR. If you know Codecov for test coverage, this is that, for performance.

Open a PR and the Check Run shows numbers like:

> 2,001ms → 2,140ms (+7%) · alloc 48,741 → 59,950 (+23%) · GC 4 → 7

The next chapter, "What prperf is," gives the overview.

## A tour of prperf

### Setup

1. Install the GitHub App.
2. Provide a benchmark. Here we'll measure boot time with `bin/rails runner ""`.
3. Add a workflow that runs it, triggered on both `push` (the default branch) and `pull_request`.

```yaml
# .github/workflows/prperf.yml
name: prperf
on:
  push:
    branches: [main, master]   # records the base (default branch; list both so main or master works)
  pull_request:                # compared against the base
jobs:
  bench:
    runs-on: ubuntu-latest
    permissions: { contents: read, id-token: write }
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with: { bundler-cache: true }
      - uses: rperf-dev/prperf-action@v1
        with:
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""   # ← your measurement command (step 2)
```

Options like threshold alerts, multiple benchmarks (`benchmark`), comment control (`comment`), and run count (`count`, default 3, median) are available too (see "Setup").

### Results

On each PR, the result shows up right in the PR's Checks (a summary compared against the base). A comment is posted only when a threshold is exceeded, and the flamegraph diff shows which method got heavier (see "Reading the results").

Every PR and push also records a measurement, so you can browse the history over time at [prperf.atdot.net](https://prperf.atdot.net).

prperf never blocks CI and needs no secrets. PRs from forks can't be measured, and during the free beta only public repositories are supported.

# What prperf is

First, let's be clear about **what prperf is**.

## In one line

**prperf is a thin GitHub App that automatically checks each PR for performance
regressions and reports the result on the PR.** Measurement happens inside your
CI with the open-source Ruby profiler [rperf](https://github.com/ko1/rperf); prperf
itself just compares **the base branch's latest measurement (usually main)**
against **this PR's**, and reports.

We call that base-branch baseline **base** and the PR side **head** — the same
terms GitHub uses for pull requests. The rest of this manual uses them.

> If you know **Codecov** for test coverage, prperf is that, for performance.

rperf is a **time (CPU) sampling profiler** — at heart a **flamegraph** of where
time went. prperf pulls **run time, GC, and allocations** out of that profile and
compares base vs head. Open a PR and the Check Run shows a **summary** like:

> 2,001ms → 2,140ms (+7%) · alloc 48,741 → 59,950 (+23%) · GC 4 → 7

prperf lets you notice "how performance changed in this commit" **at the PR
stage, before it merges**.

## What it does / doesn't do

**It does:**

- Show the base→head performance delta (allocations, GC, time) on the Check Run
  for every PR
- Comment on the PR only when a threshold is exceeded (sticky, quiet
  notifications)
- Visualize *which method got heavier* with a flamegraph diff

**It does not:**

- **It is not production monitoring.** It complements Datadog / Grafana rather
  than replacing them (those watch production; prperf catches regressions at the
  PR stage).
- **It never fails your CI.** The verdict is informational; the Check's
  conclusion is always success.
- **It never runs your code on the server.** Measurement happens **inside your
  CI**, and prperf only receives the result (the profile) and compares it. That
  is what makes it a "thin" App: light in terms of both security exposure and
  cost.

## The big picture

```
Your CI (GitHub Actions)
  └─ prperf-action
       ├─ measures your benchmark N times with rperf
       └─ uploads the profiles (.json.gz) to the prperf server
            │  (authenticated with the GitHub OIDC token — no secrets)
            ▼
prperf server
  ├─ compares base vs PR
  └─ reports on the Check Run / PR comment
```

## The user experience

1. Install the GitHub App on your repository
2. Add a few lines of the provided GitHub Action to your workflow
3. Open a PR and the result appears on the Check Run (plus a PR comment when a threshold is exceeded)

## Why it's trustworthy (the design ideas)

- **The verdict leans on deterministic metrics.** What rperf measures is mainly
  time (the flamegraph), but CI wall time swings ±10–20%. So **for the verdict**
  prperf weights the **allocation and GC counts**, which stay stable from run to
  run, and treats time as informational. The result still shows time, GC,
  allocations, and the flamegraph.
- **No secrets.** Authentication is the GitHub Actions OIDC token — no API keys
  to issue or manage.
- **Quiet notifications.** The Check Run is a permanent home (zero
  notifications); a comment appears only on a threshold breach, one per PR,
  edited in place.
- **You see "why."** The flamegraph diff pinpoints the method that got heavier.

## Who it's for

prperf suits authors of public gems and libraries who want to stop performance
regressions at the PR. A dependency bump or a refactor can quietly add
allocations or slow the boot; prperf catches it before the PR merges.

It also suits teams whose private apps care about performance — Rails apps where
speed drives UX or revenue. You catch a heavy change in the same place you review
it, before it reaches production.

Pricing: public repositories are free, private repositories are on a paid plan
(currently public-only during the free beta).

## How to read this manual

- **Setup**: installing the App and adding the workflow
- **Writing a benchmark** / **the Rails quickstart**: what to measure
- **Reading the results** / **Reading a flamegraph**: interpreting the output

Next, head to Setup.

# Setup

Getting prperf running takes about 10–15 minutes. For an overview of the
service, see "What prperf is."

There are three things to do:

1. Install the GitHub App
2. Provide a benchmark
3. Add a workflow that runs it (triggered on both PRs and pushes to the default branch)

## Prerequisites

- **rperf 0.10 or newer** in your Gemfile. prperf uses the `meta` / `summary`
  embedded in the profile; with an older rperf the action stops with a clear
  error.
- A **benchmark command** to measure (see below).
- A **public repository**. Private repositories require a paid plan (currently
  public-only during the free beta).

## Install the GitHub App

Install the prperf GitHub App on your repository from its App page. This lets
prperf write the Check Run and PR comments for that repository.

## Provide a benchmark

What you measure determines the range of regressions you can catch. A good
benchmark is deterministic, runs the path you care about, and does enough work
to be stable. What and how to measure depends on your project, so the how-to and
per-project examples are in "Writing a benchmark."

For this guide we measure one concrete example: a Rails app's boot. The
benchmark is just `bin/rails runner ""` — it boots the app and runs an empty
script, so there's no benchmark file to write. The next section wraps it in
rperf and puts it in the workflow.

## Add the workflow

Add a workflow that runs your benchmark. prperf compares the **PR
head** against the **latest snapshot of the base branch**. Trigger the workflow
on both `push` (the default branch) and `pull_request`: the push records the
base, and the PR is compared against it (prperf tells them apart from the OIDC
token's ref). List both `main` and `master` for the branch so it works either
way: only your default branch exists, so only that one fires. Until the workflow
has run once on a push to the default branch, there is nothing to compare
against, and the Check Run shows "No base snapshot found — showing this run's
numbers only."

```yaml
# .github/workflows/prperf.yml
name: prperf
on:
  push:
    branches: [ main, master ]   # records the base (your default branch)
  pull_request:                  # compared against the base

jobs:
  bench:
    runs-on: ubuntu-latest
    permissions:
      contents: read         # for checkout
      id-token: write        # required for OIDC upload
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - uses: rperf-dev/prperf-action@v1
        with:
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""
```

In `run:`, wrap the command you want to measure in `rperf record`; it must write
at least one profile. Point its output at the action-provided `$PRPERF_DIR` with
`--snapshot-dir "$PRPERF_DIR"`.

You **must** include `permissions: id-token: write`. Without it there is no
OIDC token and the upload cannot happen. `contents: read` lets
`actions/checkout` fetch the repository; once you set `permissions:`, anything
you don't list defaults to none, so both are spelled out.
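
To make "tells them apart from the OIDC token's ref" concrete, here is a minimal sketch of how a ref string could be classified. It is an illustration under assumptions, not prperf's server code: `snapshot_role` is a hypothetical name, and the sketch relies only on the ref shapes GitHub Actions uses for these two triggers (`refs/heads/<branch>` for a push, `refs/pull/<number>/merge` for a pull request).

```ruby
# Illustrative only: classify a workflow run from the `ref` claim of its
# GitHub Actions OIDC token. Not prperf's actual implementation.
def snapshot_role(ref)
  case ref
  when %r{\Arefs/pull/(\d+)/merge\z}
    [:head, Regexp.last_match(1).to_i]   # a pull_request run: compare against base
  when %r{\Arefs/heads/(.+)\z}
    [:base, Regexp.last_match(1)]        # a push run: record as base
  else
    [:unknown, ref]
  end
end

p snapshot_role("refs/heads/main")       # => [:base, "main"]
p snapshot_role("refs/pull/42/merge")    # => [:head, 42]
```

Because the workflow above only triggers pushes on `main` and `master`, every `refs/heads/...` run it produces is a default-branch run.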

## Thresholds and comments (optional)

A threshold caps how much a metric (allocations, GC, time, etc.) may increase
from base to head. Where to draw that line is yours to set, per metric. Crossing
it adds a ⚠️ to the Check Run and, depending on `comment`, posts a PR comment.
Thresholds are **optional**: without them the Check Run still shows numbers, but
no ⚠️ and no comment. Add them only when you want to be warned about a
regression.

All configuration lives **in the workflow** — there is no separate config file.
Write the **global defaults** once in the job's `env`, and **override per
benchmark** if needed.

```yaml
jobs:
  bench:
    runs-on: ubuntu-latest
    permissions: { contents: read, id-token: write }
    env:
      PRPERF_DEFAULT_THRESHOLDS: |     # applies to every benchmark
        alloc: "+10%"
        total_ms: "+20%"
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with: { bundler-cache: true }
      - uses: rperf-dev/prperf-action@v1
        with:
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""
```

Threshold keys, with a recommended starting value for each (prperf has no
built-in threshold — they take effect only once you set them):

| Key | Recommended default | Meaning |
|---|---|---|
| `alloc` | `"+10%"` | Allocation increase. Can also be absolute, e.g. `"+5000"` |
| `gc_count` | `"+2"` | GC count (minor+major) increase |
| `total_ms` | `"+20%"` | Wall-time increase. Noisy, so use relative (%) |
| `cpu_ms` | `"+15%"` | CPU-time increase |
| `method` | (none) | When a named method's self-time share exceeds the given %. E.g. `{ "JSON.generate": "15%" }` |

- Summary values are `"+N%"` (relative) or `"+N"` (absolute); method values are
  `"N%"`.
- Invalid values are ignored, with one warning line on the Check Run (CI is
  never failed).
- Relative thresholds (`+10%`) generalize cleanly across benchmarks. Absolute
  and method thresholds mean different things per benchmark, so override them
  per benchmark only when needed.
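
To make the value formats concrete, here is a minimal sketch of how a summary threshold such as `"+10%"` or `"+5000"` could be checked against a base and head value, and how per-benchmark overrides could be layered on the global defaults. It is an illustration only, not prperf's implementation: `exceeds_threshold?` and `effective_thresholds` are hypothetical names, and treating a value exactly at the limit as "not exceeded" is an assumption.

```ruby
# Illustrative only: not prperf's actual code.
# Returns true when head grew past the allowed increase over base.
def exceeds_threshold?(base, head, threshold)
  case threshold
  when /\A\+(\d+(?:\.\d+)?)%\z/          # relative, e.g. "+10%"
    return false if base.zero?           # no meaningful percentage of zero
    (head - base) * 100.0 / base > Regexp.last_match(1).to_f
  when /\A\+(\d+)\z/                     # absolute, e.g. "+5000"
    head - base > Regexp.last_match(1).to_i
  else
    false                                # invalid values are ignored
  end
end

# Per-benchmark thresholds override the global defaults key by key.
def effective_thresholds(defaults, overrides)
  defaults.merge(overrides)
end

p exceeds_threshold?(48_741, 59_950, "+10%")   # => true  (about +23%)
p exceeds_threshold?(4, 6, "+2")               # => false (exactly +2)
```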

Comment behavior is controlled by the `comment` input (default `on_threshold`):

| Value | Behavior |
|---|---|
| `on_threshold` | Comment only when a threshold is exceeded (default) |
| `always` | Comment every time |
| `never` | Never comment (Check Run only) |

There is one comment per PR, and each push **edits the same comment**, so
notifications stay at one.

## Action inputs

| Input | Default | Description |
|---|---|---|
| `run` | (required) | Measurement command; must emit at least one `.json.gz` |
| `prepare_run` | `""` | One-time setup before measuring (generate fixtures, seed, etc.); not measured |
| `count` | `3` | Number of runs; the server compares the median |
| `benchmark` | `default` | Benchmark series name; one commit can carry several, compared independently |
| `thresholds` | `""` | Thresholds for this benchmark (overrides the global defaults per key) |
| `comment` | `on_threshold` | Comment behavior |
| `server` | `https://prperf.atdot.net` | prperf server (replaceable) |
| `upload` | `true` | Set `false` to measure without uploading |

## Multiple benchmarks

You can measure one commit with several benchmarks — use **one step per
benchmark** with a distinct `benchmark` name. The server compares each against
its own base and shows them all in **one Check Run**.

```yaml
- uses: rperf-dev/prperf-action@v1
  with:
    benchmark: boot
    run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""
- uses: rperf-dev/prperf-action@v1
  with:
    benchmark: render
    run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- ruby bench/render.rb
```

Use the **same benchmark names** on the PR and push-to-default-branch triggers
so each series has a baseline.

## Verify it works

1. Push to main first → the workflow runs on the push and the base snapshot
   reaches the server.
2. Open a PR → the workflow runs on the PR; **numbers on the Check Run** mean
   success.
3. A link to the uploaded result also appears in each job's **Summary**.

## Limitations

- **PRs from forks cannot upload.** GitHub does not grant `id-token: write` to
  fork-triggered workflows, so no OIDC token is available. Same-repository
  branch PRs work normally.
- Upload problems (plan limits, rate limits, server errors) are **warnings
  only**; the step still succeeds. Only the measurement command itself failing
  fails the step.
- During the free beta, **public repositories only**. Private repositories are
  coming with paid plans.

# Writing a benchmark

The numbers prperf reports are decided almost entirely by what you measure.
Benchmark design is the step that most shapes the result, and the one that
takes the most effort.

## What a benchmark is

To prperf, a benchmark is the command you pass to `run:`. Usually it's a small
Ruby script (for example `bench/main.rb`) wrapped in `rperf record`:

```yaml
run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- ruby bench/main.rb
```

Put rperf in your Gemfile (0.10 or newer, so `bundle exec rperf` resolves). The
action runs this `count` times (default 3) and the server compares the median
against base. What you write is the body of `bench/main.rb` — a script that does
a representative chunk of work.

## What makes a good benchmark

A good benchmark satisfies three things.

1. **It exercises the code you care about.** A PR that doesn't touch that code
   leaves the numbers unchanged.
2. **It is deterministic** (does exactly the same work every time). Otherwise
   alloc and GC drift and you get warnings that aren't regressions.
3. **It does enough work.** If it finishes in an instant it collects few
   samples and the result is unstable.

Regressions are judged primarily on the allocation count. When a benchmark is
deterministic, the allocation count is stable down to the single object across
runs of the same code, so even a small increase is caught. GC counts are just as
deterministic, so they are shown alongside. A benchmark that drifts loses that
stability.

## How to write one

The basic shape of `bench/main.rb` is: build a fixed input once, warm up, then
repeat the real loop enough times.

```ruby
# bench/main.rb
require "json"
require_relative "../config/environment"   # if needed (Rails, etc.)

# 1) Build the fixed input once (no randomness, time, or network)
DATA = { "users" => Array.new(100) { |i| { "id" => i, "name" => "user#{i}" } } }

# 2) Warm up (exclude one-time lazy loading / initialization from the measurement)
JSON.generate(DATA)

# 3) The real thing: repeat enough times
5_000.times do
  JSON.generate(DATA)
end
```

Fix the input; don't let it depend on `rand`, `Time.now`, a DB, an external API,
or filesystem enumeration order. If you truly need randomness, pin it with
`srand(42)`. Warm up so that one-time work (autoload, constant init, lazy
loading) stays out of the measurement. Tune the count (5,000 here) so the whole
run takes a few hundred ms to a few seconds: too short is unstable, too long
slows CI down.
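
If the input has to look random, a fixed seed keeps it identical on every run. The sketch below is one way to do that, with assumptions: `build_input` is a hypothetical helper, and it uses a local `Random.new(42)` instead of the global `srand(42)` mentioned above, so that other code calling `rand` cannot shift the sequence.

```ruby
# Build pseudo-random but reproducible input from a fixed seed.
# `build_input` is a hypothetical helper name for this example.
def build_input(seed: 42, size: 100)
  rng = Random.new(seed)   # local generator; unaffected by other rand calls
  Array.new(size) { |i| { "id" => i, "score" => rng.rand(1_000) } }
end

DATA = build_input

# The same seed always yields the same data, so alloc and GC counts stay put.
puts(build_input == DATA ? "input is reproducible" : "input drifts")
```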

## Check that it's deterministic

Before wiring it into CI, run it two or three times locally and confirm the
allocation and GC counts are identical each time. `rperf stat` prints the
summary to stderr.

```sh
bundle exec rperf stat -- ruby bench/main.rb
bundle exec rperf stat -- ruby bench/main.rb
```

If `allocated_objects` and the GC counts match across runs, the benchmark is
deterministic. If they drift, eliminate the causes one at a time.

- [ ] No `rand` or `SecureRandom` (pin with `srand` if you must)
- [ ] The result doesn't depend on `Time.now` or `Date.today`
- [ ] No network or external services
- [ ] No dependence on changing DB state (use fixed in-memory data, fixtures, or
      a fixed seed)
- [ ] No dependence on file enumeration order (`Dir.glob` order, etc.)
- [ ] The input size is the same every run
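
For a quick in-process sanity check while working through that list, Ruby's own allocation counter can confirm that a block allocates the same number of objects each time. This is only a sketch: `allocations` is a hypothetical helper, the lambda stands in for your benchmark body, and the numbers will not equal what `rperf stat` reports for the whole process.

```ruby
# Count objects allocated while the block runs, via Ruby's cumulative GC counter.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Stand-in for the body of bench/main.rb (fixed input, fixed count).
work = -> { Array.new(100) { |i| "item-#{i}" } }

work.call                        # warm up
first  = allocations(&work)
second = allocations(&work)
puts "run 1: #{first} objects, run 2: #{second} objects"
puts(first == second ? "stable" : "drifting: work through the checklist above")
```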

To get a sense of the cause, look inside with the flamegraph.

```sh
bundle exec rperf record -o out.json.gz -- ruby bench/main.rb
bundle exec rperf report out.json.gz       # opens the viewer
```

Once you've pinned down the nondeterminism, confirm the match with `rperf stat`
again before putting it into CI.

## Preparation (optional)

If you have setup that should run once before the benchmark, put it in
`prepare_run:`. Generating fixtures, seeding a DB, and building assets all
qualify. It runs once before the measurement and is not included in it.

```yaml
- uses: rperf-dev/prperf-action@v1
  with:
    prepare_run: bin/rails db:prepare db:seed   # once, before measuring
    run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- ruby bench/request.rb
```

A failure here fails the step. Use a fixed seed or input so each run starts from
the same state.

## Per-project examples

Use the workflow from "Setup" as is, and just point `run:` at each
`bench/*.rb`. Only benchmarks that need preparation (generating fixtures,
seeding a DB, building assets) add a `prepare_run:`.

### gem / library

Call the public API N times on deterministic, fixed input. With the input and
call count fixed, you measure only regressions in the public API.

```ruby
# bench/main.rb
require "your_gem"

# Build the fixed input deterministically (no randomness, time, or network)
DATA = { "items" => Array.new(200) { |i| { "id" => i, "name" => "item-#{i}" } } }

YourGem.encode(DATA)                 # warm up
5_000.times { YourGem.encode(DATA) }
```

Change the `require`, the fixed input `DATA`, the API you call, and the count.
You can reuse an existing `benchmark/` script (benchmark-ips and the like), but a
time-based loop varies its iteration count and makes alloc drift, so switch to a
fixed-count loop.

### Sinatra / Rack apps

For any Rack app, send one request through the full stack N times.

```ruby
# bench/request.rb
require_relative "../app"            # load your Sinatra/Rack app
require "rack/mock"

app  = Sinatra::Application           # classic style. Modular: app = MyApp
PATH = ENV.fetch("BENCH_PATH", "/")   # change to the path you want to measure
make = -> { Rack::MockRequest.env_for(PATH, "HTTP_HOST" => "localhost") }
pump = ->(r) { b = r[2]; b.each { |_| }; b.close if b.respond_to?(:close) }

3.times    { pump.call(app.call(make.call)) }   # warm up
2_000.times { pump.call(app.call(make.call)) }
```

In `run:`, wrap `ruby bench/request.rb` in `rperf record` as shown in "Setup".
Change the load line, `app` (the object you `run` in `config.ru`), `PATH`, and
the count. If the request reads a DB, add a seed to the preparation and a
postgres service to the workflow (see the "Rails quickstart").

### CLI / plain Ruby

Calling the entry point in-process N times with fixed arguments avoids startup
cost and external state, and keeps samples stable.

```ruby
# bench/main.rb
require_relative "../lib/my_cli"

ARGS = %w[build --format json]        # fixed arguments
200.times { MyCli.run(ARGS) }         # call your entry point
```

To measure the executable itself, confirm it does enough work and then pass it
to `run:` directly. A single short startup collects few samples and is unstable,
so loop over it, or use the in-process loop above.

### Rails apps

Rails is covered in the next chapter, the "Rails quickstart" — boot, endpoints,
typical queries, and jobs are all there. Roda and grape are Rack apps, so measure
them as in "Sinatra / Rack apps"; Hanami follows the same idea as Rails.

## Anti-patterns

The following ways of measuring move the numbers for reasons unrelated to the
PR's changes, so avoid them.

- **Measuring the test suite as is** (`rperf record -- rspec`). Adding tests in a
  PR inflates alloc, and you can't tell that apart from a regression.
- **Depending on randomness, time, or network.** It drifts every run and causes
  false positives.
- **Too short.** Time drifts and alloc is too small for a delta to show.
- **Measuring a path you barely care about.** The PR never touches it, so it
  always reads "no change."
- **Using real external dependencies (API or DB).** It drifts with the network.

A giant everything-in-one benchmark is also worth avoiding, because it's hard to
tell what regressed. Split by concern and measure several benchmarks for one
commit, and you can trace the regressed code through the Check Run and the diff
(for how to split, see "Multiple benchmarks" in "Setup").

## How this ties to thresholds

A deterministic benchmark lets you set tight relative thresholds (for example
`alloc: "+5%"`) without false positives. A benchmark that drifts forces you to
loosen the thresholds, and the signal weakens. Start with one benchmark that
measures, deterministically, the path your PRs are most likely to touch, and
confirm that a real change shows up as a difference between base and head.

# Rails quickstart

Before agonizing over "what to measure," here is an **almost copy-paste**
starting point for Rails, in two steps. Step ① gives you the "numbers appear"
experience in 30 seconds; go to ② when you want more.

## Measure boot first (no extra files)

`bin/rails runner ""` **boots the app and exits doing nothing**, so it measures
boot itself. It catches added gems, heavier initializers, and autoload changes;
it is **deterministic** and needs no extra files and no database.

Paste this as `.github/workflows/prperf.yml`:

```yaml
name: prperf
on:
  push:
    branches: [main, master]   # records the base (default branch)
  pull_request:                # compared against the base

jobs:
  bench:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - uses: rperf-dev/prperf-action@v1
        with:
          benchmark: boot
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""
```

The same workflow records the base when it runs on a push to the default branch,
so this one file is enough to get **boot alloc/GC compared on every PR**.

> Make sure rperf 0.10 or newer is in your Gemfile.

## Measure one request (the full version)

This measures the allocations and GC along an endpoint's request-handling path.
Paste three files.

### A measurement environment `config/environments/benchmark.rb`

A production-like but CI-friendly dedicated environment.

```ruby
# config/environments/benchmark.rb
require_relative "production"

Rails.application.configure do
  config.eager_load = true          # load all code, like production
  config.force_ssl = false          # avoid measuring only an SSL redirect
  config.hosts.clear                # drop host restrictions (for the benchmark)
  config.require_master_key = false # boot without the master key
  config.log_level = :warn
  config.consider_all_requests_local = false
end
```

### The benchmark `bench/request.rb`

```ruby
# bench/request.rb — one request through the full stack, N times
require_relative "../config/environment"
require "rack/mock"

PATH = ENV.fetch("BENCH_PATH", "/api/health")  # ← change to the endpoint you care about

app = Rails.application
build_env = -> { Rack::MockRequest.env_for(PATH, "HTTP_HOST" => "localhost") }

consume = lambda do |result|
  body = result[2]
  body.each { |_| }                 # consume the body so rendering is measured
  body.close if body.respond_to?(:close)
end

# warm up (autoload, template compilation, connection setup)
3.times { consume.call(app.call(build_env.call)) }

1_000.times { consume.call(app.call(build_env.call)) }
```

### The workflow `.github/workflows/prperf.yml`

```yaml
name: prperf
on:
  push:
    branches: [main, master]   # records the base (default branch)
  pull_request:                # compared against the base

jobs:
  bench:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    services:
      postgres:                      # drop services and db:prepare if you don't use a DB
        image: postgres:16
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
        ports: [ "5432:5432" ]
        options: >-
          --health-cmd pg_isready --health-interval 10s
          --health-timeout 5s --health-retries 5
    env:
      RAILS_ENV: benchmark
      SECRET_KEY_BASE: dummy-for-benchmark
      DATABASE_URL: postgres://postgres:postgres@localhost:5432/app_benchmark
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - run: bin/rails db:prepare db:seed   # only if the request hits the DB; seed fixed data
      - uses: rperf-dev/prperf-action@v1
        with:
          benchmark: boot
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- bin/rails runner ""
      - uses: rperf-dev/prperf-action@v1
        with:
          benchmark: request
          run: bundle exec rperf record --snapshot-dir "$PRPERF_DIR" -- ruby bench/request.rb
```

The single workflow records the base on push to the default branch.

## The only things you change

- **`PATH`** (`bench/request.rb`) — the endpoint to measure. A **JSON/API
  endpoint is easiest** (no asset precompilation, less likely to be auth-gated).
- **`db:seed`** — if the request reads the DB, provide fixed seed data; if not,
  delete the postgres service and the db line.
- **The count (1,000)** — tune so the whole run is a few hundred ms to a few
  seconds.

## When it doesn't work

- **Empty results / all redirects** — `force_ssl` is issuing 301s, or auth is
  blocking you. The `benchmark` environment already sets `force_ssl = false`. For
  an auth-gated path, pick a public endpoint or build a signed-in env in
  `bench/request.rb`.
- **Asset-related errors** — an asset helper in the view. The quick fix is to
  measure an **API/JSON endpoint**.
- **Numbers jiggle every run** — check that the seed is fixed and the request
  has no time/randomness (see the "Writing a benchmark" checklist). Locally, run
  `RAILS_ENV=benchmark bundle exec rperf stat -- ruby bench/request.rb` twice and
  confirm alloc/GC match.

# Reading the results

prperf reports in three places: the **Check Run** (numbers), the **PR comment**
(when a threshold is exceeded), and the **flamegraph viewer** (to dig in). Here
is how to read each.

## The Check Run

A Check Run named `prperf` appears in the PR's Checks. Its **conclusion is
always success** — prperf never fails your CI. The verdict is informational.

### Title

```
2,001ms → 2,140ms (+7%) · alloc 48,741 → 59,950 (+23%) · GC 4 → 7
```

The key `base → head` metrics. If any metric exceeded its threshold, a **⚠️** is
prepended.
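
The percentages in the title follow directly from the two numbers on either side of the arrow. As a worked example, this sketch rebuilds the sample title's segments; the helper names (`with_commas`, `percent_change`, `segment`) are hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the sample.

```ruby
# Illustrative formatter for a "base → head (+N%)" title segment.
def with_commas(n)
  n.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Relative change in whole percent, always signed. Assumes base > 0.
def percent_change(base, head)
  format("%+d%%", ((head - base) * 100.0 / base).round)
end

def segment(base, head, unit: "")
  "#{with_commas(base)}#{unit} → #{with_commas(head)}#{unit} (#{percent_change(base, head)})"
end

puts segment(2_001, 2_140, unit: "ms")    # 2,001ms → 2,140ms (+7%)
puts "alloc " + segment(48_741, 59_950)   # alloc 48,741 → 59,950 (+23%)
```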

### Summary (body)

- **base / head metric table** — alloc, GC (minor/major), GC time, total ms,
  CPU ms, max RSS. Rows over threshold are **bold**.
- **Top 10 method diff rows** — `base self% → head self% → Δpt`, so you can see
  which methods grew in share.
- **A diff link to the viewer** — the way into the flamegraph.
- When there is no base, it shows "No base snapshot found — showing this run's
  numbers only."
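
To show what a `base self% → head self% → Δpt` row means, the sketch below computes each method's share of self samples on both sides and the difference in percentage points. The sample counts are invented, `self_shares` and `method_diff` are hypothetical names, and sorting by largest increase is an assumption about the ordering.

```ruby
# Self-time share per method, in percent of all samples in the profile.
def self_shares(self_samples)
  total = self_samples.values.sum.to_f
  self_samples.transform_values { |n| n * 100.0 / total }
end

# Rows of [method, base %, head %, delta in points], largest increase first.
def method_diff(base, head, top: 10)
  b = self_shares(base)
  h = self_shares(head)
  (b.keys | h.keys)
    .map { |m| [m, b.fetch(m, 0.0), h.fetch(m, 0.0), h.fetch(m, 0.0) - b.fetch(m, 0.0)] }
    .sort_by { |row| -row[3] }
    .first(top)
end

# Invented self-sample counts for illustration.
BASE_SELF = { "JSON.generate" => 40, "String#*" => 10, "Integer#times" => 50 }
HEAD_SELF = { "JSON.generate" => 40, "String#*" => 30, "Integer#times" => 30 }

method_diff(BASE_SELF, HEAD_SELF).each do |name, base_pct, head_pct, delta|
  puts format("%-14s %5.1f%% → %5.1f%%  %+.1fpt", name, base_pct, head_pct, delta)
end
```

Points, not percent, are the natural unit here: a method going from 10% to 30% of self time grew by 20 points, even though its share tripled.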

### Which numbers to trust

- **Allocation and GC counts are deterministic** — they barely move even when
  the CI runner changes. These are the primary signal for regressions.
- **Time (total_ms / cpu_ms) is noisy** — CI wall time swings ±10–20%. We
  compare the **median** of `count` (default 3) runs, but treat time as
  meaningful only when it moves a lot.
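
Why the median helps: one slow CI run shifts an average but leaves the middle value alone. A minimal sketch with invented timings follows; `median` is a hypothetical helper, and exactly how the server applies the median across metrics is not specified here.

```ruby
# Median of a list of measurements (e.g. total_ms from `count` runs).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

BASE_RUNS = [2_001, 1_950, 2_390]   # invented data; the third run hit a noisy runner
HEAD_RUNS = [2_140, 2_155, 2_090]

puts median(BASE_RUNS)   # 2001
puts median(HEAD_RUNS)   # 2140
```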

## The PR comment (sticky)

When a threshold is exceeded (subject to the `comment` setting), one comment is
posted on the PR.

- **One comment per PR.** Each push **edits the same comment**, so notifications
  stay at one and the thread isn't spammed.
- Exceeded metrics are listed like
  `⚠️ **alloc** 48,741 → 59,950 (threshold +10%)`. With multiple benchmarks the
  benchmark name is included.

With `comment: never` there is no comment, only the Check Run; with `always` it
comments every time even without an exceedance.

## The flamegraph viewer

Open it from the Check Run's diff link or a shareable URL
(`/view/<repo>/<sha>`). It is rperf's viewer, so the controls are the same.

- **Flamegraph** — width is time (weight); wider is heavier.
- **Top tab (table)** — the same data as a table of methods sorted by self and cumulative weight; handy when the flamegraph is hard to read or you want to tackle the heaviest first. The `Tags` tab breaks it down by rperf label (tag).
- **Diff mode** — colors the difference between base and head: **methods whose
  share increased are red, decreased are blue**. The Check Run's diff link opens
  in this mode.
- **Time-travel sidebar** — past snapshots of that benchmark series are listed,
  so you can walk main's trend. `j` / `k` move to newer / older.
- **Pin a method** — Shift+click a method to pin it; a sparkline shows its share
  across snapshots.

Share the permanent link (`/view/<repo>/<sha>`). Under the hood the viewer
fetches a short-lived signed URL on demand, so revoked access takes effect
within ten minutes.

With multiple benchmarks, open
`/view/<repo>/diff?base=<sha>&head=<sha>&bench=<name>` to diff a particular
benchmark. The viewer sidebar is scoped to one benchmark series, so boot and
endpoint snapshots never interleave.

## The dashboard

- **`/` (top)** — the marketing/explanation page.
- **`/me` (after sign-in)** — a list of repositories you can see, each linking
  to the latest result per benchmark. Public repositories appear for everyone;
  private ones only for people with GitHub read access.

Sign-in is GitHub OAuth; there is no prperf-specific account. Authorization
always follows GitHub permissions (public snapshots are viewable without
signing in).

## What to do when a regression is flagged

1. From the **⚠️ title and metric table**, see what grew (alloc / GC / time / a
   specific method) and by how much.
2. Open the flamegraph via the **diff link** and find the methods that turned
   red.
3. For an allocation increase, look for where extra objects are created. A higher
   GC count is usually the consequence of those allocations. For time, suspect
   **CI noise** first.
4. Push a fix and the same Check Run and sticky comment **update in place**.

## Common states

- **"No base snapshot found"**: no base-branch snapshot is available to compare
  against yet. The workflow must have run on a push at least once, on a commit
  that is an ancestor of the PR.
- **Always "no change"**: the PR simply doesn't touch the path the benchmark
  exercises. Check that the benchmark covers what you care about.
- **Numbers swing every time**: the benchmark has nondeterminism (randomness,
  time, I/O). Make it deterministic, or loosen / drop the time thresholds.

# Reading a flamegraph

A flamegraph is a picture of **where your program spent its time**. It takes a
little practice to read, so this chapter starts from zero and goes slowly.

## What the picture actually is

While measuring, prperf (rperf) samples your program an **enormous number of
times**, like taking photos, and records "what was running at that moment" (the
call stack). Stacking thousands of those photos and tallying them gives the
flamegraph.

- **One box = one method** (function).
- **The width of a box = the share of time spent in that method** (and whatever
  it called) — i.e. how many photos caught it running. **Wider = heavier.**
- **The vertical stacking = call depth.** The bottom is the entry point; the
  higher you go, the closer to the **leaf** (what was actually executing).

A picture beats words. Here is an example (bottom = entry, top = leaf):

```
          ┌──────────┐
          │ String#* │                      ← leaf: actually on the CPU (wider = heavier)
      ┌───┴──────────┴───┐
      │  JSON.generate   │                  ← its caller
   ┌──┴──────────────────┴────────────┐
   │          Integer#times           │     ← a loop: wide, but it just calls its children
┌──┴──────────────────────────────────┴──┐
│                 <main>                 │  ← the whole program; bottom = root, always ~100% wide
└────────────────────────────────────────┘
```

What this tells you:

- The bottom `<main>` is 100% wide. That's the **whole program**, so it is
  expected — don't be alarmed by it.
- The real cost is in the **wide boxes near the top** (here `String#*`). That is
  where the CPU was actually pinned.
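
To make the tallying concrete, here is a small sketch with made-up stacks. It is not rperf's implementation, and it simplifies one thing: a real flamegraph keeps one box per position in the call tree, while this tallies per method name (closer to what the Top tab shows).

```ruby
# Each sample is one call stack, root first. These stacks are made-up data.
samples = [
  ["<main>", "Integer#times", "JSON.generate", "String#*"],
  ["<main>", "Integer#times", "JSON.generate", "String#*"],
  ["<main>", "Integer#times", "JSON.generate"],
  ["<main>", "Integer#times"],
]

def tally_samples(samples)
  cumulative = Hash.new(0)  # samples where the method appears anywhere in the stack
  self_count = Hash.new(0)  # samples where the method is the leaf (actually executing)
  samples.each do |stack|
    stack.uniq.each { |name| cumulative[name] += 1 }  # uniq: count recursion once per sample
    self_count[stack.last] += 1
  end
  [cumulative, self_count]
end

cumulative, self_count = tally_samples(samples)
cumulative.each do |name, count|
  share = 100.0 * count / samples.size
  puts format("%-16s width %5.1f%%  self %d", name, share, self_count[name])
end
```

`<main>` appears in every sample, so it comes out 100% wide with zero self samples, which is why the bottom box is never the interesting one.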

## The two rules that matter most

1. **Only look at width.** Width = cost. Read each box as "if I could make this
   zero, that's how much I'd save."
2. **Ignore height.** A tall tower (deep nesting) is cheap if it's thin. A short
   box is heavy if it's wide. **Hunt for wide plateaus, not tall towers.**

## Common misreadings

- **Left-to-right is NOT time order.** It is not "this ran, then that"; boxes are
  just sorted (e.g. by name). Don't read it as a timeline.
- **Color means nothing in the single view.** It only separates boxes; "red" is
  not "hot." (Color *does* mean something in **diff mode** — see below.)
- **A 100%-wide box at the bottom is normal.** It just represents everything.

## A reading recipe

1. Find the **wide boxes near the top** — that's where your time goes.
2. **Click to zoom** into that box: its subtree fills the width, so the
   breakdown is easier to see.
3. If a method name interests you, **search** for it to highlight it across the
   whole graph — even if it's scattered, the combined width reveals the cost.
4. Prefer a table? Use the **Top tab**: methods listed by self / cumulative
   weight. Work down from the top.

## In prperf you mostly use "diff mode"

Opening the diff link from the Check Run puts the viewer in **diff mode** — this
is the heart of regression hunting.

- **Now color has meaning.** Compared to base, **methods whose share increased
  are red, those whose share decreased are blue**, and near-unchanged ones are
  neutral.
- So in diff mode you look for the **reddest** box, not the widest. That is
  "what got heavier in this PR."
- How to read it:
  - **Wide and red** = something already significant that got worse. Top
    priority.
  - **Narrow but bright red** = newly appeared, or grew sharply in share. A likely
    new culprit.
  - Remember: width = head's share, color = the change from base.
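
The coloring rule can be sketched as a comparison of per-method shares. The shares below are made up, and `NEUTRAL_BAND` is an assumed cutoff for "near-unchanged"; the viewer's real threshold and color scale are not documented here.

```ruby
# Share of total samples per method; the numbers are made up.
base_shares = { "JSON.generate" => 0.30, "String#*" => 0.10, "Array#map" => 0.20 }
head_shares = { "JSON.generate" => 0.31, "String#*" => 0.25, "Array#map" => 0.08, "Hash#merge" => 0.05 }

# Assumed cutoff for "near-unchanged"; not prperf's actual value.
NEUTRAL_BAND = 0.02

def diff_colors(base, head, band: NEUTRAL_BAND)
  (base.keys | head.keys).to_h do |name|
    # A method missing on one side counts as a 0% share there.
    delta = head.fetch(name, 0.0) - base.fetch(name, 0.0)
    color =
      if delta > band
        :red      # share grew: got heavier in this PR
      elsif delta < -band
        :blue     # share shrank
      else
        :neutral
      end
    [name, { delta: delta.round(4), color: color }]
  end
end

diff_colors(base_shares, head_shares).each do |name, info|
  puts format("%-14s %+.2f  %s", name, info[:delta], info[:color])
end
```

Here `Hash#merge` is the "narrow but red" case: it has no share in base at all, so even a small share in head shows up as an increase.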

## Handy viewer controls

- **Click** — zoom into a box (its subtree fills the width).
- **Search box** — filter/highlight by name; type `JSON` to grasp all related
  spots at once.
- **Top / Tags tabs** — for table lovers; sorted by self / cumulative weight.
- **Shift+click** — pin a method; a sparkline shows its share across snapshots
  (time travel).
- **j / k** — move to newer / older snapshots.

## Note: this is a picture of *time*

prperf's flamegraph is a picture of **time (CPU weight)**. Allocation and GC
counts live in the **summary numbers** (the Check Run table). For an "alloc went
up" regression, use the flamegraph to find the method whose **time grew (turned
red)**, then check whether that path is creating extra objects.
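
Once the flamegraph points at a suspect path, you can check its allocations locally. A minimal sketch using Ruby's built-in `GC.stat` counter; the helper name `allocations_during` and the two blocks are illustrative, and this is separate from how prperf itself collects the alloc number.

```ruby
# Counts objects allocated while the block runs.
# GC.stat(:total_allocated_objects) only ever increases, so the
# difference before/after is the allocation count for the block.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

lean  = allocations_during { 1_000.times { |i| i * 2 } }        # small integers: no objects
heavy = allocations_during { 1_000.times { |i| "item-#{i}" } }  # one new String per iteration

puts "lean: #{lean} objects, heavy: #{heavy} objects"
```

Wrap the suspect call on base and on the PR branch; a clear jump in the count confirms where the extra objects come from.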

## Summary (all you really need)

- Width = cost; ignore height.
- Wide near the top = where your time goes.
- Left-to-right is not order; single-view color is meaningless.
- In prperf's diff, **find the red box** — that's what got worse this time.