The Go Blog

More predictable benchmarking with testing.B.Loop

Junyang Shao
2 April 2025

Go developers who have written benchmarks using the testing package might have encountered some of its various pitfalls. Go 1.24 introduces a new way to write benchmarks that’s just as easy to use, but at the same time far more robust: testing.B.Loop.

Traditionally, Go benchmarks are written using a loop from 0 to b.N:

func Benchmark(b *testing.B) {
  for range b.N {
    ... code to measure ...
  }
}

Using b.Loop instead is a trivial change:

func Benchmark(b *testing.B) {
  for b.Loop() {
    ... code to measure ...
  }
}

testing.B.Loop has many benefits:

  • It prevents unwanted compiler optimizations within the benchmark loop.
  • It automatically excludes setup and cleanup code from benchmark timing.
  • Code can’t accidentally depend on the total number of iterations or the current iteration.

These were all easy mistakes to make with b.N-style benchmarks that would silently result in bogus benchmark results. As an added bonus, b.Loop-style benchmarks even complete in less time!

Let’s explore the advantages of testing.B.Loop and how to use it effectively.

Old benchmark loop problems

Before Go 1.24, while the basic structure of a benchmark was simple, more sophisticated benchmarks required more care:

func Benchmark(b *testing.B) {
  ... setup ...
  b.ResetTimer() // if setup may be expensive
  for range b.N {
    ... code to measure ...
    ... use sinks or accumulation to prevent dead-code elimination ...
  }
  b.StopTimer() // if cleanup or reporting may be expensive
  ... cleanup ...
  ... report ...
}

If setup or cleanup is non-trivial, the developer needs to surround the benchmark loop with ResetTimer and/or StopTimer calls. These are easy to forget, and even when the developer remembers that they may be necessary, it can be difficult to judge whether setup or cleanup is “expensive enough” to require them.

Without these, the testing package can only time the entire benchmark function. If a benchmark function omits them, the setup and cleanup code will be included in the overall time measurement, silently skewing the final benchmark result.

There is another, more subtle pitfall that requires deeper understanding:

func isCond(b byte) bool {
  if b%3 == 1 && b%7 == 2 && b%17 == 11 && b%31 == 9 {
    return true
  }
  return false
}

func BenchmarkIsCondWrong(b *testing.B) {
  for range b.N {
    isCond(201)
  }
}

In this example, the user might observe isCond executing in sub-nanosecond time. CPUs are fast, but not that fast! This seemingly anomalous result stems from the fact that isCond is inlined, and since its result is never used, the compiler eliminates it as dead code. As a result, this benchmark doesn’t measure isCond at all; it measures how long it takes to do nothing. In this case, the sub-nanosecond result is a clear red flag, but in more complex benchmarks, partial dead-code elimination can lead to results that look reasonable but still aren’t measuring what was intended.

How testing.B.Loop helps

Unlike a b.N-style benchmark, testing.B.Loop can track when it is first called in a benchmark and when the final iteration ends. The b.ResetTimer at the loop’s start and b.StopTimer at its end are integrated into testing.B.Loop, eliminating the need to manually manage the benchmark timer for setup and cleanup code.

Furthermore, the Go compiler now detects loops where the condition is just a call to testing.B.Loop and prevents dead code elimination within the loop. In Go 1.24, this is implemented by disallowing inlining into the body of such a loop, but we plan to improve this in the future.
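Applied to the earlier isCond example, the b.Loop form needs no sink variable. The sketch below repeats isCond so that it is self-contained; the benchmark name BenchmarkIsCondLoop is an assumption for this illustration, and the code requires Go 1.24 or later.

```go
package main

import "testing"

func isCond(b byte) bool {
	if b%3 == 1 && b%7 == 2 && b%17 == 11 && b%31 == 9 {
		return true
	}
	return false
}

// BenchmarkIsCondLoop discards the result of isCond, just like the
// b.N-style version. Because the loop condition is exactly a call to
// b.Loop, the compiler keeps the call in the loop body alive.
func BenchmarkIsCondLoop(b *testing.B) {
	for b.Loop() {
		isCond(201)
	}
}
```

Note that the protection applies only when the loop condition is just the b.Loop call, as described above.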

Another nice feature of testing.B.Loop is its one-shot ramp-up approach. With a b.N-style benchmark, the testing package must call the benchmark function several times with different values of b.N, ramping up until the measured time reaches a threshold. In contrast, b.Loop can simply run the benchmark loop until it reaches the time threshold, and only needs to call the benchmark function once. Internally, b.Loop still uses a ramp-up process to amortize measurement overhead, but this is hidden from the caller and can be more efficient.

Certain constraints of the b.N-style loop still apply to the b.Loop-style loop. It remains the user’s responsibility to manage the timer within the benchmark loop, when necessary:

func BenchmarkSortInts(b *testing.B) {
  ints := make([]int, N)
  for b.Loop() {
    b.StopTimer()
    fillRandomInts(ints)
    b.StartTimer()
    slices.Sort(ints)
  }
}

In this example, to benchmark the in-place sorting performance of slices.Sort, a randomly initialized slice is required for each iteration. The user must still manually manage the timer in such cases.

Furthermore, there still needs to be exactly one such loop in the benchmark function body (a b.N-style loop cannot coexist with a b.Loop-style loop), and every iteration of the loop should do the same thing.

When to use

The testing.B.Loop method is now the preferred way to write benchmarks:

func Benchmark(b *testing.B) {
  ... setup ...
  for b.Loop() {
    // optional timer control for in-loop setup/cleanup
    ... code to measure ...
  }
  ... cleanup ...
}

testing.B.Loop offers faster, more accurate, and more intuitive benchmarking.

Acknowledgements

A huge thank you to everyone in the community who provided feedback on the proposal issue and reported bugs as this feature was released! I’m also grateful to Eli Bendersky for his helpful blog summaries. And finally a big thank you to Austin Clements, Cherry Mui and Michael Pratt for their review, thoughtful work on the design options and documentation improvements. Thank you all for your contributions!
