proposal: testing: add Keep, to force evaluation in benchmarks #61179

Open
aclements opened this issue Jul 5, 2023 · 66 comments

@aclements
Member

Benchmarks frequently need to prevent certain compiler optimizations that may optimize away parts of the code the programmer intends to benchmark. Usually, this comes up in two situations where the benchmark use of an API is slightly artificial compared to a “real” use of the API. The following example comes from @davecheney's 2013 blog post, How to write benchmarks in Go, and demonstrates both issues:

func BenchmarkFib10(b *testing.B) {
	// run the Fib function b.N times
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}
  1. Most commonly, the result of the function under test is not used because we only care about its timing. In the example, since Fib is a pure function, the compiler could optimize away the call completely. Indeed, in “real” code, the compiler would often be expected to do exactly this. But in benchmark code, we’re interested only in the side-effect of the function’s timing, which this optimization would destroy.

  2. An argument to the function under test may be unintentionally constant-folded into the function. In the example, even if we addressed the first issue, the compiler may compute Fib(10) entirely at compile time, again destroying the benchmark. This is more subtle because sometimes the intent is to benchmark a function with a particular constant-valued argument, and sometimes the constant argument is simply a placeholder.

There are ways around both of these, but they are difficult to use and tend to introduce overhead into the benchmark loop. For example, a common workaround is to add the result of the call to an accumulator. However, there’s not always a convenient accumulator type, this introduces some overhead into the loop, and the benchmark must then somehow ensure the accumulator itself doesn’t get optimized away.
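The accumulator workaround can be sketched as follows. This is a sketch, not part of the proposal: the sink variable is illustrative, and a fixed loop bound stands in for b.N so the program is self-contained.

```go
package main

import "fmt"

// Fib is the function under test from the example above.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// sink is a package-level accumulator. Adding each result to it keeps
// the compiler from proving the Fib calls dead, at the cost of an extra
// add per iteration; the benchmark must still somehow ensure sink
// itself is not optimized away.
var sink int

func main() {
	for n := 0; n < 1000; n++ { // stands in for the b.N loop
		sink += Fib(10)
	}
	fmt.Println(sink)
}
```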

In both cases, these optimizations can be partial, where part of the function under test is optimized away and part isn’t, as demonstrated in @eliben’s example. This is particularly subtle because it leads to timings that are incorrect but also not obviously wrong.

Proposal

I propose we add the following function to the testing package:

package testing

// Keep returns its argument. It ensures that its argument and result
// will be evaluated at run time and treated as non-constant.
// This is for use in benchmarks to prevent undesired compiler optimizations.
func Keep[T any](v T) T

(This proposal is an expanded and tweaked version of @randall77’s comment.)
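For intuition only, absent compiler support, Keep can be approximated in user code by a non-inlinable identity function. This is not the proposed implementation (the proposal expects the compiler to treat Keep specially, with no overhead); the go:noinline pragma merely makes the value opaque at the cost of a CALL/RET pair.

```go
package main

import "fmt"

// keep is a rough user-space stand-in for the proposed testing.Keep:
// an identity function the compiler may not inline, so its argument and
// result look opaque at the call site. Unlike the proposed intrinsic,
// every use costs a real function call.
//
//go:noinline
func keep[T any](v T) T { return v }

func main() {
	n := keep(10) // the compiler cannot constant-fold through keep
	fmt.Println(n * n)
}
```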

The Keep function can be used on the result of a function under test, on arguments, or even on the function itself. Using Keep, the corrected version of the example would be:

func BenchmarkFib10(b *testing.B) {
	// run the Fib function b.N times
	for n := 0; n < b.N; n++ {
		testing.Keep(Fib(testing.Keep(10)))
	}
}

(Or testing.Keep(Fib)(10), but this is subtle enough that I don’t think we should recommend this usage.)

Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.

Alternatives

  • Keep may not be the best name. This is essentially equivalent to Rust’s black_box, and we could call it testing.BlackBox. Other options include Opaque, NoOpt, Used, and Sink.

  • testing: document best practices for avoiding compiler optimizations in benchmarks #27400 asks for documentation of best practices for avoiding unwanted optimization. While we could document workarounds, the basic problem is Go doesn’t currently have a good way to write benchmarks that run afoul of compiler optimizations.

  • proposal: testing: a less error-prone API for benchmark iteration #48768 proposes testing.Iterate, which forces evaluation of all arguments and results of a function, in addition to abstracting away the b.N loop, which is another common benchmarking mistake. However, its heavy use of reflection would be difficult to make zero or even low overhead, and it lacks static type-safety. It also seems likely that users would often just pass a func() with the body of the benchmark, negating its benefits for argument and result evaluation.

  • runtime.KeepAlive can be used to force evaluation of the result of a function under test. However, this isn’t the intended use and it’s not clear how this might interact with future optimizations to KeepAlive. It also can’t be used for arguments because it doesn’t return anything. @cespare has some arguments against KeepAlive in this comment.
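For comparison, the KeepAlive workaround from the last bullet looks like the following; whether it reliably defeats dead-code elimination is exactly the concern raised there.

```go
package main

import (
	"fmt"
	"runtime"
)

// Fib is the function under test from the earlier example.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

func main() {
	// KeepAlive can mark the result as used...
	r := Fib(10)
	runtime.KeepAlive(r)

	// ...but it returns nothing, so unlike the proposed testing.Keep it
	// cannot wrap an argument: Fib(runtime.KeepAlive(10)) does not compile.
	fmt.Println(r)
}
```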

@gopherbot gopherbot added this to the Proposal milestone Jul 5, 2023
@eliben
Member

eliben commented Jul 5, 2023

Bikeshedding the name aside (Keep SGTM), I really like this proposal.

It's simple and sufficient. It doesn't prevent us from working on a different API like the proposed Iterate in the future.

@rsc
Contributor

rsc commented Jul 5, 2023

If we planned to do Iterate, we might not want to also do Keep. That said, I think the drawbacks listed above for Iterate are quite serious and we should simply not do it.

@aclements
Member Author

It doesn't prevent us from working on a different API like the proposed Iterate in the future.

FWIW, I'm planning to file another proposal for an API that covers just the looping aspect of Iterate and would complement Keep.

@earthboundkid
Contributor

Doing Iterate would cause a lot of churn as all the old benchmarks are rewritten to be in the new style. They would have different nesting depths, which make the diffs harder to ignore. Adding testing.Keep (or should it be b.KeepAlive()?) would cause minimal churn as just a bunch of globalSinks and runtime.KeepAlives would be replaced and only those specific lines would be affected.

@rsc
Contributor

rsc commented Jul 5, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@bcmills
Contributor

bcmills commented Jul 5, 2023

Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.

Note that that is also possible with the API proposed in #48768 by closing over the intentionally-constant values:

func BenchmarkFibConstant10(b *testing.B) {
	b.Iterate(func() int {
		return Fib(10)
	})
}

It seems to me that this proposal and #48768 are equally expressive, and the key difference is just whether constant-propagation is opt-in (#48768) or opt-out (this proposal).

@bcmills
Contributor

bcmills commented Jul 5, 2023

However, [Iterate's] heavy use of reflection would be difficult to make zero or even low overhead

As I have repeatedly stated on #48768, I believe that there are several viable ways to overcome that overhead.

I am becoming somewhat frustrated that #48768 (comment) in particular seems to have been ignored. I may not be on the Go compiler team, but I am well acquainted with compiler optimization techniques, and so far nobody has explained why those techniques would not apply in this case.

@bcmills
Contributor

bcmills commented Jul 5, 2023

and it lacks static type-safety.

While that is true, any type mismatch errors would be diagnosed immediately if the benchmark is ever actually run, and a similar lack of type safety was not seen as a significant barrier for the closely-related fuzz testing API (#44551).

@rsc
Contributor

rsc commented Jul 5, 2023

@bcmills, my experience over >25 frustrating years of trying to benchmark things is that, in general, attempting to subtract out per-loop overhead sounds good in theory, but in practice that overhead can and often does include various random noise. And the more overhead there is, the louder the noise. This means if you are trying to benchmark a very short operation, then subtracting out a (different) reflect.Call measurement is very likely to destroy the measurement, perhaps even making it negative. The best approach we have for getting the most reliable numbers we can is to introduce as little overhead as possible to begin with.

For the trivial loop for i := 0; i < b.N; i++, we just ignore the overhead of the i++, i < N entirely and include it as part of the thing being measured. This turns out to be far more accurate than trying to subtract it out.

@bcmills
Contributor

bcmills commented Jul 5, 2023

	testing.Keep(Fib(testing.Keep(10)))

From what I can tell, this would require N+1 calls to Keep in order to benchmark a function with N arguments. Although N is usually fairly small, that still seems like a very noisy call site for even a modest number of arguments.

@rsc
Contributor

rsc commented Jul 5, 2023

The main place where testing.Keep is needed is around the overall result. I write code to work around that all the time.
It is far less common to need to worry about making the arguments opaque. I can't remember ever doing that.

@rsc
Contributor

rsc commented Jul 5, 2023

I see now that you also mentioned making b.Iterate a compiler intrinsic. I suppose that is possible, but it seems very special-case. At that point it's basically a back-door language change, since either you can't do x := b.Iterate; x(Fib, 10) or it does something very different from b.Iterate(Fib, 10).

@bcmills
Contributor

bcmills commented Jul 5, 2023

It is far less common to need to worry about making the arguments opaque. I can't remember ever doing that.

I expect that that will become more common as the compiler gets better at inlining. That said, it is also more straightforward to work around (without new API) today, such as by alternating among multiple entries in a slice of inputs.
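The slice-of-inputs workaround mentioned here might look like the following sketch; the variable names and the fixed loop bound (standing in for b.N) are illustrative.

```go
package main

import "fmt"

// Fib is the function under test from the earlier example.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// inputs holds the argument in a slice so the compiler cannot easily
// prove which value reaches Fib, defeating constant folding at the cost
// of an index operation and bounds check per iteration.
var inputs = []int{10, 10, 10, 10}

var sink int

func main() {
	for i := 0; i < 8; i++ { // stands in for the b.N loop
		sink += Fib(inputs[i%len(inputs)])
	}
	fmt.Println(sink)
}
```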

@bcmills
Contributor

bcmills commented Jul 5, 2023

I agree that subtracting out the overhead from a naive implementation based on reflect.Call does not seem viable.

Making b.Iterate itself a compiler intrinsic is one possible alternative, although I agree that the implication for Iterate as a method-value is unfortunate.

I think probably the most promising approach is an implementation that sets up a stack frame with arguments and then repeatedly invokes the function starting from that frame. It isn't obvious to me whether the reflect.Caller API in #49340 is sufficient for that or if it would need some other hook, but even in that case the hook could be provided in internal/reflectlite or a similar internal package.
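To make the overhead discussion concrete, here is roughly what a naive reflect.Call-based implementation, the kind being ruled out above, would look like. The iterate function and its signature are illustrative, not the proposed API.

```go
package main

import (
	"fmt"
	"reflect"
)

// iterate is a naive sketch of Iterate built directly on reflect.Call.
// Every iteration pays for reflect's argument handling and run-time
// type checking; this per-iteration cost is the overhead discussed above.
func iterate(n int, f any, args ...any) {
	fv := reflect.ValueOf(f)
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	for i := 0; i < n; i++ {
		fv.Call(in)
	}
}

func main() {
	calls := 0
	iterate(3, func(x int) int { calls++; return x * x }, 7)
	fmt.Println(calls)
}
```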

@rsc
Contributor

rsc commented Jul 5, 2023

The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do. reflect.Caller would amortize the allocation but not the setup.

@aclements
Member Author

All good points, @bcmills.

From what I can tell, this would require N+1 calls to Keep in order to benchmark a function with N arguments. Although N is usually fairly small, that still seems like a very noisy call site for even a modest number of arguments.

I'm not sure if you're referring to "line noise" here (which, I agree, this does introduce a fair amount of line noise) or measurement noise. For the latter, a naive implementation of testing.Keep will introduce just a CALL/RET pair, and that we could easily optimize away either by making the compiler recognize no-op functions, or by making it recognize this particular function. Intrinsify-ing this function seems more palatable than intrinsify-ing Iterate, though that's just my opinion.

Making b.Iterate itself a compiler intrinsic is one possible alternative, although I agree that the implication for Iterate as a method-value is unfortunate.

Another possible option is that we make sure b.Iterate can be inlined and then teach the compiler how to eliminate a reflect.Call of a statically-known function. That feels less "special" than teaching it about b.Iterate. I'm not sure this is a good option, though, since it would also have to figure out the loop that sets up the reflect.Value arguments, and have to deal with the type-checking that reflect would be doing at run-time.

I'm not that concerned about people capturing b.Iterate (or any alternative) as a method value.

We already do some code generation for tests. Is there anything we could code-generate to help with this? We don't rewrite any test code right now, though, so this might be pushing that too far.

The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do.

Not to mention, I would expect most or all of the arguments to be passed in registers. We would certainly have to re-initialize those.

@bcmills
Contributor

bcmills commented Jul 5, 2023

What I had in mind is something like two functions: a (somewhat expensive) setup hook that checks types and copies function arguments from a slice into a more compact argument block, and a (cheap) “invoke” hook that initializes the call stack frame, copies arguments into registers, and calls the function.

The argument block might look something like:

+------------------------------------------------+
| pointer to GC map for argument block           |
+------------------------------------------------+
| function address                               |
+------------------------------------------------+
| closure address                                |
+------------------------------------------------+
| # of integer argument registers                |
+------------------------------------------------+
| # of FP argument registers                     |
+------------------------------------------------+
| spill + stack-assigned result area size        |
+------------------------------------------------+
| stack-assigned argument area size              |
+------------------------------------------------+
| integer register arguments                     |
|          ...                                   |
+------------------------------------------------+
| FP register arguments                          |
|          ...                                   |
+------------------------------------------------+
| stack-assigned arguments                       |
|          ...                                   |
+------------------------------------------------+

The implementation of iterate would be something like:

func (b *B) Iterate(f any, args ...any) {
	call := reflectlite.NewCall(f, args...)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		call.Invoke()
	}
}

where call.Invoke() is similar in cost to a non-inlined function call, but perhaps with a couple of extra jumps to deal with argument-register counts that aren't compile-time constants.

That seems like it might be easier than teaching the compiler to inline through reflect.Call proper, but still a lot less run-time overhead than reflect.Call. (And it still leaves the door open for the inliner to figure out call.Invoke without trying to reason its way through all of the type-checking code that precedes it.)

@rittneje

rittneje commented Jul 5, 2023

As a developer I would much prefer the compiler be taught not to optimize away calls within a _test.go file instead of me having to remember to write a bunch of wrapper calls. I didn't see that listed in the alternatives, so my apologies if that has been proposed previously.

@seebs
Contributor

seebs commented Jul 5, 2023

So I naively want the compiler not to optimize away things in a benchmark... but also some amount of the optimization happening would in fact be part of what the compiler would do running the code in reality, and thus, part of what I want to benchmark. The trick is distinguishing between optimizing-away the benchmark and optimizing-away some of the work inside the benchmark, which would also be optimized-away outside of the benchmark.

@go101

go101 commented Jul 6, 2023

Another alternative name for Keep is eval.

This function is not only useful for testing, but also for non-testing code.

@mateusz834
Member

I believe that another problem with testing.Iterate() would be escape analysis, right?
It would cause heap allocations when returning pointer types, so it might cause bad benchmarking results.

@bcmills
Contributor

bcmills commented Jul 12, 2023

It would cause heap allocations when returning pointer types, so it might cause bad benchmarking results.

The existing ABI is such that if a function that returns a pointer to a new object is not inlined into the caller, the object to which it points must be heap-allocated.
But that is part of the cost of calling the function; if you want to test how the function performs when it is fully inlined, you should benchmark an outer function that calls it in an inlineable way.
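The ABI point here can be observed directly with testing.AllocsPerRun. This demonstration is a sketch; the function name is illustrative.

```go
package main

import (
	"fmt"
	"testing"
)

// newInt returns a pointer to a new object. Marked noinline so the
// call is a real ABI-level call: the pointed-to int must then be
// heap-allocated, because the callee's stack frame dies on return.
//
//go:noinline
func newInt() *int {
	x := 42
	return &x
}

func main() {
	allocs := testing.AllocsPerRun(100, func() {
		_ = newInt()
	})
	fmt.Println(allocs) // at least one heap allocation per call
}
```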

@rsc
Contributor

rsc commented Jul 19, 2023

I am still concerned about the overhead of reflect in Iterate. We can't subtract it, and that means we can't reliably measure very short functions - which are the ones most likely to be affected by throwing away parts of the computation.

The compiler is going to be involved no matter what. What if it's more involved? Specifically, suppose we have a function that just does looping and takes func(), like

b.Loop(func() { Fib(10) })

or maybe

for range b.Loop {
   Fib(10)
}

and the compiler would recognize testing.B.Loop and apply Keep to every function and every argument in every call in that closure. We could still provide Keep separately and explain in the docs for Loop what the compiler is doing, in terms of Keep.

This would end up being like b.Iterate but (a) you get to write the actual calls, and (b) there is no reflect. On the other hand, the compiler does more work. But this is the Go compiler and the Go testing package; the compiler already knows about sync/atomic and math and other packages.

For that matter we could also recognize for i := 0; i < b.N; i++ { ... } and do the same to that loop body (it might still help to have something like Iterate or Loop though).

@aclements
Member Author

I've just filed #61515, which I consider closely related to and complementary to this proposal and the Iterate proposal. I suggest we keep discussions of trade-offs between Iterate, Keep, and Loop in this issue.

@cespare
Contributor

cespare commented Jul 22, 2023

To quote #61515:

  • b.Loop could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.

To be explicit, does that proposal include this compiler special-casing? Or is that proposal only for the API change and then in the future we will take a separate decision regarding special compilation of Loop?

You say that that proposal is complementary to this Keep proposal, but this specific change seems non-orthogonal. If we decide to do the compiler special-casing, that seems like it should bear on our decision about whether to expose Keep to the user at all.

@rsc
Contributor

rsc commented Jul 24, 2023

We are considering Keep and Loop-with-implicit-Keep together.
(And the place for that combined discussion is this issue, not #61515.)

If we do the special casing, then we basically have to expose Keep too,
so that we can explain what the special casing does and provide a way
for users to have that same power themselves.

@cespare
Contributor

cespare commented Aug 31, 2023

@willfaught FWIW I have a ton of code that has complex benchmarks like

func BenchmarkFoo(b *testing.B) {
	for _, bb := range []struct{
		name string
		/* lots of testing parameters */
	} {
		{ /* test case 1 */ },
		// ...
	} {
		// lots of setup code
		b.Run(bb.name, func(b *testing.B) {
			benchFoo(b, bb.x, bb.y, some, other, params)
		})
	}
}

func benchFoo(b *testing.B, x, y, z int) {
	// ...
	for i := 0; i < b.N; i++ {
		// ...
	}
}

IOW, my experience lines up with what @aclements said: it's common that the b.N loop isn't lexically inside the BenchmarkXxx function, but the thing I want to measure is always lexically inside the b.N loop.

This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go?

A function approach seems more consistent with how Go does things:
[...]
where the compiler is free to disable the appropriate optimizations inside the func literal for b.Do.

Your b.Do just sounds like b.Loop from #61515, which is I guess part of this proposal as well now. Loop is described and implemented in terms of Keep. So it sounds like what you said boils down to you would like Loop but without exposing Keep. Is that right?

If so, then I guess I'd point you at @rsc's comment when I asked about this:

If we do the special casing, then we basically have to expose Keep too,
so that we can explain what the special casing does and provide a way
for users to have that same power themselves.

@aclements
Member Author

aclements commented Aug 31, 2023

@aclements I don't follow. What does it mean for the loop to be factored out of the benchmark func? Which part of the benchmark func do you mean by "actual body of the benchmark"?

I mean that the b.N loop often doesn't appear directly inside a Benchmark function, like in this example:

func Benchmark(b *testing.B) {
    b.Run("1", func(b *testing.B) { f(b, 1) })
    b.Run("2", func(b *testing.B) { f(b, 2) })
}

func f(b *testing.B, arg int) {
    for i := 0; i < b.N; i++ {
        // .. do something ..
    }
}

Factoring the b.N loop out of Benchmark and into a helper function would change the behavior of this benchmark if we apply "implicit Keep" only to Benchmark functions. If instead we apply implicit Keep to b.Loop, this sort of refactoring has no effect on the benchmark results.

Of course, it's also possible that the b.N loop is in a Benchmark function but the body of it gets factored into another function. In that case, neither recognizing Benchmark functions nor recognizing b.Loop will help.

Another option would be that the compiler recognizes a loop over testing.B.N and deoptimizes that, wherever it appears. To me, that feels more magic than deoptimizing a b.Loop loop, but also deoptimizing b.Loop loops doesn't preclude deoptimizing testing.B.N loops. We could do both.

@rsc is planning to gather some data on how often the b.N loop is factored out of the Benchmark function. He had to analyze through this for the vet check from #38677, and said that it seemed to be pretty common, but he'll get hard data on that.

(Haha, looks like @cespare beat me to this point by 2 minutes. 😄)

Go benchmarks don't have a history that the Go tool compares results against

I believe comparing across time is one of the most common uses of benchstat. We regularly get reports of things that have slowed down from one release to another. On the Go team, we certainly do this all the time with https://perf.golang.org/dashboard/.

I'm surprised as well. This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go?

There isn't precedent in Go for any of these approaches. I could see an argument for not doing any implicit Keep/benchmark deoptimization because any such approach is too implicit, but I think that's not the argument you're making.

A function approach seems more consistent with how Go does things

This is what I originally proposed in #61515. However, it's harder to eliminate the overhead of that for very short benchmarks (certainly not impossible, but it requires more inlining and more complex inlining). This also opens the possibility of passing something that isn't just a function literal, in which case we definitely wouldn't be able to deoptimize the loop body.

@go101

go101 commented Aug 31, 2023

It is somewhat weird that the Keep function is added to the testing package. At the least, it should be put in the runtime package.
Building it in is even better.

@ianlancetaylor
Contributor

The point of testing.Keep is to force evaluation in a benchmark. It's not expected to have much application outside of benchmarking code. So there doesn't seem to be an obvious reason to put it in the runtime package.

@go101

go101 commented Sep 1, 2023

What is the effect of calling testing.Keep in non-test files? Same as in test files or not?

@ianlancetaylor
Contributor

It should be the same whether it is in a test file or not.

@go101

go101 commented Sep 2, 2023

So it is a general-purpose function in syntax and semantics, but a testing-specific function in intent. Not a big problem, though.

@willfaught
Contributor

willfaught commented Sep 3, 2023

@cespare @aclements Thanks for the explanations.

Your b.Do just sounds like b.Loop from #61515, which is I guess part of this proposal as well now. Loop is described and implemented in terms of Keep.

I agree it's the same as #61515, although the Loop in this proposal seems to be different, not taking a function, and returning a boolean.

Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.

So it sounds like what you said boils down to you would like Loop but without exposing Keep. Is that right?

Yes, assuming you mean the Loop from #61515, and not the Loop here, as explained just above.

If so, then I guess I'd point you at #61179 (comment) when I asked about this:

If we do the special casing, then we basically have to expose Keep too, so that we can explain what the special casing does and provide a way for users to have that same power themselves.

Why do we need to expose Keep to explain what Loop is doing? I don't see why we can't explain what Loop does in the same way, e.g. "All function values, all arguments, and all function results are forced to be evaluated etc etc etc..." Why do users need this general power?

What if we limit the disabled optimizations to just func literals that are assigned to a new testing package type, type BenchFunc func(), with func (*B) Loop(BenchFunc)? Then

func Benchmark1(b *testing.B) {
    b.Loop(func() { Fib(10) }) //  Not optimized
}

func Benchmark2(b *testing.B) {
    var notOptimized testing.BenchFunc = func() { Fib(10) }
    var optimized func() = func() { Fib(10) }
    b.Loop(notOptimized)
    b.Loop(testing.BenchFunc(optimized))
}

work as expected. @cespare's example would be

func BenchmarkFoo(b *testing.B) {
	for _, bb := range []struct{
		name string
		/* lots of testing parameters */
	} {
		{ /* test case 1 */ },
		// ...
	} {
		// lots of setup code
		b.Run(bb.name, func(b *testing.B) {
			benchFoo(b, bb.x, bb.y, some, other, params)
		})
	}
}

func benchFoo(b *testing.B, x, y, z int) {
	// ...
	b.Loop(func() {
		Foo(x, y, z)
	})
}

@aclements
Member Author

Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.

The #61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this. It may be that the way I wrote the update to #61515 wasn't clear enough, so I've tried to rewrite it.

Why do we need to expose Keep to explain what Loop is doing?

You're right that we can explain how Loop deoptimizes without exposing Keep. However, not exposing Keep limits refactoring opportunities, and also makes it impossible to write examples that allow partial optimization like in @bcmills' comment. Granted, we expect both of these situations to be rare.

What if we limit the disabled optimizations to just func literals that are assigned to a new package testing type type BenchFunc func(), with func (*B) Loop(BenchFunc).

This seems strictly more complicated to me.

Earlier you argued that "A function approach seems more consistent with how Go does things", but I'm not sure I agree with that. Go APIs tend not to reach for closures when simpler and more direct constructs will do. For example, for b.Loop() { ... } makes it clear even if you don't know what b.Loop is that we're going to execute the code in the body of the loop some number of times, and it can't involve any potentially complicated capture or scoping. Passing a closure, on the other hand, enforces no constraints on how or when that closure may be invoked, or the capture behavior of state used by that closure.

@aclements
Member Author

examples that allow partial optimization like in @bcmills' comment

Oops, I guess his example doesn't technically show partial optimization since there's only one argument to the function under test. Partial optimization would mix one (or more) arguments passed in the b.Loop body to the intermediate closure with one (or more) arguments passed directly from the intermediate closure. The former would not be constant-propagated, while the latter could be.

@willfaught
Contributor

The #61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this.

@aclements Adding an hr divider at the end, followed by an "Edit: Changed to [...], see these comments [...]" line(s) would be sufficient. This is comparable to the practice of reserving the first comment for FAQs and proposal updates that I've seen the Go team use elsewhere recently, which worked well.

However, not exposing Keep limits refactoring opportunities

Can you demonstrate an example using Keep?

Oops, I guess his example doesn't technically show partial optimization since there's only one argument to the function under test. Partial optimization would mix one (or more) argument passed in the b.Loop body to the intermediate closure with or (or more) argument passed directly from the intermediate closure. The former would not be constant-propagated, while the latter could be.

Can you demonstrate an example using Keep?

@rsc
Contributor

rsc commented Sep 6, 2023

The auto-Keep during b.Loop could be applied the same in any loop that counts from 0 to b.N where b has type *testing.B. Then b.Loop is less special - the compiler just applies it to both kinds of benchmark loops.

@rsc
Contributor

rsc commented Sep 20, 2023

If we make Keep auto-apply inside b.N loops, then b.Loop is no longer special, and converting a b.N loop to a b.Loop loop is not a performance change at all, so it seems like we should make Keep auto-apply inside b.N loops. That will also fix a very large number of Go microbenchmarks, although it will look like it slowed them down.

That leaves the question of whether to auto-Keep at all. If we don't, we leave a big mistake that users, even experienced ones, will make over and over. The compiler can do the right thing for us here.

Maybe it would help if someone could try a compiler implementation and see how difficult this is.

@aclements
Member Author

However, not exposing Keep limits refactoring opportunities

Can you demonstrate an example using Keep?

Suppose you have a function like

func Benchmark(b *testing.B) {
    for b.Loop() {
        ... complicated body ...
    }
}

In this form, we're proposing that we automatically apply Keep to "complicated body". However, if you were to refactor out parts of the complicated body into a new function:

func Benchmark(b *testing.B) {
    for b.Loop() {
        ... less complicated body, calls helper ...
    }
}

func helper() {
    ... parts of complicated body ...
}

The code in helper no longer gets Keep applied automatically, so the programmer would need to call Keep themselves.
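A minimal sketch of what that manual call might look like. Since testing.Keep does not exist yet, this defines a local generic `keep` as a stand-in, using `//go:noinline` as a rough approximation of the optimization barrier; `compute` and `helper` are hypothetical names for the refactored-out code.

```go
package main

import "testing"

// keep is a local stand-in for the proposed testing.Keep. The //go:noinline
// directive is only a rough approximation of the barrier Keep would provide.
//
//go:noinline
func keep[T any](v T) T { return v }

// compute stands in for a piece of the refactored-out "complicated body".
func compute(x int) int { return x * x }

// helper is outside the benchmark loop body, so auto-Keep would not reach
// it; the programmer keeps the argument and the result explicitly.
func helper(x int) {
	keep(compute(keep(x)))
}

func Benchmark(b *testing.B) {
	for n := 0; n < b.N; n++ { // standing in for `for b.Loop()`
		helper(10)
	}
}
```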

Partial optimization would mix one or more arguments passed in the b.Loop body to the intermediate closure with one or more arguments passed directly from the intermediate closure.

Can you demonstrate an example using Keep?

I guess you don't actually need Keep to express partial optimization. Here's an example of partial optimization, where one argument to the benchmarked function can be constant-propagated, but the other is treated as a variable even though it's a constant.

func BenchmarkPowPartialConstant(b *testing.B) {
	xPow20 := func(a float64) float64 {
		return math.Pow(a, 20)
	}

	for b.Loop() {
		// Compute 5 ** 20. This allows the '20' to be constant-propagated,
		// but treats the '5' as a variable.
		xPow20(5)
	}
}

Another way to express this does use Keep:

func BenchmarkPowPartialConstant(b *testing.B) {
	xPow := func() float64 {
		return math.Pow(testing.Keep(5), 20)
	}

	for b.Loop() {
		// Compute 5 ** 20. This is defined separately to avoid auto-Keep
		// so the '20' can be constant-propagated.
		xPow()
	}
}

In both of these, the closure returns the result of math.Pow, so the auto-Keep in the loop keeps the result. Otherwise, the call to math.Pow could be eliminated entirely, since it's pure.

@meling

meling commented Oct 5, 2023

Here is an ergonomic thing that doesn't seem to have been considered in the above discussion. When the benchmarked function has multiple input parameters and return values (of different types):

for i := 0; i < b.N; i++ {
	bb, _ = bbhash.New(gamma, partitions, keys)
}

It would be nice to be able to write this...

for i := 0; i < b.N; i++ {
	testing.Keep(bbhash.New(gamma, partitions, keys))
}

However, since the return types may be different, the proposed Keep function won't work for the above scenario.

A variadic variant like the one below would work for the return types:

func Keep(v ...any) []any {
	return v
}

But it doesn't work for the input arguments if one wants to avoid testing.Keep for each argument. Maybe that's not a relevant concern?

@aclements
Member Author

@meling , that's an interesting point. Unfortunately, while I agree that makes Keep a little more ergonomic for that one case, it seems to make it less ergonomic for any case that needs to use the result of Keep, which includes using it on arguments to functions and using the return value from functions.

For example, to fully write out your example with the signature you proposed, you'd need to write:

for i := 0; i < b.N; i++ {
	r := testing.Keep(bbhash.New(gamma, partitions, keys))
	bb = r[0].(something)
}

That type assertion will also add a little benchmarking overhead.

With the original proposed signature, you would write this as:

for i := 0; i < b.N; i++ {
	bb, _ = bbhash.New(gamma, partitions, keys)
	testing.Keep(bb)
}

That doesn't seem very bad to me.

Also, the idea with auto-Keep is that in these cases you wouldn't have to write Keep at all.

@meling

meling commented Oct 9, 2023

Yeah, I agree. testing.Keep(bb) is definitely reasonable.

The main reason I brought it up was that it didn't seem like it was discussed. Moreover, I had some initial idea that perhaps it would be valuable to have some form of "type matching" (for lack of a better term):

func Keep[T any{0,3}](v T) T

Where the T would match 0-3 input and corresponding output types, which could be different types (not sure this notation makes that clear enough, though). Something like this could also be helpful for the iter.Seq definition being considered.

However, I removed it since I didn't want to distract the first message with such a "controversial" proposal 😅... Don't know if this is worth writing up as a separate proposal.

@rsc
Contributor

rsc commented Oct 11, 2023

Placed on hold.
— rsc for the proposal review group

@aclements
Member Author

To be clear, this is on hold pending an experimental implementation, per #61179 (comment)

@randall77
Contributor

Just to loop back here, the proposal is to both:

  1. Add a function Keep to the testing package
  2. Auto-add calls to Keep in some situations

Number 1 seems pretty easy.
Number 2 could be pretty difficult. This proposal is on hold for someone to prototype 2.

The question is, would we want to do 1 without also doing 2? Particularly, I think we could do 1 for 1.22 if we want. Getting 2 in for 1.22 looks more speculative.

But then there is a problem: having 1 and not 2 will encourage people to add lots of explicit testing.Keep calls that are unnecessary once 2 lands (because they would have been auto-added). So it sounds like a recipe for unnecessary churn. Hence, my thought is we hold on 1 as well until 2 is ready.

@eliben
Member

eliben commented Oct 24, 2023

But then there is a problem: having 1 and not 2 will encourage people to add lots of explicit testing.Keep calls that are unnecessary once 2 lands (because they would have been auto-added). So it sounds like a recipe for unnecessary churn. Hence, my thought is we hold on 1 as well until 2 is ready.

Isn't this the case for runtime.KeepAlive already today? Benchmarks can call it or just use custom sink solutions to thwart optimizations.

IMHO we can do (1) first, in 1.22. Whatever calls to testing.Keep people will add is likely to fix many benchmarks, which is a positive development overall. Unless we plan to do (2) without doing (1) at all (exposing it to users), I vote for (1) in 1.22

@randall77
Contributor

Isn't this the case for runtime.KeepAlive already today?

It is for function results, true. It can't be used to solve the function arg problem.
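For reference, the workarounds in use today look roughly like this: the package-level sink pattern and runtime.KeepAlive, both applied to the result. `fib` is a stand-in function under test; note that neither mechanism stops the compiler from constant-folding the argument into the call.

```go
package main

import (
	"runtime"
	"testing"
)

// sink is the classic package-level escape hatch: assigning a result to it
// forces the compiler to materialize the value.
var sink int

func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func BenchmarkFib10(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = fib(10)
	}
	// Both lines below keep the *result* of the call; neither prevents
	// the compiler from constant-folding the argument 10 into fib.
	sink = r
	runtime.KeepAlive(r)
}
```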

@cespare
Contributor

cespare commented Oct 24, 2023

I'm concerned about losing momentum on this. Is there someone who is planning on doing the prototype work for auto-Keep?

If it is going to be years before we get auto-Keep, I'd much rather we decouple these and land Keep now. Then we at least have a path forward to eliminate the hacky workarounds (KeepAlive, sinks) while we wait for the better thing which may or may not happen.

@eliben
Member

eliben commented Oct 24, 2023

Isn't this the case for runtime.KeepAlive already today?

It is for function results, true. It can't be used to solve the function arg problem.

Yes, my argument isn't that runtime.KeepAlive gives us what we want; on the contrary, I'm arguing that the problem of "people will just splat Keep all over the place" already exists today with runtime.KeepAlive, so I don't think this is a good reason to delay shipping testing.Keep as soon as we can.

@rsc
Contributor

rsc commented Nov 2, 2023

@eliben and @cespare, I hear your concern, but there are only a few weeks left in the Go 1.22 cycle, Austin is away, and we don't have consensus on the actual design yet, nor an implementation. This will have to wait for Go 1.23.
