
cmd/go: decide, document standard benchmark data format #14313

Open
rsc opened this issue Feb 12, 2016 · 33 comments
@rsc
Contributor

rsc commented Feb 12, 2016

We propose to make the current output of go test -bench the defined format for recording all Go benchmark data. Having a defined format allows benchmark measurement programs and benchmark analysis programs to interoperate while evolving independently.

See golang.org/design/14313-benchmark-format for details.
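For illustration, a data file in this format consists of configuration lines ("key: value") and benchmark result lines, for example (the package: line here is only a configuration key suggested later in this thread, not something go test emits today):

package: encoding/base64
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s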

@rsc rsc added this to the Proposal milestone Feb 12, 2016
@rsc rsc added the Proposal label Feb 12, 2016
gopherbot pushed a commit to golang/proposal that referenced this issue Feb 12, 2016
For golang/go#14313.

Change-Id: Ib9483714bbd004ff2be6cfa0d6e730d2d7f5da42
Reviewed-on: https://go-review.googlesource.com/19490
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@btracey
Contributor

btracey commented Feb 12, 2016

Why have the key start with a lowercase letter rather than an uppercase one? Wouldn't an uppercase first letter make it easier to write Go code that unmarshals the configuration (for example, avoiding the need for struct tags, as with JSON)?
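For illustration (the key names below are hypothetical), lowercase keys force struct tags when unmarshaling into a Go struct, since exported fields must begin with an uppercase letter; by analogy with encoding/json:

    // Hypothetical example: because the configuration keys are lowercase
    // and exported Go fields must start with an uppercase letter, each
    // field needs a tag (shown here with encoding/json by analogy).
    type Config struct {
        Commit string `json:"commit"`
        Goos   string `json:"goos"`
        Goarch string `json:"goarch"`
    }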

@nigeltao
Contributor

A line from the example output:

BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op

Is it possible to change "100" to "100 iters" or "100 runs" so that every number also has a unit? Or would that be backwards incompatible?

@aclements
Member

@nigeltao, unfortunately that would be backwards incompatible.

@cespare
Contributor

cespare commented Feb 12, 2016

What's the reasoning for this restriction?

Tooling can expect that the unit strings are identical for all runs to be compared; for example, a result reporting “ns/op” need not be considered comparable to one reporting “µs/op.”

It doesn't seem like it makes the job of a parser author harder to just drop this restriction (and generation code can pick an appropriate unit per line, if desired).

@aclements
Member

@cespare, in the general case dropping that restriction would make the job of the parser much harder, if not impossible. Since we're proposing that benchmarks can emit custom metrics, that would require the parser to know how to conform arbitrary scalings of arbitrary units. It would also require that we specify much more precisely the syntax of units. Currently we've specified the unit syntax enough to make "best effort" rescaling possible in benchmark processors, but even if a processor doesn't understand a unit, it can at least expect it to be spelled consistently so it can match up results.
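As a rough sketch of what such best-effort rescaling might look like for one unit a processor does understand (this is only an illustration, not code from benchstat):

    package main

    import "fmt"

    // rescaleTime rescales a value reported in ns/op to a larger time unit
    // for display. Units the processor does not recognize are passed
    // through unchanged; because they are spelled consistently, results
    // can still be matched up even without rescaling.
    func rescaleTime(value float64, unit string) (float64, string) {
        if unit != "ns/op" {
            return value, unit
        }
        switch {
        case value >= 1e9:
            return value / 1e9, "s/op"
        case value >= 1e6:
            return value / 1e6, "ms/op"
        case value >= 1e3:
            return value / 1e3, "µs/op"
        }
        return value, unit
    }

    func main() {
        fmt.Println(rescaleTime(154125, "ns/op")) // prints the value rescaled to µs/op
    }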

@cespare
Contributor

cespare commented Feb 13, 2016

@aclements got it, thanks.

@cespare
Contributor

cespare commented Feb 13, 2016

This proposal adds one new kind of line: configuration. What do you think about also standardizing a column header line format? Right now the format doesn't provide any mechanism for carrying the context around with it, so a line like this

BenchmarkDecodeDigitsSpeed1e4-8          100        154125 ns/op      64.88 MB/s       40418 B/op          7 allocs/op

only makes sense given the current benchmark conventions.

Now, given that lines other than configuration lines and benchmark result lines are ignored, it's possible for someone to make up their own format for column headers. But might it be better to settle on a single convention from the start?

(A slightly different convention would be to decide on a well-known configuration key (columns?) which specifies the column names.)
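For illustration, such a line might look like this (the key name and column labels are hypothetical, not part of the proposal):

columns: iterations time/op throughput bytes-allocated/op allocs/op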

@cespare
Contributor

cespare commented Feb 13, 2016

By the way, I commented on a few typos in the CL (https://go-review.googlesource.com/#/c/19490/) after it was already submitted.

@ulikunitz
Contributor

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

@ulikunitz
Contributor

I suggest requiring that all lines beginning with a # character be ignored.

Currently

#foo: bar

is a valid configuration line. It cannot be ignored because of the word other in

All other lines in the data file, including but not limited to blank lines and lines beginning with a # character, are ignored.

@josharian
Contributor

LGTM overall. Seems very useful.

Some nits and questions:

space characters separate “key:” from “value.”

0 or more space characters, or 1 or more space characters?

Value can be omitted entirely but the colon must still be present.

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

The first field is the benchmark name, which must begin with Benchmark and is typically followed by a capital letter, as in BenchmarkReverseString. Tools displaying benchmark data conventionally omit the Benchmark prefix.

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

The same benchmark name can appear on multiple result lines, indicating that the benchmark was run multiple times.

This is a mismatch with running benchmarks over multiple packages right now. Using tip:

$ go test -bench=. encoding/base32 encoding/base64
PASS
BenchmarkEncodeToString-8      30000         42033 ns/op     194.89 MB/s
BenchmarkDecodeString-8        10000        182128 ns/op      71.99 MB/s
ok      encoding/base32 3.593s
PASS
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s
ok      encoding/base64 5.487s

According to the proposal, this would be interpreted incorrectly as two benchmarks run twice, rather than four benchmarks run once.

If this proposal were adopted, however, we could add to the output a config line to disambiguate, perhaps like:

$ go test -bench=. encoding/base32 encoding/base64
PASS
package: encoding/base32
BenchmarkEncodeToString-8      30000         42033 ns/op     194.89 MB/s
BenchmarkDecodeString-8        10000        182128 ns/op      71.99 MB/s
ok      encoding/base32 3.593s
PASS
package: encoding/base64
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s
ok      encoding/base64 5.487s

The main known issue with the current go test -bench is that we'd like to emit finer-grained detail about runs, for linearity testing and more robust statistics. This proposal allows that by simply printing more result lines.

Link to issue 10669.

We anticipate that the new x/perf subrepo will include a library for loading benchmark data from files

Will https://godoc.org/golang.org/x/tools/benchmark/parse be updated? Deprecated? Replaced?

@rsc
Contributor Author

rsc commented Feb 16, 2016

Thanks for the comments.

@ulikunitz:

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

Sure.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

I suggest requiring that all lines beginning with a # character be ignored.

Nice catch. Thank you.

@rsc
Contributor Author

rsc commented Feb 16, 2016

@josharian:

0 or more space characters, or 1 or more space characters?

1 or more ("x:y: 12" sets "x:y" = "12").

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

Yes.
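A minimal sketch of a parser consistent with that reading, in which the key ends at the first colon followed by a space or the end of the line (the function name and structure are mine, not from the proposal):

    package main

    import (
        "fmt"
        "strings"
    )

    // splitConfigLine splits a configuration line into key and value.
    // The key may itself contain colons; it ends at the first colon
    // that is followed by a space or the end of the line, and one or
    // more spaces separate the colon from the value.
    func splitConfigLine(line string) (key, value string, ok bool) {
        for i := 0; i < len(line); i++ {
            if line[i] != ':' {
                continue
            }
            if i+1 == len(line) || line[i+1] == ' ' {
                return line[:i], strings.TrimLeft(line[i+1:], " "), true
            }
        }
        return "", "", false
    }

    func main() {
        k, v, _ := splitConfigLine("x:y: 12")
        fmt.Printf("%q = %q\n", k, v) // "x:y" = "12"
    }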

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

Interesting corner case, thanks. Actually, the go command allows plain Benchmark; the tool will have to rename it to something. The go command does require that if something follows "Benchmark", that something must not be a lowercase letter: "Benchmarked" is not a benchmark.

If this proposal were adopted, however, we could add to the output a config line to disambiguate,

Nice.

Will https://godoc.org/golang.org/x/tools/benchmark/parse be updated? Deprecated? Replaced?

I expect deprecated in favor of a new parser in x/perf (see #14304). The code in package parse is very tied to benchcmp's view of the world (unsurprising since it was basically an mv from cmd/benchcmp). Worse, benchcmp's view of the world - which I accept full blame for - is fundamentally flawed, since it doesn't understand the idea of running a benchmark multiple times and therefore cannot do any statistical validity checks. A general parsing library should really provide something closer to the raw input, leaving analysis to the code invoking the parser. For benchstat, I started by using package parse and trying to recook the results into the form I wanted, but I found that it was easier to parse the raw data from scratch.
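For concreteness, the kind of "raw" record such a library might hand back could look roughly like this (the type and field names are hypothetical, not the eventual x/perf API):

    // Value is a single measured quantity from a benchmark result line,
    // such as 154125 "ns/op".
    type Value struct {
        Value float64
        Unit  string
    }

    // Result is one benchmark result line, kept close to the raw input;
    // aggregation and statistics are left to the code invoking the parser.
    type Result struct {
        Name       string            // full benchmark name, e.g. "BenchmarkDecodeDigitsSpeed1e4-8"
        Iterations int               // the iteration-count column
        Values     []Value           // the remaining value/unit pairs, in order
        Config     map[string]string // configuration lines in effect for this result
    }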

For that matter, benchstat will move to x/perf, but probably, due to the statistical failings, benchcmp will be left behind, deprecated, and eventually removed. Not for a while though, and somewhat off-topic for the current proposal.

@aclements
Member

Thanks for the comments! I believe I've addressed all of them in https://go-review.googlesource.com/21030.

I suggest requiring that all lines beginning with a # character be ignored.

Good catch. I made the rule a bit stricter than you proposed by requiring that the first character satisfy unicode.IsLower(). If there's a compelling case to weaken this, we can do so in the future.
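A tiny sketch of that stricter check (again, only an illustration, not the spec's wording):

    package main

    import (
        "fmt"
        "unicode"
        "unicode/utf8"
    )

    // mayBeConfigLine reports whether a line is even a candidate for a
    // configuration line: its first rune must be a lowercase letter, so
    // lines beginning with "#" (and blank lines) are ignored rather than parsed.
    func mayBeConfigLine(line string) bool {
        r, _ := utf8.DecodeRuneInString(line)
        return unicode.IsLower(r)
    }

    func main() {
        fmt.Println(mayBeConfigLine("key: value")) // true
        fmt.Println(mayBeConfigLine("#foo: bar"))  // false
    }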

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

There are useful units with zero slashes, such as binary size in a compiler benchmark and median GC utilization (though you could argue these are "per iteration"). There are also useful units with multiple slashes, such as bytes/CPU/sec. You could represent this as bytes/(CPU*sec), but this goes one step deeper into the rabbit hole of specifying unit syntax, which we're trying to avoid at least for now. I've been using some of these more complex units in the GC benchmark suite I've been putting together (see metrics.go).

@aclements
Member

This proposal adds one new kind of line: configuration. What do you think about also standardizing a column header line format?

I'm not sure I understand what you're proposing. What would be in the column header line format that isn't already in the benchmark line? Could you give a concrete example?

@cespare
Contributor

cespare commented Mar 22, 2016

@aclements the meaning of the columns isn't given in the benchmark line. The unit alone does not tell you what is being measured. A header line would give this information:

name                                     iters      time/iter         throughput       allocs              allocs
BenchmarkDecodeDigitsSpeed1e4-8          100        154125 ns/op      64.88 MB/s       40418 B/op          7 allocs/op

(Apologies for the lousy names.)

It's less relevant today because Go benchmarks have a known set (and ordering) of columns, but your format says that this is also a valid benchmark line:

BenchmarkFoo     1000        12.34 MB/s

and without more context one cannot know what the 12.34 MB/s represents. (Compression rate? alloc rate? network throughput?)

Anyway, as I mentioned, there's plenty of leeway in the specification for anyone to create their own header convention; I was just wondering if it makes sense to prescribe one from the outset.

@aclements
Member

@cespare, interesting. I think I understand. This is the difference between dimensions and units: one tells you what's being measured (e.g., "time"), while the other relates a value to a specific physical quantity (e.g., "minutes" and "seconds").

I've actually been fighting with this in a general unit parser/renormalizer I wrote for benchstat, since benchstat wants to show a dimension for its own column headers, and values with nicely (and differently) scaled units in the rows. Specifying it explicitly feels almost unnecessary because you can usually derive the dimension from the unit, but you're right that some of the units used by go test are quite vague. Arguably, these units are bad: any two numbers in the same or conformable units should be comparable, but "B/op" could mean any number of different things that aren't necessarily comparable. It might be possible to always derive reasonable dimensions from "better" units, such as "bytes allocated/op".

So, another solution is to say that we know what ns/op, MB/s, B/op, and allocs/op mean, but any custom metrics have to have "better" units.

@josharian
Contributor

I just hit this myself. rsc/benchstat#5

@aclements
Member

I just encountered one annoying problem with this format. If there are no tests (such as in the go1 benchmarks), the testing package prints "testing: warning: no tests to run", which this format interprets as a configuration line with key "testing" and value "warning: no tests to run".

@rsc
Contributor Author

rsc commented Apr 21, 2016

@mpvl can we change that print to use the SKIP form you had suggested?

@minux
Member

minux commented Apr 21, 2016 via email

Or we can specifically reserve and ignore the "testing" configuration key, which is reasonable given that "testing" is the package name.

@davecheney
Contributor

The 'testing ...' line is written to stderr. It should not be present in
stdout.

On Thu, Apr 21, 2016 at 3:59 PM, Minux Ma notifications@github.com wrote:

Or we can specifically reserve and ignore the "testing" configuration key,
which is reasonable given that "testing" is the package name.



@aclements
Member

The 'testing ...' line is written to stderr. It should not be present in stdout.

True, but I usually capture both when I'm logging benchmarks because I want to see the errors in the log.

@davecheney
Contributor

I'm sorry, this was my misunderstanding. I thought you were piping the benchmark output to a file, and that the testing: line was throwing off a tool that post-processed that file.

On Fri, 22 Apr 2016, 00:19 Austin Clements, notifications@github.com
wrote:

The 'testing ...' line is written to stderr. It should not be present in
stdout.

True, but I usually capture both when I'm logging benchmarks because I
want to see the errors in the log.



@bradfitz
Contributor

Closing this as a dup of #2981 (JSON as the standard format?), because both would likely be resolved regardless of what is decided.

Feel free to reopen this if you feel otherwise.

@aclements
Member

@bradfitz, to be clear, the currently proposed JSON format doesn't capture most of what's interesting about this proposal (custom metrics and benchmark metadata), and according to #2981 (comment), it seems these things are out of scope for the current JSON proposal.

Not that there's any harm in closing this issue. We're already using this format. But it's not a dup of #2981 since it covers different ground.

@josharian
Contributor

I agree—not a dup. There's a lot happening here beyond mere encoding. And it's not clear to me how the JSON proposal handles the header lines discussed here, which are not tied to specific benchmark runs. Re-opening.

@bradfitz
Contributor

Then I'm marking this as "Proposal-Accepted" so it doesn't come up in future proposal review searches. This is effectively accepted because you're all using it anyway. I still think there's value in a standard format like JSON, but that discussion can continue in #2981.

We can keep this bug open to document the design proposal here.

@bradfitz bradfitz changed the title proposal: standard benchmark data format cmd/go: decide, document standard benchmark data format Aug 25, 2016
@bradfitz bradfitz modified the milestones: Unplanned, Proposal Aug 25, 2016
@gopherbot

CL https://golang.org/cl/33570 mentions this issue.

gopherbot pushed a commit to golang/proposal that referenced this issue Nov 29, 2016
The benchmark format proposal was written in parallel with the
sub-benchmarks support and the two didn't quite agree on the format of
benchmark names.

This commit makes two changes to the benchmark format document so that
it agrees with the practice and implementation of sub-benchmarks in
the testing package:

* It replaces key:value in the benchmark name with key=value, which is
  the format suggested by the sub-benchmarks documentation.

* It removes the special rule about reporting gomaxprocs as a special
  key=value pair. This was never implemented in the testing package,
  so it still emits a -N suffix for the gomaxprocs value. Technically
  this syntax is ambiguous, but it's what we're doing.

Updates golang/go#14313.

Change-Id: I67cbd735de7ab145bc7c3a93b5dadbe05120929c
Reviewed-on: https://go-review.googlesource.com/33570
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
@bcmills
Contributor

bcmills commented Jan 23, 2019

Is there anything left to be done for this proposal? (Does the documentation need to be moved to some more prominent or searchable location?)

@cep21
Contributor

cep21 commented Sep 8, 2019

Hi,

I couldn't find an existing library for this, so I wrote an API for encoding/decoding/streaming benchmark output as defined by the spec readme: https://github.com/cep21/benchparse. I'm happy to contribute it if desired.

@mdempsky
Member

mdempsky commented Oct 3, 2019

Ping @rsc @aclements .

@ligurio

ligurio commented Oct 23, 2019

We are unaware of any standard formats for recording raw benchmark data, and we've been unable to find any using web searches. One might expect that a standard benchmark suite such as SPEC CPU2006 would have defined a format for raw results, but that appears not to be the case.

Looks like you want to reinvent the wheel. I know of at least these formats: JMeter, Skia, Chrome, BenchmarkAnything, Google Benchmark, dstat.

P.S. I overlooked that this has already been merged.
