
cmd/go: decide, document standard benchmark data format #14313

Open
rsc opened this issue Feb 12, 2016 · 33 comments
@rsc
Contributor

rsc commented Feb 12, 2016

We propose to make the current output of go test -bench the defined format for recording all Go benchmark data. Having a defined format allows benchmark measurement programs and benchmark analysis programs to interoperate while evolving independently.

See golang.org/design/14313-benchmark-format for details.
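For illustration, a data file in this format consists of configuration lines ("key: value") and benchmark result lines, for example (the package: line here is only a configuration key suggested later in this thread, not something go test emits today):

package: encoding/base64
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s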

@rsc rsc added this to the Proposal milestone Feb 12, 2016
@rsc rsc added the Proposal label Feb 12, 2016
gopherbot pushed a commit to golang/proposal that referenced this issue Feb 12, 2016
For golang/go#14313.

Change-Id: Ib9483714bbd004ff2be6cfa0d6e730d2d7f5da42
Reviewed-on: https://go-review.googlesource.com/19490
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@btracey
Contributor

btracey commented Feb 12, 2016

Why have the key start with a lowercase letter rather than an uppercase one? Wouldn't an uppercase first letter make it easier to write Go code that unmarshals the configuration (for example, avoiding the need for struct tags, as with JSON)?
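For illustration (the key names below are hypothetical), lowercase keys force struct tags when unmarshaling into a Go struct, since exported fields must begin with an uppercase letter; by analogy with encoding/json:

    // Hypothetical example: because the configuration keys are lowercase
    // and exported Go fields must start with an uppercase letter, each
    // field needs a tag (shown here with encoding/json by analogy).
    type Config struct {
        Commit string `json:"commit"`
        Goos   string `json:"goos"`
        Goarch string `json:"goarch"`
    }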

@nigeltao
Contributor

A line from the example output:

BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op

Is it possible to change "100" to "100 iters" or "100 runs" so that every number also has a unit? Or would that be backwards incompatible?

@aclements
Member

@nigeltao, unfortunately that would be backwards incompatible.

@cespare
Contributor

cespare commented Feb 12, 2016

What's the reasoning for this restriction?

Tooling can expect that the unit strings are identical for all runs to be compared; for example, a result reporting “ns/op” need not be considered comparable to one reporting “µs/op.”

It doesn't seem like it makes the job of a parser author harder to just drop this restriction (and generation code can pick an appropriate unit per line, if desired).

@aclements
Member

@cespare, in the general case dropping that restriction would make the job of the parser much harder, if not impossible. Since we're proposing that benchmarks can emit custom metrics, that would require the parser to know how to conform arbitrary scalings of arbitrary units. It would also require that we specify much more precisely the syntax of units. Currently we've specified the unit syntax enough to make "best effort" rescaling possible in benchmark processors, but even if a processor doesn't understand a unit, it can at least expect it to be spelled consistently so it can match up results.
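As a rough sketch of what such best-effort rescaling might look like for one unit a processor does understand (this is only an illustration, not code from benchstat):

    package main

    import "fmt"

    // rescaleTime rescales a value reported in ns/op to a larger time unit
    // for display. Units the processor does not recognize are passed
    // through unchanged; because they are spelled consistently, results
    // can still be matched up even without rescaling.
    func rescaleTime(value float64, unit string) (float64, string) {
        if unit != "ns/op" {
            return value, unit
        }
        switch {
        case value >= 1e9:
            return value / 1e9, "s/op"
        case value >= 1e6:
            return value / 1e6, "ms/op"
        case value >= 1e3:
            return value / 1e3, "µs/op"
        }
        return value, unit
    }

    func main() {
        fmt.Println(rescaleTime(154125, "ns/op")) // prints the value rescaled to µs/op
    }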

@cespare
Contributor

cespare commented Feb 13, 2016

@aclements got it, thanks.

@cespare
Contributor

cespare commented Feb 13, 2016

This proposal adds one new kind of line: configuration. What do you think about also standardizing a column header line format? Right now the format doesn't provide any mechanism for carrying the context around with it, so a line like this

BenchmarkDecodeDigitsSpeed1e4-8          100        154125 ns/op      64.88 MB/s       40418 B/op          7 allocs/op

only makes sense given the current benchmark conventions.

Now, given that lines other than configuration lines and benchmark result lines are ignored, it's possible for someone to make up their own format for column headers. But might it be better to settle on a single convention from the start?

(A slightly different convention would be to decide on a well-known configuration key (columns?) which specifies the column names.)
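For illustration, such a line might look like this (the key name and column labels are hypothetical, not part of the proposal):

columns: iterations time/op throughput bytes-allocated/op allocs/op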

@cespare
Contributor

cespare commented Feb 13, 2016

By the way, I commented on a few typos in the CL (https://go-review.googlesource.com/#/c/19490/) after it was already submitted.

@ulikunitz
Contributor

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

@ulikunitz
Contributor

I suggest requiring that all lines beginning with a # character be ignored.

Currently

#foo: bar

is a valid configuration line. It cannot be ignored because of the word other in

All other lines in the data file, including but not limited to blank lines and lines beginning with a # character, are ignored.

@josharian
Contributor

LGTM overall. Seems very useful.

Some nits and questions:

space characters separate “key:” from “value.”

0 or more space characters, or 1 or more space characters?

Value can be omitted entirely but the colon must still be present.

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

The first field is the benchmark name, which must begin with Benchmark and is typically followed by a capital letter, as in BenchmarkReverseString. Tools displaying benchmark data conventionally omit the Benchmark prefix.

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

The same benchmark name can appear on multiple result lines, indicating that the benchmark was run multiple times.

This is a mismatch with running benchmarks over multiple packages right now. Using tip:

$ go test -bench=. encoding/base32 encoding/base64
PASS
BenchmarkEncodeToString-8      30000         42033 ns/op     194.89 MB/s
BenchmarkDecodeString-8        10000        182128 ns/op      71.99 MB/s
ok      encoding/base32 3.593s
PASS
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s
ok      encoding/base64 5.487s

According to the proposal, this would be interpreted incorrectly as two benchmarks run twice, rather than four benchmarks run once.

If this proposal were adopted, however, we could add to the output a config line to disambiguate, perhaps like:

$ go test -bench=. encoding/base32 encoding/base64
PASS
package: encoding/base32
BenchmarkEncodeToString-8      30000         42033 ns/op     194.89 MB/s
BenchmarkDecodeString-8        10000        182128 ns/op      71.99 MB/s
ok      encoding/base32 3.593s
PASS
package: encoding/base64
BenchmarkEncodeToString-8     100000         22681 ns/op     361.17 MB/s
BenchmarkDecodeString-8        20000         99789 ns/op     109.47 MB/s
ok      encoding/base64 5.487s

The main known issue with the current go test -bench is that we'd like to emit finer-grained detail about runs, for linearity testing and more robust statistics. This proposal allows that by simply printing more result lines.

Link to issue 10669.

We anticipate that the new x/perf subrepo will include a library for loading benchmark data from files

Will https://godoc.org/golang.org/x/tools/benchmark/parse be updated? Deprecated? Replaced?

@rsc
Contributor Author

rsc commented Feb 16, 2016

Thanks for the comments.

@ulikunitz:

I wonder whether the specification should call out that the file is UTF-8 encoded and that lines are separated by line feeds.

Sure.

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

I suggest requiring that all lines beginning with a # character be ignored.

Nice catch. Thank you.

@rsc
Contributor Author

rsc commented Feb 16, 2016

@josharian:

0 or more space characters, or 1 or more space characters?

1 or more ("x:y: 12" sets "x:y" = "12").

Presumably if the answer to the above is "1 or more", then the spaces may also be omitted in this case.

Yes.
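A minimal sketch of a parser consistent with that reading, in which the key ends at the first colon followed by a space or the end of the line (the function name and structure are mine, not from the proposal):

    package main

    import (
        "fmt"
        "strings"
    )

    // splitConfigLine splits a configuration line into key and value.
    // The key may itself contain colons; it ends at the first colon
    // that is followed by a space or the end of the line, and one or
    // more spaces separate the colon from the value.
    func splitConfigLine(line string) (key, value string, ok bool) {
        for i := 0; i < len(line); i++ {
            if line[i] != ':' {
                continue
            }
            if i+1 == len(line) || line[i+1] == ' ' {
                return line[:i], strings.TrimLeft(line[i+1:], " "), true
            }
        }
        return "", "", false
    }

    func main() {
        k, v, _ := splitConfigLine("x:y: 12")
        fmt.Printf("%q = %q\n", k, v) // "x:y" = "12"
    }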

It'd be better to require a following capital letter. If the name is just Benchmark and the tool omits the prefix, the name will be blank, which is likely to confuse the user or look like a bug. The Go tool already requires this of benchmark function names.

Interesting corner case, thanks. Actually, the go command allows plain Benchmark; the tool will have to rename it to something. The go command does require that if something follows "Benchmark", that something must not be a lowercase letter: "Benchmarked" is not a benchmark.

If this proposal were adopted, however, we could add to the output a config line to disambiguate,

Nice.

Will https://godoc.org/golang.org/x/tools/benchmark/parse be updated? Deprecated? Replaced?

I expect deprecated in favor of a new parser in x/perf (see #14304). The code in package parse is very tied to benchcmp's view of the world (unsurprising since it was basically an mv from cmd/benchcmp). Worse, benchcmp's view of the world - which I accept full blame for - is fundamentally flawed, since it doesn't understand the idea of running a benchmark multiple times and therefore cannot do any statistical validity checks. A general parsing library should really provide something closer to the raw input, leaving analysis to the code invoking the parser. For benchstat, I started by using package parse and trying to recook the results into the form I wanted, but I found that it was easier to parse the raw data from scratch.
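For concreteness, the kind of "raw" record such a library might hand back could look roughly like this (the type and field names are hypothetical, not the eventual x/perf API):

    // Value is a single measured quantity from a benchmark result line,
    // such as 154125 "ns/op".
    type Value struct {
        Value float64
        Unit  string
    }

    // Result is one benchmark result line, kept close to the raw input;
    // aggregation and statistics are left to the code invoking the parser.
    type Result struct {
        Name       string            // full benchmark name, e.g. "BenchmarkDecodeDigitsSpeed1e4-8"
        Iterations int               // the iteration-count column
        Values     []Value           // the remaining value/unit pairs, in order
        Config     map[string]string // configuration lines in effect for this result
    }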

For that matter, benchstat will move to x/perf, but probably, due to the statistical failings, benchcmp will be left behind, deprecated, and eventually removed. Not for a while though, and somewhat off-topic for the current proposal.

@aclements
Member

Thanks for the comments! I believe I've addressed all of them in https://go-review.googlesource.com/21030.

I suggest requiring that all lines beginning with a # character be ignored.

Good catch. I made the rule a bit stricter than you proposed by requiring that the first character satisfy unicode.IsLower(). If there's a compelling case to weaken this, we can do so in the future.
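A tiny sketch of that stricter check (again, only an illustration, not the spec's wording):

    package main

    import (
        "fmt"
        "unicode"
        "unicode/utf8"
    )

    // mayBeConfigLine reports whether a line is even a candidate for a
    // configuration line: its first rune must be a lowercase letter, so
    // lines beginning with "#" (and blank lines) are ignored rather than parsed.
    func mayBeConfigLine(line string) bool {
        r, _ := utf8.DecodeRuneInString(line)
        return unicode.IsLower(r)
    }

    func main() {
        fmt.Println(mayBeConfigLine("key: value")) // true
        fmt.Println(mayBeConfigLine("#foo: bar"))  // false
    }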

It may also be helpful to specify whether or not a tool can assume that value units are always ratios, so must include a slash character. There is also the question whether a value unit with multiple slashes is undefined, forbidden or something else.

It's a fascinating question, but one that maybe we should leave open, at least at first. I can't think of any units that wouldn't have exactly one slash, but maybe (probably) I am just not thinking hard enough.

There are useful units with zero slashes, such as binary size in a compiler benchmark and median GC utilization (though you could argue these are "per iteration"). There are also useful units with multiple slashes, such as bytes/CPU/sec. You could represent this as bytes/(CPU*sec), but this goes one step deeper into the rabbit hole of specifying unit syntax, which we're trying to avoid at least for now. I've been using some of these more complex units in the GC benchmark suite I've been putting together (see metrics.go).

@aclements
Member

This proposal adds one new kind of line: configuration. What do you think about also standardizing a column header line format?

I'm not sure I understand what you're proposing. What would be in the column header line format that isn't already in the benchmark line? Could you give a concrete example?

@cespare
Contributor

cespare commented Mar 22, 2016

@aclements the meaning of the columns isn't given in the benchmark line. The unit alone does not tell you what is being measured. A header line would give this information:

name                                     iters      time/iter         throughput       allocs              allocs
BenchmarkDecodeDigitsSpeed1e4-8          100        154125 ns/op      64.88 MB/s       40418 B/op          7 allocs/op

(Apologies for the lousy names.)

It's less relevant today because Go benchmarks have a known set (and ordering) of columns, but your format says that this is also a valid benchmark line:

BenchmarkFoo     1000        12.34 MB/s

and without more context one cannot know what the 12.34 MB/s represents. (Compression rate? alloc rate? network throughput?)

Anyway, as I mentioned, there's plenty of leeway in the specification for anyone to create their own header convention; I was just wondering if it makes sense to prescribe one from the outset.

@aclements
Member

@cespare, interesting. I think I understand. This is the difference between dimensions and units: one tells you what's being measured (e.g., "time"), while the other relates a value to a specific physical quantity (e.g., "minutes" and "seconds").

I've actually been fighting with this in a general unit parser/renormalizer I wrote for benchstat, since benchstat wants to show a dimension for its own column headers, and values with nicely (and differently) scaled units in the rows. Specifying it explicitly feels almost unnecessary because you can usually derive the dimension from the unit, but you're right that some of the units used by go test are quite vague. Arguably, these units are bad: any two numbers in the same or conformable units should be comparable, but "B/op" could mean any number of different things that aren't necessarily comparable. It might be possible to always derive reasonable dimensions from "better" units, such as "bytes allocated/op".

So, another solution is to say that we know what ns/op, MB/s, B/op, and allocs/op mean, but any custom metrics have to have "better" units.

@josharian
Contributor

I just hit this myself. rsc/benchstat#5

@aclements
Member

I just encountered one annoying problem with this format. If there are no tests (such as in the go1 benchmarks), the testing package prints "testing: warning: no tests to run", which this format interprets as a configuration line with key "testing" and value "warning: no tests to run".

@rsc
Contributor Author

rsc commented Apr 21, 2016

@mpvl can we change that print to use the SKIP form you had suggested?

@minux
Member

minux commented Apr 21, 2016 via email

Or we can specifically reserve and ignore the "testing" configuration key, which is reasonable given that "testing" is the package name.

@davecheney
Contributor

The 'testing ...' line is written to stderr. It should not be present in
stdout.

On Thu, Apr 21, 2016 at 3:59 PM, Minux Ma notifications@github.com wrote:

Or we can specifically reserve and ignore the "testing" configuration key,
which is reasonable given that "testing" is the package name.



@aclements
Member

The 'testing ...' line is written to stderr. It should not be present in stdout.

True, but I usually capture both when I'm logging benchmarks because I want to see the errors in the log.

@davecheney
Contributor

I'm sorry, this was my misunderstanding. I thought you were piping the benchmark output to a file, and that the testing: line was throwing off a tool that post-processed that file.

On Fri, 22 Apr 2016, 00:19 Austin Clements, notifications@github.com
wrote:

The 'testing ...' line is written to stderr. It should not be present in
stdout.

True, but I usually capture both when I'm logging benchmarks because I
want to see the errors in the log.



@bradfitz
Contributor

Closing this as a dup of #2981 (JSON as the standard format?), because both would likely be resolved regardless of what is decided.

Feel free to reopen this if you feel otherwise.

@aclements
Member

@bradfitz, to be clear, the currently proposed JSON format doesn't capture most of what's interesting about this proposal (custom metrics and benchmark metadata), and according to #2981 (comment), it seems these things are out of scope for the current JSON proposal.

Not that there's any harm in closing this issue. We're already using this format. But it's not a dup of #2981 since it covers different ground.

@josharian
Contributor

I agree—not a dup. There's a lot happening here beyond mere encoding. And it's not clear to me how the JSON proposal handles the header lines discussed here, which are not tied to specific benchmark runs. Re-opening.

@bradfitz
Contributor

Then I'm marking this as "Proposal-Accepted" so it doesn't come up in future proposal review searches. This is effectively accepted because you're all using it anyway. I still think there's value in a standard format like JSON, but that discussion can continue in #2981.

We can keep this bug open to document the design proposal here.

@bradfitz bradfitz changed the title proposal: standard benchmark data format cmd/go: decide, document standard benchmark data format Aug 25, 2016
@bradfitz bradfitz modified the milestones: Unplanned, Proposal Aug 25, 2016
@gopherbot

CL https://golang.org/cl/33570 mentions this issue.

gopherbot pushed a commit to golang/proposal that referenced this issue Nov 29, 2016
The benchmark format proposal was written in parallel with the
sub-benchmarks support and the two didn't quite agree on the format of
benchmark names.

This commit makes two changes to the benchmark format document so that
it agrees with the practice and implementation of sub-benchmarks in
the testing package:

* It replaces key:value in the benchmark name with key=value, which is
  the format suggested by the sub-benchmarks documentation.

* It removes the special rule about reporting gomaxprocs as a special
  key=value pair. This was never implemented in the testing package,
  so it still emits a -N suffix for the gomaxprocs value. Technically
  this syntax is ambiguous, but it's what we're doing.

Updates golang/go#14313.

Change-Id: I67cbd735de7ab145bc7c3a93b5dadbe05120929c
Reviewed-on: https://go-review.googlesource.com/33570
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
@bcmills
Contributor

bcmills commented Jan 23, 2019

Is there anything left to be done for this proposal? (Does the documentation need to be moved to some more prominent or searchable location?)

@cep21
Contributor

cep21 commented Sep 8, 2019

Hi,

I couldn't find an existing library for this, so I wrote an API for encoding/decoding/streaming benchmark output as defined by the spec readme: https://github.com/cep21/benchparse. I'm happy to contribute it if desired.

@mdempsky
Member

mdempsky commented Oct 3, 2019

Ping @rsc @aclements .

@ligurio

ligurio commented Oct 23, 2019

We are unaware of any standard formats for recording raw benchmark data, and we've been unable to find any using web searches. One might expect that a standard benchmark suite such as SPEC CPU2006 would have defined a format for raw results, but that appears not to be the case.

Looks like you want to reinvent the wheel. I know of at least these formats: JMeter, Skia, Chrome, BenchmarkAnything, Google Benchmark, dstat.

P.S. I overlooked that this has already been merged.
