Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: Go compiler/runtime performance monitoring system #48803

Closed
prattmic opened this issue Oct 5, 2021 · 59 comments
Closed

all: Go compiler/runtime performance monitoring system #48803

prattmic opened this issue Oct 5, 2021 · 59 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@prattmic
Copy link
Member

prattmic commented Oct 5, 2021

The runtime/compiler team is investigating the creation of a performance monitoring system, with a primary goal of reducing release toil by catching performance regressions as early as possible.

Requirements are still being discussed, but highlights include:

  • Automation of performance data generation.
    • Benchmark execution solution that tracks commits to Go (e.g., x/build coordinator scheduling benchmark runs).
  • Storage and visualization of performance data over time.
  • Key set of "application-level" and "feature-focused" benchmarks.
  • Active monitoring of changepoints.
  • Easy debugging/reproduction.

cc @aclements @dr2chase @mknyszek @jeremyfaller @golang/release

@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 5, 2021
@mknyszek mknyszek added this to the Unreleased milestone Oct 5, 2021
@mvdan
Copy link
Member

mvdan commented Oct 5, 2021

cc @odeke-em

@odeke-em
Copy link
Member

odeke-em commented Oct 5, 2021

cc @odeke-em

Thank you for the tag @mvdan!

Indeed, at Orijtech Inc we've been working on such a product "Bencher" for the past 2 quarters of 2021 and produced it recently https://twitter.com/odeke_et/status/1428650768135974919?s=21: we have an integration on the GitHub marketplace at https://github.com/marketplace/gobencher and we actually have something on our roadmap to add a Gerrit integration during Q4 2021 or Q1 2022. I had raised and shown the product to @Sajmani, @spf13, @ianlancetaylor and @ianthehat and others :-)

For example, take a look at one of our public users Entgo https://dashboard.github.orijtech.com/benchmark/2bb34b8e166747d2974f76825819acbd
image
image

We'd be delighted to work with y'all to bring this first class to the Go tooling ecosystem!

Kindly cc-ing my colleagues @cuonglm @kirbyquerby @willpoint @jhusdero

@mengzhuo
Copy link
Contributor

mengzhuo commented Oct 6, 2021

I have some thoughts about this site/app:

  1. Transparent. All results should be available to public like benchsave instead of internal site from Google.
  2. Informative. Regression notice to Gerrit/owner of CL.
  3. Easy to compare with different baselines like major versions of Go.

@mknyszek
Copy link
Contributor

mknyszek commented Oct 6, 2021

@mengzhuo All three of those things are already on our radar. :) Thanks.

@gopherbot
Copy link

Change https://golang.org/cl/353909 mentions this issue: cmd/coordinator: don't snapshot in dev mode

gopherbot pushed a commit to golang/build that referenced this issue Oct 6, 2021
In dev mode, we have no GCS storage client, so attempting to write a
snapshot will panic.

For golang/go#48803

Change-Id: Id37264ebd765f914a55acf2fd18274020850331f
Reviewed-on: https://go-review.googlesource.com/c/build/+/353909
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@prattmic
Copy link
Member Author

prattmic commented Oct 6, 2021

@odeke-em We are still evaluating our options. We've looked at several different off-the-shelf options, including Bencher, as well as extending the existing http://build.golang.org infrastructure with what we need. We are currently leaning towards extending http://build.golang.org, as we already maintain it and it supports much of what we need already (fully open source, precise control over dedicated hardware used for benchmarking, etc).

@gopherbot
Copy link

Change https://golang.org/cl/354629 mentions this issue: cmd/buildlet: add revdial test

@gopherbot
Copy link

Change https://golang.org/cl/354630 mentions this issue: all: handle revdial redirects on connect

@gopherbot
Copy link

Change https://golang.org/cl/354638 mentions this issue: cmd/coordinator: find work in dev mode

@gopherbot
Copy link

Change https://golang.org/cl/354637 mentions this issue: cmd/coordinator: make listen address configurable

gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
This replaces the broken test in internal/coordinator/pool.

For golang/go#48803

Change-Id: I7bd9265bba555562ffa7d59169a9c8792ed97d3c
Reviewed-on: https://go-review.googlesource.com/c/build/+/354629
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
In dev mode, the coordinator uses hostPathHandler, which redirects
/reverse and /revdial to /farmer.golang.org/reverse, etc.

Establishing a revdial connection chokes on this redirect, as there it
expects to complete the protocol switch in a single request.

Add rudimentary redirect support via a helper so this works in dev mode.

Note that linkRewriter must implement http.Hijacker as
revdial.ConnHandler type-asserts the http.ResponseWriter to a Hijacker.

For golang/go#48803

Change-Id: I191fa6ff17bbd334203430f3c1f2c5e03db407ff
Reviewed-on: https://go-review.googlesource.com/c/build/+/354630
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
Though it is a good default, it is cumbersome for dev mode to listen
only on localhost when developing over SSH. Add a -listen flag to allow
overriding this with any address.

For golang/go#48803

Change-Id: If01a0b44926a33f2aa01548319508166592936e3
Reviewed-on: https://go-review.googlesource.com/c/build/+/354637
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
Start the main findWorkLoop in dev mode, to discover work to run from
the dashboard. In dev mode, was also replace the linux-amd64 builder
config with one using host-linux-amd64-localdev reverse buildlets.

This provides a complete lifecycle to test out builds with a local dev
coordinator and buildlet.

For golang/go#48803

Change-Id: I8ade6c8bccf3bc51437ca9e7d11c232753fe7465
Reviewed-on: https://go-review.googlesource.com/c/build/+/354638
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@shawndx
Copy link
Contributor

shawndx commented Nov 17, 2021

Curious to know will benchmark fluctuation be taken into account, and how? Thanks.

@prattmic
Copy link
Member Author

@shawn-xdji By fluctuation, I assume you mean noise in the results. We are taking a few approaches up front:

  • Running benchmarks on a low-noise machines. For now we plan to use "sole-tenant" VMs on GCE, though we may move to physical machines if necessary.
  • Rerunning the baseline benchmarks each time we test a new commit to help account for long term changes (e.g., OS upgrade changes performance).
  • Changepoint detection will need to statistical handle a small amount of noise.
  • We are only planning to track a curated set of benchmarks. Benchmarks that a fundamentally noisy will either be excluded or fixed.

Finally, all of this is something we'll need to learn on as we go and find out how big the issues are.

@shawndx
Copy link
Contributor

shawndx commented Nov 18, 2021

Thanks @prattmic, that answers my question, anticipate you will share some best practices later.

@gopherbot
Copy link

Change https://golang.org/cl/376634 mentions this issue: cmd/coordinator: baseline toolchain for benchmarks

gopherbot pushed a commit to golang/build that referenced this issue Jan 10, 2022
When benchmarking, we want to benchmark both the toolchain under test
(i.e., buildStatus.Rev) as well as an older "baseline" toolchain, which
will be compared against.

For now, the baseline toolchain is the latest stable release. In the
future we may want to update more frequently, but this is a simple
starting point.

This CL determines the baseline toolchain commit for a given test and
installs it on the buildlet at BENCH_BASELINE_GOROOT.
golang.org/x/benchmarks/cmd/bench is responsible for utilizing the
baseline toolchain. CL 376096 is the corresponding change to cmd/bench.

Most of the baseline toolchain logic is limited to runBenchmarkTests().
In theory, it logically fits a bit better with the rest of the toolchain
logic in build() et al, but keeping it limited to runBenchmarkTests()
helps keep the common build() path from getting much more complex for a
minor edge-case feature.

For golang/go#49207.
For golang/go#48803.

Change-Id: Id63f8333cf9d1ff952850c3347e999b5e98f7294
Reviewed-on: https://go-review.googlesource.com/c/build/+/376634
Reviewed-by: Alex Rakoczy <alex@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/392635 mentions this issue: influx: InfluxDB instance setup

gopherbot pushed a commit to golang/perf that referenced this issue Mar 17, 2022
This CL creates a Docker image for running an InfluxDB instance for the
Go performance monitoring dashboard.

The image is based on the Google-maintained GCP InfluxDB 2 image, with
an additional small program to perform initial database setup and push
access credentials to Google Secret Manager.

See README.md for instructions on running the image locally or on GCP.

For golang/go#48803

Change-Id: Ica34bd624f6daf89e483f898b110ccaceac83559
Reviewed-on: https://go-review.googlesource.com/c/perf/+/392635
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/394354 mentions this issue: influx: InfluxDB instance setup

@gopherbot
Copy link

Change https://go.dev/cl/394358 mentions this issue: WIP: influx: add kubernetes deployment

gopherbot pushed a commit to golang/build that referenced this issue Jun 21, 2022
Loading can take a few seconds, so add a basic note that something is
coming.

For golang/go#48803.

Change-Id: Ic24483f5729395cfde87ba751c851950abe48281
Reviewed-on: https://go-review.googlesource.com/c/build/+/413419
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Jun 21, 2022
For golang/go#48803.

Change-Id: I678a2dc5e668179d0fbaf4401aa810ad930c7933
Reviewed-on: https://go-review.googlesource.com/c/build/+/413421
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Jun 21, 2022
For golang/go#48803.

Change-Id: I2cee352261b1e432e625d6d5d2d081b78d926c1a
Reviewed-on: https://go-review.googlesource.com/c/build/+/413423
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/413426 mentions this issue: perf: add brief dashboard overview

@prattmic
Copy link
Member Author

For those that have been following along, the current basic dashboard is available at https://perf.golang.org/dashboard/.

@prattmic
Copy link
Member Author

Note that we are still refining the set of quality, stable benchmarks and additional work on reducing nodes on executors.

@jake-ciolek
Copy link
Contributor

@prattmic Awesome stuff, would it be possible to expand the date range further back?

@prattmic
Copy link
Member Author

@jake-ciolek Yes, I'd like to make the date range configurable (at least to show the entire release cycle), but haven't gotten to that yet.

@gopherbot
Copy link

Change https://go.dev/cl/413574 mentions this issue: cmd/coordinator: add header link to perf dashboard

@gopherbot
Copy link

Change https://go.dev/cl/413577 mentions this issue: perf: don't catch 404 throw

@gopherbot
Copy link

Change https://go.dev/cl/413576 mentions this issue: perf: shrink data.json

@gopherbot
Copy link

Change https://go.dev/cl/413578 mentions this issue: perf: make duration configurable

gopherbot pushed a commit to golang/build that referenced this issue Jun 23, 2022
For golang/go#48803.

Change-Id: I15393e80fa9c2a961e7feacc3acd2b72de312b82
Reviewed-on: https://go-review.googlesource.com/c/build/+/413426
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Jun 23, 2022
For golang/go#48803.

Change-Id: Ied4ff67757282bdd33f6e85b652762a18c369e8d
Reviewed-on: https://go-review.googlesource.com/c/build/+/413574
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jun 23, 2022
/dashboard/data.json?benchmark=all reduced from 7.8MB to 2.0MB with gzip
compression, and from 2.0MB to 1.1MB by removing the indentation.

For golang/go#48803.

Change-Id: I9031d23b1b751e130521658fd45551771cec04da
Reviewed-on: https://go-review.googlesource.com/c/build/+/413576
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jun 23, 2022
This catches all exceptions, so e.g., a json parsing error is completely
swallowed without even appearing in the JS console, which makes
debugging difficult.

For golang/go#48803.

Change-Id: If62f153413cdb2dfbeab2d83ddf6aaedc43cce8c
Reviewed-on: https://go-review.googlesource.com/c/build/+/413577
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/414036 mentions this issue: cmd/coordinator, perf: improve header consistency

gopherbot pushed a commit to golang/build that referenced this issue Jun 24, 2022
I forgot there was another copy of the template in status.go.
Move at least coordinator's 2 copies of the header into one
common place; we can do more later.

Also update the perf header to be consistent and link back
to the coordinator status page for easier navigation.

For golang/go#48803.

Change-Id: Id3f789d529e349cde1557ed17dd713549f55b942
Reviewed-on: https://go-review.googlesource.com/c/build/+/414036
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jun 27, 2022
This adds a ?day=N parameter that adjusts the query range, and ?end=TIME
parameter that sets the view end time.

Addition form fields allow adjusting these settings.

Though the search box and duration/end boxes are conceptually different
adjustments, and have separate submit buttons, put them inside the same
<form> so that the browser automatically sets all query parameters as
expected. e.g., when on a single benchmark page, changing duration/end
should not change the selected benchmark.

For golang/go#48803.

Change-Id: I8ff366983c568dc0e887289a174082c248934c2d
Reviewed-on: https://go-review.googlesource.com/c/build/+/413578
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
@prattmic
Copy link
Member Author

@jake-ciolek https://go.dev/cl/413578 is rolled out, so now the graph duration and end date are adjustable.

@gopherbot
Copy link

Change https://go.dev/cl/414474 mentions this issue: perf: remove /trend

@jake-volvo
Copy link

@jake-ciolek https://go.dev/cl/413578 is rolled out, so now the graph duration and end date are adjustable.

Thank you. Great work.

I have noticed the data goes back only to 22nd of March (even if I query for more). Would it be possible to extend it?

Also, could we look in to other metrics, like the time to compile Go itself (compilebench?), the size of Go binary broken down by funcs (feed data from something like compilecmp). Perhaps timings of individual SSA passes. Just some ideas for things that could be useful.

@prattmic
Copy link
Member Author

I have noticed the data goes back only to 22nd of March (even if I query for more). Would it be possible to extend it?

That is when we started collecting data. It is theoretically possible to test older commits, but I don't think it is worth the effort since we have plenty to investigate already.

Also, could we look in to other metrics, like the time to compile Go itself (compilebench?), the size of Go binary broken down by funcs (feed data from something like compilecmp). Perhaps timings of individual SSA passes. Just some ideas for things that could be useful.

The benchmarks we run are in x/benchmarks/cmd/bench (which most consists of configurations of x/benchmarks/cmd/bent (various unit benchmarks from the community) and x/benchmarks/sweet (large application-scale benchmarks)). We have a minimal set of benchmarks right now, and are certainly open to benchmarking additional things, like the ideas you suggested.

gopherbot pushed a commit to golang/build that referenced this issue Jul 7, 2022
It has been superseded by /dashboard, so let's remove it to keep things
simple.

For golang/go#48803.

Change-Id: I5b7ca91c72ee8f7bcfe762fdcd7400a619611058
Reviewed-on: https://go-review.googlesource.com/c/build/+/414474
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/442259 mentions this issue: perf: move unrelated types/consts

gopherbot pushed a commit to golang/build that referenced this issue Oct 11, 2022
These are not related to regression detection, so move them from the
middle of the regression functions.

For golang/go#48803.

Change-Id: If9d0d1d7560c54db5da6feb83cdaaffa803ba056
Reviewed-on: https://go-review.googlesource.com/c/build/+/442259
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
@golang golang locked and limited conversation to collaborators Jan 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

9 participants