Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: reduce PGO profile processing overhead #58102

Closed
prattmic opened this issue Jan 26, 2023 · 15 comments
Closed

cmd/compile: reduce PGO profile processing overhead #58102

prattmic opened this issue Jan 26, 2023 · 15 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Milestone

Comments

@prattmic
Copy link
Member

Currently, each cmd/compile invocation parses the full PGO pprof profile, builds a full weight graph, and then determines what is relevant to that package. This is a lot of work that scales poorly with the size of the profile and the size of the build (number of packages). For particularly large profiles, this can lead to extremely long build times.

Follow-up to #55022.

cc @cherrymui @aclements

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Jan 26, 2023
@prattmic prattmic added this to the Go1.21 milestone Jan 26, 2023
@prattmic prattmic self-assigned this Jan 26, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 26, 2023
@gopherbot
Copy link

Change https://go.dev/cl/464575 mentions this issue: WIP: trim profiles to only include transitive deps

@cherrymui
Copy link
Member

#62400 has a prototype that significantly reduce build time overhead for PGO. It is too late to be integrated in for Go 1.22. Bump to Go 1.23.

@cherrymui cherrymui modified the milestones: Go1.22, Go1.23 Nov 27, 2023
@CAFxX
Copy link
Contributor

CAFxX commented Jan 22, 2024

Just as a practical example of this in the wild: I just tried to pull a production profile (from GCP cloud profiler; 16MB gzip compressed, 60MB uncompressed) of one of our services and drop it into its repo as default.pgo. We are currently on go 1.21, and including dependencies that service is made up of 700+ packages.

This caused the build time (GCP Cloudbuild, e2-highcpu-32) to skyrocket from 2.5 minutes to 34 minutes -- before being killed, presumably due to OOM, since the build timeout is 60 minutes ~13 minutes (after tweaking the build process a bit to avoid OOMs).

It's clear we won't really be able to enable PGO until the PGO build overhead is lowered to more reasonable levels.

@prattmic
Copy link
Member Author

prattmic commented Jan 22, 2024

@CAFxX Thanks for the report!

https://go.dev/cl/529738 is nearly ready to go in and aims to address the majority of this issue. If you are able, it would be great if you could try it out and report whether it addresses the problem.

That CL doesn't yet plumb through cmd/go, so you'll need to run manually and directly set the compiler flags. Something like:

$ cd go-checkout/src # Go to checkout of the Go repo.
$ git fetch https://go.googlesource.com/go refs/changes/38/529738/11 && git checkout -b change-529738 FETCH_HEAD # Download the CL
$ ./make.bash # Build toolchain
$ ../bin/go tool preprofile -i input.pprof -o preprocessed.pgo # Run the preprocessor
your-project$ ~/go-checkout/bin/go build -gcflags=all=-pgoprofile=/path/to/preprocessed.pgo # Manually pass output to compiler

Note: make sure you don't have a default.pgo in your package, otherwise the cmd/go handling of that file will conflict with the manual flag passed to the compiler.

Edit: I submitted the CL, so now you can just pull tip.

@gopherbot
Copy link

Change https://go.dev/cl/557458 mentions this issue: cmd/compile/internal/pgo: remove fatal logging from functions returning error

@gopherbot
Copy link

Change https://go.dev/cl/557460 mentions this issue: Revert "cmd/preprofile: Add preprocess tool to pre-parse the profile file."

gopherbot pushed a commit that referenced this issue Jan 22, 2024
…file."

This reverts CL 529738.

Reason for revert: Breaking longtest builders

For #58102.
Fixes #65220.

Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def
Reviewed-on: https://go-review.googlesource.com/c/go/+/557460
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@CAFxX
Copy link
Contributor

CAFxX commented Jan 23, 2024

Will try as soon as I have a bit of more time to dedicate. update: I see it has been reverted.

A couple of further notes in the meantime:

  • Using an uncompressed profile seems to already reduce CPU usage quite a bit (approx. -20%). Haven't checked the CL, but if it doesn't do this already it may be a good idea to somehow avoid decompressing (and maybe also parsing) the profile for each package being compiled.
  • In general it seems like PGO significantly increases peak memory usage during compilation. We normally use 1GB/core when building Go code, and it seems PGO is reliably pushing past that. Maybe the default behavior of the build driver should be changed to dial down the number of parallel builds if the driver detects that there is not enough free memory and/or if children are getting killed because of OOM (similar to what can be done with parallel: see --term-seq, --retries, --memsuspend, and --memfree)

@gopherbot
Copy link

Change https://go.dev/cl/560035 mentions this issue: cmd/compile: avoid reading entire PGO profile just to check the header

gopherbot pushed a commit that referenced this issue Feb 2, 2024
isPreProfileFile reads the entire file into memory just to check the
first few bytes, and then throws it all away. We can avoid this by just
peeking at the beginning.

For #58102.

Change-Id: Id2c2844e5e44a2f3a9c7cdb9a027d94d26bdf71d
Reviewed-on: https://go-review.googlesource.com/c/go/+/560035
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
…file."

This reverts CL 529738.

Reason for revert: Breaking longtest builders

For golang#58102.
Fixes golang#65220.

Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def
Reviewed-on: https://go-review.googlesource.com/c/go/+/557460
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
isPreProfileFile reads the entire file into memory just to check the
first few bytes, and then throws it all away. We can avoid this by just
peeking at the beginning.

For golang#58102.

Change-Id: Id2c2844e5e44a2f3a9c7cdb9a027d94d26bdf71d
Reviewed-on: https://go-review.googlesource.com/c/go/+/560035
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/569336 mentions this issue: cmd/preprofile: drop output directory check

@gopherbot
Copy link

Change https://go.dev/cl/569338 mentions this issue: cmd/compile: rename cmd/compile/internal/pgo to cmd/compile/internal/pgoir

@gopherbot
Copy link

Change https://go.dev/cl/569335 mentions this issue: cmd/preprofile: clean up error handling

@gopherbot
Copy link

Change https://go.dev/cl/569337 mentions this issue: cmd/compile,cmd/preprofile: move logic to shared common package

@gopherbot
Copy link

Change https://go.dev/cl/569425 mentions this issue: cmd/go: use cache for PGO preprocessing

@gopherbot
Copy link

Change https://go.dev/cl/569424 mentions this issue: cmd/go: preprocess PGO profiles

@gopherbot
Copy link

Change https://go.dev/cl/569423 mentions this issue: cmd/go: inital plumbing for PGO profiles preprocessing

gopherbot pushed a commit that referenced this issue Mar 27, 2024
This CL adjusts error handling to be a bit more idiomatic. The
processing function returns errors, leaving main to log and exit on
error.

This CL contains no functional changes.

For #58102.

Change-Id: I9074127cc675e177d046474b7f01fbc37d0bd4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/569335
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Mar 27, 2024
This check serves only to provide a more descriptive error if the output
directory doesn't exist. That isn't useless, but I don't see why this tool
specifically should do this when no other part of the toolchain does.

For #58102.

Change-Id: I01cf9db2cc1dad85c3afd8a6b008c53f26cb877a
Reviewed-on: https://go-review.googlesource.com/c/go/+/569336
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Mar 27, 2024
The processing performed in cmd/preprofile is a simple version of the
same initial processing performed by cmd/compile/internal/pgo. Refactor
this processing into the new IR-independent cmd/internal/pgo package.

Now cmd/preprofile and cmd/compile run the same code for initial
processing of a pprof profile, guaranteeing that they always stay in
sync.

Since it is now trivial, this CL makes one change to the serialization
format: the entries are ordered by weight. This allows us to avoid
sorting ByWeight on deserialization.

Impact on PGO parsing when compiling cmd/compile with PGO:

* Without preprocessing: PGO parsing ~13.7% of CPU time
* With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%)
* With preprocessing (sorted): ~1.3% of CPU time

The remaining 1.3% of CPU time approximately breaks down as:

* ~0.5% parsing the preprocessed profile
* ~0.7% building weighted IR call graph
  * ~0.5% walking function IR to find direct calls
  * ~0.2% performing lookups for indirect calls targets

For #58102.

Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/569337
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Mar 27, 2024
…pgoir

This helps reduce confusion with cmd/internal/pgo, which performs
compilation-independent analysis. pgoir associates that data with the
IR from the current package compilation.

For #58102.

Change-Id: I9ef1c8bc41db466d3340f41f6d071b95c09566de
Reviewed-on: https://go-review.googlesource.com/c/go/+/569338
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Mchnan pushed a commit to Mchnan/go-sylixos that referenced this issue Mar 28, 2024
This CL adjusts error handling to be a bit more idiomatic. The
processing function returns errors, leaving main to log and exit on
error.

This CL contains no functional changes.

For golang#58102.

Change-Id: I9074127cc675e177d046474b7f01fbc37d0bd4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/569335
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Mchnan pushed a commit to Mchnan/go-sylixos that referenced this issue Mar 28, 2024
This check serves only to provide a more descriptive error if the output
directory doesn't exist. That isn't useless, but I don't see why this tool
specifically should do this when no other part of the toolchain does.

For golang#58102.

Change-Id: I01cf9db2cc1dad85c3afd8a6b008c53f26cb877a
Reviewed-on: https://go-review.googlesource.com/c/go/+/569336
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Mchnan pushed a commit to Mchnan/go-sylixos that referenced this issue Mar 28, 2024
The processing performed in cmd/preprofile is a simple version of the
same initial processing performed by cmd/compile/internal/pgo. Refactor
this processing into the new IR-independent cmd/internal/pgo package.

Now cmd/preprofile and cmd/compile run the same code for initial
processing of a pprof profile, guaranteeing that they always stay in
sync.

Since it is now trivial, this CL makes one change to the serialization
format: the entries are ordered by weight. This allows us to avoid
sorting ByWeight on deserialization.

Impact on PGO parsing when compiling cmd/compile with PGO:

* Without preprocessing: PGO parsing ~13.7% of CPU time
* With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%)
* With preprocessing (sorted): ~1.3% of CPU time

The remaining 1.3% of CPU time approximately breaks down as:

* ~0.5% parsing the preprocessed profile
* ~0.7% building weighted IR call graph
  * ~0.5% walking function IR to find direct calls
  * ~0.2% performing lookups for indirect calls targets

For golang#58102.

Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/569337
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Mchnan pushed a commit to Mchnan/go-sylixos that referenced this issue Mar 28, 2024
…pgoir

This helps reduce confusion with cmd/internal/pgo, which performs
compilation-independent analysis. pgoir associates that data with the
IR from the current package compilation.

For golang#58102.

Change-Id: I9ef1c8bc41db466d3340f41f6d071b95c09566de
Reviewed-on: https://go-review.googlesource.com/c/go/+/569338
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Apr 18, 2024
The new go tool preprofile preprocesses a PGO pprof profile into an
intermediate representation that is more efficient for the compiler to
consume. Performing preprocessing avoids having every single compile
process from duplicating the same processing.

This CL prepares the initial plumbing to support automatic preprocessing
by cmd/go.

Each compile action takes a new dependency on a new "preprocess PGO
profile" action. The same action instance is shared by all compile
actions (assuming they have the same input profile), so the action only
executes once.

Builder.build retrieves the file to pass to -pgofile from the output of
the preprocessing action, rather than directly from
p.Internal.PGOProfile.

Builder.buildActionID also uses the preprocess output as the PGO
component of the cache key, rather than the original source. This
doesn't matter for normal toolchain releases, as the two files are
semantically equivalent, but it is useful for correct cache invalidation
in development. For example, if _only_ go tool preprofile changes
(potentially changing the output), then we must regenerate the output
and then rebuild all packages.

This CL does not actually invoke go tool preprocess. That will come in
the next CL. For now, it just copies the input pprof profile.

This CL shouldn't be submitted on its own, only with the children. Since
the new action doesn't yet use the build cache, every build (even fully
cached builds) unconditionally run the PGO action.

For #58102.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I594417cfb0164cd39439a03977c904e4c0c83b8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/569423
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Apr 18, 2024
Following the previous CL, now actually run the preprofile tool to create the
preprocessed output.

There is still no build cache integration, so the tool will run on every
build even if nothing has changed.

For #58102.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I0414377a956889f457e50898737fcaa8a698658d
Reviewed-on: https://go-review.googlesource.com/c/go/+/569424
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Projects
Development

No branches or pull requests

5 participants