
runtime/metrics: add a way to measure runtime-internal lock contention #57071

Closed
rhysh opened this issue Dec 4, 2022 · 13 comments
Labels: compiler/runtime, FeatureRequest, NeedsFix, Performance

@rhysh (Contributor) commented Dec 4, 2022

Go's diagnostic tools give only a minimal view into contention on runtime-internal locks. I filed #57069 and #57070 with the best data available to me now. The story in #45894 has evolved over time. I haven't kept close track of the paths others take to finding runtime-internal lock contention, but I bet it's hard for them too. (Or: please share your techniques!)

The first one, #57069, appeared in the last few weeks while changing how we operate one of our apps. Pretty quick turnaround from seeing a problem to being able to describe it as a bug.

But #57070 describes a problem I've seen for years and wasn't quite sure I understood. In that app, we saw an increase in our CPU profiles' reports of time in runtime.futex when moving from Go 1.17 to 1.18 without any associated regression in the app's performance. I took that to mean that the change was only in the reporting, that those samples arrived in bursts and so were hidden by shortcomings in setitimer-based profiles. That helped to make sense of the execution traces, but the story was still hard to prove (to myself, and to you here) without having CPU profile samples inside the execution trace data stream.

I struggled with #45894 for a while before coming to understand that a lot of that post-GC delay was related to long waits on the runtime-internal mprof lock.

Maybe Go 1.19's tools are enough to get the job done, but I wonder if we should have a more direct measurement of contention and wait time on runtime-internal locks, like an equivalent of #49881.

CC @golang/runtime and @mknyszek

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 4, 2022
@prattmic prattmic added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 5, 2022
@prattmic prattmic added this to the Backlog milestone Dec 5, 2022
@prattmic (Member) commented Dec 5, 2022

I think it would be great if we could get the runtime locks fully wired up to the existing sync.Mutex contention profiles.

@rhysh (Contributor, Author) commented Sep 5, 2023

Full wiring into runtime/pprof's sync.Mutex contention profile would be great. I'm anticipating that being at least a little tricky, since adding stacks to the mprof involves acquiring a lock.

With the intended end state of the data being part of that profile, I think that 1) the total wait time should be reported as part of /sync/mutex/wait/total:seconds rather than under a separate metric name, and 2) the wait time for runtime-internal locks should be reported in the profile under a temporary "best we could do" placeholder function name, similar to _LostSIGPROFDuringAtomic64. Maybe _LostRuntimeLockContention.

Then, as we move toward reporting the actual call stacks that led to contended runtime.lock calls, the data in /sync/mutex/wait/total:seconds will remain the same, and fewer call paths will need to be reported as _LostRuntimeLockContention; they will instead have their real call stacks.
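For reference on the first point, the /sync/mutex/wait/total:seconds metric can already be read with the standard runtime/metrics package. A minimal sketch (this uses only existing standard library API and is not part of the proposed change):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// mutexWaitSeconds reads the cumulative mutex wait-time metric. The second
// return value is false if this Go version does not support the metric.
func mutexWaitSeconds() (float64, bool) {
	sample := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return sample[0].Value.Float64(), true
}

func main() {
	if v, ok := mutexWaitSeconds(); ok {
		fmt.Printf("total mutex wait: %fs\n", v)
	} else {
		fmt.Println("metric not supported on this Go version")
	}
}
```

Under the proposal, runtime-internal lock wait time would fold into this same cumulative value, so existing consumers of the metric would pick it up without changes.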


I'm inclined to add an atomic.Uint64 field to runtime.m to count the number of nanoseconds that this M has had to wait inside (sampled) lock2 calls where the "speculative grab" failed; I don't want to respond to contention on a runtime.lock by adding cross-M contention on a global reporting value. Then, to report the total, we'd sum those per-M values, including a global pool for dead Ms, as we do for NumCgoCall.

It looks like I can call nanotime from within lock2.

Maybe I can also capture the call stack onto an array that is part of the runtime.m struct, but it's not clear to me when would be a good time to try adding its contribution to the mprof table. Once a stack is present in the table, we might be able to add weight to it without taking any locks (so safe to call from lock2). For getting the stack to appear there in the first place the challenge seems similar to the runtime.cpuProfile.extra mechanism, but that relies on other code executing on a regular basis in a less constrained environment. Maybe the equivalent here would be "sysmon" or "the GC", but those don't seem like great options.
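The per-M counter idea above can be illustrated in user space. This is only a hedged analogy (goroutines standing in for Ms, time.Now for nanotime): each worker accumulates contended-wait time in its own atomic counter, so recording contention never itself contends on a shared global, and only the reader pays the cost of summing.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// worker mirrors the proposed per-M field: each goroutine owns its wait
// counter, so updates never touch a shared cache line.
type worker struct {
	waitNanos atomic.Int64
}

// measureContention runs nWorkers goroutines that fight over one mutex and
// returns the per-worker wait times summed by a single reader at the end.
func measureContention(nWorkers, iters int) time.Duration {
	var mu sync.Mutex
	workers := make([]worker, nWorkers)
	var wg sync.WaitGroup
	for i := range workers {
		wg.Add(1)
		go func(w *worker) {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				start := time.Now()
				mu.Lock()
				// Record how long the acquisition took; only this
				// worker ever writes its own counter.
				w.waitNanos.Add(time.Since(start).Nanoseconds())
				mu.Unlock()
			}
		}(&workers[i])
	}
	wg.Wait()
	var total int64
	for i := range workers {
		total += workers[i].waitNanos.Load() // summed only by the reader
	}
	return time.Duration(total)
}

func main() {
	fmt.Println("total contended wait:", measureContention(4, 1000))
}
```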


  • Is it good to report as /sync/mutex/wait/total:seconds?
  • Because the docs for that metric say the Mutex profile will have more detail, is it OK to report those numbers as part of that profile with a bogus call stack (such as _LostRuntimeLockContention)?
  • Is it OK to make the M struct larger, by a few words (counters) or by a few dozen words (call stack buffer)?
  • Any feedback on my general plan, before I mail a CL that's heading towards a dead end?

@mknyszek @prattmic

Thanks!

@gopherbot

Change https://go.dev/cl/528656 mentions this issue: runtime: remove lock from block/mutex profile

@gopherbot

Change https://go.dev/cl/528657 mentions this issue: runtime: profile contended lock calls

@gopherbot

Change https://go.dev/cl/534161 mentions this issue: runtime/metrics: add STW start and total time metrics

gopherbot pushed a commit that referenced this issue Nov 15, 2023
This CL adds four new time histogram metrics:

/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds

The "stopping" metrics measure the time taken to start a stop-the-world
pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.

The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "start" phase.

The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.

All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).

/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.

In the implementation, there are a few minor differences:

* For both mark and sweep termination stops, /gc/pauses:seconds started
  timing prior to calling startTheWorldWithSema, thus including lock
  contention.

These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving timing stop
into startTheWorldWithSema does have the side effect of requiring moving
other GC metric calculations outside of the STW, as they depend on the
same end time.

Fixes #63340

Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@rhysh (Contributor, Author) commented Nov 15, 2023

@prattmic made a good point on https://go.dev/cl/528657 that the change includes an awkward mix of the semantics of the "mutex" and "block" profiles, by capturing the runtime.unlock call stack but using that M's own wait time as the value. To match the user-space "mutex" profile, its value should correspond to how long the other Ms were waiting for that critical section to complete. That's a more invasive change to src/runtime/lock_futex.go than we can make at this point in the Go 1.22 cycle.

@prattmic , @mknyszek and I discussed options at last week's "runtime diagnostics improvements" meeting: #57175 (comment) . The conclusion we came to is that we shouldn't ship those strange semantics as on by default for Go 1.22, but that we'd like to start getting the benefits of this from users who understand the current caveats and who understand that those semantics will soon change (improve). We're planning to do this via a GODEBUG setting that's off by default, and which will go away once the contention reports for runtime.mutex match those of sync.Mutex.

I updated https://go.dev/cl/528657 with a proposed name of GODEBUG=profileruntimelocks=1. When disabled, contention on runtime-internal locks is still reported in the /sync/mutex/wait/total:seconds metric, and is attributed to the runtime._LostContendedLock function.

If a temporary GODEBUG setting isn't appropriate here, we can disable the call stack collection in that same way for Go 1.22, attributing everything to the placeholder function so the /sync/mutex/wait/total:seconds metric and the sum of the mutex profile samples continue to match.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 17, 2023
@dmitshur dmitshur modified the milestones: Backlog, Go1.22 Nov 17, 2023
@rhysh rhysh reopened this Nov 21, 2023
@gopherbot

Change https://go.dev/cl/544195 mentions this issue: runtime: profile contended lock calls

@prattmic (Member) commented Dec 4, 2023

@rhysh FYI, I will be sending a release note CL for this. I don't want to race with you. :)

I also propose that we change the name of GODEBUG=profileruntimelocks=1 to GODEBUG=runtimecontentionstacks=1 and runtime._LostContendedLock to runtime._LostContendedRuntimeLock. I will send a CL for both.

My thought is that even with GODEBUG=profileruntimelocks=0 (the default), runtime locks are still profiled, we simply don't collect stacks. That makes the name confusing. Similarly, since users will see runtime._LostContendedLock in this case, I think runtime._LostContendedRuntimeLock is slightly more clear that this isn't one of the locks in their code.

@gopherbot

Change https://go.dev/cl/547058 mentions this issue: doc: document runtimecontentionstacks

@gopherbot

Change https://go.dev/cl/547057 mentions this issue: runtime/pprof: document block and mutex profiles

@gopherbot

Change https://go.dev/cl/547055 mentions this issue: runtime: rename GODEBUG=profileruntimelocks to runtimecontentionstacks

@gopherbot

Change https://go.dev/cl/547056 mentions this issue: runtime/metrics: document runtime-internal locks in /sync/mutex/wait/total:seconds

@gopherbot

Change https://go.dev/cl/547059 mentions this issue: doc: add release notes for runtime-internal contention

gopherbot pushed a commit that referenced this issue Dec 6, 2023
profileruntimelocks is new in CL 544195, but the name is deceptive. Even
with profileruntimelocks=0, runtime-internal locks are still profiled.
The actual difference is that call stacks are not collected. Instead all
contention is reported at runtime._LostContendedLock.

Rename this setting to runtimecontentionstacks to make its name more
aligned with its behavior.

In addition, for this release the default is profileruntimelocks=0,
meaning that users are fairly likely to encounter
runtime._LostContendedLock. Rename it to
runtime._LostContendedRuntimeLock in an attempt to make it more
intuitive that these are runtime locks, not locks in application code.

For #57071.

Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393
Reviewed-on: https://go-review.googlesource.com/c/go/+/547055
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
gopherbot pushed a commit that referenced this issue Dec 6, 2023
…total:seconds

For #57071.

Change-Id: I7ce6c35bed95a6ea3cdc17007f861c5dd82404d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/547056
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Dec 6, 2023
Amazingly, we seem to have nearly no in-tree documentation on the
semantics of block and mutex profiles. Add brief summaries, including
the new behavior from CL 506415 and CL 544195.

For #14689.
For #44920.
For #57071.
For #61015.

Change-Id: I1a6edce7c434fcb43f17c83eb362b1f9d1a32df1
Reviewed-on: https://go-review.googlesource.com/c/go/+/547057
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Dec 6, 2023
For #57071.

Change-Id: Ic1645af57aa589917c67154a5e4ad0b4edd7ba90
Reviewed-on: https://go-review.googlesource.com/c/go/+/547058
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Dec 6, 2023
For #57071.
For #61422.

Change-Id: I5d546d8828be897cb087e85a1251213c582b3894
Reviewed-on: https://go-review.googlesource.com/c/go/+/547059
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>