
runtime: remove idle GC workers for the 2-minute GC cycle #44163

Closed
mknyszek opened this issue Feb 8, 2021 · 21 comments
Labels: early-in-cycle (a change that should be done early in the 3 month dev cycle), FrozenDueToAge, GarbageCollector, NeedsDecision (feedback is required from experts, contributors, and/or the community before a change can be made)


mknyszek commented Feb 8, 2021

Background

Today the Go GC spins up dedicated GC workers that use 25% of GOMAXPROCS resources. But it will also soak up any additional Ps that are idle, running "idle GC workers" on them.

The idea here is that those cores are idle anyway, so why not use them to speed up the GC cycle? There are negative performance implications to having the GC on longer than necessary: floating garbage accumulates (memory), and the write barrier stays enabled (CPU time).

EDIT: To be clear, I'm not proposing removing dedicated GC workers, which use 25% of CPU resources during a GC cycle. Those are very, very important. Just the idle workers that can increase that number to 100% if the application is idle.
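
To make the arithmetic concrete, a rough back-of-the-envelope sketch (illustrative only; the runtime's actual worker accounting lives in the pacer and is more involved):

```go
package main

import "fmt"

func main() {
	// With GOMAXPROCS=8, dedicated workers take ~25% (2 Ps). Idle
	// workers can then claim any of the remaining Ps, pushing total
	// GC CPU usage all the way to 100% in a fully idle program.
	const gomaxprocs = 8
	dedicated := gomaxprocs / 4
	for idle := 0; idle <= gomaxprocs-dedicated; idle += 2 {
		util := float64(dedicated+idle) / gomaxprocs * 100
		fmt.Printf("dedicated=%d idle=%d => ~%.0f%% of CPU on GC\n",
			dedicated, idle, util)
	}
}
```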

Why?

When a Go process is mostly idle, idle enough that the "every-2-minutes" GC kicks in, it's going to go full blast on all CPU cores, resulting in a CPU utilization spike. While this on its own isn't a big deal, it makes a difference in the face of monitoring systems and auto-scaling systems. The former is surprising: one might see these spikes in their monitoring and spend time investigating them, only to find there's nothing to do about that (today). The latter is dangerous: if an auto-scaling system decides to give an application more resources because it notices this spike, then the Go GC is now going to soak up those resources, resulting in a significant (if temporary, and generally bounded) increase in resource usage.
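
One way to watch this happen (a minimal sketch; the 2-minute period corresponds to the runtime-internal forcegcperiod and isn't an API guarantee) is to leave a process idle and poll its GC counters:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// In an otherwise idle process, NumGC still ticks up roughly every
	// two minutes; each tick coincides with the utilization spike.
	var ms runtime.MemStats
	for {
		runtime.ReadMemStats(&ms)
		fmt.Printf("NumGC=%d LastGC=%s\n",
			ms.NumGC, time.Unix(0, int64(ms.LastGC)).Format("15:04:05"))
		time.Sleep(30 * time.Second)
	}
}
```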

From the earlier proposal to remove idle GC workers (no longer relevant):

I believe today these idle GC workers are doing more harm than good.

Firstly, when a Go process is mostly idle, idle enough that the "every-2-minutes" GC kicks in, it's going to go full blast on all CPU cores, resulting in a CPU utilization spike. While this on its own isn't a big deal, it makes a difference in the face of monitoring systems and auto-scaling systems. The former is surprising; one might see these spikes in their monitoring and spend time investigating them, only to find there's nothing to do about that (today). The latter is dangerous: if an auto-scaling system decides to give an application more resources because it notices this spike, then the Go GC is now going to soak up those resources, resulting in a significant (if temporary, and generally bounded) increase in resources.

Secondly, the Go pacer fails to account for this extra CPU utilization. This significantly skews the pacer's function and makes it consistently undershoot its target in the steady-state. The goal of the pacer is to reach some GC CPU utilization target, but that target is somewhat meaningless if the idle GC workers can kick in and blow that away in applications that have any idle time. The result is that if idle GC workers are running, the GC will consistently undershoot its goal.

Worse still, this extra CPU utilization isn't easy to account for in the pacer. If we pace assuming the idle GC workers are running, it's possible for the GC to really panic and spike utilization just when you really need it. For instance, take an application with many goroutines blocked on syscalls. Let's say most of the time they're not doing a lot, but suddenly (by chance) many wake up at once during a GC cycle. Suddenly the GC CPU utilization will drop (no space for idle workers) and the GC will think it's extremely behind, so at that moment it'll start forcing all these goroutines to assist, basically up to the utilization that the idle workers were doing. This makes our notion of a steady-state fragile, since it now depends on the CPU resources available. (I have a new pacer design in the works and while it can deal with this, it ends up with consistent undershoot when the idle workers are running; I don't think you can win here.)

Thirdly, there's a latency cost. Consider again the case of goroutines waiting on, say, network requests. The Go scheduler is going to schedule idle GC workers, and while they yield readily (they're "low priority" in a sense), their function is primarily non-preemptible and likely will be for the foreseeable future. The cost of scheduling them away is small, but not free.

Why not?

The application could suddenly become not idle during the GC cycle, and there would be a latency cost in the form of assists (vs. idle GC workers, which tend to introduce very little latency cost). Given that the 2-minute GC only kicks in when the application is already idle, however, I really doubt that this will ever be an issue. The application's steady-state would have to change mid-GC, and on the time scale of 2 minutes, that's already extremely unlikely.

From the earlier proposal to remove idle GC workers (no longer relevant):

I'm honestly not sure. The only scenario I could construct where the idle GC workers come in handy is a bursty application, since you increase your chance that the GC fires outside of a burst, and you get a nice boost in throughput and latency since the GC conveniently tries to finish as fast as possible outside the burst.

But this argument is still somewhat dubious because there's no guarantee the GC won't just fire during the burst, so I'm not sure how much this actually helps. However, one reason I'm creating this issue is that maybe someone out there can think of another scenario, or has another use-case.

Why not now?

I don't think there's any reason not to do this very targeted fix. It will require some additional testing to make sure everything still works correctly, because the 2-minute GC is a fairly rare case, so bugs here will manifest as surprising and very rare failures.

From the earlier proposal to remove idle GC workers (no longer relevant):

Removing the idle GC workers is unfortunately going to be risky from an implementation standpoint. @prattmic tried this out and unfortunately it uncovered at least one bug. It's possible that over time the idle GC workers have become critical to GC progress in some scenarios. Oops.

@prattmic concluded from his investigation into this that it's going to uncover a non-trivial amount of latent, subtle bugs (and I say bugs since theoretically the idle GC workers should have no effect on the function of the GC otherwise) and it's going to take a while to squash all of them.

As a result, it's going to be critical to land this early in the development cycle and give it plenty of time to soak, with an easy roll-back option (perhaps a runtime-internal constant).

However, despite all this, I think this is worth doing sooner rather than later. The longer we wait, the greater chance we have that seemingly orthogonal changes to the GC tangle this more.

CC @aclements @prattmic

@mknyszek mknyszek added GarbageCollector NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 8, 2021
@mknyszek mknyszek added this to the Go1.17 milestone Feb 8, 2021
@randall77

So who's doing the GC work then? Is it all handled during assists?

If so, then I worry about an application that stops allocating mid-cycle while the write barrier is active. Then is the write barrier active forever? Is all the floating garbage never collected (and resulting free pages returned to the OS)?

Or is some of the GC work done during write barriers also? That would eliminate the write-barrier-on-forever problem, and lessen, but maybe not eliminate, the floating garbage one (to still have a problem you'd need an application that runs but never writes a pointer).


mknyszek commented Feb 8, 2021

@randall77 I only mean idle GC workers (which are dedicated low-priority workers that soak up additional idle Ps in addition to the 25% dedicated), not the 25% dedicated workers. I think it's very important to keep dedicated workers, for the reasons you mention (and others).

@mknyszek

Definitely not in this release, but I've found some other issues with dedicated worker scheduling in working on #44167. I would like to address all these issues in the final implementation for #44167 which is currently in progress.

@mknyszek mknyszek modified the milestones: Go1.17, Go1.18 Apr 15, 2021
@mknyszek mknyszek self-assigned this Apr 19, 2021
@mknyszek mknyszek added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Apr 23, 2021
@cagedmantis

This issue is currently labeled as early-in-cycle for Go 1.18.
That time is now, so this is a friendly ping to make sure the issue gets looked at again.

@mknyszek

Thanks @cagedmantis. I'm preparing a change for this; it's on my radar.

@mknyszek

ah, though I will note I probably won't land it before the tree opens. I don't see a need.

@gopherbot

Change https://golang.org/cl/351131 mentions this issue: runtime: add an experiment to disable idle GC mark workers

@aclements

Bumping to 1.19 early-in-cycle.

@aclements aclements modified the milestones: Go1.18, Go1.19 Oct 5, 2021

mknyszek commented Dec 16, 2021

After some experimentation in the last development cycle, I think this is no longer feasible. Not for any direct technical reason -- the experiment above seems to work. The problem is that removing idle GC workers costs about 1% throughput and latency, across the board.

Furthermore, I think the new pacer (#44167) accounts for the idle GC workers a little bit better, and I've discovered that the latency cost of scheduling the idle GC workers is dwarfed by the latency cost of having write barriers and goroutine assists on longer (because the GC cycle is longer without the idle GC workers).

I'd like to make an alternative proposal: disable idle GC workers only in the case where they're actively harmful, that is, when the 2-minute GC runs. This is fairly easy to check for and implement.

@mknyszek mknyszek changed the title runtime: remove idle GC workers runtime: remove idle GC workers for the 2-minute GC cycle Dec 16, 2021
@mknyszek

I updated the original post with all the details of the new proposal.

@thepudds

I'd like to make an alternative proposal: disable idle GC workers only in the case where they're actively harmful, that is, when the 2-minute GC runs.

FWIW, I think that would be well worthwhile, in part because it can be very disconcerting to see it happen when you have a large heap, and it can mean Go is not being a nice citizen of whatever environment it is running in.

It also happens to have exacerbated some of the issues cited in that Discord blog (e.g., see #37116).

I think in #37116 it might also have been theorized that the idle workers on the 2 min cycle were causing sporadic latency issues, but I would need to re-read that issue to confirm.

@gopherbot

This issue is currently labeled as early-in-cycle for Go 1.19.
That time is now, so a friendly reminder to look at it again.

@gopherbot

Change https://go.dev/cl/393394 mentions this issue: runtime: disable idle mark workers during periodic GC cycles

@mknyszek

I have to pivot this issue once more. The idle-priority mark workers are currently critical to ensuring GC progress, for two reasons. One is #44313, which is fixable (I'm working on it). The other is more subtle.

If we only have a fractional worker, even with a fix for #44313, it's possible that the scheduler will choose not to schedule a fractional worker, for example if the fractional worker's goal utilization has been reached. Consider a case where GOMAXPROCS=1 so we have a 25% fractional worker, and all the user goroutines in the application happen to block (directly or indirectly) on a GC. This can happen if a goroutine calls runtime.GOMAXPROCS or runtime.GC. If an M's goroutine decides to block, and it starts looking for work, but everything is blocked on a GC, and the fractional worker isn't needed, then it's going to schedule an idle worker. This ensures the GC progresses, and eventually the aforementioned goroutines get unblocked. If we turn idle workers off (note that it doesn't really matter what triggered the GC), then there's nothing to keep the GC going. The M will go to sleep, and the system will deadlock. My previous iteration of https://go.dev/cl/393394 demonstrated this in the trybots.
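
For illustration, the shape of the program described above looks something like this (a hypothetical sketch, not a guaranteed repro; it only hangs under the broken configuration being discussed, i.e. no idle workers and only a fractional worker):

```go
package main

import "runtime"

func main() {
	runtime.GOMAXPROCS(1) // leaves only a 25% fractional GC worker, no dedicated worker

	done := make(chan struct{})
	go func() {
		runtime.GC() // blocks until the current cycle completes
		close(done)
	}()

	// Every goroutine is now blocked (directly or indirectly) on the GC.
	// If the fractional worker has met its quota and idle workers are
	// disabled, nothing is left to drive the cycle forward.
	<-done
}
```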

I don't think this has a general solution other than fundamentally reworking how fractional workers operate. The quick hack is to have sysmon force it to get scheduled, but giving yet-more-responsibility to sysmon isn't great. An alternative is to schedule the fractional worker anyway (allowing it to go over its quota), but make it eager to yield. This means a lot of spinning in and out of the scheduler, but it's better than what we have today.

A third alternative, which I've implemented in the latest version of https://go.dev/cl/393394, is a compromise: keep idle mark workers, but give them a limit, and set that limit to 1 for periodic GCs. In the grand scheme of things, "idle mark worker" really is the correct designation here: the system is fully idle. I think if #44313 is fixed then it's also OK to set that limit to 0 if we have at least one dedicated worker, because that worker will ensure GC progress.

I plan to first land https://go.dev/cl/393394 (if there are no objections), try to fix #44313 (which is turning out to be a little tricky), then drop the idle GC worker limit from 1 to 0 if at least one dedicated worker is running. Does this sound reasonable, @aclements @prattmic ?

@aclements

What if the problem isn't that we need idle workers to ensure progress, but rather that fractional workers need to be less special? If the fractional worker used a timer for when it needed to run again, rather than relying on the scheduler to poll for whether it needs to run, then it would be able to wake a stopped M.

@mknyszek

I think timers could work. Two potential problems, though, from experience using timers for the scavenger:

  1. It'll delay checkdead unless we're careful about how the goroutine is parked and unparked. This is mainly a test issue.
  2. The granularity of our timers is pretty coarse. The scavenger (also paced by CPU utilization) needs to run for about 1 ms before producing a sleep duration large enough to be consistently correlated with the input sleep duration. I fear chaotic changes in CPU usage for GOMAXPROCS<6 if we try to schedule at a finer granularity.

(1) is probably a non-issue, since a GC cycle has a pretty short duration compared to completing all the scavenge work, and checkdead typically isn't all that useful once the application gets big enough to have a long GC cycle. We can also work around this; it's just complicated.

I have honestly no idea whether (2) will be a problem in practice. It depends a lot on the length of the GC cycle. The shortest GC cycles can definitely be < 1 ms in length.

As a result, I don't think this is just a straightforward "let's do it this way," though I can be convinced. Currently, I think it needs a bit of experimentation and tuning, which I'll fold into my "fundamentally reworking how fractional workers operate" comment above. :)

@prattmic

What if the problem isn't that we need idle workers to ensure progress, but rather that fractional workers need to be less special? If the fractional worker used a timer for when it needed to run again, rather than relying on the scheduler to poll for whether it needs to run, then it would be able to wake a stopped M.

I agree this would be nice. In fact, it would be nice if dedicated and fractional workers were closer to "normal" goroutines that could simply move to _Grunnable when they need to run (with the scheduler boosting their priority I guess? Haven't thought about this too much).

Working with timers will be a bit awkward since there isn't an obvious time we need to run again; we first need to pick a period to use. To Michael's point about timer granularity, perhaps we could have a pretty long period, e.g., fractional workers will run for at least 10ms before sleeping. On the other hand, this means a 25% fractional worker will sleep for 30ms, which while not technically deadlocked may be a long time to wait if all other goroutines are blocked waiting for GC to make progress.
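
As a user-space analogy of that pacing scheme (illustrative only; fractionalWorker and busy are made-up names, and the real workers are runtime-internal):

```go
package main

import "time"

// fractionalWorker runs doWork for one quantum, then sleeps long enough
// to average util CPU: at util=0.25 and a 10ms quantum, it sleeps 30ms,
// which is exactly the stall being discussed above.
func fractionalWorker(doWork func(time.Duration), quantum time.Duration, util float64) {
	sleep := time.Duration(float64(quantum) * (1 - util) / util)
	for {
		doWork(quantum)
		time.Sleep(sleep)
	}
}

func main() {
	busy := func(d time.Duration) { // stand-in for GC mark work
		for end := time.Now().Add(d); time.Now().Before(end); {
		}
	}
	fractionalWorker(busy, 10*time.Millisecond, 0.25)
}
```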

The scavenger (also paced by CPU utilization) needs to run for about 1 ms before producing a sleep duration large enough to be consistently correlated with the input sleep duration

I'm not sure precisely what you mean, but this just sounds like a description of #44343.

@mknyszek

On the other hand, this means a 25% fractional worker will sleep for 30ms, which while not technically deadlocked may be a long time to wait if all other goroutines are blocked waiting for GC to make progress.

That's a good point. In theory, if the application is active, this is fine; assists will finish off the GC. Another problem with a long period is something like GOMAXPROCS=1: if the GC is short enough, then 10 ms could easily eat up the entire GC cycle each time, effectively turning it into a STW GC (nothing makes any progress except the GC). I realize 10 ms was just your strawman number, but even 1 ms could have the same effect. We could mitigate this by adding a bit of random delay before the fractional worker kicks in, but that's just more complexity.

At first glance this might not seem any worse than idle mark workers eating up an entire scheduling quantum, but idle mark workers only pop up when there's actually nothing to do.

I'm not sure precisely what you mean, but this just sounds like a description of #44343.

That's correct.

(And more precisely, I was referring to the scavenger's controller, which assumes that sleeping longer has some kind of proportional effect on CPU utilization. At smaller sleep increments than 1 ms, the amount actually slept varies wildly enough that error in the controller accumulates in an unbounded manner.)
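
A quick way to see that granularity effect (illustrative; the exact numbers depend on the OS and scheduler):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Sub-millisecond sleeps routinely overshoot by large, variable
	// factors, which is what breaks a controller that assumes sleep
	// time maps proportionally onto CPU utilization.
	for _, want := range []time.Duration{
		50 * time.Microsecond,
		200 * time.Microsecond,
		time.Millisecond,
	} {
		start := time.Now()
		time.Sleep(want)
		got := time.Since(start)
		fmt.Printf("asked %v, slept %v (%.1fx)\n", want, got, float64(got)/float64(want))
	}
}
```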

I think your comments serve my broader point that to "fix" the fractional worker would require (and certainly benefit from) a rethink of how GC scheduling works today. While I think that's worthwhile, I'd also like to offer my preference to not block this issue on that.

That being said, if y'all don't like this interim solution of just limiting idle mark workers (turning them down to zero if we have at least one dedicated mark worker), I'm willing to put this aside and find time later to look into making this work better.

@prattmic

That being said, if y'all don't like this interim solution of just limiting idle mark workers (turning them down to zero if we have at least one dedicated mark worker), I'm willing to put this aside and find time later to look into making this work better.

I'm OK with it. IMO, the whole discussion above is an argument in favor of having some kind of idle mark workers. At least in the (admittedly rare) case of a program blocked on GC completion it is difficult to argue that you wouldn't want at least one worker running as soon as the entire program is idle.

@aclements

I'm fine with the solution of limiting the idle workers, too. I just wanted to make sure we'd thought through the problem.

@gopherbot

Change https://go.dev/cl/395634 mentions this issue: runtime: disable idle mark workers with at least one dedicated worker

gopherbot pushed a commit that referenced this issue Apr 26, 2022
This change reduces the maximum number of idle mark workers during
periodic (currently every 2 minutes) GC cycles to 1.

Idle mark workers soak up all available and unused Ps, up to GOMAXPROCS.
While this provides some throughput and latency benefit in general, it
can cause what appear to be massive CPU utilization spikes in otherwise
idle applications. This is mostly an issue for *very* idle applications,
ones idle enough to trigger periodic GC cycles. This spike also tends to
interact poorly with auto-scaling systems, as the system might assume
the load average is very low and suddenly see a massive burst in
activity.

The result of this change is not to bring down this 100% (of GOMAXPROCS)
CPU utilization spike to 0%, but rather

  min(25% + 1/GOMAXPROCS*100%, 100%)

Idle mark workers also do incur a small latency penalty as they must be
descheduled for other work that might pop up. Luckily the runtime is
pretty good about getting idle mark workers off of Ps, so in general
the latency benefit from shorter GC cycles outweighs this cost. But, the
cost is still non-zero and may be more significant in idle applications
that aren't invoking assists and write barriers quite as often.

We can't completely eliminate idle mark workers because they're
currently necessary for GC progress in some circumstances. Namely,
they're critical for progress when all we have is fractional workers. If
a fractional worker meets its quota, and all user goroutines are blocked
directly or indirectly on a GC cycle (via runtime.GOMAXPROCS, or
runtime.GC), the program may deadlock without GC workers, since the
fractional worker will go to sleep with nothing to wake it.

Fixes #37116.
For #44163.

Change-Id: Ib74793bb6b88d1765c52d445831310b0d11ef423
Reviewed-on: https://go-review.googlesource.com/c/go/+/393394
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
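
To make the commit's ceiling concrete, a quick worked calculation (illustrative sketch; ceiling is a made-up helper):

```go
package main

import "fmt"

// ceiling computes min(25% + 1/GOMAXPROCS*100%, 100%): the ~25%
// dedicated workers plus at most one remaining idle worker (one P).
func ceiling(gomaxprocs int) float64 {
	u := 0.25 + 1.0/float64(gomaxprocs)
	if u > 1 {
		u = 1
	}
	return u * 100
}

func main() {
	for _, p := range []int{1, 2, 4, 8, 16} {
		fmt.Printf("GOMAXPROCS=%d => spike capped at ~%.1f%%\n", p, ceiling(p))
	}
}
```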
@golang golang locked and limited conversation to collaborators Jun 23, 2023