runtime/debug: soft memory limit #48409

Closed
mknyszek opened this issue Sep 15, 2021 · 42 comments

Comments

@mknyszek
Contributor

mknyszek commented Sep 15, 2021

Proposal: Soft memory limit

Author: Michael Knyszek

Summary

I propose a new option for tuning the behavior of the Go garbage collector by setting a soft memory limit on the total amount of memory that Go uses.

This option comes in two flavors: a new runtime/debug function called SetMemoryLimit and a GOMEMLIMIT environment variable. In sum, the runtime will try to maintain this memory limit by limiting the size of the heap and by returning memory to the underlying platform more aggressively. This includes a mechanism to help mitigate garbage collection death spirals. Finally, by setting GOGC=off, the Go runtime will always grow the heap to the full memory limit.
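
As a concrete illustration (a minimal sketch, not part of the proposal text), the programmatic flavor would be used roughly like this; the equivalent environment configuration would be GOMEMLIMIT=8GiB with GOGC=off:

```go
package main

import "runtime/debug"

func main() {
	// Soft memory limit of 8 GiB for the whole Go runtime: heap, goroutine
	// stacks, and other runtime-managed memory, not just the live heap.
	debug.SetMemoryLimit(8 << 30)

	// Optionally disable the GOGC-based pacer entirely (GOGC=off), letting
	// the heap grow until it approaches the limit.
	debug.SetGCPercent(-1)

	// ... run the application ...
}
```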

This new option gives applications better control over their resource economy. It empowers users to:

  • Better utilize the memory that they already have,
  • Confidently decrease their memory limits, knowing Go will respect them,
  • Avoid unsupported forms of garbage collection tuning.

Details

Full design document found here.

Note that, for the time being, this proposal intends to supersede #44309. Frankly, I haven't been able to find a significant use-case for it, as opposed to a soft memory limit overall. If you believe you have a real-world use-case for a memory target where a memory limit with GOGC=off would not solve the same problem, please do not hesitate to post on that issue, contact me on the Gophers Slack, or email me at mknyszek@golang.org. Please include as much detail as you can.

@mknyszek mknyszek added this to the Go1.18 milestone Sep 15, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Sep 15, 2021
@gopherbot

Change https://golang.org/cl/350116 mentions this issue: design: add proposal for a soft memory limit

gopherbot pushed a commit to golang/proposal that referenced this issue Sep 21, 2021
For golang/go#48409.

Change-Id: I4e5d6d117982f51108dca83a8e59b118c2b6f4bf
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/350116
Reviewed-by: Michael Pratt <mpratt@google.com>
@mpx
Contributor

mpx commented Sep 21, 2021

Afaict, the impact of the memory limit is visible once the GC is CPU throttled, but not before. Would it be worth exposing the current effective GOGC as well?

@mknyszek
Contributor Author

@mpx I think that's an interesting idea. If GOGC is not off, then you have a very clear sign of throttling in telemetry. However, if GOGC=off I think it's harder to tell, and it gets blurry once the runtime starts bumping up against the GC CPU utilization limit, i.e. what does effective GOGC mean when the runtime is letting itself exceed the heap goal?

I think that's pretty close. Ideally we would have just one metric that could show, at a glance, "are you in the red, and if so, how far?"

@mknyszek mknyszek modified the milestones: Go1.18, Proposal Sep 22, 2021
@raulk

raulk commented Sep 27, 2021

In case you find this useful as a reference (and possibly to include in "prior art"), the go-watchdog library schedules GC according to a user-defined policy. It can infer limits from the environment (host or container), and it can target a maximum heap size defined by the user. I built this library to deal with #42805, and ever since we integrated it into https://github.com/filecoin-project/lotus, we haven't had a single OOM reported.

@rsc
Contributor

rsc commented Oct 6, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Oct 6, 2021
@rsc
Contributor

rsc commented Oct 13, 2021

@mknyszek what is the status of this?

@mknyszek
Contributor Author

@rsc I believe the design is complete. I've received feedback on the design, iterated on it, and I've arrived at a point where there aren't any major remaining comments that need to be addressed. I think the big question at the center of this proposal is whether the API benefit is worth the cost. The implementation can change and improve over time; most of the details are internal.

Personally, I think the answer is yes. I've found that mechanisms that respect users' memory limits and give the GC the flexibility to use more of the available memory are quite popular. Where Go users implement this themselves, they're left working with tools (like runtime.GC/debug.FreeOSMemory and heap ballasts) that have significant pitfalls. The proposal also takes steps to mitigate the most significant costs of having a new GC tuning knob.
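
For readers unfamiliar with the heap-ballast workaround mentioned above: a ballast is a large, never-used allocation whose only purpose is to inflate the heap size that GOGC's proportional trigger is computed from. A minimal sketch of the pattern, which also hints at its pitfalls (it's indirect, easy to size wrong, and confusing to anyone reading memory stats):

```go
package main

import "runtime"

// ballast artificially inflates the live heap so that GOGC's proportional
// trigger fires less often. It is never read or written.
var ballast []byte

func main() {
	// Reserve a 1 GiB ballast. Because it is never touched, on most
	// platforms it remains untouched virtual memory, but the GC still
	// counts it as live heap when computing the next trigger.
	ballast = make([]byte, 1<<30)

	// ... run the application ...

	runtime.KeepAlive(ballast)
}
```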

In terms of implementation, I have some of the foundational bits up for review now that I wish to land in 1.18 (I think they're uncontroversial improvements, mostly related to the scavenger). My next step is to create a complete implementation and trial it on real workloads. I suspect that a complete implementation won't land in 1.18 at this point, which is fine. It'll give me time to work out any unexpected issues with the design in practice.

@rsc
Contributor

rsc commented Oct 20, 2021

Thanks for the summary. Overall the reaction here seems overwhelmingly positive.

Does anyone object to doing this?

@kent-h

kent-h commented Oct 26, 2021

I have some of the foundational bits up for review now that I wish to land in 1.18

I suspect that a complete implementation won't land in 1.18

@mknyszek I'm somewhat confused by this. At a high level, what are you hoping to include in 1.18, and what do you expect to come later?
(Specifically: will we have extra knobs in 1.18, or will these changes be entirely internal?)

@mknyszek
Contributor Author

@kent-h The proposal has not been accepted, so the API will definitely not land in 1.18. All that I'm planning to land is work on the scavenger, to make it scale a bit better. This is useful in its own right, and it happens that the implementation of SetMemoryLimit as described in the proposal depends on it. There won't be any internal functionality pertaining to SetMemoryLimit in the tree in Go 1.18.

@rsc
Contributor

rsc commented Oct 27, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Oct 27, 2021
@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Nov 3, 2021
@rsc
Contributor

rsc commented Nov 3, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime/debug: soft memory limit runtime/debug: soft memory limit Nov 3, 2021
@rsc rsc modified the milestones: Proposal, Backlog Nov 3, 2021
gopherbot pushed a commit that referenced this issue May 3, 2022
This change makes the memory limit functional by including it in the
heap goal calculation. Specifically, we derive a heap goal from the
memory limit, and compare that to the GOGC-based goal. If the goal based
on the memory limit is lower, we prefer that.
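
In rough pseudocode (a paraphrase of the commit message, not the actual runtime source, with the various non-heap overheads collapsed into a single parameter):

```go
package gcpacer

// heapGoal sketches how the effective goal is chosen once a memory limit is
// set: take the usual GOGC-based goal, derive a second goal from the memory
// limit by subtracting non-heap memory the runtime knows about, and use
// whichever is lower.
func heapGoal(liveHeap, gogcGoal, memoryLimit, nonHeapMemory int64) int64 {
	limitGoal := memoryLimit - nonHeapMemory
	if limitGoal < liveHeap {
		// The limit-based goal can't usefully drop below what's actually
		// live; the GC CPU limiter handles the case where the GC can't
		// keep up anyway.
		limitGoal = liveHeap
	}
	if limitGoal < gogcGoal {
		return limitGoal
	}
	return gogcGoal
}
```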

To derive the memory limit goal, the heap goal calculation now takes
a few additional parameters as input. As a result, the heap goal, in the
presence of a memory limit, may change dynamically. The consequences of
this are that different parts of the runtime can have different views of
the heap goal; this is OK. What's important is that all of the runtime
is able to observe the correct heap goal for the moment it's doing
something that affects it, like anything that should trigger a GC cycle.

On the topic of triggering a GC cycle, this change also allows any
manually managed memory allocation from the page heap to trigger a GC.
So, specifically workbufs, unrolled GC scan programs, and goroutine
stacks. The reason for this is that now non-heap memory can affect the
trigger or the heap goal.

Most sources of non-heap memory only change slowly, like GC pointer
bitmaps, or change in response to explicit function calls like
GOMAXPROCS. Note also that unrolled GC scan programs and workbufs are
really only relevant during a GC cycle anyway, so they won't actually
ever trigger a GC. Our primary target here is goroutine stacks.

Goroutine stacks can increase quickly, and this is currently totally
independent of the GC cycle. Thus, if for example a goroutine begins to
recurse suddenly and deeply, then even though the heap goal and trigger
react, we might not notice until it's too late. As a result, we need to
trigger a GC cycle.

We do this trigger in allocManual instead of in stackalloc because it's
far more general. We ultimately care about memory that's mapped
read/write and not returned to the OS, which is much more the domain of
the page heap than the stack allocator. Furthermore, there may be new
sources of manual memory allocation in the future (e.g. arenas) that
need to trigger a GC if necessary. As such, I'm inclined to leave the
trigger in allocManual as an extra defensive measure.

It's worth noting that because goroutine stacks do not behave quite as
predictably as other non-heap memory, there is the potential for the
heap goal to swing wildly. Fortunately, goroutine stacks that haven't
been set up to shrink by the last GC cycle will not shrink until after
the next one. This reduces the amount of possible churn in the heap goal
because it means that shrinkage only happens once per goroutine, per GC
cycle. After all the goroutines that should shrink did, then goroutine
stacks will only grow. The shrink mechanism is analogous to sweeping,
which is incremental and thus tends toward a steady amount of heap
memory used. As a result, in practice, I expect this to be a non-issue.

Note that if the memory limit is not set, this change should be a no-op.

For #48409.

Change-Id: Ie06d10175e5e36f9fb6450e26ed8acd3d30c681c
Reviewed-on: https://go-review.googlesource.com/c/go/+/394221
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue May 3, 2022
This change does everything necessary to make the memory allocator and
the scavenger respect the memory limit. In particular, it:

- Adds a second goal for the background scavenge that's based on the
  memory limit, setting a target 5% below the limit to make sure it's
  working hard when the application is close to it.
- Makes span allocation assist the scavenger if the next allocation is
  about to put total memory use above the memory limit.
- Measures any scavenge assist time and adds it to GC assist time for
  the sake of GC CPU limiting, to avoid a death spiral as a result of
  scavenging too much.

All of these changes have a relatively small impact, but they are
intimately related and thus benefit from being done together.
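
A sketch of the arithmetic behind the first bullet above (the function and its name are illustrative, not the runtime's):

```go
package gcpacer

// scavengeGoal sketches the retained-memory target the background scavenger
// works toward when a memory limit is set: 5% below the limit, so that it is
// already working hard before the application actually reaches the limit.
func scavengeGoal(memoryLimit int64) int64 {
	return memoryLimit - memoryLimit/20
}
```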

For #48409.

Change-Id: I35517a752f74dd12a151dd620f102c77e095d3e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/397017
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue May 3, 2022
Currently the runtime's scavenging algorithm involves running from the
top of the heap address space to the bottom (or as far as it gets) once
per GC cycle. Once it treads some ground, it doesn't tread it again
until the next GC cycle.

This works just fine for the background scavenger, for heap-growth
scavenging, and for debug.FreeOSMemory. However, it breaks down in the
face of a memory limit for small heaps in the tens of MiB. Basically,
because the scavenger never retreads old ground, it's completely
oblivious to new memory it could scavenge, and that it really *should*
in the face of a memory limit.

Also, every time some thread goes to scavenge in the runtime, it
reserves what could be a considerable amount of address space, hiding it
from other scavengers.

This change modifies and simplifies the implementation overall. It's
less code with complexities that are much better encapsulated. The
current implementation iterates optimistically over the address space
looking for memory to scavenge, keeping track of what it last saw. The
new implementation does the same, but instead of directly iterating over
pages, it iterates over chunks. It maintains an index of chunks (as a
bitmap over the address space) that indicate which chunks may contain
scavenge work. The page allocator populates this index, while scavengers
consume it and iterate over it optimistically.

This has two key benefits:
1. Scavenging is much simpler: find a candidate chunk, and check it,
   essentially just using the scavengeOne fast path. There's no need for
   the complexity of iterating beyond one chunk, because the index is
   lock-free and already maintains that information.
2. If pages are freed to the page allocator (always guaranteed to be
   unscavenged), the page allocator immediately notifies all scavengers
   of the new source of work, avoiding the hiding issues of the old
   implementation.

One downside of the new implementation, however, is that it's
potentially more expensive to find pages to scavenge. In the past, if
a single page would become free high up in the address space, the
runtime's scavengers would ignore it. Now that scavengers won't, one or
more scavengers may need to iterate potentially across the whole heap to
find the next source of work. For the background scavenger, this just
means a potentially less reactive scavenger -- overall it should still
use the same amount of CPU. It means worse overheads for memory limit
scavenging, but that's not exactly something with a baseline yet.

In practice, this hopefully shouldn't be too bad, since the chunk index
is extremely compact. For a 48-bit address space, the index is only 8
MiB in size at worst, but even just one physical page in the index is
able to support up to 128 GiB heaps, provided they aren't terribly
sparse. On 32-bit platforms, the index is only 128 bytes in size.
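
A quick back-of-the-envelope check of those index sizes, assuming one index bit per chunk and a 4 MiB chunk granularity (the chunk size is inferred from the numbers above, not stated in the commit):

```go
package main

import "fmt"

func main() {
	const (
		chunkBytes  = 4 << 20 // assumed chunk granularity: 4 MiB
		addrSpace48 = int64(1) << 48
		pageBits    = 4096 * 8 // one 4 KiB physical page of index, in bits
	)

	// Worst case for a full 48-bit address space: one bit per chunk.
	fmt.Println(addrSpace48/chunkBytes/8>>20, "MiB of index") // prints 8

	// Heap coverage of a single physical page of the index.
	fmt.Println(int64(pageBits)*chunkBytes>>30, "GiB covered") // prints 128
}
```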

For #48409.

Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@mknyszek
Contributor Author

mknyszek commented May 3, 2022

The core feature has landed, but I still need to land a few new metrics to help support visibility into this.

gopherbot pushed a commit that referenced this issue May 13, 2022
This metric exports the last GC cycle index at which the GC limiter was
enabled. This metric is useful for debugging and identifying the root
cause of OOMs, especially when SetMemoryLimit is in use.

For #48409.

Change-Id: Ic6383b19e88058366a74f6ede1683b8ffb30a69c
Reviewed-on: https://go-review.googlesource.com/c/go/+/403614
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
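
For anyone who wants to watch this from application code, here is a small sketch of polling the metric via runtime/metrics. The name /gc/limiter/last-enabled:gc-cycle is what shipped in Go 1.19, but check the runtime/metrics catalog for your Go version:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// /gc/limiter/last-enabled:gc-cycle reports the last GC cycle during
	// which the GC CPU limiter was enabled; 0 means it never kicked in.
	samples := []metrics.Sample{{Name: "/gc/limiter/last-enabled:gc-cycle"}}
	metrics.Read(samples)

	if samples[0].Value.Kind() == metrics.KindUint64 {
		fmt.Println("GC limiter last enabled at cycle:", samples[0].Value.Uint64())
	}
}
```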
@gopherbot

Change https://go.dev/cl/406574 mentions this issue: runtime: reduce useless computation when memoryLimit is off

@gopherbot

Change https://go.dev/cl/406575 mentions this issue: runtime: update description of GODEBUG=scavtrace=1

gopherbot pushed a commit that referenced this issue May 20, 2022
For #48409.

Change-Id: I056afcdbc417ce633e48184e69336213750aae28
Reviewed-on: https://go-review.googlesource.com/c/go/+/406575
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
tjvc added a commit to tjvc/gauche that referenced this issue Jun 5, 2022
WIP implementation of a memory limit. This will likely be superseded
by Go's incoming soft memory limit feature (coming August?), but it's
interesting to explore nonetheless.

Each time we receive a PUT request, check the used memory. To calculate
used memory, we use runtime.ReadMemStats. I was concerned that it would
have a large performance cost, because it stops the world on every
invocation, but it turns out that it has previously been optimised.
Return a 500 if this value has exceeded the current max memory. We
use TotalAlloc to determine used memory, because this seemed to be
closest to the container memory usage reported by Docker. This is broken
regardless, because the value does not decrease as we delete keys
(possibly because the store map does not shrink).

If we can work out a constant overhead for the map data structure, we
might be able to compute memory usage based on the size of keys and
values. I think it will be difficult to do this reliably, though. Given
that a new language feature will likely remove the need for this work,
a simple interim solution might be to implement a max number of objects
limit, which provides some value in situations where the user can
predict the size of keys and values.

TODO:

* Make the memory limit configurable by way of an environment variable
* Push the limit checking code down to the put handler

golang/go#48409
golang/go@4a7cf96
patrickmn/go-cache#5
https://github.com/vitessio/vitess/blob/main/go/cache/lru_cache.go
golang/go#20135
https://redis.io/docs/getting-started/faq/#what-happens-if-redis-runs-out-of-memory
https://redis.io/docs/manual/eviction/
@gopherbot

Change https://go.dev/cl/410735 mentions this issue: doc/go1.19: adjust runtime release notes

@gopherbot

Change https://go.dev/cl/410734 mentions this issue: runtime: document GOMEMLIMIT in environment variables section

gopherbot pushed a commit that referenced this issue Jun 7, 2022
For #48409.

Change-Id: Ia6616a377bc4c871b7ffba6f5a59792a09b64809
Reviewed-on: https://go-review.googlesource.com/c/go/+/410734
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Chris Hines <chris.cs.guy@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
gopherbot pushed a commit that referenced this issue Jun 7, 2022
This addresses comments from CL 410356.

For #48409.
For #51400.

Change-Id: I03560e820a06c0745700ac997b02d13bc03adfc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/410735
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Chris Hines <chris.cs.guy@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
@rabbbit

rabbbit commented Jan 22, 2023

Hey @mknyszek - first of all, thanks for the excellent work; this is great.

I wanted to share our experience thinking about enabling this in production. It works great and exactly as advertised. Some well-maintained applications have enabled it with great success, and the usage is spreading organically.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get into a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

Another aspect is that our containers typically don't use close to all the available CPU time. So the assumption from the gc-guide, while true, has a slightly different result in practice:

The intuition behind the 50% GC CPU limit is based on the worst-case impact on a program with ample available memory. In the case of a misconfiguration of the memory limit, where it is set too low mistakenly, the program will slow down at most by 2x, because the GC can't take more than 50% of its CPU time away.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

What we're doing now is:

  • most high-core applications have an internal GC tuner by @cdvr1993 described here. This work predates your work but is stable.
  • some applications are opting in for enabling GOMEMLIMIT independently.
  • we'd like to enable the GC tuning on more applications - ideally with GOMEMLIMIT to reduce the amount of custom code. Since we're afraid of the death spirals, though, we've discussed building a "lightweight" version of our tuner that would watch runtime stats (perhaps runtime/metrics: add /gc/heap/live:bytes #56857) and dynamically limit the GC usage more aggressively, letting applications die faster.

Hope this is useful. Again, thanks for the excellent work.

@mknyszek
Contributor Author

Thanks for the detailed feedback and I'm glad it's working well for you overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get into a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447)

(I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.
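
As an illustration of that suggestion for a containerized service, here is a sketch that derives the limit from the container at startup and leaves GOGC at its default. The cgroup v2 path and the 95% factor are illustrative assumptions, not a recommendation for every setup:

```go
package main

import (
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// applyMemoryLimit sets a soft memory limit at 95% of the cgroup v2 memory
// limit, if one is configured, and otherwise leaves the runtime defaults
// alone. GOGC is intentionally left at its default.
func applyMemoryLimit() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return // not in a cgroup v2 container: leave defaults
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return // no container memory limit configured
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return
	}
	debug.SetMemoryLimit(limit - limit/20) // 95% of the container limit
}

func main() {
	applyMemoryLimit()
	// ... run the application ...
}
```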

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build since the wisdom in this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

What we're doing now is:

As a final note, I just want to point out that at the end of the day, the memory limit is just another tool in the toolkit. If you can make some of your applications work better without it, I don't think that necessarily means it's a failure of the memory limit (sometimes it might be, but not always). I'm not saying that you necessarily think the memory limit should be used everywhere, just wanted to leave that here for anyone who comes looking at this thread. :)

@cdvr1993

Hi @mknyszek

Regarding the 50% CPU limit... Unless we understand it incorrectly, it means the GC can use up to that much CPU to avoid going over the soft limit, but for many of our applications anything more than 20% GC CPU can have a serious impact (mostly when in a failover state). Currently, we dynamically change GOGC: when there is memory available we tend to increase it, and when there isn't we keep decreasing it to enforce our own soft limit, but we have a minimum threshold, and we allow different service owners to set their own minimum threshold. That's more or less what we are missing with Go's soft limit.
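
For readers curious what such a tuner looks like, here is a very rough sketch. The names, the polling interval, and the use of HeapAlloc as a live-heap proxy are illustrative assumptions, not the actual tuner described above:

```go
package gctuner

import (
	"runtime"
	"runtime/debug"
	"time"
)

// tuneGOGC periodically recomputes GOGC so that the next heap goal stays
// under softLimit, but never drops below minGOGC, letting service owners
// choose the point at which they'd rather OOM than burn more CPU on GC.
func tuneGOGC(softLimit int64, minGOGC int) {
	for range time.Tick(10 * time.Second) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		live := int64(ms.HeapAlloc)
		if live <= 0 {
			continue
		}
		// Pick GOGC such that live*(1+GOGC/100) is roughly softLimit.
		gogc := int((softLimit - live) * 100 / live)
		if gogc < minGOGC {
			gogc = minGOGC
		}
		debug.SetGCPercent(gogc)
	}
}
```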

We currently don't have an example using the soft limit, but in the past we have had issues with GOGC being too low, and this caused bigger problems than a few instances crashing due to OOM. So, based on that, we assume the scenario would repeat with the soft limit.

What would be nice is a way of modifying how much CPU the GC can take to enforce the soft limit, or a minimum GOGC value, so that service owners can decide at what point they believe it's better to OOM than to suffer the degradation caused by the elevated GC.

Or would you suggest it's better to wait for #56857, so we have a way to keep an eye on the live bytes and, when they get close to the soft limit, decide whether to eat the cost of GC or just OOM?

@rabbbit

rabbbit commented Jan 24, 2023

Thanks for the detailed feedback and I'm glad it's working well for you overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

Getting the traces to work in production would be hard. We have an HTTP handler to tune GOMEMLIMIT per container, so we can experiment with that with reasonable safety. There's no way to enable traces at runtime, right?

That being said I can perhaps try to reproduce the same situation in staging. What we have seen in production was a significant CPU time utilization increase, leading to CPU throttling, leading to both latency increase and throughput decrease.

Below is a screenshot of a "slowly leaking application" (explained more below) where we enabled GOMEMLIMIT temporarily. Note that the CPU utilization increased significantly more than we expected - more than 50% of GOMAXPROCS.

[screenshot: CPU utilization after temporarily enabling GOMEMLIMIT]

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get into a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447)
(I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

So I think you might be too optimistic vs what we see in our reality here (sorry:)). We:

  1. have applications that are leaking quickly; they restart often and need to be fixed. Those typically have higher priority, and can be diagnosed with some effort - I wouldn't actually call it pain though, profiles are typically helpful enough.
  2. "slowly leaking memory" applications that just very slowly accumulate memory as they run. These are actually low-priority - as long as the {release_frequency}>2-5*{time_to_oom}, fixing it will not get prioritized. Especially if some of the leaks are in gnarly bits like stat emission. This only becomes a problem during extended quiet periods - the expectation is still that the applications will crash rather than degrade.

In summary though, we strongly expect leaks to be around forever.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

Yeah, so we would need to continue running a custom tuner though, right? It also seems that if we're tuning in "user-space", equivalent results can be achieved with GOGC and GOMEMLIMIT - right?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.

This is slightly more nuanced (and perhaps off-topic): each of our containers runs with a "helper" process responsible for starting up, shipping logs, and performing local health checks (it's silly, don't ask). The memory we need to reserve for it varies per application - thus, for small containers, 95% might not be enough. For larger applications, we can increase the limit, but in both cases, we'd likely still need to look at the log output dynamically.

It is not immediately clear to me how to tune the right value of GOGC combined with GOMEMLIMIT. But, more importantly, my understanding of GOMEMLIMIT is that no matter the GOGC value we can still hit the death-spiral situation.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

Yeah, @cdvr1993 explained it in the previous comment too. Say a container has GOMAXPROCS=8 but was utilizing 3 cores at the time. Then we hit GOMEMLIMIT, and the GC is allowed (per our understanding) to use up to 4 cores, so the GC is now using more CPU than the application. At the same time, anything above 80% CPU utilization (in our experience) results in dramatically increased latency.

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

Perhaps we need a different name here then:) What we've observed might not be a death spiral, but a degradation large enough to severely disrupt production. Even with the 50% limit.

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

Yeah, the problem seems to occur for applications that are "mostly fine", with days between OOMs.

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build since the wisdom in this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

IMO it seems like what you built is "almost perfect". We just need the applications to "die faster" - the easiest changes that come to mind would be reducing the limit from 50%, to either something like 25% or a static value (2 cores?).

When I say "almost perfect" I mean it though - I suspect we could rollout the GOMEMLIMIT to 98% of our applications with great results and without a problem, but the remaining users would come after us with pitchforks. And that forces us to use the GOMEMLIMIT as an opt-in, which is very disappointing given the results we see in 98% of the applications.

Thanks for the thoughtful response!

@rabbbit

rabbbit commented Jan 27, 2023

Hey @mknyszek @cdvr1993 I raised a new issue in #58106.

@VEDANTDOKANIA

@rabbbit @mknyszek @rsc we are facing an issue regarding the memory limit. We are setting the memory limit to 18GB on a 24GB server, but the GC still runs very frequently and eats up 80 percent of CPU, while memory used is only 4 to 5 GB max. Also, is the memory limit per goroutine, or how do we set it for the whole program?

In the entry point of our application we have specified something like this :-

debug.SetMemoryLimit(int64(8* 1024 * 1024 * 1024))

Is this okay, or do we need to do something additional? Also, where do we set the optional unit described in the documentation?

@mknyszek
Contributor Author

@VEDANTDOKANIA Unfortunately I can't help with just the information you gave me.

Firstly, how are you determining that the GC runs very frequently, and that it uses 80 percent of CPU? That's far outside the bounds of what the GC should allow: there's an internal limiter at 50% of available CPU (as defined by GOMAXPROCS) that will prioritize using new memory over additional GC CPU usage beyond that point.

Please file a new issue with more details, ideally:

  • Platform
  • Go version
  • The environment you're running in (if in a container or cgroup, what is the container's CPU quota?)
  • The STDERR output of running your program with GODEBUG=gctrace=1.

Thanks.

Also, is the memory limit per goroutine, or how do we set it for the whole program?

It's for the whole Go process.

In the entry point of our application we have specified something like this :-

debug.SetMemoryLimit(int64(8* 1024 * 1024 * 1024))

That should work fine, but just so we're on the same page, that will set an 8 GiB memory limit. Note that the GC may execute very frequently (but again, still capped at roughly 50%) if this value is set smaller than the baseline memory use your program requires.

Also, where do we set the optional unit described in the documentation?

The optional unit is part of the GOMEMLIMIT environment variable that Go programs understand. e.g. GOMEMLIMIT=18GiB.

akshayjshah added a commit to connectrpc/connect-go that referenced this issue Jul 26, 2023
The more I look at it, the more convinced I am that this option is a bad
idea. It's very unclear what it's trying to accomplish, and there are
many better options:

* Limiting heap usage? Use the upcoming soft memory limit APIs
  (golang/go#48409).
* Limiting network I/O? Use `http.MaxBytesReader` and set a per-stream
  limit.
* Banning "large messages"? Be clear what you mean, and use
  `unsafe.SizeOf` or `proto.Size` in an interceptor.

Basically, the behavior here (and in grpc-go) is an incoherent middle
ground between Go runtime settings, HTTP-level settings, and a vague "no
large messages" policy.

I'm doubly sure we should delete this because we've decided not to
expose the metrics to track how close users are to the configured limit
:)