runtime/debug: soft memory limit #48409

mknyszek · 2021-09-15T21:51:16Z

Proposal: Soft memory limit

Author: Michael Knyszek

Summary

I propose a new option for tuning the behavior of the Go garbage collector by setting a soft memory limit on the total amount of memory that Go uses.

This option comes in two flavors: a new runtime/debug function called SetMemoryLimit and a GOMEMLIMIT environment variable. In sum, the runtime will try to maintain this memory limit by limiting the size of the heap, and by returning memory to the underlying platform more aggressively. This includes with a mechanism to help mitigate garbage collection death spirals. Finally, by setting GOGC=off, the Go runtime will always grow the heap to the full memory limit.

This new option gives applications better control over their resource economy. It empowers users to:

Better utilize the memory that they already have,
Confidently decrease their memory limits, knowing Go will respect them,
Avoid unsupported forms of garbage collection tuning.

Details

Full design document found here.

Note that, for the time being, this proposal intends to supersede #44309. Frankly, I haven't been able to find a significant use-case for it, as opposed to a soft memory limit overall. If you believe you have a real-world use-case for a memory target where a memory limit with GOGC=off would not solve the same problem, please do not hesitate to post on that issue, contact me on the gophers slack, or via email at mknyszek@golang.org. Please include as much detail as you can.

The text was updated successfully, but these errors were encountered:

gopherbot · 2021-09-15T22:10:01Z

Change https://golang.org/cl/350116 mentions this issue: design: add proposal for a soft memory limit

For golang/go#48409. Change-Id: I4e5d6d117982f51108dca83a8e59b118c2b6f4bf Reviewed-on: https://go-review.googlesource.com/c/proposal/+/350116 Reviewed-by: Michael Pratt <mpratt@google.com>

mpx · 2021-09-21T15:27:45Z

Afaict, the impact of memory limit is visible once the GC is CPU throttled, but not before. Would it be worth exposing the current effective GOGC as well?

mknyszek · 2021-09-21T15:58:56Z

@mpx I think that's an interesting idea. If GOGC is not off, then you have a very clear sign of throttling in telemetry. However, if GOGC=off I think it's harder to tell, and it gets blurry once the runtime starts bumping up against the GC CPU utilization limit, i.e. what does effective GOGC mean when the runtime is letting itself exceed the heap goal?

I think that's pretty close. Ideally we would have just one metric that could show, at-a-glance, "are you in the red, and if so, how far?"

raulk · 2021-09-27T16:55:19Z

In case you find this useful as a reference (and possibly to include in "prior art"), the go-watchdog library schedules GC according to a user-defined policy. It can infer limits from the environment/host, container, and it can target a maximum heap size defined by the user. I built this library to deal with #42805, and ever since we integrated it into https://github.com/filecoin-project/lotus, we haven't had a single OOM reported.

rsc · 2021-10-06T22:07:39Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2021-10-13T17:11:07Z

@mknyszek what is the status of this?

mknyszek · 2021-10-13T18:20:47Z

@rsc I believe the design is complete. I've received feedback on the design, iterated on it, and I've arrived at a point where there aren't any major remaining comments that need to be addressed. I think the big question at the center of this proposal is whether the API benefit is worth the cost. The implementation can change and improve over time; most of the details are internal.

Personally, I think the answer is yes. I've found that mechanisms that respects users' memory limits and that give the GC the flexibility to use more of the available memory are quite popular. Where Go users implement this themselves, they're left working with tools (like runtime.GC/debug.FreeOSMemory and heap ballasts) that have some significant pitfalls. The proposal also takes steps to mitigate the most significant costs of having a new GC tuning knob.

In terms of implementation, I have some of the foundational bits up for review now that I wish to land in 1.18 (I think they're uncontroversial improvements, mostly related to the scavenger). My next step is create a complete implementation and trial it on real workloads. I suspect that a complete implementation won't land in 1.18 at this point, which is fine. It'll give me time to work out any unexpected issues with the design in practice.

rsc · 2021-10-20T17:29:17Z

Thanks for the summary. Overall the reaction here seems overwhelmingly positive.

Does anyone object to doing this?

kent-h · 2021-10-26T15:45:00Z

I have some of the foundational bits up for review now that I wish to land in 1.18

I suspect that a complete implementation won't land in 1.18

@mknyszek I'm somewhat confused by this. At a high level, what are you hoping to include in 1.18, and what do you expect to come later?
(Specifically: will we have extra knobs in 1.18, or will these changes be entirely internal?)

mknyszek · 2021-10-26T16:00:04Z

@kent-h The proposal has not been accepted, so the API will definitely not land in 1.18. All that I'm planning to land is work on the scavenger, to make it scale a bit better. This is useful in its own right, and it happens that the implementation of SetMemoryLimit as described in the proposal depends on it. There won't be any internal functionality pertaining to SetMemoryLimit in the tree in Go 1.18.

rsc · 2021-10-27T18:01:49Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc · 2021-11-03T18:05:24Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

gopherbot · 2022-05-16T14:30:19Z

Change https://go.dev/cl/406574 mentions this issue: runtime: reduce useless computation when memoryLimit is off

gopherbot · 2022-05-16T15:19:46Z

Change https://go.dev/cl/406575 mentions this issue: runtime: update description of GODEBUG=scavtrace=1

For #48409. Change-Id: I056afcdbc417ce633e48184e69336213750aae28 Reviewed-on: https://go-review.googlesource.com/c/go/+/406575 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

WIP implementation of a memory limit. This will likely be superseded by Go's incoming soft memory limit feature (coming August?), but it's interesting to explore nonetheless. Each time we receive a PUT request, check the used memory. To calculate used memory, we use runtime.ReadMemStats. I was concerned that it would have a large performance cost, because it stops the world on every invocation, but it turns out that it has previously been optimised. Return a 500 if this value has exceeded the current max memory. We use TotalAlloc do determine used memory, because this seemed to be closest to the container memory usage reported by Docker. This is broken regardless, because the value does not decrease as we delete keys (possibly because the store map does not shrink). If we can work out a constant overhead for the map data structure, we might be able to compute memory usage based on the size of keys and values. I think it will be difficult to do this reliably, though. Given that a new language feature will likely remove the need for this work, a simple interim solution might be to implement a max number of objects limit, which provides some value in situations where the user can predict the size of keys and values. TODO: * Make the memory limit configurable by way of an environment variable * Push the limit checking code down to the put handler golang/go#48409 golang/go@4a7cf96 patrickmn/go-cache#5 https://github.com/vitessio/vitess/blob/main/go/cache/lru_cache.go golang/go#20135 https://redis.io/docs/getting-started/faq/#what-happens-if-redis-runs-out-of-memory https://redis.io/docs/manual/eviction/

gopherbot · 2022-06-06T21:34:31Z

Change https://go.dev/cl/410735 mentions this issue: doc/go1.19: adjust runtime release notes

gopherbot · 2022-06-06T21:34:32Z

Change https://go.dev/cl/410734 mentions this issue: runtime: document GOMEMLIMIT in environment variables section

For #48409. Change-Id: Ia6616a377bc4c871b7ffba6f5a59792a09b64809 Reviewed-on: https://go-review.googlesource.com/c/go/+/410734 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Chris Hines <chris.cs.guy@gmail.com> Reviewed-by: Russ Cox <rsc@golang.org>

This addresses comments from CL 410356. For #48409. For #51400. Change-Id: I03560e820a06c0745700ac997b02d13bc03adfc6 Reviewed-on: https://go-review.googlesource.com/c/go/+/410735 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Chris Hines <chris.cs.guy@gmail.com> Reviewed-by: Russ Cox <rsc@golang.org>

rabbbit · 2023-01-22T08:53:30Z

Hey @mknyszek - first of all, thanks for the excellent work; this is great.

I wanted to share our experience thinking about enabling this in production. It works great and exactly as advertised. Some well-maintained applications have enabled it with great success, and the usage is spreading organically.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications prefer to OOM, die quickly and be restarted than to struggle for a long time. The number of applications makes avoiding leaks unfeasible.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

Another aspect is that our containers typically don't use close to all the available CPU time. So the assumption from the gc-guide, while true, has a slightly different result in practice:

The intuition behind the 50% GC CPU limit is based on the worst-case impact on a program with ample available memory. In the case of a misconfiguration of the memory limit, where it is set too low mistakenly, the program will slow down at most by 2x, because the GC can't take more than 50% of its CPU time away.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This is "GC degradation" would be hard to explain/sell to application owners.

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

What we're doing now is:

most high-core applications have an internal GC tuner by @cdvr1993 described here. This work predates your work but is stable.
some applications are opting in for enabling GOMEMLIMIT independently.
we'd like to enable the GC tuning on more applications - ideally with GOMEMLIMIT to reduce the amount of custom code. Since we're afraid of the death spirals, though, we've discussed building a "lightweight" version of our tuner that would watch runtime stats (perhaps runtime/metrics: add /gc/heap/live:bytes #56857) that would dynamically limit the GC usage more aggressively and let applications die faster.

Hope this is useful. Again, thanks for the excellent work.

mknyszek · 2023-01-23T17:15:29Z

Thanks for the detailed feedback and I'm glad it's working well for your overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications prefer to OOM, die quickly and be restarted than to struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447)

(I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This is "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build since this wisdom of this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

What we're doing now is:

As a final note, I just want to point out that at the end of the day, the memory limit is just another tool in the toolkit. If you can make some of your applications work better without it, I don't think that necessarily means it's a failure of the memory limit (sometimes it might be, but not always). I'm not saying that you necessarily think the memory limit should be used everywhere, just wanted to leave that here for anyone who comes looking at this thread. :)

cdvr1993 · 2023-01-23T20:23:25Z

Hi @mknyszek

Regarding the 50% cpu limit... Unless we understand incorrectly it means it can use up to that CPU to avoid going over the soft limit, but for many of our applications anything more than 20% GC CPU can have a serious impact (mostly when on failover state). Currently, we dynamically change GOGC when there is memory available we tend to increase it, when there isn't we just keep decreasing it to ensure our own soft limit, but we have a minimum threshold and we allow different service owners to set their own minimum threshold. That's more or less what we are missing with Go soft limit.

We currently don't have an example using soft limit, but in the past we have had issues with GOGC being too low and this caused bigger problems than a few instances crashing due to OOM. So, based on that assumption we think the scenario would repeat with soft limit.

What would be nice is a way of modifying how much CPU the GC can take to ensure the soft limit? Or a minimum GOGC value so that service owners decide at what point they believe is better to OOM than the degradation caused to the elevated GC.

Or would you suggest is better to wait for #56857 to have a way to keep an eye on the size of live bytes, so that when it gets close to the soft limit make a decision of either eat the cost of GC or just OOM?

rabbbit · 2023-01-24T07:29:10Z

Thanks for the detailed feedback and I'm glad it's working well for your overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

Getting the traces to work in production would be hard. We have an HTTP handler to tune GOMEMLIMIT per container, so we can experiment with that with reasonable safety. There's no way to runtime way to enable traces, right?

That being said I can perhaps try to reproduce the same situation in staging. What we have seen in production was a significant CPU time utilization increase, leading to CPU throttling, leading to both latency increase and throughput decrease.

Below is screenshot of a "slowly leaking application" (more explained below) where we enabled GOMEMLIMIT temporarily. Note the CPU utilization increased significantly more than we expected - more than 50% of GOMAXPROCS.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications prefer to OOM, die quickly and be restarted than to struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447)
(I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

So I think you might be too optimistic vs what we see in our reality here (sorry:)). We:

have applications that are leaking quick, they restart often, they need to be fixed. Those typically have higher priority, and can be diagnosed with some effort - I wouldn't actually call it pain though, profiles are typically helpful enough.
"slowly leaking memory" applications that just very slowly accumulate memory as they run. These are actually low-priority - as long as the {release_frequency}>2-5*{time_to_oom}, fixing it will not get prioritized. Especially if some of the leaks are in gnarly bits like stat emission. This only becomes a problem during extended quiet periods - the expectation is still that the applications will crash rather than degrade.

In summary though, we strongly expect leaks to be around forever.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

Yeah, so we would need to continue running a custom tuner though, right? It also seems if we're tuning in "user-space", equivalent results can be achieved with GOGC and GOMEMLIMIT - right?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.

This is slightly more nuanced, (and perhaps offtopic) each of our containers runs with a "helper" process responsible for starting up and shipping logs and performing local health checks (it's silly. don't ask). The memory we need to reserve for it varies per application - thus, for small containers, 95% might not be enough. For larger applications, we can increase the limit, but for both cases, we'd likely still need to look at the log output dynamically.

It is not immediately clear to me how to tune the right value of GOGC combined with GOMEMLIMIT. But, more importantly, my understanding of GOMEMLIMIT is that no matter the GOGC value we can still hit the death-spiral situation.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This is "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

Yeah, @cdvr1993 explained it in the previous comment too. If container has GOMAXPROCS=8, but utilized 3 at that time. Then we hit GOMEMLIMIT, and GC is allowed to (per our understanding) to use up to 4 cores, so GC is now using more CPU than the application. At the same time, anything above 80% CPU utilization (in our experience) results in dramatically increased latency.

We're also concerned with a "degradation on failover" situation - an application that might be okay usually, in case of a sudden increase in traffic, might end up in a death spiral. And this would be precisely the time we need to avoid those.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

Perhaps we need a different name here then:) What we've observed might not be a death spiral, but a degradation large enough to severely disrupt production. Even with the 50% limit.

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

Yeah, the problem seems to occur for applications that are "mostly fine", with days between OOMs.

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build since this wisdom of this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

IMO it seems like what you built is "almost perfect". We just need the applications to "die faster" - the easiest changes that come to mind would be reducing the limit from 50%, to either something like 25% or a static value (2 cores?).

When I say "almost perfect" I mean it though - I suspect we could rollout the GOMEMLIMIT to 98% of our applications with great results and without a problem, but the remaining users would come after us with pitchforks. And that forces us to use the GOMEMLIMIT as an opt-in, which is very disappointing given the results we see in 98% of the applications.

Thanks for the thoughtful response!

rabbbit · 2023-01-27T01:12:29Z

Hey @mknyszek @cdvr1993 I raised a new issue in #58106.

VEDANTDOKANIA · 2023-07-19T10:07:15Z

@rabbbit @mknyszek @rsc we are facing one issue regarding the memory limit . We are setting the memory limit to 18GB in 24GB server but still GC runs very frequently and eats up 80 percent of CPU and memory used is only 4 to 5 GB max . Also memory limit is goroutine wise ? or how to set the same for whole program.

In the entry point of our application we have specified something like this :-

debug.SetMemoryLimit(int64(8* 1024 * 1024 * 1024))

Is this okay or we need to do something additional. Also where to set the optional unit as described in the documentation

mknyszek · 2023-07-19T14:42:26Z

@VEDANTDOKANIA Unfortunately I can't help with just the information you gave me.

Firstly, how are you determining that the GC runs very frequently, and that it uses 80 percent of CPU? That's far outside of the bounds of what the GC should allow: there's an internal limiter to 50% of available CPU (as defined by GOMAXPROCS) that will prioritize using new memory over additional CPU usage beyond that point.

Please file a new issue with more details, ideally:

Platform
Go version
The environment you're running in (if in a container or cgroup, what is the container's CPU quota?)
The STDERR output of running your program with GODEBUG=gctrace=1.

Thanks.

Also memory limit is goroutine wise ? or how to set the same for whole program.

It's for the whole Go process.

In the entry point of our application we have specified something like this :-

debug.SetMemoryLimit(int64(8* 1024 * 1024 * 1024))

That should work fine, but just so we're on the same page, that will set an 8 GiB memory limit. Note that the GC may execute very frequently (but again, still capped at roughly 50%) if this value is set smaller than the baseline memory use your program requires.

Also where to set the optional unit as described in the documentation

The optional unit is part of the GOMEMLIMIT environment variable that Go programs understand. e.g. GOMEMLIMIT=18GiB.

The more I look at it, the more convinced I am that this option is a bad idea. It's very unclear what it's trying to accomplish, and there are many better options: * Limiting heap usage? Use the upcoming soft memory limit APIs (golang/go#48409). * Limiting network I/O? Use `http.MaxBytesReader` and set a per-stream limit. * Banning "large messages"? Be clear what you mean, and use `unsafe.SizeOf` or `proto.Size` in an interceptor. Basically, the behavior here (and in grpc-go) is an incoherent middle ground between Go runtime settings, HTTP-level settings, and a vague "no large messages" policy. I'm doubly sure we should delete this because we've decided not to expose the metrics to track how close users are to the configured limit :)

mknyszek added this to the Go1.18 milestone Sep 15, 2021

gopherbot added the Proposal label Sep 15, 2021

mknyszek added the GarbageCollector label Sep 15, 2021

mknyszek modified the milestones: Go1.18, Proposal Sep 22, 2021

seankhliao mentioned this issue Oct 25, 2021

proposal: runtime: add way for applications to respond to GC backpressure #29696

Open

rsc added the Proposal-FinalCommentPeriod label Oct 27, 2021

rsc changed the title ~~proposal: runtime/debug: soft memory limit~~ runtime/debug: soft memory limit Nov 3, 2021

rsc added the Proposal-Accepted label Nov 3, 2021

rsc modified the milestones: Proposal, Backlog Nov 3, 2021

mknyszek mentioned this issue Nov 10, 2021

proposal: runtime/debug: user-configurable memory target #44309

Closed

guscarreon mentioned this issue Nov 11, 2021

Add a garbage collection threshold prebid/prebid-server#2081

Merged

syumai mentioned this issue Nov 25, 2021

20211125 Gophers Code Reading Party basebank/gophers-code-reading-party#16

Closed

mknyszek mentioned this issue Jan 13, 2022

runtime: frequent ReadMemStats will cause heap_live to exceed next_gc #50592

Open

aclements added this to Go Compiler / Runtime Mar 1, 2022

aclements moved this to In Progress in Go Compiler / Runtime Mar 1, 2022

mknyszek mentioned this issue May 18, 2022

image/gif: TestDecodeMemoryConsumption flakes #35166

Closed

cherrymui mentioned this issue Jun 9, 2022

api: audit for Go 1.19 #53310

Closed

mknyszek mentioned this issue Jun 13, 2022

x/website: add a Go GC guide #53360

Closed

rsc unassigned mknyszek Jun 23, 2022

MarcoPolo mentioned this issue Jun 23, 2022

rcmgr: consider removing memory tracking, except for muxers libp2p/go-libp2p#1708

Open

rsc moved this to Accepted in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

rabbbit mentioned this issue Jan 27, 2023

runtime/debug: GOMEMLIMIT prolonged high GC CPU utilization before container OOM #58106

Open

mknyszek removed this from Go Compiler / Runtime Feb 15, 2023

rsc removed this from Proposals May 3, 2023

tiancaiamao mentioned this issue Dec 15, 2023

sessionctx: change @@tidb_enable_gogc_tuner default value to 0 pingcap/tidb#49420

Open

13 tasks

golang locked and limited conversation to collaborators Jul 18, 2024

gopherbot added the FrozenDueToAge label Jul 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime/debug: soft memory limit #48409

runtime/debug: soft memory limit #48409

mknyszek commented Sep 15, 2021 •

edited

Loading

gopherbot commented Sep 15, 2021

mpx commented Sep 21, 2021

mknyszek commented Sep 21, 2021

raulk commented Sep 27, 2021 •

edited

Loading

rsc commented Oct 6, 2021

rsc commented Oct 13, 2021

mknyszek commented Oct 13, 2021

rsc commented Oct 20, 2021

kent-h commented Oct 26, 2021

mknyszek commented Oct 26, 2021

rsc commented Oct 27, 2021

rsc commented Nov 3, 2021

gopherbot commented May 16, 2022

gopherbot commented May 16, 2022

gopherbot commented Jun 6, 2022

gopherbot commented Jun 6, 2022

rabbbit commented Jan 22, 2023

mknyszek commented Jan 23, 2023

cdvr1993 commented Jan 23, 2023

rabbbit commented Jan 24, 2023

rabbbit commented Jan 27, 2023

VEDANTDOKANIA commented Jul 19, 2023

mknyszek commented Jul 19, 2023

runtime/debug: soft memory limit #48409

runtime/debug: soft memory limit #48409

Comments

mknyszek commented Sep 15, 2021 • edited Loading

Proposal: Soft memory limit

Summary

Details

gopherbot commented Sep 15, 2021

mpx commented Sep 21, 2021

mknyszek commented Sep 21, 2021

raulk commented Sep 27, 2021 • edited Loading

rsc commented Oct 6, 2021

rsc commented Oct 13, 2021

mknyszek commented Oct 13, 2021

rsc commented Oct 20, 2021

kent-h commented Oct 26, 2021

mknyszek commented Oct 26, 2021

rsc commented Oct 27, 2021

rsc commented Nov 3, 2021

gopherbot commented May 16, 2022

gopherbot commented May 16, 2022

gopherbot commented Jun 6, 2022

gopherbot commented Jun 6, 2022

rabbbit commented Jan 22, 2023

mknyszek commented Jan 23, 2023

cdvr1993 commented Jan 23, 2023

rabbbit commented Jan 24, 2023

rabbbit commented Jan 27, 2023

VEDANTDOKANIA commented Jul 19, 2023

mknyszek commented Jul 19, 2023

mknyszek commented Sep 15, 2021 •

edited

Loading

raulk commented Sep 27, 2021 •

edited

Loading