Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: pageAlloc.searchAddr may point to unmapped memory in discontiguous heaps, violating its invariant #40191

Closed
mknyszek opened this issue Jul 13, 2020 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Jul 13, 2020

It's been brought to my attention by the Fuchsia team that in cases where the Go heap is very discontiguous, the documented invariant for (*pageAlloc).searchAddr may be violated, leading to errant segfaults.

Consider the following case.

n                   = 0   1   2   ... i-1 i    i+1 ...
summary[0][n].max() = 0   0   0   ... 0   1    100 ...

Suppose searchAddr points into the region described by summary i-1, and on the last page allocation, it was exhausted. An allocation then comes in looking for, say, 12 consecutive pages. (*pageAlloc).find will look at summary i, and skip over it, but will set the "free window" to the first page of the region described by summary i, which need not be mapped.

In a contiguous heap we're unlikely to run into this case because the first page of i is likely to be mapped, so the summary would be there.

Most operating systems, even when not following the address hints provided to sysReserve in the runtime, tend to keep the heap contiguous. Fuchsia specifically goes out of its way to map memory discontinuously unless a flag (ZX_VM_COMPACT) is passed to the equivalent function.

This is a real problem, even for existing systems. Note that the above scenario need not happen for summary level 0, it can happen at any level (say there's a small hole in the address space due to a region being reserved by the OS or something and we allocate virtual address space around it).

Fixing this problem is not straight-forward without some kind of performance consequences (though it's unclear to me how big those consequences will be).

Fundamentally, this issue arises because we try to update searchAddr "on the way" to allocating memory which may not be the first free page. One simplifying solution is to simply not do that. It means that there will need to be more iteration when allocating, and we're less likely to take the fast path, but it will be correct.

Another option is to actually find the first piece of mapped memory in that region, which would be possible by iterating over (*pageAlloc).inUse (inUse will be large, but the search could be turned into a binary search as per the comment in addrRanges.findSucc). We could also find it by walking down the summaries, but this is unnecessary and is basically like doing the find operation twice, which is a bit wasteful.

A third option is to remove the invariant on searchAddr altogether and just be a lot more strict in all the places where we try to use searchAddr to access summaries directly (mostly on fast paths in the page allocator and scavenger). The trouble with this is that the only efficient way to verify that searchAddr is valid (short of walking the tree, which we're hoping to avoid in these cases!) is to check inUse, but that's not exactly efficient either. In this case, it would likely be better to just always move searchAddr to the right location on the slow path by checking inUse.

My vote is for option 2. It's not terribly complicated, and the change is local to just the find operation. Option 1 might be preferred anyway just because it simplifies find a bunch, though.

This issue should not be considered a blocking issue for Go 1.15 because the bug was technically introduced in Go 1.14, but it should be fixed and probably backported too, since it can cause failures with no workaround.

CC @aclements @prattmic

@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 13, 2020
@mknyszek mknyszek added this to the Go1.15 milestone Jul 13, 2020
@mknyszek mknyszek self-assigned this Jul 13, 2020
@mknyszek
Copy link
Contributor Author

@gopherbot Please open a backport issue to Go 1.14.

@gopherbot
Copy link

Backport issue(s) opened: #40192 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@randall77
Copy link
Contributor

but will set the "free window" to the first page of the region described by summary i, which need not be mapped.

Why then is there a free page indicated in unmapped memory? How is page i not mapped?
I would think all unmapped pages have either max==0 (not allocatable), or max==64 (allocatable, but not allocated yet).

@mknyszek
Copy link
Contributor Author

@randall77 The higher-level summaries describe large portions of the address space, some of which may not be mapped. IIRC each level-0 summary on linux/amd64 represents 128 GiB of address space. If your heap is small (like one arena in size) then one level-0 summary will indicate that pages are free in that 128 GiB region are free, but not all of that 128 GiB need be mapped. You're right that unmapped pages necessarily have max == 0 at the chunk level (since chunks are <= the arena size), but when describing larger regions there could be a mix of mapped and unmapped memory in there.

Also, d'oh, I forgot to CC you. Sorry about that.

@prattmic
Copy link
Member

As an addendum to option two: pageAlloc.find can do a lookup in mheap.arenas to determine if the candidate free window falls in an existing arena as a fast path. If not, it needs to fallback to searching inUse for the first usable page.

@gopherbot
Copy link

Change https://golang.org/cl/242680 mentions this issue: runtime: ensure pageAlloc.find returns a valid searchAddr

@gopherbot
Copy link

Change https://golang.org/cl/242678 mentions this issue: runtime: add tests for addrRanges.findSucc

@gopherbot
Copy link

Change https://golang.org/cl/242677 mentions this issue: runtime: define the AddrRange used for testing in terms of addrRange

@gopherbot
Copy link

Change https://golang.org/cl/242679 mentions this issue: runtime: implement addrRanges.findSucc with a binary search

@gopherbot
Copy link

Change https://golang.org/cl/246197 mentions this issue: [release-branch.go.1.14] runtime: validate candidate searchAddr in pageAlloc.find

gopherbot pushed a commit that referenced this issue Aug 22, 2020
…geAlloc.find

Currently pageAlloc.find attempts to find a better estimate for the
first free page in the heap, even if the space its looking for isn't
necessarily going to be the first free page in the heap (e.g. if npages
>= 2). However, in doing so it has the potential to return a searchAddr
candidate that doesn't actually correspond to mapped memory, but this
candidate might still be adopted. As a result, pageAlloc.alloc's fast
path may look at unmapped summary memory and segfault. This case is rare
on most operating systems since the heap is kept fairly contiguous, so
the chance that the candidate searchAddr discovered is unmapped is
fairly low. Even so, this is totally possible and outside the user's
control when it happens (in fact, it's likely to happen consistently for
a given user on a given system).

Fix this problem by ensuring that our candidate always points to mapped
memory. We do this by looking at mheap's arenas structure first. If it
turns out our candidate doesn't correspond to mapped memory, then we
look at inUse to round up the searchAddr to the next mapped address.

While we're here, clean up some documentation related to searchAddr.

For #40191.
Fixes #40192.

Change-Id: I759efec78987e4a8fde466ae45aabbaa3d9d4214
Reviewed-on: https://go-review.googlesource.com/c/go/+/242680
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
(cherry picked from commit b56791c)
Reviewed-on: https://go-review.googlesource.com/c/go/+/246197
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit that referenced this issue Oct 22, 2020
Currently the AddrRange used for testing is defined separately from
addrRange in the runtime, making it difficult to test it as well as
addrRanges. Redefine AddrRange in terms of addrRange instead.

For #40191.

Change-Id: I3aa5b8df3e4c9a3c494b46ab802dd574b2488141
Reviewed-on: https://go-review.googlesource.com/c/go/+/242677
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot pushed a commit that referenced this issue Oct 23, 2020
This change adds a test suite for addrRanges.findSucc so we can change
the implementation more safely.

For #40191.

Change-Id: I14a834b6d54836cbc676eb0edb292ba6176705cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/242678
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Oct 26, 2020
This change modifies addrRanges.findSucc to more efficiently find the
successor range in an addrRanges by using a binary search to narrow down
large addrRanges and iterate over no more than 8 addrRanges.

This change makes the runtime more robust against systems that may
aggressively randomize the address space mappings it gives the runtime
(e.g. Fuchsia).

For #40191.

Change-Id: If529df2abd2edb1b1496d8690ddd284ecd7138c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/242679
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@golang golang locked and limited conversation to collaborators Jul 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants