Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: aix-ppc64 builder is failing with "bad prune", "addr range base and limit are not in the same memory segment" fatal errors #38966

Closed
dmitshur opened this issue May 8, 2020 · 11 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-AIX
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented May 8, 2020

The aix-ppc64 builder has started to fail on build.golang.org continuously since b1a48af or 55ec518. The errors look like:

fatal error: bad prune
fatal error: addr range base and limit are not in the same memory segment

It's unclear to me if one of the CLs has started to cause this, or if the builder has a problem. Needs investigation.

Edit: @mknyszek has narrowed it down, see comment below.

Previously, a flaky failure in net/http made it harder to tell where the problem was:

--- FAIL: TestServerSetKeepAlivesEnabledClosesConns (2.08s)
    serve_test.go:5513: idle count after SetKeepAlivesEnabled called = 1; want 0
FAIL
FAIL	net/http	8.191s

/cc @trex58 @mknyszek @mvdan

@dmitshur dmitshur added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-AIX labels May 8, 2020
@dmitshur dmitshur added this to the Go1.15 milestone May 8, 2020
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label May 8, 2020
@mknyszek
Copy link
Contributor

mknyszek commented May 8, 2020

Ah, the bad prune and base and limit not in the same memory segment errors are almost certainly my CL(s) that I landed recently. We do something a little funky with arenaBaseOffset on AIX.

CC @Helflym

@dmitshur dmitshur changed the title x/build: aix-ppc64 builder is failing runtime: aix-ppc64 builder is failing with "bad prune", "addr range base and limit are not in the same memory segment" fatal errors May 8, 2020
@dmitshur dmitshur removed the Builders x/build issues (builders, bots, dashboards) label May 8, 2020
@Helflym
Copy link
Contributor

Helflym commented May 11, 2020

I think the problem comes from the fact that arenaBaseOffset is not the actual base offset 0x0a00_0000_0000_0000 but it's opposite 0xf600_0000_0000_0000. Thus, minOffAddr is not -arenaBaseOffset but arenaBaseOffset directly and maxOffAddr something weird equals to 0x09ff_ffff_ffff_ffff.
However, I'm not sure to understand what minOffAddr and maxOffAddr are supposed to represent. Is it supposed to be the first and the last possible address returned by mmap ? In this case, maxOffAddrwon't be ^uintptr(0) - arenaBaseOffset anyway, as the range for mmap on AIX is limited, from 0x0a00_0000_0000_0000 to 0x0aff_ffff_ffff_ffff, and not 0xffff_ffff_ffff_ffff.

Note that I don't really understand how the check is working inside makeAddrRange too. I'm missing something on it or what do you mean by segments ?

@mknyszek
Copy link
Contributor

@Helflym x86 has a split/discontiguous address space, and the purpose of arenaBaseOffset is to make that look contiguous when we want to index into arrays over the address space. It doesn't matter on most platforms since all memory returned by the OS is in one segment anyway, but it does on Solaris, for example, which hands out memory in different segments.

minOffAddr and maxOffAddr are the minimum and maximum addresses in the "offset" address space. Normally for a 64-bit address space it would be 0 and ^uintptr(0), but given the trick we did on AIX we should probably make that 0x0a00000000000000 and 0x0affffffffffffff like you suggest.

@Helflym
Copy link
Contributor

Helflym commented May 11, 2020

I confirm, setting these values for minOffAddr and maxOffAddr is working.
I think minOffAddr can stay as is, the value is already correct. But what do we do for maxOffAddr ? Do we hard code it to 0x0affffffffffffff or do you have a better solution using arenaBaseOffset ?

@mknyszek
Copy link
Contributor

I think the correct value for maxOffAddr is derived from the address uintptr((1<<heapAddrBits)-1) - arenaBaseOffset instead of ^uintptr(0) - arenaBaseOffset. This will give us the correct value on AIX and I believe on every amd64 platform as well (the sign-extended segment ends up at the bottom in the offset address space, so it's fine to just do (1<<heapAddrBits)-1. I'm not totally convinced yet that it's correct for all possible values of arenaBaseOffset, though I'm not sure that matters.

CC @aclements @prattmic for additional opinions (optional).

@prattmic
Copy link
Member

I think all of this is quite confusing, and the documentation doesn't help:

"arenaBaseOffset is the pointer value that corresponds to index 0 in the heap arena map." - https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=290

Unless I am misreading that sentence, it is asserting that arenaIndex(arenaBaseOffset) == 0, which is not true:

x86, Linux: arenaIndex(arenaBaseOffset) -> 0x400000 (if I followed the heapArenaBytes math properly, but it is non-zero regardless)

AIX: arenaIndex(arenaBaseOffset) -> 0xfffffffb00000000 (that's fun!)

I believe what it should say is:

"arenaBaseOffset is the offset such that arena index zero corresponds to the minimum valid virtual address. i.e., arenaBase(0) == minimum valid virtual address"

This property actually holds true:

x86, Linux: arenaBase(0) -> 0xffff800000000000

AIX: arenaBase(0) -> 0x0a00000000000000

From arenaBase, it follows that minAddr = 0 - arenaBaseOffset.

With this clarified, I think it makes much more sense to define minOffAddr and maxOffAddr explicitly:

x86: minOffAddr = 0xffff800000000000, maxOffAddr = 0x7fffffffffff

AIX: minOffAddr = 0x0a00000000000000, maxOffAddr = 0x0affffffffffffff

And then have arenaBaseOffset = -minOffAddr.

(As an aside, I think the most recent docs on offAddr are a bit confusing because they say that is "is" an offset address. Except it's just stored as a normal address. It is the methods (less, greater) that treat ordering as though it is offset.)

@mknyszek
Copy link
Contributor

I think all of this is quite confusing, and the documentation doesn't help:

"arenaBaseOffset is the pointer value that corresponds to index 0 in the heap arena map." - https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=290

Unless I am misreading that sentence, it is asserting that arenaIndex(arenaBaseOffset) == 0, which is not true:

x86, Linux: arenaIndex(arenaBaseOffset) -> 0x400000 (if I followed the heapArenaBytes math properly, but it is non-zero regardless)

Agreed, that documentation is wrong. This should all be updated.

AIX: arenaIndex(arenaBaseOffset) -> 0xfffffffb00000000 (that's fun!)

Admittedly, this part was kind of a hack... :( Ideally we'd like to just treat AIX's address space as a 60-bit address space, but that would require making fairly large mappings for both the page allocator and the arena index. Unfortunately, the application slows to a crawl when a large mapping is made (we're not exactly sure what goes wrong).

I believe what it should say is:

"arenaBaseOffset is the offset such that arena index zero corresponds to the minimum valid virtual address. i.e., arenaBase(0) == minimum valid virtual address"

This property actually holds true:

x86, Linux: arenaBase(0) -> 0xffff800000000000

AIX: arenaBase(0) -> 0x0a00000000000000

From arenaBase, it follows that minAddr = 0 - arenaBaseOffset.

With this clarified, I think it makes much more sense to define minOffAddr and maxOffAddr explicitly:

x86: minOffAddr = 0xffff800000000000, maxOffAddr = 0x7fffffffffff

AIX: minOffAddr = 0x0a00000000000000, maxOffAddr = 0x0affffffffffffff

That all makes sense to me.

And then have arenaBaseOffset = -minOffAddr.

(As an aside, I think the most recent docs on offAddr are a bit confusing because they say that is "is" an offset address. Except it's just stored as a normal address. It is the methods (less, greater) that treat ordering as though it is offset.)

Also agreed. It's technically true if you don't construct an offAddr value via offAddr{} (how it's represented should be of no interest to the user), but it's hard prevent that in the runtime where everything is in one package and the abstraction is a bit weak/leaky.

I'll put up a CL that fixes this issue and does some additional cleanup in the documentation.

@gopherbot
Copy link

Change https://golang.org/cl/233497 mentions this issue: runtime: make maxOffAddr reflect the actual address space upper bound

@mknyszek
Copy link
Contributor

And then have arenaBaseOffset = -minOffAddr.

@prattmic Actually, this isn't directly feasible. arenaBaseOffset is used in a critical path and this would make it non-constant. I think at the end of the day, this still needs to be defined in terms of arenaBaseOffset because that's what's actually being used in the operation.

Also taking a step back: while I agree the definition of minOffAddr and maxOffAddr isn't terribly clear, I'm not sure we get much clarity by writing out the addresses, outside of AIX. The sys.GoosAix trick also hinders readability. For now, I'm going back to my original plan of just using (1<<heapAddrBits)-1 instead of ^uintptr(0).

I've uploaded the CL, let me know what you think!

pull bot pushed a commit to kokizzu/go that referenced this issue May 14, 2020
Currently maxOffAddr is defined in terms of the whole 64-bit address
space, assuming that it's all supported, by using ^uintptr(0) as the
maximal address in the offset space. In reality, the maximal address in
the offset space is (1<<heapAddrBits)-1 because we don't have more than
that actually available to us on a given platform.

On most platforms this is fine, because arenaBaseOffset is just
connecting two segments of address space, but on AIX we use it as an
actual offset for the starting address of the available address space,
which is limited. This means using ^uintptr(0) as the maximal address in
the offset address space causes wrap-around, especially when we just
want to represent a range approximately like [addr, infinity), which
today we do by using maxOffAddr.

To fix this, we define maxOffAddr more appropriately, in terms of
(1<<heapAddrBits)-1.

This change also redefines arenaBaseOffset to not be the negation of the
virtual address corresponding to address zero in the virtual address
space, but instead directly as the virtual address corresponding to
zero. This matches the existing documentation more closely and makes the
logic around arenaBaseOffset decidedly simpler, especially when trying
to reason about its use on AIX.

Fixes golang#38966.

Change-Id: I1336e5036a39de846f64cc2d253e8536dee57611
Reviewed-on: https://go-review.googlesource.com/c/go/+/233497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@dmitshur
Copy link
Contributor Author

dmitshur commented Jun 24, 2020

I'm puzzled about what happened in this issue. It's still open, yet I see CL 233497 was merged on May 14th into the main Go repo with the text "Fixes #38966", which is this very issue. Yet it never got closed?

(That CL caused #39128, which was fixed via roll-forward.)

Maybe GitHub had an service disruption that caused the "Fixes #38966" event get skipped somehow.

@mknyszek Can you confirm this issue should be closed now, or am I missing something? Thanks.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 24, 2020
@mknyszek
Copy link
Contributor

@dmitshur Yeah this should be closed, I think. It should've been closed with that CL, not sure what happened. Closing now...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-AIX
Projects
None yet
Development

No branches or pull requests

5 participants