New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync/atomic: add OR/AND operators for unsigned types #61395
Comments
setPresent should not be calling runtime.Gosched. There's nothing another goroutine is going to do to help move things along (unlike, say, waiting for a mutex to be unlocked). If you remove the Gosched, it should be inlinable. At least, -m says this is inlinable:
I wrote this in runtime/atomic_test.go:
The result is:
So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently. |
I was wondering about
A 2x win may still be significant, but I'll know more when I add a better benchmarking setup. But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM, quoting:
I think the "atomically" may be referring to the entire operation. To be honest the mnemonic doesn't make it sound like this is a bitwise or. Perhaps an ARM expert could chime in. |
Thanks for the link to LDSETAL. That looks like it might be good enough, although we would have to think carefully about whether the acquire-release semantics it guarantees matches the sequentially consistent semantics we want for sync/atomic when limited to the OR operation. Perhaps it does but perhaps not. |
That's correct. ARMV8.1 added instructions for atomic And (also, Min/Max, and, for some reason, Xor). LDCLR, LDCLRA, LDCLRAL, LDCLRL is the equivalent for atomic And. |
I'll note that there was also no contention in @rsc's benchmark. Typically, CAS loops collapse far worse under contention than direct atomic operations. |
This proposal has been added to the active column of the proposals project |
This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it, similar to how "intrinsics" for encoding/binary work. That way old code using the thing that works now benefits as well. New API would still be convenient, but I think atomic flag sets that want it are rare, and if it's intrinsified then it can live in any package. I am given to understand this would be harder than existing rewrite rules, though, because it spans more than a single basic block. It may also be hard to formulate proofs that the rewrite preserves semantics. |
@randall77 actually prototyped exactly this. I'm not necessarily opposed to having these rewrite rules, but I don't think they're a substitute for just giving people the API that they need. Often, when writing low-level code like this, the developer wants to say what they mean and have a good sense of what's actually going on, so having to express this operation through a rather opaque rewrite rule seems too subtle. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
What new API additions were proposed here? |
I guess we all assumed we knew what the API was but forgot to write it down. Oops! |
Anyone currently working on this one? I've made some progress already and would like to finish the work. If that is ok please feel free to assign the issue to me. Thanks. |
I see that the current internal/atomic If we want to keep the same api style as Add we need to change Or/And to return a new value instead of applying the bitwise operation inplace. Just want to confirm that this is indeed an expected outcome, I see that it is used in a couple places in runtime and atomic tests. |
We should make the new API for If that makes it inconsistent with One thing that bugs me a bit: there is a choice to return the new value or the old value. With |
It seems really unfortunate to have these operations destroy useful information by returning the new value. In isolation, clearly we would want them to return the old value, but that would be confusingly inconsistent with the rest of sync/atomic. These operations could return both the old and new value, like: func AndUint32(addr *uint32, mask uint32) (old, new uint32)
func (x *Uint32) And(mask uint32) (old, new uint32) This would allow us to return the old value, while making the return signature unmistakably different from other functions in sync/atomic to avoid confusion. I'm sure most uses would assign one result or the other (or possibly both) to This reminds me a lot of GCC's atomic intrinsics, where several have both an The one downside I see to returning both old and new is that these functions couldn't be used as part of a larger expression. Users could of course wrap it in their own operation that returned just one result or the other, and use that in expression contexts. |
I haven't consider touching the SSA. I'm not really familiar with it to pull this off. My idea is to implement Or32/Or64 and And32/And64 in internal/atomic to serve as basis for the sync/atomic work. In order to not mess up the current Or/And and variants I will leave them untouched for now.
I like this approach, it is unfortunate that it prevents its use in larger expressions, at least this approach is explicit about the returns so people don't accidentally use old for new and vice versa. Another option is just to stick with the api we currently have for add, returning the new value only |
Looking at the handful of uses in the runtime, in fact, none of them use the result at all. So I'm not too worried about the inability to use these operations in a larger expression.
+1 |
Change https://go.dev/cl/543035 mentions this issue: |
TSAN recently got support for Go's new atomic And and Or operations (#61395). This CL updates the race syso files to include the change. Also regenerate cgo dynamic imports on darwin. OpenBSD/AMD64 is not updated, as TSAN no longer supports OpenBSD (#52090). Linux/PPC64 is not updated, as I'm running into some builder issues. Still working on it. For #61395. For #62624. Change-Id: Ifc90ea79284f29a356f9e8a5f144f6c690881395 Reviewed-on: https://go-review.googlesource.com/c/go/+/543035 Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Without having all the architectures implementing the And/Or operators merged I can't proceed with the public sync/atomic apis. This CL adds a generic implementation that should work for all the remaining arches, while waiting for the native assembly implementations. For golang#61395
Without having all the architectures implementing the And/Or operators merged I can't proceed with the public sync/atomic apis. This CL adds a generic implementation that should work for all the remaining arches, while waiting for the native assembly implementations. For golang#61395
Change https://go.dev/cl/543175 mentions this issue: |
Change https://go.dev/cl/543397 mentions this issue: |
Without having all the architectures implementing the And/Or operators merged I can't proceed with the public sync/atomic apis. This CL adds a generic implementation that should work for all the remaining arches, while waiting for the native assembly implementations in CL 531835, CL 531678, CL 531895. I regret the oversight of not pushing this earlier. For #61395 Change-Id: Ib2d67f359fe324b4743eb79e9c8e52e8f6f5476c GitHub-Last-Rev: d350927 GitHub-Pull-Request: #64214 Reviewed-on: https://go-review.googlesource.com/c/go/+/543175 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
Following CL 543035, this CL updates race syso for Linux/PPC64LE. Now we have update all of them (except OpenBSD). For #61395. Fixes #62624. Change-Id: I9e1d758355114a50ff206e5d78dc4ea8a06367d8 Reviewed-on: https://go-review.googlesource.com/c/go/+/543397 Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Turns out after adding the generic implementation for And/Or we ended up with duplicated ops that are exactly the same for arm. This CL removes the arm code and adds arm to the generic build flags. For golang#61395
Change https://go.dev/cl/544017 mentions this issue: |
When I initially added the wasm code for these ops I did not saw that wasm actually has the Cas operations implemented, although they are merely pointer assignments since wasm is single threaded. Now with a generic implementation for And/Or we can add wasm to the build tags. For golang#61395
Change https://go.dev/cl/544116 mentions this issue: |
Turns out after adding the generic implementation for And/Or we ended up with duplicated ops that are exactly the same for arm. Apologies for the oversight, this CL removes the redundant arm code and adds arm to the generic build flags. For #61395 Change-Id: Id5e5a5cf113774948f8e772592e898d0810ad1f6 GitHub-Last-Rev: 4d8c857 GitHub-Pull-Request: #64299 Reviewed-on: https://go-review.googlesource.com/c/go/+/544017 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
When I initially added the wasm code for these ops I did not saw that wasm actually has the Cas operations implemented, although they are merely pointer assignments since wasm is single threaded. Now with a generic implementation for And/Or we can add wasm to the build tags. For #61395 Change-Id: I997dc90477c772882d6703df1b795dfc0d90a699 GitHub-Last-Rev: 92736a6 GitHub-Pull-Request: #64300 Reviewed-on: https://go-review.googlesource.com/c/go/+/544116 Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
This CL adds race instrumentation for the new sync/atomic And/Or apis. I belive the only missing implementation is for openbsd/amd64, that does not have the newer tsan ops since it is stuck in tsan V2. I'll send the race impl for openbsd alongside the sync/atomic code in a follow-up CL. For golang#61395
Change https://go.dev/cl/544476 mentions this issue: |
Brief update: with Go1.22 entering a freeze, I've adjusted the milestone to Go1.23. Given our time constraints, it seems adequate to postpone since so far we have only internal bits merged. |
Change https://go.dev/cl/544455 mentions this issue: |
These primitives will be used by the new And/Or sync/atomic apis. For #61395 Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333 GitHub-Last-Rev: 9755db5 GitHub-Pull-Request: #63314 Reviewed-on: https://go-review.googlesource.com/c/go/+/531895 Reviewed-by: abner chenc <chenguoqi@loongson.cn> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
These primitives will be used by the new And/Or sync/atomic apis. For golang#61395 Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333 GitHub-Last-Rev: 9755db5 GitHub-Pull-Request: golang#63314 Reviewed-on: https://go-review.googlesource.com/c/go/+/531895 Reviewed-by: abner chenc <chenguoqi@loongson.cn> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
These primitives will be used by the new And/Or sync/atomic apis. For #61395 Change-Id: Ia9b4877048002d3d7d1dffa2311d0ec5f38e4ee5 GitHub-Last-Rev: 20dea11 GitHub-Pull-Request: #63318 Reviewed-on: https://go-review.googlesource.com/c/go/+/531678 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Update, Aug 16 2023: Current proposal at #61395 (comment).
Original related proposal: #31748
Use case: we have types with methods that set a value. These methods manipulate a bitset indicating that the value was set, which is used (e.g.) for data serialization. Users of this API know to use a lock to manage concurrently reading/writing the same field, but they are allowed to concurrently write to different fields. Given that the bitsets are logically shared between different fields, we must manipulate them atomically. Currently that takes the form of a CAS loop:
But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them:
setPresent
is too complex to inline.cc @aclements @ianlancetaylor @randall77
The text was updated successfully, but these errors were encountered: