Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/bits: add extended precision Add, Sub, Mul, Div #24813

Closed
bmakam-qdt opened this issue Apr 11, 2018 · 74 comments
Closed

math/bits: add extended precision Add, Sub, Mul, Div #24813

bmakam-qdt opened this issue Apr 11, 2018 · 74 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted
Milestone

Comments

@bmakam-qdt
Copy link
Contributor

This is a proposal to speed up crypto/poly1305 using [u]int128 and multiword arithmetic. Arm64 has instructions that let you multiply two 64-bit registers to two 64-bit registers. It also has instructions for multiword addition. I have implemented some of these intrinsics and intrinsified them in https://go-review.googlesource.com/c/go/+/106376 which improved the performance of crypto/poly1305 by ~30% on arm64 (Amberwing). I have added these intrinsics for arm64 in poly1305 package but they might benefit other platforms as well. I am seeking advice on the design of this implementation. Is poly1305 package the right place to have these intrinsics or should they go in math/big or math/bit? This might also be a use case for #9455

@gopherbot gopherbot added this to the Proposal milestone Apr 11, 2018
@FiloSottile FiloSottile changed the title proposal: improve crypto/poly1305 using multiword arithmetic proposal: intrinsic multiword arithmetic Apr 11, 2018
@FiloSottile
Copy link
Contributor

FiloSottile commented Apr 11, 2018

amd64 also has these instructions, and access to them is one of the major blockers to writing performant crypto in Go instead of assembly.

I think the best place for them would be math/bits. Can you give us a preview of the exported APIs?

/cc @vkrasnov @gtank (they probably have opinions)

@mvdan
Copy link
Member

mvdan commented Apr 11, 2018

/cc @mundaym @TocarIP too, perhaps

@bmakam-qdt
Copy link
Contributor Author

MulWW - this is same as math/big.mulWW
AddQQ - adds two uint128. Each uint128 is represented as a pair of uint64 (hi, lo) returns uint128(hi,lo)
AddQW - adds a uint128 and uint64 returns uint128(hi, lo)
ShrQ44 - shiftright uint128 with constant 44 returns uint64
ShrQ42 - shiftright uint128 with constant 42 returns uint64
The arm64 assembly for these APIs are in https://go-review.googlesource.com/c/go/+/106376/3/src/vendor/golang_org/x/crypto/poly1305/asm_arm64.s
ShrQ44 and ShrQ42 can be combined into one API I think.

@vkrasnov
Copy link
Contributor

This is useful, no doubt. Intel can definitely benefit too. I'd put them all in math/big. The ShrQXX should probably be generic?

@bmakam-qdt
Copy link
Contributor Author

arm64 has an API EXTRconst which can be used to write a rule like (ShrQconst [c] x y) -> EXTRconst [c] x y). If other targets also have EXTRconst this can be generic.

@FiloSottile
Copy link
Contributor

I don't like the idea of polluting math/big. It already has a million godoc entries, but at least they all deal in Int/Float/Rat, while these would split a fixed size uint128 in hi/lo uint64.

@vkrasnov
Copy link
Contributor

arm64 has an API EXTRconst which can be used to write a rule like (ShrQconst [c] x y) -> EXTRconst [c] x y). If other targets also have EXTRconst this can be generic.

Intel should have SHRD.

I don't like the idea of polluting math/big

Maybe create something new? like math/uint128

@mvdan
Copy link
Member

mvdan commented Apr 11, 2018

Is there any chance of writing this in regular Go code without any extra APIs, and having the compiler recognise the pattern to optimise?

@bmakam-qdt
Copy link
Contributor Author

are you suggesting a builtin [u]int128 type like how GCC does: https://godbolt.org/g/o6ka5e ?

@FiloSottile
Copy link
Contributor

@mvdan Not really. The high bits of uint64 * uint64 are simply discarded. We would need uint128 in the language. (#9455)

@vkrasnov
Copy link
Contributor

That is probably too big a change for a very niche use case. I think that less than 0.1% of Go projects would need these.

@josharian
Copy link
Contributor

If anywhere these would probably belong in math/bits, not math/big. I think some of them were even mooted during the original math/bits API discussion. (On phone so don’t have issue link handy.)

@randall77
Copy link
Contributor

math/bits seems like the right place.
I think you only need the following building blocks:

func Mul(x, y uint64) (uint64, uint64) // 64x64 -> 128 multiply (low, then high)
func Add(x, y uint64) (uint64, uint64) // 64x64 -> 65 add (low, then high)

(and maybe signed versions?)
Everything else can be written in Go.

Even Add is not really necessary, but might make it easier for the compiler to detect the carry. (In Go, the carry is low_result < low_x || low_result < low_y or something.)

The expression you would use for a multiword shift should be directly detectable by the compiler. (It doesn't currently, but it should be straightforward.)

@FiloSottile
Copy link
Contributor

FiloSottile commented Apr 11, 2018

I think AddQQ and AddQW have uint128 inputs, not uint64.

Agreed on not being useful enough to change the language. I don't hate math/uint128, but if we can boil them down to a couple API in math/bits and detecting some patterns, that's better.

@cherrymui
Copy link
Member

If we have 64x64 -> 128, we might probably also have 32x32 -> 64 on 32-bit architectures. On 64-bit architectures this would be trivial.

@randall77
Copy link
Contributor

randall77 commented Apr 12, 2018

Can't you already get a 32x32->64 multiply on 32-bit archs? 386 can, at least:

func f(x, y uint32) (uint32, uint32) {
	z := uint64(x) * uint64(y)
	return uint32(z), uint32(z >> 32)
}

compiles to

	0x0012 00018 (/home/khr/go/tmp1.go:3)	MOVL	"".y+8(SP), AX
	0x0016 00022 (/home/khr/go/tmp1.go:3)	MOVL	"".x+4(SP), CX
	0x001a 00026 (/home/khr/go/tmp1.go:4)	MULL	CX
	0x001c 00028 (/home/khr/go/tmp1.go:4)	MOVL	AX, "".~r2+12(SP)
	0x0020 00032 (/home/khr/go/tmp1.go:4)	MOVL	DX, "".~r3+16(SP)

@FiloSottile
Copy link
Contributor

32x32 -> 64 can be handled by pattern matching thanks to uint64. This is about filling the uint128 void.

@cherrymui
Copy link
Member

You are right. 32x32->64 can already be handled.

@ericlagergren
Copy link
Contributor

previous conversation about uint128: #9455

@TocarIP
Copy link
Contributor

TocarIP commented Apr 12, 2018

I like an idea about exposing mulWW and divWW, which we already intrinsify, for general use, but I'm not sure about right package. IIRC there was Gophercon talk about optimizing 64x64->128 multiplication, so this looks like something useful for others.
This should also help hash/fnv128, which currently does ugly stuff like this:

                 s1l := (s[1] & 0xffffffff) * prime128Lower
                 s1h := (s[1] >> 32) * prime128Lower
                 s0l := (s[0]&0xffffffff)*prime128Lower + (s[1]&0xffffffff)<<prime128Shift
                 s0h := (s[0]>>32)*prime128Lower + (s[1]>>32)<<prime128Shift
                 // Carries
                 s1h += s1l >> 32
                 s0l += s1h >> 32
                 s0h += s0l >> 32
                 // Update the values
                 s[1] = (s1l & 0xffffffff) + (s1h << 32)
                 s[0] = (s0l & 0xffffffff) + (s0h << 32)

@bmakam-qdt
Copy link
Contributor Author

The expression you would use for a multiword shift should be directly detectable by the compiler. (It doesn't currently, but it should be straightforward.)

I think we might not need the intrinsic for multiword shiftright because we can create a pattern that can be detected by the compiler, something like this:
ShrQconst(42, z[5], z[4]) -> (z5<<22 | z4>>42)

I think AddQQ and AddQW have uint128 inputs, not uint64.

Yes, AddQQ takes two uint128 i.e two pairs of uint64 and outputs the result to uint128(hi, lo) and AddQW takes a uint128 and uint64 and outputs the result to uint128(hi, lo).

@gtank
Copy link
Contributor

gtank commented Apr 12, 2018

I am strongly in favor of doing this in some form.

If it's going to be explicit functions, then math/bits or a dedicated math/{intrinsic, uint128, multiword} is a better place for it than math/big, since we care a lot about the details of the underlying representation here.

For crypto, my dream API would give me direct mappings to the basic func: uint64 x uint64 -> uint64 x uint64 functions (mulq, addq, etc for amd64), then have smart pattern matching do the carry/flag/pipeline-aware instruction optimizations when possible (think add/adc, or the newer mulx/adcx/adox). There's an example of this being a huge win in math/big already and Intel wrote a guide for how to apply those instructions to patterns that are common in crypto code.

@gopherbot
Copy link

Change https://golang.org/cl/106376 mentions this issue: cmd/compile: add intrinsics for multiword arithmetic

@bmakam-qdt
Copy link
Contributor Author

Sorry, I jumped the gun and assumed we have a consensus on math/bits as the right package for the APIs and took a stab at it in CL106376. My sincere apologies.

@jimmyfrasche
Copy link
Member

I strongly disagree about putting these in math/bits: these are not operations on bits. They may involve operations on bits, but ultimately so does everything.

I really do not want to see math/bits turn into a junk drawer.

The proposed operations are a new class of operations and they deserve their own package for ease of discovery, to make reading code that uses these operations clear, and so that they can have focused documentation.

@randall77
Copy link
Contributor

I don't see a need for both AddQW and AddQQ. AddQW(x1, x0, y) = AddQQ(x1, x0, 0, y). The compiler should have no problem constant folding the 0. Just have a single Add.
In any case, I don't like Q and W. What do they stand for? (quadword and word? That makes no sense here.)

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jun 18, 2018
@smasher164
Copy link
Member

smasher164 commented Jul 8, 2018

Porting musl's fma implementation to Go required Mul64 as well. Here is a possible full-width multiply implementation.
Edit: Link to musl libc

// Mul returns the high and low bits from the product of x and y.
func Mul(x, y uint) (hi, lo uint) {
	if UintSize == 32 {
		hi32, lo32 := Mul32(uint32(x), uint32(y))
		return uint(hi32), uint(lo32)
	}
	hi64, lo64 := Mul64(uint64(x), uint64(y))
	return uint(hi64), uint(lo64)
}

// Mul32 returns the high and low 32 bits from the 64-bit product of x and y.
func Mul32(x, y uint32) (hi, lo uint32) {
	z := uint64(x) * uint64(y)
	return uint32(z >> 32), uint32(z)
}

// Mul64 returns the high and low 64 bits from the 128-bit product of x and y.
func Mul64(x, y uint64) (hi, lo uint64) {
	xlo := uint64(uint32(x))
	xhi := x >> 32
	ylo := uint64(uint32(y))
	yhi := y >> 32

	t1 := xlo * ylo
	t2 := xlo*yhi + xhi*ylo
	t3 := xhi * yhi
	lo = t1 + (t2 << 32)
	hi = t3 + (t2 >> 32)
	if t1 > lo {
		hi += 1
	}
	return hi, lo
}

@smasher164
Copy link
Member

smasher164 commented Jul 9, 2018

I have a prototype of the extended precision operations over at github.com/smasher164/extprec. All the implementations are adapted from "Hacker's Delight", and the relevant references are provided.

However, it still needs real test cases in order to hammer out any possible bugs. I've only tested it with random hand-picked values.

Edit: Completed unit tests!

@gopherbot
Copy link

Change https://golang.org/cl/123157 mentions this issue: math/bits: add extended precision Add, Sub, Mul, Div

@bmkessler
Copy link
Contributor

The above CL ports the go versions of these functions from math/big since those are previously tested. Additional SSA rules will still need to be implemented as this just handles the simple Mul and Div cases for amd64 and arm64 that are used in math/big.

@TuomLarsen
Copy link

Really tiny nitpick. Wouldn't it make more sense to have func Div(hi, lo, y uint), instead of func Div(hi, lo, x uint)? Because in all other functions, x is the first operand and y the second one. hi and lo are 128-bits of the first operand so maybe an y would be more descriptive.

@gopherbot
Copy link

Change https://golang.org/cl/129415 mentions this issue: cmd/compile: intrinsify math/bits.Mul and Div

gtank pushed a commit to gtank/go that referenced this issue Sep 3, 2018
Port math/big pure go versions of add-with-carry, subtract-with-borrow,
full-width multiply, and full-width divide.

Updates golang#24813

Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c
gtank pushed a commit to gtank/go that referenced this issue Sep 3, 2018
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64) and Div/Div64 (AMD64).
SSA rules for other functions and architectures are left as a future
optimization.  Benchmark results on AMD64/ARM64 before and after SSA
implementation are below.

amd64
name     old time/op  new time/op  delta
Add-4    1.82ns ± 2%  1.89ns ± 6%     ~     (p=0.167 n=5+5)
Add32-4  1.78ns ± 7%  1.80ns ± 9%     ~     (p=0.690 n=5+5)
Add64-4  1.82ns ± 3%  1.93ns ±23%     ~     (p=0.810 n=5+5)
Sub-4    1.85ns ± 4%  1.91ns ± 6%     ~     (p=0.246 n=5+5)
Sub32-4  1.82ns ± 4%  1.87ns ± 8%     ~     (p=0.730 n=5+5)
Sub64-4  1.91ns ± 7%  1.85ns ± 4%     ~     (p=0.341 n=5+5)
Mul-4    11.7ns ± 5%   1.8ns ± 0%  -84.72%  (p=0.000 n=5+4)
Mul32-4  1.60ns ± 2%  1.61ns ± 6%     ~     (p=0.651 n=5+5)
Mul64-4  7.13ns ± 9%  2.08ns ±11%  -70.88%  (p=0.008 n=5+5)
Div-4    59.5ns ± 4%  49.2ns ± 5%  -17.43%  (p=0.008 n=5+5)
Div32-4  18.3ns ± 3%  18.2ns ± 1%     ~     (p=0.333 n=5+5)
Div64-4  58.6ns ±13%  48.5ns ± 5%  -17.24%  (p=0.008 n=5+5)

arm64
name      old time/op  new time/op  delta
Add-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add64-96  5.52ns ± 0%  5.51ns ± 0%     ~     (p=0.444 n=5+5)
Sub-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub64-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Mul-96    34.6ns ± 0%   5.0ns ± 0%  -85.52%  (p=0.008 n=5+5)
Mul32-96  4.51ns ± 0%  4.51ns ± 0%     ~     (all equal)
Mul64-96  21.1ns ± 0%   5.0ns ± 0%  -76.26%  (p=0.008 n=5+5)
Div-96    64.7ns ± 0%  64.7ns ± 0%     ~     (all equal)
Div32-96  17.0ns ± 0%  17.0ns ± 0%     ~     (all equal)
Div64-96  53.1ns ± 0%  53.1ns ± 0%     ~     (all equal)

Updates golang#24813

Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
@gtank
Copy link
Contributor

gtank commented Sep 3, 2018

I was curious about this, so I cherry-picked the relevant CLs on top of current Go master and have some preliminary results from my Ed25519 field arithmetic. bits.Mul64 got me about halfway to the speed of an assembly implementation without much effort. I suspect I could make the pure Go a bit faster as well, but haven't tried yet.

Rough benchmarks on a Kaby Lake laptop (i7-7560U @ 2.40GHz):

$ benchstat old32 bits64
name             old time/op  new time/op  delta
KeyGeneration-4  48.5µs ± 0%  33.9µs ± 2%  -30.22%  (p=0.004 n=5+6)
Signing-4        49.6µs ± 1%  34.8µs ± 2%  -29.84%  (p=0.002 n=6+6)
Verification-4    133µs ± 1%    93µs ± 3%  -29.81%  (p=0.002 n=6+6)
$ benchstat bits64 amd64
name             old time/op  new time/op  delta
KeyGeneration-4  33.9µs ± 2%  24.9µs ± 0%  -26.36%  (p=0.001 n=6+7)
Signing-4        34.8µs ± 2%  25.9µs ± 0%  -25.70%  (p=0.001 n=6+7)
Verification-4   93.2µs ± 3%  60.5µs ± 0%  -35.06%  (p=0.002 n=6+6)
$ benchstat old32 amd64
name             old time/op  new time/op  delta
KeyGeneration-4  48.5µs ± 0%  24.9µs ± 0%  -48.61%  (p=0.003 n=5+7)
Signing-4        49.6µs ± 1%  25.9µs ± 0%  -47.87%  (p=0.001 n=6+7)
Verification-4    133µs ± 1%    61µs ± 0%  -54.42%  (p=0.002 n=6+6)

If anyone else is interested in playing with this, it's this code and this Go.

gopherbot pushed a commit that referenced this issue Sep 11, 2018
Port math/big pure go versions of add-with-carry, subtract-with-borrow,
full-width multiply, and full-width divide.

Updates #24813

Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c
Reviewed-on: https://go-review.googlesource.com/123157
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rasky
Copy link
Member

rasky commented Sep 15, 2018

@gtank your mul64 function (which is actually a mad64 AFAICT) should use bits.Add to implement double-word add:

func mul64(accLo, accHi, x, y uint64) (ol, oh uint64) {
 	oh, ol = bits.Mul64(x, y)
        var carry uint64
        ol, carry = bits.Add64(ol, accLo, 0)
        oh, _ = bits.Add64(oh, accHi, carry)
 	return
 }

at some point that should get turned into ADD+ADC, which would probably get you another boost compared to assembly.

gopherbot pushed a commit that referenced this issue Sep 26, 2018
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64).
SSA rules for other functions and architectures are left as a future
optimization.  Benchmark results on AMD64/ARM64 before and after SSA
implementation are below.

amd64
name     old time/op  new time/op  delta
Add-4    1.78ns ± 0%  1.85ns ±12%     ~     (p=0.397 n=4+5)
Add32-4  1.71ns ± 1%  1.70ns ± 0%     ~     (p=0.683 n=5+5)
Add64-4  1.80ns ± 2%  1.77ns ± 0%   -1.22%  (p=0.048 n=5+5)
Sub-4    1.78ns ± 0%  1.78ns ± 0%     ~     (all equal)
Sub32-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=1.000 n=5+5)
Sub64-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=0.968 n=5+4)
Mul-4    11.5ns ± 1%   1.8ns ± 2%  -84.39%  (p=0.008 n=5+5)
Mul32-4  1.39ns ± 0%  1.38ns ± 3%     ~     (p=0.175 n=5+5)
Mul64-4  6.85ns ± 1%  1.78ns ± 1%  -73.97%  (p=0.008 n=5+5)
Div-4    57.1ns ± 1%  56.7ns ± 0%     ~     (p=0.087 n=5+5)
Div32-4  18.0ns ± 0%  18.0ns ± 0%     ~     (all equal)
Div64-4  56.4ns ±10%  53.6ns ± 1%     ~     (p=0.071 n=5+5)

arm64
name      old time/op  new time/op  delta
Add-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add64-96  5.52ns ± 0%  5.51ns ± 0%     ~     (p=0.444 n=5+5)
Sub-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub64-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Mul-96    34.6ns ± 0%   5.0ns ± 0%  -85.52%  (p=0.008 n=5+5)
Mul32-96  4.51ns ± 0%  4.51ns ± 0%     ~     (all equal)
Mul64-96  21.1ns ± 0%   5.0ns ± 0%  -76.26%  (p=0.008 n=5+5)
Div-96    64.7ns ± 0%  64.7ns ± 0%     ~     (all equal)
Div32-96  17.0ns ± 0%  17.0ns ± 0%     ~     (all equal)
Div64-96  53.1ns ± 0%  53.1ns ± 0%     ~     (all equal)

Updates #24813

Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
Reviewed-on: https://go-review.googlesource.com/129415
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/138917 mentions this issue: cmd/compile: instrinsify math/bits.Mul on ppc64x

gopherbot pushed a commit that referenced this issue Oct 2, 2018
Add SSA rules to intrinsify Mul/Mul64 on ppc64x.

benchmark             old ns/op     new ns/op     delta
BenchmarkMul-40       8.80          0.93          -89.43%
BenchmarkMul32-40     1.39          1.39          +0.00%
BenchmarkMul64-40     5.39          0.93          -82.75%

Updates #24813

Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829
Reviewed-on: https://go-review.googlesource.com/138917
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
@smasher164
Copy link
Member

Should this issue be closed now that CL 123157 has been merged in?

@bradfitz
Copy link
Contributor

Maybe? @randall77, are there plans to intrinsify more of this?

@randall77
Copy link
Contributor

Only Mul has been intrinsified so far.
I think we want Add/Sub intrinsified also (or better, recognize the Go code and generate the add-with-carry instruction).
That said, those tasks don't need to live in this issue, which is more about API. If concerned people can open issues for their architecture, we can close this issue.

@bradfitz
Copy link
Contributor

Okay, closing.

remyoudompheng pushed a commit to remyoudompheng/go that referenced this issue Jan 13, 2019
This method was using a handwritten long multiplication of uint64s.
Since implementation of golang#24813 we can remove it and replace it
by Mul64 from math/bits.

This brings a small speedup for 64-bit platforms.

Benchmarks on Haskell Celeron 2955U.

benchmark                                  old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-2             127           127           +0.00%
BenchmarkAppendFloat/Float-2               340           317           -6.76%
BenchmarkAppendFloat/Exp-2                 258           233           -9.69%
BenchmarkAppendFloat/NegExp-2              256           231           -9.77%
BenchmarkAppendFloat/Big-2                 402           375           -6.72%
BenchmarkAppendFloat/BinaryExp-2           113           114           +0.88%
BenchmarkAppendFloat/32Integer-2           125           125           +0.00%
BenchmarkAppendFloat/32ExactFraction-2     274           249           -9.12%
BenchmarkAppendFloat/32Point-2             339           317           -6.49%
BenchmarkAppendFloat/32Exp-2               255           229           -10.20%
BenchmarkAppendFloat/32NegExp-2            254           229           -9.84%
BenchmarkAppendFloat/64Fixed1-2            165           154           -6.67%
BenchmarkAppendFloat/64Fixed2-2            184           176           -4.35%
BenchmarkAppendFloat/64Fixed3-2            168           158           -5.95%
BenchmarkAppendFloat/64Fixed4-2            187           177           -5.35%
BenchmarkAppendFloat/Slowpath64-2          84977         84883         -0.11%
@gopherbot
Copy link

Change https://golang.org/cl/157717 mentions this issue: strconv: simply (*extFloat).Multiply using math/bits.Mul64

gopherbot pushed a commit that referenced this issue Mar 4, 2019
This method was using a handwritten long multiplication of uint64s.
Since implementation of #24813 we can remove it and replace it
by Mul64 from math/bits.

This brings a small speedup for 64-bit platforms.

Benchmarks on Haswell Celeron 2955U.

benchmark                                  old ns/op     new ns/op     delta
BenchmarkAppendFloat/Decimal-2             127           127           +0.00%
BenchmarkAppendFloat/Float-2               340           317           -6.76%
BenchmarkAppendFloat/Exp-2                 258           233           -9.69%
BenchmarkAppendFloat/NegExp-2              256           231           -9.77%
BenchmarkAppendFloat/Big-2                 402           375           -6.72%
BenchmarkAppendFloat/BinaryExp-2           113           114           +0.88%
BenchmarkAppendFloat/32Integer-2           125           125           +0.00%
BenchmarkAppendFloat/32ExactFraction-2     274           249           -9.12%
BenchmarkAppendFloat/32Point-2             339           317           -6.49%
BenchmarkAppendFloat/32Exp-2               255           229           -10.20%
BenchmarkAppendFloat/32NegExp-2            254           229           -9.84%
BenchmarkAppendFloat/64Fixed1-2            165           154           -6.67%
BenchmarkAppendFloat/64Fixed2-2            184           176           -4.35%
BenchmarkAppendFloat/64Fixed3-2            168           158           -5.95%
BenchmarkAppendFloat/64Fixed4-2            187           177           -5.35%
BenchmarkAppendFloat/Slowpath64-2          84977         84883         -0.11%

Change-Id: If05784e856289b3b7bf136567882e7ee10234756
Reviewed-on: https://go-review.googlesource.com/c/go/+/157717
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@golang golang locked and limited conversation to collaborators Jan 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted
Projects
None yet
Development

No branches or pull requests