cmd/compile,runtime: frequent test timeouts on solaris-amd64-oraclerel #51443

Closed
bcmills opened this issue Mar 2, 2022 · 20 comments
Labels
Builders, compiler/runtime, help wanted, NeedsInvestigation, OS-Solaris
Milestone

Comments

@bcmills
Contributor

bcmills commented Mar 2, 2022

#!watchflakes
default <- builder == "solaris-amd64-oraclerel" && (pkg == "runtime" || pkg == "cmd/compile") && (`^panic: test timed out` || `failed to start: context deadline exceeded`)

greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-01-01

2022-03-01T21:27:42-b0db2f0/solaris-amd64-oraclerel
2022-02-28T19:00:23-b33592d/solaris-amd64-oraclerel
2022-01-08T00:24:25-90860e0/solaris-amd64-oraclerel
2022-01-06T23:39:43-042548b/solaris-amd64-oraclerel
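
As a rough illustration of what the query above selects, here is a minimal Go sketch that applies the same pattern to a log. It assumes greplogs interprets `-e` with Go's `regexp` syntax; the sample log text and the `matchesTimeout` helper are hypothetical and only shaped like a dashboard log.

```go
package main

import (
	"fmt"
	"regexp"
)

// timeoutRE is the pattern passed to greplogs -e above, copied verbatim.
// (?m) makes ^ match at the start of each line, (?s) lets . span newlines,
// and \A anchors the builder name to the very start of the log.
var timeoutRE = regexp.MustCompile(`(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime`)

// matchesTimeout (hypothetical helper) reports whether a log would be selected.
func matchesTimeout(log string) bool {
	return timeoutRE.MatchString(log)
}

func main() {
	// Hypothetical log excerpt, for illustration only.
	sample := "solaris-amd64-oraclerel at abc123\n\n" +
		"panic: test timed out after 3m0s\n" +
		"FAIL\truntime\t180.012s\n"
	fmt.Println(matchesTimeout(sample))

	// A log from a different builder fails the \A anchor.
	fmt.Println(matchesTimeout("linux-amd64 at abc123\n" + sample))
}
```

The pattern requires all three pieces in order: the builder name at the start of the log, a timeout panic at the start of some line, and a later `FAIL` line for the `runtime` package.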

@golang/runtime: the builder appears to only run the -short tests. Is there something we can feasibly do to make -short mode shorter?

@golang/release, @rorth: would it make sense to set GO_TEST_TIMEOUT_SCALE on this builder to make the timeouts more generous?
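
For readers unfamiliar with the knob, the sketch below shows the general idea of a timeout scale: an integer multiplier applied to a base timeout. It is not the actual cmd/dist code; the helper names and the fallback to 1 are assumptions, and the 3-minute base is taken from the "test timed out after 3m0s" panic reported later in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// timeoutScale (hypothetical helper) reads GO_TEST_TIMEOUT_SCALE.
// Assumption: the value is a positive integer; anything else falls back to 1.
func timeoutScale() int {
	n, err := strconv.Atoi(os.Getenv("GO_TEST_TIMEOUT_SCALE"))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// scaledTimeout multiplies a base timeout by the scale factor.
func scaledTimeout(base time.Duration, scale int) time.Duration {
	return base * time.Duration(scale)
}

func main() {
	base := 3 * time.Minute // base timeout seen in the panic message in this thread
	fmt.Println(scaledTimeout(base, timeoutScale()))
}
```

Under these assumptions, running with `GO_TEST_TIMEOUT_SCALE=2` would give a slow builder 6 minutes instead of 3 before the timeout panic fires.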

@bcmills bcmills added the Builders, OS-Solaris, and NeedsInvestigation labels Mar 2, 2022
@bcmills bcmills added this to the Backlog milestone Mar 2, 2022
@dmitshur
Contributor

dmitshur commented Mar 2, 2022

Increasing GO_TEST_TIMEOUT_SCALE makes sense to me, but @rorth, as the builder owner, is in a better position to decide the optimal configuration for this builder. This builder is not configured as a TryBot, so if it needs more time, that's fine: it won't affect pre-submit testing times (#17104).

@rorth

rorth commented Mar 3, 2022 via email

@ianlancetaylor
Contributor

While running tests, a go builder will happily grab whatever resources it has available. Are you doing anything to restrict it to 4 jobs?

@rorth

rorth commented Mar 4, 2022 via email

@ianlancetaylor
Contributor

Good point. I think the builder config will set GOMAXPROCS, which will limit the resource usage of each individual test, but then you will have 4 package tests running in parallel, and each of those will happily run 4 tests in parallel. At least, I think that is how it works; my apologies if I'm getting this wrong.
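
The arithmetic behind that concern can be sketched as follows. The `go test` documentation states that both `-p` (test binaries run at once) and `-parallel` (tests within one binary that call `t.Parallel`) default to GOMAXPROCS; whether the builder actually runs with those defaults is an assumption here, and `worstCaseParallelTests` is a hypothetical helper.

```go
package main

import "fmt"

// worstCaseParallelTests (hypothetical helper) estimates the upper bound on
// simultaneously running tests: binaries in flight times parallel tests per binary.
func worstCaseParallelTests(binariesInParallel, testsPerBinary int) int {
	return binariesInParallel * testsPerBinary
}

func main() {
	// Assumption: GOMAXPROCS=4 and both -p and -parallel are left at their defaults.
	const gomaxprocs = 4
	fmt.Println(worstCaseParallelTests(gomaxprocs, gomaxprocs))
}
```

This is only an upper bound: tests that never call `t.Parallel` run serially within their binary, so the real load is usually lower.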

@rorth

rorth commented Mar 7, 2022 via email

@bcmills
Contributor Author

bcmills commented Mar 30, 2022

As a first step, I've now set GOMAXPROCS=8 for the builder to see if
this helps. The last runtime timeout happened before that change,
though. We'll have to see if this is enough.

Appears not to be:

greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-03-08

2022-03-29T16:24:51-a2baae6/solaris-amd64-oraclerel
2022-03-25T19:04:59-2bbf383/solaris-amd64-oraclerel
2022-03-25T15:15:57-3fd8b86/solaris-amd64-oraclerel

@bcmills
Contributor Author

bcmills commented May 3, 2022

greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-03-30
2022-05-02T19:53:56-3b01a80/solaris-amd64-oraclerel

@bcmills
Contributor Author

bcmills commented May 11, 2022

greplogs -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-05-03
2022-05-11T03:28:01-ccb7987/solaris-amd64-oraclerel
2022-05-10T21:56:21-bda9da8/solaris-amd64-oraclerel

@bcmills
Contributor Author

bcmills commented May 26, 2022

greplogs -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-05-12
2022-05-25T19:25:08-04337a6/solaris-amd64-oraclerel
2022-05-23T16:21:22-c1d197a/solaris-amd64-oraclerel

@gopherbot

Change https://go.dev/cl/408701 mentions this issue: dashboard: add known issue for solaris-amd64-oraclerel

gopherbot pushed a commit to golang/build that referenced this issue May 26, 2022
For golang/go#52653.
Updates golang/go#51443.

Change-Id: Iaea8fab13ed979e54c827f0f3c4d705bdaff4ee4
Reviewed-on: https://go-review.googlesource.com/c/build/+/408701
Reviewed-by: Alex Rakoczy <alex@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot gopherbot added the compiler/runtime label Jul 7, 2022
@bcmills
Contributor Author

bcmills commented Feb 3, 2023

There have apparently not been any more of these failures since May. I'm calling this obsolete.

@bcmills bcmills closed this as not planned Feb 3, 2023
@gopherbot

Change https://go.dev/cl/465156 mentions this issue: dashboard: unmark known-issues with low failure rates

gopherbot pushed a commit to golang/build that referenced this issue Feb 4, 2023
I had initially added known issues fairly aggressively in order to use
them to reduce noise in 'greplogs -triage'. Now that we are using
'watchflakes' for triage, that noise reduction is no longer important
(the failures are already clustered to their respective known issues),
and having greyed-out cells on the dashboard makes new regressions too
easy to miss.

Concretely:

- golang/go#42212 is mostly specific to x/net at this point (as
  golang/go#57841)

- There have been no failures matching golang/go#51001 since October.

- golang/go#52724 has been so rare lately that we hadn't yet added a
  'watchflakes' pattern for it.

- There have been no failures matching golang/go#51443 since May.

- There have been no failures matching golang/go#53116 or
  golang/go#53093 since I enabled 'watchflakes' for the builder in
  December.

- The linux-amd64-perf builder seems to be passing consistently for
  x/benchmarks and x/tools, so there is no need to refer to
  golang/go#53538 to explain failures on it.

Change-Id: Ia16db2a23e5fa037a299f1f56fb26f1cf84521e1
Reviewed-on: https://go-review.googlesource.com/c/build/+/465156
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot gopherbot reopened this Apr 7, 2023
@gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- builder == "solaris-amd64-oraclerel" && pkg == "runtime" && `^panic: test timed out`
2023-04-07 15:12 solaris-amd64-oraclerel go@39986d28 runtime.TestFakeTime (log)
panic: test timed out after 3m0s
running tests:
	TestFakeTime (32s)

runtime.syscall_wait4(0x30b3, 0xc0002b6018, 0x0, 0xc0023881b0)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/syscall_solaris.go:310 +0x85 fp=0xc00007e9f0 sp=0xc00007e998 pc=0x472225
syscall.Wait4(0xc000180008?, 0x0?, 0x0?, 0x0?)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/syscall/syscall_solaris.go:275 +0x26 fp=0xc00007ea38 sp=0xc00007e9f0 pc=0x480586
os.(*Process).wait(0xc000136270)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/os/exec_unix.go:43 +0x77 fp=0xc00007ea90 sp=0xc00007ea38 pc=0x4a22d7
os.(*Process).Wait(...)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc000166580)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/os/exec/exec.go:890 +0x45 fp=0xc00007eaf8 sp=0xc00007ea90 pc=0x531f45
os/exec.(*Cmd).Run(0x37?)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/os/exec/exec.go:590 +0x2d fp=0xc00007eb18 sp=0xc00007eaf8 pc=0x530b2d
os/exec.(*Cmd).CombinedOutput(0xc000166580)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/os/exec/exec.go:1005 +0x94 fp=0xc00007eb40 sp=0xc00007eb18 pc=0x532814
runtime_test.buildTestProg.func1()
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/crash_test.go:143 +0x3d1 fp=0xc00007ed30 sp=0xc00007eb40 pc=0x6c6bd1
sync.(*Once).doSlow(0x7a9880?, 0xc001ab01b0?)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/sync/once.go:74 +0xbf fp=0xc00007ed90 sp=0xc00007ed30 pc=0x47c83f
sync.(*Once).Do(...)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/sync/once.go:65
runtime_test.buildTestProg(0xc0009049c0, {0x8188b8, 0xc}, {0xc000b84a00, 0x1, 0x1})
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/crash_test.go:128 +0x48a fp=0xc00007ee88 sp=0xc00007ed90 pc=0x6c678a
runtime_test.TestFakeTime(0xc0009049c0)
	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/time_test.go:29 +0x87 fp=0xc00007ef70 sp=0xc00007ee88 pc=0x74f067
testing.tRunner(0xc0009049c0, 0x841278)

watchflakes

@gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- builder == "solaris-amd64-oraclerel" && pkg == "runtime" && (`^panic: test timed out` || `failed to start: context deadline exceeded`)
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestCgoExternalThreadSIGPROF (log)
--- FAIL: TestCgoExternalThreadSIGPROF (0.01s)
    crash_cgo_test.go:98: /opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/tmp/go-build3722635515/testprogcgo.exe CgoExternalThreadSIGPROF failed to start: context deadline exceeded
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestCgoExecSignalMask (log)
--- FAIL: TestCgoExecSignalMask (0.00s)
    crash_cgo_test.go:137: /opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/tmp/go-build3722635515/testprogcgo.exe CgoExecSignalMask failed to start: context deadline exceeded
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestSigStackSwapping (log)
--- FAIL: TestSigStackSwapping (0.01s)
    crash_cgo_test.go:502: /opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/tmp/go-build3722635515/testprogcgo.exe SigStack failed to start: context deadline exceeded

watchflakes

@mknyszek
Contributor

In triage, it looks to us like all the tests are just timing out. The builder just seems slow. There might be one culprit. Verbose test output would tell us which test cases are slow.

CC @golang/solaris
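
One way to act on the verbose-output suggestion is to rank test cases by their reported duration. The sketch below parses `--- PASS/FAIL/SKIP: Name (N.NNs)` lines, the format `go test -v` prints and that appears in the failure reports above. The `slowestTests` helper, the `testResult` type, and the sample input are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// testResult holds one parsed result line from verbose test output.
type testResult struct {
	Name    string
	Status  string
	Seconds float64
}

// resultRE matches lines such as "--- FAIL: TestSigStackSwapping (0.01s)".
// Leading whitespace is allowed so that indented subtest results also match.
var resultRE = regexp.MustCompile(`^\s*--- (PASS|FAIL|SKIP): (\S+) \((\d+(?:\.\d+)?)s\)`)

// slowestTests returns the parsed results sorted from slowest to fastest.
func slowestTests(output string) []testResult {
	var results []testResult
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := resultRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a result line
		}
		secs, err := strconv.ParseFloat(m[3], 64)
		if err != nil {
			continue
		}
		results = append(results, testResult{Name: m[2], Status: m[1], Seconds: secs})
	}
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Seconds > results[j].Seconds
	})
	return results
}

func main() {
	// Hypothetical verbose output, for illustration only.
	out := "=== RUN   TestFakeTime\n" +
		"--- PASS: TestFakeTime (32.00s)\n" +
		"--- FAIL: TestSigStackSwapping (0.01s)\n"
	for _, r := range slowestTests(out) {
		fmt.Printf("%8.2fs  %s  %s\n", r.Seconds, r.Status, r.Name)
	}
}
```

A test that hangs until the timeout never prints a result line, so it would not appear in this ranking; the "running tests:" section of the panic output covers that case.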

@mknyszek
Contributor

I forgot that @golang/solaris is empty... CC @rorth I suppose?

@mknyszek mknyszek changed the title from "runtime: frequent test timeouts on solaris-amd64-oraclerel" to "cmd/compile,runtime: frequent test timeouts on solaris-amd64-oraclerel" Apr 12, 2023
@bcmills
Contributor Author

bcmills commented Apr 12, 2023

The C toolchain on this builder seems particularly slow. I wonder if just raising the GO_TEST_TIMEOUT_SCALE for the builder might resolve things.

@rorth

rorth commented Apr 13, 2023 via email

@bcmills
Contributor Author

bcmills commented May 12, 2023

Duplicate of #60152

@bcmills bcmills marked this as a duplicate of #60152 May 12, 2023
@bcmills bcmills closed this as not planned May 12, 2023

6 participants