testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

cristaloleg · 2023-07-10T09:05:40Z

This proposal is a clarification of the original test shuffle flag proposal #28592

Today I was debugging tests that were rarely failing due to cross-dependency between test functions. To verify the fix I noticed that go test -shuffle=on -count=100 runs tests in the same order as the first iteration. In other words -test.shuffle value is generated once and is reused for the other 99 test runs.

This makes sense when the shuffle seed is set explicitly (-shuffle=X ) but for random value (-shuffle=on) we should generate a new one to increase test order randomness to cover more combinations.

Current documentation for count and shuffle flags doesn't mention their relation, so the proposal (if accepted) will not break go test behaviour (docs from current master branch):

//	-count n
//	    Run each test, benchmark, and fuzz seed n times (default 1).
//	    If -cpu is set, run n times for each GOMAXPROCS value.
//	    Examples are always run once. -count does not apply to
//	    fuzz tests matched by -fuzz.
//

...

//
//	-shuffle off,on,N
//	    Randomize the execution order of tests and benchmarks.
//	    It is off by default. If -shuffle is set to on, then it will seed
//	    the randomizer using the system clock. If -shuffle is set to an
//	    integer N, then N will be used as the seed value. In both cases,
//	    the seed will be reported for reproducibility.
//

Thank you.

The text was updated successfully, but these errors were encountered:

cristaloleg · 2023-07-20T12:28:12Z

@rsc forgive me for a ping but can this be included into review meetings?

rsc · 2023-07-26T18:26:27Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2023-08-02T17:39:52Z

There is also a strange detail that -count=3 with tests A,B,C runs A,B,C,A,B,C,A,B,C, but with benchmarks X,Y,Z you get X,X,X,Y,Y,Y,Z,Z,Z. I wonder why we do that, and whether we should run the tests A,A,A,B,B,B,C,C,C too.

I agree that if we keep the tests A,B,C,A,B,C,A,B,C then we should at least shuffle the rounds independently.

But maybe for -shuffle=on -count=N we should make the full list of N*M things that will run and shuffle that entire list, so there is no longer a "round" boundary anywhere.

cristaloleg · 2023-08-03T06:52:47Z

tests A,B,C runs A,B,C,A,B,C,A,B,C, but with benchmarks X,Y,Z you get X,X,X,Y,Y,Y,Z,Z,Z.

For me this sounds like a separate proposal (however, running benchmark N times and switching to another sounds quite natural).

the full list of N*M things that will run and shuffle that entire list

Great point, this might also catch cases when running test twice in a row is buggy (pre/post-test initialisation contains a bug)

rsc · 2023-08-09T17:48:24Z

It sounds like maybe the proposal should be that -shuffle=on -count=N handles tests and benchmarks the same way, which is make a list of all the tests (alternately, benchmarks) with N copies each, and shuffle that entire list, and then run them in that order.

If you're running tests and benchmarks, they still run as two separate phases: first tests, then benchmarks.

If -cpu is involved, each cpu set still runs as separate phases around the shuffled sets. So -cpu=1,2,4 runs a shuffled set with cpu=1, a shuffled set (probably differently shuffled) with cpu=2, and then another with cpu=4.

We should probably check whether the current tests -count=3 doing A,B,C,A,B,C,A,B,C runs the parallel tests after each A,B,C or at the end of all of them. If the former then the shuffle will be a bit different, maybe too different.

cristaloleg · 2023-08-09T18:13:35Z

I haven't thought about -cpu flag. Don't have strong cons/pros why it should/not be involved here. Same for parallel tests.

However, I really like that tests gonna be ran in a more arbitrary order than before.

(I assume -p flag should not change the behaviour in any way)

TheCoreMan · 2023-08-14T08:34:53Z

Just putting in my 2 cents: whenever I used shuffle I expect the order to change between runs, even (or especially) when using -count. That's why I use "shuffle" 🎴

aclements · 2023-08-16T18:10:40Z

We should probably check whether the current tests -count=3 doing A,B,C,A,B,C,A,B,C runs the parallel tests after each A,B,C or at the end of all of them.

The answer appears to be that it runs A,B,C,Parallel,A,B,C,Parallel,etc. That is, parallel tests run after each A,B,C, not at the end of all of them.

That's a bit unfortunate for shuffling the whole list. I think that means for tests, we either need to shuffle within each "A,B,C,Parallel" (but independently for each count), or we need to put all of the parallel tests at the end (whether or not we're shuffling).

We could still do a whole-list shuffle for benchmarks. The behavior would be different between tests and benchmarks, but that's already true today, and beyond some pedagogical purity, I'm not sure it matters if they're different.

bcmills · 2023-08-16T18:36:47Z

One more option for shuffling would be to periodically toggle between Parallel and not. For example, we could shuffle in one “release parallel” event per -count, and then tack on one more “release parallel” event at the end of everything.

(When we hit a `“release parallel” event we would unblock all of the parallel tests buffered so far, wait for them to finish, and then resume running the next possibly-non-parallel test in shuffled order.)

rsc · 2023-08-30T17:56:35Z

Sounds like the simplest thing to do is say for tests we only shuffle between the individual sections and for benchmarks we shuffle the whole thing.

rsc · 2023-09-07T13:53:26Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc · 2023-10-03T00:51:10Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

cristaloleg added the Proposal label Jul 10, 2023

gopherbot added this to the Proposal milestone Jul 10, 2023

rsc added the Proposal-FinalCommentPeriod label Sep 7, 2023

rsc changed the title ~~proposal: testing: shuffle seed should be different when -shuffle=on and -count flag is set~~ testing: shuffle seed should be different when -shuffle=on and -count flag is set Oct 3, 2023

rsc modified the milestones: Proposal, Backlog Oct 3, 2023

rsc added Proposal-Accepted and removed Proposal-FinalCommentPeriod labels Oct 3, 2023

ucirello mentioned this issue Mar 31, 2024

proposal: testing: expose -test.shuffle seed in a public function. #66623

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

cristaloleg commented Jul 10, 2023

cristaloleg commented Jul 20, 2023 •

edited

rsc commented Jul 26, 2023

rsc commented Aug 2, 2023

cristaloleg commented Aug 3, 2023

rsc commented Aug 9, 2023

cristaloleg commented Aug 9, 2023

TheCoreMan commented Aug 14, 2023 •

edited

aclements commented Aug 16, 2023

bcmills commented Aug 16, 2023

rsc commented Aug 30, 2023

rsc commented Sep 7, 2023

rsc commented Oct 3, 2023

testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

Comments

cristaloleg commented Jul 10, 2023

cristaloleg commented Jul 20, 2023 • edited

rsc commented Jul 26, 2023

rsc commented Aug 2, 2023

cristaloleg commented Aug 3, 2023

rsc commented Aug 9, 2023

cristaloleg commented Aug 9, 2023

TheCoreMan commented Aug 14, 2023 • edited

aclements commented Aug 16, 2023

bcmills commented Aug 16, 2023

rsc commented Aug 30, 2023

rsc commented Sep 7, 2023

rsc commented Oct 3, 2023

cristaloleg commented Jul 20, 2023 •

edited

TheCoreMan commented Aug 14, 2023 •

edited