Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256

Open
cristaloleg opened this issue Jul 10, 2023 · 12 comments

Comments

@cristaloleg
Copy link

This proposal is a clarification of the original test shuffle flag proposal #28592

Today I was debugging tests that were rarely failing due to cross-dependency between test functions. To verify the fix I noticed that go test -shuffle=on -count=100 runs tests in the same order as the first iteration. In other words -test.shuffle value is generated once and is reused for the other 99 test runs.

This makes sense when the shuffle seed is set explicitly (-shuffle=X ) but for random value (-shuffle=on) we should generate a new one to increase test order randomness to cover more combinations.

Current documentation for count and shuffle flags doesn't mention their relation, so the proposal (if accepted) will not break go test behaviour (docs from current master branch):

//	-count n
//	    Run each test, benchmark, and fuzz seed n times (default 1).
//	    If -cpu is set, run n times for each GOMAXPROCS value.
//	    Examples are always run once. -count does not apply to
//	    fuzz tests matched by -fuzz.
//

...

//
//	-shuffle off,on,N
//	    Randomize the execution order of tests and benchmarks.
//	    It is off by default. If -shuffle is set to on, then it will seed
//	    the randomizer using the system clock. If -shuffle is set to an
//	    integer N, then N will be used as the seed value. In both cases,
//	    the seed will be reported for reproducibility.
//

Thank you.

@gopherbot gopherbot added this to the Proposal milestone Jul 10, 2023
@cristaloleg
Copy link
Author

cristaloleg commented Jul 20, 2023

@rsc forgive me for a ping but can this be included into review meetings?

@rsc
Copy link
Contributor

rsc commented Jul 26, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Aug 2, 2023

There is also a strange detail that -count=3 with tests A,B,C runs A,B,C,A,B,C,A,B,C, but with benchmarks X,Y,Z you get X,X,X,Y,Y,Y,Z,Z,Z. I wonder why we do that, and whether we should run the tests A,A,A,B,B,B,C,C,C too.

I agree that if we keep the tests A,B,C,A,B,C,A,B,C then we should at least shuffle the rounds independently.

But maybe for -shuffle=on -count=N we should make the full list of N*M things that will run and shuffle that entire list, so there is no longer a "round" boundary anywhere.

@cristaloleg
Copy link
Author

tests A,B,C runs A,B,C,A,B,C,A,B,C, but with benchmarks X,Y,Z you get X,X,X,Y,Y,Y,Z,Z,Z.

For me this sounds like a separate proposal (however, running benchmark N times and switching to another sounds quite natural).

the full list of N*M things that will run and shuffle that entire list

Great point, this might also catch cases when running test twice in a row is buggy (pre/post-test initialisation contains a bug)

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

It sounds like maybe the proposal should be that -shuffle=on -count=N handles tests and benchmarks the same way, which is make a list of all the tests (alternately, benchmarks) with N copies each, and shuffle that entire list, and then run them in that order.

If you're running tests and benchmarks, they still run as two separate phases: first tests, then benchmarks.

If -cpu is involved, each cpu set still runs as separate phases around the shuffled sets. So -cpu=1,2,4 runs a shuffled set with cpu=1, a shuffled set (probably differently shuffled) with cpu=2, and then another with cpu=4.

We should probably check whether the current tests -count=3 doing A,B,C,A,B,C,A,B,C runs the parallel tests after each A,B,C or at the end of all of them. If the former then the shuffle will be a bit different, maybe too different.

@cristaloleg
Copy link
Author

I haven't thought about -cpu flag. Don't have strong cons/pros why it should/not be involved here. Same for parallel tests.

However, I really like that tests gonna be ran in a more arbitrary order than before.

(I assume -p flag should not change the behaviour in any way)

@TheCoreMan
Copy link

TheCoreMan commented Aug 14, 2023

Just putting in my 2 cents: whenever I used shuffle I expect the order to change between runs, even (or especially) when using -count. That's why I use "shuffle" 🎴

@aclements
Copy link
Member

We should probably check whether the current tests -count=3 doing A,B,C,A,B,C,A,B,C runs the parallel tests after each A,B,C or at the end of all of them.

The answer appears to be that it runs A,B,C,Parallel,A,B,C,Parallel,etc. That is, parallel tests run after each A,B,C, not at the end of all of them.

That's a bit unfortunate for shuffling the whole list. I think that means for tests, we either need to shuffle within each "A,B,C,Parallel" (but independently for each count), or we need to put all of the parallel tests at the end (whether or not we're shuffling).

We could still do a whole-list shuffle for benchmarks. The behavior would be different between tests and benchmarks, but that's already true today, and beyond some pedagogical purity, I'm not sure it matters if they're different.

@bcmills
Copy link
Contributor

bcmills commented Aug 16, 2023

One more option for shuffling would be to periodically toggle between Parallel and not. For example, we could shuffle in one “release parallel” event per -count, and then tack on one more “release parallel” event at the end of everything.

(When we hit a `“release parallel” event we would unblock all of the parallel tests buffered so far, wait for them to finish, and then resume running the next possibly-non-parallel test in shuffled order.)

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

Sounds like the simplest thing to do is say for tests we only shuffle between the individual sections and for benchmarks we shuffle the whole thing.

@rsc
Copy link
Contributor

rsc commented Sep 7, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Oct 3, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: testing: shuffle seed should be different when -shuffle=on and -count flag is set testing: shuffle seed should be different when -shuffle=on and -count flag is set Oct 3, 2023
@rsc rsc modified the milestones: Proposal, Backlog Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

6 participants