BUG: random: Fix long delays/hangs with zipf(a) when a near 1.#27048
Merged
charris merged 2 commits intonumpy:mainfrom Jul 26, 2024
Merged
BUG: random: Fix long delays/hangs with zipf(a) when a near 1.#27048charris merged 2 commits intonumpy:mainfrom
charris merged 2 commits intonumpy:mainfrom
Conversation
charris
approved these changes
Jul 26, 2024
Member
|
LGTM. I suspect that in practical use, most folks would only be interested in the smaller integers, ranking, for instance, so it would probably be useful to allow the user to set the truncation point with a new argument. |
Member
Author
|
The second commit restores the legacy implementation of |
Member
|
Thanks Warren. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The problem reported in gh-9829 is that
zipf(a)hangs whenais very close to 1.The implementation is based on the rejection method described in the text "Non-Uniform Random Variate Generation" by Luc Devroye. The candidate X is generated from a uniform random variate U in (0, 1] with this code:
X is rejected if it exceeds the largest value of the integer return type. (As noted in the code comments, the
zipffunction models a Zipf distribution truncated toRAND_INT_MAX.)The problem is that when
ais close to 1,1 / (a - 1)is large, and when U is sufficiently small, X will be much larger than RAND_INT_MAX and X can even overflow to infinity. The closerais to 1, the more likely this is to happen, resulting in the code spending a lot of time rejecting values that are too big.The fix is straightforward: for the given
a, work backwards fromRAND_INT_MAXto find the values of U that will cause X to be too big, and eliminate those values from the uniform variate U. It is the smaller values of U that result in bigger values of X, so we need to find Umin, below which X will be too big, and then sample U from the interval (Umin, 1] instead of (0, 1]. By solvingwe find
So the fix in the code is to initialize Umin before entering the loop with:
and to replace the generation of U with
(or something equivalent) so U is sampled from the interval (Umin, 1].
With this change, an example as extreme as
a = np.nextafter(1.0, 2.0)does not cause a problem:Closes gh-9829.