ENH: Added 'doane' and 'sqrt' estimators to np.histogram in numpy.function_base#7090
Conversation
numpy/lib/function_base.py
Outdated
There was a problem hiding this comment.
Please try to keep the line shorter then 80 characters.
EDIT: if you would clean up the tests in that regard too, would be nice, but I won't press the issue.
There was a problem hiding this comment.
Fixed up the line width and some other stuff in the docs.
|
I read a little more carefully and reconciled the issues numerical stability and excessive temp arrays. The updated version of the formula is now I have also fixed up Scott's formula based on Ward's paper cited in the Cross Validated answer (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.9725). It is a good sign that none of the tests changed or failed for this commit. |
|
☔ The latest upstream changes (presumably #6656) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Looks like the release notes have a conflict. Could you rebase? |
0526cdd to
a14bb52
Compare
|
Rebased and squashed. |
|
Any further comments? |
|
☔ The latest upstream changes (presumably #7181) made this pull request unmergeable. Please resolve the merge conflicts. |
a14bb52 to
ae8ab99
Compare
|
Squawk. |
ae8ab99 to
a14bb52
Compare
a14bb52 to
b8b5561
Compare
|
This PR fortuitously fixes the following mishap: https://github.com/numpy/numpy/blob/master/numpy/lib/nanfunctions.py#L922. Not a reason to pull it in in and of itself, but I just noticed that after a rebase and thought I should mention it somewhere without opening another issue/PR. |
|
OK, sorry for letting this sit. I am a bit overloaded generally right now. Since the only real problem I have with it is "do we really need it", and it is only half public, nor seems a big maintenance burden, I will merge this. If anyone thinks we are adding too many histogram estimators, or finds doane too weird (@josef-pkt if you want to check it go ahead, I never got around checking the original paper, but I think I am willing to trust @madphysicist and this stackexchange guy by now), we can revert the stuff. |
ENH: Added 'doane' and 'sqrt' estimators to np.histogram in numpy.function_base
|
Sorry, I lost an unfinished reply a while ago. roughly: I thought some examples would be useful However, I saw that the current The Doane paper goes on a lot about finding "nice" numbers, rounding to round numbers mutliple of 5, 100, .... That sounds like a lot of effort for a very doubtful outcome, given that the R function that does something similar (I never tried) is criticized in the related references. So, I don't have any objections to adding options, but they also won't be a huge improvement. What might be useful in general is to switch to "auto" as default. |
|
Yeah, I had already got the gist of, doane not being a big deal/improvement. I would like switching to "auto" as default, it sounds just so much more useful then the default of 10. But I am not sure it won't be too painful for downstream. If someone whished to attempt it, I think I could be fine with trying. But be prepared to having to undo it or the mailing list already disagreeing.... |
|
One more "impression" I had looking at some references |
|
Sounds all interesting :). With the "auto" option, it could also be something that fits a bit more to downstream (i.e. plotting in matplotlib), though it might be a bit tricky to give a different bins keyword based on numpy version. |
|
It would also be possible to provide a FutureWarning, that users should explicitly specify the option. |
These are a couple of estimators I find myself using sometimes. They do not break existing code and tests show that they work sensibly.