|
msg345077 - (view) |
Author: Mark Shannon (Mark.Shannon) *  |
Date: 2019-06-09 09:23 |
PEP 590 allows us to short-circuit the __new__/__init__ slow path for commonly created builtin types.
As an initial step, we can speed up calls to range, list and dict by about 30%.
See https://gist.github.com/markshannon/5cef3a74369391f6ef937d52cca9bfc8
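For anyone who wants to check the effect locally, here is a minimal sketch using only the stdlib `timeit` module (not pyperf, which the later messages use). The helper name `time_constructors` and the chosen statements are my own assumptions, not part of the gist; absolute numbers will differ per machine and build, so compare two interpreters rather than trusting a single run.

```python
import timeit


def time_constructors(number=200_000, repeat=5):
    """Return the best-of-`repeat` time per call, in nanoseconds."""
    statements = ["range(10)", "list()", "dict()"]
    results = {}
    for stmt in statements:
        # Taking the minimum over several repeats reduces scheduling noise.
        best = min(timeit.repeat(stmt, number=number, repeat=repeat))
        results[stmt] = best / number * 1e9
    return results


if __name__ == "__main__":
    for stmt, ns in time_constructors().items():
        print(f"{stmt:10s} {ns:8.1f} ns per call")
```

Running the same script under a build with and without the change gives a rough before/after comparison.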
|
|
msg347272 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2019-07-04 11:11 |
Can we call tp_call instead of vectorcall when kwargs is not empty?
https://github.com/python/cpython/blob/7f41c8e0dd237d1f3f0a1d2ba2f3ee4e4bd400a7/Objects/call.c#L209-L219
For example, dict_init may be faster than dict_vectorcall when `d2 = dict(**d1)`.
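To make the concern concrete, here is a rough pure-Python model of what happens to a kwargs dict on the vectorcall path. This is only a sketch of the idea, not CPython's C code: `unpack_kwargs` and `dict_from_vectorcall` are hypothetical names. The dict is flattened into a values array plus a tuple of names, and the callee then has to re-insert every item.

```python
def unpack_kwargs(args, kwargs):
    """Model: turn (args tuple, kwargs dict) into the vectorcall layout,
    i.e. one flat sequence of values plus a tuple of keyword names."""
    stack = list(args) + list(kwargs.values())
    kwnames = tuple(kwargs)
    return stack, kwnames


def dict_from_vectorcall(stack, kwnames):
    """Model: a dict constructor receiving the flat layout has to
    insert each keyword item one by one."""
    nargs = len(stack) - len(kwnames)
    result = dict(*stack[:nargs])  # zero or one positional argument
    for name, value in zip(kwnames, stack[nargs:]):
        result[name] = value
    return result


d1 = {"a": 1, "b": 2}
stack, kwnames = unpack_kwargs((), d1)
d2 = dict_from_vectorcall(stack, kwnames)
print(stack, kwnames, d2)
```

With tp_call, the kwargs dict could instead be handed to the initializer as a dict, which is why `dict(**d1)` may be cheaper on that path.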
|
|
msg347336 - (view) |
Author: Jeroen Demeyer (jdemeyer) *  |
Date: 2019-07-05 12:31 |
One thing that keeps bothering me when using vectorcall for type.__call__ is that we would have two completely independent code paths for constructing an object: the new one using vectorcall and the old one using tp_call, which in turn calls tp_new and tp_init.
In typical vectorcall usages, there is no need to support the old way any longer: we can set tp_call = PyVectorcall_Call and that's it. But for "type", we still need to support tp_new and tp_init because there may be C code out there that calls tp_new/tp_init directly. To give one concrete example: collections.defaultdict calls PyDict_Type.tp_init
One solution is to keep the old code for tp_new/tp_init. This is what Mark did in PR 13930. But this leads to duplication of functionality and is therefore error-prone (different code paths may have subtly different behaviour).
Since we don't want to break Python code calling dict.__new__ or dict.__init__, not implementing those is not an option. But to be compatible with the vectorcall signature, ideally we want to implement __init__ using METH_FASTCALL, so __init__ would need to be a normal method instead of a slot wrapper of tp_init (similar to Python classes). This would work, but it needs some support in typeobject.c
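For readers less familiar with the old path, this is roughly what type.__call__ does via tp_call, written as a Python model. It is a simplification under my own assumptions: the helper name `construct` is made up, the one-argument `type(x)` special case is omitted, and the C code does a subtype check rather than `isinstance`.

```python
def construct(cls, *args, **kwargs):
    """Rough Python model of the classic type.__call__ path."""
    obj = cls.__new__(cls, *args, **kwargs)  # corresponds to tp_new
    if isinstance(obj, cls):
        # tp_init only runs when tp_new returned an instance of cls.
        type(obj).__init__(obj, *args, **kwargs)  # corresponds to tp_init
    return obj


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


print(construct(dict, a=1))
print(construct(list, "ab"))
p = construct(Point, 1, 2)
print(p.x, p.y)

# Code may also call the two steps directly, so both must keep working:
d = dict.__new__(dict)
dict.__init__(d, a=1)
print(d)
```

A vectorcall implementation for a builtin type bypasses both steps, which is where the second, independent code path comes from.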
|
|
msg349809 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-08-15 15:49 |
New changeset 37806f404f57b234902f0c8de9a04647ad01b7f1 by Miss Islington (bot) (Jeroen Demeyer) in branch 'master':
bpo-37207: enable vectorcall for type.__call__ (GH-14588)
https://github.com/python/cpython/commit/37806f404f57b234902f0c8de9a04647ad01b7f1
|
|
msg352133 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2019-09-12 11:48 |
$ ./python -m pyperf timeit --compare-to ./python-master 'dict()'
python-master: ..................... 89.9 ns +- 1.2 ns
python: ..................... 72.5 ns +- 1.6 ns
Mean +- std dev: [python-master] 89.9 ns +- 1.2 ns -> [python] 72.5 ns +- 1.6 ns: 1.24x faster (-19%)
$ ./python -m pyperf timeit --compare-to ./python-master -s 'import string; a=dict.fromkeys(string.ascii_lowercase); b=dict.fromkeys(string.ascii_uppercase)' -- 'dict(a, **b)'
python-master: ..................... 1.41 us +- 0.04 us
python: ..................... 1.53 us +- 0.04 us
Mean +- std dev: [python-master] 1.41 us +- 0.04 us -> [python] 1.53 us +- 0.04 us: 1.09x slower (+9%)
---
There is some overhead in the old dict-merging idiom, but it seems reasonable compared to the benefit. LGTM.
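For context, the "old dict merging idiom" measured above is `dict(a, **b)`. A small sketch of what it does and its main restriction follows; the variable names `merged_old`, `merged_new` and `string_keys_only` are mine, and the setup mirrors the pyperf command line.

```python
import string

a = dict.fromkeys(string.ascii_lowercase)
b = dict.fromkeys(string.ascii_uppercase)

merged_old = dict(a, **b)  # the idiom timed above
merged_new = {**a, **b}    # unpacking form, available since Python 3.5

print(merged_old == merged_new, len(merged_old))

# The old idiom only accepts string keys in the second mapping,
# because those keys are passed as keyword arguments.
string_keys_only = False
try:
    dict(a, **{1: "x"})
except TypeError as exc:
    string_keys_only = True
    print("TypeError:", exc)
```

Since the second mapping travels as keyword arguments, this is exactly the case where the kwargs handling of the vectorcall path shows up in the timing.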
|
|
msg362219 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-02-18 15:13 |
New changeset 6e35da976370e7c2e028165c65d7d7d42772a71f by Petr Viktorin in branch 'master':
bpo-37207: Use vectorcall for range() (GH-18464)
https://github.com/python/cpython/commit/6e35da976370e7c2e028165c65d7d7d42772a71f
|
|
msg364095 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-13 13:57 |
New changeset 9ee88cde1abf7f274cc55a0571b1c2cdb1263743 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up tuple() (GH-18936)
https://github.com/python/cpython/commit/9ee88cde1abf7f274cc55a0571b1c2cdb1263743
|
|
msg364322 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-16 14:04 |
New changeset c98f87fc330eb40fbcff627dfc50958785a44f35 by Dong-hee Na in branch 'master':
bpo-37207: Use _PyArg_CheckPositional() for tuple vectorcall (GH-18986)
https://github.com/python/cpython/commit/c98f87fc330eb40fbcff627dfc50958785a44f35
|
|
msg364324 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-16 14:06 |
New changeset 87ec86c425a5cd3ad41b831b54c0ce1a0c363f4b by Dong-hee Na in branch 'master':
bpo-37207: Add _PyArg_NoKwnames() helper function (GH-18980)
https://github.com/python/cpython/commit/87ec86c425a5cd3ad41b831b54c0ce1a0c363f4b
|
|
msg364340 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-16 17:17 |
New changeset 6ff79f65820031b219622faea8425edaec9a43f3 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up set() constructor (GH-19019)
https://github.com/python/cpython/commit/6ff79f65820031b219622faea8425edaec9a43f3
|
|
msg364428 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-03-17 13:58 |
Victor,
frozenset is the last basic builtin collection to which this improvement has not been applied yet.
frozenset also shows a similar performance improvement when using vectorcall:
pyperf compare_to master.json bpo-37207.json
Mean +- std dev: [master] 2.26 us +- 0.06 us -> [bpo-37207] 2.06 us +- 0.05 us: 1.09x faster (-9%)
> What I mean is that vectorcall should not be used for everything
I definitely agree with this opinion, so I am asking for your opinion before submitting the patch.
frozenset is used less frequently than list/set/dict,
but it is also a basic builtin collection, so IMHO it is okay to apply vectorcall.
What do you think?
|
|
msg364447 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-17 16:55 |
> What do you think?
I would prefer to see a PR to give my opinion :)
|
|
msg364538 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-18 17:30 |
New changeset 1c60567b9a4c8f77e730de9d22690d8e68d7e5f6 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up frozenset() (GH-19053)
https://github.com/python/cpython/commit/1c60567b9a4c8f77e730de9d22690d8e68d7e5f6
|
|
msg364808 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-22 16:03 |
Remaining issue: optimize list(iterable), PR 18928. I reviewed the PR and I'm waiting for Petr.
|
|
msg365307 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-30 12:16 |
New changeset ce105541f8ebcf2dffcadedfdeffdb698a0edb44 by Petr Viktorin in branch 'master':
bpo-37207: Use vectorcall for list() (GH-18928)
https://github.com/python/cpython/commit/ce105541f8ebcf2dffcadedfdeffdb698a0edb44
|
|
msg365309 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-30 12:18 |
All PRs are now merged. Thanks to everybody who was involved in this issue. It's a nice speedup which is always good to take ;-)
|
|
msg365385 - (view) |
Author: Petr Viktorin (petr.viktorin) *  |
Date: 2020-03-31 12:43 |
The change to dict() was not covered by the smaller PRs.
That one will need more thought, but AFAIK it wasn't yet rejected.
|
|
msg365387 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-03-31 12:44 |
Oh sorry, I missed the dict.
|
|
msg365448 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-04-01 03:24 |
@vstinner @petr.viktorin
I'd like to experiment with vectorcall for dict and finalize the work.
May I proceed with it?
|
|
msg365452 - (view) |
Author: Petr Viktorin (petr.viktorin) *  |
Date: 2020-04-01 08:24 |
Definitely!
|
|
msg365488 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-04-01 15:39 |
+------------------+-------------------+-----------------------------+
| Benchmark        | master-dict-empty | bpo-37207-dict-empty        |
+==================+===================+=============================+
| bench dict empty | 502 ns            | 443 ns: 1.13x faster (-12%) |
+------------------+-------------------+-----------------------------+

+------------------+--------------------+-----------------------------+
| Benchmark        | master-dict-update | bpo-37207-dict-update       |
+==================+====================+=============================+
| bench dict empty | 497 ns             | 425 ns: 1.17x faster (-15%) |
+------------------+--------------------+-----------------------------+

+--------------------+---------------------+-----------------------------+
| Benchmark          | master-dict-kwnames | bpo-37207-dict-kwnames      |
+====================+=====================+=============================+
| bench dict kwnames | 1.38 us             | 917 ns: 1.51x faster (-34%) |
+--------------------+---------------------+-----------------------------+
|
|
msg365489 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-04-01 15:40 |
@vstinner @petr.viktorin
It looks like the benchmarks show very impressive results.
Can I submit the patch?
|
|
msg365490 - (view) |
Author: Petr Viktorin (petr.viktorin) *  |
Date: 2020-04-01 15:45 |
> Can I submit the patch?
Yes!
If you think a patch is ready for review, just submit it. There's not much we can comment on before we see the code :)
(I hope that doesn't contradict what your mentor says...)
|
|
msg365491 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-04-01 15:48 |
When I designed the FASTCALL calling convention, I experimented with a new tp_fastcall slot in PyTypeObject to optimize the __call__() method: bpo-29259.
Results on the pyperformance benchmark suite were not really convincing and I had technical issues (deciding whether tp_call or tp_fastcall should be called, handling ABI compatibility and backward compatibility, etc.). I decided to give up on this idea.
I'm happy to see that PEP 590 managed to find its way into Python internals and actually make Python faster ;-)
|
|
msg365545 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-04-02 00:55 |
New changeset e27916b1fc0364e3627438df48550c16f0b80b82 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up dict() (GH-19280)
https://github.com/python/cpython/commit/e27916b1fc0364e3627438df48550c16f0b80b82
|
|
msg365546 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-04-02 00:56 |
Can we now close this issue? Or does someone plan to push further optimizations? Maybe new issues can be opened for the next optimizations.
|
|
msg365553 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-04-02 01:31 |
> When I designed the FASTCALL calling convention, I experimented with a new tp_fastcall slot in PyTypeObject to optimize the __call__() method: bpo-29259.
Ah, by the way, I also made an attempt to use the FASTCALL calling convention for tp_new and tp_init: bpo-29358. Again, the speedup wasn't obvious and the implementation was quite complicated with many corner cases. So I gave up on this one. It didn't seem to be really worth it.
|
|
msg365811 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-04-05 05:15 |
IMHO, we can close this issue.
Summary:
PEP 590 vectorcall is now applied to list, tuple, dict, set, frozenset and range.
If someone wants to apply PEP 590 to other cases, please open a new issue for it!
Thank you, Mark, Jeroen, Petr and everyone who worked on this issue.
|
|
msg365907 - (view) |
Author: Petr Viktorin (petr.viktorin) *  |
Date: 2020-04-07 14:44 |
As discussed briefly in Mark's PR, benchmarks like this are now slower:
ret = dict(**{'a': 2, 'b': 4, 'c': 6, 'd': 8})
Python 3.8: Mean +- std dev: 281 ns +- 9 ns
master: Mean +- std dev: 456 ns +- 14 ns
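A minimal way to reproduce this kind of comparison with the stdlib only is sketched below. The `CASES` table and the `measure` helper are my own additions: only the first statement comes from the benchmark above, and the other two are baselines I chose for contrast. This is not the pyperf setup used for the numbers quoted here.

```python
import timeit

# Statement from the message above, plus two assumed baselines.
CASES = {
    "dict(**literal)": "dict(**{'a': 2, 'b': 4, 'c': 6, 'd': 8})",
    "dict(a=2, ...)": "dict(a=2, b=4, c=6, d=8)",
    "plain literal": "{'a': 2, 'b': 4, 'c': 6, 'd': 8}",
}


def measure(number=200_000, repeat=5):
    """Return the best-of-`repeat` time per evaluation, in nanoseconds."""
    results = {}
    for label, stmt in CASES.items():
        best = min(timeit.repeat(stmt, number=number, repeat=repeat))
        results[label] = best / number * 1e9
    return results


if __name__ == "__main__":
    for label, ns in measure().items():
        print(f"{label:16s} {ns:8.1f} ns")
```

All three statements build the same dict, so any difference between them is call overhead rather than a difference in the result.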
|
|
msg373095 - (view) |
Author: Łukasz Langa (lukasz.langa) *  |
Date: 2020-07-06 11:22 |
New changeset b4a9263708cc67c98c4d53b16933f6e5dd07990f by Dong-hee Na in branch 'master':
bpo-37207: Update whatsnews for 3.9 (GH-21337)
https://github.com/python/cpython/commit/b4a9263708cc67c98c4d53b16933f6e5dd07990f
|
|
msg373116 - (view) |
Author: Dong-hee Na (corona10) *  |
Date: 2020-07-06 13:32 |
New changeset 97558d6b08a656eae209d49b206f703cee0359a2 by Dong-hee Na in branch '3.9':
[3.9] bpo-37207: Update whatsnews for 3.9 (GH-21337)
https://github.com/python/cpython/commit/97558d6b08a656eae209d49b206f703cee0359a2
|