Investigate performance concerns raised between 1.4 and 4.3 #324

@jaraco

In bpo-44246, @asottile writes:

here's the performance regressions, they affect any callers of distributions() and are even worse on callers of the new apis.

a call to distributions() is about 3x slower than in 3.9

here is the setup I am using:

virtualenv venv39 -ppython3.9
venv39/bin/pip install flake8 pytest twine pre-commit
virtualenv venv310 -ppython3.10
venv310/bin/pip install flake8 pytest twine pre-commit

to test just the distributions() call I'm using the following:

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
20 loops, best of 20: 12.5 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
20 loops, best of 20: 36.7 msec per loop

this is a less-extreme example, many applications have more dependencies installed -- but even in this case this is adding ~24ms startup to any application using entry_points() -- and it gets worse
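For anyone who wants to repeat the measurement from inside Python rather than through `python -m timeit`, here is a minimal sketch. The helper name `best_msec` is hypothetical (not part of any library); it only wraps the standard `timeit.repeat` and reports the best per-loop time the same way the command line does.

```python
import timeit
from importlib.metadata import entry_points


def best_msec(func, number=20, repeat=20):
    """Return the best per-loop time in milliseconds, like `python -m timeit`."""
    # timeit.repeat returns one total (in seconds) per repetition,
    # each total covering `number` calls of `func`.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1000.0


if __name__ == "__main__":
    # Small counts keep the demo quick; use number=20, repeat=20
    # to mirror the -n 20 -r 20 commands above.
    print(f"entry_points(): {best_msec(entry_points, number=5, repeat=3):.1f} msec per loop")
```

Running the same script under each virtualenv's interpreter should give numbers comparable to the transcripts, though absolute values depend on the machine and on what is installed.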

the return value of entry_points() alone isn't all that useful, next an application needs to retrieve its entry points. let's start with the somewhat normal case of retrieving a single category of entry points:

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()["flake8.extension"]'
20 loops, best of 20: 12.7 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points(name="flake8.extension")'
20 loops, best of 20: 37.1 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points().select(group="flake8.extension")'
20 loops, best of 20: 37.1 msec per loop

again, 3x slower and very real time to the end user (~24-25ms)
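The commands above use two different API shapes: a dict lookup on 3.9 and `.select(group=...)` on 3.10. A rough sketch of a lookup that tolerates both is below. The function name `entry_points_for` is hypothetical, and the `hasattr` check is just one assumed way to tell the two return types apart.

```python
from importlib.metadata import entry_points


def entry_points_for(group):
    """Return the entry points in `group` as a tuple, on old and new APIs."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ (newer importlib_metadata): selectable result.
        return tuple(eps.select(group=group))
    # Python 3.8/3.9: a plain dict mapping group name to entry points.
    return tuple(eps.get(group, ()))


if __name__ == "__main__":
    for ep in entry_points_for("flake8.extension"):
        print(ep.name, "=", ep.value)
```

Either branch still pays for the full `entry_points()` scan first, which is where the timings above suggest most of the cost sits.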

now let's show an example usage that something like flake8 uses where multiple groups are requested (this is common for apps and plugin systems which provide multiple distinct functionalities)

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps["flake8.extension"]; eps["flake8.report"]'
20 loops, best of 20: 12.6 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps.select(group="flake8.extension"); eps.select(group="flake8.report")'
20 loops, best of 20: 38.2 msec per loop

also slower, but an additional ms per call to .select(...)
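To see how much of the total is the initial `entry_points()` call and how much is each group lookup, the two can be timed separately. This is only a sketch: `lookup` and `split_costs` are hypothetical helper names, and the small default loop counts are an assumption chosen to keep it quick.

```python
import timeit
from importlib.metadata import entry_points


def lookup(eps, group):
    """Filter an already-built entry_points() result down to one group."""
    if hasattr(eps, "select"):
        return eps.select(group=group)  # selectable result (3.10+)
    return eps.get(group, ())  # plain dict (3.8/3.9)


def split_costs(group, number=5, repeat=3):
    """Return (build_msec, lookup_msec), each the best per-loop time."""
    eps = entry_points()  # built once, reused for every lookup below
    build = min(timeit.repeat(entry_points, number=number, repeat=repeat))
    select = min(
        timeit.repeat(lambda: lookup(eps, group), number=number, repeat=repeat)
    )
    return build / number * 1000.0, select / number * 1000.0


if __name__ == "__main__":
    build_ms, lookup_ms = split_costs("flake8.extension")
    print(f"entry_points(): {build_ms:.2f} msec, per-group lookup: {lookup_ms:.2f} msec")
```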

and it only gets worse with more and more packages installed

here's the versions I'm using to ensure they are up to date:

$ venv39/bin/python --version --version
Python 3.9.5 (default, May 19 2021, 11:32:47) 
[GCC 9.3.0]
$ venv310/bin/python --version --version
Python 3.10.0b2 (default, Jun  2 2021, 00:22:18) [GCC 9.3.0]

Python 3.10.0b2 maps to importlib_metadata 4.3 and Python 3.9.5 maps to importlib_metadata 1.4. Let's investigate which factors affect performance and which use cases are impacted.
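One possible starting point for that investigation is to profile a single `entry_points()` call and look at where the cumulative time goes. The sketch below uses only the standard `cProfile` and `pstats` modules; the function name `profile_entry_points` is hypothetical, and profiling overhead will inflate the absolute numbers.

```python
import cProfile
import io
import pstats
from importlib.metadata import entry_points


def profile_entry_points(limit=10):
    """Profile one entry_points() call and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.runcall(entry_points)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_entry_points())
```

Comparing the report from each interpreter in the setup above may help show which code paths grew between the two versions.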

Labels: invalid (This doesn't seem right)
