Wednesday, May 11, 2022

The 2022 Python Language Summit: Lightning talks

These were a series of short talks, each lasting around five minutes.


Read the rest of the 2022 Python Language Summit coverage here.



Lazy imports, with Carl Meyer

Carl Meyer, an engineer at Instagram, presented on a proposal that has since blossomed into PEP 690: lazy imports, a feature that has already been implemented in Cinder, Instagram’s performance-optimised fork of CPython 3.8.

What’s a lazy import? Meyer explained that the core difference with lazy imports is that the import does not happen until the imported object is referenced.

Examples

In the following Python module, spam.py, with lazy imports activated, the module eggs would never in fact be imported since eggs is never referenced after the import:


# spam.py

import sys
import eggs

def main():
    print("Doing some spammy things.")
    sys.exit(0)

if __name__ == "__main__":
    main()


And in this Python module, ham.py, with lazy imports activated, the function bacon_function is imported – but only right at the end of the script, after we’ve completed a for-loop that’s taken a very long time to finish:


# ham.py
import sys
import time
from bacon import bacon_function

def main():
    for _ in range(1_000_000_000):
        print('Doing hammy things')
        time.sleep(1)
    bacon_function()
    sys.exit(0)

if __name__ == "__main__":
    main()


Meyer revealed that the Instagram team’s work on lazy imports had resulted in startup time improvements of up to 70%, memory usage improvements of up to 40%, and the elimination of almost all import cycles within their code base. (This last point will be music to the ears of anybody who has worked on a Python project larger than a few modules.)

Downsides

Meyer also laid out a number of costs to having lazy imports. Lazy imports create the risk that ImportError (or any other error resulting from an unsuccessful import) could potentially be raised… anywhere. Import side effects could also become “even less predictable than they already weren’t”.

Lastly, Meyer noted, “If you’re not careful, your code might implicitly start to require it”. In other words, you might unexpectedly reach a stage where – because your code has been using lazy imports – it now no longer runs without the feature enabled, because your code base has become a huge, tangled mess of cyclic imports.
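As a sketch of that failure mode (the module names here are invented for illustration), the following script creates two modules that import names from each other. Under today's eager imports, the cycle fails immediately at import time; under lazy imports, the names would only be resolved at call time, so the cycle could creep into a code base unnoticed:

```python
import os
import sys
import tempfile
import textwrap

# Write two hypothetical modules that import names from each other.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "mod_a.py"), "w") as f:
    f.write(textwrap.dedent("""\
        from mod_b import helper  # needs mod_b, which in turn needs us

        def greet():
            return "hello from mod_a"
    """))
with open(os.path.join(tmp, "mod_b.py"), "w") as f:
    f.write(textwrap.dedent("""\
        from mod_a import greet  # mod_a is only partially initialised here

        def helper():
            return greet()
    """))

sys.path.insert(0, tmp)
try:
    import mod_a  # eager imports: the cycle fails at startup
    cycle_detected = False
except ImportError:
    cycle_detected = True

print("cycle detected:", cycle_detected)
```

With eager imports, the attempt to import `mod_a` raises an `ImportError`, because `mod_a` is only partially initialised when `mod_b` tries to pull `greet` out of it. A lazy-import scheme would defer both lookups until the functions are actually called, masking the cycle.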

Where next for lazy imports?

Python users who have opinions either for or against the proposal are encouraged to join the discussion on discuss.python.org.



Python-Dev versus Discourse, with Thomas Wouters

This was less of a talk, and more of an announcement.

Historically, if somebody wanted to make a significant change to CPython, they were required to post on the python-dev mailing list. The Steering Council now views the alternative discussion venue, discuss.python.org, as a superior forum in many respects.

Thomas Wouters, Core Developer and Steering Council member, said that the Steering Council was planning on loosening the requirements, stated in several places, that emails had to be sent to python-dev in order to make certain changes. Instead, they were hoping that discuss.python.org would become the authoritative discussion forum in the years to come.



Asks from Pyston, with Kevin Modzelewski

Kevin Modzelewski, core developer of the Pyston project, gave a short presentation on ways forward for CPython optimisations. Pyston is a performance-oriented fork of CPython 3.8.12.

Modzelewski argued that CPython needed better benchmarks; the existing benchmarks on pyperformance were “not great”. Modzelewski also warned that his “unsubstantiated hunch” was that the Faster CPython team had already accomplished “greater than one-half” of the optimisations that could be achieved within the current constraints. Modzelewski encouraged the attendees to consider future optimisations that might cause backwards-incompatible behaviour changes.



Core Development and the PSF, with Thomas Wouters

This was another short announcement from Thomas Wouters on behalf of the Steering Council. After sponsorship from Google providing funding for the first ever CPython Developer-In-Residence (Łukasz Langa), Meta has provided sponsorship for a second year. The Steering Council also now has sufficient funds to hire a second Developer-In-Residence – and attendees were notified that they were open to the idea of hiring somebody who was not currently a core developer.



“Forward classes”, with Larry Hastings

Larry Hastings, CPython core developer, gave a brief presentation on a proposal he had sent round to the python-dev mailing list in recent days: a “forward class” declaration that would avoid all issues with two competing typing PEPs: PEP 563 and PEP 649. In brief, the proposed syntax would look something like this:



forward class X()

continue class X:
    # class body goes here
    def __init__(self, key):
        self.key = key


In theory, according to Hastings, this syntax could avoid issues around runtime evaluation of annotations that have plagued PEP 563, while also circumventing many of the edge cases that unexpectedly fail in a world where PEP 649 is implemented.
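For context on the problem space (this is standard PEP 563 behaviour, not Hastings's proposal): under from __future__ import annotations, a class may refer to itself in its own annotations, but the annotation is stored as a plain string and only becomes the real class after an explicit resolution step at runtime:

```python
from __future__ import annotations

import typing

class Node:
    next_node: Node  # a forward self-reference: legal, but stored as a string

# The raw annotation is just the string 'Node'...
raw = Node.__annotations__["next_node"]

# ...and only evaluates to the real class after explicit resolution.
resolved = typing.get_type_hints(Node)["next_node"]

print(raw, resolved)
```

It is this gap between the stringified annotation and the runtime object that both PEP 649 and the "forward class" idea try to close in different ways.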

The idea was in its early stages, and reaction to the proposal was mixed. The next day, at the Typing Summit, there was more enthusiasm voiced for a plan laid out by Carl Meyer for a tweaked version of Hastings’s earlier attempt at solving this problem: PEP 649.



Better fields access, with Samuel Colvin

Samuel Colvin, maintainer of the Pydantic library, gave a short presentation on a proposal (recently discussed on discuss.python.org) to reduce name clashes between field names in a subclass, and method names in a base class.

The problem is simple. Suppose you're the maintainer of a library, whatever_library. You release Version 1 of your library, and a user starts using it to write classes like the following:

from whatever_library import BaseModel

class Farmer(BaseModel):
    name: str
    fields: list[str]


Both the user and the maintainer are happy, until the maintainer releases Version 2 of the library. Version 2 adds a method, .fields(), to BaseModel, which prints out all the field names of a subclass. But this creates a name clash with the user's existing code, which has fields as the name of an instance attribute rather than a method.

Colvin briefly sketched out an idea for a new way of looking up names that would make it unambiguous whether the name being accessed was a method or attribute.



class Farmer(BaseModel):
    $name: str
    $fields: list[str]

farmer = Farmer(name='Jones', fields=['meadow', 'highlands'])

print(farmer.$fields)   # -> ['meadow', 'highlands']
print(farmer.fields())  # -> ['name', 'fields']

The 2022 Python Language Summit: Achieving immortality

What does it mean to achieve immortality? At the 2022 Python Language Summit, Eddie Elizondo, an engineer at Instagram, and Eric Snow, CPython core developer, set out to explain just that.

Only for Python objects, though. Not for humans. That will have to be another PEP.



Objects in Python, as they currently stand

In Python, as is well known, everything is an object. This means that if you want to calculate even a simple sum, such as 194 + 3.14, the Python interpreter must create two objects: one object of type int representing the number 194, and another object of type float representing the number 3.14.

All objects in Python maintain a reference count: a running total of the number of active references to that object that currently exist in the program. If the reference count of an object drops to 0, the object is eventually destroyed (through a process known as garbage collection). This process ensures that programmers writing Python don’t normally need to concern themselves with manually deleting an object when they’re done with it. Instead, memory is automatically freed up.

The need to keep reference counts for all objects (along with a few other mutable fields on all objects) means that there is currently no way of having a “truly immutable” object in Python.

This is a distinction that only really applies at the C level. For example, the None singleton cannot be mutated at runtime at the Python level:


>>> None.__bool__ = lambda self: True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object attribute '__bool__' is read-only


However, at the C level, the object representing None mutates constantly, since its reference count changes every time a new reference to the singleton is created or destroyed.
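This is observable from Python itself. On an interpreter without immortal objects, merely storing new references to None changes its reference count; on CPython 3.12+, which later adopted PEP 683, the count stays pinned:

```python
import sys

before = sys.getrefcount(None)
refs = [None] * 100            # a hundred new references to the singleton
after = sys.getrefcount(None)

# On a pre-immortal CPython the count rises by exactly 100;
# on an interpreter with immortal objects it does not move at all.
print(after - before)
```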



Immortal objects

An “immortal object”, according to PEP 683 (written by Elizondo/Snow), is an object marked by the runtime as being effectively immutable, even at the C level. The reference count for an immortal object will never reach 0; thus, an immortal object will never be garbage-collected, and will never die.
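A rough Python sketch of the mechanism (the real change lives in CPython's C-level incref/decref code; the sentinel value and names here are purely illustrative, not PEP 683's actual choices):

```python
IMMORTAL_REFCNT = 2**32 - 1  # hypothetical sentinel marking an object immortal

class Obj:
    def __init__(self, immortal=False):
        self.refcnt = IMMORTAL_REFCNT if immortal else 1

def incref(obj):
    if obj.refcnt == IMMORTAL_REFCNT:
        return               # immortal: the count is never touched...
    obj.refcnt += 1

def decref(obj):
    if obj.refcnt == IMMORTAL_REFCNT:
        return               # ...so it can never reach 0 and be freed
    obj.refcnt -= 1

none_like = Obj(immortal=True)
for _ in range(1000):
    incref(none_like)
    decref(none_like)

assert none_like.refcnt == IMMORTAL_REFCNT  # unchanged: never garbage-collected
```

The extra equality check on every incref and decref is also the new branch in the reference-counting hot path that accounts for the performance cost Elizondo and Snow discuss.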


“The fundamental improvement here is that now an object can be truly immutable.”

Eddie Elizondo and Eric Snow, PEP 683


The lack of truly immutable objects in Python, PEP 683 explains, “can have a large negative impact on CPU and memory performance, especially for approaches to increasing Python’s scalability”.



The benefits of immortality

At their talk at the Python Language Summit, Elizondo and Snow laid out a number of benefits that their proposed changes could bring.

Guaranteeing “true memory immutability”, Elizondo explained, “we can simplify and enable larger initiatives,” including Eric Snow’s proposal for a per-interpreter GIL, but also Sam Gross’s proposal for a version of Python that operates without the GIL entirely. The proposal could also unlock new optimisation techniques in the future by helping create new ways of thinking about problems in the CPython code base.






The costs

A naive implementation of immortal objects is costly, resulting in performance regressions of around 6%. This is mainly due to adding a new branch to the code that keeps track of an object’s reference count.

With mitigations, however, Elizondo and Snow explained that the performance regression could be reduced to around 2%. The question they posed to the assembled developers in the audience was whether this was an “acceptable” performance regression – and, if not, what would be?



Reception

The proposal was greeted with a mix of curious interest and healthy scepticism. There was agreement that certain aspects of the proposal would reach wide support among the community, and consensus that a performance regression of 1-2% would be acceptable if clear benefits could be shown. However, there was also concern that parts of the proposal would change semantics in a backwards-incompatible way.

Pablo Galindo Salgado, Release Manager for Python 3.10/3.11 and CPython Core Developer, worried that all the benefits laid out by the speakers were only potential benefits, and asked for more specifics. He pointed out that changing the semantics of reference-counting would be likely to break an awful lot of projects, given that popular third-party libraries such as numpy, for example, use C extensions which continuously check reference counts.

Thomas Wouters, CPython Core Developer and Steering Council Member, concurred, saying that it probably “wasn’t possible” to make these changes without changing the stable ABI. Kevin Modzelewski, a maintainer of Pyston, a performance-oriented fork of Python 3.8, noted that Pyston had had immortal objects for a while – but Pyston had never made any promise to support the stable ABI, freeing the project of that constraint.