Thursday, May 26, 2022

PyCon JP Association Awarded the PSF Community Service Award for Q4 2021

The PyCon JP Association team, from left to right: Takayuki, Shunsuke, Jonas, Manabu, Takanori


The PyCon JP Association team was awarded the 2021 Q4 Community Service Award.

RESOLVED, that the Python Software Foundation award the 2021 Q4 Community Service Award to the following members of the PyCon JP Association for their work in organizing local and regional PyCons, the work of monitoring our trademarks, and in particular organizing the "PyCon JP Charity Talks" raising more than $25,000 USD in funds for the PSF: Manabu Terada, Takanori Suzuki, Takayuki Shimizukawa, Shunsuke Yoshida, Jonas Obrist.

We interviewed Jonas Obrist on behalf of the PyCon JP Association to learn more about their inspiration, their work with the Python community, and supporting the Python Software Foundation - PSF. Débora Azevedo also contributed to this article, by sharing more light on the PyCon JP Association's efforts and commitment to the Python community.

Jonas Obrist speaks on the motivation behind the PyCon JP Association

What inspired the PyCon JP Association team into raising funds to support the PSF?

We were inspired to raise funds after reading an announcement by the PSF about the impact of the pandemic on the foundation and PyCon US. We could of course empathize with the struggle given that we have a similar situation, even if on a much smaller scale.

We tried to think of a good way to help, and either Manabu or Takanori came up with the idea to host online events with presentations by members of the community, with all proceeds going to the PSF. We have held two such events so far, with over a hundred attendees, and even managed to gather some companies and individuals as sponsors.

Thanks to the online nature, costs were kept very low and we were able to make a sizeable donation to the PSF, hopefully softening the financial blow of the pandemic a bit.

How do you think other Python communities can support the PSF?

I think other communities have various ways to support the PSF depending on each community's situation.

We have the fortune to be well established with great support from the wider community and especially local companies, so we were able to make a direct financial contribution.

Not all communities are this fortunate, but I think there are still other ways to contribute.

One such way is simply to boost awareness of the Python programming language in the local developer community. Ideally, this would eventually lead to companies adopting Python more widely, which in turn could lead to direct support to the PSF, either via direct donations or support of PyCon US by these companies.

Boosting local communities also leads to more Python developers in general, leading to a broader pool of people who could support the PSF in various ways.

The most important thing is for each community to think about what they can do, even if it is just a small thing, given that we owe so much to the PSF.

Débora Azevedo speaks about the impact and significance of the PyCon JP Association

One of the things I am most excited about is seeing the PyCon JP Association team receiving this CSA and realizing that the work being done outside the US-Europe countries is acknowledged.

The PyCon JP Association rose to the financial challenge the PSF faced due to the pandemic by organizing the PyCon JP Charity Talks. Aiming to increase awareness within the industry of the importance of the PSF is so meaningful. The charity talks were the Association's own initiative and represented a significant effort on their part; their outstanding work shouldn't be overlooked.

As part of the wider Python community, it warms my heart to see that there are many amazing people worldwide working so we can have a more sustainable Python community.

Seeing local communities contributing to the global community, especially regarding this financial aspect, shows that the community feels the need to have different types of work done and understands the importance and impact of each one of them.

And, as part of the Diversity and Inclusion workgroup and understanding the impact of seeing people from non-US-based communities being acknowledged for doing the work, it feels like we are being heard.

The PyCon JP Association's community work is outstanding, and I am so grateful to them, and grateful to be part of a community that values each contribution to its development and growth.

The Python Software Foundation congratulates and celebrates the PyCon JP Association team: Manabu Terada, Takanori Suzuki, Takayuki Shimizukawa, Shunsuke Yoshida, Jonas Obrist.

You can learn more about making a donation to the PSF here.

Wednesday, May 11, 2022

The 2022 Python Language Summit: Lightning talks

These were a series of short talks, each lasting around five minutes.


Read the rest of the 2022 Python Language Summit coverage here.



Lazy imports, with Carl Meyer

Carl Meyer, an engineer at Instagram, presented on a proposal that has since blossomed into PEP 690: lazy imports, a feature that has already been implemented in Cinder, Instagram’s performance-optimised fork of CPython 3.8.

What’s a lazy import? Meyer explained that the core difference with lazy imports is that the import does not happen until the imported object is referenced.

Examples

In the following Python module, spam.py, with lazy imports activated, the module eggs would never in fact be imported since eggs is never referenced after the import:


# spam.py
import sys
import eggs

def main():
    print("Doing some spammy things.")
    sys.exit(0)

if __name__ == "__main__":
    main()


And in this Python module, ham.py, with lazy imports activated, the function bacon_function is imported – but only right at the end of the script, after we’ve completed a for-loop that’s taken a very long time to finish:


# ham.py
import sys
import time
from bacon import bacon_function

def main():
    for _ in range(1_000_000_000):
        print('Doing hammy things')
        time.sleep(1)
    bacon_function()
    sys.exit(0)

if __name__ == "__main__":
    main()


Meyer revealed that the Instagram team’s work on lazy imports had resulted in startup time improvements of up to 70%, memory usage improvements of up to 40%, and the elimination of almost all import cycles within their code base. (This last point will be music to the ears of anybody who has worked on a Python project larger than a few modules.)
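Readers who want to experiment with the general idea today can use the standard library's importlib.util.LazyLoader, an existing opt-in mechanism that is separate from (and more limited than) the PEP 690 proposal. A minimal sketch:

```python
import importlib.util
import sys

def lazy_import(name):
    """Create a module that is only actually loaded on first attribute
    access, using the stdlib's importlib.util.LazyLoader."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")         # nothing has been executed yet
print(json.dumps({"spam": 1}))     # the real import happens here
```

This follows the recipe in the importlib documentation; unlike PEP 690, it must be requested explicitly per module.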

Downsides

Meyer also laid out a number of costs to having lazy imports. Lazy imports create the risk that ImportError (or any other error resulting from an unsuccessful import) could potentially be raised… anywhere. Import side effects could also become “even less predictable than they already weren’t”.

Lastly, Meyer noted, “If you’re not careful, your code might implicitly start to require it”. In other words, you might unexpectedly reach a stage where – because your code has been using lazy imports – it now no longer runs without the feature enabled, because your code base has become a huge, tangled mess of cyclic imports.

Where next for lazy imports?

Python users who have opinions either for or against the proposal are encouraged to join the discussion on discuss.python.org.



Python-Dev versus Discourse, with Thomas Wouters

This was less of a talk, and more of an announcement.

Historically, if somebody wanted to make a significant change to CPython, they were required to post on the python-dev mailing list. The Steering Council now views the alternative venue for discussion, discuss.python.org, as a superior forum in many respects.

Thomas Wouters, Core Developer and Steering Council member, said that the Steering Council was planning on loosening the requirements, stated in several places, that emails had to be sent to python-dev in order to make certain changes. Instead, they were hoping that discuss.python.org would become the authoritative discussion forum in the years to come.



Asks from Pyston, with Kevin Modzelewski

Kevin Modzelewski, core developer of the Pyston project, gave a short presentation on ways forward for CPython optimisations. Pyston is a performance-oriented fork of CPython 3.8.12.

Modzelewski argued that CPython needed better benchmarks; the existing benchmarks on pyperformance were “not great”. Modzelewski also warned that his “unsubstantiated hunch” was that the Faster CPython team had already accomplished “greater than one-half” of the optimisations that could be achieved within the current constraints. Modzelewski encouraged the attendees to consider future optimisations that might cause backwards-incompatible behaviour changes.



Core Development and the PSF, with Thomas Wouters

This was another short announcement from Thomas Wouters on behalf of the Steering Council. After Google's sponsorship funded the first ever CPython Developer-In-Residence (Łukasz Langa), Meta has provided sponsorship for a second year. The Steering Council also now has sufficient funds to hire a second Developer-In-Residence, and attendees were notified that the Council was open to the idea of hiring somebody who was not currently a core developer.



“Forward classes”, with Larry Hastings

Larry Hastings, CPython core developer, gave a brief presentation on a proposal he had sent round to the python-dev mailing list in recent days: a “forward class” declaration that would avoid all issues with two competing typing PEPs: PEP 563 and PEP 649. In brief, the proposed syntax would look something like this:



forward class X()

continue class X:
    # class body goes here
    def __init__(self, key):
        self.key = key


In theory, according to Hastings, this syntax could avoid issues around runtime evaluation of annotations that have plagued PEP 563, while also circumventing many of the edge cases that unexpectedly fail in a world where PEP 649 is implemented.

The idea was in its early stages, and reaction to the proposal was mixed. The next day, at the Typing Summit, there was more enthusiasm voiced for a plan laid out by Carl Meyer for a tweaked version of Hastings’s earlier attempt at solving this problem: PEP 649.



Better fields access, with Samuel Colvin

Samuel Colvin, maintainer of the Pydantic library, gave a short presentation on a proposal (recently discussed on discuss.python.org) to reduce name clashes between field names in a subclass, and method names in a base class.

The problem is simple. Suppose you’re a maintainer of a library, whatever_library. You release Version 1 of your library, and a user starts to use your library to make classes like the following:

from whatever_library import BaseModel

class Farmer(BaseModel):
    name: str
    fields: list[str]


Both the user and the maintainer are happy, until the maintainer releases Version 2 of the library. Version 2 adds a method, .fields(), to BaseModel, which will print out all the field names of a subclass. But this creates a name clash with the user’s existing code, which has fields as the name of an instance attribute rather than a method.
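The clash can be reproduced with a toy BaseModel (a minimal stand-in for illustration, not the real pydantic class):

```python
class BaseModel:
    def fields(self):
        # Version 2 of the library adds this method...
        return list(self.__annotations__)

class Farmer(BaseModel):
    name: str
    fields: list[str]   # ...which now clashes with this user-defined field

    def __init__(self, name, fields):
        self.name = name
        self.fields = fields   # the instance attribute shadows the method

farmer = Farmer(name="Jones", fields=["meadow", "highlands"])
print(farmer.fields)    # -> ['meadow', 'highlands']: the attribute wins
try:
    farmer.fields()     # the method is unreachable: a list isn't callable
except TypeError as exc:
    print("clash:", exc)
```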

Colvin briefly sketched out an idea for a new way of looking up names that would make it unambiguous whether the name being accessed was a method or attribute.



class Farmer(BaseModel):
    $name: str
    $fields: list[str]

farmer = Farmer(name='Jones', fields=['meadow', 'highlands'])

print(farmer.$fields)   # -> ['meadow', 'highlands']
print(farmer.fields())  # -> ['name', 'fields']

The 2022 Python Language Summit: Achieving immortality

What does it mean to achieve immortality? At the 2022 Python Language Summit, Eddie Elizondo, an engineer at Instagram, and Eric Snow, CPython core developer, set out to explain just that.

Only for Python objects, though. Not for humans. That will have to be another PEP.



Objects in Python, as they currently stand

In Python, as is well known, everything is an object. This means that if you want to calculate even a simple sum, such as 194 + 3.14, the Python interpreter must create two objects: one object of type int representing the number 194, and another object of type float representing the number 3.14.

All objects in Python maintain a reference count: a running total of the number of active references to that object that currently exist in the program. If the reference count of an object drops to 0, the object is eventually destroyed (through a process known as garbage collection). This process ensures that programmers writing Python don’t normally need to concern themselves with manually deleting an object when they’re done with it. Instead, memory is automatically freed up.
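The reference count of any object can be inspected with sys.getrefcount (the result is one higher than you might expect, because the argument to the call itself is a temporary extra reference):

```python
import sys

data = object()
count_initial = sys.getrefcount(data)

alias = data                        # a second reference to the same object
count_with_alias = sys.getrefcount(data)

del alias                           # dropping a reference lowers the count
count_after_del = sys.getrefcount(data)

print(count_initial, count_with_alias, count_after_del)
```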

The need to keep reference counts for all objects (along with a few other mutable fields on all objects) means that there is currently no way of having a “truly immutable” object in Python.

This is a distinction that only really applies at the C level. For example, the None singleton cannot be mutated at runtime at the Python level:


>>> None.__bool__ = lambda self: True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object attribute '__bool__' is read-only


However, at the C level, the object representing None is constantly mutating, because the reference count of the singleton changes continually.



Immortal objects

An “immortal object”, according to PEP 683 (written by Elizondo/Snow), is an object marked by the runtime as being effectively immutable, even at the C level. The reference count for an immortal object will never reach 0; thus, an immortal object will never be garbage-collected, and will never die.


“The fundamental improvement here is that now an object can be truly immutable.”

Eddie Elizondo and Eric Snow, PEP 683

 

The lack of truly immutable objects in Python, PEP 683 explains, “can have a large negative impact on CPU and memory performance, especially for approaches to increasing Python’s scalability”.



The benefits of immortality

At their talk at the Python Language Summit, Elizondo and Snow laid out a number of benefits that their proposed changes could bring.

By guaranteeing “true memory immutability”, Elizondo explained, “we can simplify and enable larger initiatives,” including Eric Snow’s proposal for a per-interpreter GIL, but also Sam Gross’s proposal for a version of Python that operates without the GIL entirely. The proposal could also unlock new optimisation techniques in the future by helping create new ways of thinking about problems in the CPython code base.






The costs

A naive implementation of immortal objects is costly, resulting in performance regressions of around 6%. This is mainly due to adding a new branch of code to the logic keeping track of an object’s reference count.

With mitigations, however, Elizondo and Snow explained that the performance regression could be reduced to around 2%. The question they posed to the assembled developers in the audience was whether this was an “acceptable” performance regression – and, if not, what would be?



Reception

The proposal was greeted with a mix of curious interest and healthy scepticism. There was agreement that certain aspects of the proposal would reach wide support among the community, and consensus that a performance regression of 1-2% would be acceptable if clear benefits could be shown. However, there was also concern that parts of the proposal would change semantics in a backwards-incompatible way.

Pablo Galindo Salgado, Release Manager for Python 3.10/3.11 and CPython Core Developer, worried that all the benefits laid out by the speakers were only potential benefits, and asked for more specifics. He pointed out that changing the semantics of reference-counting would be likely to break an awful lot of projects, given that popular third-party libraries such as numpy, for example, use C extensions which continuously check reference counts.

Thomas Wouters, CPython Core Developer and Steering Council Member, concurred, saying that it probably “wasn’t possible” to make these changes without changing the stable ABI. Kevin Modzelewski, a maintainer of Pyston, a performance-oriented fork of Python 3.8, noted that Pyston had had immortal objects for a while – but Pyston had never made any promise to support the stable ABI, freeing the project of that constraint.

The 2022 Python Language Summit: A per-interpreter GIL

“Hopefully,” the speaker began, “This is the last time I give a talk on this subject.”

“My name is Eric Snow, I’ve been a core developer since 2012, and I’ve been working towards a per-interpreter GIL since 2014.”



In 1997, the PyInterpreterState struct was added to CPython, allowing multiple Python interpreters to be run simultaneously within a single process. “For the longest time,” Snow noted, speaking at the 2022 Python Language Summit, this functionality was little used and little understood. In recent years, however, awareness and adoption of the idea has been spreading.




Multiple interpreters, however, cannot yet be run in true isolation from each other when run inside the same process. Part of this is due to the GIL (“Global Interpreter Lock”), a core feature of CPython that is the building block for much of the language. The obvious solution to this problem is to have a per-interpreter GIL: a separate lock for each interpreter spawned within the process.

With a per-interpreter GIL, Snow explained, CPython will be able to achieve true multicore parallelism for code running in different interpreters.

A per-interpreter GIL, however, is no small task. “In general, any mutable state shared between multiple interpreters must be guarded by a lock,” Snow explained. Ultimately, what this means is that the amount of state shared between multiple interpreters must be brought down to an absolute minimum if the GIL is to become per-interpreter. As of 2017, there were still several thousand global variables; now, after a huge amount of work (and several PEPs), this has been reduced to around 1200 remaining globals.


 

“Basically, we can’t share objects between interpreters”

– Eric Snow

 


Troubles on the horizon

The reception to Snow’s talk was positive, but a number of concerns were raised by audience members.

One potential worry is the fact that any C-extension module that wishes to be compatible with sub-interpreters will have to make changes to their design. Snow is happy to work on fixing these for the standard library, but there’s concern that end users of Python may put pressure on third-party library maintainers to provide support for multiple interpreters. Nobody wishes to place an undue burden on maintainers who are already giving up their time for free; subinterpreter support should remain an optional feature for third-party libraries.

Larry Hastings, a core developer in the audience for the talk, asked Snow what exactly the benefits of subinterpreters were compared to the existing multiprocessing module (allowing interpreters to be spawned in parallel processes), if sharing objects between the different interpreters would pose so many difficulties. Subinterpreters, Snow explained, hold significant speed benefits over multiprocessing in many respects.

Hastings also queried how well the idea of a per-interpreter GIL would interact with Sam Gross’s proposal for a version of CPython that removed the GIL entirely. Snow replied that there was minimal friction between the two projects. “They’re not mutually exclusive,” he explained. Almost all the work required for a per-interpreter GIL “is stuff that’s a good idea to do anyway. It’s already making CPython faster by consolidating memory”.

The 2022 Python Language Summit: Python in the browser

Python can be run on many platforms: Linux, Windows, Apple Macs, microcomputers, and even Android devices. But it’s a widely known fact that, if you want code to run in a browser, Python is simply no good – you’ll just have to turn to JavaScript.

Now, however, that may be about to change. Over the course of the last two years, and following over 60 CPython pull requests (many attached to GitHub issue #84461), Core Developer Christian Heimes and contributor Ethan Smith have achieved a state where the CPython main branch can now be compiled to WebAssembly. This opens up the possibility of being able to run arbitrary Python programs clientside inside your web browser of choice.

At the 2022 Python Language Summit, Heimes gave a talk updating the attendees of the progress he’s made so far, and where the project hopes to go next.



WebAssembly basics

WebAssembly (or “WASM”, for short), Heimes explained, is a low-level assembly-like language that can be as fast as native machine code. Unlike your usual machine code, however, WebAssembly is independent from the machine it is running on. Instead, the core principle of WebAssembly is that it can be run anywhere, and can be run in a completely isolated environment. This leads to it being a language that is extremely fast, extremely portable, and provides minimal security risks – perfect for running clientside in a web browser.







After much work, CPython now cross-compiles to WebAssembly using emscripten through the --with-emscripten-target=browser flag. The CPython test suite now also passes on emscripten builds, and work is underway on adding a buildbot to CPython’s fleet of automatic robot testers, to ensure this work does not regress in the future.

Users who want to try Python in the browser can do so at https://repl.ethanhs.me/. The work opens up exciting possibilities, such as running PyGame clientside and adding Jupyter bindings.







Support status

It should be noted that cross-compiling to WebAssembly is still highly experimental, and not yet officially supported by CPython. Several important modules in the Python standard library are not currently included in the bundled package produced when --with-emscripten-target=browser is specified, leading to a number of tests needing to be skipped in order for the test suite to pass.




Nonetheless, the future’s bright. Only a few days after Heimes’s talk, Peter Wang, CEO at Anaconda, announced the launch of PyScript in a PyCon keynote address. PyScript is a tool that allows Python to be called from within HTML, and to call JavaScript libraries from inside Python code – potentially enabling a website to be written entirely in Python.

PyScript is currently built on top of Pyodide, a third-party project bringing Python to the browser, on which work began before Heimes started his work on the CPython main branch. With Heimes’s modifications to Python 3.11, this effort will only become easier.

The 2022 Python Language Summit: Performance Improvements by the Faster CPython team

Python 3.11, if you haven’t heard, is fast. Over the past year, Microsoft has funded a team – led by core developers Mark Shannon and Guido van Rossum – to work full-time on making CPython faster. With additional funding from Bloomberg, and help from a wide range of other contributors from the community, the results have borne fruit. On the pyperformance benchmarks at the time of the beta release, Python 3.11 was around 1.25x faster than Python 3.10, a phenomenal achievement.

But there is more still to be done. At the 2022 Python Language Summit, Mark Shannon presented on where the Faster CPython project aims to go next. The future’s fast.



The first problem Shannon raised was a problem of measurements. In order to know how to make Python faster, we need to know how slow Python is currently. But how slow at doing what, exactly?

Good benchmarks are vital for a project that aims to optimise Python for general usage. For that, the Faster CPython team needs the help of the community at large. The project “needs more benchmarks,” Shannon said – it needs to understand more precisely what the user base at large is using Python for, how they’re doing it, and what makes it slow at the moment (if it is slow!).

A benchmark, Shannon explained, is “just a program that we can time”. Anybody with a benchmark – or even just a suggestion for a benchmark! – that they believe is representative of a larger project they’re working on is invited to submit them to the issue tracker at the python/pyperformance repository on GitHub.
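In the simplest case, such a benchmark can be built with nothing more than the standard library's timeit module; the workload function below is a hypothetical stand-in for whatever a real application actually does:

```python
import timeit

def workload():
    # Stand-in for something representative of a real application.
    return sum(i * i for i in range(1000))

seconds = timeit.timeit(workload, number=1000)
print(f"1000 runs took {seconds:.3f}s")
```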



Nonetheless, the Faster CPython team has plenty to be getting on with in the meantime.

Much of the optimisation work in 3.11 has been achieved through the implementation of PEP 659, a “specializing adaptive interpreter”. The adaptive interpreter that Shannon and his team have introduced tracks individual bytecodes at various points in a program’s execution. When it spots an opportunity, a bytecode may be “quickened”: this means that a slow bytecode, that can do many things, is replaced by the interpreter with a more specialised bytecode that is very good at doing one specific thing. The work on PEP 659 has now largely been done, but major parts, such as dynamic specialisations of for-loops and binary operations, are still to be completed.
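The quickening idea can be illustrated with a toy model in pure Python. This is a sketch of the concept behind PEP 659, not the actual interpreter machinery: a generic operation is replaced, after observing the operand types, by a fast path with a guard that falls back ("deoptimizes") if the assumption breaks:

```python
def generic_add(a, b):
    # Slow, general path: handles any types supporting +.
    return a + b

def specialize(func, a, b):
    """If both observed operands are ints, swap in an int-only fast path."""
    if isinstance(a, int) and isinstance(b, int):
        def add_int(x, y):
            # Specialized analogue: assumes ints, guarded by a type check.
            if not (isinstance(x, int) and isinstance(y, int)):
                return generic_add(x, y)   # guard failed: fall back
            return x + y
        return add_int
    return func

op = specialize(generic_add, 1, 2)   # ints observed -> specialized
print(op(3, 4))       # fast path
print(op("a", "b"))   # guard fails, falls back to the generic path
```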

Shannon noted that Python also has essentially the same memory consumption in 3.11 as it did in 3.10. This is something he’d like to work on: a smaller memory overhead generally means fewer reference-counting operations in the virtual machine, a lower garbage-collection overhead, and smoother performance as a result of it all.

Another big remaining avenue for optimisations is the question of C extensions. CPython’s easy interface with C is its major advantage over other Python implementations such as PyPy, where incompatibilities with C extensions are one of the biggest hurdles for adoption by users. The optimisation work that has been done in CPython 3.11 has largely ignored the question of extension modules, but Shannon now wants to open up the possibility of exposing low-level function APIs to the virtual machine, reducing the overhead time of communicating between Python code and C code.



Is that a JIT I see on the horizon?

Lastly, but certainly not least, Shannon said, “everybody wants a JIT compiler… even if it doesn’t make sense yet”.

A JIT (“just-in-time”) compiler is the name given for a compiler that dynamically detects where performance bottlenecks exist in a program as the program is running. Once these bottlenecks have been identified, the JIT compiles these parts of the program on-the-fly into native machine code in order to speed things up. It’s a similar idea to Shannon’s PEP 659, but goes much further, since the specialising adaptive interpreter never goes beyond the bytecode level.
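As a rough illustration of the "hot path" idea (not how a real JIT works, which emits native machine code), a call counter can decide when to swap in an "optimised" version of a function; here memoization stands in for code generation:

```python
import functools

HOT_THRESHOLD = 3

def jit(func):
    """Toy decorator: after HOT_THRESHOLD calls, replace the function
    with a cached version, standing in for on-the-fly compilation."""
    calls = 0
    compiled = None

    @functools.wraps(func)
    def wrapper(*args):
        nonlocal calls, compiled
        if compiled is not None:
            return compiled(*args)         # "compiled" fast path
        calls += 1
        if calls >= HOT_THRESHOLD:
            # "Compile": memoize as a stand-in for real codegen.
            compiled = functools.lru_cache(maxsize=None)(func)
        return func(*args)                 # interpreted slow path
    return wrapper

@jit
def square(x):
    return x * x

for _ in range(4):
    print(square(2))   # after 3 calls, the "compiled" version takes over
```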

The idea of using a JIT compiler for Python is hardly new. PyPy’s JIT compiler is the major source of the large performance gains the project has over CPython in some areas. Third-party projects, such as pyjion and numba, bring just-in-time compilation to CPython that’s just a pip install away. Integrating a JIT into the core of CPython, however, would be materially different.

Shannon has historically voiced scepticism about the wisdom of introducing a JIT compiler into CPython itself, and said that work on introducing one is still some way off. A JIT, according to Shannon, will probably not arrive until 3.13 at the earliest, given the amount of lower-hanging fruit that is still to be worked on. The first step towards a JIT, he explained, would be to implement a trace interpreter, which would allow for better testing of concepts and lay the groundwork for future changes.



Playing nicely with the other Python projects

The gains Shannon’s team has achieved are hugely impressive, and likely to benefit the community as a whole in a profound way. But various problems lie on the horizon. Sam Gross’s proposal for a version of CPython without the Global Interpreter Lock (the nogil fork) has potential for speeding up multithreaded Python code in very different ways to the Faster CPython team’s work – but it could also be problematic for some of the optimisations that have already been implemented, many of which assume that the GIL exists. Eric Snow’s dream of achieving multiple subinterpreters within a single process, meanwhile, will have a smaller performance impact on single-threaded code compared to nogil, but could still create some minor complications for Shannon’s team.

The 2022 Python Language Summit: Upstreaming optimisations from Cinder

In May 2021, the team at Instagram made waves in the world of Python by open-sourcing Cinder, a performance-oriented fork of CPython.

Cinder is a version of CPython 3.8 with a ton of optimisations added to improve speed across a wide range of metrics, including “eager evaluation of coroutines”, a just-in-time compiler, and an “experimental bytecode compiler” that makes use of PEP 484 type annotations.

Now, the engineers behind Cinder are looking to upstream many of these changes so that CPython itself can benefit from these optimisations. At the 2022 Python Language Summit, Itamar Ostricher, an engineer at Instagram, presented on Cinder’s optimisations relating to async tasks and coroutines.



Asyncio refresher

Consider the following (contrived) example. Here, we have a function, IO_bound_function, which is dependent on some kind of external input in order to finish what it’s doing (for example, this might be a web request, or an attempt to read from a file, etc.). We also have another function, important_other_task, which we want to run in the same event loop as IO_bound_function.

import asyncio

async def IO_bound_function():
    """This function could finish immediately... or not!"""
    # Body of this function here

async def important_other_task():
    await asyncio.sleep(5)
    print('Task done!')

async def main():
    await asyncio.gather(
        IO_bound_function(),
        important_other_task()
    )
    print("All done!")

if __name__ == "__main__":
    asyncio.run(main())

IO_bound_function could take a long time to complete – but it could also complete immediately. In an asynchronous programming paradigm, we want to ensure that if it takes a long time to complete, the function doesn’t hold up the rest of the program. Instead, IO_bound_function will yield execution to the other thing scheduled in the event loop, important_other_task, letting this coroutine take control of execution for a period.

So far so good – but what if IO_bound_function finishes what it’s doing immediately? In that eventuality, we’re creating a coroutine object for no reason at all, since the coroutine will never have to suspend execution and will never have to reclaim control of the event loop at any future point in time.



Call me maybe?

The team at Instagram saw this as an optimisation opportunity. At the “heart” of many of their async-specific improvements, Itamar explained, is an extension to Python’s vectorcall protocol: a new _Py_AWAITED_CALL_MARKER flag, which enables a callee to know that a call is being awaited by a caller.



The addition of this flag means that awaitables can sometimes be eagerly evaluated, and coroutine objects often do not need to be constructed at all.
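The essence of eager evaluation can be sketched in pure Python: step the coroutine once, and if it finishes without suspending, skip the event loop entirely. The names below are illustrative, not Cinder's actual API:

```python
async def maybe_instant():
    # Completes immediately: it never awaits anything that suspends.
    return 42

def try_eager(coro):
    """Toy sketch of eager evaluation: advance the coroutine one step.
    If it finishes without suspending, hand back its result directly,
    with no event-loop scheduling at all."""
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value   # finished eagerly; no task object needed
    # It suspended: a real implementation would now wrap it in a Task.
    raise RuntimeError("coroutine suspended; schedule it normally")

print(try_eager(maybe_instant()))  # -> 42
```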




Ostricher reported that Instagram had seen performance gains of around 5% in their async-heavy workloads as a result of this optimisation.



Pending questions

Significant questions remain about whether these optimisations can be merged into the main branch of CPython, however. Firstly, exact performance numbers are hard to come by: the benchmark Ostricher presented does not isolate Cinder’s async-specific optimisations.

More important might be the issue of fairness. If some awaitables in an event loop are eagerly evaluated, this might change the effective priorities in an event loop, potentially creating backwards-incompatible changes with CPython’s current behaviour.

Lastly, there are open questions about whether this conflicts with a big change to asyncio that has just been made in Python 3.11: the introduction of task groups. Task groups – a concept similar to “nurseries” in Trio, a popular third-party async framework – are a major evolution in asyncio’s API. But “it’s not completely clear how the Cinder optimisations might apply to Task Groups,” Ostricher noted.

Ostricher’s talk was well received by the audience, but it was agreed that discussion with the maintainers of other async frameworks such as Trio was essential in order to move forward. Guido van Rossum, creator of Python, opined that he could “get over the fairness issue”. The issue of compatibility with task groups, however, may prove more complicated.

Given the newness of task groups in asyncio, there remains a high degree of uncertainty as to how this feature will be used by end users. Without knowing the potential use cases, it is hard to comment on whether and how optimisations can be made in this area.

The 2022 Python Language Summit: F-Strings in the grammar

Formatted string literals, known as “f-strings” for short, were first introduced in Python 3.6 via PEP 498. Since then, they’ve swiftly become one of the most popular features of modern Python.

At the 2022 Python Language Summit, Pablo Galindo Salgado presented on exciting new possibilities for improving the implementation of f-strings by incorporating f-string parsing into the grammar of Python itself.



F-strings: the current situation

Here’s the f-string we all know and love:

f"Interpolating an arbitrary expression into a string: {1 + 1}."

At runtime, the expression between the braces will be evaluated (the result’s 2, for those wondering). str() will then be called on the result of the evaluated expression, and the result of that will then be interpolated into the string at the desired place.
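In miniature (a format specifier after a colon is forwarded to format()):

```python
x = 1 + 1
assert f"result: {x}" == "result: " + str(x)
# A format spec after a colon is forwarded to format():
assert f"{3.14159:.2f}" == format(3.14159, ".2f") == "3.14"
```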

The following diagram summarises the code behind this runtime magic:




That’s right: the current implementation of f-strings relies on “around 1,400 lines of manually written parser code in C”. This is entirely separate from the parsing logic that exists elsewhere for deconstructing all other Python code.

According to Salgado, while this code has now been extensively battle-tested, it is still riddled with tiny bugs. Its complexity also makes it very difficult to change without introducing new bugs, which in turn makes it hard to work on improving error messages – a major goal of Salgado’s in recent years.


“The technical term for this is not ideal.”

– Pablo Galindo Salgado, on the current state of f-string parsing



What the f?

How do we know that an f-string is an f-string? “As humans,” Salgado noted, we know an f-string is an f-string “because the string starts with f”.

Sadly, CPython’s parser is not yet quite as smart. Currently, it has no way to distinguish f-strings from other strings. Instead, when parsing f-strings, the interpreter does a little dance: the tokenizer treats the whole f-string as a single string token, and a separate, dedicated parser is then run over its contents to pick out the expressions between the braces.



Since Python 3.9, however, CPython is equipped with a shiny new PEG parser, which can be taught all sorts of advanced and complicated things. Salgado now proposes to move f-string parsing logic from the dedicated C file into CPython’s formal grammar, enabling f-string parsing to happen at the same time as the parsing for the rest of a Python file.

Salgado, along with others who are working on the project with him, is currently working on fixing “89876787698 tiny error message differences”, explaining, “This is the hard part. We need feedback – we need to know if it’s actually worth doing at this point.”

Eric V. Smith, the creator of f-strings and a member of the audience for the talk, is a strong supporter of the project. Smith explained that the original design had been chosen to make life simple for code-editor tools such as IDEs, which might have struggled to parse f-strings had the grammar-based approach been taken from the start. But he had been "surprised", he said, at how well Salgado's proposal had been received among the authors of these tools when he had recently sought feedback on it.

Smith plans to continue gathering feedback from IDE maintainers over the coming months.



Potential benefits

As well as simplifying the code and making it more maintainable, moving f-string parsing into the grammar holds a number of exciting (and/or horrifying) possibilities. For example, this is just about the maximum level of f-string nesting you can currently achieve in Python:

f"""a{f'''b{f"c{f'd {1+2=}'}"}'''}"""

But with f-strings in the grammar… the possibilities are endless.

On a more serious note, Salgado noted that the following is a common source of frustration for Python developers:

>>> my_list_of_strings = ['a', 'b', 'c']
>>> f'Here is a list: {"\n".join(my_list_of_strings)}'
  File "<stdin>", line 1
    f'Here is a list: {"\n".join(my_list_of_strings)}'
                                                     ^
SyntaxError: f-string expression part cannot include a backslash

But by having f-strings as part of the grammar, there would no longer be any need for this restriction.
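Until then, the usual workaround is to hoist the backslash out of the expression part:

```python
my_list_of_strings = ['a', 'b', 'c']
newline = "\n"  # the backslash lives outside the f-string's braces
print(f"Here is a list: {newline.join(my_list_of_strings)}")
```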



Reception

The proposal was extremely enthusiastically received. “It would be huge if we could get syntax errors pointing to the correct place like with normal Python code,” commented Eric Snow.

I’m all for it.
– Guido van Rossum, Core Developer and creator of Python

 

We are all in agreement. You should just do it.
– Thomas Wouters, Core Developer and Steering Council member

 

You can spend the next ten minutes implementing it.
– Łukasz Langa, Core Developer and CPython Developer-in-Residence

The 2022 Python Language Summit: Python without the GIL

If you peruse the archives of language-summit blogs, you’ll find that one theme comes up again and again: the dream of Python without the GIL. Continuing this venerable tradition, Sam Gross kicked off the 2022 Language Summit by giving the attendees an update on nogil, a project that took the Python community by storm when it was first announced in October 2021.

The GIL, or “Global Interpreter Lock”, is the key feature of Python that prevents true concurrency between threads. This is another way of saying that it makes it difficult to do multiple tasks simultaneously while only running a single Python process. Previously the main cheerleader for removing the GIL was Larry Hastings, with his famous “Gilectomy” project. The Gilectomy project was ultimately abandoned due to the fact that it made single-threaded Python code significantly slower. But after seeing Gross’s proof-of-concept fork in October, Hastings wrote in an email to the python-dev mailing list:


Sam contacted me privately some time ago to pick my brain a little. But honestly, Sam didn’t need any help; he’d already taken the project further than I’d ever taken the Gilectomy.
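The constraint the GIL imposes on pure-Python code can be seen in a small, illustrative benchmark: two CPU-bound threads take roughly as long as running the same work sequentially, because only one thread can execute bytecode at a time:

```python
import threading
import time

COUNT = 2_000_000

def countdown(n: int) -> None:
    # Pure-Python, CPU-bound busy work.
    while n:
        n -= 1

start = time.perf_counter()
countdown(COUNT)
countdown(COUNT)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(COUNT,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, the threaded run is no faster (and often slower, thanks to
# contention over the GIL) than the sequential one.
print(f"sequential: {sequential:.3f}s  threaded: {threaded:.3f}s")
```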



The current status of nogil

Since releasing his proof-of-concept fork in October – based on an alpha version of Python 3.9 – Gross stated that he’d been working to rebase the nogil changes onto 3.9.10.

3.9 had been chosen as a target for now, as reaching a level of early adoption was important in order to judge whether the project as a whole would be viable. Early adopters would not be able to use the project effectively if third-party packages didn’t work when using nogil. There is still much broader support for Python 3.9 among third-party packages than for Python 3.10, and so Python 3.9 still made more sense as a base branch for now rather than 3.10 or main.

Gross’s other update was that he had made a change in his approach with regard to thread safety. In order to make Python work effectively without the GIL, a lot of code needs to have new locks added to it in order to ensure that it is still thread-safe. Adding new locks to existing code, however, can be very difficult, as there is potential for large slowdowns in some areas. Gross’s solution had been to invent a new kind of lock, one that is “more Gilly”.



The proposal

Gross came to the Summit with a proposal: to introduce a new compiler flag in Python 3.12 that would disable the GIL.

This is a slight change to Gross’s initial proposal from October, where he brought up the idea of a runtime flag. A compiler flag, however, reduces the risk inherent in the proposal: “You have more of a way to back out.” Additionally, using a compiler flag avoids thorny issues concerning preservation of C ABI stability. “You can’t do it with a runtime flag,” Gross explained, “But there’s precedent for changing the ABI behind a compiler flag”.
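As a build-time option, opting in might look something like this – note that the flag name below is purely illustrative, as no name had been agreed at the time of the talk:

```shell
# Hypothetical: build a GIL-free interpreter by passing a configure flag.
./configure --disable-gil
make -j

# The same source tree, configured without the flag, builds the
# traditional GIL-enabled interpreter (with a different C ABI).
./configure
make -j
```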



Reception

Gross’s proposal was greeted with a mix of excitement and robust questioning from the assembled core developers.

Carol Willing queried whether it might make more sense for nogil to carry on as a separate fork of CPython, rather than for Gross to aim to merge his work into the main branch of CPython itself. Gross, however, responded that this “was not a path to success”.

"A lot of the value of Python is the ecosystem, not just the language… CPython really leads the way in terms of the community moving as a block.

"Removing the GIL is a really transformative step. Most Python programs just don’t use threads at the moment if they want to run on multiple cores. If nogil is to be a success, the community as a whole has to buy into it."

– Sam Gross

Samuel Colvin, maintainer of the pydantic library, expressed disappointment that the new proposal was for a compiler flag, rather than a runtime flag. “I can’t help thinking that the level of adoption would be massively higher” if it was possible to change the setting from within Python, Colvin commented.

There was some degree of disagreement as to what the path forward from here should be. Gross appeared to be seeking a high-level decision about whether nogil was a viable way forward. The core developers in attendance, however, were reluctant to give an answer without knowing the low-level costs. “We need to lay out a plan of how to proceed,” remarked Pablo Galindo Salgado. “Just creating a PR with 20,000 lines of code changed is infeasible.”

Barry Warsaw and Itamar Ostricher both asked Gross about the impact nogil could have on third-party libraries if they wanted to support the new mode. Gross responded that the impact on many libraries would be minimal – no impact at all to a library like scikit-learn, and perhaps only 15 lines of code for numpy. Gross had received considerable interest from scientific libraries, he said, so was confident that the pressure to build separate C extensions to support nogil mode would not be unduly burdensome. Carol Willing encouraged Gross to attend scientific-computing conferences, to gather more feedback from that community.

There was also a large amount of concern from the attendees about the impact the introduction of nogil could have on CPython development. Some worried that introducing nogil mode could mean that the number of tests run in CI would have to double. Others worried that the maintenance burden would significantly increase if two separate versions of CPython were supported simultaneously: one with the GIL, and one without.

Overall, there was still a large amount of excitement and curiosity about nogil mode from the attendees. However, significant questions remain unresolved regarding the next steps for the project.

The 2022 Python Language Summit: Dealing with CPython's issue and PR backlog

“Any noise annoys an oyster, but a noisy noise annoys an oyster most.”

– Tongue twister, author unknown

As the Python programming language continues to grow in popularity, so too does the accumulation of issues and pull requests (“PRs”) on the CPython GitHub repository. At time of writing (morning of 7th May, 2022), the total stands at 7,027 open issues, and 1,471 pull requests. At the 2022 Python Language Summit, CPython Core Developer Irit Katriel gave a talk on possible ways forward for dealing with the backlog.



Historically, there has been reluctance among CPython’s team of core developers to close issues or PRs that may be of dubious worth. BPO-539907 was presented to the audience as an issue that had remained open on the issue tracker for over 20 years. The example is an extreme one, but represents a pattern that anybody who has scrolled through the CPython issue tracker will surely have seen before.



Anyone with experience in triaging issue trackers in open source will know that it is not always easy to close an issue. People on the internet do not always take kindly to being told that something they believe to be a bug is, in fact, intended behaviour.

Low-quality feature requests can be even harder to tackle, and can be broadly split into three buckets. The first bucket holds feature requests that simply make no sense, or else would have actively harmful impacts – these can be fairly easily closed. The second bucket holds feature requests that would add maintenance costs, but would realistically add little value to end users. These can often lead to tiresome back-and-forths – something one person may see as a large problem in a piece of software may, ultimately, cause few problems for the majority of users.

The feature requests that can linger on an issue tracker for twenty years, however, are usually those in the third bucket: features that everybody can agree might be nice if, in an ideal world, they were implemented – but that nobody, ultimately, has the time or motivation to work on.



Katriel’s contention is that leaving an issue open on the tracker for 20 years serves no one and that, instead, we should think harder about what an issue tracker is actually for.

If the proposed tkinter lock from BPO-539907 is ever implemented, Katriel argues, “it’s not because of the twenty-year-old issue – it’s because somebody will discover the need for it.” Rather than only closing issues that have obvious defects, we should flip the script, and become far more willing to close issues if they serve no obvious purpose. Issues should only be kept open if they serve an obvious purpose in helping further CPython’s development. Instead of asking “Why should we close this?”, we should instead ask, “Why should we keep this open?”

Citing a recent blog post by Sam Schillace, Katriel argues that not only do issues such as BPO-539907 (newly renamed as GH-36387, for those keeping tabs) serve little purpose – they also do active harm to the CPython project. Schillace argues that the problem of the “noisy monitor” – a term he uses for any kind of feedback system where it becomes impossible to tell the signal from the noise – is “one of the most pernicious, and common, patterns that engineering teams fall prey to”. Leaving low-quality issues on a tracker, Schillace argues, wastes the time of developers and triagers, and “obscures both newer quality issues as well as the overall drift of the product.”


“It’s far better… to keep the tool clean for the things that matter.”

– Sam Schillace, Noisy Monitors

 


No one has done more work than Katriel over the past few years to keep the issue tracker healthy, and her presentation was well received by the audience of core devs and triagers. The question of where now to proceed, however, is harder to tackle.

Pablo Galindo Salgado, an expert on CPython’s PEG parser, and the chief architect behind the “Better error messages” project in recent years, noted that he received “many, many issues” relating to possible changes to the parser and improvements to error messages. “Every time you close an issue,” he said, “People demand an explanation.” Arguing that maintainer time is “the most valuable resource” in open-source software, Salgado said that the easiest option was often just to leave it open.

However, hard though it may be to close an issue, ignoring open issues for an extended period of time also does a disservice to contributors. Itamar Ostricher – not a CPython core developer, but an experienced software engineer who has worked for many years at Meta – said that the contributor experience was “often confusing”. “Is this an issue where a PR would be accepted if I wrote one? Does a core dev want to work on it? Or is it just a bad idea?” Ostricher asked.

Ned Deily, release manager for Python 3.6 and 3.7, agreed, and argued that CPython needed to become more consistent in how core devs treat issues and PRs. Some modules, like tkinter, have been “ownerless” for a long time, Deily argued. This can create a chicken-and-egg problem. If a module has no maintainer, the solution is to add more maintainers. But a contributor can only become a maintainer if they demonstrate their worth through a series of merged PRs. And if a module has no active maintainer, there may be no core devs who feel they have sufficient expertise to review and merge a PR relating to the unmaintained module. So the contributor can never become a core developer (as their PRs will never be merged), and the module will never gain a new maintainer.



Where now?

Various solutions were proposed to improve the situation. Katriel thought it would be good to introduce a new “Accepted” label, that could be added by a triager or a core developer. The idea is that the presence of the label signifies that the core developer team is not waiting for any further information from the issue filer: the bug report (or feature request) has been acknowledged as valid.

Many attendees noted that the problem was in many ways a social problem rather than a technical problem: the core development team needed a fundamental change in mindset if they were to seriously tackle the issue backlog. Senthil Kumaran argued that we should “err on the side of closing things”. Jelle Zijlstra similarly argued that we needed to reach a place where it was understood to be “okay” to close a feature request that had been open for many years with no activity.

There was also, however, interest in improving workflow automation. Christian Heimes discussed the difficulty of closing an issue or PR if you are a core developer with English as a second language. Crafting the nuances of a rejection notice so that it is polite but also clear can be a challenging task. Ideas around automated messages from bots or canned responses were discussed.



The enormity of the task at hand is clear. Unfortunately, there is probably not one easy fix that will solve the problem.

Things are already moving in a better direction, however, in many respects. Łukasz Langa, CPython’s Developer-In-Residence, has been having a huge impact in stabilising the number of open issues. The CPython triage team, a group of volunteers helping the core developers maintain CPython, has also been significantly expanded in recent months, increasing the workforce available to triage and close issues and PRs.

PEP 594, deprecating several standard-library modules that have been effectively unmaintained for many years, also led to a large number of issues and PRs being closed in recent months. And the transition to GitHub issues itself, which only took place a few weeks ago, appears to have imbued the triage team with a new sense of energy.

Discussion about further potential ways forward continues on Discourse.

The 2022 Python Language Summit

Every year, just before the start of PyCon US, around 30 core developers, triagers, and special guests gather for the Python Language Summit: an all-day event of talks where the future direction of Python is discussed. The summit in 2022 was the first in-person summit since 2019, due to disruption caused by the coronavirus pandemic in 2020-21.

This year's summit was covered by Alex Waygood.


This year's Language Summit attendees



Tuesday, May 10, 2022

No Starch Press has released an all-Python Humble Bundle!

No Starch Press, an indie tech-book publisher and Community sponsor of the PSF, has just announced a partnership with Humble Bundle that lets you pay what you want for DRM-free Python ebooks, with titles for everyone from beginners to pros. And a share of the proceeds from the bundle goes to the PSF!

The “Python, by No Starch Press” bundle runs now through May 23rd and benefits the PSF as well as Hacker Initiative, a public 501(c)(3) organization for hackers, created by NSP founder Bill Pollock.

The bundle includes picks for all levels like Python Crash Course (Eric Matthes), Automate the Boring Stuff with Python (Al Sweigart), and Doing Math with Python (Amit Saha). The promotion has a pay-what-you-want model, so you can choose a pricing tier for up to 18 ebooks, then decide how much of your purchase goes to the featured causes.

Pricing starts at $1 for Python Playground (Mahesh Venkitachalam) + Doing Math with Python, or starting at $30 you can get all 18 Python ebooks! 

Check it out and happy reading!