Saturday, May 29, 2021

The 2021 Python Language Summit: What Is the stdlib?

Brett Cannon gave a presentation at the 2021 Python Language Summit about the standard library in order to start a conversation about whether it's time to write a PEP that more clearly defines it.

Brett Cannon

 

What Is the stdlib?

He succinctly described the stdlib as "a collection of modules that ship with CPython (usually)." This was the most accurate definition he could give, considering how big it is, how varied its contents are, and how long it has been around.

He didn't offer an answer to the question of whether there should be a new informational PEP to define clear goals for the stdlib, but he wanted core developers to engage with the question. There are a variety of opinions on the stdlib, but it could be beneficial to come to some kind of agreement about:

  • What it should be
  • How to manage it
  • What it should focus on
  • How to decide what will or will not be added to it

He shared that he semi-regularly sees requests for adding a TOML parser to the stdlib. When he considers requests, he asks himself:

  • Should a module be added?
  • How should such a decision be made?
  • What API should it have?
  • Is there a limit to how big the stdlib should get?

So far, there haven't been basic guidelines for answering these kinds of questions. As a result, decisions have been made on a case-by-case basis.

How Big Is the stdlib?

He did some data digging in March of 2021 and shared his findings. Here are the broad strokes:

  • python -v -S -c pass imports 14 modules.
  • There are 208 top-level modules in the standard library (the sketch after this list shows one way to count them).
  • The ratio of modules to people registered to vote in the last Steering Council election is 2.3, but not all of those people are equally available to help maintain the stdlib.
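
On Python 3.10 and later, one way to reproduce this kind of count is with sys.stdlib_module_names. This is only a sketch: the private-name filter is an assumption about how the figure was derived, and the exact number varies by version.

import sys

# sys.stdlib_module_names lists every top-level stdlib module name
# known to this interpreter (available since Python 3.10).
# Excluding private, underscore-prefixed modules is an assumption
# about how the "top-level modules" figure was counted.
public = sorted(
    name for name in sys.stdlib_module_names if not name.startswith("_")
)
print(len(public))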

What Should the stdlib Cover?

Some people have suggested that the stdlib should be focused on helping users bootstrap pip and no more. Others have said that it should focus on system administration or other areas. Considering that there are thirty-one thematic groupings in the index, the people who maintain the stdlib don't seem to have come to a collective decision either. The groupings cover everything from networking to GUI libraries to REPLs and more.

The stdlib has been around for a long time, and we need to be careful about breaking people's code, so the goal is not to deprecate what is already there but to consider guidelines for making additions.

How Do We Decide What Goes Into the stdlib?

He compared PEP 603 and graphlib to show how this question has been answered in different ways in the past. The goal of PEP 603 was to add the class frozenmap to the collections module. However, the graphlib module was only ever discussed in an issue and never got a PEP before it was added to the stdlib. There is no standardized approach for making these kinds of decisions, so he would like to know what approaches core developers think would be most appropriate.
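
For context, graphlib exposes a single class, TopologicalSorter; a minimal usage sketch:

from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on.
graph = {"b": {"a"}, "c": {"a", "b"}}

ts = TopologicalSorter(graph)
print(list(ts.static_order()))  # ['a', 'b', 'c']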

What Is the Maintenance Cost?

The PR queue is already long, which can be overwhelming for maintainers and discouraging for contributors.

The following modules aren't used in any of the top 4000 projects:

  • mailcap
  • binhex
  • chunk
  • nis

Seventy-six modules are used in less than 1% of the 4000 most downloaded projects on PyPI. That's over 36% of all the modules in the stdlib. This raises some questions:

  • Do we want to continue to ship these modules?
  • What does this tell us about what the community finds useful in the stdlib?
  • How can that inform future guidelines about what to include in the stdlib?

Based on the data from March 2021, there were:

  • 37 modules with no open PRs
  • 1,451 PRs involving the stdlib, which made up the bulk of all the PRs

The module with the highest number of PRs was asyncio, which had only 50. That's only 3% of all of the open PRs at the time.
 
The standard library has a significant maintenance cost, but core developers can formulate a plan to get the most out of the maintenance that goes into the stdlib by deciding what it should focus on. They can discuss these issues and work towards resolving them this year.

Monday, May 24, 2021

The 2021 Python Language Summit: The Python Documentation Work Group

Mariatta Wijaya and Carol Willing gave a presentation about a new documentation work group at the 2021 Python Language Summit. Last year, Carol Willing and Ned Batchelder spoke about laying the groundwork for this project at the 2020 Python Language Summit.

Carol Willing and Mariatta Wijaya
 

Why Does Python Need a Documentation Work Group?

The mission of the Python Software Foundation is to advance the Python language and grow a diverse, international community of Python programmers so that the language can continue to flourish well into the future. However, when it comes to documentation, core developers don't necessarily reflect the larger world of Python users. If we can bring together core developers, documentarians, and educators, then we can have better documentation that serves the needs of the wider community more effectively.

What Should the Documentation Work Group Achieve?

The work group has two main goals:

  1. Improve documentation content so there's more documentation aimed at people who are learning the language
  2. Modernize documentation themes to make them responsive on mobile so they can better serve users working with limited bandwidth

Although core developers have sometimes felt a great deal of ownership of parts of the documentation, all of the documentation is a community resource. As a result, no one person should be responsible for any one part of the documentation. If the work group is large enough, then it can serve as an editorial board that could work towards consensus.

The work group will:

  • Set priorities and projects for the next year
  • Build a larger documentation community and help them feel engaged, connected, and empowered

What Will Stay the Same?

Changes to documentation will still go through the same PR process that is described in the dev guide. There will be the same commitment to quality. Although there will be new documentation to meet the needs of underserved users and topics, the existing docs at docs.python.org and devguide.python.org will remain.

What Will Change?

Documentation is a gateway to education. In order for it to be more effective, we need broader input from the community. There have already been considerable efforts with translation and localization, but it would also be beneficial to have a new landing page to help users find the resources they need.

What's Next?

The next step is to deal with the logistics of work-group membership. The current members are Mariatta Wijaya, Carol Willing, Ned Batchelder, and Julien Palard. The charter for the work group states that it can have up to twenty members. The application process will be similar to the one that the code of conduct work group used. The current members expect applications to come from the wider Python community as well as from core developers.

Once the group has more members, they will hold a monthly meeting that will be scheduled to accommodate a variety of time zones. They will discuss docs issues, open PRs, the status of projects, achievements, next steps, and more.

There are also plans for AMA sessions on Discourse so that docs team members can answer questions and connect with the wider docs community. In addition, the group will reach out to PyLadies and tap into the diverse skill sets of their members.

Where Can You Learn More?

To learn more, you can check out the Python docs community on:

Sunday, May 23, 2021

The 2021 Python Language Summit: The Challenges of Packaging Python for a Linux Distro

Matthias Klose gave a talk about the challenges of packaging Python for a Linux distribution at the 2021 Python Language Summit. He wanted to discuss:

  • CPython sources and how they fit with Debian and Ubuntu
  • Ownership of module installations 
  • Architecture and platform support
  • Inclusiveness and the ubiquity of Python on various platforms
  • Communication issues
 

What Is Python Like on Debian and Ubuntu?

He shared what Debian 11 and Ubuntu 21.04 have installed for Python. By default, there is almost nothing, so Python is usually pulled in by various seeds or images. Linux distributions usually have a mature packaging system and don't ship naked CPython, unlike macOS or Windows.

These Debian and Ubuntu versions still use Python 2 to bootstrap PyPy, but one version of Python 3 is also shipped with them. CPython itself is split into multiple binary packages for licensing reasons, dependencies, development needs, and cross-buildability needs. There is also a new python3-full package that includes everything for the batteries-included experience. About twenty percent of the packages shipped by Debian use Python. He said that this is usually enough for desktop users but may not be enough for developers. However, the line between those two groups is not always clear.

As for the QA that Python is getting in Debian and Ubuntu, for the main part of the archive, it must conform to the Debian Free Software Guidelines. These guidelines include free distribution, inclusion of source code, and ability to modify and create derived works. Packages have to build from source and pass the upstream test suite as well as CI tests.

What Distro-Specific Decisions Were Made for Debian and Ubuntu?

Debian has a policy for shipping Python packages that is also used by Ubuntu. Usually, applications are in application-specific locations so they don't get in the way of anything else. They ship modules as --single-version-externally-managed, and they are usually shipped only if the application needs them.

The site-packages directory has been renamed to /usr/lib/python3/dist-packages, with /usr/local/lib/python3.x/dist-packages for local installs. The path doesn't change during Python upgrades (PEP 3147 and PEP 3149). Although a large number of packages in the archive use Python, pip itself is not used to build the archive; it is only provided for users.
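
A quick way to see this layout from inside the interpreter is shown below; the exact output is an assumption about a typical Debian or Ubuntu system:

import site

# On a Debian- or Ubuntu-patched interpreter, this typically lists
# /usr/lib/python3/dist-packages along with a dist-packages path
# under /usr/local for locally installed modules.
print(site.getsitepackages())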

By default, you can't call Python with python or python2, only with python3 or a versioned python3.x: the unversioned python executable was removed along with most of the Python 2 stack. There is a package called python-is-python3 that restores the python symlink, pointing it at Python 3. That package was a compromise, and there was some difficulty getting it into Debian.

In the past, there have been license issues with shipping the CPython upstream sources. There are still license issues with the _dbm module, which is only buildable with a GPL-3+ license. There were also some executables included in the sources that were removed for 3.10. The big remaining issue is that wheels are still included without the source and can't be shipped, so you have to build them using the regular setuptools and pip distributions. Usually, symlinks and dependencies are used to point to the proper setuptools and pip packages.

The relationship between pip and Linux distributions is a difficult one, and there is more than one way to install Python modules. Part of the motivation behind renaming site-packages to dist-packages was that pip was breaking desktop systems. They also wanted to resolve conflicts with locally built Python (installed in /usr/local).

There has been some controversy about what can break your system:

  • sudo rm -rf /
  • sudo pip install pil
  • sudo apt install python3-pil

PyPA does not consider sudo pip install to be dangerous, but Debian and Ubuntu have different opinions about how pip should behave. Mixing packages from two different installers can lead to problems. Although PEP 517 appears to say that pip is only recommended, pip does seem to be enforced more and more. Matthias Klose joked that perhaps ensurepip should be renamed to enforcepip. He also said that having some kind of offline mode for pip would help.

He discussed inclusiveness and said that Python being ubiquitous should be seen as an asset rather than a burden. He was sad to see negative attitudes towards platforms that are used less, such as AIX, and didn't see why Python would want to exclude some communities.

What Communication Issues Are There?

In November or December, there were tweets about problems within Debian and Ubuntu. He considered some of the concerns that were brought up to be valid, but he said that there were also legal threats made against the Debian project on behalf of the PSF. He was of the opinion that all parties could improve and that it wasn't just a Debian or Ubuntu problem.

Here are some of the communication problems he highlighted:

  • Distro issues don't reach distro people.
  • Problems with pip breaking systems don't reach pip developers.
  • There is no single place to discuss PyPA issues.
  • There have been problems with manylinux wheels built for CentOS that came up in distro channels.

He would like to see communication improve and reminded the summit attendees that no group speaks with a single voice.

Saturday, May 22, 2021

The 2021 Python Language Summit: Lightning Talks, Round 1

The first day of the 2021 Python Language Summit finished with a series of lightning talks from Petr Viktorin, Lorena Mesa, Scott Shawcroft, and Jeff Allen.

Petr Viktorin, Lorena Mesa, and Scott Shawcroft
 

The Stable ABI and Limited C API

Petr Viktorin spoke about the stable ABI and the limited C API. The stable ABI is a way to compile a C extension on one Python 3 version and run it on that version and later ones. It was introduced in 2009 with PEP 384. You can use it to simplify extension maintenance, and it will allow you to support more versions. But it does have lower performance, and you can't do everything with it that you can with the full API.

Petr Viktorin would like to see it used for bindings and embeddings. If Python is just a small part of your application and you don't want to invest a lot of maintainer time into supporting Python, then that would be a good use case. You could also use it to support unreleased Python versions.

If you limit yourself to the limited C API, a subset of the full C API, then you will get an extension that conforms to the stable ABI. The limited C API aims to avoid implementation details and play well with:

  • Alternate Python implementations
  • Extension languages other than C
  • New features, such as isolated subinterpreters

However, the limited C API itself is not stable and can still change between Python versions.
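
As a sketch of what opting in looks like with setuptools (the module and file names here are hypothetical):

from setuptools import setup, Extension

setup(
    name="spam",
    version="1.0",
    ext_modules=[
        Extension(
            "spam",
            sources=["spammodule.c"],  # hypothetical C source file
            # Restrict the build to the limited C API so the resulting
            # extension conforms to the stable ABI (PEP 384); this value
            # pins the minimum supported version to Python 3.6.
            define_macros=[("Py_LIMITED_API", "0x03060000")],
            py_limited_api=True,
        )
    ],
)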

The limited C API and the stable ABI are now defined in Misc/stable_abi.txt. There are already tests, and soon there will be documentation as well. To learn more, check out:

 

Promoting PyLadies in CPython Development

Lorena Mesa spoke about PyLadies, which is an international mentorship group with a focus on helping more women become active participants and leaders in the Python open-source community.

Most of the growth in the PyLadies community has been coming from outside the USA and Europe. South America has the most active chapters, with Brazil in the lead. In order to help chapters better support their members, PyLadies is working on a centralized mandate and a global governance model.

While PyLadies has been working on education and outreach, it has been challenging to quantify how the group is helping women become active participants and leaders in the OSS community. In order to address this issue, they are preparing a survey about the challenges their members face in open source. Members may be having difficulty with: 

  • Language barriers
  • Technical expertise
  • Support

PyLadies will be launching a video series on how to be a contributor and would like to hear from core developers who could:

  • Submit a recording of their workflow to publish on YouTube
  • Offer feedback

If you'd like to participate, you can get in touch.
 

CircuitPython: A Subset of CPython

 
CircuitPython is a much smaller version of Python that runs on microcontrollers. Scott Shawcroft compared what's included in CircuitPython and CPython to give a sense of what is central to users' experience of Python.
 
Although CircuitPython is 650 kilobytes compared to CPython's 29 megabytes, it still feels like Python. It is intended to be a strict subset of CPython that allows people to learn Python on microcontrollers and then graduate to CPython on a Linux computer without having to rework too much of their code.

Scott Shawcroft compared lists of modules and built-ins in CircuitPython and CPython to show that CircuitPython doesn't need to have much to still feel like Python. You need to have the built-ins and the syntax, but not necessarily all of the standard library. This is great news because, with a smaller core, you can bring Python to smaller devices.
 
However, CPython users who have come to CircuitPython have asked for NumPy, f-strings, pandas, Jupyter, pdb, and asyncio to be added. Interestingly, even though some of their requests go beyond core Python, users still associate them with their experience of Python. If that's the case, then maybe we don't need to be afraid to take elements of core Python and move them into packages because, at the end of the day, people still consider them to be Python.

Jython 3: Something Completely Different?

 
Jeff Allen shared the approach he's advancing for the Jython 3 core, which is quite different from Jython 2. He's been slowly working on his ideas outside of the official repo and wanted feedback to help identify potential problem areas before he starts working in the official repo.

He has already solved some more daunting problems, such as:

  • Inheritance
  • Descriptors
  • Built-in methods, but not yet their call sites
  • An interpreter for a subset of CPython bytecode
 
There is still a lot to do, but he hasn't had to go back to square one recently. However, there may still be bigger problems with Java integration, exceptions, and async. He asked for feedback about any other potential pitfalls he may not have seen yet.

The 2021 Python Language Summit: HPy — Present and Future

At the 2021 Python Language Summit, Antonio Cuni gave a presentation about HPy. He also gave a presentation about HPy at the 2020 Python Language Summit, so this year he shared updates on how the project has evolved since then.

 

What Is HPy?

HPy is an alternative API for writing C extensions. Although the current Python C API exposes CPython implementation details, HPy hides them. Antonio Cuni said that, if everyone used HPy, then it would help Python evolve in the long term.

Using HPy extensions will make it easier to support alternative implementations. HPy is designed to be GC friendly and isn't built on top of reference counting. It is also designed to have zero overhead on CPython, so you can port an existing module from the Python C API to HPy without any performance loss. In addition, it allows incremental migration: you can port an existing extension one function at a time, or even one method at a time. HPy is also faster than the existing Python C API on alternative implementations such as PyPy and GraalPython.

What's New With HPy?

In the past year since Antonio Cuni last shared an update at a Python Language Summit, HPy has continued to make progress. It now has:

  • Support for Windows
  • Support for creating custom types in C
  • A debug mode to help you find mistakes in your C code
  • setuptools integration to make it easier to compile HPy extensions

There has also been work on a very early port of some parts of NumPy to HPy. The feedback from the NumPy team has been positive so far. Soon, the HPy team will start writing a Cython backend so that all Cython extensions will be able to automatically use HPy as well.

The HPy team has made a lot of progress with building community and getting funding. There is now a site for HPy as well as a blog, and there has been a lot of interest and involvement from the Python community. For example, someone independently started porting Pillow. Oracle, IBM, and Quansight Labs have provided some funding, but there has still been plenty of non-funded open source development, as usual.

How Do the CPython ABI and the Universal ABI Compare?

There are some differences between the CPython ABI and the Universal ABI:

CPython ABI vs Universal ABI

On the Universal ABI side, there is no way to support wheels.

How Does Debug Mode Work?

HPy's debug mode may be useful to you even if you aren't concerned about the problems that HPy is intended to solve because it can help you find common problems in your C code, such as memory leaks. Here's an example of an HPy function that takes an object and increments it by one:

HPy_Close() isn't called on the object that was created, so you have a memory leak. If you want to compile this file into an extension, then you can use setup.py:

Now, you can load the module and debug:

 

What Does the Future Hold?

Antonio Cuni closed his presentation by asking the CPython developers at the summit if it would be possible to make HPy a semi-official API in the future, with first-class support for importing modules and distributing wheels. Some attendees suggested writing a PEP to make that happen.

Thursday, May 20, 2021

The 2021 Python Language Summit: Making CPython Faster

At the 2021 Python Language Summit, Guido van Rossum gave a presentation about plans for making CPython faster. This presentation came right after Dino Viehland's talk about Instagram's performance improvements to CPython and made multiple references to it.

 

Can We Make CPython Faster?

We can, but it's not yet clear by how much. Last October, Mark Shannon shared a plan on GitHub and python-dev. He asked for feedback and said that he could make CPython five times faster in four years, or fifty percent faster per year for four years in a row. He was looking for funding and wouldn't reveal the details of his plan without it.

How Will We Make CPython Faster?

Seven months ago, Guido van Rossum left a brief retirement to work at Microsoft. He was given the freedom to pick a project and decided to work on making CPython faster. Microsoft will be funding a small team consisting of Guido van Rossum, Mark Shannon, Eric Snow, and possibly others.

The team will:

  • Collaborate fully and openly with CPython's core developers
  • Make incremental changes to CPython
  • Take care of maintenance and support
  • Keep all project-specific repos open
  • Have all discussions in trackers on open GitHub repos
 
The team will need to work within some constraints. They'll need to keep code maintainable, not break stable ABI compatibility, not break limited API compatibility, and not break or slow down extreme cases, such as pushing a million items onto the eval stack. Although they won't be able to change the data model, they will be able to change:

  • The byte code
  • The compiler
  • The internals of a lot of objects
 
The team is optimistic about doubling CPython's speed for 3.11. They plan to try an adaptive, specializing byte code interpreter, which is a bit like the existing inline cache and a bit like the shadow byte code covered in Dino Viehland's talk. Mark Shannon shared a sketch of an implementation of an adaptive specializing interpreter in PEP 659.
 
If the team grows, then they'll have more room to try other optimizations. They could improve startup time, change the pyc file format, change the internals of integers, put __dict__ at a fixed offset, use hidden classes, and possibly use tagged integers.

The team plans to keep working on speeding up CPython after Python 3.11, but they'll need to be creative in order to achieve a 5x improvement. It will involve machine-generated code. They may also need to evolve the stable ABI.
 

Who Will Benefit?

You'll benefit from the speed increase if you:

  • Run CPU-intensive pure Python code
  • Use tools or websites built in CPython
 
You might not benefit from the speed increase if you:
  • Rewrote your code in C, Cython, C++, or similar to increase speed already (e.g. NumPy, TensorFlow)
  • Have code that is mostly waiting for I/O
  • Use multithreading
  • Need to make your code more algorithmically efficient first

Where Can You Learn More?

To learn more, you can check out the following repos:

You can also read through PEP 659 — Specializing Adaptive Interpreter.

The 2021 Python Language Summit: CPython Performance Improvements at Instagram

Dino Viehland gave a presentation at the 2021 Python Language Summit about improvements to CPython's performance at Instagram.

Dino Viehland

Cinder is Instagram's internal performance-oriented production version of CPython 3.8, so all of the comparisons in this presentation dealt with Python 3.8. Cinder has a lot of performance optimizations, including bytecode inline caching, eager evaluation of coroutines, a method-at-a-time JIT, and an experimental bytecode compiler that uses type annotations to emit type-specialized bytecode that performs better in the JIT.

Successful Improvements

Instagram did a lot of work with asynchronous I/O. One big change was sending and receiving values without raising StopIteration. Raising all of those exceptions was a huge source of overhead. On simple benchmarks, this was 1.6 times faster, but it was also a 5% win in production. These changes have been upstreamed to Python 3.10 (bpo-41756 & bpo-42085).

Instagram also made another change to asynchronous I/O that hasn't been upstreamed yet: eager evaluation. Often, in their workload, if they await a call to a function, then it can run and immediately complete. If the call completes without blocking, then they don't have to create a coroutine object. Instead, a wait handle is returned. (One singleton instance is used, as the handle is immediately consumed.)

They used the new vectorcall API to do this work, so they have a new flag to show that a call is being awaited at the call site. In addition to having functions check this flag, they also have asyncio.gather() check the flag. This avoids overhead for task creation and scheduling. These changes led to a 3% win in production and haven't been upstreamed yet, but there have been discussions.
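
A pure-Python sketch of the eager-evaluation idea follows; Cinder implements this in C via the vectorcall flag, so this helper is only a hypothetical illustration:

def call_eagerly(coro):
    # Start the coroutine; if it finishes without ever blocking,
    # return its value directly and skip task creation entirely.
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    # The coroutine suspended, so hand it back for normal scheduling.
    return coro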

Another big change is inline caching for byte code, which they call shadow byte code. Although Python 3.10 has some inline caching as well, Instagram took a somewhat different approach. In Instagram's implementation, hot methods get a complete copy of the byte code and caches. As the function executes, they replace the opcodes in that copy with more specialized ones. This resulted in a 5% win in production.

Dino Viehland also spoke about dictionary watchers, which haven't been upstreamed to CPython. Dictionary watchers provide updates when watched dictionaries, such as globals or builtins, are modified. Instagram achieved this by reusing the existing version tag in dictionaries to mark the ones being watched: they took the low bit, so now whenever they need to bump the dictionary version, they bump it by two. This led to an additional 5% win when combined with shadow byte code.

Instagram made targeted optimizations as well. The CPython documentation mentions that assigning to __builtins__ is a CPython implementation detail. But it's an unusual one, because when you assign to it, it may not be respected immediately. For example, if you're using the same globals, then you use the existing builtins. Instagram made that always point to the fixed builtins dictionary, which led to a 1% win in production.

They also made some small changes to PyType_Lookup that were upstreamed and will be in Python 3.10. You can check bpo-43452 to learn more. In addition, Instagram worked on ThreadState lookup avoidance and prefetching variables before they're loaded, but frame creation is still expensive.

Experimental Work

Instagram has tried some experimental changes as well. One big one was the JIT, a custom method-at-a-time JIT with nearly full coverage of the opcodes. A few opcodes, such as IMPORT_STAR, are unsupported, but those are rare and not used in methods. There are a couple of intermediate representations: the front end lowers byte code to a high-level IR (HIR) in SSA form, where a refcount-insertion pass and other optimization passes run. After the HIR level, it is lowered to a low-level IR (LIR), which is closer to x64.

Another experimental idea is something they call static Python. It provides similar performance gains as MyPyC or Cython, but it works at runtime and has no extra compile steps. It starts with a new source loader that loads files marked with import __static__, and it supports cross module compilation across different source files. There are also new byte codes such as INVOKE_FUNCTION and LOAD_FIELD that can be bound tightly at runtime. It uses normal PEP 484 annotations.

Interop needs to enforce types at the boundaries between untyped Python and static Python, so if you call a typed function with the wrong types, then you might get a TypeError. Static Python has a whole new static compiler that uses the regular Python ast module and is based on the Python 2.x compiler package.

In addition, Pyro is an unannounced, experimental, from-scratch implementation that reuses the standard library. The main differences between Pyro and CPython are:

  • Compacting garbage collection
  • Tagged pointers
  • Hidden classes

The C API is emulated for the PEP 384 subset in order to support C extensions.

Performance Results

Production improvements are difficult to measure because changes have been incremental over time, but they are estimated at between 20% and 30% overall. When Instagram was benchmarking, they used CPython 3.8 as the baseline and compared Cinder, Cinder with the JIT, and Cinder JIT noframe, which Instagram is not yet using in production but wants to move towards so they won't have to create Python frame objects for jitted code.

Cinder had good results on a large set of benchmarks, including a 4x improvement on richards, but did worse on others, particularly 2to3, python_startup, and python_startup_no_site. This is probably because they JIT every single function when it's invoked the first time; they haven't yet made the changes to JIT a function only once it becomes hot. They also haven't yet tested comparisons with PyPy.
 
Cinder is open source, so you can check out the repo yourself.

Sunday, May 16, 2021

The 2021 Python Language Summit: Progress on Running Multiple Python Interpreters in Parallel in the Same Process

Victor Stinner and Dong-hee Na gave a presentation at the 2021 Python Language Summit about running multiple Python interpreters in parallel in the same process.

Victor Stinner & Dong-hee Na

Use Cases

Victor Stinner started by explaining why we would need to make the changes that they're discussing. One use case would be if you wanted to embed Python and extend the features of your application, like Vim, Blender, LibreOffice, and pybind11. Another use case is subinterpreters. For example, to handle HTTP requests, there is Apache mod_wsgi, which uses subinterpreters. There are also plugins for WeeChat, which is an IRC client written in C.

Embedding Python

One of the current issues with embedding Python is that it doesn't explicitly release memory at exit. If you use a tool to track memory leaks, such as Valgrind, then you can see a lot of memory leaks when you exit Python.

Python makes the assumption that the process is done as soon as you exit, so you wouldn't need to release memory. But that doesn't work for embedded Python because applications can survive after calling Py_Finalize(), so you have to modify Py_Finalize() to release all memory allocations done by Python. Doing that is even more important for Py_EndInterpreter(), which is used to exit the subinterpreter.

Running Multiple Interpreters in Parallel

The idea is to run one interpreter per thread and one thread per CPU, so you use as many interpreters as you have CPUs to distribute the workload. It's similar to multiprocessing use cases, such as distributing machine learning.

Why Do We Need a Single Process?

There are multiple advantages to using a single process. Not only can it be more convenient, but it can also be more efficient for some use cases. Admin tools are often designed for handling a single process rather than multiple ones, and some APIs don't work across processes because they are designed for a single process. On Windows, creating a thread is faster than creating a process. In addition, macOS decided to ban fork(), so multiprocessing uses spawn by default and is slower.

No Shared Object

The issue with running multiple interpreters is that all CPUs have access to the same memory, so there is concurrent access to each object's reference count. One way to make sure that the code is correct is to put a lock on the reference counter or use an atomic operation, but that can create a performance bottleneck. One solution would be to not share any objects between interpreters, even immutable ones.

What Drawbacks Do Subinterpreters Have?

If you have a crash, like a segfault, then all subinterpreters will be killed. You need to make sure that all imported extensions support subinterpreters.
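
CPython already ships a private module for experimenting with subinterpreters in a single process; a minimal sketch (the _xxsubinterpreters API is internal and subject to change):

import _xxsubinterpreters as interpreters

# Create an isolated subinterpreter in the current process.
interp_id = interpreters.create()

# Run code inside it; no objects are shared with the caller.
interpreters.run_string(interp_id, "print('hello from a subinterpreter')")

interpreters.destroy(interp_id)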

C API & Extensions

Next, Dong-hee Na shared the current status of the extension modules that support heap types, module state, and multiphase initialization. In order to support multiple subinterpreters, you need to support multiphase initialization (PEP 489), but first you need to convert static types to heap types and add module state. PEP 384 and PEP 573 support heap types, mostly through the PyType_FromSpec() and PyType_FromModuleAndSpec() APIs. Dong-hee Na walked the summit attendees through an example with the _abc module extension.

Work Done So Far

Victor Stinner outlined some of the work that has already been done. They had to deal with many things to make interpreters not share objects anymore, such as free lists, singletons, slice cache, pending calls, type attribute lookup cache, interned strings, and Unicode identifiers. They also had to deal with the states of modules because there are some C APIs that directly access states, so they needed to be per interpreter rather than per module instance. 

One year ago, Victor Stinner wrote a proof of concept to check if the design for subinterpreters made sense and if they're able to scale with the number of CPUs:


Work That Still Needs to Be Done

Some of the easier TODOs are:

  • Converting remaining extensions and static types
  • Making _PyArg_Parser per interpreter
  • Dealing with the GIL itself

Some of the more challenging TODOs are:

  • Removing static types from the public C API
  • Making None, True, and False singletons per interpreter
  • Getting the Python thread state (tstate) from a thread local storage (TLS)

There are some ideas for the future:

  • Having an API to directly share Python objects
  • Sharing data while using one Python object per interpreter, with locks
  • Supporting spawning subprocesses (fork)

If you want to know more, you can play around with this yourself:

./configure --with-experimental-isolated-subinterpreters
#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS

Saturday, May 15, 2021

The 2021 Python Language Summit: PEP 654 — Exception Groups and except*

PEP 654 was authored by Irit Katriel, Yury Selivanov, and Guido van Rossum. This PEP is currently at the draft stage. At the 2021 Python Language Summit, the authors shared what it is, why we need it, and which ideas they rejected.

Irit Katriel, Yury Selivanov, and Guido van Rossum

 What Is PEP 654?

The purpose of this PEP is to help Python users handle unrelated exceptions. Right now, if you're dealing with several unrelated exceptions, you can:

  • Raise one exception and throw away the others, in which case you're losing exceptions
  • Return a list of exceptions instead of raising them, in which case they become error codes rather than exceptions, so you can't handle them with exception-handling mechanisms
  • Wrap the list of exceptions in a wrapper exception and use it as a list of error codes, which still can't be handled with exception-handling mechanisms

PEP 654 proposes a new ExceptionGroup exception type and a new except* syntax for handling it.
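
A minimal sketch of the proposed syntax, as specified in the PEP:

try:
    # Raise several unrelated exceptions together as one group.
    raise ExceptionGroup(
        "several tasks failed",
        [ValueError("bad value"), TypeError("bad type")],
    )
except* ValueError as eg:
    # eg is an exception group holding only the matching leaf exceptions.
    print("value errors:", eg.exceptions)
except* TypeError as eg:
    print("type errors:", eg.exceptions)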

Each except* clause will be executed once, at most. Each leaf exception will be handled by one except* clause, at most.

In the discussions about the PEP that have happened so far, there were no major objections to these ideas, but there are still disagreements about how to represent an exception group. Exception groups can be nested, and each exception has its own metadata:

A nested ExceptionGroup, with metadata

Originally, the authors thought that they could make exception groups iterable, but that wasn't the best option because metadata has to be preserved. Their solution was to use a .split() operation to take a condition on a leaf and copy the metadata:

Splitting exception groups with .split()
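
In sketch form, split() takes an exception type (or a predicate) and partitions the group, preserving nesting and metadata on both sides:

eg = ExceptionGroup("group", [ValueError(1), TypeError(2)])

# match holds the ValueError subgroup; rest holds everything else.
match, rest = eg.split(ValueError)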

 

Why Do We Need Exception Groups and except*?

There are some differences between operational errors and control flow errors that you need to take into account when you're dealing with exceptions:

Operational errors vs control flow errors in Python

In Example 1, there is a clearly defined operation with a straightforward error. But in Example 2, there are concurrent tasks that could contain any number of lines of code, so you don't know what caused the KeyError. In this case, handling one KeyError could potentially be useful for logging, but it isn't helpful otherwise. But there are other exceptions that it could make more sense to handle:

asyncio.CancelledError

It's important to understand the differences between operational errors and control flow errors, as they relate to try-except statements:

  • Operational errors are typically handled right where they happen and work well with try-except statements. 
  • Control flow errors are essentially signals, and the current semantics of the try-except statement doesn't handle them adequately.

Before asyncio, it wasn't as big of a problem that there weren't advanced mechanisms to react to these kinds of control flow errors, but now it's more important that we have a better way to deal with these sorts of issues. asyncio.gather() is an unusual API because it has two entirely different operation modes controlled by one keyword argument, return_exceptions.

asyncio.gather()

The problem with this API is that, if an error happens, you still wait for all of the tasks to complete. In addition, you can't use try-except to handle the exceptions but instead have to unpack the results of those tasks and manually check them, which can be cumbersome.
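
A short sketch of that unpack-and-check pattern (the task bodies here are hypothetical):

import asyncio

async def work(n):
    if n == 2:
        raise ValueError(n)
    return n

async def main():
    # Failures come back mixed in with normal results as plain values,
    # so they must be checked manually instead of caught with try-except.
    results = await asyncio.gather(
        *(work(n) for n in range(4)), return_exceptions=True
    )
    for result in results:
        if isinstance(result, BaseException):
            print("failed:", result)
        else:
            print("ok:", result)

asyncio.run(main())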

The solution to this problem was to implement another way of controlling concurrent tasks:

asyncio.TaskGroup

If one task fails, then all other tasks will be cancelled. Users of asyncio have been requesting this kind of solution, but it needed a new way of dealing with exceptions and was part of the inspiration behind PEP 654.
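
In sketch form, using the TaskGroup API as it later shipped in Python 3.11:

import asyncio

async def work(n):
    if n == 2:
        raise ValueError(n)
    return n

async def main():
    # If any task fails, the remaining tasks are cancelled and the
    # failures propagate together as an exception group (PEP 654).
    async with asyncio.TaskGroup() as tg:
        for n in range(4):
            tg.create_task(work(n))

try:
    asyncio.run(main())
except* ValueError as eg:
    print("caught:", eg.exceptions)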

Which Ideas Were Rejected?

Whether or not exception groups should be iterable is still an open question. For that to work, tracebacks would need to be concatenated, with shared parts copied, which isn't very efficient. But iteration isn't usually the right approach for working with exception groups anyway. A potential compromise could be to have an iteration utility in traceback.py.

The authors considered teaching except to handle exception groups instead of adding except*, but there would be too many backwards compatibility problems. They also thought about using an except* clause on one exception at a time. Backwards compatibility issues wouldn't apply there, but this would essentially be iteration, which wouldn't help.

Thursday, May 13, 2021

The 2021 Python Language Summit: Welcome, Introductions, Guidelines

As attendees slowly filtered into the virtual event leading up to the official start time, they were clearly happy to see each other and have the chance to get together virtually even though PyCon US and the language summit have had to be remote for two years in a row. Although we would like to see each other in person again, one benefit of keeping the summit virtual this year was that more people were able to participate than usual.

Normally, the number of people attending would be small enough that there would be time for each person to take a moment to introduce themselves to the group. Since there were more participants than usual this year, Łukasz Langa walked us through a slide deck that had a page for each of the attendees. It was an international event, with participants attending from North America, South America, Europe, Africa, the Middle East, Asia, and Oceania.

After Łukasz finished the introductions, Ewa Jodlowska told us about the code of conduct and the procedures in place to help all participants feel welcome and have a positive experience.

With the 2021 Python Language Summit off to a good start, we took a group photo and were ready to launch into the talks!

The 2021 Python Language Summit

Every year, a small group of core developers from Python implementations such as CPython, PyPy, Jython, and more come together to share information, discuss problems, and seek consensus in order to help Python continue to flourish.

The Python Language Summit features short presentations followed by group discussions. The topics can relate to the language itself, the standard library, the development process, documentation, packaging, and more! In 2021, the summit was held over two days by videoconference and was led by Mariatta Wijaya and Łukasz Langa.

If you weren't able to attend the summit, then you can still stay up to date with what's happening in the world of Python by reading blog posts about all of the talks that were given. Over the next few weeks, you'll be able to dive into all of the news from the summit so you can join in on the big conversations that are happening in the Python community.

Day 1

 
Łukasz Langa
 
Irit Katriel, Yury Selivanov, and Guido van Rossum
 
Victor Stinner and Dong-hee Na
 
Dino Viehland
 
Guido van Rossum
 
Antonio Cuni
 
Petr Viktorin, Lorena Mesa, Scott Shawcroft, and Jeff Allen
 

Day 2

 
Matthias Klose
 
Mariatta Wijaya and Carol Willing
 
Brett Cannon
 
Eric Snow
 
Zac Hatfield-Dodds
 
Ronny Pfannschmidt, Pablo Galindo, Batuhan Taskaya, Luciano Ramalho, Jason R. Coombs, Mark Shannon, and Tobias Kohn
 
We hope you enjoy diving into what went on at the summit, and we're looking forward to seeing how the Python community continues these discussions.

Wednesday, May 12, 2021

Débora Azevedo: Finding A Sense of Belonging Through the Python Community

PyLadies Brazil co-founder Débora Azevedo can encapsulate her feelings about the Python community in one word: belonging.

Growing up in Natal, Brazil, Débora never would’ve guessed that Python would come to play such an important role in her life. “When I was first learning about programming in high school, I found it really difficult to comprehend. At that time I wrote code in a notebook, because I didn’t own a computer.” 


Débora stumbled upon her love for programming accidentally when she was studying to be an English teacher. “I decided to pursue a computer networking course because I knew I didn’t want to code. There were four terms in the course and during the fourth term I had to learn programming,” she said.  “I was convinced it just wouldn’t work for me. But when I started to learn Python, I suddenly had this empowered feeling that I could really build something.”


That feeling of empowerment was something that Débora felt compelled to share with others. In 2013, Débora organized the first meeting of PyLadies Brazil. “Through PyLadies, I got to meet with these amazing, smart women who helped me believe in myself. One day they encouraged me to give a talk about my experience with Python, even though I didn’t feel qualified,” Débora recalled. “From the moment I got involved with PyLadies, they always made me feel like I could do something great. Having that sense of self-worth is so important and it inspired me to help others.”




For the past seven years, Débora has worked tirelessly to build a strong Python community in Brazil. What began as a small group of local women empowering each other through code has grown into the largest PyLadies network in the world, with more than 30 PyLadies Brazil chapters nationwide. Débora was also the first chair of the PyLadies Brazil Conference and has since organized and spoken at countless Python Brazil conferences and events over the past decade.





In 2019, Débora attended PyCon US for the first time. “I was able to go to PyCon because of a PyLadies grant from the Python Software Foundation. It was my first time traveling to the U.S. and it was the biggest conference I’d ever been to,” Débora remembered. “Being able to attend PyCon US and network with others in the Python community was incredible. I remember that I hosted an open space to talk about PyLadies and I got to meet people from all over the world. I never would’ve had that opportunity if it weren’t for the PSF.” 





Today, Débora is working on her Master’s degree in Innovation in Education Technologies and is developing educational software to help people learn more efficiently. Her experience with PyLadies sparked her interest in pursuing this new path. “Learning the Python language challenged me to think about how people learn and inspired me to help others learn to code,” Débora stated. “Being part of the Python community allowed me to have a positive impact on the lives of others.” 


When asked why she thought supporting the Python Software Foundation was important, her answer was simple, “The grants and PyCon scholarships provided by the PSF have a huge impact on thousands of people globally. I feel like I belong to the Python community and I want to see it grow.”






If you donate $99 or more before June 12, 2021, you will receive an exclusive Python T-shirt, not sold in stores (shipping starts at the end of June). Please note that the PSF cannot ship shirts to OFAC sanctioned countries: https://home.treasury.gov/policy-issues/financial-sanctions/sanctions-programs-and-country-information. We apologize for this inconvenience.

Blog written by Morgan Mayo, PSF's Director of Resource Development