Sunday, May 16, 2021

The 2021 Python Language Summit: Progress on Running Multiple Python Interpreters in Parallel in the Same Process

Victor Stinner and Dong-hee Na gave a presentation at the 2021 Python Language Summit about running multiple Python interpreters in parallel in the same process.

Use Cases

Victor Stinner started by explaining why these changes are needed. One use case is embedding Python to extend the features of an application, as Vim, Blender, LibreOffice, and pybind11 do. Another use case is subinterpreters: for example, Apache mod_wsgi uses subinterpreters to handle HTTP requests, and so do plugins for WeeChat, an IRC client written in C.

Embedding Python

One of the current issues with embedding Python is that it doesn't explicitly release memory at exit. If you use a tool to track memory leaks, such as Valgrind, then you can see a lot of memory leaks when you exit Python.

Python assumes that the process terminates as soon as the interpreter exits, so it never needs to release that memory. That assumption doesn't hold for embedded Python, because the application keeps running after calling Py_Finalize(), so Py_Finalize() has to be modified to release every memory allocation made by Python. Doing that is even more important for Py_EndInterpreter(), which is used to exit a subinterpreter.

Running Multiple Interpreters in Parallel

The idea is to run one interpreter per thread and one thread per CPU, so you use as many interpreters as you have CPUs to distribute the workload. It's similar to multiprocessing use cases, such as distributing machine learning workloads.
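
As a rough illustration of that model, here is a minimal sketch using the private _xxsubinterpreters module that ships with recent CPython builds (an experimental API that may change). Note that today these interpreters still share the GIL, so they don't actually run in parallel yet, which is exactly the problem this work is addressing:

    # One interpreter per thread, one thread per CPU (a sketch).
    import os
    import threading
    import _xxsubinterpreters as interpreters

    def worker(script):
        interp = interpreters.create()            # an isolated interpreter
        try:
            interpreters.run_string(interp, script)
        finally:
            interpreters.destroy(interp)

    threads = [
        threading.Thread(target=worker,
                         args=("print('hello from a subinterpreter')",))
        for _ in range(os.cpu_count() or 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()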

Why Do We Need a Single Process?

There are multiple advantages to using a single process. Not only can it be more convenient, but it can also be more efficient for some use cases. Admin tools are often designed to handle a single process rather than several. Some APIs don't work across processes because they're designed to be used within a single process. On Windows, creating a thread is faster than creating a process. In addition, macOS decided to ban fork(), so multiprocessing uses spawn by default there, which is slower.

No Shared Object

The issue with running multiple interpreters is that all CPUs have access to the same memory, so there is concurrent access to objects' reference counts (refcnt). One way to keep the code correct is to protect the reference count with a lock or use atomic operations, but either approach can create a performance bottleneck. One solution would be to not share any objects between interpreters, not even immutable ones.
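
A small illustration of why even immutable objects are a problem: every object carries a reference count that is written to on every incref and decref, and a singleton like None is currently one shared object for the whole process:

    import sys

    # None is a single shared object today: every thread (and every
    # subinterpreter) sees the same one, so every Py_INCREF/Py_DECREF on it
    # would be a concurrent write if interpreters ran truly in parallel.
    print(id(None))               # the same address everywhere in the process
    print(sys.getrefcount(None))  # a counter that is constantly being updated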

What Drawbacks Do Subinterpreters Have?

If you have a crash, like a segfault, then all subinterpreters will be killed. You need to make sure that all imported extensions support subinterpreters.

C API & Extensions

Next, Dong-hee Na shared the current status of the extension modules that support heap types, module state, and multiphase initialization. In order to support multiple subinterpreters, an extension needs to support multiphase initialization (PEP 489), but first its static types need to be converted to heap types and it needs per-module state. Heap types are supported by PEP 384 and PEP 573, mostly through the PyType_FromSpec() and PyType_FromModuleAndSpec() APIs. Dong-hee Na walked the summit attendees through an example with the _abc module extension.

Work Done So Far

Victor Stinner outlined some of the work that has already been done. They had to deal with many things to make interpreters not share objects anymore, such as free lists, singletons, slice cache, pending calls, type attribute lookup cache, interned strings, and Unicode identifiers. They also had to deal with the states of modules because there are some C APIs that directly access states, so they needed to be per interpreter rather than per module instance. 

One year ago, Victor Stinner wrote a proof of concept to check whether the design for subinterpreters made sense and whether it could scale with the number of CPUs:


Work That Still Needs to Be Done

Some of the easier TODOs are:

  • Converting remaining extensions and static types
  • Making _PyArg_Parser per interpreter
  • Dealing with the GIL itself

Some of the more challenging TODOs are:

  • Removing static types from the public C API
  • Making None, True, and False singletons per interpreter
  • Getting the Python thread state (tstate) from a thread local storage (TLS)

There are some ideas for the future:

  • Having an API to directly share Python objects
  • Sharing data and using one Python object per interpreter, with locks
  • Supporting spawning subprocesses (fork)

If you want to know more, you can play around with this yourself by building CPython with the experimental configure option:

./configure --with-experimental-isolated-subinterpreters

The code involved in this work is guarded by #ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS.

Saturday, May 15, 2021

The 2021 Python Language Summit: PEP 654 — Exception Groups and except*

PEP 654 was authored by Irit Katriel, Yury Selivanov, and Guido van Rossum. This PEP is currently at the draft stage. At the 2021 Python Language Summit, the authors shared what it is, why we need it, and which ideas they rejected.

Irit Katriel, Yury Selivanov, and Guido van Rossum


What Is PEP 654?

The purpose of this PEP is to help Python users handle unrelated exceptions. Right now, if you're dealing with several unrelated exceptions, you can:

  • Raise one exception and throw away the others, in which case you're losing exceptions
  • Return a list of exceptions instead of raising them, in which case they become error codes rather than exceptions, so you can't handle them with exception-handling mechanisms
  • Wrap the list of exceptions in a wrapper exception and use it as a list of error codes, which still can't be handled with exception-handling mechanisms (a sketch of this pattern follows the list)
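
To make that last option concrete, here is a minimal sketch (the WrapperError name is hypothetical) of the wrapper-exception pattern, in which the wrapped exceptions end up being inspected by hand:

    class WrapperError(Exception):  # hypothetical wrapper exception
        def __init__(self, exceptions):
            super().__init__(f"{len(exceptions)} tasks failed")
            self.exceptions = exceptions

    def run_tasks():
        errors = []
        for task in (lambda: 1 / 0, lambda: {}["missing"]):
            try:
                task()
            except Exception as exc:
                errors.append(exc)
        if errors:
            raise WrapperError(errors)

    try:
        run_tasks()
    except WrapperError as wrapped:
        # The wrapped exceptions are just data here: a regular
        # `except KeyError:` clause can never match one of them,
        # so they have to be checked by hand, like error codes.
        for exc in wrapped.exceptions:
            print(type(exc).__name__, exc)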

PEP 654 proposes:

Each except* clause will be executed at most once, and each leaf exception will be handled by at most one except* clause.
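
A minimal sketch of the proposed syntax, following the PEP (it later shipped in Python 3.11): an ExceptionGroup wraps the leaf exceptions, and each except* clause receives the matching subgroup:

    def run_jobs():
        # Raise several unrelated errors together as one group.
        raise ExceptionGroup(
            "jobs failed",
            [ValueError("bad input"), KeyError("missing"), KeyError("also missing")],
        )

    try:
        run_jobs()
    except* ValueError as eg:
        # eg is an ExceptionGroup containing only the ValueError leaves.
        print("values:", eg.exceptions)
    except* KeyError as eg:
        # This clause runs at most once, even though there are two KeyErrors.
        print("keys:", eg.exceptions)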

In the discussions about the PEP that have happened so far, there were no major objections to these ideas, but there are still disagreements about how to represent an exception group. Exception groups can be nested, and each exception has its own metadata:

A nested ExceptionGroup, with metadata

Originally, the authors thought they could make exception groups iterable, but that wasn't the best option because the metadata has to be preserved. Their solution was a .split() operation that takes a condition on the leaf exceptions and copies the metadata to both resulting groups:

Splitting exception groups with .split()
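
A minimal sketch of .split() as described in the PEP: the condition can be an exception type (or a predicate), and the call returns the matching part and the rest, both copies that keep the group's metadata:

    eg = ExceptionGroup(
        "group",
        [ValueError("v"), TypeError("t"), OSError("o")],
    )

    # Split into the leaves that match the condition and the rest; both
    # halves keep the group's message, context, and traceback metadata.
    match, rest = eg.split(ValueError)

    print(match)  # ExceptionGroup('group', [ValueError('v')])
    print(rest)   # ExceptionGroup('group', [TypeError('t'), OSError('o')])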


Why Do We Need Exception Groups and except*?

There are some differences between operational errors and control flow errors that you need to take into account when you're dealing with exceptions:

Operational errors vs control flow errors in Python

In Example 1, there is a clearly defined operation with a straightforward error. But in Example 2, there are concurrent tasks that could contain any number of lines of code, so you don't know what caused the KeyError. In this case, handling one KeyError could potentially be useful for logging, but it isn't helpful otherwise. But there are other exceptions that it could make more sense to handle:

asyncio.CancelledError

It's important to understand the differences between operational errors and control flow errors, as they relate to try-except statements:

  • Operational errors are typically handled right where they happen and work well with try-except statements.
  • Control flow errors are essentially signals, and the current semantics of the try-except statement don't handle them adequately (see the sketch below).
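
As a rough illustration (not the speakers' actual examples), an operational error can be dealt with right where it occurs, while a control flow signal such as asyncio.CancelledError can surface from any await point inside a task:

    import asyncio

    # Operational error: handled right where it happens.
    def read_setting(config, key):
        try:
            return config[key]
        except KeyError:
            return None  # a straightforward, local decision

    # Control flow error: CancelledError is a signal that can pop out of any
    # await point; a local try-except can only do cleanup, not "handle" it.
    async def worker(queue):
        try:
            while True:
                item = await queue.get()
                print("processing", item)
        except asyncio.CancelledError:
            print("worker cancelled, cleaning up")
            raise  # the cancellation still has to propagate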

Before asyncio, the lack of advanced mechanisms for reacting to these kinds of control flow errors wasn't as big of a problem, but now it's more important to have a better way to deal with them. asyncio.gather() is an unusual API because it has two entirely different operation modes controlled by one keyword argument, return_exceptions:

asyncio.gather()

The problem with this API is that, if an error happens, you still wait for all of the tasks to complete. In addition, you can't use try-except to handle the exceptions but instead have to unpack the results of those tasks and manually check them, which can be cumbersome.
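
A rough sketch of the return_exceptions=True mode (the fetch coroutine is hypothetical):

    import asyncio

    async def fetch(i):
        if i == 2:
            raise KeyError(f"task {i}")
        return i

    async def main():
        # With return_exceptions=True, gather() waits for *all* tasks and
        # returns the exceptions mixed in with the results, so they can't be
        # caught with try-except and have to be checked one by one.
        results = await asyncio.gather(*(fetch(i) for i in range(4)),
                                       return_exceptions=True)
        for result in results:
            if isinstance(result, BaseException):
                print("failed:", result)
            else:
                print("ok:", result)

    asyncio.run(main())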

The solution to this problem was to implement another way of controlling concurrent tasks:

asyncio.TaskGroup

If one task fails, then all other tasks will be cancelled. Users of asyncio have been requesting this kind of solution, but it needed a new way of dealing with exceptions and was part of the inspiration behind PEP 654.
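
A sketch of that API (it later shipped as asyncio.TaskGroup in Python 3.11, where the failures are re-raised together as an ExceptionGroup):

    import asyncio

    async def fetch(i):
        if i == 2:
            raise KeyError(f"task {i}")
        await asyncio.sleep(0.1)
        return i

    async def main():
        try:
            async with asyncio.TaskGroup() as tg:
                for i in range(4):
                    tg.create_task(fetch(i))
        except* KeyError as eg:
            # When one task fails, the remaining tasks are cancelled and the
            # failures are raised together as an ExceptionGroup.
            print("failed:", eg.exceptions)

    asyncio.run(main())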

Which Ideas Were Rejected?

Whether or not exception groups should be iterable is still an open question. For that to work, tracebacks would need to be concatenated, with shared parts copied, which isn't very efficient. But iteration isn't usually the right approach for working with exception groups anyway. A potential compromise could be to have an iteration utility in traceback.py.

The authors considered teaching except to handle exception groups instead of adding except*, but there would be too many backwards compatibility problems. They also thought about having except* handle one exception at a time. Backwards compatibility issues wouldn't apply there, but this would essentially be iteration, which wouldn't help.