Wednesday, June 12, 2019

2019 Board of Directors Election - Voting is Open

Voting is currently open for the 2019 Python Software Foundation Board of Directors Election. We have a great list of candidates this year so if you received a ballot, please vote.

Who should have received a ballot?

If you became a PSF Supporting Member*, Contributing Member, Managing Member, and/or Fellow by May 31, 2019, you are eligible to vote. You should have received a ballot from Helios with details on how to cast your vote. If you cannot find the email, please search your inbox and spam folder for the word "helios".

Once you log in to Helios, be sure to follow the process until you see "Congratulations, your vote has been successfully cast!".

* Must be a current membership and not expired as of May 31, 2019

When do I need to vote by?

Voting opened June 7th and will close at the end of June 16, Anywhere on Earth (AoE).

How do I become a voting member?

If you're currently not a voting member but wish to be one for future elections (2020 and on), here are some options for you to consider:

  • Contribute to the PSF $99 yearly by becoming a Supporting Member. You can sign up via http://psfmember.org.
  • If you dedicate at least five hours per month working to support the Python ecosystem you can become a Managing Member. If you dedicate at least five hours per month working on Python-related projects that advance the mission of the PSF you can become a Contributing Member. You can self certify via https://forms.gle/vbJvweHW8rimAjYd6. You must be a basic member before you apply to be a Contributing/Managing member.
  • If you know of someone that has gone above and beyond in their contributions to the Python community, consider nominating them for the PSF Fellow membership level. Details are available here: https://www.python.org/psf/fellows/.

If you have any questions about the PSF Election, please contact the PSF staff: psf-staff at python dot org.

--------------------------------------------------

The PSF is currently holding its 2019 Fundraiser. As a non-profit organization, the PSF depends on sponsorships and donations to support the Python community. Check out our Annual Impact Report for more details: https://www.python.org/psf/annual-report/2019/.

Please consider contributing to the PSF's 2019 fundraiser; we can't continue our work without your support! https://www.python.org/psf/donations/2019-q2-drive/.

Tuesday, June 04, 2019

Python Language Summit Lightning Talks, Part 2

The Summit concluded with a second round of lightning talks, which speakers had signed up for that day. These talks were therefore more off-the-cuff than the morning's talks, and several of them were direct responses to earlier presentations.

Read more 2019 Python Language Summit coverage.

Christian Heimes

SSL Module Updates




Python’s ssl module depends on OpenSSL. On Linux, Python uses the system OpenSSL, but on Mac and Windows it ships its own. Christian Heimes explained to the Language Summit that Python 3.7 must upgrade its included version of OpenSSL to 1.1.1 to receive long-term support, but he warned that this change might cause unexpected issues for Python programmers on Mac and Windows.

Heimes wants to deprecate support for TLS 1.1 as soon as possible. Recent Linux distributions and browsers already prohibit the old protocol for security reasons. In Python 3.8 he plans to document that TLS 1.1 “may work”, depending on the underlying OpenSSL version, and in Python 3.9 it will be explicitly banned. Larry Hastings asked whether this change could be spread over two Python releases, in the way that most features are first deprecated and then removed. Heimes replied that OpenSSL itself is moving this quickly.
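Applications need not wait for the deprecation to land in Python 3.9; a minimal sketch using the 3.7+ `ssl` API to refuse anything older than TLS 1.2 (configuration only, no network access required):

```python
import ssl

# create_default_context() already disables SSLv2/SSLv3; minimum_version
# (Python 3.7+) lets an application require TLS 1.2 or newer regardless of
# the deprecation timeline described above.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
print(ctx.minimum_version)
```

Whether a given minimum can actually be enforced still depends on the OpenSSL version the interpreter was built against, which is exactly the variability Heimes wants to document.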

Python has poor support for the root certificate authority files included in the operating system. On Linux and BSD, ssl.create_default_context() uses the root CAs correctly. On Windows, according to Heimes, root CAs are partly broken despite “a hack I added a couple years ago that does not work for technical reasons.” And on macOS there is no support without installing the certifi package. Heimes proposed to rely more on the operating system: on Mac and Windows in particular, the interpreter should ask the OS to verify certificates against its known CAs, instead of asking OpenSSL.

It has been several years since Heimes and Cory Benfield began drafting PEP 543. This PEP would decouple Python’s API from the specifics of the OpenSSL library, so it could use OS-provided TLS libraries on macOS, Windows, and elsewhere. Heimes told the Language Summit that he and Paul Kehrer would work on PEP 543 during the post-PyCon sprints.

Larry Hastings

Let’s Argue About Clinic


Argument Clinic is a tool used in the implementation of CPython to generate argument-parsing code for C functions that are used from Python; i.e., “builtin” functions. (It is named for a Monty Python sketch.) Since its original intent was to create C functions that handle their arguments like Python functions do, it only handles Python-like function signatures.

Larry Hastings addressed the Language Summit to ask whether Argument Clinic ought to be extended to handle argument parsing more generally, including function signatures that would not be possible in pure Python. For example, some builtin functions have parameters with a default value of NULL, which is representable in C but not in Python. Hastings said he had observed developers wanting to use Argument Clinic for all builtin functions because it is convenient to use and generates fast code.

Eric Snow

The C API



One of the reasons for CPython’s success is its powerful C API, which permits C extensions to interact with the interpreter at a low level for the sake of performance or flexibility. But, according to Eric Snow, the C API has become a hindrance to progress because it is so tightly coupled to CPython’s implementation details. He identified several problems with the current CPython implementation, such as the GIL, but said, “we can't go very far fixing those problems without breaking the C API.”

One solution is to split the C API into four categories. The CPython header files would be split into four directories to make it more obvious to core developers and extension authors which category each type or function belongs to:

  • “internal” — “Do not touch!”
  • “private” — “Use at your own risk!”
  • “unstable” — “Go for it (but rebuild your extension each Python release)!”
  • “stable” — “Worry-free!”
There are a number of other solutions proposed or in progress.

Snow finished by inviting interested people to join him on the C API special interest group mailing list.

Steve Dower

Python in the Windows Store



When a Windows user types python in the command shell on a clean system, the shell typically responds, “python is not recognized as an internal or external command”. After the May Windows update this will change: Typing python in the shell will now open the Microsoft Store and offer to install Python. When Steve Dower showed the install screen to the Language Summit the audience broke into applause.

The package is owned by the Python core developers; Microsoft’s contribution was to add the python command stub that opens the install page. Compared to the installer that users can download from python.org, said Dower, “the Microsoft Store is a more controlled environment.” Its distribution of Python is easier to install and upgrade, at the cost of some inconsistencies with the full Python install. “It's not going to work for everyone.” Advanced developers and those who want multiple versions of Python will prefer to install it themselves, but the Microsoft Store will satisfy programmers who simply need Python available. “Everyone with a Windows machine in any of the tutorial rooms right now should probably be using this,” he said.

Installing Python this conveniently on Windows was not possible until recent changes to the Microsoft Store. For example, Store apps were originally prohibited from accessing their current working directory, but apps are now granted virtually the same permissions as regular programs.

Carol Willing asked whether the Store version of Python could be used for reproducing data science results. “There are a number of situations where I would say don't use this package,” responded Dower. Since the Microsoft Store will automatically update its version of Python whenever there is a new release, data scientists who care about reproducibility should install Python themselves.

Nathaniel Smith

Bors: How Rust Handles Buildbots and Merge Workflow

(Or: One way to solve Pablo’s problems)


In response to Pablo Galindo Salgado’s earlier talk about the pain caused by test failures, Nathaniel Smith whipped up a talk about the Rust language’s test system. The Rust community observes what they call the Not Rocket Science Rule: “Automatically maintain a repository of code that always passes all the tests.” Although it is obvious that all projects ought to adhere to this rule, most fail to, including Python. How does Rust achieve it?

When a Rust developer approves a pull request, the “bors” bot tests it and, if the tests pass, merges the pull request to master.


This seems quite elementary, as Smith acknowledged. But there are two unusual details of the bors system that enforce the Not Rocket Science Rule. The first is that bors tests pull requests in strict sequence. It finds the next approved pull request, merges it together with master, and tests that version of the code. If the tests pass, bors makes that version the new master; otherwise it rejects the pull request. Then, bors moves to the next pull request in the queue. Compared to the typical system of testing pull requests before merging, bors’s algorithm tests the version of the code that will actually be published.
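The core loop can be sketched in a few lines of Python (a toy model, not the real bors; the merge and test functions are illustrative stand-ins):

```python
# Test each approved pull request merged with the current master, and advance
# master only when that exact tree is green.
def bors_merge(master, queue, run_tests, merge):
    for pr in queue:
        candidate = merge(master, pr)   # the tree that would actually ship
        if run_tests(candidate):
            master = candidate          # publish exactly what was tested
        else:
            print(f"rejected: {pr}")    # master is left untouched
    return master

# Illustrative stand-ins for real merging and testing:
merge = lambda tree, pr: tree + [pr]
run_tests = lambda tree: "bad-pr" not in tree
print(bors_merge(["base"], ["good-pr", "bad-pr"], run_tests, merge))
```

The key property is that `run_tests` always runs against the candidate master, never against the pull request in isolation.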

The second way the Rust community enforces the Not Rocket Science Rule is by requiring the bors process for all pull requests. “They do this for everything,” said Smith. “This is how you merge. There's no green button.” Taken together, bors’s algorithm and the workflow requirement ensure that Rust always passes its tests on master.

Smith described some conveniences that improve the Rust developer experience. First, bors can be triggered on a pull request before approving it, as a spot check to see whether the code passes the test suite as-is. Second, since bors must test pull requests one at a time, it has an optimization to prevent it from falling behind. It can jump ahead in the queue, merging a large batch of pull requests together and testing the result. If they pass, they can all be merged; otherwise, bors bisects the batch to find the faulty change, alerts its author, and merges all the pull requests before it.
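The batch-then-bisect step Smith described can also be sketched as a binary search (again a toy model with illustrative names): given a batch whose combined merge fails, find the first pull request that breaks the tests so everything before it can still land.

```python
def find_culprit(master, batch, run_tests, merge):
    def passes(n):                       # does master + batch[:n] pass?
        tree = master
        for pr in batch[:n]:
            tree = merge(tree, pr)
        return run_tests(tree)

    lo, hi = 0, len(batch)               # invariant: passes(lo) and not passes(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if passes(mid) else (lo, mid)
    return batch[hi - 1]                 # the first failing pull request

merge = lambda tree, pr: tree + [pr]
run_tests = lambda tree: "bad-pr" not in tree
print(find_culprit(["base"], ["a", "b", "bad-pr", "c"], run_tests, merge))  # bad-pr
```

With n pull requests in a failing batch, this takes O(log n) test runs instead of n, which is what lets bors catch up after a backlog.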

The Rust project currently uses a successor to bors called Homu, written in Python. There are several other implementations, including bors-ng, which is available as a public service for any GitHub repository.

Victor Stinner

Status of stable API and stable ABI in Python 3.8



Python 3.8 will improve the long-term stability of the C API and ABI for extension authors. Some of the details are settled, but Victor Stinner’s presentation to the Language Summit showed there are still many unanswered questions.

As Eric Snow had mentioned, C header files in Python 3.8 will be split into directories for the public stable APIs, the unstable CPython-specific API, and the internal API. Afterwards there will be less risk of exposing an internal detail in the public API by mistake, since it will be obvious whenever a pull request changes a public header.

CPython’s internal API headers were not installed by “make install” in the past, but it could be useful for a debugger or other low-level tool to inspect the interpreter’s internal data structures. Thus, in Python 3.8 the internal headers will be installed in a special subdirectory.

In the course of Stinner’s regular work at Red Hat he often debugs problems with customers’ third-party C extension modules. A debug build of the extension module might not be available, but Stinner could gather some useful information by loading the extension module with a debug build of Python. Today, this is impossible: debug builds of Python only work with debug builds of extension modules and vice versa. The debug build of Python 3.8, however, will be ABI compatible with the release build, so the same extension modules will work with both.

Another motivation for updating the C API is isolation of subinterpreters. Stinner referred to Petr Viktorin’s talk about removing process-wide global state for the sake of proper isolation, and possibly giving each subinterpreter its own GIL.

Attaining a clean, stable API and ABI may require breaking the current one; the core developers’ discussion focused on how much to break backwards compatibility and what techniques might minimize the impact on extension authors. The shape of Python 3.8’s C API and ABI is not yet settled. When Steve Dower asked whether Stinner was proposing a new stable ABI, Stinner answered, “I’m not sure what I’m proposing.”

Yarko Tymciurak

Cognitive Encapsulation

The Anchor of Working Together



Tymciurak began his lightning talk by complimenting the Language Summit participants. “In terms of communication skills, you do such a great job that I'm shocked sometimes.”

The factors that contribute to collaboration, he said, begin with a diversity of skills. As Victor Stinner had mentioned in his mentorship talk, a team with mixed skill sets and skill levels has advantages over a team of homogeneous experts. Members of high-performing teams are also enthusiastic about common goals, personally committed to their teammates, and strong in interpersonal skills.

Tymciurak credited Guido van Rossum for establishing the importance of teamwork from the beginning. Nevertheless, he said, “sometimes things may go off track.” The causes of irreconcilable disagreements or emotional blowups are not always obvious, but Tymciurak claimed that to him, “it's immediately obvious and really simple to fix.”

Cognitive encapsulation is the awareness that one’s experience of reality is not reality itself. “It’s your own mental model,” said Tymciurak. When we communicate, if we explicitly share with others what we think, see, or hear, then we are respecting cognitive encapsulation. As Tymciurak describes it, “That’s being aware that my thoughts are my own.” On the other hand, if we assume that others already agree with us, or we represent our personal experience as if it is the only possible experience for the whole group, then encapsulation is violated and we are likely to cause conflict.

As an example of cognitive encapsulation at work, Tymciurak contrasted two types of communication. One is transactional. Someone asks, “Where’s the meeting?” You answer by saying which room it is in. Another type is control communication. If an instructor commands students to “turn to page 47,” then control communication is appropriate and the students will accept it. But when a team member uses control communication without the team’s acceptance, conflict arises. Tymciurak said, “When you tell someone else what to do, you're breaking the encapsulation. Be careful. There's times when it's appropriate. But be aware of when it's not.”

Another key practice that preserves cognitive encapsulation is to truly listen. Especially when the speaker is a junior teammate, it is crucial to be able to listen without agreeing, disagreeing, or correcting. Tymciurak described the outcome of a team that works together this way. Individuals know that they understand each other’s views, and they can advocate for their own views, speaking from their own experience. “Then you can speak with authority and power. And that's part of the magic of encapsulation.”

Monday, June 03, 2019

Python Language Summit Lightning Talks, Part 1

The Summit began with six pre-selected lightning talks, with little time for discussion of each. Five of them are summarized here. An upcoming article will cover Pablo Galindo Salgado's lightning talk on improvements in Python's test infrastructure.

Read more 2019 Python Language Summit coverage.

Jukka Lehtosalo

Writing Standard Library C Modules In Python



Jukka Lehtosalo described his work with Michael Sullivan on an experimental compiler called mypyc.

The Python standard library, Lehtosalo said, contains the modules that most programmers use by default, so it should be fast. The main optimization technique has historically been to write C extensions. So far, 90 standard library modules are partly or entirely written in C, often for the sake of speed, totaling 200,000 lines of C code in the standard library. But C is hard to write and error prone, and requires specialized skills. “C is kind of becoming a dinosaur,” he said, provoking laughter from the core developers.

As an alternative, Lehtosalo proposes “writing C extensions in Python.” The mypyc compiler reads PEP 484 annotated type-checked Python, and transforms it into C extension modules that run between 2 and 20 times faster than pure Python. Some of Python’s more dynamic features such as monkeypatching are prohibited, and other features are not yet supported, but the project is improving rapidly.

The project has a similar goal to Cython’s: to transform Python into C, which is then compiled into extension modules. Compared to Cython, however, mypyc supports a wider range of PEP 484 types such as unions and generics. In Lehtosalo and Sullivan’s experiments it offers a greater performance improvement. They propose further experimentation, testing how well mypyc translates certain performance-sensitive standard library modules, such as algorithms, random, or asyncio. The translated modules could be published on PyPI first, rather than replacing the standard library modules right away. If the test goes well, mypyc would offer “C-like performance with the convenience of Python.”
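The input to mypyc is ordinary PEP 484-annotated Python, so the same file keeps working uncompiled under CPython; a tiny illustrative example (not from the standard library):

```python
from typing import List

def mean(values: List[float]) -> float:
    # With the types fixed at compile time, a compiler like mypyc can emit
    # C-level arithmetic here instead of generic object operations.
    total: float = 0.0
    for v in values:
        total += v
    return total / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0
```

This is the sense in which mypyc differs from writing a C extension by hand: the annotated source stays readable, testable Python.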

Core developer Brett Cannon suggested an experiment using some module, such as datetime, that is already implemented in both Python and C. The Python version could be translated with mypyc and then pitted against the handwritten C version.

Matthias Bussonnier

Async REPL And async-exec



Python’s interactive shell makes it easy for beginners to learn Python, and for all Python programmers to experiment as they develop. However, async Python code is practically unusable with the shell. The await keyword must be used within a coroutine, so a programmer who wants the result of an awaitable object must define a coroutine and run an event loop method to execute it.

Matthias Bussonnier presented his work, which integrates async and await into the alternative IPython shell. IPython permits the await keyword at the top level, so a user can get the results of coroutines or other awaitables in the shell without defining a coroutine:
In [1]: from asyncio import sleep

In [2]: await sleep(1)

In [3]: from aiohttp import ClientSession

In [4]: s = ClientSession()

In [5]: response = await s.get('https://api.github.com')

IPython supports asyncio and other async frameworks such as trio. In the future, a plugin system will allow any async/await-based framework to be usable in the shell.

Bussonnier argued that some of his ideas should be adopted by core Python. If asynchronous coding were convenient in the shell, it would be useful for educators, and it would remove what he considers the misconception that async is hard. Best of all, Python would get ahead of Javascript.

However, to support async and await in the shell currently requires some unsatisfying hacks. There are subtle issues with local versus global variables, background tasks, and docstrings. Bussonnier has filed issue 34616, implement "Async exec", to make full async support in the shell possible.

Update: After the Language Summit, Bussonnier and Yury Selivanov updated the Python compiler to permit await, async for, and async with as top-level syntax in the shell when executed like python -m asyncio:
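Outside such an async-aware shell, each of those top-level awaits corresponds to a small boilerplate wrapper; a sketch of the equivalence, where `fetch` is a hypothetical stand-in for the aiohttp calls in the session above:

```python
import asyncio

async def fetch():
    await asyncio.sleep(0)   # stand-in for a real awaitable, e.g. an HTTP request
    return "done"

# At the async REPL you would simply type `await fetch()`;
# in a regular script the equivalent is:
print(asyncio.run(fetch()))  # done
```

Eliminating exactly this wrapper for interactive use is what the shell-level support buys.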


Jason Fried

Asyncio And The Case For Recursion



A typical asyncio application has a single call to run_until_complete() near the top level of the application, which runs the asyncio event loop for the entire application. All code beneath this level must assume that the loop is running.

Facebook engineer Jason Fried presented to the Language Summit a scenario in which this application structure fails. Consider an async application that contains a mix of async code and blocking calls that are tolerably fast. Deep within the call stack of one of these blocking calls, a developer sees an opportunity for concurrency, so she adds some async code and executes it with run_until_complete(). This call raises “RuntimeError: This event loop is already running.” As Fried explained, any call to run_until_complete() in a call chain under async def has this result, but due to modularization and unittest mocking in Facebook’s Python architecture, this error can first arise late in the development cycle.

How should this problem be avoided? The asyncio philosophy is to avoid mixture by converting all blocking code to asynchronous coroutines, but converting a huge codebase all at once is intractable. “It's a harder problem than moving from Python 2 to 3,” he said, “because at least I can go gradually from Python 2 to 3.”

Fried suggested a solution for incrementally converting a large application, and to allow developers to add asyncio calls anywhere “without fear.” He proposed that the asyncio event loop allow recursive calls to run_until_complete(). If the loop is already running, this call will continue running existing tasks along with the new task passed in. Library authors could freely use asyncio without caring whether their consumers also use asyncio or not. “Yeah sure it's ugly,” he conceded, “but it does allow you to slowly asyncio-ify a distinct code base.”

Thomas Wouters objected that this proposal would violate many correctness properties guaranteed by the current loop logic. Amber Brown concurred. She explained that Twisted’s loop prohibits reentrance to ensure that timeouts work correctly. One of the core tenets of asynchronous programming is that all tasks must cooperate. There is no good solution, she said, for mixing blocking and async code.

Mark Shannon

Optimising CPython, Or Not



“Every few years someone comes along with some exciting new potential for speeding up CPython,” began Mark Shannon, “and a year later everyone's forgotten about it.” Some of these optimizations are worth pursuing, however. We can identify promising optimizations with a heuristic.

First, Shannon advised the audience to think in terms of time, not speed. Do not measure the number of operations Python can execute in a period; instead, measure the amount of time it requires to finish a whole task and divide the total time into chunks. As an example, Shannon described a recent proposal on the python-dev mailing list for implementing a register-based virtual machine, which would store local variables in fixed slots, rather than on a stack as the Python VM does today. How much time could such a change save? Shannon walked the audience through his thought process, first estimating the cost of the Python interpreter’s stack manipulation and guessing how much cheaper a register-based VM would be. Shannon estimates that up to 50 percent of Python’s runtime is “interpretive overhead,” and a register-based VM might reduce that significantly, so it is worth trying. However, only an experiment can measure the actual benefit.

Shannon compared the register-based VM to another optimization, “superinstructions.” The technique is to find a common sequence of bytecodes, such as the two bytecodes to load None onto the stack and then return it, and combine them together into a new bytecode that executes the whole sequence. Superinstructions reduce interpretive overhead by spending less time in the main loop moving from one bytecode to the next. Shannon suspects this technique would beat the register-based optimization.
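The per-bytecode dispatch both techniques target is easy to see with the dis module: even a trivial function compiles to several small stack operations, each paying one trip through the interpreter's main loop (exact opcode names vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# Disassembly shows operand loads, the binary operation, and the return,
# each dispatched separately by the evaluation loop.
dis.dis(add)
```

A superinstruction would fuse adjacent opcodes in such a sequence into one, cutting the number of dispatches.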

In conclusion, Shannon advised the audience that the next time another Unladen Swallow or similar project appears, to determine first which part of the interpreter it optimizes. If the optimization targets a part of the interpreter that represents less than 90% of the total runtime, said Shannon, “it’s pretty much doomed to fail.”

Łukasz Langa

Black under github.com/python



The past year has been marked by controversy in the Python community, but consensus is forming on the most unexpected topic: code formatting. Łukasz Langa’s Black code formatter is only a year old, but it has been adopted by pytest, attrs, tox, Django, Twisted, and numerous other major Python projects. The core developers are enthusiastic about Black, too: When Langa introduced himself as its author, the room broke into applause.

Langa proposed moving Black from his personal repository to the Python organization on GitHub. He said, “My goal for this is to provide a suitable default for users who don't have any preexisting strong opinions on the matter.”

Some core developers dissented, arguing that since Black is already so successful, there is no need to move it. Gregory Smith said it is not the core team’s role to bless one code formatter over others; he regrets that opinionated tools like mypy are in the official organization and he opposes adding more. Guido van Rossum suggested moving it to the Python Code Quality Authority organization; Langa responded that beginners haven’t heard of that organization and moving Black there would have no effect.

Update: Despite some objections at the Language Summit, Black is now in the official Python organization on GitHub.

Pablo Galindo Salgado: The Night's Watch is Fixing the CIs in the Darkness for You


Python is tested on a menagerie of “buildbot” machines with different OSes and architectures, to ensure all Python users have the same experience on all platforms. As Pablo Galindo Salgado told the Language Summit, the bugs revealed by multi-platform tests are “Lovecraftian horrors”: race conditions, bugs specific to particular architectures or compiler versions, and so on. The core team had to confront these horrors with few good weapons, until now.

Read more 2019 Python Language Summit coverage.

The Solemn Duty of Bug Triage


When a test fails, the core developer who triages the failure follows an arduous process. “It's not glamorous by any means,” said Galindo, “but someone needs to do it.” Galindo, Victor Stinner, and Zachary Ware are the main bug triagers, and they all follow a similar sequence: read the failure email, search for duplicate failures, read the voluminous logs to characterize the problem, and file a bug with a detailed description. Then, optionally, they try to reproduce the problem. Since failures are often specific to one buildbot, the triagers must contact the buildbot’s owner and get permission to ssh into it and debug.

According to Galindo, typical test failures are “really, really complicated,” so the triage team takes a firm stance about reverting changes. If they suspect that a change has broken a test, its author has one day to follow up with a fix or the change will be reverted. “Nobody likes their commits to be reverted,” he told the Language Summit. But test failures can cause cascading failures later on, so the team must be ruthless.

New Tools for Squashing Bugs


A pull request is not tested by the buildbots until after it is merged, so the author does not immediately know if they have broken any tests. Galindo and his colleagues have written a bot which reacts to a test failure by commenting on the merged pull request that caused it, with reassuring instructions to help the panicked author respond. “We have some arcane magic,” he said, to distinguish compiler errors from tracebacks and neatly format them into the message, so the author can begin diagnosing immediately.


Since the bot was deployed in September, the mean time to fix a test failure has fallen dramatically. When Galindo showed this chart, the core developers broke into applause.


Nevertheless, there are still severe problems with Python’s tests. Flaky tests break about 40% of the builds; the system is programmed to retry a failure and consider it successful if the second run passes, but this is clearly just a stopgap. Galindo urged the core team to reduce flaky tests by eliminating race conditions and sleeps. He also asked for help writing a tool that would analyze which flaky tests fail most often, and a tool to detect and merge duplicate test failures.

Finally, Galindo proposed allowing contributors to test their pull requests on the buildbots before merging. This feature should be implemented cautiously. “The buildbots are very delicate,” he said; unlike Travis and other commercial test infrastructures, they cannot safely run arbitrary code. Still, it would be worth the effort if contributors could catch mistakes before they are merged.

Thursday, May 30, 2019

Use two-factor auth to improve your PyPI account's security

To increase the security of Python package downloads, we're beginning to introduce two-factor authentication (2FA) as a login security option on the Python Package Index. This work is thanks to a grant from the Open Technology Fund, coordinated by the Packaging Working Group of the Python Software Foundation.

Starting today, the canonical Python Package Index at PyPI.org and the test site at test.pypi.org offer 2FA for all users. We encourage project maintainers and owners to log in and go to their Account Settings to add a second factor. This will help improve the security of their PyPI user accounts, and thus reduce the risk of vandals, spammers, and thieves gaining account access.

PyPI's maintainers tested this new feature throughout May and fixed several resulting bug reports; regardless, you might find a new issue. If you find any potential security vulnerabilities, please follow our published security policy. (Please don't report security issues in Warehouse via GitHub, IRC, or mailing lists. Instead, please directly email one or more of our maintainers.) If you find an issue that is not a security vulnerability, please report it via GitHub.

PyPI currently supports a single 2FA method: generating a code through a Time-based One-time Password (TOTP) application. After you set up 2FA on your PyPI account, then you must provide a TOTP (along with your username and password) to log in. Therefore, to use 2FA on PyPI, you'll need to provision an application (usually a mobile phone app) in order to generate authentication codes; see our FAQ for suggestions and pointers.
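For the curious, the code your TOTP app displays is a small, standardized computation (RFC 6238 over RFC 4226's HMAC truncation); a pure-stdlib sketch, using RFC 6238's published test secret rather than any real credential:

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, interval=30, digits=6, now=None):
    """RFC 6238 TOTP: HMAC-SHA1 over the current time step."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if now is None else now) // interval)
    digest = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

# RFC 6238 test vector: base32 of "12345678901234567890" at T=59 seconds.
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", digits=8, now=59))  # 94287082
```

PyPI stores the shared secret when you provision your app, and both sides recompute this value at login time.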

You'll need to verify your primary email address on your Test PyPI and/or PyPI accounts before setting up 2FA. You can also do that in your Account Settings.

Currently, only TOTP is supported as a 2FA method. Also, 2FA only affects login via the website, which safeguards against malicious changes to project ownership, deletion of old releases, and account takeovers. Package uploads will continue to work without 2FA codes being provided.

But we're not done! We're currently working on WebAuthn-based multi-factor authentication, which will let you use, for instance, Yubikeys for your second factor. Then we'll add API keys for package upload, then an advanced audit trail of sensitive user actions. More details are in our progress reports.

Thanks to the Open Technology Fund for funding this work. And please sign up for the PyPI Announcement Mailing List for future updates.

Wednesday, May 29, 2019

2018 in review!


Happy New Year from the PSF! We’d like to highlight some of our activities from 2018 and update the community on the initiatives we are working on.

PyCon 2018


PyCon 2018 was held in Cleveland, Ohio, US. The conference brought together 3,389 attendees from 41 countries. We awarded $118,543 in financial aid to 143 attendees. In addition to financial aid, the conference continues to offer childcare for attendees, a newcomer orientation, a PyLadies lunch, and many more events.

Registration is now open for PyCon 2019: https://pycon.blogspot.com/2018/11/pycon-2019-registration-is-open.html .

Community Support


At the end of the year we launched the Python Software Foundation Meetup Pro network, which supports 37 meetups in 8 countries, with further expansion planned. The sponsorship model allows the PSF to invite existing groups into the Meetup Pro network; organizers no longer pay for their Meetup subscription once they join. This initiative will save approximately 32 hours of PSF staff time and 21 hours of meetup organizer time.

To help with transparency, the PSF launched its first newsletter in December! If you’d like to receive our next edition, subscribe here: https://www.python.org/psf/newsletter/. You can read our first edition here: https://mailchi.mp/53049c7e2d8b/python-software-foundation-q4-newsletter

This year we formalized our fiscal sponsorship program to better support mission-related projects. The PSF has signed fiscal sponsorship agreements with 8 groups: Pallets (Flask), PhillyPUG, PuPPy, PyCascades, PyHawaii, PyMNtos, PyArkansas, and the Python San Diego User Group. Through this effort, the PSF supports these projects by handling their accounting and administrative work so the projects can concentrate on furthering their goals.

Python Package Index


Thanks to a generous award from the Mozilla Open Source Support program, the rollout of the all-new Python Package Index, based on the Warehouse codebase, was completed in April 2018.

If you are interested in what the Packaging Group is currently working on, check out their RFP for security and accessibility development: http://pyfound.blogspot.com/2018/12/upcoming-pypi-improvements-for-2019.html.

Grants


The Python Ambassador program helps further the PSF's mission with the help of local Pythonistas. The goal is to perform local outreach and introduce Python to areas where it may not have a presence yet. In March 2018, the board approved expanding our Python Ambassador program to include East Africa. Kato Joshua and the Afrodjango Initiative have been doing great outreach in universities in Uganda, Rwanda, and Kenya.

Overall, $324,000 was paid in grants last year to recipients in 51 different countries. We awarded $59,804 more in grants in 2018 than in 2017, a 22.6% increase in global community support.

Here is a chart showing the global grant distribution in 2018:

PSF Staff


In June Ernest W. Durbin III was hired as Director of Infrastructure. Ernest will be evaluating and strengthening internal systems, supporting and improving community infrastructure, and developing programs that benefit the Python community worldwide.

In September, the PSF hired Jackie Augustine as Event Manager. Jackie will be working with the team on all facets of PyCon and managing several community resources for regional conferences.

It is with great pleasure that we announce that Ewa Jodlowska will be the PSF's first Executive Director, starting January 1, 2019. Given her years of dedicated service to the PSF from event manager to her current position as Director of Operations, we can think of no one more qualified to fill this role as the PSF continues to grow and develop.


Community Recognition


Throughout 2018, we presented several awards to recognize those who go above and beyond in our community. This year we gave out several Community Service Awards, a Distinguished Service Award, and a Frank Willison Memorial Award. To find out more about our awards or how to nominate someone for a Community Service Award, check out: https://www.python.org/community/awards/.

Community Service Awards

Chukwudi Nwachukwu was recognized for his contribution to spreading the growth of Python to the Nigerian community and his dedication and research to the PSF grants work group.

Mario Corchero was awarded a CSA for his leadership of the organization of PyConES, PyLondinium, and the PyCon Charlas track in 2018. His work has been instrumental in promoting the use of Python and fostering Python communities in Spain, Latin America, and the UK.

We also honored our Job Board volunteers: Jon Clements, Melanie Jutras, Rhys Yorke, Martijn Pieters, Patrice Neff, and Marc-Andre Lemburg, who have spent many hours reviewing and managing the hundreds of job postings submitted on an annual basis.

Mariatta Wijaya was an awardee for her contributions to CPython, her efforts to improve the workflow of the Python core team, and her work to increase diversity in our community. In addition, her work as co-chair of PyCascades helps spread the growth of Python.

Alex Gaynor received an award for his contributions to the Python and Django communities and the Python Software Foundation. Alex previously served as a PSF Director in 2015-2016. He currently serves as an Infrastructure Staff member, contributing to both legacy PyPI and the next-generation Warehouse; he has improved legacy PyPI's security (disabling unsupported OpenID) and cut its bandwidth costs by compressing 404 images.

2018 Distinguished Service Award

The 2018 Distinguished Service Award was presented to Marc-Andre Lemburg for his significant contributions to Python as a core developer, EuroPython chair, PSF board member, and board member of the EuroPython Society.

2018 Frank Willison Memorial Award

The Frank Willison Memorial Award for Contributions to the Python Community was awarded to Audrey Roy Greenfeld and Daniel Roy Greenfeld for their contributions to the development of Python and the global Python community through their speaking, teaching, and writing.

Donations and Sponsorships


We'd like to thank all of our donors and sponsors who continue to support our mission! Donations and fundraisers resulted in $489,152 of revenue, which represents 15% of total 2018 revenue. PSF and PyCon sponsors contributed over $1,071,000 in revenue!

We welcomed 17 new sponsors in 2018, including our first Principal Sponsors, Facebook and Capital One. Thank you for your very generous support.


We welcome your thoughts on how you’d like to see our Foundation involved in Python’s ecosystem and are always interested in hearing from you. Email us!

We wish you a very successful 2019!

Ewa Jodlowska
Executive Director

Betsy Waliszewski
Sponsor Coordinator

Tuesday, May 28, 2019

Python Core Developer Mentorship


Core developer Victor Stinner described to the Language Summit his method of mentoring potential new core developers. His former apprentices Pablo Galindo Salgado and Cheryl Sabella, who have both been promoted to core developer in the last year, recounted their experiences of the mentorship program.

Read more 2019 Python Language Summit coverage.


Barriers To Entry


Python needs more core developers now, according to Stinner, to spread the burden of maintaining the code and reviewing contributions. Regular contributors can be promoted to the core team, but this process can take up to five years and few contributors stay engaged for long enough, because contributing to the Python project is discouraging.

Contributors’ main frustration is that pull requests can languish for months or years without a review, so they give up and seek a responsive project instead. Python is caught in a Catch-22, where the core team’s understaffing makes the project unwelcoming to potential recruits, which means the team stays understaffed. But there are other hurdles for contributors: The code base is 30 years old, with some dusty corners and complex parts, and it supports a wild variety of platforms. Python’s popularity can also be a barrier; it is frightening to modify code used by millions of people.

The Fast Path To The Core Team


Stinner described how core developers can overcome the Catch-22 by personally mentoring prospective teammates, as he does. He identifies promising coders who contribute frequently, and contacts them to offer mentorship.

Stinner said that a mentor must follow the apprentice’s progress closely over a period of many months, at least. Not all worthwhile effort results in a Git commit: an apprentice must spend time learning the workflow, the codebase, and so on. With close attention, a mentor will know that the apprentice is making progress even during quiet periods. Once an apprentice submits a pull request, the mentor’s job is to provide a prompt, thorough review, or recruit the appropriate expert to do so.

Stinner admitted that it is difficult to prioritize among the many items on his to-do list, so he dedicates time on his schedule for mentoring to ensure he is available for his apprentice. It is particularly important to ask regularly, "What are you doing? Are you stuck? Do you need some help?"

The main goal of the mentorship is to keep the apprentice motivated to contribute to Python. Compared to the usual contributor’s experience of submitting a patch and getting no response for months, an apprentice with a committed mentor will have reliable feedback and encouragement. If the mentor and apprentice stay engaged for long enough, the apprentice can earn the mentor’s trust and be nominated for promotion to the core team.

In Stinner’s view, mentorship must happen in private, so the apprentice can be comfortable asking “dumb” questions. The mentorship should also be secret, at least at the beginning. Core developer Ned Deily commented that it would be helpful to know who is being mentored so he could prioritize reviewing their patches and answering their questions. But Stinner said he does not announce when he begins mentoring someone, to avoid pressure. “It can be very scary to see many people looking at your work.”

Mentors should provide a series of rewards for apprentices to earn. Stinner said that he initially considered gamifying the mentorship process with badges, but rejected this idea. Instead, contributors are rewarded with ever greater responsibilities. Bug triage is a good first responsibility, since the cost of mistakes is trivial: a bug closed in error can be reopened, a mislabelled bug can be labelled correctly. In Stinner’s experience, apprentices are eager to gain more responsibility and they take each new task seriously. “They understand what it means and they do their best not to make mistakes.”

Stinner invited two recently promoted core developers to describe their experience as apprentices.

Pablo Galindo Salgado



Pablo Galindo Salgado was promoted to core developer in June 2018. He told the Language Summit that one of a mentor’s most important roles is as a source of tribal knowledge. Many tasks as a core developer require knowledge of undocumented behaviors, or the historical context for a piece of code, or who is the current expert about a certain aspect of Python. Apprentices have an advantage over other contributors because they have someone to ask these questions.

According to Galindo, there must be a moment in the mentorship where the core developer encourages the apprentice to embrace failure. “I committed some mistakes in the beginning,” he said. “When you don't have context, you think you broke the world.” Victor Stinner and Yury Selivanov explained gently that everyone is human, and shared stories about their own past mistakes.

Cheryl Sabella


Cheryl Sabella became a core developer in February 2019. When she began working on CPython two years earlier, it was the first open source project she had contributed to. “So I knew nothing,” she told the Language Summit. Fortunately, she said, the community supported her as she learned git, the Python development workflow, and the codebase itself. Her first pull request was a documentation change. When Mariatta Wijaya approved it and commented with the “Ta-da” emoji, said Sabella, “I was over the moon.”

Sabella contributed for some time, especially to IDLE, and one day Stinner wrote to say that he’d granted her bug triage authority. As she recounted to the Language Summit, this new power made her nervous; she would never have asked for it. The next year, when Stinner invited her to become a core developer, she hesitated for so long that Stinner eventually told her, "Okay, I'm not going to bother you anymore about this." Then he invited her again in January 2019, saying, "Well, I told you I wasn't going to but I'm bothering you again."

Sabella said she had not begun the mentorship program with the intent of becoming a core developer, she only wanted to contribute. It was the core team’s regular guidance and cheerleading that motivated her to join.

Victor Stinner


Victor Stinner returned to the podium to share his insights as a mentor. He said mentors should choose apprentices who represent not only diverse nationalities and genders, but also diverse skill levels. The core team spends much of their time reviewing contributors’ pull requests, and they need a variety of skills and personalities to review them all: some patches are documentation, some are in Python or C, some require specialized knowledge, some are just very tedious.

Stinner said that mentors should accept a range of outcomes. “It's not a failure if, at the end of some mentoring, the mentoree doesn't become a core developer.” Mentorships are often interrupted by professional duties or events in either participant’s life, or the mentor and apprentice turn out to be a poor match. There is value from the relationship nevertheless. The apprentice becomes a better Python contributor and a better programmer. The mentor learns, by observing the apprentice’s difficulties, about barriers to entry on the Python project, such as gaps in the documentation or tooling.

Mentoring is a small burden, Stinner told the Language Summit. Apprentices are typically available only one day a week, because Python competes with a regular job or university program, so they can consume only a few hours of the mentor’s time. The mentorship program is efficient and effective: in the previous twelve months, five apprentices were promoted to core developer. He concluded, “I think everybody in this room can do more mentoring.”

Monday, May 27, 2019

Mariatta Wijaya: Let's Use GitHub Issues Already!


Core developer Mariatta Wijaya addressed her colleagues, urging them to switch to GitHub issues without delay. Before she began, Łukasz Langa commented that the previous two sessions had failed to start any controversies. “Come on, we can do better!”

Wijaya replied, “Hold my tequila.”

Read more 2019 Python Language Summit coverage.

Python’s Issue Tracker Is Stagnating


The current Python bug tracker is hosted at bugs.python.org (“BPO”) and uses a bug tracker called Roundup. Roundup’s development is stagnant, and it lacks many features that the Python project could use. In theory, the Python community could improve Roundup, but there are barriers: Roundup is versioned in Mercurial and it has no continuous integration testing. “If the community cared about improving bugs.python.org,” Wijaya asked, “why we haven't been doing it all this time? Seems like you're interested in doing something else.”

Compared to Roundup, GitHub issues have a number of superior features. Project administrators can easily edit issues there or report abuse. GitHub issues permit replying by email, and GitHub supports two-factor authentication. The GitHub APIs allow the core team to write bots that take over many Python development chores. Already, bots backport patches and enforce the Contributor License Agreement (“CLA”); bots could become even more powerful once issues are moved to GitHub.
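To sketch how such automation works, here is a hedged example of the kind of call a triage bot might make through GitHub's REST API to label an issue. The helper name and token handling are hypothetical; the endpoint (POST /repos/{owner}/{repo}/issues/{number}/labels) is GitHub's documented one.

```python
import json
import urllib.request

API = "https://api.github.com"

def label_issue_request(owner, repo, number, labels, token):
    """Build the authenticated request a bot would send to label an issue."""
    url = f"{API}/repos/{owner}/{repo}/issues/{number}/labels"
    data = json.dumps({"labels": labels}).encode()
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github.v3+json",
        },
    )

# A bot would then send the request, e.g.:
# urllib.request.urlopen(
#     label_issue_request("python", "cpython", 1, ["type-bug"], token))
```

The same API surface supports commenting, closing, and assigning issues, which is what makes workflow bots like the backport and CLA helpers possible.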

GitHub Issues: A Yearlong Debate


Shortly after last year’s summit, Wijaya proposed in PEP 581 that Python migrate to GitHub issues. She acknowledged that it was wrenching to give up on BPO and Roundup after so many years. However, in her opinion, it is time to move to a different issue tracker, and GitHub is the natural choice. The core developers are all familiar with it, as are most potential contributors.

The plan for moving to GitHub issues is split into two PEPs: The rationale is explained in PEP 581, and the migration plan is in PEP 588. The first steps are to back up all GitHub data already associated with the repository, and to set up a CLA assistant for issues—research for both tasks is in progress. Additionally, the Python organization on GitHub needs a bug triage team for people with permission to label or close tickets.

Of course, the main job is to copy thousands of issues from Roundup to GitHub with maximum fidelity, which requires knowledge of the Roundup codebase. Wijaya asked for help from someone who could write the migration code or teach her how to do it. Either way, it is likely to be the core team’s final encounter with Roundup’s code.

“Now,” said Wijaya, “Let's just use Github already! Why aren't we doing this yet?” She asked the audience what anxieties GitHub issues provoked, or what questions were still unanswered in her PEP. The sooner the migration is complete, she believes, the better for the core developers and the entire Python community.

Discussion


Ned Deily suggested revising the Python Development Guide early to describe the GitHub issues workflow before migration begins. This would prevent a period of confusion among core developers after the migration. Besides, the process of updating the Guide might flush out more details that the PEPs need to specify.

Thomas Wouters made a proposal, which he feared was controversial: Don’t migrate the old bugs. Wijaya and audience members responded with several versions of this idea. BPO could be made read-only, with the addition of a “Migrate to GitHub” button on bugs that anyone could press if they cared about an old bug. Or BPO could stay read-write for a while; active bugs would be automatically migrated until a sunset date. Some issues have useful patches or comments which should not be lost, so either BPO must be kept online with links from GitHub issues to their BPO ancestors, or else each BPO issue’s entire history must be copied to GitHub.

Guido van Rossum concluded that there were many decisions yet to be made before the migration could begin. “I'm not trying to say let's spend another year thinking about this,” he said. “I want this as badly as you want it.” However, the team must consider the consequences more carefully before they act.

Steve Dower spoke up to say that he would prefer to stay on BPO. The current tracker’s “experts index” is particularly useful: it automatically notifies the Windows team, for example, when a relevant bug is filed, and there is no equivalent GitHub feature. He rebelled at being told in effect, “Here is the change, why haven't we done it already?” He felt the default decision on any PEP ought to be maintaining the status quo.

Barry Warsaw said, “Let's remember we have friends at GitHub that will help us with the process.” If the core team finds missing features in GitHub issues, perhaps GitHub will implement them.

Carol Willing argued, “There comes a point in time when we have to put a stake in the ground. Nobody's saying Github is perfect, but you need to ask, are we holding back other contributions by staying on BPO?” Many scientific Python projects such as NumPy already track their issues on GitHub. If Python migrates to GitHub issues it could interact better with them, as well as with future projects that take Python in new directions. “By staying locked in bugs.python.org, we're doing ourselves a disservice.”

Postscript


Two weeks after the Summit, PEP 581 was officially approved, making the migration to GitHub inevitable.

Tuesday, May 21, 2019

Petr Viktorin: Extension Modules And Subinterpreters

When a Python subinterpreter loads an extension module written in C, it tends to unwittingly share state with other subinterpreters that have loaded the same module, unless that module is written very carefully. Petr Viktorin addressed the Python Language Summit to describe the problem in detail and propose a cleaner isolation of subinterpreters.

Read more 2019 Python Language Summit coverage.

Python-Based Libraries Use Subinterpreters For Isolation


Python can run several interpreter instances in a single process, keeping each subinterpreter relatively isolated from the others. There are two ways this feature could be used in the future, but both require improvements to Python. First, Python could achieve parallelism by giving each subinterpreter its own Global Interpreter Lock (GIL) and passing messages between them; Eric Snow has proposed this use of subinterpreters in PEP 554.

Another scenario is when libraries happen to use Python as part of their implementation. Viktorin described, for example, a simulation library that uses Python and NumPy internally, or a chat library that uses Python and asyncio. It should be possible for one application to load multiple libraries such as this, each of which uses a Python interpreter, without cross-contamination. This use case was the subject of Viktorin’s presentation. The problem, he said, is that “CPython is not ready for this,” because it does not properly manage global state.

There Are Many Kinds Of Global State


Viktorin described a hierarchy, or perhaps a tree, of kinds of global state in an interpreter.

Process state: For example, open file descriptors.

Runtime state: The Python memory allocator’s data structures, and the GIL (until PEP 554).

Interpreter state: The contents of the "builtins" module and the dict of all imported modules.

Thread state: Thread locals like asyncio’s current event loop; fortunately this is per-interpreter.

Context state: Implicit state such as decimal.context.

Module state: Python variables declared at file scope or with the “global” keyword, which in fact creates module-local state.
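The last point can be demonstrated in pure Python: “global” binds a name in the enclosing module’s namespace, so two modules running identical code keep independent state. A small sketch (the module names m1 and m2 are made up for the demonstration):

```python
import types

source = """
count = 0

def bump():
    global count  # binds in this module's namespace, not process-wide
    count += 1
"""

# Execute the same source in two separate module namespaces.
m1 = types.ModuleType("m1")
m2 = types.ModuleType("m2")
exec(source, m1.__dict__)
exec(source, m2.__dict__)

m1.bump()
print(m1.count, m2.count)  # 1 0 -- each module has its own state
```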


Module State Behaves Surprisingly


With a series of examples, Viktorin demonstrated the subtle behavior of module-level state.

To begin with a non-surprising example, a pure-Python module’s state is recreated by re-importing it:

import sys

import enum
old_enum = enum
del sys.modules['enum']
import enum
old_enum == enum  # False

But surprisingly, a C extension module only appears to be recreated when it is re-imported:

import sys

import _sqlite3
old_sqlite3 = _sqlite3
del sys.modules['_sqlite3']
import _sqlite3
old_sqlite3 == _sqlite3  # False

The last line seems to show that the two modules are distinct, but as Viktorin said, “This is a lie.” The module’s initialization is not re-run, and the contents of the two modules are shared:

old_sqlite3.Error is _sqlite3.Error # True

It is far too easy to contaminate other subinterpreters with these shared contents; in effect, a C extension’s module state is process-global state.

Modules Must Be Rewritten Thoughtfully


C extensions written in the new style avoid this problem with subinterpreters. Not all C extensions in the standard library are updated yet; Christian Heimes commented that the ssl module must be ported to the new style of initialization. Although it is simple to find modules that must be ported, the actual porting requires thought. Coders must meticulously distinguish among different kinds of global state. C static variables are process globals, PyState_FindModule returns an interpreter-global reference to a module, and PyModule_GetState returns module-local state. Each nugget of module data must be deliberately placed at one of the levels in the hierarchy.

As an example of how tricky this is, Viktorin pointed out a bug in the csv module. If it is imported twice, exception-handling breaks:

import sys

import _csv
old_csv = _csv
del sys.modules['_csv']
import _csv
try:
    # Invalid input: reader() expects strings, not the integer 1.
    list(old_csv.reader([1]))
except old_csv.Error:
    # This clause should catch the error, but doesn't.
    pass

The old_csv.reader function ought to raise an instance of old_csv.Error, which would match the except clause. In fact, the csv module has a bug. When it is re-imported it overwrites interpreter-level state, including the _csv.Error type, instead of keeping its state at the module-local level.

Audience members agreed this was a bug, but Viktorin insisted that this particular bug is merely a symptom of a larger problem: it is too hard to write properly isolated extension modules. Viktorin and three coauthors have proposed PEP 573 to ease this problem, with special attention to exception types.

Viktorin advised all module authors to keep state at the module level. He recognized that this is not always possible: for example, the Python standard library’s readline module wraps the C readline library, which has global hooks. These are necessarily process-global state. He asked the audience, how should this scenario be handled? Should readline error if it is imported in more than one subinterpreter? He said, “There’s some thinking to do.” In any case, CPython needs a good default.

The correct way to code a C extension is to use module-local state, and that should be the most obvious place to store state from C. It seems to Viktorin that the newest style APIs do emphasize module-local state as he desires, but they are not yet well-known.

Further reading:

PEP 384 (3.2): Defining a Stable ABI

PEP 489 (3.5): Multi-phase extension module initialization

PEP 554 (3.9): Multiple Interpreters in the Stdlib

PEP 573 (3.9): Module State Access from C Extension Methods

Not a PEP yet: CPython C API Design Guidelines (layers & rings)

Saturday, May 18, 2019

Scott Shawcroft: History of CircuitPython



Scott Shawcroft is a freelance software engineer working full time for Adafruit, an open source hardware company that manufactures electronics that are easy to assemble and program. Shawcroft leads development of CircuitPython, a Python interpreter for small devices.

The presentation began with a demo of Adafruit’s Circuit Playground Express, a two-inch-wide circular board with a microcontroller, ten RGB lights, a USB port, and other components. Shawcroft connected the board to his laptop with a USB cable and it appeared as a regular USB drive with a source file called code.py. He edited the source file on his laptop to dim the brightness of the board’s lights. When he saved the file, the board automatically reloaded the code and the lights dimmed. “So that's super quick,” said Shawcroft. “I just did the demo in three minutes.”

Read more 2019 Python Language Summit coverage.

CircuitPython Is Optimized For Learning Electronics

The history of CircuitPython begins with MicroPython, a Python interpreter written from scratch for embedded systems by Damien George starting in 2013. Three years later, Adafruit hired Shawcroft to port MicroPython to the SAMD21 chip they use on many of their boards. Shawcroft’s top priority was serial and USB support for Adafruit’s boards, and then to implement communication with a variety of sensors. “The more hardware you can support externally,” he said, “the more projects people can build.”

As Shawcroft worked with MicroPython’s hardware APIs, he found them ill-fitting for Adafruit’s goals. MicroPython customizes its hardware APIs for each chip family to provide speed and flexibility for hardware experts. Adafruit’s audience, however, is first-time coders. Shawcroft said, “Our goal is to focus on the first five minutes someone has ever coded.”

To build a Python for Adafruit’s needs, Shawcroft forked MicroPython and created a new project, CircuitPython. In his Language Summit talk, he emphasized it is a “friendly fork”: both projects are MIT-licensed and share improvements in both directions. In contrast to MicroPython’s hardware APIs that vary by chip, CircuitPython has one hardware API, allowing Adafruit to write one set of libraries for them all.

MicroPython has a distinct standard library that differs from CPython’s: for example, its time functions are in a module named utime with a different feature set from the standard time module. It also ships modules with features not found in CPython’s standard library, such as advanced filesystem management features. In CircuitPython, Shawcroft removed the nonstandard features and modules. This change helps new coders ramp smoothly from CircuitPython on a microcontroller to CPython on a full-size computer, and it makes Adafruit’s libraries reusable on CPython itself.

Another motive for forking was to create a separate community for CircuitPython. In the original MicroPython project’s community, Shawcroft said, “There are great folks, and there's some not-so-great folks.” The CircuitPython community welcomes beginners, publishes documentation suitable for them, and maintains standards of conduct that are safe for minors.

Audience members were curious about CircuitPython’s support for Python 3.8 and beyond. When Damien George began MicroPython he targeted Python 3.4 compliance, which CircuitPython inherits. Shawcroft said that MicroPython has added some newer Python features, and decisions about more language features rest with Damien George.

Minimal Barrier To Entry



Photo courtesy of Adafruit.

Shawcroft aims to remove all roadblocks for beginners to be productive with CircuitPython. As he demonstrated, CircuitPython auto-reloads and runs code when the user saves it; there are two more user experience improvements in the latest release. First, serial output is shown on a connected display, so a program like print("hello world") will have visible output even before the coder learns how to control LEDs or other observable effects.

Second, error messages are now translated into nine languages, and Shawcroft encourages anyone with language skills to contribute more. Guido van Rossum and A. Jesse Jiryu Davis were excited to see these translations and suggested contributing them to CPython. Shawcroft noted that the existing translations are MIT-licensed and can be ported; however, the translations do not cover all the messages yet, and CircuitPython cannot show messages in non-Latin characters such as Chinese. Chinese fonts are several megabytes of characters, so the size alone presents an unsolved problem.

Later this year, Shawcroft will add Bluetooth support for coders to connect their phone or tablet to an Adafruit board and enjoy the same quick edit-refresh cycle there. Touchscreens will require a different sort of code editor, perhaps more like EduBlocks. Despite the challenges, Shawcroft echoed Russell Keith-Magee’s insistence on the value of mobile platforms: “My nieces, they have tablets and phones. They do not have laptops.”

Shawcroft’s sole request for the core developers was to keep new language features simple, with few special cases. First, because each new CPython feature must be reimplemented in MicroPython and CircuitPython, and special cases make this work thorny. Second, because complex logic translates into large code size, and the space for code on microcontrollers is minuscule.


Amber Brown: Batteries Included, But They're Leaking



Amber Brown of the Twisted project shared her criticisms of the Python standard library. This proved to be the day’s most controversial talk; Guido van Rossum stormed from the room during Q & A.

Read more 2019 Python Language Summit coverage.

Applications Need More Than The Standard Library

Python claims to ship with batteries included, but according to Brown, without external packages it is only “marginally useful.” For example, asyncio requires external libraries to connect to a database or to speak HTTP. Brown asserted that there were many such dependencies from the standard library to PyPI: typing works best with mypy, the ssl module requires a monkeypatch to connect to non-ASCII domain names, datetime needs pytz, and six is non-optional for writing code for Python 2 and 3.

Other standard library modules are simply inferior to alternatives on PyPI. The http.client documentation advises readers to use Requests, and the datetime module is confusing compared to its competitors such as arrow, dateutil, and moment.

Poor Quality, Lagging Features, And Obsolete Code


“Python's batteries are leaking,” said Brown. She thinks that some bugs in the standard library will never be fixed. And even when bugs are fixed, PyPI libraries like Twisted cannot assume they run on the latest Python, so they must preserve their bug workarounds forever.

There are many modules that few applications use, but there is no mechanism for installing a subset of the standard library. Brown called out the XML parser and tkinter in particular for making the standard library larger and harder to build, burdening all programmers for the sake of a few. As Russell Keith-Magee had described earlier in the day, the size of the standard library makes it difficult for PyBee to run Python on constrained devices. Brown also noted that some standard library modules were optimized in C for Python 3, but had to be reimplemented in pure Python for PyPy to support them.

Brown identified new standard library features that were “too little, too late,” leaving users to depend on backports to use those features in Python 2. For example, socket.sendmsg was added only recently, meaning Twisted must ship its own C extension to use sendmsg in Python 2. Although Python 2 is nearly at its end of life, this only holds for the core developers, according to Brown, and for users, Red Hat and other distributors will keep Python 2 alive “until the goddam end of time.” Brown also mentioned that some itertools code is shown as examples in the documentation instead of shipped as functions in the itertools module.
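One such helper is grouper, which the itertools documentation's "recipes" section presents as copy-paste example code rather than shipping as a function in the module:

```python
from itertools import zip_longest

# The "grouper" recipe from the itertools documentation: it collects
# data into fixed-length chunks, padding the last chunk if needed, but
# users must copy it into their own code rather than import it.
def grouper(iterable, n, fillvalue=None):
    """grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper("ABCDEFG", 3, "x")))
```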

Guido van Rossum, sitting at the back of the room, interrupted at this moment, “Can you keep to one topic? I'm sorry but this is just one long winding rant. What is your point?” Brown responded that her point was that there are a multitude of problems in the standard library.

Standard Library Modules Crowd Out Innovation


Brown’s most controversial opinion, in her own estimation, is that adding modules to the standard library stifles innovation, by discouraging programmers from using or contributing to competing PyPI packages. Ever since asyncio was announced she has had to explain why Twisted is still worthwhile, and now that data classes are in the standard library Hynek Schlawack must defend his attrs package. Even as standard library modules crowd out other projects, they lag behind them. According to Brown, “the standard library is where code sometimes goes to die,” because it is difficult and slow to contribute code there. She acknowledged recent improvements, from Mariatta Wijaya’s efforts in particular, but Python is still harder to contribute to than PyPI packages.

“So I know a lot of this is essentially a rant,” she concluded, “but it's fully intended to be.”

Discussion


Nick Coghlan interpreted Brown’s proposal as generalizing the “ensurepip” model to ensure some packages are always available but can be upgraded separately from the standard library, and he thought this was reasonable.

Van Rossum was less convinced. He asked again, “Amber, what is your point?” Brown said her point was to move asyncio to PyPI, along with most new feature development. “We should embrace PyPI,” she exhorted. Some ecosystems such as JavaScript rely too much on packages, she conceded, but there are others like Rust that have small standard libraries and high-quality package repositories. She thinks that Python should move farther in that direction.

Van Rossum argued instead that if the Twisted team wants the ecosystem to evolve, they should stop supporting older Python versions and force users to upgrade. Brown acknowledged this point, but said half of Twisted users are still on Python 2 and it is difficult to abandon them. The debate at this point became personal for Van Rossum, and he left angrily.

Nathaniel Smith commented, “I'm noticing some tension here.” He guessed that Brown and the core team were talking past each other because the core team had different concerns from other Python programmers. Brown went further adding that because few Python core developers are also major library maintainers, library authors’ complaints are devalued or ignored.

The remaining core developers continued the technical discussion. Barry Warsaw said that the core team had discussed deprecating modules in the standard library, or creating slim distributions with a subset of it, but that it required a careful design. Others objected that slimming down the standard library risked breaking downstream code, or making work for programmers in enterprises that trust the standard library but not PyPI.

Pablo Galindo Salgado was concerned that moving modules from the standard library to PyPI would create an explosion of configurations to test, but in Brown’s opinion, “We are already living that life.” Some Linux and Python distributions have selectively backported features and fixes, leading to a much more complex set of configurations than the core team realizes.

Wednesday, May 15, 2019

Paul Ganssle: Time Zones In The Standard Library

Python boasts that it comes with “batteries included,” but programmers have long been frustrated at one set of missing batteries: the standard library does not include any time zone definitions. The datetime module supports the idea of time zones, but a programmer who wants to know when Daylight Saving Time starts in Cleveland must install a third-party package. Paul Ganssle spoke to the Python Language Summit to offer a solution. Ganssle maintains the PyPI package dateutil, and contributes to the standard library datetime module. He described the state of Python time zone support and how time zone definitions could be added to the standard library.

Read more 2019 Python Language Summit coverage.

Python Comes With Limited Time Zone Support


A time zone is a function that maps a naïve local time to an unambiguous Coordinated Universal Time (UTC). Individual time zones can be quite eccentric, so Python does not attempt to define time zone logic; it simply provides an abstract base class, tzinfo, for implementors to subclass. Although there could theoretically be unlimited kinds of time zones, most programmers encounter three concrete types:

1. UTC or a fixed offset from it.

2. Local time.

3. A time zone from the IANA database.

The first of these was added to the standard library in Python 3.2. Ganssle said, “Whenever I teach people about datetimes, it's really nice to be able to say, if you're using Python 3, you can just have a UTC object.” The purpose of Ganssle’s proposal was to add the second and third.
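That first battery, a quick illustration using the stdlib's concrete timezone class, looks like this:

```python
from datetime import datetime, timedelta, timezone

# Since Python 3.2 the standard library ships concrete fixed-offset
# zones: the timezone.utc singleton and arbitrary fixed offsets.
utc_noon = datetime(2019, 6, 1, 12, tzinfo=timezone.utc)
est = timezone(timedelta(hours=-5), "EST")  # a fixed -05:00 offset

print(utc_noon.astimezone(est))  # 2019-06-01 07:00:00-05:00
```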

Ambiguous Times


Ganssle explained that when Eastern Daylight Time ends, clocks are set back from 2:00am to 1:00am, thus there are two UTC times that map to 1:30am local time on that day:

>>> from datetime import datetime, timedelta
>>> from dateutil import tz
>>> NYC = tz.gettz("America/New_York")
>>> dt0 = datetime(2004, 10, 31, 5, 30, tzinfo=tz.UTC)
>>> print(dt0.astimezone(NYC))
2004-10-31 01:30:00-04:00
>>> print((dt0 + timedelta(hours=1)).astimezone(NYC))
2004-10-31 01:30:00-05:00

PEP 495 solved the problem of ambiguous times by adding the “fold” attribute to datetime objects. A datetime with fold=0 is the first occurrence of that local time; the second occurrence has fold=1. With this addition, standard Python provides all the prerequisites for proper time zones, so Ganssle argued they should now be added to the standard library.
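To see how fold disambiguates, here is a minimal, hypothetical tzinfo subclass (not the standard library's implementation) that honors fold for that single 2004 transition, when Eastern clocks fell back from 02:00 EDT to 01:00 EST:

```python
from datetime import datetime, timedelta, tzinfo

class Eastern2004(tzinfo):
    """Toy zone modeling only the US/Eastern fall-back on 2004-10-31."""
    EDT = timedelta(hours=-4)
    EST = timedelta(hours=-5)
    # Wall times from 01:00 to 02:00 occur twice on this date.
    FOLD_START = datetime(2004, 10, 31, 1, 0)
    FOLD_END = datetime(2004, 10, 31, 2, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if self.FOLD_START <= wall < self.FOLD_END:
            # Ambiguous: fold=0 is the first (EDT) occurrence,
            # fold=1 the second (EST) occurrence.
            return self.EST if dt.fold else self.EDT
        return self.EDT if wall < self.FOLD_START else self.EST

    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == self.EDT else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == self.EDT else "EST"

first = datetime(2004, 10, 31, 1, 30, tzinfo=Eastern2004(), fold=0)
second = first.replace(fold=1)
# The same wall time, two different UTC offsets:
print(first.utcoffset(), second.utcoffset())
```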

How To Maintain The Time Zone Definitions?


IANA time zones are the de facto standard for time zone data, and they ship with many operating systems. Both Ganssle’s dateutil and the competing pytz package use the IANA database as their source of truth. Therefore it would be natural to include the IANA time zones in the Python standard library, but this presents a problem: the IANA database changes every time a government changes a time zone, which occurs as often as 20 times a year. Time zone changes are far more frequent than Python releases.

Ganssle offered two solutions for updating time zone data, and then offered a compromise between them as his actual proposal. The first solution is to rely on the operating system’s time zone database. Python could rely on the system update mechanism to refresh this data, and it would use the same time zone definitions as most other applications. System time zone data is not officially supported on Windows, however, and is not always installed on Linux.

The second solution is to publish IANA time zone definitions as a PyPI package. It could be updated frequently, but the core team would have to invent some way to notify users when it is time to update their time zone data. Plus, it would be risky for Python to use different time zones than the rest of the system.

Ganssle proposed a hybrid: the Python standard library should use the system’s time zone data if possible, and otherwise fall back to a PyPI package that could be installed conveniently, analogous to how “ensurepip” installs pip today.
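A hypothetical sketch of that lookup order follows; the search directories and the notion of a fallback package are illustrative assumptions, not Ganssle's actual design:

```python
import os

# Common locations of the system IANA ("Olson") database on Unix.
SYSTEM_TZ_DIRS = ["/usr/share/zoneinfo", "/usr/lib/zoneinfo"]

def find_zone_file(key, search_dirs=SYSTEM_TZ_DIRS):
    """Return the path of the TZif file for an IANA key such as
    "America/New_York", or None if no system database is found --
    the point at which a bundled PyPI package would take over."""
    for base in search_dirs:
        path = os.path.join(base, key)
        if os.path.isfile(path):
            return path
    return None  # caller falls back to the PyPI-distributed data here
```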

The Local Time Zone


Naïve times in Python are sometimes treated as times in the local time zone, sometimes not. Ganssle showed an example demonstrating that if a programmer converts a naïve time to UTC, Python assumes its original time zone is local:

>>> from datetime import datetime, timezone
>>> dt = datetime(2020, 1, 1, 12)
>>> dt.astimezone(timezone.utc)
2020-01-01 17:00:00+00:00

However, mixing naïve and aware times in arithmetic is prohibited; subtracting a UTC time from a naïve time raises an error:

>>> datetime(2020, 1, 1, 12) - datetime(2020, 1, 1, tzinfo=timezone.utc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't subtract offset-naive and offset-aware datetimes

Ganssle’s dateutil package offers a more thorough implementation of “local time zone”, and he thinks Python programmers would appreciate local times in the standard library. To add them, however, the core team must first handle the astonishing behavior of local times when the system time zone changes. The first surprise is that changing the system time zone has no effect until the Python program calls time.tzset(). (And on Windows, time.tzset() is not available.) The second surprise is that changing the system time zone and then calling time.tzset() changes the UTC offset of existing times created before the change.

Ganssle proposed several ways the standard library could act in this scenario. It could ignore changes to the system time zone while a Python program is running, or it could detect time zone changes but avoid mutating the offsets of existing time objects. He had no opinion about the best outcome.

Conclusion


Ned Deily wondered what problem Ganssle’s proposal would solve that pytz does not. Ganssle responded that pytz’s author has stopped maintaining the package because he believes time zones should move to the standard library: full time zone support is a basic feature that should always be available. In Ganssle’s view, however, his own dateutil is a better package to emulate than pytz. “I would take dateutil, clean up some of the rough edges, and propose it as some of the batteries that would be included.”

Łukasz Langa said that he planned, as Python 3.8’s release manager, to issue monthly patch releases, and he thought that should be frequent enough to keep users’ time zone data updated. Russell Keith-Magee said no, North Korea once announced a time zone change with three days’ notice. Other audience members thought this scenario was obscure, and the PEP should not be required to handle such emergencies.

At the end of his talk Ganssle summarized his proposal. He believes that the standard library should support IANA time zones, using the operating system as the source of time zone data or falling back to a PyPI package. There are several options for handling local time zone changes at runtime. The design should be formalized in at least an informational PEP, “if not one where it's contentious and we all hate each other at the end of it.”