Tuesday, June 18, 2019

PyPI Now Supports Two-Factor Login via WebAuthn

To further increase the security of Python package downloads, we're adding a new beta feature to the Python Package Index: WebAuthn support for U2F compatible hardware security keys as a two-factor authentication (2FA) login security method. This is thanks to a grant from the Open Technology Fund, coordinated by the Packaging Working Group of the Python Software Foundation.

Last month, we added the first 2FA method for users logging into the canonical Python Package Index at PyPI.org and the test site at test.pypi.org. Hundreds of project owners and maintainers have now started using that method (generating a code through a Time-based One-time Password (TOTP) application) to better secure their accounts.
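
For readers curious what a TOTP application actually computes, here is a minimal sketch of the underlying HOTP function from RFC 4226 (TOTP, RFC 6238, simply feeds it a time-based counter). This is illustrative only, not PyPI's implementation:

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # HMAC-SHA1 over the big-endian 8-byte counter (RFC 4226)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 4226 test vector: ASCII secret "12345678901234567890", counter 0
print(hotp(b"12345678901234567890", 0))  # → 755224
```

A TOTP app calls this with counter = int(time.time()) // 30, which is why the displayed code changes every 30 seconds.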

Starting today, PyPI also supports (in beta) WebAuthn (U2F compatible) security keys for a second login factor. A security key (also known as a universal second factor, or U2F compatible key) is a hardware device that communicates via USB, NFC, or Bluetooth. Popular keys include the YubiKey, Google Titan, and Thetis. PyPI supports any FIDO U2F compatible key and follows the WebAuthn standard. Users who have set up this second factor will be prompted to use their key (usually by inserting it into a USB port and pressing a button) when logging in. (This feature requires JavaScript.)

This is a beta feature and we expect that users will find minor issues over the next few weeks; we ask for your bug reports. If you find any potential security vulnerabilities, please follow our published security policy. (Please don't report security issues in Warehouse via GitHub, IRC, or mailing lists. Instead, please directly email one or more of our maintainers.) If you find an issue that is not a security vulnerability, please report it via GitHub.

If you're a project maintainer or owner, we encourage you to log in and go to your Account Settings to add a second factor. This will help improve the security of your PyPI user account, and thus reduce the risk of vandals, spammers, and thieves gaining account access. If you're not yet comfortable using a beta feature, you can provision a TOTP application for your second factor.

You'll need to verify your primary email address on your Test PyPI and/or PyPI accounts before setting up 2FA. You can also do that in your Account Settings.

2FA only affects login via the website, which safeguards against malicious changes to project ownership, deletion of old releases, and account takeovers. Package uploads will continue to work without users providing 2FA codes.

But that's just for now. We are working on implementing per-user API keys as an alternative form of multifactor authentication in the setuptools/twine/PyPI auth flows. These will be application-specific tokens scoped to individual users/projects, so that users will be able to use token-based logins to better secure uploads. And we'll move on to working on an advanced audit trail of sensitive user actions, plus improvements to accessibility and localization for PyPI. More details are in our progress reports.

Thanks to the Open Technology Fund for funding this work. And please sign up for the PyPI Announcement Mailing List for future updates.

Wednesday, June 12, 2019

2019 Board of Directors Election - Voting is Open

Voting is currently open for the 2019 Python Software Foundation Board of Directors Election. We have a great list of candidates this year so if you received a ballot, please vote.

Who should have received a ballot?

If you became a PSF Supporting Member*, Contributing Member, Managing Member and/or Fellow by May 31, 2019 you are eligible to vote. You should have received a ballot from Helios with details on how to cast your vote. If you cannot find the email, please search your inbox and also check your spam for the word "helios".

Once you login to Helios, be sure to follow the process until you see "Congratulations, your vote has been successfully cast!".

* Must be a current membership and not expired as of May 31, 2019

When do I need to vote by?

Voting opened June 7th and will close by the end of June 16 AoE.

How do I become a voting member?

If you're currently not a voting member but wish to be a voting member for future elections (2020 and on), here are some options for you to consider:

  • Contribute to the PSF $99 yearly by becoming a Supporting Member. You can sign up via http://psfmember.org.
  • If you dedicate at least five hours per month working to support the Python ecosystem you can become a Managing Member. If you dedicate at least five hours per month working on Python-related projects that advance the mission of the PSF you can become a Contributing Member. You can self-certify via https://forms.gle/vbJvweHW8rimAjYd6. You must be a basic member before you apply to be a Contributing/Managing member.
  • If you know of someone that has gone above and beyond in their contributions to the Python community, consider nominating them for the PSF Fellow membership level. Details are available here: https://www.python.org/psf/fellows/.

If you have any questions about the PSF Election, please contact the PSF staff: psf-staff at python dot org.

--------------------------------------------------

The PSF is currently holding its 2019 Fundraiser. As a non-profit organization, the PSF depends on sponsorships and donations to support the Python community. Check out our Annual Impact Report for more details: https://www.python.org/psf/annual-report/2019/.

Please consider contributing to the PSF's 2019 fundraiser; we can't continue our work without your support! https://www.python.org/psf/donations/2019-q2-drive/.

Tuesday, June 04, 2019

Python Language Summit Lightning Talks, Part 2

The Summit concluded with a second round of lightning talks, which speakers had signed up for that day. These talks were therefore more off-the-cuff than the morning's talks, and several of them were direct responses to earlier presentations.

Read more 2019 Python Language Summit coverage.

Christian Heimes

SSL Module Updates




Python’s ssl module depends on OpenSSL. On Linux, Python uses the system OpenSSL, but on Mac and Windows it ships its own. Christian Heimes explained to the Language Summit that Python 3.7 must upgrade its included version of OpenSSL to 1.1.1 to receive long-term support, but he warned that this change might cause unexpected issues for Python programmers on Mac and Windows.

Heimes wants to deprecate support for TLS 1.1 as soon as possible. Recent Linux distributions and browsers already prohibit the old protocol for security reasons. In Python 3.8 he plans to document that TLS 1.1 “may work”, depending on the underlying OpenSSL version, and in Python 3.9 it will be explicitly banned. Larry Hastings asked whether this change could be spread over two Python releases, in the way that most features are first deprecated and then removed. Heimes replied that OpenSSL itself is moving this quickly.
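
Applications need not wait for the interpreter's defaults to change: since Python 3.7 the ssl module can refuse old protocols explicitly. A minimal sketch, assuming OpenSSL 1.1.0g or newer underneath:

```python
import ssl

ctx = ssl.create_default_context()
# Refuse TLS 1.0 and 1.1 handshakes regardless of the ssl module's defaults
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
print(ctx.minimum_version)
```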

Python has poor support for the root certificate authority files included in the operating system. On Linux and BSD, ssl.create_default_context() uses the root CAs correctly. On Windows, according to Heimes, root CAs are partly broken despite “a hack I added a couple years ago that does not work for technical reasons.” And on macOS there is no support without installing the certifi package. Heimes proposed to rely more on the operating system: on Mac and Windows in particular, the interpreter should ask the OS to verify certificates against its known CAs, instead of asking OpenSSL.
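
The platform differences are easy to observe by checking what the default context actually loaded; on macOS the counts may be near zero unless a CA bundle such as the third-party certifi package is passed in:

```python
import ssl

ctx = ssl.create_default_context()  # uses the system trust store where it can
stats = ctx.cert_store_stats()      # counts of loaded certificates and CAs
print(stats)  # e.g. {'x509': 147, 'crl': 0, 'x509_ca': 147} on a typical Linux box
# On platforms with no usable store, one common workaround is certifi:
#   ctx = ssl.create_default_context(cafile=certifi.where())
```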

It has been several years since Heimes and Cory Benfield began drafting PEP 543. This PEP would decouple Python’s API from the specifics of the OpenSSL library, so it could use OS-provided TLS libraries on macOS, Windows, and elsewhere. Heimes told the Language Summit that he and Paul Kehrer would work on PEP 543 during the post-PyCon sprints.

Larry Hastings

Let’s Argue About Clinic


Argument Clinic is a tool used in the implementation of CPython to generate argument-parsing code for C functions that are used from Python; i.e., “builtin” functions. (It is named for a Monty Python sketch.) Since its original intent was to create C functions that handle their arguments like Python functions do, it only handles Python-like function signatures.

Larry Hastings addressed the Language Summit to ask whether Argument Clinic ought to be extended to handle argument parsing more generally, including function signatures that would not be possible in pure Python. For example, some builtin functions have parameters with a default value of NULL, which is representable in C but not in Python. Hastings said he had observed developers wanting to use Argument Clinic for all builtin functions because it is convenient to use and generates fast code.

Eric Snow

The C API



One of the reasons for CPython’s success is its powerful C API, which permits C extensions to interact with the interpreter at a low level for the sake of performance or flexibility. But, according to Eric Snow, the C API has become a hindrance to progress because it is so tightly coupled to CPython’s implementation details. He identified several problems with the current CPython implementation, such as the GIL, but said, “we can't go very far fixing those problems without breaking the C API.”

One solution is to split the C API into four categories. The CPython header files would be split into four directories to make it more obvious to core developers and extension authors which category each type or function belongs to:

  • “internal” — “Do not touch!”
  • “private” — “Use at your own risk!”
  • “unstable” — “Go for it (but rebuild your extension each Python release)!”
  • “stable” — “Worry-free!”

There are a number of other solutions proposed or in progress.

Snow finished by inviting interested people to join him on the C API special interest group mailing list.

Steve Dower

Python in the Windows Store



When a Windows user types python in the command shell on a clean system, the shell typically responds, “python is not recognized as an internal or external command”. After the May Windows update this will change: Typing python in the shell will now open the Microsoft Store and offer to install Python. When Steve Dower showed the install screen to the Language Summit the audience broke into applause.

The package is owned by the Python core developers; Microsoft’s contribution was to add the python command stub that opens the install page. Compared to the installer that users can download from python.org, said Dower, “the Microsoft Store is a more controlled environment.” Its distribution of Python is easier to install and upgrade, at the cost of some inconsistencies with the full Python install. “It's not going to work for everyone.” Advanced developers and those who want multiple versions of Python will prefer to install it themselves, but the Microsoft Store will satisfy programmers who simply need Python available. “Everyone with a Windows machine in any of the tutorial rooms right now should probably be using this,” he said.

It had not been possible to install Python so conveniently on Windows until recent changes to the Microsoft Store. For example, Store apps were originally prohibited from accessing their current working directory, but apps are now granted virtually the same permissions as regular programs.

Carol Willing asked whether the Store version of Python could be used for reproducing data science results. “There are a number of situations where I would say don't use this package,” responded Dower. Since the Microsoft Store will automatically update its version of Python whenever there is a new release, data scientists who care about reproducibility should install Python themselves.

Nathaniel Smith

Bors: How Rust Handles Buildbots and Merge Workflow

(Or: One way to solve Pablo’s problems)


In response to Pablo Galindo Salgado’s earlier talk about the pain caused by test failures, Nathaniel Smith whipped up a talk about the Rust language’s test system. The Rust community observes what they call the Not Rocket Science Rule: “Automatically maintain a repository of code that always passes all the tests.” Although it is obvious that all projects ought to adhere to this rule, most fail to, including Python. How does Rust achieve it?

When a Rust developer approves a pull request, the “bors” bot tests it and, if the tests pass, merges the pull request to master.


This seems quite elementary, as Smith acknowledged. But there are two unusual details of the bors system that enforce the Not Rocket Science Rule. The first is that bors tests pull requests in strict sequence: it finds the next approved pull request, merges it together with master, and tests that version of the code; if the tests pass, bors makes that version the new master, otherwise it rejects the pull request. Then bors moves on to the next pull request in the queue. Compared to the typical system of testing pull requests before merging, bors’s algorithm tests the version of the code that will actually be published.

The second way the Rust community enforces the Not Rocket Science Rule is by requiring the bors process for all pull requests. “They do this for everything,” said Smith. “This is how you merge. There's no green button.” Taken together, bors’s algorithm and the workflow requirement ensure that Rust always passes its tests on master.

Smith described some conveniences that improve the Rust developer experience. First, bors can be triggered on a pull request before approving it, as a spot check to see whether the code passes the test suite as-is. Second, since bors must test pull requests one at a time, it has an optimization to prevent it from falling behind. It can jump ahead in the queue, merging a large batch of pull requests together and testing the result. If they pass, they can all be merged; otherwise, bors bisects the batch to find the faulty change, alerts its author, and merges all the pull requests before it.
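
The bisection step is easy to sketch. A toy version (not bors’s real code, and assuming exactly one faulty change in the batch) might look like:

```python
def find_faulty(batch, passes):
    """Bisect a failing batch of PRs down to the single change that broke it."""
    if len(batch) == 1:
        return batch[0]
    mid = len(batch) // 2
    left, right = batch[:mid], batch[mid:]
    # Test the first half merged onto master; recurse into whichever side fails.
    return find_faulty(left, passes) if not passes(left) else find_faulty(right, passes)

# Hypothetical queue in which PR #3 breaks the tests
faulty_pr = 3
passes = lambda prs: faulty_pr not in prs
print(find_faulty([1, 2, 3, 4, 5], passes))  # → 3
```

In the real system, the pull requests before the faulty one can then be merged, and only the culprit is kicked back to its author.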

The Rust project currently uses a successor to bors called Homu, written in Python. There are several other implementations, including bors-ng, which is available as a public service for any GitHub repository.

Victor Stinner

Status of stable API and stable ABI in Python 3.8



Python 3.8 will improve the long-term stability of the C API and ABI for extension authors. Some of the details are settled, but Victor Stinner’s presentation to the Language Summit showed there are still many unanswered questions.

As Eric Snow had mentioned, C header files in Python 3.8 will be split into directories for the public stable APIs, the unstable CPython-specific API, and the internal API. Afterwards there will be less risk of exposing an internal detail in the public API by mistake, since it will be obvious whenever a pull request changes a public header.

CPython’s internal API headers were not installed by “make install” in the past, but it could be useful for a debugger or other low-level tool to inspect the interpreter’s internal data structures. Thus, in Python 3.8 the internal headers will be installed in a special subdirectory.

In the course of Stinner’s regular work at RedHat he often debugs problems with customers’ third-party C extension modules. A debug build of the extension module might not be available, but Stinner could gather some useful information by loading the extension module with a debug build of Python. Today, this is impossible: debug builds of Python only work with debug builds of extension modules and vice versa. The debug build of Python 3.8, however, will be ABI compatible with the release build, so the same extension modules will work with both.

Another motivation for updating the C API is isolation of subinterpreters. Stinner referred to Petr Viktorin’s talk about removing process-wide global state for the sake of proper isolation, and possibly giving each subinterpreter its own GIL.

Attaining a clean, stable API and ABI may require breaking the current one; the core developers’ discussion focused on how much to break backwards compatibility and what techniques might minimize the impact on extension authors. The shape of Python 3.8’s C API and ABI are not yet settled. When Steve Dower asked whether Stinner was proposing a new stable ABI, Stinner answered, “I’m not sure what I’m proposing.”

Yarko Tymciurak

Cognitive Encapsulation

The Anchor of Working Together



Tymciurak began his lightning talk by complimenting the Language Summit participants. “In terms of communication skills, you do such a great job that I'm shocked sometimes.”

The factors that contribute to collaboration, he said, begin with a diversity of skills. As Victor Stinner had mentioned in his mentorship talk, a team with mixed skill sets and skill levels has advantages over a team of homogenous experts. Members of high-performing teams are also enthusiastic about common goals, they are personally committed to their teammates, and they show strong interpersonal skills.

Tymciurak credited Guido van Rossum for establishing the importance of teamwork from the beginning. Nevertheless, he said, “sometimes things may go off track.” The cause of irreconcilable disagreements or emotional blowups are not always obvious, but Tymciurak claimed that to him, “it's immediately obvious and really simple to fix.”

Cognitive encapsulation is the awareness that one’s experience of reality is not reality itself. “It’s your own mental model,” said Tymciurak. When we communicate, if we explicitly share with others what we think, see, or hear, then we are respecting cognitive encapsulation. As Tymciurak describes it, “That’s being aware that my thoughts are my own.” On the other hand, if we assume that others already agree with us, or we represent our personal experience as if it is the only possible experience for the whole group, then encapsulation is violated and we are likely to cause conflict.

As an example of cognitive encapsulation at work, Tymciurak contrasted two types of communication. One is transactional. Someone asks, “Where’s the meeting?” You answer by saying which room it is in. Another type is control communication. If an instructor commands students to “turn to page 47,” then control communication is appropriate and the students will accept it. But when a team member uses control communication without the team’s acceptance, conflict arises. Tymciurak said, “When you tell someone else what to do, you're breaking the encapsulation. Be careful. There's times when it's appropriate. But be aware of when it's not.”

Another key practice that preserves cognitive encapsulation is to truly listen. Especially when the speaker is a junior teammate, it is crucial to be able to listen without agreeing, disagreeing, or correcting. Tymciurak described the outcome of a team that works together this way. Individuals know that they understand each others’ views, and they can advocate for their own views, speaking from their own experience. “Then you can speak with authority and power. And that's part of the magic of encapsulation.”

Monday, June 03, 2019

Python Language Summit Lightning Talks, Part 1

The Summit began with six pre-selected lightning talks, with little time for discussion of each. Five of them are summarized here. An upcoming article will cover Pablo Galindo Salgado's lightning talk on improvements in Python's test infrastructure.

Read more 2019 Python Language Summit coverage.

Jukka Lehtosalo

Writing Standard Library C Modules In Python



Jukka Lehtosalo described his work with Michael Sullivan on an experimental compiler called mypyc.

The Python standard library, Lehtosalo said, contains the modules that most programmers use by default, so it should be fast. The main optimization technique has historically been to write C extensions. So far, 90 standard library modules are partly or entirely written in C, often for the sake of speed, totaling 200,000 lines of C code in the standard library. But C is hard to write and error prone, and requires specialized skills. “C is kind of becoming a dinosaur,” he said, provoking laughter from the core developers.

As an alternative, Lehtosalo proposes “writing C extensions in Python.” The mypyc compiler reads PEP 484 annotated type-checked Python, and transforms it into C extension modules that run between 2 and 20 times faster than pure Python. Some of Python’s more dynamic features such as monkeypatching are prohibited, and other features are not yet supported, but the project is improving rapidly.

The project has a similar goal to Cython’s: to transform Python into C, which is then compiled into extension modules. Compared to Cython, however, mypyc supports a wider range of PEP 484 types such as unions and generics. In Lehtosalo and Sullivan’s experiments it offers a greater performance improvement. They propose further experimentation, testing how well mypyc translates certain performance-sensitive standard library modules, such as algorithms, random, or asyncio. The translated modules could be published on PyPI first, rather than replacing the standard library modules right away. If the test goes well, mypyc would offer “C-like performance with the convenience of Python.”
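
The input mypyc consumes is just ordinary annotated Python. A hypothetical module like the sketch below runs unchanged under CPython; the PEP 484 annotations are what would let the compiler emit specialized C:

```python
from typing import List

def running_mean(values: List[float]) -> List[float]:
    """Fully annotated, non-dynamic code: the style mypyc is designed to compile."""
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out

print(running_mean([1.0, 2.0, 3.0, 4.0]))  # → [1.0, 1.5, 2.0, 2.5]
```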

Core developer Brett Cannon suggested an experiment using some module, such as datetime, that is already implemented in both Python and C. The Python version could be translated with mypyc and then pitted against the handwritten C version.

Matthias Bussonnier

Async REPL And async-exec



Python’s interactive shell makes it easy for beginners to learn Python, and for all Python programmers to experiment as they develop. However, async Python code is practically unusable with the shell. The await keyword must be used within a coroutine, so a programmer who wants the result of an awaitable object must define a coroutine and run an event loop method to execute it.

Matthias Bussonnier presented his work, which integrates async and await into the alternative IPython shell. IPython permits the await keyword at the top level, so a user can get the results of coroutines or other awaitables in the shell without defining a coroutine:
In [1]: from asyncio import sleep

In [2]: await sleep(1)

In [3]: from aiohttp import ClientSession

In [4]: s = ClientSession()

In [5]: response = await s.get('https://api.github.com')

IPython supports asyncio and other async frameworks such as trio. In the future, a plugin system will allow any async/await-based framework to be usable in the shell.

Bussonnier argued that some of his ideas should be adopted by core Python. If asynchronous coding were convenient in the shell, it would be useful for educators, and it would remove what he considers the misconception that async is hard. Best of all, Python would get ahead of JavaScript.

However, to support async and await in the shell currently requires some unsatisfying hacks. There are subtle issues with local versus global variables, background tasks, and docstrings. Bussonnier has filed issue 34616, implement "Async exec", to make full async support in the shell possible.

Update: After the Language Summit, Bussonnier and Yury Selivanov updated the Python compiler to permit await, async for, and async with as top-level syntax in the shell when executed like python -m asyncio:
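
The compiler change is exposed through a compile() flag. A minimal sketch of what a shell can now do on Python 3.8+ (this shows the mechanism, not IPython's actual code):

```python
import ast
import asyncio

# PyCF_ALLOW_TOP_LEVEL_AWAIT (Python 3.8+) lets compile() accept `await`
# outside any coroutine; evaluating the code object yields a coroutine.
code = compile("await asyncio.sleep(0) or 'done'", "<shell>", "eval",
               flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
coro = eval(code, {"asyncio": asyncio})

loop = asyncio.new_event_loop()
result = loop.run_until_complete(coro)
loop.close()
print(result)  # → done
```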


Jason Fried

Asyncio And The Case For Recursion



A typical asyncio application has a single call to run_until_complete() near the top level of the application, which runs the asyncio event loop for the entire application. All code beneath this level must assume that the loop is running.

Facebook engineer Jason Fried presented to the Language Summit a scenario in which this application structure fails. Consider an async application that contains a mix of async code and blocking calls that are tolerably fast. Deep within the call stack of one of these blocking calls, a developer sees an opportunity for concurrency, so she adds some async code and executes it with run_until_complete(). This call raises “RuntimeError: This event loop is already running.” As Fried explained, any call to run_until_complete() in a call chain under async def has this result, but due to modularization and unittest mocking in Facebook’s Python architecture, this error can first arise late in the development cycle.
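
The failure mode is easy to reproduce in a few lines; a contrived sketch of the scenario Fried described:

```python
import asyncio

async def new_async_work():
    await asyncio.sleep(0)

async def legacy_code_path():
    # Deep inside code written as if no loop were running, someone
    # tries to drive new async work to completion with its own call...
    loop = asyncio.get_running_loop()
    coro = new_async_work()
    try:
        loop.run_until_complete(coro)  # boom: the loop is already running
    finally:
        coro.close()  # silence the "never awaited" warning

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(legacy_code_path())
except RuntimeError as exc:
    caught = exc
finally:
    loop.close()

print(caught)  # → This event loop is already running
```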

How should this problem be avoided? The asyncio philosophy is to avoid mixture by converting all blocking code to asynchronous coroutines, but converting a huge codebase all at once is intractable. “It's a harder problem than moving from Python 2 to 3,” he said, “because at least I can go gradually from Python 2 to 3.”

Fried suggested a solution for incrementally converting a large application, and to allow developers to add asyncio calls anywhere “without fear.” He proposed that the asyncio event loop allow recursive calls to run_until_complete(). If the loop is already running, this call will continue running existing tasks along with the new task passed in. Library authors could freely use asyncio without caring whether their consumers also use asyncio or not. “Yeah sure it's ugly,” he conceded, “but it does allow you to slowly asyncio-ify a distinct code base.”

Thomas Wouters objected that this proposal would violate many correctness properties guaranteed by the current loop logic. Amber Brown concurred. She explained that Twisted’s loop prohibits reentrance to ensure that timeouts work correctly. One of the core tenets of asynchronous programming is that all tasks must cooperate. There is no good solution, she said, for mixing blocking and async code.

Mark Shannon

Optimising CPython, Or Not



“Every few years someone comes along with some exciting new potential for speeding up CPython,” began Mark Shannon, “and a year later everyone's forgotten about it.” Some of these optimizations are worth pursuing, however. We can identify promising optimizations with a heuristic.

First, Shannon advised the audience to think in terms of time, not speed. Do not measure the number of operations Python can execute in a period; instead, measure the amount of time it requires to finish a whole task and divide the total time into chunks. As an example, Shannon described a recent proposal on the python-dev mailing list for implementing a register-based virtual machine, which would store local variables in fixed slots, rather than on a stack as the Python VM does today. How much time could such a change save? Shannon walked the audience through his thought process, first estimating the cost of the Python interpreter’s stack manipulation and guessing how much cheaper a register-based VM would be. Shannon estimates that up to 50 percent of Python’s runtime is “interpretive overhead,” and a register-based VM might reduce that significantly, so it is worth trying. However, only an experiment can measure the actual benefit.

Shannon compared the register-based VM to another optimization, “superinstructions.” The technique is to find a common sequence of bytecodes, such as the two bytecodes to load None onto the stack and then return it, and combine them together into a new bytecode that executes the whole sequence. Superinstructions reduce interpretive overhead by spending less time in the main loop moving from one bytecode to the next. Shannon suspects this technique would beat the register-based optimization.
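
Candidate sequences are easy to spot with the dis module. For example, the epilogue of a function that simply falls off the end is a load/return pair of bytecodes (shown here as a sketch; the exact opcodes vary by CPython version, and CPython 3.12 later fused exactly this pair into a RETURN_CONST instruction):

```python
import dis

def f():
    pass

# On CPython ≤ 3.11 this ends with LOAD_CONST (None) followed by RETURN_VALUE,
# the kind of common pair a superinstruction would combine.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
```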

In conclusion, Shannon advised the audience that the next time another Unladen Swallow or similar project appears, to determine first which part of the interpreter it optimizes. If the optimization targets a part of the interpreter that represents less than 90% of the total runtime, said Shannon, “it’s pretty much doomed to fail.”

Łukasz Langa

Black under github.com/python



The past year has been marked by controversy in the Python community, but consensus is forming on the most unexpected topic: code formatting. Łukasz Langa’s Black code formatter is only a year old, but it has been adopted by pytest, attrs, tox, Django, Twisted, and numerous other major Python projects. The core developers are enthusiastic about Black, too: When Langa introduced himself as its author, the room broke into applause.

Langa proposed moving black from his personal repository to the Python organization on GitHub. He said, “My goal for this is to provide a suitable default for users who don't have any preexisting strong opinions on the matter.”

Some core developers dissented, arguing that since Black is already so successful, there is no need to move it. Gregory Smith said it is not the core team’s role to bless one code formatter over others; he regrets that opinionated tools like mypy are in the official organization and he opposes adding more. Guido van Rossum suggested moving it to the Python Code Quality Authority organization; Langa responded that beginners haven’t heard of that organization and moving Black there would have no effect.

Update: Despite some objections at the Language Summit, Black is now in the official Python organization on GitHub.

Pablo Galindo Salgado

The Night's Watch is Fixing the CIs in the Darkness for You


Python is tested on a menagerie of “buildbot” machines with different OSes and architectures, to ensure all Python users have the same experience on all platforms. As Pablo Galindo Salgado told the Language Summit, the bugs revealed by multi-platform tests are “Lovecraftian horrors”: race conditions, bugs specific to particular architectures or compiler versions, and so on. The core team had to confront these horrors with few good weapons, until now.

Read more 2019 Python Language Summit coverage.

The Solemn Duty of Bug Triage


When a test fails, the core developer who triages the failure follows an arduous process. “It's not glamorous by any means,” said Galindo, “but someone needs to do it.” Galindo, Victor Stinner, and Zachary Ware are the main bug triagers, and they all follow a similar sequence: read the failure email, search for duplicate failures, read the voluminous logs to characterize the problem, and file a bug with a detailed description. Then, optionally, they try to reproduce the problem. Since failures are often specific to one buildbot, the triagers must contact the buildbot’s owner and get permission to ssh into it and debug.

According to Galindo, typical test failures are “really, really complicated,” so the triage team takes a firm stance about reverting changes. If they suspect that a change has broken a test, its author has one day to follow up with a fix or the change will be reverted. “Nobody likes their commits to be reverted,” he told the Language Summit. But test failures can cause cascading failures later on, so the team must be ruthless.

New Tools for Squashing Bugs


A pull request is not tested by the buildbots until after it is merged, so the author does not immediately know if they have broken any tests. Galindo and his colleagues have written a bot which reacts to a test failure by commenting on the merged pull request that caused it, with reassuring instructions to help the panicked author respond. “We have some arcane magic,” he said, to distinguish compiler errors from tracebacks and neatly format them into the message, so the author can begin diagnosing immediately.


Since the bot was deployed in September, the mean time to fix a test failure has fallen dramatically. When Galindo showed this chart, the core developers broke into applause.


Nevertheless, there are still severe problems with Python’s tests. Flaky tests break about 40% of the builds; the system is programmed to retry a failure and consider it successful if the second run passes, but this is clearly just a stopgap. Galindo urged the core team to reduce flaky tests by eliminating race conditions and sleeps. He also asked for help writing a tool that would analyze which flaky tests fail most often, and a tool to detect and merge duplicate test failures.
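
The retry stopgap amounts to something like the following sketch (not the buildbots' actual code):

```python
def run_with_retry(test, retries=1):
    """Run a test; on failure, re-run it up to `retries` more times and
    report success if any run passes -- masking, not fixing, flakiness."""
    for attempt in range(retries + 1):
        try:
            test()
            return True
        except AssertionError:
            if attempt == retries:
                return False

calls = {"n": 0}
def flaky_test():
    calls["n"] += 1
    if calls["n"] == 1:  # a simulated race: fails only on the first run
        raise AssertionError("intermittent failure")

print(run_with_retry(flaky_test))  # → True
```

The trouble, as Galindo noted, is that this hides genuine race conditions instead of eliminating them.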

Finally, Galindo proposed allowing contributors to test their pull requests on the buildbots before merging. This feature should be implemented cautiously. “The buildbots are very delicate,” he said. They cannot safely run arbitrary code like on Travis or other commercial test infrastructures. Still, it would be worth the effort, if contributors could catch mistakes before they are merged.