Friday, June 14, 2024

The Python Language Summit 2024: Python's security model after the xz-utils backdoor

Pablo Galindo Salgado describing the xz-utils backdoor
(Photo credit: Hugo van Kemenade)
 

The backdoor in the popular compression project xz-utils was discovered on Friday, March 29th 2024, by Andres Freund. Andres is an engineer at Microsoft who noticed performance issues with SSH while contributing to the Postgres project. Andres wasn't looking for security issues, but after digging into the problem further he discovered an attempt to subvert SSH logins across multiple Linux distros.

This was a social engineering attack to gain elevated access to a project, also known as an "insider threat". An account named "Jia Tan" had begun contributing to the xz-utils project soon after the original maintainer had announced on the mailing list that they were struggling with maintenance of the project. Through the use of multiple sock-puppet accounts pressuring the maintainer and over a year of high-quality contributions, eventually Jia Tan was made a release manager for the project.

"Jia Tan may have a bigger role in the project in the future. He has been helping a lot off-list and is practically a co-maintainer already. :-)"

— xz-utils maintainer, Lasse Collin

Over time a series of small subversive changes were made to the project all culminating in a tainted release artifact that put the backdoor in motion. Luckily for all of us, Andres discovered the attack before the new version was deployed more widely.

How is Python similar to xz-utils?

Pablo Galindo Salgado, Steering Council member and the release manager for Python 3.10 and 3.11, brought this topic to the Language Summit to discuss what could be done to improve Python's security model in the wake of the xz-utils backdoor.

Pablo noted the similarities shared between CPython and xz-utils, referencing the previous Language Summit's talk on core developer burnout, the number of modules in the standard library that have one or zero maintainers, the low ratio of maintainers to source code, and the use of autotools for configuration. Autotools was used by Jia Tan as part of the backdoor, specifically to obscure the changes to tainted release artifacts.

Pablo confirmed along with many nods of agreement that indeed, CPython could be vulnerable to a contributor or core developer getting secretly malicious changes merged into the project.

"Could this happen in CPython? Yes!" -- Pablo

For multiple reasons, such as being able to fix bugs and the existence of single-maintainer modules, CPython doesn't require reviewers on the pull requests of core developers. This can lead to "unilateral action", meaning that a change is introduced into CPython without review by anyone besides the author. Other situations, like release managers backporting fixes to other branches without review, are also common.

There was also an emphasis on "binary files", like wheels, images, certificates, and test data, that are checked into the CPython repository. Today some of this data doesn't have a known "upstream" or source from which it was generated, making introspection difficult. Part of the xz-utils backdoor used binary test data to smuggle code into the release artifacts without being reviewed by other developers.

So what can be done?

There aren't any silver bullets when it comes to social engineering and insider threats. Barry Warsaw and Carol Willing both emphasized the importance of having an action plan in advance for what to do if something similar to the xz-utils backdoor were to happen, in order to promptly fix the issue and alert the community.

Thomas Wouters asked the group whether the xz-utils backdoor was a serious enough event to force a new workflow to be adopted by core developers. Thomas noted that mandatory review of all pull requests had been discussed previously and wasn't adopted at the time, but also wasn't discussed as a security issue like it is today. There's been a hesitance to break people's workflows or make it impossible to get bugs fixed. To be effective, this change would also require a cultural shift to make asking for code reviews more common amongst core developers.

Carol Willing concurred, noting that almost every other project she's contributing to requires reviews for all pull requests.

Guido van Rossum was less convinced that having additional review would help much for security. Guido was more concerned about who is given "commit bit" (write access) in the first place, asking for a higher bar such as whether someone had met the person in real life, at a conference, or over a video call.

Mariatta agreed with verifying identities of core developers, including requiring updates to reconfirm the identities of individuals, noting that this is commonplace for employment. Mariatta noted that the contributions made by CPython core developers are of equal or greater importance than any individual's employment.

Some doubt was thrown on verifying identities, especially via video call, as it's now not unheard of for someone being interviewed for employment over a video call to be different from the person who shows up on the first day of work.

Hugo van Kemenade remarked on removing inactive core developers, noting that it's already documented in the CPython developer guide that inactive or unreachable core developers can be removed with or without notice. There was agreement within the group that this should be done more actively to reduce the chances that unattended privileged accounts are resurrected by malicious actors.

There was some discussion about removing modules from the standard library, especially modules which are not used or have no maintainers. Toshio Kuratomi cautioned that moving modules out of the standard library only pushes the problem outwards to one or more projects on PyPI. Łukasz Langa concurred on this point, referencing specifically the "chunk" module removed via PEP 594, and was unsure whether the alternative project on PyPI should be recommended to users given that its author is not reachable.

Overall it was clear there is more discussion and work to be done in this rapidly changing area.

The Python Language Summit 2024: Should Python adopt Calendar Versioning?

 

Hugo van Kemenade, the newly announced Release Manager for Python 3.14 and 3.15, started the Language Summit with a proposal to change Python's versioning scheme.

Hugo's view of kicking off the language summit!
(Photo credit: Hugo van Kemenade)

The goal of Hugo's proposal was to make expectations around versioning, backwards compatibility, and support timelines clearer for Python users.

On the surface, Python's versioning might appear to be Semantic Versioning (SemVer) due to its three-part version and infamous set of backwards incompatible changes known as Python 3. Hugo noted that the publication of Python 1.0.0 (1994), and what would become the Python versioning scheme, predates the publication of SemVer (2009) by around 15 years.

The perception of Python using semantic versioning is a source of confusion for users who don't expect backwards incompatible changes when upgrading to new versions of Python. In reality almost all new feature releases of Python include backwards incompatible changes such as the removal of "dead batteries" where PEP 594 marked 19 modules for removal in Python 3.13.

Calendar Versioning (CalVer) encompasses a wide array of different versioning schemes that have one property in common: using the release date as part of a release's version. Calendar-based versions vary quite widely, but typically include a two or four digit year (YY or YYYY) and sometimes a month or day (MM and DD).

Using years in versions is quite common amongst other programming languages, operating systems like Ubuntu, and tools like Black, pip, and PyCharm.

Slide from Hugo's presentation showing programming languages using calendar-based versioning like Ada, Algol, C, C++, Fortran, and JavaScript

Since 2019, Python has made releases according to the new yearly cadence from PEP 602. Moving to annual releases made it possible for downstream distributors to rely on when a new Python version appears, which brings newer Python versions to users faster.

Each minor release receives 5 years of security fixes. Using the release year of 2026 as an example, users could add 5 years and know they'll receive security fixes on that minor release until 2031. Figuring out this information from "3.15" in the existing versioning scheme would require another lookup, typically to the release schedule PEP.

If the year were baked into the version, one wouldn't need to see the release schedule to know when support was ending. Instead, one could add 5 years to the year encoded in the version (e.g. for "3.26", 26 + 5 = 31, therefore security support ends in 2031).
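The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the proposed "3.YY" scheme, a two-digit year in the 2000s, and the 5-year security window mentioned above. The function name `security_support_ends` is hypothetical and not part of any existing tool.

```python
SECURITY_SUPPORT_YEARS = 5  # from the text: 5 years of security fixes per minor release


def security_support_ends(version: str) -> int:
    """Return the year security support would end for a "3.YY[.micro]" version.

    Assumption: the minor version is a two-digit release year in the 2000s.
    """
    major, minor, *_ = version.split(".")
    if major != "3":
        raise ValueError(f"expected a 3.YY version, got {version!r}")
    release_year = 2000 + int(minor)
    return release_year + SECURITY_SUPPORT_YEARS


print(security_support_ends("3.26"))    # 2031
print(security_support_ends("3.26.1"))  # 2031
```

With today's scheme the same question for "3.15" cannot be answered from the version string alone, which is the lookup the proposal aims to avoid.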

Hugo offered multiple proposed versioning schemes, including:

  • Using the release year as minor version (3.YY.micro, "3.26.0")
  • Using the release year as major version (YY.0.micro, "26.0.0")
  • Using the release year and month as major and minor version (YY.MM.micro, "26.10.0")
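To compare the three schemes side by side, here is a small sketch that derives each candidate version string from a release date. The helper `candidate_versions` is hypothetical, the October 2026 date is chosen only to match the examples in the list, and whether the month would be zero-padded is not stated in the proposal, so it is left unpadded here.

```python
from datetime import date


def candidate_versions(release: date, micro: int = 0) -> dict[str, str]:
    """Map each scheme from the list above to its version string."""
    yy = release.year % 100  # two-digit year, e.g. 2026 -> 26
    return {
        "3.YY.micro": f"3.{yy}.{micro}",
        "YY.0.micro": f"{yy}.0.{micro}",
        "YY.MM.micro": f"{yy}.{release.month}.{micro}",  # month left unpadded (assumption)
    }


for scheme, version in candidate_versions(date(2026, 10, 1)).items():
    print(f"{scheme:12} -> {version}")
```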

There were discussions about other options beyond these amongst attendees.

Thomas Wouters, release manager for 3.12 and 3.13, questioned the value-add for adopting a new versioning system. Thomas noted that while the current system is confusing, changing the system in any way also adds confusion for users. Hugo responded that clarity, especially around security-fix support and end-of-life dates, was the biggest motivation.

Barry Warsaw wondered if there was a way to test a potential new versioning scheme ahead of time to find potential problems. Hugo referenced the deadsnakes project, which builds distributions of CPython for Ubuntu. The deadsnakes project previously created a build of Python 3.9 that modified the version to be "3.10" to help discover breakages in projects assuming a single-digit minor version. Hugo also had experience using static code analysis to find other version assumptions in Python projects.
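The kind of single-digit assumption that such a test build can surface is easy to reproduce. The snippet below is an illustrative sketch, not code taken from deadsnakes or any particular project, and the function names are hypothetical. It contrasts a string comparison, which breaks at "3.10", with a comparison of integer tuples.

```python
def naive_is_at_least_3_9(version: str) -> bool:
    # Buggy: strings compare character by character, so "3.10" sorts before "3.9".
    return version >= "3.9"


def parse_version(version: str) -> tuple[int, ...]:
    # Split "3.10" into (3, 10) so each component compares numerically.
    return tuple(int(part) for part in version.split("."))


def is_at_least_3_9(version: str) -> bool:
    return parse_version(version) >= (3, 9)


print(naive_is_at_least_3_9("3.10"))  # False, which is wrong
print(is_at_least_3_9("3.10"))        # True
```

In real code, comparing `sys.version_info` against a tuple such as `(3, 9)` avoids string parsing altogether.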

"Python 3 is a brand at this point, and we should stick to it" said Guido van Rossum after sharing concerns that changes to the major version would break the ecosystem more than changes to the minor version. Others voiced concerns about changing the major version "3", including in the "python3" binary and in packaging, such as the "abi3" tag.

Carol Willing noted that many projects are relying on Python's versioning system and already have those versions "baked in" to warnings in existing releases. Hugo confirmed this is a problem, including for Python itself, which has a few deprecation warnings and messages that reference future Python versions like 3.15. Hugo's plan would be to update these versions for Python and give plenty of time before the new versioning scheme took effect.

Donghee Na offered up Rust's use of "yearly editions" in the branding of their releases, where the version number is completely separate from the branding of the release. Hugo was concerned that this would add another layer of confusion and would mostly repeat information already found in the release schedule.

Overall, the proposal to use the current year as the minor version was well received. Hugo mentioned that he'd be drafting a PEP for this change.

Carl Meyer cautioned against making any changes to the version scheme before 2026 in order to preserve the 3.14 "π"-thon release, a suggestion which received approval and laughter from the room. Sounds like whatever happens we'll get to have our pie and eat it too. 🥧