Thursday, May 14, 2020

Lightning Talks Part 2 - Python Language Summit 2020


Zac Hatfield-Dodds

Teaching Python with Errors




When a new coder begins learning Python, the first Python feature they usually see is SyntaxError. In Zac Hatfield-Dodds's experience, novices meet these errors practically as soon as they start typing, and they will spend most of their time over the following months struggling with them. Since experienced programmers rarely encounter syntax errors and easily fix them, the core team has not built very good tooling for them, and the official Python tutorial doesn't cover errors until Section 8. In any case, documentation is not the place to fix novices' user experience, since they don't know where to look for help. The only place to fix it is in CPython.

Read more 2020 Python Language Summit coverage.

SyntaxError does little to help a beginner. It directs their attention to the spot after the token that caused the error. In this example from the tutorial, the caret points at the last letter of print, but the coder's mistake was omitting a colon after True:
>>> while True print('Hello world')
  File "<stdin>", line 1
    while True print('Hello world')
                   ^
SyntaxError: invalid syntax
Hatfield-Dodds proposed more precise errors that tell users about mismatched parentheses, unterminated string literals, missing commas and colons, and so on. Pablo Galindo Salgado said, via Zoom chat, that Python 3.8 has improved some error messages, for example:
>>> (1+3+4))
  File "<stdin>", line 1
    (1+3+4))
           ^
SyntaxError: unmatched ')'
The new parser for 3.9 might improve error messages further, although in the short term it requires more work just to bring it to parity with the current parser.

Hatfield-Dodds suggested CPython could implement "did you mean..." for both SyntaxErrors and NameErrors, by fuzzily searching for replacements for typos. Incremental improvements to SyntaxError could be funded by educational institutions, he guessed, and would make a good project for contributors from outside the core team, if the core developers are willing to guide them. "I care a lot about this," said Hatfield-Dodds, and paused to let out a big exhale. "If people's first exposure to errors in Python is an error that tells them what the problem is and how to fix it, we might even convince them to read error messages in the future, which would be magical."
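
The "did you mean..." idea can be illustrated outside the interpreter. The following is only a minimal sketch, assuming the standard library's difflib is an acceptable stand-in for the fuzzy search; the helper names suggest_name and run_with_hint are hypothetical and are not part of CPython or of the proposal.

```python
import difflib


def suggest_name(missing, known_names, cutoff=0.6):
    """Return the closest known name to `missing`, or None if nothing is close."""
    matches = difflib.get_close_matches(missing, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def run_with_hint(source, namespace):
    """Execute `source`; on NameError, return a 'did you mean' hint string."""
    try:
        exec(source, namespace)
    except NameError as exc:
        # Newer Pythons expose the name as exc.name; otherwise parse the message.
        missing = getattr(exc, "name", None) or str(exc).split("'")[1]
        # Only user-defined names are searched here (an assumption of this sketch).
        candidates = [n for n in namespace if not n.startswith("__")]
        hint = suggest_name(missing, candidates)
        if hint:
            return f"NameError: name '{missing}' is not defined. Did you mean '{hint}'?"
        return f"NameError: {exc}"
    return None


message = run_with_hint("total_cuont + 1", {"total_count": 3})
```

A real implementation would also have to search builtins and attribute names, and decide how close a match must be before it is worth showing to a beginner.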

Jim Baker

State of Jython




"We are not dead yet!" said Jim Baker to the Language Summit. He admitted that "this is something I've said many times about Jython in the past." The project is certainly behind CPython—it has just published a bugfix release of 2.7, and Python 3 support is far off—but it is making progress nevertheless. "And again," said Baker, "our apologies."

Jython's previous bugfix release was nearly two years ago; the main topics of the latest version, Jython 2.7.2, were an overhaul of Jython's PyJavaType objects, and solutions to deep race conditions. Baker said there is still an active user base for Jython based on the response to Jython's recent betas, but "capacity in the project is low." The project is currently led by Jeff Allen with two other regular contributors, none devoted to Jython full-time. Emeritus developers chip in occasionally. Baker hopes there will be more interest once Jython 3 ships, but he wrote in his slides, "the journey is unpredictable and resources are few."

There are several Python 3 implementations on the JVM, but none is ready to use. Isaiah Peng made a solo attempt at implementing Jython 3 in 2016-2017. It is too late now to resume this work, because Peng's branch didn't pull changes from the main Jython repository and they have now diverged too far. Baker said Jython should copy ideas from this prototype and credit Peng's work. Since 2016, Jeff Allen has been writing the Very Slow Jython Project, "a project to re-think implementation choices in the Jython core, through the gradual, narrated evolution of a toy implementation." Independently of the Jython team, Oracle is actively building an experimental Python 3 implementation on the Graal (pronounced "grail") JVM. "It's fantastic," said Baker, but unlike Jython "it doesn't do this beautiful subclassing of Java classes with Python classes."

The plan for Jython is to target the Python 3.8 language, including type hints. Baker has prototyped code to generate Python type hints from Java classes. The team will overhaul the core implementation using modern Java features, and continue to emphasize Jython's strengths: convenient integration with Java libraries, speed equal to CPython or better, and high concurrency (unlike CPython). Baker speculated that Jython 3.8's asyncio module could be built on the high-performance Netty library. He hopes that the new HPy API will take off, because it would simplify supporting C extensions from Jython.

Eric Holscher

Read the Docs features of interest




Holscher's presentation was an advertisement for nifty additions to ReadTheDocs, and an enticement to move CPython's documentation there.

ReadTheDocs recently added the hoverxref feature; when a reader hovers their cursor over a link in a documentation page, a tooltip shows the content of the linked section. Holscher has forked the CPython docs to host them on ReadTheDocs and demonstrate this feature's utility.

ReadTheDocs also has nicer pull request integration than the CPython repository does. Currently, when contributors offer pull requests to CPython, the patched documentation is available for download as a zip archive of HTML files. ReadTheDocs goes one better; its PR builder publishes the patched docs to the web for review. Search engines are blocked from indexing these docs, and each page displays a warning that it was created from a pull request. (After the Summit, at the core developers' request, the ReadTheDocs team enabled this feature for pull requests to the Python Developer Guide, which is hosted on ReadTheDocs.)

Finally, Holscher claimed that ReadTheDocs's text search is better than what CPython uses, which is generated by Sphinx. ReadTheDocs's search results include direct links to pages' subsections, and they provide search-as-you-type.

Sanyam Khurana commented via Zoom chat, "This looks very promising and amazing!" Pablo Galindo Salgado suggested hosting the PEPs on ReadTheDocs as well.

"Some of this is beta," said Holscher. Nevertheless, it's exciting to consider how much better the reader experience would be if CPython migrated. He argued that CPython should benefit from future improvements in ReadTheDocs, and ReadTheDocs should benefit from the attention of CPython developers. "We’ve talked about this in the past, there were blockers," he said, but the ReadTheDocs team has now addressed them.

Mariatta Wijaya

Make your life happier (with Zapier)




"I want you all to do more automation in your lives," said Mariatta Wijaya. She acknowledged that Zapier is her employer, but her intention was pure. "I know you're volunteering for open source. You should use more automation and save time."

For a demonstration, Wijaya showed the Zapier workflow she had used to invite attendees to the Language Summit. "You all received calendar invites for this event," she said. "I did not send them by going to Google Calendar." Instead, she collected names and email addresses in a Google spreadsheet. For each attendee, once she obtained a recording waiver and consent to the code of conduct, she put a "y" in the attendee's row in the spreadsheet. Her Zap then sent the invite automatically.



Earlier in the summit, some core developers had complained about the firehose of Discourse emails. Zapier has a Discourse integration that can manage this torrent. A user can create a "Zap" that takes new Discourse messages, filters them according to keywords or other attributes the user chooses, then triggers an email, Slack notification, or some other action. Wijaya also described how Zapier automates onboarding new PyLadies organizers.
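
The trigger, filter, action shape of such a Zap can be sketched in plain Python. This is not Zapier's API: the message dictionaries and the names matches_filter and run_zap are hypothetical, chosen only to show the flow described above.

```python
def matches_filter(message, keywords, categories=None):
    """True if the message mentions any keyword and is in an allowed category."""
    text = (message["title"] + " " + message["body"]).lower()
    if not any(word.lower() in text for word in keywords):
        return False
    return categories is None or message["category"] in categories


def run_zap(messages, keywords, action, categories=None):
    """Apply `action` to every message that passes the filter; return the count."""
    fired = 0
    for message in messages:
        if matches_filter(message, keywords, categories):
            action(message)
            fired += 1
    return fired


# Hypothetical forum messages standing in for the "new message" trigger.
inbox = [
    {"title": "PEP 581 timeline", "body": "Migration plan", "category": "Core Workflow"},
    {"title": "Lunch?", "body": "Anyone around", "category": "Off-topic"},
]
notified = []
count = run_zap(inbox, ["pep 581"], notified.append)
```

Here the action simply appends to a list; in a real Zap it would be the email or Slack notification step.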

Conclusion

"Well, we made it," said Łukasz Langa at the end of the second day of the videoconference. "I'm sorry it was not what a real Python Language Summit could have been, but I hope it was better than nothing." His co-organizer Mariatta Wijaya congratulated attendees from all over the globe for staying up. "I know this is past bed time for many of you." It was 8pm Pacific Time, the middle of the night for attendees in Europe and Africa, and the morning of the next day in Asia. She thanked the PSF and PyCon staff, and MongoDB for sponsoring the Summit.

Victor Stinner added, "Thanks TCP/IP for making this possible."

Sumana Harihareswara said, "This was real to me."

Wednesday, May 13, 2020

Call for Volunteers! Python GitHub Migration Work Group

Call for community volunteers! It is time to assemble a Python Work Group that will aid in Python’s migration to GitHub!
PEP 581 was accepted and Python is now starting to plan for the actual migration (per PEP 588)!
We are looking for volunteers to participate in a work group that will be involved with Python’s migration from bugs.python.org to GitHub. We want to make sure the direction this migration takes represents what the community needs!
The transition will be completed with the assistance of GitHub’s team and the PSF will be contracting a Technical Project Manager to assist. Certain Steering Council members and PSF staff members will be part of this work group as well!
The work group will help review contractor resumes, weigh in on discussions, help guide decisions, help get community input, and have a close overview of the entire project. If you have experience with Roundup and GitHub we want to hear from you! We want to ensure that through this Work Group we have a wide range of users represented. The discussions and decisions will help mold what the final outcome will be.
Fill out this application form by May 27: https://forms.gle/jivuUdgViQPU4rKh8. We will reach out soon after.

6 Ways Salesforce Gets Things Done with Python

Salesforce Engineering puts Python to work across many areas of their business. 

Read on to see how they use Python in machine learning, security, internal devops teams and more.

The Python programming language has strong ties to both engineering and science disciplines, which gives its users access to a wide number of libraries to solve both practical and theoretical problems. We put it to work across Salesforce.org (our non-profit product arm), Heroku, Salesforce Einstein, Industries and Service Clouds, internal devops teams, and more.


Here are 6 things we use Python to do (and you can too!) through projects we’ve open sourced:


1. Conquer the Natural Language Decathlon by performing ten disparate natural language tasks (DecaNLP).

Deep learning has significantly improved state-of-the-art performance for natural language processing (NLP) tasks, but each one is typically studied in isolation. The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. By requiring a single system to perform ten disparate natural language tasks, decaNLP offers a unique setting for multitask, transfer, and continual learning.

2. Create SSL/TLS client fingerprints that are easy to produce on any platform and can be easily shared for threat intelligence (JA3).

JA3 gathers the decimal values of the bytes for the following fields in the Client Hello packet: SSL Version, Accepted Ciphers, List of Extensions, Elliptic Curves, and Elliptic Curve Formats. It then concatenates those values together in order, using a “,” to delimit each field and a “-” to delimit each value in each field.
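
The concatenation rule can be sketched in a few lines of Python. This is an illustration rather than the JA3 project's own code: the function names are hypothetical, the inputs are assumed to be already parsed into integers, and the final MD5 step reflects how the JA3 project describes turning the string into a shareable fingerprint.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, curve_formats):
    """Join the five Client Hello fields: ',' between fields, '-' between values."""
    fields = [[version], ciphers, extensions, curves, curve_formats]
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_fingerprint(version, ciphers, extensions, curves, curve_formats):
    """MD5-hash the JA3 string to get a 32-character hexadecimal fingerprint."""
    text = ja3_string(version, ciphers, extensions, curves, curve_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


example = ja3_string(769, [47, 53, 5, 10], [0, 10, 11], [23, 24], [0])
# example == "769,47-53-5-10,0-10-11,23-24,0"
```

A field with no values, such as a Client Hello without extensions, simply becomes an empty string between its commas.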

3. Use Google Sheets like tables in code (pygsheetsorm).

Ever wanted to be able to use a Google Sheet like a table in your code? How about if you could get a list of objects that automatically mapped column headers into properties? Then this project is for you! This is a simple interface on top of pygsheets.
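
The header-to-property mapping can be illustrated without a spreadsheet at all. The sketch below does not use pygsheetsorm or pygsheets; it assumes the sheet has already been read into a list of rows, and the helper names are hypothetical.

```python
import re
from types import SimpleNamespace


def header_to_attribute(header):
    """Turn a column header such as 'First Name' into 'first_name'."""
    name = re.sub(r"\W+", "_", header.strip().lower()).strip("_")
    # Attribute names cannot start with a digit, so prefix those with '_'.
    return "_" + name if name[:1].isdigit() else name


def rows_to_objects(rows):
    """Treat the first row as headers and map each later row to an object."""
    headers = [header_to_attribute(h) for h in rows[0]]
    return [SimpleNamespace(**dict(zip(headers, row))) for row in rows[1:]]


sheet = [
    ["First Name", "E-mail Address"],
    ["Ada", "ada@example.com"],
    ["Grace", "grace@example.com"],
]
people = rows_to_objects(sheet)
# people[0].first_name is "Ada"; people[1].e_mail_address is "grace@example.com"
```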

4. Get rid of silent errors in Perforce syncs (o4).

At Salesforce, we use Perforce at a very large scale. A scale that exposes some shortcomings in p4 itself. o4 was created to improve reliability of a sync and increase scalability in our very large-scale CI. What that boils down to is the rather horrendous reality that a p4 sync makes most of the changes to your local files. o4 allows you to continue using Perforce and all the associated tools and IDE plugins, without the uncertainty around a sync. Every sync is guaranteed perfect, every single time. In the rare occurrence that a sync could not be met to 100%, o4 will fail loudly. Crash and burn. No more silent errors! In addition to that, o4 allows some dramatic improvements to CI.

5. Automatically verify, de-duplicate, and suggest payouts for vulnerability reports through HackerOne (AutoTriageBot).

This bot can automatically verify reports about XSS, SQLi, and Open Redirect vulnerabilities (via both GET and POST). In addition, it is built in a modular manner so that it can be easily expanded to add tests for other classes of vulnerabilities.

6. Run continuous integration from the command line for Salesforce Managed Package applications (CumulusCI).

Out of the box, CumulusCI provides a complete best practice development and release process based on the processes used by Salesforce.org to build and release managed packages to thousands of users. It offers a flexible and pluggable system for running tasks (single actions) and flows (sequences of tasks), and an OAuth-based org keychain that allows easy connection to Salesforce orgs and is stored in local files using AES encryption.


Still want more Python? Check out all of the Salesforce Open Source projects built in Python on GitHub.

WRITTEN BY

Laura Lindeman
Voracious reader & crafter of words. Organizer extraordinaire. #peoplegeek at Salesforce on the Tech & Products Innovation & Learning team.

Capital One - Lessons From Adopting Python as a Team

By Akshay Prabhu, Software Engineering Manager, Capital One

Rewriting Legacy ETL Jobs in Python When You’re Not Python Devs


So how does a team of six engineers - heavily experienced in web development with technologies like ReactJS, NodeJS, and Java - go about adopting Python into their work?

The application development and cloud computing technology landscape is always changing and an important part of our role as engineers is to stay up to date on those changes. Sometimes it is through solo work - such as learning a new framework or skill. But sometimes it is through team-based work - such as adopting and migrating a whole project to a new language. 

Like most other engineers, I’ve experienced this kind of team-based work multiple times in my career, and I recently went through it in my current role as a Software Engineering Manager at Capital One. In my role I am working on leading a team of engineers to develop highly scalable web applications, including both API and UI layers. As part of our journey to migrate applications to the cloud, we were also involved in rewriting a lot of legacy ETL jobs originally built using licensed tools. Most of my past engineering experience before these re-write efforts for ETL was around NodeJS, Java, and Ruby on Rails; and until a year ago I had not worked with Python. The same was true with most of the engineers on my team.

In fact, our entire team consisted of experienced engineers who have delivered multiple web and distributed applications in the cloud, but none had exposure to ETL or data-driven projects.

About one month into the rewrite efforts we were hitting limitations around using Java to migrate legacy system code. We wanted to be able to achieve simple File Operations, as well as complicated queries using Spark, but with a dynamically typed language and minimal bootstrap code. This was one of our reasons for considering whether it was time to switch languages.

Why Did We Decide On Python?


For our project, we faced the huge task of re-writing multiple jobs running on a legacy ETL platform. This involved enterprise API integrations, as well as complex data analysis and refinement.


Inflexibility of Existing Languages

Due to the nature of these jobs, none of the languages we were most experienced with was a great choice. They:
  • Were statically typed, like Java
  • Involved heavy bootstrap code
  • Lacked extensive support for data manipulation libraries such as Pandas
  • Lacked extensive external community support for Spark integrations or data analysis

Flexibility with Python: API Integration, Data Analysis, and Others

Python seemed to be a good choice for us as it was flexible enough to support a wide array of use cases. It also fit in well as it:
  • Was dynamically typed
  • Supported re-writing Bash-based or ETL jobs in fewer lines of code
  • Had well-supported tooling for REST interfaces
  • Had excellent support for data manipulation libraries such as Pandas and Spark

There were a few other areas that really cemented our use of Python - these were File Operations and Community Support.

File Operations

File Operations were key to our project as our process involved reading multiple source files in Parquet format, extracting, refining, and producing new files. We needed a language that could easily integrate with Spark on HDFS for complicated and larger datasets, as well as an equally powerful library, like Pandas, for smaller datasets. Hence, File Operations were at the center of our re-write.

Python worked well in both cases. It:
  • Was well suited for simple file manipulations using Pandas
  • Worked in complex scenarios using Spark queries on HDFS
  • Needed significantly fewer lines of code to accomplish this than Java or NodeJS
  • Could handle File Operations in memory when source files were a few MBs
  • Was simple enough to make API calls for various Enterprise Layers

Community Support

As we were assessing adopting a new language, Community Support both within Capital One and without was key for us. We wanted a language which was well supported by an active open source community that:
  • Constantly updates security enhancements
  • Resolves outstanding questions or issues
  • Actively merges new feature requests from engineers

Outside of Capital One
  • Python has a much more extensive community of engineers in the Data Analysis space as compared to Java
  • PySpark has much better support than Spark Integration with Java

Within Capital One
  • Capital One has a very active community of Python engineers and experts to help teams get started and maintain their Python projects
  • This internal community allowed us to seek guidance, as well as go through multiple code reviews

How We Got Started with Python

As with any new programming language, we started by defining our source of truth for standards. We went with PEP 8, the style guide for Python code.
We went through the key stages below, from Planning to Production.


Key to learning and adopting a new language was putting in time for foundational work: automating compliance with PEP 8 and adopting Python tooling like Black and Flake8.

Let’s go through some of the key elements to these stages.

Define Standards and Automating Adoption

As our team was new to Python, we spent the first few days defining standards on how we would code to comply with PEP 8. But in addition to adopting these standards, we needed to automate our workflow to comply with them.
  • We added a pre-commit hook for Black, which automatically formats code on local commits.
  • Black didn’t catch all violations, which is where Flake8 came in.
  • Flake8, installed as a pre-commit hook, stopped any commits that had outstanding PEP 8 compliance errors.

After automating our workflow to comply with standards and setting up a base repo, we started on the core dev work.

Logging

Logging was key given that Python is dynamically typed; it was our way to track problems more effectively.

  • As a team, we decided on a common logging format:

```
requestid    - machine_instanceid
timestamp    - YYYY-MM-DD HH:MM:SS,milliseconds
loglevel     - INFO, ERROR
modulename   - function_name
state        - START/END/INPROGRESS
type         - SCRIPT/EXTERNAL_API/etc.
modresponse  - Success/Error
duration     - Tracking External API calls
message      - custom message as needed
errormessage - err message
```
  • We leveraged the ELK stack (Elasticsearch, Logstash, and Kibana) for logging.
  • By using Kibana as the Web Application UI for our logs, we could see our execution details as well as trace down exceptions in Kibana.
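
One way such a format could be wired into Python's standard logging module is sketched below. This is an assumption about the shape of the setup, not the team's actual code: the " | " separator, the default values, and the names FieldDefaults and build_logger are hypothetical, and shipping the lines to the ELK stack is left out.

```python
import io
import logging

LOG_FORMAT = (
    "%(requestid)s | %(asctime)s | %(levelname)s | %(funcName)s | "
    "%(state)s | %(type)s | %(modresponse)s | %(duration)s | "
    "%(message)s | %(errormessage)s"
)

# Defaults for the custom fields so that callers only pass what they have.
DEFAULT_FIELDS = {
    "requestid": "-", "state": "INPROGRESS", "type": "SCRIPT",
    "modresponse": "-", "duration": "-", "errormessage": "-",
}


class FieldDefaults(logging.Filter):
    """Fill in any custom field that the caller did not supply via `extra`."""

    def filter(self, record):
        for key, value in DEFAULT_FIELDS.items():
            if not hasattr(record, key):
                setattr(record, key, value)
        return True


def build_logger(name, stream):
    """Create a logger that writes the common fields to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler.addFilter(FieldDefaults())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger("etl_demo", buffer)
log.info(
    "fetch accounts",
    extra={"requestid": "i-123", "state": "START", "type": "EXTERNAL_API"},
)
```

The default asctime rendering already has the YYYY-MM-DD HH:MM:SS,milliseconds shape listed in the format above, so no custom date format is needed.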

Critical takeaways
  • We adopted a simple library called requests to handle our API calls.
  • Spark vs. Pandas: When you perform operations on a dataframe in Spark, a new dataframe/reference is created, which is by design. This works well with large datasets but is a hindrance when the dataset is smaller. Hence, for filtered smaller datasets under 5MB, we decided to go with Pandas for quick dataframe manipulations.
  • Automating compliance with coding standards was a huge time saver as most of our team was new to Python.
  • We quickly realized that to test alongside all the dev work we needed a TDD approach, which is where pytest came into our workspace. This proved extremely helpful.

Was it Worth the Effort?

In addition to being the right choice of tool for our job, exploring and learning Python allowed the team to work more closely together and bond more than ever. 
We as a team learned something new together and solved multiple issues as we hit walls, which took our team bonding to great heights!

Given the nature of our project, Python did fit in very well. It’s dynamically typed and the libraries are feature-rich, working for both simple API calls as well as complex operations around data transformation and filtering. As engineers, we worked together as a team to “learn how to learn” and adopt a new language the right way. With the help of Capital One’s Python Gurus, we adopted all the best practices for developing a working production application and delivered this project before the committed deadline. I am lucky to be part of such an awesome team, as well as to work with Capital One Python experts like Steven Lott, which was critical for getting us across the finish line. 


I hope this has been helpful. I would love to learn what languages you and your team have adopted recently, especially in data-driven projects like this; let me know in the comments!


***
These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2020 Capital One.

Sunday, May 10, 2020

CPython on Mobile platforms - Python Language Summit 2020


"We've got very big news on Android," Russell Keith-Magee told the Language Summit. "We're close to having a full set of BeeWare tools that can run on Android."

The BeeWare project aims to let programmers write apps in Python for Android, iOS, and other platforms using native UI widgets. Keith-Magee reported that BeeWare has made good progress since his Summit presentation last year. On iOS, "Python worked well before, it works well now," and BeeWare has added Python 3.8 support. Until recently, however, Python was struggling to make inroads on Android. BeeWare's Android strategy was to compile Python to Java bytecode, but Android devices are now fast enough, and the Android kernel permissive enough, to run CPython itself. With funding from the PSF, BeeWare hired Asheesh Laroia to port CPython to Android.

A top concern for BeeWare is distribution size. Python applications for mobile each bundle their own copy of the Python runtime, so Python must be shrunk as small as possible. There have been proposals recently for a "minimum viable Python" or "kernel Python", which would ship without the standard library and let developers install the stdlib modules they need from PyPI. (Amber Brown's 2019 Summit talk inspired some of these proposals.) Keith-Magee said a kernel Python would solve many problems for mobile. He also asked for a cross-compiling pip that installs packages for a target platform, instead of the platform it's running on. Senthil Kumaran observed, "BeeWare, MicroPython, Embedded Python, Kivy all seem to have a need for a kernel-only Python," and suggested they combine forces to create one.



To regular Python programmers, the mobile environment is an alien planet. There are no subprocesses; sockets, pipes and signals all behave differently than on regular Unix; and many syscalls are prohibited. TLS certificate handling on Android is particularly quirky. For the CPython test suite to pass on mobile it must skip the numerous tests that use fork or spawn, or use signals, or any other APIs that are different or absent.
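
Skipping tests for APIs that a platform lacks can be expressed with unittest's skip decorators. The sketch below is only an illustration of that technique, not the BeeWare patches: it assumes a missing attribute is enough to detect an unsupported API, whereas a real mobile port may need an explicit platform check because a function can exist and still be prohibited.

```python
import os
import signal
import unittest

# Assumption of this sketch: an absent attribute means the API is unavailable.
HAS_FORK = hasattr(os, "fork")
HAS_SIGALRM = hasattr(signal, "SIGALRM")


class ProcessTests(unittest.TestCase):
    @unittest.skipUnless(HAS_FORK, "platform cannot fork")
    def test_fork_child_exits_cleanly(self):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave immediately without running cleanup
        _, status = os.waitpid(pid, 0)
        self.assertEqual(os.WEXITSTATUS(status), 0)

    @unittest.skipUnless(HAS_SIGALRM, "platform lacks SIGALRM")
    def test_sigalrm_is_a_signal(self):
        self.assertIsInstance(signal.SIGALRM, signal.Signals)

    def test_portable_arithmetic(self):
        self.assertEqual(sum(range(4)), 6)


def run_suite():
    """Run the tests; skipped ones are reported but do not fail the run."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProcessTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

On a platform without fork or SIGALRM the first two tests are reported as skipped and the run still succeeds, which is the behavior a mobile test suite needs.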

Adapting CPython for life on this alien planet requires changes throughout the code base. In 2015 Keith-Magee submitted a "monster patch" enabling iOS support for CPython, but the patch has languished in the years since. Now, he maintains a fork with the iOS patches applied to branches for Python 3.5 through 3.8. For Android, he maintains a handful of patch files and a list of unit tests to skip. Now that Android support is maturing, he said, "We're in a place where we can have a serious conversation about how we get these changes merged into CPython itself."

A prerequisite for merging these changes is mobile platform testing in CPython's continuous integration system. Currently, Keith-Magee tests on his laptop with several phones connected to it. As he told the Summit, he's certain there is a CI service with physical phones, but he has not found it yet and hasn't invested in building one. He develops BeeWare in his spare time, and CI is not the top priority. "Funding is one thing that makes stuff happen," he said. He thanked the PSF for the grant that made Android support possible. Mobile Python suffers a chicken-and-egg problem: there is no corporate funding for Python on mobile because Python doesn't support mobile, so there is no one relying on mobile Python who is motivated to fund it.

Keith-Magee asked the Summit attendees to be frank with him about bringing mobile Python into the core. He asked, "Do we want this at all?" If so, the core team would have to review all patches with their mobile impact in mind, as well as reviewing mobile-specific patches. "What is the appetite for patches that require non-mobile developers to care about mobile issues?" The decision would involve the whole core team and many community discussions. Guido van Rossum endorsed good mobile support long-term. So did Ned Deily, adding, "To actually do it will require money and people. Bigger than many other projects."

Thursday, May 07, 2020

Core Workflow updates - Python Language Summit 2020



The PyCon 2015 sprint was the first time this blogger contributed to Python—or rather, I tried to. The three patches I submitted that year are awaiting a review to this day. In recent years, however, the core team has made bold changes to their development workflow to make their tasks easier, to spread responsibility more widely, and to improve the experience of contributors. When I submitted a patch in the 2018 sprints it was reviewed and merged in a few weeks. Mariatta Wijaya has led many of these changes. She presented the latest updates to the core workflow.

"This is the third Language Summit in which I've talked about GitHub issues," sighed Wijaya. Last year she urged the core team to make a decision about replacing the quirky old bug tracker on bugs.python.org. Two weeks later they approved PEP 581: Using GitHub Issues. Its sequel, PEP 588: GitHub Issues Migration plan, is still in progress. "I really care about this topic," said Wijaya, but she and her collaborators have mostly focused on other projects for the last year.

There are signs of progress, however. Wijaya began a wish list where Python developers can request features for GitHub's issue tracker that will make it a full replacement for bugs.python.org. The PSF is seeking to hire a project manager for the migration. Wijaya, Brett Cannon, and others have met with GitHub staff, who have offered to import old issues in bulk. The migration will be "like throwing a big switch," said Cannon; there must be a great deal of planning behind the scenes before it is thrown.

In 2018, the Python developers created a web forum (running on Discourse) as a potential replacement for their several mailing lists. Since then, they have contended with both mailing lists and the web forum. Wijaya asked, "We've been using Discourse for a couple of years, is it time to make some final decisions?" Guido van Rossum said that using Discourse via email is clunky, but if he silenced its emails he would forget to check it on the web. Brett Cannon hypothesized that people who have tuned their personal systems for managing email prefer to use only email, and those who have not, prefer Discourse. The Summit attendees reached no conclusion.

A new team of Python contributors formed last summer: the Python Triage Team. Triagers can close issues, edit issue labels, etc., and they can label or close pull requests in the Python organization's GitHub repositories. Unlike core developers, they cannot merge pull requests. Their power to manage issues extends to all the core developers' repositories, including CPython, the developer guide, and the source code for Python's GitHub bots. New members join the team when they are invited by core developers; they can self-nominate by asking a core dev to vouch for them. Wijaya said several triagers have graduated to the core team, and she wants this pipeline to keep flowing. New core developer Kyle Stanley recommended better promotion of the triager-to-core-dev career path. "For myself and several recent candidates, it seems to have worked excellently as a stepping stone."

Tuesday, May 05, 2020

Python Developers Survey 2019 Results

We are excited to share the results of the third official Python Developers Survey conducted by the Python Software Foundation with the help of JetBrains.
More than 24,000 Python users from over 150 countries took part in the survey this past November. With the help of the data collected, we are able to present the summarized results, identify the latest trends, and create a Python developer profile.
The survey results cover a broad list of topics. Some of the key things you may learn from the report include: main motivations for Python usage, current popular frameworks, libraries, tools and languages, and many other insights.
In all likelihood, there are plenty of potential findings that were not included in the report. If you have specific questions that are unanswered, send them to us and we’ll dig into the data for additional analysis. You also have the opportunity to delve deeper into the raw survey data and uncover your own amazing facts.
We’ll be delighted to learn about your findings! Please share them on Twitter or other social media, mentioning @jetbrains and @ThePSF with the #pythondevsurvey hashtag. We’re also very open to any suggestions and feedback related to this survey which could help us run an even better one next time. Feel free to open issues here with any comments or questions.
Many thanks to everyone who participated in the survey and helped us map an accurate landscape of the Python community!

Monday, May 04, 2020

Python’s migration to GitHub - Request for Project Manager Resumes

The Python Software Foundation is looking for a Project Manager to assist with CPython’s migration from bugs.python.org to GitHub for issue tracking. CPython's development partially moved to GitHub in February 2017. All other projects within the PSF's organization are hosted on GitHub and are using GitHub issues. CPython is still using Roundup as the issue tracker on https://bugs.python.org (also known as “bpo”). To read more about the rationale behind this migration, read PEP 581.
Thank you to GitHub for donating financial support so this project can begin.

Timeline

  • May 4 - Request for resumes opens
  • June 4 - Request for resumes closes
  • June 12 - Final decision will be made on proposals received
  • June 22 - Work will begin

Submitting a proposal

Resumes should be submitted as Portable Document Format (PDF) files via email to Ewa Jodlowska.


Role description

Goal

Support Python through the full transition from bugs.python.org to GitHub with the assistance of GitHub’s migration team.

Tasks

These tasks are from PEP 588 and a meeting the Steering Council had with GitHub. This is not an exhaustive list of all tasks, but an overview of what the work will most likely entail. The migration will be completed by GitHub and the Project Manager will work with the Python team to steer that migration. 
  • Create a timeline for the project with the Python team and GitHub team
  • Find out from the community the context behind GitHub's search limitations and why bugs.python.org search is sometimes preferred.
  • Research the Contributor License Agreement (CLA) process and how it can be achieved outside of bugs.python.org. Work with interested contractors, volunteers, and the PSF’s Director of Infrastructure on a solution.
  • Work with GitHub’s migration team and Python’s community on how mapping of fields should work from bugs.python.org to GitHub
  • Work with GitHub’s migration team on the transition from bugs.python.org and be the Python point of contact for GitHub. This includes helping field questions from GitHub to the Steering Council/core devs and vice versa.
  • Assist the Python community with creating guidelines on how people are promoted to Python’s triage team.
  • Obtain from GitHub a list of projects that have built bots that may help Python with "nosy lists"
  • Oversee the creation of the new workflow on GitHub
  • Assist with the creation of GitHub labels and templates when necessary 
  • Oversee the creation of the sandbox issue tracker on GitHub to experiment and test the new workflow
  • Ensure that the sandbox receives adequate testing from the Python team
  • Update the devguide with that new process ahead of the migration and communicate it to the core developers
  • Communicate with PSF staff on a regular basis when necessary and provide monthly reports via email.
Note: Some of the work (for example the CLA process) may require additional hired help from outside contractors. Decisions on these tasks will be made after the Project Manager’s research and review are presented to the PSF staff and Steering Council.

Estimated budget

The budget is capped at $30,000 for this project.

Necessary Skills

  • Excellent time management skills
  • Must be very organized, punctual, and detail-oriented
  • Experience working with volunteers 
  • Excellent written and verbal communication
  • Experience working with software development teams (remotely is a plus)
  • Ability to balance demand and prioritize
  • Experience working with GitHub 
  • Experience with GitHub APIs is a plus
  • Experience working with Roundup is a plus

Questions?

Contact Ewa Jodlowska, PSF’s Executive Director.

Sunday, May 03, 2020

Property-Based Testing for Python builtins and the standard library - Python Language Summit 2020


Zac Hatfield-Dodds opened his presentation with a paraphrase of the economist Thomas Schelling:
No matter how rigorous her analysis or heroic his imagination, no person can write a test case that would never occur to them.
Hatfield-Dodds told the Language Summit that handwritten tests are "fantastic for testing particular edge cases, they're great regression tests," but they're limited by the developer's understanding of the system under test. "We can't write tests for bugs we don't know could occur." We can overcome this limit with exhaustive testing, checking our code's behavior with every possible input; if that is impractical, coverage-guided fuzz testing can generate random inputs and evolve them, trying to explore every branch in the code under test. Fuzzers are very good at finding inputs that crash a program, but they're not as well suited for finding other kinds of bugs.

Read more 2020 Python Language Summit coverage.

For testing the Python standard library, Hatfield-Dodds proposed a different technique: property-based testing. (He is one of the leaders of the Hypothesis property-based testing project.) A property-based test framework doesn't generate totally random input like a fuzzer; it can generate structured inputs such as lists of numbers, or only sorted lists, or instances of a certain object. Unlike handwritten tests, which usually assert that a particular input produces one exact output, property-based tests assert properties of a function, for example that its output is sorted, or that a function is idempotent or commutative.
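To make the idea of "asserting properties" concrete, here is a minimal sketch that uses only the standard library's random module rather than Hypothesis. The helper names (check_property, random_int_list, and the three property functions) are invented for this illustration, and the approach is a plain random search, without the input evolution or shrinking a real framework provides.

```python
import random


def check_property(prop, make_input, runs=200, seed=0):
    """Call prop on many generated inputs; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = make_input(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    """Structured input: a list of 0 to 10 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def output_is_sorted(xs):
    out = sorted(xs)
    return all(a <= b for a, b in zip(out, out[1:]))


def sorting_is_idempotent(xs):
    return sorted(sorted(xs)) == sorted(xs)


def reversing_is_identity(xs):
    # Deliberately false property, included to show a counterexample being reported.
    return list(reversed(xs)) == xs


sorted_failure = check_property(output_is_sorted, random_int_list)
idempotent_failure = check_property(sorting_is_idempotent, random_int_list)
counterexample = check_property(reversing_is_identity, random_int_list)
```

The first two properties hold for every generated list, so no failing input is returned; the third is false on purpose, so the search returns a list that is not its own reverse.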

Hatfield-Dodds presented the following Hypothesis test of a JSON codec:
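import json

from hypothesis import assume, given, strategies as st

@given(
    value=st.recursive(
        st.none() | st.booleans() | st.floats() | st.text(),
        lambda x: st.lists(x) | st.dictionaries(st.text(), x),
    )
)
def test_record_json_roundtrip(value):
    # Discard inputs that are not equal to themselves (such as nan).
    assume(value == value)
    assert value == json.loads(json.dumps(value))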
The recursive input generator can create None, booleans, floats, text, or lists or dictionaries that contain such values, and so on recursively. Within the test function, the assume statement checks that the input is equal to itself, to discard inputs such as nan, which is not equal to itself. The heart of the test is the assert statement.

(The above example still has trouble with nan; see Hatfield-Dodds' PyCon Australia talk.)
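One way the nan trouble can arise is sketched below. It relies on standard CPython comparison behaviour and is not taken from the talk itself: a list compares equal to itself even when it contains nan, because element comparison checks identity first, so a check like value == value does not filter out a nan nested inside a container.

```python
import json

nan = float("nan")
value = [nan]

# A bare nan is not equal to itself, so a check like value == value rejects it.
bare_equals_itself = (nan == nan)

# A list holding nan does compare equal to itself: the elements are the same
# object, and list comparison treats identical objects as equal.
nested_equals_itself = (value == value)

# After a JSON round trip the list holds a different nan object, so the
# element comparison falls back to ==, which is False for nan.
roundtripped = json.loads(json.dumps(value))
roundtrip_equal = (value == roundtripped)
```

Here the nested value passes the self-equality check but the round-trip comparison is still False, even though the codec has done nothing wrong.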

Hypothesis searches for bugs by randomizing the input, or trying interesting values that tend to trigger edge cases, or retrying inputs that triggered bugs in previous runs. When Hypothesis finds a bug, it evolves the input, searching for the simplest input that reproduces the same bug.

"I want you all to write property-based tests for CPython, for builtins, for PyPy, for everything," said Hatfield-Dodds. He proposed to write new tests, or port existing ones to a property-based test framework, run them in CPython's continuous integration suite, and share them among the Python implementations. These tests could use Hypothesis; they could also be integrated with the AFL fuzzer or used in Google's OSS-Fuzz project. He presented a repository of tests demonstrating the technique for standard library modules such as gzip, re, and datetime. There is even a test that can generate random, valid Python code to fuzz-test the Python parser.

Łukasz Langa mentioned that David MacIver had used Hypothesis to test a Python code formatter and found dozens of bugs.

Paul Ganssle told the Summit that he used property-based testing for his implementation of datetime.fromisoformat. When the function was merged into the standard library the property-based tests were not. In subsequent development Ganssle introduced a segfault bug that "almost certainly would have been caught" if the original tests had still been running. He strongly endorsed Hatfield-Dodds's idea. He added that property-based testing is especially good at checking that two implementations of a module, one written in Python and one in C, are equivalent.

Friday, May 01, 2020

Should All Strings Become f-strings? - Python Language Summit 2020


The first language change proposed this year was the most radical: to make f-strings the default. Eric V. Smith, who wrote the PEP for f-strings in 2015, said they are the killer feature of Python 3.6, and they are the motivation for many of his clients to move to Python 3. However, they are error-prone. It's common to forget the "f" prefix:
x = 1
# Forgot the f prefix.
print("The value is {x}")
Smith has observed programmers f-prefixing all strings, whether they include substitutions or not, just to avoid this mistake.
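A short side-by-side sketch of the mistake; the variable names forgotten and intended are chosen here for illustration. Without the prefix, Python raises no error and simply keeps the braces as literal text, which is why the slip is easy to miss.

```python
x = 1

# Without the prefix the braces are ordinary characters; nothing is substituted.
forgotten = "The value is {x}"

# With the prefix the expression inside the braces is evaluated.
intended = f"The value is {x}"
```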

Read more 2020 Python Language Summit coverage.


When f-strings were added in 3.6, it was suggested to make them the default, but this was too big a breaking change. Besides, replacing all literal brace characters with double braces would be ugly:
"A single pair of braces: {{}}"
In this year's Summit, Smith proposed again to make f-strings the default. The following kinds of strings would become f-strings:
  • "string" — an f-string
  • f"string" — still an f-string
  • r"string" — a raw f-string
Bytes literals like b"string" would not become f-strings. Smith would add a new "p" string prefix for plain strings, which would behave like ordinary strings today.
  • p"string" — a plain string
Performance would not be affected: the runtime behavior of a string without any substitutions would be the same as today. Plain strings would still have some uses; for example, regular expressions that include braces, or as the input to str.format. In Smith's opinion, f-strings have superseded str.format, but several in the audience objected that str.format with a plain string allows for late binding, and f-strings don't obviate str.format_map.
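The late-binding objection can be illustrated with a short sketch in today's Python. The names TEMPLATE, render, and KeepMissing are invented for this example. A plain string can be defined once and filled in later, possibly many times, whereas an f-string is evaluated at the point where it appears.

```python
# A plain string used as a template: no values are bound when it is defined.
TEMPLATE = "Hello {name}, you have {count} new messages"


def render(**values):
    """Bind values late, each time the template is used."""
    return TEMPLATE.format(**values)


class KeepMissing(dict):
    """Mapping that leaves unknown fields in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


first = render(name="Ada", count=3)
second = render(name="Grace", count=0)

# str.format_map accepts any mapping, so a partial fill is possible.
partial = TEMPLATE.format_map(KeepMissing(name="Ada"))
```

Under the proposal as described, a template like this would presumably need the new "p" prefix.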

Smith acknowledged some problems with his idea. It would introduce yet another string prefix. Flipping the master switch to enable f-mode would break some code, so there must be a way to gradually enable the change module by module, perhaps like:
from __future__ import all_fstrings
He was concerned the change was so drastic that the Python core developers would never have the nerve to enable it without requiring a future import. If so, the idea should be abandoned right away.

Yarko Tymciurak asked via chat: "How do you describe to beginners what p'why is this needed' is?" Smith conceded that p-strings make the language more complicated, but, he said, "There's going to be very few p's in the wild, and I think their explanation will be fairly obvious."

Several attendees were enthusiastic about making the change. Brett Cannon said that removing the need for the f-prefix would make the language easier for beginners.

Larry Hastings pointed out that PHP strings are format strings by default and "the script kids love it." However, he wrote, "It seems to me this is solving the problem of 'oh I forgot to put an f in front of my string', and not noticing until it's too late. Is that problem bad enough that we have to change the language?"
Many agreed that f-strings by default would have been a good idea if Python were beginning from scratch; however, Paul Moore, Guido van Rossum, and others feared the disruption would outweigh the benefits. The group concluded that Smith should send his PEP to the mailing list for further debate.

Thursday, April 30, 2020

Lightning Talks Part 1 - Python Language Summit 2020


Sumana Harihareswara

What do you need from pip, PyPI, and packaging?


Python packaging has seen relatively quick development in recent years as a result of increased funding; most famously the new PyPI.org website was launched in 2018. The current work in progress includes malware detection and signed packages on PyPI, a new dependency resolver for pip, and a revamp of virtualenv. Much of this work is funded by grants from companies. (Details on the Working Group page.) Sumana Harihareswara from the Packaging Working Group is a prolific grant proposal writer; she presented ideas for further development.

Read more 2020 Python Language Summit coverage.

Python packaging ideas for the future include:

Harihareswara solicited packaging ideas from the audience to inform the Python Packaging Authority roadmap and the Fundable Packaging Improvements page, asked them to add their complaints to the packaging problems list, and requested help writing grant proposals.

Since Harihareswara had listed a revamp of virtualenv among the works in progress, Barry Warsaw wondered what advantages virtualenv has over venv, which is now built in to all supported Python versions. Bernat Gabor, who maintains virtualenv, answered that virtualenv is faster, provides a richer API for tools built on it, and serves as a laboratory for ideas that might be merged into venv.

Ernest W. Durbin III provided a status update on malware checking: the framework is in place but only two checks have been implemented, "mainly for demonstration." He has invited security researchers to implement more checks.

David Mertz asked whether pip's new dependency resolver would be able to resolve dependency conflicts. Paul Moore said he is still researching what users want pip to do in the case of conflicts, and what solutions are provided by resolver algorithms. The new resolver is about to be released, but it is still alpha-level and will be turned off by default.

Eric Snow

A Retrospective on My "Multi-Core Python" Project


Of all the schemes for freeing CPython from the Global Interpreter Lock, the frontrunner is Eric Snow's plan to give each subinterpreter its own lock. He proposed the idea in 2015, began discussing and prototyping the idea intensely, and burned out the next year. "I was trying to do too much on my own," said Snow. In 2017 he resumed development, this time with dozens of collaborators, and wrote PEP 554 to expose subinterpreters to pure Python programs, which will ease testing for the multi-core project. He presented his progress to the Language Summit in 2018 and in a 2019 PyCon talk. His TalkPython interview last year was especially effective at drawing attention to the project.

Snow's immediate blocker is PEP 554's acceptance and implementation, but much work remains after that. He told the 2020 Language Summit, "I've just been chugging along, little by little. Lots of little pieces to get this project done!" Hard problems include passing data safely between subinterpreters, the "grind" of removing all the global variables, and reaching the actual goal: creating a distinct GIL per subinterpreter. Snow predicts the split GIL won't land in Python 3.9, but "3.10 for sure."

Snow thanked a large list of contributors, many of them outside the core developer team.

Kyle Stanley asked whether daemon threads should still be allowed in subinterpreters or not. (Victor Stinner tried to ban them but had to revert his change.) Snow replied that daemon threads in subinterpreters lead to many finalization problems, and their use should be discouraged, but removing them entirely has proven too disruptive for the core team to accomplish any time soon.

The 2020 Python Language Summit


The Python Language Summit is a small gathering of Python language implementers (both the core developers of CPython and alternative Pythons), as well as third-party library authors and other Python community members. The Summit features short presentations followed by group discussions. In 2020, the Summit was held over two days by videoconference; questions were asked by a combination of voice and chat. It was led by Łukasz Langa and Mariatta Wijaya.

Thanks to MongoDB for sponsoring the Python Language Summit.



Day 1

Should All Strings Become f-strings?
Eric V. Smith

Replacing CPython’s Parser with a PEG-based parser
Pablo Galindo, Lysandros Nikolaou, Guido van Rossum

A Formal Specification for the (C)Python Virtual Machine
Mark Shannon

HPy: a Future-Proof Way of Extending Python?
Antonio Cuni

CPython Documentation: The Next 5 Years
Carol Willing, Ned Batchelder



Day 2


Lightning talks round 1
Sumana Harihareswara, Eric Snow

The Path Forward for Typing
Guido van Rossum

Property-Based Testing for Python Builtins and the Standard Library
Zac Hatfield-Dodds

Core Workflow Updates
Mariatta Wijaya

CPython on Mobile Platforms
Russell Keith-Magee

Lightning talks round 2
Zac Hatfield-Dodds, Jim Baker, Eric Holscher, Mariatta Wijaya



Image: Natal Rock Python

The path forward for typing - Python Language Summit 2020

"There are a lot of PEPs about typing!" said Guido van Rossum at the Language Summit. Since 2014 there have been ten PEPs approved for Python's type-checking features. Two of them have been approved already this year: the relatively "esoteric" PEP 613: Explicit Type Aliases, and another that will have widespread impact, PEP 585: Type Hinting Generics In Standard Collections, written by Łukasz Langa and mainly implemented by Van Rossum. Thanks to this PEP, types which had been defined like List[int] can now be spelled list[int], with a lowercase "L". As Van Rossum told the Python Language Summit, "We want to avoid a world where users have to remember, 'Here I have to use a capital-L List and here I use a lowercase-L list.'"

Read more 2020 Python Language Summit coverage.


A "generic" is a type that can be parameterized with other types. Generics are usually container types. Since Python 3.5, the typing module has provided "type aliases" like List, which can be parametrized with the type of values it contains, like List[str] in this type-annotated function definition:

from typing import List
def greet_all(names: List[str]) -> None:
    for name in names:
        print("Hello", name)

Van Rossum showed the Summit the following code, demonstrating that the ordinary built-in list and dict classes can now be used as generics for type annotations:

>>> p = list[int]
>>> p
list[int]
>>> p.__origin__
<class 'list'>
>>> p.__args__
(<class 'int'>,)
>>> p((1, 2, 3))
[1, 2, 3]
>>> from typing import TypeVar; T = TypeVar("T")
>>> dict[str, T][int]
dict[str, int]

The syntax list[int] is enabled by implementing __class_getitem__ on list. The built-in containers such as tuple, dict, list and set are supported, along with some standard library containers and abstract base classes, including collections.deque, collections.abc.Iterable, queue.Queue, and re.Pattern. The effect for everyday coders is mainly a matter of spelling, yet as Van Rossum said, "It's probably going to affect everyone's code, or everyone will encounter code like this." Fewer users will have to import type aliases such as List from the typing module; it will be required only for advanced annotations. Van Rossum asked the Summit, "How much of this do we want to make built in?"

Python's approach to type-checking is to add type annotations in the source code, but to check types neither during compilation nor at runtime. Instead, programmers use a separate type-checker (such as mypy or PyCharm). The new PEP 585 type annotations are the same: they do no checking at all, so "nonsense" annotations like list[str, str] are permitted. It is the type checker's job to reject them.
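As a rough sketch of the mechanism, on Python 3.9 or later a user-defined class can opt in to the same subscript syntax by defining __class_getitem__. The Box class below is hypothetical and exists only for this illustration; types.GenericAlias is the standard-library type behind aliases such as list[int].

```python
import types


class Box:
    """Hypothetical container class, used only for this illustration."""

    def __init__(self, item):
        self.item = item

    # Subscripting the class (Box[int]) returns a generic alias object.
    __class_getitem__ = classmethod(types.GenericAlias)


alias = Box[int]

# Calling the alias builds an ordinary Box; no type checking happens at runtime.
box = alias("not an int")

# A "nonsense" parameterization is accepted at runtime as well.
nonsense = list[str, str]
```

Nothing here complains about the string passed to Box[int] or about list[str, str]; only a separate type checker would flag them.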

Annotations are not completely free at runtime, however: by default an annotation like List[int] is evaluated to create a type object when it is encountered, usually at module-load time. This can noticeably hurt startup times for big type-annotated programs. PEP 563 Postponed Evaluation of Annotations was introduced in Python 3.7 to solve this problem: type annotations are saved as strings, and evaluated only when a type checker such as mypy requests it. This optimization is currently guarded behind from __future__ import annotations. Van Rossum asked whether postponed evaluation should become the default in Python 3.9, which will be released imminently, or 3.10.
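The difference can be observed with a small sketch. To keep it self-contained it compiles the same source twice, once normally and once with the compiler flag that from __future__ import annotations sets; the SOURCE string and the helper annotations_of_greet_all are assumptions made for this example.

```python
import __future__

SOURCE = """
from typing import List

def greet_all(names: List[str]) -> None:
    for name in names:
        print("Hello", name)
"""


def annotations_of_greet_all(flags=0):
    """Compile SOURCE with the given compiler flags and return greet_all's annotations."""
    namespace = {}
    code = compile(SOURCE, "<sketch>", "exec", flags=flags, dont_inherit=True)
    exec(code, namespace)
    return namespace["greet_all"].__annotations__


# Default behaviour: the annotations are real objects.
eager = annotations_of_greet_all()

# With the PEP 563 flag, the annotations are stored as strings instead.
postponed = annotations_of_greet_all(__future__.annotations.compiler_flag)
```

With the flag set, no typing objects are built for the annotations; a tool that needs them can resolve the strings later, for example with typing.get_type_hints.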

Also in Python 3.10 will be PEP 604, which permits the current Union[t1, t2] annotation to be spelled as t1 | t2, using the vertical bar to express a union of types. The PEP's scope might expand to add syntax that even programs without type annotations would enjoy. For example, isinstance(x, (t1, t2)) could be written isinstance(x, t1 | t2), and an exception handler could be written like except t1 | t2.

Yury Selivanov noted that typing.Optional[t] could be replaced with t | None, and asked whether it could be shortened further as t?. "Every year," replied Van Rossum, "there's another feature that people want to use the question mark for." In his opinion, t | None is convenient enough, and another syntax would be redundant. (Although the new PEG parser would make it easy to implement.)

Stéphane Wirtel asked if Python would ever have exception annotations. "Ouch!" said Van Rossum. The consensus is that Java's checked exceptions were a bad idea, and would probably be bad in Python too. "I don't think I have the stomach for that."

The standard library and most PyPI packages have no type annotations. Type-hinted "package stubs" for this code are hosted in the typeshed repository, but storing all those stubs in a monolithic distribution doesn't scale, and the problem will grow worse. In a GitHub issue thread, Jukka Lehtosalo predicted that in two years, stubs for third-party packages will outnumber those for the standard library, and in five years, typeshed will include more than 1000 third-party packages. As Van Rossum told the Language Summit, Lehtosalo's proposal will split typeshed into separate distributions so users can easily download just the stubs they need, consistent with PEP 561.

Brett Cannon asked whether the standard library's annotations should be shipped with Python, either as stub files or in the code itself. Van Rossum said new stdlib code should be written with annotations inline, but old code includes optimizations and strange legacy behaviors that defy static typing. Currently mypy does not analyze standard library code because "it assumes that the standard library is full of untyped shit"; it looks in typeshed instead. If indigenous type annotations grew in the standard library, the core team would have to coordinate with type checker authors to manage the change.

Van Rossum offered an update on mypy. He admitted he hadn't been active on mypy recently, and "my former colleagues at Dropbox have not been able to make as much progress as we did in the past." Support for NumPy is stalled. The same goes for decorators, although once PEP 612 is approved it will provide a prerequisite for decorator support in mypy. Raymond Hettinger asked if mypy development needs funding. Michael Sullivan, a mypy contributor from Dropbox, replied that Dropbox considers mypy mostly complete, and has moved on to projects like their Python 3 migration. Van Rossum said funding could help. Personally he has "moved on to retirement." The Python static typing mailing list is quieter than Van Rossum would like; interested people should join.

There's better news about mypyc, an experimental project to translate type-annotated Python into C. The translator's main use for now is converting mypy to C for speed. There is work in progress to allow a mix of Python and Python-translated-to-C in the same program, and to write documentation. The mypyc project expects a Google Summer of Code student this summer.

CPython Documentation: The Next 5 Years - Python Language Summit 2020


"Documentation is the way we communicate with each other," said Willing. "Historically, we've done a great job with documentation." But the environment is changing: Python's BDFL has retired, and Python's user base is expanding, becoming more global, and moving away (to some degree) from lower-level programming to higher-level applications. These changes impose new documentation burdens on the small core team. Willing said, "We don't scale well."

Read more 2020 Python Language Summit coverage.


Willing and Ned Batchelder proposed a new Python Steering Council workgroup called the "Documentation Editorial Board". Its members would include core developers and community members; they would write style guides, manage translations into non-English languages, and create a landing page that guides different kinds of users to the documentation that suits them. (Daniele Procida had shared earlier a guide to writing docs for a variety of users' needs.) The core team need not write all the docs themselves—they should be owned and written by the community, along with the core team, overseen by the new Editorial Board.

In the Editorial Board's first year it would focus on governance, translations, the new landing page, and tutorials. Willing was inspired by the core team's overhaul of the asyncio docs; they added tutorials and split the high-level information from the low-level. The rest of the standard library would serve users better with more tutorials like asyncio's. Style guides would ensure consistency and best practices. As Ned Batchelder pointed out, Python has two PEPs for code style (one for C and one for Python), but none for documentation.

In its second year, the Board would measure its effectiveness so far, and begin running documentation sprints. Willing recommends the Board begin annual editorial reviews, seeking patterns of user confusion: "When users ask questions on mailing lists and the bug tracker, it means something's not clear to them." Updating the documentation to fix common misunderstandings would save time in the long run for users and the core team.

Batchelder observed that "twenty-five years ago, our main audience seemed to be refugees from C," but most readers of the Python docs today are not career software developers at all; they need different docs.

Raymond Hettinger asked, "Any thoughts on why no one stepped up to write any docs for the walrus operator? I'm not seeing people volunteering for major documentation efforts. Mostly the contributions are minor, micro edits." Willing replied that the walrus operator specifically was a "hot potato" that deterred volunteers. In general, the Python core team doesn't encourage others to lead big documentation projects; community members don't have a sense of ownership over the docs, nor the authority to merge their changes, so skilled writers take their efforts elsewhere. The proposed new Editorial Board would help to change that.

Sumana Harihareswara asked how documentation work would be funded, and whether professional technical writers might be involved. Willing replied that the PSF will fund some work, but she emphasized recruiting volunteers from the community. Several in the audience asked about making a "core documenter" role analogous to "core developer"; Batchelder replied that fine-grained roles and permissions in open source projects are counterproductive. People who write excellent documentation should simply be promoted to core developers.

HPy: a future-proof way of extending Python? - Python Language Summit 2020


Antonio Cuni presented HPy (pronounced "aitch pi"), an attempt at a replacement C API that is compatible and performant across several interpreter implementations. The idea was born at EuroPython last year from a discussion among CPython, PyPy, and Cython developers.

Read more 2020 Python Language Summit coverage.

CPython's API for C extensions is tightly coupled with the interpreter's internals. Another interpreter such as Jython, if it wants to support the same C extensions, must emulate these internals, pretending to the C extension that the interpreter works the same as CPython. Even CPython suffers: any of its internals that are exposed in the C API can't be improved without breaking compatibility. Python objects in the C API are pointers to CPython's PyObject structs, whose internal layout is partly exposed to extensions. Extensions expect each PyObject pointer to be constant for the object's lifetime, which prevents a memory manager like PyPy's from moving objects during garbage collection.

Most prominently, the C API requires extensions to control objects' lifetimes by incrementing and decrementing their reference counts. Any Python implementation that does not have a reference-counting memory manager, such as PyPy, must emulate refcounts for the sake of the C API. Cuni calls this a "massive amount of precious developer hours wasted," an impediment to performance, and the main obstacle for Larry Hastings's GILectomy.

Victor Stinner has already outlined the design for a better C API that hides some internals, but still depends on reference-counting; Cuni confronts the same questions and gives a more radical answer.

HPy is a new C API that is interpreter-agnostic. Instead of PyObject pointers, HPy presents handles (hence the "H" in its name). Where a C API user would incref a PyObject when copying a reference to it, an HPy user would duplicate a handle. Each handle is distinct and must be closed independently. Cuni showed this code example:
/* C API */
PyObject *a = PyLong_FromLong(42);
PyObject *b = a;
Py_INCREF(b);
Py_DECREF(a);
Py_DECREF(a); // Ok
/* HPy */
HPy a = HPyLong_FromLong(ctx, 42);
HPy b = HPy_Dup(ctx, a);
HPy_Close(a);
HPy_Close(a); // WRONG!
Handles are HPy's basic departure from the C API. The independence of handles' lifetimes, said Cuni, decouples HPy from CPython's ref-counting memory manager, and makes HPy a natural, fast C interface for other Pythons. PyPy, for example, will maintain a map of handles to objects; when its garbage collector moves objects in memory, it only needs to update the map. Handles permit precise debugging: if a handle is leaked, HPy prints the line number where it was created. (The HPy context parameter ctx that is passed everywhere allows for subinterpreters, and perhaps other features in the future.)
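The handle discipline can be modelled in a few lines of Python. This is a toy model only, not HPy's implementation or API: the HandleTable class and its methods are invented here to show why each handle must be closed exactly once and how a handle-to-object map makes leaks and double closes detectable.

```python
class HandleTable:
    """Toy model of a handle-to-object map (hypothetical, not part of HPy)."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 1

    def new(self, obj):
        """Open a fresh handle that refers to obj."""
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def deref(self, handle):
        """Return the object behind an open handle."""
        if handle not in self._objects:
            raise ValueError(f"handle {handle} is closed or invalid")
        return self._objects[handle]

    def dup(self, handle):
        """Open a second, independent handle to the same object."""
        return self.new(self.deref(handle))

    def close(self, handle):
        """Close a handle; closing the same handle twice is an error."""
        if handle not in self._objects:
            raise ValueError(f"handle {handle} is closed or invalid")
        del self._objects[handle]

    def open_handles(self):
        """Handles that are still open, for example to report leaks."""
        return sorted(self._objects)


ctx = HandleTable()
a = ctx.new(42)
b = ctx.dup(a)      # distinct handle, same object
ctx.close(a)
try:
    ctx.close(a)    # the mistake marked WRONG in the HPy snippet above
    double_close_detected = False
except ValueError:
    double_close_detected = True

value_via_b = ctx.deref(b)    # b remains valid after a is closed
leaked = ctx.open_handles()   # b was never closed, so it shows up here
```

Because every access goes through the table, an implementation with a moving garbage collector would only need to update the table's entries when an object moves, which is the property Cuni described for PyPy.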

Brett Cannon asked whether HPy will be a minimalist API for extension development, or if it will include specialized APIs for speed. For example, the current C API has a generic PyObject_GetItem, and a fast PyDict_GetItem specialized for dicts. Cuni said he prefers a smaller API, but benchmarks would guide him.

Cannon asked whether a tool could semi-automatically port C code from the C API to HPy. It could not, according to Cuni, because the problem of closing each handle exactly once must be solved carefully by a human. HPy's debug messages will be a great help. "In theory," Cuni said, "it should be as easy as adding an 'H' to all C API calls, renaming Py_INCREF to HPy_Dup, putting HPy_Close here and there, and then see if the debug mode is happy or complains."

Victor Stinner asked whether his draft proposal to incrementally modify the C API to hide internals would eventually solve PyPy's problems with C extensions. Cuni replied, "It's not enough for PyPy because of reference counting and the fact that PyObject* is not a good representation for objects that can move in memory." But he acknowledged that Stinner's proposal goes in the right direction.

Cuni said the "HPy strategy to conquer the world" is to create a zero-overhead façade that maps HPy to the C API (using compile-time macros), then port third-party C extensions to pure HPy, one function at a time. It must be faster on alternative implementations than their existing C API emulations; early benchmarks show a 3x speedup on PyPy and 2x on GraalPython, a JVM-based Python.

HPy is currently missing type objects, but Cuni said it "basically works." An HPy extension can be compiled to the CPython ABI or to an "HPy universal ABI" that allows the same compiled extension to work with multiple interpreters. In the future, a new Cython backend will target HPy instead of the C API. Cuni and his collaborators have ported ujson to HPy; they plan to port a subset of NumPy next, and eventually to write a PEP and merge HPy into the official CPython distribution, where it will live alongside the existing C API. Cuni hopes the core developers will endorse HPy for third-party C extension development; in a "hypothetical sci-fi future" CPython might port its standard library C modules to HPy.