Thursday, May 14, 2020

Lightning Talks Part 2 - Python Language Summit 2020

Zac Hatfield-Dodds

Teaching Python with Errors

When a new coder begins learning Python, the first Python feature they usually see is SyntaxError. In Zac Hatfield-Dodds's experience, novices meet these errors practically as soon as they start typing, and they will spend most of their time over the following months struggling with them. Since experienced programmers rarely encounter syntax errors and easily fix them, the core team has not built very good tooling for them, and the official Python tutorial doesn't cover errors until Section 8. In any case, documentation is not the place to fix novices' user experience, since they don't know where to look for help. The only place to fix it is in CPython.

Read more 2020 Python Language Summit coverage.

SyntaxError does little to help a beginner. It directs their attention to the spot after the token that caused the error. In this example from the tutorial, the caret points at the last letter of print, but the coder's mistake was omitting a colon after True:

>>> while True print('Hello world')
  File "<stdin>", line 1
    while True print('Hello world')
                   ^
SyntaxError: invalid syntax
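
For readers who want to see what the interpreter actually records, here is a minimal sketch that compiles the same line and inspects the resulting exception. The helper name `describe_syntax_error` is made up for illustration; `msg`, `lineno`, `offset`, and `text` are standard attributes of `SyntaxError`. The exact message and caret column vary between Python versions, so the sketch prints them rather than assuming particular values.

```python
def describe_syntax_error(source):
    """Return the location details attached to a SyntaxError, or None.

    Hypothetical helper for illustration; it only reads attributes
    that SyntaxError instances already carry.
    """
    try:
        compile(source, "<stdin>", "exec")
    except SyntaxError as err:
        return {
            "message": err.msg,    # e.g. "invalid syntax" on older versions
            "lineno": err.lineno,  # 1-based line number
            "offset": err.offset,  # 1-based column the caret is drawn under
            "text": err.text,      # the offending source line
        }
    return None  # the source compiled without a syntax error


info = describe_syntax_error("while True print('Hello world')\n")
print(info)
```

Nothing in these fields says that a colon is missing, which is the gap the talk is about.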

Hatfield-Dodds proposed more precise errors that tell users about mismatched parentheses, unterminated string literals, missing commas and colons, and so on. Pablo Galindo Salgado said, via Zoom chat, that Python 3.8 has improved some error messages, for example:

>>> (1+3+4))
  File "<stdin>", line 1
    (1+3+4))
           ^
SyntaxError: unmatched ')'

The new parser for 3.9 might improve error messages further, although in the short term it requires more work just to bring it to parity with the current parser.

Hatfield-Dodds suggested CPython could implement "did you mean..." for both SyntaxErrors and NameErrors, by fuzzily searching for replacements for typos. Incremental improvements to SyntaxError could be funded by educational institutions, he guessed, and would make a good project for contributors from outside the core team, if the core developers are willing to guide them. "I care a lot about this," said Hatfield-Dodds, and paused to let out a big exhale. "If people's first exposure to errors in Python is an error that tells them what the problem is and how to fix it, we might even convince them to read error messages in the future, which would be magical."
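
As a rough illustration of the "did you mean..." idea, the sketch below fuzzily matches a misspelled name against a list of known names using the standard library's `difflib.get_close_matches`. This is not how CPython implements anything; the function names `suggest_name` and `did_you_mean`, the wording of the message, and the 0.6 cutoff are assumptions chosen for the example.

```python
import difflib


def suggest_name(unknown, known_names, cutoff=0.6):
    """Return the closest known name to `unknown`, or None if nothing is close.

    `cutoff` is a similarity threshold between 0 and 1 (an assumed value).
    """
    matches = difflib.get_close_matches(unknown, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def did_you_mean(unknown, known_names):
    """Build a NameError-style message, adding a hint when one is found."""
    message = f"name {unknown!r} is not defined"
    suggestion = suggest_name(unknown, known_names)
    if suggestion is not None:
        message += f". Did you mean {suggestion!r}?"
    return message


print(did_you_mean("pritn", ["print", "input", "int"]))
```

In a real interpreter the candidate list would come from the names in scope, and the threshold would need tuning so that unrelated names are not suggested.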

Jim Baker

State of Jython

"We are not dead yet!" said Jim Baker to the Language Summit. He admitted that "this is something I've said many times about Jython in the past." The project is certainly behind CPython—it has just published a bugfix release of 2.7, and Python 3 support is far off—but it is making progress nevertheless. "And again," said Baker, "our apologies."

Jython's previous bugfix release was nearly two years ago; the main topics of the latest version, Jython 2.7.2, were an overhaul of Jython's PyJavaType objects, and solutions to deep race conditions. Baker said there is still an active user base for Jython based on the response to Jython's recent betas, but "capacity in the project is low." The project is currently led by Jeff Allen with two other regular contributors, none devoted to Jython full-time. Emeritus developers chip in occasionally. Baker hopes there will be more interest once Jython 3 ships, but he wrote in his slides, "the journey is unpredictable and resources are few."

There are several Python 3 implementations on the JVM, but none is ready to use. Isaiah Peng made a solo attempt at implementing Jython 3 in 2016-2017. It is too late now to resume this work, because Peng's branch didn't pull changes from the main Jython repository and they have now diverged too far. Baker said Jython should copy ideas from this prototype and credit Peng's work. Since 2016, Jeff Allen has been writing the Very Slow Jython Project, "a project to re-think implementation choices in the Jython core, through the gradual, narrated evolution of a toy implementation." Independently of the Jython team, Oracle is actively building an experimental Python 3 implementation on the Graal (pronounced "grail") JVM. "It's fantastic," said Baker, but unlike Jython "it doesn't do this beautiful subclassing of Java classes with Python classes."

The plan for Jython is to target the Python 3.8 language, including type hints. Baker has prototyped code to generate Python type hints from Java classes. The team will overhaul the core implementation using modern Java features, and continue to emphasize Jython's strengths: convenient integration with Java libraries, speed equal to CPython or better, and high concurrency (unlike CPython). Baker speculated that Jython 3.8's asyncio module could be built on the high-performance Netty library. He hopes that the new HPy API will take off, because it would simplify supporting C extensions from Jython.

Eric Holscher

Read the Docs features of interest

Holscher's presentation was an advertisement for nifty additions to ReadTheDocs, and an enticement to move CPython's documentation there.

ReadTheDocs recently added the hoverxref feature; when a reader hovers their cursor over a link in a documentation page, a tooltip shows the content of the linked section. Holscher has forked the CPython docs to host them on ReadTheDocs and demonstrate this feature's utility.

ReadTheDocs also has nicer pull request integration than the CPython repository does. Currently, when contributors offer pull requests to CPython, the patched documentation is available for download as a zip archive of HTML files. ReadTheDocs goes one better; its PR builder publishes the patched docs to the web for review. Search engines are blocked from indexing these docs, and each page displays a warning that it was created from a pull request. (After the Summit, at the core developers' request, the ReadTheDocs team enabled this feature for pull requests to the Python Developer Guide, which is hosted on ReadTheDocs.)

Finally, Holscher claimed that ReadTheDocs's text search is better than what CPython uses, which is generated by Sphinx. ReadTheDocs's search results include direct links to pages' subsections, and they provide search-as-you-type.

Sanyam Khurana commented via Zoom chat, "This looks very promising and amazing!" Pablo Galindo Salgado suggested hosting the PEPs on ReadTheDocs as well.

"Some of this is beta," said Holscher. Nevertheless, it's exciting to consider how much better the reader experience would be if CPython migrated. He argued that CPython should benefit from future improvements in ReadTheDocs, and ReadTheDocs should benefit from the attention of CPython developers. "We’ve talked about this in the past, there were blockers," he said, but the ReadTheDocs team has now addressed them.

Mariatta Wijaya

Make your life happier (with Zapier)

"I want you all to do more automation in your lives," said Mariatta Wijaya. She acknowledged that Zapier is her employer, but her intention was pure. "I know you're volunteering for open source. You should use more automation and save time."

For a demonstration, Wijaya showed the Zapier workflow she had used to invite attendees to the Language Summit. "You all received calendar invites for this event," she said. "I did not send them by going to Google Calendar." Instead, she collected names and email addresses in a Google spreadsheet. For each attendee, once she obtained a recording waiver and consent to the code of conduct, she put a "y" in the attendee's row in the spreadsheet. Her Zap then sent the invite automatically.

Earlier in the summit, some core developers had complained about the firehose of Discourse emails. Zapier has a Discourse integration that can manage this torrent. A user can create a "Zap" that takes new Discourse messages, filters them according to keywords or other attributes the user chooses, then triggers an email, Slack notification, or some other action. Wijaya also described how Zapier automates onboarding new PyLadies organizers.
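
The filter-then-act shape of such a Zap can be sketched in a few lines of plain Python. This does not use Zapier's or Discourse's APIs; the message fields (`title`, `body`) and the function names are hypothetical stand-ins, and the "action" is any callable, such as a function that sends an email.

```python
def matches_keywords(message, keywords):
    """Return True if any keyword appears in the message title or body."""
    text = f"{message['title']} {message['body']}".lower()
    return any(keyword.lower() in text for keyword in keywords)


def run_filter(messages, keywords, action):
    """Call `action` on each matching message and return the matches."""
    forwarded = []
    for message in messages:
        if matches_keywords(message, keywords):
            action(message)
            forwarded.append(message)
    return forwarded


# Hypothetical sample data standing in for new forum posts.
sample_messages = [
    {"title": "PEP 588 status", "body": "Migration planning update"},
    {"title": "Off topic", "body": "Lunch plans"},
]
notified = []
run_filter(sample_messages, ["pep 588", "migration"], notified.append)
print([message["title"] for message in notified])
```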

Conclusion

"Well, we made it," said Łukasz Langa at the end of the second day of the videoconference. "I'm sorry it was not what a real Python Language Summit could have been, but I hope it was better than nothing." His co-organizer Mariatta Wijaya congratulated attendees from all over the globe for staying up. "I know this is past bed time for many of you." It was 8pm Pacific Time, the middle of the night for attendees in Europe and Africa, and the morning of the next day in Asia. She thanked the PSF and PyCon staff, and MongoDB for sponsoring the Summit.

Victor Stinner added, "Thanks TCP/IP for making this possible."

Sumana Harihareswara said, "This was real to me."

Wednesday, May 13, 2020

Call for Volunteers! Python GitHub Migration Work Group

Call for community volunteers! It is time to assemble a Python Work Group that will aid in Python’s migration to GitHub!

PEP 581 was accepted and Python is now starting to plan for the actual migration (per PEP 588)!

We are looking for volunteers to participate in a work group that will be involved with Python’s migration from bugs.python.org to GitHub. We want to make sure the directions this migration takes represent what the community needs!

The transition will be completed with the assistance of GitHub’s team and the PSF will be contracting a Technical Project Manager to assist. Certain Steering Council members and PSF staff members will be part of this work group as well!

The work group will help review contractor resumes, weigh in on discussions, help guide decisions, help get community input, and have a close overview of the entire project. If you have experience with Roundup and GitHub we want to hear from you! We want to ensure that through this Work Group we have a wide range of users represented. The discussions and decisions will help mold what the final outcome will be.

Fill out this application form by May 27: https://forms.gle/jivuUdgViQPU4rKh8. We will reach out soon after.

6 Ways Salesforce Gets Things Done with Python

Salesforce Engineering puts Python to work across many areas of their business.

Read on to see how they use Python in machine learning, security, internal devops teams and more.

The Python programming language has strong ties to both engineering and science disciplines, which gives its users access to a wide number of libraries to solve both practical and theoretical problems. We put it to work across Salesforce.org (our non-profit product arm), Heroku, Salesforce Einstein, Industries and Service Clouds, internal devops teams, and more.

Here are 6 things we use Python to do (and you can too!) through projects we’ve open sourced:

1. Conquer the Natural Language Decathlon by performing ten disparate natural language tasks (DecaNLP).

Deep learning has significantly improved state-of-the-art performance for natural language processing (NLP) tasks, but each one is typically studied in isolation. The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. By requiring a single system to perform ten disparate natural language tasks, decaNLP offers a unique setting for multitask, transfer, and continual learning.

2. Create SSL/TLS client fingerprints that are easy to produce on any platform and can be easily shared for threat intelligence (JA3).

JA3 gathers the decimal values of the bytes for the following fields in the Client Hello packet; SSL Version, Accepted Ciphers, List of Extensions, Elliptic Curves, and Elliptic Curve Formats. It then concatenates those values together in order, using a “,” to delimit each field and a “-” to delimit each value in each field.
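
The concatenation rule described above can be sketched as follows. The function names and the sample values are illustrative, not taken from the JA3 code base. The final MD5 step goes slightly beyond the description above: the published JA3 method hashes the concatenated string with MD5 to produce the fingerprint that gets shared.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, curve_formats):
    """Join the five Client Hello fields: "," between fields, "-" within a field."""
    fields = [[version], ciphers, extensions, curves, curve_formats]
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_digest(ja3):
    """Hash the JA3 string with MD5 to get a 32-character hex fingerprint."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Illustrative decimal values, not captured from a real handshake.
example = ja3_string(769, [47, 53, 5, 10], [0, 10, 11], [23, 24, 25], [0])
print(example)
print(ja3_digest(example))
```

An empty field simply contributes nothing between its commas, so a client that sends no extensions still yields a five-field string.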

3. Use Google Sheets like tables in code (pygsheetsorm).

Ever wanted to be able to use a Google Sheet like a table in your code? How about if you could get a list of objects that automatically mapped column headers into properties? Then this project is for you! This is a simple interface on top of pygsheets.
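
To give a feel for the idea of mapping column headers to properties, here is a small sketch in plain Python. It does not use the pygsheetsorm or pygsheets APIs; the sheet is faked as a list of rows, and the helper names are assumptions made for the example.

```python
from types import SimpleNamespace


def to_property_name(header):
    """Turn a column header such as "Full Name" into "full_name"."""
    return header.strip().lower().replace(" ", "_")


def rows_to_objects(rows):
    """Treat the first row as headers and map every other row to an object."""
    header, *data = rows
    names = [to_property_name(column) for column in header]
    return [SimpleNamespace(**dict(zip(names, row))) for row in data]


# Hypothetical sheet contents: a header row followed by data rows.
sheet = [
    ["Full Name", "Team"],
    ["Ada Lovelace", "Analytics"],
    ["Grace Hopper", "Compilers"],
]
people = rows_to_objects(sheet)
print(people[0].full_name, people[0].team)
```

This simple version assumes the normalized headers are valid Python identifiers; a real library has to handle headers that are not.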

4. Get rid of silent errors in Perforce syncs (o4).

At Salesforce, we use Perforce at a very large scale. A scale that exposes some shortcomings in p4 itself. o4 was created to improve reliability of a sync and increase scalability in our very large-scale CI. What that boils down to is the rather horrendous reality that a p4 sync makes most of the changes to your local files. o4 allows you to continue using Perforce and all the associated tools and IDE plugins, without the uncertainty around a sync. Every sync is guaranteed perfect, every single time. In the rare occurrence that a sync could not be met to 100%, o4 will fail loudly. Crash and burn. No more silent errors! In addition to that, o4 allows some dramatic improvements to CI.

5. Automatically verify, de-duplicate, and suggest payouts for vulnerability reports through HackerOne (AutoTriageBot).

This bot can automatically verify reports about XSS, SQLi, and Open Redirect vulnerabilities (via both GET and POST). In addition, it is built in a modular manner so that it can be easily expanded to add tests for other classes of vulnerabilities.

6. Run continuous integration from the command line for Salesforce Managed Package applications (CumulusCI).

Out of the box, CumulusCI provides a complete best practice development and release process based on the processes used by Salesforce.org to build and release managed packages to thousands of users. It offers a flexible and pluggable system for running tasks (single actions) and flows (sequences of tasks), and an OAuth-based org keychain that allows easy connection to Salesforce orgs and is stored in local files using AES encryption.

. . .

Still want more Python? Check out all of the Salesforce Open Source projects built in Python on GitHub.

WRITTEN BY

Laura Lindeman

Voracious reader & crafter of words. organizer extraordinaire. #peoplegeek at Salesforce on the Tech & Products Innovation & Learning team.

Capital One - Lessons From Adopting Python as a Team

By Akshay Prabhu, Software Engineering Manager, Capital One

Rewriting Legacy ETL Jobs in Python When You’re Not Python Devs

So how does a team of six engineers - heavily experienced in web development in languages like ReactJS, NodeJS, and Java - go about adopting Python into their work?

The application development and cloud computing technology landscape is always changing and an important part of our role as engineers is to stay up to date on those changes. Sometimes it is through solo work - such as learning a new framework or skill. But sometimes it is through team-based work - such as adopting and migrating a whole project to a new language.

Like most other engineers, I’ve experienced this kind of team-based work multiple times in my career, and I recently went through it in my current role as a Software Engineering Manager at Capital One. In my role I am working on leading a team of engineers to develop highly scalable web applications, including both API and UI layers. As part of our journey to migrate applications to the cloud, we were also involved in rewriting a lot of legacy ETL jobs originally built using licensed tools. Most of my past engineering experience before these re-write efforts for ETL was around NodeJS, Java, and Ruby on Rails; and until a year ago I had not worked with Python. The same was true with most of the engineers on my team.

In fact, our entire team consisted of experienced engineers who have delivered multiple web and distributed applications in the cloud, but none had exposure to ETL or data-driven projects.

About one month into the rewrite efforts we were hitting limitations around using Java to migrate legacy system code. We wanted to be able to achieve simple File Operations, as well as complicated queries using Spark, but with a dynamically typed language and minimal bootstrap code. This was one of our reasons for considering whether it was time to switch languages.

Why Did We Decide On Python?

For our project, we faced the huge task of re-writing multiple jobs running on a legacy ETL platform. This involved enterprise API integrations, as well as complex data analysis and refinement.

Inflexibility of Existing Languages

Due to the nature of these jobs, none of the languages we were most experienced with were a great choice. That’s because they:

Were statically typed, like Java
Involved heavy bootstrap code
Lacked extensive support for data manipulation libraries such as Pandas
Lacked extensive external community support for Spark integrations or data analysis

Flexibility with Python: API Integration, Data Analysis, and Others

Python seemed to be a good choice for us as it was flexible enough to support a wide array of use cases. It also fit in well as it:

Was dynamically typed
Supported re-writing Bash based or ETL jobs in fewer lines of code
Had a well supported REST interface
Had excellent support for data manipulation libraries such as Pandas and Spark

There were a few other areas that really cemented our use of Python - these were File Operations and Community Support.

File Operations

File Operations were key to our project as our process involved reading multiple source files in parquet format, extracting, refining, and producing new files. We needed a language that could easily integrate with SPARK on HDFS for complicated and larger datasets, as well as an equally powerful library for smaller datasets like Pandas. Hence, File Operations were at the center of our re-write.

Python worked well in both cases, including:

Well suited for simple file manipulations using Pandas
Worked in complex scenarios using Spark Queries on HDFS
Needed significantly fewer lines of code to accomplish this than in Java or NodeJS
Could work for File Operations in memory when source files are a few MBs
Simple enough to make API calls for various Enterprise Layers

Community Support

As we were assessing adopting a new language, Community Support both within Capital One and outside it was key for us. We wanted a language which was well supported by an active open source community that:

Constantly updates security enhancements
Resolves outstanding questions or issues
Actively merges new feature requests from engineers

Outside of Capital One

Python has a much more extensive community of engineers in the Data Analysis space as compared to Java
PySpark has much better support than Spark Integration with Java

Within Capital One

Capital One has a very active community of Python engineers and experts to help teams get started and maintain their Python projects
This internal community allowed us to seek guidance, as well as go through multiple code reviews

How We Got Started with Python

As with any other programming language, we started by defining our source of truth for standards. We went with PEP8, which is the style guide for Python code.

We went through the below key stages from Planning to Production.

Key to learning and adopting a new language was putting in time for foundational work: automating compliance with PEP8 and adopting Python tooling like Black and Flake8.

Let’s go through some of the key elements to these stages.

Define Standards and Automating Adoption

As our team was new to Python, we spent the first few days defining standards on how we would code to comply with PEP8. But in addition to adopting the standards, we needed to automate our workflow to comply with them.

We added a pre-commit hook for Black, which automatically formats code on local commits.
Black didn't catch all violations, which is where Flake8 came in.
Flake8, installed as a pre-commit hook, stopped any code commits where there were outstanding compliance errors with PEP8.
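
The article does not show how the hooks were wired up, so the following is only a hand-rolled sketch of the same idea: a script that could be called from a Git pre-commit hook, formats the staged Python files with Black, and then blocks the commit if Flake8 still reports problems. The function names and the injectable `runner` parameter are assumptions for illustration; the `black`, `flake8`, and `git diff` invocations are the standard command-line forms.

```python
import subprocess


def staged_python_files(runner=subprocess.run):
    """List staged .py files (added, copied, or modified) via git diff."""
    result = runner(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [name for name in result.stdout.splitlines() if name.endswith(".py")]


def run_checks(files, runner=subprocess.run):
    """Run Black, then Flake8; return the first non-zero exit code, else 0."""
    if not files:
        return 0  # nothing to check, so allow the commit
    for command in (["black", *files], ["flake8", *files]):
        completed = runner(command)
        if completed.returncode != 0:
            return completed.returncode  # a non-zero exit blocks the commit
    return 0
```

A hook script would end with something like `sys.exit(run_checks(staged_python_files()))`. Files that Black reformats would still need to be staged again, which is one reason many teams rely on a dedicated hook manager instead of a script like this.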

After automating our workflow to comply with standards and a base repo, we started with the core dev work.

Logging

This was key given Python is dynamically typed; logging was our solution to better track problems.

As a team, we decided on a common logging format:

```
requestid - machine_instanceid
timestamp - YYYY-MM-DD HH:MM:SS,milliseconds
loglevel - INFO, ERROR
modulename - function_name
state - START/END/INPROGRESS
type - SCRIPT/EXTERNAL_API/etc.
modresponse - Success/Error
duration - Tracking External API calls
message - custom message as needed
errormessage - err message
```

We leveraged the ELK stack (Elasticsearch, Logstash, and Kibana) for logging.
By using Kibana as the Web Application UI for our logs, we could see our execution details as well as trace down exceptions in Kibana.
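
One way to produce lines in roughly this format with the standard `logging` module is sketched below. This is not the team's actual configuration, which the article does not show; the helper names, the `"-"` placeholders for missing fields, and the field order in `LOG_FORMAT` are assumptions. The custom fields are passed through the `extra` argument, and the default `asctime` rendering already has the `YYYY-MM-DD HH:MM:SS,milliseconds` shape.

```python
import io
import logging

# Field order follows the list above; the separators are an assumption.
LOG_FORMAT = (
    "%(requestid)s - %(asctime)s - %(levelname)s - %(funcName)s - "
    "%(state)s - %(type)s - %(modresponse)s - %(duration)s - "
    "%(message)s - %(errormessage)s"
)

# Placeholder used when a call does not supply one of the custom fields.
DEFAULT_FIELDS = {
    "requestid": "-",
    "state": "-",
    "type": "-",
    "modresponse": "-",
    "duration": "-",
    "errormessage": "-",
}


def build_logger(name, stream):
    """Create a logger that writes the common format to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


def log_event(logger, level, message, **fields):
    """Log `message`, filling any custom field not supplied with "-"."""
    # stacklevel=2 (Python 3.8+) reports the caller's function name.
    logger.log(level, message, extra={**DEFAULT_FIELDS, **fields}, stacklevel=2)


demo_stream = io.StringIO()
demo_logger = build_logger("etl_logging_sketch", demo_stream)
log_event(
    demo_logger,
    logging.INFO,
    "loaded source file",
    requestid="i-0123",
    state="END",
    type="SCRIPT",
    modresponse="Success",
)
print(demo_stream.getvalue(), end="")
```

Keeping every field present on every line, even as a placeholder, makes the output easier to parse downstream in a tool such as Logstash.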

Critical takeaways

Adopting a simple library called requests to handle our API calls.
Spark vs. Pandas: When you perform operations on a dataframe in Spark, a new dataframe/reference is created, which is by design. This works well with large datasets but is a hindrance when the dataset is smaller. Hence, for smaller filtered datasets under 5MB, we decided to go with Pandas for quick dataframe manipulations.
Automation for compliance to coding standards was a huge time saver as most of our team was new to Python.
We quickly realized that, to test alongside all the dev work, we needed a TDD approach, which is where pytest came into our workspace. This proved extremely helpful.
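
The Spark-versus-Pandas rule of thumb from the takeaways above can be written down as a tiny dispatch helper. This is only a sketch: the function names are made up, and reading "5MB" as 5 * 1024 * 1024 bytes is an assumption. It returns a label rather than importing either library, so it runs anywhere.

```python
import os

# Assumed interpretation of the 5MB cutoff mentioned above.
PANDAS_MAX_BYTES = 5 * 1024 * 1024


def choose_engine(size_in_bytes, threshold=PANDAS_MAX_BYTES):
    """Pick "pandas" for datasets under the threshold, "spark" otherwise."""
    return "pandas" if size_in_bytes < threshold else "spark"


def choose_engine_for_file(path):
    """Apply the same rule to a file on local disk."""
    return choose_engine(os.path.getsize(path))


print(choose_engine(300 * 1024))         # a small filtered extract
print(choose_engine(2 * 1024 ** 3))      # a multi-gigabyte source dataset
```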

Was it Worth the Effort?

In addition to being the right choice of tool for our job, exploring and learning Python allowed the team to work more closely together and bond more than ever.

We as a team learned something new together and solved multiple issues as we hit walls, which took our team bonding to great heights!

Given the nature of our project, Python did fit in very well. It’s dynamically typed and the libraries are feature-rich, working for both simple API calls as well as complex operations around data transformation and filtering. As engineers, we worked together as a team to “learn how to learn” and adopt a new language the right way. With the help of Capital One’s Python Gurus, we adopted all the best practices for developing a working production application and delivered this project before the committed deadline. I am lucky to be part of such an awesome team, as well as to work with Capital One Python experts like Steven Lott, which was critical for getting us across the finish line.

I hope this has been helpful. I would love to learn what languages you and your team have adopted recently, especially in data-driven projects like this. Let me know in the comments!

Resources

PySpark

Pandas

***

These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2020 Capital One.

Sunday, May 10, 2020

CPython on Mobile platforms - Python Language Summit 2020

"We've got very big news on Android," Russell Keith-Magee told the Language Summit. "We're close to having a full set of BeeWare tools that can run on Android."

The BeeWare project aims to let programmers write apps in Python for Android, iOS, and other platforms using native UI widgets. Keith-Magee reported that BeeWare has made good progress since his Summit presentation last year. On iOS, "Python worked well before, it works well now," and BeeWare has added Python 3.8 support. Until recently, however, Python was struggling to make inroads on Android. BeeWare's Android strategy was to compile Python to Java bytecode, but Android devices are now fast enough, and the Android kernel permissive enough, to run CPython itself. With funding from the PSF, BeeWare hired Asheesh Laroia to port CPython to Android.

A top concern for BeeWare is distribution size. Python applications for mobile each bundle their own copy of the Python runtime, so Python must be shrunk as small as possible. There have been proposals recently for a "minimum viable Python" or "kernel Python", which would ship without the standard library and let developers install the stdlib modules they need from PyPI. (Amber Brown's 2019 Summit talk inspired some of these proposals.) Keith-Magee said a kernel Python would solve many problems for mobile. He also asked for a cross-compiling pip that installs packages for a target platform, instead of the platform it's running on. Senthil Kumaran observed, "BeeWare, MicroPython, Embedded Python, Kivy all seem to have a need for a kernel-only Python," and suggested they combine forces to create one.

To regular Python programmers, the mobile environment is an alien planet. There are no subprocesses; sockets, pipes and signals all behave differently than on regular Unix; and many syscalls are prohibited. TLS certificate handling on Android is particularly quirky. For the CPython test suite to pass on mobile it must skip the numerous tests that use fork or spawn, or use signals, or any other APIs that are different or absent.
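
The kind of conditional skipping this calls for can be sketched with the standard `unittest` decorators. This is a generic illustration, not code from CPython's test suite or from BeeWare's patches; the test class and the `HAS_FORK` flag are hypothetical, and checking `hasattr(os, "fork")` is a simplification, since a platform may expose the function and still refuse to run it.

```python
import os
import unittest

# True only where the platform exposes os.fork at all.
HAS_FORK = hasattr(os, "fork")


class ProcessTests(unittest.TestCase):
    @unittest.skipUnless(HAS_FORK, "os.fork is not available on this platform")
    def test_fork_child_exits_cleanly(self):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave immediately without running cleanup
        _, status = os.waitpid(pid, 0)
        self.assertEqual(status, 0)

    def test_getpid_works_everywhere(self):
        # A test with no platform-specific requirement always runs.
        self.assertGreater(os.getpid(), 0)


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProcessTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

On a platform without `os.fork`, the first test is reported as skipped rather than failed, so the rest of the suite can still pass.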

Adapting CPython for life on this alien planet requires changes throughout the code base. In 2015 Keith-Magee submitted a "monster patch" enabling iOS support for CPython, but the patch has languished in the years since. Now, he maintains a fork with the iOS patches applied to branches for Python 3.5 through 3.8. For Android, he maintains a handful of patch files and a list of unittests to skip. Now that Android support is maturing, he said, "We're in a place where we can have a serious conversation about how we get these changes merged into CPython itself."

A prerequisite for merging these changes is mobile platform testing in CPython's continuous integration system. Currently, Keith-Magee tests on his laptop with several phones connected to it. As he told the Summit, he's certain there is a CI service with physical phones, but he has not found it yet and hasn't invested in building one. He develops BeeWare in his spare time, and CI is not the top priority. "Funding is one thing that makes stuff happen," he said. He thanked the PSF for the grant that made Android support possible. Mobile Python suffers a chicken-and-egg problem: there is no corporate funding for Python on mobile because Python doesn't support mobile, so there is no one relying on mobile Python who is motivated to fund it.

Keith-Magee asked the Summit attendees to be frank with him about bringing mobile Python into the core. He asked, "Do we want this at all?" If so, the core team would have to review all patches with their mobile impact in mind, as well as reviewing mobile-specific patches. "What is the appetite for patches that require non-mobile developers to care about mobile issues?" The decision would involve the whole core team and many community discussions. Guido van Rossum endorsed good mobile support long-term. So did Ned Deily, adding, "To actually do it will require money and people. Bigger than many other projects."

Thursday, May 07, 2020

Core Workflow updates - Python Language Summit 2020

The PyCon 2015 sprint was the first time this blogger contributed to Python—or rather, I tried to. The three patches I submitted that year are awaiting a review to this day. In recent years, however, the core team has made bold changes to their development workflow to make their tasks easier, to spread responsibility more widely, and to improve the experience of contributors. When I submitted a patch in the 2018 sprints it was reviewed and merged in a few weeks. Mariatta Wijaya has led many of these changes. She presented the latest updates to the core workflow.

"This is the third Language Summit in which I've talked about GitHub issues," sighed Wijaya. Last year she urged the core team to make a decision about replacing the quirky old bug tracker on bugs.python.org. Two weeks later they approved PEP 581: Using GitHub Issues. Its sequel, PEP 588: GitHub Issues Migration plan, is still in progress. "I really care about this topic," said Wijaya, but she and her collaborators have mostly focused on other projects for the last year.

There are signs of progress, however. Wijaya began a wish list where Python developers can request features for GitHub's issue tracker that will make it a full replacement for bugs.python.org. The PSF is seeking to hire a project manager for the migration. Wijaya, Brett Cannon, and others have met with GitHub staff, who have offered to import old issues in bulk. The migration will be "like throwing a big switch," said Cannon; there must be a great deal of planning behind the scenes before it is thrown.

In 2018, the Python developers created a web forum (running on Discourse) as a potential replacement for their several mailing lists. Since then, they have contended with both mailing lists and the web forum. Wijaya asked, "We've been using Discourse for a couple of years, is it time to make some final decisions?" Guido van Rossum said that using Discourse via email is clunky, but if he silenced its emails he would forget to check it on the web. Brett Cannon hypothesized that people who have tuned their personal systems for managing email prefer to use only email, and those who have not, prefer Discourse. The Summit attendees reached no conclusion.

A new team of Python contributors formed last summer: the Python Triage Team. Triagers can close issues, edit issue labels, etc., and they can label or close pull requests in the Python organization's GitHub repositories. Unlike core developers, they cannot merge pull requests. Their power to manage issues extends to all the core developers' repositories, including CPython, the developer guide, and the source code for Python's GitHub bots. New members join the team when they are invited by core developers; they can self-nominate by asking a core dev to vouch for them. Wijaya said several triagers have graduated to the core team, and she wants this pipeline to keep flowing. New core developer Kyle Stanley recommended better promotion of the triager-to-core-dev career path. "For myself and several recent candidates, it seems to have worked excellently as a stepping stone."

Tuesday, May 05, 2020

Python Developers Survey 2019 Results

We are excited to share the results of the third official Python Developers Survey conducted by the Python Software Foundation with the help of JetBrains.

More than 24,000 Python users from over 150 countries took part in the survey this past November. With the help of the data collected, we are able to present the summarized results, identify the latest trends, and create a Python developer profile.

View the results of Python Developers Survey 2019!

The survey results cover a broad list of topics. Some of the key things you may learn from the report include: main motivations for Python usage, current popular frameworks, libraries, tools and languages, and many other insights.

In all likelihood, there are plenty of potential findings that were not included in the report. If you have specific questions that are unanswered, send them to us and we’ll dig into the data for additional analysis. You also have the opportunity to delve deeper into the raw survey data and uncover your own amazing facts.

We’ll be delighted to learn about your findings! Please share them on Twitter or other social media, mentioning @jetbrains and @ThePSF with the #pythondevsurvey hashtag. We’re also very open to any suggestions and feedback related to this survey which could help us run an even better one next time. Feel free to open issues here with any comments or questions.

Many thanks to everyone who participated in the survey and helped us map an accurate landscape of the Python community!

Monday, May 04, 2020

Python’s migration to GitHub - Request for Project Manager Resumes

The Python Software Foundation is looking for a Project Manager to assist with CPython’s migration from bugs.python.org to GitHub for issue tracking. CPython's development partially moved to GitHub in February 2017. All other projects within the PSF's organization are hosted on GitHub and are using GitHub issues. CPython is still using Roundup as the issue tracker on https://bugs.python.org (also known as “bpo”). To read more about the rationale behind this migration, read PEP 581.

Thank you to GitHub for donating financial support so this project can begin.

Timeline

May 4 - Request for resumes opens
June 4 - Request for resumes closes
June 12 - Final decision will be made on proposals received
June 22 - Work will begin

Submitting a proposal

Resumes should be submitted as Portable Document Format (PDF) files via email to Ewa Jodlowska.

Role description

Goal

Support Python through the full transition from bugs.python.org to GitHub with the assistance of GitHub’s migration team.

Tasks

These tasks are from PEP 588 and a meeting the Steering Council had with GitHub. This is not an exhaustive list of all tasks, but an overview of what the work will most likely entail. The migration will be completed by GitHub and the Project Manager will work with the Python team to steer that migration.

Create a timeline for the project with the Python team and GitHub team
Find out from the community the context behind GitHub's search limitations and why bugs.python.org search is sometimes preferred.
Research the Contributor License Agreement (CLA) process and how it can be achieved outside of bugs.python.org. Work with interested contractors, volunteers, and the PSF’s Director of Infrastructure on a solution.
Work with GitHub’s migration team and Python’s community on how mapping of fields should work from bugs.python.org to GitHub
Work with GitHub’s migration team on the transition from bugs.python.org and be the Python point of contact for GitHub. This includes helping field questions from GitHub to the Steering Council/core devs and vice versa.
Assist the Python community with creating guidelines on how people are promoted to Python’s triage team.
Obtain from GitHub a list of projects that have bots built that may help Python with "nosy lists"
Oversee the creation of the new workflow on GitHub
Assist with the creation of GitHub labels and templates when necessary
Oversee the creation of the sandbox issue tracker on GitHub to experiment and test the new workflow
Ensure that the sandbox receives adequate testing from the Python team
Update the devguide with that new process ahead of the migration and communicate it to the core developers
Communicate with PSF staff on a regular basis when necessary and provide monthly reports via email.

Note: Some of the work (for example the CLA process) may require additional hired help from outside contractors. Decisions on these tasks will be made after the Project Manager’s research and review are presented to the PSF staff and Steering Council.

Estimated budget

The budget is capped at $30,000 for this project.

Necessary Skills

Excellent time management skills
Must be very organized, punctual, and detail-oriented
Experience working with volunteers
Excellent written and verbal communication
Experience working with software development teams (remotely is a plus)
Ability to balance demand and prioritize
Experience working with GitHub
Experience with GitHub APIs is a plus
Experience working with Roundup is a plus

Questions?

Contact Ewa Jodlowska, PSF’s Executive Director.

Sunday, May 03, 2020

Property-Based Testing for Python builtins and the standard library - Python Language Summit 2020

Zac Hatfield-Dodds opened his presentation with a paraphrase of the economist Thomas Schelling:

No matter how rigorous her analysis or heroic his imagination, no person can write a test case that would never occur to them.

Hatfield-Dodds told the Language Summit that handwritten tests are "fantastic for testing particular edge cases, they're great regression tests," but they're limited by the developer's understanding of the system under test. "We can't write tests for bugs we don't know could occur." We can overcome this limit with exhaustive testing, which checks our code's behavior with every possible input. If that is impractical, coverage-guided fuzz testing can generate random inputs and evolve them, trying to explore every branch in the code under test. Fuzzers are very good at finding inputs that crash a program, but they're not as well suited to finding other kinds of bugs.

For testing the Python standard library, Hatfield-Dodds proposed a different technique: property-based testing. (He is one of the leaders of the Hypothesis property-based testing project.) A property-based test framework doesn't generate totally random input like a fuzzer; it can generate structured inputs such as lists of numbers, or only sorted lists, or instances of a certain object. Unlike handwritten tests, which usually assert that a particular input produces one exact output, property-based tests assert properties of a function, for example that its output is sorted, or that a function is idempotent or commutative.
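To make the contrast with handwritten tests concrete, here is a minimal sketch of property-style checks written with only the standard library. It is not Hypothesis and generates inputs far more naively than a real framework would. The helper names random_int_list and check_sorted_properties are invented for this illustration.

```python
import random
from collections import Counter


def random_int_list(rng, max_len=20):
    """Hypothetical generator: a list of small random integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]


def check_sorted_properties(trials=200, seed=0):
    """Assert properties of sorted() instead of exact expected outputs."""
    rng = random.Random(seed)  # a fixed seed keeps the run reproducible
    for _ in range(trials):
        xs = random_int_list(rng)
        out = sorted(xs)
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: sorting is idempotent.
        assert sorted(out) == out
        # Property 3: the output is a permutation of the input.
        assert Counter(out) == Counter(xs)
    return trials


checked = check_sorted_properties()
```

None of the assertions names a specific expected list; each one states something that must hold for every input the generator can produce.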

Hatfield-Dodds presented the following Hypothesis test of a JSON codec:

import json

from hypothesis import assume, given, strategies as st


@given(
    value=st.recursive(
        st.none() | st.booleans() | st.floats() | st.text(),
        lambda x: st.lists(x) | st.dictionaries(st.text(), x),
    )
)
def test_record_json_roundtrip(value):
    assume(value == value)
    assert value == json.loads(json.dumps(value))

The recursive input generator can create None, booleans, floats, text, or lists or dictionaries that contain such values, and so on recursively. Within the test function, the assume statement checks that the input is equal to itself, to avoid inputs with nan, which is not equal to itself. The heart of the test is the assert statement.

(The above example still has trouble with nan; see Hatfield-Dodds's PyCon Australia talk.)
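One way such nan trouble can arise is sketched below, using only the standard library. It is an illustration and not necessarily the exact case discussed in the talk. The variable names are made up for the example.

```python
import json

nan = float("nan")

# nan is not equal to itself, so a bare nan is rejected by assume().
top_level_ok = (nan == nan)

# A list holding nan still compares equal to itself, because list
# equality treats identical element objects as equal before calling ==.
wrapped = [nan]
passes_assume = (wrapped == wrapped)

# The JSON round trip builds a new nan object, so the comparison
# against the original list is no longer true.
roundtripped = json.loads(json.dumps(wrapped))
survives_roundtrip = (wrapped == roundtripped)
```

Here top_level_ok is False, passes_assume is True, and survives_roundtrip is False: a nan nested inside a container slips past the assume check and then fails the round-trip assertion.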

Hypothesis searches for bugs by randomizing the input, or trying interesting values that tend to trigger edge cases, or retrying inputs that triggered bugs in previous runs. When Hypothesis finds a bug, it evolves the input, searching for the simplest input that reproduces the same bug.
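The idea of reducing a failing input to a simpler one can be sketched with a toy greedy shrinker. This is an assumption-laden illustration, not Hypothesis's actual algorithm; the functions has_bug and shrink are hypothetical.

```python
def has_bug(xs):
    """Hypothetical failure condition: any element greater than 10."""
    return any(x > 10 for x in xs)


def shrink(xs, fails):
    """Greedily simplify xs while fails(xs) stays true (toy example)."""
    current = list(xs)
    improved = True
    while improved:
        improved = False
        # Pass 1: try dropping each element in turn.
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                improved = True
                break
        if improved:
            continue
        # Pass 2: try moving each element one step toward zero.
        for i, value in enumerate(current):
            if value == 0:
                continue
            smaller = value - 1 if value > 0 else value + 1
            candidate = current[:i] + [smaller] + current[i + 1:]
            if fails(candidate):
                current = candidate
                improved = True
                break
    return current


minimal = shrink([3, 57, -8, 12, 40], has_bug)  # -> [11]
```

The shrinker only keeps a candidate if it still triggers the failure, so the final value is a small input that reproduces the same problem.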

"I want you all to write property-based tests for CPython, for builtins, for PyPy, for everything," said Hatfield-Dodds. He proposed to write new tests, or port existing ones to a property-based test framework, run them in CPython's continuous integration suite, and share them among the Python implementations. These tests could use Hypothesis; they could also be integrated with the AFL fuzzer or used in Google's OSS-Fuzz project. He presented a repository of tests demonstrating the technique for standard library modules such as gzip, re, and datetime. There is even a test that can generate random, valid Python code to fuzz-test the Python parser.

Łukasz Langa mentioned that David MacIver had used Hypothesis to test a Python code formatter and found dozens of bugs.

Paul Ganssle told the Summit that he used property-based testing for his implementation of datetime.fromisoformat. When the function was merged into the standard library, the property-based tests were not. In subsequent development, Ganssle introduced a segfault bug that "almost certainly would have been caught" if the original tests had still been running. He strongly endorsed Hatfield-Dodds's idea. He added that property-based testing is especially good at checking that two implementations of a module, one written in Python and one in C, are equivalent.
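A round-trip property of the kind that suits datetime.fromisoformat can be sketched with the standard library alone. This is not Ganssle's actual test suite: the random module stands in for a real property-based framework, and random_datetime and check_isoformat_roundtrip are hypothetical helpers.

```python
import random
from datetime import datetime, timedelta, timezone


def random_datetime(rng):
    """Hypothetical generator: naive, or aware with a whole-minute offset."""
    tz = None
    if rng.random() < 0.5:
        tz = timezone(timedelta(minutes=rng.randint(-23 * 60, 23 * 60)))
    return datetime(
        year=rng.randint(1, 9999),
        month=rng.randint(1, 12),
        day=rng.randint(1, 28),  # valid in every month
        hour=rng.randint(0, 23),
        minute=rng.randint(0, 59),
        second=rng.randint(0, 59),
        microsecond=rng.choice([0, rng.randint(0, 999_999)]),
        tzinfo=tz,
    )


def check_isoformat_roundtrip(trials=500, seed=0):
    """Property: fromisoformat() inverts isoformat() for generated values."""
    rng = random.Random(seed)
    for _ in range(trials):
        dt = random_datetime(rng)
        assert datetime.fromisoformat(dt.isoformat()) == dt
    return trials


roundtrips_checked = check_isoformat_roundtrip()
```

The same loop could compare two implementations of a function by asserting that both return equal results for each generated input.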

Friday, May 01, 2020

Should All Strings Become f-strings? - Python Language Summit 2020

The first language change proposed this year was the most radical: to make f-strings the default. Eric V. Smith, who wrote the PEP for f-strings in 2015, said they are the killer feature of Python 3.6, and they are the motivation for many of his clients to move to Python 3. However, they are error-prone. It's common to forget the "f" prefix:

x = 1
# Forgot the f prefix.
print("The value is {x}")

Smith has observed programmers f-prefixing all strings, whether they include substitutions or not, just to avoid this mistake.

When f-strings were added in 3.6, it was suggested to make them the default, but this was too big a breaking change. Besides, replacing all literal brace characters with double braces would be ugly:

"A single pair of braces: {{}}"

In this year's Summit, Smith proposed again to make f-strings the default. The following kinds of strings would become f-strings:

"string" — an f-string
f"string" — still an f-string
r"string" — a raw f-string

Bytes literals like b"string" would not become f-strings. Smith would add a new "p" string prefix for plain strings, which would behave like ordinary strings today.

p"string" — a plain string

Performance would not be affected: the runtime behavior of a string without any substitutions would be the same as today. Plain strings would still have some uses; for example, regular expressions that include braces, or as the input to str.format. In Smith's opinion, f-strings have superseded str.format, but several in the audience objected that str.format with a plain string allows for late binding, and f-strings don't obviate str.format_map.
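The late-binding objection can be illustrated with today's Python. In this sketch the template text and the KeepMissing helper are invented for the example; str.format and str.format_map are the real methods under discussion.

```python
# A plain template can be defined before the values it needs exist.
template = "Hello, {name}! You have {count} new messages."


def render(name, count):
    # str.format binds the fields late, when render() is called.
    return template.format(name=name, count=count)


greeting = render("Ada", 3)


class KeepMissing(dict):
    """Hypothetical helper: leave unknown fields in place, no KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


# str.format_map accepts any mapping, including one with custom lookups.
partial = template.format_map(KeepMissing(name="Ada"))
```

An f-string evaluates its fields at the point where the literal appears, so it cannot be stored as a template and filled in later the way template is here.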

Smith acknowledged some problems with his idea. It would introduce yet another string prefix. Flipping the master switch to enable f-mode would break some code, so there must be a way to gradually enable the change module by module, perhaps like:

from __future__ import all_fstrings

He was concerned the change was so drastic that the Python core developers would never have the nerve to enable it without requiring a future import. If so, the idea should be abandoned right away.

Yarko Tymciurak asked via chat: "How do you describe to beginners what p'why is this needed' is?" Smith conceded that p-strings make the language more complicated, but, he said, "There's going to be very few p's in the wild, and I think their explanation will be fairly obvious."

Several attendees were enthusiastic about making the change. Brett Cannon said that removing the need for the f prefix would make the language easier for beginners.

Larry Hastings pointed out that PHP strings are format strings by default and "the script kids love it." However, he wrote, "It seems to me this is solving the problem of 'oh I forgot to put an f in front of my string', and not noticing until it's too late. Is that problem bad enough that we have to change the language?" Many agreed that f-strings by default would have been a good idea if Python were beginning from scratch; however, Paul Moore, Guido van Rossum, and others feared the disruption would outweigh the benefits. The group concluded that Smith should send his PEP to the mailing list for further debate.