Sunday, May 03, 2020

Property-Based Testing for Python builtins and the standard library - Python Language Summit 2020


Zac Hatfield-Dodds opened his presentation with a paraphrase of the economist Thomas Schelling:
No matter how rigorous her analysis or heroic his imagination, no person can write a test case that would never occur to them.
Hatfield-Dodds told the Language Summit that handwritten tests are "fantastic for testing particular edge cases, they're great regression tests," but they're limited by the developer's understanding of the system under test. "We can't write tests for bugs we don't know could occur." One way past this limit is exhaustive testing, which checks the code's behavior with every possible input. When that is impractical, coverage-guided fuzz testing can generate random inputs and evolve them, trying to explore every branch in the code under test. Fuzzers are very good at finding inputs that crash a program, but they're not as well suited to finding other kinds of bugs.

Read more 2020 Python Language Summit coverage.

For testing the Python standard library, Hatfield-Dodds proposed a different technique: property-based testing. (He is one of the leaders of the Hypothesis property-based testing project.) A property-based test framework doesn't generate totally random input like a fuzzer; it can generate structured inputs such as lists of numbers, or only sorted lists, or instances of a certain object. Unlike handwritten tests, which usually assert that a particular input produces one exact output, property-based tests assert properties of a function, for example that its output is sorted, or that a function is idempotent or commutative.
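
To make the idea of a "property" concrete, here is a minimal sketch that checks three properties of the builtin sorted() on randomly generated lists. It is hand-rolled with the standard library's random module rather than Hypothesis, purely to stay dependency-free; the function name, list sizes, and value ranges are illustrative assumptions, and a real framework adds smarter generation and minimization of failing inputs.

```python
import random
from collections import Counter


def check_sorted_properties(trials=200, seed=0):
    """Check three properties of sorted() on random integer lists."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        result = sorted(xs)
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(result, result[1:]))
        # Property 2: sorting is idempotent.
        assert sorted(result) == result
        # Property 3: the output is a permutation of the input.
        assert Counter(result) == Counter(xs)
    return trials


check_sorted_properties()
```

None of the assertions names a specific expected list; each one states something that must hold for every input.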

Hatfield-Dodds presented the following Hypothesis test of a JSON codec:
import json

from hypothesis import assume, given, strategies as st

@given(
    value=st.recursive(
        st.none() | st.booleans() | st.floats() | st.text(),
        lambda x: st.lists(x) | st.dictionaries(st.text(), x),
    )
)
def test_record_json_roundtrip(value):
    assume(value == value)  # discard inputs that are not equal to themselves
    assert value == json.loads(json.dumps(value))
The recursive input generator can create None, booleans, floats, text, or lists or dictionaries that contain such values, and so on recursively. Within the test function, the assume statement checks that the input is equal to itself, to discard inputs such as nan, which is not equal to itself. The heart of the test is the assert statement.

(The above example still has trouble with nan; see Hatfield-Dodds's PyCon Australia talk.)
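
One way such trouble can arise, offered here as an assumption about what the note refers to rather than a summary of the talk: a nan nested inside a list slips past a `value == value` check, because list comparison treats an element as equal to itself when both sides hold the very same object. After a JSON round trip the list holds a different nan object, so the comparison fails. The variable names below are illustrative.

```python
import json

nan = float("nan")
wrapped = [nan]

# A bare nan is not equal to itself, so a `value == value` check rejects it.
bare_is_equal = nan == nan

# The same nan object inside a list compares equal by identity,
# so the list passes a `value == value` check.
passes_self_check = wrapped == wrapped

# The round trip builds a new nan object, so equality now fails.
roundtripped = json.loads(json.dumps(wrapped))
survives_roundtrip = wrapped == roundtripped

print(bare_is_equal, passes_self_check, survives_roundtrip)
```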

Hypothesis searches for bugs by randomizing the input, or trying interesting values that tend to trigger edge cases, or retrying inputs that triggered bugs in previous runs. When Hypothesis finds a bug, it evolves the input, searching for the simplest input that reproduces the same bug.
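
The search for a simpler failing input can be pictured with a toy example. The sketch below is not Hypothesis's actual algorithm; it is a deliberately naive greedy loop, with hypothetical names, that keeps deleting elements and reducing numbers for as long as the failure persists.

```python
def shrink_list(failing, still_fails):
    """Greedily simplify a failing list of non-negative ints (toy example)."""
    current = list(failing)
    improved = True
    while improved:
        improved = False
        # First, try deleting one element at a time.
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current, improved = candidate, True
                break
        if improved:
            continue
        # Then, try replacing one element with a smaller value.
        for i, value in enumerate(current):
            for smaller in (0, value // 2, value - 1):
                if 0 <= smaller < value:
                    candidate = current[:i] + [smaller] + current[i + 1:]
                    if still_fails(candidate):
                        current, improved = candidate, True
                        break
            if improved:
                break
    return current


def sum_is_too_big(xs):
    """Stand-in for a failing test: the 'bug' triggers when the sum reaches 100."""
    return sum(xs) >= 100


shrunk = shrink_list([93, 12, 57, 4], sum_is_too_big)
print(shrunk)
```

Starting from four numbers, the loop ends on a two-element list whose sum is exactly 100, a much easier input to reason about than the original.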

"I want you all to write property-based tests for CPython, for builtins, for PyPy, for everything," said Hatfield-Dodds. He proposed to write new tests, or port existing ones to a property-based test framework, run them in CPython's continuous integration suite, and share them among the Python implementations. These tests could use Hypothesis; they could also be integrated with the AFL fuzzer or used in Google's OSS-Fuzz project. He presented a repository of tests demonstrating the technique for standard library modules such as gzip, re, and datetime. There is even a test that can generate random, valid Python code to fuzz-test the Python parser.

Łukasz Langa mentioned that David MacIver had used Hypothesis to test a Python code formatter and found dozens of bugs.

Paul Ganssle told the Summit that he used property-based testing for his implementation of datetime.fromisoformat. When the function was merged into the standard library, the property-based tests were not. In subsequent development Ganssle introduced a segfault bug that "almost certainly would have been caught" if the original tests had still been running. He strongly endorsed Hatfield-Dodds's idea. He added that property-based testing is especially good at checking that two implementations of a module, one written in Python and one in C, are equivalent.
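
A round-trip property is a natural fit for a parser like this one. The sketch below is not Ganssle's test suite; it is an assumed, dependency-free illustration that generates random datetimes with the random module and checks that datetime.fromisoformat inverts isoformat. The helper names and the choice of whole-minute UTC offsets are assumptions made for the example.

```python
import random
from datetime import datetime, timedelta, timezone

# Number of microseconds between the smallest and largest datetime.
_SPAN = (datetime.max - datetime.min) // timedelta(microseconds=1)


def random_datetime(rng):
    """Return a random datetime, timezone-aware about half the time."""
    dt = datetime.min + timedelta(microseconds=rng.randrange(_SPAN + 1))
    if rng.random() < 0.5:
        offset = timedelta(minutes=rng.randint(-1439, 1439))
        dt = dt.replace(tzinfo=timezone(offset))
    return dt


def check_fromisoformat_roundtrip(trials=500, seed=0):
    """Property: fromisoformat(dt.isoformat()) gives back an equal datetime."""
    rng = random.Random(seed)
    for _ in range(trials):
        dt = random_datetime(rng)
        parsed = datetime.fromisoformat(dt.isoformat())
        assert parsed == dt
        assert parsed.utcoffset() == dt.utcoffset()
    return trials


check_fromisoformat_roundtrip()
```

The same generator could drive an equivalence check: feed each string to both the C and the pure-Python implementation and assert that the results match.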

Friday, May 01, 2020

Should All Strings Become f-strings? - Python Language Summit 2020


The first language change proposed this year was the most radical: to make f-strings the default. Eric V. Smith, who wrote the PEP for f-strings in 2015, said they are the killer feature of Python 3.6, and they are the motivation for many of his clients to move to Python 3. However, they are error-prone. It's common to forget the "f" prefix:
x = 1
# Forgot the f prefix.
print("The value is {x}")
Smith has observed programmers f-prefixing all strings, whether they include substitutions or not, just to avoid this mistake.
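
The mistake is easy to miss because both spellings are valid Python; only the output differs. A minimal illustration, with variable names chosen for this example:

```python
x = 1

without_prefix = "The value is {x}"   # braces are kept literally
with_prefix = f"The value is {x}"     # {x} is replaced by the value of x

print(without_prefix)  # The value is {x}
print(with_prefix)     # The value is 1
```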

When f-strings were added in 3.6, it was suggested to make them the default, but this was too big a breaking change. Besides, replacing all literal brace characters with double braces would be ugly:
"A single pair of braces: {{}}"
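
For comparison, this is how the same text is spelled today as an ordinary string and as an f-string. The variable names are illustrative.

```python
ordinary = "A single pair of braces: {}"
as_fstring = f"A single pair of braces: {{}}"  # doubled braces produce literal ones

print(ordinary == as_fstring)  # True
```
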
At this year's Summit, Smith proposed again to make f-strings the default. The following kinds of strings would become f-strings:
  • "string" — an f-string
  • f"string" — still an f-string
  • r"string" — a raw f-string
Bytes literals like b"string" would not become f-strings. Smith would add a new "p" string prefix for plain strings, which would behave like ordinary strings today.
  • p"string" — a plain string
Performance would not be affected: the runtime behavior of a string without any substitutions would be the same as today. Plain strings would still have some uses; for example, regular expressions that include braces, or as the input to str.format. In Smith's opinion, f-strings have superseded str.format, but several in the audience objected that str.format with a plain string allows for late binding, and f-strings don't obviate str.format_map.
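
The late-binding objection can be shown in a few lines. In this sketch the template text, the render function, and the KeepMissing helper are all hypothetical names: the template is defined once as an ordinary string and filled in later, and str.format_map accepts a mapping that decides what to do with missing keys, neither of which an f-string can do because it is evaluated where it is written.

```python
# Defined once, before the values exist.
template = "Hello, {name}! You have {count} new messages."


def render(name, count):
    """Fill in the template later, when the values are known."""
    return template.format(name=name, count=count)


class KeepMissing(dict):
    """Mapping that leaves unknown placeholders in place."""

    def __missing__(self, key):
        return "{" + key + "}"


greeting = render("Ada", 3)
partial = template.format_map(KeepMissing(name="Ada"))

print(greeting)  # Hello, Ada! You have 3 new messages.
print(partial)   # Hello, Ada! You have {count} new messages.
```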

Smith acknowledged some problems with his idea. It would introduce yet another string prefix. Flipping the master switch to enable f-mode would break some code, so there must be a way to gradually enable the change module by module, perhaps like:
from __future__ import all_fstrings
He was concerned the change was so drastic that the Python core developers would never have the nerve to enable it without requiring a future import. If so, the idea should be abandoned right away.

Yarko Tymciurak asked via chat: "How do you describe to beginners what p'why is this needed' is?" Smith conceded that p-strings make the language more complicated, but, he said, "There's going to be very few p's in the wild, and I think their explanation will be fairly obvious."

Several attendees were enthusiastic about making the change. Brett Cannon said that removing the need for the f prefix would make the language easier for beginners.

Larry Hastings pointed out that PHP strings are format strings by default and "the script kids love it." However, he wrote, "It seems to me this is solving the problem of 'oh I forgot to put an f in front of my string', and not noticing until it's too late. Is that problem bad enough that we have to change the language?"

Many agreed that f-strings by default would have been a good idea if Python were beginning from scratch; however, Paul Moore, Guido van Rossum, and others feared the disruption would outweigh the benefits. The group concluded that Smith should send his PEP to the mailing list for further debate.