Significant Configparser Performance Regression #128641

Open
2trvl opened this issue Jan 8, 2025 · 2 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

2trvl commented Jan 8, 2025

Bug report

Bug description:

Hello, @jaraco! Commit 019143f slows down ConfigParser.read() by a factor of 2 to 6.

A _Line object is created for every line read from a file, and the same regular expression backing _strip_inline is compiled again for each object. This accounts for a 60% speed loss. The simplest fix would be to add a __call__ method and, preferably, to create a single _Line object (with an empty string value) when initializing RawConfigParser, or to abandon the _Line object altogether.

Another 40% of the performance loss comes from using cached_property for _Line.clean (10%), writing to _ReadState attributes instead of local variables (15%), and breaking up the previous giant loop into new _handle functions (15%).
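The recompilation cost is easy to see in isolation. Below is a toy micro-benchmark (a sketch with made-up names, not configparser's actual code). Note that re keeps an internal cache of compiled patterns, so the per-call overhead measured here is the pattern-string rebuild and cache lookup rather than a full recompilation:

```python
import re
import timeit

PREFIXES = ("#", ";")
# Compiled once at module level.
PATTERN = re.compile("|".join(re.escape(p) for p in PREFIXES))

def per_object(line: str) -> int:
    # Rebuilds and "compiles" the pattern on every call, as a freshly
    # created _Line-style object would.
    pattern = re.compile("|".join(re.escape(p) for p in PREFIXES))
    m = pattern.search(line)
    return m.start() if m else -1

def precompiled(line: str) -> int:
    # Reuses the single compiled pattern.
    m = PATTERN.search(line)
    return m.start() if m else -1

line = "Exec=/usr/bin/htop  # launch htop"
slow = timeit.timeit(lambda: per_object(line), number=100_000)
fast = timeit.timeit(lambda: precompiled(line), number=100_000)
print(f"recompile: {slow:.3f}s  precompiled: {fast:.3f}s")
```

Both variants find the same comment position; only the setup cost per call differs.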

I discovered this while writing an update to webbrowser: I needed to parse hundreds of small .desktop files. At first I didn't understand why execution time differed between distributions, so I developed three versions of the program:
(M) Multiprocessing
(O) Original Routine
(T) Threading

And measured their performance using timeit:

Linux arch 6.10.6-arch1-1
Parsing 186 files

Python 3.11.11
(M) 5 loops, best of 5: 62.4 msec per loop
(O) 2 loops, best of 5: 110 msec per loop
(T) 2 loops, best of 5: 130 msec per loop

Python 3.12.8
(M) 5 loops, best of 5: 66.5 msec per loop
(O) 2 loops, best of 5: 118 msec per loop
(T) 2 loops, best of 5: 140 msec per loop

Python 3.13.1
(M) 2 loops, best of 5: 125 msec per loop
(O) 1 loop, best of 5: 222 msec per loop
(T) 1 loop, best of 5: 248 msec per loop

Python 3.13.1 Free-threaded
(M) 1 loop, best of 5: 331 msec per loop
(O) 1 loop, best of 5: 648 msec per loop
(T) 1 loop, best of 5: 340 msec per loop

As you can see, the regression between 3.11 and 3.13 is a factor of 2 to 6.

I also attach a small reproducible test case, a module that simply calls read():

import configparser

files = [
    "python3.13.desktop",
    "python3.10.desktop",
    "htop.desktop",
    "byobu.desktop",
    "com.gexperts.Tilix.desktop",
    "snap-handle-link.desktop",
    "io.snapcraft.SessionAgent.desktop",
    "remote-viewer.desktop",
    "python3.12.desktop",
    "google-chrome.desktop",
    "vim.desktop",
    "python3.11.desktop",
    "virt-manager.desktop",
    "info.desktop",
    "ubuntu-desktop-installer_ubuntu-desktop-installer.desktop",
    "firefox_firefox.desktop",
]

def main() -> None:
    parser = configparser.ConfigParser(interpolation=None)

    for shortcut in files:
        try:
            parser.clear()
            if not parser.read(shortcut, encoding="utf-8"):
                continue
        except (UnicodeDecodeError, configparser.Error):
            continue

if __name__ == "__main__":
    main()

Archive with the above mentioned .desktop files:
shortcuts.zip

And a program for generating your own .desktop paths on Linux/BSD:

import glob
import os

XDG_DATA_HOME = os.environ.get(
    "XDG_DATA_HOME", os.path.expanduser("~/.local/share")
)
XDG_DATA_DIRS = os.environ.get(
    "XDG_DATA_DIRS", "/usr/local/share/:/usr/share/"
)
XDG_DATA_DIRS = XDG_DATA_DIRS.split(os.pathsep)

def main() -> list[str]:
    files = []
    for appdata in (XDG_DATA_HOME, *XDG_DATA_DIRS):
        shortcuts = os.path.join(appdata, "applications")
        if not os.path.isdir(shortcuts):
            continue
        shortcuts = os.path.join(shortcuts, "**", "*.desktop")
        files.extend(glob.iglob(shortcuts, recursive=True))
    return files

if __name__ == "__main__":
    print(main())

Just run this example with different interpreters and you will see the difference:

$ python3.11 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.27 msec per loop
$ python3.12 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.33 msec per loop
$ python3.13 -m timeit -s "import nixconfig" "nixconfig.main()"
20 loops, best of 5: 11.2 msec per loop

For now I leave the solution to you, as I have no architectural vision for configparser. However, I am happy to help by proposing solutions in a pull request.

CPython versions tested on:

3.11, 3.12, 3.13

Operating systems tested on:

Linux

@2trvl 2trvl added the type-bug An unexpected behavior, bug, or error label Jan 8, 2025
@sobolevn sobolevn added the stdlib Python modules in the Lib dir label Jan 8, 2025
@eendebakpt
Contributor

@2trvl Thanks for the detailed description and analysis. Adding a cache to the _Line object improves performance significantly. Could you try out this branch: main...eendebakpt:cpython:configparser and report how much of the original performance this restores on your system?
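One shape such a cache could take is sharing a single compiled pattern across all line objects, keyed by the prefix tuple (a hedged sketch of the idea, not the code in the linked branch):

```python
import functools
import re

@functools.lru_cache(maxsize=None)
def _compiled_inline_pattern(prefixes: tuple) -> "re.Pattern":
    # One compiled pattern per distinct prefix tuple, shared by every line.
    return re.compile("|".join(re.escape(p) for p in prefixes))

class CachedLine(str):
    """Toy stand-in for configparser's _Line (assumed shape, not the real class)."""
    def __new__(cls, val, prefixes=("#", ";")):
        self = super().__new__(cls, val)
        self._inline_pattern = _compiled_inline_pattern(tuple(prefixes))
        return self

a = CachedLine("key = value ; comment")
b = CachedLine("other = thing")
print(a._inline_pattern is b._inline_pattern)  # True: compiled once, reused
```

Because lru_cache returns the same compiled object for the same prefix tuple, constructing thousands of line objects no longer rebuilds the pattern each time.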


2trvl commented Jan 9, 2025

@eendebakpt This approach doesn't help much: the performance gain is 8.7%. I'm attaching a patch made in a hurry that gives 33%. A further 7% is the difference between a simple search for the prefix character and re.search.

cpython/Lib/configparser.py

Lines 987 to 1005 in 54f7e14

for lineno, line in enumerate(fp, start=1):
    comment_start = sys.maxsize
    # strip inline comments
    inline_prefixes = {p: -1 for p in self._inline_comment_prefixes}
    while comment_start == sys.maxsize and inline_prefixes:
        next_prefixes = {}
        for prefix, index in inline_prefixes.items():
            index = line.find(prefix, index+1)
            if index == -1:
                continue
            next_prefixes[prefix] = index
            if index == 0 or (index > 0 and line[index-1].isspace()):
                comment_start = min(comment_start, index)
        inline_prefixes = next_prefixes
    # strip full line comments
    for prefix in self._comment_prefixes:
        if line.strip().startswith(prefix):
            comment_start = 0
            break

Ideally, though, _Line and _ReadState would be removed entirely.
configparser1.patch
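The 7% gap between the two strategies can be illustrated with a toy comparison (a sketch, not configparser's actual code; the regex hard-codes the default "#" and ";" prefixes):

```python
import re

INLINE_PREFIXES = ("#", ";")
# A comment prefix counts only at the start of the line or after whitespace.
INLINE_RE = re.compile(r"(?<=\s)[#;]|^[#;]")

def find_comment_start_find(line: str) -> int:
    """str.find-based scan, in the spirit of the pre-3.13 loop quoted above."""
    comment_start = len(line)
    for prefix in INLINE_PREFIXES:
        index = -1
        while True:
            index = line.find(prefix, index + 1)
            if index == -1:
                break
            if index == 0 or line[index - 1].isspace():
                comment_start = min(comment_start, index)
                break
    return comment_start if comment_start < len(line) else -1

def find_comment_start_re(line: str) -> int:
    """Regex-based scan over the same lines."""
    m = INLINE_RE.search(line)
    return m.start() if m else -1

line = "key = value ; trailing comment"
print(find_comment_start_find(line), find_comment_start_re(line))
```

Both return the same position for the same input; the difference is purely in how much work happens per line, which is what the 7% figure above refers to.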
