
[Detail Bug] KeyError in _map_pygments_to_notion_language for unmapped Pygments languages #442

@detail-app

Summary

  • Context: The _map_pygments_to_notion_language function maps Pygments language identifiers to Notion's CodeLang enum values. This function is called whenever Sphinx processes code blocks (literal_block nodes) to determine the syntax highlighting language for Notion.
  • Bug: The function indexes the dictionary directly (language_mapping[pygments_lang.lower()]) instead of calling .get() with a default, so it raises a KeyError for any Pygments language not explicitly defined in the mapping.
  • Actual vs. expected: When a user specifies a valid Pygments language that is not among the ~100 mapped entries (such as "ada", "apl", "alloy", or any of the roughly 500 other Pygments lexers), the function raises a KeyError and the build crashes. The expected behavior is to fall back to CodeLang.PLAIN_TEXT for unknown languages.
  • Impact: Users cannot build documentation with code blocks in any of the hundreds of Pygments languages not explicitly mapped, causing build failures and preventing documentation generation.

Code with bug

@beartype
def _map_pygments_to_notion_language(*, pygments_lang: str) -> CodeLang:
    """
    Map ``Pygments`` language names to Notion CodeLang ``enum`` values.
    """
    language_mapping: dict[str, CodeLang] = {
        "abap": CodeLang.ABAP,
        "arduino": CodeLang.ARDUINO,
        # ... ~100 language mappings ...
        "yaml": CodeLang.YAML,
        "yml": CodeLang.YAML,
    }

    return language_mapping[pygments_lang.lower()]  # <-- BUG 🔴 KeyError for unmapped languages

Evidence

Example

Consider a user documenting Ada code with this reStructuredText:

.. code-block:: ada

   procedure Hello is
   begin
      Put_Line("Hello, World!");
   end Hello;

When Sphinx processes this:

  1. The literal_block node is created with language="ada"
  2. The _process_node_to_blocks function (line 1017) calls _map_pygments_to_notion_language(pygments_lang="ada")
  3. The function tries to access language_mapping["ada"]
  4. Since "ada" is not in the mapping dictionary, Python raises KeyError: 'ada'
  5. The build fails completely

This affects any of the 600+ Pygments lexers that aren't in the ~100 language mapping.
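
The failure mode in the steps above can be reproduced without Sphinx. The following is a minimal sketch, not the project's code: the CodeLang enum and the three-entry mapping are hypothetical stand-ins (including the assumption that "default" maps to PLAIN_TEXT), and map_language_strict and try_map are illustrative names.

```python
from enum import Enum


class CodeLang(Enum):
    """Hypothetical stand-in for the project's CodeLang enum."""

    PLAIN_TEXT = "plain text"
    PYTHON = "python"
    YAML = "yaml"


# Tiny stand-in for the ~100-entry mapping; the values are assumptions.
LANGUAGE_MAPPING: dict[str, CodeLang] = {
    "default": CodeLang.PLAIN_TEXT,
    "python": CodeLang.PYTHON,
    "yaml": CodeLang.YAML,
}


def map_language_strict(pygments_lang: str) -> CodeLang:
    """Mirror the reported behavior: direct indexing with no fallback."""
    return LANGUAGE_MAPPING[pygments_lang.lower()]


def try_map(pygments_lang: str) -> str:
    """Describe what the strict lookup does for one language name."""
    try:
        return f"mapped to {map_language_strict(pygments_lang).name}"
    except KeyError as exc:
        return f"KeyError: {exc}"


if __name__ == "__main__":
    for lang in ("Python", "default", "ada"):
        print(f"{lang!r}: {try_map(lang)}")
```

Mapped names succeed regardless of case, while "ada" surfaces the same KeyError that aborts the build in the traceback below.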

Failing test

Test script

"""Test to reproduce the KeyError bug in _map_pygments_to_notion_language."""

import tempfile
from pathlib import Path
from sphinx.testing.util import SphinxTestApp

# Create a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    srcdir = Path(tmpdir) / "source"
    srcdir.mkdir()

    # Write a minimal conf.py
    (srcdir / "conf.py").write_text("extensions = ['sphinx_notion']\n")

    # Write a test RST file with Ada code (valid Pygments language, not in mapping)
    (srcdir / "index.rst").write_text("""
Test
====

.. code-block:: ada

   procedure Hello is
   begin
      Put_Line("Hello, World!");
   end Hello;
""")

    # Build the docs
    outdir = Path(tmpdir) / "output"
    app = SphinxTestApp(
        buildername="notion",
        srcdir=srcdir,
        builddir=outdir,
    )
    try:
        app.build()
        print("Build succeeded!")
    except KeyError as e:
        print(f"Build failed with KeyError: {e}")
        raise

Test output

Traceback (most recent call last):
  File "/home/user/sphinx-notionbuilder/test_unknown_lang.py", line 36, in <module>
    app.build()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/testing/util.py", line 237, in build
    super().build(force_all, filenames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/application.py", line 426, in build
    self.builder.build_update()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 375, in build_update
    self.build(
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 454, in build
    self.write(docnames, updated_docnames, method)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 735, in write
    self.write_documents(docnames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 749, in write_documents
    self._write_serial(sorted_docnames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 768, in _write_serial
    self.write_doc(docname, doctree)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/text.py", line 69, in write_doc
    self.writer.write(doctree, destination)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/writers/text.py", line 385, in translate
    self.document.walkabout(visitor)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 186, in walkabout
    if child.walkabout(visitor):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 186, in walkabout
    if child.walkabout(visitor):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 178, in walkabout
    visitor.dispatch_visit(self)
  File "<@beartype(sphinx_notion.NotionTranslator.dispatch_visit) at 0x7fe15c7ed8a0>", line 33, in dispatch_visit
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 1804, in dispatch_visit
    blocks = _process_node_to_blocks(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.12.10/lib/python3.12/functools.py", line 912, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 1017, in _
    language = _map_pygments_to_notion_language(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(sphinx_notion._map_pygments_to_notion_language) at 0x7fe15c7a3880>", line 27, in _map_pygments_to_notion_language
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 853, in _map_pygments_to_notion_language
    return language_mapping[pygments_lang.lower()]
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'ada'
Build failed with KeyError: 'ada'

Full context

The _map_pygments_to_notion_language function is called from two locations:

  1. Processing literal blocks (line 1017 in src/sphinx_notion/__init__.py): When converting regular code blocks
  2. Processing captioned code blocks (line 1603 in src/sphinx_notion/__init__.py): When converting code blocks with captions inside containers

Both call sites extract the language from the literal_block node and pass it to _map_pygments_to_notion_language:

pygments_lang = node.get(key="language", failobj="")
language = _map_pygments_to_notion_language(pygments_lang=pygments_lang)

When users write code blocks in their reStructuredText documentation using the .. code-block:: directive, Sphinx processes these into literal_block nodes with a language attribute. If no language is specified, Sphinx sets language="default". The mapping includes "default", so this case works.

However, Pygments ships over 600 lexers (one per supported language or format), while the mapping only includes about 100. Examples of languages not in the mapping include:

  • Ada, APL, Agda, Alloy, Ampl
  • ActionScript, ANTLR
  • Many domain-specific languages
  • Hundreds of others

When any of these unmapped languages is used in a code block, the entire build crashes.

External documentation

Pygments provides 602 different lexers (as of the version used in this project). The documentation lists all available lexers, including their aliases. Any of these can be used in Sphinx .. code-block:: directives, but only ~100 are mapped to Notion languages.
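
One way to see the size of the gap is to compare lexer aliases against the mapping keys. The sketch below is an illustration under assumptions: unmapped_aliases is a hypothetical helper, SAMPLE_LEXERS is hand-written sample data in the (name, aliases, filenames, mimetypes) shape that pygments.lexers.get_all_lexers() yields, and MAPPED_KEYS contains only the four keys visible in the excerpt above.

```python
from collections.abc import Iterable, Sequence

# Same shape as the tuples yielded by pygments.lexers.get_all_lexers():
# (name, aliases, filenames, mimetypes).
LexerInfo = tuple[str, Sequence[str], Sequence[str], Sequence[str]]


def unmapped_aliases(
    lexers: Iterable[LexerInfo],
    mapped_keys: set[str],
) -> dict[str, list[str]]:
    """Return {lexer name: aliases that are missing from the mapping}."""
    missing: dict[str, list[str]] = {}
    for name, aliases, _filenames, _mimetypes in lexers:
        gaps = [alias for alias in aliases if alias.lower() not in mapped_keys]
        if gaps:
            missing[name] = gaps
    return missing


# Hand-written sample data for illustration, not real Pygments output.
SAMPLE_LEXERS: list[LexerInfo] = [
    ("Ada", ("ada",), ("*.adb",), ("text/x-ada",)),
    ("APL", ("apl",), ("*.apl",), ()),
    ("YAML", ("yaml",), ("*.yaml", "*.yml"), ("text/x-yaml",)),
]

# Only the keys shown in the issue's excerpt of the mapping.
MAPPED_KEYS = {"abap", "arduino", "yaml", "yml"}

if __name__ == "__main__":
    for lexer_name, gaps in unmapped_aliases(SAMPLE_LEXERS, MAPPED_KEYS).items():
        print(f"{lexer_name}: {', '.join(gaps)}")
```

With Pygments installed, passing get_all_lexers() and the real mapping's keys in place of the sample data should list every alias that currently triggers the KeyError.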

Why has this bug gone undetected?

This bug was introduced in commit ff71072 ("Remove handling for unknown languages") which intentionally changed the implementation from:

return language_mapping.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)  # <-- FIX 🟢

to:

return language_mapping[pygments_lang.lower()]  # Current buggy code

The bug has gone undetected because:

  1. Test coverage was removed: The parent commit (90f92ad) removed test_code_block_unknown_language, the test that specifically covered handling of unknown languages
  2. Common cases are covered: The commit added "default" and "text" to the mapping, which handles the most common scenarios (no language specified, or explicitly "text")
  3. The mapping covers the popular languages: Most documentation uses languages such as Python, JavaScript, and Bash, all of which are among the ~100 mapped entries, so users rarely hit an unmapped language

The bug only manifests when someone tries to document code in one of the hundreds of less-common languages that Pygments supports but the mapping doesn't include.

Recommended fix

Revert to the previous safe implementation that uses .get() with a default fallback:

return language_mapping.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)  # <-- FIX 🟢

This ensures that any Pygments language not explicitly in the mapping will render as plain text in Notion, rather than crashing the build. This is the appropriate fallback behavior since Notion may not support syntax highlighting for all 600+ Pygments languages.
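
A minimal sketch of the fallback behavior, together with checks in the spirit of the removed test_code_block_unknown_language test (not the original test itself). The CodeLang enum, the mapping contents, and the function and check names here are hypothetical stand-ins for the project's own definitions.

```python
from enum import Enum


class CodeLang(Enum):
    """Hypothetical stand-in for the project's CodeLang enum."""

    PLAIN_TEXT = "plain text"
    PYTHON = "python"
    YAML = "yaml"


# Stand-in for the ~100-entry mapping; the values are assumptions.
_LANGUAGE_MAPPING: dict[str, CodeLang] = {
    "default": CodeLang.PLAIN_TEXT,
    "python": CodeLang.PYTHON,
    "yaml": CodeLang.YAML,
    "yml": CodeLang.YAML,
}


def map_pygments_to_notion_language(*, pygments_lang: str) -> CodeLang:
    """Map a Pygments language name, using plain text for unknown names."""
    return _LANGUAGE_MAPPING.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)


def check_fallback() -> None:
    """Regression-style checks for mapped and unmapped languages."""
    # Unmapped but valid Pygments languages fall back to plain text.
    for unknown in ("ada", "apl", "alloy"):
        result = map_pygments_to_notion_language(pygments_lang=unknown)
        assert result is CodeLang.PLAIN_TEXT
    # Mapped languages are unaffected, and lookup stays case-insensitive.
    assert map_pygments_to_notion_language(pygments_lang="YAML") is CodeLang.YAML
    assert map_pygments_to_notion_language(pygments_lang="yml") is CodeLang.YAML


if __name__ == "__main__":
    check_fallback()
    print("fallback checks passed")
```

Restoring a test along these lines next to the one-line fix would keep the fallback from being removed again unnoticed.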

Metadata


Labels: bug (Something isn't working), detail
