Description
Summary
- Context: The `_map_pygments_to_notion_language` function maps Pygments language identifiers to Notion's `CodeLang` enum values. It is called whenever Sphinx processes a code block (a `literal_block` node) to determine the syntax-highlighting language for Notion.
- Bug: The function uses direct dictionary key access (`language_mapping[pygments_lang.lower()]`) instead of the safe `.get()` method. As a result, it raises a `KeyError` for any Pygments language that is not explicitly defined in the mapping.
- Actual vs. expected: When a user specifies a valid Pygments language that is not in the roughly 100-entry mapping (such as "ada", "apl", "alloy", or any of the 500+ other Pygments lexers), the function raises a `KeyError` and the build crashes. The expected behavior is to fall back to `CodeLang.PLAIN_TEXT` for unknown languages.
- Impact: Users cannot build documentation that contains code blocks in any of the hundreds of Pygments languages that are not explicitly mapped. The build fails and no documentation is generated.
Code with bug
```python
@beartype
def _map_pygments_to_notion_language(*, pygments_lang: str) -> CodeLang:
    """
    Map ``Pygments`` language names to Notion CodeLang ``enum`` values.
    """
    language_mapping: dict[str, CodeLang] = {
        "abap": CodeLang.ABAP,
        "arduino": CodeLang.ARDUINO,
        # ... ~100 language mappings ...
        "yaml": CodeLang.YAML,
        "yml": CodeLang.YAML,
    }
    return language_mapping[pygments_lang.lower()]  # <-- BUG 🔴 KeyError for unmapped languages
```

Evidence
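The failure can be isolated from Sphinx entirely. It comes down to the difference between subscripting a dict and calling `.get()` with a default. The minimal sketch below shows that difference. `CodeLang` here is a hypothetical three-member stand-in for the real enum, and `LANGUAGE_MAPPING`, `lookup_strict` and `lookup_with_fallback` are illustrative names, not part of `sphinx_notion`.

```python
from enum import Enum


class CodeLang(Enum):
    """Hypothetical stand-in for the real CodeLang enum (members assumed)."""

    ABAP = "abap"
    YAML = "yaml"
    PLAIN_TEXT = "plain text"


# Truncated, hypothetical stand-in for the ~100-entry mapping.
LANGUAGE_MAPPING: dict[str, CodeLang] = {
    "abap": CodeLang.ABAP,
    "yaml": CodeLang.YAML,
    "yml": CodeLang.YAML,
}


def lookup_strict(pygments_lang: str) -> CodeLang:
    # Mirrors the current code: subscripting raises KeyError for unmapped keys.
    return LANGUAGE_MAPPING[pygments_lang.lower()]


def lookup_with_fallback(pygments_lang: str) -> CodeLang:
    # Mirrors the proposed fix: unmapped keys fall back to plain text.
    return LANGUAGE_MAPPING.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)


try:
    lookup_strict("ada")
except KeyError as exc:
    print(f"strict lookup failed: KeyError {exc}")  # KeyError 'ada'

print(lookup_with_fallback("ada"))  # CodeLang.PLAIN_TEXT
print(lookup_with_fallback("YML"))  # CodeLang.YAML
```

Both helpers lowercase the input first, so "YML" and "yml" resolve to the same entry. Only the handling of a missing key differs.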
Example
Consider a user documenting Ada code with this reStructuredText:
```rst
.. code-block:: ada

   procedure Hello is
   begin
      Put_Line("Hello, World!");
   end Hello;
```

When Sphinx processes this:
- The `literal_block` node is created with `language="ada"`.
- The `_process_node_to_blocks` function (line 1017 of `src/sphinx_notion/__init__.py`) calls `_map_pygments_to_notion_language(pygments_lang="ada")`.
- The function tries to access `language_mapping["ada"]`.
- Since "ada" is not in the mapping dictionary, Python raises `KeyError: 'ada'`.
- The build fails completely.
This affects any of the 600+ Pygments lexers that aren't in the ~100 language mapping.
Failing test
Test script
```python
"""Test to reproduce the KeyError bug in _map_pygments_to_notion_language."""
import tempfile
from pathlib import Path

from sphinx.testing.util import SphinxTestApp

# Create a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    srcdir = Path(tmpdir) / "source"
    srcdir.mkdir()

    # Write a minimal conf.py
    (srcdir / "conf.py").write_text("extensions = ['sphinx_notion']\n")

    # Write a test RST file with Ada code (valid Pygments language, not in mapping)
    (srcdir / "index.rst").write_text("""
Test
====

.. code-block:: ada

   procedure Hello is
   begin
      Put_Line("Hello, World!");
   end Hello;
""")

    # Build the docs
    outdir = Path(tmpdir) / "output"
    app = SphinxTestApp(
        buildername="notion",
        srcdir=srcdir,
        builddir=outdir,
    )
    try:
        app.build()
        print("Build succeeded!")
    except KeyError as e:
        print(f"Build failed with KeyError: {e}")
        raise
```

Test output
```text
Traceback (most recent call last):
  File "/home/user/sphinx-notionbuilder/test_unknown_lang.py", line 36, in <module>
    app.build()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/testing/util.py", line 237, in build
    super().build(force_all, filenames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/application.py", line 426, in build
    self.builder.build_update()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 375, in build_update
    self.build(
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 454, in build
    self.write(docnames, updated_docnames, method)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 735, in write
    self.write_documents(docnames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 749, in write_documents
    self._write_serial(sorted_docnames)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/__init__.py", line 768, in _write_serial
    self.write_doc(docname, doctree)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/builders/text.py", line 69, in write_doc
    self.writer.write(doctree, destination)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/sphinx/writers/text.py", line 385, in translate
    self.document.walkabout(visitor)
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 186, in walkabout
    if child.walkabout(visitor):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 186, in walkabout
    if child.walkabout(visitor):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/.venv/lib/python3.12/site-packages/docutils/nodes.py", line 178, in walkabout
    visitor.dispatch_visit(self)
  File "<@beartype(sphinx_notion.NotionTranslator.dispatch_visit) at 0x7fe15c7ed8a0>", line 33, in dispatch_visit
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 1804, in dispatch_visit
    blocks = _process_node_to_blocks(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.12.10/lib/python3.12/functools.py", line 912, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 1017, in _
    language = _map_pygments_to_notion_language(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(sphinx_notion._map_pygments_to_notion_language) at 0x7fe15c7a3880>", line 27, in _map_pygments_to_notion_language
  File "/home/user/sphinx-notionbuilder/src/sphinx_notion/__init__.py", line 853, in _map_pygments_to_notion_language
    return language_mapping[pygments_lang.lower()]
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'ada'
Build failed with KeyError: 'ada'
```
Full context
The _map_pygments_to_notion_language function is called from two locations:
- Processing literal blocks (line 1017 in `src/sphinx_notion/__init__.py`): when converting regular code blocks
- Processing captioned code blocks (line 1603 in `src/sphinx_notion/__init__.py`): when converting code blocks with captions inside containers
Both call sites extract the language from the literal_block node and pass it to _map_pygments_to_notion_language:
```python
pygments_lang = node.get(key="language", failobj="")
language = _map_pygments_to_notion_language(pygments_lang=pygments_lang)
```

When users write code blocks in their reStructuredText documentation using the `.. code-block::` directive, Sphinx turns them into `literal_block` nodes with a `language` attribute. If no language is specified, Sphinx sets `language="default"` (the default value of its `highlight_language` option). The mapping includes "default", so this case works.
However, Pygments provides over 600 lexers (one per supported language or format), while the mapping covers only about 100 names. Examples of languages NOT in the mapping include:
- Ada, APL, Agda, Alloy, Ampl
- ActionScript, ANTLR
- Many domain-specific languages
- Hundreds of others
When any of these unmapped languages are used in a code block, the entire build crashes.
External documentation
Pygments provides 602 different lexers (as of the version used in this project). The documentation lists all available lexers, including their aliases. Any of these can be used in Sphinx .. code-block:: directives, but only ~100 are mapped to Notion languages.
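To see concretely which lexers lack a Notion mapping, one option is to compare every Pygments alias against the mapping's keys. The sketch below assumes Pygments is installed. It relies on `pygments.lexers.get_all_lexers()`, which yields `(name, aliases, filenames, mimetypes)` tuples. `unmapped_lexers` and `all_pygments_lexers` are hypothetical helper names, and `mapping_keys` would be the keys of `language_mapping`.

```python
from collections.abc import Iterable


def unmapped_lexers(
    lexers: Iterable[tuple[str, tuple[str, ...]]],
    mapping_keys: Iterable[str],
) -> list[str]:
    """Return the names of lexers that have no alias in mapping_keys."""
    keys = {key.lower() for key in mapping_keys}
    return sorted(
        name
        for name, aliases in lexers
        # Lexers without aliases cannot be named in a code-block directive,
        # so skip them; a lexer counts as covered if any alias is mapped.
        if aliases and not any(alias.lower() in keys for alias in aliases)
    )


def all_pygments_lexers() -> list[tuple[str, tuple[str, ...]]]:
    """Collect (name, aliases) for every installed Pygments lexer."""
    # Imported lazily so unmapped_lexers() works without Pygments installed.
    from pygments.lexers import get_all_lexers

    return [
        (name, tuple(aliases))
        for name, aliases, _filenames, _mimetypes in get_all_lexers()
    ]
```

For example, `unmapped_lexers(all_pygments_lexers(), language_mapping.keys())` would list every lexer that currently triggers the `KeyError`. Checking all aliases, rather than only the first, avoids reporting a lexer as unmapped when the mapping happens to use a secondary alias such as "yml".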
Why has this bug gone undetected?
This bug was introduced in commit ff71072 ("Remove handling for unknown languages") which intentionally changed the implementation from:
```python
return language_mapping.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)  # <-- FIX 🟢
```

to:

```python
return language_mapping[pygments_lang.lower()]  # Current buggy code
```

The bug has gone undetected because:
- Test coverage was removed: the same commit's parent (`90f92ad`) removed the test `test_code_block_unknown_language`, which specifically tested handling of unknown languages.
- Common cases are covered: the commit added `"default"` and `"text"` to the mapping, which handles the most common scenarios (no language specified, or explicitly "text").
- Limited language usage: most documentation uses popular languages (Python, JavaScript, Bash, etc.), which are all in the mapping.
- The mapping is comprehensive for popular languages: The ~100 languages in the mapping cover most common use cases, so users rarely encounter unmapped languages
The bug only manifests when someone tries to document code in one of the hundreds of less-common languages that Pygments supports but the mapping doesn't include.
Recommended fix
Revert to the previous safe implementation that uses .get() with a default fallback:
```python
return language_mapping.get(pygments_lang.lower(), CodeLang.PLAIN_TEXT)  # <-- FIX 🟢
```

This ensures that any Pygments language not explicitly in the mapping renders as plain text in Notion instead of crashing the build. Plain text is the appropriate fallback because Notion supports syntax highlighting for far fewer languages than the 600+ that Pygments provides.