How to remove HTML comments with PyQuery? #256

@mywaiting

We know that PyQuery is an implementation that imitates jQuery, which is written in JavaScript, and I am well aware that jQuery has no direct way to clear HTML comments, because a CSS selector cannot match HTML comments, so there is nothing to select and delete.

However, PyQuery is built on top of lxml, which has comprehensive XPath support and can select and process any node in an XML document, including comments.

You can extend PyQuery like this:

import pyquery


class MyPyQuery(pyquery.PyQuery):
    def xpath(self, xpath):
        """XPath counterpart of find(): CSS selectors cannot match HTML
        comments, so this method accepts a raw XPath expression instead.
        """
        # Note: this is the only difference from the native PyQuery.find()
        # method, which first converts a CSS selector to XPath:
        # xpath = self._css_to_xpath(selector)
        # An absolute expression such as "//comment()" is evaluated once per
        # child, so the same node can appear more than once in the results.
        results = [child.xpath(xpath, namespaces=self.namespaces)
                   for tag in self
                   for child in tag.getchildren()]
        # Flatten the results
        elements = []
        for r in results:
            elements.extend(r)
        return self._copy(elements, parent=self)

With this subclass you can remove all HTML comments like this:

d = MyPyQuery("<p>hello</p><!-- comments() here --><p>world</p>")
d.xpath("//comment()").remove()
print(d.html())
# <p>hello</p><p>world</p>

Although PyQuery is designed to fully mimic jQuery, internally it uses the cssselect package to convert CSS selectors to XPath, so I think it makes sense to keep an entry point for working with XPath directly in PyQuery.

I am just providing an example for everyone's reference. If necessary, I can provide a PR so that the package supports both XPath selectors and CSS selectors.

Thx
