We know that PyQuery imitates jQuery (which is written in JavaScript), and I am well aware that jQuery has no direct way to clear HTML comments. CSS selectors cannot match comment nodes at all, so HTML comments cannot be found and deleted with a selector.
However, PyQuery is built on top of lxml, which has full XPath 1.0 support for querying and processing XML and HTML documents, including comment nodes.
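You can extend PyQuery like this:

import pyquery


class MyPyQuery(pyquery.PyQuery):
    def xpath(self, xpath):
        """XPath-based variant of find().

        CSS selectors cannot match HTML comment nodes, so they cannot be used
        to find and delete them; an XPath expression such as //comment() can.
        """
        # Note: this is the only difference from pyquery's native find() method,
        # which first translates a CSS selector into XPath:
        # xpath = self._css_to_xpath(selector)
        results = [child.xpath(xpath, namespaces=self.namespaces)
                   for tag in self
                   for child in tag]
        # Flatten the results, skipping duplicates: an absolute expression like
        # //comment() returns the same nodes when evaluated from every child.
        elements = []
        seen = set()
        for r in results:
            for el in r:
                if el not in seen:
                    seen.add(el)
                    elements.append(el)
        return self._copy(elements, parent=self)

Usage: you can then remove all HTML comments like this: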
d = MyPyQuery("<p>hello</p><!-- comments() here --><p>world</p>")
d.xpath("//comment()").remove()
print(d.html())
# <p>hello</p><p>world</p>

Although PyQuery is designed to closely mimic jQuery, it uses the cssselect package to translate CSS selectors into XPath. Still, I think it makes sense for PyQuery to keep an entry point for working with XPath directly.
I am just providing an example for everyone's reference. If there is interest, I can open a PR so that the package supports both XPath selectors and CSS selectors.
Thx