Using the Python lxml module – fengyqf's blog

Parsing HTML documents with XPath

Parse an HTML document from a string and return its root node:

lxml.etree.HTML(text, parser=None, base_url=None)

Parses an HTML document from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed “HTML literals” in Python code.

To override the parser with a different HTMLParser you can pass it to the parser keyword argument.

The base_url keyword argument allows to set the original base URL of the document to support relative Paths when looking up external entities (DTD, XInclude, …).
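The two keyword arguments described above can be tried out roughly as follows. This is only a sketch: the markup and the example.com URL are made up for illustration, and the parser options shown (remove_comments, remove_blank_text) are just two of the settings HTMLParser accepts.

```python
from lxml import etree

# Hypothetical input: markup containing a comment and stray whitespace.
text = "<html><body><!-- note --><p>hello</p>  </body></html>"

# Override the default parser with a customised HTMLParser.
parser = etree.HTMLParser(remove_comments=True, remove_blank_text=True)

# base_url is a made-up address; it records where the document "came from".
root = etree.HTML(text, parser=parser, base_url="http://example.com/page.html")

paragraphs = root.xpath("//p/text()")   # text of every <p>
comments = root.xpath("//comment()")    # comment nodes left in the tree
print(paragraphs, comments)
print(root.base)                        # base URL recorded for the document
```

Passing a configured parser this way keeps the parsing options in one place, so the same parser object can be reused for every page that is fetched.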

Documentation: https://lxml.de/apidoc/lxml.etree.html#lxml.etree.HTML

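# Typical use case: the HTML string returned by requests, i.e. resp.text
from lxml import etree

html = '<html><head><title>a-test-page</title></head><body><li>line1</li><li>line22</li></body></html>'
tree = etree.HTML(html)
tree.xpath('//title/text()')   # list of text nodes: ['a-test-page']
tree.xpath('//li')             # list of the two <li> Element objects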
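xpath() always returns a list, and a query that selects elements (rather than text()) yields Element objects whose text still has to be read out. A small sketch of that follow-up step, reusing the same sample page; the variable names are only illustrative:

```python
from lxml import etree

html = ('<html><head><title>a-test-page</title></head>'
        '<body><li>line1</li><li>line22</li></body></html>')
tree = etree.HTML(html)

# text() queries return a list of strings; guard against an empty result.
titles = tree.xpath('//title/text()')
first_title = titles[0] if titles else None

# Element queries return Element objects; read .text from each one.
items = tree.xpath('//li')
item_texts = [li.text for li in items]

# A positional predicate picks one element (XPath indexes start at 1).
second = tree.xpath('//li[2]/text()')

print(first_title, item_texts, second)
```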

Parsing XML documents with XPath
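A minimal sketch of the XML counterpart, assuming the document is already available as a string. etree.fromstring() parses it with the XML parser and returns the root element, which supports the same xpath() method. The catalog document below is invented for illustration.

```python
from lxml import etree

# Hypothetical XML document used only for this example.
xml = (b"<catalog>"
       b"<book id='b1'><title>First</title></book>"
       b"<book id='b2'><title>Second</title></book>"
       b"</catalog>")

root = etree.fromstring(xml)   # XML parser, returns the root element

titles = root.xpath("//book/title/text()")            # all title strings
ids = root.xpath("//book/@id")                        # attribute values
second = root.xpath("//book[@id='b2']/title/text()")  # filter by attribute

print(titles, ids, second)
```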

Last Updated on 2024/11/12
