Parsing HTML documents with XPath
Parse an HTML document from a string and return the root node
lxml.etree.HTML(text, parser=None, base_url=None)
Parses an HTML document from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed “HTML literals” in Python code.
To override the parser with a different HTMLParser you can pass it to the parser keyword argument.
The base_url keyword argument lets you set the original base URL of the document, to support relative paths when looking up external entities (DTD, XInclude, …).
Documentation: https://lxml.de/apidoc/lxml.etree.html#lxml.etree.HTML
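# Typical use case: an HTML string returned by requests (resp.text)
from lxml import etree

html = '<html><head><title>a-test-page</title><body><li>line1</li><li>line22</li></body></html>'
tree = etree.HTML(html)
tree.xpath('//title/text()')  # list of the text nodes inside <title>
tree.xpath('//li')  # list of the <li> elements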
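The parser and base_url keyword arguments described above can be sketched as follows. This is a minimal illustration, not the only way to do it: the HTML string (with an added comment node) and the example.com address are made up for the example, and the chosen HTMLParser options are just one possible configuration.

```python
from lxml import etree

# Sample markup, invented for illustration; it contains one HTML comment.
html = ('<html><head><title>a-test-page</title></head>'
        '<body><!-- note --><li>line1</li><li>line22</li></body></html>')

# Override the default parser: this one drops comments while parsing.
parser = etree.HTMLParser(remove_comments=True)

# base_url is a hypothetical address, only relevant for resolving relative paths.
tree = etree.HTML(html, parser=parser, base_url='http://example.com/page.html')

title = tree.xpath('//title/text()')             # ['a-test-page']
items = [li.text for li in tree.xpath('//li')]   # ['line1', 'line22']
comments = tree.xpath('//comment()')             # [] because comments were removed
```

Note that xpath() always returns a list here, so a single title still has to be taken out with title[0].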
Parsing XML documents with XPath