
What is XPath?


XPath (XML Path Language) is a W3C-standardized query language for selecting nodes and computing values from XML and HTML documents using path-like expressions. XPath 1.0 was published in 1999 and the current version is 3.1 (2017), although browsers and many libraries, including lxml, implement only XPath 1.0. XPath is used extensively in XSLT transformations, web scraping with libraries like lxml and Scrapy, browser test automation (Selenium and Playwright), and document processing pipelines.

Path expressions

XPath uses path-like syntax to navigate the document tree:

/bookstore/book/title        → Select all <title> elements under <book> under <bookstore>
//title                      → Select all <title> elements anywhere in the document
/bookstore/book[1]           → Select the first <book> element
/bookstore/book[last()]      → Select the last <book> element
/bookstore/book[@category]   → Select <book> elements that have a "category" attribute

The / separator navigates child elements (like a file path). // selects descendants at any depth.
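
The path expressions above can be tried out with Python's standard library. This is a minimal sketch, assuming a small hypothetical bookstore document; note that xml.etree.ElementTree implements only a subset of XPath and evaluates paths relative to an element, so /bookstore/book is written as ./book from the <bookstore> root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<bookstore>
  <book category="fiction"><title>Dune</title></book>
  <book><title>Learning Python</title></book>
  <book category="reference"><title>XML Basics</title></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book/title, written relative to the root element
titles = [t.text for t in root.findall("./book/title")]

# //title: <title> elements at any depth
all_titles = [t.text for t in root.findall(".//title")]

# /bookstore/book[1] and /bookstore/book[last()]
first = root.find("./book[1]/title").text
last = root.find("./book[last()]/title").text

# /bookstore/book[@category]: only books that carry the attribute
with_category = [b.get("category") for b in root.findall("./book[@category]")]

print(titles, first, last, with_category)
```

Positions in XPath are 1-based, so book[1] is the first <book>, not the second.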

Predicates and filters

Square brackets filter nodes by condition:

//book[@category='fiction']           → Books with category "fiction"
//book[price > 30]                    → Books with price greater than 30
//book[contains(title, 'Python')]     → Books with "Python" in the title
//div[@class='content']//p            → All <p> elements inside <div class="content">
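
To make the predicate semantics concrete, here is a hedged sketch using Python's standard library on hypothetical sample data. ElementTree evaluates the attribute-equality predicate directly, but it does not implement numeric comparisons or contains(), so those two are emulated in plain Python rather than run as XPath. A full XPath 1.0 engine would accept the expressions as written.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for illustration.
XML = """
<bookstore>
  <book category="fiction"><title>Dune</title><price>25</price></book>
  <book category="tech"><title>Learning Python</title><price>45</price></book>
  <book category="tech"><title>XML Basics</title><price>35</price></book>
</bookstore>
"""

root = ET.fromstring(XML)
books = root.findall(".//book")

# //book[@category='fiction'] is supported directly by ElementTree.
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# //book[price > 30], emulated: XPath would convert the <price> text to a number.
expensive = [b.findtext("title") for b in books if float(b.findtext("price")) > 30]

# //book[contains(title, 'Python')], emulated with a substring test.
python_books = [b.findtext("title") for b in books if "Python" in b.findtext("title")]

print(fiction, expensive, python_books)
```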

Axes

XPath axes select nodes relative to the current node:

  • parent::* selects the parent element
  • ancestor::div selects all ancestor <div> elements
  • following-sibling::* selects all following sibling elements
  • preceding::* selects all elements that end before the current node in document order (ancestors are excluded)
  • descendant-or-self::* selects the current element and all descendant elements

Short forms: .. for parent::node(), . for self::node(), @ for attribute::, and // for /descendant-or-self::node()/.
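
The sketch below illustrates what two of these axes return. It is an emulation, not an XPath evaluation: ElementTree supports the .. shorthand but not named axes, so the helper functions ancestors and following_siblings (hypothetical names) walk a child-to-parent map instead. The sample fragment is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment, used only for illustration.
XML = """
<div id="outer">
  <div id="inner">
    <p id="a">first</p>
    <p id="b">second</p>
    <p id="c">third</p>
  </div>
</div>
"""

root = ET.fromstring(XML)

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem, tag=None):
    """Emulate ancestor::tag, returning the nearest ancestor first."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        if tag is None or elem.tag == tag:
            found.append(elem)
    return found

def following_siblings(elem):
    """Emulate following-sibling::* for an element."""
    parent = parent_of.get(elem)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(elem) + 1:]

a = root.find(".//p[@id='a']")
ancestor_ids = [e.get("id") for e in ancestors(a, "div")]
following_ids = [e.get("id") for e in following_siblings(a)]

# The .. shorthand is supported natively when it is part of a path from the root.
parent_id = root.find(".//p[@id='a']/..").get("id")

print(ancestor_ids, following_ids, parent_id)
```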

XPath in web scraping and testing

Selenium, Playwright, and Puppeteer all support XPath selectors for locating elements:

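# Selenium (Python); assumes `driver` is an existing WebDriver instance
from selenium.webdriver.common.by import By
submit_button = driver.find_element(By.XPATH, "//button[@id='submit']")

# Playwright (Python); assumes `page` is an open Page
email_input = page.locator("xpath=//input[@name='email']")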

While CSS selectors handle most cases, XPath is more powerful for:

  • Selecting elements by text content: //button[text()='Submit']
  • Navigating upward: //span[@class='error']/parent::div
  • Complex conditions: //tr[td[1]='Active' and td[3] > 100]
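
Two of these cases can be sketched without a browser using Python's standard library. The form markup is hypothetical, and ElementTree's XPath subset differs slightly from the expressions above: the text test is spelled [.='Submit'] (Python 3.7+), which compares the element's full text content, and the parent step uses the .. shorthand instead of parent::div. Boolean conditions such as the <tr> example are outside ElementTree's subset and are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup, used only for illustration.
XML = """
<form>
  <div id="email-field"><input name="email"/><span class="error">Required</span></div>
  <div id="actions"><button>Cancel</button><button>Submit</button></div>
</form>
"""

root = ET.fromstring(XML)

# //button[text()='Submit'], written in ElementTree's supported form
submit_buttons = root.findall(".//button[.='Submit']")

# //span[@class='error']/parent::div, using the .. shorthand for the parent step
error_container = root.find(".//span[@class='error']/..")

print([b.text for b in submit_buttons], error_container.get("id"))
```

Note that @class='error' is an exact string comparison, so it would not match an element whose class attribute is "error visible".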

XPath vs. CSS selectors

CSS selectors are shorter and more readable for simple selections. XPath is more expressive: it can traverse upward (parent, ancestor), filter by text content, and evaluate complex boolean conditions. In browsers, CSS selectors are generally faster.

XPath vs. JSONPath

JSONPath brings XPath-like querying to JSON documents. The syntax differs ($.store.book[*].title vs /store/book/title), but the concept is the same: navigate a tree structure with path expressions.
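
As a rough illustration of the shared idea, the sketch below walks the same hypothetical store data in both formats. No JSONPath library is used: the list comprehension simply stands in for what $.store.book[*].title would select, and the XPath side uses ElementTree with the path written relative to the <store> root.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical equivalent documents, used only for illustration.
XML = "<store><book><title>Dune</title></book><book><title>Emma</title></book></store>"
JSON = '{"store": {"book": [{"title": "Dune"}, {"title": "Emma"}]}}'

# XPath /store/book/title, written relative to the <store> root element
xml_titles = [t.text for t in ET.fromstring(XML).findall("./book/title")]

# JSONPath $.store.book[*].title, emulated by walking the parsed JSON by hand
json_titles = [book["title"] for book in json.loads(JSON)["store"]["book"]]

print(xml_titles, json_titles)
```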

Format XML documents with the XML Formatter, test JSONPath expressions with the JSON Path Tester, or convert between formats with XML to JSON.