XPath (XML Path Language) is a query language for selecting nodes and computing values from XML and HTML documents using path expressions. It’s a W3C standard used extensively in XSLT transformations, web scraping, test automation (Selenium), and XML processing.
Path expressions
XPath uses path-like syntax to navigate the document tree:
/bookstore/book/title → Select all <title> elements under <book> under <bookstore>
//title → Select all <title> elements anywhere in the document
/bookstore/book[1] → Select the first <book> element
/bookstore/book[last()] → Select the last <book> element
/bookstore/book[@category] → Select <book> elements that have a "category" attribute
The / separator navigates child elements (like a file path). // selects descendants at any depth.
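The path expressions above can be tried with Python's standard library. This is a minimal sketch, not a full XPath engine: `xml.etree.ElementTree` implements only a subset of XPath, and its paths are written relative to the element you search from (`./book` rather than `/bookstore/book`). The sample document and its titles are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book category="fiction"><title>Dune</title><price>35.00</price></book>
  <book category="tech"><title>Learning Python</title><price>45.50</price></book>
  <book><title>Untitled Notes</title><price>12.00</price></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book/title, written relative to the root element
titles = [t.text for t in root.findall("./book/title")]

# //title: <title> elements at any depth
all_titles = [t.text for t in root.findall(".//title")]

# /bookstore/book[1] and /bookstore/book[last()] (positions are 1-based)
first_book = root.find("./book[1]")
last_book = root.find("./book[last()]")

# /bookstore/book[@category]: books that have a category attribute
with_category = root.findall("./book[@category]")

print(titles)
print(first_book.findtext("title"), "|", last_book.findtext("title"))
print(len(with_category))
```

For expressions outside this subset, a full XPath 1.0 implementation such as the third-party lxml package is the usual choice.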
Predicates and filters
Square brackets filter nodes by condition:
//book[@category='fiction'] → Books with category "fiction"
//book[price > 30] → Books with price greater than 30
//book[contains(title, 'Python')] → Books with "Python" in the title
//div[@class='content']//p → All <p> elements inside <div class="content">
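As a rough illustration of what these predicates select, the sketch below uses Python's standard library. Note the assumption: `xml.etree.ElementTree` accepts equality predicates such as `[@category='fiction']`, but not numeric comparisons or `contains()`, so those two filters are re-expressed as ordinary Python conditions here. The sample data is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book category="fiction"><title>Dune</title><price>35.00</price></book>
  <book category="tech"><title>Learning Python</title><price>45.50</price></book>
  <book><title>Untitled Notes</title><price>12.00</price></book>
</bookstore>
"""

root = ET.fromstring(XML)

# //book[@category='fiction']: supported directly by ElementTree
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# //book[price > 30]: not supported by ElementTree, so filter in Python
expensive = [
    b.findtext("title")
    for b in root.iter("book")
    if float(b.findtext("price")) > 30
]

# //book[contains(title, 'Python')]: same approach, substring test in Python
python_books = [
    b.findtext("title")
    for b in root.iter("book")
    if "Python" in b.findtext("title", default="")
]

print(fiction, expensive, python_books)
```

With a full XPath 1.0 engine the three expressions could be passed through unchanged. The Python filters only mirror their meaning.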
Axes
XPath axes select nodes relative to the current node:
- parent::* → the parent element
- ancestor::div → all ancestor <div> elements
- following-sibling::* → all following sibling elements
- preceding::* → all elements that appear before the current node in document order, excluding its ancestors
- descendant-or-self::* → the current node and all of its descendant elements
Short forms: .. for parent, . for self, @ for attributes.
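To make the axes concrete, here is a sketch that emulates three of them in plain Python. It is not an XPath evaluator: `xml.etree.ElementTree` has no named axes (only `..`), so the helpers `ancestors` and `following_siblings` below are hypothetical stand-ins built on a child-to-parent map, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
XML = (
    "<html><body><div id='outer'><div id='inner'>"
    "<p id='a'/><p id='b'/><p id='c'/>"
    "</div></div></body></html>"
)
root = ET.fromstring(XML)

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Emulate ancestor::* by walking parent links, nearest ancestor first."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def following_siblings(node):
    """Emulate following-sibling::* using the parent's child list."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


b = root.find(".//p[@id='b']")

parent_id = parent_of[b].get("id")                                        # parent::*
ancestor_divs = [n.get("id") for n in ancestors(b) if n.tag == "div"]     # ancestor::div
after_b = [n.get("id") for n in following_siblings(b)]                    # following-sibling::*
self_and_descendants = [n.get("id") for n in b.iter()]                    # descendant-or-self::*

print(parent_id, ancestor_divs, after_b, self_and_descendants)
```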
XPath in web scraping and testing
Selenium, Playwright, and Puppeteer all support XPath selectors for locating elements:
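# Selenium (Python); assumes an existing WebDriver instance named driver
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//button[@id='submit']")
# Playwright (Python); assumes an existing Page object named page
page.locator("xpath=//input[@name='email']")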
While CSS selectors handle most cases, XPath is more powerful for:
- Selecting elements by text content: //button[text()='Submit']
- Navigating upward: //span[@class='error']/parent::div
- Complex conditions: //tr[td[1]='Active' and td[3] > 100]
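The three cases can be approximated offline with Python's standard library, without a browser. This is a sketch under several assumptions: the markup is invented and happens to be well-formed XML (real HTML often is not), `[.='Submit']` is ElementTree's closest equivalent to `text()='Submit'`, `..` stands in for `parent::div`, and the compound row condition is written as a hypothetical Python helper because ElementTree does not support `and` or numeric comparisons.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for this illustration.
HTML = """
<form>
  <div id="field"><span class="error">Required</span></div>
  <table>
    <tr><td>Active</td><td>alpha</td><td>150</td></tr>
    <tr><td>Active</td><td>beta</td><td>20</td></tr>
    <tr><td>Paused</td><td>gamma</td><td>300</td></tr>
  </table>
  <button>Cancel</button>
  <button>Submit</button>
</form>
"""
root = ET.fromstring(HTML)

# //button[text()='Submit'], using ElementTree's [.='text'] form (Python 3.7+)
submit = root.find(".//button[.='Submit']")

# //span[@class='error']/parent::div, using the .. abbreviation
error_parent = root.find(".//span[@class='error']/..")


def is_active_over_100(tr):
    """Mirror //tr[td[1]='Active' and td[3] > 100]; assumes numeric third cells."""
    cells = tr.findall("td")
    return len(cells) >= 3 and cells[0].text == "Active" and float(cells[2].text) > 100


matching_rows = [tr.findall("td")[1].text for tr in root.iter("tr") if is_active_over_100(tr)]

print(submit.text, error_parent.tag, error_parent.get("id"), matching_rows)
```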
XPath vs. CSS selectors
CSS selectors are shorter and more readable for simple selections. XPath is more expressive — it can traverse upward (parent, ancestor), filter by text content, and evaluate complex boolean conditions. Performance-wise, CSS selectors are generally faster in browsers.
XPath vs. JSONPath
JSONPath brings XPath-like querying to JSON documents. The syntax differs ($.store.book[*].title vs /store/book/title), but the concept is the same: navigate a tree structure with path expressions.
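The parallel can be seen by running the same query against both formats. In this sketch the XML side uses ElementTree's relative form of /store/book/title. Python's standard library has no JSONPath engine, so the JSON side is a plain list comprehension that mirrors what $.store.book[*].title selects. Both sample documents are invented.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical equivalent documents, invented for this illustration.
xml_root = ET.fromstring(
    "<store><book><title>Dune</title></book><book><title>Emma</title></book></store>"
)
json_doc = json.loads('{"store": {"book": [{"title": "Dune"}, {"title": "Emma"}]}}')

# XPath: /store/book/title (relative to the <store> root element)
xml_titles = [t.text for t in xml_root.findall("./book/title")]

# JSONPath: $.store.book[*].title, expressed as ordinary dict/list navigation
json_titles = [book["title"] for book in json_doc["store"]["book"]]

print(xml_titles, json_titles)
```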