The Power of XPath for SEO

Most SEOs understand that there will always be times when doing the job requires tedious work, the digital version of "manual labor." We rely on tools like Majestic and Moz whenever we need to pull data quickly, but sometimes even the best of these applications falls short of what is required. This is a common problem in SEO: something works for one website but won't work for another, because no two sites are constructed the same way. There is never a "one size fits all" in SEO, so the automation will never be perfect. Shouldn't there be some way to customize data collection for the specific website we are working on? That's where XPath comes in: a solution that lets SEOs build their own tools and make their jobs far more efficient.

Now you may be asking yourself: what is XPath, and how can it make my life easier? XPath (XML Path Language) is a query language for selecting the pieces of an XML document that you want to pull data from. Because HTML pages share the same kind of tree structure that XPath queries, essentially any website can be "scraped" for the information you want based on the query you write in XPath.

There are many operators and functions you can use within the language to create more complicated data scrapes, but the basic syntax is:

=ImportXML(URL, "//TargetPath[@name='Identifier']")

TargetPath is the specific element you want to pull from the XML/HTML code (i.e., h1, title, div, etc.). The Identifier is some characteristic that differentiates this element from other data following the same TargetPath. Note that you do not always need a secondary Identifier, and more advanced queries may require additional data. To use XPath, create a Google spreadsheet with a column for the URL and columns for any data you are trying to retrieve from the specified page(s).
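You can also test the same TargetPath/Identifier pattern outside of a spreadsheet. Here is a minimal sketch using Python's built-in xml.etree.ElementTree, which supports a subset of XPath; the page snippet and the `name` values in it are invented for illustration (ImportXML would fetch a live URL instead):

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a page (ImportXML fetches the real page for you).
html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <div name="intro">Welcome</div>
    <div name="footer">Goodbye</div>
  </body>
</html>
"""

root = ET.fromstring(html)

# //div[@name='intro'] : TargetPath is div, the Identifier narrows it to one element
intro = root.find(".//div[@name='intro']")
print(intro.text)  # Welcome
```

Without the `[@name='intro']` Identifier, the path `//div` would match both div elements, which is exactly why an Identifier is needed when the same tag appears more than once.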

[Image 1]

So how do you use this for SEO?

There are endless ways to use this tool in both SEO and social media if you understand the syntax of the code, but here is my list of top uses for XPath:

1) Pull data for title tags, meta descriptions, and header tags for a list of URLs and/or QA Implementation of a Keyword Map.

Title Tags

=ImportXML("http://www.example.com", "//title[1]")

Or, if pulling in from a column of URLs

=ImportXML(CellWithURL, "//title[1]")

Meta Descriptions

=ImportXML("http://www.example.com/", "//meta[@name='description']/@content")

Or, if pulling in from a column of URLs

=ImportXML(CellWithURL, "//meta[@name='description']/@content")

Header Tags

=ImportXML("http://www.example.com/", "//h1")

Or, if pulling in from a column of URLs

=ImportXML(CellWithURL, "//h1")
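The three paths above can be tried locally the same way. This sketch mirrors //title[1], //meta[@name='description']/@content, and //h1 against an invented, well-formed page snippet (real HTML is often messier, which is where the troubleshooting comes in):

```python
import xml.etree.ElementTree as ET

# Invented stand-in for a crawled page; ImportXML would fetch the URL itself.
page = """
<html>
  <head>
    <title>Widgets | Example.com</title>
    <meta name="description" content="Buy widgets online." />
  </head>
  <body>
    <h1>Widgets</h1>
  </body>
</html>
"""
root = ET.fromstring(page)

title = root.find(".//title").text                                # //title[1]
desc = root.find(".//meta[@name='description']").get("content")   # //meta[@name='description']/@content
h1 = root.find(".//h1").text                                      # //h1

print(title, desc, h1, sep=" | ")  # Widgets | Example.com | Buy widgets online. | Widgets
```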

 

2) Pull top search results for a keyword query.

=ImportXML("http://www.google.com/search?q=KEYWORD", "//cite")

Caveat: it only pulls ten results at a time, you may have to concatenate the URLs in some instances, and there is no way to filter out ads, even if you have them set not to display in your browser.

To change things about the search engine itself, you can also add other parameters such as:

  • &hl= – language
  • &gl= – region/country
  • &num= – number of search results to display (default is 10)
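Rather than hand-editing the query string, the search URL with these parameters can be assembled programmatically. A small sketch using Python's standard urllib (the keyword and parameter values here are just examples):

```python
from urllib.parse import urlencode

# Example values only; &hl sets language, &gl sets region, &num sets result count.
params = {"q": "xpath seo", "hl": "en", "gl": "us", "num": 20}
url = "https://www.google.com/search?" + urlencode(params)

print(url)  # https://www.google.com/search?q=xpath+seo&hl=en&gl=us&num=20
```

urlencode also handles escaping for you, so keywords with spaces or special characters stay valid inside the ImportXML formula.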

 

3) Find the top ranking keywords for a specific website using a site: command.

=ImportXML("http://www.google.com/search?q=site:example.com+KEYWORD", "//cite")

 

4) Pull top products for a keyword query in Google.

=ImportXML("http://www.google.com/products?q=KEYWORD", "//h3[@class='r']")

 

5) Import a site’s entire sitemap using the sitemap’s URL

=ImportXML("http://example.com/sitemap.xml", "//url/loc")
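The same //url/loc extraction can be sketched in Python. One wrinkle worth knowing: real sitemaps declare the sitemaps.org namespace, so a local parser has to reference it explicitly (the two URLs below are invented for the example):

```python
import xml.etree.ElementTree as ET

# Tiny inline stand-in for example.com/sitemap.xml; real sitemaps use this namespace.
sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap)

# The //url/loc path from the formula, with the sitemap namespace made explicit.
urls = [loc.text for loc in root.findall(".//sm:url/sm:loc", ns)]
print(urls)  # ['https://example.com/', 'https://example.com/about']
```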

 

Obviously there is a multitude of other great applications, but these are the ones I've found most useful and tend to use most frequently. There are also a few limitations of XPath that make it less than ideal for SEOs. First, Google restricts spreadsheets to 50 ImportXML queries at a time, which means you'll have to run smaller batches overall. Second, since every website is coded differently, you will often have to troubleshoot formulas to make them work. In some cases this is easier than others (such as trying h instead of h1), but in others it means digging into the code to find the right query path.

Luckily, there are several great Chrome extensions that help with identifying these query paths, or that take the formulas out of the mix completely. My personal favorite is Scraper, which allows you to right-click on any part of a webpage and choose "Scrape similar…" to pull similar data (though only from the page you are currently on).

[Image 2]

 

[Image 3]

This tool makes it very easy to identify the right path, but unfortunately you can't simply copy its XPath selector and paste it into an ImportXML formula in Google Docs. The alternative I use, at least for simple paths such as search results, is to right-click and "Inspect Element" to find good identifiers for the query path.

[Image 4]

