Digging through data on the internet makes me feel like an archaeologist. Metaphorical pickax in one hand, dusting brush in the other, I carefully unearth each line item and place it into Google Sheets for review. While this does let me live my 1999 The Mummy reenactment dreams, there’s just one drawback—it takes forever.
Instead of manually scouring websites for basic data like names, addresses, or title tags, you can tap Google Sheets’ IMPORTXML function to automate all that careful digging into one seamless (almost magical) data transfer.
Here’s how to use IMPORTXML in Google Sheets and make manual data collection a thing of your past.
Table of contents:
What is the IMPORTXML Function in Google Sheets?
The IMPORTXML function is a tool in Google Sheets that lets you import structured web data directly into your spreadsheet. The function is =IMPORTXML(url, "xpath")
. All you really need to know is where the data you’re looking for is contained, and then you can enter the target URL and an XPath query specifying the data you want to extract from the web page into that core function.
If that all sounded like Klingon, you’re still in this, I promise. A grasp of basic XML and HTML is needed to use IMPORTXML, but I’ll cover that here—if I can do it, you can do it.
Here are some terms you need to understand to use IMPORTXML:
-
HTML: The basic markup language that tells browsers how to display information. An HTML element consists of a start tag, content, and an end tag.
-
XML: A markup language and file format for data management. It’s similar to HTML, but you define your own tags.
-
Tag: In both HTML and XML, a tag is the markup used to denote the start and end of an element. Tags are enclosed in angle brackets.
-
XPath: An expression used to navigate XML and specify paths. This is how you tell Sheets which place to pull data from on a webpage.
How to use the IMPORTXML function in Google Sheets
First, I’ll cover a super high-level summary of how to do this. Then, we’ll get into an example that will make this make more sense and show you some of the genuinely exciting (if you’re really into web data) possibilities of IMPORTXML.
-
Find the webpage. Navigate to the website that contains the data you want to import.
-
Inspect element. Right-click a blank part of the webpage, and click Inspect to examine the HTML structure of the webpage.
-
Identify the tags: Find the tags that contain the data you want—such as
<td>
for table data cells. -
Enter IMPORTXML formula. In a Google Sheet cell, enter the function
=IMPORTXML(url, "xpath")
. Update the IMPORTXML function with the URL in place of url. Between the quotation marks, replace xpath with two slashes followed by the tags you just identified. -
Customize your data. Extract only certain elements from the table, filter your data, and clean up the remaining dataset.
That should populate your sheet with the data between the tags you identified from the URL you entered.
Note that you may be prompted to allow access to an external website—if that happens, just click Accept.
Example of importing data into Google Sheets with IMPORTXML
Wikipedia tables are a great way to practice IMPORTXML, so let’s look at an example of extracting a list of postal codes and city districts from Wikipedia. I’ll also walk through how to get even more specific data so you can see how to manipulate the results to get just what you need.
1. Find the data you want to import
In this exercise, I’ll be grabbing all the postal codes in Edmonton, Alberta. Open that page in a new browser window if you want to follow along.
2. Inspect the HTML on the webpage
Right-click on a postal code cell on the page and select Inspect to view the HTML source code in your browser.
3. Identify the tags
Check the tag housing the data in question, which in this case is <td>
—this is the tag that contains all the data in a cell.
If you find yourself looking at a code that reads <b>
, you’re in the wrong place—this is a formatting code for the text itself. You want the tag <td>
, or table data, since it will pull all the data within a table.
You can use the search feature within the Inspect menu to pinpoint the area you want to pull.
4. Enter the IMPORTXML function
In a new Google Sheet, import all the postal codes. To do this, select a cell and enter the formula:
=IMPORTXML("https://en.wikipedia.org/wiki/List_of_T_postal_codes_of_Canada", "//td")
As a reminder, that’s the function =IMPORTXML(url, "xpath")
, but we’ve replaced url with the URL of the webpage and we’ve replaced xpath with two slashes and the tag.
If it prompts you to, click Allow access (assuming you trust the site you’re importing from!). This will populate all the postal codes on our original Wikipedia page—all the data held within the <td>
tag.
5. Customize your data
If you’ve never seen HTML in your life, this section might be a little scary—but just follow along with these steps, and you’ll be fine.
Extract only the primary city
What if you only want to grab the primary city associated with the postal code? Our first query pulled in a lot of extra data.
To narrow the scope, we can change the XPath query to "
/
/td/span/a[1]"
. This will just pull the city linked to each postal code into our sheet—because we’re specifying that we want to look within those table data cells for a “span” tag that contains an <a>
(anchor) tag, which is where the city names are found in the HTML.
Extract only the postal code
We can also just extract the postal codes themselves. To do that, we’ll update to this XPath query to "//td/b[1]"
.
Here, we know /b[1]
targets just the postal codes because inspecting the HTML source reveals that postal codes are enclosed within the first <b>
tag inside each <td>
table cell.
Filter for Edmonton
To find the postal codes in Edmonton, Alberta, we’ll use an even more specific query.
We’ll modify the XPath query to "//td[span/a='Edmonton']/b[1]"
. This will return only the postal codes that include “Edmonton” in the td/span/a[1]
field—since that’s where we can see the city names are housed in the page code.
Extract neighborhood names
Next, we’ll build on this by extracting the neighborhood names and putting them in the column next to the postal codes. To get neighborhood names following “Edmonton,” we’ll use this XPath query: "//td[span/a='Edmonton']/span[1]"
.
As you can see, this is suddenly a lot of data. This function grabs the entire contents of span[1] and uses parentheses and slashes to divide it up, splicing “Edmonton” into the first column and each neighborhood name into following columns.
Clean up the dataset
From here, we can clean things up by reorganizing our data into columns for postal code, city, and neighborhood. This will give us a nice crisp dataset that we can use for pretty much anything—as long as that anything requires neighborhood names in Edmonton.
Given that the data in question spans from column B to column J, you can use SPLIT and CONCATENATE functions for this: =SPLIT(concatenate(B2:J2),"(/)")
.
Just scroll and use this function a few columns over or a few rows down. You can do this for all the data you want to clean up and remove excess spaces or punctuation.
And there you have it—our list of postal codes and city districts in Edmonton.
Related Google Sheets functions
Google Sheets functions are The Mummy levels of cool, and there are more where that came from. Here are a few more functions that work similarly to IMPORTXML, as they can be used to import data from one source into a sheet.
Stretch your new XPath wings with one of these related functions:
-
IMPORTHTML: Imports data from an HTML table or list on a webpage. (Very similar to what we just did, but for entire HTML tables.)
-
IMPORTRANGE: Imports a range of cells from one Google Sheets spreadsheet into another. (Makes combining Sheets data easy.)
-
IMPORTDATA: Imports data from a CSV or TSV file hosted on a web URL.
-
IMPORTFEED: Imports content from an RSS feed—great for news sites if you want to create a live feed in your spreadsheet.
-
GOOGLEFINANCE: Imports current or historical financial market data directly from Google Finance.
IMPORTXML tips
If you’ve made it this far, you’re probably already an accredited Sheets wizard and don’t need any more of my advice. But just in case, here are some best practices for working with functions like IMPORTXML.
-
Use the Inspect feature: Use Inspect Element in your web browser to view the HTML structure of the page before pulling data. Right-click on the webpage element you want to import, and select Inspect from your browser menu. This allows you to see all the tags included—which can potentially help make your query more specific.
-
Double-check your URL: Always make sure the URL in your IMPORTXML formula is the whole URL—not just the slug or most of it. And yes, this includes the
http://
orhttps://
prefix. (We won’t talk about how many times I’ve messed this one up.) -
Be specific: Targeting really specific data with your query is usually better than pulling massive data ranges. Use precise queries to pull only exactly the data you need, no more and no less.
-
Use a test sheet: I like to set up a separate Google Sheet to test my IMPORTXML functions before implementing them because there’s no way I won’t mess it up the first time. This helps me make sure my query is working correctly before adding it to my real spreadsheet with all the Very Important Data.
Automate Google Sheets with Zapier
The IMPORTXML function is a gift to data enthusiasts everywhere. It makes manual data collection, at least on the web, a thing of the mummified past, so you can spend more time admiring your beautiful Sheets.
And if that’s still not enough, you can explore Zapier’s Google Sheets integrations. Zapier lets you connect Sheets with thousands of other apps, automating imports from forms, events, and anywhere else you have data. Learn more about how to automate Google Sheets, or check out some of these pre-made templates.
Zapier is the leader in workflow automation—integrating with thousands of apps from partners like Google, Salesforce, and Microsoft. Use interfaces, data tables, and logic to build secure, automated systems for your business-critical workflows across your organization’s technology stack. Learn more.
IMPORTXML FAQ
What is the alternative to IMPORTXML in Google Sheets?
If you’re looking for an alternative to IMPORTXML, you could use the similar function, IMPORTHTML. It’s less precise than IMPORTXML and is primarily used for tables or lists. You could also use a third-party tool if your goal is larger-scale web scraping—or write your own script.
How can I refresh IMPORTXML in Google Sheets?
To refresh IMPORTXML in Sheets, click on any cell containing the function and press Enter or return. This will manually recalculate the formula and refresh the data. The data also refreshes every time you make a change to the formula, or you can use an add-on or a custom script that will automatically refresh the Sheet.
How often does IMPORTXML refresh?
As often as you want it to. The IMPORTXML function doesn’t refresh automatically, but it’s easy to refresh whenever you want to trigger it. Refresh triggers include hitting the Enter or return key on a cell containing the function, editing other nearby cells, or changing the formula. The function may also refresh if it contains a dynamic element like NOW()
, since those elements refresh automatically.
Related reading:
This article was originally published by Allana Mayer in 2017. The most recent update was in November 2024.