Running an e-commerce business today can be quite a job. Everyone is trying to beat everyone else with different tactics in marketing campaigns, prices, customer service, and so on. To keep up with your competitors, you must know their prices for the products you also sell, so that you can stay just below them and attract customers.
With that said, let’s take a look at how you can scrape your competitors’ e-commerce websites for prices on products that match your own.
Common problems with tracking competitors' pricing
Back in the day, an organization had to manually go to the competitors’ stores, check their prices and then return to their company and update the product catalog with new prices. This used to require a lot of human resources and was very time-consuming. As more and more companies turn to the e-commerce part of their business, the prices for products in a digital world become more transparent. This gives customers the option to check for the best price every time they want to shop for a new product.
In this short tutorial, I will teach you how to use Selenium with Python to create a script that extracts prices from your competitors’ websites and stores the values in a CSV file for further processing.
The Scenario – Scrape prices from a competitor
Let’s imagine you run an e-commerce website that sells dog equipment. Lots of other websites out there also sell dog equipment, and some of them even have the same products as you do. To get an advantage over your competitors, you need to check their product prices every day, or even every hour, to see if there are any changes in their pricing and update your shop accordingly.
To do this, we will do some web scraping with Python, using a package named Selenium together with the Chrome web driver to read the price elements on the websites.
For this example, I will be scraping the price from four different websites for one single product named Ruffwear Cloud Chaser Jacket. It’s a jacket for dogs made by Ruffwear. Below is an example of the jacket on trekkinn.com.
Scrape the websites for prices using Python and Selenium
The first thing we have to do is find the websites we would like to scrape. I will be scraping 4 websites for this demo:
- Ruffwear Cloud Chaser™ » waterproof softshell dog jacket (woofshack.com)
- Cloud Chaser™ Dog Jacket | Ruffwear
- Ruffwear Cloud Chaser Jacket Grey buy and offers on Trekkinn
- Ruffwear Cloud Chaser Jacket Grey buy and offers on Bricoinn
Install requirements
Let’s prepare our code. First, make sure that you have Selenium installed on your machine.
pip install -U selenium
You also need the Chrome web driver (ChromeDriver) for Selenium, which you can download from Google.
Inspect websites for price elements
Go to one of the websites you would like to scrape, select the price with your mouse, right-click the selected text and choose Inspect. Now locate the price element inside the source code and note its id attribute. On woofshack.com, for example, the price element has the id product-price-665.
Write the script for extracting the price from the website
We get the element by its id from the source code. The Chrome web driver has to be in the same folder as your Python script or on your PATH. Let’s test that we can get the price from the above page:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
# Open the product page using the Chrome web driver
browser = webdriver.Chrome()
browser.get('https://www.woofshack.com/en/cloud-chaser-waterproof-softshell-dog-jacket-ruffwear-rw-5102.html')
# Find the price element by its id (find_element_by_id was removed in Selenium 4)
price = browser.find_element(By.ID, 'product-price-665')
# Print out the result
print("Price: " + price.text)
# Keep the window open briefly, then close the browser
time.sleep(3)
browser.quit()
When running the script in the console, you will get this result:
python .\extract-prices.py
DevTools listening on ws://127.0.0.1:62089/devtools/browser/14418422-4293-492b-b3ff-aad1a5f62ff4
Price: €107.95
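The scraped value is plain text that includes a currency marker, so it cannot be compared numerically as it is. Below is a minimal sketch of one way to split such a string into a currency marker and a decimal amount. The function name parse_price is hypothetical, and the pattern assumes prices formatted like the ones printed in this tutorial (a symbol or code before the number, with a dot as the decimal separator). Other formats would need a different pattern.

```python
import re
from decimal import Decimal

# Assumed format: optional currency marker, optional space,
# then digits with an optional dot-separated decimal part.
PRICE_PATTERN = re.compile(
    r"^\s*(?P<currency>[^\d\s]+)?\s*(?P<amount>\d+(?:\.\d+)?)\s*$"
)


def parse_price(text):
    """Split a scraped price such as '€107.95' into (currency, Decimal amount)."""
    match = PRICE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognized price format: {text!r}")
    # A missing currency marker is returned as an empty string.
    return match.group("currency") or "", Decimal(match.group("amount"))


print(parse_price("€107.95"))
```

Decimal is used instead of float so that amounts such as 107.95 are kept exactly, which matters once prices are compared or subtracted.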
Next, we store the data in a pandas data frame so that it can be saved to a file that Excel can open at the end. Make sure you have pandas installed. If not, install it with this command:
pip install pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
import time
# Open the product page using the Chrome web driver
browser = webdriver.Chrome()
browser.get('https://www.woofshack.com/en/cloud-chaser-waterproof-softshell-dog-jacket-ruffwear-rw-5102.html')
# Read the price text while the browser is still open
price = browser.find_element(By.ID, 'product-price-665').text
print("Price: " + price)
# Close the browser
time.sleep(3)
browser.quit()
# Store it in a data frame for saving to a file later in the script
df = pd.DataFrame([["woofshack.com", price]], columns=["Website", "Price"])
Now do the same for the other websites. The script below adds bricoinn.com and trekkinn.com.
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
# Open the first product page using the Chrome web driver
browser = webdriver.Chrome()
browser.get('https://www.woofshack.com/en/cloud-chaser-waterproof-softshell-dog-jacket-ruffwear-rw-5102.html')
# Print out the result
price = browser.find_element(By.ID, 'product-price-665')
print("Price: " + price.text)
# Store it in a data frame for saving to a file later in the script
df = pd.DataFrame([["woofshack.com", price.text]], columns=["Website", "Price"])
# Repeat the step for website no. 2
browser.get('https://www.bricoinn.com/en/ruffwear-cloud-chaser-jacket/138328147/p')
price = browser.find_element(By.ID, 'datos_producto_precio')
print("Price: " + price.text)
# Add the bricoinn.com price to the table (DataFrame.append was removed in pandas 2.0)
df2 = pd.DataFrame([["bricoinn.com", price.text]], columns=["Website", "Price"])
df = pd.concat([df, df2], ignore_index=True)
# Repeat the step for website no. 3
browser.get('https://www.trekkinn.com/outdoor-mountain/ruffwear-cloud-chaser-jacket/138328147/p')
price = browser.find_element(By.ID, 'datos_producto_precio')
print("Price: " + price.text)
# Add the trekkinn.com price to the table
df3 = pd.DataFrame([["trekkinn.com", price.text]], columns=["Website", "Price"])
df = pd.concat([df, df3], ignore_index=True)
print(df)
# Close the browser
browser.quit()
Running the small web scrape script gave me this in my console:
PS C:\Users\Christian Schou\OneDrive\Skrivebord\deep-learning> python .\extract-prices.py
DevTools listening on ws://127.0.0.1:55039/devtools/browser/38f8468b-1eaf-4a37-b2f0-9d8692b83bb7
Website Price
0 woofshack.com €107.95
1 bricoinn.com kr 792.99
2 trekkinn.com kr 792.99
Now the only thing left to do to save the result to a CSV file that Excel can open is to append the following line of code at the bottom of the script:
df.to_csv(r'PriceList.csv', index = False)
The Excel CSV file looks like the following when data has been imported:
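Since a CSV file is plain text, its contents can also be illustrated without Excel. The sketch below is not part of the original script: it uses Python's standard csv module instead of pandas to write the three rows from the console output above into an in-memory buffer, just to show the Website and Price columns as comma-separated text.

```python
import csv
import io

# Rows taken from the console output shown earlier in this tutorial.
rows = [
    ["woofshack.com", "€107.95"],
    ["bricoinn.com", "kr 792.99"],
    ["trekkinn.com", "kr 792.99"],
]

# Write to an in-memory buffer so no file is created on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Website", "Price"])  # header row
writer.writerows(rows)                 # one line per website

csv_text = buffer.getvalue()
print(csv_text)
```

Each row becomes one line, with a comma between the website and its price, which is the layout Excel splits into two columns on import.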
Final Web Scrape Script
Below is the final script to scrape prices from your competitors.
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
# Open the first product page using the Chrome web driver
browser = webdriver.Chrome()
browser.get('https://www.woofshack.com/en/cloud-chaser-waterproof-softshell-dog-jacket-ruffwear-rw-5102.html')
# Print out the result
price = browser.find_element(By.ID, 'product-price-665')
print("Price: " + price.text)
# Store it in a data frame for saving to a file later in the script
df = pd.DataFrame([["woofshack.com", price.text]], columns=["Website", "Price"])
# Repeat the step for website no. 2
browser.get('https://www.bricoinn.com/en/ruffwear-cloud-chaser-jacket/138328147/p')
price = browser.find_element(By.ID, 'datos_producto_precio')
print("Price: " + price.text)
# Add the bricoinn.com price to the table (DataFrame.append was removed in pandas 2.0)
df2 = pd.DataFrame([["bricoinn.com", price.text]], columns=["Website", "Price"])
df = pd.concat([df, df2], ignore_index=True)
# Repeat the step for website no. 3
browser.get('https://www.trekkinn.com/outdoor-mountain/ruffwear-cloud-chaser-jacket/138328147/p')
price = browser.find_element(By.ID, 'datos_producto_precio')
print("Price: " + price.text)
# Add the trekkinn.com price to the table
df3 = pd.DataFrame([["trekkinn.com", price.text]], columns=["Website", "Price"])
df = pd.concat([df, df3], ignore_index=True)
print(df)
# Close the browser
browser.quit()
# Save the data frame to a CSV file that Excel can open
df.to_csv(r'PriceList.csv', index=False)
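The script repeats the same steps for every website: open the page, find the price element by id, and add a row. As a hedged sketch of how that repetition could be condensed, the code below keeps the pages in a list and loops over them. The names Target, TARGETS, collect_prices and make_selenium_fetcher are hypothetical and not part of the original script. The Selenium import is placed inside the helper so the loop itself can be tried without a browser.

```python
from typing import NamedTuple


class Target(NamedTuple):
    website: str     # label used in the output table
    url: str         # product page to open
    element_id: str  # id of the HTML element holding the price


# The three pages and element ids used in the script above.
TARGETS = [
    Target("woofshack.com",
           "https://www.woofshack.com/en/cloud-chaser-waterproof-softshell-dog-jacket-ruffwear-rw-5102.html",
           "product-price-665"),
    Target("bricoinn.com",
           "https://www.bricoinn.com/en/ruffwear-cloud-chaser-jacket/138328147/p",
           "datos_producto_precio"),
    Target("trekkinn.com",
           "https://www.trekkinn.com/outdoor-mountain/ruffwear-cloud-chaser-jacket/138328147/p",
           "datos_producto_precio"),
]


def collect_prices(targets, fetch_text):
    """Return [website, price_text] rows; a failed lookup is recorded as None."""
    rows = []
    for target in targets:
        try:
            text = fetch_text(target.url, target.element_id)
        except Exception as exc:  # keep going if one site fails
            print(f"Could not read {target.website}: {exc}")
            text = None
        rows.append([target.website, text])
    return rows


def make_selenium_fetcher(browser):
    """Build a fetch_text callable backed by an open Selenium browser."""
    # Imported here so collect_prices can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    def fetch_text(url, element_id):
        browser.get(url)
        return browser.find_element(By.ID, element_id).text

    return fetch_text
```

With a real browser, the call would look like rows = collect_prices(TARGETS, make_selenium_fetcher(browser)), and the rows can then be passed to pd.DataFrame(rows, columns=["Website", "Price"]). Adding another website then only means adding one entry to the list.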
Summary
You can use the above script for a single product or tweak it to cover multiple products on a single website. In this short tutorial about Python web scraping, you learned how a simple Python script can help you track competitor prices to your advantage.
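Tracking prices over time means comparing each run with the previous one. As a minimal sketch, assuming the prices from each run are kept as a dictionary of website to price text, the hypothetical function price_changes below reports which websites changed their price. The example data is made up for illustration: only the first dictionary matches prices shown in this tutorial.

```python
def price_changes(previous, current):
    """Return {website: (old_price, new_price)} for every price that differs."""
    changes = {}
    for website, new_price in current.items():
        old_price = previous.get(website)  # None if the site was not in the last run
        if old_price != new_price:
            changes[website] = (old_price, new_price)
    return changes


# Illustrative data only: the "today" price for woofshack.com is invented.
yesterday = {"woofshack.com": "€107.95", "bricoinn.com": "kr 792.99"}
today = {"woofshack.com": "€99.95", "bricoinn.com": "kr 792.99"}

print(price_changes(yesterday, today))
```

Only websites whose price text differs from the previous run are returned, so an empty result means there is nothing to update in your shop.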
You can extend it to do much more, like automated emails, data analysis on prices, and so on. If you have any issues, questions, or suggestions, please let me know in the comments. If you are interested in stocks, don’t forget to check out my tutorial on how to scrape stock prices from Yahoo Finance. Happy Coding! 🙂