How to Scrape Walmart Product Data Using Python and Handle Anti-Scraping Mechanisms
When scraping data from large e-commerce platforms like Walmart, we often encounter anti-scraping mechanisms that make direct data extraction challenging. In this article, we will walk through how to use Python to bypass common anti-scraping measures and reliably retrieve product data, using strategies such as spoofing request headers, adding request delays, routing traffic through proxy servers, and using dynamic page scraping tools.
1. Install Required Python Libraries
To scrape static pages, you need the following libraries:
pip install requests beautifulsoup4
If the page data is dynamically loaded via JavaScript, you will need to install Selenium:
pip install selenium webdriver-manager
2. Scraping Walmart Using Requests + BeautifulSoup
Here is an example that scrapes product information for the search term "laptop" on Walmart:
import requests
from bs4 import BeautifulSoup
# Walmart search URL
search_query = "laptop"
base_url = f"https://www.walmart.com/search?q={search_query}"
# Spoofing the user agent to avoid detection
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
response = requests.get(base_url, headers=headers)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    products = soup.find_all("div", class_="search-result-gridview-item")

    for product in products:
        title = product.find("a", class_="product-title-link")
        price = product.find("span", class_="price-characteristic")

        if title and price:
            product_name = title.text.strip()
            product_price = price.text.strip()
            product_url = "https://www.walmart.com" + title["href"]

            print(f"Product Name: {product_name}")
            print(f"Price: ${product_price}")
            print(f"Link: {product_url}")
            print("-" * 50)
else:
    print("Request failed, status code:", response.status_code)
3. Handling Anti-Scraping Mechanisms
(1) Adding Request Headers
To make the crawler look more like a real user, you can send a more complete set of request headers:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.4472.124 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.google.com/",
}
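For long runs, rotating the User-Agent between requests adds further variety on top of the richer headers above. A minimal sketch, assuming a small hand-picked pool of UA strings (the pool itself is illustrative, not part of the original example):

import random

# Illustrative pool of common desktop User-Agent strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
]

def build_headers():
    # Pick a random User-Agent for each request
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    }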
(2) Adding Request Delays
By adding delays between consecutive requests, you can reduce the risk of triggering anti-scraping mechanisms:
import time

time.sleep(2)  # Add a delay of 2 seconds between requests
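A fixed interval is itself a detectable pattern. Randomizing the pause makes the timing look closer to human browsing; one possible sketch, with bounds chosen arbitrarily:

import random
import time

def polite_sleep(min_s=1.5, max_s=4.0):
    # Sleep for a random duration between min_s and max_s seconds
    time.sleep(random.uniform(min_s, max_s))

polite_sleep()  # Call between consecutive requests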
(3) Using Proxies
When requests from your own IP address are blocked, you can route traffic through proxy servers instead. Here’s how you can integrate Luckdata’s proxy services:
import requests

proxies = {
    "http": "http://Account:Password@ahk.luckdata.io:Port",
    "https": "http://Account:Password@ahk.luckdata.io:Port",
}
response = requests.get(base_url, headers=headers, proxies=proxies)
if response.status_code == 200:
    print("Successfully retrieved data via proxy")
else:
    print("Proxy request failed, status code:", response.status_code)
Luckdata’s proxy services support both dynamic residential and data center proxies, ensuring high anonymity and global coverage to effectively reduce IP blocking risks and improve scraping stability for large-scale data collection.
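If your plan gives you several proxy endpoints, you can also rotate them per request to spread traffic across IPs. A minimal sketch; the endpoints below are placeholders, not real Luckdata addresses:

import random
import requests

# Hypothetical proxy pool -- substitute your own credentials and endpoints
PROXY_POOL = [
    "http://Account:Password@proxy1.example.com:8000",
    "http://Account:Password@proxy2.example.com:8000",
]

def fetch_via_random_proxy(url, headers):
    proxy = random.choice(PROXY_POOL)
    # Use the chosen proxy for both HTTP and HTTPS traffic
    return requests.get(url, headers=headers, proxies={"http": proxy, "https": proxy}, timeout=15)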
4. Using Selenium to Scrape Dynamic Pages
When Walmart pages load content using JavaScript, you can use Selenium to simulate browser actions and retrieve the full data:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
search_query = "laptop"
base_url = f"https://www.walmart.com/search?q={search_query}"
driver.get(base_url)
time.sleep(5) # Wait for the page to load
products = driver.find_elements(By.CSS_SELECTOR, "div.search-result-gridview-item")
for product in products:
    try:
        title_element = product.find_element(By.CSS_SELECTOR, "a.product-title-link")
        price_element = product.find_element(By.CSS_SELECTOR, "span.price-characteristic")

        product_name = title_element.text
        product_price = price_element.text
        product_url = title_element.get_attribute("href")

        print(f"Product Name: {product_name}")
        print(f"Price: ${product_price}")
        print(f"Link: {product_url}")
        print("-" * 50)
    except Exception:
        print("Skipping a product, data may be incomplete")
driver.quit()
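The fixed time.sleep(5) above wastes time on fast loads and may be too short on slow ones. Selenium's explicit waits block only until the target elements actually appear; a sketch that could replace the sleep and the find_elements call:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for at least one product card to be present
products = WebDriverWait(driver, 15).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.search-result-gridview-item"))
)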
5. Using API to Retrieve Walmart Data
Using an API allows you to directly access structured data, avoiding the complexities of parsing HTML and handling anti-scraping measures. Here's how you can call Luckdata’s Walmart API:
import requests

headers = {
    'X-Luckdata-Api-Key': 'your luckdata key'
}
api_url = 'https://luckdata.io/api/walmart-API/get_vwzq'
params = {
    'url': 'https://www.walmart.com/ip/NELEUS-Mens-Dry-Fit-Mesh-Athletic-Shirts-3-Pack-Black-Gray-Olive-Green-US-Size-M/439625664?classType=VARIANT'
}
response = requests.get(api_url, headers=headers, params=params)
print(response.json())
By using Luckdata’s API, you get structured Walmart product data directly, avoiding both HTML parsing and anti-scraping countermeasures and simplifying the pipeline considerably.
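In production code it also helps to fail loudly on HTTP errors before parsing. A minimal hardening sketch around the same call; the exact structure of the JSON payload depends on the API, so inspect it before extracting fields:

response = requests.get(api_url, headers=headers, params=params, timeout=30)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error page

data = response.json()
# Field names depend on the API's response schema -- inspect before extracting
print(list(data.keys()) if isinstance(data, dict) else data)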
6. Saving Data to a CSV File
You can save the scraped data to a CSV file for easy analysis and processing:
import csv

data = [
    ("Product Name", "Price", "Link"),
    ("Laptop 1", "$499.99", "https://www.walmart.com/laptop1"),
    ("Laptop 2", "$799.99", "https://www.walmart.com/laptop2"),
]
with open("walmart_data.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerows(data)

print("Data has been saved to walmart_data.csv")
Conclusion
- Static Page Scraping: Use requests and BeautifulSoup to scrape static web pages.
- Dynamic Page Scraping: Use Selenium to simulate browser actions and extract data from pages that render content with JavaScript.
- Handling Anti-Scraping Mechanisms: Bypass detection by spoofing headers, adding request delays, using proxies (such as Luckdata’s proxy services), and leveraging headless Selenium.
- API Usage: Use Luckdata’s API to quickly obtain structured Walmart product data, bypassing the complexities of web scraping.
- Data Storage: Save the scraped data in CSV format for easy analysis and use in future projects.