A Comprehensive Guide to Collecting Data from Footlocker.kr Using Web Scraping and Luckdata API

In sneaker data collection, product information, stock status, and pricing are the core data points. This article introduces two common methods for retrieving them:

  1. Traditional web scraping using Python's requests and BeautifulSoup.

  2. Using the Luckdata Sneaker API to quickly fetch structured data.

By comparing and combining these two methods, we can choose the most suitable approach or integrate them for subsequent data processing and analysis.


1. Analyzing Requirements and Target Website

Before scraping data, it's important to understand the structure of the target website footlocker.kr and the locations of the required data. Typically, you can use the browser's developer tools (F12) to inspect the HTML structure and identify the tags and classes containing product names, prices, stock information, etc.

Since some pages use JavaScript to dynamically load content, using requests alone may not retrieve complete data. Fortunately, the Luckdata Sneaker API provides a unified interface that directly returns structured JSON data, eliminating the need for HTML parsing.


2. Method 1: Traditional Web Scraping with Python

2.1 Using requests to Fetch Page Content

The following code demonstrates how to set up request headers and use requests to retrieve the page's HTML source:

import requests

# Define request headers to mimic a browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

url = "https://www.footlocker.kr/ko/"
response = requests.get(url, headers=headers)

if response.status_code == 200:
    print("Request successful! Previewing part of the source:")
    print(response.text[:500])  # Preview the first 500 characters
else:
    print(f"Request failed, status code: {response.status_code}")

Note: If the page content is rendered by JavaScript, this method may not retrieve the full data.
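If you do need the fully rendered page, a headless browser can execute the JavaScript before you read the HTML. Below is a minimal sketch using Selenium; this goes beyond what the article strictly requires, and it assumes Chrome and a matching ChromeDriver are installed:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without opening a visible window
options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get("https://www.footlocker.kr/ko/")
html = driver.page_source  # HTML after JavaScript has executed
driver.quit()

The rendered html string can then be passed to BeautifulSoup exactly as in the next section.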

2.2 Using BeautifulSoup to Parse HTML Data

If the data is static, you can use BeautifulSoup to extract it. For example, to extract product names from a page:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")

# Assuming product names are in div elements with class 'ProductCard-name'
product_elements = soup.find_all("div", class_="ProductCard-name")
product_names = [product.get_text(strip=True) for product in product_elements]

print("Product Name List:")
for name in product_names:
    print(name)

In practice, you'll need to adjust the selectors to match the website's actual HTML structure.
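For example, extracting prices might look like this; the span.ProductPrice selector is purely hypothetical, so replace it with whatever you find in the developer tools:

# Hypothetical selector -- adjust to the real page structure
price_elements = soup.select("span.ProductPrice")
prices = [p.get_text(strip=True) for p in price_elements]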

2.3 Storing Data

The scraped data can be stored in a CSV or JSON file for further analysis:

import pandas as pd

# Remove duplicates and store data
product_names = list(set(product_names))
df = pd.DataFrame({"Product Name": product_names})
df.to_csv("footlocker_products.csv", index=False, encoding="utf-8")
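Since JSON was mentioned as an alternative, here is a minimal sketch that writes the same list to a JSON file:

import json

with open("footlocker_products.json", "w", encoding="utf-8") as f:
    json.dump(product_names, f, ensure_ascii=False, indent=4)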


3. Method 2: Using the Luckdata Sneaker API to Retrieve Data

The Luckdata Sneaker API is a tool that integrates interfaces from multiple sneaker websites, allowing you to retrieve product information from footlocker.kr and other platforms through a unified API. It returns structured JSON data directly, removing the need for HTML parsing.

3.1 API Request Example

First, obtain a Luckdata API Key, then send a request as shown in the code below. The complete request URL takes this form:

https://luckdata.io/api/sneaker-API/get_aa0x?url=https://www.footlocker.kr/ko/product/~/316161353004.html

Here’s the example code:

import requests

# Set your Luckdata API Key
luckdata_api_key = "your_luckdata_key"
headers = {
    "X-Luckdata-Api-Key": luckdata_api_key
}

# Product URL
product_url = "https://www.footlocker.kr/ko/product/~/316161353004.html"

# Construct the API request URL
api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={product_url}"

# Send the request
response = requests.get(api_url, headers=headers)

# Check the response status and print the returned data
if response.status_code == 200:
    data = response.json()
    print("Luckdata API response:")
    print(data)
else:
    print(f"Request failed, status code: {response.status_code}, response: {response.text}")

3.2 Parsing Luckdata API Response Data

Assuming the API returns fields such as name (product name), price, stock_status, image (image URL), etc., you can extract the relevant fields with the following code:

if response.status_code == 200:
    data = response.json()
    product_name = data.get("name", "Unknown Product")
    price = data.get("price", "Unknown Price")
    stock_status = data.get("stock_status", "Unknown Stock Status")
    image_url = data.get("image", "No Image")
    brand = data.get("brand", "Unknown Brand")
    category = data.get("category", "Unknown Category")

    print(f"Product Name: {product_name}")
    print(f"Brand: {brand}")
    print(f"Category: {category}")
    print(f"Price: {price}")
    print(f"Stock Status: {stock_status}")
    print(f"Image URL: {image_url}")

3.3 Storing Data

Storing Data as CSV

import pandas as pd

df = pd.DataFrame([{
    "Product Name": product_name,
    "Brand": brand,
    "Category": category,
    "Price": price,
    "Stock Status": stock_status,
    "Image URL": image_url
}])
df.to_csv("footlocker_product_api.csv", index=False, encoding="utf-8")

Storing Data as JSON

import json

product_data = {
    "Product Name": product_name,
    "Brand": brand,
    "Category": category,
    "Price": price,
    "Stock Status": stock_status,
    "Image URL": image_url
}

with open("footlocker_product_api.json", "w", encoding="utf-8") as f:
    json.dump(product_data, f, ensure_ascii=False, indent=4)


4. Integrated Application: Bulk Retrieval and Data Consolidation

For scenarios where you need to collect information on multiple products, you can first create a list of product URLs, then loop through them to call the Luckdata API and store the data in a single CSV file.

import requests
import pandas as pd

luckdata_api_key = "your_luckdata_key"
headers = {"X-Luckdata-Api-Key": luckdata_api_key}

# List of product URLs
product_urls = [
    "https://www.footlocker.kr/ko/product/~/316161353004.html",
    "https://www.footlocker.kr/ko/product/~/316161353005.html"
]

all_products = []

for product_url in product_urls:
    api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={product_url}"
    response = requests.get(api_url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        product_name = data.get("name", "Unknown Product")
        price = data.get("price", "Unknown Price")
        stock_status = data.get("stock_status", "Unknown Stock Status")
        image_url = data.get("image", "No Image")
        brand = data.get("brand", "Unknown Brand")
        category = data.get("category", "Unknown Category")
        all_products.append({
            "Product Name": product_name,
            "Brand": brand,
            "Category": category,
            "Price": price,
            "Stock Status": stock_status,
            "Image URL": image_url
        })
    else:
        print(f"Request failed: {product_url}")

# Save the data to a CSV file
df = pd.DataFrame(all_products)
df.to_csv("footlocker_bulk_products.csv", index=False, encoding="utf-8")
print("Bulk data saved successfully!")


5. Summary and Recommendations

The examples above show that:

  • Traditional Web Scraping: It fetches HTML directly, but you must handle JavaScript rendering and anti-scraping mechanisms yourself, and the code breaks easily when the website's structure changes.

  • Luckdata Sneaker API: It returns structured data directly, skips HTML parsing entirely, and offers clear advantages for bulk retrieval and real-time updates. However, you must choose a subscription plan appropriate to your usage.

Recommendation:

  • If your data needs are simple and the page is static, consider traditional web scraping.

  • If you want to retrieve more comprehensive and structured data while avoiding anti-scraping mechanisms, the Luckdata API is recommended.

  • In practice, both methods can be combined: for instance, when the Luckdata API returns incomplete data, you can fall back to web scraping, as sketched below.
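As a rough illustration of that combination, here is a hedged sketch of a fallback wrapper; the selector and the name field are assumptions carried over from the earlier examples:

import requests
from bs4 import BeautifulSoup

def get_product_data(product_url, api_headers):
    """Try the Luckdata API first; fall back to scraping the page."""
    api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={product_url}"
    response = requests.get(api_url, headers=api_headers)
    if response.status_code == 200:
        data = response.json()
        if data.get("name"):  # treat a missing name as incomplete data
            return data
    # Fallback: scrape the product page directly (selector is hypothetical)
    page = requests.get(product_url, headers={"User-Agent": "Mozilla/5.0"})
    soup = BeautifulSoup(page.text, "html.parser")
    name_el = soup.find("div", class_="ProductCard-name")
    return {"name": name_el.get_text(strip=True) if name_el else None}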