Using Python to Automate Data Monitoring of Footlocker.kr

As online shopping and e-commerce continue to grow, Footlocker has become one of the leading retailers for sneakers and streetwear worldwide. Its Korean site, Footlocker.kr, plays an important role in the sneaker market. For data analysts and developers alike, the ability to monitor product prices, stock status, and other key details in real time is invaluable. This article provides a comprehensive guide to building an automated monitoring system with Python, combining traditional web scraping, the Luckdata API, and proxy IP support.

1. Introduction

In today’s digital era, where online shopping is ubiquitous, Footlocker.kr is not only a popular destination for sneaker and apparel purchases but also an important data source for market trends. Real-time tracking of key data points such as product prices and inventory status can help with demand forecasting, inventory management, and strategic decision-making. In this article, we explore two primary methods for acquiring data from Footlocker.kr:

  1. Traditional Web Scraping – Using Python’s requests and BeautifulSoup libraries to extract data from HTML pages. This method may require residential proxy IPs to mitigate anti-scraping measures.

  2. Luckdata Sneaker API – Calling an external API to receive structured JSON data, which eliminates the need for manual HTML parsing and better handles dynamic content.

We will also cover how to set up scheduled tasks with libraries like schedule (or APScheduler), send notifications via email or Slack, and incorporate error handling and logging to ensure a robust system. By combining these technologies, you can build an efficient and reliable automated monitoring system for Footlocker.kr.


2. Environment Setup and Dependencies

Before diving into coding, ensure you have a proper Python environment (preferably Python 3.7 or later) set up, ideally within a virtual environment to isolate dependencies. The following Python libraries are required:

  • requests: For making HTTP requests.

  • BeautifulSoup (bs4): For parsing HTML content.

  • schedule (or APScheduler): For scheduling periodic tasks.

  • smtplib: For sending email notifications (part of the Python standard library, so no separate installation is required).

  • slack_sdk: For sending notifications via Slack.

  • Luckdata API: No special library is needed since it can be called using requests.

Install the necessary packages using pip:

pip install requests beautifulsoup4 schedule slack_sdk

For more advanced scheduling, you might also install APScheduler:

pip install apscheduler
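
To create the isolated virtual environment mentioned above, the standard-library venv module is enough, so nothing extra needs to be installed:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate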


3. Website Structure and Data Requirements

Before developing the monitoring system, it is essential to analyze the structure of Footlocker.kr. Use your browser’s developer tools (press F12) to inspect the HTML elements and locate the data you need to monitor. Key information includes:

  • Product Name: Typically located within a <div> or <span> element with a specific class or id.

  • Price: The product's price, which may include both original and discounted prices.

  • Stock Status: Information indicating whether the product is available or sold out, often displayed with text or icons.

  • Additional Details: Such as brand, category, and image URLs, which provide richer product information.

In cases where the content is loaded dynamically with JavaScript, traditional web scraping might not capture the complete data. This is where the Luckdata Sneaker API becomes invaluable—it returns structured JSON data directly, bypassing the need for manual HTML parsing.
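
A quick, rough way to check whether a field is rendered server-side is to fetch the raw HTML and look for a marker you can see in the browser. This is an illustrative snippet; the "ProductCard-name" class is an assumption, not a confirmed selector on the live site:

import requests

# Fetch the raw HTML as a scraper would see it (before any JavaScript runs)
url = "https://www.footlocker.kr/ko/"
headers = {"User-Agent": "Mozilla/5.0"}
html = requests.get(url, headers=headers).text

# If a value visible in the browser is missing from the raw HTML,
# it is most likely injected by JavaScript after page load.
print("Marker found in raw HTML:", "ProductCard-name" in html)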


4. Data Acquisition Methods Comparison

4.1 Traditional Web Scraping

Traditional web scraping involves sending HTTP requests directly to the target URL, then parsing the returned HTML with a library like BeautifulSoup. This method works well for static content; however, it can be challenged by anti-scraping measures such as rate limiting and IP blocking. To mitigate these issues, proxy IPs are used to hide your real IP address and distribute requests.

Below is an example of using Python’s requests and BeautifulSoup with a proxy:

import requests
from bs4 import BeautifulSoup

# Set your proxy IP (replace with your actual proxy credentials)
proxy_ip = "http://Account:Password@ahk.luckdata.io:Port"
proxies = {
    'http': proxy_ip,
    'https': proxy_ip
}

# Target URL: Footlocker.kr homepage
url = "https://www.footlocker.kr/ko/"

# Define headers to mimic a real browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# Make the request using the proxy
response = requests.get(url, headers=headers, proxies=proxies)

if response.status_code == 200:
    print("Request successful! Preview of the HTML content:")
    print(response.text[:500])  # Preview the first 500 characters

    # Parse the HTML using BeautifulSoup, assuming product names are in
    # elements with the class "ProductCard-name"
    soup = BeautifulSoup(response.text, "html.parser")
    product_elements = soup.find_all("div", class_="ProductCard-name")
    product_names = [element.get_text(strip=True) for element in product_elements]

    print("List of product names:")
    for name in product_names:
        print(name)
else:
    print(f"Request failed with status code: {response.status_code}")

4.2 Luckdata Sneaker API

The Luckdata Sneaker API offers a simplified method by returning structured JSON data. This approach avoids the complications of HTML parsing and dynamic content rendering. Proxy IPs can also be used with API calls to further protect privacy and ensure stable connections.

Below is a sample code snippet demonstrating how to call the Luckdata Sneaker API with proxy support:

import requests

# Set your proxy IP (replace with your actual proxy credentials)
proxy_ip = "http://Account:Password@ahk.luckdata.io:Port"
proxies = {
    'http': proxy_ip,
    'https': proxy_ip
}

# Your Luckdata API key
luckdata_api_key = "your_luckdata_key"
headers = {
    "X-Luckdata-Api-Key": luckdata_api_key
}

# Target product URL
product_url = "https://www.footlocker.kr/ko/product/~/316161353004.html"

# Construct the API request URL
api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={product_url}"

# Make the API request using the proxy
response = requests.get(api_url, headers=headers, proxies=proxies)

if response.status_code == 200:
    data = response.json()
    print("Data returned by Luckdata API:")
    print(data)

    # Extract key fields
    product_name = data.get("name", "Unknown Product")
    price = data.get("price", "Unknown Price")
    stock_status = data.get("stock_status", "Stock status unknown")
    image_url = data.get("image", "No image")
    brand = data.get("brand", "Unknown Brand")
    category = data.get("category", "Unknown Category")

    print(f"Product Name: {product_name}")
    print(f"Brand: {brand}")
    print(f"Category: {category}")
    print(f"Price: {price}")
    print(f"Stock Status: {stock_status}")
    print(f"Image URL: {image_url}")
else:
    print(f"API request failed with status code: {response.status_code} and response: {response.text}")

By comparing these two methods, you can decide which one best suits your project’s needs. Traditional web scraping gives you direct control but may require extra work to handle dynamic content and anti-scraping measures, while the Luckdata API offers a streamlined, reliable way to obtain the data.


5. Designing the Automated Monitoring System

After selecting the data acquisition method, the next step is to design the automated monitoring system. This system will periodically fetch data, compare it with historical data, and send notifications if any significant changes occur.

5.1 Setting Up Scheduled Tasks

To automate data monitoring, we can use the schedule library to run our tasks at regular intervals (e.g., hourly or daily). Here’s a simple example that uses the Luckdata API for data retrieval:

import schedule
import time
import requests

def job():
    print("Fetching Footlocker.kr data...")
    # Use the Luckdata API method for data fetching
    proxy_ip = "http://Account:Password@ahk.luckdata.io:Port"
    proxies = {'http': proxy_ip, 'https': proxy_ip}
    luckdata_api_key = "your_luckdata_key"
    headers = {"X-Luckdata-Api-Key": luckdata_api_key}
    product_url = "https://www.footlocker.kr/ko/product/~/316161353004.html"
    api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={product_url}"
    response = requests.get(api_url, headers=headers, proxies=proxies)
    if response.status_code == 200:
        data = response.json()
        print("Latest data:", data)
        # Compare with stored data and trigger notifications if changes are detected
    else:
        print(f"Data fetch failed with status code: {response.status_code}")

# Schedule the job to run every hour
schedule.every(1).hours.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)

For more complex scheduling, APScheduler provides cron-style triggers and more sophisticated task management, as sketched below.
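
A minimal APScheduler equivalent of the loop above, reusing the job() function defined earlier; the cron trigger here fires at the top of every hour:

from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()
# Fire job() at minute 0 of every hour (cron-style trigger)
scheduler.add_job(job, "cron", minute=0)
scheduler.start()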

5.2 Data Comparison and Change Detection

Once data is fetched, it should be compared with previously stored data to detect any changes in price, stock status, or other relevant fields. Data can be stored in CSV, JSON files, or even in a database. Below is a simple implementation for saving and comparing data:

import pandas as pd
import os

def save_data(data, filename="footlocker_data.csv"):
    # Convert data (a dictionary) to a DataFrame
    df = pd.DataFrame([data])
    if os.path.exists(filename):
        # Read the existing data and combine with the new data, removing duplicates
        old_df = pd.read_csv(filename)
        df = pd.concat([old_df, df]).drop_duplicates(subset=["Product Name"])
    df.to_csv(filename, index=False, encoding="utf-8")

def compare_data(old_data, new_data):
    # Compare key fields (price and stock status) and return True if there are changes
    if old_data.get("Price") != new_data.get("Price") or old_data.get("Stock Status") != new_data.get("Stock Status"):
        return True
    return False

Integrate these functions within your scheduled task so that every time new data is fetched, it is compared with historical data, and any significant changes trigger a notification.
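
A minimal sketch of that wiring is shown below; fetch_product_data() is a hypothetical wrapper that returns a dict with the same keys used above, and the actual notification call is covered in Section 6:

last_seen = {}  # in-memory cache of the most recently seen data per product

def monitored_job():
    new_data = fetch_product_data()  # hypothetical: builds a dict as in Section 4.2
    key = new_data["Product Name"]
    old_data = last_seen.get(key)
    if old_data is not None and compare_data(old_data, new_data):
        print(f"Change detected for {key}")  # replace with a notification call from Section 6
    last_seen[key] = new_data
    save_data(new_data)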


6. Designing the Notification Mechanism

When a change is detected, timely notifications are crucial. Common notification channels include email and Slack. Below are examples of how to implement these in Python.

6.1 Email Notifications

Using Python’s smtplib module, you can easily send email alerts. Here’s a sample function:

import smtplib
from email.mime.text import MIMEText

def send_email(subject, body, from_addr="your_email@example.com", to_addr="recipient@example.com"):
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["To"] = to_addr
    try:
        # Connect to the SMTP server (example using Gmail)
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.starttls()
        server.login(from_addr, "your_email_password")
        server.sendmail(from_addr, [to_addr], msg.as_string())
        server.quit()
        print("Email sent successfully!")
    except Exception as e:
        print(f"Failed to send email: {e}")

# Example usage: send an alert when a price change is detected
send_email("Product Price Update", "Product A has experienced a price change. Please review the update.")

6.2 Slack Notifications

For real-time messaging, Slack notifications are highly effective. Use the slack_sdk package to send messages to a designated channel:

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

def send_slack_message(message, channel="#your-channel"):
    client = WebClient(token="your-slack-token")
    try:
        client.chat_postMessage(channel=channel, text=message)
        print("Slack notification sent successfully!")
    except SlackApiError as e:
        print(f"Failed to send Slack message: {e.response['error']}")

# Example usage: notify when stock status changes
send_slack_message("Alert: Product A stock status has updated!")

Integrate these notification functions into your monitoring workflow so that whenever a significant change is detected through the data comparison step, alerts are sent out immediately.
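
One possible glue function (a sketch; the field and message wording are illustrative) that fans a detected change out to both channels:

def notify_change(product_name, field, old_value, new_value):
    # Build one message and push it through both channels defined above
    message = f"{product_name}: {field} changed from {old_value} to {new_value}"
    send_email("Footlocker.kr Monitor Alert", message)
    send_slack_message(message)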


7. Error Handling and Logging

A robust monitoring system must gracefully handle errors. Network issues, parsing errors, or file I/O problems can all occur, and logging these events is essential for troubleshooting. Python’s logging module offers an easy way to log events:

import logging

# Configure logging format and log file
logging.basicConfig(
    filename="monitor.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

def monitor_task():
    try:
        logging.info("Starting data fetch from Footlocker.kr")
        # Insert data fetching and comparison operations here
    except Exception as e:
        logging.error(f"An error occurred during data monitoring: {e}")

# Test logging by running the monitor task
monitor_task()

By logging important events and errors, you can track system behavior over time and quickly pinpoint issues when they occur.
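
Logging pairs naturally with retry logic for transient network failures. Below is a minimal sketch using only requests and time; the 10-second timeout and exponential backoff delays (2s, 4s, 8s) are arbitrary choices:

import time
import logging
import requests

def fetch_with_retries(url, headers=None, proxies=None, max_retries=3):
    # Retry transient network failures with exponential backoff
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            logging.warning(f"Attempt {attempt} failed for {url}: {e}")
            if attempt < max_retries:
                time.sleep(2 ** attempt)  # wait 2s, then 4s, then 8s
    logging.error(f"All {max_retries} attempts failed for {url}")
    return None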


8. Extended Features and Bulk Monitoring

While monitoring a single product is a good starting point, you may eventually want to monitor multiple products simultaneously. The system can be extended to handle batch monitoring, data visualization, and even predictive analytics.

8.1 Bulk Data Fetching

To monitor multiple products, maintain a list of product URLs and loop through them, collecting data for each. For example:

import requests
import pandas as pd

# Set proxy and API parameters
proxy_ip = "http://Account:Password@ahk.luckdata.io:Port"
proxies = {'http': proxy_ip, 'https': proxy_ip}
luckdata_api_key = "your_luckdata_key"
headers = {"X-Luckdata-Api-Key": luckdata_api_key}

# List of product URLs
product_urls = [
    "https://www.footlocker.kr/ko/product/~/316161353004.html",
    "https://www.footlocker.kr/ko/product/~/316161353005.html"
    # Add more URLs as needed
]

all_products = []
for url in product_urls:
    api_url = f"https://luckdata.io/api/sneaker-API/get_aa0x?url={url}"
    response = requests.get(api_url, headers=headers, proxies=proxies)
    if response.status_code == 200:
        data = response.json()
        all_products.append({
            "Product Name": data.get("name", "Unknown Product"),
            "Brand": data.get("brand", "Unknown Brand"),
            "Category": data.get("category", "Unknown Category"),
            "Price": data.get("price", "Unknown Price"),
            "Stock Status": data.get("stock_status", "Stock status unknown"),
            "Image URL": data.get("image", "No image")
        })
    else:
        print(f"Failed to fetch data for {url}")

# Save the aggregated data to a CSV file
df = pd.DataFrame(all_products)
df.to_csv("footlocker_bulk_products.csv", index=False, encoding="utf-8")
print("Bulk data fetching and storage completed!")

8.2 Data Visualization and Predictive Analytics

Once historical data is collected, you can analyze trends and predict future market behavior. Using libraries such as Matplotlib or Plotly, you can create visualizations that display changes in price and stock levels over time. For example:

import matplotlib.pyplot as plt
import pandas as pd

# Load historical data from CSV (assumes "Date" and "Price" columns were recorded at save time)
data = pd.read_csv("footlocker_data.csv")

# Plot the price trend for a specific product
plt.figure(figsize=(10, 5))
plt.plot(data["Date"], data["Price"], marker="o", label="Price Trend")
plt.title("Product Price Trend Over Time")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
plt.show()

By integrating predictive models, such as regression analysis or time-series forecasting, you can further enhance the system to predict future changes, thereby supporting proactive decision-making.
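
As one concrete example, a plain linear trend can serve as a naive baseline forecast. This is a sketch assuming footlocker_data.csv holds one row per day with a numeric Price column; it is not a full time-series model:

import numpy as np
import pandas as pd

data = pd.read_csv("footlocker_data.csv")
prices = data["Price"].astype(float).to_numpy()
days = np.arange(len(prices))

# Fit a straight line through the observed prices
slope, intercept = np.polyfit(days, prices, 1)

# Extrapolate the trend 7 days past the last observation
projected = slope * (days[-1] + 7) + intercept
print(f"Projected price in 7 days: {projected:.2f}")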


9. Summary and Recommendations

This article provided a detailed guide on how to build an automated data monitoring system for Footlocker.kr using Python. The key points include:

  • Data Acquisition Methods: We compared traditional web scraping, which requires HTML parsing and may need proxies to bypass anti-scraping measures, with the Luckdata Sneaker API that returns structured JSON data, simplifying the process considerably.

  • Proxy IP Integration: Incorporating proxy IPs in both scraping and API calls enhances privacy and reduces the risk of being blocked by the target website.

  • Automated Monitoring System: The system includes scheduled tasks (using schedule or APScheduler), data comparison to detect changes, and notification mechanisms via email or Slack.

  • Error Handling and Logging: Using Python’s logging module, we ensure that errors are captured and logged for easier troubleshooting.

  • Extensions and Bulk Monitoring: The system can be scaled to monitor multiple products and even integrated with visualization and predictive analytics tools for deeper market insights.

By combining these techniques, you can build a robust, efficient, and reliable automated monitoring system that continuously tracks Footlocker.kr data. This system not only supports real-time alerts for price or stock changes but also provides valuable historical data for trend analysis and future forecasting.

Future enhancements could include integrating machine learning models for predictive analytics, using a database for more efficient data management, and incorporating more sophisticated visualization dashboards. Such improvements would further empower businesses to make informed decisions based on up-to-date market data.