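Extracting Data from Forex Factory Using Selenium-Webdriver and Selenium-Chromedriver
This guide explains how to extract economic indicator data from Forex Factory using Selenium-Webdriver and Selenium-Chromedriver. Forex Factory is a popular website that provides real-time data and analysis on economic indicators from around the world.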
Requirements
- Python
- Selenium
- Chromedriver
Steps
- Environment setup
  - Install Python and Selenium.
  - Download Chromedriver and install it in a suitable location on your system.
  - Make sure the Chromedriver version you download matches your browser version.
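The version requirement in the last setup step can be checked with a few lines of Python. This is a minimal sketch that assumes a matching major version is the relevant check; the helper names `major_version` and `versions_match` are hypothetical, and the version strings are illustrative stand-ins for the output of your own `--version` commands.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major version number found in a version string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, chromedriver_output: str) -> bool:
    """True when Chrome and Chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)


# Illustrative strings only; replace them with the output of your own commands.
print(versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 114.0.5735.90"))
print(versions_match("Google Chrome 115.0.5790.102", "ChromeDriver 114.0.5735.90"))
```

The first call prints `True` and the second prints `False`, which is the case where a different Chromedriver build should be downloaded.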
- Writing the script
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Set the Chromedriver path
driver_path = "/path/to/chromedriver"
# Launch the Chrome browser (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
# Access the Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)
# Get the economic indicator table
table = driver.find_element(By.ID, "calendarTable")
# Loop through each row
for row in table.find_elements(By.TAG_NAME, "tr"):
    # Get the date
    date_cell = row.find_element(By.CLASS_NAME, "calendarCellDate")
    date = date_cell.text
    # Get the time
    time_cell = row.find_element(By.CLASS_NAME, "calendarCellTime")
    event_time = time_cell.text
    # Get the country
    country_cell = row.find_element(By.CLASS_NAME, "calendarCellCountry")
    country = country_cell.text
    # Get the indicator
    indicator_cell = row.find_element(By.CLASS_NAME, "calendarCellIndicator")
    indicator = indicator_cell.text
    # Get the importance
    importance_cell = row.find_element(By.CLASS_NAME, "calendarCellImportance")
    importance = importance_cell.text
    # Get the forecast
    forecast_cell = row.find_element(By.CLASS_NAME, "calendarCellForecast")
    forecast = forecast_cell.text
    # Get the actual value
    actual_cell = row.find_element(By.CLASS_NAME, "calendarCellActual")
    actual = actual_cell.text
    # Process the extracted data
    print(f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}")
# Close the browser
driver.quit()
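Explanation
- This script performs the following steps:
  - Launches the Chrome browser.
  - Navigates to the Forex Factory URL.
  - Gets the economic indicator table.
  - Loops through each row and extracts the date, time, country, indicator, importance, forecast, and actual value.
  - Prints the extracted data to the console.
- This script is only an example; customize it as needed.
- The Forex Factory website may change, so the script may need to be updated periodically. In particular, check the element ID and class names used here (calendarTable, calendarCellDate, and so on) against the live page.
- Before extracting data, review Forex Factory's terms of use.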
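One common customization is to save the extracted values instead of printing them. The sketch below assumes each row has already been collected into a dictionary with the seven fields named in the script; the `rows_to_csv` helper, the `FIELDS` list, and the sample record are made up for illustration and are not part of the original script.

```python
import csv
import io

# Column order for the output; the names mirror the variables in the script.
FIELDS = ["date", "time", "country", "indicator", "importance", "forecast", "actual"]


def rows_to_csv(rows, out):
    """Write a list of row dictionaries to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample record standing in for one scraped row.
sample = [
    {
        "date": "Mon Jan 1",
        "time": "8:30am",
        "country": "USD",
        "indicator": "Example Indicator",
        "importance": "High",
        "forecast": "1.0%",
        "actual": "1.2%",
    }
]

buffer = io.StringIO()
rows_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of memory, pass a file opened with `open("calendar.csv", "w", newline="")`; the file name here is only an example.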
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Set Chromedriver path
driver_path = "/path/to/chromedriver"
# Launch Chrome browser (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)
# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "calendarTable"))
)
# Get the economic indicator table
table = driver.find_element(By.ID, "calendarTable")
# Loop through each row
for row in table.find_elements(By.TAG_NAME, "tr"):
    # Extract data from each cell
    date_cell = row.find_element(By.CLASS_NAME, "calendarCellDate")
    date = date_cell.text
    time_cell = row.find_element(By.CLASS_NAME, "calendarCellTime")
    # Named event_time so it does not shadow the imported time module
    event_time = time_cell.text
    country_cell = row.find_element(By.CLASS_NAME, "calendarCellCountry")
    country = country_cell.text
    indicator_cell = row.find_element(By.CLASS_NAME, "calendarCellIndicator")
    indicator = indicator_cell.text
    importance_cell = row.find_element(By.CLASS_NAME, "calendarCellImportance")
    importance = importance_cell.text
    forecast_cell = row.find_element(By.CLASS_NAME, "calendarCellForecast")
    forecast = forecast_cell.text
    actual_cell = row.find_element(By.CLASS_NAME, "calendarCellActual")
    actual = actual_cell.text
    # Process extracted data
    print(
        f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )
# Close the browser
driver.quit()
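Explanation:
- Import Libraries:
  - import time: Imports the standard time module, which can be used to add a delay between page loads. The script as shown does not call it.
  - from selenium import webdriver: Imports the Selenium WebDriver library.
  - from selenium.webdriver.common.by import By: Imports the By class for locating elements by various criteria.
  - from selenium.webdriver.support.ui import WebDriverWait: Imports the WebDriverWait class for explicit waits.
  - from selenium.webdriver.support import expected_conditions as EC: Imports the expected_conditions module for defining wait conditions.
- Set Chromedriver Path:
  - driver_path = "/path/to/chromedriver": Stores the location of the Chromedriver executable.
- Launch Chrome Browser:
  - driver = webdriver.Chrome(...): Starts a Chrome session controlled through Chromedriver.
- Access Forex Factory URL:
  - url = "https://www.forexfactory.com/calendar": Stores the URL of the Forex Factory economic calendar page.
  - driver.get(url): Navigates the Chrome browser to the specified URL.
- Wait for Table to Load:
  - WebDriverWait(driver, 10).until(...): Waits up to 10 seconds for the element with ID calendarTable to be present in the DOM.
- Get Economic Indicator Table:
  - table = driver.find_element(By.ID, "calendarTable"): Retrieves the table element.
- Loop through Each Row:
  - table.find_elements(By.TAG_NAME, "tr"): Returns every row of the table for the loop to process.
- Extract Data from Each Cell:
  - row.find_element(By.CLASS_NAME, ...).text: Reads the text of each cell (date, time, country, indicator, importance, forecast, actual), which is then printed.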
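The "Wait for Table to Load" step relies on an explicit wait, which repeatedly checks a condition until it succeeds or a timeout expires. The sketch below illustrates that polling idea in plain Python. It is a simplified stand-in and not Selenium's implementation: `wait_until` and `table_is_ready` are hypothetical names, and Selenium's own WebDriverWait raises its own TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose table only becomes available on the third check.
attempts = {"count": 0}


def table_is_ready():
    attempts["count"] += 1
    return "table" if attempts["count"] >= 3 else None


print(wait_until(table_is_ready, timeout=2.0, poll_interval=0.01))
```

Compared with a fixed `time.sleep(10)`, polling returns as soon as the condition holds, so the script waits only as long as the page actually needs.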
Using CSS Selectors:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Set Chromedriver path
driver_path = "/path/to/chromedriver"
# Launch Chrome browser (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)
# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#calendarTable tr"))
)
# Get economic indicator table rows
table_rows = driver.find_elements(By.CSS_SELECTOR, "#calendarTable tr")
# Loop through each row
for row in table_rows:
    # Extract data from each cell using CSS selectors
    date = row.find_element(By.CSS_SELECTOR, ".calendarCellDate").text
    event_time = row.find_element(By.CSS_SELECTOR, ".calendarCellTime").text
    country = row.find_element(By.CSS_SELECTOR, ".calendarCellCountry").text
    indicator = row.find_element(By.CSS_SELECTOR, ".calendarCellIndicator").text
    importance = row.find_element(By.CSS_SELECTOR, ".calendarCellImportance").text
    forecast = row.find_element(By.CSS_SELECTOR, ".calendarCellForecast").text
    actual = row.find_element(By.CSS_SELECTOR, ".calendarCellActual").text
    # Process extracted data
    print(
        f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )
# Close the browser
driver.quit()
This method utilizes CSS selectors to locate elements within the table rows. CSS selectors provide a more concise and flexible way to target specific elements based on their attributes and structural relationships in the HTML.
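One practical caveat: `find_element` raises NoSuchElementException when a row has no matching cell, which can happen for header or separator rows. A possible workaround, sketched below, is to use `find_elements`, which returns an empty list when nothing matches. The `cell_text` helper and the `FakeRow` and `FakeCell` stand-ins are hypothetical and exist only so the example runs without a browser; with Selenium, `row` would be a real WebElement.

```python
CSS = "css selector"  # the string value of Selenium's By.CSS_SELECTOR


def cell_text(row, selector, default=""):
    """Return the stripped text of the first match under row, or default if there is none."""
    matches = row.find_elements(CSS, selector)
    return matches[0].text.strip() if matches else default


class FakeCell:
    """Stand-in for a WebElement that exposes only .text."""

    def __init__(self, text):
        self.text = text


class FakeRow:
    """Stand-in for a table row; maps selectors to cell text."""

    def __init__(self, cells):
        self._cells = cells

    def find_elements(self, by, value):
        return [FakeCell(self._cells[value])] if value in self._cells else []


row = FakeRow({".calendarCellDate": " Mon Jan 1 "})
print(repr(cell_text(row, ".calendarCellDate")))
print(repr(cell_text(row, ".calendarCellActual")))
```

Inside the scraping loop, a call such as `cell_text(row, ".calendarCellDate")` would replace `row.find_element(By.CSS_SELECTOR, ".calendarCellDate").text`, so a row with missing cells yields empty strings instead of stopping the script.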
Using Beautiful Soup with Selenium:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
# Set Chromedriver path
driver_path = "/path/to/chromedriver"
# Launch Chrome browser (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)
# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "calendarTable"))
)
# Get the table HTML
table_html = driver.find_element(By.ID, "calendarTable").get_attribute("outerHTML")
# Parse the HTML using Beautiful Soup (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(table_html, "lxml")
# Extract data from table rows using Beautiful Soup
for row in soup.find_all("tr"):
    date = row.find("td", class_="calendarCellDate").text
    # Named event_time so it does not shadow the imported time module
    event_time = row.find("td", class_="calendarCellTime").text
    country = row.find("td", class_="calendarCellCountry").text
    indicator = row.find("td", class_="calendarCellIndicator").text
    importance = row.find("td", class_="calendarCellImportance").text
    forecast = row.find("td", class_="calendarCellForecast").text
    actual = row.find("td", class_="calendarCellActual").text
    # Process extracted data
    print(
        f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )
# Close the browser
driver.quit()
This approach combines Selenium with Beautiful Soup, a popular HTML parsing library. After extracting the table HTML using Selenium, it's parsed using Beautiful Soup to extract data from the table rows. This method leverages the strengths of both tools: Selenium for browser automation and Beautiful Soup for efficient HTML parsing.