Extracting Data from Forex Factory with Selenium-Webdriver and Selenium-Chromedriver

2024-07-27

This guide explains how to extract economic indicator data from Forex Factory using Selenium-Webdriver and Selenium-Chromedriver. Forex Factory is a popular website that provides real-time data and analysis on economic indicators from around the world.

Requirements

  • Python
  • Selenium
  • Chromedriver

Steps

  1. Set up the environment

    • Install Python and Selenium.
    • Download Chromedriver and place it in an appropriate location on your system.
    • Make sure the Chromedriver version matches your installed Chrome browser version.
  2. Write the script

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    # Set the Chromedriver path
    driver_path = "/path/to/chromedriver"
    
    # Launch the Chrome browser (Selenium 4 passes the driver path via a Service object)
    driver = webdriver.Chrome(service=Service(driver_path))
    
    # Access the Forex Factory URL
    url = "https://www.forexfactory.com/calendar"
    driver.get(url)
    
    # Wait for the economic indicator table to load, then grab it
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "calendarTable"))
    )
    table = driver.find_element(By.ID, "calendarTable")
    
    # Loop through each row
    for row in table.find_elements(By.TAG_NAME, "tr"):
        # Get the date
        date_cell = row.find_element(By.CLASS_NAME, "calendarCellDate")
        date = date_cell.text
    
        # Get the time (named event_time so it cannot shadow the time module)
        time_cell = row.find_element(By.CLASS_NAME, "calendarCellTime")
        event_time = time_cell.text
    
        # Get the country
        country_cell = row.find_element(By.CLASS_NAME, "calendarCellCountry")
        country = country_cell.text
    
        # Get the indicator
        indicator_cell = row.find_element(By.CLASS_NAME, "calendarCellIndicator")
        indicator = indicator_cell.text
    
        # Get the importance
        importance_cell = row.find_element(By.CLASS_NAME, "calendarCellImportance")
        importance = importance_cell.text
    
        # Get the forecast
        forecast_cell = row.find_element(By.CLASS_NAME, "calendarCellForecast")
        forecast = forecast_cell.text
    
        # Get the actual value
        actual_cell = row.find_element(By.CLASS_NAME, "calendarCellActual")
        actual = actual_cell.text
    
        # Process the extracted data
        print(f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}")
    
    # Close the browser
    driver.quit()
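To confirm the setup from step 1, a quick check that the chromedriver binary is discoverable and reports a version can be sketched as follows (whether chromedriver is actually on PATH depends on your installation):

```python
import shutil
import subprocess

def chromedriver_version():
    """Return chromedriver's version string if it is on PATH, else None."""
    path = shutil.which("chromedriver")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip()

print(chromedriver_version())
```

If this prints None, revisit the installation step before running the scraping script.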
    

Explanation

  1. This script performs the following steps:

    • Launches the Chrome browser.
    • Navigates to the Forex Factory URL.
    • Retrieves the economic indicator table.
    • Loops through each row and extracts the date, time, country, indicator, importance, forecast, and actual values.
    • Prints the extracted data to the console.
  • This script is only an example; customize it as needed.
  • The Forex Factory website may change over time, so the script may need to be updated periodically.
  • Review the Forex Factory terms of service before extracting any data.
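Rather than printing each row, the extracted fields can be collected into dictionaries and serialized to CSV for later analysis. A minimal sketch, with sample data standing in for the scraped values:

```python
import csv
import io

FIELDS = ["date", "time", "country", "indicator", "importance", "forecast", "actual"]

def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Sample data standing in for values scraped from the calendar
sample = [
    {"date": "Mon Jul 22", "time": "8:30am", "country": "USD",
     "indicator": "Core CPI m/m", "importance": "High",
     "forecast": "0.2%", "actual": "0.1%"},
]
print(rows_to_csv(sample))
```

Inside the scraping loop, each row would append one such dict instead of calling print.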



from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set Chromedriver path
driver_path = "/path/to/chromedriver"

# Launch Chrome browser (Selenium 4 passes the driver path via a Service object)
driver = webdriver.Chrome(service=Service(driver_path))

# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)

# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "calendarTable"))
)

# Get the economic indicator table
table = driver.find_element(By.ID, "calendarTable")

# Loop through each row
for row in table.find_elements(By.TAG_NAME, "tr"):
    # Extract data from each cell
    date_cell = row.find_element(By.CLASS_NAME, "calendarCellDate")
    date = date_cell.text

    time_cell = row.find_element(By.CLASS_NAME, "calendarCellTime")
    event_time = time_cell.text  # named event_time so it cannot shadow the time module

    country_cell = row.find_element(By.CLASS_NAME, "calendarCellCountry")
    country = country_cell.text

    indicator_cell = row.find_element(By.CLASS_NAME, "calendarCellIndicator")
    indicator = indicator_cell.text

    importance_cell = row.find_element(By.CLASS_NAME, "calendarCellImportance")
    importance = importance_cell.text

    forecast_cell = row.find_element(By.CLASS_NAME, "calendarCellForecast")
    forecast = forecast_cell.text

    actual_cell = row.find_element(By.CLASS_NAME, "calendarCellActual")
    actual = actual_cell.text

    # Process extracted data
    print(
        f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )

# Close the browser
driver.quit()

Explanation:

  1. Import Libraries:

    • from selenium import webdriver: Imports the Selenium WebDriver library.
    • from selenium.webdriver.common.by import By: Imports the By class for locating elements by various criteria.
    • from selenium.webdriver.support.ui import WebDriverWait: Imports the WebDriverWait class for explicit waits.
    • from selenium.webdriver.support import expected_conditions as EC: Imports the expected_conditions module for defining wait conditions.
  2. Set Chromedriver Path:

    • driver_path = "/path/to/chromedriver": Stores the location of the Chromedriver executable.
  3. Launch Chrome Browser:

    • webdriver.Chrome(...): Starts a Chrome session controlled by Selenium through Chromedriver.
  4. Access Forex Factory URL:

    • url = "https://www.forexfactory.com/calendar": Stores the URL of the Forex Factory economic calendar page.
    • driver.get(url): Navigates the Chrome browser to the specified URL.
  5. Wait for Table to Load:

    • WebDriverWait(driver, 10).until(...): Blocks for up to 10 seconds until the calendar table is present in the DOM, so the script does not query a page that has not finished loading.
  6. Get Economic Indicator Table:

    • driver.find_element(By.ID, "calendarTable"): Retrieves the table element by its id.
  7. Loop through Each Row:

    • table.find_elements(By.TAG_NAME, "tr"): Returns all row elements of the table for iteration.
  8. Extract Data from Each Cell:

    • row.find_element(By.CLASS_NAME, ...).text: Reads the text of each cell (date, time, country, indicator, importance, forecast, actual).



from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set Chromedriver path
driver_path = "/path/to/chromedriver"

# Launch Chrome browser (Selenium 4 passes the driver path via a Service object)
driver = webdriver.Chrome(service=Service(driver_path))

# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)

# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#calendarTable tr"))
)

# Get economic indicator table rows
table_rows = driver.find_elements(By.CSS_SELECTOR, "#calendarTable tr")

# Loop through each row
for row in table_rows:
    # Extract data from each cell using CSS selectors
    date = row.find_element(By.CSS_SELECTOR, ".calendarCellDate").text
    event_time = row.find_element(By.CSS_SELECTOR, ".calendarCellTime").text
    country = row.find_element(By.CSS_SELECTOR, ".calendarCellCountry").text
    indicator = row.find_element(By.CSS_SELECTOR, ".calendarCellIndicator").text
    importance = row.find_element(By.CSS_SELECTOR, ".calendarCellImportance").text
    forecast = row.find_element(By.CSS_SELECTOR, ".calendarCellForecast").text
    actual = row.find_element(By.CSS_SELECTOR, ".calendarCellActual").text

    # Process extracted data
    print(
        f"Date: {date}, Time: {event_time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )

# Close the browser
driver.quit()

This method utilizes CSS selectors to locate elements within the table rows. CSS selectors provide a more concise and flexible way to target specific elements based on their attributes and structural relationships in the HTML.

Using Beautiful Soup with Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Set Chromedriver path
driver_path = "/path/to/chromedriver"

# Launch Chrome browser (Selenium 4 passes the driver path via a Service object)
driver = webdriver.Chrome(service=Service(driver_path))

# Access Forex Factory URL
url = "https://www.forexfactory.com/calendar"
driver.get(url)

# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "calendarTable"))
)

# Get the table HTML
table_html = driver.find_element(By.ID, "calendarTable").get_attribute("outerHTML")

# Parse the HTML using Beautiful Soup
soup = BeautifulSoup(table_html, "lxml")

# Extract data from table rows using Beautiful Soup
for row in soup.find_all("tr"):
    date = row.find("td", class_="calendarCellDate").text
    time = row.find("td", class_="calendarCellTime").text
    country = row.find("td", class_="calendarCellCountry").text
    indicator = row.find("td", class_="calendarCellIndicator").text
    importance = row.find("td", class_="calendarCellImportance").text
    forecast = row.find("td", class_="calendarCellForecast").text
    actual = row.find("td", class_="calendarCellActual").text

    # Process extracted data
    print(
        f"Date: {date}, Time: {time}, Country: {country}, Indicator: {indicator}, Importance: {importance}, Forecast: {forecast}, Actual: {actual}"
    )

# Close the browser
driver.quit()

This approach combines Selenium with Beautiful Soup, a popular HTML parsing library. After extracting the table HTML using Selenium, it's parsed using Beautiful Soup to extract data from the table rows. This method leverages the strengths of both tools: Selenium for browser automation and Beautiful Soup for efficient HTML parsing.
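If installing Beautiful Soup is not an option, Python's built-in html.parser can pull the same cell text out of the table HTML. A minimal sketch against sample markup (the class names mirror the hypothetical ones used in the scripts above):

```python
from html.parser import HTMLParser

class CellTextParser(HTMLParser):
    """Collect the text of <td> elements, keyed by their class attribute."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the <td> currently being read
        self.cells = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.cells[self._current] = self.cells.get(self._current, "") + data.strip()

# Sample HTML standing in for one calendar row
sample_html = (
    '<tr><td class="calendarCellDate">Mon Jul 22</td>'
    '<td class="calendarCellCountry">USD</td></tr>'
)
parser = CellTextParser()
parser.feed(sample_html)
print(parser.cells)  # {'calendarCellDate': 'Mon Jul 22', 'calendarCellCountry': 'USD'}
```

This trades Beautiful Soup's convenience for a dependency-free parser; for anything beyond flat cell extraction, Beautiful Soup remains the simpler choice.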



Alternatives to Maven, Selenium-ChromeDriver, and the Chrome DevTools Protocol include Playwright, Puppeteer, Cypress, Gradle, Ant, Chromedriver, WebDriver, Applitools, and BrowserStack.

The Chrome DevTools Protocol (CDP) is a protocol for controlling the Chrome browser. Selenium is an open-source tool for automating web browser testing, and it can use CDP to control Chrome.
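Under the hood, CDP commands are JSON messages sent to the browser over a WebSocket. A minimal sketch of building such a message (the method and params follow the published CDP schema; the id is an arbitrary request counter):

```python
import json

def cdp_command(msg_id, method, params=None):
    """Build a CDP command as the JSON string sent over the WebSocket."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})

# A navigation command, as Selenium might issue via driver.execute_cdp_cmd
msg = cdp_command(1, "Page.navigate", {"url": "https://www.forexfactory.com/calendar"})
print(msg)
```

The browser replies with a JSON message carrying the same id, which is how callers match responses to requests.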