Selenium I: automating multi-page workflows with Ruby and Selenium


The problem

I had a repetitive, time-sensitive workflow that required switching between two browser pages, waiting for specific signals, and triggering actions at precise moments. Doing it by hand meant constant context switching, missed timing windows, and data-entry errors. A linear Selenium script was the obvious first step, but linear scripts break the first time a page loads 500 ms slower than expected.

This article covers the first iteration: a ~700-line Ruby script that hardened the workflow with explicit waits, DOM-driven state detection, layered retry logic, and modular page objects.

Entry point: a tight claim-submit loop

The script is a continuous loop. Each iteration claims a task from the primary dashboard, drives it through its state transitions, and submits it. There is no batching; every task is handled end-to-end before the next one is claimed.

#!/usr/bin/env ruby
require 'json'
require 'date'

require_relative 'extension'
require_relative 'project'
require_relative 'session'

VERSION = '3.4'

Session.clear_log
Session.log('INIT', '~', "AutoWorkflow v#{VERSION}")
Session.log('INIT', '~', 'https://github.com/cetio')

start_driver(false)
Extension.cleanup!()
project = Project.new()

at_exit do
  err = $!
  if err && !err.is_a?(SystemExit)
    Session.log('INIT', '!', "Unhandled: #{err.class}: #{err.message}", error: err)
  end
  Extension.cleanup!()
end

loop do
  tracker = project.claim_task()
  tracker.await_submission()
end

Explicit waits with scroll-and-retry

The core interaction primitive is oriented_click: scroll the element into view, check that it is displayed and enabled, try a native click, and fall back to a JavaScript click if the native one fails. It polls until the timeout expires and returns a boolean so the caller can decide whether to retry or abort. oriented_send_keys follows the same pattern for text input.

def oriented_click(value = nil, timeout: 1, interval: 0.3, **locator)
  locator = locator.empty? ? nil : locator
  element = locator ? nil : value

  start = Time.now
  while (Time.now - start) < timeout
    begin
      el = locator ? $driver.find_element(locator) : element
      $driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'})"
        , el) rescue nil

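      if el.displayed? && el.enabled?
        begin
          el.click
        rescue Selenium::WebDriver::Error::WebDriverError
          # Native click failed (e.g. intercepted by an overlay): fall back to
          # a JS click. If that raises too, the outer rescue retries.
          $driver.execute_script("arguments[0].click();", el)
        end
        return true
      end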
    rescue
    end
    sleep interval
  end

  false
end

def oriented_send_keys(keys, value = nil, timeout: 2, interval: 0.3, **locator)
  locator = locator.empty? ? nil : locator
  element = locator ? nil : value

  start = Time.now
  while (Time.now - start) < timeout
    begin
      el = locator ? $driver.find_element(locator) : element
      $driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'})"
        , el) rescue nil

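      if el.displayed? && el.enabled?
        el.clear rescue nil
        begin
          el.send_keys(keys)
        rescue Selenium::WebDriver::Error::WebDriverError
          # Fallback: set the value directly. This bypasses key events; if it
          # raises too, the outer rescue retries after `interval`.
          $driver.execute_script(
            "arguments[0].value = arguments[1];", el, keys.to_s)
        end
        return true
      end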
    rescue
    end
    sleep interval
  end

  false
end
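
Both helpers share the same poll-until-timeout skeleton. Here is a minimal sketch of that skeleton on its own, assuming a hypothetical poll_until helper that is not part of the original script. It swaps Time.now for the monotonic clock so that wall-clock adjustments cannot stretch or shrink the timeout, and it always makes at least one attempt.

```ruby
# Hypothetical helper (not in the original script): retry a block until it
# returns a truthy value or the timeout elapses.
def poll_until(timeout: 1, interval: 0.3)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = begin
      yield
    rescue StandardError
      nil # treat any error as "not ready yet", like the bare rescue above
    end
    return result if result
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Usage with a stand-in for a flaky lookup: fails twice, then succeeds.
attempts = 0
outcome = poll_until(timeout: 1, interval: 0.01) do
  attempts += 1
  raise 'not ready' if attempts < 3
  :clicked
end
```

Returning the block's value rather than a bare true lets a caller reuse the same loop for lookups as well as clicks.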

Detect state from the DOM, not from memory

A task page has a small set of observable states. Instead of tracking state in variables, the script re-reads the DOM every time it needs to decide what to do. This makes it resilient to race conditions, refreshes, and unexpected dialogs.

module Controller
  module TaskPage
    CONTINUE_BUTTON    = "//button[@aria-label='Continue']"
    OPEN_EXTERNAL      = "//button[contains(text(), 'Open external tool')]"
    TIMER_BUTTONS      = [
      "//button[contains(text(), 'Start timer')]",
      "//button[contains(text(), 'Start task')]",
      "//button[contains(text(), 'Resume task')]"
    ].freeze
    SUBMIT_TASK        = "//button[contains(text(), 'Submit task')]"
    CONFIRM_TIME       = "//button[contains(text(), 'Confirm time')]"
    EXIT_BUTTON        = "//button[contains(text(), 'Exit')]"
    SKIP_BUTTON        = "//button[contains(text(), 'Skip')]"
    NEXT_TASK          = "//button[contains(text(), 'Next task')]"
    NUMERIC_INPUT      = "//input[@type='number']"
    TEXT_INPUT         = "//textarea | //input[not(@type) or @type='text']"

    def state!
      loop do
        return :error if safe_find_elements(
          xpath: "//h2[contains(text(), 'Application error')]").any?
        return :continue if safe_find_elements(xpath: CONTINUE_BUTTON).any?
        return :external if safe_find_elements(xpath: OPEN_EXTERNAL).any?
        TIMER_BUTTONS.each do |str|
          return :timer if safe_find_elements(xpath: str).any?
        end
        return :submit if safe_find_elements(xpath: SUBMIT_TASK).any?
        return :nav    if safe_find_elements(xpath: NEXT_TASK).any?
        return :confirm if safe_find_elements(xpath: CONFIRM_TIME).any?
        return :dialog if Controller.dialog_present?
        safe_find_elements(xpath: "//button").each do |btn|
          return :task if /\d{6}-/.match?(btn.text) rescue nil
        end
        sleep 0.1
      end
    end
  end
end

The state! method loops until it can definitively categorize the page. Each state maps to a concrete DOM indicator: a button with specific text, an error heading, a visible dialog. There is no guessing.
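
Because the checks run in a fixed order, the first matching indicator wins when several are on the page at once. A minimal sketch of that priority rule, using a plain set of labels as a stand-in for the DOM; PROBES, detect_state, and the label strings are illustrative assumptions, not names from the script:

```ruby
require 'set'

# Ordered [state, indicator] pairs; earlier entries take priority.
PROBES = [
  [:error,    'Application error'],
  [:continue, 'Continue'],
  [:external, 'Open external tool'],
  [:timer,    'Start timer'],
  [:submit,   'Submit task']
].freeze

# `visible` stands in for the page: the set of labels currently rendered.
# Returns the first matching state, or nil if nothing is recognized yet.
def detect_state(visible)
  PROBES.each do |state, label|
    return state if visible.include?(label)
  end
  nil
end

first  = detect_state(Set['Submit task', 'Continue']) # :continue outranks :submit
second = detect_state(Set['Submit task'])             # only the submit button is left
third  = detect_state(Set.new)                        # nothing recognized yet
```

This sketch returns nil when nothing matches and leaves the polling to the caller, whereas state! keeps looping internally until something matches.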

Drive through transient states with begin_task!

Before the actual work starts, the page often bounces through a sequence of dialogs, timers, and continue buttons. begin_task! drives through all of them in order, returning only when the task is truly ready for input.

def begin_task!(require_dialog = false)
  seen_dialog = false
  loop do
    sleep 0.3 if require_dialog

    case state!
    when :external
      seen_dialog = true
      shadow = safe_window_handles.dup
      oriented_click(xpath: OPEN_EXTERNAL)
      WAIT.until { (safe_window_handles - shadow).any? }
      new_window = (safe_window_handles - shadow).first
      with_window(new_window) { $driver.close() }

    when :timer
      seen_dialog = true
      TIMER_BUTTONS.each do |str|
        next unless safe_find_elements(xpath: str).any?
        oriented_click(xpath: str)
        sleep 0.3
      end

    when :dialog
      Controller.close_dialog!()
      sleep 0.3

    when :error
      $driver.navigate.refresh()
      sleep 0.3

    when :continue
      continue!()
      sleep 0.3

    else
      return if !require_dialog || seen_dialog
    end
  end
end
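
The shape of that loop is easier to see without Selenium in the way. The sketch below is a simplified stand-in, not the script's code: next_state plays the role of state!, handlers maps transient states to actions, and max_steps is an added safety bound that begin_task! itself does not have.

```ruby
# Hypothetical driver loop: handle transient states until one arrives that
# has no handler, which is taken to mean the page is ready for input.
def drive_until_ready(next_state, handlers, max_steps: 20)
  handled = []
  max_steps.times do
    state = next_state.call
    handler = handlers[state]
    return handled unless handler
    handler.call
    handled << state
  end
  raise "still in a transient state after #{max_steps} steps"
end

# Scripted page: a dialog, then a timer, then the task itself.
script = %i[dialog timer task]
log = []
handlers = {
  dialog: -> { log << 'closed dialog' },
  timer:  -> { log << 'started timer' }
}
handled = drive_until_ready(-> { script.shift }, handlers)
```

With the scripted sequence, handled ends up as [:dialog, :timer] and the loop returns as soon as :task appears. A page stuck on the same dialog raises after max_steps instead of spinning forever.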

Dialog handling as a shared primitive

Dialogs appear on both the primary and secondary pages. A shared controller module detects them, tries an acknowledge button first (if auto-acknowledge is enabled), and falls back to a close button.

module Controller
  DIALOG        = '//div[@role="dialog"]'
  CLOSE_DIALOG  = "//button[@aria-label='Close']"
  ACKNOWLEDGE   = "//button[contains(text(), 'Acknowledge')]"

  module_function

  def dialog_present?
    safe_find_elements(xpath: DIALOG).any?
  end

  def close_dialog!()
    return true if acknowledge!()
    oriented_click(xpath: CLOSE_DIALOG)
  end

  def acknowledge!()
    return false unless safe_find_elements(xpath: ACKNOWLEDGE).any?
    oriented_click(xpath: ACKNOWLEDGE) if Extension.auto_acknowledge?
  end
end
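
The fallback order has three cases worth spelling out: acknowledge available and enabled, acknowledge available but disabled, and no acknowledge button at all. A small sketch with lambdas standing in for the real click primitives; dismiss_dialog and its keyword arguments are hypothetical names used only for illustration:

```ruby
# Each lambda reports whether its click succeeded.
def dismiss_dialog(ack_present:, auto_ack:, click_ack:, click_close:)
  # Acknowledge only when the button exists and the feature is enabled.
  return true if ack_present && auto_ack && click_ack.call
  # Otherwise (or if the acknowledge click failed) use the close button.
  click_close.call
end

clicks = []
ack   = -> { clicks << :acknowledge; true }
close = -> { clicks << :close; true }

dismiss_dialog(ack_present: true,  auto_ack: true,  click_ack: ack, click_close: close)
dismiss_dialog(ack_present: true,  auto_ack: false, click_ack: ack, click_close: close)
dismiss_dialog(ack_present: false, auto_ack: true,  click_ack: ack, click_close: close)
```

Only the first call acknowledges; the other two fall through to the close button.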

Claim tasks from a dashboard

The Project class manages the primary dashboard window. It navigates to the task list, waits for the task page to load, and retries through transient errors and dialogs.

class Project
  include Controller::Project

  attr_accessor :handle, :uuid

  def initialize()
    @handle = open_or_nav('https://platform.example.com/contractor/tasks')
    $driver.switch_to.window(@handle)

    Session.log('PROJECT', '~', 'Awaiting task page...')
    until $driver.current_url.include?('/tasks')
      sleep 0.3
    end

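    # String#[] returns the matched UUID as a String (nil if absent),
    # rather than the MatchData that String#match would return.
    @uuid = $driver.current_url[
      /\b[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\b/]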
  end
end
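
One detail about the UUID lookup: String#match returns a MatchData object, while String#[] with a regex returns the matched substring. Interpolating either into a URL gives the same text, but only the latter is a plain String. A small sketch of the difference on a made-up URL in the shape the script expects:

```ruby
# Same pattern as in Project#initialize: a version 1-5 UUID in lowercase hex.
UUID_PATTERN =
  /\b[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\b/

# Hypothetical URL, for illustration only.
url = 'https://platform.example.com/contractor/' \
      '3f2b8c1e-9d4a-4e7b-8a6c-1f0e2d3c4b5a/tasks'

as_match  = url.match(UUID_PATTERN) # MatchData (nil when nothing matches)
as_string = url[UUID_PATTERN]       # String (nil when nothing matches)
missing   = 'https://platform.example.com/contractor/tasks'[UUID_PATTERN]
```

In both forms a URL without a UUID yields nil, which interpolates as an empty string, so a nil check before building task URLs is worth considering.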

module Controller
  module Project
    CONTINUE_TASK   = "//button[contains(text(), 'Continue task')]"
    START_NEW_TASK  = "//button[contains(text(), 'Start new task')]"
    INBOX_LIST      = "//div[@data-testid='task-inbox']"
    AVAILABLE_TASKS = "//button[contains(text(), 'Available tasks')]"

    def claim_task()
      navigate_to_tasks()
      Session.log('PROJECT', '~', 'Claiming from dashboard')

      counter = 0
      until $driver.current_url.include?('/workspace')
        sleep 0.5
        if Controller.dialog_present?
          Session.log('PROJECT', '~', 'Awaiting dialog dismissal...')
          await_dialog_dismissal!()
        end

        counter += 1
        if safe_find_elements(xpath: CONTINUE_TASK).any?
          counter = 0
          oriented_click(xpath: CONTINUE_TASK)
          break if await_workspace_page()
        elsif safe_find_elements(xpath: START_NEW_TASK).any?
          counter = 0
          oriented_click(xpath: START_NEW_TASK)
          break if await_workspace_page()
        elsif safe_find_elements(css: '.animate-spin').any?
          counter = 0
        elsif counter >= 10
          $driver.navigate.refresh()
          counter = 0
        end
      end

      tracker = ::Tracker.new(self)
      Session.log('PROJECT', '+', "Claimed task #{tracker.uuid}")
      tracker
    end

    def navigate_to_tasks()
      $driver.switch_to.window(@handle)
      with_retry { $driver.navigate.to(
        "https://platform.example.com/contractor/#{@uuid}/tasks") }
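    rescue Selenium::WebDriver::Error::NoSuchWindowError
      # window.open does not hand back a WebDriver window handle, so find the
      # new window by diffing the handle list before and after.
      before = safe_window_handles
      $driver.execute_script('window.open(arguments[0])',
        "https://platform.example.com/contractor/#{@uuid}/tasks")
      @handle = (safe_window_handles - before).first
      $driver.switch_to.window(@handle)
    end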

    def await_workspace_page()
      counter = 0
      until $driver.current_url.include?('/workspace')
        sleep 0.5
        return false if (counter += 1) >= 10
      end
      true
    end
  end
end
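
The counter in claim_task is a small stall detector: any sign of progress (a clickable button, a spinner) resets it, and ten idle polls in a row, roughly five seconds at 0.5 s per poll, trigger a refresh. A sketch of that bookkeeping pulled out into a hypothetical StallCounter class, which the original script does not define:

```ruby
# Counts consecutive idle polls and reports when the limit is reached.
class StallCounter
  def initialize(limit)
    @limit = limit
    @idle = 0
  end

  # Call when the page showed progress (a button to click, a spinner).
  def progress!
    @idle = 0
  end

  # Call on an idle poll. Returns true when the limit is reached, then resets.
  def idle!
    @idle += 1
    return false if @idle < @limit
    @idle = 0
    true
  end
end

stall = StallCounter.new(3)
results = [stall.idle!, stall.idle!, stall.idle!] # third idle poll trips it
stall.idle!
stall.progress!                                   # spinner seen: start over
after_progress = [stall.idle!, stall.idle!]
```

In the loop, a true result from idle! would be the cue to call $driver.navigate.refresh.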

Defensive error handling and retries

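Network flakiness, stale element references, and transient UI states are not edge cases; they are the norm. with_retry wraps page navigation and the driver calls behind the safe_* helpers with exponential backoff (1, 2, 4, 8, then 16 seconds with the defaults) so a brief disconnect does not kill the session. Because Selenium::WebDriver::Error::WebDriverError is the base class of Selenium's error hierarchy, the rescue list retries on any WebDriver error, not only connection failures.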

MAX_RETRIES = 5
BASE_RETRY_DELAY = 1.0

def with_retry(max_retries = MAX_RETRIES, base_delay = BASE_RETRY_DELAY)
  retries = 0
  begin
    yield
  rescue Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ETIMEDOUT,
         Net::ReadTimeout, Timeout::Error,
         Selenium::WebDriver::Error::WebDriverError,
         Selenium::WebDriver::Error::NoSuchDriverError,
         Selenium::WebDriver::Error::SessionNotCreatedError => e
    retries += 1
    if retries <= max_retries
      delay = base_delay * (2 ** (retries - 1))
      Session.log('HELPERS', '~',
        "Connection error (#{e.class}): retrying #{retries}/#{max_retries} " \
        "after #{delay.round(2)}s...")
      sleep delay
      retry
    else
      Session.log('HELPERS', '!',
        "Max retries exceeded: #{e.class}: #{e.message}", error: e)
      raise
    end
  end
end

def safe_find_elements(locator)
  with_retry { $driver.find_elements(locator) }
end

def safe_find_element(locator)
  with_retry { $driver.find_element(locator) }
end

def safe_window_handles
  handles = with_retry { $driver.window_handles }
  Array(handles)
end
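
To make the backoff schedule concrete, here is a sketch that computes the delays and exercises a retry loop without actually waiting. retry_with_backoff and its sleeper parameter are hypothetical; unlike with_retry it rescues any StandardError, which keeps the example free of Selenium.

```ruby
# Delay before retry number `attempt` (1-based), using with_retry's formula.
def backoff_delay(attempt, base_delay = 1.0)
  base_delay * (2**(attempt - 1))
end

# Hypothetical variant of with_retry with an injectable sleeper.
def retry_with_backoff(max_retries: 5, base_delay: 1.0, sleeper: method(:sleep))
  retries = 0
  begin
    yield
  rescue StandardError
    retries += 1
    raise if retries > max_retries # out of retries: re-raise the last error
    sleeper.call(backoff_delay(retries, base_delay))
    retry
  end
end

schedule = (1..5).map { |n| backoff_delay(n) }

# A block that fails twice, then succeeds; record delays instead of sleeping.
slept = []
calls = 0
value = retry_with_backoff(sleeper: ->(seconds) { slept << seconds }) do
  calls += 1
  raise IOError, 'flaky' if calls < 3
  :ok
end
```

With the defaults the schedule is 1, 2, 4, 8, and 16 seconds, so a call that never recovers blocks for 31 seconds of sleep before the error propagates.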

Structured logging with optional sound

Every significant action is logged with a UTC timestamp, location tag, and status symbol. Logs go to both stdout and a persisted file. Sound notifications help when the script is running in the background.

require 'fileutils' # FileUtils.mkdir_p
require 'time'      # Time#iso8601

module Session
  AUTO_DIR = File.join(Dir.home, 'AutoWorkflow')
  LOG_PATH = File.join(AUTO_DIR, 'output.log')

  module_function

  def clear_log
    FileUtils.mkdir_p(AUTO_DIR) unless File.directory?(AUTO_DIR)
    File.open(LOG_PATH, 'w') { |f| f.puts('') }
  end

  def log(location, symbol, message, error: nil, sound_path: nil)
    line = "[#{Time.now.utc.iso8601}] [#{location}:#{symbol}] #{message}"
    write_line(line)
    log_error_details(location, error) if error
    play_sound(sound_path) if sound_path && Extension.audio_notify?
  end

  def write_line(line)
    puts line
    FileUtils.mkdir_p(AUTO_DIR) unless File.directory?(AUTO_DIR)
    File.open(LOG_PATH, 'a') { |f| f.puts(line) }
  end

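  # A bare `private` here would switch off module_function for the methods
  # below, leaving Session.log unable to call play_sound. Keep it a module
  # function and hide the singleton copy instead.
  def play_sound(path)
    return unless File.exist?(path)
    case RbConfig::CONFIG['host_os']
    when /darwin/ then spawn_detached('afplay', path)
    else
      # Try each player in turn and stop at the first one that launches.
      return if spawn_detached('mpv', '--no-video', '--quiet', path)
      return if spawn_detached('paplay', path)
      return if spawn_detached('aplay', path)
    end
  end
  private_class_method :play_sound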
end
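
For reference, this is what a formatted line looks like. The sketch isolates the string-building step of Session.log in a hypothetical format_log_line function so it can be checked without touching stdout or the log file:

```ruby
require 'time' # provides Time#iso8601 on Rubies where it is not built in

# Same layout as Session.log: [UTC timestamp] [location:symbol] message
def format_log_line(time, location, symbol, message)
  "[#{time.utc.iso8601}] [#{location}:#{symbol}] #{message}"
end

line = format_log_line(Time.utc(2024, 1, 2, 3, 4, 5), 'INIT', '~', 'AutoWorkflow v3.4')
# line is "[2024-01-02T03:04:05Z] [INIT:~] AutoWorkflow v3.4"
```

The fixed-width prefix keeps the file easy to filter, for example by searching for ":!]" to find every error entry.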

What this iteration proved

Continue to Part II for the dispatch-oriented runtime: JS-level request interception, tab queuing, companion extensions, and treating Selenium as a runtime instead of a script.