Selenium I: automating multi-page workflows with Ruby and Selenium
The problem
I had a repetitive, time-sensitive workflow that required switching between two browser pages, waiting for specific signals, and triggering actions at precise moments. Doing it manually meant constant context-switching, missed timing windows, and data entry errors. A linear Selenium script was the obvious first step, but linear scripts break the first time a page loads 500 ms slower than expected.
This article covers the first iteration: a ~700-line Ruby script that hardened the workflow with explicit waits, DOM-driven state detection, layered retry logic, and modular page objects.
Entry point: a tight claim-submit loop
The script is a continuous loop. Each iteration claims a task from the primary dashboard, drives it through its state transitions, and submits it. There is no batching; every task is handled end-to-end before the next one is claimed.
#!/usr/bin/env ruby
require 'json'
require 'date'
require_relative 'extension'
require_relative 'project'
require_relative 'session'

VERSION = '3.4'

Session.clear_log
Session.log('INIT', '~', "AutoWorkflow v#{VERSION}")
Session.log('INIT', '~', 'https://github.com/cetio')

start_driver(false)
Extension.cleanup!()
project = Project.new()

# Log anything that escapes the loop, then clean up on every exit path.
at_exit do
  err = $!
  if err && !err.is_a?(SystemExit)
    Session.log('INIT', '!', "Unhandled: #{err.class}: #{err.message}", error: err)
  end
  Extension.cleanup!()
end

# Claim one task, drive it to submission, repeat.
loop do
  tracker = project.claim_task()
  tracker.await_submission()
end
Explicit waits with scroll-and-retry
The core interaction primitive is an oriented_click: scroll the element into view, check it is displayed and enabled, try a native click, and fall back to a JS click if the native one fails. It returns a boolean so the caller can decide whether to retry or abort. oriented_send_keys follows the same pattern for text input.
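# Click an element (passed directly, or located via keywords such as xpath: '...'),
# retrying until `timeout` seconds have elapsed. Returns true only if a click landed.
def oriented_click(value = nil, timeout: 1, interval: 0.3, **locator)
  locator = locator.empty? ? nil : locator
  start = Time.now
  while (Time.now - start) < timeout
    begin
      # Re-locate on every attempt so a re-render cannot leave a stale reference.
      el = locator ? $driver.find_element(locator) : value
      $driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'})",
        el) rescue nil
      if el.displayed? && el.enabled?
        begin
          el.click
        rescue StandardError
          # Native click failed (for example, intercepted): fall back to a JS click.
          $driver.execute_script('arguments[0].click();', el)
        end
        return true
      end
    rescue StandardError
      # Missing, stale, or not interactable yet: wait and try again.
    end
    sleep interval
  end
  false
end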
# Same pattern as oriented_click, for text input. Returns true only if the text was set.
def oriented_send_keys(keys, value = nil, timeout: 2, interval: 0.3, **locator)
  locator = locator.empty? ? nil : locator
  start = Time.now
  while (Time.now - start) < timeout
    begin
      el = locator ? $driver.find_element(locator) : value
      $driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'})",
        el) rescue nil
      if el.displayed? && el.enabled?
        el.clear rescue nil
        begin
          el.send_keys(keys)
        rescue StandardError
          # Fall back to setting the value directly; this does not fire the
          # key events that some pages listen for.
          $driver.execute_script('arguments[0].value = arguments[1];', el, keys.to_s)
        end
        return true
      end
    rescue StandardError
      # Missing, stale, or not interactable yet: wait and try again.
    end
    sleep interval
  end
  false
end
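The retry loop shared by both helpers can be isolated as a generic polling function. The following is a minimal sketch rather than code from the script: `poll_until` is a hypothetical name, it uses the monotonic clock instead of `Time.now`, and unlike the helpers above it always makes at least one attempt.

```ruby
# Hypothetical helper: runs the block every `interval` seconds until it returns
# a truthy value or `timeout` seconds have elapsed. Exceptions count as "not yet".
def poll_until(timeout: 1, interval: 0.3)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = begin
      yield
    rescue StandardError
      nil
    end
    return result if result
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Stand-in for a page element that only becomes clickable on the third look.
attempts = 0
clicked = poll_until(timeout: 1, interval: 0.01) do
  attempts += 1
  raise 'element not rendered yet' if attempts < 3
  :clicked
end
puts "clicked=#{clicked.inspect} after #{attempts} attempts"
```

Treating an exception as "not ready yet" is what lets the caller ignore the difference between a missing element, a stale reference, and a covered button: all three resolve the same way, by looking again.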
Detect state from the DOM, not from memory
A task page has a small set of observable states. Instead of tracking state in variables, the script re-reads the DOM every time it needs to decide what to do. This makes it resilient to race conditions, refreshes, and unexpected dialogs.
module Controller
  module TaskPage
    CONTINUE_BUTTON = "//button[@aria-label='Continue']"
    OPEN_EXTERNAL   = "//button[contains(text(), 'Open external tool')]"
    TIMER_BUTTONS   = [
      "//button[contains(text(), 'Start timer')]",
      "//button[contains(text(), 'Start task')]",
      "//button[contains(text(), 'Resume task')]"
    ].freeze
    SUBMIT_TASK     = "//button[contains(text(), 'Submit task')]"
    CONFIRM_TIME    = "//button[contains(text(), 'Confirm time')]"
    EXIT_BUTTON     = "//button[contains(text(), 'Exit')]"
    SKIP_BUTTON     = "//button[contains(text(), 'Skip')]"
    NEXT_TASK       = "//button[contains(text(), 'Next task')]"
    NUMERIC_INPUT   = "//input[@type='number']"
    TEXT_INPUT      = "//textarea | //input[not(@type) or @type='text']"

    # Poll the DOM until the page matches a known state. Checks run in
    # priority order, so an error page wins over anything else on screen.
    def state!
      loop do
        return :error if safe_find_elements(
          xpath: "//h2[contains(text(), 'Application error')]").any?
        return :continue if safe_find_elements(xpath: CONTINUE_BUTTON).any?
        return :external if safe_find_elements(xpath: OPEN_EXTERNAL).any?
        TIMER_BUTTONS.each do |str|
          return :timer if safe_find_elements(xpath: str).any?
        end
        return :submit if safe_find_elements(xpath: SUBMIT_TASK).any?
        return :nav if safe_find_elements(xpath: NEXT_TASK).any?
        return :confirm if safe_find_elements(xpath: CONFIRM_TIME).any?
        return :dialog if Controller.dialog_present?
        # A button whose label contains six digits and a hyphen marks an open task.
        safe_find_elements(xpath: "//button").each do |btn|
          return :task if /\d{6}-/.match?(btn.text) rescue nil
        end
        sleep 0.1
      end
    end
  end
end
The state! method loops until it can definitively categorize the page. Each state maps to a concrete DOM indicator: a button with specific text, an error heading, a visible dialog. There is no guessing.
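One way to see why the order of checks matters is to reduce the method to a priority table. This is a sketch under stated assumptions, not the script's code: `INDICATORS` and `detect_state` are names introduced here for illustration, and plain strings stand in for the XPath lookups against a live page.

```ruby
# Hypothetical priority table: the first matching indicator wins, mirroring
# the order of the early checks in state!.
INDICATORS = [
  [:error,    'Application error'],
  [:continue, 'Continue'],
  [:external, 'Open external tool'],
  [:timer,    'Start timer'],
  [:submit,   'Submit task'],
  [:nav,      'Next task'],
  [:confirm,  'Confirm time']
].freeze

# `visible` is an array of strings currently shown on the (fake) page.
def detect_state(visible)
  INDICATORS.each do |state, label|
    return state if visible.any? { |text| text.include?(label) }
  end
  nil # nothing recognisable yet; the real loop sleeps and looks again
end

p detect_state(['Submit task', 'Application error'])
p detect_state(['Next task', 'Submit task'])
```

When two indicators are on screen at once, the one listed earlier decides the state, which is why the error heading sits at the top: a crashed page may still show stale buttons from the previous render.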
Drive through transient states with begin_task!
Before the actual work starts, the page often bounces through a sequence of dialogs, timers, and continue buttons. begin_task! drives through all of them in order, returning only when the task is truly ready for input.
def begin_task!(require_dialog = false)
  seen_dialog = false
  loop do
    sleep 0.3 if require_dialog
    case state!
    when :external
      seen_dialog = true
      # Snapshot the handles so the newly opened window can be identified.
      shadow = safe_window_handles.dup
      oriented_click(xpath: OPEN_EXTERNAL)
      WAIT.until { (safe_window_handles - shadow).any? }
      new_window = (safe_window_handles - shadow).first
      with_window(new_window) { $driver.close() }
    when :timer
      seen_dialog = true
      TIMER_BUTTONS.each do |str|
        next unless safe_find_elements(xpath: str).any?
        oriented_click(xpath: str)
        sleep 0.3
      end
    when :dialog
      Controller.close_dialog!()
      sleep 0.3
    when :error
      $driver.navigate.refresh()
      sleep 0.3
    when :continue
      continue!()
      sleep 0.3
    else
      # Steady state. When a timer or external prompt is required, keep
      # polling until one has been seen.
      return if !require_dialog || seen_dialog
    end
  end
end
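The control flow of begin_task! is easier to follow with the browser removed. The sketch below replays a scripted sequence of states through the same decision rule; `drive_until_ready`, `TRANSIENT`, and `PROMPTS` are hypothetical names, and the real method clicks, closes, or refreshes where this one only records the state.

```ruby
# States that begin_task! acts on and then re-polls after.
TRANSIENT = %i[external timer dialog error continue].freeze
# The subset that sets seen_dialog in the original method.
PROMPTS = %i[external timer].freeze

# `states` is what state! would return on successive reads of the page.
def drive_until_ready(states, require_dialog: false)
  seen_dialog = false
  handled = []
  states.each do |state|
    if TRANSIENT.include?(state)
      seen_dialog ||= PROMPTS.include?(state)
      handled << state
    elsif !require_dialog || seen_dialog
      return { ready_at: state, handled: handled }
    end
    # Otherwise a steady state arrived before the required prompt: keep polling.
  end
  nil # the script ran out before the page was ready
end

p drive_until_ready(%i[dialog continue timer task])
p drive_until_ready(%i[task timer task], require_dialog: true)
```

The second call shows what require_dialog is for: the first :task reading is ignored because no timer or external prompt has appeared yet, so the method cannot return on a page that merely looks ready.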
Dialog handling as a shared primitive
Dialogs appear on both the primary and secondary pages. A shared controller module detects them, tries an acknowledge button first (if auto-acknowledge is enabled), and falls back to a close button.
module Controller
  DIALOG       = '//div[@role="dialog"]'
  CLOSE_DIALOG = "//button[@aria-label='Close']"
  ACKNOWLEDGE  = "//button[contains(text(), 'Acknowledge')]"

  module_function

  def dialog_present?
    safe_find_elements(xpath: DIALOG).any?
  end

  # Prefer the acknowledge button; fall back to the close button.
  def close_dialog!()
    return true if acknowledge!()
    oriented_click(xpath: CLOSE_DIALOG)
  end

  # Returns a falsy value when there is nothing to acknowledge or
  # auto-acknowledge is switched off, so close_dialog! falls through.
  def acknowledge!()
    return false unless safe_find_elements(xpath: ACKNOWLEDGE).any?
    oriented_click(xpath: ACKNOWLEDGE) if Extension.auto_acknowledge?
  end
end
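The fallback order can be written out as a small decision function. This is an illustrative sketch only: `dialog_action` is a hypothetical name, and an array of button labels stands in for the page.

```ruby
# `buttons` lists the button labels currently on screen. Returns the control
# that would be clicked, or nil if the dialog offers nothing to click.
def dialog_action(buttons, auto_acknowledge:)
  return :acknowledge if auto_acknowledge && buttons.include?('Acknowledge')
  buttons.include?('Close') ? :close : nil
end

p dialog_action(%w[Acknowledge Close], auto_acknowledge: true)
p dialog_action(%w[Acknowledge Close], auto_acknowledge: false)
```

With auto-acknowledge switched off, an acknowledge button is deliberately left alone and the dialog is closed instead.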
Claim tasks from a dashboard
The Project class manages the primary dashboard window. It navigates to the task list, waits for the task page to load, and retries through transient errors and dialogs.
class Project
  include Controller::Project
  attr_accessor :handle, :uuid

  def initialize()
    @handle = open_or_nav('https://platform.example.com/contractor/tasks')
    $driver.switch_to.window(@handle)
    Session.log('PROJECT', '~', 'Awaiting task page...')
    until $driver.current_url.include?('/tasks')
      sleep 0.3
    end
    # String#[] with a Regexp returns the matched UUID as a String (or nil),
    # rather than a MatchData object.
    @uuid = $driver.current_url[
      /\b[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\b/]
  end
end
module Controller
  module Project
    CONTINUE_TASK   = "//button[contains(text(), 'Continue task')]"
    START_NEW_TASK  = "//button[contains(text(), 'Start new task')]"
    INBOX_LIST      = "//div[@data-testid='task-inbox']"
    AVAILABLE_TASKS = "//button[contains(text(), 'Available tasks')]"

    def claim_task()
      navigate_to_tasks()
      Session.log('PROJECT', '~', 'Claiming from dashboard')
      counter = 0 # consecutive polls with no sign of progress
      until $driver.current_url.include?('/workspace')
        sleep 0.5
        if Controller.dialog_present?
          Session.log('PROJECT', '~', 'Awaiting dialog dismissal...')
          await_dialog_dismissal!()
        end
        counter += 1
        if safe_find_elements(xpath: CONTINUE_TASK).any?
          counter = 0
          oriented_click(xpath: CONTINUE_TASK)
          break if await_workspace_page()
        elsif safe_find_elements(xpath: START_NEW_TASK).any?
          counter = 0
          oriented_click(xpath: START_NEW_TASK)
          break if await_workspace_page()
        elsif safe_find_elements(css: '.animate-spin').any?
          # A spinner means the page is still loading, not stuck.
          counter = 0
        elsif counter >= 10
          # About five seconds with nothing to act on: reload the dashboard.
          $driver.navigate.refresh()
          counter = 0
        end
      end
      tracker = ::Tracker.new(self)
      Session.log('PROJECT', '+', "Claimed task #{tracker.uuid}")
      tracker
    end

    def navigate_to_tasks()
      url = "https://platform.example.com/contractor/#{@uuid}/tasks"
      $driver.switch_to.window(@handle)
      with_retry { $driver.navigate.to(url) }
    rescue Selenium::WebDriver::Error::NoSuchWindowError
      # The dashboard window is gone. window.open gives no usable handle back
      # through execute_script, so move to a live window, open the URL from
      # there, and pick out the handle that was not in the list before.
      before = safe_window_handles
      $driver.switch_to.window(before.first)
      $driver.execute_script('window.open(arguments[0])', url)
      WAIT.until { (safe_window_handles - before).any? }
      @handle = (safe_window_handles - before).first
      $driver.switch_to.window(@handle)
    end

    # Wait up to about five seconds for the workspace URL to appear.
    def await_workspace_page()
      counter = 0
      until $driver.current_url.include?('/workspace')
        sleep 0.5
        return false if (counter += 1) >= 10
      end
      true
    end
  end
end
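The counter in claim_task works as a stall detector: any sign of progress resets it, and ten idle polls in a row trigger a refresh. The sketch below replays that rule over a list of observations. `refreshes_for` and the observation symbols are hypothetical, introduced only to show the counting.

```ruby
# Each observation is what one poll of the dashboard saw:
# :button (something to click), :spinner (still loading), or :nothing.
def refreshes_for(observations, limit: 10)
  idle = 0
  refreshes = 0
  observations.each do |seen|
    idle += 1
    if %i[button spinner].include?(seen)
      idle = 0
    elsif idle >= limit
      refreshes += 1
      idle = 0
    end
  end
  refreshes
end

p refreshes_for([:nothing] * 10)
p refreshes_for([:nothing] * 9 + [:spinner] + [:nothing] * 9)
```

At one poll every 0.5 seconds, a limit of 10 means the page gets roughly five seconds of silence before it is reloaded, and a single spinner sighting restarts that clock.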
Defensive error handling and retries
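Network flakiness, stale element references, and transient UI states are not edge cases; they are the norm. with_retry wraps driver calls in exponential backoff so a brief disconnect does not kill the session, and the safe_* helpers route element lookups and window-handle reads through it. Because its rescue list includes Selenium::WebDriver::Error::WebDriverError, the base class of Selenium's errors, it retries any Selenium failure, not just connection errors.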
MAX_RETRIES = 5
BASE_RETRY_DELAY = 1.0

# Run the block, retrying with exponential backoff when the driver or the
# connection to it fails. Re-raises once max_retries is exhausted.
def with_retry(max_retries = MAX_RETRIES, base_delay = BASE_RETRY_DELAY)
  retries = 0
  begin
    yield
  # WebDriverError is the base class of Selenium's errors, so it already
  # covers the two subclasses listed after it.
  rescue Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ETIMEDOUT,
         Net::ReadTimeout, Timeout::Error,
         Selenium::WebDriver::Error::WebDriverError,
         Selenium::WebDriver::Error::NoSuchDriverError,
         Selenium::WebDriver::Error::SessionNotCreatedError => e
    retries += 1
    if retries <= max_retries
      delay = base_delay * (2 ** (retries - 1))
      Session.log('HELPERS', '~',
        "Connection error (#{e.class}): retrying #{retries}/#{max_retries} " \
        "after #{delay.round(2)}s...")
      sleep delay
      retry
    else
      Session.log('HELPERS', '!',
        "Max retries exceeded: #{e.class}: #{e.message}", error: e)
      raise
    end
  end
end

def safe_find_elements(locator)
  with_retry { $driver.find_elements(locator) }
end

def safe_find_element(locator)
  with_retry { $driver.find_element(locator) }
end

def safe_window_handles
  handles = with_retry { $driver.window_handles }
  Array(handles)
end
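It helps to write out the delay schedule that the formula in with_retry produces. The sketch below reuses the same expression, base_delay * 2 ** (retries - 1); `backoff_delays` is a hypothetical helper that only computes the numbers and never sleeps.

```ruby
# Delays, in seconds, slept before retry 1, 2, ... max_retries.
def backoff_delays(max_retries = 5, base_delay = 1.0)
  (1..max_retries).map { |attempt| base_delay * (2**(attempt - 1)) }
end

delays = backoff_delays
p delays     # => [1.0, 2.0, 4.0, 8.0, 16.0]
p delays.sum # => 31.0
```

With the defaults, a call that keeps failing sleeps for 31 seconds in total before the error is re-raised. That is a reasonable budget for a dropped connection, and a long one for a failure that will never recover on its own, which is worth keeping in mind when choosing what to wrap.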
Structured logging with optional sound
Every significant action is logged with a UTC timestamp, location tag, and status symbol. Logs go to both stdout and a persisted file. Sound notifications help when the script is running in the background.
require 'fileutils'
require 'rbconfig'
require 'time' # Time#iso8601

module Session
  AUTO_DIR = File.join(Dir.home, 'AutoWorkflow')
  LOG_PATH = File.join(AUTO_DIR, 'output.log')

  module_function

  # Start each run with a fresh log file.
  def clear_log
    FileUtils.mkdir_p(AUTO_DIR)
    File.open(LOG_PATH, 'w') { |f| f.puts('') }
  end

  def log(location, symbol, message, error: nil, sound_path: nil)
    line = "[#{Time.now.utc.iso8601}] [#{location}:#{symbol}] #{message}"
    write_line(line)
    log_error_details(location, error) if error
    play_sound(sound_path) if sound_path && Extension.audio_notify?
  end

  # Echo to stdout and append to the persisted log.
  def write_line(line)
    puts line
    FileUtils.mkdir_p(AUTO_DIR)
    File.open(LOG_PATH, 'a') { |f| f.puts(line) }
  end

  def play_sound(path)
    return unless File.exist?(path)
    case RbConfig::CONFIG['host_os']
    when /darwin/ then spawn_detached('afplay', path)
    else
      # Try the common Linux players in order; stop at the first that launches.
      return if spawn_detached('mpv', '--no-video', '--quiet', path)
      return if spawn_detached('paplay', path)
      spawn_detached('aplay', path)
    end
  end
  # module_function already makes the instance copy private; this hides the
  # module-level copy too while keeping it callable from Session.log.
  private_class_method :play_sound
end
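To make the log format concrete, here is the line-building expression from Session.log pulled out into a standalone function. `format_log_line` and its `now:` parameter are hypothetical additions so the timestamp can be pinned, and the date used is arbitrary.

```ruby
require 'time' # Time#iso8601

# Builds one log line: UTC timestamp, then location and status symbol, then the message.
def format_log_line(location, symbol, message, now: Time.now)
  "[#{now.utc.iso8601}] [#{location}:#{symbol}] #{message}"
end

puts format_log_line('INIT', '~', 'AutoWorkflow v3.4', now: Time.utc(2024, 1, 2, 3, 4, 5))
# prints: [2024-01-02T03:04:05Z] [INIT:~] AutoWorkflow v3.4
```

Keeping the location and symbol in a fixed bracketed position means the log can be filtered with a plain text search, for example on `:!]` to list only failures.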
What this iteration proved
- End-to-end automation is roughly 30% happy-path code and 70% handling edge cases, timing, and observability.
- Re-reading the DOM for state on every decision loop eliminates an entire class of race-condition bugs that fixed sleeps can only mask.
- A small state machine with concrete DOM indicators is more maintainable than a linear script, even when there are only nine states.
- Structured logs with timestamps and sound notifications make headless debugging possible.
- Exponential-backoff retry on connection errors keeps the session alive through brief network hiccups.
Continue to Part II for the dispatch-oriented runtime: JS-level request interception, tab queuing, companion extensions, and treating Selenium like a runtime instead of a script.