Selenium Web Scraping

This page explains how to do web scraping with Selenium IDE commands. Web scraping works only if the data is inside the HTML of a website. If you want to extract data from a PDF, image, or video, you need to use visual screen scraping instead.
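Here is a rough illustration of what "data inside the HTML" means. It is a minimal Python sketch that uses only the standard library. It is not Selenium IDE and not how the commands below are implemented. It collects the visible text of an HTML string and skips `<head>`, `<script>` and `<style>`. The sample page is hypothetical. Whatever is drawn inside `chart.png` never appears in the markup, so no HTML-based method can read it. That is the case for visual screen scraping.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"head", "script", "style"}  # content that is not rendered as page text


class VisibleTextParser(HTMLParser):
    """Collects non-empty text nodes outside <head>, <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample page: a price in the HTML, a chart only as an image file.
page = """
<html><head><title>Shop</title><style>p { color: red }</style></head>
<body>
  <p class="price">$19.99</p>
  <img src="chart.png" alt="Sales chart">
  <script>var hidden = "not shown";</script>
</body></html>
"""

print(visible_text(page))  # ['$19.99']
```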

When to use what command?

The table below shows the best command for each type of data extraction. The documentation page of each recommended command has more information and example code.

Data to extract | Command to use | Comment
Visible website text, for example text in a table just like this one, or a price on a website | storeText |
Text in input fields (input box, text area, select drop-down, ...) | storeValue | Do not confuse this command with storeEval, which is not for web scraping.
Status of a checkbox or radio button | storeChecked |
URL 'behind' an image | storeAttribute (locator@href) | storeAttribute with @href extracts the link of any element, if it has one! If that fails, consider browser automation to copy the link to the ${!clipboard} variable.
ALT text 'behind' an image | storeAttribute (locator@alt) | The storeAttribute command can get any attribute the HTML element has. For example, use @alt to get the 'Alt' text of an image.
Page title | storeTitle |
Table content: row/column/cell | storeText with an XPath locator | See TABLE Web Scraping or automate browser addon
Data from a list, e.g. search results | Loop over storeText | See How to web scrape search results
Save complete web page source code | XType ${KEY_CTRL+KEY_S}* | On Mac it is ${KEY_CMD+KEY_S}.
Save complete web page with images | XType ${KEY_CTRL+KEY_S}* | See Forum post: How to save the entire HTML code
Take screenshot of website | captureEntirePageScreenshot* | This saves the complete website as an image.
Take screenshot of a web page element | storeImage* | This is an easy way to extract images. The other option is to download them.
Text found only in the website source code | sourceExtract* | e.g. Google Analytics ID. For text inside page comments or JavaScript, this is the only option.
PDF, image, video, canvas | OCRExtractRelative* | This screen scraping command works everywhere because it works visually. The disadvantage is that it is slower than pure HTML-based commands like storeText.
Text from outside the web page | OCRExtractRelative* | For example, if you want to extract data from a browser extension or a desktop app.
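The commands in this table run inside Selenium IDE, not in Python. For readers who think in code, here is a hedged, standard-library-only Python sketch. It applies to a static, hypothetical page and shows what each of these roughly returns:

  • storeTitle
  • storeAttribute (locator@href, locator@alt)
  • storeText on a table cell
  • "loop over storeText"

It uses `xml.etree.ElementTree` and its limited XPath subset, so it only works on well-formed markup. Real-world HTML usually needs a proper HTML parser or a live browser such as Selenium WebDriver.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree cannot parse sloppy real-world HTML).
PAGE = """
<html>
  <head><title>Search results</title></head>
  <body>
    <a href="https://example.com/item/1"><img src="item1.png" alt="Blue mug"/></a>
    <table id="prices">
      <tr><th>Item</th><th>Price</th></tr>
      <tr><td>Blue mug</td><td>9.50</td></tr>
      <tr><td>Red mug</td><td>11.00</td></tr>
    </table>
    <ul class="results">
      <li>First result</li>
      <li>Second result</li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# Like storeTitle: text of the <title> element
title = root.findtext("./head/title")

# Like storeAttribute with locator@href / locator@alt: read attributes of an element
link = root.find(".//a[@href]")
href = link.get("href")
alt = link.find("img").get("alt")

# Like storeText with an XPath locator: second <tr> (first data row), second <td>
rows = root.findall(".//table[@id='prices']/tr")
cell = rows[1].findall("td")[1].text

# Like "loop over storeText": collect the text of every list item
results = [li.text for li in root.findall(".//ul[@class='results']/li")]

print(title, href, alt, cell, results)
# Search results https://example.com/item/1 Blue mug 9.50 ['First result', 'Second result']
```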

(*) These commands are only available in the UI.Vision RPA Selenium IDE. They are not part of the classic Selenium IDE.
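Of the starred commands, sourceExtract has the most direct plain-code analogue. It searches the raw page source instead of the rendered text, so it can find values in HTML comments or JavaScript. A minimal Python sketch of the same idea runs a regular expression over the source string. The sample source and the Universal Analytics ID pattern (UA-digits-digits) are assumptions for illustration only. They are not the syntax that sourceExtract itself accepts.

```python
import re

# Hypothetical page source: the ID appears only in a comment and inside <script>,
# so it is not part of the visible page text.
SOURCE = """
<html><head>
<!-- analytics: UA-1234567-1 -->
<script>
  ga('create', 'UA-1234567-1', 'auto');
</script>
</head><body><p>Welcome</p></body></html>
"""

# Assumed pattern for a Universal Analytics property ID: "UA-", account digits, "-", property digits.
GA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def source_extract(source: str, pattern: re.Pattern) -> list[str]:
    """Return every distinct match in the raw source, in order of first appearance."""
    return list(dict.fromkeys(pattern.findall(source)))


print(source_extract(SOURCE, GA_ID_PATTERN))  # ['UA-1234567-1']
```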

See also

  • Screen scraping (scraping/data extraction with computer vision, OCR)
  • Form filling with Selenium IDE (the opposite of web scraping)
  • File uploads with Selenium IDE
  • Best Selenium IDE Locator Strategy
  • RPA Software User Manual

Anything wrong or missing on this page? Suggestions?

...then please contact us.
