Streamlining Web Scraping with Robocorp: Examples and Tags

Naveen Kumar Ravi
3 min read · Mar 11, 2023


Photo by Alex Knight on Unsplash

Web scraping is the process of extracting data from websites. It has become an essential part of data mining, research, and analysis. With the ever-growing amount of data available on the internet, web scraping has become a necessary tool for businesses and researchers alike. However, web scraping can be time-consuming, especially when done manually. This is where Robocorp comes in handy. In this article, we will discuss how to automate web scraping using Robocorp, with examples and tags.

What is Robocorp?

Robocorp is an open-source platform for building and running automation robots. It provides a suite of tools and frameworks for automating various tasks, including web scraping. Robocorp allows users to create and deploy software robots that can perform repetitive tasks such as data extraction, data processing, and data analysis. Robocorp is built on top of the Robot Framework, which is a popular open-source framework for automating software testing.

Steps for Automating Web Scraping using Robocorp:

Step 1: Install Robocorp:

To get started with Robocorp, you will need to install it on your computer. You can download the latest version of Robocorp from their website (https://robocorp.com/).
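If you prefer to skip Robocorp's managed environments and run plain Robot Framework locally, the same libraries can also be installed with pip (these are the package names published on PyPI; for Chrome automation you will still need a matching ChromeDriver on your PATH):

pip install robotframework robotframework-seleniumlibrary robotframework-requests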

Step 2: Create a New Robot:

Once you have installed Robocorp, you can create a new robot. To do this, open the Robocorp Workforce application and click on the “New Robot” button. This will create a new robot project with the necessary files and folders.
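A robot project created from Robocorp's standard template contains, among other things, a robot.yaml file that tells Robocorp how to run the robot and which environment file to use. The exact contents depend on the template you pick, but it looks roughly like this (a sketch; the task name and file names are placeholders to adjust for your project):

tasks:
  Run all tasks:
    shell: python -m robot --outputdir output tasks.robot

condaConfigFile: conda.yaml
artifactsDir: output
PATH:
  - .
PYTHONPATH:
  - .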

Step 3: Add Libraries:

To automate web scraping using Robocorp, we need to add two libraries to our project:

  1. SeleniumLibrary: This library allows us to automate web browsers such as Chrome and Firefox. SeleniumLibrary is built on top of Selenium, a popular web browser automation framework.
  2. RequestsLibrary: This library allows us to send HTTP requests to web servers and retrieve the responses. RequestsLibrary is built on top of Requests, a popular Python library for sending HTTP requests. (A short sketch of using it follows the dependency list below.)

In a Robocorp project, these dependencies are declared in the robot's conda.yaml environment file:

dependencies:
- python=3.7
- pip
- robotframework
- robotframework-seleniumlibrary
- robotframework-requests
- chromedriver
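The example in the next step only uses SeleniumLibrary, but RequestsLibrary is handy for scraping pages that don't need a real browser. A minimal sketch, assuming a recent robotframework-requests release (0.8+, which provides the GET On Session keyword; older versions use Get Request instead):

*** Settings ***
Library    RequestsLibrary

*** Test Cases ***
Get Page Without A Browser
    Create Session    example    https://www.example.com
    ${response}=    GET On Session    example    /
    Status Should Be    200    ${response}
    Log    ${response.text}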

Step 4: Create a New Task:

Now that we have added the necessary libraries to our project, we can create a new task for web scraping. To do this, we need to create a new file in our project folder with the extension “.robot”. In this file, we can define our web scraping task using the Robot Framework syntax.

Here’s an example of a simple web scraping task that retrieves the title of a webpage:

*** Settings ***
Library           SeleniumLibrary
Library           RequestsLibrary

*** Variables ***
${url}            https://www.example.com

*** Test Cases ***
Get Page Title
    Open Browser    ${url}    Chrome
    ${title}=       Get Title
    Log             ${title}
    Close Browser

In this example, we first import the necessary libraries using the “Library” setting. We then define a variable ${url} with the URL of the webpage we want to scrape. In the “Test Cases” section, we open the webpage using the “Open Browser” keyword, retrieve the page title using the “Get Title” keyword, and write the title to the execution log using the “Log” keyword. Finally, we close the browser using the “Close Browser” keyword. Note that Robot Framework separates keywords from their arguments with two or more spaces, which is why the example is spaced the way it is.
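The same pattern extends to scraping actual page content rather than just the title. SeleniumLibrary provides keywords such as “Get Text” that take an element locator; here is a sketch that reuses the Settings and Variables from the example above (the css:h1 locator is only an illustration and depends on the page you are scraping):

*** Test Cases ***
Get First Heading
    Open Browser    ${url}    Chrome
    ${heading}=    Get Text    css:h1
    Log    ${heading}
    Close Browser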

Step 5: Run the Task:

To run the web scraping task, we need to open the terminal or command prompt and navigate to the project folder. We can then run the task using the “robot” command followed by the name of the task file.

robot my_web_scraping_task.robot

This will execute the task and output the result to the console.
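If you created the project with Robocorp's tooling, it can also be run through RCC, which first builds the Python environment described in conda.yaml and then executes the task defined in robot.yaml (assuming the rcc command-line tool is installed and on your PATH):

rcc run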

Step 6: Add Tags:

Tags are a useful feature of the Robot Framework that allow us to categorize and organize our test cases. We can add tags to our web scraping task using the [Tags] setting inside the test case.

*** Test Cases ***
Get Page Title
    [Tags]    Web Scraping
    Open Browser    ${url}    Chrome
    ${title}=       Get Title
    Log             ${title}
    Close Browser

In this example, we added the tag “Web Scraping” to our test case using the [Tags] setting. We can then use this tag to run all our web scraping tasks together.

robot -i "Web Scraping" my_robot_project

This command will run all test cases tagged “Web Scraping”. The tag is quoted so the shell passes it as a single argument; Robot Framework itself ignores case and spaces when matching tags, so -i WebScraping would also work.
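The same mechanism works in reverse: the --exclude (-e) option skips matching test cases, which is handy when you want to run everything except the scraping tasks.

robot -e "Web Scraping" my_robot_project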

Conclusion:

Web scraping is a powerful tool for extracting data from websites. With the help of Robocorp, we can automate the web scraping process and save time and effort. In this article, we discussed how to automate web scraping using Robocorp, with examples and tags. By following these steps, you can create your own web scraping tasks and automate the data extraction process.
