Web Scraping With Golang

  1. Web scraping is the process of extracting content from a website. Since any website is ultimately HTML code, a web scraper parses this underlying HTML and extracts values from it. Let's set things up to begin with. I create a folder named webscraper under the folder in which I.
  2. “Web scraping is a computer software technique of extracting information from websites.” “Web scraping focuses on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet.”

Go is a programming language built to resemble a simplified version of the C programming language. It compiles to native machine code. Go was designed at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson.

In this article we're going to look at how to mock HTTP connections while running the tests for your Go application.

We do not want to call the real API or crawl an actual site every time we check that our application works correctly. Instead, we want to verify that it behaves as expected under conditions we define ourselves.

There's a great module called httpmock (github.com/jarcoal/httpmock) that can help us with the task of mocking HTTP responses in tests.

HTTP mocks for web scraping

Let's say we have a component in our application that does some web scraping, for which we might use something like goquery.

In the example below we'll use a simple function that visits a website and extracts the content of the <title> tag.

filename: scrape.go

Now if we are to write a unit test for that, we can do that as follows:

filename: scrape_test.go

In the test we run the function and compare the title we expect with the title that was scraped by the function.

Now the problem with this test is that whenever we run go test, it will actually go to my website and read the title. This means three things:

  1. Our tests will be slower and more error-prone than they could be
  2. I can never change my website title without changing the tests for this project
  3. Most important: We introduced a dependency outside our control that has no relation to our program

To fix this we commonly use mocks: a way of faking HTTP responses so that the whole exchange of information happens on the computer where the tests are run, without relying on an external web server or API backend being available.

HTTP mocks for API requests

In Go we can use httpmock to intercept HTTP requests made through the default transport and pin the responses in our tests. This way we can verify that our program works correctly without actually sending requests over the network.

To install httpmock we can add a go.mod file that lists it as a dependency and then run go mod download.
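
A go.mod along these lines would declare the dependency. The module path example.com/webscraper is a placeholder, and the Go version and dependency versions are assumptions. goquery is listed on the assumption that the scraping function uses it:

```
module example.com/webscraper

go 1.21

require (
	github.com/PuerkitoBio/goquery v1.8.1
	github.com/jarcoal/httpmock v1.3.1
)
```

If the go command reports that go.mod needs updating, running go mod tidy should add the indirect dependencies and checksums for us.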

Rewriting our scrape_test.go would look like this:

After that we can run go test, and it should pass without contacting the real website.

Let's go over the most important changes to the file:

  • myMockPage :=... sets up our example response, a piece of plain text that our function will parse as an HTML document and search for the title
  • httpmock.Activate() activates the mocking by replacing Go's default HTTP transport; before this call no requests are intercepted
  • httpmock.RegisterResponder() defines the method and the URL, so GET or POST and an address, for which we fake an HTTP response
  • httpmock.NewStringResponder() takes a status code and a string to respond with instead of what actually lives at that URL
  • httpmock.DeactivateAndReset() stops the mocking and removes the registered responders; it is commonly deferred right after Activate() so that it runs when the test returns

If you instead want to mock an API response you can use something like this:

That's it! Our client consuming the string should take care of the JSON parsing.

If you're familiar with mocking HTTP connections in Node.js you may have heard of the nock library, which is pretty popular when building JavaScript projects.

Hope you enjoyed this little post about mocking in Go, let me know what you're building in the comments!

Thank you for reading! If you have any comments, additions or questions, please leave them in the form below! You can also tweet them at me.

If you want to read more like this, follow me on Feedly or other RSS readers.