- Web scraping is the process of extracting content from a website. Since a web page is ultimately HTML code, a web scraper parses this underlying HTML and extracts values from it. Let's set things up to begin with. I create a folder named webscraper under the folder in which I …
- “Web scraping is a computer software technique of extracting information from websites.” “Web scraping focuses on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet.”
Go is a programming language built to resemble a simplified version of the C programming language. It compiles to native machine code. Go was created at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson.
In this article, we’re going to look at how to mock HTTP connections when running the tests of a Go application.
We don't want to call a real API or crawl an actual site just to check that our application works correctly; instead, we want to verify that it behaves correctly against responses we define ourselves.
There's a great module called httpmock that helps with mocking HTTP responses in tests.
HTTP mocks for web scraping
Let's say we have a component in our application that does some web scraping; for that we might use something like …
In the example below, we'll use a simple function that visits a website and extracts the content of the <title> tag.
If we want to write a unit test for that, we can do it as follows:
In the test, we run the function and compare the title we expect with the title the function scraped.
The problem with this test is that whenever we run go test, it actually goes to my website and reads the title. This means three things:
- Our tests will be slower and more error-prone than they need to be
- I can never change my website title without also changing the tests for this project
- Most importantly: we have introduced a dependency outside our control that has nothing to do with the program itself
To fix this, we commonly use mocks: a way of faking HTTP responses so that the exchange of information actually happens on the machine running the tests, without relying on an external web server or API backend being available.
HTTP mocks for API requests
In Go, we can use httpmock to intercept the HTTP requests our code makes and pin their responses in our tests. This way we can verify that our program works correctly without actually sending any requests over the network.
To install httpmock, we can add a … and then run go mod download.
scrape_test.go would look like this:
After that, we can run go test, and it should produce the following output:
Let's go over the most important changes to the file:
- myMockPage := ... sets up our example response: a piece of plain text that our function parses as HTML to look for the title
- httpmock.Activate() activates mocking; before this call, no requests are intercepted
- httpmock.RegisterResponder() defines the method (GET, POST, ...) and the URL at which we fake an HTTP response
- httpmock.NewStringResponder() takes a status code and a string to respond with instead of whatever actually lives at that URL
- httpmock.DeactivateAndReset() stops mocking responses for the rest of the test and clears the registered responders
If you want to mock an API response instead, you can use something like this:
That's it! Our client consuming the string should take care of the JSON parsing.
I hope you enjoyed this little post about mocking in Go. Let me know what you're building in the comments!
Thank you for reading! If you have any comments, additions or questions, please leave them in the form below! You can also tweet them at me.
If you want to read more like this, follow me on Feedly or other RSS readers.