Tuesday, April 26, 2022

Scrape Web Data Excel VBA Macro (Part 1)

In this post we learn how to send a HTTP request to scrape and import web data into Excel using VBA macros. As an example, we are getting data form a jobs portal, which contains public data that anyone can see on the browser with no login needed. The process consists of two steps. In this post, we will see the first step, which consists of sending a HTTP request to get the web page contents as text. In the next post we will see how to process the HTTP response into the body of a HTML document in order to target the data of interest.


Macro/VBA code:

 
  Sub SendHTTPrequest()
 
      'Add Microsoft XML v6.0 Library to references first!
      Dim HTTPreq As Object, url As String, response As String
      Set HTTPreq = CreateObject("MSXML2.XMLHTTP")
   
      url = "https://uk.indeed.com/jobs-in-London"
 
      'Send HTTP request to get the data
      With HTTPreq
          .Open "GET", url, False
          .send
      End With
 
      response = HTTPreq.responseText
      Debug.Print response  'Shows in Immediate Window
 
      'Call ProcessHTTPresponse(response)  'next macro (Part 2)
 
  End Sub
 

 

Macro explained:

  • First of all, we need to add the Microsoft XML v6.0 library to the references in the VBA editor under Tools->References. This library allows to send and work with the XML HTTP request to GET the data as text. Later, we will put that into a HTML document (Part 2). As we will see in the next post, we need to add the Microsoft HTML Object Library to run that macro and process the HTTP response.
  • We declare HTTPreq as an object and later create and assign an XML object to it (late binding). We could also do it in one line (early binding) depending on the library/version. We declare two other variables to store the web page URL and the HTTP response as text.
  • The url variable stores the target web page URL. In this case, that’s the web address of the jobs portal selecting a particular city where we want to find jobs. The first page contains a list of 15 jobs only. Scrapping a web site with multiple pages requires to loop through the different URLs for each page, sending a request each time.
  • Then we open an HTTP connection request using three arguments: The request method, in this case is a GET request, the target web page URL, and a third parameter indicating whether the call is asynchronous (False). After that, we send the request.
  • We assign the HTTP response as text to a variable. We can visualize the contents of that variable in the immediate window using Debug.Print. That’s a bunch of text with the HTML document of the web page. For static pages, it will probably contain all the information of interested. However, we need to loop through the text and target the elements we want to extract.
  • Processing the response will be cover in the next post (Part 2). That consists of adding the response to the body of an HTML element, and then targeting the HTML elements containing the info of interested via the element id, tag, or class.

 

This is how we send HTTP request to scrape web data in Excel with VBA macros.

 

Other examples:

 

No comments:

Post a Comment

Popular Posts