Scraping Web Pages in LabVIEW

I made a video showing how! Well, I made it several weeks ago…

Watch on YouTube: "Example of Scraping Web Pages in LabVIEW"

The goal of the video is to convey that in many cases you can pull data from web resources without needing to automate a web browser control.

The video shows how to use the browser's built-in developer tools to discover interesting HTTP requests and then recreate those requests with the LabVIEW HTTP Client VIs.

Workflow

For example, let’s say you want to get the current weather and decide to parse the front page of Weather Underground. The first thing you might try is right-clicking on the weather shown on the page, choosing Inspect, and examining the DOM in the Elements pane.

Using Inspect and the Elements pane to find where the temperature is located in the DOM

Problems Relying on the Elements Pane

  1. The structure of the page may change. Web pages change look and structure all the time, and we probably don’t want to fix our web scraping VI over and over again.

  2. The page might not be very easy to parse. Many pages at best have lots of extraneous information and at worst have invalid / poorly structured HTML.

  3. The Elements pane shows the live DOM tree including manipulations by JavaScript. The data might not be in the source HTML and instead may be gathered programmatically via JavaScript and then added to the DOM.
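
To make the third problem concrete, here is a minimal sketch in Python (standing in for LabVIEW, which is graphical). The page source, the `temp` id, and the `TextById` helper are all made up for illustration. The Elements pane would show 83.3 inside the span, but the HTML that an HTTP request actually receives has an empty placeholder:

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text found inside elements that carry a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # greater than 0 while inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


# Hypothetical page source: the placeholder is empty and a script fills it
# in after the page loads, so only the live DOM ever contains the value.
SOURCE_HTML = """
<html><body>
  <span id="temp"></span>
  <script>document.getElementById("temp").textContent = "83.3";</script>
</body></html>
"""

parser = TextById("temp")
parser.feed(SOURCE_HTML)
source_text = "".join(parser.text).strip()
print(repr(source_text))  # empty: the value is not in the source HTML
```

The depth counter is deliberately simple and does not account for void elements such as `<br>` inside the target, so treat it as a sketch rather than a general HTML extractor.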

Use an API When Available

The best way to prevent issues with relying on DOM structure is to use an API provided by the web service. A programmatic API is usually the intended way to access data.

For example, Weather Underground provides the Weather Underground API, which has online documentation and returns easy-to-parse JSON:

{
  "response": {
    "version": "0.1",
    "termsofService": "http:\/\/www.wunderground.com\/weather\/api\/d\/terms.html",
    "features": {
      "conditions": 1
    }
  },
  "current_observation": {
    "display_location": {
      "full": "Austin, TX"
    },
    "station_id": "KTXAUSTI47",
    "observation_time": "Last Updated on June 16, 10:59 PM CDT",
    "observation_time_rfc822": "Thu, 16 Jun 2016 22:59:14 -0500",
    "weather": "Clear",
    "temperature_string": "83.3 F (28.5 C)",
    "temp_f": 83.3,
    "temp_c": 28.5,
    "relative_humidity": "69%",
    "wind_string": "From the SSE at 2.0 MPH Gusting to 7.0 MPH",
    "wind_dir": "SSE"
  }
}

Example response from Weather Underground API filtered to show relevant content.
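
To show why a response like this is easy to work with, here is a rough Python sketch that pulls a few fields out of a trimmed copy of the response above. The `summarize` helper and the names of the keys it returns are my own choices for illustration; the input field names come from the example response.

```python
import json

# Trimmed copy of the example response shown above.
RESPONSE_TEXT = """
{
  "current_observation": {
    "display_location": {"full": "Austin, TX"},
    "weather": "Clear",
    "temp_f": 83.3,
    "temp_c": 28.5,
    "relative_humidity": "69%"
  }
}
"""


def summarize(response_text):
    """Return the handful of fields we care about as a flat dictionary."""
    observation = json.loads(response_text)["current_observation"]
    return {
        "location": observation["display_location"]["full"],
        "weather": observation["weather"],
        "temp_f": observation["temp_f"],
        # Humidity arrives as a string such as "69%", so strip the sign.
        "humidity_percent": int(observation["relative_humidity"].rstrip("%")),
    }


summary = summarize(RESPONSE_TEXT)
print(summary)
```

No searching through markup is needed: each value is addressed by name, so a cosmetic change to the web page does not affect the parsing.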

After signing up for the API to get our key and using the docs to understand the URL structure, we can create a VI that uses the Weather Underground API:

A LabVIEW VI that calls the Weather Underground API using a zip code and an API Key provided by registration
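
A block diagram does not translate to text, so here is a rough Python sketch of the same idea: build the URL from an API key and a zip code, issue a GET, and read one field from the JSON. The URL pattern, `build_conditions_url`, and `fetch_temp_f` are assumptions for illustration; take the real URL structure from the API documentation.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed URL pattern; confirm it against the API documentation.
URL_TEMPLATE = "http://api.wunderground.com/api/{key}/conditions/q/{zip_code}.json"


def build_conditions_url(api_key, zip_code):
    """Insert the API key and zip code into the (assumed) URL pattern."""
    return URL_TEMPLATE.format(
        key=quote(api_key, safe=""),
        zip_code=quote(zip_code, safe=""),
    )


def fetch_temp_f(api_key, zip_code, timeout=10):
    """Issue the GET request and return the temperature in Fahrenheit."""
    url = build_conditions_url(api_key, zip_code)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["current_observation"]["temp_f"]


# Building the URL needs no network access; the key here is a placeholder.
url = build_conditions_url("YOUR_API_KEY", "78701")
print(url)
```

Calling `fetch_temp_f` requires a valid key and a reachable service, so it is left out of the snippet above.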

When An API is Not Available

If the web service does not provide an API, it is time to get crafty. We have to resort to relying on HTML structure or to finding out where the data is dynamically loaded from. This is the approach used in the video at the top of the page, and it can be summarized roughly as follows:

  1. Find a web page you want to scrape in your web browser.

  2. Open the developer tools in your browser, record the network requests, and inspect the network requests for useful data.

    I am partial to the Chrome Developer Tools, but Firefox and Edge both provide similar tools.

  3. Look through the network requests to find one that includes the data you are trying to find.

    The best responses will be JSON or XML because LabVIEW has VIs to parse those data types.

    If you can only find the data embedded in HTML, you can use LabVIEW’s string parsing VIs to search through the HTML and extract your data.

  4. Use the LabVIEW HTTP Client VIs to re-create a similar network request and query the data inside of LabVIEW.
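
Steps 3 and 4 can be sketched in Python as a stand-in for the LabVIEW HTTP Client and string VIs. Everything named here is hypothetical: the example.com URL, the `Accept` header, and the helpers `build_request`, `fetch_text`, and `text_between`. In practice you would copy the URL and any required headers from the request you found in the developer tools.

```python
import urllib.request


def build_request(url, headers=None):
    """Recreate a GET request seen in the browser's network recording."""
    return urllib.request.Request(url, headers=headers or {}, method="GET")


def fetch_text(request, timeout=10):
    """Send the request and decode the body using the declared charset."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def text_between(body, start_marker, end_marker):
    """Fallback for HTML: return the text between two known markers."""
    start = body.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = body.find(end_marker, start)
    if end == -1:
        return None
    return body[start:end]


# Hypothetical request copied from the network recording.
request = build_request(
    "http://example.com/data/current.xml",
    {"Accept": "application/xml"},
)

# Hypothetical HTML fragment, used when no JSON or XML response exists.
sample_html = '<div class="temp">83.3</div>'
value = text_between(sample_html, '<div class="temp">', "</div>")
print(value)
```

The marker search breaks as soon as the surrounding markup changes, which is why a JSON or XML response is the better find when one exists.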

In the video we were lucky to find the data as XML, which LabVIEW can parse natively, although we did not go through those steps. We were also lucky that we could make the request without any additional complexities like cookies or authentication.
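
The XML parsing step that the video skips might look roughly like this in Python. The sample document, its tag names, and the `read_float` helper are invented for illustration and are not the response from the video.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response; the real one will have its own structure.
SAMPLE_XML = """<?xml version="1.0"?>
<observation>
  <location>Austin, TX</location>
  <temp_f>83.3</temp_f>
</observation>
"""


def read_float(xml_text, tag):
    """Return the numeric content of a direct child of the root element."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)
    if element is None or element.text is None:
        return None
    return float(element.text)


temp_f = read_float(SAMPLE_XML, "temp_f")
print(temp_f)
```

Returning `None` for a missing tag keeps a changed response from raising an error deep inside the parsing code; the caller decides how to handle it.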

Complexities of Scraping Web Services

For many LabVIEW applications the approach described in this document probably works pretty well. Local network devices frequently expose an open web interface and many public web services provide a well-documented programmatic API.

However, there are some hurdles that you may run into:

Have any use cases you ran into, questions, or comments? Ask on the LabVIEW Web Development Community.

Written by

Milan Raj

All Things Web | Engineering | <3 | Austin, TX

