Monitor web page changes with Go

In this article, we will look at how to write a Go program that monitors a web page for changes, optionally sending a Cookie header with each request.

Approach

To tell whether the page has changed, we compare the HTML response of the previous request with that of the current one. Depending on whether the two match, we print an appropriate message to the console. The request and comparison repeat every X seconds.

Our program will be invoked in the following way:

$ go run main.go URL COOKIES

The URL argument is mandatory; COOKIES is optional.
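If you want to unit-test the argument rules, one option is to move them into a small function. The sketch below is an assumption, not part of the original program: the helper name parseArgs and the usage string are hypothetical. It accepts the URL as the first argument, treats the second as an optional Cookie header value, and reports an error when the URL is missing so the caller can exit with a non-zero status.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// parseArgs is a hypothetical helper mirroring the rules above:
// args[1] is the mandatory URL, args[2] is an optional Cookie header value,
// and any further arguments are ignored.
func parseArgs(args []string) (url, cookies string, err error) {
	if len(args) < 2 || args[1] == "" {
		return "", "", errors.New("usage: go run main.go URL [COOKIES]")
	}
	url = args[1]
	if len(args) >= 3 {
		cookies = args[2]
	}
	return url, cookies, nil
}

func main() {
	url, cookies, err := parseArgs(os.Args)
	if err != nil {
		// A missing URL is an error, so exit with a non-zero status.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("URL=%q COOKIES=%q\n", url, cookies)
}
```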

Code

First, we create a file named main.go and declare the package.

package main  

Next, we import the packages we will work with.

import (  
    "net/http"
    "io/ioutil"
    "time"
    "log"
    "os"
)

Our main function will look like this:

func main() {  
    // Exit with a non-zero status if the URL is missing
    if len(os.Args) < 2 {
        log.Println("Missing URL. Usage: go run main.go URL [COOKIES]")
        os.Exit(1)
    }

    // Set url variable
    url = os.Args[1]

    // Set cookie variable if passed as argument
    if len(os.Args) == 3 {
        cookies = os.Args[2]
    }

    // Make initial request to set first lastData
    makeRequest()

    // Continue with requests, loop every LoopEverySeconds seconds
    scheduleRequests()
}

As the code above shows, we need two global variables, which we will later use when making requests.

var url string  
var cookies string  

We also need an http.Client (a struct type, not an interface), the constant LoopEverySeconds, and a lastData variable that holds the HTML of the previous response.

const LoopEverySeconds = 5  
var client = &http.Client{Timeout: 10 * time.Second} // don't hang forever on a stalled server
var lastData string  

Making requests

Now we need a function that sends a GET request to the URL and reads the HTML response. Once we have the data, we compare the current and previous responses inside the checkChanges function.

func makeRequest() {  
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Println(err)
        return
    }

    // Set cookies if they were passed as argument
    if cookies != "" {
        req.Header.Set("Cookie", cookies)
    }

    // Send request
    resp, err := client.Do(req)
    if err != nil {
        log.Println(err)
        return
    }
    // Close the body when done so the connection can be reused
    defer resp.Body.Close()

    // Save response body into data variable
    data, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Println(err)
        return
    }

    // If lastData is equal to "", it means that it is 
    // the first request and we set lastData to current 
    // response body
    // Otherwise, we compare previous and current HTML
    if lastData == "" {
        lastData = string(data)
    } else {
        checkChanges(string(data))
        lastData = string(data)
    }
}

In the scheduleRequests function, we create an infinite loop that calls makeRequest every LoopEverySeconds seconds.

func scheduleRequests() {  
    // Infinite loop to run every LoopEverySeconds seconds
    for range time.Tick(time.Second * LoopEverySeconds) {
        makeRequest()
    }
}

Checking for changes

A simple function checks whether the current response HTML differs from the previous one. If it does, the program prints "Changes noticed!" and exits.

func checkChanges(newData string) {  
    if newData == lastData {
        log.Println("No changes.")
        return
    }

    log.Println("Changes noticed!")
    os.Exit(0)
}
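The original code uses an empty lastData as a sentinel for "no baseline yet", so a page whose first response is empty would never be treated as a baseline. The sketch below is a hypothetical refactoring (the changeDetector type and its Observe method are not from the original) that tracks the baseline with an explicit flag and returns a bool instead of calling os.Exit, which makes the comparison step unit-testable.

```go
package main

import "fmt"

// changeDetector is a hypothetical type holding the previous body and
// whether a baseline has been recorded yet.
type changeDetector struct {
	last        string
	hasBaseline bool
}

// Observe records body and reports whether it differs from the previous one.
// The first call only sets the baseline and always returns false.
func (d *changeDetector) Observe(body string) bool {
	if !d.hasBaseline {
		d.last, d.hasBaseline = body, true
		return false
	}
	changed := body != d.last
	d.last = body
	return changed
}

func main() {
	var d changeDetector
	for _, body := range []string{"", "", "<p>new</p>"} {
		fmt.Println(d.Observe(body))
	}
	// Prints false, false, true (one per line).
}
```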

Running the program

To run the program, simply write:

$ go run main.go "http://example.com" "SID=NtawTu; AID=OAs3VzVc;"

As mentioned before, the second argument (the cookies) is optional.

Putting it all together

You can find the full code in a GitHub Gist.

Further improvements

To save memory, we could store only a hash of the HTML instead of the full page (see the crypto/sha256 package).
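A minimal sketch of the hashing idea, assuming SHA-256 from crypto/sha256 with the digest hex-encoded into a string. The helper name hashBody is hypothetical; in makeRequest you would store hashBody(data) in lastData instead of string(data), so lastData stays 64 characters long regardless of page size.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashBody is a hypothetical helper returning the hex-encoded SHA-256
// digest of a response body (always 64 characters).
func hashBody(data []byte) string {
	sum := sha256.Sum256(data) // [32]byte
	return hex.EncodeToString(sum[:])
}

func main() {
	a := hashBody([]byte("<html>v1</html>"))
	b := hashBody([]byte("<html>v2</html>"))
	fmt.Println(len(a), a == b) // prints "64 false"
}
```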

It would also be useful to be notified by email. Check out this snippet on how to send email via Gmail.

In checkChanges you can then do the following:

func checkChanges(newData string) {  
    ...
    sendEmail("Changes noticed on " + url)
    log.Println("Changes noticed!")
    ...
}

Limitations

Note that because the program compares the full HTML of the page, it may report changes you don't care about. Dynamic elements such as timestamps, CSRF tokens, or anything else that changes with every request will register as a change. To avoid this, compare only the page text or a specific part of the HTML.
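One simple way to compare only a specific part of the page is to cut out the section between two marker strings before comparing. The sketch below uses only the standard library; the function name extractSection and the marker strings are hypothetical, and plain string matching is brittle if the markup shifts. A real HTML parser such as golang.org/x/net/html is the more robust choice.

```go
package main

import (
	"fmt"
	"strings"
)

// extractSection returns the text between the first occurrence of start and
// the next occurrence of end after it, and false if either marker is missing.
func extractSection(page, start, end string) (string, bool) {
	i := strings.Index(page, start)
	if i < 0 {
		return "", false
	}
	rest := page[i+len(start):]
	j := strings.Index(rest, end)
	if j < 0 {
		return "", false
	}
	return rest[:j], true
}

func main() {
	// The timestamp differs, but the section we care about does not.
	v1 := `<p>Generated 12:00:01</p><div id="price">19.99</div>`
	v2 := `<p>Generated 12:00:06</p><div id="price">19.99</div>`
	a, _ := extractSection(v1, `<div id="price">`, `</div>`)
	b, _ := extractSection(v2, `<div id="price">`, `</div>`)
	fmt.Println(v1 == v2, a == b) // prints "false true"
}
```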

Silvio Simunic
