Saturday, May 13, 2006

Using HttpClient to PURGE Squid entries

This article describes a hack to send HTTP PURGE requests to a Squid server using the Apache Commons HttpClient library. Squid is a web proxy cache that sits between the webserver and the client. It intercepts HTTP GET requests and serves them out of its cache if the page is there. If it is not, Squid passes the request through to the webserver, populates the cache with the response, and serves the page from the cache. This setup is known as a reverse-proxy configuration, probably to distinguish it from the caching proxies that ISPs put in front of their gateways to speed up their customers' HTTP requests.

Squid allows you to purge entries from its cache by sending it a PURGE request. You can configure Squid to accept PURGE requests only from localhost (or from a specified set of internal hosts), and Squid provides a squidclient command to do the purging. This is explained in detail in the Squid FAQ.

Since my objective was to send the PURGE request to Squid from within a Java program, using the squidclient command was not the best option. Looking around the web, I came across a newsgroup post with a Perl script that did the same thing by simply writing this standard HTTP request to a socket connected to the Squid port:

PURGE http://my.squid.host:port/junk HTTP/1.0
Accept: */*

Obviously, I could do something similar in Java. But I was already using HttpClient in this project to send HTTP GET requests, so I thought it would be more maintainable and consistent to use HttpClient for the PURGE as well, instead of making direct socket calls. However, PURGE is neither part of the standard HTTP/1.1 protocol nor meaningful to a standard webserver, so HttpClient does not support it out of the box.
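For comparison, the raw-socket alternative could look roughly like the sketch below, which writes the same request the Perl script sends. The JDK's java.net.HttpURLConnection is not a shortcut here: its setRequestMethod() accepts only the standard methods (GET, POST, HEAD, OPTIONS, PUT, DELETE, TRACE) and throws a ProtocolException for PURGE. The class SocketPurge and its helper methods are hypothetical names, and the sketch only reads Squid's status line, ignoring headers and body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: send a raw PURGE request to Squid over a plain socket. */
final class SocketPurge {

    /** Builds the raw request text; HTTP requires CRLF line endings. */
    static String buildPurgeRequest(String url) {
        return "PURGE " + url + " HTTP/1.0\r\n"
             + "Accept: */*\r\n"
             + "\r\n";
    }

    /** Extracts the numeric code from a status line such as "HTTP/1.0 200 OK". */
    static int parseStatus(String statusLine) {
        String[] parts = statusLine.trim().split(" +");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Bad status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    /** Sends the PURGE to squidHost:squidPort and returns Squid's status code. */
    static int purge(String squidHost, int squidPort, String url) throws IOException {
        try (Socket socket = new Socket(squidHost, squidPort)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildPurgeRequest(url).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String statusLine = in.readLine();
            if (statusLine == null) {
                throw new IOException("No response from " + squidHost + ":" + squidPort);
            }
            return parseStatus(statusLine);
        }
    }
}
```

A call would look like SocketPurge.purge("my.squid.host", 3128, url), where 3128 (Squid's default port) is only an assumption; a reverse-proxy Squid often listens on port 80 instead. Every detail of the protocol (line endings, status parsing, error handling) is now this code's responsibility, which is the reason for preferring HttpClient.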

Fortunately, adding support for a PURGE method was trivial. All I had to do was create a new PurgeMethod.java class, using the source code of GetMethod.java as a template, and HttpClient was then able to send HTTP PURGE requests to Squid. Here is the code for the PurgeMethod.java class.

// PurgeMethod.java
package my.company.com.httpclient.methods;

import org.apache.commons.httpclient.HttpMethodBase;

/**
 * Specialized method to send an HTTP PURGE request to the specified URL.
 * Like GetMethod, it extends HttpMethodBase, which implements the
 * HttpMethod interface from the Commons HttpClient package.
 */
public class PurgeMethod extends HttpMethodBase {

    public PurgeMethod() {
        super();
        setFollowRedirects(true);
    }

    /** @param url the URL of the cached entry to purge */
    public PurgeMethod(String url) {
        super(url);
        setFollowRedirects(true);
    }

    /** Returns the method name written on the HTTP request line. */
    public String getName() {
        return "PURGE";
    }
}

The calling code is very similar to the standard code for sending HTTP GET requests with HttpClient, which is described in the HttpClient tutorial. A stripped-down version (without timeout settings and retries) is shown below:

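import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import my.company.com.httpclient.methods.PurgeMethod;

// Enclosing method (the name is illustrative); url points at the Squid host
public void purge(String url) throws Exception {
    HttpClient client = new HttpClient();
    HttpMethod method = new PurgeMethod(url);
    try {
        int status = client.executeMethod(method);
        // 200: entry purged; 404: entry was not in the cache, which is also fine
        if (status != HttpStatus.SC_OK && status != HttpStatus.SC_NOT_FOUND) {
            throw new Exception("HTTP PURGE failed for: " + url + " (" + status + ")");
        }
        // the response body does not make any sense for a PURGE
    } finally {
        // always hand the connection back to HttpClient
        method.releaseConnection();
    }
}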
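On Java 11 or later, the JDK's own java.net.http.HttpClient accepts an arbitrary method name through HttpRequest.Builder.method(), so no subclass is needed there. The following is a sketch that mirrors the calling code above under that assumption; the class name JdkPurge and its methods are hypothetical, and, like the original, it treats 200 and 404 as success.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: the same PURGE call using java.net.http (Java 11+). */
final class JdkPurge {

    /** Builds a bodiless PURGE request for the given URL (no network access). */
    static HttpRequest buildPurgeRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .method("PURGE", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** 200 means the entry was purged, 404 that it was not cached; both are fine. */
    static boolean isPurgeOk(int status) {
        return status == 200 || status == 404;
    }

    /** Sends the PURGE, discards the body, and throws on any other status. */
    static void purge(HttpClient client, String url) throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(buildPurgeRequest(url), HttpResponse.BodyHandlers.discarding());
        int status = response.statusCode();
        if (!isPurgeOk(status)) {
            throw new IOException("HTTP PURGE failed for: " + url + " (" + status + ")");
        }
    }
}
```

The client would typically be created once and reused, for example with HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build(). Pinning HTTP/1.1 is a suggestion rather than a requirement: by default the JDK client may try to upgrade cleartext connections to HTTP/2, which buys nothing when talking to Squid.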

To test this, I pointed the code at a running Squid installation set up to reverse-proxy to a Resin server. The test sent a PURGE request followed by two GET requests in succession, while I watched the Squid access.log from another terminal. I expected to see the PURGE request, then a GET request that does not find the page in the cache (TCP_MISS:DIRECT), followed by a GET request that finds the page in the cache (TCP_HIT:NONE). Here is a snippet from the Squid access.log:

10.16.181.34 - - [11/May/2006:15:51:08 -0700] "PURGE http://my.company.com/myapp/mypage.html HTTP/1.1" 200 122 TCP_MISS:NONE
10.16.181.34 - - [11/May/2006:15:51:09 -0700] "GET http://my.company.com/myapp/mypage.html HTTP/1.1" 200 30351 TCP_MISS:DIRECT
10.16.181.34 - - [11/May/2006:15:52:13 -0700] "GET http://my.company.com/myapp/mypage.html HTTP/1.1" 200 29043 TCP_HIT:NONE
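Watching the log by eye works, but the same check can be scripted. The following is a sketch that assumes the log format shown above (a quoted request line followed by status, size, and Squid's RESULT:HIERARCHY code). It extracts the method and result code from each line and checks for the PURGE, miss, hit sequence; it does not verify that all three lines refer to the same URL. The class SquidLogCheck and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: summarize access.log lines in the format shown above. */
final class SquidLogCheck {

    // "<METHOD> <URL> <PROTOCOL>" <status> <bytes> <RESULT>:<HIERARCHY> at end of line;
    // group 1 captures the method, group 2 the Squid result code
    private static final Pattern LINE = Pattern.compile(
            "\"(\\S+) \\S+ [^\"]*\" \\d{3} \\d+ (\\w+):\\w+\\s*$");

    /** Returns "METHOD RESULT" for each matching line, e.g. "GET TCP_HIT". */
    static List<String> summarize(List<String> logLines) {
        List<String> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                out.add(m.group(1) + " " + m.group(2));
            }
        }
        return out;
    }

    /** True if the lines show a PURGE, then a GET miss, then a GET hit. */
    static boolean showsPurgeMissHit(List<String> logLines) {
        return summarize(logLines).equals(
                List.of("PURGE TCP_MISS", "GET TCP_MISS", "GET TCP_HIT"));
    }
}
```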

As you can see, the code needed to add PURGE support to HttpClient is trivial. This is one of the two ways I can think of to purge Squid entries in pure Java in a platform-independent way; the other is to write the request directly to a Java Socket. I think the HttpClient approach is cleaner because it re-invents fewer wheels and piggybacks on standard open source libraries that provide the base plumbing. I would also like to commend the developers of the HttpClient library for making the framework so easy to extend, which I believe is a hallmark of great framework software.
