Saturday, March 27, 2010

Java REST Client Interface for Ehcache server

Ehcache is a popular caching library in the Java world. Until recently, I was aware only of the Ehcache library itself, which you can use to build in-JVM caches. Since late 2008, however, there has also been an Ehcache server, with which you can create remote caches that your application accesses over HTTP.

This obviously has enormous implications for scalability. Granted, cache access over a network is much slower than in-memory access, but I think this is a fair tradeoff when you are dealing with potentially very large caches. Keeping your cache behind a server frees up your application JVM's memory. And if you need more cache, you can partition it across more servers.

Our first cut of the Ehcache setup was something like this: a bunch of applications would each maintain their own in-JVM cache, but these caches would communicate with each other and replicate over RMI, as shown in the diagram below. This was mainly to test out the local replicated mode, which we used in our final setup for fault tolerance (see below).

The next step was to take the cache and put it behind the cache server. We used Ehcache Standalone Server version 0.8. It comes packaged within a Glassfish application server, so all we had to do was expand the tarball and update the ehcache.xml file in the war/WEB-INF/classes subdirectory with our local replicated cache definitions. To start the Ehcache server, use bin/startup.sh (this works fine, but needs cleaning up to log to a file, etc). Out of the box, the cache server listens on port 8080; you can change that to whatever port you want.

Now, if we needed more cache memory, we could add more server pairs (paired for fault tolerance, see diagram above) and partition the key space. Then, on the client, we could do a simple hash-mod of the key and direct the request to the appropriate load balancer.
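The hash-mod routing can be sketched as follows (a minimal Python illustration; the load-balancer URLs and the CRC32 choice of hash are assumptions for the sketch, not part of our actual setup):

```python
import zlib

def pick_partition(key, num_partitions):
    # A stable hash (CRC32 here) ensures every client maps a given key
    # to the same partition; a language's built-in hash can vary across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# hypothetical load balancers, each fronting a fault-tolerant server pair
balancers = ["http://lb0:8080/ehcache/rest", "http://lb1:8080/ehcache/rest"]
url_base = balancers[pick_partition("some-cache-key", len(balancers))]
```

The important property is determinism: every client must route a given key to the same partition, or gets and puts would go to different servers.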

To make the transition seamless, I factored all Ehcache access code out into a helper class - the application code was basically doing get(), put() and delete() calls on the cache - and then switched the calls from the local to the remote cache. The code is shown below.

package com.mycompany.myapp.helpers;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import net.sf.ehcache.Element;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.DeleteMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.InputStreamRequestEntity;
import org.apache.commons.httpclient.methods.PutMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service("cacheHelper")
public class CacheHelper {

  private final Logger logger = LoggerFactory.getLogger(getClass());
  
  @Autowired private String serviceUrl;
  @Autowired private int readTimeout;
  @Autowired private int connectTimeout;
  @Autowired private int maxRetries;
  @Autowired private HttpClient httpClient;
  
  public Element get(String cacheName, String key) throws Exception {
    String url = StringUtils.join(new String[] {
      serviceUrl, cacheName, key}, "/");
    GetMethod getMethod = new GetMethod(url);
    configureMethod(getMethod);
    ObjectInputStream oin = null;
    int status = -1;
    try {
      status = httpClient.executeMethod(getMethod);
      if (status == HttpStatus.SC_NOT_FOUND) {
        // if the content is deleted already
        return null;
      }
      InputStream in = getMethod.getResponseBodyAsStream();
      oin = new ObjectInputStream(in);
      Element element = (Element) oin.readObject();
      return element;
    } catch (IOException e) {
      logger.warn("GET Failed (" + status + ")", e);
    } finally {
      IOUtils.closeQuietly(oin);
      getMethod.releaseConnection();
    }
    return null;
  }
  
  public void put(String cacheName, String key, Serializable value) 
      throws Exception {
    Element element = new Element(key, value);
    String url = StringUtils.join(new String[] {
      serviceUrl, cacheName, key}, "/");
    PutMethod putMethod = new PutMethod(url);
    configureMethod(putMethod);
    ObjectOutputStream oos = null;
    int status = -1;
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      oos = new ObjectOutputStream(bos);
      oos.writeObject(element);
      putMethod.setRequestEntity(new InputStreamRequestEntity(
        new ByteArrayInputStream(bos.toByteArray())));
      status = httpClient.executeMethod(putMethod);
    } catch (Exception e) {
      logger.warn("PUT Failed (" + status + ")", e);
    } finally {
      IOUtils.closeQuietly(oos);
      putMethod.releaseConnection();
    }
  }
  
  public void delete(String cacheName, String key) throws Exception {
    String url = StringUtils.join(new String[] {
      serviceUrl, cacheName, key}, "/");
    DeleteMethod deleteMethod = new DeleteMethod(url);
    configureMethod(deleteMethod);
    int status = -1;
    try {
      status = httpClient.executeMethod(deleteMethod);
    } catch (Exception e) {
      logger.warn("DELETE Failed (" + status + ")", e);
    } finally {
      deleteMethod.releaseConnection();
    }
  }
  
  private void configureMethod(HttpMethod method) {
    if (readTimeout > 0) {
      method.getParams().setSoTimeout(readTimeout);
    }
    if (maxRetries > 0) {
      method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, 
        new DefaultHttpMethodRetryHandler(maxRetries, false));
    }
  }
}

The calling code is no different from the code that called the local cache. Instead of cache.XXX(), you now make cacheHelper.XXX() calls. The service URL is /ehcache/rest.
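Under the hood, the helper maps cache operations onto the server's REST URL scheme: GET, PUT and DELETE on serviceUrl/cacheName/key. A minimal sketch of that mapping, in Python for brevity (the rest_request helper is hypothetical, not part of the helper class above):

```python
def rest_request(base_url, cache_name, key, op):
    """Map a cache operation to the Ehcache server's HTTP verb and URL."""
    verbs = {"get": "GET", "put": "PUT", "delete": "DELETE"}
    url = "/".join([base_url.rstrip("/"), cache_name, key])
    return verbs[op], url

# e.g. a get() against the my-cache cache on a local cache server
method, url = rest_request("http://localhost:8080/ehcache/rest", "my-cache", "somekey", "get")
```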

Here is the cache definition for one of our caches. You can repeat the cache block for multiple caches. The peer listener and peer provider factory definitions are shared across multiple caches.

<ehcache>
    ...
    <defaultCache .../>
    
    <cache name="my-cache"
        maxElementsInMemory="10000"
        eternal="true"
        diskPersistent="true"
        overflowToDisk="true">
      <cacheEventListenerFactory 
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>
      <bootstrapCacheLoaderFactory 
        class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"/>
    </cache>

    <cacheManagerPeerListenerFactory
      class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"/>
    <cacheManagerPeerProviderFactory 
      class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
      properties="hostName=localhost,
                  peerDiscovery=automatic,
                  multicastGroupAddress=224.0.0.1,
                  multicastGroupPort=4446,
                  timeToLive=1"/>

</ehcache>

Obviously, neither the code nor the configuration is rocket science; there are enough examples on the Ehcache site for someone to build this themselves. But the code examples for the Ehcache server are quite generic - they concentrate on caching strings, while most people who have used Ehcache in local mode tend to cache real Serializable objects. Also, most enterprise-type places I know tend to use Jakarta Commons HttpClient rather than Java's URLConnection. Finally, it took me a fair bit of time to figure out and test the distributed cache configuration. So if you are looking for a quick ramp-up to using the Ehcache server, this post may be helpful.

Saturday, March 20, 2010

HTTP Debug Proxy with Twisted

Motivation

Recently, I've been building a few distributed components for an upcoming project. The components talk to each other over XMLRPC. So far, the connectivity is PHP-Java, Java-PHP and Java-Java. On the Java side, I use the Apache XMLRPC library to build the clients and servers. The PHP side is basically Drupal's XMLRPC service.

Apache XMLRPC provides Java abstractions at both the client and the server ends, so a programmer only needs to work with Java objects. The library takes care of generating and parsing the XML request and response. While this is mostly very convenient, sometimes it is helpful for debugging to see the actual XML on the wire. This is what initially prompted me to look for this kind of script, and ultimately to build one.

Background

I initially tried netcat with tee, but couldn't make it work the way I wanted across both my CentOS and Mac OSX machines. To be honest, I didn't try too hard, because the nc/tee combination writes to two separate files, and I wanted a single combined output.

There are actually two Python scripts that do about the same thing as the one I built. The HTTP/XMLRPC Debug proxy from myelin came closest to what I wanted, but I would have had to hack it a bit to accommodate arbitrary source ports. Another was Xavier Defrang's HTTP Debugging Proxy, which looked promising, but it's HTTP-only, and I wanted to use mine (in the future) for protocols other than HTTP.

One nice (but non-free) tool in this space is Charles. This would be a good model for someone looking to build an Eclipse plugin :-).

I started out building something with Python sockets based on Gordon McMillan's Python Socket Programming HOWTO, but gave up when I started having problems with blocking in send() and recv() - my knowledge of socket programming wasn't enough to follow him down the rabbit hole of select() calls.

I ultimately settled on Twisted, based on a rather lively discussion which pointed me to it in the first place. What I liked about Twisted is that it's very object-oriented and feels almost like Java. A Twisted network component (client or server) is built from a protocol and a factory class, plus an optional "business logic" class. The components are not started directly; they are injected into the Twisted reactor object (similar to an IoC container), and the reactor is then started.

Twisted does have a steep learning curve, but Twisted Matrix Labs provides excellent documentation. There is also the O'Reilly book by Abe Fettig, which I tried to get but couldn't find at my local Borders bookstore. The online documentation and tutorial are quite good, though; you can actually figure Twisted out from there.

Architecture

What I wanted was something that would hook in between the client and the server and print out the request and response on the console, as shown in the figure below. The dotted blue and red lines represent the normal request and response flows, respectively. The idea is to repoint the client at the HTTP proxy and have the proxy forward the request over to the server application.

As you can see, the HTTP proxy is actually a pipeline of a server component and a client component. The server listens on the source port, spawning a new client-server pair for each incoming connection. Once the server receives the data, it starts a client that sends the data over to the target (server application), gets back the response, and hands it back to the server, which sends it on to the source (client application) and terminates the connection.

Here's the script to do this all.

#!/usr/local/bin/python
# $Id$
# $Source$
import getopt
import string
import sys

from twisted.internet import protocol
from twisted.internet import reactor

class ConsoleWriter:
  """ Writes request (on source port) and response (from target host:port)
  data to the console, prefixing each line to show its direction.
  """

  def write(self, data, direction):
    if (data):
      lines = data.split("\n")
      prefix = "<" if direction == "request" else ">"
      for line in lines:
        sys.stdout.write("%s %s\n" % (prefix, line))
    else:
      sys.stdout.write("No response from server\n")


class DebugHttpClientProtocol(protocol.Protocol):
  """ Client protocol. Writes out the request to the target HTTP server.
  The response is written to stdout on receipt, and back to the server's
  transport when the client connection is lost.
  """

  def __init__(self, serverTransport):
    self.serverTransport = serverTransport
    self.data = None

  def sendMessage(self, data):
    self.transport.write(data)
  
  def dataReceived(self, data):
    self.data = data
    ConsoleWriter().write(data, "response")
    self.transport.loseConnection()

  def connectionLost(self, reason):
    # guard against the connection dropping before any response arrived
    if self.data:
      self.serverTransport.write(self.data)
    self.serverTransport.loseConnection()


class DebugHttpServerProtocol(protocol.Protocol):
  """ Server protocol. Handles data received from the client application.
  Writes the data to the console, then creates a proxy client component,
  sends the data through it, and finally terminates the client and server
  connections.
  """

  def dataReceived(self, data):
    self.data = data
    ConsoleWriter().write(self.data, "request")
    client = protocol.ClientCreator(reactor, DebugHttpClientProtocol, self.transport)
    d = client.connectTCP(self.factory.targetHost, self.factory.targetPort)
    d.addCallback(self.forwardToClient)

  def forwardToClient(self, clientProtocol):
    # the deferred fires with the connected DebugHttpClientProtocol instance
    clientProtocol.sendMessage(self.data)


class DebugHttpServerFactory(protocol.ServerFactory):
  """ Server Factory. A holder for the protocol and for user-supplied args """

  protocol = DebugHttpServerProtocol

  def __init__(self, targetHost, targetPort):
    self.targetHost = targetHost
    self.targetPort = targetPort


def usage():
  sys.stdout.write("Usage: %s --help|--source port --target host:port\n"
    % (sys.argv[0]))
  sys.stdout.write("-h|--help: Show this message\n")
  sys.stdout.write("-s|--source: The port on the local host on which this \n")
  sys.stdout.write("             proxy listens\n")
  sys.stdout.write("-t|--target: The host:port which this proxy talks to\n")
  sys.stdout.write("Both -s and -t must be specified. There are no defaults.\n")
  sys.stdout.write("To use this proxy between client app A and server app B,\n")
  sys.stdout.write("point A at this proxy's source port, and point this\n")
  sys.stdout.write("proxy's target host:port at B. The request and response\n")
  sys.stdout.write("data flowing through A and B will be written to stdout for\n")
  sys.stdout.write("your visual pleasure.\n")
  sys.stdout.write("To stop the proxy, press CTRL+C\n")
  sys.exit(2)


def main():
  (opts, args) = getopt.getopt(sys.argv[1:], "s:t:h",
    ["source=", "target=", "help"])
  sourcePort, targetHost, targetPort = None, None, None
  for option, argval in opts:
    if (option in ("-h", "--help")):
      usage()
    if (option in ("-s", "--source")):
      sourcePort = int(argval)
    if (option in ("-t", "--target")):
      (targetHost, targetPort) = string.split(argval, ":")
  # remember no defaults?
  if (not(sourcePort and targetHost and targetPort)):
    usage()
  # start twisted reactor
  reactor.listenTCP(sourcePort,
    DebugHttpServerFactory(targetHost, int(targetPort)))
  reactor.run()


if __name__ == "__main__":
  main()

The server is defined by DebugHttpServerProtocol and DebugHttpServerFactory, and the client by DebugHttpClientProtocol. The ConsoleWriter just writes formatted request and response data to the console.

When a request comes in from the client application, it is handed to DebugHttpServerProtocol.dataReceived(), where the data is first written out to the console. A client object is then created using ClientCreator, which takes the DebugHttpClientProtocol and a reference to the server's transport object. The client then connects to the target host and port, and a callback is added so that the client relays the request over to the target server once it connects.

Once the client connects, the callback is triggered, which relays the request across to the target host. The response from the target host is captured by DebugHttpClientProtocol.dataReceived(), which writes the data to the console, then loses the connection. The connection lost event is captured by the connectionLost() method, which writes the response back to the caller and closes the connection.

Testing/Usage

To test the proxy, I started up the proxy to listen to port 1234 and forward to my test Drupal instance running on port 80. I then repointed the service URL in my JUnit test from http://localhost/services/xmlrpc to http://localhost:1234/services/xmlrpc. The JUnit test sends a comment to Drupal's XMLRPC comment.save service.

sujit@cyclone:network$ ./httpspy.py -s 1234 -t localhost:80

I then run my JUnit test from another console. On the console where I started httpspy.py, I see the following (the "< " and "> " prefixes signify request and response, respectively).

< POST /services/xmlrpc HTTP/1.1
< Content-Type: text/xml
< User-Agent: Apache XML RPC 3.0 (Jakarta Commons httpclient Transport)
< Host: localhost:1234
< Content-Length: 604
< 
< <?xml version="1.0" encoding="UTF-8"?><methodCall><methodName>comment.save
</methodName><params><param><value><struct><member><name>mail</name><value>f
oo@bar.com</value></member><member><name>subject</name><value>my stupid subj
ect (1269113982563)</value></member><member><name>nid</name><value><i4>8</i4
></value></member><member><name>name</name><value>godzilla</value></member><
member><name>comment</name><value>a test comment entry at 1269113982563 ms s
ince epoch</value></member><member><name>homepage</name><value>http://homesw
eethome.us</value></member></struct></value></param></params></methodCall>
<
<
> HTTP/1.1 200 OK
> Date: Sat, 20 Mar 2010 19:39:42 GMT
> Server: Apache/2.0.63 (Unix) PHP/5.2.11 DAV/2
> X-Powered-By: PHP/5.2.11
> Set-Cookie: SESS421aa90e079fa326b6494f812ad13e79=da32e0283f634a62937761a01c
0fb91d; expires=Mon, 12-Apr-2010 23:13:02 GMT; path=/
> Expires: Sun, 19 Nov 1978 05:00:00 GMT
> Last-Modified: Sat, 20 Mar 2010 19:39:42 GMT
> Cache-Control: store, no-cache, must-revalidate
> Cache-Control: post-check=0, pre-check=0
> Connection: close
> Content-Length: 142
> Content-Type: text/xml
> 
> <?xml version="1.0"?>
> 
> <methodResponse>
>   <params>
>   <param>
>     <value><string>10</string></value>
>   </param>
>   </params>
> </methodResponse>
> 
> 

As you can see, there is still a bit of work needed to beautify the raw request if it's XML (probably by parsing out the Content-Type header), but the script is usable right now, so that is something I will do in the future.
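A possible sketch of that beautification step, assuming the Content-Type header has already been parsed out of the captured data (the maybe_prettify helper is hypothetical, not part of the script above):

```python
from xml.dom import minidom

def maybe_prettify(body, content_type):
    # Pretty-print the body only when the Content-Type header indicates XML;
    # anything else (or unparseable XML) is returned untouched.
    if content_type and "xml" in content_type:
        try:
            return minidom.parseString(body).toprettyxml(indent="  ")
        except Exception:
            pass
    return body
```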

Another use case I have tried is putting it between a client HTTP GET call and a Drupal webpage. I still need to test this stuff extensively through various use cases - if I find bugs in the code, I will update it here. Meanwhile, if you find bugs or strange behavior (or even better, bugs with fixes :-)), I would really appreciate knowing.

Update 2010-04-04: The code here breaks down when (a) the request/response payload is large and/or (b) servers are slow. I am trying to fix the code, will post the updated code when done.

Update 2010-12: Because of its unreliability, I basically abandoned the script in favor of this less general solution. Mikedominice was kind enough to prod me into looking at this again. As a result, I ended up rewriting most of the script, and it seems to work quite well now. The updated code is posted above.

Sunday, March 07, 2010

Drupal Hell: A Controlled Descent Story

As I mentioned before, I've been working on integrating Drupal into our (Java) publishing infrastructure. Over the last few weeks, I have been learning about Drupal, what is possible and what is not. This week, I describe two use-cases where I had to "extend" Drupal somewhat. In the second case, I came very close to hacking the core, hence the title, based on the advice given in The Road to Drupal Hell.

The setup is that we use Drupal as our blogging platform, and then publish the blogs using XML-RPC to a Java application. Only the editorial staff and bloggers have access to the Drupal app. Readers read the blog on the Java based website.

Of course, blogs by their very nature are interactive, so readers should have a way to comment on them. This is done by a Java action which calls Drupal's comment.save XML-RPC service. By default, comments need to be moderated, so the editorial staff review and publish these comments, which results in a node republish, after which the comments appear on the Java site.

Use Case #1: Batch Publishing

Testing through the Drupal interface is kind of painful (too many mouse clicks required), so I built a little PHP script that publishes all publishable blogs in one call. This post from Stonemind Consulting was very helpful; I basically followed the ideas in it to call the send_request() function. Here is the complete script if you are interested:

<?php
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
error_reporting(E_ALL);

# include local settings for scripts. These are the values of the remote
# publishing and preview servers.
require_once './sites/default/local_settings.php';

$user = $GLOBALS['user'];
if (isset($user)) {
  if ($user->uid != 1) {
    echo 'Sorry, only administrator can perform this task.';
    return;
  }
} else {
  echo 'Not logged in. Please log in as administrator before performing task';
  return;
}

echo '<b>Republishing nodes...</b><br/>';
echo 'This tool republishes ALL nodes with node.status=1 and node.type=blog.<br/><br/>';
echo '<table cellspacing=3 cellpadding=3 border=1>';
# Header
$num_publishers = 1;
echo '<tr><td><b>Node-ID</b></td><td><b>Title</b></td>';
foreach ($remote_urls as &$remote_url) {
  echo '<td><b><a href="' . $remote_url . '">Publisher-' . $num_publishers . '</a></b></td>';
  $num_publishers++;
}
echo '</tr>';
$published_nids = db_query('select nid from node where status = %d and type=\'%s\'', array(1, 'blog'));
$num_nodes = 0;
$num_attempted = 0;
$num_ok = 0;
$num_failed = 0;
while ($result = db_fetch_object($published_nids)) {
  $node = node_load($result->nid);
  // use a separate variable so the $preview_url template is not clobbered
  $node_preview_url = sprintf($preview_url, $node->nid);
  echo '<tr><td>' . $node->nid . '</td><td><a href="' . $node_preview_url . '">' . $node->title . '</a></td>';
  foreach ($remote_urls as &$remote_url) {
    $response = dxi_send_request($remote_url, 'publish', $node);
    if ($response == 0) {
      echo '<td><font color="green">OK</font></td>';
      $num_ok++;
    } else {
      echo '<td><font color="red">Failed</font></td>';
      $num_failed++;
    }
    $num_attempted++;
  }
  $num_nodes++;
  echo '</tr>';
}
echo '</table><br/>';
echo '#-nodes republished: ' . $num_nodes . ', #-publish attempts: ' . $num_attempted . ', #-successes: ' . $num_ok . ', #-failures: ' . $num_failed;

As you can see, most of this is plain old PHP, with a few Drupal API calls thrown in for convenience. You basically call this from a browser within the Drupal installation, similar to /update.php or /install.php. You will need to be logged in as administrator to run the script.

One problem I had (which came up again in the second use case, where I solved it differently and, I think, a bit more elegantly) was getting the URL of the publisher XML-RPC service and the URL of the preview service (to allow one-click viewing of the published node). Ideally, I should be able to get these from Drupal itself, since they are already provided to it, but I couldn't find an API call that returns this information, so I built my own mechanism, similar to the settings.php file. My properties file is called local_settings.php, and it lives alongside settings.php in sites/default.

<?php
$remote_urls = array(
  'http://somehost.mycompany.com/myapp/publish.do'
);
$preview_url = 'http://somehost.mycompany.com/myapp/blog/%d';

I don't really like this approach, as it's not DRY, but it's the best I could come up with at the time. The approach of setting it in the Drupal variables table (described below) is slightly better, but still not as DRY as I would like.

Use Case #2: Comment Publish/Unpublish

Wanting to do the right thing as a new Drupal developer, I had read the Pro Drupal book and heeded the warnings about hacking Drupal's core. In fact, I had structured my module (described earlier) so it exposed a configurable action - that way, a site administrator can attach it to whatever trigger they see fit, rather than having to do this in code by implementing a hook_XXX() method.

Drupal exposes triggers on the "insert", "update", "delete" and "view" comment actions. I figured that since a comment publish/unpublish results in a status change, my action should be attached to a comment update trigger, but apparently I was wrong. For some reason (probably performance), the publish/unpublish operation is done through a straight SQL call (specified in comment_operations()) and does not fire the comment update trigger(s). Actual modification of the comment (such as changing the content or the title), however, does.

So my first approach was to go in and hack the core - specifically, the comment_hook_info() method - to add the 'publish' and 'unpublish' operations. This allowed me to hook up my custom action, but still did not result in it being called (which was a good thing, since it forced me to re-evaluate my options).

Running through the publish/unpublish operation with a debugger, I found that comment_invoke_comment() is called from comment_admin_overview_submit() when a comment is published from the Approval Queue (or vice versa). comment_invoke_comment() invokes the hook_comment() implementation in all loaded modules, so I needed to explicitly implement a dxi_comment() function that would delegate to my custom action. I had originally tried a hook_nodeapi() implementation and removed it in favor of a custom action, so this was something of a step backward. In any case, this is what I finally came up with:

<?php
function dxi_comment($comment, $op) {
  if ($op != 'publish' && $op != 'unpublish') {
    return;
  }
  if (! isset($comment->nid)) {
    watchdog('dxi', 'No Node for Comment(' . $comment->cid . ', op=' . $op . ')');
    return;
  }
  $nid = $comment->nid;
  $node = node_load($nid, NULL, TRUE);
  if (! $node) {
    watchdog('dxi', 'Node load failed for nid=' . $nid);
    return;
  }
  // prepare the context
  $context = array(
    'op' => $op,
    'hook' => 'comment',
    'remote_url' => variable_get('dxi_remote_url', DEFAULT_DXI_REMOTE_URL)
  );
  return dxi_call_action($node, $context);
}

Note how I have to build up the $context object. Unlike in the custom action, the $context here does not have the 'remote_url' field that dxi_call_action_submit() sets. So I made another change to store the remote_url value supplied by the client in the variables database table, as shown below, so that I could use it in dxi_comment() to populate the $context.

<?php
function dxi_call_action_submit($form, $form_state) {
  $remote_url = $form_state['values']['remote_url'];
  variable_set('dxi_remote_url', $remote_url);
  return array(
    'remote_url' => $remote_url
  );
}

With this approach, I have basically worked around what I think is an oversight in Drupal's comment module. Since the comment module does pass 'publish' and 'unpublish' in the $op variable when it calls dxi_comment(), these operations are recognized in the code, so I don't see why they are not exposed so that custom actions can be attached to them.

Conclusion

I am still learning Drupal, and my main programming language is Java, not PHP, and this probably shows in my code. I am wondering what the best practices are for this kind of stuff. Specifically, I am looking for answers to:

  • What is the best practice for accessing action configuration variables? I have two different ways of pulling the remote_url value from my custom action. Ideally, I would like to just pull it out of the context somehow, or use some Drupal call to get it out of the actions table. Does such a method exist?
  • Have others had this kind of problem with comment publishing and unpublishing? If so, how did you solve them? Is there a reason why the publish and unpublish actions are not exposed as triggers?

I guess I could ask around on the Drupal mailing list, and I probably will, but as someone who cares just enough about Drupal to get things hooked up correctly with the Java application, I will probably hold off until my Java application is more stable. Meanwhile, if you have answers or alternative approaches to this problem, I would appreciate hearing from you.