Saturday, August 23, 2008

Managing remote scripts with JMX

Around eight years ago, three of us found ourselves maintaining a fairly large multi-language distributed system. Originally, each component was maintained by their original authors, so it was relatively easy for them to ensure that things were good with that component, but with the development phase ending, most of them moved off to other jobs. While we knew our way around most of these systems, we did not have the bandwidth to check on each of them regularly. So our manager came up with the idea of a central monitoring application, which we built and used with great success. Monitoring scripts written in Perl were deployed on the servers where the actual components ran, and a Perl client queried them both via cron and on-demand from a console, and reported the results. In retrospect, I think this was my first experience with management applications.

My next job was with a web publishing house, and although I did not actually work on any of the backend systems, my middleware components would interact with some of them, and without exception, all of them had embedded HTTP (mostly Jetty) servers reporting status of the application in real-time. This was my second experience with management applications.

More recently, I have been thinking about a problem closer to home. Our backend systems are composed of shell scripts calling Java and C/C++ components, and the only way to see what is happening with them is to actually log into the machines and look at their logs, for which, as developers, we have no access. Since most scripts notify on errors, there is the possibility that a script never ran, and developers will only find out about this when they see the side effects (or lack of it) in their application.

I had read about JMX in the past and had even tinkered with it, but never found a compelling reason to use it - that is, until now. The solution I came up with is a hybrid of the two approaches described above. Like the first approach, the MBean relies on standard operating system commands to do the monitoring, and like the second, I use a HTTP server to display the monitoring information, although I use the JMX tool suite's built-in HTTP Adapter rather than build a custom one.

Here is my toy script that I want to set up JMX instrumentation on. It's a script to count the number of animal clones. Depending on the argument provided to it, it whips out a cloned sheep (Dolly) or a parrot (Polly) every 30 seconds and logs a count.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/bash
# Check if already running, if so exit
PID_FILE=/tmp/count_sheep_$1.pid
LOG_FILE=/tmp/count_sheep_$1.log
ERR_FILE=/tmp/count_sheep_$1.err
if [ -e $PID_FILE ]; then
  currdate=`date --rfc-3339=seconds`
  echo "[$currdate]: Clone counter already running, cannot start" >gt;>gt; $ERR_FILE
  exit -1
fi
# Start processing
trap `rm -f $PID_FILE` INT TERM EXIT
echo $$ >gt; $PID_FILE
rm -f $LOG_FILE
rm -f $ERR_FILE
if [ $# -eq 0 ]; then
  sheep_name='Dolly'
else
  sheep_name=$1
fi
i=0
while (true); do
  currdate=`date --rfc-3339=seconds`
  i=`expr $i + 1`
  echo "[$currdate]: Hello $sheep_name $i" >gt;>gt; $LOG_FILE
  sleep 1
done
# Clean up
rm $PID_FILE
trap - INT TERM EXIT

Notice that I am depending on certain coding and naming conventions which should be fairly easy to enforce in a corporate environment. If not, there is very likely to be other conventions that you can take advantage of to make your MBeans more reusable across different script types. The required conventions are as follows:

  • Scripts should always write their PID into a .pid file at a specific location on startup and remove it on normal termination.
  • Scripts should write progress information into a .log file at a specific location.
  • Scripts should write error information into a .err file at a specific location only if there is an error.

Our MBean to monitor and control scripts is a standard MBean, and its interface is shown below. Notice that we expose the operations "start" and "kill" (it runs in an infinite loop), and a bunch of read-only properties (signified by getXXX methods). In order to start the script, we pass in the path name of the script and its arguments via a constructor in its implementation. This is for security, so people cannot start up "/bin/rm -rf *" via JMX for example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Source: src/main/java/com/mycompany/myapp/ScriptAdapterMBean.java
package com.mycompany.myapp;

public interface ScriptAdapterMBean {

  // read-only attributes
  public boolean isRunning();
  public String getStatus();
  public String getLogs();
  public long getRunningTime();
  
  // operations
  public void start();
  public void kill();
}

The corresponding implementation is shown below. As you can see, for each of the exposed operations and attributes, I spawn a Java Process and run operating system commands through it, similar to what a human user would do if he could log on to the script's host machine.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
// Source: src/main/java/com/mycompany/myapp/ScriptAdapter.java
package com.mycompany.myapp;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.lang.StringUtils;
import org.springframework.jmx.JmxException;

public class ScriptAdapter implements ScriptAdapterMBean {

  private static final String LOG_DIR = "/tmp";
  private static final String ERR_DIR = "/tmp";
  private static final String PID_DIR = "/tmp";
  
  private String path;
  private String args;

  private String adapterName;
  
  private String pidfileName;
  private String logfileName;
  private String errfileName;
  
  private Process process;
  
  /**
   * Ctor to build a ScriptAdapter MBean implementation. Path and args
   * are injected via ctor for security.
   * @param path the full path to the script.
   * @param args the arguments to the script.
   */
  public ScriptAdapter(String path, String[] args) {
    this.path = path;
    this.args = StringUtils.join(args, " ");
    this.adapterName = StringUtils.join(new String[] {
      FilenameUtils.getBaseName(path),
      (args.length == 0 ? "" : "_"),
      StringUtils.join(args, "_")
    });
    this.pidfileName = FilenameUtils.concat(PID_DIR, adapterName + ".pid");
    this.logfileName = FilenameUtils.concat(LOG_DIR, adapterName + ".log");
    this.errfileName = FilenameUtils.concat(ERR_DIR, adapterName + ".err");
  }

  /**
   * Needed by Agent to build the ObjectName reference. Not part of the MBean
   * so it will not be available as a manageable attribute.
   * @return the computed adapter name.
   */
  public String getAdapterName() {
    return adapterName;
  }
  
  /**
   * Checks for existence of the PID file. Uses naming conventions
   * to locate the correct pid file.
   */
  public boolean isRunning() {
    File pidfile = new File(pidfileName);
    return (pidfile.exists());
  }

  /**
   * If isRunning, then status == RUNNING.
   * If isRunning and .err file exists, then status == ERROR
   * If !isRunning and .err file does not exist, then status == READY 
   */
  public String getStatus() {
    File errorfile = new File(errfileName);
    if (errorfile.exists()) {
      return "ERROR";
    } else {
      if (isRunning()) {
        return "RUNNING";
      } else {
        return "READY";
      }
    }
  }

  /**
   * Runs a tail -10 logfile on the log file to get the latest logs at
   * this point in time.
   * @return the last 10 lines (or fewer) of the log file.
   */
  public String getLogs() {
    if ("ERROR".equals(getStatus())) {
      File errorfile = new File(errfileName);
      try {
        return FileUtils.readFileToString(errorfile, "UTF-8");
      } catch (IOException e) {
        throw new JmxException("IOException getting error file", e);
      }
    } else {
      try {
        Process tailProcess = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {
          "/usr/bin/tail", "-10", logfileName}, " "));
        tailProcess.waitFor();
        BufferedReader console = new BufferedReader(
          new InputStreamReader(tailProcess.getInputStream()));
        StringBuilder consoleBuffer = new StringBuilder();
        String line;
        while ((line = console.readLine()) != null) {
          consoleBuffer.append(line).append("\n");
        }
        console.close();
        tailProcess.destroy();
        return consoleBuffer.toString();
      } catch (IOException e) {
        e.printStackTrace();
        throw new JmxException("IOException getting log file", e);
      } catch (InterruptedException e) {
        throw new JmxException("Tail interrupted", e);
      }
    }
  }

  /**
   * Returns the difference between the PID file creation and the current
   * system time.
   */
  public long getRunningTime() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      return System.currentTimeMillis() - pidfile.lastModified();
    } else {
      return 0L;
    }
  }

  /**
   * Operation to start the script.
   */
  public void start() {
    try {
      if (! isRunning()) {
        process = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {path, args}, " "));
        // we don't wait for it to complete, just start it
      }
    } catch (IOException e) {
      throw new JmxException("IOException starting process", e);
    }
  }

  /**
   * Operation to stop the script using a kill -9 PID call, using the
   * PID from the .pid file.
   */
  public void kill() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      try {
        String pid = FileUtils.readFileToString(pidfile, "UTF-8");
        Runtime.getRuntime().exec(StringUtils.join(new String[] {
          "/usr/bin/kill", "-9", pid}, " "));
        if (process != null) {
          // remove hanging references
          process.destroy();
        }
        pidfile.delete();
      } catch (IOException e) {
        throw new JmxException("IOException killing process", e);
      }
    }
  }
}

Finally, we build an MBean server agent that will load up one or more instances of our MBean. In order to expose the HTTP interface, we also load in the HtmlAdaptorServer and set it to listen on port 8081.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
// Source: src/main/java/com/mycompany/myapp/ScriptAgent.java
package com.mycompany.myapp;

import java.util.ArrayList;
import java.util.List;

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

import com.sun.jdmk.comm.HtmlAdaptorServer;

public class ScriptAgent {

  // the port number of the HTTP Server Adapter
  private final static int DEFAULT_AGENT_PORT = 8081;
  
  private MBeanServer server;
  private List<ScriptAdapter> scriptAdapters = 
    new ArrayList<ScriptAdapter>();
  
  public ScriptAgent() {
    super();
  }
  
  public void addScriptAdapter(ScriptAdapter scriptAdapter) {
    this.scriptAdapters.add(scriptAdapter);
  }

  protected void init() throws Exception {
    server = MBeanServerFactory.createMBeanServer();
    for (ScriptAdapter scriptAdapter : scriptAdapters) {
      server.registerMBean(scriptAdapter, new ObjectName(
        "script:name=" + scriptAdapter.getAdapterName()));
    }
    HtmlAdaptorServer adaptor = new HtmlAdaptorServer();
    adaptor.setPort(DEFAULT_AGENT_PORT);
    server.registerMBean(adaptor, new ObjectName("adapter:protocol=HTTP"));
    adaptor.start();
  }
  
  public static void main(String[] args) {
    try {
      ScriptAgent agent = new ScriptAgent();
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", 
        new String[] {"Dolly"}));
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", 
        new String[] {"Polly"}));
      agent.init();
    } catch (Exception e) {
      e.printStackTrace(System.err);
    }
  }
}

Next we compile our code and start this agent up on the command line using Maven2, like so:

1
2
sujit@sirocco:~$ mvn exec:java \
  -Dexec.mainClass=com.mycompany.myapp.ScriptAgent

And point our browser to http://localhost:8081. In my little experiment, everything is on the same machine, but in a real-world setup, I would be pointing my browser to a server agent listening on a remote machine.

And now, on to the mandatory screenshots, to walk you through a typical flow using the MBean HTML adapter.

The Agent View - we have loaded two instances of the ScriptAdapter MBean, one to manage the script for cloning sheep and one for cloning parrots.
The MBean view for the sheep cloning management bean. I've rolled it down a bit so you can see both the kill and start operation buttons. Notice the status attribute "READY". Right now, logs are empty, since there are no logs in the file system.
Clicking on the [Start] button says "start was successful". Lets go back to the MBean view by clicking the Back to MBean View link.
The MBean view shows that the script is running. Notice the status attribute "RUNNING" and the non-zero value of RunningTime. If you reload, you will see the logs and the running time refreshing.

Once we have enough of sheep-cloning, we can kill the process by hitting the [KILL] button. This will produce the "Kill Successful" page.
Back at the MBean view, we notice that the status is READY (to run), and the isRunning value is false. We also notice the logs from the last run in there, just in case one needs to do some post-mortem. If you don't like this, then you can remove the .log file in the script or change the MBean code to check for isRunning() before showing the logs.

For the longest time, JMX appeared to me to be useful for tool and middleware vendors, but somewhat pointless for application developers, so I never really pursued it. The only time I have seen it being used by application developers in web application development is to build in backdoors to the application to turn on and off verbose logging for a short time, for example.

This application seems to be a good use of JMX. Obviously there is much more to JMX than what I have done so far, but I plan to experiment with some more features of JMX and describe it in future posts.

No comments:

Post a Comment

Comments are moderated to prevent spam.