I recently attended the MySQL Users Conference in Santa Clara, CA. One of the tutorials I signed up for was Mark Matthews's talk on J2EE Performance Tuning. Of course, this was a MySQL conference, and the speaker happened to be the author of the original JDBC driver for MySQL, so understandably there was a lot of emphasis on the new performance monitoring features of MySQL Connector/J version 5, the upcoming JDBC driver for MySQL 5.x databases.
But the nice part of the talk was that it got me thinking again about how to monitor and address performance-related issues in a holistic manner. In previous lives I have been a developer and part-time system administrator for console-based Unix systems, and more recently an Informix DBA. In both roles I had to address performance issues, and I did so in a non-holistic (for want of a better word) manner. For example, my first reaction to a performance issue as an Informix DBA would have been to check the database read-write statistics, look for hot spots, and try to address problems by splitting reads and writes across different disks. The next place I would look was the cardinality of data in the tables, checking for missing indexes. Since Informix, to the best of my knowledge, did not log slow queries, analyzing the queries meant scanning the entire codebase for them, so that was something I would do only after the other approaches failed to deliver the required improvement.
As a developer in a J2EE environment, I still have to address performance issues, but the focus is developer-oriented. Typically, I measure wall-clock times of various methods, find the methods that take the longest, and see if there is SQL or code that can be optimized. I measure front-end response with the Apache Bench tool, which lets you set the number of concurrent clients and the number of requests each client will make, and returns (among other things) the requests per second the page could serve, and the average, minimum and maximum processing times. Although MySQL logs slow queries in the slow query log, it gets used only when I am reacting to a performance problem, not when I am being proactive about ensuring my code is performant, because getting at the slow query log needs DBA involvement (on MySQL version 4.1). The important thing to note in all these cases is that the performance measurements are developer-centric.
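The wall-clock measurement I describe is nothing fancy; a minimal sketch in Java might look like the following (the class and method names are mine, not from any framework):

```java
// Minimal wall-clock timing of a method call. Names here are
// illustrative; the technique is just two timestamps around the call.
public class StopwatchDemo {

    // Runs the task and returns the elapsed wall-clock milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // stand-in for the method under measurement
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) sb.append(i);
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

For the front-end side, a typical Apache Bench invocation looks something like `ab -c 10 -n 1000 http://host/page/`, where `-c` is the number of concurrent clients and `-n` the total number of requests.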
Occasionally, when reacting to performance problems, I would also look at application server (Resin) thread dumps, and try to find and fix code bottlenecks by tracing the dump back to the offending code. Although we don't run Resin with the stock JVM settings (these settings are determined by another group, based on the machine capacity on which Resin will be running), the only JVM settings I have ever actually changed myself are the minimum and maximum heap sizes.
What the talk did for me was to highlight that a J2EE application is really a layered cake of potentially non-performant hardware and software. At the very bottom there is the CPU and the RAM, followed by the operating system, followed by the database, followed by the application server, followed by the application code. The operating system, the database and the application server could potentially be non-performant because they have been improperly tuned for the application.
Fortunately, however, one does not have to start from scratch when trying to optimize for performance. It really boils down to choosing right-sized components as a starting point. Based on the projected demands on your application, you can usually choose the appropriate number and type of CPUs, the amount of memory on your system, and the size of your swap space (among other things) on a Linux (Unix) based operating system. Databases generally offer configuration profiles (for example, the small, medium, large and huge memory models that ship with MySQL) to suit the particular application and hardware. Java-based application servers allow you to tune the starting and maximum heap sizes, the type of garbage collector you want to use, and the generation sizes within your heap to optimize garbage collection. So really, the starting point of delivering optimal performance is to set up the optimal capacity.
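The heap and collector settings I mention are passed to the JVM on startup. A typical (purely illustrative) set of HotSpot flags for a server-side JVM might look like this; the actual values have to come from your own capacity planning:

```
java -Xms512m -Xmx512m \
     -XX:NewSize=128m -XX:MaxNewSize=128m \
     -XX:+UseParallelGC \
     ...
```

Here the heap is pinned to a fixed size (start equals max, so the heap never resizes under load), the young generation is sized explicitly, and the throughput collector is selected.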
Still, a person who needs to diagnose and fix a problem with a J2EE application will need to be familiar with, and be able to tweak, all these subsystems. Because performance metrics are heavily application dependent, this person will also need to be familiar with the application itself. Finding such a person in an organization of even moderate size is next to impossible. Bringing together a group of people to do a performance audit or to diagnose and fix performance issues is a possibility, but since fixing a performance problem involves an iterative cycle of observation, tweaking and more observation, it is a time-consuming operation, and often not acceptable to a business that is losing money every minute the application underperforms.
There was a time when I would sneer at the practice of "throwing more hardware" at a problem to fix performance issues, but the more I think about the expense of continuing to operate a non-performant application, and the logistics of trying to fix it in the time provided, the more I lean towards this option as the simplest and most cost-effective way to deliver performance. By that, I do not mean that sloppy and non-performant code is OK. If there are indexes that need to be applied to the database, or SQL that needs to be rewritten, or components that need to be appropriately sized, then these should be done first. However, if your application serves 1000 pages per second and starts keeling over when required to serve 2000, it is possible that a few days of performance tuning will let you scale to the new level, perhaps beyond. But if it does, then it is more than likely that your application was not performant to begin with, and that should have been addressed before the application was deployed.
What I mean by "throwing more hardware" is the ability to scale out using clustering technologies. Putting the application behind a webserver and setting up reverse proxies to multiple underlying application servers, all running the same application, is one way. While each application server will contain the exact same copy of the application, they will be serving different slices of the application. The slicing will be set up in the reverse proxy configuration of the webserver. Alternatively, each application server serves the full application, but through multiple webservers behind a hardware load balancer. A hybrid of these two approaches is also a possibility.
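The "slicing" I describe for the reverse-proxy approach is configured in the web server. As a sketch, with Apache httpd and mod_proxy it could look like this (host names, ports and paths are made up for illustration):

```
# httpd.conf sketch: slice the URL space across two app servers,
# each running the same application but serving different paths.
ProxyPass /store/   http://app1.internal:8080/store/
ProxyPass /search/  http://app2.internal:8080/search/
```

In the alternative setup, every application server would receive the full URL space and a hardware load balancer in front of the web servers would spread the traffic.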
On the database side, scale-out can be achieved by clustering multiple database masters (the read-any, write-all approach) or by replication (one master, multiple slaves). I personally prefer the multiple-master clustering scenario, since the application does not need to be changed at all to accommodate it: the application thinks it is talking to a single database. In a replicated setup, on the other hand, you will have to have separate configurations to read from the slaves and to read from and write to the master. There is also a replication latency you will have to account for if your application has a scenario where it writes and reads back within a very short time. Most databases (including MySQL) offer the ability to do replication. C-JDBC is an open source initiative to achieve multiple-master clustering, but its performance leaves a lot to be desired. I am told that m/cluster from Continuent, a commercial offering based on C-JDBC, is much better in that regard.
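The "separate configurations" needed in a replicated setup often end up as a small routing layer in the application. A minimal sketch of the idea (the class name and JDBC URLs are mine, purely for illustration):

```java
// Sketch of read/write splitting for a one-master, multiple-slave
// replication setup. Writes always go to the master; plain reads
// are spread across the slaves round-robin.
public class ConnectionRouter {
    private final String masterUrl;
    private final String[] slaveUrls;
    private int next = 0; // round-robin cursor over the slaves

    public ConnectionRouter(String masterUrl, String... slaveUrls) {
        this.masterUrl = masterUrl;
        this.slaveUrls = slaveUrls;
    }

    // Writes (and read-after-write reads, because of replication
    // latency) must be sent to the master.
    public String urlForWrite() {
        return masterUrl;
    }

    // Plain reads can tolerate slight staleness, so rotate through
    // the slaves to spread the load.
    public synchronized String urlForRead() {
        String url = slaveUrls[next];
        next = (next + 1) % slaveUrls.length;
        return url;
    }
}
```

Note the replication-latency caveat baked into the comments: a read that must see its own immediately preceding write has to be routed to the master as well.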
Another important component is the application cache. Most J2EE applications that serve dynamic content (i.e., generated from a database) and have significant traffic tend to use caching of some kind. When we attempt to serve the traffic with multiple machines, the cache has to be shared between the application server JVMs. There are a variety of distributed caches available in the market. The best known of these is Coherence from Tangosol, but there are others, such as JBoss Cache and SwarmCache. These caches either replicate or distribute: replicated caches copy the cached contents to all the caches in the cluster, while distributed caches store each entry in only one place, but are able to pull it from the right place when requested.
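The "right place" trick in a distributed cache comes down to every node computing the same owner for a given key. A toy illustration of the idea (real products use smarter schemes such as consistent hashing, and the class name here is mine, not from any product):

```java
// Toy illustration of how a distributed (partitioned) cache decides
// which node owns a key: hash the key onto the node list. Because
// every member computes the same mapping, a get() can be routed to
// the owning node without any central lookup.
public class KeyPartitioner {
    private final String[] nodes;

    public KeyPartitioner(String... nodes) {
        this.nodes = nodes;
    }

    public String ownerOf(String key) {
        // floorMod keeps the index non-negative for any hashCode
        int idx = Math.floorMod(key.hashCode(), nodes.length);
        return nodes[idx];
    }
}
```

A replicated cache, by contrast, skips the routing question entirely by putting a copy of every entry on every node, trading memory and write traffic for local reads.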
I still think that performance reviews of application code and load testing on hardware comparable to actual production hardware have lots of value, but neither of these approaches is simple to set up. The new MySQL JDBC driver provides a lot more performance metrics (including a "local" list of slow queries), so a disciplined J2EE developer using MySQL will find this very helpful in finding and fixing code and SQL performance bottlenecks before pushing the code out of development.
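As I recall from the tutorial, the driver-side slow query tracking is switched on through connection properties, along these lines (property names as I understood them from the Connector/J 5 material; check the driver documentation for your version before relying on them):

```
jdbc:mysql://localhost:3306/test?logSlowQueries=true&slowQueryThresholdMillis=2000
```

The appeal is that this works in a plain development environment, with no DBA involvement, unlike the server-side slow query log I mentioned earlier.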
So I think my personal performance tuning mantra boils down to these three simple commandments:
- Find and fix bottlenecks in SQL and code in development.
- Simulate load using the Apache Bench tool.
- Design for clustering.