A few years ago, I wrote about starting to use awk after a long time. Over the last couple of years, I've used awk on and off, although not very frequently, and never beyond the basic pattern described in that post.
Last week, I described an AspectJ aspect that wrote out timing information into the server logs. I initially thought of using OpenOffice Calc, but then stumbled upon Jadu Saikia's post on implementing group-by with awk, and I realized that awk could do the job just as easily, and would be easier to use (for multiple invocations).
Here is the script I came up with.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | #!/usr/bin/env awk -f
# Builds a report out of aggregate elapsed time data similar to the Spring
# StopWatch prettyPrint report.
# Usage:
# my_profile_report.awk [-v qt=query_term] filename
#
BEGIN {
FS="|";
}
(qt == "" || qt == $3) {
counts[$4]++;
times[$4] += $5;
if (mins[$4] + 0 != 0) {
if (mins[$4] > $5) {
mins[$4] = $5;
}
} else {
mins[$4] = $5;
}
if (maxs[$4] + 0 != 0) {
if (maxs[$4] < $5) {
maxs[$4] = $5;
}
} else {
maxs[$4] = $5;
}
totals += $5;
}
END {
totavg = 0;
for (t in times) {
totavg += times[t]/counts[t];
}
printf("---------------------------------------------------------------------\n");
printf("%-32s %8s %8s %8s %8s\n", "Controller", "Avg(ms)", "%", "Min(ms)", "Max(ms)");
printf("---------------------------------------------------------------------\n");
for (t in times) {
avg = times[t]/counts[t];
perc = avg * 100 / totavg;
printf("%-32s %8.2f %8.2f %8.2f %8.2f\n", t, avg, perc, mins[t], maxs[t]);
}
printf("---------------------------------------------------------------------\n");
printf("%-32s %8.2f\n", "Average Elapsed (ms):", totavg);
printf("---------------------------------------------------------------------\n");
}
|
As shown in my previous post, the input file looks something like this (without the header, I just grep the server log file with the aspect name).
1 2 3 4 5 6 | # aspect_name|request_uuid|query|tile_name|elapsed_time_millis
MyAspect2|ed9ce777-263e-4fe5-a8af-4ea84d78add3|some query|TileAController|132
MyAspect2|ed9ce777-263e-4fe5-a8af-4ea84d78add3|some query|TileBController|49
MyAspect2|ed9ce777-263e-4fe5-a8af-4ea84d78add3|some query|TileCController|2
MyAspect2|ed9ce777-263e-4fe5-a8af-4ea84d78add3|some query|TileDController|3
...
|
The script can be invoked with or without a qt parameter. Without parameters, the script computes the average, minimum and maximum elapsed times across all the queries that were given to the application during the profiling run. The report looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | sujit@cyclone:aspects$ ./my_profile_report.awk input_file
---------------------------------------------------------------------
Controller Avg(ms) % Min(ms) Max(ms)
---------------------------------------------------------------------
TileFController 1.42 0.02 0.00 42.00
TileAController 187.67 2.83 27.00 685.00
TileJController 169.97 2.56 7.00 3140.00
TileEController 9.14 0.14 2.00 44.00
TileNController 45.91 0.69 4.00 234.00
TileIController 444.91 6.71 0.00 3427.00
TileDController 1506.30 22.72 792.00 12140.00
TileMController 184.88 2.79 0.00 3078.00
TileHController 13.67 0.21 7.00 50.00
TileCController 34.06 0.51 14.00 108.00
TileLController 759.73 11.46 3.00 9921.00
TileGController 2473.24 37.31 579.00 10119.00
TileBController 24.48 0.37 7.00 132.00
TileKController 773.97 11.67 0.00 8554.00
---------------------------------------------------------------------
Average Elapsed (ms): 6629.35
---------------------------------------------------------------------
|
If we want only the times for a specific query term, then we can specify it on the command line as shown below. If the query has been invoked multiple times, then the report will show the average of the calls for the query. In this case, there is only a single invocation for the query, so the minimum and maximum times are redundant.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | sujit@cyclone:aspects$ ./my_profile_report.awk -v qt=some_query input_file
---------------------------------------------------------------------
Controller Avg(ms) % Min(ms) Max(ms)
---------------------------------------------------------------------
TileFController 0.00 0.00 0.00 0.00
TileAController 106.00 4.57 106.00 106.00
TileJController 7.00 0.30 7.00 7.00
TileEController 10.00 0.43 10.00 10.00
TileNController 8.00 0.34 8.00 8.00
TileIController 0.00 0.00 0.00 0.00
TileDController 1309.00 56.42 1309.00 1309.00
TileMController 0.00 0.00 0.00 0.00
TileHController 12.00 0.52 12.00 12.00
TileCController 25.00 1.08 25.00 25.00
TileLController 64.00 2.76 64.00 64.00
TileGController 767.00 33.06 767.00 767.00
TileBController 12.00 0.52 12.00 12.00
TileKController 0.00 0.00 0.00 0.00
---------------------------------------------------------------------
Average Elapsed (ms): 2320.00
---------------------------------------------------------------------
|
The script uses awk's associative arrays and the post processing block. Before this script, I had read about awk's BEGIN/END blocks but didn't know why I would want to use them, and I didn't know about awk's associative arrays at all. Thanks to Daniel Robbin's three part series - Common threads: Awk by example, Part 1, 2 and 3, I now have a greater understanding and appreciation of awk.
2 comments (moderated to prevent spam):
You can also use an in-mem database like H2 to import the CSV directly into a table and run all the aggregates and sorts you need - http://www.h2database.com/html/functions.html?highlight=CSVREAD&search=csv#csvread
Thanks Ashwin, did not know about H2, this is definitely a much nicer approach (more flexible) than the one I described. I read up a bit about H2, it looks very promising, thanks for the pointer.
Post a Comment