Sunday, January 25, 2009

Scala Mock Interview: OO Design

In my last post, I described some of my Scala based solutions to Steve Yegge's phone screen questions. This week, to address the OO Design section, I attempt an Object-Oriented implemention of the popular board game Monopoly. This was an actual question asked of me at my Amazon interview, one that I bungled hopelessly because I realized about 30 minutes into a 45 minute interview that I had forgotten the rules of the game.

Another reason is that my boys got the game last Christmas, so I figured it would be a good chance to re-learn it. About 6.5 hours into the game, we had to terminate it because the kids had homework to do. Made me wonder how long the game would last if we had kept going. Also, thinking about it some more, it seemed to be an interesting OO model, and implementing it in Scala would be a good way of learning the language a bit more.

I ended up with an object model, the UML diagram for which is shown below. To be honest, this is the final product after about a week (approximately 2 hours per day) of iterative design and coding, although the object model I started with contained about 80% of the classes you see in here.

The central class in the system is the Monopoly singleton object. It contains 40 slots (represented by objects of trait Slot), of which 28 are properties, and the rest represent some sort of action that the player landing on the spot has to do, represented by the Asset trait and the ActionSlot class respectively. Among the 28 properties, there are 4 railroads and 2 utilities, and the rest represent locations. The difference between locations and the other properties is that locations can be built on (improved). So Assets are subclassed into PropertySlot, RailroadSlot and UtilitySlot. The reason RailroadSlot and UtilitySlot are broken out is that the method to calculate rent is different for them.

In addition, there is a stack of Chance and Community Chest cards. On reaching certain slots, a player picks the top card and executes the action described in the card, so both Chance and Community Chests can be represented by a List of ActionSlots.

Moving on to more (sort of) animate entities, the players interact with the Monopoly object by rolling dice, moving through the slots, etc. They also interact with each other and the Bank by buying and selling property. The Bank is a participant in transactions, but does not actively buy and sell property. So our solution is to create a Participant trait which participates in transactions, which both the Player class (with methods to roll dice, move, buy, sell, liquidate, etc) and the Bank object (since there is only a single instance of this in our system) extend.

The Scala code for this system is shown below. The code is quite self-explanatory, and I have comments where I felt that things needed explaining. If you don't understand something, I suggest checking out the Programming Scala book or looking the thing up on the Internet. I've been doing a lot of both this last week, and I can tell you that its been quite an education. The code is broken up into three files, representing the three layers in the UML diagram above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
// Source: src/main/scala/com/mycompany/smi/ood/Monopoly.scala
package com.mycompany.smi.ood

import scala.io.Source
import scala.collection.mutable.Map

object Monopoly {

  // initialize data structures from the config file
  val lines = 
    Source.fromFile("src/main/resources/monopoly.cfg").getLines.toList
  val slots = lines.filter(line => line.startsWith("slot")).
    map(line => {
      val cols = line.split(":")
      cols(2) match {
        case "Property" => new PropertySlot(cols(3), cols(4),
                                            cols(5).toInt, cols(6).toInt,
                                            cols(7).toInt);
        case "Railroad" => new RailroadSlot(cols(3), cols(4).toInt)
        case "Utility" => new UtilitySlot(cols(3), cols(4).toInt)
        case "Action" => new ActionSlot(cols(3))
      }
    }).toArray
  var chanceCards = shuffle(
    lines.filter(line => line.startsWith("chance")).
    map(line => new ActionSlot(line.split(":")(2))))
  var communityCards = shuffle(
    lines.filter(line => line.startsWith("community")).
    map(line => new ActionSlot(line.split(":")(2))))
  var players = List[Player]()

  // client code will add as many players that will play
  def addPlayer(player: Player): Player = {
    players ::= player
    player
  }

  // All players throw the dice, the player with the highest throw
  // starts first. We simulate this here with a shuffle.
  def startPlay(): Unit = {
    val bank = Bank
    players = shuffle(players)
  }

  // convenience method for client code to get a list of slots of a
  // particular type
  def slotsOfType[T](clazz: Class[_ <: T]): List[T] = {
    slots.filter(slot => clazz.isAssignableFrom(slot.getClass)).
      map(slot => slot.asInstanceOf[T]).toList
  }

  // Fisher-Yates shuffle adapted from:
  // http://jdleesmiller.blogspot.com/2008/12/shuffles-surprises-and-scala.html
  def shuffle[T](list: List[T]): List[T] = {
    val randomizer = new Random()
    var arr = list.toArray
    for (i <- arr.indices.reverse) {
      val temp = arr(i)
      val j = randomizer.nextInt(i + 1)
      arr(i) = arr(j)
      arr(j) = temp
    }
    arr.toList
  }
}

The Monopoly object is configured with data from a configuration file, which is just a colon delimited file as shown below. The comment lines at the top of each block describe the contents.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# Source: src/main/resources/monopoly.cfg
# slot:${id}:Action:${name}:
# slot:${id}:Property:${name}:${group}:${sellingPrice}:${rent}:${housePrice}:
# slot:${id}:Railroad:${name}:${sellingPrice}:
# slot:${id}:Utility:${name}:${sellingPrice}:
slot:1:Action:AdvanceToGo:
slot:2:Property:Mediterenean Avenue:Brown:60:2:50:
slot:3:Action:PickCommunityCard:
slot:4:Property:Baltic Avenue:Brown:60:4:50:
slot:5:Action:IncomeTax:
slot:6:Railroad:Reading Railroad:200:
slot:7:Property:Oriental Avenue:LtBlue:100:6:50:
slot:8:Action:PickChanceCard:
slot:9:Property:Vermont Avenue:LtBlue:100:6:50:
slot:10:Property:Connecticut Avenue:LtBlue:120:8:50:
slot:11:Action:GoToJail:
slot:12:Property:St Charles Place:Violet:140:10:100:
slot:13:Utility:Electric Company:150:
slot:14:Property:States Avenue:Violet:140:10:100:
slot:15:Property:Virginia Avenue:Violet:160:12:100:
slot:16:Railroad:Pennsylvania Railroad:200:
slot:17:Property:St James Place:Orange:180:14:100:
slot:18:Action:PickCommunityCard:
slot:19:Property:Tennessee Avenue:Orange:180:14:100:
slot:20:Property:New York Avenue:Orange:200:16:100:
slot:21:Action:FreeParking:
slot:22:Property:Kentucky Avenue:Red:220:18:150:
slot:23:Action:PickChanceCard:
slot:24:Property:Indiana Avenue:Red:220:18:150:
slot:25:Property:Illinois Avenue:Red:240:18:150:
slot:26:Railroad:B&O Railroad:200:
slot:27:Property:Atlantic Avenue:Yellow:260:22:150:
slot:28:Property:Ventnor Avenue:Yellow:260:22:150:
slot:29:Utility:Water Works:150:
slot:30:Property:Marvin Gardens:Yellow:280:22:150:
slot:31:Action:GoToJail:
slot:32:Property:Pacific Avenue:Green:300:26:200:
slot:33:Property:North Carolina Avenue:Green:300:26:200:
slot:34:Action:PickCommunityCard:
slot:35:Property:Pennsylvania Avenue:Green:300:28:200:
slot:36:Railroad:Short Line Railroad:200:
slot:37:Action:PickChanceCard:
slot:38:Property:Park Place:DkBlue:350:35:200:
slot:39:Action:LuxuryTax:100:
slot:40:Property:Boardwalk:DkBlue:400:50:200:

# chance:${id}:${actionName}:
chance:1:AdvanceToGo:
chance:2:AdvanceToIllinoisAve:
chance:3:AdvanceToNearestUtility:
chance:4:AdvanceToNearestRailroad:
chance:5:AdvanceToStCharlesPlace:
chance:6:BankDividend:
chance:7:GetOutOfJailFree:
chance:8:Back3Spaces:
chance:9:GoToJail:
chance:10:OperaTickets:
chance:11:SpeedingFine:
chance:12:AdvanceToReadingRailroad:
chance:13:AdvanceToBoardwalk:
chance:14:LoanMatures:
chance:15:WonCrosswordComp:
chance:16:DrunkFine:

# community:${id}:${actionName}:
community:1:AdvanceToGo:
community:2:BankError:
community:3:DoctorsFee:
community:4:GetOutOfJailFree:
community:5:GoToJail:
community:6:Birthday:
community:7:IncomeTaxRefund:
community:8:HospitalFees:
community:9:ConsultingFees:
community:10:StreetRepairs:
community:11:BeautyContest:
community:12:Inheritance:
community:13:StockSale:
community:14:HolidayFund:
community:15:PayFine:
community:16:InsurancePremium:

The classes and traits for the Slot hierarchy is detailed in the Slot.scala file below:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
// Source: src/main/scala/com/mycompany/smi/ood/Slot.scala
package com.mycompany.smi.ood

trait Slot {
  val name: String

  override def hashCode(): Int = {
    name.hashCode
  }

  override def equals(that: Any): Boolean = {
    that.isInstanceOf[Slot] && this.name == that.asInstanceOf[Slot].name
  }

  override def toString(): String = {
    name
  }
}

trait Asset extends Slot {
  val sellingPrice: Int
  val mortgagePrice: Int = sellingPrice / 2

  var ownedBy: Participant = Bank
  var mortgaged: Boolean = false

  def rent(): Int
}

class PropertySlot(propertyName: String, propertyGroup: String,
                   value: Int, rentValue: Int, houseValue: Int) extends Asset {
  val name = propertyName
  val group = propertyGroup
  val sellingPrice = value
  val housePrice = houseValue
  val hotelPrice = houseValue * 4
  var houses = 0
  var hotels = 0

  def rent() = (value / 10)

  override def toString(): String = {
    name + "/" + group + "(" + houses + "," + hotels + ")"
  }
}

class RailroadSlot(railroadName: String, value: Int) extends Asset {
  val name = railroadName
  val sellingPrice = value

  // if owner owns (1,2,3,4) railroads, then charge (25,50,100,200)
  def rent: Int = {
    Monopoly.slotsOfType[RailroadSlot](classOf[RailroadSlot]).
      filter(slot => slot.ownedBy == this.ownedBy).size * 25
  }
}

class UtilitySlot(utilityName: String, value: Int) extends Asset {
  val name = utilityName
  val sellingPrice = value

  // we've modified this a bit. The rule says that the incoming player must
  // roll the dice and pay 4xdice if 1 utility owned, else 10x dice if both
  // owned. Since the dice throw is really the sum of two random numbers
  // generated between 1-6, it does not matter who throws it, and it is more
  // convenient for us to get a reference to the railroad owning player, so
  // we make the owner throw the dice to compute the rent.
  def rent: Int = {
    val dicethrow = ownedBy.asInstanceOf[Player].rollDice()
    val utilsOwned = Monopoly.slotsOfType[UtilitySlot](classOf[UtilitySlot]).
      filter(slot => slot.ownedBy == this.ownedBy).size
    if (utilsOwned == 1) (4 * dicethrow) else (10 * dicethrow)
  }
}

trait Action {
  def execute(player: Player)
}

class ActionSlot(actionName: String) extends Slot with Action {
  val name = actionName
  override def execute(player: Player) {
    println("...Executing: " + name)
    name match {
      case "AdvanceToBoardwalk" => player.advanceTo(39)
      case "AdvanceToGo" => player.advanceTo(0)
      case "AdvanceToIllinoisAve" => player.advanceTo(25)
      case "AdvanceToNearestRailroad" => 
        player.advanceToNearestOf(List(5, 15, 25, 35))
      case "AdvanceToNearestUtility" => 
        player.advanceToNearestOf(List(12, 28))
      case "AdvanceToReadingRailroad" => player.advanceTo(5)
      case "AdvanceToStCharlesPlace" => player.advanceTo(11)
      case "Back3Spaces" => player.advanceBy(-3)
      case "BankDividend" => Bank.pay(player, 50)
      case "BankError" => Bank.pay(player, 200)
      case "BeautyContest" => Bank.pay(player, 10)
      case "Birthday" => Bank.payOtherPlayers(player, 50)
      case "ConsultingFees" => Bank.pay(player, 25)
      case "DoctorsFee" => player.pay(Bank, 50)
      case "DrunkFine" => player.pay(Bank, 20)
      case "GetOutOfJailFree" => player.hasGetOutOfJailFreeCard = true
      case "GoToJail" => player.advanceToNearestOf(List(10, 30))
      case "HolidayFund" => Bank.pay(player, 100)
      case "HospitalFees" => player.pay(Bank, 100)
      case "IncomeTax" => player.pay(Bank, 200)
      case "IncomeTaxRefund" => Bank.pay(player, 20)
      case "Inheritance" => Bank.pay(player, 100)
      case "InsurancePremium" => player.pay(Bank, 50)
      case "LoanMatures" => Bank.pay(player, 150)
      case "LuxuryTax" => player.pay(Bank, 100)
      case "OperaTickets" => Bank.otherPlayersPay(player, 50)
      case "PayFine" => player.pay(Bank, 10)
      case "PickChanceCard" => player.pickCard(Monopoly.chanceCards)
      case "PickCommunityCard" => player.pickCard(Monopoly.communityCards)
      case "SpeedingFine" => player.pay(Bank, 15)
      case "StreetRepairs" => player.pay(Bank, player.computeRepairCharge)
      case "StockSale" => Bank.pay(player, 45)
      case "WonCrosswordComp" => Bank.pay(player, 100)
      case _ =>
    }
  }
}

Similarly, the Player.scala file contains traits and classes for the Participant hierarchy.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
// Source: src/main/scala/com/mycompany/smi/ood/Player.scala
package com.mycompany.smi.ood

import scala.util.Random
import scala.collection.mutable.Map

trait Participant {
  var cash: Int

  def pay(target: Participant, amount: Int): Int = {
    if ((cash - amount) > 0) {
      target.receive(amount)
    }
    cash = cash - amount
    cash
  }

  private def receive(amount: Int): Int = {
    cash = cash + amount
    cash
  }
}

class Player(playerName: String) extends Participant {

  var name = playerName
  var currentPosition = 0       // start position 0-based
  var cash = 1500               // starting cash balance
  var hasGetOutOfJailFreeCard = false
  var bankrupt = false

  def rollDice(): Int = {
    val random = new Random()
    val d1 = random.nextInt(6) + 1
    val d2 = random.nextInt(6) + 1
    println("...Roll: (" + d1 + "+" + d2 + ")=" + (d1+d2))
    d1 + d2
  }

  // this is used to respond to rules such as advance to the "nearest"
  // railroad or utility. It checks which of these is "closer" to the
  // player's current position and gets there. In case it is "ahead" of 
  // both slots, then this means that it passes Go so we should collect 
  // the salary.
  def advanceToNearestOf(slots: List[Int]): Int = {
    val slotsAhead = slots.filter(slot => (slot - currentPosition) > 0)
    val nearestSlot = if (slotsAhead.isEmpty) {
      // this will involve passing go, so collect $200
      Bank.pay(this, 200)
      slots(0)
    } else slotsAhead(0)
    advanceTo(nearestSlot)
  }

  // advance by the score specified. As before, we check to see if we pass Go
  // so we know to collect the $200 salary if we do.
  def advanceBy(numSlots: Int): Int = {
    val newPosition = numSlots + currentPosition
    currentPosition = if (newPosition >= 40) {
      // this passed go, so collect $200
      Bank.pay(this, 200)
      newPosition - 40
    } else if (newPosition < 0) {
      newPosition + 40
    } else {
      newPosition
    }
    advanceTo(currentPosition)
  }

  // advance to a specific slot position. This is used to handle commands
  // such as GoToIllinoisAvenue, etc. Although this method is delegated to by
  // both the other advanceTo* methods, we repeat the check for Passing Go
  // because each method can be called independently and the check must be
  // present in all these methods. The checks do not interfere with each
  // other, even though they appear to. For example, the Bank.pay call in
  // this method will never be fired if a sanitized slot id is passed in,
  // such as those from the calls from the other advanceTo* methods.
  def advanceTo(slot: Int): Int = {
    Bank.pay(this, 
      if (currentPosition < 0 || currentPosition > slot) 200 else 0)
    currentPosition = slot
    currentPosition
  }

  // pick a card from the named stack, execute the action specified in
  // the card, then put the card back at the bottom of the stack. In a real
  // game, a "get out of jail free" card is held until it is used, but since
  // it can come from either a chance or a community chest stack, it is hard
  // to figure out where it should be returned, so we return it immediately
  // here, but update a flag in the player so he can use it when required.
  def pickCard(cards: List[ActionSlot]): List[ActionSlot] = {
    val action = cards.head
    action.execute(this)
    (action :: cards.tail.reverse).reverse
  }

  // buy an asset from the bank. Return true if action succeeded else false
  def buy(asset: Asset): Boolean = {
    if (asset.ownedBy == Bank) {
      println("...Buying: " + asset.name)
      pay(Bank, asset.sellingPrice)
      asset.ownedBy = this
      true
    } else false
  }

  // pay the rent on the property if it is owned by another player. Return
  // true if action succeeded else false
  def rent(asset: Asset): Boolean = {
    if (asset.ownedBy != Bank && asset.ownedBy != this) {
      println("...Renting: " + asset.name)
      pay(asset.ownedBy, asset.rent)
      true
    } else false
  }

  // improve a property by building on it. If you own all the properties
  // in a given color group, then you can build houses and then a hotel on
  // it. Calling improve repeatedly will fire the correct transaction (or
  // if no transactions can be fired, a false is returned. Returns true or
  // false depending on whether the improve action succeeded or failed.
  def improve(buildable: PropertySlot): Boolean = {
    if (ownsColorGroup(buildable.group)) {
      if (buildable.hotels == 0) {
        if (buildable.houses < 3) {
          // build a house
          println("...Building house on: " + buildable.name)
          pay(Bank, buildable.housePrice)
          buildable.houses += 1
          true
        } else {
          // return houses, build a hotel
          println("...Building hotel on: " + buildable.name)
          Bank.pay(this, buildable.houses * buildable.housePrice)
          buildable.houses = 0
          pay(Bank, buildable.hotelPrice)
          buildable.hotels = 1
          true
        }
      } else false
    } else false
  }

  // return all properties and houses to bank at half purchase price in
  // return for cash. Return player's cash balance.'
  def liquidate(): Int = {
    println("...Liquidating assets")
    assets(asset => {true}).foreach(asset => asset match {
      case asset: PropertySlot => {
        // if there are houses/hotels, first sell that off
        Bank.pay(this, (0.5 * ((asset.houses * asset.housePrice) +
                               (asset.hotels * asset.hotelPrice))).toInt)
        asset.houses = 0
        asset.hotels = 0
        // ...followed by the property itself
        Bank.pay(this, (asset.sellingPrice * 0.5).toInt)
        asset.ownedBy = Bank
      }
      case _ => {
        Bank.pay(this, (asset.sellingPrice * 0.5).toInt)
        asset.ownedBy = Bank
      }
    })
    cash
  }

  // this method is called during an auction. Each player bids for a given
  // asset. A bid amount higher than the asset price and the player's cash
  // balance is returned. In case of a buildable asset, we attempt to model
  // player behavior by taking the maximum random bid amount over the number
  // of properties in the same color group owned by the player.
  def bid(asset: Asset): Int = {
    val randomizer = new Random()
    asset match {
      case asset: PropertySlot => {
        // check to see if player owns other buildable properties in color
        // group. Generate a random number that many times (to model player's
        // incentive to own the group) and return the max amount as the bid
        val slotsInGroup =
          buildables(buildable => {buildable.group == asset.group}).size
        val max = new Range(0, (slotsInGroup + 1), 1).
          map(i => randomizer.nextInt(cash)).
          foldLeft(0) ((max, elem) => {
            if (elem > max) elem else max
          }
        )
        asset.sellingPrice + max
      }
      case asset: Asset => {
        asset.sellingPrice + randomizer.nextInt(cash)
      }
    }
  }

  def assets(predicate: Asset => Boolean): List[Asset] = {
    Monopoly.slotsOfType[Asset](classOf[Asset]).
      filter(asset => asset.ownedBy == this).
      filter(predicate)
  }

  def buildables(predicate: PropertySlot => Boolean): 
      List[PropertySlot] = {
    Monopoly.slotsOfType[PropertySlot](classOf[PropertySlot]).
      filter(buildable => buildable.ownedBy == this).
      filter(predicate)
  }

  // method to check if the player owns the specified color group
  def ownsColorGroup(color: String): Boolean = {
    if (color == "Brown" || color == "DkBlue") {
      (buildables(buildable => { buildable.group == color }).size == 2)
    } else {
      (buildables(buildable => { buildable.group == color }).size == 3)
    }
  }

  // convenience method to check if player is in Jail (if so, he needs to
  // get out by either posting bail or using a get of out jail free card
  // if he has one.
  def inJail(): Boolean = {
    currentPosition == 10 || currentPosition == 30
  }

  // compute the repair charges due. Sum of $25/house and $100/hotel
  def computeRepairCharge(): Int = {
    val repairCharge = buildables(buildable => {true}).
      foldLeft(0) ((sum, elem) => sum + (25 * elem.houses) + 
        (100 * elem.hotels))
    repairCharge
  }

  // defines the sequence of steps that a player makes once he reaches
  // a new position. This uses certain heuristics so the Player is able
  // to play a simulated game. If the game was interactive, then the
  // player would use his judgement to call methods to control the play.
  def autoPlay(): Int = {
    if (inJail) {
      if (hasGetOutOfJailFreeCard) {
        hasGetOutOfJailFreeCard = false
      } else {
        pay(Bank, 50)
      }
    }
    val slot = Monopoly.slots(currentPosition)
    slot match {
      case slot: PropertySlot => buy(slot) || improve(slot) || rent(slot)
      case slot: RailroadSlot => buy(slot) || rent(slot)
      case slot: UtilitySlot => buy(slot) || rent(slot)
      case slot: ActionSlot => slot.execute(this)
    }
    if (cash <= 0) liquidate() else cash
  }

  // need to override hashCode and equals because we are using Player in
  // a filter equality clause
  override def hashCode(): Int = {
    name.hashCode
  }

  override def equals(that: Any): Boolean = {
    that.isInstanceOf[Player] && name == that.asInstanceOf[Player].name
  }

  override def toString(): String = {
    name + ": (pos:" + currentPosition + ",cash:" + cash + ")"
  }
}

// Models the Bank. Like a Player, the Bank can participate in financial
// transactions (pay, etc). However, the Bank is not a true player in the
// sense that it does not move across the board and own slots (except in
// the initial case where it owns everything).
object Bank extends Participant {
  var cash = 100000000

  // this method should be called if a player needs to pay all
  // the other players.
  def payOtherPlayers(source: Player, amount: Int): Unit = {
    otherPlayers(source).foreach(otherPlayer => {
      source.pay(Bank, amount)
      Bank.pay(otherPlayer, amount)
    })
  }

  // this method should be called if all the other Players should
  // pay the specified Player.
  def otherPlayersPay(target: Player, amount: Int): Unit = {
    otherPlayers(target).foreach(otherPlayer => {
      otherPlayer.pay(Bank, amount)
      Bank.pay(target, amount)
    })
  }

  // returns the player who places the maximum bid on the asset
  def auction(slot: Asset): Player = {
    println("......Holding action")
    Monopoly.players.sort((p1, p2) => p1.bid(slot) > p2.bid(slot))(0)
  }

  private def otherPlayers(thisPlayer: Player): List[Player] = {
    Monopoly.players.filter(player => player.name != thisPlayer.name)
  }
}

Here is my test. I guess I could have put this in a main() method in the Monopoly.scala file, but I like Maven's support for running JUnit tests. I wote quite a few other unit tests that I wrote as I built up the code, but I don't show them in the interests of keeping this post short.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Source: src/main/scala/com/mycompany/smi/ood/MonopolyTest.scala
package com.mycompany.smi.ood

import org.junit._
import org.junit.Assert._

class MonopolyTest {

  @Test
  def testRunSimulation(): Unit = {
    for (i <- 1 to 4) {
      Monopoly.addPlayer(new Player("Player " + i))
    }
    Monopoly.startPlay
    var turns = 1
    while (Monopoly.players.size > 1) {
      println("Turn:" + turns + ", #-players: " + Monopoly.players.size)
      for (player <- Monopoly.players) {
        println("..Player: " + player)
        player.advanceBy(player.rollDice)
        player.bankrupt = (player.autoPlay <= 0)
      }
      Monopoly.players = Monopoly.players.filter(
        player => (! player.bankrupt))
      turns = turns + 1
    }
    println("Winner: " + Monopoly.players(0))
  }
}

And here is some output from that test. As you can see, the players just get richer and richer and the simulation shows no sign of terminating, lending credence (I guess) to the truism about disciplined rule based investing strategies.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Turn:1, #-players: 4
..Player: Player 1: (pos:0,cash:1500)
...Roll: (3+2)=5
...Buying: Reading Railroad
..Player: Player 4: (pos:0,cash:1500)
...Roll: (1+6)=7
...Executing: PickChanceCard
...Executing: AdvanceToStCharlesPlace
..Player: Player 2: (pos:0,cash:1500)
...Roll: (4+6)=10
...Executing: GoToJail
..Player: Player 3: (pos:0,cash:1500)
...Roll: (4+4)=8
...Buying: Vermont Avenue
...
Turn:100, #-players: 4
..Player: Player 1: (pos:32,cash:3502)
...Roll: (6+1)=7
...Renting: Boardwalk
..Player: Player 4: (pos:28,cash:2528)
...Roll: (6+1)=7
...Renting: Short Line Railroad
..Player: Player 2: (pos:31,cash:3604)
...Roll: (1+3)=4
...Renting: Short Line Railroad
..Player: Player 3: (pos:19,cash:3086)
...Roll: (1+3)=4
...Renting: Indiana Avenue
...
Turn:500, #-players: 4
..Player: Player 1: (pos:29,cash:18835)
...Roll: (6+6)=12
...Renting: Mediterenean Avenue
..Player: Player 4: (pos:23,cash:9036)
...Roll: (6+6)=12
...Renting: Short Line Railroad
..Player: Player 2: (pos:23,cash:19544)
...Roll: (6+6)=12
...Renting: Short Line Railroad
..Player: Player 3: (pos:11,cash:15171)
...Roll: (6+6)=12
...Renting: Indiana Avenue
...

If you've read my previous post, you will find that the Scala code in this post is considerably less Java-like that one, representing, I guess, a bit of evolution towards becoming more comfortable with Scala. If you are an experienced Scala programmer, you may find some of my code a bit on the verbose side - this is by design. I understand that things can be dropped in certain cases, and perhaps I will start dropping them as I gain more experience with Scala, but I find that writing out the verbose Scala idiom makes it easier to understand and more maintainable. As this Java Code Conventions document says, part of good coding is to make sure other programmers can read your code, and this will become more important as Scala gets into the mainstream, which I think may happen sooner than most people think.

As always, if you think that there are ways the code and/or design above can be improved, please let me know.

Monday, January 19, 2009

Scala Mock Interview: Coding Questions

I recently started learning Scala because of its Actor Framework. Over the last few weeks, I've been working my way through the Programming in Scala book. Having done that, I figured I'd test myself out using some of Steve Yegge's Phone Screen Questions. I've had the opportunity to be on both sides of the interview table/phone with Steve's strategy, and both as interviewee and interviewer, I think the approach works much better than the others I have encountered/tried.

I've adapted the strategy somewhat for my own purposes. One, I hardly do any phone screens - almost all the interviews I've done over the past couple of years have been face-to-face. Two, I usually cover 3 areas instead of the 5 Steve advocates - our interviews last about 45 minutes, and 10-15 minutes per question leaves just enough time to answer questions that the candidate may have about us. To get maximum coverage, a few colleagues (who also follow this strategy) and I usually split up the areas amongst ourselves prior to the interview.

In any case, I've noticed that reading a programming language tutorial usually leaves me with just enough knowledge to read code in that language. Writing some code is the only way I actually get to a point where I can program in that language. So if you are like me, reading this post will probably not do you much good - except perhaps to notice the many similarities to Java. However, if you are one of the lucky few who are able to absorb a language by reading code, then this post contains some simple Scala code to implement the coding questions in Steve's page referenced above.

On the other hand, if you are an experienced Scala programmer, I am guessing that much of this code would look very primitive and/or Java-like to you. In that case, I would appreciate your feedback and hints/pointers on better ways to do things in Scala.

Maven Project Setup

The lift Project (a Scala web framework) provides a Maven2 archetype for basic Scala projects, so that you get all the Maven2 goodness and its associated productivity boost for free in your Scala project. To create a basic Scala project, run the command:

1
2
3
4
5
6
7
sujit@sirocco:~$ mvn archetype:create \
  -DarchetypeGroupId=org.scala-tools.archetypes \
  -DarchetypeArtifactId=scala-archetype-simple \
  -DarchetypeVersion=1.2 \
  -DremoteRepositories=http://scala-tools.org/repo-releases \
  -DgroupId=com.mycompany.smi \
  -DartifactId=scala-examples

Update the scala.version property in the generated POM to suit your setup (I am using Scala 2.7.3). I started out using the Scala Eclipse plugin, so I did mvn eclipse:eclipse to generate my Eclipse IDE descriptors. Then follow instructions on the plugin page to configure the plugin in your IDE.

However, (at least in my case, I was using Eclipse 3.4.1/MyEclipse 7/Scala 2.7.3 with the plugin available from the update site), the Eclipse plugin was so bad as to be almost unusable. There was no code-suggest, editors would freeze and would need to be closed and reopened, etc. I ended up using Netbeans Scala plugin (with Netbeans 6.5), and it is sooo much better. Netbeans also recognizes a Maven project natively, so no descriptors need to be generated. My only gripe with Netbeans is its tendency to randomly freeze up for 30-40s at a time, something I've never seen with Eclipse.

Anyway, here are Steve's questions (and some of my own) and my Scala solutions below.

Reverse a String

This is actually one of my favorites, although its quite sad to see how many people cannot come up with even a single solution. But the nice thing about this one is that it is extendable in so many interesting ways. Here are some of mine.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Source: src/main/scala/com/mycompany/smi/coding/StringReverser.scala
package com.mycompany.smi.coding

object StringReverser {
  
  // calling the method on String directly
  def reverse1(s: String): String = {
    s.reverse
  }
  
  // iterating backward through the string
  def reverse2(s: String): String = {
    val buf = new StringBuilder
    val len = s.length
    for (i <- 0 until len) {
      buf.append(s.charAt(len - i - 1))
    }
    buf.toString
  }
  
  // recursively
  def reverse3(s: String): String = {
    val len = s.length
    if (len == 1) {
      s
    } else {
      reverse3(s.substring(1, len)) + s.charAt(0)
    }
  }

  // recursively, with tail recursion
  def reverse4(s: String): String = {
    val len = s.length
    if (len == 1) {
      s
    } else {
      s.substring(len-1, len) + reverse4(s.substring(0, len-1))
    }
  }
}

The rather long and verbose class (and object) names in my post is in keeping with Steve's post about Java being the Kingdom of Nouns, and because I guess I've worked too long with Spring :-).

The first answer is correct, but probably not quite what most interviewers are looking for. The second one is usually what I get from a majority of people, with minor variations (I like reading the string backwards when I do this in Java, for one). The third and fourth versions are almost always the result of asking if there are other ways to do the same thing. Here is the JUnit test harness to run this code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Source: src/test/scala/com/mycompany/smi/coding/StringReverserTest.scala
package com.mycompany.smi.coding

import org.junit._
import org.junit.Assert._

@Test
class StringReverserTest {

  @Test
  def testReverse1() = {
    assertEquals("madA m'I madaM", StringReverser.reverse1("Madam I'm Adam"))
  }

  @Test
  def testReverse2() = {
    assertEquals("madA m'I madaM", StringReverser.reverse2("Madam I'm Adam"))
  }

  @Test
  def testReverse3() = {
    assertEquals("madA m'I madaM", StringReverser.reverse3("Madam I'm Adam"))
  }

  @Test
  def testReverse4() = {
    assertEquals("madA m'I madaM", StringReverser.reverse4("Madam I'm Adam"))
  }
}

Compute the n-th Fibonacci Number

The objective here is to compute the n-th Fibonacci number, where n is provided in the target method's parameter list. I provide two versions of a Fibonacci Number generator below. The recursive version is more intuitive, although someone may ask you do this using iteration, so its probably good to know.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Source: src/main/scala/com/mycompany/smi/coding/FibonacciGenerator.scala
package com.mycompany.smi.coding

object FibonacciGenerator {

  // iterative version
  def generate1(n: Int): Int = {
    if (n < 0) {
      throw new IllegalArgumentException(
        "Invalid Argument, n must be >= 0")
    }
    if (n == 0) 0
    else if (n == 1) 1
    else {
      var prev2 = 0
      var prev1 = 1
      var sum = 0
      for (i <- 2 to n) {
        sum = prev1 + prev2
        prev2 = prev1
        prev1 = sum
      }
      sum
    }
  }

  // recursive version
  def generate2(n: Int): Int = {
    if (n == 0) 0
    else if (n == 1) 1
    else {
      generate2(n - 2) + generate2(n - 1)
    }
  }
}

Notice that you are calling the same function with the same argument twice multiple times in the recursive version, so caching the results of the calls may result in some improvement in the algorithm. The JUnit test harness is shown below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Source: src/test/scala/com/mycompany/smi/coding/FibonacciGeneratorTest.scala
package com.mycompany.smi.coding

import org.junit._
import org.junit.Assert._

class FibonacciGeneratorTest {

  val first8Fibs = List[Int] (0,1,1,2,3,5,8,13)

  @Test
  def testGenerate1() = {
    var results = List[Int]()
    for (i <- 0 until 8) {
      results += FibonacciGenerator.generate1(i)
    }
    assertEquals(results, first8Fibs)
  }

  @Test
  def testGenerate2() = {
    var results = List[Int]()
    for (i <- 0 until 8) {
      results += FibonacciGenerator.generate2(i)
    }
    assertEquals(results, first8Fibs)
  }
}

Print Multiplication Table of 12

The objective here is to print multiplication tables uptil the number provided. There are 4 implementations provided. The first three do exactly what is asked, in slightly different ways, and that is a good thing. The last one actually introduces complexity without any gain, and that was me pandering to the CS person in me, and hopefully I will not get my head cut off for it :-). Actually, I wanted to play with classes, and populating and printing a 2 dimensional array seemed to be a good idea to be able to do that.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
// Source: src/main/scala/com/mycompany/smi/coding/MultiplicationTableGenerator.scala
package com.mycompany.smi.coding

import org.apache.commons.lang._

object MultiplicationTableGenerator {

  // just print it out in rows and columns, no formatting
  def generate1(n: Int): Unit = {
    for (i <- 1 to n) {
      for (j <- 1 to n) {
        print("  " + (i * j))
      }
      println
    }
  }

  // print with formatting
  def generate2(n: Int): Unit = {
    val maxwidth = String.valueOf(n * n).length
    for (i <- 1 to n) {
      for (j <- 1 to n) {
        print("  " + lpad(String.valueOf(i * j), maxwidth))
      }
      println
    }
  }

  def lpad(s: String, width: Int): String = {
    val maxpad = " " * width
    val maxpadded = maxpad + s
    val maxlen = maxpadded.length
    maxpadded.substring(maxlen - width, maxlen)
  }

  // use StringUtils.lpad for formatting
  def generate4(n: Int): Unit = {
    val maxwidth = String.valueOf(n * n).length
    for (i <- 1 until n) {
      for (j <- 1 until n) {
        print(StringUtils.leftPad(String.valueOf(i, j), maxwidth))
      }
      println
    }
  }

  // store and print
  def generate4(n: Int): Unit = {
    printFormatted(build(12))
  }

  private def build(n: Int): Matrix[Int] = {
    var matrix = new Matrix[Int](n, n)
    for (i <- 0 until n) {
      for (j <- 0 until n) {
        matrix.set(i, j, ((i + 1) * (j + 1)))
      }
    }
    matrix
  }

  private def printFormatted(matrix: Matrix[Int]): Unit = {
    val maxwidth = String.valueOf(matrix.rows * matrix.cols).length
    for (i <- 0 until matrix.rows) {
      for (j <- 0 until matrix.cols) {
        print("  " + lpad(String.valueOf(matrix.get(i, j)), maxwidth))
      }
      println
    }
  }
}

// only used for generate4()
class Matrix[T](nrows: Int, ncols: Int) {
  val rows = nrows
  val cols = ncols
  private var elements = new Array[T](rows * cols)

  def set(x:Int, y:Int, value:T): Unit = {
    if ((x < cols) && (y < rows)) {
      elements((y * cols) + x) = value
    } else {
      throw new IllegalArgumentException(
        "(" + x + "," + y + ") should be in range (0,0) to (" +
        cols + "," + rows + ")")
    }
  }

  def get(x:Int, y:Int): T = {
    if ((x < cols) && (y < rows)) {
      elements((y * cols) + x)
    } else {
      throw new IllegalArgumentException(
        "(" + x + "," + y + ") should be in range (0,0) to (" +
        cols + "," + rows + ")")
    }
  }
}

The JUnit test is similar to the ones shown already, and is shown below. The only difference is that it doesn't do asserts.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Source: src/test/scala/com/mycompany/smi/coding/MultiplicationTableGeneratorTest.scala
package com.mycompany.smi.coding

import org.junit._
import org.junit.Assert._

class MultiplicationTableGeneratorTest {

  @Test
  def testGenerate1() = {
    println(">> generate1")
    MultiplicationTableGenerator.generate1(12)
  }

  @Test
  def testGenerate2() = {
    println(">> generate2")
    MultiplicationTableGenerator.generate2(12)
  }

  @Test
  def testGenerate3() = {
    println(">> generate3")
    MultiplicationTableGenerator.generate3(12)
  }

  @Test
  def testGenerate4() = {
    println(">> generate4")
    MultiplicationTableGenerator.generate4(12)
  }
}

Sum up Integers from Text File

The objective here is to read a text file of numbers, one number to a line, and print out the sum of these numbers. There are two implementations of this solution, the first one does it the Java way (with looping), and the second one does it the Scala way (with a foreach over a List. I also have a method to return an Iterator of lines given a file name (which is reused in some related questions, see below).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// Source: src/main/scala/com/mycompany/smi/coding/FileOperator.scala
package com.mycompany.smi.coding

import scala.io._
import scala.collection.mutable.Map
import scala.collection.mutable.LinkedHashMap

object FileOperator {

  // the java way
  def computeSumOfValues1(file: String): Int = {
    var sum = 0
    var lno = 0
    for (line <- lines(file)) {
      lno = lno + 1
      try {
        sum += Integer.parseInt(chomp(line))
      } catch {
        case ex: NumberFormatException => {
          println("ERROR on line " + lno + ", " + chomp(line) + 
            " not number")
        }
      }
    }
    sum
  }

  // the scala way
  def computeSumOfValues2(file: String): Int = {
    var sum = 0
    lines(file) foreach (sum += toNumber(_))
    sum
  }

  private def lines(file: String): Iterator[String] = {
    Source.fromFile(file).getLines
  }

  private def toNumber(s: String): Int = {
    try {
      Integer.parseInt(chomp(s))
    } catch {
      case ex: NumberFormatException => 0
    }
  }

  // maybe just simpler to use StringUtils.chomp()
  private def chomp(s: String): String = {
    try {
      if ((s.charAt(s.length - 1) == '\n') ||
          (s.charAt(s.length - 1) == '\r')) {
        s.substring(0, s.length - 1)
      } else if (s.substring(s.length - 2, s.length) == "\r\n") {
        s.substring(0, s.length - 2)
      } else {
        s
      }
    } catch {
      case ex: StringIndexOutOfBoundsException => s
    }
  }
  ...
}

One thing to note that you should not call any of your application packages java or scala. Because of the way Scala namespaces work, IDEs (and possibly the scala compiler) will get confused about which namespace you are importing from. I initially had com.mycompany.scala as part of my package name, but changed it to com.mycompany.smi in response to the problem described above.

Here are a couple of my own questions. I try and ask different questions from whats on Steve's list, just because, although I am guessing someone who reads Steve's post is probably capable of solving problems of similar complexity.

Compute Word Frequency from Text File

The objective here is to find the word frequency for a given file full of words. The output is a Map of words and their frequencies. As before, the first implementation is in the Java style, and the second one is in the Scala style. The second implementation uses two nested generators with a predicate on the second generator. These two functions are part of the same FileOperator object for the last question.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Source: src/main/scala/com/mycompany/smi/coding/FileOperator.scala
// (cont'd)
...
object FileOperator {
  ...
  def computeWordFrequencyMap1(file: String): Map[String,Int] = {
    var wordcounts = Map.empty[String,Int]
    for (line <- lines(file)) {
      val words = line.split("[ ,;.!?-]+")
      for (word <- words) {
        if (word.trim.length > 0) {
          addWord(chomp(word), wordcounts)
        }
      }
    }
    wordcounts
  }

  // multiple generators version
  def computeWordFrequencyMap2(file: String): Map[String,Int] = {
    var wordcounts = Map.empty[String,Int]
    for (line <- lines(file);
         word <- line.split("[ ,;.!?]+")
         if (word.trim.length > 0)) {
      addWord(chomp(word), wordcounts)
    }
    wordcounts
  }

  // refactored out to make the second version easier to write
  def addWord(word: String, map: Map[String,Int]): Unit = {
    val origCount = if (map.contains(word)) map(word) else 0
    map + (word -> (origCount + 1))
  }
  ...
}

Sort a Map[String,Int] by Value

Now that we have a Map of word frequencies, we want to know the highest occuring words, so that involves sorting the Map by its value. The code below uses a Scala style comparator function in the keys.sort() call.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// Source: src/main/scala/com/mycompany/smi/coding/FileOperator.scala
// (cont'd)
...
object FileOperator {
  ...
  def sortByValue(unsorted: Map[String,Int]): Map[String,Int] = {
    val sortedMap = new LinkedHashMap[String,Int]()
    val keys = unsorted.keys.toList
    // sort in descending order of value
    val sortedKeys = keys.sort((a,b) => {unsorted(a) > unsorted(b)})
    sortedKeys.foreach(key => sortedMap + (key -> (unsorted(key))))
    sortedMap
  }
}

The JUnit test for the last 3 questions (in FileOperator) are shown below. It uses two test data files, test1.txt, which contains integers, one per line, and test2.txt, which contains free form text spread across multiple lines.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
// Source: src/main/scala/com/mycompany/smi/coding/FileOperatorTest.scala
package com.mycompany.smi.coding

import org.junit._
import org.junit.Assert._

class FileOperatorTest {

  @Test
  def testComputeSumOfValues1() = {
    println(">> sumOfValues1 [test1.txt]")
    val sum = FileOperator.computeSumOfValues1(
      "src/test/resources/test1.txt")
    println(sum)
    assertEquals(2667, sum)
  }

  @Test
  def testComputeSumOfValues2() = {
    println(">> sumOfValues2 [test1.txt]")
    val sum = FileOperator.computeSumOfValues2(
      "src/test/resources/test1.txt")
    println(sum)
    assertEquals(2667, sum)
  }

  @Test
  def testComputeWordFrequencyMap1() = {
    println(">> wordFrequencyMap1 [test2.txt]")
    val map = FileOperator.computeWordFrequencyMap1(
      "src/test/resources/test2.txt")
    println(map)
    assertEquals(2, map("morning"))
    assertEquals(1, map("breakfast"))
  }

  @Test
  def testComputeWordFrequencyMap2() = {
    println(">> wordFrequencyMap2 [test2.txt]")
    val map = FileOperator.computeWordFrequencyMap2(
      "src/test/resources/test2.txt")
    println(map)
    assertEquals(2, map("morning"))
    assertEquals(1, map("breakfast"))
  }

  @Test
  def testSortByValue() = {
    println(">> sortByValue")
    val sortedMap = FileOperator.sortByValue(
      FileOperator.computeWordFrequencyMap2("src/test/resources/test2.txt"))
    println(sortedMap)
  }
}

Print out Odd Numbers from 1-99

The objective is pretty clear from the heading. The next 3 questions have single implementations each, mostly focusing on (what I believe are) the Scala way of doing things. They are all contained in a single file Misc.scala, relevant portions of which are split up among the next three sections.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// Source: src/main/scala/com/mycompany/smi/coding/Misc.scala
package com.mycompany.smi.coding

import java.util.Random
import org.apache.commons.lang.StringUtils

object Misc {

  def printOdds(min: Int, max: Int): List[Int] = {
    val range = new Range(min, max, 1)
    range.toList filter (_ % 2 == 1)
  }
  ...
}

Find largest int value in an int array

Given an int array (our method actually populates it first using java.util.Random), find the maximum value in the array.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
// Source: src/main/scala/com/mycompany/smi/coding/Misc.scala
...
object Misc {
  ...
  def max(n: Int): Int = {
    // populate
    val randomizer = new Random(n)
    var arr = new Array[Int](n)
    for (i <- 0 until n) {
      arr(i) = randomizer.nextInt(n)
    }
    println(">> input: " + arr.toList)
    // find max in list
    var max = 0
    arr.foreach(elem => if (max < elem) max = elem else max = max)
    max
  }
  ...
}

Format RGB value as a 6 digit Hex String

The objective here is to return a hexadecimal RGB string given three bytes representing the R, G, and B portions of a color. The approach here is not Scala specific, a Java version of this would probably look quite similar.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// Source: src/main/scala/com/mycompany/smi/coding/Misc.scala
...
object Misc {
  ...

  def toHexString(r: Byte, g: Byte, b: Byte): String = {
    val rgb = (r << 16) + (g << 8) + b
    "0x" + StringUtils.leftPad(Integer.toHexString(rgb), 6, '0')
  }
}

And here is the JUnit test for the 3 methods described above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// Source: src/test/scala/com/mycompany/smi/coding/MiscTest.scala
package com.mycompany.smi.coding

import org.junit._
import org.junit.Assert._

class MiscTest {

  @Test
  def testPrintOdds() = {
    println(">> printOdds")
    println(Misc.printOdds(0, 99))
  }

  @Test
  def testMax() = {
    println(">> max")
    println(Misc.max(10))
  }

  @Test
  def testToHexString() = {
    println(">> toHexString")
    println("RGB(0x00, 0x00, 0x00)=" + Misc.toHexString(0x00, 0x00, 0x00))
    println("RGB(0x11, 0x11, 0x11)=" + Misc.toHexString(0x11, 0x11, 0x11))
    println("RGB(0x11, 0x00, 0x00)=" + Misc.toHexString(0x11, 0x00, 0x00))
    println("RGB(0x00, 0x11, 0x00)=" + Misc.toHexString(0x00, 0x11, 0x00))
    println("RGB(0x00, 0x00, 0x11)=" + Misc.toHexString(0x00, 0x00, 0x11))
    println("RGB(0x11, 0x00, 0x11)=" + Misc.toHexString(0x11, 0x00, 0x11))
  }
}

My take after going this far with Scala is that for an average Java programmer such as myself, learning to use Scala seems to be fairly simple. The map, foreach, etc. methods seem intuitive, but hard to grasp and to apply correctly at first, but become easier with repeated usage. There are also some aspects, such as discussions on currying, covariance, contravariance, etc, that I haven't fully gotten as yet, but hopefully they will also be understandable as I work more with Scala.

Friday, January 16, 2009

Extending Maven with Ant

Of late, I have been working with frameworks that seem to be doing an awful lot of bytecode manipulation, annotation processing and the like. Readers of past posts would already know about my experiments with Kilim and ActorFoundry, and I've recently started using JiBX for an application at work, which also does bytecode manipulation. If you've been reading my blog for a while, you'll also know that I'm a big Maven2 fan. I've been using Maven2 for almost couple of years (or more) now, and until recently, I hadn't really missed Ant that much.

I find that Maven2 makes standard build tasks trivial to non-existent, but non-standard tasks (for which there isn't already a plugin available) incredibly hard. With Ant, the level of effort is similar for a standard versus a non-standard task. This is because the design of Ant is imperative in nature, while Maven2's is declarative. With Ant, you tell it how to do a particular task using an XML based scripting language. With Maven2, you provide a standard project structure, and it knows how to do the standard tasks (called "goals" in Maven-speak). Not that I think this was a bad design decision, by the way - the declarative nature of Maven2 has served me (and I suspect most Maven2 users) quite well, with its automatic dependency management, standard goals, etc. And there are a huge number of plugins available - its only when you come across a situation where you need to roll your own is when you will have a problem.

Because I didn't know how to handle this sort of thing, my approach so far has been to build an Ant build.xml file from my Maven2 POM (using mvn ant:ant), and then add the non-standard task into the build.xml file. Of course, now anytime I need to add a new dependency into my pom, I have to add it in by hand into the build.xml. I still want to be able to generate IDE descriptors, build up the project on another machine, etc, so dispensing with the POM altogether is not an option.

One such example from the recent past (2 blog posts ago) is this little monster, refactored a bit for external access, detailing the steps to build my ActorFoundry and Kilim client code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
  <target name="compile" depends="get-deps" 
      description="Compile the code">
    <mkdir dir="${maven.build.output}"/>
    <javac srcdir="${maven.src.dir}"
           destdir="${maven.build.output}" 
           excludes="**/package.html" 
           debug="true" 
           deprecation="true" 
           optimize="false">
      <classpath refid="build.classpath"/>
    </javac>
    <ant target="check-local-constraints"/>
    <ant target="generate-af-executors"/>
    <ant target="compile-af-executors"/>
    <ant target="weave-classes"/>
  </target>
  <target name="check-local-constraints" 
      depends="_init" 
      description="Check local constraints">
    <!-- check local constraints happens for af only -->
    <apt 
         srcdir="${maven.src.dir}/${af.path.prefix}"
         compile="false"
         classpathref="build.classpath"
         debug="true"
         factory="osl.foundry.preprocessor.LocalSynchConstAPF"
         factorypathref="build.classpath"/>
  </target>
  <target name="generate-af-executors" 
      depends="_init" 
      description="Generate ActorFoundry Executors">
    <!-- code generation for af only -->
    <delete dir="${maven.src-gen.dir}"/>
    <mkdir dir="${maven.src-gen.dir}"/>
    <javadoc private="true"
         doclet="osl.foundry.preprocessor.ExecutorCodeGen"
         docletpathref="build.classpath"
         classpathref="build.classpath"
         sourcepath="${maven.src.dir}"
         packagenames="${af.pkg.prefix}">
      <arg line="-outdir ${maven.src-gen.dir}"/>
    </javadoc>
  </target>
  <target name="compile-af-executors" 
      depends="_init" 
      description="Compile ActorFoundry Executors">
    <!-- compile generated code: for af only -->
    <javac srcdir="${maven.src-gen.dir}"
           destdir="${maven.build.output}"
           debug="on"
           fork="on">
      <classpath refid="build.classpath"/>
    </javac>
  </target>
  <target name="weave-classes" 
      depends="_init" 
      description="Enhance classes using Kilim Weaver">
    <!-- weaving happens for kilim and af files -->
    <java classname="kilim.tools.Weaver" fork="yes">
      <classpath refid="weave.classpath"/>
      <assertions>
        <enable/>
      </assertions>
      <arg value="-x"/>
      <arg value="ExInvalid|test"/>
      <arg value="-d"/>
      <arg value="${maven.build.output}"/>
      <arg line="${kilim.pkg.prefix}.ActorManager 
                    ${kilim.pkg.prefix}.Actor 
                    ${kilim.pkg.prefix}.DownloadActor 
                    ${kilim.pkg.prefix}.IndexActor 
                    ${kilim.pkg.prefix}.WriteActor 
                    ${af.pkg.prefix}.ActorManagerExecutor
                    ${af.pkg.prefix}.DownloadActorExecutor
                    ${af.pkg.prefix}.IndexActorExecutor
                    ${af.pkg.prefix}.WriteActorExecutor"/>
    </java>
  </target>

As you can see, my compile target calls four other custom targets following the compilation phase. In Maven2's Default Build Lifecycle, these four targets would be called in the process-classes phase.

Approach #1: Build Custom Mojo(s)

The "pure" Maven way to address this is to build one or more MOJO (Maven pOJO) classes in Java that fires in the process-classes phase. This was my initial approach, which I later abandoned. However, in the process I learned some useful things, which I would like to describe here before going to my final solution.

Building a MOJO requires you to first build a Maven2 plugin project. The Plugin Developer's Guide page has quite a bit of information if you are interested. There is an archetype available for this, so you run:

1
2
3
4
5
prompt$ mvn archetype:create \
          -DgroupId=com.mycompany.plugins \
          -DartifactId=maven-mycompany-plugin \
          -DarchetypeGroupId=org.apache.maven.archetypes \
          -DarchetypeArtifactId=maven-archetype-mojo

This will create your project. Remove the url field from the POM, since this is going to be a local plugin. Since my MOJOs would need to walk directories and such, I needed commons-io (the recommended IO library for Maven) and plexus-utils (to access Plexus, the IoC container used by Maven), so I added them to the POM as shown below. I also made it Java 1.5 source/target compatible. Then I ran mvn eclipse:eclipse to generate the descriptors for Eclipse. The plugin project can then be opened in Eclipse as a standard Java project.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
  <dependencies>
    ...
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.4</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.plexus</groupId>
      <artifactId>plexus-utils</artifactId>
      <version>1.5.6</version>
    </dependency>
  </dependencies>

I started writing some code for a plugin that wrapped the Kilim Weaver. Essentially, it takes three parameters classpath, includes and excludes, and uses that to run the Weaver by calling java directly. I don't like this too much, but the alternative was to call the Exec plugin from the command line, which seemed much too heavyweight.

There are two popular books available on Maven, Better Builds with Maven and Maven: The Definitive Guide, (both free to download) and both have some information on how to build MOJOs, but you may have to peek at the sources of a similar plugin to figure out how to build your own. In my case, the sources for the Exec plugin were very helpful. Here is the code for the WeaverMojo.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
// Project: maven-mycompany-plugin
// Source: src/main/java/com/mycompany/plugin/WeaverMojo.java
package com.mycompany.plugin;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.commons.io.DirectoryWalker;
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;
import org.codehaus.plexus.util.StringUtils;
import org.codehaus.plexus.util.cli.Commandline;
import org.codehaus.plexus.util.cli.StreamConsumer;

/**
 * Maven Mojo for Kilim's Weaver.
 * 
 * @goal weave
 * @phase process-classes
 */
public class WeaverMojo extends AbstractMojo {
  
  /**
   * Input directory
   * @parameter default-value="${project.build.directory}"
   * @required
   * @readonly
   */
  private File inputDirectory;
  
  /**
   * Output directory
   * @parameter default-value="${project.build.directory}"
   * @required
   * @readonly
   */
  private File outputDirectory;
  
  /**
   * The Maven classpath, must be supplied manually in the config.
   * @parameter alias="classpath"
   * @required
   */
  private String mavenClassPath;
  
  /**
   * Specifies patterns to exclude. Multiple patterns can be specified
   * and are treated as exclude ${exclude[0]} OR ${exclude[1]} OR ...
   * @parameter alias="excludes"
   */
  private String[] excludes;

  /**
   * Specifies patterns to include. Multiple patterns can be specified
   * and are treated as include ${include[0]} OR ${include[1]} OR ...
   * @parameter alias="includes"
   */
  private String[] includes;
  
  public void execute() throws MojoExecutionException {
    getLog().info("Weaving classes...");
    WeaverDirectoryWalker walker = new WeaverDirectoryWalker();
    List<File> inputFiles = new ArrayList<File>();
    try {
      walker.walk(inputFiles);
    } catch (IOException e) {
      throw new MojoExecutionException("Problem walking directory", e);
    }
    // convert these from file name to package name notation
    List<String> classnames = new ArrayList<String>();
    String inputDirectoryPrefix = inputDirectory.getAbsolutePath(); 
    for (File inputFile : inputFiles) {
      classnames.add(inputFile.getAbsolutePath().
        replaceFirst(inputDirectoryPrefix, ""). // get rid of absolute path
        replaceFirst(".classes.", "").          // get rid of /classes/
        replaceAll("/", ".").                   // convert / to .
        replaceAll(".class", ""));              // remove trailing .class
    }
    // Call using java from command line
    Commandline commandline = new Commandline();
    commandline.setExecutable("java"); // assume that java is in PATH
    commandline.addArguments(new String[] {
      "-cp",
      mavenClassPath,
      "kilim.tools.weaver",
      "-x",
      "ExInvalid|Test",
      "-d",
      outputDirectory.getAbsolutePath(),
      StringUtils.join(classnames.iterator(), " ")
    });
    StreamConsumer stdout = new StreamConsumer() {
      public void consumeLine(String line) {
        getLog().info(line);
      }
    };
    StreamConsumer stderr = new StreamConsumer() {
      public void consumeLine(String line) {
        getLog().info( line );
      }
    };
    getLog().info("Running command: java " + 
      StringUtils.join(commandline.getArguments(), " "));  
    try {
      CommandLineUtils.executeCommandLine(commandline, stdout, stderr);
    } catch (CommandLineException e) {
      throw new MojoExecutionException("Java execution failed", e);
    }
  }
  
  private class WeaverDirectoryWalker extends DirectoryWalker {
    
    public WeaverDirectoryWalker() {
      super(new FileFilter() {
        public boolean accept(File f) {
          if (f.isDirectory()) {
            // we don't want directories in our list
            return true;
          }
          if (! f.getName().endsWith("class")) {
            // we only want .class files in our list
            return false;
          }
          if (f.getName().contains("$")) {
            // don't include inner class class files
            return false;
          }
          boolean included = false;
          boolean excluded = false;
          String filename = f.getAbsolutePath();
          if (includes != null) {
            for (int i = 0; i < includes.length; i++) {
              if (filename.matches(includes[i])) {
                getLog().info(filename + " == " + includes[i]);
                included = true;
                break;
              }
            }
          }
          if (excludes != null) {
            for (int i = 0; i < excludes.length; i++) {
              if (filename.matches(excludes[i])) {
                excluded = true;
                break;
              }
            }
          }
          return included && (! excluded);
        }
      }, -1);
    }

    public void walk(List<File> filenames) throws IOException {
      walk(inputDirectory, filenames);
    }
    
    @Override
    protected void handleFile(File file, int depth, Collection results) 
        throws IOException {
      results.add(file);
    }
  }
}

The work that the MOJO does is defined in its execute() method. The private member variables are annotated with commons-attribute annotations. Getting/setting the variables are handled by Plexus. The annotations are also used to generate the plugin.xml (plugin descriptor) file.

Incidentally, commons-io has a set of ready made FileFilters, which can be ANDed and ORed. It also has a RegexFilter, which uses Java regular expressions similar to my implementation. However, it applies the regular expression on the file name alone (not the entire path), so it did not work for me. It would be nicer to build up a composite filter using the AND/OR/NOT filters using my inputs and outputs arrays as the inputs and then pass it into the DirectoryWalker. It would perhaps also be nicer to be able to use an Ant style file filter in order to make the configuration easier to read, but I guess Java developers should be equally at home with either style.

To compile the MOJO, generate the plugin descriptor (plugin.xml) and install to your local repository, run mvn install.

On the client side (where you want to run the new plugin), you need to configure the build section with the plugin's configuration information. Here is the snippet from my client POM.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
  <build>
    ...
    <plugins>
      ...
      <!-- WeaverMojo configuration -->
      <plugin>
        <groupId>com.mycompany.plugin</groupId>
        <artifactId>maven-mycompany-plugin</artifactId>
        <version>1.0-SNAPSHOT</version>
        <executions>
          <execution>
            <phase>process-classes</phase>
            <goals>
              <goal>weave</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <classpath>full runtime classpath here</classpath>
          <includes>
            <param>^.*kilim.*$</param>
            <param>^.*?actorfoundry.*?Executor.*$</param>
          </includes>
        </configuration>
      </plugin>
      ...
    </plugins>
    ...
  </build>

You can run this plugin using either mvn process-classes or mvn mycompany:weave (according to the docs, the second option is automatic if the plugin project's artifactId is either mycompany-maven-plugin or maven-mycompany-plugin). However, I had to add the project's groupId to my settings.xml file.

1
2
3
4
5
6
<settings>
  ...
  <pluginGroups>
    <pluginGroup>com.mycompany.plugin</pluginGroup>
  </pluginGroups>
</settings>

However, as mentioned before, I ultimately abandoned this approach in favor of the one described below. The plugin as described does fire in the appropriate place in the client's build lifecycle, but because the other components are missing, it does not help too much to run it.

Approach #2: Call Ant with AntRun

Looking through various Maven2 plugin sites for their source code, I came across the AntRun plugin. I had heard of it in the past, but did not like the idea of having to depend on Ant. However, given my newly found knowledge of "pure" Maven2 plugins, AntRun seemed to be a ready-made solution to my problem.

The first step was refactoring the extra steps in the compile target into separate Ant targets so they could be called individually, as shown in the build.xml snippet above. The second step was simply add the AntRun plugin descriptor and its configuration into the client POM. There were two gotchas here, however:

  1. Ant's properties were not getting initialized from build.xml, in spite of setting the inheritRefs attribute to true.
  2. Apt was not getting recognized as a valid Ant task, even though its a core task in Ant 1.7.1.

The first problem is easily solved. When mvn ant:ant is used to generate the build.xml, the properties are stored as globals (i.e. within the scope of the project tag), which don't get initialized when called with <ant target="..."/>. The workaround was to create a separate _init target which wrapped the property initialization, and make all tasks dependent on _init at the lowest level (ie, if a task has no dependencies, then it should depend on _init now. You will notice that all our tasks have the depends="_init" set in the build.xml snippet above.

The second problem took me a while to figure out. Apparently, AntRun uses an internal version of Ant (the default version is 1.6.5), and Apt was not a core Ant task in that version. To reset the version, you have to inject the correct version of Ant jars in the plugin descriptor - many thanks to Jason Lee for this post, which I reached through this JIRA page.

The plugin descriptor for AntRun to run the tasks in the process-classes phase is quite simple and self-explanatory, and is shown below. To run this, you need to do mvn process-classes (no fancy aliases here).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
  <build>
    ...
    <plugins>
      ...
      <!-- Antrun plugin -->
      <plugin>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <phase>process-classes</phase>
            <configuration>
              <tasks>
                <property name="user.home" value="${user.home}"/>
                <ant antfile="build.xml" 
                  target="check-local-constraints" 
                  inheritRefs="true"/>
                <ant antfile="build.xml" 
                  target="generate-af-executors" 
                  inheritRefs="true"/>
                <ant antfile="build.xml" 
                  target="compile-af-executors" 
                  inheritRefs="true"/>
                <ant antfile="build.xml" 
                  target="weave-classes" 
                  inheritRefs="true"/>
              </tasks>
            </configuration>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
        <dependencies>
          <dependency>
            <groupId>org.apache.ant</groupId>
            <artifactId>ant</artifactId>
            <version>1.7.1</version>
          </dependency>
          <dependency>
            <groupId>org.apache.ant</groupId>
            <artifactId>ant-launcher</artifactId>
            <version>1.7.1</version>
          </dependency>
          <dependency>
            <groupId>org.apache.ant</groupId>
            <artifactId>ant-nodeps</artifactId>
            <version>1.7.1</version>
          </dependency>
          <dependency>
            <groupId>org.apache.ant</groupId>
            <artifactId>ant-apache-bsf</artifactId>
            <version>1.7.1</version>
          </dependency>
          <dependency>
            <groupId>org.apache.bsf</groupId>
            <artifactId>bsf-all</artifactId>
            <version>3.0-beta2</version>
          </dependency>
          <dependency>
            <groupId>rhino</groupId>
            <artifactId>js</artifactId>
            <version>1.7R1</version>
          </dependency>
        </dependencies>
      </plugin>
      ...
    </plugins>
    ...
  </build>

One document that you may find useful if you go the AntRun route is this list of properties accessible in the POM, which you can pass to Ant's task.

There is a lot of XML in here (more than if I went the MOJO route), but no extra plugin code to write. I also don't have to maintain both Ant and Maven2 XMLs simultaneously, which was my original gripe. The ant tasks here call the target in build.xml, but I could just as easily have built a standalone XML file which contained the scripts to do the various tasks, and which would not do the standard stuff such as compile, jar, etc.

Conclusions

When building goals involving third party components, where you either don't have visibility into or control of the source code, it may be preferable to use the AntRun plugin. Ant, notwithstanding its limitations (some of which Maven2 addresses), is likely to be with us for the forseeable future, so it makes sense to leverage it if it makes sense.

However, for goals involving internal components, building MOJO based custom Maven2 plugins would probably provide more flexibility and remove the need for having two build frameworks in place in an organization. That is really the reason I went as far into developing the WeaverMojo as I did, to provide me with an understanding of how to build a Maven2 custom plugin should I need to at some point in the future.

Its paradoxical that Maven offers a built-in feature to facilitate project documentation (mvn site), yet it is harder to find information for Maven than for Ant, which does not offer any such feature. In all fairness, all the Maven plugin projects I have looked, are,without exception, very well documented. However, there is no one-stop shop such as the Ant manual.

Saturday, January 10, 2009

2008: A retrospective

Well, it is (was?) that time of year again, when people look back on the year past and make bold plans for the one ahead. This once-a-year all-fluff-no-stuff post actually started off because I had nothing to write about that first week three years ago, and it seemed a good idea to write a summary style post. But I've doing this now for two years, so its almost as much a tradition as one friend's yearly pilgrimage to Reno the day after Christmas. Anyway, I guess its good to take stock of the things I did last year and have a road map in my head of what I want to accomplish the year ahead, so here goes.

Last year, my main focus was on learning various Information Retrieval techniques and algorithms and the math behind them. The main source for these has been the TMAP book and the Internet. My motivation for learning this was to gain some background to help improve the algorithms for the proprietary indexing processes at work. A lot of companies who do Information Retrieval typically just use a standard IR library like Lucene and build around it. Unlike them, our indexing algorithms are intimately tied to our health taxonomy, and occur before we stuff the data into a Lucene index. I've been curious about the right way to do certain things in there, and the stuff in the TMAP book was quite useful.

I also did some work trying to figure out various ways of applying graph data structures to model an ontology, with a view to provide a simpler and more maintainable API to our taxonomy for internal client applications. This never made it beyond the blog post, though, because this would have replaced a (less maintainable, in my opinion) API that we already had in production, and replicating and regressing all the services would have been quite a bit of work. I guess I will try to revive this idea at a future job.

Another thing I worked on for a few weeks and described in my blog was an event driven workflow application that modeled an internal workflow that we have been trying to automate for some time. Ultimately, the automation was done (more elegantly, in my opinion) using multiprocess make (make -j). However, it introduced me to Spring's and OSWorkflow's event handling functionality, which I guess will help me in some future application.

Later in the year, I started looking at various ways of processing large volumes of data in reasonable amounts of time. To that end, I started looking at Hadoop and a bit of multithreaded programming using the Actor framework. In both cases, the motivation were some excellent talks organized by East Bay IT Group (ebig), which I joined this year.

There have been a few successes based on this blog too. One of my ideas that I introduced here and that I just accidentally talked about to one of our product people at the water cooler will ultimately become a product later this year.

This year, I plan to learn more math and statistics based Information Retrieval/Data Mining techniques. I still have a few chapters to go through in the TMAP book. I also plan to take a look at some larger frameworks such as GATE and Lingpipe that are described very nicely in Dr Konchady's new book Building Search Applications - Lucene, LingPipe and GATE.

I also want to explore some larger frameworks that may be useful in my work, such as Solr and Carrot. I have used Carrot, but without fully understanding the clustering algorithms. I am hoping that my new found knowledge of various IR algorithms will help me in doing this. Also, I want to use Hadoop and the Actor framework to do some real applications.

I also hope to learn a bit of Scala over this year. I started learning it late last year, and so far, it seems a very cool language. I still don't have a good use for it, but I am guessing that will become apparent as I learn more about the language.

There is a lot of open source software around, and engineers who are aware of and willing to use open source software in their work will be able to do more with less, something that we probably all need to do in the tougher economic climate ahead. I have already been doing this, and I plan to do more of it next year.

Well, that's pretty much it for my New Year's resolutions. Y'all have a very Happy New Year, everyone!

Friday, January 02, 2009

More Java Actor Frameworks Compared

Over the past few weeks, I've been looking at various (Java and Scala based) Actor frameworks. In an attempt to understand the API of these frameworks, I've been porting the same toy example consisting of three pipelined Actors responding to a bunch of requests shoved down one end of the pipe. To get a feel for how they compare, performance-wise, to one another, I've also been computing wallclock times for various request batch sizes. Mike Rettig (author of the Jetlang project) pointed out that the Jetlang numbers I published in my last week's post appeared incorrect in comparison to the Scala numbers. Rajesh Karmani (one of the authors of the ActorFoundry project) also expressed surprise that Kilim numbers were higher compared to Scala.

Mike was kind enough to take a look at the Jetlang code, and he suggested that the excessive amounts of console IO that the actors were making were causing it to perform worse than Scala. Taking the println() calls out from both the Scala and the Jetlang examples improved the performance of both significantly, and the Jetlang example ended up with lower elapsed time numbers than the Scala examples. Apparently, the performance characteristics of the Scala println() was different enough from the Java System.out.println() to skew the results. This week, I remove the console output from all the examples (after verifying that they work correctly) and republish the numbers.

Tim Jansen (author of the Actor's Guild framework), was also kind enough to build me an Actor's Guild version of my example.

Rajesh also took a look at the code for the Kilim example at my request, and he pointed out several improvements that may make Kilim run faster. I have incorporated his suggestions into the rewritten code. He also pointed me to his ActorFoundry project, and, over the last couple of days, he has been of immense assistance (via email) in helping me to build an ActorFoundry version of my toy example.

This week, I provide the updated code for Kilim and Jetlang, and code to work with Actor's Guild and ActorFoundry, and provide the elapsed time comparison between these frameworks (as well as the Scala examples from last week). In many ways, this post is largely due to the efforts of these three fine programmers. Thank you, gentlemen!

Kilim - updated

The original Kilim code for my example used a Message object that was passed around between the Actors. Since the Jetlang example just used a String messsage (which was really all that my example needed), I changed it over to use a String instead of the Message object, thereby removing the extra instanceof checks to distinguish between a regular message and a poison pill termination messsage. The code is explained in detail in my previous post, I just just post the updated code here.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
// Source: src/main/java/com/mycompany/myapp/concurrent/kilim/ActorManager.java
package com.mycompany.myapp.concurrent.kilim;

import java.util.concurrent.TimeUnit;

import kilim.ExitMsg;
import kilim.Mailbox;

public class ActorManager {

  public static final String STOP = "__STOP__";
  private static final int ACTOR_THREAD_POOL_SIZE = 2;
  
  public static void main(String[] args) {
    Mailbox<String> mb0 = new Mailbox<String>();
    Mailbox<String> mb1 = new Mailbox<String>();
    Mailbox<String> mb2 = new Mailbox<String>();
    Mailbox<ExitMsg> callback = new Mailbox<ExitMsg>();
    
    // instantiate actors
    DownloadActor downloadActor = new DownloadActor(ACTOR_THREAD_POOL_SIZE, mb0, mb1);
    IndexActor indexActor = new IndexActor(ACTOR_THREAD_POOL_SIZE, mb1, mb2);
    WriteActor writeActor = new WriteActor(ACTOR_THREAD_POOL_SIZE, mb2, null);
    
    // start the actors
    downloadActor.start();
    indexActor.start();
    writeActor.start();
    writeActor.informOnExit(callback);
    
    long start = System.nanoTime();
    int numTasks = 1000000;
    for (int i = 0; i < numTasks; i++) {
      String req = "Requested " + i;
      mb0.putnb(req);
      log(req);
    }
    
    // poison pill to stop the actors
    mb0.putnb(ActorManager.STOP);
    // block till the last actor has informed the manager that it exited
    callback.getb();
    long elapsed = System.nanoTime() - start;
    System.out.println("elapsed=" + TimeUnit.MILLISECONDS.convert(elapsed, TimeUnit.NANOSECONDS));
    System.exit(0);
  }
  
  public static void log(String message) {
//    System.out.println(message);
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Source: src/main/java/com/mycompany/myapp/concurrent/kilim/Actor.java
package com.mycompany.myapp.concurrent.kilim;

import kilim.Mailbox;
import kilim.Task;
import kilim.pausable;

public abstract class Actor extends Task {

  private Mailbox<String> inbox;
  private Mailbox<String> outbox;
  
  public Actor(int numThreads, Mailbox<String> inbox, Mailbox<String> outbox) {
    this.inbox = inbox;
    this.outbox = outbox;
//    setScheduler(new Scheduler(numThreads));
  }

  @pausable
  public void execute() {
    for (;;) {
      String request = inbox.get();
      // this is custom poison pill handling code for our application
      if (request.equals(ActorManager.STOP)) {
        if (outbox != null) {
          outbox.put(request);
        }
        break;
      }
      // end of poison pill handling
      String response = act(request);
      ActorManager.log(response);
      if (outbox != null) {
        outbox.put(response);
      }
    }
  }
  
  public abstract String act(String request);
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
// Source: src/main/java/com/mycompany/myapp/concurrent/kilim/DownloadActor.java
package com.mycompany.myapp.concurrent.kilim;

import kilim.Mailbox;

public class DownloadActor extends Actor {

  public DownloadActor(int numThreads, Mailbox<String> inbox, Mailbox<String> outbox) {
    super(numThreads, inbox, outbox);
  }

  @Override
  public String act(String request) {
    return request.replaceFirst("Requested ", "Downloaded ");
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
// Source: src/main/java/com/mycompany/myapp/concurrent/kilim/IndexActor.java
package com.mycompany.myapp.concurrent.kilim;

import kilim.Mailbox;

public class IndexActor extends Actor {

  public IndexActor(int numThreads, Mailbox<String> inbox, Mailbox<String> outbox) {
    super(numThreads, inbox, outbox);
  }

  @Override
  public String act(String request) {
    return request.replaceFirst("Downloaded ", "Indexed ");
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
// Source: src/main/java/com/mycompany/myapp/concurrent/kilim/WriteActor.java
package com.mycompany.myapp.concurrent.kilim;

import kilim.Mailbox;

public class WriteActor extends Actor {

  public WriteActor(int numThreads, Mailbox<String> inbox, Mailbox<String> outbox) {
    super(numThreads, inbox, outbox);
  }

  @Override
  public String act(String request) {
    return request.replaceFirst("Indexed ", "Wrote ");
  }
}

Jetlang - updated

The original code for my Jetlang example can be found here.

Mike rewrote my example quite a bit and made it part of the Jetlang distribution examples. You can browse the code in the Jetlang SVN repository. The main change is the refactoring out of the System.out.println() calls into the Main.log() method (the Main.java is the same as my ActorManager.java) and commenting it out for benchmarking. Other changes include changing the Message object into a String, and the addition of channels and listener to respond to the poison pill. Overall, the resulting code is more elegant than mine, so I've changed my code to reflect these changes.

Scala (loop/receive) - updated

The Scala versions (originally described here) remain almost unchanged, except that there is now a new function log in the ActorManager object, and all the Actors use this method to log the message. As in the Jetlang example, it's body is commented out. I have also changed the while(true) call in the previous example to use the loop method of Actor. Here is the code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
// Source: ActorManager.scala
package myjob

import java.lang._
import java.util.concurrent.CountDownLatch
import scala.actors._
import scala.actors.Actor._

object ActorManager {

  val latch = new CountDownLatch(3)
  def decrementLatch(): Unit = {
    latch.countDown
  }

  def log(message:String): Unit = {
    //println(message)
  }

  def main(args: Array[String]): Unit = {
    // start the actors
    DownloadActor.start
    IndexActor.start
    WriteActor.start
    // seed the download actor with requests
    val start = System.currentTimeMillis
    for (i <- 1 until 1000000) {
      val payload = "Requested " + i
      log(payload)
      DownloadActor ! payload
    }
    // ask them to stop
    DownloadActor ! StopMessage
    // wait for actors to stop
    latch.await
    println("elapsed = " + (System.currentTimeMillis - start))
  }
}

case class StopMessage()

object DownloadActor extends Actor {
  def act() {
    loop {
      receive {
        case payload: String => {
          val newPayload = payload.replaceFirst("Requested ", "Downloaded ")
          ActorManager.log(newPayload)
          IndexActor ! newPayload
        }
        case StopMessage => {
          ActorManager.log("Stopping download")
          IndexActor ! StopMessage
          ActorManager.decrementLatch
          exit
        }
      }
    }
  }
}

object IndexActor extends Actor {
  def act() {
    loop {
      receive {
        case payload: String => {
          val newPayload = payload.replaceFirst("Downloaded ", "Indexed ")
          ActorManager.log(newPayload)
          WriteActor ! newPayload
        }
        case StopMessage => {
          ActorManager.log("Stopping Index")
          WriteActor ! StopMessage
          ActorManager.decrementLatch
          exit
        }
      }
    }
  }
}

object WriteActor extends Actor {
  def act() {
    loop {
      receive {
        case payload: String => {
          val newPayload = payload.replaceFirst("Indexed ", "Wrote ")
          ActorManager.log(newPayload)
        }
        case StopMessage => {
          ActorManager.log("Stopping Write")
          ActorManager.decrementLatch
          exit
        }
      }
    }
  }
}

Scala (loop/react) - updated

The loop/react version of the above Scala example simply replaces the loop/receive calls with loop/react. In the interests of brevity, I am not including it here - just change the receive call to react in three places and you have the loop/react version.

Actor's Guild

The Actor's Guild framework provides a nice annotation based approach to build Actors. Methods that are marked as @Initializer roughly correspond to actor constructors, and methods annotated by @Message correspond roughly to the Actor.act() method. Both return an AsyncResult. @Message methods may take parameters. Actor's Guild provides an Actor class which all application Actors must extend. More information is available in the tutorial. The code (provided by Tim Jansen with some extra comments from me) is shown below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Source: src/main/java/com/mycompany/myapp/concurrent/actorsguild/ActorManager.java
package com.mycompany.myapp.concurrent.actorsguild;

import java.util.concurrent.TimeUnit;

import org.actorsguildframework.AsyncResult;
import org.actorsguildframework.DefaultAgent;

public class ActorManager {
  public static void main(String[] args) {
    DefaultAgent ag = new DefaultAgent();
    
    WriteActor writeActor = ag.create(WriteActor.class);
    IndexActor indexActor = ag.create(IndexActor.class).init(writeActor).get();
    DownloadActor downloadActor = 
      ag.create(DownloadActor.class).init(indexActor).get();

    // The original code allocated an array of AsyncResult[numberOfRequests]
    // and populated it by looping through the number of tasks and seeding the
    // downloadActor with its initial request. Although conceptually simpler, 
    // it needed a huge amount of memory and didn't scale well for 
    // numberOfRequests > 100,000. So the strategy is to batch the tasks
    // into blocks of 100,000 and submit until they are all processed.
    long start = System.nanoTime();
    int numberOfRequests = 1000000;
    int tasksDone = 0;
    while (tasksDone < numberOfRequests) {
      int batchSize = Math.min(numberOfRequests - tasksDone, 100000);
      AsyncResult[] results = new AsyncResult[batchSize];
      for (int i = 0; i < batchSize; i++) {
        results[i] = downloadActor.download(tasksDone + i, "Requested " + i);
      }
      ag.awaitAllUntilError(results);
      tasksDone += batchSize;
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("elapsed=" + TimeUnit.MILLISECONDS.convert(
      elapsed, TimeUnit.NANOSECONDS));
    ag.shutdown();
  }
  
  public static void log(String message) {
//    System.out.println(message);
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Source: src/main/java/com/mycompany/myapp/concurrent/actorsguild/DownloadActor.java
package com.mycompany.myapp.concurrent.actorsguild;

import org.actorsguildframework.Actor;
import org.actorsguildframework.AsyncResult;
import org.actorsguildframework.annotations.Initializer;
import org.actorsguildframework.annotations.Message;


public class DownloadActor extends Actor {
  public IndexActor indexActor;
  
  @Initializer
  public AsyncResult<DownloadActor> init(IndexActor indexActor) {
    this.indexActor = indexActor;
    return result(this);
  }
  
  @Message
  public AsyncResult<Void> download(int id, String payload) {
    String newPayload = payload.replaceFirst("Requested ", "Downloaded ");
    ActorManager.log(newPayload);
    return indexActor.index(id, newPayload);
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Source: src/main/java/com/mycompany/myapp/concurrent/actorsguild/IndexActor.java
package com.mycompany.myapp.concurrent.actorsguild;

import org.actorsguildframework.Actor;
import org.actorsguildframework.AsyncResult;
import org.actorsguildframework.annotations.Initializer;
import org.actorsguildframework.annotations.Message;

public class IndexActor extends Actor {
  public WriteActor writeActor;
  
  @Initializer
  public AsyncResult<IndexActor> init(WriteActor writeActor) {
    this.writeActor = writeActor;
    return result(this);
  }
  
  
  @Message
  public AsyncResult<Void> index(int id, String payload) {
    String newPayload = payload.replaceFirst("Downloaded ", "Indexed ");
    ActorManager.log(newPayload);
    return writeActor.write(id, newPayload);
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Source: src/main/java/com/mycompany/myapp/concurrent/actorsguild/WriteActor.java
package com.mycompany.myapp.concurrent.actorsguild;

import org.actorsguildframework.Actor;
import org.actorsguildframework.AsyncResult;
import org.actorsguildframework.annotations.Message;

public class WriteActor extends Actor {
  @Message
  public AsyncResult<Void> write(int id, String payload) {
    String newPayload = payload.replaceFirst("Indexed ", "Wrote ");
    ActorManager.log(newPayload);
    return noResult();
  }
}

Unlike Kilim, which uses bytecode enhancement as a post-compilation step, Actor's Guild uses bytecode enhancement at runtime to create several helper classes dynamically for each Actor. This is done once, the first time the Actor is created. Actor's Guild uses asm 3.1 (as opposed to asm-2.2.3 for Kilim) to do the bytecode enhancement.

The resulting code is quite easy to read. The initial version was even easier, but because we are pre-allocating the array of AsyncResult objects to hold the results, when there are a large number of requests to be processed, my machine was thrashing with a 2GB heap and times for 1 million tasks were quite high. So Tim made the change to batch them up in chunks, which yields much better times. Benchmarks aside, the idiom for breaking up a large concurrent job into batches of smaller size is quite neat, and could possibly find uses in similar situations elsewhere.

ActorFoundry

ActorFoundry uses Kilim internally. It provides a runner application (called Foundry) that the application Actors run within. Like Actor's Guild, it relies on annotations, and messages correspond to methods in the Actors. Unlike Actor's Guild, messages are sent using a send() call, the parameters of which identify the target actor and method name - it looks a bit like method invocation using Reflection. The methods which can be called as messages are marked with the @message annotation. In ActorFoundry, all components run within the foundry and must be Actors, so the ActorManager in my example is also an Actor. Here is the code. I wrote an initial version of the code based on the examples in the distribution, which didn't work, and which Rajesh modified so it would.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
// Source: src/main/java/com/mycompany/myapp/concurrent/actorfoundry/ActorManager.java
package com.mycompany.myapp.concurrent.actorfoundry;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import osl.manager.Actor;
import osl.manager.ActorName;
import osl.manager.RemoteCodeException;
import osl.manager.annotations.message;

public class ActorManager extends Actor {
  
  private static final long serialVersionUID = -8621318190754146319L;

  private static final CountDownLatch latch = new CountDownLatch(3);
  
  @message
  public void boot(Integer tasks) {
    try {
      ActorName downloadActor = create(DownloadActor.class, self());
      // seed the download actor with numRequests tasks
      long start = System.nanoTime();
      for (int i = 0; i < tasks; i++) {
        String message = "Requested " + i;
        send(downloadActor, "download", message);
//        send(stdout, "println", message);
      }
      // send poison pill to terminate actors
      send(downloadActor, "stop");
      // wait for all the actors to terminate after getting the poison pill
      latch.await();
      long elapsed = System.nanoTime() - start;
      send(stdout, "println", "elapsed=" + 
        TimeUnit.MILLISECONDS.convert(elapsed, TimeUnit.NANOSECONDS));
      System.exit(0);
    } catch (RemoteCodeException e) {
      e.printStackTrace();
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }
  
  @message
  public static void decrementLatch() {
    latch.countDown();
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Source: src/main/java/com/mycompany/myapp/concurrent/actorfoundry/DownloadActor.java
package com.mycompany.myapp.concurrent.actorfoundry;

import osl.manager.Actor;
import osl.manager.ActorName;
import osl.manager.RemoteCodeException;
import osl.manager.annotations.message;

public class DownloadActor extends Actor {

  private static final long serialVersionUID = -2311959419132224127L;

  private ActorName actorManager;
  private ActorName indexActor;
  
  public DownloadActor(ActorName manager) throws RemoteCodeException {
    actorManager = manager;
  }
  
  @message
  public void download(String message) throws RemoteCodeException {
    String newMessage = message.replaceFirst("Requested ", "Downloaded ");
    if (indexActor == null) {
      indexActor = create(IndexActor.class, actorManager);
    }
//    send(stdout, "println", newMessage);
    send(indexActor, "index", newMessage);
  }
  
  @message
  public void stop() throws RemoteCodeException {
    send(indexActor, "stop");
    ActorManager.decrementLatch();
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Source: src/main/java/com/mycompany/myapp/concurrent/actorfoundry/IndexActor.java
package com.mycompany.myapp.concurrent.actorfoundry;

import osl.manager.Actor;
import osl.manager.ActorName;
import osl.manager.RemoteCodeException;
import osl.manager.annotations.message;

public class IndexActor extends Actor {

  private static final long serialVersionUID = -7939186176349943105L;

  private ActorName actorManager;
  private ActorName writeActor;
  
  public IndexActor(ActorName manager) throws RemoteCodeException {
    actorManager = manager;
  }

  @message
  public void index(String message) throws RemoteCodeException {
    String newMessage = message.replaceFirst("Downloaded ", "Indexed ");
    if (writeActor == null) {
      writeActor = create(WriteActor.class, actorManager);
    }
//    send(stdout,"println",newMessage);
    send(writeActor, "write", newMessage);
  }
  
  @message 
  public void stop() throws RemoteCodeException {
    send(writeActor, "stop");
    ActorManager.decrementLatch();
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Source: src/main/java/com/mycompany/myapp/concurrent/actorfoundry/WriteActor.java
package com.mycompany.myapp.concurrent.actorfoundry;

import osl.manager.Actor;
import osl.manager.ActorName;
import osl.manager.RemoteCodeException;
import osl.manager.annotations.message;

public class WriteActor extends Actor {

  private static final long serialVersionUID = -4203081425372996186L;

  private ActorName actorManager;

  public WriteActor(ActorName manager) {
    actorManager = manager;
  }

  @message
  public void write(String message) throws RemoteCodeException {
    String newMessage = message.replaceFirst("Indexed ", "Wrote ");
//    send(stdout, "println", newMessage);
  }
  
  @message 
  public void stop() throws RemoteCodeException {
    ActorManager.decrementLatch();
  }
}

There is actually no necessity to comment out the console IO calls in this case, since stdout is an actor and processes the println() calls asynchronously, but I did this in any case, for consistency with the other examples.

Processing the code so it can be run is quite complex, and involves code generation before compilation and post-processing (using the Kilim weaver) after compilation. Here is the snippet from my build.xml that selectively works on the ActorFoundry code in the compile target (its mostly copied from the various targets from the build.xml file in the ActorFoundry distribution.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
  <target name="compile" depends="get-deps" description="Compile the code">
    <mkdir dir="${maven.build.output}"/>
    <javac srcdir="${maven.src.dir}"
           destdir="${maven.build.output}" 
           excludes="**/package.html" 
           debug="true" 
           deprecation="true" 
           optimize="false">
      <classpath refid="build.classpath"/>
    </javac>
    <!-- check local constraints happens for af only -->
    <apt srcdir="${maven.src.dir}/com/mycompany/myapp/concurrent/actorfoundry"
         compile="false"
         classpathref="build.classpath"
         debug="true"
         factory="osl.foundry.preprocessor.LocalSynchConstAPF"
         factorypathref="build.classpath"/>
    <!-- code generation for af only -->
    <delete dir="${maven.src-gen.dir}"/>
    <mkdir dir="${maven.src-gen.dir}"/>
    <javadoc private="true"
         doclet="osl.foundry.preprocessor.ExecutorCodeGen"
         docletpathref="build.classpath"
         classpathref="build.classpath"
         sourcepath="${maven.src.dir}"
         packagenames="com.mycompany.myapp.concurrent.actorfoundry">
      <arg line="-outdir ${maven.src-gen.dir}"/>
    </javadoc>
    <!-- compile generated code: for af only -->
    <javac srcdir="${maven.src-gen.dir}"
           destdir="${maven.build.output}"
           debug="on"
           fork="on">
      <classpath refid="build.classpath"/>
    </javac>
    <!-- weaving happens for kilim and af files -->
    <java classname="kilim.tools.Weaver" fork="yes">
      <classpath refid="weave.classpath"/>
      <assertions>
        <enable/>
      </assertions>
      <arg value="-x"/>
      <arg value="ExInvalid|test"/>
      <arg value="-d"/>
      <arg value="${maven.build.output}"/>
      <arg line="${kilim.prefix}.ActorManager 
                    ${kilim.prefix}.Actor 
                    ${kilim.prefix}.DownloadActor 
                    ${kilim.prefix}.IndexActor 
                    ${kilim.prefix}.WriteActor 
                    ${af.prefix}.ActorManagerExecutor 
                    ${af.prefix}.DownloadActorExecutor 
                    ${af.prefix}.IndexActorExecutor 
                    ${af.prefix}.WriteActorExecutor"/>
    </java>
  </target>

If you are observant, you will notice that there is no mention in my code of the *Executor class names I provide for weaving. These are actually code generated off the Actors shown by the ExecutorCodeGen code generator. To run the code, I use the following shell script. The class at the near end of the actor pipeline (ActorManager) is mentioned, and the parameter to its boot() method is provided. The -open means that the foundry remains running even after all the actors are finished. In the code, I have a System.exit(0) which terminates the run.

1
2
prompt$ java -cp $REPO/actorfoundry-1.0.jar:target/classes osl.foundry.FoundryStart \
    com.mycompany.myapp.concurrent.actorfoundry.ActorManager boot 1000000 -open

One thing to note about ActorFoundry is its license, it is not free for commercial use. There are plans for making the source repository visible to outsiders, and the distribution does come with lots of example code, but it would be nice if the project had a tutorial style user guide for new users to get started.

Elapsed Time Comparison

I ran all the examples, increasing the task size from 1 to 1,000,000 in power-steps of 10 (ie, 1, 10, 100, ..., 1,000,000). The results are shown below in graph and data form.

#-TASKSKILIMJETLANGSCALA
RECV
SCALA
REACT
ACTORS
GUILD
ACTOR
FOUNDRY
122545814170
10527447613142
10045538010678204
1000330310424509333557
100008609031921162611411497
100000242922865601424247403542
1000000197161865052601348373883424844

Please remember that these numbers are meaningless if you are trying to figure out which will perform the best for your application. All that the actors in my application do is replace a string with another. Real-world actors that you will write for your application are likely to do something less trivial that that, which could potentially cause threads to block, and perhaps result in very different performance characteristics. To figure out which actor application would be best suited to your application, you should run your own benchmarks - now that you've read this far, you are as familiar as I am with the various actor APIs, so writing your own application should be fairly simple.

Update - 2009-01-06

Phillip Haller (one of the people behind Scala's Actor framework) pointed out that there are some optimizations to Scala's internal thread pool implementation in version 2.7.3, so I reran the Scala examples with this version. The chart and table below summarize the results.

#-TASKSKILIMJETLANGSCALA
RECV
SCALA
REACT
ACTORS
GUILD
ACTOR
FOUNDRY
122524904170
10527554713142
1004553487578204
1000330310327382333557
100008609031351157011411497
100000242922865096327447403542
1000000197161865032975234693883424844

Update - 2009-01-12

Mats Henrickson pointed out that the Scala example was doing string concatenation and the Java examples were not, and also pointed out that a Message class was being used to pass the message as opposed to a plain string in the Java examples. He was kind enough to send me an updated version of the code. He noticed an 8% speedup in the Scala code as a result of these changes on his box (see comments below). I have made the updates to the Scala example above.