A couple of weeks ago, while aimlessly surfing the web (a relatively rare occurrence for me nowadays, thanks to Google), I came across someone's resume whose format I liked. At the bottom of the page, it said "Generated by XmlResume". That got me curious about what it was and how it could help me, so I decided to check it out.
With XmlResume, you write your resume once in a standard XML format, and XmlResume converts this XML into plain text, HTML and PDF. You can also filter out specific sections of the resume by setting an optional target attribute on any of the elements in the XML. So simple and elegant, yet such a powerful idea.
The last time I was looking for a job, the trend was to send out plain text resumes, which you would drop into the body of an email. Before that, it was PDF attachments. Apparently the trend now is to send them out as Microsoft Word attachments. Sadly, XmlResume cannot write out MS-Word docs, and even the text format it writes is not exactly what I am used to (it's formatted with margins and newlines to look almost like a Word or PDF doc, requiring extensive reformatting if I decided to send it in the body of an email).
I did take a quick look at the code, but decided it would be too much work to modify it to suit my requirements (MS-Word and email friendly text output). Thinking about this some more, I figured that if I could convert the XML into an OpenOffice text document (ODT), OpenOffice could then convert the ODT into a multitude of formats, including plain text, XHTML, PDF and Microsoft Word formats.
Since XmlResume is a Java application, I initially thought about adding this as an extension to it using the jOpenDocument library, but then found the odfpy Python library. Both of these are wrappers to write your content out into the OpenDocument format (ODF), which is basically just a zipped set of XML files. Since this was something that I would want to just run from the command line, writing the whole thing in Python seemed to be a simpler alternative than messing with Ant targets or shell script wrappers.
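Because an ODF file is just a zip archive, Python's standard zipfile module is enough to peek inside one. Here is a minimal sketch; the function name list_odf_members is made up for illustration and is not part of odfpy, and the demo builds a tiny stand-in archive in memory rather than reading a real document.

```python
import io
import zipfile


def list_odf_members(odf_file):
    """Return the names of the files packed inside an ODF archive.

    odf_file may be a path or a file-like object.
    """
    with zipfile.ZipFile(odf_file) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Stand-in archive so the sketch runs without a real .odt file.
    # A real document contains more members than these three.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", "<placeholder/>")
        archive.writestr("styles.xml", "<placeholder/>")
    buf.seek(0)
    print(list_odf_members(buf))
```

Pointing the same function at a real .odt path should list the XML files that make up the document.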
So I wrote a little Python script that parses the input XmlResume XML file into a bean using the XML parsing library elementtree, then converts the bean either to a plain text document (initially for testing) using plain file.write() calls or to an OpenOffice text document (.odt) using odfpy. Here it is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
from elementtree.ElementTree import parse
import getopt
from odf.opendocument import OpenDocumentText
from odf.style import FontFace
from odf.style import ListLevelProperties
from odf.style import ParagraphProperties
from odf.style import Style
from odf.style import TextProperties
from odf.text import List
from odf.text import ListItem
from odf.text import ListLevelStyleBullet
from odf.text import ListStyle
from odf.text import P
from odf.text import Span
import string

class ResumeModel:
    def __init__(self):
        self.name = None
        self.address = None
        self.phone = None
        self.email = None
        self.contacts = []
        self.objective_title = None
        self.objectives = []
        self.skillarea_title = None
        self.skillset_titles = []
        self.skillsets = [[]]
        self.jobs_title = None
        self.job_titles = []
        self.job_employers = []
        self.job_periods = []
        self.job_descriptions = []
        self.job_achievements = [[]]
        self.academics_title = None
        self.academics = []
        self.awards_title = None
        self.awards = []

    def to_string(self):
        print "name=", self.name
        print "address=", self.address
        print "phone=", self.phone
        print "email=", self.email
        for contact in self.contacts:
            print "contact=", contact
        print "objective_title", self.objective_title
        for objective in self.objectives:
            print "objective=", objective
        print "skillarea_title=", self.skillarea_title
        for skillset_title in self.skillset_titles:
            print "skillset_title:", skillset_title
        for skillset in self.skillsets:
            print "skillset=", ",".join(skillset)
        print "jobs_title=", self.jobs_title
        for job_title in self.job_titles:
            print "job_title=", job_title
        for job_employer in self.job_employers:
            print "job_employer=", job_employer
        for job_description in self.job_descriptions:
            print "job_description=", job_description
        for job_period in self.job_periods:
            print "job_period=", job_period
        for job_achievement in self.job_achievements:
            for job_achievement_item in job_achievement:
                print "achievement_item=", job_achievement_item
        print "academics_title", self.academics_title
        for academic in self.academics:
            print "academic=", academic
        print "awards_title=", self.awards_title
        for award in self.awards:
            print "award=", award

class XmlResumeParser():
    def __init__(self, input_file, target):
        self.target = target
        self.input = open(input_file, "r")
        self.root = parse(input_file).getroot()
        self.breadcrumb = []
        self.model = ResumeModel()
        self.skillset_idx = -1
        self.job_idx = -1
        self.degree_idx = -1
        self.award_idx = -1

    def close(self):
        self.input.close()

    def parse(self):
        self.parse_r(self.root)

    def parse_r(self, parent):
        if not self.process_target(parent, self.target):
            return
        self.breadcrumb.append(parent.tag)
        self.process_element(parent)
        for child in list(parent):
            self.parse_r(child)
        self.breadcrumb.pop()

    def process_target(self, parent, target):
        target_attr = parent.attrib.get("target")
        if target is None:
            if target_attr is None:
                return True
            else:
                return False
        else:
            if target_attr is None:
                return True
            else:
                if target.find('+') > -1 or target.find(',') > -1:
                    op = "+" if target.find('+') > -1 else ","
                    target_set = set(target.split(op))
                    target_attr_set = set(target_attr.split(op))
                    if target.find('+') > -1:
                        return True if len(target_set.intersection(target_attr_set)) \
                            == len(target_set) else False
                    else:
                        return True if len(target_set.intersection(target_attr_set)) > 0 \
                            else False
                else:
                    return True if target_attr == target else False

    def process_element(self, elem):
        # key is the slash-separated path from the root to this element
        key = "/".join(self.breadcrumb)
        tag = elem.tag
        last_tag = self.breadcrumb[-1:][0]
        if key.startswith("resume/header/name/"):
            self.model.name = self.append(self.model.name, elem.text)
        elif key.startswith("resume/header/address/"):
            if tag == "street":
                self.model.address = elem.text
            elif tag == "city" or tag == "state":
                self.model.address = self.append(self.model.address, elem.text, ", ")
            elif tag == "zip":
                self.model.address = self.append(self.model.address, elem.text, " ")
        elif key.startswith("resume/header/contact/"):
            if tag == "phone":
                self.model.phone = "PHONE: " + elem.text
            elif tag == "email":
                self.model.email = "EMAIL: " + elem.text
            else:
                self.model.contacts.append(string.upper(elem.tag) + ": " + elem.text)
        elif key == "resume/objective":
            self.model.objective_title = self.get_title(elem)
        elif key.startswith("resume/objective/"):
            self.model.objectives.append(elem.text)
        elif key == "resume/skillarea":
            self.model.skillarea_title = self.get_title(elem)
        elif key == "resume/skillarea/skillset":
            self.skillset_idx = self.skillset_idx + 1
            self.model.skillset_titles.append(self.get_title(elem))
            self.model.skillsets.append([])
        elif key == "resume/skillarea/skillset/skill":
            if elem.attrib.get("level") != None:
                self.model.skillsets[self.skillset_idx].append(elem.text +
                    " (" + elem.attrib.get("level") + ")")
            else:
                self.model.skillsets[self.skillset_idx].append(elem.text)
        elif key == "resume/history":
            self.model.jobs_title = self.get_title(elem)
        elif key == "resume/history/job":
            self.job_idx = self.job_idx + 1
            self.model.job_achievements.append([])
        elif key.startswith("resume/history/job/"):
            if tag == "jobtitle":
                self.model.job_titles.append(elem.text)
            elif tag == "employer":
                self.model.job_employers.append(elem.text)
            elif tag == "from":
                # the period is appended to the employer line as "(from - to)"
                if len(list(elem)) == 1:
                    date_from = self.format_date(list(elem)[0])
                    self.model.job_employers[self.job_idx] = \
                        self.model.job_employers[self.job_idx] + " (" + date_from
            elif tag == "to":
                if len(list(elem)) == 1:
                    date_to = self.format_date(list(elem)[0])
                    self.model.job_employers[self.job_idx] = \
                        self.model.job_employers[self.job_idx] + " - " + date_to + ")"
            elif tag == "description":
                self.model.job_descriptions.append(elem.text)
            elif tag == "achievement":
                self.model.job_achievements[self.job_idx].append(elem.text)
        elif key == "resume/academics":
            self.model.academics_title = self.get_title(elem)
        elif key == "resume/academics/degrees/degree":
            self.degree_idx = self.degree_idx + 1
            self.model.academics.append([])
        elif key.startswith("resume/academics/degrees/degree/"):
            if tag == "level":
                self.model.academics[self.degree_idx] = elem.text
            elif tag == "major":
                self.model.academics[self.degree_idx] = \
                    self.model.academics[self.degree_idx] + ", " + elem.text
            elif tag == "institution":
                self.model.academics[self.degree_idx] = \
                    self.model.academics[self.degree_idx] + " from " + elem.text
            elif tag == "from":
                # fixed: the closing paren of len() was misplaced, and the
                # formatted date (not elem.text) is what should be appended
                if len(list(elem)) == 1:
                    from_date = self.format_date(list(elem)[0])
                    self.model.academics[self.degree_idx] = \
                        self.model.academics[self.degree_idx] + " (" + from_date
            elif tag == "to":
                if len(list(elem)) == 1:
                    to_date = self.format_date(list(elem)[0])
                    self.model.academics[self.degree_idx] = \
                        self.model.academics[self.degree_idx] + " - " + to_date + ")"
        elif key == "resume/awards":
            self.model.awards_title = self.get_title(elem)
        elif key == "resume/awards/award":
            self.award_idx = self.award_idx + 1
            self.model.awards.append([])
        elif key.startswith("resume/awards/award/"):
            if tag == "title":
                self.model.awards[self.award_idx] = elem.text
            elif tag == "organization":
                self.model.awards[self.award_idx] = \
                    self.model.awards[self.award_idx] + " from " + elem.text
            elif tag == "date":
                award_date = self.format_date(elem)
                self.model.awards[self.award_idx] = \
                    self.model.awards[self.award_idx] + " (" + award_date + ")"

    def format_date(self, elem):
        # non-date children (e.g. <present/>) are rendered as their tag name
        if elem.tag != "date":
            return elem.tag
        dmy = ["", "", ""]
        for child in list(elem):
            if child.tag == "day":
                dmy[0] = child.text
            elif child.tag == "month":
                dmy[1] = child.text
            elif child.tag == "year":
                dmy[2] = child.text
            else:
                continue
        filtered_dmy = filter(lambda e : len(e) > 0, dmy)
        if len(filtered_dmy) > 0:
            return " ".join(filtered_dmy)

    def get_title(self, elem):
        title = elem.attrib.get("title")
        if title is None:
            return string.upper(elem.tag)
        else:
            return title

    def append(self, buf, str, sep=" "):
        if buf == None:
            buf = str
        else:
            buf = buf + sep + str
        return buf

class TextResumeWriter():
    def __init__(self, filename):
        self.file = open(filename, 'w')

    def write(self, model):
        self.writeln(model.name)
        self.writeln(model.address)
        self.writeln(", ".join([model.phone, model.email]))
        self.writeln(", ".join(model.contacts))
        self.writeln("-" * 80)
        self.writeln(model.objective_title)
        self.writeln()
        self.writeln("\n".join(model.objectives))
        self.writeln("-" * 80)
        self.writeln(model.skillarea_title)
        self.writeln()
        for i in range(0, len(model.skillset_titles)):
            self.writeln(model.skillset_titles[i] + ": " + ",".join(model.skillsets[i]))
        self.writeln("-" * 80)
        self.writeln(model.jobs_title)
        for i in range(0, len(model.job_titles)):
            self.writeln()
            self.writeln(model.job_titles[i])
            self.writeln(model.job_employers[i])
            self.writeln(model.job_descriptions[i])
            for achievement in model.job_achievements[i]:
                self.writeln("* " + achievement)
        self.writeln("-" * 80)
        self.writeln(model.academics_title)
        self.writeln()
        for academic in model.academics:
            self.writeln("* " + academic)
        self.writeln("-" * 80)
        self.writeln(model.awards_title)
        self.writeln()
        for award in model.awards:
            self.writeln("* " + award)

    def writeln(self, s=None):
        if s != None:
            self.file.write(s)
        self.file.write("\n")

    def close(self):
        self.file.close()

class OdfResumeWriter():
    def __init__(self, filename):
        self.filename = filename
        self.doc = OpenDocumentText()
        # font
        self.doc.fontfacedecls.addElement((FontFace(name="Arial", \
            fontfamily="Arial", fontsize="10", fontpitch="variable", \
            fontfamilygeneric="swiss")))
        # styles
        style_standard = Style(name="Standard", family="paragraph", \
            attributes={"class":"text"})
        style_standard.addElement(ParagraphProperties(punctuationwrap="hanging", \
            writingmode="page", linebreak="strict"))
        style_standard.addElement(TextProperties(fontname="Arial", \
            fontsize="10pt", fontsizecomplex="10pt", fontsizeasian="10pt"))
        self.doc.styles.addElement(style_standard)
        # automatic styles
        style_normal = Style(name="ResumeText", parentstylename="Standard", \
            family="paragraph")
        self.doc.automaticstyles.addElement(style_normal)
        style_bold_text = Style(name="ResumeBoldText", parentstylename="Standard", \
            family="text")
        style_bold_text.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        self.doc.automaticstyles.addElement(style_bold_text)
        style_list_text = ListStyle(name="ResumeListText")
        style_list_bullet = ListLevelStyleBullet(level="1", \
            stylename="ResumeListTextBullet", numsuffix=".", bulletchar=u'\u2022')
        style_list_bullet.addElement(ListLevelProperties(spacebefore="0.1in", \
            minlabelwidth="0.2in"))
        style_list_text.addElement(style_list_bullet)
        self.doc.automaticstyles.addElement(style_list_text)
        style_bold_para = Style(name="ResumeH2", parentstylename="Standard", \
            family="paragraph")
        style_bold_para.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        self.doc.automaticstyles.addElement(style_bold_para)
        style_bold_center = Style(name="ResumeH1", parentstylename="Standard", \
            family="paragraph")
        style_bold_center.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        style_bold_center.addElement(ParagraphProperties(textalign="center"))
        self.doc.automaticstyles.addElement(style_bold_center)

    def write(self, model):
        self.doc.text.addElement(P(text=model.name, stylename="ResumeH1"))
        self.doc.text.addElement(P(text=model.address, stylename="ResumeH1"))
        self.doc.text.addElement(P(text=", ".join([model.phone, model.email]), \
            stylename="ResumeH1"))
        for contact in model.contacts:
            self.doc.text.addElement(P(text=contact, stylename="ResumeH1"))
        self.nl()
        self.doc.text.addElement(P(text=model.objective_title, \
            stylename="ResumeH1"))
        self.nl()
        for objective in model.objectives:
            self.doc.text.addElement(P(text=objective, stylename="ResumeText"))
        self.nl()
        self.doc.text.addElement(P(text=model.skillarea_title, \
            stylename="ResumeH1"))
        self.nl()
        for i in range(0, len(model.skillset_titles)):
            skillset_line = P(text="")
            skillset_line.addElement(Span(text=model.skillset_titles[i], \
                stylename="ResumeBoldText"))
            skillset_line.addElement(Span(text=": ", stylename="ResumeBoldText"))
            skillset_line.addText(", ".join(model.skillsets[i]))
            self.doc.text.addElement(skillset_line)
        self.nl()
        self.doc.text.addElement(P(text=model.jobs_title, stylename="ResumeH1"))
        for i in range(0, len(model.job_titles)):
            self.nl()
            self.doc.text.addElement(P(text=model.job_titles[i], \
                stylename="ResumeH2"))
            self.doc.text.addElement(P(text=model.job_employers[i], \
                stylename="ResumeH2"))
            self.doc.text.addElement(P(text=model.job_descriptions[i], \
                stylename="ResumeText"))
            # list style name changed to match the ListStyle declared in
            # __init__ ("ResumeListText"); the original said "ResumeTextList"
            achievements_list = List(stylename="ResumeListText")
            for achievement in model.job_achievements[i]:
                achievements_listitem = ListItem()
                achievements_listitem.addElement(P(text=achievement, \
                    stylename="ResumeText"))
                achievements_list.addElement(achievements_listitem)
            self.doc.text.addElement(achievements_list)
        self.nl()
        self.doc.text.addElement(P(text=model.academics_title, \
            stylename="ResumeH1"))
        academics_list = List(stylename="ResumeListText")
        for academic in model.academics:
            academics_listitem = ListItem()
            academics_listitem.addElement(P(text=academic, stylename="ResumeText"))
            academics_list.addElement(academics_listitem)
        self.doc.text.addElement(academics_list)
        self.nl()
        self.doc.text.addElement(P(text=model.awards_title, stylename="ResumeH1"))
        awards_list = List(stylename="ResumeListText")
        for award in model.awards:
            awards_listitem = ListItem()
            awards_listitem.addElement(P(text=award, stylename="ResumeText"))
            awards_list.addElement(awards_listitem)
        self.doc.text.addElement(awards_list)
        self.nl()

    def nl(self):
        self.doc.text.addElement(P(text="\n", stylename="ResumeText"))

    def close(self):
        self.doc.save(self.filename)

def usage(msg=None):
    if msg:
        print "ERROR: %s" % (msg)
    print "Usage: %s -i input.xml -o output_file [-t target]" % (sys.argv[0])
    print "OPTIONS:"
    print "-i | --input  : input resume.xml file"
    print "-o | --output : output file name. Suffix dictates output format"
    print "              : supported formats (txt, odt)"
    print "-t | --target : filters elements for target if specified"
    print "              : (optional, default is None)"
    print "-h | --help   : print this message"
    sys.exit(2)

def get_writer(output):
    output_format = output.split(".")[-1:][0]
    if output_format == "txt":
        return TextResumeWriter(output)
    elif output_format == "odt":
        return OdfResumeWriter(output)
    else:
        return None

def main():
    try:
        # long options that take a value need a trailing "="
        (opts, args) = getopt.getopt(sys.argv[1:], "i:o:t:h",
            ["input=", "output=", "target=", "help"])
    except getopt.GetoptError:
        usage()
    if len(opts) == 0:
        usage()
    # initialize so the mandatory-argument check below works
    input = None
    output = None
    target = None
    for opt in opts:
        (key, value) = opt
        if key in ("-h", "--help"):
            usage()
        elif key in ("-i", "--input"):
            input = value
        elif key in ("-o", "--output"):
            output = value
        elif key in ("-t", "--target"):
            target = value
    if input is None or output is None:
        usage("Input and Output is mandatory")
    writer = get_writer(output)
    if writer is None:
        usage("Unsupported output format")
    parser = XmlResumeParser(input, target)
    parser.parse()
    writer.write(parser.model)
    parser.close()
    writer.close()

if __name__ == "__main__":
    main()
You call this from the command line using something like this:
sujit@cyclone:resume$ ./genresume.py --input your_resume.xml \
    --output your_resume.[txt|odt] \
    [--target="target1+target2+...|target1,target2,..."]
Specifying an output file with suffix .txt will create a text version of the resume (suitable for dropping into the body of an email as mentioned above), and specifying an .odt suffix will create an OpenOffice text document. I had initially meant for the text version to go away once I was done, but then found that OpenOffice does not do the ODT to text conversion correctly (it misses the bullets in list items).
The behavior of the target attribute is similar to that in XmlResume. Multiple targets can be specified, separated by plus or comma. If the separator is plus, all targets must be declared in the element for it to pass through the filter (AND filtering). If the separator is comma, any one of the targets needs to be declared in the XmlResume element for it to pass through the filter (OR filtering). In addition, elements with no target attribute are always passed through the filter.
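The filtering rules are easier to see in isolation. The sketch below restates them as a standalone function; the name passes_filter and the sample target names are illustrative only, and it follows the behavior of process_target in the script rather than XmlResume's own implementation.

```python
def passes_filter(requested, declared):
    """Decide whether an element survives target filtering.

    requested: the --target value from the command line, or None.
    declared:  the element's target attribute, or None.
    """
    if declared is None:
        # Elements without a target attribute always pass.
        return True
    if requested is None:
        # Targeted elements are dropped when no target was requested.
        return False
    if "+" in requested:
        # AND: every requested target must be declared on the element.
        return set(requested.split("+")) <= set(declared.split("+"))
    if "," in requested:
        # OR: a single shared target is enough.
        return bool(set(requested.split(",")) & set(declared.split(",")))
    # Single target: exact match against the attribute value.
    return requested == declared


if __name__ == "__main__":
    print(passes_filter("java+web", "java+web+mgmt"))  # AND, all present
    print(passes_filter("java+web", "java"))           # AND, one missing
    print(passes_filter("java,web", "web"))            # OR, one shared
```

Note that, as in the script, the separator used on the command line is also the one used to split the element's attribute value.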
One caveat - this is not a generic solution. That is, if you were planning on running this script against your own XmlResume XML resume, it very likely won't work the way you'd expect. While I have tried to model my own resume on others that I have seen in my industry (thereby making it somewhat standards-compliant), it is quite possible that your resume contains extra information or elements that I don't need and haven't handled. But if you know a bit of Python, it should be fairly easy to modify this script to come up with something that works for you.
For reference (to match up with the parsing code above), here is a RELAX-NG like definition of the portion of the XmlResume schema that I have used in my resume.
resume {
  header {
    name { firstname, surname },
    address { street, city, state, zip },
    contact { phone, email, * }
  },
  objective {
    @title, para+
  },
  skillarea {
    @title,
    skillset {
      @title,
      skill { @level }+
    }+
  },
  history {
    @title,
    job {
      jobtitle, employer, period {
        from { date { year, month, day } },
        to { present | date { month, year, day } }
      },
      description,
      achievements { achievement+ }
    }+
  },
  academics {
    @title,
    degrees {
      degree { level, major, institution }+
    }
  },
  awards {
    @title,
    award { title, organization, date { year } }+
  }
}
As for programming challenges, my elementtree knowledge was quite rusty and I hadn't used odfpy before this, but there are enough examples on the web to get you started on either one. I initially tried a pure event-based parsing approach, since I needed to allow filtering on any element that had the target attribute, but then settled on a hybrid approach where I parse all elements in a generic way but save the text and attributes of some of them into a model bean, which is then written out in a specific format by the text and ODT writers. This approach makes it easier to maintain and extend the functionality (at least for me).
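The hybrid approach boils down to a generic recursive walk that tracks a breadcrumb of tag names and only acts on the paths it recognizes. Here is a stripped-down sketch of that idea. It uses the standard library's xml.etree.ElementTree instead of the standalone elementtree package, a plain dict instead of the model bean, and a made-up sample document, so treat it as an illustration rather than the script's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, covering only two parts of the schema.
SAMPLE = """
<resume>
  <header>
    <name><firstname>Jane</firstname><surname>Doe</surname></name>
  </header>
  <skillarea title="Skills">
    <skillset title="Languages">
      <skill level="expert">Python</skill>
      <skill>Java</skill>
    </skillset>
  </skillarea>
</resume>
"""


def walk(elem, breadcrumb, model):
    """Visit every element generically, saving only the paths we care about."""
    breadcrumb.append(elem.tag)
    key = "/".join(breadcrumb)
    if key.startswith("resume/header/name/"):
        model["name"].append(elem.text)
    elif key == "resume/skillarea/skillset/skill":
        model["skills"].append(elem.text)
    for child in elem:
        walk(child, breadcrumb, model)
    breadcrumb.pop()


def parse_resume(xml_text):
    """Parse the XML text and return the collected model as a dict."""
    model = {"name": [], "skills": []}
    walk(ET.fromstring(xml_text), [], model)
    return model


if __name__ == "__main__":
    print(parse_resume(SAMPLE))
```

Supporting a new element then means adding one more branch on the path key, which is what makes this style easy to extend.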
Just writing out the text into the ODT using odfpy was fairly simple, but it took me a while to get the formatting (font size, bolding, centering, etc.) right. The API documentation supplied with the odfpy distribution is not very useful. I ultimately wrote out an unformatted document, manually applied formatting to it, unzipped the resulting ODT file and pored through the styles.xml and content.xml to find the correct parameters to pass to the various odfpy functions. Once you know what to pass it, though, it works like a charm.
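The unzip-and-inspect step can also be scripted. The following sketch pulls the names of the style:style elements out of one member of an ODT archive using only the standard library. The function name style_names is invented for this example, and the demo runs against a tiny hand-built archive rather than a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "style:" prefix in OpenDocument files.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def style_names(odt_file, member="content.xml"):
    """Return the style:name of every style:style element in one member."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read(member))
    name_attr = "{%s}name" % STYLE_NS
    return [el.get(name_attr) for el in root.iter("{%s}style" % STYLE_NS)]


if __name__ == "__main__":
    # Stand-in for a real document: one member with two style definitions.
    xml = ('<doc xmlns:style="%s">'
           '<style:style style:name="P1" style:family="paragraph"/>'
           '<style:style style:name="T1" style:family="text"/>'
           '</doc>') % STYLE_NS
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("content.xml", xml)
    buf.seek(0)
    print(style_names(buf))
```

Against a manually formatted document, passing member="styles.xml" as well should show which named styles the formatting produced.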
I believe the solution I have now works better for me than just using XmlResume. For one, (like XmlResume) it allows me to maintain a single XML file for my resume, with information targeted to different job groups or industries I might be interested in. Second, by using OpenOffice as an intermediate output, it gives me the option of automatically writing it out into multiple formats, some of which are either not possible (MS Word) or difficult (PDF) with XmlResume. Third, as a nice side effect, it also supports an email friendly text format. Fourth, it offers a simple command line interface without any additional effort. The only downside is the need to modify the code if and when I decide to add more elements into my source XML file or modify the output format.
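The conversion from ODT to other formats can be driven from the command line as well. The sketch below assumes a LibreOffice-style soffice binary on the PATH that accepts --headless, --convert-to and --outdir. Older OpenOffice builds may not support these options, so this is one possible way to automate the step, not necessarily how it was done here. The function names are made up for illustration.

```python
import subprocess


def build_convert_command(odt_path, fmt, outdir="."):
    """Build the soffice command line that converts odt_path to fmt.

    Assumes LibreOffice-style options; fmt is an output extension
    such as "pdf", "doc" or "txt".
    """
    return ["soffice", "--headless", "--convert-to", fmt,
            "--outdir", outdir, odt_path]


def convert(odt_path, fmt, outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.check_call(build_convert_command(odt_path, fmt, outdir))


if __name__ == "__main__":
    # Only print the command so the sketch runs without soffice installed.
    print(" ".join(build_convert_command("your_resume.odt", "pdf")))
```

Calling convert("your_resume.odt", "pdf") would then write the PDF next to the source file, provided the binary is available.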
Update 2011-11-16: Found Serna Free, a nice XML Editor from Syntext, while looking for something to view larger and more complex compacted XML files at work. These files were malformed, so I could not use either Firefox/Chrome or my Python xmlcat script on them, but Serna opened them without problems. It also provides a stylesheet for XmlResume. Binaries are available for (at least) Linux and Mac OSX. Just putting it here in case it's useful; I will probably continue to hand-edit mine using vim.