A couple of weeks ago, while aimlessly surfing the web (a relatively rare occurrence for me nowadays, thanks to Google), I came across someone's resume whose format I liked. At the bottom of the page, it said "Generated by XmlResume". That got me curious about what it was and how it could help me, so I decided to check it out.
With XmlResume, you write your resume out once in a standard XML format, and XmlResume can parse this XML into plain text, HTML and PDF. You can also filter out specific sections of the resume by setting an optional target attribute to any of the elements in the XML. So simple and elegant, yet such a powerful idea.
The last time I was looking for a job, the trend was to send out plain text resumes, which you would drop into the body of an email. Before that, it was PDF attachments. Apparently the trend now is to send them out as Microsoft Word attachments. Sadly, XmlResume cannot write out MS Word docs, and even the text format it writes is not exactly what I am used to (it's formatted with margins and newlines to look almost like a Word or PDF doc, requiring extensive reformatting if I decided to send it in the body of an email).
I did take a quick look at the code, but decided it would be too much work to modify it to suit my requirements (MS-Word and email friendly text output). Thinking about this some more, I figured that if I could convert the XML into an OpenOffice text document (ODT), OpenOffice could then convert the ODT into a multitude of formats, including plain text, XHTML, PDF and Microsoft Word formats.
Since XmlResume is a Java application, I initially thought about adding this as an extension to it using the jOpenDocument library, but then found the odfpy Python library. Both of these are wrappers to write your content out into the OpenDocument format (ODF), which is basically just a zipped set of XML files. Since this was something that I would want to just run from the command line, writing the whole thing in Python seemed to be a simpler alternative than messing with Ant targets or shell script wrappers.
So I wrote a little Python script that parses the input XmlResume XML file into a bean using the XML parsing library elementtree, then converts the bean either to a plain text document (initially for testing) using plain file.write() calls, or to an OpenOffice text document (.odt) using odfpy. Here it is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
from elementtree.ElementTree import parse
import getopt
from odf.opendocument import OpenDocumentText
from odf.style import FontFace
from odf.style import ListLevelProperties
from odf.style import ParagraphProperties
from odf.style import Style
from odf.style import TextProperties
from odf.text import List
from odf.text import ListItem
from odf.text import ListLevelStyleBullet
from odf.text import ListStyle
from odf.text import P
from odf.text import Span
import string

class ResumeModel:
    """Plain bean holding the resume fields collected by the parser."""

    def __init__(self):
        self.name = None
        self.address = None
        self.phone = None
        self.email = None
        self.contacts = []
        self.objective_title = None
        self.objectives = []
        self.skillarea_title = None
        self.skillset_titles = []
        self.skillsets = [[]]
        self.jobs_title = None
        self.job_titles = []
        self.job_employers = []
        self.job_periods = []
        self.job_descriptions = []
        self.job_achievements = [[]]
        self.academics_title = None
        self.academics = []
        self.awards_title = None
        self.awards = []

    def to_string(self):
        # Debugging aid: dumps the model to stdout.
        print "name=", self.name
        print "address=", self.address
        print "phone=", self.phone
        print "email=", self.email
        for contact in self.contacts:
            print "contact=", contact
        print "objective_title", self.objective_title
        for objective in self.objectives:
            print "objective=", objective
        print "skillarea_title=", self.skillarea_title
        for skillset_title in self.skillset_titles:
            print "skillset_title:", skillset_title
        for skillset in self.skillsets:
            print "skillset=", ",".join(skillset)
        print "jobs_title=", self.jobs_title
        for job_title in self.job_titles:
            print "job_title=", job_title
        for job_employer in self.job_employers:
            print "job_employer=", job_employer
        for job_description in self.job_descriptions:
            print "job_description=", job_description
        for job_period in self.job_periods:
            print "job_period=", job_period
        for job_achievement in self.job_achievements:
            for job_achievement_item in job_achievement:
                print "achievement_item=", job_achievement_item
        print "academics_title", self.academics_title
        for academic in self.academics:
            print "academic=", academic
        print "awards_title=", self.awards_title
        for award in self.awards:
            print "award=", award

class XmlResumeParser():
    """Walks the XmlResume tree and copies selected elements into a ResumeModel."""

    def __init__(self, input_file, target):
        self.target = target
        self.input = open(input_file, "r")
        self.root = parse(input_file).getroot()
        self.breadcrumb = []
        self.model = ResumeModel()
        self.skillset_idx = -1
        self.job_idx = -1
        self.degree_idx = -1
        self.award_idx = -1

    def close(self):
        self.input.close()

    def parse(self):
        self.parse_r(self.root)

    def parse_r(self, parent):
        # Skip the whole subtree if the element is filtered out by target.
        if not self.process_target(parent, self.target):
            return
        self.breadcrumb.append(parent.tag)
        self.process_element(parent)
        for child in list(parent):
            self.parse_r(child)
        self.breadcrumb.pop()

    def process_target(self, parent, target):
        target_attr = parent.attrib.get("target")
        if target is None:
            if target_attr is None:
                return True
            else:
                return False
        else:
            if target_attr is None:
                return True
            else:
                if target.find('+') > -1 or target.find(',') > -1:
                    op = "+" if target.find('+') > -1 else ","
                    target_set = set(target.split(op))
                    target_attr_set = set(target_attr.split(op))
                    if target.find('+') > -1:
                        # AND: every requested target must be declared
                        return True if len(target_set.intersection(target_attr_set)) \
                            == len(target_set) else False
                    else:
                        # OR: any one requested target must be declared
                        return True if len(target_set.intersection(target_attr_set)) > 0 \
                            else False
                else:
                    return True if target_attr == target else False

    def process_element(self, elem):
        # key is the slash-separated path from the root to this element
        key = "/".join(self.breadcrumb)
        tag = elem.tag
        last_tag = self.breadcrumb[-1:][0]
        if key.startswith("resume/header/name/"):
            self.model.name = self.append(self.model.name, elem.text)
        elif key.startswith("resume/header/address/"):
            if tag == "street":
                self.model.address = elem.text
            elif tag == "city" or tag == "state":
                self.model.address = self.append(self.model.address, elem.text, ", ")
            elif tag == "zip":
                self.model.address = self.append(self.model.address, elem.text, " ")
        elif key.startswith("resume/header/contact/"):
            if tag == "phone":
                self.model.phone = "PHONE: " + elem.text
            elif tag == "email":
                self.model.email = "EMAIL: " + elem.text
            else:
                self.model.contacts.append(string.upper(elem.tag) + ": " + elem.text)
        elif key == "resume/objective":
            self.model.objective_title = self.get_title(elem)
        elif key.startswith("resume/objective/"):
            self.model.objectives.append(elem.text)
        elif key == "resume/skillarea":
            self.model.skillarea_title = self.get_title(elem)
        elif key == "resume/skillarea/skillset":
            self.skillset_idx = self.skillset_idx + 1
            self.model.skillset_titles.append(self.get_title(elem))
            self.model.skillsets.append([])
        elif key == "resume/skillarea/skillset/skill":
            if elem.attrib.get("level") != None:
                self.model.skillsets[self.skillset_idx].append(elem.text +
                    " (" + elem.attrib.get("level") + ")")
            else:
                self.model.skillsets[self.skillset_idx].append(elem.text)
        elif key == "resume/history":
            self.model.jobs_title = self.get_title(elem)
        elif key == "resume/history/job":
            self.job_idx = self.job_idx + 1
            self.model.job_achievements.append([])
        elif key.startswith("resume/history/job/"):
            if tag == "jobtitle":
                self.model.job_titles.append(elem.text)
            elif tag == "employer":
                self.model.job_employers.append(elem.text)
            elif tag == "from":
                if len(list(elem)) == 1:
                    date_from = self.format_date(list(elem)[0])
                    self.model.job_employers[self.job_idx] = \
                        self.model.job_employers[self.job_idx] + " (" + date_from
            elif tag == "to":
                if len(list(elem)) == 1:
                    date_to = self.format_date(list(elem)[0])
                    self.model.job_employers[self.job_idx] = \
                        self.model.job_employers[self.job_idx] + " - " + date_to + ")"
            elif tag == "description":
                self.model.job_descriptions.append(elem.text)
            elif tag == "achievement":
                self.model.job_achievements[self.job_idx].append(elem.text)
        elif key == "resume/academics":
            self.model.academics_title = self.get_title(elem)
        elif key == "resume/academics/degrees/degree":
            self.degree_idx = self.degree_idx + 1
            self.model.academics.append([])
        elif key.startswith("resume/academics/degrees/degree/"):
            if tag == "level":
                self.model.academics[self.degree_idx] = elem.text
            elif tag == "major":
                self.model.academics[self.degree_idx] = \
                    self.model.academics[self.degree_idx] + ", " + elem.text
            elif tag == "institution":
                self.model.academics[self.degree_idx] = \
                    self.model.academics[self.degree_idx] + " from " + elem.text
            elif tag == "from":
                # fixed: the closing paren was misplaced (len(list(elem) == 1)),
                # and the formatted date was computed but elem.text was appended
                if len(list(elem)) == 1:
                    from_date = self.format_date(list(elem)[0])
                    self.model.academics[self.degree_idx] = \
                        self.model.academics[self.degree_idx] + " (" + from_date
            elif tag == "to":
                if len(list(elem)) == 1:
                    to_date = self.format_date(list(elem)[0])
                    self.model.academics[self.degree_idx] = \
                        self.model.academics[self.degree_idx] + " - " + to_date + ")"
        elif key == "resume/awards":
            self.model.awards_title = self.get_title(elem)
        elif key == "resume/awards/award":
            self.award_idx = self.award_idx + 1
            self.model.awards.append([])
        elif key.startswith("resume/awards/award/"):
            if tag == "title":
                self.model.awards[self.award_idx] = elem.text
            elif tag == "organization":
                self.model.awards[self.award_idx] = \
                    self.model.awards[self.award_idx] + " from " + elem.text
            elif tag == "date":
                award_date = self.format_date(elem)
                self.model.awards[self.award_idx] = \
                    self.model.awards[self.award_idx] + " (" + award_date + ")"

    def format_date(self, elem):
        # Non-date children (for example <present/>) are rendered as their tag name.
        if elem.tag != "date":
            return elem.tag
        dmy = ["", "", ""]
        for child in list(elem):
            if child.tag == "day":
                dmy[0] = child.text
            elif child.tag == "month":
                dmy[1] = child.text
            elif child.tag == "year":
                dmy[2] = child.text
            else:
                continue
        filtered_dmy = filter(lambda e : len(e) > 0, dmy)
        if len(filtered_dmy) > 0:
            return " ".join(filtered_dmy)

    def get_title(self, elem):
        title = elem.attrib.get("title")
        if title is None:
            return string.upper(elem.tag)
        else:
            return title

    def append(self, buf, str, sep=" "):
        if buf == None:
            buf = str
        else:
            buf = buf + sep + str
        return buf

class TextResumeWriter():
    """Writes the model out as email-friendly plain text."""

    def __init__(self, filename):
        self.file = open(filename, 'w')

    def write(self, model):
        self.writeln(model.name)
        self.writeln(model.address)
        self.writeln(", ".join([model.phone, model.email]))
        self.writeln(", ".join(model.contacts))
        self.writeln("-" * 80)
        self.writeln(model.objective_title)
        self.writeln()
        self.writeln("\n".join(model.objectives))
        self.writeln("-" * 80)
        self.writeln(model.skillarea_title)
        self.writeln()
        for i in range(0, len(model.skillset_titles)):
            self.writeln(model.skillset_titles[i] + ": " + ",".join(model.skillsets[i]))
        self.writeln("-" * 80)
        self.writeln(model.jobs_title)
        for i in range(0, len(model.job_titles)):
            self.writeln()
            self.writeln(model.job_titles[i])
            self.writeln(model.job_employers[i])
            self.writeln(model.job_descriptions[i])
            for achievement in model.job_achievements[i]:
                self.writeln("* " + achievement)
        self.writeln("-" * 80)
        self.writeln(model.academics_title)
        self.writeln()
        for academic in model.academics:
            self.writeln("* " + academic)
        self.writeln("-" * 80)
        self.writeln(model.awards_title)
        self.writeln()
        for award in model.awards:
            self.writeln("* " + award)

    def writeln(self, s=None):
        if s != None:
            self.file.write(s)
        self.file.write("\n")

    def close(self):
        self.file.close()

class OdfResumeWriter():
    """Writes the model out as an OpenDocument text file (.odt) using odfpy."""

    def __init__(self, filename):
        self.filename = filename
        self.doc = OpenDocumentText()
        # font
        self.doc.fontfacedecls.addElement((FontFace(name="Arial", \
            fontfamily="Arial", fontsize="10", fontpitch="variable", \
            fontfamilygeneric="swiss")))
        # styles
        style_standard = Style(name="Standard", family="paragraph", \
            attributes={"class":"text"})
        style_standard.addElement(ParagraphProperties(punctuationwrap="hanging", \
            writingmode="page", linebreak="strict"))
        style_standard.addElement(TextProperties(fontname="Arial", \
            fontsize="10pt", fontsizecomplex="10pt", fontsizeasian="10pt"))
        self.doc.styles.addElement(style_standard)
        # automatic styles
        style_normal = Style(name="ResumeText", parentstylename="Standard", \
            family="paragraph")
        self.doc.automaticstyles.addElement(style_normal)
        style_bold_text = Style(name="ResumeBoldText", parentstylename="Standard", \
            family="text")
        style_bold_text.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        self.doc.automaticstyles.addElement(style_bold_text)
        style_list_text = ListStyle(name="ResumeListText")
        style_list_bullet = ListLevelStyleBullet(level="1", \
            stylename="ResumeListTextBullet", numsuffix=".", bulletchar=u'\u2022')
        style_list_bullet.addElement(ListLevelProperties(spacebefore="0.1in", \
            minlabelwidth="0.2in"))
        style_list_text.addElement(style_list_bullet)
        self.doc.automaticstyles.addElement(style_list_text)
        style_bold_para = Style(name="ResumeH2", parentstylename="Standard", \
            family="paragraph")
        style_bold_para.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        self.doc.automaticstyles.addElement(style_bold_para)
        style_bold_center = Style(name="ResumeH1", parentstylename="Standard", \
            family="paragraph")
        style_bold_center.addElement(TextProperties(fontweight="bold", \
            fontweightasian="bold", fontweightcomplex="bold"))
        style_bold_center.addElement(ParagraphProperties(textalign="center"))
        self.doc.automaticstyles.addElement(style_bold_center)

    def write(self, model):
        self.doc.text.addElement(P(text=model.name, stylename="ResumeH1"))
        self.doc.text.addElement(P(text=model.address, stylename="ResumeH1"))
        self.doc.text.addElement(P(text=", ".join([model.phone, model.email]), \
            stylename="ResumeH1"))
        for contact in model.contacts:
            self.doc.text.addElement(P(text=contact, stylename="ResumeH1"))
        self.nl()
        self.doc.text.addElement(P(text=model.objective_title, \
            stylename="ResumeH1"))
        self.nl()
        for objective in model.objectives:
            self.doc.text.addElement(P(text=objective, stylename="ResumeText"))
        self.nl()
        self.doc.text.addElement(P(text=model.skillarea_title, \
            stylename="ResumeH1"))
        self.nl()
        for i in range(0, len(model.skillset_titles)):
            skillset_line = P(text="")
            skillset_line.addElement(Span(text=model.skillset_titles[i], \
                stylename="ResumeBoldText"))
            skillset_line.addElement(Span(text=": ", stylename="ResumeBoldText"))
            skillset_line.addText(", ".join(model.skillsets[i]))
            self.doc.text.addElement(skillset_line)
        self.nl()
        self.doc.text.addElement(P(text=model.jobs_title, stylename="ResumeH1"))
        for i in range(0, len(model.job_titles)):
            self.nl()
            self.doc.text.addElement(P(text=model.job_titles[i], \
                stylename="ResumeH2"))
            self.doc.text.addElement(P(text=model.job_employers[i], \
                stylename="ResumeH2"))
            self.doc.text.addElement(P(text=model.job_descriptions[i], \
                stylename="ResumeText"))
            # fixed: the lists referenced "ResumeTextList", but the list style
            # declared in __init__ is named "ResumeListText"
            achievements_list = List(stylename="ResumeListText")
            for achievement in model.job_achievements[i]:
                achievements_listitem = ListItem()
                achievements_listitem.addElement(P(text=achievement, \
                    stylename="ResumeText"))
                achievements_list.addElement(achievements_listitem)
            self.doc.text.addElement(achievements_list)
        self.nl()
        self.doc.text.addElement(P(text=model.academics_title, \
            stylename="ResumeH1"))
        academics_list = List(stylename="ResumeListText")
        for academic in model.academics:
            academics_listitem = ListItem()
            academics_listitem.addElement(P(text=academic, stylename="ResumeText"))
            academics_list.addElement(academics_listitem)
        self.doc.text.addElement(academics_list)
        self.nl()
        self.doc.text.addElement(P(text=model.awards_title, stylename="ResumeH1"))
        awards_list = List(stylename="ResumeListText")
        for award in model.awards:
            awards_listitem = ListItem()
            awards_listitem.addElement(P(text=award, stylename="ResumeText"))
            awards_list.addElement(awards_listitem)
        self.doc.text.addElement(awards_list)
        self.nl()

    def nl(self):
        self.doc.text.addElement(P(text="\n", stylename="ResumeText"))

    def close(self):
        self.doc.save(self.filename)

def usage(msg=None):
    if msg:
        print "ERROR: %s" % (msg)
    print "Usage: %s -i input.xml -o output_file [-t target]" % (sys.argv[0])
    print "OPTIONS:"
    print "-i | --input  : input resume.xml file"
    print "-o | --output : output file name. Suffix dictates output format"
    print "              : supported formats (txt, odt)"
    print "-t | --target : filters elements for target if specified"
    print "              : (optional, default is None)"
    print "-h | --help   : print this message"
    sys.exit(2)

def get_writer(output):
    # The output file suffix selects the writer.
    output_format = output.split(".")[-1:][0]
    if output_format == "txt":
        return TextResumeWriter(output)
    elif output_format == "odt":
        return OdfResumeWriter(output)
    else:
        return None

def main():
    # fixed: long options that take a value need a trailing "=" in getopt,
    # otherwise "--input file.xml" and "--target=..." are rejected or misparsed
    try:
        (opts, args) = getopt.getopt(sys.argv[1:], "i:o:t:h",
            ["input=", "output=", "target=", "help"])
    except getopt.GetoptError:
        usage()
    if len(opts) == 0:
        usage()
    # fixed: initialise input and output so the mandatory check below
    # does not raise NameError when one of them is missing
    input = None
    output = None
    target = None
    for opt in opts:
        (key, value) = opt
        if key in ("-h", "--help"):
            usage()
        elif key in ("-i", "--input"):
            input = value
        elif key in ("-o", "--output"):
            output = value
        elif key in ("-t", "--target"):
            target = value
    if input is None or output is None:
        usage("Input and Output are mandatory")
    writer = get_writer(output)
    if writer is None:
        usage("Unsupported output format")
    parser = XmlResumeParser(input, target)
    parser.parse()
    writer.write(parser.model)
    parser.close()
    writer.close()

if __name__ == "__main__":
    main()
You call this from the command line using something like this:
sujit@cyclone:resume$ ./genresume.py --input your_resume.xml \
    --output your_resume.[txt|odt] \
    [--target="target1+target2+...|target1,target2,..."]
Specifying an output file with suffix .txt will create a text version of the resume (suitable for dropping into the body of an email as mentioned above), and specifying an .odt suffix will create an OpenOffice text document. I had initially meant for the text version to go away once I was done, but then found that OpenOffice does not do the ODT to text conversion correctly (it misses the bullets in list items).
The behavior of the target attribute is similar to that in XmlResume. Multiple targets can be specified, separated by plus or comma. If the separator is plus, all targets must be declared in the element for it to pass through the filter (AND filtering). If the separator is comma, any one of the targets needs to be declared in the XmlResume element for it to pass through the filter (OR filtering). In addition, elements with no target attribute are always passed through the filter.
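The filtering rules above can be condensed into a single predicate. This is a minimal sketch in Python 3 rather than the Python 2 of the script; the function name `passes_target_filter` is made up for illustration. It mirrors the script's behavior, including the detail that a single requested target is compared to the element's target attribute by plain equality.

```python
def passes_target_filter(requested, declared):
    """Return True if an element should be kept.

    requested: the --target value from the command line, or None.
    declared:  the element's target attribute, or None.
    """
    if declared is None:
        # Elements without a target attribute always pass.
        return True
    if requested is None:
        # A targeted element is dropped when no target was requested.
        return False
    if "+" in requested:
        # AND: every requested target must be declared on the element.
        return set(requested.split("+")) <= set(declared.split("+"))
    if "," in requested:
        # OR: one shared target is enough.
        return bool(set(requested.split(",")) & set(declared.split(",")))
    # Single target: exact match only.
    return requested == declared
```

For example, `passes_target_filter("web+java", "web")` is false because the element does not declare both targets, while `passes_target_filter("web,java", "java,python")` is true because one target is shared.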
One caveat - this is not a generic solution. That is, if you were planning on running this script against your own XmlResume XML resume, it very likely won't work the way you'd expect. While I have tried to model my own resume on others that I have seen in my industry (thereby making it somewhat standards-compliant), it is quite possible that your resume contains extra information or elements that I don't need and haven't handled. But if you know a bit of Python, it should be fairly easy to modify this script to come up with something that works for you.
For reference (to match up with the parsing code above), here is a RELAX NG-like definition of the portion of the XmlResume schema that I have used in my resume.
resume {
  header {
    name { firstname, surname },
    address { street, city, state, zip },
    contact { phone, email, * }
  },
  objective {
    @title, para+
  },
  skillarea {
    @title,
    skillset {
      @title,
      skill { @level }+
    }+
  },
  history {
    @title,
    job {
      jobtitle, employer, period {
        from { date { year, month, day } },
        to { present | date { month, year, day } }
      },
      description,
      achievements { achievement+ }
    }+
  },
  academics {
    @title,
    degrees {
      degree { level, major, institution }+
    }
  },
  awards {
    @title,
    award { title, organization, date { year } }+
  }
}
As for programming challenges, my elementtree knowledge was quite rusty and I had not used odfpy before, but there are enough examples on the web to get started with either one. I initially tried a purely event-based parsing approach, since I needed to filter on any element that carried the target attribute, but then settled on a hybrid approach: I walk all elements in a generic way, but save the text and attributes of some of them into a model bean, which the text and ODT writers then write out in their own formats. This approach makes the script easier to maintain and extend (at least for me).
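A stripped-down illustration of that hybrid approach: every element is visited generically, a breadcrumb of tag names gives the current path, and only paths of interest are copied into the model. This is a sketch in Python 3 using the standard library's `xml.etree.ElementTree` instead of the standalone elementtree package; the sample document, the dict-based model, and the single-target check are simplifications for illustration, not the script's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, loosely following the schema shown earlier.
SAMPLE = """
<resume>
  <header>
    <name><firstname>Jane</firstname><surname>Doe</surname></name>
  </header>
  <skillarea title="Skills">
    <skillset title="Languages">
      <skill level="expert">Python</skill>
      <skill target="java">Java</skill>
    </skillset>
  </skillarea>
</resume>
"""

def walk(elem, breadcrumb, model, target=None):
    """Visit elem and its children, saving interesting values into model."""
    declared = elem.get("target")
    # Simplified filter: untargeted elements pass, targeted ones must match.
    if declared is not None and declared != target:
        return
    breadcrumb.append(elem.tag)
    key = "/".join(breadcrumb)
    if key.startswith("resume/header/name/"):
        model.setdefault("name", []).append(elem.text)
    elif key == "resume/skillarea/skillset/skill":
        model.setdefault("skills", []).append(elem.text)
    for child in elem:
        walk(child, breadcrumb, model, target)
    breadcrumb.pop()

def build_model(xml_text, target=None):
    model = {}
    walk(ET.fromstring(xml_text), [], model, target)
    return model
```

Calling `build_model(SAMPLE)` collects the name parts and the untargeted skill, while `build_model(SAMPLE, "java")` also picks up the skill marked with `target="java"`. Because filtering happens before descending, a filtered-out element takes its whole subtree with it.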
Just writing out the text into the ODT using odfpy was fairly simple, but it took me a while to get the formatting (font size, bolding, centering, etc.) right. The API documentation supplied with the odfpy distribution is not very useful. I ultimately wrote out an unformatted document, manually applied formatting to it, unzipped the resulting ODT file and pored over styles.xml and content.xml to find the correct parameters to pass to the various odfpy functions. Once you know what to pass, though, it works like a charm.
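Since an ODT file is just a zip archive, that inspection step can be scripted instead of unzipping by hand. The following is a small sketch using only the Python 3 standard library; the helper names `list_odf_parts` and `dump_odf_part` are invented for this example.

```python
import zipfile
import xml.dom.minidom

def list_odf_parts(odt_path):
    """Return the names of the files stored inside the ODF package."""
    with zipfile.ZipFile(odt_path) as package:
        return package.namelist()

def dump_odf_part(odt_path, part="styles.xml"):
    """Return one XML part of the package, pretty-printed for reading."""
    with zipfile.ZipFile(odt_path) as package:
        raw = package.read(part)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

Running `print(dump_odf_part("formatted.odt", "content.xml"))` on a hand-formatted document (the file name here is a placeholder) shows the style names and attributes that the office suite generated, which can then be matched against the keyword arguments passed to odfpy.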
I believe the solution I have now works better for me than just using XmlResume. For one, (like XmlResume) it allows me to maintain a single XML file for my resume, with information targeted to different job groups or industries I might be interested in. Second, by using OpenOffice as an intermediate output, it gives me the option of automatically writing it out into multiple formats, some of which are either not possible (MS Word) or difficult (PDF) with XmlResume. Third, as a nice side effect, it also supports an email friendly text format. Fourth, it offers a simple command line interface without any additional effort. The only downside is the need to modify the code if and when I decide to add more elements into my source XML file or modify the output format.
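The post does not show how the ODT is turned into the other formats. One way to automate that step, assuming a LibreOffice installation whose `soffice` binary is on the PATH (an assumption, not something the original setup used), is its headless `--convert-to` mode. The sketch below only builds and runs that command; the function names are hypothetical.

```python
import subprocess

def conversion_command(odt_path, fmt, outdir="."):
    """Build the soffice command line that converts odt_path to fmt."""
    return ["soffice", "--headless", "--convert-to", fmt,
            "--outdir", outdir, odt_path]

def convert(odt_path, fmt, outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(conversion_command(odt_path, fmt, outdir), check=True)
```

With this, `convert("your_resume.odt", "pdf")` and `convert("your_resume.odt", "doc")` would produce the PDF and MS Word versions next to the source file, provided the installed office suite supports those export filters.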
Update 2011-11-16: Found Serna Free, a nice XML editor from Syntext, while looking for something to view larger and more complex compacted XML files at work. These files were malformed, so I could not use either Firefox/Chrome or my Python xmlcat script on them, but Serna opened them without problems. It also provides a stylesheet for XmlResume. Binaries are available for (at least) Linux and Mac OS X. Just putting it here in case it's useful; I will probably continue to hand-edit mine using vim.
1 comment:
I also have been modeling my resume using XmlResume for many years (since at least 2003). I use [Apache FOP](https://xmlgraphics.apache.org/fop/) and [my CherryPy web site](https://github.com/jaraco/jaraco.site/blob/71ee8587c186d68225959b939de7cd79dd1d8429/jaraco/site/resume.py#L28-L43) to live-render my resume to PDF and HTML.