Salmon Run: Resume Management with XmlResume, Python and OpenOffice

A couple of weeks ago, while aimlessly surfing the web (a relatively rare occurrence for me nowadays, thanks to Google), I came across someone's resume whose format I liked. At the bottom of the page, it said "Generated by XmlResume". That got me curious about what it was and how it could help me, so I decided to check it out.

With XmlResume, you write your resume once in a standard XML format, and XmlResume can convert this XML into plain text, HTML and PDF. You can also filter out specific sections of the resume by setting an optional target attribute on any of the elements in the XML. It is a simple and elegant, yet powerful, idea.

The last time I was looking for a job, the trend was to send out plain text resumes, which you would drop into the body of an email. Before that, it was PDF attachments. Apparently the trend now is to send them out as Microsoft Word attachments. Sadly, XmlResume cannot write out MS-Word docs, and even the text format it writes is not exactly what I am used to (it's formatted with margins and newlines to look almost like a Word or PDF doc, requiring extensive reformatting if I decided to send it in the body of an email).

I did take a quick look at the code, but decided it would be too much work to modify it to suit my requirements (MS-Word and email friendly text output). Thinking about this some more, I figured that if I could convert the XML into an OpenOffice text document (ODT), OpenOffice could then convert the ODT into a multitude of formats, including plain text, XHTML, PDF and Microsoft Word formats.

Since XmlResume is a Java application, I initially thought about adding this as an extension to it using the jOpenDocument library, but then found the odfpy Python library. Both of these are wrappers to write your content out into the OpenDocument format (ODF), which is basically just a zipped set of XML files. Since this was something that I would want to just run from the command line, writing the whole thing in Python seemed to be a simpler alternative than messing with Ant targets or shell script wrappers.

So I wrote a little Python script that parses the input XmlResume XML file into a bean using the XML parsing library elementtree, then converts the bean either to a plain text document (initially for testing) using plain file.write() calls, or to an OpenOffice text document (.odt) using odfpy. Here it is:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

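try:
  # bundled with the standard library since Python 2.5
  from xml.etree.ElementTree import parse
except ImportError:
  # fall back to the standalone elementtree package
  from elementtree.ElementTree import parse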
import getopt
from odf.opendocument import OpenDocumentText
from odf.style import FontFace
from odf.style import ListLevelProperties
from odf.style import ParagraphProperties
from odf.style import Style
from odf.style import TextProperties
from odf.text import List
from odf.text import ListItem
from odf.text import ListLevelStyleBullet
from odf.text import ListStyle
from odf.text import P
from odf.text import Span
import string


class ResumeModel:

  def __init__(self):
    self.name = None
    self.address = None
    self.phone = None
    self.email = None
    self.contacts = []
    self.objective_title = None
    self.objectives = []
    self.skillarea_title = None
    self.skillset_titles = []
    self.skillsets = []
    self.jobs_title = None
    self.job_titles = []
    self.job_employers = []
    self.job_periods = []
    self.job_descriptions = []
    self.job_achievements = []
    self.academics_title = None
    self.academics = []
    self.awards_title = None
    self.awards = []

  def to_string(self):
    print "name=", self.name
    print "address=", self.address
    print "phone=", self.phone
    print "email=", self.email
    for contact in self.contacts:
      print "contact=", contact
    print "objective_title=", self.objective_title
    for objective in self.objectives:
      print "objective=", objective
    print "skillarea_title=", self.skillarea_title
    for skillset_title in self.skillset_titles:
      print "skillset_title=", skillset_title
    for skillset in self.skillsets:
      print "skillset=", ",".join(skillset)
    print "jobs_title=", self.jobs_title
    for job_title in self.job_titles:
      print "job_title=", job_title
    for job_employer in self.job_employers:
      print "job_employer=", job_employer
    for job_description in self.job_descriptions:
      print "job_description=", job_description
    for job_period in self.job_periods:
      print "job_period=", job_period
    for job_achievement in self.job_achievements:
      for job_achievement_item in job_achievement:
        print "achievement_item=", job_achievement_item
    print "academics_title=", self.academics_title
    for academic in self.academics:
      print "academic=", academic
    print "awards_title=", self.awards_title
    for award in self.awards:
      print "award=", award


class XmlResumeParser():

  def __init__(self, input_file, target):
    self.target = target
    self.input = open(input_file, "r")
    self.root = parse(input_file).getroot()
    self.breadcrumb = []
    self.model = ResumeModel()
    self.skillset_idx = -1
    self.job_idx = -1
    self.degree_idx = -1
    self.award_idx = -1

  def close(self):
    self.input.close()

  def parse(self):
    self.parse_r(self.root)
    
  def parse_r(self, parent):
    if not self.process_target(parent, self.target):
      return
    self.breadcrumb.append(parent.tag)
    self.process_element(parent)
    for child in list(parent):
      self.parse_r(child)
    self.breadcrumb.pop()

  def process_target(self, parent, target):
    target_attr = parent.attrib.get("target")
    if target is None:
      if target_attr is None:
        return True
      else:
        return False
    else:
      if target_attr is None:
        return True
      else:
        if target.find('+') > -1 or target.find(',') > -1:
          op = "+" if target.find('+') > -1 else ","
          target_set = set(target.split(op))
          target_attr_set = set(target_attr.split(op))
          if target.find('+') > -1:
            return True if len(target_set.intersection(target_attr_set)) \
              == len(target_set) else False
          else:
            return True if len(target_set.intersection(target_attr_set)) > 0 \
              else False
        else:
          return True if target_attr == target else False

  def process_element(self, elem):
    key = "/".join(self.breadcrumb)
    tag = elem.tag
    last_tag = self.breadcrumb[-1:][0]
    if key.startswith("resume/header/name/"):
      self.model.name = self.append(self.model.name, elem.text)
    elif key.startswith("resume/header/address/"):
      if tag == "street":
        self.model.address = elem.text
      elif tag == "city" or tag == "state":
        self.model.address = self.append(self.model.address, elem.text, ", ")
      elif tag == "zip":
        self.model.address = self.append(self.model.address, elem.text, " ")
    elif key.startswith("resume/header/contact/"):
      if tag == "phone":
        self.model.phone = "PHONE: " + elem.text
      elif tag == "email":
        self.model.email = "EMAIL: " + elem.text
      else:
        self.model.contacts.append(elem.tag.upper() + ": " + elem.text)
    elif key == "resume/objective":
      self.model.objective_title = self.get_title(elem)
    elif key.startswith("resume/objective/"):
      self.model.objectives.append(elem.text)
    elif key == "resume/skillarea":
      self.model.skillarea_title = self.get_title(elem)
    elif key == "resume/skillarea/skillset":
      self.skillset_idx = self.skillset_idx + 1
      self.model.skillset_titles.append(self.get_title(elem))
      self.model.skillsets.append([])
    elif key == "resume/skillarea/skillset/skill":
      if elem.attrib.get("level") != None:
        self.model.skillsets[self.skillset_idx].append(elem.text +
          " (" + elem.attrib.get("level") + ")")
      else:
        self.model.skillsets[self.skillset_idx].append(elem.text)
    elif key == "resume/history":
      self.model.jobs_title = self.get_title(elem)
    elif key == "resume/history/job":
      self.job_idx = self.job_idx + 1
      self.model.job_achievements.append([])
    elif key.startswith("resume/history/job/"):
      if tag == "jobtitle":
        self.model.job_titles.append(elem.text)
      elif tag == "employer":
        self.model.job_employers.append(elem.text)
      elif tag == "from":
        if len(list(elem)) == 1:
          date_from = self.format_date(list(elem)[0])
          self.model.job_employers[self.job_idx] = \
            self.model.job_employers[self.job_idx] + " (" + date_from
      elif tag == "to":
        if len(list(elem)) == 1:
          date_to = self.format_date(list(elem)[0])
          self.model.job_employers[self.job_idx] = \
            self.model.job_employers[self.job_idx] + " - " + date_to + ")"
      elif tag == "description":
        self.model.job_descriptions.append(elem.text)
      elif tag == "achievement":
        self.model.job_achievements[self.job_idx].append(elem.text)
    elif key == "resume/academics":
      self.model.academics_title = self.get_title(elem)
    elif key == "resume/academics/degrees/degree":
      self.degree_idx = self.degree_idx + 1
      self.model.academics.append([])
    elif key.startswith("resume/academics/degrees/degree/"):
      if tag == "level":
        self.model.academics[self.degree_idx] = elem.text
      elif tag == "major":
        self.model.academics[self.degree_idx] = \
          self.model.academics[self.degree_idx] + ", " + elem.text
      elif tag == "institution":
        self.model.academics[self.degree_idx] = \
          self.model.academics[self.degree_idx] + " from " + elem.text
      elif tag == "from":
        # same handling as for job periods: one child, formatted as a date
        if len(list(elem)) == 1:
          from_date = self.format_date(list(elem)[0])
          self.model.academics[self.degree_idx] = \
            self.model.academics[self.degree_idx] + " (" + from_date
      elif tag == "to":
        if len(list(elem)) == 1:
          to_date = self.format_date(list(elem)[0])
          self.model.academics[self.degree_idx] = \
            self.model.academics[self.degree_idx] + " - " + to_date + ")"
    elif key == "resume/awards":
      self.model.awards_title = self.get_title(elem)
    elif key == "resume/awards/award":
      self.award_idx = self.award_idx + 1
      self.model.awards.append([])
    elif key.startswith("resume/awards/award/"):
      if tag == "title":
        self.model.awards[self.award_idx] = elem.text
      elif tag == "organization":
        self.model.awards[self.award_idx] = \
          self.model.awards[self.award_idx] + " from " + elem.text
      elif tag == "date":
        award_date = self.format_date(elem)
        self.model.awards[self.award_idx] = \
          self.model.awards[self.award_idx] + " (" + award_date + ")"

  def format_date(self, elem):
    if elem.tag != "date":
      return elem.tag
    dmy = ["", "", ""]
    for child in list(elem):
      if child.tag == "day":
        dmy[0] = child.text
      elif child.tag == "month":
        dmy[1] = child.text
      elif child.tag == "year":
        dmy[2] = child.text
      else:
        continue
    # drop missing parts; always return a string so callers can concatenate
    filtered_dmy = [e for e in dmy if e]
    return " ".join(filtered_dmy)

  def get_title(self, elem):
    title = elem.attrib.get("title")
    if title is None:
      return elem.tag.upper()
    else:
      return title

  def append(self, buf, text, sep=" "):
    # named "text" rather than "str" to avoid shadowing the builtin
    if buf is None:
      buf = text
    else:
      buf = buf + sep + text
    return buf


class TextResumeWriter():

  def __init__(self, filename):
    self.file = open(filename, 'w')

  def write(self, model):
    self.writeln(model.name)
    self.writeln(model.address)
    self.writeln(", ".join([model.phone, model.email]))
    self.writeln(", ".join(model.contacts))
    self.writeln("-" * 80)
    self.writeln(model.objective_title)
    self.writeln()
    self.writeln("\n".join(model.objectives))
    self.writeln("-" * 80)
    self.writeln(model.skillarea_title)
    self.writeln()
    for i in range(0, len(model.skillset_titles)):
      self.writeln(model.skillset_titles[i] + ": " + ",".join(model.skillsets[i]))
    self.writeln("-" * 80)
    self.writeln(model.jobs_title)
    for i in range(0, len(model.job_titles)):
      self.writeln()
      self.writeln(model.job_titles[i])
      self.writeln(model.job_employers[i])
      self.writeln(model.job_descriptions[i])
      for achievement in model.job_achievements[i]:
        self.writeln("* " + achievement)
    self.writeln("-" * 80)
    self.writeln(model.academics_title)
    self.writeln()
    for academic in model.academics:
      self.writeln("* " + academic)
    self.writeln("-" * 80)
    self.writeln(model.awards_title)
    self.writeln()
    for award in model.awards:
      self.writeln("* " + award)
      
  def writeln(self, s=None):
    if s != None:
      self.file.write(s)
    self.file.write("\n")
    
  def close(self):
    self.file.close()


class OdfResumeWriter():

  def __init__(self, filename):
    self.filename = filename
    self.doc = OpenDocumentText()
    # font
    self.doc.fontfacedecls.addElement((FontFace(name="Arial", \
      fontfamily="Arial", fontsize="10", fontpitch="variable", \
      fontfamilygeneric="swiss")))
    # styles
    style_standard = Style(name="Standard", family="paragraph", \
      attributes={"class":"text"})
    style_standard.addElement(ParagraphProperties(punctuationwrap="hanging", \
      writingmode="page", linebreak="strict"))
    style_standard.addElement(TextProperties(fontname="Arial", \
      fontsize="10pt", fontsizecomplex="10pt", fontsizeasian="10pt"))
    self.doc.styles.addElement(style_standard)
    # automatic styles
    style_normal = Style(name="ResumeText", parentstylename="Standard", \
        family="paragraph")
    self.doc.automaticstyles.addElement(style_normal)

    style_bold_text = Style(name="ResumeBoldText", parentstylename="Standard", \
        family="text")
    style_bold_text.addElement(TextProperties(fontweight="bold", \
      fontweightasian="bold", fontweightcomplex="bold"))
    self.doc.automaticstyles.addElement(style_bold_text)

    # name must match the stylename used by the List elements in write()
    style_list_text = ListStyle(name="ResumeTextList")
    style_list_bullet = ListLevelStyleBullet(level="1", \
      stylename="ResumeListTextBullet", numsuffix=".", bulletchar=u'\u2022')
    style_list_bullet.addElement(ListLevelProperties(spacebefore="0.1in", \
      minlabelwidth="0.2in"))
    style_list_text.addElement(style_list_bullet)
    self.doc.automaticstyles.addElement(style_list_text)

    style_bold_para = Style(name="ResumeH2", parentstylename="Standard", \
      family="paragraph")
    style_bold_para.addElement(TextProperties(fontweight="bold", \
      fontweightasian="bold", fontweightcomplex="bold"))
    self.doc.automaticstyles.addElement(style_bold_para)

    style_bold_center = Style(name="ResumeH1", parentstylename="Standard", \
        family="paragraph")
    style_bold_center.addElement(TextProperties(fontweight="bold", \
      fontweightasian="bold", fontweightcomplex="bold"))
    style_bold_center.addElement(ParagraphProperties(textalign="center"))
    self.doc.automaticstyles.addElement(style_bold_center)

  def write(self, model):
    self.doc.text.addElement(P(text=model.name, stylename="ResumeH1"))
    self.doc.text.addElement(P(text=model.address, stylename="ResumeH1"))
    self.doc.text.addElement(P(text=", ".join([model.phone, model.email]), \
      stylename="ResumeH1"))
    for contact in model.contacts:
      self.doc.text.addElement(P(text=contact, stylename="ResumeH1"))
    self.nl()
    self.doc.text.addElement(P(text=model.objective_title, \
      stylename="ResumeH1"))
    self.nl()
    for objective in model.objectives:
      self.doc.text.addElement(P(text=objective, stylename="ResumeText"))
    self.nl()
    self.doc.text.addElement(P(text=model.skillarea_title, \
      stylename="ResumeH1"))
    self.nl()
    for i in range(0, len(model.skillset_titles)):
      skillset_line = P(text="")
      skillset_line.addElement(Span(text=model.skillset_titles[i], \
        stylename="ResumeBoldText"))
      skillset_line.addElement(Span(text=": ", stylename="ResumeBoldText"))
      skillset_line.addText(", ".join(model.skillsets[i]))
      self.doc.text.addElement(skillset_line)
    self.nl()
    self.doc.text.addElement(P(text=model.jobs_title, stylename="ResumeH1"))
    for i in range(0, len(model.job_titles)):
      self.nl()
      self.doc.text.addElement(P(text=model.job_titles[i], \
        stylename="ResumeH2"))
      self.doc.text.addElement(P(text=model.job_employers[i], \
        stylename="ResumeH2"))
      self.doc.text.addElement(P(text=model.job_descriptions[i], \
        stylename="ResumeText"))
      achievements_list = List(stylename="ResumeTextList")
      for achievement in model.job_achievements[i]:
        achievements_listitem = ListItem()
        achievements_listitem.addElement(P(text=achievement, \
          stylename="ResumeText"))
        achievements_list.addElement(achievements_listitem)
      self.doc.text.addElement(achievements_list)
    self.nl()
    self.doc.text.addElement(P(text=model.academics_title, \
      stylename="ResumeH1"))
    academics_list = List(stylename="ResumeTextList")
    for academic in model.academics:
      academics_listitem = ListItem()
      academics_listitem.addElement(P(text=academic, stylename="ResumeText"))
      academics_list.addElement(academics_listitem)
    self.doc.text.addElement(academics_list)
    self.nl()
    self.doc.text.addElement(P(text=model.awards_title, stylename="ResumeH1"))
    awards_list = List(stylename="ResumeTextList")
    for award in model.awards:
      awards_listitem = ListItem()
      awards_listitem.addElement(P(text=award, stylename="ResumeText"))
      awards_list.addElement(awards_listitem)
    self.doc.text.addElement(awards_list)
    self.nl()

  def nl(self):
    self.doc.text.addElement(P(text="\n", stylename="ResumeText"))

  def close(self):
    self.doc.save(self.filename)


def usage(msg=None):
  if msg:
    print "ERROR: %s" % (msg)
  print "Usage: %s -i input.xml -o output_file [-t target]" % (sys.argv[0])
  print "OPTIONS:"
  print "-i | --input  : input resume.xml file"
  print "-o | --output : output file name. Suffix dictates output format"
  print "              : supported formats (txt, odt)"
  print "-t | --target : filters elements for target if specified"
  print "              : (optional, default is None)"
  print "-h | --help   : print this message"
  sys.exit(2)

def get_writer(output):
  output_format = output.split(".")[-1]
  if output_format == "txt":
    return TextResumeWriter(output)
  elif output_format == "odt":
    return OdfResumeWriter(output)
  else:
    return None

def main():
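  try:
    # long options that take a value need a trailing "="
    (opts, args) = getopt.getopt(sys.argv[1:], "i:o:t:h",
      ["input=", "output=", "target=", "help"])
  except getopt.GetoptError as e:
    usage(str(e))
  if len(opts) == 0:
    usage()
  # initialize every option so the mandatory check below works
  input = None
  output = None
  target = None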
  for opt in opts:
    (key, value) = opt
    if key in ("-h", "--help"):
      usage()
    elif key in ("-i", "--input"):
      input = value
    elif key in ("-o", "--output"):
      output = value
    elif key in ("-t", "--target"):
      target = value
  if input is None or output is None:
    usage("Input and output are mandatory")
  writer = get_writer(output)
  if writer is None:
    usage("Unsupported output format")
  parser = XmlResumeParser(input, target)
  parser.parse()
  writer.write(parser.model)
  parser.close()
  writer.close()

if __name__ == "__main__":
  main()

You call this from the command line using something like this:

sujit@cyclone:resume$ ./genresume.py --input your_resume.xml \
    --output your_resume.[txt|odt] \
    [--target="target1+target2+...|target1,target2,..."]

Specifying an output file with suffix .txt will create a text version of the resume (suitable for dropping into the body of an email as mentioned above), and specifying an .odt suffix will create an OpenOffice text document. I had initially meant for the text version to go away once I was done, but then found that OpenOffice does not do the ODT to text conversion correctly (it misses the bullets in list items).

The behavior of the target attribute is similar to that in XmlResume. Multiple targets can be specified, separated by plus or comma. If the separator is plus, all targets must be declared in the element for it to pass through the filter (AND filtering). If the separator is comma, any one of the targets needs to be declared in the XmlResume element for it to pass through the filter (OR filtering). In addition, elements with no target attribute are always passed through the filter.
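As a rough illustration, the filtering rules just described can be condensed into a single function. This is a minimal sketch written for Python 3 (the script above is Python 2); the name passes_target_filter is made up for this example, and the logic simply mirrors process_target in the script, including its behavior of dropping targeted elements when no target is requested.

```python
def passes_target_filter(requested, declared):
    """Decide whether an element survives the target filter.

    requested: value of the --target option (None when not given).
    declared:  value of the element's target attribute (None when absent).
    """
    if declared is None:
        # Elements without a target attribute always pass.
        return True
    if requested is None:
        # Targeted elements are dropped when no target was requested.
        return False
    if "+" in requested:
        # AND filtering: every requested target must be declared.
        return set(requested.split("+")) <= set(declared.split("+"))
    if "," in requested:
        # OR filtering: at least one requested target must be declared.
        return bool(set(requested.split(",")) & set(declared.split(",")))
    # Single target: the script compares the two strings exactly.
    return requested == declared


print(passes_target_filter("java+python", "java+python+sql"))
print(passes_target_filter("java,python", "sql"))
```

Note that, like the script, this splits the element's own target attribute on the same separator that was used on the command line.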

One caveat - this is not a generic solution. That is, if you were planning on running this script against your own XmlResume XML resume, it very likely won't work the way you'd expect. While I have tried to model my own resume on others that I have seen in my industry (thereby making it somewhat standards-compliant), it is quite possible that your resume contains extra information or elements that I don't need and haven't handled. But if you know a bit of Python, it should be fairly easy to modify this script to come up with something that works for you.

For reference (to match up with the parsing code above), here is a RELAX-NG like definition of the portion of the XmlResume schema that I have used in my resume.

resume {
  header {
    name { firstname, surname },
    address { street, city, state, zip },
    contact { phone, email, * }
  },
  objective {
    @title, para+
  },
  skillarea {
    @title,
    skillset { 
      @title, 
      skill { @level }+ 
    }+
  },
  history {
    @title,
    job {
      jobtitle, employer, period {
        from { date { year, month, day } },
        to { present | date { month, year, day } }
      },
      employer,
      description,
      achievements { achievement+ }
    }+
  },
  academics {
    @title,
    degrees {
      degree { level, major, institution }+
    }
  },
  awards {
    @title,
    award { title, organization, date { year } }+
  }
}

As for programming challenges, my elementtree knowledge was quite rusty and I hadn't used odfpy before this, but there are enough examples on the web to get you started on either one. I initially tried a pure event-based parsing approach, since I needed to allow filtering on any element that had the target attribute, but then settled on a hybrid approach where I parse all elements in a generic way, but save the text and attributes of some of them into a model bean, which is then written out in a specific format by the text and ODT writers. This approach makes the script easier to maintain and extend (at least for me).
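The hybrid approach can be sketched in a few lines: walk every element generically, keep a breadcrumb of tag names, and copy only the interesting paths into a model. The snippet below is a simplified Python 3 illustration, not the script itself; the sample XML, the walk function and the dictionary used as a model are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up resume fragment following the schema outline above.
SAMPLE = """
<resume>
  <header>
    <name><firstname>Jane</firstname><surname>Doe</surname></name>
  </header>
  <skillarea title="Skills">
    <skillset title="Languages">
      <skill level="expert">Python</skill>
      <skill>Java</skill>
    </skillset>
  </skillarea>
</resume>
"""


def walk(elem, breadcrumb, model):
    """Visit every element, but store data only for known breadcrumb paths."""
    breadcrumb.append(elem.tag)
    key = "/".join(breadcrumb)
    if key.startswith("resume/header/name/"):
        # Children of <name> are joined into a single display name.
        model["name"] = (model["name"] + " " + elem.text).strip()
    elif key == "resume/skillarea/skillset/skill":
        level = elem.get("level")
        model["skills"].append(elem.text + (" (%s)" % level if level else ""))
    for child in elem:
        walk(child, breadcrumb, model)
    breadcrumb.pop()


model = {"name": "", "skills": []}
walk(ET.fromstring(SAMPLE), [], model)
print(model)
```

Because the walk itself knows nothing about the resume, supporting a new element only means adding another branch on the breadcrumb key.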

Just writing out the text into the ODT using odfpy was fairly simple, but it took me a while to get the formatting (font size, bolding, centering, etc.) right. The API documentation supplied with the odfpy distribution is not very useful. I ultimately wrote out an unformatted document, manually applied formatting to it, unzipped the resulting ODT file and pored through styles.xml and content.xml to find the correct parameters to pass to the various odfpy functions. Once you know what to pass it, though, it works like a charm.
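The unzip-and-inspect step can also be done from Python, since an ODT file is a zip archive. Below is a small Python 3 sketch using only the standard library; list_styles is a hypothetical helper, and the in-memory package at the bottom is a stand-in for a real ODT so the example runs on its own. With a real document you would pass its file name instead.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ODF for style elements and attributes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def list_styles(odt, member):
    """Return (name, family) pairs for each style:style in one package member.

    odt may be a file name or a file-like object; member is typically
    "styles.xml" or "content.xml".
    """
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read(member))
    return [(el.get("{%s}name" % STYLE_NS), el.get("{%s}family" % STYLE_NS))
            for el in root.iter("{%s}style" % STYLE_NS)]


# Build a minimal stand-in package in memory (not a complete ODT).
content = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:style="%s">'
    '<office:automatic-styles>'
    '<style:style style:name="ResumeH1" style:family="paragraph"/>'
    '</office:automatic-styles>'
    '</office:document-content>' % STYLE_NS
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as package:
    package.writestr("content.xml", content)
buf.seek(0)

print(list_styles(buf, "content.xml"))
```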

I believe the solution I have now works better for me than just using XmlResume. For one, (like XmlResume) it allows me to maintain a single XML file for my resume, with information targeted to different job groups or industries I might be interested in. Second, by using OpenOffice as an intermediate output, it gives me the option of automatically writing it out into multiple formats, some of which are either not possible (MS Word) or difficult (PDF) with XmlResume. Third, as a nice side effect, it also supports an email friendly text format. Fourth, it offers a simple command line interface without any additional effort. The only downside is the need to modify the code if and when I decide to add more elements into my source XML file or modify the output format.
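The conversion from ODT to other formats can be done by hand from the OpenOffice GUI, but it could also be scripted. The sketch below is an assumption on my part rather than something the script above does: it relies on a LibreOffice-style soffice binary being on the PATH and supporting the --headless, --convert-to and --outdir options, and the helper names are made up for this example. It is written for Python 3.

```python
import subprocess


def convert_command(odt_path, fmt, outdir="."):
    """Build the command line that asks soffice to convert one file.

    Assumes a LibreOffice-style soffice binary; fmt is an output
    extension such as "pdf" or "doc".
    """
    return ["soffice", "--headless", "--convert-to", fmt,
            "--outdir", outdir, odt_path]


def convert(odt_path, fmt, outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.check_call(convert_command(odt_path, fmt, outdir))


# Only print the commands here, so the example runs without soffice installed.
for fmt in ("pdf", "doc"):
    print(" ".join(convert_command("your_resume.odt", fmt)))
```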

Update 2011-11-16: While looking for something to view larger and more complex compacted XML files at work, I found Serna Free, a nice XML editor from Syntext. These files were malformed, so I could not use Firefox, Chrome, or my Python xmlcat script on them, but Serna opened them without problems. It also provides a stylesheet for XmlResume. Binaries are available for (at least) Linux and Mac OS X. I am just putting it here in case it's useful; I will probably continue to hand-edit mine using vim.

1 comments (moderated to prevent spam):

Jason R. Coombs said...: I also have been modeling my resume use XmlResume for many years (since at least 2003). I use [Apache FOP](https://xmlgraphics.apache.org/fop/) and [my CherryPy web site](https://github.com/jaraco/jaraco.site/blob/71ee8587c186d68225959b939de7cd79dd1d8429/jaraco/site/resume.py#L28-L43) to live-render my resume to PDF and HTML.; 6/14/2024 12:55 PM

Friday, November 04, 2011