Salmon Run: Using SSH in Python with Paramiko

I recently had to modify a shell script that downloaded a bunch of files from a remote machine using scp, verified that each remote file was the "same as" its downloaded copy (by matching MD5 checksums of the remote and local files), and did a few other things. I generally prefer using Python for my own scripting needs, but sadly Python hasn't caught on with our operations team, and since they were responsible for maintaining this particular script, it had to be a shell script. My shell scripting is a bit rusty, so the process was a bit rough ... and I kept asking myself how easy/hard it would be to do this with Python.

The shell script is run via a cron job, so it depends on passwordless ssh being set up between the two machines. During local testing, I did not have this set up, so another annoying thing was that I had to type in the password each time an ssh/scp call was made within the script. Of course, passwordless ssh is fairly easy to set up, but if you do this sort of thing infrequently (true in my case), it's a bit of a pain to figure out each time.

I had never used SSH from within Python before - but I found this article an excellent tutorial on how to get started. You need to install paramiko (which needs PyCrypto). Once downloaded, installation is simply a matter of unpacking the archives and running "sudo python setup.py install" in each - the process was the same for both my CentOS 5.x desktop and my Mac OS X laptop.

So in any case, to answer my own question, I ended up building a Python version of the same script. The resulting Python version is more verbose than the shell script, but it is more structured, has more optional features (so developers can specify command line options rather than have to hack the script to make it run in their local environment) and is much easier (for me anyway) to read.

There are some caveats though. First, paramiko supports the SFTP protocol but not the SCP protocol, so scp's have to be "faked" using the sftp client's get and put calls. This is not really a big deal, just something to be aware of: sftp is a subsystem of ssh, so if you have sshd running on your remote machine, you normally get sftp as well. The second caveat is that the sftp client does not support Unix wildcards, so you have to write application code to get each filename individually, which makes your application code more verbose.
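To make the wildcard caveat concrete, here is a minimal sketch of expanding a shell-style pattern on the client side and fetching each match with get. The function name sftp_get_matching is made up for illustration, and sftp is assumed to be an object with the listdir_attr and get calls of paramiko's SFTPClient (for example, the result of ssh.open_sftp()). It only looks at one directory level and is not taken from the script in this post.

```python
import fnmatch
import os
import posixpath
import stat


def sftp_get_matching(sftp, remote_dir, pattern, local_dir):
    """Download the regular files in remote_dir whose names match pattern.

    sftp is assumed to behave like paramiko's SFTPClient; the function
    name and parameters are hypothetical.
    """
    copied = []
    for attr in sftp.listdir_attr(remote_dir):
        # Skip directories, links and entries whose type is not reported.
        if attr.st_mode is None or not stat.S_ISREG(attr.st_mode):
            continue
        # Case-sensitive, shell-style match on the bare file name.
        if not fnmatch.fnmatchcase(attr.filename, pattern):
            continue
        # Remote paths are joined POSIX-style, local paths with the local rules.
        remote_path = posixpath.join(remote_dir, attr.filename)
        local_path = os.path.join(local_dir, attr.filename)
        sftp.get(remote_path, local_path)
        copied.append(local_path)
    return sorted(copied)
```

With a live connection this would be called along the lines of sftp_get_matching(ssh.open_sftp(), "/var/data", "*.log", "/tmp/out"), where both paths are placeholders.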

Since the script is somewhat application specific, I decided to take the relevant parts and build them into a Python version of scp that optionally allows a password to be specified on the command line, and does an md5 checksum comparison between each remote and local file to verify the copy.

sujit@cyclone:unix$ ./ssh_copy.py --help
Usage: ssh_copy.py source_path target_path

Options:
  -h, --help            show this help message and exit
  -p PASSWORD, --password=PASSWORD
                        SSH password
  -v, --verify          Verify copy

Remote path should be specified as user@host:/full/path

The code for the script is shown below. The -W switch on the shebang line suppresses deprecation warnings from PyCrypto, which complain that the random number generator in the current release is broken. The rest of the code is fairly self-explanatory: the paramiko SSH client setup and teardown are in the open_sshclient and close_sshclient functions, and the meat of the code is in copy_files.

#! /usr/bin/python -W ignore::DeprecationWarning
# Copy a directory of files from a remote to local machine

import commands
import os
import os.path
import paramiko
import re
import sys
from optparse import OptionParser

def is_remote_path(path):
  return path.find("@") > -1 and path.find(":") > -1

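def parse_remote_path(path):
  # split only on the first two delimiters so that a ":" or "@" inside
  # the path part of user@host:/full/path is left alone
  return re.split("[@:]", path, maxsplit=2)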

def validate_command(argv):
  usage = "Usage: %prog source_path target_path"
  epilog = "Remote path should be specified as user@host:/full/path"
  parser = OptionParser(usage=usage, epilog=epilog)
  parser.add_option("-p", "--password", dest="password",
    default="", help="SSH password")
  parser.add_option("-v", "--verify", action="store_true",
    dest="verify", default=False, help="Verify copy")
  # optparse handles -h/--help itself: it prints the help text and exits
  (opts, args) = parser.parse_args(argv)
  if len(args) != 3:
    print "Error: Too many or too few arguments supplied"
    parser.print_help()
    sys.exit(-1)
  if (is_remote_path(args[1]) and is_remote_path(args[2])) or \
    (not is_remote_path(args[1]) and not is_remote_path(args[2])):
    print "Error: One path should be remote and one local"
    parser.print_help()
    sys.exit(-1)
  cmd_args = {}
  cmd_args["password"] = opts.password
  cmd_args["verify"] = opts.verify
  if is_remote_path(args[1]):
    (user, host, source) = parse_remote_path(args[1])
    target = args[2]
    mode = "download"
  else:
    source = args[1]
    (user, host, target) = parse_remote_path(args[2])
    mode = "upload"
  cmd_args["user"] = user
  cmd_args["host"] = host
  cmd_args["source"] = source
  cmd_args["target"] = target
  cmd_args["mode"] = mode
  return cmd_args

def is_download_mode(cmd_args):
  return cmd_args["mode"] == "download"

def open_sshclient(cmd_args):
  ssh_client = paramiko.SSHClient()
  ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
  ssh_client.load_system_host_keys()
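  # always pass the user from user@host; without it paramiko falls back
  # to the local login name. With no password, key-based auth is tried.
  if len(cmd_args["password"]) == 0:
    ssh_client.connect(cmd_args["host"], username=cmd_args["user"])
  else:
    ssh_client.connect(cmd_args["host"], \
      username=cmd_args["user"], password=cmd_args["password"])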
  return ssh_client

def find_remote_files(cmd_args, type, ssh):
  (ssh_in, ssh_out, ssh_err) = ssh.exec_command(
    "find %s -name \"*\" -type %s" % (cmd_args["source"], type))
  files = []
  for file in ssh_out.readlines():
    files.append(file.rstrip())
  return files

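def remote_mkdir(dir, ssh):
  # exec_command returns without waiting for the command, so block on the
  # exit status to be sure the directory exists before files are uploaded
  # into it; -p tolerates existing directories and creates parents
  (ssh_in, ssh_out, ssh_err) = ssh.exec_command("mkdir -p %s" % dir)
  ssh_out.channel.recv_exit_status()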

def find_local_files(cmd_args, type):
  local_out = commands.getoutput(
    "find %s -name \"*\" -type %s" % (cmd_args["source"], type))
  files = []
  for file in local_out.split("\n"):
    files.append(file)
  return files

def get_remote_md5(file, ssh):
  # md5sum was not being found via SSH, so had to add full path
  (ssh_in, ssh_out, ssh_err) = ssh.exec_command(
    "/usr/local/bin/md5sum %s" % file)
  for line in ssh_out.readlines():
    md5sum = line.split(" ")[0]
    return md5sum

def get_local_md5(file):
  local_out = commands.getoutput("md5sum %s" % file)
  return local_out.split(" ")[0]

def verify_files(source_file, target_file, cmd_args, ssh):
  if is_download_mode(cmd_args):
    local_md5 = get_local_md5(target_file)
    remote_md5 = get_remote_md5(source_file, ssh)
  else:
    local_md5 = get_local_md5(source_file)
    remote_md5 = get_remote_md5(target_file, ssh)
  if local_md5 == remote_md5:
    if is_download_mode(cmd_args):
      print "Download %s (%s)" % (target_file, "OK")
    else:
      print "Upload %s (%s)" % (source_file, "OK")
  else:
    if is_download_mode(cmd_args):
      print "Download %s (%s)" % (target_file, "Failed")
    else:
      print "Upload %s (%s)" % (source_file, "Failed")

def copy_files(cmd_args, ssh):
  if is_download_mode(cmd_args):
    source_dirs = find_remote_files(cmd_args, "d", ssh)
    source_files = find_remote_files(cmd_args, "f", ssh)
  else:
    source_dirs = find_local_files(cmd_args, "d")
    source_files = find_local_files(cmd_args, "f")
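  for source_dir in source_dirs:
    # strip the source prefix by length; re.sub would treat the path as
    # a regular expression and replace every occurrence of it
    rel_path = source_dir[len(cmd_args["source"]):]
    if is_download_mode(cmd_args):
      local_dir = "".join([cmd_args["target"], rel_path])
      # os.mkdir raises OSError if the directory is already there
      if not os.path.isdir(local_dir):
        os.mkdir(local_dir)
    else:
      remote_mkdir("".join([cmd_args["target"], rel_path]), ssh)
  ftp = ssh.open_sftp()
  for source_file in source_files:
    rel_path = source_file[len(cmd_args["source"]):]
    target_file = "".join([cmd_args["target"], rel_path])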
    if is_download_mode(cmd_args):
      ftp.get(source_file, target_file)
    else:
      ftp.put(source_file, target_file)
    if cmd_args["verify"]:
      verify_files(source_file, target_file, cmd_args, ssh)
  ftp.close()
  
def close_sshclient(ssh):
  ssh.close()
  
def main():
  cmd_args = validate_command(sys.argv)
  ssh = open_sshclient(cmd_args)
  copy_files(cmd_args, ssh)
  close_sshclient(ssh)

if __name__ == "__main__":
  main()
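The script above shells out to md5sum on both sides, which is why the remote call needed a hardcoded path. As an alternative, here is a minimal sketch that computes the checksum in Python with hashlib. The helper names are hypothetical and not part of the script. The remote variant assumes sftp behaves like paramiko's SFTPClient, whose open() returns a file-like object, and it reads the remote file over the connection a second time, so it trades extra traffic for not depending on a remote md5sum binary.

```python
import hashlib


def md5_of_stream(fileobj, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary file-like object."""
    digest = hashlib.md5()
    while True:
        # Read in chunks so large files are never held in memory whole.
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def local_md5(path):
    """MD5 of a local file, without spawning md5sum."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def remote_md5_via_sftp(sftp, path):
    """MD5 of a remote file read through an SFTP session.

    Assumption: sftp.open(path, "rb") returns a readable file-like
    object, as paramiko's SFTPClient does.
    """
    handle = sftp.open(path, "rb")
    try:
        return md5_of_stream(handle)
    finally:
        handle.close()
```

A verification step would then compare local_md5(target_file) with remote_md5_via_sftp(ftp, source_file) instead of parsing the output of two md5sum processes.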

Here are two examples of using this script: the first downloads files from a remote server, and the second uploads some files to the remote server. In both cases, we supply the password and ask that the file transfer be verified by comparing the MD5 checksums of the source and target files.

sujit@cyclone:unix$ ./ssh_copy.py \
  sujit@avalanche:/Users/sujit/Projects/python-scripts \
  /tmp/python-scripts -p secret -v
...
Download /tmp/python-scripts/src/unix/rcstool.py (OK)
Download /tmp/python-scripts/src/unix/ssh_copy.py (OK)
sujit@cyclone:unix$
sujit@cyclone:unix$ ./ssh_copy.py \
  /Users/sujit/Projects/python-scripts \
  sujit@avalanche:/tmp/python-scripts -p secret -v
...
Upload /Users/sujit/Projects/python-scripts/src/unix/rcstool.py (OK)
Upload /Users/sujit/Projects/python-scripts/src/unix/ssh_copy.py (OK)
sujit@cyclone:unix$

Obviously, for one-off command line usage, this is not a huge deal... you could simply use the built-in scp command to do the same thing (with much less effort). The fun starts when you want to embed one or more scp commands inside your script. Then the convenience of keeping a single ssh connection open and running multiple commands over it, not having to deal with backtick hell, and having a full-blown language to parse returned values really starts to make a difference.
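As a rough illustration of the "single connection, multiple commands" point, the sketch below runs a list of commands over one already-open client and collects each exit status and output. The function name run_commands is hypothetical; ssh is assumed to be a connected paramiko SSHClient (or anything with the same exec_command interface). It reads stdout before stderr, so it is only suitable for commands that write modest amounts to stderr.

```python
def run_commands(ssh, commands):
    """Run each command over one open connection.

    Returns a list of (command, exit_status, stdout, stderr) tuples.
    ssh is assumed to behave like a connected paramiko SSHClient.
    """
    results = []
    for command in commands:
        # Each exec_command call opens a new channel on the same connection.
        stdin, stdout, stderr = ssh.exec_command(command)
        out = stdout.read()
        err = stderr.read()
        # Blocks until the remote command has finished.
        status = stdout.channel.recv_exit_status()
        results.append((command, status, out, err))
    return results
```

The returned tuples can be filtered or parsed with ordinary Python, for example by keeping only the commands whose exit status is non-zero.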