I recently had to modify a shell script that downloaded a bunch of files from a remote machine using scp, verified that each remote file was the "same as" the downloaded file (using an md5 checksum match on the remote and local files), and did a few other things. I generally prefer using Python for my own scripting needs, but sadly Python hasn't caught on with our operations team, and since they were responsible for maintaining this particular script, it had to be a shell script. My shell scripting is a bit rusty, so the process was a bit rough ... and I kept asking myself how easy/hard it would be to do this with Python.
The shell script is run via a cron job, so it depends on passwordless ssh being set up between the two machines. During local testing, I did not have this set up, so another annoying thing was that I had to type in the password each time an ssh/scp call was made within the script. Of course, passwordless ssh is fairly easy to set up, but if you do this sort of thing infrequently (true in my case), it's a bit of a pain to figure out each time.
I had never used SSH from within Python before, but I found this article an excellent tutorial on how to get started. You need to install paramiko (which needs PyCrypto). Once downloaded, installation is simply a matter of unpacking the bundles and running "sudo python setup.py install". The process was the same for both my CentOS 5.x desktop and my Mac OS X laptop.
So in any case, to answer my own question, I ended up building a Python version of the same script. The resulting Python version is more verbose than the shell script, but it is more structured, has more optional features (so developers can specify command line options rather than have to hack the script to make it run in their local environment) and is much easier (for me anyway) to read.
There are some caveats though. First, paramiko supports the SFTP protocol but not the SCP protocol, so scp's have to be "faked" using the sftp client's get and put calls. This is not really a big deal, just something to be aware of: sftp is a subsystem of ssh, so if you have sshd running on your remote machine, you will usually get sftp along with it. The second caveat is that the sftp client does not support Unix wildcards, so you have to write application code to get each file by name, which makes your application code more verbose.
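To make the wildcard caveat concrete, here is a minimal sketch of expanding a pattern on the client side. The helper name `sftp_glob_get` is hypothetical, and the `sftp` argument is assumed to behave like the object returned by paramiko's `SSHClient.open_sftp()`, that is, it offers `listdir(path)` and `get(remotepath, localpath)`. It also assumes that every matching entry is a regular file.

```python
import fnmatch
import os
import posixpath


def sftp_glob_get(sftp, remote_dir, pattern, local_dir):
    """Download every entry in remote_dir whose name matches pattern.

    `sftp` is assumed to provide listdir(path) and
    get(remotepath, localpath), as paramiko's SFTPClient does.
    Matching entries are assumed to be regular files.
    """
    copied = []
    for name in sorted(sftp.listdir(remote_dir)):
        # The wildcard is expanded here, on the client, one name at a time.
        if fnmatch.fnmatchcase(name, pattern):
            remote_path = posixpath.join(remote_dir, name)
            local_path = os.path.join(local_dir, name)
            sftp.get(remote_path, local_path)
            copied.append(name)
    return copied
```

With a connected client, a call such as `sftp_glob_get(ssh_client.open_sftp(), "/var/log", "*.log", "/tmp")` would stand in for `scp host:/var/log/*.log /tmp`.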
Since the script is somewhat application specific, I decided to take the relevant parts and build them into a Python version of scp that optionally allows a password to be specified on the command line, and does an md5 checksum comparison between each remote and local file to verify the copy.
    sujit@cyclone:unix$ ./ssh_copy.py --help
    Usage: ssh_copy.py source_path target_path

    Options:
      -h, --help            show this help message and exit
      -p PASSWORD, --password=PASSWORD
                            SSH password
      -v, --verify          Verify copy

    Remote path should be specified as user@host:/full/path
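The user@host:/full/path format from the help text can be taken apart with plain string operations. The sketch below is one way to do it, not necessarily how the script does it; the function name `split_remote_path` is made up for illustration. Splitting only on the first "@" and the first ":" after it means the path part may itself contain either character.

```python
def split_remote_path(spec):
    """Split 'user@host:/full/path' into (user, host, path).

    Returns None when spec does not look like a remote path.
    """
    user, at, rest = spec.partition("@")
    host, colon, path = rest.partition(":")
    if not (user and at and host and colon and path):
        return None
    return user, host, path


if __name__ == "__main__":
    print(split_remote_path("sujit@avalanche:/tmp/python-scripts"))
    print(split_remote_path("/tmp/python-scripts"))
```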
The code for the script is shown below. The switch on the shebang line suppresses deprecation warnings from PyCrypto saying that the random number generator in the current release is broken. The rest of the code is fairly self-explanatory: the paramiko SSH client setup and teardown are in the open_sshclient and close_sshclient functions, and the meat of the code is in copy_files.
    #! /usr/bin/python -W ignore::DeprecationWarning
    # Copy a directory tree between a remote and the local machine over SFTP
    # (Python 2, requires paramiko)
    import commands
    import os
    import os.path
    import paramiko
    import re
    import sys
    from optparse import OptionParser


    def is_remote_path(path):
        # Remote paths look like user@host:/full/path
        return path.find("@") > -1 and path.find(":") > -1


    def parse_remote_path(path):
        # Split at most twice, so the path part may itself contain "@" or ":"
        return re.split("[@:]", path, 2)


    def validate_command(argv):
        usage = "Usage: %prog source_path target_path"
        epilog = "Remote path should be specified as user@host:/full/path"
        parser = OptionParser(usage=usage, epilog=epilog)
        parser.add_option("-p", "--password", dest="password",
            default="", help="SSH password")
        parser.add_option("-v", "--verify", action="store_true",
            dest="verify", default=False, help="Verify copy")
        # optparse handles -h/--help itself, so no explicit check is needed
        (opts, args) = parser.parse_args(argv)
        # args[0] is the script name, because all of sys.argv was parsed
        if len(args) != 3:
            print "Error: Too many or too few arguments supplied"
            parser.print_help()
            sys.exit(-1)
        # Exactly one of the two paths must be remote
        if is_remote_path(args[1]) == is_remote_path(args[2]):
            print "Error: One path should be remote and one local"
            parser.print_help()
            sys.exit(-1)
        cmd_args = {}
        cmd_args["password"] = opts.password
        cmd_args["verify"] = opts.verify
        if is_remote_path(args[1]):
            (user, host, source) = parse_remote_path(args[1])
            target = args[2]
            mode = "download"
        else:
            source = args[1]
            (user, host, target) = parse_remote_path(args[2])
            mode = "upload"
        cmd_args["user"] = user
        cmd_args["host"] = host
        cmd_args["source"] = source
        cmd_args["target"] = target
        cmd_args["mode"] = mode
        return cmd_args


    def is_download_mode(cmd_args):
        return cmd_args["mode"] == "download"


    def open_sshclient(cmd_args):
        ssh_client = paramiko.SSHClient()
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh_client.load_system_host_keys()
        if len(cmd_args["password"]) == 0:
            # No password given: rely on key-based (passwordless) login,
            # but still log in as the user named in the remote path
            ssh_client.connect(cmd_args["host"], username=cmd_args["user"])
        else:
            ssh_client.connect(cmd_args["host"],
                username=cmd_args["user"], password=cmd_args["password"])
        return ssh_client


    def find_remote_files(cmd_args, file_type, ssh):
        # file_type is "d" for directories or "f" for regular files
        (ssh_in, ssh_out, ssh_err) = ssh.exec_command(
            "find %s -type %s" % (cmd_args["source"], file_type))
        files = []
        for line in ssh_out.readlines():
            files.append(line.rstrip())
        return files


    def remote_mkdir(path, ssh):
        (ssh_in, ssh_out, ssh_err) = ssh.exec_command("mkdir -p %s" % path)
        # exec_command returns immediately; wait for mkdir to finish so the
        # directory exists before files are uploaded into it
        ssh_out.channel.recv_exit_status()


    def find_local_files(cmd_args, file_type):
        local_out = commands.getoutput(
            "find %s -type %s" % (cmd_args["source"], file_type))
        # Skip empty lines, e.g. when find prints nothing at all
        return [line for line in local_out.split("\n") if line]


    def get_remote_md5(path, ssh):
        # md5sum was not being found via SSH, so had to add full path
        (ssh_in, ssh_out, ssh_err) = ssh.exec_command(
            "/usr/local/bin/md5sum %s" % path)
        md5sum = None  # stays None if the remote command printed nothing
        for line in ssh_out.readlines():
            md5sum = line.split(" ")[0]
        return md5sum


    def get_local_md5(path):
        local_out = commands.getoutput("md5sum %s" % path)
        return local_out.split(" ")[0]


    def verify_files(source_file, target_file, cmd_args, ssh):
        if is_download_mode(cmd_args):
            local_md5 = get_local_md5(target_file)
            remote_md5 = get_remote_md5(source_file, ssh)
        else:
            local_md5 = get_local_md5(source_file)
            remote_md5 = get_remote_md5(target_file, ssh)
        if local_md5 == remote_md5:
            status = "OK"
        else:
            status = "Failed"
        # Always report the local file name
        if is_download_mode(cmd_args):
            print "Download %s (%s)" % (target_file, status)
        else:
            print "Upload %s (%s)" % (source_file, status)


    def copy_files(cmd_args, ssh):
        if is_download_mode(cmd_args):
            source_dirs = find_remote_files(cmd_args, "d", ssh)
            source_files = find_remote_files(cmd_args, "f", ssh)
        else:
            source_dirs = find_local_files(cmd_args, "d")
            source_files = find_local_files(cmd_args, "f")
        # find prefixes every result with the source path, so slicing off
        # that prefix gives the relative path (and avoids treating the
        # source path as a regular expression)
        prefix_len = len(cmd_args["source"])
        # Recreate the directory tree under the target first
        for source_dir in source_dirs:
            target_dir = "".join([cmd_args["target"], source_dir[prefix_len:]])
            if is_download_mode(cmd_args):
                if not os.path.isdir(target_dir):
                    os.mkdir(target_dir)
            else:
                remote_mkdir(target_dir, ssh)
        # Then copy the files one by one over a single SFTP session
        ftp = ssh.open_sftp()
        for source_file in source_files:
            target_file = "".join([cmd_args["target"], source_file[prefix_len:]])
            if is_download_mode(cmd_args):
                ftp.get(source_file, target_file)
            else:
                ftp.put(source_file, target_file)
            if cmd_args["verify"]:
                verify_files(source_file, target_file, cmd_args, ssh)
        ftp.close()


    def close_sshclient(ssh):
        ssh.close()


    def main():
        cmd_args = validate_command(sys.argv)
        ssh = open_sshclient(cmd_args)
        try:
            copy_files(cmd_args, ssh)
        finally:
            # Close the connection even if the copy fails part way through
            close_sshclient(ssh)


    if __name__ == "__main__":
        main()
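The script shells out to md5sum for the local checksum, which may not be installed on every machine. As an alternative, the local side can be computed in-process with the standard library's hashlib module. This is only a sketch: the function name `local_md5` and the chunk size are my own choices, not part of the script above.

```python
import hashlib


def local_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string has the same format as the first field of md5sum's output, so it can be compared directly against the remote checksum.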
Here are two examples of using this script: the first downloads files from a remote server, and the second uploads some files to the remote server. In both cases, we supply the password and ask that the file transfer be verified by comparing the MD5 checksums of the source and target files.
    sujit@cyclone:unix$ ./ssh_copy.py \
        sujit@avalanche:/Users/sujit/Projects/python-scripts \
        /tmp/python-scripts -p secret -v
    ...
    Download /tmp/python-scripts/src/unix/rcstool.py (OK)
    Download /tmp/python-scripts/src/unix/ssh_copy.py (OK)
    sujit@cyclone:unix$
    sujit@cyclone:unix$ ./ssh_copy.py \
        /Users/sujit/Projects/python-scripts \
        sujit@avalanche:/tmp/python-scripts -p secret -v
    ...
    Upload /Users/sujit/Projects/python-scripts/src/unix/rcstool.py (OK)
    Upload /Users/sujit/Projects/python-scripts/src/unix/ssh_copy.py (OK)
    sujit@cyclone:unix$
Obviously, for one-off command line usage, this is not a huge deal... you could simply use the built-in scp command to do the same thing (with much less effort). The fun starts when you want to embed one or more scp commands inside your script. That is when the convenience of keeping a single ssh connection open and running multiple commands over it, not having to deal with backtick hell, and having a full-blown language to parse returned values really starts to make a difference.
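As a rough illustration of that last point, here is a minimal sketch of running several commands over one open connection and getting their output back as Python lists. The helper name `run_commands` is hypothetical, and `ssh` is assumed to behave like a connected paramiko SSHClient, whose `exec_command(cmd)` returns a (stdin, stdout, stderr) tuple of file-like objects.

```python
def run_commands(ssh, commands):
    """Run each command over the same SSH connection.

    `ssh` is assumed to offer exec_command(cmd) returning a
    (stdin, stdout, stderr) tuple, as paramiko's SSHClient does.
    Returns a dict mapping each command to its output lines.
    """
    results = {}
    for command in commands:
        _stdin, stdout, _stderr = ssh.exec_command(command)
        output = stdout.read()
        # The stream may hand back bytes; normalise to text before splitting.
        if isinstance(output, bytes):
            output = output.decode("utf-8", "replace")
        results[command] = output.splitlines()
    return results
```

A call such as `run_commands(ssh, ["hostname", "df -h"])` then replaces two separate ssh invocations, and the results are ordinary lists of strings ready for further processing.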