I spent much of my last weekend generating large flat files of denormalized data from various data sources, and then converting it to Excel spreadsheets for human consumption. Although the process is quite simple, thanks to OpenOffice (I am a Linux user, in case you haven't guessed already), its a pain to have to navigate the GUI repeatedly to do this.
Looking around the web, I did not find a ready made script that would do what I wanted, but it seemed fairly simple to do, so I decided to do this rather than suffer through yet another GUI invocation just to convert my CSV file to Excel. To read and parse the file, I used Python's built in CSV module, and to write out the Excel spreadsheets, I downloaded and installed the pyExcelerator module. In addition, since my data files were rather large, I put in a splitting mechanism that would allow me to split my output into multiple Excel files of a specified file size. Here is the script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 | #!/usr/local/bin/python
# Tool to convert CSV files (with configurable delimiter and text wrap
# character) to Excel spreadsheets.
import string
import sys
import getopt
import re
import os
import os.path
import csv
from pyExcelerator import *
def usage():
""" Display the usage """
print "Usage:" + sys.argv[0] + " [OPTIONS] csvfile"
print "OPTIONS:"
print "--title|-t: If set, the first line is the title line"
print "--lines|-l n: Split output into files of n lines or less each"
print "--sep|-s c [def:,] : The character to use for field delimiter"
print "--output|o : output file name/pattern"
print "--help|h : print this information"
sys.exit(2)
def openExcelSheet(outputFileName):
""" Opens a reference to an Excel WorkBook and Worksheet objects """
workbook = Workbook()
worksheet = workbook.add_sheet("Sheet 1")
return workbook, worksheet
def writeExcelHeader(worksheet, titleCols):
""" Write the header line into the worksheet """
cno = 0
for titleCol in titleCols:
worksheet.write(0, cno, titleCol)
cno = cno + 1
def writeExcelRow(worksheet, lno, columns):
""" Write a non-header row into the worksheet """
cno = 0
for column in columns:
worksheet.write(lno, cno, column)
cno = cno + 1
def closeExcelSheet(workbook, outputFileName):
""" Saves the in-memory WorkBook object into the specified file """
workbook.save(outputFileName)
def getDefaultOutputFileName(inputFileName):
""" Returns the name of the default output file based on the value
of the input file. The default output file is always created in
the current working directory. This can be overriden using the
-o or --output option to explicitly specify an output file """
baseName = os.path.basename(inputFileName)
rootName = os.path.splitext(baseName)[0]
return string.join([rootName, "xls"], '.')
def renameOutputFile(outputFileName, fno):
""" Renames the output file name by appending the current file number
to it """
dirName, baseName = os.path.split(outputFileName)
rootName, extName = os.path.splitext(baseName)
backupFileBaseName = string.join([string.join([rootName, str(fno)], '-'), extName], '')
backupFileName = os.path.join(dirName, backupFileBaseName)
try:
os.rename(outputFileName, backupFileName)
except OSError:
print "Error renaming output file:", outputFileName, "to", backupFileName, "...aborting"
sys.exit(-1)
def validateOpts(opts):
""" Returns option values specified, or the default if none """
titlePresent = False
linesPerFile = -1
outputFileName = ""
sepChar = ","
for option, argval in opts:
if (option in ("-t", "--title")):
titlePresent = True
if (option in ("-l", "--lines")):
linesPerFile = int(argval)
if (option in ("-s", "--sep")):
sepChar = argval
if (option in ("-o", "--output")):
outputFileName = argval
if (option in ("-h", "--help")):
usage()
return titlePresent, linesPerFile, sepChar, outputFileName
def main():
""" This is how we are called """
try:
opts,args = getopt.getopt(sys.argv[1:], "tl:s:o:h", ["title", "lines=", "sep=", "output=", "help"])
except getopt.GetoptError:
usage()
if (len(args) != 1):
usage()
inputFileName = args[0]
try:
inputFile = open(inputFileName, 'r')
except IOError:
print "File not found:", inputFileName, "...aborting"
sys.exit(-1)
titlePresent, linesPerFile, sepChar, outputFileName = validateOpts(opts)
if (outputFileName == ""):
outputFileName = getDefaultOutputFileName(inputFileName)
workbook, worksheet = openExcelSheet(outputFileName)
fno = 0
lno = 0
titleCols = []
reader = csv.reader(inputFile, delimiter=sepChar)
for line in reader:
if (lno == 0 and titlePresent):
if (len(titleCols) == 0):
titleCols = line
writeExcelHeader(worksheet, titleCols)
else:
writeExcelRow(worksheet, lno, line)
lno = lno + 1
if (linesPerFile != -1 and lno >= linesPerFile):
closeExcelSheet(workbook, outputFileName)
renameOutputFile(outputFileName, fno)
fno = fno + 1
lno = 0
workbook, worksheet = openExcelSheet(outputFileName)
inputFile.close()
closeExcelSheet(workbook, outputFileName)
if (fno > 0):
renameOutputFile(outputFileName, fno)
if __name__ == "__main__":
main()
|
Based on my not finding completed examples for this kind of thing on the web, I can only assume that the script is probably too trivial for most seasoned programmers to consider contributing back to the community. However, based on the few questions I saw about whether such a thing exists, I would conclude that there are some programmers (like me) for whom this is sufficiently non-trivial to consider looking around before writing one themselves. If you are in the latter category, I hope you find this script useful.
Thank you. I am new to python, and dislike excel, but need to do just this.
ReplyDeleteCan I ask what license you are offering this under?
Hi Jerry, I am glad you find this script useful. As for licensing, I would appreciate an attribution if you feel it is practical (for example, if you are using it in open-source code), but other than that, you are welcome to use it as you see fit. Its a small enough thing considering the benefits I have received from other programmers posting their code and expertise online...
ReplyDeleteNice work, I'm going to give it a try on my macbook.
ReplyDeleteThanks for posting this, here's a quick hack to get it working on Mac where the CSV module chokes on '\r' newline characters:
ReplyDeleteModify the file read in main() to use universal newline for Macs:
inputFile = open(inputFileName,'rUb')
-Pete
Thanks very much for the patch, Peter. I don't use Mac OSX, but this would be helpful info for the people that do. Thanks again.
ReplyDeleteHi there Sujit,
ReplyDeleteJust leaving a comment to say how helpful this was for me. Had to make a few minor changes, but other than that it suited my purpose perfectly!
THANKS!!
David
Thanks for the feedback David, and I am glad that it helped.
ReplyDeleteThank you so much for this package. :)
ReplyDeleteFinding it more useful !!
You are welcome, and I am glad you find it useful.
ReplyDeleteWould it be possible to get the header line to output in bold and to set the column widths?
ReplyDeleteThanks,
P.S. This script is great !
Yes, I am pretty sure you can do this - there are methods in the Python Excel extension to change the font and set column widths.
ReplyDeleteHere is a patch that will allow import and use of this program in python. Instead of command line opts you specify the kwargs titlePresent, linesPerFile, sepChar, and outputFilename, which are set to the same defaults as they were in the original.
ReplyDelete3c3,8
< # character) to Excel spreadsheets.
---
> # character) to Excel spreadsheets. Written by Sujit Pal
> # (sujitpal.blogspot.com), posted at http://sujitpal.blogspot.com/2007/02/python-script-
to-convert-csv-files-to.html.
> # Modified by Daniel Frishberg on 4/9/2008 to separate out main() into
> # main_from_cmd_line() and main(), to allow use from other python programs.
> # Dependency: pyExcelerator at http://pypi.python.org/pypi/pyExcelerator.
>
89c94
< def main():
---
> def main_from_cmd_line():
97a103,106
> titlePresent, linesPerFile, sepChar, outputFileName = validateOpts(opts)
> main(inputFileName, titlePresent, linesPerFile, outputFileName, sepChar)
>
> def main(inputFileName, titlePresent=False, linesPerFile=-1, outputFileName="", sepChar=
","):
103d111
< titlePresent, linesPerFile, sepChar, outputFileName = validateOpts(opts)
131c139
< main()
---
> main_from_cmd_line()
Thanks Daniel, much appreciated. I will probably continue to use this as a script, but maybe this will be useful to others.
ReplyDeleteThank You very much for that (excel)lent script!
ReplyDeleteThis is exactly what I need!
Thanks for the feedback, Zook, glad it helped.
ReplyDeleteYou have some minor but possibly important typos in your script: You have 'cvs' in a number of places where you mean 'csv'.
ReplyDeleteThanks for the post. It's saved my a good amount of work by giving me a nice starting place for a small project.
Hi, glad it helped you and thanks for pointing this out, I did a search and found the place you are referring to - fixing now.
ReplyDeleteThanks so much, this script works like a charm :)
ReplyDeleteMatt Dubins
Thanks Matt, glad it helped.
ReplyDeleteThis a great script ..... I used this to convert my large csv file to .xls with multiple sheets .... But for a large .csv file, conversion taking more time ... sometimes it hangs ... I donot have much idea over Python scipting ..... is it possible to reduce time ???..
ReplyDeleteThanks, rana. I have been using my script quite extensively these past few months (once again generating large spreadsheets from flat files :-)), and I haven't actually found a case where it hangs. In fact, I find in lots of cases, OpenOffice chokes on some of my files, but csv2xls.py rips right through. There /is/ a part of the script that is memory intensive though, where it uses the pyExcelerator to build potentially large spreadsheets in memory before dumping them out to the spreadsheet file, and that could cause a hang like you noticed. If thats the case, you may want to ask the script to produce a larger number of smaller excel files.
ReplyDeleteVery useful script. Thanks that you made it available. I couldn't have written is myself (yet :)
ReplyDeleteYou're welcome, Michael, and thanks for the feedback.
ReplyDeleteThis was awesome. Thanks!
ReplyDeleteThanks, John.
ReplyDeleteHy everybody, nice script. I used at work :). but I have a problem, when I have to add to the worksheet special characters (French letters) it doesn't work: even with: # -*- coding:utf-8 -*- at the beginning of the script.
ReplyDeleteCan anyone help me?
Thanks pclin1988. I have only used it with US_ASCII, so won't be able to help you there. If someone knows the fix to this, I am guessing they would post it here. You may want to post to the pyExcelerator support forums, they would probably be more likely able to help with your issue.
ReplyDeleteThanks Alex, good to know.
ReplyDeletesujit, thanks for a script. I would like to know if it is possible to combine many csv files into a single xls file? that would be an added flexibility..
ReplyDeleteSatish
Thanks Satish. The functionality you are looking for is fairly easy to build in, instead of reading a single csv file, just get a list of files and loop through them.
ReplyDeleteHi Sujit, thaks very much for this. Surprising that since 2007 there aren't any other competitors out there for your CSV2Excel, which probably indicates the usability of your script. I'll give it a shot and let you know how it works for me.
ReplyDeleteRanjeet
Sujit,
ReplyDeleteFirst I must say, thank you for a wonderful script that has solved worlds of headaches. I find it incredible that 4 years after it's release, it's still be looking for and found based on a simple google search.
I, in the future, would like to learn from this and tailor it to some specific needs/wants I have, but I do have one question.
Is there a way to change this around so you can pipe output directly to the script instead of having to save a file and run the script against that?
Thanks
@Josh/Ranjit: probably indicates that CSV to Excel conversion is somewhat rare, which is why nobody really cares that much about the problem to write different versions :-). Anyway, glad it helped, I know it has helped me a lot over the years :-).
ReplyDelete@Josh: regarding reading from stdin, haven't tried it myself, but you may want to change the script something like this:
114: reader = csv.read(sys.stdin, delimiter=sepChar)
Sujit, Thank you so much, It works like a charm, and is teaching me how to write good code.
ReplyDeleteThanks for the kind words, Ranjeet, and you are welcome.
ReplyDeleteHi Sujith
ReplyDeleteDo you have any script to convert .xlsx to .csv
I tried using xlrd but it only works for .xls and not for .xlsx
Any inputs to convert .xlsx to .csv would be much appreciated
Hi Tejas, AFAIK XLSX uses the MS Open XML format for spreadsheets (the format they came up with in response to Open Document Format's ODS format), so it is written out as an XML file ultimately. With ODS, the file is a zipped set of (mainly) 4 XML files, one of which is the content XML file. I suspect (but am not sure) that MS OOXML does it similarly, but it should be easy to find on the internet or figure out. So once you have the XML contents, it should be relatively simple to parse with an XML parser and convert to CSV.
ReplyDeleteThis is an awesome conversion tool. However, is there a way to make it convert spreadsheets into csv files instead?
ReplyDeleteThanks
We had this same requirement couple of weeks ago, and our DBA suggested the unix add-on xls2csv. Check out this page for a review of both the C version (which was suggested by our DBA) and a Perl version.
ReplyDeleteHello, Great script but how do you turn on word wrapping for all columns? I get how to format each column but not with word wrapping.
ReplyDeleteHi jeeads, by word wrapping you mean excel columns with embedded newlines, yes? If so, I believe this script will work, although I haven't tried it myself. Columns with embedded newlines are represented in CSV as quoted multiline text, thus your CSV row will span multiple lines. Reason I think it should just work is that I used the Python csv module to read the input (rather than just readline and split), which should handle this condition.
ReplyDeleteHi Sujit,
ReplyDeletemany thanks for your script!
I adapted it for my needs and it works perfectly! And it helped me improving my Python skills!
Cheers
You are welcome Milan, glad it helped.
ReplyDeleteThanks, Sujit. Helpful script but it is only half of what I am in need of! I am tweaking it a bit but, is there an updated version available somewhere which can take in multiple csv files for conversion?(the output needs to be in a single xls file with each csv input file in a separate sheet)
ReplyDeleteThanks Sahana, but unfortunately, no there no updates as far as I know... I built this for myself and I haven't had a need for loading separate worksheets from multiple csv files, I guess my clients are less demanding than yours :-). But I just looked at the pyExcelerator documentation (in the python REPL, import pyExcelerator; help(pyExcelerator) and they seem to have support for worksheets, so I would think this would be fairly easy to extend by calling the current functionality (one file -> one worksheet) multiple times via function calls.
ReplyDeleteThanks to my earlier reply. I did eventually figure out how to word wrap -- I changed the line 41 to
ReplyDeleteworksheet.write(lno, cno, column, font_style('left',1))
but end up with errors if csv is over 2 megs. And if I take the script as is I end up getting same errors if csv is over 15 megs.
Traceback (most recent call last):
File "csvtoexcel_Patrons.py", line 175, in
main()
File "csvtoexcel_Patrons.py", line 170, in main
closeExcelSheet(workbook, outputFileName)
File "csvtoexcel_Patrons.py", line 90, in closeExcelSheet
workbook.save(outputFileName)
File "/Library/Python/2.7/site-packages/pyExcelerator/Workbook.py", line 610, in save
doc.save(filename, self.get_biff_data())
File "/Library/Python/2.7/site-packages/pyExcelerator/Workbook.py", line 595, in get_biff_data
data = sheet.get_biff_data()
File "/Library/Python/2.7/site-packages/pyExcelerator/Worksheet.py", line 1424, in get_biff_data
result += self.__row_blocks_rec()
File "/Library/Python/2.7/site-packages/pyExcelerator/Worksheet.py", line 1333, in __row_blocks_rec
result.append(self.__rows[used_rows[i]].get_row_biff_data())
File "/Library/Python/2.7/site-packages/pyExcelerator/Row.py", line 200, in get_row_biff_data
return BIFFRecords.RowRecord(self.__idx, self.__min_col_idx, self.__max_col_idx, height_options, options).get()
File "/Library/Python/2.7/site-packages/pyExcelerator/BIFFRecords.py", line 1430, in __init__
options)
struct.error: ushort format requires 0 <= number <= USHRT_MAX
Hi, this appears to be a bug/limitation of pyExcelerator, it may be better to ask on its mailing list. I also found this python-excel issue, which indicates that the column in question may be defined incorrectly, that may be something you may want to look at as well.
ReplyDeletehi,
ReplyDeletei am new to python. i was trying to run this code by the command "python csv2xls.py" (without the quotes) and it is giving me the following without any output.
OPTIONS:
--title|-t: If set, the first line is the title line
--lines|-l n: Split output into files of n lines or less each
--sep|-s c [def:,] : The character to use for field delimiter
--output|o : output file name/pattern
--help|h : print this information
What must be going wrong here? Thanks in advance and thanks for the script!
Hi Prem, its telling you how to use it. if you look one line up in the generated message (you didn't include it in your comment), it will say something like "Usage: ...". So minimally you will need to call it like "python csv2xls.py inputfile.csv". Additionally, if your column delimiters are other than "," then you must specify this using the --sep or -s option. If you want the output written to a specific Excel file (as opposed to basename(inputfile).xls, then you must specify the --output or -o option, etc.
ReplyDeleteFor reading from stdin, to allow bash pipeing, this may work
ReplyDelete< inputFile = open(inputFileName, 'r')
---
> if (inputFileName=="-"):
> inputFile= sys.stdin
> else:
> inputFile = open(inputFileName, 'r')
Thanks!
ReplyDeleteEven after 7 years, this post is much alive ! Many thanks.
ReplyDeleteFor my personal needs, I just modified your script to allow me to insert as many CSV files as I want in a single XLS file (each CSV has his own worksheet), not just one.
This way, I can aggregate all my CSV files in a single XLS with just a call of :
csv2excel.py csv_path/*.csv
Nice! And yes, it does look I ended up scratching a big itch with this one :-).
ReplyDeleteHi,
ReplyDeleteI am trying to convert a .csv file with 2500 columns and 14 rows. The script generates an erroneous file which is only partly readable in excel. The transposed version of this table is processed correctly. Do you have any suggestions how I could get it to work?
Thanks, Bernd
Hi Bernd, its possible that you are hitting some limit in pyExcelerator with the high number of columns. Since you mention your ability to transpose, I take it that your data is probably coming from a matrix library like numpy? In that case, perhaps you could simply transpose and write it to the file, then open Excel and transpose it back?
ReplyDeleteThank you for this wonderful script.
ReplyDeleteI am trying to change it so it can create a unique excel file with multiple tabs, one per csv file but so far I was not able to do this. Thanks Again!
You are welcome and thank you for the kind words. With respect to creating spreadsheets with multiple tabs I haven't been able to do it either - I needed this for generating spreadsheets out of reports for our Informaticist, and since it was for internal use, I was able to convince him to do without the convenience of multiple tabs :-).
ReplyDeleteTo make a .xls file from multiple csv files, in Linux you can install gnumeric, which comes with a CLI utility called ssconvert.
ReplyDeletessconvert --merge-to=mynewfiile.xls 1.csv 2.csv 3.csv ...
Each file ends up on a tab in the xls with the filename.
Very nice, thank you!
ReplyDeleteHi Sujit
ReplyDeleteI tried looping through multiple csvs to output to a single xls file so that each csv becomes a sheet a in the excel file, but it simply overwrites onto Sheet1.
>> python ~/bin/hs/csv2xls.py --output review_timing.xls hold_timing.csv
>> python ~/bin/hs/csv2xls.py --output review_timing.xls setup_timing.csv
At the end , all I see is one sheet in review_timing.xls with the content of setup_timing.csv
How do i create one sheet for each csv.
Thanks
Hi Trintelpathy, sorry, but my script does not support multiple sheets. I wrote this basically as a time-saving hack because I was tired of having to manually convert CSV outputs from my reporting code into Excel spreadsheets for management. Fortunately (or unfortunately depending on your POV) they didn't require me to submit multiple reports packaged into a multi-sheet Excel spreadsheet, so I never investigated that.
ReplyDelete