Parsing and Calculating Data with Python and CSV Files

Posted March 25th @ 10:58 am  |  Filed in: Python    

I’ve recently wrapped up a Dynamic Languages class, taught by David Beazley. Being a Python guru, this sorta turned into a Python class, but I had no problem with that. With Dave’s permission, I’ll be posting a few of the lessons here, as well as their solutions. I’ve already posted the first assignment and a couple solutions, and here’s another one.

What we’ll do here is take a csv file that contains our stock portfolio. We’ll then take another csv file of stock data, and calculate the minimum and maximum value of our portfolio over time, and spit out a nicely formatted text table. All the files are available for you to download and play with. I highly encourage newbies to try things out and beat their heads against the wall before viewing the solution, lord knows that’s how I’ve been picking up Python. Books can only take you so far.

Your Mission, Should You Choose to Accept It:

(Taken straight from Dave’s syllabus.) The file dowstocks.csv contains a trace of stock market data scraped off of a finance web site over about a two day period. This was done using a Python program:

import urllib
import time

url = "http://someurl.known.to.me.but.not.to.you.com"
f = open("dowstocks.csv","w")

while True:
    u = urllib.urlopen(url)
    for line in u:
        f.write(line)
    f.flush()
    u.close()
    time.sleep(5*60)

The contents of this file look like this:

...
"GE",36.90,"8/15/2007","4:02pm",-0.78,37.58,37.93,36.72,45992656"
"GM",31.54,"8/15/2007","4:01pm",-1.53,32.85,33.31,31.19,12348300
"HD",33.36,"8/15/2007","4:01pm",-0.16,33.51,34.44,N/A,15896314
"HON",54.72,"8/15/2007","4:01pm",-1.65,56.59,57.21,54.55,3692121
"HPQ",46.15,"8/15/2007","4:00pm",-1.13,47.00,47.59,45.95,8860623
...

The different fields are follows:

symbol, price, date, time, change, open, low, high, volume

If you look at the file, you will find that it’s a little messy. Certain fields have “N/A” filled in as a value. Moreover, you’ll probably find time gaps in the data (since I probably put my machine to sleep from time to time while this was running). The file dowportfolio.csv contains a portfolio of stocks that have been purchased earlier. This file looks like this:

MSFT,100,54.25
IBM,50,91.10
AA,25,23.10
...

The different fields are follows:

symbol, shares, purchase_price

The Big Perl Stock Smackdown

Perl programmers are well aware that it is quite adept at processing data files. Therefore, your challenge in this assignment is to put them all in their rightful place! Well, if you can.

Your task is really quite simple: You are to write a program that reads the stock portfolio in dowportfolio.csv, the stock price data in dowstocks.csv and does the following:

  • Finds the maximum value of the entire stock portfolio and the approximate date/time when it occurred.
  • Finds the minimum value of the entire stock portfolio and the approximate date/time when it occurred.
  • Produces a nicely formatted report that shows the performance of all holdings in the portfolio at both of those times. For example:
stock table example

Wrapping it Up

OK now - the perl script I’ve provided in the zip file was written by Dave. I’ve also included my own script, which is the 3rd incantation of my original Python script. The 1st I wrote was not so good. After a few pointers from Dave, my second was better. The 3rd version I’m including I did off the top of my head, and it’s much cleaner, leaner, and faster, and beats the Perl script by 4 seconds or so when put up against this 50MB stock csv file. Also note that the CSV module was NOT used. Another facet of this is to learn how powerful Python’s string manipulation tools are.

Again, try this out - take a few days with it before just looking at the answer. I’m no Python guru, but I sure learned a lot about how not to do things as I struggled with this, and sometimes the most learning occurs then. Good luck! Please leave your questions and comments below.

No Comments Yet

You can be the first to comment!

Leave a comment

OpenID Login

Standard Login

Options:

Size

Colors