Monthly Archives: January 2008

Basic File Parsing with PHP and Python

I’m currently taking a master’s class on dynamic languages with DaBeaz, and our first assignment was to parse a basic text file containing a stock portfolio, like so:

YHOO 50 19.25
AAPL 100 143.41
SCOX 500 4.21...

The values are symbol, shares, and price. We had to produce a nicely formatted, text-based table and calculate the total value. Not so bad, but we had to do it in 9 different languages. I won’t cover all of them, just the PHP and Python versions. I’ve been working in PHP for quite a while now, and I’ve been learning quite a bit about Python, since DaBeaz wrote a book about it.

Simplest is Best

So, there are all kinds of ways to perform the task at hand. Since you know there are 3 values on each line, you could “explode” each line into an array, splitting on the space between values. You could toss in a regular expression or two. But let’s not get too clever; let’s try to write the most efficient code possible with the least overhead. What if our portfolio contains thousands of lines?

Here’s the Python solution:

total = 0
print("%10s %10s %10s" % ('Names', 'Shares', 'Price'))
print("---------- " * 3)
with open('portfolio.txt', 'r') as f:   # file is closed automatically
    for line in f:
        vals = line.split()
        if not vals:                     # skip blank lines
            continue
        symbol = vals[0]
        shares = int(vals[1])
        price = float(vals[2])
        print("%10s %10d %10.2f" % (symbol, shares, price))
        total += shares * price
print("\nTotal value : $%0.2f" % total)

One thing that’s really winning me over to Python is the syntax: it’s compact and easy to read. Also note that in Python you can’t use a variable before it has been assigned (total += ... on an undefined name raises a NameError), which is why we had to initialize the “total” variable on the first line. PHP is much more loosey-goosey here: an undefined variable is simply treated as null, with at most a notice or warning.


Also note that with PHP, we don’t have to convert strings to integers and floats; it just sort of “magically” knows what we want to do by looking at the context (in this case, a math calculation). While I found Python’s “strictness” a bit annoying at first, I’ve come to appreciate its explicit behavior, as it’s less error-prone. (And I say “strictness” in quotes because, compared to statically typed, compiled languages like Java, this isn’t strict at all.) On the PHP side, the list() construct is worth knowing: it assigns the elements of an array to several variables in one step, and it’s handy when iterating over database query results as well.
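Python’s closest analogue to PHP’s list() is tuple unpacking. Here’s a minimal sketch (parse_line is a hypothetical helper name, not part of the assignment) that unpacks one portfolio line into three names and converts each field explicitly. It also shows the “strictness” in action: a string that merely looks like a number is never quietly treated as one.

```python
def parse_line(line):
    """Split one 'SYMBOL SHARES PRICE' line into typed values.

    Tuple unpacking plays the role of PHP's list(): it assigns the
    three fields to three names in one step, and raises ValueError
    if the line doesn't have exactly three fields.
    """
    symbol, shares, price = line.split()
    # Python never converts these strings for you; do it explicitly.
    return symbol, int(shares), float(price)


symbol, shares, price = parse_line("YHOO 50 19.25\n")
print(symbol, shares * price)   # YHOO 962.5

# Without conversion, * on a string means repetition, not multiplication:
print("50" * 2)                 # 5050

# ...and mixing a str with a float is an error rather than a guess:
try:
    "19.25" * 2.0
except TypeError as exc:
    print("TypeError:", exc)
```

The ValueError on malformed lines is a nice side effect: a truncated line fails loudly instead of silently producing a zero.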

Printf (PHP) and % (Python) are Your Friends

Seeing that all of my work is on the web, I’d never really used these puppies before. But they sure are useful for formatting text and numbers. The syntax can be a little intimidating, but not once you understand it. Let’s look at this statement in PHP:

printf("%10s %10d %10.2f\n", $name, $shares, $price);

Think of “printf” as “print formatted.” All this is saying is to take the stuff to the right of the quoted format string (our variables) and format it like this:

%10s = print a string, right-aligned, within a 10-character field
%10d = print an integer (as a decimal number), right-aligned, within a 10-character field
%10.2f = print a floating point number, right-aligned, within a 10-character field, rounded to 2 decimal places

(And the “\n” is just a line-break command.)

In Python, you just use a regular print, with a “%” between the format string (in quotes, as with PHP) and the values to substitute in:

 print "%10s %10s %10.2f" % (symbol, shares, price)

(In Python, a new line is automatically inserted after each print statement.)
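To see exactly what each specifier produces, here’s a small Python sketch using values from the sample portfolio. The square brackets are only there to make the 10-character field widths visible; the left-aligned variant at the end is an extra not used in the assignment.

```python
# Each specifier pads its value to a 10-character field, right-aligned.
print("[%10s]" % "YHOO")      # [      YHOO]
print("[%10d]" % 50)          # [        50]
print("[%10.2f]" % 19.25)     # [     19.25]
print("[%10.2f]" % 143.411)   # [    143.41]  (rounded to 2 places)

# A minus sign after the % left-aligns within the field instead:
print("[%-10s]" % "YHOO")     # [YHOO      ]

# The same kind of row the portfolio script prints:
row = "%10s %10d %10.2f" % ("AAPL", 100, 143.41)
print(row)
print(len(row))               # 32 (three 10-char fields plus two spaces)
```

Because every field has a fixed width, the columns line up no matter how long each symbol or number is, as long as it fits in 10 characters.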

And there you have it. In the next article, we’ll take this to the next level, and parse real stock data from a (huge) CSV file – whooot!

Moving to a New Domain? Use Apache to Redirect the Old to the New

So you have to move your site to a new domain – it happens. One of my clients just had to go through this because of a legal issue. But never fear – there’s no need to break your existing search engine results, or websites that have linked to your old domain, or folks that have bookmarked you, etc. Put these three lines, and only these three lines, into an .htaccess file on the old site (assuming you’re using Apache as your web server):

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^ http://www.new-domain.com%{REQUEST_URI} [L,R=301]

Of course, replace “www.new-domain.com” with your new domain. That’s it. Every request that goes to the old site will be automatically redirected to the same path on the new one. In case you’re wondering, “301” is the HTTP status code for “Moved Permanently.” It’s sent in the response headers and tells browsers and search engine bots that this is a permanent move, and that they should update their records accordingly. This elegantly avoids any duplicate content issues.
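Once the .htaccess file is in place, it’s worth confirming that the old domain really answers with a 301 and the right Location header, since a browser will just follow the redirect without showing you either. Below is a minimal sketch using only Python’s standard library; check_redirect is a hypothetical helper name, and it sends a single HEAD request without following redirects.

```python
import http.client
from urllib.parse import urlsplit


def check_redirect(url, timeout=10):
    """Make one HEAD request to url WITHOUT following redirects.

    Returns (status_code, location_header). With the rule above you'd
    expect something like (301, "http://www.new-domain.com/same/path").
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    # Rebuild the path (and query string, if any) to request.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, assuming your old domain is www.old-domain.com (a placeholder), check_redirect("http://www.old-domain.com/articles/12") should return (301, "http://www.new-domain.com/articles/12"). A 302 here would mean the move isn’t being reported as permanent.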

Simple SQL to Track Hits/Views

I just added a feature to ChronicBabe.com, which runs a custom CMS, to keep track of how many times an article has been viewed. In the code that pulls an article from the database, this single line of SQL quickly increments a “hits” column (type INT) in the “articles” table for a given id:

UPDATE articles SET hits=hits+1 WHERE article_id=$id

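There’s no need to fetch the existing number from the database first; that would be a wasted query. Instead, we just tell the database to increment the existing count by one, and because the read and the write happen inside a single UPDATE, two simultaneous views won’t overwrite each other’s count. (Do make sure $id is cast to an integer first, e.g. with (int), since it ends up directly in the SQL.) Note that this method does not track visits by an actual person or differentiate a real person from a search engine hit (use a real stats program for that stuff). But, for a quick and dirty way of generating a “Top 10 Articles” kind of list on your website, this does the trick nicely.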
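Here’s the same idea as a self-contained Python sketch, using the standard sqlite3 module so it can be tried without a MySQL server. The table layout (article_id, title, hits) and the helper names record_view and top_articles are assumptions made for the example. The id is passed as a query parameter rather than pasted into the SQL string, and the second query produces the “Top 10 Articles” list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    " article_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " hits INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO articles (article_id, title) VALUES (?, ?)",
    [(1, "First post"), (2, "Second post"), (3, "Third post")],
)


def record_view(conn, article_id):
    # The database does the +1 itself: no SELECT, no read-modify-write race.
    conn.execute(
        "UPDATE articles SET hits = hits + 1 WHERE article_id = ?",
        (article_id,),
    )
    conn.commit()


def top_articles(conn, limit=10):
    # Most-viewed articles first.
    return conn.execute(
        "SELECT title, hits FROM articles ORDER BY hits DESC LIMIT ?",
        (limit,),
    ).fetchall()


for article_id in (2, 2, 3, 2, 3, 1):
    record_view(conn, article_id)

print(top_articles(conn))
# [('Second post', 3), ('Third post', 2), ('First post', 1)]
```

An UPDATE for an id that doesn’t exist simply matches no rows, so a bogus id can’t create or corrupt a counter.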

Force/Strip the “www.” On or Off Your Domain (Apache)

One of the first things I do after launching a new site is make sure every page has exactly one URL. By this I mean that IMHO you should not be able to view this page:

http://some-site.com/articles/12

As well as this page:

http://www.some-site.com/articles/12

One version should redirect to the other.

How come?

Once upon a time, allowing this could throw off your Google PageRank. While this isn’t the end of the world, why have http://www.some-site.com be a PR5 and http://some-site.com be a PR3? (This happens when some people link to one version and some to the other.) By forcing the “www.” on or off of a domain, you can easily remedy this issue. I think Google has tweaked its algorithms to where this doesn’t really matter much anymore, but I just think it’s cool, and a sign of great attention to detail, to have one or the other.

Anyways, to always force the “www.” onto your domain, add this to your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^(www\.|$) [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

And to strip it off, you might try something like this:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.your-domain\.com [NC]
RewriteRule ^(.*)$ http://your-domain.com/$1 [L,R=301]

Note that the above example “hard-codes” the domain into the rewrite rule. It’s not as slick as the previous example (which you can just cut and paste into any website’s .htaccess file), but it should work.
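If it helps to see the decision those two rule sets make, here’s a rough Python model of it. It’s only an illustration: Apache evaluates the RewriteCond/RewriteRule lines itself, the function name and want_www switch are hypothetical, HTTPS and ports are ignored, and unlike the hard-coded strip rule it works for any domain.

```python
def canonical_redirect(host, request_uri, want_www=True):
    r"""Return the URL to 301-redirect to, or None if host is already canonical.

    Rough model of the .htaccess rules above:
      want_www=True  -> RewriteCond %{HTTP_HOST} !^(www\.|$) [NC]
      want_www=False -> strip a leading "www." (case-insensitive, like [NC])
    """
    if not host:
        # The "|$" branch: no Host header, so leave the request alone.
        return None
    has_www = host.lower().startswith("www.")
    if want_www and not has_www:
        return "http://www." + host + request_uri
    if not want_www and has_www:
        return "http://" + host[4:] + request_uri
    return None  # already canonical: serve the page, no redirect
```

Returning None for an already-canonical host is what keeps this from looping; the WordPress gotcha below is what happens when two different layers disagree about which host is canonical.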

Blogging Gotchas

Some blogging software handles this for you, and can clash with the rules above. For example, in WordPress you set your “WordPress address” in the Options section, and WordPress will redirect requests to match it. Now, if you use the above code to strip the “www” off your domain in .htaccess, but set up your “WordPress address” with the “www” in WordPress, you’ll get an endless redirect loop, and the browser will eventually give up with a “too many redirects” style error.

Does it matter?

These days, I think it boils down to personal preference. Once upon a time, yes, it did matter: the “www” (which of course stands for World Wide Web) is actually just a subdomain. It was traditionally used to separate web traffic from other services on the same domain, such as “ftp.domain.com.” Of course, we’ve moved past this. There are some people who feel very strongly about dropping the “www” from all websites. I think these folks have a bit too much time on their hands.

The bottom line is that as long as your website works for both the www and non-www versions, you’re OK. If your website only works on one of these, then it’s time to worry (I run across this with Microsoft’s IIS servers, which I loathe to the point of never ever touching again). The rest are merely style points.

The Best Way to Start a Blog

So you’ve been pondering a weblog. All of these ideas keep popping into your head, but something keeps you from starting that blog. Maybe it’s because the process is a bit intimidating. Or maybe you just can’t “picture” how it will be in your mind. Or, as many of us are, you’re afraid of putting yourself and your thoughts out there, whether they are personal or business related, or somewhere in-between.

As they say, the first step is often the hardest, and it’s no different here. So what is the first step? Just get a blog. Forget how it looks; it doesn’t really matter, and it can always be changed. Nobody’s going to start flocking to your blog just because it’s there, so just start doing it, and then let people know about it when you’re at your comfort level. And just be yourself: if you try to write like somebody else, you’ll never get as much writing done.

Where to Get a Blog

The easiest, cheapest, and fastest way is to get a Blogger account, which is owned by Google. You’re very limited as far as design goes, and you won’t get your own domain name (rather, it’ll be like http://yourname.blogspot.com) – but that’s not the issue. The issue is to get started. Once you’re going, you can either keep it up there or export your blog and take it elsewhere.

If you or somebody you know is comfortable uploading files to a web server, you can get a free copy of WordPress and use it on your own domain name. Or, you can get a WordPress-friendly host that has WordPress already installed and ready to use, or get up and running immediately with a free account at wordpress.com (similar to Blogger). With many of these setups, you can get your own domain name too.

This site is running WordPress. Even though I’m a web developer who tends to write content management systems, sometimes there’s no need to reinvent the wheel. WordPress is primarily designed for blogging, and that’s all I wanted to do. As of this writing, I’m still using the default WordPress template, which is fine but not really my style. But you know what? I’m writing again. I’ll give it a face-lift later.

Learning Languages is Like Shopping at a New Grocery Store

A couple of weeks ago I started taking a class down at the U of C called Dynamic Languages, taught by Python guru David Beazley. Languages like Ruby, PHP, Python, JavaScript, Perl, etc. are dynamic in that they don’t need a separate compile step before they run, unlike C and Java, which are compiled, statically typed languages. In addition, a variable can hold a string one moment and an integer the next, because the type belongs to the value rather than the variable. This is known as dynamic typing.

I’ve been programming mostly in PHP and have recently been learning some Ruby. So imagine my surprise when our first assignment was to write a simple program in 9 different languages! However, this proved to be a very useful exercise (after bashing my head on the desk for a while).

First off, I didn’t realize how good I had it: working in static languages can be a royal pain. For example, in C, you have to declare each variable along with the type of data it will hold, like this:

char c[30];  /* declare a char array */
FILE *file;  /* declare a FILE pointer  */
int i = 0;   /* declare an integer */
double total = 0;  /* declare a number with decimal points */

If you try to change any of those later:

i = "Matt" /* no longer a number */

You get an error (or at least a warning, depending on the compiler and its settings) when you try to compile it, because i was declared as an int. And that’s another pain: recompiling every time just to insert a minor change. These dynamic languages were created to speed up the development process, so you can begin solving real problems right away.
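For contrast, here’s the same kind of reassignment in Python, which the interpreter accepts without complaint because i is just a name that can refer to a value of any type. The sketch also shows the flip side mentioned in the portfolio post: dynamic doesn’t mean Python will silently mix a string and a number, and converting between them is something you ask for explicitly.

```python
i = 0
print(type(i).__name__)   # int

i = "Matt"                # fine: i now simply refers to a str
print(type(i).__name__)   # str

# Dynamic doesn't mean anything goes: combining incompatible types
# is still an error, it's just reported at run time, not compile time.
try:
    i + 1
except TypeError as exc:
    print("TypeError:", exc)

# Converting between types is explicit:
print(int("42") + 1)      # 43
```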

However, I was glad I’d learned a bit of C (Java? not so much), as I didn’t realize just how closely many of these dynamic languages are related to it (and get this: C compilers are themselves written in C. I should’ve taken the blue pill…). Also, some of these dyno-languages support C extensions, so you can take advantage of the speed that compiled code offers.

But I digress – how is learning a new language like shopping in a new grocery store?

Every grocery store pretty much has the same stuff: the bread may be in aisle 12 instead of 5, but once you find it, you’re good to go the next time you shop there. Programming languages have many of the same abilities, but sometimes there’s just a different vocabulary (and sometimes it’s almost exactly the same, like in some of the C-influenced languages). Once you figure out where the stuff you usually shop for is, it’s downhill from there.

But the real fun about learning a new language is that sometimes you find that magic aisle that has stuff you never knew about before. Ooooh, that coffee looks really good, and look at all this other stuff right next to it. I wonder if my old store had that stuff…

Much like that, learning a new language can help you understand things about the ones you already know but never explored before.