CouchDB View Generation
An alternative technology quickly gaining popularity these days is CouchDB, a document-based database system for semi-structured data. I wasn’t sure what that meant at first, so I read as much as I could about it. The result? I couldn’t wait to use it.
I decided CouchDB would be a good fit for my next project (which I should be releasing sometime this week BTW) and rolled up my sleeves. Because of the amount of data I’m working with, I hit a few snags along the way with regard to CouchDB view performance. Some of the things I learned, although they make sense, were not what I was expecting initially (even after reading all the docs). So for the benefit of others, I thought it’d be a good idea to share my current understanding of the way views work in CouchDB, and share some of the tips & tricks Jan, Chris, and others have given me along the way.
MySQL SELECT Entries Before NOW()
I’m in the business of making things faster. Using NOW() in a SQL query is something I’m going to complain about. Here’s a familiar scenario from the online publishing industry where future dating articles is a commonality:
You have a news site. You need to display only articles that have been published, and one of the criteria is that they need to have a publish_date before now. Easy, peasy, lemon squeezy.
Natural Order Numerical Sorting
I came across an interesting problem today in the CodeIgniter forums. Suppose you have a dataset like this:
1
1.2
1.2.1
1.2.3
1.2.4
2
5
5
5.1
5.7
11
11.1.2
Obviously, the example cited above is sorted, and sorted correctly. Now lets say these values exist in a random order in a database and we want to retrieve them in the same order as above–sorted ascending. The simple SELECT version FROM test ORDER BY version ASC doesn’t quite do the job.
+---------+ | version | +---------+ | 1 | | 1.2 | | 1.2.1 | | 1.2.3 | | 1.2.4 | | 11 | | 11.1.2 | | 2 | | 5 | | 5 | | 5.1 | | 5.7 | +---------+
This is because our data is stored as a string, and the database will sort it as such lexicographically. The thinking human, however, wants it sorted numerically. But we cant, it’s a string. “So convert it to a decimal”, you say. Well, that partly works, except, we loose everything after the second period. For example, SELECT version, (version + 0) FROM test ORDER BY (version + 0) ASC will implicitly cast our field to a decimal. But the results will look something like this:
+---------+---------------+ | version | (version + 0) | +---------+---------------+ | 1 | 1 | | 1.2.1 | 1.2 | | 1.2 | 1.2 | | 1.2.4 | 1.2 | | 1.2.3 | 1.2 | | 2 | 2 | | 5 | 5 | | 5 | 5 | | 5.1 | 5.1 | | 5.7 | 5.7 | | 11 | 11 | | 11.1.2 | 11.1 | +---------+---------------+
The second column is there to show what we’re actually sorting by. As you can see we’re close, but not there. The secret is to sort both numerically AND lexicographically (in that order). The new SQL query will be SELECT version FROM test ORDER BY (version + 0), version ASC. And the results are just what we wanted:
+---------+ | version | +---------+ | 1 | | 1.2 | | 1.2.1 | | 1.2.3 | | 1.2.4 | | 2 | | 5 | | 5 | | 5.1 | | 5.7 | | 11 | | 11.1.2 | +---------+


