<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hello, I am Sean Murphy &#187; Performance</title>
	<atom:link href="http://iamseanmurphy.com/category/performance/feed/" rel="self" type="application/rss+xml" />
	<link>http://iamseanmurphy.com</link>
	<description>Thoughts, news, code by Sean Murphy</description>
	<lastBuildDate>Thu, 26 Jan 2012 02:37:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Binary Search for Javascript Arrays</title>
		<link>http://iamseanmurphy.com/2009/04/29/binary-search-for-javascript-arrays/</link>
		<comments>http://iamseanmurphy.com/2009/04/29/binary-search-for-javascript-arrays/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 21:41:12 +0000</pubDate>
		<dc:creator>Sean Murphy</dc:creator>
				<category><![CDATA[Code Snippets]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[array]]></category>
		<category><![CDATA[binary search]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://iamseanmurphy.com/2009/04/29/binary-search-for-javascript-arrays/</guid>
		<description><![CDATA[If you need to search through a large array, or you search arrays frequently in your JavaScript code, or both, chances are a binary search will give you better performance than a linear search (read: a for loop). One caveat, however, is that binary search only works on sorted arrays. Here is [...]]]></description>
			<content:encoded><![CDATA[<p>If you need to search through a large array, or you search arrays frequently in your JavaScript code, or both, chances are a binary search will give you better performance than a linear search (read: a for loop): it needs at most about log<sub>2</sub>(n) comparisons, where a linear search can need up to n. One caveat, however, is that binary search only works on sorted arrays. Here is a binary search function I sometimes use in my code:</p>
<p><span id="more-34"></span></p>
<pre name="code" class="js">// Binary search over a sorted array. Returns the index of needle, or -1 if absent.
// With case_insensitive set, the array must be sorted case-insensitively too.
Array.prototype.binSearch = function(needle, case_insensitive) {
	if (!this.length) return -1;

	var low = 0;
	var high = this.length - 1;
	var mid, element; // declared locally so they do not leak into the global scope
	case_insensitive = !!case_insensitive;
	needle = case_insensitive ? needle.toLowerCase() : needle;

	while (low &lt;= high) {
		mid = Math.floor((low + high) / 2);
		element = case_insensitive ? this[mid].toLowerCase() : this[mid];
		if (element &gt; needle) {
			high = mid - 1;
		} else if (element &lt; needle) {
			low = mid + 1;
		} else {
			return mid;
		}
	}

	return -1;
};</pre>
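<p>If you would rather not extend Array.prototype, the same idea can be sketched as a standalone function. This is only an illustration under my own assumptions: the names binarySearch and compare are hypothetical, and the optional comparator is an addition that lets the array hold anything that can be ordered.</p>

```javascript
// Hypothetical standalone variant of binSearch: takes an optional comparator
// so the array can hold numbers, strings, or objects sorted by any key.
function binarySearch(haystack, needle, compare) {
	// Default comparator assumes values that work with the > operator.
	compare = compare || function (a, b) {
		return a > b ? 1 : (b > a ? -1 : 0);
	};

	var low = 0;
	var high = haystack.length - 1;

	while (high >= low) {
		var mid = Math.floor((low + high) / 2);
		var order = compare(haystack[mid], needle);
		if (order > 0) {
			high = mid - 1;   // needle sorts before the midpoint
		} else if (0 > order) {
			low = mid + 1;    // needle sorts after the midpoint
		} else {
			return mid;
		}
	}

	return -1;
}

var primes = [2, 3, 5, 7, 11, 13];
console.log(binarySearch(primes, 11)); // 4
console.log(binarySearch(primes, 4));  // -1
```

<p>The comparator has to agree with the order the array was sorted in; otherwise the search can miss elements that are present.</p>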
]]></content:encoded>
			<wfw:commentRss>http://iamseanmurphy.com/2009/04/29/binary-search-for-javascript-arrays/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>High Performance Comet on a Shoestring</title>
		<link>http://iamseanmurphy.com/2009/03/02/high-performance-comet-on-a-shoestring/</link>
		<comments>http://iamseanmurphy.com/2009/03/02/high-performance-comet-on-a-shoestring/#comments</comments>
		<pubDate>Tue, 03 Mar 2009 03:34:33 +0000</pubDate>
		<dc:creator>Sean Murphy</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[comet]]></category>
		<category><![CDATA[haproxy]]></category>
		<category><![CDATA[ip addresses]]></category>
		<category><![CDATA[meteor]]></category>
		<category><![CDATA[network load balancing]]></category>
		<category><![CDATA[ports]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://iamseanmurphy.com/2009/03/02/high-performance-comet-on-a-shoestring/</guid>
		<description><![CDATA[I&#8217;ve had my eye on the advances that are being made in the Comet arena for a while now, but it was only this past weekend that I finally sat down and used it for a project. In doing so, there was a particular configuration problem I needed to address, and that was&#8230;uh, addressing. Introducing [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had my eye on the advances that are being made in the Comet arena for a while now, but it was only this past weekend that I finally sat down and used it for a project. In doing so, there was a particular configuration problem I needed to address, and that was&#8230;uh, addressing.</p>
<p>Introducing Comet to an existing architecture assumes there is already a web server in the neighborhood, and that it is, in one way or another, receiving traffic on port 80. Because many site visitors will be sitting behind a firewall that blocks connections to ports other than 80 or 443, the Comet server needs to be reachable on port 80 as well. This normally wouldn&#8217;t be much of a problem, unless you don&#8217;t want to fork over the money for an extra IP address. I didn&#8217;t, so let me show you how I got around it.</p>
<p><span id="more-28"></span><br />
As I alluded to above, the usual way to run two services on the same port in the same server environment is to assign two different IP addresses to the same front-end server. This is typically a load balancer or firewall, though it could also be running on the same machine as the web server and Comet server. The load balancer would then accept requests for x.x.x.1:80 and send them to the web server, while requests for x.x.x.2:80 would go to the Comet server. With only one IP address, however, we have to route requests based on a higher network layer, the Application Layer (7). In other words, we route by domain name.</p>
<p>In fact, that is something most web servers can handle using name-based virtual hosts. &#8220;So why not set up Apache to reverse-proxy requests to the Comet server?&#8221; you ask. Well, that would work. The reason Comet servers even exist, though, is that web server connection threads are too heavy to support the level of concurrency Comet requires (for a decent number of users). This is where the &#8220;high performance&#8221; part comes in. HAProxy is a fantastic high performance layer 7 load balancer. Using HAProxy&#8217;s ACL feature we can basically mimic Apache virtual hosts. Consider this example snippet from haproxy.cfg:</p>
<pre name="code">
frontend www *:80
    mode http
    acl comet hdr_beg(host) comet.
    use_backend meteor if comet
    default_backend apache

backend meteor
    mode http
    server server1 127.0.0.1:4670

backend apache
    mode http
    server server2 127.0.0.1:8080</pre>
<p>As you can see, I set up a front-end to accept all connections on port 80. Then I use an ACL to examine the Host header and see if it begins with &#8220;comet.&#8221; (e.g. http://comet.example.com). If it does, the request is sent to the Comet server on port 4670; if not, it goes to Apache on port 8080. And there you have it, a high performance Comet installation with no money out-of-pocket.</p>
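<p>To make the routing rule concrete, here is a minimal JavaScript model of the decision the front-end makes. It is not HAProxy code, and the function name chooseBackend is hypothetical; it only mirrors the prefix match on the Host header and the fall-through to Apache.</p>

```javascript
// Hypothetical model of the front-end routing rule; this is not HAProxy code.
function chooseBackend(hostHeader) {
	var host = String(hostHeader || '');
	// Mirrors "acl comet hdr_beg(host) comet." (a case-sensitive prefix match).
	if (host.indexOf('comet.') === 0) {
		return 'meteor';   // Comet server on 127.0.0.1:4670
	}
	return 'apache';       // everything else goes to Apache on 127.0.0.1:8080
}

console.log(chooseBackend('comet.example.com')); // meteor
console.log(chooseBackend('www.example.com'));   // apache
```

<p>Because the match is on the start of the header, a host such as example.comet.org would still land on Apache.</p>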
]]></content:encoded>
			<wfw:commentRss>http://iamseanmurphy.com/2009/03/02/high-performance-comet-on-a-shoestring/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>CouchDB View Generation</title>
		<link>http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/</link>
		<comments>http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/#comments</comments>
		<pubDate>Tue, 09 Sep 2008 03:38:16 +0000</pubDate>
		<dc:creator>Sean Murphy</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[import]]></category>
		<category><![CDATA[views]]></category>

		<guid isPermaLink="false">http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/</guid>
		<description><![CDATA[An alternative technology quickly gaining popularity these days is CouchDB, a document-based database system for semi-structured data. I wasn&#8217;t sure what that meant at first, so I read as much as I could about it. The result? I couldn&#8217;t wait to use it. I decided CouchDB would be a good fit for my next project [...]]]></description>
			<content:encoded><![CDATA[<p>An alternative technology quickly gaining popularity these days is <a href="http://incubator.apache.org/couchdb/">CouchDB</a>, a document-based database system for semi-structured data. I wasn&#8217;t sure what that meant at first, so I read as much as I could about it. The result? I couldn&#8217;t wait to use it.</p>
<p>I decided CouchDB would be a good fit for my next project (which I should be releasing sometime this week BTW) and rolled up my sleeves. Because of the amount of data I&#8217;m working with, I hit a few snags along the way with regard to CouchDB view performance. Some of the things I learned, although they make sense, were not what I was expecting initially (even after reading all the docs). So for the benefit of others, I thought it&#8217;d be a good idea to share my current understanding of the way views work in CouchDB, and share some of the tips &amp; tricks <a href="http://jan.prima.de/">Jan</a>, <a href="http://jchris.mfdz.com/">Chris</a>, and others have given me along the way.</p>
<p><span id="more-20"></span></p>
<h2>Importing Data for Speed and Glory</h2>
<p>Most of the work I&#8217;ve done with CouchDB this past week has been related to importing a fair amount of data (600k+ documents). Initially I tried creating one document at a time. This worked, but each request carries a certain amount of overhead and latency. For example, creating 33,847 documents, one at a time, took 726 seconds (~12 min). Thankfully CouchDB has a bulk create mode. Creating the same documents 1,000 at a time took 58 seconds. That&#8217;s roughly 12.5 times as fast (726 / 58 &#8776; 12.5)! An added benefit of using bulk create is that it consumes less hard disk space (28.2MB vs 213.2MB).</p>
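<p>The batching itself is simple. Below is a rough JavaScript sketch of how documents could be grouped before sending; the helper names toBatches and bulkBody and the placeholder documents are hypothetical, and the HTTP call is left out. Each body would be POSTed to the database&#8217;s _bulk_docs endpoint, which accepts a JSON object with a docs array.</p>

```javascript
// Split an array of documents into batches of at most batchSize.
function toBatches(docs, batchSize) {
	var batches = [];
	for (var i = 0; docs.length > i; i += batchSize) {
		batches.push(docs.slice(i, i + batchSize));
	}
	return batches;
}

// Build the JSON body for one bulk request: {"docs": [...]}.
function bulkBody(batch) {
	return JSON.stringify({ docs: batch });
}

// Example: 2,500 placeholder documents become 3 requests instead of 2,500.
var docs = [];
for (var n = 0; 2500 > n; n++) {
	docs.push({ word: 'word' + n });
}
var batches = toBatches(docs, 1000);
console.log(batches.length);    // 3
console.log(batches[2].length); // 500
```

<p>The batch size is a trade-off: larger batches mean fewer round trips, but more work to redo if a single request fails.</p>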
<h2>CouchDB Views vs RDBMS Tables</h2>
<p>Now that I could get my data into the database in a decent amount of time, I wanted to aggregate some of it together in a view. Before I get into too many details, let me explain how I think about CouchDB views. I&#8217;m a very spatial thinker, and so visualizing the similarities and differences between CouchDB views and traditional RDBMS tables helps me to understand how they work. It may be stupid, it may be naive, it may even be wrong, but here goes: Imagine a relational database. Imagine a handful of tables in that database, each with different columns. Now imagine that <em>every</em> row in <em>every</em> one of those tables is just a document in CouchDB, all lumped into the same bucket (database) and with no hierarchy. <em>Views</em> are what filter and aggregate documents together to create (in a very limited sense) the equivalent of a <em>table</em>.</p>
<p>You don&#8217;t join views with each other because you&#8217;re already essentially <a href="http://www.cmlenz.net/archives/2007/10/couchdb-joins">&#8220;joining&#8221; documents</a> (rows) to create the view. This might get you thinking that CouchDB views relate better to RDBMS views. In some ways that is true, but an ordinary RDBMS view is a stored query that runs against the underlying tables each time it is read, rather than a persisted index that is updated incrementally, and so for the sake of this discussion I&#8217;m leaving RDBMS views out.</p>
<h2>Indexing, the Slowdown</h2>
<p>CouchDB view indexes are generated when the view is first called. More than one view can be stored inside a design document, and all views in the same design document are generated (and updated) at the same time. After that initial creation, updating the view indexes is incremental, based on which documents in the database are added, edited, or deleted. Notice I said indexes are updated based on which documents are modified in the <em>database</em>. <em>This is an important point</em>, and one that wasn&#8217;t obvious to me initially. There are no tables. There is no hierarchy. No isolation. Modifying any document in the database means that <em>all</em> view indexes in all design documents have to be updated.</p>
<p>The only time view generation is isolated is when a new design document is created or updated. In this case, though, the process is <em>not</em> incremental. For this reason, if you plan to store a large number of documents I strongly suggest that you work out and create your design documents before populating your database. Although CouchDB is a schema-less database, creating views for a large data set is currently much like designing a schema: you mostly do it before filling your database with data, and you generally don&#8217;t change it often. Why? Because generating views is currently a slow process, and gets slower the more documents you have. As a point of reference, for my example of 33,847 documents it took me 6,705 seconds (~1.86 hours) to generate a view for the first time. Retrieving the view after that took 0.006 seconds.</p>
<h2>Improving Speed Now, and in the Future</h2>
<p>In some cases it is possible to speed up view generation by priming the view as you create documents. This method has great results. For example, if I create 33,847 documents in batches of 1,000, calling my view after every bulk create, the whole process takes 219 seconds (~3.65 min). If we compare the time it takes to insert the documents and then generate a view separately vs doing them at the same time, the latter is roughly 31 times as fast ((58 + 6,705) / 219 &#8776; 30.88).</p>
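<p>In outline, priming just interleaves the two steps. The JavaScript sketch below shows the shape of such a loop under my own assumptions: importWithPriming is a hypothetical name, and bulkCreate and queryView are caller-supplied stand-ins for the HTTP requests to CouchDB, so the example only records the order of calls.</p>

```javascript
// Hypothetical import loop. bulkCreate and queryView are caller-supplied
// functions standing in for the HTTP requests to CouchDB.
function importWithPriming(batches, bulkCreate, queryView) {
	for (var i = 0; batches.length > i; i++) {
		bulkCreate(batches[i]); // insert one batch of documents
		queryView();            // call the view so its index catches up now
	}
}

// Dry run with stand-in callbacks that only record the call order.
var calls = [];
var firstBatch = ['doc1', 'doc2'];
var secondBatch = ['doc3'];
importWithPriming(
	[firstBatch, secondBatch],
	function (batch) { calls.push('create ' + batch.length); },
	function () { calls.push('view'); }
);
console.log(calls.join(', ')); // create 2, view, create 1, view
```

<p>Each view call then only has to index the documents added since the previous call, rather than the whole database at the end.</p>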
<p>CouchDB uses an implementation of <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> for generating views. Currently, though, view generation cannot be distributed across several nodes. I&#8217;ve been told this feature is on the development roadmap, and so chances are view generation will get much, much faster in the (hopefully) not too distant future. Also worth noting is that CouchDB has not yet been optimized, and <a href="http://damienkatz.net/2007/12/thoughts_on_opt.html">Damien is quite optimistic</a> about its potential, as am I.</p>
<p>Now, I have only been working with CouchDB for a week, so it&#8217;s quite possible my understanding of something might be off. If so, please correct me. Working with CouchDB has been a load of fun (and education). I&#8217;m really looking forward to where it goes in the future, and I hope to do what I can to help get it there.</p>
<p><strong>UPDATE: </strong>For anyone who might be interested, you can get the import script I&#8217;m using from my <a href="http://bazaar.launchpad.net/~seanmurphy/otherwords/trunk/annotate/head%3A/libs/load_thesaurus.php">Launchpad.net repository</a>. Depending on what you&#8217;re doing it might be a decent start. The script has a few nice features like resuming interrupted imports, bulk inserts with priming, and graceful handling of failed bulk inserts.</p>
]]></content:encoded>
			<wfw:commentRss>http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>MySQL SELECT Entries Before NOW()</title>
		<link>http://iamseanmurphy.com/2008/02/19/mysql-select-entries-before-now/</link>
		<comments>http://iamseanmurphy.com/2008/02/19/mysql-select-entries-before-now/#comments</comments>
		<pubDate>Tue, 19 Feb 2008 22:57:22 +0000</pubDate>
		<dc:creator>Sean Murphy</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[now]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[query cache]]></category>
		<category><![CDATA[rounded time]]></category>

		<guid isPermaLink="false">http://iamseanmurphy.com/2008/02/19/mysql-select-entries-before-now/</guid>
		<description><![CDATA[I’m in the business of making things faster. Using NOW() in a SQL query is something I’m going to complain about. Here’s a familiar scenario from the online publishing industry, where future-dating articles is common: You have a news site. You need to display only articles that have been published, and one of [...]]]></description>
			<content:encoded><![CDATA[<p>I’m in the business of making things faster. Using NOW() in a SQL query is something I’m going to complain about. Here’s a familiar scenario from the online publishing industry, where future-dating articles is common:</p>
<p>You have a news site. You need to display only articles that have been published, and one of the criteria is that they need to have a publish_date before now. Easy, peasy, lemon squeezy.</p>
<p><span id="more-3"></span></p>
<pre name="code" class="sql">SELECT author, title, body FROM articles WHERE publish_date &lt;= NOW();</pre>
<p>That works, right? Yeeeeah, it works, but it isn’t <em>optimal</em>. The problem is that MySQL can’t use the query cache on any query that has NOW() in it (or CURRENT_TIME() or any of <a href="http://dev.mysql.com/doc/refman/5.0/en/query-cache-how.html" title="How MySQL Query Cache Works">these other functions</a> for that matter), because the result changes from one call to the next. The solution I like to use is to have PHP generate the timestamp. Even better is to have PHP round the timestamp, so that every request within the same window sends an identical query string, like so:</p>
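<pre name="code" class="php">// Round the current time down to a 15-minute boundary so the query text
// stays identical (and cacheable) for every request in that window.
// Rounding down means future-dated articles never appear early.
$roundness = 60 * 15;
$rounded_now = (int) (floor(time() / $roundness) * $roundness);
// Format as a DATETIME literal; assumes PHP and MySQL share a time zone.
$publish_cutoff = date('Y-m-d H:i:s', $rounded_now);
$sql = "SELECT author, title, body FROM articles WHERE publish_date &lt;= '$publish_cutoff'";</pre>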
<p>Of course, depending on how time-sensitive your application is, you may need to change the rounding from 15 minutes to something like 5 minutes, or 1 minute. Hey, even rounding to 30 seconds would be better than using NOW(), because you can use the query cache!</p>
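<p>To see why the rounding matters, here is a small JavaScript sketch of the arithmetic. The function name roundDownTo and the sample timestamps are mine, purely for illustration, and this version rounds down rather than to the nearest boundary so that the cutoff is never in the future.</p>

```javascript
// Round a Unix timestamp (in seconds) down to the start of its window.
function roundDownTo(seconds, windowSeconds) {
	return Math.floor(seconds / windowSeconds) * windowSeconds;
}

var windowSeconds = 15 * 60;
// Two requests 157 seconds apart, inside the same 15-minute window:
var first = roundDownTo(1203461842, windowSeconds);
var second = roundDownTo(1203461999, windowSeconds);
// One second later the window rolls over and the query text changes:
var third = roundDownTo(1203462000, windowSeconds);
console.log(first === second, third - first); // true 900
```

<p>Every request that maps to the same rounded value produces byte-for-byte the same SQL, which is what lets the query cache serve it.</p>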
]]></content:encoded>
			<wfw:commentRss>http://iamseanmurphy.com/2008/02/19/mysql-select-entries-before-now/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

