One of the things I love about cloud computing is that you can put an honest price on computing time. You can then weigh the human engineering time required to optimize code (which often leaves you with more complex code) against just paying for the cloud to do it. The Zillow rent estimate post speaks to this brilliantly:
> We implemented the Rent Zestimation process as a software application taking input from Zillow databases and producing an output table with about 100 million rows.
>
> We deploy this software application into a production environment using Amazon Web Services (AWS) cloud. The total time to complete a run is four hours using four of Amazon EC2 instances of Extra-Large-High-CPU type. This type of machine costs $1.16/hr. Thus, it costs us about $19 to produce 100 million Rent Zestimates which is the same as a 3-D movie ticket or about 5 gallons of gasoline in New York City today.
A few things to note about this quote:
- If your data processing can be done in parallel, you can get results sooner for roughly the same cost, since the bill depends on total instance-hours: sixteen instances for one hour cost the same as four instances for four hours.
- If your data processing must be done serially, and you want results faster than the largest instance can compute them, buy some coffee: you’ll be up doing optimizations.
- Gas prices are through the roof in New York
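
To make the arithmetic concrete, here is a minimal sketch of the cost calculation. The instance count, run time, and $1.16/hr rate come from the Zillow quote. The `run_cost` helper and the 16-instance scenario are my own illustration, and they assume the job scales perfectly across instances with no startup or coordination overhead, which real workloads rarely manage.

```python
def run_cost(instances: int, hours: float, rate_per_hour: float) -> float:
    """Cost of a run, assuming every instance is billed for every hour."""
    return instances * hours * rate_per_hour


RATE = 1.16  # dollars per instance-hour, from the Zillow quote

# Figures from the quote: four instances running for four hours.
baseline = run_cost(instances=4, hours=4, rate_per_hour=RATE)

# Hypothetical scenario (an assumption, not from the quote): a perfectly
# parallel job spread over 16 instances finishes in a quarter of the time.
scaled = run_cost(instances=16, hours=1, rate_per_hour=RATE)

print(f"4 instances x 4 hours: ${baseline:.2f}")
print(f"16 instances x 1 hour: ${scaled:.2f}")
```

Both runs use 16 instance-hours, so the bill is the same; the only thing you buy with more instances is a shorter wait. If the work can't be split up, the second scenario isn't available to you, and that's when the coffee comes in.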