Saturday, November 14, 2009

The Performance of Everyday Things

I've spent much time fixing code optimizations that added no business value (with often matching performance value). Please do not try to make your code faster unless you need to. The way I handle performance issues on my projects:

  1. Define acceptable performance.
  2. Write my code as simply as possible.
  3. Measure performance: against definition, if performance > acceptable - goto DONE.
  4. /*Performance not acceptable*/ Profile; Fix as simply as possible; goto Measure.
  5. DONE

To be explicit: I'm comfortable using slower patterns if they are clear and simple. As soon as I've hit my acceptable performance bar - I'm done.

With that out of the way, let me discuss a performance riddle I hit this week. I was wandering through some powershell code that processed slews of objects (over 200K of 'em):
$interestingObjects = @()
foreach ($object in $inputObjects) 
    if ($object.IsInteresting) 
        $interestingObjects += $objects


I'm a real fan of succinct code so I replaced the code with the following and it was much faster:
$inputObjects| where {$_.IsInteresting}

I prefer the succinct version of the code, but why is this one liner so much faster? The answer lies in the type of $interestingObjects:
PS C:\Users\igord> $interestingObjects.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

$interestingObjects is an array, so when we add an item, we could end up doing an expensive operation - for example:
$Array + $item :=
     $newArray = new Object[$array.Length+1]
     $array.CopyTo(newArray,0)   # O(N) copy - nasty for large datasets.
     $newArray[$array.Length] = item
     return $newArray

BTW - in .NET List is implemented via an ArrayList. ArrayList is an Array with the helpful property that Add is mostly O(1):

  • If Count already equals Capacity, the capacity of the list is doubled by automatically reallocating the internal array and copying the existing elements to the new array before the new element is added.
  • If Count is less than Capacity, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.

No comments: