Monthly Archives: August 2013

Median tips in Matlab

Medians are sometimes necessary when trying to characterize data. Datasets with large, infrequent outliers just tend to skew the mean too much for it to be a good estimator of the data’s bulk behavior.

I recently had a need to process a lot of FITS files with the median operation.  The details of this are not terribly important except that I need a good estimator of each pixel value across ~50 FITS frames (about  18 sets of these 50 frame chunks).  To quickly get an estimate of what I would be in for I checked out the performance of Matlab’s median operation. Doing these calculations doesn’t take a lot of time individually but collectively I was looking at about 1/2 hour to get just medians.  Out of curiousity I checked into the Profiler to examine if readin was taking longer than the processing. The images were being read at about 50Mb/sec @ 2.2Mb each so about 25 fps on my machine. Median calculations on the other hand were taking about 8sec each! — which of course depends on the type of data you are feeding the median operation — however for those keeping score that’s 6.25 fps.

I wasn’t expecting the median to be fast since a KxLx50 matrix is not the optimized layout for caching. Matlab does eventually rotate the input for this case and copies all 2Gb for sorting. I was suspect of the sorting operation and the necessary rotation but most of the time wasn’t spend doing this operations(!).

About 3/4 of the time was spent processing the case for even element median operations!

Feeding in 45 frames for median’ing raised my processing rate to 2.1sec per med op (that’s 24 fps kids).

Moral: If you can, only do median operations on odd numbers of elements {possibly matlab specific}

Extras: By the way I am processing 16bit numbers from the FITS files. Interestingly the median operation on these integers gives an actual pixel value for inputs where the number of frames median’ed is odd! When the number of frames is even the median that is computed is round(mean( [sorted_pxl_vals( N/2 ) , sorted_pxl_vals( N/2+1] ) )). This may not correspond to an actual pxl value and at least for my data significantly skews the mean of the pxl value distribution by about 3 LSBs.