Perl: Most efficient way to calculate percentile


I have a Perl script that goes through a couple of gigabytes' worth of files and generates a report.

In order to calculate the percentile, I am doing the following:

```perl
my @values;    # start with an empty list
while (my $line = <inputfile>) {
    # .....
    push(@values, $line);
}

# sort
@values = sort { $a <=> $b } @values;

# print 95% percentile
print $values[sprintf("%.0f", (0.95 * ($#values)))];
```

This saves all the values in an array up front and then calculates the percentile, which can be heavy on memory (assuming millions of values). Is there a more memory-efficient way of doing this?
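As a sanity check of the index arithmetic in the question's snippet, the lookup can be pulled into a small helper. This is a minimal sketch: `percentile_from_sorted` is a hypothetical name and the data is made up. For 100 values, 0.95 * 99 = 94.05, which `sprintf("%.0f", ...)` rounds to index 94.

```perl
use strict;
use warnings;

# Hypothetical helper: percentile using the question's indexing,
# round($fraction * $#sorted), into an ascending-sorted array.
sub percentile_from_sorted {
    my ($sorted, $fraction) = @_;   # $fraction: e.g. 0.95 for the 95th percentile
    my $index = sprintf("%.0f", $fraction * $#$sorted);
    return $sorted->[$index];
}

my @values = sort { $a <=> $b } 1 .. 100;   # hypothetical data
# 0.95 * 99 = 94.05 -> index 94, which holds the value 95
print percentile_from_sorted(\@values, 0.95), "\n";
```

Note that this still needs the whole sorted array in memory; it only isolates the index calculation.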

You can process the file twice. On the first pass, count the number of lines ($.). From that number, you can calculate the size of a sliding window that keeps only the highest numbers needed to find the percentile (for percentiles below 50, you should invert the logic).
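The window size formula used by the script, 1 + (100 - p) * n / 100, can be evaluated on its own to see how much it saves. A minimal sketch: `window_size` is a hypothetical helper and the line counts are illustrative. For the 95th percentile the window holds about 5% of the values plus one, so a million lines need a window of 50,001 values instead of an array of 1,000,000. Memory still grows with the input, just much more slowly.

```perl
use strict;
use warnings;

# Hypothetical helper: how many values the sliding window keeps for a
# given line count and (upper) percentile, using the answer's formula.
sub window_size {
    my ($line_count, $percentile) = @_;
    return int(1 + (100 - $percentile) * $line_count / 100);
}

printf "%d lines -> keep %d values\n", $_, window_size($_, 95)
    for 100, 1_000_000, 50_000_000;
```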

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $percentile = 95;

my $file = shift or die "usage: $0 FILE\n";
open my $in, '<', $file or die "Cannot open $file: $!";

1 while <$in>;             # count number of lines.
my $line_count = $.;
seek $in, 0, 0;            # rewind.

# calculate size of sliding window.
my $remember_count = int(1 + (100 - $percentile) * $line_count / 100);

# initialize window with the first lines, sorted ascending.
my @window = sort { $a <=> $b }
             map scalar <$in>,
             1 .. $remember_count;
chomp @window;

while (<$in>) {
    chomp;
    next if $_ < $window[0];   # too small to be among the top values
    shift @window;             # drop the current smallest
    my $i = 0;
    # find the position that keeps @window sorted
    $i++ while $i <= $#window && $window[$i] <= $_;
    splice @window, $i, 0, $_;
}
print "$window[0]\n";
```
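The script above reads numbers straight from a file. To check the windowing idea against the sort-based approach from the question, it can be wrapped in a subroutine. This is a sketch under a few assumptions: `window_percentile` and `insert_sorted` are hypothetical names, values come from a code reference that returns undef at end of input, and the target index is taken from the question's own formula, round(p / 100 * (n - 1)), rather than the answer's 1 + (100 - p) * n / 100 (the two can land one position apart for some counts, e.g. 7 values at p = 90). It also covers the inverted case: when the target index is in the lower half, it keeps the smallest values and reports the largest of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Insert $v into the ascending-sorted array @$w, keeping it sorted (linear scan).
sub insert_sorted {
    my ($w, $v) = @_;
    my $i = 0;
    $i++ while $i < @$w && $w->[$i] <= $v;
    splice @$w, $i, 0, $v;
}

# Hypothetical helper: percentile via a bounded, sorted window.
#   $p     - percentile, e.g. 95
#   $count - total number of values (e.g. $. after a first counting pass)
#   $next  - code ref returning the next value, or undef at end of input
# Target index is the question's: round($p / 100 * ($count - 1)).
sub window_percentile {
    my ($p, $count, $next) = @_;
    my $rank     = sprintf("%.0f", $p / 100 * ($count - 1));
    my $keep_top = $rank >= ($count - 1) / 2;      # keep the smaller tail
    my $size     = $keep_top ? $count - $rank : $rank + 1;

    my @window;    # always sorted ascending, never longer than $size
    while (defined(my $v = $next->())) {
        if (@window < $size) {
            insert_sorted(\@window, $v);
        } elsif ($keep_top && $v > $window[0]) {
            shift @window;                 # evict the smallest of the top values
            insert_sorted(\@window, $v);
        } elsif (!$keep_top && $v < $window[-1]) {
            pop @window;                   # evict the largest of the bottom values
            insert_sorted(\@window, $v);
        }
    }
    # Top tail: its minimum is the answer. Bottom tail: its maximum.
    return $keep_top ? $window[0] : $window[-1];
}

# Usage with hypothetical in-memory data.
my @data = map { ($_ * 7919) % 1000 } 1 .. 500;
my $pos  = 0;
my $next = sub { $pos < @data ? $data[$pos++] : undef };
print window_percentile(95, scalar @data, $next), "\n";
```

For a file, after the counting pass and `seek`, `$next` could be `sub { my $l = <$in>; chomp $l if defined $l; $l }`. The linear scan in `insert_sorted` costs time proportional to the window size on every insertion, so for very large windows a binary search for the insertion point would be cheaper.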
