Perl: Most efficient way to calculate a percentile
I have a Perl script that goes through a couple of gigs' worth of files and generates a report.
In order to calculate the percentile, I'm doing the following:
    my @values;    # every value from the input, kept in memory
    while (my $line = <inputfile>) {
        .....
        push(@values, $line);
    }
    # sort numerically
    @values = sort { $a <=> $b } @values;
    # print the 95th percentile
    print $values[sprintf("%.0f", 0.95 * $#values)];
This saves all values up front in an array and then calculates the percentile, which can be heavy on memory (assuming millions of values). Is there a more memory-efficient way of doing this?
You can process the file twice: in the first pass, only count the number of lines ($.). From that number, you can compute the size of a sliding window that keeps only the highest numbers needed to find the percentile (for percentiles below 50, you should invert the logic).
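To make the memory saving concrete, here is a minimal sketch of the window-size arithmetic. The formula for high percentiles, 1 + (100 - p) * n / 100, is the one used in the script below; the mirrored formula for percentiles below 50 is an assumption about what "invert the logic" means, and `window_size` is a hypothetical name introduced here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: how many values the sliding window must hold.
# High percentiles keep the top 1 + (100 - p) * n / 100 values; for p < 50
# (assumption) the mirrored window keeps the bottom 1 + p * n / 100 values.
sub window_size {
    my ($percentile, $line_count) = @_;
    my $tail = $percentile >= 50 ? 100 - $percentile : $percentile;
    return int(1 + $tail * $line_count / 100);
}

printf "%d\n", window_size(95, 1_000_000);    # prints 50001
```

For a million lines and the 95th percentile, the window holds 50,001 values, about 5% of what the in-memory array in the question stores.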
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $percentile = 95;
    my $file = shift;

    open my $in, '<', $file or die $!;
    1 while <$in>;              # count the number of lines
    my $line_count = $.;
    seek $in, 0, 0;             # rewind

    # calculate the size of the sliding window
    my $remember_count = int(1 + (100 - $percentile) * $line_count / 100);

    # initialize the window with the first lines
    my @window = sort { $a <=> $b } map scalar <$in>, 1 .. $remember_count;
    chomp @window;

    while (<$in>) {
        chomp;
        next if $_ < $window[0];    # smaller than everything kept: ignore
        shift @window;              # drop the smallest kept value
        my $i = 0;
        # find the insertion point, stopping at the end of the window
        $i++ while $i <= $#window and $window[$i] <= $_;
        splice @window, $i, 0, $_;
    }
    print "$window[0]\n";
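The linear scan above costs up to one comparison per kept value for every insertion. As a variation (not the answer's code), a minimal sketch can find the insertion point with a binary search and handle percentiles below 50 by negating every value, so the same "keep the largest" window works on both sides. It assumes the line count is already known from a first pass; `windowed_percentile` and its iterator argument `$next` are hypothetical names introduced here. Note that `splice` still has to move array elements, so this mainly saves comparisons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one-pass percentile over $count values, keeping only
# a small sorted window. $next is a code ref that returns the next value,
# or undef when the input is exhausted.
sub windowed_percentile {
    my ($percentile, $count, $next) = @_;

    # For percentiles below 50, invert the logic by negating every value:
    # keeping the k largest negated values is the same as keeping the
    # k smallest original values.
    my $sign = $percentile >= 50 ? 1 : -1;
    my $tail = $percentile >= 50 ? 100 - $percentile : $percentile;
    my $k    = int(1 + $tail * $count / 100);

    my @window;    # ascending; $window[0] is the smallest value kept
    while (defined(my $v = $next->())) {
        $v *= $sign;
        if (@window == $k) {
            next if $v < $window[0];    # too small to matter
            shift @window;              # make room by dropping the smallest
        }
        # Binary search for the first index whose element is greater than $v.
        my ($lo, $hi) = (0, scalar @window);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($window[$mid] <= $v) { $lo = $mid + 1 } else { $hi = $mid }
        }
        splice @window, $lo, 0, $v;
    }
    return $sign * $window[0];
}

# Usage: feed the numbers 1..100 through an iterator.
my @data = (1 .. 100);
my $pos  = 0;
my $next = sub { $pos < @data ? $data[$pos++] : undef };
print windowed_percentile(95, scalar @data, $next), "\n";    # prints 95
```

On 1..100 the 95th percentile comes out as 95, the same element that the question's sort-based code selects with `sprintf("%.0f", 0.95 * $#values)`.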