c++ - How to analyze program running time


I am trying to optimize my C++ program's performance and reduce its run time. However, I am having trouble figuring out where the bottleneck is.

The time command shows that the program takes 5 minutes to run, and of those 5 minutes, user CPU time accounts for 4.5 minutes.

A CPU profiler (both the gcc profiler and google perftools) shows that the function calls only take 60 seconds in total of CPU time. I also tried to use the profiler to sample real time instead of CPU time, and it gives me similar results.

An I/O profiler (I used ioapps) shows that I/O only takes about 30 seconds of the program's running time.

So I have about 3.5 minutes (the largest bulk of the program's running time) unaccounted for, and I believe that is where the bottleneck is.

What did I miss, and how do I find out where that time goes?

As Öö Tiib suggested, just break the program in a debugger. The way I do it is: get the program running, switch to the output window, type Ctrl-C to interrupt the program, switch back to the gdb window, type "thread 1" to be in the context of the main program, and type "bt" to see the stack trace.
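
To make this concrete, here is a small, entirely hypothetical example (mine, not from the question) where the cost hides inside a harmless-looking helper, followed by the same gdb steps described above:

    // Toy example: the expensive work hides inside an innocent-looking
    // helper; a few random pauses under gdb will show it on the stack
    // almost every time.
    #include <iostream>
    #include <string>
    #include <vector>

    std::string label(int i) {
        return "item-" + std::to_string(i);   // formatting cost paid on every iteration
    }

    int main() {
        std::vector<std::string> names;
        for (int i = 0; i < 50000000; ++i) {
            names.push_back(label(i));        // most stack samples land here or below
            if (names.size() > 1000) names.clear();
        }
        std::cout << names.size() << '\n';
    }

    // Sampling session (the same steps as above):
    //   $ gdb ./a.out
    //   (gdb) run
    //   ... press Ctrl-C while it is running ...
    //   (gdb) thread 1
    //   (gdb) bt
    //   (gdb) continue        # then interrupt and look again, a few times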

Now, look at the stack trace and understand it. While the instruction at the program counter is responsible for that particular cycle being spent, so is every call on the stack.

If you do this a few times, you're going to see exactly the line that is responsible for the bottleneck. As soon as you see it on two (2) samples, you've nailed it. Then fix it and do it all again, finding the next bottleneck, and so on. You can easily find an enormous speedup this way.

< flame>

Some people say this is exactly what profilers do, only they do it better. That's what you hear in lecture halls and on blogs, but here's the deal: there are ways to speed up your code that do not reveal themselves as "slow functions" or "hot paths", for example - reorganizing the data structure. Every function looks more-or-less innocent, even if it has a high inclusive time percent.
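
For instance (a sketch of my own, not anything from the original code): the same lookups done over a pointer-chasing std::map versus a flat sorted std::vector. No single function shows up as "hot" - the cost is smeared across cache misses - yet the reorganization can be a large win.

    #include <algorithm>
    #include <map>
    #include <utility>
    #include <vector>

    // Before: each lookup chases tree nodes scattered through memory.
    long sum_map(const std::map<int, int>& m, const std::vector<int>& keys) {
        long s = 0;
        for (int k : keys)
            if (auto it = m.find(k); it != m.end()) s += it->second;
        return s;
    }

    // After: the same data kept contiguous and sorted; lookups become
    // cache-friendly binary searches, with no "slow function" to point at.
    long sum_flat(const std::vector<std::pair<int, int>>& v, const std::vector<int>& keys) {
        long s = 0;
        for (int k : keys) {
            auto it = std::lower_bound(v.begin(), v.end(), k,
                [](const std::pair<int, int>& p, int key) { return p.first < key; });
            if (it != v.end() && it->first == k) s += it->second;
        }
        return s;
    }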

They do reveal themselves if you actually look at stack samples. So the problem with good profilers is not in the collection of samples, it is in the presentation of results. Statistics and measurements cannot tell you what a small selection of samples, examined carefully, can tell you.

What about the issue of a small vs. large number of samples? Aren't more better? OK, suppose you have an infinite loop, or if not infinite, one that runs far longer than you know it should. Would 1000 stack samples find it any better than a single sample? (No.) If you look at it under a debugger, you know you're in the loop because it takes essentially 100% of the time. It's on the stack somewhere - just scan up the stack until you find it. Even if the loop only takes 50% or 20% of the time, that is the probability each sample will see it. So, if you see something you could get rid of on as few as two samples, it's worth doing it. So, what do the 1000 samples buy you?
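
To put rough numbers on that (my own back-of-the-envelope math, not part of the original answer): if a problem costs a fraction f of the time, each random pause shows it with probability f, so a handful of pauses is already very likely to catch it at least twice:

    #include <cmath>
    #include <cstdio>

    // Probability of seeing a problem costing fraction f of the time
    // on at least two of n random stack samples.
    double seen_at_least_twice(double f, int n) {
        double p0 = std::pow(1.0 - f, n);                // never seen
        double p1 = n * f * std::pow(1.0 - f, n - 1);    // seen exactly once
        return 1.0 - p0 - p1;
    }

    int main() {
        for (int n : {5, 10, 20})
            std::printf("f=0.50 n=%2d -> %.2f    f=0.20 n=%2d -> %.2f\n",
                        n, seen_at_least_twice(0.50, n),
                        n, seen_at_least_twice(0.20, n));
    }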

Maybe one thinks: "So what if we miss a problem or two? Maybe it's good enough." Well, is it? Suppose the code has three problems: P taking 50% of the time, Q taking 25%, and R taking 12.5%. The good stuff is called A. The following shows the speedup you get if you fix one of them, two of them, or all three of them.

PRPQPQPAPQPAPRPQ    original time, with avoidable code P, Q, and R mixed in
RQQAQARQ            fix P            - 2 x     speedup
PRPPPAPPAPRP        fix Q            - 1.3 x      "
PPQPQPAPQPAPPQ      fix R            - 1.14 x     "
RAAR                fix P and Q      - 4 x        "
QQAQAQ              fix P and R      - 2.7 x      "
PPPPAPPAPP          fix Q and R      - 1.6 x      "
AA                  fix P, Q, and R  - 8 x     speedup
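
The arithmetic behind those numbers is simple (a quick sketch using the 50% / 25% / 12.5% fractions assumed above): removing a fraction x of the run time gives a speedup of 1 / (1 - x).

    #include <cstdio>

    int main() {
        const double P = 0.50, Q = 0.25, R = 0.125;
        std::printf("fix P          : %.2fx\n", 1.0 / (1.0 - P));          // 2x
        std::printf("fix Q          : %.2fx\n", 1.0 / (1.0 - Q));          // 1.33x
        std::printf("fix R          : %.2fx\n", 1.0 / (1.0 - R));          // 1.14x
        std::printf("fix P and Q    : %.2fx\n", 1.0 / (1.0 - P - Q));      // 4x
        std::printf("fix P and R    : %.2fx\n", 1.0 / (1.0 - P - R));      // 2.67x
        std::printf("fix Q and R    : %.2fx\n", 1.0 / (1.0 - Q - R));      // 1.6x
        std::printf("fix P, Q, and R: %.2fx\n", 1.0 / (1.0 - P - Q - R));  // 8x
    }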

Does this make it clear why the problems that "get away" really hurt? The best you can do if you miss any of them is half the speedup you could have had.

They are very easy to find if you examine samples: P is on half the samples. If you fix P and do it again, Q is on half the samples. Once you fix Q, R is on half the samples. Fix R and you've got your 8x speedup. You don't have to stop there. You can keep going until you truly can't find anything more to fix.

The more problems there are, the higher the potential speedup, but you can't afford to miss any. The problem with profilers (even good ones) is that, by denying you the chance to see and study individual samples, they hide problems you need to find. More on all that. For the statistically inclined, here's how it works.

There are good profilers. The best are wall-time stack samplers that report inclusive percent at individual lines, letting you turn sampling on and off with a hot-key. Zoom (wiki) is such a profiler.

But they make the mistake of assuming you need lots of samples. You don't, and the price you pay for them is that you can't actually see any, so you can't see why the time is being spent, so you can't easily tell if it's necessary, and you can't get rid of something unless you know you don't need it. The result is that you miss bottlenecks, and they end up stunting your speedup.

< /flame>

