C# - Parallel.For performance
This code is from the Microsoft article http://msdn.microsoft.com/en-us/library/dd460703.aspx, with small changes:
    const int size = 10000000;
    int[] nums = new int[size];
    Parallel.For(0, size, i => { nums[i] = 1; });

    long total = 0;
    Parallel.For<long>(
        0, size, () => 0,
        (j, loop, subtotal) => { return subtotal + nums[j]; },
        (x) => Interlocked.Add(ref total, x)
    );

    if (total != size)
    {
        Console.WriteLine("error");
    }
The non-parallel loop version is:

    for (int i = 0; i < size; ++i)
    {
        total += nums[i];
    }
When I measure the loop execution time using the Stopwatch class, I see that the parallel version is 10-20% slower. Testing was done on Windows 7 64-bit, Intel i5-2400 CPU, 4 cores, 4 GB RAM. Of course, in Release configuration.

In my real program I am trying to compute an image histogram, and the parallel version runs 10 times slower. Can this kind of computation task, where every loop invocation is very fast, be parallelized with the TPL?
Edit.
I finally managed to shave more than 50% off the histogram calculation execution time with Parallel.For, once I divided the whole image into a number of chunks. Every loop body invocation now handles a whole chunk, and not one pixel.
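The chunked approach described in the edit could look roughly like the sketch below. It is not the code from the real program: the class name ChunkedHistogram, the 8-bit grayscale pixel buffer, and the chunkCount parameter are all assumptions made for illustration. Each loop invocation walks a whole chunk with a plain for loop and fills a per-worker histogram, so the delegate is invoked once per chunk instead of once per pixel, and the per-worker results are merged under a lock at the end.

```csharp
using System;
using System.Threading.Tasks;

public static class ChunkedHistogram
{
    // Computes a 256-bin histogram of 8-bit pixel values.
    // chunkCount is a hypothetical tuning knob, not taken from the original post.
    public static long[] Compute(byte[] pixels, int chunkCount)
    {
        long[] histogram = new long[256];
        if (pixels.Length == 0) return histogram;
        if (chunkCount < 1) chunkCount = 1;

        // Round up so that the chunks cover the whole array.
        int chunkSize = (pixels.Length + chunkCount - 1) / chunkCount;

        Parallel.For(
            0, chunkCount,
            () => new long[256],              // private histogram per worker
            (chunk, loopState, local) =>
            {
                int start = chunk * chunkSize;
                int end = Math.Min(start + chunkSize, pixels.Length);
                // Plain inner loop: no delegate call per pixel.
                for (int i = start; i < end; ++i)
                {
                    local[pixels[i]]++;
                }
                return local;
            },
            local =>
            {
                // Merge each worker's histogram into the shared result.
                lock (histogram)
                {
                    for (int b = 0; b < 256; ++b)
                    {
                        histogram[b] += local[b];
                    }
                }
            });

        return histogram;
    }
}
```

A call such as ChunkedHistogram.Compute(pixels, Environment.ProcessorCount * 4) would be a reasonable starting point; the best chunk count depends on the image size and the machine, so it is worth measuring.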
Because Parallel.For should be used for things that are a little heavier, not to sum simple numbers! The use of the delegate (j, loop, subtotal) => alone is probably more than enough to add 10-20% more time, and we aren't even speaking of the threading overhead. It would be interesting to see a benchmark against a delegate-based summer in a for cycle, and to look not only at the "real world" time but also at the CPU time.
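One way to keep the parallelism while paying the delegate cost once per block rather than once per element is to hand each invocation a range of indices. A minimal sketch, assuming the range partitioner from System.Collections.Concurrent (Partitioner.Create); the class name RangeSum is made up for illustration, and this is not one of the variants benchmarked in this answer:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSum
{
    // Sums an int array in parallel, one index range per delegate invocation.
    public static long Sum(int[] nums)
    {
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (nums.Length == 0) return 0;

        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, nums.Length),   // yields Tuple<int, int> ranges
            () => 0L,                             // per-worker subtotal
            (range, loopState, subtotal) =>
            {
                // Tight loop over [Item1, Item2) with no delegate call per element.
                for (int i = range.Item1; i < range.Item2; ++i)
                {
                    subtotal += nums[i];
                }
                return subtotal;
            },
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }
}
```

Whether this beats the plain for loop for something as cheap as an addition still has to be measured; summing integers may simply be too little work per element.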
I have added a comparison with a "simple" delegate that does the same thing as the Parallel.For<> delegate.
Mmmh... I now have some numbers at 32 bits, on my PC (an AMD six-core):
    32 bits
    parallel: ticks:      74581, total processtime:    2496016
    base    : ticks:      90395, total processtime:     312002
    func    : ticks:     147037, total processtime:     468003
The parallel version is a little faster in wall time, but 8x slower in processor time :-)

But at 64 bits:
    64 bits
    parallel: ticks:     104326, total processtime:    2652017
    base    : ticks:      51664, total processtime:     156001
    func    : ticks:      77861, total processtime:     312002
Modified code:
    using System;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    Console.WriteLine("{0} bits", IntPtr.Size == 4 ? 32 : 64);

    var cp = Process.GetCurrentProcess();
    cp.PriorityClass = ProcessPriorityClass.High;

    const int size = 10000000;
    int[] nums = new int[size];
    Parallel.For(0, size, i => { nums[i] = 1; });

    GC.Collect();
    GC.WaitForPendingFinalizers();

    long total = 0;

    // Parallel version: Parallel.For<long> with per-thread subtotals.
    {
        TimeSpan start = cp.TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();

        Parallel.For<long>(
            0, size, () => 0,
            (j, loop, subtotal) => { return subtotal + nums[j]; },
            (x) => Interlocked.Add(ref total, x)
        );

        sw.Stop();
        TimeSpan end = cp.TotalProcessorTime;

        Console.WriteLine("parallel: ticks: {0,10}, total processtime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
    }

    if (total != size)
    {
        Console.WriteLine("error");
    }

    GC.Collect();
    GC.WaitForPendingFinalizers();

    total = 0;

    // Base version: plain sequential for loop.
    {
        TimeSpan start = cp.TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < size; ++i)
        {
            total += nums[i];
        }

        sw.Stop();
        TimeSpan end = cp.TotalProcessorTime;

        Console.WriteLine("base    : ticks: {0,10}, total processtime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
    }

    if (total != size)
    {
        Console.WriteLine("error");
    }

    GC.Collect();
    GC.WaitForPendingFinalizers();

    total = 0;

    // Func version: sequential loop that calls a delegate shaped like the Parallel.For<> body.
    Func<int, int, long, long> adder = (j, loop, subtotal) => { return subtotal + nums[j]; };

    {
        TimeSpan start = cp.TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < size; ++i)
        {
            total = adder(i, 0, total);
        }

        sw.Stop();
        TimeSpan end = cp.TotalProcessorTime;

        Console.WriteLine("func    : ticks: {0,10}, total processtime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
    }

    if (total != size)
    {
        Console.WriteLine("error");
    }