Friday, May 24, 2013

issues in profiling

I'm trying to profile some piece of code. I'm using google perftools for that. I have 4 nested loops:


1:  for( ... ) {  
2:    for( ... ) {  
3:      for( /*loop over some values of b2*/ ) {  
4:        for( vector<xxx>::const_iterator it=some_vector[b2].begin(); it!=some_vector[b2].end(); ++it ) {  
5:          do_something_heavy(it);  
6:        }  
7:      }  
8:    }  
9:  }  

I built this with debugging symbols and compiler optimization turned off. and ran it with google's profiler. Which told me that there were many samples on line 5, which I expected of course, and also many samples on line 4 which I expected less (about 500 each). Of course line 4 being the 4th nested loop, it's being called a lot, but the only thing it does is incrementing the iterator and comparing it with the end iterator.

To understand it a bit better, I broke it down and cached the end iterator:

1:  for( ... ) {  
2:    for( ... ) {  
3:      for( /*loop over some values of b2*/ ) {  
4:        const vector<xxx>::const_iterator end = some_vector[b2].end();  
5:        vector<xxx>::const_iterator it = some_vector[b2].begin();  
6:        while( it != end ) {  
7:          do_something_heavy(it);  
8:          ++it;  
9:        }  
10:      }  
11:    }  
12:  }  


and what the profiler is telling me now, is that line 6 is getting about 200 samples, and that lines 4 and 5 get almost nothing, line 8 (++it) gets 50. Meaning that most of the looping time is spent on comparing the 2 iterators...

I tried with using an integer index rather than an iterator:


1:  for( ... ) {  
2:    for( ... ) {  
3:      for( /*loop over some values of b2*/ ) {  
4:        const vector<xxx> & vector_b2 = some_vector[b2];  
5:        const unsigned N = vector_b2.size();  
6:        unsigned n = 0;  
7:        while( n != N ) {  
8:          do_something_heavy(vector_b2[n]);  
9:          ++n;  
10:        }  
11:      }  
12:    }  
13:  }  

what the profiler is telling me now is that a very small amount of samples is associated with lines other than the do_something_heavy line.


BUT !

when I turn on compiler optimization on, and time the execution of the code, the 3 versions give no significant difference in execution time. Which is quite reassuring in a way.

Conclusion:
- it sucks to have to run the profiler with the compiler optimization flags turned off
- it's good to know (or rather confirm) that iterator are not slower than indexing

If somebody knows of a reliable method to profile code, I'd be happy to hear it.

No comments:

Post a Comment