Pointer Wars Level 3: Optimization Opportunities

Congratulations on making it to level 3! The contest pivots from primarily writing code to optimizing existing code, whether it be your queue implementation or the average breadth first search code that we provided in level 2. Both can be improved.

Numerous opportunities for optimization exist. Start with improving the algorithm by doing less work, because less work means less time for the program to execute. Then, jump into making the work more efficient.

For many of you, a lot of these concepts will be new, or concepts that you haven’t explored yet. We are in the process of writing about all of them. Some of these are relatively basic optimizations, others are extremely advanced. We urge you to consider quantifying your decision (perhaps by asking us if there’s a relevant hardware performance counter and what that counter’s value is for your runs) and then making a decision based on what data you have.

If you have a particular interest about one, and are a Level 3 participant, send us email about what you’d like to read about. That tells us to prioritize writing about that subject.

Algorithmic Changes

Algorithm changes are all about doing less work but getting the same result. All of these can be tested on your local machine to see whether they make a difference with the software provided to you in level 2.

Here’s a list of things to get you started:

  1. Investigate improving the handling of a visited node.
  2. Investigate improving memory allocation to improve performance. An example custom memory allocator and design tradeoffs are discussed to get you started.
  3. Quantify and determine whether the number of memory copies in the program can be reduced.

Non-Algorithmic Performance Improvements

  1. Investigate system level performance considerations.
    • Investigate whether the use of a larger page size reduces the execution time.
    • Investigate whether the use of a shared object file for your queue code impacts performance, and explore alternatives.
    • Investigate whether the OS scheduling policy has an impact on performance, including reducing the number of context switches.
    • Investigate whether OS policy around dirty pages impacts performance.
    • Investigate the impact of link time optimization.
    • Investigate the impact of interrupts and other exceptions on the code and attempt to reduce them via minor code tweaks or changes in OS policy.
    • Investigate the impact of CPU pinning.
  2. Investigate memory system related effects.
    • Determine whether data structures are optimally laid out in memory for cache accesses.
    • Determine whether the number of loads and stores in the program can be generally reduced.
    • Determine whether C/C++ structure memory alignment may cause performance problems.
    • Determine whether cache misses can be reduced by creative use of software prefetching or generally making data structures more prefetchable by hardware prefetchers.
  3. Investigate branch related effects.
    • Start by reading about CPU branches here.
    • Reduce the number of branches in the program by reworking the code.
    • For the number of branches that remain, determine whether they can be made to be more predictable by the hardware branch predictor.
    • Reduce the number of indirect branches in the program.
    • Reduce the number of calls and returns in the program.
  4. Investigate better code practices.
  5. Investigate whether SIMD instructions may be beneficial.
    • Can certain aspects of the program be optimized such that SIMD instructions can be used?

Note that multi-core optimizations are coming in Level 4, so they are deliberately left off of this list.

We’ll add to the page as we continue to think of more!