Optimizing Massively Multithreaded C++ Applications - Where's the Output Going?

If you have dozens or hundreds of threads doing work, they are probably producing some sort of output. I’m not referring to logger output, but to permanent storage, perhaps in a database. If you cannot store the data as quickly as you produce it, you will eventually run into problems. If you are just barely able to keep up with data storage, your scalability will be limited. In the system I am working on, poor database performance caused a cascade of errors, making the root cause very difficult to track down. The queue used to store database inserts grew so large that it caused out-of-memory errors, which resulted in calls to new returning null. The library calling new did not check the return value, which then caused a segmentation fault when the library attempted to use the pointer.
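One way to keep an insert queue from growing without limit is to give it a fixed capacity and make producers deal with a full queue. The following is a minimal sketch of that idea, not the code from the system described above: the class name `BoundedInsertQueue`, the use of `std::string` for a pending row, and the C++11 standard library facilities are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical bounded queue of pending database inserts.
// A full queue rejects new rows instead of consuming more memory.
class BoundedInsertQueue {
public:
    explicit BoundedInsertQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns false when the queue is full so the producer can
    // slow down, retry later, or drop the row.
    bool try_push(std::string row) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (rows_.size() >= capacity_) {
            return false;
        }
        rows_.push_back(std::move(row));
        return true;
    }

    // Moves the oldest row into `out`; returns false if the queue is empty.
    bool try_pop(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (rows_.empty()) {
            return false;
        }
        out = std::move(rows_.front());
        rows_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rows_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<std::string> rows_;
};
```

With a bound in place, a slow database shows up as rejected pushes at the producer, which is much easier to diagnose than a crash in an unrelated library.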



Optimizing Massively Multithreaded C++ Applications - Beware of Heap Operations

While debugging my massively multithreaded C++ application, I noticed times when the application seemed to pause for a few moments. During one of these pauses I halted the application and attached to it with the debugger (GDB). From within GDB I listed the threads (info threads), switched to each one in turn (thread N) and looked at its stack (bt). I saw something surprising and very telling: nearly every thread that was supposed to be performing work was actually blocked on a mutex inside either malloc or free.
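A common way to take pressure off malloc and free is to stop allocating inside the hot loop and reuse storage that each thread owns. The sketch below illustrates that idea only; `ScratchBuffer` and `format_all` are hypothetical names, and it is not the fix that was applied to the application described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scratch buffer owned by a single worker thread.
// Storage is reserved once and then reused for every message.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t capacity) { bytes_.reserve(capacity); }

    // Discards the contents but keeps the allocated storage.
    void reset() { bytes_.clear(); }

    // Appends bytes; the heap is only touched if the reserved
    // capacity is exceeded.
    void append(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        bytes_.insert(bytes_.end(), data, data + len);
    }

    const char* data() const { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }
    std::size_t capacity() const { return bytes_.capacity(); }

private:
    std::vector<char> bytes_;
};

// Hypothetical worker loop body: formats every message into the same
// buffer and returns the total number of bytes produced.
std::size_t format_all(ScratchBuffer& scratch,
                       const std::vector<std::string>& messages) {
    static const char prefix[] = "msg: ";
    std::size_t total = 0;
    for (const std::string& msg : messages) {
        scratch.reset();
        scratch.append(prefix, sizeof(prefix) - 1);
        scratch.append(msg.data(), msg.size());
        total += scratch.size();
    }
    return total;
}
```

Each worker thread would construct its own `ScratchBuffer` once, so no two threads share the buffer and no lock is needed around it.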



Optimizing Massively Multithreaded C++ Applications - Watch for Hidden Mutexes

If your application does not scale as your threads increase, you should check the code to make sure there are no hidden mutexes limiting your concurrency.



Optimizing Massively Multithreaded C++ Applications - Don't Forget the Obvious

Let’s not overcomplicate things here. If you have 160 threads all trying to run concurrently, even if each is doing little to no work, they are all still doing some work. There’s no reason to make the threads work harder than they need to. Before we get too far, let’s make sure that our compiler optimizations are enabled. Compiler optimizations can obscure debugging information, and they can expose runtime errors if you are doing things that are “tricky” in your C++ code. Be careful: at the first sign that the optimizations are causing problems, back them off. As far as I know, the optimizations are generally considered safe, but they add to compile time and to the size of the compiled code. Hopefully, if you read this blog often, you do not try to be too clever in your code and are sticking to the spec.

If you do not specify any compiler options to GCC, you are telling the compiler to build for the i386 architecture with no optimizations. Modern CPUs have far more capabilities than the i386, and taking advantage of them is a good idea. A good start is to enable the most common optimizations and tune the compiled code for a modern CPU architecture: `gcc -O2 -mtune=pentium3`



Optimizing Massively Multithreaded C++ Applications - Intro

This is the beginning of a short series of articles on optimizing massively multithreaded C++ applications. I’m not entirely sure what the exact definition of “massively multithreaded” is, but for our purposes, let’s assume at least twice as many threads as CPU cores. Having so many OS-level threads may not be the most efficient way of handling concurrency, but it is legitimate if most of your threads spend most of their time waiting. They may be waiting on a timeout, as in a timer thread that fires at regular intervals, or on IO, as in a message processing thread. The application that I am currently optimizing and debugging has anywhere from 25 to 165 threads running, depending on the current parameters of the system. All of them are waiting on something: timers, network IO, or message queue condition variables that signal the arrival of new messages. There were several hurdles to getting this configuration to execute efficiently. For this intro article, I’m going to start with two links that I found only after I had done most of the optimization, and that were not directly useful to me.
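To make the "mostly waiting" pattern concrete, here is a minimal sketch of a message queue whose consumer sleeps on a condition variable until a message arrives or a timeout expires. It assumes C++11 `std::condition_variable`; the class name `MessageQueue` and the method `pop_for` are hypothetical and are not taken from the application described above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical message queue: consumers block instead of spinning.
class MessageQueue {
public:
    // Adds a message and wakes one waiting consumer.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            messages_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Sleeps until a message is available or `timeout` elapses.
    // Returns false on timeout and leaves `out` untouched.
    bool pop_for(std::string& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool have_message = ready_.wait_for(
            lock, timeout, [this] { return !messages_.empty(); });
        if (!have_message) {
            return false;
        }
        assert(!messages_.empty());
        out = std::move(messages_.front());
        messages_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> messages_;
};
```

A worker thread would typically call `pop_for` in a loop, using the timeout branch for periodic housekeeping. While it is blocked inside `wait_for`, the thread consumes no CPU time, which is what makes a high thread count tolerable.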



"The No Twinkie Database"

This blog and I have an ancillary interest in game development. In my personal reading about adventure game design I came across the “No Twinkie Database,” as in: “Bad Game Designer, No Twinkie.” There are some great tidbits in there worth considering.



More System Recovery Options

We recently covered some options for recovery of Windows XP on a netbook. This past week, I uninstalled Linux from my dual boot Ubuntu/Vista HTPC and needed to reformat my master boot record to get Vista booting again. Fortunately, I did have my Vista install CD available and I was able to use it to boot into the Vista “recovery center,” which allowed me to format the master boot record and install the Windows boot loader. While searching for information regarding the Vista Recovery Center, I came across this website which has info on downloading a Vista Recovery Disc that contains only the Recovery Center, and no installation files.



Iterators Must Go (aka, Alexandrescu endorses D)

Andrei Alexandrescu gave the keynote speech at BoostCon 2009. The speech’s title was “Iterators Must Go.” I did not have the opportunity to attend this year’s BoostCon, but the slides of the keynote are available online.



What is C++ Virtual Inheritance?

C++ Virtual Methods

In C++, the virtual keyword, when applied to class methods, enables polymorphism. If a method is declared virtual, the most derived version of the method is executed when a call is made, regardless of the type of the pointer or reference used to make the call. If a method is non-virtual, the version that is called depends on the static type through which it is called: via a pointer to the base class, the base class method is called; via a pointer to the derived class, the derived method is called. This definition is weak; an example is better, as usual:
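A minimal sketch of the difference follows. The names `Base`, `Derived`, `virtual_name` and `plain_name` are made up for illustration, and `override` and `= default` assume C++11 or later.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;

    // Virtual: resolved at run time from the object's actual type.
    virtual std::string virtual_name() const { return "Base"; }

    // Non-virtual: resolved at compile time from the static type.
    std::string plain_name() const { return "Base"; }
};

struct Derived : Base {
    std::string virtual_name() const override { return "Derived"; }

    // Hides Base::plain_name; it does not override it.
    std::string plain_name() const { return "Derived"; }
};

// Both helpers only know about the base class.
std::string call_virtual(const Base& b) { return b.virtual_name(); }
std::string call_plain(const Base& b) { return b.plain_name(); }

// Exercises the three cases described in the text.
void demo() {
    Derived d;
    assert(call_virtual(d) == "Derived");  // most derived version runs
    assert(call_plain(d) == "Base");       // static type is Base
    assert(d.plain_name() == "Derived");   // static type is Derived
}
```

The same `Derived` object gives two different answers from `plain_name` depending on the type it is called through, while `virtual_name` always reports the most derived class.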