In this episode Jason gives a very fast quick-start to what std::future is and how to use std::async to run a function in another thread.
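The episode's live demo isn't reproduced here, but a minimal sketch of the pattern, with a placeholder `expensive_computation` function standing in for real work, looks like this:

```cpp
#include <future>
#include <iostream>

// Placeholder for whatever work you want to run on another thread.
int expensive_computation(int input)
{
    return input * 2;
}

int main()
{
    // std::async launches the function on another thread (std::launch::async)
    // and immediately returns a std::future that will eventually hold the result.
    std::future<int> result = std::async(std::launch::async, expensive_computation, 21);

    // .get() blocks until the result is ready, then returns it.
    std::cout << result.get() << '\n';
}
```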
Yesterday I decided to look into adding `std::future` support to ChaiScript. To be fair, `future` is the return value of several other higher-level threading constructs, so we don't want just `std::future`; we want enough support to make it usable.
If you have dozens or hundreds of threads doing work, they are probably producing some sort of output. I'm not referring to logger output, but to permanent storage, perhaps in a database. If you cannot store the data as quickly as you produce it, you will eventually run into problems. If you are just barely able to keep up with data storage, then your scalability will be limited. In the system I am working on, poor database performance caused a cascade of errors, making the root cause very difficult to track down. The queue used to store database inserts was growing so large that it was causing out-of-memory errors, which resulted in `new`s returning null. The library calling `new` did not check the return value, which then caused a segmentation fault when the library attempted to use the pointer.
While debugging my massively multithreaded C++ application I would notice times when the application would seem to pause for a few moments. During one of these pauses I halted the application and attached to it with the debugger (GDB). From within GDB I listed the threads (`info threads`), switched to each one (`thread`), and looked at its stack (`bt`). I saw something surprising and very telling: nearly every single thread that was supposed to be performing work was actually blocked on a mutex inside either `malloc` or `free`.
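The debugger session itself isn't reproduced in this excerpt, but the sequence of GDB commands looks roughly like this (the thread number is illustrative):

```
(gdb) info threads   # list every thread and where it is stopped
(gdb) thread 7       # switch to a thread of interest
(gdb) bt             # print that thread's call stack
```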
If your application does not scale as your threads increase, you should check the code to make sure there are no hidden mutexes limiting your concurrency.
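As an illustration (not from the original post) of what a hidden mutex can look like, consider a seemingly harmless helper that every thread calls; all callers quietly serialize on one global lock:

```cpp
#include <iostream>
#include <mutex>
#include <string>

// Every thread that calls log_message serializes on this single mutex,
// quietly limiting the concurrency of the whole application.
std::mutex g_log_mutex;

void log_message(const std::string &msg)
{
    std::lock_guard<std::mutex> lock(g_log_mutex);
    std::cerr << msg << '\n'; // all other callers block while we write
}
```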
Let’s not overcomplicate things here. If you have 160 threads all trying to run concurrently, even if they are doing little to no work, they are all still doing some work. There’s no reason to make the threads work harder than they need to.

Before we get too far, let’s make sure that our compiler optimizations are enabled. It’s possible that compiler optimizations can obscure debugging information or introduce runtime errors if you are doing things that are “tricky” in your C++ code. Be careful: at the first sign that the optimizations are causing problems, back them off. As far as I know, all optimizations are generally considered safe, but they add to compile time and to the size of the compiled code. Hopefully, if you read this blog often, you do not try to be too clever in your code and stick to the spec. If you do not specify any compiler options to GCC, you are telling the compiler to build for the i386 architecture with no optimizations. Modern CPUs have far more capabilities than the i386, and taking advantage of them is a good idea. A good start is to enable the most common optimizations and tune the compiled code for a modern CPU architecture: `gcc -O2 -mtune=pentium3`
This is the beginning of a short series of articles related to optimizing massively multithreaded C++ applications. I’m not entirely sure what the exact definition of “massively multithreaded” is, but for our purposes, let’s assume at least twice as many threads as CPU cores. Having so many OS-level threads may not be the most efficient way of handling concurrency, but it is legitimate if most of your threads spend most of their time waiting. They may be waiting on a timeout, as in a timer thread that fires at regular intervals, or a message-processing thread that is waiting on IO. The application that I am currently optimizing and debugging has anywhere from 25 to 165 threads running, depending on the current parameters of the system. All of them are waiting on something: timers, network IO, or message queue condition variables that signal the arrival of new messages. There were several hurdles to getting this configuration to execute efficiently. For this intro article, I’m going to start with two links that I actually found after I had done most of the optimization, and which were not directly useful to me.
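As a rough sketch (names and details are illustrative, not from the application described above), a message-processing thread that spends nearly all of its time blocked on a condition variable looks something like this:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

std::mutex              g_queue_mutex;
std::condition_variable g_queue_cv;
std::deque<std::string> g_queue;

// The thread consumes no CPU while wait() blocks; it wakes only when
// a producer pushes a message and notifies the condition variable.
void message_loop()
{
    for (;;)
    {
        std::unique_lock<std::mutex> lock(g_queue_mutex);
        g_queue_cv.wait(lock, [] { return !g_queue.empty(); });
        std::string msg = g_queue.front();
        g_queue.pop_front();
        lock.unlock();
        // ... process msg without holding the lock ...
    }
}
```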
We’ve covered the “Assembly Language”, “C” and “C++” of the C++ threading world, and now we are going to try and move beyond that.
I’m going to cover a thread safety strategy I have been thinking about lately. Let’s look at an example of a typical “lock the variables as you use them” approach:
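The original listing is missing from this excerpt; the following is a minimal sketch of the style, using a hypothetical `Counter` class whose every accessor locks a mutex:

```cpp
#include <mutex>

// "Lock the variables as you use them": each member function must
// remember to take the mutex before touching the shared data.
class Counter
{
  public:
    void increment()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_count;
    }

    int get() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_count;
    }

  private:
    mutable std::mutex m_mutex;
    int m_count = 0;
};
```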
If boost::threads represent the C of multithreaded programming, then RAII and automatically managed threads represent the C++ of multithreaded programming. In the last article we promised that using more RAII would allow us to make this code even smaller and easier to manage. Here is the result:
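The article's listing is not reproduced here; a minimal sketch of the RAII idea, using a hypothetical `thread_guard` that joins the thread in its destructor, might look like this:

```cpp
#include <boost/thread.hpp>
#include <iostream>

// Ties the thread's lifetime to a scope: the destructor joins,
// so we can never forget to clean up the thread.
class thread_guard
{
  public:
    explicit thread_guard(boost::thread &t) : m_thread(t) {}

    ~thread_guard()
    {
        if (m_thread.joinable()) { m_thread.join(); }
    }

  private:
    boost::thread &m_thread;
};

void do_work()
{
    std::cout << "working in another thread\n";
}

int main()
{
    boost::thread worker(do_work);
    thread_guard guard(worker); // joined automatically when the scope ends
}
```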
Note 2016-03-15: `std::thread` now has all this and more.
During the course of debugging a potential memory leak at work I noticed that Linux seems to allocate at least 8M of memory for each thread created. This very simple test program illustrates the memory allocations:
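The test program is not included in this excerpt; a minimal reconstruction (the thread count and sleep duration are arbitrary) might look like this:

```cpp
#include <pthread.h>
#include <unistd.h>

void *thread_func(void *)
{
    sleep(60); // keep the thread alive while memory usage is inspected
    return nullptr;
}

int main()
{
    pthread_t threads[10];

    for (auto &t : threads)
    {
        pthread_create(&t, nullptr, thread_func, nullptr);
    }

    // While this runs, watch the process's virtual size in `top` or
    // /proc/<pid>/status: it should grow by roughly the default stack
    // size (often 8 MiB, see `ulimit -s`) per thread created.
    for (auto &t : threads)
    {
        pthread_join(t, nullptr);
    }
}
```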
Note 2016-03-15: some of the details on `volatile` are out of date now.
Jon asks: