Neither C++ Coding Standards nor Effective C++ addresses the question of which floating point type is best to use, and in what situations. There are three floating point types in C and C++:
float
double
long double
What the Standard Has to Say
The standard provides exactly two guarantees, expressed by this chain:
precision(float) <= precision(double) <= precision(long double)
That is, float has no more precision than double, and double has no more precision than long double. The three remain distinct types, but it is possible for all three to have the same representation in a given implementation.
The default floating point type is double: an unsuffixed floating point literal such as 3.3 has type double, so typeid(3.3) == typeid(double).
f suffix
If you want a float constant, you must specify it with the f suffix: 3.3f.
L suffix
Similarly, a long double constant must be specified with the L suffix: 3.3L.
Guidance Provided by Stroustrup
In all of my C++ resources, the only guidance I can find is in The C++ Programming Language by Bjarne Stroustrup, the creator of C++:
“The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don’t have that understanding, get advice, take the time to learn, or use double and hope for the best.” [emphasis added]
Standard Library Implementation
Where there is only one version of a standard library floating point operation, the library works with double. This includes the functions atof and strtod. In C89, double was the only data type supported by all of the math.h functions.
Performance
On modern hardware, double is at least as fast as float in this test. double ran slightly faster than float, and at the higher optimization levels even long double outperformed float. The test code was compiled with the command line
g++ floatdouble.cpp -std=c++0x -O3 -march=native
Type name: f Size in bytes: 4 Summation time in s: 2.82 summed value: 6.71089e+07 // float
Type name: d Size in bytes: 8 Summation time in s: 2.78585 summed value: 6.6e+09 // double
Type name: e Size in bytes: 16 Summation time in s: 2.76812 summed value: 6.6e+09 // long double
The test code was:
#include <chrono>
#include <iostream>
#include <typeinfo>

// Adds value to a zero-initialised accumulator num_times times, then prints
// the type's name, its size and the elapsed time in seconds.
template <typename T>
T sum(int num_times, T value)
{
    T val = 0;
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < num_times; ++i)
    {
        val += value;
    }
    auto d = std::chrono::high_resolution_clock::now() - t1;
    std::cout << "Type name: " << typeid(T).name()
              << " Size in bytes: " << sizeof(T)
              << " Summation time in s: "
              << std::chrono::duration_cast<std::chrono::duration<double>>(d).count();
    return val;
}

int main()
{
    // Call sum() before streaming its result, so its timing text is printed
    // first regardless of the operand evaluation order of <<.
    float f = sum<float>(2000000000, 3.3f);
    std::cout << " summed value: " << f << std::endl;
    double d = sum<double>(2000000000, 3.3);
    std::cout << " summed value: " << d << std::endl;
    long double ld = sum<long double>(2000000000, 3.3L);
    std::cout << " summed value: " << ld << std::endl;
}
Conclusion
double should be your preferred floating point type in nearly every situation.
float … instead of the correct answer of 6600. There is exactly one case where you should use float instead of double: when memory is the constraint, because double is 8 bytes and float is 4 bytes. Similarly, there is exactly one time you would need to use long double.
Finally, and perhaps most importantly: 64-bit floating point has been natively supported by Intel-compatible CPUs with the SSE2 instruction set since 2001. On most platforms, double is this same 64-bit IEEE 754 type that SSE2 operates on.