EmptyCrate.com: Double vs. Float, Which is Better?

Neither C++ Coding Standards nor Effective C++ addresses the question of which float point type is best to use and in what situations. There are three floating point types in C and C++:

float
double
long double

What the Standard Has to Say There are exactly two guarantees provided by the standard:

precision(float) <= precision(double) <= precision(long double). We are guaranteed that float has no more precision than double and double has no more precision than long double. It is possible for all three to be the same data type in a given implementation.

The default floating point type is double, that is, typeid(3.3) is double.

f suffix

If you want a float constant, you must specify this with the f suffix: 3.3f.
L suffix

Similarly, a long double constant must be specified with the L suffix: 3.3L.

Guidance Provided by Stroustrup

In all of my C++ resources, the only guidance I can find is in the C++ Programming Language by Bjarne Stroustrup (the creator of C++).

“The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don’t have that understanding, get advice, take the time to learn, or use double and hope for the best.” [emphasis added]

Standard Library Implementation

Where there is only one version of a standard library floating point operation, the library defaults to working with double. This includes the functions atof and strtod. In C89 the only data type supported by all math.h functions was double. Performance On modern hardware double outperforms float in every case. In the higher optimization levels long double even outperforms float. The test code was compiled with the command line

g++ floatdouble.cpp -std=c++0x -O3 -march=native

Type name: f Size in bytes: 4 Summation time in s: 2.82 summed value: 6.71089e+07 // float Type name: d Size in bytes: 8 Summation time in s: 2.78585 summed value: 6.6e+09 // double Type name: e Size in bytes: 16 Summation time in s: 2.76812 summed value: 6.6e+09 // long double

The test code was:

#include <chrono>
#include <vector>
#include <iostream>
#include <typeinfo>

template
T sum(int num_times, T value)
{
  T val=0;

  std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < num_times; ++i)
  {
    val += value;
  }
  std::chrono::high_resolution_clock::duration d = std::chrono::high_resolution_clock::now() - t1;

  std::cout << "Type name: " << typeid(T).name() << " Size in bytes: " << sizeof(T) << " Summation time in s: " << std::chrono::duration_cast>(d).count(); 

  return val;
}

int main()
{
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
  std::cout << " summed value: " << sum(2000000000, 3.3) << std::endl;
}

Conclusion

double should be your preferred floating point type in nearly every situation.

It’s faster.
It’s the default in C and C++.
It’s more portable and the default across all C and C++ library functions.
It’s significantly higher precision. The answer above is outright wrong (off by 2 orders of magnitude) for float. My simple 3.3 summation example quickly loses precision. In a smaller case, summing the value 3.3 2000 times results in: 6599.89 when using float instead of the correct answer of 6600.
Stroustrup recommends it.

There is exactly one case where you should use float instead of double

It’s smaller. On 64bit hardware with a modern gcc, double is 8 bytes and float is 4 bytes.

Similarly, there is exactly one time you would need to use long double.

It’s very high precision. You might have an application that needs the absolutely most correct floating point precision you can get today.

Finally, and perhaps most importantly: 64bit floating point has been the standard supported in Intel compatible CPU’s supporting the SSE2 instruction set since 2001. On most platforms double uses this same 64 SSE2 (IEEE64 bit) type.

EmptyCrate

About Me | Contact Me | Training | C++ Weekly (YouTube) | Awesome C++ T-Shirts

Double vs. Float, Which is Better?