C++ Implicit Conversion to Boolean, A Love Letter
When legacy rears its gnarly head.
Dear C++
We’ve known each other a while. Oh how we danced together, laughed together, cried together. But I must confess some distress. Your dirty little secret is out.
void foo(std::string_view str) {
    std::cout << "string_view\n";
}
void foo(bool b) {
    std::cout << "bool\n";
}
int main() {
    foo("hello");
}
this fucking language
- Martin Hořeňovský (@horenmar_ctu), May 30, 2021
I can do these theatrics no longer!
Implicit Conversion
The C++ standard defines a set of implicit conversions, that is, conversions the compiler can perform automatically without an explicit cast (static_cast et al.). “Why?” you may ask.
For certain types this makes sense. For instance, the implicit widening conversion from char to int (an integral promotion) in the following code means you don’t need an overload to handle both types. Less work == better, nice.
int decrement(int n) { return n - 1; }
// Unnecessary overload for char, yay!:
// char decrement(char n) { return static_cast<char>(n - 1); }
int main() {
char c = 1;
return decrement(c);
}
There is actually another subtle implicit conversion going on here. Unless it carries a suffix, an integer literal small enough to fit in an int has type int, so char c = 1 implicitly converts an int literal to a char.
In general, it removes some of the tedium when different types interact.
Boolean Implicit Conversion
Then there is boolean implicit conversion.
C++ carries a lot of legacy in the standard because of its initial desire to build on C, and implicit boolean conversion is part of that legacy. C originally had no bool type (C99 later added _Bool); in a boolean context any zero value is false and everything else is true. The same goes for C++, despite it having a boolean type. From CppReference:
A prvalue of integral, floating-point, unscoped enumeration, pointer, and pointer-to-member types can be converted to a prvalue of type bool.
The value zero (for integral, floating-point, and unscoped enumeration) and the null pointer and the null pointer-to-member values become false. All other values become true.
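A minimal sketch of those rules in action, assuming C++17. The helper as_bool, the enum Colour and the variables are hypothetical names invented for this illustration; the bool parameter is what triggers the implicit conversion.

```cpp
#include <string_view>

// Hypothetical helper: a bool parameter forces the implicit boolean
// conversion on whatever argument is passed in.
constexpr std::string_view as_bool(bool b) { return b ? "true" : "false"; }

enum Colour { Red, Green };  // unscoped enumeration: Red == 0, Green == 1

constexpr double half = 0.5;
constexpr int value = 7;
constexpr const int* null_ptr = nullptr;
constexpr const int* valid_ptr = &value;

// None of these calls needs a cast; zero and null become false,
// everything else becomes true.
static_assert(as_bool(0) == "false");          // integral zero
static_assert(as_bool(value) == "true");       // non-zero integer
static_assert(as_bool(half) == "true");        // non-zero floating-point
static_assert(as_bool(Red) == "false");        // enumerator with value 0
static_assert(as_bool(Green) == "true");       // enumerator with value 1
static_assert(as_bool(null_ptr) == "false");   // null pointer
static_assert(as_bool(valid_ptr) == "true");   // non-null pointer
```

The checks are static_asserts, so the compiler itself confirms each conversion; no cast appears anywhere.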
Boolean Implicit Conversion and Function Overload Resolution
When a function is overloaded, for each call site the compiler ranks each overload based on a complicated definition of best candidate. The easiest case is when it finds a perfect match; the arguments match the parameter types exactly. When they don’t, the compiler falls back on trying to convert the arguments to the parameter types expected by each overload. The easiest (read “least bad”) set of conversions wins and the associated overload gets called.
This is when two titans meet.
As well as user-defined conversions (constructors or conversion functions), implicit conversions are used to determine overload viability. The conversions are ranked in order of best candidate to worst:
Each type of standard conversion sequence is assigned one of three ranks:1
- Exact match: no conversion required, lvalue-to-rvalue conversion, qualification conversion, function pointer conversion, (since C++17) user-defined conversion of class type to the same class
- Promotion: integral promotion, floating-point promotion
- Conversion: integral conversion, floating-point conversion, floating-integral conversion, pointer conversion, pointer-to-member conversion, boolean conversion, user-defined conversion of a derived class to its base
[…]
- A standard conversion sequence is always better than a user-defined conversion sequence or an ellipsis conversion sequence.
A standard conversion sequence consists of the following, in this order:2
- zero or one conversion from the following set: lvalue-to-rvalue conversion, array-to-pointer conversion, and function-to-pointer conversion;
- zero or one numeric promotion or numeric conversion;
- zero or one function pointer conversion; (since C++17)
- zero or one qualification conversion.
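As a rough illustration of how such a sequence is assembled, the following sketch (assuming C++17) walks a string literal through the steps that take it to bool. The aliases Literal and Decayed are hypothetical names used only here.

```cpp
#include <type_traits>

// "hello" is an lvalue of type const char[6]: five characters plus '\0'.
using Literal = decltype("hello");
static_assert(std::is_same_v<Literal, const char (&)[6]>);

// Step 1: array-to-pointer conversion yields const char*.
using Decayed = std::decay_t<Literal>;
static_assert(std::is_same_v<Decayed, const char*>);

// Step 2: boolean conversion turns that pointer into bool.
static_assert(std::is_convertible_v<Decayed, bool>);

// Both steps together still form a single standard conversion sequence,
// so the literal itself converts to bool implicitly.
static_assert(std::is_convertible_v<Literal, bool>);
```

No constructor or conversion function is involved at any point, which is what keeps the whole sequence "standard".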
In summary, user-defined conversions (converting constructors and conversion functions) are always worse candidates than standard conversions such as promotions and boolean conversion. It is that way because the C++ standards committee decreed it so.
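The ranking can be seen with two small overload sets. This is only a sketch, assuming C++17; pick, choose and Wrapper are hypothetical names, and each function returns the name of the overload that was selected rather than printing it.

```cpp
#include <string_view>

// Promotion vs. conversion: char -> int is an integral promotion,
// char -> long is an integral conversion, so the int overload ranks higher.
constexpr std::string_view pick(int)  { return "int"; }
constexpr std::string_view pick(long) { return "long"; }

// Standard vs. user-defined: int -> double is a standard (floating-integral)
// conversion, int -> Wrapper goes through a converting constructor.
struct Wrapper {
    constexpr Wrapper(int) {}  // user-defined conversion from int
};
constexpr std::string_view choose(double)  { return "double"; }
constexpr std::string_view choose(Wrapper) { return "Wrapper"; }

static_assert(pick('a') == "int");      // promotion beats conversion
static_assert(pick(1L) == "long");      // exact match beats everything
static_assert(choose(1) == "double");   // standard beats user-defined
```

Note that choose(1) picks double even though Wrapper is constructed directly from an int: the rank of the conversion decides, not how "natural" the target type looks.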
I thought I knew thee
To understand why boolean conversion can be painful in practice, let’s look at an example with our favourite company, The Big X (TBX). One day, TBX developer Alex needs to ‘foo’ a string, so they write:
void foo(char const* s) { std::cout << "char const*\n"; }
// ...
foo("hello world");
// Output:
// char const*
Many gallons of coffee later, their colleague, Sally, decides that in many cases TBX doesn’t care about the value of the string parameter, only whether it’s non-null. Sick of competing for coolest phrase at review time3, she adds an overload:
void foo(char const* s) { std::cout << "char const*\n"; }
void foo(bool b) { std::cout << "bool\n"; }
// ...
foo("hello world");
foo(true);
// Output:
// char const*
// bool
Happy day!
Some years later Kevin comes along. Kevin has great ideas. He just learned about std::string_view. Wouldn’t it be great if we didn’t have to call std::string::c_str() when trying to foo a std::string? Why yes Kevin, it would be great. So Kevin replaces the C string overload:
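#include <iostream>
#include <string>
#include <string_view>

void foo(std::string_view sv) { std::cout << "string_view\n"; }
void foo(bool b) { std::cout << "bool\n"; }

int main() {
    foo("hello world");  // string literal: standard conversion to bool wins
    foo(true);
    std::string s("goodbye nothing");
    foo(s);              // std::string cannot convert to bool
}
// Output:
// bool <<<< Unintended overload!!!
// bool
// string_view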
Because std::string_view uses a constructor to convert from a C string, that conversion counts as user-defined. The string literal, meanwhile, reaches bool using only standard conversions: array-to-pointer, then boolean conversion. And as we just learned, a user-defined conversion sequence always loses to a standard conversion sequence, so the bool overload wins. Now the whole code base is calling the wrong function. Poor Kevin’s done it again.
The worst part – and I can’t emphasize this enough – is that it’s a silent change with innocent looking code. This is just one of many subtle ways it could creep into a code base. Would this silent change be caught in the wild? Let’s just hope that TBX has sufficient testing.
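For what it’s worth, one possible guard (not something TBX did in this story) is to stop the bool overload from accepting anything that merely converts to bool. The sketch below assumes C++17 and returns the overload name instead of printing it so the result can be checked at compile time; the enable_if constraint is my assumption about how one might write it, not the only option.

```cpp
#include <string_view>
#include <type_traits>

constexpr std::string_view foo(std::string_view) { return "string_view"; }

// Hypothetical guard: this overload only exists when the argument type is
// exactly bool, so pointers and integers can no longer sneak in through
// the implicit boolean conversion.
template <typename T, std::enable_if_t<std::is_same_v<T, bool>, int> = 0>
constexpr std::string_view foo(T) { return "bool"; }

static_assert(foo("hello world") == "string_view");  // literal no longer picks bool
static_assert(foo(true) == "bool");                  // a real bool still does
```

A std::string argument still selects the string_view overload, exactly as before. The trade-off is that calls such as foo(1) stop compiling, because an int now matches neither overload; whether that is a bug or a feature depends on how much you trusted those call sites.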
TL;DR
This f$#%ing language. We’re on a break.
- Order of the conversions, Implicit conversions, CppReference
- And it saves on future bloating in the .text section of the binary. Win-win!