string literal



Syntax

" (unescaped_character|escaped_character)* " (1)
L " (unescaped_character|escaped_character)* " (2)
u8 " (unescaped_character|escaped_character)* " (3) (since C++11)
u " (unescaped_character|escaped_character)* " (4) (since C++11)
U " (unescaped_character|escaped_character)* " (5) (since C++11)
prefix(optional) R "delimiter( raw_character* )delimiter" (6) (since C++11)

Explanation

unescaped_character - Any valid character except the double quote ", the backslash \, or a newline
escaped_character - See escape sequences
prefix - One of L, u8, u, U
delimiter - A character sequence made of any source character except parentheses, backslash, and spaces (can be empty, and is at most 16 characters long)
raw_character - Any character; the sequence of raw characters must not contain the closing sequence )delimiter"

1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[]
2) Wide string literal. The type of a L"..." string literal is const wchar_t[]
3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[] (const char8_t[] since C++20)
4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[]
5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[]
6) Raw string literal. Used to avoid escaping any character: everything between the delimiters becomes part of the string. If prefix is present, it has the same meaning as described above.

Notes

  • The null character ('\0', L'\0', char16_t(), etc.) is always appended to the string literal. Thus, the string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.
  • String literals placed side by side are concatenated at translation phase 6 (after preprocessing). That is, "Hello," " world!" yields the (single) string "Hello, world!".
    • If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).
    • (since C++11) If one of the strings has an encoding prefix and the other doesn't, the one that doesn't will be considered to have the same encoding prefix as the other.
    • (since C++11) If a UTF-8 string literal and a wide string literal are side by side, the program is ill-formed.
    • Any other combination of encoding prefixes may or may not be supported by the implementation. The result of such a concatenation is implementation-defined.
  • String literals have static storage duration, and thus exist in memory for the life of the program.
  • String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
  • The compiler is allowed, but not required, to merge string literals. Identical string literals may therefore compare equal or unequal when compared by pointer: even whether the expression "foo" == "foo" evaluates to true is implementation-defined.
  • In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.
  • Attempting to modify a string literal results in undefined behavior.

Example

#include <iostream>

char array1[] = "Foo" "bar";
// same as
char array2[] = { 'F', 'o', 'o', 'b', 'a', 'r', '\0' };

const char* s1 = R"foo(
Hello
World
)foo";
// same as
const char* s2 = "\nHello\nWorld\n";

int main()
{
    std::cout << array1 << '\n' << array2 << '\n';
    std::cout << s1 << s2;
}