Show HN: SSO – Small String Optimization
github.com
I wrote this just for fun after seeing an article about SSO in Rust[1]. My string can store up to 23 8-bit chars (excluding the null terminator) without calling the allocator.
I may be mistaken here, but a curious fact: both libstdc++[2] and libc++[3] access a union member without checking that it is currently active. AFAIK, this is UB in C++, but I assume they just rely on their compilers' behavior. I tried to avoid this using `std::byte[]`. Still, I'm sure there are several UBs in my code :)
[1] https://tunglevo.com/note/an-optimization-thats-impossible-i...
[2] https://github.com/gcc-mirror/gcc/blob/d09131eea083e80ccad60...
[3] https://github.com/llvm/llvm-project/blob/4468d58080d0502a05...
> a curious fact: both libstdc++ and libc++ access a union member without checking that it is currently active.
Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]
However, it's true that in this case (I'm looking at libc++) the member isn't quite the same in both alternatives: In one case it's a `char:1` and in the other case a `size_t:1`. Also, in both cases it's nested inside an anonymous `struct __attribute__((packed))`, which means we're dealing with two different compiler extensions already. (Standard C++ supports anonymous unions,[3] but not anonymous structs.) So yes, pedantically speaking, they're relying on the compiler's behavior.
> I tried to avoid this using `std::byte[]`
I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation (i.e., at constexpr time). C++20-and-later require `std::string` to be constexpr-friendly. So that's probably relevant to the library vendors' choices here.
[1] https://eel.is/c++draft/class.mem#def:common_initial_sequenc...
[2] https://eel.is/c++draft/class.mem#general-28
[3] https://eel.is/c++draft/class.union.anon
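To make the common-initial-sequence rule concrete, here is a small sketch in standard C++. It is not libc++'s actual layout: `ShortRep`, `LongRep`, `Rep` and the helper functions are hypothetical names, and the flag is a plain `unsigned char` of the same type in both structs rather than a bit-field.

```cpp
#include <cstddef>
#include <type_traits>

// Both structs start with a member of the same type, so that member is
// part of their common initial sequence.
struct ShortRep {
    unsigned char is_long;  // 0 for the inline representation
    char data[23];
};

struct LongRep {
    unsigned char is_long;  // 1 for the heap representation
    char* ptr;
    std::size_t size;
};

// The rule only applies to standard-layout types.
static_assert(std::is_standard_layout<ShortRep>::value, "ShortRep must be standard-layout");
static_assert(std::is_standard_layout<LongRep>::value, "LongRep must be standard-layout");

union Rep {
    ShortRep s;
    LongRep l;
};

Rep make_short() {
    Rep r;
    r.s = ShortRep{0, {}};  // assignment makes `s` the active member
    return r;
}

Rep make_long(char* p, std::size_t n) {
    Rep r;
    r.l = LongRep{1, p, n};  // assignment makes `l` the active member
    return r;
}

// Reads the flag through `s` whichever member is active. This is allowed
// because `is_long` lies in the common initial sequence of both structs.
bool is_long(const Rep& r) { return r.s.is_long != 0; }
```

Reading anything past the common initial sequence through the inactive member, for example `r.s.data` while `l` is active, is not covered by this rule.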
> Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]
Wow, thanks for the detailed answer. I'm not that familiar with the standard. I'll explore your links.
> I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation
Probably, yes. Even though I used `constexpr` in the implementation, I haven't tested it. I'll add a task to the backlog (and never return to it, xd).
Interesting. The writing is a little unclear, but I enjoyed nonetheless!
Here's my user test:
https://news.pub/?try=https://www.youtube.com/embed/tQXoCbUh...
Sure, I need to add a description for people who don't know what SSO is before they look at my implementation.
Thanks for the review!