Reduce size of Typed::type_info()#24171
Conversation
…s>` to `Option<Arc<CustomAttributes>>`. This avoids needing to allocate them for the common case of being empty.
…icInfo]>>`. This avoids needing to allocate it for the common case of being empty.
…eneric parts of their `new`. This avoid monomorphizing calls to the allocator.
…the more standard naming convention.
| /// Creates an empty set of generics. | ||
| pub fn new() -> Self { | ||
| Self(Box::new([])) | ||
| Self(None) |
There was a problem hiding this comment.
Note that a box doesn't allocate for zero sized allocations. https://godbolt.org/z/oG8vYj14M
There was a problem hiding this comment.
You're right, I didn't know that. But making it an option does still make a difference to binary sizes. Problem is that I haven't got a firm explanation for why.
In x86 release the difference is -137KB, but that doesn't seem to come from type_info functions - there's a ton of small differences across the codebase. WASM release is similar (-57KB).
I tried codegen-units = 1 to make it more consistent. Now the binaries have a much smaller gap (-3KB), but it's clearly in type_info related things (and one Debug::fmt for some reason).
Delta Bytes │ Item
─────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+111 ┊ <&T as core::fmt::Debug>::fmt::h63ae76a0c6ee540b
-51 ┊ bevy_animation::_::<impl bevy_reflect::type_info::Typed for bevy_animation::AnimationEventFn>::type_info::{{closure}}::h12b731417151ee00
-51 ┊ bevy_asset::path::_::<impl bevy_reflect::type_info::Typed for bevy_asset::path::AssetPath>::type_info::{{closure}}::hd80b9ced7d70fa35
-51 ┊ bevy_asset::render_asset::_::<impl bevy_reflect::type_info::Typed for bevy_asset::render_asset::RenderAssetUsages>::type_info::{{closure}}::hcb8bc4662600b515
-51 ┊ bevy_camera::camera::_::<impl bevy_reflect::type_info::Typed for bevy_camera::camera::CameraMainTextureUsages>::type_info::{{closure}}::hb3a19a65e0b77a74
-51 ┊ bevy_camera::camera::_::<impl bevy_reflect::type_info::Typed for bevy_camera::camera::Exposure>::type_info::{{closure}}::h3b5aff6c36ea2d1c
-51 ┊ bevy_ecs::entity::_::<impl bevy_reflect::type_info::Typed for bevy_ecs::entity::Entity>::type_info::{{closure}}::h8ba36f50a2ead030
-51 ┊ bevy_ecs::entity::_::<impl bevy_reflect::type_info::Typed for bevy_ecs::entity::EntityGeneration>::type_info::{{closure}}::he8a1fb98c4a32142
-51 ┊ bevy_ecs::entity::_::<impl bevy_reflect::type_info::Typed for bevy_ecs::entity::EntityIndex>::type_info::{{closure}}::h2e90f066044f2ab0
-51 ┊ bevy_image::image::_::<impl bevy_reflect::type_info::Typed for bevy_image::image::Image>::type_info::{{closure}}::h890c0d08362bae5d
-51 ┊ bevy_render::camera::_::<impl bevy_reflect::type_info::Typed for bevy_render::camera::CameraRenderGraph>::type_info::{{closure}}::hbce80d5fd843e51c
-51 ┊ bevy_render::storage::_::<impl bevy_reflect::type_info::Typed for bevy_render::storage::ShaderBuffer>::type_info::{{closure}}::h237ed12a74b20ba7
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::h0efa23d8c559cc3b
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::h33796cf1654ce1db
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::h41cae767814fb25c
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::hadd00d4cdabd4ae8
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::hc7bf014d11ab9b1a
-37 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::hfb9e3e46da5b9169
-30 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::h1f3bcfa6fdd52b19
-30 ┊ bevy_reflect::utility::GenericTypeCell<T>::get_or_insert_by_type_id::h2e6cfdb8f5594620
-1906 ┊ ... and 116 more.
-2638 ┊ Σ [136 Total Rows]
So my low confidence guess is:
- In release, something is getting duplicated or failing to optimize across code-gen units, so the problem is magnified.
- In release with
codegen-units = 1, the compiler is able to optimize out a small number of drop calls (not allocations) because it works out the option is alwaysNonein some cases.
I'm not sure whether to revert the change to Generics or leave it in. It does make a small difference to release mode, but also complicates the code.
There was a problem hiding this comment.
I'd leave it in, in some form. It helps communicate intent. Not every type has generics. Plus it's a bit less work for the optimizer. Maybe the simpler code is making the inliner a bit more eager.
Objective
Reduce binary sizes by changing the generated code for
Typed::type_info(). Also makes some very minor improvements to runtime memory and compile times.Background
Say I have this reflected struct:
The generated code for
Example::type_infois:So on the first call it creates a bunch of data structures that represent the type (
StructInfo,NamedFieldetc), and later calls just return a reference to that data.The assembly code for
type_infocan be surprisingly large -Example::type_infois 2KB (x86, release). This might not sound like much, but it adds up when there's over a thousand reflected types, and some of them are much more complex - the assembly forbevy_ui::Node::type_infois 26.5KB. In total there's several megabytes depending on build settings (it's hard to be completely accurate due to inlining and the way some things are hidden behind closures).Solution
This PR reduces the size of
type_info. In a release build,Example:type_infogoes from 2KB to 126 bytes, andbevy_ui::Node::type_infofrom 26.5KB to 10.1KB. The binary goes from 79.03MB to 74.38MB (-4.64MB, -5.9%).There's a few different tweaks:
1. Avoid allocations
Several structs contain an
Arc<CustomAttributes>. But most types doesn't have any custom attributes. Switching toOption<Arc<CustomAttributes>>avoids allocating anArcfor the empty case.pub fn new(name: &'static str) -> Self { Self { name, - custom_attributes: Arc::new(CustomAttributes::default()), + custom_attributes: None, #[cfg(feature = "reflect_documentation")] docs: None, }Similarly,
Genericsis aBox<[GenericInfo]>that's usually empty.These two changes should also give a small runtime performance and memory win.
2. Reduce generic code
StructInfo::newand others look roughly like this:Being generic means that the whole function gets duplicated for each unique
T, and often gets inlined too. ButTis only used once byType::of::<T>().This PR splits the function into a small generic stub that calls a larger non-generic with
inline(never).3. Reduce inlining
I've also made a few changes to inlining on
type_info:impl bevy_reflect::Typed for Example { - #[inline] + #[inline(never)] fn type_info() -> &'static bevy_reflect::TypeInfo { static CELL: bevy_reflect::utility::NonGenericTypeInfoCell = bevy_reflect::utility::NonGenericTypeInfoCell::new(); - CELL.get_or_set(|| { + CELL.get_or_set(#[inline(never)] || { ...The first change might be contentious as it makes
get_represented_type_info+20% slower in micro-benchmarks - although 20% slower only means one cycle slower. I'd guess that's worth it to save ~363KB - it might even be a performance increase in real code due to memory latency. But it's debatable - I'd be fine to remove it.The second change prevents inlining on the closure that can only be called once - this gives a ~330KB saving with no performance cost. Although I'm not quite sure why it's a saving. I suspect that non-LTO release builds are ending up with two copies - one regular closure and one inlined copy.
Putting it all together
Here's the effect of each change on a release build of
3d_scene(Win10, x86).Arc<CustomAttributes>to optionGenericsto optionTypeInfovariants to reduce genericstype_infoinner closuretype_infoitselfCompile times maybe went down a second or two (2m21s -> 2m19s), although I'm not sure how much to trust that.
And the before/after on a few different builds:
The optimized build has
codegen-units = 1, lto = "fat", panic = "abort".Can More Be Done?
Yes, I can imagine a bunch more ways to make
type_infosmaller. This PR only does the easy stuff. But all the options I could think of are more complex, and there will be diminishing returns.The dream would be to make
TypeInfoentirelyconstdata. But that's a major overhaul and might mean requiring it to leak memory - that's fine for static reflection, but I don't know if it's also used where leaking would be problematic.A less drastic option would involve changing
type_infoto not useStructInfo/etc directly - instead it would build up a bunch of smaller POD types that don't need much copying and never need to free memory. Then finally it would pass this simpler type to a non-inlined function that actually creates the realTypeInfo. Would that be worth it? It adds a lot of code, and there's a risk that the savings might be just a few hundred KB.Testing