//! This crate provides the [`multiversion`] attribute for implementing function multiversioning.
//!
//! Many CPU architectures have a variety of instruction set extensions that provide additional
//! functionality. Common examples are single instruction, multiple data (SIMD) extensions such as
//! SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can
//! provide significant speed improvements to some functions. These optional features cannot be
//! haphazardly compiled into programs, because executing an unsupported instruction will result
//! in a crash.
//!
//! **Function multiversioning** is the practice of compiling multiple versions of a function
//! with various features enabled and safely detecting which version to use at runtime.
//!
//! # Cargo features
//! There is one cargo feature, `std`, enabled by default. When enabled, [`multiversion`] will
//! use CPU feature detection at runtime to dispatch the appropriate function. With this feature
//! disabled, only compile-time function dispatch using `#[cfg(target_feature)]` is available,
//! and the crate can be used in `#![no_std]` crates.
//!
//! # Capabilities
//! The intention of this crate is to allow nearly any function to be multiversioned.
//! The following cases are not supported:
//! * functions that use `self` or `Self`
//! * `impl Trait` return types (arguments are fine)
//!
//! If any other function does not work, please file an issue on GitHub.
//!
//! # Target specification strings
//! Targets are specified as a combination of architecture (as specified in [`target_arch`]) and
//! feature (as specified in [`target_feature`]).
//!
//! A target can be specified as:
//! * `"arch"`
//! * `"arch+feature"`
//! * `"arch+feature1+feature2"`
//!
//! A particular CPU can also be specified with a slash:
//! * `"arch/cpu"`
//! * `"arch/cpu+feature"`
//!
//! The following are some valid target specification strings:
//! * `"x86"` (matches the `"x86"` architecture)
//! * `"x86_64+avx+avx2"` (matches the `"x86_64"` architecture with the `"avx"` and `"avx2"`
//!   features)
//! * `"x86_64/x86-64-v2"` (matches the `"x86_64"` architecture with the `"x86-64-v2"` CPU)
//! * `"x86/i686+avx"` (matches the `"x86"` architecture with the `"i686"` CPU and `"avx"`
//!   feature)
//! * `"arm+neon"` (matches the `"arm"` architecture with the `"neon"` feature)
//!
//! A complete list of available target features and CPUs is available in the [`target-features`
//! crate documentation](target_features::docs).
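//!
//! To make the grammar above concrete, the following is a minimal sketch of how such a string
//! splits into architecture, optional CPU, and features. `TargetSpec` and `parse_target` are
//! hypothetical names used only for illustration; they are not part of this crate, and the
//! crate's own parsing may differ (for example, this sketch performs no validation).
//!
//! ```rust
//! // Hypothetical decomposition of a target specification string.
//! #[derive(Debug, PartialEq)]
//! struct TargetSpec<'a> {
//!     arch: &'a str,
//!     cpu: Option<&'a str>,
//!     features: Vec<&'a str>,
//! }
//!
//! // Splits "arch[/cpu][+feature...]" into its parts without validating them.
//! fn parse_target(spec: &str) -> TargetSpec<'_> {
//!     // Everything before the first '+' is "arch" or "arch/cpu".
//!     let mut parts = spec.split('+');
//!     let head = parts.next().unwrap_or("");
//!     let (arch, cpu) = match head.split_once('/') {
//!         Some((arch, cpu)) => (arch, Some(cpu)),
//!         None => (head, None),
//!     };
//!     // Each remaining '+'-separated piece is one feature name.
//!     TargetSpec { arch, cpu, features: parts.collect() }
//! }
//! ```
//!
//! Under these assumptions, `parse_target("x86/i686+avx")` yields the architecture `"x86"`, the
//! CPU `"i686"`, and the single feature `"avx"`.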
//!
//! [`target`]: attr.target.html
//! [`multiversion`]: attr.multiversion.html
//! [`target_arch`]: https://doc.rust-lang.org/reference/conditional-compilation.html#target_arch
//! [`target_feature`]: https://doc.rust-lang.org/reference/conditional-compilation.html#target_feature
/// Provides function multiversioning.
///
/// The annotated function is compiled multiple times, once for each target, and the
/// best target is selected at runtime.
///
/// Options:
/// * `targets`
///   * Takes a list of targets, such as `targets("x86_64+avx2", "x86_64+sse4.1")`.
///   * Target priority is first to last. The first matching target is used.
///   * May also take a special value `targets = "simd"` to automatically multiversion for common
///     SIMD target features.
/// * `attrs`
///   * Takes a list of attributes to attach to each target clone function.
/// * `dispatcher`
///   * Selects the preferred dispatcher. Defaults to `default`.
///     * `default`: If the `std` feature is enabled, uses either `direct` or `indirect`,
///       whichever is expected to be faster. If the `std` feature is not enabled, uses `static`.
///     * `static`: Detects features at compile time from the enabled target features.
///     * `indirect`: Detects features at runtime, and dispatches with an indirect function call.
///       Cannot be used for generic functions, `async` functions, or functions that take or
///       return an `impl Trait`. This is usually the default.
///     * `direct`: Detects features at runtime, and dispatches with direct function calls. This
///       is the default on functions that do not support indirect dispatch, or in the presence of
///       indirect branch exploit mitigations such as retpolines.
///
/// # Example
/// The `square` function below is a good candidate for optimization using SIMD.
/// The attribute compiles `square` three times: once for each of the two listed targets and once
/// for the generic target. Calling `square` selects the appropriate version at runtime.
///
/// ```
/// use multiversion::multiversion;
///
/// #[multiversion(targets("x86_64+avx", "x86+sse"))]
/// fn square(x: &mut [f32]) {
///     for v in x {
///         *v *= *v
///     }
/// }
/// ```
///
/// This example is similar, but targets all supported SIMD instruction sets (not just the two shown above):
///
/// ```
/// use multiversion::multiversion;
///
/// #[multiversion(targets = "simd")]
/// fn square(x: &mut [f32]) {
///     for v in x {
///         *v *= *v
///     }
/// }
/// ```
///
/// # Notes on dispatcher performance
///
/// ### Feature detection is performed only once
/// The `direct` and `indirect` dispatchers perform function selection on the first invocation.
/// This is implemented with a static atomic variable containing the selected function.
///
/// This implementation has a few benefits:
/// * The function selector is typically only invoked once. Subsequent calls are reduced to an
/// atomic load.
/// * If called from multiple threads, there is no contention. Several threads may each perform
///   feature detection, but the atomic ensures these are synchronized correctly.
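///
/// The idea can be sketched with the standard library alone. This is an illustration of the
/// technique under stated assumptions, not the code this macro generates: the names `SELECTED`,
/// `select`, `square_generic`, `square_avx`, and `square_avx_checked` are hypothetical, and only
/// a single `x86_64+avx` target is considered.
///
/// ```rust
/// use std::sync::atomic::{AtomicPtr, Ordering};
///
/// type SquareFn = fn(&mut [f32]);
///
/// // Generic fallback, valid on every CPU.
/// fn square_generic(x: &mut [f32]) {
///     for v in x {
///         *v *= *v;
///     }
/// }
///
/// // Clone compiled with AVX enabled (hypothetical single target).
/// #[cfg(target_arch = "x86_64")]
/// #[target_feature(enable = "avx")]
/// unsafe fn square_avx(x: &mut [f32]) {
///     for v in x {
///         *v *= *v;
///     }
/// }
///
/// // Safe wrapper so the AVX clone can be stored as a plain `fn` pointer.
/// #[cfg(target_arch = "x86_64")]
/// fn square_avx_checked(x: &mut [f32]) {
///     // SAFETY: `select` only returns this wrapper after detecting AVX at runtime.
///     unsafe { square_avx(x) }
/// }
///
/// // Runtime feature detection: returns the first matching version.
/// fn select() -> SquareFn {
///     #[cfg(target_arch = "x86_64")]
///     {
///         if std::arch::is_x86_feature_detected!("avx") {
///             return square_avx_checked;
///         }
///     }
///     square_generic
/// }
///
/// // Null means "not selected yet"; otherwise holds the chosen function.
/// static SELECTED: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());
///
/// fn square(x: &mut [f32]) {
///     let mut ptr = SELECTED.load(Ordering::Relaxed);
///     if ptr.is_null() {
///         // Racing threads may all run `select`, but they store the same value.
///         ptr = select() as *const () as *mut ();
///         SELECTED.store(ptr, Ordering::Relaxed);
///     }
///     // SAFETY: a non-null `ptr` was always produced from a `SquareFn` above.
///     let f: SquareFn = unsafe { std::mem::transmute::<*mut (), SquareFn>(ptr) };
///     f(x)
/// }
/// ```
///
/// In this sketch the first call to `square` pays for detection, and every later call is an
/// atomic load followed by an indirect call through the stored pointer.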
///
/// ### Dispatcher elision
/// If the optimal set of features is already known to exist at compile time, the entire dispatcher
/// is elided. For example, if the highest priority target requires `avx512f` and the function is
/// compiled with `RUSTFLAGS=-Ctarget-cpu=skylake-avx512`, the function is not multiversioned and
/// the highest priority target is used.
///
/// [`target`]: attr.target.html
/// [`multiversion`]: attr.multiversion.html
pub use multiversion;
/// Provides a less verbose equivalent to the `cfg(target_arch)` and `target_feature` attributes.
///
/// A function tagged with `#[target("x86_64+avx+avx2")]`, for example, is equivalent to a
/// function tagged with each of:
/// * `#[cfg(target_arch = "x86_64")]`
/// * `#[target_feature(enable = "avx")]`
/// * `#[target_feature(enable = "avx2")]`
///
/// The [`target`] attribute is intended to be used in tandem with the [`multiversion`] attribute
/// to produce hand-written multiversioned functions.
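///
/// For comparison, the sketch below spells out the equivalent attributes listed above by hand,
/// using only the standard library. The names `sum_avx2` and `sum` are hypothetical, and the
/// manual runtime check stands in for whatever dispatch the caller uses.
///
/// ```rust
/// // What `#[target("x86_64+avx+avx2")]` stands for, written out attribute by attribute.
/// #[cfg(target_arch = "x86_64")]
/// #[target_feature(enable = "avx")]
/// #[target_feature(enable = "avx2")]
/// unsafe fn sum_avx2(x: &[f32]) -> f32 {
///     x.iter().sum()
/// }
///
/// // Hand-written dispatch: use the AVX2 version only when the CPU supports it.
/// fn sum(x: &[f32]) -> f32 {
///     #[cfg(target_arch = "x86_64")]
///     {
///         if std::arch::is_x86_feature_detected!("avx")
///             && std::arch::is_x86_feature_detected!("avx2")
///         {
///             // SAFETY: both required features were detected at runtime just above.
///             return unsafe { sum_avx2(x) };
///         }
///     }
///     // Generic fallback for other architectures or older CPUs.
///     x.iter().sum()
/// }
/// ```
///
/// On architectures other than `x86_64`, the `cfg` attributes remove the specialized function
/// and the runtime check entirely, leaving only the generic path.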
///
/// [`target`]: attr.target.html
/// [`multiversion`]: attr.multiversion.html
pub use target;
/// Inherit the `target_feature` attributes of the selected target in a multiversioned function.
///
/// # Example
/// ```
/// use multiversion::{multiversion, inherit_target};
///
/// #[multiversion(targets = "simd")]
/// fn select_sum() -> unsafe fn(x: &mut [f32]) -> f32 {
///     // `sum` is compiled with the target features of the selected version of `select_sum`.
///     #[inherit_target]
///     unsafe fn sum(x: &mut [f32]) -> f32 {
///         x.iter().sum()
///     }
///     sum as unsafe fn(&mut [f32]) -> f32
/// }
/// ```
pub use inherit_target;
/// Information related to the current target.
pub use target_features;