//! This crate provides the [`target_clones`] attribute and [`multiversion!`] macro for
//! implementing function multiversioning.
//!
//! ```toml
//! [dependencies]
//! multiversion = "0.1"
//! ```
//!
//! Many CPU architectures have a variety of instruction set extensions that provide additional
//! functionality. Common examples are single instruction, multiple data (SIMD) extensions such as
//! SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can
//! provide significant speed improvements to some functions. These optional features cannot be
//! haphazardly compiled into programs: executing an unsupported instruction will result in a
//! crash. Function multiversioning is the practice of compiling multiple versions of a function
//! with various features enabled and safely detecting which version to use at runtime.
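//!
//! As a rough illustration of what this crate automates, the sketch below performs the same kind
//! of dispatch by hand with the standard library's `is_x86_feature_detected!` macro. The names
//! `square_by_hand`, `square_avx` and `square_generic` are hypothetical and exist only for this
//! example; this is not the code the crate generates.
//! ```rust
//! // Compiled with AVX enabled, so it may only be called after AVX has been detected.
//! #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//! #[target_feature(enable = "avx")]
//! unsafe fn square_avx(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! // Fallback version that requires no optional CPU features.
//! fn square_generic(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! // Safe entry point: detects the feature at runtime, then picks a version.
//! fn square_by_hand(x: &mut [f32]) {
//!     #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//!     {
//!         if is_x86_feature_detected!("avx") {
//!             // Sound only because AVX support was checked just above.
//!             return unsafe { square_avx(x) };
//!         }
//!     }
//!     square_generic(x)
//! }
//! ```
//! Calling `square_by_hand` on a slice squares each element in place regardless of which version
//! runs. Note that this sketch repeats the feature check on every call.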
//!
//! # Target specification strings
//! Targets for both the [`target_clones`] attribute and the [`multiversion!`] macro are specified
//! as a combination of architecture (as specified in the `target_arch` attribute) and feature (as
//! specified in the `target_feature` attribute). A single architecture can be specified as:
//! * `"arch"`
//! * `"arch+feature"`
//! * `"arch+feature1+feature2"`
//!
//! while multiple architectures can be specified as:
//! * `"[arch1|arch2]"`
//! * `"[arch1|arch2]+feature"`
//! * `"[arch1|arch2]+feature1+feature2"`
//!
//! The following are all valid target specification strings:
//! * `"x86"` (matches the `"x86"` architecture)
//! * `"x86_64+avx+avx2"` (matches the `"x86_64"` architecture with the `"avx"` and `"avx2"`
//! features)
//! * `"[mips|mips64|powerpc|powerpc64]"` (matches any of the `"mips"`, `"mips64"`, `"powerpc"` or
//! `"powerpc64"` architectures)
//! * `"[arm|aarch64]+neon"` (matches either the `"arm"` or `"aarch64"` architectures with the
//! `"neon"` feature)
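//!
//! To make the grammar above concrete, here is a minimal sketch of splitting a specification
//! string into architectures and features. The `Target` type and `parse_target` function are
//! assumptions made for illustration and are not this crate's actual parser.
//! ```rust
//! // Hypothetical parsed form of a target specification string.
//! #[derive(Debug, PartialEq)]
//! struct Target {
//!     architectures: Vec<String>,
//!     features: Vec<String>,
//! }
//!
//! // Splits `"[arch1|arch2]+feature1+feature2"` into its parts (illustrative only).
//! fn parse_target(spec: &str) -> Option<Target> {
//!     // Everything before the first `+` names the architecture(s).
//!     let mut parts = spec.split('+');
//!     let arch = parts.next()?;
//!     let architectures: Vec<String> = if arch.starts_with('[') && arch.ends_with(']') {
//!         // Bracketed form: strip the brackets and split the alternatives on `|`.
//!         arch[1..arch.len() - 1].split('|').map(String::from).collect()
//!     } else {
//!         vec![arch.to_string()]
//!     };
//!     // Every remaining `+`-separated item is a feature name.
//!     let features: Vec<String> = parts.map(String::from).collect();
//!     // Reject empty names, as in `"x86+"` or `"[]"`.
//!     if architectures.iter().chain(features.iter()).any(|s| s.is_empty()) {
//!         return None;
//!     }
//!     Some(Target { architectures, features })
//! }
//! ```
//! Under these assumptions, `"[arm|aarch64]+neon"` yields the architectures `arm` and `aarch64`
//! with the single feature `neon`, while `"x86"` yields one architecture and no features.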
//!
//! # Example
//! The following example is a good candidate for optimization with SIMD. The function `square`
//! optionally uses the AVX instruction set extension on x86 or x86-64. The SSE instruction set
//! extension is part of x86-64, but is optional on x86, so `square` also detects SSE on x86.
//! The [`target_clones`] attribute implements this automatically.
//!
//! This works by compiling multiple *clones* of the function with various features enabled and
//! detecting which to use at runtime. If none of the targets match the current CPU (e.g. an older
//! x86-64 CPU, or another architecture such as ARM), a clone without any features enabled is used.
//! ```
//! use multiversion::target_clones;
//!
//! #[target_clones("[x86|x86_64]+avx", "x86+sse")]
//! fn square(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//! ```
//!
//! The following produces a nearly identical function, but instead of cloning the function, the
//! implementations are manually specified. This is typically more useful when the implementations
//! aren't identical, such as when using explicit SIMD instructions instead of relying on compiler
//! optimizations. The multiversioned function is generated by the [`multiversion!`] macro.
//! ```
//! use multiversion::multiversion;
//!
//! multiversion!{
//!     fn square(x: &mut [f32])
//!     "[x86|x86_64]+avx" => square_avx,
//!     "x86+sse" => square_sse,
//!     default => square_generic,
//! }
//!
//! #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//! #[target_feature(enable = "avx")]
//! unsafe fn square_avx(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! #[cfg(target_arch = "x86")]
//! #[target_feature(enable = "sse")]
//! unsafe fn square_sse(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! fn square_generic(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! # fn main() {}
//! ```
//!
//! # Implementation details
//! The function version dispatcher consists of a function selector and an atomic function pointer.
//! On the first invocation of a multiversioned function, the dispatcher loads the atomic and,
//! since it is null, invokes the function selector. The selected function is stored in the atomic,
//! then invoked. On subsequent calls, the atomic is not null and the stored function is invoked.
//!
//! Some comments on the benefits of this implementation:
//! * The function selector is only invoked once. Subsequent calls are reduced to an atomic load,
//! branch, and indirect function call.
//! * If called in multiple threads, there is no contention. It is possible for two threads to hit
//! the same function before function selection has completed, which results in each thread
//! invoking the function selector, but the atomic ensures that these are synchronized correctly.
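//!
//! The dispatch scheme described above can be sketched roughly as follows. This is a simplified
//! illustration rather than the code the macros generate: `SQUARE_IMPL`, `SquareFn`,
//! `select_square` and the use of `AtomicPtr` with relaxed ordering are all assumptions, and the
//! selector is a stand-in that always returns the generic version.
//! ```rust
//! use std::sync::atomic::{AtomicPtr, Ordering};
//!
//! type SquareFn = fn(&mut [f32]);
//!
//! // The atomic function pointer: null until the selector has run once.
//! static SQUARE_IMPL: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());
//!
//! fn square_generic(x: &mut [f32]) {
//!     for v in x {
//!         *v *= *v;
//!     }
//! }
//!
//! // Stand-in function selector; real code would perform CPU feature detection here.
//! fn select_square() -> SquareFn {
//!     square_generic
//! }
//!
//! fn square(x: &mut [f32]) {
//!     let mut ptr = SQUARE_IMPL.load(Ordering::Relaxed);
//!     if ptr.is_null() {
//!         // First call, or a racing first call on another thread: select and cache.
//!         // Racing threads store the same value, so the duplicate work is harmless.
//!         ptr = select_square() as *const () as *mut ();
//!         SQUARE_IMPL.store(ptr, Ordering::Relaxed);
//!     }
//!     // Sound because only `SquareFn` values are ever stored in `SQUARE_IMPL`.
//!     let f: SquareFn = unsafe { std::mem::transmute::<*mut (), SquareFn>(ptr) };
//!     f(x)
//! }
//! ```
//! After the first call to `square`, `SQUARE_IMPL` is no longer null, and later calls skip the
//! selector entirely.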
//!
//! [`target_clones`]: attr.target_clones.html
//! [`multiversion!`]: macro.multiversion.html
extern crate proc_macro;
use ToTokens;
use ;
/// Provides function multiversioning by explicitly specifying function versions.
///
/// Functions are selected in order, calling the first matching target. The final function must
/// have the `default` target, which indicates that this function does not require any special features.
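///
/// The "first matching target wins" rule can be pictured with the small sketch below. The helper
/// `select_version` and its boolean "supported" flags are hypothetical stand-ins for runtime
/// feature detection and are not part of this crate.
/// ```rust
/// // `candidates` pairs a "the CPU supports this target" flag with a version name, listed in
/// // the same order as the macro arms. `default` is used when no candidate is supported.
/// fn select_version<'a>(candidates: &[(bool, &'a str)], default: &'a str) -> &'a str {
///     candidates
///         .iter()
///         .find(|(supported, _)| *supported)
///         .map(|(_, name)| *name)
///         .unwrap_or(default)
/// }
/// ```
/// With both targets supported, the earlier arm is chosen; with none supported, the `default`
/// version is chosen.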
///
/// # Safety
/// Functions compiled with the `target_feature` attribute must be marked unsafe, since calling
/// them on an unsupported CPU results in a crash. The `multiversion!` macro will produce a safe
/// function that calls `unsafe` function versions, and the safety contract is fulfilled as long as
/// your specified targets are correct. If your function versions are `unsafe` for any other
/// reason, you must remember to mark your generated function `unsafe` as well.
///
/// # Examples
/// ## A simple feature-specific function
/// This example creates a function `where_am_i` that prints the detected CPU feature.
/// ```
/// use multiversion::multiversion;
///
/// multiversion!{
///     fn where_am_i()
///     "[x86|x86_64]+avx" => where_am_i_avx,
///     "x86+sse" => where_am_i_sse,
///     "[arm|aarch64]+neon" => where_am_i_neon,
///     default => where_am_i_generic,
/// }
///
/// fn where_am_i_avx() {
///     println!("avx");
/// }
///
/// fn where_am_i_sse() {
///     println!("sse");
/// }
///
/// fn where_am_i_neon() {
///     println!("neon");
/// }
///
/// fn where_am_i_generic() {
///     println!("generic");
/// }
///
/// # fn main() {}
/// ```
///
/// ## Making `target_feature` functions safe
/// This example is the same as the above example, but calls `unsafe` specialized functions. Note
/// that the `where_am_i` function is still safe, since we know we are only calling specialized
/// functions on supported CPUs.
/// ```
/// use multiversion::multiversion;
///
/// multiversion!{
///     fn where_am_i()
///     "[x86|x86_64]+avx" => where_am_i_avx,
///     "x86+sse" => where_am_i_sse,
///     "[arm|aarch64]+neon" => where_am_i_neon,
///     default => where_am_i_generic,
/// }
///
/// #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
/// #[target_feature(enable = "avx")]
/// unsafe fn where_am_i_avx() {
///     println!("avx");
/// }
///
/// #[cfg(target_arch = "x86")]
/// #[target_feature(enable = "sse")]
/// unsafe fn where_am_i_sse() {
///     println!("sse");
/// }
///
/// #[cfg(any(target_arch = "arm", target_arch = "aarch64"))]
/// #[target_feature(enable = "neon")]
/// unsafe fn where_am_i_neon() {
///     println!("neon");
/// }
///
/// fn where_am_i_generic() {
///     println!("generic");
/// }
///
/// # fn main() {}
/// ```
/// Provides automatic function multiversioning by compiling *clones* of the function for each
/// target.
///
/// The proper function clone is invoked depending on runtime CPU feature detection. Priority is
/// evaluated left-to-right, selecting the first matching target. If no matching target is found,
/// a clone with no required features is called.
///
/// # Example
/// The function `square` runs with AVX or SSE compiler optimizations when detected on the CPU at
/// runtime.
/// ```
/// use multiversion::target_clones;
///
/// #[target_clones("[x86|x86_64]+avx", "x86+sse")]
/// fn square(x: &mut [f32]) {
///     for v in x {
///         *v *= *v;
///     }
/// }
/// ```