On 32-bit systems, the widest HAL types are 32 bits ('int' and 'float').
Given that operations like ddt quickly begin to show problems due to the
limited precision of float, why doesn't HAL provide a 64-bit floating-point
type ('double')?
Update, 2008-03-26: Clarified that 64-bit stores are guaranteed atomic on
Pentium and later systems, and linked to some fancy inline asm tricks for
this purpose.
HAL works on the principle that updates to all values must be atomic, and
currently all assignments to HAL pins are of the simple form
*pin = newvalue;
*pin is therefore restricted to types for which gcc always generates
atomic store operations.
Unfortunately, when *pin is double or volatile double,
gcc at least sometimes generates sequences like
movl newvalue, %eax
movl newvalue+4, %edx
movl pin, %ecx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
so that even if each individual movl instruction is atomic,
the full store of newvalue can be interrupted after only 4
of the 8 bytes have been changed.
Here's a program that demonstrates this problem:
typedef volatile double hal_double;
double newvalue;
hal_double *pin;
void test(void) { *pin = newvalue; }
which shows the behavior when compiled with 'gcc -O -mtune=i386 -S vd.c
-o -'. Removal of the 'volatile' qualifier from hal_double makes no
difference to the result.
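To make the consequence concrete, here is a small sketch of the value a reader would see if it ran between the two movl stores. The helper names are hypothetical, and the sketch assumes IEEE 754 doubles and that the low 4 bytes are written first, as in the listing above.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back (memcpy avoids aliasing problems). */
static uint64_t double_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double bits_double(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical helper: the value found at *pin after only the first movl,
 * i.e. low 4 bytes already from newvalue, high 4 bytes still from oldvalue. */
static double torn_after_low_half(double oldvalue, double newvalue)
{
    uint64_t lo = double_bits(newvalue) & 0x00000000FFFFFFFFull;
    uint64_t hi = double_bits(oldvalue) & 0xFFFFFFFF00000000ull;
    return bits_double(hi | lo);
}
```

For instance, when a pin goes from 1.5 to 0.1, the intermediate bit pattern is 0x3FF800009999999A, which is neither the old value nor the new one.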
What can be done about it? Personally, I'm adopting a "just wait" policy.
On 64-bit systems, it will simply be possible
to widen the HAL types to 64 bits and rely on the compiler to generate
'movq' instructions.
Other possibilities include:
- Allowing components to disable and re-enable interrupts (cli()/sti() pairs)
around critical regions where multi-word values are read or written. This is
error-prone and probably bad for performance.
- On Pentium and newer systems, 64-bit stores to 64-bit aligned addresses
are guaranteed atomic. Write clever macros which use inline assembly to always
get the desired instruction sequence. Requires sacrificing the '*pin =
newvalue;' assignment syntax.
- Find compiler flags that inhibit the use of two 32-bit stores for
doubles. -mtune=i686 and higher seem to do this.
- Use the "double read" method, which reads twice and compares the
results, until the two reads come up identical. Under a few simple assumptions
(the biggest of which is that the writer is not on a slower thread than the
reader--this means userspace components cannot be 'double' writers to realtime
components) this works, and with high performance. Requires sacrificing the
'*pin = newvalue;' assignment syntax. This trick is used now for some 64-bit
values internal to stepgen.
- One of the above, but with a C++ class wrapper which overloads
operator=. This is trouble because EMC can currently only do C, not C++,
in kernel modules.
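The "double read" method from the list above could look roughly like this. It is only a sketch, not the stepgen code: the split_double type and both function names are hypothetical, and it relies on the stated assumption that a writer which preempts the reader runs to completion before the reader resumes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit slot kept as two 32-bit halves; each half is
 * read and written with a single 32-bit access. */
typedef struct {
    volatile uint32_t lo;
    volatile uint32_t hi;
} split_double;

/* Writer: assumed to run in the faster thread, so the reader never
 * sees it paused halfway through and then continues reading. */
static void split_double_write(split_double *p, double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    p->lo = (uint32_t)bits;
    p->hi = (uint32_t)(bits >> 32);
}

/* Reader: take two complete snapshots and retry until they agree. */
static double split_double_read(const split_double *p)
{
    uint32_t lo1, hi1, lo2, hi2;
    uint64_t bits;
    double value;

    do {
        lo1 = p->lo; hi1 = p->hi;
        lo2 = p->lo; hi2 = p->hi;
    } while (lo1 != lo2 || hi1 != hi2);

    bits = ((uint64_t)hi1 << 32) | lo1;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

If the writer interrupts the reader in the middle of either snapshot, the two snapshots differ and the loop simply runs again. The cost is that every access goes through split_double_write() and split_double_read() instead of a plain assignment.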
atomic64.h uses inline assembly to perform guaranteed-atomic
reads and stores of 64-bit values, and provides C++ wrapper classes that
make these types behave just like built-in types in arithmetic and assignment.
'double', 'unsigned long long' and 'long long' are supported.
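In plain C there is no operator overloading, so the same idea needs accessor functions in place of '*pin = newvalue;'. The sketch below is not taken from atomic64.h: it swaps the hand-written inline assembly for GCC's __atomic_store_n and __atomic_load_n builtins (available in newer GCC releases, 4.7 and later), and the hal_double64 names are hypothetical. Whether the compiler emits a single 64-bit instruction on a given 32-bit target still depends on the -march setting, and it may fall back to a library call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical pin type: the double is kept as a 64-bit integer pattern,
 * aligned to 8 bytes so that one 64-bit access can cover it. */
typedef struct {
    uint64_t bits __attribute__((aligned(8)));
} hal_double64;

/* Store the whole 64-bit pattern in one atomic operation. */
static inline void hal_double64_set(hal_double64 *pin, double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    __atomic_store_n(&pin->bits, bits, __ATOMIC_RELAXED);
}

/* Load the whole 64-bit pattern in one atomic operation. */
static inline double hal_double64_get(hal_double64 *pin)
{
    uint64_t bits = __atomic_load_n(&pin->bits, __ATOMIC_RELAXED);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

A component would then write hal_double64_set(pin, newvalue); where it previously wrote *pin = newvalue;.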
(originally posted on the AXIS blog)