After reading the source to current versions of gcc, and thinking about the motivations for writing "two movl" sequences to load or store doubles on x86, I have a high level of confidence that the guarantee on Pentiums for 8-byte-aligned doubles is enough to give 'double' pins the same atomicity guarantees as other types, as long as a few assumptions are made about the compiler. I've produced a patch and show one way in which this is a big improvement.
There are many other places in emc where more precision would be nice. The same precision problems exist, for instance, when determining following error on axis positions which reach into the 10000s or 100000s---think rotary axes which almost always turn in one direction, frequently seen when using cnc to create gears.
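To make the following-error case concrete, here is a minimal sketch in C. The function names and the positions used are hypothetical and are not taken from emc; the point is only that near 100000 a single-precision value has a resolution of about 0.0078, so a small error vanishes entirely.

```c
/* Following error computed in single precision (hypothetical helper). */
static float ferror_float(float commanded, float feedback)
{
    return commanded - feedback;
}

/* The same computation in double precision (hypothetical helper). */
static double ferror_double(double commanded, double feedback)
{
    return commanded - feedback;
}
```

With a commanded position of 100000.001 and feedback of 100000.0, the single-precision version returns exactly zero, because 100000.001 is not representable as a float and rounds to 100000.0. The double-precision version returns a value very close to 0.001.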
Pins are written with plain assignments such as *pin = newvalue;. The type of *pin is therefore restricted to types for which gcc always generates atomic store operations.
On Pentium systems, 8-byte-aligned 8-byte loads and stores (such as with the fld and fst instructions) give the same guarantees as 4-byte-aligned 4-byte loads and stores (such as movl).
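The guarantee only applies when the double really is 8-byte aligned, and on 32-bit x86 gcc by default aligns a double inside a struct to only 4 bytes. A minimal sketch of forcing and checking the alignment follows; struct demo_pins and is_8byte_aligned are hypothetical names and do not reflect the real HAL memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block of pin storage; not the real HAL layout. */
struct demo_pins {
    int32_t count;
    /* gcc attribute: force 8-byte alignment even on 32-bit x86. */
    double position __attribute__((aligned(8)));
};

/* Compile-time check that the member offset is a multiple of 8. */
_Static_assert(offsetof(struct demo_pins, position) % 8 == 0,
               "position must be 8-byte aligned within the struct");

/* Run-time check: returns 1 if p sits on an 8-byte boundary. */
static int is_8byte_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}
```

The compile-time check covers the offset within the struct; the run-time check is still worthwhile for storage handed out by an allocator, since the base address has to be aligned as well.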
Historically, two main problems that I am aware of prevented us from taking advantage of this property. First, when hal was conceived, it was pretty reasonable to imagine it running on pre-Pentium systems where there is no atomicity guarantee for fld/fst. Second, it was poorly understood when gcc would generate fld/fst, and when it might produce some other sequence such as the "two movl" sequence.
In 2008, we are promoting Pentium-III PCs as the minimum for emc (in part because this performance level is needed for gnome and for emc's modern GUIs, and in part because almost everyone is using systems with this performance level anyway). Since 2006 or so we have required that systems include the 'rdtsc' instruction, and have never heard from anyone who couldn't use emc because of this detail. We can depend on Pentium-class CPUs.
I have also recently spent time looking at what current gcc does to load and store floats. While I can't say with certainty what it does in all cases (and I certainly can't say for sure what future versions of gcc will do), I believe I understand where and why it chooses to use "two movl" sequences---and, more importantly, I am pretty sure I know how to prevent it from doing so and get fld/fst instead in all the cases I looked at.
Basically, gcc in any optimizing mode will attempt to generate an instruction sequence that is fairly efficient. This leads me to guess that if any arithmetic is done on a floating-point value, it will be loaded from memory using fld, and the result will be stored with fst. The alternative would be to store it in a temporary location, then move it from the temporary location to the final location with the "two movl" sequence, which is so obviously suboptimal that I just can't believe gcc ever does it.
That leaves two basic cases to look at: direct memory-memory assignments (like the *pin = newvalue; example) and constant memory assignments (*pin = 1.0).
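A small test file along these lines can be used to look at what gcc emits for each case. This is only a sketch: the function names are hypothetical, and the -m32 flag is an assumption for building 32-bit code on a 64-bit host.

```c
/*
 * Inspect the generated assembly with, for example:
 *   gcc -m32 -O2 -S cases.c
 *   gcc -m32 -O2 -mtune=pentium4 -S cases.c
 *   gcc -m32 -Os -S cases.c
 * and compare the store sequences in the resulting cases.s.
 */

/* Case 1: direct memory-to-memory assignment. */
void store_copy(double *pin, const double *newvalue)
{
    *pin = *newvalue;
}

/* Case 2: constant assignment. */
void store_const(double *pin)
{
    *pin = 1.0;
}

/* For contrast: arithmetic is done on the value before it is stored. */
void store_scaled(double *pin, const double *value, double gain)
{
    *pin = *value * gain;
}
```

The third function is there only as a baseline for the arithmetic case; the first two are the ones where a pair of 4-byte moves is a plausible choice for the compiler.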
(Actually, a third one occurs to me: you might think to write *pin = -newvalue; as an integer operation, since it is merely inverting the top bit; a similar argument can be made for abs. I can't find any sign that gcc actually does this, though.)
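For illustration, this is what negation and abs look like when written as integer operations on the bit pattern. It is a sketch that assumes IEEE 754 double-precision layout; the function names are hypothetical, and nothing here claims that gcc generates code like this.

```c
#include <stdint.h>
#include <string.h>

/* Negate a double by flipping only its sign bit (bit 63). */
static double negate_by_bit_flip(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    bits ^= UINT64_C(1) << 63;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Absolute value by clearing the sign bit. */
static double abs_by_bit_clear(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(UINT64_C(1) << 63);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

On 32-bit x86 only the upper 4-byte word would have to change, which is why a compiler could in principle be tempted to handle the two halves separately.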
;; Moves usually have one cycle penalty, but there are exceptions.
(define_insn_reservation "pent_fmov" 1
In the gcc i386.md file, there are several alternatives for "movdf" (move double-precision float). One of them is called "movdf_integer", and it is deactivated by (among other conditions) (optimize_size || !TARGET_INTEGER_DFMODE_MOVES). This ties in exactly with the methods I found for avoiding the "two movl" sequence: either specify a modern architecture with -mtune=pentium4, or specify space optimization with -Os. (fld/fst is certainly a shorter sequence than movl/movl/movl/movl!)
The patch required a fair number of other changes: in some places, temporaries of type "float" were deliberately used. In other places, the assumption was made that sizeof(hal_float) == sizeof(hal_s32). Overall, I was a little surprised at the size of the patch. However, the testsuite works, emc works, and halscope works. I haven't actually run any torture tests designed to determine whether these stores really are atomic, though.
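The size assumption is the kind of thing that can be made explicit rather than implicit. The sketch below uses hypothetical stand-in types (demo_s32, demo_float, union demo_sample) and is not the code from the patch; it only shows one way to size storage from the types instead of assuming that every pin value is 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t demo_s32;    /* hypothetical stand-in for a 32-bit pin type */
typedef double  demo_float;  /* hypothetical stand-in for a float pin widened to double */

/* One sample slot large enough to hold a value of either type. */
union demo_sample {
    demo_s32   s;
    demo_float f;
};

/* Fail the build if the slot could not hold the widest type. */
_Static_assert(sizeof(union demo_sample) >= sizeof(demo_float),
               "sample slot too small for a double");

/* Ask for the size per type instead of hard-coding 4. */
static size_t demo_type_size(int is_float)
{
    return is_float ? sizeof(demo_float) : sizeof(demo_s32);
}
```

Code that copies pin values by length would then call something like demo_type_size() rather than relying on the two sizes being equal.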
I'd be thrilled if this could get into emc 2.3, but for now I'm going to let it live as a patch.
$ diffstat doubles.patch
 Makefile                    |    2 +-
 hal/components/ddt.comp     |    4 ++--
 hal/components/pid.c        |    8 ++++----
 hal/components/streamer.h   |    4 ++--
 hal/hal.h                   |    4 +++-
 hal/hal_lib.c               |   30 +++++++++++++++++++++++-------
 hal/hal_priv.h              |   12 +++++++++++-
 hal/utils/halcmd_commands.c |    2 +-
 hal/utils/scope_disp.c      |    4 ++--
 hal/utils/scope_files.c     |    2 +-
 hal/utils/scope_rt.c        |   22 ++++++++++++++--------
 hal/utils/scope_shm.h       |    5 +++--
 hal/utils/scope_trig.c      |    4 ++--
 13 files changed, 69 insertions(+), 34 deletions(-)
Files currently attached to this page: doubles.patch (14.5kB)
(originally posted on the AXIS blog)