Code is x86, 32-bit:

    int test (int x)
    {
      int y;
      // do a bit-rotate by 8 on the lower word. Leave upper word intact.
      asm ("rorw $8, %0\n\t" : "=q" (y) : "0" (x));
      return y;
    }

If I compile it, I get the following (very valid) warning:

    Warning: using `%ax' instead of `%eax' due to `w' suffix

What I'm looking for is a way to tell the compiler/assembler that I want to access the lower 16-bit sub-register of %0.
Accessing the byte sub-registers (in this case AL and AH) would be nice to know as well. I've already chosen the "q" constraint, so the compiler is forced to use EAX, EBX, ECX or EDX. That way I've made sure the compiler has to pick a register that has sub-registers.
I know that I can force the asm code to use a specific register (and its sub-registers), but I want to leave the register-allocation job up to the compiler.

Tags: gcc, assembly, x86, gas. Asked Sep 23 '08 at 1:59 by Nils Pipenbrinck; edited Mar 27 '11 at 19:59 by ire_and_curses.
You can use %w0 if I remember right. I just tested it, too. :-)

    int test(int x)
    {
      int y;
      asm ("rorw $8, %w0" : "=q" (y) : "0" (x));
      return y;
    }

Edit: In response to the OP, yes, you can do the following too:

    int test(int x)
    {
      int y;
      asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
      return y;
    }

At present, the only place (that I know of) where this is documented is gcc/config/i386/i386.md, not any of the standard documentation.
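Putting both tricks from the answer side by side, here is a hedged sketch that uses the read-write "+" constraint (equivalent to the "=q"/"0" output-plus-matching-input pair above) and falls back to plain C on non-x86 targets so the file still builds elsewhere. The function names are made up for illustration; the w, b and h operand modifiers are the ones from the answer.

```c
#include <stdint.h>

/* Swap bytes 0 and 1 of x (rotate the low word by 8); bits 16..31 are unchanged. */
static inline uint32_t rotate_low_word_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %w0 prints the 16-bit name (e.g. %ax) of whatever register GCC picked
       for operand 0, so gas no longer warns about the `w' suffix. */
    __asm__("rorw $8, %w0" : "+q"(x) : : "cc");
#else
    /* Portable fallback for non-x86 targets or non-GNU compilers. */
    x = (x & 0xffff0000u) | ((x >> 8) & 0xffu) | ((x & 0xffu) << 8);
#endif
    return x;
}

/* Same result via the byte sub-registers: %b0 is the low byte (e.g. %al) and
   %h0 is the high byte of the low word (e.g. %ah). "Q" limits the choice to
   the a, b, c or d registers, the only ones with an addressable high byte. */
static inline uint32_t swap_low_bytes_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("xchgb %b0, %h0" : "+Q"(x));
#else
    x = (x & 0xffff0000u) | ((x >> 8) & 0xffu) | ((x & 0xffu) << 8);
#endif
    return x;
}
```

Both functions map 0x11223344 to 0x11224433; the "cc" clobber on the rotate is there because ror modifies the flags.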
– Nils Pipenbrinck Sep 23 '08 at 2:06 Thanks, I'm glad it helped! – Chris Jester-Young Sep 23 '08 at 11:03.
Dan, I need that low-byte swapping primitive for a larger tweak. I know that 16-bit operations in 32-bit code have been slow and frowned upon, but the code will be surrounded by other 32-bit operations. I hope that the slowness of the 16-bit code will just get lost in the out-of-order scheduling.
What I want to achieve in the end is a mechanism to do all 24 possible byte permutations of a dword in place. For this you need only three instructions at most: low-byte swap (e.g. xchg al, ah), bswap and 32-bit rotates. The in-place way does not need any constants (faster code fetch/decode) and only uses a single register.
For x86/32 that may save me up to 6 costly memory accesses (push/pop) on top of the roughly 10 instructions I save for byte shuffling. First tests have shown that such code can run up to three times faster on my Core 2, but I have to make more measurements on other machines before I can use it. My secret plan is to integrate this tweak into GCC one day, but that may never happen because GCC is such a huge codebase.
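To check that these three in-place primitives really reach every byte order of a dword, here is a small portable C sketch (not Nils's code; all names are hypothetical). It models the primitives on a uint32_t without any inline asm and runs a breadth-first search from the identity layout, where byte i holds the value i. It only answers the reachability question and says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Portable models of the primitives (no inline asm). */
static uint32_t swap_low_bytes(uint32_t x)             /* like xchg al, ah */
{
    return (x & 0xffff0000u) | ((x >> 8) & 0xffu) | ((x & 0xffu) << 8);
}

static uint32_t byte_swap(uint32_t x)                  /* like bswap eax */
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint32_t rotate_right(uint32_t x, unsigned n)   /* like ror eax, n; n in 1..31 */
{
    return (x >> n) | (x << (32u - n));
}

#define NUM_OPS    5
#define MAX_STATES 24   /* 4! orders of four distinct bytes */

/* Apply primitive number op (0..4) to x. */
static uint32_t apply_op(uint32_t x, int op)
{
    switch (op) {
    case 0:  return swap_low_bytes(x);
    case 1:  return byte_swap(x);
    case 2:  return rotate_right(x, 8);
    case 3:  return rotate_right(x, 16);
    default: return rotate_right(x, 24);
    }
}

/* Breadth-first search over byte orders. Returns the number of distinct orders
   reachable from 0x03020100 and stores in *max_len the length of the longest
   shortest primitive sequence needed to reach one of them. */
static int count_reachable_orders(int *max_len)
{
    uint32_t state[MAX_STATES];
    int depth[MAX_STATES];
    int head = 0, tail = 0;

    state[tail] = 0x03020100u;
    depth[tail++] = 0;
    *max_len = 0;

    while (head < tail) {
        uint32_t cur = state[head];
        int d = depth[head++];
        for (int op = 0; op < NUM_OPS; op++) {
            uint32_t next = apply_op(cur, op);
            int seen = 0;
            for (int i = 0; i < tail; i++) {
                if (state[i] == next) { seen = 1; break; }
            }
            if (seen)
                continue;
            assert(tail < MAX_STATES);   /* every state is a permutation of 4 bytes */
            state[tail] = next;
            depth[tail++] = d + 1;
            if (d + 1 > *max_len)
                *max_len = d + 1;
        }
    }
    return tail;
}
```

count_reachable_orders returns 24: the 8-bit rotate is a 4-cycle on byte positions and the low-byte swap is a transposition of two neighbouring positions, and together those generate all permutations of four bytes. The value left in *max_len shows how many primitives the worst-case order needs with this particular set (rotates by 8, 16 and 24 each counted as one instruction).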
While I'm thinking about it... you should replace the "q" constraint with a capital "Q" constraint in Chris's second solution:

    int test(int x)
    {
      int y;
      asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
      return y;
    }

"q" and "Q" are slightly different in 64-bit mode, where you can get the lowest byte of all of the integer registers (ax, bx, cx, dx, si, di, sp, bp, r8-r15). But you can only get the second-lowest byte (e.g. ah) of the four original 386 registers (ax, bx, cx, dx).
Yes, good point, thank you! I'll edit my post now. :-) – Chris Jester-Young Sep 24 '08 at 3:56.
So apparently there are tricks to do this... but it may not be so efficient. 32-bit x86 processors are generally slow at manipulating 16-bit data in general purpose registers. You ought to benchmark it if performance is important.
Unless this is (a) performance critical and (b) proves to be much faster, I would save myself some maintenance hassle and just do it in C: uint32_t y, hi=(x&~0xffff), lo=(x&0xffff); y = hi + (((lo >> 8) + (lo.
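A complete, self-contained version of this shift-and-mask approach might look like the following sketch. The function name is hypothetical, and it is not necessarily Dan's exact expression.

```c
#include <stdint.h>

/* Rotate the low 16 bits of x by 8 (swap bytes 0 and 1); bits 16..31 are unchanged. */
uint32_t rotate_low_word_c(uint32_t x)
{
    uint32_t hi = x & 0xffff0000u;   /* upper word, passed through untouched */
    uint32_t lo = x & 0x0000ffffu;   /* lower word, to be rotated by 8 */
    return hi | (((lo >> 8) | (lo << 8)) & 0xffffu);
}
```

For example, rotate_low_word_c(0x11223344u) returns 0x11224433u. Whether a given compiler turns this into a single 16-bit rotate depends on the compiler and flags, so it is worth checking the generated assembly before benchmarking it against the inline-asm version.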
My timing tests (a billion runs, 5 trials) were: my version = (…, …, …, 4.10, 4.18), your version = (5.33, …, …, …, …). – Chris Jester-Young Sep 23 '08 at 11:21 So, we're looking at a 20% speed improvement.
Isn't that "much faster"? – Chris Jester-Young Sep 23 '08 at 11:23 Chris, absolutely right... your version is faster it seems. But not nearly as much as 6-instructions-vs.-1-instruction would lead you to expect, and that's what I was warning about.
I didn't actually do the comparison myself, so props to you for testing it! – Dan Sep 23 '08 at 16:38.
Nils, Gotcha. Well if it's a primitive routine that you're going to be reusing over and over, I have no argument with it... the register naming trick that Chris pointed out is a nice one that I'm going to have to remember. It would be nice if it made it into the standard GCC docs too!
Dan, I checked the GCC documentation twice and then filed a bug report because this info is missing. Who knows - maybe it makes it into the next release. – Nils Pipenbrinck Sep 23 '08 at 16:44 I found the bug at gcc.gnu.org/bugzilla/show_bug.cgi?id=37621, and it looks like there may be resistance to documenting this feature since it's only meant for internal use.
Hrm... – Dan Sep 24 '08 at 17:35.