Why is xor faster than mov?

But of course, it may do the same thing with a "MOV Rn, 0". A good compiler will choose the best variant for the target platform anyway, so this is usually only an issue if you're coding in assembler. If the CPU is smart enough, the XOR dependency disappears, since it knows the old value is irrelevant and the register will be set to zero anyway; again, this depends on the actual CPU being used.

However, I'm long past caring about a few bytes or a few clock cycles in my code; this seems like micro-optimisation gone mad.

Smaller actually does matter, because on many real workloads one of the main factors limiting performance is i-cache misses.
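For a sense of scale, here is a minimal sketch of the encodings involved, in NASM syntax and assuming 64-bit mode. The label name is made up for illustration; the byte values in the comments are the standard encodings of these instructions.

```nasm
bits 64
section .text

zero_examples:                 ; hypothetical label, illustration only
    xor  eax, eax              ; 31 C0              2 bytes
    mov  eax, 0                ; B8 00 00 00 00     5 bytes (32-bit immediate)
    xor  r8d, r8d              ; 45 31 C0           3 bytes (REX prefix for r8-r15)
    mov  r8d, 0                ; 41 B8 00 00 00 00  6 bytes
    ret
```

Three bytes saved per zeroed register is small on its own, but it can add up in code that is limited by instruction fetch.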

This wouldn't be captured in a micro-benchmark comparing the two options, but in the real world it will make code run slightly faster.

What could be faster than executing a MOV instruction? Not executing any instruction at all! The processor recognizes the XOR pattern during register renaming and then throws the instruction away, because there is no need to execute it. The net result is that the XOR pattern uses zero execution resources and can, on recent Intel CPUs, 'execute' at four instructions per cycle.

Jens: Only posting a comment because this question seems to be confused about asking about asm vs.

Possible duplicate of "What is the best way to set a register to zero in x86 assembly: xor, mov or and?"

But they still execute differently. Now if you were talking about another processor, let's say ARM:

    mov r0, #0
    eor r0, r0, r0
    mov r0, #0
    str r0, [r1]
    ldr r0, [r1]
    eor r0, r0, r0
    str r0, [r1]

You don't save anything by using the xor (exclusive or, eor): one instruction is one instruction, both to fetch and to execute.

I have no idea what you mean by "hot paths for zero assignment".

Can you provide a reference? As a side note, xor reg, reg was slower than mov reg, 0 on the Pentium Pro, because the processor thought the former had a dependency on reg. Prior to that, there was no out-of-order execution in this family of processors, and after that, processors learnt to recognize xor reg, reg as independent of the previous value of reg.

Pascal: By "hot paths for zero assignment" I meant that the microcode is optimized to do this with minimal latency by breaking dependencies, as you mentioned. (Necrolis)

On SandyBridge, xor-zeroing is special-cased and handled by register renaming; it doesn't even use an execution port. I've never heard anything about similar tricks applying to mov reg,0, but it'd be cool if they exist. Do you have a source for that?

That's why Intel recommends it.

Yes, I used to find that too. It's because, pre-optimization, on older architectures, the compiler outputs chunks of asm as if from a recipe book: loads of unnecessary memory access, pointless moving of data between registers, etc.

A proficient human coder, on the other hand, writes assembler that is partly optimized by default. But few humans could write code like a seriously optimizing compiler, esp. Which is as it should be, because modern processors are not designed to be programmed directly by humans.

Steve44 on Nov 28: It's been a while since I programmed low level, but I think on the 68k series they started to introduce cache and multi-stage instruction pipelines.

Sharlin on Nov 28: It's fascinating how far down the rabbit hole goes these days.

At least we have movhps as a replacement for memory-source punpcklqdq, but narrower widths that actually shuffle can't be replaced.

With this, there's no reason for MMX.

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

Still prefer 32-bit operand-size. Microbenchmark experiments might want this.

    sub eax, eax    ; same as xor on most but not all CPUs; bad on Silvermont, for example

Code-size is the only penalty. May be worth using only high regs to avoid needing vzeroupper in short functions.

What's special about zeroing idioms like xor on various uarches

Some CPUs recognize sub same,same as a zeroing idiom like xor, but all CPUs that recognize any zeroing idioms recognize xor.

  • Smaller code size than mov reg,0 (all CPUs).
  • Avoids partial-register penalties for later code (Intel P6-family and SnB-family).
  • Doesn't use an execution unit (Intel SnB-family).
  • Smaller uop (no immediate data) leaves room in the uop cache-line for nearby instructions to borrow if needed (Intel SnB-family).

See my answer on another question about zeroing registers for some more details.

From Agner Fog's microarch guide, pg 98 (Pentium M section, referenced by later sections including SnB): The processor recognizes the XOR of a register with itself as setting it to zero.

This tag is remembered even in a loop:

    ; Example 7. Partial register problem avoided in loop
        xor eax, eax
        mov ecx, ...
    LL: mov al, [esi]
        mov [edi], eax    ; No extra uop
        inc esi
        add edi, 4
        dec ecx
        jnz LL

(from pg82): The processor remembers that the upper 24 bits of EAX are zero as long as you don't get an interrupt, misprediction, or other serializing event.
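The same idea shows up whenever an instruction writes only the low byte of a register. A minimal sketch, assuming NASM syntax and the x86-64 System V calling convention (first two integer arguments in EDI and ESI); the function name is hypothetical:

```nasm
bits 64
section .text
global is_less

; int is_less(int a, int b)  -> 1 if a < b (signed), else 0
is_less:
    xor  eax, eax      ; zero the full register first; also breaks any dependency on old EAX
    cmp  edi, esi      ; compare a with b (must come after the xor, which clobbers flags)
    setl al            ; writes AL only; the upper bits of EAX are already zero
    ret
```

Because the xor comes first, reading EAX after the setl should not need a merge with stale upper bits on the CPUs that track this.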

TL;DR: If it really makes your code nicer or saves instructions, then sure, zero with mov to avoid touching the flags, as long as you don't introduce a performance problem other than code size.

Padding instructions, like the question asked for: Agner Fog has a whole section on this.

Without an immediate, there's no extra cost for

Note that PIE executables allow ASLR even for the executable, and are the default in many Linux distros, so if you can keep your code PIC without any perf downsides, then that's preferable.
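As a rough sketch of the case where zeroing with mov is the convenient choice (NASM syntax, x86-64 System V calling convention assumed, function name hypothetical), the zeroing here sits between the instruction that sets the flags and the one that reads them:

```nasm
bits 64
section .text
global clamp_to_zero

; int clamp_to_zero(int x)  -> x if x >= 0, else 0
clamp_to_zero:
    test   edi, edi    ; set SF from the sign of x
    mov    eax, 0      ; mov leaves the flags alone; xor here would overwrite SF
    cmovns eax, edi    ; if x is non-negative (SF = 0), take x; otherwise keep 0
    ret
```

Moving an xor eax, eax above the test gives the same result with the shorter encoding, so the mov form is mainly useful when that reordering is awkward.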

You could even use mov reg, 0 instead of xor reg,reg.

This penalty is usually one clock per extra prefix. (Agner's microarch guide, end of section 6.)

TODO: finish this section. Until then, consult Agner Fog's microarch guide.

GAS doesn't have an override to immediate size, only displacements. GAS does let you add an explicit ds prefix, with ds mov src,dst

gcc -g -c padding.

All of the above is valid for a symmetric and anti-reflexive relationship. That means that:

  • If a is related to b, then b is related to a.
  • a is never related to a.


