Go Back   Rhinocerus > Newsgroup > Newsgroup comp.lang.* 2 > Newsgroup comp.lang.asm.x86

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 07-11-2012, 07:21 PM
Jim Leonard
Guest
 
Posts: n/a
Default Possible to change flags without math op?

I'm writing a piece of code for real-mode 16-bit x86 that needs to
process a sequence of bytes and do something with their values based
on whether or not they are 00, FF, or neither. Jumping based on the
zero flag and/or sign flag was my first idea, but MOV and LODS don't
alter flags. Loading to CX and using JCXZ works for zero values, but
not signed (FF) values. So my next thought was to just ensure that a
target register is empty, so that a math op will alter the flags for
me:

XOR reg,reg
ADD reg,mem

....but I'm optimizing for speed (this is in a loop) and it irks me to
have to zero the register. Is the above the fastest way to do what I
want, or am I missing something obvious?

(For those who enjoy optimizing the core problem itself instead of
assembly, the goal is to load bytes in sequence and add their values
to a target register. The rules:
1. If a loaded value is 0, STOP reading.
2. If a loaded value is FF, add to target register and keep reading.
3. If a loaded value is not FF, add to target register and STOP
reading.
There is probably a way to combine rules #1 and #3 such that they use
the same branch or something, but so far I'm not seeing it.)
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 07-11-2012, 07:54 PM
Bernhard Schornak
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Jim Leonard wrote:


> I'm writing a piece of code for real-mode 16-bit x86 that needs to
> process a sequence of bytes and do something with their values based
> on whether or not they are 00, FF, or neither. Jumping based on the
> zero flag and/or sign flag was my first idea, but MOV and LODS don't
> alter flags. Loading to CX and using JCXZ works for zero values, but
> not signed (FF) values. So my next thought was to just ensure that a
> target register is empty, so that a math op will alter the flags for
> me:
>
> XOR reg,reg
> ADD reg,mem
>
> ...but I'm optimizing for speed (this is in a loop) and it irks me to
> have to zero the register. Is the above the fastest way to do what I
> want, or am I missing something obvious?
>
> (For those who enjoy optimizing the core problem itself instead of
> assembly, the goal is to load bytes in sequence and add their values
> to a target register. The rules:
> 1. If a loaded value is 0, STOP reading.
> 2. If a loaded value is FF, add to target register and keep reading.
> 3. If a loaded value is not FF, add to target register and STOP
> reading.
> There is probably a way to combine rules #1 and #3 such that they use
> the same branch or something, but so far I'm not seeing it.)



test reg, reg
je 1f
inc reg
jne 0f
add tgt, 0xFF
jmp read

0:add tgt, -1(reg)
1:...


Greetings from Augsburg

Bernhard Schornak
Reply With Quote
  #3 (permalink)  
Old 07-11-2012, 08:35 PM
Jim Leonard
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 11, 2:54*pm, Bernhard Schornak <schor...@nospicedham.web.de>
wrote:
> * *test reg, reg
> * *je * 1f
> * *inc *reg
> * *jne *0f
> * *add *tgt, 0xFF
> * *jmp *read


So the main answer to my question is that this:

mov reg,mem
test reg,reg

....is the most direct way to set flags. On my target platform it is 1
cycle faster than:

xor reg,reg
add reg,[mem]

....so I'll use that. Yes, I am not very bright.
Reply With Quote
  #4 (permalink)  
Old 07-11-2012, 09:22 PM
Alex McDonald
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 11, 8:21*pm, Jim Leonard <mobyga...@nospicedham.gmail.com>
wrote:
> I'm writing a piece of code for real-mode 16-bit x86 that needs to
> process a sequence of bytes and do something with their values based
> on whether or not they are 00, FF, or neither. *Jumping based on the
> zero flag and/or sign flag was my first idea, but MOV and LODS don't
> alter flags. *Loading to CX and using JCXZ works for zero values, but
> not signed (FF) values. *So my next thought was to just ensure that a
> target register is empty, so that a math op will alter the flags for
> me:
>
> XOR reg,reg
> ADD reg,mem
>
> ...but I'm optimizing for speed (this is in a loop) and it irks me to
> have to zero the register. *Is the above the fastest way to do what I
> want, or am I missing something obvious?
>
> (For those who enjoy optimizing the core problem itself instead of
> assembly, the goal is to load bytes in sequence and add their values
> to a target register. *The rules:
> 1. If a loaded value is 0, STOP reading.
> 2. If a loaded value is FF, add to target register and keep reading.
> 3. If a loaded value is not FF, add to target register and STOP
> reading.
> There is probably a way to combine rules #1 and #3 such that they use
> the same branch or something, but so far I'm not seeing it.)


Since adding 0 does nothing;

1 read, add to target
2 if FF goto 1

lea si, mem
xor bx, bx
@@1: lods byte
add bx, ah
cmp ah, ffh
je @@1
Reply With Quote
  #5 (permalink)  
Old 07-11-2012, 09:59 PM
Terje Mathisen
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Alex McDonald wrote:
> On Jul 11, 8:21 pm, Jim Leonard <mobyga...@nospicedham.gmail.com>
>> 1. If a loaded value is 0, STOP reading.
>> 2. If a loaded value is FF, add to target register and keep reading.
>> 3. If a loaded value is not FF, add to target register and STOP
>> reading.
>> There is probably a way to combine rules #1 and #3 such that they use
>> the same branch or something, but so far I'm not seeing it.)

>
> Since adding 0 does nothing;
>
> 1 read, add to target
> 2 if FF goto 1
>
> lea si, mem
> xor bx, bx
> @@1: lods byte
> add bx, ah


That's an illegal register combination, won't assemble...

> cmp ah, ffh
> je @@1
>


First, there is no need to actually count the number of FF bytes, that
will drop out from the offset between the start and the end address, right?

Second, any byte value except FF will cause a STOP, so that's the only
time-critical test:

mov dx,si ; Save starting point
next2:
lodsw
inc ax
jz next2 ; Loop while we get FFFF

; At this point we know that we didn't get two FF bytes,
; so adjust the SI pointer/counter

dec si
dec si

; Was the first byte 0?

cmp al,1
je STOP

; Count the non-zero byte
inc si

; Was it not FF (which would have turned into 0)?
jnc STOP

; The first byte was FF, must check the second:
; It cannot have been FF as well, so 0 or not

cmp ah,1
je STOP

inc si

STOP:
sub si,dx

You can also consider using REPE SCASW instead of the initial load loop!

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


Reply With Quote
  #6 (permalink)  
Old 07-11-2012, 10:14 PM
Bernhard Schornak
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Jim Leonard wrote:


> On Jul 11, 2:54 pm, Bernhard Schornak <schor...@nospicedham.web.de>
> wrote:
>> test reg, reg
>> je 1f
>> inc reg
>> jne 0f
>> add tgt, 0xFF
>> jmp read

>
> So the main answer to my question is that this:
>
> mov reg,mem
> test reg,reg
>
> ...is the most direct way to set flags. On my target platform it is 1
> cycle faster than:
>
> xor reg,reg
> add reg,[mem]
>
> ...so I'll use that. Yes, I am not very bright.



It depends on the processor. If the "read" label looks like


read:mov reg, [mem]
test reg, reg


then this


read:mov reg, [mem]

0:test reg, reg
je 2f
inc reg
jne 1f
mov reg, [mem]
add tgt, 0xFF
jmp 0b

1:add tgt, -1(reg)
2:...


separates the read from the compare and might save a cycle,
as well (the latency for the read is partially "hidden").


Greetings from Augsburg

Bernhard Schornak
Reply With Quote
  #7 (permalink)  
Old 07-12-2012, 09:25 AM
Philip Lantz
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Jim Leonard wrote:
> Bernhard Schornak wrote:
> > * *test reg, reg
> > * *je * 1f
> > * *inc *reg
> > * *jne *0f
> > * *add *tgt, 0xFF
> > * *jmp *read

>
> So the main answer to my question is that this:
>
> mov reg,mem
> test reg,reg
>
> ...is the most direct way to set flags. On my target platform it is 1
> cycle faster than:
>
> xor reg,reg
> add reg,[mem]
>
> ...so I'll use that.



Yes, mov/test would be the most direct way to test for 0. But as Alex
pointed out, you don't need to test for 0, you only need to test for ff.
The easiest way to test for ff is inc. So:

1: lodsb
add
inc al
jz 1b

It wasn't clear whether you wanted an 8- or 16-bit add, and if 16-bit,
whether the byte should be sign extended or not. If you need a 16-bit
add, you probably want to use movzx or movsx after (or instead of)
lodsb.

However, having said all that, I like Terje's suggestion better--just
loop over all the ff bytes, take the difference between the beginning
and ending address as the number of ff's to add, and then add the
terminating byte (whether it is 0 or not, doesn't matter).
Reply With Quote
  #8 (permalink)  
Old 07-12-2012, 11:15 AM
Terje Mathisen
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Philip Lantz wrote:
> However, having said all that, I like Terje's suggestion better--just
> loop over all the ff bytes, take the difference between the beginning
> and ending address as the number of ff's to add, and then add the
> terminating byte (whether it is 0 or not, doesn't matter).


Indeed.

I suggested using REPE SCAS at the end of my post, I'm now fairly
confident that this will be the fastest on a 16-bit x86 platform simply
due to smaller code:

; Initial environment setup code
cld ; Might not be needed if the state is known!
push ds
pop es ; ES might already be equal to DS?

; Get ready to scan past up to 64K 0FFh bytes
mov ax,0ffh ; Setting AH = 0 at the same time...
or cx,-1 ; Shorter than MOV CX,0FFFFh
repe scasb

je all_ff

; The count of FF bytes is 0FFFFh - CX

; Check the last byte, should it be counted?
cmp ah,[di-1] ; Will generate a carry IFF the byte isn't 0
sbb cx,0 ; Decrement CX if the STOP byte wasn't 0

all_ff:
or ax,-1 ; MOV AX,0FFFFh
sub ax,cx

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


Reply With Quote
  #9 (permalink)  
Old 07-12-2012, 11:24 AM
Alex McDonald
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 11, 10:59*pm, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> Alex McDonald wrote:
> > On Jul 11, 8:21 pm, Jim Leonard <mobyga...@nospicedham.gmail.com>
> >> 1. If a loaded value is 0, STOP reading.
> >> 2. If a loaded value is FF, add to target register and keep reading.
> >> 3. If a loaded value is not FF, add to target register and STOP
> >> reading.
> >> There is probably a way to combine rules #1 and #3 such that they use
> >> the same branch or something, but so far I'm not seeing it.)

>
> > Since adding 0 does nothing;

>
> > 1 read, add to target
> > 2 if FF goto 1

>
> > * * * lea *si, mem
> > * * * xor *bx, bx
> > @@1: lods byte
> > * * * add *bx, ah

>
> That's an illegal register combination, won't assemble...


I gave up on 16 bit code a long time ago... :-) Replace with [xor ax,
ax] before the loop and [add bx, ax]

>
> > * * * cmp *ah, ffh
> > * * * je * @@1

>
> First, there is no need to actually count the number of FF bytes, that
> will drop out from the offset between the start and the end address, right?


Then I misinterpreted Jim's description; I thought he required adding
values, as in FF FF FF FE -> 1023 dec, as opposed to FF FF FF FE -> 4.

<snip>

Reply With Quote
  #10 (permalink)  
Old 07-12-2012, 12:37 PM
Terje Mathisen
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

Alex McDonald wrote:
> On Jul 11, 10:59 pm, Terje Mathisen <"terje.mathisen at
>> First, there is no need to actually count the number of FF bytes, that
>> will drop out from the offset between the start and the end address, right?

>
> Then I misinterpreted Jim's description; I thought he required adding
> values, as in FF FF FF FE -> 1023 dec, as opposed to FF FF FF FE -> 4.


That was possible, from the initial description, but it doesn't matter:

The sum of N bytes all equal to FF is fairly easy to calculate. :-)

sum = (N << 8) - N

It also makes the final addition even simpler, since adding 0 is a NOP

sum += last_byte

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


Reply With Quote
  #11 (permalink)  
Old 07-12-2012, 01:45 PM
James Harris
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 11, 8:21*pm, Jim Leonard <mobyga...@nospicedham.gmail.com>
wrote:

....

> (For those who enjoy optimizing the core problem itself instead of
> assembly, the goal is to load bytes in sequence and add their values
> to a target register. *The rules:
> 1. If a loaded value is 0, STOP reading.
> 2. If a loaded value is FF, add to target register and keep reading.
> 3. If a loaded value is not FF, add to target register and STOP
> reading.
> There is probably a way to combine rules #1 and #3 such that they use
> the same branch or something, but so far I'm not seeing it.)


Yes, it's often better if you can reduce the number of branches. In
this case you should be able to do all the tests and the looping with
just one branch. Since you mention 16-bit mode how about this:

xor ax, ax ;To hold each value 0 to 255 as 16-bit
xor dx, dx ;Sum

loop:
mov al, [si]
inc si
add dx, ax
cmp al, 0xff
jne loop

If you wanted an 8-bit sum instead:

xor ah, ah ;AH = sum

loop:
mov al, [si]
inc si
add ah, al
cmp al, 0xff
jne loop

You could replace the mov al, [si]; inc si with a lodsb instruction
but it would be slower on certain generations of CPU.

If you have any significant section of memory to scan or are running
on a very early CPU the routine might be dominated by the memory read
time so you won't get it to be any faster.

If you have only a small amount of memory to read be careful you don't
try to optimise too much. Instead, it may be better to keep it simple
and easy to maintain.

James
Reply With Quote
  #12 (permalink)  
Old 07-12-2012, 06:08 PM
Jim Leonard
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 12, 6:15*am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> > However, having said all that, I like Terje's suggestion better--just
> > loop over all the ff bytes, take the difference between the beginning
> > and ending address as the number of ff's to add, and then add the
> > terminating byte (whether it is 0 or not, doesn't matter).

> I suggested using REPE SCAS at the end of my post, I'm now fairly
> confident that this will be the fastest on a 16-bit x86 platform simply
> due to smaller code:


Thanks to everyone here for the great suggestions. I appreciate
pointing out the INC suggestion to pick up FF, and I had not
considered the REPE SCAS suggestion AT ALL which is fun to think
about.

For the curious, this is part of decompression code I'm writing (I
cannot change the format). The code will populate CX which will be
used to copy literal or match data.
Reply With Quote
  #13 (permalink)  
Old 07-12-2012, 07:42 PM
Jim Leonard
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 12, 6:24*am, Alex McDonald <b...@nospicedham.rivadpm.com>
wrote:
> Then I misinterpreted Jim's description; I thought he required adding
> values, as in FF FF FF FE -> 1023 dec, as opposed to FF FF FF FE -> 4.


No, you were right: FF FF FF FE -> 1023 dec.
Reply With Quote
  #14 (permalink)  
Old 07-12-2012, 07:43 PM
Jim Leonard
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 12, 6:15*am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> * *or cx,-1 * * ; Shorter than MOV CX,0FFFFh


Wait, what?

-a 0100
13A8:0100 mov cx,ffff
13A8:0103 or cx,-1
13A8:0107
-u 0100
13A8:0100 B9FFFF MOV CX,FFFF
13A8:0103 81C9FFFF OR CX,FFFF
Reply With Quote
  #15 (permalink)  
Old 07-12-2012, 08:09 PM
Jim Leonard
Guest
 
Posts: n/a
Default Re: Possible to change flags without math op?

On Jul 12, 2:42*pm, Jim Leonard <mobyga...@nospicedham.gmail.com>
wrote:
> On Jul 12, 6:24*am, Alex McDonald <b...@nospicedham.rivadpm.com>
> wrote:
>
> > Then I misinterpreted Jim's description; I thought he required adding
> > values, as in FF FF FF FE -> 1023 dec, as opposed to FF FF FF FE -> 4.

>
> No, you were right: *FF FF FF FE -> 1023 dec.


Wait, no, I'm wrong. A sequence of FF FF FF FE should result in
255+255+255+254=1019.

And for extra trouble, this value needs to land in CX because it is
going to be used to immediately copy stuff. Which is why I'm re-
thinking Terje's REPE SCAS idea, because once the last FF byte is
found, there's data now clogging up CX that I need to deal with.
Actually... I guess it's not that big a deal, as both BP and DX are
still free in this loop I'm designing and I can just use them as
scratch.
Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 12:59 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.