
Color addition with saturation
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Thierry Tremblay

April 19, 1999, 01:08 PM

I have two colors in two 32-bit integers. I want to add them with color saturation. I know there is a way to do this quite fast, because I coded it in the Matrox D3D driver when I was working there. Unfortunately, I don't remember where I found it... I thought it was in Graphics Gems, but I can't find it.

Here is what I managed to get, but it's not perfect:

unsigned AddColor( unsigned c1, unsigned c2 )
{
    unsigned total, carry, temp, mask;

    total = (c1 & 0x00FEFEFE) + (c2 & 0x00FEFEFE);
    carry = total & 0x01010100;
    temp = carry >> 8;
    mask = carry - temp;

    return (total | mask);
}

But this is wrong. Say you add 0x80 and 0x7F together for a color component: the result will be 0xFE instead of 0xFF. The reason is that when I compute "total" I am masking off the lowest bit of each component (I need to do this to leave room for the carry bits).
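One way around the lost low bit is to mask off the top bit of each component instead, add the low 7 bits, and rebuild the carry out of each byte by hand. Here is a minimal sketch of that idea in C. It is a generic bit trick, not necessarily what the driver code did, and the name AddColorExact is made up for illustration. Note that it saturates all four bytes, including the highest one.

```c
#include <stdint.h>

/* Hypothetical helper: per-byte saturated add of two packed 32-bit colors. */
uint32_t AddColorExact(uint32_t c1, uint32_t c2)
{
    /* Add the low 7 bits of every byte; bit 7 of each byte now holds
       the carry INTO bit 7, and nothing can spill into the next byte. */
    uint32_t low = (c1 & 0x7F7F7F7Fu) + (c2 & 0x7F7F7F7Fu);

    /* Per-byte wrapped sum: fold the two original top bits back in. */
    uint32_t sum = low ^ ((c1 ^ c2) & 0x80808080u);

    /* Carry OUT of bit 7 is the majority of (c1 bit 7, c2 bit 7, carry in). */
    uint32_t carry = ((c1 & c2) | ((c1 | c2) & low)) & 0x80808080u;

    /* Turn each carry bit into a full 0xFF byte and force that byte to 255. */
    uint32_t mask = (carry >> 7) * 0xFFu;

    return sum | mask;
}
```

With this version, AddColorExact(0x80, 0x7F) gives 0xFF, and a byte that overflows does not disturb its neighbour.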

I've seen methods that compute bytes 2 and 0 together, then compute byte 1, and finally merge the results. But as I said, I know it can be done (scrapping the highest byte, which I don't care about).
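For reference, the split approach mentioned above (bytes 2 and 0 together, then byte 1, then merge) could look roughly like the following. This is only a sketch under the assumption that the highest byte can be thrown away; AddColorSplit is a made-up name, and the carry-to-mask step reuses the same subtraction trick as AddColor.

```c
#include <stdint.h>

/* Hypothetical helper: saturated add of the low three bytes, highest byte dropped. */
uint32_t AddColorSplit(uint32_t c1, uint32_t c2)
{
    /* Bytes 2 and 0: the empty bytes in between catch the carries
       (bit 8 for byte 0, bit 24 for byte 2). */
    uint32_t rb = (c1 & 0x00FF00FFu) + (c2 & 0x00FF00FFu);

    /* Byte 1 on its own: its carry lands in bit 16. */
    uint32_t g = (c1 & 0x0000FF00u) + (c2 & 0x0000FF00u);

    uint32_t rbCarry = rb & 0x01000100u;
    uint32_t gCarry  = g  & 0x00010000u;

    /* carry - (carry >> 8) expands each carry bit into 0xFF in the byte below it. */
    rb |= rbCarry - (rbCarry >> 8);
    g  |= gCarry  - (gCarry  >> 8);

    /* Strip the carry bits and merge the two halves. */
    return (rb & 0x00FF00FFu) | (g & 0x0000FF00u);
}
```

It costs two adds instead of one, but no low bits are masked off, so 0x80 + 0x7F comes out as 0xFF.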

Thierry

 
Harmless

May 02, 1999, 02:34 PM

Thierry Tremblay wrote:

>>I have two colors in two 32-bit integers. I want to add them with color saturation. I know there is a way to do this quite fast, because I coded it in the Matrox D3D driver when I was working there. Unfortunately, I don't remember where I found it... I thought it was in Graphics Gems, but I can't find it.

>>unsigned AddColor( unsigned c1, unsigned c2 )
>>{
>> unsigned total, carry, temp, mask;
>>
>> total = (c1 & 0x00FEFEFE) + (c2 & 0x00FEFEFE);
>> carry = total & 0x01010100;
>> temp = carry >> 8;
>> mask = carry - temp;
>>
>> return (total | mask);
>>}

>>But this is wrong. Say you add 0x80 and 0x7F together for a color component: the result will be 0xFE instead of 0xFF. The reason is that when I compute "total" I am masking off the lowest bit of each component (I need to do this to leave room for the carry bits).

Bah, white is overrated anyway. ;) MMX opcodes may make life a bit easier in this regard.

void __fastcall AddSaturated8( unsigned char a[8], unsigned char b[8] ) // placeholder name
{
    // for (int i = 0; i < 8; i++) { a[i] += b[i]; }   with each sum clamped at 255
    __asm {
        movq    mm1, [ecx]
        paddusb mm1, [edx]
        movq    [ecx], mm1
        emms
    };
}

In practice I would probably move the MMX code into an inner loop so you don't get slaughtered by emms lag. Rolled into an inner loop it becomes 3 cycles for 2 integer adds saturated on individual bytes.

Saturated addition on one byte can be done with the following on a PPro or later:

    // ppro/p2/p3
    add   cl,dl      // 1 uop
    cmovc ecx,eax    // 2 uops (eax preloaded with 255; cmov takes neither an immediate nor 8-bit registers)
    // 3 linearly dependent uops

or with the more generic approach that follows on anything:

                     // ppro/p2/p3     // p/pmmx
    add cl,dl        // 1 uop          // U pipe
    sbb dl,dl        // 2 uops         // V pipe
    or  cl,dl        // 1 uop          // U pipe
                     // 3 dependent    // 1.5 cycles
                     // uops, 4 total

In practice the MMX variant is the fastest because, well, that's what MMX was designed for. Your method is probably faster than massaging the colors on a byte-by-byte basis to use either of the more generic saturated byte snippets above.

>>I've seen methods that compute bytes 2 and 0 together, then compute byte 1, and finally merge the results. But as I said, I know it can be done (scrapping the highest byte, which I don't care about).

Seems reasonable. In practice I'd probably just use the MMX loop and use something like your current one as a stand-in for non-MMX machines, which are getting rare enough these days. On the other hand, if you're doing a lot of FPU operations right around the same time, MMX could be bad.

-Harmless
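For anyone without an assembler handy, the add / sbb / or idea from this post can be written in plain C along these lines. This is a rough sketch and not a tuned replacement for the MMX loop; the names AddSatByte and AddSatBytes are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: branchless saturated add of two bytes. */
uint8_t AddSatByte(uint8_t a, uint8_t b)
{
    uint8_t sum  = (uint8_t)(a + b);      /* wraps modulo 256, like add cl,dl      */
    uint8_t mask = (uint8_t)-(sum < a);   /* 0xFF if the add wrapped, like sbb dl,dl */
    return (uint8_t)(sum | mask);         /* force 255 on overflow, like or cl,dl  */
}

/* Hypothetical helper: dst[i] = saturate(dst[i] + src[i]) for n bytes. */
void AddSatBytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = AddSatByte(dst[i], src[i]);
}
```

The comparison sum < a is true exactly when the 8-bit add wrapped around, so it plays the role of the carry flag here.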

 
This thread contains 2 messages.
 
 