Fountain Re: Data alignment
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Rock

June 28, 1999, 12:17 PM

I'm replying to an answer in The Fountain of Knowledge about data alignment. Although the cache probably plays a part in alignment efficiency, it is not the real reason that unaligned data is slow. It has to do with how the CPU accesses memory. During a memory read, the processor always accesses 32 bits of data from an aligned 32-bit address, and then selects the appropriate bytes (called banks) from the 32-bit data, depending on whether you're reading 1, 2, or 4 bytes of data. If you're trying to read from an unaligned address, though (one where the data straddles a 32-bit boundary), the processor reads the first 32-bit aligned address for the first part of the data, and then must do a second read to get the rest of the data from the next 32-bit aligned address.
When writing data, the processor only writes the data needed: 1, 2, or 4 bytes. However, if the destination address is not aligned and you're writing a 4-byte value, then the processor must write part of the data to the first 32-bit aligned address, and then perform a second write to the next 32-bit aligned address for the rest of the data.
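To make the read/write description above concrete, here is a minimal sketch that counts how many aligned 32-bit words a single access touches. It only models the simplified 32-bit view described in this post; the function name words_touched is made up for illustration, and real processors (wider buses, caches) behave in more complicated ways.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of aligned 32-bit words touched by an access
   of `size` bytes starting at byte address `addr`. Assumes size >= 1.
   One word means one bus access in this simplified model; two words
   means the access is split in two. */
static unsigned words_touched(uintptr_t addr, size_t size)
{
    uintptr_t first = addr / 4;               /* word holding the first byte */
    uintptr_t last  = (addr + size - 1) / 4;  /* word holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

Under this model a 4-byte access at address 0x1000 touches one word, while the same access at 0x1001 touches two. A 2-byte access at 0x1002 still fits in one word, but at 0x1003 it straddles the boundary and needs two.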

As for incurring unaligned penalties in tight loops, this is where I've seen misaligned data cripple a program. I timed a non-transparent blt (rep movsd) on my P200, and found that copying to/from an unaligned address was 5 times slower than an aligned copy! It's a WORLD of difference.

Rock

 
Tim Smith

June 29, 1999, 03:21 PM

A good rule of thumb I use is to naturally align all structure elements.

For anyone who doesn't know, natural alignment means that:

(structure_offset MOD data_size) == 0

For example:

A 2 byte data variable should be aligned on an even address.

struct NOT_NATURALLY_ALIGNED {
char a;
short b; // 1 MOD 2 = 1
char c;
};

struct NATURALLY_ALIGNED {
char a;
char c;
short b; // 2 MOD 2 = 0
};
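The rule above can be checked mechanically with offsetof. This is only a sketch: most compilers insert padding by default, so the first struct would not actually end up misaligned unless packing is forced. The #pragma pack(1) used here is an assumption about the compiler (GCC, Clang and MSVC accept it), and is_naturally_aligned is a made-up helper name.

```c
#include <stddef.h>

/* Assumption: the compiler supports #pragma pack. Packing to 1 byte removes
   the padding that would otherwise hide the misalignment. */
#pragma pack(push, 1)
struct NOT_NATURALLY_ALIGNED {
    char a;
    short b;   /* offset 1 when packed */
    char c;
};

struct NATURALLY_ALIGNED {
    char a;
    char c;
    short b;   /* offset 2, packed or not */
};
#pragma pack(pop)

/* Hypothetical helper: (structure_offset MOD data_size) == 0 */
static int is_naturally_aligned(size_t structure_offset, size_t data_size)
{
    return (structure_offset % data_size) == 0;
}
```

Calling is_naturally_aligned(offsetof(struct NOT_NATURALLY_ALIGNED, b), sizeof(short)) gives 0 on a platform with 2-byte shorts, and the same call on NATURALLY_ALIGNED gives 1. Without the pragma, a typical compiler pads the first struct to 6 bytes instead, which is another reason to order members by hand.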

Some RISC processors, such as the DEC Alpha, would actually throw an exception if data was not aligned. The OS would handle the exception and fetch the data properly, but the program would take a HUGE performance hit. When we ported our application from the DEC VAX chip to the DEC Alpha chip, we found it was easy to align the data, so the processor throwing an exception on misaligned data wasn't as bad as it first might have seemed.
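When the data really cannot be realigned (a packed file format, say), one common way to avoid the exception on strict processors is to go through memcpy rather than dereferencing a misaligned pointer. This is not what the port described above did (the data was simply aligned); it is a sketch of an alternative, and load_u32 / store_u32 are made-up names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers. memcpy places no alignment requirement on either
   pointer, so these are safe on processors that fault on misaligned loads
   and stores. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy into an aligned local */
    return v;
}

static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);   /* byte-wise copy out of an aligned local */
}
```

For example, store_u32(buf + 1, value) followed by load_u32(buf + 1) round-trips the value even though buf + 1 is an odd address.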

Tim

 
Mark Neumann

June 30, 1999, 01:29 PM

Thanks for the extra info. I also was looking at some demo code and learned how to align data in memory returned from malloc by adjusting the pointers. And it runs a bit faster, too... now I can write video memory at 32MB/sec! This is with just copying single words; dwords won't work for my application. I'm pretty satisfied with the speed now, but it just makes me wonder how Intel claims AGP 2X (like mine is) video cards can transfer 533MB/sec. Hehe, that seems pretty outrageous.
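The pointer-adjusting trick mentioned above might look roughly like the following. This is a sketch, not the demo code in question: the names aligned_malloc_sketch and aligned_free_sketch are invented, the alignment is assumed to be a power of two, and overflow of the size calculation is not checked. Standard malloc already returns memory aligned for any built-in type, so this only matters for stricter boundaries such as 16 or 32 bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Over-allocates, rounds the pointer up to `alignment`
   (assumed to be a power of two), and stashes the raw malloc pointer just
   below the aligned block so it can be freed later. */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* memcpy avoids a misaligned pointer store when alignment < sizeof(void *) */
    memcpy((unsigned char *)aligned - sizeof(void *), &raw, sizeof raw);
    return (void *)aligned;
}

/* Hypothetical helper. Recovers the raw pointer and releases it. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

A pointer returned by aligned_malloc_sketch must be released with aligned_free_sketch, never with free directly, because it is not the address malloc handed out.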

Tim Smith wrote:
>>A good rule of thumb I use is to naturally align all structure elements.
>>
>>For anyone who doesn't know, natural alignment means that:
>>
>>(structure_offset MOD data_size) == 0
>>
>>For example:
>>
>>A 2 byte data variable should be aligned on an even address.
>>
>>struct NOT_NATURALLY_ALIGNED {
>> char a;
>> short b; // 1 MOD 2 = 1
>> char c;
>>};
>>
>>struct NATURALLY_ALIGNED {
>> char a;
>> char c;
>> short b;
>>};
>>
>>Some RISC processors, such as the DEC Alpha, would actually throw an exception if data was not aligned. The OS would handle the exception and fetch the data properly, but the program would take a HUGE performance hit. When we ported our application from the DEC VAX chip to the DEC Alpha chip, we found it was easy to align the data, so the processor throwing an exception on misaligned data wasn't as bad as it first might have seemed.
>>
>>Tim

 
Kurt Miller

June 30, 1999, 02:20 PM


There was a small mistake in the original response, bits vs. bytes, but
the information is still valid. Apologies for the confusion, and thanks
to you folks for clarifying/adding to this.

-kurt

 
Rock

June 30, 1999, 03:01 PM

>>Thanks for the extra info. I also was looking at some demo code and learned how to align data in memory returned from malloc by adjusting the pointers. And it runs a bit faster, too... now I can write video memory at 32MB/sec! This is with just copying single words; dwords won't work for my application. I'm pretty satisfied with the speed now, but it just makes me wonder how Intel claims AGP 2X (like mine is) video cards can transfer 533MB/sec. Hehe, that seems pretty outrageous.

malloc doesn't align dynamic memory automatically? I was under the impression it did (though I never checked).
One other comment is that there is sometimes more involved than just aligning structures/memory (though they are very important). Alignment plays a role when blitting images, which may need to be unaligned by nature (drawing a sprite at x=3, or drawing a clipped sprite from a source offset of x=1). There are sometimes tricks you can use, but for the most part these are things you can't avoid (without having 4 different sprite images anyways). Just something to keep in mind.
As for a video card transferring 533MB/sec, they probably aren't doing it the same way we're discussing. We're using a software blitter (we manually copy bytes to the screen). They are referring to hardware blitting (or some AGP blits, which use some other bus. I don't know much about it), which can be taken advantage of through DirectDraw. You tell the card what to copy, and it copies it using its internal bus (usually much more than 32 bits at a time), assuming the images are in video memory. Be aware that software blitter speeds vary wildly depending on VRAM speed (my friend's 8 meg AGP card has slower VRAM access than my old Trident 1Meg card did. Very sad).
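One of the tricks for the sprite-at-x=3 case is to copy single bytes until the destination reaches a 4-byte boundary, then copy whole dwords, then finish with single bytes. A rough sketch, assuming a plain byte-per-pixel row; copy_row is a made-up name, and the source can still be misaligned relative to the destination, so this only fixes the write side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes of one sprite row from src to dst. */
static void copy_row(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Head: single bytes until dst sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: whole dwords. dst is aligned here; src may not be, so the
       load goes through memcpy, which compilers typically turn into a
       single move where the hardware allows it. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Tail: whatever is left over, one byte at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Whether this beats a straight rep movsd depends on the machine and on where the row lives (system memory or VRAM), so it is worth timing both.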

Rock

 
This thread contains 5 messages.
 
 