flipCode - Tech File - Adam Robinson
Adam Robinson
(aka Iok)

E-Mail: Iok@bigwig.net
Website: http://www.furryfoxy.freeserve.co.uk

   04/01/2002, Tech File Update

Where did I go?

Back! Finally - and I thought I was going to be updating this regularly :) This is basically a slight edit of a Tech File I meant to post about 5 months ago, but due to events [below] wasn't able to - so not much of this stuff is new, but I might as well get it out of my system before carrying on.

At the beginning of July I moved house [to the wonderful Sheffield], then managed to set fire to it - lots of stuff got torched and I got a good lesson in why other people had insurance... anyway - things are finally back to normal. Obviously anyone who's emailed me in the last 5 months is going to have had no joy, and I'll reply to you as fast as I can now I've got an internet connection again.

[I'm probably moving house again soon, to Birmingham, so I may disappear for a while again.]

Where was I then?

Prior to the troubles, I was working on this - the editor for my sound sample synthesiser. It's written in the wonderful Visual Basic, so it's not particularly nippy but it does the job. I'll release it for you all to play with once I'm sure most of the bugs are gone (They're not atm :)). I still can't make my mind up whether I like writing GUIs better in C or VB - VB development does seem to be rapid but it'd be nice if the language was more solid.

The synth is part of the 64k intro I was coding, work on which has not gone the way I was expecting. When I started out I wrote a plan of all the things I wanted the final code to do, mapped out how much space I thought it'd all need and decided that I'd have to write most of the intro in MMX optimised assembly language - I know from experience that if you try very hard you can easily get code down to the order of 1/3 the size of compiled C code if you write it by hand in asm [it's much, much easier to beat a compiler on size than on speed]. The only problem with this approach is how time consuming it is - I wrote a small, fast sound mixer in asm, it's a nice piece of code but I'm sure it'd have taken me 1/10 the time to write and debug if written in C, even if the end product wasn't as good [in size terms].

Due to the time constraints, I've had to target specific bits of code for size optimisation. Unfortunately picking which bits of code to work on isn't as easy as it is when speed optimising - how do you tell if a piece of code is really shrinkable? I've not found a better way than guessing so far, although anything that touches many different data structures seems to be a good candidate.

As stated, I thought that I'd end up writing loads of MMX code - but I haven't. Most of my code is [currently] heavily dependent on the FPU and conditional move instructions - you can do so much with them that current compilers don't (some try :) ) that using MMX on top of that is just unnecessary. I've had to spend lots and lots of time looking through compiled C/C++ code to see exactly what my compiler is doing [necessary when every byte of space is essential] and the impression I get is that compiler makers don't try very hard to make compiled code very efficient in terms of duplicating information - maybe it's just very difficult - anyway, hand written assembly wins almost every time (I've been beaten on occasion though :) ).

Things of Note 5 months ago:

Global variables: Don't use them! At least not if you're using UPX [notes below]. They also take up far too much codespace to access unless you keep pointers to blocks of them [5 byte instruction to access a global variable versus 2/3 for a variable on the stack or behind a pointer (most of the time)].

MASM: It's got bugs, and I've found where most of them are - the structure handling. MASM is meant to be able to handle C-like structs and for the most part it does, but inexplicably it'll mess things up sometimes [failing to recognise definitions etc.] - I have a feeling it's just a parser bug. I've been editing and building all my assembly code from Visual Studio - since there is no native support you have to write your own custom build rules, which isn't difficult, but it's annoying that you lose the dependency generator - you have to add the dependencies yourself. The major advantage of using Visual Studio is you get the debugger - just get MASM to drop in debugging symbols and off you go.
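As a rough illustration of the "pointers to blocks" idea, here's a minimal sketch. The struct and its field names are made up for the example, and the byte counts you actually get will depend on your compiler and settings, so check the disassembly.

```cpp
#include <cassert>

// Hypothetical intro state, gathered into one block instead of loose globals.
struct IntroState {
    int   frame;
    float time;
    float fade;
};

// Every access goes through the pointer, so the compiler can address the
// fields as base register + small offset instead of a 32-bit absolute address.
void advance_frame(IntroState* s, float dt) {
    assert(s != nullptr);
    s->frame += 1;
    s->time  += dt;
    s->fade   = s->time < 1.0f ? s->time : 1.0f;  // clamp the fade at 1.0
}
```

Pass the same pointer down to everything that needs the state and the absolute addresses mostly disappear from the code.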

[The best place I've found to get MASM, support files and advice on how to use it is the excellent Iczelion's Win32 Assembly Homepage {Note: I can't find a link to this at the moment, anyone know where it's gone?}]

Floating point constants: Remember, none of the FPU instructions can take immediate operands, so every time you write something like 'dosomething(10.0);' in your code, 4 bytes are added to the data segment. This isn't so much a problem as something to keep in mind when you're coding. I've got a little function that generates fp tables of 2^x / (2^x)-1 and 2^(-x), which is a big win in functions where you need to access lots of different powers of two, since keeping the table address in a register or on the stack saves you 2 bytes per FPU instruction :)

Dxguid.lib: Don't link with it. It adds 8k worth of GUIDs to your program - if you need a particular GUID just copy it from the DirectX header files. Speaking of them, by default they #include loads of CRT and standard library crap that you really don't want in an intro - it's not hard to rip out all the references. I'd put the modified headers on my website but I'm not sure it's legal to redistribute them.
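A table generator along those lines might look something like this. It's only a sketch: the table size, the struct layout and the reading of the list above as three separate tables [2^x, (2^x)-1 and 2^(-x)] are all my assumptions here, not the code from the intro.

```cpp
#include <cassert>

// Hypothetical table size; pick whatever range of powers you need.
const int kPowCount = 16;

// Assumed layout: three parallel tables indexed by the exponent x.
struct Pow2Tables {
    float pow2[kPowCount];         // 2^x
    float pow2_minus1[kPowCount];  // (2^x) - 1
    float inv_pow2[kPowCount];     // 2^(-x)
};

// Fills the tables by repeated doubling and halving, so no pow() call
// and no per-entry constants are needed.
void build_pow2_tables(Pow2Tables* t) {
    assert(t != nullptr);
    float up = 1.0f;
    float down = 1.0f;
    for (int i = 0; i < kPowCount; ++i) {
        t->pow2[i]        = up;
        t->pow2_minus1[i] = up - 1.0f;
        t->inv_pow2[i]    = down;
        up   *= 2.0f;
        down *= 0.5f;
    }
}
```

Code that needs several of these values then holds one pointer to the struct and indexes off it.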

Fixed point: Use it!... In certain cases :) It's a big size win when used for representing angles [the wrap around helps] even if you get a slight speed hit when loading values in the fpu.
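For the angle case, a minimal sketch, assuming a 16 bit format where a full turn is 65536 units [the type and function names are just for illustration]:

```cpp
#include <cassert>
#include <cstdint>

// Assumed format: one full turn = 65536 units, stored in 16 bits.
typedef std::uint16_t Angle16;

const float kAngleToRadians = 6.28318530718f / 65536.0f;

// Converting the sum back to 16 bits wraps it modulo 65536,
// so angles never need an explicit range check.
inline Angle16 angle_add(Angle16 a, Angle16 b) {
    return static_cast<Angle16>(a + b);
}

// The integer to float conversion here is the slight speed hit.
inline float angle_to_radians(Angle16 a) {
    return a * kAngleToRadians;
}
```

Three quarters of a turn plus half a turn comes out as a quarter turn with no extra code, which is where the size win comes from.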

Include Files: Makes me scream this problem does :) I'm writing this intro using Visual Basic, C/C++ and x86 assembly language - I'm using lots of custom data structures that all need their own header files x3 because each language needs them in a different format! Anyone want to write a little tool that'll convert C++ header files to either x86 or VB? Pwease? :)

{I'm running XP now, but if you're on 95/98...} KillHelp.exe: My god - there should be a link to this beautiful little executable on the title page of the DirectX SDK - ever had your system bomb out after having a few DirectX apps crash? It's especially annoying during development isn't it - run KillHelp.exe [comes with the DirectX SDK - in the Bin directory] after a DirectX app crashes and 99 times out of 100 your system won't bomb. It does have the slight annoyance that if you've anything else using DirectX that happens to be unaffected after the crash, this gets terminated too.

UPX, compression and the BSS: UPX is undoubtedly the best .EXE compressor I've used on the Win32 platform, even better since you can get hold of the source code and see what's going on. For those of you who don't know, Win32 executables are split into several pieces: headers, some sections of data and some of code or resources [actually you can put what the hell you want in different sections but they are the usual ones]. I won't go into too much detail, but basically when you write a line like this in C: 'char bytearray[1024];' or this in asm: 'db 1024 dup (?)' you allocate 1024 bytes of space in your application's memory map, but take up no space in the disk image - this is uninitialised data and is often put into an EXE section called the BSS [although, because of the way Win32 EXEs work, you can just increase the virtual size of any section; I think Borland C actually does make a BSS section though]. Now this is all well and good and seems of no consequence... until you use UPX to compress your program. It seems that it actually tries to compress uninitialised data, no matter where it is, and that will increase the size of your compressed EXE. I don't know if this is a bug or a necessity of the way the compression works but it's very annoying. It's not hard to dynamically allocate all your memory and classes though, and this is what I do now [although I had initially planned to use loads of huge global arrays, just like in the good old days :)].
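To show what moving the big arrays onto the heap can look like, here's a small sketch. The Workspace struct and its buffer sizes are invented for the example, and calloc stands in for whatever allocator you use [in a CRT-free build that would be a system call instead].

```cpp
#include <cassert>
#include <cstdlib>

const int kMixSamples = 65536;

// Hypothetical working set that would otherwise be huge global arrays.
struct Workspace {
    float         mix_buffer[kMixSamples];
    unsigned char texture[256 * 256 * 4];
};

// One allocation at startup; calloc hands back zero-filled memory,
// which matches what uninitialised globals would have contained.
Workspace* workspace_create() {
    return static_cast<Workspace*>(std::calloc(1, sizeof(Workspace)));
}

// Everything is reached through the one pointer from here on.
void workspace_clear_mix(Workspace* w) {
    assert(w != nullptr);
    for (int i = 0; i < kMixSamples; ++i)
        w->mix_buffer[i] = 0.0f;
}

void workspace_destroy(Workspace* w) {
    std::free(w);
}
```

Nothing in the struct takes up room in the EXE sections, so the compressor has nothing extra to chew on.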

Back to the current - last few things I wrote for the Intro

A ClassFactory/ClassScrapheap that gets rid of all the calls to New/Delete. I use objects a hell of a lot in the intro's code, but New and Delete have an annoying call overhead that was pissing me off (and that I found no way to get rid of, the code appears to be built into MSVC), so I built a system to get rid of them. Code and details are on my website, but basically it turns this [pseudo]code (from a New):

object_ptr = malloc(object_size);
if (object_ptr != NULL)
  object = constructor(object_ptr, param, ...);
else
  object = NULL;

Into this code:

object_ptr = ClassFactory(objectid,param,...); 

Which doesn't look like anything special [and I suppose it isn't] but the ClassFactory code is tiny [see webpage] and object instancing becomes tiny and can all go through a single function pointer if lots of different objects are instanced at a time.
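The real ClassFactory code is on the webpage, so what follows is only a guess at the general shape: a table of sizes and init functions indexed by object ID, with one shared allocate, check and initialise path. All the type names here are hypothetical, and the single int parameter stands in for the variable argument list.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical object types, for illustration only.
struct Particle { float x, y; };
struct Camera   { float fov; int active; };

typedef void (*InitFn)(void* object, int param);

void init_particle(void* o, int p) {
    Particle* q = static_cast<Particle*>(o);
    q->x = q->y = static_cast<float>(p);
}

void init_camera(void* o, int p) {
    Camera* c = static_cast<Camera*>(o);
    c->fov = 90.0f;
    c->active = p;
}

enum ObjectId { kParticle, kCamera, kObjectIdCount };

struct ClassInfo { std::size_t size; InitFn init; };

// One table entry per object type replaces one inlined
// allocate/check/construct sequence per 'new'.
const ClassInfo kClasses[kObjectIdCount] = {
    { sizeof(Particle), init_particle },
    { sizeof(Camera),   init_camera   },
};

// Shared allocation path: returns NULL if the allocation fails.
void* ClassFactory(ObjectId id, int param) {
    assert(id < kObjectIdCount);
    void* object = std::malloc(kClasses[id].size);
    if (object != nullptr)
        kClasses[id].init(object, param);
    return object;
}
```

Each call site shrinks to pushing an ID and a parameter and making one call, e.g. ClassFactory(kCamera, 1).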

Video Capture to a texture from my Webcam. This was an arse to shrink - first of all, don't use the wrapper macros in VFW.h, they're unnecessary and have a code overhead. Use SendMessage directly and stick the parameters for consecutive calls in an array so you can loop and send them all off in a smaller piece of code. Secondly, the documentation is bad - but worse, the drivers for most Webcams are terrible. I've got my code working on Creative and Logitech cameras, but have had problems with others. So I suggest experimentation :) Looking at the source to VirtualDub helped loads [Has anyone got the capture code from Game Programming Gems 2 to work? It crashes for me].

Also, you need to be very careful about exactly which bits of VFW.h you include, some bits add massive object files to your application - there are a load of switches, I use:
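The "parameters in an array" trick might be sketched like this. To keep it buildable anywhere, the sender is a function pointer standing in for SendMessage, the message values would come from your own table, and stopping at the first zero result is my assumption about how you'd want failures handled.

```cpp
#include <cassert>

// Hypothetical record: the three values passed after the window handle.
struct CapMessage {
    unsigned int msg;
    unsigned int wparam;
    long         lparam;
};

// Stand-in for the Win32 sender so the sketch does not need windows.h.
typedef long (*SendFn)(void* window, unsigned int msg,
                       unsigned int wparam, long lparam);

// Sends the table entries in order and stops at the first zero result.
// Returns the number of messages that were sent successfully.
int send_all(SendFn send, void* window, const CapMessage* table, int count) {
    assert(send != nullptr && table != nullptr);
    int i = 0;
    for (; i < count; ++i) {
        if (send(window, table[i].msg, table[i].wparam, table[i].lparam) == 0)
            break;
    }
    return i;
}
```

One loop plus a small data table replaces a string of separate call sequences, which is the whole point of the exercise.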

#define NOMMREG
#define NOMCIWND
#define NOMSACM 

and I've modified the header on top of that. (Am I the only one who spends ages combing through Microsoft's header files to see what kinda undocumented stuff you can pull off? :) )

Direct3D Render and Texturestage states, FVFs and Strides. I need to use many [about 30 so far] very different pipeline setups for what I want to do, so storing different configurations is hugely space consuming. I decided in the end to store the setups in 32bit DWORD 'shaders' [I'm only using 27 bits atm and I've got almost every effect I could want]. Some [rather large ~500 Bytes] code then generates the pipeline setup [in a D3D stateblock], the correct FVF and vertex stride, and sets up a table enabling my 3D geometry generator [GEC - 'Geometry Encapsulation Class'] to convert its native vertex format (which contains all vertex components of D3D FVFs plus a few extras to help with the mesh generation) quickly into the correct [and most efficient] D3D FVF format for whatever components that particular 'shader' uses.

I can currently select/change [per 'shader']:

Alpha Blending [16 combinations, plus switching it on and off].
Multiple textures with static or hardware generated 2D or 3D coordinates.
Indexed Vertex Blending [4 bones - is always done in Software Vertex processing mode since GeForces can't do this in hardware - all other 'shaders' are done in hardware].
Various Texture Transforms.
D3D specular or diffuse lighting or static per-vertex colours.
Alpha testing of a few types [gt/lt 0,128,255].
Ztesting and Zwriting.
Whether clipping is wanted or not.
A wide range of multitexture and texture->diffuse blend modes. [Including some useful DOT3 stuff].
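To give a feel for packing a pipeline setup into one DWORD, here's a cut-down sketch covering a handful of the options above. The bit layout is entirely made up for the example [the real one uses 27 bits and isn't documented here].

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t Shader;

// Hypothetical bit layout, covering only a few of the listed options.
const Shader kAlphaEnable = 1u << 0;
const int    kBlendShift  = 1;                  // 4 bits: 16 blend combinations
const Shader kBlendMask   = 0xFu << kBlendShift;
const Shader kZTest       = 1u << 5;
const Shader kZWrite      = 1u << 6;
const Shader kClipping    = 1u << 7;

// Packs the chosen options into a single 32 bit value.
inline Shader make_shader(bool alpha, unsigned blend,
                          bool ztest, bool zwrite, bool clip) {
    assert(blend < 16);
    return (alpha  ? kAlphaEnable : 0u)
         | (static_cast<Shader>(blend) << kBlendShift)
         | (ztest  ? kZTest    : 0u)
         | (zwrite ? kZWrite   : 0u)
         | (clip   ? kClipping : 0u);
}

// Decoders the setup code would use when building the state block.
inline unsigned shader_blend(Shader s) { return (s & kBlendMask) >> kBlendShift; }
inline bool shader_has(Shader s, Shader flag) { return (s & flag) != 0; }
```

Thirty setups then cost thirty DWORDs of data plus one decoder, rather than thirty hand written lists of state changes.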

Floating Point Optimisation in MSVC. Deh. For historical reasons [mainly good old Watcom C/C++ 10.6] I was used to a) Making sure I put (float) casts all over the place in floating point expressions, and b) making sure I defined all floating point constants like this: 10.0f instead of this: 10.0

Well - don't (I bet most of you don't anyway, but still :)). MSVC 6 won't optimise across (float) casts, and the 10.0f counts as a cast. If it hits a cast it dumps all fp variables back to memory before starting the next bit of the expression. I always wondered why my compiled floating point code looked so bad when I disassembled it. It's still not great though, the optimiser is not good at keeping values on the FPU stack between expressions.

Other annoying MSVC problems I'm almost too annoyed with to talk about. Volatile variables - they make the optimiser write correct but seriously weird code. Function pointers - in 'optimise for size' builds, even if you explicitly tell the compiler to make a series of function calls through a pointer, more often than not it optimises them back out to explicit calls again! Force it by loading the function pointer from some inline asm :) Word/Byte loads and compares - most of the time the code the compiler produces is fine, but on occasion, especially in places I can imagine are difficult to optimise, I've seen it output correct but huge/horrible code. Redundant writes etc.

All I can suggest is that you do what I do and disassemble your code every couple of builds and see exactly how your C/C++ [or whatever] gets converted. Anyway - more programming to do now, got a hardware accelerated Texture generator to finish [and studying - just started a degree and I'm behind in my work already :) ].


Oh - don't compile 64k intros with Intel's compiler - the code will NOT be smaller than MSVC's (when you've got the project settings correct) :)

  • 04/01/2002 - Tech File Update
  • 03/08/2001 - Tech File Update
  • 02/11/2000 - Tech File Update
  • 01/06/2000 - Adsy Loves Microsoft...
  • 10/27/1999 - Gone For A Burton - Or Not - Maybe.
  • 09/14/1999 - Update 3
  • 08/23/1999 - Update 2
  • 07/23/1999 - In the Beginning

  • This document may not be reproduced in any way without explicit permission from the author and flipCode. All Rights Reserved. Best viewed at a high resolution. The views expressed in this document are the views of the author and NOT necessarily of anyone else associated with flipCode.
