This update is a 'little' later than expected, mainly due to the explosion of my modem [KABOOM!]. Anyway, up to now I haven't been able to read any emails [if anyone actually sent me any...].
16-bits too many
My 3D engine has a 32bit zbuffer and 32bpp output. The 32bit zbuffer could be a mistake but I don't think 16bit zbuffers are accurate enough. The 32bit zbuffer reduces the speed of my rasteriser by 40%! [When zfilling world polygons].
My 3D engine now runs in 32bpp. This allows very nice looking graphics [I'll upload a sample soon], [the 555/565 problem is now dead!] but creates more than a few problems. The highest resolution I'm expecting the game to run in is 640x480x32 on a 300MHz+ cpu. Currently the engine can do 30fps easily on my computer at this resolution if [and this is a big and complex if] I draw each frame to video memory and run a flip each frame. At the moment this is good and causes no problems. The problems will begin to surface when I need to draw any sort of transparency...
As anyone who has tried will know, the video bandwidth is crap when reading from video memory. I recently purchased Unreal [nice 3D engine, shame about the game]. The appearance of one little corona on screen halves the framerate. I was planning to use a lot of transparency effects in my game and this could mean a pathetic frame rate [When Unreal has to draw a fullscreen transparency i.e. 640x480 it gets about 2fps].
What can I do?
I could put my back surface in system memory [there is no point triple buffering when surfaces are in system memory as the cpu has to copy the surface to video memory] - transparencies to system memory in 32bpp are very fast, you can use the mmx packed addition and subtraction instructions to do 4 to 5 cycles per pixel. The problem is that the base frame rate is very slow compared with the flip. [On my computer - Celeron 333MHz with an ATI Rage Pro - the fastest framerate I can get in 640x480x32 just clearing and copying the screen is 45fps from system memory and 126fps from video memory].
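For illustration, here is roughly what an additive transparency does per pixel, written as plain C rather than MMX [a sketch only, not the engine's code - the names `add_saturate_argb` and `add_blend_span` are made up for this example]. The MMX `paddusb` instruction does the same clamped per-byte addition on eight bytes, i.e. two 32bpp pixels, at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two packed 32bpp pixels channel by channel, clamping each
   8-bit channel at 255 (scalar equivalent of MMX paddusb). */
uint32_t add_saturate_argb(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((dst >> shift) & 0xFFu) + ((src >> shift) & 0xFFu);
        if (sum > 255u)
            sum = 255u;
        out |= sum << shift;
    }
    return out;
}

/* Blend a span of source pixels onto a system-memory back buffer. */
void add_blend_span(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = add_saturate_argb(dst[i], src[i]);
}
```

Note that the blend has to read the destination pixel, which is exactly why it wants the back buffer in system memory.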
Putting the surface in system memory could work well if there was a way to get some slave system in the computer to copy the surface to video ram for you [My card cannot do this].
There is another solution I'm considering - it's sort of weird, which is why I haven't implemented it yet. I would allocate a surface in system memory and a surface in video memory. Normally the engine just draws to the video memory and uses a flip. When a transparency is on screen, any scanlines it intersects are drawn to system memory instead of video memory and processed there. At the end of the frame the system memory lines are copied to video memory and a flip is carried out.
This method could be good - with no transparencies on screen we get a very fast framerate and with lots of transparencies on screen we don't drop to the 2fps that Unreal gets. With a few transparencies on screen we get the most benefit [the coronas won't halve the framerate anymore].
There is a huge problem with the above method - you need to know the position of all transparencies on the screen before you start drawing a frame - this would mean a serious redesign of my 3D engine. I'm thinking about a different implementation of the above system that would use bounding rectangles instead of entire scanlines but would solve the problem.
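The scanline routing idea can be sketched in C as a per-line flag table [an illustration under assumptions, not a finished design: `LineRouter` and all the function names are hypothetical, and plain `memcpy` stands in for whatever copy routine is fastest]. Lines touched by a transparency are marked before drawing, the rasteriser asks where each line lives, and only the marked lines are copied across before the flip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_LINES = 480 };

typedef struct {
    int     height;                /* number of scanlines in use */
    uint8_t in_sysmem[MAX_LINES];  /* 1 = line is drawn to system memory */
} LineRouter;

void router_begin_frame(LineRouter *r, int height)
{
    assert(height > 0 && height <= MAX_LINES);
    r->height = height;
    memset(r->in_sysmem, 0, sizeof r->in_sysmem);
}

/* Mark every scanline touched by a transparency covering rows y0..y1. */
void router_mark(LineRouter *r, int y0, int y1)
{
    if (y0 < 0) y0 = 0;
    if (y1 >= r->height) y1 = r->height - 1;
    for (int y = y0; y <= y1; y++)
        r->in_sysmem[y] = 1;
}

/* Where should the rasteriser write scanline y? */
uint8_t *router_line(const LineRouter *r, int y,
                     uint8_t *vid, size_t vid_pitch,
                     uint8_t *sys, size_t sys_pitch)
{
    return r->in_sysmem[y] ? sys + (size_t)y * sys_pitch
                           : vid + (size_t)y * vid_pitch;
}

/* End of frame: copy the processed lines to video memory before the flip.
   Returns the number of lines copied. */
int router_flush(const LineRouter *r,
                 uint8_t *vid, size_t vid_pitch,
                 const uint8_t *sys, size_t sys_pitch, size_t row_bytes)
{
    int copied = 0;
    for (int y = 0; y < r->height; y++) {
        if (!r->in_sysmem[y])
            continue;
        memcpy(vid + (size_t)y * vid_pitch,
               sys + (size_t)y * sys_pitch, row_bytes);
        copied++;
    }
    return copied;
}
```

The sketch makes the catch visible: `router_mark` has to be called for every transparency before the first opaque polygon of the frame is drawn.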
How I'm doing the lighting
My perspective texture mapper [subdividing affine with a dithered tiled inner loop] gets pixels from palettised textures, references the palette and dumps 'ARGB' 4 byte entries into a table that is aligned on a 4096 byte boundary. [This should enable the whole table to remain in the cache during the postprocessing even on a P166MMX, as long as I'm careful where I put any temporary variables I use so they don't knock out cache lines I need - 4 cache lines [2 in non mmx ?] can be referencing the same 'address&4095' at a time]. I use paletted textures for their cache performance [a quarter of the data to read compared with 32bpp textures]. Texture mapper inner loops are normally quite slow; a loop that looks as if it should take 4 or 5 ticks to run probably takes 18 or 19. For this reason I think that running a 6 or 7 tick per pixel postprocessing loop shouldn't harm performance greatly [i.e. it ain't gonna halve the frame rate]. Anyway, the second pass takes the 'ARGB' entries for a single textured span and lights them - it's a multiplicative blend with a Gouraud shaded triangle - it then drops the lit entries to the screen. If there is any fogging required then another fogging pass can be run on the 32bpp entries.
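The two passes can be sketched in plain C like this [a simplified illustration, not the real inner loops: `expand_span` and `light_span` are made-up names, the light is a single white intensity rather than a colour, and it is carried in 16.16 fixed point where 256 means full brightness - all assumptions of this example].

```c
#include <assert.h>
#include <stdint.h>

/* Pass 1: look up each palettised texel and store it as an ARGB entry. */
void expand_span(const uint8_t *texels, const uint32_t *palette,
                 uint32_t *argb, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        argb[i] = palette[texels[i]];
}

/* Pass 2: multiply each entry by a light value interpolated along the span.
   light and light_step are 16.16 fixed point; the integer part must stay
   in the range 0..256, where 256 leaves the colour unchanged. */
void light_span(const uint32_t *argb, uint32_t *dest, int count,
                int32_t light, int32_t light_step)
{
    for (int i = 0; i < count; i++) {
        uint32_t l = (uint32_t)(light >> 16);
        uint32_t c = argb[i];
        uint32_t r = (((c >> 16) & 0xFFu) * l) >> 8;
        uint32_t g = (((c >> 8) & 0xFFu) * l) >> 8;
        uint32_t b = ((c & 0xFFu) * l) >> 8;
        dest[i] = (c & 0xFF000000u) | (r << 16) | (g << 8) | b;
        light += light_step;
    }
}
```

Keeping the ARGB table small and reusing it for every span is what lets the second pass run out of the cache.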
Why the hell would he use interpolated shading?
Good question :).
The outdoor/terrain engine is the most suited to interpolated shading because it has to display so many texels each frame. If I wasn't using interpolated shading I would either have to use a surface cache or the dynamic lightmap blending that Unreal used. A surface cache would require huge amounts of ram and would make the dynamic outdoor lighting I want to do impossible. The Unreal system has a problem because you can't run bilinear interpolation fast enough in real time, so you have to dither the light map onto the texture. Because of this, low resolution lightmaps look very blocky [high res maps would lead to the surface cache problem again].
If you have Unreal and want to see a good example, go to the first outdoor section in the game and use the 'Ghost' cheat. Fly out above the big lake as high as you can without going out of the top of the level and then look at the cliff walls. Although the lightmap systems work very well for games like Unreal and Quake, they wouldn't work well at all in my game.
The Terrain Engine
The terrain is drawn with high resolution textures on a 256x256 grid landscape, with dynamic LOD that enables large amounts of terrain to be drawn in a software 3D engine. The LOD will group terrain polygons into blocks of 2x2 in the middle distance and 4x4 in the fogged far distance. I want volumetric fog regions on the terrain map [so I could make fog roll in off the sea or foggy smoke from big explosions].
There are lots of other things I'm planning as well [I'll go into these at a later date] but the main reason for them is the dynamic weather system [which I've designed but am yet to implement - mainly because of its complexity - I'm the only coder working on this project and there seems to be a hell of a lot to do apart from 3D engine coding]. I'm going to have a simple atmospheric model that works on a 32 by 32 block grid. Each grid reference can have a different weather status: cloud/fog/rain/snow/wind etc. The weather status is not stored as cloud/fog/rain etc. - this would be way too simple for what I want - I am running a physical model of the atmosphere inside the game [it's a very simple model but who's going to notice! [Not bad for saying I've only just finished my A levels]].
The model gets a set of starting conditions when a level is loaded. During play the atmosphere is 'simulated' and a hint of randomness is added [There are not enough games around that add random elements - a little randomness makes games look real].
As well as the atmospherics, the 256x256 grid is lit by a simulated sun [or suns]. This is done via a raytracing process. Raytracing! There is a little trick that lots and lots of coders do not know and never exploit, and that is the art of spreading complex calculations over several frames. Surface caches are a simple example of this, raytracing is a more complex example. The 'day' in my game is quite long [I'm planning it at about 10 to 15 minutes]. The sun therefore moves across the sky very slowly and that means that I only need to process very small parts of the 256x256 grid each frame. I should be able to get effects such as reflections from bodies of water etc.
At the moment I'm using an untextured Gouraud shaded sky box - if I don't texture it then I'll make it a sky sphere [so banding is less obvious]. If I leave the sky untextured then I'll use flat shaded untextured 3D clouds; if I texture the sky then I won't use the 3D clouds.
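As a small sketch of the grouping [the function names and the idea of two distance thresholds are assumptions made for this example, the post doesn't say how the distances are chosen], the block size can be picked from the distance and a grid cell snapped to the corner of its block with a mask:

```c
#include <assert.h>

/* Pick the terrain block size (grid cells per side) for a given distance:
   1x1 up close, 2x2 in the middle distance, 4x4 in the far distance. */
int lod_block_size(float distance, float mid_start, float far_start)
{
    assert(mid_start <= far_start);
    if (distance >= far_start)
        return 4;
    if (distance >= mid_start)
        return 2;
    return 1;
}

/* Snap a grid coordinate to the origin of the block that contains it.
   block_size must be a power of two. */
int lod_block_origin(int cell, int block_size)
{
    assert(block_size > 0 && (block_size & (block_size - 1)) == 0);
    return cell & ~(block_size - 1);
}
```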
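The trick of spreading work over frames can be sketched as a sweep with a fixed per-frame budget [a minimal illustration: `GridSweep`, `grid_sweep_step` and `count_cell` are hypothetical names, and the per-cell job here only counts visits where a real one would relight the cell].

```c
#include <assert.h>
#include <stdint.h>

enum { GRID_SIZE = 256, GRID_CELLS = GRID_SIZE * GRID_SIZE };

typedef void (*CellFn)(int x, int y, void *user);

typedef struct {
    uint32_t cursor;  /* next cell to process, 0..GRID_CELLS-1 */
} GridSweep;

/* Process `budget` cells this frame and remember where we stopped.
   Returns 1 if the sweep wrapped round to the start of the grid. */
int grid_sweep_step(GridSweep *s, uint32_t budget, CellFn fn, void *user)
{
    int wrapped = 0;
    assert(budget <= GRID_CELLS);
    for (uint32_t i = 0; i < budget; i++) {
        fn((int)(s->cursor % GRID_SIZE), (int)(s->cursor / GRID_SIZE), user);
        if (++s->cursor == GRID_CELLS) {
            s->cursor = 0;
            wrapped = 1;
        }
    }
    return wrapped;
}

/* Example per-cell job: just counts how many cells were visited. */
void count_cell(int x, int y, void *user)
{
    (void)x;
    (void)y;
    ++*(uint32_t *)user;
}
```

To put numbers on it: the grid has 65536 cells, so a budget of only 4 cells per frame covers the whole thing in 16384 frames, which is about nine minutes at 30fps.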
Points about the Indoor Engine
1)'Hard' Shadows / Lighting
I quite like hard shadows and up till very recently they have not been done dynamically in real time. Unreal didn't have a surface cache as far as I know but used mmx to blend lightmaps onto textures in screen space. I could take this approach but then I'd be left with the slow dynamic lighting problems I'm trying to avoid. Unreal's software rasteriser wasn't all that fast anyway, especially on say a P200.
Hard shadows are very easy [though not fast] to do; all we need to do is intersect and clip a portal/sector dataset with light frustums. The portal/sector set does not change, you just have to have all the sector surfaces keep lists of clipped polygons. This could be very very slow on Quake/Quake2/Unreal/Half Life complexity levels but I won't have that problem since the levels in my game are going to be very simple.
The best way to design a 3D engine is to compromise its features with the features of the game you are designing it for [If you just go out to write a '3d engine' with no purpose then you will not get anywhere and will become bogged down in overblown features]. The game I'm working on does not require hugely complex architecture [I've decided to trade level complexity for faster dynamic lighting], it's very fast paced and more importantly it's multiplayer [even with no net connection...] i.e. there is no time to stop and look at the scenery - if it wasn't like this there would be no way I'd be using a portal based engine, it would run too slowly - there is no point me trying to compete directly with Quake or Unreal or Half Life, I'm coding the whole thing by myself - it would be impossible.
I still haven't decided what to do when the light volumes have to be intersected with models - because the 3D engine is based on portals and convex sectors, clipping frustums to it is simple as convex frustums are always produced. Models and 'things' [as they are called by Crystal Space etc.] are a problem.
2)High Resolution Textures
Surface cache games always have very low resolution textures to try and keep the size of the surface cache lower and to make sure that tiled textures are always in the cache while they are tiled into the surface cache - we don't have a surface cache so we can have very high res textures.
3)Coloured Lighting in Software
Not many software rasterising games can do true fogging and lighting in software in colour [Unreal engine based games can but slowly].
4)Fast [almost free] Dynamic Lighting
Explosions e.t.c. can light up rooms with no drop in frame rate since nothing extra will be drawn. No extra memory is used if lots of polys are on the screen compared to one poly on the screen.
It was playing Goldeneye that finally convinced me to go the way of vertex shading/hard shadows instead of soft shadows and light maps. Lots of people argue that light maps produce more realistic looking graphics than flat shaded/vertex shaded polygons. I know that Goldeneye runs with hardware acceleration with bi/trilinear mapping - that though is irrelevant. Anyone that has access to this game, just go and look at the guard towers in 'Dam' or the houses in 'Streets'. There is no way I could achieve effects like this with light maps in software, I can [just about] using vertex shading.
6)Low Quality Lighting
Never try to push a 3D engine into doing something it wasn't designed for. Interpolated lighting will not work well for small dynamic lights. I am therefore not going to light the level via interpolated lighting for small dynamic lights - they will light the level via projective lighting instead.
7)Specular surfaces and Bump mapping
It is possible! I plan to do this using palette modification. [Which is another reason why I'm using paletted textures - if for example I want to do an underwater effect where all the textures go blue then all I have to do is modify the palettes...]. The bump mapping is not true bump mapping but apart from other coders, who the hell is going to notice?
I'm using DirectInput to get mouse and keyboard input and feed data into my application's internal event queue [not the windows event queue]. It's very important to scale input values to make them frame rate independent and to take note of the time that DirectInput said the input began. This is so that if your game drops to a low frame rate, inputs do not get distorted.
I converted all my assembler code from Wasm [Watcom 10.6's assembler] to Nasm format. I can now use mmx instructions in my assembly code. Nasm's syntax is slightly different to Wasm/Masm/Tasm but I'd recommend it to anyone using any other x86 assembler, it's very fast and efficient and it never 'goes behind your back'.
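The underwater example mentioned above comes down to rewriting 256 palette entries instead of touching any texels. A rough C sketch [the name `tint_palette` and the 0..256 scale convention are assumptions of this example, and it only covers the tint, not the specular or bump effects]:

```c
#include <assert.h>
#include <stdint.h>

/* Build a tinted copy of an ARGB palette. Each scale is 0..256,
   where 256 leaves that channel unchanged. */
void tint_palette(const uint32_t *src, uint32_t *dst, int count,
                  uint32_t r_scale, uint32_t g_scale, uint32_t b_scale)
{
    assert(r_scale <= 256 && g_scale <= 256 && b_scale <= 256);
    for (int i = 0; i < count; i++) {
        uint32_t c = src[i];
        uint32_t r = (((c >> 16) & 0xFFu) * r_scale) >> 8;
        uint32_t g = (((c >> 8) & 0xFFu) * g_scale) >> 8;
        uint32_t b = ((c & 0xFFu) * b_scale) >> 8;
        dst[i] = (c & 0xFF000000u) | (r << 16) | (g << 8) | b;
    }
}
```

Calling it with something like `tint_palette(pal, wet_pal, 256, 64, 128, 256)` keeps the blue and knocks the red and green down, so every texture using that palette goes blue at once.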
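One way to use the event timestamps is to work out how long a key was actually down inside the frame and scale movement by that, not by the frame count. A sketch in C [the names `held_ms` and `move_amount` are invented for this example, times are assumed to be milliseconds on one clock, and timer wraparound is ignored]:

```c
#include <assert.h>
#include <stdint.h>

/* How many milliseconds of the frame [frame_start, frame_end) a key was
   held, given the timestamps of its press and release events.
   Pass release_ms = frame_end if the key is still down. */
uint32_t held_ms(uint32_t press_ms, uint32_t release_ms,
                 uint32_t frame_start, uint32_t frame_end)
{
    assert(frame_start <= frame_end && press_ms <= release_ms);
    uint32_t from = press_ms > frame_start ? press_ms : frame_start;
    uint32_t to = release_ms < frame_end ? release_ms : frame_end;
    return to > from ? to - from : 0;
}

/* Distance moved this frame at units_per_sec, independent of frame rate. */
float move_amount(float units_per_sec, uint32_t held)
{
    return units_per_sec * (float)held / 1000.0f;
}
```

With this, a 10ms tap moves the player the same distance whether it lands in a 20ms frame or a 200ms one.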
My game engine is linked to and uses the Midas sound system dll. This is a very nice sound system that should enable the dynamic music I want to use via impulse tracker files. I've got Modplug as well as Impulse tracker but I think that Impulse tracker is better.
This is how I want to handle the main lighting in my portal engine. It should also enable me to use dynamic hard lights and have dynamic geometry [something that very few games have]. The hard lights / shadows will be dynamic for things such as opening doors e.t.c. [i.e. Not going to be from rockets or muzzle flashes].
I'm planning to handle small dynamic lights [head lights / search lights etc.] via texture projection - this will hurt performance but is not going to be used for explosions and such things. Larger dynamic lights are going to be handled via vertex shading.
)Screen copying routine and performance for system memory surfaces
Using an mmx screen copy is slightly faster than a blit, which is faster than a flip from system memory - as long as the primary screen pitch is the same as the back buffer pitch. If it isn't then use the blit instead of the mmx copy [no real performance difference].
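The pitch check looks something like this in C [a sketch only: `copy_surface` is a made-up name, `memcpy` stands in for the mmx copy loop, and the row-by-row branch stands in for handing the job to the blit].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a back buffer to another surface. Returns 1 if one linear copy
   was possible (equal pitches), 0 if it had to go row by row. */
int copy_surface(uint8_t *dst, size_t dst_pitch,
                 const uint8_t *src, size_t src_pitch,
                 size_t row_bytes, size_t rows)
{
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    if (rows == 0)
        return 1;
    if (dst_pitch == src_pitch) {
        /* Same layout: treat the surface as one block. The padding
           between rows is copied too, the last row's padding is not. */
        memcpy(dst, src, dst_pitch * (rows - 1) + row_bytes);
        return 1;
    }
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    return 0;
}
```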
)Fixed the 'dm.dll' problem
After changing the position of the controls on my application's opening dialog box the error vanished. More Win95 weirdness I think.
)Fixed Profiling problem
My previous installation of Watcom 10.6 half left a load of environment vars pointing to its directories in my autoexec.bat. Vcc had put its new defs BEFORE the Watcom ones [and they had the exact same names!]. When Vcc came to profile it found its default paths pointing into Watcom directories - result, nothing happened. Now it works I've found it very slow - it drops the frame rate of the 3D engine test I've written from 30fps to 1.4fps [ish].
I edit all my textures as 24bit pcx files and use a texture converter I have written to bring them into the game. It takes the pcx and mip maps it from whatever resolution it is down to 1x1. Since I use paletted textures in the game engine I then use an octree to reduce the colours in the mipmaps to a shared 256 colour palette. I recently added the octree colour reduction, it's very fast and makes a very good palette. I only use power of 2 sized textures so any calculations to do with texture size can be done with shifts instead of multiplies.
I got a web site - it's not got anything on it at the moment but in the future I will upload versions of my 3D test program. The address is http://www.bigwig.net/mooshtack_boorai.
Bought this in a sale. A very nice looking game - taking a cue from it I changed my texture mappers to dither the textures to the screen, which makes the graphics from my engine look very nice. As long as you don't go right up to textures they look as if they are bilinear texture mapped + it doesn't cost me any clock cycles! [which I like very much.].
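To show what the power of 2 restriction buys, here is a small C sketch [function names are made up for the example]: the row multiply becomes a shift, tiling becomes a mask, and the mip chain length falls straight out of the size.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of texel (u, v) in a texture 2^log2_w wide and 2^log2_h high.
   Coordinates wrap (tile) with a mask instead of a modulo. */
uint32_t texel_offset(uint32_t u, uint32_t v, unsigned log2_w, unsigned log2_h)
{
    assert(log2_w < 16 && log2_h < 16);
    uint32_t um = u & ((1u << log2_w) - 1u);
    uint32_t vm = v & ((1u << log2_h) - 1u);
    return (vm << log2_w) + um;
}

/* Number of mip levels from 2^log2_w x 2^log2_h down to 1x1. */
unsigned mip_levels(unsigned log2_w, unsigned log2_h)
{
    return (log2_w > log2_h ? log2_w : log2_h) + 1u;
}
```

So a 256x256 texture has 9 mip levels [256, 128, 64, 32, 16, 8, 4, 2, 1].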
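Texture dithering of this kind is usually done by nudging the texture coordinates with a tiny ordered pattern that depends on the screen position before they are truncated to a texel. A C sketch of the idea [the 2x2 offset values and the name `dithered_texel_offset` are assumptions chosen for illustration, not taken from any particular engine, and the texture is assumed square]:

```c
#include <assert.h>
#include <stdint.h>

/* 2x2 ordered offsets in 16.16 fixed point, indexed by [y & 1][x & 1].
   These particular values are illustrative assumptions. */
static const int32_t DITHER_U[2][2] = { { 0x4000, 0x8000 }, { 0xC000, 0x0000 } };
static const int32_t DITHER_V[2][2] = { { 0x0000, 0xC000 }, { 0x8000, 0x4000 } };

/* Texel offset for 16.16 coordinates (u, v) at screen pixel (x, y) in a
   square texture 2^log2_size texels wide. */
uint32_t dithered_texel_offset(int32_t u, int32_t v, int x, int y,
                               unsigned log2_size)
{
    assert(log2_size < 15);
    uint32_t mask = (1u << log2_size) - 1u;
    uint32_t tu = ((uint32_t)(u + DITHER_U[y & 1][x & 1]) >> 16) & mask;
    uint32_t tv = ((uint32_t)(v + DITHER_V[y & 1][x & 1]) >> 16) & mask;
    return (tv << log2_size) + tu;
}
```

Neighbouring pixels that would have shared one texel now pick between adjacent texels, which is what fakes the filtered look from a distance. In a real inner loop the offsets can be folded into the starting coordinates of each span so the per-pixel work stays the same.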
)Word of the week
- Adsy [#KliX#] -
The inspirations for my 3D/game engines:
Sonic 3 - Huge amounts of dynamic geometry, never been bettered
Goldeneye - A vertex shader [non dynamic] that looks better than any light mapper.
Black and White - Not out yet but the screen shots look beautiful.
The sky - Look at it for a while and you realise that mapping a cloud scene into an environment box is really not doing it justice.
Half life - Showed that gameplay is far far more important than graphics. Just remember the number of games that have used the Quake2 engine.
Civilisation 1 - I still play this, again proves that in depth game play is better than flashy graphics.
Magic Carpet - Dynamic landscapes and many clever visual effects.
FzeroX and Goldeneye [N64] - Tonnes of graphical glitches but still tonnes more fun than 99% of PC games.
Elite2 - I looked at this the other day, I didn't realise how good the graphics were until now. It even has curved surfaces!