
Submitted by Lynn Duke, posted on February 25, 2005

Image Description, by Lynn Duke

This is a screenshot of XVP skinning and lighting 160 skin meshes in hardware via a vs 2.0 Cg vertex shader. Here are the specs:

1. Each mesh has 36 bones.
2. Each mesh has a little over 2100 faces.
3. No more than 5 bone influences per vertex.
4. Each mesh is playing its own 45-second skeletal animation (performed by the CPU).
5. Each mesh is lit, per-vertex, by a blue point light.

Right now the screenshot shows 30 FPS at 160 meshes, but I had to put it in windowed mode to snap the screenshot (ALT-PRINTSCREEN). In full-screen mode the code can actually get closer to 250 skin meshes on the screen before dropping to the 30 FPS mark. THAT'S what I wanted to take a screenshot of, but this will have to do.

Actually, I'd be very interested in knowing performance stats on other people's hardware. If you have the time, you can download the demo along with a whole slew of other XVP demos at There are other demos there like cel shading, per-pixel lighting, cubic environment mapping, animation blending, etc...

Thanks for the viewing time!!



Message Center / Reader Comments:
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

February 25, 2005, 02:19 AM

Nice =)

What kind of hardware did you use to get these figures?

I downloaded your demo and it seems you're missing a few DLLs, cg.dll and cgd3d9.dll to be specific.

/ Anders


February 25, 2005, 02:24 AM

Nice image you have there. I've tried to run your demo, but it doesn't work for me.

The first problem was that I was missing several DLL files (cg.dll, etc.).
I fixed this by downloading and installing the NVidia Cg compiler.

Now your demo runs... but I only get a white screen (lucky that it's not blue, eh?) and the Skin Mesh count is zero. Maybe it fails because I don't own an NVidia graphics card? Mine is an ATI Radeon 9800 Pro, so it should support vertex shaders 2.0.


February 25, 2005, 02:35 AM

Got the Cg SDK, now it works... I get 13 FPS with 250 characters on screen with my relatively "low end" 6800 LE. Still pretty decent though.

/ Anders

Aras Pranckevicius

February 25, 2005, 02:48 AM

Hmmm... I'm not sure, but I think about a year ago I was getting ~500 skinned meshes with a similar vertex count before it dropped to 30 FPS, on a Radeon 9800 Pro. I need to double-check, of course.

Things to watch out for when skinning on hardware: optimize meshes for the vertex cache; feed 4x3 matrices (there's no need for 4x4 with one row wasted); and experiment with whether it's better to skin the vertex with each bone and blend the results by weight, or to construct a blended matrix and transform just once with it (the result is equivalent, but you may win or lose some instructions depending on how many weights and how much data you have).

Still, your stuff is pretty nice.
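The two skinning options mentioned above can be sketched on the CPU to show that they give the same position. This is only a minimal illustration in C++, not the XVP shader or anyone's actual code: the types `Vec3` and `Mat4x3` and both function names are hypothetical, and it handles positions only (no normals). It also shows the 4x3 layout: the fourth row of an affine bone matrix is always (0, 0, 0, 1), so a 36-bone palette needs 108 four-float rows instead of 144.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration; not taken from XVP or any SDK.
struct Vec3 { float x, y, z; };

// 4x3 bone matrix stored as three rows of four floats. The fourth row of the
// full 4x4 matrix is always (0, 0, 0, 1) for an affine transform, so it is omitted.
struct Mat4x3 { float m[3][4]; };

// Transform a point (implicit w = 1) by a 4x3 matrix: three 4-component dot products.
Vec3 transformPoint(const Mat4x3& b, const Vec3& p) {
    return Vec3{
        b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
        b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
        b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3]};
}

// Option A: transform the vertex by every influencing bone, then blend the results.
Vec3 skinBlendPositions(const Mat4x3* palette, const int* boneIndex,
                        const float* weight, std::size_t influences, const Vec3& p) {
    assert(palette && boneIndex && weight);
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < influences; ++i) {
        const Vec3 t = transformPoint(palette[boneIndex[i]], p);
        out.x += weight[i] * t.x;
        out.y += weight[i] * t.y;
        out.z += weight[i] * t.z;
    }
    return out;
}

// Option B: blend the bone matrices first, then transform the vertex once.
Vec3 skinBlendMatrix(const Mat4x3* palette, const int* boneIndex,
                     const float* weight, std::size_t influences, const Vec3& p) {
    assert(palette && boneIndex && weight);
    Mat4x3 blended{};  // value-initialized to all zeros
    for (std::size_t i = 0; i < influences; ++i) {
        const Mat4x3& b = palette[boneIndex[i]];
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 4; ++c) {
                blended.m[r][c] += weight[i] * b.m[r][c];
            }
        }
    }
    return transformPoint(blended, p);
}
```

Both functions compute the same weighted sum because the transform is linear in the matrix entries; which one costs fewer shader instructions depends on the number of influences and on what else (normals, tangents) has to be transformed.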


February 25, 2005, 04:14 AM

Nice work.
Will give it a try when I get home.

PS. Make them say "Mr Andersson" once in a while ;)


February 25, 2005, 07:34 AM

Would be nice if you included cg.dll, can't be arsed to download the nv sdk.


February 25, 2005, 10:19 AM

Wow. Thanks for giving the demo a run. I appreciate that. I totally forgot to include the Cg DLLs. That means the other shader demos I have on the site will have the same issue. I'll fix that tonight.

Dolphin - Press the space bar to add instances of the skin mesh when you get that "white screen". You will see something no matter what because it has a software fallback mechanism for cards that don't support vs 2.0. If it fell back, they won't be lit with a blue light.

Aras Pranckevicius - Great advice. Right now I'm passing the whole 4x4 matrix. Also, you're right about the Vertex Cache. This model is COMPLETELY unoptimized. In fact, it's a triangle list! I'm going to re-post later with an optimized model and a re-written shader and we'll see what I can get then. When you say "optimized", are you referring to tri-strips???

Hardware is a 6800 Ultra, so let me go back to the lab with Aras Pranckevicius' suggestions.

Thanks again.


February 25, 2005, 10:50 AM

Thanks for your reply!

When I press Space 250 times (can you change that?) .... I get 250 blueish characters on the screen now :)
My test results are 13 FPS with 250 characters on my Radeon 9800 Pro. My processor is an AMD64 running at 2.1 GHz, with 1 GB RAM.

Looking forward to your "optimized" version. :)

Aras Pranckevicius

February 25, 2005, 10:53 AM

For modern cards there's basically no difference between indexed triangle lists and indexed triangle strips. By "optimized" I meant "optimized for the post-vertex-shader cache": to use that, you HAVE to use indexed primitives (no matter whether that's lists or strips), and you also have to order the indices so that they use the cache efficiently. On D3D, there's functionality in D3DX that does this, and there are lots of other vertex cache optimizers floating around.
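One way to see what index ordering buys you is to simulate the cache and count misses. The sketch below is a rough illustration only: it assumes a simple FIFO cache of a caller-chosen size, whereas real cache sizes and replacement policies vary by GPU and are not stated in this thread. The function name `averageCacheMissRatio` is made up for the example. It returns vertex-shader runs per triangle, so lower is better: 3.0 means no reuse at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO post-vertex-shader cache over an indexed triangle list and
// returns the average number of cache misses (vertex shader runs) per triangle.
// ASSUMPTION: FIFO replacement with `cacheSize` entries; real hardware differs.
double averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0);
    if (indices.empty()) {
        return 0.0;
    }

    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;

    for (std::uint32_t index : indices) {
        const bool hit = std::find(cache.begin(), cache.end(), index) != cache.end();
        if (!hit) {
            ++misses;                 // vertex must be transformed again
            cache.push_back(index);
            if (cache.size() > cacheSize) {
                cache.pop_front();    // evict the oldest transformed vertex
            }
        }
    }

    const std::size_t triangleCount = indices.size() / 3;
    return static_cast<double>(misses) / static_cast<double>(triangleCount);
}
```

Running this on a mesh's index buffer before and after an optimizer pass (for example `ID3DXMesh::OptimizeInplace` with `D3DXMESHOPT_VERTEXCACHE`) gives a quick, hardware-independent estimate of how much vertex work the reordering saves.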

Aras Pranckevicius

February 25, 2005, 11:01 AM

Yeah, you seem to lose some efficiency somewhere.

I've just tested on my own stuff: I get 30 FPS with 350 skinned meshes, each with 6365 triangles (and all per-pixel lit with a normal map, but that's irrelevant :)), each playing its own animation on 55 bones.

On a GeForce 6800GT, so your Ultra should be somewhat faster (and your meshes are smaller).
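To put the two sets of figures on the same scale, the numbers quoted in this thread can be converted to triangles per second. This is back-of-the-envelope arithmetic only: the inputs (roughly 250 meshes of about 2100 faces at 30 FPS versus 350 meshes of 6365 triangles at 30 FPS) come from the posts above, the first is approximate, and the two runs used different GPUs and different shaders. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Triangles submitted per second for `meshes` instances of a mesh with
// `trianglesPerMesh` triangles, rendered at `framesPerSecond`.
std::uint64_t trianglesPerSecond(std::uint64_t meshes,
                                 std::uint64_t trianglesPerMesh,
                                 std::uint64_t framesPerSecond) {
    return meshes * trianglesPerMesh * framesPerSecond;
}

// How many times larger throughput `a` is than throughput `b`.
double throughputRatio(std::uint64_t a, std::uint64_t b) {
    assert(b > 0);
    return static_cast<double>(a) / static_cast<double>(b);
}
```

With the thread's figures, `trianglesPerSecond(250, 2100, 30)` is 15,750,000 and `trianglesPerSecond(350, 6365, 30)` is 66,832,500, a ratio of roughly 4.2. That gap is what suggests efficiency is being lost somewhere.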


February 25, 2005, 11:02 AM

DLLs are included now. Also, Aras Pranckevicius, when you got those ~500, how many bones did they have? Were you lighting and animating all of them?

I basically emulated a demo that ships with DirectX 9.0c. In it they animate, light, and skin 'tiny', who has 35 bones, and then let you add instances of her. I used that app as a benchmark. On my machine I could get about 60 meshes ahead of them at similar frame rates.

Just curious...later...


February 25, 2005, 12:02 PM

Hmmm...are yours animated? That's a big one.

The only other difference I see is the optimized mesh. I've run a profiling session and the actual render call comes out on top.

I'm using indexed primitives, so I think I'll look into optimizing the mesh first and see if that gets me closer to your stats. Thanks....


February 25, 2005, 12:04 PM

Cool. I put a 300 mesh limit on it.


February 25, 2005, 02:15 PM

FYI....I just found something major. My index buffer and vertex buffer were declared with default management and a usage of 0. The DirectX debug output said I should switch my usage to USAGE_WRITEONLY (the D3DUSAGE_WRITEONLY flag).

I did and here are my new stats!!

382 meshes at 30 FPS
460 meshes at 20 FPS

That's a huge improvement...however I still need to make the improvements you suggest.

Thanks for your feedback.


Aras Pranckevicius

February 25, 2005, 11:50 PM


If all the meshes are playing their own separate animations, then at some point you can become CPU limited. Since you're using an NVIDIA card, I recommend NVPerfHUD to see whether the CPU or the GPU is the one sitting idle.

Yes, my meshes were also animated. It would be silly to compare animated vs static meshes and decide which ones are faster :)

This thread contains 15 messages.