
Submitted by Austin Appleby, posted on August 09, 2001

Image Description, by Austin Appleby

8,000 Butterflies at 100 frames per second - no assembly, no hardcoding, no hacks.

These are three shots from a test of the particle system in my experimental engine (currently named Pandora). The butterflies are completely dynamic and are independently animated - no precalculation. When the butterflies flap their wings they climb higher, and when they don't flap they glide down. The butterflies are all different sizes, and smaller butterflies are faster than larger ones. There's also a simple physics model (a force field) that corrals the butterflies into a donut-shape.
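To make the description above concrete, here is a minimal sketch of one way such an update step could look. It is not the Pandora code: the `Butterfly` struct, the constants, and the spring-style pull toward a ring are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal state for one butterfly (not the engine's real types).
struct Vec3 { float x, y, z; };

struct Butterfly {
    Vec3  pos;
    Vec3  vel;
    float size;      // larger butterflies move more slowly
    bool  flapping;  // true while the wings are beating
};

// Assumed tuning constants, chosen only for illustration.
const float kLift       = 2.0f;   // upward acceleration while flapping
const float kSink       = 1.0f;   // downward acceleration while gliding
const float kRingRadius = 10.0f;  // radius of the donut's center circle (XZ plane)
const float kSpring     = 0.5f;   // strength of the pull toward that circle

// Acceleration pulling a point toward the nearest point on the circle of
// radius kRingRadius in the y = 0 plane. On the circle the force is zero.
Vec3 ringForce(const Vec3& p) {
    float d = std::sqrt(p.x * p.x + p.z * p.z);
    if (d < 1e-6f) {
        // Exactly on the axis: pick the ring point along +x arbitrarily.
        return Vec3{kSpring * kRingRadius, -kSpring * p.y, 0.0f};
    }
    float scale = kRingRadius / d;  // nearest ring point is (p.x*scale, 0, p.z*scale)
    return Vec3{kSpring * (p.x * scale - p.x),
                kSpring * (0.0f - p.y),
                kSpring * (p.z * scale - p.z)};
}

// Advance one butterfly by dt seconds with simple Euler integration.
void update(Butterfly& b, float dt) {
    assert(b.size > 0.0f);
    Vec3 a = ringForce(b.pos);
    a.y += b.flapping ? kLift : -kSink;  // climb while flapping, sink while gliding
    b.vel.x += a.x * dt;
    b.vel.y += a.y * dt;
    b.vel.z += a.z * dt;
    float speedScale = 1.0f / b.size;    // smaller butterflies cover more ground
    b.pos.x += b.vel.x * speedScale * dt;
    b.pos.y += b.vel.y * speedScale * dt;
    b.pos.z += b.vel.z * speedScale * dt;
}
```

A butterfly sitting on the ring feels no corralling force, so its height changes only because of the flap/glide term. One that strays outward gets pulled back toward the ring.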

Each butterfly is a separate C++ object, and each uses virtual functions to implement its behavior (they derive from a CParticle base class). Before people complain about the performance penalties of virtual functions, I've extensively benchmarked the virtual function overhead here and it's practically negligible. The particles are moderately memory-inefficient, but that can be cleaned up quite a bit with a custom allocator. There are exactly 0 lines of assembly code used in the particle system code - the particles themselves are straight C++, and the vector math routines underneath them are hybrid C/C++.
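For readers who have not seen this pattern, the following is a rough sketch of a particle hierarchy driven by virtual functions. Only the `CParticle` name comes from the post. `CButterfly`, `Update`, `UpdateAll`, and the member layout are hypothetical, and the engine's real interface is not public.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Base class: each particle type overrides Update() with its own behavior.
class CParticle {
public:
    virtual ~CParticle() = default;
    virtual void Update(float dt) = 0;
    float y = 0.0f;  // height only, to keep the sketch short
};

// Hypothetical derived particle: climbs while flapping, sinks while gliding.
class CButterfly : public CParticle {
public:
    explicit CButterfly(bool flapping) : m_flapping(flapping) {}
    void Update(float dt) override {
        y += (m_flapping ? 1.0f : -0.5f) * dt;
    }
private:
    bool m_flapping;
};

// One virtual call per particle per frame; the loop does not need to know
// which concrete particle types it is updating.
void UpdateAll(std::vector<std::unique_ptr<CParticle>>& particles, float dt) {
    for (auto& p : particles) {
        assert(p);
        p->Update(dt);
    }
}
```

At 8,000 particles this is 8,000 indirect calls per frame, which is small next to the per-vertex work, and that fits the benchmark result reported above. Each particle here is a separate heap allocation, which is the kind of memory overhead a custom allocator could reduce.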

The geometry for each butterfly is essentially a square folded down the diagonal, and is built with 8 vertices and 4 triangles (4 verts and 2 triangles per side). The current version renders 8,000 butterflies at 95-100 frames a second on my development machine (a P4-1.7ghz + GeForce 3), or approximately 800,000 particles per second (3.2 million triangles per second). It doesn't use any GeForce 3 or Pentium 4 specific features (no vertex shaders, SSE, etc.) though it does use some NVidia-specific OpenGL extensions (mainly NV_vertex_array_range).
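The numbers above can be checked with a short sketch. The vertex layout below is an assumption: it reads "per side" as front face and back face, and puts the fold along the z-axis. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vertex { float x, y, z; };

const int kVertsPerButterfly = 8;  // 4 for the front faces, 4 for the back
const int kTrisPerButterfly  = 4;  // 2 front-facing, 2 back-facing

// Build a square folded along its diagonal. The diagonal runs along z, and
// the two wing tips are raised by foldAngle (radians); 0 means flat.
void buildButterfly(float foldAngle, Vertex verts[8], unsigned short indices[12]) {
    const float h = 0.5f;  // half of the diagonal length
    const Vertex a = {0.0f, 0.0f, -h};
    const Vertex c = {0.0f, 0.0f,  h};
    const Vertex b = { h * std::cos(foldAngle), h * std::sin(foldAngle), 0.0f};
    const Vertex d = {-h * std::cos(foldAngle), h * std::sin(foldAngle), 0.0f};
    const Vertex quad[4] = {a, b, c, d};
    for (int i = 0; i < 4; ++i) {
        verts[i]     = quad[i];  // front copy
        verts[i + 4] = quad[i];  // back copy (would carry flipped normals)
    }
    // Front triangles, then the same triangles with reversed winding.
    const unsigned short idx[12] = {0, 1, 2,  0, 2, 3,
                                    4, 6, 5,  4, 7, 6};
    for (int i = 0; i < 12; ++i) indices[i] = idx[i];
}

// Throughput arithmetic from the post.
long particlesPerSecond(long butterflies, long fps) {
    assert(butterflies >= 0 && fps >= 0);
    return butterflies * fps;
}

long trianglesPerSecond(long butterflies, long fps) {
    return particlesPerSecond(butterflies, fps) * kTrisPerButterfly;
}
```

With 8,000 butterflies at 100 frames per second this gives 800,000 particles and 3,200,000 triangles per second, matching the figures quoted above.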

I wrote this demo just to prove that you can get excellent performance out of a C++/OpenGL engine without any sort of hacking or assembly optimization - as long as you keep your code efficient and benchmark every one of your changes, you can usually avoid any performance bottlenecks. I probably won't be releasing the full source code to the demo (as that would require releasing huge chunks of my still-in-development engine) but I can write up a quick overview of the techniques I used if enough people are interested.

-Austin Appleby



Message Center / Reader Comments:
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

August 09, 2001, 06:36 PM

Well, it's a particle system with the particle decay done differently: rather than having the particles spray up, fall, die, and be replaced, he keeps the same 8,000 particles onscreen and implements a slightly more artistic particle flow. At its roots, it's the same as any other fountain particle demo, but it's done very slickly, with very nice coding. I'd still love to see how much more performance could be gotten out of it by implementing some assembly, if for nothing else than academic curiosity.

James Matthews

August 09, 2001, 06:49 PM

I'll skip the whole "assembly" and "performance" comments here, you've got enough of them :)

Nice pictures, and I'd really like to see a flocking algorithm implemented in there. With a system like that you could do something really cool like have a tree drenched with monarch butterflies. Smack the tree and the entire swarm takes off! It'd be a very cool effect, I imagine.

This sort of stuff probably looks 100x better when seen in motion. Great stuff, keep it up.

Hiro Protagonist

August 09, 2001, 06:51 PM

"Do they rise and fall because of the flapping, or is it just a "if (flapping) butterfly.y++" type of thing? "

Are you asking if he has successfully modelled air particles that react against the polygons on the butterflies' wings, forcing them skyward?


August 09, 2001, 06:53 PM

That's exactly what Intentional Programming is... It's currently being researched/developed at Microsoft. You specify your "intentions" or "assumptions", and the compiler optimizes with that. It's a lot more than that; I can send you a .doc if you are interested.


August 09, 2001, 07:01 PM

Thanks for the note! Please do send me this document...

Hiro Protagonist

August 09, 2001, 07:05 PM

"If you gave the chess player enough time (and pacience) he would always be able to win, except when the program has won before the match begins because it has stored all solutions in memory "

That is a ridiculous statement.


August 09, 2001, 07:09 PM

Which part? Could you please explain what you find ridiculous?

BTW, about storing all possible combinations, that's a joke of course...


August 09, 2001, 07:12 PM

You have a MAXED out system, just about the fastest PC possible today. You're drawing what, 32000 polys? And only 100 fps?

Small clue: That's not fast.


August 09, 2001, 07:14 PM

"I'll skip the whole "assembly" and "performance" comments here, you've got enough of them :)"

Sorry, I just get a little angry when somebody says "that assembly should be avoided today". I didn't mean to spam this board or anything...


August 09, 2001, 08:25 PM

You might in an odd way, be on to something here.

Keep in mind that modern-day compilers balance optimization against compile speed and executable size. Now, what if we removed the final two hindrances and gave a compiler A LOT more time to work on optimizations? Granted, "creative" optimizations would be lost (i.e., restructuring code to do something different than intended to gain speed). However, there is NO reason whatsoever why a computer can't do something like the above-mentioned SIMD optimization better than a human.

Really, it's just a matter of knowing the constraints and running the formulas against them. This is an area where computers excel. If we gave a compiler the chance to compile code hundreds or even thousands of different times, instead of just once, I'm sure it would come up with faster code. Now, add some elements of a learning system to the equation, and in no time, once again, our compilers would be out-optimizing most of us. Now, in terms of the technology we have today, I agree with you. However, what I have suggested is both possible and feasible.

Actually, now that I think about it, I'm shocked nobody has developed this??? Oh wait, never mind anything I said in this conversation, I'm on my way to the patent office! :)
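The "try many versions and keep the fastest" idea in this post can be sketched in miniature. This is only an illustration under assumptions: two hand-written variants of the same loop stand in for compiler-generated candidates, and `pickFastest` is a made-up helper, not part of any real compiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// A candidate is any function that computes the same result a different way.
using Kernel = long long (*)(const std::vector<int>&);

long long sumSimple(const std::vector<int>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Same sum, unrolled four ways with independent accumulators.
long long sumUnrolled4(const std::vector<int>& v) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0, n = v.size();
    for (; i + 4 <= n; i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}

// Time every candidate on the same data and return the index of the fastest.
std::size_t pickFastest(const std::vector<Kernel>& candidates,
                        const std::vector<int>& data, int repeats) {
    assert(!candidates.empty());
    std::size_t best = 0;
    auto bestTime = std::chrono::steady_clock::duration::max();
    volatile long long sink = 0;  // keeps the calls from being optimized away
    for (std::size_t k = 0; k < candidates.size(); ++k) {
        auto start = std::chrono::steady_clock::now();
        for (int r = 0; r < repeats; ++r) sink = sink + candidates[k](data);
        auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < bestTime) { bestTime = elapsed; best = k; }
    }
    return best;
}
```

Which candidate wins depends on the machine, the compiler flags, and the data size, so the measurement has to be taken on the target rather than predicted from rules.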



August 09, 2001, 08:29 PM

I really outta check my posts for spelling errors and grammatical mistakes, eh? :) Ignore the bad typos, ive been drinking... Just rest assured knowing, im Canadian, so Im better educated then those south of the border...

Sorry, cheap shot, I take that back :)



August 09, 2001, 08:34 PM

Hmmm, forgot that discussing Assembly caused huge flame wars to erupt :) Been a while...

While we are at it,

God doesn't exist!
Man never walked on the moon!

Just adding to the pot... time to stir, shake well, and serve!



August 09, 2001, 08:35 PM

I'm really impressed ... by your development machine :-)

About asm, just another opinion. Assembly coding was useful and nice in the days of the 486 and early Pentiums. I did a lot of asm coding then. It was a powerful and easy thing to do. But these days, as CPUs' internal complexity has grown a lot, asm coding has turned into a very hard task. Well-done code, that is. Everybody is able to write asm code. Almost nobody can do it the right way. The risk of writing asm now is that it's easier to spoil everything than to ensure some benefit. So I really think it's currently a waste of time to code in it. Basically, those who can benefit from asm coding aren't able to do other things, because there is not enough time to learn them.
I like asm, but now it's for fun.


Louis Howe

August 09, 2001, 09:13 PM

Very nice looking!

Just kidding! That really looks great! What's interesting about the screenshots is that it makes eight thousand butterflies look like many more. But I think twenty of them would be enough for me...

If you're going to use butterflies as particles (and not examine them up close for long periods of time), I wonder if it would be better to use two triangles instead of four, and have half as many vertices to process.

This would be a great screen saver!


August 09, 2001, 09:47 PM

great work! but where is the gun? when you blow them up, do butterfly guts fly everywhere? can I have your computer?


August 09, 2001, 10:04 PM

pretty interesting butterfly things...
but, i think if the butterflies had different wing colours, then the environment would be more beautiful and interesting.
Nice works..


August 09, 2001, 10:19 PM

hey Serapth, this post was a key reference to an argument i just won about the existence of god. how ironic ;), but it's great to see someone has enough balls to speak the truth

Matthew Harmon

August 09, 2001, 10:38 PM

Very nice looking. And your point should be taken by people talking about optimization. (At least I think this is your point.) An application like this probably spends less than 5% of CPU time doing anything other than waiting for rendering to finish.

After profiling our game engine (without AI yet) I found that specifying vertices, setting states, doing flight dynamics, etc. is less than 3% of the CPU. If you feed the graphics subsystem correctly, you should be able to spend lots of cycles on "game stuff" while your images are rendering. (Of course, the lack of state changes here also helps a lot!)

Now... my question. Do you specify a matrix for each butterfly and only have one tiny vertex list, or did you find that too slow? At first I specified a separate matrix for each of my "particles" but I think that is the wrong way to go. I'm going to try transforming them myself into world space so I don't have to do a matrix switch. That way, all the particles can be drawn in one batch.

What approach did you take?
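The second approach described in this post (transforming each particle's vertices into world space on the CPU so everything goes out in one batch) might look roughly like the sketch below. The `Particle` fields and `buildBatch` are assumptions for illustration. The thread does not say which approach the demo actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical per-particle transform: position, rotation about Y, uniform scale.
struct Particle {
    Vec3  pos;
    float heading;  // radians
    float scale;
};

// Write world-space copies of the shared local vertex list for every particle
// into one contiguous array, so no per-particle matrix switch is needed.
void buildBatch(const std::vector<Particle>& particles,
                const std::vector<Vec3>& localVerts,
                std::vector<Vec3>& out) {
    out.clear();
    out.reserve(particles.size() * localVerts.size());
    for (const Particle& p : particles) {
        const float c = std::cos(p.heading);
        const float s = std::sin(p.heading);
        for (const Vec3& v : localVerts) {
            const float lx = v.x * p.scale;
            const float ly = v.y * p.scale;
            const float lz = v.z * p.scale;
            // Rotate about the Y axis, then translate to the particle position.
            out.push_back(Vec3{ c * lx + s * lz + p.pos.x,
                                ly + p.pos.y,
                               -s * lx + c * lz + p.pos.z});
        }
    }
    assert(out.size() == particles.size() * localVerts.size());
}
```

The resulting array can then be handed to the API in a single draw call (for example, one `glDrawArrays` or `glDrawElements` call in OpenGL) instead of one call and one matrix load per particle. The tradeoff is that the CPU does the transform work for every vertex each frame.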



August 09, 2001, 11:12 PM

Omg, you just won an argument about the existence of god?? Please fill us in, myth or truth? :)

The agnostic programmer


August 09, 2001, 11:58 PM

Heh.. the iotd description is tantamount to approximately the following:

"Look! I can lift a chair on my own! And I have NEVER used steroids!"



August 10, 2001, 12:08 AM

Actually.. correction.

"Look! I can lift a chair on my own! And I have NEVER used steroids!" followed by a lengthy explanation of chair lifting techniques, a sample training regimen, signing of autographs and a photo shoot of the author holding a lawn chair.

In short - HINT: you're CPU BOUND on a 1.7ghz p4. Nothing to brag about.



August 10, 2001, 12:58 AM

I've never seen an IOTD get ripped apart like this... Cut the man (and his butterflies) some slack.


August 10, 2001, 01:49 AM

"I'd still love to see how much more performance could be gotten out of it by implementing some assembly."

I don't really see how assembly could be of much help here. I mean granted there's some code handling the particle updates and what not, but most of it is the GFX card doing the work. I bet most of the stuff you see going on is nothing more than shoving some vertex/index lists to the video card. I could be wrong tho..

Gerald Filimonov
-=[ Megahertz ]=-

Jason Kozak

August 10, 2001, 02:09 AM

I'd love to see a doc on that too if possible.

Arne Rosenfeldt

August 10, 2001, 02:45 AM

So compilers would need AI?
A friend of mine said there is no "real AI": our brain uses hacks almost like the applied AI stuff of today.
And since our brain is faster (more parallel), it can do better.
But this is only a matter of time.

Mike Taylor

August 10, 2001, 02:50 AM

Well, that's a yes and no type thing. No, because much of optimizing and compiling in general is a huge calculation, and no human could turn a 1000 line C++ program into good ASM without doing some hardcore calculations, i.e. LALR parse table generation and such. Much bigger than 1000 lines and it becomes pretty intractable for humans. Also, large scale dependencies are difficult for humans to track. On the other hand, the best compiler possible will never make a bubble sort better than quicksort. This is where human thought comes in. For a problem like particle physics like this, I really think a compiler will probably be the better optimizer. The days of working too hard on pairing and branch handling are almost gone, as a result of out-of-order execution and advanced branch prediction. The most increase you could see was always algorithmic, but in the old days, you could speed Pentium code by 50-100% in almost all cases if you REALLY knew the rules. Now I doubt very many people here know ALL the pipeline/out-of-order/prediction/parallelizing rules, so I doubt many can squeeze more than 10% improvement by juggling instructions...
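The algorithmic point is easy to see by counting comparisons. This sketch compares a plain bubble sort with `std::sort` (typically an introsort, which is quicksort-based, so it stands in for the quicksort mentioned above). The function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Plain bubble sort with no early exit. Returns the number of comparisons,
// which is always n*(n-1)/2 for n elements.
std::size_t bubbleSort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t pass = 0; pass + 1 < v.size(); ++pass) {
        for (std::size_t i = 0; i + 1 < v.size() - pass; ++i) {
            ++comparisons;
            if (v[i + 1] < v[i]) std::swap(v[i], v[i + 1]);
        }
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return comparisons;
}

// Library sort with a counting comparator, O(n log n) comparisons.
std::size_t librarySort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    std::sort(v.begin(), v.end(), [&comparisons](int a, int b) {
        ++comparisons;
        return a < b;
    });
    return comparisons;
}
```

For 1,000 elements the bubble sort always performs 499,500 comparisons, while the library sort needs far fewer. No amount of instruction scheduling on the bubble sort closes a gap that grows with n.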

BTW, that Intentional Programming idea should almost completely kill hand optimizing.

Also, most of the hand optimizing in OGL drivers isn't really optimizing, it's ASM because it is easier to do most driver tasks in ASM than C++.

-Mike Taylor

Mike Taylor

August 10, 2001, 02:54 AM

God I love this guy. He just summed my post above into 2 lines. Maybe I should have read ahead more...

-Mike Taylor

Mike Taylor

August 10, 2001, 03:00 AM

Damn fine point about knowing ASM to understand how cpus work. For years I couldn't understand why linear access is slower with linked lists than arrays, it was only after I learned about lookup costs and cache misses that I figured it out. I am surprised nobody has mentioned the other common argument for asm, that many techs will eventually come back to ASM, i.e. 10 years ago we all wrote asm, 5 years ago we stopped, and now we write things in vertex and pixel programs, which are JUST like asm. I'm sure this guy could get much better performance by moving some of his particle calc onto the GPU (it can be done, look at NVidia's site for the G3 contest, one of the winners did a general particle handler in PURE shader programs, amazing)...
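The linked list versus array point can be measured directly. Below is a small sketch that times the same linear sum over a `std::vector` and a `std::list`. The helper names are hypothetical, and the timings will vary by machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <numeric>
#include <vector>

// Sum a container front to back and report how long the walk took.
template <typename Container>
double timeSumMs(const Container& c, long long& result) {
    auto start = std::chrono::steady_clock::now();
    result = std::accumulate(c.begin(), c.end(), 0LL);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Build both containers with the same n values and time a linear pass over each.
void compareTraversal(std::size_t n, double& vectorMs, double& listMs) {
    std::vector<int> v(n, 1);              // contiguous storage
    std::list<int> l(v.begin(), v.end());  // one heap node per element
    long long sumV = 0, sumL = 0;
    vectorMs = timeSumMs(v, sumV);
    listMs   = timeSumMs(l, sumL);
    assert(sumV == sumL);  // same work, different memory layout
}
```

Both loops do the same additions. The difference is that the vector walk touches memory sequentially, while each list step follows a pointer to a node that may sit anywhere on the heap. A freshly built list often has its nodes laid out nearly in order, so the gap tends to widen once the list has seen inserts and deletes over time.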

-Mike Taylor

Arne Rosenfeldt

August 10, 2001, 03:02 AM

I find the instruction sets designed for compilers much better for assembly.
Not this complex, weird 086 CISC stuff,
but simple RISC with lots of regs.


August 10, 2001, 03:22 AM

And about demo ?

(with just one ?)


This thread contains 152 messages.