Looking for an optimizer tool
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Julien Houllier

January 28, 2005, 02:38 AM

Hi everybody!

I have to optimize my OpenGL-based 3D engine, because large scenes (starting at about 15,000 triangles per frame) slow it down a LOT...

I noticed that with or without large textures, it slows down the same.
I noticed that a 320x240 output and a 1600x1200 output produce the same framerate.
I think the processing of the meshes is too slow, but I can't see WHERE.

Here is some info about my hardware (not too bad for a coder :) ):
- GPU is a Radeon 9800 Pro
- CPU is an Athlon XP Barton 3000+
- 1 GB of memory onboard

I have tried adding an 'AddToLog( char *logtext );' function that dumps the timing into a log file, but that slows down the app like hell (imagine doing that for EACH vertex... arghhhhhhh :-) )

Do you know of an application (free, preferably) that can help me do that, or a bunch of MACROs I can use to find the bottleneck??


Thanks everybody here.

Julien

 
Nick

January 28, 2005, 04:49 AM

First of all, what is your framerate? If it dropped from 500 to 200, there is absolutely nothing to worry about. You should be able to reach about 2,000,000 triangles per second with that hardware. But don't waste -any- time at all on optimizations until the application is finished or the framerate drops below 30 FPS.
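
To put numbers on that: at 2,000,000 triangles per second, a scene of 15,000 triangles per frame works out to roughly 2,000,000 / 15,000 ≈ 133 frames per second from triangle throughput alone. Also keep in mind that FPS comparisons are misleading at the high end: dropping from 500 to 200 FPS sounds dramatic, but it is only the difference between 2 ms and 5 ms per frame.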

Are you drawing triangles one at a time or in big batches (vertex buffers)?

 
Julien Houllier

January 28, 2005, 05:11 AM

Oh yes, my framerate is going from 400 FPS down to something like 60-80 FPS...
Thanks for the advice to spend my time finishing the application instead of optimizing it. ;-) I'll try to follow it.

Oh, one last thing: yes, I am using vertex buffers (when available) and also compiled vertex arrays (when available too).

But I made a kind of vertex pre-buffering (I think THIS IS the bottleneck): I take only the visible triangles from each mesh (after computing the frustum), store them linearly in a fixed-size vertex buffer (about 1024 vertices long), and draw that buffer with a hardware compiled vertex array whenever it is not empty.

Maybe if I remove this temporary buffering, I can speed the application up...

 
Nick

January 28, 2005, 07:23 AM

Yes, you should do frustum culling on whole models (using a bounding box or sphere) and on big parts of the scene, not on individual triangles. Your information indicates that you are CPU bound, so do only coarse culling work on the CPU and let the GPU handle the rest.
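
The test itself is cheap: one bounding sphere against the six frustum planes. A rough sketch (assuming the plane normals point into the frustum; the type and function names are just illustrative):

    struct Plane  { float nx, ny, nz, d; };   // plane equation: nx*x + ny*y + nz*z + d = 0
    struct Sphere { float cx, cy, cz, r; };   // bounding sphere of a whole mesh

    // Returns false when the sphere lies completely behind one of the six
    // planes, so the entire mesh can be skipped with a handful of multiplies.
    bool SphereVisible( const Plane planes[6], const Sphere &s )
    {
        for (int i = 0; i < 6; ++i)
        {
            float dist = planes[i].nx * s.cx
                       + planes[i].ny * s.cy
                       + planes[i].nz * s.cz
                       + planes[i].d;
            if (dist < -s.r)
                return false;   // entirely outside this plane: cull
        }
        return true;            // inside or intersecting: draw the whole mesh
    }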

 
Nick

January 28, 2005, 07:41 AM

I should also add that I was not entirely correct when I said you should only think about optimization at the end of the project. In fact, early design decisions with performance implications can very well make the difference between a great product and a mediocre one. This assumes, though, that all performance bottlenecks can be anticipated. In reality, a great deal of experience is required to come up with a good design, and lots of testing (profiling) is required to home in on the real cause of a bottleneck.

Anyway, the general rule is that out of a hundred possible optimizations only a handful are worth spending your time on. Just a little warning...

 
Sander van Rossen

January 28, 2005, 08:07 AM

http://developer.nvidia.com/object/practical_perf_analysis.html

Here's a PDF with some tips on how to find your bottleneck,
but it sounds to me like you're CPU limited,
or maybe you're batching your geometry poorly...

Anyway, like the other guys said, you should only optimize when you have to...

Development time is precious, after all.

 
Scali

January 28, 2005, 08:35 AM

In general you want to use the CPU as little as possible for processing geometry.
3D hardware is fastest when you store the geometry in static vertex buffers on the video card and only send draw operations to the GPU.
That's also why we have hardware T&L and vertex shading: we only have to send new matrices to the video card in order to draw most animated worlds.
The video card is much faster at culling than the CPU, and better still, it leaves the CPU free to do other things.
So don't worry about sending a few tens of thousands of triangles that are potentially outside the viewing frustum. The GPU will handle them quicker than you could.

A typical engine design takes the mesh as the smallest drawing primitive. The mesh can be stored in a single static vertex buffer, with extra data like textures, shaders and matrices linked to it.
For special cases where the CPU has to process every triangle, you can create meshes with dynamic vertex buffers. You should still be able to handle a few thousand dynamic triangles with the CPU... Try to offload as much work to the GPU as possible; usually you can at least let the GPU handle T&L on dynamic meshes.

This way you can batch your geometry quite efficiently as well... one mesh is one draw call... You can go pretty high-poly: up to about 500 triangles per mesh the geometry is 'free' compared to the call overhead, and at about 5,000 or so you reach maximum efficiency.
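
In code the idea looks something like this (just a sketch of the structure; the handle types and field names are placeholders, not a real API):

    typedef unsigned int BufferHandle;    // handle to a buffer in video memory
    typedef unsigned int TextureHandle;

    struct Mesh
    {
        BufferHandle  vertexBuffer;   // static, uploaded to the card once
        BufferHandle  indexBuffer;
        TextureHandle texture;        // render state linked to the mesh
        float         world[16];      // per-instance matrix: for most animation
                                      // this is the only per-frame upload
    };

Drawing is then just: set the texture, set the matrix, and issue one indexed draw call for the whole mesh.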

 
Julien Houllier

January 28, 2005, 10:30 AM

OK, thanks for the tips.

I'll try to move the work from the CPU to the GPU... quite a challenging modification to my code!

 
Julien Houllier

January 28, 2005, 10:30 AM

Good link, thank you.

 
Julien Houllier

January 28, 2005, 10:41 AM

I'm not sure how to store vertices in video RAM using OpenGL...
If you have some tips, I'd love to read them!

Currently I am using a function like this to draw a mesh each frame
(where NormalBuffer, VertexBuffer and ColorBuffer are refilled for each mesh, and IndexBuffer is a static 0 1 2 3 4 5 6 7 ...and so on... index buffer):

    glEnableClientState( GL_VERTEX_ARRAY );
    glEnableClientState( GL_NORMAL_ARRAY );
    glEnableClientState( GL_COLOR_ARRAY );
    // note: GL_TEXTURE_COORD_ARRAY was also enabled here, but no matching
    // glTexCoordPointer was ever set; either set one or drop the enable

    glNormalPointer(   GL_FLOAT, 0, NormalBuffer );
    glVertexPointer(3, GL_FLOAT, 0, VertexBuffer );
    glColorPointer(4,  GL_FLOAT, 0, ColorBuffer );

    // lock the arrays (Curr_indexBuffer is the vertex count; with an
    // identity index buffer it equals the index count)
    glLockArraysEXT( 0, Curr_indexBuffer );

    // draw the mesh
    glDrawElements( GL_TRIANGLES, Curr_indexBuffer, GL_UNSIGNED_INT, IndexBuffer );

    // unlock the arrays
    glUnlockArraysEXT();

    glDisableClientState( GL_COLOR_ARRAY );
    glDisableClientState( GL_NORMAL_ARRAY );
    glDisableClientState( GL_VERTEX_ARRAY );


Is this the right way to do it? (Except for the buffering: I'll try to use the data directly from the mesh, without the intermediate copy.)

 
Julien Houllier

January 29, 2005, 06:09 AM

Thanks to everybody here! I started rewriting my engine to let the GPU do the work in place of the CPU, and the first tests are really encouraging and promising!

CPU sucksss, GPU rulezzz :D

 
Scali

January 29, 2005, 06:16 AM

I'm not familiar with OpenGL's newer extensions... but there are some vertex buffer object extensions; I suppose you'll need to use those, as they most closely relate to the ones in Direct3D, which let you create the fast static and dynamic buffers.
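
From what I've seen, the ARB_vertex_buffer_object extension works roughly like the sketch below (assuming the extension's function pointers are already loaded; CreateStaticVBO and DrawStaticVBO are just placeholder names):

    // Upload once at load time; returns the buffer object name.
    GLuint CreateStaticVBO( const GLfloat *data, GLsizei bytes )
    {
        GLuint vbo = 0;
        glGenBuffersARB( 1, &vbo );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, vbo );
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, bytes, data,
                         GL_STATIC_DRAW_ARB );   // hint: written once, drawn often
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
        return vbo;
    }

    // Each frame: bind the buffer and draw straight from video memory.
    void DrawStaticVBO( GLuint vbo, GLsizei vertexCount )
    {
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, vbo );
        glEnableClientState( GL_VERTEX_ARRAY );
        // with a buffer bound, the pointer argument is a byte offset
        // into the buffer, not a client-memory address
        glVertexPointer( 3, GL_FLOAT, 0, (const GLvoid *)0 );
        glDrawArrays( GL_TRIANGLES, 0, vertexCount );
        glDisableClientState( GL_VERTEX_ARRAY );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
    }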

 
JoniM

February 11, 2005, 09:34 AM

You should try AMD CodeAnalyst. It is a totally free performance analyzer.

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_3604,00.html

 
Julien Houllier

February 11, 2005, 10:32 AM

Good shot, JoniM!

I am currently downloading it; I'll try it and let you know :)

thx

 
Julien Houllier

February 11, 2005, 04:09 PM

YES :)!!! That is EXACTLY what I was looking for... THANK YOU!

Great tool

 
This thread contains 15 messages.
 
 