Offworkproject: Software-Rendering with no FPU

September 21, 2012, 03:50 PM

I'm currently working on a lightweight, experimental software-rendering engine for a MIPS architecture (Alchemy) running under Linux. It doesn't have a hardware FPU, only the Linux kernel's FPU emulator.

It will be a C++ engine with its own framebuffer module, and it will support tslib input by default (preferred over mouse/keyboard). It should also be fully portable (hardware-portable, not OS-portable), so the FPU emulator should only be used when there is no other way.
As stated, it's merely an experiment: to build an engine, to see what it has in common with a modern OpenGL implementation, and hopefully to uncover some optimization concepts useful for PC rendering.

The framebuffer module isn't really an issue: it will use direct memory access, with a sleek API for writing/reading pixels and some higher-level primitives such as lines, maybe curves, etc.
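A minimal sketch of what such a pixel API could look like, assuming a 16-bit RGB565 pixel format and a caller-supplied pointer to the pixel memory (for example from mmap() on a Linux framebuffer device, or a plain buffer when testing on a PC). The names packRgb565 and FramebufferView are hypothetical and not taken from the actual engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs 8-bit channels into one RGB565 pixel
// (5 bits red, 6 bits green, 5 bits blue). The format is an assumption.
inline std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical thin view over pixel memory. It does not own the memory,
// so the same class works on mapped device memory and on a test buffer.
class FramebufferView {
public:
    FramebufferView(std::uint16_t* pixels, int width, int height, int stridePixels)
        : pixels_(pixels), width_(width), height_(height), stride_(stridePixels) {
        assert(pixels != nullptr && width > 0 && height > 0 && stridePixels >= width);
    }

    // Writes one pixel; coordinates outside the surface are ignored.
    void putPixel(int x, int y, std::uint16_t color) {
        if (inBounds(x, y)) pixels_[index(x, y)] = color;
    }

    // Reads one pixel; coordinates outside the surface read as 0.
    std::uint16_t getPixel(int x, int y) const {
        return inBounds(x, y) ? pixels_[index(x, y)] : std::uint16_t{0};
    }

    int width() const { return width_; }
    int height() const { return height_; }

private:
    bool inBounds(int x, int y) const {
        return x >= 0 && x < width_ && y >= 0 && y < height_;
    }
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(stride_)
             + static_cast<std::size_t>(x);
    }

    std::uint16_t* pixels_;
    int width_;
    int height_;
    int stride_;  // row length in pixels, may be larger than width_
};
```

Keeping the stride separate from the width is a deliberate choice here, since a real framebuffer row can be longer than the visible width.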

For the rendering infrastructure I thought of a scene-oriented concept, based on the idea of a scene master with which all "to be rendered" components are registered (maybe through a scene composer, or just by loading a scene BSP). The scene master would then handle all movement, sequence and input events.

I will update this thread as more info or news arises.

Would love to hear your opinions, tips or maybe potential performance issues which I'm not seeing yet.

Bad Sector

September 24, 2012, 04:45 AM

I don't know about MIPS tips, but a nice MIPS device you can play around with is a Dingoo handheld. I have one and you can either put Linux on it or use its native OS. Also it is very cheap - I bought one for ~78 euros about a year ago.


September 25, 2012, 01:37 PM

The company I work for builds its own embedded devices, which are configured for use with touch input and displays. The predecessor of our latest board has a MIPS Alchemy AU1250. This will be my platform; the display is 16-bit, 640x480. It runs at close to 500 MHz (492) and has 256 MB RAM.

Comparing only the data sheets, the Alchemy is a bit faster than the Dingoo CPU, but it's the same architecture. Thanks for the tip.

Bad Sector

September 27, 2012, 05:42 AM

There are many MIPS machines out there, I just like the Dingoo because it is dirt cheap and you can carry it with you to show (off) stuff around :-)


September 27, 2012, 02:44 PM

This is really an important issue. Nobody really trusts you if you carry around a bunch of PCBs with loose displays attached.

No matter how good your software is, a demo on an embedded device looks much cooler when it is wrapped in a professional-looking case than when it looks 'hacked'.

Because of the really strict focus on doing things in software and in C++, I hope that porting to other devices, which aren't necessarily MIPS, will get easier (porting to other MIPS devices should be easy then). But that's a secondary issue for now.


September 28, 2012, 01:51 PM

"Would love to hear your opinions, tips or maybe potential performance issues, which I'm not seeing yet."

Sounds awesome! Anything online about your project?


September 29, 2012, 10:34 AM

Not yet. But I hope to get some samples uploaded within the next few weeks.


October 06, 2012, 06:16 PM

So, the first version of the framebuffer layer is ready. At the moment it is only a "setPixel(x,y,ColorValue)" API with direct hardware access on every call, which makes it really slow. It already has some functions for drawing lines, triangles and rects, but they all call setPixel for each pixel. The next step is to implement buffered output or, to keep the amount of memory needed down, at least a mechanism to draw multiple consecutive pixels at a time.
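A rough sketch of the buffered variant, assuming 16-bit pixels and a tightly packed target of the same size. BackBuffer, fillSpan, fillRect and present are hypothetical names, not the engine's API. The idea is that primitives write runs of consecutive pixels into ordinary RAM, and the hardware memory is touched only once per frame. For a 640x480 16-bit display such a buffer costs 640 * 480 * 2 = 614,400 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical off-screen buffer in ordinary RAM.
class BackBuffer {
public:
    BackBuffer(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0) {
        assert(width > 0 && height > 0);
    }

    // Fills the pixels [x0, x1) on row y. The span is clipped once,
    // then written with a single std::fill instead of per-pixel calls.
    void fillSpan(int y, int x0, int x1, std::uint16_t color) {
        if (y < 0 || y >= height_) return;
        x0 = std::max(x0, 0);
        x1 = std::min(x1, width_);
        if (x0 >= x1) return;
        std::uint16_t* row = pixels_.data() + static_cast<std::size_t>(y) * width_;
        std::fill(row + x0, row + x1, color);
    }

    // A rectangle is just one span per row.
    void fillRect(int x, int y, int w, int h, std::uint16_t color) {
        for (int row = y; row < y + h; ++row) fillSpan(row, x, x + w, color);
    }

    // Copies the whole frame to the target in one block.
    // Assumes the target holds width * height tightly packed pixels.
    void present(std::uint16_t* target) const {
        std::memcpy(target, pixels_.data(), pixels_.size() * sizeof(std::uint16_t));
    }

    std::uint16_t at(int x, int y) const {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return pixels_[static_cast<std::size_t>(y) * width_ + x];
    }

private:
    int width_;
    int height_;
    std::vector<std::uint16_t> pixels_;
};
```

If the full back buffer is too much memory, fillSpan on its own could also write straight to the device memory, which still avoids the per-pixel call overhead.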

First impressions:
The image below took about 1:45 minutes to draw (I will upload a video when I have more time).


October 09, 2012, 01:30 AM

krany, if you don't mind saying, what company do you work for? (I'm interested in such a device)


October 11, 2012, 02:37 PM

The company I work for is Ultratronik, a German MMI-focused embedded full-service company.

But unfortunately, I have to disappoint you. We don't sell dev boards to individuals (even developers), as our infrastructure wouldn't be able to handle it (warranty, RMA, etc.). The services we offer (incl. SMT, embedded development, etc.) are for customer projects with larger lot sizes, and we are structured to handle those.

The board I'm developing on is our internal development and presentation platform, whose lot count is strictly limited to internal needs. The only exception, where boards can be used for other purposes (like this project), is when they are damaged or really old. Mine, for example (I will upload a photo soon), has problems with the CAN bus hardware and the RAM timings (only half speed possible), which doesn't really block my project.

If you are primarily interested in the Alchemy SoC, look for other devices.

Really sorry. :/


October 16, 2012, 04:30 PM

Hi, krany, sorry for the late reply, but glad for the information.


November 01, 2012, 04:39 PM

I'm struggling a bit. After some research and trial and error I figured out that there are two ways of drawing triangles manually.
One is described here:
The other one is based upon the barycentric coordinates of each triangle ( ).

The first approach aims for parallelism and doesn't use float types. But it isn't really portable either, as the fixed-point code relies on the int type, which on an x86 PC is a 32-bit signed int. And even if I change the types to portable ones, it may be slower, as the targeted platforms are 16-bit.
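For illustration, an integer-only rasterizer of this kind could look roughly like the sketch below. It is not the code from the linked article; Point, edgeFunction and fillTriangleHalfSpace are hypothetical names, and the portability concern is handled by using the fixed-width std::int32_t from <cstdint> rather than plain int. Each pixel in the bounding box is tested against the three edge functions. Those same three values, divided by the triangle's doubled area, are the barycentric weights of the pixel. No fill rule is applied, so pixels on an edge shared by two triangles are drawn twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Point { std::int32_t x, y; };

// Twice the signed area of triangle (a, b, p). Its sign tells on which
// side of the edge a->b the point p lies.
inline std::int32_t edgeFunction(const Point& a, const Point& b, const Point& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Fills a triangle into a width * height buffer of 16-bit pixels using
// integer arithmetic only. Returns the number of pixels written.
// Assumes coordinates are small enough that the products fit in 32 bits.
int fillTriangleHalfSpace(std::vector<std::uint16_t>& pixels, int width, int height,
                          Point a, Point b, Point c, std::uint16_t color) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    const std::int32_t area = edgeFunction(a, b, c);
    if (area == 0) return 0;        // degenerate triangle, nothing to draw
    if (area < 0) std::swap(b, c);  // make the winding consistent

    // Bounding box of the triangle, clipped to the buffer.
    const std::int32_t minX = std::max<std::int32_t>(std::min({a.x, b.x, c.x}), 0);
    const std::int32_t maxX = std::min<std::int32_t>(std::max({a.x, b.x, c.x}), width - 1);
    const std::int32_t minY = std::max<std::int32_t>(std::min({a.y, b.y, c.y}), 0);
    const std::int32_t maxY = std::min<std::int32_t>(std::max({a.y, b.y, c.y}), height - 1);

    int written = 0;
    for (std::int32_t y = minY; y <= maxY; ++y) {
        for (std::int32_t x = minX; x <= maxX; ++x) {
            const Point p{x, y};
            // Inside (or on an edge) when all three edge functions are >= 0.
            if (edgeFunction(b, c, p) >= 0 &&
                edgeFunction(c, a, p) >= 0 &&
                edgeFunction(a, b, p) >= 0) {
                pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                       + static_cast<std::size_t>(x)] = color;
                ++written;
            }
        }
    }
    return written;
}
```

A faster variant would step the edge functions incrementally per pixel instead of recomputing them; that is left out here to keep the idea visible.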

The second may result in more understandable code, as it doesn't require fixed-point in the first place, and it is portable, as float is part of C99 and usable on every Linux (I know that this is not always true). But it will be slower, as the target has no FPU. It may also result in less code (I still need to prove this; it's just my theory).

I really can't decide which road to take.
Any tips?


November 01, 2012, 04:50 PM

I need to correct myself, as my original sentence is ambiguous:
"but even if i change the types to portable types it may be slower as the targeted platforms are 16bit."

I know that the Alchemy is 32-bit, which is why I lean towards this approach. But the aim is to support 16-bit too without running into major performance problems, and that is why I'm struggling.

One workaround for the performance problems of the second approach is to use fixed-point there too, but then I have the same problem as with the first one.
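A small sketch of what portable fixed-point arithmetic could look like, assuming a 16.16 format (16 integer bits, 16 fractional bits) stored in std::int32_t. All names are hypothetical. Using the fixed-width types from <cstdint> removes the dependency on the size of int, but the 64-bit intermediates in multiply and divide are likely to be expensive on a 16-bit target, which is exactly the trade-off described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 fixed-point type: value = raw / 65536.
using Fixed = std::int32_t;
constexpr std::int32_t kFixedOne = 65536;

// Converts a whole number; only valid for -32768 <= v <= 32767.
constexpr Fixed toFixed(std::int32_t v) {
    return v * kFixedOne;
}

// Drops the fractional part (truncates toward zero).
constexpr std::int32_t toInt(Fixed f) {
    return f / kFixedOne;
}

// Multiplies two 16.16 values. The product needs up to 64 bits before
// it is scaled back, hence the wider intermediate type.
constexpr Fixed fixedMul(Fixed a, Fixed b) {
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) / kFixedOne);
}

// Divides two 16.16 values; the caller must ensure b != 0.
constexpr Fixed fixedDiv(Fixed a, Fixed b) {
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * kFixedOne) / b);
}
```

Division by 65536 is used instead of a right shift so that the behaviour for negative values does not depend on the compiler; a shift-based version may be faster where its semantics are known.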

Krzysztof Narkowicz

November 02, 2012, 11:23 AM

Actually both of the above are the same thing: half-space rasterizers, which are nice for parallelism, computing pixel quad derivatives, and large resolutions / large triangles. An alternative is classic scanline rasterization. Scanline is much faster on one thread without SIMD instructions such as SSE.

Both rasterization algorithms can be implemented using 16-bit integers (of course it depends on the max frame buffer resolution).
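As a rough illustration of the scanline idea, here is an integer-only sketch with hypothetical names (Vertex, edgeX, fillTriangleScanline). The vertices are sorted by y, and for every row the x positions on the long edge and on the current short edge are computed and the pixels in between are filled as one run. It uses plain int and one division per edge per row for clarity; a tuned version would step x incrementally, and note that at 640x480 the product inside edgeX can exceed 16 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Vertex { int x, y; };

// x position where the edge p->q crosses row y. Requires p.y != q.y.
inline int edgeX(const Vertex& p, const Vertex& q, int y) {
    return p.x + (q.x - p.x) * (y - p.y) / (q.y - p.y);
}

// Fills a triangle row by row into a width * height buffer of 16-bit
// pixels. Returns the number of pixels written.
int fillTriangleScanline(std::vector<std::uint16_t>& pixels, int width, int height,
                         Vertex a, Vertex b, Vertex c, std::uint16_t color) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Sort so that a.y <= b.y <= c.y.
    if (b.y < a.y) std::swap(a, b);
    if (c.y < a.y) std::swap(a, c);
    if (c.y < b.y) std::swap(b, c);
    if (a.y == c.y) return 0;  // zero height, nothing to fill

    int written = 0;
    const int yStart = std::max(a.y, 0);
    const int yEnd = std::min(c.y, height - 1);
    for (int y = yStart; y <= yEnd; ++y) {
        // The long edge a->c spans every row of the triangle.
        const int xLong = edgeX(a, c, y);
        // The short side is a->b above b and b->c from b downwards.
        int xShort = b.x;
        if (y < b.y) xShort = edgeX(a, b, y);
        else if (b.y != c.y) xShort = edgeX(b, c, y);

        const int x0 = std::max(std::min(xLong, xShort), 0);
        const int x1 = std::min(std::max(xLong, xShort), width - 1);
        if (x0 > x1) continue;  // row lies completely outside the buffer

        std::uint16_t* row = pixels.data() + static_cast<std::size_t>(y) * width;
        std::fill(row + x0, row + x1 + 1, color);
        written += x1 - x0 + 1;
    }
    return written;
}
```

Unlike the half-space version, the inner loop here has no per-pixel test at all, which fits the point above about single-threaded code without SIMD.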

This thread contains 14 messages.