flipcode - Loop-linked Function Calls

Loop-linked Function Calls
Submitted by

I'm not claiming to be the first to invent this, but it isn't something I've seen covered in any depth elsewhere. I also don't know how applicable it is to other chip architectures; everything here is done in ARM code.

I've recently found that I'm often ending a simple loop with a function call like this:

(pseudocode)
.loop
[code]
branch with link to function
loop test
conditional branch to loop

This tends to happen when the function is large or unwieldy, or is used in a lot of places.

In ARM code this might be:

MOV counter,#cycles
.loop
[code]
BL function              ;branch with link
SUBS counter,counter,#1  ;decrement loop counter
BGT loop                 ;repeat loop (while counter +ve)

...
.function
[more code]
MOV PC,R14               ;return - r14 is link register

However, this means that on every pass through the loop you take two branches in quick succession: the return from the function call, and then the branch back to the start of the loop. This is bad because each taken branch causes a pipeline flush. The second branch can be avoided by moving the loop test to the top of the loop and setting up the return address before the loop:

(pseudocode)
set up link register to point to loop
.loop
do loop checking
[code]
branch to function
.end of loop

or in ARM code:

ADR R14,loop             ;get the address of loop in R14
MOV counter,#cycles
.loop
SUBS counter,counter,#1  ;decrement loop counter
BLT loop_finish          ;finish when counter becomes negative
[code]
B function               ;branch to function (without link)
.loop_finish

So the function returns directly to the start of the loop. Note that the function itself is unchanged.
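
As a rough count of what is saved: assume a classic three-stage ARM pipeline on which a taken branch (including MOV PC,R14) costs 3 cycles, and a SUBS or an untaken conditional branch costs 1. These timings are an assumption for illustration, not measurements of the code above. The loop overhead per cycle, ignoring [code] and [more code], is then:

naive:        BL (3) + MOV PC,R14 (3) + SUBS (1) + BGT taken (3)    = 10 cycles
loop-linked:  SUBS (1) + BLT not taken (1) + B (3) + MOV PC,R14 (3) =  8 cycles

That is about 2 cycles saved per loop cycle. Against this there is a small one-off cost (the ADR before the loop, and the final SUBS and taken BLT on exit), so under these assumptions the gain only shows once the loop runs more than a handful of times.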

Of course, this means that you can't use R14 for anything else inside the loop; in particular, you can't call other subroutines with BL, because that would overwrite the return address. However, if the final function in your loop is always called in this manner, you can have it return through another register and use that as your link register instead. There are other possibilities, such as sharing functions called by branch-tables, etc.
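
A minimal sketch of the alternative link register idea follows. It assumes that R12 is free for the whole loop and that function is only ever entered this way. The choice of R12 and the subroutine name helper are illustrative, not from the original code, and helper is assumed to preserve R12 and counter.

(ARM code, illustrative)
ADR R12,loop             ;R12 (assumed free) holds the address of loop
MOV counter,#cycles
.loop
SUBS counter,counter,#1  ;decrement loop counter
BLT loop_finish          ;finish when counter becomes negative
[code]
BL helper                ;ordinary call is fine: R14 is free again (hypothetical helper)
B function               ;branch to function (without link)
.loop_finish

...
.function
[more code]
MOV PC,R12               ;return to loop through R12 instead of R14

Unlike the R14 version, the function itself does change here. It now returns through R12, so any other caller would have to load R12 with its own return address before branching to it.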

As I said, I'm not claiming this is big or clever, but it is slightly faster than the naive code.
