An tutorial on optimizing LPC
(specifically the MudOS 0.9.x variant)
Written by Luke Mewburn
Version: 1.2
Date: 930321
0. INTRODUCTION
I wrote this document after noticing that the growing trend in LP
based muds is for more and more complexity and player orientated
features. This is fine, but due to the way that LPC works (only one
part of the code being interpreted at once), badly written code can
slow down the mud or cause `lag' to all other users on the mud.
Thus, I decided to find out the most efficient ways of performing
common tasks and set about documenting them. I assume that the reader
has had some programming experience, especially in C or LPC. But, even
if you are a beginner, I still recommend that you read this because
it's good to learn the correct way from the start.
Although I based most of my findings on a MudOS 0.9 driver, almost
all of the points will be relevant to other LP v3 based drivers. But,
this does not guarantee that this document is correct for all non-MudOS
drivers (such as Amylaar's 3.2@22 driver), given the increasing divergance
in the driver code.
In code examples, assume the following definitions:
int i, j, max, bitfield;
object *list;
string *arr, name;
LPC, like C, supports the concept of `dynamic allocation'. This is
where the driver only allocates memory when needed for various
operations (and deallocates it when finished). Unlike C, LPC will
automatically deallocate memory when you are finished with it (i.e, when
no LPC code references it anymore) - this is `garbage collection'.
Strings (q.v) are handled dynamically, as are arrays and mappings
(q.v).
Although dynamic allocation is useful to structured programming,
it can be costly on CPU time. One of the aims of this document is to
show you how to reap the benefits of dynamic allocation without the
huge overheads of it.
1. GENERAL POINTS
Before I show you how to speed up specific elements of your code,
I'll mention a couple of well-known axioms in computer programming.
The first is `80% of the time spent executing an algorithm is
in 20% of the code', or the `80/20' rule. (Sometimes quoted with 75/25
or 90/10 as the figures). This well proven fact can be used to assist
in improving your LPC. Don't spend your time constantly optimising every
piece of code you write - there isn't the time. Concentrate on the
commonly executed sections such as inner for-loops, etc.
The second is the choice of correct algorithm. Often a more
complex algorithm is used when a simpler one will suffice, especially
when for a small number of elements, simpler algorithms are more efficient.
But, the tradeoff must be made between simplicity and speed.
As most (if any) drivers currently don't have more advanced
optimization techniques for the LPC code (such as expression re-writing,
and loop strength-reduction) you can assist by more intelligently writing
your code.
A good speedup is to move a common constant expression/function
out of a loop or loop test and into a temporary variable. The most
obvious form of this is the use of 'sizeof(arr)' in a while/for
condition (q.v). In other situations, there might be a function that
is called inside the loop at each iteration, but that function will
always return the same value - if that is the case, assign the result
outside of the loop and access the temporary variable instead. E.g:
for ( i = 0; i < max ; i++)
if ( list[i] == some_condition )
do_something_with( list[i], this_player()->query("name") );
the name can assume to be static, so the following can be done:
name = this_player()->query("name");
for ( i = 0; i < max ; i++)
if ( list[i] == some_condition )
do_something_with( list[i], name );
While not an actual code speedup, writing easy-to-read and
consistantly formatted code can speed up debugging and changing. A lot
of people have differing views on indenting, and I'm not going to
start a jihad by forcing my style down your throats - just pick what
is good for you and be consistant with it.
2. DATA TYPES
LPC has (at least) the following data types: int, string, object,
mixed, mapping. Arrays of these are also possible. Some notes on using
some of these data types:
2.1. INT
The integer data type is very efficient to use, as no dynamic
allocation required to use it. By using bitfields (e.g, hand-coded
with #defines, (simulating ANSI c's enum's)), you can store a variety
of flags information as well. This latter use is much faster than
storing information of bit size, as long as you only need 32 bits.
Example code to manipulate integer bit fields follows (assuming that
VALUE is 1, 2, 4, 8, etc):
bitfield |= VALUE; // set bit VALUE
bitfield &= ~VALUE; // reset bit VALUE
bitfield & VALUE // test bit VALUE
If you want to set bit n, replace VALUE with (1 << n).
2.2. STRINGS
Strings form the most commonly used data-type, but require dynamic
allocation. Constant strings (i.e, such as pure 'write' statements)
are stored in a 'string table', so usually don't require extra memory
for multiple copies. String addition (with the + operator) can be a
costly operation - memory allocation is done for _each_ addition at
compile time. A common coding style used is something like:
write("This is a forest\n" +
"and is really boring.\n" +
"You are feeling sleepy\n");
Now, this is one of the _worst_ ways to represent constant strings
- it's slow when the object is being compiled, and is not memory efficient
either. Note that with MudOS 0.9.15 the `+' in a string addition is
optional - but this only saves characters in the source file - again,
it requires a dynamic memory allocation for each string within double
quotes when the object is loaded. Try something like:
write("\
This is a forest\n\
and is really boring.\n\
You are feeling sleepy.\n\
");
So that the string is stored as 1 chunk. You can usually store upto
1 page of screen text before your parser chokes - so split up your
paragraphs within this limit, using a separate write statement.
NOTE: If there is a space between the \ and the end of the line, the
parser will barf.
UPDATE: From MudOS 0.9.14.3 onwards, there is a new operator for
constant string assignment - `@', so the above could be written as:
write( @ENDMESSAGE
This is a forest
and is really boring
You are feeling sleepy.
ENDMESSAGE
);
(The 'ENDMESSAGE' can be any string that won't occur in the main
body, and it must start on a line of it's own, otherwise it won't be
recognised as the 'end of message' delimeter).
Another inefficient method is something like:
write("Your name is " + this_player()->query("name") + "\n" +
"and your level is " + this_player()->query("level") + "\n");
If your driver supports printf(), try doing something like:
printf("Your name is %s\nand your level is %d\n",
this_player->query("name"), this_player->query("level") );
There are many other examples like this, so you get the general
idea.
But, don't get me wrong - often you must use string addition, so
don't try and use extremely long-winded and fancy techniques to avoid
it (which will most likely end up being slower than the addition).
2.3. FLOATS
With the 0.9.15 driver, a floating point data type was introduced.
A variable can be declared as this with the `float' keyword. (For C
programmers, LPC float is the same as the C `double' type).
A variety of operations such as cos, sin, (and associated
operations), ln, and sqrt are available. This datatype can be
optionally excluded from a driver at compile time, but I cannot see
why this would be necessary.
Mathmatical operations on floats are not as fast as for integers -
so remain with integer arithmatic wherever possible. If you need to
use floats, implement them by precalculating results wherever
possible, to cut down on the number of redundant operations (this is
true for any code though).
2.4. ARRAYS
Arrays are extremely useful, but like strings, if used incorrectly,
can be very inefficient.
A loop which incrementally adds or removes an element can be very
slow. Intelligent use of pre-allocation is the solution in this case.
From /adm/daemons/cmd_d.c on TMI-2:
bin_ls = get_dir(path + "/");
result = ({ });
for (i = 0; i < sizeof(bin_ls); i++) {
if(sscanf(bin_ls[i], "_%s.c", tmp) == 1)
result += ({ tmp });
}
cmd_table[path]=result;
Here, not all of the elements are selected, and those that are
are modified before being assigned to the result array. This is a
replacement:
bin_ls = get_dir(path + "/");
i = sizeof(bin_ls);
result = allocate(i);
j = 0;
while (i--) { // see below for why `while' is used
if(sscanf(bin_ls[i], "_%s.c", tmp) == 1)
result[j++] = tmp;
}
cmd_table[path]=result[0..j-1];
Note how that since we know that result can only be <=
sizeof(bin_ls), we allocate that much. Then, we add elements as we
need them. Finally, before we use result, it's resized to the correct
size (with the [0..j-1] operation).
[ Note, in previous versions of this document, I recommended that
the filter_array() efun to be used - unfortunately, due to function
The major reason for the speed difference with preallocation is that
the preallocation technique has a linear increase in time with respect
to an increase in elements, and the array addition using += has exponential
increase.
2.5. MAPPINGS
Mappings are associative arrays - i.e, arrays which can be indexed
on any basic datatype - int, string, or object. As a data structure,
they are usually more efficient than arrays, and can be used to
simulate `structs' that exists in C.
Due to the internal storage method of mappings, a performance gain
can be obtained if you preallocate the mapping before use. This is most
effective when you know that the mapping will almost always have x
elements (which can increase), and won't usually have elements deleted
that will bring the total below x. (i.e., when map_delete is used on a
mapping which has been preallocated to x elements, but may have more
than x elements used, and most of the x elements have been deleted with
map_delete, then you will waste more memory as map_delete doesn't free
up individual elements of a preallocated mapping back to the system).
A common use of preallocation would be in the emote daemon or in a
standard object such as /std/user (or /obj/player). With the former, if
you already have approximatly 200 elemets, don't initialize your mapping
with:
emotemap = ([ ]);
Instead use:
emotemap = allocate_mapping( 200 );
3. CONTROL FLOW & LOOPING
An integral part of LPC programming, the control of execution flow
(through tests, and as part of looping constructs) can be made more
efficient through careful use of various forms of it.
3.1. WHILE
The simplest (and quickest in the 'simple' case) way of looping.
A common task is something like:
list = users();
for ( i = 0 ; i < sizeof(list) ; i++ )
do_something_with_item( list[i] );
This is inefficient, especially since the sizeof(list) must be
recalculated (see above under `General Points'). If you wish to traverse
the list in ascending order, try something like:
max = sizeof(list); // slight performance gain
i = -1;
while ( ++i < max ) // evaluate and increment at same time
do_something_with_item(list[i]);
If the order of traversal is unimportant, the following is the
fastest method:
i = sizeof(list);
while (i--)
do_something_with_item(list[i]);
This is the 'simple condition while loop'. There is actually
minimal difference between
for ( i = sizeof(list) ; i-- ; ) ;
and the equivalent while loop, but it is a minor gain.
3.2. FOR
One of the most popular looping constructs, and, in situations
where you need to perform an operation at the end of each test that is
more complex than a simple test or decrement, it is useful. If a
simple increment or decrement of a loop counter is required, the
`while' is probably better suited to the task.
If the 'ending operation' is more complex, then the generic form:
for ( initialise ; test ; final ) {
main body
}
is `cleaner' than
initialise
while ( test ) {
main body
final
}
3.3. SWITCH
The switch statement should be used in preference to
'if-then-else-...' based structures (if possible), as in most of the
newer, more advanced drivers, the switch statement is optimized within
the driver. Anyway, the code looks cleaner.
The other advantage of switch is the use of case ranges (a feature
not found in C). If you wish to test for a range of elements,
replacing
case 1: case 2: case 3:
with
case 1..3:
could be more efficient (on typing anyway...)
4. SUMMARY
I'm sure that there are probably mistakes in this document, but
nobody's perfect.
The following people asked questions, provided suggestions, etc for
version 1.1:
@TMI-2: Blackthorn, Mobydick, Square, Alexus, Amylaar, Darin
@Ivory.tower: Telsin, Vampyr
@Underdark: Cynic, Brian