Almost all of the drive towards getting out the first DP of LiveCode 7 consisted of squashing bugs and improving general stability. In other words, we were focused almost entirely on functionality. Now that the list of bugs has dropped to a manageable level, we have turned our attention to improving speed (don’t worry though, we haven’t forgotten about your bugs!).
If you’ve read Fraser’s blog post about storage of Unicode strings you’ll know that certain on-screen characters take up more space in memory than others. Most of the awkwardness in manipulating and processing strings comes from having to deal with this possibility. This means that where previously accessing a character of a string was just a constant-time array access, it now requires mapping code unit indices to character indices. “You can’t make a Unicode omelette without having to crack a few performance eggs,” if you will.
So one way we’ve improved performance is by keeping track of when the strings have such anomalies. This is what I’m thinking of as the ‘count your chickens before they are hatched’ optimisation. Each code unit is an egg, and the characters are the resulting chicks. If I’ve promised to give you my 20,000th chick, I can simply give you the 20,000th egg, knowing that there is a one-to-one correspondence between them. Otherwise, I would have to go through them one by one inspecting each egg to see if it has hatched before being able to declare a certain chick as the 20,000th.
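For the curious, here is a minimal Python sketch of the chick-counting idea. It is not the engine's code: it treats only surrogate pairs as the 'anomaly' (the real thing also has to worry about combining sequences), and the names `CountedString`, `is_simple` and `unit_index_of_char` are made up for illustration.

```python
def to_code_units(text):
    """Return the 16-bit code units (the 'eggs') of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    """True if this code unit is the first half of a two-unit character."""
    return 0xD800 <= unit <= 0xDBFF


class CountedString:
    """Hypothetical string that remembers whether every egg is exactly one chick."""

    def __init__(self, text):
        self.units = to_code_units(text)
        # Computed once, when the string is built: are there any surrogates at all?
        self.is_simple = not any(0xD800 <= u <= 0xDFFF for u in self.units)

    def unit_index_of_char(self, char_index):
        """Map a character index to the index of its first code unit."""
        if self.is_simple:
            # One-to-one correspondence: the nth chick is the nth egg.
            return char_index
        # Otherwise inspect the eggs one by one.
        unit = 0
        for _ in range(char_index):
            unit += 2 if is_high_surrogate(self.units[unit]) else 1
        return unit
```

With a flag like `is_simple`, the expensive walk only happens for strings that actually contain an awkward character; everything else keeps its constant-time lookup.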
Another change that has been made to speed up the engine is what might be called the ‘don’t put all your eggs in one basket’ optimisation. The basket, in this case, is the array of 16-bit code units which is used to store the Unicode string. Now we have two baskets, the second being an array of 8-bit code units. We have all our large eggs in one basket, and small in another. We’re not so much concerned about dropping the basket and losing the eggs, but more about being able to use shortcuts – sorting them out in this way should make it much easier for the eggs to be examined and put in appropriate boxes, for example. Similarly there are many more efficient methods we can use for string processing with arrays of 8-bit code units.
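A rough Python sketch of the two baskets follows. It assumes the 8-bit basket holds Latin-1 text, which is a simplification chosen for the example rather than a statement about the engine, and `store` and `count_char` are hypothetical helpers.

```python
def store(text):
    """Put text in the small basket if every character fits in 8 bits."""
    try:
        return ("8-bit", text.encode("latin-1"))
    except UnicodeEncodeError:
        return ("16-bit", text.encode("utf-16-le"))


def count_char(stored, ch):
    """Count occurrences of the single character ch in a stored string."""
    kind, data = stored
    if kind == "8-bit":
        if ord(ch) > 0xFF:
            # A large egg can never be in the small basket.
            return 0
        # One byte per character, so a flat byte scan is enough.
        return data.count(ch.encode("latin-1"))
    # 16-bit basket: step through the data one code unit (two bytes) at a time.
    target = ch.encode("utf-16-le")
    return sum(
        1
        for i in range(0, len(data), 2)
        if data[i:i + len(target)] == target
    )
```

For example, `store("egg")` lands in the 8-bit basket at three bytes, while `store("œuf")` needs the 16-bit one because "œ" does not fit in Latin-1.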
Speaking of boxes, another major slowdown in LiveCode 7 was related to the storage of LiveCode variables. All LiveCode variables were being stored using an internal concept of ‘value’, and never as native C types. In particular, this meant that numeric variables were stored as MCNumberRef, essentially just a wrapper round a native type. Adding to a number involved the unboxing from and re-boxing into an MCNumberRef, a relatively expensive operation. The change to allowing certain native types is roughly the equivalent of having a built-in egg holder in your fridge versus having the actual box in there.
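To see where the cost comes from, here is a small Python model of the difference. `NumberRef` below is only a stand-in for the idea of MCNumberRef, `Variable` and `add_boxed` are invented names, and since Python boxes its own floats this models the bookkeeping rather than the real speed-up.

```python
class NumberRef:
    """Hypothetical boxed number: a wrapper object around a raw value."""

    allocations = 0  # counts how many boxes have been created

    def __init__(self, value):
        NumberRef.allocations += 1
        self.value = value


def add_boxed(box, amount):
    """Add to a boxed number: unbox, add, then allocate a fresh box."""
    raw = box.value
    return NumberRef(raw + amount)


class Variable:
    """Hypothetical variable slot that may hold a raw float or a boxed value."""

    def __init__(self, value):
        self.value = value

    def add(self, amount):
        if isinstance(self.value, float):
            # Native path: no box to open or close.
            self.value += amount
        else:
            # Boxed path: pays for a new box on every addition.
            self.value = add_boxed(self.value, amount)
```

Adding 1 a thousand times to a `Variable` that holds a `NumberRef` creates a thousand new boxes; the same loop on a `Variable` that holds a plain float creates none.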
If you thought that analogy was a stretch, then you perhaps ought to stop here. I’ll be honest, I thought there were more chicken and/or egg phrases that I could adapt for my purposes.
Benjamin Franklin once said “An egg today is better than a hen tomorrow.” How remarkably prescient of him, about 250 years before the fact, to envisage how we would be approaching the task of improving our Unicode string comparison operations. For he must surely have been referring to the following example:
put "a" before tLongUnicodeString
if tLongUnicodeString begins with "a" then
   // do something
end if
Unfortunately, checking in general whether tLongUnicodeString begins with another string is distinctly non-trivial if the formSensitive property is false. This is because there is no way of knowing in advance what displayable characters it consists of, and how many code units might make up those characters; it may be that the whole string is only one displayed character, if it has an absurd amount of combining characters. So even if the first displayed character of the string is in fact a single code unit, we would have to normalize the entirety of tLongUnicodeString simply in order to perform the check. To use Franklin’s rather bizarrely chosen fowl metaphor, the normalized form of tLongUnicodeString is the ‘hen’ which takes a (metaphorical) day to normalize, whereas the first displayed character of it is the ‘egg’. If we could normalize on the fly, then we would only have to wait for the ‘hen’ if the strings were in fact equal. If not, we could use the ‘egg’ today to return a result very quickly.
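Here is one way 'normalizing on the fly' might look, sketched in Python with the standard `unicodedata` module. This is an illustration under assumptions, not the engine's algorithm: it uses NFD as the normalized form, compares code points of the decomposition (it does not insist that a match ends on a displayed-character boundary), and `nfd_lazily` and `begins_with` are hypothetical names.

```python
import unicodedata


def _starts_segment(ch):
    """True if nothing before ch can be reordered past it under NFD.

    That holds when the decomposition of ch begins with a character of
    combining class 0, since reordering only shuffles runs of non-zero classes.
    """
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0]) == 0


def nfd_lazily(text):
    """Yield the code points of NFD(text), normalizing one segment at a time."""
    segment = []
    for ch in text:
        if segment and _starts_segment(ch):
            # The previous segment is complete, so it is safe to normalize it now.
            yield from unicodedata.normalize("NFD", "".join(segment))
            segment = []
        segment.append(ch)
    if segment:
        yield from unicodedata.normalize("NFD", "".join(segment))


def begins_with(text, prefix):
    """Form-insensitive prefix test that stops at the first mismatch."""
    wanted = unicodedata.normalize("NFD", prefix)
    produced = nfd_lazily(text)
    for expected in wanted:
        if next(produced, None) != expected:
            return False  # the 'egg' was enough to answer
    return True
```

Checking whether a very long string begins with "a" then only normalizes its first segment. The whole 'hen' is needed only in the case Dar alludes to in the comments, where the string really is one base character followed by an absurd amount of combining characters.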
These are just some of the optimisations we have implemented for LiveCode 7, so we think you’ll notice significant speed improvements in DP 3.
3 comments
Alan Stenhouse - April 30, 2014
Alan Stenhouse liked this on Facebook.
Mikey - May 1, 2014
Ummmmmmmm, what if it’s a double-yolk?
Dar - May 1, 2014
I wonder what an absurd amount of combining characters is. Maybe anybody who uses more than four should pay the performance price.