The word “cloud” was much in evidence at the RunRevLive Conference this year. This was primarily because of LiveCloud, a cloud database for LiveCode, whose aesthetically pleasing t-shirts were donned in abundance, and whose capabilities were amply demonstrated by the official conference app. This pervasiveness stood in stark contrast to that of the San Diego sky, which staunchly refused to have anything whatsoever to do with clouds for the whole week-and-a-bit.
So perhaps it was with those LiveCloud t-shirts in mind that I, whilst putting the finishing touches to the demo stack to accompany my talk and contemplating compelling use-cases for the trueWord chunk, decided to code a word cloud creator.
Well, I say ‘decided to code’; first I checked if anyone had done one already that I could steal. No dice. Then, in case anyone else had even thought about it, I threw the question out to some of the team: “How easy would it be to get LiveCode to make word clouds?” We weren’t sure, and I needed something by the morning. So I gave it a go.
Initially I attempted to define an algorithm for word placement, keeping track of the current border and repeatedly traversing it to find the next available space. This quickly got too complicated, with many additional considerations to do with overhangs and such; a more heuristic approach was necessary.
The idea of a word cloud is not only that the size of a word is related to its frequency, but also that the distance of the word from the center is, on the whole, inversely related to frequency. So once you have the words in frequency order, the ideal path on which to try and position them is a spiral.
The spiral equation can be written parametrically as x = t sin(t), y = t cos(t), which means that at a given time t, the distance from the origin of a point on the spiral is also t, since x² + y² = t²(sin²t + cos²t) = t². So when we search for positions to place words, we can search along the spiral for the next available position, knowing that we are uniformly increasing the distance from the center as we vary the angle.
Here’s the code for returning the location on the spiral at time pTime:
function spiralLoc pTime
   local tX, tY
   put pTime * sin(pTime) into tX
   put pTime * cos(pTime) into tY
   return (tX & comma & tY)
end spiralLoc
When following this strategy, we need a way of testing whether words are overlapping. Initially I had thought about taking a snapshot of a field containing the text and changing the alpha value of all the white pixels so that they would be transparent. Then I’d have an image to use for hit testing using LiveCode’s intersect function. It was pointed out to me, however, that the import snapshot command includes transparency data if you use the …of object form, and so this was as simple as the following:
on snapshotField pWord
   set the opaque of field pWord to false
   set the showborder of field pWord to false
   import snapshot from rectangle (the rect of field pWord) of field pWord with effects
end snapshotField
Then to test if we can place a word, we just use LiveCode’s intersect function to check whether it intersects any previous word at the proposed position:
function canPlace pNum, pTime
   // we can always place the first word
   if pNum is 1 then return true
   if imageIntersects(pNum) is false then return true
   return false
end canPlace
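The imageIntersects function called above isn't shown in this post. A minimal sketch of what it might look like follows, assuming the word images are numbered in placement order (so images 1 to pNum - 1 are the words already on the card) and that nothing else on the card is an image. The "pixels" threshold is my choice for illustration, not necessarily what the stack uses.

```livecode
// Hypothetical helper: a sketch of the imageIntersects function that
// canPlace calls. Assumes images 1 to pNum - 1 are the words placed so far.
function imageIntersects pNum
   local tOther
   repeat with tOther = 1 to pNum - 1
      // the "pixels" threshold ignores fully transparent pixels, so the
      // see-through background of each snapshot doesn't count as a hit
      if intersect(image pNum, image tOther, "pixels") then
         return true
      end if
   end repeat
   return false
end imageIntersects
```

Returning as soon as the first overlap is found keeps the loop short for positions near the crowded center of the cloud.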
We now have essentially all the ingredients for a LiveCode word cloud creator. First of all, a handler to generate the list of frequencies:
function listFrequencies pText
   local tArray, tList
   repeat for each trueWord tWord in pText
      add 1 to tArray[tWord]
   end repeat
   repeat for each key tKey in tArray
      put tKey & comma & tArray[tKey] & cr after tList
   end repeat
   sort tList descending numeric by item 2 of each
   return tList
end listFrequencies
Then a handler to create the image for a given word. The pWeight parameter is the frequency of the given word divided by the frequency of the most frequent word.
on createImage pWord, pWeight
   set the textSize of the templateField to pWeight*kMaxSize
   create field pWord
   set the text of field pWord to pWord
   set the height of field pWord to the formattedHeight of field pWord
   set the width of field pWord to the formattedWidth of field pWord
   snapshotField pWord
   delete field pWord
end createImage
Repeating this for each of the words we are using to create the cloud gives us the set of images to place. We then try to place each image horizontally (and if that fails vertically) on the spiral path:
on placeWord pNum
   local tVertical, tTime
   put 0 into tTime
   repeat while tTime < kLimit
      if sFilledTimes[tTime] is empty then
         // try the word horizontally first
         set the angle of image pNum to 0
         set the loc of image pNum to spiralLoc(tTime)
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         else
            // then vertically, at the same point on the spiral
            set the angle of image pNum to 90
         end if
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         end if
      end if
      add kIncrement to tTime
   end repeat
   put ("couldn't place word" && pNum) after msg
end placeWord
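Tying the handlers together might look something like the sketch below. The handler name makeCloud and the pMaxWords parameter are my own inventions for illustration, and the sketch assumes there are no other images on the card, so that image numbers match frequency order. It also assumes sFilledTimes has been cleared before each run.

```livecode
// Hypothetical driver: makeCloud and pMaxWords are illustrative names only.
// Relies on listFrequencies, createImage and placeWord as shown in the post.
on makeCloud pText, pMaxWords
   local tList, tLine, tMaxCount, tCount, tNum
   put listFrequencies(pText) into tList
   if tList is empty then exit makeCloud
   // keep only the most frequent words
   put line 1 to pMaxWords of tList into tList
   // the list is sorted by descending frequency, so line 1 has the top count
   put item 2 of line 1 of tList into tMaxCount
   // first pass: one image per word, created in frequency order
   repeat for each line tLine in tList
      createImage item 1 of tLine, (item 2 of tLine) / tMaxCount
   end repeat
   // second pass: place the images, most frequent first
   put the number of lines of tList into tCount
   repeat with tNum = 1 to tCount
      placeWord tNum
   end repeat
end makeCloud
```

Placing the most frequent word first means it claims the center of the spiral, with the rarer words pushed progressively outward.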
Watch the placement routine in action in this slightly annoying gif
These handlers together give a relatively pleasing word cloud, although I wanted to make a few simple tweaks to improve it. First of all, word clouds tend not to be circular but elliptical, so I introduced a flattening factor of 0.5 to the spiral equation. Also I felt the word sizes were not contrasted starkly enough, so I used the square of the weights instead. Sometimes words were being placed too close together, so I added a small white outer glow to the text to make such words fail the intersection test. Having initially coded them as constants, I then allowed the user to input various parameters like maximum font size, spiral flatness, maximum number of words, minimum word length, etc.
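The first two tweaks could be sketched roughly as follows. The names kFlatness, flattenedSpiralLoc and contrastWeight are made up for this example, and I'm assuming here that the flattening is applied to the vertical coordinate so the cloud ends up wider than it is tall.

```livecode
// Sketch only: kFlatness, flattenedSpiralLoc and contrastWeight are
// hypothetical names, and flattening the y axis is an assumption.
constant kFlatness = 0.5

function flattenedSpiralLoc pTime
   local tX, tY
   put pTime * sin(pTime) into tX
   // scaling y alone squashes the circular spiral into an elliptical one
   put kFlatness * pTime * cos(pTime) into tY
   return (tX & comma & tY)
end flattenedSpiralLoc

// Squaring a weight between 0 and 1 shrinks small weights more than large
// ones, which widens the gap between frequent and infrequent words
function contrastWeight pWeight
   return pWeight * pWeight
end contrastWeight
```

Note that once the spiral is flattened, the distance from the center at time t is no longer exactly t, although it still grows steadily with each turn.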
Here is the word cloud of this blog post!
What’s more, this being LiveCode 7.0, I was able to put text of all sorts of different languages into the text input box and everything worked exactly as expected. Here’s one with some sample text from various languages.
A Unicode compatible word cloud creator in under 160 readable lines – not many other languages can boast that sort of economy. What’s more, it only took a few hours to achieve something which by all accounts is distinctly non-trivial.
Many thanks to Ben and David who came up with some useful suggestions while I was coding this. If you can come up with any improvements, particularly to speed, I would be interested to hear them in the comments. Perhaps surprisingly, taking a snapshot of the group each time instead of looping through all previous images to find the intersection seems to slow things down.
Download my word cloud stack here.