Head In The (Word) Clouds

by Ali Lloyd on September 12, 2014

The word “cloud” was much in evidence at the RunRevLive Conference this year. This was primarily because of LiveCloud, a cloud database for LiveCode, whose aesthetically pleasing t-shirts were donned in abundance, and whose capabilities were amply demonstrated by the official conference app. This pervasiveness stood in stark contrast to that of the San Diego sky, which staunchly refused to have anything whatsoever to do with clouds for the whole week-and-a-bit.

So perhaps it was with those LiveCloud t-shirts in mind that I, whilst putting the finishing touches to the demo stack to accompany my talk and contemplating compelling use-cases for the trueWord chunk, decided to code a word cloud creator.

Well, I say ‘decided to code’; first I checked if anyone had done one already that I could steal. No dice. Then, in case anyone else had even thought about it, I threw the question out to some of the team: “How easy would it be to get LiveCode to make word clouds?” We weren’t sure, and I needed something by the morning. So I gave it a go.

Initially I attempted to define an algorithm for word placement, keeping track of the current border and repeatedly traversing it to find the next available space. This quickly got too complicated, with many additional considerations to do with overhangs and such; a more heuristic approach was necessary.

The idea of a word cloud is not only that the size of a word is related to its frequency, but also that the distance of the word from the center is, on the whole, inversely related to frequency. So once you have the words in frequency order, the ideal path on which to try and position them is a spiral.

The spiral equation can be written parametrically as x = t sin(t), y = t cos(t), which means that at a given time t, the distance from the origin of a point on the spiral is also t, since x² + y² = t²(sin²t + cos²t) = t². So when we search for positions to place words, we can search along the spiral for the next available position, knowing that we are uniformly increasing the distance from the center as we vary the angle.

Here’s the code for returning the location on the spiral at time pTime:

function spiralLoc pTime
   local tX, tY
   put pTime * sin(pTime) into tX
   put pTime * cos(pTime) into tY
   
   return (tX & comma & tY)
end spiralLoc
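
Note that spiralLoc returns coordinates relative to the origin (0,0), which in LiveCode is the top left of the card. As a minimal sketch, and assuming the cloud should be centered on the current card, a wrapper along these lines would shift the spiral into place. The name centeredSpiralLoc is hypothetical and does not come from the original stack.

```livecode
-- Hypothetical wrapper around spiralLoc: moves the spiral's origin
-- from the top left of the card to the middle of the card.
function centeredSpiralLoc pTime
   local tLoc, tX, tY
   put spiralLoc(pTime) into tLoc
   -- offset each coordinate by half the card size and round to whole pixels
   put round((item 1 of tLoc) + (the width of this card / 2)) into tX
   put round((item 2 of tLoc) + (the height of this card / 2)) into tY
   return (tX & comma & tY)
end centeredSpiralLoc
```

At pTime = 0 this returns the middle of the card, and as pTime grows the returned point winds outwards from there.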

When following this strategy, we need a way of testing whether words are overlapping. Initially I had thought about taking a snapshot of a field containing the text and changing the alpha value of all the white pixels so that they would be transparent. Then I’d have an image to use for hit testing using LiveCode’s intersect command. It was pointed out to me, however, that the import snapshot command includes transparency data if you use the …of object form, and so this was as simple as the following:

on snapshotField pWord
   set the opaque of field pWord to false
   set the showborder of field pWord to false	
   import snapshot from rectangle (the rect of field pWord) of field pWord with effects
end snapshotField

Then to test if we can place a word, we just use LiveCode’s intersect command to check whether it intersects any previous word at the proposed position:

function canPlace pNum
   -- the first word has nothing to collide with, so we can always place it
   if pNum is 1 then return true
   -- otherwise the word fits only if its image overlaps no earlier image
   return not imageIntersects(pNum)
end canPlace
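
The imageIntersects function is not shown above. A minimal sketch of what it might look like follows, assuming the word images are numbered in the order they are placed, so that images 1 to pNum - 1 are already in position. The "pixels" threshold is also an assumption: it asks intersect to compare non-transparent pixels rather than bounding rectangles.

```livecode
-- Hypothetical helper: loops over every previously placed image and
-- reports whether the image being placed overlaps any of them.
function imageIntersects pNum
   local tIndex
   repeat with tIndex = 1 to pNum - 1
      -- "pixels" makes intersect test non-transparent pixels, not just the rects
      if intersect(image tIndex, image pNum, "pixels") then
         return true
      end if
   end repeat
   return false
end imageIntersects
```

Because the snapshots keep their transparency, two words whose rectangles overlap can still sit next to each other as long as their visible pixels do not touch.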

We now have essentially all the ingredients for a LiveCode word cloud creator. First of all, a handler to generate the list of frequencies:

function listFrequencies pText
   local tArray, tList
   repeat for each trueWord tWord in pText
      add 1 to tArray[tWord]
   end repeat
   repeat for each key tKey in tArray
      put tKey & comma & tArray[tKey] & cr after tList
   end repeat
   sort tList descending numeric by item 2 of each
   return tList
end listFrequencies
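
As a quick illustration of the format this returns (one word,count pair per line, most frequent first), here is a small usage sketch. The mouseUp handler and the sample sentence are only for demonstration.

```livecode
on mouseUp
   local tList
   put listFrequencies("the cat sat on the mat") into tList
   -- "the" is the only word that occurs twice, so it sorts to the top:
   -- this puts the,2 into the message box
   put line 1 of tList
end mouseUp
```

The remaining words all have a count of 1, so their relative order depends on the order in which the array keys happen to be returned.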

Then a handler to create the image for a given word. The pWeight parameter is the frequency of the given word divided by the frequency of the most frequent word.

on createImage pWord, pWeight
   set the textSize of the templateField to pWeight*kMaxSize
   create field pWord
   set the text of field pWord to pWord
   set the height of field pWord to the formattedHeight of field pWord
   set the width of field pWord to the formattedWidth of field pWord
   snapshotField pWord
   delete field pWord
end createImage
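
To make the weight calculation concrete, here is a sketch of a driver that could sit on top of listFrequencies and createImage. The handler name createImages is hypothetical, and the sketch assumes kMaxSize is declared as a constant elsewhere in the script.

```livecode
-- Hypothetical driver: creates one image per word, most frequent first.
on createImages pText
   local tList, tMaxCount, tLine
   put listFrequencies(pText) into tList
   if tList is empty then exit createImages
   -- the list is sorted in descending order, so line 1 holds the largest count
   put item 2 of line 1 of tList into tMaxCount
   repeat for each line tLine in tList
      -- the weight is this word's count divided by the largest count
      createImage item 1 of tLine, (item 2 of tLine) / tMaxCount
   end repeat
end createImages
```

With this arrangement the most frequent word gets a weight of 1 and is drawn at kMaxSize, and every other word is scaled down in proportion.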

Repeating this for each of the words we are using to create the cloud gives us the set of images to place. We then try to place each image horizontally (and if that fails vertically) on the spiral path:

on placeWord pNum
   local tTime
   put 0 into tTime
   repeat while tTime < kLimit
      if sFilledTimes[tTime] is empty then 
         set the angle of image pNum to 0         
         set the loc of image pNum to spiralLoc(tTime)
         if canPlace(pNum) then
            put true into sFilledTimes[tTime] 
            exit placeWord
         else
            set the angle of image pNum to 90         
         end if
         if canPlace(pNum) then
            put true into sFilledTimes[tTime] 
            exit placeWord
         end if
      end if
      add kIncrement to tTime
   end repeat
   put ("couldn't place word" && pNum & cr) after msg
end placeWord
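
The placeWord handler relies on two constants and a script local variable that are not shown here. The following sketch shows how they might be declared, together with a loop that places every word. The values of kLimit and kIncrement and the handler name placeAllWords are assumptions, not the ones used in the actual stack.

```livecode
-- Hypothetical values: the real ones are not given in this post.
constant kLimit = 200        -- largest spiral parameter to try
constant kIncrement = 0.1    -- step along the spiral between attempts
local sFilledTimes           -- spiral positions already taken by a word

on placeAllWords
   local tNum
   -- start with no positions taken
   put empty into sFilledTimes
   -- assumes the card holds only the word images, most frequent word first
   repeat with tNum = 1 to the number of images of this card
      placeWord tNum
   end repeat
end placeAllWords
```

A smaller kIncrement packs the words more tightly at the cost of more intersection tests per word.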

Watch the placement routine in action in this slightly annoying gif

These handlers together give a relatively pleasing word cloud, although I wanted to make a few simple tweaks to improve it. First of all, word clouds tend not to be circular but elliptical, so I introduced a flattening factor of 0.5 to the spiral equation. I also felt the word sizes did not contrast starkly enough, so I used the square of the weights instead. Sometimes words were being placed too close together, so I added a small white outer glow to the text to make such words fail the intersection test. Having initially coded these values as constants, I then allowed the user to set parameters such as maximum font size, spiral flatness, maximum number of words and minimum word length.
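
As a rough sketch of the first two tweaks: the names kFlatness, flattenedSpiralLoc and emphasisedWeight are hypothetical, and applying the flattening factor to the vertical coordinate (giving a cloud that is wider than it is tall) is an assumption.

```livecode
constant kFlatness = 0.5   -- hypothetical name for the flattening factor

-- Elliptical variant of spiralLoc.
function flattenedSpiralLoc pTime
   local tX, tY
   put pTime * sin(pTime) into tX
   -- assumption: squash the vertical axis so the spiral becomes an ellipse
   put kFlatness * pTime * cos(pTime) into tY
   return (tX & comma & tY)
end flattenedSpiralLoc

-- Squaring a weight between 0 and 1 shrinks rare words more than common ones.
function emphasisedWeight pWeight
   return pWeight ^ 2
end emphasisedWeight
```

For example, a word that occurs half as often as the most frequent word would be drawn at a quarter of the maximum size rather than half.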

Here is the word cloud of this blog post!

What’s more, this being LiveCode 7.0, I was able to put text in all sorts of different languages into the text input box and everything worked exactly as expected. Here’s one with some sample text from various languages.

A Unicode compatible word cloud creator in under 160 readable lines – not many other languages can boast that sort of economy. What’s more, it only took a few hours to achieve something which by all accounts is distinctly non-trivial.

Many thanks to Ben and David who came up with some useful suggestions while I was coding this. If you can come up with any improvements, particularly to speed, I would be interested to hear them in the comments. Perhaps surprisingly, taking a snapshot of the group each time instead of looping through all previous images to find the intersection seems to slow things down.

Download my word cloud stack here.
