Head In The (Word) Clouds

by Ali Lloyd on September 12, 2014 26 comments

The word “cloud” was much in evidence at the RunRevLive Conference this year. This was primarily because of LiveCloud, a cloud database for LiveCode, whose aesthetically pleasing t-shirts were donned in abundance, and whose capabilities were amply demonstrated by the official conference app. This pervasiveness stood in stark contrast to that of the San Diego sky, which staunchly refused to have anything whatsoever to do with clouds for the whole week-and-a-bit.

So perhaps it was with those LiveCloud t-shirts in mind that I, whilst putting the finishing touches to the demo stack to accompany my talk and contemplating compelling use-cases for the trueWord chunk, decided to code a word cloud creator.

Well, I say ‘decided to code’; first I checked if anyone had done one already that I could steal. No dice. Then, in case anyone else had even thought about it, I threw the question out to some of the team: “How easy would it be to get LiveCode to make word clouds?” We weren’t sure, and I needed something by the morning. So I gave it a go.

Initially I attempted to define an algorithm for word placement, keeping track of the current border and repeatedly traversing it to find the next available space. This quickly got too complicated, with many additional considerations to do with overhangs and such, so a more heuristic approach was necessary.

The idea of a word cloud is not only that the size of a word is related to its frequency, but also that the distance of the word from the center is, on the whole, inversely related to frequency. So once you have the words in frequency order, the ideal path on which to try and position them is a spiral.

The spiral equation can be written parametrically as x = t sin(t), y = t cos(t), which means that at a given time t, the distance from the origin of a point on the spiral is also t, since x² + y² = t²(sin²t + cos²t) = t². So when we search for positions to place words, we can search along the spiral for the next available position, knowing that we are uniformly increasing the distance from the center as we vary the angle.

Here’s the code for returning the location on the spiral at time pTime:

function spiralLoc pTime
   local tX, tY
   put pTime * sin(pTime) into tX
   put pTime * cos(pTime) into tY
   
   return (tX & comma & tY)
end spiralLoc

When following this strategy, we need a way of testing whether words are overlapping. Initially I had thought about taking a snapshot of a field containing the text and changing the alpha value of all the white pixels so that they would be transparent. Then I’d have an image to use for hit testing using LiveCode’s intersect function. It was pointed out to me, however, that the import snapshot command includes transparency data if you use the …of object form, and so this was as simple as the following:

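on snapshotField pWord
   -- make the field background transparent so only the text is captured
   set the opaque of field pWord to false
   set the showBorder of field pWord to false
   -- the "of object" form keeps the transparency data in the snapshot
   import snapshot from rectangle (the rect of field pWord) of field pWord with effects
end snapshotField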

Then to test if we can place a word, we just use LiveCode’s intersect function to check whether it intersects any previous word at the proposed position:

function canPlace pNum
   // we can always place the first word
   if pNum is 1 then return true
   // otherwise the word fits only if it overlaps no earlier word
   return not imageIntersects(pNum)
end canPlace
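
The imageIntersects function is not shown in the post. A minimal sketch of how it might look is given here, assuming the word images are numbered in the order they were created, so that images 1 to pNum - 1 are the words already placed. The loop over previous images and the "pixels" threshold are assumptions, not the code from the stack.

```livecode
-- Hypothetical helper: canPlace calls imageIntersects, but the post does not show it.
-- Assumes images 1 to pNum - 1 are the words that have already been placed.
function imageIntersects pNum
   local tIndex
   repeat with tIndex = 1 to pNum - 1
      -- the "pixels" threshold compares non-transparent pixels
      -- rather than the bounding rectangles of the two images
      if intersect(image pNum, image tIndex, "pixels") then
         return true
      end if
   end repeat
   return false
end imageIntersects
```

Testing at pixel level rather than on bounds is what lets a short word tuck into the empty corner of a larger word's rectangle.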

We now have essentially all the ingredients for a LiveCode word cloud creator. First of all, a handler to generate the list of frequencies:

function listFrequencies pText
   local tArray, tList
   -- count how many times each word occurs
   repeat for each trueWord tWord in pText
      add 1 to tArray[tWord]
   end repeat
   -- flatten the array into "word,count" lines
   repeat for each key tKey in tArray
      put tKey & comma & tArray[tKey] & cr after tList
   end repeat
   -- most frequent words first
   sort lines of tList descending numeric by item 2 of each
   return tList
end listFrequencies
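
As a quick check of the output format, something like the following could be put in a button script. The sample sentence is made up for illustration, and the trueWord chunk needs LiveCode 7.0 or later.

```livecode
-- Illustrative only: the sample sentence is invented for this example
on mouseUp
   local tFrequencies
   put listFrequencies("the cat sat on the mat with the other cat") into tFrequencies
   -- shows the two most frequent words in the message box:
   -- "the,3" followed by "cat,2"
   put line 1 to 2 of tFrequencies
end mouseUp
```

The remaining words all occur once, so their relative order in the list is not guaranteed.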

Then a handler to create the image for a given word. The pWeight parameter is the frequency of the given word divided by the frequency of the most frequent word.

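on createImage pWord, pWeight
   -- kMaxSize is the font size used for the most frequent word
   set the textSize of the templateField to pWeight*kMaxSize
   create field pWord
   set the text of field pWord to pWord
   -- shrink the field to fit the word exactly
   set the height of field pWord to the formattedHeight of field pWord
   set the width of field pWord to the formattedWidth of field pWord
   -- capture the word as an image, then discard the temporary field
   snapshotField pWord
   delete field pWord
end createImage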
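
A rough sketch of a driver that ties listFrequencies and createImage together might look like this. The handler name createWordImages and the pMaxWords parameter are hypothetical, chosen for illustration, and the sketch assumes pText contains at least one word.

```livecode
-- Hypothetical driver: names and parameters here are assumptions
on createWordImages pText, pMaxWords
   local tList, tMaxCount, tLine, tCount
   put listFrequencies(pText) into tList
   -- the list is sorted, so line 1 holds the most frequent word
   put item 2 of line 1 of tList into tMaxCount
   put 0 into tCount
   repeat for each line tLine in tList
      add 1 to tCount
      if tCount > pMaxWords then exit repeat
      -- weight = this word's frequency / the highest frequency
      createImage item 1 of tLine, (item 2 of tLine) / tMaxCount
   end repeat
end createWordImages
```

Because the images are created in frequency order, image 1 is the most frequent word, which matches the numbering that placeWord relies on.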

Repeating this for each of the words we are using to create the cloud gives us the set of images to place. We then try to place each image horizontally (and if that fails vertically) on the spiral path:

on placeWord pNum
   local tTime
   put 0 into tTime
   repeat while tTime < kLimit
      -- skip spiral positions already taken by an earlier word
      if sFilledTimes[tTime] is empty then
         -- try the word horizontally first
         set the angle of image pNum to 0
         set the loc of image pNum to spiralLoc(tTime)
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         end if
         -- otherwise try it vertically at the same position
         set the angle of image pNum to 90
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         end if
      end if
      add kIncrement to tTime
   end repeat
   put "couldn't place word" && pNum & cr after msg
end placeWord

Watch the placement routine in action in this slightly annoying gif

These handlers together give a relatively pleasing word cloud, although I wanted to make a few simple tweaks to improve it. First of all, word clouds tend not to be circular but elliptical, so I introduced a flattening factor of 0.5 to the spiral equation. Also I felt the word sizes were not contrasted starkly enough, so I used the square of the weights instead. Sometimes words were being placed too close together, so I added a small white outer glow to the text to make such words fail the intersection test. Finally, having initially coded them as constants, I allowed the user to set parameters such as maximum font size, spiral flatness, maximum number of words and minimum word length.
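
The first two tweaks could be sketched roughly as follows. The function names are hypothetical, and applying the flattening factor to the vertical coordinate only is an assumption, since the post does not say which axis was scaled.

```livecode
-- Assumption: the flattening factor scales the vertical coordinate only,
-- so a factor of 0.5 gives a spiral twice as wide as it is tall
function flatSpiralLoc pTime, pFlatness
   local tX, tY
   put pTime * sin(pTime) into tX
   put pFlatness * pTime * cos(pTime) into tY
   return (tX & comma & tY)
end flatSpiralLoc

-- Squaring a weight between 0 and 1 leaves the top word unchanged
-- and shrinks rarer words proportionally more
function contrastWeight pCount, pMaxCount
   return (pCount / pMaxCount) ^ 2
end contrastWeight
```

For example, a word half as frequent as the top word would get a weight of 0.25 instead of 0.5.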

Here is the word cloud of this blog post!

What’s more, this being LiveCode 7.0, I was able to put text of all sorts of different languages into the text input box and everything worked exactly as expected. Here’s one with some sample text from various languages.

A Unicode compatible word cloud creator in under 160 readable lines – not many other languages can boast that sort of economy. What’s more, it only took a few hours to achieve something which by all accounts is distinctly non-trivial.

Many thanks to Ben and David who came up with some useful suggestions while I was coding this. If you can come up with any improvements, particularly to speed, I would be interested to hear them in the comments. Perhaps surprisingly, taking a snapshot of the group each time instead of looping through all previous images to find the intersection seems to slow things down.

Download my word cloud stack here.

26 comments

Join the conversation
  • Larry Walker - September 12, 2014

    Outstandingly clever hack, Ali!

    I think this post is a perfect example of the ingenuity of LiveCoders and the amazing power, brevity, and clarity of the LiveCode language. I find that, cleverly applied, LiveCode allows one to embrace the problem, instead of having to shoe-horn the problem into the language…

    Nicely done!

    Larry

    Ali Lloyd - September 12, 2014

    Thanks Larry!

    I completely agree. Knowing how easy it is to actually implement a solution once you have it gives you that much more time to mull it over beforehand. Which in turn leads to much neater ideas.

  • python - September 24, 2014

    Cool. Ur the god of coding! What you have done is cool.

  • Nick - October 3, 2014

    Great article Ali. Clever code and LiveCode make for a powerful combo. “Tag Clouds” and “Word Clouds” seem to have fallen out of fashion but your take on placement is a breath of fresh air and makes for an awesome demo. The unicode co-mingled language word cloud is just icing on an already delicious cake.

    Thanks for sharing!

  • Lee - October 24, 2014

    Is there a way of adding some colour selections to this and a way to save them as a high res image file?

    Ali - October 24, 2014

    Hi Lee,

    Yes this is possible. In the createImages handler, you can add something like this:

    set the textColor of the templatefield to "red"

    or whatever colour you want. For a user defined colour, you can use the answer color command in a button to prompt a color selection (although obviously it wouldn’t be advisable to put this in the repeat loop!). Alternatively you could have a list of colours and make the text a random colour, eg:

    (in createImages)
    local tColors
    put the colorNames into tColors

    (in the repeat loop)
    set the textColor of the templatefield to any line of tColors

    In terms of exporting, you just need to add the line

    export snapshot from rect (the rect of group "Cloud") of group "Cloud" to file (specialfolderpath("Desktop") & slash & "Cloud.png") as png

    somewhere at the end and that will export an image of the cloud to your desktop.

    Here is an example using the random colours:
    https://dl.dropboxusercontent.com/u/32827558/Cloud.png

    Good suggestion, I kind of wish I had done that in the first place!

    Ali

  • Pauli - November 25, 2014

    Hi, I tried this and typed in the following English-French-nonsense just to test: “Je me suis reveillé ce matin aussi a’jourd hui do you believe it”.

    Yes it is all wrong and garbage and just nonsense. But I typed this to test this code only.

    It keeps the font size at the maximum and writes the texts over the boundaries. All the texts are simply horizontal. It does not seem to work at all as described above.

    Am I doing something wrong or what is it that I don’t understand?

    Ali Lloyd - November 26, 2014

    Hi Pauli,

    The reason it doesn’t work very well with that input is that the sizes are dependent on word frequency. Since all of your words there occur exactly once, they will all be the same size (determined by the max font size property).

    If you just wanted to create a word cloud with specific words in, the code would need to be changed a little – for example you could modify the listFrequencies function to assign certain weighting to each word.

    Ali

    Pauli - November 27, 2014

    Thanks. I had misunderstood how this works. It seems to do the trick if the weighting depends on the word length, for example.

    I also seem to struggle to understand how 7.0 handles unicode. If I want to enter the text from a script, if I do

    set the unicodeText of field “Text” to uniencode(“text i want to enter is here, and a French à”)

    it does not work properly: it handles the characters as words. If I enter

    set the text of field “Text” to “text i want to enter is here, and a French à”

    the last character is not processed correctly, not as unicode. Sorry for this basic question but instead of typing in the field, how can I enter unicode characters correctly into it so that your fantastic WordCloud works?

    Ali - November 28, 2014

    Hi Pauli,

    You shouldn’t need to do anything with uniEncode / uniDecode or the useUnicode property. You should simply just be able to type into a field (eg with alt + key for some characters), or copy and paste them from any source.

    The problem you are experiencing may be the curly quotes in
    set the text of field “Text” to “text i want to enter is here, and a French à”

    Strings must be surrounded by ASCII quote characters (") rather than “ and ”.

    set the text of field "Text" to "text i want to enter is here, and a French à" should work from script or in the message box.

    Let me know if this resolves your problem.

    Ali

    Ali - November 29, 2014

    Ah, I see quotation marks are auto converted on this site, so that may not have been the problem …

    Pauli - December 13, 2014

    If I type in the field “Text” it works properly. But I have that text in UTF8 format in file French.txt.

    put URL "file:Z:/French.txt" into x

    set the text of field "Text" of card "WordArt" to uniencode(x,"UTF8")
    set the text of field "Text" of card "WordArt" to quote & uniencode(x,"UTF8") & quote
    set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")

    None of these work properly. The last one did in version 6. What is the proper syntax to use in version 7?

    Pauli - December 13, 2014

    I forgot to add that the most simple candidate

    set the text of field "Text" of card "WordArt" to x

    does not work, either, when x is in UTF8. Any special character such as é disappears that way.

    Pauli - December 13, 2014

    Sorry I said it wrong: actually the old format

    set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")

    might still be the one that works here.

    Ali - December 13, 2014

    Sorry, it doesn’t seem to want to let me reply to your most recent post.

    I didn’t realise your text was coming from an external source as utf-8. If you have text in utf-8 in a file, then you can do the following

    put url ("binfile:Z:/French.txt") into tUTF8Data
    put textDecode(tUTF8Data, "utf-8") into field 1

    the utf-8 data should display properly then.

    Pauli - December 13, 2014

    Thanks! I’ll try this. The old format where I use file instead of binfile and then

    set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")

    seems to work, too.
