The word “cloud” was much in evidence at the RunRevLive Conference this year. This was primarily because of LiveCloud, a cloud database for LiveCode, whose aesthetically pleasing t-shirts were donned in abundance, and whose capabilities were amply demonstrated by the official conference app. This pervasiveness stood in stark contrast to that of the San Diego sky, which staunchly refused to have anything whatsoever to do with clouds for the whole week-and-a-bit.
So perhaps it was with those LiveCloud t-shirts in mind that I, whilst putting the finishing touches to the demo stack to accompany my talk and contemplating compelling use-cases for the trueWord chunk, decided to code a word cloud creator.
Well, I say ‘decided to code’; first I checked if anyone had done one already that I could steal. No dice. Then, in case anyone else had even thought about it, I threw the question out to some of the team: “How easy would it be to get LiveCode to make word clouds?” We weren’t sure, and I needed something by the morning. So I gave it a go.
Initially I attempted to define an algorithm for word placement, keeping track of the current border and repeatedly traversing it to find the next available space. This quickly got too complicated, with many additional considerations to do with overhangs and such; a more heuristic approach was necessary.
The idea of a word cloud is not only that the size of a word is related to its frequency, but also that the distance of the word from the center is, on the whole, inversely related to frequency. So once you have the words in frequency order, the ideal path on which to try and position them is a spiral.
The spiral equation can be written parametrically as x = t·sin(t), y = t·cos(t), which means that at a given time t ≥ 0, the distance from the origin of a point on the spiral is also t, since x² + y² = t²(sin²t + cos²t) = t². So when we search for positions to place words, we can search along the spiral for the next available position, knowing that we are uniformly increasing the distance from the center as we vary the angle.
Here’s the code for returning the location on the spiral at time pTime:
function spiralLoc pTime
   local tX, tY
   // the point at time pTime lies at distance pTime from the origin
   put pTime * sin(pTime) into tX
   put pTime * cos(pTime) into tY
   return (tX & comma & tY)
end spiralLoc
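As shown, spiralLoc spirals around the point 0,0. A minimal sketch of how the path might be centred on another point follows; the handler name spiralLocAround and the pCenter parameter are assumptions made for illustration and are not taken from the original stack.

```livecode
// Hypothetical variant of spiralLoc (not from the original stack):
// offsets the spiral by pCenter, an "x,y" point, and rounds to whole pixels.
function spiralLocAround pTime, pCenter
   local tX, tY
   put (item 1 of pCenter) + pTime * sin(pTime) into tX
   put (item 2 of pCenter) + pTime * cos(pTime) into tY
   return round(tX) & comma & round(tY)
end spiralLocAround
```

For example, spiralLocAround(0, "200,150") returns 200,150, since the spiral starts at distance 0 from its centre.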
When following this strategy, we need a way of testing whether words are overlapping. Initially I had thought about taking a snapshot of a field containing the text and changing the alpha value of all the white pixels so that they would be transparent. Then I’d have an image to use for hit testing using LiveCode’s intersect function. It was pointed out to me, however, that the import snapshot command includes transparency data if you use the …of object form, and so this was as simple as the following:
on snapshotField pWord
   // hide the field's background and border so only the text is captured
   set the opaque of field pWord to false
   set the showBorder of field pWord to false
   import snapshot from rectangle (the rect of field pWord) of field pWord with effects
end snapshotField
Then to test if we can place a word, we just use LiveCode’s intersect function to check whether it intersects any previous word at the proposed position:
function canPlace pNum, pTime
   // we can always place the first word
   if pNum is 1 then return true
   if imageIntersects(pNum) is false then return true
   return false
end canPlace
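The imageIntersects function called above is not shown in the post. One possible sketch, assuming the word images are numbered in placement order (so images 1 to pNum - 1 are already placed) and that the pixel-level form of intersect is what is wanted, might look like this:

```livecode
// Hypothetical sketch of imageIntersects; the original handler is not shown.
// Assumes images 1 .. pNum - 1 have already been placed on the card.
function imageIntersects pNum
   local tOther
   repeat with tOther = 1 to pNum - 1
      // "pixels" compares non-transparent pixels rather than bounding rects
      if intersect(image tOther, image pNum, "pixels") then
         return true
      end if
   end repeat
   return false
end imageIntersects
```

The choice of "pixels" over "opaque pixels" or "bounds" is a guess: it treats any partially transparent pixel as solid, so soft edges count as overlaps too.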
We now have essentially all the ingredients for a LiveCode word cloud creator. First of all, a handler to generate the list of frequencies:
function listFrequencies pText
   local tArray, tList
   // count the occurrences of each word
   repeat for each trueWord tWord in pText
      add 1 to tArray[tWord]
   end repeat
   // build a list of lines of the form word,count
   repeat for each key tKey in tArray
      put tKey & comma & tArray[tKey] & cr after tList
   end repeat
   // most frequent words first
   sort tList descending numeric by item 2 of each
   return tList
end listFrequencies
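As a quick check of the list format, the function can be called from a button script. This is only a usage sketch; the sample sentence is made up for illustration.

```livecode
// Usage sketch for listFrequencies with a made-up sample sentence
on mouseUp
   local tFrequencies
   put listFrequencies("the cat and the hat and the bat") into tFrequencies
   // each line has the form word,count; "the" occurs three times
   put line 1 of tFrequencies   // shows the,3 in the message box
end mouseUp
```

The order of words that share the same count (here cat, hat and bat) depends on the order in which the array keys are returned, so it should not be relied upon.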
Then a handler to create the image for a given word. The pWeight parameter is the frequency of the given word divided by the frequency of the most frequent word.
on createImage pWord, pWeight
   // scale the font size by the word's relative frequency
   set the textSize of the templateField to pWeight*kMaxSize
   create field pWord
   set the text of field pWord to pWord
   // shrink the field to fit the text exactly
   set the height of field pWord to the formattedHeight of field pWord
   set the width of field pWord to the formattedWidth of field pWord
   snapshotField pWord
   delete field pWord
end createImage
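The loop that calls createImage for each word is not shown in the post. A rough sketch of how the weights could be derived from the frequency list is given below; the handler name createImagesSketch and the pMaxWords parameter are assumptions, not the original code.

```livecode
// Hypothetical driver (not from the original stack): creates one image
// per word, weighting each by its frequency relative to the top word.
on createImagesSketch pText, pMaxWords
   local tList, tMaxFreq, tLine, tWord, tWeight, tCount
   put listFrequencies(pText) into tList
   // the list is sorted, so line 1 holds the most frequent word
   put item 2 of line 1 of tList into tMaxFreq
   put 0 into tCount
   repeat for each line tLine in tList
      add 1 to tCount
      if tCount > pMaxWords then exit repeat
      put item 1 of tLine into tWord
      put (item 2 of tLine) / tMaxFreq into tWeight
      createImage tWord, tWeight
   end repeat
end createImagesSketch
```

With this weighting the most frequent word always gets a weight of 1, and therefore the maximum font size.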
Repeating this for each of the words we are using to create the cloud gives us the set of images to place. We then try to place each image horizontally (and if that fails vertically) on the spiral path:
on placeWord pNum
   local tVertical, tTime
   put 0 into tTime
   // walk outwards along the spiral until the word fits
   repeat while tTime < kLimit
      if sFilledTimes[tTime] is empty then
         // try the word horizontally first
         set the angle of image pNum to 0
         set the loc of image pNum to spiralLoc(tTime)
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         else
            // then try it vertically at the same location
            set the angle of image pNum to 90
         end if
         if canPlace(pNum) then
            put true into sFilledTimes[tTime]
            exit placeWord
         end if
      end if
      add kIncrement to tTime
   end repeat
   put ("couldn't place word" && pNum) after msg
end placeWord
Watch the placement routine in action in this slightly annoying gif
These handlers together give a relatively pleasing word cloud, although I wanted to make a few simple tweaks to improve it. First of all, word clouds tend not to be circular but elliptical, so I introduced a flattening factor of 0.5 to the spiral equation. Also I felt the word sizes were not contrasted starkly enough, so I used the square of the weights instead. Sometimes words were being placed too close together, so I added a small white outer glow to the text to make such words fail the intersection test. Having initially coded them as constants, I then allowed the user to input various parameters like maximum font size, spiral flatness, maximum number of words, minimum word length, etc.
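The first two tweaks might be sketched as follows. The post does not say which axis the flattening factor is applied to, so scaling the vertical coordinate is an assumption, and the names flatSpiralLoc, pFlatness and contrastWeight are hypothetical.

```livecode
// Hypothetical variant of spiralLoc with a flattening factor.
// pFlatness = 1 gives the original circular spiral; 0.5 halves its height.
function flatSpiralLoc pTime, pFlatness
   local tX, tY
   put pTime * sin(pTime) into tX
   put pFlatness * pTime * cos(pTime) into tY
   return (tX & comma & tY)
end flatSpiralLoc

// Squaring a weight between 0 and 1 exaggerates the size differences:
// 1 stays 1, while 0.5 becomes 0.25.
function contrastWeight pWeight
   return pWeight * pWeight
end contrastWeight
```

Note that once the spiral is flattened, the distance from the centre is no longer exactly pTime; it varies between pFlatness * pTime and pTime depending on the angle.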
Here is the word cloud of this blog post!
What’s more, this being LiveCode 7.0, I was able to put text of all sorts of different languages into the text input box and everything worked exactly as expected. Here’s one with some sample text from various languages.
A Unicode compatible word cloud creator in under 160 readable lines – not many other languages can boast that sort of economy. What’s more, it only took a few hours to achieve something which by all accounts is distinctly non-trivial.
Many thanks to Ben and David who came up with some useful suggestions while I was coding this. If you can come up with any improvements, particularly to speed, I would be interested to hear them in the comments. Perhaps surprisingly, taking a snapshot of the group each time instead of looping through all previous images to find the intersection seems to slow things down.
26 comments
gheizhwinder - September 12, 2014
RT @runrev: Head In The (Word) Clouds: The word “cloud” was much in evidence at the RunRevLive Conference this year. Th… http://t.co/3wx…
Larry Walker - September 12, 2014
Outstandingly clever hack, Ali!
I think this post is a perfect example of the ingenuity of LiveCoders and the amazing power, brevity, and clarity of the LiveCode language. I find that, cleverly applied, LiveCode allows one to embrace the problem, instead of having to shoe-horn the problem into the language…
Nicely done!
Larry
Ali Lloyd - September 12, 2014
Thanks Larry!
I completely agree. Knowing how easy it is to actually implement a solution once you have it gives you that much more time to mull it over beforehand. Which in turn leads to much neater ideas.
Pelayo Milera - September 12, 2014
I love RunRev and I’ve used it for over a dozen years. I stand by it 100%.
python - September 24, 2014
Cool. Ur the god of coding! What you have done is cool.
Nick - October 3, 2014
Great article Ali. Clever code and LiveCode make for a powerful combo. “Tag Clouds” and “Word Clouds” seem to have fallen out of fashion but your take on placement is a breath of fresh air and makes for an awesome demo. The unicode co-mingled language word cloud is just icing on an already delicious cake.
Thanks for sharing!
Lee - October 24, 2014
Is there a way of adding some colour selections to this and a way to save them as a high res image file?
Ali - October 24, 2014
Hi Lee,
Yes this is possible. In the createImages handler, you can add something like this:
set the textColor of the templateField to "red"
or whatever colour you want. For a user defined colour, you can use the answer color command in a button to prompt a color selection (although obviously it wouldn’t be advisable to put this in the repeat loop!). Alternatively you could have a list of colours and make the text a random colour, eg:
(in createImages)
local tColors
put the colorNames into tColors
(in the repeat loop)
set the textColor of the templateField to any line of tColors
In terms of exporting, you just need to add the line
export snapshot from rect (the rect of group "Cloud") of group "Cloud" to file (specialFolderPath("Desktop") & slash & "Cloud.png") as PNG
somewhere at the end and that will export an image of the cloud to your desktop.
Here is an example using the random colours:
https://dl.dropboxusercontent.com/u/32827558/Cloud.png
Good suggestion, I kind of wish I had done that in the first place!
Ali
Pauli - November 25, 2014
Hi, I tried this and typed in the following English-French-nonsense just to test: “Je me suis reveillé ce matin aussi a’jourd hui do you believe it”.
Yes it is all wrong and garbage and just nonsense. But I typed this to test this code only.
It keeps the font size at the maximum and writes the texts over the boundaries. All the texts are simply horizontal. It does not seem to work at all as described above.
Am I doing something wrong or what is it that I don’t understand?
Ali Lloyd - November 26, 2014
Hi Pauli,
The reason it doesn’t work very well with that input is that the sizes are dependent on word frequency. Since all of your words there occur exactly once, they will all be the same size (determined by the max font size property).
If you just wanted to create a word cloud with specific words in, the code would need to be changed a little – for example you could modify the listFrequencies function to assign certain weighting to each word.
Ali
Pauli - November 27, 2014
Thanks. I had misunderstood how this works. It seems to do the trick if the weighting depends on the word length, for example.
I also seem to struggle to understand how 7.0 handles unicode. If I want to enter the text from a script, if I do
set the unicodeText of field “Text” to uniencode(“text i want to enter is here, and a French à”)
it does not work properly: it handles the characters as words. If I enter
set the text of field “Text” to “text i want to enter is here, and a French à”
the last character is not processed correctly, not as unicode. Sorry for this basic question but instead of typing in the field, how can I enter unicode characters correctly into it so that your fantastic WordCloud works?
Ali - November 28, 2014
Hi Pauli,
You shouldn’t need to do anything with uniEncode / uniDecode or the useUnicode property. You should simply just be able to type into a field (eg with alt + key for some characters), or copy and paste them from any source.
The problem you are experiencing may be the curly quotes in
set the text of field “Text” to “text i want to enter is here, and a French à”
Strings must be surrounded by ASCII quote characters " rather than “ and ”.
set the text of field "Text" to "text i want to enter is here, and a French à" should work from script or in the message box.
Let me know if this resolves your problem.
Ali
Ali - November 29, 2014
Ah, I see quotation marks are auto converted on this site, so that may not have been the problem …
Pauli - December 13, 2014
If I type in the field “Text” it works properly. But I have that text in UTF8 format in file French.txt.
put URL "file:Z:/French.txt" into x
set the text of field "Text" of card "WordArt" to uniencode(x,"UTF8")
set the text of field "Text" of card "WordArt" to quote & uniencode(x,"UTF8") & quote
set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")
None of these work properly. The last one did in version 6. What is the proper syntax to use in version 7?
Pauli - December 13, 2014
I forgot to add that the most simple candidate
set the text of field "Text" of card "WordArt" to x
does not work, either, when x is in UTF8. Any special characters such as é disappear that way.
Pauli - December 13, 2014
Sorry I said it wrong: actually the old format
set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")
might still be the one that works here.
Ali - December 13, 2014
Sorry, it doesn’t seem to want to let me reply to your most recent post.
I didn’t realise your text was coming from an external source as utf-8. If you have text in utf-8 in a file, then you can do the following
put url ("binfile:Z:/French.txt") into tUTF8Data
put textDecode(tUTF8Data, "utf-8") into field 1
The utf-8 data should display properly then.
Pauli - December 13, 2014
Thanks! I’ll try this. The old format where I use file instead of binfile and then
set the unicodeText of field "Text" of card "WordArt" to uniencode(x,"UTF8")
seems to work, too.