If you were a Kool-Aid kid, then chances are you are familiar with Dr Seuss and the bizarre, illogical language he uses in his books. What a wonderful mind, to think outside the box in such a beautifully unconventional, educational and humorous fashion. How could one possibly dream up sentences like “One fish two fish red fish blue fish…”? This is hardly your typical everyday level of conversation.
In this blog I demonstrate how computers can assist us in constructing sentences with similarly bizarre meanings.
As hardware has become more powerful and the algorithms explored have become more advanced, computers have been shown to mimic limited reasoning abilities that are most often associated with intelligent beings. I know I am touching on a controversial area here, and I am steering clear of suggestions of artificial intelligence or artificial self-awareness. I make a short reference to the Turing Test here, for the interested reader. This test assesses if a computer can convincingly mimic or simulate the communicative responses of a human being.
It would take a bit more effort to implement an algorithm that is worthy of being entered into a Turing Test than we can cover here, but we will be looking at one computational aspect that may well be part of a larger solution.
If you read our blogs on a regular basis you will know that we have touched on different ways of implementing number sequence generators in the blog “Recursion for Runaways”. We are building on the concepts outlined there to create a Natural Language Sentence Generator.
Before we can get our hands dirty with a bit of LiveCode, we need to take a step back and understand some of the basics that underlie the English language.
If you are starting to get a déjà vu experience of some long forgotten classes at school, do not worry. We are only scratching the surface of the topics involved, leaving you to explore the relevant fields in more depth at your leisure.
According to the Global Language Monitor, the English language passed the one-million-word mark on June 10, 2009, with 14.7 new words being added every day. Our application is not going to utilize all of these words; in fact, we are only going to use a very small subset of them to build our Natural Language Generator. But in order to utilize any number of words we need to understand how groups of words are arranged in the English language.
Traditional grammar groups English words into eight “Parts of Speech”: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections. Articles are often counted with the adjectives, but we treat them as a group of their own here. The order in which these “Parts of Speech” are arranged in sentences is determined by rules. For example, a very basic sentence can consist of an article, a noun and a verb, arranged in that order:
“A man walks.”
or
“The cat sleeps.”
By interchanging the articles, nouns and verbs you can create a large number of sentences with different meanings. From a coding point of view we may store the words in some kind of storage mechanism, for example variables:
article = "A,The"
noun = "man,cat"
verb = "walks,sleeps"
This could form part of a word database that may allow us to create the two sentences shown, but it would also allow you to create further new sentences, for example:
“The man sleeps.”
“The cat walks.”
“A cat walks.”
In order to build the English Language Generator we use a database of words, similar to the ones used in the examples shown here. Randomly picking words from the “Part of Speech” entries and placing them in the “article, noun, verb” order is then a perfectly workable approach, but this arrangement would limit our expressive creativity somewhat and probably be quite boring after a while. What we need is some formalism or grammar that allows us to present and order the words in a more flexible fashion. This would allow us to only make changes to the grammar in order to control how the words are ordered.
We use an array as a database for both the words and the grammar rules and call this database “DCG”:
local DCG
We then add the rule to the database that we used to create the aforementioned sentences:
put "-art -n -v" into DCG["-s"]
The tokens are “-art” (article), “-n” (noun) and “-v” (verb) for a sentence “-s”.
Next we populate the database with words. This time we add a few more words, giving us a bit more freedom to create sentences:
put "the,a" into DCG["-art"]
put "man,woman,child,cat,dog,pie" into DCG["-n"]
put "walks,eats,combs,cooks" into DCG["-v"]
Using our rule and understanding of how basic sentences are constructed in the English language, we can now create sentences like the ones we have already discussed. With the word database shown here, and the basic sentence rule, we can create a total of 48 unique sentences.
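The figure of 48 is simply 2 articles × 6 nouns × 4 verbs. As a quick cross-check outside LiveCode, here is a small Python sketch that enumerates every combination. The names `words`, `rule` and `sentences` are my own and are not part of the LiveCode script.

```python
from itertools import product

# Word lists copied from the post; the dictionary layout is an assumption
# made for this sketch, not the LiveCode array itself.
words = {
    "-art": ["the", "a"],
    "-n": ["man", "woman", "child", "cat", "dog", "pie"],
    "-v": ["walks", "eats", "combs", "cooks"],
}
rule = ["-art", "-n", "-v"]  # the "-s" rule: article, noun, verb

# Build every article/noun/verb combination in rule order.
sentences = [" ".join(combo) for combo in product(*(words[t] for t in rule))]

print(len(sentences))  # 48
print(sentences[0])    # the man walks
```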
Now let us look at how LiveCode can take this kind of database and convert the content into sentences:
function evaluateSegment pSegment
   repeat for each word tItem in pSegment
      if tItem begins with "-" then
         put evaluateSegment (item random(the number of items in DCG[tItem]) of DCG[tItem]) after tResult
      else
         put tItem & space after tResult
      end if
   end repeat
   return tResult
end evaluateSegment
Yes, that is it. These ten lines of code are our English Language Generator. It processes the database and randomly selects words to create sentences that are grammatically correct, based on the grammar rules we presented. If you look closely you can see that this function is recursive. The recursive design allows us to create this compact implementation. It also gives us powerful means by which a more complex grammar can be interpreted and processed.
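If you would like to follow the recursion in another language, the function can be ported almost line by line. The Python sketch below is such a port, not the LiveCode original: `evaluate_segment` and its optional `rng` parameter are names I have assumed for illustration. A token that starts with "-" is looked up in the database and one of its comma-separated alternatives is expanded recursively; anything else is treated as a word and appended to the result.

```python
import random

# The same grammar rule and word database as in the post, held in a dict.
DCG = {
    "-s": "-art -n -v",
    "-art": "the,a",
    "-n": "man,woman,child,cat,dog,pie",
    "-v": "walks,eats,combs,cooks",
}


def evaluate_segment(segment, rng=random):
    """Expand a space-separated segment of tokens and words into text."""
    result = ""
    for token in segment.split():
        if token.startswith("-"):
            # A rule or word-list token: pick one alternative at random
            # and expand it recursively.
            alternatives = DCG[token].split(",")
            result += evaluate_segment(rng.choice(alternatives), rng)
        else:
            # A plain word: append it, followed by a space.
            result += token + " "
    return result


print(evaluate_segment("-s").strip())
```

Like the LiveCode version, the function leaves a trailing space after the last word, which the final line strips before printing.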
You can put this generator together by creating a button and adding the following code to the button script:
local DCG

on mouseUp
   // Grammar Rules
   put "-art -n -v" into DCG["-s"]
   // Word Database
   put "the,a" into DCG["-art"]
   put "man,woman,child,cat,dog,pie" into DCG["-n"]
   put "walks,eats,combs,cooks" into DCG["-v"]
   // Create Sentence
   put evaluateSegment ("-s")
end mouseUp

function evaluateSegment pSegment
   repeat for each word tItem in pSegment
      if tItem begins with "-" then
         put evaluateSegment (item random(the number of items in DCG[tItem]) of DCG[tItem]) after tResult
      else
         put tItem & space after tResult
      end if
   end repeat
   return tResult
end evaluateSegment
This script creates the database each time you hit the button, allowing you to update the entries and apply them for each run. This code opens the message box and displays a sentence.
Okay nice, we can randomly create sentences, but as you can see, this blog has not quite reached the end yet. Let us now move on to creating sentences with a bit more complexity and start to utilize the recursive nature of the “evaluateSegment” function.
In order to update your code and implement the following changes you only have to replace the code in the // Grammar Rules and // Word Database sections of your “mouseUp” handler.
First we update the // Grammar Rules and add the concept of a “noun phrase” (-np):
put "-np -v" into DCG["-s"]
put "-art -n" into DCG["-np"]
Fundamentally, this grammar generates exactly the same sentences as the rule we used earlier. The “-art -n” part of the sentence rule has been replaced with a “noun phrase” token that itself expands to “-art -n”. This means that “evaluateSegment” has to complete the “-np” rule before it can complete the “-s” rule.
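The claim that both grammars produce the same sentences can be checked by listing everything each one can generate. The following Python sketch does that under a few assumptions of my own: the helper `expand` and the dictionaries `WORDS`, `FLAT` and `NESTED` are illustrative names, and exhaustive listing only works here because neither grammar refers back to itself.

```python
from itertools import product

WORDS = {
    "-art": "the,a",
    "-n": "man,woman,child,cat,dog,pie",
    "-v": "walks,eats,combs,cooks",
}
FLAT = {**WORDS, "-s": "-art -n -v"}                     # original rule
NESTED = {**WORDS, "-s": "-np -v", "-np": "-art -n"}     # with noun phrase


def expand(token, grammar):
    """Return every string that a token can produce under a grammar."""
    if not token.startswith("-"):
        return [token]
    sentences = []
    for alternative in grammar[token].split(","):
        # Expand each part of the alternative, then combine the parts
        # in every possible way.
        parts = [expand(t, grammar) for t in alternative.split()]
        sentences.extend(" ".join(combo) for combo in product(*parts))
    return sentences


flat = expand("-s", FLAT)
nested = expand("-s", NESTED)
print(len(flat), len(nested), set(flat) == set(nested))  # 48 48 True
```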
Now why are we creating something more complicated when the simple sentence rule was sufficient?
Well, this new representation allows us to expand the grammar even further. We can now also add a “verb phrase” (-vp) to our grammar replacing the // Grammar Rules with the following syntax:
put "-np -vp" into DCG["-s"]
put "-art -n" into DCG["-np"]
put "-v -np,-v" into DCG["-vp"]
A “verb phrase” can be a verb, as with our original grammar, but it can now also be a verb and a “noun phrase”, which in turn is an article followed by a noun. So what kind of sentences does this new grammar give us? It produces the same sentences as before but can also generate more complex sentences, such as:
“A man walks a dog.”
or
“The child eats a pie.”
We are now getting to a point where the grammar can use different numbers of words to construct English sentences. That is where recursion is in its element.
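To get a feel for how much the “verb phrase” rule enlarges the grammar, we can count the possibilities instead of listing them: the count for a token is the sum over its alternatives of the product of the counts of their parts. The Python sketch below applies that idea; the function name `count` is my own. Strictly speaking it counts derivations, which coincide with distinct sentences for this small, unambiguous grammar.

```python
from math import prod

DCG = {
    "-s": "-np -vp",
    "-np": "-art -n",
    "-vp": "-v -np,-v",
    "-art": "the,a",
    "-n": "man,woman,child,cat,dog,pie",
    "-v": "walks,eats,combs,cooks",
}


def count(token):
    """Number of different expansions of a token (1 for a plain word)."""
    if not token.startswith("-"):
        return 1
    # Sum over the alternatives; multiply over the parts of each one.
    return sum(
        prod(count(part) for part in alternative.split())
        for alternative in DCG[token].split(",")
    )


print(count("-np"), count("-vp"), count("-s"))  # 12 52 624
```

A noun phrase has 2 × 6 = 12 forms, a verb phrase has 4 × 12 + 4 = 52, and a sentence therefore has 12 × 52 = 624, of which 48 are the three-word sentences from the original rule.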
This is progress, but let us look at even more complex constructs that can create sentences with thousands of words. Meet “conjunctions”: words that bind sentences together.
For the next step we update the grammar rules again and also add some more words to our word database. Let us start with the // Grammar Rules first and replace them with:
put "-np -vp,-s -conj -s" into DCG["-s"]
put "-art -n" into DCG["-np"]
put "-v -np,-v" into DCG["-vp"]
Essentially we are now allowing sentences to be constructed as before, but we are also creating sentences that are joined by a conjunction.
The // Word Database is updated by adding the following words as conjunctions:
put "and,or,but,so,because,although" into DCG["-conj"]
We now have a grammar that has the potential to generate massive sentences. This is because the grammar can join an arbitrary number of sentences through conjunctions.
So what can we expect to be generated after this update?
We can expect sentences like the following:
“A man eats the pie because the dog cooks.”
“A cat cooks so the woman eats.”
Now these make sense to a degree but may not reflect everyday situations. We can also expect a lot of very different sentences with varying lengths that are total gibberish. In fact I have generated sentences that are several times longer than this blog post and at times even hit the recursion limit. It is natural that we sometimes hit that limit as we do not set any restriction on the length of sentence that can be generated.
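One way to stay clear of the recursion limit is to track how deep the expansion has gone and, past a chosen depth, stop picking alternatives that refer back to the rule being expanded. The Python sketch below illustrates the idea; it is not part of the LiveCode script. The parameters `depth` and `max_depth` and the default limit of 6 are assumptions of mine, and the filter only catches a rule that mentions itself directly, as “-s” does here.

```python
import random

DCG = {
    "-s": "-np -vp,-s -conj -s",
    "-np": "-art -n",
    "-vp": "-v -np,-v",
    "-art": "the,a",
    "-n": "man,woman,child,cat,dog,pie",
    "-v": "walks,eats,combs,cooks",
    "-conj": "and,or,but,so,because,although",
}


def evaluate_segment(segment, depth=0, max_depth=6, rng=random):
    """Expand a segment into a list of words, bounding the recursion."""
    words = []
    for token in segment.split():
        if not token.startswith("-"):
            words.append(token)
            continue
        alternatives = DCG[token].split(",")
        if depth >= max_depth:
            # Past the limit, drop alternatives that mention the token
            # itself (direct recursion only; hypothetical safeguard).
            non_recursive = [a for a in alternatives if token not in a.split()]
            alternatives = non_recursive or alternatives
        choice = rng.choice(alternatives)
        words.extend(evaluate_segment(choice, depth + 1, max_depth, rng))
    return words


print(" ".join(evaluate_segment("-s")))
```

With a limit of 6 the generator can join at most 2^6 = 64 simple sentences, so the output never exceeds 64 × 5 + 63 = 383 words, while short sentences remain just as likely as before.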
You will probably find that only a very few of the sentences generated make any sense at all. So have we created an English Language Generator or is this maybe more a “Dr Seuss quote generator”?
Seeing how easy it is to string words together in a grammatically correct way is nice, but adding meaning to what they represent is maybe a skill best left to those with a unique awareness and understanding of themselves and the environment they live in, or as René Descartes may have put it: “Cogito ergo sum”.
English grammar is certainly more complex than I let on in this blog, but you now have a basic LiveCode framework to build a language generator with far more complex grammatical constructs. Try expanding the word database and the grammar rules to update and create your own language generator. If you are a linguist or know other languages, why not see if you can get some meaning out of this code in a different language?