Understanding language

Buzzwords: Natural Language Processing, formal grammars, syntactic versus semantic analysis, state, context, AIML

Different types of phrases

English phrases Yoko can understand are divided into 3 categories: statements, questions and sentiments. Statements and questions are approached in the parsing way described below, whereas sentiments are detected via 'common expressions'. The phrase ':(' is the common expression for the sentiment 'SAD', for example.

To keep things organised and keep track of how well we are doing in each area, let's further divide statements and questions, roughly corresponding with the different elements of Yoko's worldview:

  • ontology: about classes, objects and their properties and values.
  • actions and events: about actions and events, and their effects and causality.
  • relations and preferences: concerning first and foremost possession relationships, but also others like spatial containment, etc. Let's put preferences here as well.
The distinction will not always be clear-cut of course (e.g. where does CLASS_IS_HYPOTHETICAL_EVENT go?), but still, it helps organize the mind.

You can view a dynamically loaded list of language patterns Yoko understands.

NOTE: so far in Yoko's development I have been acting as if 'statements', 'questions' and 'sentiments' were the only phrase meaning categories in existence. However, thinking about Yoko's 2D world, there is clearly at least one more type of phrase: commands, or requests, i.e. asking another being to perform some action. So that's 4 types now. Did I miss any others? (Note that 'sentiments' is very broad, ranging from 'fuck you' to 'yes' to 'ouch!')

Yoko's grammar

UPDATE: Yoko's patterns are now specified as 'semantic' patterns, which are then 'compiled' to the form described below, which in turn gets parsed with regexes.

Here is an example from the grammar file parse_questions_onthology:

"INFO_ABOUT_CLASS" :
	[
			{"pattern" : "what is a [classmember]",					"examplephrase" : "what is a cat?"},
			{"pattern" : "what are [classmembers]",					"examplephrase" : "what are cats?"},
			{"pattern" : "what does Yoko know about [subphrase]",	"examplephrase" : "what do you know about red wine?"},
			{"pattern" : "what does Yoko know about [classmembers]",	"examplephrase" : "what do you know about cats?"},
			{"pattern" : "does Yoko know what [classmembers] are", 	"examplephrase" : "do you know what cats are?"}
	],
(Note that the 'what does Yoko know' formulations are there because we first transform things about 'you' and 'I' to their 'general' third-person form. Clunky, but it helps in many other places.)

These 'semantic' patterns are then 'compiled' into more 'lexical' patterns; for example, the first one becomes this:

(
            [type] => question
            [category] => onthology
            [meaning] => INFO_ABOUT_CLASS
            [semanticpattern] => what is a [classmember]
            [examplephrase] => what is a cat?
            [params] => Array
                (
                    [class] => 1
                )

            [pattern] => what is a [noun_singular]
            [regexpattern] => /^what is a ([a-z0-9-\s]+)$/
        )
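To make this 'compiling' step concrete, here is a minimal sketch of how it could work, assuming simple lookup tables from semantic tags to lexical tags and from lexical tags to regex fragments. The table contents and function name are made up for illustration, not necessarily how Yoko's actual code does it:

	<?php
	// Hypothetical compile step: semantic tag -> lexical tag -> regex fragment.
	$semanticToLexical = [
	    '[classmember]'  => '[noun_singular]',
	    '[classmembers]' => '[noun_plural]',
	];
	$lexicalToRegex = [
	    '[noun_singular]' => '([a-z0-9-\s]+)',
	    '[noun_plural]'   => '([a-z0-9-\s]+s)',
	];

	function compilePattern(string $semantic, array $s2l, array $l2r): array {
	    $lexical = strtr($semantic, $s2l);              // "what is a [noun_singular]"
	    $regex   = '/^' . strtr($lexical, $l2r) . '$/'; // "/^what is a ([a-z0-9-\s]+)$/"
	    return ['semanticpattern' => $semantic, 'pattern' => $lexical, 'regexpattern' => $regex];
	}

	print_r(compilePattern('what is a [classmember]', $semanticToLexical, $lexicalToRegex));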

--- END OF THE UPDATE --- Below is how it used to work, before the 'semantic pattern' approach came on top of it to simplify inputting.

Parsing the user's input messages is so far primarily based on parsing simple 'template phrases', transforming a user input phrase of a shape like:

'A [something] is a [something]' //e.g. 'a cat is an animal'
or
[somethings] are [somethings]
into the programming-ready 'meaning' object:
[
 type: STATEMENT
 meaning: CLASS_HAS_PARENTCLASS
 parameters: class->something1, parentclass->something2
]
...with the parameters in the meaning array matched in the obvious way to those in the 'template phrase'. (Obviously, at the heart of making this conversation work are a bunch of regular expressions.) Think 'cats are animals' or 'a cat is an animal' (or, more advanced, 'my cat died today').

This form of 'understanding' is then treated by the 'brain', to store the info in the database if it is new, ask additional questions if it is not, or simply say 'I already knew that'. This means Yoko contains a LOT of code, lists and lists of functions, that do stuff with a 'meaning' + its accompanying parameters. (Meanings are further divided into questions, statements and expressions of sentiment.)
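To give an idea of what 'doing stuff with a meaning' looks like, here is a stripped-down dispatch sketch; the handler and storage functions are stubs I made up for illustration, not Yoko's actual internals:

	<?php
	// Toy dispatch: one branch per meaning; the real code has lists and lists of these.
	function alreadyKnown(string $class, string $parent): bool { return false; }  // stub
	function storeParentClass(string $class, string $parent): void { /* DB insert */ }

	function handleMeaning(string $meaning, array $params): string {
	    switch ($meaning) {
	        case 'CLASS_HAS_PARENTCLASS':
	            if (alreadyKnown($params['class'], $params['parentclass'])) {
	                return 'I already knew that.';
	            }
	            storeParentClass($params['class'], $params['parentclass']);
	            return "Ok, so a {$params['class']} is a {$params['parentclass']}.";
	        default:
	            return "I don't understand that yet.";
	    }
	}

	echo handleMeaning('CLASS_HAS_PARENTCLASS', ['class' => 'cat', 'parentclass' => 'animal']);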

How does this approach to natural language parsing relate to the concepts of syntactic and semantic analysis (as explained in the Introduction to NLP article)? I think we could say that I am lumping the two together a bit, foregoing tagging parts of speech and word order as structured by formal grammars, and instead seeing how far I can get by doing both at once. So I don't tag anything as a 'singular noun' or 'proper noun', but just map those to classes and instances etc. right away.

NOTE: my current approach is probably terribly inefficient; see also the question elsewhere in this document about how many meanings there are.

Yoko's code already contains a Part-Of-Speech tagger (using the Brown corpus), implemented as a bit of an experiment (see resources), and in data > language I can play with POS tagging, but so far I don't use it in any way for Yoko's talking.

So Yoko's grammar is basically that: matching phrases to meaning strings that internally, in the code, get, well, appropriate code. Thus, Yoko's grammar so far is a combination of:

  • a word's position in the phrase relative to some hard-coded keywords
  • a word's form (ends with an 's' -> plural, ends with 'ed' -> past tense of some verb, starts with uppercase -> name of some instance)

If I understand the proper terminology, what this means is that at this point it's all (very primitive) syntactic rather than semantic analysis taking place. Don't even think about resolving 'anaphora' stuff within a sentence. (This may be a good place to link to the fantastic Introduction to Natural Language Processing article.) But again: it's fascinating to learn/think about these things, and hey, I'm having fun :)

Some examples

The parser files work roughly as follows: they specify template phrases made of 'meaning tokens', plus which tokens (identified by their position in the template) should be interpreted as what:

Some example statements

  • [somethings] are [something] ('Dogs are small'): class (1) has property value (2) (for some property z? Default is 'high' z-ness?)
  • [somethings] are [somethings] ('Dogs are animals'): class (1) has (2) as a parent class
  • [Something] is a [something] ('Tom is a dog'): (1) is an instance (with name (1)) of class (2)
  • a [something] is a [something] ('a dog is an animal'): class (1) has (2) as parent class
  • a [something] can be [something] ('a dog can be dangerous'): class (1) has (2) as property value 'SOMETIMES'

Some example questions

  • are [somethings] [something]? ('Are dogs small?'): does class (1) have property value (2) 'ALWAYS', 'SOMETIMES' or 'NEVER' (or don't we know)?

Some example common formulations

  • Freeform ('Hi', 'Hello', 'Goodbye', 'Fuck', ...): DB lookup to see if we know this phrase and its appropriate response, and whether it teaches us something about the current conversation/phrase/... (i.e. does it express a degree of happiness?)

Yoko's English grammar

Let's list the 'grammar' of English as Yoko currently understands it, i.e. which types of phrases she can 'parse' into a 'meaning'. As said, her grammar is growing organically from 'template phrases' rather than from a set of strict rules, but some generalities can be stated. Yoko's grammar is super simplistic, and hugely dependent on the order of words.

Phrase structure
The most important grammar-ish 'global' thing here is to know what type a phrase is (these are checked in this order):

  • A phrase is a common expression if it fits Yoko's database of common expression phrases/patterns, easy. That database also contains the sentiments associated with the common expressions. (e.g. 'cheers!' is a common expression with meaning 'thank you')
  • A phrase is a question if it starts with a 'question word' (what, who, where, why...) or it is a yes/no question and thus starts with 'do' or 'does'.
  • A phrase is a statement if it fits neither of the above 2 types.
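In code, that check order could look something like this minimal sketch; the common-expression database is reduced to an array here, and the exact question word list is an assumption on my part:

	<?php
	function phraseType(string $phrase, array $commonExpressions): string {
	    $p = strtolower(trim($phrase));
	    if (isset($commonExpressions[$p])) {
	        return 'sentiment';                    // known common expression: easy
	    }
	    $questionWords = ['what', 'who', 'where', 'why', 'when', 'how', 'do', 'does'];
	    if (in_array(explode(' ', $p)[0], $questionWords, true)) {
	        return 'question';
	    }
	    return 'statement';                        // fits neither of the above
	}

	echo phraseType(':(', [':(' => 'SAD']), "\n";    // sentiment
	echo phraseType('do you like cats?', []), "\n";  // question
	echo phraseType('cats are animals', []), "\n";   // statement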

Nouns
Yoko implicitly assumes the following rules:

  • Words that end with -s are the plural of a 'class' word. (and conversely, to generate the plural of a class word, add -s).
  • Words (and groups of adjacent words) that start with Uppercase characters are instances, and are always singular.

Verbs
Verbs so far have very limited conjugations:

  • Verbs that end with -ed indicate that an event of that verb took place in the past (and conversely, to talk about a past event, add -ed).
  • In a 'statement' phrase, the word right after the subject is the verb. When talking in plural (like about classes) the verb does not end with -s; when talking in singular (like about an instance) it does.

Difficulty 1: Irregular plurals/conjugations
Irregular forms get converted to their 'regular' form as per the above rules, using a database of 'irregular' -> 'regularized' pairs ('eaten' -> 'eated', 'children' -> 'childs', ...). There are also some 'irregular rules' that I should explicitly implement (rather than just adding the pairs to the list): verbs whose stem ends in 'e' do not get an extra -ed but just -d when put in the past tense: hope -> hoped, etc. Currently I'm fixing these by just converting the 'irregular' hoped to hopeed, but this is clumsy of course. To do.
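In its simplest form, that conversion is just a lookup; a sketch with the pairs from the text (in reality the table would live in a database):

	<?php
	$irregulars = [
	    'eaten'    => 'eated',
	    'ate'      => 'eated',
	    'children' => 'childs',
	    'hoped'    => 'hopeed',  // the clumsy fix for stems ending in 'e'
	];

	function regularize(string $word, array $irregulars): string {
	    return $irregulars[$word] ?? $word;  // unknown words pass through unchanged
	}

	echo regularize('children', $irregulars); // "childs"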

Difficulty 2: nouns/verbs consisting of multiple words
Currently, all words for classes, instances, actions, etc. are assumed to consist of a single word. This is obviously false.

For instances it is easy to extend this: if multiple sequential words start in uppercase, they are all interpreted as a single instance. ('New York', 'The Beatles', 'One Flew Over The Cuckoo's Nest'). (note that we do/should also distinguish between 'plural' instances like 'The Beatles' or 'The United States', and 'singular' instances like 'John Lennon'. This will influence generating correct phrases about them.)

For verbs and nouns the multiple-word thing is harder. Maybe we should 'regularize' compound nouns/verbs to a one-word version, just like the above strategy for irregular forms? I like this approach, but it would need a nice, clean, non-ambiguous rule. 'something holds on' -> 'something onholds' perhaps? Can we generalize that for compound verbs, the second component is actually always just a preposition that needs to be there, and thus comes from a very limited set?

Difficulty 3: 'you' is used for your conversation partner and for 'general hypothetical person'
It's the difference between 'how old are you' and 'if you drink a poison you die'. I think it is quite accurate to state that the 'general' you is almost exclusively used when talking about hypothetical events. Now if only there were a way to detect these purely syntactically?

So there are two 'language element lists' so far to help in parsing meaning from input phrases: common expressions, and irregular plurals/verb conjugations. Since I consider these lists to belong to 'rules of English', they are hard-coded, and no 'learning about the world from conversations' goes on with them.

To make this baby grammar work a bit better, before parsing we can 'normalize' some stuff from 'real English' into 'Yoko English':

  • Irregular verbs/plurals get transformed into their 'normal' form (past tenses in '-ed', so 'ate' becomes 'eated', etc.). Vaguely related to so-called stemming.
  • Variations that mean exactly the same thing get unified: 'an' into 'a', 'what's' into 'whats', etc...
  • Instances start with uppercase, except for some special ones like the sun, the earth, the sky, the weather (get your act together, science)... Convert these to 'Sun', 'Earth' etc...
  • Remove 'qualifiers' that carry information about how the person talking feels rather than about the actual content: smileys, words like 'really', 'unfortunately', 'fucking' (when not used as a verb), etc. Obviously these will matter for other later components like emotions, but not for the basic baby grammar.
  • Fix some common spelling mistakes, like 'your a' -> 'you're a', 'teh' -> 'the', etc.
  • Occurrences of 'you' and 'I' are converted to 'Yoko' and '[user]' respectively! At a first level, this is just talking about 'instances'. The fact that these are instances of particular meaning in the conversation can of course not be ignored for a natural conversation, but that shall be treated on another level, that of 'topics' and 'contexts' and stuff like that.
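Here is a minimal sketch chaining a few of those normalizations. The order and the word-boundary regexes are my own simplifications; a real pass would have to be more careful with things like "you're":

	<?php
	function normalize(string $phrase, string $userName): string {
	    $p = trim($phrase);
	    $p = preg_replace('/\bteh\b/i', 'the', $p);        // common typo
	    $p = preg_replace('/\ban\b/i', 'a', $p);           // 'an' -> 'a'
	    $p = preg_replace("/\bwhat's\b/i", 'whats', $p);   // "what's" -> 'whats'
	    $p = preg_replace('/\byou\b/i', 'Yoko', $p);       // the user's 'you' means Yoko
	    $p = preg_replace('/\b[iI]\b/', $userName, $p);    // the user's 'I' means the user
	    $p = preg_replace('/\bthe sun\b/i', 'Sun', $p);    // special lowercase instances
	    return $p;
	}

	echo normalize('do you like teh sun?', 'Wouter');
	// "do Yoko like Sun?" (fixing person/number of the verb happens elsewhere)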

Interesting thought: we don't need the word 'who'
Think about it: you can replace 'who' by 'what' in any phrase and there would be no ambiguity; 'who' phrases just ask for an instance, just like (some) 'what' phrases do. What is the capital of France? What is the president of France? What is your teacher? Ok, shit, this last one can have a different meaning in 'what' form. Still, it's an interesting simplification and one less thing to write parse patterns for. The fact that 'who' is about a human is of course useful meta-information, for example when it comes to pronouns (though, hmmm, why not convert them all to 'it'?) and stuff. Not sure if I should do this or not; let's leave it for now.

Another very significant transformation is translating questions/statements about 'you' or 'me' to their equivalent phrase structure about the general instance. So Yoko's grammar will convert 'do you like cats?' first to 'does Yoko like cats?' and then process that. Similarly with 'I like cats' and things like that. This way, we can teach Yoko not just about her own and the user's preferences, but you can also question her on other people's preferences. The example phrases of the grammar rules will therefore contain things like 'what is Yoko's favorite animal?', but keep in mind that this means she will understand 'what is your favorite animal?' as well.

The difficulty with classes / instances / verbs consisting of multiple words

My current approach of 'semantic patterns' feels rather scalable, except that it is impossible to catch everything if I don't take 'words' that actually consist of MULTIPLE words into account in the reasoning.

Like this example from the ontology:
"INFO_ABOUT_CLASS" :
	[
			{"pattern" : "what is a [classmember]",					"examplephrase" : "what is a cat?"},
			{"pattern" : "what are [classmembers]",					"examplephrase" : "what are cats?"},
			{"pattern" : "what does Yoko know about [subphrase]",	"examplephrase" : "what do you know about red wine?"},
			{"pattern" : "what does Yoko know about [classmembers]",	"examplephrase" : "what do you know about cats?"},
			{"pattern" : "does Yoko know what [classmembers] are", 	"examplephrase" : "do you know what cats are?"}
	],

The thing is, this should also map onto a class name like 'tree house' or 'maple trees' or something like that. The problem is that if I start to use [subphrase], being an arbitrary-length combination of words separated by spaces, it generally gets too 'greedy' and catches things that aren't there.

So, conclusion: once again we have a mix of knowledge and semantics and language here. In matching against [classmember], I should actually just go and search for all class member names that have multiple words as well.

.... Fuck, should I consider using a search engine like ElasticSearch here?

Identifying instances

Damn, where did this text go??? Anyway, something something about instances occurring either as:

  • proper noun ('Snuggles') or
  • defined by possession ('my cat') or
  • defined by virtue of appearing in the 'story' ('the cat').

Breaking up longer or 'compound' inputs.

We should have a mechanism to tackle phrases like this:

		User: My cat died today! It was only 3 years old.
		OR
		User: My cat died today, but it was only 3 years old.
		OR
		User: Although it was only 3 years old, my cat died today.
	

The first case is quite easy: we break up the phrases separated by punctuation like '.' and '?' and '!', and run those separate phrases through Yoko's internals. Compound inputs like this are likely to contain pronouns, but we have ways to handle those (see below).

The other 2 cases are a bit harder, but in a first approximation we can treat them the same: we split on words like 'but' and 'although', make a good guess about the order this implies for the individual phrases, and run those phrases through Yoko one after the other. Note that in a first approximation we discard the connotation of the separation words: 'but' implies some kind of contradiction, 'although' an even stronger one, 'because' represents cause and effect... Actually, we can do smarter things with 'because' using our hypothetical events and their relations, but more on that elsewhere in this document.
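A first-approximation sketch of that splitting (punctuation first, then connector words; treating commas as separators too is a naive assumption of mine):

	<?php
	function splitCompound(string $input): array {
	    $phrases = [];
	    foreach (preg_split('/[.?!]+/', $input, -1, PREG_SPLIT_NO_EMPTY) as $part) {
	        // split on connector words, discarding their connotation for now
	        foreach (preg_split('/\b(?:but|although|because)\b|,/i', $part) as $s) {
	            $s = trim($s);
	            if ($s !== '') $phrases[] = $s;
	        }
	    }
	    return $phrases;
	}

	print_r(splitCompound('Although it was only 3 years old, my cat died today.'));
	// ['it was only 3 years old', 'my cat died today']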

Pronouns and 'pronoun states'

I am probably violating some linguistics terminology here, but by 'pronoun' I mean 'it', 'he', 'she', 'they', etc. Each of those pronouns also has some 'derived pronouns': for 'he' we have 'his' for possessive and 'him' for, errr, indirect object or something. Also there is 'that', which is a bit of a special case (and perhaps technically not a pronoun at all? Whatever, it fits in this stuff for me).

(oh, right: any 'you' and 'me' and stuff like that is always replaced by the user and Yoko of course)

For each of the understood meanings of either the user input, and Yoko's reaction, we call a routine that updates the pronoun states. So for example when the user says 'I love cats', the pronoun state for 'they' is updated to 'cats', and the pronoun state for 'that' becomes USER_HAS_PREFERENCE with parameters level:love, type:class, item:cat or something like that.

The next step is that, before the parsing for meaning begins, we go through all the pronoun states, and where one is present, we replace the pronouns with their current meaning. So take the dialogue:

	User: I love cats.
	Yoko: Didn't know that.
	User: they are really furry.

Yoko will infer after the first phrase that the 'they' means 'cats', and parse from there. Derived pronouns will also be replaced accordingly ('my brother hates them' becomes 'my brother hates cats').
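A sketch of the two routines involved, with an assumed storage shape (a simple map from pronoun to its current meaning; not necessarily how Yoko stores it):

	<?php
	// Called after each understood meaning: update what the pronouns point to.
	function updatePronounStates(array &$state, string $meaning, array $params): void {
	    if (isset($params['class'])) {
	        $state['it']   = $params['class'];          // 'I love cats' -> it = cat
	        $state['they'] = $params['class'] . 's';    //                -> they = cats
	        $state['them'] = $params['class'] . 's';
	    }
	    $state['that'] = ['meaning' => $meaning, 'params' => $params];
	}

	// Called before parsing the next input: substitute pronouns for their meaning.
	function replacePronouns(string $phrase, array $state): string {
	    foreach (['they', 'them', 'it'] as $pronoun) {
	        if (isset($state[$pronoun]) && is_string($state[$pronoun])) {
	            $phrase = preg_replace("/\b$pronoun\b/i", $state[$pronoun], $phrase);
	        }
	    }
	    return $phrase;
	}

	$state = [];
	updatePronounStates($state, 'USER_HAS_PREFERENCE', ['level' => 'love', 'class' => 'cat']);
	echo replacePronouns('they are really furry', $state); // "cats are really furry"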

Some caveats/weaknesses for now:

  • the 'that' case is a tad more difficult, as often 'that' refers to the entire previous meaning, not just one instance or class mentioned. So as long as I haven't figured out how to handle more complicated compound sentences and stuff like that, I think this one will have to wait
  • As long as I don't store gender as an extra field in my instances and classes database, I think I have little choice but to map all of it/he/she to the same instance/class? Humans can of course have the property gender with attribute male or female in the current storage mechanism already, but things other than humans have 'gender' in language and affect pronoun states accordingly. Hmmm...
  • Whoops, another annoying one: sometimes 'it' or 'he' or whatever refers to something earlier in the same sentence rather than to something said before: 'when a glass drops it usually breaks'. This is actually another hard case of 'pronoun resolution', in that there is genuine ambiguity here, and no lexical/grammatical difference - you have to know the world to know what makes sense:
    - my car is a vehicle
    - when a glass drops it usually breaks
    -> what do we learn that breaks here, the glass or the car?

It gets worse. There's the case of answering questions and 'hidden pronouns'. Boy do we like to take shortcuts in language. Here's another pronoun-ish thing we do: when answering a question, we imply that our answer is about that question without saying it.

For our NLP purposes there is good and bad news. The bad news is that, by itself, the grammar structure of such a phrase does not contain at all what it is referring to. The good news is that there is an obvious - almost too trivial to mention - indicator that does: answers are directly preceded by the question.

Here is what we want to tackle:

		Human: I love cats!
		Yoko: First time I hear about cats. What are they? (oooh, we could add pronouns to Yoko's arsenal as well!)
		Human: a type of animals.
	

How to parse it? We should set aside some extra parsing detection for the user phrase that follows a question of Yoko's, and convert it to a 'complete' statement based on that phrase's form:

		Yoko: what are cats?
		Human: a type of animals.
		OR
		Human: they are animals.
		OR
		Human: fluffy animals.

Finally, besides 'that' as a very general way of referring to the previous phrase, we can use it to parse out compound phrases like 'a cat is an animal that meows', turning them into 'a cat is an animal' and 'a cat meows'. Criterion: if the phrase contains 'is' and then 'that' in that order, we can transform it like that?

So let's implement some rules for that.

Lists

Full-on corpus POS tagging may eventually be the way to go, but before that, a number of ad-hoc lists are already useful. Like a list of uncountable nouns, or a list of names (http://www.behindthename.com/top/).

Deciding between pattern matches

One problem I often encounter is that, especially if I allow the [something] and similar tags to be too 'flexible' (no limit on word count, for example), the more 'broad' patterns 'eat' the phrase before the more specific (and thus more likely to be accurate) pattern gets a chance. I should prevent this by ordering patterns by specificity (i.e. ascending by the number of wildcards like [something] they contain), and checking against them in that order. This will assure that more specific patterns get a chance first.

(Or wait, maybe even better is needed: that darn '[Something] [something]s' ('Snuggles meows') pattern keeps sucking up everything, even '[somethings] are [somethings]', the very first pattern Yoko ever understood.)

Another interesting approach: when multiple patterns match, maybe favor the one where the 'known elements' (e.g. the instance) consist of fewer words? This lowers the chance of 'indication words' like 'have' or 'has' being swallowed into an instance name.
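Here is what the ordering idea could look like in code, sorting ascending on wildcard count so the most literal patterns get tried first (the tie-break from the previous paragraph is left out; this is a sketch, not Yoko's actual matcher):

	<?php
	function sortBySpecificity(array $patterns): array {
	    $wildcards = fn(string $p): int => preg_match_all('/\[[a-z_]+\]/i', $p);
	    usort($patterns, fn(array $a, array $b): int =>
	        $wildcards($a['pattern']) <=> $wildcards($b['pattern']));
	    return $patterns;
	}

	$ordered = sortBySpecificity([
	    ['pattern' => '[Something] [something]s'],       // the greedy catch-all
	    ['pattern' => '[somethings] are [somethings]'],
	    ['pattern' => 'what is a [something]'],          // most literal: tried first
	]);
	print_r(array_column($ordered, 'pattern'));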

DIFFICULTIES IN UNDERSTANDING ENGLISH

English may have a wonderfully simple grammar, which is a blessing for Yoko, but it also comes with a drawback: in my current approach, there are quite some cases where it is hard (impossible?) to unambiguously map certain types of statements more or less directly to lessons learned about the world, using nothing but the phrase structure. Humans figure these out easily because they already know the words being talked about, but I want to postpone this tight coupling between understanding the world and the language for as long as possible. Not possible? :(

These are:

  1. Distinguishing between an instance's property value that is 'permanent' and one that is merely a 'state'
    Example: 'Albert is angry' versus 'Albert is smart'.
    In Spanish the distinction between a permanent property and a temporary state is made explicit with two forms of 'to be', namely ser and estar. A shame English does not do the same! So if we tell Yoko that 'Tom is angry', it will probably require some nagging to find out whether that's a permanent or temporary thing about Tom. Additionally, some base knowledge will make Yoko better and better at not having to nag: it makes sense to store with a property what its typical 'permanency' is. Though most properties can be both, the property 'size' will almost always indicate a permanent state, whereas 'mood' or 'temperature' will almost always indicate a temporary one. Maybe I should scan some Spanish text corpus for ser and estar to plug these classifications into Yoko?
  2. Distinguishing between having a property and having a possession relation
    The reason is that 'have' is used for both. Example: 'a country has a size' versus 'a country has a capital'. Or with questions: 'what is the mood of Yoko?' versus 'what is the capital of Belgium?'.
    In the database (and Yoko's world view) these are very distinct things, and rightly so I think, but from the above phrases it's hard to tell them apart. The good news is that if we already know that both elements are CLASSES, we can obviously suppose it's a possession relation; and if the second element is a known property, then it's a property relation.
  3. Distinguishing between an uncountable class having a property value and having a parent class
    Example: 'Jazz is beautiful' versus 'Jazz is music'. I'm sure recognizing uncountable classes is gonna be a pain in the ass in other ways too, I can sense it already.
Real humans have no problem with phrases like that, because we already know what's being talked about. That is, the neat separation of world view and communication about it breaks down: the world view helps the communication. This is also what is going on in many 'ambiguous' phrases so often mentioned in the context of chatbots, like the so-called Winograd Schemas.

The biggest conclusion for me here, I think, is that I will have to feed Yoko some 'true' data myself, formulated in an unambiguous way, so that there's less risk of misinterpretation later on. The other alternative, of course, in each of these cases: nagging for clarification!

So how to alleviate (or solve) these issues? We need wildcards as a first step to parsing a wider variety of input. Let's introduce (brackets) as the 'optional' wildcard, so '(it)' means that the word 'it' may or may not be there. Then we can do stuff like:

		"I love (it) when ..."
	
or of course, our more general version:
		"[Something] [haspreferencetowards] (it) when ....";
	

What about 'general' wildcards? Like a * for any optional words, or a ? for one or no words? I WANT TO AVOID THIS. Allowing 'anything' will greatly increase the risk of wrongly interpreting something, which is already happening all the time just because words end in -s and whatnot. Rather than doing catch-all stuff like that, let's just remove lots and lots of 'qualifier' words and use multi-option (like|love|crave|admire|...) kinda stuff. As usual, this will result in more DONT_UNDERSTAND, but also in more control over what goes on.
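For the record, the (brackets) optional wildcard itself is easy to compile into a regex. A sketch, assuming an optional word always comes with a trailing space:

	<?php
	function compileOptional(string $pattern): string {
	    // split out '(word)' groups, quote the literal parts, make the groups optional
	    $parts = preg_split('/\(([a-z]+)\)\s?/i', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
	    $regex = '';
	    foreach ($parts as $i => $part) {
	        $regex .= ($i % 2) ? '(?:' . $part . ' )?' : preg_quote($part, '/');
	    }
	    return '/^' . $regex . '$/i';
	}

	$re = compileOptional('I love (it) when');   // /^I love (?:it )?when$/i
	var_dump(preg_match($re, 'I love when'));    // 1: matches without 'it'
	var_dump(preg_match($re, 'I love it when')); // 1: matches with 'it'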

Reaction modifiers

Another one that I am sure has a name in linguistics, but I wouldn't know what to google. In any case, it seems useful to store with the reactions some grammatical meta-information that assures they are correct, which is more efficient than storing just lots of reply patterns.

A good example is Yoko replying with some summed-up list. Asking if she knows any actions (e.g. 'what can cats do?') or class members (e.g. 'do you know any animals?') results in a 'listed response', which has very similar properties among all word types, but some grammar is needed to make it perfect.

Internally we store root forms, like 'dog' and 'bark'. A listed response becomes a comma-separated list with 'and' between the last 2, but grammatically it sometimes needs to be 'plural', sometimes 'gerund' for actions, etc...
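A sketch of such a listed response with a grammatical modifier; the inflection rules are just the ones from this document (add -s for plural, strip a trailing 'e' and add -ing for the gerund), so irregular forms would still need the lookup from earlier:

	<?php
	function listedResponse(array $roots, string $form): string {
	    $inflect = function (string $w) use ($form): string {
	        if ($form === 'plural') return $w . 's';                             // dog -> dogs
	        if ($form === 'gerund') return preg_replace('/e$/', '', $w) . 'ing'; // smoke -> smoking
	        return $w;
	    };
	    $words = array_map($inflect, $roots);
	    $last  = array_pop($words);
	    return $words ? implode(', ', $words) . ' and ' . $last : $last;
	}

	echo listedResponse(['dog', 'cat', 'bird'], 'plural'), "\n"; // "dogs, cats and birds"
	echo listedResponse(['bark', 'meow'], 'gerund'), "\n";       // "barking and meowing"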

Subphrases and 'that' context


Consider this dialogue, first between me and Yoko to teach her what she likes:

	Wouter (teaching Yoko): You hate it when a cat dies.

And then somebody comes along and has the following dialogue:

	User: my cat died.
	Yoko: that sucks :( how old was it?
	User: she was 17 years.
	Yoko: Wow, that's old for a cat!

I WANT THAT. THAT WOULD BE UNPRECEDENTEDLY AWESOME. LET'S DO IT.

The above scenario requires some things:

  • Storing and retrieving preferences about hypothetical events.
  • Understanding subphrases (in this case, describing the hypothetical event 'a cat dies' so it can be turned into a preference correctly).
  • Context 1: A sense of 'that/he/she' context in the conversation, for as long as it takes (throughout the entire above conversation, for example)
  • Context 2: Making sense of 'pure reaction', often sentiment-expressing, phrases like 'ok', or 'cool' or ':)' or 'hehe', or even 'why'.
    After the last 'ok' there, the user would essentially start talking of something new or ask another question like 'do you have a cat yourself?' or something. That's fine.

Building on the previous example, I want Yoko to understand 'I like it when the sun shines', where 'the sun shines' is what is being liked (type 'EVENT', i.e. a hypothetical event). This requires a first pass at subphrases ('noun phrases'? Naaah, something more general, any of these types).

'that' context. The terminology comes from AIML, though it's obvious that every AIML bot totally sucks at this. The idea is to have conversations that can:

  1. Understand and use words like 'he', 'that', 'they', 'it', etc... 'My cat died. It was 12 years old.'
  2. Understand and use words that are 100% a reaction, and thus make no sense in isolation. Building on dying cats: see the dialogue above.

Active versus passive phrases

At first I stored the 'role' of classes in actions, so that 'pizza can be eaten' would store pizza with role 'object' instead of 'subject'. But that's a mess. The new goal is to convert every passive phrase to an active one. The main question here: who is the subject if none is given? Our answer by default: HUMANS.
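A sketch of that conversion for one simple passive shape; the pattern and the 'eated' regularization follow this document, but coverage of other passive shapes is left open:

	<?php
	function activate(string $phrase): string {
	    // '[somethings] can be [verb]ed' -> 'humans can [verb] [somethings]'
	    if (preg_match('/^([a-z\s]+) can be ([a-z]+)ed$/i', trim($phrase), $m)) {
	        return 'humans can ' . $m[2] . ' ' . $m[1];
	    }
	    return $phrase; // already active, or a shape we don't handle yet
	}

	echo activate('pizza can be eated'); // "humans can eat pizza"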

Cleaning up irregularities

Before going to the real parsing, we clean up or 'smooth' the input a bit language-wise, to maintain as much as possible my [something]s way of finding a class, as well as other things. Things like:

  • Plurals that are not just the singular + s: company > companies.
  • Past forms that are not just the stem + ed: 'read' > 'readed', 'died' > 'dieed', 'sang' > 'singed', etc...
  • Both 'a' and 'an' map to 'a'
  • The gerund (was that what it's called? The 'ing' form) of verbs ending in 'e' drops the e: smoke becomes smoking ('Yoko dislikes smoking'). Another 'stemming' issue! Are there any 'rules' surrounding this? Do certain verbs always end in -e based on, for example, the penultimate character? work -> working, smoke -> smoking... Guess not :(
  • ...

Non-literal or ambiguous expressions, metaphors, figures of speech, idioms, etc.?

I don't give a damn about these. (except perhaps when they are useful in humor, see below) These things are silly distractions, not on the to-do list, not in a 5-year old's world view anyway. Maybe in Yoko 8.0, when she tries to run for president.

Humor

Can we formulate humorous remarks when all we know about is classes, instances and properties (and perhaps actions and events)? I believe the answer is YESSSSS.

property value comparison joke
Certainly state allows for humor: expressing that some instance has a property value state by comparing it to some class which has that same property value will sound witty. More specifically, this 'comparing' requires comparing the object/situation to be mocked with one that has two instantly recognizable features.

An example to illustrate: somebody is being authoritative, and you say jokingly 'sir yes sir!'. What you did is relate 'authoritative' to an army officer (typical for them), and then use another typical identifier of an army officer to make the joke: the fact that they are addressed with 'sir yes sir'. In conclusion, for jokes of this type we will need classes (or possibly well-known instances of classes? In a less PC fashion, you could have raised the Hitler salute in the previous situation, linking to instance Hitler for the same comical effect) that have at least TWO property values that not many other classes/instances have. Not many classes are seen as extremely authoritative, so army officer fulfills that one, and they are the only class that is associated with responses of the type 'sir yes sir!'. So everybody will get this joke, and all the conditions to make it work can be automatically generated.

Even better is that we can do the same with a permanent property value, not just a current state! She should also be able to make jokes about a property value of something by stating another property of a class (clearly identified by it) that also has the property of the thing we're joking about. In other words, what we do with 'yo mama' jokes to mock the property weight's value 'fat'.

Example: 'her face looks so red (= property value, or rather state, to mock) you'd want to make a Bloody Mary' (i.e. comparing to the class tomatoes, which also has property value red, and is clearly identified by the fact that you can make Bloody Marys with them).

Absurd class-propertyvalue joke
Without a notion of state, perhaps we can already make jokes by answering questions in a witty way, as follows. E.g. if the answer to 'does instance X belong to class y?' is no, because we know it belongs to class x, we can reply by saying 'Well, I've never seen a [property value of class y that x differs in] x!' Hmmm... When is this funny, when is it not? When the 2 classes are 'close' somehow? When there is something 'visual' about the property value, so the listener can visualize the absurdity?

shared property value joke
This one is taken from this article, and centers around the phrase 'I like my X like I like my Y, Z', where X and Y are 2 classes (or perhaps the second can be an instance, I think), with Z a property value they have in common. The closer the association of the property value with the classes, the better; and the further apart the classes, the better.

Examples from that article:

  • I like my relationships like I like my source, open
  • I like my coffee like I like my war, cold
  • I like my boys like I like my sectors, bad
Good stuff!
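Since this joke type has such a mechanical structure, it is the easiest one to sketch: find two classes sharing a property value. The data and the naive 'first hit wins' selection below are toy assumptions; real scoring would favor strong associations and distant classes, as the article suggests:

	<?php
	$propertyValues = [
	    'coffee'        => ['cold', 'strong', 'black'],
	    'war'           => ['cold', 'long'],
	    'relationships' => ['open', 'long'],
	    'source'        => ['open'],
	];

	function sharedValueJoke(array $pv): ?string {
	    foreach ($pv as $x => $xValues) {
	        foreach ($pv as $y => $yValues) {
	            if ($x === $y) continue;
	            foreach (array_intersect($xValues, $yValues) as $z) {
	                return "I like my $x like I like my $y, $z";
	            }
	        }
	    }
	    return null;
	}

	echo sharedValueJoke($propertyValues); // "I like my coffee like I like my war, cold"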

TYPES OF PHRASES

We already saw that phrases can be of three types: statements, questions and sentiments (and perhaps a fourth, requests for action), plus perhaps statements mixed with sentiments or other types of mixes. An interesting question: how many elements does each of these types contain? Is it possible to list them all? Is it in the order of dozens, hundreds or thousands?

Two types of questions: 'yes-no questions' versus 'retrieve' questions
I just realized that some questions require almost identical steps to answer, in particular those related to (possession) relations, maybe. Compare these:

  • Do you have a cat?
  • Who is your cat?
The first question tries to retrieve a result, and returns whether one was found. The second not only retrieves the result, but also returns it.
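In code the two would share the same lookup, differing only in what they do with the result (the lookup function and data here are stand-ins, not Yoko's actual storage):

	<?php
	function lookupPossession(string $owner, string $class): ?string {
	    $db = ['Wouter' => ['cat' => 'Snuggles']];  // toy data
	    return $db[$owner][$class] ?? null;
	}

	// 'Do you have a cat?' -> yes/no: only report whether a result exists
	echo lookupPossession('Wouter', 'cat') !== null ? 'Yes.' : 'No.', "\n";

	// 'Who is your cat?' -> retrieve: also return the result itself
	echo lookupPossession('Wouter', 'cat') ?? "You don't have a cat, as far as I know.";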
