Word order and diversity of expressions
This tutorial takes a closer look at how to create optimal sets of expressions for your NLP model using word order and word diversity.
To ensure your chatbot effectively understands users, it must analyze two key elements:
the specific words used
the order in which these words are used
Word order and word diversity aren't the only elements to take into account when making optimal sets of expressions. Take a look at our list of best practices as well.
Word order can significantly impact meaning. For instance, "I want to take the train, not the car" differs from "I want to take the car, not the train", even though both sentences contain exactly the same words.
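To make this concrete, here is a minimal sketch in Python. It compares the two sentences above first as unordered collections of words and then as ordered sequences. The `tokens` helper is a deliberately naive tokenizer written for this illustration, not part of any NLP library or of the platform itself.

```python
from collections import Counter

a = "I want to take the train, not the car"
b = "I want to take the car, not the train"

def tokens(sentence):
    # Naive tokenizer for illustration only: lowercase, drop commas, split on spaces.
    return sentence.lower().replace(",", "").split()

# Compared as unordered bags of words, the two sentences are identical...
same_words = Counter(tokens(a)) == Counter(tokens(b))
# ...but compared as ordered sequences, they are not.
same_order = tokens(a) == tokens(b)

print(same_words)  # True
print(same_order)  # False
```

A model that only looked at which words occur could not tell these two requests apart, which is why the order has to be taken into account as well.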
When generating expressions for your chatbot, it's crucial to account for all plausible word orders and syntactic structures.
Some of these structures might not be perfectly grammatical, but it's important to include them anyway, because users do not always write grammatically correct sentences. If the meaning is clear to a human, the expression should be part of your model.
To ensure your chatbot understands the difference between good and bad inputs, use diverse words pertinent to the use case. For restricted cases, strictly control the vocabulary.
To make words usable by machine learning algorithms, they are converted into numerical vectors with hundreds or thousands of dimensions, similar to coordinates on a graph.
The advantage of this approach is that it naturally clusters words with similar meanings, such as "boat" and "ship," close to each other in the vector space. With the right techniques, word vectors can capture even more intricate relationships. For instance, "cars" and "planes" are both forms of transportation and will be related, yet their differences are also preserved.
If we put these modes of transportation on a simple graph, it looks something like this:
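As a rough sketch of the idea, the snippet below gives a few words made-up two-dimensional coordinates and compares them with cosine similarity, a common way to measure how close two vectors are. The coordinates are invented for illustration only. Real word vectors are learned from data and have far more dimensions.

```python
import math

# Made-up 2-D coordinates, purely illustrative; not taken from any real model.
vectors = {
    "boat": (0.9, 0.1),
    "ship": (0.8, 0.2),
    "car": (0.2, 0.9),
    "plane": (0.5, 0.8),
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

for first, second in [("boat", "ship"), ("car", "plane"), ("boat", "car")]:
    score = cosine(vectors[first], vectors[second])
    print(f"{first} vs {second}: {score:.2f}")
```

With these toy values, "boat" and "ship" score higher than "boat" and "car", which mirrors the clustering described above.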
Now that we know how important word diversity and word order are, we can use this insight to create a step-by-step guide on how to make a good set of expressions.
To make this process easier, generate expressions from your Intents tab.
To be optimal in your manual entry of expressions:
Create a list of expressions with very different word orders. Try to come up with as many unusual sentences as you can that might convey your intention of taking the train. Don't worry if you forget something: you can always come back to this step later and add other sentences.
Be diverse. In this step, you take all of the sentences listed above and come up with as many synonyms as possible.
Start by chunking a general expression that represents your intent into a series of components.
Find synonyms for each of these components. Here it's important to cover as much word diversity as possible for each component.
Repeat this for every word or set of words.
If later you come up with an additional synonym, you can just add it to the list of alternatives. That way you don't have to check all your expressions again to see if you missed anything.
Following this approach, you can quickly generate a large number of expressions. If you look at the example above for "I want to go to Paris by train", the possible combinations for this sentence amount to 6 x 3 x 3 x 2 x 2 x 3 = 648 expressions! So if you repeat this for all 5 sentences above, you'll end up with more than 3,000 expressions (5 x 648 = 3,240), and you only had to come up with 5 or 6 sentences. Easy does it.
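The combination step can be sketched in a few lines of Python with `itertools.product`. The alternatives listed for each component are hypothetical and chosen only to match the counts in the arithmetic (6, 3, 3, 2, 2 and 3). Substitute the synonyms you collected yourself, and review the output, since not every mechanical combination will read naturally.

```python
from itertools import product

# Hypothetical alternatives per component; only the counts follow the tutorial.
components = [
    ["I want to", "I would like to", "I'd like to", "I need to", "I wish to", "I have to"],
    ["go", "travel", "get"],
    ["to", "towards", "in the direction of"],
    ["Paris", "the French capital"],
    ["by", "with the"],
    ["train", "railway", "TGV"],
]

# Every way of picking one alternative per component, joined into a sentence.
expressions = [" ".join(parts) for parts in product(*components)]

print(len(expressions))  # 648
print(expressions[0])    # I want to go to Paris by train
```

Adding one more synonym to any list and rerunning the script regenerates the whole set, so you never have to revisit individual expressions by hand.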