Word order and diversity of expressions

This tutorial dives deeper into how to create optimal sets of expressions for your NLP model using word order and word diversity.

To ensure your chatbot effectively understands users, it must analyze two key elements:

  • the specific words used

  • the order in which these words are used

Word order and word diversity aren't the only elements to take into account when making optimal sets of expressions. Have a look at our list of best practices as well.

Word order

Word order can significantly impact meaning. For instance, "I want to take the train, not the car" differs from "I want to take the car, not the train", even though both sentences contain exactly the same words.
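As a minimal sketch of this point, the snippet below compares the two sentences as unordered word counts. The `bag_of_words` helper is a hypothetical illustration, not part of any chatbot platform; it only shows that a model ignoring word order could not tell these two sentences apart.

```python
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase word tokens, ignoring punctuation and word order."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


a = "I want to take the train, not the car"
b = "I want to take the car, not the train"

# Both sentences contain exactly the same words the same number of times...
same_words = bag_of_words(a) == bag_of_words(b)
# ...yet they are different sentences with opposite meanings.
same_order = a == b

print(same_words, same_order)  # prints: True False
```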

When generating expressions for your chatbot, it's crucial to account for all plausible word orders and syntactic structures.


For instance, if you're feeding your book_train_ticket intent with expressions, make sure to account for different structures:

  • I want to take the train to Paris.

  • To Paris, I want to take the train.

  • I want to go to Paris by train.

While some structures might not be perfectly grammatical, it's important to include them, as users might not always be syntactically correct. If the meaning is clear to a human, it should be part of your model.
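One way to brainstorm word orders is to split an expression into chunks and list every ordering for review. The sketch below assumes a hand-made chunking of one sentence; the chunk boundaries and the `word_order_variants` helper are illustrative assumptions, and a human should still discard orderings whose meaning is unclear.

```python
from itertools import permutations

# Hypothetical chunking of one expression (an assumption for illustration).
chunks = ["I want to go", "to Paris", "by train"]


def word_order_variants(chunks):
    """Return every ordering of the chunks as a candidate expression."""
    return [" ".join(order) for order in permutations(chunks)]


variants = word_order_variants(chunks)
for variant in variants:
    print(variant)
```

Three chunks yield six candidate orderings, including less grammatical ones such as "by train to Paris I want to go" that users might still type.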

Word diversity

To ensure your chatbot understands the difference between good and bad inputs, use diverse words pertinent to the use case. For restricted cases, strictly control the vocabulary.

Word vectors

To make words usable by machine learning algorithms, they are converted into numerical vectors with hundreds or thousands of dimensions, similar to coordinates on a graph.

The advantage of this approach is that it naturally clusters words with similar meanings, such as "boat" and "ship," close to each other in the vector space. With the right techniques, word vectors can capture even more intricate relationships. For instance, "cars" and "planes" are both forms of transportation and will be related, yet their differences are also preserved.
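To make the idea of closeness concrete, here is a small sketch using cosine similarity, a common way to compare word vectors. The three-dimensional vectors are made-up values chosen for illustration; real models learn vectors with far more dimensions from data.

```python
import math

# Toy vectors with invented values (assumption): real word vectors are learned.
vectors = {
    "boat": [0.9, 0.1, 0.3],
    "ship": [0.8, 0.2, 0.4],
    "car": [0.1, 0.9, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


boat_ship = cosine_similarity(vectors["boat"], vectors["ship"])
boat_car = cosine_similarity(vectors["boat"], vectors["car"])
print(f"boat vs ship: {boat_ship:.2f}")
print(f"boat vs car:  {boat_car:.2f}")
```

With these toy values, "boat" scores much closer to "ship" than to "car", which mirrors the clustering described above.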

If we put these modes of transportation on a simple graph, it looks something like this:


For example, to design a chatbot for arranging land transportation, you must be precise with the horizontal axis: no water means no boat transport. The vertical axis can be more flexible; for instance, a bike suffices for small packages in crowded cities, whereas larger freight requires trucks or trains. The varying transport methods you would consider are highlighted in yellow below.

Consider another scenario where you need to transport something large, regardless of the mode of transport (land, water, or air). Here, cars and bikes are unsuitable, but a boat, train, or plane would be appropriate. In this scenario, you should allow some variety horizontally but limit vertical diversity, as depicted by the blue box in the illustration.
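The two scenarios can be sketched as box selections on a two-dimensional graph. The coordinates below are hypothetical (x for the medium: 0 land, 1 water, 2 air; y for a rough size), as is the `select` helper; they only illustrate restricting one axis while leaving the other open.

```python
# Hypothetical coordinates (assumption): x = medium, y = rough size.
modes = {
    "bike": (0, 1),
    "car": (0, 2),
    "train": (0, 5),
    "boat": (1, 5),
    "plane": (2, 5),
}


def select(modes, x_range, y_range):
    """Return the names of all modes that fall inside the given box."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return sorted(
        name
        for name, (x, y) in modes.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    )


# Land transportation: strict on the horizontal axis, flexible vertically.
land_only = select(modes, x_range=(0, 0), y_range=(1, 5))
# Large items: flexible horizontally, strict on the vertical axis.
large_only = select(modes, x_range=(0, 2), y_range=(4, 5))

print(land_only)   # ['bike', 'car', 'train']
print(large_only)  # ['boat', 'plane', 'train']
```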

How to create optimal sets of expressions

Now that we know how important word diversity and word order are, we can use this insight to create a step-by-step guide on how to make a good set of expressions.

To make this process easier, generate expressions from your Intents tab.

To be optimal in your manual entry of expressions:

  1. Create a list of expressions with very different word orders. Try to come up with as many weird sentences as you can that might convey your intention of taking the train. Don't worry if you forget something; you can always come back to this step later and add other sentences.


We can start from the train expressions shown earlier and add a few more:

  1. I want to take the train to Paris

  2. To Paris I want to take the train

  3. I want to go to Paris by train

  4. By train is how I want to go to Paris

  5. To Paris by train is how I want to go

  6. …

  2. Be diverse. In this step, you take all of the sentences listed above and come up with as many synonyms as possible.

  • Start by chunking a general expression that represents your intent into a series of components.


To illustrate how this works, let's use the sentence: "I want to go to Paris by train". First, split this sentence into its components, like so:

  • I want to

  • go

  • to

  • Paris

  • by

  • train

  • Find synonyms for each of these components. Here it's important to cover the highest word diversity possible for each word.


Take I want to for example. You can replace this with:

  • I would like to

  • I must

  • I should

  • I prefer to

  • Let me

  • …

  3. Repeat this for every word or set of words.


Using expression generation, this will look something like this:

  • I want to go to Paris

  • I would like to travel in the direction of Paris

  • I should go to Paris

If later you come up with an additional synonym, you can just add it to the list of alternatives. That way you don't have to check all your expressions again to see if you missed anything.

Following this approach, you can quickly generate a large number of expressions. For the example above, "I want to go to Paris by train", all possible combinations amount to 6 x 3 x 3 x 2 x 2 x 3 = 648 expressions! So if you repeat this for all 5 sentences above, you'll end up with more than 3,000 expressions, and you only had to come up with 5 or 6 sentences. Easy does it.
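The arithmetic above can be checked with a short sketch that combines one alternative per component. The "I want to" alternatives come from this tutorial; the other synonym lists are hypothetical, chosen only so that the list sizes match 6 x 3 x 3 x 2 x 2 x 3. This is an illustration of the counting, not how the platform's expression generation is implemented.

```python
from itertools import product

# One list of alternatives per component. Only the first list is taken from
# the tutorial; the rest are hypothetical synonyms (assumption).
slots = [
    ["I want to", "I would like to", "I must", "I should", "I prefer to", "Let me"],
    ["go", "travel", "head"],
    ["to", "in the direction of", "towards"],
    ["Paris", "the French capital"],
    ["by", "via"],
    ["train", "rail", "railway"],
]


def generate_expressions(slots):
    """Build one expression for every combination of alternatives."""
    return [" ".join(parts) for parts in product(*slots)]


expressions = generate_expressions(slots)
print(len(expressions))  # 648
print(expressions[0])    # I want to go to Paris by train
```

Adding a synonym to any list increases the total multiplicatively, which is why a handful of base sentences is enough. Some combinations will read awkwardly, so it is worth skimming the output before adding it to an intent.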

