NLP best practices
Some of the best ways to set up a strong and reliable NLP model for your bot.
⭐️ Best practices to create good intents
Some tips to create the best intents:
Use intents that you already have
Have customer data like FAQs? Analyze it to see which questions are most common and important, then program your bot to handle those. Calculate each question's volume to prioritize them. Unsure what users need? Start with a simple click-bot to gather data, then use that info to define your bot's intents.
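As a rough illustration of calculating question volume, the sketch below counts how often each topic appears in a set of messages that have already been labelled by hand. The sample log, the topic names, and the function name `rank_questions_by_volume` are hypothetical and only serve the example; they are not part of the Chatlayer platform.

```python
from collections import Counter


def rank_questions_by_volume(labelled_messages):
    """Return (topic, count, share) tuples, most frequent topic first."""
    counts = Counter(topic for _, topic in labelled_messages)
    total = sum(counts.values())
    return [(topic, n, n / total) for topic, n in counts.most_common()]


# Hypothetical sample: (message text, hand-assigned topic label)
sample_log = [
    ("where is my order", "order_status"),
    ("track my parcel", "order_status"),
    ("i forgot my password", "forgot_password"),
    ("has my order shipped yet", "order_status"),
    ("change my delivery address", "change_address"),
    ("reset password please", "forgot_password"),
]

ranking = rank_questions_by_volume(sample_log)
for topic, n, share in ranking:
    print(f"{topic}: {n} messages ({share:.0%})")
```

The topics at the top of the ranking are the natural candidates for your first intents.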
Start small
If your bot handles only 5 out of 15 questions, but those 5 cover 80% of incoming queries, that's a great start. Your team can then focus on the remaining 20%. Add new intents based on user data and let your bot grow naturally. It's better to start small and excel than to do too much and fail.
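One possible way to decide where to stop is to add intents in order of volume until a target share of queries is covered. The sketch below assumes you already have a message count per candidate intent; the numbers, the intent names, and the helper `intents_for_coverage` are made up for illustration.

```python
def intents_for_coverage(volume_by_intent, target=0.8):
    """Pick the highest-volume intents until their combined share reaches `target`."""
    total = sum(volume_by_intent.values())
    if total == 0:
        return [], 0.0
    selected, covered = [], 0
    by_volume = sorted(volume_by_intent.items(), key=lambda kv: kv[1], reverse=True)
    for intent, volume in by_volume:
        if covered / total >= target:
            break  # target reached, the remaining intents can wait
        selected.append(intent)
        covered += volume
    return selected, covered / total


# Hypothetical message counts per candidate intent
volumes = {
    "order_status": 450,
    "forgot_password": 200,
    "change_address": 100,
    "refund": 70,
    "opening_hours": 60,
    "invoice_copy": 50,
    "complaint": 40,
    "newsletter": 30,
}

selected, coverage = intents_for_coverage(volumes, target=0.8)
print(selected, f"{coverage:.0%}")
```

With these sample numbers, four of the eight candidate intents are enough to pass the 80% mark.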
Balance the number of expressions per intent
It’s important to have roughly the same number of expressions per intent. Otherwise, the bot trains more heavily on intents with a large expression count and neglects the ones with fewer expressions.
Revise and optimize
Start with a general intent to trigger a basic flow, then add follow-up questions to understand user needs better, allowing you to refine intents later.
For example, in a telco support bot:
Issue with phone
Issue with wifi
Each intent can cover multiple issues (battery, screen, software, lost order for phones; connection types for wifi). Use follow-up questions to specify the problem (e.g., phone model, modem type). Over time, analyze user messages: if users often specify phone models but not wifi issues, create more intents or use entities for phones while keeping wifi intents broad. Creating intents is an ongoing, iterative process.
Avoid conflict
When intents are very similar, merge them to avoid confusion. For example, if you have intents for booking train and bus tickets, merge them into one 'booking tickets' intent, and differentiate by the transportation mode entity.
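The merged setup can be pictured as one intent plus an entity that the flow branches on. The sketch below is not Chatlayer's data format or API: the dictionary layout, the entity name `transport_mode`, and the step names returned by `next_step` are all assumptions made for the example.

```python
# Hypothetical training data: one merged intent, with the transport mode
# annotated as an entity instead of being split over two intents.
merged_training_data = {
    "booking_tickets": [
        ("I want to book a train ticket", {"transport_mode": "train"}),
        ("book a bus ticket for me", {"transport_mode": "bus"}),
        ("I need a ticket", {}),
    ]
}


def next_step(intent, entities):
    """Decide what the flow does once an intent and its entities are recognised."""
    if intent != "booking_tickets":
        return "fallback"
    mode = entities.get("transport_mode")
    if mode in ("train", "bus"):
        return f"book_{mode}"
    # The entity is missing, so the bot asks a follow-up question.
    return "ask_transport_mode"


print(next_step("booking_tickets", {"transport_mode": "train"}))
print(next_step("booking_tickets", {}))
```

Because both transport modes share one intent, the classifier no longer has to separate two nearly identical sets of expressions.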
⭐️ Best practices to create good expressions
Creating a good set of expressions is key to creating a smart bot. The accuracy of your bot stands or falls with the quality of your expressions, so spend enough time on them and review them regularly.
Here are some tips & tricks for creating good expressions:
Use diverse expressions in terms of vocabulary and structure
For more information, read our dedicated article.
Use real live data
Chances are there are already a lot of user expressions which you can feed to your bot. Think customer support logs, social media posts, comments on your company's forum etc.
Use pre-built intents
No need to reinvent the wheel when you can download the wheel directly on the Chatlayer platform! We have a lot of pre-built intents ready for you to use. Simply download them, train the NLP, and you're good to go!
Be specific
Expressions must match a specific intent. For change_address, phrases like "I have a question" are too vague. For forgot_password, "I forgot it" is not specific enough. Be clear and precise.
Avoid filler words
Avoid adding an expression such as "hello, I want to book a train ticket. Can you help me with that? Thanks", because this sentence contains too many irrelevant words. Simply use "I want to book a train ticket", which is shorter and more relevant.
Use real language
Add words and sentences to your bot which a real person would use in this conversation. Don’t use entire paragraphs or language which is overly formal. Keep it light and natural instead. Make use of real user messages in case you have them; data is knowledge.
Allow for slang and dialect
Feel free to use slang words, common abbreviations (e.g. asap instead of as soon as possible) and regional dialects. Don’t overdo it though: only stick to things the majority of people would actually use.
Create enough expressions
To achieve good bot performance, ensure each intent has 40 to 50 expressions. For excellent behavior, aim for 200 to 400 expressions per intent. Regularly review your user data and incorporate user-provided expressions to continually improve your model’s accuracy.
Keep the number of expressions balanced
Ensure a balanced number of expressions per intent. If one intent has 100 expressions and another only 10, the model learns more from the intent with more data. As a result, it matches user messages to the intent with 100 expressions too often (overtriggering), which leads to inaccurate matches.
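A quick balance check can be run on the expression counts per intent. The sketch below flags intents that have far fewer expressions than the largest one; the threshold `max_ratio=2.0` and the function name `find_underrepresented` are assumptions for illustration, not values recommended by the platform.

```python
def find_underrepresented(counts, max_ratio=2.0):
    """Return intents whose expression count is more than `max_ratio` times
    smaller than that of the largest intent."""
    if not counts:
        return []
    largest = max(counts.values())
    return sorted(intent for intent, n in counts.items() if n * max_ratio < largest)


# Hypothetical expression counts per intent
expression_counts = {
    "book_ticket": 100,
    "change_address": 60,
    "cancel_ticket": 10,
}

flagged = find_underrepresented(expression_counts)
print(flagged)
```

Intents that show up in the result are candidates for more expressions before the next training run.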
Use correct spelling
Ensure each word in the training data is correctly spelled. The engine maps words to a numeric representation, but only for words in a predefined 200,000-word vocabulary. Misspelled words can lead to incorrect interpretations: "pone", for example, could be corrected to "pony" or to "phone". Verify spelling to ensure your bot accurately learns relevant meanings.
Lower case vs UPPER CASE
Users often do not use capitalisation when chatting with a bot. However, for intent classification, capitalisation is ignored, so you do not have to worry about it. But be careful: capitalisation is relevant for entity extraction.
No need for punctuation (or accents)
Punctuation and accents are ignored by our NLP, so don't worry about adding them. For instance, élève is treated the same as eleve.
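To see why such variants end up equivalent for intent classification, consider the kind of normalisation sketched below. This is not the platform's actual preprocessing, only an assumed approximation (lower-casing, stripping accents, dropping punctuation) built on Python's standard `unicodedata` module. It can also help you spot expressions in your training set that are duplicates once case, accents, and punctuation are ignored.

```python
import unicodedata


def normalise_for_intent_matching(text):
    """Lower-case the text, strip accents, and replace punctuation with spaces."""
    text = text.lower()
    # Decompose accented characters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Unicode categories starting with "P" are punctuation.
    without_punctuation = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in without_accents
    )
    # Collapse repeated whitespace.
    return " ".join(without_punctuation.split())


print(normalise_for_intent_matching("Élève"))
print(normalise_for_intent_matching("I forgot my PASSWORD!!!"))
```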
⭐️ Best practices to create good entities
Entities should only be used if their value is needed in the bot flow.
When adding entities to your training data, take the following things into account:
Punctuation
Do not include any punctuation like '.' or '?' in your entity. '-' is ok, as it is often part of the entity, as in Sint-Niklaas.
Capitalisation
The entity extraction models are not case sensitive. So there is no need to add both Brussels and brussels.
Words, not sentences
An entity is a word or a small number of words, usually a noun phrase. Never mark full sentences or longer phrases as an entity. Users sometimes use a paraphrase instead of a word, which frequently happens with more technical terms: for example, "the little box that I use in order to have internet everywhere in my house" instead of "wifi extender". If that happens often, consider using a separate intent instead of an entity.
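The punctuation and length guidelines above can be checked automatically before training. The sketch below is one possible check: the word limit `MAX_WORDS = 4` is an assumed cut-off chosen for the example (the guideline only says "a small number of words"), and `entity_value_problems` is a hypothetical helper name.

```python
import string

ALLOWED_PUNCTUATION = {"-"}  # hyphens are often part of the entity itself
MAX_WORDS = 4  # assumption: what counts as "a small number of words"


def entity_value_problems(value, max_words=MAX_WORDS):
    """Return a list of guideline violations for one annotated entity value."""
    problems = []
    bad_chars = sorted(
        {ch for ch in value if ch in string.punctuation and ch not in ALLOWED_PUNCTUATION}
    )
    if bad_chars:
        problems.append("contains punctuation: " + " ".join(bad_chars))
    if len(value.split()) > max_words:
        problems.append(f"longer than {max_words} words")
    return problems


for candidate in ["Sint-Niklaas", "wifi extender", "Brussels?"]:
    print(candidate, "->", entity_value_problems(candidate) or "ok")
```

An empty list means the value passes both checks; anything else is worth a manual look.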
Display entities in expressions
We recommend adding at least 30 expressions per entity to guarantee the quality of the entity detection.