The thousand eyes on the chatbot. Evaluation perspectives for a virtual assistant

17 March 2020 by Ellysse

This chatbot does not respond consistently. This chatbot is not empathetic. This chatbot doesn’t understand names and, even worse, responds ambiguously. Moreover, it is never used by anyone. Users approach a chatbot or voicebot service with high expectations; for this reason the result is often unsatisfactory, and the company suffers in terms of customer satisfaction.

How do you measure the success of a chatbot? Many points of view can be included in the evaluation phase. The important thing is to choose, before blindly pointing fingers, the angle from which to look when expressing your judgment!

For the chatbot to be a success for the user, it must first of all be easy and quick to use (usability); it must provide complete and adequate responses (performance), make the experience positive also in terms of emotion and involvement (empathy), and therefore match the user’s expectations with real skills (satisfaction).

A fundamental evaluation perspective is that of information retrieval, which focuses on the system’s ability to retrieve information. Its evaluation criteria are accuracy, which tests the ability to understand the infinite ways of making a request; accessibility, which concerns the ability to understand and respond in relation to the context; and finally efficiency, the ability to perform the task and reach the goal without spending excess resources.
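As a rough illustration of the accuracy criterion, the sketch below scores a chatbot on several phrasings of the same request. Everything in it is an assumption made for the example: the `intent_accuracy` helper, the keyword-based `toy_classify` stand-in for a real language-understanding component, the `check_balance` intent name and the sample sentences.

```python
from typing import Callable, Sequence, Tuple


def intent_accuracy(
    classify: Callable[[str], str],
    labelled: Sequence[Tuple[str, str]],
) -> float:
    """Return the share of utterances whose predicted intent matches the expected one."""
    if not labelled:
        raise ValueError("labelled must contain at least one example")
    hits = sum(1 for utterance, expected in labelled if classify(utterance) == expected)
    return hits / len(labelled)


def toy_classify(utterance: str) -> str:
    """Hypothetical keyword matcher standing in for a real intent classifier."""
    text = utterance.lower()
    if "balance" in text or "how much" in text:
        return "check_balance"
    return "unknown"


# Four hypothetical ways of asking for the same thing.
PARAPHRASES = [
    ("What is my balance?", "check_balance"),
    ("How much money do I have?", "check_balance"),
    ("Show me what's left in my account", "check_balance"),
    ("Tell me my account balance", "check_balance"),
]

accuracy = intent_accuracy(toy_classify, PARAPHRASES)
print(f"accuracy: {accuracy:.2f}")  # the third phrasing is missed, so 3 of 4 match
```

The point of the test set is variety: the more differently worded requests it contains for each intent, the better the score reflects how the chatbot copes with real users.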

From a purely linguistic point of view, the assessment is based on Grice’s 4 conversational maxims:

– do not be reticent or redundant, that is, do not say too much or too little

– be honest, don’t give false or unproven information

– be relevant to the topic of the conversation

– be clear, avoid ambiguity

From a technological perspective, on the other hand, the assessment concerns the chatbot’s degree of humanity: in simple terms, the naturalness with which it is able to hold a conversation with a human. This is verified with the Turing test.

Last but not least is the commercial value of the chatbot, measured in terms of effectiveness (number of users, duration and number of conversations) in relation to costs (number of agents, duration of conversations, number of failed conversations, unwanted answers and repeated questions).
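The quantities listed above can be turned into a few simple ratios. The sketch below is only one possible reading: the `ChatbotStats` fields, the `summarize` helper and the sample figures are all assumptions for illustration, and the number of agents is left out because the text does not say how it should be weighed against the other figures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChatbotStats:
    """Hypothetical raw counts collected over one reporting period."""
    users: int
    conversations: int
    total_minutes: float
    failed_conversations: int
    unwanted_answers: int
    repeated_questions: int


def summarize(stats: ChatbotStats) -> Dict[str, float]:
    """Derive per-user and per-conversation ratios from the raw counts."""
    if stats.users <= 0 or stats.conversations <= 0:
        raise ValueError("users and conversations must be positive")
    return {
        "conversations_per_user": stats.conversations / stats.users,
        "avg_minutes_per_conversation": stats.total_minutes / stats.conversations,
        "failure_rate": stats.failed_conversations / stats.conversations,
        "unwanted_answers_per_conversation": stats.unwanted_answers / stats.conversations,
        "repeated_questions_per_conversation": stats.repeated_questions / stats.conversations,
    }


# Made-up sample figures, not real measurements.
sample = ChatbotStats(
    users=200,
    conversations=500,
    total_minutes=1500.0,
    failed_conversations=50,
    unwanted_answers=25,
    repeated_questions=100,
)

report = summarize(sample)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Ratios like these are easier to compare across periods than raw counts, since a growing user base inflates every absolute number.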

The evaluation of a chatbot can take one or more of these perspectives into consideration. The important thing is to contextualize the success or failure of a chatbot by wearing the right glasses!