Voice assistants have changed the way we interact with ubiquitous services by enabling natural language communication and the analysis of acoustic and linguistic patterns. Children's speech skills are not yet fully developed; as a result, conversational agents frequently misunderstand them.
Bolita (“little ball” in Spanish) is a bilingual conversational agent that teaches children how to tell jokes. Bolita uses techniques from Joke Telling Social Stories and implements constrained and spontaneous production tasks. Bolita runs on a smartphone with Google Assistant and uses the Sphero Bolt robot to give children visual and auditory feedback as a reward.
The Bolita joke-telling app works as follows:
- Introduction. Bolita gives a welcome message: “Hello my name is Bolita and I like to tell jokes to make others laugh. I am very fun. You will see. Come on! Say I want a joke.” While Bolita talks, the Sphero spins slowly and shows an equalizer animation in the LED matrix.
- Telling a joke. After the child says, “I want a joke,” Bolita tells a joke. To support spontaneous production, Bolita asks the child: “Do you like the joke?” If the child likes the joke, Bolita asks the child: “Why do you like the joke?” If the child does not like the joke, Bolita tells another one. While Bolita tells the joke, the Sphero shows the same movement and animation as in the introduction. When the joke ends, the Sphero starts laughing and spins quickly, and the LED matrix changes colors.
- Repeating a joke. After the child gives an opinion, Bolita invites the child to repeat the joke together, following a constrained production task scheme. Each joke is divided into two sentences: an introduction and a punch line. First, Bolita tells the introduction and asks the child to repeat it. Then, Bolita tells the punch line and asks the child to repeat it. When the child successfully repeats the joke, the Sphero spins slowly and shows an equalizer animation in the LED matrix.
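The interaction flow above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the app's actual implementation: the state names, event names, and the rule of staying in the same state on an unexpected utterance are all hypothetical, and the real app relies on Google Assistant and DialogFlow rather than a hand-written table.

```python
# Hypothetical dialogue states and events; none of these names come from the app.
TRANSITIONS = {
    ("intro", "want_joke"): "tell_joke",
    ("tell_joke", "joke_told"): "ask_opinion",
    ("ask_opinion", "likes"): "ask_why",
    ("ask_opinion", "dislikes"): "tell_joke",      # Bolita tells another joke
    ("ask_why", "gave_opinion"): "repeat_intro",
    ("repeat_intro", "repeated"): "repeat_punch_line",
    ("repeat_punch_line", "repeated"): "reward",
}

# Sphero feedback per state, paraphrased from the description above.
FEEDBACK = {
    "intro": "slow spin + equalizer animation",
    "tell_joke": "slow spin + equalizer animation",
    "ask_opinion": "laugh + fast spin + color changes",  # the joke has just ended
    "reward": "slow spin + equalizer animation",
}


def next_state(state, event):
    """Return the next state; stay in place on an unexpected event (assumption)."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="intro"):
    """Replay a sequence of events and return the list of visited states."""
    path = [state]
    for event in events:
        state = next_state(state, event)
        path.append(state)
    return path


if __name__ == "__main__":
    session = ["want_joke", "joke_told", "dislikes", "joke_told",
               "likes", "gave_opinion", "repeated", "repeated"]
    for visited in run(session):
        print(visited, "->", FEEDBACK.get(visited, "no feedback"))
```

The example session covers both branches: the child dislikes the first joke, hears a second one, explains why they like it, and then repeats the introduction and the punch line.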
Interaction measurements: The voice data was automatically saved in the Google account of the smartphone. We downloaded the voice data and manually separated it into folders. We automatically saved all transcripts generated by DialogFlow in a database on Google Firebase, and we manually transcribed all audio recordings.
We analyzed 2,184 audio recordings and transcripts, each child's ITPA-3 results, and the questionnaire responses. The ITPA-3 scores were interpreted according to its speech manual (very high, high, above average, average, below average, deficient, very deficient). We generated a frequency table in R to analyze the answers to the questionnaire.
To analyze audio recordings, we used Parselmouth in Python to extract acoustic features.
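As an illustration of this step, here is a minimal sketch of feature extraction with Parselmouth (installed as the third-party package `praat-parselmouth`). The text does not list the exact acoustic features that were extracted, so the choice of duration and mean fundamental frequency (F0) is an assumption, and the function names and the file name are hypothetical.

```python
from statistics import mean


def mean_voiced_f0(f0_values):
    """Mean F0 in Hz over voiced frames; Praat reports unvoiced frames as 0 Hz."""
    voiced = [float(f) for f in f0_values if f > 0]
    return mean(voiced) if voiced else None


def extract_features(wav_path):
    """Return a few illustrative acoustic features for one recording."""
    # Imported here so the helper above can be used without Parselmouth installed.
    import parselmouth

    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()  # Praat's default pitch analysis settings
    return {
        "duration_s": snd.xmax - snd.xmin,
        "mean_f0_hz": mean_voiced_f0(pitch.selected_array["frequency"]),
    }


if __name__ == "__main__":
    # "child_utterance.wav" is a placeholder file name, not part of the study data.
    print(extract_features("child_utterance.wav"))
```

Children's voices tend to have a higher F0 than adults', so the default pitch range may need adjusting in practice; the defaults are kept here only to keep the sketch short.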
Evaluation methods: We enrolled 37 children (18 boys, 19 girls) between 8 and 11 years old (mean age 9.76 ± 1.10 years). Participants were recruited from an elementary school in the northwest of Mexico. All participants speak Spanish. Before the study, the children's parents gave written informed consent on behalf of their children, who are minors in Mexico.
Participants completed the verbal section of the Illinois Test of Psycholinguistic Abilities, Third Edition (ITPA-3) to assess their spoken and written language. A psychologist enrolled in the study administered the tests. After the test, each child talked with Bolita in two sessions of five minutes each. At the end of the two sessions, the children answered a questionnaire about how much they liked talking with Bolita.
Outcomes and results: The results of this study show that children enjoy speaking with Bolita. They also show that there are acoustic and linguistic features that can be used to characterize the speech skills of participants.
Most of the children liked speaking with Bolita: 58% loved speaking with it, and 50% found it super fun. In addition, 44.6% of the children would like to speak with Bolita again, and 80% would like to have Bolita in their homes.
Our results show that Bolita understands children with below-average speech skills less well than other children. The word error rate (WER) for children with below-average language is 25%, while for all other children the WER is below 10%. In addition, WER is greater in spontaneous production (mean = 14.65%) than in constrained production (mean = 10.67%) (p < 0.05).
Our results also show differences in speech duration, number of words, and time to respond among children with above-average, average, and below-average speech skills.
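For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of words in the reference. The sketch below shows that standard definition. It assumes the manual transcript is the reference and the automatic transcript is the hypothesis, and that text is lowercased and split on whitespace; the study's exact normalization is not described, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("i want a joke", "i want the joke"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many more words than the child actually said.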