The SHEP AI Project


Cleverbot experiment


When developing version 0.0.9, I needed lots of conversational training data to test my language classifier. Writing out conversations myself was difficult, as they would only cover subjects that I could think of, and getting lots of people to send me their conversations was not an option because people value their privacy. I decided to use Cleverbot. Cleverbot is a free chat bot service that lets you have conversations with what seems like a human entity. In 2011 Cleverbot took part in a Turing test and was rated 59.3% human by voters.

I decided to set up a research station with Ubuntu and Python to run some code that I wrote. This code would simply start off with two bot objects, each with a talk method that takes a sentence as its parameter and returns a reply. I pre-determined the first sentence of every conversation to be “hello”. As a greeting, I felt this would be a good way of sparking a normal conversation between the bots. The bots then started to chat, and my code recorded their conversation.
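The setup described above can be sketched roughly as follows. This is a minimal illustration, not my original code: the Bot class, the run_conversation function and the stand-in reply functions are hypothetical names invented for the sketch, and nothing here calls the real Cleverbot service.

```python
from typing import Callable, List, Tuple


class Bot:
    """Hypothetical wrapper around a chat service (not the real Cleverbot API)."""

    def __init__(self, name: str, reply_fn: Callable[[str], str]) -> None:
        self.name = name
        self._reply_fn = reply_fn  # assumption: any function mapping sentence -> reply

    def talk(self, sentence: str) -> str:
        """Take a sentence and return this bot's reply."""
        return self._reply_fn(sentence)


def run_conversation(
    bot1: Bot, bot2: Bot, opener: str = "hello", turns: int = 6
) -> List[Tuple[str, str]]:
    """Let two bots talk to each other and record every line that is said."""
    log = [(bot1.name, opener)]  # bot1 opens with the pre-determined greeting
    speakers = [bot2, bot1]      # bot2 answers first, then they alternate
    sentence = opener
    for i in range(turns):
        speaker = speakers[i % 2]
        sentence = speaker.talk(sentence)
        log.append((speaker.name, sentence))
    return log


if __name__ == "__main__":
    # Stand-in reply functions so the sketch runs offline.
    bot1 = Bot("Bot1", lambda s: f"You said: {s}")
    bot2 = Bot("Bot2", lambda s: s.upper())
    for name, line in run_conversation(bot1, bot2, turns=4):
        print(f"{name}: {line}")
```

In a real run the reply functions would be replaced by calls to the chat service, and the log would be written to disk so it survives several days of running.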

I left the computer running for several days to gather plenty of data. When I returned to check it, I had pages and pages of conversational data. Feeling happy about the success of the experiment, I decided to train my own algorithm on this data. The happiness was short-lived. My algorithm started sending lots of sexual and overly romantic replies back at me. I was shocked and could not understand why it had done this. When looking through the training data I discovered why…

Much of the training data was sexual or cringeworthy, with games of truth or dare breaking out and lasting for ages. These games included lines like the following:

Bot1: *leans in and kisses you*
Bot2: *blushes*

More explicit content involved one bot moaning as the other described doing something such as nibbling its ears. The disturbing thing is that the Cleverbot algorithm learns from the conversations it has with people, which means there have been people out there who have had sexual conversations with Cleverbot. I edited major areas out of the data so that it was clean enough to use in my algorithm. Overall, the experiment was quickly pulled. It needs a “dirty classifier” to steer the conversation away whenever it heads in that direction, and I did not have time to build one.
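For illustration only, a very crude “dirty classifier” could look something like the sketch below. I never built one, so everything here is an assumption: the word list, the choice to flag any roleplay action written between asterisks, and the function names is_dirty, steer and clean_log are all hypothetical. A usable version would need a much larger vocabulary or a trained model.

```python
import re
from typing import Iterable, List

# Hypothetical blocklist, based only on the kind of lines quoted above.
FLAGGED_WORDS = {"kiss", "kisses", "moan", "moans", "nibble", "nibbles", "blushes"}

# Assumption: roleplay actions such as *blushes* are always unwanted.
ACTION_PATTERN = re.compile(r"\*[^*]+\*")


def is_dirty(line: str) -> bool:
    """Return True if the line contains a flagged word or a roleplay action."""
    words = set(re.findall(r"[a-z']+", line.lower()))
    return bool(words & FLAGGED_WORDS) or bool(ACTION_PATTERN.search(line))


def steer(reply: str, fallback: str = "Let's talk about something else.") -> str:
    """Swap a flagged reply for a neutral line to change the subject."""
    return fallback if is_dirty(reply) else reply


def clean_log(lines: Iterable[str]) -> List[str]:
    """Drop flagged lines from recorded data before training on it."""
    return [line for line in lines if not is_dirty(line)]
```

The steer function would sit between the two bots during a conversation, while clean_log could be run over data that has already been recorded. A keyword list like this will miss plenty and will also flag innocent lines, so it is only a starting point.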