Chatbots are often used. This article is a memo of a chatbot that I made by reading an article on Medium. The original article is here:
https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e
Chatbots are used in a variety of fields for different purposes, such as i) Support bots, designed to solve customer requests related to the delivery of a service or use of a product, and ii) Financial bots, aimed to resolve inquiries about financial services. Chatbots may have some constraints regarding the requests that they can respond and the vocabulary that they can employ, which depends on the specific domain where they are serving on. Furthermore, according to the Hype Cycle for emerging technologies by Gartner [2], conversational AI platforms remain in the phases of “innovation trigger” and “peak of inflated expectations”, meaning that they are getting substantial attention from the industry.
Besides the aforementioned use cases for chatbots, cybersecurity is one of the newest where to apply this technology. Thus, there exist chatbots focused on training end-users [3] or cyber analysts [4] in security awareness and incident response. Further, there are also malicious chatbots devoted to malware distribution through a human-machine conversation [5]. In addition, there is software designed to guide the user in terms of security and privacy, such as Artemis [6], a conversational interface to perform precision-guided analytics on endpoint data. Most of these security chatbots are implemented in a question-answering context [7] using a post-reply technique. As far as we know, the use of chatbots to profile suspects in an active way of child pornography has been little investigated, existing few approaches [8, 9] employing a chatbot to emulate a victim such as a child or a teenager. Likewise, our investigation aims to emulate a vulnerable person while the suspect offers him/her illegal content.
Building the chatbot
In this section, we'll cover NLP and NLG in this project.
First, we'll need to import the relevant libraries:
import io
import random
import string # to process standard python strings
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')
import nltk
from nltk.corpus import gutenberg
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True) # for downloading packages
If this is your first time using nltk, don't forget to run the following line and use the GUI to download all the packages.
nltk.download()
Preprocessing
raw = gutenberg.raw('bible-kjv.txt')
#Tokenisation
sent_tokens = nltk.sent_tokenize(raw)# converts to list of sentences
word_tokens = nltk.word_tokenize(raw)# converts to list of words
# Preprocessing
lemmer = WordNetLemmatizer()
def LemTokens(tokens):
return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
It doesn't really matter which corpus you use. Here I use the bible from gutenberg.
Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
"""If user's input is a greeting, return a greeting response"""
for word in sentence.split():
if word.lower() in GREETING_INPUTS:
return random.choice(GREETING_RESPONSES)
Generating Response TfidfVectorizer and cosine_similarity will be used to find the similarity between words entered by the user and the words in the corpus. This is the simplest possible implementation of a chatbot.
def response(user_response):
robo_response=''
sent_tokens.append(user_response)
TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
tfidf = TfidfVec.fit_transform(sent_tokens)
vals = cosine_similarity(tfidf[-1], tfidf)
idx=vals.argsort()[0][-2]
flat = vals.flatten()
flat.sort()
req_tfidf = flat[-2]
if(req_tfidf==0):
robo_response=robo_response+"I am sorry! I don't understand you"
return robo_response
else:
robo_response = robo_response+sent_tokens[idx]
return robo_response
Start and end of the conversation
flag=True
print("ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!")
while(flag==True):
user_response = input()
user_response=user_response.lower()
if(user_response!='bye'):
if(user_response=='thanks' or user_response=='thank you' ):
flag=False
print("ROBO: You are welcome..")
else:
if(greeting(user_response)!=None):
print("ROBO: "+greeting(user_response))
else:
print("ROBO: ",end="")
print(response(user_response))
sent_tokens.remove(user_response)
else:
flag=False
print("ROBO: Bye! take care..")
This approach is very basic for a chatbot as the response from the bot is simply based on keywords matching.
Recommended Posts