This is my first post, and I've only just started learning programming, so both the writing and the code are rough, but I hope you'll read on.
When browsing images on Twitter, I kept getting annoyed by text-only tweets and by images outside the genre I was after, so I thought it would be nice to extract only the ones I actually want. (In short: I want erotic images.)
Get an API key for the Cloud Vision API. This article was helpful.
For the Twitter API, apply for access and obtain an API key and tokens. It takes a bit of time and effort because you have to describe your intended use in English. This article was helpful.
The following three third-party libraries are used: schedule, tweepy, and requests. All of them can be installed with pip (pip install schedule tweepy requests).
main.py
import base64
import json
import os
import pickle
import time
import schedule
import tweepy
import requests
Import the libraries.
main.py
API_KEY = 'Twitter API key'
API_SECRET_KEY = 'Twitter API secret key'
ACCESS_TOKEN = 'Twitter Access token'
ACCESS_TOKEN_SECRET = 'Twitter Access token secret'
CVA_API_KEY = "Cloud Vision API key"
Store each of the keys you obtained here.
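If you would rather not hard-code the credentials in the script, they could also be read from environment variables instead. This is only a sketch, and the variable names are just examples:
# Sketch: read the keys from environment variables instead of embedding them in the source
API_KEY = os.environ["TWITTER_API_KEY"]
API_SECRET_KEY = os.environ["TWITTER_API_SECRET_KEY"]
ACCESS_TOKEN = os.environ["TWITTER_ACCESS_TOKEN"]
ACCESS_TOKEN_SECRET = os.environ["TWITTER_ACCESS_TOKEN_SECRET"]
CVA_API_KEY = os.environ["CVA_API_KEY"]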
First, get the timeline (TL) that the tweets will come from. This time I use list_timeline because I want to pull tweets from the accounts added to a list, but narrowing it down to a specific account with user_timeline or similar also works; a small sketch of that alternative follows a bit further down.
main.py
auth = tweepy.OAuthHandler(API_KEY, API_SECRET_KEY)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Get tweets from the timeline
def main():
    # Load the previously fetched timeline (empty on the very first run)
    try:
        with open('before_tl.pickle', 'rb') as f:
            before_tl = pickle.load(f)
    except FileNotFoundError:
        before_tl = []
    tl = api.list_timeline(owner_screen_name="List administrator's Twitter ID", slug="The name of the list you want to get")
    with open('before_tl.pickle', 'wb') as f:
        pickle.dump(tl, f)
    before_ids = {t.id for t in before_tl}  # collate by tweet ID
    for tweet in reversed(tl):  # reversed so tweets are processed in chronological order
        if tweet.id not in before_ids:  # only process tweets that were not in the previous TL
            media_getter(tweet)
The reason for saving the TL with pickle is to avoid hitting the pay-as-you-go GCP API more than necessary: each fetched tweet is checked against the previous TL, and processing runs only for tweets that are new.
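If you only want tweets from one specific account instead of a list, a user_timeline-based variant of the fetch might look like this (just a sketch; the screen name and count are placeholders):
# Sketch: pull the latest tweets of a single account instead of a list
tl = api.user_timeline(screen_name="target_account", count=20)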
main.py
# Get the user's screen name (ID) from the tweet
def username_getter(tweet):
    # For retweets, return the original tweeter's screen name
    if hasattr(tweet, 'retweeted_status'):
        return tweet.retweeted_status.user.screen_name
    return tweet.user.screen_name

# Get the list of image URLs from the tweet
def media_getter(tweet):
    try:
        medialist = [d.get('media_url') for d in tweet.extended_entities["media"]]
        name = username_getter(tweet)
        for media in medialist:
            img_save(media, name)
    except AttributeError:  # tweets without media have no extended_entities
        print('Text Only')
The user's screen name is used as part of the file name when the image is saved.
This completes the process of getting the image URL from Twitter.
From here, you can save the image and pass it to Cloud Vision for analysis.
main.py
# Save the image from the URL and change the save destination according to the judgment
def img_save(media, name):
    url_path = media.split("/")[-1]  # file name portion of the URL
    file_name = "adult/" + name + url_path
    response = requests.get(media)
    image = response.content
    with open(file_name, "wb") as f:
        f.write(image)
    identify = img_sort(file_name)
    if identify == "adult":
        print('---saved image---')
    else:
        os.remove(file_name)  # delete anything not judged adult
# Return a judgment according to the SafeSearch result
def img_sort(img_path):
    res_json = img_judge(img_path)
    judgement = res_json['responses'][0]['safeSearchAnnotation']['adult']
    if judgement == "POSSIBLE":
        print(judgement)
        return "possible"
    elif judgement == "LIKELY" or judgement == "VERY_LIKELY":
        print(judgement)
        return "adult"
    else:
        print(judgement)
        return None  # everything below POSSIBLE is treated as non-adult
# Send the image to the Cloud Vision API and receive the result
def img_judge(image_path):
    api_url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(CVA_API_KEY)
    with open(image_path, "rb") as img:
        image_content = base64.b64encode(img.read())
    req_body = json.dumps({
        'requests': [{
            'image': {
                'content': image_content.decode('utf-8')
            },
            'features': [{
                'type': 'SAFE_SEARCH_DETECTION'
            }]
        }]
    })
    res = requests.post(api_url, data=req_body)
    return res.json()
The save path is built by splitting the media URL on "/", taking the last element (the file name), and prepending the screen name and the destination directory.
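To make that concrete, with a hypothetical media URL and screen name the variables end up like this:
media = "https://pbs.twimg.com/media/EXAMPLE.jpg"  # hypothetical media URL
name = "someuser"                                  # hypothetical screen name
url_path = media.split("/")[-1]                    # -> "EXAMPLE.jpg"
file_name = "adult/" + name + url_path             # -> "adult/someuserEXAMPLE.jpg"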
The saved image is passed to the API, and processing branches based on the returned result. See here for the values that can be returned: https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1?hl=ja#google.cloud.vision.v1.SafeSearchAnnotation
The original design is to keep images judged LIKELY (highly likely) or above and delete everything else, but this time I changed the save destination according to the judgment so I could check how accurate Cloud Vision actually is.
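For reference, the parsed response that img_sort indexes into looks roughly like this (the likelihood values shown here are only an illustration):
res_json = {
    'responses': [{
        'safeSearchAnnotation': {
            'adult': 'LIKELY',  # the field this script branches on
            'spoof': 'UNLIKELY',
            'medical': 'UNLIKELY',
            'violence': 'POSSIBLE',
            'racy': 'VERY_LIKELY'
        }
    }]
}
# Possible values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY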
main.py
import shutil  # add this alongside the other imports at the top

    # These branches replace the else at the end of img_save
    elif identify == "possible":
        new_file_name = "possible/" + name + url_path
        shutil.move(file_name, new_file_name)
        print('---saved possible image---')
    else:
        new_file_name = "other/" + name + url_path
        shutil.move(file_name, new_file_name)
        print('---saved other image---')
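Note that the destination folders have to exist before anything can be saved there; a one-time setup along these lines (a sketch, assuming the script runs from the project root) avoids a FileNotFoundError on the first write or move:
# Sketch: create the destination folders once if they do not already exist
for folder in ("adult", "possible", "other"):
    os.makedirs(folder, exist_ok=True)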
Now let's run it. Using schedule, the processing is executed every 8 seconds.
main.py
if __name__ == "__main__":
    schedule.every(8).seconds.do(main)
    while True:
        schedule.run_pending()
        time.sleep(1)
(The result images are blurred here because they belong to third parties.)
I was able to extract and save only the erotic images without a problem. Watching the images pile up one after another is quite a sight. Comparing against the ones judged POSSIBLE confirmed that the accuracy is quite high: anything that is plainly explicit gets judged LIKELY or higher.
This time I used SAFE_SEARCH_DETECTION (a feature that determines whether an image contains explicit content), but the Cloud Vision API has many other features. Used well, they can be applied to all kinds of image collection and classification.
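For example, a label-detection variant of img_judge could look like this. This is only a sketch: the function name is made up, and everything except the feature type and the result key reuses the request format shown above:
# Sketch: ask Cloud Vision for content labels instead of a SafeSearch judgment
def img_label(image_path):
    api_url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(CVA_API_KEY)
    with open(image_path, "rb") as img:
        image_content = base64.b64encode(img.read())
    req_body = json.dumps({
        'requests': [{
            'image': {'content': image_content.decode('utf-8')},
            'features': [{'type': 'LABEL_DETECTION', 'maxResults': 5}]
        }]
    })
    res = requests.post(api_url, data=req_body)
    # Labels come back under 'labelAnnotations' instead of 'safeSearchAnnotation'
    return res.json()['responses'][0].get('labelAnnotations', [])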
Try Google Cloud Vision API TEXT_DETECTION in Python
I tried using Google Cloud Vision API
How to use Tweepy ~ Part 1 ~ [Getting Tweets]