- [x] I'll make a tweet collection program that keeps working for 3 months!
The previous program worked surprisingly well considering how casually it was thrown together, but since I wrote it by imitating examples, I can't shake the question: is this really right? The official explanation is about a single page long, and there is no API reference. Is the Stream API actually ~~neglected because there is no demand~~ not very popular?
Perhaps a page somewhere explains it in detail, but I can hardly read English, so I wouldn't know. Do people who use Tweepy simply not look at anything beyond the main features, or is everything outside the introduced parts considered self-evident?
Anyway, there is far less information about the Stream API than I imagined. Above all, leaving the official documentation neglected is about as uncool as it gets. It would be scary to find out later that the feature I want was "actually already implemented". In a situation like this, the only option is to check the source directly. Fortunately, Tweepy is open source software released under the MIT License and published on GitHub. In other words, reading it should not be impossible, as long as you have the time, motivation, and ability.
There was a source file with the fitting name streaming.py, so I opened it and found three classes.

* StreamListener
* ReadBuffer
* Stream

The two other than ReadBuffer are the ones I used in my first program. If the ReadBuffer class is exactly what its name suggests, it should be fine to leave it alone for now. Let's take a look at "StreamListener" and "Stream".
StreamListener seems to be the class that receives the information fetched by the Stream class and actually processes it. In practice you inherit from this class and override the parent's methods to implement your own behavior, while everything else falls back to the parent's default behavior ... right?
There were quite a few methods that can be overridden. Below is a list (descriptions checked against the reference on dev.twitter.com).
Oh, so proper messages do get sent after all. There even seems to be a proper "connected" notification, which was a small relief. For the methods called from on_data, it seems the stream is disconnected if the return value is False. on_error is implemented to return False unless it is overridden. You can't tell this without looking at the Stream class and checking the caller.
By the way, the messages sent by the Stream API are documented on dev.twitter.com. However, it seems that the "status_withheld", "user_withheld", "scrub_geo", "for_user", and "control" messages are not checked by on_data. Is that because they don't matter much? If you want to see these, it seems you would have to override on_data, and with it most of the methods above. I'm not sure.
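One way to pick up a message type that the default dispatch skips might be to override on_data, handle the extra type first, and hand everything else back to the parent. The sketch below is a simplified model in plain Python, not Tweepy's actual code: the class names `BaseListener` and `WithheldAwareListener` are hypothetical, and the message shape `{"status_withheld": {...}}` is an assumption based on the Streaming API message documentation.

```python
import json


class BaseListener:
    """Simplified stand-in for a stream listener (not Tweepy's real class)."""

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        # Dispatch on a key that identifies the message type.
        if "in_reply_to_status_id" in data:
            if self.on_status(data) is False:
                return False
        elif "limit" in data:
            if self.on_limit(data["limit"]["track"]) is False:
                return False
        # Unknown message types fall through and return None.

    def on_status(self, status):
        return None

    def on_limit(self, track):
        return None


class WithheldAwareListener(BaseListener):
    """Handles one extra message type, then defers to the parent."""

    def __init__(self):
        self.withheld = []

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        if "status_withheld" in data:
            self.withheld.append(data["status_withheld"])
            return True  # keep the stream open
        # Everything else keeps the parent's default dispatch.
        return super().on_data(raw_data)
```

With this shape only the extra branch is new code; the existing on_status, on_limit, and similar handlers would not need to be reimplemented.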
Stream looks like a class that implements connecting, disconnecting, and the receive loop. "__init__" (the constructor) makes the initial settings for the connection, and calling one of the methods that correspond to the Stream API endpoints, "userstream", "firehose", "retweet", "sample", "filter", or "sitestream", leads through "_start" to "_run".
The loop starting at "while self.running:" performs the actual connection processing, and the processing loop after connecting is handled by "_read_loop". If the connection attempt with "self.session.request()" fails, on_error of the StreamListener class is called; if it returns False the while loop is exited, and otherwise the error counter is incremented and the connection is retried after waiting for a certain period ...
**So there is a reconnection function after all!**
Of course. There is no reason for a library to leave something this tedious unimplemented. It is not the kind of thing to push onto the user.
The hurdle for the most important requirement, "**a function to reconnect in the event of an unexpected disconnection**", has suddenly dropped. On the other hand, I'd like to grill someone for an hour about why the details of such an important feature are not described in an easy-to-understand way ...
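The reconnection behavior described above can be modeled roughly as follows. This is only a sketch of the idea (retry while the error handler does not return False, waiting longer each time), not Tweepy's implementation: the function name `run_with_reconnect`, its parameters, and the default wait times are all made up for illustration.

```python
import time


def run_with_reconnect(connect, on_error, max_retries=5,
                       retry_start=1.0, retry_cap=16.0, sleep=time.sleep):
    """Call connect() until it succeeds, on_error says stop, or retries run out.

    connect() returns an HTTP-like status code; 200 means success.
    Returns the list of wait times that were used between attempts.
    """
    waits = []
    retry_time = retry_start
    errors = 0
    while errors <= max_retries:
        status = connect()
        if status == 200:
            break  # connected; a real client would enter its read loop here
        if on_error(status) is False:
            break  # the handler asked to give up
        errors += 1
        waits.append(retry_time)
        sleep(retry_time)
        retry_time = min(retry_time * 2, retry_cap)  # double the wait, up to a cap
    return waits
```

The `sleep` parameter exists only so that the waiting can be replaced with a no-op when trying the function out.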
By the way, "_read_loop" calls "on_data" of the StreamListener class, and if that returns False, the loop is exited. At that point False has been stored in self.running, which the connection loop in the caller "_run" also checks, so the connection loop exits too and the session is terminated.
Let me pull myself together. This is the result of looking at the source.
Looking at the source specifically, nothing in it corresponds to (1). That is, the methods do not return True or False as a return value. ...... Is this okay? *The check is written as "is False:", so it should be OK*, because a method that returns no value does not match that comparison? Regarding (2), on_timeout returns without a value, so it is enough to override only on_error and return True. (3) What should I do?
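The question about "is False" can be checked in plain Python. A function that ends without a return statement returns None, and `None is False` evaluates to False, so only an explicit False triggers the disconnect branch. The helper names below are hypothetical and exist only for this check.

```python
def handler_without_return():
    pass  # no return statement, so the call evaluates to None


def should_disconnect(result):
    # Identity check: only the literal False matches; None, 0 and "" do not.
    return result is False


print(should_disconnect(handler_without_return()))
print(should_disconnect(False))
```

This is also why a check written as `if not result:` would behave differently: it would treat a missing return value as a request to disconnect.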
However, keeping a log of warnings as well as errors seems likely to be useful later, so I will override "on_connect", "on_disconnect", "on_limit", "on_timeout", "on_warning", and "on_exception" as well.
Exceptions are, well, exceptional, so they are hard to deal with. As far as I can tell, an exception in the processing loop of the Stream class is re-raised after on_exception of StreamListener is called, so it should be possible to catch it in the caller. But what if one occurs during my own processing inside StreamListener? Following the call chain back from StreamListener, though, it should eventually end up in the same place ...? If the script still stops with an exception ... should I run it in a loop from a batch file??
Based on the top priority, "**a function to reconnect in case of unexpected disconnection**", the source so far looks like this.
tweetcheck2.py
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tweepy

# Obtain these yourself and fill them in
CK = ''  # Consumer Key
CS = ''  # Consumer Secret
AT = ''  # Access Token
AS = ''  # Access Token Secret


class Listener(tweepy.StreamListener):
    def on_status(self, status):
        # I can't read it fast enough anyway, so just report that a message arrived
        print('Tweet')
        return True

    def on_error(self, status_code):
        print('Error occurred: ' + str(status_code))
        return True  # returning True keeps the stream retrying

    def on_connect(self):
        print('Connected')
        return

    def on_disconnect(self, notice):
        print('Disconnected:' + str(notice.code))
        return

    def on_limit(self, track):
        print('Receive limit has occurred:' + str(track))
        return

    def on_timeout(self):
        print('time out')
        return True

    def on_warning(self, notice):
        print('Warning message:' + str(notice.message))
        return

    def on_exception(self, exception):
        print('Exception error:' + str(exception))
        return


# Main processing from here
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)

while True:  # infinite loop
    try:
        listener = Listener()
        stream = tweepy.Stream(auth, listener)

        # Select one and uncomment it.
        #stream.filter(track=['#xxxxxx'])
        stream.sample()
        #stream.userstream()
    except:
        pass  # Ignore all exceptions and loop
All messages that could threaten execution, such as errors and warnings, are displayed, and the stream reconnects when an error occurs. In the unlikely event of an exception, the main processing section catches it, ignores it, loops, and runs again.
Finished? …… Ctrl + C ……
(When I actually tried it, I couldn't stop it (how stupid). Is there a way to break out of the loop only on Ctrl + C?)
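On the Ctrl + C question: in Python, KeyboardInterrupt derives from BaseException rather than Exception, so a bare "except:" swallows it while "except Exception:" does not. A minimal sketch of a restart loop that still stops on Ctrl + C might look like this; `run_forever` and `connect_once` are hypothetical names, and `connect_once` stands in for the part that creates the stream and calls something like stream.sample().

```python
def run_forever(connect_once, max_rounds=None):
    """Re-run connect_once() after ordinary errors; stop on Ctrl+C."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        try:
            connect_once()
        except KeyboardInterrupt:
            # Ctrl+C arrives as KeyboardInterrupt; leave the loop cleanly.
            return "stopped by user"
        except Exception as exc:
            # Ordinary failures are reported, then the loop tries again.
            print("Restarting after: " + repr(exc))
    return "round limit reached"
```

The `max_rounds` argument is only there to make the loop easy to try out; leaving it as None gives the same endless retry as the original loop.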
The basic form of the Twitter-related processing is probably something like this for the time being. I'm still wondering whether this is really okay, so **comments and corrections are welcome**. Next time, I plan to move to the MongoDB side and start working on the top priority, "**storing received data in MongoDB**". (To be continued)