For certain reasons, I retrieved about one year of message history from Slack, so I will write up how I implemented it in Python. I reformatted the retrieved messages to make them easier to analyze, but I can't publish the results because the content is so awful. If some part turns out to be publishable, I would like to write another article.
Python has slackapi/python-slackclient, but I didn't use it this time. If you want to know how to implement this with python-slackclient, I recommend reading a different article.
Client
The Slack token is an instance variable so that it can either be read from an environment variable or written directly in the main script. If you are using pipenv, it automatically loads .env, so the value set in the environment variable is used as the default. This implementation depends on my development environment, but I also made it work when pipenv is not used (or when you do not want to set an environment variable).
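As a rough sketch of that idea, a dataclass field can default to the environment variable and still accept a value passed directly. The class name TokenHolder and the token strings here are made up for illustration and are not part of the actual client.

```python
import os
from dataclasses import dataclass

# Dummy value so the example runs anywhere; normally pipenv/.env would provide it.
os.environ.setdefault("SLACK_API_TOKEN", "dummy-token-from-env")

# Read once at import time and used as the field default below.
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN", "")


@dataclass
class TokenHolder:  # hypothetical stand-in for the client class
    token: str = SLACK_API_TOKEN


default_holder = TokenHolder()  # falls back to the environment variable
explicit_holder = TokenHolder(token="dummy-token-written-directly")

print(default_holder.token)
print(explicit_holder.token)
```

Because the default is captured when the module is imported, changing the environment variable afterwards does not affect instances created later.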
The request function takes an argument method: BaseSlackMethod. This is because Slack calls each API endpoint a "method". I will explain the implementation of BaseSlackMethod later, but I made it a base class so that I can add one class per method. Doing so lets me manage the request parameters in code, which saves the trouble of going back to the reference every time. Hooray!
src/slack/client.py
import os
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Optional

import requests

from src.log import get_logger
from src.slack.exceptions import SlackRequestError
from src.slack.methods.base import BaseSlackMethod
from src.slack.types import Headers

# Default token; pipenv loads .env into the environment automatically.
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN", "")

logger = get_logger(__name__)


@dataclass
class SlackClient:
    api_url: ClassVar[str] = "https://slack.com/api"
    token: str = SLACK_API_TOKEN

    def _get_headers(self, headers: Headers) -> Headers:
        """Get headers

        Args:
            headers (Headers)

        Returns:
            Headers
        """
        final_headers = {
            "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
        }
        if self.token:
            final_headers["Authorization"] = f"Bearer {self.token}"
        # Headers passed by the caller override the defaults.
        final_headers.update(headers)
        return final_headers

    def request(
        self, method: BaseSlackMethod, headers: Optional[Dict[str, Any]] = None,
    ) -> Dict[Any, Any]:
        """API request to Slack

        Args:
            method (BaseSlackMethod)
            headers (Dict[str, Any], optional): Defaults to None.

        Raises:
            SlackRequestError: the HTTP response status is not OK
            Exception: any other error is logged and re-raised

        Returns:
            Dict[Any, Any]: response body
        """
        if not isinstance(headers, dict):
            headers = {}
        headers = self._get_headers(headers)
        url = f"{self.api_url}/{method.endpoint}"
        try:
            res = requests.get(url, headers=headers, params=method.params)
            if not res.ok:
                raise SlackRequestError(res.text)
        except Exception as err:
            logger.error(err)
            logger.error("Data acquisition failure from slack")
            raise
        else:
            logger.info("Data acquisition completed from slack")
            return res.json()
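One thing worth keeping in mind: the Slack Web API reports many failures in the response body as "ok": false together with an "error" string, even when the HTTP status is 200. The following is a minimal sketch of an extra check on the body. The helper name ensure_ok and the use of RuntimeError are assumptions for illustration, not part of the client above.

```python
from typing import Any, Dict


def ensure_ok(body: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the response body reports a failure (hypothetical helper)."""
    if not body.get("ok", False):
        # Slack puts a short machine-readable code in "error", e.g. "invalid_auth".
        raise RuntimeError(f"Slack API error: {body.get('error', 'unknown')}")
    return body


# Sample bodies written by hand for illustration, not real API output.
print(ensure_ok({"ok": True, "messages": []}))
try:
    ensure_ok({"ok": False, "error": "invalid_auth"})
except RuntimeError as err:
    print(err)
```

A check like this could be applied to the parsed JSON before it is returned, so that an authentication or permission problem does not pass silently as a successful request.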
The API method for getting message history is conversations.history. See the reference for details on the request parameters.
Writing the parameters into code as shown below makes it easier to see which parameters each method accepts. With appropriate comments, the code itself can serve as a good reference.
Here I will only explain the parameters that matter for retrieving one year of history: cursor and oldest. cursor is the token used to fetch the next page of history. oldest specifies the start of the time range, as the name suggests. Note that oldest takes a Unix timestamp.
src/slack/methods/conversation.py
import os
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import ClassVar, Optional

from src.slack.methods.base import BaseSlackMethod
from src.slack.types import SlackParams

SLACK_CHANNEL_ID = os.getenv("SLACK_CHANNEL_ID", "")


@dataclass
class ConversationsHistory(BaseSlackMethod):
    endpoint: ClassVar[str] = "conversations.history"
    channel: str = SLACK_CHANNEL_ID
    cursor: Optional[str] = None
    inclusive: bool = False
    limit: int = 100
    # default_factory evaluates "now" per instance, not once at import time.
    latest: float = field(default_factory=lambda: datetime.now().timestamp())
    oldest: float = 0

    @property
    def params(self) -> SlackParams:
        self_dict = asdict(self)
        # Omit the cursor on the first request, when there is no next page yet.
        if self.cursor is None:
            del self_dict["cursor"]
        return self_dict
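To see what the params property produces, here is a simplified, self-contained sketch. HistoryParams is a cut-down stand-in for the class above, and the channel ID and cursor string are invented values.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class HistoryParams:  # simplified stand-in, not the class above
    channel: str
    cursor: Optional[str] = None
    limit: int = 100
    oldest: float = 0

    @property
    def params(self) -> Dict[str, Any]:
        data = asdict(self)
        if self.cursor is None:
            # Leave the key out so the first request carries no cursor.
            del data["cursor"]
        return data


page = HistoryParams(channel="C0000000000")
print(page.params)  # no "cursor" key on the first request

page.cursor = "made-up-cursor"
print(page.params)  # "cursor" is now included
```

Dropping the key instead of sending an empty value keeps the first request identical to one that never mentioned a cursor at all.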
The base class is shown below :arrow_down_small:
src/slack/methods/base.py
from dataclasses import asdict, dataclass
from typing import ClassVar

from src.slack.types import SlackParams


@dataclass
class BaseSlackMethod:
    endpoint: ClassVar[str] = ""

    @property
    def params(self) -> SlackParams:
        return asdict(self)
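Adding a class for another method could then look roughly like the sketch below. ConversationsReplies is a hypothetical addition that this article does not implement. The base class is repeated with a plain Dict type so that the snippet runs on its own, and the channel ID and ts value are invented.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict


@dataclass
class BaseSlackMethod:
    endpoint: ClassVar[str] = ""

    @property
    def params(self) -> Dict[str, Any]:
        return asdict(self)


@dataclass
class ConversationsReplies(BaseSlackMethod):  # hypothetical addition
    endpoint: ClassVar[str] = "conversations.replies"
    channel: str = ""
    ts: str = ""  # timestamp of the parent message of the thread
    limit: int = 100


method = ConversationsReplies(channel="C0000000000", ts="1600000000.000100")
print(method.endpoint)
print(method.params)
```

Since endpoint is a ClassVar, asdict() leaves it out, so params contains only the request parameters.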
I want to get one year of history, so I use the expression datetime.now() - timedelta(days=365) to calculate the date and time one year ago. timedelta is convenient because you can get the date and time one year later just by changing the minus to a plus. Thank you :pray:
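A small sketch of that calculation, using a fixed date instead of datetime.now() so that the result is reproducible (the date itself is arbitrary):

```python
from datetime import datetime, timedelta

now = datetime(2022, 6, 1, 12, 0, 0)  # fixed, arbitrary value for the example

one_year_ago = now - timedelta(days=365)
one_year_later = now + timedelta(days=365)  # same expression with a plus

print(one_year_ago)    # 2021-06-01 12:00:00
print(one_year_later)  # 2023-06-01 12:00:00

# The API expects a Unix timestamp, which timestamp() returns as a float.
# The exact number depends on the local timezone, so it is not shown here.
oldest = one_year_ago.timestamp()
print(type(oldest).__name__)
```

Note that days=365 is a fixed number of days, so across a leap day the result is off by one day from the same calendar date.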
This time, I went with a simple while loop because I need to keep fetching pages until the whole year of history has been retrieved. It's a crappy throwaway script, so I didn't really need to write a careful if statement to check whether next_cursor exists, but I didn't like the idea of it ending with a KeyError, so I did.
src/slack/__main__.py
from datetime import datetime, timedelta

from src.slack.client import SlackClient
from src.slack.methods.conversation import ConversationsHistory
from src.utils import save_to_file


def main() -> None:
    tmp_oldest = datetime.now() - timedelta(days=365)
    oldest = tmp_oldest.timestamp()
    method = ConversationsHistory(inclusive=True, oldest=oldest)
    client = SlackClient()
    count = 1
    while True:
        res = client.request(method)
        save_to_file(res, f"outputs/tests/sample{count}.json")
        # Continue only while a non-empty next_cursor is present;
        # an empty or missing cursor means there are no more pages.
        if (
            "response_metadata" in res
            and "next_cursor" in res["response_metadata"]
            and res["response_metadata"]["next_cursor"]
        ):
            method.cursor = res["response_metadata"]["next_cursor"]
            count += 1
        else:
            break


if __name__ == "__main__":
    main()
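The cursor loop can be tried without calling Slack at all. The sketch below replaces client.request with a hypothetical fake_request function that returns hand-written pages. The page contents and cursor strings are invented and only mimic the shape of the response_metadata field.

```python
from typing import Any, Dict, List, Optional

# Hand-written fake pages standing in for API responses (not real Slack output).
FAKE_PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"messages": ["m1", "m2"], "response_metadata": {"next_cursor": "c1"}},
    "c1": {"messages": ["m3"], "response_metadata": {"next_cursor": "c2"}},
    "c2": {"messages": ["m4"], "response_metadata": {"next_cursor": ""}},
}


def fake_request(cursor: Optional[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for client.request(method)."""
    return FAKE_PAGES[cursor]


def fetch_all() -> List[str]:
    messages: List[str] = []
    cursor: Optional[str] = None
    while True:
        res = fake_request(cursor)
        messages.extend(res["messages"])
        # .get() avoids a KeyError; an empty or missing cursor ends the loop.
        cursor = res.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break
    return messages


print(fetch_all())  # ['m1', 'm2', 'm3', 'm4']
```

Chaining .get() is a slightly shorter alternative to the explicit if statement and treats a missing key and an empty string the same way.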
When I fetched one year of history for a single channel, I ended up with about 200 files of more than 2,000 lines each. Terrifying :scream: