I was inspired to publish my own version, written in Python.
The most important thing about counting is to treat every URI as 20 characters. If it's a URI, you'd expect to be able to pull it out easily with a regular expression, but Twitter, which has been awful lately, recognizes something as a URI even without a scheme! Nice (eye roll)
Extracting scheme-less URIs is possible thanks to the tremendous effort of the people at Twitter: their regular expressions.
twitter-text-java/src/com/twitter/Regex.java at master · twitter/twitter-text-java
What even is this?
All that's left is to port it to Python, but Python's regular expressions are too dated to support various features.
I simply removed the parts that raised errors, and it ended up like this.
twikoto3/twikoto3/twittertext/regex.py at master · azyobuzin/twikoto3
I only implemented the part related to URIs.
Python is worse than I expected...
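To make the "every URI counts as 20 characters" rule concrete, here is a minimal sketch. The pattern `SIMPLE_URI` and the function `count_with_uris` are hypothetical names made up for illustration, and the pattern is a deliberate simplification: it only matches URIs that start with `http://` or `https://`, so it does not handle the scheme-less case that the twitter-text regular expressions cover.

```python
import re

# Hypothetical, heavily simplified pattern (an assumption for illustration):
# it only matches URIs that begin with http:// or https://.
SIMPLE_URI = re.compile(r"https?://\S+")

# Every URI is counted as 20 characters, as described in the post.
URI_LENGTH = 20


def count_with_uris(text):
    """Count characters, treating each matched URI as URI_LENGTH characters."""
    uri_count = len(SIMPLE_URI.findall(text))
    # Strip the URIs, count what is left, then add the fixed cost per URI.
    remainder = SIMPLE_URI.sub("", text)
    return len(remainder) + uri_count * URI_LENGTH


# 8 non-URI characters ("see " and " now") plus 20 for the URI.
print(count_with_uris("see http://example.com/a/very/long/path now"))  # 28
```

Swapping in a fuller pattern such as the one in regex.py should only require changing `SIMPLE_URI`, since the counting logic does not depend on how the URIs are matched.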
After that, all you have to do is normalize the text and count the characters. Counting itself is easy, but the catch is that you have to count by code point.
This was a big problem, because even Python 3, which is Unicode-centric, didn't support it.
That's when I came across this module.
It's in the public domain, so you can do whatever you like with it. It needs some modifications to work with Python 3, but it makes counting by code point very easy.
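import codepoint
# I wanted to use the example from http://blog.unfindable.net/archives/2728, but a Qiita bug killed it
text = "Yoshinoya"
# characters() splits the text by code point, so len() gives the code point count
length = len(codepoint.characters(text))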
Now you can count!!!
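For comparison, here is a rough standard-library sketch of the same "normalize, then count by code point" step. It does not use the `codepoint` module above. The function name `count_code_points` is hypothetical, and the choice of NFC as the normalization form is an assumption, since the post only says "normalize". Encoding to UTF-32 is used because that encoding always spends exactly 4 bytes per code point.

```python
import unicodedata


def count_code_points(text):
    """Normalize the text (NFC is an assumption) and count code points."""
    normalized = unicodedata.normalize("NFC", text)
    # UTF-32 uses exactly 4 bytes per code point, so the byte length divided
    # by 4 is the code point count, however the string is stored internally.
    return len(normalized.encode("utf-32-le")) // 4


# U+20BB7 lies outside the Basic Multilingual Plane but is one code point.
print(count_code_points("\U00020BB7\u91ce\u5bb6"))  # 3
# "e" followed by a combining acute accent composes to one code point in NFC.
print(count_code_points("e\u0301"))  # 1
```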
Click here for the full code
twikoto3/twikoto3/twittertext at master · azyobuzin/twikoto3