First wall

The most important thing when counting is to treat each URI as 20 characters. If something is a URI, you would expect to be able to pull it out easily with a regular expression, but Twitter, which has been awful lately, recognizes URIs even without a scheme! Nice (eye roll)

Extracting URIs that have no scheme is possible thanks to a regular expression that the people at Twitter put a tremendous amount of effort into.

twitter-text-java/src/com/twitter/Regex.java at master · twitter/twitter-text-java

What even is this?

All I had to do was port this to Python, but Python's regular expression engine seems to be dated and does not support various things this pattern relies on.

I simply cut out the parts that raised errors, and it ended up like this.

twikoto3/twikoto3/twittertext/regex.py at master · azyobuzin/twikoto3

I only ported the parts related to URIs.

Python is worse than I expected ...
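To make the 20-character rule concrete, here is a minimal sketch. The pattern SIMPLE_URL and the function count_with_urls are hypothetical names made up for this illustration, and the pattern is a deliberately tiny stand-in, not Twitter's real regular expression: it only knows http/https URIs plus scheme-less hosts ending in a handful of sample TLDs. It also assumes Python 3.3 or later, where len() on a string counts code points.

```python
import re

# Assumption: a simplified stand-in pattern, NOT the one from twitter-text.
# First alternative: URIs with an http/https scheme.
# Second alternative: scheme-less hosts ending in a few sample TLDs,
# optionally followed by a path.
SIMPLE_URL = re.compile(
    r"https?://\S+"
    r"|\b(?:[a-z0-9-]+\.)+(?:com|net|org|jp)\b(?:/\S*)?",
    re.IGNORECASE,
)

URL_LENGTH = 20  # fixed cost per URI, the value used in this article


def count_with_urls(text):
    """Count characters, charging URL_LENGTH for every matched URI."""
    urls = SIMPLE_URL.findall(text)
    # Strip the URIs, count what is left, then add the fixed cost per URI.
    remainder = SIMPLE_URL.sub("", text)
    return len(remainder) + URL_LENGTH * len(urls)


print(count_with_urls("see example.com now"))             # 28
print(count_with_urls("see http://example.com/a/b now"))  # 28
```

Both calls give the same result because the URI contributes a flat 20 no matter how long it actually is. A real implementation would swap in the ported pattern from regex.py.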

Second wall

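After that, all that is left is to normalize the text and count the characters. Counting itself is easy, but the catch is that you have to count by code point.

This was a big problem, because even Python 3, which is built around Unicode, did not support it directly (on narrow builds before Python 3.3, len() on a string counts UTF-16 code units, so a character outside the BMP counts as two).

That is when I came across this: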

codepoint.py

It's in the public domain, so you can do whatever you want with it. It needs a few modifications to work with Python 3, but it makes counting by code point very easy.

import codepoint

text = "Yoshinoya"  # wanted to do http://blog.unfindable.net/archives/2728 here, but a Qiita bug killed it
length = len(codepoint.characters(text))

Now you can count!!
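For comparison, the normalize-then-count step can also be sketched with the standard library alone. This is not what codepoint.py does internally; count_code_points is a hypothetical helper, and it assumes NFC is the normalization form meant above. It counts by encoding to UTF-32, where every code point takes exactly 4 bytes, so the result does not depend on how the interpreter stores strings.

```python
import unicodedata


def count_code_points(text):
    """Normalize to NFC (an assumption), then count Unicode code points."""
    normalized = unicodedata.normalize("NFC", text)
    # Every code point is exactly 4 bytes in UTF-32, so byte length // 4
    # is the number of code points.
    return len(normalized.encode("utf-32-le")) // 4


# "e" followed by a combining acute accent composes into one code point.
print(count_code_points("cafe\u0301"))            # 4
# U+20BB7 lies outside the BMP but is still a single code point.
print(count_code_points("\U00020BB7\u91ce\u5bb6"))  # 3
```

On Python 3.3 and later, len(normalized) gives the same numbers; the UTF-32 detour only matters on older narrow builds.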

The full code is here:

twikoto3/twikoto3/twittertext at master · azyobuzin/twikoto3
