I'm building a blog to practice Django, and I defined the following model as the textbook instructed. At the time I didn't understand what __unicode__ was for, so I looked into it and wrote up what I found.
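from django.db import models

class Blog(models.Model):
    title = models.CharField(max_length=100, unique=True)
    slug = models.SlugField(max_length=100, unique=True)
    body = models.TextField()
    posted = models.DateField(db_index=True, auto_now_add=True)
    # Note: Django 2.0 and later also require an on_delete argument here.
    category = models.ForeignKey('blog.Category')
    # Text representation of an entry (the Python 2 style hook).
    def __unicode__(self):
        return '%s' % self.title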
Apparently this is common knowledge, but I am a beginner and did not know it. I am a self-proclaimed "gugurekasu" (the kind of person who gets told to just google it), so I googled it accordingly.
Unicode is a standard for handling the characters of the world's many languages in a unified way on a computer.
There are many schemes by which a computer can represent natural-language text. ASCII and UTF-8, which both come up below, are two examples.
It's confusing.
These schemes are called encodings: rules for turning characters into something the computer can store. Each scheme has its own rules.
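To make "each scheme has its own rules" concrete, here is a small sketch. It assumes Python 3 (the model above targets Python 2), and the three encodings are just examples I picked. It encodes the same character three ways and shows that the stored bytes differ each time.

```python
# One character, three encodings: the stored bytes differ each time.
char = "あ"  # HIRAGANA LETTER A, Unicode code point U+3042

# The encoding names below are examples chosen for this sketch.
encoded = {name: char.encode(name) for name in ("utf-8", "shift_jis", "euc_jp")}

for name, raw in encoded.items():
    # Print the bytes as hex so the difference is easy to see.
    print(name, raw.hex())
```

The character is the same in all three cases; only the rule used to turn it into bytes changes.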
So when you handle a string in ordinary Python, it is stored in some particular encoding. But people reading on the web come from many different backgrounds, and their computers each handle text with a different scheme, so the text can apparently come out garbled depending on the reader's environment.
For example, suppose the letter A is encoded as ASCII and stored on a computer, and then consider what happens when the reader's environment uses UTF-8. (The numbers here are made up for illustration. Real ASCII text happens to read the same under UTF-8; real mismatches occur between encodings such as Shift_JIS and UTF-8.)
Character A → ASCII → character code = 123456
Suppose it is saved with a code like this.
Then, if the reader's environment is UTF-8:
Character code = 123456 → UTF-8 → letter B
A different character comes back. The original string is lost, and the whole web page turns into ??????????????.
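The mismatch described here can be reproduced with a short sketch, again assuming Python 3. The sample text and the choice of Shift_JIS for the writer and UTF-8 for the reader are my own assumptions for illustration, not something from the textbook.

```python
# The writer stores text with one encoding; the reader decodes with another.
text = "日本語"

stored = text.encode("shift_jis")  # bytes as saved by the writer

same_rules = stored.decode("shift_jis")                  # reader uses the same rules
wrong_rules = stored.decode("utf-8", errors="replace")   # reader uses different rules

# ascii() escapes non-ASCII characters so the output is safe on any terminal.
print(ascii(same_rules))
print(ascii(wrong_rules))  # contains U+FFFD replacement characters

# Plain ASCII text is a special case: it is byte-for-byte identical in UTF-8.
ascii_is_utf8_safe = "A".encode("ascii") == "A".encode("utf-8")
print(ascii_is_utf8_safe)
```

Decoding with the same rules returns the original text, while decoding with the wrong rules yields replacement characters, which is the garbled page described above.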
At this point I thought, "Then shouldn't I just handle every character as Unicode?", but the world is apparently not that convenient. As the saying goes, heaven does not grant two gifts, and in the human world everything seems to be a trade-off.
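In exchange for becoming a unified standard for computers, Unicode seems to have given up the ability to be shown to humans directly. As far as I can tell, Unicode text is an abstract sequence of code points, and it has to be converted into some concrete encoding before it can be output.
Or rather, nothing else adds up unless that is the case, so I have decided to understand it this way for now. (That is why the title calls this a memo.)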
Let's get back to Django.
Within the framework, Django seems to pass all text around as Unicode. Since Unicode on its own cannot be shown to humans directly, the model needs a __unicode__ method that tells Django what text to use when an object is displayed.
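The effect of such a method can be sketched without Django at all. The two classes below are hypothetical plain-Python stand-ins, not Django models, and the sketch assumes Python 3, where the equivalent hook is __str__ rather than __unicode__.

```python
class WithoutText(object):
    """Hypothetical stand-in for a model with no text representation."""

    def __init__(self, title):
        self.title = title


class WithText(WithoutText):
    """Same data, plus a method saying how to show it as text."""

    def __str__(self):  # Python 3 counterpart of __unicode__
        return '%s' % self.title


plain = str(WithoutText("Hello"))  # default: class name and memory address
named = str(WithText("Hello"))     # the title itself

print(plain)
print(named)
```

Without the method, the object shows up as an anonymous "object at 0x..." string. With it, the title appears, which is presumably why the textbook has the model define one.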
This article ends with plenty left to poke holes in, but I will call this a stopping point for now and move on.
I now know what I needed to know (I think).