I have been wondering how to visualize diff information. Ultimately, I want to produce color-coded diff output as a PDF.
After looking into various options, I wondered whether python-docx could be used and evaluated it. This article summarizes the results. I considered Markdown too, but, with the disposition of a lazy engineer, I chose to generate docx because the fine details of layout and appearance can then be fixed up in Word's GUI (wry smile). And once you can create a Word file, you can create a PDF.
I hope this is helpful for anyone who wants to generate Word documents from the results of various analyses in Python.
This time, we confirmed the following functions.
Ideally this would all be done with Python on Windows, but for various reasons I tested in the following environment. To be honest, I wanted to process data that had been prepared on Cygwin with tools such as diff, so this is the environment I ended up with.
When using Cygwin, make sure that every Python file is encoded in UTF-8 and uses \n-only line breaks.
For the **MS-DOS prompt**, on the other hand, line breaks are \r\n and the character encoding is SJIS. In that case, wherever the code in this article has
# -*- coding: utf-8 -*-
change that line to
# -*- coding: shift-jis -*-
and it should work.
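As a rough illustration of the check described above, the sketch below looks at the raw bytes of a source file and reports whether they decode as UTF-8 and whether the line breaks are \n only. The helper name check_source_bytes is hypothetical, and the sketch is written for Python 3 rather than the Python 2.7 used elsewhere in this article.

```python
def check_source_bytes(data):
    """Return (is_utf8, lf_only) for the raw bytes of a source file.

    Hypothetical helper for illustration; not part of python-docx.
    """
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    # Any carriage return means the file is not "\n only".
    lf_only = b"\r" not in data
    return is_utf8, lf_only


if __name__ == "__main__":
    # Bytes as they would look for a Cygwin-style and a DOS-style file.
    cygwin_style = "# -*- coding: utf-8 -*-\nprint('こんにちは')\n".encode("utf-8")
    dos_style = "# -*- coding: shift-jis -*-\r\nprint('こんにちは')\r\n".encode("shift_jis")
    print(check_source_bytes(cygwin_style))
    print(check_source_bytes(dos_style))
```

In practice the bytes would come from open(path, "rb").read() for each .py file.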
Python is assumed to be installed already.
Installation is described [here](https://python-docx.readthedocs.io/en/latest/user/install.html#install). The requirements are as follows.
Normally, pip install python-docx should be all you need. In my environment, the MS-DOS prompt version installed without any trouble.
On Cygwin, in addition to the requirements above, the following libraries must be installed. Without them, the build fails with errors about missing headers, so install them. **They may not be installed by default.**
Furthermore, if you have Python on both Windows and Cygwin as I do, be careful: depending on your PATH settings, pip install may install the package into the Windows Python. So I installed it by the following method.
The Cygwin setup notes [here](http://qiita.com/GDaigo/items/a80003684fc6ab7505fd#%E3%82%BB%E3%83%83%E3%83%88%E3%82%A2%E3%83%83%E3%83%97) may also be helpful.
easy_install-2.7 python-docx
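One way to sidestep the question of which Python a bare pip belongs to is to run pip as a module of the interpreter you actually intend to use. The sketch below only builds and prints that command. The helper name pip_install_command is hypothetical, and it assumes pip is available for that interpreter, which was not necessarily the case for the Cygwin Python 2.7 setup described here (hence easy_install).

```python
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command bound to one specific interpreter.

    Hypothetical helper: "python -m pip" runs the pip that belongs to
    the given interpreter, regardless of which pip comes first on PATH.
    """
    return [python, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Show which interpreter is running and the command it would use.
    print(sys.executable)
    print(" ".join(pip_install_command("python-docx")))
```

Printing sys.executable first is a quick way to confirm whether the Windows or the Cygwin interpreter is the one being picked up.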
The official documentation is below.
https://python-docx.readthedocs.io/en/latest/index.html
The tutorial is easy to follow. However, finding out how to do one specific thing can be quite hard. In my case, I struggled a lot with character formatting; the relevant material is summarized on the page below.
http://python-docx.readthedocs.io/en/latest/user/text.html
Explaining the library's specification in detail would be tedious, so I have tried to express it in the code below instead.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# SimpleDocxService
#   Provides simple services around python-docx.
#   Really, it is code for understanding the docx library.
#
from docx import Document
from docx.shared import RGBColor
from docx.shared import Inches
from docx.shared import Pt


class SimpleDocxService:

    def __init__(self):
        self.document = Document()
        self.latest_run = None

    def set_normal_font(self, name, size):
        # Set the font of the default ('Normal') style
        font = self.document.styles['Normal'].font
        font.name = name
        font.size = Pt(size)

    def add_head(self, text, lv):
        # Add a heading
        self.document.add_heading(text, level=lv)

    def open_text(self):
        # Start adding text (begins a new paragraph)
        self.paragraph = self.document.add_paragraph()

    def close_text(self):
        # Finish adding text
        return  # Currently does nothing

    def get_unicode_text(self, text, src_code):
        # Convert to unicode for python-docx
        # (unicode() is the Python 2 built-in)
        return unicode(text, src_code)

    def adjust_return_code(self, text):
        # Text file data added as-is produces unwanted line breaks,
        # so remove them
        text = text.replace("\n", "")
        text = text.replace("\r", "")
        return text

    def add_text(self, text):
        # Add text
        self.latest_run = self.paragraph.add_run(text)

    def add_text_italic(self, text):
        # Add text (italic)
        self.paragraph.add_run(text).italic = True

    def add_text_bold(self, text):
        # Add text (bold)
        self.paragraph.add_run(text).bold = True

    def add_text_color(self, text, r, g, b):
        # Add colored text
        self.paragraph.add_run(text).font.color.rgb = RGBColor(r, g, b)

    def add_picture(self, filename, inch):
        # Insert a figure
        self.document.add_picture(filename, width=Inches(inch))

    def save(self, name):
        # Write out as a docx file
        self.document.save(name)
SimpleDocxService is a class that gathers APIs for the features evaluated this time. It provides the following functions.
API | Behavior |
---|---|
set_normal_font(name, size) | Set the standard text font. name is the font name, size is the size in points |
add_head(text, lv) | Create a heading. text is the heading text. lv is the level (0=Title, 1=Heading 1, ...) |
open_text() | Open a text area (*) |
close_text() | Close a text area (*) |
get_unicode_text(text, src_code) | Generate and return a unicode string from text in the encoding specified by src_code |
adjust_return_code(text) | Generate and return text with line breaks removed |
add_text(text) | Write text to the Word document |
add_text_italic(text) | Write text to the Word document in italic |
add_text_bold(text) | Write text to the Word document in bold |
add_text_color(text, r, g, b) | Write text to the Word document in the color specified by r, g, b. Example: r=255, g=0, b=0 gives red |
add_picture(filename, inch) | Insert the image specified by filename. inch is the width in inches |
save(name) | Save as a Word file with the file name specified by name |
A few supplementary notes.
The first concerns how python-docx behaves, so I will explain it with code. The code that actually writes text looks like the following, which is taken from the official documentation.
p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True
How this is rendered is also shown in the official documentation. As you can see, you obtain a paragraph and then add text to it. Character formatting can also be applied at the time of add_run. This probably reflects the structure of the docx file itself.
So open_text is used to start a new paragraph. python-docx itself does not need it, but the idea is that close_text marks the end of a series of writes. For that reason, text written with the SimpleDocxService class looks like this.
docx = SimpleDocxService()
docx.open_text()
docx.add_text("This is a my best book.\n")
docx.add_text("Do you know this?")
docx.close_text()
The reason I do this is the interaction with figures. To insert a picture, you use add_picture, as in the code above. Now suppose you write the following (code that uses python-docx directly, without the SimpleDocxService class).
p = document.add_paragraph('A plain paragraph having some ')
p.add_run("text1\n")
document.add_picture("sample.png", width=Inches(1.25))
p.add_run("text2\n")
In this case, understandably enough, the result is not
text1
<<sample.png figure>>
text2
but
text1
text2
<<sample.png figure>>
because both runs are added to the same paragraph p, while the picture is appended to the document after that paragraph. I wanted to make this explicit in application code, so I introduced the concept of open and close. It also appears later in the sample application code, so please refer to that as well.
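The ordering can be pictured with a toy model in which the document body is a list of blocks in creation order, and a run always goes into the paragraph object it is called on. This is only a simplified stand-in written for illustration (the ToyDocument and ToyParagraph classes are hypothetical and are not python-docx), but it reproduces the same ordering.

```python
class ToyParagraph:
    """A paragraph that collects runs of text."""

    def __init__(self, text=""):
        self.runs = [text] if text else []

    def add_run(self, text):
        self.runs.append(text)


class ToyDocument:
    """A document body modelled as a list of blocks in creation order."""

    def __init__(self):
        self.blocks = []

    def add_paragraph(self, text=""):
        paragraph = ToyParagraph(text)
        self.blocks.append(paragraph)
        return paragraph

    def add_picture(self, filename):
        # A picture becomes a new block at the end of the body.
        self.blocks.append("<<%s>>" % filename)

    def render(self):
        out = []
        for block in self.blocks:
            if isinstance(block, ToyParagraph):
                out.append("".join(block.runs))
            else:
                out.append(block)
        return out


if __name__ == "__main__":
    document = ToyDocument()
    p = document.add_paragraph()
    p.add_run("text1 ")
    document.add_picture("sample.png")
    p.add_run("text2")  # still goes into the first paragraph
    print(document.render())

    # Starting a new paragraph after the picture gives the intended order.
    document = ToyDocument()
    document.add_paragraph().add_run("text1")
    document.add_picture("sample.png")
    document.add_paragraph().add_run("text2")
    print(document.render())
```

The second half corresponds to calling open_text again after add_picture: each open_text starts a fresh paragraph at the current end of the document.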
In Python, character encodings are rather troublesome. You need to pay attention to which encoding a library works with. python-docx appears to process text as unicode, so Japanese text has to be converted to unicode. There are various ways to convert, but it seems necessary to make the text unicode in the way this get_unicode_text function does (even though it is Word, it apparently is not SJIS...).
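For readers on Python 3, where the unicode built-in no longer exists, the same conversion is usually written as a bytes.decode call. The sketch below is an assumption about how get_unicode_text could be adapted, not code from the class above; the function name to_text is hypothetical.

```python
def to_text(raw, src_code):
    """Decode raw bytes (e.g. read with open(..., "rb")) into a str.

    Hypothetical Python 3 counterpart of get_unicode_text.
    """
    return raw.decode(src_code)


if __name__ == "__main__":
    # Simulate the bytes of a Shift-JIS text file with Windows line breaks.
    raw = "管理者の役割\r\n".encode("shift_jis")
    text = to_text(raw, "shift_jis")
    print(repr(text))
```

The important point is the same in both Python versions: decode the bytes with the encoding the file was actually written in before handing the text to python-docx.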
The adjust_return_code function is code I arrived at by trial and error. Sorry. It seems that if you use text containing line breaks as-is, unwanted line breaks are inserted into the document. My approach was to prevent that with this function.
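Removing every line break also merges separate paragraphs of the source text into one. If that is not wanted, one possible alternative (an assumption, not what adjust_return_code does) is to split the text on blank lines and remove line breaks only inside each chunk, then write each chunk in its own open_text / add_text / close_text sequence. The helper name split_paragraphs is hypothetical.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines; remove line breaks inside each chunk."""
    # Normalise Windows (\r\n) and bare \r line breaks to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    chunks = re.split(r"\n\s*\n", text)
    # Same effect as adjust_return_code, but applied per chunk.
    return [chunk.replace("\n", "") for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "first line,\r\nsecond line.\r\n\r\nnext paragraph.\r\n"
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```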
In the next section, we actually create a Word file using this SimpleDocxService class.
This time, as a sample, we create a Word file with the following structure.
Below are the materials used in the sample. It's Pronama-chan again...
First, the image under the title is a file called **report_top.png**, which looks like this.
Next, the text file is **sample.txt**, with the following content. It is an excerpt from my blog...
I think that the basic role of a manager is to move multiple people and achieve results.
Therefore, it is difficult to ignore people's emotional and mental problems. I think this is a little different from accepting the other party. After considering such a problem to some extent, I dared to ignore it.
The other picture is a file called **sample_pic.png**, which looks like this.
The sample posted below was created using these. Of course, the images and text do not have to be these. Note, however, that the text is assumed to be Japanese in SJIS with Windows \r\n line breaks.
Incidentally, the Pronama-chan material was obtained from the following site and then resized, with text added. http://pronama.azurewebsites.net/pronama/
Below is sample code that uses the SimpleDocxService class to generate a Word file.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from docx_simple_service import SimpleDocxService

if __name__ == "__main__":
    docx = SimpleDocxService()

    # Font settings
    docx.set_normal_font("Courier New", 9)

    # Title
    docx.add_head(u"Main title", 0)

    # Insert illustration
    docx.add_picture("report_top.png", 3.0)

    # Section heading
    docx.add_head(u"First topic", 1)

    # Put the Shift-JIS text file into the docx body
    with open("sample.txt", "rb") as f:
        text = f.read()
    docx.open_text()
    docx.add_text("\n")
    text = docx.get_unicode_text(text, 'shift-jis')
    text = docx.adjust_return_code(text)
    docx.add_text(text)
    docx.close_text()

    # Insert illustration
    docx.add_picture("sample_pic.png", 5.0)

    # Generate text in code and put it in the docx.
    # Also includes examples of character formatting.
    docx.open_text()
    docx.add_text("\nThis is a my best book.")
    docx.add_text("\nThis is ")
    docx.add_text_bold("a my best")
    docx.add_text(" book.")
    docx.add_text("\nThis is ")
    docx.add_text_italic("a my best")
    docx.add_text(" book.")
    docx.add_text_color("\nThis is a my best book.", 0xff, 0x00, 0x00)
    docx.close_text()

    # Next section
    docx.add_head(u"Second topic", 1)

    # Generate text in code and put it in the docx.
    docx.open_text()
    docx.add_text(u"\n Yes, that's it.")
    docx.close_text()

    # Save
    docx.save("test.docx")
    print("complete.")
You will have a docx like this.
I think it has the structure described above.
With python-docx, the code itself is not too difficult once you know how to write it. You can understand it by comparing the sample code above with the code of the SimpleDocxService class. Within this range, I think you can do quite a lot by modifying the code posted here.
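As one example of such a modification, and coming back to the colored diff mentioned at the start, the standard difflib module can produce diff lines that are then mapped to RGB values suitable for add_text_color. This is only a sketch under my own assumptions: the color choices and the helper name colored_diff_lines are hypothetical, and the docx call is shown only in a comment.

```python
import difflib

# Hypothetical color choices (r, g, b) keyed by the first character
# of each ndiff line.
COLORS = {
    "+": (0x00, 0x80, 0x00),  # added lines in green
    "-": (0xFF, 0x00, 0x00),  # removed lines in red
}
DEFAULT_COLOR = (0x00, 0x00, 0x00)  # unchanged lines in black


def colored_diff_lines(old_lines, new_lines):
    """Return a list of (line, (r, g, b)) pairs for an ndiff-style diff."""
    result = []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("? "):
            continue  # skip ndiff's intraline hint lines
        color = COLORS.get(line[:1], DEFAULT_COLOR)
        result.append((line, color))
    return result


if __name__ == "__main__":
    old = ["apple", "banana", "cherry"]
    new = ["apple", "blueberry", "cherry"]
    for line, (r, g, b) in colored_diff_lines(old, new):
        # With SimpleDocxService this could become:
        #   docx.add_text_color(line + "\n", r, g, b)
        print(line, (r, g, b))
```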
I used the following. Thank you for providing such wonderful software.
That's all.