I have been looking into how to visualize diff output. The end goal is a PDF in which the differences are color-coded.

After looking into various options, I wondered whether python-docx would do the job, so I evaluated it. This article summarizes the results. I also considered Markdown, but, being a lazy engineer, I chose the docx route because the fine details of layout and appearance can be fixed by hand in Word's GUI afterwards (wry smile). And once you have a Word file, you can produce a PDF from it.

I hope this helps anyone who wants to use Python to generate a draft Word document from the results of some analysis.

Postscript

shiracamus posted improvements in the comment section; they are a must-read.
This article does not cover handling multiple paragraphs. Incidentally, if you add that support, it should become possible to read in data and treat one part as Chapter 1 and another part as Chapter 2.

What kind of docx to make

This time, I confirmed the following features.

Create a title
Create Heading 1 (Heading 2 and later should work the same way)
Embed a text file written in Shift-JIS on Windows
Embed text written directly in Python
Insert a figure

Operating environment, etc.

Ideally this would be done with Python on Windows, but for various reasons I tested in the environment below. To be honest, I wanted to process data that I had already worked over on Cygwin with tools like diff, so this is the environment I ended up with.

Cygwin (32-bit) on Windows 10
Python 2.7
python_docx-0.8.6-py2.7

When using Cygwin, make sure that every Python file is encoded in UTF-8 and uses \n-only line breaks.

At the **MS-DOS prompt**, on the other hand, line breaks are \r\n and the character encoding is SJIS. In that case, wherever the code in this article says

# -*- coding: utf-8 -*-

change it to

# -*- coding: shift-jis -*-

and it should work.
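The Cygwin-side requirement (UTF-8 encoding, \n-only line breaks) can be checked mechanically. The following is only a minimal sketch: the function name check_source_file and the two throwaway files are my own illustration, not part of the original setup.

```python
from __future__ import print_function

import io
import os
import tempfile


def check_source_file(path):
    """Return a list of problems found in one source file (empty if none)."""
    with io.open(path, "rb") as f:
        raw = f.read()
    problems = []
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    if b"\r" in raw:
        problems.append("contains a carriage return")
    return problems


# Demonstration on two throwaway files (hypothetical names).
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.py")
bad_path = os.path.join(tmp_dir, "bad.py")
with io.open(good_path, "wb") as f:
    f.write(u"# -*- coding: utf-8 -*-\nprint('ok')\n".encode("utf-8"))
with io.open(bad_path, "wb") as f:
    # Japanese text saved as Shift-JIS with a Windows line break.
    f.write(u"# \u65e5\u672c\u8a9e\r\n".encode("shift_jis"))

print(check_source_file(good_path))
print(check_source_file(bad_path))
```

Running this over the .py files before executing them under Cygwin should catch files that were saved from a Windows editor with the wrong settings.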

Install python-docx

This assumes Python is already installed.

Installation is described here (https://python-docx.readthedocs.io/en/latest/user/install.html#install). The requirements are as follows.

Python 2.6, 2.7, 3.3, or 3.4
lxml >= 2.3.2

Normally, pip install python-docx should be all you need. At the MS-DOS prompt it went smoothly in my environment too.

Precautions on Cygwin

First, regarding the requirements above: on Cygwin, the following libraries must be installed. Without them, the installation fails with an error about missing headers, so install them first. **They may not be installed by default.**

libxml2
libxslt

Also, if you have Python on both Windows and Cygwin, as I do, be careful: depending on your PATH settings, pip install may install into the Windows Python instead. So I installed it with the command below.

This Cygwin setup article (http://qiita.com/GDaigo/items/a80003684fc6ab7505fd#%E3%82%BB%E3%83%83%E3%83%88%E3%82%A2%E3%83%83%E3%83%97) may also be helpful.

easy_install-2.7 python-docx
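To confirm that the package landed in the Python you actually intend to run, a small check like the following may help. It is a sketch only; the helper name describe_python is my own, and the only thing it relies on is that python-docx is imported under the name docx.

```python
from __future__ import print_function

import sys


def describe_python():
    """Report which interpreter is running and whether docx can be imported."""
    try:
        import docx  # import name of the python-docx package
        docx_location = getattr(docx, "__file__", "unknown location")
    except ImportError:
        docx_location = None
    return {
        "executable": sys.executable,
        "platform": sys.platform,  # "cygwin" under Cygwin, "win32" on Windows
        "docx": docx_location,
    }


info = describe_python()
print("interpreter:", info["executable"])
print("platform   :", info["platform"])
print("python-docx:", info["docx"] or "not importable from this interpreter")
```

If the platform line does not say cygwin when run from the Cygwin shell, the Windows Python is being picked up first on the PATH.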

How to use python-docx

The official documentation is below.

https://python-docx.readthedocs.io/en/latest/index.html

The tutorial is easy to follow. However, finding out how to do one specific thing can be quite hard. In my case, I struggled a lot with text formatting; the relevant material is summarized on the page below.

http://python-docx.readthedocs.io/en/latest/user/text.html

Explaining the library's specifications in detail would be tedious, so I have tried to express them in the code below.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#
# SimpleDocxService
#   Provides a simple service built on python-docx.
#   Really, it is code for understanding the docx library.
#

from docx import Document
from docx.shared import RGBColor
from docx.shared import Inches
from docx.shared import Pt

class SimpleDocxService:

    def __init__(self):
        self.document = Document()
        self.paragraph = None   # set by open_text()
        self.latest_run = None

    def set_normal_font(self, name, size):
        # Set the font of the 'Normal' style
        font = self.document.styles['Normal'].font
        font.name = name
        font.size = Pt(size)

    def add_head(self, text, lv):
        # Add a heading (lv 0 is the title)
        self.document.add_heading(text, level=lv)

    def open_text(self):
        # Start adding text: begin a new paragraph
        self.paragraph = self.document.add_paragraph()

    def close_text(self):
        # Finish adding text
        return  # Currently does nothing

    def get_unicode_text(self, text, src_code):
        # Convert to unicode for python-docx (Python 2)
        return unicode(text, src_code)

    def adjust_return_code(self, text):
        # Text file data added as is produces line breaks in the document.
        # Remove them, since they get in the way.
        text = text.replace("\n", "")
        text = text.replace("\r", "")
        return text

    def add_text(self, text):
        # Add text
        self.latest_run = self.paragraph.add_run(text)

    def add_text_italic(self, text):
        # Add text (italic)
        self.paragraph.add_run(text).italic = True

    def add_text_bold(self, text):
        # Add text (bold)
        self.paragraph.add_run(text).bold = True

    def add_text_color(self, text, r, g, b):
        # Add colored text
        self.paragraph.add_run(text).font.color.rgb = RGBColor(r, g, b)

    def add_picture(self, filename, inch):
        # Insert a figure
        self.document.add_picture(filename, width=Inches(inch))

    def save(self, name):
        # Write out as a docx file
        self.document.save(name)

SimpleDocxService is a class that gathers the APIs for the features evaluated this time. It provides the following functions.

API	Behavior
set_normal_font(name, size)	Set the font of the standard (Normal) text. name is the font name and size is the size in points
add_head(text, lv)	Create a heading. text is the heading text. lv is the level (0 = Title, 1 = Heading 1, ...)
open_text()	Open a text area (*)
close_text()	Close a text area (*)
get_unicode_text(text, src_code)	Generate and return a unicode string from text in the encoding given by src_code
adjust_return_code(text)	Generate and return the text with line breaks removed
add_text(text)	Write text to the Word document
add_text_italic(text)	Write text to the Word document in italic
add_text_bold(text)	Write text to the Word document in bold
add_text_color(text, r, g, b)	Write text to the Word document in the color given by r, g, b. Example: r=255, g=0, b=0 gives red
add_picture(filename, inch)	Insert the image file given by filename. inch is the width in inches
save(name)	Save as a Word file under the file name given by name
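Since the original motivation was colored diff output, here is one way the r, g, b arguments of add_text_color might be chosen per diff line. This is a sketch under my own assumptions: the function name diff_lines_with_colors and the color choices are illustrative, and it uses only the standard difflib module, not python-docx.

```python
from __future__ import print_function

import difflib


def diff_lines_with_colors(old_lines, new_lines):
    """Return (text, (r, g, b)) pairs for a unified diff of two line lists."""
    red, green, blue, black = (255, 0, 0), (0, 128, 0), (0, 0, 255), (0, 0, 0)
    colored = []
    for line in difflib.unified_diff(old_lines, new_lines, lineterm=""):
        if line.startswith("@@"):
            color = blue       # hunk header
        elif line.startswith("-"):
            color = red        # removed line (and the "---" file header)
        elif line.startswith("+"):
            color = green      # added line (and the "+++" file header)
        else:
            color = black      # unchanged context line
        colored.append((line, color))
    return colored


pairs = diff_lines_with_colors(["a", "b", "c"], ["a", "x", "c"])
for text, (r, g, b) in pairs:
    print(text, (r, g, b))
    # With the class above this would become, for example:
    #   docx.add_text_color(text + "\n", r, g, b)
```

Keeping the color decision separate from the docx writing means the mapping can be tested without generating a Word file.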

Some supplementary notes.

- About the text area

This depends on how python-docx behaves, so I will explain it with code. The code that actually writes text looks like this. It is taken from the official documentation.

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

The official documentation also shows how this is rendered. As you can see, you obtain a paragraph and then add text to it. Text formatting can also be applied at the time of add_run. This probably reflects the structure of the docx file itself.

So open_text is used to start a new paragraph. python-docx itself does not require it, but the idea is that close_text marks the end of one series of writes. For that reason, text is written with the SimpleDocxService class as follows.

docx = SimpleDocxService()
docx.open_text()
docx.add_text("This is a my best book.\n")
docx.add_text("Do you know this?")
docx.close_text()

The reason for doing this is the interaction with figures. To include a picture, use add_picture, as in the code above. Now suppose you write the following (code that uses python-docx directly, without the SimpleDocxService class).

p = document.add_paragraph('A plain paragraph having some ')
p.add_run("text1\n")
document.add_picture("sample.png", width=Inches(1.25))
p.add_run("text2\n")

In this case, as you might expect, the result is not

text1
<<sample.png diagram>>
text2

but

text1
text2
<<sample.png diagram>>

I wanted to make this explicit in the application code, so I added the concept of open and close. It also appears later in the sample application code, so please refer to that as well.
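The ordering can be reasoned about with a toy model. Note that this is not python-docx: ToyDocument and add_run below are plain-Python stand-ins I made up, assuming only what is described above, namely that new blocks are appended to the end of the document while runs are appended to their own paragraph.

```python
from __future__ import print_function


class ToyDocument(object):
    """Plain-Python stand-in for a document body: an ordered list of blocks."""

    def __init__(self):
        self.body = []

    def add_paragraph(self, text=""):
        paragraph = {"kind": "paragraph", "runs": [text] if text else []}
        self.body.append(paragraph)   # a new block goes at the end of the body
        return paragraph

    def add_picture(self, filename):
        self.body.append({"kind": "picture", "file": filename})


def add_run(paragraph, text):
    # A run is appended to its own paragraph, wherever that paragraph sits.
    paragraph["runs"].append(text)


doc = ToyDocument()
p = doc.add_paragraph()
add_run(p, "text1")
doc.add_picture("sample.png")
add_run(p, "text2")           # lands in p, that is, before the picture

order = [block["kind"] for block in doc.body]
print(order)                  # ['paragraph', 'picture']
print(doc.body[0]["runs"])    # ['text1', 'text2']
```

To get text after the picture, a new paragraph has to be started after add_picture, which is exactly what calling open_text again does.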

get_unicode_text function

In Python, character encodings are rather troublesome. You need to pay attention to which encoding each library works with. python-docx appears to work in unicode, so Japanese text has to be converted to unicode first. There are various ways to convert, but here the get_unicode_text method does it (since this is Word, I half expected SJIS, but apparently not ...).
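As an illustration of the conversion itself, the sketch below reads a Shift-JIS file and turns it into unicode in two ways using only the standard library. The file name sample_sjis.txt and its content are made up for the demonstration; in Python 2, decoding the bytes as in the first way should give the same result as unicode(text, 'shift-jis').

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

# Create a small Shift-JIS file with a Windows line break to read back.
path = os.path.join(tempfile.mkdtemp(), "sample_sjis.txt")
original = u"\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8\r\n"
with io.open(path, "wb") as f:
    f.write(original.encode("shift_jis"))

# Way 1: read raw bytes, then decode explicitly.
with io.open(path, "rb") as f:
    decoded = f.read().decode("shift_jis")

# Way 2: let io.open decode while reading; newline="" keeps \r\n untouched.
with io.open(path, "r", encoding="shift_jis", newline="") as f:
    decoded_on_read = f.read()

print(decoded == original, decoded_on_read == original)
```

The second way has the advantage that the encoding is stated once, at the point where the file is opened.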

adjust_return_code function

This is code I arrived at by trial and error. Sorry. If text containing line breaks is used as is, unwanted line breaks appear to be inserted. I use the adjust_return_code function to prevent that.
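For comparison, here is the same stripping next to an alternative that keeps the line structure, which might be a starting point for the multiple-paragraph support mentioned in the postscript. Both function names are my own; the second is a suggestion, not what the class in this article does.

```python
from __future__ import print_function


def strip_line_breaks(text):
    """Same effect as adjust_return_code: drop every CR and LF."""
    return text.replace("\r", "").replace("\n", "")


def split_into_paragraphs(text):
    """Alternative: keep the line structure, one item per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


sample = u"first line\r\nsecond line\r\n\r\nthird line\r\n"
print(strip_line_breaks(sample))
print(split_into_paragraphs(sample))
```

Each item returned by split_into_paragraphs could then be written inside its own open_text / add_text / close_text sequence.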

In the next section, we actually create a Word file using the SimpleDocxService class.

Actually making one

This time, as a sample, create a Word file with the following structure.

Title
Insert a picture
Heading 1 (1st)
Text file string
Insert another picture
String formatted in Python
Heading 1 (2nd)
String specified in Python

Material

Below is the material used in the sample. It is Pronama-chan again ...

First, the image below the title is a file called **report_top.png**, which looks like this.

Next, the text file is **sample.txt**, with the following content. It is an excerpt from my blog ...

I think that the basic role of a manager is to move multiple people and achieve results.
Therefore, it is difficult to ignore people's emotional and mental problems. I think this is a little different from accepting the other party. After considering such a problem to some extent, I dared to ignore it.

The other picture is a file called **sample_pic.png**, which looks like this.

The sample shown below was created with these. Of course, you can use other images and text. Note, however, that the text file is assumed to be Japanese in SJIS with Windows \r\n line breaks.

The Pronama-chan material was obtained from the site below, then resized and had text added. http://pronama.azurewebsites.net/pronama/

Sample code

Below is sample code that uses the SimpleDocxService class to generate a Word file.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
from docx_simple_service import SimpleDocxService

if __name__ == "__main__":

    docx = SimpleDocxService()

    # Font settings
    docx.set_normal_font("Courier New", 9)

    # Title
    docx.add_head(u"Main title", 0)

    # Insert an illustration
    docx.add_picture("report_top.png", 3.0)

    # Section heading
    docx.add_head(u"First topic", 1)

    # Put the Shift-JIS text file into the docx body
    with open("sample.txt", "rb") as f:
        text = f.read()
    docx.open_text()
    docx.add_text("\n")
    text = docx.get_unicode_text(text, 'shift-jis')
    text = docx.adjust_return_code(text)
    docx.add_text(text)
    docx.close_text()

    # Insert an illustration
    docx.add_picture("sample_pic.png", 5.0)

    # Generate text in code and put it in the docx.
    # Formatting examples are here too.
    docx.open_text()
    docx.add_text("\nThis is a my best book.")
    docx.add_text("\nThis is ")
    docx.add_text_bold("a my best")
    docx.add_text(" book.")
    docx.add_text("\nThis is ")
    docx.add_text_italic("a my best")
    docx.add_text(" book.")
    docx.add_text_color("\nThis is a my best book.", 0xff, 0x00, 0x00)
    docx.close_text()

    # Next section
    docx.add_head(u"Second topic", 1)

    # Generate text in code and put it in the docx.
    docx.open_text()
    docx.add_text(u"\n Yes, that's it.")
    docx.close_text()

    # Save
    docx.save("test.docx")

    print("complete.")

The SimpleDocxService class is assumed to be implemented in docx_simple_service.py in the same folder as this program.
It is assumed that the image files report_top.png and sample_pic.png are in the same folder as this program.
It is assumed that a text file called sample.txt, encoded in SJIS with \r\n line breaks, is in the same folder as this program.
When this program is executed, it will be saved as test.docx in the same folder as this program.

Execution result

You will have a docx like this.

I think it has the structure described above.

With python-docx, the code itself is not hard once you know how to write it. Comparing the sample code above with the code of the SimpleDocxService class should make it clear. Within this scope, I think you can do a variety of things by modifying the code posted here.

The problem is getting to the point where you know how to write it ...

License

I used the software below. Thank you for providing such wonderful software.

(Just for the record ...) The code above is in the public domain. It is not enough code to claim copyright over. However, of course, no one accepts liability for any damage arising from its use. Please keep that in mind.
Python itself is under the PSF (Python Software Foundation) license.
The source for the above is [Python on Wikipedia](https://ja.wikipedia.org/wiki/Python#.E3.83.A9.E3.82.A4.E3.82.BB.E3.83.B3.E3.82.B9).
The license for python-docx is at the link below. It appears to be MIT. https://github.com/python-openxml/python-docx/blob/master/LICENSE
Please note that the Pronama-chan images must be used in accordance with the "Pronama-chan usage guidelines".

That's all.

[Python] [Word] [python-docx] Try creating a Word document template in Python using python-docx