I made a tool that generates Markdown from an exported Scrapbox JSON file

Introduction

"Five years as an engineer and no public output at all: isn't that a problem?" That sense of crisis pushed me to start posting on Qiita. This is my first article, so it may be hard to read; please bear with me.

Overview

We decided to use Scrapbox for internal activities, and I eventually wanted to put the Scrapbox pages on a file server so that they would remain as an internal asset. Scrapbox can export the contents of all pages as a JSON file, but the raw JSON is hard to read. I looked for a tool that would convert it to Markdown and save it, but couldn't find a good one, so I wrote my own in Python.

The exported JSON file has the following format (exported without metadata).

`john-project.json`


{
  "name": "john-project",
  "displayName": "john-project",
  "exported": 1598595295,
  "pages": [
    {
      "title": "How to use Scrapbox",
      "created": 1598594744,
      "updated": 1598594744,
      "id": "000000000000000000000000",
      "lines": [
        "How to use Scrapbox",
        "Welcome to Scrapbox. You can freely edit and use this page.",
        "",
        "Invite members to this project",
        // (omitted)
        " [We publish use cases of companies https://scrapbox.io/case]",
        ""
      ]
    },
    {
      "title": "The first line is the heading",
      "created": 1598594777,
      "updated": 1598595231,
      "id": "111111111111111111111111",
      "lines": [
        "The first line is the heading",
        "[** Two asterisks are headline style]",
        " Indented bulleted list",
        " \tIncrease the number to further indent",
        " [[Bold]] or [/ italic], [- strikethrough] can be used",
        " \tLike this, [-/* italic] can be combined",
        "[Page link] or [External link https://scrapbox.io]",
        "code:test.py",
        "　for i in range(5):",
        "     print('[* Ignore inside code blocks]')",
        "",
        "`[- code]` is ignored",
        "",
        "table:Tabular format",
        " aaa\tbbb\tccc",
        " Ah ah\t\tUuu",
        "\t111\t222\t333",
        ""
      ]
    }
  ]
}

Using the tool I created, a page is converted to a Markdown file like the following. (Note: to make it display properly on Qiita, I later added a full-width space after the code-block fences and at the end of one table line.)

`The first line is the heading.md`


# The first line is the heading
### Two asterisks are headline style
- Indented bulleted list
  - Increase the number to further indent
- **Bold** or _italic_, ~~strikethrough~~ can be used
  - Like this, _~~**italic**~~_ can be combined
[Page link]() or [External link](https://scrapbox.io)
code:test.py
```　
　for i in range(5):
     print('[* Ignore inside code blocks]')
```　

`[- code]` is ignored

table:Tabular format
|aaa|bbb|ccc|　
|-----|-----|-----|-----|
|Ah ah||Uuu|
|111|222|333|

When rendered, it looks like the image below. The line spacing is a little loose, but it is reasonably easy to read.

(Image: The first line is the heading.png)

Policy

Many members (myself included) are new to Scrapbox and do not seem to use very elaborate notation, so rather than aiming for a perfect conversion, I decided to convert only the notations we are likely to use. The conversion method is simple: use regular expressions to find the parts written in Scrapbox notation and replace them with the Markdown equivalent. Finally, the script is packaged as an exe so that people without Python installed can use it.

Environment

I used Windows 10 and Python 3.7.

Implementation

File reading

The JSON file name is received as the first command-line argument. This way, the tool can be used simply by dragging and dropping the JSON file onto the exe file. A folder for the Markdown output is also created.

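import json
import os
import sys

# The JSON file path arrives as the first argument (also when dropped onto the exe)
filename = sys.argv[1]
with open(filename, 'r', encoding='utf-8') as fr:
    sb = json.load(fr)
    # Create the Markdown output folder if it does not exist yet
    outdir = 'markdown/'
    if not os.path.exists(outdir):
        os.mkdir(outdir)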

Conversion

From here, each page is converted line by line. The notation being converted is shown in parentheses in each heading.

Heading (first line)

Scrapbox treats the first line of a page as its heading, so `# ` (a hash sign followed by a half-width space) is added to the beginning of the first line to make it a Markdown heading.

for p in sb['pages']:
    title = p['title']
    lines = p['lines']
    is_in_codeblock = False
    with open(f'{outdir}{title}.md', 'w', encoding='utf-8') as fw:
        for i, l in enumerate(lines):
            if i == 0:
                l = '# ' + l

Code block (`code:hoge.ext`)

In Scrapbox, a code block is started with `code:hoge.ext` and continues for as long as each line begins with whitespace. I don't want to convert anything inside a code block, so processing proceeds while tracking whether the current line is inside one. On entering and leaving a code block, the Markdown fence (three backticks) is added.

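# Code block processing
if l.startswith('code:'):
    # Entering a code block: keep the `code:` line and open a fence after it
    is_in_codeblock = True
    l += '\n```'
elif is_in_codeblock and not l.startswith(('\t', ' ', '　')):
    # First line without leading whitespace: close the fence before it
    is_in_codeblock = False
    fw.write('```\n')

# (omitted)

# Convert only lines outside code blocks
if not is_in_codeblock:
    l = convert(l)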
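
To make the state handling above easier to follow, here is a minimal self-contained sketch of the same idea. The function name `fence_code_blocks` and the sample lines are made up for this illustration and are not part of the tool. Unlike the snippet above, it returns a list instead of writing to a file, and it also closes a block that is still open at the end of the page, which is my own assumption about the desired behaviour.

```python
from typing import List

# Tab, half-width space, full-width space: a line starting with one of these
# is treated as a continuation of the current code block.
INDENT_CHARS = ('\t', ' ', '\u3000')
FENCE = '`' * 3  # the Markdown code fence


def fence_code_blocks(lines: List[str]) -> List[str]:
    """Hypothetical helper: wrap Scrapbox `code:` blocks in Markdown fences."""
    out = []
    in_code = False
    for line in lines:
        if line.startswith('code:'):
            if in_code:
                out.append(FENCE)  # close the previous block first
            in_code = True
            out.append(line)       # keep the `code:filename` line as plain text
            out.append(FENCE)      # open the fence
            continue
        if in_code and not line.startswith(INDENT_CHARS):
            in_code = False
            out.append(FENCE)      # close the fence before the unindented line
        out.append(line)
    if in_code:
        out.append(FENCE)          # the block ran to the end of the page
    return out


sample = ['code:test.py', ' for i in range(5):', '     print(i)', '', 'after']
print('\n'.join(fence_code_blocks(sample)))
```

The empty line in the sample does not start with whitespace, so it is what closes the block, just as in the exported page shown earlier.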

#### Table (`table:hoge`)

In Scrapbox, a table is written with `table:hoge` and continues for as long as each line begins with whitespace. Scrapbox tables have no header row, but Markdown cannot represent a table without one, so the first row is forcibly treated as the header. Cells are separated by tabs, so the tabs are converted to `|`. The whitespace at the beginning of a line may be a tab, a half-width space, or a full-width space, so that part is converted in a rather brute-force way.

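if l.startswith('table:'):
    is_in_table = True
elif is_in_table and not l.startswith(('\t', ' ', '　')):
    is_in_table = False
if is_in_table:
    # `row` is assumed to be reset to -1 outside tables (not shown here),
    # so the `table:` line is row 0 and the first data row is row 1
    row += 1
    if row != 0:
        # Tabs separate cells; a leading half-width space becomes the first '|'
        l = l.replace('\t', '|') + '|'
        if l.startswith(' '):
            l = l.replace(' ', '|', 1)
    if row == 1:
        # Treat the first row as the header and append the separator row
        col = l.count('|')
        l += f'\n{"|-----" * col}|'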
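
As an illustration of the table conversion, here is a small standalone sketch that takes only the rows under a `table:` line. The name `table_rows_to_markdown` is hypothetical, and the approach differs slightly from the snippet above: it drops the single leading indent character, splits on tabs, and sizes the separator row by the number of cells. Counting `|` characters instead appears to produce one extra separator column, as in the four-column separator of the sample output.

```python
from typing import List


def table_rows_to_markdown(rows: List[str]) -> List[str]:
    """Hypothetical helper: convert the indented rows of a Scrapbox table."""
    out = []
    for i, row in enumerate(rows):
        # Each row starts with one indent character (tab or space);
        # the cells after it are separated by tabs.
        cells = [c.strip() for c in row[1:].split('\t')]
        out.append('|' + '|'.join(cells) + '|')
        if i == 0:
            # Markdown needs a header, so the first row is treated as one.
            out.append('|' + '|'.join(['-----'] * len(cells)) + '|')
    return out


rows = [' aaa\tbbb\tccc', ' Ah ah\t\tUuu', '\t111\t222\t333']
print('\n'.join(table_rows_to_markdown(rows)))
```

An empty cell, such as the middle one in the second row, simply becomes `||`.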

#### Code (`hoge`)

I don't want to convert anything inside inline code, so before each notation is converted, the code spans are removed from the text that is searched. Inline code is written the same way as in Markdown, so it needs no conversion of its own; removing it from the search is enough.

import re


def ignore_code(l: str) -> str:
    # Return the line with every inline-code span (`...`) removed
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l
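
A quick way to see what this buys us: the sketch below runs a decoration-style search on a line before and after removing inline code. It uses a one-line `re.sub` version of `ignore_code`, which should behave the same for typical lines; the sample line is made up.

```python
import re


def ignore_code(l: str) -> str:
    # Same idea as the loop version, written as a single re.sub call
    return re.sub(r'`.+?`', '', l)


line = '`[- code]` is ignored, [- this] is not'

# Searching the raw line also finds the notation inside the backticks
print(re.findall(r'\[- (.+?)\]', line))

# Searching the stripped line finds only the real decoration
print(re.findall(r'\[- (.+?)\]', ignore_code(line)))
```

The first search returns both `code` and `this`, while the second returns only `this`.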

#### Hashtag (#hoge)

If a hashtag is written at the beginning of a line, Markdown may interpret it as a heading (the result seems to differ depending on the viewer). For that reason, it is wrapped in backticks so that it is treated as code.

def escape_hash_tag(l: str) -> str:
    for m in re.finditer(r'#(.+?)[ \t]', ignore_code(l)):
        l = l.replace(m.group(0), '`' + m.group(0) + '`')
    if l.startswith('#'):  # The whole line is a tag
        l = '`' + l + '`'
    return l
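
One detail of the function above is that the matched text includes the whitespace after the tag, so that whitespace ends up inside the backticks. The following is a variant sketch, not the tool's code, that wraps only the tag itself. The name `escape_hash_tags` is hypothetical, and it assumes a tag is `#` followed by non-whitespace characters, which means a `#` inside a URL would be wrapped as well.

```python
import re


def escape_hash_tags(line: str) -> str:
    """Hypothetical variant: wrap each #tag in backticks, skipping inline code."""
    # re.split with a capturing group keeps the inline-code spans:
    # even indices are ordinary text, odd indices are `code` spans.
    parts = re.split(r'(`.+?`)', line)
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r'#\S+', lambda m: '`' + m.group(0) + '`', parts[i])
    return ''.join(parts)


print(escape_hash_tags('#memo'))
print(escape_hash_tags('see #memo later'))
print(escape_hash_tags('`#not-a-tag` #tag'))
```

Splitting on the code spans, rather than deleting them and replacing in the original string, also avoids touching text inside backticks that happens to look like a tag.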

#### Bulleted list (indent)

The indent depth is counted and the indent is replaced with the Markdown list format.

def convert_list(l: str) -> str:
    m = re.match(r'[ \t　]+', l)
    if m:
        l = l.replace(m.group(0),
                      (len(m.group(0)) - 1) * '  ' + '- ', 1)
    return l
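
To check the indent arithmetic, here is the same function run on a few made-up lines. The only change is that the full-width space is written as `\u3000` in the pattern so that it is visible.

```python
import re


def convert_list(l: str) -> str:
    # Leading whitespace: half-width space, tab, or full-width space
    m = re.match(r'[ \t\u3000]+', l)
    if m:
        # One indent character -> top-level item; each extra one adds two spaces
        l = l.replace(m.group(0), (len(m.group(0)) - 1) * '  ' + '- ', 1)
    return l


for line in ['no indent', ' one level', ' \ttwo levels']:
    print(repr(convert_list(line)))
```

A line with one leading whitespace character becomes a top-level `- ` item, and each additional character nests it by two more spaces.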

#### Bold ([[hoge]], [** hoge], [*** hoge])

In Scrapbox, writing `[[hoge]]` or `[* hoge]` makes text bold. In the latter notation, adding asterisks, as in `[** hoge]`, also makes the characters larger.

We had been using the two- and three-asterisk forms like Markdown headings, so they are converted to headings. The other forms may be combined with other decorations, so they are converted separately.

def convert_bold(l: str) -> str:
    for m in re.finditer(r'\[\[(.+?)\]\]', ignore_code(l)):
        l = l.replace(m.group(0), '**' + m.group(1) + '**')
    m = re.match(r'\[(\*\*|\*\*\*) (.+?)\]', ignore_code(l))  # Probably a heading
    if m:
        # In Scrapbox, more * means larger text: ** -> ###, *** -> ##
        l = '#' * (5 - len(m.group(1))) + ' ' + m.group(2)
    return l
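
The heading level comes from `5 - len(stars)`, which is easy to get backwards, so here is a small self-contained check. It repeats `ignore_code` (in a one-line `re.sub` form) and `convert_bold` so that it runs on its own; the sample lines are made up.

```python
import re


def ignore_code(l: str) -> str:
    return re.sub(r'`.+?`', '', l)


def convert_bold(l: str) -> str:
    for m in re.finditer(r'\[\[(.+?)\]\]', ignore_code(l)):
        l = l.replace(m.group(0), '**' + m.group(1) + '**')
    m = re.match(r'\[(\*\*|\*\*\*) (.+?)\]', ignore_code(l))
    if m:
        # Two asterisks -> '###', three asterisks -> '##'
        l = '#' * (5 - len(m.group(1))) + ' ' + m.group(2)
    return l


print(convert_bold('[[Bold]] text'))         # **Bold** text
print(convert_bold('[** Medium heading]'))  # ### Medium heading
print(convert_bold('[*** Large heading]'))  # ## Large heading
print(convert_bold('[* single]'))           # [* single] (left for the decoration step)
```

Because `re.match` only looks at the start of the line, the heading form is recognised only when the bracket opens the line.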

#### Character decoration ([* hoge], [/ hoge], [- hoge], [-/* hoge], etc.)

Besides bold, Scrapbox also supports italics (`[/ hoge]`) and strikethrough (`[- hoge]`). These can be combined, as in `[-/* hoge]`, so they are all processed together.

def convert_decoration(l: str) -> str:
    for m in re.finditer(r'\[([-\*/]+) (.+?)\]', ignore_code(l)):
        deco_s, deco_e = ' ', ' '
        # Check only the marker part (group 1), so that a '/' or '-'
        # inside the decorated text does not add decoration
        if '/' in m.group(1):
            deco_s += '_'
            deco_e = '_' + deco_e
        if '-' in m.group(1):
            deco_s += '~~'
            deco_e = '~~' + deco_e
        if '*' in m.group(1):
            deco_s += '**'
            deco_e = '**' + deco_e
        l = l.replace(m.group(0), deco_s + m.group(2) + deco_e)
    return l

(The syntax highlighting of this block looks odd, but I couldn't fix it.)
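
To show how the opening and closing markers nest when decorations are combined, here is a standalone sketch with a few made-up inputs. It repeats `ignore_code` in a one-line `re.sub` form, and it looks for the markers only in `m.group(1)`, the part before the space, which is my assumption about the intended behaviour.

```python
import re


def ignore_code(l: str) -> str:
    return re.sub(r'`.+?`', '', l)


def convert_decoration(l: str) -> str:
    for m in re.finditer(r'\[([-\*/]+) (.+?)\]', ignore_code(l)):
        deco_s, deco_e = ' ', ' '
        marks = m.group(1)  # only the markers, e.g. '-/*'
        if '/' in marks:
            deco_s += '_'
            deco_e = '_' + deco_e
        if '-' in marks:
            deco_s += '~~'
            deco_e = '~~' + deco_e
        if '*' in marks:
            deco_s += '**'
            deco_e = '**' + deco_e
        l = l.replace(m.group(0), deco_s + m.group(2) + deco_e)
    return l


for line in ['a[/ b]c', '[-/* x]', '[* a/b]']:
    print(repr(convert_decoration(line)))
```

The closing string is built by prepending, so the markers close in the reverse order in which they were opened. A space is added on both sides because Markdown emphasis is not always recognised when it touches the neighbouring characters.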

#### Link ([URL title], [title URL], [hoge])

In Scrapbox, an external link is written as `[URL title]` or `[title URL]`. Without trying to be exact, I decided to treat anything starting with `http` as a URL. A form like `[hoge]` is a link to another Scrapbox page. That link cannot work after the Markdown output, but appending `()` at least makes it look like a link.

def convert_link(l: str) -> str:
    for m in re.finditer(r'\[(.+?)\]', ignore_code(l)):
        tmp = m.group(1).split(' ')
        if len(tmp) == 2:
            if tmp[0].startswith('http'):
                link, title = tmp
            else:
                title, link = tmp
            l = l.replace(m.group(0), f'[{title}]({link})')
        else:
            l = l.replace(m.group(0), m.group(0) + '()')
    return l
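
The function above splits the bracket contents on spaces and expects exactly two parts, so a title that contains a space is not recognised as an external link, and a two-word page title is treated as a title plus a link. The following is a variant sketch, not the tool's code, that checks only whether the first or the last word starts with `http`. The names `convert_links` and `_link_repl` are hypothetical.

```python
import re


def _link_repl(m: re.Match) -> str:
    words = m.group(1).split(' ')
    if len(words) >= 2 and words[0].startswith('http'):
        link, title = words[0], ' '.join(words[1:])    # [URL title]
    elif len(words) >= 2 and words[-1].startswith('http'):
        link, title = words[-1], ' '.join(words[:-1])  # [title URL]
    else:
        return m.group(0) + '()'  # page link: only make it look like a link
    return f'[{title}]({link})'


def convert_links(line: str) -> str:
    """Hypothetical variant of convert_link that allows spaces in titles."""
    # Even indices are ordinary text, odd indices are inline-code spans
    parts = re.split(r'(`.+?`)', line)
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r'\[(.+?)\]', _link_repl, parts[i])
    return ''.join(parts)


print(convert_links('[Page link] or [External link https://scrapbox.io]'))
print(convert_links('[https://scrapbox.io Scrapbox]'))
```

As with the original, "starts with `http`" is only a rough test for a URL.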

### Converting to an exe

Finally, pyinstaller is used to build the exe. The options produce a single exe file (`-F`) that runs without a console window (`-w`).

pip install pyinstaller
pyinstaller sb2md.py -wF

You can run the program by dragging and dropping the JSON file onto the exe file.

## In closing

The code created for this article is available on GitHub. For writing a small piece of processing like this, I still find Python very handy.

I only started using Scrapbox the other day and am not yet fluent in it, so I plan to update the tool as other notations come into use.