When the OCR reading result (field_value) is "Aiue Bank Kakikuke Branch Ordinary 1234567", what should you do if you want to output the bank name and the branch name as separate items?
Split can be used in such cases.
First, if field_value is "Aiue Bank Kakikuke Branch Ordinary 1234567", the process to retrieve only the bank name is as follows.
Use split to retrieve bank name
# Take out the bank name
field_value = field_value.split("Bank")[0]
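This extracts "Aiue". (Strictly speaking, the result is "Aiue " with a trailing space, because in this English text a space comes before "Bank"; append .strip() if you need to remove it.)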
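Here is a minimal runnable sketch of the step above. It assumes the OCR result is already available as a plain Python string; the variable name bank_name is hypothetical and is used so that field_value stays intact for the branch-name step.

```python
# Sample OCR result (assumed; in practice field_value comes from your OCR tool)
field_value = "Aiue Bank Kakikuke Branch Ordinary 1234567"

# Take out the bank name: everything before "Bank"
bank_name = field_value.split("Bank")[0]
print(repr(bank_name))  # 'Aiue '

# The English text keeps the space before "Bank"; strip() removes it
bank_name = bank_name.strip()
print(bank_name)  # Aiue
```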
Likewise, when field_value is "Aiue Bank Kakikuke Branch Ordinary 1234567", the process for extracting only the branch name is as follows.
Use split to retrieve branch name
# Extract the branch name of the bank
field_value = field_value.split("Bank")[1]
field_value = field_value.split("Branch")[0]
This extracts "Kakikuke". (Again, in this English text the result is surrounded by spaces, which .strip() removes.)
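A minimal runnable sketch of the two-step branch extraction, again assuming the sample string below. The intermediate names after_bank and branch_name are hypothetical; they just make each step visible instead of overwriting field_value.

```python
field_value = "Aiue Bank Kakikuke Branch Ordinary 1234567"

# Step 1: keep everything after "Bank" (second element, index 1)
after_bank = field_value.split("Bank")[1]
print(repr(after_bank))  # ' Kakikuke Branch Ordinary 1234567'

# Step 2: keep everything before "Branch" (first element, index 0)
branch_name = after_bank.split("Branch")[0]
print(repr(branch_name))  # ' Kakikuke '

# Remove the surrounding spaces left over from the English text
branch_name = branch_name.strip()
print(branch_name)  # Kakikuke
```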
Below is an explanation of why the processing above retrieves the bank name and the branch name.
If you already understand it, feel free to skip this part.
split is the process of "dividing a string into a list at a delimiter".
You might be thinking, "What is a list?!" For now, it is fine to think of it roughly as "several elements lined up in a row". (For details, search for "Python list" or similar.)
split is written in the form target.split(delimiter).
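To see the target.split(delimiter) form on its own, here is a small sketch with an unrelated sample string (the fruit names are made up for illustration). It also shows what happens when the delimiter does not appear: you get a one-element list containing the whole string.

```python
text = "apple,banana,cherry"

parts = text.split(",")
print(parts)        # ['apple', 'banana', 'cherry']
print(type(parts))  # <class 'list'>

# If the delimiter is not found, the whole string comes back as a single element
print("apple".split(","))  # ['apple']
```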
The expression field_value.split("Bank") means "split field_value into a list, using the text "Bank" as the separator".
This produces the following list:
["Aiue ", " Kakikuke Branch Ordinary 1234567"]
It is a list with two elements: the first element is "Aiue " and the second element is " Kakikuke Branch Ordinary 1234567". The delimiter "Bank" itself is not included in either element.
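The list described above can be checked directly with the sketch below (sample string assumed). Note that split is case-sensitive: splitting on lowercase "bank" finds no match, so the whole string comes back as one element.

```python
field_value = "Aiue Bank Kakikuke Branch Ordinary 1234567"

parts = field_value.split("Bank")
print(parts)       # ['Aiue ', ' Kakikuke Branch Ordinary 1234567']
print(len(parts))  # 2

# Case matters: "bank" does not match "Bank"
print(field_value.split("bank"))  # ['Aiue Bank Kakikuke Branch Ordinary 1234567']
```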
With the process above, the string has finally been turned into a "list". Now it is time to retrieve elements from that list.
You do this with list[index].
Roughly speaking, the "index" is a number **that indicates the position of an element in the list**, but note the small caveat that it **starts from zero**.
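A short sketch of zero-based indexing, using the two-element list from above. repr() is used only so the leading and trailing spaces are visible. Asking for an index past the end raises IndexError.

```python
parts = ["Aiue ", " Kakikuke Branch Ordinary 1234567"]

print(repr(parts[0]))  # 'Aiue '  (first element, index 0)
print(repr(parts[1]))  # ' Kakikuke Branch Ordinary 1234567'  (second element, index 1)

# There is no third element, so index 2 is out of range
try:
    parts[2]
except IndexError as e:
    print("IndexError:", e)  # IndexError: list index out of range
```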
With that in mind, let's look at the bank-name extraction again.
Use split to retrieve bank name
# Take out the bank name
field_value = field_value.split("Bank")[0]
This means: "Split the contents of field_value into a list, using the text "Bank" as the separator. Then take the first element of that list (= index zero!) and store it back in field_value."
Each part corresponds to a piece of the code:
"Split the contents of field_value into a list, using "Bank" as the separator" is field_value.split("Bank"),
"take the first element of that list (= index zero!)" is [0], and
"store it in field_value" is field_value =.
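To make that correspondence concrete, here is the same one-liner unrolled into separate steps. The names parts and first are hypothetical and exist only to label each piece; the final value is identical to the one-liner's.

```python
field_value = "Aiue Bank Kakikuke Branch Ordinary 1234567"

parts = field_value.split("Bank")  # field_value.split("Bank")
first = parts[0]                   # [0]
field_value = first                # field_value =

print(repr(field_value))  # 'Aiue '
```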
Once you have this down, the branch-name processing should make sense right away.
Use split to retrieve branch name
# Extract the branch name of the bank
field_value = field_value.split("Bank")[1]
field_value = field_value.split("Branch")[0]
The first line means: "Split the contents of field_value into a list, using the text "Bank" as the separator. Then take the second element of the list (= index 1!) and store it in field_value."
So at this point, field_value contains " Kakikuke Branch Ordinary 1234567".
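If you want to confirm the intermediate value yourself, a quick sketch is to print it after the first line (sample string assumed). Printing with repr() like this is a handy way to check what a split step actually produced, spaces included.

```python
field_value = "Aiue Bank Kakikuke Branch Ordinary 1234567"

# First line of the branch-name process: keep the part after "Bank"
field_value = field_value.split("Bank")[1]
print(repr(field_value))  # ' Kakikuke Branch Ordinary 1234567'
```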
The second line then does the following to that field_value:
"Split the contents of field_value into a list, using the text "Branch" as the separator. Then take the first element of the list (= index zero!) and store it in field_value."
Splitting field_value with "Branch" as the separator produces the list [" Kakikuke ", " Ordinary 1234567"], so by taking its first element we were able to extract the branch name "Kakikuke"!
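Putting both steps together, one possible way to package this is a small helper function. Everything here is a sketch under assumptions: the name split_bank_and_branch is hypothetical, it assumes "Bank" appears before "Branch" in the text, and it raises ValueError when a delimiter is missing rather than returning the wrong piece. It also passes maxsplit=1 to split so that only the first occurrence of each delimiter is used.

```python
from typing import Tuple


def split_bank_and_branch(field_value: str) -> Tuple[str, str]:
    """Hypothetical helper: return (bank_name, branch_name) from OCR text.

    Assumes the text contains "Bank" followed later by "Branch".
    """
    if "Bank" not in field_value:
        raise ValueError('"Bank" not found in OCR text')
    # maxsplit=1: split only at the first "Bank"
    bank_part, rest = field_value.split("Bank", 1)
    if "Branch" not in rest:
        raise ValueError('"Branch" not found after "Bank"')
    branch_part = rest.split("Branch", 1)[0]
    # strip() removes the spaces left around each piece
    return bank_part.strip(), branch_part.strip()


print(split_bank_and_branch("Aiue Bank Kakikuke Branch Ordinary 1234567"))
# ('Aiue', 'Kakikuke')
```

Raising an error is a design choice: with plain split, missing text would silently give you a wrong value or an IndexError further down.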