~~I followed a tutorial on the net and it didn't work, so I am posting this as a reminder.~~ [2019/11/22 postscript] Embarrassingly, this case turned out to be caused by my own misunderstanding, and the problem I describe at the beginning never actually occurred. I am leaving the article up, partly as a lesson to myself, because the behavior of ".//" may still be of some help.
I had an opportunity to handle XML (character encoding: EUC-JP) in Python 3, and this is what I ran into while working with it.
test.xml (the file I actually handled had more tags than this, plus many attribute values, etc.)
<root><tag>
<hoge>
<hogehoge>aaa</hogehoge>
</hoge>
<fuga>bbb</fuga>
<fugo>ccc</fugo>
・
(Omitted)
・
</tag>
<tag2>
・
(Omitted)
・
</tag2>
</root>
At first I accessed the elements by index, as follows.
test.py
import xml.etree.ElementTree as Et


def test():
    # ElementTree.parse() seems to support only Unicode input,
    # so open the file with the right encoding, read() it into a string,
    # and parse that string instead.
    with open(r'HogePath/FugaPath/test.xml', 'r', encoding='euc_jp') as f:
        root: Et.Element = Et.fromstring(f.read())

    # Get the text "aaa" of <hogehoge> under <hoge>:
    # root[0] is <tag>, root[0][0] is <hoge>, root[0][0][0] is <hogehoge>.
    print(root[0][0][0].text)


if __name__ == '__main__':
    test()
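To make it easier to see what each index in `root[0][0][0]` refers to, here is a minimal sketch that walks the same chain one level at a time. It uses an inline string as a stand-in for test.xml (the elided parts are simply left out, and the file path is not used), so the sample data is an assumption rather than the real file.

```python
import xml.etree.ElementTree as ET

# Stand-in for test.xml; only the elements shown in the article are included.
xml_text = (
    "<root><tag><hoge><hogehoge>aaa</hogehoge></hoge>"
    "<fuga>bbb</fuga><fugo>ccc</fugo></tag><tag2/></root>"
)

# Simulate an EUC-JP file: encode to bytes, then decode before parsing,
# which mirrors open(..., encoding='euc_jp') followed by read().
raw = xml_text.encode("euc_jp")
root = ET.fromstring(raw.decode("euc_jp"))

# Follow the index chain and record the tag name at each level.
node = root
chain = [node.tag]
for _ in range(3):
    node = node[0]
    chain.append(node.tag)

print(" > ".join(chain))
print(node.text)
```

The first index already lands on the intermediate "tag" element, which is easy to forget when writing a path by hand.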
As the XML became more complicated, I wanted to retrieve elements with XPath instead, so, referring to information on the net, I wrote the following.
(I eventually realized that I had been wrong from the start; noted here as a lesson to myself.)
print(root.findall('./hoge/hogehoge')[0].text)
That did not work, and I got "IndexError: list index out of range". In fact, the result of the following call is itself an empty list... ~~In other words, it seems the element is not being retrieved correctly.~~ **← This is wrong!!**
print(root.findall('./hoge/hogehoge'))
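For reference, a minimal sketch of why that list comes back empty, using an inline stand-in for test.xml (the sample string is an assumption; the elided parts of the real file are left out). A path starting with "./" only looks at direct children of the element it is called on, so every intermediate element has to be named.

```python
import xml.etree.ElementTree as ET

# Stand-in for test.xml; only the elements shown in the article are included.
root = ET.fromstring(
    "<root><tag><hoge><hogehoge>aaa</hogehoge></hoge>"
    "<fuga>bbb</fuga><fugo>ccc</fugo></tag><tag2/></root>"
)

# "./hoge/hogehoge" looks for <hoge> directly under <root>; there is none.
missing = root.findall("./hoge/hogehoge")
print(missing)  # []

# Naming the intermediate <tag> element makes the path match.
found = root.findall("./tag/hoge/hogehoge")
print([e.text for e in found])  # ['aaa']
```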
**With an example this simple it is easy to spot, but at the time I had overlooked the "tag" element. What follows is the half-forced workaround I arrived at because of that oversight.**
There seems to be a problem with how the path is specified near the XPath root
print(root.findall('.//hoge/hogehoge')[0].text)
Changing the beginning of the path to ".//" as above solved it. **← It only looks solved because it was forced through.** ~~I wonder if I should just remember it as working the same way as a URL or the like... I have not worked out the underlying principle, so I would be happy if anyone could explain it.~~
[Updated on November 22, 2019] According to what @LOZTPX taught me in the comments:
// represents the set of all descendants of the starting node, omitting the node path.
I see! So even though I had overlooked the "tag" element, ".//" was picking up the target without my being aware of it... This was a learning experience.
Lesson: look properly at the contents of the XML file you are handling.
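A small sketch of that difference, under the assumption that "tag2" also contains a hoge/hogehoge pair (the real contents of "tag2" are omitted in the article, so that part is purely hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the <hoge> under <tag2> is invented for illustration.
root = ET.fromstring(
    "<root><tag><hoge><hogehoge>aaa</hogehoge></hoge></tag>"
    "<tag2><hoge><hogehoge>zzz</hogehoge></hoge></tag2></root>"
)

# Explicit path: only the <hoge> that sits under <tag>.
explicit = [e.text for e in root.findall("./tag/hoge/hogehoge")]

# ".//" searches all descendants, so every <hoge>/<hogehoge> pair matches,
# whichever parent it lives under.
anywhere = [e.text for e in root.findall(".//hoge/hogehoge")]

print(explicit)  # ['aaa']
print(anywhere)  # ['aaa', 'zzz']
```

So ".//" is convenient, but it can also return more than intended when the same tag names appear in several places; spelling out the full path is the safer choice when the structure is known.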