[Python3 / ElementTree] If an XPath lookup does not find your element, check the XML hierarchy carefully (note to self)

~~I followed tutorials on the net and they didn't work, so I am posting this as a memo.~~ [Postscript 2019/11/22] Embarrassingly, this case turned out to be my own misunderstanding, and the problem I describe at the start never actually occurred. I am leaving the article up anyway, partly as a note to self and partly because the behavior of ".//" may be of some use.

What happened

I had to handle XML (encoded in EUC-JP) in Python 3.

test.xml (the file I actually handled had more tags than this, plus many attribute values and so on)


I accessed the elements as follows.


import xml.etree.ElementTree as Et

def test():
    # Et.parse() seems to support only Unicode input,
    # so open the file with an explicit encoding and parse the string returned by read()
    with open(r'HogePath/FugaPath/test.xml', 'r', encoding='euc_jp') as f:
        root: Et.Element = Et.fromstring(f.read())

    # Get the text "aaa" of the hoge element

if __name__ == '__main__':
    test()
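The decode-then-parse step can be tried without a file on disk. The following is only a minimal sketch: the XML content, the `<root>`/`<hoge>` tag names, and the text are made up for illustration, since the real test.xml is not reproduced here.

```python
import xml.etree.ElementTree as Et

# Hypothetical document (not the article's real test.xml).
xml_text = '<?xml version="1.0" encoding="EUC-JP"?><root><hoge>あああ</hoge></root>'

# Simulate the bytes as they would be stored in an EUC-JP file.
raw_bytes = xml_text.encode('euc_jp')

# Decode to str first, then hand the decoded text to fromstring().
root = Et.fromstring(raw_bytes.decode('euc_jp'))
hoge_text = root.find('hoge').text
print(hoge_text)
```

This mirrors what `open(..., encoding='euc_jp')` followed by `f.read()` does in the function above.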

As the XML grew more complicated, I wanted to retrieve elements with XPath, so I consulted information on the net.

I finally realized that I had been wrong from the start (note to self)


I wrote the XPath that way, but it didn't work and I got "IndexError: list index out of range". To begin with, the lookup result itself was empty ... ~~In other words, it seems the element was not being retrieved correctly.~~ **← This is wrong!!**
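The symptom is easy to reproduce when the path skips a level of the hierarchy. This is a sketch under assumptions: the XML below, including the intermediate `<data>` element, is hypothetical and stands in for whichever tag was overlooked in the real file.

```python
import xml.etree.ElementTree as Et

# Hypothetical XML: hoge is NOT a direct child of root, it sits under <data>.
SAMPLE = "<root><data><hoge>aaa</hoge></data></root>"
root = Et.fromstring(SAMPLE)

# './hoge' only looks at direct children of root, so nothing matches.
direct = root.findall('./hoge')
print(direct)  # an empty list

error_message = None
try:
    text = direct[0].text  # indexing an empty list
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

The empty result is not a parser failure. The path simply does not describe where the element actually lives.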


**The example here is so simple that the mistake is obvious, but at the time I had overlooked the "tag" in between. What follows is the workaround that half-forced its way past that oversight.**

(Wrong) solution (though it does work)

There seemed to be a problem with how the root part of the XPath was specified


Changing the beginning of the path to ".//" solved it. **← In reality this only forced it to work** ~~I wonder if I should just remember it as working like a URL ... I haven't traced the underlying principle, so I'd be happy if someone could explain it.~~

[Updated 2019/11/22] According to what @LOZTPX explained in the comments:

// represents the set of all descendants of the starting node, omitting the node path.
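To see the difference in ElementTree, here is a small sketch comparing an explicit path with ".//". The XML and the tag names `<data>` and `<other>` are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as Et

# Hypothetical XML with hoge elements in two different branches.
SAMPLE = (
    "<root>"
    "<data><hoge>aaa</hoge></data>"
    "<other><hoge>bbb</hoge></other>"
    "</root>"
)
root = Et.fromstring(SAMPLE)

# Explicit path: every intermediate tag is spelled out.
explicit = [e.text for e in root.findall('./data/hoge')]

# './/hoge' searches all descendants, whatever lies in between.
anywhere = [e.text for e in root.findall('.//hoge')]

print(explicit)
print(anywhere)
```

Because ".//" ignores the intermediate levels, it also matches same-named elements in branches you may not have intended, so the explicit path is the safer choice once the hierarchy is known.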

I see! So even though I had overlooked the "tag" above, ".//" was picking the element up without my being aware of that level ... I learned something.


Look carefully at the contents of the XML file you are handling
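One way to check the hierarchy before writing an XPath is to list the path of every element. The helper below is a sketch. `tag_paths` is a name I made up, not part of ElementTree, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as Et

def tag_paths(element, prefix=''):
    """Return the slash-separated tag path of element and all its descendants."""
    path = f'{prefix}/{element.tag}'
    paths = [path]
    for child in element:  # iterating an Element yields its direct children
        paths.extend(tag_paths(child, path))
    return paths

# Hypothetical XML standing in for the real file.
root = Et.fromstring("<root><data><hoge>aaa</hoge></data></root>")
paths = tag_paths(root)
for p in paths:
    print(p)
```

Any intermediate tag that the XPath forgot shows up immediately in this listing.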
