Last time, I introduced the most commonly used ways of writing XPath. This time, I will introduce the functions that are often used in XPath to specify data more precisely.
contains() is typically used for a fuzzy (partial-match) search for a string contained in an attribute value or in text.
- **contains(@class, "XXX"): Specify elements whose attribute value contains a specific string**
For example, if you want to get every element whose class attribute contains Red from this HTML, write the following.
//span[contains(@class,"Red")]
In other words, this XPath gets every **span element whose class contains Red**.
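As a rough illustration of the class match, here is a minimal Python sketch. It assumes the third-party lxml library is installed, and the markup is a made-up sample, since the article's original HTML is not reproduced here.

```python
from lxml import html

# Hypothetical markup standing in for the article's HTML sample.
doc = html.fromstring("""
<div>
  <span class="itemRed">Apple</span>
  <span class="itemBlue">Sky</span>
  <span class="Red-label">Rose</span>
</div>
""")

# contains() is a case-sensitive substring test on the attribute value.
red_spans = doc.xpath('//span[contains(@class, "Red")]')
red_texts = [span.text for span in red_spans]
print(red_texts)
```

Because contains() is a plain substring test, it matches both "itemRed" and "Red-label" but not "itemBlue".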
- **contains(text(), "XXX"): Specify elements whose text contains a specific string**
For example, if you want to specify the element containing the string "Rowling" from this HTML, write the following.
//span[contains(text(),"Rowling")]
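The text match can be tried the same way. This is a sketch that assumes the third-party lxml library and uses invented markup. It also shows one XPath 1.0 detail worth knowing: contains(text(), ...) only tests the first text node of an element, so contains(., ...) may be the safer choice when the text is split by child tags.

```python
from lxml import html

# Hypothetical markup; the third span has its text split by a <b> tag.
doc = html.fromstring("""
<ul>
  <li><span>J. K. Rowling</span></li>
  <li><span>Haruki Murakami</span></li>
  <li><span>Written by <b>J. K.</b> Rowling</span></li>
</ul>
""")

# text() yields a node-set; contains() converts only its first node to a string.
first_text_only = [s.text_content()
                   for s in doc.xpath('//span[contains(text(), "Rowling")]')]

# "." is the string value of the whole element, including nested tags.
whole_string = [s.text_content()
                for s in doc.xpath('//span[contains(., "Rowling")]')]

print(first_text_only)
print(whole_string)
```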
**Tips!** When specifying the next-page button, **contains(text(), "next")** is often used. For details, see ➡ How to write an XPath that specifies a page forward button
In the previous article, I explained that you can get an element by its order by enclosing a number in [] (square brackets). You can also specify the Nth element with position().
For example, in the above HTML, "Product 3" is the 4th th element, so write the following.
//tbody/th[4]
Using position()=, write the following.
//tbody/th[position()=4]
To get every element other than "advertisement", note that "advertisement" is the first th element, and write the following.
//tbody/th[position()>1]
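The three position expressions above can be compared side by side. This is a sketch assuming the third-party lxml library. The table is a made-up stand-in that places th directly under tbody to mirror the article's paths, and it is parsed as XML so the parser leaves that structure untouched.

```python
from lxml import etree

# Hypothetical table; "Advertisement" is the first th, "Product 3" the fourth.
doc = etree.fromstring(
    "<table><tbody>"
    "<th>Advertisement</th><th>Product 1</th>"
    "<th>Product 2</th><th>Product 3</th>"
    "</tbody></table>"
)

def th_texts(xpath):
    """Return the text of every th element matched by the XPath."""
    return [th.text for th in doc.xpath(xpath)]

by_index = th_texts("//tbody/th[4]")                  # shorthand form
by_position = th_texts("//tbody/th[position()=4]")    # explicit form
after_first = th_texts("//tbody/th[position()>1]")    # skip the first th

print(by_index, by_position, after_first)
```

The bracketed number is shorthand for the position()= form, so the first two queries return the same element.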
If you want to specify an element by combining multiple conditions, use the and / or operators and the not() function.
- **and: Specify elements that match all of the conditions**
If you want to get the links whose href includes both "S_20" and "pdf" from this HTML, write the following.
//a[contains(@href,"S_20") and contains(@href,"pdf")]
- **not: Specify elements that do not match a specific condition**
If you want to get every link whose href is something other than https://helpcenter.octoparse.jp/hc/ja/xpath/S_10.html from this HTML, write the following.
//a[not(contains(@href, "S_10"))]
- **or: Specify elements that match any of the conditions**
If you want to get the links whose href contains M or L from this HTML, write the following.
//a[contains(@href,"M_") or contains(@href,"L_")]
Also, if you want to get the links whose href contains neither M nor L, combine not and or and write the following.
//a[not(contains(@href,"M_") or contains(@href,"L_"))]
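To see how and, not, and or behave on the same set of links, here is a minimal sketch. It assumes the third-party lxml library, and the href values are invented for illustration rather than taken from the article's HTML.

```python
from lxml import html

# Hypothetical links; only the S_/M_/L_ naming pattern follows the article.
doc = html.fromstring("""
<div>
  <a href="/files/S_10.html">S 10 page</a>
  <a href="/files/S_20.pdf">S 20 pdf</a>
  <a href="/files/S_20.html">S 20 page</a>
  <a href="/files/M_10.html">M 10 page</a>
  <a href="/files/L_10.html">L 10 page</a>
</div>
""")

def hrefs(xpath):
    """Return the href of every a element matched by the XPath."""
    return [a.get("href") for a in doc.xpath(xpath)]

both = hrefs('//a[contains(@href, "S_20") and contains(@href, "pdf")]')
not_s10 = hrefs('//a[not(contains(@href, "S_10"))]')
m_or_l = hrefs('//a[contains(@href, "M_") or contains(@href, "L_")]')
neither = hrefs('//a[not(contains(@href, "M_") or contains(@href, "L_"))]')

for name, result in [("and", both), ("not", not_s10),
                     ("or", m_or_l), ("not + or", neither)]:
    print(name, result)
```

Note that straight double quotes are used around the strings. Curly quotes copied from a word processor are not valid string delimiters in XPath.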
Those are the functions often used in XPath. If you want to learn more XPath syntax and functions, please see this article.
Original article: https://helpcenter.octoparse.jp/hc/ja/articles/360012713639