pandas has the ability to retrieve data. Note that I stumbled on the method when I wanted to use a part of the character string as a search condition, such as when the ID was grouped by the first character of several digits. I used Jupyter to execute the code. ** Please install pandas using pip etc. ** **
Python3.5 pandas0.17 Jupyter notebook
1st digit: A ~ E 2nd digit: A ~ E 3rd digit: 0 ~ 9 4th digit: 0 ~ 9 5th digit: 0 ~ 9 6th digit: 0 ~ 9
Example: AD3489
It's not very clever code, but it generated 200 ID-like strings and names.
python
import random
id = []
text=['A','B','C','D','E']
name = []
namelistF = ['Thom','Walter','Alexis','Allan','Hubert','Amy','Anastasia','Lorna','Barbara']
namelistL=['Abernethy','Ackroyd','Collins','Guises','Haworth','Hayden','Kingsley','Leyland','McLennan']
for i in range(200):
id.append(text[random.randint(0,len(text))-1] + text[random.randint(0,len(text))-1] + str(random.randint(0,len(text))) + str(random.randint(0,9)) + str(random.randint(0,9) )+ str(random.randint(0,9)))
name.append(namelistF[random.randint(0,len(namelist)-1)] + ' ' + namelistL[random.randint(0,len(namelist)-1)])
In this way, 200 pieces of data were generated.
Obtained ID data
['AD4865', 'DE5898', 'ED1278', 'EA4809', 'CA1251', 'EC2877', 'EC0579', 'EC3476', 'EE1624', 'AD3489', 'AC1209', 'EE3963', 'BC4656', 'BE3524', 'BE1321', 'CA4381', 'BB4813', 'AC4885', 'EA2290', 'DE3520', 'EC3145', 'CE5665', 'BE0071', 'EA0096', 'DC3792', 'CC1623', 'EE3194', 'EE1599', 'EE3398', 'EE0891', 'AB2412', 'AB3631', 'EC3118', 'DE2280', 'DA2018', 'CE4929', 'DA0157', 'BE3865', 'EE5122', 'ED5983', 'CD3569', 'BA3981', 'DE3913', 'BC3383', 'EE1456', 'AB0498', 'DA3937', 'AE2070', 'CE5804', 'DE2952', 'EB5014', 'AB0785', 'EC5061', 'AE1615', 'AC5219', 'EB3453', 'BB4372', 'EC4597', 'EE3370', 'CC1251', 'AA2228', 'CE3570', 'AC2734', 'DC0885', 'AC1142', 'DE3956', 'EE4113', 'EA4665', 'EB5010', 'CE1369', 'EC4333', 'AC1651', 'CE2802', 'BE3500', 'BB5521', 'DC5230', 'BA3568', 'AD4049', 'EE0666', 'CB1121', 'AA1281', 'CC1060', 'AD3520', 'BC3767', 'CD1125', 'DA0091', 'DA1978', 'BD2344', 'EE1345', 'AE2442', 'BC2306', 'CD1526', 'DD0514', 'EE0192', 'AE0517', 'ED3991', 'EE4694', 'AA1882', 'AC3296', 'ED1368', 'BD2772', 'BC1574', 'AD4402', 'DA2235', 'EC0168', 'CD0686', 'AB1050', 'BB2185', 'EA0274', 'DE4259', 'CB1594', 'ED3869', 'EA4427', 'CE0328', 'DD1004', 'BC4501', 'EE4727', 'CC5595', 'ED2344', 'ED4679', 'EE4167', 'AD5802', 'CB4401', 'EB2113', 'ED1493', 'CD3285', 'AA2717', 'AA1299', 'DA0957', 'CB5816', 'EE3520', 'DB2187', 'EA4323', 'DC3989', 'AA0663', 'DE5448', 'EE1418', 'EE2066', 'AA1373', 'EE1518', 'EE1782', 'CB2087', 'CE0266', 'AC4397', 'EC1308', 'EE2743', 'AD3849', 'CC3094', 'EE2725', 'BD3893', 'EC4543', 'DE2661', 'BE2766', 'DC2414', 'ED1347', 'CB2313', 'AD0026', 'ED1356', 'BE3893', 'EE3442', 'EC1069', 'EE2767', 'CA1410', 'EC5553', 'EA2432', 'AB5636', 'ED1639', 'DB3176', 'EE5932', 'EC1606', 'CC4372', 'EA2838', 'DE2964', 'DD3804', 'BC5171', 'BD4210', 'EE3239', 'AA3637', 'DA4393', 'CB3981', 'CD4627', 'EE0674', 'BE3231', 'BE0014', 'EE4350', 'BD1956', 'CE3526', 'EB2453', 'AB3957', 'EE5343', 'AD1178', 'CC4550', 'DD0465', 'CE2580', 'CD1344', 'ED0247', 'ED4614', 'DE0976', 'CE5883', 'AA2922']
Obtained name data
['Lorna Leyland', 'Anastasia Guises', 'Hubert Guises', 'Allan Kingsley', 'Lorna Kingsley', 'Amy Hayden', 'Hubert Abernethy', 'Anastasia Guises', 'Thom Hayden', 'Walter Kingsley', 'Allan Ackroyd', 'Thom Kingsley', 'Anastasia Collins', 'Thom Ackroyd', 'Thom Collins', 'Lorna Haworth', 'Anastasia Hayden', 'Thom Leyland', 'Alexis Leyland', 'Anastasia McLennan', 'Walter Haworth', 'Amy Collins', 'Barbara Leyland', 'Anastasia Ackroyd', 'Amy Kingsley', 'Barbara Haworth', 'Allan Ackroyd', 'Barbara Guises', 'Thom Haworth', 'Thom Ackroyd', 'Allan Guises', 'Barbara McLennan', 'Barbara Guises', 'Hubert Abernethy', 'Anastasia Haworth', 'Walter Collins', 'Allan Abernethy', 'Hubert Ackroyd', 'Thom McLennan', 'Thom Abernethy', 'Allan McLennan', 'Hubert Kingsley', 'Allan Leyland', 'Allan Haworth', 'Alexis Collins', 'Thom Ackroyd', 'Anastasia Haworth', 'Alexis McLennan', 'Amy Haworth', 'Walter Guises', 'Barbara Hayden', 'Lorna Collins', 'Lorna Abernethy', 'Alexis Abernethy', 'Anastasia Ackroyd', 'Allan Kingsley', 'Thom McLennan', 'Thom Hayden', 'Walter Hayden', 'Lorna Leyland', 'Alexis Leyland', 'Barbara Leyland', 'Barbara Collins', 'Amy Leyland', 'Lorna Hayden', 'Lorna Leyland', 'Walter Kingsley', 'Allan McLennan', 'Anastasia Haworth', 'Allan Guises', 'Anastasia Abernethy', 'Walter Kingsley', 'Amy Abernethy', 'Allan Guises', 'Lorna Ackroyd', 'Alexis Guises', 'Thom Abernethy', 'Lorna McLennan', 'Allan Abernethy', 'Barbara Hayden', 'Amy Collins', 'Allan Kingsley', 'Thom Haworth', 'Thom Kingsley', 'Alexis Leyland', 'Amy Hayden', 'Hubert Hayden', 'Thom Ackroyd', 'Walter Abernethy', 'Barbara Guises', 'Walter Collins', 'Hubert Abernethy', 'Allan Ackroyd', 'Barbara Kingsley', 'Anastasia Hayden', 'Lorna Abernethy', 'Thom Hayden', 'Thom Leyland', 'Amy Leyland', 'Thom Guises', 'Alexis Hayden', 'Walter Guises', 'Anastasia Leyland', 'Thom Ackroyd', 'Allan Collins', 'Barbara Collins', 'Allan Hayden', 'Thom Collins', 'Barbara McLennan', 'Lorna Haworth', 'Walter Hayden', 'Barbara Guises', 'Alexis Collins', 'Lorna Hayden', 'Barbara Haworth', 'Thom Guises', 'Hubert Guises', 'Anastasia Haworth', 'Thom Ackroyd', 'Hubert Haworth', 'Hubert Abernethy', 'Anastasia Leyland', 'Amy Leyland', 'Walter Guises', 'Thom McLennan', 'Hubert Collins', 'Barbara McLennan', 'Anastasia Guises', 'Amy Ackroyd', 'Barbara McLennan', 'Anastasia Guises', 'Walter Hayden', 'Amy Kingsley', 'Lorna Hayden', 'Walter Guises', 'Barbara Guises', 'Thom Kingsley', 'Anastasia Abernethy', 'Amy Kingsley', 'Walter Kingsley', 'Walter McLennan', 'Alexis Ackroyd', 'Lorna Kingsley', 'Amy Collins', 'Anastasia Haworth', 'Amy Haworth', 'Lorna Kingsley', 'Allan Guises', 'Anastasia Guises', 'Amy Haworth', 'Barbara Guises', 'Lorna Leyland', 'Allan Hayden', 'Alexis Leyland', 'Lorna Ackroyd', 'Alexis Guises', 'Walter Collins', 'Lorna Abernethy', 'Hubert Collins', 'Hubert McLennan', 'Alexis Collins', 'Lorna Leyland', 'Barbara Hayden', 'Lorna Kingsley', 'Barbara Haworth', 'Thom Leyland', 'Lorna Ackroyd', 'Walter Guises', 'Allan Haworth', 'Hubert McLennan', 'Walter Haworth', 'Lorna Ackroyd', 'Allan Hayden', 'Lorna Abernethy', 'Allan Abernethy', 'Allan Ackroyd', 'Lorna Haworth', 'Barbara Hayden', 'Thom Haworth', 'Alexis Hayden', 'Barbara Leyland', 'Alexis Ackroyd', 'Walter Abernethy', 'Anastasia Abernethy', 'Thom Kingsley', 'Hubert Haworth', 'Amy Hayden', 'Hubert Guises', 'Walter Hayden', 'Allan Guises', 'Allan Ackroyd', 'Hubert Abernethy', 'Amy McLennan', 'Allan Abernethy', 'Walter Kingsley', 'Allan Haworth', 'Hubert Hayden', 'Lorna McLennan', 'Lorna Collins', 'Lorna Haworth']
Put this ID array and name array in a dictionary type called data.
data = {
'ID':
['AD4865', 'DE5898', 'ED1278', 'EA4809', 'CA1251', 'EC2877', 'EC0579', 'EC3476', 'EE1624', 'AD3489', 'AC1209', 'EE3963', 'BC4656', 'BE3524', 'BE1321', 'CA4381', 'BB4813', 'AC4885', 'EA2290', 'DE3520', 'EC3145', 'CE5665', 'BE0071', 'EA0096', 'DC3792', 'CC1623', 'EE3194', 'EE1599', 'EE3398', 'EE0891', 'AB2412', 'AB3631', 'EC3118', 'DE2280', 'DA2018', 'CE4929', 'DA0157', 'BE3865', 'EE5122', 'ED5983', 'CD3569', 'BA3981', 'DE3913', 'BC3383', 'EE1456', 'AB0498', 'DA3937', 'AE2070', 'CE5804', 'DE2952', 'EB5014', 'AB0785', 'EC5061', 'AE1615', 'AC5219', 'EB3453', 'BB4372', 'EC4597', 'EE3370', 'CC1251', 'AA2228', 'CE3570', 'AC2734', 'DC0885', 'AC1142', 'DE3956', 'EE4113', 'EA4665', 'EB5010', 'CE1369', 'EC4333', 'AC1651', 'CE2802', 'BE3500', 'BB5521', 'DC5230', 'BA3568', 'AD4049', 'EE0666', 'CB1121', 'AA1281', 'CC1060', 'AD3520', 'BC3767', 'CD1125', 'DA0091', 'DA1978', 'BD2344', 'EE1345', 'AE2442', 'BC2306', 'CD1526', 'DD0514', 'EE0192', 'AE0517', 'ED3991', 'EE4694', 'AA1882', 'AC3296', 'ED1368', 'BD2772', 'BC1574', 'AD4402', 'DA2235', 'EC0168', 'CD0686', 'AB1050', 'BB2185', 'EA0274', 'DE4259', 'CB1594', 'ED3869', 'EA4427', 'CE0328', 'DD1004', 'BC4501', 'EE4727', 'CC5595', 'ED2344', 'ED4679', 'EE4167', 'AD5802', 'CB4401', 'EB2113', 'ED1493', 'CD3285', 'AA2717', 'AA1299', 'DA0957', 'CB5816', 'EE3520', 'DB2187', 'EA4323', 'DC3989', 'AA0663', 'DE5448', 'EE1418', 'EE2066', 'AA1373', 'EE1518', 'EE1782', 'CB2087', 'CE0266', 'AC4397', 'EC1308', 'EE2743', 'AD3849', 'CC3094', 'EE2725', 'BD3893', 'EC4543', 'DE2661', 'BE2766', 'DC2414', 'ED1347', 'CB2313', 'AD0026', 'ED1356', 'BE3893', 'EE3442', 'EC1069', 'EE2767', 'CA1410', 'EC5553', 'EA2432', 'AB5636', 'ED1639', 'DB3176', 'EE5932', 'EC1606', 'CC4372', 'EA2838', 'DE2964', 'DD3804', 'BC5171', 'BD4210', 'EE3239', 'AA3637', 'DA4393', 'CB3981', 'CD4627', 'EE0674', 'BE3231', 'BE0014', 'EE4350', 'BD1956', 'CE3526', 'EB2453', 'AB3957', 'EE5343', 'AD1178', 'CC4550', 'DD0465', 'CE2580', 'CD1344', 'ED0247', 'ED4614', 'DE0976', 'CE5883', 'AA2922'],
'Name':
['Lorna Leyland', 'Anastasia Guises', 'Hubert Guises', 'Allan Kingsley', 'Lorna Kingsley', 'Amy Hayden', 'Hubert Abernethy', 'Anastasia Guises', 'Thom Hayden', 'Walter Kingsley', 'Allan Ackroyd', 'Thom Kingsley', 'Anastasia Collins', 'Thom Ackroyd', 'Thom Collins', 'Lorna Haworth', 'Anastasia Hayden', 'Thom Leyland', 'Alexis Leyland', 'Anastasia McLennan', 'Walter Haworth', 'Amy Collins', 'Barbara Leyland', 'Anastasia Ackroyd', 'Amy Kingsley', 'Barbara Haworth', 'Allan Ackroyd', 'Barbara Guises', 'Thom Haworth', 'Thom Ackroyd', 'Allan Guises', 'Barbara McLennan', 'Barbara Guises', 'Hubert Abernethy', 'Anastasia Haworth', 'Walter Collins', 'Allan Abernethy', 'Hubert Ackroyd', 'Thom McLennan', 'Thom Abernethy', 'Allan McLennan', 'Hubert Kingsley', 'Allan Leyland', 'Allan Haworth', 'Alexis Collins', 'Thom Ackroyd', 'Anastasia Haworth', 'Alexis McLennan', 'Amy Haworth', 'Walter Guises', 'Barbara Hayden', 'Lorna Collins', 'Lorna Abernethy', 'Alexis Abernethy', 'Anastasia Ackroyd', 'Allan Kingsley', 'Thom McLennan', 'Thom Hayden', 'Walter Hayden', 'Lorna Leyland', 'Alexis Leyland', 'Barbara Leyland', 'Barbara Collins', 'Amy Leyland', 'Lorna Hayden', 'Lorna Leyland', 'Walter Kingsley', 'Allan McLennan', 'Anastasia Haworth', 'Allan Guises', 'Anastasia Abernethy', 'Walter Kingsley', 'Amy Abernethy', 'Allan Guises', 'Lorna Ackroyd', 'Alexis Guises', 'Thom Abernethy', 'Lorna McLennan', 'Allan Abernethy', 'Barbara Hayden', 'Amy Collins', 'Allan Kingsley', 'Thom Haworth', 'Thom Kingsley', 'Alexis Leyland', 'Amy Hayden', 'Hubert Hayden', 'Thom Ackroyd', 'Walter Abernethy', 'Barbara Guises', 'Walter Collins', 'Hubert Abernethy', 'Allan Ackroyd', 'Barbara Kingsley', 'Anastasia Hayden', 'Lorna Abernethy', 'Thom Hayden', 'Thom Leyland', 'Amy Leyland', 'Thom Guises', 'Alexis Hayden', 'Walter Guises', 'Anastasia Leyland', 'Thom Ackroyd', 'Allan Collins', 'Barbara Collins', 'Allan Hayden', 'Thom Collins', 'Barbara McLennan', 'Lorna Haworth', 'Walter Hayden', 'Barbara Guises', 'Alexis Collins', 'Lorna Hayden', 'Barbara Haworth', 'Thom Guises', 'Hubert Guises', 'Anastasia Haworth', 'Thom Ackroyd', 'Hubert Haworth', 'Hubert Abernethy', 'Anastasia Leyland', 'Amy Leyland', 'Walter Guises', 'Thom McLennan', 'Hubert Collins', 'Barbara McLennan', 'Anastasia Guises', 'Amy Ackroyd', 'Barbara McLennan', 'Anastasia Guises', 'Walter Hayden', 'Amy Kingsley', 'Lorna Hayden', 'Walter Guises', 'Barbara Guises', 'Thom Kingsley', 'Anastasia Abernethy', 'Amy Kingsley', 'Walter Kingsley', 'Walter McLennan', 'Alexis Ackroyd', 'Lorna Kingsley', 'Amy Collins', 'Anastasia Haworth', 'Amy Haworth', 'Lorna Kingsley', 'Allan Guises', 'Anastasia Guises', 'Amy Haworth', 'Barbara Guises', 'Lorna Leyland', 'Allan Hayden', 'Alexis Leyland', 'Lorna Ackroyd', 'Alexis Guises', 'Walter Collins', 'Lorna Abernethy', 'Hubert Collins', 'Hubert McLennan', 'Alexis Collins', 'Lorna Leyland', 'Barbara Hayden', 'Lorna Kingsley', 'Barbara Haworth', 'Thom Leyland', 'Lorna Ackroyd', 'Walter Guises', 'Allan Haworth', 'Hubert McLennan', 'Walter Haworth', 'Lorna Ackroyd', 'Allan Hayden', 'Lorna Abernethy', 'Allan Abernethy', 'Allan Ackroyd', 'Lorna Haworth', 'Barbara Hayden', 'Thom Haworth', 'Alexis Hayden', 'Barbara Leyland', 'Alexis Ackroyd', 'Walter Abernethy', 'Anastasia Abernethy', 'Thom Kingsley', 'Hubert Haworth', 'Amy Hayden', 'Hubert Guises', 'Walter Hayden', 'Allan Guises', 'Allan Ackroyd', 'Hubert Abernethy', 'Amy McLennan', 'Allan Abernethy', 'Walter Kingsley', 'Allan Haworth', 'Hubert Hayden', 'Lorna McLennan', 'Lorna Collins', 'Lorna Haworth']
}
Since the data is generated using random numbers, the result will be different from this article, but it is the same even if you do the following.
data = {
'ID':id,
'Name':name
}
Create a DataFrame from the data created earlier.
import pandas ps pd
frame = pd.DataFrame(data)
The DataFrame was created in pandas like this.
Use ** str.contains () ** to search pandas data using regular expressions. When you want to check "when the first character of the 6-digit ID starts with E"
frame[frame['ID'].str.contains('E.....')]
Then you can get a list of lines whose ID starts with E like this. 71 data hits. The'E .....'part is a regular expression. "." Means something for one character. 'E .....' means to search for the 6 letters starting with E. If you try'E ......', it means 7 characters starting with E, so it will not hit.
To find data with 1 in the 3rd digit
frame[frame['ID'].str.contains('..1...')]
It will be.
There are still other ways to write a regular expression, but if you want to search, for example, if the first digit is AA or AB.
frame[frame['ID'].str.contains('(AA|AB)....')]
will do. If you write this way, you will get a warning saying User Warning: This pattern has match groups. To actually get the groups, use str. Extract., But that's okay.
The result is the same regardless of which one is big, but as pandas it was written with or condition
frame[(frame['ID'].str.contains('AA....')) | (frame['ID'].str.contains('AB....')) ]
I think the grammar of is more correct.
When using and condition in pandas
frame[(frame['ID'].str.contains('A.....')) & (frame['ID'].str.contains('...1..')) ]
Scratch like.
I feel that how to write and and or is a point to stumble in pandas. ** frame [(Condition 1) & (Condition 2) | (Condition 3)] **.
If you want to search under the condition that "the first digit of the ID is A and the name is Walter", use the and condition.
frame[(frame['ID'].str.contains('A.....')) & (frame['Name'].str.contains('Walter.*'))]
I will write.
'. \ *'Means "anything is fine, so the string continues". If you enter'Walter. \ *', All strings starting with'Walter' will be hit.
In the case of the search condition "starting with Walter and ending with s" in the previous conditions
frame[(frame['ID'].str.contains('A.....')) & (frame['Name'].str.contains('Walter.*s'))]
It seems that it should be done like this.
However, when I looked at the execution result, I got the result of Walter Kingsley. Apparently it hit the s of King "s" ley. Use $ to specify that s is the last character.frame[(frame['ID'].str.contains('A.....')) & (frame['Name'].str.contains('Walter.*s$'))]
Then
The $ mark has the meaning of searching for the end of a regular expression string. So I didn't hit Kingsley's s, but Guises and Collins.I didn't know how to search based on the first character of the data stored in the DataFrame, so I was forced to search by turning the for loop, but if you use this method, you can search quickly with a short syntax.
Recommended Posts