The record addition node (the Append node in the English UI) stacks records vertically in SPSS Modeler. It corresponds to UNION ALL in SQL. Let's rewrite this in Python with pandas.
The example uses the following two sets of time-series sensor data. They hold similar items, but some column names differ and some columns exist in only one of the files.
■ Data 1: Cond4n_e104.csv
M_CD: Machine code
UP_TIME: Uptime
POWER: Power
TEMP: Temperature
ERR_CD: Error code
■ Data 2: COND2n.csv
Time: Uptime
Power: Power
Temperature: Temperature
Pressure: Pressure
Uptime: Uptime
Status: Status code
Outcome: Error code
Append data 2 (COND2n.csv) to data 1 (Cond4n_e104.csv), following the columns of data 1.
First, use the filter node to rename the columns of data 2 to the column names of data 1.
Then connect the record addition node. Because COND2n.csv has no column corresponding to M_CD, that field is filled with NULL for the rows that come from data 2.
Data 2 has been added to data 1 as shown below.
By the way, the record addition node matches fields by name by default, but it can also match them by column position even when the names differ. If you also want to keep columns such as Pressure that exist only in the data being appended (data 2), select "All datasets" as the field input source. It is also possible to add a tag string that indicates which dataset each record came from.
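For reference, these three options have rough counterparts in pandas. The following is only a sketch under stated assumptions: the two small DataFrames are made-up stand-ins for the real files, the SOURCE column name is hypothetical, and the mapping between the Modeler options and the pandas calls is an approximation, not an exact equivalence.

```python
import pandas as pd

# Toy stand-ins for data 1 and data 2 (values are made up for illustration).
df1 = pd.DataFrame({
    "M_CD": [104, 104],
    "UP_TIME": [0, 1],
    "POWER": [951, 952],
    "TEMP": [250, 251],
    "ERR_CD": [0, 0],
})
df2 = pd.DataFrame({
    "Time": [0, 1],
    "Power": [1010, 1011],
    "Temperature": [260, 261],
    "Pressure": [10, 11],
})

# 1) Match by position: give the second frame the column names of the first,
#    column by column, then stack.
left = df1[["UP_TIME", "POWER", "TEMP"]]
right = df2[["Time", "Power", "Temperature"]].copy()
right.columns = left.columns
by_position = pd.concat([left, right], ignore_index=True)

# 2) "All datasets": concat keeps the union of columns (join="outer" is the
#    default), so Pressure survives and is NaN for the rows from df1.
renamed = df2.rename(
    columns={"Time": "UP_TIME", "Power": "POWER", "Temperature": "TEMP"}
)
all_columns = pd.concat([df1, renamed], join="outer", ignore_index=True)

#    Keeping only the columns of the first frame afterwards:
main_only = all_columns[list(df1.columns)]

# 3) Tag: add a column naming the source before stacking.
#    "SOURCE" is a hypothetical column name chosen for this sketch.
tagged = pd.concat(
    [df1.assign(SOURCE="data1"), renamed.assign(SOURCE="data2")],
    ignore_index=True,
)
```

In this sketch, columns missing from one side (M_CD and ERR_CD for the rows from df2, Pressure for the rows from df1) come out as NaN, which plays the role of NULL in Modeler.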
In pandas, rename and drop do the work of the filter node. Use rename to align the column names with data 1, and drop to delete the columns that are not needed.
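# Load the two CSV files (the paths are an assumption; adjust them to where the files are saved).
import pandas as pd
df1 = pd.read_csv('Cond4n_e104.csv')
df2 = pd.read_csv('COND2n.csv')
# Align the columns of data 2 with the column names of data 1, then drop the columns that data 1 does not have.
df2_1 = df2.rename(columns={'Time': 'UP_TIME', 'Power': 'POWER', 'Temperature': 'TEMP', 'Outcome': 'ERR_CD'})\
.drop(['Pressure', 'Uptime', 'Status'], axis=1)
df2_1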
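To try the rename and drop step without the CSV files, here is a self-contained sketch. The rows are made up; only the column names are taken from the description of COND2n.csv above. The name_map variable and the df2_alt alternative (select the mapped columns, then rename) are additions for illustration, not part of the original notebook.

```python
import pandas as pd

# Made-up rows that use the column names of COND2n.csv.
df2 = pd.DataFrame({
    "Time": [0, 1],
    "Power": [1010, 1011],
    "Temperature": [260, 261],
    "Pressure": [10, 11],
    "Uptime": [5, 6],
    "Status": [0, 0],
    "Outcome": [0, 101],
})

# Mapping from data 2 names to data 1 names (hypothetical helper variable).
name_map = {
    "Time": "UP_TIME",
    "Power": "POWER",
    "Temperature": "TEMP",
    "Outcome": "ERR_CD",
}

# rename + drop, as in the article. Both return new frames; df2 is unchanged.
df2_1 = df2.rename(columns=name_map).drop(columns=["Pressure", "Uptime", "Status"])

# Alternative: keep only the mapped columns, then rename them.
df2_alt = df2[list(name_map)].rename(columns=name_map)

print(df2_1.columns.tolist())
```

The alternative avoids listing the columns to drop, which may be convenient when data 2 has many extra columns.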
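Next, perform the processing that corresponds to the record addition node. There are two ways to do it, append and concat, and both give the same result. When combining three or more DataFrames, I find concat easier to read, because it takes all of them in a single list. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer versions only concat is available.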
# Using append (works in the test environment below; DataFrame.append was removed in pandas 2.0)
df1.append(df2_1)
# Using concat
pd.concat([df1, df2_1])
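The behavior of concat can be checked on small made-up frames. This is a sketch, not the notebook's code: the values are invented, and df3 is a hypothetical third dataset added only to show that concat takes any number of frames. It also shows ignore_index=True, which renumbers the rows instead of keeping each frame's original index.

```python
import pandas as pd

# Made-up rows shaped like data 1 and the renamed data 2.
df1 = pd.DataFrame({
    "M_CD": [104, 104],
    "UP_TIME": [0, 1],
    "POWER": [951, 952],
    "TEMP": [250, 251],
    "ERR_CD": [0, 0],
})
df2_1 = pd.DataFrame({
    "UP_TIME": [0, 1],
    "POWER": [1010, 1011],
    "TEMP": [260, 261],
    "ERR_CD": [0, 101],
})
# Hypothetical third dataset, only to show concat with three frames.
df3 = pd.DataFrame({
    "UP_TIME": [0],
    "POWER": [980],
    "TEMP": [255],
    "ERR_CD": [0],
})

# Columns are matched by name; M_CD is NaN for the rows from df2_1.
combined = pd.concat([df1, df2_1])
print(combined)

# Three frames at once, with the index renumbered from 0.
renumbered = pd.concat([df1, df2_1, df3], ignore_index=True)
print(renumbered)
```

Because NaN is a floating-point value, M_CD becomes a float column in the combined result even though it held integers in df1.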
The samples are available at the following locations.
stream: https://github.com/hkwd/200611Modeler2Python/raw/master/append/append.str
notebook: https://github.com/hkwd/200611Modeler2Python/blob/master/append/append.ipynb
data: https://raw.githubusercontent.com/hkwd/200611Modeler2Python/master/data/Cond4n_e104.csv
data: https://raw.githubusercontent.com/hkwd/200611Modeler2Python/master/data/COND2n.csv
■ Test environment
Modeler 18.2.2
Windows 10 64bit
Python 3.7.9
pandas 1.0.5
Duplicate record node https://www.ibm.com/support/knowledgecenter/ja/SS3RA7_18.2.1/modeler_mainhelp_client_ddita/clementine/distinct_settingstab.html