SAS Viya is an AI platform that can be used from languages such as Python, Java, and R. SAS Viya works with a table object called CASTable (CAS stands for Cloud Analytic Services). In this post, I will show how to extract a subset of the columns of a CASTable and display information about them.
First, connect to SAS Viya with the swat package.
```python
import swat

# Host name, port, user name and password of your CAS server
conn = swat.CAS('server-name.mycompany.com', 5570, 'username', 'password')
```
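Next, get a CASTable. Here I will load the Iris data set from a CSV file.
```python
# Load data/iris.csv from the casuser caslib; casTable in the result refers to the loaded table
tbl = conn.loadtable('data/iris.csv', caslib='casuser').casTable

# The outputs in this post were produced with this sort order applied to the table
tbl = tbl.sort_values(['sepal_length', 'sepal_width'], ascending=[False, True])
```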
To retrieve a single column, index tbl with the column name, just as you would look up a key in a dict.
```python
col = tbl['sepal_width']
col
```
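Evaluating col displays a CASColumn object, a reference to the column on the server, rather than the data itself. Note that it also carries the sort order applied to the table:
```
CASColumn('DATA.IRIS', caslib='CASUSER(username)')['sepal_width'].sort_values(['sepal_length', 'sepal_width'], ascending=[False, True])
```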
To see the actual values, call the column's head method. Only the values of that column are output.
```python
col.head()
```
```
0    3.8
1    2.6
2    2.8
3    3.0
4    3.8
Name: sepal_width, dtype: float64
```
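CASTable is designed to mirror much of the pandas DataFrame API, so the same indexing idiom can be tried locally without a CAS server. The sketch below is a local stand-in, not SWAT code: it assumes pandas is installed and builds a small DataFrame (named df here for illustration) from the five rows shown in this post.

```python
import pandas as pd

# Local stand-in built from the five rows shown in this post (not the full Iris table)
df = pd.DataFrame({
    "sepal_width": [3.8, 2.6, 2.8, 3.0, 3.8],
    "petal_width": [2.0, 2.3, 2.0, 2.3, 2.2],
    "species": ["virginica"] * 5,
})

# A single column name as the key returns one column (a pandas Series)
col = df["sepal_width"]
print(type(col).__name__)  # Series

# head(n) returns the first n values; the default is n=5
print(col.head(3).tolist())  # [3.8, 2.6, 2.8]
```

The head output above from SWAT shows the same Series layout (Name and dtype footer), and passing a row count such as col.head(3) should work the same way on the CASColumn.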
Similarly, passing a list of column names as the key gives you multiple columns.
```python
widths = tbl[['sepal_width', 'petal_width', 'species']]
widths.head()
```
The first five rows look like this:
| | sepal_width | petal_width | species |
|---|---|---|---|
| 0 | 3.8 | 2.0 | virginica |
| 1 | 2.6 | 2.3 | virginica |
| 2 | 2.8 | 2.0 | virginica |
| 3 | 3.0 | 2.3 | virginica |
| 4 | 3.8 | 2.2 | virginica |
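Two details of list-based selection are easy to check locally. The sketch below again uses pandas as a stand-in for CASTable (an assumption, since it runs without a CAS server) on the rows shown above: the result keeps the columns in the order you list them, and a one-element list still gives a table rather than a single column.

```python
import pandas as pd

# Local stand-in built from the rows shown in the table above
df = pd.DataFrame({
    "sepal_width": [3.8, 2.6, 2.8, 3.0, 3.8],
    "petal_width": [2.0, 2.3, 2.0, 2.3, 2.2],
    "species": ["virginica"] * 5,
})

# A list of names returns a DataFrame whose columns follow the list order
subset = df[["species", "petal_width"]]
print(list(subset.columns))  # ['species', 'petal_width']

# A one-element list still returns a DataFrame, not a Series
one = df[["sepal_width"]]
print(type(one).__name__, one.shape)  # DataFrame (5, 1)
```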
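You can also check summary statistics of the selected columns with describe. Only the numeric columns appear in the result; species, a character column, is left out.
```python
widths.describe()
```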
| | sepal_width | petal_width |
|---|---|---|
| count | 150.000000 | 150.000000 |
| mean | 3.054000 | 1.198667 |
| std | 0.433594 | 0.763161 |
| min | 2.000000 | 0.100000 |
| 25% | 2.800000 | 0.300000 |
| 50% | 3.000000 | 1.300000 |
| 75% | 3.300000 | 1.800000 |
| max | 4.400000 | 2.500000 |
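To see what describe computes, here is a small worked example that can be checked by hand. It uses pandas as a local stand-in (an assumption, not SWAT) on only the five rows shown earlier, so the numbers differ from the 150-row table above. For instance, the mean of sepal_width is (3.8 + 2.6 + 2.8 + 3.0 + 3.8) / 5 = 3.2.

```python
import pandas as pd

# Only the five sample rows from this post, so count is 5 rather than 150
df = pd.DataFrame({
    "sepal_width": [3.8, 2.6, 2.8, 3.0, 3.8],
    "petal_width": [2.0, 2.3, 2.0, 2.3, 2.2],
    "species": ["virginica"] * 5,
})

summary = df.describe()

# The non-numeric species column is excluded by default
print(list(summary.columns))  # ['sepal_width', 'petal_width']
print(summary.loc["count"].tolist())  # [5.0, 5.0]
print(summary.loc["mean"].round(2).tolist())  # [3.2, 2.16]
```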
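Column metadata, such as each column's type and length, can be displayed the same way with columninfo. ID is each column's position in the full table, so the three selected columns keep IDs 2, 4 and 5.
```python
widths.columninfo()
```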
| | Column | ID | Type | RawLength | FormattedLength | NFL | NFD |
|---|---|---|---|---|---|---|---|
| 0 | sepal_width | 2 | double | 8 | 12 | 0 | 0 |
| 1 | petal_width | 4 | double | 8 | 12 | 0 | 0 |
| 2 | species | 5 | varchar | 10 | 10 | 0 | 0 |
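The column information can also drive the selection itself, for example to keep only the numeric columns before calling describe. The sketch below assumes that the table above is available as a DataFrame under the key ColumnInfo of the returned results (i.e. widths.columninfo()['ColumnInfo']); to keep it runnable without a server, it rebuilds part of that table by hand with pandas.

```python
import pandas as pd

# Hand-built stand-in for the ColumnInfo table printed above
info = pd.DataFrame({
    "Column": ["sepal_width", "petal_width", "species"],
    "ID": [2, 4, 5],
    "Type": ["double", "double", "varchar"],
    "RawLength": [8, 8, 10],
})

# Keep the names of the columns whose CAS type is double
numeric_cols = info.loc[info["Type"] == "double", "Column"].tolist()
print(numeric_cols)  # ['sepal_width', 'petal_width']
```

The resulting list can then be used as a key, as in tbl[numeric_cols].describe().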
Selecting only the columns you need lets you narrow the analysis down even when a table has many columns. This is handy when there is so much numeric data that it is hard to decide where to start.
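When a table has many columns, the list of names to select can be built from a pattern instead of typed by hand. The helper below, pick_columns, is a hypothetical illustration using only the standard library; the list of Iris column names is assumed (sepal_length and petal_length are the columns with IDs 1 and 3, which do not appear in the column information above).

```python
from fnmatch import fnmatchcase

def pick_columns(names, pattern):
    """Return the names matching a shell-style pattern, keeping their original order."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch
    return [name for name in names if fnmatchcase(name, pattern)]

# Assumed column names of the Iris table
iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(pick_columns(iris_columns, "*_width"))  # ['sepal_width', 'petal_width']
print(pick_columns(iris_columns, "petal_*"))  # ['petal_length', 'petal_width']
```

Assuming tbl.columns lists the column names as it does for a pandas DataFrame, the result could be passed straight back as a key, e.g. tbl[pick_columns(list(tbl.columns), '*_width')].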