SAS Viya is an AI platform that you can use from languages such as Python, Java, and R. In SAS Viya, data is handled through a table object called CASTable (CAS stands for Cloud Analytic Services). In this post, I show how to change which statistics are returned when you summarize the data in a CASTable.
First, connect to SAS Viya.
```python
import swat
conn = swat.CAS('server-name.mycompany.com', 5570, 'username', 'password')
```
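If you would rather not hard-code the host and credentials in a script, one option is to read them from environment variables. The following is only a sketch: the `MY_CAS_*` variable names and the `cas_connection_args` and `connect` helpers are hypothetical names chosen for illustration, and the defaults simply repeat the placeholder host and port used above.

```python
import os


def cas_connection_args(env=None):
    """Build (hostname, port, username, password) from environment variables.

    The MY_CAS_* variable names are hypothetical; pick names that suit your setup.
    """
    env = os.environ if env is None else env
    return (
        env.get("MY_CAS_HOST", "server-name.mycompany.com"),
        int(env.get("MY_CAS_PORT", "5570")),
        env["MY_CAS_USER"],      # raises KeyError if unset, so a missing login fails early
        env["MY_CAS_PASSWORD"],
    )


def connect(env=None):
    """Open a CAS connection using the settings above."""
    import swat  # imported here so the helper above can be used without SWAT installed

    return swat.CAS(*cas_connection_args(env))
```

With the variables set, `conn = connect()` would take the place of the explicit `swat.CAS(...)` call.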
Then get the CASTable. Here, I will use a CSV file of the Iris dataset.
```python
tbl = conn.loadtable('data/iris.csv', caslib='casuser').casTable
```
Use the `describe` method to see what data you have.
```python
tbl.describe()
```
The result is returned as follows. For each numeric column you can see the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.
| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
count | 150.000000 | 150.000000 | 150.000000 |
mean | 5.843333 | 3.054000 | 3.758667 |
std | 0.828066 | 0.433594 | 1.764420 |
min | 4.300000 | 2.000000 | 1.000000 |
25% | 5.100000 | 2.800000 | 1.600000 |
50% | 5.800000 | 3.000000 | 4.350000 |
75% | 6.400000 | 3.300000 | 5.100000 |
max | 7.900000 | 4.400000 | 6.900000 |
Changing the `percentiles` argument changes which percentiles are returned. The following example requests the 30th and 80th percentiles. As the output shows, the 50th percentile is still included.
```python
tbl.describe(percentiles=[0.3, 0.8])
```
| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
count | 150.000000 | 150.000000 | 150.000000 |
mean | 5.843333 | 3.054000 | 3.758667 |
std | 0.828066 | 0.433594 | 1.764420 |
min | 4.300000 | 2.000000 | 1.000000 |
30% | 5.250000 | 2.800000 | 1.700000 |
50% | 5.800000 | 3.000000 | 4.350000 |
80% | 6.550000 | 3.400000 | 5.350000 |
max | 7.900000 | 4.400000 | 6.900000 |
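Judging from the output above, each percentile row is labeled with its percentage, and the 50% row is kept even though it was not requested. The following plain-Python sketch mimics that labeling. `percentile_labels` is a hypothetical helper written for illustration and is not part of SWAT.

```python
def percentile_labels(percentiles):
    """Return the row labels suggested by the output above for a percentiles argument."""
    # The median (0.5) appears in the output even when it is not requested.
    pcts = sorted(set(percentiles) | {0.5})
    # Format each fraction as a percentage, e.g. 0.3 -> '30%'.
    return [f"{p * 100:g}%" for p in pcts]


print(percentile_labels([0.3, 0.8]))  # ['30%', '50%', '80%']
```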
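Specify `include='all'` to include every column in the summary, not only the numeric ones. The output then also contains the number of unique values (`unique`), the most frequent value (`top`), and how often it occurs (`freq`).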
```python
tbl.describe(include='all')
```
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
count | 150 | 150 | 150 | 150 |
unique | 35 | 23 | 43 | 22 |
top | 5 | 3 | 1.5 | 0.2 |
freq | 10 | 26 | 14 | 28 |
mean | 5.84333 | 3.054 | 3.75867 | 1.19867 |
std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
min | 4.3 | 2 | 1 | 0.1 |
25% | 5.1 | 2.8 | 1.6 | 0.3 |
50% | 5.8 | 3 | 4.35 | 1.3 |
75% | 6.4 | 3.3 | 5.1 | 1.8 |
max | 7.9 | 4.4 | 6.9 | 2.5 |
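To make the `unique`, `top`, and `freq` rows concrete, here is a small plain-Python sketch that computes the same three quantities for a short list. The values are toy data invented for the example, not the real Iris column, and the code only illustrates what the rows mean, not how CAS computes them.

```python
from collections import Counter

# Toy values for illustration only (not taken from the Iris data).
values = [5.0, 5.1, 5.0, 4.9, 5.0, 5.1]

counts = Counter(values)
unique = len(counts)                  # number of distinct values
top, freq = counts.most_common(1)[0]  # most frequent value and how often it occurs

print(unique, top, freq)  # 3 5.0 3
```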
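Specify `stats='all'` to return every available statistic, such as the number of missing values, the sum, the standard error, and the variance. In this case the values are displayed as floating-point numbers.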
```python
tbl.describe(stats='all')
```
| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
count | 1.500000e+02 | 1.500000e+02 | 1.500000e+02 |
unique | 3.500000e+01 | 2.300000e+01 | 4.300000e+01 |
mean | 5.843333e+00 | 3.054000e+00 | 3.758667e+00 |
std | 8.280661e-01 | 4.335943e-01 | 1.764420e+00 |
min | 4.300000e+00 | 2.000000e+00 | 1.000000e+00 |
25% | 5.100000e+00 | 2.800000e+00 | 1.600000e+00 |
50% | 5.800000e+00 | 3.000000e+00 | 4.350000e+00 |
75% | 6.400000e+00 | 3.300000e+00 | 5.100000e+00 |
max | 7.900000e+00 | 4.400000e+00 | 6.900000e+00 |
nmiss | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
sum | 8.765000e+02 | 4.581000e+02 | 5.638000e+02 |
stderr | 6.761132e-02 | 3.540283e-02 | 1.440643e-01 |
var | 6.856935e-01 | 1.880040e-01 | 3.113179e+00 |
uss | 5.223850e+03 | 1.427050e+03 | 2.583000e+03 |
cv | 1.417113e+01 | 1.419759e+01 | 4.694272e+01 |
tvalue | 8.642537e+01 | 8.626430e+01 | 2.609020e+01 |
probt | 3.331256e-129 | 4.374977e-129 | 1.994305e-57 |
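Several of the extra rows follow from the basic ones through standard formulas. As a sanity check, the sketch below recomputes them from the count, mean, and standard deviation of the `sepal_length` column shown above. It assumes the usual definitions (standard error as `std / sqrt(n)`, coefficient of variation in percent, t value for a mean of zero), which agree with the printed values up to rounding.

```python
import math

# Values copied from the sepal_length column of the output above.
n = 150
mean = 5.843333
std = 0.8280661

stderr = std / math.sqrt(n)          # standard error of the mean
var = std ** 2                       # variance
cv = 100 * std / mean                # coefficient of variation, in percent
tvalue = mean / stderr               # t statistic for the hypothesis mean = 0
total = mean * n                     # sum of the values
uss = var * (n - 1) + n * mean ** 2  # uncorrected sum of squares

print(f"stderr={stderr:.6f} var={var:.6f} cv={cv:.4f}")
print(f"tvalue={tvalue:.4f} sum={total:.1f} uss={uss:.2f}")
```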
You can use the `describe` method to get an overview of the data in a CASTable. Use it as a starting point for data analysis.