This is a collection of questions I wrote for myself while studying for the Python 3 Engineer Certification Data Analysis Exam, which I took in November 2020. I hope it helps anyone preparing for the exam.
My report on taking the exam is in this article: https://qiita.com/pon_maeda/items/a6c008fb3d993278fccb
- The questions are mostly short-answer or fill-in-the-blank, so you can work through them in spare moments.
- Please note that the actual exam is multiple choice with four options (as of November 15, 2020), and these questions are a little more difficult than the actual exam.
- Since I put them together roughly for personal use, some may not be worded like proper exam questions. Please bear with me.
Machine learning is broadly divided into three types: () learning, () learning, and () learning.
The () variable, also known as the correct label, is used only for () learning.
When the correct label is a continuous value, the method used is (); when it takes discrete (categorical) values, the method used is ().
What are the two main methods of unsupervised learning?
venv is a tool that allows you to use different versions of Python. (Yes / No)
Which Python function lets you specify file names using wildcards?
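One possible answer is the standard library's glob module. The sketch below is minimal and assumes a throwaway temporary directory holding made-up file names (a.csv, b.csv, notes.txt):

```python
import glob
import os
import tempfile

# Create a temporary directory with a few hypothetical files to search
tmp_dir = tempfile.mkdtemp()
for name in ["a.csv", "b.csv", "notes.txt"]:
    open(os.path.join(tmp_dir, name), "w").close()

# glob.glob expands shell-style wildcards such as * and ?
csv_paths = glob.glob(os.path.join(tmp_dir, "*.csv"))
csv_names = sorted(os.path.basename(p) for p in csv_paths)
print(csv_names)  # ['a.csv', 'b.csv']
```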
Japanese reading of sin, cos, and tan.
What is the value of Napier's number (e)?
What is the logarithm of 1?
What is the factorial of 1?
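You can check the answers to the last few questions with the standard math module. This is only a quick sketch:

```python
import math

print(round(math.e, 5))    # 2.71828 (Napier's number e)
print(math.log(1))         # 0.0 (the logarithm of 1 is 0 in any base)
print(math.log10(1))       # 0.0
print(math.factorial(1))   # 1
print(math.factorial(0))   # 1 (0! is also defined as 1)

# sin, cos, and tan at angle 0
print(math.sin(0), math.cos(0), math.tan(0))  # 0.0 1.0 0.0
```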
Suppose you roll a six-sided die once and are told only that the result is odd, not which number came up. The probability of an outcome given this information is called () probability, and it is the basis of () theorem.
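Here is a small worked example of the die scenario. It assumes a fair die and computes P(roll is 1 | roll is odd) directly from the definition, then checks the result against Bayes' theorem:

```python
from fractions import Fraction

outcomes = range(1, 7)                     # a fair six-sided die
p = {x: Fraction(1, 6) for x in outcomes}  # each face has probability 1/6
odd = {x for x in outcomes if x % 2 == 1}

p_odd = sum(p[x] for x in odd)             # P(odd) = 1/2
# Conditional probability: P(X=1 | odd) = P(X=1 and odd) / P(odd)
p_one_given_odd = p[1] / p_odd
print(p_one_given_odd)                     # 1/3

# Same value via Bayes' theorem: P(1 | odd) = P(odd | 1) * P(1) / P(odd)
p_odd_given_one = Fraction(1)              # 1 is always odd
print(p_odd_given_one * p[1] / p_odd)      # 1/3
```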
4.1. NumPy
NumPy has a type for arrays, (), and a type for matrices, ().
A characteristic of the array type above is that its elements (can mix multiple types / must all share a single type).
Function to check the size of an array
The ravel function returns (), while the flatten function returns ().
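The sketch below shows the ravel/flatten difference on a small contiguous array, assumed for illustration. np.shares_memory reports whether the result still points at the original data:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r = a.ravel()    # a view of a when possible (a is contiguous here)
f = a.flatten()  # always a new copy

print(np.shares_memory(a, r))  # True
print(np.shares_memory(a, f))  # False

r[0] = 100       # writing through the view changes a as well
print(a[0, 0])   # 100
print(f[0])      # 0 (the copy is unaffected)
```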
Function to check the data type of an array
Function to convert the data type of an array
A function that generates uniformly distributed random integers
A function that generates uniformly distributed random decimals (floats)
A function that generates random numbers from the standard normal distribution
The standard normal distribution is the normal distribution with mean () and variance ().
Which function generates normally distributed random numbers with a specified mean and standard deviation?
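One way to cover the random-number questions above is with the legacy np.random functions, which are the ones common study material of that era focuses on. Newer code often uses np.random.default_rng() instead. The sample sizes, the range, and the mean and standard deviation below are arbitrary choices:

```python
import numpy as np

np.random.seed(0)  # fix the seed so results are reproducible

ints = np.random.randint(0, 10, size=5)                  # uniform integers in [0, 10)
floats = np.random.rand(5)                               # uniform floats in [0.0, 1.0)
std_normal = np.random.randn(5)                          # standard normal: mean 0, variance 1
normal = np.random.normal(loc=50, scale=10, size=1000)   # mean 50, standard deviation 10

print(ints)
print(normal.mean())  # close to 50
```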
A function that creates an identity matrix with the specified diagonal elements
A function that creates an array with every element set to a specified value
A function that creates an array of evenly spaced values over a specified range
A function that computes the differences between adjacent elements of an array
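Here is a quick sketch of array-creation functions that match the last few questions. The sizes and values are arbitrary:

```python
import numpy as np

identity = np.eye(3)                       # 3x3 identity matrix
sevens = np.full((2, 3), 7)                # every element set to 7
evenly = np.linspace(0, 1, 5)              # 5 evenly spaced values from 0 to 1
steps = np.diff(np.array([1, 4, 9, 16]))   # differences between neighbors

print(evenly.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(steps.tolist())   # [3, 5, 7]
```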
import numpy as np
a = [1, 2, 3]
b = [4, 5, 6]
np.concatenate([a, b])
Which of the following is the result?
[1, 2, 3, 4, 5, 6]
[[1, 2, 3],[4, 5, 6]]
[1, 2, 3, [4, 5, 6]]
When concatenating one-dimensional arrays, np.concatenate joins them in the (row / column) direction.
When concatenating two-dimensional arrays, np.concatenate joins them in the (row / column) direction by default.
If you pass axis=1 to this function, it concatenates in the () direction.
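The sketch below prints shapes instead of using the words "row" and "column", since textbooks define those directions differently. The small arrays are arbitrary. The function name is spelled concatenate.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.concatenate([a, b]).tolist())  # [1, 2, 3, 4, 5, 6]

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.concatenate([x, y])          # default axis=0: y is placed below x
h = np.concatenate([x, y], axis=1)  # axis=1: y is placed to the right of x
print(v.shape, h.shape)             # (4, 2) (2, 4)
```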
A function that splits a two-dimensional array in the column direction
A function that splits a two-dimensional array in the row direction
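np.hsplit and np.vsplit are the likely candidates here. Which one a textbook calls the "row direction" or "column direction" split depends on how it defines those terms, so this sketch just shows the resulting shapes on an assumed 3x4 array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

left, right = np.hsplit(a, 2)          # cuts between columns (along axis=1)
print(left.shape, right.shape)         # (3, 2) (3, 2)

top, middle, bottom = np.vsplit(a, 3)  # cuts between rows (along axis=0)
print(top.tolist())                    # [[0, 1, 2, 3]]
```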
What does transposing a two-dimensional array mean?
If you have a two-dimensional array called a, how do you transpose it?
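Transposing swaps rows and columns, so element (i, j) moves to (j, i). Here is a minimal example on an assumed 2x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.T.tolist())            # [[1, 4], [2, 5], [3, 6]]
print(np.transpose(a).shape)   # (3, 2)
```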
Which function adds a dimension to a one-dimensional array without requiring you to specify the number of elements?
a = np.array([1, 5, 4])
# array([[1, 5, 4]])
Using the function above, how do you add a dimension to get the result shown?
a = np.array([1, 5, 4])
# array([[1],
#        [5],
#        [4]])
Using the function above, how do you add a dimension to get this result?
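The sketch below assumes the question has reshape with -1 in mind, where NumPy infers the length. Indexing with np.newaxis is shown as an alternative that gives the same shapes:

```python
import numpy as np

a = np.array([1, 5, 4])

row = a.reshape(1, -1)       # -1: let NumPy infer the number of elements
col = a.reshape(-1, 1)
print(row.shape, col.shape)  # (1, 3) (3, 1)

# np.newaxis indexing produces the same arrays
print(np.array_equal(a[np.newaxis, :], row))  # True
print(np.array_equal(a[:, np.newaxis], col))  # True
```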
Which function generates grid data?
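np.meshgrid builds coordinate matrices from 1-D coordinate vectors. The x and y values below are arbitrary:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])

X, Y = np.meshgrid(x, y)  # default indexing="xy": shape is (len(y), len(x))
print(X.tolist())  # [[1, 2, 3], [1, 2, 3]]
print(Y.tolist())  # [[10, 10, 10], [20, 20, 20]]
```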
np.arange(1, 10, 3)
What is the result?
4.1.3. NumPy features
What is the name for NumPy's group of convenience functions, such as sin() and log(), that transform every element of an array at once?
A function that returns the absolute value of each element of an array
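Functions like np.abs and np.sin apply elementwise to a whole array in one call, and NumPy exposes them as np.ufunc objects. The input values are arbitrary:

```python
import numpy as np

a = np.array([-2.0, 0.0, 1.0])

print(np.abs(a).tolist())             # [2.0, 0.0, 1.0]
print(np.exp(np.zeros(2)).tolist())   # [1.0, 1.0]
print(isinstance(np.abs, np.ufunc), isinstance(np.sin, np.ufunc))  # True True
```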
import numpy as np
a = np.array([0, 1, 2])
b = np.array([[-3, -2, -1],
              [0, 1, 2]])
a + b
What is the result of adding the two-dimensional array and the one-dimensional array above?
What is it called when you can perform arithmetic between an array and a scalar?
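The sketch below runs the a + b example from the question and a scalar case. In both, the smaller operand is stretched (broadcast) to match the larger one:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([[-3, -2, -1],
              [0, 1, 2]])

# a (shape (3,)) is broadcast across each row of b (shape (2, 3))
print((a + b).tolist())                      # [[-3, -1, 1], [0, 2, 4]]

# A scalar is broadcast to every element
print((np.array([1, 2, 3]) * 10).tolist())   # [10, 20, 30]
```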
What does the @ operator mean?
A_matrix @ B_matrix
How else can this be written?
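For 2-D arrays, the @ operator performs matrix multiplication, and np.dot and np.matmul give the same result. The matrices here are arbitrary:

```python
import numpy as np

A_matrix = np.array([[1, 2], [3, 4]])
B_matrix = np.array([[5, 6], [7, 8]])

p1 = A_matrix @ B_matrix
p2 = np.dot(A_matrix, B_matrix)
p3 = np.matmul(A_matrix, B_matrix)
print(p1.tolist())  # [[19, 22], [43, 50]]
print(np.array_equal(p1, p2) and np.array_equal(p1, p3))  # True
```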
A function that counts the number of True values in a Boolean array.
- np.count_nonzero function
  - Returns the number of non-zero elements.
  - Since False is treated as 0, this counts the True values.
- np.sum function
  - Adds up the elements.
  - Since True is treated as 1, the sum equals the number of True values.
A function that checks whether a Boolean array contains at least one True.
A function that checks whether all elements of a Boolean array are True.
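Here is a short check of the four Boolean-array functions on an arbitrary array:

```python
import numpy as np

flags = np.array([True, False, True, False])

print(np.count_nonzero(flags))  # 2
print(np.sum(flags))            # 2
print(np.any(flags))            # True
print(np.all(flags))            # False
```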
4.2. pandas
By default, df.head() and df.tail() output only the first and last () rows of the DataFrame.
How to check the size (number of rows and columns) of df
How to extract two columns, A and B, from df
Given a DataFrame df of daily step counts and calories consumed, how do you extract only the records with 10,000 or more steps?
Possible answers include:
df[df.loc[:, "steps"] >= 10000]
df.query("steps >= 10000")
Given the same DataFrame df of step counts and calories consumed, how do you sort it by steps in descending order?
How do you one-hot encode an exercise-level column containing the three values High, Mid, and Low, using "exercise" as the prefix?
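This minimal sketch assumes made-up data. The column names steps and calorie and all the numbers are hypothetical. It shows both filtering approaches and the descending sort:

```python
import pandas as pd

# Hypothetical data: daily step counts and calories consumed
df = pd.DataFrame({
    "steps": [12000, 8000, 15000, 9500],
    "calorie": [2300, 2100, 2500, 2000],
})

# Rows with 10,000 steps or more (two equivalent ways)
over_10k = df[df.loc[:, "steps"] >= 10000]
over_10k_q = df.query("steps >= 10000")

# Sort by steps in descending order
sorted_df = df.sort_values(by="steps", ascending=False)
print(sorted_df["steps"].tolist())  # [15000, 12000, 9500, 8000]
```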
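One way to do this is pd.get_dummies with the prefix argument. The column name exercise_level is hypothetical. Recent pandas versions return bool columns, so the sketch converts them with astype(int) before summing:

```python
import pandas as pd

# Hypothetical column name "exercise_level"
df = pd.DataFrame({"exercise_level": ["High", "Mid", "Low", "Mid"]})

dummies = pd.get_dummies(df["exercise_level"], prefix="exercise")
print(sorted(dummies.columns))  # ['exercise_High', 'exercise_Low', 'exercise_Mid']
print(dummies.astype(int)["exercise_Mid"].tolist())  # [0, 1, 0, 1]
```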
How to create an array of dates from 2020-01-01 to 2020-10-01.
Create an array of dates for 100 days from 2020-01-01.
Create an array of only the Saturdays among the dates from 2020-01-01 to 2020-10-01.
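pd.date_range covers all three date questions. By default it generates daily dates and includes both endpoints, and freq="W-SAT" keeps only Saturdays:

```python
import pandas as pd

days = pd.date_range(start="2020-01-01", end="2020-10-01")  # daily, both ends included
first_100 = pd.date_range(start="2020-01-01", periods=100)  # 100 consecutive days
saturdays = pd.date_range(start="2020-01-01", end="2020-10-01", freq="W-SAT")

print(len(days), len(first_100))  # 275 100
print(saturdays[0].date())        # 2020-01-04
```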
Aggregate the time-series data df by month, using the mean of each month.
Possible answers include:
df.resample("M").mean()
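Here is a sketch on hypothetical daily data for January and February 2020. It uses the "MS" (month start) alias. The "M" alias in the answer above labels bins by month end, and newer pandas versions prefer the alias "ME" for that:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: values 1..60 for 2020-01-01 through 2020-02-29
idx = pd.date_range("2020-01-01", "2020-02-29")
df = pd.DataFrame({"value": np.arange(1, 61)}, index=idx)

# One row per month, labeled by the first day of the month
monthly = df.resample("MS").mean()
print(monthly["value"].tolist())  # [16.0, 46.0]
```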
Argument to pass to the fillna function when you want to fill NaN with the previous value.
For a DataFrame, this fills each NaN with the value in the row above; with bfill, each NaN is filled with the value in the row below.
How do you fill NaN with the median by passing it to the fillna function?
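At the time of the exam, forward filling was written fillna(method="ffill"). Recent pandas versions deprecate the method argument in favor of the .ffill() and .bfill() methods, so this sketch uses those. The Series values are arbitrary:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

forward = s.ffill()                    # fill with the previous value
backward = s.bfill()                   # fill with the next value; the last NaN has none, so it stays NaN
filled_median = s.fillna(s.median())   # median of the non-NaN values (1.0, 3.0) is 2.0

print(forward.tolist())        # [1.0, 1.0, 3.0, 3.0]
print(filled_median.tolist())  # [1.0, 2.0, 3.0, 2.0]
```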
Create df_merge by concatenating df_1 and df_2 in the column direction.
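Concatenation uses pd.concat. The sketch assumes "column direction" means placing the columns side by side (axis=1). The default axis=0, which stacks rows, is shown for contrast. df_1 and df_2 are small hypothetical frames:

```python
import pandas as pd

df_1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_2 = pd.DataFrame({"C": [5, 6], "D": [7, 8]})

df_merge = pd.concat([df_1, df_2], axis=1)     # side by side
print(df_merge.shape, list(df_merge.columns))  # (2, 4) ['A', 'B', 'C', 'D']

stacked = pd.concat([df_1, df_2], axis=0)      # one below the other; missing cells become NaN
print(stacked.shape)                           # (4, 4)
```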
Function that returns the mode
Function that returns the median
Function that returns the standard deviation (sample standard deviation)
Function and argument that return the standard deviation (population standard deviation)
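Here is a sketch using pandas Series methods on an arbitrary dataset. Series.std defaults to the sample standard deviation (ddof=1), and ddof=0 gives the population version. NumPy's np.std, by contrast, defaults to ddof=0.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

print(s.mode().tolist())  # [4]
print(s.median())         # 4.5
print(s.std())            # sample standard deviation, sqrt(32 / 7)
print(s.std(ddof=0))      # population standard deviation, 2.0
```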
4.3. Matplotlib
By default, where does the first wedge of a pie chart start?
By default, the wedges of a pie chart are placed (clockwise / counterclockwise).
To draw a pie chart clockwise, pass the () argument to the () method.
To specify the angle at which a pie chart starts drawing, pass the () argument to the () method.
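In Matplotlib, Axes.pie starts at the positive x-axis (3 o'clock) and goes counterclockwise by default. The sketch changes both behaviors. The data and labels are made up, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# counterclock=False draws the wedges clockwise;
# startangle=90 starts the first wedge at 12 o'clock
wedges, texts = ax.pie([50, 30, 20], labels=["A", "B", "C"],
                       counterclock=False, startangle=90)
plt.close(fig)
```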
4.4. scikit-learn
Which class is used to impute (fill in) missing values?
For the strategy argument of the class above, what value does each option fill in?
mean = ①, median = ②, most_frequent = ③
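Here is a sketch with scikit-learn's SimpleImputer. Older releases had a different Imputer class, so treat the class name as version-dependent. The single-column data is arbitrary.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [np.nan], [4.0], [4.0]])

# Value used to fill the missing entry (row 1) under each strategy
results = {
    strategy: SimpleImputer(strategy=strategy).fit_transform(X)[1, 0]
    for strategy in ["mean", "median", "most_frequent"]
}
# mean -> 3.0, median -> 4.0, most_frequent -> 4.0
print(results)
```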
Which class encodes categorical variables?
Which attribute lets you check the original values after encoding?
Besides this encoding, what is another major way to process categorical variables?
What is another name for this encoding?
What do you call a matrix in which most elements are 0, and a matrix in which most elements are non-zero?
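The sketch uses scikit-learn's LabelEncoder and OneHotEncoder on a made-up list of levels. By default OneHotEncoder returns a SciPy sparse matrix, which is mostly zeros.

```python
from scipy import sparse
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

levels = ["Low", "High", "Mid", "Low"]

le = LabelEncoder()
codes = le.fit_transform(levels)
print(le.classes_.tolist())  # ['High', 'Low', 'Mid'] (original values, sorted)
print(codes.tolist())        # [1, 0, 2, 1]

ohe = OneHotEncoder()
onehot = ohe.fit_transform([[v] for v in levels])  # expects 2-D input
print(sparse.issparse(onehot))  # True
print(onehot.toarray().shape)   # (4, 3)
```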
Standardization (variance normalization) is the process of transforming features so that their mean is () and their standard deviation is ().
Which class performs standardization?
Min-max normalization is the process of transforming features so that their minimum value is () and their maximum value is ().
Which class performs min-max normalization?
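Here is a quick check with scikit-learn's StandardScaler and MinMaxScaler on an arbitrary one-feature dataset. StandardScaler divides by the population standard deviation, so the result has std() equal to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

std_X = StandardScaler().fit_transform(X)
print(std_X.mean(), std_X.std())  # approximately 0.0 and 1.0

mm_X = MinMaxScaler().fit_transform(X)
print(mm_X.ravel().tolist())      # [0.0, 0.5, 1.0]
```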
Classification is a typical task of supervised learning.
Classification uses correct labels, which are called the () variable.
Name three typical classification algorithms.
To build a classification model, the data at hand is ().
"Learning" in classification refers to building a classification model using () datasets.
What is the name for a model's ability to handle unseen data, estimated from its predictions on the test dataset?
Which function splits the data into these datasets?
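One common choice is scikit-learn's train_test_split. The toy arrays and the 30% test size are arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```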
Support vector machines are algorithms that can be used not only for classification and regression, but also for ().
For two-dimensional data belonging to two classes, what are the data points of each class closest to the boundary called?
For two-dimensional data belonging to two classes, a straight line is drawn through the () so that the distance to the support vectors is as large as possible.
The distance between this line and the support vectors is called the ().
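Here is a sketch with scikit-learn's SVC and a linear kernel on hypothetical, linearly separable 2-D points. A large C approximates a hard margin. For a linear SVM, the distance from the boundary to the support vectors is 1 / ||w||, where w is coef_.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: class 0 on the left, class 1 on the right
X = np.array([[0, 0], [0, 1], [-2, 0.5], [3, 0], [3, 1]])
y = np.array([0, 0, 0, 1, 1])

svc = SVC(kernel="linear", C=1000).fit(X, y)

print(svc.support_vectors_)              # points closest to the boundary
margin = 1 / np.linalg.norm(svc.coef_)   # boundary-to-support-vector distance
print(round(margin, 2))                  # about 1.5 for this data
print(svc.predict([[0.5, 0.5], [2.5, 0.5]]))
```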
What is the name for the data made of randomly selected samples and features (explanatory variables) that Random Forest uses?
A random forest is a collection of decision trees. What is it called when multiple learners are combined like this?
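Here is a sketch with scikit-learn's RandomForestClassifier on the bundled iris dataset. With bootstrap=True, each tree trains on rows sampled with replacement. Note that scikit-learn picks a random subset of features at each split (max_features), not once per tree.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=10, bootstrap=True,
                                max_features="sqrt", random_state=0)
forest.fit(X, y)
print(len(forest.estimators_))  # 10 decision trees in the ensemble
```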
Regression is the task of explaining the () variable using () variables, which are represented by the features.
In linear regression, the case with a single explanatory variable is called (), and the case with two or more is called ().
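Here is a sketch with scikit-learn's LinearRegression on noise-free data built from assumed formulas. The first model has one explanatory variable and the second has two, so the fitted coefficients recover the formulas.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One explanatory variable: y = 2x + 1
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * x.ravel() + 1
simple = LinearRegression().fit(x, y)
print(simple.coef_, simple.intercept_)    # approximately [2.] and 1.0

# Two explanatory variables: y = 1 + 2*x1 + 3*x2
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y2 = 1 + 2 * X[:, 0] + 3 * X[:, 1]
multiple = LinearRegression().fit(X, y2)
print(multiple.coef_, multiple.intercept_)  # approximately [2. 3.] and 1.0
```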
A task that () data while losing as little of its information as possible.
In scikit-learn, which class in which module is used for principal component analysis?
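Here is a sketch using PCA from sklearn.decomposition to reduce hypothetical random 3-feature data to 2 components. explained_variance_ratio_ shows how much of the variance each component keeps.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical data: 100 samples, 3 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                           # (100, 2)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by 2 components
```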
Four metrics that quantify how well data has been assigned to the correct categories:
() rate, () rate, () rate, () value
These metrics are calculated from the () matrix.
There is a trade-off between the () rate and the () rate.
The () curve and the () computed from it are used as metrics to quantify how good the predicted probabilities are.
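The sketch computes these metrics with sklearn.metrics on made-up labels and scores. For this data, 8 of the 9 positive/negative score pairs are ranked correctly, so the AUC is 8/9.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             roc_curve)

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # hypothetical predicted probabilities

# Rows: true 0/1, columns: predicted 0/1
print(confusion_matrix(y_true, y_pred).tolist())  # [[2, 1], [1, 2]]
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
print(accuracy_score(y_true, y_pred), f1_score(y_true, y_pred))

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.888... (8/9)
```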
Hyperparameter values are (determined / not determined) by training.
Name two typical methods for tuning hyperparameters.
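Here is a sketch of scikit-learn's GridSearchCV and RandomizedSearchCV tuning a decision tree's max_depth on the iris dataset. The parameter range, cv=5, and n_iter=3 are arbitrary choices. Grid search tries every candidate, while random search samples n_iter of them.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
params = {"max_depth": [1, 2, 3, 4, 5]}

# Grid search: evaluates all 5 candidates
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), params, cv=5)
grid.fit(X, y)
print(grid.best_params_)

# Random search: evaluates 3 randomly sampled candidates
rand = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), params,
                          n_iter=3, cv=5, random_state=0)
rand.fit(X, y)
print(rand.best_params_)
```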
These questions are rough, but I hope they help someone. If you find any mistakes, I would be grateful if you could point them out in the comments. Thank you for reading to the end.