This time we get closer to data analysis methods in earnest. This post covers data visualization for presenting results to clients.
In Japanese, a graph is also called a statistical chart.
It presents information that is hard to read from numerical values alone as charts and figures, so that it is easy to understand visually.
Line graph: a graph in which the data is plotted on a plane and the plotted points are connected with straight lines.
Use: Suited to visualizing quantities that change with time or position (distance). Example: By putting time on the horizontal axis (x-axis) and product sales volume on the vertical axis (y-axis), you can visualize the trend in sales volume.
Bar graph: a graph in which items are arranged on the horizontal axis and the value of each item is represented vertically by the length of a bar.
Use: Well suited to comparing the values of two or more items. Example: Useful when you want to visualize the number of votes cast for each agenda item.
Histogram: also called a frequency distribution chart.
The data is divided into classes, and the frequency of each class (the number of data points that fall in that class) is represented by the height of a bar.
Use: The best-suited method for visualizing the distribution of one-dimensional data (such as repeated measurements of a product's length). Example: Population by age from a census.
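The class-and-frequency calculation behind a histogram can be sketched in plain Python. The measurement values and the class width of 0.5 below are made-up illustration data, and the helper name class_lower_edge is hypothetical.

```python
from collections import Counter

# Hypothetical repeated measurements of a product's length (illustration only)
lengths = [10.1, 10.4, 9.8, 10.0, 10.6, 9.9, 10.2, 10.3, 9.7, 10.1]
class_width = 0.5  # assumed class width


def class_lower_edge(value, width):
    """Return the lower edge of the class that the value falls into."""
    return (value // width) * width


# Frequency = number of data points in each class
frequencies = Counter(class_lower_edge(v, class_width) for v in lengths)

# Text histogram: one '*' per data point in the class
for edge in sorted(frequencies):
    print(f"{edge:5.1f} - {edge + class_width:5.1f}: {'*' * frequencies[edge]}")
```

The row with the most asterisks corresponds to the tallest bar of the histogram.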
Scatter plot: a graph in which each data point is drawn as a dot, with one item of the data on the x-axis and another on the y-axis.
By also using the color or size of the dots, you can visualize a total of three items on a plane.
Use: Checking where the data is concentrated or sparse with respect to two items. Example: Maximum temperature and number of ice creams sold.
Pie chart: a graph that divides a circle into sectors, giving each item an angle at the center in proportion to its share of the whole.
Use: The best visualization method when you want to compare each item's percentage of the whole. Example: The percentage of all customers in each age group.
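How a pie chart assigns angles can be sketched with a few lines of arithmetic. The customer counts below are invented for illustration.

```python
# Hypothetical customer counts by age group (illustration only)
customers = {"20s": 30, "30s": 45, "40s": 15, "50s": 10}

total = sum(customers.values())

# Each item's angle is its share of the whole multiplied by 360 degrees
angles = {group: count / total * 360 for group, count in customers.items()}

for group, angle in angles.items():
    share = customers[group] / total
    print(f"{group}: {share:.0%} of customers -> {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is why a pie chart only makes sense for shares of a single whole.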
A computer generates random numbers based on a "seed".
numpy.random.seed() # Specifying the same seed value (an integer) each time produces the same random number sequence on every run.
Under the same conditions, the same calculation result is obtained even when random numbers are used. A seed is therefore set when the output must be reproducible, such as when debugging.
If you do not set a seed, the initial value is taken from something that differs between runs, such as the computer's clock or entropy supplied by the operating system, so a different sequence of random numbers is generated each time you run the program.
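A minimal sketch of the reproducibility described above. The seed value 0 is an arbitrary choice, and numpy.random.randn() is used here only as a convenient way to draw a few numbers.

```python
import numpy

numpy.random.seed(0)            # fix the seed
first = numpy.random.randn(3)   # draw three random numbers

numpy.random.seed(0)            # set the same seed again
second = numpy.random.randn(3)  # the same three numbers are drawn

print(first)
print(second)
print(numpy.array_equal(first, second))
```

Removing both calls to numpy.random.seed() would make the two arrays differ.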
numpy.random.randn()
A histogram of the values generated by the function above has a shape close to the graph of a function called the normal distribution.
The graph of the normal distribution is highest at the center and falls off toward both sides in a symmetric bell shape. The mean lies at the highest point in the center.
If you pass an integer to numpy.random.randn(), it returns that many random numbers drawn from the standard normal distribution (mean 0, standard deviation 1).
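The bell shape can be checked without drawing a plot by counting how many generated values fall into each class. This is a sketch under assumptions: the seed 0, the sample size 10000, and the 8 classes between -4 and 4 are arbitrary choices.

```python
import numpy

numpy.random.seed(0)  # arbitrary seed so the run is reproducible
samples = numpy.random.randn(10000)

print(samples.mean())  # close to 0
print(samples.std())   # close to 1

# Count the values in each of 8 classes of width 1 between -4 and 4
counts, edges = numpy.histogram(samples, bins=8, range=(-4, 4))
for left, count in zip(edges[:-1], counts):
    print(f"{left:+.0f} to {left + 1:+.0f}: {count}")
```

The two classes next to 0 hold the most values, and the counts fall off symmetrically toward both ends.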
numpy.random.binomial()
The function above returns the outcome of trials that either succeed or fail. For example, when you toss a coin, you always get either heads or tails, and the probability of each is 0.5. A trial like this, with only two possible outcomes, is called a Bernoulli trial.
When n independent Bernoulli trials are performed, the probability distribution of the number of times the event occurs is called the binomial distribution.
If you pass an integer n and a real number p between 0 and 1 to numpy.random.binomial(), it performs n trials with success probability p and returns the number of successes.
In other words, it draws a sample from the binomial distribution with n trials and probability p.
If you pass an integer as the third argument, the set of trials specified by the first and second arguments is repeated that many times, and an array containing that many success counts is returned.
# To get, 10,000 times over, the number of heads in 100 coin tosses,
# write the following.
import numpy
print(numpy.random.binomial(100, 0.5, 10000))
# (number of trials, probability, number of sets of trials)
# Output result (the values differ on each run)
[52 51 61 ..., 57 53 52]
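As a rough check of the coin-toss example, the average number of heads over many sets should be close to n × p = 50. The seed value 0 is an arbitrary assumption.

```python
import numpy

numpy.random.seed(0)  # arbitrary seed for reproducibility

# One Bernoulli trial: 1 (success) or 0 (failure), each with probability 0.5
single = numpy.random.binomial(1, 0.5)

# 10,000 sets of 100 tosses; each element is the number of heads in one set
heads = numpy.random.binomial(100, 0.5, 10000)

print(single)
print(heads.shape)
print(heads.mean())  # expected to be near n * p = 50
```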
numpy.random.choice(x, n)
If you pass list data x and an integer n to the function above, it returns n values selected at random from x. By default the same element can be selected more than once.
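A short sketch of random selection from a list. The list contents and the seed are made up for illustration.

```python
import numpy

numpy.random.seed(0)  # arbitrary seed for reproducibility

x = ["red", "green", "blue"]        # hypothetical list to draw from
picked = numpy.random.choice(x, 5)  # 5 draws; repeats are allowed by default

print(picked)
```

Because 5 values are drawn from a list of 3 with replacement, at least one color necessarily appears more than once.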
When dealing with time series data, we need a way to represent time.
datetime # Module that provides data types for handling dates and times
datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0)
# Returns a datetime object for the specified date and time.
# year, month, and day are required. The other arguments can be omitted and default to 0.
datetime.timedelta # A data type that represents an elapsed time or the difference between two times.
datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
# Returns a timedelta object for the specified duration.
# All arguments can be omitted and default to 0.
import datetime
td = datetime.timedelta(days=1, seconds=2, microseconds=3, milliseconds=4, minutes=5, hours=6, weeks=7)
print(td)
#Output result
50 days, 6:05:02.004003
#You can also specify a negative number.
td = datetime.timedelta(days=-1, hours=-10)
print(td)
#Output result
-2 days, 14:00:00
By taking the difference between two datetime objects, you can compare dates and times.
The result is a timedelta object. Arithmetic between timedelta objects is also possible and again yields a timedelta object.
By adding and subtracting with timedelta objects, you can easily get the number of days and hours remaining until a given date and time.
import datetime
d1 = datetime.datetime.now()
d2 = datetime.datetime(2019, 9, 20, 19, 45, 0)
td = d2 - d1
print(td)
print(type(td))
# Output result (varies with the time of execution)
243 days, 5:38:45.159115
<class 'datetime.timedelta'>
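The same operations can be tried with fixed dates so that the result does not depend on when the code is run. The start time below is a made-up value chosen for illustration.

```python
import datetime

start = datetime.datetime(2019, 1, 20, 14, 0, 0)      # fixed, made-up start time
deadline = datetime.datetime(2019, 9, 20, 19, 45, 0)  # target date and time

# datetime - datetime gives a timedelta
remaining = deadline - start
print(remaining.days)             # whole days remaining
print(remaining.seconds // 3600)  # hours beyond the whole days

# datetime + timedelta gives a datetime
one_week_later = start + datetime.timedelta(weeks=1)
print(one_week_later)

# timedelta + timedelta gives a timedelta
doubled = datetime.timedelta(hours=12) + datetime.timedelta(hours=12)
print(doubled)
```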
strptime()
# Generates and returns a datetime object from a string.
# You need to specify the format codes that correspond to the original string.
For details on the format codes, see the official Python documentation:
datetime: Basic date and time types, section "strftime() and strptime() Behavior"
import datetime
s = "2017-12-20 10:00:00"
str_dt = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
print(str_dt)
print(type(str_dt))
#Output result
2017-12-20 10:00:00
<class 'datetime.datetime'>
import datetime as dt
# Assign a string representing October 22, 1992, in "year-month-day" form, to the variable s
s = "1992-10-22"
# The year, month, and day are separated by hyphens
# Convert s to a datetime object representing October 22, 1992 and assign it to the variable x
x = dt.datetime.strptime(s, "%Y-%m-%d")
# strptime converts the string to a datetime
# Output
print(x)
1992-10-22 00:00:00
To do arithmetic on values read from a file or a similar source, the data must be of type int or float. A string that consists only of digits can be converted to a numeric type with int() or float().
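A small sketch of why the conversion matters. The comma-separated line below stands in for data read from a file and is invented for illustration.

```python
# Hypothetical line read from a file: every field arrives as a string
line = "12,3.5,40"
fields = line.split(",")

# Without conversion, + joins the strings instead of adding numbers
print(fields[0] + fields[2])

count = int(fields[0])     # string of digits -> int
price = float(fields[1])   # string with a decimal point -> float
extra = int(fields[2])

total = count * price + extra
print(total)
```

Note that int() raises a ValueError for a string such as "3.5", so use float() when the value may contain a decimal point.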
numpy.arange()
# Use when you want evenly spaced numbers, for example to go through list elements in order or to get an even sequence (0, 2, 4, ...)
numpy.arange(start, stop, step)
# Returns the numbers from start up to, but not including, stop, at intervals of step.
import numpy as np
np.arange(0, 5, 2) # Even numbers from 0 to 4
np.arange(0, 4, 2) # Note that this gives only the even numbers from 0 to 2, because the stop value 4 is excluded.
numpy.linspace() # Use when you want to divide a specified range into a specified number of points
numpy.linspace(start, stop, num)
# Returns num evenly spaced points from start to stop, including both ends.
np.linspace(0, 15, 4)
# Returns the 4 points 0, 5, 10, 15 that divide the range from 0 to 15 at equal intervals
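The two functions can be compared side by side in a short runnable sketch. The list letters is a hypothetical example used only to show index selection.

```python
import numpy as np

evens = np.arange(0, 5, 2)      # start 0, stop 5 (excluded), step 2
points = np.linspace(0, 15, 4)  # 4 evenly spaced points, both ends included

print(evens)
print(points)

# Use the even indices to pick every other element of a list
letters = ["a", "b", "c", "d", "e"]  # hypothetical list
print([letters[i] for i in evens])
```

With arange you choose the interval and the number of values follows. With linspace you choose the number of values and the interval follows.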