Predicting distributions of continuous values other than the normal distribution with ordinary PyTorch or TensorFlow

Introduction

The other day, while looking at the implementation of muzero-general, I came across an interesting technique for predicting continuous values categorically.

Thinking about it a little more, it seemed that the same idea could express a prediction of an arbitrary probability distribution over continuous values. I tried it out, and it appears to work, so I'm leaving a note here.

Predicting continuous values categorically

If you use MeanSquaredError to predict a continuous value, you are essentially assuming that the values you want to predict follow a normal distribution (and that you only care about the expected value). So if the distribution is bimodal, for example, the prediction may land awkwardly between the two peaks, and you also learn nothing about how spread out the distribution is (its variance and so on). However, if the range of values you care about is fixed, you can do the following.

For example, if the range of values is 0 to 10, decide on 11 support points v = [0, 1, 2, ..., 10] and start with p[0..10] = 0. The value 3.7 is then expressed as p[3] = 0.3, p[4] = 0.7; the value 0.1 becomes p[0] = 0.9, p[1] = 0.1; and the value 3 becomes p[3] = 1.0. In short, each value's "degree of membership" is split between the two support points surrounding it, in proportion to how close it is to each.

Conversely, to recover the original value from this p, compute the expected value sum(p * v).

Written in code, it looks like this.

import numpy as np

SUPPORT_SIZE = 11
VALUE_RANGE = [0., 10.]


def scalar_to_support(scalars):
  """Encode scalars as probability vectors over SUPPORT_SIZE evenly spaced support points."""
  values = np.array(scalars)
  min_v, max_v = VALUE_RANGE
  # Normalize to [0, 1]; values outside the range are clipped to the ends.
  values = np.clip((values - min_v) / (max_v - min_v), 0., 1.)
  key_values = np.linspace(0., 1., SUPPORT_SIZE)

  # Indices of the support points on either side of each value.
  r_index = np.searchsorted(key_values, values, side="left")  # a[i-1] < x <= a[i]
  l_index = np.clip(r_index - 1, 0, len(key_values) - 1)
  left_vs = key_values[l_index]
  right_vs = key_values[r_index]

  # Split the weight between the two neighbors in proportion to closeness.
  interval = key_values[1] - key_values[0]
  left_ps = 1 - (values - left_vs) / interval
  right_ps = 1 - (right_vs - values) / interval

  vectors = np.zeros((len(values), SUPPORT_SIZE))
  for i in range(len(scalars)):
    vectors[i, l_index[i]] = left_ps[i]
    vectors[i, r_index[i]] = right_ps[i]

  return vectors


def support_to_scalar(supports):
  """Decode probability vectors back to scalars via the expected value over the support."""
  min_v, max_v = VALUE_RANGE
  key_values = np.linspace(min_v, max_v, SUPPORT_SIZE)
  supports = supports / supports.sum(axis=1, keepdims=True)  # normalize without mutating the input
  return np.sum(supports * key_values, axis=1)
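
As a quick check, the example values from earlier round-trip through these two functions as expected (the printed numbers should roughly match):

# quick sanity check with SUPPORT_SIZE = 11, VALUE_RANGE = [0, 10]
support = scalar_to_support([3.7, 0.1, 3.0])
print(np.round(support[0], 2))      # roughly 0.3 at index 3 and 0.7 at index 4
print(support_to_scalar(support))   # roughly [3.7, 0.1, 3.0]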

For training, make the model output a softmax over the support (11 elements in this example) and use cross entropy as the loss function.
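
Since the targets here are soft probability vectors rather than hard class indices, the cross entropy can be written out directly. A minimal sketch (soft_cross_entropy is just an illustrative name, and the small eps is a numerical-stability tweak of mine; the training loop later inlines the same expression without it):

import torch

def soft_cross_entropy(pred_probs, target_probs, eps=1e-8):
  # pred_probs: softmax outputs, target_probs: vectors from scalar_to_support
  return torch.mean(-target_probs * torch.log(pred_probs + eps))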

Verifying that the probability distribution can be learned

Let's verify it using PyTorch, which I've only just learned.

Prediction of constants

First, let's check whether the network can learn to always predict a single fixed value.

code
# on jupyter notebook
import numpy as np
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

SUPPORT_SIZE = 101  # a finer support: 101 points over the range 0 to 10


class Net(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.fc = torch.nn.Linear(1, SUPPORT_SIZE)

  def forward(self, x):
    x = self.fc(x)
    x = F.softmax(x, dim=1)  # output a probability vector over the support
    return x


def get_dummy_input(batch_size):
  # The input is irrelevant here; we only care about the output distribution.
  dummy_inputs = np.random.random((batch_size, 1)).astype("float32")
  return torch.tensor(dummy_inputs)  # dummy


def constant_target_value_fn(const):
  def fn(batch_size):
    return [const] * batch_size
  return fn


def train_model(model, target_value_fn, epoch=1000, lr=0.01, batch_size=16):
  optimizer = torch.optim.Adam(model.parameters(), lr=lr)

  loss_history = []
  for ep in range(epoch):
    target_values = target_value_fn(batch_size)
    # match the model's float32 outputs
    target_supports = torch.tensor(scalar_to_support(target_values), dtype=torch.float32)

    optimizer.zero_grad()
    outputs = model(get_dummy_input(batch_size))
    # cross entropy against the soft target vectors
    losses = torch.mean(- target_supports * torch.log(outputs))
    losses.backward()
    optimizer.step()
    loss_history.append(losses.item())

  plt.plot(loss_history)
  plt.show()

#######################

model = Net()
train_model(model, constant_target_value_fn(4.8))
outputs = model(get_dummy_input(5)).detach().numpy()
print(f"Expected value={np.mean(support_to_scalar(outputs))}")

vs = np.mean(outputs, axis=0)
plt.plot(vs)

Here the target range is 0 to 10, the output vector has 101 elements, and the model is trained to predict the constant 4.8. The prediction result looks like this; no problems so far.

Expected value = 4.840604254500373

(figure: predicted distribution over the support, peaked around 4.8)

Prediction of normal distribution

Next, let's train on samples from a normal distribution (loc = 4.8, scale = 0.9).

code
from scipy import stats

def norm_fn(loc, scale):
  def fn(batch_size):
    # sample batch_size target values from a normal distribution
    dist = stats.norm(loc=loc, scale=scale)
    return dist.rvs(batch_size)
  return fn

model = Net()
distribution_fn = norm_fn(4.8, 0.9)
train_model(model, distribution_fn, epoch=1000, batch_size=1024)
outputs = model(get_dummy_input(5)).detach().numpy()
print(f"Expected value={np.mean(support_to_scalar(outputs))}")

vs = np.mean(outputs, axis=0)
plt.plot(vs)

(figure: predicted distribution over the support, approximating the normal density)

Looks good.
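
As a rough check, you can also overlay the true density on the learned distribution; here is a sketch that scales the scipy pdf by the support spacing of 0.1 so the two are comparable:

# compare the learned support with the true density (scaled by the bin width 0.1)
key_values = np.linspace(0., 10., SUPPORT_SIZE)
true_ps = stats.norm(loc=4.8, scale=0.9).pdf(key_values) * (10. / (SUPPORT_SIZE - 1))
plt.plot(key_values, vs, label="learned")
plt.plot(key_values, true_ps, label="true (scaled pdf)")
plt.legend()
plt.show()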

Prediction of mixed normal distribution

Next, let's train on a mixture of two normal distributions: Normal(loc = 1.8, scale = 0.3) with weight 0.6 and Normal(loc = 6.8, scale = 0.9) with weight 0.4.

code
from collections import Counter

def multi_norm_fn(loc_scale_prob_list):
  def fn(batch_size):
    # draw batch_size samples from each component distribution
    values = []
    for loc, scale, prob in loc_scale_prob_list:
      dist = stats.norm(loc=loc, scale=scale)
      values.append(dist.rvs(batch_size))
    # choose which component each sample comes from, according to the mixture weights
    ps = np.array([p for _, _, p in loc_scale_prob_list])
    ps = ps / np.sum(ps)
    count = Counter(np.random.choice(range(len(values)), size=batch_size, p=ps))
    ret = []
    for i, cnt in count.items():
      ret += list(values[i][:cnt])
    return ret
  return fn


model = Net()
distribution_fn = multi_norm_fn([
  [1.8, 0.3, 0.6],
  [6.8, 0.9, 0.4],
])
train_model(model, distribution_fn, epoch=1000, batch_size=1024)
outputs = model(get_dummy_input(5)).detach().numpy()
print(f"Expected value={np.mean(support_to_scalar(outputs))}")

vs = np.mean(outputs, axis=0)
plt.plot(vs)

(figure: predicted distribution over the support, showing two peaks)

The two peaks come out properly, and the width of each peak's base is expressed as well.

Exponential distribution prediction

Finally, let's try an exponential distribution (loc = 1.4, scale = 2.0).

code
def exp_fn(loc, scale):
  def fn(batch_size):
    # sample batch_size values from an exponential distribution shifted by loc
    dist = stats.expon(loc, scale)
    return dist.rvs(batch_size)
  return fn

model = Net()
distribution_fn = exp_fn(1.4, 2.0)
train_model(model, distribution_fn, epoch=1000, batch_size=1024)
outputs = model(get_dummy_input(5)).detach().numpy()
print(f"Expected value={np.mean(support_to_scalar(outputs))}")

vs = np.mean(outputs, axis=0)
plt.plot(vs)

(figure: predicted distribution over the support, rising sharply around 1.4 and decaying to the right)

The prediction shows the peak rising from around 1.4 and then gradually decaying. It also shows that some values fall above 10 (these get clipped into the last bin of the support).

Digression

This approach also seems well suited to predicting continuous values that wrap around, such as angles: with a little ingenuity, 10 degrees and 350 degrees can be treated as genuinely close to each other.
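
One way to realize that "little ingenuity" (just a sketch of my own idea, with a hypothetical angle_to_support helper and 36 bins of 10 degrees each) is to make the support circular, so that weight past the last bin wraps back to the first:

ANGLE_SUPPORT_SIZE = 36  # hypothetical: one bin every 10 degrees

def angle_to_support(angles_deg):
  """Encode angles (degrees) over a circular support, wrapping 360 back to 0."""
  angles = np.asarray(angles_deg, dtype=float) % 360.
  interval = 360. / ANGLE_SUPPORT_SIZE
  l_index = np.floor(angles / interval).astype(int) % ANGLE_SUPPORT_SIZE
  r_index = (l_index + 1) % ANGLE_SUPPORT_SIZE   # the right neighbor wraps around
  right_ps = (angles - l_index * interval) / interval
  vectors = np.zeros((len(angles), ANGLE_SUPPORT_SIZE))
  for i in range(len(angles)):
    vectors[i, l_index[i]] += 1. - right_ps[i]
    vectors[i, r_index[i]] += right_ps[i]
  return vectors

Decoding would then need a circular mean (for example via the sine and cosine of each bin's angle) rather than the plain expected value.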

In closing

So, I was able to confirm that continuous values can be predicted just fine with a categorical output and a cross-entropy loss. This kind of technique may well be covered in recent textbooks, but it was new to me, so I made a note of it.
