Hyperopt is a Python library for optimizing awkward search spaces with real-valued, discrete, and conditional dimensions. This page is a translation of the hyperopt wiki's FMin tutorial ([wiki: FMin rev: a663e](https://github.com/hyperopt/hyperopt/wiki/FMin/a663e64546eb5cd3ed462618dcc1e41863ad8688)).
This page is a tutorial on basic usage of `hyperopt.fmin()`. It covers how to write an objective function that fmin can optimize, and how to describe a search space that fmin can search.
Hyperopt's job is to find the best value of a scalar-valued, possibly-stochastic function over a set of possible arguments to that function. Whereas many optimization packages assume that these inputs are drawn from a vector space, hyperopt encourages you to describe your search space in more detail. By providing more information about where your function is defined, and where you think the best values are, you allow the algorithms in hyperopt to search more efficiently.
The way to use hyperopt is to describe:

* the objective function to minimize
* the space over which to search
* the database in which to store all the point evaluations of the search
* the search algorithm to use
This (most basic) tutorial will walk through how to write functions and search spaces, using the default Trials database and the dummy random search algorithm. Section (1) is about the different calling conventions for communication between an objective function and hyperopt. Section (2) is about describing search spaces.
You can do a parallel search by replacing the `Trials` database with a `MongoTrials` one; there is another wiki page on using mongodb for parallel search.
Choosing the search algorithm is as simple as passing `algo=hyperopt.tpe.suggest` instead of `algo=hyperopt.random.suggest`. The search algorithms are actually callable objects whose constructors accept configuration arguments, but that's about all there is to say about the mechanics of choosing a search algorithm.
1. Defining a Function to Minimize

Hyperopt provides a few levels of increasing flexibility and complexity when it comes to specifying an objective function to minimize. The questions to think about as a designer are:

* Do you want to save additional information beyond the function return value, such as other statistics and diagnostic information collected during the computation of the objective?
* Do you want to use optimization algorithms that require more than the function value?
* Do you want to communicate between parallel processes? (e.g. other workers, or the minimization algorithm)
The next few sections look at various ways of implementing an objective function that minimizes a quadratic objective function over a single variable. In each section, we will be searching over a bounded range from -10 to +10, which we can describe with a *search space*:
space = hp.uniform('x', -10, 10)
Section 2 below covers how to specify search spaces that are more complicated.
1.1 The Simplest Case

The simplest protocol for communication between hyperopt's optimization algorithms and your objective function is that your objective function receives a valid point from the search space, and returns the floating-point *loss* (aka negative utility) associated with that point.
from hyperopt import fmin, tpe, hp

best = fmin(fn=lambda x: x ** 2,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100)
print(best)
This protocol has the advantage of being extremely readable and quick to type. As you can see, it's almost a one-liner. Its disadvantages are (1) that this kind of function cannot return extra information about each evaluation into the trials database, and (2) that this kind of function cannot interact with the search algorithm or other concurrent function evaluations. The next examples show why you might want to do these things.
1.2 Attachments via the Trials Object

If your objective function is complicated and takes a long time to run, you will almost certainly want to save more statistics and diagnostic information than just the one floating-point loss that comes out at the end. For such cases, the fmin function can handle dictionary return values. That means your loss function can return a nested dictionary with all the statistics and diagnostics you want. The reality is a little less flexible than that though: for example, when using mongodb, the dictionary must be a valid JSON document. Still, there is lots of flexibility to store domain-specific auxiliary results.
When the objective function returns a dictionary, the fmin function looks for some special key-value pairs in the return value, which it passes along to the optimization algorithm. There are two mandatory key-value pairs:
* `status` - one of the keys from `hyperopt.STATUS_STRINGS`, such as 'ok' for successful completion and 'fail' in cases where the function turned out to be undefined.
* `loss` - the float-valued function value that you are trying to minimize; if the status is 'ok' then this has to be present.

The fmin function also responds to some optional keys:
* `loss_variance` - float - the uncertainty of a stochastic objective function
* `true_loss` - float - when doing hyperparameter optimization, if you store the generalization error of your model under this name, you can sometimes get clearer output from the built-in plotting routines.
* `true_loss_variance` - float - the uncertainty of the generalization error

Because the dictionary is meant to go into a variety of back-end storage mechanisms, you should make sure that it is JSON-compatible. As long as it is a tree-structured graph of dictionaries, lists, tuples, numbers, strings, and date-times, you'll be fine.
**Hint:** To store numpy arrays, consider serializing them to strings and storing them as attachments.
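As a minimal sketch of that hint, here is a serialize-and-restore round trip using the standard library's pickle (a plain nested list stands in for the array here, but a numpy array pickles the same way; the attachment store just needs a string/bytes value):

```python
import pickle

# Stand-in for a numpy array; np.ndarray objects pickle the same way.
data = [[0.1, 0.2], [0.3, 0.4]]

# Serialize to a byte string so it can be stored as an attachment ...
blob = pickle.dumps(data)

# ... and recover it later when analyzing the trials.
restored = pickle.loads(blob)
```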
Writing the above function in this dictionary-returning style would look like this:
import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(x):
    return {'loss': x ** 2, 'status': STATUS_OK}

best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100)
print(best)
To really see the purpose of returning a dictionary, let's modify the objective function to return some more things, and pass an explicit `trials` argument to `fmin`:
import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
    }

trials = Trials()
best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)
print(best)
In this case the call to `fmin` proceeds as before, but by passing in a trials object directly, we can inspect all of the return values that were calculated during the experiment. So, for example:
* `trials.trials` - a list of dictionaries representing everything about the search
* `trials.results` - a list of dictionaries returned by 'objective' during the search
* `trials.losses()` - a list of losses (float for each 'ok' trial)
* `trials.statuses()` - a list of status strings

You can save this trials object, pass it to the built-in plotting routines, or analyze it with your own custom code.
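To make the shape of `trials.results` and `trials.losses()` concrete, here is a hedged pure-Python sketch (no hyperopt required) of pulling the best 'ok' trial out of a list of result dictionaries like the ones 'objective' returns; the sample data is invented for illustration:

```python
# Invented example results, shaped like the dictionaries 'objective' returns.
results = [
    {'loss': 4.0, 'status': 'ok'},
    {'loss': None, 'status': 'fail'},   # a failed evaluation carries no loss
    {'loss': 1.5, 'status': 'ok'},
]

# trials.losses() reports one loss per trial; only 'ok' trials have a real one.
losses = [r['loss'] for r in results if r['status'] == 'ok']

best_loss = min(losses)
print(best_loss)  # 1.5
```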
*Attachments* are handled by a special mechanism that makes it possible to use the same code for both `Trials` and `MongoTrials`.
You can retrieve a trial attachment like this, which retrieves the 'time_module' attachment of the 5th trial:
msg = trials.trial_attachments(trials.trials[5])['time_module']
time_module = pickle.loads(msg)
Attachments can be large strings, so when using MongoTrials you do not download more than you need. Strings can also be attached globally to the entire trials object via `trials.attachments`, which behaves like a string-to-string dictionary.
**N.B.** Currently, the per-trial attachments to a Trials object are placed into the same global trials attachment dictionary, but this may change in the future, and it does not apply to MongoTrials.
1.3 The Ctrl Object for Realtime Communication with MongoDB

It is possible for `fmin()` to give your objective function a handle to the mongodb used by a parallel experiment. This mechanism makes it possible to update the database with partial results, and to communicate with other concurrent processes that are evaluating different points. Your objective function can even add new search points, just like `random.suggest` does.
The basic technique involves:

* Using the `fmin_pass_expr_memo_ctrl` decorator
* Using `pyll.rec_eval` within your own function to build the search space point from `expr` and `memo`
* Using `ctrl`, an instance of `hyperopt.Ctrl`, to communicate with the other trial objects

I won't cover it in this short tutorial, but I do want to draw attention to what is possible with the current codebase. The hyperopt sources, unit tests, and example projects such as hyperopt-convnet are also good references. Email me or file a github issue if you'd like help getting up to speed with this part of the code.
2. Defining a Search Space

A search space consists of nested function expressions, including stochastic expressions. The stochastic expressions are the hyperparameters. Sampling from this nested stochastic program defines the random search algorithm. The hyperparameter optimization algorithms work by replacing normal 'sampling' logic with adaptive exploration strategies; they make no attempt to actually sample from the distributions specified in the search space.
It's best to think of a search space as a stochastic argument-sampling program. For example:
from hyperopt import hp

space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])
The result of running this code fragment is a variable `space` that refers to a graph of expression identifiers and their arguments. Nothing has actually been sampled; it is just a graph describing *how* to sample a point. The code for dealing with this sort of expression graph lives in `hyperopt.pyll`, and we will refer to these graphs as *pyll graphs* or *pyll programs*.
If you like, you can evaluate a search space by sampling from it:
import hyperopt.pyll.stochastic
print(hyperopt.pyll.stochastic.sample(space))
This search space, described by `space`, has three parameters:

* 'a' - selects the case
* 'c1' - a positive-valued parameter used in 'case 1'
* 'c2' - a bounded real-valued parameter used in 'case 2'
One thing to notice here is that every optimizable stochastic expression has a *label* as its first argument. These labels are used to return parameter choices to the caller, and internally in various ways as well.
Another thing to notice is the use of tuples in the middle of the graph (around each of 'case 1' and 'case 2'). Lists, dictionaries, and tuples are all upgraded to 'deterministic function expressions' so that they can be part of the search space stochastic program.
A third thing to notice is the numeric expression `1 + hp.lognormal('c1', 0, 1)`, which is embedded in the description of the search space. As far as the optimization algorithms are concerned, there is no difference between adding the 1 directly in the search space and adding the 1 within the logic of the objective function itself. As the designer, you can choose where to put this sort of processing to achieve the kind of modularity you want. Note that the results of intermediate expressions within the search space can be arbitrary Python objects, even when optimizing in parallel using mongodb. It is also easy to add new types of non-stochastic expressions to a search space description (see Section 2.3 below).
Fourth, 'c1' and 'c2' are examples of what we call *conditional parameters*. Each of 'c1' and 'c2' only figures in the returned sample for particular values of 'a'. If 'a' is 0, then 'c1' is used but not 'c2'; if 'a' is 1, then 'c2' is used but not 'c1'. Whenever it makes sense, you should encode parameters as conditional in this way, rather than simply ignoring them in the objective function. If you expose the fact that 'c1' sometimes has no effect on the objective function (because it has no effect on the objective function's argument), the search can be more efficient.
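The conditional structure above can be mimicked in plain Python, as a sketch of what random sampling from `space` does (this is not hyperopt's actual sampler, just an illustration of which parameters get drawn in which case):

```python
import random

def sample_space(rng):
    """Draw one point from the 'a'/'c1'/'c2' space by hand."""
    a = rng.randrange(2)  # hp.choice over two options
    if a == 0:
        # 'c1' is only sampled in case 1; 'c2' never appears here.
        return ('case 1', 1 + rng.lognormvariate(0, 1))
    else:
        # 'c2' is only sampled in case 2; 'c1' never appears here.
        return ('case 2', rng.uniform(-10, 10))

point = sample_space(random.Random(0))
print(point)
```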
2.1 Parameter Expressions

The stochastic expressions currently recognized by hyperopt's optimization algorithms are:
hp.choice(label, options)
Returns one of the options, which should be a list or tuple. The elements of `options` can themselves be nested stochastic expressions. In this case, stochastic choices that appear in only some of the options become *conditional* parameters.
hp.randint(label, upper)
Returns a random integer in the range [0, upper). The semantics of this distribution is that there is no more correlation in the loss function between nearby integer values than between distant ones. This is, for example, an appropriate distribution for describing random seeds. If the loss function is probably more correlated for nearby integer values, you should instead use one of the "quantized" continuous distributions, such as `quniform`, `qloguniform`, `qnormal` or `qlognormal`.
hp.uniform(label, low, high)
Returns a value uniformly between `low` and `high`. When optimizing, this variable is constrained to a two-sided interval.
hp.quniform(label, low, high, q)
Returns a value like `round(uniform(low, high) / q) * q`. Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below.
hp.loguniform(label, low, high)
Returns a value drawn according to `exp(uniform(low, high))`, so that the logarithm of the return value is uniformly distributed. When optimizing, this variable is constrained to the interval [exp(low), exp(high)].
hp.qloguniform(label, low, high, q)
Returns a value like `round(exp(uniform(low, high)) / q) * q`. Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
hp.normal(label, mu, sigma)
Returns a real value that is normally distributed with mean mu and standard deviation sigma. When optimizing, this is an unconstrained variable.
hp.qnormal(label, mu, sigma, q)
Returns a value like `round(normal(mu, sigma) / q) * q`. Suitable for a discrete variable that probably takes a value around mu, but is fundamentally unbounded.
hp.lognormal(label, mu, sigma)
Returns a value drawn according to `exp(normal(mu, sigma))`, so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to positive values.
hp.qlognormal(label, mu, sigma, q)
Returns a value like `round(exp(normal(mu, sigma)) / q) * q`. Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, and which is bounded from one side.
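The `round(...) / q * q` and `exp(...)` formulas above are easy to check in plain Python. This is a sketch of the distributions' semantics only, not hyperopt's sampler; the bounds `low`, `high` and quantum `q` are chosen arbitrarily for the demonstration:

```python
import math
import random

rng = random.Random(0)
low, high, q = math.log(1), math.log(100), 2.0

# loguniform: exp of a uniform draw, so it lands in [exp(low), exp(high)].
x = math.exp(rng.uniform(low, high))

# qloguniform: the same draw, quantized to a multiple of q.
xq = round(x / q) * q

print(x, xq)
```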
2.2 A Search Space Example: scikit-learn
To see all of these possibilities in action, let's look at how one might describe the space of hyperparameters of classification algorithms in scikit-learn. (This idea is being developed in hyperopt-sklearn.)
from hyperopt import hp

space = hp.choice('classifier_type', [
    {
        'type': 'naive_bayes',
    },
    {
        'type': 'svm',
        'C': hp.lognormal('svm_C', 0, 1),
        'kernel': hp.choice('svm_kernel', [
            {'ktype': 'linear'},
            {'ktype': 'RBF', 'width': hp.lognormal('svm_rbf_width', 0, 1)},
        ]),
    },
    {
        'type': 'dtree',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth',
            [None, hp.qlognormal('dtree_max_depth_int', 3, 1, 1)]),
        'min_samples_split': hp.qlognormal('dtree_min_samples_split', 2, 1, 1),
    },
])
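A point sampled from this space is a dictionary with a 'type' key, so an objective function typically dispatches on it. Here is a hedged sketch of that dispatch; the returned strings and the sample dictionary are invented for illustration, and a real objective would construct and score the corresponding scikit-learn estimator instead:

```python
def objective(args):
    """Dispatch on the sampled classifier configuration."""
    if args['type'] == 'naive_bayes':
        return 'NB with defaults'
    elif args['type'] == 'svm':
        return 'SVM with C=%.3f, kernel=%s' % (args['C'], args['kernel']['ktype'])
    elif args['type'] == 'dtree':
        return 'tree with criterion=%s' % args['criterion']

# One possible point sampled from the space above (values invented):
sample = {'type': 'svm', 'C': 1.234, 'kernel': {'ktype': 'linear'}}
print(objective(sample))  # SVM with C=1.234, kernel=linear
```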
2.3 Adding Non-Stochastic Expressions with pyll

You can use such nodes as arguments to pyll functions (see pyll). File a github issue if you want to know more about this.

In a nutshell, you just have to decorate a top-level (i.e. pickle-friendly) function so that it can be used via the `scope` object.
import hyperopt.pyll
from hyperopt.pyll import scope

@scope.define
def foo(a, b=0):
    print('running foo', a, b)
    return a + b / 2

# -- this will print 0, foo is called as usual.
print(foo(0))

# -- In describing search spaces you can use `foo` as you would
#    in normal Python. These two calls don't actually call foo;
#    they just record that foo should be called in order to
#    evaluate the graph.
space1 = scope.foo(hp.uniform('a', 0, 10))
space2 = scope.foo(hp.uniform('a', 0, 10), hp.normal('b', 0, 1))

# -- this will print a pyll.Apply node
print(space1)

# -- this will draw a sample by running foo()
print(hyperopt.pyll.stochastic.sample(space1))
2.4 Adding New Kinds of Hyperparameter

Adding new kinds of stochastic expression for describing parameter search spaces should be avoided if possible. In order for all search algorithms to work on all spaces, the search algorithms must agree on the kinds of hyperparameter that describe the space. As the maintainer of the library, I am open to the possibility that some kinds of expression should be added from time to time, but as I said, I would like to avoid it as much as possible. Adding new kinds of stochastic expression is not one of the ways hyperopt is meant to be extensible.
Copyright (c) 2013, James Bergstra All rights reserved.