This notebook shows how to get and set the value of a hyperparameter in a scikit-learn estimator. Recall that hyperparameters are the parameters that control the learning process.

They should not be confused with the fitted parameters that result from training. In scikit-learn, fitted parameters are recognizable by their trailing underscore, for instance model.coef_.
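To make the distinction concrete, here is a minimal sketch on a tiny synthetic dataset (the data values are illustrative, not from this notebook): the hyperparameter C exists before training, while coef_ only appears after fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# C is a hyperparameter: chosen by the user, available before training.
model = LogisticRegression(C=0.5)
print(model.get_params()["C"])

# A tiny illustrative dataset.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# coef_ is a fitted parameter: it only exists after calling fit,
# hence the trailing underscore in its name.
model.fit(X, y)
print(model.coef_)
```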

Preparation

import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
adult_census = pd.read_csv("../../scikit-learn-mooc/datasets/adult-census.csv")
adult_census = adult_census.drop(columns="education-num")
target_column = "class"
target = adult_census[target_column]
data = adult_census.drop(columns=target_column)
numerical_columns = selector(dtype_exclude=object)(data)
data_numerical = data[numerical_columns]
data_numerical.head()
age capital-gain capital-loss hours-per-week
0 25 0 0 40
1 38 0 0 50
2 28 0 0 40
3 44 7688 0 40
4 18 0 0 30

Simple predictive model: scaler + logistic regression

Default values

model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", LogisticRegression())
])
cv_results = cross_validate(model, data_numerical, target)

scores = cv_results["test_score"]
fit_time = cv_results["fit_time"]
print("The accuracy is "
      f"{scores.mean():.3f} +/- {scores.std():.3f}, for {fit_time.mean():.3f} seconds")
The accuracy is 0.800 +/- 0.003, for 0.072 seconds

Listing all the parameters of the pipeline

for parameter in model.get_params():
    print(parameter)
memory
steps
verbose
preprocessor
classifier
preprocessor__copy
preprocessor__with_mean
preprocessor__with_std
classifier__C
classifier__class_weight
classifier__dual
classifier__fit_intercept
classifier__intercept_scaling
classifier__l1_ratio
classifier__max_iter
classifier__multi_class
classifier__n_jobs
classifier__penalty
classifier__random_state
classifier__solver
classifier__tol
classifier__verbose
classifier__warm_start
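The listing above illustrates scikit-learn's naming convention for nested parameters: a parameter of a pipeline step is addressed as `<step_name>__<parameter_name>`, with a double underscore. This works for both get_params and set_params, as the following sketch shows:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", LogisticRegression())
])

# Read a nested parameter: <step_name>__<parameter_name>
print(model.get_params()["classifier__C"])

# Set a nested parameter on another step with the same convention.
model.set_params(preprocessor__with_mean=False)
print(model.get_params()["preprocessor__with_mean"])
```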

Change one parameter

From the LogisticRegression docstring:

C : float, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

model.set_params(classifier__C=1e-3)
Pipeline(steps=[('preprocessor', StandardScaler()),
                ('classifier', LogisticRegression(C=0.001))])
cv_results = cross_validate(model, data_numerical, target)

scores = cv_results["test_score"]
fit_time = cv_results["fit_time"]
print("The accuracy is "
      f"{scores.mean():.3f} +/- {scores.std():.3f}, for {fit_time.mean():.3f} seconds")
The accuracy is 0.787 +/- 0.002, for 0.066 seconds
model.get_params()['classifier__C']
0.001

Searching for a good value

for C in [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]:
    model.set_params(classifier__C=C)
    cv_results = cross_validate(model, data_numerical, target)
    scores = cv_results["test_score"]
    fit_time = cv_results["fit_time"]
    print(f"The accuracy via cross-validation with C={C} is "
          f"{scores.mean():.3f} +/- {scores.std():.3f}, for {fit_time.mean():.3f} seconds")
The accuracy via cross-validation with C=0.0001 is 0.766 +/- 0.001, for 0.057 seconds
The accuracy via cross-validation with C=0.001 is 0.787 +/- 0.002, for 0.061 seconds
The accuracy via cross-validation with C=0.01 is 0.799 +/- 0.003, for 0.068 seconds
The accuracy via cross-validation with C=0.1 is 0.800 +/- 0.003, for 0.072 seconds
The accuracy via cross-validation with C=1 is 0.800 +/- 0.003, for 0.068 seconds
The accuracy via cross-validation with C=10 is 0.800 +/- 0.003, for 0.069 seconds
The accuracy via cross-validation with C=100 is 0.800 +/- 0.003, for 0.067 seconds
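Manual loops like the one above can be automated with scikit-learn's GridSearchCV, which cross-validates every candidate value and keeps the best one. A minimal sketch, on synthetic data rather than the census dataset used in this notebook:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (the real notebook uses the adult census dataset).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", LogisticRegression())
])

# Candidate values for the nested parameter, using the same
# <step_name>__<parameter_name> convention as set_params.
param_grid = {"classifier__C": [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]}
search = GridSearchCV(model, param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, search itself behaves as a model refit with the best parameters, so it can be used directly for prediction.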