.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/use_cases/plot_employee_salaries.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_use_cases_plot_employee_salaries.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_use_cases_plot_employee_salaries.py:

.. _example_use_case_employee_salaries:

==============================================
Simplified and structured experiment reporting
==============================================

This example shows how to leverage `skore` to structure experiment information
and get insights from machine learning experiments.

.. GENERATED FROM PYTHON SOURCE LINES 13-15

Loading a non-trivial dataset
=============================

.. GENERATED FROM PYTHON SOURCE LINES 17-19

We use a skrub dataset that contains information about employees and their
salaries. We will see that this dataset is non-trivial.

.. GENERATED FROM PYTHON SOURCE LINES 21-26

.. code-block:: Python

    from skrub.datasets import fetch_employee_salaries

    datasets = fetch_employee_salaries()
    df, y = datasets.X, datasets.y

.. GENERATED FROM PYTHON SOURCE LINES 27-29

Let's first have a condensed summary of the input data using a
:class:`skrub.TableReport`.

.. GENERATED FROM PYTHON SOURCE LINES 31-35

.. code-block:: Python

    from skrub import TableReport

    TableReport(df)


.. GENERATED FROM PYTHON SOURCE LINES 36-60

From the table report, we can make the following observations:

* Looking at the *Table* tab, we observe that the year related to the
  ``date_first_hired`` column is also present in the ``date`` column. Hence, we
  should be careful not to create the same feature twice during feature
  engineering.

* Looking at the *Stats* tab:

  - The type of data is heterogeneous: we mainly have categorical and
    date-related features.
  - The ``division`` and ``employee_position_title`` features contain a large
    number of categories. This is something we should take into account in our
    feature engineering.

* Looking at the *Associations* tab, we observe that two features hold exactly
  the same information: ``department`` and ``department_name``. Hence, during
  our feature engineering, we could potentially drop one of them if the final
  predictive model is sensitive to collinearity (a quick check is sketched
  after the target report below).

Regarding the target, and thus the task that we want to solve, we are
interested in predicting the salary of an employee given the features above.
We therefore have a regression task at hand.

.. GENERATED FROM PYTHON SOURCE LINES 62-64

.. code-block:: Python

    TableReport(y)

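As a quick check of the *Associations* observation above (a minimal sketch
using plain pandas, not part of the original example), we can verify that
``department`` and ``department_name`` are in a one-to-one relationship, which
would confirm that one of the two columns can be dropped safely:

.. code-block:: Python

    # Hedged sketch: if every ``department`` value maps to exactly one
    # ``department_name`` (and vice versa), the two columns carry the same
    # information.
    pairs = df[["department", "department_name"]].drop_duplicates()
    print(pairs["department"].is_unique and pairs["department_name"].is_unique)
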

.. GENERATED FROM PYTHON SOURCE LINES 65-68

Later in this example, we will show that `skore` stores similar information
when a model is trained on a dataset, thus enabling us to get quick insights
on the dataset used to train and test the model.

.. GENERATED FROM PYTHON SOURCE LINES 70-84

Tree-based model
================

Let's start by creating a tree-based model using some out-of-the-box tools.

For feature engineering, we use skrub's :class:`~skrub.TableVectorizer`. To
deal with the high cardinality of the categorical features, we use a
:class:`~skrub.StringEncoder`. Finally, we use a
:class:`~sklearn.ensemble.HistGradientBoostingRegressor` as the base
estimator, which is a rather robust model.

Modelling
^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 86-96

.. code-block:: Python

    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.pipeline import make_pipeline

    from skrub import StringEncoder, TableVectorizer

    model = make_pipeline(
        TableVectorizer(high_cardinality=StringEncoder()),
        HistGradientBoostingRegressor(),
    )
    model

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline(steps=[('tablevectorizer', TableVectorizer()),
                    ('histgradientboostingregressor',
                     HistGradientBoostingRegressor())])



.. GENERATED FROM PYTHON SOURCE LINES 97-102

Evaluation
^^^^^^^^^^

Let us compute the cross-validation report for this model using a
:class:`skore.CrossValidationReport`:

.. GENERATED FROM PYTHON SOURCE LINES 104-111

.. code-block:: Python

    from skore import CrossValidationReport

    hgbt_model_report = CrossValidationReport(
        estimator=model, X=df, y=y, splitter=5, n_jobs=4
    )
    hgbt_model_report.help()



.. GENERATED FROM PYTHON SOURCE LINES 112-116

A report provides a collection of useful information. For instance, it allows
us to compute, on demand, the predictions of the model and some performance
metrics. Let's cache the predictions of the cross-validated models once and
for all.

.. GENERATED FROM PYTHON SOURCE LINES 118-120

.. code-block:: Python

    hgbt_model_report.cache_predictions(n_jobs=4)

.. GENERATED FROM PYTHON SOURCE LINES 121-125

Now that the predictions are cached, any request to compute a metric will be
served from the cached predictions and will thus be fast. We can now have a
look at the performance of the model with some standard metrics.

.. GENERATED FROM PYTHON SOURCE LINES 127-129

.. code-block:: Python

    hgbt_model_report.metrics.summarize().frame()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                     HistGradientBoostingRegressor
                                               mean          std
    Metric
    R²                                     0.911027     0.016488
    RMSE                                8672.883305  1111.578431
    Fit time (s)                           2.611937     0.472511
    Predict time (s)                       0.172976     0.010085

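As a rough illustration of the effect of caching (a minimal sketch, not part
of the original example), we can time a second call to the same metric
summary: it is served from the cached predictions and returns almost
immediately.

.. code-block:: Python

    import time

    start = time.perf_counter()
    hgbt_model_report.metrics.summarize().frame()  # reuses the cached predictions
    print(f"Summary recomputed in {time.perf_counter() - start:.3f} s")
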
.. GENERATED FROM PYTHON SOURCE LINES 130-133

Similarly to what we saw in the previous section, the
:class:`skore.CrossValidationReport` also stores some information about the
dataset used.

.. GENERATED FROM PYTHON SOURCE LINES 135-138

.. code-block:: Python

    data_display = hgbt_model_report.data.analyze()
    data_display


.. GENERATED FROM PYTHON SOURCE LINES 139-144

The display obtained allows for a quick overview with the same HTML-based view
as the :class:`skrub.TableReport` we have seen earlier. In addition, you can
use the :meth:`skore.TableReportDisplay.plot` method to focus on a particular
analysis. For instance, we can get a figure representing the correlation
matrix of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 146-148

.. code-block:: Python

    data_display.plot(kind="corr")

.. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_employee_salaries_001.png
   :alt: Cramer's V Correlation
   :srcset: /auto_examples/use_cases/images/sphx_glr_plot_employee_salaries_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 149-156

Earlier, we obtained some statistical metrics aggregated over the
cross-validation splits, as well as some performance metrics related to the
time it took to train and test the model. The
:class:`skore.CrossValidationReport` also provides a way to inspect similar
information at the level of each cross-validation split, by accessing an
:class:`skore.EstimatorReport` for each split.

.. GENERATED FROM PYTHON SOURCE LINES 158-161

.. code-block:: Python

    hgbt_split_1 = hgbt_model_report.estimator_reports_[0]
    hgbt_split_1.metrics.summarize(favorability=True).frame()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                      HistGradientBoostingRegressor Favorability
    Metric
    R²                                      0.910637          (↗︎)
    RMSE                                 8607.961973          (↘︎)
    Fit time (s)                            2.815443          (↘︎)
    Predict time (s)                        0.167178          (↘︎)

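The same metric accessors are available on each per-split report. As a small
sketch (assuming an ``rmse`` accessor is exposed by the reports, as listed by
``help()``), we could collect the test RMSE of every split programmatically:

.. code-block:: Python

    # Hedged sketch: one RMSE value per cross-validation split.
    split_rmse = [
        report.metrics.rmse() for report in hgbt_model_report.estimator_reports_
    ]
    print(split_rmse)
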
.. GENERATED FROM PYTHON SOURCE LINES 162-164

The favorability of each metric indicates whether the metric is better when
higher or lower.

.. GENERATED FROM PYTHON SOURCE LINES 166-172

Linear model
============

Now that we have established a first model that serves as a baseline, we shall
proceed to define a more complex linear model: a pipeline with elaborate
feature engineering that uses a linear model as the base estimator.

.. GENERATED FROM PYTHON SOURCE LINES 174-176

Modelling
^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 178-226

.. code-block:: Python

    import numpy as np

    from sklearn.compose import make_column_transformer
    from sklearn.linear_model import RidgeCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, SplineTransformer

    from skrub import DatetimeEncoder, DropCols, GapEncoder, ToDatetime


    def periodic_spline_transformer(period, n_splines=None, degree=3):
        if n_splines is None:
            n_splines = period
        n_knots = n_splines + 1  # periodic and include_bias is True
        return SplineTransformer(
            degree=degree,
            n_knots=n_knots,
            knots=np.linspace(0, period, n_knots).reshape(n_knots, 1),
            extrapolation="periodic",
            include_bias=True,
        )


    one_hot_features = ["gender", "department_name", "assignment_category"]
    datetime_features = "date_first_hired"

    date_encoder = make_pipeline(
        ToDatetime(),
        DatetimeEncoder(resolution="day", add_weekday=True, add_total_seconds=False),
        DropCols("date_first_hired_year"),
    )

    date_engineering = make_column_transformer(
        (periodic_spline_transformer(12, n_splines=6), ["date_first_hired_month"]),
        (periodic_spline_transformer(31, n_splines=15), ["date_first_hired_day"]),
        (periodic_spline_transformer(7, n_splines=3), ["date_first_hired_weekday"]),
    )

    feature_engineering_date = make_pipeline(date_encoder, date_engineering)

    preprocessing = make_column_transformer(
        (feature_engineering_date, datetime_features),
        (OneHotEncoder(drop="if_binary", handle_unknown="ignore"), one_hot_features),
        (GapEncoder(n_components=100), "division"),
        (GapEncoder(n_components=100), "employee_position_title"),
    )

    model = make_pipeline(preprocessing, RidgeCV(alphas=np.logspace(-3, 3, 100)))
    model

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline(steps=[('columntransformer',
                     ColumnTransformer(transformers=[('pipeline',
                                                      Pipeline(steps=[('pipeline',
                                                                       Pipeline(steps=[('todatetime',
                                                                                        ToDatetime()),
                                                                                       ('datetimeencoder',
                                                                                        DatetimeEncoder(add_total_seconds=False,
                                                                                                        add_weekday=True,
                                                                                                        resolution='day')),
                                                                                       ('dropcols',
                                                                                        DropCols(cols='date_first_hired_year'))])),
                                                                      ('columntransformer',
                                                                       ColumnTransformer(transfor...
               4.03701726e+01, 4.64158883e+01, 5.33669923e+01, 6.13590727e+01,
               7.05480231e+01, 8.11130831e+01, 9.32603347e+01, 1.07226722e+02,
               1.23284674e+02, 1.41747416e+02, 1.62975083e+02, 1.87381742e+02,
               2.15443469e+02, 2.47707636e+02, 2.84803587e+02, 3.27454916e+02,
               3.76493581e+02, 4.32876128e+02, 4.97702356e+02, 5.72236766e+02,
               6.57933225e+02, 7.56463328e+02, 8.69749003e+02, 1.00000000e+03])))])



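Before walking through the pipeline, here is a quick illustration (a hedged
sketch reusing the ``periodic_spline_transformer`` helper defined above, not
part of the original example) of how the month number is expanded into a small
set of periodic spline features, so that December and January end up close in
feature space:

.. code-block:: Python

    import numpy as np

    # Encode the 12 month numbers with 6 periodic spline basis functions.
    months = np.arange(1, 13).reshape(-1, 1)
    month_splines = periodic_spline_transformer(12, n_splines=6).fit_transform(months)
    print(month_splines.shape)
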
.. GENERATED FROM PYTHON SOURCE LINES 227-240

In the diagram above, we can see how we performed our feature engineering:

* For categorical features, we use two approaches. If the number of categories
  is relatively small, we use a `OneHotEncoder`. If the number of categories is
  large, we use a `GapEncoder`, which is designed to deal with high-cardinality
  categorical features.

* Then, we have another transformation to encode the date features. We first
  split the date into multiple features (day, month, and year). Then, we apply
  a periodic spline transformation to each of the date features in order to
  capture the periodicity of the data.

* Finally, we fit a :class:`~sklearn.linear_model.RidgeCV` model.

.. GENERATED FROM PYTHON SOURCE LINES 242-248

Evaluation
^^^^^^^^^^

Now, we want to evaluate this linear model via cross-validation (with 5
folds). For that, we use skore's :class:`~skore.CrossValidationReport` to
investigate the performance of our model.

.. GENERATED FROM PYTHON SOURCE LINES 250-255

.. code-block:: Python

    linear_model_report = CrossValidationReport(
        estimator=model, X=df, y=y, splitter=5, n_jobs=4
    )
    linear_model_report.help()



.. GENERATED FROM PYTHON SOURCE LINES 256-264

We observe that the cross-validation report has detected that we have a
regression task at hand and thus provides us with metrics and plots that make
sense for our specific problem.

To accelerate any future computation (e.g. of a metric), we cache the
predictions of our model once and for all. Note that we do not necessarily
need to cache the predictions, as the report would compute them on the fly (if
not cached) and cache them for us.

.. GENERATED FROM PYTHON SOURCE LINES 266-272

.. code-block:: Python

    import warnings

    with warnings.catch_warnings():
        warnings.simplefilter(action="ignore", category=FutureWarning)
        linear_model_report.cache_predictions(n_jobs=4)

.. GENERATED FROM PYTHON SOURCE LINES 273-274

We can now have a look at the performance of the model with some standard
metrics.

.. GENERATED FROM PYTHON SOURCE LINES 276-278

.. code-block:: Python

    linear_model_report.metrics.summarize(favorability=True).frame()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                            RidgeCV               Favorability
                               mean          std
    Metric
    R²                     0.765180     0.022767          (↗︎)
    RMSE               14103.629945  1172.175959          (↘︎)
    Fit time (s)          10.064724     2.052521          (↘︎)
    Predict time (s)       0.408238     0.008181          (↘︎)

.. GENERATED FROM PYTHON SOURCE LINES 279-284

Comparing the models
====================

Now that we cross-validated our models, we can make some further comparisons
using the :class:`skore.ComparisonReport`:

.. GENERATED FROM PYTHON SOURCE LINES 286-291

.. code-block:: Python

    from skore import ComparisonReport

    comparator = ComparisonReport([hgbt_model_report, linear_model_report])
    comparator.metrics.summarize(favorability=True).frame()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                                               mean                                         std              Favorability
    Estimator         HistGradientBoostingRegressor       RidgeCV  HistGradientBoostingRegressor      RidgeCV
    Metric
    R²                                      0.911027      0.765180                       0.016488     0.022767         (↗︎)
    RMSE                                 8672.883305  14103.629945                    1111.578431  1172.175959         (↘︎)
    Fit time (s)                            2.611937     10.064724                       0.472511     2.052521         (↘︎)
    Predict time (s)                        0.172976      0.408238                       0.010085     0.008181         (↘︎)

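Since ``.frame()`` returns a regular pandas ``DataFrame``, the comparison can
also be exploited programmatically. A small sketch (assuming the mean/std
column layout shown above, not part of the original example):

.. code-block:: Python

    # Rank the two estimators by their mean cross-validated RMSE.
    summary = comparator.metrics.summarize().frame()
    print(summary.loc["RMSE", "mean"].sort_values())
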
.. GENERATED FROM PYTHON SOURCE LINES 292-297

In addition, if we forgot to compute a specific metric (e.g.
:func:`~sklearn.metrics.mean_absolute_error`), we can easily add it to the
report, without re-training the model and even without re-computing the
predictions, since they are cached internally in the report. This allows us to
save some potentially huge computation time.

.. GENERATED FROM PYTHON SOURCE LINES 299-306

.. code-block:: Python

    from sklearn.metrics import get_scorer

    metric = {"R²": "r2", "RMSE": "rmse", "MAE": get_scorer("neg_mean_absolute_error")}
    metric_kwargs = {"response_method": "predict"}
    comparator.metrics.summarize(metric=metric, metric_kwargs=metric_kwargs).frame()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                                          mean                                         std
    Estimator    HistGradientBoostingRegressor       RidgeCV  HistGradientBoostingRegressor      RidgeCV
    Metric
    R²                                0.911027      0.765180                       0.016488     0.022767
    RMSE                           8672.883305  14103.629945                    1111.578431  1172.175959
    MAE                            4672.991598   9939.712091                     176.786948   390.123295

.. GENERATED FROM PYTHON SOURCE LINES 307-310

Finally, we can get an even deeper understanding by analyzing each split in
the :class:`~skore.CrossValidationReport`. Here, we plot the
actual-vs-predicted values for each split.

.. GENERATED FROM PYTHON SOURCE LINES 312-314

.. code-block:: Python

    linear_model_report.metrics.prediction_error().plot(kind="actual_vs_predicted")

.. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_employee_salaries_002.png
   :alt: Prediction Error for RidgeCV, Data source: Test set
   :srcset: /auto_examples/use_cases/images/sphx_glr_plot_employee_salaries_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 315-325

Conclusion
==========

This example showcased `skore`'s integrated approach to the machine learning
workflow, from initial data exploration with `TableReport` through model
development and evaluation with `CrossValidationReport`. We demonstrated how
`skore` automatically captures dataset information and provides efficient
caching, enabling quick insights and flexible model comparison. The workflow
highlights `skore`'s ability to streamline the entire ML process while
maintaining computational efficiency.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 40.977 seconds)

.. _sphx_glr_download_auto_examples_use_cases_plot_employee_salaries.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_employee_salaries.ipynb <plot_employee_salaries.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_employee_salaries.py <plot_employee_salaries.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_employee_salaries.zip <plot_employee_salaries.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_