.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_evaluation/plot_estimator_report.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_evaluation_plot_estimator_report.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_evaluation_plot_estimator_report.py:

.. _example_estimator_report:

===============================================================
`EstimatorReport`: Get insights from any scikit-learn estimator
===============================================================

This example shows how the :class:`skore.EstimatorReport` class can be used to
quickly get insights from any scikit-learn estimator.

.. GENERATED FROM PYTHON SOURCE LINES 13-19

Loading our dataset and defining our estimator
==============================================

First, we load a dataset from skrub. Our goal is to predict whether a
healthcare manufacturing company paid medical doctors or hospitals, in order
to detect potential conflicts of interest.

.. GENERATED FROM PYTHON SOURCE LINES 21-27

.. code-block:: Python

    from skrub.datasets import fetch_open_payments

    dataset = fetch_open_payments()
    df = dataset.X
    y = dataset.y

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Downloading 'open_payments' from https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/open_payments.zip (attempt 1/3)

.. GENERATED FROM PYTHON SOURCE LINES 28-32

.. code-block:: Python

    from skrub import TableReport

    TableReport(df)
(The interactive skrub table report of ``df`` is rendered in the HTML version
of this example.)
.. GENERATED FROM PYTHON SOURCE LINES 33-35

.. code-block:: Python

    TableReport(y.to_frame())
(The interactive skrub table report of the target ``y`` is rendered in the
HTML version of this example.)
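Before going further, it is worth quantifying the class balance of the target
directly. A minimal sketch using pandas (``y`` is a pandas Series, as returned
by the skrub fetcher):

.. code-block:: Python

    # Relative frequency of each class; a strong skew means an imbalanced task
    print(y.value_counts(normalize=True))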
.. GENERATED FROM PYTHON SOURCE LINES 36-43

Looking at the distribution of the target, we observe that this classification
task is quite imbalanced. This means that we have to be careful when selecting
the statistical metrics used to evaluate the classification performance of our
predictive model. In addition, the class labels are not the integers 0 and 1
but the strings "allowed" and "disallowed". For our application, the label of
interest is "allowed".

.. GENERATED FROM PYTHON SOURCE LINES 43-45

.. code-block:: Python

    pos_label, neg_label = "allowed", "disallowed"

.. GENERATED FROM PYTHON SOURCE LINES 46-48

Before training a predictive model, we need to split our dataset into a
training and a validation set.

.. GENERATED FROM PYTHON SOURCE LINES 48-54

.. code-block:: Python

    from skore import train_test_split

    # If you have many dataframes to split, you can ask train_test_split to
    # return a dictionary. Remember, the data needs to be passed as keyword
    # arguments!
    split_data = train_test_split(X=df, y=y, random_state=42, as_dict=True)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    HighClassImbalanceWarning
    It seems that you have a classification problem with a high class imbalance.
    In this case, using train_test_split may not be a good idea because of high
    variability in the scores obtained on the test set. To tackle this challenge
    we suggest to use skore's CrossValidationReport with the `splitter`
    parameter of your choice.

    ShuffleTrueWarning
    We detected that the `shuffle` parameter is set to `True` either explicitly
    or from its default value. In case of time-ordered events (even if they are
    independent), this will result in inflated model performance evaluation
    because natural drift will not be taken into account. We recommend setting
    the shuffle parameter to `False` in order to ensure the evaluation process
    is really representative of your production release process.

.. GENERATED FROM PYTHON SOURCE LINES 55-65

By the way, notice how skore's :func:`~skore.train_test_split` automatically
warns us about the class imbalance.

Now, we need to define a predictive model. Fortunately, skrub provides a
convenient function (:func:`skrub.tabular_learner`) for getting a strong
baseline predictive model with a single line of code. Its feature engineering
is generic rather than handcrafted and tailored to the task, but it provides a
good starting point.

So let's create a classifier for our task.

.. GENERATED FROM PYTHON SOURCE LINES 65-70

.. code-block:: Python

    from skrub import tabular_learner

    estimator = tabular_learner("classifier")
    estimator

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/skrub/_tabular_pipeline.py:75: FutureWarning: tabular_learner will be deprecated in the next release. Equivalent functionality is available in skrub.tabular_pipeline.
.. code-block:: none

    Pipeline(steps=[('tablevectorizer',
                     TableVectorizer(low_cardinality=ToCategorical())),
                    ('histgradientboostingclassifier',
                     HistGradientBoostingClassifier())])
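Note the ``FutureWarning`` emitted above: it states that equivalent
functionality is available in ``skrub.tabular_pipeline``. On a skrub release
that already ships it, the forward-compatible spelling would be the following
sketch (based only on the warning message, not verified against a specific
version):

.. code-block:: Python

    # Forward-compatible import, per the FutureWarning above
    from skrub import tabular_pipeline

    estimator = tabular_pipeline("classifier")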
.. GENERATED FROM PYTHON SOURCE LINES 71-80

Getting insights from our estimator
===================================

Introducing the :class:`skore.EstimatorReport` class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, we would like to get some insights from our predictive model. One way to
do so is to use the :class:`skore.EstimatorReport` class. The constructor will
detect that our estimator is unfitted and will fit it for us on the training
data.

.. GENERATED FROM PYTHON SOURCE LINES 80-84

.. code-block:: Python

    from skore import EstimatorReport

    report = EstimatorReport(estimator, **split_data, pos_label=pos_label)

.. GENERATED FROM PYTHON SOURCE LINES 85-88

Once the report is created, we can discover the tools available for getting
insights from our specific model on our specific task by calling the
:meth:`~skore.EstimatorReport.help` method.

.. GENERATED FROM PYTHON SOURCE LINES 89-91

.. code-block:: Python

    report.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Tools to diagnose estimator HistGradientBoostingClassifier
    EstimatorReport
    ├── .metrics
    │   ├── .accuracy(...)          (↗︎) - Compute the accuracy score.
    │   ├── .brier_score(...)       (↘︎) - Compute the Brier score.
    │   ├── .confusion_matrix(...)      - Plot the confusion matrix.
    │   ├── .log_loss(...)          (↘︎) - Compute the log loss.
    │   ├── .precision(...)         (↗︎) - Compute the precision score.
    │   ├── .precision_recall(...)      - Plot the precision-recall curve.
    │   ├── .recall(...)            (↗︎) - Compute the recall score.
    │   ├── .roc(...)                   - Plot the ROC curve.
    │   ├── .roc_auc(...)           (↗︎) - Compute the ROC AUC score.
    │   ├── .timings(...)               - Get all measured processing times related to the estimator.
    │   ├── .custom_metric(...)         - Compute a custom metric.
    │   └── .summarize(...)             - Report a set of metrics for our estimator.
    ├── .feature_importance
    │   └── .permutation(...)           - Report the permutation feature importance.
    ├── .data
    │   └── .analyze(...)               - Plot dataset statistics.
    ├── .cache_predictions(...)         - Cache estimator's predictions.
    ├── .clear_cache(...)               - Clear the cache.
    ├── .get_predictions(...)           - Get estimator's predictions.
    └── Attributes
        ├── .X_test           - Testing data
        ├── .X_train          - Training data
        ├── .y_test           - Testing target
        ├── .y_train          - Training target
        ├── .estimator        - Estimator to make the report from
        ├── .estimator_       - The cloned or copied estimator
        ├── .estimator_name_  - The name of the estimator
        ├── .fit              - Whether to fit the estimator on the training data
        ├── .fit_time_        - The time taken to fit the estimator, in seconds
        ├── .ml_task          - No description available
        └── .pos_label        - For binary classification, the positive class

    Legend:
    (↗︎) higher is better (↘︎) lower is better

.. GENERATED FROM PYTHON SOURCE LINES 92-93

Be aware that we can access the help for each individual sub-accessor. For
instance:

.. GENERATED FROM PYTHON SOURCE LINES 94-96

.. code-block:: Python

    report.metrics.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Available metrics methods
    report.metrics
    ├── .accuracy(...)          (↗︎) - Compute the accuracy score.
    ├── .brier_score(...)       (↘︎) - Compute the Brier score.
    ├── .confusion_matrix(...)      - Plot the confusion matrix.
    ├── .log_loss(...)          (↘︎) - Compute the log loss.
    ├── .precision(...)         (↗︎) - Compute the precision score.
    ├── .precision_recall(...)      - Plot the precision-recall curve.
    ├── .recall(...)            (↗︎) - Compute the recall score.
    ├── .roc(...)                   - Plot the ROC curve.
    ├── .roc_auc(...)           (↗︎) - Compute the ROC AUC score.
    ├── .timings(...)               - Get all measured processing times related to the estimator.
    ├── .custom_metric(...)         - Compute a custom metric.
    └── .summarize(...)             - Report a set of metrics for our estimator.

    Legend:
    (↗︎) higher is better (↘︎) lower is better

.. GENERATED FROM PYTHON SOURCE LINES 97-105

Metrics computation with aggressive caching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At this point, we might be interested in having a first look at the
statistical performance of our model on the validation set that we provided.
We can access it by calling any of the metrics displayed above. Since we are
greedy and want several metrics at once, we use the
:meth:`~skore.EstimatorReport.metrics.summarize` method.

.. GENERATED FROM PYTHON SOURCE LINES 106-113

.. code-block:: Python

    import time

    start = time.time()
    metric_report = report.metrics.summarize().frame()
    end = time.time()
    metric_report
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                         HistGradientBoostingClassifier
    Metric
    Precision                                  0.684615
    Recall                                     0.458763
    ROC AUC                                    0.943806
    Brier score                                0.035087
    Fit time (s)                               9.009192
    Predict time (s)                           1.553413
.. GENERATED FROM PYTHON SOURCE LINES 114-116

.. code-block:: Python

    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the metrics: 4.81 seconds

.. GENERATED FROM PYTHON SOURCE LINES 117-124

An interesting feature provided by the :class:`skore.EstimatorReport` is the
caching mechanism. Indeed, once a dataset is large enough, computing the
predictions of a model is no longer cheap: even on our smallish dataset, it
took a couple of seconds to compute the metrics. The report caches the
predictions, so computing a metric again, or an alternative metric that
requires the same predictions, is much faster. Let's check by requesting the
same metrics report again.

.. GENERATED FROM PYTHON SOURCE LINES 125-131

.. code-block:: Python

    start = time.time()
    metric_report = report.metrics.summarize().frame()
    end = time.time()
    metric_report
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                         HistGradientBoostingClassifier
    Metric
    Precision                                  0.684615
    Recall                                     0.458763
    ROC AUC                                    0.943806
    Brier score                                0.035087
    Fit time (s)                               9.009192
    Predict time (s)                           1.553413
.. GENERATED FROM PYTHON SOURCE LINES 132-134

.. code-block:: Python

    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the metrics: 0.00 seconds

.. GENERATED FROM PYTHON SOURCE LINES 135-137

Note that when the model is fitted or the predictions are computed, we also
store the time the operation took:

.. GENERATED FROM PYTHON SOURCE LINES 138-140

.. code-block:: Python

    report.metrics.timings()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'fit_time': 9.009191993000002, 'predict_time_test': 1.5534125639999843}

.. GENERATED FROM PYTHON SOURCE LINES 141-143

Since we obtain a pandas dataframe, we can also use the plotting interface of
pandas.

.. GENERATED FROM PYTHON SOURCE LINES 144-149

.. code-block:: Python

    import matplotlib.pyplot as plt

    ax = metric_report.plot.barh()
    ax.set_title("Metrics report")

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_001.png
   :alt: Metrics report
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Text(0.5, 1.0, 'Metrics report')

.. GENERATED FROM PYTHON SOURCE LINES 150-152

Whenever we compute a metric, we check whether the required predictions are
available in the cache and reload them if so. For instance, let's compute the
log loss.

.. GENERATED FROM PYTHON SOURCE LINES 153-159

.. code-block:: Python

    start = time.time()
    log_loss = report.metrics.log_loss()
    end = time.time()
    log_loss

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.12371864066132363

.. GENERATED FROM PYTHON SOURCE LINES 160-162

.. code-block:: Python

    print(f"Time taken to compute the log loss: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the log loss: 0.03 seconds

.. GENERATED FROM PYTHON SOURCE LINES 163-165

We can show that, without the initial cache, it would have taken more time to
compute the log loss.

.. GENERATED FROM PYTHON SOURCE LINES 166-173

.. code-block:: Python

    report.clear_cache()

    start = time.time()
    log_loss = report.metrics.log_loss()
    end = time.time()
    log_loss

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.12371864066132363

.. GENERATED FROM PYTHON SOURCE LINES 174-176

.. code-block:: Python

    print(f"Time taken to compute the log loss: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the log loss: 1.57 seconds

.. GENERATED FROM PYTHON SOURCE LINES 177-180

By default, the metrics are computed on the test set only. However, if a
training set is provided, we can also compute the metrics on it by specifying
the `data_source` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 181-183

.. code-block:: Python

    report.metrics.log_loss(data_source="train")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.09681882698124583

.. GENERATED FROM PYTHON SOURCE LINES 184-187

In the case where we are interested in computing the metrics on a completely
new set of data, we can use the `data_source="X_y"` parameter. In addition, we
need to provide the `X` and `y` parameters.

.. GENERATED FROM PYTHON SOURCE LINES 188-196

.. code-block:: Python

    start = time.time()
    metric_report = report.metrics.summarize(
        data_source="X_y", X=split_data["X_test"], y=split_data["y_test"]
    ).frame()
    end = time.time()
    metric_report
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                         HistGradientBoostingClassifier
    Metric
    Precision                                  0.684615
    Recall                                     0.458763
    ROC AUC                                    0.943806
    Brier score                                0.035087
    Fit time (s)                               9.009192
    Predict time (s)                           1.558121
.. GENERATED FROM PYTHON SOURCE LINES 197-199

.. code-block:: Python

    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the metrics: 5.01 seconds

.. GENERATED FROM PYTHON SOURCE LINES 200-203

As in the other cases, we rely on the cache to avoid recomputing the
predictions. Internally, we compute a hash of the input data to make sure that
we hit the cache in a consistent way.

.. GENERATED FROM PYTHON SOURCE LINES 206-213

.. code-block:: Python

    start = time.time()
    metric_report = report.metrics.summarize(
        data_source="X_y", X=split_data["X_test"], y=split_data["y_test"]
    ).frame()
    end = time.time()
    metric_report
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                         HistGradientBoostingClassifier
    Metric
    Precision                                  0.684615
    Recall                                     0.458763
    ROC AUC                                    0.943806
    Brier score                                0.035087
    Fit time (s)                               9.009192
    Predict time (s)                           1.558121
.. GENERATED FROM PYTHON SOURCE LINES 214-216

.. code-block:: Python

    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the metrics: 0.20 seconds

.. GENERATED FROM PYTHON SOURCE LINES 217-226

.. note::

    In this last example, we rely on computing the hash of the input data.
    Therefore, there is a trade-off: the computation of the hash is not free,
    and it might sometimes be faster to recompute the predictions instead.

Be aware that we can also benefit from the caching mechanism with our own
custom metrics. Skore only expects our metric function to take `y_true` and
`y_pred` as its first two positional arguments; it can take any other
arguments after that. Let's see an example.

.. GENERATED FROM PYTHON SOURCE LINES 227-241

.. code-block:: Python

    def operational_decision_cost(y_true, y_pred, amount):
        # Translate each cell of the confusion matrix into a gain or a cost,
        # using the per-sample `amount` where relevant.
        mask_true_positive = (y_true == pos_label) & (y_pred == pos_label)
        mask_true_negative = (y_true == neg_label) & (y_pred == neg_label)
        mask_false_positive = (y_true == neg_label) & (y_pred == pos_label)
        mask_false_negative = (y_true == pos_label) & (y_pred == neg_label)
        fraudulent_refuse = mask_true_positive.sum() * 50
        fraudulent_accept = -amount[mask_false_negative].sum()
        legitimate_refuse = mask_false_positive.sum() * -5
        legitimate_accept = (amount[mask_true_negative] * 0.02).sum()
        return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept

.. GENERATED FROM PYTHON SOURCE LINES 242-246

In our use case, we have an operational decision to make that translates the
classification outcome into a cost. It maps the confusion matrix onto a cost
matrix, based on an amount linked to each sample in the dataset that is
provided to us. Here, we randomly generate some amounts as an illustration.

.. GENERATED FROM PYTHON SOURCE LINES 247-252

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(42)
    amount = rng.integers(low=100, high=1000, size=len(split_data["y_test"]))

.. GENERATED FROM PYTHON SOURCE LINES 253-255

Let's make sure that the `predict` method has been called and its result
cached. Computing the accuracy metric guarantees that `predict` is called.

.. GENERATED FROM PYTHON SOURCE LINES 256-258

.. code-block:: Python

    report.metrics.accuracy()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.9523654159869495

.. GENERATED FROM PYTHON SOURCE LINES 259-260

We can now compute the cost of our operational decision.

.. GENERATED FROM PYTHON SOURCE LINES 261-268

.. code-block:: Python

    start = time.time()
    cost = report.metrics.custom_metric(
        metric_function=operational_decision_cost,
        response_method="predict",
        amount=amount,
    )
    end = time.time()
    cost

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    -132431.94

.. GENERATED FROM PYTHON SOURCE LINES 269-271

.. code-block:: Python

    print(f"Time taken to compute the cost: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the cost: 0.01 seconds

.. GENERATED FROM PYTHON SOURCE LINES 272-273

Let's now clear the cache and verify that, without it, the computation is
slower.

.. GENERATED FROM PYTHON SOURCE LINES 274-276

.. code-block:: Python

    report.clear_cache()

.. GENERATED FROM PYTHON SOURCE LINES 277-284

.. code-block:: Python

    start = time.time()
    cost = report.metrics.custom_metric(
        metric_function=operational_decision_cost,
        response_method="predict",
        amount=amount,
    )
    end = time.time()
    cost

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    -132431.94
.. GENERATED FROM PYTHON SOURCE LINES 285-287

.. code-block:: Python

    print(f"Time taken to compute the cost: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the cost: 1.56 seconds

.. GENERATED FROM PYTHON SOURCE LINES 288-291

We observe that the caching works as expected. It is really handy, because it
means that we can compute additional metrics without having to recompute the
predictions.

.. GENERATED FROM PYTHON SOURCE LINES 292-301

.. code-block:: Python

    report.metrics.summarize(
        scoring={
            "Precision": "precision",
            "Recall": "recall",
            "Operational Decision Cost": operational_decision_cost,
        },
        scoring_kwargs={"amount": amount, "response_method": "predict"},
    ).frame()
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                               HistGradientBoostingClassifier
    Metric
    Precision                                        0.684615
    Recall                                           0.458763
    Operational Decision Cost                  -132431.940000
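As a sanity check, we can reproduce the custom metric outside of the report by
predicting with the fitted pipeline, which the report exposes as
``report.estimator_`` (one of the attributes listed by ``report.help()``
above). A minimal sketch; it should print the same ``-132431.94``:

.. code-block:: Python

    # Recompute the operational decision cost from raw predictions
    y_pred = report.estimator_.predict(split_data["X_test"])
    print(operational_decision_cost(split_data["y_test"], y_pred, amount))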
.. GENERATED FROM PYTHON SOURCE LINES 302-306

It can happen that we are interested in providing several custom metrics that
do not necessarily share the same parameters. In this more complex case, skore
requires us to provide a scorer using the :func:`sklearn.metrics.make_scorer`
function.

.. GENERATED FROM PYTHON SOURCE LINES 307-320

.. code-block:: Python

    from sklearn.metrics import make_scorer, f1_score

    f1_scorer = make_scorer(f1_score, response_method="predict")
    operational_decision_cost_scorer = make_scorer(
        operational_decision_cost, response_method="predict", amount=amount
    )
    report.metrics.summarize(
        scoring={
            "F1 Score": f1_scorer,
            "Operational Decision Cost": operational_decision_cost_scorer,
        },
    ).frame()
.. rst-class:: sphx-glr-script-out

.. code-block:: none

                               HistGradientBoostingClassifier
    Metric
    F1 Score                                         0.549383
    Operational Decision Cost                  -132431.940000
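Keyword arguments given to :func:`sklearn.metrics.make_scorer` (beyond
``response_method``) are forwarded to the metric function, as we did above
with ``amount``. With string class labels like ours, this is also how a metric
such as :func:`sklearn.metrics.f1_score` can be pinned to the positive class
explicitly; a sketch (the report already handles this through its
``pos_label`` argument):

.. code-block:: Python

    # Forward pos_label to f1_score through the scorer
    f1_scorer_explicit = make_scorer(
        f1_score, response_method="predict", pos_label=pos_label
    )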
.. GENERATED FROM PYTHON SOURCE LINES 321-327

Effortless one-liner plotting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :class:`skore.EstimatorReport` class also provides a plotting interface
that covers the most common plots out of the box. As for the metrics, we only
expose the set of plots that is meaningful for the provided estimator.

.. GENERATED FROM PYTHON SOURCE LINES 328-330

.. code-block:: Python

    report.metrics.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Available metrics methods
    report.metrics
    ├── .accuracy(...)          (↗︎) - Compute the accuracy score.
    ├── .brier_score(...)       (↘︎) - Compute the Brier score.
    ├── .confusion_matrix(...)      - Plot the confusion matrix.
    ├── .log_loss(...)          (↘︎) - Compute the log loss.
    ├── .precision(...)         (↗︎) - Compute the precision score.
    ├── .precision_recall(...)      - Plot the precision-recall curve.
    ├── .recall(...)            (↗︎) - Compute the recall score.
    ├── .roc(...)                   - Plot the ROC curve.
    ├── .roc_auc(...)           (↗︎) - Compute the ROC AUC score.
    ├── .timings(...)               - Get all measured processing times related to the estimator.
    ├── .custom_metric(...)         - Compute a custom metric.
    └── .summarize(...)             - Report a set of metrics for our estimator.

    Legend:
    (↗︎) higher is better (↘︎) lower is better

.. GENERATED FROM PYTHON SOURCE LINES 331-332

Let's start by plotting the ROC curve for our binary classification task.

.. GENERATED FROM PYTHON SOURCE LINES 333-336

.. code-block:: Python

    display = report.metrics.roc()
    display.plot()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_002.png
   :alt: ROC Curve for HistGradientBoostingClassifier
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 337-341

The plot functionality is built upon the scikit-learn display objects. We
return those displays (slightly modified to improve the UI) in case we want to
tweak some of the plot properties. We can have a quick look at the available
attributes and methods by calling the ``help`` method, or simply by printing
the display.

.. GENERATED FROM PYTHON SOURCE LINES 342-344

.. code-block:: Python

    display

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    skore.RocCurveDisplay(...)

.. GENERATED FROM PYTHON SOURCE LINES 345-347

.. code-block:: Python

    display.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    RocCurveDisplay
    display
    ├── Attributes
    │   ├── .ax_
    │   ├── .chance_level_
    │   ├── .figure_
    │   └── .lines_
    └── Methods
        ├── .frame(...)     - Get the data used to create the ROC curve plot.
        ├── .plot(...)      - Plot visualization.
        └── .set_style(...) - Set the style parameters for the display.

.. GENERATED FROM PYTHON SOURCE LINES 348-351

.. code-block:: Python

    display.plot()
    _ = display.ax_.set_title("Example of a ROC curve")

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_003.png
   :alt: Example of a ROC curve
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_003.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 352-356

Similarly to the metrics, we aggressively use caching to avoid recomputing the
predictions of the model. We also cache the plot display object, by detecting
whether the input parameters are the same as in the previous call. Let's
demonstrate the kind of performance gain we can get.

.. GENERATED FROM PYTHON SOURCE LINES 357-363

.. code-block:: Python

    start = time.time()
    # we already triggered the computation of the predictions in a previous call
    display = report.metrics.roc()
    display.plot()
    end = time.time()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_004.png
   :alt: ROC Curve for HistGradientBoostingClassifier
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 364-366

.. code-block:: Python

    print(f"Time taken to compute the ROC curve: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the ROC curve: 0.04 seconds

.. GENERATED FROM PYTHON SOURCE LINES 367-368

Now, let's clear the cache and check whether we get a slowdown.

.. GENERATED FROM PYTHON SOURCE LINES 369-371

.. code-block:: Python

    report.clear_cache()

.. GENERATED FROM PYTHON SOURCE LINES 372-377

.. code-block:: Python

    start = time.time()
    display = report.metrics.roc()
    display.plot()
    end = time.time()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_005.png
   :alt: ROC Curve for HistGradientBoostingClassifier
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 378-380

.. code-block:: Python

    print(f"Time taken to compute the ROC curve: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Time taken to compute the ROC curve: 1.60 seconds

.. GENERATED FROM PYTHON SOURCE LINES 381-382

As expected, since we need to recompute the predictions, it takes more time.

.. GENERATED FROM PYTHON SOURCE LINES 384-389

Visualizing the confusion matrix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another useful visualization for classification tasks is the confusion matrix,
which shows the counts of correct and incorrect predictions for each class.

.. GENERATED FROM PYTHON SOURCE LINES 391-392

Let's first start with a basic confusion matrix:

.. GENERATED FROM PYTHON SOURCE LINES 392-396

.. code-block:: Python

    cm_display = report.metrics.confusion_matrix()
    cm_display.plot()
    plt.show()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_006.png
   :alt: Confusion Matrix
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_006.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 397-399

We can normalize the confusion matrix to get proportions instead of raw
counts. Here, we normalize by true labels (rows):

.. GENERATED FROM PYTHON SOURCE LINES 399-402

.. code-block:: Python

    cm_display.plot(normalize="true")
    plt.show()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_007.png
   :alt: Confusion Matrix
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_007.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 403-405

More plotting options are available via ``heatmap_kwargs``, which is passed to
seaborn's heatmap. For example, we can customize the colormap and the number
format:

.. GENERATED FROM PYTHON SOURCE LINES 405-408
.. code-block:: Python

    cm_display.plot(heatmap_kwargs={"cmap": "Greens", "fmt": ".2e"})
    plt.show()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_008.png
   :alt: Confusion Matrix
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_008.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 409-411

Finally, the confusion matrix can also be exported as a pandas DataFrame for
further analysis:

.. GENERATED FROM PYTHON SOURCE LINES 411-414

.. code-block:: Python

    cm_frame = cm_display.frame()
    cm_frame
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    predicted_label  disallowed  allowed
    true_label
    disallowed            16980      246
    allowed                 630      534
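Since ``frame()`` returns a plain DataFrame, individual metrics can be
recovered from it by hand. For instance, given the row/column layout above,
the recall of the positive class "allowed" matches the ``0.458763`` reported
by ``summarize()`` earlier:

.. code-block:: Python

    # recall = TP / (TP + FN), read directly off the confusion matrix frame
    tp = cm_frame.loc["allowed", "allowed"]
    fn = cm_frame.loc["allowed", "disallowed"]
    print(tp / (tp + fn))  # ~0.4588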
.. GENERATED FROM PYTHON SOURCE LINES 415-421

Decision threshold support
~~~~~~~~~~~~~~~~~~~~~~~~~~

For binary classification, the confusion matrix can be computed at different
decision thresholds. This is useful for understanding how the model's
predictions change as the decision threshold varies.

.. GENERATED FROM PYTHON SOURCE LINES 423-424

First, we create a display with threshold support enabled:

.. GENERATED FROM PYTHON SOURCE LINES 424-426

.. code-block:: Python

    cm_threshold_display = report.metrics.confusion_matrix(threshold=True)

.. GENERATED FROM PYTHON SOURCE LINES 427-428

Now, we can plot the confusion matrix at a specific threshold:

.. GENERATED FROM PYTHON SOURCE LINES 428-431

.. code-block:: Python

    cm_threshold_display.plot(threshold_value=0.3)
    plt.show()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_009.png
   :alt: Confusion Matrix (threshold: 0.30)
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_009.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 432-436

Since there is a finite number of thresholds at which the predictions change,
we plot the confusion matrix associated with the threshold closest to the one
provided.

The ``frame`` method also supports threshold selection:

.. GENERATED FROM PYTHON SOURCE LINES 436-438

.. code-block:: Python

    cm_threshold_display.frame(threshold_value=0.7)
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    predicted_label  disallowed  allowed
    true_label
    disallowed            17144       82
    allowed                 873      291
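Reading this frame the same way as the default one makes the threshold
trade-off concrete: moving the threshold from 0.5 (the default used by
``predict``) to 0.7 trades recall for precision. A sketch, assuming the same
``true_label``/``predicted_label`` layout as above:

.. code-block:: Python

    frame_07 = cm_threshold_display.frame(threshold_value=0.7)
    tp = frame_07.loc["allowed", "allowed"]
    fp = frame_07.loc["disallowed", "allowed"]
    fn = frame_07.loc["allowed", "disallowed"]
    print(f"precision={tp / (tp + fp):.3f}, recall={tp / (tp + fn):.3f}")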
.. GENERATED FROM PYTHON SOURCE LINES 439-441

When `threshold_value` is set to `"all"`, we get the confusion matrices for
all available thresholds:

.. GENERATED FROM PYTHON SOURCE LINES 441-444

.. code-block:: Python

    cm_all_thresholds = cm_threshold_display.frame(threshold_value="all")
    cm_all_thresholds.head(5)
.. rst-class:: sphx-glr-script-out

.. code-block:: none

       threshold  disallowed/disallowed  disallowed/allowed  allowed/disallowed  allowed/allowed
    0   0.000276                      0               17226                   0             1164
    1   0.000326                      1               17225                   0             1164
    2   0.000357                      2               17224                   0             1164
    3   0.000358                      3               17223                   0             1164
    4   0.000360                      4               17222                   0             1164
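This long format makes it easy to track a metric across every threshold. As a
final sketch (assuming the ``true/predicted`` column naming shown above;
``plt`` was imported earlier), we can plot the recall of the "allowed" class
against the decision threshold:

.. code-block:: Python

    # Recall of "allowed" at every threshold: TP / (TP + FN)
    tp = cm_all_thresholds["allowed/allowed"]
    fn = cm_all_thresholds["allowed/disallowed"]
    plt.plot(cm_all_thresholds["threshold"], tp / (tp + fn))
    plt.xlabel("decision threshold")
    plt.ylabel("recall of 'allowed'")
    plt.show()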
.. GENERATED FROM PYTHON SOURCE LINES 445-449

.. seealso::

    For using the :class:`~skore.EstimatorReport` to inspect your models,
    see :ref:`example_feature_importance`.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (1 minutes 23.153 seconds)

.. _sphx_glr_download_auto_examples_model_evaluation_plot_estimator_report.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_estimator_report.ipynb <plot_estimator_report.ipynb>`

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_estimator_report.py <plot_estimator_report.py>`

        .. container:: sphx-glr-download sphx-glr-download-zip

            :download:`Download zipped: plot_estimator_report.zip <plot_estimator_report.zip>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_