.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/1-glm/plot_2_LogisticRegression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_1-glm_plot_2_LogisticRegression.py:


==============================================
Classification: Logistic Regression and Beyond
==============================================

This example shows how best subset selection for logistic regression works in ``abess``.

.. GENERATED FROM PYTHON SOURCE LINES 9-17

Real Data Example
^^^^^^^^^^^^^^^^^

Titanic Dataset
"""""""""""""""

Consider the Titanic dataset obtained from the Kaggle competition: https://www.kaggle.com/c/titanic/data.
The dataset contains records of 889 passengers, and the goal of the competition is to predict the survival status (yes/no) based on features including the class of service, the sex, the age, etc.

.. GENERATED FROM PYTHON SOURCE LINES 17-24

.. code-block:: Python

    import pandas as pd

    dt = pd.read_csv("train.csv")
    print(dt.head(5))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       PassengerId  Survived  Pclass  ...     Fare Cabin Embarked
    0            1         0       3  ...   7.2500   NaN        S
    1            2         1       1  ...  71.2833   C85        C
    2            3         1       3  ...   7.9250   NaN        S
    3            4         1       1  ...  53.1000  C123        S
    4            5         0       3  ...   8.0500   NaN        S

    [5 rows x 12 columns]

.. GENERATED FROM PYTHON SOURCE LINES 25-29

We only focus on some numerical or categorical variables:

- predictor variables: `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`;
- the response variable is `Survived`.

.. GENERATED FROM PYTHON SOURCE LINES 29-35

.. code-block:: Python

    dt = dt.iloc[:, [1, 2, 4, 5, 6, 7, 9, 11]]  # variables of interest
    dt['Pclass'] = dt['Pclass'].astype(str)
    print(dt.head(5))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       Survived Pclass     Sex   Age  SibSp  Parch     Fare Embarked
    0         0      3    male  22.0      1      0   7.2500        S
    1         1      1  female  38.0      1      0  71.2833        C
    2         1      3  female  26.0      0      0   7.9250        S
    3         1      1  female  35.0      1      0  53.1000        S
    4         0      3    male  35.0      0      0   8.0500        S

.. GENERATED FROM PYTHON SOURCE LINES 36-37

However, some rows contain missing values (NaN), and we need to drop them.

.. GENERATED FROM PYTHON SOURCE LINES 37-41

.. code-block:: Python

    dt = dt.dropna()
    print('sample size: ', dt.shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    sample size:  (712, 8)

.. GENERATED FROM PYTHON SOURCE LINES 42-43

Then we replace the categorical variables with dummy variables:

.. GENERATED FROM PYTHON SOURCE LINES 43-48

.. code-block:: Python

    dt1 = pd.get_dummies(dt)
    print(dt1.head(5))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       Survived   Age  SibSp  Parch  ...  Sex_male  Embarked_C  Embarked_Q  Embarked_S
    0         0  22.0      1      0  ...      True       False       False        True
    1         1  38.0      1      0  ...     False        True       False       False
    2         1  26.0      0      0  ...     False       False       False        True
    3         1  35.0      1      0  ...     False       False       False        True
    4         0  35.0      0      0  ...      True       False       False        True

    [5 rows x 13 columns]

.. GENERATED FROM PYTHON SOURCE LINES 49-50

Now we split `dt1` into a training set and a testing set:

.. GENERATED FROM PYTHON SOURCE LINES 50-62

.. code-block:: Python

    from sklearn.model_selection import train_test_split
    import numpy as np

    X = np.array(dt1.drop('Survived', axis=1))
    Y = np.array(dt1.Survived)

    train_x, test_x, train_y, test_y = train_test_split(
        X, Y, test_size=0.33, random_state=0)
    print('train size: ', train_x.shape[0])
    print('test size:', test_x.shape[0])

.. rst-class:: sphx-glr-script-out
 .. code-block:: none

    train size:  477
    test size: 235

.. GENERATED FROM PYTHON SOURCE LINES 63-76

Here ``train_x`` contains:

- V0: Age
- V1: # of siblings / spouses aboard the Titanic
- V2: # of parents / children aboard the Titanic
- V3: Passenger fare
- V4: dummy variable, 1st ticket class (True-yes, False-no)
- V5: dummy variable, 2nd ticket class (True-yes, False-no)
- V6: dummy variable, 3rd ticket class (True-yes, False-no)
- V7: dummy variable, female sex (True-yes, False-no)
- V8: dummy variable, male sex (True-yes, False-no)
- V9: dummy variable, Cherbourg for embarkation (True-yes, False-no)
- V10: dummy variable, Queenstown for embarkation (True-yes, False-no)
- V11: dummy variable, Southampton for embarkation (True-yes, False-no)

And ``train_y`` indicates whether the passenger survived (1-yes, 0-no).

.. GENERATED FROM PYTHON SOURCE LINES 76-80

.. code-block:: Python

    print('train_x:\n', train_x[0:5, :])
    print('train_y:\n', train_y[0:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    train_x:
     [[54.0 1 0 59.4 True False False True False True False False]
     [30.0 0 0 8.6625 False False True True False False False True]
     [47.0 0 0 38.5 True False False False True False False True]
     [28.0 2 0 7.925 False False True False True False False True]
     [29.0 1 0 26.0 False True False True False False False True]]
    train_y:
     [1 0 0 0 1]

.. GENERATED FROM PYTHON SOURCE LINES 81-87

Model Fitting
"""""""""""""

The ``LogisticRegression()`` function in the ``abess.linear`` module allows us to perform best subset selection in a highly efficient way.
For example, in the Titanic sample, if you want to look for a best subset with no more than 5 variables in the logistic model, you can call:

.. GENERATED FROM PYTHON SOURCE LINES 87-94

.. code-block:: Python

    from abess import LogisticRegression

    s = 5   # max target sparsity
    model = LogisticRegression(support_size=range(0, s + 1))
    model.fit(train_x, train_y)
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LogisticRegression(support_size=range(0, 6))


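Within the candidate sizes given by ``support_size``, the best support size is chosen by an information criterion (EBIC by default, as explained below).
If a different criterion is preferred, it can be requested at construction time.
A minimal sketch based on the ``ic_type`` argument described below (the choice of ``"bic"`` here is purely illustrative):

.. code-block:: Python

    # tune the support size with BIC instead of the default EBIC
    model_bic = LogisticRegression(support_size=range(0, s + 1), ic_type="bic")
    model_bic.fit(train_x, train_y)
    # model_bic.coef_ can then be inspected in the same way as model.coef_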
.. GENERATED FROM PYTHON SOURCE LINES 95-97

Now ``model.coef_`` contains the coefficients of the logistic model with no more than 5 variables.
That is, the variables with a coefficient of 0 are unused in the model:

.. GENERATED FROM PYTHON SOURCE LINES 97-100

.. code-block:: Python

    print(model.coef_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [-0.05410776 -0.53642966  0.          0.          1.74091231  0.         -1.26223831  2.7096497   0.          0.          0.          0.        ]

.. GENERATED FROM PYTHON SOURCE LINES 101-109

By default, ``LogisticRegression`` sets ``support_size = range(0, min(p, n/(log(n)p)))``, and the best support size is determined by the Extended Bayesian Information Criterion (EBIC).
You can change the tuning criterion by specifying the argument ``ic_type``.
The available tuning criteria are ``"gic"``, ``"aic"``, ``"bic"``, and ``"ebic"``.
For a quicker solution, you can change the tuning strategy to a golden-section path, which tries to find the elbow point of the tuning criterion over the hyperparameter space.
Here we give an example.

.. GENERATED FROM PYTHON SOURCE LINES 109-115

.. code-block:: Python

    model_gs = LogisticRegression(path_type="gs", s_min=0, s_max=s)
    model_gs.fit(train_x, train_y)
    print(model_gs.coef_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [-0.05410776 -0.53642966  0.          0.          1.74091231  0.         -1.26223831  2.7096497   0.          0.          0.          0.        ]

.. GENERATED FROM PYTHON SOURCE LINES 116-123

Here ``s_min`` and ``s_max`` bound the support size, and this model gives the same answer as before.

More on the Results
"""""""""""""""""""

After fitting with ``model.fit()``, we can explore the fitted model further to interpret it.
As shown above, ``model.coef_`` contains the sparse coefficients, and the non-zero values indicate the "important" variables chosen by the model.

.. GENERATED FROM PYTHON SOURCE LINES 123-129

.. code-block:: Python

    print('Intercept: ', model.intercept_)
    print('coefficients: \n', model.coef_)
    print('Used variables\' index:', np.nonzero(model.coef_ != 0)[0])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Intercept:  0.5742977507847972
    coefficients: 
     [-0.05410776 -0.53642966  0.          0.          1.74091231  0.         -1.26223831  2.7096497   0.          0.          0.          0.        ]
    Used variables' index: [0 1 4 6 7]

.. GENERATED FROM PYTHON SOURCE LINES 130-131

The training loss and the score under the information criterion:

.. GENERATED FROM PYTHON SOURCE LINES 131-136

.. code-block:: Python

    print('Training Loss: ', model.train_loss_)
    print('IC: ', model.eval_loss_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Training Loss:  204.35270048059678
    IC:  464.3920499135153

.. GENERATED FROM PYTHON SOURCE LINES 137-139

Prediction is supported for the estimated model.
Just call the ``model.predict()`` function:

.. GENERATED FROM PYTHON SOURCE LINES 139-143

.. code-block:: Python

    fitted_y = model.predict(test_x)
    print(fitted_y)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 0 1 0 0 0 1 0 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 0 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
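These hard predictions can be compared with the held-out labels to get a rough sense of out-of-sample accuracy.
A minimal sketch using scikit-learn's ``accuracy_score`` (this evaluation step is not part of the original example):

.. code-block:: Python

    from sklearn.metrics import accuracy_score

    # fraction of correctly classified passengers in the test set
    print('test accuracy: ', accuracy_score(test_y, fitted_y))

..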
GENERATED FROM PYTHON SOURCE LINES 144-148 Besides, we can also call for the survival probability of each observation by ``model.predict_proba()``, which returns a matrix with two columns. The first column corresponds to probability for class "0", and the second is for class "1" (survived). Then, we tend to classify the observation into more likely class. .. GENERATED FROM PYTHON SOURCE LINES 148-153 .. code-block:: Python fitted_p = model.predict_proba(test_x) print(fitted_p) .. rst-class:: sphx-glr-script-out .. code-block:: none [[0.50743387 0.49256613] [0.74057032 0.25942968] [0.15071537 0.84928463] [0.79795817 0.20204183] [0.96198452 0.03801548] [0.95977651 0.04022349] [0.27648557 0.72351443] [0.76884378 0.23115622] [0.76884378 0.23115622] [0.33165327 0.66834673] [0.03224465 0.96775535] [0.35094054 0.64905946] [0.01538079 0.98461921] [0.84761133 0.15238867] [0.74995921 0.25004079] [0.42359788 0.57640212] [0.73004032 0.26995968] [0.28735418 0.71264582] [0.62208165 0.37791835] [0.8228686 0.1771314 ] [0.74226703 0.25773297] [0.24607858 0.75392142] [0.12025589 0.87974411] [0.59748431 0.40251569] [0.43558118 0.56441882] [0.65942131 0.34057869] [0.77994844 0.22005156] [0.932841 0.067159 ] [0.42119469 0.57880531] [0.66352233 0.33647767] [0.84344878 0.15655122] [0.97317339 0.02682661] [0.85446957 0.14553043] [0.30336212 0.69663788] [0.10921555 0.89078445] [0.12074848 0.87925152] [0.08073996 0.91926004] [0.40918613 0.59081387] [0.57002721 0.42997279] [0.54346526 0.45653474] [0.61153036 0.38846964] [0.90979818 0.09020182] [0.94257539 0.05742461] [0.92226281 0.07773719] [0.9005148 0.0994852 ] [0.88993666 0.11006334] [0.0180426 0.9819574 ] [0.85780137 0.14219863] [0.8903911 0.1096089 ] [0.03059829 0.96940171] [0.28648812 0.71351188] [0.30336212 0.69663788] [0.36336243 0.63663757] [0.74057032 0.25942968] [0.45021417 0.54978583] [0.46690207 0.53309793] [0.92967528 0.07032472] [0.9293708 0.0706292 ] [0.13110112 0.86889888] [0.62098833 0.37901167] [0.56123326 0.43876674] [0.96915459 0.03084541] [0.85446957 0.14553043] [0.80006385 0.19993615] [0.70819044 0.29180956] [0.88171401 0.11828599] [0.05413855 0.94586145] [0.69389487 0.30610513] [0.01236779 0.98763221] [0.19088286 0.80911714] [0.74057032 0.25942968] [0.06948297 0.93051703] [0.0902975 0.9097025 ] [0.48714638 0.51285362] [0.95075583 0.04924417] [0.46234646 0.53765354] [0.51757961 0.48242039] [0.73959052 0.26040948] [0.90525825 0.09474175] [0.6615436 0.3384564 ] [0.44892685 0.55107315] [0.11974729 0.88025271] [0.90941602 0.09058398] [0.18266554 0.81733446] [0.13163148 0.86836852] [0.90525825 0.09474175] [0.95538456 0.04461544] [0.71924495 0.28075505] [0.21109988 0.78890012] [0.86106974 0.13893026] [0.97565829 0.02434171] [0.95302055 0.04697945] [0.29853147 0.70146853] [0.08595031 0.91404969] [0.33767709 0.66232291] [0.9005148 0.0994852 ] [0.06280397 0.93719603] [0.1577817 0.8422183 ] [0.8903911 0.1096089 ] [0.84530315 0.15469685] [0.84761133 0.15238867] [0.14120978 0.85879022] [0.77994844 0.22005156] [0.75908805 0.24091195] [0.78831956 0.21168044] [0.84761133 0.15238867] [0.39506122 0.60493878] [0.67355065 0.32644935] [0.73874787 0.26125213] [0.92482907 0.07517093] [0.86106974 0.13893026] [0.25965364 0.74034636] [0.15253925 0.84746075] [0.54786818 0.45213182] [0.9293708 0.0706292 ] [0.74057032 0.25942968] [0.77994844 0.22005156] [0.98164302 0.01835698] [0.85836737 0.14163263] [0.79788631 0.20211369] [0.84761133 0.15238867] [0.90009763 0.09990237] [0.76081454 0.23918546] [0.26927389 0.73072611] [0.73784984 0.26215016] [0.96391455 0.03608545] 
[0.96129876 0.03870124] [0.83746312 0.16253688] [0.25965364 0.74034636] [0.02006328 0.97993672] [0.91829389 0.08170611] [0.35926408 0.64073592] [0.15966607 0.84033393] [0.14789964 0.85210036] [0.19016604 0.80983396] [0.02742217 0.97257783] [0.36336243 0.63663757] [0.98180978 0.01819022] [0.95478642 0.04521358] [0.88499785 0.11500215] [0.64716682 0.35283318] [0.9395756 0.0604244 ] [0.19016604 0.80983396] [0.34572827 0.65427173] [0.43558118 0.56441882] [0.78909413 0.21090587] [0.90979818 0.09020182] [0.84761133 0.15238867] [0.90794231 0.09205769] [0.86741702 0.13258298] [0.92967528 0.07032472] [0.89556126 0.10443874] [0.32670564 0.67329436] [0.08952309 0.91047691] [0.12858887 0.87141113] [0.86741702 0.13258298] [0.86106974 0.13893026] [0.30998425 0.69001575] [0.0145825 0.9854175 ] [0.25965364 0.74034636] [0.04842691 0.95157309] [0.90009763 0.09990237] [0.02115516 0.97884484] [0.48933053 0.51066947] [0.95558225 0.04441775] [0.95558225 0.04441775] [0.71638648 0.28361352] [0.96512977 0.03487023] [0.50511029 0.49488971] [0.8821979 0.1178021 ] [0.35926408 0.64073592] [0.37487948 0.62512052] [0.02115516 0.97884484] [0.9293708 0.0706292 ] [0.49506961 0.50493039] [0.37596932 0.62403068] [0.13163148 0.86836852] [0.86106974 0.13893026] [0.82544239 0.17455761] [0.6968841 0.3031159 ] [0.92226281 0.07773719] [0.62098833 0.37901167] [0.88221559 0.11778441] [0.5298741 0.4701259 ] [0.59737712 0.40262288] [0.0630781 0.9369219 ] [0.82544239 0.17455761] [0.83310188 0.16689812] [0.33359333 0.66640667] [0.12661189 0.87338811] [0.75738401 0.24261599] [0.41474865 0.58525135] [0.23939759 0.76060241] [0.90941602 0.09058398] [0.041657 0.958343 ] [0.27018941 0.72981059] [0.69488121 0.30511879] [0.70819044 0.29180956] [0.22574405 0.77425595] [0.03224465 0.96775535] [0.9141412 0.0858588 ] [0.13163148 0.86836852] [0.96915459 0.03084541] [0.28099043 0.71900957] [0.91273698 0.08726302] [0.94704734 0.05295266] [0.65133737 0.34866263] [0.67146626 0.32853374] [0.965596 0.034404 ] [0.84049023 0.15950977] [0.08914497 0.91085503] [0.47466173 0.52533827] [0.19863876 0.80136124] [0.44777727 0.55222273] [0.92605446 0.07394554] [0.75082977 0.24917023] [0.23524154 0.76475846] [0.26568554 0.73431446] [0.72817106 0.27182894] [0.1023766 0.8976234 ] [0.32670564 0.67329436] [0.95558225 0.04441775] [0.69875031 0.30124969] [0.02351608 0.97648392] [0.83746312 0.16253688] [0.85107278 0.14892722] [0.97930718 0.02069282] [0.71732988 0.28267012] [0.94257539 0.05742461] [0.94987806 0.05012194] [0.87351692 0.12648308] [0.93254923 0.06745077] [0.91724157 0.08275843] [0.90979818 0.09020182] [0.932841 0.067159 ]] .. GENERATED FROM PYTHON SOURCE LINES 154-156 We can also generate an ROC curve and calculate tha AUC value. On this dataset, the AUC is 0.83, which is rather desirable. .. GENERATED FROM PYTHON SOURCE LINES 156-169 .. code-block:: Python from sklearn.metrics import roc_curve, auc import matplotlib.pyplot as plt fpr, tpr, _ = roc_curve(test_y, fitted_p[:, 1]) plt.plot(fpr, tpr) plt.plot([0, 1], [0, 1], 'k--', label="ROC curve (area = %0.2f)" % auc(fpr, tpr)) plt.xlabel('False Positive Rate') plt.ylabel('True Positive Rate') plt.title("Receiver operating characteristic (ROC) curve") plt.legend(loc="lower right") plt.show() .. image-sg:: /auto_gallery/1-glm/images/sphx_glr_plot_2_LogisticRegression_001.png :alt: Receiver operating characteristic (ROC) curve :srcset: /auto_gallery/1-glm/images/sphx_glr_plot_2_LogisticRegression_001.png :class: sphx-glr-single-img .. 
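If only the scalar AUC is needed, it can also be computed directly from the predicted probabilities, without building the curve first.
A small sketch using scikit-learn's ``roc_auc_score`` (not part of the original example):

.. code-block:: Python

    from sklearn.metrics import roc_auc_score

    # area under the ROC curve, from the class-1 (survived) probabilities
    print('AUC: ', roc_auc_score(test_y, fitted_p[:, 1]))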
.. GENERATED FROM PYTHON SOURCE LINES 170-213

Extension: Multi-class Classification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Multinomial logistic regression
"""""""""""""""""""""""""""""""

When the number of classes is more than 2, we call it a multi-class classification task.
Logistic regression can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.
Each object detected in the image is assigned a probability between 0 and 1, and these probabilities sum to one.
The extended model is multinomial logistic regression.

To arrive at the multinomial logistic model, one can imagine, for :math:`K` possible classes, running :math:`K-1` independent logistic regression models, in which one class is chosen as a "pivot" and the other :math:`K-1` classes are separately regressed against the pivot outcome.
If class :math:`K` (the last outcome) is chosen as the pivot, this proceeds as follows:

.. math::
    \ln (\mathbb{P}(y=1)/\mathbb{P}(y=K)) &= x^T\beta^{(1)},\\
    & \cdots\\
    \ln (\mathbb{P}(y=K-1)/\mathbb{P}(y=K)) &= x^T\beta^{(K-1)}.

Then, the probability of the :math:`j`-th class (for :math:`j = 1, \dots, K-1`) can be derived as:

.. math::
    \mathbb{P}(y=j) = \frac{\exp(x^T\beta^{(j)})}{1+\sum_{k=1}^{K-1} \exp(x^T\beta^{(k)})},

and the pivot class :math:`K` takes the remaining probability.
Subsequently, we would predict that an object belongs to the :math:`j^*`-th class, where :math:`j^*=\arg\max_j \mathbb{P}(y=j)`.
(A small numerical sketch of this formula is given after the simulated data below.)

Notice that, in the case of :math:`K` possible classes, there are :math:`p\times(K-1)` unknown parameters :math:`\beta^{(1)},\dots,\beta^{(K-1)}` to be estimated.
Because the number of parameters grows with :math:`K`, it is even more important to constrain the model complexity.
Best subset selection for multinomial logistic regression aims to maximize the log-likelihood function while controlling the model complexity by restricting :math:`B=(\beta^{(1)},\dots,\beta^{(K-1)})` with :math:`||B||_{0,2}\leq s`, where :math:`||B||_{0,2}=\sum_{i=1}^p I(B_{i\cdot}\neq 0)`, :math:`B_{i\cdot}` is the :math:`i`-th row of the coefficient matrix :math:`B`, and :math:`0\in \mathbb{R}^{K-1}` is an all-zero vector.
In other words, each row of :math:`B` is either all zero or all non-zero.

Simulated Data Example
~~~~~~~~~~~~~~~~~~~~~~

We shall conduct multinomial logistic regression on an artificial dataset for demonstration.
The ``make_multivariate_glm_data()`` function provides a simple way to generate a suitable dataset for this task.

The assumption behind this model is that the response vector follows a multinomial distribution.
The artificial dataset contains 100 observations and 20 predictors, but only five predictors have an influence on the three possible classes.

.. GENERATED FROM PYTHON SOURCE LINES 213-226

.. code-block:: Python

    from abess.datasets import make_multivariate_glm_data

    n = 100  # sample size
    p = 20   # all predictors
    k = 5    # real predictors
    M = 3    # number of classes

    np.random.seed(0)
    dt = make_multivariate_glm_data(n=n, p=p, k=k, family="multinomial", M=M)
    print(dt.coef_)
    print('real variables\' index:\n', set(np.nonzero(dt.coef_)[0]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 1.09734231  4.03598978  0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 9.91227834 -3.47987303  0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 8.93282229  8.93249765  0.        ]
     [-4.03426165 -2.70336848  0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [-5.53475149 -2.65928982  0.        ]
     [ 0.          0.          0.        ]]
    real variables' index:
     {2, 5, 10, 11, 18}
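To make the class-probability formula above concrete, here is a small NumPy sketch that converts the :math:`K-1` linear predictors :math:`x^T\beta^{(k)}` into class probabilities, with class :math:`K` as the pivot.
The function name and the example numbers are purely illustrative and not part of ``abess``:

.. code-block:: Python

    def pivot_probabilities(eta):
        # eta holds the K-1 linear predictors x^T beta^(1), ..., x^T beta^(K-1)
        expo = np.exp(eta)
        denom = 1.0 + expo.sum()
        # append the pivot class K, whose linear predictor is fixed at 0
        return np.append(expo, 1.0) / denom

    probs = pivot_probabilities(np.array([0.4, -1.2]))  # K = 3 classes
    print(probs, probs.sum())  # the probabilities sum to one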
.. GENERATED FROM PYTHON SOURCE LINES 227-229

To carry out best subset selection for multinomial logistic regression, we can call ``MultinomialRegression()``.
Here is an example.

.. GENERATED FROM PYTHON SOURCE LINES 229-236

.. code-block:: Python

    from abess import MultinomialRegression

    s = 5
    model = MultinomialRegression(support_size=range(0, s + 1))
    model.fit(dt.x, dt.y)
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MultinomialRegression(support_size=range(0, 6))


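Before inspecting the estimated coefficients, the fitted class probabilities can be checked.
Assuming ``MultinomialRegression`` exposes a ``predict_proba()`` method analogous to the binary case (worth verifying against the API reference), each row of the returned matrix should sum to one:

.. code-block:: Python

    # predicted probabilities for the three classes on the training data
    proba = model.predict_proba(dt.x)
    print(proba[:5])
    print(proba[:5].sum(axis=1))  # each row sums to 1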
.. GENERATED FROM PYTHON SOURCE LINES 237-239

Its use is quite similar to ``LogisticRegression``.
We can inspect the coefficients to recognize the "in-model" variables.

.. GENERATED FROM PYTHON SOURCE LINES 239-244

.. code-block:: Python

    print('intercept:\n', model.intercept_)
    print('coefficients:\n', model.coef_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    intercept:
     [0.83130839 0.17875959 0.        ]
    coefficients:
     [[ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.2630968   1.97016804  0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 6.9160885  -2.36539929  0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 7.86005067  7.21979657  0.        ]
     [-3.14548748 -1.6343453   0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [-4.80791465 -1.67622322  0.        ]
     [ 0.          0.          0.        ]]

.. GENERATED FROM PYTHON SOURCE LINES 245-247

The variables used in the model can thus be recognized, and they coincide with the truly relevant variables used to generate the data.

.. GENERATED FROM PYTHON SOURCE LINES 247-251

.. code-block:: Python

    print('used variables\' index:\n', set(np.nonzero(model.coef_)[0]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    used variables' index:
     {2, 5, 10, 11, 18}

.. GENERATED FROM PYTHON SOURCE LINES 252-255

The ``abess`` R package also supports classification tasks.
For the R tutorial, please visit https://abess-team.github.io/abess/articles/v03-classification.html.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.246 seconds)


.. _sphx_glr_download_auto_gallery_1-glm_plot_2_LogisticRegression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_2_LogisticRegression.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_2_LogisticRegression.py `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_