.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/3-advanced-features/plot_best_group.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_gallery_3-advanced-features_plot_best_group.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_3-advanced-features_plot_best_group.py:


Best Subset of Group Selection
==============================

.. GENERATED FROM PYTHON SOURCE LINES 6-33

Introduction
------------

Best subset of group selection (BSGS) aims to choose a small number of non-overlapping groups of variables that best explain the response variable.
BSGS is practically useful because variables with a group structure arise ubiquitously.
For instance, a categorical variable with several levels is often represented by a group of dummy variables.
Besides, in a nonparametric additive model, a continuous component can be represented by a set of basis functions (e.g., a linear combination of spline basis functions).
Finally, specific prior knowledge can impose group structures on variables.
A typical example is that genes belonging to the same biological pathway can be considered as a group in genomic data analysis.
The figure below illustrates the difference between BSGS and best-subset selection.

.. image:: ../../Tutorial/figure/best-subset-group-selection.png

BSGS can be achieved by solving:

.. math::
    \min_{\beta\in \mathbb{R}^p} \frac{1}{2n} ||y-X\beta||_2^2,\; \textup{s.t.}\ ||\beta||_{0,2}\leq s,

where :math:`||\beta||_{0,2} = \sum_{j=1}^J I(||\beta_{G_j}||_2\neq 0)`, :math:`||\cdot||_2` is the :math:`\ell_2` norm, and the model size :math:`s` is a positive integer to be determined from data.
Despite the NP-hardness of this problem, Zhang et al. developed a certifiably polynomial-time algorithm to solve it.
This algorithm is integrated in the ``abess`` package, and users can handily select the best group subset by assigning a proper value to the ``group`` argument.

Using best group subset selection
---------------------------------

We generate a dataset ``data`` with 100 samples and 20 variables, in which two groups of 5 variables are truly relevant and the remaining 10 variables are irrelevant.

.. GENERATED FROM PYTHON SOURCE LINES 33-52

.. code-block:: Python

    import numpy as np
    from abess.datasets import make_glm_data
    from abess.linear import LinearRegression

    np.random.seed(0)

    # generate data: two active groups of five variables each
    n = 100
    p = 20
    k = 5
    coef1 = 0.5 * np.ones(5)
    coef2 = np.zeros(5)
    coef3 = 0.5 * np.ones(5)
    coef4 = np.zeros(5)
    coef = np.hstack((coef1, coef2, coef3, coef4))
    data = make_glm_data(n=n, p=p, k=k, family='gaussian', coef_=coef)

    print('real coefficients:\n', data.coef_, '\n')

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    real coefficients:
     [0.5 0.5 0.5 0.5 0.5 0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.5 0.  0.  0.
     0.  0. ]

.. GENERATED FROM PYTHON SOURCE LINES 53-54

Suppose we have prior information that every 5 adjacent variables form a group:

.. GENERATED FROM PYTHON SOURCE LINES 54-58

.. code-block:: Python

    group = np.linspace(0, 3, 4).repeat(5)
    print('group index:\n', group)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    group index:
     [0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 3. 3. 3. 3. 3.]

.. GENERATED FROM PYTHON SOURCE LINES 59-64

Then we can set the ``group`` argument in the model.
Note that ``support_size`` here indicates the number of selected groups, instead of the number of variables.
Similarly, ``always_select``, ``A_init`` and other "index"-related parameters should also be given as group indices, instead of variable indices (see the sketch below).
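For instance, the sketch below (not part of the generated output of this example) forces the first group into the model through ``always_select``; because ``group`` is supplied, the index ``0`` refers to the first *group* rather than the first variable. The variable name ``model0``, the forced group, and the candidate support sizes are purely illustrative.

.. code-block:: Python

    # a minimal sketch: with ``group`` supplied, ``always_select=[0]`` keeps
    # group 0 (the first five variables) in every candidate model, and
    # ``support_size`` counts groups rather than variables
    model0 = LinearRegression(support_size=range(1, 3), group=group,
                              always_select=[0])
    model0.fit(data.x, data.y)
    # report which group labels carry nonzero fitted coefficients
    print('selected group indices:\n', np.unique(group[model0.coef_ != 0]))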
.. GENERATED FROM PYTHON SOURCE LINES 64-70

.. code-block:: Python

    model1 = LinearRegression(support_size=range(3), group=group)
    model1.fit(data.x, data.y)
    print('coefficients:\n', model1.coef_)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    coefficients:
     [0.65915697 0.45713643 0.49044526 0.43927599 0.62863533 0.         0.
     0.         0.         0.         0.575272   0.41249505 0.37598688
     0.59901008 0.58798189 0.         0.         0.         0.         0.        ]

.. GENERATED FROM PYTHON SOURCE LINES 71-75

The fitted result suggests that exactly two groups are selected (since ``support_size`` ranges from 0 to 2), and the selected variables are shown above.
Next, we compare this result with the one obtained without a given group structure.

.. GENERATED FROM PYTHON SOURCE LINES 75-79

.. code-block:: Python

    model2 = LinearRegression()
    model2.fit(data.x, data.y)
    print('coefficients:\n', model2.coef_)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    coefficients:
     [0.61823344 0.54500673 0.59272352 0.42754021 0.65843857 0.         0.
     0.         0.         0.         0.         0.         0.
     0.66978731 0.55137187 0.         0.         0.         0.         0.        ]

.. GENERATED FROM PYTHON SOURCE LINES 80-89

The model fitted without the group structure omits three predictors that belong to the true active set.

The ``abess`` R package also supports best group subset selection.
For the R tutorial, please view https://abess-team.github.io/abess/articles/v07-advancedFeatures.html.

sphinx_gallery_thumbnail_path = 'Tutorial/figure/best-subset-group-selection.png'


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.007 seconds)


.. _sphx_glr_download_auto_gallery_3-advanced-features_plot_best_group.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_best_group.ipynb <plot_best_group.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_best_group.py <plot_best_group.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_