.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/4-computation-tips/plot_large_sample.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_gallery_4-computation-tips_plot_large_sample.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_4-computation-tips_plot_large_sample.py:

Large-Sample Data
=================

.. GENERATED FROM PYTHON SOURCE LINES 7-14

Introduction
^^^^^^^^^^^^

.. image:: ../../Tutorial/figure/large-sample.png

A large sample size leads to a large range of possible support sizes, which
adds to the computational burden. The computational tip here is to use
golden-section search to avoid enumerating every support size.

.. GENERATED FROM PYTHON SOURCE LINES 16-19

A motivating observation
^^^^^^^^^^^^^^^^^^^^^^^^

Here we generate a simple example under the linear model via ``make_glm_data``.

.. GENERATED FROM PYTHON SOURCE LINES 19-36

.. code-block:: Python

    from time import time

    import numpy as np
    import matplotlib.pyplot as plt
    from abess.datasets import make_glm_data
    from abess.linear import LinearRegression

    np.random.seed(0)
    data = make_glm_data(n=100, p=20, k=5, family='gaussian')

    ic = np.zeros(21)
    for sz in range(21):
        model = LinearRegression(support_size=[sz], ic_type='ebic')
        model.fit(data.x, data.y)
        ic[sz] = model.eval_loss_

    print("lowest point: ", np.argmin(ic))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    lowest point:  5

.. GENERATED FROM PYTHON SOURCE LINES 37-43

The generated data contain 100 observations with 20 predictors, of which 5
are useful (i.e., have non-zero coefficients). Using the extended Bayesian
information criterion (EBIC), ``abess`` successfully detects the true
support size. We go further and take a look at the support size versus the
EBIC returned by ``LinearRegression`` in ``abess.linear``.

.. GENERATED FROM PYTHON SOURCE LINES 45-51

.. code-block:: Python

    plt.plot(ic, 'o-')
    plt.xlabel('support size')
    plt.ylabel('EBIC')
    plt.title('Model Selection via EBIC')
    plt.show()

.. image-sg:: /auto_gallery/4-computation-tips/images/sphx_glr_plot_large_sample_001.png
   :alt: Model Selection via EBIC
   :srcset: /auto_gallery/4-computation-tips/images/sphx_glr_plot_large_sample_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 52-65

From the figure, we can see that the curve is strictly unimodal, achieving
its minimum at the true subset size, where ``support_size = 5`` is the
lowest point. Motivated by this observation, we consider a golden-section
search technique to determine the optimal support size associated with the
minimum EBIC.

.. image:: ../../Tutorial/figure/goldenSection.png

Compared to sequential searching, golden-section search is much faster
because it skips support sizes that are unlikely to be optimal. Precisely,
while searching for the optimal support size one by one over a candidate set
requires :math:`O(s_{\max})` model fits, **golden-section** search reduces
the complexity to :math:`O(\ln(s_{\max}))`, giving a significant
computational improvement.
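To make the idea concrete, below is a minimal sketch of a golden-section
search over integer support sizes. It is **not** the internal implementation
of ``abess``; it simply illustrates the technique, reusing ``data`` and
``LinearRegression`` from the code above and assuming the criterion curve is
unimodal as observed. The helper ``ebic_at`` is a hypothetical name
introduced here for illustration.

.. code-block:: Python

    def ebic_at(sz):
        # hypothetical helper: fit at one support size and return its EBIC
        m = LinearRegression(support_size=[sz], ic_type='ebic')
        m.fit(data.x, data.y)
        return m.eval_loss_

    def golden_section(s_min, s_max):
        phi = (5 ** 0.5 - 1) / 2  # inverse golden ratio, about 0.618
        lo, hi = s_min, s_max
        while hi - lo > 2:
            m1 = int(round(hi - phi * (hi - lo)))
            m2 = int(round(lo + phi * (hi - lo)))
            if m1 == m2:  # keep the two probes distinct on the integer grid
                m2 = m1 + 1
            # unimodality lets us discard one side of the interval
            if ebic_at(m1) < ebic_at(m2):
                hi = m2
            else:
                lo = m1
        # enumerate the few remaining candidates
        return min(range(lo, hi + 1), key=ebic_at)

    print("golden-section estimate: ", golden_section(0, 20))

Each iteration shrinks the candidate interval by a constant factor, which is
exactly where the logarithmic number of model fits comes from; a production
implementation would also cache repeated evaluations.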
.. GENERATED FROM PYTHON SOURCE LINES 67-70

Usage: golden-section
^^^^^^^^^^^^^^^^^^^^^

In the ``abess`` package, the golden-section technique can be invoked as
follows:

.. GENERATED FROM PYTHON SOURCE LINES 70-76

.. code-block:: Python

    model = LinearRegression(path_type='gs', s_min=0, s_max=20)
    model.fit(data.x, data.y)
    print("real coef:\n", np.nonzero(data.coef_)[0])
    print("predicted coef:\n", np.nonzero(model.coef_)[0])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    real coef:
     [ 2  5 10 11 18]
    predicted coef:
     [ 2  5 10 11 18]

.. GENERATED FROM PYTHON SOURCE LINES 77-83

where ``path_type='gs'`` means using golden-section search rather than
searching the support sizes one by one, and ``s_min`` and ``s_max`` indicate
the left and right bounds of the support-size range. Note that with
golden-section searching we should not supply ``support_size``, which is
only useful for the sequential strategy. The output of the golden-section
strategy shows that the optimal model size is accurately detected.

.. GENERATED FROM PYTHON SOURCE LINES 86-90

Golden-section vs. sequential searching: runtime comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this part, we perform a runtime comparison experiment to demonstrate the
speed gain brought by the golden-section search.

.. GENERATED FROM PYTHON SOURCE LINES 90-102

.. code-block:: Python

    t1 = time()
    model = LinearRegression(support_size=range(21))
    model.fit(data.x, data.y)
    print("sequential time: ", time() - t1)

    t2 = time()
    model = LinearRegression(path_type='gs', s_min=0, s_max=20)
    model.fit(data.x, data.y)
    print("golden-section time: ", time() - t2)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    sequential time:  0.002244710922241211
    golden-section time:  0.0012612342834472656

.. GENERATED FROM PYTHON SOURCE LINES 103-105

The golden-section search runs much faster than the sequential method, and
the speed gain grows as the range of support sizes becomes larger.

.. GENERATED FROM PYTHON SOURCE LINES 107-113

The ``abess`` R package also supports golden-section search. For the R
tutorial, please visit
https://abess-team.github.io/abess/articles/v09-fasterSetting.html.

sphinx_gallery_thumbnail_path = 'Tutorial/figure/large-sample.png'

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.163 seconds)

.. _sphx_glr_download_auto_gallery_4-computation-tips_plot_large_sample.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_large_sample.ipynb <plot_large_sample.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_large_sample.py <plot_large_sample.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_