.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/4-computation-tips/plot_specific_models.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_gallery_4-computation-tips_plot_specific_models.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_4-computation-tips_plot_specific_models.py:


Specific Models
===============

.. GENERATED FROM PYTHON SOURCE LINES 7-38

Introduction
^^^^^^^^^^^^
In the algorithm presented in "`ABESS algorithm: details `__",
one of the bottlenecks is the computation of forward and backward sacrifices,
which requires running iterative algorithms or repeatedly visiting all :math:`p` variables.
To improve computational efficiency, we designed specialized strategies for computing
forward and backward sacrifices for different models.
These strategies roughly fall into two classes:
(i) a covariance update for (multivariate) linear models;
(ii) a quasi-Newton iteration for non-linear models (e.g., logistic regression).
We describe the two strategies below.

Covariance update
^^^^^^^^^^^^^^^^^
Under the linear model, the core bottleneck is computing the sacrifices, e.g., the forward sacrifices,

.. math::
    \zeta_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+\hat{t}^{\{j\}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{\hat{\boldsymbol d}_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2},

where
:math:`\hat{t}=\arg \min _{t} \mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+t^{\{j\}}\right), \hat{\boldsymbol d}_{j}=X_{j}^{\top}(y-X \hat{\boldsymbol{\beta}}) / n`.
Intuitively, for :math:`j \in \mathcal{A}` (or :math:`j \in \mathcal{I}`),
a large :math:`\xi_{j}` (or :math:`\zeta_{j}`) implies the :math:`j`-th variable is potentially important.

Calculating :math:`X^T_jy`, :math:`X^T_jX_j` and its inverse repeatedly would take a lot of time.
To speed this up, there is actually no need to recompute these quantities at each splicing step.
Instead, they can be stored when first calculated, which is what we call the "covariance update".
It is easy to enable this feature with the additional argument ``covariance_update=True`` for linear models, for example:

.. GENERATED FROM PYTHON SOURCE LINES 38-61

.. code-block:: Python


    import numpy as np
    from time import time
    from abess.linear import LinearRegression
    from abess.datasets import make_glm_data

    np.random.seed(1)
    data = make_glm_data(n=10000, p=100, k=10, family='gaussian')

    model1 = LinearRegression()
    model2 = LinearRegression(covariance_update=True)

    t1 = time()
    model1.fit(data.x, data.y)
    t1 = time() - t1

    t2 = time()
    model2.fit(data.x, data.y)
    t2 = time() - t2

    print(f"No covariance update: {t1}")
    print(f"Covariance update: {t2}")
    print(f"Same answer? {(model1.coef_==model2.coef_).all()}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    No covariance update: 2.492441415786743
    Covariance update: 2.3630528450012207
    Same answer? True


.. GENERATED FROM PYTHON SOURCE LINES 62-68

We can see that the covariance update speeds up computation when the sample size
:math:`n` is much larger than the dimension :math:`p`.
However, we have to point out that the covariance update causes higher memory usage,
especially when :math:`p` is large.
So, we recommend enabling the covariance update for fast computation when the
sample size is much larger than the dimension and the dimension is moderate
(:math:`p \leq 2000`).
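To make the idea concrete, below is a minimal NumPy sketch of the covariance-update bookkeeping.
It is an illustration only, not the ``abess`` implementation, and the variable names are made up
for the example: the Gram matrix :math:`X^\top X` and the covariance :math:`X^\top y` are computed
once and reused for every sacrifice evaluation, so the :math:`n \times p` data matrix never has to
be revisited.

.. code-block:: Python

    import numpy as np

    # Sketch of the covariance-update idea (illustrative names, not abess internals).
    rng = np.random.default_rng(0)
    n, p = 10000, 100
    X = rng.standard_normal((n, p))
    y = X[:, :10] @ rng.standard_normal(10) + rng.standard_normal(n)

    # Cached once and reused at every splicing step.
    gram = X.T @ X        # p x p Gram matrix: the source of the extra memory cost
    cov_xy = X.T @ y      # X_j^T y for every j

    def forward_sacrifices(beta):
        """Forward sacrifices zeta_j computed from the cached quantities only."""
        d = (cov_xy - gram @ beta) / n          # d_j = X_j^T (y - X beta) / n
        col_gram = np.diag(gram)                # X_j^T X_j for every j
        return col_gram / (2 * n) * (d / (col_gram / n)) ** 2

    # With beta = 0, variables carrying signal should receive the largest sacrifices.
    zeta = forward_sacrifices(np.zeros(p))
    print(np.argsort(zeta)[::-1][:10])

In this sketch, building ``gram`` costs :math:`O(np^2)` time once and :math:`O(p^2)` memory
thereafter, which mirrors the trade-off above: attractive when :math:`n \gg p`, but memory-hungry
when :math:`p` is large.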
.. GENERATED FROM PYTHON SOURCE LINES 71-116

Quasi-Newton iteration
^^^^^^^^^^^^^^^^^^^^^^

In the third step of `Algorithm 2 `__, we need to solve a convex optimization problem:

.. math::
    \tilde{\beta} = \arg\min_{\text{supp}(\beta) = \tilde{\mathcal{A}} } l_n(\beta ).

In general, it has no closed-form solution and has to be solved via an iterative algorithm.
A natural method for solving this problem is Newton's method, i.e., conducting the update

.. math::
    \beta_{\tilde{\mathcal{A}} }^{m+1} \leftarrow \boldsymbol \beta_{\tilde{\mathcal{A}} }^m - \Big( \left.\frac{\partial^2 l_n( \boldsymbol \beta )}{ (\partial \boldsymbol \beta_{\tilde{\mathcal{A}}} )^2 }\right|_{\boldsymbol \beta = \boldsymbol \beta^m} \Big)^{-1} \Big( \left.\frac{\partial l_n( \boldsymbol \beta )}{ \partial \boldsymbol \beta_{\tilde{\mathcal{A}}} }\right|_{\boldsymbol \beta = \boldsymbol \beta^m} \Big),

until :math:`\| \beta_{\tilde{\mathcal{A}} }^{m+1} - \beta_{\tilde{\mathcal{A}} }^{m}\|_2 \leq \epsilon` or :math:`m \geq k`,
where :math:`\epsilon, k` are two user-specified parameters.
Generally, setting :math:`\epsilon = 10^{-6}` and :math:`k = 80` achieves a desirable estimate.

However, inverting the second derivative is computationally intensive, and thus we approximate it
with its diagonalized version. The update formula then becomes

.. math::
    \beta_{\tilde{\mathcal{A}} }^{m+1} \leftarrow \boldsymbol \beta_{\tilde{\mathcal{A}} }^m - \rho D \Big( \left.\frac{\partial l_n( \boldsymbol \beta )}{ \partial \boldsymbol \beta_{\tilde{\mathcal{A}}} }\right|_{\boldsymbol \beta = \boldsymbol \beta^m} \Big),

where
:math:`D = \textup{diag}\Big( \big(\left.\frac{\partial^2 l_n( \boldsymbol \beta )}{ (\partial \boldsymbol \beta_{\tilde{\mathcal{A}}_{1}} )^2 }\right|_{\boldsymbol \beta = \boldsymbol \beta^m} \big)^{-1}, \ldots, \big(\left.\frac{\partial^2 l_n( \boldsymbol \beta )}{ (\partial \boldsymbol \beta_{\tilde{\mathcal{A}}_{|\tilde{\mathcal{A}}|}} )^2 }\right|_{\boldsymbol \beta = \boldsymbol \beta^m} \big)^{-1}\Big)`
and :math:`\rho` is the step size.
Although this approximation may increase the number of iterations,
it avoids the heavy computational cost of inverting the matrix.

Furthermore, we use a heuristic strategy to reduce the number of iterations.
Observing that a new support obtained by exchanging elements between the active and inactive sets
may not reduce the loss function, we can stop the Newton iteration early on such supports.
Specifically, suppose :math:`l_1 = L({\beta}^{m}), l_2 = L({\beta}^{m+1})`.
If :math:`l_1 - (k - m - 1) \times (l_2 - l_1) > L - \tau`,
then we can expect that the new support cannot lead to a better loss after :math:`k` iterations,
and hence there is no need to conduct the remaining :math:`k - m - 1` Newton updates.
This heuristic strategy is motivated by the fact that the convergence rate of Newton's method is at least linear.

|image0|
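To illustrate the update formula before turning to the ``abess`` interface, here is a minimal NumPy
sketch of the diagonally approximated Newton iteration for logistic regression on a fixed support,
using the :math:`\epsilon` / :math:`k` stopping rule described above (the early-stopping heuristic
is omitted for brevity). It is a sketch under these assumptions, not the ``abess`` implementation,
and the helper names are invented for the example.

.. code-block:: Python

    import numpy as np

    # Toy data: a logistic model whose signal lives on the first ten variables.
    rng = np.random.default_rng(1)
    n, p = 1000, 100
    X = rng.standard_normal((n, p))
    support = np.arange(10)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, support] @ rng.standard_normal(10)))))

    def fit_on_support(support, eps=1e-6, max_iter=80, rho=1.0):
        """Diagonally approximated Newton iteration for supp(beta) = support."""
        Xs = X[:, support]
        beta = np.zeros(len(support))
        for m in range(max_iter):
            prob = 1.0 / (1.0 + np.exp(-(Xs @ beta)))
            grad = Xs.T @ (prob - y) / n                            # first derivative
            hess_diag = ((prob * (1 - prob))[:, None] * Xs**2).sum(axis=0) / n
            step = rho * grad / np.maximum(hess_diag, 1e-12)        # D times gradient
            beta -= step
            if np.linalg.norm(step) <= eps:                         # ||beta^{m+1} - beta^m|| <= eps
                break
        return beta

    print(np.round(fit_on_support(support), 2))

Each iteration here only needs the gradient and the diagonal curvature, costing
:math:`O(n|\tilde{\mathcal{A}}|)`, whereas forming and inverting the full Hessian would cost
:math:`O(n|\tilde{\mathcal{A}}|^2 + |\tilde{\mathcal{A}}|^3)` per update.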
To enable this feature, you can simply pass the additional argument ``approximate_Newton=True``.
The :math:`\epsilon` and :math:`k` mentioned before can be set with ``primary_model_fit_epsilon``
and ``primary_model_fit_max_iter``, respectively. For example:

.. GENERATED FROM PYTHON SOURCE LINES 116-141

.. code-block:: Python


    import numpy as np
    from time import time
    from abess.linear import LogisticRegression
    from abess.datasets import make_glm_data

    np.random.seed(1)
    data = make_glm_data(n=1000, p=100, k=10, family='binomial')

    model1 = LogisticRegression()
    model2 = LogisticRegression(approximate_Newton=True,
                                primary_model_fit_epsilon=1e-6,
                                primary_model_fit_max_iter=10)

    t1 = time()
    model1.fit(data.x, data.y)
    t1 = time() - t1

    t2 = time()
    model2.fit(data.x, data.y)
    t2 = time() - t2

    print(f"No newton: {t1}")
    print(f"Newton: {t2}")
    print(f"Same answer? {(np.nonzero(model1.coef_)[0]==np.nonzero(model2.coef_)[0]).all()}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    No newton: 7.58605432510376
    Newton: 2.6649746894836426
    Same answer? True


.. GENERATED FROM PYTHON SOURCE LINES 142-146

The ``abess`` R package also supports the covariance update and quasi-Newton iteration.
For the R tutorial, please see
https://abess-team.github.io/abess/articles/v09-fasterSetting.html

.. |image0| image:: ../../Tutorial/figure/convergence_rates.png


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 15.151 seconds)


.. _sphx_glr_download_auto_gallery_4-computation-tips_plot_specific_models.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_specific_models.ipynb <plot_specific_models.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_specific_models.py <plot_specific_models.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_