.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_4-computation-tips_plot_sparse_inputs.py:

Sparse Inputs
=============

We sometimes encounter problems where the :math:`N \times p` input matrix :math:`X` is extremely sparse, i.e., many entries in :math:`X` are zero. A notable example comes from document classification, which aims to assign classes to documents so that they are easier for publishers and news sites to manage. The input variables that characterize a document are generated from a so-called "bag-of-words" model. In this model, each variable scores the presence of one word from the entire dictionary under consideration. Since most words are absent from any given document, the input variables for each document are mostly zero, and so the entire matrix is mostly zero.

Example
^^^^^^^

We create a sparse matrix as our example:

.. code-block:: Python

    from time import time
    from abess import LinearRegression
    from scipy.sparse import coo_matrix
    import numpy as np
    import matplotlib.pyplot as plt

    # COO (coordinate) format: data[k] is stored at position (row[k], col[k])
    row = np.array([0, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9])
    col = np.array([0, 3, 1, 2, 4, 3, 5, 2, 3, 1, 5, 2])
    data = np.array([4, 5, 7, 9, 1, 23, 4, 5, 6, 8, 77, 100])
    x = coo_matrix((data, (row, col)))

And visualize the sparsity pattern via:

.. code-block:: Python

    plt.spy(x)
    plt.show()

.. image-sg:: /auto_gallery/4-computation-tips/images/sphx_glr_plot_sparse_inputs_001.png
   :alt: plot sparse inputs
   :srcset: /auto_gallery/4-computation-tips/images/sphx_glr_plot_sparse_inputs_001.png
   :class: sphx-glr-single-img

Usage: sparse matrix
^^^^^^^^^^^^^^^^^^^^

A sparse matrix can be used directly in the ``abess`` package. We just need to set the argument ``sparse_matrix=True``. Note that if the input is not already a sparse matrix, the program automatically converts it to one, so setting this argument can still bring some improvement.

.. code-block:: Python

    # Only the first three of the six variables are relevant
    coef = np.array([1, 1, 1, 0, 0, 0])
    y = x.dot(coef)

    model = LinearRegression()
    model.fit(x, y, sparse_matrix=True)
    print("real coef: \n", coef)
    print("pred coef: \n", model.coef_)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    real coef:
     [1 1 1 0 0 0]
    pred coef:
     [1. 1. 1. 0. 0. 0.]

Sparse vs. Dense: runtime comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We compare the runtime on a larger sparse data set:

.. code-block:: Python

    from scipy.sparse import rand
    from numpy.random import default_rng

    # 1000 x 200 random matrix in which about 1% of the entries are nonzero
    rng = default_rng(12345)
    x = rand(1000, 200, density=0.01, format='coo', random_state=rng)
    coef = np.repeat([1, 0], 100)
    y = x.dot(coef)

    # Fit on the dense representation
    t = time()
    model.fit(x.toarray(), y)
    print("dense matrix: ", time() - t)

    # Fit on the sparse representation
    t = time()
    model.fit(x, y, sparse_matrix=True)
    print("sparse matrix: ", time() - t)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    dense matrix: 0.263674259185791
    sparse matrix: 0.12428140640258789

From the comparison, we see that fitting with the sparse matrix takes less time, and this difference should become more visible as the sparse input matrix grows. Hence, we suggest passing a sparse matrix to ``abess`` when the input matrix has many zero entries.

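One way to act on this suggestion is to measure how sparse an input is before fitting. The sketch below is illustrative only: the helper names ``density`` and ``to_sparse_if_mostly_zero`` and the ``threshold`` value of 0.1 are assumptions made for this example, not part of ``abess`` and not a cutoff recommended by its authors.

.. code-block:: Python

    import numpy as np
    from scipy.sparse import coo_matrix, issparse


    def density(x):
        """Fraction of entries in ``x`` that are nonzero (hypothetical helper)."""
        n_rows, n_cols = x.shape
        if issparse(x):
            # count_nonzero() ignores zeros that happen to be stored explicitly
            nonzero = x.count_nonzero()
        else:
            nonzero = np.count_nonzero(x)
        return nonzero / (n_rows * n_cols)


    def to_sparse_if_mostly_zero(x, threshold=0.1):
        """Convert a dense ``x`` to COO format when its density is at most
        ``threshold``; otherwise return ``x`` unchanged.

        The default threshold is an arbitrary illustrative choice.
        """
        if issparse(x) or density(x) > threshold:
            return x
        return coo_matrix(x)


    # A 10 x 6 dense array with only three nonzero entries
    dense = np.zeros((10, 6))
    dense[0, 0] = 4
    dense[4, 3] = 23
    dense[9, 2] = 100

    x = to_sparse_if_mostly_zero(dense)
    print(density(dense))  # 3 nonzero entries out of 60
    print(issparse(x))

The returned matrix could then be passed to ``model.fit(x, y, sparse_matrix=True)`` as in the examples above. In practice, a suitable threshold would likely depend on the size of the problem, so timing both representations on the data at hand is the more reliable check.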
The ``abess`` R package also supports sparse matrices. For the R tutorial, please see https://abess-team.github.io/abess/articles/v09-fasterSetting.html

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.525 seconds)