.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/1-glm/plot_3_MultiTaskLearning.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_1-glm_plot_3_MultiTaskLearning.py:

================================
Multi-Response Linear Regression
================================

.. GENERATED FROM PYTHON SOURCE LINES 7-48

Introduction: model setting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Multi-response linear regression (a.k.a. multi-task learning) aims at
predicting multiple responses at the same time. It is therefore a natural
extension of classical linear regression, where the response is univariate.
Multi-response linear regression (MRLR) is very helpful for analyzing
correlated responses, such as chemical measurements for soil samples and
microRNAs associated with Glioblastoma multiforme cancer.

Suppose :math:`y` is an :math:`m`-dimensional response variable,
:math:`x` is a :math:`p`-dimensional vector of predictors, and
:math:`B \in R^{p \times m}` is the coefficient matrix. The MRLR model for
the multivariate response is given by

.. math::
    y = B^\top x + \epsilon,

where :math:`\epsilon` is an :math:`m`-dimensional random noise variable
with zero mean.

Because of the Occam's razor principle or the high dimensionality of the
predictors, it is meaningful to use a small number of predictors to conduct
multi-task learning. For example, understanding the relationship between
gene expression and the symptoms of a disease is important for identifying
potential markers. Many diseases involve multiple manifestations, and those
manifestations are usually related. In some cases, it makes sense to predict
those manifestations using a small, shared set of predictors. The best
subset selection problem under the MRLR model is formulated as

.. math::
    \min_{B} \frac{1}{2n} \| Y - XB \|_{F}^2, \text{ subject to: } \| B \|_{0, 2} \leq s,

where :math:`Y \in R^{n \times m}` and :math:`X \in R^{n \times p}` record
the responses and predictors of :math:`n` observations, respectively. Here
:math:`\| B \|_{0, 2} = \sum_{i = 1}^{p} I(B_{i\cdot} \neq {\bf 0})`,
where :math:`B_{i\cdot}` is the :math:`i`-th row of the coefficient matrix
:math:`B` and :math:`{\bf 0} \in R^{m}` is an all-zero vector. In other
words, :math:`\| B \|_{0, 2}` counts the predictors that affect at least
one response.

Simulated Data Example
~~~~~~~~~~~~~~~~~~~~~~

We use an artificial dataset to demonstrate how to solve the best subset
selection problem for MRLR with the ``abess`` package. The
``make_multivariate_glm_data()`` function provides a simple way to generate
a suitable dataset for this task. The synthetic data have 100 observations
with 3-dimensional responses and 20-dimensional predictors. Note that only
three predictors have an impact on the responses.

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: Python

    from abess.datasets import make_multivariate_glm_data
    import numpy as np
    np.random.seed(0)

    n = 100
    p = 20
    M = 3
    k = 3

    data = make_multivariate_glm_data(n=n, p=p, M=M, k=k, family='multigaussian')
    print(data.y[0:5, ])
    print(data.coef_)
    print("non-zero: ", set(np.nonzero(data.coef_)[0]))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [[-4.47877355 -2.83660944  9.01903871]
     [ 6.89372511 -2.07996131 -4.30416456]
     [ 0.98334    -1.85903489 -5.4657045 ]
     [ 1.9803706   1.8026655  -6.35946312]
     [ 2.0746701   1.97692084  0.26295969]]
    [[ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.8880576   2.35738133  0.33938644]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 2.33460874 -3.0222518  -1.63030259]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [-0.25565796 -0.25578781 -3.82878688]
     [ 0.          0.          0.        ]]
    non-zero:  {2, 18, 5}

.. GENERATED FROM PYTHON SOURCE LINES 65-69

Model Fitting
"""""""""""""

To carry out sparse multi-task learning, we can call
``MultiTaskRegression`` as follows:

.. GENERATED FROM PYTHON SOURCE LINES 69-75

.. code-block:: Python

    from abess import MultiTaskRegression

    model = MultiTaskRegression()
    model.fit(data.x, data.y)

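The objective and the row-sparsity constraint from the model setting can be checked numerically. The following is a minimal NumPy sketch and is not part of ``abess``; the helper names ``row_sparsity`` and ``mrlr_loss`` and the small matrices are hypothetical, introduced only for illustration.

```python
import numpy as np


def row_sparsity(B):
    """Count the rows of B that are not identically zero (the l_{0,2} norm)."""
    return int(np.sum(np.any(B != 0, axis=1)))


def mrlr_loss(X, Y, B):
    """Evaluate ||Y - X B||_F^2 / (2n) for a p-by-m coefficient matrix B."""
    n = X.shape[0]
    residual = Y - X @ B
    return float(np.sum(residual ** 2) / (2 * n))


# Tiny hand-made example (illustrative values, not the tutorial's data).
X_demo = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])   # n = 2 observations, p = 3 predictors
B_demo = np.array([[1.0, -1.0],
                   [0.0, 0.0],
                   [0.5, 2.0]])        # p = 3 rows, m = 2 responses
Y_demo = X_demo @ B_demo               # noiseless responses

print(row_sparsity(B_demo))                       # number of active predictors
print(mrlr_loss(X_demo, Y_demo, B_demo))          # loss at the generating B
print(mrlr_loss(X_demo, Y_demo, np.zeros((3, 2))))  # loss of the empty model
```

A predictor counts toward the constraint as soon as any entry in its row is non-zero, which is why the selected support is shared across all responses.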
.. GENERATED FROM PYTHON SOURCE LINES 76-77

After fitting, ``model.coef_`` contains the estimated coefficients:

.. GENERATED FROM PYTHON SOURCE LINES 77-82

.. code-block:: Python

    print(model.coef_)
    print("non-zero: ", set(np.nonzero(model.coef_)[0]))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [[ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.82745448  2.38786974  0.32939017]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 2.42521165 -3.12093333 -1.76555086]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [ 0.          0.          0.        ]
     [-0.2299864  -0.15746641 -3.69082244]
     [ 0.          0.          0.        ]]
    non-zero:  {2, 18, 5}

.. GENERATED FROM PYTHON SOURCE LINES 83-89

The output shows that the support set is correctly identified and that the
estimated coefficients are close to the true values.

More on the results
"""""""""""""""""""

Since there are three responses, there are three solution paths, one per
response. To plot them, we refit the model with ``support_size`` fixed at
each level from 0 to 20 and record the coefficients:

.. GENERATED FROM PYTHON SOURCE LINES 89-131

.. code-block:: Python

    import matplotlib.pyplot as plt

    # coef[y, s, i]: coefficient of predictor i for response y at support size s
    coef = np.zeros((3, 21, 20))
    for s in range(21):
        model = MultiTaskRegression(support_size=s)
        model.fit(data.x, data.y)

        for y in range(3):
            coef[y, s, :] = model.coef_[:, y]

    plt.subplot(2, 2, 1)
    for i in range(20):
        plt.plot(coef[0, :, i])
    plt.xlabel('support_size')
    plt.ylabel('coefficient')
    plt.title('the 1st response\'s coef')

    plt.subplot(2, 2, 2)
    for i in range(20):
        plt.plot(coef[1, :, i])
    plt.xlabel('support_size')
    plt.ylabel('coefficient')
    plt.title('the 2nd response\'s coef')

    plt.subplot(2, 2, 3)
    for i in range(20):
        plt.plot(coef[2, :, i])
    plt.xlabel('support_size')
    plt.ylabel('coefficient')
    plt.title('the 3rd response\'s coef')

    # L2 norm of each predictor's coefficients across the three responses
    plt.subplot(2, 2, 4)
    coef_norm = np.sum(coef**2, axis=0)**0.5
    for i in range(20):
        plt.plot(coef_norm[:, i])
    plt.xlabel('support_size')
    plt.ylabel('L2 norm of coefficient')
    plt.title('the L2 norm of the coef')

    plt.subplots_adjust(wspace=0.6, hspace=1)
    plt.show()

.. image-sg:: /auto_gallery/1-glm/images/sphx_glr_plot_3_MultiTaskLearning_001.png
   :alt: the 1st response's coef, the 2nd response's coef, the 3rd response's coef, the L2 norm of the coef
   :srcset: /auto_gallery/1-glm/images/sphx_glr_plot_3_MultiTaskLearning_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 132-135

The ``abess`` R package also supports MRLR. For the R tutorial, see
https://abess-team.github.io/abess/articles/v06-MultiTaskLearning.html.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.319 seconds)
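To put numbers on the comparison between ``data.coef_`` and ``model.coef_`` made in the model fitting section, one can compare the two supports and the largest entrywise gap. This is a sketch using only NumPy: the non-zero rows are copied from the printed outputs above, and the variable names are illustrative rather than part of ``abess``.

```python
import numpy as np

p, m = 20, 3

# Non-zero rows copied from the printed data.coef_ (true coefficients).
true_coef = np.zeros((p, m))
true_coef[2] = [0.8880576, 2.35738133, 0.33938644]
true_coef[5] = [2.33460874, -3.0222518, -1.63030259]
true_coef[18] = [-0.25565796, -0.25578781, -3.82878688]

# Non-zero rows copied from the printed model.coef_ (estimates).
est_coef = np.zeros((p, m))
est_coef[2] = [0.82745448, 2.38786974, 0.32939017]
est_coef[5] = [2.42521165, -3.12093333, -1.76555086]
est_coef[18] = [-0.2299864, -0.15746641, -3.69082244]

# A predictor is in the support if any entry of its row is non-zero.
true_support = set(np.nonzero(np.any(true_coef != 0, axis=1))[0].tolist())
est_support = set(np.nonzero(np.any(est_coef != 0, axis=1))[0].tolist())

# Largest absolute difference between estimated and true coefficients.
max_abs_error = float(np.max(np.abs(est_coef - true_coef)))

print("support recovered:", true_support == est_support)
print("max abs error:", max_abs_error)
```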