.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/5-scikit-learn-connection/plot_2_geomstats.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_gallery_5-scikit-learn-connection_plot_2_geomstats.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_5-scikit-learn-connection_plot_2_geomstats.py:

Work with geomstats
===================

.. GENERATED FROM PYTHON SOURCE LINES 7-12

The package `geomstats` is used for computations and statistics on nonlinear
manifolds, such as the hypersphere, hyperbolic space, the space of
symmetric positive-definite (SPD) matrices and the space of skew-symmetric
matrices. `abess` also works well with `geomstats`. Here is an example of
using `abess` for logistic regression on samples from a hypersphere; we
compare the precision score, the recall score and the running time of
`abess` with those of `scikit-learn`.

.. GENERATED FROM PYTHON SOURCE LINES 12-28

.. code-block:: Python

    import numpy as np
    import matplotlib.pyplot as plt
    import geomstats.backend as gs
    import geomstats.visualization as visualization
    from geomstats.learning.frechet_mean import FrechetMean
    from geomstats.geometry.hypersphere import Hypersphere
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score
    from sklearn.linear_model import LogisticRegression as sklLogisticRegression
    from abess import LogisticRegression
    import time
    import warnings
    warnings.filterwarnings("ignore")
    gs.random.seed(0)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    INFO: Using numpy backend

.. GENERATED FROM PYTHON SOURCE LINES 29-35

An Example
----------

Two sets of samples on a hypersphere in 3-dimensional Euclidean space are
created. The sample points in `data0` are distributed around
:math:`[-3/5, 0, 4/5]`, and the sample points in `data1` are distributed
around :math:`[3/5, 0, 4/5]`. The sample size of both sets is 100, and the
precision of both is set to 5. The two sets of samples are shown in the
figure below.

.. GENERATED FROM PYTHON SOURCE LINES 35-47

.. code-block:: Python

    sphere = Hypersphere(dim=2)
    data0 = sphere.random_riemannian_normal(mean=np.array([-3 / 5, 0, 4 / 5]),
                                            n_samples=100, precision=5)
    data1 = sphere.random_riemannian_normal(mean=np.array([3 / 5, 0, 4 / 5]),
                                            n_samples=100, precision=5)

    fig = plt.figure(figsize=(8, 8))
    ax = visualization.plot(data0, space="S2", color="black", alpha=0.7, label="data0 points")
    ax = visualization.plot(data1, space="S2", color="red", alpha=0.7, label="data1 points")
    ax.set_box_aspect([1, 1, 1])
    ax.legend()
    plt.show()

.. image-sg:: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_001.png
   :alt: plot 2 geomstats
   :srcset: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 48-51

Then we split the data into `train_data` and `test_data`, and compute the
Fréchet mean of `train_data`, i.e. the point that minimizes the sum of squared
geodesic distances to the sample points in `train_data`. The `test_data`, the
`train_data` and the Fréchet mean are shown in the figure below.

.. GENERATED FROM PYTHON SOURCE LINES 51-68

.. code-block:: Python

    labels = np.concatenate((np.zeros(data0.shape[0]), np.ones(data1.shape[0])))
    data = np.concatenate((data0, data1))
    train_data, test_data, train_labels, test_labels = train_test_split(
        data, labels, test_size=0.33, random_state=0)

    mean = FrechetMean(sphere)
    mean.fit(train_data)
    mean_estimate = mean.estimate_

    fig = plt.figure(figsize=(8, 8))
    ax = visualization.plot(train_data, space="S2", color="black", alpha=0.5, label="train data")
    ax = visualization.plot(test_data, space="S2", color="brown", alpha=0.5, label="test data")
    ax = visualization.plot(mean_estimate, space="S2", color="blue", s=100, label="frechet mean")
    ax.set_box_aspect([1, 1, 1])
    ax.legend()
    plt.show()

.. image-sg:: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_002.png
   :alt: plot 2 geomstats
   :srcset: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_002.png
   :class: sphx-glr-single-img

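As an optional sanity check (not part of the generated example), the defining
property of the Fréchet mean can be verified numerically: its sum of squared
geodesic distances to the training points should not exceed that of a nearby
point on the sphere. The sketch below reuses `sphere`, `train_data` and
`mean_estimate` from above; the perturbed `candidate` point is purely
illustrative.

.. code-block:: Python

    # Illustrative check only: perturb the estimated mean and project back onto the sphere.
    candidate = sphere.projection(mean_estimate + np.array([0.1, -0.1, 0.05]))

    # Sum of squared geodesic distances from the training points to each candidate.
    cost_mean = gs.sum(sphere.metric.squared_dist(train_data, mean_estimate))
    cost_candidate = gs.sum(sphere.metric.squared_dist(train_data, candidate))

    # The Fréchet mean is the minimizer, so its cost should be the smaller one.
    print(cost_mean <= cost_candidate)
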
.. GENERATED FROM PYTHON SOURCE LINES 69-72

Next, apply the logarithm map at the Fréchet mean to all sample points. Each
sample point is mapped to the tangent vector at the Fréchet mean that points
along the geodesic from the Fréchet mean to that sample point and whose length
equals the length of this geodesic.

.. GENERATED FROM PYTHON SOURCE LINES 72-76

.. code-block:: Python

    log_train_data = sphere.metric.log(train_data, mean_estimate)
    log_test_data = sphere.metric.log(test_data, mean_estimate)

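The logarithm map can be inverted with the exponential map, and every mapped
vector lies in the tangent space at the Fréchet mean, i.e. it is orthogonal to
`mean_estimate`. The short sketch below is an optional check of these two
facts; it only reuses objects defined above.

.. code-block:: Python

    # Illustrative check only: the exponential map at the Fréchet mean should
    # recover the original points from their tangent-space images.
    recovered = sphere.metric.exp(log_train_data, mean_estimate)
    print(np.allclose(recovered, train_data))

    # Tangent vectors at a point of the sphere are orthogonal to that point.
    print(np.allclose(log_train_data @ mean_estimate, 0.0))
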
.. GENERATED FROM PYTHON SOURCE LINES 77-78

The following figure shows the logarithm map of `train_data[5]` at the Fréchet
mean.

.. GENERATED FROM PYTHON SOURCE LINES 78-92

.. code-block:: Python

    geodesic = sphere.metric.geodesic(mean_estimate, end_point=train_data[5])
    points_on_geodesic = geodesic(gs.linspace(0.0, 1.0, 30))

    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, projection="3d")
    ax = visualization.plot(mean_estimate, space="S2", color="blue", s=100, label="frechet mean")
    ax = visualization.plot(train_data[5], space="S2", color="red", s=100, label="train_data[5]")
    ax = visualization.plot(points_on_geodesic, ax=ax, space="S2", color="black", alpha=0.5, label="Geodesic")
    arrow = visualization.Arrow3D(mean_estimate, vector=log_train_data[5])
    arrow.draw(ax, color="black")
    ax.legend()
    plt.show()

.. image-sg:: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_003.png
   :alt: plot 2 geomstats
   :srcset: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 93-95

After this step, the samples lie in a linear space (the tangent space at the
Fréchet mean), so common analysis methods can be applied to them, such as
`LogisticRegression` from `abess`.

.. GENERATED FROM PYTHON SOURCE LINES 95-103

.. code-block:: Python

    model = LogisticRegression(support_size=range(0, 4))
    model.fit(log_train_data, train_labels)
    fitted_labels = model.predict(log_test_data)

    print('Used variables\' index:', np.nonzero(model.coef_ != 0)[0])
    # fraction of correctly classified test samples
    print('accuracy:', sum((fitted_labels - test_labels + 1) % 2) / test_data.shape[0])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Used variables' index: [0]
    accuracy: 0.9090909090909091

.. GENERATED FROM PYTHON SOURCE LINES 104-107

The result shows that the fitted model uses only the variable with index
:math:`0`. When constructing the samples, the means of the two sets differ
only in the 0th direction, so `abess` correctly identifies the variable that
is most relevant for classification.

.. GENERATED FROM PYTHON SOURCE LINES 109-119

Comparison
----------

Here we compare the precision score, the recall score and the running time of
`abess` with those of `scikit-learn`. We repeat the experiment 50 times. Each
time, two sets of samples on a hypersphere in 10-dimensional Euclidean space
are created. The sample points in `data0` are distributed around
:math:`[1/3, 0, 2/3, 0, 2/3, 0, 0, 0, 0, 0]`, and the sample points in `data1`
are distributed around :math:`[0, 0, 2/3, 0, 2/3, 0, 0, 0, 0, 1/3]`. The
sample size of both sets is 200, and the precision of both is set to 5.

.. GENERATED FROM PYTHON SOURCE LINES 119-163

.. code-block:: Python

    m = 50       # number of repetitions
    n_sam = 200  # sample size of each set
    s = 10       # dimension of the ambient Euclidean space
    pre = 5      # precision of the Riemannian normal distributions
    sphere = Hypersphere(dim=s - 1)
    labels = np.concatenate((np.zeros(n_sam), np.ones(n_sam)))

    abess_precision_score = np.zeros(m)
    skl_precision_score = np.zeros(m)
    abess_recall_score = np.zeros(m)
    skl_recall_score = np.zeros(m)
    abess_geo_time = np.zeros(m)
    skl_geo_time = np.zeros(m)

    for i in range(m):
        data0 = sphere.random_riemannian_normal(
            mean=np.array([1 / 3, 0, 2 / 3, 0, 2 / 3, 0, 0, 0, 0, 0]),
            n_samples=n_sam, precision=pre)
        data1 = sphere.random_riemannian_normal(
            mean=np.array([0, 0, 2 / 3, 0, 2 / 3, 0, 0, 0, 0, 1 / 3]),
            n_samples=n_sam, precision=pre)
        data = np.concatenate((data0, data1))
        train_data, test_data, train_labels, test_labels = train_test_split(
            data, labels, test_size=0.33, random_state=0)

        mean = FrechetMean(sphere)
        mean.fit(train_data)
        mean_estimate = mean.estimate_
        log_train_data = sphere.metric.log(train_data, mean_estimate)
        log_test_data = sphere.metric.log(test_data, mean_estimate)

        start = time.time()
        abess_geo_model = LogisticRegression(support_size=range(0, s + 1)).fit(log_train_data, train_labels)
        abess_geo_fitted_labels = abess_geo_model.predict(log_test_data)
        end = time.time()
        abess_geo_time[i] = end - start
        abess_precision_score[i] = precision_score(test_labels, abess_geo_fitted_labels, average='micro')
        abess_recall_score[i] = recall_score(test_labels, abess_geo_fitted_labels, average='micro')

        start = time.time()
        skl_geo_model = sklLogisticRegression().fit(X=log_train_data, y=train_labels)
        skl_geo_fitted_labels = skl_geo_model.predict(log_test_data)
        end = time.time()
        skl_geo_time[i] = end - start
        skl_precision_score[i] = precision_score(test_labels, skl_geo_fitted_labels, average='micro')
        skl_recall_score[i] = recall_score(test_labels, skl_geo_fitted_labels, average='micro')

.. GENERATED FROM PYTHON SOURCE LINES 164-165

The following figures show the precision score and the recall score obtained
with `abess` and with `scikit-learn`.

.. GENERATED FROM PYTHON SOURCE LINES 165-188

.. code-block:: Python

    fig = plt.figure(figsize=(15, 5))

    ax1 = fig.add_subplot(121)
    ax1.boxplot([abess_precision_score, skl_precision_score],
                patch_artist=True,
                labels=['abess', 'scikit-learn'],
                boxprops={'color': 'black', 'facecolor': 'yellow'})
    ax1.set_title('precision score with abess or scikit-learn')
    ax1.set_ylabel('precision score')

    ax2 = fig.add_subplot(122)
    ax2.boxplot([abess_recall_score, skl_recall_score],
                patch_artist=True,
                labels=['abess', 'scikit-learn'],
                boxprops={'color': 'black', 'facecolor': 'yellow'})
    ax2.set_title('recall score with abess or scikit-learn')
    ax2.set_ylabel('recall score')

    plt.show()

.. image-sg:: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_004.png
   :alt: precision score with abess or scikit-learn, recall score with abess or scikit-learn
   :srcset: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_004.png
   :class: sphx-glr-single-img

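If a numeric summary is preferred over boxplots, the mean and standard
deviation of the recorded scores and running times can simply be printed. This
optional sketch only reuses the arrays filled in the loop above.

.. code-block:: Python

    # Optional numeric summary of the 50 repetitions (illustrative).
    for name, values in [('abess precision', abess_precision_score),
                         ('scikit-learn precision', skl_precision_score),
                         ('abess recall', abess_recall_score),
                         ('scikit-learn recall', skl_recall_score),
                         ('abess time (s)', abess_geo_time),
                         ('scikit-learn time (s)', skl_geo_time)]:
        print(f'{name}: {np.mean(values):.4f} +/- {np.std(values):.4f}')
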
.. GENERATED FROM PYTHON SOURCE LINES 189-190

The following figure shows the running time with `abess` or `scikit-learn`.

.. GENERATED FROM PYTHON SOURCE LINES 190-210

.. code-block:: Python

    abess_geo_time_mean = np.mean(abess_geo_time)
    skl_geo_time_mean = np.mean(skl_geo_time)
    abess_geo_time_std = np.std(abess_geo_time)
    skl_geo_time_std = np.std(skl_geo_time)

    meth = ['abess', 'scikit-learn']
    x_pos = np.arange(len(meth))
    CTEs = [abess_geo_time_mean, skl_geo_time_mean]
    error = [abess_geo_time_std, skl_geo_time_std]

    fig = plt.figure(figsize=(8, 5))
    ax = fig.add_subplot(111)
    ax.bar(x_pos, CTEs, yerr=error, align='center', alpha=0.5, ecolor='black', capsize=10)
    ax.set_ylabel('running time')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(meth)
    ax.set_title('running time with abess or scikit-learn')
    ax.yaxis.grid(True)
    plt.show()

.. image-sg:: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_005.png
   :alt: running time with abess or scikit-learn
   :srcset: /auto_gallery/5-scikit-learn-connection/images/sphx_glr_plot_2_geomstats_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 211-213

We can see that the precision score and the recall score obtained with `abess`
are generally higher than those obtained with `scikit-learn`, while the running
time of `abess` is only slightly longer than that of `scikit-learn`.

.. GENERATED FROM PYTHON SOURCE LINES 215-216

sphinx_gallery_thumbnail_path = 'Tutorial/figure/geomstats.png'

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.911 seconds)


.. _sphx_glr_download_auto_gallery_5-scikit-learn-connection_plot_2_geomstats.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_2_geomstats.ipynb <plot_2_geomstats.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_2_geomstats.py <plot_2_geomstats.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_