# Develop New Features¶

In this tutorial, we will show you how to develop a new algorithm for specific best-subset problem with `abess`

's procedure.

## Preliminaries¶

We have endeavor to make developing new features easily. Before developing the code, please make sure you have:

understood the ABESS algorithm;

installed

`abess`

via the code in github by following Installation instruction;read the Appendix: Architecture of abess of

`abess`

library;some experience on writing R or Python code.

## Core C++¶

The main files related to the core are in `src`

, which
are written in C++. Among them, some important files:

`api.cpp/api.h`

contain the API's, which are the entrance for both R & Python.`AlgorithmXXX.h`

records the implement of each concrete algorithm;

If you want to add a new algorithm, all of them should be noticed.

Besides, we have implemented ABESS algorithms for generalized linear model on
`src/AlgorithmGLM.h`

[code temp]
and principal component analysis (PCA) on `src/AlgorithmPCA.h`

[code temp].
You can check them to help your own developing.

### Write an API¶

API’s are all defined in the `src/api.cpp`

[code
temp]
and the related header file `src/api.h`

[code
temp].
We have written some API functions (e.g. `abessGLM_API()`

), so you
can either add a new function for the new algorithm or simply add
into existing one.

First of all, the algorithm name and its number should be determined.

The format of a new algorithm’s name is “**abess+your_algorithm**”,
which means that using abess to solve the problem, and its number should
be an integer unused by others.

In the following part, we suppose to create an algorithm named
`abess_new_algorithm`

with number 123.

Next, four important data type should be determined:

T1 : type of Y

T2 : type of coefficients

T3 : type of intercept

T4 : type of X

The algorithm variable are based on them: [code link]

```
vector<Algorithm<{T1}, {T2}, {T3}, {T4}>*> algorithm_list(algorithm_list_size);
```

Take `LinearRegression`

(the linear regression on abess) as an example,

Y: dense vector

Coefficients: dense vector

Intercept: numeric

X: dense/sparse matrix

so that we define:

```
vector<Algorithm<Eigen::VectorXd, Eigen::VectorXd, double, Eigen::MatrixXd> *> algorithm_list_uni_dense(algorithm_list_size);
vector<Algorithm<Eigen::VectorXd, Eigen::VectorXd, double, Eigen::SparseMatrix<double>> *> algorithm_list_uni_sparse(algorithm_list_size);
```

After that, request memory to initial the algorithm: [code link]

```
for (int i = 0; i < algorithm_list_size; i++)
{
if (model_type == 123) // number of algorithm
{
algorithm_list[i] = new abessLm<{T4}>(...);
}
}
```

Finally, call `abessWorkflow()`

, which would compute the result:
[code link]

```
// "List" is a preset structure to store results
List out_result = abessWorkflow<{T1}, {T2}, {T3}, {T4}>(..., algorithm_list);
```

### Implement your Algorithm¶

The implemented algorithms are stored in
`src/AlgorithmXXX.h`

. We have implemented some
algorithms (e.g. `AlgorithmGLM.h`

), so you can either create a new
file containing new algorithm or simply add into existing one.

The new algorithm should inherit a base class, called *Algorithm*,
which defined in `Algorithm.h`

. And then rewrite some virtual function
interfaces to fit specify problem. The implementation is modularized
such that you can easily extend the package.

A concrete algorithm is like: [code link]

```
#include "Algorithm.h"
template <class T4>
class abess_new_algorithm : public Algorithm<{T1}, {T2}, {T3}, T4> // T1, T2, T3 are the same as above, which are fixed.
{
public:
// constructor and destructor
abess_new_algorithm(...) : Algorithm<...>::Algorithm(...){};
~abess_new_algorithm(){};
void primary_model_fit(...){
// solve the subproblem under given active set
// record the sparse answer in variable "beta"
};
double loss_function(...){
// define and compute loss under given active set
// return the current loss
};
void sacrifice(...){
// define and compute sacrifice for all variables (both forward and backward)
// record sacrifice in variable "bd"
};
double effective_number_of_parameter(...){
// return effective number of parameter
};
}
```

Note that `sacrifice`

function here would compute “forward/backward
sacrifices” and record them in `bd`

.

For active variable, the lower (backward) sacrifice is, the more likely it will be dropped;

For inactive variable, the higher (forward) sacrifice is, the more likely it will come into use.

If you create a new file to store the algorithm, remember to include
it inside `src/api.cpp`

. [code
temp]

Now your new method has been connected to the whole frame. In the next section, we focus on how to build R or Python package based on the core code.

## R & Python Package¶

### R Package¶

To make sure your code available for R, run

```
R CMD INSTALL R-package
```

Then, this package would be installed into R session if the R package
dependence (`Rcpp`

and `Matrix`

) have been installed.

After that, the object in R can be passed to Cpp via the unified API
`abessCpp`

. We strongly suggest the R function is named as
`abessXXX`

and use `roxygen2`

to write R documentation and
`devtools`

to configure your package.

### Python Package¶

First of all, you should ensure the C++ code available for Python,
`cd`

into directory `abess/python`

and run
`$ python setup.py install`

. (Same steps in Installation)

It may take a few minutes to install:

if the installation throw some errors, it means that the C++ code may be wrong;

if the installation runs without errors, it will finish with message like “

*Finished processing dependencies for abess*”.

Then create a new python file in `python/abess`

or open an
existed file, such as `python/abess/linear.py`

, to add a python
API for your new method.

A simple new method can be added like: [code temp].

```
# all algorithms should inherit the base class `bess_base`
from .bess_base import bess_base
class new_algorithm(bess_base):
"""
Here is some introduction.
"""
def __init__(self, ...):
super(abess_new_algorithm, self).__init__(
algorithm_type="abess",
model_type="new_algorithm",
# other init
)
def fit(self, ...):
# override `bess_base.fit()`, if necessary
def custom_function(self, ...):
# some custom functions, e.g. predict
```

The base class implements a `fit`

function, which plays a role on
checking input and calling C++ API to compute results. You may want to
override it for custom features. [code temp].

Then, the final step is to link this Python class with the model type
number (it has been defined in Section **Core C++**). In the `fit`

function, you would find somewhere like:

```
if self.model_type == "new_algorithm":
model_type_int = 123 # same number in C++
```

Finally, don’t forget to import the new algorithm in
`python/abess/__init__.py`

.

Now run `$ python setup.py install`

again and this time the
installation would be finished quickly. Congratulation! Your work can
now be used by:

```
from abess import new_algorithm
```

#### bess_base¶

As we show above, any new methods are based on `bess_base`

, which can
be found in `bess_base.py`

: [code
link]

```
from sklearn.base import BaseEstimator
class bess_base(BaseEstimator):
def __init__(...):
# some init
def fit(...):
# check input, warp with cpp
```

Actually, it is based on `sklearn.base.BaseEstimator`

[code
link].
Two methods, `get_params`

and `set_params`

are offered in this base
class.

In our package, we write an method called `fit`

to realize the abess
process. Of cause, you can also override it like `SparsePCA`

.

## Verify you result¶

After programming the code, it is necessary to verify the contributed function can return a reasonable result. Here, we share our experience for it. Notice that the core our algorithm are forward and backward sacrifices, as long as they are properly programming, the contributed function would work well.

Check

`primary_model_fit`

and`loss_function`

Secondly, we recommend you consider `primary_model_fit`

for the
computation of backward sacrifices. To check whether it works well, you
can leverage the parameter `always.include`

in R. Actually, when the
number of elements pass to `always.include`

is equal to
`support.size`

(`always_include`

and `support_size`

in Python),
our algorithm is no need to do variable selection since all element must
be selected, and thus, our implementation framework would just simply
solving a convex problem by conducting `primary_model_fit`

and the
solution should match to (or close to) the function implemented in
R/Python. Take the PCA task as an example, we should expect that, the
results returned by `abess`

:

```
data(USArrests)
abess_fit <- abesspca(USArrests, always.include = c(1:3), support.size = 3)
as.vector(spca_fit[["coef"]])[1:3]
```

should match with that returned by the `princomp`

function:

```
princomp_fit <- loadings(princomp(USArrests[, 1:3]))[, 1]
princomp_fit
```

Actually, in our implementation, the results returned in two code blocks is match in magnitude. If the results are match, you can congratulate for your correct coding. We also recommend you write a automatic test case for this following the content below.

At the same time, you can see whether the `loss_function`

is right by
comparing `spca_fit[["loss"]]`

and the variance of the first principal
component.

Check

`sacrifice`

Thirdly, we recommend you consider `sacrifice`

. Checking the function
`sacrifice`

needs more efforts. Monte Carlo studies should be conduct
to check whether `sacrifice`

is properly programmed such that the
effective/relevant variables can be detected when sample size is large.
We strongly recommend to check the result by setting: - sample size at
least 1000 - dimension is less than 50 - the true support size is less
than 5 - variables are independence - the support size from 0 to the
ground true - the \(l_2\) regularization is zero.

In most of the cases, this setting is very helpful for checking code.
Generally, the output of `abess`

would match to the correct under this
setting. Take linear regression in R as our example, the code for
checking is demonstrated below:

```
n <- 1000
p <- 50
support_size <- 3
dataset <- generate.data(n, p, support_size, seed = 1)
abess_fit <- abess(dataset[["x"]], dataset[["y"]], support.size = 0:support_size)
## estimated support:
extract(abess_fit, support.size = support_size)[["support.vars"]]
## true support:
which(dataset[["beta"]] != 0)
```

In this example, the estimated support set is the same as the true.

## Write test cases¶

It is always a good habit to do some test for the changed package. Contributions with test cases included are easier to accept.

We use testthat for unit tests in R and pytest in Python. You may need to install first.

You can find some examples here and please feel free to add your test code into it (or create a new test file) under the test folder:

R test folder:

`abess/R-package/tests/testthat`

.Python test folder:

`python/pytest`

.

A good test code should contain:

possible input modes as well as some wrong input;

check whether the output is expected;

possible extreme cases;

All test under pytest folder should be checked after coding.