pyrfm

A library for random feature maps and linear models with random feature maps in Python.

Installation

  1. Download the source code:

    git clone https://github.com/neonnnnn/pyrfm.git
    

or download as a ZIP from GitHub.

  2. Install the dependencies:

    cd pyrfm
    
    pip install -r requirements.txt
    
  3. Finally, build and install pyrfm:

    python setup.py install
    

To run the example code (in pyrfm/benchmarks), jupyter and matplotlib are also required.

What are random feature maps?

Random feature maps are a promising approach to large-scale kernel methods. They are maps from the original feature space to a randomized feature space that approximates a kernel-induced feature space. The basic idea is to run linear models on this randomized feature space for classification, regression, clustering, etc. When the dimension D of the random feature map is not too high and the number of training examples N is large, this approach is far more efficient than canonical kernel methods.
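
As a generic illustration of this workflow (using only scikit-learn, not pyrfm), one can map the data with a random Fourier feature map and then fit an ordinary linear classifier on the mapped data:

    from sklearn.datasets import make_classification
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import SGDClassifier

    # Toy data: N training examples in the original feature space.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Map to a D=500 dimensional randomized feature space approximating the RBF kernel.
    rfm = RBFSampler(gamma=0.1, n_components=500, random_state=0)
    X_rfm = rfm.fit_transform(X)

    # Run a linear model on the randomized features instead of a kernel machine.
    clf = SGDClassifier(loss="hinge", random_state=0).fit(X_rfm, y)
    print(clf.score(X_rfm, y))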

Random Feature Maps Implemented

pyrfm follows the scikit-learn API and currently supports the following random features.

  • RandomFourier: random Fourier feature (for the RBF kernel) [RR08]

  • RandomMaclaurin: random Maclaurin feature (for the polynomial kernel, exp kernel, and user-specified dot product kernels) [KK12]

  • TensorSketch: tensor sketching (for the polynomial kernel) [PP13]

  • RandomKernel: random kernel feature (for the ANOVA kernel and all-subsets kernel) [ASO19]

  • MB: the S. Maji and A. Berg feature map (for the intersection (min) kernel; this feature map is deterministic, not random) [MB09]

In other words, pyrfm now provides approximators for the following kernels.

The random Fourier feature is also implemented in scikit-learn (kernel_approximation.RBFSampler).
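
The transformers can be used like any scikit-learn transformer. The sketch below is only illustrative: it assumes that RandomFourier is importable from the pyrfm package and accepts an n_components parameter in the scikit-learn convention; check the package for the exact import path and signature.

    import numpy as np
    from sklearn.svm import LinearSVC
    from pyrfm import RandomFourier  # NOTE: import path is an assumption; adjust to your installation

    rng = np.random.RandomState(0)
    X = rng.randn(500, 10)
    y = (X[:, 0] * X[:, 1] > 0).astype(int)

    # Fit the random feature map, transform the data, and train a linear SVM on top.
    transformer = RandomFourier(n_components=256, random_state=0)
    X_trans = transformer.fit_transform(X)

    clf = LinearSVC().fit(X_trans, y)
    print(clf.score(X_trans, y))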

Furthermore, pyrfm supports the following structured random features.

These methods are faster and more memory efficient than canonical random features such as random Fourier, random kernel, etc. You can also use these structured random features as subroutines of your own random features; SignedCirculantRandomKernel is one example (it uses SignedCirculantRandomMatrix as a subroutine).
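
To see why structured projections save time and memory, here is a minimal NumPy sketch of the circulant-projection trick (the idea behind signed circulant random matrices): only one column of the projection matrix is stored, and the projection is applied with the FFT. This illustrates the technique, not pyrfm's implementation.

    import numpy as np

    rng = np.random.RandomState(0)
    d = 8
    c = rng.randn(d)                      # first column of the circulant matrix: d parameters instead of d*d
    s = rng.choice([-1.0, 1.0], size=d)   # random signs (the "signed" part)
    x = rng.randn(d)

    # Circulant projection via FFT: O(d log d) time, O(d) memory.
    proj_fft = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(s * x)))

    # Equivalent dense circulant matrix for comparison: O(d^2) time and memory.
    C = np.stack([np.roll(c, j) for j in range(d)], axis=1)
    proj_dense = C @ (s * x)

    print(np.allclose(proj_fft, proj_dense))  # True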

Moreover, the following data-dependent random feature methods are supported:

Additionally, we provide the following random projection methods.

Linear Models Implemented

pyrfm also supports the following solvers for linear models with random features.

All solvers support the squared loss for regression, and the hinge, squared hinge, and logistic losses for classification.
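
For reference, these losses can be written as follows (a plain NumPy sketch of the standard definitions, with the margin m = y * f(x) for classification, y in {-1, +1}):

    import numpy as np

    def squared_loss(y_true, y_pred):          # regression
        return 0.5 * (y_pred - y_true) ** 2

    def hinge_loss(margin):                    # classification, margin = y * f(x)
        return np.maximum(0.0, 1.0 - margin)

    def squared_hinge_loss(margin):
        return np.maximum(0.0, 1.0 - margin) ** 2

    def logistic_loss(margin):
        return np.log1p(np.exp(-margin))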

AdaGrad, SDCA, Adam, SGD/ASGD, and SAG/SAGA in pyrfm are intended for very large datasets for which computing the random feature matrix (i.e., computing the random features of all instances at once) would cause a MemoryError. If you can allocate memory for the random feature matrix of your training data, you should use other implementations of linear models (linear_model in scikit-learn, sklearn-contrib-lightning, etc.). The DoublySGD solvers do not keep the random weights but sample them at each iteration, so the number of random weight vectors grows with every iteration. We recommend batch_size > 1 and n_bases_sampled > 1. For more details, please see [DXH+14].
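
The core idea of the doubly stochastic scheme can be sketched as follows. This is a schematic NumPy illustration of [DXH+14] for the squared loss with RBF random Fourier features (no regularization or step-size schedule); it is not pyrfm's DoublySGD implementation.

    import numpy as np

    rng = np.random.RandomState(0)

    # Toy regression data.
    X = rng.randn(1000, 5)
    y = np.sin(X.sum(axis=1))

    gamma, eta, n_iter, batch_size = 1.0, 0.05, 200, 16
    ws, bs, alphas = [], [], []   # random weights sampled so far and their coefficients

    def feature(Xb, w, b):
        # One random Fourier feature for the RBF kernel: sqrt(2) * cos(x.w + b).
        return np.sqrt(2.0) * np.cos(Xb @ w + b)

    for t in range(n_iter):
        idx = rng.choice(len(X), batch_size, replace=False)   # sample a minibatch
        Xb, yb = X[idx], y[idx]

        # Prediction uses all random features sampled in earlier iterations.
        pred = np.zeros(batch_size)
        for w, b, a in zip(ws, bs, alphas):
            pred += a * feature(Xb, w, b)

        # Sample one new random weight and give it a coefficient via a gradient step,
        # so the number of kept random weight vectors grows with the iteration count.
        w_t = rng.randn(X.shape[1]) * np.sqrt(2.0 * gamma)
        b_t = rng.uniform(0.0, 2.0 * np.pi)
        alpha_t = -eta * np.mean((pred - yb) * feature(Xb, w_t, b_t))
        ws.append(w_t); bs.append(b_t); alphas.append(alpha_t)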

These stochastic solvers (except the DoublySGD solvers) currently run efficiently for the following random features.

To improve efficiency, implement a cdef class with a cdef transform method for your transformer; see random_feature/random_features_fast.pyx/pxd. Although these stochastic solvers support any transformer, they can run extremely slowly when random_features_fast.pyx contains no cdef class and cdef transform method for your transformer. The DoublySGD solvers support only RBFSampler, RandomFourier, RandomMaclaurin, RandomKernel, and SkewedChi2Sampler. We believe that these implementations are useful for research.

References

ASO19

Kyohei Atarashi, Subhransu Maji, and Satoshi Oyama. Random feature maps for the itemset kernel. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence. 2019.

Bot10

Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.

Bot12

Léon Bottou. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, pages 421–436. Springer, 2012.

CHL08

Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale l2-loss linear support vector machines. Journal of Machine Learning Research, 9(Jul):1369–1398, 2008.

CCFC02

Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, 693–703. Springer, 2002.

DXH+14

Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina F Balcan, and Le Song. Scalable kernel methods via doubly stochastic gradients. In Advances in Neural Information Processing Systems, 3041–3049. 2014.

DBLJ14

Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, 1646–1654. 2014.

DHS11

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.

FHL15

Chang Feng, Qinghua Hu, and Shizhong Liao. Random feature mapping with signed circulant matrix projection. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.

HXGD14

Raffay Hamid, Ying Xiao, Alex Gittens, and Dennis DeCoste. Compact random feature maps. In Proceedings of the International Conference on Machine Learning, 19–27. 2014.

KK12

Purushottam Kar and Harish Karnick. Random feature maps for dot product kernels. In Proceedings of Artificial Intelligence and Statistics, 583–591. 2012.

KB15

Diederik P Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations. 2015.

LSarlosS13

Quoc Le, Tamás Sarlós, and Alex Smola. Fastfood: approximating kernel expansions in loglinear time. In Proceedings of the International Conference on Machine Learning, volume 85. 2013.

LHC06

Ping Li, Trevor J Hastie, and Kenneth W Church. Very sparse random projections. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 287–296. ACM, 2006.

MB09

Subhransu Maji and Alexander C Berg. Max-margin additive classifiers for detection. In Proceedings of the IEEE International Conference on Computer Vision, 40–47. IEEE, 2009.

PP13

Ninh Pham and Rasmus Pagh. Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 239–247. ACM, 2013.

RR08

Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Proceedings of the Advances in Neural Information Processing Systems, 1177–1184. 2008.

SLRB17

Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162(1-2):83–112, 2017.

SSZ13

Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14(Feb):567–599, 2013.

SD16

Aman Sinha and John C Duchi. Learning kernels with random features. In Advances in Neural Information Processing Systems, 1298–1306. 2016.

Tro11

Joel A Tropp. Improved analysis of the subsampled randomized hadamard transform. Advances in Adaptive Data Analysis, 3(01n02):115–126, 2011.

WDA+09

Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, and Alex Smola. Feature hashing for large scale multitask learning. arXiv preprint arXiv:0902.2206, 2009.

YSC+16

Felix Xinnan X Yu, Ananda Theertha Suresh, Krzysztof M Choromanski, Daniel N Holtmann-Rice, and Sanjiv Kumar. Orthogonal random features. In Proceedings of the Advances in Neural Information Processing Systems, 1975–1983. 2016.

Authors

  • Kyohei Atarashi, 2018-present
