TensorFlow Algorithms
LKPY provides several algorithm implementations, particularly matrix factorization, using TensorFlow. These algorithms serve two purposes:
Provide classic algorithms ready to use for recommendation or as baselines for new techniques.
Demonstrate how to connect TensorFlow to LensKit for use in your own experiments.
Biased MF
These models implement the standard biased matrix factorization model, like lenskit.algorithms.als.BiasedMF, but learn the model parameters using TensorFlow’s gradient descent instead of the alternating least squares algorithm.
Bias-Based
class lenskit.algorithms.tf.BiasedMF(features=50, *, bias=True, damping=5, epochs=5, batch_size=10000, reg=0.02, rng_spec=None)
Bases: lenskit.algorithms.mf_common.MFPredictor
Biased matrix factorization model for explicit feedback, optimized with TensorFlow.
This is a basic TensorFlow implementation of the biased matrix factorization model for rating prediction:
\[s(i|u) = b + b_u + b_i + \vec{p}_u \cdot \vec{q}_i\]
User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:
\[\begin{aligned} \lambda_u &= \lambda / |U| \\ \lambda_i &= \lambda / |I| \end{aligned}\]
This rescaling allows the regularization term to be independent of the number of users and items.
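As a rough illustration of the scoring rule and the rescaled penalty, the following NumPy sketch evaluates \(s(i|u)\) for one user-item pair and computes \(\lambda_u \|P\|^2 + \lambda_i \|Q\|^2\) on random toy parameters. The array names (P, Q, b_u, b_i), the toy sizes, and the assumption that the penalty is applied to the whole embedding matrix are illustrative choices, not taken from the LensKit source.

```python
import numpy as np

# Toy sizes and randomly initialized parameters (illustrative assumptions only).
rng = np.random.default_rng(42)
n_users, n_items, features = 4, 6, 3
lam = 0.02  # the regularization term lambda (the reg parameter)

b = 3.5                                        # global bias
b_u = rng.normal(0.0, 0.1, n_users)            # user biases
b_i = rng.normal(0.0, 0.1, n_items)            # item biases
P = rng.normal(0.0, 0.1, (n_users, features))  # user embeddings, one row per user
Q = rng.normal(0.0, 0.1, (n_items, features))  # item embeddings, one row per item


def score(u, i):
    """s(i|u) = b + b_u + b_i + p_u . q_i for user row u and item row i."""
    return b + b_u[u] + b_i[i] + P[u] @ Q[i]


# Rescaled weights: lambda_u = lambda / |U|, lambda_i = lambda / |I|
lam_u = lam / n_users
lam_i = lam / n_items

# L2 penalty over the full embedding matrices (assumed form of the penalty)
penalty = lam_u * np.sum(P ** 2) + lam_i * np.sum(Q ** 2)
print(score(0, 2), penalty)
```

Because \(\|P\|^2\) sums over one row per user, dividing \(\lambda\) by \(|U|\) keeps the user penalty roughly comparable as the number of users grows, and likewise for items.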
Because the model is very simple, this algorithm works best with large batch sizes.
This implementation uses lenskit.algorithms.bias.Bias for computing the biases, and uses TensorFlow to fit a matrix factorization on the residuals. It then extracts the resulting matrices and relies on MFPredictor to implement the prediction logic, like lenskit.algorithms.als.BiasedMF. Its code is suitable as an example of how to build a Keras/TensorFlow algorithm implementation for LensKit where TF is only used in the training stage.
A variety of resources informed the design, most notably this one.
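The two-stage design (biases first, then a factorization of what is left) can be sketched with pandas. The sketch assumes ratings in LensKit's usual user/item/rating column layout and assumes the damping is added to the rating count when averaging, i.e. sum / (count + damping); the helper name damped_biases is hypothetical, and this is not the code of lenskit.algorithms.bias.Bias. The TensorFlow factorization of the residuals is not shown.

```python
import pandas as pd

# Toy explicit-feedback ratings using user, item, rating columns.
ratings = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3, 3, 3],
    "item": [10, 20, 30, 10, 30, 10, 20, 40],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.0, 5.0, 4.0, 3.0],
})


def damped_biases(df, damping=5.0):
    """Hypothetical helper: global mean, then damped item and user offsets."""
    mu = df["rating"].mean()
    centered = df["rating"] - mu
    # Item bias: sum of centered ratings / (count + damping)
    stats = centered.groupby(df["item"]).agg(["sum", "count"])
    b_i = stats["sum"] / (stats["count"] + damping)
    # User bias: computed on what is left after removing item biases
    after_item = centered - df["item"].map(b_i)
    stats = after_item.groupby(df["user"]).agg(["sum", "count"])
    b_u = stats["sum"] / (stats["count"] + damping)
    return mu, b_i, b_u


mu, b_i, b_u = damped_biases(ratings, damping=5.0)

# Residuals that the matrix factorization stage would be trained to reproduce
residuals = ratings["rating"] - mu - ratings["item"].map(b_i) - ratings["user"].map(b_u)
print(residuals.round(3).tolist())
```

Larger damping values pull the biases of rarely rated items and users toward zero, leaving more of their signal in the residuals.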
- Parameters
features (int) – The number of latent features to learn.
bias – The bias model to use.
damping – The bias damping, if bias is True.
epochs (int) – The number of epochs to train.
batch_size (int) – The Keras batch size.
reg (double) – The regularization term \(\lambda\) used to derive embedding vector regularizations.
rng_spec – The random number generator initialization.
fit(ratings, **kwargs)
Train a model using the specified ratings (or similar) data.
- Parameters
ratings (pandas.DataFrame) – The ratings data.
kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.
- Returns
The algorithm object.
predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns
scores for the items, indexed by item id.
- Return type
pandas.Series
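To make the prediction step concrete, here is a rough sketch of the kind of scoring an MFPredictor-style model performs: look up the user's row, look up each requested item's row, and add the biases to the dot product. The trained arrays, the index objects, and the choice to return NaN for unknown users and items are assumptions made for this sketch, not a description of LensKit's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical trained state: ID indexes plus bias and embedding arrays.
user_index = pd.Index([1, 2, 3])
item_index = pd.Index([10, 20, 30, 40])
global_bias = 3.75
user_bias = np.array([0.1, -0.3, 0.2])
item_bias = np.array([0.05, -0.1, 0.2, -0.125])
P = np.array([[0.1, 0.2], [-0.2, 0.1], [0.3, -0.1]])             # user embeddings
Q = np.array([[0.2, 0.1], [0.0, 0.3], [-0.1, 0.2], [0.1, 0.1]])  # item embeddings


def predict_for_user(user, items):
    """Return a pandas.Series of scores indexed by item id (NaN where unknown)."""
    items = pd.Index(items)
    scores = pd.Series(np.nan, index=items)
    if user not in user_index:
        return scores
    u = user_index.get_loc(user)
    rows = item_index.get_indexer(items)  # -1 marks items not seen in training
    known = rows >= 0
    scores.iloc[known] = (global_bias + user_bias[u] + item_bias[rows[known]]
                          + Q[rows[known]] @ P[u])
    return scores


print(predict_for_user(1, [10, 30, 99]))
```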
Fully Integrated
class lenskit.algorithms.tf.IntegratedBiasMF(features=50, *, epochs=5, batch_size=10000, reg=0.02, bias_reg=0.2, rng_spec=None)
Bases: lenskit.Predictor
Biased matrix factorization model for explicit feedback, optimizing both bias and embeddings with TensorFlow.
This is a basic TensorFlow implementation of the biased matrix factorization model for rating prediction:
\[s(i|u) = b + b_u + b_i + \vec{p}_u \cdot \vec{q}_i\]
User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:
\[\begin{aligned} \lambda_u &= \lambda / |U| \\ \lambda_i &= \lambda / |I| \end{aligned}\]
This rescaling allows the regularization term to be independent of the number of users and items. The same rescaling applies to the bias regularization.
Because the model is very simple, this algorithm works best with large batch sizes.
This implementation uses TensorFlow to fit the entire model, including user/item biases and residuals, and uses TensorFlow to do the final predictions as well. Its code is suitable as an example of how to build a Keras/TensorFlow algorithm implementation for LensKit where TF is used for the entire process.
A variety of resources informed the design, most notably this one and Chin-chi Hsu's example code.
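The idea of fitting biases and embeddings jointly can be illustrated without TensorFlow. This NumPy sketch runs mini-batch gradient descent on squared error, applying the rescaled regularization described above to both the embeddings (reg) and the biases (bias_reg). It rests on simplifying assumptions: the global bias b is held fixed at the mean rating, the optimizer is plain gradient descent, and the helper names predict and train_epoch are hypothetical; the Keras model may differ in all of these respects.

```python
import numpy as np


def predict(users, items, b, bu, bi, P, Q):
    """s(i|u) = b + b_u + b_i + p_u . q_i for arrays of user/item row positions."""
    return b + bu[users] + bi[items] + np.sum(P[users] * Q[items], axis=1)


def train_epoch(users, items, ratings, b, bu, bi, P, Q,
                reg=0.02, bias_reg=0.2, lr=0.1, batch_size=4):
    """Hypothetical helper: one epoch of mini-batch gradient descent.

    Updates bu, bi, P and Q in place; the global bias b stays fixed.
    """
    n_users, n_items = len(bu), len(bi)
    lam_u, lam_i = reg / n_users, reg / n_items              # embedding weights
    blam_u, blam_i = bias_reg / n_users, bias_reg / n_items  # bias weights
    for start in range(0, len(ratings), batch_size):
        u = users[start:start + batch_size]
        i = items[start:start + batch_size]
        err = predict(u, i, b, bu, bi, P, Q) - ratings[start:start + batch_size]
        n = len(err)
        # Gradients of mean(0.5 * err**2); np.add.at accumulates repeated rows
        g_bu = np.zeros_like(bu)
        np.add.at(g_bu, u, err / n)
        g_bi = np.zeros_like(bi)
        np.add.at(g_bi, i, err / n)
        g_P = np.zeros_like(P)
        np.add.at(g_P, u, err[:, None] * Q[i] / n)
        g_Q = np.zeros_like(Q)
        np.add.at(g_Q, i, err[:, None] * P[u] / n)
        # Each penalty lam * ||X||^2 contributes 2 * lam * X to the gradient
        bu -= lr * (g_bu + 2 * blam_u * bu)
        bi -= lr * (g_bi + 2 * blam_i * bi)
        P -= lr * (g_P + 2 * lam_u * P)
        Q -= lr * (g_Q + 2 * lam_i * Q)


# Toy data: row positions of users and items, plus explicit ratings.
rng = np.random.default_rng(0)
users = np.array([0, 0, 0, 1, 1, 2, 2, 2])
items = np.array([0, 1, 2, 0, 2, 0, 1, 3])
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0, 5.0, 4.0, 3.0])
b = ratings.mean()
bu, bi = np.zeros(3), np.zeros(4)
P = rng.normal(0.0, 0.1, (3, 2))
Q = rng.normal(0.0, 0.1, (4, 2))

mse_before = np.mean((predict(users, items, b, bu, bi, P, Q) - ratings) ** 2)
for _ in range(100):
    train_epoch(users, items, ratings, b, bu, bi, P, Q)
mse_after = np.mean((predict(users, items, b, bu, bi, P, Q) - ratings) ** 2)
print(mse_before, mse_after)
```

All gradients in a batch are computed before any parameter is updated, so the embedding updates for P and Q both use the values from the start of the step.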
- Parameters
features (int) – The number of latent features to learn.
epochs (int) – The number of epochs to train.
batch_size (int) – The Keras batch size.
reg (double) – The regularization term for the embedding vectors.
bias_reg (double) – The regularization term for the bias vectors.
rng_spec – The random number generator initialization.
model
The Keras model.
fit(ratings, **kwargs)
Train a model using the specified ratings (or similar) data.
- Parameters
ratings (pandas.DataFrame) – The ratings data.
kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.
- Returns
The algorithm object.
predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns
scores for the items, indexed by item id.
- Return type
pandas.Series
Bayesian Personalized Ranking
class lenskit.algorithms.tf.BPR(features=50, *, epochs=5, batch_size=10000, reg=0.02, neg_count=1, rng_spec=None)
Bases: lenskit.Predictor
Bayesian Personalized Ranking with matrix factorization, optimized with TensorFlow.
This is a basic TensorFlow implementation of the BPR algorithm [BPR].
User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:
\[\begin{aligned} \lambda_u &= \lambda / |U| \\ \lambda_i &= \lambda / |I| \end{aligned}\]
This rescaling allows the regularization term to be independent of the number of users and items.
Because the model is relatively simple, optimization works best with large batch sizes.
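For reference, the BPR objective pairs each observed (user, item) with a sampled negative item, and the loss grows as the negative item's score approaches or exceeds the positive item's score. The sketch below computes the mean loss \(-\log \sigma(s(i|u) - s(j|u))\) plus the rescaled \(L_2\) penalty, assuming scores are plain dot products \(\vec{p}_u \cdot \vec{q}_i\) with no bias terms; that assumption and the function name bpr_loss are illustrative, not taken from the LensKit implementation.

```python
import numpy as np


def bpr_loss(P, Q, users, pos_items, neg_items, reg=0.02):
    """Mean BPR loss over (u, i, j) triples plus rescaled L2 penalties (sketch)."""
    x_ui = np.sum(P[users] * Q[pos_items], axis=1)  # scores of observed items
    x_uj = np.sum(P[users] * Q[neg_items], axis=1)  # scores of sampled negatives
    diff = x_ui - x_uj
    # -log(sigmoid(d)) == log(1 + exp(-d)), computed stably with logaddexp
    data_loss = np.mean(np.logaddexp(0.0, -diff))
    lam_u, lam_i = reg / P.shape[0], reg / Q.shape[0]
    return data_loss + lam_u * np.sum(P ** 2) + lam_i * np.sum(Q ** 2)


# With all-zero embeddings every pair is tied, so the data term is log(2).
P0, Q0 = np.zeros((2, 3)), np.zeros((4, 3))
print(bpr_loss(P0, Q0, np.array([0, 1]), np.array([0, 2]), np.array([3, 1])))
```

Only the difference between the two scores enters the data term, which is why BPR learns a ranking rather than calibrated rating predictions.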
- Parameters
features (int) – The number of latent features to learn.
epochs (int) – The number of epochs to train.
batch_size (int) – The Keras batch size. This is the number of positive examples to sample in each batch. If neg_count is greater than 1, the effective batch size is multiplied by neg_count.
reg (double) – The regularization term for the embedding vectors.
neg_count (int) – The number of negative examples to sample for each positive one.
rng_spec – The random number generator initialization.
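The interaction between batch_size and neg_count can be sketched as follows: each batch draws batch_size positive pairs (with replacement, an assumption) and repeats each one neg_count times, each copy with its own independently sampled negative item, so the arrays fed to the model have batch_size * neg_count rows. Uniform negative sampling with no check against the user's rated items is a simplifying assumption, and sample_batch is a hypothetical helper.

```python
import numpy as np


def sample_batch(users, items, n_items, batch_size, neg_count, rng):
    """Hypothetical helper: build one BPR batch of (user, positive, negative) rows.

    Draws batch_size positive pairs and repeats each neg_count times, so every
    returned array has batch_size * neg_count entries. Negatives are drawn
    uniformly from all items and are not checked against the user's ratings.
    """
    picks = rng.integers(0, len(users), size=batch_size)  # positive examples
    batch_users = np.repeat(users[picks], neg_count)
    batch_pos = np.repeat(items[picks], neg_count)
    batch_neg = rng.integers(0, n_items, size=batch_size * neg_count)
    return batch_users, batch_pos, batch_neg


rng = np.random.default_rng(1)
users = np.array([0, 0, 1, 2])  # observed (user, item) pairs as row positions
items = np.array([0, 1, 1, 3])
u, pos, neg = sample_batch(users, items, n_items=5, batch_size=8, neg_count=3, rng=rng)
print(u.shape, pos.shape, neg.shape)
```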
model
The Keras model.
fit(ratings, **kwargs)
Train a model using the specified ratings (or similar) data.
- Parameters
ratings (pandas.DataFrame) – The ratings data.
kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.
- Returns
The algorithm object.
predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns
scores for the items, indexed by item id.
- Return type
pandas.Series