This module extends the tools from `fastai.collab` with transfer learning: it loads the embeddings needed for collaborative filtering from a pretrained model. It also adds the ability to save the vocabulary the collab model was trained on. The most important function in this module is `collab_learner`.
## Loading users/items embeddings from a pretrained model
To load a pretrained vocabulary into a collab model, we need to adapt the embeddings of the vocabulary used for pretraining to the vocabulary of our current collab corpus.
Convert the user and item embeddings (possibly saved as `0.module.encoder.weight` and `1.attn.lbs_weight.weight`, respectively) in `old_wgts` so that they go from `old_vocab` to `new_vocab`.
|  | Type | Details |
|---|---|---|
| `old_wgts` | `dict` | Embedding weights of the pretrained model |
| `old_vocab` | `list` | Vocabulary (tokens and labels) of the corpus used for pretraining |
| `new_vocab` | `dict` | Current collab corpus vocabulary (users and items) |
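The remapping can be sketched as follows. Note that `convert_weights` and the weight key are hypothetical names for illustration, not this module's API; rows for tokens shared between the two vocabularies are copied over, while unseen tokens are initialized with the mean of the old embeddings (a common choice, assumed here):

```python
import torch

def convert_weights(old_wgts, old_vocab, new_vocab, key):
    # Hypothetical helper: remap the embedding stored under `key`
    # from old_vocab's row order to new_vocab's row order.
    old_emb = old_wgts[key]
    mean = old_emb.mean(dim=0)                # initializer for unseen tokens
    new_emb = mean.repeat(len(new_vocab), 1)  # shape (len(new_vocab), n_factors)
    old_idx = {tok: i for i, tok in enumerate(old_vocab)}
    for i, tok in enumerate(new_vocab):
        if tok in old_idx:                    # copy rows for shared tokens
            new_emb[i] = old_emb[old_idx[tok]]
    return new_emb
```

The same remapping is applied to each embedding (and bias) tensor in the pretrained state dict before loading it into the new model.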
Create a `Learner` for collaborative filtering on `dls`.
|  | Type | Default | Details |
|---|---|---|---|
| `dls` | `DataLoaders` |  | `DataLoaders` containing fastai or PyTorch `DataLoader`s |
| `n_factors` | `int` | `50` |  |
| `use_nn` | `bool` | `False` |  |
| `emb_szs` | `NoneType` | `None` |  |
| `layers` | `NoneType` | `None` |  |
| `config` | `NoneType` | `None` |  |
| `y_range` | `NoneType` | `None` |  |
| `loss_func` | `callable \| None` | `None` | Loss function. Defaults to `dls` loss |
| `pretrained` | `bool` | `False` |  |
| `opt_func` | `Optimizer \| OptimWrapper` | `Adam` | Optimization function for training |
| `lr` | `float \| slice` | `0.001` | Default learning rate |
| `splitter` | `callable` | `trainable_params` | Split model into parameter groups. Defaults to one parameter group |
| `cbs` | `Callback \| MutableSequence \| None` | `None` | `Callback`s to add to `Learner` |
| `metrics` | `callable \| MutableSequence \| None` | `None` | Metrics to calculate on validation set |
| `path` | `str \| Path \| None` | `None` | Parent directory to save, load, and export models. Defaults to `dls` path |
| `model_dir` | `str \| Path` | `models` | Subdirectory to save and load models |
| `wd` | `float \| int \| None` | `None` | Default weight decay |
| `wd_bn_bias` | `bool` | `False` | Apply weight decay to normalization and bias parameters |
| `train_bn` | `bool` | `True` | Train frozen normalization layers |
| `moms` | `tuple` | `(0.95, 0.85, 0.95)` | Default momentum for schedulers |
| `default_cbs` | `bool` | `True` | Include default `Callback`s |
If `use_nn=False`, the model used is an `EmbeddingDotBias` with `n_factors` and `y_range`. Otherwise, it's an `EmbeddingNN`, for which you can pass `emb_szs` (inferred from the `dls` with `get_emb_sz` if you don't provide any), `layers` (defaults to `[n_factors]`), `y_range`, and a `config` that you can create with `tabular_config` to customize your model.
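Conceptually, the dot-bias model scores a user/item pair as the dot product of their embeddings plus a per-user and a per-item bias, squashed into `y_range` with a scaled sigmoid. A minimal PyTorch sketch of that idea (an illustration, not fastai's actual `EmbeddingDotBias` class):

```python
import torch

class DotBias(torch.nn.Module):
    def __init__(self, n_users, n_items, n_factors=50, y_range=(0, 5.5)):
        super().__init__()
        self.u_w = torch.nn.Embedding(n_users, n_factors)  # user factors
        self.i_w = torch.nn.Embedding(n_items, n_factors)  # item factors
        self.u_b = torch.nn.Embedding(n_users, 1)          # user bias
        self.i_b = torch.nn.Embedding(n_items, 1)          # item bias
        self.y_range = y_range

    def forward(self, users, items):
        # dot product of user and item factors, plus both biases
        dot = (self.u_w(users) * self.i_w(items)).sum(dim=1)
        res = dot + self.u_b(users).squeeze(-1) + self.i_b(items).squeeze(-1)
        lo, hi = self.y_range
        # scaled sigmoid keeps predictions inside y_range
        return torch.sigmoid(res) * (hi - lo) + lo
```

Constraining the output with `y_range` (e.g. `(0, 5.5)` for 5-star ratings) usually speeds up training, since the model never wastes capacity predicting impossible scores.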
`loss_func` defaults to `MSELossFlat`, and all the other arguments are passed to `Learner`.
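`MSELossFlat` behaves like plain mean squared error after flattening predictions and targets so their shapes line up. A small sketch of that behavior, using `torch.nn.functional.mse_loss` on flattened tensors to stand in for it:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[1.0], [2.0]])   # model output, shape (2, 1)
targ = torch.tensor([1.0, 4.0])       # targets, shape (2,)

# flatten pred to shape (2,) so it matches targ, then compute MSE
loss = F.mse_loss(pred.view(-1), targ)
# ((1 - 1)^2 + (2 - 4)^2) / 2 = 2.0
```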
```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    learn = collab_learner(dls, y_range=(0, 5), path=d)
    learn.fit(1)

    # Test save created a file
    learn.save('tmp')
    assert (Path(d)/'models/tmp.pth').exists()
    assert (Path(d)/'models/tmp_vocab.pkl').exists()
```
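The saved `*_vocab.pkl` file can be reloaded with the standard library. A sketch, assuming the vocabulary is stored as a plain picklable mapping of users and items (the exact structure this module saves may differ):

```python
import pickle
import tempfile
from pathlib import Path

classes = {'user': ['u1', 'u2'], 'title': ['m1', 'm2']}  # illustrative vocab

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / 'tmp_vocab.pkl'
    with open(p, 'wb') as f:
        pickle.dump(classes, f)   # what saving the vocab boils down to
    with open(p, 'rb') as f:
        loaded = pickle.load(f)   # round-trips to an equal mapping
```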