Model Sharing¶
The lenskit.sharing module provides utilities for managing models and sharing them between processes, particularly for the multiprocessing in lenskit.batch.
Sharing Mode¶
The only piece algorithm developers usually need to handle directly is the concept of 'sharing mode' when implementing custom pickling logic. To save space, it is reasonable to exclude intermediate data structures, such as caches or inverse indexes, from the pickled representation of an algorithm, and reconstruct them when the model is loaded.
However, LensKit's multi-process sharing also uses pickling to capture object state, while using shared memory for numpy.ndarray objects. In these cases, the structures should be pickled, so they can be shared between model instances.
To support this, we have the concept of a sharing mode. Code that excludes objects when pickling should call in_share_context() to determine whether that exclusion should actually happen.
lenskit.sharing.in_share_context()¶
Query whether sharing mode is active. If True, we are currently in a sharing_mode() context, which means model pickling will be used for cross-process sharing.
lenskit.sharing.sharing_mode()¶
Context manager to tell models that pickling will be used for cross-process sharing, not model persistence.
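The exclusion pattern described above can be sketched without LensKit, using a stand-in module flag in place of in_share_context() and sharing_mode() (the stand-ins and the Model class below are illustrative, not part of the library):

```python
# Sketch of sharing-mode-aware pickling. The real functions are
# lenskit.sharing.in_share_context() and sharing_mode(); these
# stand-ins let the example run without LensKit installed.
from contextlib import contextmanager

_share_mode = False

def in_share_context():
    # stand-in for lenskit.sharing.in_share_context()
    return _share_mode

@contextmanager
def sharing_mode():
    # stand-in for lenskit.sharing.sharing_mode()
    global _share_mode
    old, _share_mode = _share_mode, True
    try:
        yield
    finally:
        _share_mode = old

class Model:
    """Illustrative model with a derived index that need not be persisted."""
    def __init__(self, items):
        self.items = items
        self.index = {v: i for i, v in enumerate(items)}  # derived cache

    def __getstate__(self):
        state = dict(self.__dict__)
        if not in_share_context():
            # persisting to disk: drop the cache, rebuild it on load
            del state['index']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        if 'index' not in state:
            self.index = {v: i for i, v in enumerate(self.items)}
```

When pickled normally, the index is dropped and rebuilt; inside a sharing-mode context it is kept so it can live in shared memory.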
Model Store API¶
Model stores handle persisting models into shared memory, cleaning up shared memory, and making objects available to other processes.
LensKit users and algorithm implementers will generally not need to use this code themselves, unless they are implementing their own batch processing logic.
lenskit.sharing.get_store(reuse=True, *, in_process=False)¶
Get a model store, using the best available on the current platform. The resulting store should be used as a context manager, as in:

>>> with get_store() as store:
...     pass

This function uses the following priority list for locating a suitable store:

1. The currently-active store, if reuse=True
2. A no-op store, if in_process=True
3. SHMModelStore, if on Python 3.8
4. JoblibModelStore
Parameters
Returns
    the model store.
Return type
class lenskit.sharing.BaseModelStore¶
Bases: lenskit.sharing.BaseModelClient
Base class for storing models for access across processes.
Stores are also context managers that initialize themselves and clean themselves up. As context managers, they are also re-entrant, and register themselves so that create_store() can re-use existing managers.
abstract client()¶
Get a client for the model store. Clients are cheap to pass to child processes for multiprocessing.
Returns
    the model client.
Return type
init()¶
Initialize the store.
abstract put_model(model)¶
Store a model in the model store.
Parameters
    model (object) – the model to store.
Returns
    a key to retrieve the model with BaseModelClient.get_model()
put_serialized(path, binpickle=False)¶
Deserialize a model and load it into the store.
The base class method unpickles path and calls put_model().
Parameters
    path (str or pathlib.Path) – the path to deserialize
    binpickle – if True, deserialize with binpickle.load() instead of pickle.
shutdown()¶
Shut down the store.
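To make the store/client contract concrete, here is a toy in-process analogue (a dict-backed sketch, not LensKit's implementation; real stores use shared memory or memory-mapped files, and real clients wrap results in a SharedObject, which this sketch omits):

```python
# Toy analogue of the BaseModelStore / BaseModelClient contract:
# put_model returns a key, client() returns a cheap object that can
# fetch models by key. Models are kept pickled in a plain dict.
import pickle

class DictModelClient:
    def __init__(self, table):
        self._table = table  # shared with the store; cheap to pass around

    def get_model(self, key):
        # deserialize on access, like a real client reading shared storage
        return pickle.loads(self._table[key])

class DictModelStore:
    def __init__(self):
        self._table = {}
        self._next = 0

    def __enter__(self):
        self.init()
        return self

    def __exit__(self, *exc):
        self.shutdown()

    def init(self):
        self._table.clear()

    def client(self):
        return DictModelClient(self._table)

    def put_model(self, model):
        key = self._next
        self._next += 1
        self._table[key] = pickle.dumps(model)
        return key

    def shutdown(self):
        self._table.clear()
```

Usage follows the pattern from get_store(): enter the store as a context manager, put models, and hand the client to workers.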
class lenskit.sharing.BaseModelClient¶
Bases: object
Model store client to get models given keys. Clients must be able to be cheaply pickled and de-pickled to enable worker processes to access them.
abstract get_model(key)¶
Get a model from the model store.
Parameters
    key – the model key to retrieve.
Returns
    The model, previously stored with BaseModelStore.put_model(), wrapped in a SharedObject to manage underlying resources.
Return type
class lenskit.sharing.SharedObject¶
Bases: object
Wrapper for a shared object that can release it when the object is no longer needed.
Objects of this type are context managers that return the shared object (not themselves) when entered.
Any other references to object, or its contents, must be released before calling release() or exiting the context manager. Among other things, that means that you will need to delete its variable:

with client.get_model(k) as model:
    # model here is the actual model object wrapped by the SharedObject
    # returned by get_model
    pass  # actually do the things you want to do
    del model  # release model, so the shared object can be closed
Be careful of stray references to the model object! Some things we have seen causing stray references include:
passing the algorithm to a logger (call str() on it explicitly), at least in the test harness
The default implementation uses sys.getrefcount() to provide debugging support to help catch stray references.

object¶
    the underlying shared object.
release()¶
Release the shared object. Automatically called by __exit__(), so in normal use of a shared object with a with statement, this method is not needed.
The base class implementation simply deletes the object reference. Subclasses should override this method to handle their own release logic.
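The release contract can be illustrated with a minimal wrapper in the same spirit (an illustrative sketch, not LensKit's SharedObject; the refcount threshold of 2 assumes CPython semantics):

```python
# Sketch of the SharedObject pattern: a context manager that returns
# the wrapped object (not itself) and warns on release if stray
# references remain, using sys.getrefcount() as debugging support.
import sys
import warnings

class SharedObjectSketch:
    def __init__(self, obj):
        self.object = obj

    def __enter__(self):
        return self.object  # hand back the wrapped object itself

    def __exit__(self, *exc):
        self.release()

    def release(self):
        # getrefcount sees self.object plus its own argument slot,
        # so a count of 2 means no stray references remain
        if sys.getrefcount(self.object) > 2:
            warnings.warn('stray references to shared object remain')
        del self.object
```

Note that the `as model` variable counts as a reference at exit time, which is why the documentation above insists on `del model` inside the with block.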
Model Store Implementations¶
We provide several model store implementations.
Memory Mapping¶
The memory-mapped-file store works on any supported platform and Python version. It uses BinPickle's memory-mapped pickle support to store models on disk and uses that storage to back memory-mapped views of major data structures.
class lenskit.sharing.file.FileModelStore(*, path=None, reserialize=True)¶
Bases: lenskit.sharing.BaseModelStore, lenskit.sharing.file.FileClient
Model store using BinPickle's memory-mapping pickle support.
Parameters
    path – the path to use; otherwise uses a new temp directory under util.scratch_dir().
    reserialize – if True (the default), models passed to put_serialized() are re-serialized in the BinPickle storage, even if they are binpickle files.
class lenskit.sharing.file.FileClient¶
Bases: lenskit.sharing.BaseModelClient
Client using BinPickle's memory-mapping pickle support.
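As a rough illustration of a file-backed store, here is a sketch that uses plain pickle files in a temporary directory instead of BinPickle's memory-mapped format (the class and its methods are illustrative, not the FileModelStore API):

```python
# Simplified file-backed store in the spirit of FileModelStore:
# models are pickled into files in a temp directory, and keys are
# file names. Real FileModelStore uses BinPickle memory-mapping.
import pickle
import tempfile
from pathlib import Path

class FileStoreSketch:
    def __init__(self, path=None):
        if path is None:
            # mirror the default-path behavior: use a fresh temp directory
            self._tmp = tempfile.TemporaryDirectory()
            path = self._tmp.name
        self.path = Path(path)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.shutdown()

    def put_model(self, model):
        key = f'model-{len(list(self.path.glob("*.pkl")))}.pkl'
        with open(self.path / key, 'wb') as f:
            pickle.dump(model, f)
        return key

    def get_model(self, key):
        with open(self.path / key, 'rb') as f:
            return pickle.load(f)

    def shutdown(self):
        # clean up the temp directory if we created one
        if hasattr(self, '_tmp'):
            self._tmp.cleanup()
```

Unlike this sketch, the real store memory-maps the serialized arrays so multiple processes can share one copy of the data.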