Parallel Execution

LensKit supports several forms of parallel execution, each with an environment variable controlling its level of parallelism:

  • Batch operations using multi-process execution.

  • Parallel model training. For most models provided by LensKit, this is implemented with PyTorch JIT parallelism (torch.jit.fork()).

  • Parallel computation in the various backends (BLAS, MKL, Torch, etc.).

Other models compatible with LensKit may use their own parallel processing logic.

Configuring Parallelism

LensKit provides four knobs for configuring parallelism, each with a corresponding environment variable and a parameter to initialize(). The environment variables are:

LK_NUM_PROCS

The number of processes to use for batch operations. Defaults to the number of CPUs or 4, whichever is lower.

LK_NUM_THREADS

The number of threads to use for parallel model building. Defaults to the number of CPUs or 8, whichever is smaller.

This number is passed to torch.set_num_interop_threads() to set up the Torch JIT thread count.

LK_NUM_BACKEND_THREADS

The number of threads to be used by backend compute engines. Defaults to up to 4 backend threads per training thread, depending on the capacity of the machine:

max(min(NCPUS // LK_NUM_THREADS, 4), 1)

This is passed to torch.set_num_threads() (to control PyTorch internal parallelism), and to the underlying BLAS layer (via threadpoolctl).

LK_NUM_CHILD_THREADS

The number of backend threads to be used in worker processes spawned by batch evaluation. Defaults to 4 per process, capped by the number of CPUs available:

max(min(NCPUS // LK_NUM_PROCS, 4), 1)

Workers have both the process and thread counts set to 1.
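The two default formulas above can be checked with a short pure-Python sketch (NCPUS stands for the detected CPU count; the helper names here are illustrative, not part of LensKit's API):

```python
def default_backend_threads(ncpus: int, num_threads: int) -> int:
    # Default for LK_NUM_BACKEND_THREADS: up to 4 backend threads
    # per training thread, but always at least 1.
    return max(min(ncpus // num_threads, 4), 1)

def default_child_threads(ncpus: int, num_procs: int) -> int:
    # Default for LK_NUM_CHILD_THREADS: up to 4 backend threads
    # per worker process, but always at least 1.
    return max(min(ncpus // num_procs, 4), 1)

# On a 16-CPU machine with 8 training threads, each thread gets 2 backend threads.
print(default_backend_threads(16, 8))   # → 2
# With 4 worker processes, each worker may use up to 4 backend threads.
print(default_child_threads(16, 4))     # → 4
```

Note that both defaults bottom out at 1, so even a machine with fewer CPUs than training threads or worker processes still gets one backend thread each.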

lenskit.parallel.initialize(*, processes=None, threads=None, backend_threads=None, child_threads=None)

Set up and configure LensKit parallelism. This only needs to be called if you want to control when and how parallelism is set up; components that use parallelism call ensure_parallel_init(), which calls this function with its default arguments if it has not already been called.

Parameters:
  • processes (int | None) – The number of processes to use for multiprocessing evaluations (see LK_NUM_PROCS)

  • threads (int | None) – The number of threads to use for parallel model training and similar operations (see LK_NUM_THREADS).

  • backend_threads (int | None) – The number of threads underlying computational engines should use (see LK_NUM_BACKEND_THREADS).

  • child_threads (int | None) – The number of threads backends are allowed to use in the worker processes in multiprocessing operations (see LK_NUM_CHILD_THREADS).
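Each setting thus has three possible sources: an explicit argument, an environment variable, and a computed default. As a rough illustration of how one such setting might be resolved (the exact precedence is an assumption here, and resolve_setting is a hypothetical helper, not part of LensKit):

```python
import os

def resolve_setting(param, env_var, default):
    """Resolve one parallelism setting: an explicit argument wins,
    then the environment variable, then the computed default."""
    if param is not None:
        return param
    value = os.environ.get(env_var)
    if value is not None:
        return int(value)
    return default

os.environ["LK_NUM_PROCS"] = "2"
print(resolve_setting(None, "LK_NUM_PROCS", 4))  # environment variable applies → 2
print(resolve_setting(8, "LK_NUM_PROCS", 4))     # explicit argument wins → 8
```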

lenskit.parallel.ensure_parallel_init()

Make sure LensKit parallelism is configured, and configure it with defaults if it is not.

Components using parallelism or intensive computations should call this function before they begin training.

Parallel Model Ops

LensKit uses a custom API wrapping multiprocessing.pool.Pool to parallelize batch operations (see lenskit.batch).

The basic idea of this API is to create an invoker that holds a model and a function, and then pass lists of argument sets to that function:

with invoker(model, func) as inv:
    results = list(inv.map(args))

The model is persisted into shared memory to be used by the worker processes. PyTorch tensors, including those on CUDA devices, are shared.

Most LensKit users will not need to use parallel op invokers directly, but they are useful if you are implementing new parallel batch operations. They may also be useful for other kinds of analysis.
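To make the invoker contract concrete, here is a minimal single-process stand-in that mirrors the interface. LensKit's real invokers persist the model to shared memory and dispatch to worker processes; this sketch does neither and is purely illustrative:

```python
class InProcessInvoker:
    """Toy invoker: holds a model and a function, and maps the function
    over argument sets with the model supplied as the first argument."""

    def __init__(self, model, func):
        self.model = model
        self.func = func

    def map(self, tasks):
        # lazily apply func(model, task) for each task
        return (self.func(self.model, task) for task in tasks)

    def shutdown(self):
        pass  # a multiprocess invoker would tear down its worker pool here

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.shutdown()

def score(model, item):
    return model["bias"] + item

with InProcessInvoker({"bias": 10}, score) as inv:
    print(list(inv.map([1, 2, 3])))  # → [11, 12, 13]
```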

lenskit.parallel.invoker(model, func, n_jobs=None, progress=None)

Get an appropriate invoker for performing operations on model.

Parameters:
  • model (M) – The model object on which to perform operations.

  • func (Callable[[M, A], R]) – The function to call. The function must be pickleable.

  • n_jobs (int | None) – The number of processes to use for parallel operations. If None, will call proc_count() with a maximum default process count of 4.

  • progress (Progress | None) –

    A progress bar to use to report status. It should have the following states:

    • dispatched

    • in-progress

    • finished

    One can be created with invoke_progress().

Returns:

An invoker to perform operations on the model.

Return type:

ModelOpInvoker

class lenskit.parallel.ModelOpInvoker

Bases: ABC, Generic[A, R]

Interface for invoking operations on a model, possibly in parallel. The operation invoker is configured with a model and a function to apply, and applies that function to the arguments supplied in map. Child process invokers also route logging messages to the parent process, so logging works even with multiprocessing.

An invoker is a context manager that calls shutdown() when exited.

abstract map(tasks)

Apply the configured function to each task. This is like map(), except it supplies the invoker’s model as the first argument to func.

Parameters:
  • tasks (Iterable[A]) – Iterable of argument sets to provide to the function.

Returns:

An iterable of the results.

Return type:

iterable

Logging and Progress

Multi-process op invokers automatically set up logging and progress reporting to work across processes using the manylog package. Op invokers can also report the progress of queued jobs to a progress_api.Progress.

lenskit.parallel.invoke_progress(logger=None, label=None, total=None, unit=None)

Create a progress bar for parallel tasks, pre-populated with the task states used by invoker().

See make_progress() for details on parameter meanings.

Return type:

Progress

Computing Work Chunks

The WorkChunks class provides support for dividing work into chunks for parallel processing, particularly for model training.

class lenskit.parallel.chunking.WorkChunks(total, chunk_size, chunk_count)

Bases: NamedTuple

The chunking configuration for parallel work.

Parameters:
  • total (int)

  • chunk_size (int)

  • chunk_count (int)

total: int

The total number of work items.

chunk_size: int

The number of items in each chunk.

chunk_count: int

The number of chunks.

classmethod create(njobs)

Compute the chunk size for parallel model training.
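The exact formula LensKit uses is not spelled out here, but a plausible chunking computation looks like the following. This is an illustrative reimplementation, not LensKit's code, and its create() takes an explicit total in addition to the job count:

```python
from typing import NamedTuple

class WorkChunks(NamedTuple):
    total: int
    chunk_size: int
    chunk_count: int

    @classmethod
    def create(cls, total: int, njobs: int) -> "WorkChunks":
        # Aim for several chunks per job so the work stays balanced,
        # then round the chunk count up so every item is covered.
        chunk_size = max(total // (njobs * 4), 1)
        chunk_count = -(-total // chunk_size)  # ceiling division
        return cls(total, chunk_size, chunk_count)

chunks = WorkChunks.create(1000, 8)
print(chunks)  # chunk_size * chunk_count covers all 1000 items
```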