[[!template id=project

title="Kernel continuations"

contact="""
[tech-kern](mailto:tech-kern@NetBSD.org),
[board](mailto:board@NetBSD.org),
[core](mailto:core@NetBSD.org)
"""

category="kernel"
difficulty="hard"

description="""
This project proposal is a subtask of [[smp_networking]].

The goal of this project is to implement continuations at the kernel level. Most of the pieces are already available in the kernel, so this can be reworded as: combine *callouts*, *softints*, and *workqueues* into a single framework. Continuations are meant to be cheap; very cheap.

These continuations are a dispatching system for making callbacks at scheduled times or in different thread/interrupt contexts. They aren't "continuations" in the usual sense such as you might find in Scheme code.

Please note that the main goal of this project is to simplify the implementation of [[SMP networking|smp_networking]], so care must be taken in the design of the interface to support all the features required for this other project.

The proposed interface looks like the following. This interface is mostly derived from the `callout(9)` API and is a superset of the `softint(9)` API. The most significant change is that workqueue items are not tied to a specific kernel thread.

* `kcont_t *kcont_create(kcont_wq_t *wq, kmutex_t *lock, void (*func)(void *, kcont_t *), void *arg, int flags);`

    A `wq` must be supplied. It may be one returned by `kcont_workqueue_acquire` or a predefined workqueue such as (sorted from highest priority to lowest):

    * `wq_softserial`, `wq_softnet`, `wq_softbio`, `wq_softclock`
    * `wq_prihigh`, `wq_primedhigh`, `wq_primedlow`, `wq_prilow`

    `lock`, if non-NULL, should be locked before calling `func(arg)` and released afterwards. However, if the lock is released and/or destroyed before the called function returns, then, before returning, `kcont_setmutex` must be called with either a new mutex to be released or `NULL`.
    If acquiring `lock` would block, other pending kernel continuations that depend on other locks may be dispatched in the meantime. However, all continuations sharing the same set of `{ wq, lock, [ci] }` need to be processed in the order they were scheduled.

    `flags` must be 0. This field is provided only for extensibility.

* `int kcont_schedule(kcont_t *kc, struct cpu_info *ci, int nticks);`

    If the continuation is marked as *INVOKING*, an error of `EBUSY` should be returned.

    If `nticks` is 0, the continuation is marked as *INVOKING* while *EXPIRED* and *PENDING* are cleared, and the continuation is scheduled to be invoked without delay. Otherwise, the continuation is marked as *PENDING* while the *EXPIRED* status is cleared, and the timer is reset to `nticks`. Once the timer expires, the continuation is marked as *EXPIRED* and *INVOKING*, and the *PENDING* status is cleared.

    If `ci` is non-NULL, the continuation is invoked on the specified CPU if the continuation's workqueue has per-CPU queues. If that workqueue does not provide per-CPU queues, an error of `ENOENT` is returned. Otherwise, when `ci` is `NULL`, the continuation is invoked on either the current CPU or the next available CPU, depending on whether the continuation's workqueue has per-CPU queues or not, respectively.

* `void kcont_destroy(kcont_t *kc);`

* `kmutex_t *kcont_getmutex(kcont_t *kc);`

    Returns the lock currently associated with the continuation `kc`.

* `void kcont_setarg(kcont_t *kc, void *arg);`

    Updates `arg` in the continuation `kc`. If no lock is associated with the continuation, then `arg` may be changed at any time; however, if the continuation is being invoked, it may not pick up the change. Otherwise, `kcont_setarg` must only be called when the associated lock is locked.

* `kmutex_t *kcont_setmutex(kcont_t *kc, kmutex_t *lock);`

    Updates the lock associated with the continuation `kc` and returns the previous lock.
    If no lock is currently associated with the continuation, then calling this function with a lock other than `NULL` will trigger an assertion failure. Otherwise, `kcont_setmutex` must be called only when the existing lock (which will be replaced) is locked. If `kcont_setmutex` is called as a result of the invocation of `func`, then after `kcont_setmutex` has been called but before `func` returns, the replaced lock must have been released, and the replacement lock, if non-NULL, must be locked upon return.

* `void kcont_setfunc(kcont_t *kc, void (*func)(void *), void *arg);`

    Updates `func` and `arg` in the continuation `kc`. If no lock is associated with the continuation, then only `arg` may be changed. Otherwise, `kcont_setfunc` must be called only when the associated lock is locked.

* `bool kcont_stop(kcont_t *kc);`

    The `kcont_stop` function stops the timer associated with the continuation handle `kc`. The *PENDING* and *EXPIRED* statuses for the continuation handle are cleared. It is safe to call `kcont_stop` on a continuation handle that is not pending, so long as it is initialized. `kcont_stop` will return a non-zero value if the continuation was *EXPIRED*.

* `bool kcont_pending(kcont_t *kc);`

    The `kcont_pending` function tests the *PENDING* status of the continuation handle `kc`. A *PENDING* continuation is one whose timer has been started and has not expired. Note that it is possible for a continuation's timer to have expired without the continuation being invoked, if the continuation's lock could not be acquired or if higher-priority threads are preventing its invocation. Note also that it is only safe to test the *PENDING* status when holding the continuation's lock.

* `bool kcont_expired(kcont_t *kc);`

    Tests to see if the continuation's function has been invoked since the last `kcont_schedule`.

* `bool kcont_active(kcont_t *kc);`

* `bool kcont_invoking(kcont_t *kc);`

    Tests the *INVOKING* status of the handle `kc`. This flag is set just before a continuation's function is called.
    Since the scheduling of the worker threads may induce delays, other pending higher-priority code may run before the continuation function is allowed to run. This may create a race condition if this higher-priority code deallocates storage containing one or more continuation structures whose continuation functions are about to be run. In such cases, one technique to prevent references to deallocated storage is to test whether any continuation functions are in the *INVOKING* state using `kcont_invoking`, and if so, to mark the data structure and defer storage deallocation until the continuation function is allowed to run. For this handshake protocol to work, the continuation function has to use the `kcont_ack` function to clear this flag.

* `bool kcont_ack(kcont_t *kc);`

    Clears the *INVOKING* state in the continuation handle `kc`. This is used in situations where it is necessary to protect against the race condition described under `kcont_invoking`.

* `kcont_wq_t *kcont_workqueue_acquire(pri_t pri, int flags);`

    Returns a workqueue that matches the specified criteria. Thus, if multiple requesters ask for the same criteria, they all receive the same workqueue. `pri` specifies the priority at which the kernel thread that empties the workqueue should run.

    If `flags` is 0, then the standard operation is required. However, the following flag(s) may be bitwise ORed together:

    * `WQ_PERCPU` specifies that the workqueue should have a separate queue for each CPU, thus allowing continuations to be invoked on specific CPUs.

* `int kcont_workqueue_release(kcont_wq_t *wq);`

    Releases an acquired workqueue. On the last release, the workqueue's resources are freed and the workqueue is destroyed.
"""
]]

[[!tag smp_networking]]
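
The *PENDING*, *EXPIRED*, and *INVOKING* transitions described for `kcont_schedule`, `kcont_stop`, and `kcont_ack` can be sketched as a small user-space model. This is only an illustration of the rules as worded in the proposal, not kernel code: the `kc_model` structure, the `KC_*` flag encoding, and all `kc_model_*` function names are hypothetical, and workqueues, locks, and CPU selection are left out entirely. The model also assumes `nticks` is never negative, and its `kc_model_ack` returns nothing because the proposal does not say what the `bool` result of `kcont_ack` means.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits; the proposal names the states, not their encoding. */
enum {
	KC_PENDING  = 1 << 0,
	KC_EXPIRED  = 1 << 1,
	KC_INVOKING = 1 << 2
};

/* User-space stand-in for kcont_t: only the state bits and timer are modelled. */
struct kc_model {
	unsigned flags;
	int ticks_left;
};

/* Follows the kcont_schedule rules: EBUSY while INVOKING, immediate
 * dispatch when nticks is 0, otherwise (re)arm the timer. */
static int
kc_model_schedule(struct kc_model *kc, int nticks)
{
	if (kc->flags & KC_INVOKING)
		return EBUSY;
	if (nticks == 0) {
		kc->flags &= ~(KC_EXPIRED | KC_PENDING);
		kc->flags |= KC_INVOKING;
		return 0;
	}
	kc->flags &= ~KC_EXPIRED;
	kc->flags |= KC_PENDING;
	kc->ticks_left = nticks;
	return 0;
}

/* Advances the model clock by one tick; on expiry the continuation
 * becomes EXPIRED and INVOKING, and PENDING is cleared. */
static void
kc_model_tick(struct kc_model *kc)
{
	if (!(kc->flags & KC_PENDING))
		return;
	if (--kc->ticks_left > 0)
		return;
	kc->flags &= ~KC_PENDING;
	kc->flags |= KC_EXPIRED | KC_INVOKING;
}

/* Follows kcont_stop: clears PENDING and EXPIRED, reports whether
 * the continuation was EXPIRED. */
static bool
kc_model_stop(struct kc_model *kc)
{
	bool was_expired = (kc->flags & KC_EXPIRED) != 0;

	kc->flags &= ~(KC_PENDING | KC_EXPIRED);
	return was_expired;
}

/* Follows kcont_ack: clears INVOKING so the handle can be rescheduled. */
static void
kc_model_ack(struct kc_model *kc)
{
	kc->flags &= ~KC_INVOKING;
}
```

In this model, a continuation scheduled with `nticks` of 2 stays *PENDING* for one tick, becomes *EXPIRED* and *INVOKING* on the second, and rejects further scheduling with `EBUSY` until `kc_model_ack` clears *INVOKING*. That is the handshake the `kcont_invoking` text relies on.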