[[!template id=project

title="Kernel continuations"

contact="""
[tech-kern](mailto:tech-kern@NetBSD.org),
[board](mailto:board@NetBSD.org),
[core](mailto:core@NetBSD.org)
"""

category="kernel"
difficulty="hard"

description="""
This project proposal is a subtask of [[smp_networking]].

The goal of this project is to implement continuations at the kernel level.
Most of the pieces are already available in the kernel, so this can be
reworded as: combine *callouts*, *softints*, and *workqueues* into a single
framework.  Continuations are meant to be cheap; very cheap.

These continuations are a dispatching system for making callbacks at
scheduled times or in different thread/interrupt contexts.
They aren't "continuations" in the usual sense such as you might find
in Scheme code.

Please note that the main goal of this project is to simplify the
implementation of [[SMP networking|smp_networking]], so care must be taken
in the design of the interface to support all the features required for
this other project.

The proposed interface looks like the following.  This interface is mostly
derived from the `callout(9)` API and is a superset of the `softint(9)` API.
The most significant change is that workqueue items are not tied to a
specific kernel thread.

* `kcont_t *kcont_create(kcont_wq_t *wq, kmutex_t *lock, void
  (*func)(void *, kcont_t *), void *arg, int flags);`

  A `wq` must be supplied.  It may be one returned by
  `kcont_workqueue_acquire` or a predefined workqueue such as (sorted from
  highest priority to lowest):

  * `wq_softserial`, `wq_softnet`, `wq_softbio`, `wq_softclock`
  * `wq_prihigh`, `wq_primedhigh`, `wq_primedlow`, `wq_prilow`

  If `lock` is non-NULL, it should be locked before `func(arg, kc)` is
  called and released afterwards.  However, if the lock is released and/or
  destroyed before `func` returns, then `func` must call `kcont_setmutex`
  before returning, with either a new mutex to be released or `NULL`.  If
  acquiring `lock` would block, other pending continuations that depend on
  other locks may be dispatched in the meantime.  However, all
  continuations sharing the same `{ wq, lock, [ci] }` tuple must be
  processed in the order in which they were scheduled.

  `flags` must be 0.  This field is just provided for extensibility.
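
  The ordering rule for `lock` can be illustrated with a small user-space
  model.  This is a sketch under stated assumptions, not proposed kernel
  code: `pthread_mutex_t` stands in for `kmutex_t`, one array stands in for
  the queue of a single `{ wq, ci }` pair, and `model_queue`,
  `model_enqueue`, `model_dispatch`, and `record` are hypothetical names.
  A continuation whose lock cannot be taken without blocking is skipped,
  along with every later entry sharing that lock, so per-lock FIFO order is
  kept while continuations with other locks still run.

  ```c
  #include <assert.h>
  #include <pthread.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical user-space model; pthread_mutex_t stands in for kmutex_t. */
  typedef struct kcont kcont_t;
  struct kcont {
      pthread_mutex_t *lock;              /* NULL: no lock is taken */
      void (*func)(void *, kcont_t *);
      void *arg;
  };

  #define MODEL_QLEN 16

  /* One FIFO for a single { wq, ci } pair; the lock is the remaining key. */
  struct model_queue {
      kcont_t *items[MODEL_QLEN];
      size_t n;
  };

  static void model_enqueue(struct model_queue *q, kcont_t *kc)
  {
      assert(q->n < MODEL_QLEN);
      q->items[q->n++] = kc;
  }

  /*
   * One dispatch pass.  If a lock is busy, that entry and all later entries
   * with the same lock stay queued in their original order; entries with
   * other locks are still invoked.  Returns the number of invocations.
   */
  static size_t model_dispatch(struct model_queue *q)
  {
      pthread_mutex_t *busy[MODEL_QLEN];
      size_t nbusy = 0, kept = 0, ran = 0;

      for (size_t i = 0; i < q->n; i++) {
          kcont_t *kc = q->items[i];
          bool skip = false;

          for (size_t j = 0; j < nbusy; j++)
              if (busy[j] == kc->lock)
                  skip = true;
          if (!skip && kc->lock != NULL &&
              pthread_mutex_trylock(kc->lock) != 0) {
              busy[nbusy++] = kc->lock;   /* would block: defer this lock */
              skip = true;
          }
          if (skip) {
              q->items[kept++] = kc;      /* stays queued, order unchanged */
              continue;
          }
          kc->func(kc->arg, kc);          /* lock (if any) is held here */
          if (kc->lock != NULL)
              pthread_mutex_unlock(kc->lock);
          ran++;
      }
      q->n = kept;
      return ran;
  }

  /* Example callback: appends the int that arg points to onto a log. */
  static int logged[MODEL_QLEN];
  static size_t nlog;

  static void record(void *arg, kcont_t *kc)
  {
      (void)kc;
      logged[nlog++] = *(int *)arg;
  }
  ```

  With lock A busy, a queue of `a1, b1, a2` runs only `b1`; once A is
  released, `a1` and `a2` run in that order.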

* `int kcont_schedule(kcont_t *kc, struct cpu_info *ci, int nticks);`

  If the continuation is marked as *INVOKING*, an error of `EBUSY` should
  be returned.  If `nticks` is 0, the continuation is marked as *INVOKING*,
  its *EXPIRED* and *PENDING* statuses are cleared, and it is scheduled to
  be invoked without delay.  Otherwise, the continuation is marked as
  *PENDING*, its *EXPIRED* status is cleared, and its timer is reset to
  `nticks`.  Once the timer expires, the continuation is marked as
  *EXPIRED* and *INVOKING*, and the *PENDING* status is cleared.

  If `ci` is non-NULL, the continuation is invoked on the specified CPU,
  provided the continuation's workqueue has per-cpu queues; if that
  workqueue does not provide per-cpu queues, an error of `ENOENT` is
  returned.  If `ci` is `NULL`, the continuation is invoked on the current
  CPU if the continuation's workqueue has per-cpu queues, and on the next
  available CPU otherwise.
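
  The state transitions of `kcont_schedule` can be sketched as a small
  user-space model.  The flag values `KC_PENDING`, `KC_EXPIRED`, and
  `KC_INVOKING`, the `model_wq` type, the `model_tick` hook, and the struct
  layouts are all hypothetical.  The model also assumes that the `ENOENT`
  check happens before any state is changed, which the text does not
  specify.

  ```c
  #include <assert.h>
  #include <errno.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical status bits. */
  #define KC_PENDING  0x1u
  #define KC_EXPIRED  0x2u
  #define KC_INVOKING 0x4u

  struct model_wq { bool percpu; };      /* stand-in for kcont_wq_t */
  struct cpu_info { int index; };        /* placeholder for the kernel type */

  typedef struct {
      struct model_wq *wq;
      unsigned state;
      int ticks_left;
      struct cpu_info *target;           /* NULL: no CPU requested */
  } kcont_t;

  static int kcont_schedule(kcont_t *kc, struct cpu_info *ci, int nticks)
  {
      if (kc->state & KC_INVOKING)
          return EBUSY;
      if (ci != NULL && !kc->wq->percpu)
          return ENOENT;                 /* no per-cpu queue to target */
      kc->target = ci;
      if (nticks == 0) {
          /* Run without delay: INVOKING set, PENDING and EXPIRED cleared. */
          kc->state = KC_INVOKING;
          kc->ticks_left = 0;
      } else {
          /* Arm (or re-arm) the timer: PENDING set, EXPIRED cleared. */
          kc->state = KC_PENDING;
          kc->ticks_left = nticks;
      }
      return 0;
  }

  /* Hypothetical per-tick hook: on expiry, PENDING -> EXPIRED | INVOKING. */
  static void model_tick(kcont_t *kc)
  {
      if ((kc->state & KC_PENDING) && --kc->ticks_left == 0)
          kc->state = (kc->state & ~KC_PENDING) | KC_EXPIRED | KC_INVOKING;
  }
  ```

  A continuation that is still *INVOKING* cannot be rescheduled until its
  *INVOKING* status has been cleared.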

* `void kcont_destroy(kcont_t *kc);`

* `kmutex_t *kcont_getmutex(kcont_t *kc);`

  Returns the lock currently associated with the continuation `kc`.

* `void kcont_setarg(kcont_t *kc, void *arg);`

  Updates `arg` in the continuation `kc`.  If no lock is associated with
  the continuation, then `arg` may be changed at any time; however, if the
  continuation is being invoked, that invocation may not pick up the
  change.  If a lock is associated, `kcont_setarg` must only be called
  while that lock is held.

* `kmutex_t *kcont_setmutex(kcont_t *kc, kmutex_t *lock);`

  Updates the lock associated with the continuation `kc` and returns the
  previous lock.  If no lock is currently associated with the continuation,
  then calling this function with a lock other than NULL will trigger an
  assertion failure.  Otherwise, `kcont_setmutex` must be called only when
  the existing lock (which will be replaced) is locked.  If
  `kcont_setmutex` is called during the invocation of `func`, then after
  `kcont_setmutex` has been called but before `func` returns, the replaced
  lock must be released, and the replacement lock, if non-NULL, must be
  locked upon return.
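
  The lock handoff can be modeled as follows.  This is a hypothetical
  user-space sketch: `pthread_mutex_t` stands in for `kmutex_t`,
  `model_invoke` plays the role of the worker that dispatches a
  continuation, and `migrate` is an example continuation function that
  moves `kc` to the lock passed in `arg`.  After `func` returns, the worker
  releases whatever lock is associated at that point, which is why the
  replacement lock must be held on return.

  ```c
  #include <assert.h>
  #include <pthread.h>
  #include <stddef.h>

  typedef pthread_mutex_t kmutex_t;      /* user-space stand-in */

  typedef struct kcont kcont_t;
  struct kcont {
      kmutex_t *lock;                     /* NULL: no lock is taken */
      void (*func)(void *, kcont_t *);
      void *arg;
  };

  static kmutex_t *kcont_getmutex(kcont_t *kc)
  {
      return kc->lock;
  }

  static kmutex_t *kcont_setmutex(kcont_t *kc, kmutex_t *lock)
  {
      kmutex_t *old = kc->lock;

      /* A continuation without a lock may not be given one. */
      assert(old != NULL || lock == NULL);
      kc->lock = lock;
      return old;
  }

  /* Worker side: hold the lock across func, then release whichever lock
   * is associated once func has returned. */
  static void model_invoke(kcont_t *kc)
  {
      kmutex_t *lock = kcont_getmutex(kc);

      if (lock != NULL)
          pthread_mutex_lock(lock);
      kc->func(kc->arg, kc);
      lock = kcont_getmutex(kc);
      if (lock != NULL)
          pthread_mutex_unlock(lock);
  }

  /* Example func: switch kc to the lock passed in arg (may be NULL). */
  static void migrate(void *arg, kcont_t *kc)
  {
      kmutex_t *newlock = arg;
      kmutex_t *old = kcont_setmutex(kc, newlock);

      if (old != NULL)
          pthread_mutex_unlock(old);      /* replaced lock released */
      if (newlock != NULL)
          pthread_mutex_lock(newlock);    /* replacement held on return */
  }
  ```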

* `void kcont_setfunc(kcont_t *kc, void (*func)(void *, kcont_t *),
  void *arg);`

  Updates `func` and `arg` in the continuation `kc`.  If no lock is
  associated with the continuation, then only `arg` may be changed.
  Otherwise, `kcont_setfunc` must be called only when the associated lock
  is locked.

* `bool kcont_stop(kcont_t *kc);`

  The `kcont_stop` function stops the timer associated with the
  continuation handle `kc`.  The *PENDING* and *EXPIRED* status for the
  continuation handle is cleared.  It is safe to call `kcont_stop` on a
  continuation handle that is not pending, so long as it is initialized.
  `kcont_stop` returns `true` if the continuation was *EXPIRED*.

* `bool kcont_pending(kcont_t *kc);`

  The `kcont_pending` function tests the *PENDING* status of the
  continuation handle `kc`.  A *PENDING* continuation is one whose timer
  has been started and has not expired.  Note that a continuation's timer
  may have expired without the continuation being invoked, if its lock
  could not be acquired or if higher-priority threads are preventing its
  invocation.  Note also that it is only safe to test *PENDING* status
  when holding the continuation's lock.

* `bool kcont_expired(kcont_t *kc);`

  Tests to see if the continuation's function has been invoked since the
  last `kcont_schedule`.
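
  `kcont_stop`, `kcont_pending`, and `kcont_expired` reduce to simple
  operations on the status bits.  The sketch below is a hypothetical
  model, not the proposed implementation: the flag values and struct
  layout are assumptions, and it assumes `kcont_stop` leaves *INVOKING*
  alone, since the text only says that *PENDING* and *EXPIRED* are
  cleared.  Real callers would also hold the continuation's lock, as noted
  for `kcont_pending`.

  ```c
  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical status bits. */
  #define KC_PENDING  0x1u
  #define KC_EXPIRED  0x2u
  #define KC_INVOKING 0x4u

  typedef struct {
      unsigned state;
      int ticks_left;
  } kcont_t;

  static bool kcont_pending(kcont_t *kc)
  {
      return (kc->state & KC_PENDING) != 0;
  }

  static bool kcont_expired(kcont_t *kc)
  {
      return (kc->state & KC_EXPIRED) != 0;
  }

  /* Stop the timer; returns whether the continuation had EXPIRED. */
  static bool kcont_stop(kcont_t *kc)
  {
      bool was_expired = kcont_expired(kc);

      kc->state &= ~(KC_PENDING | KC_EXPIRED);   /* INVOKING is left alone */
      kc->ticks_left = 0;
      return was_expired;
  }
  ```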

* `bool kcont_active(kcont_t *kc);`

* `bool kcont_invoking(kcont_t *kc);`

  Tests the *INVOKING* status of the handle `kc`.  This flag is set just
  before a continuation's function is called.  Since the scheduling
  of the worker threads may induce delays, other pending higher-priority
  code may run before the continuation function is allowed to run.  This
  may create a race condition if this higher-priority code deallocates
  storage containing one or more continuation structures whose continuation
  functions are about to be run.  In such cases, one technique to prevent
  references to deallocated storage would be to test whether any
  continuation functions are in the *INVOKING* state using
  `kcont_invoking`, and if so, to mark the data structure and defer storage
  deallocation until the continuation function is allowed to run.  For this
  handshake protocol to work, the continuation function will have to use
  the `kcont_ack` function to clear this flag.

* `bool kcont_ack(kcont_t *kc);`

  Clears the *INVOKING* state in the continuation handle `kc`.  This is
  used in situations where it is necessary to protect against the race
  condition described under `kcont_invoking`.
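
  A minimal sketch of the `kcont_invoking`/`kcont_ack` handshake, as a
  user-space model: `struct conn`, `conn_destroy`, `conn_timeout`, and
  `nfreed` are hypothetical, the flag value is an assumption, and
  `kcont_ack` is assumed to return the previous *INVOKING* state (the text
  does not say what it returns).  Teardown defers the free while the
  continuation is *INVOKING*; the continuation function acknowledges and
  then completes the deferred free.  In the kernel, both paths would also
  need to be serialized, for example by the continuation's lock.

  ```c
  #include <assert.h>
  #include <stdbool.h>
  #include <stdlib.h>

  #define KC_INVOKING 0x4u                 /* hypothetical status bit */

  typedef struct { unsigned state; } kcont_t;

  static bool kcont_invoking(kcont_t *kc)
  {
      return (kc->state & KC_INVOKING) != 0;
  }

  /* Assumed to return the previous INVOKING state. */
  static bool kcont_ack(kcont_t *kc)
  {
      bool was = kcont_invoking(kc);

      kc->state &= ~KC_INVOKING;
      return was;
  }

  struct conn {                            /* hypothetical object */
      kcont_t timer;                       /* embedded continuation */
      bool dying;                          /* teardown was deferred */
  };

  static int nfreed;                       /* counts frees, for the example */

  static void conn_free(struct conn *c)
  {
      free(c);
      nfreed++;
  }

  /* Teardown path, e.g. higher-priority code: free now or defer. */
  static void conn_destroy(struct conn *c)
  {
      if (kcont_invoking(&c->timer)) {
          c->dying = true;                 /* continuation will free it */
          return;
      }
      conn_free(c);
  }

  /* Continuation function: acknowledge, then honor a deferred teardown. */
  static void conn_timeout(void *arg, kcont_t *kc)
  {
      struct conn *c = arg;

      kcont_ack(kc);
      if (c->dying) {
          conn_free(c);
          return;
      }
      /* ... normal timeout processing would go here ... */
  }
  ```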

* `kcont_wq_t *kcont_workqueue_acquire(pri_t pri, int flags);`

  Returns a workqueue that matches the specified criteria.  If multiple
  requesters ask for the same criteria, they are all returned the same
  workqueue.  `pri` specifies the priority at which the kernel thread that
  empties the workqueue should run.

  If `flags` is 0, standard operation is requested.  Otherwise, the
  following flag(s) may be bitwise ORed together:

  * `WQ_PERCPU` specifies that the workqueue should have a separate queue
    for each CPU, thus allowing continuations to be invoked on specific CPUs.

* `int kcont_workqueue_release(kcont_wq_t *wq);`

  Releases an acquired workqueue.  On the last release, the workqueue's
  resources are freed and the workqueue is destroyed.
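
  The sharing and release rules above suggest a reference-counted registry
  keyed by `pri` and `flags`.  The following is a hypothetical user-space
  sketch: the `WQ_PERCPU` value, the list-based registry, the `refcnt`
  field, and the meaning of the return value (0 for success) are
  assumptions, and a kernel version would protect `wq_list` with a lock.

  ```c
  #include <assert.h>
  #include <stdlib.h>

  typedef int pri_t;                     /* stand-in for the kernel type */
  #define WQ_PERCPU 0x1                  /* hypothetical flag value */

  typedef struct kcont_wq {
      pri_t pri;
      int flags;
      unsigned refcnt;                   /* number of current acquirers */
      struct kcont_wq *next;
  } kcont_wq_t;

  static kcont_wq_t *wq_list;            /* registry of live workqueues */

  static kcont_wq_t *kcont_workqueue_acquire(pri_t pri, int flags)
  {
      /* Same criteria: share the existing workqueue. */
      for (kcont_wq_t *wq = wq_list; wq != NULL; wq = wq->next) {
          if (wq->pri == pri && wq->flags == flags) {
              wq->refcnt++;
              return wq;
          }
      }
      kcont_wq_t *wq = calloc(1, sizeof(*wq));
      if (wq == NULL)
          return NULL;
      wq->pri = pri;
      wq->flags = flags;
      wq->refcnt = 1;
      wq->next = wq_list;
      wq_list = wq;
      return wq;
  }

  static int kcont_workqueue_release(kcont_wq_t *wq)
  {
      assert(wq->refcnt > 0);
      if (--wq->refcnt > 0)
          return 0;
      /* Last release: unlink from the registry and free. */
      for (kcont_wq_t **pp = &wq_list; *pp != NULL; pp = &(*pp)->next) {
          if (*pp == wq) {
              *pp = wq->next;
              break;
          }
      }
      free(wq);
      return 0;
  }
  ```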
"""
]]

[[!tag smp_networking]]