File:  [NetBSD Developer Wiki] / wikisrc / projects / project / smp_networking.mdwn
Thu Nov 10 03:06:51 2011 UTC (10 years, 7 months ago) by jmmv
Branches: MAIN
CVS tags: HEAD
Add a specific proposal for the SMP networking project.

This proposal is built on top of several individual, smaller projects, all
of which are related to achieve the goals of SMP support and modularity on
the network stack.  Keep in mind that this is just that: a proposal.
Applicants could still come up with their own ideas.

The text of all these new pages is mostly a copy/paste of the original
document written by matt@.
I have done some minor edits (hopefully not changing any of the technical
details) and added some preliminary texts to the pages.  (I was unable to
parse some of the sentences though, so they remain "as is"...)

[[!template id=project

title="SMP Networking (aka remove the big network lock)"


funded="The NetBSD Foundation"


Traditionally, the NetBSD kernel code was protected by a single, global
lock.  This lock ensured that, on a multiprocessor system, two different
threads of execution did not access the kernel concurrently, which
simplified the internal design of the kernel.  However, such a design does
not scale to multiprocessor machines because, effectively, the kernel is
restricted to run on a single processor at any given time.

The NetBSD kernel has been modified to use fine-grained locks in many of
its different subsystems, achieving good performance on today's
multiprocessor machines.  Unfortunately, these changes have not yet been
applied to the networking code, which remains protected by the single lock.
In other words: NetBSD networking has evolved to work in a uniprocessor
environment; switching it to use fine-grained locking is a hard and complex
problem.

# Funding

At this time, The NetBSD Foundation is accepting project specifications to
remove the single networking lock.  If you want to apply for this project,
please send your proposal to the contact addresses listed above.

What follows is a particular design proposal, extracted from an original
text written by Matt Thomas.  You may choose to work on this particular
proposal or come up with your own.

**Please note that the subtasks listed below are also open for funding.**

# Tentative specification

The future of NetBSD network infrastructure has to efficiently embrace two
major design criteria: Symmetric Multi-Processing (SMP) and modularity.
Other design considerations include not only supporting but taking
advantage of the capability of newer network devices to do packet
classification, payload splitting, and even full connection offload.

You can divide the network infrastructure into 5 major components:

* Interfaces (both real devices and pseudo-devices)
* Socket code
* Protocols
* Routing code
* mbuf code.

Part of the complexity is that, due to the monolithic nature of the kernel,
each layer currently feels free to call any other layer.  This makes
designing a lock hierarchy difficult and likely to fail.

Part of the problem is asynchronous upcalls, which include:

* `ifa->ifa_rtrequest` for route changes.
* `pr_ctlinput` for interface events.

Another source of complexity is the large number of global variables
scattered throughout the source files.  This makes putting locks around
them difficult.
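
To make the global-variable point concrete, here is a minimal userland sketch of one common approach: gather related globals into a single structure that carries its own lock.  It assumes a POSIX mutex as a stand-in for a kernel mutex, and the structure and function names (`ifstats`, `ifstats_count_input`, `ifstats_input`) are hypothetical, not taken from the NetBSD sources.

```c
#include <pthread.h>

/* Hypothetical counters that might otherwise be bare globals. */
struct ifstats {
	pthread_mutex_t lock;		/* protects every field below */
	unsigned long ipackets;		/* packets received */
	unsigned long opackets;		/* packets sent */
};

/* One instance with its lock initialized statically. */
struct ifstats ifstats = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Count one received packet while holding the lock. */
void
ifstats_count_input(struct ifstats *st)
{
	pthread_mutex_lock(&st->lock);
	st->ipackets++;
	pthread_mutex_unlock(&st->lock);
}

/* Read the receive counter under the same lock. */
unsigned long
ifstats_input(struct ifstats *st)
{
	unsigned long n;

	pthread_mutex_lock(&st->lock);
	n = st->ipackets;
	pthread_mutex_unlock(&st->lock);
	return n;
}
```

Once the variables live next to the lock that guards them, it is easier to see which lock covers which data and to place that lock in a hierarchy.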

The proposed solution presented here includes the following tasks (in no
particular order) to achieve the desired goals of SMP support and
modularity:
[[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking)"]]

# Radical thoughts

You should also consider the following ideas:

## LWPs in user space do not need a kernel stack

Those pages are only used when an exception happens.  Interrupts are
probably going to their own dedicated stack.  One could just keep a set of
kernel stacks around.  Each CPU has one; when a user exception happens,
that stack is assigned to the current LWP and removed as the active CPU
one.  When that CPU next returns to user space, the kernel stack it was
using is saved to be used for the next user exception.  The idle LWP would
just use the current kernel stack.
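
The hand-off described above can be modelled in a few lines of C.  This is only a sketch under stated assumptions: all types and function names (`kstack`, `cpu_attach`, `user_exception_enter`, `user_exception_return`) are hypothetical, the pool is a simple free list, and no locking or real stack memory is involved.

```c
#include <stddef.h>

#define NSTACKS 4

/* Hypothetical model types; not the real NetBSD structures. */
struct kstack { struct kstack *next; };
struct lwp { struct kstack *l_kstack; };	/* NULL while in user space */
struct cpu { struct kstack *ci_kstack; };	/* spare for the next exception */

static struct kstack stacks[NSTACKS];
static struct kstack *freelist;

/* Put every stack on the free list. */
void
kstack_pool_init(void)
{
	freelist = NULL;
	for (size_t i = 0; i < NSTACKS; i++) {
		stacks[i].next = freelist;
		freelist = &stacks[i];
	}
}

static struct kstack *
kstack_get(void)
{
	struct kstack *ks = freelist;

	if (ks != NULL)
		freelist = ks->next;
	return ks;
}

static void
kstack_put(struct kstack *ks)
{
	ks->next = freelist;
	freelist = ks;
}

/* Give a CPU its initial spare kernel stack. */
void
cpu_attach(struct cpu *ci)
{
	ci->ci_kstack = kstack_get();
}

/* User exception: the CPU's spare stack becomes the LWP's stack. */
struct kstack *
user_exception_enter(struct cpu *ci, struct lwp *l)
{
	if (ci->ci_kstack == NULL)	/* spare was taken by a blocked LWP */
		ci->ci_kstack = kstack_get();
	l->l_kstack = ci->ci_kstack;
	ci->ci_kstack = NULL;
	return l->l_kstack;
}

/* Return to user space: the stack is kept for the next exception. */
void
user_exception_return(struct cpu *ci, struct lwp *l)
{
	if (ci->ci_kstack == NULL)
		ci->ci_kstack = l->l_kstack;
	else
		kstack_put(l->l_kstack);	/* CPU already has a spare */
	l->l_kstack = NULL;
}
```

In this model an LWP running in user space holds no kernel stack at all, so the number of stacks needed tracks the number of LWPs that are actually inside the kernel.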

## LWPs waiting for kernel condition shouldn't need a kernel stack

If an LWP is waiting on a kernel condition variable, it is expecting to be
inactive for some time, possibly a long time.  During this inactivity, it
does not really need a kernel stack.

When the exception handler gets a user-mode exception, it sets the LWP
restartable flag, which indicates that the exception is restartable, and
then services the exception as normal.  As routines are called, they can
clear the LWP restartable flag as needed.  When an LWP needs to block for a
long time, instead of calling `cv_wait`, it could call `cv_restart`.  If
`cv_restart` returned false, the LWP's restartable flag was clear, so
`cv_restart` acted just like `cv_wait`.  Otherwise, the LWP and CV would
have been tied together (big hand wave), the lock would have been released,
and the routine should return `ERESTART`.  `cv_restart` could also wait for
a short time, such as 0.5 seconds, and return `ERESTART` only if that
timeout expires.
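
The calling convention for the proposed `cv_restart` might look like the following userland model.  It is a sketch, not an implementation: `cv_restart` does not exist in NetBSD, the structure fields and the `wait_for_data` caller are hypothetical, the blocking and the "big hand wave" are reduced to pointer assignments, and the `ERESTART` value is a placeholder used only if the host headers do not define one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTART
#define ERESTART (-3)	/* placeholder for hosts that do not define it */
#endif

/* Hypothetical model types. */
struct cv;
struct mtx { bool held; };
struct lwp {
	bool l_restartable;	/* set by the exception handler */
	struct cv *l_boundcv;	/* CV this LWP is tied to, if any */
};
struct cv { struct lwp *cv_bound; };	/* model: a single bound waiter */

/*
 * Returns true if the LWP was tied to the CV and the lock released;
 * the caller must then unwind with ERESTART.  Returns false if the
 * restartable flag was clear, in which case it acts like cv_wait.
 */
bool
cv_restart(struct lwp *l, struct cv *cv, struct mtx *mtx)
{
	if (!l->l_restartable) {
		/* A real version would block here, as cv_wait does,
		 * and return with the lock held again. */
		return false;
	}
	l->l_boundcv = cv;
	cv->cv_bound = l;
	mtx->held = false;
	return true;
}

/* Hypothetical caller, entered with the lock held. */
int
wait_for_data(struct lwp *l, struct cv *cv, struct mtx *mtx)
{
	if (cv_restart(l, cv, mtx))
		return ERESTART;	/* unwind; the stack can be given up */
	return 0;			/* woke up normally, lock still held */
}
```

The key property is that a true return leaves nothing on the kernel stack that the LWP still needs, which is what allows the stack to be reclaimed while the LWP sleeps.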

As the stack unwinds, it would eventually return to the exception handler.
The exception handler would see that the LWP has a bound CV, save the LWP's
user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
idle, and call the scheduler to find more work.  When called,
`cpu_switchto` would notice the stack is marked idle, and detach it from
the LWP.

When the condition times out or is signalled, the first LWP attached to the
condition variable is marked runnable and detached from the CV.  When the
`cpu_switchto` routine is called, it would notice the lack of a stack, so
it would grab one, restore the trapframe, and reinvoke the exception
handler.
