--- wikisrc/projects/project/smp_networking.mdwn 2011/11/06 17:16:56 1.1
+++ wikisrc/projects/project/smp_networking.mdwn 2011/11/10 03:06:51 1.2
@@ -4,40 +4,125 @@ title="SMP Networking (aka remove the bi
 
 contact="""
 [tech-kern](mailto:tech-kern@NetBSD.org),
+[tech-net](mailto:tech-net@NetBSD.org),
 [board](mailto:board@NetBSD.org),
 [core](mailto:core@NetBSD.org)
 """
 
-category="kernel"
+category="networking"
 difficulty="hard"
 funded="The NetBSD Foundation"
 
 description="""
-Traditionally, the kernel code had been protected by a single, global lock.
-This lock ensured that, on a multiprocessor system, two different threads
-of execution did not access the kernel concurrently and thus simplified the
-internal design of the kernel. However, such design does not scale to
-multiprocessor machines because, effectively, the kernel is restricted to
-run on a single processor at any given time.
+**WARNING: THIS IS A DRAFT; THE INFORMATION CONTAINED IN THIS PROJECT AND
+ANY OF THE SUBPROJECTS LINKED BELOW IS SUBJECT TO CHANGE.**
+
+Traditionally, the NetBSD kernel code had been protected by a single,
+global lock. This lock ensured that, on a multiprocessor system, two
+different threads of execution did not access the kernel concurrently and
+thus simplified the internal design of the kernel. However, such a design
+does not scale to multiprocessor machines because, effectively, the kernel
+is restricted to run on a single processor at any given time.
 
 The NetBSD kernel has been modified to use fine grained locks in many of
 its different subsystems, achieving good performance on today's
 multiprocessor machines. Unfortunately, these changes have not yet been
 applied to the networking code, which remains protected by the single lock.
+In other words: NetBSD networking has evolved to work in a uniprocessor
+environment; switching it to use fine-grained locking is a hard and complex
+problem.
 
-The aim of this project is to remove the single lock surrounding the
-networking code in the kernel, allowing such code to execute more
-efficiently in multiprocessor machines.
-
-This project is sponsored by The NetBSD Foundation as improving the
-performance of the networking subsystem in current machines is critical to
-maintain the relevance of the NetBSD operating system.
+# Funding
 
 At this time, The NetBSD Foundation is accepting project specifications to
 remove the single networking lock. If you want to apply for this project,
-please send your proposal to the contact addresses listed above. Please
-see the [call for
-proposals](http://blog.netbsd.org/tnf/entry/request_for_project_specs_to)
-posted to the blog.
+please send your proposal to the contact addresses listed above.
+
+What follows is a particular design proposal, extracted from an
+[original text](http://www.NetBSD.org/~matt/smpnet.html) written by
+[Matt Thomas](mailto:matt@NetBSD.org). You may choose to work on this
+particular proposal or come up with your own.
+
+**Please note that the subtasks listed below are also open for funding
+individually.**
+
+# Tentative specification
+
+The future of NetBSD network infrastructure has to efficiently embrace two
+major design criteria: Symmetric Multi-Processing (SMP) and modularity.
+Other design considerations include not only supporting but taking
+advantage of the capability of newer network devices to do packet
+classification, payload splitting, and even full connection offload.
+
+You can divide the network infrastructure into 5 major components:
+
+* Interfaces (both real devices and pseudo-devices)
+* Socket code
+* Protocols
+* Routing code
+* mbuf code
+
+Part of the complexity is that, due to the monolithic nature of the kernel,
+each layer currently feels free to call any other layer. This makes
+designing a lock hierarchy difficult and likely to fail.
+
+Part of the problem is asynchronous upcalls, which include:
+
+* `ifa->ifa_rtrequest` for route changes.
+* `pr_ctlinput` for interface events.
+
+Another source of complexity is the large number of global variables
+scattered throughout the source files. This makes putting locks around
+them difficult.
+
+The proposed solution presented here includes the following tasks (in no
+particular order) to achieve the desired goals of SMP support and
+modularity:
+
+[[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking)"]]
+
+# Radical thoughts
+
+You should also consider the following ideas:
+
+## LWPs in user space do not need a kernel stack
+
+The kernel stack pages are only used when an exception happens.
+Interrupts are probably going to their own dedicated stack. One could just
+keep a set of kernel stacks around. Each CPU has one; when a user
+exception happens, that stack is assigned to the current LWP and removed as
+the active CPU one. When that CPU next returns to user space, the kernel
+stack it was using is saved to be used for the next user exception. The
+idle LWP would just use the current kernel stack.
+
+## LWPs waiting for a kernel condition shouldn't need a kernel stack
+
+If an LWP is waiting on a kernel condition variable, it is expecting to be
+inactive for some time, possibly a long time. During this inactivity, it
+does not really need a kernel stack.
+
+When the exception handler gets a user-mode exception, it sets the LWP
+restartable flag, which indicates that the exception is restartable, and then
+services the exception as normal. As routines are called, they can clear
+the LWP restartable flag as needed. When an LWP needs to block for a long
+time, instead of calling `cv_wait`, it could call `cv_restart`. If
+`cv_restart` returned false, the LWP's restartable flag was clear, so
+`cv_restart` acted just like `cv_wait`.
+Otherwise, the LWP and CV would have been tied together (big hand wave), the
+lock released, and the routine should return `ERESTART`. `cv_restart` could
+also wait briefly first (say 0.5 second) and restart only on timeout.
+
+As the stack unwinds, it would eventually return to the last exception
+handler. The handler would see the LWP has a bound CV, save the LWP's
+user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
+idle, and call the scheduler to find more work. When called,
+`cpu_switchto` would notice the stack is marked idle, and detach it from
+the LWP.
+
+When the condition times out or is signalled, the first LWP attached to the
+condition variable is marked runnable and detached from the CV. When the
+`cpu_switchto` routine is called, it would notice the lack of a stack,
+so it would grab one, restore the trapframe, and reinvoke the exception
+handler.
 """
 ]]
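
The `cv_restart` idea in the added "Radical thoughts" text is only described in prose, so here is a minimal user-space sketch of the control flow it seems to imply. Everything in it is an assumption made for illustration: the `sim_` names, the `SIM_ERESTART` value, the single-waiter condition variable, and the folding of "mark the stack idle" and "detach it in `cpu_switchto`" into one step. It does not block or schedule anything and is not NetBSD kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are NetBSD kernel API. */
enum { SIM_OK = 0, SIM_ERESTART = 1 };
enum sim_state { SIM_RUNNING, SIM_SLEEPING, SIM_RUNNABLE };

struct sim_lwp;
struct sim_cv {
    struct sim_lwp *waiter;      /* a single waiter only, for brevity */
};

struct sim_lwp {
    bool restartable;            /* set by the exception handler on entry */
    bool has_stack;              /* false once the kernel stack is given back */
    int restarts;                /* times the exception was re-invoked */
    enum sim_state state;
    struct sim_cv *bound_cv;     /* CV this LWP is tied to, if any */
};

/* Returns true if the LWP was tied to the CV and the caller must unwind
 * with SIM_ERESTART; false means it behaved like a plain blocking wait. */
bool sim_cv_restart(struct sim_lwp *l, struct sim_cv *cv)
{
    if (!l->restartable)
        return false;            /* a real cv_wait would block here */
    assert(cv->waiter == NULL);  /* the model supports one waiter per CV */
    l->bound_cv = cv;
    cv->waiter = l;
    return true;
}

/* A routine that needs to wait for a long time. */
int sim_long_wait(struct sim_lwp *l, struct sim_cv *cv)
{
    return sim_cv_restart(l, cv) ? SIM_ERESTART : SIM_OK;
}

/* Exception entry: mark the LWP restartable, run the work, then park the
 * LWP without a stack if it came back bound to a CV. */
int sim_exception(struct sim_lwp *l, struct sim_cv *cv, bool side_effects)
{
    l->restartable = true;
    if (side_effects)
        l->restartable = false;  /* a callee did something not repeatable */
    int rc = sim_long_wait(l, cv);
    if (l->bound_cv != NULL) {
        l->state = SIM_SLEEPING;
        l->has_stack = false;    /* stack marked idle and detached */
    }
    return rc;
}

/* Signal: the waiter becomes runnable and is detached from the CV. */
void sim_cv_signal(struct sim_cv *cv)
{
    struct sim_lwp *l = cv->waiter;
    if (l == NULL)
        return;
    cv->waiter = NULL;
    l->bound_cv = NULL;
    l->state = SIM_RUNNABLE;
}

/* Switch to an LWP: give it a stack if it has none and count the restart. */
void sim_switchto(struct sim_lwp *l)
{
    if (!l->has_stack) {
        l->has_stack = true;
        l->restarts++;           /* stands in for re-invoking the handler */
    }
    l->state = SIM_RUNNING;
}
```

Calling `sim_exception` with `side_effects` set to false takes the restart path: the LWP ends up sleeping with no stack until `sim_cv_signal` and `sim_switchto` bring it back. With `side_effects` set to true, the restartable flag is cleared and the call falls back to the `cv_wait`-like path, keeping its stack.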