1: [[!template id=project
3: title="SMP Networking (aka remove the big network lock)"
14: funded="The NetBSD Foundation"
17: Traditionally, the NetBSD kernel code had been protected by a single,
18: global lock. This lock ensured that, on a multiprocessor system, two
19: different threads of execution did not access the kernel concurrently and
20: thus simplified the internal design of the kernel. However, such a design
21: does not scale to multiprocessor machines because, effectively, the kernel
22: is restricted to run on a single processor at any given time.
24: The NetBSD kernel has been modified to use fine-grained locks in many of
25: its different subsystems, achieving good performance on today's
26: multiprocessor machines. Unfortunately, these changes have not yet been
27: applied to the networking code, which remains protected by the single lock.
28: In other words: NetBSD networking has evolved to work in a uniprocessor
29: environment; switching it to use fine-grained locking is a hard and complex
32: # Funding
34: At this time, The NetBSD Foundation is accepting project specifications to
35: remove the single networking lock. If you want to apply for this project,
36: please send your proposal to the contact addresses listed above.
38: Due to the size of this project, your proposal does not need to cover
39: everything to qualify for funding. We have attempted to split the work
40: into smaller units, and **you can submit funding applications for these
41: smaller subtasks independently** as long as the work you deliver fits in
42: the grand order of this project. For example, you could send an
43: application to make the network interfaces alone MP-friendly (see the *work
44: plan* below).
46: What follows is a particular design proposal, extracted from an
47: [original text](http://www.NetBSD.org/~matt/smpnet.html) written by
48: [Matt Thomas](mailto:matt@NetBSD.org). You may choose to work on this
49: particular proposal or come up with your own.
51: # Tentative specification
53: The future of NetBSD network infrastructure has to efficiently embrace two
54: major design criteria: Symmetric Multi-Processing (SMP) and modularity.
55: Other design considerations include not only supporting but taking
56: advantage of the capability of newer network devices to do packet
57: classification, payload splitting, and even full connection offload.
59: You can divide the network infrastructure into 5 major components:
61: * Interfaces (both real devices and pseudo-devices)
62: * Socket code
63: * Protocols
64: * Routing code
65: * mbuf code.
67: Part of the complexity is that, due to the monolithic nature of the kernel,
68: each layer currently feels free to call any other layer. This makes
69: designing a lock hierarchy difficult and likely to fail.
71: Part of the problem is the asynchronous upcalls, which include:
73: * `ifa->ifa_rtrequest` for route changes.
74: * `pr_ctlinput` for interface events.
76: Another source of complexity is the large number of global variables
77: scattered throughout the source files. This makes putting locks around
78: them difficult.
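As a rough illustration of why scattered globals are awkward to lock, the sketch below gathers a few related variables into one structure that carries its own lock. Every name in it (`struct proto_globals`, `pg_lock`, `proto_count_packet`, and so on) is hypothetical, and it uses a userland pthread mutex as a stand-in for whatever kernel lock primitive would actually be chosen.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: variables that would otherwise be separate file-scope
 * globals, collected so that a single lock clearly covers all of them.
 */
struct proto_globals {
	pthread_mutex_t pg_lock;	/* protects every field below */
	uint64_t pg_ipackets;		/* packets received */
	uint64_t pg_badsum;		/* packets dropped for a bad checksum */
	int pg_defttl;			/* default time-to-live */
};

static struct proto_globals ip_globals = {
	.pg_lock = PTHREAD_MUTEX_INITIALIZER,
	.pg_defttl = 64,
};

/* Count one received packet; `bad` is nonzero if its checksum failed. */
static void
proto_count_packet(struct proto_globals *pg, int bad)
{
	assert(pg != NULL);
	pthread_mutex_lock(&pg->pg_lock);
	pg->pg_ipackets++;
	if (bad)
		pg->pg_badsum++;
	pthread_mutex_unlock(&pg->pg_lock);
}

/* Read the receive counter under the same lock. */
static uint64_t
proto_ipackets(struct proto_globals *pg)
{
	uint64_t n;

	pthread_mutex_lock(&pg->pg_lock);
	n = pg->pg_ipackets;
	pthread_mutex_unlock(&pg->pg_lock);
	return n;
}
```

With the variables grouped this way, the locking rule can be stated once, next to the structure, instead of being rediscovered for each global.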
80: ## Subtasks
82: The proposed solution presented here includes the following tasks (in no
83: particular order) to achieve the desired goals of SMP support and
86: [[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking) and tagged(status:active)"]]
88: ## Work plan
90: Aside from the list of tasks above, the work to be done for this project
91: can be achieved by following these steps:
93: 1. Move ARP out of the routing table. See the [[nexthop_cache]] project.
95: 1. Make the network interfaces MP-friendly; they are among the few
96: remaining users of the big kernel lock. This work needs to support
97: multiple receive and transmit queues to help reduce lock contention. It
98: also includes changing more of the common interfaces to do what the `tsec`
99: driver does (basically, do everything with softints), and changing the
100: `*_input` routines to dispatch through a table instead of the current
101: switch code so that domains can be dynamically loaded.
103: 1. Collect global variables in the IP/UDP/TCP protocols into structures.
104: This helps the following items.
106: 1. Make IPV4/ICMP/IGMP/REASS MP-friendly.
108: 1. Make IPV6/ICMP/IGMP/ND MP-friendly.
110: 1. Make TCP MP-friendly.
112: 1. Make UDP MP-friendly.
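The table-driven dispatch mentioned for the `*_input` routines could look roughly like the sketch below. It is only an illustration: `input_table`, `input_register`, `input_unregister`, `input_dispatch`, and the integer key are hypothetical names, not existing kernel interfaces, and a real MP-friendly version would also need to protect the table against concurrent registration and lookup.

```c
#include <assert.h>
#include <stddef.h>

#define INPUT_SLOTS 256		/* hypothetical: one slot per 8-bit key */

/* A handler receives the packet bytes and returns a status. */
typedef int (*input_fn)(const unsigned char *pkt, size_t len);

static input_fn input_table[INPUT_SLOTS];

/* Register a handler; fails if the key is invalid or already taken. */
static int
input_register(unsigned key, input_fn fn)
{
	if (key >= INPUT_SLOTS || fn == NULL || input_table[key] != NULL)
		return -1;
	input_table[key] = fn;
	return 0;
}

/* Remove a handler, e.g. when a dynamically loaded domain is unloaded. */
static int
input_unregister(unsigned key)
{
	if (key >= INPUT_SLOTS || input_table[key] == NULL)
		return -1;
	input_table[key] = NULL;
	return 0;
}

/* Look the key up in the table instead of switching on it. */
static int
input_dispatch(unsigned key, const unsigned char *pkt, size_t len)
{
	input_fn fn = key < INPUT_SLOTS ? input_table[key] : NULL;

	return fn != NULL ? fn(pkt, len) : -1;
}

/* Example handler for demonstration: reports the packet length. */
static int
demo_input(const unsigned char *pkt, size_t len)
{
	(void)pkt;
	return (int)len;
}
```

Unlike a switch statement, the table can gain or lose entries at run time, which is what allows a domain to attach its input routine when it is loaded.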
114: # Radical thoughts
116: You should also consider the following ideas:
118: ## LWPs in user space do not need a kernel stack
120: Those pages are only used if an exception happens.
121: Interrupts are probably going to their own dedicated stack. One could just
122: keep a set of kernel stacks around. Each CPU has one; when a user
123: exception happens, that stack is assigned to the current LWP and removed as
124: the active CPU one. When that CPU next returns to user space, the kernel
125: stack it was using is saved to be used for the next user exception. The
126: idle LWP would just use the current kernel stack.
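The hand-off described above can be modelled as a simple ownership transfer between a CPU and an LWP. The sketch below is a toy userland model under that reading; `struct kstack`, `struct sk_cpu`, `struct sk_lwp`, and both functions are hypothetical names, and it ignores blocking, preemption, and the idle LWP.

```c
#include <assert.h>
#include <stddef.h>

struct kstack { int ks_id; };			/* stands in for stack pages */
struct sk_cpu { struct kstack *ci_kstack; };	/* the CPU's spare stack */
struct sk_lwp { struct kstack *l_kstack; };	/* NULL while in user space */

/* User exception: the CPU's spare stack becomes the LWP's kernel stack. */
static void
sk_user_exception(struct sk_cpu *ci, struct sk_lwp *l)
{
	assert(ci->ci_kstack != NULL && l->l_kstack == NULL);
	l->l_kstack = ci->ci_kstack;
	ci->ci_kstack = NULL;
}

/* Return to user space: the stack goes back to the CPU for the next trap. */
static void
sk_return_to_user(struct sk_cpu *ci, struct sk_lwp *l)
{
	assert(ci->ci_kstack == NULL && l->l_kstack != NULL);
	ci->ci_kstack = l->l_kstack;
	l->l_kstack = NULL;
}
```

In this model an LWP running in user space holds no kernel stack at all, so the number of stacks needed tracks the number of LWPs inside the kernel rather than the total number of LWPs.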
128: ## LWPs waiting for kernel condition shouldn't need a kernel stack
130: If an LWP is waiting on a kernel condition variable, it is expecting to be
131: inactive for some time, possibly a long time. During this inactivity, it
132: does not really need a kernel stack.
134: When the exception handler gets a usermode exception, it sets the LWP
135: restartable flag, which indicates that the exception is restartable, and then
136: services the exception as normal. As routines are called, they can clear
137: the LWP restartable flag as needed. When an LWP needs to block for a long
138: time, instead of calling `cv_wait`, it could call `cv_restart`. If
139: `cv_restart` returned false, the LWP's restartable flag was clear, so
140: `cv_restart` acted just like `cv_wait`. Otherwise, the LWP and CV would
141: have been tied together (big hand wave), the lock would have been released, and the
142: routine should return `ERESTART`. `cv_restart` could also wait for
143: a small amount of time, like 0.5 seconds, and take the restart path only if the timeout expires.
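The caller-side contract of the proposed `cv_restart` might be sketched as follows. This is a userland toy model only: `sketch_cv_restart`, the `sk_*` structures, and the flag fields are hypothetical, the "big hand wave" binding is reduced to two pointer assignments, the blocking `cv_wait` path is only indicated by a comment, and the fallback value for `ERESTART` is a placeholder used when the host headers do not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTART
#define ERESTART (-3)	/* placeholder for this sketch only */
#endif

struct sk_lwp;
struct sk_condvar { struct sk_lwp *cv_bound; };	/* one waiter, for brevity */
struct sk_lock { bool lk_held; };
struct sk_lwp {
	bool l_restartable;		/* set on entry from a usermode exception */
	struct sk_condvar *l_boundcv;	/* CV this LWP is tied to, if any */
};

/*
 * Returns true if the LWP was bound to the CV and the lock was released,
 * in which case the caller should unwind with ERESTART.  Returns false if
 * the restartable flag was clear and the call behaved like cv_wait().
 */
static bool
sketch_cv_restart(struct sk_lwp *l, struct sk_condvar *cv, struct sk_lock *lk)
{
	assert(lk->lk_held);
	if (!l->l_restartable) {
		/* A real version would sleep here, keeping the kernel stack. */
		return false;
	}
	l->l_boundcv = cv;
	cv->cv_bound = l;
	lk->lk_held = false;
	return true;
}

/* A routine that must wait for a condition, written against the sketch. */
static int
sketch_wait_for_data(struct sk_lwp *l, struct sk_condvar *cv,
    struct sk_lock *lk)
{
	lk->lk_held = true;
	if (sketch_cv_restart(l, cv, lk))
		return ERESTART;	/* lock already dropped; unwind */
	lk->lk_held = false;		/* woken as from cv_wait(); finish up */
	return 0;
}
```

The point of the return value is that every routine between the exception handler and the wait must be prepared to propagate `ERESTART`, which is why routines that cannot be safely re-run clear the restartable flag first.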
145: As the stack unwinds, eventually, it would return to the last exception
146: handler. The exception handler would see the LWP has a bound CV, save the LWP's
147: user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
148: idle, and call the scheduler to find more work. When called,
149: `cpu_switchto` would notice the stack is marked idle, and detach it from
150: the LWP.
152: When the condition times out or is signalled, the first LWP attached to the
153: condition variable is marked runnable and detached from the CV. When the
154: `cpu_switchto` routine is called, it would notice the lack of a stack
155: so it would grab one, restore the trapframe, and reinvoke the exception