Annotation of wikisrc/projects/project/smp_networking.mdwn, revision 1.3
1.1 jmmv 1: [[!template id=project
3: title="SMP Networking (aka remove the big network lock)"
1.2 jmmv 7: [tech-net](mailto:tech-net@NetBSD.org),
1.1 jmmv 8: [board](mailto:board@NetBSD.org),
1.2 jmmv 12: category="networking"
1.1 jmmv 13: difficulty="hard"
14: funded="The NetBSD Foundation"
1.2 jmmv 17: **WARNING: THIS IS A DRAFT; THE INFORMATION CONTAINED IN THIS PROJECT AND
18: ANY OF THE SUBPROJECTS LINKED BELOW IS SUBJECT TO CHANGE.**
20: Traditionally, the NetBSD kernel code had been protected by a single,
21: global lock. This lock ensured that, on a multiprocessor system, two
22: different threads of execution did not access the kernel concurrently and
23: thus simplified the internal design of the kernel. However, such a design
24: does not scale to multiprocessor machines because, effectively, the kernel
25: is restricted to run on a single processor at any given time.
1.1 jmmv 26:
27: The NetBSD kernel has been modified to use fine-grained locks in many of
28: its different subsystems, achieving good performance on today's
29: multiprocessor machines. Unfortunately, these changes have not yet been
30: applied to the networking code, which remains protected by the single lock.
1.2 jmmv 31: In other words: NetBSD networking has evolved to work in a uniprocessor
32: environment; switching it to use fine-grained locking is a hard and complex task.
1.1 jmmv 34:
1.2 jmmv 35: # Funding
1.1 jmmv 36:
37: At this time, The NetBSD Foundation is accepting project specifications to
38: remove the single networking lock. If you want to apply for this project,
1.2 jmmv 39: please send your proposal to the contact addresses listed above.
1.3 ! jmmv 41: Due to the size of this project, your proposal does not need to cover
! 42: everything to qualify for funding. We have attempted to split the work
! 43: into smaller units, and **you can submit funding applications for these
! 44: smaller subtasks independently** as long as the work you deliver fits in
! 45: the grand order of this project. For example, you could send an
! 46: application to make the network interfaces alone MP-friendly (see the *work
! 47: plan* below).
1.2 jmmv 49: What follows is a particular design proposal, extracted from an
50: [original text](http://www.NetBSD.org/~matt/smpnet.html) written by
51: [Matt Thomas](mailto:matt@NetBSD.org). You may choose to work on this
52: particular proposal or come up with your own.
54: # Tentative specification
56: The future of NetBSD network infrastructure has to efficiently embrace two
57: major design criteria: Symmetric Multi-Processing (SMP) and modularity.
58: Other design considerations include not only supporting but taking
59: advantage of the capability of newer network devices to do packet
60: classification, payload splitting, and even full connection offload.
62: You can divide the network infrastructure into 5 major components:
64: * Interfaces (both real devices and pseudo-devices)
65: * Socket code
66: * Protocols
67: * Routing code
68: * mbuf code.
70: Part of the complexity is that, due to the monolithic nature of the kernel,
71: each layer currently feels free to call any other layer. This makes
72: designing a lock hierarchy difficult and likely to fail.
74: Part of the problem is asynchronous upcalls, which include:
76: * `ifa->ifa_rtrequest` for route changes.
77: * `pr_ctlinput` for interface events.
79: Another source of complexity is the large number of global variables
80: scattered throughout the source files. This makes putting locks around
81: them difficult.
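One way to make such globals lockable is to gather them into a single structure that carries its own lock. The following is only a minimal user-space sketch: `struct udp_state`, its fields, and the helper functions are hypothetical names invented for illustration, and a `pthread` mutex stands in for whatever kernel lock primitive would actually be used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical example: values that used to be separate file-scope
 * globals are collected into one structure with a single lock.
 */
struct udp_state {
	pthread_mutex_t	lock;		/* protects every field below */
	uint64_t	ipackets;	/* datagrams received */
	uint64_t	opackets;	/* datagrams sent */
	int		do_cksum;	/* tunable: checksum on output */
};

static struct udp_state udp_state = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.do_cksum = 1,
};

/* Count one received datagram while holding the structure's lock. */
static void
udp_state_count_input(struct udp_state *us)
{
	pthread_mutex_lock(&us->lock);
	us->ipackets++;
	pthread_mutex_unlock(&us->lock);
}

/* Read the receive counter consistently. */
static uint64_t
udp_state_ipackets(struct udp_state *us)
{
	uint64_t n;

	pthread_mutex_lock(&us->lock);
	n = us->ipackets;
	pthread_mutex_unlock(&us->lock);
	return n;
}
```

With the state in one place, the locking rule becomes "hold `lock` to touch any field", which is much easier to audit than a set of unrelated globals.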
1.3 ! jmmv 83: ## Subtasks
1.2 jmmv 85: The proposed solution presented here includes the following tasks (in no
86: particular order) to achieve the desired goals of SMP support and modularity.
89: [[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking)"]]
1.3 ! jmmv 91: ## Work plan
! 93: Aside from the list of tasks above, the work to be done for this project
! 94: can be achieved by following these steps:
! 96: 1. Move ARP out of the routing table. See the [[nexthop_cache]] project.
! 98: 1. Make the network interfaces MP-friendly; they are among the few
! 99: remaining users of the big kernel lock. This needs to support multiple receive and
! 100: transmit queues to help reduce lock contention. This also includes
! 101: changing more of the common interfaces to do what the `tsec` driver does
! 102: (basically do everything with softints). This also needs to change the
! 103: `*_input` routines to use a table to do dispatch instead of the current
! 104: switch code so that domains can be dynamically loaded.
! 106: 1. Collect global variables in the IP/UDP/TCP protocols into structures.
! 107: This helps the following items.
! 109: 1. Make IPV4/ICMP/IGMP/REASS MP-friendly.
! 111: 1. Make IPV6/ICMP/IGMP/ND MP-friendly.
! 113: 1. Make TCP MP-friendly.
! 115: 1. Make UDP MP-friendly.
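The table-based dispatch mentioned in the interface step could look roughly like the sketch below. It is not taken from the NetBSD sources: `struct pkt` is a stand-in for an mbuf, and `input_table`, `input_register`, `input_unregister`, `input_dispatch`, and `AF_SLOTS` are hypothetical names. A real implementation would also need to synchronize table updates against concurrent dispatch, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define AF_SLOTS 8			/* hypothetical table size */

struct pkt {				/* stand-in for an mbuf */
	int	af;			/* address family of the payload */
	size_t	len;
};

typedef int (*input_fn)(struct pkt *);

/* One slot per address family; NULL means "no domain loaded". */
static input_fn input_table[AF_SLOTS];

/* Called when a domain is loaded. Fails if the slot is taken. */
static int
input_register(int af, input_fn fn)
{
	if (af < 0 || af >= AF_SLOTS || fn == NULL || input_table[af] != NULL)
		return -1;
	input_table[af] = fn;
	return 0;
}

/* Called when a domain is unloaded. */
static int
input_unregister(int af)
{
	if (af < 0 || af >= AF_SLOTS || input_table[af] == NULL)
		return -1;
	input_table[af] = NULL;
	return 0;
}

/* Replaces a switch on the address family with a table lookup. */
static int
input_dispatch(struct pkt *p)
{
	if (p->af < 0 || p->af >= AF_SLOTS || input_table[p->af] == NULL)
		return -1;		/* no handler: caller drops the packet */
	return input_table[p->af](p);
}

/* Demo handler: just reports the packet length. */
static int
demo_inet_input(struct pkt *p)
{
	return (int)p->len;
}
```

Because handlers are looked up at run time, a domain can add or remove its entry when it is loaded or unloaded, without the interface code having to name it in a switch statement.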
1.2 jmmv 117: # Radical thoughts
119: You should also consider the following ideas:
121: ## LWPs in user space do not need a kernel stack
123: Those pages are only being used in case an exception happens.
124: Interrupts are probably going to their own dedicated stack. One could just
125: keep a set of kernel stacks around. Each CPU has one; when a user
126: exception happens, that stack is assigned to the current LWP and removed as
127: the active CPU one. When that CPU next returns to user space, the kernel
128: stack it was using is saved to be used for the next user exception. The
129: idle LWP would just use the current kernel stack.
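The hand-off described above can be sketched as follows. This is a simplified illustration only: `struct kstack`, the `l_stack` and `ci_spare` fields, and both functions are hypothetical, and the sketch assumes the LWP returns to user space on the same CPU without blocking in between.

```c
#include <assert.h>
#include <stddef.h>

struct kstack {
	int	id;			/* stand-in for the stack pages */
};

struct lwp {
	struct kstack	*l_stack;	/* NULL while running in user space */
};

struct cpu {
	struct kstack	*ci_spare;	/* stack kept for the next exception */
};

/* User exception: the CPU's spare stack is assigned to the current LWP. */
static void
exception_enter(struct cpu *ci, struct lwp *l)
{
	assert(l->l_stack == NULL && ci->ci_spare != NULL);
	l->l_stack = ci->ci_spare;
	ci->ci_spare = NULL;
}

/* Return to user space: the stack becomes the CPU's spare again. */
static void
exception_return(struct cpu *ci, struct lwp *l)
{
	assert(l->l_stack != NULL && ci->ci_spare == NULL);
	ci->ci_spare = l->l_stack;
	l->l_stack = NULL;
}
```

An LWP that blocks inside the kernel would keep its stack, so a fuller version would need to refill `ci_spare` from the shared set of stacks before the CPU runs another LWP.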
131: ## LWPs waiting for kernel condition shouldn't need a kernel stack
133: If an LWP is waiting on a kernel condition variable, it is expecting to be
134: inactive for some time, possibly a long time. During this inactivity, it
135: does not really need a kernel stack.
137: When the exception handler gets a usermode exception, it sets the LWP
138: restartable flag that indicates that the exception is restartable, and then
139: services the exception as normal. As routines are called, they can clear
140: the LWP restartable flag as needed. When an LWP needs to block for a long
141: time, instead of calling `cv_wait`, it could call `cv_restart`. If
142: `cv_restart` returned false, the LWP's restartable flag was clear so
143: `cv_restart` acted just like `cv_wait`. Otherwise, the LWP and CV would
144: have been tied together (big hand wave), the lock would have been released and the
145: routine should return `ERESTART`. `cv_restart` could also wait for
146: a small amount of time, like .5 second, and take the restart path only if that timeout expires.
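The intended contract of `cv_restart` can be illustrated with a small user-space model. `cv_restart` does not exist in NetBSD; the structures, field names, and `demo_syscall` below are hypothetical, the actual blocking and lock release are reduced to comments, and the `ERESTART` value is only a placeholder for platforms whose `<errno.h>` does not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTART
#define ERESTART (-3)			/* placeholder value for this sketch */
#endif

struct lwp;

struct condvar {
	struct lwp	*cv_bound;	/* LWP parked here without a stack */
};

struct lwp {
	bool		l_restartable;	/* set on entry from user mode */
	struct condvar	*l_cv;		/* CV this LWP is tied to, if any */
};

/*
 * Returns true if the LWP was tied to the CV and the caller must unwind
 * with ERESTART; returns false if the caller waited as with cv_wait.
 */
static bool
cv_restart(struct lwp *l, struct condvar *cv)
{
	if (!l->l_restartable) {
		/* A real implementation would block here like cv_wait. */
		return false;
	}
	l->l_cv = cv;
	cv->cv_bound = l;
	/* A real implementation would release the interlock here. */
	return true;
}

/* A routine that may need to sleep for a long time. */
static int
demo_syscall(struct lwp *l, struct condvar *cv, bool has_side_effects)
{
	if (has_side_effects)
		l->l_restartable = false;	/* cannot be replayed safely */
	if (cv_restart(l, cv))
		return ERESTART;		/* unwind; stack can be freed */
	return 0;				/* woke up normally */
}
```

The key design point is that a routine clears the restartable flag as soon as it has done something that cannot be repeated, which forces the ordinary `cv_wait` behavior from then on.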
148: As the stack unwinds, eventually, it would return to the exception
149: handler. The exception handler would see the LWP has a bound CV, save the LWP's
150: user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
151: idle, and call the scheduler to find more work. When called,
152: `cpu_switchto` would notice the stack is marked idle, and detach it from
153: the LWP.
155: When the condition times out or is signalled, the first LWP attached to the
156: condition variable is marked runnable and detached from the CV. When the
157: `cpu_switchto` routine is called, it would notice the lack of a stack
158: so it would grab one, restore the trapframe, and reinvoke the exception handler.
1.1 jmmv 160: """