Annotation of wikisrc/projects/project/smp_networking.mdwn, revision 1.3

1.1       jmmv        1: [[!template id=project
                      3: title="SMP Networking (aka remove the big network lock)"
                      5: contact="""
                      6: [tech-kern](,
1.2       jmmv        7: [tech-net](,
1.1       jmmv        8: [board](,
                      9: [core](
                     10: """
1.2       jmmv       12: category="networking"
1.1       jmmv       13: difficulty="hard"
                     14: funded="The NetBSD Foundation"
                     16: description="""
                     20: Traditionally, the NetBSD kernel code had been protected by a single,
                     21: global lock.  This lock ensured that, on a multiprocessor system, two
                     22: different threads of execution did not access the kernel concurrently and
                     23: thus simplified the internal design of the kernel.  However, such a design
                     24: does not scale to multiprocessor machines because, effectively, the kernel
                     25: is restricted to run on a single processor at any given time.
1.1       jmmv       26: 
                     27: The NetBSD kernel has been modified to use fine-grained locks in many of
                     28: its different subsystems, achieving good performance on today's
                     29: multiprocessor machines.  Unfortunately, these changes have not yet been
                     30: applied to the networking code, which remains protected by the single lock.
1.2       jmmv       31: In other words: NetBSD networking has evolved to work in a uniprocessor
                     32: environment; switching it to use fine-grained locking is a hard and complex
                     33: problem.
1.1       jmmv       34: 
1.2       jmmv       35: # Funding
1.1       jmmv       36: 
                     37: At this time, The NetBSD Foundation is accepting project specifications to
                     38: remove the single networking lock.  If you want to apply for this project,
1.2       jmmv       39: please send your proposal to the contact addresses listed above.
1.3     ! jmmv       41: Due to the size of this project, your proposal does not need to cover
        !            42: everything to qualify for funding.  We have attempted to split the work
        !            43: into smaller units, and **you can submit funding applications for these
        !            44: smaller subtasks independently** as long as the work you deliver fits in
        !            45: the grand order of this project.  For example, you could send an
        !            46: application to make the network interfaces alone MP-friendly (see the *work
        !            47: plan* below).
        !            48: 
1.2       jmmv       49: What follows is a particular design proposal, extracted from an
                     50: [original text]( written by
                     51: [Matt Thomas](  You may choose to work on this
                     52: particular proposal or come up with your own.
                     54: # Tentative specification
                     56: The future of NetBSD network infrastructure has to efficiently embrace two
                     57: major design criteria: Symmetric Multi-Processing (SMP) and modularity.
                     58: Other design considerations include not only supporting but taking
                     59: advantage of the capability of newer network devices to do packet
                     60: classification, payload splitting, and even full connection offload.
                     62: You can divide the network infrastructure into 5 major components:
                     64: * Interfaces (both real devices and pseudo-devices)
                     65: * Socket code
                     66: * Protocols
                     67: * Routing code
                     68: * mbuf code.
                     70: Part of the complexity is that, due to the monolithic nature of the kernel,
                     71: each layer currently feels free to call any other layer.  This makes
                     72: designing a lock hierarchy difficult and likely to fail.
                     74: Part of the problem is asynchronous upcalls, which include:
                     76: * `ifa->ifa_rtrequest` for route changes.
                     77: * `pr_ctlinput` for interface events.
                     79: Another source of complexity is the large number of global variables
                     80: scattered throughout the source files.  This makes putting locks around
                     81: them difficult.
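
As a rough illustration of why scattered globals are awkward to lock, the sketch below groups a few counters into one structure guarded by a single lock. Everything in it is hypothetical: the structure, its fields, and the helper names are invented for this example, and a userland `pthread_mutex_t` stands in for whatever kernel lock primitive would really be used.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical example: counters that might otherwise be separate
 * global variables, gathered into one structure with one lock.
 */
struct ipstat_model {
	pthread_mutex_t	lock;		/* stand-in for a kernel mutex */
	uint64_t	packets_in;	/* packets seen */
	uint64_t	packets_dropped;/* packets discarded */
};

static struct ipstat_model ip_state = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Account for one received packet; 'dropped' is nonzero if discarded. */
static void
ip_count_input(struct ipstat_model *st, int dropped)
{
	pthread_mutex_lock(&st->lock);
	st->packets_in++;
	if (dropped)
		st->packets_dropped++;
	pthread_mutex_unlock(&st->lock);
}

/* Read the received-packet counter under the lock. */
static uint64_t
ip_packets_in(struct ipstat_model *st)
{
	uint64_t v;

	pthread_mutex_lock(&st->lock);
	v = st->packets_in;
	pthread_mutex_unlock(&st->lock);
	return v;
}

/* Read the dropped-packet counter under the lock. */
static uint64_t
ip_packets_dropped(struct ipstat_model *st)
{
	uint64_t v;

	pthread_mutex_lock(&st->lock);
	v = st->packets_dropped;
	pthread_mutex_unlock(&st->lock);
	return v;
}
```

With the state in one place, the lock that protects it is obvious from the structure definition, which is much harder to arrange when the variables live in different source files.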
1.3     ! jmmv       83: ## Subtasks
        !            84: 
1.2       jmmv       85: The proposed solution presented here includes the following tasks (in no
                     86: particular order) to achieve the desired goals of SMP support and
                     87: modularity:
                     89: [[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking)"]]
1.3     ! jmmv       91: ## Work plan
        !            92: 
        !            93: Aside from the list of tasks above, the work to be done for this project
        !            94: can be achieved by following these steps:
        !            95: 
        !            96: 1. Move ARP out of the routing table.  See the [[nexthop_cache]] project.
        !            97: 
        !            98: 1. Make the network interfaces MP-safe; they are among the few remaining
        !            99:    users of the big kernel lock.  This needs to support multiple receive and
        !           100:    transmit queues to help reduce locking contention.  This also includes
        !           101:    changing more of the common interfaces to do what the `tsec` driver does
        !           102:    (basically do everything with softints).  This also needs to change the
        !           103:    `*_input` routines to use a table to do dispatch instead of the current
        !           104:    switch code so domains can be dynamically loaded.
        !           105: 
        !           106: 1. Collect global variables in the IP/UDP/TCP protocols into structures.
        !           107:    This helps the following items.
        !           108: 
        !           109: 1. Make IPV4/ICMP/IGMP/REASS MP-friendly.
        !           110: 
        !           111: 1. Make IPV6/ICMP/IGMP/ND MP-friendly.
        !           112: 
        !           113: 1. Make TCP MP-friendly.
        !           114: 
        !           115: 1. Make UDP MP-friendly.
        !           116: 
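
The table-driven dispatch mentioned for the `*_input` routines could look roughly like the following. This is a minimal userland sketch, not the kernel's actual interface: the table size, the `input_fn` type, and the register/unregister/dispatch helpers are all assumptions made for illustration. A real MP-safe version would also need to synchronize access to the table itself.

```c
#include <stddef.h>

#define AF_MODEL_MAX	8	/* hypothetical number of table slots */

/* Hypothetical handler type: returns 0 or more on success. */
typedef int (*input_fn)(const void *pkt, size_t len);

/* One slot per address family; NULL means no domain is loaded. */
static input_fn input_table[AF_MODEL_MAX];

/* Called when a domain is loaded.  Fails if the slot is taken. */
static int
input_register(unsigned family, input_fn fn)
{
	if (family >= AF_MODEL_MAX || fn == NULL ||
	    input_table[family] != NULL)
		return -1;
	input_table[family] = fn;
	return 0;
}

/* Called when a domain is unloaded. */
static int
input_unregister(unsigned family)
{
	if (family >= AF_MODEL_MAX || input_table[family] == NULL)
		return -1;
	input_table[family] = NULL;
	return 0;
}

/* Replaces a switch on the family: look up the handler and call it. */
static int
input_dispatch(unsigned family, const void *pkt, size_t len)
{
	if (family >= AF_MODEL_MAX || input_table[family] == NULL)
		return -1;	/* no handler loaded: drop the packet */
	return input_table[family](pkt, len);
}

/* Example handler used only to exercise the table. */
static int
demo_input(const void *pkt, size_t len)
{
	(void)pkt;
	return (int)len;
}
```

Unlike a switch statement, the set of handled families is not fixed at compile time, which is what allows a domain to be added or removed at run time.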
1.2       jmmv      117: # Radical thoughts
                    119: You should also consider the following ideas:
                    121: ## LWPs in user space do not need a kernel stack
                    123: These pages are only used when an exception happens.
                    124: Interrupts are probably going to their own dedicated stack.  One could just
                    125: keep a set of kernel stacks around.  Each CPU has one; when a user
                    126: exception happens, that stack is assigned to the current LWP and removed as
                    127: the CPU's active one.  When that CPU next returns to user space, the kernel
                    128: stack it was using is saved to be used for the next user exception.  The
                    129: idle LWP would just use the current kernel stack.
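
The stack hand-off described above can be modelled in a few lines. The sketch below is only a toy model under simplifying assumptions: the structures and function names are invented, each CPU holds at most one spare stack, and the case where the CPU has no spare stack (which would have to draw on the wider set of stacks) is simply reported as an error.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel stack. */
struct kstack {
	int	id;
};

/* An LWP owns a stack only while it is inside the kernel. */
struct lwp_model {
	struct kstack	*stack;		/* NULL while in user space */
};

/* Each CPU keeps one stack ready for the next user exception. */
struct cpu_model {
	struct kstack	*spare;
};

/* User exception: the CPU's spare stack is handed to the LWP. */
static int
exception_enter(struct cpu_model *cpu, struct lwp_model *l)
{
	if (cpu->spare == NULL || l->stack != NULL)
		return -1;	/* a fuller model would use the stack pool */
	l->stack = cpu->spare;
	cpu->spare = NULL;
	return 0;
}

/* Return to user space: the stack is saved for the next exception. */
static int
exception_return(struct cpu_model *cpu, struct lwp_model *l)
{
	if (l->stack == NULL || cpu->spare != NULL)
		return -1;
	cpu->spare = l->stack;
	l->stack = NULL;
	return 0;
}
```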
                    131: ## LWPs waiting for kernel condition shouldn't need a kernel stack
                    133: If an LWP is waiting on a kernel condition variable, it is expecting to be
                    134: inactive for some time, possibly a long time.  During this inactivity, it
                    135: does not really need a kernel stack.
                    137: When the exception handler gets a usermode exception, it sets the LWP
                    138: restartable flag to indicate that the exception is restartable, and then
                    139: services the exception as normal.  As routines are called, they can clear
                    140: the LWP restartable flag as needed.  When an LWP needs to block for a long
                    141: time, instead of calling `cv_wait`, it could call `cv_restart`.  If
                    142: `cv_restart` returned false, the LWP's restartable flag was clear so
                    143: `cv_restart` acted just like `cv_wait`.  Otherwise, the LWP and CV would
                    144: have been tied together (big hand wave), the lock would have been released,
                    145: and the routine should return `ERESTART`.  `cv_restart` could also wait for
                    146: a small amount of time like .5 second, and take the restart path only if
                    146: that timeout expires.
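
The calling convention sketched above might be used as follows. This is a toy model of an interface that does not exist: `cv_restart_model`, the structures, and the `ERESTART_MODEL` placeholder value are all invented here, and the blocking path is only indicated by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

#define ERESTART_MODEL	(-3)	/* placeholder, not the real errno value */

/* Hypothetical condition variable that tracks bound LWPs. */
struct cv_model {
	int	bound;			/* number of LWPs tied to this CV */
};

/* Hypothetical per-LWP restart state. */
struct restart_lwp {
	bool		 restartable;	/* set on entry from user mode */
	struct cv_model	*bound_cv;	/* CV the LWP is tied to, if any */
};

/*
 * Returns false if the LWP is not restartable, in which case a real
 * implementation would have blocked exactly like cv_wait.  Returns true
 * if the LWP was tied to the CV and the caller must unwind.
 */
static bool
cv_restart_model(struct restart_lwp *l, struct cv_model *cv)
{
	if (!l->restartable)
		return false;	/* would behave like cv_wait here */
	l->bound_cv = cv;
	cv->bound++;
	return true;
}

/* A caller that needs to block for a long time. */
static int
wait_for_event_model(struct restart_lwp *l, struct cv_model *cv)
{
	if (cv_restart_model(l, cv))
		return ERESTART_MODEL;	/* unwind to the exception handler */
	return 0;			/* woke up normally */
}
```

Any routine on the call path that cannot safely be re-executed would clear `restartable` before reaching this point, forcing the ordinary blocking behaviour.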
                    148: As the stack unwinds, eventually, it would return to the last exception
                    149: handler.  The exception handler would see the LWP has a bound CV, save the
                    150: LWP's user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
                    151: idle, and call the scheduler to find more work.  When called,
                    152: `cpu_switchto` would notice the stack is marked idle, and detach it from
                    153: the LWP.
                    155: When the condition times out or is signalled, the first LWP attached to the
                    156: condition variable is marked runnable and detached from the CV.  When the
                    157: `cpu_switchto` routine is called, it would notice the lack of a stack
                    158: so it would grab one, restore the trapframe, and reinvoke the exception
                    159: handler.
1.1       jmmv      160: """
                    161: ]]
