File:  [NetBSD Developer Wiki] / wikisrc / projects / project / smp_networking.mdwn
Revision 1.6
Thu Feb 14 22:22:53 2013 UTC (5 years, 7 months ago) by wiki
Branches: MAIN
CVS tags: HEAD
web commit by spz

    1: [[!template id=project
    3: title="SMP Networking (aka remove the big network lock)"
    5: contact="""
    6: [tech-kern](,
    7: [tech-net](,
    8: [board](,
    9: [core](
   10: """
   12: category="networking"
   13: difficulty="hard"
   15: description="""
   16: Traditionally, the NetBSD kernel code had been protected by a single,
   17: global lock.  This lock ensured that, on a multiprocessor system, two
   18: different threads of execution did not access the kernel concurrently and
   19: thus simplified the internal design of the kernel.  However, such a design
   20: does not scale to multiprocessor machines because, effectively, the kernel
   21: is restricted to run on a single processor at any given time.
   23: The NetBSD kernel has been modified to use fine grained locks in many of
   24: its different subsystems, achieving good performance on today's
   25: multiprocessor machines.  Unfortunately, these changes have not yet been
   26: applied to the networking code, which remains protected by the single lock.
   27: In other words: NetBSD networking has evolved to work in a uniprocessor
   28: environment; switching it to use fine-grained locking is a hard and complex
   29: problem.
   31: # This project is currently claimed
   33: # Funding
   35: At this time, The NetBSD Foundation is accepting project specifications to
   36: remove the single networking lock.  If you want to apply for this project,
   37: please send your proposal to the contact addresses listed above.
   39: Due to the size of this project, your proposal does not need to cover
   40: everything to qualify for funding.  We have attempted to split the work
   41: into smaller units, and **you can submit funding applications for these
   42: smaller subtasks independently** as long as the work you deliver fits in
   43: the grand order of this project.  For example, you could send an
   44: application to make the network interfaces alone MP-friendly (see the *work
   45: plan* below).
   47: What follows is a particular design proposal, extracted from an
   48: [original text]( written by
   49: [Matt Thomas](  You may choose to work on this
   50: particular proposal or come up with your own.
   52: # Tentative specification
   54: The future of NetBSD network infrastructure has to efficiently embrace two
   55: major design criteria: Symmetric Multi-Processing (SMP) and modularity.
   56: Other design considerations include not only supporting but taking
   57: advantage of the capability of newer network devices to do packet
   58: classification, payload splitting, and even full connection offload.
   60: You can divide the network infrastructure into 5 major components:
   62: * Interfaces (both real devices and pseudo-devices)
   63: * Socket code
   64: * Protocols
   65: * Routing code
   66: * mbuf code.
   68: Part of the complexity is that, due to the monolithic nature of the kernel,
   69: each layer currently feels free to call any other layer.  This makes
   70: designing a lock hierarchy difficult and likely to fail.
   72: Part of the problem is the asynchronous upcalls, which include:
   74: * `ifa->ifa_rtrequest` for route changes.
   75: * `pr_ctlinput` for interface events.
   77: Another source of complexity is the large number of global variables
   78: scattered throughout the source files.  This makes putting locks around
   79: them difficult.
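
One way to make such globals lockable is to gather related ones into a single structure that carries its own lock. The following is a minimal userspace sketch of that idea, not kernel code: a `pthread` mutex stands in for a kernel mutex, and every name in it (`ip_state`, `ip_state_note_packet`, `ip_state_bytes_in`, the fields, and the TTL value) is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical grouping of what used to be scattered globals. */
struct ip_state {
	pthread_mutex_t lock;	/* stands in for a kernel mutex */
	uint64_t packets_in;	/* formerly a bare global counter */
	uint64_t bytes_in;	/* formerly a bare global counter */
	int default_ttl;	/* formerly a bare global tunable */
};

static struct ip_state ip_state = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.default_ttl = 64,	/* illustrative value only */
};

/* Update both counters under the one lock that now covers them. */
static void
ip_state_note_packet(struct ip_state *s, uint64_t len)
{
	pthread_mutex_lock(&s->lock);
	s->packets_in++;
	s->bytes_in += len;
	pthread_mutex_unlock(&s->lock);
}

/* Read a counter under the same lock. */
static uint64_t
ip_state_bytes_in(struct ip_state *s)
{
	uint64_t v;

	pthread_mutex_lock(&s->lock);
	v = s->bytes_in;
	pthread_mutex_unlock(&s->lock);
	return v;
}
```

With the variables in one place, the question of which lock protects which data has a visible answer in the structure definition, which is the property the scattered globals lack.
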
   81: ## Subtasks
   83: The proposed solution presented here includes the following tasks (in no
   84: particular order) to achieve the desired goals of SMP support and
   85: modularity:
   87: [[!map show="title" pages="projects/project/* and tagged(project) and tagged(smp_networking) and tagged(status:active)"]]
   89: ## Work plan
   91: Aside from the list of tasks above, the work to be done for this project
   92: can be achieved by following these steps:
   94: 1. Move ARP out of the routing table.  See the [[nexthop_cache]] project.
   96: 1. Make the network interfaces MP-friendly; they are one of the few users
   97:    of the big kernel lock left.  The interfaces need to support multiple
   98:    receive and transmit queues to help reduce lock contention.  This also
   99:    includes changing more of the common interfaces to do what the `tsec`
  100:    driver does (basically, do everything with softints).  The `*_input`
  101:    routines also need to use a table for dispatch instead of the current
  102:    switch code, so that domains can be dynamically loaded.
  104: 1. Collect global variables in the IP/UDP/TCP protocols into structures.
  105:    This helps the following items.
  107: 1. Make IPV4/ICMP/IGMP/REASS MP-friendly.
  109: 1. Make IPV6/ICMP/IGMP/ND MP-friendly.
  111: 1. Make TCP MP-friendly.
  113: 1. Make UDP MP-friendly.
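
The table-based dispatch mentioned for the `*_input` routines in the work plan could look roughly like the sketch below. It is a userspace illustration under stated assumptions: `struct packet`, `input_fn`, `input_table`, `input_register`, `input_unregister`, `input_dispatch`, `demo_inet_input` and the table size are all hypothetical names, not existing NetBSD interfaces, and the sketch deliberately leaves out the synchronization a real table would need.

```c
#include <assert.h>
#include <stddef.h>

#define AF_SLOTS 8	/* hypothetical table size for this sketch */

struct packet {
	int family;	/* index into the dispatch table */
	size_t len;
};

typedef int (*input_fn)(struct packet *);

/* One slot per family; NULL means no domain is loaded for it. */
static input_fn input_table[AF_SLOTS];

/* Called when a domain is loaded.  Returns 0 on success, -1 on error. */
static int
input_register(int family, input_fn fn)
{
	if (family < 0 || family >= AF_SLOTS || input_table[family] != NULL)
		return -1;
	input_table[family] = fn;
	return 0;
}

/* Called when a domain is unloaded.  Returns 0 on success, -1 on error. */
static int
input_unregister(int family)
{
	if (family < 0 || family >= AF_SLOTS || input_table[family] == NULL)
		return -1;
	input_table[family] = NULL;
	return 0;
}

/* Replaces a switch on the family: look up the handler and call it.
 * Returns -1 when no handler is registered (the packet would be dropped). */
static int
input_dispatch(struct packet *p)
{
	if (p->family < 0 || p->family >= AF_SLOTS ||
	    input_table[p->family] == NULL)
		return -1;
	return input_table[p->family](p);
}

/* Stand-in handler that just reports the packet length. */
static int
demo_inet_input(struct packet *p)
{
	return (int)p->len;
}
```

Because handlers are added and removed at run time, a domain can register itself when it is loaded rather than being compiled into a switch statement.
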
  115: # Radical thoughts
  117: You should also consider the following ideas:
  119: ## LWPs in user space do not need a kernel stack
  121: Those pages are only being used in case an exception happens.
  122: Interrupts are probably going to their own dedicated stack.  One could just
  123: keep a set of kernel stacks around.  Each CPU has one; when a user
  124: exception happens, that stack is assigned to the current LWP and removed as
  125: the active CPU one.  When that CPU next returns to user space, the kernel
  126: stack it was using is saved to be used for the next user exception.  The
  127: idle LWP would just use the current kernel stack.
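
The stack handoff described above can be modelled in a few lines. This is only a userspace sketch of the bookkeeping, with no real stacks or traps involved; `struct kstack`, `struct cpu`, `struct lwp`, the pool helpers, `exception_enter` and `exception_return` are hypothetical names, and the fallback to a shared pool when a CPU has no spare stack is an assumption about how "a set of kernel stacks" might be used.

```c
#include <assert.h>
#include <stddef.h>

#define NSTACKS 4	/* hypothetical pool size */

struct kstack { int id; };
struct lwp { struct kstack *kstack; };	/* NULL while in user space */
struct cpu { struct kstack *spare; };	/* ready for the next exception */

static struct kstack stack_storage[NSTACKS];
static struct kstack *free_stacks[NSTACKS];
static int nfree;

static void
pool_init(void)
{
	for (int i = 0; i < NSTACKS; i++) {
		stack_storage[i].id = i;
		free_stacks[i] = &stack_storage[i];
	}
	nfree = NSTACKS;
}

static struct kstack *
pool_get(void)
{
	return nfree > 0 ? free_stacks[--nfree] : NULL;
}

static void
pool_put(struct kstack *s)
{
	free_stacks[nfree++] = s;
}

/* User exception: the CPU's spare stack is assigned to the current LWP.
 * Returns 0 on success, -1 if no stack is available. */
static int
exception_enter(struct cpu *c, struct lwp *l)
{
	if (c->spare == NULL)
		c->spare = pool_get();	/* assumed fallback to the pool */
	if (c->spare == NULL)
		return -1;
	l->kstack = c->spare;
	c->spare = NULL;
	return 0;
}

/* Return to user space: the stack is saved for the next user exception. */
static void
exception_return(struct cpu *c, struct lwp *l)
{
	if (c->spare == NULL)
		c->spare = l->kstack;
	else
		pool_put(l->kstack);
	l->kstack = NULL;
}
```

In this model an LWP running in user space holds no kernel stack at all, and two LWPs taking exceptions one after the other on the same CPU reuse the same stack without touching the pool.
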
  129: ## LWPs waiting for a kernel condition shouldn't need a kernel stack
  131: If an LWP is waiting on a kernel condition variable, it is expecting to be
  132: inactive for some time, possibly a long time.  During this inactivity, it
  133: does not really need a kernel stack.
  135: When the exception handler gets a usermode exception, it sets the LWP
  136: restartable flag, which indicates that the exception is restartable, and
  137: then services the exception as normal.  As routines are called, they can
  138: clear the LWP restartable flag as needed.  When an LWP needs to block for a
  139: long time, instead of calling `cv_wait`, it could call `cv_restart`.  If
  140: `cv_restart` returns false, the LWP's restartable flag was clear, so
  141: `cv_restart` acted just like `cv_wait`.  Otherwise, the LWP and CV have
  142: been tied together (big hand wave), the lock has been released, and the
  143: routine should return `ERESTART`.  `cv_restart` could also wait for a short
  144: time first, such as 0.5 seconds, and take the restart path only if that timeout expires.
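
The calling convention proposed for `cv_restart` might be sketched as follows. This is a userspace model of the control flow only: nothing here actually blocks or releases a lock, the `_sketch` suffixes mark every function as hypothetical, `SKETCH_ERESTART` is a placeholder value standing in for the kernel's `ERESTART`, and the CV holds a single waiter where a real one would hold a queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ERESTART 1000	/* placeholder, not the kernel's value */

struct lwp;

struct condvar {
	struct lwp *waiter;	/* single waiter, for simplicity */
};

struct lwp {
	bool restartable;	/* set by the exception handler */
	struct condvar *bound_cv;	/* CV this LWP is tied to, if any */
};

/* Counts how often the ordinary blocking path was taken. */
static int blocking_waits;

/* Stand-in for cv_wait: records the call instead of blocking. */
static void
cv_wait_sketch(struct condvar *cv)
{
	(void)cv;
	blocking_waits++;
}

/* Returns false if it behaved like cv_wait, true if the LWP was bound
 * to the CV and the caller must unwind with a restart error. */
static bool
cv_restart_sketch(struct lwp *l, struct condvar *cv)
{
	if (!l->restartable) {
		cv_wait_sketch(cv);
		return false;
	}
	l->bound_cv = cv;	/* the "big hand wave" */
	cv->waiter = l;
	return true;
}

/* Hypothetical routine that needs to block for a long time. */
static int
long_wait_sketch(struct lwp *l, struct condvar *cv)
{
	if (cv_restart_sketch(l, cv))
		return SKETCH_ERESTART;	/* unwind toward the handler */
	return 0;	/* woke up with the stack intact */
}
```

The point of the boolean result is that a caller does not need to know in advance whether the exception is still restartable; it only has to propagate the restart error when told to.
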
  146: As the stack unwinds, it would eventually return to the exception
  147: handler.  The handler would see that the LWP has a bound CV, save the LWP's
  148: user state into the PCB, set the LWP to sleeping, mark the LWP's stack as
  149: idle, and call the scheduler to find more work.  When called,
  150: `cpu_switchto` would notice the stack is marked idle, and detach it from
  151: the LWP.
  153: When the condition times out or is signalled, the first LWP attached to the
  154: condition variable is marked runnable and detached from the CV.  When the
  155: `cpu_switchto` routine is called, it would notice the lack of a stack
  156: so it would grab one, restore the trapframe, and reinvoke the exception
  157: handler.
  158: """
  159: ]]
