File:  [NetBSD Developer Wiki] / wikisrc / projects / project / atomic_radix_patricia_trees.mdwn
Revision 1.1: download - view: text, annotated - select for diffs
Thu Nov 10 03:06:51 2011 UTC (2 years, 5 months ago) by jmmv
Branches: MAIN
CVS tags: HEAD
Add a specific proposal for the SMP networking project.

This proposal is built on top of several individual, smaller projects, all
of which are related to achieve the goals of SMP support and modularity on
the network stack.  Keep in mind that this is just that: a proposal.
Applicants could still come up with their own ideas.

The text of all these new pages is mostly a copy/paste of the original
document written by matt@ (see http://www.netbsd.org/~matt/smpnet.html).
I have done some minor edits (hopefully not changing any of the technical
details) and added some preliminary texts to the pages.  (I was unable to
parse some of the sentences though, so they remain "as is"...)

[[!template id=project

title="Lockless, atomic and generic Radix/Patricia trees"

contact="""
[tech-kern](mailto:tech-kern@NetBSD.org),
[board](mailto:board@NetBSD.org),
[core](mailto:core@NetBSD.org)
"""

category="kernel"
difficulty="hard"
funded="The NetBSD Foundation"

description="""
This project proposal is a subtask of [[smp_networking]] and is elegible
for funding independently.

The goal of this project is to implement lockless, atomic and generic Radix
and Patricia trees.  BSD systems have always used a radix tree for their
routing tables.  However, the radix tree implementation is showing its age.
Its lack of flexibility (it is only suitable for use in a routing table)
and overhead of use (requires memory allocation/deallocation for insertions
and removals) make replacing it with something better tuned to today's
processors a necessity.

Since a radix tree branches on bit differences, finding these bit
differences efficiently is crucial to the speed of tree operations.  This
is most quickly done by XORing the key and the tree node's value together
and then counting the number of leading zeroes in the result of the XOR.
Many processors today (ARM, PowerPC) have instructions that can count the
number of leading zeroes in a 32 bit word (and even a 64 bit word).  Even
those that do not can use a simple constant time routine to count them:

    int
    clz(unsigned int bits)
    {
        int zeroes = 0;
        if (bits == 0)
            return 32;
        if (bits & 0xffff0000) bits &= 0xffff0000; else zeroes += 16;
        if (bits & 0xff00ff00) bits &= 0xff00ff00; else zeroes += 8;
        if (bits & 0xf0f0f0f0) bits &= 0xf0f0f0f0; else zeroes += 4;
        if (bits & 0xcccccccc) bits &= 0xcccccccc; else zeroes += 2;
        if (bits & 0xaaaaaaaa) bits &= 0xaaaaaaaa; else zeroes += 1;
        return zeroes;
    }

The existing BSD radix tree implementation does not use this method but
instead uses a far more expensive method of comparision.  Adapting the
existing implementation to do the above is actually more expensive than
writing a new implementation.

The primary requirements for the new radix tree are:

* Be self-contained.  It cannot require additional memory other than what
  is used in its data structures.

* Be generic.  A radix tree has uses outside networking.

To make the radix tree flexible, all knowledge of how keys are represented
has to be encapsulated into a `pt_tree_ops_t` structure with these
functions:

* `bool ptto_matchnode(const void *foo, const void *bar, pt_bitoff_t
  max_bitoff, pt_bitoff_t *bitoffp, pt_slot_t *slotp);`

  Returns true if both `foo` and `bar` objects have the identical string of
  bits starting at `*bitoffp` and ending before `max_bitoff`.  In addition
  to returning true, `*bitoffp` should be set to the smaller of
  `max_bitoff` or the length, in bits, of the compared bit strings.  Any
  bits before `*bitoffp` are to be ignored.  If the string of bits are not
  identical, `*bitoffp` is set to the where the bit difference occured,
  `*slotp` is the value of that bit in `foo`, and false is returned.  The
  `foo` and `bar` (if not `NULL`) arguments are pointers to a key member
  inside a tree object.  If bar is `NULL`, then assume it points to a key
  consisting of entirely of zero bits.

* `bool ptto_matchkey(const void *key, const void *node_key,
  pt_bitoff_t bitoff, pt_bitlen_t bitlen);`

  Returns true if both `key` and `node_key` objects have identical strings
  of `bitlen` bits starting at `bitoff`.  The `key` argument is the same
  key argument supplied to `ptree_find_filtered_node`.

* `pt_slot_t ptto_testnode(const void *node_key, pt_bitoff_t bitoff,
  pt_bitlen_t bitlen);`

  Returns `bitlen` bits starting at `bitoff` from `node_key`.  The
  `node_key` argument is a pointer to the key members inside a tree object.

* `pt_slot_t ptto_testkey(const void *key, pt_bitoff_t bitoff,
  pt_bitlen_t bitlen);`

  Returns `bitlen` bits starting at `bitoff` from key.  The `key` argument
  is the same key argument supplied to `ptree_find_filtered_node`.

All bit offsets are relative to the most significant bit of the key,

The ptree programming interface should contains these routines:

* `void ptree_init(pt_tree_t *pt, const pt_tree_ops_t *ops, size_t
  ptnode_offset, size_t key_offset);`

  Initializes a ptree.  If `pt` points at an existing ptree, all knowledge
  of that ptree is lost.  The `pt` argument is a pointer to the `pt_tree_t`
  to be initialized.  The `ops` argument is a pointer to the
  `pt_tree_ops_t` used by the ptree.  This has four members: The
  `ptnode_offset` argument contains the offset from the beginning of an
  item to its `pt_node_t` member.  The `key_offset` argument contains the
  offset from the beginning of an item to its key data.  This is used if 0
  is used, a pointer to the beginning of the item will be generated.

* `void *ptree_find_filtered_node(pt_tree_t *pt, const void *key,
  pt_filter_t filter, void *filter_ctx);`

  The filter argument is either `NULL` or a function `bool (*)(const
  void *, void *, int);`

* `bool ptree_insert_mask_node(pt_tree_t *pt, void *item, pt_bitlen_t
  masklen);`

* `bool ptree_insert_node(pt_tree_t *pt, void *item);`

* `void *ptree_iterate(pt_tree_t *pt, const void *node, pt_direction_t
  direction);`

* `void ptree_remove_node(pt_tree_t *pt, const pt_tree_ops_t *ops,
  void *item);`
"""
]]

[[!tag smp_networking]]

CVSweb for NetBSD wikisrc <wikimaster@NetBSD.org> software: FreeBSD-CVSweb