[[!template id=project title="Lockless, atomic and generic Radix/Patricia trees" contact=""" [tech-kern](mailto:tech-kern@NetBSD.org), [board](mailto:board@NetBSD.org), [core](mailto:core@NetBSD.org) """ category="kernel" difficulty="hard" description=""" This project proposal is a subtask of [[smp_networking]]. The goal of this project is to implement lockless, atomic and generic Radix and Patricia trees. BSD systems have always used a radix tree for their routing tables. However, the radix tree implementation is showing its age. Its lack of flexibility (it is only suitable for use in a routing table) and overhead of use (requires memory allocation/deallocation for insertions and removals) make replacing it with something better tuned to today's processors a necessity. Since a radix tree branches on bit differences, finding these bit differences efficiently is crucial to the speed of tree operations. This is most quickly done by XORing the key and the tree node's value together and then counting the number of leading zeroes in the result of the XOR. Many processors today (ARM, PowerPC) have instructions that can count the number of leading zeroes in a 32 bit word (and even a 64 bit word). Even those that do not can use a simple constant time routine to count them: int clz(unsigned int bits) { int zeroes = 0; if (bits == 0) return 32; if (bits & 0xffff0000) bits &= 0xffff0000; else zeroes += 16; if (bits & 0xff00ff00) bits &= 0xff00ff00; else zeroes += 8; if (bits & 0xf0f0f0f0) bits &= 0xf0f0f0f0; else zeroes += 4; if (bits & 0xcccccccc) bits &= 0xcccccccc; else zeroes += 2; if (bits & 0xaaaaaaaa) bits &= 0xaaaaaaaa; else zeroes += 1; return zeroes; } The existing BSD radix tree implementation does not use this method but instead uses a far more expensive method of comparision. Adapting the existing implementation to do the above is actually more expensive than writing a new implementation. The primary requirements for the new radix tree are: * Be self-contained. It cannot require additional memory other than what is used in its data structures. * Be generic. A radix tree has uses outside networking. To make the radix tree flexible, all knowledge of how keys are represented has to be encapsulated into a `pt_tree_ops_t` structure with these functions: * `bool ptto_matchnode(const void *foo, const void *bar, pt_bitoff_t max_bitoff, pt_bitoff_t *bitoffp, pt_slot_t *slotp);` Returns true if both `foo` and `bar` objects have the identical string of bits starting at `*bitoffp` and ending before `max_bitoff`. In addition to returning true, `*bitoffp` should be set to the smaller of `max_bitoff` or the length, in bits, of the compared bit strings. Any bits before `*bitoffp` are to be ignored. If the string of bits are not identical, `*bitoffp` is set to the where the bit difference occured, `*slotp` is the value of that bit in `foo`, and false is returned. The `foo` and `bar` (if not `NULL`) arguments are pointers to a key member inside a tree object. If bar is `NULL`, then assume it points to a key consisting of entirely of zero bits. * `bool ptto_matchkey(const void *key, const void *node_key, pt_bitoff_t bitoff, pt_bitlen_t bitlen);` Returns true if both `key` and `node_key` objects have identical strings of `bitlen` bits starting at `bitoff`. The `key` argument is the same key argument supplied to `ptree_find_filtered_node`. * `pt_slot_t ptto_testnode(const void *node_key, pt_bitoff_t bitoff, pt_bitlen_t bitlen);` Returns `bitlen` bits starting at `bitoff` from `node_key`. The `node_key` argument is a pointer to the key members inside a tree object. * `pt_slot_t ptto_testkey(const void *key, pt_bitoff_t bitoff, pt_bitlen_t bitlen);` Returns `bitlen` bits starting at `bitoff` from key. The `key` argument is the same key argument supplied to `ptree_find_filtered_node`. All bit offsets are relative to the most significant bit of the key, The ptree programming interface should contains these routines: * `void ptree_init(pt_tree_t *pt, const pt_tree_ops_t *ops, size_t ptnode_offset, size_t key_offset);` Initializes a ptree. If `pt` points at an existing ptree, all knowledge of that ptree is lost. The `pt` argument is a pointer to the `pt_tree_t` to be initialized. The `ops` argument is a pointer to the `pt_tree_ops_t` used by the ptree. This has four members: The `ptnode_offset` argument contains the offset from the beginning of an item to its `pt_node_t` member. The `key_offset` argument contains the offset from the beginning of an item to its key data. This is used if 0 is used, a pointer to the beginning of the item will be generated. * `void *ptree_find_filtered_node(pt_tree_t *pt, const void *key, pt_filter_t filter, void *filter_ctx);` The filter argument is either `NULL` or a function `bool (*)(const void *, void *, int);` * `bool ptree_insert_mask_node(pt_tree_t *pt, void *item, pt_bitlen_t masklen);` * `bool ptree_insert_node(pt_tree_t *pt, void *item);` * `void *ptree_iterate(pt_tree_t *pt, const void *node, pt_direction_t direction);` * `void ptree_remove_node(pt_tree_t *pt, const pt_tree_ops_t *ops, void *item);` """ ]] [[!tag smp_networking]]