An overview to binary trees and BST
When we first learn about coding and organising data, we almost always start with arrays. It is ordered, it is linear, easy peasy. Arrays are an example of linear data structures, of which data can only be accessed sequentially. Then we learn about the Document Object Model (DOM), Cascading Style Sheets (CSS) and are told that they use “tree structures” and dive into selectors. That may come naturally because we are familiar with computer file systems — drive to folders to files at the end, which follows a general tree structure! So just how often do we deal with trees?
Even beginner developers deal with trees allllll the time.
Anytime you interact with:
- Browser DOMs
- Frameworks, modules
- Client-server
- Google search (yes, this too!)
You are seeing the application of trees. Trees avail us to non-linear data structuring. It is a collection of nodes, each connected to a parent, where the ultimate ancestor node is called the root node (resembling an up-side-down tree). File systems are an example application of a general tree structure. The simplest tree is a linked-list, which is just a string of parent-child relationships (a linear data structure).
How Google Search Works (so quickly)
Crawling & Indexing: The journey of a query starts before we ever type a search, with crawling and indexing the web of trillions of documents. Google essentially gathers the pages during the crawl process and then creates an index much like the index in the back of a book.
Searching: When we search, at the most basic level, their algorithms use those search terms to traverse the index to find the appropriate pages.
— Edited from https://www.geeksforgeeks.org/google-search-works/
What is a binary tree
Let’s discuss a specific type of tree — a binary tree.
Each node in a binary tree can have at most two child-nodes. Following this principle, each node has 3 properties — value, left-child and right-child. From the root, the tree is split into two sections, the left-subtree and the right-sub-tree that in turn have the same pattern.
Binary search tree — BST
A binary search tree is essentially a sorted version of the binary tree, where the left-child is always less than the parent node and the right-child is always greater than the parent node. This means that operations based on search, min and max can be done more efficiently, where one of the subtrees is discarded at each comparison i.e. similar to a binary search. The binary tree is called balanced if the height of the left and right sub-trees are similar.
What is binary search?
In a nutshell, binary search is the fastest way to search through a sorted dataset using the divide-and-conquer approach. It works by repeatedly dividing in half the portion of the data that could contain the target item with each comparison, until we have narrowed down the possible values to just one. While not super useful for small datasets, it reduces the time taken to complete the operation exponentially for large databases (think thousands to millions of data points). In O notation, from O(n) for linear search to O(lg n), log base 2 for binary search.
Why not just do a binary search on a sorted array?
BSTs support everything we can get out of a sorted array — binary search (or a similar method), min/max, floor/ceil etc. Instead of sorting data linearly from least to greatest however, the data is there in many chunks, making it easier to insert or delete elements. This makes BSTs ideal for dynamically changing datasets where the data is some “sortable” type eg. integers/dates with the typical “natural” order. As a note, balanced BSTs complete these operations in O(h) time where h is the height of the tree.
A basic example — imagine we have defined a class BinarySearchTree with a method insert that recursively looks for the right spot to insert a new node.
Calling insert(5) on the BST will result in the following by checking that 5 is more than 2 , and less than 10.
Self-balancing trees
Basic BSTs work great if they come balanced, where the height is similar for both subtrees i.e. the root node’s value is close to the median of the data points. What if each additional input was skewed to one side of the tree? The time to search would be close to linear searching! Which, of course, defeats the purpose of using a BST.
Enter self-balancing trees. While there are several types, I will briefly touch on AVL and Red-black trees.
AVL Trees
Invented by Adelson, Velski & Landis, AVL trees are self-balancing binary search trees. It checks and assures that the height of the left and right subtrees are not off by more than 1. This difference is called the balance factor. Each node stores an integer per node corresponding to its height in the tree.
To balance itself, an AVL tree may perform 4 types of rotations: Left, Right, Left-Right and Right-left, depending on the insertion.
Red Black Trees
While also self-balancing, Red Black trees are looser about the height difference of its subtrees, specifying that no black-node-to-root height is over twice as high as any other. This means that the Red Black is probably preferable if your application has frequent inserts/deletes since there are fewer rotations to be made. AVL’s stricter height rules means traversal is done faster, though.
Red Black self-balances according to its rules by similar rotations as well as recoloring:
For more information on Red Black rules and how to insert nodes, check out Kevin Mavani’s article on them.
Binary Trees — Real World Applications
- Binary Search Tree — Used in many search applications where data is constantly entering/leaving.
- Splay Tree (A type of BST) — Used for optimised search as well, with the additional property that frequently accessed nodes are moved to the top.
- Binary Space Partition — Used in 3D game engines to determine which objects to render.
- Syntax Trees — Used in compilers to efficiently parse syntax expressions
- Binary Tries — Used in almost every router for storing router-tables
- Routing trees for network traffic
- And many more..
TLDR
All of us (or most of us) interact with data all the time, whether directly as a developer or indirectly as a user. Tree structures provide us with a great way to store and access non-linear data. Binary trees and their variations, in particular, are used in many ways when it comes to traversing “sortable” data. Hopefully (or sadly), you might start noticing their tracks everywhere from now on :)
Resources
- http://www.btechsmartclass.com/
- https://www.geeksforgeeks.org/binary-search-tree-data-structure/
- https://www.cs.cmu.edu/~clo/www/CMU/DataStructures/
- https://afteracademy.com/blog/binary-search-tree-introduction-operations-and-applications
- https://en.wikipedia.org/wiki/Binary_search_tree
- Paul Programming https://youtu.be/sf_9w653xdE