BITS-Tree-An Efficient Data Structure for Segment Storage and Query Processing

In this paper, a new and novel data structure is proposed to dynamically insert and delete segments. Unlike the standard segment trees[3], the proposed data structure permits insertion of a segment with interval range beyond the interval range of the existing tree, which is the interval between minimum and maximum values of the end points of all the segments. Moreover, the number of nodes in the proposed tree is lesser as compared to the dynamic version of the standard segment trees, and is able to answer both stabbing and range queries practically much faster compared to the standard segment trees.


Introduction
Solving geometrical problems computationally seems to be difficult due to its complex nature such as varying dimensionality and shape.Several algorithms are available in literature [4,8] for storing and retrieving geometrical objects.Segment tree [3] is one such data structure designed to handle intervals on the real line, whose extremes belong to a fixed set of abscissas.The static segment tree requires O(n log n) space, and the stabbing query processing time is O(k + log n), where n is the number of segments and k is the number of output segments.Though it looks reasonable in space and query processing, it does not support dynamic insertion and deletion of segments.However, in the dynamic segment trees, one can insert or delete a segment in the existing segment tree, but storage space requirements are likely get increased as the range of the tree is not based on the segment end points.The range of the tree is predefined, and one can insert or delete the segments whose interval contained in this range, thus in those cases where the range of the segment falls beyond the range of the segment tree, reconstruction of the tree is the only possibility, which is not advisable.Also, this may practically increases the time as the number of nodes to be traversed get increased.
Variants of segment trees are used in packet classification problem.One such data structure is Fat Inverted Segment Trees (FIS-trees) [5,9] with the space requirements of O(n 1+ 1 l ) and having complexity for insertion/deletion as O(n 1 l log n), where l is the height of the tree.Here, the tree is compressed (made 'fat') by increasing the degree to more than two in order to decrease the height, however this has an upper bound on the total number of insertions and deletions allowed.Also, Agarwal et.al [2] proposed a linear-size data structure for the stabbing semi-group problem by combining the features of interval and segment tree, with the assumption that the end points of all intervals belong to a fixed set of points.
In this paper, a novel data structure, BITS-tree (Balanced Inorder Threaded Segment Tree),is proposed to dynamically insert and delete segments, with interval range of the node is only based on the end points of segments stored in the tree.In addition to this, it answers both stabbing and range queries efficiently.Unlike in the dynamic version of the standard segment trees, the root of the BITS-tree does not have the information of overall interval range of nodes in the tree, the BITS-tree also permits insertion of segment with any interval range, and thus it is a real dynamic tree that is suitable for all situations.

BITS-Tree
Definition: 1 A BIT S-tree is a height balanced two-way inorder-threaded binary tree T that satisfies the following properties.
1.Each node v of T associates with a range r and a list of segments containing the range r. 2. The range of a node cannot overlap with a range of any other node, other than at the end points.3. The range of the nodes are sorted according to the inorder sequence.4. It has a special node, called dummy, with range φ (empty), and its list is φ.

The inorder predecessor of the first node of the inorder sequence is dummy.
Similarly the inorder successor of the last node of the inorder sequence is dummy.
For example, consider the BIT S-tree given figure 1.The inorder sequence of range is [5, 10], [10, 12], [12, 15] and [18, 21], which is sorted in ascending order, as this satisfies the binary search tree properties on ranges.So, searching for a particular range could be done as in the case of binary search tree, using the following definition.The doted lines in figure 1 denote threads.The hanging threads points to the dummy node.
Definition: 2 Given two segments S = [p, q] and T = [m, n], S < T implies q ≤ m, that is the segment S falls left of T .Similarly, S > T implies n ≤ p, and this means S falls right of T .
Definition: 3 Given two segments S = [p, q] and T = [m, n], where p = q and m = n, p < m < q < n denotes S left overlaps with T .Similarly, m < p < n < q denotes S right overlaps with T .Otherwise, S is said to be contained in T when m < p < q ≤ n or m ≤ p < q < n, or covered up T when p < m < n ≤ q or p ≤ m < n < q.
All four cases stated in definition 3 are commonly called as S and T overlaps.Note that, S left overlaps with T also implies T right overlaps with S. Similarly, S contained in T also implies T covered up T .
Definition: 6 Let S = [p, q] and T = [m, n] be the two overlapping segments and their associated lists respectively be list(S) and list(T ).Now, the list associated with C(S, T ), L(S, T ) and R(S, T ) are defined as follows.

Insertion
For to insert a segment, say S, one has to search for a node, whose range overlaps with S, from root like in the binary search tree based on the relation given in definition 2. As soon as, an overlapping range is found in any node v, the insertion process starts as follows.Let R be the range of v. Now, S ∪ R is partitioned into sub-ranges as described in definition 5. Now, range of v will be changed as C(S, R) and the segment S will be included in the list of v. Further, insertion of L(S, R) (if not empty) continues with the inorder predecessor of v, and insertion of R(S, R) (if not empty) continues with the inorder successor of v.However, in these two cases, the list of segments to be inserted may change, as in definition 6.Note that initially, the list of segments to inserted contains only S, and later may be changed according to definition 6.At one point of time, the insertion may reach a null pointer (not always).If so, the balancing range (sub-range of segment not yet considered) will be considered as a range for a node, by creating a new one, with necessary thread pointers and having the list currently to be inserted.As the tree is balanced, creation of a new node may require necessary rotation to get the tree to be balanced.The rotations we performed here is only the AVL-tree based rotations [1].The formal description of the insertion procedure is given in algorithm 1.The CreateNode() function used in algorithm 1, create a node as in threaded binary tree [6,7], however here the treads are bidirectional.For example, consider insertion of segment d = [2,7] in the BITS-tree given in figure 1.The search starts with the root node, and stop in the node with range r = [5,10] [5,7] is retained at that node by including d in the list of this node, and further insertion takes with respect to the ranges [2,5] and [7,10] at the inorder predecessor and successor, respectively.This Algorithm 1: BITS-Insert(T ,S,L) Input: T -the pointer to the root of the tree, S -the segment to be inserted; and L -Set of segments, initially  process in turn creates two new nodes in the tree given in figure 1, one with range [2,5] and other with range [7,10].This is shown in figure 2(b).As creation of new node with range [2,5] does not affect the balancing factor, no rotation is required.However, creation of new node for the [7,10] affects the balancing factor (figure 2(b)), and hence necessary rotation to be carried out, here it is left-right rotation.The effect of this rotation is shown in figure 2(c).Note here that the thread pointers are not affected due to rotation.Whichever be the rotation, the thread pointer cannot change, as the rotation does not affect the inorder sequence.Note here that the usage of threads obviously reduces the time for inserting a segment, as compared to the standard dynamic version of the segment tree.The time required for inserting a node is O(log n + k), where the first factor log n stands for locating first overlapping node, and k be the number of nodes in which the new segment get inserted.Note that, a segment tree contains at most 2n−1 nodes, where n is the number of segments in the tree.

Deletion
Let S = [p, q] be the segment to be deleted.First step of the deletion process is to locate a node v with range [p, − q ], where − q denotes a value less than or equal to q.If the list of the node v does not contain the segment S, then deletion does not require, and the process of deletion can be terminated.Otherwise, the segment S can be removed from the list of v. Now, after removal of S, if LIST (v) is empty then convert the RAN GE(v) as φ (empty), and v should be merged with P RED(v).Note that, during merge the node appears at lower level will be joined at the node at the higher level, and also it performs necessary rotation if required to make the tree balanced.In case, the LIST (v) is not empty after removal of S then compare the LIST (v) with the LIST (P RED(v)) and if found matching then merge the node v with P RED(v).Now, the deletion process can be continued with the SU CC(v), and that to be continued till the node w with range [ p −, q], where p − denotes a value greater than equal to p.Then, to complete the deletion, compare the new range of w with the SU CC(w), and if found matching then w and SU CC(w) must be merged.Each time when a node is merged with the inorder predecessor or successor, necessary rotation need to be performed as in the case of AVL trees.

Algorithm 2: BITS-Deletion
Input: T -the pointer to the root of the tree, and [p, q] -the segment to be deleted.begin while

SUCC(T)); end
As an example, suppose segment a = [5,12] is removed, as in figure 3(a).This process starts with node having range [5,7].Now, after removal of segment a from this node, the list of this node matches with its inorder predecessor, thus to be merged with its inorder predecessor and the result is shown in figure 3(c).Next, the process continues with the node having range [7,10].Now by removing segment a from this node, its list becomes empty and so its range to be modified as empty(φ) and required merging is to be done with its inorder predecessor, and the result is shown in figure 3(d).Finally, the segment a must be removed from the list of the node having range [10,12].Here, the list of this node does not match with its inorder predecessor, and hence merging with the predecessor is not required, rather as a last step of the deletion process, its range is to be compared with the inorder successor, and as it is matching it should be merged with its in order successor.The final result is shown in figure 3(e).
Like insertion, deletion also takes O(log n + k) time, where k is the number of nodes in which the segment to be deleted is present.

Stabbing Query
Given a point (one dimensional) p, it reports the segments containing p. On the other hand, it reports set of all segments intersects with the line x = p.In standard segment1 , the answer to this query lies in several nodes and one has to take union of lists of all those nodes to report the final answer.However, in BITS-tree, the answer lies in only one node.Here, the process is only to find the node whose range contains p, and the list of that node is the answer to the query.As the ranges of the nodes are not overlapped, a simple binary search will do for locating the range contains p. Since, BITS-tree is height balanced, O(log n) time is sufficient for locating a node with such range.Due to reduction in number of nodes and height as compared to the standard segment tree, our approach certainly outperforms considerably as compared to the existing one.

Range Query
Knowing set of segments overlaps with a given segment is an important problem in computational geometry [4].In BITS-tree, such queries can be answered very efficiently as compared to the standard segment trees.Here, given a segment s = [p, q], finding set of overlapping segments are determined by first locating a node u having range perfectly contains p, and traverse through the inorder successor till reaching the node v with range perfectly contains q.Now the answer is the union of list of nodes traversed from u to v. The time required for the same is O(log n + k), where k is the number of nodes containing answer to the query.

Comparison
The static version of the standard segment tree of the segments, given in figure 1, is shown in figure 4. Similarly, the best dynamic version of the standard segment tree for the same set of segments is shown in figure 5.For this particular example, the comparison is shown in table 1.The maximum stabbing query time for the static segment tree comes when the query point is 15, and in the dynamic segment tree the maximum query time comes when the query point is 13.However, this analysis is not sufficient, and thus the factors related to the theoretical analysis with respect to the number of segments as n is shown in table 2, q q q q q q −∞ +∞ Figure 4: Static version of the standard segment tree for the segments shown in figure 1.For the following lemmas, let us assume SST and DST respectively represent the static and dynamic versions of the standard segment trees.Also, the terms trees and BITS trees are used interchangeably in rest of the paper.
Lemma: 1 The maximum number of nodes required for the BITS tree is 2n − 1, where n is the number of segments.
Proof: The range of nodes in BITS trees are due to the end points of the segments, and these ranges are not overlapped with the ranges of other nodes in the tree.As n segments partition the range [−∞, +∞] into 2n + 1 sub-ranges, the BITS tree can have the maximum of 2n − 1 nodes, since the BITS tree cannot have the nodes with range either [−∞, x] or [y, +∞], where x and y are respectively min 1≤i≤n {x i |s i = [x i , y i ]} and max 1≤i≤n {y i |s i = [x i , y i ]} for the set of segments {s 1 , s 2 , • • • s n }.
Lemma: 2 Maximum storage required for maintaining the lists on all the nodes together in BITS tree is n 2 .
Proof: The maximum storage required for maintaining the lists arises only when one of the node's (say, N ) list with n segments and two nodes, other than N , each with n − 1, n − 2, • • • , 1 segments.Thus, a total of n + 2((n − 1) + (n − 2) + • • • + 1), which is n 2 , segments (may be with replication across the lists) stored altogether in the BITS tree.
Lemma: 3 The height of the BITS trees does not exceed ⌈log n⌉.
Proof: The proof follows from the fact that the height of the AVL tree is less than 1.441⌈log n⌉ [1].
Lemma: 4 Stabbing query on BITS tree can be addressed within ⌈log n⌉ + 1 + k time, where k is the number of output segments.
Proof: As the height of the tree is ⌈log n⌉, to locate a node whose range contains the query point requires O(⌈log n⌉) time.Once such a node is found, the segments exists in the list of that node becomes the answer, and thus, we add θ(k) time to output the segments.However, if the query point is exactly one of end points of a segment, it is required to access either the inorder successor or the inorder predecessor of the node to list out the segments.In this case, it requires an additional θ(1) time to visit that node using the threads.Thus, the stabbing query can be addressed within the time of ⌈log n⌉ + 1 + k.
Lemma: 5 Let [n 1 , n 2 ] be the range of root of a DST T , and n be the number of segments stored in T .Then, n ≤ n 2 − n 1 .
Proof: Let k be the number of leaf nodes in T .Let the range of i-th leaf be [a i , b i ], where which is the disjoint union of ranges of unit length across the range of the root.Thus, if n > n 2 − n 1 , then there should at least one segment, say s Lemma: 6 BITS trees outperforms SST and DST in case of range search.
Proof: Given a range, say r = [x 1 , x 2 ], the range search has to find the number of segments overlapped with r.In BITS tree, this is done by locating the node N 1 , whose range contains x 1 , then by visiting the nodes using inorder successor threads till reaching the node N 2 with the range containing x 2 .Here, the segments stored in the list of nodes between N 1 and N 2 in the inorder sequence will be the answer.This process needs to visit O(log n + p) nodes, where p is the number of nodes lies between N 1 and N 2 in the inorder sequence.Note that, log n + p ≤ m, where m is the number of nodes in the tree, as a node visited once cannot be visited again.
However, it is impossible to use inorder sequence in standard segment trees due to the following two reasons.
• The SST or BST does not have inorder thread to efficiently access the next node in the inorder sequence.
• The range of the nodes are overlapped with its ancestors.
Thus, to do range search on SST or DST, one has to use a known binary tree traversal method to answer the query, and that takes the time of O(m), where m is the number of nodes in the tree.This is due to fact that the properties of segment trees are not useful, in this case, to find the range search.Thus, the BITS trees outperforms both SST and DST in case of answering the range query.
The above analysis shows that BITS trees are better, as compared to the standard segment trees, except in list size requirements.Asymptotically, it is O(n 2 ) for BITS trees and O(n log n) for the SST, and may be more for DST as the number of segments does not determine this.However, this is not a major issue, as in any search tree, the time taken to query processing is more important than the space requirements.Once the segments are stored, the overall complexity relies on the number of queries performed.Moreover, in BITS tree the answer (for stabbing query) lies in only one node, but in SST or DST it may spread in several nodes.Due to reduction of height and number of nodes in the tree, it is sure that the BITS trees are much better compared to the standard segment trees.

Conclusion
The BITS-tree, proposed in this paper, is a balanced two-way inorder threaded segment tree, which is useful for storing geometric data objects available in one dimension.It handles both stabbing and range queries very efficiently than the standard segment tress.The BITS-trees are not restricted to any bound in range, unlike in the standard segment trees, and that is main advantage of this structure.Thus, one can insert a segment of any range without knowing the overall range the segment tree.Moreover, it is quite possible to generalize this to higher dimension with slight modifications in the data representation.

Figure 2 :
Figure 2: Insertion: (a) Set of segments including new segment d.(b) Insertion of segment d in BITS-tree in figure 1, before rotation.(c) BITS-tree after rotation.

Figure 3 :
Figure 3: Deletion: (a) Set of segments including the segment a to be deleted.(b) Original BIT S-tree.(c) After removal of segment a from node having range [5,7].(d) After removal of segment a from node having range [7,10].(e) After removal of segment a from node having range [10,12].

Figure 5 :
Figure 5: Dynamic version of the standard segment tree for the segments shown in figure 1.

Table 1 :
Comparison of trees given in figures 1, 4 and 5. SST denotes Static Segment Tree, DST denotes Dynamic Segment Tree and BITS denotes BITS Tree.

Table 2 :
Theoretical comparison of static segment trees, dynamic segment trees and the proposed BITS trees.SST denotes Static Segment Trees, DST denotes Dynamic Segment Trees and BITS denotes BITS Trees.