HCTrie: A Structure for Indexing Hundreds of Dimensions for Use in File Systems Search

Yasuhiro Ohara
Storage Systems Research Center & Computer Science Department
University of California, Santa Cruz
Santa Cruz, California 95064
Email: [email protected]

Abstract—Data management in large-scale storage systems involves indexing and search for data objects (e.g., files). There are hundreds of types of metadata attributed to the data objects: examples include environmental settings of photograph files and simulation configurations for simulation output files. To provide intelligent file search that uses file metadata, we introduce a novel search structure called Hyper-Cube Trie (HCTrie) that can handle a few hundred dimensions of data attributes. HCTrie can utilize the differences in many dimensions effectively: candidates can be pruned based on differences in all dimensions. To the best of our knowledge, this is the first approach to restrain memory growth to a linear scale in the number of dimensions when multiple dimensions are indexed at the same time. Our prototype has successfully indexed five million data entries with one hundred attributes in a single data structure. We show that HCTrie can outperform MySQL in range search where ranges for fewer than 100 dimensions are specified in the search query.

I. INTRODUCTION

Data management is becoming a challenging problem, especially in High Performance Computing (HPC), where huge amounts of scientific data must be handled. The number of files that must be handled reaches billions [16], making data management difficult. Data are stored in file systems, but users cannot always remember the file path name [22]. Consequently, HPC scientists are forced to manage their simulation results using external notes, including long, complex directory and file names [14]. This is becoming a serious issue as we move toward exascale computing, where hundreds of exabytes of data and billions of files exist in the file system. To tackle this, there is a growing demand to search by metadata rather than by hierarchical file name [17].

Scientific data typically has many metadata fields, such as configuration parameters of the measurement and/or the simulation that derived the data. For example, the WISE All-Sky Data Release includes 285 fields (columns) in the data set [18]. There is a demand to search a data entry or file using this scientific metadata.

There are past proposals to overcome this issue by the use of a database management system (DBMS) [1], [22]. DBMSes can construct an index on each column independently. Typically, some of the columns are selected and indexed based on the specific query trend for the target application. However, since using a DBMS to index file systems is not a widely used technique, such a query trend is not known. Without the search query trend, a DBMS that constructs indices only on some columns cannot perform all search functions efficiently. As an example of inefficient DBMS performance, we note that a query involving join operations may take 400–1000 seconds [9].

In this paper we tackle the problem of finding an efficient data structure for file search using scientific metadata, and we propose a novel search structure called HCTrie. Our goal is an environment where there are a few hundred attributes for each of a few billion (10^11) files in the file system. We will use the attributes for the purpose of file search; this is in general a multidimensional information search.
These few hundred attributes correspond to the number of dimensions in search. We assume that the data structure as a whole will not fit into primary memory; thus we consider the use of secondary storage. Intuitively, there seems to be no reason to limit the number of dimensions d to a certain (low) number, even when we require no performance degradation. No matter how many dimensions there are, the complexity (both in time and space) should be bounded by the number of data entries n, if each internal branch node in the structure divides the n data entries into subgroups. Here, we want to remove the limitation on the number of dimensions and, at the same time, leverage the locality and the differences in the rich attributes.

Our primary goal is to provide a data structure with feasible growth in both time and space complexity against the number of dimensions, d. It is desirable to have a data structure that is not affected by the number of dimensions. Our secondary goal is efficient interaction between dimensions. We want to filter out candidates in one dimension and, at the same time, effectively filter out the same candidates in other dimensions as well. Also, if we query entries with a combination of ranges in a few dimensions, we want as few entries outside the range as possible to be returned.

HCTrie scales well in the number of dimensions, utilizes the differences in the combination of dimensions, and is efficient in multi-dimensional range search in that a range can be handled in an aggregate unit of search-space hypercube. The prototype implementation in this paper shows results for range search on one hundred dimensions, and the implementation supports up to 512 dimensions.

Fig. 1. HCTrie: the search space is divided in each dimension to construct the hypercube search spaces in each level of the tree. Every hypercube space is registered in the upper-level hypercube's B-Tree, enabling a space-efficient descendant link array.

Fig. 2. makelabel() in HCTrie. Each bit in the dimension keys (i.e., k_1 = 2, k_2 = 6, and k_3 = 3) constitutes a label of a hypercube search space. For example, L_0 = 0|0|0 = 0 is the label of the corresponding hypercube at level 0, L_1 = 0|1|0 = 2 is the one at level 1, L_2 = 1|1|1 = 7 is the one at level 2, and so on.

II. HCTRIE

HCTrie can be thought of as a simple extension of the Quadtree [21]. The major difference is that the descendant pointer array is implemented as a B-Tree [7], [15]. This way, only the necessary descendant pointers in the 2^d space are held, at the cost of the regular B-Tree memory overhead.¹ Furthermore, efficient utilization of the secondary storage device (e.g., HDDs) can be achieved with the notion of disk pages.

¹The storage space overhead within B-Tree nodes is guaranteed to be less than half of the disk page size, except for the root node [15].

HCTrie divides the search space into d-dimensional hypercubes. Each hypercube at depth i corresponds to one of the 2^i intervals into which each of the d dimensions is divided. Each hypercube is labeled with a number derived from the decomposition in each dimension: whether it is HIGH (1) or LOW (0) within the parent hypercube. With only a single dimension, HCTrie is equivalent to a Trie [10], so HCTrie can be thought of as a Trie on d-dimensional hypercubes.

The notion of a 3-D HCTrie is illustrated in Figure 1. The left side of the figure depicts the division of the search space into halves in all dimensions, producing smaller hypercubes in each level of HCTrie. Only the lowest hypercubes in HCTrie (called leaf hypercubes) have pointers to the data entries. B-Trees serve as the link array from the ascendant hypercube to the descendant hypercubes. This enables efficient memory usage, without spending memory on a large number of empty pointers, even with high dimensionality. For example, despite the fact that there are 2^100 subspaces when d = 100, at most n subspaces need to be stored in the B-Tree. (Note that we assume n ≪ 2^d.) If the division at a level of HCTrie does not sufficiently distinguish the n data entries, the number of subspaces registered in that level's B-Tree is smaller, and the difference is handled at a deeper level of HCTrie.
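To make this layout concrete, the following is a minimal sketch (ours, not the paper's prototype): a plain Python dict, scanned in sorted label order where needed, stands in for the per-node on-disk B-Tree.

```python
class HCTrieNode:
    """One hypercube at a given level of the HCTrie.

    Of the 2**d possible child hypercubes, only the non-empty ones are
    stored. The paper keeps them in a per-node B-Tree so that lookups
    and ordered range scans work on disk pages; a plain dict stands in
    for that B-Tree in this sketch.
    """

    def __init__(self, level):
        self.level = level     # depth i of this hypercube in the trie
        self.children = {}     # label (int in [0, 2**d)) -> HCTrieNode
        self.is_leaf = False   # leaf hypercubes point to data entries
        self.entries = []      # data-entry ids, populated only at leaves
```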
In order to put a hypercube in the B-Tree, it is necessary to have an order relation between hypercubes. HCTrie obtains one simply by extracting the i-th bit from the keys in all dimensions. Figure 2 illustrates the makelabel() function, which appears in Line 4 of Algorithm 1.

A. Algorithms and complexity

Table I describes the notation used in this paper.

TABLE I. NOTATIONS AND DESCRIPTIONS

Notation        Description
d               the number of dimensions.
w               the maximum key length over all d dimensions: w = max_{i=1..d} length(k_i).
B               the order of the B-Tree (i.e., the number of branches).
n               the number of data entries.
K, k_i          the search query, i.e., K = (k_1, k_2, ..., k_d), where k_i is the search key in the i-th dimension.
R = (K^s, K^e)  the range of the search query; K^s and K^e are the beginning and the end of the range.

Algorithm 1 HCTrie Lookup.
1: procedure HCTRIELOOKUP(K = (k_1, k_2, ..., k_d))
2:   i ← 0, h_0 ← HCTrie.root
3:   while (i < w ∧ h_i is not a leaf) do
4:     L_i ← makelabel(K, i)
5:     h_{i+1} ← h_i.BTreeLookup(L_i)
6:     i ← i + 1
7:   end while
8:   return h_i
9: end procedure

Algorithm 2 HCTrie Range Search.
1: procedure HCTRIERANGESEARCH(R = (K^s, K^e))
2:   S ← ∅, Q ← {HCTrie.root}
3:   for all h_i ∈ Q do
4:     L^s_i ← makelabel(K^s, h_i.level)
5:     L^e_i ← makelabel(K^e, h_i.level)
6:     b ← h_i.BTreeLeafHead(L^s_i)
7:     do
8:       if (b.data is a hypercube) then
9:         h_{i+1} ← b.data
10:        Q ← Q ∪ {h_{i+1}}
11:      else
12:        S ← S ∪ {b.data}
13:      end if
14:      b ← h_i.BTreeLeafNext(b)
15:    while b.key ≤ L^e_i
16:  end for
17:  return S
18: end procedure

Algorithm 1 is the function to look up the corresponding hypercube. It takes d-dimensional keys K = (k_1, k_2, ..., k_d) as input and returns the corresponding hypercube, h_i. The algorithm starts at level 0, taking the root hypercube of HCTrie. The maximum depth is bounded by the number of bits in the dimension keys, w. For example, if "long long unsigned int" is the longest type of field among the d keys, then w = 64. At each level i, the i-th bit of each key is extracted to compose the corresponding hypercube label L_i, as illustrated in Figure 2. L_i is then used to search the B-Tree of the hypercube at level i to find the descendant hypercube (at level i + 1).
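As a concrete rendering of Algorithm 1, here is a hedged Python sketch over the node model above. It assumes that bit i of a key is counted from the most significant of its w bits, which is consistent with the Fig. 2 example (keys 2, 6, 3 yield labels 0, 2, 7 at levels 0 through 2 if w = 4):

```python
def makelabel(keys, level, w):
    """Algorithm 1, Line 4: concatenate, over all d dimensions, the bit
    of each key at depth `level` (counted from the most significant of
    w bits) into a single d-bit hypercube label, as in Fig. 2."""
    label = 0
    for k in keys:
        bit = (k >> (w - 1 - level)) & 1
        label = (label << 1) | bit
    return label


def hctrie_lookup(root, keys, w):
    """Algorithm 1: descend one level per key bit, following the child
    whose label matches makelabel() at each level."""
    node = root
    for level in range(w):
        if node.is_leaf:
            break
        child = node.children.get(makelabel(keys, level, w))  # B-Tree lookup
        if child is None:
            return None  # no data entry lies in this hypercube
        node = child
    return node

# With 4-bit keys k_1 = 2, k_2 = 6, k_3 = 3 (Fig. 2):
#   makelabel((2, 6, 3), 0, 4) == 0
#   makelabel((2, 6, 3), 1, 4) == 2
#   makelabel((2, 6, 3), 2, 4) == 7
```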
Algorithm 2 shows the range search function. The input R holds the beginning of the range, K^s, and the end of the range, K^e. The range search function conducts a breadth-first search from the HCTrie root using the queue Q. Within the breadth-first search, the B-Tree in the current hypercube h_i is searched for the label L^s_i that corresponds to the start of the range K^s at the appropriate level (h_i.level). From L^s_i through L^e_i, the B-Tree is traversed while the B-Tree key is in the range. The collected data entries are returned through S.

The worst-case scenario for the space complexity of HCTrie is that every difference between data entries creates a branch of factor 2, and each of these branches is realized by a B-Tree consisting of only a root node. The number of branches (and, equivalently, the number of nodes) in the binary tree that classifies n entries is 2n − 1. A disk page is consumed for each of these branches, and the required disk page size is proportional to Bd, since B d-dimensional keys must be held in a disk page. Hence, the space complexity is O(Bdn). This means that HCTrie scales linearly in both d and n.
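Algorithm 2 can be sketched the same way; the sorted scan over the stand-in dict imitates the B-Tree leaf walk from BTreeLeafHead(L^s_i) while the key stays at most L^e_i (again ours, not the prototype):

```python
from collections import deque

def hctrie_range_search(root, ks, ke, w):
    """Algorithm 2: breadth-first search over hypercubes whose labels
    fall between the labels of the range endpoints at each level."""
    results, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        lo = makelabel(ks, node.level, w)  # label of range start here
        hi = makelabel(ke, node.level, w)  # label of range end here
        # The paper scans B-Tree leaves from lo while key <= hi;
        # sorting the dict keys imitates that ordered leaf walk.
        for label in sorted(node.children):
            if lo <= label <= hi:
                child = node.children[label]
                if child.is_leaf:
                    # Leaf hypercubes point at data entries; some may
                    # lie outside the query box and need a later check.
                    results.extend(child.entries)
                else:
                    queue.append(child)
    return results
```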
Note that HCTrie can benefit from the use of improved versions of B-Trees. For example, a B*-Tree [7] can be used to improve the memory efficiency guarantee, and a streaming B-Tree [5] can be used to speed up update and range search performance.

III. EVALUATION

A. Random Synthetic Data

We evaluate our prototype of HCTrie against MySQL on a random synthetic dataset. Figure 4 illustrates a sample random data entry. The random synthetic data consists of a 64-bit unsigned "id" attribute and one hundred 8-bit "ti" attributes ("ti" stands for tiny integer). The size of a data entry is thus 108 bytes. The "id" attribute is a sequentially assigned number and also serves to indicate the location of the data in the data file. For example, the data entry with "id" 100 can be found at a file offset of 10800 bytes. We assume that the "id" is not used in querying a data entry, since finding the location of the data is the major purpose of the index. Each of the "ti" attributes can take a value from -127 to 127. In this comparison, however, we restrict the number of distinct values to 5, simulating the low cardinality of scientific metadata: each value is random in [0..4].

Fig. 4. A sample data entry from the random synthetic data: id (uint64, 8B): 999; ti1 (int8, 1B): 1; ti2 (int8, 1B): 4; ti3 (int8, 1B): 1; ti4 (int8, 1B): 2; ti5 (int8, 1B): 3; ti6 (int8, 1B): 0; ti7 (int8, 1B): 3; ti8 (int8, 1B): 4; ...; ti97 (int8, 1B): 1; ti98 (int8, 1B): 1; ti99 (int8, 1B): 2; ti100 (int8, 1B): 1. Each attribute is shown as attribute name, type, length in bytes, and value. The value takes a random value from the range [0..4] to simulate the low cardinality of scientific metadata.

Figure 3 compares the database sizes. For the five-million-entry dataset, with cardinality 5 and with cardinality 128, the index file and the data file in HCTrie were each approximately 515 MiB, about 1 GiB in total.

Fig. 3. Database size, including the index and data file size (y-axis: database size in MiB). "MYI/index" shows the index, and "MYD/data" shows the data. The MyISAM engine was used for MySQL. The labels are formatted as "name/# of data entries/(# of columns)/cardinality". "MySQL/1M/(101)/c5" was calculated from "MySQL/1M/(64)/c5", since MySQL supports a maximum of only 64 indices. The difference between the index sizes of MySQL and HCTrie can be attributed to the number of trees: MySQL has a larger number of relatively small trees, while HCTrie has one larger tree.

B. Range Search Performance

Figure 5 shows the performance measurement against MySQL on one million random synthetic data entries. It was conducted on a host with an Intel Xeon E3-1230 (3.20 GHz), 16 GB of memory, and an Intel SSD (SSDSC2CT12). The MySQL version was 5.5.29, using the default configuration. We used the mysql command to measure the query execution time in MySQL, which includes SQL parse time and communication time. Since we assumed out-of-index queries, and because MySQL limits the number of indices to 64, we did not create a MySQL index. The InnoDB MySQL engine was used.

In the evaluation, range queries were issued over one hundred dimensions. Our prototype implementation supports only search ranges that correspond to the binary-divided search-space boundaries. The range of the query for each dimension was [0..3] when specified, and [0..127] when unspecified. The x-axis of Figure 5 gives the number of unspecified dimensions. For example, the point x = 20 means that the first 20 dimensions are unspecified (i.e., [0..127]) and the latter 80 dimensions are specified with the range [0..3].

With up to 70 unspecified dimensions, HCTrie outperformed MySQL: MySQL took about 1.5 seconds, and HCTrie took less than 0.5 seconds. The line labeled "HCTrie-First1" is the time taken to search the index structure. The line labeled "HCTrie-First2" additionally includes the time taken by the range match check. Since HCTrie may still return candidates that do not fall into the query range, it is necessary to load the entire data entry and check whether it matches. In our prototype implementation, when the query range is larger and many candidates are in the range, the loading of data entries impacted the overall performance. For example, the line labeled "HCTrie-First2" in Figure 5 took more time than MySQL at 80 unspecified dimensions, with 11,452 candidates. In our case of cardinality 5 in each dimension, all the candidates found were in the query range.

Fig. 5. Range query performance evaluation on one million 100-dimensional data entries with cardinality 5. The x-axis is the number of unspecified dimensions (columns), counted from the first in the dimension order; the rest of the dimensions are specified with the range [0..3]. The left y-axis is the time taken to range search (seconds); the right y-axis is the number of data entries (rows) found. "HCTrie-First1" shows the time taken to return candidates, and "HCTrie-First2" additionally includes the time taken to load the entire data and check the matching status. "# of Rows" indicates the number of results, using the right y-axis. MySQL performed at around 1.5 seconds, while HCTrie returned candidates ("HCTrie-First1") in around 0.14 seconds.

Overall, if more than 30 out of 100 dimensions can be specified, HCTrie is more beneficial to use than MySQL in our search case. Our query was loose: the range [0..3] is specified where only [0..4] is possible. Still, differences in different dimensions can effectively be used to prune candidates quickly, which contributes to the search performance.

The line "HCTrie-First1" in Figure 5 remains constant at around 0.15 seconds. This shows that, for more than 10 unspecified dimensions, HCTrie had to check all the B-Tree leaf slots. In other words, HCTrie took only around 0.15 seconds to check one million data entries to see whether the differentiating part indexed for each data entry matched the query range. Performance is affected only by the number of candidates found. Note that candidates outside the query range may be returned, depending on the characteristics of the dataset.

Furthermore, the line "HCTrie-First1" shows that the index structure returns candidates quickly, and it indicates that what takes time is the data loading and the final check. We may be able to use HCTrie similarly to a Bloom filter [13], where candidates can be narrowed down quickly.
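A sketch of that final check, i.e., the gap between "HCTrie-First1" and "HCTrie-First2"; load_entry is a hypothetical accessor, not part of the paper, that reads one 108-byte entry from the data file by id:

```python
def verify_candidates(candidates, load_entry, ks, ke):
    """Post-filter: the index may return ids whose entries fall outside
    the query box, so each candidate entry is loaded in full and every
    dimension's key is checked against [ks[j], ke[j]]."""
    matches = []
    for entry_id in candidates:
        keys = load_entry(entry_id)  # hypothetical: seek to id * 108, parse
        if all(s <= k <= e for k, s, e in zip(keys, ks, ke)):
            matches.append(entry_id)
    return matches
```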
Figure 6 illustrates an experiment with the unspecified dimensions at the end. For example, x = 40 means that the first 60 dimensions are specified (i.e., [0..3]) and the last 40 dimensions are unspecified (i.e., [0..127]). Since the first dimensions are specified, the most significant bits of the hypercube labels are fixed, which contributes to the B-Tree lookup speed and, in turn, to the overall performance.

Fig. 6. Range query performance evaluation on one million 100-dimensional data entries with cardinality 5. The x-axis is the number of unspecified dimensions, counted from the last (not the first, in contrast to Figure 5) in the dimension order; the series "HCTrie-Last1" and "HCTrie-Last2" are the analogues of "HCTrie-First1" and "HCTrie-First2". The performance is slightly better than in Figure 5, because specifying the first dimensions fixes the most significant bits of the hypercube labels, enabling fast B-Tree leaf traversal.

With 90 unspecified dimensions, only 10 dimensions are specified in the range search query, so the number of matching data entries increases significantly. The line labeled "# of Rows" indicates the number of matching data entries resulting from the query, using the right y-axis. As the query range becomes larger and the matching entries increase, the performance of HCTrie degrades, due to the aforementioned data loading and final check. Our prototype implementation took around 29 and 87 seconds for 90 and 95 unspecified dimensions, respectively.
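The effect is visible in the label arithmetic of the earlier sketches: the leading dimensions occupy the most significant bits of every level's label, so the per-level scan interval [L^s_i, L^e_i] stays narrow. A small hypothetical illustration with d = 4 and 7-bit keys:

```python
# Specifying the FIRST two of four dimensions ([0..3]) keeps the two
# most significant bits of every level's label equal between the range
# endpoints, so the per-level scan interval stays narrow.
ks = (0, 0, 0, 0)
ke = (3, 3, 127, 127)  # first two specified, last two open
w = 7                  # 7-bit keys cover [0..127]
for level in range(w):
    print(level, makelabel(ks, level, w), makelabel(ke, level, w))
# Levels 0-4 scan labels [0..3] out of 16; only levels 5-6 scan [0..15].
# Had the LAST two dimensions been the specified ones, level 0 would
# already scan [0..12], a much wider interval.
```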
Figure 7 shows the case where the unspecified dimensions are chosen at random. The results were only slightly better than the worst case (i.e., Figure 5, where the first portion was unspecified) in our experiment.

Fig. 7. Range query performance evaluation on one million 100-dimensional data entries with cardinality 5. The x-axis is the number of unspecified dimensions, chosen at random in the dimension order; the series "HCTrie-Random1" and "HCTrie-Random2" are the analogues of "HCTrie-First1" and "HCTrie-First2". Averages over 100 runs are shown with standard deviations. The MySQL data is projected from the previous figure for comparison. The result is similar to, and only slightly better than, Figure 5.

Figure 8 illustrates the case of Figure 5 with five million data entries. It shows that the advantage of HCTrie is approximately 6 seconds in most cases, a further improvement over Figure 5.

Fig. 8. Range query performance evaluation on five million (not one million, in contrast to the previous experiments) 100-dimensional data entries with cardinality 5. The x-axis is the number of unspecified dimensions, counted from the first in the dimension order. MySQL performed at around 7.6 seconds, while HCTrie returned candidates ("HCTrie-First1") in around 1.7 seconds.

IV. RELATED WORK

An extensive list of access methods is summarized by Gaede and Günther [11]. Past studies on multidimensional search structures mainly focus on only a few dimensions, typically two or three and generally fewer than ten. They typically exhibit 2^d growth in space for d dimensions, as a natural extension of the Quadtree would, because of their need to provision a pointer array for all possible d-dimensional subspaces.

B-Trees are a one-dimensional access method that is widely used today, both in DBMSes (e.g., MySQL) and in file systems (e.g., Btrfs [20]). The fact that only one dimension can be indexed leads to the problem of composite indexing in a DBMS, where combinations of columns need to be considered.

Binary Space Partitioning (BSP), a family of structures based on the notion of dividing a space by hyperplanes, is widely used in computer graphics [19]. Families of Quadtrees [21] and R-Trees [3], [4], [12] also handle multidimensional keys. However, they target only two or three dimensions and do not suit indexing one hundred dimensions. While the Quadtree family [21] conducts a simple and intuitive division of multi-dimensional space, it supports only a low number of dimensions, such as two or three. If the number of dimensions d grows to the hundreds, the size of the pointer array in a Quadtree grows as 2^d (e.g., 2^100 ≈ 10^30), which does not scale.

K-d-Trees [6] scale well with the number of dimensions d: the number of dimensions does not impact their scalability. However, they lose the data entries' locality, which hurts range search performance. The average cost of range search in a K-d-Tree is dominated by overwork (i.e., redundant and theoretically avoidable cost), and its complexity has been given by Duch and Martínez [8].

The B-Trie [2] combines B-Trees and Tries. However, it assumes a single-dimensional string key and does not support multidimensional keys.

V. CONCLUSION

We have presented a novel multidimensional search structure called HCTrie. HCTrie can support a hundred dimensions, which is progress compared to today's multidimensional search structures, which support only several dimensions. On a 100-dimensional random synthetic dataset that simulates the low cardinality of scientific metadata, HCTrie showed better performance than MySQL when a range is specified in more than 30 dimensions.

This novel structure is expected to be beneficial in file system search, in which we will combine many types of functions to provide a new type of file search.

ACKNOWLEDGMENT

This research was supported in part by the NSF under awards IIP-0934401 and CCF-0937938, and by the Dept. of Energy under awards DE-FC02-10ER26017/DE-SC0005417.
We also thank the industrial sponsors of the Storage Systems Research Center for their generous support. We also thank the faculty and students of the SSRC for their help and guidance, especially Darrell Long, Ethan Miller, Aleatha Parker-Wood, Brian Madden, and Christina Strong, for their valuable comments.

REFERENCES

[1] S. Ames, M. Gokhale, and C. Maltzahn, "QMDS: A file system metadata management service supporting a graph data model-based query language," in Networking, Architecture and Storage (NAS), 2011 6th IEEE International Conference on, Jul. 2011, pp. 268–277.
[2] N. Askitis and J. Zobel, "B-tries for disk-based string management," The VLDB Journal, vol. 18, no. 1, pp. 157–179, Jan. 2009.
[3] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: An efficient and robust access method for points and rectangles," in ACM SIGMOD, 1990, pp. 322–331.
[4] N. Beckmann and B. Seeger, "A revised R*-tree in comparison with related index structures," in ACM SIGMOD, 2009, pp. 799–812.
[5] M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson, "Cache-oblivious streaming B-trees," in Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '07), 2007, pp. 81–92.
[6] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Commun. ACM, vol. 18, Sep. 1975.
[7] D. Comer, "The ubiquitous B-Tree," ACM Comput. Surv., vol. 11, no. 2, pp. 121–137, Jun. 1979.
[8] A. Duch and C. Martínez, "Improving the performance of multidimensional search using fingers," J. Exp. Algorithmics, vol. 10, Dec. 2005.
[9] A. Floratou, J. M. Patel, W. Lang, and A. Halverson, "When free is not really free: What does it cost to run a database workload in the cloud?" in TPCTC, ser. Lecture Notes in Computer Science, vol. 7144. Springer, 2011, pp. 163–179.
[10] E. Fredkin, "Trie memory," Commun. ACM, vol. 3, no. 9, pp. 490–499, Sep. 1960.
[11] V. Gaede and O. Günther, "Multidimensional access methods," ACM Comput. Surv., vol. 30, no. 2, pp. 170–231, Jun. 1998.
[12] A. Guttman, "R-trees: A dynamic index structure for spatial searching," in ACM SIGMOD, 1984, pp. 47–57.
[13] F. Hao, M. Kodialam, and T. V. Lakshman, "Building high accuracy Bloom filters using partitioned hashing," in ACM SIGMETRICS, 2007, pp. 277–288.
[14] S. Jones, C. Strong, A. Parker-Wood, A. Holloway, and D. D. E. Long, "Easing the burdens of HPC file management," in Proceedings of the 6th Parallel Data Storage Workshop (PDSW '11), Nov. 2011.
[15] D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley-Longman, 1973.
[16] P. Kogge et al., "ExaScale computing study: Technology challenges in achieving exascale systems," DARPA/IPTO, Tech. Rep., 2008.
[17] A. Leung, M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller, "Spyglass: Fast, scalable metadata search for large-scale storage systems," in Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST), Feb. 2009, pp. 153–166.
[18] NASA, "WISE All-Sky Release Explanatory Supplement: Data Products," http://wise2.ipac.caltech.edu/docs/release/allsky/expsup/sec2_2a.html, 2012.
[19] M. S. Paterson and F. F. Yao, "Optimal binary space partitions for orthogonal objects," Journal of Algorithms, vol. 13, no. 1, pp. 99–113, 1992.
[20] O. Rodeh, J. Bacik, and C. Mason, "BTRFS: The Linux B-tree filesystem," IBM, Tech. Rep., Jul. 2012.
[21] H. Samet, "The quadtree and related hierarchical data structures," ACM Comput. Surv., vol. 16, no. 2, pp. 187–260, Jun. 1984.
[22] M. Seltzer and N. Murphy, "Hierarchical file systems are dead," in Proceedings of the 12th Workshop on Hot Topics in Operating Systems (HotOS-XII), 2009.