Study on Query Optimization Methods of Segmented Time Series
|School||Huazhong University of Science and Technology|
|Course||Computer Software and Theory|
|Keywords||Data Mining Query Optimization Clustering Correlation Theme found Reverse neighbors Break Timing|
Timing is different from other types of sequence data, it is through the formation of discrete sampling time points. It is in the real world there is widespread in many areas, for example: biomedical, finance, meteorology, natural science. Timing processing technology is a very important and valuable technology, has been successfully applied in many important areas, such as: a sensor network monitoring, financial data analysis, DNA sequence analysis, tracking, and motion capture moving objects and so on. However, timing is a typical high-dimensional and massive data types, the current timing is still facing great challenges processing technology. Around the timing sequence segmentation and segmentation approach query optimization technology, launched a five aspects of the research work: Timing segmentation related processing technology, cluster-based query optimization for static timing method segmented partition-based static timing Reverse nearest neighbor query optimization method, dynamic segmentation specific timing pattern query optimization method, grid-based dynamic segmentation timing patterns related query optimization methods. Because of the timing sequence segmentation processing technology is important, for the lack of dynamic segmentation method of non-isometric timing problems in the detailed analysis of the static and dynamic timing timing isometric and non-isometric segmentation method based on the use point on the cumulative approximation (Piecewise Aggregate Approximation, PAA) and point-to-linear approximation (Piecewise Linear Approximation, PLA) incremental computing features, presents a multi-stream adaptive segmentation algorithm QONSP, and proved that it is only there is a linear time complexity. Experimental results show that, QONSP thousands able to dynamically adaptive timing segmentation parameter can be controlled by adjusting the precision and the average length of the segment. In order to improve static timing after subparagraph query efficiency, clustering studied the impact on the timing range queries. Presents a segmentation based isometric symmetry low boundary function SLBS, and prove lower bounds on its segmented timing of Euclidean distance. Use SLBS, given static timing based on clustering range query optimization method RQIC, it also integrates a static query optimization techniques, including: first-k filtering, triangle inequality pruning and low boundary filtering. Experimental results show that, RQIC in most data sets maintained a capacity of more than 60% of the trim, and with the PLA query performance close to the query method. For the current lack of B-tree index-based static timing sub query optimization method to study the segmented static timing Reverse nearest neighbor query optimization techniques. Break sequence by using the static nature of the universal adaptation: any two different timings, on the whole, and if they are (or) to maintain the local trend similar, they will be near the object, the static timing partition, and according to the similarity partition size and partition to partition split and merge, the timing of the partition object to a B-tree index. Finally, based on the famous algorithm iDistance given based filtering - refining framework query optimization methods RiDistance. Experimental results show that, RiDistance is effective, it is more efficient than serial scan method query 1-2 orders of magnitude faster. In the segment of the dynamic timing after the query processing, in order to improve the function of pattern matching existing rapid data flow model is difficult to adapt the length and amplitude variation problem, a lot of dynamic segmentation timing mode query optimization. Introduces a new model similar to the distance function and prove that it is a measure of the function, which can use the triangle inequality to accelerate pattern matching speed. Also gives a fast pattern matching algorithm based on statistical information and forecasting models may occur probabilistic algorithms. Financial data stream based on the experimental results show that, given a function similar to the pattern more than other similar functions to adapt the amplitude offset and scaling changes, most financial query methods to monitor the specific mode of the data stream. Further, for the subsequence matching queries related shortcomings and the lack of a dynamic environment theme discovery algorithm to study the timing of grid-based dynamic segmentation correlation query optimization methods. Introduced to adapt to changes in the length and magnitude of local pattern similarity function SDD, and prove that it meets the metric function properties. Using the aforementioned dynamic segmentation techniques and SDD, grid-based projection technology gives the correlation model MCALP, it is able to monitor multiple data stream minimum correlation (cross-correlation) and the maximum correlation (topic), and proved it The two theorems improve query performance efficiency. The model includes monitoring query methods MCPDG minimum correlation and P-topic query methods PMDGS. Based on financial data stream Experimental results show that the proposed method is effective query optimization, has only linear time and space complexity.