【2019学术报告04】Approximate Analytic Queries over Compressed Time Series with Tight Deterministic Error Guarantees
We provide sound and tight deterministic error guarantees for approximate queries over compressed data. The query expressions are compositions of well-known linear algebra operators over vectors, along with arithmetic operators. Such queries can express most common statistics, including statistics (such as correlation and cross-correlation) that combine multiple time series. The queries are evaluated over classic time segment-based compressions of IoT time series. The error guarantees computations use common and straightforward precomputed error measures for each segment. Overall, the error guarantees are applicable (as an add-on) to most prior works on compression. In particular, the compression may use either fixed-length segmentation or (the far more effective in practice) variable-length segmentation. The estimation functions used to approximate the actual values may come from any function family.
This work identifies two broad estimation function families (namely, the already existing Vector Space family (VS) and the presently defined Linear Scalable Family (LSF)) that lead to theoretically and practically high-quality guarantees, even for queries that combine multiple time series. Well known function families (e.g., the polynomials) belong to LSF. The theoretical aspect of "high quality" is crisply captured by the Amplitude Independence (AI) property: An AI guarantee does not depend on the amplitude of the involved time series, even when we combine multiple time series. The experiments on four real-life datasets validated the importance of the Amplitude Independent (AI) error guarantees: When the novel AI guarantees were applicable, the guarantees could ensure that the approximate query results were very close (typically 1%) to the true results.