# s409 guha

Published on January 3, 2008

Author: Pumbaa

Source: authorstream.com

SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS:  SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENN Synopses:  Synopses Given n input numbers, summarize the input using B numbers, minimizing some error. Examples Histograms – piecewise constant repn. Wavelets – uses the wavelet basis Fourier, Bessel, SVD, what have you… Why space efficiency:  Why space efficiency “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. From a computational viewpoint however… Space is the cruelest resource :  Space is the cruelest resource Resources Time : tweedle thumbs Access (stream): make more passes Program simply will not run – or if data is shifted to disk, will run quite slow(er). Further, if we had more space, maybe we can compute a better (more accurate) synopsis Examples - I:  Examples - I Histograms Many error measures V-OPT, Jagadish etal, 1998 O(n2B) time O(nB) space Only O(n) space at a time (working space) O(n2B2) time and O(n) space Is that the best ? Here: O(n2B) time O(n) space. Example - II:  Example - II (Haar) Wavelets Orthonormal systems For l2 error store the largest B coeffs of input Does not work for non l2 Find the best B coeffs to retain (note, restricted). Garofalakis & Kumar, 04 O(n2B log B) time O(n2B) space, but O(nB) needed at a time (for l1 ) Here O(n) space, and O(n2) time Example - III:  Example - III Extended Wavelets Multiple measures Optimization is similar to Knapsack with choices. Previous best – Deligiannakis and Rossopoulos, 04, O(Mn(B+ log n)) time and space O(MnB), but needing O(nM+MB) at a time Guha, Kim, Shim, 04, reduced space to O(BM+min {nM,B2}) Here, O(BM) space What we will not talk about:  What we will not talk about Approximation algorithms for histograms Range Query Histograms Basically improvement of a factor B in space across the board. B is not always small, specially when n is large The main idea:  The main idea Can we solve using a non DP paradigm ? Well, divide & conquer … Small details – how do we divide ? Interaction Does a small interaction partitioning exist ? How (much size) to represent it ? Ease of finding it (in the given representation) ? A case study - Histograms:  A case study - Histograms Formally, given a signal X find a piecewise constant representation H with at most B pieces minimizing ||X-H||2 Consider one bucket. The mean is the best value. A natural DP … The DP for histograms:  The DP for histograms Err[i,b] = Error of approximating x1,…,xi using b buckets For i=1 to n do For 2 to B do For j=1 to i-1 do Err[i,b] = min Err[i,b], Err[j,b-1] + error(j+1,i) B n What if:  What if We could figure out what was the story at the middlepoint ! Two questions So what ? How ? (use a DP) Wait a minute …:  Wait a minute … We just replaced a DP by another and claimed something … !!! Exactly. The second DP needs only O(n) space. So as the conquer steps re-use/share the same space; the total space is O(n) too. The idea is to use divide and conquer; and use a (small) DP to find the divide step. Is it really that simple ? The code:  The code The end of working space:  The end of working space If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. Working space = total space. How much is little ?:  How much is little ? Wavelets:  Wavelets A set of vectors {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…},{0,0,0,0,1,-1,0,0},{0,0,0,0,0,0,1-1} {1,1,-1,-1,0,0,0,0},{0,0,0,0,1,1,-1,-1} {1,1,1,1,-1,-1,-1,-1},{1,1,1,1,1,1,1,1} A natural multi-resolution Wavelet Synopsis Construction :  Wavelet Synopsis Construction Formally, given a signal X and the Haar basis {i} find a representation F=i zi i with at most B non-zero zi minimizing some error which a fn of X-F Restriction. Zi is either 0 or h X,i i Debate. Unrestricted or restricted. Omit. Wavelets:  Wavelets ||X-F||1 Long history Matias, Vitter Wang ’98 Garofalakis, Gibbons, ’02 Garofalakis, Kumar, ’04 State of the Art O(n2B log B) time O(n2B) space O(nB) working space Here O(n2log B) time O(n) space SEE ALSO NEXT TALK … What happens to wavelets [GK04] ?:  What happens to wavelets [GK04] ? Extensions:  Extensions Approximation Algorithms Range Query Histograms Extended Wavelets Histograms:  Histograms Saves space across all algorithms except algorithms which extend to general error measure over streams Range Query:  Range Query Same story Open Q: faster algorithm obeying synopsis size That’s all folks:  That’s all folks

16. 08. 2007
0 views

12. 10. 2007
0 views

15. 10. 2007
0 views

22. 10. 2007
0 views

05. 09. 2007
0 views

05. 09. 2007
0 views

05. 09. 2007
0 views

05. 09. 2007
0 views

23. 10. 2007
0 views

23. 10. 2007
0 views

24. 10. 2007
0 views

24. 10. 2007
0 views

04. 10. 2007
0 views

02. 11. 2007
0 views

26. 10. 2007
0 views

02. 11. 2007
0 views

14. 11. 2007
0 views

17. 11. 2007
0 views

16. 08. 2007
0 views

16. 08. 2007
0 views

16. 08. 2007
0 views

16. 08. 2007
0 views

16. 08. 2007
0 views

28. 12. 2007
0 views

03. 01. 2008
0 views

07. 10. 2007
0 views

29. 10. 2007
0 views

05. 01. 2008
0 views

05. 01. 2008
0 views

09. 08. 2007
0 views

09. 08. 2007
0 views

09. 08. 2007
0 views

09. 08. 2007
0 views

09. 08. 2007
0 views

07. 11. 2007
0 views

05. 09. 2007
0 views

22. 10. 2007
0 views

09. 08. 2007
0 views

17. 10. 2007
0 views

22. 10. 2007
0 views

28. 12. 2007
0 views

05. 09. 2007
0 views

26. 11. 2007
0 views

05. 09. 2007
0 views

02. 10. 2007
0 views

14. 02. 2008
0 views

20. 02. 2008
0 views

03. 01. 2008
0 views

04. 03. 2008
0 views

05. 09. 2007
0 views

10. 03. 2008
0 views

11. 03. 2008
0 views

18. 03. 2008
0 views

09. 08. 2007
0 views

25. 03. 2008
0 views

26. 03. 2008
0 views

26. 03. 2008
0 views

07. 04. 2008
0 views

30. 03. 2008
0 views

09. 04. 2008
0 views

10. 04. 2008
0 views

13. 04. 2008
0 views

14. 04. 2008
0 views

16. 04. 2008
0 views

17. 04. 2008
0 views

22. 04. 2008
0 views

29. 02. 2008
0 views

16. 08. 2007
0 views

23. 11. 2007
0 views

09. 08. 2007
0 views

15. 10. 2007
0 views

09. 08. 2007
0 views

11. 12. 2007
0 views

09. 08. 2007
0 views

14. 12. 2007
0 views

26. 11. 2007
0 views

29. 12. 2007
0 views

16. 06. 2007
0 views

16. 06. 2007
0 views

09. 08. 2007
0 views

13. 03. 2008
0 views

09. 08. 2007
0 views

03. 10. 2007
0 views

05. 09. 2007
0 views

02. 01. 2008
0 views

08. 10. 2007
0 views

15. 11. 2007
0 views

05. 09. 2007
0 views

05. 09. 2007
0 views

07. 12. 2007
0 views

21. 11. 2007
0 views

16. 08. 2007
0 views

09. 08. 2007
0 views

16. 10. 2007
0 views

12. 10. 2007
0 views