soda3

Information about soda3

Published on January 3, 2008

Author: Charlie

Source: authorstream.com

Content

Optimal Space Lower Bounds for All Frequency Moments:  Optimal Space Lower Bounds for All Frequency Moments David Woodruff MIT [email protected] The Streaming Model:  The Streaming Model … Stream of elements a1, …, aq each in {1, …, m} Want to compute statistics on stream Elements arranged in adversarial order Algorithms given one pass over stream Goal: Minimum space algorithm Frequency Moments:  Frequency Moments q = stream size, m = universe size fi = # occurrences of item i Define k-th Frequency Moment: Applications F_0 = # distinct elements in stream, F_1 = q F_2 = repeat rate Compute self-joins in database The Best Determininistic Algorithm:  The Best Determininistic Algorithm Trivial Algorithm for Fk Store/update frequency fi of each item i Space: m items i, log q bits for each fi Total Space = O(m log q) Negative Result [AMS96]: Any algorithm computing Fk exactly must use (m) space. Can we do better? Approximating Fk:  Approximating Fk Negative Result [AMS96]: Any deterministic algorithm that outputs x with |Fk – x| <  Fk must use (m) space. What about randomized approximation algorithms? Randomized algorithm A -approximates Fk if A outputs x with Pr[|Fk – x| <  Fk ] > 2/3 Previous Work:  Previous Work Upper Bounds: Can -approximate F0 [BJKST02], F2 [AMS96], Fk [CK04], k > 2 with space respectively: Lower Bounds: [AMS96] 8 k, –approximating Fk need (log m) space [IW03] -approximating F0 requires space if Questions: Does the bound hold for k  0? Does it hold for F0 for smaller ? First Result:  First Result Optimal Lower Bound: 8 k  1 and any  = (m-1/2), any -approximator for Fk must use (-2) bits of space. F1 = q computed trivially in log q space Fk computed in O(m log q) space, so need  = (m-.5) Technique: Reduction from 2-party protocol for computing Hamming distance (x,y) Idea Behind Lower Bounds:  Idea Behind Lower Bounds x 2 {0,1}m y 2 {0,1}m Stream s(x) Stream s(y) (1 § ) Fk algorithm A (1 § ) Fk algorithm A Internal state of A Compute (1 § ) Fk(s(x) ± s(y)) w.p. > 2/3 Idea: If can decide f(x,y) w.p. > 2/3, space used by A at least randomized 1-way comm. Complexity of f(,) S Alice Bob Randomized 1-way comm. complexity:  Randomized 1-way comm. complexity Boolean function f: X £ Y ! {0,1} Alice has x 2 X, Bob y 2 Y. Bob wants f(x,y) Only 1 message sent: must be from Alice to Bob Comm. cost of protocol = expected length of longest message sent over all inputs.  -error randomized 1-way comm. complexity of f, R(f), is comm. cost of optimal protocol computing f w.p. ¸ 1- How do we lower bound R(f)? The VC Dimension [KNR] :  The VC Dimension [KNR] F = {f : X ! {0,1}} family of Boolean functions f 2 F is length-|X | bitstring For S µ X, shatter coefficient SC(fS) of S is |{f |S}f 2 F| = # distinct bitstrings when F restricted to S SC(F, p) = maxS 2 X, |S| = p SC(fS) If SC(fS) = 2|S|, S shattered by F VC Dimension of F, VCD(F), = size of largest S shattered by F Shatter Coefficient Theorem:  Shatter Coefficient Theorem Notation: For f: X £ Y ! {0,1}, define: fX = { fx(y) : Y ! {0,1} | x 2 X }, where fx(y) = f(x,y) Theorem [BJKS]: For every f: X £ Y ! {0,1}, every p ¸ VCD( fX ), R1/3(f) = (log(SC(fX, p))) Hamming Distance Decision Problem (HDDP):  Hamming Distance Decision Problem (HDDP) We will lower bound R1/3(f) via SC(fX, t), but first, a critical lemma… Set t = (1/2) x 2 {0,1}t y 2 {0,1}t Alice Bob Promise Problem : (x,y) · t/2 – t1/2 (x,y) > t/2 f(x,y) = 0 OR f(x,y) = 1 Main Lemma:  Main Lemma S µ{0,1}n y = T = S-T Show 9 S µ {0,1}n with |S| = n s.t. there exists 2(n) “good” sets T µ S so that: 9 a separator y 2 {0,1}n s.t 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 8 t 2 S – T, (y,t) > n/2 Lemma Solves HDDP Complexity:  Lemma Solves HDDP Complexity Theorem: R1/3(f) = (t) = (-2). Proof: Alice gets yT for random good set T applying main lemma with n = t. Bob gets random s 2 S Let f: {yT }T £ S ! {0,1}. Main Lemma =>SC(f) = 2(t) [BJKS] => R1/3(f) = (t) = (-2) Back to Frequency Moments:  Back to Frequency Moments Idea: Use -approximator for Fk in a protocol to solve HDDP y 2 {0,1}t s 2 S µ {0,1}t Fk Alg Fk Alg State ay as ith universe element included exactly once in auxiliary stream ay (resp. as) if and only if yi (resp. si) = 1. Solving HDDP with Fk:  Solving HDDP with Fk Alice/Bob compute -approx to Fk(ay ± as) Fk(ay ± as) = 2k wt(y Æ s) + 1k (y,s) For k  1, Conclusion: -approximating Fk(ay ± as) decides HDDP, so space for Fk is (t) = (-2) Alice also transmits wt(y) in log m space. But How to Prove Main Lemma?:  But How to Prove Main Lemma? Recall: show 9 S µ {0,1}n with |S| = n s.t. there exists 2(n) sets T µ S so that: 9 a separator y 2 {0,1}n s.t 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 8 t 2 S – T, (y,t) > n/2 Use probabilistic method For S, choose n random elts in {0,1}n Show probability arbitrary T µ S satisfies (1),(2) is > 2-zn for constant z < 1. Hence expected such T is 2(n) So exists S with 2(n) such T Key Proving the Main Lemma:  Proving the Main Lemma Let T ={t1, …, tn/2} µ S be arbitrary Let yi = majority(t1,i, ..., tn/2,i) for all i 2 [m] What is probability p that both: 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 8 t 2 S – T, (y,t) > n/2 For 1, let x = Pr[8 t 2 T, (y,t) · n/2 – cn.5] For 2, let y = Pr[8 t 2 S-T, (y,t) > n/2] = 2-n/2 By independence, p = x ¢ y. It remains to lower bound x… The Matrix Problem:  The Matrix Problem WLOG, assume y = 1n (recall y is majority word) Want lower bound Pr[8 t 2 T, (y,t) · n/2 – cn.5] Equivalent to matrix problem: t1 -> t2 -> … tn/2 -> 101001000101111001 100101011100011110 001110111101010101 101010111011100011 Given random n/2 x n binary matrix w/each column majority 1, what is probablity each row has at least n/2 + cn.5 1s? Bipartite Graphs:  Bipartite Graphs Matrix Problem  Bipartite Graph Counting Problem: How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree > n/2 + cn.5 and each right vertex degree > n/2? … … Second Result:  Second Result Bipartite graph count: Probabilistic argument shows at least 2n^2/2 – zn/2 –n such bipartite graphs for constant z < 1. Analysis generalizes to show # bipartite graphs on m + n vertices w/each left vertex having degree > n/2 and each right vertex degree > m/2 is > 2mn-zm-n. Previous known count: 2mn-m-n [MW – personal comm.] Follows easily from a correlation inequality of Kleitman. Our proof uses correlation inequalities, but more involved analysis. Summary:  Summary Results: Optimal Lower Bound: 8 k  1 and any  = (m-1/2), any -approximator for Fk must use (-2) bits of space. Bipartite Graph Count: # bipartite graphs on m + n vertices w/each left vertex having degree > n/2 and each right vertex degree > m/2 is at least 2mn-zm-n for constant z < 1.

Related presentations


Other presentations created by Charlie

Personality Development
17. 11. 2007
0 views

Personality Development

History of Plastics
30. 04. 2008
0 views

History of Plastics

Juniper Networks 22 Nov 2005
28. 04. 2008
0 views

Juniper Networks 22 Nov 2005

CA Communications 02 20 08
18. 04. 2008
0 views

CA Communications 02 20 08

BharAloutlookISRI
17. 04. 2008
0 views

BharAloutlookISRI

direct basis
16. 04. 2008
0 views

direct basis

AnEconomicHistory English
14. 04. 2008
0 views

AnEconomicHistory English

Chap002
13. 04. 2008
0 views

Chap002

Financial Crisis
10. 04. 2008
0 views

Financial Crisis

NATO Today
23. 12. 2007
0 views

NATO Today

TM photo pp ppt
08. 10. 2007
0 views

TM photo pp ppt

2007 seminar 3
12. 10. 2007
0 views

2007 seminar 3

Roalddahl
12. 10. 2007
0 views

Roalddahl

micro credit presentation
15. 10. 2007
0 views

micro credit presentation

lecture 9 12 proteins 2007
16. 10. 2007
0 views

lecture 9 12 proteins 2007

Spanish American War
22. 10. 2007
0 views

Spanish American War

ppw 6 28 04
07. 10. 2007
0 views

ppw 6 28 04

PP2
23. 10. 2007
0 views

PP2

KATRINA TEACHERS GUIDEpr
04. 09. 2007
0 views

KATRINA TEACHERS GUIDEpr

ECA Knowledge Fair
31. 08. 2007
0 views

ECA Knowledge Fair

Automatic Indexing
31. 08. 2007
0 views

Automatic Indexing

ROLE OF JOURNALISTS UNION
31. 08. 2007
0 views

ROLE OF JOURNALISTS UNION

wendy
15. 11. 2007
0 views

wendy

Maximize Access Coverage
28. 11. 2007
0 views

Maximize Access Coverage

Notable Arborists
02. 10. 2007
0 views

Notable Arborists

INDEX OF SEGREGATION
07. 12. 2007
0 views

INDEX OF SEGREGATION

wilhelm2
04. 01. 2008
0 views

wilhelm2

FV1 day1
07. 01. 2008
0 views

FV1 day1

berdai
23. 10. 2007
0 views

berdai

weddings
11. 12. 2007
0 views

weddings

McDowell
29. 10. 2007
0 views

McDowell

usa jl
13. 11. 2007
0 views

usa jl

Construccion de un NOM
24. 10. 2007
0 views

Construccion de un NOM

TuLiP Overview
04. 09. 2007
0 views

TuLiP Overview

tulip
04. 09. 2007
0 views

tulip

HawaiiPresentation
17. 12. 2007
0 views

HawaiiPresentation

ompi tm cas 04 5
23. 10. 2007
0 views

ompi tm cas 04 5

Chapter12
03. 10. 2007
0 views

Chapter12

asian inc
29. 10. 2007
0 views

asian inc

Mat Prod L10
14. 02. 2008
0 views

Mat Prod L10

featurefilm
17. 10. 2007
0 views

featurefilm

EJ Genetic Research
24. 02. 2008
0 views

EJ Genetic Research

badagliacco
24. 02. 2008
0 views

badagliacco

Science and Warfare Lecture 1
26. 02. 2008
0 views

Science and Warfare Lecture 1

AHCIVI 1
27. 02. 2008
0 views

AHCIVI 1

student pressentation mngn
07. 11. 2007
0 views

student pressentation mngn

trucks 4 comm
28. 02. 2008
0 views

trucks 4 comm

bioweapons
04. 03. 2008
0 views

bioweapons

2007EMSVaccinationTr aining
10. 03. 2008
0 views

2007EMSVaccinationTr aining

Mehta diving and the environment
11. 03. 2008
0 views

Mehta diving and the environment

Perform Basis06 A0 en last
25. 03. 2008
0 views

Perform Basis06 A0 en last

wttcsantiago2007
26. 03. 2008
0 views

wttcsantiago2007

Living on Mars
07. 04. 2008
0 views

Living on Mars

lect22 handout
15. 10. 2007
0 views

lect22 handout

pedagogy
04. 09. 2007
0 views

pedagogy

FEE dev IHEP
31. 08. 2007
0 views

FEE dev IHEP

Rong Gen Cai
01. 12. 2007
0 views

Rong Gen Cai

mps break st louis
18. 06. 2007
0 views

mps break st louis

Moving on with Statistics
19. 06. 2007
0 views

Moving on with Statistics

Module 2 TAKS05
19. 06. 2007
0 views

Module 2 TAKS05

microsoft office overview
19. 06. 2007
0 views

microsoft office overview

Math in Middle School
19. 06. 2007
0 views

Math in Middle School

Math Concordance Show
19. 06. 2007
0 views

Math Concordance Show

Mary George
19. 06. 2007
0 views

Mary George

Lower Division
19. 06. 2007
0 views

Lower Division

Lecture Amiens
19. 06. 2007
0 views

Lecture Amiens

lady adalovelace
19. 06. 2007
0 views

lady adalovelace

Kelm
31. 08. 2007
0 views

Kelm

Oct06 CAC Presentation1
18. 06. 2007
0 views

Oct06 CAC Presentation1

NLI 0460
18. 06. 2007
0 views

NLI 0460

nicholas
18. 06. 2007
0 views

nicholas

NCLB Highly Qualified
18. 06. 2007
0 views

NCLB Highly Qualified

NCLB An dE Rate1029
18. 06. 2007
0 views

NCLB An dE Rate1029

MWR 07073
18. 06. 2007
0 views

MWR 07073

mtts product show
18. 06. 2007
0 views

mtts product show

OMSC
18. 06. 2007
0 views

OMSC

PACA 16 de agosto
22. 10. 2007
0 views

PACA 16 de agosto

dh firenze
19. 10. 2007
0 views

dh firenze

gridpp16 servicechallenges
24. 10. 2007
0 views

gridpp16 servicechallenges

lwi
19. 06. 2007
0 views

lwi

AnLiu IDAR07 nocomment
12. 10. 2007
0 views

AnLiu IDAR07 nocomment

VoIPSlides
12. 03. 2008
0 views

VoIPSlides

3 Russia 05
26. 10. 2007
0 views

3 Russia 05

Lynnand Marsha
19. 06. 2007
0 views

Lynnand Marsha

07 0314 k ahuja
28. 09. 2007
0 views

07 0314 k ahuja

PresJMorales
22. 10. 2007
0 views

PresJMorales

Math TEKS K5
19. 06. 2007
0 views

Math TEKS K5

me579 16 internetMC
15. 11. 2007
0 views

me579 16 internetMC

Briars
04. 09. 2007
0 views

Briars

Esm Juny 05 IESE tcm48 42493
01. 10. 2007
0 views

Esm Juny 05 IESE tcm48 42493

CESARE PACIOTTI
10. 10. 2007
0 views

CESARE PACIOTTI

kep engl2007
15. 10. 2007
0 views

kep engl2007

LCG Switzerland Phase 2
19. 10. 2007
0 views

LCG Switzerland Phase 2

stoddart
06. 03. 2008
0 views

stoddart

Ch Kor Symp00
13. 10. 2007
0 views

Ch Kor Symp00

xps seminar jan e
19. 06. 2007
0 views

xps seminar jan e

nys status Report 2006 2007
18. 06. 2007
0 views

nys status Report 2006 2007

JOSB
21. 11. 2007
0 views

JOSB

raffo phdthesis
07. 10. 2007
0 views

raffo phdthesis

parolari
03. 01. 2008
0 views

parolari

bailey
23. 10. 2007
0 views

bailey