icde07 hagonzal

Published on November 23, 2007

Uploaded by: VolteMort

Source: authorstream.com

Content

Cost Conscious Cleaning of Massive RFID Data Sets
Hector Gonzalez, Jiawei Han, Xuehua Shen
University of Illinois at Urbana-Champaign, Department of Computer Science
The Database and Information Systems Laboratory
ICDE 2007

Outline
Motivation
RFID data
Error sources
Cleaning methods: smoothing methods, rule-based methods, DBN-based methods
Cost conscious cleaning: architecture, cleaning methods, cleaning sequence, cleaning plan induction
Performance study

Motivation
The reliability of current RFID systems is far from optimal: under a wide variety of environments, more than 50% of all tag readings are missed.
The volume of data generated by RFID systems is huge: a large retailer with hundreds of readers can generate thousands of tag readings every second.
An accurate and efficient cleaning process is therefore essential to the successful implementation of RFID technology.

Example
Consider a large retailer with RFID readers at warehouses, distribution centers, and store backrooms. A variety of factors affect correct tag detection:
Diverse reader/tag manufacturers and generations.
Moving tags (conveyor belts, doors) and static tags (shelves).
Different levels of RF noise caused by metal or water in the environment or in the products.
No single method can efficiently clean such a large volume of data generated under such diverse circumstances.

RFID Data
Readers conduct interrogation cycles at periodic intervals: an RF signal is issued, tags awake and transmit their EPC (Electronic Product Code) via RF, and a singulation protocol is used to prevent tag collisions.
To improve accuracy, multiple interrogation cycles are issued during a read cycle.
Readings of the form (EPC, reader, time) are generated at the end of each read cycle.
Readings carry extra information such as the total responses obtained during the read cycle, the antenna used for detection, the tag type, and the signal strength.

Error Sources
There are two types of errors:
False negatives: a tag that is present is not detected.
False positives: a tag that is not present is detected.
Why do we observe errors?
Collisions: multiple tags, or multiple readers, transmit simultaneously.
Environment interference: metal or water near the tag or reader causes RF interference.
Physical configuration: the tag moves too quickly or is located in a blind spot.
Logical errors: a door reader detects tags that pass nearby but not through.

Cleaning Methods
A cleaning method M is a classifier that assigns a location to tag cases.
A tag case is a tuple of the form (<EPC, time>, <f1, f2, ..., fk>), where each f_i is a feature (e.g., tag type, signal strength).
The label assigned to each tag case is of the form (<EPC, time>: location, confidence), where confidence lies in the range 0.0-1.0 and indicates the level of certainty about the location.
Terminal classifiers do not provide confidence values.

Fixed size smoothing window
The window is made up of the last k (fixed) read cycles.
If there is any reading inside the window, mark the tag as present.
Benefit: the method is cheap to apply; the only requirement is to remember the last k readings.
Problem: it is difficult to define the best window size for different conditions.
[Figure: timeline comparing the true tag presence, the raw readings, and the smoothed output.]
A minimal sketch of this smoothing step appears below.
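
The following Python sketch summarizes the fixed-window smoother described above. It is an illustration, not the authors' implementation; the function name and the per-tag list of read-cycle detections are assumptions made for the example.

```python
from collections import deque

def fixed_window_smooth(read_cycles, k=3):
    """Mark a tag as present whenever any of the last k read cycles
    (including the current one) contained a reading for it.

    read_cycles: sequence of booleans, one per read cycle, where True
    means the tag was detected in that cycle.  Returns the smoothed
    presence sequence.
    """
    window = deque(maxlen=k)          # remembers only the last k readings
    smoothed = []
    for detected in read_cycles:
        window.append(detected)
        smoothed.append(any(window))  # present if any reading in the window
    return smoothed

# Sporadic detections of a tag that is actually present the whole time:
raw = [True, False, False, True, False, True, False, False]
print(fixed_window_smooth(raw, k=3))
# [True, True, True, True, True, True, True, True]
```

With k too small the gaps reappear; with k too large a departed tag keeps being reported, which is exactly the window-sizing problem noted above.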

Adaptive window size smoothing
Change the size of the window according to the observed probability of tag detection, using a binomial model: let p be the probability of detecting a present tag, and choose the window size w such that (1 - p)^w < threshold.
Benefit: the window size adapts to the current conditions.
Problems: it can be expensive to store and maintain p for every single tag in the system, and all readings inside the window carry the same weight, whereas it would be better to put more weight on recent readings.

Rule Based Cleaning
Rules can be derived from the data or given by a domain expert. For example, a door reader should only recognize tags with a bell-shaped signal strength profile, and a shelf reader should only recognize items that stay there for more than 5 minutes.
Rules can also be derived from an RFID warehouse: flowgraphs can be used to complete missing readings and to resolve location conflicts.

DBN Based Cleaning
We can model tag detection as a dynamic but hidden process: tag readings correspond to noisy observations of the true, but hidden, location of the tag.
We dynamically update our belief about the tag location based on the sequence of observations, and more recent observations weigh more heavily on that belief.
DBNs allow us to differentiate between an ambiguous reading sequence ("Should we detect this tag?") and an unambiguous one ("No question, detect the tag").
[Figure: two reading sequences illustrating the ambiguous and the unambiguous case.]

DBN Structure
Each time slice has a hidden variable Present_t and an observed variable Detect_t: a transition model links Present_{t-1} to Present_t, and an observation model links Present_t to Detect_t.
We learn the transition and observation models from data.
DBNs can represent more complex structures, e.g., variables for RF noise and tag speed interacting with detection.

Belief state updating
Belief update equation (the belief at time t+1 given all evidence up to t+1 combines the observation model with the update to the belief state given the transition model):
B(Present_{t+1}) = alpha * P(Detect_{t+1} | Present_{t+1}) * sum over present_t of P(Present_{t+1} | present_t) * B(present_t)
The belief state is updated dynamically: we do not need to remember a window of observations, only the latest belief state.
A sketch of this update for a single tag appears below.
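
To make the update concrete, here is a minimal Python sketch of the recursive belief update for one tag with a binary hidden state (present/absent). The transition and observation probabilities are illustrative placeholders, not values from the paper.

```python
def update_belief(belief_present, detected,
                  p_stay_present=0.9, p_become_present=0.05,
                  p_detect_if_present=0.7, p_detect_if_absent=0.01):
    """One filtering step for a binary 'present' variable.

    belief_present: P(present at time t | evidence up to t)
    detected: whether the reader saw the tag in the current read cycle
    Returns P(present at time t+1 | evidence up to t+1).
    """
    # Prediction: push the current belief through the transition model.
    prior_present = (belief_present * p_stay_present
                     + (1.0 - belief_present) * p_become_present)
    prior_absent = 1.0 - prior_present

    # Correction: weight by the observation model, then normalize.
    if detected:
        like_present, like_absent = p_detect_if_present, p_detect_if_absent
    else:
        like_present, like_absent = 1.0 - p_detect_if_present, 1.0 - p_detect_if_absent
    unnorm_present = like_present * prior_present
    unnorm_absent = like_absent * prior_absent
    return unnorm_present / (unnorm_present + unnorm_absent)

# Only the latest belief needs to be carried between read cycles.
belief = 0.5
for reading in [True, False, False, True]:
    belief = update_belief(belief, reading)
    print(round(belief, 3))
```

Because the belief P(present) can serve directly as the confidence value, this matches the later remark that DBN is the only non-terminal method, using P(tag present) as its confidence.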

Cost Conscious Cleaning
Problem statement: "Given a collection of cleaning methods, a set of labeled tag cases, and a cost model for each cleaning method, design an efficient cleaning plan that defines the conditions under which each cleaning method or sequence of cleaning methods should be applied."

System Architecture
Offline: labeled data and the available cleaning methods feed cleaning plan induction, which produces a cleaning plan.
Online: the plan is applied to the incoming RFID stream to produce clean data.

Cost Model
Applying a cleaning method to a tag case incurs a cost:
Classification cost: the CPU and storage cost of labeling each tag case.
Amortized per-tuple training cost: the cost of training the cleaning method, e.g., for a DBN we need to learn the transition and observation models.
Error cost: the cost incurred when we make an error in deciding the location of a tag. The error cost can be a scalar, a function of the distance from the correct location to the predicted one, or even the price of the item.

Cleaning Sequence
A cleaning sequence is an ordered application of the cleaning methods in a set M to a set of tag cases D: S_{D,M} = Ms1 -> Ms2 -> ... -> Msk.
Apply Ms1 to the entire data set D, apply Ms2 to the cases that Ms1 failed to classify, and so on, until Msk is applied to the cases that every other method failed to classify.
The cost C(S_{D,M}) of applying a cleaning sequence is the cost of applying each method as described, plus the error cost on tag cases misclassified by every method.
The optimal cleaning sequence S*_{D,M} is the cleaning sequence with minimal cost among all possible cleaning sequences given D and M.

Cleaning Sequence Approximation
Example with three methods, costs C(M1)=1, C(M2)=1.5, C(M3)=0.5, and error cost 5.
Accuracy-adjusted cleaning cost of the sequence S_{D,M} = M1 -> M3 -> M2 (step 1: M1, step 2: M3, step 3: M2):
C(S_{D,M}) = 1 + 0.52*0.5 + 0.43*1.5 + 0.19*5
where 0.52, 0.43, and 0.19 are the fractions of tag cases still unresolved after each step.

Cleaning Plan
Input:
D: a set of labeled tag cases <(EPC, time), (features)>
M: a set of available cleaning methods {M1, M2, ..., Mk}
C: a cost model {C(M1), C(M2), ..., C(Mk), C(Error)}
Output: a decision tree that splits D according to feature values, with a cleaning sequence defined for each leaf of the tree.
Application: for each test case, use the feature values to reach the appropriate leaf and apply the cleaning sequence defined in that leaf.

Available Features
Tag features: communications protocol, manufacturer, price, quality, detection history.
Reader features: number of antennas, protocol, price, vendor.
Location features: type of area being covered (e.g., door, shelf, conveyor belt), interference level (e.g., presence of metal or water).
Item features: type of item, contents, price.

Cleaning Plan Induction
Use a traditional Top Down Induction of Decision Trees (TDIDT) algorithm.
Node splitting criterion: split nodes based on the expected cost reduction, i.e., the cleaning sequence cost before the split minus the weighted average cost of the cleaning sequences after the split.

Example
We use three cleaning methods: fix_1, DBN, and pat.
The label is 1 if the tag is indeed present and 0 otherwise; each method predicts 1 for present and 0 for absent.
The cleaning plan selects when to apply each method, e.g., we should use fix_1 to clean cases from shelf readers when there is metal.
[Figure: example cleaning plan induced from the labeled tag cases, splitting on reader type (shelf vs. door) and on the presence of metal (no vs. yes); the leaf methods are fix_1 (75% accuracy), dbn (100% accuracy), and pat (100% accuracy).]

Example of node splitting
Costs: C(fix_1)=1.0, C(DBN)=2.0, C(pat)=2.0, C(Error)=5.0.
Cleaning sequence for the node D before the split:
DBN -> pat: 2.0 + 0.33*2.0 + 0.11*5.0 = 3.21
Cleaning sequences for the two children after the split:
DBN -> fix_1: 2.0 + 0.16*1.0 = 2.16 (covering 67% of the cases)
pat: 2.0 (covering the remaining 33%)
Cost reduction: 3.21 - (2.16*0.67 + 2.0*0.33) = 1.11
A sketch of this cost computation appears below.
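
The accuracy-adjusted sequence cost and the splitting criterion can be written in a few lines. The Python sketch below reproduces the numbers from the two worked examples above; the function names and the representation of the "still unresolved" fractions are assumptions made for the illustration, not code from the paper.

```python
def sequence_cost(method_costs, remaining_fractions, error_cost):
    """Accuracy-adjusted cost of a cleaning sequence.

    method_costs: per-tuple cost of each method, in application order.
    remaining_fractions: fraction of tag cases still unresolved after
        each method (same length as method_costs); the last entry is
        charged the error cost.
    """
    cost = method_costs[0]                       # first method sees every case
    for frac, c in zip(remaining_fractions, method_costs[1:]):
        cost += frac * c                         # later methods see the leftovers
    return cost + remaining_fractions[-1] * error_cost

def cost_reduction(parent_cost, child_costs, child_weights):
    """Expected cost reduction used as the node-splitting criterion."""
    return parent_cost - sum(w * c for w, c in zip(child_weights, child_costs))

# Sequence M1 -> M3 -> M2 from the approximation example:
print(sequence_cost([1.0, 0.5, 1.5], [0.52, 0.43, 0.19], error_cost=5.0))   # 2.855

# Node-splitting example: the parent uses DBN -> pat; the children use
# DBN -> fix_1 (67% of the cases) and pat alone (33% of the cases).
print(cost_reduction(3.21, [2.16, 2.0], [0.67, 0.33]))                      # ~1.10
```

In the TDIDT induction described above, the feature giving the largest expected cost reduction would be chosen for the split.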

Experimental Setup
A data generator simulates a complex RFID system with multiple locations, readers, and tag types.
The simulation is controlled by several parameters:
Item flow characteristics: paths traversed, predictable vs. random movements, item speed.
Reader, tag, and item characteristics (e.g., protocol, manufacturer, item contents).
Location characteristics and RF noise levels.
The simulation is run for a number of read cycles; at each cycle the probability of tag detection is a function of reader, item, tag, and location characteristics.

Cleaning methods
We compare the performance of several cleaning methods:
DBN: a DBN-based cleaner.
var: adaptive window smoothing.
fix_k: fixed (size k) window smoothing.
Rule-based methods: pat, a pattern recognition method based on the shape of the signal strength, and maj, used to resolve multi-reader detection conflicts by majority voting.
Cost model: C(DBN)=2.0, C(var)=2.0, C(fix_1)=1.0, C(fix_3)=1.4, C(pat)=2.5, C(maj)=1.4, C(Error)=10.
All methods are terminal except for DBN, which uses P(tag present) as its confidence.

Complex Setup I
The setup includes readers in low noise, readers detecting far-away tags, readers with variable detection rates, a reader at a conveyor belt, and readers detecting tags with water and metal.
In some areas multiple readers can detect the same tag, creating conflicts.
[Results figures: Complex Setup I, Complex Setup II, Reader Setup, and Noise Level charts.]

Conclusions
A cost conscious cleaning framework for RFID data can increase accuracy at lower cost than any single cleaning method.
The cleaning plan can be efficiently learned from data by applying the idea of cleaning cost reduction to node splitting.
DBN-based cleaning methods capture the intrinsic dynamic behavior of tag detection and deliver high accuracy at lower cost than smoothing-window techniques.

Thanks
