larsen jsm2003

Published on October 29, 2007

Author: Arley33

Source: authorstream.com

Comparison of Alternative Latent Class Clusterings of Survey Data
Michael D. Larsen
University of Chicago / Iowa State University

Outline
Survey and variables
Latent class models
Comparing clusterings
Some comparisons
Conclusions and future plans

Survey
1997 Survey of Doctoral Recipients
NSF survey conducted every 2 years
1 of 3 surveys in the SESTAT database
Respondents: PhDs awarded 1990-1996
Physical sciences (n=2216), biological sciences (n=1019), engineering (n=516)
Working in higher educational institutions

Variables
Demographics: sex, race, ethnicity, age, etc.
Percent female: biology (49%), physical sciences (33%), engineering (23%)
Several sets of variables on career preparation:
  Limitations on career path job searches
  Work activities
  Job search resources (which were used?)
  Adequacy of PhD program career preparation
Assorted other questions (e.g., postdoc?)

One set of variables example
Adequacy of career preparation
Very adequate vs. somewhat or not adequate
11 areas (2^11 contingency table)
Biology: 3 significant differences, F vs. M
  Communication (F>M): z = 2.73
  Ethics (F>M): z = 2.48
  Computer (M>F): z = -2.58

Why cluster?
Interest in the clusters themselves
Are there identifiable groups?
Are the clusters stable over time?
Are the clusters related to demographic subpopulations?
How do outcomes vary across clusters?

Latent Class Models
G latent classes (subpopulations)
K categorical variables define a contingency table; each person falls in one cell of the table
The observed pattern of responses in the table is a mixture of the response patterns of the latent classes
Within each class, responses to the variables are (conditionally) independent; the response probabilities differ across classes

Latent Class Models, cont.
P(response pattern) = sum over classes of [ P(class) P(response pattern | class) ]
Fit by the EM algorithm (Dempster, Laird, Rubin 1977)
Compute P(class | response pattern)
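The latent class model and its EM fit, as described in the Latent Class Models slides, can be illustrated with a minimal sketch. It assumes binary (0/1) items, like the "very adequate vs. somewhat or not adequate" coding, and uses only the Python standard library. The function name fit_latent_classes, the random starting values, the fixed iteration count, and the clamping of probabilities away from 0 and 1 are illustrative assumptions, not the author's implementation.

```python
import math
import random


def fit_latent_classes(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary (0/1) items by EM (illustrative sketch).

    data: list of response patterns, each a list of K values in {0, 1}.
    Returns (weights, probs, post, loglik):
      weights[g]  = P(class g)
      probs[g][j] = P(item j = 1 | class g)
      post[i][g]  = P(class g | response pattern of person i)
      loglik      = log-likelihood at the parameters used in the final E-step
    """
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values; identical starts would keep the classes identical.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(k)] for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: P(pattern | class) is a product over items because items are
        # conditionally independent within a class; Bayes' rule then gives
        # P(class | pattern). Work on the log scale for numerical stability.
        post, loglik = [], 0.0
        for x in data:
            logs = []
            for g in range(n_classes):
                lp = math.log(weights[g])
                for j in range(k):
                    lp += math.log(probs[g][j] if x[j] == 1 else 1.0 - probs[g][j])
                logs.append(lp)
            top = max(logs)
            terms = [math.exp(v - top) for v in logs]
            total = sum(terms)
            loglik += top + math.log(total)  # log of sum over classes
            post.append([t / total for t in terms])

        # M-step: class sizes and item probabilities are posterior-weighted
        # averages over respondents.
        for g in range(n_classes):
            ng = max(sum(r[g] for r in post), 1e-12)
            weights[g] = ng / n
            for j in range(k):
                num = sum(r[g] * x[j] for r, x in zip(post, data))
                # Keep probabilities strictly inside (0, 1) so logs stay finite.
                probs[g][j] = min(max(num / ng, 1e-6), 1.0 - 1e-6)
    return weights, probs, post, loglik


if __name__ == "__main__":
    # Hypothetical toy data: 200 "high" and 200 "low" responders on 4 items.
    rng = random.Random(1)
    data = [[int(rng.random() < (0.85 if i < 200 else 0.15)) for _ in range(4)]
            for i in range(400)]
    weights, probs, post, loglik = fit_latent_classes(data, n_classes=2)
    print([round(w, 2) for w in weights], round(loglik, 1))
```

The rows of post are the P(class | response pattern) values from the slide; assigning each respondent to the class with the largest posterior gives one clustering that can then be compared with others.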
Comparing clusterings
Different sets of variables will group respondents differently.
Cross tabulations
Adjusted Rand Index (ARI)
  Rand Index (raw form) = number of pairs of respondents placed in the same cluster by both clusterings
  ARI = (Rand - Expected)/(Max - Expected), with the expected value computed under the hypergeometric distribution (cluster sizes held fixed)

Calibrating the ARI (or other)
Simulation
Generate 1000 samples from the hypergeometric distribution, which corresponds to the null hypothesis of no association
Compute the ARI for each of the 1000 samples
Report the proportion of samples with ARI >= ARI observed (the tail area)

A comparison
Biology, Adequacy of Career Preparation
  Communication: ARI = 0.002, tail = 0.015
  Ethics: ARI = 0.039, tail = 0.039
  Computer: ARI = 0.002, tail = 0.021
4 latent classes (interesting patterns): ARI value is lower, tail area is larger

Comments
ARI values are not large (not near 1) for tables with large n
Simulated tail areas are similar to p-values from standard tests
Small ARI values can be significant, in the way that small log odds (near 0) can be significant for large n
Latent classes fit better than simple classifications, but the ARI does not increase

More on comment 4
Two classes (females, males) and CI vs. four latent classes (based on BCI) and CI
The latter fits (much) better
The ARI is not larger than the largest ARI on the individual variables

Future plans
1. Repeat on the next waves (1999, 2001)
2. Additional comparison methods:
  Diversity measures
  Slight modification of the ARI
  Machine Learning, Stats, Discovery, 2003, Marina Meila, U of Washington
3. Missing data (DK, RF, missing)

References
Larsen, Statistics in Transition, 2003
Larsen, submitted to "Retaining Women in Early Academic SMET Careers," 2002, under revision
Hubert and Arabie, 1985, Journal of Classification
NSF, EIA-0089930, ITWF

Contact Information
Mike Larsen, U of Chicago, Statistics
[email protected]
http://galton.uchicago.edu/~larsen/jsm03
Email for contact at Iowa State University, Statistics
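The Adjusted Rand Index and its simulation calibration, described in the "Comparing clusterings" and "Calibrating the ARI" slides, can be sketched as follows. The index follows the Hubert and Arabie form (pairs placed together by both clusterings, adjusted by its hypergeometric expectation). The function names, the choice to draw from the hypergeometric null by randomly permuting one set of labels, and the seed handling are assumptions made for illustration.

```python
import random
from collections import Counter


def pairs(m):
    """Number of unordered pairs that can be formed from m items."""
    return m * (m - 1) // 2


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert-Arabie form) of two clusterings.

    labels_a[i] and labels_b[i] are the clusters of respondent i.
    """
    n = len(labels_a)
    # Raw index: pairs placed in the same cluster by both clusterings
    # (sum over the cells of the cross-tabulation).
    index = sum(pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    # Same-cluster pairs within each clustering (row and column margins).
    sum_a = sum(pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(pairs(c) for c in Counter(labels_b).values())
    # Expected index when the table is hypergeometric (margins held fixed).
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        raise ValueError("ARI undefined: both clusterings are trivial")
    return (index - expected) / (max_index - expected)


def ari_tail_area(labels_a, labels_b, n_sim=1000, seed=0):
    """Observed ARI and simulated tail area P(ARI >= observed) under no association.

    Randomly permuting one labeling keeps both sets of cluster sizes fixed,
    so each shuffled cross-tabulation is a draw from the hypergeometric null.
    """
    rng = random.Random(seed)
    observed = adjusted_rand_index(labels_a, labels_b)
    shuffled = list(labels_b)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if adjusted_rand_index(labels_a, shuffled) >= observed:
            hits += 1
    return observed, hits / n_sim
```

For example, passing each respondent's sex as labels_a and their response on one item (or their latent class assignment) as labels_b returns an observed ARI and a simulated tail area; with the default n_sim=1000 this mirrors the 1000-sample calibration on the slides.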