Published on October 22, 2008
Types of Data
Prof. Dr. Naeema H. Jabr Razzouki

Introduction
Just as we must classify and organize information before it can be retrieved and used, we must classify data into the correct type before we can do any statistical analysis on them.

Important Note: WHY?
The data type will determine:
how data must be coded for analysis, and
what kinds of analysis can be performed.

Types of Data
Nominal (اسمية)
Ordinal (الرتبية)
Interval (الفاصلة)
Ratio (النسبية)

Questionnaire: Example
1. Gender: Female ……… Male ……..
2. Postal code of your home address
3. How would you rate the quality of our services? Poor ……. Fair ……. Average …….. Good ……. Excellent …….
4. How many points have you accumulated from the bonus program?
5. How many e-mail messages did you send yesterday?
6. How old are you?
7. What is your yearly income?
   Under $20,000 ---------------
   $20,000 to $29,999 --------------
   $30,000 to $39,999 --------------
   $40,000 to $49,999 ----------------
   $50,000 to $59,999 --------------
   $60,000 and over ------------------

Nominal Data
The answer to question 1 will be either “Female” or “Male”, but before analyzing the data we must code them by assigning a number to each possible answer. We can code female as 1 and male as 2. This coding is purely symbolic, so only limited calculations can be done on the codes. We cannot calculate the average, (1+2)/2, because no respondent falls between female and male. We can only count the frequency of 1s and 2s to find out how many respondents are female and how many are male. This is what is called nominal data.
Nominal data are obtained when numbers are used to symbolically label categories. The only calculation that can be applied to nominal data is to note the frequency of occurrence of each category.

Another example: Postal Code
Question 2 asks for the postal code to find out in which area the respondent resides. Each area is labeled with a number, just as with gender.
The code, then, cannot be used in meaningful calculations other than simple counting. Therefore, postal codes are nominal data.

Ordinal Data
The third question asks respondents to rate the quality of the service on a five-point scale. We can code the answers as:
1 for poor
2 for fair
3 for average
4 for good
5 for excellent
We can also reverse the coding: 1 for excellent and 5 for poor.

Ordinal data contain information as to better or worse, or greater or less, but they do not tell us how much better or how much greater.
In this case the coding is not arbitrary. We cannot assign 1 to poor, 2 to excellent, and 3 to fair; we must follow an order (from high to low or from low to high) that reflects the rank of quality inherent in the words used. With the first coding, 4 is better than 3 and 5 is better than 4. Still, we cannot say that the difference between 1 and 2 is equal to that between 3 and 4. The numbers are attached simply to show the order, not to show how much better.

Interval Data
Question four introduces the third type of data, interval data. Temperature scales are a familiar example: 25°C is warmer than 20°C by exactly 5°C, and 15°C is colder than 20°C by exactly 5°C. But we cannot say that 20°C is twice as warm as 10°C, because temperature measured in Celsius does not have an absolute zero point. That is, the zero point is arbitrarily chosen, and an object at 0°C is not without heat. Consider Fahrenheit, in which 32°F corresponds to 0°C, 50°F corresponds to 10°C, and 68°F corresponds to 20°C: 68°F is not twice 50°F, so we cannot say that 20°C is twice as warm as 10°C.

Interval Data, then
Interval data provide not only greater-than or less-than information, but also details on how much greater than or less than.
Interval data have no absolute zero point, so we CANNOT use comparisons such as “twice as many” or “half as much” with interval data.

Question No. 4
How many points have you accumulated from the bonus program?
Let us say: the Internet service provider gives every participant 2,000 points at the beginning and 100 points for each dollar the user pays for the service. Users can claim various prizes based on the total points accumulated. Thus a user with 3,000 points has 500 more points than a user with 2,500.
However, this point system lacks an absolute zero. WHY? Because a user with 10,000 points has not earned twice as many as a user with 5,000. Look at the difference:
10,000 - 2,000 = 8,000
5,000 - 2,000 = 3,000
Thus 8,000 earned points is more than twice 3,000, and zero points means no participation in the program rather than no payment.

Ratio Data
As with questions 5 and 6:
How many e-mail messages did you send yesterday?
How old are you?
The answers to both questions provide details about how much greater or less, and they have an absolute zero point: 0 messages means no messages.
Ratio data provide not only greater-than or less-than information, but also details on how much greater than or less than. In addition, there is an absolute and non-arbitrary zero point, so we can use comparisons such as “twice as many.”

Data Conversion
What about question 7, regarding yearly income?
You may think of an absolute zero point for income: no income at all.
You may think of comparisons: $40,000 is twice $20,000.
Then you might decide that incomes are ratio data.
BUT THEY ARE NOT!

How?
Respondents are asked to place themselves into one of six income categories, so we can code these categories from 1 to 6, with 1 for the lowest and 6 for the highest. If you are in the fourth category and Ali is in the second, we know that your income is higher than Ali's, but we do not know exactly how much higher; we cannot say that it is twice as much.
Therefore we are collecting Ordinal Data.
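
The worked examples in these slides can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original presentation. The sample gender answers and the names gender_codes, celsius_to_fahrenheit, dollars_paid and income_category are assumptions made for the example. The 2,000 starting points, the 100 points per dollar and the six income brackets come from the slides.

```python
from collections import Counter

# Nominal: the codes are only labels, so counting is the one meaningful summary.
gender_codes = [1, 2, 2, 1, 1]  # hypothetical answers: 1 = female, 2 = male
gender_counts = Counter(gender_codes)

# Interval: the same two temperatures give different ratios in Celsius and
# Fahrenheit, so "twice as warm" has no fixed meaning.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Bonus points: every participant starts at 2,000 points, so the starting
# balance must be subtracted before amounts paid can be compared.
START_POINTS = 2000
POINTS_PER_DOLLAR = 100

def dollars_paid(points):
    return (points - START_POINTS) / POINTS_PER_DOLLAR

# Data conversion: placing an income into one of the six brackets keeps the
# order of incomes but discards the exact amounts.
BRACKET_LOWER_BOUNDS = [0, 20_000, 30_000, 40_000, 50_000, 60_000]

def income_category(income):
    code = 0
    for lower_bound in BRACKET_LOWER_BOUNDS:
        if income >= lower_bound:
            code += 1
    return code
```

With these definitions, ratio_celsius is 2 while ratio_fahrenheit is 68/50, and dollars_paid(10000) is 80 against 30 for dollars_paid(5000), which is more than twice as much. Incomes of 45,000 and 25,000 are coded 4 and 2, yet the first income is not twice the second, which is why the bracketed answers are ordinal.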