Published on September 13, 2017
Chapter 1
Role of Biostatistics in Health Sciences

To understand God’s thoughts we must study statistics, for these are the measure of His purpose.
Florence Nightingale

Learning Objectives
• To define statistics and its purpose
• To know the areas of applications and common misuses of statistics

1.0 INTRODUCTION

Statistics is an important and fascinating subject with growing applications in most academic disciplines and areas of life. It plays a key role in the developments that have shaped modern life and is fundamental to the fields of Medicine, Engineering, Technology, Bioinformatics, Economics and Finance. Because of its widespread use, it is important to comprehend the basics of statistical reasoning, its principles, methods and practices. This chapter aims to introduce the discipline of statistics and the elements it encompasses.

Lord Kelvin said, ‘When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.’ This adage brings out the fact that the extent of our knowledge about something depends on our ability to measure it and express it in numbers. Numbers play a definitive role in understanding anything of interest, be it daily temperature readings, price indices or health indicators. Without numbers, all we know comes from experiences, stories, views and perspectives. For instance, a description of a population as large, poor and unhealthy may be adequate for a layperson, but it is of little use to a policy maker or an administrator. He/she needs to know how many people live there and how many have a particular disease in order to plan and allocate resources for the needed actions.

The unique characteristic of numbers is that they not only describe characteristics precisely and concisely but also express the variation between the
units under investigation. For instance, a public health specialist concerned with the frequency of low birthweight among infants of rural women would determine its incidence by measuring the birthweight of infants born to women from that particular region. Suppose the data for the first four infants are 2000, 2100, 2900 and 2800. Note that the numbers not only express the birthweight of each infant precisely but also express the differences in birthweight between the infants in the group. Any characteristic, be it birthweight, blood pressure, cholesterol, blood group or eye colour, tends to vary (even if slightly) from person to person. This means that the things of interest (women in the region) will not necessarily be identical with respect to the characteristic being described (infant’s birthweight). This variability is omnipresent, and it must be taken into account whenever we describe, explain and interpret given information or data. The concept of variability is absolutely essential and forms the basis of statistical reasoning.

1.1 WHAT ARE STATISTICS?

‘People who drink alcohol are almost twice as likely to have automobile accidents as people who do not drink alcohol.’
‘Use of antibiotics reduces the duration of infections by 2 days.’
‘The number of accidents recorded in the southern region during the last year was reported to be 40.’
‘Two-thirds of the adults are obese or overweight.’

The above statements could be facts or fallacies, but they are familiarly called statistics and quoted in many forums. Statistics is quite an unusual word. To a layperson, it often means a collection of related facts, like the above statements. But as a discipline, it is defined as a method of collecting, organizing, analysing and interpreting numerical data. Broadly, statistics can be used for two major purposes: descriptive and inferential.
If statistics is used to describe or summarize the data, it is called descriptive statistics. Typically, descriptive statistics are used to describe and provide statistical information about the collected data. An example of descriptive statistics is the decennial census of India, in which all residents are asked to provide information such as age, sex, religion, educational status, occupation and marital status. The data obtained in the census can be compiled, summarized and arranged in tables and graphs to provide a description of the characteristics of the population at a given time. This information can further be used for administrative and organizational purposes.

However, descriptive statistics alone cannot address many of the problems that researchers encounter. Most often, researchers are interested in generalizing their results to the population. In other words, with a relatively small sample the researcher would like to make statements about the much larger population. For instance, if a drug is found to be effective in a sample, researchers want to extrapolate this finding to the population. Statistical methods used to draw conclusions about the whole population from a sample are called inferential statistics.
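The distinction between the two purposes can be illustrated with the four birthweights quoted in the introduction. The following is only a minimal sketch in Python, not a method prescribed by the text. It assumes the values are in grams (the text gives no unit), that the four infants are a random sample from an approximately normal population, and that a t-based interval is an acceptable inferential tool here. The function names describe and mean_confidence_interval are hypothetical.

```python
import math
import statistics

# Birthweights of the first four infants from the introduction.
# Assumption: the unit is grams (the text does not state it).
birthweights = [2000, 2100, 2900, 2800]


def describe(values):
    """Descriptive statistics: summarize the observed sample itself."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


def mean_confidence_interval(values, t_critical):
    """Inferential statistics: interval estimate for the population mean.

    t_critical must correspond to n - 1 degrees of freedom and the
    chosen confidence level; it is supplied by the caller.
    """
    n = len(values)
    centre = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical * standard_error
    return centre - margin, centre + margin


summary = describe(birthweights)

# 3.182 is the two-sided 95% critical value of Student's t
# distribution with 3 degrees of freedom.
ci_low, ci_high = mean_confidence_interval(birthweights, t_critical=3.182)

print(summary)
print(f"95% CI for the population mean: {ci_low:.0f} to {ci_high:.0f}")
```

The summary describes only these four infants, whereas the interval is a statement about the wider population they are assumed to represent. With so few observations the interval is very wide, which is itself a reminder of why sample size matters for inference.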
1.2 WHY STUDY STATISTICS?

Firstly, since statistical methods have become the basis for research in many different fields, it has become essential for students and researchers to learn the basic principles of statistics in order to read, understand and critically evaluate the research studies in their fields of interest. Secondly, for anyone who performs a research study, a clear understanding of statistical procedures is a prerequisite. Thirdly, statistical reasoning and thinking can help us become better consumers and citizens by enabling us to understand and evaluate the statistical claims we face in our daily lives.

1.3 PRACTICAL APPLICATIONS

It is interesting to note that the use of statistics is not new; governments have traditionally collected information about births, deaths, imports, exports, etc. In earlier times, statistics was mainly considered ‘a science concerned with information that was important to the state’. It was Karl Pearson (1857–1936) who changed this perspective and developed a philosophy of statistical reasoning. He was one of the founders of the twentieth-century science of statistics, and one of his chief contributions to statistics was the widely used chi-square test. During the same period, R.A. Fisher (1890–1962) developed the theoretical frameworks for statistical inference, a method for generalizing results from a sample to a population. He also made significant contributions to the theory of experimental study designs and formulated the principles of randomization, replication and confounding. In the last few decades, the use of statistical methods has increased markedly in the fields of biology and social sciences. Such methods have also proven useful in various branches of the physical sciences and engineering. Here, we list a few significant applications of the principles of statistics.
1.4 STATISTICS IN MEDICAL RESEARCH

Statistical methods are used extensively in biomedical research. A brief glance through any medical journal will show the role of statistical methods in modern medical research. Questions about risk factors for heart disease, cancer, AIDS, etc., treatment efficacy and diagnostic accuracy are usually the most explored. The t-test, chi-square test, ANOVA and regression models have become some of the most commonly applied statistical methods in medical research. At the very least, most research papers report a ‘p-value’ to highlight the significance of the results.

In recent years, many newer methods have found applications in medical research. For instance, the bootstrap, the Gibbs sampler, neural networks (NN) and Bayesian approaches have attracted many researchers, who have implemented them in several biomedical applications. In addition, the availability
of large molecular databases allows a scientist to plan an experiment and immediately obtain the relevant data from the available databases. This is an area in which statisticians are making very important contributions.

In fact, biostatistics has emerged as a distinct discipline to deal with the growing application of classical statistical methodology. Biostatistics is ‘concerned with the design and analysis of data derived from biological, biomedical and health-related studies’. It draws on a wide range of disciplines such as probability, clinical trials, experimental design, informatics and computing to resolve biomedical problems.

1.5 STATISTICS IN EPIDEMIOLOGICAL RESEARCH

Epidemiology addresses the frequency, determinants, distribution and control of disease in human populations. Biostatistical methods and techniques have played an integral role in the development of epidemiology, and the collaboration between statisticians and epidemiologists has led to advancements in both fields. For instance, epidemiological studies produce large amounts of data that are binary in nature, and the analysis of such data led to distinct statistical methods for categorical data. Likewise, data from case–control studies could not practicably be analysed with prospective analytic strategies. Statisticians Jerome Cornfield and Nathan Mantel provided a framework and a rationale for valid inference based on case–control data.

The statistical analysis of the geographical distribution of the incidence of disease and its relationship to potential risk factors has an important role to play in various kinds of public health and epidemiological studies. This general area is referred to as ‘spatial epidemiology’, and statistical methods have been applied increasingly in (a) disease mapping, (b) disease clustering studies and (c) ecological studies.
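To give a flavour of the categorical-data methods mentioned above for case–control studies, the sketch below computes an odds ratio from a 2 × 2 table, together with a large-sample confidence interval based on the logarithm of the odds ratio. It is a generic Python illustration and is not presented as the specific procedure of Cornfield or Mantel. The counts are hypothetical, invented purely for the example, and the function names are assumptions.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds among controls."""
    return (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Approximate CI using the normal approximation on the log scale.

    z = 1.96 gives an approximate 95% interval. All four cell counts
    must be non-zero for this approximation to be defined.
    """
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    standard_error = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
estimate = odds_ratio(40, 60, 20, 80)
low, high = odds_ratio_ci(40, 60, 20, 80)
print(f"Odds ratio: {estimate:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

The odds ratio is used because a case–control design fixes the numbers of cases and controls in advance, so disease risk cannot be estimated directly from the table.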
1.6 STATISTICS IN PUBLIC HEALTH AND POLICY

Statistical methods provide the ability to forecast trends in data, and thereby assist in making public health decisions. Forecasting is usually done with the help of mathematical and regression models. For instance, forecasting models were used to project the future burden of the AIDS epidemic. The formulation of policies and programmes requires reliable, robust and adequate data. For instance, health policies and programmes based on health indicators and priority indicators will receive more attention. Statistics plays a major role in processing data relating to a variety of fields to assist the government in planning and commissioning.

1.7 STATISTICS IN SOCIAL SCIENCES

In recent years, statistics has found popular applications in social sciences such as economics, political science, law, sociology and
anthropology. Statistical tools like time series analysis, index numbers, correlation and regression analyses, estimation and hypothesis testing are widely used in studying economic problems such as wages, prices, production and the distribution of income and wealth. In addition, advanced statistical methods like multilevel models, structural equation modelling and survey methods have been used extensively in the various branches of the social sciences. In recognition of this enormous growth, a distinct discipline called Econometrics was developed with the aim of applying statistical methods to the study and elucidation of economic principles.

1.8 MYTHS AND MISUSES

It is interesting that the general view of statistics falls into two extremes. At one extreme, it is commonly believed that anyone with school arithmetic is equipped to do statistical analysis. Those who hold this view often fail to understand that modern statistics is a highly developed technical field which makes wide use of mathematics and computer science. Although one can use school arithmetic skills to understand and critically evaluate statistical claims, it would be wrong to assume that these skills are enough to do statistics efficiently. At the other extreme, it is a commonly held view that statistics is a difficult subject with abstract ideas, complex formulae and lengthy calculations. Hence, efforts to learn statistics are often approached with the prejudice that it is an abstract discipline. Both of these views are largely mistaken.

Like all professional fields, statistics has certain sharp limitations and unsound applications. Some of the common misuses are as follows:
(1) Faulty generalizations, where one tries to generalize the findings from a small and biased sample to a larger context.
(2) Using statistical methods to make unwarranted causal statements.
(3) Use of means when medians are more appropriate, and reporting only the means without referring to the measures of variability.
(4) Use of percentages without reference to the total number in the study.
(5) The most serious of all misuses relate to significance tests. It has become a common practice to run several different tests in an attempt to find statistical significance. This overreliance on statistical significance and p-values has resulted in many misuses and manipulations of statistical methods.

The availability of computer programs has greatly contributed to this misuse: it has become very easy to plug numbers into these programs and get results without knowing the basics of how the tests work, their assumptions and their appropriateness. Researchers who understand the underlying principles of these methods should have fewer problems in using these computer programs.
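Misuses (3), (4) and (5) can be made concrete with a little arithmetic. The following Python sketch is illustrative only. The length-of-stay values are hypothetical, the helper names are assumptions, and the calculation for (5) assumes that the tests are independent and that every null hypothesis is true, which is a simplification of real analyses.

```python
import statistics

# Misuse (3): the mean can mislead when the data are skewed.
# Hypothetical lengths of hospital stay in days, with one very long stay.
stay_days = [2, 3, 3, 4, 4, 5, 60]
mean_stay = statistics.mean(stay_days)      # pulled upwards by the outlier
median_stay = statistics.median(stay_days)  # closer to a typical stay
sd_stay = statistics.stdev(stay_days)       # variability should be reported too


# Misuse (4): a percentage means little without its denominator.
def percent_with_counts(events, total):
    """Format a percentage together with the counts behind it."""
    return f"{100 * events / total:.0f}% ({events}/{total})"


# Misuse (5): running many tests inflates the chance of a false positive.
def familywise_error_rate(num_tests, alpha=0.05):
    """P(at least one false positive) for independent tests when every
    null hypothesis is true and each test uses significance level alpha."""
    return 1 - (1 - alpha) ** num_tests


print(f"mean = {mean_stay:.1f} days, median = {median_stay} days, "
      f"SD = {sd_stay:.1f} days")
print(percent_with_counts(2, 3), "versus", percent_with_counts(200, 300))
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: P(at least one false positive) = "
          f"{familywise_error_rate(k):.2f}")
```

Both percentages in the second line round to 67%, yet they rest on very different amounts of evidence. Under the stated assumptions, ten tests at the 5% level already carry roughly a 40% chance of at least one spurious ‘significant’ result.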
1.9 ROLE OF STATISTICIANS

Statisticians play a key role in the design and analysis of research studies. Although the role of statisticians at the analysis stage is well understood, the importance of consulting a statistician at the design stage is often overlooked. It is important to understand that if the statistical design and data collection are flawed, the statistician cannot do much to salvage the study. So, the ideal time to consult a statistician is right at the beginning of the research study.

Statisticians play a significant role in the design of studies by refining the research question and formulating workable hypotheses, choosing an appropriate and feasible study design, and determining the required sample size. In the analysis, a statistician assists in selecting and implementing the most effective data analysis methods. In the current setting, the ability to run t-tests, ANOVA and regression analyses is not a major contribution, as there are many readily available statistical software packages that anyone can use. However, current biomedical research involves data from study designs that violate many of the assumptions of traditional statistical analysis. Correlated data from longitudinal and clustered study designs are commonplace. Distributional assumptions and missing data patterns must be carefully scrutinized before an appropriate data analysis method can be chosen. In these scenarios, only a professionally trained biostatistician can evaluate these issues and choose a correct technique to adequately handle the specific problems in a research project.

Finally, statisticians also play an important role in interpreting the findings of the analysis to reach valid conclusions and in preparing study results for both presentations and papers.