29/10/57

Basic Concepts of Medical Statistics

Pawin Numthavaj M.D.
Section for Clinical Epidemiology and Biostatistics
Faculty of Medicine Ramathibodi Hospital
Mahidol University

Outline
• Types of data
• Descriptive statistics
• Inferential statistics
• Statistical software
• How to use the Stata program

Types of Data

Types of Data
  Categorical (Qualitative)
    Nominal
    Ordinal
  Numerical (Quantitative)
    Discrete
    Continuous

Categorical data
• Data whose possible values are “categories”
1. Nominal
  • The categories have no ‘order’
  • Examples:
    • Blood group: A / B / AB / O
    • Sex: Male / Female
    • Cancer status: Have cancer / Do not have cancer
  • Special case of two categories: binary (dichotomous) data
2. Ordinal
  • The categories have an ‘order’: we can see which is bigger or smaller
  • Examples:
    • Cancer staging: I / II / III / IV
    • Degree of disease: Mild / Moderate / Severe

Numerical data
• Data whose possible values are numbers
1. Discrete
  • Values move in steps
  • Usually without fractions
  • Examples:
    • Number of positive lymph nodes after surgery
    • Number of heart beats per minute
2. Continuous
  • Values can include fractions
  • Examples:
    • Age (years)
    • Cholesterol level (mg/dL)

Types of data
• The type of data matters when choosing the right statistical analysis for:
  • Summarizing
  • Estimation
  • Hypothesis testing

Post test for types of data

  Data                               Type of data
  Weight (kg)                        Numerical
  History of smoking (Y/N)           Categorical (dichotomous)
  Serum potassium level              Numerical
  Length of stay in hospital (day)   Numerical
  Patient status (dead/alive)        Categorical (dichotomous)

Descriptive statistics
• Summaries of data
• Methods for describing a set of data by using graphs or summary measures
• Provide an ‘overview’ of the general features of a set of data

Descriptive statistics

Summarizing of Data
  Categorical Data
    - Frequency
    - Percentage
  Numerical Data
    - Measures of central tendency
    - Measures of dispersion

Summarizing categorical data
• Frequency
• Percentage

Summarizing Categorical Data
• Ex1. Sex among 70 patients

  Sex      Frequency   Percentage
  Male     56          80
  Female   14          20
  Total    70          100

Summarizing Categorical Data
• Ex2. Staging of cancers (percentages for the individual stages are left blank on the slide)

  Stage of cancer   Frequency   Percentage
  I                 120
  II                320
  III               160
  IV                200
  Total             800         100.0

Statistical Software and Stata Program

Statistical Software

SPSS
  Website: http://www.ibm.com/software/analytics/spss/
  Price: $$$$$
  Features: ++++
  Ease of use: ++++
  Note: Need to purchase separate modules for complicated analyses (such as Survival Analysis). Available from MU (http://softwaredownload.mahidol/)

Stata
  Website: http://www.stata.com/
  Price: $$$$
  Features: ++++
  Ease of use: +++
  Note: Ramathibodi access (CEB server)

R
  Website: http://www.r-project.org/
  Price: (Free)
  Features: +++
  Ease of use: +
  Note: R-commander is a nice add-on

SAS
  Website: http://www.sas.com/
  Price: $$$$$
  Features: ++++
  Ease of use: 0
  Note: Needs programming skill

Stata window
• Console
• Command box
• Review pane
• Variables
• Properties

Stata commands
• Type in a Stata command and then see the result
• Most commands also have an equivalent menu and dialog
• Beware of UPPER and lower case: Stata is case-sensitive
• Beware of commas, colons, and spaces

Let’s use Stata as a calculator
• Calculate the sum of 23, 25, 29, 30

    display 23+25+29+30
    help display
    di 23+25+29+30
    di sqrt(400) + (12^3)

Let’s load data into Stata
• File – Import
• Check “Import first row as variable names”

Data Editor (Edit/Browse)

Description of Data
• Codebook
  • Display information about the data

    codebook

• Missing values in Stata

    misstable summarize

List data
• List
  • List variables

    list <variable> <variable> …

• List if
  • List variables for observations that meet a condition

    list <variable> <variable> … if <condition>

• Bysort
  • Run the command after the colon separately for each value of the first variable

    bysort <variable>: list <variable> <variable> …

• Examples:

    list hn age sex
    list hn age sex if revision == 1
    bysort sex: list hn age surgerydate

Summarize Categorical Data
• Tabulate
  • Display a table of a variable with frequency and percentage

    tab <variable>

• Examples:

    tab sex
    bysort revision: tab sex

Summarizing Numerical Data

Summarizing Numerical Data
  Measures of Central Tendency
    Mean
    Median
    Mode
  Measures of Dispersion
    Standard Deviation
    Range

Summarized Data

    sum age, detail
    sum age if revision == 1, detail
    bysort sex: sum age, detail

Distribution of Data
• Normal distribution
  • Central tendency: use mean
  • Dispersion: use standard deviation
• Non-normal distribution
  • Central tendency: use median
  • Dispersion: use range

Skewness

Checking for normal distribution
• Histogram
• Normal probability plot
• Compare mean and median
• Compare mean and standard deviation

1. Histogram

    histogram <variable>
    histogram age

[Figure: histogram of CD4 count; x-axis: CD4 count, y-axis: Frequency]

2. Normal probability plot

[Figure: normal probability plot of CD4 count; x-axis: Empirical P[i] = i/(N+1), y-axis: Normal F[(cd4c-m)/s]]

3. Compare mean and median
• Mean 62.4
• Median 30.5
• Distribution of CD4 data is skewed to the right (mean > median)

[Figure: histogram of CD4 count; x-axis: CD4 count, y-axis: Frequency]

4. Compare mean and standard deviation

                Mean    SD     Mean±2SD
  Height (cm)   164.7   7.8    149.1, 180.3
  CD4 count     62.4    74.4   -86.4, 211.2

Summarizing Numerical Data

                Mean (SD)
  Age (year)    49.6 (14.3)
  Weight (kg)   95.6 (21.7)
  Height (cm)   171.5 (9.2)
  BMI           32.5 (7.1)
  CD4           62.4 (74.4)

Dummy table

  Characteristics                                  N (%)
  Gender
    Male
    Female
  Age; years; Mean (SD)
  Preoperative clinical score; Median (Range)
  Preoperative function score; Median (Range)
  Postoperative clinical score; Median (Range)
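Checks 3 and 4 for normal distribution (compare mean and median; compare mean and standard deviation) come down to simple arithmetic, which can be sketched as below. This is an illustrative Python sketch, not Stata, and not part of the original slides. The function names and the lower bound of 0 are assumptions made for the example; the figures are the height and CD4 values quoted in the slides. The underlying idea is that a variable which cannot be negative, such as a CD4 count, is unlikely to be normally distributed if mean − 2SD falls below zero.

```python
def mean_pm_2sd(mean, sd):
    """Return (mean - 2*SD, mean + 2*SD), rounded to one decimal place."""
    return (round(mean - 2 * sd, 1), round(mean + 2 * sd, 1))


def interval_crosses_bound(mean, sd, lower_bound=0.0):
    """True if mean - 2*SD falls below the smallest possible value.

    lower_bound=0.0 is an assumption that suits counts and physical
    measurements, which cannot be negative.
    """
    return mean - 2 * sd < lower_bound


def skew_direction(mean, median):
    """Rough hint only: mean > median suggests right skew, mean < median left skew."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"


# Values taken from the slides.
print("Height:", mean_pm_2sd(164.7, 7.8), interval_crosses_bound(164.7, 7.8))
print("CD4:   ", mean_pm_2sd(62.4, 74.4), interval_crosses_bound(62.4, 74.4))
print("CD4 skew hint:", skew_direction(62.4, 30.5))
```

These checks are only screening hints. The histogram and normal probability plot remain the more direct way to judge the shape of a distribution.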