1. Introduction
1. Overview
- A common and important goal of statistics : To learn about a large group from data on some of its members (i.e. a sample)
- Data are observations that have been collected
- Population : the complete collection of all elements to be studied (the collection includes all subjects to be studied)
- Sample : a subcollection of members selected from a population
- Focus in this course : To learn how to use sample data to form conclusions about a population
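A minimal sketch of the idea above, assuming a made-up population of body weights: the population mean is normally unknown, and the mean of a small random sample is used to learn about it. The names `population` and `sample`, the sizes, and the weight distribution are all hypothetical choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: body weights (kg) of 10,000 people (made-up data).
population = [random.gauss(65, 10) for _ in range(10_000)]

# Sample: 100 members selected at random from the population.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"population mean: {population_mean:.2f} kg")
print(f"sample mean:     {sample_mean:.2f} kg")
```

The two means are close but not identical; the sample mean only describes the 100 selected members and serves as an estimate for the whole group.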
2. Types of data
- Parameter and statistic
- Parameter : a measurement describing some characteristic of a population
- Statistic : a measurement describing some characteristic of a sample
- A common way of classifying data
- Quantitative vs. Qualitative data
* Quantitative data : consist of numbers representing counts or measurements (ex. 52kg)
* Qualitative (or categorical or attribute) data : can be distinguished by some non-numeric characteristic (ex. overweight)
- Discrete vs. continuous data (both are types of quantitative data)
* Discrete data : the number of possible values is either finite or countable (counted)
* Continuous (numerical) data : infinitely many possible values on a continuous scale with no gaps (measured)
- Another common way of classifying data : to use four levels of measurement
- Nominal : categories only, no ordering (ex. colors)
- Ordinal : categories with a meaningful order (ex. course grades)
- Interval : the difference between any two data values is meaningful, but there is no natural zero starting point (ex. temperatures in ℃, years)
- Ratio : interval level with a natural zero starting point (ex. weights, where 0kg represents no weight; ages) // zero represents none of the quantity
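The difference between the interval and ratio levels can be checked with a little arithmetic. The sketch below uses illustrative values (10℃, 20℃, 40kg, 80kg) that are not from the notes: differences are meaningful on both levels, but ratios are meaningful only when zero means "none of the quantity".

```python
# Interval level: Celsius has no natural zero, so ratios are not meaningful.
c_low, c_high = 10.0, 20.0
diff_c = c_high - c_low        # meaningful: 20℃ is 10 degrees warmer than 10℃
naive_ratio = c_high / c_low   # 2.0, but 20℃ is NOT "twice as hot" as 10℃

# The same temperatures in kelvin, a scale that does have a natural zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low  # about 1.035, far from 2

# Ratio level: weight has a natural zero (0kg represents no weight).
w_a, w_b = 40.0, 80.0
weight_ratio = w_b / w_a       # 2.0: 80kg really is twice as heavy as 40kg

print(f"Celsius difference: {diff_c}")
print(f"Celsius 'ratio':    {naive_ratio}")
print(f"Kelvin ratio:       {kelvin_ratio:.3f}")
print(f"Weight ratio:       {weight_ratio}")
```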
3. Design of experiments
- Two distinct sources for obtaining data
- Observational study : observe and measure characteristics without attempting to modify the subjects
(Different types of observational studies)
* Cross-sectional study : data are collected at one point in time
* Retrospective study : data are collected from the past
* Prospective (or cohort) study : data are collected in the future from groups sharing common factors
- Experiment : apply some treatment (ex. clinical trial)
- Results of experiments are sometimes ruined because of confounding
(Confounding) occurs when the effects of variables are mixed so that the individual effects of the variables cannot be identified (i.e., confusion of variable effects)
- Thus, try to plan the experiment so that confounding does not occur
(Three key issues in design of experiment)
* Issue 1 : Controlling Effects of Variables
- Blinding : a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo
- single-blind : the subjects don't know whether they are getting the treatment or a placebo
- double-blind : neither the subjects nor the experimenters know who is receiving a particular treatment
- Block : a group of subjects that are known to be similar in the ways that might affect the outcome of the experiment
- Randomization : assign subjects to treatment groups through a process of random selection
- Completely randomized design : treatments are assigned to the subjects by using a completely random assignment process
- Randomized block design : use randomization to assign subjects to treatments separately within each block
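The two designs above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the 12 subjects, the blocking variable `sex`, the treatment labels, and the helper names `completely_randomized` and `randomized_block` are all hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical subjects; "sex" is the blocking variable (an assumption).
subjects = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(12)]

def completely_randomized(subjects, treatments=("treatment", "placebo")):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = random.sample(subjects, len(subjects))
    return {s["id"]: treatments[i % len(treatments)]
            for i, s in enumerate(shuffled)}

def randomized_block(subjects, block_key, treatments=("treatment", "placebo")):
    """Randomize separately within each block of similar subjects."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(s[block_key], []).append(s)
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments))
    return assignment

crd = completely_randomized(subjects)
rbd = randomized_block(subjects, "sex")

for s in subjects:
    print(s["id"], s["sex"], "CRD:", crd[s["id"]], "| RBD:", rbd[s["id"]])
```

In the completely randomized design the treatment group may end up with an uneven mix of the two sexes by chance. In the randomized block design each block is split evenly across the treatments, so the blocking variable cannot be confounded with the treatment.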
* Issue 2 : Sample size & replication
- Sample size : use a sample large enough that the true nature of any effects can be seen, and obtain the sample using an appropriate method
- Replication : repetition or duplication of an experiment for reliability (odd repetition is better than even)
* Issue 3 : Randomization & sampling strategies
(Sampling method)
- Random sampling : each individual member has the same chance of being selected
- Simple random sampling : every possible sample of the same size 'n' has the same chance of being chosen
- Systematic sampling : randomly select a starting point and then select every k-th element in the population
- Stratified sampling : subdivide the population into at least two different subgroups (strata) whose members share the same characteristics, then draw a sample from each subgroup
- Cluster sampling : divide the population area into sections (clusters), randomly select some of those clusters, then include all members of the selected clusters
(Sampling error) the difference between a sample result and the true population result (∵ a sample is only a subset of the population)
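The sampling methods and the sampling error described above can be sketched as below. The population of 1,000 members, the `region` field (used both as stratum and as cluster), and all function names are hypothetical and chosen only for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 members in 10 regions of 100 (made-up data).
population = [
    {"id": i, "region": f"R{i // 100}", "value": random.gauss(50, 8)}
    for i in range(1000)
]

def simple_random_sample(pop, n):
    """Every possible sample of size n has the same chance of being chosen."""
    return random.sample(pop, n)

def systematic_sample(pop, k):
    """Random starting point, then every k-th element."""
    start = random.randrange(k)
    return pop[start::k]

def stratified_sample(pop, key, n_per_stratum):
    """Draw a random sample from every subgroup (stratum)."""
    strata = {}
    for member in pop:
        strata.setdefault(member[key], []).append(member)
    chosen = []
    for members in strata.values():
        chosen.extend(random.sample(members, n_per_stratum))
    return chosen

def cluster_sample(pop, key, n_clusters):
    """Randomly select whole clusters and keep all of their members."""
    clusters = sorted({member[key] for member in pop})
    selected = set(random.sample(clusters, n_clusters))
    return [member for member in pop if member[key] in selected]

def mean_value(members):
    return statistics.mean(member["value"] for member in members)

true_mean = mean_value(population)  # the population result
samples = {
    "simple random": simple_random_sample(population, 100),
    "systematic": systematic_sample(population, 10),
    "stratified": stratified_sample(population, "region", 10),
    "cluster": cluster_sample(population, "region", 2),
}

# Sampling error: sample result minus the true population result.
for name, members in samples.items():
    error = mean_value(members) - true_mean
    print(f"{name:14s} n={len(members):4d}  sampling error={error:+.3f}")
```

Note the difference between the last two methods: stratified sampling takes a few members from every region, while cluster sampling takes every member from a few regions.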
Summary
- Goal of statistics : to learn about a population (all subjects) from a sample (part of the population)
- Types of data
* Parameter (from population) & statistic (from sample)
* Common way of classifying data : Quantitative (counts or measurements; discrete or continuous) vs. Qualitative (non-numeric)
* Another common way of classifying data : nominal (categories, no ordering) / ordinal (categories, ordering) / interval (differences between values are meaningful, no natural zero starting point) / ratio (interval with a natural zero starting point)
- Design of experiments
* Two distinct sources for obtaining data : observational study (no modification of subjects) and experiment (apply treatment)
- Different types of observational studies : cross-sectional(at one point in time) / retrospective(past) / prospective(future)
- Results of experiments are sometimes ruined because of confounding
* Three key issues in design of experiments : controlling effects of variables, sample size & replication, and randomization & sampling methods
- controlling effects of variables : blinding (single or double) / blocks / randomization (completely randomized or randomized block design)
- sampling method : random sampling, simple random sampling(size n), systematic sampling, stratified sampling, cluster sampling