% for survey
%
\documentclass[12pt]{article}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{med_headings}
\usepackage{natbib}% To change the citation style, modify this line
\bibliographystyle{abbrvnat}
\newcommand{\beq}{\[}
\newcommand{\eeq}{\]}
\newcommand{\mod}{\mbox{mod}}
% \pagestyle{empty}
\oddsidemargin=-0.10in
\textwidth=6.2in
\topmargin=-0.75in
\textheight=10.7in
\begin{document}
\begin{center}
{\large \bf \mbox{Answers to questions raised by our statistics survey} }
%{\sc Handout E %}\\
\medskip
\end{center}
\noindent\rule{\textwidth}{1pt}
%
\section*{How well calibrated are your estimates of uncertainty?}

When looking at the responses to questions 3 and 4, we said
{\em `Notice that more points are below the line ($y=x$) than above.
Is the difference between the numbers below and above the line significant?
(There are 16 above the line and 29 below. 4 are on the line.)'}

Here are answers to this question using sampling theory and using Bayesian inference.

Let's take as our null hypothesis, $H_0$, the idea that the expected numbers above and below the line are equal.
The alternative hypothesis, $H_1$, says that the expected numbers are not equal;
perhaps the alternative in fact specifies `I expect the mothers to have more children than the aunts'.

\subsection{Sampling theory answer}
In the sampling theory approach, we compute a quantity called a `p-value'.
We ask the question `assuming that the null hypothesis is true, how probable is it that we would get an outcome {\em as extreme as, or more extreme than,} the outcome we actually observed?'
[Many criticisms can be made of the use of p-values (see my book for some examples), but they are widely used, so let's compute a p-value for this problem.]

Let's model the situation as $N=45$ tosses of a coin (ignoring the 4 points on the line), in which we observed $r=16$ heads;
we define the `as extreme or more extreme' outcomes to be the set $r \in \{0, 1, \ldots, 15, 16\}$.
Under the null hypothesis these outcomes have total probability
\[
 \sum_{r=0}^{16} {N\choose r} \left(\frac{1}{2}\right)^{N} = 0.036 .
\]
This sum can be computed exactly on the computer, or approximated using the normal (Gaussian) approximation to the binomial distribution, which has mean $N/2 = 22.5$ and variance $N/4 = 45/4$ (standard deviation $\simeq 3.35$).
The result $r=16$ lies $6.5/3.35 \simeq 1.9$ standard deviations below the mean: not quite two standard deviations.
This probability is the `p-value': {\bf{0.036}}.
The p-value would be twice as big if we defined the set of `as extreme or more extreme' outcomes to include also $r=45-16, 45-15, 45-14, \ldots, 45$ (a `two-tailed test' instead of a `one-tailed' one).

It is common practice to say that an outcome with a p-value smaller than 0.05 is `significant'.
But, as I said before, beware of p-values.
Yes, a small p-value may be grounds for suspecting that $H_0$ is not true, but it does not show that $H_1$ is necessarily more probable, nor does the p-value correctly quantify how probable $H_0$ is.

\subsection{Bayesian answer}
Let's define $H_1$ to be the hypothesis that the bias $f$ of the coin is an unknown number in $(0,1)$, with a uniform prior distribution over this interval.
Then we can compare the two models as follows \citep{MacKay:itp}:
\[
 P( r \,|\, H_0 ) = {N\choose r} \left(\frac{1}{2}\right)^{N}
\]
\[
 P( r \,|\, H_1 ) = \int_0^1 {N\choose r} f^r (1-f)^{N-r} \, df = \frac{1}{N+1} ,
\]
since the integral is ${N\choose r}$ times the beta function $B(r+1,\,N-r+1) = r!\,(N-r)!/(N+1)!$, whatever the value of $r$.
Here $r = 16$ and $N=45$.
(The Bayesian approach evaluates only the probability of the data that actually happened, not the probability of other data sets that might have happened.)
Numerically these two quantities are
\[
 P( r \,|\, H_0 ) = 0.0184
\]
\[
 P( r \,|\, H_1 ) = \frac{1}{46} = 0.0217 ,
\]
so
\[
 \frac{ P( r \,|\, H_0 ) }{ P( r \,|\, H_1 ) } = \frac{ 0.0184 }{ 0.0217 } = 0.85 .
\]
Assuming equal prior probabilities for $H_0$ and $H_1$, the posterior probability of $H_1$ is
\[
 P(H_1 \,|\, r ) = \frac{ P( r \,|\, H_1 ) }{ P( r \,|\, H_0 ) + P( r \,|\, H_1 ) } = \frac{1}{1 + 0.85} = 0.54 .
\]
Thus the data give only weak evidence in favour of $H_1$.
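The numbers in the sampling-theory and Bayesian answers can be checked with a short {\tt{octave}} script.
What follows is a sketch, not the script used to produce the numbers above.
It assumes a recent Octave (or MATLAB) with the core functions {\tt nchoosek}, {\tt erfc} and {\tt integral}.
The names {\tt binom}, {\tt pval}, {\tt PH0} and so on are illustrative only.
The continuity correction in the normal approximation (using $r+\frac{1}{2}$ in place of $r$) is my own choice rather than part of the argument above.
\begin{verbatim}
% Sketch: check the numbers quoted in the sampling-theory and Bayesian answers.
% Core Octave/MATLAB functions only; all variable names are illustrative.
N = 45;  r = 16;

% Binomial likelihood P(k | N, f), vectorised in f so that integral() can use it.
binom = @(k, f) nchoosek(N, k) * f.^k .* (1 - f).^(N - k);

% Exact one-tailed p-value: P(k <= r | H_0), where H_0 says f = 1/2.
pval = 0;
for k = 0:r
  pval = pval + binom(k, 0.5);
end

% Normal approximation (mean N/2, variance N/4) with a continuity correction.
zval  = (r + 0.5 - N/2) / sqrt(N/4);
pnorm = 0.5 * erfc(-zval / sqrt(2));        % standard normal CDF at zval

% Evidence for each hypothesis.
PH0    = binom(r, 0.5);                      % P(r | H_0)
PH1    = 1 / (N + 1);                        % P(r | H_1), uniform prior on f
PH1num = integral(@(f) binom(r, f), 0, 1);   % numerical check of that integral

ratio = PH0 / PH1;                           % P(r | H_0) / P(r | H_1)
post1 = PH1 / (PH0 + PH1);                   % P(H_1 | r) with equal priors

fprintf('p-value %.4f (normal approximation %.4f)\n', pval, pnorm);
fprintf('P(r|H0) = %.4f, P(r|H1) = %.4f (numerical %.4f)\n', PH0, PH1, PH1num);
fprintf('ratio = %.2f, P(H1|r) = %.2f\n', ratio, post1);
\end{verbatim}
Run as a script, it should print a p-value of about 0.036, with the continuity-corrected normal approximation close behind at about 0.037.
It should also print $P(r\,|\,H_0) \simeq 0.018$ and $P(r\,|\,H_1) \simeq 0.022$ (the latter both in closed form and by numerical integration), a ratio of about 0.85, and $P(H_1\,|\,r) \simeq 0.54$.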
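If we put forward the slightly more precise hypothesis $H'_1$, which says `I expect that $f$ is {\em less\/} than $1/2$' (with a uniform prior for $f$ over $(0,1/2)$, that is, a prior density of 2), the posterior probability ratio would swing by roughly a factor of 2 in favour of this alternative.
Almost all of the likelihood for $r=16$ lies at $f<1/2$, so
\[
 P( r \,|\, H'_1 ) = 2 \int_0^{1/2} {N\choose r} f^r (1-f)^{N-r} \, df \simeq \frac{2}{N+1} = \frac{2}{46} = 0.043
\]
and
\[
 P(H'_1 \,|\, r ) = \frac{0.043}{0.0184 + 0.043} = 0.70 .
\]

What if $H'_1$ had been formulated even more precisely?
The most favourable alternative possible is the point hypothesis $f = r/N = 16/45$, which maximizes the likelihood; so, with equal priors, the posterior probability of any hypothesis like $H'_1$ could never be better than $0.87$.
(Found by the following few lines of {\tt{octave}}.)
\begin{verbatim}
% binomial likelihood P(r | N, f), using only core functions
binom = @(r, N, f) nchoosek(N, r) * f^r * (1-f)^(N-r);
y = binom(16, 45, 0.5);         % P(r | H_0)
z = binom(16, 45, 16.0/45.0);   % most favourable point hypothesis, f = r/N
z/(y+z)                         % posterior probability of that hypothesis
\end{verbatim}

\bibliography{/home/mackay/bibs.bib}%%% you should put your own
\end{document}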