Michael Lipkin
2 min read · Feb 28, 2021

--

Sorry, I am finding the post confusing.

What is the green frequentist curve on this graph? If we take the frequentist estimate of 0.75 for the probability of heads, then that is just a single point, not a distribution. Also, the 0.75 would be an estimate, not the true probability value (since knowing that would require infinitely many tosses).

With frequentism you start with a hypothesis (e.g. that the probability of heads is 0.5). You then look at the data and see whether it is consistent with the hypothesis. The data is considered to be a sample from a random distribution, so in the frequentist thought process the data COULD HAVE BEEN DIFFERENT.
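
To make the frequentist procedure concrete, here is a minimal sketch of an exact two-sided binomial test of the hypothesis p = 0.5. The data (15 heads in 20 tosses, giving the 0.75 estimate) is an assumption made for illustration, since I don't know the actual counts behind the post, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(heads, tosses, p_null=0.5):
    """Sum the probabilities of all outcomes that are no more likely
    under the null hypothesis than the outcome actually observed."""
    observed = binomial_pmf(heads, tosses, p_null)
    return sum(
        binomial_pmf(k, tosses, p_null)
        for k in range(tosses + 1)
        # small relative tolerance guards against floating-point ties
        if binomial_pmf(k, tosses, p_null) <= observed * (1 + 1e-9)
    )

heads, tosses = 15, 20  # hypothetical data, not taken from the post
p_value = two_sided_p_value(heads, tosses)
print(f"point estimate = {heads / tosses:.2f}, p-value = {p_value:.4f}")
```

Note that the p-value is a statement about data that could have been observed under the hypothesis. It is not a distribution over p.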

For the Bayesian the data is FIXED. The events have happened and cannot be considered as being different. The task, then, is to construct an idea of some underlying process that gave rise to the data. We don't give ourselves total freedom in designing this process; we define it as a random process with some unknown parameters. In this case the process is a Bernoulli process with unknown parameter p. Since we don't know what value p might be, we model it as a probability distribution. (This is a totally different idea of probability, where it is used as a model for our lack of knowledge of something that has a fixed value that we will never know exactly.)

If we have no idea what this parameter might be, we might use an uninformative prior. If we think we have some knowledge, we can include it in the prior.

We update the prior with the data to get the posterior. This usually has less variance than the prior (although surprising data can increase it), so we are typically a bit more certain about what the value of p might be.
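
As a rough sketch of this update for the coin case: if the prior on p is a Beta(a, b) distribution, the posterior after observing the tosses is Beta(a + heads, b + tails). The two priors and the data below are assumptions chosen for illustration, not the ones used in the post.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def update(prior, heads, tails):
    """Conjugate update: Beta(a, b) prior -> Beta(a + heads, b + tails) posterior."""
    a, b = prior
    return (a + heads, b + tails)

heads, tails = 15, 5  # hypothetical data (0.75 heads)

# Hypothetical priors: a flat one and one concentrated around 0.5.
priors = {"flat": (1, 1), "informative": (10, 10)}
posteriors = {name: update(prior, heads, tails) for name, prior in priors.items()}

for name, prior in priors.items():
    post = posteriors[name]
    print(
        f"{name}: prior Beta{prior} var={beta_variance(*prior):.4f} -> "
        f"posterior Beta{post} var={beta_variance(*post):.4f}"
    )
```

For both of these priors the posterior variance comes out smaller than the prior variance.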

So the posterior distribution represents our idea of what p might be. It could be anything! It is common to take the peak of the distribution and report that, although this is just a convention and not a real Bayesian thing.
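
To illustrate that the peak is only one possible summary, here is a small sketch comparing the mode and the mean of a Beta posterior. The Beta(16, 6) posterior is an assumption (a flat prior updated with a hypothetical 15 heads and 5 tails).

```python
def beta_mode(a, b):
    """Peak of a Beta(a, b) density; this formula is valid only for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a, b = 16, 6  # hypothetical posterior: flat Beta(1, 1) prior + 15 heads, 5 tails

mode = beta_mode(a, b)
mean = beta_mean(a, b)
print(f"mode = {mode:.3f}, mean = {mean:.3f}")
```

With a flat prior the peak coincides with the frequentist estimate heads / tosses, while the mean is pulled slightly towards 0.5. Neither number replaces the full distribution.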

You seem to consider the result with the second prior to be better because it is nearer 0.5. On what basis?

The concept of a fair coin (i.e. with exact probability of heads = 0.5) is a frequentist one. Bayesians don't have this idea.

You will notice that unless the Bayesian chooses an infinitely strong prior, with infinite density at p=0.5 and zero everywhere else, the Bayesian is unlikely to generate a posterior with a peak at exactly p=0.5 for this data.
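
A quick sketch of this point, assuming symmetric Beta(s, s) priors of increasing strength s and hypothetical data of 15 heads and 5 tails: the posterior peak moves towards 0.5 as the prior gets stronger, but stays above it for any finite s.

```python
def posterior_mode(strength, heads, tails):
    """Peak of the Beta(strength + heads, strength + tails) posterior
    obtained from a symmetric Beta(strength, strength) prior."""
    a = strength + heads
    b = strength + tails
    return (a - 1) / (a + b - 2)

heads, tails = 15, 5  # hypothetical data (0.75 heads)
strengths = [1, 10, 100, 1000]  # hypothetical prior strengths

modes = [posterior_mode(s, heads, tails) for s in strengths]
for s, m in zip(strengths, modes):
    print(f"prior Beta({s}, {s}) -> posterior peak at p = {m:.4f}")
```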

If we have a fair coin and a lot of data, then the posterior will be a narrow distribution centered on about 0.5. Even then the Bayesian is unsure of the exact value of p.
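
This narrowing can be sketched with a small simulation. It assumes a flat Beta(1, 1) prior and a simulated coin with true p = 0.5; the seed and sample sizes are arbitrary choices.

```python
import random
from math import sqrt

def posterior_summary(heads, tails, prior=(1, 1)):
    """Mean and standard deviation of the Beta posterior for the given counts."""
    a = prior[0] + heads
    b = prior[1] + tails
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

rng = random.Random(0)  # fixed seed so the run is repeatable
summaries = []
for n in (10, 100, 10_000):
    heads = sum(rng.random() < 0.5 for _ in range(n))  # simulated fair coin
    mean, sd = posterior_summary(heads, n - heads)
    summaries.append((n, mean, sd))
    print(f"n = {n:>6}: posterior mean = {mean:.3f}, sd = {sd:.4f}")
```

The standard deviation shrinks as the number of tosses grows, but it never reaches zero for a finite number of tosses.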
