# Probability, statistics, frequentist and Bayesian

This post is a review of basic concepts in probability and statistics.

Useful references: https://cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15/notes.html

https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/

Probability

Probability is a tool for measuring uncertainty mathematically.

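Formal definition, using a $\sigma$-algebra:

A probability space is a triple $(\Omega, F, P)$ consisting of:

• A sample space $\Omega$
• A set of events $F$, which must be a $\sigma$-algebra on $\Omega$ (a collection of subsets of $\Omega$ that contains $\Omega$ and is closed under complements and countable unions)
• A probability measure $P$ that assigns probabilities to the events in $F$.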
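For a finite sample space, all three ingredients can be written down and checked explicitly. The sketch below is a minimal illustration, assuming a single roll of a fair six-sided die, with $F$ taken to be the power set of $\Omega$ and $P$ the uniform measure; the helper names (`power_set`, `is_sigma_algebra`, `satisfies_axioms`) are hypothetical. Brute-force checking only works because $\Omega$ is finite.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space: outcomes of one roll of a fair six-sided die (illustrative choice).
OMEGA = frozenset(range(1, 7))

def power_set(s):
    """All subsets of s as frozensets; for a finite Omega this is a sigma-algebra."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))}

F = power_set(OMEGA)

def P(event):
    """Uniform probability measure: |event| / |Omega|, as an exact fraction."""
    return Fraction(len(event), len(OMEGA))

def is_sigma_algebra(events, omega):
    """Omega is included; closed under complement and union.

    With finitely many events, closure under pairwise unions is enough.
    """
    if omega not in events:
        return False
    for a in events:
        if omega - a not in events:
            return False
        for b in events:
            if a | b not in events:
                return False
    return True

def satisfies_axioms(events, omega, prob):
    """Non-negativity, P(Omega) = 1, and additivity over disjoint pairs of events."""
    if prob(omega) != 1:
        return False
    for a in events:
        if prob(a) < 0:
            return False
        for b in events:
            if not (a & b) and prob(a | b) != prob(a) + prob(b):
                return False
    return True

print(is_sigma_algebra(F, OMEGA), satisfies_axioms(F, OMEGA, P))
print(P(frozenset({2, 4, 6})))  # probability of an even roll
```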

Example: We have a fair coin and toss it 1000 times. What is the probability of getting 600 or more heads?
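Probability answers this mechanically from the rules. A minimal sketch, assuming the tosses are independent and each lands heads with probability exactly 1/2, so the number of heads follows a Binomial(1000, 1/2) distribution; `prob_at_least` is a hypothetical helper name. The answer comes out on the order of $10^{-10}$.

```python
from math import comb

def prob_at_least(k_min: int, n: int) -> float:
    """Exact P(X >= k_min) for X ~ Binomial(n, 1/2), i.e. n tosses of a fair coin."""
    # Each of the 2**n head/tail sequences is equally likely; count those
    # with at least k_min heads using exact integer arithmetic.
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n  # single final division to a float

p = prob_at_least(600, 1000)
print(f"P(at least 600 heads in 1000 tosses) = {p:.2e}")
```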

Statistics

The goal of statistics is to 1) draw conclusions from data (e.g. reject a null hypothesis) and 2) evaluate the uncertainty of those conclusions (e.g. with a p-value, a confidence interval, or a posterior distribution).

At bottom, statistical statements are also about probability, because statistics applies probability to draw conclusions from data.

Example 1: We would like to know whether the probability of rain tomorrow is 0.99. Then tomorrow comes, and it does not rain. Do we conclude that P(rain) = 0.99 is true?

Example 2: We would like to decide whether a coin is fair. Data: we toss the coin 1000 times and get 809 heads. Do we conclude that the coin is fair?
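One standard frequentist answer to Example 2 is an exact binomial test. A sketch assuming independent tosses and the null hypothesis P(head) = 1/2; `two_sided_p_value` is a hypothetical name. It computes the p-value: the probability, under a fair coin, of a result at least as extreme as the one observed.

```python
from math import comb

def two_sided_p_value(heads: int, n: int) -> float:
    """Exact two-sided binomial test of H0: P(head) = 1/2.

    Under H0 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail probability (capped at 1).
    """
    upper = sum(comb(n, k) for k in range(heads, n + 1))  # 2**n * P(X >= heads)
    lower = sum(comb(n, k) for k in range(0, heads + 1))  # 2**n * P(X <= heads)
    return min(1.0, 2 * min(upper, lower) / 2 ** n)

p809 = two_sided_p_value(809, 1000)
p520 = two_sided_p_value(520, 1000)  # for comparison: a modest excess of heads
print(p809, p520)
```

For 809 heads the p-value is astronomically small, so a frequentist would reject the hypothesis that the coin is fair. For 520 heads it is roughly 0.2, which would not count as strong evidence against fairness.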

Note: probability is logically self-contained. There are a few rules, and the answers follow from the rules. Statistics can be messy, because it involves drawing conclusions from data; it is more art than science.

Frequentist vs Bayesian

These are two schools of statistics. They differ in how they interpret probability.

Frequentists interpret probability as the long-run frequency of an event in repeated experiments. E.g. if P(head) = 0.6 and we toss the coin 1000 times, we expect about 600 heads; as the number of tosses grows, the fraction of heads approaches 0.6.
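A quick simulation makes the frequentist reading concrete: with P(head) = 0.6, the observed fraction of heads fluctuates for a small number of tosses and settles near 0.6 as the number grows. A sketch assuming Python's seeded pseudo-random generator as a stand-in for a physical coin; `head_frequency` and the seed are illustrative.

```python
import random

def head_frequency(n_tosses: int, p_head: float = 0.6, seed: int = 0) -> float:
    """Simulate n_tosses of a coin with P(head) = p_head; return the fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < p_head for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1_000, 100_000):
    print(n, head_frequency(n))
```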

Bayesians interpret probability as a state of knowledge, or a state of belief, about a proposition. E.g. P(head) = 0.6 means we are about 60% confident that the next toss will land heads.

In practice, though, Bayesians seldom use a single value to characterize such a belief. Rather, they use a distribution.
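A common way to represent a belief about P(head) as a distribution is a Beta distribution, which updates to another Beta distribution after observing coin tosses (it is a conjugate prior). The sketch below assumes a Beta(a, b) prior and independent tosses; `BetaBelief` and its methods are hypothetical names, and Beta is one convenient choice of prior, not the only one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Belief about theta = P(head), represented as a Beta(a, b) distribution."""
    a: float  # pseudo-count of heads
    b: float  # pseudo-count of tails

    def update(self, heads: int, tails: int) -> "BetaBelief":
        # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails).
        return BetaBelief(self.a + heads, self.b + tails)

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def variance(self) -> float:
        s = self.a + self.b
        return self.a * self.b / (s * s * (s + 1))

prior = BetaBelief(1, 1)  # uniform prior: every value of theta equally plausible
posterior = prior.update(heads=6, tails=4)
print(posterior, posterior.mean(), posterior.variance())
```

Unlike the single number 0.6, the posterior carries both a best guess (its mean) and how uncertain that guess is (its variance), and the variance shrinks as more tosses are observed.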

Frequentist methods are widely used in social science, biology, medicine and public health; this is where we see two-sample t-tests and p-values. Bayesian methods are widely used in computer science and “big data”.

Core difference between frequentists and Bayesians

A Bayesian takes into account the results of previous experiments (or other prior knowledge) in the form of a prior distribution.

See this comic for an illustration.

What does it mean?

A frequentist and a Bayesian are making a bet about whether the sun has exploded.

It’s night, so they cannot observe the sun.

They ask an expert whether the sun has gone nova.

They also know that this expert will roll two dice. If both come up 6, she will lie; otherwise, she will tell the truth. (Data generation process)

They ask the expert, who tells them that yes, the sun has gone nova.

The frequentist reasons that since the probability of rolling two 6s is 1/36 ≈ 0.028 < 0.05 (p < 0.05), it is very unlikely that the expert lied. Thus she concludes that the expert did not lie, and therefore that the sun has exploded.
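The frequentist's number is easy to check by enumeration. A sketch assuming the expert's randomizer is two fair, independent six-sided dice (getting a 6 on each is what "two 6's" requires); variable names are illustrative. It confirms that the chance of a lie is 1/36 ≈ 0.028, below the conventional 0.05 threshold.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# The expert lies only when both dice show 6.
p_lie = Fraction(sum(1 for d1, d2 in outcomes if d1 == d2 == 6), len(outcomes))

print(p_lie, float(p_lie), float(p_lie) < 0.05)
```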

The Bayesian, however, has a strong belief that the sun has not exploded (or else they would already be dead). The prior distribution is:

• P(sun has not exploded) = 0.99999999999999999,
• P(sun has exploded) = 0.00000000000000001.

The data-generating process is essentially the following conditional distribution:

• P(expert says sun exploded | sun not exploded) = 1/36.
• P(expert says sun exploded | sun exploded) = 35/36.
• P(expert says sun not exploded | sun exploded) = 1/36.
• P(expert says sun not exploded | sun not exploded) = 35/36.

The observed data is “expert says sun exploded”. By Bayes’ rule, we want to know

• P(sun exploded | expert says sun exploded) = P(expert says sun exploded | sun exploded) × P(sun exploded) / P(expert says sun exploded)

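P(sun exploded) is extremely small compared to the other probabilities, so the numerator is tiny. The denominator, by the law of total probability, is P(expert says sun exploded) = P(expert says sun exploded | sun exploded) × P(sun exploded) + P(expert says sun exploded | sun not exploded) × P(sun not exploded) ≈ 1/36. So P(sun exploded | expert says sun exploded) ≈ (35/36 × 0.00000000000000001) / (1/36) = 3.5 × 10^-16, which is also extremely small.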
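Carrying out the computation with exact fractions shows just how small the posterior is. A sketch that uses the prior and conditional probabilities listed above (with the prior written as 10^-17); variable names are illustrative. Exact `Fraction` arithmetic is used because 1 − 10^-17 is not representable as a float and would round to 1.

```python
from fractions import Fraction

# Prior from the story: the sun has exploded with probability 10**-17.
prior_exploded = Fraction(1, 10**17)
prior_not_exploded = 1 - prior_exploded

# Likelihood of the expert saying "yes, it exploded" under each hypothesis:
# she tells the truth unless both dice show 6 (probability 1/36).
p_yes_given_exploded = Fraction(35, 36)
p_yes_given_not_exploded = Fraction(1, 36)

# Law of total probability for the denominator P(expert says yes).
p_yes = (p_yes_given_exploded * prior_exploded
         + p_yes_given_not_exploded * prior_not_exploded)

# Bayes' rule.
posterior_exploded = p_yes_given_exploded * prior_exploded / p_yes

print(float(posterior_exploded))
```

Algebraically the posterior simplifies to 35 / (10^17 + 34), about 3.5 × 10^-16.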

Thus, although the expert is unlikely to lie (p = 1/36 ≈ 0.028), it is much more unlikely that the sun has exploded. So the expert most likely lied, and the sun has not exploded.

# Literature Review: Enough is enough

Another six days have passed since I last updated my blog; I’m still working on my MPhil thesis.

The problem? I started out too broad. After I sent an (overdue) partial draft to my supervisor, she suggested I stop reviewing new literature. I then began wrapping things up.

Once I set a limit on the literature, writing suddenly became much easier.

In fact, I write faster.

I also read faster. With papers on attitude change, it became easier to identify the key arguments and let go of minor ones. With news about the background and history of protests in Hong Kong, it became easier to focus on what my case needs and how much of it. I briefly discussed types of movement histories in Hong Kong, without going deeper into SMO strategies.

So one lesson might be that drawing boundaries is a hard but crucial step.

Another might be that having a clear deliverable improves efficiency.

For example, this PhD student spent 10+ years in his program… and it seemed he had a similar problem. On the surface, it might look like procrastination. One level down, there is anxiety, shame, guilt and low self-esteem. One more level down, these stem from unclear goals and priorities.

What could I have done better to finish this sooner?

• Talk to experienced people more often. Drawing boundaries is hard, and there is no clearly defined rule. So the only way is to learn from experience, and to let experienced people judge whether it is enough! (Tacit knowledge / uninstitutionalized knowledge)

# Literature Review: deleting part of your writing might be the solution

I got stuck writing the second half of my literature review in the past few days. This post describes a solution to it.

My initial literature review covered:

• belief structure, in cultural sociology and cognitive sociology
• attitude change, in social psychology
• political socialization, in political psychology

For a while, I was stuck and did not know why. I tried to write about agents of political socialization (family, peers, school, reference group…).

But then I found some literature on undergraduate political socialization, and that led me into another rather broad field. After that, I made no progress.

As with programming, when you find yourself thinking instead of writing, something is wrong.

After talking to a PhD student, I realized my problem was that I was trying to say too many things.

“Belief” / “opinion” / “attitude” / “understandings” are not merely words in social science. They are concepts. So, for each of them, the literature is vast.

I cannot possibly write about both belief and attitude in my literature review, because that would be too much. Moreover, they are not the same thing, so they do not hold together.

After I deleted everything about belief structure and cognitive sociology, the literature review became much clearer.

A broader lesson is that it takes practice to recognize the scope of the literature, the relevant theories, and what is useful. A good idea is a clear idea.