# Introduction

• The Seven Pillars of Statistical Wisdom, Stephen M. Stigler, Harvard University Press (2016)
• [602 British textbook · Experimental design] Statistical Principles for the Design of Experiments: Applications to Real Experiments
• [Undergraduate textbook] Statistics
• CART by Leo Breiman

# [The Seven Pillars of Statistical Wisdom]

### Introduction

“By stipulating that, given a number of observations, you can actually gain information by throwing information away!”

“but as the name “Likelihood” hints, there is a wealth of associated methods, many related to parametric families or to Fisherian or Bayesian inference.”

• Huh, so this school of inference can be called "Fisherian".

### Chap1: Aggregation

“Jorge Luis Borges understood this. In a fantasy short story published in 1942, “Funes the Memorious,” he described a man, Ireneo Funes, who found after an accident that he could remember absolutely everything. He could reconstruct every day in the smallest detail, and he could even later reconstruct the reconstruction, but he was incapable of understanding. Borges wrote, “To think is to forget details, generalize, make abstractions. In the teeming world of Funes there were only details.” Aggregation can yield great gains above the individual components. Funes was big data without Statistics.”

• WOW

“It was already well known a century earlier that magnetic north and true north differed, and by 1500 it was also well known that the difference between true and magnetic north varied from place to place, often by considerable amounts—10° or more to the east or to the west.”

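• Early compass readings could be off from true north by around 10°, and the size of the error depended on the location.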

“Instead he gives the mean of the largest and smallest, what later statisticians would call a midrange.”

• The mean of the maximum and the minimum is called the midrange.
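As a small illustration of the midrange, here is a minimal sketch. The readings are made up for this example and are not taken from the book.

```python
from statistics import mean


def midrange(values):
    """Mean of the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


# Hypothetical readings, chosen only for illustration.
readings = [9.5, 10.0, 10.5, 11.0, 14.0]

print(midrange(readings))  # depends only on the two extremes
print(mean(readings))      # depends on every observation
```

Because the midrange looks only at the two extremes, a single unusual reading moves it much more than it moves the arithmetic mean.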

“If we collect a man’s urine during 24 hours and mix all his urine to analyze the average, we get an analysis of a urine that simply does not exist;”

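• The worry is that the "individual" we obtain by averaging need not actually exist. For example, in linear regression the sample centroid $(\bar{X},\bar{y})$ is not necessarily one of the observed sample points.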
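The point about the centroid can be checked on a toy data set. This is a minimal sketch with made-up numbers: the least-squares line passes through $(\bar{X},\bar{y})$, yet that point is not among the observations. The helper names `centroid` and `ols_fit` are introduced here only for illustration.

```python
def centroid(xs, ys):
    """Return the sample centroid (x_bar, y_bar)."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


def ols_fit(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    x_bar, y_bar = centroid(xs, ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 8.0]

x_bar, y_bar = centroid(xs, ys)
intercept, slope = ols_fit(xs, ys)

# The fitted line goes through the centroid (up to rounding error) ...
fitted_at_centroid = intercept + slope * x_bar
# ... but the centroid itself was never observed.
centroid_is_observed = (x_bar, y_bar) in list(zip(xs, ys))

print(x_bar, y_bar, fitted_at_centroid, centroid_is_observed)
```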

### Chap2: Information

Root-n Rule: to double the precision, you need four times as much data.
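A minimal numeric sketch of the rule, assuming independent observations with a common, finite standard deviation `sigma` (the value 2.0 is arbitrary):

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean of n independent observations."""
    return sigma / math.sqrt(n)


sigma = 2.0  # assumed population standard deviation (arbitrary)

se_100 = standard_error(sigma, 100)
se_400 = standard_error(sigma, 400)

# Four times the data halves the standard error.
print(se_100, se_400, se_100 / se_400)
```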

“The paradox of the accumulation of information, namely, that the last 10 measurements are worth less than the first 10, even though all measurements are equivalently accurate, is heightened by the different (and to a degree misleading) uses of the term information in Statistics and in science.”

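• The last ten observations provide less information than the first ten.
• Two apparent counterexamples, where information does add up linearly:
• Fisher information: $I_n(\theta) = nI(\theta)$. It is additive because it is measured on the variance scale; to get back to the scale of the standard error we have to take a square root.
• Shannon's information theory [I don't really understand this one…]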
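A standard textbook case (not taken from the book) makes the link between additive Fisher information and the root-n rule explicit. Assume $n$ independent observations from $N(\mu, \sigma^2)$ with $\sigma$ known:

$I(\mu) = \frac{1}{\sigma^2}, \quad I_n(\mu) = nI(\mu) = \frac{n}{\sigma^2}, \quad \mathrm{Var}(\bar{X}) = \frac{1}{I_n(\mu)} = \frac{\sigma^2}{n}, \quad \mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

The information grows linearly in $n$, but the standard error, which is what we usually read as precision, only shrinks like $1/\sqrt{n}$.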

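• In 1824 Poisson discovered that the Cauchy distribution does not satisfy the root-n rule: the mean of Cauchy observations is no more precise than a single observation.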
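A small simulation makes the Cauchy exception visible. This is a sketch under stated assumptions: spread is measured by the interquartile range (the Cauchy distribution has no variance), the sample size of 100 and the 2000 replications are arbitrary, and the helper names are introduced here for illustration.

```python
import math
import random
from statistics import quantiles


def iqr(values):
    """Interquartile range, a spread measure that exists even for Cauchy data."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    # Inverse-CDF method for the standard Cauchy distribution.
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, n, reps, rng):
    """Spread of the sample mean of n draws, estimated over reps replications."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return iqr(means)


rng = random.Random(0)
reps = 2000

# How much narrower is the mean of 100 observations than a single observation?
normal_ratio = (iqr_of_sample_means(draw_normal, 1, reps, rng)
                / iqr_of_sample_means(draw_normal, 100, reps, rng))
cauchy_ratio = (iqr_of_sample_means(draw_cauchy, 1, reps, rng)
                / iqr_of_sample_means(draw_cauchy, 100, reps, rng))

print(normal_ratio)  # roughly sqrt(100) = 10
print(cauchy_ratio)  # roughly 1: averaging does not help
```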

“This was in direct contrast to long mathematical practice: in a sequence of mathematical operations, mathematicians would keep track of the maximum error that could have arisen at each step, a quantity that grew as the series grew, while statisticians would allow for a likely compensation of errors, which would in relative terms shrink as the series grew.”

• For example, the $\epsilon$-$N$ definition of a limit in mathematical analysis, compared with convergence in probability.

• Which view to take depends on our goal.
• outlier
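The contrast in the quotation above, between tracking the maximum error and allowing errors to compensate, can be sketched numerically. The assumptions here are mine, not the book's: each of $n$ steps contributes an independent error uniformly distributed on $[-h, h]$.

```python
import math


def worst_case_bound(n, h):
    """Maximum possible error of a sum of n terms, each off by at most h."""
    return n * h


def typical_rms(n, h):
    """Root-mean-square error of the same sum if errors are independent Uniform(-h, h).

    Var(Uniform(-h, h)) = h**2 / 3, and variances of independent errors add.
    """
    return h * math.sqrt(n / 3)


h = 0.5  # assumed half-width of each error (arbitrary)
for n in (10, 100, 1000):
    # Per-term error: the worst case stays at h, the typical error shrinks.
    print(n, worst_case_bound(n, h) / n, typical_rms(n, h) / n)
```

The worst-case bound grows in proportion to $n$, while the typical error grows only like $\sqrt{n}$, so relative to the length of the series it shrinks.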

### Chap3: Likelihood

Fisher introduced the term MLE (maximum likelihood estimation).

The associated idea of likelihood as a way to calibrate our inferences

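• Likelihood becomes a yardstick for our statistical inference. Should we reject the null hypothesis? That is judged on the basis of likelihood.
• Is the use of likelihood driven by the two major approaches, Bayes and MLE?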

### Chap4: Intercomparison

“In 1904–1905, he wrote a pair of internal memoranda (really in-house instruction texts) summarizing the uses of error theory and the correlation coefficient, based upon his reading of recent work from Karl Pearson’s laboratory at University College London. ”

• Gosset, a mathematician and chemist by training, gained insights by reading the work of contemporary statisticians.

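Fisher later, almost single-handedly, extended Gosset's t-test (for example to two samples and to regression coefficients), developed the theory of regression analysis, and created all of ANOVA.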

### Chap5: Regression

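• The discovery of regression is exactly what rescued Darwin's model.
• This figure is interesting. Short parents are more likely to have short children. But short children are more likely to have been born to parents who are not quite as short, so this is not a simple symmetric relationship.
• So might the saying "wealth does not last beyond three generations" also be a kind of "regression"?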

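In 1933, the Northwestern University economist Horace Secrist published The Triumph of Mediocrity in Business, a book built entirely on a statistical error. For example, he observed that if you list the top 25% of department stores by profit rate in 1920 and follow their average performance through 1930, their results steadily approach the industry average, heading toward mediocrity. Even if Secrist knew about regression, he did not understand it. He wrote: “In business, the tendency toward mediocrity is more than a statistical result; it expresses prevailing behavior relations.” What he failed to notice is that if the top 25% of firms are selected by their 1930 profits instead, the effect reverses: from 1920 to 1930, performance moves steadily away from mediocrity.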
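Secrist's mistake can be reproduced with simulated firms. This is a sketch under assumptions of my own, not the book's data: each firm has a fixed underlying "skill", and the profit in each year is that skill plus independent noise of the same size.

```python
import random
from statistics import mean


def simulate_firms(n_firms, rng):
    """Return (profit_1920, profit_1930) pairs: shared skill plus yearly noise."""
    firms = []
    for _ in range(n_firms):
        skill = rng.gauss(0.0, 1.0)
        profit_1920 = skill + rng.gauss(0.0, 1.0)
        profit_1930 = skill + rng.gauss(0.0, 1.0)
        firms.append((profit_1920, profit_1930))
    return firms


def top_quartile(firms, year_index):
    """Top 25% of firms ranked by the profit in the given year (0 = 1920, 1 = 1930)."""
    ranked = sorted(firms, key=lambda f: f[year_index], reverse=True)
    return ranked[: len(firms) // 4]


rng = random.Random(1933)
firms = simulate_firms(4000, rng)

top_by_1920 = top_quartile(firms, 0)
top_by_1930 = top_quartile(firms, 1)

# Selected on 1920: the group looks like it drifts toward the average (which is 0).
fwd_1920 = mean(f[0] for f in top_by_1920)
fwd_1930 = mean(f[1] for f in top_by_1920)

# Selected on 1930: the same data now look like a drift away from the average.
bwd_1920 = mean(f[0] for f in top_by_1930)
bwd_1930 = mean(f[1] for f in top_by_1930)

print(fwd_1920, fwd_1930)
print(bwd_1920, bwd_1930)
```

Nothing about the firms changes between the two years in this simulation; both apparent trends come purely from selecting on a noisy measurement.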

# [602 British textbook · Experimental design]

### Chap 1 Introduction

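• For example, deciding to compare four rat poisons just because we happen to have four groups of rats is a bit silly. Don't let the available material dictate the question, unless you really do want to study four rat poisons.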

The resource equation [degrees-of-freedom equation]:

$T + B + E = N - 1$

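• I never knew what this was called; so this is its name.
• The degrees of freedom on the two sides are equal: treatment + block + error = number of observations − 1 (one degree of freedom goes to estimating the mean).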
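A minimal check of the resource equation, assuming a randomized complete block design in which every treatment appears exactly once in every block. The function name and the choice of 4 treatments and 5 blocks are illustrative only.

```python
def rcbd_degrees_of_freedom(treatments, blocks):
    """Degrees of freedom for an RCBD with one unit per treatment per block."""
    n_obs = treatments * blocks
    df = {
        "treatment": treatments - 1,
        "block": blocks - 1,
        "error": (treatments - 1) * (blocks - 1),
    }
    return n_obs, df


n_obs, df = rcbd_degrees_of_freedom(treatments=4, blocks=5)

# T + B + E should equal N - 1.
print(df, sum(df.values()), n_obs - 1)
```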

### Chap 2 Elementary RCBD

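• CRD (completely randomized design): apart from how the treatments are allocated, everything else is the same.
• RCBD (randomized complete block design): every block contains each treatment exactly once.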

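• Lao Luo once asked which two of the three parts of the CRD ANOVA are easy to compute. I was baffled at the time, but now I get it: the between and total sums of squares are easy, because each is a single term minus the correction factor $ny_{..}^2$ (reading $n$ as the total number of observations and $y_{..}$ as the grand mean).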
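The shortcut can be sketched as follows. The data are made up, and the correction factor is written in the equivalent form (grand total)² / N; the within sum of squares then comes from subtraction.

```python
def crd_anova_sums_of_squares(groups):
    """Sums of squares for a one-way (CRD) ANOVA using the correction factor."""
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total  # equals N * (grand mean)**2

    ss_total = sum(y ** 2 for y in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between  # the hard part, obtained by subtraction
    return ss_total, ss_between, ss_within


def ss_within_direct(groups):
    """Within sum of squares from deviations about each group mean (for checking)."""
    total = 0.0
    for g in groups:
        g_mean = sum(g) / len(g)
        total += sum((y - g_mean) ** 2 for y in g)
    return total


# Hypothetical data: three treatments, three units each.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 7.0]]

ss_total, ss_between, ss_within = crd_anova_sums_of_squares(groups)
print(ss_total, ss_between, ss_within, ss_within_direct(groups))
```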

The ANOVA decomposition relies only on the partition of variance, not on the assumption that the errors are normal.

A principle for reducing the error $\epsilon$ in an RCBD: choose the blocks so that the treatment differences stay constant from block to block.

Chapter 4 covers a blocking approach in which one blocking factor is the primary one and a second is adjusted for through covariance.

Chapter 8 covers multiple blocking systems, in which each observation carries information from several different blockings. Sounds fancy.

### Chap 3 Treatment Structure

A factorial design is essentially the study of several factors together.

• Main Effect:

$\sum_j l_j t_j \quad \text{subject to} \quad \sum_j l_j = 0$

• Interaction:

$\sum_j\sum_k l_j m_k t_{jk} \quad \text{subject to} \quad \sum_j l_j = 0, \ \sum_k m_k = 0$

$\sum_j (pq)_{jk} = 0, \sum_k (pq)_{jk} = 0$
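The contrasts above can be tried on a small two-factor table. This is a sketch with made-up treatment means; I assume the main-effect contrast is applied to the level means of one factor (averaged over the other), and the function names are hypothetical.

```python
def row_means(t):
    return [sum(row) / len(row) for row in t]


def col_means(t):
    return [sum(col) / len(col) for col in zip(*t)]


def main_effect_contrast(level_means, coefs):
    """sum_j l_j * t_j, with the coefficients required to sum to zero."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coefs, level_means))


def interaction_contrast(t, l_coefs, m_coefs):
    """sum_j sum_k l_j * m_k * t_jk, with both coefficient sets summing to zero."""
    if abs(sum(l_coefs)) > 1e-12 or abs(sum(m_coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(l_coefs[j] * m_coefs[k] * t[j][k]
               for j in range(len(t)) for k in range(len(t[0])))


def interaction_effects(t):
    """(pq)_jk = t_jk - row mean - column mean + grand mean."""
    r, c = row_means(t), col_means(t)
    grand = sum(r) / len(r)
    return [[t[j][k] - r[j] - c[k] + grand for k in range(len(c))]
            for j in range(len(r))]


# Hypothetical 2x2 table of treatment means t[j][k].
t = [[10.0, 12.0],
     [14.0, 20.0]]
l_coefs = [-1.0, 1.0]
m_coefs = [-1.0, 1.0]

main_a = main_effect_contrast(row_means(t), l_coefs)
main_b = main_effect_contrast(col_means(t), m_coefs)
inter = interaction_contrast(t, l_coefs, m_coefs)
pq = interaction_effects(t)

print(main_a, main_b, inter)
print(pq)  # every row and every column of pq sums to zero
```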

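• Always interpret the main effects first and the interaction second, because the interaction essentially supplements what the main effects cannot explain. An interaction without main effects looks a bit ridiculous.
• Next, look at the order of magnitude. If the main effects are about the same size as the interaction, it is enough to list the combinations of factor levels; there is no need to discuss the main effects separately. But if at least one main effect is much larger than the interaction, it is worth discussing on its own. [Example of a level combination: 50° water combined with white-shelled eggs gives a taste-satisfaction score of 96.]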