
Fisher Information

Layman’s explanation

Let’s suppose you have some unknown parameter $\theta$ and an observable random variable $X$ which depends on $\theta$ in some manner. Can we find out the value of $\theta$ using the values of $X$? If we have multiple random variables that depend on $\theta$, how do we pick the best one? This is the question answered by Fisher information.

Example

I stole this from some YouTube video

  • Suppose you want to find out your stomach’s internal temperature. Let this be our $\theta$.
  • As you can imagine, it would be pretty hard to measure it in a noninvasive manner. But you have some measurable random variables, such as your mouth’s temperature and the temperature of your armpit.
  • These random variables aren’t perfect, though. Your mouth could be hotter or colder depending on whether you just had a cup of coffee or an ice cream. Similarly, your armpits might be hotter if you have just worked out.
  • Given these two ways of measuring, how do we compare them?
  • How much information does the temperature of the mouth/armpit give us about the stomach?
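To make the comparison concrete, here is a small simulation sketch. All numbers (the true temperature and the two noise levels) are made up for illustration; the one real fact used is that for a Gaussian measurement $N(\theta, \sigma^2)$ with known $\sigma$, the Fisher information about $\theta$ is $1/\sigma^2$, so the noisier proxy carries less information:

```python
import random

random.seed(0)

theta = 37.5          # true (unobservable) stomach temperature -- made-up number
sigma_mouth = 0.3     # mouth readings: less noisy (hypothetical)
sigma_armpit = 0.9    # armpit readings: more noisy (hypothetical)

# For N(theta, sigma^2) with known sigma, Fisher information is 1 / sigma^2.
info_mouth = 1 / sigma_mouth**2
info_armpit = 1 / sigma_armpit**2

# Simulate repeated measurements and estimate theta by the sample mean.
n = 10_000
mouth = [random.gauss(theta, sigma_mouth) for _ in range(n)]
armpit = [random.gauss(theta, sigma_armpit) for _ in range(n)]

est_mouth = sum(mouth) / n
est_armpit = sum(armpit) / n

print(f"Fisher info:   mouth={info_mouth:.2f}  armpit={info_armpit:.2f}")
print(f"estimate error: mouth={abs(est_mouth - theta):.4f}  armpit={abs(est_armpit - theta):.4f}")
```

The measurement with higher Fisher information (the mouth, in this made-up setup) typically yields an estimate closer to the true $\theta$.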

Actual Definition

Fisher information is the variance of the score of the likelihood function.

Read this as: the Fisher information given by a random variable $X$ about $\theta$ is

$I_X(\theta) = \text{E}[(l(x; \theta) - \text{E}[l(x; \theta)])^2]$, where $l$ is the score

Let us first calculate the expectation of $l(x; \theta)$

$l(x;\theta) = \frac{\partial}{\partial\theta}\ln p(x; \theta)$

For the sake of simplicity, let us assume that $\theta$ is a scalar, so all the derivatives are univariate. The same argument extends to the multivariate case.

Since $X$ is a continuous variable, the expectation is an integral:

$\text{E}[l(x; \theta)] = \int p(x; \theta) \frac{\partial}{\partial\theta}\ln p(x; \theta) dx$

$= \int p(x; \theta) \frac{1}{p(x; \theta)}\frac{\partial p(x; \theta)}{\partial\theta} dx$

$= \int \frac{\partial p(x; \theta)}{\partial\theta} dx$

Under standard regularity conditions (the Leibniz integral rule), we can swap the derivative and the integral

$= \frac{\partial} {\partial\theta} \int p(x; \theta) dx$

As $p$ is a pdf, $\int p(x; \theta) dx =1$

$= \frac{\partial} {\partial\theta} (1)$

$= 0$

Notice that we don’t really care what $p$ is. No matter what $p$ is, the expected value of the score is 0.
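We can sanity-check this claim with a Monte Carlo sketch. Take a Bernoulli($\theta$) as an illustrative choice of $p$ (the value of $\theta$ is arbitrary); its log-likelihood is $x\ln\theta + (1-x)\ln(1-\theta)$, so the score is $x/\theta - (1-x)/(1-\theta)$:

```python
import random

random.seed(1)
theta = 0.3       # arbitrary illustrative parameter value
n = 200_000

def score(x, theta):
    # d/dtheta of log p(x; theta) for Bernoulli(theta)
    return x / theta - (1 - x) / (1 - theta)

samples = [1 if random.random() < theta else 0 for _ in range(n)]
mean_score = sum(score(x, theta) for x in samples) / n

print(f"E[score] ~= {mean_score:.4f}")  # close to 0
```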

Hence, Fisher information boils down to

$= \text{E}[l^2]$

$= \text{E}[(\frac{\partial}{\partial \theta}( \ln p(x; \theta) ))^2]$

It is the expected value of the square of the derivative of the log-likelihood.
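As a quick numerical check of this final formula, here is a sketch for a Gaussian $N(\theta, \sigma^2)$ with known $\sigma$ (the values of $\theta$ and $\sigma$ are arbitrary). The score is $(x - \theta)/\sigma^2$, and the closed-form Fisher information is $1/\sigma^2$; the Monte Carlo average of the squared score should match it:

```python
import random

random.seed(2)
theta, sigma = 2.0, 0.5   # arbitrary illustrative mean and known std dev
n = 200_000

# score for N(theta, sigma^2): d/dtheta log p(x; theta) = (x - theta) / sigma^2
samples = [random.gauss(theta, sigma) for _ in range(n)]
fisher_mc = sum(((x - theta) / sigma**2) ** 2 for x in samples) / n

print(f"Monte Carlo E[l^2] = {fisher_mc:.3f}")
print(f"closed form 1/sigma^2 = {1 / sigma**2:.3f}")
```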