# Interesting Statsitcal Problems in Interviews

### Problem 1

**If we have $n$ dice draws with a dice that has $k$ sides. Let $X_i$ to be the random variables that describes the number of times that the dice faces $i$. What is the correlation between $x_i$ and $x_j$ for $i\neq j$**.

We first notice, $X_i$ are identitically distributed. Thus, the correlation between $X_i$ and $X_j$ does depend on $i$ and $j$ and they they have the same variance. Thus, we only need to calculate covariance first. We then notice

\[X_1+X_2 + \dots + X_k = n\]Next we compute

\[\begin{aligned} &\text{Cov}[X_i, X_1+X_2+\dots + X_k] = \text{Cov}[X_i, n] =0 \\ &=\text{Cov}(X_i, X_i) + (k-1)\text{Cov}(x_i,x_j)\\ \end{aligned}\]Thus, we find

\[(k-1)\text{Cov}(x_i,x_j) = - \text{Var}(X_i)\]Thus,

\[\text{Cov}(x_i,x_j) = - \frac{\text{Var}(x_i)}{k-1}\]Therefore, the correlation becomes

\[\text{Corr}(X_i,X_j) = \frac{\text{Cov}(X_i,X_j)}{\text{Var}(X_i)} = - \frac{1}{k-1}\]### Problem 2

**With a set of 52 cards, we draw without replacement from the card. What is the expected number of draws to see the first Ace?**

Counting down from the top of the deck of cards. Let $X_1$ be the number of cards before the first Ace. Let $X_2$ be the number of cards between the first and the second Ace. Let $X_3$ be the number of cards between the second and third Ace. Let $X_4$ be the number of cards between the third Ace and the fourth Ace. Finally, let $X_5$ be the number of cards between the fourth Ace and the bottom of the deck. We find

\[X_1+X_2+X_3+X_4+X_5 = 52-4 = 48\]We also find the distribution of $X_i$’s are the same and thus,

\[5E[X_1] = 48\quad \Rightarrow \quad E[X_1] = \frac{48}{5}\]### Problem 3

**A 2D random walk starts from $(1,1)$. If it will stop once it hits the $y$-axis. What is the probability that it will stop at the negative part of the $y$-axis?**

Notice the symmetry of the problem. In 2D random walk, the problem is symmetric. At a given time, it will be uniformly distributed in the direction respect to the starting point. Given it hits $y$-axis, it will be uniformly distributed around within the angle ($0$,$\pi$) with respect to the starting point. However, the probability of going to the negative part is the last $\pi/4$ angle. Thus, the probability of landing in the negative part of $y$-axis is $1/4$.

### Problem 4

**Give an example that two random variables are uncorrelated but dependent**

Let $X = \sin(\tau)$ and $Y = \cos(\tau)$ and $\tau\sim U(0,2\pi)$. This is uncorrelate becuase, if we plot $(X,Y)$, it relys on the unit circle. If we run a linear regression on that, the line will have zero slope. Since correlation between $Y$ and $X$ is the $R^2$ score and it is zero in this case. They are dependent because, once we know $X$, $Y$ can only have at most two values.

### Problem 5

**How to uniformly generate points on a disc with radiu $R$?**

If we label points by $(r,\theta)$, the probability density function of $(r,\theta)$ is

\[p(r,\theta)drd\theta = \frac{rd\theta dr }{\pi R^2} = \frac{d\theta}{2\pi} \cdot \frac{d r^2}{R^2} = p(\theta) p(r^2)\]Thus, we uniformly generate $r^2$ in the interval $[0,R^2]$ and uniform generate $\theta$ from $[0,2\pi)$.

## Leave a comment