01 Theory

Theory 1

THEOREM: Continuous PDF of a sum

Let fX,Y(x,y) be any joint continuous PDF.

Suppose W=X+Y. Then:

fW(w)=+fX,Y(u,wu)du

When X and Y are independent, so fX,Y=fXfY, this becomes convolution:

fW(w)=fXfY=+fX(u)fY(wu)du

Extra - Derivation of X+Y PDF

The joint CDF of X+Y:

FX+Y(w)=P[X+Yw]=x+ywfX,Y(x,y)dxdy

Find fX+Y by differentiating:

fX+Y(w)=ddwFX+Y(w)ddwx+ywfX,Y(x,y)dxdy

To calculate this derivative, change variables by setting u=x and s=x+y. The Jacobian is 1, so dxdy becomes duds, and we have:

ddww+fX,Y(u,su)duds+fX,Y(u,wu)du
Link to original

02 Illustration

Example - Sum of parabolic random variables

Sum of parabolic random variables

Suppose X is an RV with PDF given by:

fX(x)={34(1x2)x[1,1]0otherwise

Let Y be an independent copy of X. So fY=fX, but Y is independent of X.

Find the PDF of X+Y.

Solution

The graph of fX(wx) matches the graph of fX(x) except (i) flipped in a vertical mirror, (ii) shifted by w to the left.

When w[2,0], the integrand is nonzero only for x[1,w+1]:

fX+Y(w)=(34)21w+1(1(wx)2)(1x2)dx=916(w5302w334w23+1615)

When w[0,+2], the integrand is nonzero only for x[w1,+1]:

fX+Y(w)=(34)2w1+1(1(wx)2)(1x2)dx=916(w530+2w334w23+1615)

Final result is:

fX+Y(w)={916(w5302w334w23+1615)w[2,0]916(w530+2w334w23+1615)w[0,2]0otherwise

center

Link to original

03 Theory - extra

Theory 3

Videos by 3Blue1Brown:

Convolution

The convolution of two continuous functions f(x) and g(x) is defined by:

(fg)(x)=+f(xt)g(t)dt

center

For more example calculations, look at 9.6.1 and 9.6.2 at this page.

Applications of convolution

  • Convolutional neural networks (machine learning theory: translation invariant NN, low pre-processing)
  • Image processing: edge detection, blurring
  • Signal processing: smoothing and interpolation estimation
  • Electronics: linear translation-invariant (LTI) system response: convolution with impulse function

Extra - Convolution

Geometric meaning of convolution Convolution does not have a neat and precise geometric meaning, but it does have an imprecise intuitive sense.

The product of two quantities tends to be large when both quantities are large; when one of them is small or zero, the product will be small or zero. This behavior is different from the behavior of a sum, where one summand being large is sufficient for the sum to be large. A large summand overrides a small co-summand, whereas a large factor is scaled down by a small cofactor.

The upshot is that a convolution will be large when two functions have similar overall shape. (Caveat: one function must be flipped in a vertical mirror before the overlay is considered.) The argument value where the convolution is largest will correspond to the horizontal offset needed to get the closest overlay of the functions.

Algebraic properties of convolution

  • fg=gf
  • f(gh)=(fg)h
  • f(g+h)=fg+fh
  • a(fg)=(af)g=f(ag)
  • (fg)=fg=fg

The last of these is not the typical Leibniz rule for derivatives of products!

All of these properties can be checked by simple calculations with iterated integrals.

Convolution in more variables Given f,g:n, their convolution at 𝐱 is defined by integrating the shifted products over the whole domain:

(fg)(𝐱)=nf(𝐱𝐲)g(𝐲)dy
Link to original