Monte Carlo Simulation

Motivating Sampling

  • We often want to know some property of our population

    • Maybe the distribution of our population
    • Maybe the expected value of our population
    • Maybe the variance of our population
    • Maybe the probability of observing a certain value in our population
  • Unfortunately, our population is often infinitely large, unknown, or both
  • To work around this, we estimate properties of our population from data
  • Estimation is the process of using sampled data to produce informed guesses of population properties
  • Sampling is the process of collecting observations
  • A sampling distribution is the distribution formed by our sampled data
  • The goal of sampling is to collect data that will give us the best guess of some population property of interest
  • In order to find the best guess, we need to collect data that most closely represents our population (a short sketch follows this list)
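
A minimal sketch of this idea, assuming Python with NumPy (the distribution, seed, and sizes below are illustrative): we treat a large simulated array as a stand-in "population" and estimate its mean from a much smaller simple random sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "population" (illustrative; in practice it is unknown or infinite)
population = rng.normal(loc=50, scale=10, size=1_000_000)

# Simple random sample of 100 observations from that population
sample = rng.choice(population, size=100, replace=False)

print("true population mean:    ", population.mean())
print("estimate from the sample:", sample.mean())
```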

Motivating the Monte Carlo Method

  • If we want to calculate the probability of an observation, we typically integrate a closed-form density function by finding its antiderivative
  • Density functions are deterministic functions

    • Specifically, we always get the same probability for a given observation and set of parameters, since the parameters are assumed to be fixed
  • Although we like to think of probabilities as being generated by a single deterministic function, it is sometimes more accurate to think of them as being generated by a probabilistic function
  • For example, our deterministic probability function may not use the most accurate fixed parameters (e.g. mean and variance), which leads to inaccurate probability estimates

    • This can especially happen when our sample size is small
    • In this case, our parameter estimates will likely lead to poor probability estimates, since our methods for parameter estimation (e.g. MLE) are deterministic
    • Because probabilistic methods of parameter estimation include an element of randomness, they give our parameter estimates a better chance of being accurate than deterministic methods
  • The Monte Carlo method is one of the most popular probabilistic methods of estimation (see the sketch below)
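
As a rough illustration of this contrast, assuming Python with NumPy and SciPy (illustrative choices, not part of the original notes), the snippet below computes P(X ≤ 1) for a standard normal in two ways: analytically, by evaluating the closed-form CDF, and probabilistically, by counting how many simulated draws fall at or below 1.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Deterministic route: integrate the density (here via the closed-form CDF)
analytic = norm.cdf(1.0)

# Probabilistic route: simulate draws and count how many satisfy the event
draws = rng.normal(loc=0.0, scale=1.0, size=100_000)
monte_carlo = np.mean(draws <= 1.0)

print(f"analytic:    {analytic:.4f}")     # ~0.8413
print(f"monte carlo: {monte_carlo:.4f}")  # close to 0.8413, varies with the seed
```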

Defining the Monte Carlo Algorithm

  • Monte Carlo methods generally follow a particular pattern:

    1. Define a domain of possible observations

      • For example, let's say we define the random variable X, which can take on the values 1, 2, 3, 4, 5, 6
    2. Randomly generate simulated values from an assumed probability distribution over the domain

      • Typically, we decide to simulate from a uniform distribution
      • Sometimes, we simulate from normal, beta, Bernoulli, or Poisson distributions
      • In general, the more values we simulate, the better we cover the sample space and the more accurate our estimates become
    3. Perform a deterministic computation on the simulated values

      • For example, we may calculate the probability of observing an even number as the proportion of simulated values that are even (see the sketch after this list)
    4. Aggregate the results

      • For example, we may decide to construct a probability distribution by calculating the probability of observing each value in the sample space
      • Or, we may decide to create a credible interval for our parameter estimate
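
A minimal sketch of the four steps above, assuming Python with NumPy (the seed and number of simulations are arbitrary), using the die example to estimate the probability of rolling an even number.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Define a domain of possible observations
domain = np.array([1, 2, 3, 4, 5, 6])

# 2. Randomly generate simulated values (uniform over the domain)
rolls = rng.choice(domain, size=100_000)

# 3. Perform a deterministic computation on the simulated values
is_even = rolls % 2 == 0

# 4. Aggregate the results into an estimate
print("estimated P(even):", is_even.mean())  # should land near 0.5
```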

The Law of Large Numbers within the Monte Carlo Method

  • Since Monte Carlo simulation is a probabilistic method of calculating probabilities, we don't need to worry about estimating parameters through a deterministic density function
  • The Strong Law of Large Numbers is at the core of the Monte Carlo method
  • Essentially, the Monte Carlo method is an application of this law (illustrated in the sketch after this list)
  • The Monte Carlo method is also referred to as Monte Carlo simulation
  • Simulation essentially refers to artificially generating samples from some assumed population distribution
  • The Monte Carlo method does not typically involve a random walk algorithm
  • Random walks typically appear in Markov chain methods instead
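
To make the Law of Large Numbers connection concrete, here is a sketch (again assuming Python with NumPy) showing the Monte Carlo estimate of P(even) for a fair die settling near 0.5 as the number of simulated rolls grows.

```python
import numpy as np

rng = np.random.default_rng(7)

for n in [10, 100, 1_000, 10_000, 100_000]:
    rolls = rng.integers(1, 7, size=n)     # simulate n fair-die rolls (1..6)
    estimate = np.mean(rolls % 2 == 0)     # fraction of even rolls
    print(f"n = {n:>7,}  estimated P(even) = {estimate:.4f}")
```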

Assumptions of Monte Carlo Sampling

  • We assume that we have a proper sampling procedure, such as simple random sampling
  • We assume that we know the general form of our population distribution

    • For example, we may have a good idea that the population distribution follows a normal distribution
  • We assume that we have a good source of randomness (e.g. a reliable random number generator)

    • Specifically, we assume that the simulated values behave like genuine random draws, so that the sample distribution built from them closely follows the population distribution

Summarizing Monte Carlo Sampling

  • Monte Carlo Simulation is a sampling procedure that involves gathering many simulated observations from an assumed population distribution
  • The goal of Monte Carlo is to build a sampling distribution that effectively reflects its population distribution
  • The Law of Large Numbers makes Monte Carlo Simulation possible
  • Specifically, as our sample grows larger, our sampling distribution will increasingly resemble our population distribution (see the sketch below)
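
A short sketch of this last point, assuming Python with NumPy: as the number of simulated die rolls grows, the relative frequency of each face approaches the true uniform probability of 1/6 ≈ 0.167.

```python
import numpy as np

rng = np.random.default_rng(1)

for n in [30, 3_000, 300_000]:
    rolls = rng.integers(1, 7, size=n)                 # n fair-die rolls (1..6)
    freqs = np.bincount(rolls, minlength=7)[1:] / n    # relative frequency of faces 1..6
    print(f"n = {n:>7,}:", np.round(freqs, 3))
```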
