Motivating Sampling
- We always want to know some property related to our population
- Maybe the distribution of our population
- Maybe the expected value of our population
- Maybe the variance of our population
- Maybe the probability of observing a certain value in our population
- Unfortunately, our population is infinitely large and/or unknown most of the time
- To solve this problem, we estimate our population
- Estimation is the process of using sampling to calculate guesses of population properties
- Sampling is the process of collecting observations
- A sample distribution (also called an empirical distribution) is the distribution represented by our data
- The goal of sampling is to collect data that will give us the best guess of some population property of interest
- In order to find the best guess, we need to collect data that most closely represents our population
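The idea of estimating population properties from a sample can be sketched in a few lines of Python. This is only an illustration: the population (a normal distribution with mean 10 and standard deviation 2), the threshold, and the helper names `draw_sample` and `estimate_properties` are assumptions made for the example, not part of the notes above.

```python
import random
import statistics

# Hypothetical population: normal with mean 10 and standard deviation 2.
# In practice these values are unknown; they are fixed here only so the
# estimates can be compared against something.
POP_MEAN, POP_SD = 10.0, 2.0


def draw_sample(n, seed=0):
    """Collect n independent observations from the assumed population."""
    rng = random.Random(seed)
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]


def estimate_properties(sample, threshold):
    """Return guesses of the mean, variance, and P(X > threshold)."""
    return {
        "mean": statistics.fmean(sample),
        "variance": statistics.variance(sample),  # sample variance (n - 1)
        "prob_above": sum(x > threshold for x in sample) / len(sample),
    }


sample = draw_sample(5_000)
estimates = estimate_properties(sample, threshold=12.0)
print(estimates)
```

With a reasonably large sample, the printed guesses should land close to the assumed population values (mean 10, variance 4), which is the sense in which the sample "represents" the population.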
Motivating the Monte Carlo Method
- If we want to calculate the probability of an observation, then we typically integrate a closed-form density function by finding its antiderivative
- Density functions are deterministic functions
- Specifically, we will always receive the same probability after inputting our parameters and observation, since the parameters are assumed to be fixed
- Although we like to think that probabilities are generated by a single deterministic function in principle, it is sometimes more accurate to think that probabilities are generated from a probabilistic function
- For example, our deterministic probability function may not use the most accurate fixed parameters (e.g. mean and variance), which leads to inaccurate probability estimates
- This can especially happen when our sample size is small
- In this case, our parameter estimates will most likely lead to poor probability estimates, since our methods for parameter estimation are deterministic (e.g. MLE)
- Since probabilistic methods of parameter estimation include an element of randomness, their estimates can be more accurate than those from deterministic methods
- The Monte Carlo method is one of the most popular probabilistic methods of estimation
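To make the contrast between the two routes concrete, here is a minimal sketch that computes the same probability twice: once from a closed-form expression and once by simulation. It assumes a standard normal distribution with known parameters, and the function names `normal_cdf` and `monte_carlo_cdf` are hypothetical helpers chosen for this example.

```python
import math
import random


def normal_cdf(x, mean=0.0, sd=1.0):
    """Closed-form P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def monte_carlo_cdf(x, mean=0.0, sd=1.0, n=100_000, seed=0):
    """Estimate P(X <= x) as the share of simulated draws at or below x."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(mean, sd) <= x for _ in range(n))
    return hits / n


exact = normal_cdf(1.0)        # deterministic: same inputs, same answer
approx = monte_carlo_cdf(1.0)  # probabilistic: depends on the random draws
print(exact, approx)
```

The deterministic function returns the same value on every call, while the simulated estimate changes with the seed and tightens as `n` grows. The sketch only shows that the two routes agree when the parameters are right; it does not by itself show which one is more accurate when they are not.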
Defining the Monte Carlo Algorithm
- Monte Carlo methods generally follow a particular pattern:
- Define a domain of possible observations
- For example, let's say we define the random variable , which can take on the values
- Randomly generate simulated values from an assumed probability distribution over the domain
- Typically, we decide to simulate from a uniform distribution
- Sometimes, we simulate from normal, beta, Bernoulli, and Poisson distributions
- The more values we simulate, the greater the coverage of our sample space and the more accurate our estimates become
- Perform a deterministic computation on the simulated values
- For example, we may calculate the probability of observing an even number as the proportion of simulated values that are even out of the total number of simulated values
- Aggregate the results
- For example, we may decide to construct a probability distribution by calculating the probabilities of observing each observation in the sample space
- Or, we may decide to create a credible interval of our parameter estimate
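The four steps of the pattern can be sketched as follows. Because the notes do not specify the random variable, this example assumes a fair six-sided die with values 1 through 6, simulated from a discrete uniform distribution; the function name `simulate_die` is likewise hypothetical.

```python
import random
from collections import Counter


def simulate_die(n_draws, seed=0):
    # Step 1: define the domain of possible observations
    # (assumed here: the faces of a six-sided die).
    domain = [1, 2, 3, 4, 5, 6]

    # Step 2: randomly generate simulated values, here uniformly over the domain.
    rng = random.Random(seed)
    draws = [rng.choice(domain) for _ in range(n_draws)]

    # Step 3: perform a deterministic computation on the simulated values,
    # in this case the proportion of draws that are even.
    prob_even = sum(d % 2 == 0 for d in draws) / n_draws

    # Step 4: aggregate the results into an estimated probability per value.
    counts = Counter(draws)
    distribution = {face: counts[face] / n_draws for face in domain}
    return prob_even, distribution


prob_even, distribution = simulate_die(60_000)
print(prob_even)
print(distribution)
```

Under the fair-die assumption, the estimate of an even outcome should sit near 0.5 and each face near 1/6, with the gap shrinking as `n_draws` increases.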
- The Law of Large Numbers within the Monte Carlo Method
- Since Monte Carlo simulation is a probabilistic method of calculating probabilities, we don't need to worry about estimating parameters using a deterministic density function
- The Strong Law of Large Numbers is at the core of the Monte Carlo method
- Essentially, the Monte Carlo method is an application of the Law of Large Numbers
- The Monte Carlo method is also referred to as Monte Carlo simulation
- Simulation essentially refers to artificially generating samples from some assumed population distribution
- The Monte Carlo method does not typically involve a random walk algorithm
- The random walk algorithm is typically involved in a Markov chain instead
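The role of the Law of Large Numbers can be illustrated by tracking a sample mean as the number of simulated values grows. This sketch assumes draws from a Uniform(0, 1) distribution, whose expected value is 0.5; the function name `running_means` and the checkpoint sizes are arbitrary choices for the example.

```python
import random


def running_means(n_max, checkpoints, seed=0):
    """Sample mean of Uniform(0, 1) draws, recorded at each checkpoint size."""
    rng = random.Random(seed)
    total = 0.0
    means = {}
    for i in range(1, n_max + 1):
        total += rng.random()
        if i in checkpoints:
            means[i] = total / i
    return means


checkpoints = {10, 1_000, 100_000}
means = running_means(100_000, checkpoints)
for n in sorted(means):
    # Print the sample size, the sample mean, and its distance from 0.5.
    print(n, means[n], abs(means[n] - 0.5))
```

The error at any single checkpoint is random, so it need not shrink at every step, but for large sample sizes the sample mean tends to settle close to the expected value.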
Assumptions of Monte Carlo Sampling
- We assume that we have a proper sampling procedure, such as simple random sampling
- We assume that we know the general form of our population distribution
- For example, we may have a good idea that the population distribution follows a normal distribution
- We assume that we have a good source of random variables
- Specifically, we assume that our sample distribution made up of our sample data (or random variables) closely follows the population distribution
Summarizing Monte Carlo Sampling
- Monte Carlo simulation is a sampling procedure that involves generating many observations from an assumed population distribution
- The goal of Monte Carlo is to build a sample distribution that effectively reflects its population distribution
- The Law of Large Numbers makes Monte Carlo simulation possible
- Specifically, as our sample grows larger, our sample distribution will more closely resemble our population distribution