We wish to simulate a stochastic process where there are N users of our application that we contend will use our app within a 2 hour time period. To perform the simulation, we would like to have our users attempt to use the application at random times such that the distribution of the intervals between events accurately reflects what one might see in the real world.
N = 10000.0 T = 2.0 lmbda = N / T / 60 / 60 lmbda
For the arrival rate, let's set users, and our time interval hours. From that, we can calculate an arrival rate of per hour or users / second.
Now for the times. Starting at we have no arrivals, but as time passes the probability of an event increases, until it reaches a near-certainty. If we randomly chose a value between 0 and 1, then we can calculate a random time interval as
Let's validate this by generating a large sample of intervals and taking their average.
count = int(1E6) x = np.arange(count) y = -np.log(1.0 - np.random.random_sample(len(x))) / lmbda np.average(y)
array([ 0.1315719 , 0.0975074 , 0.23880816, 0.15444505, 0.22988277, 0.77673759, 0.79036711, 0.43808061, 1.29951833, 0.63123162])
So with a rate of new events would arrive on average seconds apart (or ).
We can plot the distribution of these random times, where we should see an exponential distribution.
plt.hist(y, 10) plt.show()
Python contains the
random.expovariate method which should give us similar intervals. Let's see by averaging a large sum of them.
from random import expovariate sum([expovariate(lmbda) for i in range(count)])/count
For completeness, we can also use NumPy's
random.poisson method if we pass in
y = np.random.exponential(1.0/lmbda, count) np.cumsum(y)[:10]
array([ 0.41607448, 1.1684804 , 2.06405238, 2.79645109, 4.59933146, 7.2532691 , 7.32377567, 7.63112724, 7.87685487, 10.05735753])
Again, this is in agreement with our expected average interval. Note the numbers (and histogram plots) won't match exactly as we are dealing with random time intervals.
x = range(count) y = [expovariate(lmbda) for i in x] plt.hist(y, 10) plt.show()
For a timeline of events, we can simply generate a sequence of independent intervals, and then generate a runnng sum of them for absolute timestamps.
intervals = [expovariate(lmbda) for i in range(1000)] timestamps = [0.0] timestamp = 0.0 for t in intervals: timestamp += t timestamps.append(timestamp) timestamps[:10]
[0.0, 1.3224654891101544, 1.6176771941526784, 7.4726505158688035, 9.118169958496466, 9.6621667226626, 9.785745129323853, 10.797284617682134, 11.492111174613239, 12.012244136855243]
deltas = [y - x for x, y in zip(timestamps, timestamps[1:])] deltas[:10]
[1.3224654891101544, 0.29521170504252403, 5.854973321716125, 1.6455194426276627, 0.543996764166133, 0.12357840666125419, 1.011539488358281, 0.6948265569311047, 0.5201329622420037, 0.47842851855116564]
sum(deltas) / len(deltas)
deltas = [y - x for x, y in zip(timestamps, timestamps[1:])] plt.figure(figsize=(16, 4)) plt.plot(deltas, 'r+') plt.show()
Here we can readily see how the time between events is distributed, with most of the deltas below 1.0 with some fairly large outliers. This is to be expected as will always be greater than but perhaps not by much.
Finally, let's generate hours worth of timestamps and see if we have close to our desired value. We will do this 100 times and then average the counts. We should have a value that is very close to .
limit = T * 60 * 60 counts =  for iter in range(100): count = 0 timestamp = 0.0 while timestamp < limit: timestamp += expovariate(lmbda) count += 1 counts.append(count) sum(counts) / len(counts)