# SPREADSHEETS GENERATE RESERVOIR UNCERTAINTY DISTRIBUTIONS

March 13, 1995
Spreadsheets can quickly generate and graph normal, lognormal, and triangular distributions for analyzing prospects or problems involving uncertainty in the oil and gas industry. Monte Carlo simulation input distributions are specified in different ways, often by assigning the loth and goth percentiles, P10 and P90. Selecting the appropriate distribution (such as normal, lognormal, and triangular distributions) and justifying that choice is critical to the believability of the model.
James A. Murtha
Consultant
Houston

Gerald J. Janusz
University of Illinois
Urbana-Champaign

Spreadsheets can quickly generate and graph normal, lognormal, and triangular distributions for analyzing prospects or problems involving uncertainty in the oil and gas industry.

Monte Carlo simulation input distributions are specified in different ways, often by assigning the loth and goth percentiles, P10 and P90. Selecting the appropriate distribution (such as normal, lognormal, and triangular distributions) and justifying that choice is critical to the believability of the model.

We will show procedures for calculating parameters for normal, lognormal, and triangular distributions with a spreadsheet, and outline the proof that you always get a unique answer for a triangular distribution.

The significance of the procedure is to avoid P10 and P90 inputs that yield a triangular distribution whose end points are clearly inappropriate. One such case would be if negative values are sampled to estimate net pay.

### DISTRIBUTION RANGES

A popular means of specifying a particular distribution is to indicate P10 and P90, along with a measure of central tendency such as either P50 or the mode of the distribution.

For either the normal or lognormal distributions, the user can specify P10 and P90 and solve for the mean and standard deviation.

The triangular distribution is specified by its minimum (P0), mode (most likely), and maximum (P100). If the user specifies P10, M, and P90, the questions arise:

• Does the common use of P10 and P90 and P50 or the mode always translate to a unique distribution?

• And how hard is it to determine the formal parameters of mean and standard deviation for the normal and lognormal?

It is easy to show that any two points on the normal curve determine its mean and standard deviation. This is true for the lognormal. The triangular distribution, however, requires three points. Somewhat surprising is how hard it is to get from the usual P10, P90, and mode to the PO and P100.

Even if the user prefers to input P10 and P90 rather than the defining parameters, there is an advantage to fully understand the distribution to be sampled. That is, it does no harm and might do considerable good for a user to know the full range of a triangular distribution.

Excel or other spread-sheet programs can solve for the parameters quickly.

Once solved for, the spread-sheet can graph the appropriate distribution. This preliminary step to Monte Carlo simulation might help avoid pitfalls associated with using distributions without knowing their associated practical ranges of values.

### METHODOLOGY

Monte Carlo simulation methods are employed in the oil and gas industry to evaluate prospects or to analyze problems that involve uncertainty. The basis of a simulation is a set of equations and assumptions. The variables (or parameters) of the equations are classified as inputs or outputs. Some of these inputs are treated as probability distributions rather than numbers.

With today's more powerful PCs, Monte Carlo simulation often is handled by spreadsheet add-ins such as @RISK (Palisade Corp.) or Crystal Ball (Decisioneering Inc.). A trial consists of selecting one value for each input parameter, according to some specified distribution, and recalculating the worksheet.

A simulation is a succession of hundreds or thousands of repeated trials during which the output values are stored. Afterwards, the values for each output variable are grouped into a histogram or cumulative distribution function.

Thus, Monte Carlo simulation is an alternative to both single point (deterministic) estimation and the scenario approach that presents worst, most likely, and best cases.

Three examples of input distributions are:

1. To simulate reserves, N, using the volumetric formula (N = AhR), we need probability distribution estimates for area, A, pay, h, and recovery, R.

2. To simulate a forecast of oil production rate, q, with an exponential decline model (q = qie-at), we need distributions for initial rate, qi, and decline rate, a.

3. To model an exploration program involving several countries or several plays, we generate distributions for successful wells, total reserves added, and total capital, N, using input distributions of reserves, Ni, capital, Ci, and geologic success rates for each component.

### DEFINING DISTRIBUTIONS

Each probability distribution requires parameters to define it. A normal distribution is usually specified by its mean and standard deviation. Thus the estimated capital investment for an arctic-strengthened platform might be modeled with a normal distribution (Fig. 1) (30474 bytes) with a mean of \$575 million and a standard deviation of \$40 million, written as Normal (575,40).

Similarly, a lognormal distribution is uniquely determined by specifying its mean and standard deviation. Exploration prospect reserves might be estimated with a lognormal distribution (Fig. 2) (31508 bytes) with a mean of 2 billion bbl and standard deviation 1.5 billion bbl, written as Lognormal (2,1.5).

We might choose to model operating expense for a given year as a triangular distribution (Fig. 3) (32697 bytes) with a low value of \$8,100, a most likely value of \$9,000, and a high value of \$10,800, written as Triangular (8.1,9,10.8).

### SPECIFING DISTRIBUTIONS

Often, experts estimating these distributions prefer certain percentiles, such as P10 and P90, rather than the mean and standard deviation, in the case of normal or lognormal. For triangular distributions, they may prefer P10, and P90 along with the mode (most likely) estimate.

Thus, the expert providing input for reserves of an exploration prospect might specify P10 = 0.68 billion bbl and P90 = 3.76 billion bbl, which would lead to a lognormal distribution of mean and standard deviation of about 1.0 million and 1.5 million bbl, respectively. Similarly, the operating costs might be input as P10 = \$8,600, Mode = \$9,000, and P90 = \$10,100.

### CONVERTING PARAMETERS

Lest we take anything for granted, it is important to know that these specifications of a distribution in terms of P10 and P90, and in some cases the mode, do indeed yield the desired distribution. It may be useful to think of the following facts from grade school math class:

• Two points determine a line.

• Three (non-collinear) points determine a circle.

Our question becomes, how many points does it take to uniquely determine a normal distribution or a lognormal distribution or a triangular distribution?

These points we are referring to are on the probability density function or the cumulative distribution function of the random variable in question. Three relevant facts are:

1. Any two points on a normal curve completely determine the curve.

2. Any two points on a lognormal curve completely determine the curve.

3. The mode, together with one point to the left and one to the right, completely determine a triangular distribution.

### USER SPECIFICATIONS

The normal curve is especially easy to convert from any two given points to the necessary parameters of mean and standard deviation. We use the concept of z-scores or standard normal curve, Normal (0,1) (Fig. 4). (29778 bytes) All statistics books have a table of z-scores. Table 1 (12330 bytes) is an abbreviated listing.

The idea is to convert from any normal curve to the standard normal using the equation:

z = (X - mean)/std

The z-score simply measures the distance from a given value of the random variable X to the mean value in units of standard deviations. From this table, we can read that only 1% of the values of a normal distribution exceed 2.33 standard deviations above the mean. Using symmetry, it follows that 98% of all normal distribution values fall within 2.33 standard deviations of the mean.

A more detailed table would reveal the often-stated fact that three standard deviations on either side of the mean capture 99.7% of all the data, two standard deviations capture 68%, and so on.

When programming this relationship in a spreadsheet, special statistical functions such as Normsinv in Excel return a value of z for any given value of probability.

### NORMAL CURVE

Let us suppose that we wish to use a normal distribution to describe the porosity of a limestone interval in a West Texas infill drilling prospect. Our experts estimate P10 and P90 to be 5.8% and 13.6%, respectively.

We seek the mean and standard deviation of the normal distribution. From Table 1 (12330 bytes), the z-scores corresponding to P10 and P90 are -1.28 and +1.28, giving rise to two equations in two unknowns (m and s):

-1.28 (5.8 - m)/s

+1.28 (13.6 - m)/s

Multiplying both equations by s and adding the results gives:

= 19.4 - 2m m = 9.7

Substituting this value of m into the first equation yields:

-1.28s = (5.8-9.7)

s = 3.05%

Thus, the distribution would be Normal (9.7,3.05) (Fig. 5). (33566 bytes)

### LOGNORMAL CURVE

Although a lognormal curve requires only two points to be completely determined, the computation is a bit more involved than for the normal distribution. By definition, if X is a lognormal variable with mean and standard deviation, m' and s', then the natural logarithm, ln(X), is normally distributed, with parameters m and s. The precise relationship between m, s, m', and s' is:

m' = exp(m + S2/2)

s' = m'(exp(S2)-1)1/2

If we begin with P10 and P90 for X, then ln(PlO) and ln(P90) are the 10 and 90-percentiles for the corresponding normal distribution, because the natural log function maintains the order of numbers, a

Suppose we believe that acreage is lognormally distributed with a P10 and P90 of 680 and 3,762, respectively. Then, in the associated normal distribution:

-1.28 = (ln(680) - m)/s = (6.522 - m)/s

+1.28 = (ln(3762)- m)/s = (8.23 - m)/s

Solving for m and s as shown before, we get:

m = 7.38

s = 0.67

Then using the relationships between the two distributions:

m'= exp(7.38 + 0.67 2/2) = 2,000

s' = 2,000(exp(0.672) - 1)1/2 = 1,500

### TRIANGULAR DISTRIBUTION

The triangular distribution, Triangular(L,M,H), requires three points to determine its three parameters. When users indicate P10 and P90 values, they must also specify a third point. In the case where M is provided, the problem then is to determine L and H.

Except for symmetric triangles, triangle distribution problems, unlike normal and lognormal distributions, have no analytical solution. There are no simple algebraic formulas that lead from P10, M, and P90 directly to L and H. Instead, one can use an iterative procedure starting with guesses for L and H and repeating some steps to estimate L and H to the desired accuracy. Details are discussed in the box.

As an example, suppose we believe that operating costs have a triangular distribution with:

PI O = \$8,600

Mode = \$9,000

P90 = \$10,100

Table 2 (25213 bytes) shows the initial estimate and 12 successive values from the iterative procedure, yielding estimates of 8,330 and 10,504 respectively for L and H.

Two reasons for having a procedure for obtaining L and H from PI O, M, and P90 are:

1. Available software that does Monte Carlo simulation will always have the function Triangular(L,M,H), but may not have Triangular(PlO,M,P90).

2. The cautious user may wish to know the values of L and H before using the distribution in a simulation model. What if L or H falls outside an acceptable range of values?

A typical application for volumetric estimate of reserves might use triangular distribution for net pay thickness. Suppose the user specifies:

P10 = 30

M = 60

P90 = 120 ft

Then the true range of this parameter is from -1.57 to 160.3. In other words, we have the absurd possibility of sampling a negative value for net pay!

Editor's note: To obtain Excel and Lotus 123 worksheets to handle the calculations and draw the graphs, journal subscribers can send a blank 5 1/4 or 3 1/2 diskette formatted to MS DOS and a self-addressed, postage paid or stamped return diskette mailer to: Production Editor, Oil & Gas Journal, 3050 Post Oak Blvd., Suite 200, Houston, TX 77056, USA.

Subscribers outside the U.S. should send the diskette and return mailer without return postage to the same address. This mail offer will expire Aug. 31, 1995.

### THE AUTHORS

Jim Murtha is an independant consultant who presents seminars and advises clients in risk analysis and decision making. He previously chaired a math department, taught petroleum engineering, and served as academic dean of a college.

Murtha received his PhD in mathamatics from the University of Wisconsin-Madison in 1964 and an MS in petroleum and natural gas engineering from Penn State in 1983. He has co-authored two texts in mathematics and statistics and recently published "Decisions Involving Uncertainty--An @RISK Tutorial for the Petroleum Industry." Murtha is a registered petroleum engineer.

Gerald J. Janusz is chaiman of the Department of Mathematics at the University of Illinois at Urbana-Champaign. He has been on the faculty there since 1968. Janusz received his PhD in mathematics from the University of Oregon in 1965. He has published 30 research papers in algebra and three text books, the latest if which is "Calculus," published in 1994.