Squaring Normal Distributions
By P Barber (March 2010)
Abstract
This paper develops a set of empirical equations for calculating the Humphreys distribution parameters of a dataset obtained by squaring a Humphreys distribution with Pm = 0.5. (A Humphreys distribution with Pm = 0.5 represents a set with a Normal distribution.)
Introduction
1. The parameters of a Statistical Distribution provide a precise definition of a dataset. If two Statistical Distributions are multiplied together, the result will also be precisely defined.
2. This paper examines the effect of squaring normal distributions with varying tolerance T.
The tolerance T is the three-standard-deviation limit expressed as a fraction of the mean. For example, T = 0.1 combined with a Mean Value (Mi) of 20,000 gives a three-standard-deviation limit of 0.1 x 20,000 = +/- 2,000,
or a Humphreys distribution with parameters Mi = 20,000, Ni = 18,000 (20,000 – 2,000) and Xi = 22,000 (20,000 + 2,000), with Pm = (Mi – Ni) / (Xi – Ni) = 2,000/4,000 = 0.5.
By definition, any Humphreys distribution with Pm = 0.5 is a normal distribution with standard deviation = (Xi – Ni)/6 = 4,000/6 = 666.67.
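The conversions in paragraph 2 can be written as a small helper. This is a minimal sketch that only restates the relationships given above (Ni = Mi(1 – T), Xi = Mi(1 + T), Pm = (Mi – Ni)/(Xi – Ni), standard deviation = (Xi – Ni)/6); the function name `humphreys_inputs` is hypothetical.

```python
def humphreys_inputs(mi: float, t: float) -> dict:
    """Hypothetical helper: Humphreys parameters of a Normal input.

    t is the three-standard-deviation limit as a fraction of the mean mi,
    so the minimum is mi*(1 - t) and the maximum is mi*(1 + t).
    """
    ni = mi * (1 - t)
    xi = mi * (1 + t)
    pm = (mi - ni) / (xi - ni)   # 0.5 for a symmetric (Normal) input
    sd = (xi - ni) / 6           # three standard deviations either side of mi
    return {"Ni": ni, "Mi": mi, "Xi": xi, "Pm": pm, "sd": sd}


params = humphreys_inputs(20_000, 0.1)
print(params)
```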
3. The table below shows the results of a series of tests of the effect of squaring Normal distributions with differing tolerances. The data in the table were obtained by fitting a Humphreys distribution to the results of a Monte Carlo simulation, using the Anderson-Darling technique. The Anderson-Darling (AD) statistic is shown alongside each of the fitted results.
Based on Mi² = 400,000,000

| T | No | Mo | Xo | AD |
|---|---|---|---|---|
| 0.05 | 371,438,028 | 400,020,308 | 428,537,578 | 0.147249 |
| 0.10 | 344,136,500 | 399,068,200 | 457,469,800 | 0.171935 |
| 0.15 | 315,828,356 | 398,368,686 | 487,064,901 | 0.138283 |
| 0.20 | 290,512,955 | 396,182,227 | 516,002,564 | 0.287446 |
| 0.25 | 264,877,424 | 393,154,505 | 546,871,229 | 0.121250 |
| 0.30 | 238,407,582 | 390,492,388 | 578,003,027 | 0.165118 |
| 0.35 | 213,249,882 | 386,961,332 | 609,257,224 | 0.173640 |
| 0.40 | 188,606,512 | 382,883,310 | 640,941,171 | 0.189685 |
| 0.45 | 164,480,277 | 378,263,005 | 673,044,064 | 0.209646 |
| 0.50 | 139,859,418 | 374,094,343 | 704,628,005 | 0.249577 |
| 0.55 | 117,682,172 | 368,828,250 | 735,650,020 | 0.503540 |
| 0.60 | 95,220,720 | 361,198,110 | 771,765,200 | 0.296411 |
| 0.65 | 73,866,500 | 356,232,000 | 801,269,800 | 0.677293 |
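The paper does not say exactly how the Monte Carlo samples were generated. The sketch below shows one way to produce them. It assumes that "squaring" means multiplying two independent draws from the same Normal input with standard deviation T x Mi / 3; this is an assumption, although the Std/Ave column in paragraph 6 is consistent with it. The sample size N = 10,000 is taken from the appendix. The function names and the seed are hypothetical. The Humphreys fit itself is not reproduced because the Humphreys CDF is not given here.

```python
import random
import statistics


def simulate_square(mi: float, t: float, n: int = 10_000, seed: int = 1) -> list[float]:
    """Monte Carlo sample of X * Y, with X and Y independent Normal(mi, t*mi/3).

    Assumption: this is how 'squaring' a Normal distribution is modelled.
    """
    rng = random.Random(seed)
    sd = t * mi / 3
    return [rng.gauss(mi, sd) * rng.gauss(mi, sd) for _ in range(n)]


def summary(sample: list[float]) -> dict:
    """Mean, Std/Ave ratio and moment-based skewness (not Excel's adjusted SKEW)."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    n = len(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return {"mean": mean, "std_over_ave": sd / mean, "skew": m3 / m2 ** 1.5}


mc_stats = summary(simulate_square(20_000, 0.30))
print(mc_stats)
```

Fitting a Humphreys distribution to such a sample, as the paper does, would then be a separate step.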
|
|
|
|
|
|
4. The software package Oracle Crystal Ball indicated that a Gamma distribution could be fitted to many of the Monte Carlo results. For T = 0.60 a Gamma distribution was fitted with AD = 0.1824; however, for T = 0.65 the best fit was a Lognormal distribution with AD = 0.1715.
| T | Crystal Ball Best-Fit | AD |
|---|---|---|
| 0.05 | Normal | 0.1502 |
| 0.10 | Gamma | 0.1961 |
| 0.15 | Gamma | 0.1563 |
| 0.20 | Gamma | 0.2634 |
| 0.25 | Gamma | 0.1802 |
| 0.30 | Gamma | 0.1739 |
| 0.35 | Gamma | 0.1701 |
| 0.40 | Gamma | 0.1705 |
| 0.45 | Gamma | 0.1719 |
| 0.50 | Gamma | 0.1751 |
| 0.55 | Gamma | 0.2166 |
| 0.60 | Gamma | 0.1824 |
| 0.65 | Lognormal | 0.1715 |
5. In the graph below, the values of AD shown in the table are plotted against the tolerance T, and a trend line has been added as a reference. Although the trend line is considered to have little if any significance, a number of points, namely those at T = 0.05, 0.15, 0.20 and 0.55, lie at a significant distance from it. It is postulated that these points were derived from less representative distributions. The same points also deviate from the trend lines on the plots of Skew v T and Kurt v T, which are shown in the appendix. However, these points appear to have little effect (other than on the AD statistic) on the calculated Humphreys distribution parameters.
6. In the table below, the No, Mo and Xo parameters of the Humphreys distribution have been converted into normalised factors Fn, Fm and Fx by dividing them by Mi² (in this case Mi² = 400,000,000). The table also shows the Skew, Kurt and the ratio 'Standard deviation/Average' for the distributions produced.
| T | Fn | Fm | Fx | Skew | Kurt | Std/Ave |
|---|---|---|---|---|---|---|
| 0.05 | 0.928595 | 1.000051 | 1.071344 | 0.002556 | 0.037161 | 0.023813 |
| 0.10 | 0.860341 | 0.997671 | 1.143675 | 0.064519 | 0.000399 | 0.047153 |
| 0.15 | 0.789571 | 0.995922 | 1.217662 | 0.075656 | 0.049689 | 0.071436 |
| 0.20 | 0.726282 | 0.990456 | 1.290006 | 0.126011 | 0.003346 | 0.094078 |
| 0.25 | 0.662194 | 0.982886 | 1.367178 | 0.169637 | 0.042688 | 0.118037 |
| 0.30 | 0.596019 | 0.976231 | 1.445008 | 0.204375 | 0.063460 | 0.141746 |
| 0.35 | 0.533125 | 0.967403 | 1.523143 | 0.238892 | 0.087514 | 0.165512 |
| 0.40 | 0.471516 | 0.957208 | 1.602353 | 0.273145 | 0.114802 | 0.189343 |
| 0.45 | 0.411201 | 0.945658 | 1.682610 | 0.307096 | 0.145272 | 0.213251 |
| 0.50 | 0.349649 | 0.935236 | 1.761570 | 0.340703 | 0.178862 | 0.237244 |
| 0.55 | 0.294205 | 0.922071 | 1.839125 | 0.360889 | 0.165086 | 0.260417 |
| 0.60 | 0.238052 | 0.902995 | 1.929413 | 0.406735 | 0.255123 | 0.285525 |
| 0.65 | 0.184666 | 0.890580 | 2.003175 | 0.432517 | 0.287410 | 0.309104 |
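As a cross-check on the Monte Carlo results, the Std/Ave, Skew and Kurt columns can be compared with closed-form moments. Assume that "squaring" means the product of two independent draws X and Y from the same Normal input, and write c = σ/μ = T/3. The product then has mean μ² (that is, Mi²), Std/Ave = √(2c² + c⁴), skewness 6c⁴/(2c² + c⁴)^(3/2) and excess kurtosis 6c²(4 + c²)/(2 + c²)². At T = 0.30 these give about 0.1418, 0.211 and 0.060, compared with 0.141746, 0.204375 and 0.063460 in the table. Squaring a single draw (X²) would instead give a Std/Ave of about 2c, roughly 0.20 at T = 0.30, which does not match the table. This is a sketch under that assumption, and the function name is hypothetical.

```python
import math


def product_moments(t: float) -> dict:
    """Closed-form shape measures of X*Y for independent X, Y ~ Normal(mu, sigma).

    Expressed in terms of c = sigma/mu = t/3, with the mean mu**2 normalised to 1.
    """
    c = t / 3
    s = c * c
    var = 2 * s + s * s                              # Var(XY) / mu**4
    cv = math.sqrt(var)                              # Std/Ave
    skew = 6 * s * s / var ** 1.5                    # third central moment = 6 mu^2 sigma^4
    excess_kurt = 6 * s * (4 + s) / (2 + s) ** 2     # comparable with Excel's KURT
    return {"std_over_ave": cv, "skew": skew, "excess_kurt": excess_kurt}


for t in (0.05, 0.30, 0.65):
    print(t, product_moments(t))
```

Small differences from the table are expected, because the simulated values come from finite samples and Excel's SKEW and KURT apply sample-size adjustments.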
7. Plotting the factors Fn, Fm and Fx against T and fitting a second-order polynomial trend line to each shows that a reasonable trend line can be developed for every factor; the fitted equations provide the basis for the empirical relationship. Each trend line was forced to pass through the point T = 0, F(n,m,x) = 1.000, because if T = 0, both the input and output variables are single numbers, such that:
No = Mo = Xo = Mi².
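A trend line forced through (0, 1) can be reproduced by ordinary least squares on F – 1 against T² and T with no free intercept. This sketch assumes that is what the fixed-intercept spreadsheet trend line computes, and the helper name is hypothetical. Applied to the Fn column of the normalised table, it recovers coefficients close to 0.260265 and –1.425717.

```python
def fit_through_one(ts: list[float], fs: list[float]) -> tuple[float, float]:
    """Least-squares fit of F = a*T**2 + b*T + 1 with the intercept fixed at 1.

    Solves the 2x2 normal equations for y = F - 1 regressed on T**2 and T.
    """
    ys = [f - 1.0 for f in fs]
    s4 = sum(t ** 4 for t in ts)
    s3 = sum(t ** 3 for t in ts)
    s2 = sum(t ** 2 for t in ts)
    y2 = sum(y * t ** 2 for y, t in zip(ys, ts))
    y1 = sum(y * t for y, t in zip(ys, ts))
    det = s4 * s2 - s3 * s3
    a = (y2 * s2 - s3 * y1) / det   # Cramer's rule for [s4 s3; s3 s2] [a b]' = [y2 y1]'
    b = (s4 * y1 - s3 * y2) / det
    return a, b


T = [0.05 * k for k in range(1, 14)]
FN = [0.928595, 0.860341, 0.789571, 0.726282, 0.662194, 0.596019, 0.533125,
      0.471516, 0.411201, 0.349649, 0.294205, 0.238052, 0.184666]
a, b = fit_through_one(T, FN)
print(f"Fn = {a:.6f} T^2 {b:+.6f} T + 1.000")
```

The same function can be applied to the Fm and Fx columns.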
8. The fitted trend lines provide the following empirical equations for estimating the values of Fn, Fm and Fx:
$$F_n = 0.260265\,T^2 - 1.425717\,T + 1.000$$

$$F_m = 0.262832\,T^2 + 0.000174\,T + 1.000$$

$$F_x = 0.190183\,T^2 + 1.426252\,T + 1.000$$
Example
9. Given the parameters of a symmetrically distributed Humphreys distribution:
Ni = 18,000
Mi = 20,000
Xi = 22,000
It is first necessary to calculate the value of Pm, to confirm that Pm = 0.5:
Pm = (Mi – Ni)/(Xi – Ni) = (20,000 – 18,000)/(22,000 – 18,000) = 0.5, indicating that the condition is satisfied.
The value of T = (Xi – Ni)/(2 x Mi) = (22,000 – 18,000)/(2 x 20,000) = 0.1
With T² = 0.01 and T = 0.1, the constants of each equation are:

| Factor | T² coefficient | T coefficient | Constant |
|---|---|---|---|
| Fn | 0.260265 | -1.425717 | 1.000000 |
| Fm | 0.262832 | 0.000174 | 1.000000 |
| Fx | 0.190183 | 1.426252 | 1.000000 |

Multiplying the constants by T² = 0.01, T = 0.1 and 1, and summing:

| Factor | x T² | x T | x 1 | Total |
|---|---|---|---|---|
| Fn | 0.002603 | -0.142572 | 1.000000 | 0.860031 |
| Fm | 0.002628 | 0.000017 | 1.000000 | 1.002646 |
| Fx | 0.001902 | 0.142625 | 1.000000 | 1.144527 |
Mi² = 400,000,000

Multiplying Fn, Fm and Fx by Mi² yields:
No = 344,012,380
Mo = 401,058,288
Xo = 457,810,812
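The worked example can be wrapped in a single function. This sketch applies the three empirical equations from paragraph 8 to a symmetric input and scales the factors by Mi². The function name and the 1e-9 tolerance on the Pm check are assumptions. It reproduces the No, Mo and Xo values above.

```python
def squared_normal_params(ni: float, mi: float, xi: float) -> dict:
    """Hypothetical helper: estimate No, Mo, Xo for the square of a Normal input."""
    pm = (mi - ni) / (xi - ni)
    if abs(pm - 0.5) > 1e-9:
        raise ValueError(f"input must be symmetric (Pm = 0.5), got Pm = {pm}")
    t = (xi - ni) / (2 * mi)
    coeffs = {                      # (T^2, T, constant) from the empirical equations
        "No": (0.260265, -1.425717, 1.0),
        "Mo": (0.262832, 0.000174, 1.0),
        "Xo": (0.190183, 1.426252, 1.0),
    }
    mi2 = mi * mi
    return {name: (a * t * t + b * t + c) * mi2 for name, (a, b, c) in coeffs.items()}


result = squared_normal_params(18_000, 20_000, 22_000)
for name, value in result.items():
    print(f"{name} = {value:,.0f}")
```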
Conclusion
10. The analysis above shows that it is possible to generate a meaningful parametric relationship for determining the Humphreys distribution parameters of the results obtained by squaring a Normal distribution. However, a number of questions remain. The first is accuracy. The empirical relationships described above are based on single-point results, and the distribution produced by a Monte Carlo simulation is neither smooth nor repeatable. It seems likely, however, that using the results of many simulations would allow the Humphreys distribution parameters to be fitted accurately. Although the analysis shows that the fit of the Humphreys distribution deteriorates as the tolerance T of the input distribution increases, the value of AD is not high enough to reject the null hypothesis that the results could have been drawn at random from a Humphreys distribution. In support of this assertion, the graph below shows the results of a series of 100 simulations designed to determine the significance level associated with the final result No = 73,866,500, Mo = 356,232,000 and Xo = 801,269,800 (Pm = 0.38818).
11. Referring to the above graph, at the AD of 0.677293 associated with the last result, the significance level is approximately 60%. This is high, because the null hypothesis is usually rejected only when the significance level falls below 0.05, which corresponds to AD values greater than about 2.4. Hence it can be claimed that the Humphreys distribution provides a good fit to the results of squaring normal distributions, at least up to T = 0.65.
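The significance levels in paragraphs 10 and 11 come from simulating the AD statistic under the null hypothesis. The paper does not give the Humphreys CDF, so this sketch uses a standard shortcut rather than the author's procedure. If the hypothesised CDF F is continuous and fully specified, F(X) is Uniform(0, 1), so the null distribution of A² is the same for every F, and uniform draws can stand in for transformed Humphreys samples. There is one caveat. When the parameters are fitted to the same data, as they are here, the correct null distribution lies lower, so p-values from this sketch tend to be higher than the correct ones. The function names, n = 500 and the 1,000 simulations are arbitrary choices.

```python
import math
import random


def anderson_darling(u: list[float]) -> float:
    """A^2 statistic for values already transformed by the hypothesised CDF (u = F(x))."""
    u = sorted(min(max(v, 1e-12), 1 - 1e-12) for v in u)   # guard against log(0)
    n = len(u)
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                for i in range(n))
    return -n - total / n


def null_ad_distribution(n: int, sims: int, seed: int = 1) -> list[float]:
    """Sorted A^2 values for samples that really do come from the hypothesised CDF."""
    rng = random.Random(seed)
    return sorted(anderson_darling([rng.random() for _ in range(n)]) for _ in range(sims))


null = null_ad_distribution(n=500, sims=1000)
p_value = sum(a >= 0.677293 for a in null) / len(null)
critical_5pc = null[int(0.95 * len(null))]
print(f"P(A^2 >= 0.677) ~ {p_value:.2f}; 5% critical value ~ {critical_5pc:.2f}")
```

With these settings, the simulated 5% critical value should come out near the tabulated asymptotic value of about 2.49 for a fully specified distribution. This agrees with the figure of about 2.4 quoted in paragraph 11.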
Appendix
This appendix shows plots of parameters derived, using Excel functions, from the datasets of N = 10,000 values produced by Monte Carlo simulation. The plots are:
a) Skew v T
b) Kurt v T, and
c) Std/Average v T