Probability Distributions

Here's a program that explores several probability distributions.  This is definitely a Math Topic, although there will be few if any equations in this write-up.  Any good probability or statistics book will provide enough equations and formulas to confuse any normal human.  The program, on the other hand, has considerable embedded math, as discussed below.

Four of the most common and useful distributions are demonstrated here. 

- Uniform:  A uniform distribution can occur for discrete or continuous variables.  It is characterized by each possible value being equally represented in the population.
- Poisson:  A discrete distribution that represents the number of occurrences of some event of interest in a particular unit of time or space.  Its mathematical equation matches the actual observed rates for many real-life service facilities, and it is frequently used in discrete event simulations.
- Normal:  The continuous distribution that closely approximates many real-life measured parameters drawn from a common population (sizes and weights of manufactured parts, for example).
- Exponential:  The continuous distribution representing the inverse of time-related Poisson distributions.  For customer arrivals, for example, a population with an average of N customers per time unit will average 1/N time units per customer.  More significantly, samples drawn from such a population represent the time to the arrival of the first or next customer.  Equipment failures tend to occur at time intervals that are exponentially distributed, so manufacturers can answer questions like "If this hard drive has an exponentially distributed mean time to failure of 100,000 hours, how much money can we save by reducing the warranty period from 3 years to one year?"
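To make the hard drive question concrete, here is a rough sketch (in Python, not taken from the program's Delphi source) of the failure probabilities involved.  It assumes the drive runs continuously, so that a year is 8,760 hours, and it uses the standard exponential cumulative formula 1 - exp(-t/mean).  The function and variable names are my own for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years (an assumption)


def prob_failure_within(hours, mean_hours):
    """Exponential CDF: probability that the failure occurs at or before `hours`."""
    return 1.0 - math.exp(-hours / mean_hours)


mttf = 100_000.0  # mean time to failure from the example above, in hours
p_one_year = prob_failure_within(1 * HOURS_PER_YEAR, mttf)
p_three_years = prob_failure_within(3 * HOURS_PER_YEAR, mttf)

print(f"Fail within 1 year:  {p_one_year:.3f}")
print(f"Fail within 3 years: {p_three_years:.3f}")
```

Under these assumptions roughly 8% of drives fail in the first year and roughly 23% within three years, so the shorter warranty would avoid the claims on the difference.  Turning that into money would need the replacement cost per drive, which the example does not give.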

Each of the four operates the same way: the program generates a specified number of random samples drawn from a population with the distribution being demonstrated.  For each set of samples, four charts are available.  For discrete variables, each value is represented by a bar.  If the distribution is continuous, the user specifies the number of "buckets" (bars) to create.

Here are sample charts from the Normal Distribution page:

- Frequency:  The number of observations for each value (discrete) or interval (continuous).
- Cumulative Frequency:  The cumulative number of observations for each value (discrete) or interval (continuous).  Each bar represents the total number of samples for all values less than or equal to the current value.
- Probability Density:  The frequency chart with each bar's height divided by the total number of observations.
- Cumulative Probability Distribution:  The cumulative frequency chart with each bar's height divided by the total number of observations.
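The relationship among the four charts can be sketched in a few lines.  This is an illustrative Python version, not the program's Delphi code; the function name chart_series and the sample counts are made up, and it assumes at least one observation so the total is not zero.

```python
from itertools import accumulate


def chart_series(freq_count):
    """Return the four bar series derived from per-bar observation counts."""
    total = sum(freq_count)                              # number of observations
    cumulative = list(accumulate(freq_count))            # running total of counts
    density = [c / total for c in freq_count]            # each bar / total
    cum_probability = [c / total for c in cumulative]    # last bar reaches 1.0
    return list(freq_count), cumulative, density, cum_probability


counts = [2, 5, 8, 4, 1]  # hypothetical counts for five bars
freq, cum_freq, dens, cum_prob = chart_series(counts)
print(cum_freq)  # [2, 7, 15, 19, 20]
print(dens)
print(cum_prob)
```

Only the first series depends on the sample data directly; the other three are derived from it.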

Each chart shows the distribution of the sample as a bar chart overlaid with a line chart showing the theoretical distribution for the entire population being sampled.

There is also a page illustrating the amazing Central Limit Theorem.  The essence of the theorem is that the distribution of the sum of a large number of independent random variables, whatever their individual distributions (under mild conditions such as finite variance), approximates a normal distribution!  This probably accounts for the usefulness of the normal distribution in tracking errors, for example, since manufacturing errors often have many independent causes which sum to the total measured error in a finished product.  The theorem also applies to sums of samples drawn from the same distribution.  That is the case demonstrated in this program, where we sum samples of a uniformly distributed random variable (each sample value has the same probability of occurring) to form new samples which are approximately normally distributed.

Non-programmers are welcome to read on, but may want to skip to the bottom of the page to download the executable in zipped format.

Notes for Programmers

The charts in this program are generated using Delphi's TChart component.  If you have the "Standard" or "Personal" edition of Delphi, it will likely not include TChart and you will not be able to recompile.  "Professional" and higher editions, at least for D5, D6, and D7, include TChart.

The four pages for the four distributions are very similar.  Each page has some TEdit controls to obtain the parameters for that distribution, a "Create a set" button to make a set of random data points drawn from the distribution under study, and a TRadioGroup box to specify which of the four plot types is to be displayed.

The TEdit controls which collect integer values are associated with TUpDown controls, just to shift the responsibility for editing input numbers back to Delphi.  For real-valued inputs, I used the Val procedure to detect invalid decimal number inputs.

The Create a set click procedure generates the data.  Conceptually, we choose a random number between 0 and 1 and then apply the inverse probability distribution function to find the data value for this probability.  For discrete distributions, one bar is assigned for each value.  Continuous distributions have a user-assigned number of "buckets", intervals to which data values are assigned.  Each value or bucket is used to accumulate the number of samples which have that value (discrete) or which fall within that interval (continuous).  These FreqCount arrays provide the data for the plots.  The Frequency chart is a straightforward plot of the FreqCount data.  The Cumulative Frequency chart "integrates" the frequency data by adding the areas of the rectangles represented by the bars.  Once we have these, the Probability Density and the Probability Distribution charts are simply copies of the first two with the bar counts normalized by dividing each value by the number of samples taken, so that the density bars sum to 1 and the cumulative bars rise to 1.
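The sampling idea can be sketched as follows.  This is a Python illustration of the general technique (inverse transform sampling followed by bucket counting), not a transcription of the Delphi source; the function names, the seed, and the choice to ignore samples that fall outside the charted range are all assumptions of mine.

```python
import math
import random


def exponential_sample(mean, rng):
    """Apply the inverse of F(x) = 1 - exp(-x/mean) to a uniform random number."""
    u = rng.random()                    # uniform on [0, 1)
    return -mean * math.log(1.0 - u)    # 1 - u lies in (0, 1], so log is defined


def poisson_sample(mean, rng):
    """Discrete inverse: walk the cumulative probabilities until they exceed u."""
    u = rng.random()
    k = 0
    term = math.exp(-mean)              # P(X = 0)
    cumulative = term
    while u >= cumulative and term > 0.0:   # the term check guards against rounding
        k += 1
        term *= mean / k                # P(X = k) computed from P(X = k - 1)
        cumulative += term
    return k


def bucket_counts(samples, low, high, buckets):
    """Count how many samples fall in each of `buckets` equal-width intervals."""
    width = (high - low) / buckets
    freq_count = [0] * buckets
    for x in samples:
        if low <= x < high:             # out-of-range samples are skipped
            index = min(int((x - low) / width), buckets - 1)
            freq_count[index] += 1
    return freq_count


rng = random.Random(2001)               # fixed seed so runs are repeatable
times = [exponential_sample(10.0, rng) for _ in range(10_000)]
freq_count = bucket_counts(times, 0.0, 50.0, 10)
arrivals = [poisson_sample(4.0, rng) for _ in range(10_000)]
print(freq_count)
```

For a discrete distribution such as Poisson, the sampled value itself serves as the bar index, so no bucketing step is needed.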

Once the actual data has been handled, there remains the problem of overlaying a theoretical line chart corresponding to this distribution.  Since the data bars are centered on the value, for continuous distributions we add half of the interval to the data point being evaluated.  For the Normal distribution, there is no explicit formula for the cumulative distribution, so we resort to summing the area of the current rectangle plus the previous rectangles to get the Cumulative Frequency and Cumulative Probability charts.  The Exponential distribution has an explicit formula, which we use for those charts.
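Here is one way the overlay calculation might look, again as a Python sketch rather than the actual Delphi code.  The normal curve is evaluated at each bucket's midpoint and the cumulative curve is built by summing rectangle areas, while the exponential cumulative curve comes straight from its formula.  The function names and the choice of range and bucket count are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_overlay(low, high, buckets, mean, sd):
    """Per-bucket probability and running cumulative from midpoint rectangles."""
    width = (high - low) / buckets
    probs, cumulative, running = [], [], 0.0
    for i in range(buckets):
        midpoint = low + i * width + width / 2.0   # left edge plus half the interval
        area = normal_pdf(midpoint, mean, sd) * width
        running += area                            # current plus previous rectangles
        probs.append(area)
        cumulative.append(running)
    return probs, cumulative


def exponential_cdf(x, mean):
    """Explicit cumulative formula for the exponential distribution."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


probs, cum = normal_overlay(-4.0, 4.0, 40, 0.0, 1.0)
print(cum[-1])                      # close to 1, since +/- 4 sd covers nearly everything
print(exponential_cdf(10.0, 10.0))  # 1 - 1/e, about 0.632
```

To overlay these on the frequency charts instead of the probability charts, each value would be multiplied by the number of samples taken.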

The Central Limit page illustrates the theorem by summing fixed-size sets of uniformly distributed random numbers and plotting the resulting frequency chart to show the characteristic bell-shaped curve that emerges as the number of samples and the subset size increase.
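A small experiment along the same lines, sketched in Python rather than Delphi, with made-up names and parameters: a uniform random number on [0, 1) has mean 1/2 and variance 1/12, so sums of subset_size of them should have mean subset_size/2 and variance subset_size/12, and about 68% of the sums should fall within one standard deviation of the mean if the result is close to normal.

```python
import math
import random
import statistics


def sums_of_uniforms(num_sums, subset_size, rng):
    """Each new sample is the sum of `subset_size` uniform random numbers."""
    return [sum(rng.random() for _ in range(subset_size))
            for _ in range(num_sums)]


rng = random.Random(1)                 # fixed seed so runs are repeatable
subset_size = 12                       # illustrative choice; gives variance 1
sums = sums_of_uniforms(20_000, subset_size, rng)

mean = statistics.mean(sums)           # expect about subset_size / 2
variance = statistics.variance(sums)   # expect about subset_size / 12
sd = math.sqrt(variance)
within_one_sd = sum(1 for s in sums if abs(s - mean) <= sd) / len(sums)

print(mean, variance, within_one_sd)
```

Feeding these sums into a bucket count would give the bell-shaped frequency chart described above.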

I'm sure that there are a number of bugs left in this program.  If you happen to find one, use the feedback link to let me know.

Running/Exploring the Program 

- Download source
- Download executable

 

Created October 15, 2001

Modified July 29, 2017

 
Copyright 2000-2017, Gary Darby    All rights reserved.