This is a set of full programs and quick code snippets that I need from time to time. There is a full program that will fit a Normal Distribution (Gaussian Distribution, for those so inclined) to a set of random data that came about from some analysis I was carrying out on atomistic motion. There are subroutines that can be readily applied to other statistical analysis problems.
When I originally put barryhaycock.com together, it hadn't occurred to me to have a section for "Code Snippets", little pieces of code or the like that would be handy to grab-and-go or would be useful to refer colleagues to. So, for the moment, I'm hosting that here. One limitation with Webflow that I discovered was that cutting-and-pasting in code resulted in the loss of all formatting, but the guys in Webflow literally applied a fix within 24 hours to assist there. Had they not, I probably would have just linked to the source files.
One resource I cannot recommend more highly if you need some statistical code is on Dr. D. Bala Subrahamanyam site. I've never met Sabbu, and have no other affiliation with him, but I've found his Fortran Toolbox very useful and it's only fair that I place a link to his work here, the button below will open in a new window.
Sabbu's Fortran ToolboxAnother really helpful site is written by Alexis D. Chorozoglou. It's kind of a beginners' version of the infamous Numerical Recipes in C. The interpolation implementation there has been very useful. That site is linked on the button below (again, opens in a new page).
Numerical AnalysisThis is a full compilable program that will accept a file called "input.dat", which must contain data in a simple column format, X and Y. Y must be dependent on X. It will take the input data and will crudely attempt to fit a normal distribution to the data. It will then call gnuplot and pop up the data and the fit on-screen (Linux / Unix) and will allow the user to then tune the parameters. I have a version that will do multiple overlapping fits of normal distributions within a similarly formatted dataset, which I don't host here for the sake of space. If anyone needs it, though- email me and I'll gladly pass on my source.
This program requires a couple of in-source edits for use, but it is very easy to alter that for input files or command-line arguments as required.
! ============================================================================
! Name : Fit Gaussian.f90
! Author : Barry Haycock
! Version : 0.1
! Description : Part of moving towards a neat way to observe the optical ! calculation output and to choose a better setup, this is a small program to ! fit a single gaussian to some output data.
! ============================================================================
! Next step is going to be a really heavy setup with multiple curves.
! These will sum, so I can use arrays of Mu/Sigma/A/B
program Fit_Gaussian
implicit none
real Mu, sigma, A, B ! Guassian parameters.
real Chi, Chi_squared ! Need this stuff too
real, allocatable :: X_Vals(:)
real, allocatable :: Y_Vals(:)
integer num_points, max_peak_location
real max_peak, std_err, mean_Y, variance, std_dev
character (len=30) filename
character (len=20) plotscriptfile, nullword
!Tweaking Variables:
integer iA, iB, isigma
real dA, dB, dSigma
real new_A, new_B, new_sigma
real final_A, final_B, final_sigma
real current_chi_squared
! Heuristic Tuning variables
logical finished
character option
real current_A, current_B, current_sigma, current_Mu
integer i
! Read in data
filename = "input.dat" ! Obviously, because this is version 0.01
num_points = 100 ! Number of points in the file - This and line above should be made into inputs. allocate (X_Vals(num_points)) allocate (Y_Vals(num_points)) open (unit = 101, file = filename, status = "old")
read (101, *) nullword ! loose the header
do i = 1, num_points ! Read in data
read(101, *) X_Vals(i), Y_Vals(i)
end do
! find the peak
max_peak = 0.0d0
do i = 2, num_points - 1
if (Y_Vals(i) .ge. Y_Vals(i-1) .and. Y_Vals(i) .ge. Y_Vals(i+1)) then
if (Y_Vals(i) .ge. max_peak) then
max_peak = Y_Vals(i)
max_peak_location = i
end if
end if
end do
write (*,*) " "
write (*,*) " "
write (*,*) " "
write (*,*) "Max peak location is ", max_peak_location, "equal to ", max_peak A = max_peak Mu = X_Vals(max_peak_location)
! Now, we fit a guassian.
! This version is going to take some nice, neat initial guesses and work from there.
! Gaussian is of the form: y = B + A(exp((-(x-Mu)^2)/2*sigma^2)
! We will choose a guess value for each of A, B and sigma
! Then graph
! Then calculate Chi_squared
! Then I'll work out a way to iterate the whole thing.
! First guess:
B = 10 ! First guess for offset
sigma = 0.5 ! First guess for sigma
! Calculate chi_squared
chi_squared = 0.0d0
std_err = sqrt(Y_Vals(i))
call calculate_chi_squared(X_Vals, Y_Vals, num_points, A, B, sigma, std_err, Mu, chi_squared)
write(*,*) "Chi Squared = ", chi_squared write(*,*) "When A =", A, "B = ", B, "sigma =", sigma, "and Mu =", Mu
!Sweep sigma for best Chi_squared fit
current_chi_squared = 999.9d10
current_chi_squared = chi_squared dsigma = sigma / 50
final_sigma = 0.0d0
do isigma = -50, 50
new_sigma = sigma + (isigma * dsigma)
call calculate_chi_squared(X_Vals, Y_Vals, num_points, A, B, new_sigma, std_err, Mu, current_chi_squared)
if (current_chi_squared .lt. chi_squared) then
final_sigma = new_sigma
chi_squared = current_chi_squared
end if
end do
if (final_sigma .eq. 0.0) then
write(*,*) "No sigma improvement found"
final_sigma = sigma
end if
!Sweep A for best Chi_squared fit
dA = A/100 new_A = 0.0d0
final_A = 0.0d0
do iA = -50, 50
new_A = A + (iA*dA)
call calculate_chi_squared(X_Vals, Y_Vals, num_points, new_A, B, final_sigma, std_err, Mu, current_chi_squared)
if (current_chi_squared .lt. chi_squared) then
final_A = new_A
chi_squared = current_chi_squared
end if
end do
if (final_A .eq. 0.0) then
write(*,*) "No Amplitude improvement found"
final_A = A
end if
! Sweep B for best Chi_squared fit
dB = B/100
new_B = 0.0d0
final_B = 0.0d0
do iB = -50, 50
new_B = B + (iB*dB)
call calculate_chi_squared(X_Vals, Y_Vals, num_points, final_A, new_B, final_sigma, std_err, Mu, current_chi_squared)
if (current_chi_squared .lt. chi_squared) then final_B = new_B
chi_squared = current_chi_squared
end if
end do
if (final_B .eq. 0.0) then
write(*,*) "No Offset improvement found"
final_B = B
end if
! write out (sort of-) tweaked values
write(*,*) " "
write(*,*) " After Tweaking "
write(*,*) "Chi Squared = ", chi_squared
write(*,*) "When A =", final_A, "B = ", final_B, "sigma =", final_sigma, "and Mu =", Mu
! Send to plot
call all_plot (X_Vals, Y_Vals, num_points, final_A, new_B, final_sigma, std_err, & & Mu, current_chi_squared)
! Ask user for Huerestic input
finished = .false.
current_A = Final_A
current_B = Final_B
current_Mu = Mu
current_sigma = final_sigma
do while (finished .eqv. .false.)
write (*,*) "Which value would you like to change? "
write (*,*) "(A)mplitude, (O)ffset, (S)igma, (M)u or (D)one?"
read (*,*) option
if (option .eq. "A") then
write (*,*) "Current A value is:", current_A
write (*,*) "The Tweaked value was ", Final_A
write (*,*) "what value would you like to try?"
read (*,*) current_A
else if (option .eq. "O") then
write (*,*) "Current Offset value is:", current_B
write (*,*) "The Tweaked value was ", Final_B
write (*,*) "what value would you like to try?"
read (*,*) current_B
else if (option .eq. "S") then
write (*,*) "Current sigma value is:", current_sigma
write (*,*) "The Tweaked value was ", Final_sigma
write (*,*) "what value would you like to try?"
read (*,*) current_sigma
else if (option .eq. "M") then
write (*,*) "Current Mu value is:", current_Mu
write (*,*) "The Tweaked value was ", Mu
write (*,*) "what value would you like to try?"
read (*,*) current_Mu
else if (option .eq. "D") then
write(*,*) "Excellent! Moving on"
finished = .true.
else
write(*,*) "Unknown input, try again"
cycle
end if
! send to plot
call all_plot (X_Vals, Y_Vals, num_points, current_A, current_B, current_sigma, &
& std_err, current_Mu, current_chi_squared)
write (*,*) "New Chi^2 value is: ", current_chi_squared
end do
!deallocates
deallocate (X_Vals)
deallocate (Y_Vals)
end program
subroutine plot_in_gnuplot(X_Vals, Y_Vals, num_points)
implicit none
real, dimension (num_points), intent(in) :: X_Vals
real, dimension (num_points), intent(in) :: Y_Vals
integer, intent(in) :: num_points
integer i !Counter
! writeout tempfile
open(unit = 999, file = "tempDataPoints", status = "replace")
do i = 1, num_points
write(999,*) X_Vals(i), Y_Vals(i)
end do
close (999)
! Writeout gnuplot script
open (unit = 222, file = "gnuplotTemp", status = "replace")
!write first lines of gnuplot script
write(222, *) "set term x11 persist" !wxt
write(222, *) "set border lw 1"
write(222, *) 'set format x "%2.1f"'
write(222, *) "set mxtics 2"
write(222, *) " set termoption enhanced"
write(222, *) ' set ylabel "Y (arb)" font "Georgia, 12"'
write(222, *) ' set xlabel "X (arb)" font "Georgia, 12"'
write(222, *) ' set xtics font "Georgia, 9"'
write(222, *) ' set ytics font "Georgia, 9"'
write(222, *) 'plot "./tempDataPoints" using 1:2 w p'
close (222)
call system( "gnuplot gnuplotTemp")
end subroutine plot_in_gnuplot
subroutine calculate_chi_squared(check_X_Vals, check_Y_Vals, num_points, check_A, &
& check_B, check_sigma, std_err, check_Mu, check_chi_squared)
implicit none
real, dimension (num_points), intent(in) :: check_X_Vals
real, dimension (num_points), intent(in) :: check_Y_Vals
integer, intent(in) :: num_points
real, intent(in) :: check_sigma, check_Mu
real, intent(in) :: check_A, check_B
real, intent(out) :: check_chi_squared
integer i !Counter
real std_err
real gaussian, residual
check_chi_squared = 0.00d0
do i = 1, num_points
! Calculate residual of data
! Calculate current gaussian at this point
gaussian = check_B + check_A*(exp((( -1 * (check_X_Vals(i) - check_Mu)**2))/ (2 * check_sigma**2)))
residual = check_Y_Vals(i) - gaussian
check_chi_squared = check_chi_squared + (residual**2 / std_err**2)
end do
return
end subroutine
subroutine stats(Vals, num_points, avg, variance, sddev, std_err)
implicit none
real, dimension (num_points), intent(in) :: Vals
integer, intent(in) :: num_points
real, intent(out) :: avg, variance, sddev, std_err
integer i
real sum_Vals
sum_Vals = sum(Vals)
write(*,*) "sum_Vals ", sum_Vals
avg=sum_Vals/num_points
variance = 0
do i=1, num_points
variance = variance + (((Vals(i)-avg)**2.0))
end do
variance = variance/num_points
sddev = sqrt(variance)
std_err = sddev/sqrt(real(num_points))
end subroutine stats
subroutine all_plot(check_X_Vals, check_Y_Vals, num_points, check_A, check_B, &
& check_sigma, std_err, check_Mu, check_chi_squared)
implicit none
integer, intent(in) :: num_points
real, dimension (num_points), intent(in) :: check_X_Vals
real, dimension (num_points), intent(in) :: check_Y_Vals
real, intent(in) :: check_sigma, check_Mu, std_err
real, intent(in) :: check_A, check_B
real, intent(out) :: check_chi_squared
real gaussian
integer i !Counter
call calculate_chi_squared(check_X_Vals, check_Y_Vals, num_points, check_A, &
& check_B, check_sigma, std_err, check_Mu, check_chi_squared)
! writeout tempfile
open(unit = 999, file = "tempDataPoints", status = "replace")
do i = 1, num_points
gaussian = check_B + check_A*(exp((( -1 * (check_X_Vals(i) - check_Mu)**2))/ (2 * check_sigma**2)))
write(999,*) check_X_Vals(i), check_Y_Vals(i), gaussian
end do
close (999)
! Writeout gnuplot script
open (unit = 222, file = "gnuplotTemp", status = "replace")
!write first lines of gnuplot script
write(222, *) "set term x11 persist" !wxt
write(222, *) "set border lw 1"
write(222, *) 'set format x "%2.1f"'
write(222, *) "set mxtics 2"
write(222, *) " set termoption enhanced"
write(222, *) ' set ylabel "Y (arb)" font "Georgia, 12"'
write(222, *) ' set xlabel "X (arb)" font "Georgia, 12"'
write(222, *) ' set xtics font "Georgia, 9"'
write(222, *) ' set ytics font "Georgia, 9"'
write(222, *) 'plot "./tempDataPoints" using 1:2 w p title "data", "./tempDataPoints" using 1:3 w l title "Gaussian"'
close (222)
call system( "gnuplot gnuplotTemp")
end subroutine all_plot
For my purposes here, an xyz file is a very simple format. The header is a single line containing the number of atoms in the file. Following that is the list of atoms with the format nZ, x, y, z. Where x, y, z are cartesian coordinate positions and nZ is the atomic number of whatever element that is at that point.
randomXYZSubstituter simply passes through the xyz file and replaces some ratio of one element with another by changing the nZ value. Usage is:
> ./randomXYZSubstituter.x [SYMBOL1] [SYMBOL2] [AMOUNT] [SEED]
where [SYMBOL1] is the atomic number of the element to be replaced, [SYMBOL2] is the atomic number of the element to replace it with. [AMOUNT] is the ratio of the number of atoms of species1 to replace with species 2 (for example, 0.1 means replace 10% of the atoms of atomic number [SYMBOL1]). [SEED] is the random seed to use (any integer). [SEED] is a command-line parameter because if we use linux time as the seed in an array job, many of the "random" outputs will be identical.
randomXYZSubstituter reads in a file called "input.xyz" and outputs a file called "output.xyz".
program randomXYZSubstituter
implicit none
! Description:
! This program accepts 4 command line arguments so as it can be used for array-job submission.
! This will change read in the input.xyz file and output an output.xyz file with the first replace with the second randomly to the
! amount of the third argument the fourth argument is the input to the seed for
! the random number generator to the time + jobID.
! USAGE:
! ./randomxyzSubstitute [SYMBOL1] [SYMBOL2] [AMOUNT] [SEED]
!
! where :
! [SYMBOL1] is the symbol of the atom to be replaced
! [SYMBOL2] is the symbol of the atom to raplace [SYMBOL1]
! [AMOUNT] is some less than zero real number akin to the ratio of SYMBOL1s to
! be replaced by SYMBOL2s
! [SEED] is some integer to serve as a radom seed.
! Author:
! Barry Haycock
! www.barryhaycock.com
! Subroutine Variables
integer i ! counter over atoms for writeout
integer seedNumber
character (len = 5) nullWord
! Subroutine Variables
integer ix ! counter over vectors
integer, allocatable :: OrigSpecies(:)
integer, allocatable :: FinalSpecies(:)
real, allocatable :: FinalX(:), FinalY(:), FinalZ(:)
integer j, k, atom, natoms
real x, y, z
integer finalCount
integer, dimension(3) :: tarray
integer dopee, dopand
integer numDoped, numVictims
integer PossibleVictim
real percent
integer, allocatable :: victimList (:)
integer numDopees
! Procedure:
! Open input file:
open (unit = 999, file = "input.xyz", status = "old")
read (999,*) natoms
allocate (OrigSpecies(natoms))
allocate (FinalX(natoms))
allocate (FinalY(natoms))
allocate (FinalZ(natoms))
allocate (FinalSpecies(natoms))
do i = 1, natoms
read(999, *) OrigSpecies(i), FinalX(i), FinalY(i), FinalZ(i)
end do
close (999)
! Time to substitute
call getarg(1, nullWord)
read (nullWord,*) dopee
call getarg(2, nullWord)
read (nullWord,*) dopand
! COunt the number of dopee's in the xyz file
numDopees = 0
do i = 1, natoms
if (OrigSpecies (i) .eq. dopee) then
numDopees = numDopees + 1
end if
end do
allocate (victimList(numDopees))
write(*,*) "Number of replaceable sites:", numDopees
! generate a list of possible Dopees.
j = 0
do i = 1, natoms
if (OrigSpecies(i) .eq. dopee) then
j = j + 1
victimList(j) = i
end if
end do
call getarg(3, nullWord)
read (nullWord,*) percent
! work out how many atoms are going to be replaced:
numVictims =int((j)*percent)
write(*,*) 'Number of atoms to be replaced: ', numVictims
! The FinalSpecies array is initially the OrigSpecies
finalSpecies = OrigSpecies
!initiate new seed based on the input argument (from
!the process ID
!ensures new random numbers for each time the code is run.
call getarg(4, nullWord)
read (nullWord,*) seedNumber
write (*,*) "SeedNumber", seedNumber
call srand(37 + seedNumber)
numdoped = 0
do while (numDoped .ne. numVictims)
PossibleVictim = rand()
PossibleVictim = nint(rand()*(natoms-2))+1
if (FinalSpecies(PossibleVictim) .eq. dopee) then
FinalSpecies(PossibleVictim) = dopand
numDoped = numDoped + 1
end if
end do
!output final xyz file
open(unit=77, file="output.xyz",status="replace")
write(77, '(i4)') natoms
do i=1,natoms
write(77,*) FinalSpecies(i),FinalX(i), FinalY(i), FinalZ(i)
enddo
close(unit=77)
open (unit=78, file="report.txt", status="unknown", position="append")
write (78, *) "Number of atoms replaced was", numDoped
write (78, *) "Total number of possible dopees was", numDopees
write (78, *) "Percent to aim for was ", percent
write (78, *) "Actual percentage was ", real(numDoped/numDopees), "%"
write(78,*) " "
write(78,*) "Replacable Species:", dopee
write(78,*) " Replaced with:", dopand
close (78)
! Deallocate Arrays
deallocate (OrigSpecies)
deallocate (FinalX)
deallocate (FinalY)
deallocate (FinalZ)
deallocate (FinalSpecies)
end program randomxyzSubstituter
This is a twinned pair of subroutines that use recursion to integrate a function via an adaptive modification of Simpson's Rule. Simpson's rule is a way to numerically estimate the area under a curve by approximating the space as a number of boxes and carrying out a weighed sum of these boxes (or bins).
An Adaptive Simpson's Rule essentially keeps trying smaller and smaller boxes (and therefore more and more sums) until the previous iteration is within some tolerance of the current result. In this way, for a well-behaved function it can be far more efficient, however simply fixing the box size and using a standard Simpson's Rule implementation can be more computationally efficient, depending on your circumstances.
If you are unfamiliar with the idea of recursive functions (as many Fortran programmers are) check out Wikipedia or just web-search the term. Essentially a recursive function is a function that can call itself, and as soon as your head stops hurting when you try to imagine the power of that, you have a very useful tool at your disposal.
Usage of the adaptiveSimpson subroutine is to pass it 4 arguments. I use the Fortran "interface" directive to allow a function to be passed to the subroutine, this is the function you are integrating. upperBound and lowerBound are the boundaries within which you are integrating. relErr is the tolerance (where result of iteration i is within tolerance of iteration i + i, then return) and integral is the result of the calculation. I compile this usually as a module in a grander program.
! simpsonAdaptive.f90
! ===========================================================================
! This program, to be placed in an Integrator module carries out integration
! based on the adaptive Simpsons method.
!
!
! Adaptive Simpson's method is based on estimating the error we get from calculating
! a definite integral using Simpson's rule. If the error exceeds a user-specified
! tolerance, the algorithm calls for subdividing the interval of integration in two
! and applying adaptive Simpson's method to each subinterval in a recursive manner.
!
! This continues until:
! |S(a,c) + S(b,c) - S(a,c)|/15 < epsilon
!
! Where:- a and b are the ends of an interval with midpoint c, S() is the Simpson's
! rule estimate of that interval and epsilon is the relative error. The 15 ensures that
! estimates obtained are exact for polynomials of degreee 5 or less. (This is a proof
! from reference below)
!
! More Information:
! William M. McKeeman: Algorithm 145: Adaptive numerical integration by Simpson's rule.
! Commun. ACM 5(12): 604 (1962)
!
!==============================================================================
! Barry Haycock.
! ===========================================================================
!
! Module Declaration
! ===========================================================================
module M_A_Simpson
contains
recursive subroutine adaptive_simpson(f, lowerBound, upperBound, relErr, integral)
implicit none
! Argument Declaration and Description
! ===========================================================================
! Input
real, intent(in) :: lowerBound, upperBound, relErr !Lower and Upper Limits, desired accuracy
interface
function f(x)
real :: f, x
end function f
end interface
!Local Parameters
integer :: intSign, level, maxLevel
real :: h, minimum, middle, maximum, fmin, fmid, fmax, eps, Sac, minEps, temp1, temp2
real, parameter :: minSubinterval = 1.0e-8
!Outputs
real, intent(out) :: integral
integral = 0.0
if(lowerBound == upperBound) return
if(lowerBound < upperBound) then
minimum = lowerBound
maximum = upperBound
intSign = 1
else
minimum = upperBound
maximum = lowerBound
intSign = -1
endif
!Find the maximum level based on your machine accuracy and
!desired relative error. At each new level we half eps and the
!interval. Thus
!maxLevel = int(min(log base 2 of eps/epsilon, &
! log base 2 of first interval/epsilon))
! where epsilon is the least positive number that added to
! 1 returns a number that is greater than 1
minEps = 500.0*epsilon(minimum)
eps = max(abs(relErr), minEps)
temp1 = eps/(5.0*epsilon(minimum))
temp2 = (maximum - minimum)/(5.0*epsilon(minimum))
maxLevel = min( int(log(temp1)/log(2.0)), int(log(temp2)/log(2.0)) )
temp1 = maximum - minimum
if(temp1 < minEps) then
integral = 0.0
return
endif
level = 1
h = (maximum - minimum)/6.0
middle = (minimum + maximum)/2
fmin = f(minimum)
fmid = f(middle)
fmax = f(maximum)
Sac = h*(fmin + 4.0*fmid + fmax)
call simpsonAdp(f, h, minimum, middle, maximum, fmin, fmid, fmax, &
Sac, integral, level, maxLevel, eps)
integral = integral*intSign
end subroutine adaptive_simpson
recursive subroutine simpsonAdp(f, h, xa, xb, xc, fa, fb, fc, Sac, integral, level, maxLevel, eps)
implicit none
integer :: newLevel
integer, intent(in) :: maxLevel
integer, intent(in) :: level
real :: newH, Lxm, Rxm, Lfm, Rfm, QF, left, right, newEps, Sab, Sbc
real, intent(in) :: h, xa, xb, xc, fa, fb, fc, eps, Sac
real, intent(out) :: integral
interface
function f(x)
real :: f, x
end function f
end interface
newLevel = level + 1
newH = h/2.0
if(level > maxLevel) then
integral = Sac
else
Lxm = (xa + xb)/2.0
Lfm = f(Lxm)
Sab = newH*(fa + 4.0*Lfm + fb)
Rxm = (xb + xc)/2.0
Rfm = f(Rxm)
Sbc = newH*(fb + 4.0*Rfm + fc)
QF = Sab + Sbc
if(abs(QF - Sac) < 15.0*eps) then
integral = QF + (QF - Sac)/15.0
else
newEps = eps/2.0
call simpsonAdp(f, newH, xa, Lxm, xb, fa, Lfm, fb, Sab, left, newLevel, maxLevel, newEps)
call simpsonAdp(f, newH, xb, Rxm, xc, fb, Rfm, fc, Sbc, right, newLevel, maxLevel, newEps)
integral = left + right
endif
endif
end subroutine simpsonAdp
end module
I find using ImageMagick to convert pdfs to jpegs. In general I use the pdf terminal in GnuPlot because I prefer the way the output looks. To convert any pdf to jpeg, use this line (assuming GhostScript is installed).
gs -dNOPAUSE -sDEVICE=jpeg -dFirstPage=1 -dLastPage=999 -sOutputFile=[OUTNAME].%d.jpg -dJPEGQ=100 -r500x500 -q [INNAME] -c quit
Where [OUTNAME] is the output name of the file (the output files will be "[OUTNAME].[PAGENUMBER].jpg", in this case and [INNAME] is the name of the input file.
I typically use this to convert a whole bunch of pdfs in one directly, thus:
for a in `ls ../*.pdf` ; do gs -dNOPAUSE -sDEVICE=jpeg -dFirstPage=1 -dLastPage=999 -sOutputFile=$a.%d.jpg -dJPEGQ=100 -r500x500 -q $a -c quit ; done
Which will convert every pdf in the directory in jpgs by page number.