Title: | Statistical Modelling for Plant Size Distributions |
---|---|
Description: | Developed for the following tasks. 1) Computing the probability density function, cumulative distribution function, random generation, and estimating the parameters of the eleven mixture models. 2) Point estimation of the parameters of the two-parameter Weibull distribution using twelve methods and the three-parameter Weibull distribution using nine methods. 3) Bayesian inference for the three-parameter Weibull distribution. 4) Estimating parameters of the three-parameter Birnbaum-Saunders, generalized exponential, and Weibull distributions fitted to grouped data using three methods: approximated maximum likelihood, expectation maximization, and maximum likelihood. 5) Estimating the parameters of the gamma, log-normal, and Weibull mixture models fitted to grouped data through the EM algorithm. 6) Estimating parameters of the nonlinear height curve fitted to height-diameter observations. 7) Estimating parameters, computing the probability density function and cumulative distribution function, and generating realizations from the gamma shape mixture model introduced by Venturini et al. (2008) <doi:10.1214/07-AOAS156>. 8) Bayesian inference, computing the probability density function and cumulative distribution function, and generating realizations from the univariate and bivariate Johnson SB distributions. 9) Robust multiple linear regression analysis when the error term follows a skewed t distribution. 10) Estimating parameters of a given distribution fitted to grouped data using the method of maximum likelihood. 11) Estimating parameters of the Johnson SB distribution through the Bayesian, method of moments, conditional maximum likelihood, and two-percentile methods. |
Authors: | Mahdi Teimouri [aut, cre, cph, ctb] |
Maintainer: | Mahdi Teimouri <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.4.3 |
Built: | 2025-02-08 06:23:19 UTC |
Source: | https://github.com/cran/ForestFit |
The DBH data contains the diameter at breast height (dbh), height, and condition data for all trees centered in 108 plots of size 0.2 hectare, recorded immediately following a single prescribed burn, following three 5-year interval reburns (four burns total), and following a single 15-year interval reburn (two burns total), together with the associated treatment information. The tree data were collected in mixed ponderosa pine (Pinus ponderosa Dougl. ex Laws.) stands that contained scattered western junipers (Juniperus occidentalis Hook.). The plots were located in the Malheur National Forest on the southern end of the Blue Mountains near Burns, Oregon, USA.
data(DBH)
A text file with 5732 observations of 17 variables describing tree characteristics such as dbh and height.
B. K. Kerns, D. J. Westlind, and M. A. Day, 2017. Season and interval of burning and cattle exclusion in the southern Blue Mountains, Oregon: Overstory tree height, diameter and growth. Forest Service Research Data Archive, <doi:10.2737/RDS-2017-0041>.
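Computes the probability density function (pdf) of the gamma shape mixture (GSM) model. The general form for the pdf of the GSM model is given by
$$f(x,\Theta)=\sum_{j=1}^{K}\omega_j\frac{\beta^{j}}{\Gamma(j)}x^{j-1}\exp(-\beta x),$$
where $\Theta=(\omega_1,\dots,\omega_K,\beta)^{T}$ is the parameter vector and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega=(\omega_1,\dots,\omega_K)^{T}$, where the $\omega_j$s sum to one, i.e., $\sum_{j=1}^{K}\omega_j=1$. Here $\beta$ is the rate parameter that is equal for all components.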
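As a rough illustration of the GSM density described above, the following base R sketch evaluates it as a weighted sum of gamma densities with integer shapes 1, ..., K and a common rate. The helper name gsm_pdf is hypothetical and is not part of the package; it only assumes that each component is a gamma density with shape j and rate beta.

```r
# Hypothetical helper (not the package's dgsm): GSM density as a weighted
# sum of gamma densities with shapes 1..K and a common rate beta.
gsm_pdf <- function(x, omega, beta) {
  stopifnot(abs(sum(omega) - 1) < 1e-8, beta > 0)
  K <- length(omega)
  # one column per component: gamma density with shape j and rate beta
  comp <- sapply(seq_len(K), function(j) dgamma(x, shape = j, rate = beta))
  comp <- matrix(comp, nrow = length(x))  # keep a matrix even if length(x) == 1
  as.vector(comp %*% omega)
}

x     <- seq(0, 20, 0.1)
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta  <- 2
dens  <- gsm_pdf(x, omega, beta)

# sanity check: a valid density integrates to approximately one
total <- integrate(gsm_pdf, 0, Inf, omega = omega, beta = beta)$value
```

If the package is installed, dens could be compared against dgsm(x, omega, beta) as a cross-check.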
dgsm(data, omega, beta, log = FALSE)
data |
Vector of observations. |
omega |
Vector of the mixing parameters. |
beta |
The rate parameter. |
log |
If TRUE, the logarithm of the pdf is returned. By default log = FALSE. |
A vector of the same length as data, giving the pdf of the GSM model.
Mahdi Teimouri
S. Venturini, F. Dominici, and G. Parmigiani, 2008. Gamma shape mixtures for heavy-tailed distributions, The Annals of Applied Statistics, 2(2), 756–776.
data <- seq(0, 20, 0.1)
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta <- 2
dgsm(data, omega, beta)
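Computes the probability density function of the four-parameter JSB distribution given by
$$f(x\mid\Theta)=\frac{\delta\lambda}{\sqrt{2\pi}\,(x-\xi)(\lambda+\xi-x)}\exp\left\{-\frac{1}{2}\left[\gamma+\delta\log\left(\frac{x-\xi}{\lambda+\xi-x}\right)\right]^{2}\right\},$$
where $\xi<x<\lambda+\xi$, $\Theta=(\delta,\gamma,\lambda,\xi)^{T}$ with $\delta,\lambda>0$, $-\infty<\gamma<\infty$, and $-\infty<\xi<\infty$.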
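The four-parameter JSB density can be sketched directly in base R. The helper jsb_pdf below is hypothetical (it is not the package's djsb) and assumes the standard Johnson SB form with parameters ordered as delta, gamma, lambda, xi and support (xi, xi + lambda).

```r
# Hypothetical helper (not the package's djsb): Johnson SB density with
# param = c(delta, gamma, lambda, xi) and support (xi, xi + lambda).
jsb_pdf <- function(x, param) {
  delta <- param[1]; gamma <- param[2]; lambda <- param[3]; xi <- param[4]
  out <- numeric(length(x))              # zero outside the support
  inside <- x > xi & x < xi + lambda
  xs <- x[inside]
  z <- gamma + delta * log((xs - xi) / (lambda + xi - xs))
  out[inside] <- delta * lambda /
    (sqrt(2 * pi) * (xs - xi) * (lambda + xi - xs)) * exp(-0.5 * z^2)
  out
}

param <- c(1, 3, 12, 5)                  # delta, gamma, lambda, xi
# sanity check: the density integrates to approximately one over its support
area <- integrate(jsb_pdf, 5, 17, param = param)$value
```

The parameter values mirror those used in the package example for djsb.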
djsb(data, param, log = FALSE)
data |
Vector of observations. |
param |
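Vector of the parameters $\delta$, $\gamma$, $\lambda$, and $\xi$, in that order. |
log |
If TRUE, the logarithm of the pdf is returned. By default log = FALSE. |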
A vector of length n, the number of observations in data, giving the density function of the JSB distribution.
Mahdi Teimouri
delta <- 1
gamma <- 3
lambda <- 12
xi <- 5
param <- c(delta, gamma, lambda, xi)
data <- rjsb(20, param)
djsb(data, param, log = FALSE)
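Computes the probability density function of the 9-parameter JSBB distribution given by
$$f(x_1,x_2\mid\Theta)=\frac{\delta_1\delta_2\lambda_1\lambda_2}{2\pi\sqrt{1-\rho^{2}}\,(x_1-\xi_1)(\lambda_1+\xi_1-x_1)(x_2-\xi_2)(\lambda_2+\xi_2-x_2)}\exp\left\{-\frac{z_1^{2}+z_2^{2}-2\rho z_1 z_2}{2(1-\rho^{2})}\right\},$$
where
$$z_i=\gamma_i+\delta_i\log\left(\frac{x_i-\xi_i}{\lambda_i+\xi_i-x_i}\right)$$
for $i=1,2$. The parameter vector of the SBB distribution is $\Theta=(\delta_1,\gamma_1,\lambda_1,\xi_1,\delta_2,\gamma_2,\lambda_2,\xi_2,\rho)^{T}$, in which $\delta_1,\delta_2>0$, $-\infty<\gamma_1,\gamma_2<\infty$, $\lambda_1,\lambda_2>0$, and $-\infty<\xi_1,\xi_2<\infty$. The supports of the marginals are $\xi_1<x_1<\lambda_1+\xi_1$ and $\xi_2<x_2<\lambda_2+\xi_2$.
The correlation parameter satisfies $-1<\rho<1$.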
djsbb(data, param, log = FALSE)
data |
Vector of observations. |
param |
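Vector of the nine parameters, ordered as $\delta_1$, $\gamma_1$, $\lambda_1$, $\xi_1$, $\delta_2$, $\gamma_2$, $\lambda_2$, $\xi_2$, and $\rho$. |
log |
If TRUE, the logarithm of the pdf is returned. By default log = FALSE. |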
A vector of length n, the number of observations in data, giving the density function of the JSBB distribution.
Mahdi Teimouri
Delta <- c(2.5, 3)
Gamma <- c(2, 1)
Lambda <- c(1, 3)
Xi <- c(0, 2)
rho <- -0.5
param <- c(Delta[1], Gamma[1], Lambda[1], Xi[1], Delta[2], Gamma[2], Lambda[2], Xi[2], rho)
data <- rjsbb(20, param)
djsbb(data, param, log = FALSE)
Computes the probability density function (pdf) of the mixture model. The general form for the pdf of the mixture model is given by
$$f(x,\Theta)=\sum_{j=1}^{K}\omega_j f_j(x,\theta_j),$$
where $\Theta=(\theta_1,\dots,\theta_K)^{T}$ is the whole parameter vector, $\theta_j$ for $j=1,\dots,K$ is the parameter vector of the $j$-th component, i.e. $\theta_j=(\alpha_j,\beta_j)^{T}$, $f_j(\cdot,\theta_j)$ is the pdf of the $j$-th component, and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega=(\omega_1,\dots,\omega_K)^{T}$, where the $\omega_j$s sum to one, i.e., $\sum_{j=1}^{K}\omega_j=1$. Parameters $\alpha_j$ and $\beta_j$ are the shape and scale parameters of the $j$-th component, or both are shape parameters. In the latter case, the parameters $\alpha_j$ and $\beta_j$ are called the first and second shape parameters, respectively. The families considered for each component include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, gamma, Gompertz, log-normal, log-logistic, Lomax, skew-normal, and Weibull with pdf given by the following.
Birnbaum-Saunders
Burr XII
Chen
F
Frechet
gamma
Gompertz
log-logistic
log-normal
Lomax
skew-normal
Weibull
Here $\theta=(\alpha,\beta)^{T}$. In the skew-normal case, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and distribution functions of the standard normal distribution, respectively.
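To make the mixture form concrete, here is a minimal base R sketch for the Weibull family only. The helper weibull_mixture_pdf is hypothetical and not the package's dmixture; it assumes the parameter vector is laid out as the K weights, then the K shapes, then the K scales (the layout used in the package example), and that the Weibull component uses the same shape and scale parameterization as stats::dweibull.

```r
# Hypothetical helper (not the package's dmixture): K-component Weibull
# mixture density with param = c(weights, shapes, scales).
weibull_mixture_pdf <- function(x, K, param) {
  stopifnot(length(param) == 3 * K)
  omega <- param[1:K]
  alpha <- param[(K + 1):(2 * K)]
  beta  <- param[(2 * K + 1):(3 * K)]
  stopifnot(abs(sum(omega) - 1) < 1e-8)
  dens <- numeric(length(x))
  for (j in seq_len(K)) {
    # add the weighted density of the j-th component
    dens <- dens + omega[j] * dweibull(x, shape = alpha[j], scale = beta[j])
  }
  dens
}

K     <- 2
param <- c(0.6, 0.4, 1, 2, 2, 1)   # weights, shapes, scales
x     <- seq(0, 20, 0.1)
dens  <- weibull_mixture_pdf(x, K, param)

# sanity check: the mixture density integrates to approximately one
total <- integrate(function(u) weibull_mixture_pdf(u, K, param), 0, Inf)$value
```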
dmixture(data, g, K, param)
data |
Vector of observations. |
g |
Name of the family, given as a character string such as "weibull". |
K |
Number of components. |
param |
Vector of the parameters, ordered as the mixing parameters $\omega$, then $\alpha$, then $\beta$. |
For the skew-normal case, $\alpha$, $\beta$, and $\lambda$ are the location, scale, and skewness parameters, respectively.
A vector of the same length as data, giving the pdf of the mixture model of families computed at data.
Mahdi Teimouri
data <- seq(0, 20, 0.1)
K <- 2
weight <- c(0.6, 0.4)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
dmixture(data, "weibull", K, param)
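Suppose $x=(x_1,\dots,x_n)^{T}$ denotes a vector of $n$ independent observations coming from a four-parameter JSB distribution with probability density function given by
$$f(x\mid\Theta)=\frac{\delta\lambda}{\sqrt{2\pi}\,(x-\xi)(\lambda+\xi-x)}\exp\left\{-\frac{1}{2}\left[\gamma+\delta\log\left(\frac{x-\xi}{\lambda+\xi-x}\right)\right]^{2}\right\},$$
where $\xi<x<\lambda+\xi$, $\Theta=(\delta,\gamma,\lambda,\xi)^{T}$ with $\delta,\lambda>0$, $-\infty<\gamma<\infty$, and $-\infty<\xi<\infty$. Using the Bayesian approach, we compute the Bayes' estimators of the JSB distribution parameters.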
fitbayesJSB(data, n.burn=8000, n.simul=10000)
data |
Vector of observations. |
n.burn |
Length of the burn-in period, i.e., the point after which the Gibbs sampler is supposed to attain convergence. By default n.burn = 8000. |
n.simul |
Total number of Gibbs sampler iterations. By default n.simul = 10000. |
The Bayes' estimators are obtained by averaging over all iterations between n.burn and n.simul.
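The averaging step can be pictured with a small base R sketch. The matrix chain below is a stand-in for sampler output (one row per iteration, one column per parameter) and is simulated here purely for illustration; it is not produced by the package, and whether iteration n.burn itself is discarded is an assumption of this sketch.

```r
set.seed(1)
n.burn  <- 4000
n.simul <- 5000

# Stand-in for Gibbs sampler output (assumption: rows are iterations,
# columns are parameters); real draws would come from the sampler itself.
chain <- cbind(delta = rnorm(n.simul, mean = 1, sd = 0.1),
               gamma = rnorm(n.simul, mean = 3, sd = 0.2))

# Discard the burn-in iterations and average the remaining draws
keep      <- (n.burn + 1):n.simul
bayes_est <- colMeans(chain[keep, , drop = FALSE])
```

With these settings the estimate is based on the last n.simul - n.burn = 1000 draws, so a longer run after burn-in gives a less noisy posterior mean.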
A list of objects in two parts:
Bayes' estimators of the parameters.
A sequence of four goodness-of-fit measures consisting of the Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
Mahdi Teimouri
N. L. Johnson, 1949. Systems of frequency curves generated by methods of translation, Biometrika, 36, 149–176.
N. L. Johnson, S. Kotz, and N. Balakrishnan, 1994. Continuous Univariate Distributions, volume I, John Wiley & Sons.
# Here we use the SW dataset provided by FIA that represents a typical loblolly pine plantation.
# As the variable of interest, we fit the JSB distribution to the diameter at breast height (SW$DIA)
# in inches.
data(SW)
data <- SW$DIA
fitbayesJSB(data, n.burn = 4000, n.simul = 5000)
Suppose $x=(x_1,\dots,x_n)^{T}$ denotes a vector of $n$ independent observations coming from a three-parameter Weibull distribution. Using the methodology given in Green et al. (1994), we compute the Bayes' estimators of the shape, scale, and location parameters.
fitbayesWeibull(data, n.burn=8000, n.simul=10000)
data |
Vector of observations. |
n.burn |
Length of the burn-in period, i.e., the point after which the Gibbs sampler is supposed to attain convergence. By default n.burn = 8000. |
n.simul |
Total number of Gibbs sampler iterations. By default n.simul = 10000. |
The Bayes' estimators are obtained by averaging over all iterations between n.burn and n.simul.
A list of objects in two parts:
Bayes' estimators of the parameters.
A sequence of four goodness-of-fit measures consisting of the Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
The methodology used here for computing the Bayes' estimator of the location parameter is different from that used by Green et al. (1994). This means that the location parameter is allowed to be any real value.
Mahdi Teimouri
E. J. Green, F. A. Roesch Jr., A. F. M. Smith, and W. E. Strawderman, 1994. Bayesian estimation for the three-parameter Weibull distribution with tree diameter data, Biometrics, 50(1), 254-269.
n <- 100
alpha <- 2
beta <- 2
theta <- 3
data <- rweibull(n, shape = alpha, scale = beta) + theta
fitbayesWeibull(data, n.burn = 4000, n.simul = 5000)
Estimates the parameters of nine well-known three-parameter nonlinear curves fitted to the height-diameter observations. These nine models are given by the following.
Richards (Richards(1959))
Gompertz (Winsor(1992))
Hossfeld IV (Zeide(1993))
Korf (Flewelling and De Jong(1994))
logistic (Pearl and Reed (1920))
Prodan (Prodan(1968))
Ratkowsky (Ratkowsky(1990))
Sibbesen (Huang et al. (1992))
Weibull (Yang et al. (1978))
fitcurve(h,d,model,start)
h |
Vector of height observations. |
d |
Vector of diameter observations. |
model |
The name of the model to fit, given as a character string such as "weibull". |
start |
A vector of starting values for the three model parameters. |
A list of objects with the following parts:
Estimated parameters and corresponding summaries including standard errors, computed t-statistics, and p-values.
Residuals.
Covariance matrix of the estimated model parameters (coefficients).
Residual standard error.
Number of trials for attaining convergence.
The height-diameter scatterplot superimposed by the fitted model.
Mahdi Teimouri
J. W. Flewelling and R. De Jong, 1994. Considerations in simultaneous curve fitting for repeated height-diameter measurements, Canadian Journal of Forest Research, 24(7), 1408-1414.
S. Huang, S. J. Titus, and D. P. Wiens, 1992. Comparison of nonlinear height-diameter functions for major Alberta tree species, Canadian Journal of Forest Research, 22, 1297-1304.
R. Pearl and L. J. Reed, 1920. On the rate of growth of the population of the United States since 1790 and its mathematical representation, Proceedings of the National Academy of Sciences of the United States of America, 6(6), 275.
M. Prodan, 1968. The spatial distribution of trees in an area, Allg. Forst Jagdztg, 139, 214-217.
D. A. Ratkowsky, 1990. Handbook of nonlinear regression, New York, Marcel Dekker, Inc.
F. J. Richards, 1959. A flexible growth function for empirical use, Journal of Experimental Botany, 10, 290-300.
S. B. Winsor, 1992. The Gompertz curve as a growth curve, Proceedings of the National Academy of Sciences, USA, 18, 1-8.
R. C. Yang, A. Kozak, and J. H. G. Smith, 1978. The potential of Weibull-type functions as flexible growth curves, Canadian Journal of Forest Research, 8, 424-431.
B. Zeide, 1993. Analysis of growth equations, Forest Science, 39, 594-616.
# Use the height and diameter at breast height (dbh) of plot 55 in the DBH data set.
# The first column of the DBH data set contains the plot number.
# D (dbh) and H (height) are read from columns 10 and 11 of the DBH data set, respectively.
data(DBH)
D <- DBH[DBH[, 1] == 55, 10]
H <- DBH[DBH[, 1] == 55, 11]
start <- c(9, 5, 2)
fitcurve(H, D, "weibull", start = start)
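Suppose a sample of $n$ independent observations, each following a three-parameter BS, GE, or Weibull distribution, has been divided into $m$ separate groups of the form $(r_{i-1},r_i]$, for $i=1,\dots,m$. Then the likelihood function is given by
$$L(\Theta)=\frac{n!}{f_1!\,f_2!\cdots f_m!}\prod_{i=1}^{m}\bigl[F(r_i\mid\Theta)-F(r_{i-1}\mid\Theta)\bigr]^{f_i},$$
where $r_0$ is the lower bound of the first group, $r_m$ is the upper bound of the last group, and $f_i$ is the frequency of observations within the $i$-th group provided that $n=\sum_{i=1}^{m}f_i$. The cdfs of the three-parameter BS, GE, and Weibull distributions are functions of $\Theta=(\alpha,\beta,\mu)^{T}$, the shape, scale, and location parameters; for instance, the three-parameter Weibull cdf is $F(x\mid\Theta)=1-\exp\{-((x-\mu)/\beta)^{\alpha}\}$ for $x>\mu$.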
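The grouped-data likelihood can be illustrated with a short base R sketch for the three-parameter Weibull case. The helpers pweibull_loc and grouped_loglik are hypothetical names, not package functions; the sketch drops the constant multinomial coefficient (it does not affect the maximizer) and uses direct numerical maximization with optim, which is only one possible route and is not claimed to be how fitgrouped1 works internally.

```r
# Three-parameter Weibull cdf; zero at or below the location mu
pweibull_loc <- function(x, alpha, beta, mu) {
  pweibull(pmax(x - mu, 0), shape = alpha, scale = beta)
}

# Grouped-data log-likelihood without the constant multinomial coefficient
grouped_loglik <- function(theta, r, f) {
  alpha <- theta[1]; beta <- theta[2]; mu <- theta[3]
  if (alpha <= 0 || beta <= 0) return(-Inf)
  p <- diff(pweibull_loc(r, alpha, beta, mu))  # P(r[i-1] < X <= r[i])
  if (any(p[f > 0] <= 0)) return(-Inf)         # impossible non-empty group
  sum(f[f > 0] * log(p[f > 0]))
}

r     <- c(0, 1, 2, 3, 4, 10)   # group bounds r_0, ..., r_m
f     <- c(2, 8, 12, 15, 4)     # group frequencies f_1, ..., f_m
start <- c(2, 2, 0)             # shape, scale, location

# Maximize the log-likelihood by minimizing its negative
fit <- optim(start, function(th) -grouped_loglik(th, r, f),
             method = "Nelder-Mead")
fit$par
```

Returning -Inf for invalid parameter values keeps the search inside the region where every non-empty group has positive probability.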
fitgrouped1(r, f, family, method1, starts, method2)
r |
A numeric vector of length $m+1$ containing the group bounds $r_0,\dots,r_m$. |
f |
A numeric vector of length $m$ containing the group frequencies $f_1,\dots,f_m$. |
family |
Can be either "birnbaum-saunders", "ge", or "weibull". |
method1 |
A character string determining the method of estimation. It can be one of |
"aml" (for the method of approximated maximum likelihood (aml)),
"em" (for the method of expectation maximization (em)), and
"ml" (for the method of maximum likelihood (ml)).
starts |
A numeric vector of the initial values for the shape, scale, and location parameters, respectively. |
method2 |
The method for optimizing the log-likelihood function, given as a character string such as "CG". |
If the method is "em", then the initial values ("starts") and the log-likelihood optimizing method ("method2") are ignored.
A two-part list of objects given by the following:
Estimated parameters of the three-parameter GE, Birnbaum-Saunders, or Weibull distribution fitted to the grouped data.
A sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Chi-square (Chi-square), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
Mahdi Teimouri
G. J. McLachlan and T. Krishnan, 2007. The EM Algorithm and Extensions, John Wiley & Sons.
A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.
M. Teimouri and A. K. Gupta, 2012. Estimation Methods for the Gompertz–Makeham Distribution Under Progressively Type-I Interval Censoring Scheme, National Academy Science Letters, 35(3).
r <- c(0, 1, 2, 3, 4, 10)
f <- c(2, 8, 12, 15, 4)
starts <- c(2, 2, 0)
fitgrouped1(r, f, "birnbaum-saunders", "em")
fitgrouped1(r, f, "weibull", "ml", starts, "CG")
fitgrouped1(r, f, "ge", "em")
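Suppose a sample of $n$ independent observations, each following a distribution with cdf $F(\cdot\mid\Theta)$, has been divided into $m$ separate groups of the form $(r_{i-1},r_i]$, for $i=1,\dots,m$. Then the likelihood function is given by
$$L(\Theta)=\frac{n!}{f_1!\,f_2!\cdots f_m!}\prod_{i=1}^{m}\bigl[F(r_i\mid\Theta)-F(r_{i-1}\mid\Theta)\bigr]^{f_i},$$
where $r_0$ is the lower bound of the first group, $r_m$ is the upper bound of the last group, and $f_i$ is the frequency of observations within the $i$-th group provided that $n=\sum_{i=1}^{m}f_i$.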
fitgrouped2(r, f, param, start, cdf, pdf, method = "Nelder-Mead", lb = 0, ub = Inf , level = 0.05)
r |
A numeric vector of the group bounds. |
f |
A numeric vector of the group frequencies. |
param |
Vector of the names of the family's parameters. |
start |
Vector of the initial values. |
cdf |
Expression of the cumulative distribution function. |
pdf |
Expression of the probability density function. |
method |
The method for numerical optimization. That is "Nelder-Mead" by default. |
lb |
Lower bound of the family's support. That is zero by default. |
ub |
Upper bound of the family's support. That is Inf by default. |
level |
Significance level for constructing the asymptotic confidence interval. That is 0.05 by default. |
A two-part list of objects given by the following:
Maximum likelihood (ML) estimator for the parameters of the family fitted to the grouped data, asymptotic standard error of the ML estimator, lower bound of the asymptotic confidence interval, and upper bound of the asymptotic confidence interval at the given level.
A sequence of goodness-of-fit measures consisting of the Anderson-Darling (AD), Cramer-von Mises (CVM), and Kolmogorov-Smirnov (KS) statistics.
Mahdi Teimouri
r <- c(2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5)
f <- c(33, 111, 168, 147, 96, 45, 18, 4, 0)
param <- c("alpha", "beta", "mu")
pdf <- quote(alpha / beta * ((x - mu) / beta)^(alpha - 1) * exp(-((x - mu) / beta)^alpha))
cdf <- quote(1 - exp(-((x - mu) / beta)^alpha))
lb <- 2
ub <- Inf
start <- c(2, 3, 2)
level <- 0.05
fitgrouped2(r, f, param, start, cdf, pdf, method = "Nelder-Mead", lb = lb, ub = ub, level = level)
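Estimates parameters of the gamma shape mixture (GSM) model whose probability density function takes the following form.
$$f(x,\Theta)=\sum_{j=1}^{K}\omega_j\frac{\beta^{j}}{\Gamma(j)}x^{j-1}\exp(-\beta x),$$
where $\Theta=(\omega_1,\dots,\omega_K,\beta)^{T}$ is the parameter vector and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega=(\omega_1,\dots,\omega_K)^{T}$, where the $\omega_j$s sum to one, i.e., $\sum_{j=1}^{K}\omega_j=1$. Here $\beta$ is the rate parameter that is equal for all components.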
fitgsm(data,K)
data |
Vector of observations. |
K |
Number of components. |
Supposing that the number of components, i.e., $K$, is known, the parameters are estimated through the EM algorithm developed by the maintainer.
A list of objects in three parts:
The EM estimator of the rate parameter.
The EM estimator of the mixing parameters.
A sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
Mahdi Teimouri
A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.
S. Venturini, F. Dominici, and G. Parmigiani, 2008. Gamma shape mixtures for heavy-tailed distributions, The Annals of Applied Statistics, 2(2), 756–776.
n <- 100
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta <- 2
data <- rgsm(n, omega, beta)
K <- length(omega)
fitgsm(data, K)
Suppose $x=(x_1,\dots,x_n)^{T}$ denotes a vector of $n$ independent observations coming from a four-parameter JSB distribution with probability density function given by
$$f(x\mid\Theta)=\frac{\delta\lambda}{\sqrt{2\pi}\,(x-\xi)(\lambda+\xi-x)}\exp\left\{-\frac{1}{2}\left[\gamma+\delta\log\left(\frac{x-\xi}{\lambda+\xi-x}\right)\right]^{2}\right\},$$
where $\xi<x<\lambda+\xi$, $\Theta=(\delta,\gamma,\lambda,\xi)^{T}$ with $\delta,\lambda>0$, $-\infty<\gamma<\infty$, and $-\infty<\xi<\infty$. The parameters are estimated using four methods: the Bayesian approach, the method of conditional maximum likelihood (CML, Johnson (1949)), the method of moments (MM, Fonseca (2009)), and the two-percentile method proposed by Knoebel and Burkhart (1991) (KB). All four estimators are computed with the scale, $\lambda$, and location, $\xi$, parameters predetermined. The method proposed by Ogana (2018) is used for predetermining the scale and location parameters. Let DBH denote the diameter at breast height. For estimating the parameters $\delta$ and $\gamma$ through the Bayesian approach, the location and scale parameters are predetermined from the observed DBH values. For the MM, CML, and KB methods, the parameters $\lambda$ and $\xi$ are predetermined in the same way as suggested by Ogana (2018).
fitJSB(y, n.burn=8000, n.simul=10000)
y |
Vector of DBH observations. |
n.burn |
Length of the burn-in period, i.e., the point after which the Gibbs sampler is supposed to attain convergence. By default n.burn = 8000. |
n.simul |
Total number of Gibbs sampler iterations. By default n.simul = 10000. |
The Bayes' estimators are obtained by averaging over all iterations between n.burn and n.simul.
A list of objects in two parts:
Four estimators including Bayes, MM, CML, and KB.
A sequence of four goodness-of-fit measures consisting of the Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
N. L. Johnson, 1949. Systems of frequency curves generated by methods of translation, Biometrika, 36, 149-176.
B. R. Knoebel and H. E. Burkhart, 1991. A bivariate distribution approach to modeling forest diameter distributions at two points in time, Biometrics, 47, 241-253.
T. F. Fonseca, 2009. Describing maritime pine diameter distributions with Johnson's SB distribution using a new all-parameter recovery approach, Forest Science, 55, 367-373.
F. N. Ogana, 2018. Evaluation of four methods of fitting Johnson’s SBB for height and volume predictions, Journal of Forest Science, 64, 187-197.
# Here we use the SW dataset provided by FIA that represents a typical loblolly pine plantation.
# As the variable of interest, we fit the JSB distribution to the diameter at breast height (SW$DIA)
# in inches.
data(SW)
y <- SW$DIA
fitJSB(y, n.burn = 8000, n.simul = 10000)
Estimates parameters of the mixture model using the expectation maximization (EM) algorithm. The general form for the cdf of a statistical mixture model is given by
$$F(x,\Theta)=\sum_{j=1}^{K}\omega_j F_j(x,\theta_j),$$
where $\Theta=(\theta_1,\dots,\theta_K)^{T}$ is the whole parameter vector, $\theta_j$ for $j=1,\dots,K$ is the parameter vector of the $j$-th component, i.e. $\theta_j=(\alpha_j,\beta_j)^{T}$, $F_j(\cdot,\theta_j)$ is the cdf of the $j$-th component, and the known constant $K$ is the number of components. Parameters $\alpha_j$ and $\beta_j$ are the shape and scale parameters, or both are shape parameters. In the latter case, the parameters $\alpha_j$ and $\beta_j$ are called the first and second shape parameters, respectively. We note that the constants $\omega_j$ sum to one, i.e. $\sum_{j=1}^{K}\omega_j=1$. The families considered for the cdf $F_j$ include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, gamma, Gompertz, log-normal, log-logistic, Lomax, skew-normal, and Weibull.
fitmixture(data, family, K, initial=FALSE, starts)
data |
Vector of observations. |
family |
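Name of the family, given as a character string such as "weibull". |
K |
Number of components. |
initial |
Indicates whether a sequence of initial values is supplied. By default initial = FALSE. |
starts |
If initial = TRUE, the sequence of initial values must be given here. |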
It is worth noting that identifiability of the mixture model is assumed to hold. For the skew-normal case we have $\theta_j=(\alpha_j,\beta_j,\lambda_j)^{T}$, in which $\alpha_j$, $\beta_j$, and $\lambda_j$, respectively, are the location, scale, and skewness parameters of the $j$-th component; see Azzalini (1985).
The output has three parts. The first part includes the vector of estimated weight, shape, and scale parameters.
The second part involves a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
The last part of the output contains the clustering vector.
Mahdi Teimouri
A. Azzalini, 1985. A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.
A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.
M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. EM algorithm for symmetric stable mixture model, Communications in Statistics-Simulation and Computation, 47(2), 582-604.
# Here we model the northern hardwood uneven-age forest data (HW$DIA) in inches using a
# 3-component Weibull mixture distribution.
data(HW)
data <- HW$DIA
K <- 3
fitmixture(data, "weibull", K, initial = FALSE)
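Estimates parameters of the gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation maximization (EM) algorithm. The general form for the cdf of a statistical mixture model is given by
$$F(x,\Theta)=\sum_{j=1}^{K}\omega_j F_j(x,\theta_j),$$
where $\Theta=(\theta_1,\dots,\theta_K)^{T}$ is the whole parameter vector, $\theta_j$ for $j=1,\dots,K$ is the parameter vector of the $j$-th component, i.e. $\theta_j=(\alpha_j,\beta_j)^{T}$, $F_j(\cdot,\theta_j)$ is the cdf of the $j$-th component, and the known constant $K$ is the number of components. Parameters $\alpha_j$ and $\beta_j$ are the shape and scale parameters. The constants $\omega_j$ sum to one, i.e. $\sum_{j=1}^{K}\omega_j=1$. The families considered for the cdf $F_j$ include gamma, log-normal, and Weibull. Suppose a sample of $n$ independent observations, each following a distribution with cdf $F(\cdot,\Theta)$, has been divided into $m$ separate groups of the form $(r_{i-1},r_i]$, for $i=1,\dots,m$. Then the likelihood function of the observed data is given by
$$L(\Theta\mid f_1,\dots,f_m)=\frac{n!}{f_1!\,f_2!\cdots f_m!}\prod_{i=1}^{m}\bigl[F(r_i,\Theta)-F(r_{i-1},\Theta)\bigr]^{f_i},$$
where
$$F(r_i,\Theta)-F(r_{i-1},\Theta)=\sum_{j=1}^{K}\omega_j\int_{r_{i-1}}^{r_i}f_j(x,\theta_j)\,dx,$$
in which $f_j(\cdot,\theta_j)$ denotes the pdf of the $j$-th component and $f_1,\dots,f_m$ are the group frequencies. Using the EM algorithm proposed by Dempster et al. (1977), we can maximize $L(\Theta\mid f_1,\dots,f_m)$ by introducing two new missing variables.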
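The group probabilities entering this likelihood can be computed directly for a Weibull mixture. The base R sketch below is illustrative only: the names mix_cdf, p, and loglik are hypothetical, the data are simulated with base R rather than the package's rmixture, and the constant multinomial coefficient is dropped. It evaluates the grouped log-likelihood at the true parameters and does not implement the EM algorithm itself.

```r
set.seed(123)
n     <- 200
omega <- c(0.3, 0.7)   # mixing weights
alpha <- c(1, 2)       # component shapes
beta  <- c(2, 1)       # component scales

# Simulate from the two-component Weibull mixture
z <- sample(1:2, n, replace = TRUE, prob = omega)
x <- rweibull(n, shape = alpha[z], scale = beta[z])

# Group the data into m bins with bounds r_0, ..., r_m
m <- 10
r <- seq(min(x), max(x), length.out = m + 1)
f <- as.vector(table(cut(x, r, include.lowest = TRUE, right = FALSE)))

# Mixture cdf and the probability of each group
mix_cdf <- function(q) {
  sapply(q, function(v) sum(omega * pweibull(v, shape = alpha, scale = beta)))
}
p <- diff(mix_cdf(r))

# Grouped log-likelihood (without the multinomial constant)
loglik <- sum(f[f > 0] * log(p[f > 0]))
```

An EM or direct optimization routine would treat omega, alpha, and beta as unknowns and search for the values that maximize loglik.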
fitmixturegrouped(family, r, f, K, initial=FALSE, starts)
family |
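Name of the family (gamma, log-normal, or Weibull), given as a character string such as "weibull". |
r |
A numeric vector of length $m+1$ containing the group bounds $r_0,\dots,r_m$. |
f |
A numeric vector of length $m$ containing the group frequencies. |
K |
Number of components. |
initial |
Indicates whether a sequence of initial values is supplied. By default initial = FALSE. |
starts |
If initial = TRUE, the sequence of initial values must be given here. |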
Identifiability of the mixture model is assumed to hold. For the skew-normal mixture model the parameter vector of the $j$-th component takes the form $\theta_j=(\alpha_j,\beta_j,\lambda_j)^{T}$, where $\alpha_j$, $\beta_j$, and $\lambda_j$ denote the location, scale, and skewness parameters, respectively.
The output has two parts. The first part includes the vector of estimated weight, shape, and scale parameters.
The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
Mahdi Teimouri
G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578
n <- 50
K <- 2
m <- 10
weight <- c(0.3, 0.7)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
data <- rmixture(n, "weibull", K, param)
r <- seq(min(data), max(data), length = m + 1)
D <- data.frame(table(cut(data, r, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 4)))
f <- D$Freq
fitmixturegrouped("weibull", r, f, K, initial = FALSE)
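Estimates the parameters of the two- and three-parameter Weibull model with pdf and cdf given by
$$f(x;\alpha,\beta,\theta)=\frac{\alpha}{\beta}\left(\frac{x-\theta}{\beta}\right)^{\alpha-1}\exp\left\{-\left(\frac{x-\theta}{\beta}\right)^{\alpha}\right\},$$
and
$$F(x;\alpha,\beta,\theta)=1-\exp\left\{-\left(\frac{x-\theta}{\beta}\right)^{\alpha}\right\},$$
where $x>\theta$, $\alpha>0$, $\beta>0$ and $-\infty<\theta<\infty$. Here, the parameters $\alpha$, $\beta$, and $\theta$ are known in the literature as the shape, scale, and location, respectively. If $\theta=0$, then $f(x;\alpha,\beta)$ and $F(x;\alpha,\beta)$ above are the pdf and cdf of a two-parameter Weibull distribution, respectively.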
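The three-parameter Weibull pdf and cdf can be obtained from base R by shifting the argument by the location. The helpers dweibull3 and pweibull3 below are hypothetical names used only for this sketch; they rely on stats::dweibull and stats::pweibull using the same shape and scale parameterization as the formulas above.

```r
# Hypothetical helpers: three-parameter Weibull via a location shift.
# With theta = 0 they reduce to the two-parameter Weibull.
dweibull3 <- function(x, alpha, beta, theta = 0) {
  dweibull(x - theta, shape = alpha, scale = beta)
}
pweibull3 <- function(q, alpha, beta, theta = 0) {
  pweibull(q - theta, shape = alpha, scale = beta)
}

alpha <- 2; beta <- 2; theta <- 3
x <- c(3.5, 4, 6)

# Explicit formulas for comparison
explicit_pdf <- (alpha / beta) * ((x - theta) / beta)^(alpha - 1) *
  exp(-((x - theta) / beta)^alpha)
explicit_cdf <- 1 - exp(-((x - theta) / beta)^alpha)

pdf_vals <- dweibull3(x, alpha, beta, theta)
cdf_vals <- pweibull3(x, alpha, beta, theta)
```

This shift is also how the package examples simulate three-parameter data: rweibull(n, shape = alpha, scale = beta) + theta.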
fitWeibull(data, location, method, starts)
data |
Vector of observations |
starts |
Initial values for starting the iterative procedures such as Newton-Raphson. |
location |
Either TRUE or FALSE. If location=TRUE, then the shift parameter is estimated; otherwise the shift parameter is omitted. |
method |
Method used for estimating the parameters, given as a character string. The examples use "mlm" and "ustat" in the two-parameter case and "mps" and "wml" in the three-parameter case. |
For the method wml, all weights have been provided for sample sizes less than or equal to 100. This means that both methods ml and wml give the same estimates for samples of size larger than 100.
A list of objects in two parts given by the following:
Estimated parameters for the two- or three-parameter Weibull distribution.
A sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Cramer-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
Mahdi Teimouri
R. C. H. Cheng and M. A. Stephens, 1989. A goodness-of-fit test using Moran's statistic with estimated parameters, Biometrika, 76(2), 385-392.
A. C. Cohen and B. Whitten, 1982. Modified maximum likelihood and modified moment estimators for the three-parameter Weibull distribution, Communications in Statistics-Theory and Methods, 11(23), 2631-2656.
D. Cousineau, 2009. Nearly unbiased estimators for the three-parameter Weibull distribution with greater efficiency than the iterative likelihood method, British Journal of Mathematical and Statistical Psychology, 62, 167-191.
G. Cran, 1988. Moment estimators for the 3-parameter Weibull distribution, IEEE Transactions on Reliability, 37(4), 360-363.
J. R. Hosking, 1990. L-moments: analysis and estimation of distributions using linear combinations of order statistics, Journal of the Royal Statistical Society. Series B (Methodological), 52(1), 105-124.
Y. M. Kantar, 2015. Generalized least squares and weighted least squares estimation methods for distributional parameters, REVSTAT-Statistical Journal, 13(3), 263-282.
M. Teimouri and S. Nadarajah, 2012. A simple estimator for the Weibull shape parameter, International Journal of Structural Stability and Dynamics, 12(2), 395-402.
M. Teimouri, S. M. Hoseini, and S. Nadarajah, 2013. Comparison of estimation methods for the Weibull distribution, Statistics, 47(1), 93-109.
F. Wang and J. B. Keats, 1995. Improved percentile estimation for the two-parameter Weibull distribution, Microelectronics Reliability, 35(6), 883-892.
L. Zhang, M. Xie, and L. Tang, 2008. On Weighted Least Squares Estimation for the Parameters of Weibull Distribution. In: Pham H. (eds) Recent Advances in Reliability and Quality in Design. Springer Series in Reliability Engineering. Springer, London.
n <- 100
alpha <- 2
beta <- 2
theta <- 3
data <- rweibull(n, shape = alpha, scale = beta) + theta
starts <- c(2, 2, 3)
fitWeibull(data, TRUE, "mps", starts)
fitWeibull(data, TRUE, "wml", starts)
fitWeibull(data, FALSE, "mlm", starts)
fitWeibull(data, FALSE, "ustat", starts)
Tree list from a U.S. Forest Service Forest Inventory and Analysis (FIA) plot (PLT_CN 247006253010661) measured in 2012. The plot represents a typical northern hardwood uneven-aged forest.
data(HW)
A data frame containing 25 trees (rows) and two columns. Columns are the trees' scientific name and diameter at breast height in inches.
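Computes the cumulative distribution function (cdf) of the gamma shape mixture (GSM) model. The general form for the cdf of the GSM model is given by
$$F(x, \Theta) = \sum_{j=1}^{K} \omega_j F(x, j, \beta),$$
where
$$F(x, j, \beta) = \int_{0}^{x} \frac{\beta^{j}}{\Gamma(j)} y^{j-1} \exp(-\beta y)\, dy,$$
in which $\Theta = (\omega_1, \ldots, \omega_K, \beta)^{T}$ is the parameter vector and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega = (\omega_1, \ldots, \omega_K)^{T}$, where the $\omega_j$ sum to one, i.e., $\sum_{j=1}^{K} \omega_j = 1$. Here $\beta$ is the rate parameter that is equal for all components.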
pgsm(data, omega, beta, log.p = FALSE, lower.tail = TRUE)
data |
Vector of observations. |
omega |
Vector of the mixing parameters. |
beta |
The rate parameter. |
log.p |
If TRUE, the logarithm of the cdf is returned. |
lower.tail |
If FALSE, 1 - cdf is returned. |
A vector of the same length as data, giving the cdf of the GSM model.
Mahdi Teimouri
S. Venturini, F. Dominici, and G. Parmigiani, 2008. Gamma shape mixtures for heavy-tailed distributions, The Annals of Applied Statistics, 2(2), 756–776.
data <- seq(0, 20, 0.1)
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta <- 2
pgsm(data, omega, beta)
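The GSM cdf can also be cross-checked with base R alone. The following is a minimal sketch, assuming that component $j$ is a gamma distribution with shape $j$ and common rate beta, as in Venturini et al. (2008). The helper name gsm_cdf is hypothetical and is not part of ForestFit.

```r
# Hypothetical helper (not a ForestFit function): GSM cdf built from pgamma().
gsm_cdf <- function(x, omega, beta) {
  stopifnot(abs(sum(omega) - 1) < 1e-8, beta > 0)
  K <- length(omega)
  # Column j holds the gamma cdf with shape j and common rate beta.
  comp <- sapply(seq_len(K), function(j) pgamma(x, shape = j, rate = beta))
  comp <- matrix(comp, nrow = length(x), ncol = K)  # keep a matrix when length(x) == 1
  # Weighted sum of the component cdfs.
  as.vector(comp %*% omega)
}

x <- seq(0, 20, 0.1)
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
cdf <- gsm_cdf(x, omega, beta = 2)
range(cdf)  # values should lie within [0, 1]
```

If the assumption about the component shapes holds, the result should agree with pgsm(x, omega, 2) up to numerical error.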
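Computes the cumulative distribution function of the four-parameter Johnson SB (JSB) distribution given by
$$F(x \mid \Theta) = \int_{\xi}^{x} \frac{\delta \lambda}{\sqrt{2\pi}\,(u-\xi)(\lambda+\xi-u)} \exp\left\{-\frac{1}{2}\left[\gamma + \delta \log\left(\frac{u-\xi}{\lambda+\xi-u}\right)\right]^{2}\right\} du,$$
where $\xi < x < \lambda + \xi$, $\Theta = (\delta, \gamma, \lambda, \xi)^{T}$ with $\delta, \lambda > 0$, $-\infty < \gamma < \infty$, and $-\infty < \xi < \infty$.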
pjsb(data, param, log.p = FALSE, lower.tail = TRUE)
data |
Vector of observations. |
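param |
Vector of the parameters $\delta$, $\gamma$, $\lambda$, and $\xi$, in that order. |
log.p |
If TRUE, the logarithm of the cdf is returned. |
lower.tail |
If FALSE, 1 - cdf is returned. |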
A vector of the same length as data, giving the cdf of the JSB distribution computed at data.
Mahdi Teimouri
data <- rnorm(10)
param <- c(delta <- 1, gamma <- 3, lambda <- 12, xi <- 5)
pjsb(data, param, log.p = FALSE, lower.tail = TRUE)
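For the standard four-parameter Johnson SB form, the cdf has a closed form in terms of the standard normal cdf, because $z = \gamma + \delta \log\{(x-\xi)/(\lambda+\xi-x)\}$ is standard normal. The sketch below, in base R, compares that closed form with numerical integration of the density. The helper names jsb_pdf and jsb_cdf are hypothetical and are not ForestFit functions, and the parameter order c(delta, gamma, lambda, xi) is taken from the example above.

```r
# Hypothetical helpers (not ForestFit functions); param = c(delta, gamma, lambda, xi).
jsb_pdf <- function(x, param) {
  delta <- param[1]; gam <- param[2]; lambda <- param[3]; xi <- param[4]
  z <- gam + delta * log((x - xi) / (lambda + xi - x))
  delta * lambda / (sqrt(2 * pi) * (x - xi) * (lambda + xi - x)) * exp(-z^2 / 2)
}

jsb_cdf <- function(x, param) {
  delta <- param[1]; gam <- param[2]; lambda <- param[3]; xi <- param[4]
  # The substitution z = gam + delta * log((u - xi) / (lambda + xi - u))
  # turns the integral of the density into the standard normal cdf.
  pnorm(gam + delta * log((x - xi) / (lambda + xi - x)))
}

param <- c(1, 3, 12, 5)   # support is (xi, xi + lambda) = (5, 17)
x <- c(5.5, 6, 8, 12)     # points inside the support
closed <- jsb_cdf(x, param)
by_quad <- sapply(x, function(q) {
  integrate(jsb_pdf, lower = param[4], upper = q, param = param)$value
})
max(abs(closed - by_quad))  # expected to be small
```

Note that both helpers are only meaningful for observations strictly inside the support $(\xi, \xi + \lambda)$.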
Computes the cumulative distribution function (cdf) of the mixture model. The general form for the cdf of the mixture model is given by
$$F(x, \Theta) = \sum_{j=1}^{K} \omega_j F_j(x, \theta_j),$$
where $\Theta = (\theta_1, \ldots, \theta_K)^{T}$ is the whole parameter vector, $\theta_j$ for $j = 1, \ldots, K$ is the parameter vector of the $j$-th component, i.e. $\theta_j = (\alpha_j, \beta_j)^{T}$, $F_j(\cdot, \theta_j)$ is the cdf of the $j$-th component, and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega = (\omega_1, \ldots, \omega_K)^{T}$, where the $\omega_j$ sum to one, i.e., $\sum_{j=1}^{K} \omega_j = 1$. Parameters $\alpha$ and $\beta$ are the shape and scale parameters or both are the shape parameters. In the latter case, the parameters $\alpha$ and $\beta$ are called the first and second shape parameters, respectively. The families considered for each component include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
pmixture(data, g, K, param)
data |
Vector of observations. |
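g |
Name of the component family, given as a character string such as "weibull". The supported families are listed in the description. |
K |
Number of components. |
param |
Vector of the mixing parameters $\omega$ followed by the component parameters $\alpha$ and $\beta$ (and $\lambda$ in the skew-normal case), for example c(weight, alpha, beta). |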
For the skew-normal case, $\alpha$, $\beta$, and $\lambda$ are the location, scale, and skewness parameters, respectively.
A vector of the same length as data, giving the cdf of the mixture model computed at data.
Mahdi Teimouri
data <- seq(0, 20, 0.1)
K <- 2
weight <- c(0.6, 0.4)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
pmixture(data, "weibull", K, param)
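The weighted-sum structure of the mixture cdf can be illustrated in base R for the Weibull case. This is a minimal sketch, assuming that alpha is the Weibull shape and beta the Weibull scale and that param is laid out as c(weights, alphas, betas), as in the example above. The helper weibull_mix_cdf is hypothetical and is not a ForestFit function.

```r
# Hypothetical helper (not a ForestFit function); param = c(weights, shapes, scales).
weibull_mix_cdf <- function(x, K, param) {
  stopifnot(length(param) == 3 * K)
  omega <- param[1:K]
  alpha <- param[(K + 1):(2 * K)]      # assumed: Weibull shape parameters
  beta  <- param[(2 * K + 1):(3 * K)]  # assumed: Weibull scale parameters
  stopifnot(abs(sum(omega) - 1) < 1e-8)
  total <- numeric(length(x))
  # Accumulate omega_j * F_j(x, theta_j) over the K components.
  for (j in seq_len(K)) {
    total <- total + omega[j] * pweibull(x, shape = alpha[j], scale = beta[j])
  }
  total
}

x <- seq(0, 20, 0.1)
mix_cdf <- weibull_mix_cdf(x, K = 2, param = c(0.6, 0.4, 1, 2, 2, 1))
head(mix_cdf)
```

The same pattern applies to the other families by swapping pweibull() for the corresponding component cdf.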
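Simulates realizations from a gamma shape mixture (GSM) model with probability density function given by
$$f(x, \Theta) = \sum_{j=1}^{K} \omega_j \frac{\beta^{j}}{\Gamma(j)} x^{j-1} \exp(-\beta x),$$
where $\Theta = (\omega_1, \ldots, \omega_K, \beta)^{T}$ is the parameter vector and the known constant $K$ is the number of components. The vector of mixing parameters is given by $\omega = (\omega_1, \ldots, \omega_K)^{T}$, where the $\omega_j$ sum to one, i.e., $\sum_{j=1}^{K} \omega_j = 1$. Here $\beta$ is the rate parameter that is equal for all components.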
rgsm(n, omega, beta)
n |
Number of requested random realizations. |
omega |
Vector of the mixing parameters. |
beta |
The rate parameter. |
A vector of length n, giving randomly generated values from the GSM model.
Mahdi Teimouri
S. Venturini, F. Dominici, and G. Parmigiani, 2008. Gamma shape mixtures for heavy-tailed distributions, The Annals of Applied Statistics, 2(2), 756–776.
n <- 100
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta <- 2
rgsm(n, omega, beta)
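Sampling from a GSM model can be done in two stages: draw a component label, then draw from that component. The base R sketch below illustrates the idea, assuming that component $j$ is a gamma distribution with shape $j$ and common rate beta. It is not necessarily how rgsm is implemented, and the helper gsm_sample is hypothetical.

```r
# Hypothetical helper (not a ForestFit function): two-stage sampling from a GSM model.
gsm_sample <- function(n, omega, beta) {
  K <- length(omega)
  # Stage 1: draw the component label j with probability omega[j].
  j <- sample.int(K, size = n, replace = TRUE, prob = omega)
  # Stage 2: given j, draw from a gamma distribution with shape j and rate beta.
  rgamma(n, shape = j, rate = beta)
}

set.seed(1)
omega <- c(0.05, 0.1, 0.15, 0.2, 0.25, 0.25)
beta <- 2
draws <- gsm_sample(10000, omega, beta)
# Under the stated assumption the theoretical mean is sum(j * omega_j) / beta.
c(sample_mean = mean(draws), theory = sum(seq_along(omega) * omega) / beta)
```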
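Simulates realizations from the four-parameter Johnson SB (JSB) distribution with probability density function given by
$$f(x \mid \Theta) = \frac{\delta \lambda}{\sqrt{2\pi}\,(x-\xi)(\lambda+\xi-x)} \exp\left\{-\frac{1}{2}\left[\gamma + \delta \log\left(\frac{x-\xi}{\lambda+\xi-x}\right)\right]^{2}\right\},$$
where $\xi < x < \lambda + \xi$, $\Theta = (\delta, \gamma, \lambda, \xi)^{T}$ with $\delta > 0$, $-\infty < \gamma < \infty$, $\lambda > 0$, and $-\infty < \xi < \infty$.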
rjsb(n, param)
n |
Number of requested random realizations. |
param |
Vector of the parameters $\delta$, $\gamma$, $\lambda$, and $\xi$, in that order. |
A vector of length n, giving randomly generated values from the JSB distribution.
Mahdi Teimouri
n <- 100
param <- c(delta <- 1, gamma <- 3, lambda <- 12, xi <- 5)
rjsb(n, param)
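A JSB variate can be obtained by transforming a standard normal variate, since $z = \gamma + \delta \log\{(x-\xi)/(\lambda+\xi-x)\}$ is standard normal under the standard Johnson SB form. The base R sketch below inverts that relation. It is an illustration under that assumption, not necessarily the method used by rjsb, and the helper jsb_sample is hypothetical.

```r
# Hypothetical helper (not a ForestFit function); param = c(delta, gamma, lambda, xi).
jsb_sample <- function(n, param) {
  delta <- param[1]; gam <- param[2]; lambda <- param[3]; xi <- param[4]
  z <- rnorm(n)
  # Solve z = gam + delta * log((x - xi) / (lambda + xi - x)) for x.
  xi + lambda / (1 + exp(-(z - gam) / delta))
}

set.seed(1)
param <- c(1, 3, 12, 5)
draws <- jsb_sample(10000, param)
range(draws)  # all values fall inside (xi, xi + lambda) = (5, 17)
# The theoretical median corresponds to z = 0.
c(sample_median = median(draws), theory = 5 + 12 * plogis(-3))
```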
Simulates realizations from the bivariate Johnson SB (JSBB) distribution.
rjsbb(n, param)
n |
Number of requested random realizations. |
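param |
Vector of the parameters $(\delta_1, \delta_2, \gamma_1, \gamma_2, \lambda_1, \lambda_2, \xi_1, \xi_2, \rho)$, in that order. |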
A vector of length n, giving randomly generated values from the JSBB distribution.
Mahdi Teimouri
Delta <- c(2.5, 3)
Gamma <- c(2, 1)
Lambda <- c(1, 3)
Xi <- c(0, 2)
rho <- -0.5
param <- c(Delta, Gamma, Lambda, Xi, rho)
rjsbb(20, param)
Generates iid realizations from the mixture model with pdf given by
$$f(x, \Theta) = \sum_{j=1}^{K} \omega_j f_j(x, \theta_j),$$
where $K$ is the number of components, $\theta_j$, for $j = 1, \ldots, K$, is the parameter vector of the $j$-th component, i.e. $\theta_j = (\alpha_j, \beta_j)^{T}$, and $\Theta$ is the whole parameter vector $\Theta = (\theta_1, \ldots, \theta_K)^{T}$. Parameters $\alpha$ and $\beta$ are the shape and scale parameters or both are the shape parameters. In the latter case, parameters $\alpha$ and $\beta$ are called the first and second shape parameters, respectively. We note that the constants $\omega_j$ sum to one, i.e., $\sum_{j=1}^{K} \omega_j = 1$. The families considered for the pdf $f_j$ include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
rmixture(n, g, K, param)
n |
Number of requested random realizations. |
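g |
Name of the component family, given as a character string such as "weibull". The supported families are listed in the description. |
K |
Number of components. |
param |
Vector of the mixing parameters $\omega$ followed by the component parameters $\alpha$ and $\beta$ (and $\lambda$ in the skew-normal case), for example c(weight, alpha, beta). |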
For the skew-normal case, $\alpha$, $\beta$, and $\lambda$ are the location, scale, and skewness parameters, respectively.
A vector of length n, giving a sequence of random realizations from the given mixture model.
Mahdi Teimouri
n <- 50
K <- 2
weight <- c(0.3, 0.7)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
rmixture(n, "weibull", K, param)
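Drawing from a finite mixture amounts to picking a component with probability $\omega_j$ and then sampling from it. The base R sketch below shows this for the Weibull case, assuming that alpha is the shape and beta the scale and that param is laid out as c(weights, alphas, betas). The helper weibull_mix_sample is hypothetical and is not necessarily how rmixture works internally.

```r
# Hypothetical helper (not a ForestFit function); param = c(weights, shapes, scales).
weibull_mix_sample <- function(n, K, param) {
  stopifnot(length(param) == 3 * K)
  omega <- param[1:K]
  alpha <- param[(K + 1):(2 * K)]      # assumed: Weibull shape parameters
  beta  <- param[(2 * K + 1):(3 * K)]  # assumed: Weibull scale parameters
  # Pick a component label for every draw, then sample from that component.
  j <- sample.int(K, size = n, replace = TRUE, prob = omega)
  rweibull(n, shape = alpha[j], scale = beta[j])
}

set.seed(1)
draws <- weibull_mix_sample(20000, K = 2, param = c(0.3, 0.7, 1, 2, 2, 1))
# Theoretical mean under the assumption: sum(omega * beta * gamma(1 + 1 / alpha)).
theory <- 0.3 * 2 * gamma(1 + 1 / 1) + 0.7 * 1 * gamma(1 + 1 / 2)
c(sample_mean = mean(draws), theory = theory)
```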
Robust multiple linear regression modelling with a skew Student's $t$ error term. The density function of the skew Student's $t$ distribution is given by
$$f(y \mid \mu, \sigma, \lambda, \nu) = \frac{2}{\sigma}\, t(z, \nu)\, T\left(\lambda z \sqrt{\frac{\nu+1}{\nu+z^{2}}},\ \nu+1\right),$$
where $z = (y-\mu)/\sigma$, $\mu$ is the location parameter, $\sigma$ is the scale parameter, and $\lambda$ is the skewness parameter. Also, $t(u, \nu)$ and $T(u, \nu)$ denote the density and distribution functions of the Student's $t$ distribution with $\nu$ degrees of freedom at point $u$, respectively. If $\lambda = 0$, then the skew Student's $t$ distribution reduces to the ordinary Student's $t$ distribution, which is symmetric around $\mu$. Since Student's $t$ is a heavy-tailed distribution, it is useful for regression analysis in the presence of outliers.
skewtreg(y, x, Fisher=FALSE)
y |
vector of response variable. |
x |
vector or matrix of explanatory variable(s). |
Fisher |
Either TRUE or FALSE. By default Fisher = FALSE; if TRUE, the observed Fisher information matrix is also computed. |
A list containing the estimated regression coefficients, their asymptotic standard errors and corresponding p-values, the estimated parameters of the error term (skew Student's $t$), the F statistic, R-square and adjusted R-square, and the observed Fisher information matrix.
Mahdi Teimouri
n <- 100
x <- rnorm(n)
y <- 2 + 2 * x + rt(n, df = 2)
skewtreg(y, x, Fisher = FALSE)
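The skew Student's $t$ error density can be evaluated with base R. The sketch below assumes the Azzalini-type form $f(y) = (2/\sigma)\, t(z, \nu)\, T(\lambda z \sqrt{(\nu+1)/(\nu+z^{2})}, \nu+1)$ with $z = (y-\mu)/\sigma$; whether skewtreg uses exactly this parameterization is an assumption. The helper skewt_pdf is hypothetical and is not a ForestFit function.

```r
# Hypothetical helper (not a ForestFit function): Azzalini-type skew Student's t density.
skewt_pdf <- function(y, mu, sigma, lambda, nu) {
  z <- (y - mu) / sigma
  # Symmetric t density times a skewing factor built from the t cdf with nu + 1 df.
  2 / sigma * dt(z, df = nu) *
    pt(lambda * z * sqrt((nu + 1) / (nu + z^2)), df = nu + 1)
}

# The density should integrate to one for any valid parameter values.
total <- integrate(function(y) skewt_pdf(y, mu = 2, sigma = 1, lambda = 3, nu = 4),
                   lower = -Inf, upper = Inf)$value
total

# With lambda = 0 the skewing factor equals 1/2 and the ordinary t density is recovered.
all.equal(skewt_pdf(0.7, mu = 0, sigma = 1, lambda = 0, nu = 4), dt(0.7, df = 4))
```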
Tree list from a U.S. Forest Service Forest Inventory and Analysis (FIA) plot (PLT_CN 259082471010854) measured in 2011. The plot represents a typical loblolly pine plantation.
data(SW)
A data frame containing 18 trees (rows) and two columns. Columns are the trees' scientific name and diameter at breast height in inches.
Contains a welcome message for users of ForestFit.