Title: | Clustering Using Mixtures of Sub Gaussian Stable Distributions |
---|---|
Description: | Model-based clustering using finite mixtures of skewed sub-Gaussian stable distributions, as developed by Teimouri (2022) <arXiv:2205.14067>, and estimation of the parameters of the symmetric stable distribution within the Bayesian framework. |
Authors: | Mahdi Teimouri [aut, cre, cph, ctb] |
Maintainer: | Mahdi Teimouri <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.1 |
Built: | 2025-02-20 02:42:55 UTC |
Source: | https://github.com/cran/mixSSG |
The AIS data set records body measurements of 202 athletes (100 women and 102 men); see Cook and Weisberg (2009). Two of the variables, body mass index (BMI) and body fat percentage (Bfat), are chosen for cluster analysis.
data(AIS)
A text file with 3 columns.
R. D. Cook and S. Weisberg, 2009. An Introduction to Regression Graphics, John Wiley & Sons, New York.
data(AIS)
The bankruptcy data set gives the ratio of retained earnings (RE) to total assets and the ratio of earnings before interest and taxes (EBIT) to total assets for 66 American firms; see Altman (1968).
data(bankruptcy)
A text file with 3 columns.
E. I. Altman, 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, 23(4), 589-609.
data(bankruptcy)
Suppose the $d$-dimensional random vector $Y$ follows a skewed sub-Gaussian stable distribution with parameters $\alpha$, $\mu$, $\Sigma$, and $\lambda$, which are the tail thickness, location, dispersion matrix, and skewness parameters, respectively. Herein, we give a good approximation for the density function of $Y$. The approximation is defined piecewise. In the first case, it is computed from independent realizations of the positive stable distribution, generated using the command rpstable(3000, alpha). Otherwise, the approximation involves the distribution function of Student's $t$ distribution and the cumulative distribution function of the normal distribution with a given mean and standard deviation.
dssg(Y, alpha, Mu, Sigma, Lambda)
Y |
a vector (or an $n \times d$ matrix) at which the density function is approximated. |
alpha |
the tail thickness parameter. |
Mu |
a vector giving the location parameter. |
Sigma |
a positive definite symmetric matrix specifying the dispersion matrix. |
Lambda |
a vector giving the skewness parameter. |
approximated value(s) of the density function of the skewed sub-Gaussian stable distribution evaluated at Y.
Mahdi Teimouri
n <- 4
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
Y <- rssg(n, alpha, Mu, Sigma, Lambda)
dssg(Y, alpha, Mu, Sigma, Lambda)
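The Monte Carlo idea of averaging over positive stable realizations can be illustrated in base R for the symmetric special case only. The sketch below is not the package's formula: it assumes a zero skewness vector, assumes that the positive stable variable has Laplace transform $\exp(-s^{\alpha/2})$, and uses the hypothetical helpers `rps_sketch` and `dsg_mc`. Under those assumptions the random vector is a Gaussian scale mixture, so its density is the average of normal densities with covariance $p_i \Sigma$ over positive stable draws $p_i$.

```r
# Kanter-type sampler for a positive stable variable with index alpha / 2
# (hypothetical helper; assumes Laplace transform exp(-s^(alpha / 2))).
rps_sketch <- function(n, alpha) {
  stopifnot(alpha > 0, alpha < 2)
  b <- alpha / 2
  theta <- runif(n, 0, pi)
  w <- rexp(n)
  a <- sin((1 - b) * theta) * sin(b * theta)^(b / (1 - b)) /
    sin(theta)^(1 / (1 - b))
  (a / w)^((1 - b) / b)
}

# Monte Carlo density for the symmetric case (skewness assumed to be zero):
# f(y) is approximated by the mean over p_i of the N_d(Mu, p_i * Sigma) density.
dsg_mc <- function(Y, alpha, Mu, Sigma, N = 3000) {
  d <- length(Mu)
  Y <- matrix(Y, ncol = d)                      # one observation per row
  p <- rps_sketch(N, alpha)
  q <- mahalanobis(Y, center = Mu, cov = Sigma) # (y - Mu)' Sigma^{-1} (y - Mu)
  const <- (2 * pi)^(-d / 2) / sqrt(det(Sigma))
  vapply(q, function(qi) const * mean(p^(-d / 2) * exp(-qi / (2 * p))),
         numeric(1))
}

set.seed(1)
# For alpha = 1 and d = 1 the mixture is a Cauchy law with scale 1 / sqrt(2),
# so the two columns should be close.
y <- c(-2, 0, 1)
cbind(mc = dsg_mc(y, alpha = 1, Mu = 0, Sigma = matrix(1), N = 1e5),
      exact = dcauchy(y, scale = 1 / sqrt(2)))
```

The Cauchy comparison is only a sanity check of the scale-mixture idea; the skewed case handled by dssg needs the additional terms described above.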
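Estimating the parameters of the symmetric $\alpha$-stable (S$\alpha$S) distribution using the Bayesian paradigm. Let $y_1, \ldots, y_n$ be $n$ realizations from the S$\alpha$S distribution with tail thickness, location, and scale parameters. Herein, we estimate the parameters of the symmetric univariate stable distribution within a Bayesian framework. We consider a uniform distribution for the prior of the tail thickness. A normal conjugate prior, with location hyperparameter mu0 and standard deviation hyperparameter sigma0, is designated for the location, and an inverse gamma conjugate prior, with shape hyperparameter gamma0 and rate hyperparameter delta0, is designated for the scale.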
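Written out, priors of this kind take the following standard forms. This is a sketch under assumptions: the support $(0, 2)$ of the uniform prior and the choice of placing the inverse gamma prior on $\sigma^2$ are not stated in the text, and the symbols $\mu_0$, $\sigma_0$, $\gamma_0$, and $\delta_0$ simply mirror the arguments mu0, sigma0, gamma0, and delta0.

```latex
\alpha \sim U(0, 2), \qquad
\pi(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}
  \exp\left\{-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right\}, \qquad
\pi(\sigma^2) = \frac{\delta_0^{\gamma_0}}{\Gamma(\gamma_0)}
  (\sigma^2)^{-\gamma_0 - 1} \exp\left\{-\frac{\delta_0}{\sigma^2}\right\},
\qquad \mu_0 \in \mathbb{R},\ \sigma_0 > 0,\ \gamma_0 > 0,\ \delta_0 > 0.
```

Small values of $\gamma_0$ and $\delta_0$ make the inverse gamma prior diffuse.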
fitBayes(y, mu0, sigma0, gamma0, delta0, epsilon)
y |
a vector of realizations following the S$\alpha$S distribution. |
mu0 |
the location hyperparameter of the normal prior. |
sigma0 |
the standard deviation hyperparameter of the normal prior. |
gamma0 |
the shape hyperparameter of the inverse gamma prior. |
delta0 |
the rate hyperparameter of the inverse gamma prior. |
epsilon |
a small positive constant that serves as the threshold for stopping the sampler. |
Estimated tail thickness, location, and scale parameters, number of iterations to attain convergence, the log-likelihood value across iterations, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC).
Mahdi Teimouri
n <- 100
alpha <- 1.4
mu <- 0
sigma <- 1
y <- rnorm(n)
fitBayes(y, mu0 = 0, sigma0 = 0.2, gamma0 = 10e-5, delta0 = 10e-5, epsilon = 0.005)
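The example fits the model to normal data, which is the $\alpha = 2$ member of the family. To try heavier tails without other packages, S$\alpha$S data can be simulated in base R as a Gaussian scale mixture. The following is a sketch under assumptions: `rps_sketch` and `rsas_sketch` are hypothetical helpers, the positive stable variable $P$ is assumed to have Laplace transform $\exp(-s^{\alpha/2})$, and the target is the parametrization with characteristic function $\exp(-|\sigma t|^{\alpha} + i \mu t)$, which may differ from the one used by fitBayes.

```r
# Kanter-type sampler for a positive stable variable with index alpha / 2
# (hypothetical helper; assumes Laplace transform exp(-s^(alpha / 2))).
rps_sketch <- function(n, alpha) {
  stopifnot(alpha > 0, alpha < 2)
  b <- alpha / 2
  theta <- runif(n, 0, pi)
  w <- rexp(n)
  a <- sin((1 - b) * theta) * sin(b * theta)^(b / (1 - b)) /
    sin(theta)^(1 / (1 - b))
  (a / w)^((1 - b) / b)
}

# mu + sigma * sqrt(2 * P) * Z has characteristic function
# exp(-|sigma * t|^alpha + 1i * mu * t) when Z ~ N(0, 1) is independent of P.
rsas_sketch <- function(n, alpha, mu = 0, sigma = 1) {
  mu + sigma * sqrt(2 * rps_sketch(n, alpha)) * rnorm(n)
}

set.seed(1)
y <- rsas_sketch(100, alpha = 1.4)

# The fit itself needs the mixSSG package, so the call is guarded.
if (requireNamespace("mixSSG", quietly = TRUE)) {
  fit <- mixSSG::fitBayes(y, mu0 = 0, sigma0 = 0.2, gamma0 = 10e-5,
                          delta0 = 10e-5, epsilon = 0.005)
  str(fit)
}
```

With data generated this way, the estimated tail thickness can be compared against the value used in the simulation.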
Each $d$-dimensional skewed sub-Gaussian stable (SSG) random vector $Y$ admits the stochastic representation given by Teimouri (2022). The model parameters are $\mu$ (location vector in $\mathbb{R}^d$), $\lambda$ (skewness vector in $\mathbb{R}^d$), $\Sigma$ (positive definite symmetric dispersion matrix), and $\alpha$ (tail thickness). Furthermore, the representation involves a positive stable random variable $P$ and two further random components; these three random quantities are mutually independent.
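For orientation, a representation of this kind can be written as below. This is a sketch based on the usual scale-mixture construction of skewed normal vectors, not a formula quoted from the text: the symbols $Z_0$ and $Z_1$, their distributions, and the use of $\Sigma^{1/2}$ are assumptions and should be checked against Teimouri (2022).

```latex
Y \overset{d}{=} \mu + \sqrt{P}\,\lambda\,|Z_0| + \sqrt{P}\,\Sigma^{1/2} Z_1,
\qquad Z_0 \sim N(0, 1), \quad Z_1 \sim N_d(0, I_d),
```

with $P$, $Z_0$, and $Z_1$ mutually independent. In this form, setting $\lambda = 0$ leaves a symmetric sub-Gaussian stable vector, and $|Z_0|$ is the term that introduces skewness in the direction of $\lambda$.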
fitmssg(Y, K, eps = 0.15, initial = "FALSE", method = "moment", starts = starts)
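Y |
an $n \times d$ matrix of observations. |
K |
number of components. |
eps |
threshold value for stopping the EM algorithm. It is 0.15 by default. The algorithm runs faster if eps is larger. |
initial |
logical statement. If initial = "TRUE", the initial values supplied in starts are used. |
method |
a character string naming the method used to compute the initial values; it is "moment" by default. |
starts |
a list of initial values if initial = "TRUE". |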
A list containing the estimated parameters of each cluster, the predicted cluster labels, the log-likelihood value across iterations, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC).
Mahdi Teimouri
M. Teimouri, 2022. Finite mixture of skewed sub-Gaussian stable distributions, arxiv.org/abs/2205.14067.
M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. Parameter estimation using the EM algorithm for symmetric stable random variables and sub-Gaussian random vectors, Journal of Statistical Theory and Applications, 17(3), 439-41.
J. A. Hartigan and M. A. Wong, 1979. Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics), 28, 100-108.
# Real data: initial values are computed internally (initial = "FALSE")
data(bankruptcy)
out1 <- fitmssg(bankruptcy[, 2:3], K = 2, eps = 0.15, initial = "FALSE", method = "moment", starts = starts)

# Simulated data: user-supplied initial values (initial = "TRUE")
n1 <- 100
n2 <- 50
omega1 <- n1/(n1 + n2)
omega2 <- n2/(n1 + n2)
alpha1 <- 1.6
alpha2 <- 1.6
mu1 <- c(-1, -1)
mu2 <- c(6, 6)
sigma1 <- matrix(c(2, 0.20, 0.20, 0.5), 2, 2)
sigma2 <- matrix(c(0.4, 0.10, 0.10, 0.2), 2, 2)
lambda1 <- c(5, 5)
lambda2 <- c(-5, -5)
Sigma <- array(NA, c(2, 2, 2))
Sigma[, , 1] <- sigma1
Sigma[, , 2] <- sigma2
starts <- list(c(omega1, omega2), c(alpha1, alpha2), rbind(mu1, mu2), Sigma, rbind(lambda1, lambda2))
Y <- rbind(rssg(n1, alpha1, mu1, sigma1, lambda1), rssg(n2, alpha2, mu2, sigma2, lambda2))
out2 <- fitmssg(Y, K = 2, eps = 0.15, initial = "TRUE", method = "moment", starts = starts)
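In the example, the starts list has five parts: mixing weights, tail thickness values, a K-row matrix of locations, an array of dispersion matrices, and a K-row matrix of skewness vectors. The sketch below shows one way such a list could be assembled from data in base R, using the k-means algorithm of Hartigan and Wong (1979) to form preliminary clusters. It is an illustration only: `make_starts` is a hypothetical helper, the per-cluster mean and covariance are crude stand-ins for location and dispersion, and the starting values for the tail thickness (1.6) and skewness (0) are arbitrary assumptions, not the package's own initialization.

```r
# Hypothetical helper: build a starts-style list from k-means clusters.
make_starts <- function(Y, K, alpha0 = 1.6) {
  Y <- as.matrix(Y)
  d <- ncol(Y)
  cl <- kmeans(Y, centers = K, nstart = 10, algorithm = "Hartigan-Wong")$cluster
  omega <- as.numeric(table(factor(cl, levels = 1:K))) / nrow(Y)
  Mu <- matrix(NA_real_, K, d)
  Sigma <- array(NA_real_, c(d, d, K))
  for (k in 1:K) {
    Yk <- Y[cl == k, , drop = FALSE]
    Mu[k, ] <- colMeans(Yk)        # crude location guess
    Sigma[, , k] <- cov(Yk)        # crude dispersion guess
  }
  # Same layout as the example: weights, alphas, locations, dispersions, skewness
  list(omega, rep(alpha0, K), Mu, Sigma, matrix(0, K, d))
}

set.seed(1)
Y <- rbind(matrix(rnorm(200, mean = -1), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
starts <- make_starts(Y, K = 2)
str(starts)
```

Sample covariances can be unstable for heavy-tailed data, so values obtained this way are best treated as rough starting points.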
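The cumulative distribution function of the positive stable distribution is given by
$$F_P(x) = \frac{1}{\pi} \int_0^{\pi} \exp\left\{-x^{-\frac{\alpha}{2-\alpha}}\, a(\theta)\right\} d\theta,$$
where $\alpha$ is the tail thickness or index of stability and
$$a(\theta) = \frac{\sin\left(\left(1 - \frac{\alpha}{2}\right)\theta\right) \left[\sin\left(\frac{\alpha\theta}{2}\right)\right]^{\frac{\alpha}{2-\alpha}}}{\left[\sin\theta\right]^{\frac{2}{2-\alpha}}}.$$
Kanter (1975) used the above integral transform to simulate a positive stable random variable as
$$P \overset{d}{=} \left(\frac{a(\theta)}{W}\right)^{\frac{2-\alpha}{\alpha}},$$
in which $\theta \sim U(0, \pi)$ and $W$ independently follows an exponential distribution with mean unity.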
rpstable(n, alpha)
n |
the number of samples required. |
alpha |
the tail thickness parameter. |
simulated realizations of size $n$ from the positive $\alpha$-stable distribution.
Mahdi Teimouri
M. Kanter, 1975. Stable densities under change of scale and total variation inequalities, Annals of Probability, 3(4), 697-707.
rpstable(10, alpha = 1.2)
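Kanter's construction needs only a uniform and an exponential draw, so it can be sketched in a few lines of base R. The function below is an illustration, not the package's rpstable: the name `rpstable_sketch` is hypothetical, the index of the positive stable law is assumed to be $\alpha/2$ with $0 < \alpha < 2$, and the scale is the one implied by the Laplace transform $\exp(-s^{\alpha/2})$.

```r
# Hypothetical re-implementation of a Kanter-type sampler (not mixSSG::rpstable).
rpstable_sketch <- function(n, alpha) {
  stopifnot(alpha > 0, alpha < 2)
  b <- alpha / 2                       # assumed index of the positive stable law
  theta <- runif(n, 0, pi)             # theta ~ U(0, pi)
  w <- rexp(n, rate = 1)               # W ~ exponential with mean one
  a <- sin((1 - b) * theta) * sin(b * theta)^(b / (1 - b)) /
    sin(theta)^(1 / (1 - b))
  (a / w)^((1 - b) / b)                # P = (a(theta) / W)^((2 - alpha) / alpha)
}

set.seed(1)
p <- rpstable_sketch(1e5, alpha = 1.2)
# Under the assumed parametrization, E[exp(-P)] equals exp(-1).
c(empirical = mean(exp(-p)), theoretical = exp(-1))
```

Comparing the empirical and theoretical Laplace transform is a convenient check because the positive stable law has no finite mean.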
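Each skewed sub-Gaussian stable (SSG) random vector $Y$ admits the stochastic representation given by Teimouri (2022), in which $\mu$ is the location vector, $\lambda$ is the skewness vector, $\Sigma$ is a positive definite symmetric dispersion matrix, and $\alpha$ is the tail thickness. Further, the representation involves a positive stable random variable $P$ and two further random components; these three random quantities are mutually independent.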
rssg(n, alpha, Mu, Sigma, Lambda)
n |
the number of samples required. |
alpha |
the tail thickness parameter. |
Mu |
a vector giving the location parameter. |
Sigma |
a positive definite symmetric matrix specifying the dispersion matrix. |
Lambda |
a vector giving the skewness parameter. |
simulated realizations of size $n$ from the skewed sub-Gaussian stable distribution.
Mahdi Teimouri
n <- 4
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
rssg(n, alpha, Mu, Sigma, Lambda)
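A stochastic representation translates directly into a simulation routine. The base R sketch below assumes the form $Y = \mu + \sqrt{P}\,(\lambda |Z_0| + \Sigma^{1/2} Z_1)$ with $Z_0 \sim N(0, 1)$, $Z_1 \sim N_d(0, I_d)$, and $P$ positive stable with index $\alpha/2$; that form, the scale of $P$, and the helper names `rps_sketch` and `rssg_sketch` are assumptions made for illustration and are not taken from the package source.

```r
# Kanter-type sampler for a positive stable variable with index alpha / 2
# (hypothetical helper; assumes Laplace transform exp(-s^(alpha / 2))).
rps_sketch <- function(n, alpha) {
  stopifnot(alpha > 0, alpha < 2)
  b <- alpha / 2
  theta <- runif(n, 0, pi)
  w <- rexp(n)
  a <- sin((1 - b) * theta) * sin(b * theta)^(b / (1 - b)) /
    sin(theta)^(1 / (1 - b))
  (a / w)^((1 - b) / b)
}

# Hypothetical simulator based on the assumed representation.
rssg_sketch <- function(n, alpha, Mu, Sigma, Lambda) {
  d <- length(Mu)
  p <- rps_sketch(n, alpha)                           # positive stable mixing variable
  z0 <- abs(rnorm(n))                                 # half-normal term driving skewness
  z1 <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)    # rows distributed as N_d(0, Sigma)
  # Row i is scaled by sqrt(p[i]); Mu is then added to every row.
  sweep(sqrt(p) * (outer(z0, Lambda) + z1), 2, Mu, "+")
}

set.seed(1)
rssg_sketch(4, alpha = 1.4, Mu = rep(0, 2), Sigma = diag(2), Lambda = rep(2, 2))
```

The Cholesky factor is used in place of a symmetric square root of the dispersion matrix; both give rows with the same distribution.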
Suppose $Y_1, \ldots, Y_n$ are realizations following a $d$-dimensional skewed sub-Gaussian stable distribution. Herein, we estimate the tail thickness parameter $\alpha$ when $\mu$ (location vector in $\mathbb{R}^d$), $\lambda$ (skewness vector in $\mathbb{R}^d$), and $\Sigma$ (positive definite symmetric dispersion matrix) are assumed to be known.
stoch(Y, alpha0, Mu0, Sigma0, Lambda0)
Y |
a vector (or an $n \times d$ matrix) of observations. |
alpha0 |
initial value for the tail thickness parameter. |
Mu0 |
a vector giving the initial value for the location parameter. |
Sigma0 |
a positive definite symmetric matrix specifying the initial value for the dispersion matrix. |
Lambda0 |
a vector giving the initial value for the skewness parameter. |
Here, we assume that the parameters $\mu$, $\lambda$, and $\Sigma$ are known and only the tail thickness parameter needs to be estimated.
Estimated tail thickness parameter $\alpha$ of the skewed sub-Gaussian stable distribution.
Mahdi Teimouri
n <- 100
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
Y <- rssg(n, alpha, Mu, Sigma, Lambda)
stoch(Y, alpha, Mu, Sigma, Lambda)