Package 'mixSSG'

Title: Clustering Using Mixtures of Sub Gaussian Stable Distributions
Description: Developed for model-based clustering using the finite mixtures of skewed sub-Gaussian stable distributions developed by Teimouri (2022) <arXiv:2205.14067> and estimating parameters of the symmetric stable distribution within the Bayesian framework.
Authors: Mahdi Teimouri [aut, cre, cph, ctb]
Maintainer: Mahdi Teimouri <[email protected]>
License: GPL (>= 2)
Version: 2.1.1
Built: 2025-02-20 02:42:55 UTC
Source: https://github.com/cran/mixSSG

Help Index


AIS data

Description

The set of AIS data involves recorded body factors of 202 athletes including 100 women 102 men, see Cook (2009). Among factors, two variables body mass index (BMI) and body fat percentage (Bfat) are chosen for cluster analysis.

Usage

data(AIS)

Format

A text file with 3 columns.

References

R. D. Cook and S. Weisberg, (2009). An Introduction to Regression Graphics, John Wiley & Sons, New York.

Examples

data(AIS)

bankruptcy data

Description

The bankruptcy dataset involves ratio of the retained earnings (RE) to the total assets, and the ratio of earnings before interests and the taxes (EBIT) to the total assets of 66 American firms, see Altman (1969).

Usage

data(bankruptcy)

Format

A text file with 3 columns.

References

E. I. Altman, 1969. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, 23(4), 589-609.

Examples

data(bankruptcy)

Approximating the density function of skewed sub-Gaussian stable distribution.

Description

Suppose dd-dimensional random vector Y\boldsymbol{Y} follows a skewed sub-Gaussian stable distribution with density function fY(yΘ)f_{\boldsymbol{Y}}(\boldsymbol{y} | \boldsymbol{\Theta}) for Θ=(α,μ,Σ,λ){\boldsymbol{\Theta}}=(\alpha,\boldsymbol{\mu},\Sigma, \boldsymbol{\lambda}) where α\alpha, μ\boldsymbol{\mu}, Σ\Sigma, and λ\boldsymbol{\lambda} are tail thickness, location, dispersion matrix, and skewness parameters, respectively. Herein, we give a good approximation for fY(yΘ)f_{\boldsymbol{Y}}(\boldsymbol{y} | \boldsymbol{\Theta}). First , for N=50{\cal{N}}=50, define

L=Γ(Nα2+1+α2)Γ(d+Nα2+α2)Γ(Nα2+1)Γ(d+Nα2)(N+1).L=\frac{\Gamma(\frac{{\cal{N}}\alpha}{2}+1+\frac{\alpha}{2})\Gamma\bigl(\frac{d+{\cal{N}}\alpha}{2}+\frac{\alpha}{2}\bigr)}{ \Gamma(\frac{{\cal{N}}\alpha}{2}+1)\Gamma\bigl(\frac{d+{\cal{N}}\alpha}{2}\bigr)({\cal{N}}+1)}.

If d(y)2L2αd(\boldsymbol{y})\leq 2L^{\frac{2}{\alpha}}, then

fY(yΘ)C02πδNi=1Nexp{d(y)2pi}Φ(m0,δpi)pid2,f_{\boldsymbol{Y}}(\boldsymbol{y} | {\boldsymbol{\Theta}}) \simeq \frac{{C}_{0}\sqrt{2\pi \delta }}{N} \sum_{i=1}^{N} \exp\Bigl\{-\frac{d(\boldsymbol{y})}{2p_{i}}\Bigr\}\Phi \bigl( m| 0, \sqrt{\delta p_{i}} \bigr)p_{i}^{-\frac{d}{2}},

where, p1,p2,,pNp_1,p_2,\cdots, p_N (for N=3000N=3000) are independent realizations following positive stable distribution that are generated using command rpstable(3000, alpha). Otherwise, if d(y)>2L2αd(\boldsymbol{y})> 2L^{\frac{2}{\alpha}}, we have

fY(yΘ)C0d(y)δπj=1N(1)j1Γ(jα2+1)sin(jπα2)Γ(j+1)[d(y)2]d+1+jα2Γ(d+jα2)Td+jα(md+jαd(y)δ),f_{\boldsymbol{Y}}(\boldsymbol{y} | \boldsymbol{\Theta})\simeq \frac{{C}_{0}\sqrt{d(\boldsymbol{y})\delta}}{\sqrt{\pi}} \sum_{j=1}^{{\cal{N}}}\frac{ (-1)^{j-1}\Gamma(\frac{j\alpha}{2}+1)\sin \bigl(\frac{j\pi \alpha}{2}\bigr)} {\Gamma(j+1)\bigl[\frac{d(\boldsymbol{y})}{2}\bigr]^{\frac{d+1+j\alpha}{2}}}\Gamma\Bigl(\frac{d+j\alpha}{2}\Bigr) T_{d+j\alpha}\biggl(m\sqrt{\frac{d+j\alpha}{d(\boldsymbol{y})\delta}}\biggr),

where Tν(x)T_{\nu}(x) is distribution function of the Student's tt with ν\nu degrees of freedom, Φ(xa,b)\Phi(x|a,b) is the cumulative density function of normal distribution wih mean aa and standard deviation bb, and C0=2(2π)d+12Σ12,{C_{0}=2 (2\pi)^{-\frac{d+1}{2}}|{\Sigma}|^{-\frac{1}{2}},} d(y)=(yμ)Ω1(yμ),d(\boldsymbol{y})=(\boldsymbol{y}-\boldsymbol{\mu})^{'}{{\Omega}^{-1}}(\boldsymbol{y}-\boldsymbol{\mu}), m=λΩ1(yμ),{m}=\boldsymbol{\lambda}^{'}{{\Omega}}^{-1}(\boldsymbol{y}-\boldsymbol{\mu}), Ω=Σ+λλ,{\Omega}={\Sigma}+\boldsymbol{\lambda}\boldsymbol{\lambda}^{'}, δ=1λΩ1λ{\delta}=1-\boldsymbol{\lambda}^{'}{\Omega}^{-1}\boldsymbol{\lambda}.

Usage

dssg(Y, alpha, Mu, Sigma, Lambda)

Arguments

Y

a vector (or an n×dn\times d matrix) at which the density function is approximated.

alpha

the tail thickness parameter.

Mu

a vector giving the location parameter.

Sigma

a positive definite symmetric matrix specifying the dispersion matrix.

Lambda

a vector giving the skewness parameter.

Value

simulated realizations of size nn from positive α\alpha-stable distribution.

Author(s)

Mahdi Teimouri

Examples

n <- 4
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
Y <- rssg(n, alpha, Mu, Sigma, Lambda)
dssg(Y, alpha, Mu, Sigma, Lambda)

Estimating parameters of the symmetric α\alpha-stable (Sα\alphaS) distribution using Bayesian paradigm.

Description

Let y1,y2,,yn{{y}}_1,{{y}}_2, \cdots,{{y}}_n are nn realizations form Sα\alphaS distribution with parameters α,σ\alpha, \sigma, and μ\mu. Herein, we estimate parameters of symmetric univariate stable distribution within a Bayesian framework. We consider a uniform distribution for prior of tail thickness, that is αU(0,2)\alpha \sim U(0,2). The normal and inverse gamma conjugate priors are designated for μ\mu and σ2\sigma^2 with density functions given, respectively, by

π(μ)=12πσ0exp{12(μμ0σ0)2},\pi(\mu)=\frac{1}{\sqrt{2\pi}\sigma_{0}}\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{\mu-\mu_0}{\sigma_0}\Bigr)^{2}\Bigr\},

and

π(δ)=δ0γ0δγ01exp{δ0δ},\pi(\delta)= \delta_{0}^{\gamma_{0}}\delta^{-\gamma_0-1}\exp\Bigl\{-\frac{\delta_0}{\delta}\Bigr\},

where μ0R\mu_0 \in R, σ0>0\sigma_0>0, δ=σ2\delta=\sigma^2, δ0>0\delta_0>0, and γ0>0\gamma_0>0.

Usage

fitBayes(y, mu0, sigma0, gamma0, delta0, epsilon)

Arguments

y

vector of realizations that following Sα\alphaS distribution.

mu0

the location hyperparameter corresponding to π(μ)\pi(\mu).

sigma0

the standard deviation hyperparameter corresponding to π(μ)\pi(\mu).

gamma0

the shape hyperparameter corresponding to π(δ)\pi(\delta).

delta0

the rate hyperparameter corresponding to π(δ)\pi(\delta).

epsilon

a positive small constant playing the role of threshold for stopping sampler.

Value

Estimated tail thickness, location, and scale parameters, number of iterations to attain convergence, the log-likelihood value across iterations, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC).

Author(s)

Mahdi Teimouri

Examples

n <- 100
alpha <- 1.4
mu <- 0
sigma <- 1
y <- rnorm(n)
fitBayes(y, mu0 = 0, sigma0 = 0.2, gamma0 = 10e-5, delta0 = 10e-5, epsilon = 0.005)

Computing the maximum likelihood estimator for the mixtures of skewed sub-Gaussian stable distributions using the EM algorithm.

Description

Each dd-dimensional skewed sub-Gaussian stable (SSG) random vector Y\bf{Y}, admits the representation given by Teimouri (2022):

Y=dμ+PλZ0+PΣ12Z1,{\bf{Y}} \mathop=\limits^d {\boldsymbol{\mu}}+\sqrt{P}{\boldsymbol{\lambda}}\vert{Z}_0\vert + \sqrt{P}{\Sigma}^{\frac{1}{2}}{\bf{Z}}_1,

where μ\boldsymbol{\mu} (location vector in Rd{{{R}}}^{d}, λ\boldsymbol{\lambda} (skewness vector in Rd{{{R}}}^{d}), Σ\Sigma (positive definite symmetric dispersion matrix), and 0<α20<\alpha \leq 2 (tail thickness) are model parameters. Furthermore, PP is a positive stable random variable, Z0N(0,1){Z}_0\sim N({0},1), and Z1Nd(0,Σ)\bf{Z}_1\sim N_{d}\bigl({\bf{0}}, \Sigma\bigr). We note that ZZ, Z0Z_0, and Z1\boldsymbol{Z}_1 are mutually independent.

Usage

fitmssg(Y, K, eps = 0.15, initial = "FALSE", method = "moment", starts = starts)

Arguments

Y

an n×dn\times d matrix of observations.

K

number of component.

eps

threshold value for stopping EM algorithm. It is 0.15 by default. The algorithm can be implemented faster if eps is larger.

initial

logical statement. If initial = TRUE, then a list of the initial values must be given. Otherwise, it is determined by method.

method

either em or moment. If method = "moment", then the initial values are determined through the method of moment applied to each of KK clusters that are obtained through the k-means method of Hartigan and Wong (1979). Otherwise, the initial values for each cluster are determined through the EM algorithm (Teimouri et al., 2018) developed for sub-Gaussian stable distributions applied to each of KK clusters.

starts

a list of initial values if initial="TRUE". The list contains a vector of length KK of mixing (weight) parameters, a vector of length KK of tail thickness parameters, KK vectors of length of dd of location parameters, KK dispersion matrices, KK vectors of length of dd of skewness parameters, respectively.

Value

a list of estimated parameters corresponding to KK clusters, predicted labels for clusters, the log-likelihood value across iterations, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC).

Author(s)

Mahdi Teimouri

References

M. Teimouri, 2022. Finite mixture of skewed sub-Gaussian stable distributions, arxiv.org/abs/2205.14067.

M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. Parameter estimation using the EM algorithm for symmetric stable random variables and sub-Gaussian random vectors, Journal of Statistical Theory and Applications, 17(3), 439-41.

J. A. Hartigan, M. A. Wong, 1979. Algorithm as 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series c (Applied Statistics), 28, 100-108.

Examples

data(bankruptcy)
out1<-fitmssg(bankruptcy[,2:3], K=2, eps = 0.15, initial="FALSE", method="moment", starts=starts)
n1 <- 100
n2 <- 50
omega1 <- n1/(n1 + n2)
omega2 <- n2/(n1 + n2)
alpha1 <- 1.6
alpha2 <- 1.6
mu1 <- c(-1, -1)
mu2 <- c(6, 6)
sigma1 <- matrix( c(2, 0.20, 0.20, 0.5), 2, 2 )
sigma2 <- matrix( c(0.4, 0.10, 0.10, 0.2  ), 2, 2 )
lambda1 <- c(5, 5)
lambda2 <- c(-5, -5)
Sigma <- array( NA, c(2, 2, 2) )
Sigma[, , 1] <- sigma1
Sigma[, , 2] <- sigma2
starts<-list( c(omega1,omega2), c(alpha1,alpha2), rbind(mu1,mu2), Sigma, rbind(lambda1,lambda2) )
Y <- rbind( rssg(n1 , alpha1, mu1, sigma1, lambda1),  rssg(n2, alpha2, mu2, sigma2, lambda2) )
out2<-fitmssg(Y, K=2, eps=0.15, initial="TRUE", method="moment", starts=starts)

Simulating positive stable random variable.

Description

The cumulative distribution function of positive stable distribution is given by

FP(x)=1π0πexp{xα2αa(θ)}dθ,F_{P}(x)=\frac{1}{\pi}\int_{0}^{\pi}\exp\Bigl\{-x^{-\frac{\alpha}{2-\alpha}}a(\theta)\Bigr\}d\theta,

where 0<α20<\alpha \leq 2 is tail thickness or index of stability and

a(θ)=sin((1α2)θ)[sin(αθ2)]α2α[sin(θ)]22α.a(\theta)=\frac{\sin\Bigl(\bigl(1-\frac{\alpha}{2}\bigr)\theta\Bigr)\Bigl[\sin \bigl(\frac{\alpha \theta}{2}\bigr)\Bigr]^{\frac{\alpha}{2-\alpha}}}{[\sin(\theta)]^{\frac{2}{2-\alpha}}}.

Kanter (1975) used the above integral transform to simulate positive stable random variable as

P=d(a(θ)W)2αα,P\mathop=\limits^d\Bigl( \frac{a(\theta)}{W} \Bigr)^{\frac{2-\alpha}{\alpha}},

in which θU(0,π)\theta\sim U(0,\pi) and WW independently follows an exponential distribution with mean unity.

Usage

rpstable(n, alpha)

Arguments

n

the number of samples required.

alpha

the tail thickness parameter.

Value

simulated realizations of size nn from positive α\alpha-stable distribution.

Author(s)

Mahdi Teimouri

References

M. Kanter, 1975. Stable densities under change of scale and total variation inequalities, Annals of Probability, 3(4), 697-707.

Examples

rpstable(10, alpha = 1.2)

Simulating skewed sub-Gaussian stable random vector.

Description

Each skewed sub-Gaussian stable (SSG) random vector Y\bf{Y}, admits the representation

Y=dμ+PλZ0+PΣ12Z1,{\bf{Y}} \mathop=\limits^d {\boldsymbol{\mu}}+\sqrt{P}{\boldsymbol{\lambda}}\vert{Z}_0\vert + \sqrt{P}{\Sigma}^{\frac{1}{2}}{\bf{Z}}_1,

where μRd{\boldsymbol{\mu}} \in {R}^{d} is location vector, λRd{\boldsymbol{\lambda}} \in {R}^{d} is skewness vector, Σ\Sigma is a positive definite symmetric dispersion matrix, and 0<α20<\alpha \leq 2 is tail thickness. Further, PP is a positive stable random variable, Z0N(0,1){Z}_0\sim N({0},1), and Z1Nd(0,Σ){\bf{Z}}_1\sim N_{d}\bigl({\bf{0}}, \Sigma\bigr). We note that ZZ, Z0Z_0, and Z1{\bf{Z}}_1 are mutually independent.

Usage

rssg(n, alpha, Mu, Sigma, Lambda)

Arguments

n

the number of samples required.

alpha

the tail thickness parameter.

Mu

a vector giving the location parameter.

Sigma

a positive definite symmetric matrix specifying the dispersion matrix.

Lambda

a vector giving the skewness parameter.

Value

simulated realizations of size nn from the skewed sub-Gaussian stable distribution.

Author(s)

Mahdi Teimouri

Examples

n <- 4
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
rssg(n, alpha, Mu, Sigma, Lambda)

Estimating the tail index of the skewed sub-Gaussian stable distribution using the stochastic EM algorithm given that other parameters are known.

Description

Suppose Y1,Y2,,Yn{\boldsymbol{Y}}_1,{\boldsymbol{Y}}_2, \cdots,{\boldsymbol{Y}}_n are realizations following dd-dimensional skewed sub-Gaussian stable distribution. Herein, we estimate the tail thickness parameter 0<α20<\alpha \leq 2 when μ\boldsymbol{\mu} (location vector in Rd{{{R}}}^{d}, λ\boldsymbol{\lambda} (skewness vector in Rd{{{R}}}^{d}), and Σ\Sigma (positive definite symmetric dispersion matrix are assumed to be known.

Usage

stoch(Y, alpha0, Mu0, Sigma0, Lambda0)

Arguments

Y

a vector (or an n×dn\times d matrix) at which the density function is approximated.

alpha0

initial value for the tail thickness parameter.

Mu0

a vector giving the initial value for the location parameter.

Sigma0

a positive definite symmetric matrix specifying the initial value for the dispersion matrix.

Lambda0

a vector giving the initial value for the skewness parameter.

Details

Here, we assume that parameters μ{\boldsymbol{\mu}}, λ{\boldsymbol{\lambda}}, and Σ\Sigma are known and only the tail thickness parameter needs to be estimated.

Value

Estimated tail thickness parameter α\alpha, of the skewed sub-Gaussian stable distribution.

Author(s)

Mahdi Teimouri

Examples

n <- 100
alpha <- 1.4
Mu <- rep(0, 2)
Sigma <- diag(2)
Lambda <- rep(2, 2)
Y <- rssg(n, alpha, Mu, Sigma, Lambda)
stoch(Y, alpha, Mu, Sigma, Lambda)