Title: | Mixed Correlation Coefficient Matrix |
---|---|
Description: | The IRLS (Iteratively Reweighted Least Squares) and GMM (Generalized Method of Moments) methods are applied to estimate mixed correlation coefficient matrix (Pearson, Polyseries, Polychoric), which can be estimated in pairs or simultaneously. For more information see Peng Zhang and Ben Liu (2024) <doi:10.1080/10618600.2023.2257251>; Ben Liu and Peng Zhang (2024) <doi:10.48550/arXiv.2404.06781>. |
Authors: | Ben Liu [aut, cre], Peng Zhang [ths], Xiaowei Lou [aut, dtc] |
Maintainer: | Ben Liu <[email protected]> |
License: | GPL |
Version: | 0.1.0 |
Built: | 2025-02-14 04:56:19 UTC |
Source: | https://github.com/cran/MCCM |
The CECERS uses a 9-point scoring system, 1-3 (inadequate), 5 (least acceptable), 7 (good), and 9 (excellent), to measure the quality of Chinese early children education (ECE) programs for children aged 3 to 6. The CECERS has a total of 51 items organized in eight categories: (1) Space and Furnishings (9 items); (2) Personal Care Routines (6 items); (3) Curriculum Planning and Implementation (5 items); (4) Whole-Group Instruction (7 items); (5) Activities (9 items); (6) Language-Reasoning (4 items); (7) Guidance and Interaction (5 items); (8) Parents and Staff (6 items).
A data frame with 1383 rows and 95 variables:
Kejian Li, Peng Zhang, Bi Ying Hu, Margaret R Burchinal, Xitao Fan, and Jinliang Qin. Testing the ‘thresholds’ of preschool education quality on child outcomes in china. Early Childhood Research Quarterly, 47:445–456, 2019.
Bivariate normal density with mean 0 variance 1.
dphixy(x, y, rho)
dphixy(x, y, rho)
x , y
|
points value. |
rho |
correlation coefficient. |
the density value.
library(mvtnorm) dmvnorm(c(1,-1),sigma = matrix(c(1,0.5,0.5,1),2,2)) dphixy(1,-1,0.5)
library(mvtnorm) dmvnorm(c(1,-1),sigma = matrix(c(1,0.5,0.5,1),2,2)) dphixy(1,-1,0.5)
Estimate the MCCM from dataframe and draw it with scatter plot of matrices (SPLOM). With bivariate scatter plots below the diagonal, histograms on the diagonal, and the polychoric correlation coefficients with standard errors above the diagonal. Correlation ellipses are drawn in the same graph. The red lines below the diagonal are the LOESS smoothed lines, fitting a smooth curve between two variables.
draw_correlation_matrix( data1, order_indx, pair_est = FALSE, MLE = FALSE, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
draw_correlation_matrix( data1, order_indx, pair_est = FALSE, MLE = FALSE, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
data1 |
a dataframe containing continuous or ordinal variable. |
order_indx |
a vector to indicate the ordinal variables. |
pair_est |
bool value, TRUE for pairwise estimation, FALSE for simultaneous estimation. |
MLE |
bool value, TRUE for maximum likelihood estimation, FALSE for IRLS (pairwise) or IGMM (simultaneous) estimation. |
R0 |
the initial value for correlation vector, default Pearson correlation matrix. |
app |
bool value for approximation, TRUE for Legendre approximation, FALSE for common integral. |
korder |
the order of Legendre approximation. |
max_iter |
max iteration number for IGMM. |
max_tol |
max tolerance for iteration algorithm. |
show_log |
bool value, TRUE for showing calculation log. |
the SPLOM plot.
library(mvtnorm) library(MASS) library(polycor) library(lavaan) set.seed(1997) n = 10000 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) # pairwise MLE estimation draw_correlation_matrix(data2,indc,TRUE,TRUE) # pairwise IRLS estimation draw_correlation_matrix(data2,indc,TRUE,FALSE) # simultaneous MLE estimation draw_correlation_matrix(data2,indc,FALSE,TRUE) # simultaneous IGMM estimation draw_correlation_matrix(data2,indc,FALSE,FALSE)
library(mvtnorm) library(MASS) library(polycor) library(lavaan) set.seed(1997) n = 10000 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) # pairwise MLE estimation draw_correlation_matrix(data2,indc,TRUE,TRUE) # pairwise IRLS estimation draw_correlation_matrix(data2,indc,TRUE,FALSE) # simultaneous MLE estimation draw_correlation_matrix(data2,indc,FALSE,TRUE) # simultaneous IGMM estimation draw_correlation_matrix(data2,indc,FALSE,FALSE)
An accelerated function to estimate a mixed correlation coefficient matrix, as well as its covariance matrix, for dataframes containing continuous and ordinal variable.
est_mixedGMM( dataYX, order_indx, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
est_mixedGMM( dataYX, order_indx, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
dataYX |
a dataframe or matrix containing both continuous and ordinal variables. |
order_indx |
a vector to indicate the ordinal variables. |
R0 |
the initial value for correlation vector, default Pearson correlation matrix. |
app |
bool value for approximation, TRUE for Legendre approximation, FALSE for common integral. |
korder |
the order of Legendre approximation. |
max_iter |
max iteration number for IGMM. |
max_tol |
max tolerance for iteration algorithm. |
show_log |
bool value, TRUE for showing calculation log. |
Rhat |
The estimated correlation coefficients. |
COV |
The estimated covariance matrix for Rhat |
arXiv:2404.06781
library(mvtnorm) library(MASS) set.seed(1997) n = 500 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) out1 = est_mixedGMM(dataYX = data2,order_indx = indc) print(out1$Rhat) print(out1$COV)
library(mvtnorm) library(MASS) set.seed(1997) n = 500 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) out1 = est_mixedGMM(dataYX = data2,order_indx = indc) print(out1$Rhat) print(out1$COV)
Function to calculate thresholds from ordinal variables.
est_thre(X)
est_thre(X)
X |
a ordinal series. |
the estimated value for thresholds.
library(mvtnorm) set.seed(1997) R1 = gen_CCM(4) n = 1000 indc = 3:4 thresholds = list(c(),c(),c(-1),c(1)) data1 = gen_mixed(n,R1,indc,thresholds=thresholds)$observed est_thre(data1[,3]) est_thre(data1[,4])
library(mvtnorm) set.seed(1997) R1 = gen_CCM(4) n = 1000 indc = 3:4 thresholds = list(c(),c(),c(-1),c(1)) data1 = gen_mixed(n,R1,indc,thresholds=thresholds)$observed est_thre(data1[,3]) est_thre(data1[,4])
Estimate the polychoric correlation coefficient.
esti_polychoric(X, maxn = 100, e = 1e-08, ct = FALSE)
esti_polychoric(X, maxn = 100, e = 1e-08, ct = FALSE)
X |
a matrix(2*N) or dataframe contains two polychoric variable, or a contingency table with both columns and rows names. |
maxn |
the maximum iterations times. |
e |
the maximum tolerance of convergence. |
ct |
|
rho |
estimated value of polychoric correlation coefficient. |
std |
standard deviation of rho. |
iter |
times of iteration convergence. |
Ex , Ey
|
the support points series of regression model |
Zhang, P., Liu, B., & Pan, J. (2024). Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients. Journal of Computational and Graphical Statistics, 33(1), 316–328. https://doi.org/10.1080/10618600.2023.2257251
X = gen_polychoric(1000,0.5,0:1,-1:0) result = esti_polychoric(X) print(c(result$rho,result$std,result$iter))
X = gen_polychoric(1000,0.5,0:1,-1:0) result = esti_polychoric(X) print(c(result$rho,result$std,result$iter))
Estimate the polyserial correlation coefficient.
esti_polyserial(X, maxn = 100, e = 1e-08)
esti_polyserial(X, maxn = 100, e = 1e-08)
X |
a matrix(2*N) or dataframe contains two polyserial variable(Continuous variable first). |
maxn |
the maximum iterations times. |
e |
the maximum tolerance of convergence. |
rho |
estimated value of polyserial correlation coefficient. |
std |
standard deviation of rho. |
iter |
times of iteration convergence. |
Ex , Ey
|
the support point of regression model. |
Zhang, P., Liu, B., & Pan, J. (2024). Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients. Journal of Computational and Graphical Statistics, 33(1), 316–328. https://doi.org/10.1080/10618600.2023.2257251
X = gen_polyseries(1000,0.5,-1:1) result = esti_polyserial(X) result
X = gen_polyseries(1000,0.5,-1:1) result = esti_polyserial(X) result
Generate a positive semidefinite correlation coefficients matrix
gen_CCM(d)
gen_CCM(d)
d |
the dimension of matrix. |
a correlation coefficients matrix.
X = gen_CCM(4) print(X)
X = gen_CCM(4) print(X)
Generate multi-normal sample and segment it into ordinal.
gen_mixed(n, R, indc, thresholds)
gen_mixed(n, R, indc, thresholds)
n |
the sample size. |
R |
the correlation coefficient matrix. |
indc |
vector to indicate whether variables are continuous or categorical. |
thresholds |
list contains thresholds for ordinal variables |
latent |
the original normal data. |
observed |
the observed ordinal data. |
library(mvtnorm) set.seed(1997) R1 = gen_CCM(6) n = 1000 indc = 4:6 thresholds = list( c(), c(), c(), c(0), c(-0.5,0), c(0,0.5) ) data1 = gen_mixed(n,R1,indc,thresholds)$observed data1 = data.frame(data1) table(data1$X4,data1$X5) table(data1$X5,data1$X6)
library(mvtnorm) set.seed(1997) R1 = gen_CCM(6) n = 1000 indc = 4:6 thresholds = list( c(), c(), c(), c(0), c(-0.5,0), c(0,0.5) ) data1 = gen_mixed(n,R1,indc,thresholds)$observed data1 = data.frame(data1) table(data1$X4,data1$X5) table(data1$X5,data1$X6)
Generate polychoric sample with hidden distribution: binormal with correlation coefficient rho.
gen_polychoric(n, rho, a, b)
gen_polychoric(n, rho, a, b)
n |
sample size. |
rho |
correlation coefficient. |
a |
the cutoff points array. |
b |
the cutoff points array. |
Polychoric sample with size n(in a 2*n matrix).
gen_polychoric(100,0.5,-1:1,1:2)
gen_polychoric(100,0.5,-1:1,1:2)
Generate polyseries sample with hidden distribution: binormal with correlation coefficient rho.
gen_polyseries(n, rho, a)
gen_polyseries(n, rho, a)
n |
sample size. |
rho |
correlation coefficient. |
a |
the cutoff points array. |
Polyseries sample with size n(in a 2*n matrix).
gen_polyseries(100,0.5,-1:1)
gen_polyseries(100,0.5,-1:1)
Generate random number of binormal distribution with 0 mean unit variance and correlation coefficient rho.
gen_rho(n, rho)
gen_rho(n, rho)
n |
sample size. |
rho |
correlation coefficient. |
Binormal random number with length n(in a 2*n matrix).
gen_rho(100,0.5)
gen_rho(100,0.5)
Calculate the MB of an array of estimates relative to the true value.
mb(rhohat, rho)
mb(rhohat, rho)
rhohat |
an array of estimators of rho. |
rho |
the true value of rho. |
the mean bias of rhohat array.
rho = 0.5 rhohat = 0.5 + rnorm(10) mb(rhohat,rho)
rho = 0.5 rhohat = 0.5 + rnorm(10) mb(rhohat,rho)
Estimate the correlation matrix for dataframes containing continuous and ordinal variable, in pairs or simultaneously, using MLE, IRLS, or IGMM.
MCCM_est( dataYX, order_indx, pair_est = FALSE, MLE = FALSE, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
MCCM_est( dataYX, order_indx, pair_est = FALSE, MLE = FALSE, R0 = NULL, app = TRUE, korder = 2, max_iter = 1000, max_tol = 1e-08, show_log = FALSE )
dataYX |
a dataframe or matrix containing both continuous and ordinal variables. |
order_indx |
a vector to indicate the ordinal variables. |
pair_est |
bool value, TRUE for pairwise estimation, FALSE for simultaneous estimation. |
MLE |
bool value, TRUE for maximum likelihood estimation, FALSE for IRLS (pairwise) or IGMM (simultaneous) estimation. |
R0 |
the initial value for correlation vector, default Pearson correlation matrix. |
app |
bool value for approximation, TRUE for Legendre approximation, FALSE for common integral. |
korder |
the order of Legendre approximation. |
max_iter |
max iteration number for IGMM. |
max_tol |
max tolerance for iteration algorithm. |
show_log |
bool value, TRUE for showing calculation log. |
Rmatrix |
Estimated mixed correlation coefficient matrix. |
std_matrix |
Estimated standard deviation for each mixed correlation coefficient. |
COV |
The covariance matrix for MCCM (simultaneous estimation only). |
esti_polyserial, esti_polychoric, est_mixedGMM, summary_MCCM_est, draw_correlation_matrix
library(mvtnorm) library(MASS) library(polycor) library(lavaan) set.seed(1997) n = 10000 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) # pairwise MLE estimation out_pair_MLE = MCCM_est(dataYX=data2,order_indx=indc,pair_est=TRUE,MLE=TRUE) # pairwise IRLS estimation out_pair_IRLS = MCCM_est(dataYX=data2,order_indx=indc,pair_est=TRUE,MLE=FALSE) # simultaneous MLE estimation out_sim_MLE = MCCM_est(dataYX=data2,order_indx=indc,pair_est=FALSE,MLE=TRUE) # simultaneous IGMM estimation out_sim_IGMM = MCCM_est(dataYX=data2,order_indx=indc,pair_est=FALSE,MLE=FALSE) summary_MCCM_est(out_pair_MLE) summary_MCCM_est(out_pair_IRLS) summary_MCCM_est(out_sim_MLE) summary_MCCM_est(out_sim_IGMM)
library(mvtnorm) library(MASS) library(polycor) library(lavaan) set.seed(1997) n = 10000 rho12=0.3 rho13=0.4 rho14=0.5 rho23=0.6 rho24=0.7 rho34=0.8 R = matrix(c(1,rho12,rho13,rho14,rho12,1,rho23,rho24,rho13,rho23,1,rho34, rho14,rho24,rho34,1),4,4) indc = c(3,4) thresholds = list(c(),c(),0,0) data1 = gen_mixed(n=n,R=R,indc=indc,thresholds=thresholds) data2 = data.frame(data1$observed) # pairwise MLE estimation out_pair_MLE = MCCM_est(dataYX=data2,order_indx=indc,pair_est=TRUE,MLE=TRUE) # pairwise IRLS estimation out_pair_IRLS = MCCM_est(dataYX=data2,order_indx=indc,pair_est=TRUE,MLE=FALSE) # simultaneous MLE estimation out_sim_MLE = MCCM_est(dataYX=data2,order_indx=indc,pair_est=FALSE,MLE=TRUE) # simultaneous IGMM estimation out_sim_IGMM = MCCM_est(dataYX=data2,order_indx=indc,pair_est=FALSE,MLE=FALSE) summary_MCCM_est(out_pair_MLE) summary_MCCM_est(out_pair_IRLS) summary_MCCM_est(out_sim_MLE) summary_MCCM_est(out_sim_IGMM)
Calculate the MRB of an array of estimates relative to the true value.
mrb(rhohat, rho)
mrb(rhohat, rho)
rhohat |
an array of estimators of rho. |
rho |
the true value of rho. |
the mean relative bias of rhohat array.
rho = 0.5 rhohat = 0.5 + rnorm(10) mrb(rhohat,rho)
rho = 0.5 rhohat = 0.5 + rnorm(10) mrb(rhohat,rho)
The Parenteral Nutrition data were collected from 543 patients of whom 386 were given parenteral nutrition alone, 145 were given enteral and parenteral nutrition, and 3 were given enteral nutrition only. There are 23 main discrete variables, such as: clinical stages(1-4), dietary status(1-3), NRS(0-6), PG-SGA-qualitative(1-3), etc.
A data frame with 1086 rows and 29 variables:
Standard bivariate normal distribution approximated with Legendre polynomials.
Phixy(x, y, rho, korder = 3, app = TRUE)
Phixy(x, y, rho, korder = 3, app = TRUE)
x , y
|
P(X<=x,Y<=y). |
rho |
correlation coefficient. |
korder |
order of Legendre approximation. |
app |
bool value TRUE for approximation, FALSE for integral. |
P(X<=x,Y<=y).
library(mvtnorm) pmvnorm(upper = c(1,-1),sigma = matrix(c(1,0.5,0.5,1),2,2)) Phixy(1,-1,0.5,2,app=TRUE) Phixy(1,-1,0.5,app=TRUE)
library(mvtnorm) pmvnorm(upper = c(1,-1),sigma = matrix(c(1,0.5,0.5,1),2,2)) Phixy(1,-1,0.5,2,app=TRUE) Phixy(1,-1,0.5,app=TRUE)
Calculate the RMSE of an array of estimates relative to the true value.
rmse(rhohat, rho)
rmse(rhohat, rho)
rhohat |
an array of estimators of rho. |
rho |
the true value of rho. |
the root mean squared error of rhohat array.
rho = 0.5 rhohat = 0.5 + rnorm(10) rmse(rhohat,rho)
rho = 0.5 rhohat = 0.5 + rnorm(10) rmse(rhohat,rho)
Display the estimated correlation matrix and std matrix for a MCCM_est list.
summary_MCCM_est(out_MCCM)
summary_MCCM_est(out_MCCM)
out_MCCM |
output of function MCCM_est. |
The summary of estimation.
MCCM_est, draw_correlation_matrix