Title: Generalized Linear Models and Extensions, Fourth Edition
Publisher: Stata Press
Authors: James W. Hardin and Joseph M. Hilbe
Publication date: 2018-07-01
Language: English
Pages: 598
Printing date: 2018-07-12
Paper: offset paper
ISBN: 978-1-59718-225-6
Binding: paperback
Generalized linear models (GLMs) extend linear regression to models with a non-Gaussian or even discrete response. GLM theory is predicated on the exponential family of distributions, a class so rich that it includes the commonly used logit, probit, and Poisson models. Although one can fit these models in Stata by using specialized commands (for example, logit for logit models), fitting them as GLMs with Stata's glm command offers some advantages. For example, model diagnostics may be calculated and interpreted similarly regardless of the assumed distribution. This text thoroughly covers GLMs, both theoretically and computationally, with an emphasis on Stata. The theory consists of showing how the various GLMs are special cases of the exponential family, showing general properties of this family of distributions, and showing the derivation of maximum likelihood (ML) estimators and standard errors. Hardin and Hilbe show how iteratively reweighted least squares, another method of parameter estimation, is a consequence of ML estimation using Fisher scoring. The authors also discuss different methods of estimating standard errors, including robust methods, robust methods with clustering, Newey–West, outer product of the gradient, bootstrap, and jackknife. The thorough coverage of model diagnostics includes measures of influence such as Cook's distance, several forms of residuals, the Akaike and Bayesian information criteria, and various R²-type measures of explained variability. After presenting general theory, Hardin and Hilbe then break down each distribution. Each distribution has its own chapter that explains the computational details of applying the general theory to that particular distribution. Pseudocode plays a valuable role here because it lets the authors describe computational algorithms relatively simply.
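The link between iteratively reweighted least squares (IRLS) and Fisher scoring can be made concrete with a small sketch. What follows is a minimal pure-Python illustration, not the authors' Stata code or pseudocode. It fits a Poisson model with the canonical log link. For a canonical link the expected and observed Hessians coincide, so each IRLS step is also a Newton–Raphson step. The weight is w = (dμ/dη)²/V(μ) = μ²/μ = μ, and the working response is z = η + (y − μ)/μ. The function names (`solve`, `poisson_deviance`, `irls_poisson`) are hypothetical. The starting values μ = (y + ȳ)/2 are an assumption of this sketch.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum[y*log(y/mu) - (y - mu)], with y*log(y/mu) = 0 when y = 0."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))

def irls_poisson(X, y, tol=1e-10, max_iter=50):
    """IRLS (Fisher scoring) for a log-link Poisson GLM; X is a list of rows incl. intercept."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    mu = [(yi + ybar) / 2.0 for yi in y]      # assumed starting values, avoid log(0)
    eta = [math.log(m) for m in mu]
    dev_old = poisson_deviance(y, mu)
    beta = [0.0] * p
    for _ in range(max_iter):
        w = mu                                 # canonical log link: w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        # Weighted least squares step: (X'WX) beta = X'Wz
        XtWX = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        XtWz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(XtWX, XtWz)
        eta = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        mu = [math.exp(e) for e in eta]
        dev = poisson_deviance(y, mu)
        if abs(dev - dev_old) < tol:           # stop when the deviance stabilizes
            break
        dev_old = dev
    return beta, dev

# Two groups with means 2 and 6: the ML estimates are log(2) and log(6/2) = log(3).
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 3, 4, 8]
beta, dev = irls_poisson(X, y)
print(beta, dev)
```

The toy design is chosen so that the ML answer is known in closed form, which makes it easy to check that the iteration converges to the right place.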
Devoting an entire chapter to each distribution (or family, in GLM terms) also allows for the inclusion of real-data examples showing how Stata fits such models, as well as the presentation of certain diagnostics and analytical strategies that are unique to that family. The chapters on binary data and on count (Poisson) data are excellent in this regard. Hardin and Hilbe give ample attention to the problems of overdispersion and zero inflation in count-data models. The final part of the text concerns extensions of GLMs. First, the authors cover multinomial responses, both ordered and unordered. Although multinomial responses are not strictly a part of GLM, the theory is similar in that one can think of a multinomial response as an extension of a binary response. The examples presented in these chapters often use the authors’ own Stata programs, augmenting official Stata’s capabilities. Second, GLMs may be extended to clustered data through generalized estimating equations (GEEs), and one chapter covers GEE theory and examples. GLMs may also be extended by programming one’s own family and link functions for use with Stata’s official glm command, and the authors detail this process. Finally, the authors describe extensions for multivariate models and Bayesian analysis. The fourth edition includes two new chapters. The first introduces bivariate and multivariate models for binary and count outcomes. The second covers Bayesian analysis and demonstrates how to use the bayes: prefix and the bayesmh command to fit Bayesian models for many of the GLMs that were discussed in previous chapters. Additionally, the authors added discussions of the generalized negative binomial models of Waring and Famoye. New explanations of working with heaped data, clustered data, and bias-corrected GLMs are included. The new edition also incorporates more examples of creating synthetic data for models such as Poisson, negative binomial, hurdle, and finite mixture models.
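The blurb mentions overdispersion in count models and the creation of synthetic data for Poisson and negative binomial models. A hedged Python sketch can tie the two together (the book itself works in Stata). It draws negative binomial (NB2) counts as a Poisson–gamma mixture and computes the Pearson dispersion statistic (Pearson χ² divided by residual degrees of freedom) from an intercept-only Poisson fit, whose ML fitted mean is simply ȳ. It then compares the result with genuinely Poisson data. The helper names are hypothetical. The parameter values μ = 2, α = 1 and the seed are arbitrary choices for illustration, not values from the book.

```python
import math
import random

def rpoisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for the small means used here)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def synth_nb2(n, mu, alpha, rng):
    """NB2 counts as a Poisson-gamma mixture: gamma(shape=1/alpha, scale=alpha*mu) has
    mean mu, so y has mean mu and variance mu + alpha*mu**2."""
    shape, scale = 1.0 / alpha, alpha * mu
    return [rpoisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]

def pearson_dispersion_intercept_only(y):
    """Pearson chi2 / (n - 1) for an intercept-only Poisson fit (mu_hat = mean of y)."""
    n = len(y)
    mu_hat = sum(y) / n
    chi2 = sum((yi - mu_hat) ** 2 / mu_hat for yi in y)
    return chi2 / (n - 1)

rng = random.Random(2018)
d_pois = pearson_dispersion_intercept_only([rpoisson(2.0, rng) for _ in range(20000)])
d_nb = pearson_dispersion_intercept_only(synth_nb2(20000, 2.0, 1.0, rng))
print(round(d_pois, 2), round(d_nb, 2))
```

With μ = 2 and α = 1 the NB2 variance is 2 + 4 = 6, so the dispersion statistic for the mixture data should be close to 6/2 = 3. For the Poisson data it should be close to 1. A value well above 1 is the usual informal signal that a Poisson model understates the variability.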
List of figures
List of tables
List of listings
Preface
1 Introduction
  1.1 Origins and motivation
  1.2 Notational conventions
  1.3 Applied or theoretical?
  1.4 Road map
  1.5 Installing the support materials
I Foundations of Generalized Linear Models
2 GLMs
  2.1 Components
  2.2 Assumptions
  2.3 Exponential family
  2.4 Example: Using an offset in a GLM
  2.5 Summary
3 GLM estimation algorithms
  3.1 Newton–Raphson (using the observed Hessian)
  3.2 Starting values for Newton–Raphson
  3.3 IRLS (using the expected Hessian)
  3.4 Starting values for IRLS
  3.5 Goodness of fit
  3.6 Estimated variance matrices
    3.6.1 Hessian
    3.6.2 Outer product of the gradient
    3.6.3 Sandwich
    3.6.4 Modified sandwich
    3.6.5 Unbiased sandwich
    3.6.6 Modified unbiased sandwich
    3.6.7 Weighted sandwich: Newey–West
    3.6.8 Jackknife
      3.6.8.1 Usual jackknife
      3.6.8.2 One-step jackknife
      3.6.8.3 Weighted jackknife
      3.6.8.4 Variable jackknife
    3.6.9 Bootstrap
      3.6.9.1 Usual bootstrap
      3.6.9.2 Grouped bootstrap
  3.7 Estimation algorithms
  3.8 Summary
4 Analysis of fit
  4.1 Deviance
  4.2 Diagnostics
    4.2.1 Cook's distance
    4.2.2 Overdispersion
  4.3 Assessing the link function
  4.4 Residual analysis
    4.4.1 Response residuals
    4.4.2 Working residuals
    4.4.3 Pearson residuals
    4.4.4 Partial residuals
    4.4.5 Anscombe residuals
    4.4.6 Deviance residuals
    4.4.7 Adjusted deviance residuals
    4.4.8 Likelihood residuals
    4.4.9 Score residuals
  4.5 Checks for systematic departure from the model
  4.6 Model statistics
    4.6.1 Criterion measures
      4.6.1.1 AIC
      4.6.1.2 BIC
    4.6.2 The interpretation of R² in linear regression
      4.6.2.1 Percentage variance explained
      4.6.2.2 The ratio of variances
      4.6.2.3 A transformation of the likelihood ratio
      4.6.2.4 A transformation of the F test
      4.6.2.5 Squared correlation
    4.6.3 Generalizations of linear regression R² interpretations
      4.6.3.1 Efron's pseudo-R²
      4.6.3.2 McFadden's likelihood-ratio index
      4.6.3.3 Ben-Akiva and Lerman adjusted likelihood-ratio index
      4.6.3.4 McKelvey and Zavoina ratio of variances
      4.6.3.5 Transformation of likelihood ratio
      4.6.3.6 Cragg and Uhler normed measure
    4.6.4 More R² measures
      4.6.4.1 The count R²
      4.6.4.2 The adjusted count R²
      4.6.4.3 Veall and Zimmermann R²
      4.6.4.4 Cameron–Windmeijer R²
  4.7 Marginal effects
    4.7.1 Marginal effects for GLMs
    4.7.2 Discrete change for GLMs
II Continuous Response Models
5 The Gaussian family
  5.1 Derivation of the GLM Gaussian family
  5.2 Derivation in terms of the mean
  5.3 IRLS GLM algorithm (nonbinomial)
  5.4 ML estimation
  5.5 GLM log-Gaussian models
  5.6 Expected versus observed information matrix
  5.7 Other Gaussian links
  5.8 Example: Relation to OLS
  5.9 Example: Beta-carotene
6 The gamma family
  6.1 Derivation of the gamma model
  6.2 Example: Reciprocal link
  6.3 ML estimation
  6.4 Log-gamma models
  6.5 Identity-gamma models
  6.6 Using the gamma model for survival analysis
7 The inverse Gaussian family
  7.1 Derivation of the inverse Gaussian model
  7.2 Shape of the distribution
  7.3 The inverse Gaussian algorithm
  7.4 Maximum likelihood algorithm
  7.5 Example: The canonical inverse Gaussian
  7.6 Noncanonical links
8 The power family and link
  8.1 Power links
  8.2 Example: Power link
  8.3 The power family
III Binomial Response Models
9 The binomial–logit family
  9.1 Derivation of the binomial model
  9.2 Derivation of the Bernoulli model
  9.3 The binomial regression algorithm
  9.4 Example: Logistic regression
    9.4.1 Model producing logistic coefficients: The heart data
    9.4.2 Model producing logistic odds ratios
  9.5 GOF statistics
  9.6 Grouped data
  9.7 Interpretation of parameter estimates
10 The general binomial family
  10.1 Noncanonical binomial models
  10.2 Noncanonical binomial links (binary form)
  10.3 The probit model
  10.4 The clog-log and log-log models
  10.5 Other links
  10.6 Interpretation of coefficients
    10.6.1 Identity link
    10.6.2 Logit link
    10.6.3 Log link
    10.6.4 Log complement link
    10.6.5 Log-log link
    10.6.6 Complementary log-log link
    10.6.7 Summary
  10.7 Generalized binomial regression
  10.8 Beta binomial regression
  10.9 Zero-inflated models
11 The problem of overdispersion
  11.1 Overdispersion
  11.2 Scaling of standard errors
  11.3 Williams' procedure
  11.4 Robust standard errors
IV Count Response Models
12 The Poisson family
  12.1 Count response regression models
  12.2 Derivation of the Poisson algorithm
  12.3 Poisson regression: Examples
  12.4 Example: Testing overdispersion in the Poisson model
  12.5 Using the Poisson model for survival analysis
  12.6 Using offsets to compare models
  12.7 Interpretation of coefficients
13 The negative binomial family
  13.1 Constant overdispersion
  13.2 Variable overdispersion
    13.2.1 Derivation in terms of a Poisson–gamma mixture
    13.2.2 Derivation in terms of the negative binomial probability function
    13.2.3 The canonical link negative binomial parameterization
  13.3 The log-negative binomial parameterization
  13.4 Negative binomial examples
  13.5 The geometric family
  13.6 Interpretation of coefficients
14 Other count-data models
  14.1 Count response regression models
  14.2 Zero-truncated models
  14.3 Zero-inflated models
  14.4 General truncated models
  14.5 Hurdle models
  14.6 Negative binomial(P) models
  14.7 Negative binomial(Famoye)
  14.8 Negative binomial(Waring)
  14.9 Heterogeneous negative binomial models
  14.10 Generalized Poisson regression models
  14.11 Poisson inverse Gaussian models
  14.12 Censored count response models
  14.13 Finite mixture models
  14.14 Quantile regression for count outcomes
  14.15 Heaped data models
V Multinomial Response Models
15 Unordered-response family
  15.1 The multinomial logit model
    15.1.1 Interpretation of coefficients: Single binary predictor
    15.1.2 Example: Relation to logistic regression
    15.1.3 Example: Relation to conditional logistic regression
    15.1.4 Example: Extensions with conditional logistic regression
    15.1.5 The independence of irrelevant alternatives
    15.1.6 Example: Assessing the IIA
    15.1.7 Interpreting coefficients
    15.1.8 Example: Medical admissions—introduction
    15.1.9 Example: Medical admissions—summary
  15.2 The multinomial probit model
    15.2.1 Example: A comparison of the models
    15.2.2 Example: Comparing probit and multinomial probit
    15.2.3 Example: Concluding remarks
16 The ordered-response family
  16.1 Interpretation of coefficients: Single binary predictor
  16.2 Ordered outcomes for general link
  16.3 Ordered outcomes for specific links
    16.3.1 Ordered logit
    16.3.2 Ordered probit
    16.3.3 Ordered clog-log
    16.3.4 Ordered log-log
    16.3.5 Ordered cauchit
  16.4 Generalized ordered outcome models
  16.5 Example: Synthetic data
  16.6 Example: Automobile data
  16.7 Partial proportional-odds models
  16.8 Continuation-ratio models
  16.9 Adjacent category model
VI Extensions to the GLM
17 Extending the likelihood
  17.1 The quasilikelihood
  17.2 Example: Wedderburn's leaf blotch data
  17.3 Example: Tweedie family variance
  17.4 Generalized additive models
18 Clustered data
  18.1 Generalization from individual to clustered data
  18.2 Pooled estimators
  18.3 Fixed effects
    18.3.1 Unconditional fixed-effects estimators
    18.3.2 Conditional fixed-effects estimators
  18.4 Random effects
    18.4.1 Maximum likelihood estimation
    18.4.2 Gibbs sampling
  18.5 Mixed-effect models
  18.6 GEEs
  18.7 Other models
19 Bivariate and multivariate models
  19.1 Bivariate and multivariate models for binary outcomes
  19.2 Copula functions
  19.3 Using copula functions to calculate bivariate probabilities
  19.4 Synthetic datasets
  19.5 Examples of bivariate count models using copula functions
  19.6 The Famoye bivariate Poisson regression model
  19.7 The Marshall–Olkin bivariate negative binomial regression model
  19.8 The Famoye bivariate negative binomial regression model
20 Bayesian GLMs
  20.1 Brief overview of Bayesian methodology
    20.1.1 Specification and estimation
    20.1.2 Bayesian analysis in Stata
  20.2 Bayesian logistic regression
    20.2.1 Bayesian logistic regression—noninformative priors
    20.2.2 Diagnostic plots
    20.2.3 Bayesian logistic regression—informative priors
  20.3 Bayesian probit regression
  20.4 Bayesian complementary log-log regression
  20.5 Bayesian binomial logistic regression
  20.6 Bayesian Poisson regression
    20.6.1 Bayesian Poisson regression with noninformative priors
    20.6.2 Bayesian Poisson with informative priors
  20.7 Bayesian negative binomial likelihood
    20.7.1 Zero-inflated negative binomial logit
  20.8 Bayesian normal regression
  20.9 Writing a custom likelihood
    20.9.1 Using the llf() option
      20.9.1.1 Bayesian logistic regression using llf()
      20.9.1.2 Bayesian zero-inflated negative binomial logit regression using llf()
    20.9.2 Using the llevaluator() option
      20.9.2.1 Logistic regression model using llevaluator()
      20.9.2.2 Bayesian clog-log regression with llevaluator()
      20.9.2.3 Bayesian Poisson regression with llevaluator()
      20.9.2.4 Bayesian negative binomial regression using llevaluator()
      20.9.2.5 Zero-inflated negative binomial logit using llevaluator()
      20.9.2.6 Bayesian gamma regression using llevaluator()
      20.9.2.7 Bayesian inverse Gaussian regression using llevaluator()
      20.9.2.8 Bayesian zero-truncated Poisson using llevaluator()
      20.9.2.9 Bayesian bivariate Poisson using llevaluator()
VII Stata Software
21 Programs for Stata
  21.1 The glm command
    21.1.1 Syntax
    21.1.2 Description
    21.1.3 Options
  21.2 The predict command after glm
    21.2.1 Syntax
    21.2.2 Options
  21.3 User-written programs
    21.3.1 Global macros available for user-written programs
    21.3.2 User-written variance functions
    21.3.3 User-written programs for link functions
    21.3.4 User-written programs for Newey–West weights
  21.4 Remarks
    21.4.1 Equivalent commands
    21.4.2 Special comments on family(Gaussian) models
    21.4.3 Special comments on family(binomial) models
    21.4.4 Special comments on family(nbinomial) models
    21.4.5 Special comment on family(gamma) link(log) models
22 Data synthesis
  22.1 Generating correlated data
  22.2 Generating data from a specified population
    22.2.1 Generating data for linear regression
    22.2.2 Generating data for logistic regression
    22.2.3 Generating data for probit regression
    22.2.4 Generating data for complementary log-log regression
    22.2.5 Generating data for Gaussian variance and log link
    22.2.6 Generating underdispersed count data
  22.3 Simulation
    22.3.1 Heteroskedasticity in linear regression
    22.3.2 Power analysis
    22.3.3 Comparing fit of Poisson and negative binomial
    22.3.4 Effect of missing covariate on R²_Efron in Poisson regression
A Tables
References
Author index
Subject index