### Citations

2449 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation Context ...C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models. The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = Σ_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illustrate the main idea in two dimensions. Let η = f_1(x_1) + f_2(x_2) = [B_1 | B_2][a_1; a_2] = Ba. (13) By combining the two bases into one matrix and chaining the coefficients into one vector, we are back in a standard regression setting. [Paul H.C. Eilers, Brian D. Marx and Maria Durban, 163] The roughness penalties are λ_1‖D_1a_1‖² and λ_2‖D_2a_2‖² (where the indices here refer to the variables, not to the order of the differences)...
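To make eq. (13) concrete, here is a small NumPy sketch; none of it comes from the paper. A linear tent basis stands in for B-splines, the data and λ values are invented, and the two second-order difference penalties are written as augmented least-squares rows; the minimum-norm `lstsq` solution takes care of the intercept that the two smooth terms share.

```python
import numpy as np

def tent_basis(x, knots):
    """Linear B-spline (tent) basis: one triangle per knot, width 2h."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

rng = np.random.default_rng(1)
n, k = 200, 15
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x1) + x2 ** 2 + rng.normal(0, 0.1, n)

B1 = tent_basis(x1, np.linspace(0, 1, k))
B2 = tent_basis(x2, np.linspace(0, 1, k))
B = np.hstack([B1, B2])                  # [B1 | B2], as in eq. (13)

# separate second-order difference penalties for a1 and a2,
# written as extra rows of an augmented least-squares problem
D = np.diff(np.eye(k), 2, axis=0)
lam1 = lam2 = 1.0
Z = np.zeros((k - 2, k))
A = np.vstack([B,
               np.hstack([np.sqrt(lam1) * D, Z]),
               np.hstack([Z, np.sqrt(lam2) * D])])
rhs = np.concatenate([y, np.zeros(2 * (k - 2))])
# lstsq returns the minimum-norm solution, which resolves the
# intercept confounding between the two smooth terms
a, *_ = np.linalg.lstsq(A, rhs, rcond=None)
eta = B @ a                              # fitted linear predictor
```

Chaining the coefficients into one vector, as the text says, turns the two-term GAM into one standard (penalized) regression.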

573 | Classical and Modern Regression with Applications
- Myers
- 1990
Citation Context ...for each observation in turn we can compute the prediction error CV = √(Σ_i (y_i − ŷ_{−i})²). (6) As such, CV is a natural criterion for selecting λ, through its minimization. Using brute force, the computation of CV is expensive, certainly when the number of observations is large. Fortunately there is an exact shortcut. We have that ŷ = B(BᵀWB + λDᵀD)⁻¹BᵀWy = Hy. (7) Commonly H is called the “hat” matrix. One can prove that y_i − ŷ_{−i} = (y_i − ŷ_i)/(1 − h_ii), (8) and the diagonal of H can be computed quickly. A derivation can be found in Appendix B of Myers (1990). An informal proof goes as follows. Imagine that we change element i of y to get a new vector y*; then ŷ* = Hy*. Now it holds that if we set y*_i = ŷ_{−i}, then ŷ*_i = ŷ_{−i}. Hence we have that ŷ_{−i} − ŷ_i = h_ii(ŷ_{−i} − y_i), as Δŷ_i = h_ii Δy_i. After adding y_i − ŷ_{−i} to both sides of this equation and rearranging terms we arrive at (8). The hat matrix also gives us the effective model dimension, if we follow Ye (1998), who proposed ED = Σ_i ∂ŷ_i/∂y_i = Σ_i h_ii. (9) In fact we can compute the trace of H without actually computing its diagonal, using cyclic permutation: ED = tr(H) = tr[(BᵀWB + λDᵀD)⁻¹BᵀWB]. (...
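The shortcut (8) is easy to check numerically. The following is an illustrative sketch with invented data, a tent basis standing in for B-splines, and W = I; the brute-force refit without observation i should reproduce the fast leave-one-out residual exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 60, 13
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)

def tent_basis(x, knots):
    # linear B-spline (tent) stand-in for a proper B-spline basis
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

B = tent_basis(x, np.linspace(0, 1, k))
D = np.diff(np.eye(k), 2, axis=0)
lam = 0.1
M = B.T @ B + lam * D.T @ D            # W = I for simplicity

H = B @ np.linalg.solve(M, B.T)        # hat matrix, eq. (7)
yhat = H @ y
h = np.diag(H)
loo = (y - yhat) / (1.0 - h)           # eq. (8): y_i - yhat_{-i}
cv = np.sqrt(np.sum(loo ** 2))         # eq. (6)
ed = np.trace(H)                       # eq. (9), effective dimension

# brute-force leave-one-out for one observation, to verify the shortcut
i = 10
m = np.arange(n) != i
ai = np.linalg.solve(B[m].T @ B[m] + lam * D.T @ D, B[m].T @ y[m])
loo_brute = y[i] - B[i] @ ai
```

Because the smoother is linear in y, the identity in (8) is exact, not an approximation, so the two leave-one-out residuals agree to machine precision.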

403 | Flexible smoothing with B-splines and penalties
- Eilers, Marx
- 1996
Citation Context ...effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines. MSC: 41A15, 41A63, 62G05, 62G07, 62J07, 62J12. Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing. 1. Introduction. Twenty years ago, Statistical Science published a discussion paper under the title “Flexible smoothing with B-splines and penalties” (Eilers and Marx, 1996). The authors were two statisticians with only a short track record, who finally got a manuscript published that had been rejected by three other journals. They had been trying since 1992 to sell their brainchild P-splines (Eilers and Marx, 1992). Apparently it did have some value, because two decades later the paper has been cited over a thousand times (according to the Web of Science, a conservative source), in both theoretical and applied work. By now, P-splines have become an active area of research, so it will be useful, and hopefully interesting, to look back and to sketch what might be ...

370 | Analysis of discrete ill-posed problems by means of the L-curve
- Hansen
- 1992
Citation Context ...ty of the model. One should always be careful when using cross-validation or AIC to tune the smoothing parameter. An implicit assumption is that the observations are independent, conditional on their smooth expectations. If this is not the case, as for time series data, the serial correlation will be picked up as a part of the smooth component and severe undersmoothing can occur. One way to approach this problem is to explicitly model the correlation structure of the noise. We return to this subject in Section 10 on mixed models. A recent alternative strategy is the adaptation of the L-curve (Hansen, 1992). It was developed for ridge regression, but can be adapted to difference penalties. See Frasso and Eilers (2015) for examples and a variation, called the V-curve, which is easier to use. In Section 10 the tuning parameter for the penalty will appear as a ratio of variances, and the effective dimension plays an essential role when estimating them. 5. Generalized linear smoothing and extensions. P-splines are based on linear regression, so it is routine to extend them for smoothing non-normal observations, by borrowing the framework of generalized linear models (GLM). Let y be observed and µ the ...

347 | Varying-coefficient models
- Hastie, Tibshirani
- 1993
Citation Context ... (2004) presented algorithms for numerical optimization in cross-validation. His book (Wood, 2006a) contains a wealth of material on GAMs. See also Section 13 for the mgcv software. In Section 10 we will present Schall’s algorithm for variance estimation. It is attractive for tuning multiple penalty parameters. 7. Smooth regression coefficients. In the preceding sections P-splines were used to model expected values of observations. There is another class of models in which the goal is to model regression coefficients as a curve or surface. In this section we discuss varying-coefficient models (Hastie and Tibshirani, 1993), penalized signal regression (Marx and Eilers, 1999), and generalizations. In modern jargon these are all cases of functional data analysis (Ramsay and Silverman, 2003). Varying-coefficient models (VCM) were first introduced by Hastie and Tibshirani (1993). They allow regression coefficients to interact with another variable by varying smoothly. The simplest form is E[y(t)] = µ(t) = β(t)x(t), where y and x are observed and β is to be estimated and forced to change slowly with t. The model assumes that y is proportional to x, with a varying slope of the regression line. If we introduce a B-spline ...
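The VCM reduces to penalized regression on a modified basis: multiply each row of the B-spline basis in t by the observed x(t). Here is an illustrative sketch with simulated data; the tent basis, λ and the true β(t) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 12
t = np.linspace(0, 1, n)
x = rng.normal(1.0, 0.5, n)              # observed regressor x(t)
beta_true = 1.0 + np.sin(2 * np.pi * t)  # slowly varying coefficient
y = beta_true * x + rng.normal(0, 0.1, n)

def tent_basis(x, knots):
    # linear B-spline (tent) stand-in for a proper B-spline basis
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

B = tent_basis(t, np.linspace(0, 1, k))
U = x[:, None] * B                       # model: y ~ x(t) * B(t) a
D = np.diff(np.eye(k), 2, axis=0)
lam = 0.1
a = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ y)
beta_hat = B @ a                         # smooth estimate of beta(t)
```

The difference penalty acts on the coefficients of β(t) exactly as in ordinary P-spline smoothing; only the design matrix changes.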

306 | Maximum likelihood approaches to variance component estimation and to related problems
- Harville
- 1977
Citation Context ...if we set y*_i = ŷ_{−i}, then ŷ*_i = ŷ_{−i}. Hence we have that ŷ_{−i} − ŷ_i = h_ii(ŷ_{−i} − y_i), as Δŷ_i = h_ii Δy_i. After adding y_i − ŷ_{−i} to both sides of this equation and rearranging terms we arrive at (8). The hat matrix also gives us the effective model dimension, if we follow Ye (1998), who proposed ED = Σ_i ∂ŷ_i/∂y_i = Σ_i h_ii. (9) In fact we can compute the trace of H without actually computing its diagonal, using cyclic permutation: ED = tr(H) = tr[(BᵀWB + λDᵀD)⁻¹BᵀWB]. (10) A further simplification is possible by noting that (BᵀWB + P)⁻¹BᵀWB = (BᵀWB + P)⁻¹(BᵀWB + P − P) = I − (BᵀWB + P)⁻¹P, (11) where P = λDᵀD. Harville (1977) presented a similar result for mixed models. In the case of P-splines, the expression is very simple because there are no fixed effects. [Figure 6 (axes: log10(lambda) vs. effective dimension; curves for d = 0, 1, 2): Changes in the effective model dimension for P-spline smoothing of 10 (simulated) data points with 43 cubic B-splines, for different orders (d) of the differences in the penalty. (Twenty years of P-splines, 160)] The effective dimension is an excellent way to quantify the complexity of a P-spline model. It summarizes the com...
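A quick numerical check of (10) and (11), with a tent basis standing in for B-splines and W = I (all choices here are illustrative): the three ways of computing ED must agree, and for a very large λ the ED should drop toward the order of the penalty, d = 2, as Figure 6 suggests.

```python
import numpy as np

n, k = 80, 20
x = np.linspace(0, 1, n)

def tent_basis(x, knots):
    # linear B-spline (tent) stand-in for a proper B-spline basis
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

B = tent_basis(x, np.linspace(0, 1, k))
D = np.diff(np.eye(k), 2, axis=0)      # second-order differences, d = 2
lam = 5.0
P = lam * D.T @ D
BtB = B.T @ B                          # W = I

ed_hat = np.trace(B @ np.linalg.solve(BtB + P, B.T))   # tr(H) directly
ed_cyc = np.trace(np.linalg.solve(BtB + P, BtB))       # eq. (10)
ed_alt = k - np.trace(np.linalg.solve(BtB + P, P))     # eq. (11)

# for huge lambda, ED approaches d = 2 (the null space of D)
P_big = 1e8 * D.T @ D
ed_big = k - np.trace(np.linalg.solve(BtB + P_big, P_big))
```

Note that (10) and (11) only need k-by-k solves, never the n-by-n hat matrix itself.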

306 | Recovery of inter-block information when block sizes are unequal
- Patterson, Thompson
- 1971
Citation Context ...very short time. It certainly has helped to make penalized splines “salonfähig”: nowadays they are acceptable and even attractive to a large part of the statistical community. The estimation of the fixed and random effects is based on the maximization of the joint density of (y, u) for β and u, which results in the well-known Henderson’s equations (Henderson, 1975): [XᵀX, XᵀZ; ZᵀX, ZᵀZ + λI][β; u] = [Xᵀy; Zᵀy], (17) where λ = σ²/σ_u². The solution of these equations yields the estimates of β and u. The variance components σ² and σ_u² are, in general, estimated by REML (restricted maximum likelihood), see Patterson and Thompson (1971), and the solutions are obtained by numerical optimization. Other approaches can be used; among them it is worth mentioning the algorithm given by Schall (1991), which estimates random effects and dispersion parameters without the need to specify their distribution. The key is that each variance component is connected to an effective dimension. The sum of squares of the corresponding random coefficients is equal to their variance times their respective effective dimension. This fact can be exploited in an iterative algorithm. After each cycle of smoothing, t...
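Henderson's equations (17) can be verified in a few lines. The sketch below is not from the paper: it uses a simulated random-intercept model with a fixed λ = σ²/σ_u², and checks the solution of (17) against the equivalent penalized least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, g = 120, 6
group = rng.integers(0, g, n)
x = rng.normal(size=n)
u_true = rng.normal(0, 1.0, g)                    # sigma_u = 1
y = 1.5 + 0.8 * x + u_true[group] + rng.normal(0, 0.5, n)  # sigma = 0.5

X = np.column_stack([np.ones(n), x])              # fixed-effects design
Z = np.eye(g)[group]                              # random-effects design
lam = 0.25                                        # sigma^2 / sigma_u^2

# Henderson's equations, eq. (17)
C = np.block([[X.T @ X,        X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.eye(g)]])
r = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(C, r)
beta, u = sol[:2], sol[2:]

# same solution from min ||y - X beta - Z u||^2 + lam ||u||^2,
# written as an augmented least-squares problem
A = np.vstack([np.hstack([X, Z]),
               np.hstack([np.zeros((g, 2)), np.sqrt(lam) * np.eye(g)])])
sol2, *_ = np.linalg.lstsq(A, np.concatenate([y, np.zeros(g)]), rcond=None)
```

Equation (17) is exactly the normal-equations form of the penalized least-squares problem, which is why the two solutions coincide.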

294 | Approximate Bayesian inference for latent Gaussian models
- Martino
- 2007
Citation Context ...alk of the same order as that of the differences. It is also possible to start from a mixed model representation. Crainiceanu et al. (2007) did this in one dimension, using truncated power functions. They avoid tensor products of TPFs and switch to radial basis functions for spatial smoothing. These authors also allowed for varying (although isotropic) smoothness and for heteroscedasticity. Jullion and Lambert (2007) proposed a Bayesian model for adaptive smoothing. As an alternative to MCMC, integrated nested Laplace approximation (INLA) is powerful and fast, and it is gaining in popularity (Rue et al., 2009). INLA avoids stochastic simulation for precision parameters and uses numerical integration instead. Basically INLA uses a parameter for each observation, so a (B-spline) regression basis has to be implemented in an indirect way, as a matrix of constraints (Fraaije et al., 2015). INLA is an attractive choice for anisotropic smoothing. By working with a sum of precision matrices it can handle the equivalent of a mixed model with overlapping penalties (Rodriguez-Alvarez et al., 2015). 12. Varia. In this section we discuss some subjects that do not find a natural home in one of the preceding sections...

234 | An Introduction to Generalized Linear Models
- Dobson
- 2002
Citation Context ...ed on linear regression, so it is routine to extend them for smoothing non-normal observations, by borrowing the framework of generalized linear models (GLM). Let y be observed and µ the vector of expected values. Then the linear predictor η = g(µ) = Ba is modelled with B-splines, and a suitable distribution is chosen for y, given µ. The penalty is subtracted from the log-likelihood: ℓ* = ℓ − λaᵀDᵀDa/2. The penalized likelihood equations result in Bᵀ(y − µ) = λDᵀDa. This is a small change from the standard GLM, in which the right-hand side is zero (Dobson and Barnett, 2008). The equations are non-linear, but penalized maximum likelihood leads to the iterative solution of a_{t+1} = (BᵀW_tB + λD_dᵀD_d)⁻¹BᵀW_tz_t, with z_t = η_t + W_t⁻¹(y − µ_t), (12) where t denotes the iterate, W_t and η_t denote approximate solutions, and z_t is the so-called working variable. The weights in the diagonal matrix W depend on the link function and the chosen distribution. For example, the Poisson distribution, with η = log(µ), has w_ii = µ_i. A powerful application of generalized linear smoothing with P-splines is density estimation (Eilers and Marx, 1996). A histogram with...
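The iteration (12) is a penalized version of IRLS. Below is an illustrative sketch for Poisson counts with the log link (so w_ii = µ_i), using an invented tent basis and an arbitrary λ = 1; at convergence the penalized score equation Bᵀ(y − µ) = λDᵀDa holds.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 150, 12
x = np.linspace(0, 1, n)
mu_true = np.exp(1 + np.sin(2 * np.pi * x))
y = rng.poisson(mu_true).astype(float)

def tent_basis(x, knots):
    # linear B-spline (tent) stand-in for a proper B-spline basis
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

B = tent_basis(x, np.linspace(0, 1, k))
D = np.diff(np.eye(k), 2, axis=0)
P = 1.0 * D.T @ D                          # lambda = 1, chosen arbitrarily

a = np.linalg.solve(B.T @ B + P, B.T @ np.log(y + 1))   # crude start
for _ in range(50):
    eta = B @ a
    mu = np.exp(eta)                       # log link: w_ii = mu_i
    z = eta + (y - mu) / mu                # working variable, eq. (12)
    BtW = B.T * mu                         # B^T W_t (W_t is diagonal)
    a_new = np.linalg.solve(BtW @ B + P, BtW @ z)
    if np.max(np.abs(a_new - a)) < 1e-10:  # converged
        a = a_new
        break
    a = a_new
mu_hat = np.exp(B @ a)
```

Setting λ = 0 recovers ordinary IRLS for a GLM; the only change is the penalty matrix added to BᵀW_tB.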

209 | Smoothing Spline ANOVA Models
- Gu
- 2002
Citation Context ...thms and software that are available for mixed models. Especially in complex models with multiple smooth components, this approach can be more attractive than optimizing cross-validation or AIC. Yet, which approach (based on prediction error or maximum likelihood) is optimal for the selection of the smoothing parameter? Several papers on this subject have appeared over the years, but no unified opinion has been reached: Kauermann (2005a) showed that the ML estimate has a tendency to under-smooth and that prediction error methods perform better than maximum-likelihood-based approaches; Gu (2002) also found that ML delivers rougher estimates than GCV, while Ruppert et al. (2003) found, through simulation studies, that REML will produce smoother fits than GCV (a similar conclusion was found in Kohn et al., 1991). Also, Wood (2011) concluded that REML/ML estimation is preferable to GCV for semiparametric GLMs due to its better resistance to over-fitting, less variability in the estimated smoothing parameters, and a reduced tendency to multiple minima. So it is clear that there is no unique answer to this question, since different scenarios will yield different conclusions. Mor...

185 | A statistical perspective on ill-posed inverse problems - O’Sullivan - 1986

145 | Best linear unbiased estimation and prediction under a selection model
- Henderson
- 1975
Citation Context ...o answer; researchers have different (and strong) opinions about the mixed model approach (even the authors of this paper do not always agree on this matter), but the truth is that it has become a revolution that has yielded incredible advances in a very short time. It certainly has helped to make penalized splines “salonfähig”: nowadays they are acceptable and even attractive to a large part of the statistical community. The estimation of the fixed and random effects is based on the maximization of the joint density of (y, u) for β and u, which results in the well-known Henderson’s equations (Henderson, 1975): [XᵀX, XᵀZ; ZᵀX, ZᵀZ + λI][β; u] = [Xᵀy; Zᵀy], (17) where λ = σ²/σ_u². The solution of these equations yields the estimates of β and u. The variance components σ² and σ_u² are, in general, estimated by REML (restricted maximum likelihood), see Patterson and Thompson (1971), and the solutions are obtained by numerical optimization. Other approaches can be used; among them it is worth mentioning the algorithm given by Schall (1991), which estimates random effects and dispersion parameters without the need to specify their distribution. The key is that each variance component is connected to an effec...

119 | Bayesian P-splines
- Lang, Brezger
- 2004
Citation Context ...bases, and modified the penalty so that optimization could be carried out in standard statistical software. Wood and Scheipl (2013) proposed an intermediate low-rank smoother. Recently, Schall’s algorithm has been extended (Rodriguez-Alvarez et al., 2015) to the case of multidimensional smoothing. This work also shows the fundamental role of the effective dimensions of the components of the model. 11. Bayesian P-splines. It is a small step to go from a latent distribution in a mixed model to a prior in a Bayesian interpretation. Bayesian P-splines were proposed by Lang and Brezger (2004), and they were made accessible by appropriate software (Brezger et al., 2005). Their approach is based on Markov chain Monte Carlo (MCMC) simulation. As for the mixed model, the penalty leads to a singular distribution. This is solved by simulation using a random walk of the same order as that of the differences. It is also possible to start from a mixed model representation. Crainiceanu et al. (2007) did this in one dimension, using truncated power functions. They avoid tensor products of TPFs and switch to radial basis functions for spatial smoothing. These authors also allowed for varying ...

112 | Parameter estimation for differential equations: A generalized smoothing approach (with discussion)
- Ramsay, Hooker, et al.
- 2007
Citation Context ...ctually the penalties are the core and many variations have been developed, while the B-spline basis did not change. We expect to see exciting developments in the near future. For a start, we sketch some aspects that we hope will get much attention. We wrote that the penalty forms the skeleton and that the B-splines put flesh on the bones. That means that new ideas for penalties have to be developed. One promising avenue is the application to differential equations. One can write the solution as a sum of B-splines (the collocation method) and use the differential equation (DE) as the penalty (Ramsay et al., 2007). In this light the usual penalty for smoothing splines is equivalent to a differential equation that says that the second derivative of the solution is zero everywhere. O’Sullivan (1986) took the step from a continuous penalty to a discrete one. This can also be done with a DE-based penalty. However, if the coefficients of the DE are not fixed (e.g. estimated from the data), then this generates a significant computational load. It will be useful to study (almost) equivalent discrete penalties, based on difference equations. It is remarkable that in one-dimensional ...

82 | Generalized additive models for location, scale and shape
- Rigby, Stasinopoulos
- 2005
Citation Context ... counts by a sum of B-splines, but rather the density itself, with constraints on the coefficients (Schellhase and Kauermann, 2012). Mortality or morbidity smoothing is equivalent to discrete density estimation with an offset for exposures. P-splines have found their way into this area, for both one- and two-dimensional tables (Currie et al., 2004; Camarda, 2012); both papers illustrate automatic extrapolation. The palette of distributions that generalized linear smoothing can use is limited. A very general approach is offered by GAMLSS: generalized additive models for location, scale and shape (Rigby and Stasinopoulos, 2005). An example is the normal distribution with smoothly varying mean and variance, combined with a (varying) Box-Cox transform of the response variable. Many continuous and discrete distributions can be fitted by the GAMLSS algorithm, also in combination with mixtures, censoring and random components. Instead of using a parametric distribution, one can estimate smooth conditional quantiles, minimizing an asymmetrically weighted sum of absolute values of the residuals. Bollaerts et al. (2006) combined this with shape constraints to force monotonicity. To avoid crossing...

79 | Geoadditive models
- Kammann, Wand
- 2003
Citation Context ...oduct B-spline basis. A natural application of multidimensional P-splines is the smoothing of data on a grid. For larger grids the demands on memory and computation time can become too large and special algorithms are needed. See Section 13 for details. Multidimensional P-splines are numerically well-behaved, in contrast to truncated power functions. The poor numerical condition of the latter becomes almost insurmountable in higher dimensions. Proponents of TPFs have avoided this issue by using radial basis functions (Ruppert et al., 2003; Kammann and Wand, 2003). This is, however, not an attractive scheme: a complicated algorithm is needed for placing the centres of the basis functions. We emphasize that the set of tensor products does not have to be rectangular, although Figure 8 might give that impression. When dealing with, say, a ring-shaped data domain, we can remove all tensor products that do not overlap with the ring, and reduce the penalty matrix accordingly. This can save much computation, and in the case of a ring it also is more realistic, because it prevents the penalty from working across the inner region. As we showed for one-dimensional ...
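For data on a grid, the tensor-product basis has Kronecker structure, which a short sketch can show. Everything here (tent bases, λ values, the simulated surface) is invented for illustration; the row ordering follows column-major vectorization, with the first index running fastest.

```python
import numpy as np

def tent_basis(x, knots):
    # linear B-spline (tent) stand-in for a proper B-spline basis
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

rng = np.random.default_rng(4)
n1, n2, k1, k2 = 30, 25, 11, 9
x1 = np.linspace(0, 1, n1)
x2 = np.linspace(0, 1, n2)
F = np.sin(2 * np.pi * x1)[:, None] * x2[None, :]   # true surface
Y = F + rng.normal(0, 0.1, (n1, n2))

B1 = tent_basis(x1, np.linspace(0, 1, k1))
B2 = tent_basis(x2, np.linspace(0, 1, k2))
B = np.kron(B2, B1)              # rows match Y.ravel(order='F'), x1 fastest

# anisotropic penalty: one lambda per direction
D1 = np.diff(np.eye(k1), 2, axis=0)
D2 = np.diff(np.eye(k2), 2, axis=0)
lam1 = lam2 = 0.05
P = lam1 * np.kron(np.eye(k2), D1.T @ D1) + \
    lam2 * np.kron(D2.T @ D2, np.eye(k1))

a = np.linalg.solve(B.T @ B + P, B.T @ Y.ravel(order='F'))
Fhat = (B @ a).reshape(n1, n2, order='F')
```

Restricting the domain (the ring example in the text) would amount to dropping columns of B whose tensor-product support misses the data and trimming P to the remaining coefficients.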

61 | Direct Generalized Additive Modeling with Penalized Likelihood
- Marx, Eilers
- 1998
Citation Context ...and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models. The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = Σ_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illustrate the main idea in two dimensions. Let η = f_1(x_1) + f_2(x_2) = [B_1 | B_2][a_1; a_2] = Ba. (13) By combining the two bases into one matrix and chaining the coefficients into one vector, we are back in a standard regression setting. The roughness penalties are λ_1‖D_1a_1‖² and λ_2‖D_2a_2‖² (where the indices here refer to the variables, not to the order of the differences), leading to two penalty matrices P_1 = λ_1D_1ᵀD_1 and P_2 = λ_2D_2ᵀD_2, which can be combined in the block-diagonal matrix P. The resulting penalized likelihood equations are...

52 | Smoothing and forecasting mortality rates - Currie, Durban, et al. - 2004

47 | Quantile smoothing of array CGH data - Eilers, Menezes - 2005

46 | Penalized structured additive regression of space-time data: a Bayesian perspective
- Fahrmeir, Kneib, et al.
- 2004
Citation Context ...ive smooth structures. As we have seen for the generalized additive and varying-coefficient model, the use of P-splines leads to a set of (modified) B-spline basis matrices which can be combined side-by-side into one large matrix. The penalties lead to a block-diagonal matrix. This idea extends to other model components like signal regression and tensor products. Standard linear regression and factor terms can be added too. This leads to additive smooth structures. Eilers and Marx (2002) proposed GLASS (generalized linear additive smooth structures), while Brezger and Lang (2006), referring to Fahrmeir et al. (2004), proposed STAR (structured additive regression). Belitz and Lang (2008) introduced simultaneous selection of variables and smoothing parameters in structured additive models. The geoadditive model has received much attention; it is formed by the addition of one-dimensional smooth components and a two-dimensional spatial trend. Often the spatial component is modelled as a conditional autoregressive model. Brezger and Lang (2006) presented a Bayesian version of GLASS/STAR, also using 2D P-splines for modelling spatial effects in a multinomial logit model for forest health. Fahrmeir and Kneib (2...

44 | Generalized structured additive regression based on Bayesian P-splines
- Lang, S.
- 2006
Citation Context ...ems, thanks to the penalties. 9. Additive smooth structures. As we have seen for the generalized additive and varying-coefficient model, the use of P-splines leads to a set of (modified) B-spline basis matrices which can be combined side-by-side into one large matrix. The penalties lead to a block-diagonal matrix. This idea extends to other model components like signal regression and tensor products. Standard linear regression and factor terms can be added too. This leads to additive smooth structures. Eilers and Marx (2002) proposed GLASS (generalized linear additive smooth structures), while Brezger and Lang (2006), referring to Fahrmeir et al. (2004), proposed STAR (structured additive regression). Belitz and Lang (2008) introduced simultaneous selection of variables and smoothing parameters in structured additive models. The geoadditive model has received much attention; it is formed by the addition of one-dimensional smooth components and a two-dimensional spatial trend. Often the spatial component is modelled as a conditional autoregressive model. Brezger and Lang (2006) presented a Bayesian version of GLASS/STAR, also using 2D P-splines for modelling spatial effects in a multinomial logit model for...

43 | Penalized likelihood for general semi-parametric regression models
- Green
- 1987
Citation Context ...rchies of curves form a special type of additive smooth structure. For example, in growth data for children we can introduce an overall mean curve and two additional curves that show the difference between boys and girls. Moreover, we can have a smooth curve per individual child. Durban et al. (2005) gave an example (using truncated power functions), while Bugli and Lambert (2006) used proper P-splines in a Bayesian context. 10. P-splines as a mixed model. The connection between nonparametric regression and mixed models was first established over 25 years ago by Green (1987) and Speed (1991), but it was not until the late 1990s that it became a “hot” research topic (Wang, 1998; Zhang et al., 1998; Verbyla et al., 1999), partly due to the developments in mixed-model software. These initial references were based on the use of smoothing splines. In the penalized spline context, several authors quickly extended the model formulation into a mixed model (Brumback et al., 1999; Coull et al., 2001; Wand, 2003). They used truncated power functions as the regression basis, since these have an intuitive connection with a mixed model. However, as previously mentioned, the ...

42 | A perfect smoother
- Eilers
- 2003
Citation Context ...ng parameters, which we presented in Section 10. On the other hand, boosting allows selecting relevant variables in a model and the use of non-standard objective functions. 13. Computation and software. For many applications standard P-splines do not pose computational challenges. The size of the B-spline basis will be moderate and many thousands of observations can be handled with ease. If the data are observed on an equidistant grid and only smoothed values on that grid are wanted, one can just as well use the identity matrix as a basis. This leads to the Whittaker smoother (Whittaker, 1923; Eilers, 2003). The number of coefficients will be equal to the number of observations, but in combination with sparse matrix algorithms a very fast smoother is obtained. Sparse matrices are also attractive when long data series have to be smoothed with a large B-spline basis (de Rooi et al., 2014). Even though the basis matrix is sparse, one has to take care to avoid computing dense matrices along the way, as is the case when using truncated power functions. The key is to recognize that B_{j+k}(x) = B_j(x − sk), where s is the distance between the knots. Each x_i is shifted by −sk_i to a chosen subdomain, and a...
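A minimal Whittaker smoother can be written in a few lines, assuming SciPy's sparse module is available (the data and λ are invented for illustration): the basis is the identity, so the system (I + λDᵀD)z = y is solved directly with sparse matrices.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(5)
n = 1000
x = np.linspace(0, 10, n)
y = np.sin(x) + rng.normal(0, 0.2, n)

lam = 100.0
# sparse second-order difference matrix: rows are (1, -2, 1)
D = sparse.diags([1.0, -2.0, 1.0], offsets=[0, 1, 2],
                 shape=(n - 2, n), format='csc')
A = (sparse.eye(n, format='csc') + lam * D.T @ D).tocsc()
z = spsolve(A, y)          # z = (I + lam * D'D)^{-1} y
```

Because A is banded, the sparse solve costs O(n), which is what makes the identity-basis smoother fast even for very long series.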

35 | Comment on: variable selection and function estimation in additive nonparametric regression using a data-based prior
- Brumback, Ruppert, et al.
- 1999
Citation Context ... and Lambert (2006) used proper P-splines in a Bayesian context. 10. P-splines as a mixed model. The connection between nonparametric regression and mixed models was first established over 25 years ago by Green (1987) and Speed (1991), but it was not until the late 1990s that it became a “hot” research topic (Wang, 1998; Zhang et al., 1998; Verbyla et al., 1999), partly due to the developments in mixed-model software. These initial references were based on the use of smoothing splines. In the penalized spline context, several authors quickly extended the model formulation into a mixed model (Brumback et al., 1999; Coull et al., 2001; Wand, 2003). They used truncated power functions as the regression basis, since these have an intuitive connection with a mixed model. However, as previously mentioned, the numerical properties of TPFs are poor compared to P-splines. In a short comment that largely went unnoticed, Eilers (1999) showed how to interpret P-splines as a mixed model. Currie and Durban (2002) used this approach and extended it to handle heteroscedastic or autocorrelated noise. Work on the general approach for a mixed model representation of smoothers with quadratic penalty was also presented...

31 | On the asymptotics of penalized splines
- Li, Ruppert
- 2008
Citation Context ...rvation so a (B-spline) regression basis has to be implemented in an indirect way, as a matrix of constraints (Fraaije et al., 2015). INLA is an attractive choice for anisotropic smoothing. By working with a sum of precision matrices it can handle the equivalent of a mixed model with overlapping penalties (Rodriguez-Alvarez et al., 2015). 12. Varia. In this section we discuss some subjects that do not find a natural home in one of the preceding sections. We take a look at asymptotic properties of P-splines and at boosting. Several authors have studied the asymptotic behaviour of P-splines; see Li and Ruppert (2008), Claeskens et al. (2009), Kauermann et al. (2009) and Wang et al. (2011). Although we admire the technical level of these contributions, we do not fully see their practical relevance. The problem is their very limited interpretation of increasing the number of observations: it is all about more observations on the same domain. In that case it is found that the number of knots should grow as a small power of the number of observations. Yet, the whole idea of P-splines is to use far too many knots and let the penalty do the work. Trying to optimize the number of knots, as Ruppert (2002) did, is no...

30 | Splines, knots and penalties
- Eilers, Marx
- 2010
Citation Context ...b of Science, anyone can follow the trail through history in detail. We have done our best, in good faith, to give an overview of the field, but we do not claim that our choice of papers is free from subjectivity. The advent of P-splines has led to formidable developments in smoothing, and we have been actively shaping many of them. We hope that we will not offend any reader by serious omissions. 2. P-spline basics. The two components of P-splines are B-splines and discrete penalties. In this section we briefly review them, starting with the former. We do not go much into technical detail; see Eilers and Marx (2010) for that. 2.1. B-splines. Figure 1 shows four triangles of the same height and width, the middle ones overlapping with their two neighbours. These are linear B-splines, the non-zero parts consisting of two linear segments. Imagine that we scale the triangles by different amounts and add them all up. That would give us a piecewise-linear curve. We can generate many shapes by changing the coefficients, and we can get more or less detail by using more or fewer B-splines. If we indicate the triangles by B_j(x) and if a_1 to a_n are the scaling coefficients, we have Σ_{j=1}^n a_j B_j(x) as the formula for ...
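The scaled-triangle picture can be reproduced directly; this is an illustrative sketch with invented knot positions and coefficients. Linear B-splines are tents, they sum to one, and at each knot the piecewise-linear curve equals that knot's coefficient.

```python
import numpy as np

def tent_basis(x, knots):
    """Linear B-splines on equally spaced knots: triangles of width 2h."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

x = np.linspace(0, 3, 301)
knots = np.linspace(0, 3, 4)       # four triangles, as in Figure 1
B = tent_basis(x, knots)
a = np.array([1.0, 3.0, 2.0, 0.5]) # scaling coefficients a_1..a_4
curve = B @ a                      # sum_j a_j B_j(x): piecewise linear
```

Scaling the coefficients reshapes the curve; adding more knots (more, narrower triangles) adds more local detail, which is exactly the dial that the penalty later controls.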

26 | Generalized linear array models with applications to multidimensional smoothing - Currie, Durban, et al. - 2006

26 | Enhancing scatterplots with smoothed densities
- Eilers, Goeman
- 2004
Citation Context ...B-spline basis, this is the right tool for fitting periodic data or circular observations, like directions. • With second-order differences, a_j − 2a_{j−1} + a_{j−2}, in the penalty, the fit approaches a straight line when λ is increased. If we change the expression to a_j − 2φa_{j−1} + a_{j−2}, the limit is a (co)sine with period p such that φ = cos(2π/p). The phase of the (co)sine is adjusted automatically to minimize the sum of squares of the residuals. For smoothing (and interpolation) of seasonal data (with known period) this harmonic penalty usually is more attractive than the standard one. • Eilers and Goeman (2004) combined penalties of first and second order to eliminate negative side lobes of the impulse response (as would be the case with only a second-order penalty). This guarantees that smoothing of positive data can never lead to negative fitted values. • As described, the P-spline penalty is quadratic: it uses a sum-of-squares norm. This leads to a smooth result. Other norms have been used. The sum of absolute values (the L1 norm) of first-order differences allows jumps between neighbouring coefficients (Eilers and de Menezes, 2005), making it suitable for piecewise-constant smoothing. This norm ...
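The harmonic penalty is easy to construct and test; all settings below are invented for illustration. Rows a_j − 2φa_{j−1} + a_{j−2} with φ = cos(2π/p) annihilate any sinusoid of period p, so for a large λ a Whittaker-type smoother with this penalty returns (nearly) the best-fitting sinusoid, with its phase adjusted automatically.

```python
import numpy as np

p = 12                          # period in sampling steps (assumed known)
phi = np.cos(2 * np.pi / p)
k = 50
# harmonic difference operator: rows give a_j - 2*phi*a_{j-1} + a_{j-2}
D = np.zeros((k - 2, k))
for i in range(k - 2):
    D[i, i:i + 3] = [1.0, -2.0 * phi, 1.0]

# any sinusoid of period p lies in the null space of D, whatever its phase
j = np.arange(k)
wave = np.cos(2 * np.pi * j / p + 0.7)

# Whittaker-type smoothing (identity basis) with the harmonic penalty
rng = np.random.default_rng(6)
y = wave + rng.normal(0, 0.3, k)
lam = 1e6
z = np.linalg.solve(np.eye(k) + lam * D.T @ D, y)
```

With λ this large, z is pushed into the two-dimensional null space of D, i.e. toward the least-squares sinusoid of period p, mirroring how the ordinary second-order penalty pushes toward a straight line.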

26 |
The performance of cross-validation and maximum likelihood estimators of spline smoothing parameters.
- Kohn, Ansley, et al.
- 1991
Citation Context ...h (based on prediction error or maximum likelihood) is optimal for the selection of the smoothing parameter? Several papers on this subject have appeared over the years, but no unified opinion has been reached: Kauermann (2005a) showed that the ML estimate tends to under-smooth and that prediction error methods give better performance than maximum likelihood based approaches; Gu (2002) also found that ML delivers rougher estimates than GCV; Ruppert et al. (2003), on the other hand, found through simulation studies that REML will produce smoother fits than GCV (a similar conclusion was reached by Kohn et al., 1991). Also, Wood (2011) concluded that REML/ML estimation is preferable to GCV for semiparametric GLMs due to its better resistance to over-fitting, less variability in the estimated smoothing parameters, and reduced tendency to having multiple minima. So it is clear that there is no unique answer to this question, since different scenarios will yield different conclusions. Moreover, behind the criteria used to select the smoothing parameter there is, in our opinion, a deeper question: is it fair to use mixed model methodology for estimation and inference when the mixed model representation o... |

25 | A note on penalized spline smoothing with correlated errors.
- Krivobokova, Kauermann
- 2007
Citation Context ...ion parameters. It is well known that the standard methods for smoothing parameter selection (based on minimization of the mean squared prediction error) generally under-smooth the data in the presence of positive correlation, since a smooth trend plus correlated noise can be seen as a less smooth trend plus white noise. The solution is to take into account the correlation structure explicitly, i.e. Var(ε) = σ²V, where V can depend on one or more correlation parameters. Durban and Currie (2003) presented a strategy to select the smoothing parameter and estimate the correlation based on REML. Krivobokova and Kauermann (2007) showed that maximum likelihood estimation of the smoothing parameter is robust, even under moderately misspecified correlation. This method has allowed the inclusion of temporal non-linear trends and filtering of time series (Kauermann et al., 2011). Recently, and motivated by the need to improve the speed and stability of forecasting models, Wood et al. (2015) have developed efficient methods for fitting additive models to large data sets with correlated errors. Paul H.C. Eilers, Brian D. Marx and Maria Durban 171 Correlation also appears in more complex situations, for example in the case ... |

20 | Simple fitting of subject-specific curves for longitudinal data. - Durban, Harezlak, et al. - 2005 |

20 |
Structured additive regression for categorical space-time data: A mixed model approach.
- Kneib, Fahrmeir
- 2006
Citation Context ...ethods for fitting additive models to large data sets with correlated errors. Correlation also appears in more complex situations, for example in the case of spatial data. Lee and Durban (2009) combined two-dimensional P-splines and random effects with a CAR (conditional auto-regressive) structure to estimate spatial trends when data are geographically distributed over locations on a map. Other authors have taken different approaches; they combined additive mixed models with spatial effects represented by Markov or Gaussian random fields (Kneib and Fahrmeir, 2006; Fahrmeir et al., 2010). 10.2. Multidimensional P-splines as mixed models Multidimensional P-splines can be handled as a mixed model too. A first attempt was made by Ruppert et al. (2003) using radial basis functions. Currie et al. (2006) analysed tensor product P-splines as mixed models. Here, the singular value decomposition of the penalty matrix (as in the 1D case) is used to construct the mixed model matrices. This approach works for any sum of quadratic penalties (Wood, 2006a). However, when the penalty is expressed as the sum of Kronecker product of marginal bases (the Kronecker sum of ... |

18 |
Multidimensional penalized signal regression.
- Marx, Eilers
- 2005
Citation Context ... coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) used boosting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004)... |

17 | Fast and compact smoothing on large multidimensional grids. - Eilers, Currie, et al. - 2006 |

17 | The historical functional linear model.
- Malfait, Ramsay
- 2003
Citation Context ...osting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(y_i) = µ_i = ∫ F(x_i(t), t) dt. This can be interpreted as first transforming x (with a different function for e... |

16 |
Generalized linear additive smooth structures.
- Eilers, Marx
- 2002
Citation Context ...line. If we introduce a B-spline basis B and write β = Ba, we get µ = XBa, where X = diag(x). With a difference penalty on a we have the familiar P-spline structure, with only a modified basis XB. A varying offset can be added: E[y(t)] = µ(t) = β(t)x(t) + β_0(t). This has the form of an additive model. Building β_0 with P-splines we effectively get a P-GAM. 164 Twenty years of P-splines This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCMs with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Ma... |
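A toy sketch of the varying-coefficient fit with the modified basis XB described in this excerpt (our own construction with a hand-rolled linear B-spline basis; the data, basis size and λ are arbitrary choices, not from the cited papers):

```python
import numpy as np

def tri_basis(t, n_seg):
    """Linear B-spline (triangle) basis on [0, 1] with n_seg segments."""
    knots = np.linspace(0.0, 1.0, n_seg + 1)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(t[:, None] - knots[None, :]) / h)

rng = np.random.default_rng(1)
m = 200
t = np.sort(rng.uniform(0.0, 1.0, m))
x = rng.normal(size=m)                        # covariate
beta_true = np.sin(2 * np.pi * t)             # slowly varying coefficient
y = beta_true * x + 0.1 * rng.normal(size=m)

B = tri_basis(t, 20)                          # basis for beta(t)
U = x[:, None] * B                            # modified basis XB, X = diag(x)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)  # difference penalty on a
lam = 0.1                                     # arbitrary smoothing parameter
a = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ y)
beta_hat = B @ a                              # estimated beta(t) tracks beta_true
```

Apart from the modified basis, this is exactly the familiar penalized least squares system, which is the point of the excerpt.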

15 |
Multivariate calibration with temperature interaction using two-dimensional penalized signal regression.
- Eilers, Marx
- 2003
Citation Context ...del identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) used boosting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest term structure, estimating the expected future course of interest rate... |

14 |
Simultaneous selection of variables and smoothing parameters in structured additive regression models.
- Belitz, Lang
- 2008
Citation Context ...arying-coefficient model, the use of P-splines leads to a set of (modified) B-spline basis matrices which can be combined side-by-side into one large matrix. The penalties lead to a block-diagonal matrix. This idea extends to other model components like signal regression and tensor products. Standard linear regression and factor terms can be added too. This leads to additive smooth structures. Eilers and Marx (2002) proposed GLASS (generalized linear additive smooth structures), while Brezger and Lang (2006), referring to Fahrmeir et al. (2004), proposed STAR (structured additive regression). Belitz and Lang (2008) introduced simultaneous selection of variables and smoothing parameters in structured additive models. The geoadditive model has received much attention; it is formed by the addition of one-dimensional smooth components and a two-dimensional spatial trend. Often the spatial component is modelled as a conditional autoregressive model. Brezger and Lang (2006) presented a Bayesian version of GLASS/STAR, also using 2D P-splines for modelling spatial effects in a multinomial logit model for forest health. Fahrmeir and Kneib (2009) further built on Bayesian STAR models by incorporating geoadditive ... |

14 |
Respiratory health and air pollution: additive mixed model analyses.
- Coull, Schwartz, et al.
- 2001
Citation Context ...d proper P-splines in a Bayesian context. 10. P-splines as a mixed model The connection between nonparametric regression and mixed models was first established over 25 years ago by Green (1987) and Speed (1991), but it was not until the late 1990s that it became a “hot” research topic (Wang, 1998; Zhang et al., 1998; Verbyla et al., 1999), partly due to the developments in mixed model software. These initial references were based on the use of smoothing splines. In the penalized spline context, several authors quickly extended the model formulation into a mixed model (Brumback et al., 1999; Coull et al., 2001; Wand, 2003). They used truncated power functions as the regression basis, since these have an intuitive connection with a mixed model. However, as previously mentioned, the numerical properties of TPFs are poor, compared to P-splines. In a short comment that largely went unnoticed, Eilers (1999) showed how to interpret P-splines as a mixed model. Currie and Durban (2002) used this approach and extended it to handle heteroscedastic or autocorrelated noise. Work on the general approach for a mixed model representation of smoothers with quadratic penalty was also presented in Fahrmeir et al. ... |

14 | Flexible smoothing with P-splines: a unified approach. - Currie, Durban - 2002

13 | Functional Generalized Additive Models.
- McLean, Hooker, et al.
- 2014
Citation Context ...as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(y_i) = µ_i = ∫ F(x_i(t), t) dt. This can be interpreted as first transforming x (with a different function for each t) and then adding the results, or “transform and add” in contrast to “add and transform”. 8. Multi-dimensional smoothing A natural extension of P-splines to higher dimensions is to form tensor products of one-dimensional B-spline bases and to add difference penalties along each dimension (Eilers and Marx, 2003). Figure 7 illustrates this idea, showing one tensor product T_jk(x,y) = B_j(x)B_k(y). Figure 8 illustrates a “thinned” section of a tensor product bas... |
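For data on a grid, the tensor product surface with coefficient matrix A can be evaluated through the marginal bases alone, as Bx A Byᵀ, without ever storing the full tensor product basis. A small sketch of the idea in this excerpt (our own construction, with linear B-splines standing in for the cubic ones in the paper's figures):

```python
import numpy as np

def tri_basis(u, knots):
    """Linear B-spline (triangle) basis with equally spaced knots."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(u[:, None] - knots[None, :]) / h)

x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 40)
Bx = tri_basis(x, np.linspace(0.0, 1.0, 11))   # 50 x 11 marginal basis
By = tri_basis(y, np.linspace(0.0, 1.0, 9))    # 40 x  9 marginal basis

# T_jk(x, y) = B_j(x) * B_k(y); on a grid the surface for a coefficient
# matrix A is Bx @ A @ By.T, so the tensor products are never formed.
A = np.ones((11, 9))
S = Bx @ A @ By.T                              # 50 x 40 surface

# B-splines form a partition of unity, so all-ones coefficients give a
# flat surface of height one:
print(np.allclose(S, 1.0))                     # → True
```

Difference penalties are then added along the rows and columns of A, one λ per dimension.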

12 |
Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models.
- Jullion, Lambert
- 2007
Citation Context ...zger et al., 2005). Their approach is based on Markov chain Monte Carlo (MCMC) simulation. As for the mixed model, the penalty leads to a singular distribution. This is solved by simulation using a random walk of the same order as that of the differences. It is also possible to start from a mixed model representation. Crainiceanu et al. (2007) did this in one dimension, using truncated power functions. They avoid tensor products of TPFs and switch to radial basis functions for spatial smoothing. These authors also allowed for varying (although isotropic) smoothness and for heteroscedasticity. Jullion and Lambert (2007) proposed a Bayesian model for adaptive smoothing. As an alternative to MCMC, integrated nested Laplace approximation (INLA) is powerful and fast, and it is gaining in popularity (Rue et al., 2009). INLA avoids stochastic simulation for precision parameters and uses numerical integration instead. Basically INLA uses a parameter for each observation so a (B-spline) regression basis has to be implemented in an indirect way, as a matrix of constraints (Fraaije et al., 2015). INLA is an attractive choice for anisotropic smoothing. By working with a sum of precision matrices it can handle the equiv... |

10 |
Model-based boosting in R: a hands-on tutorial using the R package mboost.
- Hofner, Mayr, et al.
- 2014
Citation Context ...s and Linux. It covers all the models that fit in the generalized linear additive smooth structure (or structured additive regression) framework. The Bayesian algorithms are based on Markov chain Monte Carlo. It also offers mixed model based algorithms. There are R packages to install BayesX and to communicate with it. It is also possible to use the R-INLA (Rue et al., 2009) package for fitting additive models with P-splines. See Fraaije et al. (2015) and the accompanying software. The package mboost offers boosting for a variety of models, including P-splines and generalized additive models (Hofner et al., 2014). With the extension gamboostLSS, one can apply boosting to models for location, scale and shape, similar to GAMLSS. To estimate smooth expectile curves or surfaces, the package expectreg is available. 14. Discussion The paper by Eilers and Marx (1996) that started it all contained a “consumer score card”, comparing various smoothing algorithms. P-splines received the best marks and their inventors concluded that they should be the smoother of choice. Two decades later, it is gratifying to see that this opinion is being shared by many statisticians and other scientists. Once prominent tools li... |

9 | Bayesian estimation in single-index models.
- Antoniadis, Gregoire, et al.
- 2004
Citation Context ... , a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero and variance 1) to make the model identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) used boosting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak... |

9 |
Spatially adaptive Bayesian penalized splines with heteroscedastic errors.
- Crainiceanu, Ruppert, et al.
- 2007
Citation Context ...11. Bayesian P-splines It is a small step to go from a latent distribution in a mixed model to a prior in a Bayesian interpretation. Bayesian P-splines were proposed by Lang and Brezger (2004), and they were made accessible by appropriate software (Brezger et al., 2005). Their approach is based on Markov chain Monte Carlo (MCMC) simulation. As for the mixed model, the penalty leads to a singular distribution. This is solved by simulation using a random walk of the same order as that of the differences. It is also possible to start from a mixed model representation. Crainiceanu et al. (2007) did this in one dimension, using truncated power functions. They avoid tensor products of TPFs and switch to radial basis functions for spatial smoothing. These authors also allowed for varying (although isotropic) smoothness and for heteroscedasticity. Jullion and Lambert (2007) proposed a Bayesian model for adaptive smoothing. As an alternative to MCMC, integrated nested Laplace approximation (INLA) is powerful and fast, and it is gaining in popularity (Rue et al., 2009). INLA avoids stochastic simulation for precision parameters and uses numerical integration instead. Basically INLA uses a... |

9 | Smooth-CAR mixed models for spatial count data. - Lee, Durban - 2009 |

8 |
BayesX: analysing Bayesian structured additive regression models.
- Brezger, Kneib, et al.
- 2005
Citation Context ...ndard statistical software. Wood and Scheipl (2013) proposed an intermediate low-rank smoother. Recently, Schall’s algorithm has been extended (Rodriguez-Alvarez et al., 2015) to the case of multidimensional smoothing. This work also shows the fundamental role of the effective dimensions of the components of the model. 11. Bayesian P-splines It is a small step to go from a latent distribution in a mixed model to a prior in a Bayesian interpretation. Bayesian P-splines were proposed by Lang and Brezger (2004), and they were made accessible by appropriate software (Brezger et al., 2005). Their approach is based on Markov chain Monte Carlo (MCMC) simulation. As for the mixed model, the penalty leads to a singular distribution. This is solved by simulation using a random walk of the same order as that of the differences. It is also possible to start from a mixed model representation. Crainiceanu et al. (2007) did this in one dimension, using truncated power functions. They avoid tensor products of TPFs and switch to radial basis functions for spatial smoothing. These authors also allowed for varying (although isotropic) smoothness and for heteroscedasticity. Jullion and Lamber... |

8 | Propriety of posteriors in structured additive regression models: Theory and empirical evidence.
- Fahrmeir, Kneib
- 2009
Citation Context ...ahrmeir et al. (2004), proposed STAR (structured additive regression). Belitz and Lang (2008) introduced simultaneous selection of variables and smoothing parameters in structured additive models. The geoadditive model has received much attention; it is formed by the addition of one-dimensional smooth components and a two-dimensional spatial trend. Often the spatial component is modelled as a conditional autoregressive model. Brezger and Lang (2006) presented a Bayesian version of GLASS/STAR, also using 2D P-splines for modelling spatial effects in a multinomial logit model for forest health. Fahrmeir and Kneib (2009) further built on Bayesian STAR models by incorporating geoadditive features and Markov random fields, while addressing improper prior distributions. Also considering geoadditive structure, Kneib et al. (2011) expanded and unified Bayesian STAR models to further accommodate high-dimensional covariates. Hierarchies of curves form a special type of additive smooth structures. For example, in growth data for children we can introduce an overall mean curve and two additional curves that show the difference between boys and girls. Moreover, we can have a smooth curve p... |

8 |
A note on smoothing parameter selection for penalized spline smoothing.
- Kauermann
- 2005
Citation Context ...be added: E[y(t)] = µ(t) = β(t)x(t) + β_0(t). This has the form of an additive model. Building β_0 with P-splines we effectively get a P-GAM. This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCMs with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). The amplitude of a sinusoidal (or more complex) waveform is made to vary slowly over time. This assumes that the period is known. If that is not the case, or when it is not constant, it is pos... |

8 |
Generalized additive models for location, scale and shape for high dimensional data. A flexible approach based on boosting.
- Mayr, Fenske, et al.
- 2012
Citation Context ...number. We therefore advise using 100 B-splines, a safe choice. Boosting for smoothing basically works as follows (Tutz and Binder, 2006; Schmid and Hothorn, 2008): (1) smooth with a very strong penalty and save the result, (2) smooth the residuals and add (a fraction of) this result to the previous result, (3) repeat step (2) many times. The result gets more flexible with each iteration. So one has to stop at some point, using AIC or another criterion. Boosting has many enthusiastic proponents, and its use has been extended to non-normal data, additive models and other smooth structures (Mayr et al., 2012). We find it difficult to see its advantages, especially when we compare it to Schall’s algorithm for tuning multiple smoothing parameters, which we presented in Section 10. On the other hand, boosting allows the selection of relevant variables in a model and the use of non-standard objective functions. 13. Computation and software For many applications standard P-splines do not pose computational challenges. The size of the B-spline basis will be moderate and many thousands of observations can be handled with ease. If the data are observed on an equidistant grid and only smoothed values on that grid... |

7 |
3D space-varying coefficient models with application to diffusion tensor imaging.
- Heim, Fahrmeir, et al.
- 2007
Citation Context ...nd Tibshirani, 1990), updating each component function in turn, using any type of smoother. Convergence can be slow and diagnostics are hard or impossible to obtain. Direct fitting by P-splines does not have these disadvantages. As presented, the model is unidentifiable, because an arbitrary upward shift of f_1(x_1) can be compensated by an equal shift downward of f_2(x_2). A solution is to introduce an (unpenalized) intercept and to constrain each component to have a zero average. The P-GAM has multiple smoothing parameters, so optimization of AIC, say, by a simple grid search involves much work. Heim et al. (2007) proposed a search strategy that cycles over one-dimensional grid searches. As a more principled approach, Wood (2004) presented algorithms for numerical optimization in cross-validation. His book (Wood, 2006a) contains a wealth of material on GAMs. See also Section 13 for the mgcv software. In Section 10 we will present Schall’s algorithm for variance estimation. It is attractive for tuning multiple penalty parameters. 7. Smooth regression coefficients In the preceding sections P-splines were used to model expected values of observations. There is another class of models in which the goal ... |
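The intercept-plus-centering fix for the identifiability problem can be illustrated with a toy computation (hypothetical component values; this only demonstrates the constraint, not a full GAM fit):

```python
import numpy as np

rng = np.random.default_rng(3)
f1 = rng.normal(size=100) + 5.0    # stands in for a fitted component f1(x1)
f2 = rng.normal(size=100) - 2.0    # stands in for a fitted component f2(x2)

intercept = f1.mean() + f2.mean()  # unpenalized intercept absorbs the levels
f1c = f1 - f1.mean()               # each component now averages zero, so an
f2c = f2 - f2.mean()               # arbitrary shift between them is ruled out

eta = intercept + f1c + f2c        # fitted values are unchanged
print(np.allclose(eta, f1 + f2))   # → True
```

The shift ambiguity disappears because only one decomposition of η into an intercept plus zero-mean components exists.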

6 |
Quantile regression with monotonicity restrictions using P-splines and the l1-norm.
- Bollaerts, Eilers, et al.
- 2006
Citation Context ...r programming software can be used. See also Section 5 for quantile smoothing. • The jumps that are obtained with the L1 norm are not really “crisp,” but slightly rounded. The reason is that the L1 norm selects and shrinks. Much better results are obtained with the L0 norm, the number of non-zero coefficients (Rippe et al., 2012b). Although a non-convex objective function results, in practice it can be optimized reliably and quickly by an iteratively updated quadratic penalty. Other types of penalties can be used to enforce shape constraints. An example is a monotonically increasing curve fit (Bollaerts et al., 2006). A second, asymmetric, penalty κ ∑_j v_j (∆a_j)² is introduced, with v_j = 1 when ∆a_j < 0 and v_j = 0 otherwise. The value of κ regulates the influence of the penalty. Iterative computation is needed, as one needs to know v to do the smoothing and then to know the solution to determine (update) v. In practice, starting from v = 0 works well. Many variations are possible, to force sign constraints, to ensure (increasing or decreasing) monotonicity, or to require a convex or concave shape. One can also mix and match the asymmetric penalties to implement multiple shape constraints. Eilers (2005) ... |
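The iterative scheme for the asymmetric penalty might look like this (our own sketch on a grid, with an identity basis standing in for B-splines; λ, κ and the data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 50)
y = x**2 + 0.05 * rng.normal(size=50)        # increasing trend plus noise

n = len(y)
D2 = np.diff(np.eye(n), n=2, axis=0)         # smoothness penalty
D1 = np.diff(np.eye(n), n=1, axis=0)         # first differences for the shape
lam, kappa = 1.0, 1e6                        # arbitrary tuning choices

v = np.zeros(n - 1)                          # start from v = 0
for _ in range(20):
    P = lam * D2.T @ D2 + kappa * D1.T @ (v[:, None] * D1)
    a = np.linalg.solve(np.eye(n) + P, y)    # smooth given v ...
    v_new = (np.diff(a) < 0).astype(float)   # ... then update v from the fit
    if np.array_equal(v_new, v):
        break
    v = v_new
# np.diff(a) is now non-negative up to numerical tolerance: the large
# kappa squashes every flagged negative difference toward zero
```

Flipping the indicator to penalize positive differences gives a decreasing fit; combining such penalties yields convex, concave or sign constraints, as the excerpt notes.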

6 |
MortalitySmooth: an R package for smoothing Poisson counts with P-splines.
- Camarda
- 2012
Citation Context ...or products of linear B-splines (we discuss tensor products in Section 8). To reduce the number of coefficients, they used reduced splines, which are similar to nested B-splines (Lee et al., 2013). Another variation is not to model the logarithm of the counts by a sum of B-splines, but rather the density itself, with constraints on the coefficients (Schellhase and Kauermann, 2012). Mortality or morbidity smoothing is equivalent to discrete density estimation with an offset for exposures. P-splines have found their way into this area, for both one- and two-dimensional tables (Currie et al., 2004; Camarda, 2012); both papers illustrate automatic extrapolation. The palette of distributions that generalized linear smoothing can use is limited. A very general approach is offered by GAMLSS: generalized additive models for location, scale and shape (Rigby and Stasinopoulos, 2005). An example is the normal distribution with smoothly varying mean and variance, combined with a (varying) Box-Cox transform of the response variable. Many continuous and discrete distributions can be fitted by the GAMLSS algorithm, also in combination with mixtures, censoring and random components. I... |

6 |
Unimodal smoothing.
- Eilers
- 2005
Citation Context ... et al., 2006). A second, asymmetric, penalty κ ∑_j v_j (∆a_j)² is introduced, with v_j = 1 when ∆a_j < 0 and v_j = 0 otherwise. The value of κ regulates the influence of the penalty. Iterative computation is needed, as one needs to know v to do the smoothing and then to know the solution to determine (update) v. In practice, starting from v = 0 works well. Many variations are possible, to force sign constraints, to ensure (increasing or decreasing) monotonicity, or to require a convex or concave shape. One can also mix and match the asymmetric penalties to implement multiple shape constraints. Eilers (2005) used them for unimodal smoothing, while Eilers and Borgdorff (2007) used them to fit mixtures of log-concave non-parametric densities. This scheme has been extended to two dimensions by Rippe et al. (2012a) and applied to genotyping of SNPs (we discuss multidimensional smoothing in Section 8). Pya and Wood (2015) took a different approach. They write a = Σ exp(β) and structure the matrix Σ in such a way that a has the desired shape, for any vector β. For example, Σ_ij = I(i ≥ j), with the indicator function I(·), provides a monotonically increasing function. Patterns f... |
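The Σ exp(β) reparameterization described at the end of this excerpt is easy to verify numerically (a minimal sketch; β is an arbitrary vector of our choosing):

```python
import numpy as np

n = 6
Sigma = np.tril(np.ones((n, n)))             # Sigma_ij = I(i >= j)
beta = np.array([0.3, -1.2, 0.7, -0.5, 2.0, -3.0])
a = Sigma @ np.exp(beta)                     # cumulative sums of positive terms

# Each increment a_{j+1} - a_j equals exp(beta_{j+1}) > 0, so a is
# monotonically increasing for any real beta
print(np.all(np.diff(a) > 0))                # → True
```

Because the shape holds for every β, the coefficients can be estimated by unconstrained optimization, which is the appeal of this approach.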

6 |
Non-Standard Problems in Inference for Additive and Linear Mixed Models.
- Greven
- 2008
Citation Context ...ntiles of x, and the penalty is on the size of the coefficients. [Figure 5: P-spline smoothing of 10 (simulated) data points with 43 cubic B-splines.] This work has been extended in the book by Ruppert et al. (2003). Some people have called the TPF approach P-splines too. This is confusing and unfortunate because TPFs are inferior to the original P-splines; Eilers and Marx (2010) documented their poor numerical condition. B-splines and TPFs are strongly related (Greven, 2008; Eilers and Marx, 2010). Actually B-splines can be computed as differences of TPFs, but in the age of single-precision floating-point numbers this was avoided, for fear of large rounding errors. Eilers and Marx (2010) showed that this no longer holds. P-splines allow one to select the degree of the B-splines and the order of the penalty independently. With TPFs there is no choice: they imply a difference penalty whose order is determined by the degree of the TPF. 3. Penalty variations Standard P-splines use a penalty that is based on repeated differences. Many variations are possible. As stated... |
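The B-splines-as-differences-of-TPFs relation mentioned in this excerpt is simple to check in the linear case (our own sketch; the scaling by h assumes equally spaced knots):

```python
import numpy as np

def tpf(x, t):
    """Truncated power function of degree 1: (x - t)_+."""
    return np.maximum(0.0, x - t)

# With knots t0 < t1 < t2 spaced h apart, the scaled second difference
#   B(x) = [ (x - t0)_+ - 2 (x - t1)_+ + (x - t2)_+ ] / h
# is exactly the unit triangle (linear B-spline) on [t0, t2].
h = 1.0
t0, t1, t2 = 0.0, 1.0, 2.0
x = np.linspace(-1.0, 3.0, 401)
B = (tpf(x, t0) - 2.0 * tpf(x, t1) + tpf(x, t2)) / h

triangle = np.maximum(0.0, 1.0 - np.abs(x - t1) / h)
print(np.allclose(B, triangle))              # → True
```

Far to the right of its support, each TPF grows without bound while the B-spline is zero; the near-cancellation of such large values is what caused the rounding-error worries in single precision.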

6 | Estimating the interest rate term structure of corporate debt with a semiparametric penalized spline model. - Jarrow, Ruppert, et al. - 2004 |
Citation Context ...rx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest rate term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(y_i) = µ_i = ∫ F(x_i(t), t) dt. This can be interpreted as first transforming x (with a different function for each t) and then adding the results, or “transform and add” in contrast to “add and transform”. 8. Multi-dimensional smoothing A natural extension of P-splines to higher dimensions is to form tensor products of ... |

6 | Penalized spline smoothing in multivariable survival models with varying coefficients. - Kauermann - 2005 |
Citation Context ...be added: E[y(t)] = µ(t) = β(t)x(t) + β0(t). This has the form of an additive model. Building β0 with P-splines we effectively get a P-GAM. This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCM with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). The amplitude of a sinusoidal (or more complex) waveform is made to vary slowly over time. This assumes that the period is known. If that is not the case, or when it is not constant, it is pos... |

5 | A note on P-spline additive models with correlated errors. - Durban, Currie - 2003 |

5 | Discussion on: the analysis of designed experiments and longitudinal data by using smoothing splines. - Eilers - 1999 |
Citation Context ...idation. His book (Wood, 2006a) contains a wealth of material on GAMs. See also Section 13 for the mgcv software. In Section 10 we will present Schall’s algorithm for variance estimation. It is attractive for tuning multiple penalty parameters. 7. Smooth regression coefficients In the preceding sections P-splines were used to model expected values of observations. There is another class of models in which the goal is to model regression coefficients as a curve or surface. In this section we discuss varying coefficient models (Hastie and Tibshirani, 1993), penalized signal regression (Marx and Eilers, 1999), and generalizations. In modern jargon these are all cases of functional data analysis (Ramsay and Silverman, 2003). Varying coefficient models (VCM) were first introduced by Hastie and Tibshirani (1993). They allow regression coefficients to interact with another variable by varying smoothly. The simplest form is E[y(t)] = µ(t) = β(t)x(t), where y and x are observed and β is to be estimated and forced to change slowly with t. The model assumes that y is proportional to x, with a varying slope of the regression line. If we introduce a B-spline basis B and write β = Ba, we get µ = XBa, where X ... |
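The VCM construction just quoted (µ = XBa with X = diag(x) and a difference penalty on a) fits in a few lines. A minimal sketch under illustrative assumptions: linear B-splines (hat functions) on equally spaced knots stand in for a proper cubic basis, and the smoothing parameter is fixed rather than selected.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)
x = rng.normal(1.0, 0.3, 200)             # observed regressor x(t)
beta_true = np.sin(2 * np.pi * t) + 2     # slowly varying coefficient
y = beta_true * x + rng.normal(0, 0.1, 200)

knots = np.linspace(0, 1, 15)
h = knots[1] - knots[0]
B = np.clip(1 - np.abs(t[:, None] - knots[None, :]) / h, 0, 1)  # hat functions
U = x[:, None] * B                         # modified basis XB = diag(x) B
D = np.diff(np.eye(len(knots)), n=2, axis=0)
lam = 0.1                                  # fixed for illustration
a = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ y)
beta_hat = B @ a                           # estimated beta(t)
```

The only change from ordinary P-spline smoothing is the basis: penalized least squares is applied to XB instead of B, exactly as the excerpt describes.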

5 | Non-parametric log-concave mixtures. - Eilers, Borgdorff - 2007 |
Citation Context ...∑_j v_j(∆a_j)² is introduced, with v_j = 1 when ∆a_j < 0 and v_j = 0 otherwise. The value of κ regulates the influence of the penalty. Iterative computation is needed, as one needs to know v to do the smoothing and then to know the solution to determine (update) v. In practice, starting from v = 0 works well. Many variations are possible, to force sign constraints, to ensure (increasing or decreasing) monotonicity, or to require a convex or concave shape. One can also mix and match the asymmetric penalties to implement multiple shape constraints. Eilers (2005) used them for unimodal smoothing, while Eilers and Borgdorff (2007) used them to fit mixtures of log-concave non-parametric densities. This scheme has been extended to two dimensions by Rippe et al. (2012a) and applied to genotyping of SNPs (we discuss multidimensional smoothing in Section 8). Pya and Wood (2015) took a different approach. They write a = Σexp(β) and structure the matrix Σ in such a way that a has the desired shape, for any vector β. For example, Σ_ij = I(i ≥ j), with the indicator function I(·), provides a monotonically increasing function. Patterns for combinations of constraints on first and second derivative are ta... |

5 | Multivariate calibration with single-index signal regression. - Eilers, Li, et al. - 2009 |
Citation Context ...es available. Notice that a is forced to be smooth, but µ does not have to be smooth at all. Note that it is not the rows of X that are smoothed, but the regression coefficients. Li and Marx (2008) proposed signal sharpening to enhance external prediction by incorporating PLS weights. An extensive review of functional regression was presented by Morris (2015). The standard PSR model implicitly assumes the identity link function. However, it can be bent through µ = f(Xβ) = f(Ua), where f(·) is unknown. We call this model single-index signal regression (SISR), which is closely related to projection pursuit (Eilers et al., 2009). To estimate f, a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero and variance 1) to make the model identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) use... |

5 | Additive two-way hazards model with varying coefficients. - Kauermann, Khomski - 2006 |
Citation Context ...t) = β(t)x(t) + β0(t). This has the form of an additive model. Building β0 with P-splines we effectively get a P-GAM. This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCM with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). The amplitude of a sinusoidal (or more complex) waveform is made to vary slowly over time. This assumes that the period is known. If that is not the case, or when it is not constant, it is possible to estimate both varying ampl... |

5 | Filtering time series with penalized splines. - Kauermann, Krivobokova, et al. - 2011 |
Citation Context ...ated noise can be seen as a less smooth trend plus white noise. The solution is to take into account the correlation structure explicitly, i.e. Var(ε) = σ²V, where V can depend on one or more correlation parameters. Durban and Currie (2003) presented a strategy to select the smoothing parameter and estimate the correlation based on REML. Krivobokova and Kauermann (2007) showed that maximum likelihood estimation of the smoothing parameter is robust, even under moderately misspecified correlation. This method has allowed the inclusion of temporal non-linear trends and filtering of time series (Kauermann et al., 2011). Recently, and motivated by the need to improve the speed and stability of forecasting models, Wood et al. (2015) have developed efficient methods for fitting additive models to large data sets with correlated errors. Correlation also appears in more complex situations, for example in the case of spatial data. Lee and Durban (2009) combined two-dimensional P-splines and random effects with a CAR (conditional auto-regressive) structure to estimate spatial trends when data are geographically distributed over locations on a map. Other author... |

5 | High dimensional structured additive regression models: Bayesian regularization, smoothing and predictive performance. - Kneib, Konrath, et al. - 2011 |
Citation Context ...del has received much attention; it is formed by the addition of one-dimensional smooth components and a two-dimensional spatial trend. Often the spatial component is modelled as a conditional autoregressive model. Brezger and Lang (2006) presented a Bayesian version of GLASS/STAR, also using 2D P-splines for modelling spatial effects in a multinomial logit model for forest health. Fahrmeir and Kneib (2009) further built on Bayesian STAR models by incorporating geoadditive features and Markov random fields, while addressing improper prior distributions. Also considering geoadditive structure, Kneib et al. (2011) expanded and unified Bayesian STAR models to further accommodate high-dimensional covariates. Hierarchies of curves form a special type of additive smooth structures. For example, in growth data for children we can introduce an overall mean curve and two additional curves that show the difference between boys and girls. Moreover, we can have a smooth curve per individual child. Durban et al. (2005) gave an example (using truncated power functions), while Bugli and Lambert (2006) used proper P-splines in a Bayesian context. 10. P-splines as a mixed model The conn... |

4 | Ill-posed problems with counts, the composite link model and penalized likelihood. - Eilers - 2007 |
Citation Context ... to the spatial context, while Sobotka et al. (2013) provide confidence intervals. Schnabel and Eilers (2013) proposed a location-scale model for non-crossing expectile curves. When analysing counts with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = Cexp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interaction... |
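The penalized CLM µ = C exp(η) described above can be fitted with a small IWLS loop. This is a sketch following the structure of the excerpt, not code from Eilers (2007) or any package: names are illustrative, the basis is the identity (each latent cell its own coefficient), and λ is fixed.

```python
import numpy as np

def fit_pclm(y, C, B, lam=1.0, max_iter=100):
    """Penalized composite link model: mu = C exp(B a), Poisson counts y,
    second-order difference penalty lam * ||D a||^2 on a."""
    m = B.shape[1]
    D = np.diff(np.eye(m), n=2, axis=0)
    P = lam * D.T @ D
    a = np.full(m, np.log(y.sum() / C.sum()))     # flat start
    for _ in range(max_iter):
        gamma = np.exp(B @ a)
        mu = C @ gamma
        X = (C * gamma) @ B / mu[:, None]         # working design W^-1 C Gamma B
        z = X @ a + (y - mu) / mu                 # working response
        a_new = np.linalg.solve(X.T @ (mu[:, None] * X) + P, X.T @ (mu * z))
        if np.max(np.abs(a_new - a)) < 1e-10:
            a = a_new
            break
        a = a_new
    return a

# Latent pattern on 20 cells, observed only as sums over pairs of cells:
rng = np.random.default_rng(3)
B = np.eye(20)
C = np.kron(np.eye(10), np.ones((1, 2)))          # aggregation matrix
eta_true = 3 + np.sin(np.linspace(0, np.pi, 20))
y = rng.poisson(C @ np.exp(eta_true)).astype(float)
gamma_hat = np.exp(fit_pclm(y, C, B, lam=1.0))    # smooth latent composition
```

The matrix C is all that distinguishes this from an ordinary penalized Poisson fit; a difference penalty on a gives a smooth latent η while the data remain grouped.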

4 | Generalized linear models with P-splines. - Eilers, Marx - 1992 |
Citation Context ...f the central developments during the first two decades of P-splines. MSC: 41A15, 41A63, 62G05, 62G07, 62J07, 62J12. Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing. 1. Introduction Twenty years ago, Statistical Science published a discussion paper under the title “Flexible smoothing with B-splines and penalties” (Eilers and Marx, 1996). The authors were two statisticians with only a short track record, who finally got a manuscript published that had been rejected by three other journals. They had been trying since 1992 to sell their brainchild P-splines (Eilers and Marx, 1992). Apparently it did have some value, because two decades later the paper has been cited over a thousand times (according to the Web of Science, a conservative source), in both theoretical and applied work. By now, P-splines have become an active area of research, so it will be useful, and hopefully interesting, to look back and to sketch what might be ahead. 1 Erasmus University Medical Centre, Rotterdam, the Netherlands, p.eilers@erasmusmc.nl 2 Dept. of Experimental Statistics, Louisiana State University, USA, bmarx@lsu.edu 3 Univ. Carlos III Madrid, Dept of Statistics, Leganes, Spain, mdurb... |

4 | Penalized solutions to functional regression problems. - Harezlak, Coull, et al. - 2007 |
Citation Context .... (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest rate term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(y_i) = µ_i = ∫ F(x_i(t), t) dt. This can be interpreted as first transforming x (with a different function for each t) and then adding t... |

4 | Estimating the term structure of interest rates using penalized splines. - Krivobokova, Kauermann, et al. - 2006 |
Citation Context ...nded PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest rate term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(y_i) = µ_i = ∫ F(x_i(t), t) dt. This can be interpreted as first transforming x (with a different function for each t) and then adding the results, or “transform and add” in contrast to “add and transform”. 8. Multi-dimensional smoothing A natural extension of P-splines to higher dimensions is to form tensor products of one-dimensional B-spline bases ... |

3 | Modelling general patterns of digit preference. - Camarda, Eilers, et al. - 2008 |
Citation Context ...s with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = Cexp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illustrate the main idea in two dimensions. Let η = f_1(x_1) + f_2(x_2) = [B_1|B_2] [a_1; a_2] = Ba. (13) By combining the two bases... |
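The two-term construction of equation (13) — combine the bases into [B1|B2], stack the coefficients, and penalize each block separately — can be sketched as follows. Assumptions for brevity: hat-function (linear B-spline) bases, fixed λ's, and a tiny ridge to absorb the well-known intercept confounding between the two smooth terms (which the difference penalties do not remove).

```python
import numpy as np

def hat_basis(x, k):
    """Linear B-splines (hat functions) on k equally spaced knots in [0, 1]."""
    knots = np.linspace(0, 1, k)
    h = knots[1] - knots[0]
    return np.clip(1 - np.abs(x[:, None] - knots[None, :]) / h, 0, 1)

rng = np.random.default_rng(6)
n, k = 300, 12
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x1) + (x2 - 0.5) ** 2 + rng.normal(0, 0.1, n)

B = np.hstack([hat_basis(x1, k), hat_basis(x2, k)])   # [B1|B2]
D = np.diff(np.eye(k), n=2, axis=0)
lam1, lam2 = 0.1, 0.1
P = np.zeros((2 * k, 2 * k))                          # block-diagonal penalty
P[:k, :k] = lam1 * D.T @ D
P[k:, k:] = lam2 * D.T @ D
a = np.linalg.solve(B.T @ B + P + 1e-6 * np.eye(2 * k), B.T @ y)
eta = B @ a                                           # f1(x1) + f2(x2)
```

Chaining the coefficients into one vector puts the GAM back into a standard (penalized) regression setting, exactly as the excerpt says.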

3 | Flexible copula density estimation with penalized hierarchical B-splines. - Kauermann, Schellhase, et al. - 2013 |
Citation Context ...of the observed domain of the data, unless it is known to be bounded (as for intrinsically positive variables). P-splines conserve moments of distributions up to order d − 1, where d is the order of the differences in the penalty. This means that, if d = 3, the sum, the mean, and the variance of the smooth histogram are equal to those of the raw histogram, whatever the amount of smoothing (Eilers and Marx, 1996). In contrast, kernel smoothers increase variance. Many variations on this theme have been published. We already mentioned one- and two-dimensional log-concave densities in Section 3. Kauermann et al. (2013) explored flexible copula density estimation. They modelled the density directly as a sum of tensor products of linear B-splines (we discuss tensor products in Section 8). To reduce the number of coefficients, they used reduced splines, which are similar to nested B-splines (Lee et al., 2013). Another variation is not to model the logarithm of the counts by a sum of B-splines, but rather the density itself, with constraints on the coefficients (Schellhase and Kauermann, 2012). Mortality or morbidity smoothing is equivalent to discrete density estimation with an offset for exposures. P-splines ha... |
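The moment-conservation property quoted above (order-d difference penalty preserves moments up to d − 1) can be verified numerically. A sketch under illustrative assumptions: Poisson likelihood on the raw counts with an identity basis (one coefficient per bin) instead of B-splines, and a fixed λ.

```python
import numpy as np

def psmooth_hist(y, lam=10.0, max_iter=100):
    """Poisson histogram smoothing with a THIRD-order difference penalty
    (d = 3), so sum, mean and variance match the raw histogram."""
    m = len(y)
    D = np.diff(np.eye(m), n=3, axis=0)
    P = lam * D.T @ D
    a = np.log(np.maximum(y, 0.5))                 # crude start
    for _ in range(max_iter):                      # penalized IWLS
        mu = np.exp(a)
        a_new = np.linalg.solve(np.diag(mu) + P, mu * a + (y - mu))
        if np.max(np.abs(a_new - a)) < 1e-12:
            a = a_new
            break
        a = a_new
    return np.exp(a)

rng = np.random.default_rng(5)
x = np.arange(30.0)
y = rng.poisson(50 * np.exp(-0.5 * ((x - 15) / 5) ** 2) + 1).astype(float)
mu = psmooth_hist(y)                               # smooth histogram
```

The reason is visible in the score equation y − µ = λD′Da: multiplying by any polynomial of degree below d in the bin index annihilates the penalty term, so the corresponding raw moments of µ and y agree, whatever λ.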

3 | P-Spline ANOVA Type Interaction Models for Spatio-Temporal Smoothing. - Lee, Durban - 2011 |

2 | Functional ANOVA with random functional effects: an application to event-related potentials modelling for electroencephalograms analysis. - Bugli, Lambert - 2006 |
Citation Context ...s, while addressing improper prior distributions. Also considering geoadditive structure, Kneib et al. (2011) expanded and unified Bayesian STAR models to further accommodate high-dimensional covariates. Hierarchies of curves form a special type of additive smooth structures. For example, in growth data for children we can introduce an overall mean curve and two additional curves that show the difference between boys and girls. Moreover, we can have a smooth curve per individual child. Durban et al. (2005) gave an example (using truncated power functions), while Bugli and Lambert (2006) used proper P-splines in a Bayesian context. 10. P-splines as a mixed model The connection between nonparametric regression and mixed models was first established over 25 years ago by Green (1987) and Speed (1991), but it was not until the late 1990s that it became a “hot” research topic (Wang, 1998; Zhang et al., 1998; Verbyla et al., 1999), partly due to the developments in mixed model software. These initial references were based on the use of smoothing splines. In the penalized spline context, several authors quickly extended the model formulation into a mixed model (Brumback et al., 19... |

2 | The smooth complex logarithm and quasi-periodic models. - Eilers - 2009 |
Citation Context ...force monotonicity. To avoid crossing of individually estimated smooth quantile curves, Schnabel and Eilers (2013) introduced the quantile sheet, a surface on the domain formed by the explanatory variable and the probability level. Compared to the explicit solutions of (penalized, weighted) least squares problems, quantile smoothing is a bit less attractive for numerical work as it leads to linear programming or to quadratic programming if quadratic penalties are involved. In contrast, expectiles use asymmetrically weighted sums of squares and lead to simple iterative algorithms (Schnabel and Eilers, 2009). Sobotka and Kneib (2012) extended expectile smoothing to the spatial context, while Sobotka et al. (2013) provide confidence intervals. Schnabel and Eilers (2013) proposed a location-scale model for non-crossing expectile curves. When analysing counts with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = Cexp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CL... |

2 | Smooth semiparametric and nonparametric Bayesian estimation of bivariate densities from bivariate histogram data. - Lambert - 2011 |
Citation Context ...chnabel and Eilers (2013) proposed a location-scale model for non-crossing expectile curves. When analysing counts with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = Cexp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illu... |

2 | Bayesian density estimation from grouped continuous data. - Lambert, Eilers - 2009 |
Citation Context ...de confidence intervals. Schnabel and Eilers (2013) proposed a location-scale model for non-crossing expectile curves. When analysing counts with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = Cexp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers ... |

2 | Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. - Lee, Durban, et al. - 2013 |

2 | Estimation of single-index models based on boosting techniques. - Leitenstorfer, Tutz - 2011 |
Citation Context ...ion pursuit (Eilers et al., 2009). To estimate f, a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero and variance 1) to make the model identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) used boosting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the element... |

2 | A class of partially linear single-index survival models. - Lu, Chen, et al. - 2006 |
Citation Context ...related to projection pursuit (Eilers et al., 2009). To estimate f, a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero and variance 1) to make the model identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a and the coefficients for f are updated in turn until convergence. P-splines have been implemented in other types of single-index models, e.g. see Yu and Ruppert (2002) and Lu et al. (2006). Leitenstorfer and Tutz (2011) used boosting and Antoniadis et al. (2004) used a Bayesian approach. In the next section, we review the tensor product fundamentals that enable PSR extensions into two dimensions. For example, Eilers and Marx (2003) and Marx and Eilers (2005) extended PSR to allow interaction with a discrete variable and to the two-dimensional case where each x is not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previ... |

2 | Penalized spline estimation for varying-coefficient models. - Lu, Zhang, et al. - 2008 |
Citation Context ...B-spline basis B and write β = Ba, we get µ = XBa, where X = diag(x). With a difference penalty on a we have the familiar P-spline structure, with only a modified basis XB. A varying offset can be added: E[y(t)] = µ(t) = β(t)x(t) + β0(t). This has the form of an additive model. Building β0 with P-splines we effectively get a P-GAM. This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCM with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). ... |

2 | P-spline varying coefficient models for complex data. - Marx - 2010 |
Citation Context ...e, anyone can follow the trail through history in detail. We have done our best, in good faith, to give an overview of the field, but we do not claim that our choice of papers is free from subjectivity. The advent of P-splines has led to formidable developments in smoothing, and we have been actively shaping many of them. We hope that we will not offend any reader by serious omissions. 2. P-spline basics The two components of P-splines are B-splines and discrete penalties. In this section we briefly review them, starting with the former. We do not go much into technical detail; see Eilers and Marx (2010) for that. 2.1. B-splines Figure 1 shows four triangles of the same height and width, the middle ones overlapping with their two neighbours. These are linear B-splines, the non-zero parts consisting of two linear segments. Imagine that we scale the triangles by different amounts and add them all up. That would give us a piecewise-linear curve. We can generate many shapes by changing the coefficients, and we can get more or less detail by using more or fewer B-splines. If we indicate the triangles by B_j(x) and if a_1 to a_n are the scaling coefficients, we have ∑_{j=1}^{n} a_j B_j(x) as the formula fo... |
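The triangle construction described in the excerpt is easy to reproduce. A minimal sketch with illustrative numbers: linear B-splines are hat functions of half-width h on an equally spaced grid, and scaling them by coefficients a_j and adding gives a piecewise-linear curve.

```python
import numpy as np

knots = np.linspace(0, 1, 6)                    # triangle peaks
h = knots[1] - knots[0]
x = np.linspace(0, 1, 101)
# Each column is one linear B-spline: a triangle of height 1 at its knot.
B = np.clip(1 - np.abs(x[:, None] - knots[None, :]) / h, 0, 1)

a = np.array([0.0, 1.0, 0.5, 2.0, 1.5, 0.0])    # scaling coefficients
curve = B @ a                                   # sum_j a_j B_j(x)
```

Two properties worth noting: the hats sum to one everywhere on [0, 1] (a partition of unity), and the curve interpolates a_j at knot j, which is why the coefficients read directly as local heights.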

2 | Shape constrained additive models. - Pya, Wood - 2015 |
Citation Context ... v. In practice, starting from v = 0 works well. Many variations are possible, to force sign constraints, to ensure (increasing or decreasing) monotonicity, or to require a convex or concave shape. One can also mix and match the asymmetric penalties to implement multiple shape constraints. Eilers (2005) used them for unimodal smoothing, while Eilers and Borgdorff (2007) used them to fit mixtures of log-concave non-parametric densities. This scheme has been extended to two dimensions by Rippe et al. (2012a) and applied to genotyping of SNPs (we discuss multidimensional smoothing in Section 8). Pya and Wood (2015) took a different approach. They write a = Σexp(β) and structure the matrix Σ in such a way that a has the desired shape, for any vector β. For example, Σ_ij = I(i ≥ j), with the indicator function I(·), provides a monotonically increasing function. Patterns for combinations of constraints on first and second derivative are tabulated in their paper. 4. Diagnostics In contrast to many other smoothers, like kernels, local likelihood, and wavelets, P-splines use a regression model with clearly defined coefficients. Hence we can borrow from regression theory to compute info... |
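The a = Σexp(β) reparameterization quoted above takes one line to demonstrate. A minimal numeric illustration (sizes and seed are arbitrary): with Σ_ij = I(i ≥ j), a is a cumulative sum of strictly positive increments, hence increasing for any unconstrained β.

```python
import numpy as np

n = 6
Sigma = np.tril(np.ones((n, n)))      # Sigma_ij = I(i >= j)
rng = np.random.default_rng(4)
beta = rng.normal(size=n)             # no constraint on beta itself
a = Sigma @ np.exp(beta)              # increasing coefficient vector
```

Because β is unconstrained, standard unpenalized or penalized optimization over β delivers the shape constraint on a for free; other choices of Σ encode convexity or combined constraints.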

1 | Theory & methods: Krige, smooth, both or neither? - Altman - 2000 |
Citation Context ...moothing splines is equivalent to a differential equation that says that the second derivative of the solution is zero everywhere. O’Sullivan (1986) took the step from a continuous penalty to a discrete one. This can also be done with a DE-based penalty. However, if the coefficients of the DE are not fixed (e.g. estimated from the data), then this generates a significant computational load. It will be useful to study (almost) equivalent discrete penalties, based on difference equations. It is remarkable that in one-dimensional smoothing, kriging is almost absent. Altman (2000) compared splines and kriging and found that serial correlation is a key issue. If it is present and ignored, splines do not perform well. There are ways to handle correlation, as discussed in Section 10. In spatial data analysis, kriging is still dominant. We believe that for many applications, tensor product P-splines would be a much better choice, especially if one is more interested in estimating a trend rather than doing spatial interpolation. It may appear that attempting to estimate a covariance structure from the data is a worthwhile effort, but in practice it often leads to unstable ... |

1 |
P-splines quantile regression estimation in varying coefficient models.
- Andriyana, Gijbels, et al.
- 2014
Citation Context ...miliar P-spline structure, with only a modified basis XB. A varying offset can be added: E[y(t)] = µ(t) = β(t)x(t) + β0(t). This has the form of an additive model. Building β0 with P-splines we effectively get a P-GAM. This simple VCM can be extended by adding more additive or varying-coefficient terms. For non-normal data we model the linear predictor and choose a proper response distribution. VCMs with P-splines were proposed by Eilers and Marx (2002). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). The amplitude of a sinusoidal (or more complex) waveform is made to vary slowly over time. This assumes that the period i...
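The reduction of a varying-coefficient term to the familiar P-spline form, via the modified basis XB with X = diag(x), can be sketched in a few lines. A plain polynomial basis stands in for a B-spline basis here, purely to keep the example self-contained:

```python
import numpy as np

# Varying-coefficient construction: the fit E[y(t)] = beta(t) * x(t)
# becomes ordinary (penalized) regression on the modified basis
# diag(x) @ B. The polynomial basis is a dependency-free stand-in
# for B-splines; the coefficient vector a is arbitrary.
t = np.linspace(0, 1, 50)
x = np.sin(2 * np.pi * t)                 # covariate signal x(t)
B = np.vander(t, 4, increasing=True)      # stand-in basis (n x k)
U = np.diag(x) @ B                        # modified basis X B

a = np.array([1.0, 0.5, -0.2, 0.1])       # spline coefficients
beta_t = B @ a                            # varying coefficient beta(t)
mu = U @ a                                # equals beta(t) * x(t)
print(np.allclose(mu, beta_t * x))        # True
```

With the usual difference penalty on a, everything else (cross-validation, standard errors, effective dimension) carries over unchanged.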

1 |
On the estimation of the reproduction number based on misreported epidemic data. Statistics In
- Azmon, Faes, et al.
- 2014
Citation Context ...inear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = C exp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illustrate the main idea in two dimensions. Let η = f1(x1) + f2(x2) = [B1|B2][a1; a2] = Ba. (13) By combining the two bases into one matrix and ...
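A minimal numerical sketch of the composite link structure µ = C exp(η), with a hypothetical C that aggregates a fine latent grid into three observed groups of three cells each:

```python
import numpy as np

# Composite link model: latent expected counts gamma = exp(eta) live
# on a fine grid; the observed expectations are mu = C @ gamma, where
# C encodes the (here illustrative) aggregation pattern.
eta = np.linspace(-1, 1, 9)               # smooth latent linear predictor
gamma = np.exp(eta)                        # latent expected counts
C = np.kron(np.eye(3), np.ones((1, 3)))   # 3 groups, each summing 3 cells
mu = C @ gamma                             # expected grouped counts
print(mu.shape)                            # (3,)
```

Since every latent cell is counted exactly once, the grouped expectations conserve the total: mu.sum() equals gamma.sum(). In the penalized CLM, η itself carries a B-spline basis and difference penalty.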

1 | Modeling regional economic dynamics: spatial dependence, spatial heterogeneity and nonlinearities. - Basile, Durban, et al. - 2014 |

1 |
The functional linear array model.
- Brockhaus, Scheipl, et al.
- 2015
Citation Context ...ue to the random u, and the bias-adjusted confidence interval is f(x) ± 2σ√(H_ii) (Ruppert et al., 2003). Of course, the interest of the mixed model representation of P-splines has been motivated by the possibility of including smoothing in a larger class of models. In fact, during the last 15 years, there has been an explosion of models: ranging from estimating subject-specific curves in longitudinal data (Durban et al., 2005), to extending classical models in economics (Basile et al., 2014), to playing a key role in the recent advances in functional data analysis (Scheipl et al., 2015; Brockhaus et al., 2015), among others. 10.1. P-splines and correlated errors Although the mixed model approach has allowed the generalization of many existing models, there is an area in which it has played a key role: data with serial correlation. For many years the main difficulty when fitting a smooth model in the presence of correlation has been the joint estimation of the smoothing and correlation parameters. It is well known that the standard methods for smoothing parameter selection (based on minimization of the mean squared prediction error) generally under-smooth the data in the presence of positive correla...

1 | Asymptotic properties of penalized spline estimators. - Eilers, Marx, et al. - 2009 |

1 | Smoothing of X-ray diffraction data and K alpha(2) elimination using penalized likelihood and the composite link model. - Rooi, Pers, et al. - 2014 |

1 |
Bayesian regularization in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection.
- Fahrmeir, Kneib, et al.
- 2010
Citation Context ...e models to large data sets with correlated errors. Correlation also appears in more complex situations, for example in the case of spatial data. Lee and Durban (2009) combined two-dimensional P-splines and random effects with a CAR (conditional auto-regressive) structure to estimate spatial trends when data are geographically distributed over locations on a map. Other authors have taken different approaches; they combined additive mixed models with spatial effects represented by Markov or Gaussian random fields (Kneib and Fahrmeir, 2006; Fahrmeir et al., 2010). 10.2. Multidimensional P-splines as mixed models Multidimensional P-splines can be handled as a mixed model too. A first attempt was made by Ruppert et al. (2003) using radial basis functions. Currie et al. (2006) analysed tensor product P-splines as mixed models. Here, the singular value decomposition of the penalty matrix (as in the 1D case) is used to construct the mixed model matrices. This approach works for any sum of quadratic penalties (Wood, 2006a). However, when the penalty is expressed as the sum of Kronecker products of marginal bases (the Kronecker sum of penalties), the represen...

1 |
Early plant recruitment stages set the template for the development of vegetation patterns along a hydrological gradient.
- Fraaije, Braak, et al.
- 2015
Citation Context ...r spatial smoothing. These authors also allowed for varying (although isotropic) smoothness and for heteroscedasticity. Jullion and Lambert (2007) proposed a Bayesian model for adaptive smoothing. As an alternative to MCMC, integrated nested Laplace approximation (INLA) is powerful and fast, and it is gaining in popularity (Rue et al., 2009). INLA avoids stochastic simulation for precision parameters and uses numerical integration instead. Basically INLA uses a parameter for each observation, so a (B-spline) regression basis has to be implemented in an indirect way, as a matrix of constraints (Fraaije et al., 2015). INLA is an attractive choice for anisotropic smoothing. By working with a sum of precision matrices it can handle the equivalent of a mixed model with overlapping penalties (Rodriguez-Alvarez et al., 2015). 12. Varia In this section we discuss some subjects that do not find a natural home in one of the preceding sections. We take a look at asymptotic properties of P-splines and at boosting. Several authors have studied the asymptotic behaviour of P-splines. See Li and Ruppert (2008); Claeskens et al. (2009); Kauermann et al. (2009); Wang et al. (2011). Although we admire the technical level ...

1 |
L- and V-curves for optimal smoothing.
- Frasso, Eilers
- 2015
Citation Context ... parameter. An implicit assumption is that the observations are independent, conditional on their smooth expectations. If this is not the case, as for time series data, the serial correlation will be picked up as a part of the smooth component and severe undersmoothing can occur. One way to approach this problem is to explicitly model the correlation structure of the noise. We return to this subject in Section 10 on mixed models. A recent alternative strategy is the adaptation of the L-curve (Hansen, 1992). It was developed for ridge regression, but can be adapted to difference penalties. See Frasso and Eilers (2015) for examples and a variation, called the V-curve, which is easier to use. In Section 10 the tuning parameter for the penalty will appear as a ratio of variances, and the effective dimension plays an essential role when estimating them. 5. Generalized linear smoothing and extensions P-splines are based on linear regression, so it is routine to extend them for smoothing non-normal observations, by borrowing the framework of generalized linear models (GLM). Let y be observed and µ the vector of expected values. Then the linear predictor η = g(µ) = Ba is modelled by B-splines, and a suitable dist... |
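The V-curve idea can be sketched on a Whittaker-type smoother (identity basis instead of B-splines, to keep the code short); the λ grid, the smoother, and the function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def vcurve_lambda(y, lambdas, d=2):
    """V-curve sketch (after Frasso and Eilers, 2015): for each lambda
    solve (I + lambda D'D) z = y, record the L-curve point
    (log fidelity, log roughness), and pick the lambda interval where
    consecutive points are closest (the flat part of the V)."""
    n = len(y)
    D = np.diff(np.eye(n), d, axis=0)
    pts = []
    for lam in lambdas:
        z = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
        pts.append((np.log(np.sum((y - z) ** 2)),
                    np.log(np.sum((D @ z) ** 2))))
    pts = np.array(pts)
    dist = np.sqrt(np.sum(np.diff(pts, axis=0) ** 2, axis=1))
    k = np.argmin(dist)
    return np.sqrt(lambdas[k] * lambdas[k + 1])  # geometric midpoint

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)
lam = vcurve_lambda(y, np.logspace(-2, 6, 41))
print(lam > 0)  # a single data-driven smoothing parameter
```

Unlike cross-validation, this criterion does not rely on independent errors, which is what makes it attractive for serially correlated data.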

1 | Some asymptotic results on generalized penalized spline smoothing. - Eilers, Marx, et al. - 2009 |

1 |
Sharpening P-spline signal regression.
- Li, Marx
- 2008
Citation Context ...o be smooth, by putting a difference penalty on it, thereby making the problem well-posed (Marx and Eilers, 1999). In principle there is no need to introduce P-splines, but writing β = Ba and putting the penalty on a reduces the computational load when X has many columns. Effectively we get penalized regression on the basis U = XB. After this step the machinery for cross-validation, standard errors and effective dimension becomes available. Notice that a is forced to be smooth, but µ does not have to be smooth at all. Note also that it is not the rows of X that are smoothed, but the regression coefficients. Li and Marx (2008) proposed signal sharpening to enhance external prediction by incorporating PLS weights. An extensive review of functional regression was presented by Morris (2015). The standard PSR model implicitly assumes the identity link function. However, it can be bent through µ = f(Xβ) = f(Ua), where f(·) is unknown. We call this model single-index signal regression (SISR), which is closely related to projection pursuit (Eilers et al., 2009). To estimate f, a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero a...
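The dimension reduction U = XB behind penalized signal regression is easy to sketch. Random matrices stand in for the signal matrix and the B-spline basis, and the value of λ is an arbitrary illustration:

```python
import numpy as np

# Penalized signal regression sketch: with beta = B a, regression on
# the wide signal matrix X reduces to penalized regression on U = X B,
# which has only as many columns as there are spline coefficients.
rng = np.random.default_rng(0)
m, p, k = 30, 200, 8                  # observations, channels, coefficients
X = rng.standard_normal((m, p))       # signal matrix (wide: p >> m)
B = rng.standard_normal((p, k))       # stand-in for a B-spline basis
y = rng.standard_normal(m)
U = X @ B                             # reduced basis, m x k
D = np.diff(np.eye(k), 2, axis=0)     # second-order difference penalty
lam = 1.0
a = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ y)
beta = B @ a                          # smooth coefficient vector, length p
print(beta.shape)                     # (200,)
```

Note that the k-by-k system is solvable even though p far exceeds m; the difference penalty on a is what makes the original ill-posed regression well-posed.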

1 |
Multidimensional single-index signal regression.
- Marx, Eilers, et al.
- 2011
Citation Context ... not a vector but a matrix. In these models there is no notion of time. When each element of y is not a scalar but a time series, as is x, the historical functional linear model (HFLM) assumes that in principle all previous x can influence the elements of y (Malfait and Ramsay, 2003). Harezlak et al. (2007) introduced P-spline technology for the HFLM. A mirror image of the HFLM is the interest term structure, estimating the expected future course of interest rates; see Jarrow et al. (2004) and Krivobokova et al. (2006). Additionally, Marx et al. (2011) extended SISR to two dimensions, whereas Marx (2015) presented a hybrid varying-coefficient single-index model. In SISR, a weighted sum of x(t) is formed and transformed. McLean et al. (2014) went one step further: E(yi) = µi = ∫ F(xi(t), t)dt. This can be interpreted as first transforming x (with a different function for each t) and then adding the results, or “transform and add” in contrast to “add and transform”. 8. Multi-dimensional smoothing A natural extension of P-splines to higher dimensions is to form tensor products of one-dimensional B-spline bases and to add difference penalties along each dimension (Eilers and Marx, 2003). Figure 7 ...
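The tensor-product construction mentioned at the end of this excerpt can be assembled with Kronecker products: the 2D basis is the Kronecker product of the marginal bases, and each dimension contributes its own difference penalty, blown up with identity matrices. The small random matrices below are stand-ins for marginal B-spline bases:

```python
import numpy as np

# Tensor-product P-spline building blocks (illustrative sizes).
B1 = np.random.default_rng(0).standard_normal((5, 3))   # n1 x k1
B2 = np.random.default_rng(1).standard_normal((4, 2))   # n2 x k2
B = np.kron(B1, B2)                                      # (n1*n2) x (k1*k2)

# One first-order difference penalty per dimension, expanded so each
# acts along its own direction of the coefficient grid.
D1 = np.diff(np.eye(3), 1, axis=0)
D2 = np.diff(np.eye(2), 1, axis=0)
P = np.kron(D1.T @ D1, np.eye(2)) + np.kron(np.eye(3), D2.T @ D2)
print(B.shape, P.shape)   # (20, 6) (6, 6)
```

Giving each penalty term its own λ yields anisotropic smoothing, i.e. different amounts of smoothing along the two axes.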

1 |
Bilinear modulation models for seasonal tables of counts.
- Marx, Eilers, et al.
- 2010
Citation Context ...2). Lu et al. (2008) studied them too and presented a Newton-Raphson procedure to minimize the cross-validation error. Andriyana et al. (2014) brought quantile regression into VCMs using P-splines. Kauermann (2005b) and Kauermann and Khomski (2006) developed P-spline survival and hazard models, respectively, to accommodate varying coefficients. Wang et al. (2014) used VCMs for longitudinal data (with errors in variables) with Bayesian P-splines. Heim et al. (2007) used a 3D VCM in brain imaging. Modulation models for seasonal data are an interesting application of the VCM (Eilers et al., 2008; Marx et al., 2010). The amplitude of a sinusoidal (or more complex) waveform is made to vary slowly over time. This assumes that the period is known. If that is not the case, or when it is not constant, it is possible to estimate both varying amplitude and phase of a sine wave (Eilers, 2009). In a VCM, y and x are parallel vectors given at the same sampling positions in time or space. In penalized signal regression (PSR) we have a set of x vectors and corresponding scalars in y and the goal is to predict the latter. If the x vectors form the rows of a matrix X, we have linear regression E(y) = µ = Xβ. The prob...

1 |
Functional Regression. Annual Review of Statistics and its Application,
- Morris
- 2015
Citation Context ...by writing β = Ba and putting the penalty on a, but it reduces the computational load when X has many columns. Effectively we get penalized regression on the basis U = XB. After this step the machinery for cross-validation, standard errors and effective dimension becomes available. Notice that a is forced to be smooth, but µ does not have to be smooth at all. Note also that it is not the rows of X that are smoothed, but the regression coefficients. Li and Marx (2008) proposed signal sharpening to enhance external prediction by incorporating PLS weights. An extensive review of functional regression was presented by Morris (2015). The standard PSR model implicitly assumes the identity link function. However, it can be bent through µ = f(Xβ) = f(Ua), where f(·) is unknown. We call this model single-index signal regression (SISR), which is closely related to projection pursuit (Eilers et al., 2009). To estimate f, a second B-spline basis and corresponding coefficients are introduced. The domain is that of Ua, and a has to be standardized (e.g. mean zero and variance 1) to make the model identifiable. For given coefficients, the derivative of f(Ua) can be computed and inserted in a Taylor expansion. Using that, a an...

1 | Reliable single chip genotyping with semiparametric log-concave mixtures. Plos One,
- Rippe, Meulman, et al.
- 2012
Citation Context ... sum of absolute values (the L1 norm) of first order differences allows jumps (Eilers and de Menezes, 2005) between neighbouring coefficients, making it suitable for piecewise constant smoothing. This norm is a natural choice when combined with an L1 norm on the residuals; standard linear programming software can be used. See also Section 5 for quantile smoothing. • The jumps that are obtained with the L1 norm are not really “crisp,” but slightly rounded. The reason is that the L1 norm selects and shrinks. Much better results are obtained with the L0 norm, the number of non-zero coefficients (Rippe et al., 2012b). Although a non-convex objective function results, in practice it can be optimized reliably and quickly by an iteratively updated quadratic penalty. Other types of penalties can be used to enforce shape constraints. An example is a monotonically increasing curve fit (Bollaerts et al., 2006). A second, asymmetric, penalty κ ∑_j v_j(∆a_j)² is introduced, with v_j = 1 when ∆a_j < 0 and v_j = 0 otherwise. The value of κ regulates the influence of the penalty. Iterative computation is needed, as one needs to know v to do the smoothing and then to know the solution to determine (update) v. In pract...

1 | Visualization of genomic changes by segmented smoothing using an L0 penalty.
- Rippe, Meulman, et al.
- 2012
Citation Context ... sum of absolute values (the L1 norm) of first order differences allows jumps (Eilers and de Menezes, 2005) between neighbouring coefficients, making it suitable for piecewise constant smoothing. This norm is a natural choice when combined with an L1 norm on the residuals; standard linear programming software can be used. See also Section 5 for quantile smoothing. • The jumps that are obtained with the L1 norm are not really “crisp,” but slightly rounded. The reason is that the L1 norm selects and shrinks. Much better results are obtained with the L0 norm, the number of non-zero coefficients (Rippe et al., 2012b). Although a non-convex objective function results, in practice it can be optimized reliably and quickly by an iteratively updated quadratic penalty. Other types of penalties can be used to enforce shape constraints. An example is a monotonically increasing curve fit (Bollaerts et al., 2006). A second, asymmetric, penalty κ ∑_j v_j(∆a_j)² is introduced, with v_j = 1 when ∆a_j < 0 and v_j = 0 otherwise. The value of κ regulates the influence of the penalty. Iterative computation is needed, as one needs to know v to do the smoothing and then to know the solution to determine (update) v. In pract...

1 |
Efficient estimation of smooth distributions from coarsely grouped data.
- Rizzi, Gampe, et al.
- 2015
Citation Context ...ers (2013) proposed a location-scale model for non-crossing expectile curves. When analysing counts with a generalized linear model, often the Poisson distribution is assumed, with µ = exp(η) for the expected values. When counts are grouped or aggregated, the composite link model (CLM) of Thompson and Baker (1981) is more appropriate. It states that µ = C exp(η), where the matrix C encodes the aggregation or mixing pattern. In the penalized CLM, a smooth structure for η is modelled with P-splines (Eilers, 2007). It is a powerful model for grouped counts (Lambert and Eilers, 2009; Lambert, 2011; Rizzi et al., 2015), but it has also found application in misclassification and digit preference (Camarda et al., 2008; Azmon et al., 2014). de Rooi et al. (2014) used it to remove artifacts in X-ray diffraction scans. 6. Generalized additive models The generalized additive model (GAM) constructs the linear predictor as a sum of smooth terms, each based on a different covariate (Hastie and Tibshirani, 1990). The model is η = ∑_j f_j(x_j); it can be interpreted as a multidimensional smoother without interactions. The GAM with P-splines, or P-GAM, was proposed by Marx and Eilers (1998). We illustrate the main idea ...
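Equation (13)'s trick of merging two smooth terms into one regression is a one-liner in matrix code: concatenate the marginal bases column-wise and stack the coefficient vectors. Random matrices stand in for the B-spline bases B1 and B2 evaluated at x1 and x2:

```python
import numpy as np

# P-GAM construction: eta = f1(x1) + f2(x2) = [B1 | B2] [a1; a2] = B a.
# Each coefficient block keeps its own difference penalty in the fit.
rng = np.random.default_rng(3)
n, k1, k2 = 40, 6, 5
B1 = rng.standard_normal((n, k1))          # stand-in basis for f1(x1)
B2 = rng.standard_normal((n, k2))          # stand-in basis for f2(x2)
B = np.hstack([B1, B2])                    # combined basis [B1 | B2]
a1 = rng.standard_normal(k1)
a2 = rng.standard_normal(k2)
a = np.concatenate([a1, a2])               # chained coefficients
eta = B @ a
print(np.allclose(eta, B1 @ a1 + B2 @ a2))  # True: back to a sum of smooths
```

After this concatenation we are in a standard penalized regression setting, with one block-diagonal penalty (λ1 on a1, λ2 on a2).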

1 | Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. - Rodriguez-Alvarez, Lee, et al. - 2015 |