The spline-based models are widely used in practice to estimate the term structure of interest rates from a set of observed coupon-bond prices. The most popular method can be traced back to McCulloch (1971). Assuming that the price of a bond is equal to the present value of its future coupon payments and redemption, cash flows are regressed on a set of basis functions to estimate discount functions. Once the discount function is estimated, the zero-coupon yield and the forward rate can be obtained by transformations of the discount function. Though this method was followed by a lot of researchers, some serious drawbacks have been reported. The most important problem is the instability of estimated yield curves. As is widely known, the discount function δ(t), the zero coupon yield η(t) and the instantaneous forward rate f(t) are closely tied with one another by explicit relationships. McCulloch's method gives approximated discount function first, so the zero coupon yield curve and the forward rate curve can be derived once δ(t) is estimated. The problem is, however, seemingly reasonable estimate of the discount function does not always lead to acceptable shapes of yield curves, especially for the forward rate curve. Some researchers are concerned with the choice of basis functions when defining a spline function, while others question how to place knots efficiently. The choice of basis functions and/or knot locations undoubtedly affects the estimation results. However, the present article focuses on a different point. It is considered here that instability of the estimated yield curves is caused by the ill-posed nature of the regression spline, rather than by the inappropriate choice of the basis function. By ill-posed it is meant that a model may be over-parameterized compared to the amount of sample information. Without a addressing this ill-posed nature specifically, any modification of the choice of basis functions, approximating functional forms, or knots placement may provide only minor improvements. Throughout this article, a penalty term is added to the original log-likelihood of a yield curve model, that is, a penalized likelihood approach is adopted for this treatment. In this sense, the work of Fisher, Nychka and Zervos (1995) is the most closely related and influential to this study. Those authors fitted smoothing splines (with B-splines bases) instead of regression splines. Moreover, Fisher, Nychka and Zervos (1995) fitted smoothing spline to the zero coupon yield and the forward rate as well as to the discount function. Their simulation results suggest that the best way to estimate yield curves is to place spline bases on the forward rate curve. However, splining the forward rate or the zero coupon yield is not linear operation, hence the use of GCV or the effective number of parameters in model selection lacks its theoretical foundation. This paper proposes a penalized likelihood approach accompanied by generalized information criteria (GIC) that determine the desired degree of smoothness of yield curves in a data-dependent way. GIC, proposed by Konishi and Kitagawa (1996), is an extension of AIC, Akaike Information Criterion. Originally AIC is proposed on the assumption that the models to be compared are estimated by the method of maximum likelihood. GIC is extended to the cases where the models are not necessarily estimated by ML. Model selection among penalized (nonlinear) regressions comes within the range of GIC. Our approach is theoretically valid even if the regression functional is nonlinear with respect to the unknown coefficients of basis functions, of which typical case is 'splining the forward rate' or 'splining the zero coupon yield' case. As will be shown in Section 2, these cases are reduced to the problems of nonlinear regression spline. The derived GICs enable us to compare the models with various choices of basis functions under different regression functional forms in a unified manner. In addition, the number of basis function can be chosen based in minimum GIC method. Monte Carlo simulations reveal that choosing the appropriate number of bases by GIC reduces MSE rather than controlling a plenty of bases by a single smoothing parameter.