Principal Curves



Authors Trevor J. Hastie, Werner Stuetzle
Journal/Conference Name Journal of the American Statistical Association
Paper Category
Paper Abstract Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. They are nonparametric, and their shape is suggested by the data. The algorithm for constructing principal curves starts with some prior summary, such as the usual principal-component line. The curve in each successive iteration is a smooth or local average of the p-dimensional points, where the definition of local is based on the distance in arc length of the projections of the points onto the curve found in the previous iteration. In this article principal curves are defined, an algorithm for their construction is given, some theoretical results are presented, and the procedure is compared to other generalizations of principal components. Two applications illustrate the use of principal curves. The first describes how the principal-curve procedure was used to align the magnets of the Stanford linear collider. The collider uses about 950 magnets in a roughly circular arrangement to bend electron and positron beams and bring them to collision. After construction, it was found that some of the magnets had ended up significantly out of place. As a result, the beams had to be bent too sharply and could not be focused. The engineers realized that the magnets did not have to be moved to their originally planned locations, but rather to a sufficiently smooth arc through the middle of the existing positions. This arc was found using the principal-curve procedure. In the second application, two different assays for gold content in several samples of computer-chip waste appear to show some systematic differences that are blurred by measurement error. The classical approach using linear errors-in-variables regression can detect systematic linear differences but is not able to account for nonlinearities. When the first linear principal component is replaced with a principal curve, a local "bump" is revealed, and bootstrapping is used to verify its presence.
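The iteration described in the abstract (start from the principal-component line, then alternate between smoothing the points against arc length and re-projecting them onto the new curve) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code or the linked R implementation. The function names (`principal_curve`, `smooth_step`, `project_to_polyline`), the Gaussian-kernel local average, and the `bandwidth_frac`, `max_iter`, and `tol` parameters are all assumptions made for this example; the paper itself discusses scatterplot smoothers rather than this particular kernel.

```python
import numpy as np

def project_to_polyline(X, curve):
    """Project each row of X onto the polyline through the ordered vertices
    in `curve`. Returns arc-length positions and squared distances."""
    seg = curve[1:] - curve[:-1]                       # segment vectors
    raw = (seg ** 2).sum(axis=1)
    seg_len = np.sqrt(raw)
    denom = np.maximum(raw, 1e-12)                     # guard zero-length segments
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    diff = X[:, None, :] - curve[None, :-1, :]
    # Fractional position of each point along each segment, clipped to [0, 1].
    t = np.clip((diff * seg[None]).sum(axis=2) / denom, 0.0, 1.0)
    foot = curve[None, :-1, :] + t[:, :, None] * seg[None]
    d2 = ((X[:, None, :] - foot) ** 2).sum(axis=2)
    best = d2.argmin(axis=1)                           # closest segment per point
    rows = np.arange(len(X))
    lam = cum[best] + t[rows, best] * seg_len[best]
    return lam, d2[rows, best]

def smooth_step(X, lam, bandwidth):
    """Local average of the points, where 'local' is measured in lam.
    A Gaussian kernel is an assumption of this sketch."""
    order = np.argsort(lam)
    lam_s = lam[order]
    w = np.exp(-0.5 * ((lam_s[:, None] - lam_s[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ X[order]                                # vertices ordered by lam

def principal_curve(X, bandwidth_frac=0.1, max_iter=20, tol=1e-3):
    """Simplified principal-curve iteration (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Prior summary: the first principal-component line.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    score = Xc @ vt[0]
    lam = score - score.min()
    prev = ((Xc - np.outer(score, vt[0])) ** 2).sum()
    for _ in range(max_iter):
        h = bandwidth_frac * max(lam.max() - lam.min(), 1e-12)
        curve = smooth_step(X, lam, h)                 # smoothing step
        lam, d2 = project_to_polyline(X, curve)        # projection step
        cur = d2.sum()
        if abs(prev - cur) / max(prev, 1e-12) < tol:   # relative change in distance
            break
        prev = cur
    return curve, lam, d2

# Usage: noisy points along a half circle, which a straight line fits poorly.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, 200)
data = np.column_stack([np.cos(theta), np.sin(theta)])
data += rng.normal(scale=0.05, size=data.shape)
curve, lam, d2 = principal_curve(data)
print("mean squared distance to curve:", d2.mean())
```

The kernel bandwidth is set as a fraction of the current arc-length range, so the amount of smoothing stays comparable from one iteration to the next. A kernel average is biased toward the inside of a bend and near the ends of the curve, so a production implementation would likely use a different smoother.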
Date of publication 2010
Code Programming Language R
Comment
