How to control the flexibility of a curve: from almost linear components (under-fitting) to very complex curves (over-fitting)?
Several ways are used to control neural network complexity: number of hidden nodes, number of iterations (early stopping), etc.
In nonlinear PCA the most important parameter is weight-decay.
Hidden units are not good for controlling complexity because their scale is discrete: going from 2 to 3 nodes is a big step, with no option in between. It is best to use a reasonably large (or even too large) number of hidden units and control complexity by weight-decay alone. If not specified by 'units_per_layer', the number of hidden units is set automatically to suit the data dimension.
Number of hidden units gives the potential of non-linearity and weight-decay is for controlling it!
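For orientation, weight-decay in its usual textbook form adds a penalty on the squared network weights to the reconstruction error. The symbols below (E for the reconstruction error, \nu for the weight-decay coefficient, w_i for the network weights) are chosen here for illustration; this is the generic form and not necessarily the exact expression used in the toolbox:

```latex
E_{\mathrm{total}} = E + \nu \sum_i w_i^2
```

With \nu = 0 the weights are unconstrained, while a large \nu pushes all weights toward zero regardless of how many hidden units are available.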
You can test different weight-decay values to check and avoid over-fitting.
A low or zero value means no restriction on complexity, which can lead to over-fitting.
A very high value (max 1) leads to under-fitting with the effect that we get only a linear component as in standard PCA.
The optimum lies somewhere in between; from experience, a good choice (and the default) is 0.01.
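Why a strong weight-decay ends in a linear component can be illustrated with a small sketch. It assumes tanh hidden units and looks at a single unit only; the function name nonlinearity_of_unit and the chosen weights are hypothetical, and nothing here calls the toolbox itself. Weight-decay shrinks the weights, and a tanh unit with a small input weight stays in its nearly linear range:

```python
import numpy as np

def nonlinearity_of_unit(w, x):
    """Relative deviation of tanh(w * x) from its best straight-line fit.

    Hypothetical helper for illustration; assumes a single tanh hidden unit.
    """
    y = np.tanh(w * x)
    # Least-squares straight line through the points (x, y)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    # RMS residual, scaled by the spread of the unit's output
    return np.sqrt(np.mean(residual ** 2)) / np.std(y)

# Inputs on a fixed interval; only the input weight w changes
x = np.linspace(-1.0, 1.0, 201)

for w in (0.1, 1.0, 5.0):
    dev = nonlinearity_of_unit(w, x)
    print(f"input weight {w:4.1f}: relative deviation from a line = {dev:.4f}")
```

The deviation grows with the size of the weight, so small weights (strong weight-decay) give an almost linear unit, and large weights (weak weight-decay) allow strongly curved output.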
Inverse NLPCA and hierarchical order also have important impacts on controlling curve complexity.
See Section 5.5.1 (page 51) of my PhD thesis.
See also: Validation of nonlinear PCA