Complexity (hidden-units, weight-decay)
How do you choose the number of hidden units (e.g., a [12-6-1-6-12] network)?
How do you control the flexibility of a curve, ranging from almost linear components (under-fitting) to very complex curves (over-fitting)?
Neural network complexity can be controlled in several ways: the number of hidden nodes, the number of training iterations (early stopping), and others.
In nonlinear PCA, the most important parameter is the weight-decay coefficient.
Hidden units are a poor way to control complexity because they form a discrete scale: the difference between 2 and 3 nodes is large, and there is no value in between. It is best to use a reasonably large (or even too large) number of hidden units and to control complexity with weight-decay alone. If not specified by 'units_per_layer', the number of hidden units is set automatically to best fit the data dimension.
The number of hidden units provides the potential for non-linearity; weight-decay controls how much of that potential is used.
Testing several weight-decay values helps to detect and avoid over-fitting.
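The effect of weight-decay on an autoencoder-style nonlinear PCA network can be illustrated with a minimal sketch. This is not the toolbox implementation: it is a plain NumPy autoencoder with a [4-6-1-6-4] bottleneck architecture (a small stand-in for the [12-6-1-6-12] example) trained by gradient descent on synthetic curve data, sweeping the weight-decay coefficient. All data, layer sizes, and learning-rate values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples on a noisy 1-D curve embedded in 4 dimensions
# (illustrative stand-in for real 12-dimensional data).
t = rng.uniform(-1, 1, size=(100, 1))
X = np.hstack([t, t**2, np.sin(3 * t), -t**3]) + 0.05 * rng.normal(size=(100, 4))

def init_params(sizes, seed=0):
    """Random weights for a [d-h-1-h-d] autoencoder, e.g. sizes = [4, 6, 1, 6, 4]."""
    r = np.random.default_rng(seed)
    return [(r.normal(scale=0.3, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """tanh hidden layers, linear output layer."""
    A = X
    for i, (W, b) in enumerate(params):
        Z = A @ W + b
        A = Z if i == len(params) - 1 else np.tanh(Z)
    return A

def train(X, sizes, weight_decay, lr=0.05, epochs=2000, seed=0):
    """Gradient descent on mean squared reconstruction error + L2 (weight-decay) penalty."""
    params = init_params(sizes, seed)
    n = X.shape[0]
    for _ in range(epochs):
        # Forward pass, caching activations for backpropagation.
        acts = [X]
        for i, (W, b) in enumerate(params):
            Z = acts[-1] @ W + b
            acts.append(Z if i == len(params) - 1 else np.tanh(Z))
        # Backward pass: the weight-decay term adds 2*wd*W to each weight gradient.
        delta = 2 * (acts[-1] - X) / n
        for i in reversed(range(len(params))):
            W, b = params[i]
            gW = acts[i].T @ delta + 2 * weight_decay * W
            gb = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W.T) * (1 - acts[i] ** 2)
            params[i] = (W - lr * gW, b - lr * gb)
    return params

def mse(params, X):
    return float(np.mean((forward(params, X) - X) ** 2))

sizes = [4, 6, 1, 6, 4]       # generous hidden layers, as the text recommends
results = {}
for wd in (0.0, 0.01, 0.5):   # sweep weight-decay to see its effect
    results[wd] = mse(train(X, sizes, weight_decay=wd), X)
    print(f"weight_decay={wd:<5} reconstruction MSE={results[wd]:.4f}")
```

With a very large weight-decay the network is pushed toward near-zero weights (an almost linear, heavily under-fitted curve) and the reconstruction error rises; with no decay the generously sized network is free to fit (or over-fit) the data, which is why intermediate values should be compared on validation data.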
The inverse NLPCA model and hierarchical ordering also have an important influence on curve complexity.
See also: Validation of nonlinear PCA