Complexity (hidden-units, weight-decay)

FAQ - Frequently asked questions

How to choose the number of hidden units [12-6-1-6-12]?

How to control the flexibility of a curve: from almost linear components (under-fitting) to very complex curves (over-fitting)?

Neural network complexity can be controlled in several ways: by the number of hidden nodes, the number of iterations (early stopping), etc.

In nonlinear PCA the most important parameter is weight-decay.

Hidden units are not well suited for controlling complexity because they vary on a discrete scale: 2 versus 3 nodes is a big difference, and there is no option in between. It is best to use a reasonably large (or even too large) number of hidden units and to control complexity by weight-decay only. If not specified by 'units_per_layer', the hidden units are set automatically to best fit the data dimension.
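To make the discrete-scale argument concrete, the following Python sketch counts the adjustable parameters of a fully connected network from its layer sizes. It assumes that a specification like [12-6-1-6-12] lists the units of each layer from input to output and that every non-input unit has one bias. The function name count_parameters is made up for this illustration and is not part of the toolbox.

```python
def count_parameters(units_per_layer):
    """Count weights and biases of a fully connected feed-forward network.

    Assumption: units_per_layer lists layer sizes from input to output,
    and every non-input unit has one bias term.
    """
    total = 0
    for n_in, n_out in zip(units_per_layer, units_per_layer[1:]):
        total += n_in * n_out + n_out  # weight matrix plus biases
    return total


# One extra unit in each hidden layer changes the parameter count in one jump.
small = count_parameters([12, 6, 1, 6, 12])
large = count_parameters([12, 7, 1, 7, 12])
print(small, large, large - small)
```

There is no architecture between these two sizes, whereas weight-decay can be varied continuously.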

The number of hidden units gives the potential for non-linearity, and weight-decay is for controlling it!

You can test different weight-decay values to check and avoid over-fitting.
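Such a test can be organised as a simple sweep: fit the model once per candidate value and compare the errors on held-out data. The Python sketch below shows only the procedure. It uses a one-weight linear model with a squared-weight penalty as a toy stand-in for the network, and the data, candidate values and function names are all made up for the illustration. The selected value therefore says nothing about a suitable setting for the toolbox.

```python
def fit_slope(xs, ys, weight_decay):
    """Fit y = w * x by minimising sum((y - w*x)^2) + weight_decay * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + weight_decay)


def mean_squared_error(w, xs, ys):
    """Mean squared prediction error of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_weight_decay(candidates, train, validation):
    """Return the candidate with the lowest validation error and all scores."""
    scores = {}
    for wd in candidates:
        w = fit_slope(train[0], train[1], wd)
        scores[wd] = mean_squared_error(w, validation[0], validation[1])
    best = min(scores, key=scores.get)
    return best, scores


# Made-up toy data: noisy training samples, clean held-out samples (slope 1).
train = ([-1.0, -0.5, 0.5, 1.0], [-1.1, -0.4, 0.6, 1.0])
validation = ([-0.8, 0.2, 0.9], [-0.8, 0.2, 0.9])

best, scores = select_weight_decay([0.0, 0.001, 0.01, 0.1, 1.0], train, validation)
for wd, err in scores.items():
    print(f"weight_decay={wd:<6} validation error={err:.6f}")
print("selected:", best)
```

In this toy setting both ends of the range do worse on the held-out data than a value in between: too little penalty follows the noise in the training samples, too much shrinks the fit too far.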


A low or zero weight-decay value means no restriction of complexity, which can lead to over-fitting.

A very high value (max 1) leads to under-fitting, with the effect that we get only a linear component, as in standard PCA.

The optimum is somewhere in between. From experience, a good choice, and the default, is 0.01.
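One way to picture the linear limit at high weight-decay: a penalty on the weights pushes them towards zero, and a tanh unit with small input weights operates in its almost linear range. The Python sketch below assumes tanh hidden units and uses a made-up helper, relative_nonlinearity, to measure how far tanh(w*x) departs from the straight line w*x. It illustrates the general mechanism only and is not code from the toolbox.

```python
import math


def relative_nonlinearity(w, n_points=101):
    """Largest deviation of tanh(w*x) from the line w*x on [-1, 1], divided by |w|."""
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    return max(abs(math.tanh(w * x) - w * x) for x in xs) / abs(w)


# Small weights keep the unit close to linear, large weights bend it strongly.
for w in (0.1, 1.0, 3.0):
    print(f"w={w}: relative deviation from linear = {relative_nonlinearity(w):.4f}")
```

The smaller the weight, the closer the unit stays to a straight line, which is consistent with strong weight-decay producing only a linear component.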

Inverse NLPCA and hierarchical order also have important impacts on controlling curve complexity.

See Section 5.5.1 (page 51) of my PhD thesis:

See also: Validation of nonlinear PCA