How to get the loadings (COEFF) as in linear PCA?

How to find the variables with the highest impact on a component?

NLPCA cannot give you a single loading (COEFF) vector. Instead, the contribution of each variable differs from point to point along the curve, depending on the local direction of the curve. For example, if your curve represents a time series, some features may have a high impact at early times and others a high contribution at later times: at each point on the curve you get a different COEFF vector.
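A small self-contained illustration, using a hypothetical hand-made curve rather than a trained NLPCA network: three features change along a time-like parameter s (feature 1 early, feature 2 late, feature 3 never). Approximating the tangent by central differences gives a different COEFF-like vector at each position on the curve.

```matlab
% Hypothetical toy curve (NOT output of nlpca): 3 features along a time-like parameter s
s = linspace(0, 1, 101);                 % 101 positions along the curve
h = s(2) - s(1);                         % spacing between neighbouring positions
f1 = 1 ./ (1 + exp(-40 * (s - 0.2)));    % feature 1 switches on early (around s = 0.2)
f2 = 1 ./ (1 + exp(-40 * (s - 0.8)));    % feature 2 switches on late (around s = 0.8)
f3 = 0.5 * ones(size(s));                % feature 3 stays constant
X  = [f1; f2; f3];                       % 3 x 101: each column is one position on the curve

% Tangent at every position: central differences inside, one-sided at both ends
dX = zeros(size(X));
dX(:, 2:end-1) = (X(:, 3:end) - X(:, 1:end-2)) / (2 * h);
dX(:, 1)       = (X(:, 2) - X(:, 1)) / h;
dX(:, end)     = (X(:, end) - X(:, end-1)) / h;

early = dX(:, 21);   % COEFF-like vector at s = 0.2: dominated by feature 1
late  = dX(:, 81);   % COEFF-like vector at s = 0.8: dominated by feature 2
disp([early late])
```

Feature 3 gets a zero entry everywhere, i.e. no impact at any time, while the importance of features 1 and 2 swaps between the early and the late position.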

Fig. 2.11 in http://pca.narod.ru/2MainGorbanKeglWunschZin.pdf shows an example discussion of the impact of features (metabolites) along a curved nonlinear component.

However, in the case of a single (k=1) time-related component, a time-point-dependent loading vector can be calculated as the gradient (tangent) of the curved nonlinear component.
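To make the tangent idea concrete, write the part of the network that maps a component value z back to data space as a function f (notation chosen here for illustration only). The loading vector at time t is then the derivative of f at the component value z_t of that time point; in the linear special case (data mean x̄, linear loading vector w) it reduces to the usual constant loading vector:

```latex
\begin{aligned}
\mathbf{x}(z) &= f(z), \qquad
\mathbf{v}(t) = \left.\frac{\mathrm{d}f}{\mathrm{d}z}\right|_{z = z_t} \\
\text{linear PCA: } f(z) &= \bar{\mathbf{x}} + z\,\mathbf{w}
\;\Longrightarrow\; \mathbf{v}(t) = \mathbf{w} \quad \text{for every } t
\end{aligned}
```

Only for a curved f does v(t) change with t, which is why NLPCA yields one loading vector per time point.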


How to get the loading/tangent vector for a selected time point?

In nonlinear PCA, loading-like values are time dependent. At different time steps, different variables (e.g. genes) have a strong impact due to increasing or decreasing activity. Instead of the single loading vector of standard linear PCA, each time point along the curve in nonlinear PCA has its own loading vector, given by the tangent to the time curve.

    1. Select your time point t of interest (e.g. "early stress response").

    2. Get the tangent vector dz of the curve at time t. The vector dz can be considered a loading vector: "0" means no impact at time t, while negative or positive values represent decreasing or increasing activity at time t.

How to get the direction of the curve at time t?

[pc, net] = nlpca(data, 1); % meaningful only for extracting a single (k=1) nonlinear component (time series)

pc=linspace(min(pc),max(pc),100); % define 100 points (positions) along the curve (first component)

[data,dz] = nlpca_get_data(net,pc); % get data and gradient/tangent/loadings dz for all component values 'pc'

t = 5; % select a (time) point of interest

pc_t = pc(:,t); % component value (PC score) at time t
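If the point of interest is known as a component value (PC score) rather than as an index, one option is to take the index of the closest of the 100 positions. The sketch below uses a made-up stand-in for `pc` so that it runs on its own; `target` is a hypothetical score of interest.

```matlab
% Stand-in for the 100 component values defined with linspace above (hypothetical range)
pc = linspace(-1.2, 0.8, 100);
target = 0.0;                     % hypothetical PC score of interest, e.g. "early stress response"
[~, t] = min(abs(pc - target));   % index t of the position closest to the target score
pc_t = pc(:, t);                  % component value actually used at index t
fprintf('t = %d, pc_t = %.4f\n', t, pc_t);
```

The resulting index t can then be used with dz(:,t) and data(:,t) exactly as in the steps above.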

Tangent (loadings) at time t

v = dz(:,t); % tangent/gradient vector v: the change over time of all original variables at time t

The entries of v with the largest absolute values identify the variables with the strongest dynamics (highest change) at the selected point. v can therefore be used to rank the variables by importance, i.e. by their impact, reaction, or response at this time (for example, to find the genes with the highest increase or decrease in activity at time t).
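A minimal sketch of that ranking, assuming v is a column vector as returned by dz(:,t). The values and variable names below are hypothetical placeholders; the optional rescaling to unit length is an added choice that puts v on the same scale as a column of linear PCA COEFF, which also has unit norm.

```matlab
% Hypothetical tangent vector (stands in for dz(:,t)) and hypothetical variable names
v = [0.05; -2.3; 0.8; 0; 1.9];
names = {'gene1', 'gene2', 'gene3', 'gene4', 'gene5'};

[~, order] = sort(abs(v), 'descend');   % strongest change first, regardless of sign
v_unit = v / norm(v);                   % optional: unit length, like a linear PCA COEFF column

% Print ranked variables; the sign tells increase (+) or decrease (-) at time t
for i = 1:numel(order)
    k = order(i);
    fprintf('%-6s %+7.3f\n', names{k}, v(k));
end
```

Sorting by absolute value keeps strongly decreasing variables (negative entries) at the top of the list, next to strongly increasing ones.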

Plot the selected point (to see the position p of time t on the curve)

nlpca_plot(net); % plot data and component curve

p = data(:,t) % get position in original data space

hold on;

plot(p(1,:),p(2,:),'kx','MarkerSize',20,'LineWidth',3);

hold off;

See also:

page 8 of http://www.matthias-scholz.de/scholz_circularPCA_BIRD2007.pdf

page 63, Fig 2.11D in http://pca.narod.ru/2MainGorbanKeglWunschZin.pdf