Significance Testing

ffTRF supports two layers of permutation-based significance testing for held-out prediction scores:

TRF.permutation_test(...): fast score-level null using one fitted model
TRF.refit_permutation_test(...): slower refit null that retrains the full model on surrogate-aligned training data

This answers a different question than bootstrap confidence intervals:

bootstrap asks how stable the recovered kernel is across trials
permutation testing asks whether the held-out prediction score is larger than expected under a surrogate null alignment

Which Method Should I Use?

Use permutation_test(...) when:

you want the fastest practical null model
you are comfortable conditioning on the already fitted model
you mainly want to know whether the held-out score beats a surrogate score distribution for that fixed kernel

Use refit_permutation_test(...) when:

you want a stronger null that includes retraining
you used cross-validation and want that model-selection step inside the null
runtime is acceptable for many repeated refits

Quick Example

result = model.permutation_test(
    stimulus=test_stimulus,
    response=test_response,
    n_permutations=1000,
    surrogate="circular_shift",
    min_shift=0.5,
    average=False,
    seed=0,
    n_jobs=4,
)

The returned PermutationTestResult stores:

observed_score: the aligned held-out score
null_scores: the surrogate null distribution
p_value: permutation p-value
z_score: standardized score relative to the null mean and variance

Stronger Refit Null

result = model.refit_permutation_test(
    train_stimulus=train_stimulus,
    train_response=train_response,
    test_stimulus=test_stimulus,
    test_response=test_response,
    n_permutations=100,
    surrogate="circular_shift",
    min_shift=0.5,
    seed=0,
    n_jobs=4,
)

This method trains one model on the original training alignment and one fresh model for each surrogate alignment, then scores all of them on the same held-out test set.

By default it reuses the most recent training configuration stored on the estimator, but disables bootstrap estimation and progress output during the surrogate refits for speed.

Supported Surrogates

`circular_shift`

This rolls each evaluation target trial by a random non-zero offset.

Use it when:

you have one long continuous evaluation trial
trial lengths vary
you want to preserve within-trial autocorrelation and amplitude distribution

min_shift is given in seconds. Increase it when you want to avoid near-aligned shifts for slowly varying signals.

`trial_shuffle`

This permutes whole evaluation target trials.

Use it when:

you have at least two evaluation trials
all evaluation trials have the same sample count
trial identity is the natural exchangeable unit

This null is often easier to explain than circular shifts, but it is only valid when trial boundaries are meaningful and exchangeable.

For refit_permutation_test(...), trial_shuffle applies to the training target trials, so those training trials must also have equal sample counts.

How to Read the Result

lower p-values mean the observed score is more extreme than the surrogate null
the default tail="greater" is the natural choice for the built-in metrics, because larger values are better in ffTRF
average=False keeps one p-value per output channel
aggregated scores use the same average rules as score(...)

Practical Advice

Start with surrogate="circular_shift" for continuous data.
Use trial_shuffle when you have clean repeated trials of equal length.
Keep held-out evaluation data separate from the training data when you want a genuine generalization test.
Start with permutation_test(...) and move to refit_permutation_test(...) only when you need the stronger training-pipeline null.
Use bootstrap intervals and permutation tests together when you care about both kernel stability and predictive significance.