Significance Testing
ffTRF supports two layers of permutation-based significance testing for
held-out prediction scores:
TRF.permutation_test(...): fast score-level null using one fitted modelTRF.refit_permutation_test(...): slower refit null that retrains the full model on surrogate-aligned training data
This answers a different question than bootstrap confidence intervals:
- bootstrap asks how stable the recovered kernel is across trials
- permutation testing asks whether the held-out prediction score is larger than expected under a surrogate null alignment
Which Method Should I Use?
Use permutation_test(...) when:
- you want the fastest practical null model
- you are comfortable conditioning on the already fitted model
- you mainly want to know whether the held-out score beats a surrogate score distribution for that fixed kernel
Use refit_permutation_test(...) when:
- you want a stronger null that includes retraining
- you used cross-validation and want that model-selection step inside the null
- runtime is acceptable for many repeated refits
Quick Example
result = model.permutation_test(
stimulus=test_stimulus,
response=test_response,
n_permutations=1000,
surrogate="circular_shift",
min_shift=0.5,
average=False,
seed=0,
n_jobs=4,
)
The returned PermutationTestResult stores:
observed_score: the aligned held-out scorenull_scores: the surrogate null distributionp_value: permutation p-valuez_score: standardized score relative to the null mean and variance
Stronger Refit Null
result = model.refit_permutation_test(
train_stimulus=train_stimulus,
train_response=train_response,
test_stimulus=test_stimulus,
test_response=test_response,
n_permutations=100,
surrogate="circular_shift",
min_shift=0.5,
seed=0,
n_jobs=4,
)
This method trains one model on the original training alignment and one fresh model for each surrogate alignment, then scores all of them on the same held-out test set.
By default it reuses the most recent training configuration stored on the estimator, but disables bootstrap estimation and progress output during the surrogate refits for speed.
Supported Surrogates
circular_shift
This rolls each evaluation target trial by a random non-zero offset.
Use it when:
- you have one long continuous evaluation trial
- trial lengths vary
- you want to preserve within-trial autocorrelation and amplitude distribution
min_shift is given in seconds. Increase it when you want to avoid
near-aligned shifts for slowly varying signals.
trial_shuffle
This permutes whole evaluation target trials.
Use it when:
- you have at least two evaluation trials
- all evaluation trials have the same sample count
- trial identity is the natural exchangeable unit
This null is often easier to explain than circular shifts, but it is only valid when trial boundaries are meaningful and exchangeable.
For refit_permutation_test(...), trial_shuffle applies to the training
target trials, so those training trials must also have equal sample counts.
How to Read the Result
- lower p-values mean the observed score is more extreme than the surrogate null
- the default
tail="greater"is the natural choice for the built-in metrics, because larger values are better inffTRF average=Falsekeeps one p-value per output channel- aggregated scores use the same
averagerules asscore(...)
Practical Advice
- Start with
surrogate="circular_shift"for continuous data. - Use
trial_shufflewhen you have clean repeated trials of equal length. - Keep held-out evaluation data separate from the training data when you want a genuine generalization test.
- Start with
permutation_test(...)and move torefit_permutation_test(...)only when you need the stronger training-pipeline null. - Use bootstrap intervals and permutation tests together when you care about both kernel stability and predictive significance.