

Is k-fold cross-validation worth teaching?

  • 12 May 2026 7:06 AM
    Reply # 13630614 on 13628615

The benefits of k-fold cross-validation are surely 1) that it can be used to account for the selection effects of the strategy used to select the model and 2) that it uses all the data.  Obviously, it does not account for any difference between the data used to train the model and the data to which it will be applied.  For that a training/test approach is needed, or, if one is being very careful, a training/validation/test approach.  These sorts of approaches are crucial if AI is used in any substantial way, e.g., to select a model.
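
    For concreteness, here is a minimal sketch of the training/validation/test idea, assuming Python with scikit-learn and a synthetic dataset; the split fractions and the model are only illustrative, not a prescription:

        # Hypothetical sketch: 60% train / 20% validation / 20% test.
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=1000, random_state=0)
        X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
        X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        print(model.score(X_val, y_val))    # consulted while selecting/tuning the model
        print(model.score(X_test, y_test))  # looked at once, for the model finally chosen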

  • 11 May 2026 5:16 PM
    Reply # 13630323 on 13628615

    Hi Chris,

That's a characteristically interesting question.  I think that your claim is correct: k-fold cross-validation estimates the uncertainty that arises from the modeling strategy.

    But I don't see how you, as an avowed hard-core Frequentist, could fail to find that attractive!  I've always been attracted to the idea of paying an honest uncertainty price for an honest modeling effort.  I recall that Paul Kabaila has done considerable elegant work on confidence intervals with coverage that allows for model selection.  

    I think it's also an appealing way to get the students to think about what they are doing.

    What would you do instead, if anything?

    Andrew 

  • 6 May 2026 5:23 PM
    Message # 13628615

    I have been interrogating AI about k-fold cross-validation for a course I am writing. I thought I understood it before, but I understand it better now. AI is really brilliant for this kind of thing. It is like talking to an enthusiastic RA who is smart and knowledgeable, makes some mistakes, but always sucks up to you because you hold the research grant.  ;)

Anyway, it seems to me that cross-validation is b-s, at least in any application I can think of. So I am intending not to teach it, or at least to give it only a couple of slides and say not to bother.

    Hear me out.

CV gives you the accuracy of the modelling strategy, not of the model you actually use. It estimates accuracy averaged over the different possible models fitted to different random subsets of the data. Then, at the end, you apply the modelling strategy to all the data and claim that the CV estimate describes the result. But it doesn't describe that model; it describes the modelling strategy.
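
    In code, the distinction looks roughly like this (a sketch assuming Python with scikit-learn and a toy dataset; the model and the tuning grid are just placeholders):

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV, cross_val_score

        X, y = make_classification(n_samples=500, random_state=0)

        # The "modelling strategy": fit a logistic regression, choosing C by an inner search.
        strategy = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=5)

        # k-fold CV scores the strategy: each fold may pick a different C, so this number
        # is an average over models that might have been, not the model you will deploy.
        strategy_score = cross_val_score(strategy, X, y, cv=5).mean()

        # The model actually used is the strategy refit on ALL the data; the CV estimate
        # above describes the procedure that produced it, not this particular fit.
        final_model = strategy.fit(X, y)
        print(strategy_score, final_model.best_params_)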

Now I am a pretty hard-core frequentist. But in this case, I want to estimate future prediction accuracy conditional on all the (training) data that led to the model. I do not want to average over models that might have been!

It's like estimating my life outcome by averaging over the different decisions I could have made over the last 68 years.

That's why Kaggle just uses a partition: leaving some data out and judging the winner on that held-out test data.

    Help me out here! Have I completely missed the point? Am I becoming a closet Bayesian in my old age?!

It could happen! Fred Hoyle became a Catholic, after all....

