This is a great description of something I've been thinking about in terms of concepts like regression to the mean, clustering and median filtering - the space of LLM output is much smaller than that of the input, precisely because the LLM works so hard to extract minimal-information patterns from its input.
This is a great description of something I've been thinking about in terms of concepts like regression to the mean, clustering and median filtering - the space of LLM output is much smaller than that of the input, precisely because the LLM works so hard to extract minimal-information patterns from its input.