I like Wittgenstein's ruler, which can be applied to IQ and SAT tests. Does the fact that LLMs score highly on standardized tests say more about LLMs or more about the tests?
"Unless you have confidence in the ruler's reliability, if you use a ruler to measure a table you may also be using the table to measure the ruler"
Thanks for bringing this up! I hadn't heard of this concept before.
Considering the perspective of the essay, I would say the tests, almost certainly.
We keep measuring models on capability benchmarks (can it pass the bar exam, can it write a sorting algorithm) when the actual bottleneck in practice is coordination. The model can write an entire Three.js application. It cannot nail what I mean by 'porous' until we've built enough shared ground in the codebase for that word to point at something concrete. I think no standardized test currently measures that.
> The model can write an entire Three.js application....
Once you are willing to question the referential basis of measurement, please go on to question the terms chosen as the basis of performance:
For example, saying a model "writes" the application.
It can also be said that the model "retrieves" the application.
The term write in context amplifies the term intelligence, becoming arbitrarily anthropomorphic and implying human creativity.
But what computational theory explains the machine's creativity? Is there any theory of human creativity?
This distinction is important by virtue of the principle of Occam's razor: If a question must be begged for an explanation, prefer the simplest question.
The term retrieval is grounded in computation ipso facto. To say the model retrieves avoids any need for a theory of creativity and its corresponding anthropomorphism.
By extension, any appearance of intelligence manifest by the model is understood truly as artificial. A taxonomic circuit is closed without need for a mysterious antecedent of creativity.
Assuming any intelligible theory of creativity can be formed, it must be regarded as significant in the entire history of thought, not sidelined as a semantic quirk in the field of AI.
I think this is exactly the kind of question the essay is trying to productively avoid.
Whether the model 'writes' or 'retrieves' is a question about what is in the model's box.
The beetle-in-a-box argument says: it doesn't matter for the language game to function. What matters is whether the coordination works. I can build a working particle system with a model regardless of whether its process is creation or retrieval. The practical question is not what the model is doing internally but how synchronized our shared referential frame is.
I think the 'write vs retrieve' debate is super interesting philosophically. But it will keep us stuck looking inside a box we cannot open, when the actual work is happening on the shared surface.