What does the LLM completion entail? Are you talking about a sequence of prompts with files / MCP servers? Could you share a bit more, because I have spent some time on this and have something that might be precisely what you are asking for...
I'm sure that if you ask Claude Code for exactly that, it will build what you want.
Tell it to create an API for LLM data ingestion, then integrate it into your software.
BTW, this is far from what an LLM observability tool would offer you. You seem a bit confused about what o11y is.
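For what it's worth, the ingestion API suggested above can be sketched with just the standard library; the `/completions` path and the record fields (`model`, `prompt`, `completion`) are my own assumptions about what such an endpoint would take, not anything the OP specified:

```python
# Minimal sketch of a completion-ingestion endpoint, stdlib only.
# Endpoint path and record schema are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECORDS = []  # in-memory store; a real service would use a database
REQUIRED = {"model", "prompt", "completion"}

class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        missing = REQUIRED - record.keys()
        if missing:
            self.send_error(422, "missing fields: %s" % sorted(missing))
            return
        RECORDS.append(record)
        body = json.dumps({"id": len(RECORDS) - 1}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet; real code should log
```

Running it is just `HTTPServer(("127.0.0.1", 8080), IngestHandler).serve_forever()`; your software then POSTs each completion as JSON.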
When I think of LLM / agent observability I think of some combination of OpenTelemetry and something like InfluxDB, but I don't think that's what you're asking for?
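To make that concrete, here's a dependency-free sketch of tracing an LLM call; I'm hand-rolling the span record instead of pulling in the OTel SDK, and the attribute names only loosely follow OTel's `gen_ai` semantic conventions, so treat it as the shape of the data, not an implementation:

```python
# Sketch of span-style tracing for an LLM call; a real setup would use
# the OpenTelemetry SDK and export to a backend such as InfluxDB.
import time
import uuid

SPANS = []  # stand-in for a span exporter

def traced_completion(model, prompt, call):
    """Wrap an LLM call `call(prompt) -> str`, recording a span-like dict."""
    span = {
        "trace_id": uuid.uuid4().hex,
        "name": "llm.completion",
        "gen_ai.request.model": model,  # names loosely follow OTel's
        "gen_ai.prompt": prompt,        # gen_ai semantic conventions
        "start": time.time(),
    }
    try:
        response = call(prompt)
        span["gen_ai.completion"] = response
        span["status"] = "ok"
        return response
    except Exception as exc:
        span["status"] = "error: %s" % exc
        raise
    finally:
        span["duration_s"] = time.time() - span["start"]
        SPANS.append(span)  # a real exporter would batch and ship these
```

You'd wrap every provider call this way, so each completion gets a trace id, latency, and prompt/response attributes you can query later.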
I am curious: what's the point of re-running these interactions in a UI?
Reproduction, I suppose. I would like the same things as the OP, too.
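A rough sketch of what that reproduction could look like, assuming you've already logged prompt/completion pairs somewhere; `replay` and the record shape are both made up for illustration:

```python
# Replay recorded prompts against the model and diff the fresh outputs
# against the recorded ones, to check whether behaviour reproduces.
def replay(records, call):
    """records: list of {'prompt': ..., 'completion': ...} dicts.
    call: prompt -> str. Returns the records whose output drifted."""
    drifted = []
    for rec in records:
        fresh = call(rec["prompt"])
        if fresh != rec["completion"]:
            drifted.append({
                "prompt": rec["prompt"],
                "recorded": rec["completion"],
                "fresh": fresh,
            })
    return drifted
```

With sampling temperature above zero the diff will be noisy, which is arguably the next commenter's point: exact-match scoring only gets you so far.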
LLM outputs are qualitative; they can't really be scored automatically, and prompt tweaks tend to multiply bugs: a change can solve one problem but introduce a new one. It's more practical to just do it manually.