docs/upstream-requests/lq-ai-streaming-inference-routing-log.md

A bug report explaining why the Receipts drawer and anonymization indicator are blank for real (streaming) chats: `_stream_openai_sse` writes the `inference_routing_log` row after yielding `data: [DONE]`, and the consumer closes the stream on `[DONE]`, cancelling the generator before the write lands. The fix moves the write before the `[DONE]` yield (everything it needs, namely usage, cost, correlation ids, and `anon_mapper`, is already in hand) and adds a test asserting one routing-log row per streaming completion. Status: awaiting implementation. Read this when shipping streaming chat or debugging missing receipts/routing logs.

Upstream bug for lq-ai: streamed turns never persist `inference_routing_log`

Found by: Donna (frontend), while building the P2c Receipts drawer and anonymization indicator.

Severity: High. Every chat turn created through a streaming client (the entire Donna UI streams) is missing its `inference_routing_log` row, so the receipts inference/error event and everything derived from it (tier, tokens, latency, `anonymization_applied`) never appear. Non-streaming (`stream:false`) turns are fine.

Evidence (reproduced live)

Two turns, same prompt, same stack:

    stream:true  chat → 0 rows in inference_routing_log   (receipts: 0 inference events)
    stream:false chat → 1 row                              (receipts: 1 inference event)

The streamed assistant message persists fine (token counts land in messages), but no routing-log row is written.

Root cause

gateway/app/api/inference.py, _stream_openai_sse (success path):

    yield b"data: [DONE]\n\n"          # line ~1210

    # Persist the routing-log row using the final chunk's usage block.
    usage = ...
    ...
    await log_writer.write(            # line ~1225 — runs AFTER [DONE]
        InferenceRoutingLogRow(...)
    )

The `await log_writer.write(...)` comes after the `yield b"data: [DONE]\n\n"`. The api-side consumer (`GatewayClient.chat_completion_stream` → `_iter_sse_chunks`) stops iterating when it sees `[DONE]` and closes the `async with client.stream(...)` context, which cancels this async generator before the write executes. So the row is never persisted.
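
The ordering problem can be reproduced outside the gateway with plain `asyncio`. The sketch below is a toy model, not the gateway code: `write_row`, `stream_write_after_done`, and `consume` are hypothetical stand-ins, and the consumer closes the generator directly with `aclose()`, whereas in the real stack the close presumably arrives via the HTTP stream being torn down. The effect at the suspended `yield` is the same: nothing after it runs.

```python
import asyncio

rows = []  # hypothetical stand-in for the inference_routing_log table


async def write_row():
    """Hypothetical stand-in for log_writer.write(...)."""
    await asyncio.sleep(0)
    rows.append("routing-log row")


async def stream_write_after_done():
    """Toy generator with the same ordering as the buggy success path."""
    yield b"data: chunk\n\n"
    yield b"data: [DONE]\n\n"
    await write_row()  # not reached when the consumer stops at [DONE]


async def consume(gen):
    """Mimics a consumer that stops at [DONE] and then closes the stream."""
    try:
        async for chunk in gen:
            if chunk.strip() == b"data: [DONE]":
                break
    finally:
        # Closing raises GeneratorExit at the suspended yield, so the
        # generator exits without running the code that follows it.
        await gen.aclose()


def run_demo():
    rows.clear()
    asyncio.run(consume(stream_write_after_done()))
    return len(rows)


print(run_demo())  # prints 0: the write after [DONE] never ran
```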

Note the failure path in the same function (~line 1181) calls _write_failure(...) before its yield b"data: [DONE]\n\n" — which is why refused/error turns do log. Only the success path is affected.

Fix

Move the success-path routing-log write to before yield b"data: [DONE]\n\n". Everything it needs (last_chunk.usage, cost, correlation ids, anon_mapper) is already available after the async for loop and after the tail-flush terminal chunk is yielded. Sketch:

    # (tail-flush terminal chunk yielded here, as today)

    # Persist the routing-log row BEFORE signalling end-of-stream, so the
    # consumer closing on [DONE] can't cancel us mid-write.
    usage = (last_chunk.usage if last_chunk is not None else None) or None
    cost = ...
    chat_id, message_id = _correlation_ids(chat_request)
    await log_writer.write(InferenceRoutingLogRow(... anonymization_applied=anon_mapper is not None ...))

    yield b"data: [DONE]\n\n"

(Alternatively, schedule the write as a background task that survives client disconnect — e.g. asyncio.create_task(...) or a FastAPI BackgroundTask — but writing before the [DONE] yield is simpler and deterministic, and matches what the failure path already does.)

Test

Add a gateway/api integration test asserting that a streaming chat completion persists exactly one inference_routing_log row (same invariant the non-streaming path already satisfies). Today only the non-streaming path is covered, which is why this regressed silently.
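
The shape of that invariant can be sketched as follows. This is not the integration test itself: a real test would call the gateway's streaming endpoint and query the table, whereas `FakeLogWriter`, `stream_fixed`, and `consume_until_done` below are hypothetical stand-ins that only model the fixed ordering (write, then `[DONE]`) and a consumer that closes on `[DONE]`.

```python
import asyncio


class FakeLogWriter:
    """Hypothetical in-memory stand-in for the gateway's log writer."""

    def __init__(self):
        self.rows = []

    async def write(self, row):
        await asyncio.sleep(0)
        self.rows.append(row)


async def stream_fixed(log_writer):
    """Toy generator with the write moved before the [DONE] yield."""
    yield b'data: {"choices": []}\n\n'
    await log_writer.write({"anonymization_applied": False})
    yield b"data: [DONE]\n\n"


async def consume_until_done(gen):
    """Mimics a consumer that stops at [DONE] and then closes the stream."""
    try:
        async for chunk in gen:
            if chunk.strip() == b"data: [DONE]":
                break
    finally:
        await gen.aclose()


def test_streaming_completion_persists_one_row():
    writer = FakeLogWriter()
    asyncio.run(consume_until_done(stream_fixed(writer)))
    # Same invariant as the non-streaming path: exactly one row per completion.
    assert len(writer.rows) == 1


test_streaming_completion_persists_one_row()
```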

Why it matters to Donna (M2 transparency)

The Receipts drawer's provenance timeline and the per-message anonymization indicator both read inference_routing_log (via GET /chats/{id}/receipts). Until this is fixed, those surfaces are blank for any chat created in the real (streaming) UI — the feature only demonstrably works on stream:false/API-seeded chats. No Donna change is needed once the row is written for streamed turns; the surfaces light up automatically.

When it's done

Report the merged SHA. Donna will bump the vendor/lq-ai pin, regen types (likely no type diff), and the drawer/indicator will populate for normal streamed chats.
