2026年首次公开!Qwen3工具调用token-level tracing全链路图谱:用vLLM `logprobs` 可视化12个关键决策节点(含Jupyter可视化工具)

首次公开!Qwen3工具调用token-level tracing全链路图谱:用vLLM `logprobs` 可视化12个关键决策节点(含Jupyter可视化工具)Qwen3 工具调用与 token level tracing 一场从黑箱到认知基础设施的演进 在智能体 Agent 系统日益复杂的今天 一个看似简单的 查询北京天气 请求背后 可能隐藏着跨越数十个 token 横跨 prefill 与 decode 阶段 牵涉工具协议解析与安全策略拦截的语义坍塌链 传统 LLM 可观测性方案 依赖 output text 级日志 基于终态响应的指标统计

大家好,我是讯享网,很高兴认识大家。这里提供最前沿的Ai技术和互联网信息。

# Qwen3工具调用与token-level tracing:一场从黑箱到认知基础设施的演进

在智能体(Agent)系统日益复杂的今天,一个看似简单的“查询北京天气”请求背后,可能隐藏着跨越数十个token、横跨prefill与decode阶段、牵涉工具协议解析与安全策略拦截的语义坍塌链。传统LLM可观测性方案——依赖output_text级日志、基于终态响应的指标统计、或粗粒度的latency/throughput监控——早已无法应对这种深度耦合、多阶段跃迁、强上下文依赖的决策过程。当Qwen3-72B以每秒18次工具调用的节奏运行于生产环境时,“失败”不再是终点,而是一个亟待解构的起点;每一次<|tool_start|>的生成、每一个JSON引号的闭合、甚至安全模块对某个token的强制替换,都携带了可量化的不确定性证据——那便是logprob。

这正是token-level tracing的价值所在:它不是给模型加装一堆仪表盘,而是将整个推理过程重构为一张可微分、可建模、可干预的认知事件图谱。你不再问“模型输出了什么”,而是问“模型在第1246个token位置,为什么对}这个字符的置信度骤降2.59个标准差?它是在对抗一个错误的schema约束,还是在修复一段失衡的引号配对?”这种提问方式的转变,标志着LLM工程正从经验驱动走向认知驱动。


从“能力开关”到“语义决策链”:工具调用的本质跃迁

回望早期的Agent系统,工具调用更像一个二进制开关:要么调用,要么不调。Prompt中写一句“如果需要查天气,请调用weather_api”,模型便在内部维护一个布尔状态。这种方式简单,但脆弱。它无法解释为何模型在{"name": "wea"之后,突然跳转至<|eot_id|>而非继续生成ther_api;也无法说明为何"args": {"city": "Beijing"}}中那个缺失的},会引发下游服务端长达数秒的JSON解析超时。

Qwen3的突破在于,它将工具调用彻底语义化结构化<|tool_start|>不是一个魔法标记,而是模型启动一个全新推理子程序的信号;<|tool_args|>不是语法糖,而是触发一个针对JSON Schema的有限状态机(FSM)的指令;而<|tool_end|>则标志着该子程序的优雅退出。在这个范式下,每一次token生成,都是模型在自然语言与结构化协议之间的一次精确语义跃迁。这种跃迁不是凭空发生的,它被严格锚定在模型内部的概率空间里——而logprob,就是这个空间里最忠实的刻度尺。

因此,token-level tracing的核心价值,远不止于故障归因。它是一把钥匙,开启了三重工程可能性:

  1. 可观测性(Observability):我们终于能“看见”模型的思考路径。不再是猜测“它为什么错了”,而是直接定位到ARG.JSON.SCHEMA_ALIGN节点上那个logprob为-6.49的}字符,并确认其违反了"unit"字段必须为["c", "f"]的enum约束。
  2. 可建模性(Modelability):离散的logprob序列可以被升维为带权重的DAG。TOOL.PARSE.NAME_START → ARG.JSON.SCHEMA_ALIGN → TOOL.EXEC.FAIL这条高频失败路径,其0.94的转移概率,本身就是对模型行为最真实的建模。这个模型不依赖权重,只依赖数据。
  3. 可干预性(Intervenability):一旦诊断完成,干预就变得精准而高效。当PRE_TOOL.TRIGGER.CONFIDENCE_DROP被检测到,系统可以毫秒级地注入一条few-shot示例,而不是让整个请求失败后重试。这种“在决策流中动态注入”的能力,是传统监控体系望尘莫及的。

这正是Qwen3构建高可靠Agent的技术基座:它不追求模型更大、参数更多,而是追求推理过程更透明、决策链条更健壮、干预手段更精细。这是一种面向生产环境的务实哲学。


解构logprob:超越“自信程度”的数学本质

当我们说“这个token的logprob很低”,我们常将其直觉理解为“模型对它没信心”。这种理解虽然直观,却过于浅层,甚至具有误导性。要真正驾驭token-level tracing,我们必须潜入其数学本质。

从信息论角度看,logprob L = log(P(token | context)) 是交叉熵损失 H(y, p) 关于模型参数 θ 的梯度信号在token维度上的投影。设真实标签为 y,当前logits为 z,则交叉熵损失 C = -log(softmax(z)_y)。根据链式法则,其关于logits z_y 的偏导数为 ∂C/∂z_y = softmax(z)_y - 1。而 log(softmax(z)_y) 正是我们观测到的logprob L。这意味着:

> logprob越低(越负),模型在该token上的预测误差越大,对应梯度幅值越高,系统越“紧张”。

这是一个颠覆性的视角。logprob突变(如从-0.5骤降至-4.2)并非单纯的“置信度下降”,而是一次局部优化困境的显性爆发。它可能是注意力头之间的剧烈冲突,是位置编码(position embedding)在长上下文下的尺度漂移,也可能是KV cache中某一块内存被意外污染。

这一洞察在Qwen3的实践中得到了反复验证。一个典型案例是<|tool_start|>后首个工具名token的logprob异常偏低。深入分析发现,其根源并非模型本身的能力缺陷,而是prefill阶段加载的tool description embedding与decode阶段所用的position embedding,在数值尺度上存在不匹配。这种细微的工程偏差,在终态响应层面完全不可见,却会在token级logprob上留下清晰的指纹——一个深达-4.2的“谷底”。

下表总结了不同logprob区间所映射的深层语义与潜在根因,这些结论并非凭空臆想,而是基于对Qwen3-7B在10万条真实trace数据的核密度估计(KDE)与异常聚类所得:

logprob区间 表面语义 深层语义解释 典型场景 潜在根因
> -0.1 高度确定 模型处于“舒适区”,预测路径稳定 结尾、明确标点 position embedding对齐良好,KV cache纯净无污染
[-0.1, -1.0] 健康竞争 多个语义相近的候选token形成有效竞争,需后续token消歧 工具名首字母w vs g 相似工具共现,模型正在学习精细化区分
[-1.0, -2.5] 中度困惑 模型感知到上下文存在模糊性,需要更强的语法或语义线索 JSON key后引号缺失预警 tokenizer的quote规则与模型训练时的分词逻辑不一致
< -2.5 严重决策失败 系统级异常,模型已丧失对该位置的基本控制力 `
"4847.0" http_request_duration_highr_seconds_bucket{le="0.1"}
"5046.0" http_request_duration_highr_seconds_bucket{le="0.05"}
"4859.0" http_request_duration_highr_seconds_bucket{le="0.5"}
"5934.0" http_request_duration_highr_seconds_bucket{le="0.025"}
"4853.0" http_request_duration_highr_seconds_bucket{le="0.25"}
"5279.0" http_request_duration_highr_seconds_bucket{le="0.075"}
"4866.0" http_request_duration_highr_seconds_bucket{le="0.75"}
"6667.0" http_request_duration_highr_seconds_bucket{le="1.0"}
"7415.0" http_request_duration_highr_seconds_bucket{le="1.5"}
"8357.0" http_request_duration_highr_seconds_bucket{le="2.0"}
"9227.0" http_request_duration_highr_seconds_bucket{le="2.5"}
"10121.0" http_request_duration_highr_seconds_bucket{le="3.0"}
"10998.0" http_request_duration_highr_seconds_bucket{le="3.5"}
"11729.0" http_request_duration_highr_seconds_bucket{le="4.0"}
"12385.0" http_request_duration_highr_seconds_bucket{le="4.5"}
"12882.0" http_request_duration_highr_seconds_bucket{le="5.0"}
"13234.0" http_request_duration_highr_seconds_bucket{le="7.5"}
"14374.0" http_request_duration_highr_seconds_bucket{le="10.0"}
"15581.0" http_request_duration_highr_seconds_bucket{le="30.0"}
"25709.0" http_request_duration_highr_seconds_bucket{le="60.0"}
"26209.0" http_request_duration_highr_seconds_bucket{le="+Inf"}
"26448.0" http_request_duration_highr_seconds_count
"26448.0" http_request_duration_highr_seconds_created
"1.72858e+09" http_request_duration_highr_seconds_sum
"." http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="0.1",method="POST"}
"227.0" http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="0.5",method="POST"}
"1115.0" http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="1.0",method="POST"}
"2596.0" http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="+Inf",method="POST"}
"21629.0" http_request_duration_seconds_bucket{handler="/v1/models",le="0.1",method="GET"}
"1.0" http_request_duration_seconds_bucket{handler="/v1/models",le="0.5",method="GET"}
"1.0" http_request_duration_seconds_bucket{handler="/v1/models",le="1.0",method="GET"}
"1.0" http_request_duration_seconds_bucket{handler="/v1/models",le="+Inf",method="GET"}
"1.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="GET"}
"4693.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="HEAD"}
"6.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="OPTIONS"}
"12.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="POST"}
"95.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="PROPFIND"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="PUT"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="SEARCH"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.1",method="TRACE"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="GET"}
"4693.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="HEAD"}
"6.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="OPTIONS"}
"12.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="POST"}
"95.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="PROPFIND"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="PUT"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="SEARCH"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="0.5",method="TRACE"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="GET"}
"4693.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="HEAD"}
"6.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="OPTIONS"}
"12.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="POST"}
"95.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="PROPFIND"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="PUT"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="SEARCH"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="1.0",method="TRACE"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="GET"}
"4693.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="HEAD"}
"6.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="OPTIONS"}
"12.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="POST"}
"95.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="PROPFIND"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="PUT"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="SEARCH"}
"3.0" http_request_duration_seconds_bucket{handler="none",le="+Inf",method="TRACE"}
"3.0" http_request_duration_seconds_count{handler="/v1/chat/completions",method="POST"}
"21629.0" http_request_duration_seconds_count{handler="/v1/models",method="GET"}
"1.0" http_request_duration_seconds_count{handler="none",method="GET"}
"4693.0" http_request_duration_seconds_count{handler="none",method="HEAD"}
"6.0" http_request_duration_seconds_count{handler="none",method="OPTIONS"}
"12.0" http_request_duration_seconds_count{handler="none",method="POST"}
"95.0" http_request_duration_seconds_count{handler="none",method="PROPFIND"}
"3.0" http_request_duration_seconds_count{handler="none",method="PUT"}
"3.0" http_request_duration_seconds_count{handler="none",method="SEARCH"}
"3.0" http_request_duration_seconds_count{handler="none",method="TRACE"}
"3.0" http_request_duration_seconds_created{handler="/v1/chat/completions",method="POST"}
"1.23778e+09" http_request_duration_seconds_created{handler="/v1/models",method="GET"}
"1.42406e+09" http_request_duration_seconds_created{handler="none",method="GET"}
"1.08707e+09" http_request_duration_seconds_created{handler="none",method="HEAD"}
"1.9915e+09" http_request_duration_seconds_created{handler="none",method="OPTIONS"}
"1.90425e+09" http_request_duration_seconds_created{handler="none",method="POST"}
"1.95128e+09" http_request_duration_seconds_created{handler="none",method="PROPFIND"}
"1.53226e+09" http_request_duration_seconds_created{handler="none",method="PUT"}
"1.65367e+09" http_request_duration_seconds_created{handler="none",method="SEARCH"}
"1.99503e+09" http_request_duration_seconds_created{handler="none",method="TRACE"}
"1.6383e+09" http_request_duration_seconds_sum{handler="/v1/chat/completions",method="POST"}
"." http_request_duration_seconds_sum{handler="/v1/models",method="GET"}
"0.00" http_request_duration_seconds_sum{handler="none",method="GET"}
"0.11272" http_request_duration_seconds_sum{handler="none",method="HEAD"}
"0.00054779" http_request_duration_seconds_sum{handler="none",method="OPTIONS"}
"0.00" http_request_duration_seconds_sum{handler="none",method="POST"}
"0.0" http_request_duration_seconds_sum{handler="none",method="PROPFIND"}
"0.0009606" http_request_duration_seconds_sum{handler="none",method="PUT"}
"0.000" http_request_duration_seconds_sum{handler="none",method="SEARCH"}
"0.00030145" http_request_duration_seconds_sum{handler="none",method="TRACE"}
"0.00025256" http_request_size_bytes_count{handler="/v1/chat/completions"}
"21629.0" http_request_size_bytes_count{handler="/v1/models"}
"1.0" http_request_size_bytes_count{handler="none"}
"4818.0" http_request_size_bytes_created{handler="/v1/chat/completions"}
"1.23284e+09" http_request_size_bytes_created{handler="/v1/models"}
"1.4021e+09" http_request_size_bytes_created{handler="none"}
"1.04244e+09" http_request_size_bytes_sum{handler="/v1/chat/completions"}
".0" http_request_size_bytes_sum{handler="/v1/models"}
"0.0" http_request_size_bytes_sum{handler="none"}
"32625.0" http_requests_created{handler="/v1/chat/completions",method="POST",status="2xx"}
"1.23055e+09" http_requests_created{handler="/v1/chat/completions",method="POST",status="4xx"}
"1.33803e+09" http_requests_created{handler="/v1/models",method="GET",status="2xx"}
"1.3783e+09" http_requests_created{handler="none",method="GET",status="4xx"}
"1.01185e+09" http_requests_created{handler="none",method="HEAD",status="4xx"}
"1.98838e+09" http_requests_created{handler="none",method="OPTIONS",status="4xx"}
"1.90091e+09" http_requests_created{handler="none",method="POST",status="4xx"}
"1.94773e+09" http_requests_created{handler="none",method="PROPFIND",status="4xx"}
"1.52897e+09" http_requests_created{handler="none",method="PUT",status="4xx"}
"1.64842e+09" http_requests_created{handler="none",method="SEARCH",status="4xx"}
"1.99005e+09" http_requests_created{handler="none",method="TRACE",status="4xx"}
"1.63416e+09" http_requests_total{handler="/v1/chat/completions",method="POST",status="2xx"}
"21576.0" http_requests_total{handler="/v1/chat/completions",method="POST",status="4xx"}
"53.0" http_requests_total{handler="/v1/models",method="GET",status="2xx"}
"1.0" http_requests_total{handler="none",method="GET",status="4xx"}
"4693.0" http_requests_total{handler="none",method="HEAD",status="4xx"}
"6.0" http_requests_total{handler="none",method="OPTIONS",status="4xx"}
"12.0" http_requests_total{handler="none",method="POST",status="4xx"}
"95.0" http_requests_total{handler="none",method="PROPFIND",status="4xx"}
"3.0" http_requests_total{handler="none",method="PUT",status="4xx"}
"3.0" http_requests_total{handler="none",method="SEARCH",status="4xx"}
"3.0" http_requests_total{handler="none",method="TRACE",status="4xx"}
"3.0" http_response_size_bytes_count{handler="/v1/chat/completions"}
"21629.0" http_response_size_bytes_count{handler="/v1/models"}
"1.0" http_response_size_bytes_count{handler="none"}
"4818.0" http_response_size_bytes_created{handler="/v1/chat/completions"}
"1.23535e+09" http_response_size_bytes_created{handler="/v1/models"}
"1.40377e+09" http_response_size_bytes_created{handler="none"}
"1.0456e+09" http_response_size_bytes_sum{handler="/v1/chat/completions"}
"3.e+06" http_response_size_bytes_sum{handler="/v1/models"}
"538.0" http_response_size_bytes_sum{handler="none"}
".0" process_cpu_seconds_total
"2391.38" process_max_fds
"1.0e+09" process_open_fds
"48.0" process_resident_memory_bytes
"4.e+08" process_start_time_seconds
"1.e+09" process_virtual_memory_bytes
"1.e+010" python_gc_collections_total{generation="0"}
"5127.0" python_gc_collections_total{generation="1"}
"465.0" python_gc_collections_total{generation="2"}
"29.0" python_gc_objects_collected_total{generation="0"}
"8032.0" python_gc_objects_collected_total{generation="1"}
"1350.0" python_gc_objects_collected_total{generation="2"}
"994.0" python_gc_objects_uncollectable_total{generation="0"}
"0.0" python_gc_objects_uncollectable_total{generation="1"}
"0.0" python_gc_objects_uncollectable_total{generation="2"}
"0.0" python_info{implementation="CPython",major="3",minor="12",patchlevel="10",version="3.12.10"}
"1.0" vllm:cache_config_info{block_size="16",cache_dtype="auto",calculate_kv_scales="False",cpu_offload_gb="0",enable_prefix_caching="True",gpu_memory_utilization="0.95",is_attention_free="False",num_gpu_blocks_override="None",prefix_caching_hash_algo="builtin",sliding_window="None",swap_space="4",swap_space_bytes=""}
"1.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"491.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1088.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1947.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2563.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"3503.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4373.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5264.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"8367.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"10710.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"15758.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19765.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20837.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21082.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21252.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21336.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21514.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21560.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21563.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:e2e_request_latency_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:e2e_request_latency_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:e2e_request_latency_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.05371e+09" vllm:e2e_request_latency_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
".0" vllm:generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.02482e+09" vllm:generation_tokens_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:gpu_cache_usage_perc{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0066981" vllm:gpu_prefix_cache_hits_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0216e+09" vllm:gpu_prefix_cache_hits_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
".0" vllm:gpu_prefix_cache_queries_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.02012e+09" vllm:gpu_prefix_cache_queries_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="8.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="16.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="32.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="64.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="128.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="256.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="512.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="1024.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="2048.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="4096.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="8192.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="16384.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+06" vllm:iteration_tokens_total_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.03833e+09" vllm:iteration_tokens_total_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.e+07" vllm:num_preemptions_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0228e+09" vllm:num_preemptions_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:num_requests_running{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0" vllm:num_requests_waiting{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:prompt_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.02384e+09" vllm:prompt_tokens_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.e+07" vllm:request_decode_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1036.0" vllm:request_decode_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1763.0" vllm:request_decode_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2659.0" vllm:request_decode_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"3215.0" vllm:request_decode_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"3962.0" vllm:request_decode_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4894.0" vllm:request_decode_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5906.0" vllm:request_decode_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"8900.0" vllm:request_decode_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"11080.0" vllm:request_decode_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"16114.0" vllm:request_decode_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20016.0" vllm:request_decode_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20972.0" vllm:request_decode_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21180.0" vllm:request_decode_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21317.0" vllm:request_decode_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21386.0" vllm:request_decode_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21521.0" vllm:request_decode_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21560.0" vllm:request_decode_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21563.0" vllm:request_decode_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:request_decode_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_decode_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_decode_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_decode_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_decode_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.07164e+09" vllm:request_decode_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"." vllm:request_generation_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_generation_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"375.0" vllm:request_generation_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"680.0" vllm:request_generation_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1406.0" vllm:request_generation_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2613.0" vllm:request_generation_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4881.0" vllm:request_generation_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"8468.0" vllm:request_generation_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"10587.0" vllm:request_generation_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20580.0" vllm:request_generation_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21278.0" vllm:request_generation_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21486.0" vllm:request_generation_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21561.0" vllm:request_generation_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21563.0" vllm:request_generation_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:request_generation_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_generation_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.03566e+09" vllm:request_generation_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:request_inference_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"497.0" vllm:request_inference_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1113.0" vllm:request_inference_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1963.0" vllm:request_inference_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2577.0" vllm:request_inference_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"3516.0" vllm:request_inference_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4394.0" vllm:request_inference_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5300.0" vllm:request_inference_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"8410.0" vllm:request_inference_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"10739.0" vllm:request_inference_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"15812.0" vllm:request_inference_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19801.0" vllm:request_inference_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20861.0" vllm:request_inference_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21104.0" vllm:request_inference_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21270.0" vllm:request_inference_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21345.0" vllm:request_inference_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21515.0" vllm:request_inference_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21560.0" vllm:request_inference_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21563.0" vllm:request_inference_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:request_inference_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_inference_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_inference_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_inference_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_inference_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.05977e+09" vllm:request_inference_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"." vllm:request_max_num_generation_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"375.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"680.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1406.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2613.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4881.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"8468.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"10587.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20580.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21278.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21486.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21561.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21563.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_max_num_generation_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_max_num_generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0409e+09" vllm:request_max_num_generation_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:request_params_max_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_params_max_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0" vllm:request_params_max_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2.0" vllm:request_params_max_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4.0" vllm:request_params_max_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"7.0" vllm:request_params_max_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"136.0" vllm:request_params_max_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"422.0" vllm:request_params_max_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"439.0" vllm:request_params_max_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"924.0" vllm:request_params_max_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_max_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_max_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.04518e+09" vllm:request_params_max_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"6.e+08" vllm:request_params_n_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_params_n_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.04315e+09" vllm:request_params_n_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"17877.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"18517.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19080.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19493.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20099.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20423.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20583.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21096.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21308.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21468.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21525.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21539.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21565.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prefill_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0625e+09" vllm:request_prefill_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"11950." vllm:request_prompt_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_prompt_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_prompt_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_prompt_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_prompt_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_prompt_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"52.0" vllm:request_prompt_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"110.0" vllm:request_prompt_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1287.0" vllm:request_prompt_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"15764.0" vllm:request_prompt_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"18257.0" vllm:request_prompt_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19958.0" vllm:request_prompt_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21066.0" vllm:request_prompt_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21416.0" vllm:request_prompt_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21522.0" vllm:request_prompt_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prompt_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_prompt_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.03123e+09" vllm:request_prompt_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.e+07" vllm:request_queue_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21497.0" vllm:request_queue_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21519.0" vllm:request_queue_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21531.0" vllm:request_queue_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21541.0" vllm:request_queue_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21550.0" vllm:request_queue_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21568.0" vllm:request_queue_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21574.0" vllm:request_queue_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.05698e+09" vllm:request_queue_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1095.94" vllm:request_success_created{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.02787e+09" vllm:request_success_created{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.0273e+09" vllm:request_success_created{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.02663e+09" vllm:request_success_total{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:request_success_total{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"}
"16.0" vllm:request_success_total{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21558.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.2",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.4",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12097e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.05",model_name="qwen2.5-72b-instruct-gptq-int4"}
"4.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.15",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.025",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.075",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.096754e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5.12463e+06" vllm:time_per_output_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.05083e+09" vllm:time_per_output_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"." vllm:time_to_first_token_seconds_bucket{engine="0",le="0.001",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"}
"10761.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.02",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.04",model_name="qwen2.5-72b-instruct-gptq-int4"}
"22.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.005",model_name="qwen2.5-72b-instruct-gptq-int4"}
"0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"18421.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.06",model_name="qwen2.5-72b-instruct-gptq-int4"}
"2307.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.08",model_name="qwen2.5-72b-instruct-gptq-int4"}
"5692.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.25",model_name="qwen2.5-72b-instruct-gptq-int4"}
"17492.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"}
"18898.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"19414.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"20526.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21039.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21181.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21258.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21496.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21561.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="160.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="640.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2560.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"21576.0" vllm:time_to_first_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"1.04754e+09" vllm:time_to_first_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
"13699.4" ”可以根据这些数据推测出或者计算出“QPS”“最大运行数”"最大等待数""失败率""成功率""平均耗时(ms)"吗 “ class=“flex-1” data-v-5e667ebc>
小讯
上一篇 2026-04-26 21:43
下一篇 2026-04-26 21:41

相关推荐

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容,请联系我们,一经查实,本站将立刻删除。
如需转载请保留出处:https://51itzy.com/kjqy/280767.html