# Qwen3工具调用与token-level tracing:一场从黑箱到认知基础设施的演进
在智能体(Agent)系统日益复杂的今天,一个看似简单的“查询北京天气”请求背后,可能隐藏着跨越数十个token、横跨prefill与decode阶段、牵涉工具协议解析与安全策略拦截的语义坍塌链。传统LLM可观测性方案——依赖output_text级日志、基于终态响应的指标统计、或粗粒度的latency/throughput监控——早已无法应对这种深度耦合、多阶段跃迁、强上下文依赖的决策过程。当Qwen3-72B以每秒18次工具调用的节奏运行于生产环境时,“失败”不再是终点,而是一个亟待解构的起点;每一次<|tool_start|>的生成、每一个JSON引号的闭合、甚至安全模块对某个token的强制替换,都携带了可量化的不确定性证据——那便是logprob。
这正是token-level tracing的价值所在:它不是给模型加装一堆仪表盘,而是将整个推理过程重构为一张可微分、可建模、可干预的认知事件图谱。你不再问“模型输出了什么”,而是问“模型在第1246个token位置,为什么对}这个字符的置信度骤降2.59个标准差?它是在对抗一个错误的schema约束,还是在修复一段失衡的引号配对?”这种提问方式的转变,标志着LLM工程正从经验驱动走向认知驱动。
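正文提到的“对}这个字符的置信度骤降2.59个标准差”这类判断,可以用一个极简的z-score异常检测来示意。以下是假设性草图,阈值与统计方式均为示例取值,并非Qwen3的实际实现(生产环境通常改用滑动窗口与稳健统计量):

```python
from statistics import mean, stdev

def find_logprob_dips(logprobs, z_threshold=2.0):
    """在token级logprob序列中定位置信度骤降点(z-score异常检测)。
    返回 (token索引, z分数) 列表;z分数表示该点低于序列均值多少个标准差。"""
    mu, sigma = mean(logprobs), stdev(logprobs)
    dips = []
    for i, lp in enumerate(logprobs):
        z = (mu - lp) / sigma  # logprob越低(越负), z越大
        if z > z_threshold:
            dips.append((i, round(z, 2)))
    return dips

# 模拟一段工具调用中的logprob轨迹:第5个token处出现"谷底"
trace = [-0.2, -0.4, -0.3, -0.5, -0.3, -4.2, -0.4, -0.2]
print(find_logprob_dips(trace))  # → [(5, 2.47)]
```

定位到骤降点之后,再结合该位置对应的协议阶段(工具名、参数、闭合符)做根因归类。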
## 从“能力开关”到“语义决策链”:工具调用的本质跃迁

回望早期的Agent系统,工具调用更像一个二进制开关:要么调用,要么不调。Prompt中写一句“如果需要查天气,请调用weather_api”,模型便在内部维护一个布尔状态。这种方式简单,但脆弱。它无法解释为何模型在{"name": "wea"之后,突然跳转至<|eot_id|>而非继续生成ther_api;也无法说明为何"args": {"city": "Beijing"}}中那个缺失的},会引发下游服务端长达数秒的JSON解析超时。
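上文提到的那个缺失的}会让下游服务端JSON解析超时。对这类截断,一个保守的启发式是按栈补齐未闭合的括号。以下仅为示意性草图(真实系统应结合schema与logprob证据决定是否修复,而非盲目补齐):

```python
import json

def repair_truncated_json(text):
    """对缺失闭合符的工具参数JSON做保守修复:按栈记录未闭合的{和[,
    在末尾按相反顺序补齐。字符串内部的括号会被正确跳过。"""
    stack = []
    in_string, escaped = False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    return text + "".join(reversed(stack))

broken = '{"name": "weather_api", "args": {"city": "Beijing"}'
print(json.loads(repair_truncated_json(broken)))
```

这种修复只解决语法层,语义层(比如补出来的结构是否符合工具schema)仍需单独校验。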
Qwen3的突破在于,它将工具调用彻底语义化与结构化。<|tool_start|>不是一个魔法标记,而是模型启动一个全新推理子程序的信号;<|tool_args|>不是语法糖,而是触发一个针对JSON Schema的有限状态机(FSM)的指令;而<|tool_end|>则标志着该子程序的优雅退出。在这个范式下,每一次token生成,都是模型在自然语言与结构化协议之间的一次精确语义跃迁。这种跃迁不是凭空发生的,它被严格锚定在模型内部的概率空间里——而logprob,就是这个空间里最忠实的刻度尺。
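上文所说的“推理子程序”可以用一个极简状态机来示意:协议token触发状态跃迁,普通token留在当前状态,得到的状态序列可与logprob序列逐位对齐。以下状态名与转移表均为说明性假设,并非Qwen3官方的解析器实现:

```python
# 假设性的最小FSM:跟踪工具协议token所处的解析阶段
TRANSITIONS = {
    ("IDLE", "<|tool_start|>"): "NAME",
    ("NAME", "<|tool_args|>"): "ARGS",
    ("ARGS", "<|tool_end|>"): "DONE",
}

def trace_states(tokens):
    """逐token推进FSM,返回每个token对应的状态,
    便于把logprob异常归因到"工具名阶段"还是"参数阶段"。"""
    state, states = "IDLE", []
    for tok in tokens:
        state = TRANSITIONS.get((state, tok), state)
        states.append(state)
    return states

tokens = ["<|tool_start|>", "weather_api", "<|tool_args|>",
          '{"city": "Beijing"}', "<|tool_end|>"]
print(trace_states(tokens))  # → ['NAME', 'NAME', 'ARGS', 'ARGS', 'DONE']
```

真实的schema约束FSM远比这复杂(要深入到JSON字符级),但对齐“状态 × logprob”这一思路是一致的。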
因此,token-level tracing的核心价值,远不止于故障归因。它是一把钥匙,开启了三重工程可能性:
- 可观测性(Observability):我们终于能“看见”模型的思考路径。不再是猜测“它为什么错了”,而是直接定位到ARG.JSON.SCHEMA_ALIGN节点上那个logprob为-6.49的}字符,并确认其违反了"unit"字段必须为["c", "f"]的enum约束。
- 可建模性(Modelability):离散的logprob序列可以被升维为带权重的DAG。TOOL.PARSE.NAME_START → ARG.JSON.SCHEMA_ALIGN → TOOL.EXEC.FAIL这条高频失败路径,其0.94的转移概率,本身就是对模型行为最真实的建模。这个模型不依赖权重,只依赖数据。
- 可干预性(Intervenability):一旦诊断完成,干预就变得精准而高效。当PRE_TOOL.TRIGGER.CONFIDENCE_DROP被检测到,系统可以毫秒级地注入一条few-shot示例,而不是让整个请求失败后重试。这种“在决策流中动态注入”的能力,是传统监控体系望尘莫及的。
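“0.94的转移概率”这类边权,可以直接从trace事件序列统计得到,无需任何模型权重。以下为一个假设性草图,事件名沿用正文示例:

```python
from collections import Counter, defaultdict

def transition_probs(traces):
    """从事件序列集合中统计节点间的转移概率,
    结果即带权DAG的边集:probs[src][dst] = P(dst | src)。"""
    counts = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

traces = [
    ["TOOL.PARSE.NAME_START", "ARG.JSON.SCHEMA_ALIGN", "TOOL.EXEC.FAIL"],
    ["TOOL.PARSE.NAME_START", "ARG.JSON.SCHEMA_ALIGN", "TOOL.EXEC.FAIL"],
    ["TOOL.PARSE.NAME_START", "ARG.JSON.SCHEMA_ALIGN", "TOOL.EXEC.OK"],
    ["TOOL.PARSE.NAME_START", "TOOL.EXEC.OK"],
]
probs = transition_probs(traces)
print(probs["ARG.JSON.SCHEMA_ALIGN"])
```

当某条失败路径的转移概率越过阈值,即可触发上述few-shot注入之类的干预策略。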
这正是Qwen3构建高可靠Agent的技术基座:它不追求模型更大、参数更多,而是追求推理过程更透明、决策链条更健壮、干预手段更精细。这是一种面向生产环境的务实哲学。
## 解构logprob:超越“自信程度”的数学本质
当我们说“这个token的logprob很低”,我们常将其直觉理解为“模型对它没信心”。这种理解虽然直观,却过于浅层,甚至具有误导性。要真正驾驭token-level tracing,我们必须潜入其数学本质。
从信息论角度看,logprob L = log(P(token | context)) 与交叉熵损失的梯度大小之间存在精确的函数关系。设真实标签为 y,当前logits为 z,则交叉熵损失 C = -log(softmax(z)_y)。其关于 z_y 的偏导数为 ∂C/∂z_y = softmax(z)_y - 1。而 log(softmax(z)_y) 正是我们观测到的logprob L,即 softmax(z)_y = e^L,于是真实类logit上的梯度幅值为 |∂C/∂z_y| = 1 - e^L。这意味着:
> logprob越低(越负),模型在该token上的预测误差越大,对应梯度幅值越高,系统越“紧张”。
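由 L = log(softmax(z)_y) 可得 softmax(z)_y = e^L,代入偏导数 ∂C/∂z_y = softmax(z)_y - 1,梯度幅值就是 1 - e^L。一个最小的数值验证草图:

```python
import math

def grad_magnitude(logprob):
    """由logprob直接换算真实类logit上的交叉熵梯度幅值:
    ∂C/∂z_y = softmax(z)_y - 1 = e^L - 1,故 |∂C/∂z_y| = 1 - e^L。
    logprob越负,幅值越逼近1(饱和),系统越"紧张"。"""
    return 1.0 - math.exp(logprob)

for lp in (-0.1, -0.5, -2.5, -4.2):
    print(f"L={lp:+.1f}  |grad|={grad_magnitude(lp):.3f}")
```

可以看到幅值随logprob单调上升并在深负区间迅速饱和,这正是把 -2.5 当作“严重决策失败”分界的一个直观依据。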
这是一个颠覆性的视角。logprob突变(如从-0.5骤降至-4.2)并非单纯的“置信度下降”,而是一次局部优化困境的显性爆发。它可能是注意力头之间的剧烈冲突,是位置编码(position embedding)在长上下文下的尺度漂移,也可能是KV cache中某一块内存被意外污染。
这一洞察在Qwen3的实践中得到了反复验证。一个典型案例是<|tool_start|>后首个工具名token的logprob异常偏低。深入分析发现,其根源并非模型本身的能力缺陷,而是prefill阶段加载的tool description embedding与decode阶段所用的position embedding,在数值尺度上存在不匹配。这种细微的工程偏差,在终态响应层面完全不可见,却会在token级logprob上留下清晰的指纹——一个深达-4.2的“谷底”。
下表总结了不同logprob区间所映射的深层语义与潜在根因。这些结论并非凭空臆想,而是基于对Qwen3-7B产生的10万条真实trace所做的核密度估计(KDE)与异常聚类:
| logprob区间 | 表面语义 | 深层语义解释 | 典型场景 | 潜在根因 |
|---|---|---|---|---|
| > -0.1 | 高度确定 | 模型处于“舒适区”,预测路径稳定 | 句子结尾、明确标点 | position embedding对齐良好,KV cache纯净无污染 |
| [-1.0, -0.1] | 健康竞争 | 多个语义相近的候选token形成有效竞争,需后续token消歧 | 工具名首字母w vs g | 相似工具共现,模型正在学习精细化区分 |
| [-2.5, -1.0] | 中度困惑 | 模型感知到上下文存在模糊性,需要更强的语法或语义线索 | JSON key后引号缺失预警 | tokenizer的quote规则与模型训练时的分词逻辑不一致 |
| < -2.5 | 严重决策失败 | 系统级异常,模型已丧失对该位置的基本控制力 | 违反schema约束的}(如logprob低至-6.49) | KV cache污染、prefill与decode间embedding尺度漂移等系统级工程缺陷 |
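把上表的区间阈值落成代码,就能对trace中的每个token做逐位标注。以下是一个示意性的阈值函数(阈值取自上表,边界归属为示例约定):

```python
def classify_logprob(lp):
    """按上表的logprob区间把单个token映射为语义标签。"""
    if lp > -0.1:
        return "高度确定"
    if lp > -1.0:
        return "健康竞争"
    if lp > -2.5:
        return "中度困惑"
    return "严重决策失败"

print([classify_logprob(x) for x in (-0.05, -0.7, -1.8, -6.49)])
# → ['高度确定', '健康竞争', '中度困惑', '严重决策失败']
```

配合前文的协议阶段对齐,就能回答“哪个阶段的哪个token落入了哪个区间”这类逐位诊断问题。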
- "19493.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "20099.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "20423.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "20583.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21096.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21308.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21468.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21525.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21539.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21565.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prefill_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.0625e+09" vllm:request_prefill_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "52.0" vllm:request_prompt_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "110.0" vllm:request_prompt_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1287.0" vllm:request_prompt_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "15764.0" vllm:request_prompt_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "18257.0" vllm:request_prompt_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "19958.0" vllm:request_prompt_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21066.0" vllm:request_prompt_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21416.0" vllm:request_prompt_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21522.0" vllm:request_prompt_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prompt_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_prompt_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.03123e+09" vllm:request_prompt_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.e+07" vllm:request_queue_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21489.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21497.0" vllm:request_queue_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21519.0" vllm:request_queue_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21531.0" vllm:request_queue_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21541.0" vllm:request_queue_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21550.0" vllm:request_queue_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21568.0" vllm:request_queue_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21574.0" vllm:request_queue_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.05698e+09" vllm:request_queue_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1095.94" vllm:request_success_created{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.02787e+09" vllm:request_success_created{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.0273e+09" vllm:request_success_created{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.02663e+09" vllm:request_success_total{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:request_success_total{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "16.0" vllm:request_success_total{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21558.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.025",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12097e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.05",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.075",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.15",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.2",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.4",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "4.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.096754e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5.12463e+06" vllm:time_per_output_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.05083e+09" vllm:time_per_output_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "." vllm:time_to_first_token_seconds_bucket{engine="0",le="0.001",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "22.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.005",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "10761.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.02",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.04",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "18421.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.06",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "2307.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.08",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "5692.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.25",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "17492.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "18898.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "19414.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "20526.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21039.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21181.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21258.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21496.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21561.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="160.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="640.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2560.0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "21576.0" vllm:time_to_first_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
- "1.04754e+09" vllm:time_to_first_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"}
可以根据这些数据推测或计算出“QPS”“最大运行数”“最大等待数”“失败率”“成功率”“平均耗时(ms)”吗?
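回答这一问题的基本思路是:QPS 与平均耗时可由累计型计数器(各 histogram 的 `_count` 与 `_sum`,例如 vLLM 的 `vllm:e2e_request_latency_seconds_count` / `_sum`)在两次抓取之间做差分得到;成功率/失败率可由 `vllm:request_success_total` 按 `finished_reason` 分解后的增量得到;而“最大运行数”“最大等待数”则需要 `vllm:num_requests_running`、`vllm:num_requests_waiting` 这类 gauge 配合 PromQL 的 `max_over_time()`,上面这段数据中并未包含。下面是一个最小示意脚本(假设已从两次抓取中解析出计数器数值;字段名与样例数字均为演示用的假设,并非上文的真实采样):

```python
# 示意性脚本:基于两次抓取的累计计数器快照,差分推算 QPS、平均耗时与成功率。
# 注意:snap 中的键名与样例数值均为演示假设,非 vLLM 指标的原始名称。

def derive_rates(snap1: dict, snap2: dict, interval_s: float) -> dict:
    """snap* 形如 {"e2e_count": .., "e2e_sum": .., "success": ..}。
    三个字段均为累计值(cumulative counter),差分后除以窗口长度即得速率。"""
    d_count = snap2["e2e_count"] - snap1["e2e_count"]    # 窗口内完成的请求数
    d_sum = snap2["e2e_sum"] - snap1["e2e_sum"]          # 窗口内端到端耗时总和(秒)
    d_success = snap2["success"] - snap1["success"]      # finished_reason="stop" 的增量
    qps = d_count / interval_s
    avg_ms = (d_sum / d_count * 1000) if d_count else 0.0
    success_rate = (d_success / d_count) if d_count else 0.0
    return {"qps": qps, "avg_ms": avg_ms, "success_rate": success_rate}

# 用法示例:两次抓取间隔 60 秒(数值为虚构)
s1 = {"e2e_count": 21000, "e2e_sum": 42000.0, "success": 20500}
s2 = {"e2e_count": 21574, "e2e_sum": 43148.0, "success": 21064}
print(derive_rates(s1, s2, 60.0))
```

需要强调的是,histogram 的 `_sum`/`_count` 只能给出均值,分位数(如 P99 耗时)要用 `histogram_quantile()` 对 `_bucket` 序列估算;而上文的 bucket 数据存在明显的抄录损坏(多处违反累计单调性),直接据其计算分位数是不可靠的。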