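Organizations and knowledge workers are often tasked with writing structured reports by analyzing information scattered across many documents and formats. These typically include PowerPoint presentations, PDFs, Excel spreadsheets, meeting recordings, emails, web links, and other unstructured or semi-structured sources.
In my AI consulting work, I have seen growing demand for AI-generated reports that draw information from multiple documents.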
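For example:
- Legal teams may need to review documents from multiple stakeholders to prepare contracts or compliance reports.
- Project managers may need to turn workshop materials, notes, and recordings into structured summaries with decisions, action items, and follow-ups.
- Procurement teams often compile vendor documents into standardized evaluation reports.
- HR teams consolidate résumés, interview notes, and feedback into candidate assessment reports.
In all of these use cases the task is the same: extract the relevant information from varied sources and present it in a clear structure.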
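The task becomes even more tedious and time-consuming when the output must follow a strict template, with a predefined layout, section names, ordering, and terminology.
The variety of input data adds further effort. Text files, spreadsheets, scanned documents, handwritten notes, sketches/diagrams, and audio/video recordings each need different handling. As a result, even when the process is routine and repetitive, compiling a single report can take hours or even days of manual work.
AI can deliver substantial value for this kind of repeatable task by raising productivity and saving a great deal of time.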
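Claude Skills have great potential for automating routine, repetitive tasks. In a previous article, I demonstrated a step-by-step approach for creating custom Claude Skills and applying them to everyday workflows.
In this article, I walk through an end-to-end example:
Converting a heterogeneous collection of documents in multiple formats into a structured report that follows a given template and/or sample.
We will build a custom Claude skill that uses built-in and custom tools to read, analyze, and fuse information from a variety of inputs, including Word documents, PowerPoint slides, PDFs, Excel spreadsheets, images, and audio/video files. The skill aggregates the full context, extracts what is relevant to the user's request, and generates a structured report, either in a default layout or following a user-specified template or reference document.
Because Claude models cannot transcribe audio or video natively, we will build a custom Model Context Protocol (MCP) server for the skill to call. This MCP server handles the transcription of audio/video inputs. For this I use the transcriber package of our open-source GAIK toolkit, exposed as the gaik-transcriber MCP server.
The complete workflow is shown in the figure below.
The example skill built in this article is not tied to a specific use case and can easily be adapted to others (no programming required).
The complete code is available in the GitHub repository.
Let's get started.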
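We will create a transcription MCP server to transcribe audio/video files. It will be used as a tool by the Claude skill and is built on the transcriber module of our open-source project.
Setting up the MCP server with the GAIK toolkit (under development)
If your data does not include audio/video files, you do not need to set up this MCP server on your computer. In that case, skip ahead to the next section, "Creating the Claude skill".
Follow the steps below to create the transcription MCP server with the gaik[transcriber] package.
First, install Claude Desktop and sign in with your Anthropic account. Claude Desktop is required to connect to the transcription MCP server.
Install Node.js (needed for npx and the filesystem server). Download and install it with the official installer.
Create the project directory: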
mkdir transcription-mcp
cd transcription-mcp
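Install the dependencies. You can install the whole gaik package or only its transcriber module:
pip install fastmcp gaik[transcriber]
- fastmcp: the Python MCP server framework (FastMCP)
- gaik[transcriber]: the GAIK transcriber library for audio/video transcription
Create a server.py file containing the main implementation of the MCP server. Here we use the transcriber package of the GAIK toolkit. The name given to the MCP server through FastMCP is gaik-transcriber.
Note that the docstring of transcribe_audio is exposed as the tool description, so it acts as a set of instructions for the model that calls the tool.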
"""MCP Server for GAIK Transcriber"""
import os
import sys
from pathlib import Path

from mcp.server.fastmcp import FastMCP
from gaik.building_blocks.transcriber import Transcriber, get_openai_config

# Initialize MCP server
mcp = FastMCP("gaik-transcriber")


@mcp.tool()
def transcribe_audio(file_path: str, enhanced: bool = False) -> str:
    """
    Transcribe audio/video file using GAIK Transcriber.

    ==== CRITICAL OUTPUT INSTRUCTIONS ====
    You MUST return the transcription EXACTLY as provided by this tool.

    DO NOT:
    - Add any formatting (headers, bullets, bold, markdown)
    - Restructure or reorganize the text
    - Summarize or paraphrase any part
    - Add section labels or titles
    - Add any commentary before or after
    - Change any words or punctuation

    DO:
    - Output the text exactly as returned
    - Preserve the original flow and structure
    =====================================

    Args:
        file_path: Full Windows path to audio/video file
        enhanced: If True, return enhanced transcript (default: False)

    Returns:
        The exact transcription text - output this verbatim with no changes.
    """
    try:
        config = get_openai_config(use_azure=False)
        transcriber = Transcriber(
            api_config=config,
            enhanced_transcript=enhanced,
        )
        result = transcriber.transcribe(
            file_path=Path(file_path),
            custom_context="",
        )
        if enhanced and result.enhanced_transcript:
            return result.enhanced_transcript
        return result.raw_transcript
    except Exception as e:
        # Report the failure both to stderr (for the MCP log) and to the caller
        import traceback
        error_msg = f"Error: {str(e)}\n\nTraceback:\n{traceback.format_exc()}"
        print(error_msg, file=sys.stderr)
        return error_msg


if __name__ == "__main__":
    mcp.run(transport="stdio")
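What server.py does:
- Uses GAIK's Transcriber class
- Exposes the function as an MCP tool with the @mcp.tool() decorator
- Accepts the parameters file_path (string) and enhanced (boolean)
- Uses the stdio transport (standard input/output for communication)
Add a .env file. You need an OpenAI API key for audio/video transcription.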
# OpenAI API key (required if using OpenAI)
OPENAI_API_KEY=your_api_key
# Azure API key (required if using Azure)
AZURE_API_KEY=your_api_key
# Provider type: openai or azure
OPENAI_API_TYPE=openai
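Note: you can also set up the transcription MCP server with a local Whisper model, in which case no OpenAI API key is needed. A GPU is recommended in that setup for acceptable processing speed.
The transcription MCP server is now almost ready, but you may also need to install ffmpeg. The server is built on GAIK's transcriber package, which uses a Whisper model. Audio/video files larger than 25 MB must be split into chunks, and the transcriber package does this implicitly through ffmpeg.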
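To make the 25 MB threshold concrete, here is a minimal sketch of the kind of size check that decides whether a recording needs chunking. The GAIK transcriber handles this internally; the function name and the constant below are illustrative assumptions and not part of the GAIK API.

```python
from pathlib import Path

# Assumption: 25 MB counted in binary megabytes; the exact byte limit
# enforced by the transcription service may differ slightly.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def needs_chunking(file_path: str, limit_bytes: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True when the file is larger than the given size limit."""
    return Path(file_path).stat().st_size > limit_bytes
```

A check like this is only useful for anticipating long-running jobs; the splitting itself still relies on ffmpeg being installed.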
Download FFmpeg from https://ffmpeg.org/download.html and extract it to a folder. Note the path to its binaries (for example, C:/ffmpeg-8.0.1-essentials_build/ffmpeg-8.0.1-essentials_build/bin) and add it to the Windows PATH environment variable.
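After editing PATH, it can help to confirm that a new process actually sees ffmpeg. The following is a small sketch using only the Python standard library; the function name find_ffmpeg is an illustrative assumption.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


location = find_ffmpeg()
print(location if location else "ffmpeg not found on PATH")
```

Run it from a freshly opened terminal, since terminals that were open before the PATH change keep the old value.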
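Open the Claude Desktop MCP configuration file: %APPDATA%\Claude\claude_desktop_config.json
If the file does not exist, create it.
This file tells Claude Desktop which local MCP servers it can launch and use, and how to start them.
We will configure two MCP servers in it: the transcription server (at C:\path\to\whisper-mcp\server.py) and the filesystem server, which lets Claude read and write local files on your system.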
{
  "mcpServers": {
    "gaik-transcriber": {
      "command": "python",
      "args": ["C:\\Users\\h02317\\whisper-mcp\\server.py"],
      "timeout": 600000
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "C:\\"
      ]
    }
  }
}
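Configuration breakdown:
- mcpServers: declares which MCP servers Claude Desktop should launch (command, arguments, and environment variables for each).
- gaik-transcriber.command: runs the transcriber MCP server with Python (a local process using the stdio transport).
- gaik-transcriber.args: the absolute Windows path to server.py (the MCP server entry point).
- gaik-transcriber.timeout: 600000 milliseconds (10 minutes), to allow long-running transcription jobs.
- filesystem.command: runs the official filesystem MCP server with npx.
- filesystem.args: -y skips the install prompt; @modelcontextprotocol/server-filesystem is the package; C:\ is the root path the tool is allowed to access.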
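Hand-writing Windows paths in JSON is error-prone because every backslash has to be doubled. As a hedged alternative, the sketch below builds the same structure in Python and lets json.dumps do the escaping. The helper name build_config and the placeholder script path are assumptions for illustration; the keys mirror the configuration described above, with the 10-minute timeout expressed in milliseconds.

```python
import json


def build_config(server_script: str, allowed_root: str, timeout_ms: int = 600_000) -> dict:
    """Build the mcpServers structure for the two servers used in this article."""
    return {
        "mcpServers": {
            "gaik-transcriber": {
                "command": "python",
                "args": [server_script],
                "timeout": timeout_ms,  # 10 minutes in milliseconds
            },
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", allowed_root],
            },
        }
    }


# Placeholder path: replace it with the real location of your server.py.
config = build_config(r"C:\path\to\whisper-mcp\server.py", "C:\\")
print(json.dumps(config, indent=2))  # backslashes are escaped automatically
```

The printed text can be pasted into claude_desktop_config.json. Narrowing allowed_root to a single working folder instead of the whole C:\ drive limits what the filesystem server can touch.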
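Quit Claude Desktop completely; you may need to close it from the system tray or Task Manager. Then restart it to load the new MCP server configuration.
You should now see both MCP servers (filesystem and gaik-transcriber) in Claude Desktop.
You can also view and configure these tools under Settings → Connectors in Claude Desktop.
If something goes wrong, check the logs under %APPDATA%\Claude\logs, in particular the per-server log files mcp-server-filesystem.log and mcp-server-gaik-transcriber.log.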
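Test with an audio or video file. Claude Desktop's built-in execution environment is Linux-based, whereas the transcription MCP server runs on Windows, so we need to give the full Windows path of the audio/video file.
In Claude Desktop:
Transcribe the file in C:\Users\h02317\Downloads\video.mp4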
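If a path reaches you in WSL style (for example /mnt/c/...), it has to be turned into a drive-letter path before the Windows-side server can open the file. The helper below is a minimal sketch of that conversion. The name to_windows_path is an assumption, and server.py as shown above does not include such a step.

```python
import re


def to_windows_path(path: str) -> str:
    """Convert a WSL-style /mnt/c/... path to a Windows drive-letter path.

    Paths that do not start with /mnt/ plus a single drive letter are
    returned unchanged.
    """
    match = re.match(r"^/mnt/([a-zA-Z])(/.*)?$", path)
    if not match:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return f"{drive}:" + rest.replace("/", "\\")


print(to_windows_path("/mnt/c/Users/me/Downloads/video.mp4"))
```

Paths that already use a drive letter pass through untouched, so the function is safe to apply to every path before calling the transcription tool.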
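GAIK's transcription module can also enhance the transcript by inserting line breaks where appropriate, forming paragraphs and dialogue, and fixing spelling or grammar errors. To request this, ask the MCP server for an enhanced transcript:
Transcribe the file in C:\Users\h02317\Downloads\video.mp4 enhanced: true
Architecture Overview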
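We now create a skill that processes documents in multiple formats: docx, ppt, pdf, xlsx, images, and audio/video. It uses the gaik-transcriber MCP server for audio/video files and Claude's built-in skills and tools (pdf, docx, xlsx, pptx, view) for the other document types.
The skill then fuses all of the context and takes an optional template and an optional sample document into account to generate a structured report. The optional template is a blank document containing the required structure (headings, subheadings, sections, tables, and so on). The skill can edit this template directly to produce the report.
The sample document is a complete report whose format, style, and structure meet the specific requirements.
If no template and/or sample document is provided, the skill generates the report with a predefined structure.
As an example, we created a meeting documentation skill that converts scattered meeting data (recordings, handwritten notes, digital notes, diagrams, sketches, supplementary documents) into a structured MS Word document with a summary, decisions, action items, open questions, and a follow-up message that can be pasted directly into an email to the attendees.
The example skill is not limited to meeting data. It can easily be modified for other use cases that require generating a report from documents in many different formats.
Here is the skill structure: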
documenting-meetings/
├── SKILL.md                  # Main skill definition & workflow
├── EVALUATION.md             # Test scenarios & evaluation criteria
└── reference/
    ├── INPUT_FORMATS.md      # Input type handling details
    └── OUTPUT_SECTIONS.md    # Output section guidance
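See the GitHub repository for all files in the skill package.
SKILL.md is the main file and entry point; it contains the "operating guide" that Claude uses to decide when and how to run the skill.
In this package, I keep the core workflow in SKILL.md and move the detailed how-to guidance into the reference files, so that the skill stays both readable and precise.
SKILL.md begins with YAML frontmatter that names the skill (documenting-meetings) and describes what it does. Right after that, the skill states its runtime environment (Claude Desktop) and why: it needs two external supports, the MCP filesystem server for safe folder/file operations and the gaik-transcriber MCP server for transcription.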
---
name: documenting-meetings
description: Converts scattered meeting data (recordings, handwritten notes, diagrams, digital notes, supplementary documents) into a structured MS Word deliverable with summary, decisions, action items, open questions, and follow-up message. Use when the user mentions meeting notes, meeting summary, meeting minutes, action items from meeting, meeting documentation, or needs to consolidate meeting materials.
---
Meeting Documentation
Converts scattered meeting materials—audio/video recordings, handwritten notes, diagrams, digital notes, and supplementary documents—into a single, well-formatted MS Word document containing a concise summary, decisions, action items with owners and due dates, open questions, and a ready-to-send follow-up message.
It is designed to run in Claude Desktop with:
- An MCP filesystem server (for listing/reading files and folders)
- An MCP gaik-transcriber server (for transcribing audio/video recordings)
When to Use
Use this skill when:
- User mentions “meeting notes”, “meeting summary”, “meeting minutes”, “meeting documentation”
- User needs to consolidate multiple meeting materials into one document
- User asks to extract action items, decisions, or follow-ups from a meeting
- User has meeting recordings, handwritten notes, or other meeting artifacts to process
- User wants a structured deliverable from meeting data
Inputs
Required (at least one):
- Meeting recordings (audio/video files) – transcribed using gaik-transcriber:transcribe_audio
- Handwritten notes (scanned images) – interpreted visually
- Digital notes (text files, markdown, etc.)
- Diagrams/sketches/figures (images)
- Comments or notes from multiple people
Optional:
- Supplementary documents (PowerPoint slides, PDF guidelines, policy documents, Excel files)
- Output template (blank .docx with predefined headers, sections, logos, etc.)
- Sample output document (.docx or .pdf) defining style, format, tone, and length to follow
Required parameter:
input_folder: Path to the main input folder containing the required subfolder structure. If not provided, ask the user to specify it.
Required folder structure:
input_folder/
├── input_documents/     # Required: recordings, photos, notes, presentations, PDFs, etc.
├── templates/           # Optional: blank template with predefined headers, sections, logos
└── sample_documents/    # Optional: sample document defining style, format, tone, length
Tooling Rules (Windows vs Linux Path Safety)
Why this matters
On Windows, Claude Desktop + toolchains sometimes behave like they are in a POSIX shell, producing paths like /mnt/c/.... Meanwhile, your MCP servers may run native Windows Python, expecting C:\.... This mismatch can cause “file not found” or failing shell commands.
Strict rules
- Prefer MCP filesystem tools for file/folder operations. Use the filesystem server for listing and reading files instead of shell commands.
- Avoid bash commands on Windows. If you must run a command on Windows, prefer PowerShell.
- When calling gaik-transcriber, prefer Windows drive-letter paths on Windows. Pass file paths like C:\Users\...\recording.m4a. If you only have a POSIX/WSL path (e.g., /mnt/c/...), convert it to a Windows path before calling the transcriber, or rely on the transcriber server's internal normalization (recommended).
- Never assume the environment is Linux. Treat the runtime as OS-ambiguous and enforce the above rules to stay stable.
- Never do the following:
  - NEVER run pip install, python -c, pdfplumber, or any ad-hoc parsing code for .pdf/.pptx/.xlsx.
  - NEVER use /mnt/user-data/uploads/… paths; only use paths returned by the MCP filesystem listing or the user-provided Windows folder.
If you are about to do any of the above, STOP and switch to the built-in PDF/PPTX/XLSX skills.
Workflow
Step 0 — Collect context
Ask (only if not provided):
- Meeting title or purpose (optional)
- Desired output format (Markdown is default)
- Any special focus: “only action items”, “only decisions”, “customer-facing summary”, etc.
Step 1 — Validate input folder structure and capabilities
If the user has not specified an input folder path, ask for it and confirm it contains input_documents/ (required).
1) Validate folder structure (MCP filesystem only)
Use the filesystem MCP tool to list:
- (required) input_folder\input_documents
- (optional) input_folder\templates
- (optional) input_folder\sample_documents
If input_documents/ is missing or empty, stop and ask the user to add the meeting artifacts there.
2) Capability check (prevents Windows/Linux path loops for binaries)
Purpose: decide upfront whether this environment can process binary files from a Windows folder without requiring the user to upload them.
Inventory binary files found in:
- input_folder\input_documents
- input_folder\templates
- input_folder\sample_documents
Treat the following as binary (not safely readable via text tools):
.pdf, .pptx, .xlsx, .docx
Decision:
- If ANY binary files exist and are ONLY in the Windows folder:
  - Assume you cannot process them directly unless you have a binary-capable tool.
  - The official Node filesystem MCP server supports reading text files and reading image/audio media, but does not guarantee generic binary reads for Office/PDF files.
  - Therefore:
    - If a dedicated MCP document-parser tool is available (recommended), use it for these files.
    - Otherwise, you MUST ask the user to upload/attach these binaries in Claude Desktop to process them with built-in PDF/PPTX/XLSX/DOCX skills.
If the user asks for "local-folder only" processing of PDF/PPTX/XLSX/DOCX:
- Explain that this requires either:
  - a binary-capable filesystem MCP server (supports base64/binary reads), or
  - a Windows-native document-parser MCP server.
  (Example of a filesystem MCP server that explicitly supports binary/base64 reads: mark3labs/mcp-filesystem-server.)
Continue with the core workflow (transcription + text notes + images) regardless of the binary handling outcome.
Step 2 — Inventory input files
From input_documents/, create a quick inventory:
- Recordings (audio/video)
- Images (handwritten notes, diagrams)
- Text documents (agenda, minutes draft, emails, etc.)
- PDFs / slides (read text if possible; otherwise summarize)
Step 3 — Transcribe recordings (gaik-transcriber MCP tool)
For each audio/video file, call:
gaik-transcriber:transcribe_audio
- file_path: full path to the recording
- enhanced: false by default (true only if user asks for enhanced quality)
If transcription fails with “file not found”:
- Re-check the path style and ensure Windows drive-letter paths on Windows.
Step 4 - Images (Handwritten Notes, Diagrams, Sketches, Figures)
For each image file (.jpg, .jpeg, .png, .gif, .webp, .bmp, .tiff):
- Interpret the image content (handwritten notes, diagrams, figures)
- Create a textual description capturing all relevant information
Step 5 - Notes
Read files directly (.txt, .md, .rtf). Use /mnt/skills/public/docx/SKILL.md for reading/writing .docx files.
Step 6 — Supplementary documents (.pdf, .pptx, .xlsx, .docx)
Goal: extract relevant information from supplementary documents WITHOUT ad-hoc parsing code and WITHOUT Windows/Linux path mismatches.
Non-negotiables (hard rules)
- NEVER run cp, pip install, python -c, pdfplumber, soffice, pandoc, or any ad-hoc parsing commands to read .pdf/.pptx/.xlsx/.docx from Windows paths.
- NEVER assume C:\... or /mnt/c/... is accessible inside a Linux sandbox.
- NEVER invent upload paths (e.g., /mnt/user-data/uploads/...) unless the environment explicitly provides them.
- Do not use the filesystem MCP server to "load built-in skill files." The filesystem server is for user-allowed directories, not Claude's internal skill library. (Use built-in skills directly when attachments are available.)
Decision tree
A) If the supplementary file is uploaded/attached in Claude Desktop
- Use the corresponding built-in skill:
  - .pdf → PDF skill
  - .pptx → PPTX skill
  - .xlsx → XLSX skill
  - .docx → DOCX skill
- Extract only relevant content for the meeting deliverable (decisions, timelines, roadmap items, action items, risks).
- Attribute extracted content by filename.
B) If the supplementary file is ONLY present in the Windows folder (discovered via filesystem:list_directory)
- Text-like files (.txt, .md, .csv, .json)
  - Read via filesystem read_text_file and extract relevant content.
- Binary files (.pdf, .pptx, .xlsx, .docx): IMPORTANT
  - Do NOT attempt conversion or parsing via sandbox tools (pandoc/python/soffice/etc.).
  - If a dedicated MCP document-parser tool is available:
    - Call the parser using the Windows path and use returned extracted text/tables in synthesis.
  - Otherwise:
    - Ask the user to upload/attach the file(s) in Claude Desktop.
    - Continue processing what you can (transcripts, notes, images) and list the missing binaries under "Missing inputs".
Template + samples:
- If templates/sample documents are .docx/.pdf/.pptx/.xlsx and are only on Windows disk:
  - Ask the user to upload them.
  - If not provided, proceed with a clean default Markdown structure.
Output handling
- If supplementary binaries are unavailable (not uploaded, and no parser tool), clearly list them:
  - Missing inputs: [file 1], [file 2], …
- Produce the meeting deliverable using available evidence and a default format.
- Do not block the entire workflow just because supplementary binaries are missing.
Step 7: Fuse Information
Combine all processed inputs into a single consolidated text block with clear separators:
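=== TRANSCRIPTION: [filename] ===
[content]
=== HANDWRITTEN NOTES: [filename] ===
[content]
=== DIGITAL NOTES: [filename] ===
[content]
=== DIAGRAM/FIGURE: [filename] ===
[content]
=== SUPPLEMENTARY: [filename] ===
[content]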
Step 8: Check for Template and Sample Documents
Check the dedicated subfolders for template and sample:
Template (input_folder/templates/):
- Look for a blank .docx file with predefined structure (headers, sections, logos)
- If multiple files exist, use the first .docx file found.
Sample (input_folder/sample_documents/):
- Look for a .docx or .pdf file defining the required style, format, tone, and length
- If multiple files exist, use the first document found
If found:
- For template: Copy it and fill in the content (do not modify structure)
- For sample: STRICTLY follow its format, style, tone, and length
Step 9: Generate the Deliverable
Read the docx skill before creating the document:
view /mnt/skills/public/docx/SKILL.md
Then follow the docx skill’s “Creating a new Word document” workflow to generate the output.
If template provided: Copy the template and fill in sections according to the template structure.
If no template: Create a new document using the output format below.
Step 10: Save and Present
- Save the document to the input_documents folder
- Use present_files to share with the user
Output Format (FLEXIBLE - adapt if template/sample provided)
When no template or sample is provided, use this structure:
MEETING SUMMARY
===============
Date: [extracted or inferred date]
Attendees: [if identifiable from inputs]
Duration: [if available]

---

EXECUTIVE SUMMARY
-----------------
[2-4 paragraph concise summary of the meeting covering main topics discussed, key points, and overall outcomes. Keep factual, based only on input content.]

---

DECISIONS MADE
--------------
1. [Decision text]
   - Context: [brief context if available]
2. [Decision text]
   - Context: [brief context if available]

[If no decisions found in inputs, OMIT this section entirely]

---

ACTION ITEMS
------------
| # | Action Item | Owner | Due Date | Priority |
|---|-------------|-------|----------|----------|
| 1 | [description] | [name] | [date] | [H/M/L] |
| 2 | [description] | [name] | [date] | [H/M/L] |

[If owner/due date not specified in inputs, mark as "TBD"]
[If no action items found, OMIT this section entirely]

---

OPEN QUESTIONS
--------------
1. [Question that was raised but not resolved]
2. [Question requiring follow-up]

[If no open questions found, OMIT this section entirely]

---

FOLLOW-UP MESSAGE
-----------------
[Ready-to-paste message for email or chat, summarizing key outcomes and next steps. Keep professional and concise. Format as:]

Subject: Meeting Follow-up - [Topic/Date]

Hi team,

[1-2 paragraphs summarizing the meeting, key decisions, and action items]

Next steps:
- [Action item 1] - [Owner] by [Date]
- [Action item 2] - [Owner] by [Date]

Please let me know if you have any questions.

Best regards,
[Sender placeholder]
Guardrails
Do:
- Extract information faithfully from provided inputs
- Mark uncertain information as “TBD” or “unclear from recording”
- Preserve original terminology and names from the inputs
- STRICTLY follow template/sample format when provided
- Omit sections if no relevant information exists in inputs
Do NOT:
- Invent or hallucinate any information not present in inputs
- Add action items, decisions, or attendees not mentioned in source materials
- Make assumptions about dates, owners, or deadlines not explicitly stated
- Include sections in the deliverable if the information is not in the inputs
If information is missing:
- For required fields: Mark as “TBD” or “Not specified in meeting materials”
- For entire sections: Omit the section from the deliverable
- If critical inputs are missing: Inform the user what additional materials would help
Error handling:
- If transcription fails: Report the error and continue with other inputs
- If a file cannot be parsed: Log the issue and proceed with remaining files
- If no usable inputs found: Ask the user to verify the folder path and file formats
Examples
Example 1: Standard Meeting with Recording and Notes
User prompt: “Process my meeting materials from /home/user/meetings/q4-planning and create a summary document”
Expected folder structure:
/home/user/meetings/q4-planning/
├── input_documents/
│   ├── meeting-recording.mp4
│   ├── whiteboard-photo.jpg
│   └── my-notes.txt
├── templates/           # (empty or absent)
└── sample_documents/    # (empty or absent)
Expected behavior:
- Validates folder structure, finds input_documents/ with 3 files
- Transcribes recording using gaik-transcriber
- Interprets whiteboard photo
- Reads digital notes
- Fuses all content with separators
- Generates Word document with all applicable sections (no template/sample)
- Saves to outputs and presents to user
Example 2: With Template and Sample
User prompt: “Create meeting minutes from the files in /meetings/standup using our company template”
Expected folder structure:
/meetings/standup/
├── input_documents/
│   ├── recording.m4a
│   └── notes.txt
├── templates/
│   └── company-template.docx
└── sample_documents/
    └── sample-minutes.docx
Expected behavior:
- Finds template in templates/ subfolder
- Finds sample in sample_documents/ subfolder
- Processes all materials in input_documents/
- Copies template and fills in content following sample’s style
- Presents formatted document
Example 3: Minimal Inputs
User prompt: “I just have a voice memo from our call - can you turn it into meeting notes? The folder is /recordings/client-call”
Expected folder structure:
/recordings/client-call/
├── input_documents/
│   └── voice-memo.m4a
├── templates/           # (empty or absent)
└── sample_documents/    # (empty or absent)
Expected behavior:
- Validates structure, finds single audio file in input_documents/
- Transcribes the audio file
- Generates deliverable with available sections only
- Omits sections where no information exists
- Notes in follow-up message that details may need verification
References
Follow these reference documents for detailed handling of each input file type and guidance on each deliverable section:
- reference/INPUT_FORMATS.md – Detailed handling for each input file type
- reference/OUTPUT_SECTIONS.md – Guidance on each deliverable section
Step 0 is a quick context check that collects the essentials. Step 1 is a reliability step that validates the required folder structure using the MCP filesystem listing.
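The Step 1 check can be pictured as the following sketch. In the skill itself, Claude performs this through the filesystem MCP tools rather than by running Python; the function name and the shape of the returned dictionary are assumptions made for illustration.

```python
from pathlib import Path


def validate_input_folder(input_folder: str) -> dict:
    """Check for the required input_documents/ folder and the optional subfolders."""
    root = Path(input_folder)
    required = root / "input_documents"
    files = [p for p in required.iterdir() if p.is_file()] if required.is_dir() else []
    return {
        "ok": bool(files),  # False when input_documents/ is missing or empty
        "input_files": sorted(p.name for p in files),
        "has_templates": (root / "templates").is_dir(),
        "has_samples": (root / "sample_documents").is_dir(),
    }
```

When ok is False, the skill's rule is to stop and ask the user to add the meeting artifacts instead of continuing with an empty input set.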
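Step 2 establishes what is actually present (recordings, images, notes, supplementary documents). Step 3 transcribes audio/video files with gaik-transcriber:transcribe_audio, with enhanced: false as the default. Steps 4 and 5 handle images and text notes: images are interpreted visually and converted into structured text, while notes in .txt/.md/.rtf format are read directly.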
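As a rough illustration of the Step 2 inventory, files can be grouped by extension. The image, note, and document extensions below come from the skill text; the recording extensions are an assumed subset (the authoritative list is in INPUT_FORMATS.md), and the function names are hypothetical.

```python
from pathlib import PurePath

# Extension sets taken from Steps 4-6 of the skill.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
NOTE_EXTS = {".txt", ".md", ".rtf"}
DOCUMENT_EXTS = {".pdf", ".pptx", ".xlsx", ".docx"}
# Assumption: an illustrative subset of audio/video formats.
RECORDING_EXTS = {".mp3", ".mp4", ".m4a", ".wav"}


def categorize(filename: str) -> str:
    """Map a filename to an inventory category based on its extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in RECORDING_EXTS:
        return "recordings"
    if ext in IMAGE_EXTS:
        return "images"
    if ext in NOTE_EXTS:
        return "notes"
    if ext in DOCUMENT_EXTS:
        return "documents"
    return "other"


def build_inventory(filenames: list[str]) -> dict[str, list[str]]:
    """Group filenames by category, keeping the original order within each group."""
    inventory: dict[str, list[str]] = {}
    for name in filenames:
        inventory.setdefault(categorize(name), []).append(name)
    return inventory
```

Each category then maps to one processing route: recordings go to the transcription tool, images are interpreted visually, notes are read directly, and documents go through Step 6.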
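Step 6 blocks unsafe approaches (no pip install, no python -c, no soffice/pandoc hacks, no guessed /mnt/… paths) and picks one of three routes: (A) process attached files with the appropriate built-in document skill; (B) read text-like formats through MCP; (C) list the missing files when binaries are inaccessible.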
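The Step 6 decision tree is written as natural-language instructions for Claude, but its branching can be sketched as a small function. The returned labels and the function name are hypothetical; only the extension lists and the order of the decisions follow the skill text.

```python
from pathlib import PurePath

TEXT_LIKE = {".txt", ".md", ".csv", ".json"}
BINARY = {".pdf", ".pptx", ".xlsx", ".docx"}


def route_supplementary(filename: str, uploaded: bool, parser_available: bool = False) -> str:
    """Pick a handling route for one supplementary file."""
    ext = PurePath(filename).suffix.lower()
    if ext in TEXT_LIKE:
        return "read_text_file"       # read via the filesystem MCP server
    if ext not in BINARY:
        return "unsupported"
    if uploaded:
        return "builtin_skill"        # branch A: built-in PDF/PPTX/XLSX/DOCX skill
    if parser_available:
        return "document_parser"      # branch B with a dedicated parser tool
    return "ask_user_to_upload"       # otherwise list the file under "Missing inputs"
```

The last branch never aborts the run: the workflow continues with transcripts, notes, and images and simply reports which binaries were not available.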
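Once all available inputs are processed, Step 7 merges everything into a single evidence block, clearly separated and labeled by source type and filename (transcript, handwritten notes, digital notes, diagrams, supplementary excerpts).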
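A minimal sketch of the fusion step follows. The separator layout, with the source label and filename between === markers and the content underneath, is my reading of the skill's separators and should be treated as an assumption, as should the function name fuse.

```python
def fuse(items: list[tuple[str, str, str]]) -> str:
    """Join (label, filename, content) triples into one labeled evidence block."""
    blocks = [
        f"=== {label}: {filename} ===\n{content.strip()}"
        for label, filename, content in items
    ]
    return "\n\n".join(blocks)


evidence = fuse([
    ("TRANSCRIPTION", "recording.m4a", "We agreed to ship on Friday."),
    ("DIGITAL NOTES", "notes.txt", "Follow up with the vendor."),
])
print(evidence)
```

Keeping the filename in every separator lets each statement in the final report be traced back to the document it came from.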
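Steps 8 and 9 then switch from extraction to generation. The skill checks for an optional template and sample document and follows them strictly if provided; otherwise it produces a clean default document structure. Finally, Step 10 saves the generated file and presents it to the user.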
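The "use the first file found" rule of Step 8 can be sketched as follows. Sorting by name is an assumption I add to make "first" deterministic, since the skill text does not define an order; the function name is hypothetical as well.

```python
from pathlib import Path
from typing import Optional


def pick_first(folder: str, extensions: tuple) -> Optional[Path]:
    """Return the first file (by sorted name) with one of the given extensions."""
    directory = Path(folder)
    if not directory.is_dir():
        return None
    candidates = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    return candidates[0] if candidates else None
```

Under these assumptions, pick_first("templates", (".docx",)) would select the template and pick_first("sample_documents", (".docx", ".pdf")) the sample; a None result means the default structure is used.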
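The skill also has three supporting files that act as appendices, keeping SKILL.md concise while ensuring consistent behavior.
reference/INPUT_FORMATS.md documents the handling rules for each input type (supported audio/video extensions, the exact transcription call, expectations for image interpretation, and how to handle supplementary documents and templates/samples).
reference/OUTPUT_SECTIONS.md defines what each output section means, when it appears and, crucially, when it is omitted (for example, if there are no explicit decisions, the "Decisions" section is dropped rather than filled with placeholders). EVALUATION.md provides test prompts and pass/fail criteria that I can use to validate the skill in realistic scenarios (multi-input meetings, template + sample runs, audio-only meetings, error recovery, and the "missing folder path" case).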
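The omission rule is easy to get wrong, so here is a small sketch of it: a section is rendered only when it has content. The function name and the plain-text heading style are illustrative assumptions; the skill itself writes a Word document.

```python
def render_sections(sections: dict[str, list[str]]) -> str:
    """Render numbered sections, skipping any section that has no items."""
    parts = []
    for title, items in sections.items():
        if not items:
            continue  # omit the section entirely instead of writing a placeholder
        lines = [title.upper(), "-" * len(title)]
        lines += [f"{i}. {item}" for i, item in enumerate(items, 1)]
        parts.append("\n".join(lines))
    return "\n\n".join(parts)


print(render_sections({
    "Decisions Made": [],
    "Open Questions": ["Who owns the rollout?"],
}))
```

In this example only the Open Questions section is printed, which mirrors the guardrail against inventing content for empty sections.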
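To use the skill in Claude Desktop, compress the skill folder into a .zip file and upload it under Settings → Capabilities by clicking "+ Add".
I ran the skill on sample documents (generated with Claude Opus 4.5), arranged in the folder structure the skill requires. The skill needs at least one input document to work with.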
input_folder/
├── input_documents/                  # Required: Meeting artifacts to process
│   ├── deployment-freeze-policy.pdf  # Holiday freeze policy (Dec 23 - Jan 2), on-call schedule, exception rules
│   ├── notes.txt                     # Attendee's handwritten notes with action items, decisions, key takeaways
│   ├── project-budget.xlsx           # Q3 budget allocation & spending across 5 project categories
│   ├── roadmap-presentation.pptx     # 2-slide deck: title slide + project status overview (Mobile, Dashboard, API)
│   └── sketch.png                    # Whiteboard timeline diagram showing Q3/Q4 milestones & key decisions
├── sample_documents/                 # Optional: Reference for output style/format
│   └── sample-meeting-minutes.docx   # Example meeting minutes (Q2 Planning) defining tone, structure, length
└── templates/                        # Optional: Blank template with predefined sections
    └── meeting-template.docx         # Template with placeholders for summary, decisions, actions, follow-up
In addition to these documents, a sample meeting recording (.mp3) is provided separately. This avoids a file path mismatch between the Windows and Linux environments: uploaded files live in Claude Desktop's Linux environment, whereas the transcription MCP server runs on Windows and reads audio/video files from the local disk, which is exposed through the filesystem MCP server.
The sample data is available in the GitHub repository.
Compress the input documents into a .zip file and upload it to Claude Desktop.
Prompt for processing the data in Claude Desktop:
Process the documents in the uploaded folder using documenting-meetings skill. In addition to these documents, read one more audio file from local drive C:\Users\h02317\Downloads\meeting_recording.mp3.
The figure below shows how the skill processes the data, illustrating the complete agentic workflow.
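The skill follows the sample document sample-meeting-minutes.docx and produces an output document based on the given template meeting-template.docx. Below is a screenshot of the generated report. Its length, style, format, and tone match the given template and sample document.
The example skill created in this article demonstrates how to turn multiple documents in different formats into a structured report in a required format.
The skill can easily be adapted to other use cases.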
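In SKILL.md, you will likely need to change the name, the description, the input discovery and preprocessing steps (which folders to scan and which file types matter), and the output construction steps (section structure, tone, and where/how the final file is written). Steps 7 and 10 also need to be adapted.
Next, update the two reference files that hold most of the "business rules", so that you do not have to rewrite the whole workflow. If your new use case accepts different media (for example, webinar recordings, interviews, customer calls, classroom lectures), adjust INPUT_FORMATS.md. There you define the supported formats, how each one is handled, and what should be skipped or flagged.
If your new use case needs a different deliverable (for example, an incident report, a sales call summary, a compliance memo, a project status update), edit OUTPUT_SECTIONS.md. There you redefine the section structure, what each section means, and what to omit when evidence is missing.
Finally, treat EVALUATION.md as your safety net: whenever you change the inputs, tools, or output structure, update the test cases so that you can quickly verify the skill still behaves correctly (especially with missing files, incomplete inputs, or tool failures).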
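The skill can be applied in many scenarios and improved in several ways:
- Use the Claude Agent SDK to turn the skill into an application workflow by running the same skill package. An SDK agent can invoke the skill, the filesystem tools, and your MCP tools programmatically.
- Extend the skill to a generate-review-correct workflow, in which the generated report is checked and corrected where necessary. This may take several rounds until the report passes the validation criteria.
- Explore other use cases, such as résumé analysis and template refinement, sales proposals based on customer and company documents, or compliance documentation synthesized from internal policies and evidence files.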
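The generate-review-correct idea mentioned above can be sketched as a simple loop. The two callables are placeholders for whatever produces and checks the report (for example, calls made through an agent); their names and signatures are assumptions and not part of any SDK.

```python
from typing import Callable


def generate_until_valid(
    generate: Callable[[list[str]], str],
    validate: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Regenerate a report until validation returns no issues or rounds run out.

    generate receives the issues found in the previous round (empty at first);
    validate returns a list of issues, where an empty list means the report passes.
    """
    feedback: list[str] = []
    report = ""
    for _ in range(max_rounds):
        report = generate(feedback)
        feedback = validate(report)
        if not feedback:
            break
    return report, feedback
```

The round limit keeps the loop from running indefinitely when a validation criterion cannot be met from the available inputs; any remaining issues are returned to the caller together with the last draft.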
Original article: Extracting documents with Claude Skills - 汇智网