refactor: split the generic skills into per-category directories

skills/ → skills-dev (9), skills-req (10), skills-ops (4), skills-integration (8), skills-biz (4), skills-workflow (7). generate-marketplace.py now auto-scans all skills-* directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,8 @@
{
  "name": "data-excel-plugin",
  "description": "Plugin for data-excel",
  "version": "1.0.0",
  "author": {
    "name": "qiudl"
  }
}
443  skills-integration/data-excel-plugin/skills/SKILL.md  (new file)
@@ -0,0 +1,443 @@
---
name: data-excel
description: Excel data processing and BI integration. Read, edit, and convert Excel files through natural language, and import the data into a BI system (Metabase) for visual analysis. Activates automatically when the user mentions Excel, spreadsheet processing, data import, or BI analysis tasks.
---

# Excel Data Processing & BI Integration Skill

## Features

- **Excel reading**: read .xlsx/.xls files, with multi-sheet support
- **Excel editing**: modify cells, add/delete rows and columns, apply formatting
- **Data conversion**: Excel ↔ CSV ↔ JSON ↔ SQL
- **BI integration**: import data into Metabase for visualization

---

## Dependencies

```bash
# Python packages (uv recommended)
uv pip install pandas openpyxl xlrd xlsxwriter sqlalchemy pymysql

# or with pip
pip install pandas openpyxl xlrd xlsxwriter sqlalchemy pymysql
```
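Before running any of the templates below, it can help to confirm the packages are actually importable in the current environment. A minimal stdlib-only check (the helper name is hypothetical, not part of this skill):

```python
import importlib.util

def missing_packages(names):
    # Return the subset of `names` that cannot be imported in this environment
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_packages(['pandas', 'openpyxl', 'xlrd', 'xlsxwriter', 'sqlalchemy', 'pymysql']))
```

An empty list means all dependencies are installed; any names printed should be installed with the commands above.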
---

## Natural-Language Operation Examples

### Reading

| User says | Action |
|-----------|--------|
| "Read this Excel file" | Read the file and show a content summary |
| "Show me the second sheet" | Switch to the given sheet |
| "Show the first 20 rows" | Limit the number of rows displayed |
| "What columns does this table have?" | List column names and data types |

### Editing

| User says | Action |
|-----------|--------|
| "Fill empty values in column A with 0" | Fill missing values |
| "Remove duplicate rows" | Deduplicate |
| "Convert the date column to YYYY-MM-DD" | Format dates |
| "Add a column: total = price × quantity" | Add a computed column |
| "Filter records with sales above 1000" | Filter data |
| "Summarize sales by department" | Group and aggregate |

### Exporting

| User says | Action |
|-----------|--------|
| "Export as CSV" | Convert the format |
| "Generate SQL insert statements" | Generate INSERT statements |
| "Import into the database" | Write to MySQL/PostgreSQL |
| "Upload to Metabase" | BI system integration |

---

## Python Code Templates

### Reading Excel

```python
import pandas as pd

# Read an Excel file
df = pd.read_excel('data.xlsx')

# Read a specific sheet
df = pd.read_excel('data.xlsx', sheet_name='Sheet2')

# Read all sheets
all_sheets = pd.read_excel('data.xlsx', sheet_name=None)
for name, sheet_df in all_sheets.items():
    print(f"Sheet: {name}, rows: {len(sheet_df)}")

# Show basic information
print(df.head(10))    # first 10 rows
print(df.columns)     # column names
print(df.dtypes)      # data types
print(df.describe())  # summary statistics
```

### Editing Excel

```python
import pandas as pd

df = pd.read_excel('data.xlsx')

# Fill missing values (plain assignment rather than inplace=
# avoids pandas chained-assignment warnings)
df['列名'] = df['列名'].fillna(0)

# Remove duplicate rows
df = df.drop_duplicates()

# Add a computed column
df['总价'] = df['单价'] * df['数量']

# Filter rows
df_filtered = df[df['销售额'] > 1000]

# Group and aggregate
df_summary = df.groupby('部门')['销售额'].sum().reset_index()

# Format dates
df['日期'] = pd.to_datetime(df['日期']).dt.strftime('%Y-%m-%d')

# Rename columns
df = df.rename(columns={'旧列名': '新列名'})

# Sort
df = df.sort_values(by='销售额', ascending=False)

# Save changes
df.to_excel('output.xlsx', index=False)
```

### Formatting Excel (with styles)

```python
import pandas as pd
from openpyxl.styles import Font, Alignment, PatternFill

# Write an Excel file and apply styles
df = pd.read_excel('data.xlsx')

with pd.ExcelWriter('styled_output.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='数据')

    worksheet = writer.sheets['数据']

    # Style the header row
    header_font = Font(bold=True, color='FFFFFF')
    header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')

    for col in range(1, len(df.columns) + 1):
        cell = worksheet.cell(row=1, column=col)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal='center')

    # Auto-fit column widths
    for column in worksheet.columns:
        max_length = max(len(str(cell.value or '')) for cell in column)
        worksheet.column_dimensions[column[0].column_letter].width = max_length + 2
```

---

## Data Conversion

### Excel → CSV

```python
import pandas as pd

df = pd.read_excel('data.xlsx')
df.to_csv('output.csv', index=False, encoding='utf-8-sig')  # utf-8-sig avoids garbled Chinese in Excel
```

### Excel → JSON

```python
import pandas as pd

df = pd.read_excel('data.xlsx')

# Convert to a JSON array
json_str = df.to_json(orient='records', force_ascii=False, indent=2)
print(json_str)

# Save to a file
with open('output.json', 'w', encoding='utf-8') as f:
    f.write(json_str)
```

### Excel → SQL INSERT

```python
import pandas as pd

df = pd.read_excel('data.xlsx')
table_name = 'my_table'

# Generate an INSERT statement (escapes single quotes, maps NaN to NULL)
def generate_insert_sql(df, table_name):
    columns = ', '.join(df.columns)
    values_list = []
    for _, row in df.iterrows():
        rendered = []
        for v in row:
            if pd.isna(v):
                rendered.append('NULL')
            elif isinstance(v, str):
                rendered.append("'" + v.replace("'", "''") + "'")
            else:
                rendered.append(str(v))
        values_list.append(f"({', '.join(rendered)})")

    sql = f"INSERT INTO {table_name} ({columns}) VALUES\n" + ',\n'.join(values_list) + ';'
    return sql

sql = generate_insert_sql(df, table_name)
print(sql)
```

### Excel → MySQL (direct import)

```python
import pandas as pd
from sqlalchemy import create_engine

# Database connection
engine = create_engine('mysql+pymysql://user:password@host:3306/database')

# Read the Excel file
df = pd.read_excel('data.xlsx')

# Import into the database (replace the table if it exists)
df.to_sql('table_name', engine, if_exists='replace', index=False)

# Or append rows instead
df.to_sql('table_name', engine, if_exists='append', index=False)
```

---

## BI System Integration (Metabase)

### Metabase Server Details

| Item | Value |
|------|-------|
| Server | prod-metaBI (192.144.174.87) |
| SSH user | ubuntu |
| SSH key | ~/.ssh/prod_meta.pem |
| OS | Ubuntu 24.04 LTS |
| Data source | MySQL |
| Company | 北京欢乐宿 |

### SSH Access

```bash
# Connect to the Metabase server
ssh prod-metaBI

# Check MySQL status
ssh prod-metaBI "docker ps | grep mysql"

# Check the Metabase container status
ssh prod-metaBI "docker ps | grep metabase"
```

### MySQL Connection Settings (Metabase data source)

| Setting | Value |
|---------|-------|
| Host | `127.0.0.1` |
| Port | `3306` |
| Database | `finance_db` |
| Username | `root` |
| Password | `root123456` |

> **Note**: MySQL 8.0 must use `mysql_native_password` authentication here, otherwise the client fails with an RSA public key error.
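The settings above map directly onto a SQLAlchemy connection URL. A minimal sketch (the helper name is hypothetical) that also URL-encodes the password, so special characters such as `@` don't break the DSN:

```python
from urllib.parse import quote_plus

def build_mysql_dsn(host, port, database, username, password):
    # quote_plus() percent-encodes characters that are reserved in URLs
    return f"mysql+pymysql://{username}:{quote_plus(password)}@{host}:{port}/{database}"

dsn = build_mysql_dsn('127.0.0.1', 3306, 'finance_db', 'root', 'root123456')
print(dsn)  # mysql+pymysql://root:root123456@127.0.0.1:3306/finance_db
```

The resulting string can be passed straight to `sqlalchemy.create_engine()` as in the templates above.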

### Data Import Flow

```
Excel file → Python processing → MySQL database → Metabase visualization
```

### Complete Import Script

```python
import pandas as pd
from sqlalchemy import create_engine

def excel_to_metabase(excel_path, table_name, db_config):
    """
    Import Excel data into a database that Metabase can query.

    Args:
        excel_path: path to the Excel file
        table_name: target table name
        db_config: database configuration dict
    """
    # Read the Excel file
    df = pd.read_excel(excel_path)

    # Clean the data
    df.columns = df.columns.str.strip()  # strip whitespace from column names
    df = df.dropna(how='all')            # drop fully empty rows

    # Connect to the database
    engine = create_engine(
        f"mysql+pymysql://{db_config['user']}:{db_config['password']}"
        f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
    )

    # Import the data
    df.to_sql(table_name, engine, if_exists='replace', index=False)

    print(f"✓ Imported {len(df)} rows into table {table_name}")
    print("→ The table can now be queried in Metabase")

    return df

# Usage example
db_config = {
    'host': 'localhost',
    'port': 3306,
    'user': 'metabase_user',
    'password': 'your_password',
    'database': 'analytics'
}

excel_to_metabase('sales_data.xlsx', 'sales_report', db_config)
```

### Common Metabase Query Templates

After importing, the following queries can be used in Metabase:

```sql
-- Data overview
SELECT * FROM imported_table LIMIT 100;

-- Daily summary
SELECT DATE(created_at) as date, COUNT(*) as count, SUM(amount) as total
FROM imported_table
GROUP BY DATE(created_at)
ORDER BY date;

-- Per-category statistics
SELECT category, COUNT(*) as count, AVG(value) as avg_value
FROM imported_table
GROUP BY category
ORDER BY count DESC;
```

---

## Common Scenarios

### Scenario 1: Merge Multiple Excel Files

```python
import pandas as pd
import glob

# Merge every Excel file in a directory
files = glob.glob('data/*.xlsx')
df_list = [pd.read_excel(f) for f in files]
df_merged = pd.concat(df_list, ignore_index=True)

df_merged.to_excel('merged.xlsx', index=False)
print(f"Merged {len(files)} files, {len(df_merged)} rows total")
```

### Scenario 2: Pivot Tables

```python
import pandas as pd

df = pd.read_excel('sales.xlsx')

# Build a pivot table
pivot = pd.pivot_table(
    df,
    values='销售额',
    index='产品类别',
    columns='月份',
    aggfunc='sum',
    fill_value=0
)

pivot.to_excel('pivot_report.xlsx')
```

### Scenario 3: Data Validation

```python
import pandas as pd

df = pd.read_excel('data.xlsx')

# Check for missing values
null_counts = df.isnull().sum()
print("Missing values:\n", null_counts[null_counts > 0])

# Check for duplicate rows
duplicates = df[df.duplicated()]
print(f"Duplicate rows: {len(duplicates)}")

# Inspect data types
print("Data types:\n", df.dtypes)
```

### Scenario 4: Report Generation

```python
import pandas as pd

df = pd.read_excel('data.xlsx')

# Build a multi-sheet report
with pd.ExcelWriter('report.xlsx') as writer:
    # Raw data
    df.to_excel(writer, sheet_name='原始数据', index=False)

    # Aggregated data
    summary = df.groupby('部门').agg({
        '销售额': 'sum',
        '订单数': 'count'
    }).reset_index()
    summary.to_excel(writer, sheet_name='部门汇总', index=False)

    # Summary statistics
    stats = df.describe()
    stats.to_excel(writer, sheet_name='统计信息')
```

---

## Workflow

```
1. The user uploads an Excel file or provides a path
2. The user describes the task in natural language (e.g. "summarize sales by month")
3. Claude generates and runs the Python code
4. Results are returned or exported to a file
5. Optional: import into the BI system for visualization
```

---

## Notes

- Absolute paths to Excel files are more reliable
- Use `encoding='utf-8-sig'` when exporting CSVs with Chinese text to avoid mojibake
- For large files (>100k rows), consider batch processing
- Back up existing data before importing into a database
- Mask or anonymize sensitive data
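On batch processing: `pd.read_excel` has no `chunksize` parameter, so one option is to load once and then process in fixed-size batches (e.g. when writing to a database, where `to_sql` also accepts its own `chunksize=`). A minimal, library-free sketch of the batching step itself (the helper name is hypothetical):

```python
from itertools import islice

def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Process 250k synthetic rows in 50k-row batches
total = 0
for batch in batched(range(250_000), 50_000):
    total += len(batch)  # replace with per-chunk cleaning / inserts
print(total)  # 250000
```

The same pattern applies to `df.itertuples()` or to rows streamed with openpyxl's read-only mode.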