skills/ → skills-dev(9), skills-req(10), skills-ops(4), skills-integration(8), skills-biz(4), skills-workflow(7) generate-marketplace.py 改为自动扫描所有 skills-* 目录。 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
949 lines
31 KiB
Markdown
949 lines
31 KiB
Markdown
---
|
||
name: finance
|
||
description: 财务对账技能。支持银行对账、应收对账、应付对账、内部往来对账等。通过自然语言实现账单导入、差异分析、对账报告生成。当用户提到对账、银行流水、应收应付、账单核对、财务核销相关任务时自动激活。
|
||
---
|
||
|
||
# 财务对账技能
|
||
|
||
## 功能概述
|
||
|
||
- **银行对账**: 银行流水 vs 系统收付款记录匹配
|
||
- **应收对账**: 客户对账单生成与核对
|
||
- **应付对账**: 供应商对账单核对
|
||
- **内部对账**: 公司间/部门间往来核对
|
||
- **差异分析**: 自动识别未达账项、差异原因
|
||
- **对账报告**: 生成标准化对账报告
|
||
- **扫描单据OCR**: PDF/图片扫描件识别,提取订单号与对账单匹配(适用于诉讼证据核对)
|
||
|
||
---
|
||
|
||
## 对账类型
|
||
|
||
### 1. 银行对账
|
||
|
||
| 对账项 | 说明 |
|
||
|--------|------|
|
||
| 银行已收/企业未收 | 银行已入账,系统未记录 |
|
||
| 企业已收/银行未收 | 系统已记录,银行未入账 |
|
||
| 银行已付/企业未付 | 银行已扣款,系统未记录 |
|
||
| 企业已付/银行未付 | 系统已记录,银行未扣款 |
|
||
|
||
### 2. 应收对账
|
||
|
||
- 客户欠款余额核对
|
||
- 发票与回款匹配
|
||
- 账龄分析
|
||
|
||
### 3. 应付对账
|
||
|
||
- 供应商应付余额核对
|
||
- 采购入库与付款匹配
|
||
- 预付款冲销
|
||
|
||
---
|
||
|
||
## 数据格式要求
|
||
|
||
### 银行流水格式
|
||
|
||
```csv
|
||
交易日期,交易类型,摘要,金额,余额,对方账户,对方名称,交易流水号
|
||
2026-01-15,收入,货款,50000.00,150000.00,622848******1234,深圳XX公司,TXN20260115001
|
||
2026-01-16,支出,采购款,-30000.00,120000.00,622848******5678,广州YY公司,TXN20260116001
|
||
```
|
||
|
||
### 系统收付款格式
|
||
|
||
```csv
|
||
单据日期,单据类型,单据编号,金额,客户/供应商,备注,关联流水号
|
||
2026-01-15,收款单,SK20260115001,50000.00,深圳XX公司,货款,TXN20260115001
|
||
2026-01-16,付款单,FK20260116001,30000.00,广州YY公司,采购款,
|
||
```
|
||
|
||
---
|
||
|
||
## 自然语言操作示例
|
||
|
||
### 银行对账
|
||
|
||
| 用户说 | 执行操作 |
|
||
|--------|----------|
|
||
| "导入银行流水" | 解析银行导出的 Excel/CSV |
|
||
| "和系统记录对账" | 自动匹配流水号/金额/日期 |
|
||
| "找出未达账项" | 识别双方差异记录 |
|
||
| "生成银行余额调节表" | 输出标准调节表 |
|
||
|
||
### 应收对账
|
||
|
||
| 用户说 | 执行操作 |
|
||
|--------|----------|
|
||
| "生成客户对账单" | 按客户汇总应收明细 |
|
||
| "核对XX公司的应收款" | 单客户对账 |
|
||
| "应收账龄分析" | 按账龄分段统计 |
|
||
| "哪些发票还没收到款" | 查询未核销发票 |
|
||
|
||
### 应付对账
|
||
|
||
| 用户说 | 执行操作 |
|
||
|--------|----------|
|
||
| "核对供应商对账单" | 导入并匹配供应商账单 |
|
||
| "本月应付汇总" | 按供应商统计应付 |
|
||
| "哪些采购单还没付款" | 查询未付款采购单 |
|
||
|
||
### 扫描单据OCR对账
|
||
|
||
| 用户说 | 执行操作 |
|
||
|--------|----------|
|
||
| "识别扫描件与对账单匹配" | OCR 提取 PDF 订单号并与 Excel 对账 |
|
||
| "承运商起诉了,核对证据" | 诉讼证据对账分析 |
|
||
| "PDF发票识别" | OCR 提取发票信息 |
|
||
| "找出对账单没有的订单" | 差异订单识别 |
|
||
|
||
---
|
||
|
||
## Python 代码模板
|
||
|
||
### 银行对账核心逻辑
|
||
|
||
```python
|
||
import pandas as pd
|
||
from datetime import datetime
|
||
|
||
def load_bank_statement(file_path: str) -> pd.DataFrame:
|
||
"""加载银行流水"""
|
||
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
|
||
|
||
# 标准化列名
|
||
column_mapping = {
|
||
'交易日期': 'date',
|
||
'金额': 'amount',
|
||
'交易流水号': 'ref_no',
|
||
'对方名称': 'counterparty',
|
||
'摘要': 'memo'
|
||
}
|
||
df = df.rename(columns=column_mapping)
|
||
df['date'] = pd.to_datetime(df['date'])
|
||
df['source'] = 'bank'
|
||
|
||
return df
|
||
|
||
def load_system_records(file_path: str) -> pd.DataFrame:
|
||
"""加载系统收付款记录"""
|
||
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
|
||
|
||
column_mapping = {
|
||
'单据日期': 'date',
|
||
'金额': 'amount',
|
||
'关联流水号': 'ref_no',
|
||
'客户/供应商': 'counterparty',
|
||
'单据编号': 'doc_no'
|
||
}
|
||
df = df.rename(columns=column_mapping)
|
||
df['date'] = pd.to_datetime(df['date'])
|
||
df['source'] = 'system'
|
||
|
||
return df
|
||
|
||
def reconcile(bank_df: pd.DataFrame, system_df: pd.DataFrame) -> dict:
|
||
"""
|
||
执行对账
|
||
|
||
Returns:
|
||
{
|
||
'matched': 已匹配记录,
|
||
'bank_only': 银行有/系统无 (银行未达),
|
||
'system_only': 系统有/银行无 (企业未达),
|
||
'summary': 对账汇总
|
||
}
|
||
"""
|
||
# 方法1: 按流水号精确匹配
|
||
bank_with_ref = bank_df[bank_df['ref_no'].notna() & (bank_df['ref_no'] != '')]
|
||
system_with_ref = system_df[system_df['ref_no'].notna() & (system_df['ref_no'] != '')]
|
||
|
||
matched_by_ref = pd.merge(
|
||
bank_with_ref,
|
||
system_with_ref,
|
||
on='ref_no',
|
||
suffixes=('_bank', '_system'),
|
||
how='inner'
|
||
)
|
||
|
||
matched_refs = set(matched_by_ref['ref_no'])
|
||
|
||
# 方法2: 无流水号的按金额+日期模糊匹配
|
||
bank_unmatched = bank_df[~bank_df['ref_no'].isin(matched_refs)]
|
||
system_unmatched = system_df[~system_df['ref_no'].isin(matched_refs)]
|
||
|
||
# 按金额和日期(±3天)匹配
|
||
fuzzy_matched = []
|
||
bank_remaining = bank_unmatched.copy()
|
||
system_remaining = system_unmatched.copy()
|
||
|
||
for idx, bank_row in bank_unmatched.iterrows():
|
||
for sys_idx, sys_row in system_remaining.iterrows():
|
||
if (abs(bank_row['amount'] - sys_row['amount']) < 0.01 and
|
||
abs((bank_row['date'] - sys_row['date']).days) <= 3):
|
||
fuzzy_matched.append({
|
||
'bank_idx': idx,
|
||
'system_idx': sys_idx,
|
||
'amount': bank_row['amount'],
|
||
'bank_date': bank_row['date'],
|
||
'system_date': sys_row['date']
|
||
})
|
||
system_remaining = system_remaining.drop(sys_idx)
|
||
bank_remaining = bank_remaining.drop(idx)
|
||
break
|
||
|
||
# 未匹配项
|
||
bank_only = bank_remaining # 银行有/系统无
|
||
system_only = system_remaining # 系统有/银行无
|
||
|
||
# 汇总
|
||
summary = {
|
||
'bank_total': bank_df['amount'].sum(),
|
||
'system_total': system_df['amount'].sum(),
|
||
'matched_count': len(matched_by_ref) + len(fuzzy_matched),
|
||
'bank_only_count': len(bank_only),
|
||
'bank_only_amount': bank_only['amount'].sum() if len(bank_only) > 0 else 0,
|
||
'system_only_count': len(system_only),
|
||
'system_only_amount': system_only['amount'].sum() if len(system_only) > 0 else 0,
|
||
'difference': bank_df['amount'].sum() - system_df['amount'].sum()
|
||
}
|
||
|
||
return {
|
||
'matched': pd.concat([matched_by_ref, pd.DataFrame(fuzzy_matched)]),
|
||
'bank_only': bank_only,
|
||
'system_only': system_only,
|
||
'summary': summary
|
||
}
|
||
```
|
||
|
||
### 银行余额调节表
|
||
|
||
```python
|
||
def generate_bank_reconciliation_report(
|
||
bank_balance: float,
|
||
book_balance: float,
|
||
bank_only: pd.DataFrame,
|
||
system_only: pd.DataFrame,
|
||
as_of_date: str
|
||
) -> str:
|
||
"""生成银行余额调节表"""
|
||
|
||
# 分类未达账项
|
||
bank_receipts_not_in_book = bank_only[bank_only['amount'] > 0]['amount'].sum()
|
||
bank_payments_not_in_book = bank_only[bank_only['amount'] < 0]['amount'].sum()
|
||
book_receipts_not_in_bank = system_only[system_only['amount'] > 0]['amount'].sum()
|
||
book_payments_not_in_bank = system_only[system_only['amount'] < 0]['amount'].sum()
|
||
|
||
report = f"""
|
||
================================================================================
|
||
银行存款余额调节表
|
||
截止日期: {as_of_date}
|
||
================================================================================
|
||
|
||
一、银行对账单余额 {bank_balance:>15,.2f}
|
||
|
||
加: 企业已收/银行未收 {book_receipts_not_in_bank:>15,.2f}
|
||
减: 企业已付/银行未付 {abs(book_payments_not_in_bank):>15,.2f}
|
||
─────────────────
|
||
调节后余额 {bank_balance + book_receipts_not_in_bank + book_payments_not_in_bank:>15,.2f}
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
二、企业账面余额 {book_balance:>15,.2f}
|
||
|
||
加: 银行已收/企业未收 {bank_receipts_not_in_book:>15,.2f}
|
||
减: 银行已付/企业未付 {abs(bank_payments_not_in_book):>15,.2f}
|
||
─────────────────
|
||
调节后余额 {book_balance + bank_receipts_not_in_book + bank_payments_not_in_book:>15,.2f}
|
||
|
||
================================================================================
|
||
未达账项明细
|
||
================================================================================
|
||
|
||
【银行已收/企业未收】
|
||
"""
|
||
|
||
if len(bank_only[bank_only['amount'] > 0]) > 0:
|
||
for _, row in bank_only[bank_only['amount'] > 0].iterrows():
|
||
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
|
||
else:
|
||
report += " (无)\n"
|
||
|
||
report += "\n【银行已付/企业未付】\n"
|
||
if len(bank_only[bank_only['amount'] < 0]) > 0:
|
||
for _, row in bank_only[bank_only['amount'] < 0].iterrows():
|
||
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
|
||
else:
|
||
report += " (无)\n"
|
||
|
||
report += "\n【企业已收/银行未收】\n"
|
||
if len(system_only[system_only['amount'] > 0]) > 0:
|
||
for _, row in system_only[system_only['amount'] > 0].iterrows():
|
||
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
|
||
else:
|
||
report += " (无)\n"
|
||
|
||
report += "\n【企业已付/银行未付】\n"
|
||
if len(system_only[system_only['amount'] < 0]) > 0:
|
||
for _, row in system_only[system_only['amount'] < 0].iterrows():
|
||
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
|
||
else:
|
||
report += " (无)\n"
|
||
|
||
return report
|
||
```
|
||
|
||
### 应收对账
|
||
|
||
```python
|
||
def generate_ar_statement(
|
||
customer_name: str,
|
||
transactions: pd.DataFrame,
|
||
as_of_date: str
|
||
) -> str:
|
||
"""
|
||
生成客户对账单
|
||
|
||
transactions 格式:
|
||
- date: 日期
|
||
- doc_type: 单据类型 (发票/收款/退货)
|
||
- doc_no: 单据编号
|
||
- debit: 借方(应收增加)
|
||
- credit: 贷方(应收减少)
|
||
- memo: 摘要
|
||
"""
|
||
|
||
df = transactions.sort_values('date')
|
||
|
||
report = f"""
|
||
================================================================================
|
||
客户对账单
|
||
================================================================================
|
||
客户名称: {customer_name}
|
||
对账期间: {df['date'].min().strftime('%Y-%m-%d')} 至 {as_of_date}
|
||
================================================================================
|
||
|
||
日期 单据类型 单据编号 借方 贷方 余额
|
||
--------------------------------------------------------------------------------
|
||
"""
|
||
|
||
balance = 0
|
||
for _, row in df.iterrows():
|
||
debit = row.get('debit', 0) or 0
|
||
credit = row.get('credit', 0) or 0
|
||
balance += debit - credit
|
||
|
||
report += f"{row['date'].strftime('%Y-%m-%d')} {row['doc_type']:<8} {row['doc_no']:<16} {debit:>12,.2f} {credit:>12,.2f} {balance:>12,.2f}\n"
|
||
|
||
report += f"""
|
||
--------------------------------------------------------------------------------
|
||
合计 {df['debit'].sum():>12,.2f} {df['credit'].sum():>12,.2f} {balance:>12,.2f}
|
||
================================================================================
|
||
|
||
期末应收余额: {balance:,.2f}
|
||
|
||
请贵司核对以上账目,如有异议请于收到对账单后 7 日内书面告知。
|
||
如无异议,视为确认。
|
||
|
||
对账联系人: _______________
|
||
联系电话: _______________
|
||
"""
|
||
|
||
return report
|
||
```
|
||
|
||
### 账龄分析
|
||
|
||
```python
|
||
from datetime import datetime, timedelta
|
||
|
||
def aging_analysis(
|
||
receivables: pd.DataFrame,
|
||
as_of_date: str,
|
||
aging_buckets: list = [30, 60, 90, 180, 365]
|
||
) -> pd.DataFrame:
|
||
"""
|
||
应收账龄分析
|
||
|
||
receivables 格式:
|
||
- customer: 客户
|
||
- invoice_date: 发票日期
|
||
- amount: 未收金额
|
||
"""
|
||
|
||
as_of = datetime.strptime(as_of_date, '%Y-%m-%d')
|
||
|
||
def get_aging_bucket(invoice_date):
|
||
days = (as_of - invoice_date).days
|
||
for i, bucket in enumerate(aging_buckets):
|
||
if days <= bucket:
|
||
if i == 0:
|
||
return f'0-{bucket}天'
|
||
else:
|
||
return f'{aging_buckets[i-1]+1}-{bucket}天'
|
||
return f'{aging_buckets[-1]+1}天以上'
|
||
|
||
df = receivables.copy()
|
||
df['invoice_date'] = pd.to_datetime(df['invoice_date'])
|
||
df['aging_days'] = (as_of - df['invoice_date']).dt.days
|
||
df['aging_bucket'] = df['invoice_date'].apply(get_aging_bucket)
|
||
|
||
# 按客户和账龄分组汇总
|
||
summary = df.pivot_table(
|
||
index='customer',
|
||
columns='aging_bucket',
|
||
values='amount',
|
||
aggfunc='sum',
|
||
fill_value=0
|
||
)
|
||
|
||
# 添加合计列
|
||
summary['合计'] = summary.sum(axis=1)
|
||
|
||
# 添加合计行
|
||
summary.loc['合计'] = summary.sum()
|
||
|
||
return summary
|
||
```
|
||
|
||
---
|
||
|
||
## 对账报告模板
|
||
|
||
### Excel 对账报告
|
||
|
||
```python
|
||
from openpyxl import Workbook
|
||
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
|
||
|
||
def export_reconciliation_to_excel(
|
||
result: dict,
|
||
output_path: str,
|
||
report_date: str
|
||
):
|
||
"""导出对账结果到 Excel"""
|
||
|
||
wb = Workbook()
|
||
|
||
# Sheet1: 对账汇总
|
||
ws_summary = wb.active
|
||
ws_summary.title = "对账汇总"
|
||
|
||
summary = result['summary']
|
||
ws_summary['A1'] = '对账汇总报告'
|
||
ws_summary['A1'].font = Font(bold=True, size=14)
|
||
ws_summary['A2'] = f'对账日期: {report_date}'
|
||
|
||
headers = ['项目', '金额', '笔数']
|
||
for col, header in enumerate(headers, 1):
|
||
ws_summary.cell(row=4, column=col, value=header)
|
||
|
||
data = [
|
||
('银行流水合计', summary['bank_total'], '-'),
|
||
('系统记录合计', summary['system_total'], '-'),
|
||
('已匹配', '-', summary['matched_count']),
|
||
('银行有/系统无', summary['bank_only_amount'], summary['bank_only_count']),
|
||
('系统有/银行无', summary['system_only_amount'], summary['system_only_count']),
|
||
('差异金额', summary['difference'], '-'),
|
||
]
|
||
|
||
for row, (item, amount, count) in enumerate(data, 5):
|
||
ws_summary.cell(row=row, column=1, value=item)
|
||
ws_summary.cell(row=row, column=2, value=amount)
|
||
ws_summary.cell(row=row, column=3, value=count)
|
||
|
||
# Sheet2: 银行未达账项
|
||
ws_bank = wb.create_sheet("银行未达账项")
|
||
if len(result['bank_only']) > 0:
|
||
for col, header in enumerate(result['bank_only'].columns, 1):
|
||
ws_bank.cell(row=1, column=col, value=header)
|
||
for row, data_row in enumerate(result['bank_only'].values, 2):
|
||
for col, value in enumerate(data_row, 1):
|
||
ws_bank.cell(row=row, column=col, value=value)
|
||
|
||
# Sheet3: 企业未达账项
|
||
ws_system = wb.create_sheet("企业未达账项")
|
||
if len(result['system_only']) > 0:
|
||
for col, header in enumerate(result['system_only'].columns, 1):
|
||
ws_system.cell(row=1, column=col, value=header)
|
||
for row, data_row in enumerate(result['system_only'].values, 2):
|
||
for col, value in enumerate(data_row, 1):
|
||
ws_system.cell(row=row, column=col, value=value)
|
||
|
||
wb.save(output_path)
|
||
print(f"对账报告已保存: {output_path}")
|
||
```
|
||
|
||
---
|
||
|
||
## 常见对账场景
|
||
|
||
### 场景1: 月末银行对账
|
||
|
||
```python
|
||
# 1. 加载数据
|
||
bank_df = load_bank_statement('银行流水_202601.xlsx')
|
||
system_df = load_system_records('收付款记录_202601.xlsx')
|
||
|
||
# 2. 执行对账
|
||
result = reconcile(bank_df, system_df)
|
||
|
||
# 3. 查看汇总
|
||
print(f"已匹配: {result['summary']['matched_count']} 笔")
|
||
print(f"银行未达: {result['summary']['bank_only_count']} 笔, 金额 {result['summary']['bank_only_amount']:,.2f}")
|
||
print(f"企业未达: {result['summary']['system_only_count']} 笔, 金额 {result['summary']['system_only_amount']:,.2f}")
|
||
|
||
# 4. 生成调节表
|
||
report = generate_bank_reconciliation_report(
|
||
bank_balance=150000.00, # 银行对账单余额
|
||
book_balance=145000.00, # 企业账面余额
|
||
bank_only=result['bank_only'],
|
||
system_only=result['system_only'],
|
||
as_of_date='2026-01-31'
|
||
)
|
||
print(report)
|
||
|
||
# 5. 导出 Excel
|
||
export_reconciliation_to_excel(result, '银行对账_202601.xlsx', '2026-01-31')
|
||
```
|
||
|
||
### 场景2: 客户对账
|
||
|
||
```python
|
||
# 加载客户交易明细
|
||
ar_transactions = pd.read_excel('应收明细.xlsx')
|
||
|
||
# 筛选特定客户
|
||
customer_data = ar_transactions[ar_transactions['customer'] == '深圳XX公司']
|
||
|
||
# 生成对账单
|
||
statement = generate_ar_statement(
|
||
customer_name='深圳XX公司',
|
||
transactions=customer_data,
|
||
as_of_date='2026-01-31'
|
||
)
|
||
print(statement)
|
||
```
|
||
|
||
### 场景3: 账龄分析
|
||
|
||
```python
|
||
# 加载未收款数据
|
||
receivables = pd.read_excel('应收账款.xlsx')
|
||
|
||
# 账龄分析
|
||
aging = aging_analysis(
|
||
receivables=receivables,
|
||
as_of_date='2026-01-31',
|
||
aging_buckets=[30, 60, 90, 180, 365]
|
||
)
|
||
print(aging)
|
||
|
||
# 导出
|
||
aging.to_excel('账龄分析_202601.xlsx')
|
||
```
|
||
|
||
### 场景4: 扫描单据OCR识别与对账匹配(诉讼证据核对)
|
||
|
||
适用于:承运商/供应商提供扫描件作为诉讼证据,需要与我司对账单进行匹配核对。
|
||
|
||
```python
|
||
from pdf2image import convert_from_path
|
||
import pytesseract
|
||
import pandas as pd
|
||
import re
|
||
|
||
def ocr_pdf_extract_orders(pdf_path: str, dpi: int = 150) -> pd.DataFrame:
|
||
"""
|
||
从 PDF 扫描件中 OCR 提取订单信息
|
||
|
||
Args:
|
||
pdf_path: PDF 文件路径
|
||
dpi: 图像分辨率,越高识别越准但速度越慢
|
||
|
||
Returns:
|
||
DataFrame 包含 order_no, page, tc_no 等字段
|
||
"""
|
||
print(f"正在转换 PDF...")
|
||
images = convert_from_path(pdf_path, dpi=dpi)
|
||
print(f"共 {len(images)} 页,开始 OCR 识别...")
|
||
|
||
all_records = []
|
||
for i, img in enumerate(images, 1):
|
||
if i % 10 == 0:
|
||
print(f" 处理进度: {i}/{len(images)}")
|
||
|
||
text = pytesseract.image_to_string(img, lang='chi_sim+eng')
|
||
|
||
# 提取多种订单号格式(根据实际业务调整)
|
||
hm_orders = re.findall(r'(HM\d{12,15})(?:-\d+)?', text) # 华住订单
|
||
hpo_orders = re.findall(r'(HPO-\d{8}-\d+)', text) # 华住采购订单
|
||
fy_orders = re.findall(r'(FY\d{9,})', text) # 凤悦订单
|
||
phgco_orders = re.findall(r'(PHGCO\d{9,})', text) # 锦江订单
|
||
tc_orders = re.findall(r'(TC\d{8,})', text) # 运单号
|
||
|
||
# 提取日期
|
||
dates = re.findall(r'(\d{4}-\d{2}-\d{2})', text)
|
||
|
||
all_orders = hm_orders + hpo_orders + fy_orders + phgco_orders
|
||
for order in all_orders:
|
||
all_records.append({
|
||
'order_no': order,
|
||
'page': i,
|
||
'tc_no': tc_orders[0] if tc_orders else None,
|
||
'date': dates[0] if dates else None,
|
||
'raw_text': text[:500] # 保留部分原文用于核查
|
||
})
|
||
|
||
df = pd.DataFrame(all_records)
|
||
df = df.drop_duplicates(subset=['order_no'])
|
||
print(f"提取完成,共 {len(df)} 条唯一订单")
|
||
return df
|
||
|
||
|
||
def extract_base_order_no(order_no: str) -> str:
|
||
"""
|
||
提取基础订单号(移除后缀如 -1, -2)
|
||
|
||
例如: HM202505227320-1 -> HM202505227320
|
||
"""
|
||
if pd.isna(order_no):
|
||
return None
|
||
order_str = str(order_no)
|
||
match = re.match(r'(HM\d{12,15}|HPO-\d{8}-\d+|FY\d{9,}|PHGCO\d{9,})', order_str)
|
||
return match.group(1) if match else order_str
|
||
|
||
|
||
def reconcile_scanned_with_excel(
|
||
scanned_df: pd.DataFrame,
|
||
excel_path: str,
|
||
main_sheet: str,
|
||
pending_sheet: str = None,
|
||
order_column: str = '华住/锦江订单号'
|
||
) -> dict:
|
||
"""
|
||
扫描单据与 Excel 对账单匹配
|
||
|
||
Args:
|
||
scanned_df: OCR 提取的订单 DataFrame
|
||
excel_path: Excel 对账单路径
|
||
main_sheet: 主数据 Sheet 名称
|
||
pending_sheet: 待结算/次月结 Sheet 名称(可选)
|
||
order_column: 订单号列名
|
||
|
||
Returns:
|
||
{
|
||
'matched_main': 在主表中匹配的订单,
|
||
'matched_pending': 在待结算表中匹配的订单,
|
||
'not_found': 未找到的订单,
|
||
'summary': 汇总统计
|
||
}
|
||
"""
|
||
# 加载主表
|
||
df_main = pd.read_excel(excel_path, sheet_name=main_sheet)
|
||
# 处理表头(如果第一行是表头)
|
||
if df_main.iloc[0].astype(str).str.contains(order_column).any():
|
||
df_main.columns = df_main.iloc[0]
|
||
df_main = df_main.iloc[1:].reset_index(drop=True)
|
||
|
||
# 提取基础订单号
|
||
main_orders = set(df_main[order_column].apply(extract_base_order_no).dropna())
|
||
|
||
# 加载待结算表(如有)
|
||
pending_orders = set()
|
||
df_pending = None
|
||
if pending_sheet:
|
||
try:
|
||
df_pending = pd.read_excel(excel_path, sheet_name=pending_sheet)
|
||
pending_orders = set(df_pending[order_column].apply(extract_base_order_no).dropna())
|
||
except:
|
||
pass
|
||
|
||
all_excel_orders = main_orders | pending_orders
|
||
|
||
# 扫描件订单号
|
||
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
|
||
|
||
# 匹配分析
|
||
matched_main = scan_orders & main_orders
|
||
matched_pending = scan_orders & pending_orders
|
||
not_found = scan_orders - all_excel_orders
|
||
|
||
# 汇总
|
||
summary = {
|
||
'scan_total': len(scan_orders),
|
||
'matched_main_count': len(matched_main),
|
||
'matched_pending_count': len(matched_pending),
|
||
'not_found_count': len(not_found),
|
||
'match_rate': (len(matched_main) + len(matched_pending)) / len(scan_orders) * 100 if scan_orders else 0
|
||
}
|
||
|
||
return {
|
||
'matched_main': matched_main,
|
||
'matched_pending': matched_pending,
|
||
'not_found': not_found,
|
||
'summary': summary,
|
||
'df_main': df_main,
|
||
'df_pending': df_pending
|
||
}
|
||
|
||
|
||
def generate_litigation_reconciliation_report(
|
||
result: dict,
|
||
scanned_df: pd.DataFrame,
|
||
output_path: str
|
||
):
|
||
"""
|
||
生成诉讼对账分析报告
|
||
"""
|
||
summary = result['summary']
|
||
|
||
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
|
||
# Sheet1: 汇总
|
||
summary_df = pd.DataFrame({
|
||
'项目': [
|
||
'扫描件订单数',
|
||
'在主对账单中找到',
|
||
'在待结算表中找到',
|
||
'完全未找到',
|
||
'匹配率'
|
||
],
|
||
'数值': [
|
||
summary['scan_total'],
|
||
summary['matched_main_count'],
|
||
summary['matched_pending_count'],
|
||
summary['not_found_count'],
|
||
f"{summary['match_rate']:.1f}%"
|
||
]
|
||
})
|
||
summary_df.to_excel(writer, sheet_name='对账汇总', index=False)
|
||
|
||
# Sheet2: 未找到的订单
|
||
if result['not_found']:
|
||
not_found_df = scanned_df[
|
||
scanned_df['order_no'].apply(extract_base_order_no).isin(result['not_found'])
|
||
][['order_no', 'page', 'tc_no', 'date']]
|
||
not_found_df.columns = ['订单号', 'PDF页码', '运单号', '日期']
|
||
not_found_df.to_excel(writer, sheet_name='未找到订单', index=False)
|
||
|
||
# Sheet3: 待结算订单详情
|
||
if result['matched_pending'] and result['df_pending'] is not None:
|
||
pending_detail = result['df_pending'][
|
||
result['df_pending']['华住/锦江订单号'].apply(extract_base_order_no).isin(result['matched_pending'])
|
||
]
|
||
pending_detail.to_excel(writer, sheet_name='待结算订单', index=False)
|
||
|
||
print(f"报告已保存: {output_path}")
|
||
```
|
||
|
||
#### 使用示例
|
||
|
||
```python
|
||
# 1. 从 PDF 扫描件提取订单
|
||
pdf_path = "承运商证据_6月业务.pdf"
|
||
scanned_df = ocr_pdf_extract_orders(pdf_path, dpi=150)
|
||
|
||
# 2. 与 Excel 对账单匹配
|
||
result = reconcile_scanned_with_excel(
|
||
scanned_df=scanned_df,
|
||
excel_path="对账单_2025年6月.xlsx",
|
||
main_sheet="2025.6",
|
||
pending_sheet="次月结",
|
||
order_column="华住/锦江订单号"
|
||
)
|
||
|
||
# 3. 查看匹配结果
|
||
print(f"匹配率: {result['summary']['match_rate']:.1f}%")
|
||
print(f"未找到: {result['summary']['not_found_count']} 条")
|
||
|
||
if result['not_found']:
|
||
print("\n未找到的订单:")
|
||
for order in result['not_found']:
|
||
print(f" {order}")
|
||
|
||
# 4. 生成报告
|
||
generate_litigation_reconciliation_report(
|
||
result=result,
|
||
scanned_df=scanned_df,
|
||
output_path="诉讼对账分析报告.xlsx"
|
||
)
|
||
```
|
||
|
||
#### OCR 识别注意事项
|
||
|
||
1. **环境依赖**:
|
||
```bash
|
||
# macOS
|
||
brew install tesseract tesseract-lang poppler
|
||
pip install pdf2image pytesseract pandas openpyxl
|
||
```
|
||
|
||
2. **识别精度**:
|
||
- DPI 150 适合快速扫描,DPI 200-250 适合精确识别
|
||
- 中文识别需要 `chi_sim` 语言包
|
||
|
||
3. **订单号格式差异**:
|
||
- PDF 中可能是 `HM202505227320`
|
||
- Excel 中可能是 `HM202505227320-1`、`HM202505227320-2`(带后缀)
|
||
- 匹配时应使用基础订单号
|
||
|
||
4. **常见问题**:
|
||
- 空白页(签收单背面)会被自动跳过
|
||
- OCR 可能误识别字符(如 5→S, 0→O),建议人工抽查
|
||
- 大 PDF 处理较慢,可先用低 DPI 预览
|
||
|
||
5. **"次月结"/"待结算" 订单**:
|
||
- 这类订单通常标记为"没有报价"、"未建单"等
|
||
- 虽然在系统中有记录,但可能未计入正式结算金额
|
||
- 诉讼中需特别关注这类订单的状态
|
||
|
||
6. **多承运商订单归属校验(重要风险点)**:
|
||
- 承运商可能将其他公司的订单混入证据,企图多收运费
|
||
- 必须检查PDF订单是否出现在其他承运商的对账单中
|
||
- 如果订单同时出现在多个承运商对账单中,则为异常订单
|
||
|
||
#### 多承运商订单归属校验
|
||
|
||
```python
|
||
def check_order_ownership(
|
||
scanned_df: pd.DataFrame,
|
||
my_company_excel: str,
|
||
other_carrier_excels: list[str],
|
||
order_column: str = '华住/锦江订单号'
|
||
) -> dict:
|
||
"""
|
||
检查订单归属,识别是否有其他承运商的订单被混入
|
||
|
||
Args:
|
||
scanned_df: OCR 提取的订单 DataFrame
|
||
my_company_excel: 我司对账单路径
|
||
other_carrier_excels: 其他承运商对账单路径列表
|
||
order_column: 订单号列名
|
||
|
||
Returns:
|
||
{
|
||
'my_orders': 属于我司的订单,
|
||
'other_carrier_orders': 属于其他承运商的订单(异常),
|
||
'disputed_orders': 同时出现在多处的争议订单,
|
||
'summary': 汇总
|
||
}
|
||
"""
|
||
# 提取PDF订单
|
||
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
|
||
|
||
# 加载我司对账单订单
|
||
my_orders = set()
|
||
try:
|
||
xl = pd.ExcelFile(my_company_excel)
|
||
for sheet in xl.sheet_names:
|
||
df = pd.read_excel(my_company_excel, sheet_name=sheet)
|
||
for col in df.columns:
|
||
if '订单' in str(col):
|
||
orders = df[col].apply(extract_base_order_no).dropna()
|
||
my_orders.update(orders)
|
||
except Exception as e:
|
||
print(f"加载我司对账单失败: {e}")
|
||
|
||
# 加载其他承运商对账单订单
|
||
other_orders = {} # {carrier_name: set of orders}
|
||
for excel_path in other_carrier_excels:
|
||
carrier_name = excel_path.split('/')[-1]
|
||
carrier_orders = set()
|
||
try:
|
||
xl = pd.ExcelFile(excel_path)
|
||
for sheet in xl.sheet_names:
|
||
df = pd.read_excel(excel_path, sheet_name=sheet)
|
||
for col in df.columns:
|
||
if '订单' in str(col):
|
||
orders = df[col].apply(extract_base_order_no).dropna()
|
||
carrier_orders.update(orders)
|
||
other_orders[carrier_name] = carrier_orders
|
||
except:
|
||
pass
|
||
|
||
# 分析订单归属
|
||
all_other_orders = set()
|
||
for orders in other_orders.values():
|
||
all_other_orders.update(orders)
|
||
|
||
# 分类
|
||
only_mine = scan_orders & my_orders - all_other_orders # 仅在我司
|
||
only_others = scan_orders & all_other_orders - my_orders # 仅在其他承运商(异常!)
|
||
in_both = scan_orders & my_orders & all_other_orders # 同时出现(争议)
|
||
nowhere = scan_orders - my_orders - all_other_orders # 哪里都没有
|
||
|
||
# 识别异常订单属于哪个承运商
|
||
abnormal_details = []
|
||
for order in only_others:
|
||
for carrier, orders in other_orders.items():
|
||
if order in orders:
|
||
page = scanned_df[scanned_df['order_no'].apply(extract_base_order_no) == order]['page'].values
|
||
abnormal_details.append({
|
||
'order_no': order,
|
||
'belongs_to': carrier,
|
||
'pdf_page': page[0] if len(page) > 0 else None
|
||
})
|
||
|
||
summary = {
|
||
'scan_total': len(scan_orders),
|
||
'my_orders_count': len(only_mine),
|
||
'other_carrier_count': len(only_others), # 异常订单数
|
||
'disputed_count': len(in_both),
|
||
'not_found_count': len(nowhere)
|
||
}
|
||
|
||
return {
|
||
'my_orders': only_mine,
|
||
'other_carrier_orders': only_others,
|
||
'abnormal_details': abnormal_details,
|
||
'disputed_orders': in_both,
|
||
'not_found': nowhere,
|
||
'summary': summary
|
||
}
|
||
```
|
||
|
||
#### 使用示例(多承运商校验)
|
||
|
||
```python
|
||
# 检查订单归属
|
||
result = check_order_ownership(
|
||
scanned_df=scanned_df,
|
||
my_company_excel="智慧云彩对账单_2025年6月.xlsx",
|
||
other_carrier_excels=[
|
||
"友利速运对账单_2025年6月.xlsx",
|
||
"其他承运商对账单.xlsx"
|
||
]
|
||
)
|
||
|
||
# 检查是否有异常订单
|
||
if result['summary']['other_carrier_count'] > 0:
|
||
print(f"⚠️ 发现 {result['summary']['other_carrier_count']} 个异常订单!")
|
||
print("这些订单属于其他承运商,不应由我司支付:")
|
||
for item in result['abnormal_details']:
|
||
print(f" {item['order_no']} -> 属于 {item['belongs_to']} (PDF第{item['pdf_page']}页)")
|
||
else:
|
||
print("✓ 未发现异常订单,所有订单归属正确")
|
||
```
|
||
|
||
---
|
||
|
||
## 注意事项
|
||
|
||
1. **数据格式统一**: 确保日期、金额格式一致后再对账
|
||
2. **匹配规则灵活**: 先精确匹配流水号,再模糊匹配金额+日期
|
||
3. **差异原因分析**: 对未达账项要逐笔核实原因
|
||
4. **定期对账**: 建议至少每月进行一次银行对账
|
||
5. **凭证留存**: 对账报告需打印签字存档
|
||
6. **OCR 核对**: 扫描件 OCR 后应人工抽查,确保识别准确
|
||
7. **多承运商校验**: 诉讼场景下必须检查订单是否属于其他承运商,防止对方混入他人订单骗取运费
|
||
|
||
---
|
||
|
||
## 与其他技能协作
|
||
|
||
- **data-excel**: 处理复杂的 Excel 数据转换
|
||
- **feishu**: 将对账报告保存到飞书云文档
|
||
- **siyuan**: 记录对账工作日志到思源笔记
|