Files
John Qiu 712063071c refactor: 通用技能按类别拆分为独立目录
skills/ → skills-dev(9), skills-req(10), skills-ops(4),
skills-integration(8), skills-biz(4), skills-workflow(7)

generate-marketplace.py 改为自动扫描所有 skills-* 目录。

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 11:31:58 +10:30

949 lines
31 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: finance
description: 财务对账技能。支持银行对账、应收对账、应付对账、内部往来对账等。通过自然语言实现账单导入、差异分析、对账报告生成。当用户提到对账、银行流水、应收应付、账单核对、财务核销相关任务时自动激活。
---
# 财务对账技能
## 功能概述
- **银行对账**: 银行流水 vs 系统收付款记录匹配
- **应收对账**: 客户对账单生成与核对
- **应付对账**: 供应商对账单核对
- **内部对账**: 公司间/部门间往来核对
- **差异分析**: 自动识别未达账项、差异原因
- **对账报告**: 生成标准化对账报告
- **扫描单据OCR**: PDF/图片扫描件识别,提取订单号与对账单匹配(适用于诉讼证据核对)
---
## 对账类型
### 1. 银行对账
| 对账项 | 说明 |
|--------|------|
| 银行已收/企业未收 | 银行已入账,系统未记录 |
| 企业已收/银行未收 | 系统已记录,银行未入账 |
| 银行已付/企业未付 | 银行已扣款,系统未记录 |
| 企业已付/银行未付 | 系统已记录,银行未扣款 |
### 2. 应收对账
- 客户欠款余额核对
- 发票与回款匹配
- 账龄分析
### 3. 应付对账
- 供应商应付余额核对
- 采购入库与付款匹配
- 预付款冲销
---
## 数据格式要求
### 银行流水格式
```csv
交易日期,交易类型,摘要,金额,余额,对方账户,对方名称,交易流水号
2026-01-15,收入,货款,50000.00,150000.00,622848******1234,深圳XX公司,TXN20260115001
2026-01-16,支出,采购款,-30000.00,120000.00,622848******5678,广州YY公司,TXN20260116001
```
### 系统收付款格式
```csv
单据日期,单据类型,单据编号,金额,客户/供应商,备注,关联流水号
2026-01-15,收款单,SK20260115001,50000.00,深圳XX公司,货款,TXN20260115001
2026-01-16,付款单,FK20260116001,30000.00,广州YY公司,采购款,
```
---
## 自然语言操作示例
### 银行对账
| 用户说 | 执行操作 |
|--------|----------|
| "导入银行流水" | 解析银行导出的 Excel/CSV |
| "和系统记录对账" | 自动匹配流水号/金额/日期 |
| "找出未达账项" | 识别双方差异记录 |
| "生成银行余额调节表" | 输出标准调节表 |
### 应收对账
| 用户说 | 执行操作 |
|--------|----------|
| "生成客户对账单" | 按客户汇总应收明细 |
| "核对XX公司的应收款" | 单客户对账 |
| "应收账龄分析" | 按账龄分段统计 |
| "哪些发票还没收到款" | 查询未核销发票 |
### 应付对账
| 用户说 | 执行操作 |
|--------|----------|
| "核对供应商对账单" | 导入并匹配供应商账单 |
| "本月应付汇总" | 按供应商统计应付 |
| "哪些采购单还没付款" | 查询未付款采购单 |
### 扫描单据OCR对账
| 用户说 | 执行操作 |
|--------|----------|
| "识别扫描件与对账单匹配" | OCR 提取 PDF 订单号并与 Excel 对账 |
| "承运商起诉了,核对证据" | 诉讼证据对账分析 |
| "PDF发票识别" | OCR 提取发票信息 |
| "找出对账单没有的订单" | 差异订单识别 |
---
## Python 代码模板
### 银行对账核心逻辑
```python
import pandas as pd
from datetime import datetime
def load_bank_statement(file_path: str) -> pd.DataFrame:
"""加载银行流水"""
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
# 标准化列名
column_mapping = {
'交易日期': 'date',
'金额': 'amount',
'交易流水号': 'ref_no',
'对方名称': 'counterparty',
'摘要': 'memo'
}
df = df.rename(columns=column_mapping)
df['date'] = pd.to_datetime(df['date'])
df['source'] = 'bank'
return df
def load_system_records(file_path: str) -> pd.DataFrame:
"""加载系统收付款记录"""
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
column_mapping = {
'单据日期': 'date',
'金额': 'amount',
'关联流水号': 'ref_no',
'客户/供应商': 'counterparty',
'单据编号': 'doc_no'
}
df = df.rename(columns=column_mapping)
df['date'] = pd.to_datetime(df['date'])
df['source'] = 'system'
return df
def reconcile(bank_df: pd.DataFrame, system_df: pd.DataFrame) -> dict:
"""
执行对账
Returns:
{
'matched': 已匹配记录,
'bank_only': 银行有/系统无 (银行未达),
'system_only': 系统有/银行无 (企业未达),
'summary': 对账汇总
}
"""
# 方法1: 按流水号精确匹配
bank_with_ref = bank_df[bank_df['ref_no'].notna() & (bank_df['ref_no'] != '')]
system_with_ref = system_df[system_df['ref_no'].notna() & (system_df['ref_no'] != '')]
matched_by_ref = pd.merge(
bank_with_ref,
system_with_ref,
on='ref_no',
suffixes=('_bank', '_system'),
how='inner'
)
matched_refs = set(matched_by_ref['ref_no'])
# 方法2: 无流水号的按金额+日期模糊匹配
bank_unmatched = bank_df[~bank_df['ref_no'].isin(matched_refs)]
system_unmatched = system_df[~system_df['ref_no'].isin(matched_refs)]
# 按金额和日期(±3天)匹配
fuzzy_matched = []
bank_remaining = bank_unmatched.copy()
system_remaining = system_unmatched.copy()
for idx, bank_row in bank_unmatched.iterrows():
for sys_idx, sys_row in system_remaining.iterrows():
if (abs(bank_row['amount'] - sys_row['amount']) < 0.01 and
abs((bank_row['date'] - sys_row['date']).days) <= 3):
fuzzy_matched.append({
'bank_idx': idx,
'system_idx': sys_idx,
'amount': bank_row['amount'],
'bank_date': bank_row['date'],
'system_date': sys_row['date']
})
system_remaining = system_remaining.drop(sys_idx)
bank_remaining = bank_remaining.drop(idx)
break
# 未匹配项
bank_only = bank_remaining # 银行有/系统无
system_only = system_remaining # 系统有/银行无
# 汇总
summary = {
'bank_total': bank_df['amount'].sum(),
'system_total': system_df['amount'].sum(),
'matched_count': len(matched_by_ref) + len(fuzzy_matched),
'bank_only_count': len(bank_only),
'bank_only_amount': bank_only['amount'].sum() if len(bank_only) > 0 else 0,
'system_only_count': len(system_only),
'system_only_amount': system_only['amount'].sum() if len(system_only) > 0 else 0,
'difference': bank_df['amount'].sum() - system_df['amount'].sum()
}
return {
'matched': pd.concat([matched_by_ref, pd.DataFrame(fuzzy_matched)]),
'bank_only': bank_only,
'system_only': system_only,
'summary': summary
}
```
### 银行余额调节表
```python
def generate_bank_reconciliation_report(
bank_balance: float,
book_balance: float,
bank_only: pd.DataFrame,
system_only: pd.DataFrame,
as_of_date: str
) -> str:
"""生成银行余额调节表"""
# 分类未达账项
bank_receipts_not_in_book = bank_only[bank_only['amount'] > 0]['amount'].sum()
bank_payments_not_in_book = bank_only[bank_only['amount'] < 0]['amount'].sum()
book_receipts_not_in_bank = system_only[system_only['amount'] > 0]['amount'].sum()
book_payments_not_in_bank = system_only[system_only['amount'] < 0]['amount'].sum()
report = f"""
================================================================================
银行存款余额调节表
截止日期: {as_of_date}
================================================================================
一、银行对账单余额 {bank_balance:>15,.2f}
加: 企业已收/银行未收 {book_receipts_not_in_bank:>15,.2f}
减: 企业已付/银行未付 {abs(book_payments_not_in_bank):>15,.2f}
─────────────────
调节后余额 {bank_balance + book_receipts_not_in_bank + book_payments_not_in_bank:>15,.2f}
--------------------------------------------------------------------------------
二、企业账面余额 {book_balance:>15,.2f}
加: 银行已收/企业未收 {bank_receipts_not_in_book:>15,.2f}
减: 银行已付/企业未付 {abs(bank_payments_not_in_book):>15,.2f}
─────────────────
调节后余额 {book_balance + bank_receipts_not_in_book + bank_payments_not_in_book:>15,.2f}
================================================================================
未达账项明细
================================================================================
【银行已收/企业未收】
"""
if len(bank_only[bank_only['amount'] > 0]) > 0:
for _, row in bank_only[bank_only['amount'] > 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【银行已付/企业未付】\n"
if len(bank_only[bank_only['amount'] < 0]) > 0:
for _, row in bank_only[bank_only['amount'] < 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【企业已收/银行未收】\n"
if len(system_only[system_only['amount'] > 0]) > 0:
for _, row in system_only[system_only['amount'] > 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【企业已付/银行未付】\n"
if len(system_only[system_only['amount'] < 0]) > 0:
for _, row in system_only[system_only['amount'] < 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
return report
```
### 应收对账
```python
def generate_ar_statement(
customer_name: str,
transactions: pd.DataFrame,
as_of_date: str
) -> str:
"""
生成客户对账单
transactions 格式:
- date: 日期
- doc_type: 单据类型 (发票/收款/退货)
- doc_no: 单据编号
- debit: 借方(应收增加)
- credit: 贷方(应收减少)
- memo: 摘要
"""
df = transactions.sort_values('date')
report = f"""
================================================================================
客户对账单
================================================================================
客户名称: {customer_name}
对账期间: {df['date'].min().strftime('%Y-%m-%d')}{as_of_date}
================================================================================
日期 单据类型 单据编号 借方 贷方 余额
--------------------------------------------------------------------------------
"""
balance = 0
for _, row in df.iterrows():
debit = row.get('debit', 0) or 0
credit = row.get('credit', 0) or 0
balance += debit - credit
report += f"{row['date'].strftime('%Y-%m-%d')} {row['doc_type']:<8} {row['doc_no']:<16} {debit:>12,.2f} {credit:>12,.2f} {balance:>12,.2f}\n"
report += f"""
--------------------------------------------------------------------------------
合计 {df['debit'].sum():>12,.2f} {df['credit'].sum():>12,.2f} {balance:>12,.2f}
================================================================================
期末应收余额: {balance:,.2f}
请贵司核对以上账目,如有异议请于收到对账单后 7 日内书面告知。
如无异议,视为确认。
对账联系人: _______________
联系电话: _______________
"""
return report
```
### 账龄分析
```python
from datetime import datetime, timedelta
def aging_analysis(
receivables: pd.DataFrame,
as_of_date: str,
aging_buckets: list = [30, 60, 90, 180, 365]
) -> pd.DataFrame:
"""
应收账龄分析
receivables 格式:
- customer: 客户
- invoice_date: 发票日期
- amount: 未收金额
"""
as_of = datetime.strptime(as_of_date, '%Y-%m-%d')
def get_aging_bucket(invoice_date):
days = (as_of - invoice_date).days
for i, bucket in enumerate(aging_buckets):
if days <= bucket:
if i == 0:
return f'0-{bucket}'
else:
return f'{aging_buckets[i-1]+1}-{bucket}'
return f'{aging_buckets[-1]+1}天以上'
df = receivables.copy()
df['invoice_date'] = pd.to_datetime(df['invoice_date'])
df['aging_days'] = (as_of - df['invoice_date']).dt.days
df['aging_bucket'] = df['invoice_date'].apply(get_aging_bucket)
# 按客户和账龄分组汇总
summary = df.pivot_table(
index='customer',
columns='aging_bucket',
values='amount',
aggfunc='sum',
fill_value=0
)
# 添加合计列
summary['合计'] = summary.sum(axis=1)
# 添加合计行
summary.loc['合计'] = summary.sum()
return summary
```
---
## 对账报告模板
### Excel 对账报告
```python
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
def export_reconciliation_to_excel(
result: dict,
output_path: str,
report_date: str
):
"""导出对账结果到 Excel"""
wb = Workbook()
# Sheet1: 对账汇总
ws_summary = wb.active
ws_summary.title = "对账汇总"
summary = result['summary']
ws_summary['A1'] = '对账汇总报告'
ws_summary['A1'].font = Font(bold=True, size=14)
ws_summary['A2'] = f'对账日期: {report_date}'
headers = ['项目', '金额', '笔数']
for col, header in enumerate(headers, 1):
ws_summary.cell(row=4, column=col, value=header)
data = [
('银行流水合计', summary['bank_total'], '-'),
('系统记录合计', summary['system_total'], '-'),
('已匹配', '-', summary['matched_count']),
('银行有/系统无', summary['bank_only_amount'], summary['bank_only_count']),
('系统有/银行无', summary['system_only_amount'], summary['system_only_count']),
('差异金额', summary['difference'], '-'),
]
for row, (item, amount, count) in enumerate(data, 5):
ws_summary.cell(row=row, column=1, value=item)
ws_summary.cell(row=row, column=2, value=amount)
ws_summary.cell(row=row, column=3, value=count)
# Sheet2: 银行未达账项
ws_bank = wb.create_sheet("银行未达账项")
if len(result['bank_only']) > 0:
for col, header in enumerate(result['bank_only'].columns, 1):
ws_bank.cell(row=1, column=col, value=header)
for row, data_row in enumerate(result['bank_only'].values, 2):
for col, value in enumerate(data_row, 1):
ws_bank.cell(row=row, column=col, value=value)
# Sheet3: 企业未达账项
ws_system = wb.create_sheet("企业未达账项")
if len(result['system_only']) > 0:
for col, header in enumerate(result['system_only'].columns, 1):
ws_system.cell(row=1, column=col, value=header)
for row, data_row in enumerate(result['system_only'].values, 2):
for col, value in enumerate(data_row, 1):
ws_system.cell(row=row, column=col, value=value)
wb.save(output_path)
print(f"对账报告已保存: {output_path}")
```
---
## 常见对账场景
### 场景1: 月末银行对账
```python
# 1. 加载数据
bank_df = load_bank_statement('银行流水_202601.xlsx')
system_df = load_system_records('收付款记录_202601.xlsx')
# 2. 执行对账
result = reconcile(bank_df, system_df)
# 3. 查看汇总
print(f"已匹配: {result['summary']['matched_count']}")
print(f"银行未达: {result['summary']['bank_only_count']} 笔, 金额 {result['summary']['bank_only_amount']:,.2f}")
print(f"企业未达: {result['summary']['system_only_count']} 笔, 金额 {result['summary']['system_only_amount']:,.2f}")
# 4. 生成调节表
report = generate_bank_reconciliation_report(
bank_balance=150000.00, # 银行对账单余额
book_balance=145000.00, # 企业账面余额
bank_only=result['bank_only'],
system_only=result['system_only'],
as_of_date='2026-01-31'
)
print(report)
# 5. 导出 Excel
export_reconciliation_to_excel(result, '银行对账_202601.xlsx', '2026-01-31')
```
### 场景2: 客户对账
```python
# 加载客户交易明细
ar_transactions = pd.read_excel('应收明细.xlsx')
# 筛选特定客户
customer_data = ar_transactions[ar_transactions['customer'] == '深圳XX公司']
# 生成对账单
statement = generate_ar_statement(
customer_name='深圳XX公司',
transactions=customer_data,
as_of_date='2026-01-31'
)
print(statement)
```
### 场景3: 账龄分析
```python
# 加载未收款数据
receivables = pd.read_excel('应收账款.xlsx')
# 账龄分析
aging = aging_analysis(
receivables=receivables,
as_of_date='2026-01-31',
aging_buckets=[30, 60, 90, 180, 365]
)
print(aging)
# 导出
aging.to_excel('账龄分析_202601.xlsx')
```
### 场景4: 扫描单据OCR识别与对账匹配诉讼证据核对
适用于:承运商/供应商提供扫描件作为诉讼证据,需要与我司对账单进行匹配核对。
```python
from pdf2image import convert_from_path
import pytesseract
import pandas as pd
import re
def ocr_pdf_extract_orders(pdf_path: str, dpi: int = 150) -> pd.DataFrame:
"""
从 PDF 扫描件中 OCR 提取订单信息
Args:
pdf_path: PDF 文件路径
dpi: 图像分辨率,越高识别越准但速度越慢
Returns:
DataFrame 包含 order_no, page, tc_no 等字段
"""
print(f"正在转换 PDF...")
images = convert_from_path(pdf_path, dpi=dpi)
print(f"{len(images)} 页,开始 OCR 识别...")
all_records = []
for i, img in enumerate(images, 1):
if i % 10 == 0:
print(f" 处理进度: {i}/{len(images)}")
text = pytesseract.image_to_string(img, lang='chi_sim+eng')
# 提取多种订单号格式(根据实际业务调整)
hm_orders = re.findall(r'(HM\d{12,15})(?:-\d+)?', text) # 华住订单
hpo_orders = re.findall(r'(HPO-\d{8}-\d+)', text) # 华住采购订单
fy_orders = re.findall(r'(FY\d{9,})', text) # 凤悦订单
phgco_orders = re.findall(r'(PHGCO\d{9,})', text) # 锦江订单
tc_orders = re.findall(r'(TC\d{8,})', text) # 运单号
# 提取日期
dates = re.findall(r'(\d{4}-\d{2}-\d{2})', text)
all_orders = hm_orders + hpo_orders + fy_orders + phgco_orders
for order in all_orders:
all_records.append({
'order_no': order,
'page': i,
'tc_no': tc_orders[0] if tc_orders else None,
'date': dates[0] if dates else None,
'raw_text': text[:500] # 保留部分原文用于核查
})
df = pd.DataFrame(all_records)
df = df.drop_duplicates(subset=['order_no'])
print(f"提取完成,共 {len(df)} 条唯一订单")
return df
def extract_base_order_no(order_no: str) -> str:
"""
提取基础订单号(移除后缀如 -1, -2
例如: HM202505227320-1 -> HM202505227320
"""
if pd.isna(order_no):
return None
order_str = str(order_no)
match = re.match(r'(HM\d{12,15}|HPO-\d{8}-\d+|FY\d{9,}|PHGCO\d{9,})', order_str)
return match.group(1) if match else order_str
def reconcile_scanned_with_excel(
scanned_df: pd.DataFrame,
excel_path: str,
main_sheet: str,
pending_sheet: str = None,
order_column: str = '华住/锦江订单号'
) -> dict:
"""
扫描单据与 Excel 对账单匹配
Args:
scanned_df: OCR 提取的订单 DataFrame
excel_path: Excel 对账单路径
main_sheet: 主数据 Sheet 名称
pending_sheet: 待结算/次月结 Sheet 名称(可选)
order_column: 订单号列名
Returns:
{
'matched_main': 在主表中匹配的订单,
'matched_pending': 在待结算表中匹配的订单,
'not_found': 未找到的订单,
'summary': 汇总统计
}
"""
# 加载主表
df_main = pd.read_excel(excel_path, sheet_name=main_sheet)
# 处理表头(如果第一行是表头)
if df_main.iloc[0].astype(str).str.contains(order_column).any():
df_main.columns = df_main.iloc[0]
df_main = df_main.iloc[1:].reset_index(drop=True)
# 提取基础订单号
main_orders = set(df_main[order_column].apply(extract_base_order_no).dropna())
# 加载待结算表(如有)
pending_orders = set()
df_pending = None
if pending_sheet:
try:
df_pending = pd.read_excel(excel_path, sheet_name=pending_sheet)
pending_orders = set(df_pending[order_column].apply(extract_base_order_no).dropna())
except:
pass
all_excel_orders = main_orders | pending_orders
# 扫描件订单号
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
# 匹配分析
matched_main = scan_orders & main_orders
matched_pending = scan_orders & pending_orders
not_found = scan_orders - all_excel_orders
# 汇总
summary = {
'scan_total': len(scan_orders),
'matched_main_count': len(matched_main),
'matched_pending_count': len(matched_pending),
'not_found_count': len(not_found),
'match_rate': (len(matched_main) + len(matched_pending)) / len(scan_orders) * 100 if scan_orders else 0
}
return {
'matched_main': matched_main,
'matched_pending': matched_pending,
'not_found': not_found,
'summary': summary,
'df_main': df_main,
'df_pending': df_pending
}
def generate_litigation_reconciliation_report(
result: dict,
scanned_df: pd.DataFrame,
output_path: str
):
"""
生成诉讼对账分析报告
"""
summary = result['summary']
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
# Sheet1: 汇总
summary_df = pd.DataFrame({
'项目': [
'扫描件订单数',
'在主对账单中找到',
'在待结算表中找到',
'完全未找到',
'匹配率'
],
'数值': [
summary['scan_total'],
summary['matched_main_count'],
summary['matched_pending_count'],
summary['not_found_count'],
f"{summary['match_rate']:.1f}%"
]
})
summary_df.to_excel(writer, sheet_name='对账汇总', index=False)
# Sheet2: 未找到的订单
if result['not_found']:
not_found_df = scanned_df[
scanned_df['order_no'].apply(extract_base_order_no).isin(result['not_found'])
][['order_no', 'page', 'tc_no', 'date']]
not_found_df.columns = ['订单号', 'PDF页码', '运单号', '日期']
not_found_df.to_excel(writer, sheet_name='未找到订单', index=False)
# Sheet3: 待结算订单详情
if result['matched_pending'] and result['df_pending'] is not None:
pending_detail = result['df_pending'][
result['df_pending']['华住/锦江订单号'].apply(extract_base_order_no).isin(result['matched_pending'])
]
pending_detail.to_excel(writer, sheet_name='待结算订单', index=False)
print(f"报告已保存: {output_path}")
```
#### 使用示例
```python
# 1. 从 PDF 扫描件提取订单
pdf_path = "承运商证据_6月业务.pdf"
scanned_df = ocr_pdf_extract_orders(pdf_path, dpi=150)
# 2. 与 Excel 对账单匹配
result = reconcile_scanned_with_excel(
scanned_df=scanned_df,
excel_path="对账单_2025年6月.xlsx",
main_sheet="2025.6",
pending_sheet="次月结",
order_column="华住/锦江订单号"
)
# 3. 查看匹配结果
print(f"匹配率: {result['summary']['match_rate']:.1f}%")
print(f"未找到: {result['summary']['not_found_count']}")
if result['not_found']:
print("\n未找到的订单:")
for order in result['not_found']:
print(f" {order}")
# 4. 生成报告
generate_litigation_reconciliation_report(
result=result,
scanned_df=scanned_df,
output_path="诉讼对账分析报告.xlsx"
)
```
#### OCR 识别注意事项
1. **环境依赖**:
```bash
# macOS
brew install tesseract tesseract-lang poppler
pip install pdf2image pytesseract pandas openpyxl
```
2. **识别精度**:
- DPI 150 适合快速扫描DPI 200-250 适合精确识别
- 中文识别需要 `chi_sim` 语言包
3. **订单号格式差异**:
- PDF 中可能是 `HM202505227320`
- Excel 中可能是 `HM202505227320-1`、`HM202505227320-2`(带后缀)
- 匹配时应使用基础订单号
4. **常见问题**:
- 空白页(签收单背面)会被自动跳过
- OCR 可能误识别字符(如 5→S, 0→O建议人工抽查
- 大 PDF 处理较慢,可先用低 DPI 预览
5. **"次月结"/"待结算" 订单**:
- 这类订单通常标记为"没有报价"、"未建单"等
- 虽然在系统中有记录,但可能未计入正式结算金额
- 诉讼中需特别关注这类订单的状态
6. **多承运商订单归属校验(重要风险点)**:
- 承运商可能将其他公司的订单混入证据,企图多收运费
- 必须检查PDF订单是否出现在其他承运商的对账单中
- 如果订单同时出现在多个承运商对账单中,则为异常订单
#### 多承运商订单归属校验
```python
def check_order_ownership(
scanned_df: pd.DataFrame,
my_company_excel: str,
other_carrier_excels: list[str],
order_column: str = '华住/锦江订单号'
) -> dict:
"""
检查订单归属,识别是否有其他承运商的订单被混入
Args:
scanned_df: OCR 提取的订单 DataFrame
my_company_excel: 我司对账单路径
other_carrier_excels: 其他承运商对账单路径列表
order_column: 订单号列名
Returns:
{
'my_orders': 属于我司的订单,
'other_carrier_orders': 属于其他承运商的订单(异常),
'disputed_orders': 同时出现在多处的争议订单,
'summary': 汇总
}
"""
# 提取PDF订单
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
# 加载我司对账单订单
my_orders = set()
try:
xl = pd.ExcelFile(my_company_excel)
for sheet in xl.sheet_names:
df = pd.read_excel(my_company_excel, sheet_name=sheet)
for col in df.columns:
if '订单' in str(col):
orders = df[col].apply(extract_base_order_no).dropna()
my_orders.update(orders)
except Exception as e:
print(f"加载我司对账单失败: {e}")
# 加载其他承运商对账单订单
other_orders = {} # {carrier_name: set of orders}
for excel_path in other_carrier_excels:
carrier_name = excel_path.split('/')[-1]
carrier_orders = set()
try:
xl = pd.ExcelFile(excel_path)
for sheet in xl.sheet_names:
df = pd.read_excel(excel_path, sheet_name=sheet)
for col in df.columns:
if '订单' in str(col):
orders = df[col].apply(extract_base_order_no).dropna()
carrier_orders.update(orders)
other_orders[carrier_name] = carrier_orders
except:
pass
# 分析订单归属
all_other_orders = set()
for orders in other_orders.values():
all_other_orders.update(orders)
# 分类
only_mine = scan_orders & my_orders - all_other_orders # 仅在我司
only_others = scan_orders & all_other_orders - my_orders # 仅在其他承运商(异常!)
in_both = scan_orders & my_orders & all_other_orders # 同时出现(争议)
nowhere = scan_orders - my_orders - all_other_orders # 哪里都没有
# 识别异常订单属于哪个承运商
abnormal_details = []
for order in only_others:
for carrier, orders in other_orders.items():
if order in orders:
page = scanned_df[scanned_df['order_no'].apply(extract_base_order_no) == order]['page'].values
abnormal_details.append({
'order_no': order,
'belongs_to': carrier,
'pdf_page': page[0] if len(page) > 0 else None
})
summary = {
'scan_total': len(scan_orders),
'my_orders_count': len(only_mine),
'other_carrier_count': len(only_others), # 异常订单数
'disputed_count': len(in_both),
'not_found_count': len(nowhere)
}
return {
'my_orders': only_mine,
'other_carrier_orders': only_others,
'abnormal_details': abnormal_details,
'disputed_orders': in_both,
'not_found': nowhere,
'summary': summary
}
```
#### 使用示例(多承运商校验)
```python
# 检查订单归属
result = check_order_ownership(
scanned_df=scanned_df,
my_company_excel="智慧云彩对账单_2025年6月.xlsx",
other_carrier_excels=[
"友利速运对账单_2025年6月.xlsx",
"其他承运商对账单.xlsx"
]
)
# 检查是否有异常订单
if result['summary']['other_carrier_count'] > 0:
print(f"⚠️ 发现 {result['summary']['other_carrier_count']} 个异常订单!")
print("这些订单属于其他承运商,不应由我司支付:")
for item in result['abnormal_details']:
print(f" {item['order_no']} -> 属于 {item['belongs_to']} (PDF第{item['pdf_page']}页)")
else:
print("✓ 未发现异常订单,所有订单归属正确")
```
---
## 注意事项
1. **数据格式统一**: 确保日期、金额格式一致后再对账
2. **匹配规则灵活**: 先精确匹配流水号,再模糊匹配金额+日期
3. **差异原因分析**: 对未达账项要逐笔核实原因
4. **定期对账**: 建议至少每月进行一次银行对账
5. **凭证留存**: 对账报告需打印签字存档
6. **OCR 核对**: 扫描件 OCR 后应人工抽查,确保识别准确
7. **多承运商校验**: 诉讼场景下必须检查订单是否属于其他承运商,防止对方混入他人订单骗取运费
---
## 与其他技能协作
- **data-excel**: 处理复杂的 Excel 数据转换
- **feishu**: 将对账报告保存到飞书云文档
- **siyuan**: 记录对账工作日志到思源笔记