
Document Recognition Architecture

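Document recognition is now the review and persistence boundary for Fusion run output. The Fusion subsystem creates and executes runs; ai_service/document_recognition persists recognizable Fusion structured output as a document recognition projection and provides field review, the issue list, the summary, and download metadata.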

Current main flow

flowchart LR
  FusionRun["Fusion run completed"] --> Persist["PersistFusionDocumentRecognitionRunUseCase"]
  Persist --> Projection["Document recognition persisted projection"]
  Projection --> Detail["GET /document-recognition/runs/{run_id}"]
  Projection --> Reject["POST /document-recognition/runs/{run_id}/reject"]
  Reject --> ReviewEvent["Append document-level review event"]
  ReviewEvent --> Projection
  Projection --> Rerun["POST /document-recognition/runs/{run_id}/rerun"]
  Rerun --> TargetRun["Create new Fusion-backed run"]
  TargetRun --> RerunLink["Append source/target rerun link"]
  Detail --> Timeline["GET /document-recognition/runs/{run_id}/field-reviews/{field_id}/revisions"]
  Detail --> Review["PATCH /document-recognition/runs/{run_id}/field-reviews/{field_id}"]
  Review --> Ledger["Append revision ledger entry"]
  Ledger --> Recalculate["Recalculate review counters"]
  Recalculate --> Projection
  Projection --> Admin["Admin overview / runs"]

Layer boundaries

domain/

  • Defines the projection models for document recognition runs, field reviews, issues, and summaries.
  • Does not depend on FastAPI, SQLAlchemy, MinIO, or the Fusion runtime.

application/

  • use_cases/review_runs.py: persists Fusion run output as a document recognition projection and handles field review updates.
  • projections.py: builds the summary, issues, field reviews, and preview payload from Fusion structured output.
  • ports/: repository / asset store abstractions.
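The port/adapter split described above can be sketched as follows. This is only an illustration under stated assumptions: `RunProjectionRepository`, `InMemoryRunProjectionRepository`, and `persist_projection` are hypothetical names, and the real port signatures in ports/ may look quite different.

```python
from typing import Optional, Protocol


class RunProjectionRepository(Protocol):
    """Hypothetical port: what a use case needs from storage, and nothing more."""

    def get(self, run_id: str) -> Optional[dict]: ...

    def save(self, run_id: str, projection: dict) -> None: ...


class InMemoryRunProjectionRepository:
    """Hypothetical adapter; a SQLAlchemy-backed one would live in infrastructure/."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def get(self, run_id: str) -> Optional[dict]:
        return self._rows.get(run_id)

    def save(self, run_id: str, projection: dict) -> None:
        # Store a copy so later mutation by the caller does not leak into storage.
        self._rows[run_id] = dict(projection)


def persist_projection(
    repo: RunProjectionRepository, run_id: str, projection: dict
) -> dict:
    """Use-case style function that only knows about the port, not the adapter."""
    repo.save(run_id, projection)
    stored = repo.get(run_id)
    if stored is None:
        raise RuntimeError(f"projection for run {run_id} was not persisted")
    return stored
```

Because the use case depends only on the protocol, a test can pass the in-memory adapter while production wiring passes a database-backed one.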

infrastructure/

  • persistence/document_recognition_repository.py: queries Fusion runs and reads and writes the review projection.
  • persistence/legacy_document_extraction_job_bridge.py: temporarily reuses the legacy review storage tables and keeps the old storage naming isolated in the infrastructure layer.
  • persistence/review_lineage_models.py: owns the document-recognition rerun source/target relation table and the document-level reject event table.
  • storage/minio_document_asset_store.py: reads and writes document assets.

interfaces/http/

  • runs.py: the public /document-recognition/runs* run/review API.
  • admin.py: admin overview / runs.
  • serialization.py: converts the persisted projection into an HTTP response.
  • The HTTP layer neither creates runtime jobs nor decides the runtime family.

Fusion output model

Document Recognition reads the persisted inputs/outputs of a Fusion run:

  • The source input determines source_filename, source_media_type, and the source object key.
  • The structured JSON output determines the document fields, the summary, and the result object key.
  • The governance context / failed status is converted into validation issues.

Two kinds of structured output are currently supported:

  • fields[] / tables[] style payloads
  • canonical document_field_set named-field object payloads
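A minimal sketch of how the two payload shapes might be normalized into one list of field rows. Everything about the payload layout here is an assumption made for illustration: the `name` and `value` keys, `document_field_set` appearing as a top-level key that holds an object, and the choice to ignore `tables[]`. The actual rules live in application/projections.py.

```python
from typing import Any


def normalize_fields(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten either assumed payload shape into a list of {name, value} rows."""
    fields = payload.get("fields")
    if isinstance(fields, list):
        # fields[] style: each entry is assumed to carry its own name and value.
        return [
            {"name": item["name"], "value": item.get("value")} for item in fields
        ]

    field_set = payload.get("document_field_set")
    if isinstance(field_set, dict):
        # Named-field object style: keys are assumed to be the field names.
        return [{"name": name, "value": value} for name, value in field_set.items()]

    # Neither shape matched, so the output is not recognizable.
    raise ValueError("unrecognized structured output shape")
```

Raising on an unknown shape keeps the "recognizable output" decision in one place instead of letting a half-parsed payload reach the projection.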

Review Persistence

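The current state of a field review still reuses the existing field-review row, but every effective change also appends an entry to an append-only revision ledger. This satisfies three needs at once:

  • the run projection continues to derive review_status, corrected_field_count, and workspace_output from the current field-review row
  • the frontend reads a single field's timeline on demand, so the full history does not have to be packed into every run detail
  • legacy runs that have no ledger keep a baseline snapshot, explicitly marked as unrecorded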
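The current-row-plus-ledger pattern can be sketched as below. The class and attribute names (`FieldReview`, `Revision`, `RunReviewState`, `original_value`) are hypothetical, and treating an "effective change" as "the new value differs from the current value" is an assumption, not a statement about the real use case.

```python
from dataclasses import dataclass, field


@dataclass
class FieldReview:
    """Current state of one field (stands in for the field-review row)."""

    field_id: str
    original_value: str
    current_value: str


@dataclass(frozen=True)
class Revision:
    """One immutable ledger entry; entries are appended and never edited."""

    field_id: str
    previous_value: str
    new_value: str
    actor: str


@dataclass
class RunReviewState:
    fields: dict[str, FieldReview]
    ledger: list[Revision] = field(default_factory=list)
    corrected_field_count: int = 0

    def apply_review(self, field_id: str, new_value: str, actor: str) -> bool:
        row = self.fields[field_id]
        if row.current_value == new_value:
            return False  # assumed rule: no-op edits are not recorded
        self.ledger.append(Revision(field_id, row.current_value, new_value, actor))
        row.current_value = new_value
        # Counters are recalculated from the current rows, not from the ledger.
        self.corrected_field_count = sum(
            1 for r in self.fields.values() if r.current_value != r.original_value
        )
        return True

    def timeline(self, field_id: str) -> list[Revision]:
        """Per-field history, read on demand rather than embedded in run detail."""
        return [entry for entry in self.ledger if entry.field_id == field_id]
```

Keeping the counters derived from current rows means the ledger can stay purely historical: reverting a field to its original value appends a second ledger entry while the corrected count drops back to zero.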

Rerun Lineage Persistence

A rerun does not overwrite the source run in place; it creates a new target run. The source run must be status=failed, review_status=rejected; the backend reuses the source object, filename, MIME type, runtime agent, and input slot context to create the new run.

Low-frequency lineage and the document-level reject audit are owned by ai_service/document_recognition/infrastructure/persistence/review_lineage_models.py:

  • document_extraction_job_rerun_links records source_run_id -> target_run_id
  • document_extraction_job_review_events records the rejected reason, the actor, and the state transition

A regular DocumentRecognitionRunResponse returns rerun_source_run_id on a rerun target; the rerun endpoint response and GET /document-recognition/runs?source_run_id=... remain the primary entry points for the source/target relationship.
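The "new target run plus lineage link" idea can be sketched as follows. `Run`, `RerunLink`, and `rerun` are hypothetical stand-ins, the in-memory `links` list stands in for the document_extraction_job_rerun_links table, and the sketch deliberately leaves out the source-run precondition check and the runtime agent / input slot context.

```python
from dataclasses import dataclass, replace
from typing import Optional
from uuid import uuid4


@dataclass(frozen=True)
class Run:
    run_id: str
    source_object_key: str
    source_filename: str
    source_media_type: str
    rerun_source_run_id: Optional[str] = None


@dataclass(frozen=True)
class RerunLink:
    """One source -> target lineage record."""

    source_run_id: str
    target_run_id: str


def rerun(source: Run, links: list[RerunLink]) -> Run:
    """Create a new target run that reuses the source inputs, then record the link."""
    # replace() copies the source inputs; only the identity and lineage fields change.
    target = replace(source, run_id=uuid4().hex, rerun_source_run_id=source.run_id)
    links.append(RerunLink(source.run_id, target.run_id))
    return target
```

The source run is never mutated, so its review history and reject event stay intact, and the link list can be queried in either direction.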

Key code entry points

| Path | Purpose |
| --- | --- |
| ai_service/document_recognition/application/use_cases/review_runs.py | Fusion run projection and field review use cases |
| ai_service/document_recognition/application/projections.py | summary / issue / field review normalization |
| ai_service/document_recognition/infrastructure/persistence/document_recognition_repository.py | SQLAlchemy repository adapter |
| ai_service/document_recognition/infrastructure/persistence/legacy_document_extraction_job_bridge.py | legacy review storage bridge |
| ai_service/document_recognition/infrastructure/persistence/review_lineage_models.py | rerun lineage and document-level reject event tables |
| ai_service/document_recognition/interfaces/http/runs.py | document-recognition run/review API |
| ai_service/document_recognition/interfaces/http/admin.py | admin overview and run list |

Where common changes should go

To change the Fusion output recognition rules

application/projections.py

To change the field review persistence rules

application/use_cases/review_runs.py, infrastructure/persistence/document_recognition_repository.py, and infrastructure/persistence/review_lineage_models.py. The existing projection and field-review row still reuse storage/model_domains/jobs.py; the new rerun lineage / reject event tables are not placed in the storage domain.

To change the API response fields

interfaces/http/schemas.py and interfaces/http/serialization.py