📝 Publications

📊 AI-Native Data Systems

SIGMOD 2025

Doctopus: Budget-aware Structural Data Extraction from Documents
Yuanhao Zhong, Yuhao Deng, Chengliang Chai, et al.

Project | Video Demo

  • Doctopus is a framework designed to accurately extract structured data from large-scale unstructured documents under cost constraints.
  • Impact: At the same cost, Doctopus improves accuracy by 11%; alternatively, it achieves a 2.7x cost reduction while maintaining precision.

VLDB 2025

DocDB: A Database for Unstructured Document Analysis
Zequn Li*, Yuanhao Zhong*, Chengliang Chai, Zhaoze Sun, Ye Yuan, Lei Cao

Project | Video Demo

  • DocDB is tailored for unstructured document analysis, enabling users to perform complex data filtering and joining via standard SQL queries.
  • Performance: DocDB significantly outperforms existing systems in query accuracy, execution latency, and token consumption.

💡 Others