Chen Qing, Song Shiwei. Lakehouse Architecture Under Perspective of Data Governance[J]. Digital Library Forum, 2023, (4): 19-28 |
Lakehouse Architecture Under Perspective of Data Governance |
Submitted: 2023-02-28 |
DOI:10.3772/j.issn.1673-2286.2023.04.003 |
Keywords: Data Governance; Data Lake; Lakehouse; Data Sharing |
Funding: |
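Author | Affiliation |
Chen Qing | School of Economics and Management, Hubei University of Technology; Hubei Circular Economy Development Research Center |
Song Shiwei | School of Economics and Management, Hubei University of Technology |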
|
Abstract: |
Massive volumes of distributed, heterogeneous data pose serious challenges to enterprise data governance and are accelerating the shift from data warehouses and data lakes to the lakehouse, which combines the functions of both. We compare the differences among data warehouses, data lakes, and lakehouses, and analyze the advantages of the lakehouse and the challenges it faces. We then build a distributed lakehouse architecture by dividing business domains and mapping them to a data perspective, synthesize existing research and related technologies to construct the lakehouse functional modules, and describe the dynamic, integrated stream-batch data flow process. The distributed lakehouse architecture rests on the concepts of data domain decoupling, cross-domain data sharing, and federated data governance. The lakehouse functional modules mainly comprise data sources, the lakehouse core functional area, and users. The integrated stream-batch data flow process comprises a batch data process and a real-time data process. This study integrates an effective data governance process into the lakehouse and builds a relatively complete lakehouse architecture system, providing a reference for related research and for enterprises. |