
google cloud dataflow - What’s the difference between regular Apache Beam connectors and Managed IO? - Stack Overflow


Apache Beam recently introduced Managed I/O APIs for Java and Python. What is the difference between Managed I/O and the regular Apache Beam connectors (sources and sinks) ?

asked Mar 12 at 21:51 by chamikara

1 Answer


Managed I/O offers a simplified API for using Apache Beam connectors (sources and sinks). Managed I/O includes several key features.

Simplified API

Managed I/O offers a simplified Java and Python API for using Apache Beam connectors in your pipelines. For example, the Java Iceberg source available via Managed I/O can be instantiated using the following syntax.

Map<String, Object> configMap = …
Managed.read(Managed.ICEBERG).withConfig(configMap)

The Java Iceberg sink can be instantiated using the following syntax.

Managed.write(Managed.ICEBERG).withConfig(configMap)

Similarly, a Python Iceberg source can be instantiated using the following syntax.

config_map = {...}
managed.Read(managed.ICEBERG, config=config_map)
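For illustration, the config map is a plain dictionary of connector settings. A sketch of what an Iceberg config might contain is below; the key names ("table", "catalog_name", "catalog_properties") follow the Managed Iceberg connector's configuration, but the values are placeholders rather than a working catalog.

```python
# Illustrative config map for a Managed Iceberg read. Key names follow the
# Managed Iceberg connector's configuration; the values are placeholders.
iceberg_config = {
    "table": "db.my_table",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",
        "warehouse": "gs://my-bucket/warehouse",
    },
}
```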

This makes it easy to switch from one connector to another and to start using new Apache Beam connectors.
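As a rough sketch of that point: switching connectors under Managed I/O mostly means swapping the connector identifier and its config map, while the surrounding pipeline code stays the same. The `describe_source` helper below is a hypothetical stand-in for `managed.Read`, and the identifiers and config keys are illustrative assumptions.

```python
# Hypothetical stand-in for managed.Read(identifier, config=config),
# used only to illustrate that the call shape is identical per connector.
def describe_source(identifier, config):
    return {"identifier": identifier, "config": config}

# Illustrative config maps; keys are assumptions, values are placeholders.
iceberg_config = {"table": "db.users", "catalog_name": "my_catalog"}
kafka_config = {"topic": "users", "bootstrap_servers": "broker:9092"}

source = describe_source("ICEBERG", iceberg_config)
# Switching connectors: only the identifier and the config map change.
source = describe_source("KAFKA", kafka_config)
```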

Note that, for established connectors such as Kafka and BigQuery, Managed I/O uses the same Apache Beam transforms underneath, offering the same battle-tested connector implementations via the new simplified API.

Runner specific features

When running Apache Beam pipelines on Google Cloud Dataflow, Managed I/O provides additional features that make connectors safer, more performant, and better suited for Dataflow.

Dataflow automatically upgrades Managed I/O connectors to the latest vetted version during job submission, and during streaming updates via a replacement job. This ensures that these key connectors always include the latest improvements and fixes at job execution, even if your pipeline uses an older version of Apache Beam.

Dataflow may change the behavior of connectors to better optimize for the Dataflow runtime. For example, when operating in at-least-once streaming mode, Dataflow automatically switches the BigQuery sink to the more performant Storage Write API with at-least-once delivery semantics.
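For context, at-least-once streaming mode is selected at job submission time through a Dataflow service option. A minimal sketch of the relevant pipeline arguments is below, assuming Dataflow's documented `streaming_mode_at_least_once` service option; project, region, and other required flags are omitted.

```python
# Sketch of pipeline arguments that enable Dataflow's at-least-once
# streaming mode; with a Managed BigQuery sink, Dataflow can then use
# Storage Write API at-least-once delivery semantics.
pipeline_args = [
    "--runner=DataflowRunner",
    "--streaming",
    "--dataflow_service_options=streaming_mode_at_least_once",
]

# Collect flag names (the part before '='), e.g. for inspection or logging.
option_flags = {arg.split("=", 1)[0] for arg in pipeline_args}
```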

Please see the Dataflow documentation for more details about the Managed I/O features offered by Dataflow. The documentation also includes information specific to supported I/O connectors and example pipelines.

Connector Support

Managed I/O supports key Beam connectors, including Iceberg, Kafka, and BigQuery. For the latest set of supported connectors, refer to the Apache Beam I/O connectors listing. For additional information, reach out to the Apache Beam community or Google Cloud Dataflow support.
