NebulaGraph Flink Connector¶

NebulaGraph Flink Connector 是一款帮助 Flink 用户快速访问NebulaGraph的连接器，支持从NebulaGraph图数据库中读取数据，或者将其他外部数据源读取的数据写入NebulaGraph图数据库。

适用场景¶

NebulaGraph Flink Connector 适用于以下场景：

读取NebulaGraph数据进行分析计算。
分析计算完的数据写入NebulaGraph。
迁移数据。

更新说明¶

版本兼容性¶

NebulaGraph Flink Connector 和NebulaGraph内核版本对应关系如下。

Flink Connector 版本	NebulaGraph版本
3.0-SNAPSHOT	nightly
3.8.0	3.x.x
3.5.0	3.x.x
3.3.0	3.x.x
3.0.0	3.x.x
2.6.1	2.6.0、2.6.1
2.6.0	2.6.0、2.6.1
2.5.0	2.5.0、2.5.1
2.0.0	2.0.0、2.0.1

前提条件¶

已安装 Java 8 或更高版本。
已安装 Flink 1.11.x。

获取 NebulaGraph Flink Connector¶

设置 Maven 依赖¶

在 Maven 的配置文件pom.xml里添加以下依赖自动获取 Flink Connector.

<dependency>
    <groupId>com.vesoft</groupId>
    <artifactId>nebula-flink-connector</artifactId>
    <version>3.8.0</version>
</dependency>

编译打包¶

按照以下步骤自行编译打包 Flink Connector。

克隆仓库nebula-flink-connector。

$ git clone -b release-3.8 https://github.com/vesoft-inc/nebula-flink-connector.git

进入目录nebula-flink-connector。

编译打包。

$ mvn clean package -Dmaven.test.skip=true

编译完成后，在目录的文件夹connector/target下生成类似文件nebula-flink-connector-3.8.0.jar。

使用方法¶

向NebulaGraph写入数据¶

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
NebulaClientOptions nebulaClientOptions = new NebulaClientOptions.NebulaClientOptionsBuilder()
                .setGraphAddress("127.0.0.1:9669")
                .setMetaAddress("127.0.0.1:9559")
                .build();
NebulaGraphConnectionProvider graphConnectionProvider = new NebulaGraphConnectionProvider(nebulaClientOptions);
NebulaMetaConnectionProvider metaConnectionProvider = new NebulaMetaConnectionProvider(nebulaClientOptions);

VertexExecutionOptions executionOptions = new VertexExecutionOptions.ExecutionOptionBuilder()
                .setGraphSpace("flinkSink")
                .setTag("player")
                .setIdIndex(0)
                .setFields(Arrays.asList("name", "age"))
                .setPositions(Arrays.asList(1, 2))
                .setBatchSize(2)
                .build();

NebulaVertexBatchOutputFormat outputFormat = new NebulaVertexBatchOutputFormat(
                graphConnectionProvider, metaConnectionProvider, executionOptions);
NebulaSinkFunction<Row> nebulaSinkFunction = new NebulaSinkFunction<>(outputFormat);
DataStream<Row> dataStream = playerSource.map(row -> {
            Row record = new org.apache.flink.types.Row(row.size());
            for (int i = 0; i < row.size(); i++) {
                record.setField(i, row.get(i));
            }
            return record;
        });
dataStream.addSink(nebulaSinkFunction);
env.execute("write nebula")

从NebulaGraph读取数据¶

NebulaClientOptions nebulaClientOptions = new NebulaClientOptions.NebulaClientOptionsBuilder()
        .setMetaAddress("127.0.0.1:9559")
        .build();
storageConnectionProvider = new NebulaStorageConnectionProvider(nebulaClientOptions);
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);

VertexExecutionOptions vertexExecutionOptions = new VertexExecutionOptions.ExecutionOptionBuilder()
        .setGraphSpace("flinkSource")
        .setTag("person")
        .setNoColumn(false)
        .setFields(Arrays.asList())
        .setLimit(100)
        .build();

NebulaSourceFunction sourceFunction = new NebulaSourceFunction(storageConnectionProvider)
        .setExecutionOptions(vertexExecutionOptions);
DataStreamSource<BaseTableRow> dataStreamSource = env.addSource(sourceFunction);
dataStreamSource.map(row -> {
    List<ValueWrapper> values = row.getValues();
    Row record = new Row(15);
    record.setField(0, values.get(0).asLong());
    record.setField(1, values.get(1).asString());
    record.setField(2, values.get(2).asString());
    record.setField(3, values.get(3).asLong());
    record.setField(4, values.get(4).asLong());
    record.setField(5, values.get(5).asLong());
    record.setField(6, values.get(6).asLong());
    record.setField(7, values.get(7).asDate());
    record.setField(8, values.get(8).asDateTime().getUTCDateTimeStr());
    record.setField(9, values.get(9).asLong());
    record.setField(10, values.get(10).asBoolean());
    record.setField(11, values.get(11).asDouble());
    record.setField(12, values.get(12).asDouble());
    record.setField(13, values.get(13).asTime().getUTCTimeStr());
    record.setField(14, values.get(14).asGeography());
    return record;
}).print();
env.execute("NebulaStreamSource");

参数说明¶

NebulaClientOptions是连接NebulaGraph的配置，说明如下。

参数类型是否必须说明

setGraphAddress String 是 NebulaGraph Graph 服务地址。

setMetaAddress String 是 NebulaGraph Meta 服务地址。

VertexExecutionOptions是执行点读写的配置，说明如下。

参数	类型	是否必须	说明
`setGraphSpace`	String	是	图空间名称。
`setTag`	String	是	Tag 名称。
`setIdIndex`	Int	是	向NebulaGraph写入数据时作为 VID 的流数据字段下标。
`setFields`	List	是	Tag 的属性名集合。用于向NebulaGraph写入数据或从NebulaGraph读取数据。读取时需要确保`setNoColumn`为`false`，否则配置无效。读取时本参数为空，表示读取所有属性。
`setPositions`	List	是	流数据字段下标的集合。表示将对应的字段值作为属性值写入NebulaGraph。需要和`setFields`一一对应。
`setBatchSize`	String	否	每次写入NebulaGraph的最大数据记录条数。默认值为`2000`。
`setNoColumn`	String	否	读取数据时设置为`true`则不会读取属性。默认值为`false`。
`setLimit`	String	否	读取数据时每次拉取的最大数据记录条数。默认值为`2000`。

EdgeExecutionOptions是执行边读写的配置，说明如下。

参数	类型	是否必须	说明
`setGraphSpace`	String	是	图空间名称。
`setEdge`	String	是	Edge type 名称。
`setSrcIndex`	Int	是	向NebulaGraph写入数据时作为起始点 VID 的流数据字段下标。
`setDstIndex`	Int	是	向NebulaGraph写入数据时作为目的点 VID 的流数据字段下标。
`setRankIndex`	Int	是	向NebulaGraph写入数据时作为边的 Rank 的流数据字段下标。
`setFields`	List	是	Edge type 属性名集合。用于向NebulaGraph写入数据或从NebulaGraph读取数据。读取时需要确保`setNoColumn`为`false`，否则配置无效。读取时本参数为空，表示读取所有属性。
`setPositions`	List	是	流数据字段下标的集合。表示将对应的字段值作为属性值写入NebulaGraph。需要和`setFields`一一对应。
`setBatchSize`	String	否	每次写入NebulaGraph的最大数据记录条数。默认值为`2000`。
`setNoColumn`	String	否	读取数据时设置为`true`则不会读取属性。默认值为`false`。
`setLimit`	String	否	读取数据时每次拉取的最大数据记录条数。默认值为`2000`。

示例¶

创建图空间。

NebulaCatalog nebulaCatalog = NebulaCatalogUtils.createNebulaCatalog(
        "NebulaCatalog",
        "default",
        "root",
        "nebula",
        "127.0.0.1:9559",
        "127.0.0.1:9669");

EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .inStreamingMode()
        .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

tableEnv.registerCatalog(CATALOG_NAME, nebulaCatalog);
tableEnv.useCatalog(CATALOG_NAME);

String createDataBase = "CREATE DATABASE IF NOT EXISTS `db1`"
        + " COMMENT 'space 1'"
        + " WITH ("
        + " 'partition_num' = '100',"
        + " 'replica_factor' = '3',"
        + " 'vid_type' = 'FIXED_STRING(10)'"
        + ")";
tableEnv.executeSql(createDataBase);

创建 Tag。

tableEnvironment.executeSql("CREATE TABLE `person` ("
        + " vid BIGINT,"
        + " col1 STRING,"
        + " col2 STRING,"
        + " col3 BIGINT,"
        + " col4 BIGINT,"
        + " col5 BIGINT,"
        + " col6 BIGINT,"
        + " col7 DATE,"
        + " col8 TIMESTAMP,"
        + " col9 BIGINT,"
        + " col10 BOOLEAN,"
        + " col11 DOUBLE,"
        + " col12 DOUBLE,"
        + " col13 TIME,"
        + " col14 STRING"
        + ") WITH ("
        + " 'connector' = 'nebula',"
        + " 'meta-address' = '127.0.0.1:9559',"
        + " 'graph-address' = '127.0.0.1:9669',"
        + " 'username' = 'root',"
        + " 'password' = 'nebula',"
        + " 'data-type' = 'vertex',"
        + " 'graph-space' = 'flink_test',"
        + " 'label-name' = 'person'"
        + ")"
);

创建 Edge type。

tableEnvironment.executeSql("CREATE TABLE `friend` ("
        + " sid BIGINT,"
        + " did BIGINT,"
        + " rid BIGINT,"
        + " col1 STRING,"
        + " col2 STRING,"
        + " col3 BIGINT,"
        + " col4 BIGINT,"
        + " col5 BIGINT,"
        + " col6 BIGINT,"
        + " col7 DATE,"
        + " col8 TIMESTAMP,"
        + " col9 BIGINT,"
        + " col10 BOOLEAN,"
        + " col11 DOUBLE,"
        + " col12 DOUBLE,"
        + " col13 TIME,"
        + " col14 STRING"
        + ") WITH ("
        + " 'connector' = 'nebula',"
        + " 'meta-address' = '127.0.0.1:9559',"
        + " 'graph-address' = '127.0.0.1:9669',"
        + " 'username' = 'root',"
        + " 'password' = 'nebula',"
        + " 'graph-space' = 'flink_test',"
        + " 'label-name' = 'friend',"
        + " 'data-type'='edge',"
        + " 'src-id-index'='0',"
        + " 'dst-id-index'='1',"
        + " 'rank-id-index'='2'"
        + ")"
);

查询边数据并插入到另一个边类型中。

Table table = tableEnvironment.sqlQuery("SELECT * FROM `friend`");
table.executeInsert("`friend_sink`").await();

最后更新: May 9, 2024

参数	类型	是否必须	说明
`setGraphAddress`	String	是	NebulaGraph Graph 服务地址。
`setMetaAddress`	String	是	NebulaGraph Meta 服务地址。