Full-text search

Full-text search uses full-text indexes to run prefix, wildcard, regular expression, and fuzzy searches on properties whose values are strings.

In a LOOKUP statement, use the WHERE clause to specify the search condition for the string.

Prerequisites

Make sure that full-text indexing has been deployed. For details, see Deploy full-text index and Deploy listener.

Precautions

Before using full-text indexes, make sure that you are familiar with the restrictions on full-text indexes.

Natural language full-text search

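A natural language search interprets the search string as a phrase in natural human language. The search is case-sensitive and, by default, matches the string by prefix. For example, suppose three vertices carry the tag player, which has the property name, and their name values are Kevin Durant, Tim Duncan, and David Beckham. With a full-text index on player.name in place, the prefix search statement LOOKUP ON player WHERE PREFIX(player.name,"D"); returns only David Beckham.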
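The matching rule in this example can be modeled in a few lines of Python. This is only a sketch of the behavior described above (a case-sensitive prefix match against the whole property value), not the way NebulaGraph or Elasticsearch implements it. The helper name `prefix_search` is hypothetical and the code does not connect to any database.

```python
# Hypothetical local model of the PREFIX example above; it does not call NebulaGraph.
names = ["Kevin Durant", "Tim Duncan", "David Beckham"]


def prefix_search(values, prefix):
    """Return the values that start with `prefix`, compared case-sensitively."""
    return [value for value in values if value.startswith(prefix)]


print(prefix_search(names, "D"))  # ['David Beckham']
print(prefix_search(names, "d"))  # [] because the comparison is case-sensitive
```

Note that "Kevin Durant" and "Tim Duncan" are not returned even though each contains a word starting with "D": in this model, as in the example above, the prefix is compared with the beginning of the full string.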

Syntax

Create a full-text index

CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} ([<prop_name>]);

Show full-text indexes

SHOW FULLTEXT INDEXES;

Rebuild full-text indexes

REBUILD FULLTEXT INDEX;

Caution

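When the data volume is large, rebuilding full-text indexes is slow. In that case, you can set snapshot_send_files=false in the configuration file of the Storage service (nebula-storaged.conf).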

Drop a full-text index

DROP FULLTEXT INDEX <index_name>;

Use query options

LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];

<expression> ::=
    PREFIX | WILDCARD | REGEXP | FUZZY

<return_list> ::=
    <prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...]
  • PREFIX(schema_name.prop_name, prefix_string, row_limit, timeout)
  • WILDCARD(schema_name.prop_name, wildcard_string, row_limit, timeout)
  • REGEXP(schema_name.prop_name, regexp_string, row_limit, timeout)
  • FUZZY(schema_name.prop_name, fuzzy_string, fuzziness, operator, row_limit, timeout)

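    • fuzziness: Optional. The maximum edit distance allowed for a match. The default value is AUTO. For other valid values and more information, see the official Elasticsearch documentation.
    • operator: Optional. The Boolean logic used to interpret the text. Valid values are OR (default) and AND.
  • row_limit: Optional. The number of rows to return. The default value is 100.
  • timeout: Optional. The timeout period, in milliseconds (ms). The default value is 200.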
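Because row_limit and timeout follow the search string in the function signatures above, a client that assembles LOOKUP statements has to append them in that order. The following Python sketch shows one way to do so. The function `text_search_lookup` is a hypothetical helper, not part of any NebulaGraph client library. It only formats a string, and it assumes that timeout can be supplied only together with row_limit because the arguments appear to be positional.

```python
def text_search_lookup(tag, prop, func, pattern, row_limit=None, timeout=None,
                       yield_clause="id(vertex)"):
    """Format a LOOKUP statement for PREFIX, WILDCARD, or REGEXP (hypothetical helper)."""
    if func not in ("PREFIX", "WILDCARD", "REGEXP"):
        raise ValueError(f"unsupported function: {func}")
    # Assumption: arguments are positional, so timeout needs row_limit before it.
    if timeout is not None and row_limit is None:
        raise ValueError("timeout requires row_limit")
    # Reject double quotes instead of guessing at escaping rules.
    if '"' in pattern:
        raise ValueError("pattern must not contain double quotes")
    args = [f"{tag}.{prop}", f'"{pattern}"']
    if row_limit is not None:
        args.append(str(int(row_limit)))
    if timeout is not None:
        args.append(str(int(timeout)))
    return f"LOOKUP ON {tag} WHERE {func}({', '.join(args)}) YIELD {yield_clause};"


# Defaults apply when the optional arguments are omitted.
print(text_search_lookup("player", "name", "PREFIX", "B"))
# Return at most 10 rows and wait up to 500 ms.
print(text_search_lookup("player", "name", "PREFIX", "B", row_limit=10, timeout=500))
```

The second call produces `LOOKUP ON player WHERE PREFIX(player.name, "B", 10, 500) YIELD id(vertex);`. In real code, prefer whatever parameter handling your client library provides over string formatting.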

Examples

// Create a graph space.
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

// Sign in to the text search client.
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);

// Check whether the sign-in succeeded.
nebula> SHOW TEXT SEARCH CLIENTS;

// Switch to the graph space.
nebula> USE basketballplayer;

// Add a listener to the NebulaGraph cluster.
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:9789;

// Check whether the listener was added. The status Online indicates success.
nebula> SHOW LISTENER;

// Create a tag.
nebula> CREATE TAG IF NOT EXISTS player(name string, age int);

// Create a full-text index. The index name must start with `nebula_`.
nebula> CREATE FULLTEXT TAG INDEX nebula_index_1 ON player(name);

// Rebuild the full-text index.
nebula> REBUILD FULLTEXT INDEX;

// Show the full-text indexes.
nebula> SHOW FULLTEXT INDEXES;
+------------------+-------------+-------------+--------+
| Name             | Schema Type | Schema Name | Fields |
+------------------+-------------+-------------+--------+
| "nebula_index_1" | "Tag"       | "player"    | "name" |
+------------------+-------------+-------------+--------+

// Insert test data.
nebula> INSERT VERTEX player(name, age) VALUES \
  "Russell Westbrook": ("Russell Westbrook", 30), \
  "Chris Paul": ("Chris Paul", 33),\
  "Boris Diaw": ("Boris Diaw", 36),\
  "David West": ("David West", 38),\
  "Danny Green": ("Danny Green", 31),\
  "Tim Duncan": ("Tim Duncan", 42),\
  "James Harden": ("James Harden", 29),\
  "Tony Parker": ("Tony Parker", 36),\
  "Aron Baynes": ("Aron Baynes", 32),\
  "Ben Simmons": ("Ben Simmons", 22),\
  "Blake Griffin": ("Blake Griffin", 30);

// Run test queries.
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD id(vertex);
+-----------------+
| id(VERTEX)      |
+-----------------+
| "Boris Diaw"    |
| "Ben Simmons"   |
| "Blake Griffin" |
+-----------------+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age;
+-----------------+-----+
| name            | age |
+-----------------+-----+
| "Chris Paul"    | 33  |
| "Boris Diaw"    | 36  |
| "Blake Griffin" | 30  |
+-----------------+-----+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") | YIELD count(*);
+----------+
| count(*) |
+----------+
| 3        |
+----------+

nebula> LOOKUP ON player WHERE REGEXP(player.name, "R.*") YIELD player.name, player.age;
+---------------------+-----+
| name                | age |
+---------------------+-----+
| "Russell Westbrook" | 30  |
+---------------------+-----+

nebula> LOOKUP ON player WHERE REGEXP(player.name, ".*") YIELD id(vertex);
+---------------------+
| id(VERTEX)          |
+---------------------+
| "Danny Green"       |
| "David West"        |
| "Russell Westbrook" |
+---------------------+
...

nebula> LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan", AUTO, OR) YIELD player.name;
+--------------+
| name         |
+--------------+
| "Tim Duncan" |
+--------------+

// Drop the full-text index.
nebula> DROP FULLTEXT INDEX nebula_index_1;
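To see why the WILDCARD, REGEXP, and FUZZY queries above return the rows they do, the matching can be approximated locally in Python. This is a rough model under stated assumptions: it uses Python's `fnmatch` and `re` modules, whose pattern syntax differs from Elasticsearch's in general and only agrees on simple patterns such as the ones used here. It also uses plain Levenshtein distance for the fuzzy case, which may not be exactly what Elasticsearch computes. The helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Names inserted by the INSERT VERTEX statement in the example above.
names = [
    "Russell Westbrook", "Chris Paul", "Boris Diaw", "David West",
    "Danny Green", "Tim Duncan", "James Harden", "Tony Parker",
    "Aron Baynes", "Ben Simmons", "Blake Griffin",
]


def wildcard(values, pattern):
    """Case-sensitive wildcard match where * matches any run of characters."""
    return [value for value in values if fnmatchcase(value, pattern)]


def regexp(values, pattern):
    """Match the pattern against the whole value, not a substring."""
    return [value for value in values if re.fullmatch(pattern, value)]


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


print(wildcard(names, "*ri*"))  # ['Chris Paul', 'Boris Diaw', 'Blake Griffin']
print(regexp(names, "R.*"))     # ['Russell Westbrook']
print(edit_distance("Tim Dunncan", "Tim Duncan"))  # 1
```

The misspelled search string "Tim Dunncan" is one edit away from "Tim Duncan", which is consistent with the FUZZY query returning that row.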

Last updated: July 6, 2023