全文搜索¶
全文搜索是基于全文索引对值为字符串类型的属性进行前缀搜索、通配符搜索、正则表达式搜索和模糊搜索。
在LOOKUP
语句中,使用WHERE
子句指定字符串的搜索条件。
前提条件¶
请确保已经部署全文索引。详情请参见部署全文索引和部署 listener。
注意事项¶
使用全文索引前,请确认已经了解全文索引的使用限制。
自然语言全文搜索¶
自然语言搜索将搜索的字符串解释为自然人类语言中的短语。搜索不区分大小写,且默认是对字符串的每个子字符串(以空格分隔)单独判断搜索。例如,有三个点属于标签player
,标签player
含有属性name
,这三个点的name
分别为Kevin Durant
、Tim Duncan
和David Beckham
。现在已经建立好有关player.name
的全文索引,在用全文索引前缀搜索语句LOOKUP ON player WHERE PREFIX(player.name,"d");
查询时,这三个点都会被查询到。
语法¶
创建全文索引¶
CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} ([<prop_name_list>]);
显示全文索引¶
SHOW FULLTEXT INDEXES;
重建全文索引¶
REBUILD FULLTEXT INDEX;
删除全文索引¶
DROP FULLTEXT INDEX <index_name>;
使用查询选项¶
LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];
<expression> ::=
PREFIX | WILDCARD | REGEXP | FUZZY
<return_list>
<prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...]
- PREFIX(schema_name.prop_name, prefix_string, row_limit, timeout)
- WILDCARD(schema_name.prop_name, wildcard_string, row_limit, timeout)
- REGEXP(schema_name.prop_name, regexp_string, row_limit, timeout)
-
FUZZY(schema_name.prop_name, fuzzy_string, fuzziness, operator, row_limit, timeout)
fuzziness
:可选项。允许匹配的最大编辑距离。默认值为AUTO
。查看其他可选值和更多信息,请参见 Elasticsearch 官方文档。
operator
:可选项。解释文本的布尔逻辑。可选值为OR
(默认)和and
。
row_limit
:可选项。指定要返回的行数。默认值为100
。
timeout
:可选项。指定超时时间。单位:毫秒(ms)。默认值为200
。
示例¶
//创建图空间。
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));
//登录文本搜索客户端。
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);
//切换图空间。
nebula> USE basketballplayer;
//添加 listener 到 Nebula Graph 集群。
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:9789;
//创建 Tag。
nebula> CREATE TAG IF NOT EXISTS player(name string, age int);
//创建原生索引。
nebula> CREATE TAG INDEX IF NOT EXISTS name ON player(name(20));
//重建原生索引。
nebula> REBUILD TAG INDEX;
//创建全文索引,索引名称需要以 nebula 开头。
nebula> CREATE FULLTEXT TAG INDEX nebula_index_1 ON player(name);
//重建全文索引。
nebula> REBUILD FULLTEXT INDEX;
//查看全文索引。
nebula> SHOW FULLTEXT INDEXES;
+------------------+-------------+-------------+--------+
| Name | Schema Type | Schema Name | Fields |
+------------------+-------------+-------------+--------+
| "nebula_index_1" | "Tag" | "player" | "name" |
+------------------+-------------+-------------+--------+
//插入测试数据。
nebula> INSERT VERTEX player(name, age) VALUES \
"Russell Westbrook": ("Russell Westbrook", 30), \
"Chris Paul": ("Chris Paul", 33),\
"Boris Diaw": ("Boris Diaw", 36),\
"David West": ("David West", 38),\
"Danny Green": ("Danny Green", 31),\
"Tim Duncan": ("Tim Duncan", 42),\
"James Harden": ("James Harden", 29),\
"Tony Parker": ("Tony Parker", 36),\
"Aron Baynes": ("Aron Baynes", 32),\
"Ben Simmons": ("Ben Simmons", 22),\
"Blake Griffin": ("Blake Griffin", 30);
//测试查询
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD id(vertex);
+-----------------+
| id(VERTEX) |
+-----------------+
| "Boris Diaw" |
| "Ben Simmons" |
| "Blake Griffin" |
+-----------------+
nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age;
+-----------------+-----+
| name | age |
+-----------------+-----+
| "Chris Paul" | 33 |
| "Boris Diaw" | 36 |
| "Blake Griffin" | 30 |
+-----------------+-----+
nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") | YIELD count(*);
+----------+
| count(*) |
+----------+
| 3 |
+----------+
nebula> LOOKUP ON player WHERE REGEXP(player.name, "R.*") YIELD player.name, player.age;
+---------------------+-----+
| name | age |
+---------------------+-----+
| "Russell Westbrook" | 30 |
+---------------------+-----+
nebula> LOOKUP ON player WHERE REGEXP(player.name, ".*") YIELD id(vertex);
+---------------------+
| id(VERTEX) |
+---------------------+
| "Danny Green" |
| "David West" |
| "Russell Westbrook" |
+---------------------+
...
nebula> LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan", AUTO, OR) YIELD player.name;
+--------------+
| name |
+--------------+
| "Tim Duncan" |
+--------------+
//删除全文索引。
nebula> DROP FULLTEXT INDEX nebula_index_1;
最后更新:
March 7, 2023