跳转至

全文搜索

全文搜索是基于全文索引对值为字符串类型的属性进行前缀搜索、通配符搜索、正则表达式搜索和模糊搜索。

LOOKUP语句中,使用WHERE子句指定字符串的搜索条件。

前提条件

请确保已经部署全文索引。详情请参见部署全文索引部署 listener

注意事项

使用全文索引前,请确认已经了解全文索引的使用限制

全文本查询

全文本查询使您能够搜索经过分析的文本字段,使用具有严格语法的解析器,根据提供的查询字符串返回内容。详情参见Query string query

语法

创建全文索引

CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} (<prop_name> [,<prop_name>]...) [ANALYZER="<analyzer_name>"];
  • 创建全文索引时支持多属性的复合索引。
  • <analyzer_name>为分词器名称。默认为standard。使用其他分词器(例如 IK Analysis)需要确保已经提前在 Elasticsearch 中安装对应分词器。

显示全文索引

SHOW FULLTEXT INDEXES;

重建全文索引

REBUILD FULLTEXT INDEX;

Caution

数据量大时,重建全文索引速度较慢,可以修改 Storage 服务的配置文件(nebula-storaged.conf)中snapshot_send_files=false

删除全文索引

DROP FULLTEXT INDEX <index_name>;

使用查询选项

LOOKUP ON {<tag> | <edge_type>} WHERE ES_QUERY(<index_name>, "<text>") YIELD <return_list> [| LIMIT [<offset>,] <number_rows>];

<return_list>
    <prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...] [, id(vertex)  [AS <prop_alias>]] [, score() AS <score_alias>]
  • index_name:索引名称。
  • text:搜索条件。WHERE 后只能跟一个 ES_QUERY,所有判断条件必须写在 text 里。详细语法请参见Query string syntax
  • score():对符合条件的点做 N 度扩展计算出的分数。默认值为1.0。分数越高,匹配程度越高。返回值默认按照分数从高到低排序。详情参见Search and Scoring in Lucene

示例

//创建图空间。
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

//登录文本搜索客户端。
nebula> SIGN IN TEXT SERVICE (192.168.8.100:9200, HTTP);

//检查是否成功登录。
nebula> SHOW TEXT SEARCH CLIENTS;
+-----------------+-----------------+------+
| Type            | Host            | Port |
+-----------------+-----------------+------+
| "ELASTICSEARCH" | "192.168.8.100" | 9200 |
+-----------------+-----------------+------+

//切换图空间。
nebula> USE basketballplayer;

//添加 listener 到 NebulaGraph 集群。
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.100:9789;

//检查是否成功添加 listener,当状态为 Online 时表示成功添加。
nebula> SHOW LISTENER;
+--------+-----------------+------------------------+-------------+
| PartId | Type            | Host                   | Host Status |
+--------+-----------------+------------------------+-------------+
| 1      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
| 2      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
| 3      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
+--------+-----------------+------------------------+-------------+

//创建 Tag。
nebula> CREATE TAG IF NOT EXISTS player(name string, city string);

//创建单属性全文索引。
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_1 ON player(name) ANALYZER="standard";

//创建多属性全文索引。
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_2 ON player(name,city) ANALYZER="standard";

//重建全文索引。
nebula> REBUILD FULLTEXT INDEX;

//查看全文索引。
nebula> SHOW FULLTEXT INDEXES;
+--------------------+-------------+-------------+--------------+------------+
| Name               | Schema Type | Schema Name | Fields       | Analyzer   |
+--------------------+-------------+-------------+--------------+------------+
| "fulltext_index_1" | "Tag"       | "player"    | "name"       | "standard" |
| "fulltext_index_2" | "Tag"       | "player"    | "name, city" | "standard" |
+--------------------+-------------+-------------+--------------+------------+

//插入测试数据。
nebula> INSERT VERTEX player(name, city) VALUES \
  "Russell Westbrook": ("Russell Westbrook", "Los Angeles"), \
  "Chris Paul": ("Chris Paul", "Houston"),\
  "Boris Diaw": ("Boris Diaw", "Houston"),\
  "David West": ("David West", "Philadelphia"),\
  "Danny Green": ("Danny Green", "Philadelphia"),\
  "Tim Duncan": ("Tim Duncan", "New York"),\
  "James Harden": ("James Harden", "New York"),\
  "Tony Parker": ("Tony Parker", "Chicago"),\
  "Aron Baynes": ("Aron Baynes", "Chicago"),\
  "Ben Simmons": ("Ben Simmons", "Phoenix"),\
  "Blake Griffin": ("Blake Griffin", "Phoenix");

//测试查询
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Chris") YIELD id(vertex);
+--------------+
| id(VERTEX)   |
+--------------+
| "Chris Paul" |
+--------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Harden") YIELD properties(vertex);
+----------------------------------------------------------------+
| properties(VERTEX)                                             |
+----------------------------------------------------------------+
| {_vid: "James Harden", city: "New York", name: "James Harden"} |
+----------------------------------------------------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Da*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX)                                               |
+------------------------------------------------------------------+
| {_vid: "David West", city: "Philadelphia", name: "David West"}   |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex);
+---------------------+
| id(VERTEX)          |
+---------------------+
| "Russell Westbrook" |
| "Boris Diaw"        |
| "Aron Baynes"       |
| "Ben Simmons"       |
| "Blake Griffin"     |
+---------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | LIMIT 2,3;
+-----------------+
| id(VERTEX)      |
+-----------------+
| "Aron Baynes"   |
| "Ben Simmons"   |
| "Blake Griffin" |
+-----------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | YIELD count(*);
+----------+
| count(*) |
+----------+
| 5        |
+----------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX)          | score |
+---------------------+-------+
| "Russell Westbrook" | 1.0   |
| "Boris Diaw"        | 1.0   |
| "Aron Baynes"       | 1.0   |
| "Ben Simmons"       | 1.0   |
| "Blake Griffin"     | 1.0   |
+---------------------+-------+

//对于包含 b 的词的文档,它的得分将乘以加权因子 4,而对于包含 c 的词的文档,使用默认加权因子 1。
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*^4 OR *c*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX)          | score |
+---------------------+-------+
| "Russell Westbrook" | 4.0   |
| "Boris Diaw"        | 4.0   |
| "Aron Baynes"       | 4.0   |
| "Ben Simmons"       | 4.0   |
| "Blake Griffin"     | 4.0   |
| "Chris Paul"        | 1.0   |
| "Tim Duncan"        | 1.0   |
+---------------------+-------+

//使用多属性全文索引查询,此时条件会在索引的所有属性内进行匹配。
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"*h*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX)                                               |
+------------------------------------------------------------------+
| {_vid: "Chris Paul", city: "Houston", name: "Chris Paul"}        |
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"}        |
| {_vid: "David West", city: "Philadelphia", name: "David West"}   |
| {_vid: "James Harden", city: "New York", name: "James Harden"}   |
| {_vid: "Tony Parker", city: "Chicago", name: "Tony Parker"}      |
| {_vid: "Aron Baynes", city: "Chicago", name: "Aron Baynes"}      |
| {_vid: "Ben Simmons", city: "Phoenix", name: "Ben Simmons"}      |
| {_vid: "Blake Griffin", city: "Phoenix", name: "Blake Griffin"}  |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+

//使用多属性全文索引查询时,可以对不同的属性指定不同的文本进行查询。
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"name:*b* AND city:Houston") YIELD properties(vertex);
+-----------------------------------------------------------+
| properties(VERTEX)                                        |
+-----------------------------------------------------------+
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"} |
+-----------------------------------------------------------+

//删除单属性全文索引。
nebula> DROP FULLTEXT INDEX fulltext_index_1;

最后更新: September 6, 2024