跳转至

聚合函数

本文介绍 NebulaGraph 支持的聚合函数。

avg()

avg() 返回参数的平均值。

语法:avg(<expression>)

  • 返回类型:double。

示例:

nebula> MATCH (v:player) RETURN avg(v.player.age);
+--------------------+
| avg(v.player.age)  |
+--------------------+
| 33.294117647058826 |
+--------------------+

count()

count() 返回参数的数量。

  • (原生 nGQL)用户可以同时使用count()GROUP BY对传参进行分组和计数,再使用YIELD返回结果。
  • (openCypher 方式)用户可以使用count()对指定的值进行计数,再使用RETURN返回结果。不需要使用GROUP BY

语法:count({<expression> | *})

  • count(*) 返回总行数(包括 NULL)。
  • 返回类型:int。

示例:

nebula> WITH [NULL, 1, 1, 2, 2] As a UNWIND a AS b \
        RETURN count(b), count(*), count(DISTINCT b);
+----------+----------+-------------------+
| count(b) | count(*) | count(distinct b) |
+----------+----------+-------------------+
| 4        | 5        | 2                 |
+----------+----------+-------------------+
# 返回 player101 follow 的人,以及 follow player101 的人,即双向查询。
# 使用`count()`和`GROUP BY`进行分组和计数。
nebula> GO FROM "player101" OVER follow BIDIRECT \
        YIELD properties($$).name AS Name \
        | GROUP BY $-.Name YIELD $-.Name, count(*);
+---------------------+----------+
| $-.Name             | count(*) |
+---------------------+----------+
| "LaMarcus Aldridge" | 2        |
| "Tim Duncan"        | 2        |
| "Marco Belinelli"   | 1        |
| "Manu Ginobili"     | 1        |
| "Boris Diaw"        | 1        |
| "Dejounte Murray"   | 1        |
+---------------------+----------+

# 使用`count()`进行计数。
nebula> MATCH (v1:player)-[:follow]-(v2:player) \
        WHERE id(v1)== "player101" \
        RETURN v2.player.name AS Name, count(*) as cnt ORDER BY cnt DESC;
+---------------------+-----+
| Name                | cnt |
+---------------------+-----+
| "LaMarcus Aldridge" | 2   |
| "Tim Duncan"        | 2   |
| "Boris Diaw"        | 1   |
| "Manu Ginobili"     | 1   |
| "Dejounte Murray"   | 1   |
| "Marco Belinelli"   | 1   |
+---------------------+-----+

上述示例的返回结果有两列:

  • $-.Name:查询结果包含的姓名。
  • count(*):姓名出现的次数。

因为测试数据集basketballplayer中没有重复的姓名,count(*)列中数字2表示该行的人和player101是互相follow的关系。

# 方法一:统计数据库中的年龄分布情况。
nebula> LOOKUP ON player \
        YIELD player.age As playerage \
        | GROUP BY $-.playerage \
        YIELD $-.playerage as age, count(*) AS number \
        | ORDER BY $-.number DESC, $-.age DESC;
+-----+--------+
| age | number |
+-----+--------+
| 34  | 4      |
| 33  | 4      |
| 30  | 4      |
| 29  | 4      |
| 38  | 3      |
+-----+--------+
...

# 方法二:统计数据库中的年龄分布情况。
nebula> MATCH (n:player) \
        RETURN n.player.age as age, count(*) as number \
        ORDER BY number DESC, age DESC;
+-----+--------+
| age | number |
+-----+--------+
| 34  | 4      |
| 33  | 4      |
| 30  | 4      |
| 29  | 4      |
| 38  | 3      |
+-----+--------+
...
# 统计 Tim Duncan 关联的边数。
nebula> MATCH (v:player{name:"Tim Duncan"}) -[e]- (v2) \
        RETURN count(e);
+----------+
| count(e) |
+----------+
| 13       |
+----------+


# 多跳查询,统计 Tim Duncan 关联的边数,返回两列(不去重和去重)。
nebula> MATCH (n:player {name : "Tim Duncan"})-[]->(friend:player)-[]->(fof:player) \
        RETURN count(fof), count(DISTINCT fof);
+------------+---------------------+
| count(fof) | count(distinct fof) |
+------------+---------------------+
| 4          | 3                   |
+------------+---------------------+

max()

max() 返回参数的最大值。

语法:max(<expression>)

  • 返回类型:与原参数相同。

示例:

nebula> MATCH (v:player) RETURN max(v.player.age);
+-------------------+
| max(v.player.age) |
+-------------------+
| 47                |
+-------------------+

min()

min() 返回参数的最小值。

语法:min(<expression>)

  • 返回类型:与原参数相同。

示例:

nebula> MATCH (v:player) RETURN min(v.player.age);
+-------------------+
| min(v.player.age) |
+-------------------+
| 20                |
+-------------------+

collect()

collect() 返回一个符合表达式返回结果的列表。该函数可以将多条记录或值合并进一个列表,实现数据聚合。

语法:collect(<expression>)

  • 返回类型:list。

示例:

nebula> UNWIND [1, 2, 1] AS a \
        RETURN a;
+---+
| a |
+---+
| 1 |
| 2 |
| 1 |
+---+

nebula> UNWIND [1, 2, 1] AS a \
        RETURN collect(a);
+------------+
| collect(a) |
+------------+
| [1, 2, 1]  |
+------------+

nebula> UNWIND [1, 2, 1] AS a \
        RETURN a, collect(a), size(collect(a));
+---+------------+------------------+
| a | collect(a) | size(collect(a)) |
+---+------------+------------------+
| 2 | [2]        | 1                |
| 1 | [1, 1]     | 2                |
+---+------------+------------------+

# 降序排列,限制输出行数为 3,然后将结果输出到列表中。
nebula> UNWIND ["c", "b", "a", "d" ] AS p \
        WITH p AS q \
        ORDER BY q DESC LIMIT 3 \
        RETURN collect(q);
+-----------------+
| collect(q)      |
+-----------------+
| ["d", "c", "b"] |
+-----------------+

nebula> WITH [1, 1, 2, 2] AS coll \
        UNWIND coll AS x \
        WITH DISTINCT x \
        RETURN collect(x) AS ss;
+--------+
| ss     |
+--------+
| [1, 2] |
+--------+

nebula> MATCH (n:player) \
        RETURN collect(n.player.age);
+---------------------------------------------------------------+
| collect(n.player.age)                                         |
+---------------------------------------------------------------+
| [32, 32, 34, 29, 41, 40, 33, 25, 40, 37, ...
...

# 基于年龄聚合姓名。
nebula> MATCH (n:player) \
        RETURN n.player.age AS age, collect(n.player.name);
+-----+--------------------------------------------------------------------------+
| age | collect(n.player.name)                                                   |
+-----+--------------------------------------------------------------------------+
| 24  | ["Giannis Antetokounmpo"]                                                |
| 20  | ["Luka Doncic"]                                                          |
| 25  | ["Joel Embiid", "Kyle Anderson"]                                         |
+-----+--------------------------------------------------------------------------+
...

nebula> GO FROM "player100" OVER serve \
        YIELD properties($$).name AS name \
        | GROUP BY $-.name \
        YIELD collect($-.name) AS name;
+-----------+
| name      |
+-----------+
| ["Spurs"] |
+-----------+

nebula> LOOKUP ON player \
        YIELD player.age As playerage \
        | GROUP BY $-.playerage \
        YIELD collect($-.playerage) AS playerage;
+------------------+
| playerage        |
+------------------+
| [22]             |
| [47]             |
| [43]             |
| [25, 25]         |
+------------------+
...

std()

std() 返回参数的总体标准差。

语法:std(<expression>)

  • 返回类型:double。

示例:

nebula> MATCH (v:player) RETURN std(v.player.age);
+-------------------+
| std(v.player.age) |
+-------------------+
| 6.423895701687502 |
+-------------------+

sum()

sum() 返回参数的和。

语法:sum(<expression>)

  • 返回类型:与原参数相同。

示例:

nebula> MATCH (v:player) RETURN sum(v.player.age);
+-------------------+
| sum(v.player.age) |
+-------------------+
| 1698              |
+-------------------+

聚合示例

nebula>  GO FROM "player100" OVER follow YIELD dst(edge) AS dst, properties($$).age AS age \
         | GROUP BY $-.dst \
         YIELD \
         $-.dst AS dst, \
         toInteger((sum($-.age)/count($-.age)))+avg(distinct $-.age+1)+1 AS statistics;
+-------------+------------+
| dst         | statistics |
+-------------+------------+
| "player125" | 84.0       |
| "player101" | 74.0       |
+-------------+------------+

最后更新: August 7, 2023