ClickHouse エラー解消のヒント:「ClickHouse Column xxx is not under aggregate function and not in GROUP BY keys」の理解と解決策

2024-09-17

ClickHouseで発生するエラー「ClickHouse Column xxx is not under aggregate function and not in GROUP BY keys」の詳細解説と解決策

ClickHouseでこのエラーが発生するのは、以下の2つの状況が考えられます。

解決策

このエラーを解決するには、以下のいずれかの方法を試す必要があります。

集計関数に正しい列を指定する

もし、集計関数に誤った列を指定している場合は、正しい列を指定する必要があります。例えば、COUNT(*) で全件数をカウントしたい場合は、* ではなく、カウントしたい列を指定する必要があります。

GROUP BY キーに列を追加する

もし、SELECT 句で参照したい列がGROUP BY キーに含まれていない場合は、その列をGROUP BY キーに追加する必要があります。

ウィンドウ関数を使用する

どうしてもGROUP BY キーに列を追加できない場合は、ウィンドウ関数を使用する方法があります。ウィンドウ関数を使用すると、GROUP BY に関係なく、任意の列に対して集計を行うことができます。

以下の例は、ClickHouseでこのエラーが発生する状況と、それを解決する方法を示しています。

例1:集計関数に誤った列を指定している

SELECT COUNT(DISTINCT id), name
FROM example_table
GROUP BY name;

このクエリは、name ごとにレコード数をカウントしようとしていますが、COUNT(DISTINCT id)id 列に対してカウントを行っているため、エラーが発生します。

SELECT COUNT(DISTINCT name), name
FROM example_table
GROUP BY name;

上記のように、COUNT(DISTINCT) の引数にカウントしたい列を指定する必要があります。

例2:GROUP BY キーにない列を参照しようとしている

SELECT id, MAX(price)
FROM example_table
GROUP BY name;
SELECT id, MAX(price)
FROM example_table
GROUP BY name, id;

上記のように、GROUP BY キーに参照したい列を追加する必要があります。

SELECT id, name, MAX(price) OVER (PARTITION BY name) AS max_price
FROM example_table;



Consider a table named example_table with columns id, name, and price. The following query attempts to count the number of distinct names while also selecting the maximum price for each name:

SELECT COUNT(DISTINCT id), MAX(price)
FROM example_table
GROUP BY name;

This query will result in the error "ClickHouse Column id is not under aggregate function and not in GROUP BY keys" because the COUNT(DISTINCT id) aggregate function is using the id column, which is not included in the GROUP BY clause.

Solution:

To fix this error, use the correct column in the aggregate function. In this case, we want to count the distinct names, so we should use COUNT(DISTINCT name):

SELECT COUNT(DISTINCT name), MAX(price)
FROM example_table
GROUP BY name;

Example 2: Column not in GROUP BY key

Imagine the same example_table with the same columns. The following query tries to retrieve the maximum price for each name while also selecting the corresponding ID:

SELECT id, MAX(price)
FROM example_table
GROUP BY name;

To resolve this error, add the id column to the GROUP BY clause:

SELECT id, MAX(price)
FROM example_table
GROUP BY name, id;

By including id in the GROUP BY clause, ClickHouse can correctly associate each maximum price with its corresponding ID.

Example 3: Using window functions

In scenarios where adding columns to the GROUP BY clause is not feasible, window functions can be employed. Window functions allow for aggregation within specific partitions or windows without explicitly grouping the data.

Let's revisit the previous example:

SELECT id, name, MAX(price) OVER (PARTITION BY name) AS max_price
FROM example_table;

This query utilizes the MAX window function, partitioning the data by name. For each row, it calculates the maximum price within the corresponding name partition, effectively achieving the desired result without modifying the GROUP BY clause.




In some cases, using subqueries can be an alternative approach to achieve the desired results without encountering the error. Subqueries allow you to embed multiple queries within a single query, enabling more complex data manipulation.

Consider the following scenario:

SELECT name, MAX(price)
FROM example_table;

This query attempts to find the maximum price for each name without using GROUP BY, which would trigger the error.

SELECT name, max_price
FROM (
  SELECT name, price
  FROM example_table
) AS subquery
GROUP BY name;

In this approach, we create a subquery to select the name and price columns from the example_table. Then, we perform the GROUP BY and MAX aggregation on the subquery's result set.

Using conditional aggregation

Conditional aggregation allows you to selectively apply aggregate functions based on specific conditions. This can be useful when you need to filter or group data before performing aggregations.

For instance, let's say you want to calculate the maximum price for each name only for products with a price greater than 100:

SELECT name, MAX(price) AS max_price
FROM example_table
WHERE price > 100
GROUP BY name;

This query will still result in the error since the GROUP BY clause doesn't include the price column.

SELECT name,
       MAX(CASE WHEN price > 100 THEN price ELSE NULL END) AS max_price
FROM example_table
GROUP BY name;

By using a conditional expression within the MAX aggregate function, we ensure that the aggregation is only applied to rows where the price is greater than 100.

Modifying table schema

In some situations, it might be beneficial to modify the table schema to better suit your data analysis needs. For example, if you frequently perform aggregations on specific combinations of columns, consider creating composite columns or materialized views to optimize query performance and avoid potential errors.

Remember that modifying the schema should be done carefully and with consideration for the overall data structure and usage patterns.


aggregate clickhouse

aggregate clickhouse