ClickHouse エラー解消のヒント:「ClickHouse Column xxx is not under aggregate function and not in GROUP BY keys」の理解と解決策
ClickHouseで発生するエラー「ClickHouse Column xxx is not under aggregate function and not in GROUP BY keys」の詳細解説と解決策
ClickHouseでこのエラーが発生するのは、以下の2つの状況が考えられます。
解決策
このエラーを解決するには、以下のいずれかの方法を試す必要があります。
集計関数に正しい列を指定する
もし、集計関数に誤った列を指定している場合は、正しい列を指定する必要があります。例えば、COUNT(*)
で全件数をカウントしたい場合は、*
ではなく、カウントしたい列を指定する必要があります。
GROUP BY キーに列を追加する
もし、SELECT
句で参照したい列がGROUP BY
キーに含まれていない場合は、その列をGROUP BY
キーに追加する必要があります。
ウィンドウ関数を使用する
どうしてもGROUP BY
キーに列を追加できない場合は、ウィンドウ関数を使用する方法があります。ウィンドウ関数を使用すると、GROUP BY
に関係なく、任意の列に対して集計を行うことができます。
例
以下の例は、ClickHouseでこのエラーが発生する状況と、それを解決する方法を示しています。
例1:集計関数に誤った列を指定している
SELECT COUNT(DISTINCT id), name
FROM example_table
GROUP BY name;
このクエリは、name
ごとにレコード数をカウントしようとしていますが、COUNT(DISTINCT id)
は id
列に対してカウントを行っているため、エラーが発生します。
SELECT COUNT(DISTINCT name), name
FROM example_table
GROUP BY name;
上記のように、COUNT(DISTINCT)
の引数にカウントしたい列を指定する必要があります。
例2:GROUP BY
キーにない列を参照しようとしている
SELECT id, MAX(price)
FROM example_table
GROUP BY name;
SELECT id, MAX(price)
FROM example_table
GROUP BY name, id;
上記のように、GROUP BY
キーに参照したい列を追加する必要があります。
SELECT id, name, MAX(price) OVER (PARTITION BY name) AS max_price
FROM example_table;
Consider a table named example_table
with columns id
, name
, and price
. The following query attempts to count the number of distinct names while also selecting the maximum price for each name:
SELECT COUNT(DISTINCT id), MAX(price)
FROM example_table
GROUP BY name;
This query will result in the error "ClickHouse Column id is not under aggregate function and not in GROUP BY keys" because the COUNT(DISTINCT id)
aggregate function is using the id
column, which is not included in the GROUP BY
clause.
Solution:
To fix this error, use the correct column in the aggregate function. In this case, we want to count the distinct names, so we should use COUNT(DISTINCT name)
:
SELECT COUNT(DISTINCT name), MAX(price)
FROM example_table
GROUP BY name;
Example 2: Column not in GROUP BY key
Imagine the same example_table
with the same columns. The following query tries to retrieve the maximum price for each name while also selecting the corresponding ID:
SELECT id, MAX(price)
FROM example_table
GROUP BY name;
To resolve this error, add the id
column to the GROUP BY
clause:
SELECT id, MAX(price)
FROM example_table
GROUP BY name, id;
By including id
in the GROUP BY
clause, ClickHouse can correctly associate each maximum price with its corresponding ID.
Example 3: Using window functions
In scenarios where adding columns to the GROUP BY
clause is not feasible, window functions can be employed. Window functions allow for aggregation within specific partitions or windows without explicitly grouping the data.
Let's revisit the previous example:
SELECT id, name, MAX(price) OVER (PARTITION BY name) AS max_price
FROM example_table;
This query utilizes the MAX
window function, partitioning the data by name
. For each row, it calculates the maximum price within the corresponding name
partition, effectively achieving the desired result without modifying the GROUP BY
clause.
In some cases, using subqueries can be an alternative approach to achieve the desired results without encountering the error. Subqueries allow you to embed multiple queries within a single query, enabling more complex data manipulation.
Consider the following scenario:
SELECT name, MAX(price)
FROM example_table;
This query attempts to find the maximum price for each name without using GROUP BY
, which would trigger the error.
SELECT name, max_price
FROM (
SELECT name, price
FROM example_table
) AS subquery
GROUP BY name;
In this approach, we create a subquery to select the name
and price
columns from the example_table
. Then, we perform the GROUP BY
and MAX
aggregation on the subquery's result set.
Using conditional aggregation
Conditional aggregation allows you to selectively apply aggregate functions based on specific conditions. This can be useful when you need to filter or group data before performing aggregations.
For instance, let's say you want to calculate the maximum price for each name only for products with a price greater than 100:
SELECT name, MAX(price) AS max_price
FROM example_table
WHERE price > 100
GROUP BY name;
This query will still result in the error since the GROUP BY
clause doesn't include the price
column.
SELECT name,
MAX(CASE WHEN price > 100 THEN price ELSE NULL END) AS max_price
FROM example_table
GROUP BY name;
By using a conditional expression within the MAX
aggregate function, we ensure that the aggregation is only applied to rows where the price
is greater than 100.
Modifying table schema
In some situations, it might be beneficial to modify the table schema to better suit your data analysis needs. For example, if you frequently perform aggregations on specific combinations of columns, consider creating composite columns or materialized views to optimize query performance and avoid potential errors.
Remember that modifying the schema should be done carefully and with consideration for the overall data structure and usage patterns.
aggregate clickhouse