INLINE STATS
Stack Serverless
The INLINE STATS
processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. The results
are appended as new columns to the input rows.
The command is identical to STATS
except that it preserves all the columns from the input table.
Syntax
INLINE STATS [column1 =] expression1 [WHERE boolean_expression1][,
...,
[columnN =] expressionN [WHERE boolean_expressionN]]
[BY [grouping_name1 =] grouping_expression1[,
...,
[grouping_nameN = ] grouping_expressionN]]
Parameters
columnX
- The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (
expressionX
). If multiple columns have the same name, all but the rightmost column with this name will be ignored. expressionX
- An expression that computes an aggregated value.
grouping_expressionX
- An expression that outputs the values to group by. If its name coincides with one of the existing or computed columns, that column will be overridden by this one.
boolean_expressionX
- The condition that determines which rows are included when evaluating
expressionX
.
Individual null
values are skipped when computing aggregations.
Description
The INLINE STATS
processing command groups rows according to a common value
(also known as the grouping key), specified after BY
, and calculates one or more
aggregated values over the grouped rows. The output table contains the same
number of rows as the input table. The command only adds new columns or overrides
existing columns with the same name as the result.
If column names overlap, existing column values may be overridden and column order
may change. The new columns are added/moved so that they appear in the order
they are defined in the INLINE STATS
command.
For the calculation of each aggregated value, the rows in a group can be filtered with
WHERE
. If BY
is omitted the aggregations are applied over the entire dataset.
The following aggregation functions are supported:
AVG
COUNT
COUNT_DISTINCT
MAX
MEDIAN
MEDIAN_ABSOLUTE_DEVIATION
MIN
PERCENTILE
SAMPLE
- [preview]
ST_CENTROID_AGG
- [preview]
ST_EXTENT_AGG
STD_DEV
SUM
TOP
- [preview]
VALUES
WEIGHTED_AVG
PRESENT
The following grouping functions are supported:
Examples
The following example shows how to calculate a statistic on one column and group by the values of another column.
The languages
column moves to the last position in the output table because it is
a column overriden by the INLINE STATS
command (it's the grouping key) and it is the last column defined by it.
FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary) BY languages
emp_no:integer | salary:integer | max_salary:integer | languages:integer |
---|---|---|---|
10001 | 57305 | 73578 | 2 |
10002 | 56371 | 66817 | 5 |
10003 | 61805 | 74572 | 4 |
10004 | 36174 | 66817 | 5 |
10005 | 63528 | 73717 | 1 |
The following example shows how to calculate an aggregation over the entire dataset
by omitting BY
. The order of the existing columns is preserved and a new column
with the calculated maximum salary value is added as the last column:
FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary)
emp_no:integer | languages:integer | salary:integer | max_salary:integer |
---|---|---|---|
10001 | 2 | 57305 | 74999 |
10002 | 5 | 56371 | 74999 |
10003 | 4 | 61805 | 74999 |
10004 | 5 | 36174 | 74999 |
10005 | 1 | 63528 | 74999 |
The following example shows how to calculate multiple aggregations with multiple grouping keys:
FROM employees
| WHERE still_hired
| KEEP emp_no, languages, salary, hire_date
| EVAL tenure = DATE_DIFF("year", hire_date, now())
| DROP hire_date
| INLINE STATS avg_salary = AVG(salary), count = count(*) BY languages, tenure
emp_no:integer | salary:integer | avg_salary:double | count:long | languages:integer | tenure:integer |
---|---|---|---|---|---|
10001 | 57305 | 51130.5 | 2 | 2 | 39 |
10002 | 56371 | 40180.0 | 3 | 5 | 39 |
10004 | 36174 | 30749.0 | 2 | 5 | 38 |
10005 | 63528 | 63528.0 | 1 | 1 | 36 |
10007 | 74572 | 58644.0 | 2 | 4 | 36 |
The following example shows how to filter which rows are used for each aggregation, using the WHERE
clause:
FROM employees
| KEEP emp_no, salary
| INLINE STATS avg_lt_50 = ROUND(AVG(salary)) WHERE salary < 50000,
avg_lt_60 = ROUND(AVG(salary)) WHERE salary >=50000 AND salary < 60000,
avg_gt_60 = ROUND(AVG(salary)) WHERE salary >= 60000
emp_no:integer | salary:integer | avg_lt_50:double | avg_lt_60:double | avg_gt_60:double |
---|---|---|---|---|
10001 | 57305 | 38292.0 | 54221.0 | 67286.0 |
10002 | 56371 | 38292.0 | 54221.0 | 67286.0 |
10003 | 61805 | 38292.0 | 54221.0 | 67286.0 |
10004 | 36174 | 38292.0 | 54221.0 | 67286.0 |
10005 | 63528 | 38292.0 | 54221.0 | 67286.0 |
Limitations
- The
CATEGORIZE
grouping function is not currently supported. INLINE STATS
cannot yet have an unboundedSORT
before it. You must either move the SORT after it, or add aLIMIT
before theSORT
.