`INLINE STATS`

Stack Planned Serverless Preview

The INLINE STATS processing command groups rows according to a common value and calculates one or more aggregated values over the grouped rows. The results are appended as new columns to the input rows.

The command is identical to STATS except that it preserves all the columns from the input table.

Syntax

		INLINE STATS [column1 =] expression1 [WHERE boolean_expression1][,
      ...,
      [columnN =] expressionN [WHERE boolean_expressionN]]
      [BY [grouping_name1 =] grouping_expression1[,
          ...,
          [grouping_nameN = ] grouping_expressionN]]
		
	

Parameters

columnX: The name by which the aggregated value is returned. If omitted, the name is equal to the corresponding expression (expressionX). If multiple columns have the same name, all but the rightmost column with this name will be ignored.
expressionX: An expression that computes an aggregated value.
grouping_expressionX: An expression that outputs the values to group by. If its name coincides with one of the existing or computed columns, that column will be overridden by this one.
boolean_expressionX: The condition that determines which rows are included when evaluating expressionX.

Note

Individual null values are skipped when computing aggregations.

Description

The INLINE STATS processing command groups rows according to a common value (also known as the grouping key), specified after BY, and calculates one or more aggregated values over the grouped rows. The output table contains the same number of rows as the input table. The command only adds new columns or overrides existing columns with the same name as the result.

If column names overlap, existing column values may be overridden and column order may change. The new columns are added/moved so that they appear in the order they are defined in the INLINE STATS command.

For the calculation of each aggregated value, the rows in a group can be filtered with WHERE. If BY is omitted the aggregations are applied over the entire dataset.

The following aggregation functions are supported:

The following grouping functions are supported:

Examples

The following example shows how to calculate a statistic on one column and group by the values of another column.

Note

The languages column moves to the last position in the output table because it is a column overriden by the INLINE STATS command (it's the grouping key) and it is the last column defined by it.

		FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary) BY languages
		
	

emp_no:integer	salary:integer	max_salary:integer	languages:integer
10001	57305	73578	2
10002	56371	66817	5
10003	61805	74572	4
10004	36174	66817	5
10005	63528	73717	1

The following example shows how to calculate an aggregation over the entire dataset by omitting BY. The order of the existing columns is preserved and a new column with the calculated maximum salary value is added as the last column:

		FROM employees
| KEEP emp_no, languages, salary
| INLINE STATS max_salary = MAX(salary)
		
	

emp_no:integer	languages:integer	salary:integer	max_salary:integer
10001	2	57305	74999
10002	5	56371	74999
10003	4	61805	74999
10004	5	36174	74999
10005	1	63528	74999

The following example shows how to calculate multiple aggregations with multiple grouping keys:

		FROM employees
| WHERE still_hired
| KEEP emp_no, languages, salary, hire_date
| EVAL tenure = DATE_DIFF("year", hire_date, now())
| DROP hire_date
| INLINE STATS avg_salary = AVG(salary), count = count(*) BY languages, tenure
		
	

emp_no:integer	salary:integer	avg_salary:double	count:long	languages:integer	tenure:integer
10001	57305	51130.5	2	2	39
10002	56371	40180.0	3	5	39
10004	36174	30749.0	2	5	38
10005	63528	63528.0	1	1	36
10007	74572	58644.0	2	4	36

The following example shows how to filter which rows are used for each aggregation, using the WHERE clause:

		FROM employees
| KEEP emp_no, salary
| INLINE STATS avg_lt_50 = ROUND(AVG(salary)) WHERE salary < 50000,
               avg_lt_60 = ROUND(AVG(salary)) WHERE salary >=50000 AND salary < 60000,
               avg_gt_60 = ROUND(AVG(salary)) WHERE salary >= 60000
		
	

emp_no:integer	salary:integer	avg_lt_50:double	avg_lt_60:double	avg_gt_60:double
10001	57305	38292.0	54221.0	67286.0
10002	56371	38292.0	54221.0	67286.0
10003	61805	38292.0	54221.0	67286.0
10004	36174	38292.0	54221.0	67286.0
10005	63528	38292.0	54221.0	67286.0

Limitations

The CATEGORIZE grouping function is not currently supported.
INLINE STATS cannot yet have an unbounded SORT before it. You must either move the SORT after it, or add a LIMIT before the SORT.