Function Description
any(expr) Returns true if at least one value of `expr` is true.
any_value(expr[, isIgnoreNull]) Returns some value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
approx_count_distinct(expr[, relativeSD]) Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum relative standard deviation allowed.
approx_percentile(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric or ansi interval column `col` which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
array_agg(expr) Collects and returns a list of non-unique elements.
avg(expr) Returns the mean calculated from values of a group.
bit_and(expr) Returns the bitwise AND of all non-null input values, or null if none.
bit_or(expr) Returns the bitwise OR of all non-null input values, or null if none.
bit_xor(expr) Returns the bitwise XOR of all non-null input values, or null if none.
bitmap_construct_agg(child) Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position().
bitmap_or_agg(child) Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().
bool_and(expr) Returns true if all values of `expr` are true.
bool_or(expr) Returns true if at least one value of `expr` is true.
collect_list(expr) Collects and returns a list of non-unique elements.
collect_set(expr) Collects and returns a set of unique elements.
corr(expr1, expr2) Returns Pearson coefficient of correlation between a set of number pairs.
count(*) Returns the total number of retrieved rows, including rows containing null.
count(expr[, expr...]) Returns the number of rows for which the supplied expression(s) are all non-null.
count(DISTINCT expr[, expr...]) Returns the number of rows for which the supplied expression(s) are unique and non-null.
count_if(expr) Returns the number of `TRUE` values for the expression.
count_min_sketch(col, eps, confidence, seed) Returns a count-min sketch of a column with the given esp, confidence and seed. The result is an array of bytes, which can be deserialized to a `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
covar_pop(expr1, expr2) Returns the population covariance of a set of number pairs.
covar_samp(expr1, expr2) Returns the sample covariance of a set of number pairs.
every(expr) Returns true if all values of `expr` are true.
first(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
first_value(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
grouping(col) indicates whether a specified column in a GROUP BY is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.",
grouping_id([col1[, col2 ..]]) returns the level of grouping, equals to `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`
histogram_numeric(expr, nb) Computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.
hll_sketch_agg(expr, lgConfigK) Returns the HllSketch's updatable binary representation. `lgConfigK` (optional) the log-base-2 of K, with K is the number of buckets or slots for the HllSketch.
hll_union_agg(expr, allowDifferentLgConfigK) Returns the estimated number of unique values. `allowDifferentLgConfigK` (optional) Allow sketches with different lgConfigK values to be unioned (defaults to false).
kurtosis(expr) Returns the kurtosis value calculated from values of a group.
last(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values
last_value(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values
max(expr) Returns the maximum value of `expr`.
max_by(x, y) Returns the value of `x` associated with the maximum value of `y`.
mean(expr) Returns the mean calculated from values of a group.
median(col) Returns the median of numeric or ANSI interval column `col`.
min(expr) Returns the minimum value of `expr`.
min_by(x, y) Returns the value of `x` associated with the minimum value of `y`.
mode(col[, deterministic]) Returns the most frequent value for the values within `col`. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency then either any of values is returned if `deterministic` is false or is not defined, or the lowest value is returned if `deterministic` is true.
mode() WITHIN GROUP (ORDER BY col) Returns the most frequent value for the values within `col` (specified in ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency only one value will be returned. The value will be chosen based on sort direction. Return the smallest value if sort direction is asc or the largest value if sort direction is desc from multiple values with the same frequency.
percentile(col, percentage [, frequency]) Returns the exact percentile value of numeric or ANSI interval column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be positive integral
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) Returns the exact percentile value array of numeric column `col` at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be positive integral
percentile_approx(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric or ansi interval column `col` which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
percentile_cont(percentage) WITHIN GROUP (ORDER BY col) Return a percentile value based on a continuous distribution of numeric or ANSI interval column `col` at the given `percentage` (specified in ORDER BY clause).
percentile_disc(percentage) WITHIN GROUP (ORDER BY col) Return a percentile value based on a discrete distribution of numeric or ANSI interval column `col` at the given `percentage` (specified in ORDER BY clause).
regr_avgx(y, x) Returns the average of the independent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_avgy(y, x) Returns the average of the dependent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_count(y, x) Returns the number of non-null number pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_intercept(y, x) Returns the intercept of the univariate linear regression line for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_r2(y, x) Returns the coefficient of determination for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_slope(y, x) Returns the slope of the linear regression line for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_sxx(y, x) Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_sxy(y, x) Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_syy(y, x) Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
skewness(expr) Returns the skewness value calculated from values of a group.
some(expr) Returns true if at least one value of `expr` is true.
std(expr) Returns the sample standard deviation calculated from values of a group.
stddev(expr) Returns the sample standard deviation calculated from values of a group.
stddev_pop(expr) Returns the population standard deviation calculated from values of a group.
stddev_samp(expr) Returns the sample standard deviation calculated from values of a group.
sum(expr) Returns the sum calculated from values of a group.
try_avg(expr) Returns the mean calculated from values of a group and the result is null on overflow.
try_sum(expr) Returns the sum calculated from values of a group and the result is null on overflow.
var_pop(expr) Returns the population variance calculated from values of a group.
var_samp(expr) Returns the sample variance calculated from values of a group.
variance(expr) Returns the sample variance calculated from values of a group.