Function Description
any(expr) Returns true if at least one value of `expr` is true.
approx_count_distinct(expr[, relativeSD]) Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum relative standard deviation allowed.
approx_percentile(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric or ANSI interval column `col`, which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values are less than or equal to that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields better accuracy; `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the array must be between 0.0 and 1.0, and the function returns the approximate percentile array of column `col` at the given percentage array.
array_agg(expr) Collects and returns a list of non-unique elements.
avg(expr) Returns the mean calculated from values of a group.
bit_and(expr) Returns the bitwise AND of all non-null input values, or null if none.
bit_or(expr) Returns the bitwise OR of all non-null input values, or null if none.
bit_xor(expr) Returns the bitwise XOR of all non-null input values, or null if none.
bool_and(expr) Returns true if all values of `expr` are true.
bool_or(expr) Returns true if at least one value of `expr` is true.
collect_list(expr) Collects and returns a list of non-unique elements.
collect_set(expr) Collects and returns a set of unique elements.
corr(expr1, expr2) Returns Pearson coefficient of correlation between a set of number pairs.
count(*) Returns the total number of retrieved rows, including rows containing null.
count(expr[, expr...]) Returns the number of rows for which the supplied expression(s) are all non-null.
count(DISTINCT expr[, expr...]) Returns the number of rows for which the supplied expression(s) are unique and non-null.
count_if(expr) Returns the number of `TRUE` values for the expression.
count_min_sketch(col, eps, confidence, seed) Returns a count-min sketch of a column with the given `eps`, `confidence` and `seed`. The result is an array of bytes, which can be deserialized to a `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for frequency estimation using sub-linear space.
covar_pop(expr1, expr2) Returns the population covariance of a set of number pairs.
covar_samp(expr1, expr2) Returns the sample covariance of a set of number pairs.
every(expr) Returns true if all values of `expr` are true.
first(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
first_value(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
grouping(col) Indicates whether a specified column in a GROUP BY is aggregated or not. Returns 1 for aggregated or 0 for not aggregated in the result set.
grouping_id([col1[, col2 ..]]) Returns the level of grouping, equal to `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)` for the `n` grouping columns `c1`, ..., `cn`.
histogram_numeric(expr, nb) Computes a histogram on numeric `expr` using `nb` bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of `nb` is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the `x` field in the return value is propagated from the input value consumed in the aggregate function.
kurtosis(expr) Returns the kurtosis value calculated from values of a group.
last(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
last_value(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
max(expr) Returns the maximum value of `expr`.
max_by(x, y) Returns the value of `x` associated with the maximum value of `y`.
mean(expr) Returns the mean calculated from values of a group.
min(expr) Returns the minimum value of `expr`.
min_by(x, y) Returns the value of `x` associated with the minimum value of `y`.
percentile(col, percentage [, frequency]) Returns the exact percentile value of numeric column `col` at the given percentage. The value of `percentage` must be between 0.0 and 1.0. The value of `frequency` should be a positive integer.
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) Returns the exact percentile value array of numeric column `col` at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of `frequency` should be a positive integer.
percentile_approx(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric or ANSI interval column `col`, which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values are less than or equal to that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields better accuracy; `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the array must be between 0.0 and 1.0, and the function returns the approximate percentile array of column `col` at the given percentage array.
regr_avgx(y, x) Returns the average of the independent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_avgy(y, x) Returns the average of the dependent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_count(y, x) Returns the number of non-null number pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
regr_r2(y, x) Returns the coefficient of determination for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable.
skewness(expr) Returns the skewness value calculated from values of a group.
some(expr) Returns true if at least one value of `expr` is true.
std(expr) Returns the sample standard deviation calculated from values of a group.
stddev(expr) Returns the sample standard deviation calculated from values of a group.
stddev_pop(expr) Returns the population standard deviation calculated from values of a group.
stddev_samp(expr) Returns the sample standard deviation calculated from values of a group.
sum(expr) Returns the sum calculated from values of a group.
try_avg(expr) Returns the mean calculated from values of a group and the result is null on overflow.
try_sum(expr) Returns the sum calculated from values of a group and the result is null on overflow.
var_pop(expr) Returns the population variance calculated from values of a group.
var_samp(expr) Returns the sample variance calculated from values of a group.
variance(expr) Returns the sample variance calculated from values of a group.
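
The bit-shift formula given for `grouping_id` can be checked by hand with a few lines of plain Python. This is only an illustrative sketch: the helper name `grouping_id` and its list argument are hypothetical stand-ins for the per-column `grouping(c1)`, ..., `grouping(cn)` values, not a call into any SQL engine.

```python
def grouping_id(grouping_flags):
    """Combine per-column grouping() flags (each 0 or 1) into one integer.

    grouping_flags[0] corresponds to c1 and is shifted left by n-1 bits;
    the last flag corresponds to cn and is not shifted at all.
    """
    n = len(grouping_flags)
    result = 0
    for i, flag in enumerate(grouping_flags):
        result += flag << (n - 1 - i)
    return result


# Two grouping columns: c1 aggregated, c2 not aggregated -> binary 10
two_cols_first_aggregated = grouping_id([1, 0])
# Two grouping columns: both aggregated (grand total row) -> binary 11
two_cols_both_aggregated = grouping_id([1, 1])
```

Read this way, the result is simply the grouping flags written as a binary number, with the first column as the most significant bit.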
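
The three `count` forms differ only in how they treat nulls and duplicates. The sketch below mimics that behavior in plain Python, with `None` standing in for SQL null; the sample `rows` and the variable names are made up for illustration, and the distinct count follows the usual reading of `count(DISTINCT ...)` as the number of distinct fully non-null value combinations.

```python
# Each tuple is one row with two expressions; None plays the role of null.
rows = [(1, "a"), (1, "a"), (2, None), (None, "b"), (3, "c")]

# count(*): every retrieved row, including rows containing null.
count_star = len(rows)

# count(expr1, expr2): rows where all supplied expressions are non-null.
non_null_rows = [r for r in rows if all(v is not None for v in r)]
count_expr = len(non_null_rows)

# count(DISTINCT expr1, expr2): distinct combinations among those rows.
count_distinct = len(set(non_null_rows))
```

With this data the three counts come out as 5, 3 and 2 respectively.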
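
The table distinguishes sample statistics (`std`, `stddev`, `stddev_samp`, `var_samp`, `variance`) from population statistics (`stddev_pop`, `var_pop`). The difference is the divisor: n - 1 for the sample versions and n for the population versions. A small sketch using Python's standard `statistics` module shows the effect; the sample `values` are arbitrary and the variable names merely mirror the SQL function names.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population versions divide the sum of squared deviations by n.
var_pop = statistics.pvariance(values)
stddev_pop = statistics.pstdev(values)

# Sample versions divide by n - 1, so they are slightly larger.
var_samp = statistics.variance(values)
stddev_samp = statistics.stdev(values)
```

For these eight values the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 while the sample variance is 32/7.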
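
`max_by(x, y)` and `min_by(x, y)` return a value of `x`, but the ordering is decided by `y`. The following sketch reproduces that idea with Python's built-in `max` and `min` and a `key` function; the `(x, y)` pairs are invented sample data.

```python
# Each pair is (x, y): x is the value returned, y is the value compared.
pairs = [("a", 10), ("b", 50), ("c", 20)]

# max_by(x, y): the x whose y is largest.
max_by_result = max(pairs, key=lambda pair: pair[1])[0]

# min_by(x, y): the x whose y is smallest.
min_by_result = min(pairs, key=lambda pair: pair[1])[0]
```

Here `max_by_result` is `"b"` (its `y` of 50 is the largest) and `min_by_result` is `"a"`.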
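
To give a feel for what the byte array returned by `count_min_sketch` represents, here is a toy count-min sketch in plain Python. It is not the engine's implementation and cannot read its serialized output. The class name, the SHA-256 based hashing, and the sizing rule (width `ceil(2/eps)`, depth `ceil(-ln(1 - confidence) / ln 2)`) are assumptions chosen for illustration.

```python
import hashlib
import math


class ToyCountMinSketch:
    """Illustrative count-min sketch: a depth x width table of counters."""

    def __init__(self, eps, confidence, seed):
        # Assumed sizing rule: smaller eps -> wider table,
        # higher confidence -> more rows.
        self.width = math.ceil(2 / eps)
        self.depth = math.ceil(-math.log(1 - confidence) / math.log(2))
        self.seed = seed
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, item, row):
        # One hash per row, derived from the seed, row number and item.
        digest = hashlib.sha256(f"{self.seed}:{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = ToyCountMinSketch(eps=0.1, confidence=0.95, seed=42)
for item in ["a", "b", "a", "c", "a"]:
    sketch.add(item)
estimate_a = sketch.estimate("a")
```

The estimate for an item is always at least its true count; `eps` and `confidence` bound how far above the true count it is likely to be.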