Function Description
any(expr) Returns true if at least one value of `expr` is true.
approx_count_distinct(expr[, relativeSD]) Returns the estimated number of distinct values (cardinality), computed with HyperLogLog++. `relativeSD` defines the maximum relative standard deviation allowed.
approx_percentile(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric column `col`, which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values are less than or equal to that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher `accuracy` value yields a better approximation; `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value in the array must be between 0.0 and 1.0, and the function returns the array of approximate percentiles of column `col` at the given percentages.
avg(expr) Returns the mean calculated from values of a group.
bit_and(expr) Returns the bitwise AND of all non-null input values, or null if none.
bit_or(expr) Returns the bitwise OR of all non-null input values, or null if none.
bit_xor(expr) Returns the bitwise XOR of all non-null input values, or null if none.
bool_and(expr) Returns true if all values of `expr` are true.
bool_or(expr) Returns true if at least one value of `expr` is true.
collect_list(expr) Collects and returns a list of elements, keeping duplicates.
collect_set(expr) Collects and returns a set of unique elements.
corr(expr1, expr2) Returns the Pearson correlation coefficient between a set of number pairs.
count(*) Returns the total number of retrieved rows, including rows containing null.
count(expr[, expr...]) Returns the number of rows for which the supplied expression(s) are all non-null.
count(DISTINCT expr[, expr...]) Returns the number of rows for which the supplied expression(s) are unique and non-null.
count_if(expr) Returns the number of `TRUE` values for the expression.
count_min_sketch(col, eps, confidence, seed) Returns a count-min sketch of a column with the given `eps`, `confidence` and `seed`. The result is an array of bytes, which can be deserialized to a `CountMinSketch` before use. A count-min sketch is a probabilistic data structure used for frequency estimation in sub-linear space.
covar_pop(expr1, expr2) Returns the population covariance of a set of number pairs.
covar_samp(expr1, expr2) Returns the sample covariance of a set of number pairs.
every(expr) Returns true if all values of `expr` are true.
first(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
first_value(expr[, isIgnoreNull]) Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
kurtosis(expr) Returns the kurtosis value calculated from values of a group.
last(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
last_value(expr[, isIgnoreNull]) Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.
max(expr) Returns the maximum value of `expr`.
max_by(x, y) Returns the value of `x` associated with the maximum value of `y`.
mean(expr) Returns the mean calculated from values of a group.
min(expr) Returns the minimum value of `expr`.
min_by(x, y) Returns the value of `x` associated with the minimum value of `y`.
percentile(col, percentage [, frequency]) Returns the exact percentile value of numeric column `col` at the given percentage. The value of `percentage` must be between 0.0 and 1.0. The value of `frequency` must be a positive integer.
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) Returns the exact percentile value array of numeric column `col` at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of `frequency` must be a positive integer.
percentile_approx(col, percentage [, accuracy]) Returns the approximate `percentile` of the numeric column `col`, which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values are less than or equal to that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher `accuracy` value yields a better approximation; `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value in the array must be between 0.0 and 1.0, and the function returns the array of approximate percentiles of column `col` at the given percentages.
skewness(expr) Returns the skewness value calculated from values of a group.
some(expr) Returns true if at least one value of `expr` is true.
std(expr) Returns the sample standard deviation calculated from values of a group.
stddev(expr) Returns the sample standard deviation calculated from values of a group.
stddev_pop(expr) Returns the population standard deviation calculated from values of a group.
stddev_samp(expr) Returns the sample standard deviation calculated from values of a group.
sum(expr) Returns the sum calculated from values of a group.
var_pop(expr) Returns the population variance calculated from values of a group.
var_samp(expr) Returns the sample variance calculated from values of a group.
variance(expr) Returns the sample variance calculated from values of a group.
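
Several rows in the table differ only in how they treat nulls (the `count` variants) or in which divisor they use (the sample versus population statistics). The sketch below models those differences in plain Python with the standard library. It is an illustration of the semantics described in the table, not the engine's implementation; the sample column `values` is hypothetical, and `None` stands in for SQL null.

```python
import statistics

# Hypothetical sample column; None stands in for SQL null.
values = [3, None, 5, 3, None, 8]
non_null = [v for v in values if v is not None]

# The count family: each variant treats nulls and duplicates differently.
count_star = len(values)                           # count(*): all rows, nulls included
count_expr = len(non_null)                         # count(expr): non-null rows only
count_distinct = len(set(non_null))                # count(DISTINCT expr): unique non-null values
count_if_gt_3 = sum(1 for v in non_null if v > 3)  # count_if(expr > 3): rows where the test is true

# Sample statistics divide by (n - 1); population statistics divide by n.
var_samp = statistics.variance(non_null)    # var_samp / variance
var_pop = statistics.pvariance(non_null)    # var_pop
stddev_samp = statistics.stdev(non_null)    # stddev_samp / stddev / std
stddev_pop = statistics.pstdev(non_null)    # stddev_pop

# Boolean aggregates: any / some / bool_or versus every / bool_and.
any_gt_7 = any(v > 7 for v in non_null)
every_gt_7 = all(v > 7 for v in non_null)

print(count_star, count_expr, count_distinct, count_if_gt_3)
print(var_samp, var_pop, stddev_samp, stddev_pop)
print(any_gt_7, every_gt_7)
```

With this sample the four counts come out as 6, 4, 3 and 2, and the sample variance is larger than the population variance because it divides the same sum of squared deviations by a smaller number.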