{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Built-in Functions \u00b6 .func-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 0.5rem; margin-bottom: 1.5rem; } .func-grid a { padding: 0.25rem 0; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; } @media (max-width: 1200px) { .func-grid { grid-template-columns: repeat(3, 1fr); } } @media (max-width: 768px) { .func-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .func-grid { grid-template-columns: 1fr; } } Spark SQL provides a comprehensive set of built-in functions for data manipulation and analysis. Functions are organized into the following categories: Agg Functions (86) \u00b6 any any_value approx_count_distinct approx_percentile approx_top_k approx_top_k_accumulate approx_top_k_combine array_agg avg bit_and bit_or bit_xor bitmap_and_agg bitmap_construct_agg bitmap_or_agg bool_and bool_or collect_list collect_set corr count count_if count_min_sketch covar_pop covar_samp every first first_value grouping grouping_id histogram_numeric hll_sketch_agg hll_union_agg kll_merge_agg_bigint kll_merge_agg_double kll_merge_agg_float kll_sketch_agg_bigint kll_sketch_agg_double kll_sketch_agg_float kurtosis last last_value listagg max max_by mean measure median min min_by mode percentile percentile_approx percentile_cont percentile_disc regr_avgx regr_avgy regr_count regr_intercept regr_r2 regr_slope regr_sxx regr_sxy regr_syy skewness some std stddev stddev_pop stddev_samp string_agg sum theta_intersection_agg theta_sketch_agg theta_union_agg try_avg try_sum tuple_intersection_agg_double tuple_intersection_agg_integer tuple_sketch_agg_double tuple_sketch_agg_integer tuple_union_agg_double tuple_union_agg_integer var_pop var_samp variance Array Functions (25) \u00b6 array array_append array_compact array_contains array_distinct array_except array_insert array_intersect 
array_join array_max array_min array_position array_prepend array_remove array_repeat array_size array_union arrays_overlap arrays_zip flatten get sequence shuffle slice sort_array Avro Functions (3) \u00b6 from_avro schema_of_avro to_avro Bitwise Functions (13) \u00b6 & << >> >>> ^ bit_count bit_get getbit shiftleft shiftright shiftrightunsigned | ~ Collection Functions (18) \u00b6 aggregate array_sort cardinality concat element_at exists filter forall map_filter map_zip_with reduce reverse size transform transform_keys transform_values try_element_at zip_with Conditional Functions (12) \u00b6 between case coalesce if ifnull nanvl nullif nullifzero nvl nvl2 when zeroifnull Conversion Functions (14) \u00b6 bigint binary boolean cast date decimal double float int smallint string time timestamp tinyint Csv Functions (3) \u00b6 from_csv schema_of_csv to_csv Datetime Functions (81) \u00b6 add_months convert_timezone curdate current_date current_time current_timestamp current_timezone date_add date_diff date_format date_from_unix_date date_part date_sub date_trunc dateadd datediff datepart day dayname dayofmonth dayofweek dayofyear extract from_unixtime from_utc_timestamp hour last_day localtimestamp make_date make_dt_interval make_interval make_time make_timestamp make_timestamp_ltz make_timestamp_ntz make_ym_interval minute month monthname months_between next_day now quarter second session_window time_diff time_from_micros time_from_millis time_from_seconds time_to_micros time_to_millis time_to_seconds time_trunc timestamp_micros timestamp_millis timestamp_seconds to_date to_time to_timestamp to_timestamp_ltz to_timestamp_ntz to_unix_timestamp to_utc_timestamp trunc try_make_interval try_make_timestamp try_make_timestamp_ltz try_make_timestamp_ntz try_to_date try_to_time try_to_timestamp unix_date unix_micros unix_millis unix_seconds unix_timestamp weekday weekofyear window window_time year Generator Functions (9) \u00b6 collations explode explode_outer inline
inline_outer posexplode posexplode_outer sql_keywords stack Hash Functions (7) \u00b6 crc32 hash md5 sha sha1 sha2 xxhash64 Json Functions (7) \u00b6 from_json get_json_object json_array_length json_object_keys json_tuple schema_of_json to_json Map Functions (9) \u00b6 map map_concat map_contains_key map_entries map_from_arrays map_from_entries map_keys map_values str_to_map Math Functions (68) \u00b6 % * + - / abs acos acosh asin asinh atan atan2 atanh bin bround cbrt ceil ceiling conv cos cosh cot csc degrees div e exp expm1 factorial floor greatest hex hypot least ln log log10 log1p log2 mod negative pi pmod positive pow power radians rand randn random rint round sec sign signum sin sinh sqrt tan tanh try_add try_divide try_mod try_multiply try_subtract unhex uniform width_bucket Misc Functions (25) \u00b6 aes_decrypt aes_encrypt assert_true bitmap_bit_position bitmap_bucket_number bitmap_count current_catalog current_database current_schema current_user input_file_block_length input_file_block_start input_file_name java_method monotonically_increasing_id raise_error reflect session_user spark_partition_id try_aes_decrypt try_reflect typeof user uuid version Predicate Functions (23) \u00b6 ! 
!= < <= <=> <> = == > >= and equal_null ilike in isnan isnotnull isnull like not or regexp regexp_like rlike Protobuf Functions (2) \u00b6 from_protobuf to_protobuf Sketch Functions (40) \u00b6 approx_top_k_estimate hll_sketch_estimate hll_union kll_sketch_get_n_bigint kll_sketch_get_n_double kll_sketch_get_n_float kll_sketch_get_quantile_bigint kll_sketch_get_quantile_double kll_sketch_get_quantile_float kll_sketch_get_rank_bigint kll_sketch_get_rank_double kll_sketch_get_rank_float kll_sketch_merge_bigint kll_sketch_merge_double kll_sketch_merge_float kll_sketch_to_string_bigint kll_sketch_to_string_double kll_sketch_to_string_float theta_difference theta_intersection theta_sketch_estimate theta_union tuple_difference_double tuple_difference_integer tuple_difference_theta_double tuple_difference_theta_integer tuple_intersection_double tuple_intersection_integer tuple_intersection_theta_double tuple_intersection_theta_integer tuple_sketch_estimate_double tuple_sketch_estimate_integer tuple_sketch_summary_double tuple_sketch_summary_integer tuple_sketch_theta_double tuple_sketch_theta_integer tuple_union_double tuple_union_integer tuple_union_theta_double tuple_union_theta_integer St Functions (5) \u00b6 st_asbinary st_geogfromwkb st_geomfromwkb st_setsrid st_srid String Functions (74) \u00b6 ascii base64 bit_length btrim char char_length character_length chr collate collation concat_ws contains decode elt encode endswith find_in_set format_number format_string initcap instr is_valid_utf8 lcase left len length levenshtein locate lower lpad ltrim luhn_check make_valid_utf8 mask octet_length overlay position printf quote randstr regexp_count regexp_extract regexp_extract_all regexp_instr regexp_replace regexp_substr repeat replace right rpad rtrim sentences soundex space split split_part startswith substr substring substring_index to_binary to_char to_number to_varchar translate trim try_to_binary try_to_number try_validate_utf8 ucase unbase64 upper validate_utf8
|| Struct Functions (2) \u00b6 named_struct struct Table Functions (2) \u00b6 python_worker_logs range Url Functions (5) \u00b6 parse_url try_parse_url try_url_decode url_decode url_encode Variant Functions (10) \u00b6 is_variant_null parse_json schema_of_variant schema_of_variant_agg to_variant_object try_parse_json try_variant_get variant_explode variant_explode_outer variant_get Vector Functions (7) \u00b6 vector_avg vector_cosine_similarity vector_inner_product vector_l2_distance vector_norm vector_normalize vector_sum Window Functions (9) \u00b6 cume_dist dense_rank lag lead nth_value ntile percent_rank rank row_number Xml Functions (12) \u00b6 from_xml schema_of_xml to_xml xpath xpath_boolean xpath_double xpath_float xpath_int xpath_long xpath_number xpath_short xpath_string","title":"Overview"},{"location":"#built-in-functions","text":"Spark SQL provides a comprehensive set of built-in functions for data manipulation and analysis.
Functions are organized into the following categories:","title":"Built-in Functions"},{"location":"#agg-functions-86","text":"any any_value approx_count_distinct approx_percentile approx_top_k approx_top_k_accumulate approx_top_k_combine array_agg avg bit_and bit_or bit_xor bitmap_and_agg bitmap_construct_agg bitmap_or_agg bool_and bool_or collect_list collect_set corr count count_if count_min_sketch covar_pop covar_samp every first first_value grouping grouping_id histogram_numeric hll_sketch_agg hll_union_agg kll_merge_agg_bigint kll_merge_agg_double kll_merge_agg_float kll_sketch_agg_bigint kll_sketch_agg_double kll_sketch_agg_float kurtosis last last_value listagg max max_by mean measure median min min_by mode percentile percentile_approx percentile_cont percentile_disc regr_avgx regr_avgy regr_count regr_intercept regr_r2 regr_slope regr_sxx regr_sxy regr_syy skewness some std stddev stddev_pop stddev_samp string_agg sum theta_intersection_agg theta_sketch_agg theta_union_agg try_avg try_sum tuple_intersection_agg_double tuple_intersection_agg_integer tuple_sketch_agg_double tuple_sketch_agg_integer tuple_union_agg_double tuple_union_agg_integer var_pop var_samp variance","title":"Agg Functions (86)"},{"location":"#array-functions-25","text":"array array_append array_compact array_contains array_distinct array_except array_insert array_intersect array_join array_max array_min array_position array_prepend array_remove array_repeat array_size array_union arrays_overlap arrays_zip flatten get sequence shuffle slice sort_array","title":"Array Functions (25)"},{"location":"#avro-functions-3","text":"from_avro schema_of_avro to_avro","title":"Avro Functions (3)"},{"location":"#bitwise-functions-13","text":"& << >> >>> ^ bit_count bit_get getbit shiftleft shiftright shiftrightunsigned | ~","title":"Bitwise Functions (13)"},{"location":"#collection-functions-18","text":"aggregate array_sort cardinality concat element_at exists filter forall map_filter map_zip_with
reduce reverse size transform transform_keys transform_values try_element_at zip_with","title":"Collection Functions (18)"},{"location":"#conditional-functions-12","text":"between case coalesce if ifnull nanvl nullif nullifzero nvl nvl2 when zeroifnull","title":"Conditional Functions (12)"},{"location":"#conversion-functions-14","text":"bigint binary boolean cast date decimal double float int smallint string time timestamp tinyint","title":"Conversion Functions (14)"},{"location":"#csv-functions-3","text":"from_csv schema_of_csv to_csv","title":"Csv Functions (3)"},{"location":"#datetime-functions-81","text":"add_months convert_timezone curdate current_date current_time current_timestamp current_timezone date_add date_diff date_format date_from_unix_date date_part date_sub date_trunc dateadd datediff datepart day dayname dayofmonth dayofweek dayofyear extract from_unixtime from_utc_timestamp hour last_day localtimestamp make_date make_dt_interval make_interval make_time make_timestamp make_timestamp_ltz make_timestamp_ntz make_ym_interval minute month monthname months_between next_day now quarter second session_window time_diff time_from_micros time_from_millis time_from_seconds time_to_micros time_to_millis time_to_seconds time_trunc timestamp_micros timestamp_millis timestamp_seconds to_date to_time to_timestamp to_timestamp_ltz to_timestamp_ntz to_unix_timestamp to_utc_timestamp trunc try_make_interval try_make_timestamp try_make_timestamp_ltz try_make_timestamp_ntz try_to_date try_to_time try_to_timestamp unix_date unix_micros unix_millis unix_seconds unix_timestamp weekday weekofyear window window_time year","title":"Datetime Functions (81)"},{"location":"#generator-functions-9","text":"collations explode explode_outer inline inline_outer posexplode posexplode_outer sql_keywords stack","title":"Generator Functions (9)"},{"location":"#hash-functions-7","text":"crc32 hash md5 sha sha1 sha2 xxhash64","title":"Hash Functions 
(7)"},{"location":"#json-functions-7","text":"from_json get_json_object json_array_length json_object_keys json_tuple schema_of_json to_json","title":"Json Functions (7)"},{"location":"#map-functions-9","text":"map map_concat map_contains_key map_entries map_from_arrays map_from_entries map_keys map_values str_to_map","title":"Map Functions (9)"},{"location":"#math-functions-68","text":"% * + - / abs acos acosh asin asinh atan atan2 atanh bin bround cbrt ceil ceiling conv cos cosh cot csc degrees div e exp expm1 factorial floor greatest hex hypot least ln log log10 log1p log2 mod negative pi pmod positive pow power radians rand randn random rint round sec sign signum sin sinh sqrt tan tanh try_add try_divide try_mod try_multiply try_subtract unhex uniform width_bucket","title":"Math Functions (68)"},{"location":"#misc-functions-25","text":"aes_decrypt aes_encrypt assert_true bitmap_bit_position bitmap_bucket_number bitmap_count current_catalog current_database current_schema current_user input_file_block_length input_file_block_start input_file_name java_method monotonically_increasing_id raise_error reflect session_user spark_partition_id try_aes_decrypt try_reflect typeof user uuid version","title":"Misc Functions (25)"},{"location":"#predicate-functions-23","text":"! 
!= < <= <=> <> = == > >= and equal_null ilike in isnan isnotnull isnull like not or regexp regexp_like rlike","title":"Predicate Functions (23)"},{"location":"#protobuf-functions-2","text":"from_protobuf to_protobuf","title":"Protobuf Functions (2)"},{"location":"#sketch-functions-40","text":"approx_top_k_estimate hll_sketch_estimate hll_union kll_sketch_get_n_bigint kll_sketch_get_n_double kll_sketch_get_n_float kll_sketch_get_quantile_bigint kll_sketch_get_quantile_double kll_sketch_get_quantile_float kll_sketch_get_rank_bigint kll_sketch_get_rank_double kll_sketch_get_rank_float kll_sketch_merge_bigint kll_sketch_merge_double kll_sketch_merge_float kll_sketch_to_string_bigint kll_sketch_to_string_double kll_sketch_to_string_float theta_difference theta_intersection theta_sketch_estimate theta_union tuple_difference_double tuple_difference_integer tuple_difference_theta_double tuple_difference_theta_integer tuple_intersection_double tuple_intersection_integer tuple_intersection_theta_double tuple_intersection_theta_integer tuple_sketch_estimate_double tuple_sketch_estimate_integer tuple_sketch_summary_double tuple_sketch_summary_integer tuple_sketch_theta_double tuple_sketch_theta_integer tuple_union_double tuple_union_integer tuple_union_theta_double tuple_union_theta_integer","title":"Sketch Functions (40)"},{"location":"#st-functions-5","text":"st_asbinary st_geogfromwkb st_geomfromwkb st_setsrid st_srid","title":"St Functions (5)"},{"location":"#string-functions-74","text":"ascii base64 bit_length btrim char char_length character_length chr collate collation concat_ws contains decode elt encode endswith find_in_set format_number format_string initcap instr is_valid_utf8 lcase left len length levenshtein locate lower lpad ltrim luhn_check make_valid_utf8 mask octet_length overlay position printf quote randstr regexp_count regexp_extract regexp_extract_all regexp_instr regexp_replace regexp_substr repeat replace right rpad rtrim sentences soundex space split
split_part startswith substr substring substring_index to_binary to_char to_number to_varchar translate trim try_to_binary try_to_number try_validate_utf8 ucase unbase64 upper validate_utf8 ||","title":"String Functions (74)"},{"location":"#struct-functions-2","text":"named_struct struct","title":"Struct Functions (2)"},{"location":"#table-functions-2","text":"python_worker_logs range","title":"Table Functions (2)"},{"location":"#url-functions-5","text":"parse_url try_parse_url try_url_decode url_decode url_encode","title":"Url Functions (5)"},{"location":"#variant-functions-10","text":"is_variant_null parse_json schema_of_variant schema_of_variant_agg to_variant_object try_parse_json try_variant_get variant_explode variant_explode_outer variant_get","title":"Variant Functions (10)"},{"location":"#vector-functions-7","text":"vector_avg vector_cosine_similarity vector_inner_product vector_l2_distance vector_norm vector_normalize vector_sum","title":"Vector Functions (7)"},{"location":"#window-functions-9","text":"cume_dist dense_rank lag lead nth_value ntile percent_rank rank row_number","title":"Window Functions (9)"},{"location":"#xml-functions-12","text":"from_xml schema_of_xml to_xml xpath xpath_boolean xpath_double xpath_float xpath_int xpath_long xpath_number xpath_short xpath_string","title":"Xml Functions (12)"},{"location":"agg-functions/","text":"Agg Functions \u00b6 This page lists all agg functions available in Spark SQL. any \u00b6 any(expr) - Returns true if at least one value of expr is true. Examples: > SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT any(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT any(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0 any_value \u00b6 any_value(expr[, isIgnoreNull]) - Returns some value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. 
Examples: > SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic. Since: 3.4.0 approx_count_distinct \u00b6 approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD defines the maximum relative standard deviation allowed. Examples: > SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1); 3 Since: 1.6.0 approx_percentile \u00b6 approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array. 
Examples: > SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col); [1,1,0] > SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col); 7 > SELECT approx_percentile(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-1 > SELECT approx_percentile(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:01.000000000,0 00:00:02.000000000] Since: 2.1.0 approx_top_k \u00b6 approx_top_k(expr, k, maxItemsTracked) - Returns top k items with their frequency. k An optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. maxItemsTracked An optional INTEGER literal greater than or equal to k and has upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000. Examples: > SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] > SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr); [{\"item\":2,\"count\":3},{\"item\":1,\"count\":2},{\"item\":0,\"count\":1}] Since: 4.1.0 approx_top_k_accumulate \u00b6 approx_top_k_accumulate(expr, maxItemsTracked) - Accumulates items into a sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000. 
Examples: > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr, 100), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] Since: 4.1.0 approx_top_k_combine \u00b6 approx_top_k_combine(state, maxItemsTracked) - Combines multiple sketches into a single sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is specified, it will be set for the combined sketch. If maxItemsTracked is not specified, the input sketches must have the same maxItemsTracked value, otherwise an error will be thrown. The output sketch will use the same value from the input sketches. Examples: > SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr) UNION ALL SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr)); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] Since: 4.1.0 array_agg \u00b6 array_agg(expr) - Collects and returns a list of non-unique elements. Examples: > SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2,1] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. Since: 3.3.0 avg \u00b6 avg(expr) - Returns the mean calculated from values of a group. 
Examples: > SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 Since: 1.0.0 bit_and \u00b6 bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none. Examples: > SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col); 1 Since: 3.0.0 bit_or \u00b6 bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if none. Examples: > SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col); 7 Since: 3.0.0 bit_xor \u00b6 bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if none. Examples: > SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col); 6 Since: 3.0.0 bitmap_and_agg \u00b6 bitmap_and_agg(child) - Returns a bitmap that is the bitwise AND of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg(). Examples: > SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'F0'), (X '70'), (X '30') AS tab(col); 300000 > SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'FF'), (X 'FF'), (X 'FF') AS tab(col); FF0000 Since: 4.1.0 bitmap_construct_agg \u00b6 bitmap_construct_agg(child) - Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position(). Examples: > SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (2), (3) AS tab(col); 070000 > SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (1), (1) AS tab(col); 010000 Since: 3.5.0 bitmap_or_agg \u00b6 bitmap_or_agg(child) - Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg(). 
Examples: > SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '20'), (X '40') AS tab(col); 700000 > SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '10'), (X '10') AS tab(col); 100000 Since: 3.5.0 bool_and \u00b6 bool_and(expr) - Returns true if all values of expr are true. Examples: > SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col); true > SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col); true > SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col); false Since: 3.0.0 bool_or \u00b6 bool_or(expr) - Returns true if at least one value of expr is true. Examples: > SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0 collect_list \u00b6 collect_list(expr) - Collects and returns a list of non-unique elements. Examples: > SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2,1] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0 collect_set \u00b6 collect_set(expr) - Collects and returns a set of unique elements. Examples: > SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0 corr \u00b6 corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs. Examples: > SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2); 0.8660254037844387 Since: 1.6.0 count \u00b6 count(*) - Returns the total number of retrieved rows, including rows containing null. 
count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null. count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null. Examples: > SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 4 > SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 3 > SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col); 2 Since: 1.0.0 count_if \u00b6 count_if(expr) - Returns the number of TRUE values for the expression. Examples: > SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); 2 > SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); 1 Since: 3.0.0 count_min_sketch \u00b6 count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space. Examples: > SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col); 0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000 Since: 2.2.0 covar_pop \u00b6 covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs. Examples: > SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2); 0.6666666666666666 Since: 2.0.0 covar_samp \u00b6 covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs. Examples: > SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2); 1.0 Since: 2.0.0 every \u00b6 every(expr) - Returns true if all values of expr are true.
Examples: > SELECT every(col) FROM VALUES (true), (true), (true) AS tab(col); true > SELECT every(col) FROM VALUES (NULL), (true), (true) AS tab(col); true > SELECT every(col) FROM VALUES (true), (false), (true) AS tab(col); false Since: 3.0.0 first \u00b6 first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT first(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT first(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT first(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic because its results depend on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0 first_value \u00b6 first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT first_value(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT first_value(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT first_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic because its results depend on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0 grouping \u00b6 grouping(col) - Indicates whether a specified column in a GROUP BY is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set. Examples: > SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name); Alice 0 2 Bob 0 5 NULL 1 7 Since: 2.0.0 grouping_id \u00b6 grouping_id([col1[, col2 ..]]) - Returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ...
+ grouping(cn) Examples: > SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height); Alice 0 2 165.0 Alice 1 2 165.0 NULL 3 7 172.5 Bob 0 5 180.0 Bob 1 5 180.0 NULL 2 2 165.0 NULL 2 5 180.0 Note: Input columns should match the grouping columns exactly, or be empty (meaning all the grouping columns). Since: 2.0.0 histogram_numeric \u00b6 histogram_numeric(expr, nb) - Computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function. Examples: > SELECT histogram_numeric(col, 5) FROM VALUES (0), (1), (2), (10) AS tab(col); [{\"x\":0,\"y\":1.0},{\"x\":1,\"y\":1.0},{\"x\":2,\"y\":1.0},{\"x\":10,\"y\":1.0}] Since: 3.3.0 hll_sketch_agg \u00b6 hll_sketch_agg(expr, lgConfigK) - Returns the HllSketch's updatable binary representation. lgConfigK (optional) the log-base-2 of K, where K is the number of buckets or slots for the HllSketch. Examples: > SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 3.5.0 hll_union_agg \u00b6 hll_union_agg(expr, allowDifferentLgConfigK) - Returns the estimated number of unique values. allowDifferentLgConfigK (optional) Allow sketches with different lgConfigK values to be unioned (defaults to false).
Examples: > SELECT hll_sketch_estimate(hll_union_agg(sketch, true)) FROM (SELECT hll_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT hll_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 3.5.0 kll_merge_agg_bigint \u00b6 kll_merge_agg_bigint(expr[, k]) - Merges binary KllLongsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_bigint). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. Examples: > SELECT kll_sketch_get_n_bigint(kll_merge_agg_bigint(sketch)) FROM (SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (1), (2), (3) tab(col) UNION ALL SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (4), (5), (6) tab(col)) t; 6 Since: 4.1.0 kll_merge_agg_double \u00b6 kll_merge_agg_double(expr[, k]) - Merges binary KllDoublesSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_double). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. Examples: > SELECT kll_sketch_get_n_double(kll_merge_agg_double(sketch)) FROM (SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)) tab(col) UNION ALL SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)), (CAST(6.0 AS DOUBLE)) tab(col)) t; 6 Since: 4.1.0 kll_merge_agg_float \u00b6 kll_merge_agg_float(expr[, k]) - Merges binary KllFloatsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_float). 
The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. Examples: > SELECT kll_sketch_get_n_float(kll_merge_agg_float(sketch)) FROM (SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)) tab(col) UNION ALL SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)), (CAST(6.0 AS FLOAT)) tab(col)) t; 6 Since: 4.1.0 kll_sketch_agg_bigint \u00b6 kll_sketch_agg_bigint(expr[, k]) - Returns the KllLongsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col, 400))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0 kll_sketch_agg_double \u00b6 kll_sketch_agg_double(expr[, k]) - Returns the KllDoublesSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. 
Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col, 400))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0 kll_sketch_agg_float \u00b6 kll_sketch_agg_float(expr[, k]) - Returns the KllFloatsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col, 400))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0 kurtosis \u00b6 kurtosis(expr) - Returns the kurtosis value calculated from values of a group. Examples: > SELECT kurtosis(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col); -0.7014368047529627 > SELECT kurtosis(col) FROM VALUES (1), (10), (100), (10), (1) as tab(col); 0.19432323191699075 Since: 1.6.0 last \u00b6 last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. 
If isIgnoreNull is true, returns only non-null values. Examples: > SELECT last(col) FROM VALUES (10), (5), (20) AS tab(col); 20 > SELECT last(col) FROM VALUES (10), (5), (NULL) AS tab(col); NULL > SELECT last(col, true) FROM VALUES (10), (5), (NULL) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle. Since: 2.0.0 last_value \u00b6 last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col); 20 > SELECT last_value(col) FROM VALUES (10), (5), (NULL) AS tab(col); NULL > SELECT last_value(col, true) FROM VALUES (10), (5), (NULL) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle. Since: 2.0.0 listagg \u00b6 listagg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter, ordered by key. If all values are NULL, NULL is returned. Arguments: expr - a string or binary expression to be concatenated. delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL. key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic. 
Examples: > SELECT listagg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col); abc > SELECT listagg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col); cba > SELECT listagg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col); ab > SELECT listagg(col) FROM VALUES ('a'), ('a') AS tab(col); aa > SELECT listagg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col); ab > SELECT listagg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col); a, b, c > SELECT listagg(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Note: If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle. If DISTINCT is specified, then expr and key must be the same expression. Since: 4.0.0 max \u00b6 max(expr) - Returns the maximum value of expr . Examples: > SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col); 50 Since: 1.0.0 max_by \u00b6 max_by(x, y) - Returns the value of x associated with the maximum value of y . max_by(x, y, k) - Returns an array of the k values of x associated with the maximum values of y , sorted in descending order by y . Returns NULL if there are no non-NULL ordering values. Examples: > SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); b > SELECT max_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); [\"b\",\"c\"] Note: The function is non-deterministic, so the output order can differ among values of x associated with the same value of y . The maximum value of k is 100000. Since: 4.2.0 mean \u00b6 mean(expr) - Returns the mean calculated from values of a group. Examples: > SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 Since: 1.0.0 measure \u00b6 measure(expr) - This function can only be used to calculate a measure defined in a metric view. 
Examples: > SELECT dimension_col, measure(measure_col) FROM test_metric_view GROUP BY dimension_col; dim_1, 100 dim_2, 200 Since: 4.2.0 median \u00b6 median(col) - Returns the median of numeric or ANSI interval column col . Examples: > SELECT median(col) FROM VALUES (0), (10) AS tab(col); 5.0 > SELECT median(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-5 Since: 3.4.0 min \u00b6 min(expr) - Returns the minimum value of expr . Examples: > SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col); -1 Since: 1.0.0 min_by \u00b6 min_by(x, y) - Returns the value of x associated with the minimum value of y . min_by(x, y, k) - Returns an array of the k values of x associated with the minimum values of y , sorted in ascending order by y . Returns NULL if there are no non-NULL ordering values. Examples: > SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); a > SELECT min_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); [\"a\",\"c\"] Note: The function is non-deterministic, so the output order can differ among values of x associated with the same value of y . The maximum value of k is 100000. Since: 4.2.0 mode \u00b6 mode(col[, deterministic]) - Returns the most frequent value for the values within col . NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency, any of those values is returned if deterministic is false or not specified, or the lowest value is returned if deterministic is true. mode() WITHIN GROUP (ORDER BY col) - Returns the most frequent value for the values within col (specified in the ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency, only one value will be returned. The value will be chosen based on sort direction. 
From multiple values with the same frequency, the smallest value is returned if the sort direction is asc, or the largest value if it is desc. Examples: > SELECT mode(col) FROM VALUES (0), (10), (10) AS tab(col); 10 > SELECT mode(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-10 > SELECT mode(col) FROM VALUES (0), (10), (10), (null), (null), (null) AS tab(col); 10 > SELECT mode(col, false) FROM VALUES (-10), (0), (10) AS tab(col); 0 > SELECT mode(col, true) FROM VALUES (-10), (0), (10) AS tab(col); -10 > SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10) AS tab(col); 10 > SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10), (20), (20) AS tab(col); 10 > SELECT mode() WITHIN GROUP (ORDER BY col DESC) FROM VALUES (0), (10), (10), (20), (20) AS tab(col); 20 Since: 3.4.0 percentile \u00b6 percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric or ANSI interval column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be a positive integer. percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. 
The value of frequency should be a positive integer. Examples: > SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col); 3.0 > SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col); [2.5,7.5] > SELECT percentile(col, 0.5) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-5 > SELECT percentile(col, array(0.2, 0.5)) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:02.000000000,0 00:00:05.000000000] Since: 2.1.0 percentile_approx \u00b6 percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ANSI interval column col , which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array. 
Examples: > SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col); [1,1,0] > SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col); 7 > SELECT percentile_approx(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-1 > SELECT percentile_approx(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:01.000000000,0 00:00:02.000000000] Since: 2.1.0 percentile_cont \u00b6 percentile_cont(percentage) WITHIN GROUP (ORDER BY col) - Returns a percentile value based on a continuous distribution of numeric or ANSI interval column col at the given percentage (specified in the ORDER BY clause). Examples: > SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col); 2.5 > SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-2 Since: 4.0.0 percentile_disc \u00b6 percentile_disc(percentage) WITHIN GROUP (ORDER BY col) - Returns a percentile value based on a discrete distribution of numeric or ANSI interval column col at the given percentage (specified in the ORDER BY clause). Examples: > SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col); 0.0 > SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-0 Since: 4.0.0 regr_avgx \u00b6 regr_avgx(y, x) - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 2.75 > SELECT regr_avgx(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_avgx(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 3.0 > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 3.0 Since: 3.3.0 regr_avgy \u00b6 regr_avgy(y, x) - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 1.75 > SELECT regr_avgy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_avgy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 1.6666666666666667 > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.5 Since: 3.3.0 regr_count \u00b6 regr_count(y, x) - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 4 > SELECT regr_count(y, x) FROM VALUES (1, null) AS tab(y, x); 0 > SELECT regr_count(y, x) FROM VALUES (null, 1) AS tab(y, x); 0 > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 3 > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 2 Since: 3.3.0 regr_intercept \u00b6 regr_intercept(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x); 0.0 > SELECT regr_intercept(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_intercept(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x); 0.0 > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x); 0.0 Since: 3.4.0 regr_r2 \u00b6 regr_r2(y, x) - Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.2727272727272727 > SELECT regr_r2(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_r2(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 0.7500000000000001 > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.0 Since: 3.3.0 regr_slope \u00b6 regr_slope(y, x) - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x); 1.0 > SELECT regr_slope(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_slope(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x); 1.0 > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x); 1.0 Since: 3.4.0 regr_sxx \u00b6 regr_sxx(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 2.75 > SELECT regr_sxx(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_sxx(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 2.0 > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 2.0 Since: 3.4.0 regr_sxy \u00b6 regr_sxy(y, x) - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.75 > SELECT regr_sxy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_sxy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 1.0 > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.0 Since: 3.4.0 regr_syy \u00b6 regr_syy(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.75 > SELECT regr_syy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_syy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 0.6666666666666666 > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 0.5 Since: 3.4.0 skewness \u00b6 skewness(expr) - Returns the skewness value calculated from values of a group. 
Examples: > SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col); 1.1135657469022011 > SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col); -1.1135657469022011 Since: 1.6.0 some \u00b6 some(expr) - Returns true if at least one value of expr is true. Examples: > SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0 std \u00b6 std(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0 stddev \u00b6 stddev(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0 stddev_pop \u00b6 stddev_pop(expr) - Returns the population standard deviation calculated from values of a group. Examples: > SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col); 0.816496580927726 Since: 1.6.0 stddev_samp \u00b6 stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0 string_agg \u00b6 string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned. Arguments: expr - a string or binary expression to be concatenated. delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL. key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic. 
Examples: > SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col); abc > SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col); cba > SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col); ab > SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col); aa > SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col); ab > SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col); a, b, c > SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Note: If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle. If DISTINCT is specified, then expr and key must be the same expression. Since: 4.0.0 sum \u00b6 sum(expr) - Returns the sum calculated from values of a group. Examples: > SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col); 30 > SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col); 25 > SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Since: 1.0.0 theta_intersection_agg \u00b6 theta_intersection_agg(expr) - Returns the ThetaSketch's compact binary representation obtained by intersecting all the Theta sketches in the input column. Examples: > SELECT theta_sketch_estimate(theta_intersection_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 4.1.0 theta_sketch_agg \u00b6 theta_sketch_agg(expr, lgNomEntries) - Returns the ThetaSketch compact binary representation. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the ThetaSketch. 
Examples: > SELECT theta_sketch_estimate(theta_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 4.1.0 theta_union_agg \u00b6 theta_union_agg(expr, lgNomEntries) - Returns the ThetaSketch's compact binary representation. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the ThetaSketch. Examples: > SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 4.1.0 try_avg \u00b6 try_avg(expr) - Returns the mean calculated from values of a group, and the result is null on overflow. Examples: > SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 > SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col); NULL Since: 3.3.0 try_sum \u00b6 try_sum(expr) - Returns the sum calculated from values of a group, and the result is null on overflow. Examples: > SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col); 30 > SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col); 25 > SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL > SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col); NULL Since: 3.3.0 tuple_intersection_agg_double \u00b6 tuple_intersection_agg_double(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum. 
Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D), (3, 15.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D), (4, 12.0D) tab(key, summary)); 2.0 Since: 4.2.0 tuple_intersection_agg_integer \u00b6 tuple_intersection_agg_integer(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 2), (3, 3), (4, 4) tab(key, summary)); 2.0 Since: 4.2.0 tuple_sketch_agg_double \u00b6 tuple_sketch_agg_double(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the double value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary); 3.0 Since: 4.2.0 tuple_sketch_agg_integer \u00b6 tuple_sketch_agg_integer(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the integer value to be aggregated. 
lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary, 12, 'sum')) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary); 3.0 Since: 4.2.0 tuple_union_agg_double \u00b6 tuple_union_agg_double(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_double(tuple_union_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D) tab(key, summary)); 3.0 Since: 4.2.0 tuple_union_agg_integer \u00b6 tuple_union_agg_integer(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 5), (2, 10) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 3), (3, 7) tab(key, summary)); 3.0 Since: 4.2.0 var_pop \u00b6 var_pop(expr) - Returns the population variance calculated from values of a group. Examples: > SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col); 0.6666666666666666 Since: 1.6.0 var_samp \u00b6 var_samp(expr) - Returns the sample variance calculated from values of a group. Examples: > SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0 variance \u00b6 variance(expr) - Returns the sample variance calculated from values of a group. Examples: > SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"Agg Functions"},{"location":"agg-functions/#agg-functions","text":"This page lists all agg functions available in Spark SQL.","title":"Agg Functions"},{"location":"agg-functions/#any","text":"any(expr) - Returns true if at least one value of expr is true. Examples: > SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT any(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT any(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0","title":"any"},{"location":"agg-functions/#any_value","text":"any_value(expr[, isIgnoreNull]) - Returns some value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic. 
Since: 3.4.0","title":"any_value"},{"location":"agg-functions/#approx_count_distinct","text":"approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD defines the maximum relative standard deviation allowed. Examples: > SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1); 3 Since: 1.6.0","title":"approx_count_distinct"},{"location":"agg-functions/#approx_percentile","text":"approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ANSI interval column col , which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array. 
Examples: > SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col); [1,1,0] > SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col); 7 > SELECT approx_percentile(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-1 > SELECT approx_percentile(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:01.000000000,0 00:00:02.000000000] Since: 2.1.0","title":"approx_percentile"},{"location":"agg-functions/#approx_top_k","text":"approx_top_k(expr, k, maxItemsTracked) - Returns the top k items with their frequency. k An optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. maxItemsTracked An optional INTEGER literal greater than or equal to k, with an upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000. Examples: > SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] > SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr); [{\"item\":2,\"count\":3},{\"item\":1,\"count\":2},{\"item\":0,\"count\":1}] Since: 4.1.0","title":"approx_top_k"},{"location":"agg-functions/#approx_top_k_accumulate","text":"approx_top_k_accumulate(expr, maxItemsTracked) - Accumulates items into a sketch. maxItemsTracked An optional positive INTEGER literal with an upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000. 
Examples: > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr, 100), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] Since: 4.1.0","title":"approx_top_k_accumulate"},{"location":"agg-functions/#approx_top_k_combine","text":"approx_top_k_combine(state, maxItemsTracked) - Combines multiple sketches into a single sketch. maxItemsTracked An optional positive INTEGER literal with an upper limit of 1000000. If maxItemsTracked is specified, it will be set for the combined sketch. If maxItemsTracked is not specified, the input sketches must have the same maxItemsTracked value; otherwise, an error will be thrown. The output sketch will use the same value from the input sketches. Examples: > SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr) UNION ALL SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr)); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] Since: 4.1.0","title":"approx_top_k_combine"},{"location":"agg-functions/#array_agg","text":"array_agg(expr) - Collects and returns a list of non-unique elements. Examples: > SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2,1] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. Since: 3.3.0","title":"array_agg"},{"location":"agg-functions/#avg","text":"avg(expr) - Returns the mean calculated from values of a group. 
Examples: > SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 Since: 1.0.0","title":"avg"},{"location":"agg-functions/#bit_and","text":"bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none. Examples: > SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col); 1 Since: 3.0.0","title":"bit_and"},{"location":"agg-functions/#bit_or","text":"bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if none. Examples: > SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col); 7 Since: 3.0.0","title":"bit_or"},{"location":"agg-functions/#bit_xor","text":"bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if none. Examples: > SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col); 6 Since: 3.0.0","title":"bit_xor"},{"location":"agg-functions/#bitmap_and_agg","text":"bitmap_and_agg(child) - Returns a bitmap that is the bitwise AND of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg(). Examples: > SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'F0'), (X '70'), (X '30') AS tab(col); 300000 > SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'FF'), (X 'FF'), (X 'FF') AS tab(col); FF0000 Since: 4.1.0","title":"bitmap_and_agg"},{"location":"agg-functions/#bitmap_construct_agg","text":"bitmap_construct_agg(child) - Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position(). 
Examples: > SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (2), (3) AS tab(col); 070000 > SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (1), (1) AS tab(col); 010000 Since: 3.5.0","title":"bitmap_construct_agg"},{"location":"agg-functions/#bitmap_or_agg","text":"bitmap_or_agg(child) - Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg(). Examples: > SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '20'), (X '40') AS tab(col); 700000 > SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '10'), (X '10') AS tab(col); 100000 Since: 3.5.0","title":"bitmap_or_agg"},{"location":"agg-functions/#bool_and","text":"bool_and(expr) - Returns true if all values of expr are true. Examples: > SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col); true > SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col); true > SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col); false Since: 3.0.0","title":"bool_and"},{"location":"agg-functions/#bool_or","text":"bool_or(expr) - Returns true if at least one value of expr is true. Examples: > SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0","title":"bool_or"},{"location":"agg-functions/#collect_list","text":"collect_list(expr) - Collects and returns a list of non-unique elements. Examples: > SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2,1] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. 
Since: 2.0.0","title":"collect_list"},{"location":"agg-functions/#collect_set","text":"collect_set(expr) - Collects and returns a set of unique elements. Examples: > SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col); [1,2] Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0","title":"collect_set"},{"location":"agg-functions/#corr","text":"corr(expr1, expr2) - Returns the Pearson coefficient of correlation between a set of number pairs. Examples: > SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2); 0.8660254037844387 Since: 1.6.0","title":"corr"},{"location":"agg-functions/#count","text":"count(*) - Returns the total number of retrieved rows, including rows containing null. count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null. count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null. Examples: > SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 4 > SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col); 3 > SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col); 2 Since: 1.0.0","title":"count"},{"location":"agg-functions/#count_if","text":"count_if(expr) - Returns the number of TRUE values for the expression. Examples: > SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); 2 > SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); 1 Since: 3.0.0","title":"count_if"},{"location":"agg-functions/#count_min_sketch","text":"count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. 
Count-min sketch is a probabilistic data structure used for frequency estimation using sub-linear space. Examples: > SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col); 0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000 Since: 2.2.0","title":"count_min_sketch"},{"location":"agg-functions/#covar_pop","text":"covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs. Examples: > SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2); 0.6666666666666666 Since: 2.0.0","title":"covar_pop"},{"location":"agg-functions/#covar_samp","text":"covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs. Examples: > SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2); 1.0 Since: 2.0.0","title":"covar_samp"},{"location":"agg-functions/#every","text":"every(expr) - Returns true if all values of expr are true. Examples: > SELECT every(col) FROM VALUES (true), (true), (true) AS tab(col); true > SELECT every(col) FROM VALUES (NULL), (true), (true) AS tab(col); true > SELECT every(col) FROM VALUES (true), (false), (true) AS tab(col); false Since: 3.0.0","title":"every"},{"location":"agg-functions/#first","text":"first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT first(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT first(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT first(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0","title":"first"},{"location":"agg-functions/#first_value","text":"first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. 
If isIgnoreNull is true, returns only non-null values. Examples: > SELECT first_value(col) FROM VALUES (10), (5), (20) AS tab(col); 10 > SELECT first_value(col) FROM VALUES (NULL), (5), (20) AS tab(col); NULL > SELECT first_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0","title":"first_value"},{"location":"agg-functions/#grouping","text":"grouping(col) - Indicates whether a specified column in a GROUP BY is aggregated or not; returns 1 for aggregated or 0 for not aggregated in the result set. Examples: > SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name); Alice 0 2 Bob 0 5 NULL 1 7 Since: 2.0.0","title":"grouping"},{"location":"agg-functions/#grouping_id","text":"grouping_id([col1[, col2 ..]]) - Returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn) Examples: > SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height); Alice 0 2 165.0 Alice 1 2 165.0 NULL 3 7 172.5 Bob 0 5 180.0 Bob 1 5 180.0 NULL 2 2 165.0 NULL 2 5 180.0 Note: Input columns should match the grouping columns exactly, or be empty (meaning all the grouping columns). Since: 2.0.0","title":"grouping_id"},{"location":"agg-functions/#histogram_numeric","text":"histogram_numeric(expr, nb) - Computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. 
Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function. Examples: > SELECT histogram_numeric(col, 5) FROM VALUES (0), (1), (2), (10) AS tab(col); [{\"x\":0,\"y\":1.0},{\"x\":1,\"y\":1.0},{\"x\":2,\"y\":1.0},{\"x\":10,\"y\":1.0}] Since: 3.3.0","title":"histogram_numeric"},{"location":"agg-functions/#hll_sketch_agg","text":"hll_sketch_agg(expr, lgConfigK) - Returns the HllSketch's updatable binary representation. lgConfigK (optional) the log-base-2 of K, where K is the number of buckets or slots for the HllSketch. Examples: > SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 3.5.0","title":"hll_sketch_agg"},{"location":"agg-functions/#hll_union_agg","text":"hll_union_agg(expr, allowDifferentLgConfigK) - Unions HllSketch binary representations and returns the merged sketch's updatable binary representation. allowDifferentLgConfigK (optional) Allow sketches with different lgConfigK values to be unioned (defaults to false). Examples: > SELECT hll_sketch_estimate(hll_union_agg(sketch, true)) FROM (SELECT hll_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT hll_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 3.5.0","title":"hll_union_agg"},{"location":"agg-functions/#kll_merge_agg_bigint","text":"kll_merge_agg_bigint(expr[, k]) - Merges binary KllLongsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_bigint). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. 
Examples: > SELECT kll_sketch_get_n_bigint(kll_merge_agg_bigint(sketch)) FROM (SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (1), (2), (3) tab(col) UNION ALL SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (4), (5), (6) tab(col)) t; 6 Since: 4.1.0","title":"kll_merge_agg_bigint"},{"location":"agg-functions/#kll_merge_agg_double","text":"kll_merge_agg_double(expr[, k]) - Merges binary KllDoublesSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_double). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. Examples: > SELECT kll_sketch_get_n_double(kll_merge_agg_double(sketch)) FROM (SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)) tab(col) UNION ALL SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)), (CAST(6.0 AS DOUBLE)) tab(col)) t; 6 Since: 4.1.0","title":"kll_merge_agg_double"},{"location":"agg-functions/#kll_merge_agg_float","text":"kll_merge_agg_float(expr[, k]) - Merges binary KllFloatsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_float). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch. 
Examples: > SELECT kll_sketch_get_n_float(kll_merge_agg_float(sketch)) FROM (SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)) tab(col) UNION ALL SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)), (CAST(6.0 AS FLOAT)) tab(col)) t; 6 Since: 4.1.0","title":"kll_merge_agg_float"},{"location":"agg-functions/#kll_sketch_agg_bigint","text":"kll_sketch_agg_bigint(expr[, k]) - Returns the KllLongsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col, 400))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0","title":"kll_sketch_agg_bigint"},{"location":"agg-functions/#kll_sketch_agg_double","text":"kll_sketch_agg_double(expr[, k]) - Returns the KllDoublesSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. 
Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col, 400))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0","title":"kll_sketch_agg_double"},{"location":"agg-functions/#kll_sketch_agg_float","text":"kll_sketch_agg_float(expr[, k]) - Returns the KllFloatsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches. Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col, 400))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0","title":"kll_sketch_agg_float"},{"location":"agg-functions/#kurtosis","text":"kurtosis(expr) - Returns the kurtosis value calculated from values of a group. Examples: > SELECT kurtosis(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col); -0.7014368047529627 > SELECT kurtosis(col) FROM VALUES (1), (10), (100), (10), (1) as tab(col); 0.19432323191699075 Since: 1.6.0","title":"kurtosis"},{"location":"agg-functions/#last","text":"last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. 
If isIgnoreNull is true, returns only non-null values. Examples: > SELECT last(col) FROM VALUES (10), (5), (20) AS tab(col); 20 > SELECT last(col) FROM VALUES (10), (5), (NULL) AS tab(col); NULL > SELECT last(col, true) FROM VALUES (10), (5), (NULL) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0","title":"last"},{"location":"agg-functions/#last_value","text":"last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. Examples: > SELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col); 20 > SELECT last_value(col) FROM VALUES (10), (5), (NULL) AS tab(col); NULL > SELECT last_value(col, true) FROM VALUES (10), (5), (NULL) AS tab(col); 5 Note: The function is non-deterministic because its result depends on the order of the rows which may be non-deterministic after a shuffle. Since: 2.0.0","title":"last_value"},{"location":"agg-functions/#listagg","text":"listagg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter, ordered by key. If all values are NULL, NULL is returned. Arguments: expr - a string or binary expression to be concatenated. delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL. key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic. 
Examples: > SELECT listagg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col); abc > SELECT listagg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col); cba > SELECT listagg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col); ab > SELECT listagg(col) FROM VALUES ('a'), ('a') AS tab(col); aa > SELECT listagg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col); ab > SELECT listagg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col); a, b, c > SELECT listagg(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Note: If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle. If DISTINCT is specified, then expr and key must be the same expression. Since: 4.0.0","title":"listagg"},{"location":"agg-functions/#max","text":"max(expr) - Returns the maximum value of expr . Examples: > SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col); 50 Since: 1.0.0","title":"max"},{"location":"agg-functions/#max_by","text":"max_by(x, y) - Returns the value of x associated with the maximum value of y . max_by(x, y, k) - Returns an array of the k values of x associated with the maximum values of y , sorted in descending order by y . Returns NULL if there are no non-NULL ordering values. Examples: > SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); b > SELECT max_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); [\"b\",\"c\"] Note: The function is non-deterministic, so the output order can be different for those associated with the same values of y . The maximum value of k is 100000. Since: 4.2.0","title":"max_by"},{"location":"agg-functions/#mean","text":"mean(expr) - Returns the mean calculated from values of a group. 
Examples: > SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 Since: 1.0.0","title":"mean"},{"location":"agg-functions/#measure","text":"measure(expr) - This function can only be used to calculate a measure defined in a metric view. Examples: > SELECT dimension_col, measure(measure_col) FROM test_metric_view GROUP BY dimension_col; dim_1, 100 dim_2, 200 Since: 4.2.0","title":"measure"},{"location":"agg-functions/#median","text":"median(col) - Returns the median of numeric or ANSI interval column col . Examples: > SELECT median(col) FROM VALUES (0), (10) AS tab(col); 5.0 > SELECT median(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-5 Since: 3.4.0","title":"median"},{"location":"agg-functions/#min","text":"min(expr) - Returns the minimum value of expr . Examples: > SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col); -1 Since: 1.0.0","title":"min"},{"location":"agg-functions/#min_by","text":"min_by(x, y) - Returns the value of x associated with the minimum value of y . min_by(x, y, k) - Returns an array of the k values of x associated with the minimum values of y , sorted in ascending order by y . Returns NULL if there are no non-NULL ordering values. Examples: > SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); a > SELECT min_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); [\"a\",\"c\"] Note: The function is non-deterministic, so the output order can be different for those associated with the same values of y . The maximum value of k is 100000. Since: 4.2.0","title":"min_by"},{"location":"agg-functions/#mode","text":"mode(col[, deterministic]) - Returns the most frequent value for the values within col . NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. 
When multiple values have the same greatest frequency, any of the values is returned if deterministic is false or not defined, or the lowest value is returned if deterministic is true. mode() WITHIN GROUP (ORDER BY col) - Returns the most frequent value for the values within col (specified in ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency, only one value will be returned. The value will be chosen based on the sort direction: the smallest value is returned if the sort direction is asc, or the largest value if it is desc. Examples: > SELECT mode(col) FROM VALUES (0), (10), (10) AS tab(col); 10 > SELECT mode(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-10 > SELECT mode(col) FROM VALUES (0), (10), (10), (null), (null), (null) AS tab(col); 10 > SELECT mode(col, false) FROM VALUES (-10), (0), (10) AS tab(col); 0 > SELECT mode(col, true) FROM VALUES (-10), (0), (10) AS tab(col); -10 > SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10) AS tab(col); 10 > SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10), (20), (20) AS tab(col); 10 > SELECT mode() WITHIN GROUP (ORDER BY col DESC) FROM VALUES (0), (10), (10), (20), (20) AS tab(col); 20 Since: 3.4.0","title":"mode"},{"location":"agg-functions/#percentile","text":"percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric or ANSI interval column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be a positive integer. percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. 
The value of frequency should be a positive integer. Examples: > SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col); 3.0 > SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col); [2.5,7.5] > SELECT percentile(col, 0.5) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-5 > SELECT percentile(col, array(0.2, 0.5)) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:02.000000000,0 00:00:05.000000000] Since: 2.1.0","title":"percentile"},{"location":"agg-functions/#percentile_approx","text":"percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ANSI interval column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values are less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array. 
Examples: > SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col); [1,1,0] > SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col); 7 > SELECT percentile_approx(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-1 > SELECT percentile_approx(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col); [0 00:00:01.000000000,0 00:00:02.000000000] Since: 2.1.0","title":"percentile_approx"},{"location":"agg-functions/#percentile_cont","text":"percentile_cont(percentage) WITHIN GROUP (ORDER BY col) - Returns a percentile value based on a continuous distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause). Examples: > SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col); 2.5 > SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-2 Since: 4.0.0","title":"percentile_cont"},{"location":"agg-functions/#percentile_disc","text":"percentile_disc(percentage) WITHIN GROUP (ORDER BY col) - Returns a percentile value based on a discrete distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause). Examples: > SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col); 0.0 > SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col); 0-0 Since: 4.0.0","title":"percentile_disc"},{"location":"agg-functions/#regr_avgx","text":"regr_avgx(y, x) - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 2.75 > SELECT regr_avgx(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_avgx(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 3.0 > SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 3.0 Since: 3.3.0","title":"regr_avgx"},{"location":"agg-functions/#regr_avgy","text":"regr_avgy(y, x) - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 1.75 > SELECT regr_avgy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_avgy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 1.6666666666666667 > SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.5 Since: 3.3.0","title":"regr_avgy"},{"location":"agg-functions/#regr_count","text":"regr_count(y, x) - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 4 > SELECT regr_count(y, x) FROM VALUES (1, null) AS tab(y, x); 0 > SELECT regr_count(y, x) FROM VALUES (null, 1) AS tab(y, x); 0 > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 3 > SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 2 Since: 3.3.0","title":"regr_count"},{"location":"agg-functions/#regr_intercept","text":"regr_intercept(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x); 0.0 > SELECT regr_intercept(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_intercept(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x); 0.0 > SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x); 0.0 Since: 3.4.0","title":"regr_intercept"},{"location":"agg-functions/#regr_r2","text":"regr_r2(y, x) - Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.2727272727272727 > SELECT regr_r2(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_r2(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 0.7500000000000001 > SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.0 Since: 3.3.0","title":"regr_r2"},{"location":"agg-functions/#regr_slope","text":"regr_slope(y, x) - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x); 1.0 > SELECT regr_slope(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_slope(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x); 1.0 > SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x); 1.0 Since: 3.4.0","title":"regr_slope"},{"location":"agg-functions/#regr_sxx","text":"regr_sxx(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 2.75 > SELECT regr_sxx(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_sxx(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 2.0 > SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 2.0 Since: 3.4.0","title":"regr_sxx"},{"location":"agg-functions/#regr_sxy","text":"regr_sxy(y, x) - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. Examples: > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.75 > SELECT regr_sxy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_sxy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 1.0 > SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 1.0 Since: 3.4.0","title":"regr_sxy"},{"location":"agg-functions/#regr_syy","text":"regr_syy(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable. 
Examples: > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x); 0.75 > SELECT regr_syy(y, x) FROM VALUES (1, null) AS tab(y, x); NULL > SELECT regr_syy(y, x) FROM VALUES (null, 1) AS tab(y, x); NULL > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x); 0.6666666666666666 > SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x); 0.5 Since: 3.4.0","title":"regr_syy"},{"location":"agg-functions/#skewness","text":"skewness(expr) - Returns the skewness value calculated from values of a group. Examples: > SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col); 1.1135657469022011 > SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col); -1.1135657469022011 Since: 1.6.0","title":"skewness"},{"location":"agg-functions/#some","text":"some(expr) - Returns true if at least one value of expr is true. Examples: > SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col); true > SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col); true > SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col); false Since: 3.0.0","title":"some"},{"location":"agg-functions/#std","text":"std(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"std"},{"location":"agg-functions/#stddev","text":"stddev(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"stddev"},{"location":"agg-functions/#stddev_pop","text":"stddev_pop(expr) - Returns the population standard deviation calculated from values of a group. 
Examples: > SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col); 0.816496580927726 Since: 1.6.0","title":"stddev_pop"},{"location":"agg-functions/#stddev_samp","text":"stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group. Examples: > SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"stddev_samp"},{"location":"agg-functions/#string_agg","text":"string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned. Arguments: expr - a string or binary expression to be concatenated. delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL. key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic. Examples: > SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col); abc > SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col); cba > SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col); ab > SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col); aa > SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col); ab > SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col); a, b, c > SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Note: If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle. If DISTINCT is specified, then expr and key must be the same expression. Since: 4.0.0","title":"string_agg"},{"location":"agg-functions/#sum","text":"sum(expr) - Returns the sum calculated from values of a group. 
Examples: > SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col); 30 > SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col); 25 > SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL Since: 1.0.0","title":"sum"},{"location":"agg-functions/#theta_intersection_agg","text":"theta_intersection_agg(expr) - Returns the ThetaSketch's compact binary representation by intersecting all the Theta sketches in the input column. Examples: > SELECT theta_sketch_estimate(theta_intersection_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 4.1.0","title":"theta_intersection_agg"},{"location":"agg-functions/#theta_sketch_agg","text":"theta_sketch_agg(expr, lgNomEntries) - Returns the ThetaSketch compact binary representation. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the ThetaSketch. Examples: > SELECT theta_sketch_estimate(theta_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 4.1.0","title":"theta_sketch_agg"},{"location":"agg-functions/#theta_union_agg","text":"theta_union_agg(expr, lgNomEntries) - Returns the ThetaSketch's compact binary representation by unioning all the Theta sketches in the input column. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the ThetaSketch. Examples: > SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col)); 1 Since: 4.1.0","title":"theta_union_agg"},{"location":"agg-functions/#try_avg","text":"try_avg(expr) - Returns the mean calculated from values of a group and the result is null on overflow. 
Examples: > SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col); 2.0 > SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col); 1.5 > SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col); NULL Since: 3.3.0","title":"try_avg"},{"location":"agg-functions/#try_sum","text":"try_sum(expr) - Returns the sum calculated from values of a group and the result is null on overflow. Examples: > SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col); 30 > SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col); 25 > SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col); NULL > SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col); NULL Since: 3.3.0","title":"try_sum"},{"location":"agg-functions/#tuple_intersection_agg_double","text":"tuple_intersection_agg_double(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D), (3, 15.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D), (4, 12.0D) tab(key, summary)); 2.0 Since: 4.2.0","title":"tuple_intersection_agg_double"},{"location":"agg-functions/#tuple_intersection_agg_integer","text":"tuple_intersection_agg_integer(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 2), (3, 3), (4, 4) tab(key, summary)); 2.0 Since: 4.2.0","title":"tuple_intersection_agg_integer"},{"location":"agg-functions/#tuple_sketch_agg_double","text":"tuple_sketch_agg_double(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the double value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary); 3.0 Since: 4.2.0","title":"tuple_sketch_agg_double"},{"location":"agg-functions/#tuple_sketch_agg_integer","text":"tuple_sketch_agg_integer(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the integer value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number of buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary, 12, 'sum')) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary); 3.0 Since: 4.2.0","title":"tuple_sketch_agg_integer"},{"location":"agg-functions/#tuple_union_agg_double","text":"tuple_union_agg_double(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_double(tuple_union_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D) tab(key, summary)); 3.0 Since: 4.2.0","title":"tuple_union_agg_double"},{"location":"agg-functions/#tuple_union_agg_integer","text":"tuple_union_agg_integer(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum. Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 5), (2, 10) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 3), (3, 7) tab(key, summary)); 3.0 Since: 4.2.0","title":"tuple_union_agg_integer"},{"location":"agg-functions/#var_pop","text":"var_pop(expr) - Returns the population variance calculated from values of a group. 
Examples: > SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col); 0.6666666666666666 Since: 1.6.0","title":"var_pop"},{"location":"agg-functions/#var_samp","text":"var_samp(expr) - Returns the sample variance calculated from values of a group. Examples: > SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"var_samp"},{"location":"agg-functions/#variance","text":"variance(expr) - Returns the sample variance calculated from values of a group. Examples: > SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col); 1.0 Since: 1.6.0","title":"variance"},{"location":"array-functions/","text":"Array Functions \u00b6 This page lists all array functions available in Spark SQL. array \u00b6 array(expr, ...) - Returns an array with the given elements. Examples: > SELECT array(1, 2, 3); [1,2,3] Since: 1.1.0 array_append \u00b6 array_append(array, element) - Adds the element at the end of the array passed as the first argument. The type of element should be similar to the type of the elements of the array. A null element is also appended into the array, but if the array passed is NULL, the output is NULL. Examples: > SELECT array_append(array('b', 'd', 'c', 'a'), 'd'); [\"b\",\"d\",\"c\",\"a\",\"d\"] > SELECT array_append(array(1, 2, 3, null), null); [1,2,3,null,null] > SELECT array_append(CAST(null as Array<Int>), 2); NULL Since: 3.4.0 array_compact \u00b6 array_compact(array) - Removes null values from the array. Examples: > SELECT array_compact(array(1, 2, 3, null)); [1,2,3] > SELECT array_compact(array(\"a\", \"b\", \"c\")); [\"a\",\"b\",\"c\"] Since: 3.4.0 array_contains \u00b6 array_contains(array, value) - Returns true if the array contains the value. Examples: > SELECT array_contains(array(1, 2, 3), 2); true Since: 1.5.0 array_distinct \u00b6 array_distinct(array) - Removes duplicate values from the array. 
Examples: > SELECT array_distinct(array(1, 2, 3, null, 3)); [1,2,3,null] Since: 2.4.0 array_except \u00b6 array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates. Examples: > SELECT array_except(array(1, 2, 3), array(1, 3, 5)); [2] Since: 2.4.0 array_insert \u00b6 array_insert(x, pos, val) - Places val into index pos of array x. Array indices start at 1. The maximum negative index is -1 for which the function inserts new element after the current last element. Index above array size appends the array, or prepends the array if index is negative, with 'null' elements. Examples: > SELECT array_insert(array(1, 2, 3, 4), 5, 5); [1,2,3,4,5] > SELECT array_insert(array(5, 4, 3, 2), -1, 1); [5,4,3,2,1] > SELECT array_insert(array(5, 3, 2, 1), -4, 4); [5,4,3,2,1] Since: 3.4.0 array_intersect \u00b6 array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates. Examples: > SELECT array_intersect(array(1, 2, 3), array(1, 3, 5)); [1,3] Since: 2.4.0 array_join \u00b6 array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered. Examples: > SELECT array_join(array('hello', 'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' ', ','); hello , world Since: 2.4.0 array_max \u00b6 array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped. Examples: > SELECT array_max(array(1, 20, null, 3)); 20 Since: 2.4.0 array_min \u00b6 array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped. 
Examples: > SELECT array_min(array(1, 20, null, 3)); 1 Since: 2.4.0 array_position \u00b6 array_position(array, element) - Returns the (1-based) index of the first matching element of the array as long, or 0 if no match is found. Examples: > SELECT array_position(array(312, 773, 708, 708), 708); 3 > SELECT array_position(array(312, 773, 708, 708), 414); 0 Since: 2.4.0 array_prepend \u00b6 array_prepend(array, element) - Add the element at the beginning of the array passed as first argument. Type of element should be the same as the type of the elements of the array. Null element is also prepended to the array. But if the array passed is NULL output is NULL Examples: > SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd'); [\"d\",\"b\",\"d\",\"c\",\"a\"] > SELECT array_prepend(array(1, 2, 3, null), null); [null,1,2,3,null] > SELECT array_prepend(CAST(null as Array<Int>), 2); NULL Since: 3.5.0 array_remove \u00b6 array_remove(array, element) - Remove all elements that equal to element from array. Examples: > SELECT array_remove(array(1, 2, 3, null, 3), 3); [1,2,null] Since: 2.4.0 array_repeat \u00b6 array_repeat(element, count) - Returns the array containing element count times. Examples: > SELECT array_repeat('123', 2); [\"123\",\"123\"] Since: 2.4.0 array_size \u00b6 array_size(expr) - Returns the size of an array. The function returns null for null input. Examples: > SELECT array_size(array('b', 'd', 'c', 'a')); 4 Since: 3.3.0 array_union \u00b6 array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates. Examples: > SELECT array_union(array(1, 2, 3), array(1, 3, 5)); [1,2,3,5] Since: 2.4.0 arrays_overlap \u00b6 arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise. 
Examples: > SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5)); true Since: 2.4.0 arrays_zip \u00b6 arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. Examples: > SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4)); [{\"0\":1,\"1\":2},{\"0\":2,\"1\":3},{\"0\":3,\"1\":4}] > SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4)); [{\"0\":1,\"1\":2,\"2\":3},{\"0\":2,\"1\":3,\"2\":4}] Since: 2.4.0 flatten \u00b6 flatten(arrayOfArrays) - Transforms an array of arrays into a single array. Examples: > SELECT flatten(array(array(1, 2), array(3, 4))); [1,2,3,4] Since: 2.4.0 get \u00b6 get(array, index) - Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL. Examples: > SELECT get(array(1, 2, 3), 0); 1 > SELECT get(array(1, 2, 3), 3); NULL > SELECT get(array(1, 2, 3), -1); NULL Since: 3.4.0 sequence \u00b6 sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions. Arguments: start - an expression. The start of the range. stop - an expression. The end of the range (inclusive). step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa. 
Examples: > SELECT sequence(1, 5); [1,2,3,4,5] > SELECT sequence(5, 1); [5,4,3,2,1] > SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month); [2018-01-01,2018-02-01,2018-03-01] > SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month); [2018-01-01,2018-02-01,2018-03-01] Since: 2.4.0 shuffle \u00b6 shuffle(array) - Returns a random permutation of the given array. Examples: > SELECT shuffle(array(1, 20, 3, 5)); [3,1,5,20] > SELECT shuffle(array(1, 20, null, 3)); [20,null,3,1] Note: The function is non-deterministic. Since: 2.4.0 slice \u00b6 slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length. Examples: > SELECT slice(array(1, 2, 3, 4), 2, 2); [2,3] > SELECT slice(array(1, 2, 3, 4), -2, 2); [3,4] Since: 2.4.0 sort_array \u00b6 sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order. Examples: > SELECT sort_array(array('b', 'd', null, 'c', 'a'), true); [null,\"a\",\"b\",\"c\",\"d\"] > SELECT sort_array(array('b', 'd', null, 'c', 'a'), false); [\"d\",\"c\",\"b\",\"a\",null] Since: 1.5.0","title":"Array Functions"},{"location":"array-functions/#array-functions","text":"This page lists all array functions available in Spark SQL.","title":"Array Functions"},{"location":"array-functions/#array","text":"array(expr, ...) - Returns an array with the given elements. Examples: > SELECT array(1, 2, 3); [1,2,3] Since: 1.1.0","title":"array"},{"location":"array-functions/#array_append","text":"array_append(array, element) - Add the element at the end of the array passed as first argument. 
The type of element should be similar to the type of the elements of the array. A null element is also appended into the array, but if the array passed is NULL, the output is NULL. Examples: > SELECT array_append(array('b', 'd', 'c', 'a'), 'd'); [\"b\",\"d\",\"c\",\"a\",\"d\"] > SELECT array_append(array(1, 2, 3, null), null); [1,2,3,null,null] > SELECT array_append(CAST(null as Array<Int>), 2); NULL Since: 3.4.0","title":"array_append"},{"location":"array-functions/#array_compact","text":"array_compact(array) - Removes null values from the array. Examples: > SELECT array_compact(array(1, 2, 3, null)); [1,2,3] > SELECT array_compact(array(\"a\", \"b\", \"c\")); [\"a\",\"b\",\"c\"] Since: 3.4.0","title":"array_compact"},{"location":"array-functions/#array_contains","text":"array_contains(array, value) - Returns true if the array contains the value. Examples: > SELECT array_contains(array(1, 2, 3), 2); true Since: 1.5.0","title":"array_contains"},{"location":"array-functions/#array_distinct","text":"array_distinct(array) - Removes duplicate values from the array. Examples: > SELECT array_distinct(array(1, 2, 3, null, 3)); [1,2,3,null] Since: 2.4.0","title":"array_distinct"},{"location":"array-functions/#array_except","text":"array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates. Examples: > SELECT array_except(array(1, 2, 3), array(1, 3, 5)); [2] Since: 2.4.0","title":"array_except"},{"location":"array-functions/#array_insert","text":"array_insert(x, pos, val) - Places val into index pos of array x. Array indices start at 1. The maximum negative index is -1 for which the function inserts new element after the current last element. Index above array size appends the array, or prepends the array if index is negative, with 'null' elements. 
Examples: > SELECT array_insert(array(1, 2, 3, 4), 5, 5); [1,2,3,4,5] > SELECT array_insert(array(5, 4, 3, 2), -1, 1); [5,4,3,2,1] > SELECT array_insert(array(5, 3, 2, 1), -4, 4); [5,4,3,2,1] Since: 3.4.0","title":"array_insert"},{"location":"array-functions/#array_intersect","text":"array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates. Examples: > SELECT array_intersect(array(1, 2, 3), array(1, 3, 5)); [1,3] Since: 2.4.0","title":"array_intersect"},{"location":"array-functions/#array_join","text":"array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered. Examples: > SELECT array_join(array('hello', 'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' '); hello world > SELECT array_join(array('hello', null ,'world'), ' ', ','); hello , world Since: 2.4.0","title":"array_join"},{"location":"array-functions/#array_max","text":"array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped. Examples: > SELECT array_max(array(1, 20, null, 3)); 20 Since: 2.4.0","title":"array_max"},{"location":"array-functions/#array_min","text":"array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped. Examples: > SELECT array_min(array(1, 20, null, 3)); 1 Since: 2.4.0","title":"array_min"},{"location":"array-functions/#array_position","text":"array_position(array, element) - Returns the (1-based) index of the first matching element of the array as long, or 0 if no match is found. 
Examples: > SELECT array_position(array(312, 773, 708, 708), 708); 3 > SELECT array_position(array(312, 773, 708, 708), 414); 0 Since: 2.4.0","title":"array_position"},{"location":"array-functions/#array_prepend","text":"array_prepend(array, element) - Add the element at the beginning of the array passed as first argument. Type of element should be the same as the type of the elements of the array. Null element is also prepended to the array. But if the array passed is NULL output is NULL Examples: > SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd'); [\"d\",\"b\",\"d\",\"c\",\"a\"] > SELECT array_prepend(array(1, 2, 3, null), null); [null,1,2,3,null] > SELECT array_prepend(CAST(null as Array<Int>), 2); NULL Since: 3.5.0","title":"array_prepend"},{"location":"array-functions/#array_remove","text":"array_remove(array, element) - Remove all elements that equal to element from array. Examples: > SELECT array_remove(array(1, 2, 3, null, 3), 3); [1,2,null] Since: 2.4.0","title":"array_remove"},{"location":"array-functions/#array_repeat","text":"array_repeat(element, count) - Returns the array containing element count times. Examples: > SELECT array_repeat('123', 2); [\"123\",\"123\"] Since: 2.4.0","title":"array_repeat"},{"location":"array-functions/#array_size","text":"array_size(expr) - Returns the size of an array. The function returns null for null input. Examples: > SELECT array_size(array('b', 'd', 'c', 'a')); 4 Since: 3.3.0","title":"array_size"},{"location":"array-functions/#array_union","text":"array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates. Examples: > SELECT array_union(array(1, 2, 3), array(1, 3, 5)); [1,2,3,5] Since: 2.4.0","title":"array_union"},{"location":"array-functions/#arrays_overlap","text":"arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. 
If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise. Examples: > SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5)); true Since: 2.4.0","title":"arrays_overlap"},{"location":"array-functions/#arrays_zip","text":"arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. Examples: > SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4)); [{\"0\":1,\"1\":2},{\"0\":2,\"1\":3},{\"0\":3,\"1\":4}] > SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4)); [{\"0\":1,\"1\":2,\"2\":3},{\"0\":2,\"1\":3,\"2\":4}] Since: 2.4.0","title":"arrays_zip"},{"location":"array-functions/#flatten","text":"flatten(arrayOfArrays) - Transforms an array of arrays into a single array. Examples: > SELECT flatten(array(array(1, 2), array(3, 4))); [1,2,3,4] Since: 2.4.0","title":"flatten"},{"location":"array-functions/#get","text":"get(array, index) - Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL. Examples: > SELECT get(array(1, 2, 3), 0); 1 > SELECT get(array(1, 2, 3), 3); NULL > SELECT get(array(1, 2, 3), -1); NULL Since: 3.4.0","title":"get"},{"location":"array-functions/#sequence","text":"sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions. Arguments: start - an expression. The start of the range. stop - an expression. 
The end of the range (inclusive). step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa. Examples: > SELECT sequence(1, 5); [1,2,3,4,5] > SELECT sequence(5, 1); [5,4,3,2,1] > SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month); [2018-01-01,2018-02-01,2018-03-01] > SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month); [2018-01-01,2018-02-01,2018-03-01] Since: 2.4.0","title":"sequence"},{"location":"array-functions/#shuffle","text":"shuffle(array) - Returns a random permutation of the given array. Examples: > SELECT shuffle(array(1, 20, 3, 5)); [3,1,5,20] > SELECT shuffle(array(1, 20, null, 3)); [20,null,3,1] Note: The function is non-deterministic. Since: 2.4.0","title":"shuffle"},{"location":"array-functions/#slice","text":"slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length. Examples: > SELECT slice(array(1, 2, 3, 4), 2, 2); [2,3] > SELECT slice(array(1, 2, 3, 4), -2, 2); [3,4] Since: 2.4.0","title":"slice"},{"location":"array-functions/#sort_array","text":"sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order. 
Examples: > SELECT sort_array(array('b', 'd', null, 'c', 'a'), true); [null,\"a\",\"b\",\"c\",\"d\"] > SELECT sort_array(array('b', 'd', null, 'c', 'a'), false); [\"d\",\"c\",\"b\",\"a\",null] Since: 1.5.0","title":"sort_array"},{"location":"avro-functions/","text":"Avro Functions \u00b6 This page lists all avro functions available in Spark SQL. from_avro \u00b6 from_avro(child, jsonFormatSchema, options) - Converts a binary Avro value into a Catalyst value. Examples: > SELECT from_avro(s, '{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{ \"name\": \"u\", \"type\": [\"int\",\"string\"] }]}', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('u', NAMED_STRUCT('member0', member0, 'member1', member1)) AS s FROM VALUES (1, NULL), (NULL, 'a') tab(member0, member1)); [false] Note: The specified schema must match the actual schema of the read data, otherwise the behavior is undefined: it may fail or return an arbitrary result. To deserialize the data with a compatible and evolved schema, the expected Avro schema can be set via the corresponding option. Since: 4.0.0 schema_of_avro \u00b6 schema_of_avro(jsonFormatSchema, options) - Returns the schema, in DDL format, of the given Avro schema in JSON string format. Examples: > SELECT schema_of_avro('{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{\"name\": \"u\", \"type\": [\"int\", \"string\"]}]}', map()); STRUCT<u: STRUCT<member0: INT, member1: STRING> NOT NULL> Since: 4.0.0 to_avro \u00b6 to_avro(child[, jsonFormatSchema]) - Converts a Catalyst value into its corresponding Avro binary format result. 
Examples: > SELECT to_avro(s, '{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{ \"name\": \"u\", \"type\": [\"int\",\"string\"] }]}') IS NULL FROM (SELECT NULL AS s); [true] > SELECT to_avro(s) IS NULL FROM (SELECT NULL AS s); [true] Since: 4.0.0","title":"Avro Functions"},{"location":"avro-functions/#avro-functions","text":"This page lists all avro functions available in Spark SQL.","title":"Avro Functions"},{"location":"avro-functions/#from_avro","text":"from_avro(child, jsonFormatSchema, options) - Converts a binary Avro value into a Catalyst value. Examples: > SELECT from_avro(s, '{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{ \"name\": \"u\", \"type\": [\"int\",\"string\"] }]}', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('u', NAMED_STRUCT('member0', member0, 'member1', member1)) AS s FROM VALUES (1, NULL), (NULL, 'a') tab(member0, member1)); [false] Note: The specified schema must match the actual schema of the read data, otherwise the behavior is undefined: it may fail or return an arbitrary result. To deserialize the data with a compatible and evolved schema, the expected Avro schema can be set via the corresponding option. Since: 4.0.0","title":"from_avro"},{"location":"avro-functions/#schema_of_avro","text":"schema_of_avro(jsonFormatSchema, options) - Returns the schema, in DDL format, of the given Avro schema in JSON string format. Examples: > SELECT schema_of_avro('{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{\"name\": \"u\", \"type\": [\"int\", \"string\"]}]}', map()); STRUCT<u: STRUCT<member0: INT, member1: STRING> NOT NULL> Since: 4.0.0","title":"schema_of_avro"},{"location":"avro-functions/#to_avro","text":"to_avro(child[, jsonFormatSchema]) - Converts a Catalyst value into its corresponding Avro binary format result. 
Examples: > SELECT to_avro(s, '{\"type\": \"record\", \"name\": \"struct\", \"fields\": [{ \"name\": \"u\", \"type\": [\"int\",\"string\"] }]}') IS NULL FROM (SELECT NULL AS s); [true] > SELECT to_avro(s) IS NULL FROM (SELECT NULL AS s); [true] Since: 4.0.0","title":"to_avro"},{"location":"bitwise-functions/","text":"Bitwise Functions \u00b6 This page lists all bitwise functions available in Spark SQL. & \u00b6 expr1 & expr2 - Returns the result of bitwise AND of expr1 and expr2 . Examples: > SELECT 3 & 5; 1 Since: 1.4.0 << \u00b6 base << exp - Bitwise left shift. Examples: > SELECT shiftleft(2, 1); 4 > SELECT 2 << 1; 4 Note: << operator is added in Spark 4.0.0 as an alias for shiftleft . Since: 4.0.0 >> \u00b6 base >> expr - Bitwise (signed) right shift. Examples: > SELECT shiftright(4, 1); 2 > SELECT 4 >> 1; 2 Note: >> operator is added in Spark 4.0.0 as an alias for shiftright . Since: 4.0.0 >>> \u00b6 base >>> expr - Bitwise unsigned right shift. Examples: > SELECT shiftrightunsigned(4, 1); 2 > SELECT 4 >>> 1; 2 Note: >>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned . Since: 4.0.0 ^ \u00b6 expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2 . Examples: > SELECT 3 ^ 5; 6 Since: 1.4.0 bit_count \u00b6 bit_count(expr) - Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL. Examples: > SELECT bit_count(0); 0 Since: 3.0.0 bit_get \u00b6 bit_get(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative. Examples: > SELECT bit_get(11, 0); 1 > SELECT bit_get(11, 2); 0 Since: 3.2.0 getbit \u00b6 getbit(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative. 
Examples: > SELECT getbit(11, 0); 1 > SELECT getbit(11, 2); 0 Since: 3.2.0 shiftleft \u00b6 shiftleft(base, exp) - Bitwise left shift. Examples: > SELECT shiftleft(2, 1); 4 > SELECT 2 << 1; 4 Note: << operator is added in Spark 4.0.0 as an alias for shiftleft . Since: 1.5.0 shiftright \u00b6 shiftright(base, expr) - Bitwise (signed) right shift. Examples: > SELECT shiftright(4, 1); 2 > SELECT 4 >> 1; 2 Note: >> operator is added in Spark 4.0.0 as an alias for shiftright . Since: 1.5.0 shiftrightunsigned \u00b6 shiftrightunsigned(base, expr) - Bitwise unsigned right shift. Examples: > SELECT shiftrightunsigned(4, 1); 2 > SELECT 4 >>> 1; 2 Note: >>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned . Since: 1.5.0 | \u00b6 expr1 | expr2 - Returns the result of bitwise OR of expr1 and expr2 . Examples: > SELECT 3 | 5; 7 Since: 1.4.0 ~ \u00b6 ~ expr - Returns the result of bitwise NOT of expr . Examples: > SELECT ~ 0; -1 Since: 1.4.0","title":"Bitwise Functions"},{"location":"bitwise-functions/#bitwise-functions","text":"This page lists all bitwise functions available in Spark SQL.","title":"Bitwise Functions"},{"location":"bitwise-functions/#_1","text":"expr1 & expr2 - Returns the result of bitwise AND of expr1 and expr2 . Examples: > SELECT 3 & 5; 1 Since: 1.4.0","title":"&amp;"},{"location":"bitwise-functions/#_2","text":"base << exp - Bitwise left shift. Examples: > SELECT shiftleft(2, 1); 4 > SELECT 2 << 1; 4 Note: << operator is added in Spark 4.0.0 as an alias for shiftleft . Since: 4.0.0","title":"&lt;&lt;"},{"location":"bitwise-functions/#_3","text":"base >> expr - Bitwise (signed) right shift. Examples: > SELECT shiftright(4, 1); 2 > SELECT 4 >> 1; 2 Note: >> operator is added in Spark 4.0.0 as an alias for shiftright . Since: 4.0.0","title":"&gt;&gt;"},{"location":"bitwise-functions/#_4","text":"base >>> expr - Bitwise unsigned right shift. 
Examples: > SELECT shiftrightunsigned(4, 1); 2 > SELECT 4 >>> 1; 2 Note: >>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned . Since: 4.0.0","title":"&gt;&gt;&gt;"},{"location":"bitwise-functions/#_5","text":"expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2 . Examples: > SELECT 3 ^ 5; 6 Since: 1.4.0","title":"^"},{"location":"bitwise-functions/#bit_count","text":"bit_count(expr) - Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL. Examples: > SELECT bit_count(0); 0 Since: 3.0.0","title":"bit_count"},{"location":"bitwise-functions/#bit_get","text":"bit_get(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative. Examples: > SELECT bit_get(11, 0); 1 > SELECT bit_get(11, 2); 0 Since: 3.2.0","title":"bit_get"},{"location":"bitwise-functions/#getbit","text":"getbit(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative. Examples: > SELECT getbit(11, 0); 1 > SELECT getbit(11, 2); 0 Since: 3.2.0","title":"getbit"},{"location":"bitwise-functions/#shiftleft","text":"shiftleft(base, exp) - Bitwise left shift. Examples: > SELECT shiftleft(2, 1); 4 > SELECT 2 << 1; 4 Note: << operator is added in Spark 4.0.0 as an alias for shiftleft . Since: 1.5.0","title":"shiftleft"},{"location":"bitwise-functions/#shiftright","text":"shiftright(base, expr) - Bitwise (signed) right shift. Examples: > SELECT shiftright(4, 1); 2 > SELECT 4 >> 1; 2 Note: >> operator is added in Spark 4.0.0 as an alias for shiftright . Since: 1.5.0","title":"shiftright"},{"location":"bitwise-functions/#shiftrightunsigned","text":"shiftrightunsigned(base, expr) - Bitwise unsigned right shift. 
Examples: > SELECT shiftrightunsigned(4, 1); 2 > SELECT 4 >>> 1; 2 Note: >>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned . Since: 1.5.0","title":"shiftrightunsigned"},{"location":"bitwise-functions/#_6","text":"expr1 | expr2 - Returns the result of bitwise OR of expr1 and expr2 . Examples: > SELECT 3 | 5; 7 Since: 1.4.0","title":"|"},{"location":"bitwise-functions/#_7","text":"~ expr - Returns the result of bitwise NOT of expr . Examples: > SELECT ~ 0; -1 Since: 1.4.0","title":"~"},{"location":"collection-functions/","text":"Collection Functions \u00b6 This page lists all collection functions available in Spark SQL. aggregate \u00b6 aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function. Examples: > SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x); 6 > SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10); 60 Since: 2.4.0 array_sort \u00b6 array_sort(expr, func) - Sorts the input array. If func is omitted, sort in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array. Since 3.0.0 this function also sorts and returns the array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error. 
Examples: > SELECT array_sort(array(5, 6, 1), (left, right) -> case when left < right then -1 when left > right then 1 else 0 end); [1,5,6] > SELECT array_sort(array('bc', 'ab', 'dc'), (left, right) -> case when left is null and right is null then 0 when left is null then -1 when right is null then 1 when left < right then 1 when left > right then -1 else 0 end); [\"dc\",\"bc\",\"ab\"] > SELECT array_sort(array('b', 'd', null, 'c', 'a')); [\"a\",\"b\",\"c\",\"d\",null] Since: 2.4.0 cardinality \u00b6 cardinality(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input. Examples: > SELECT cardinality(array('b', 'd', 'c', 'a')); 4 > SELECT cardinality(map('a', 1, 'b', 2)); 2 Since: 2.4.0 concat \u00b6 concat(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN. Examples: > SELECT concat('Spark', 'SQL'); SparkSQL > SELECT concat(array(1, 2, 3), array(4, 5), array(6)); [1,2,3,4,5,6] Note: Concat logic for arrays is available since 2.4.0. Since: 1.5.0 element_at \u00b6 element_at(array, index) - Returns element of array at given (1-based) index. If Index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices. element_at(map, key) - Returns value for given key. The function returns NULL if the key is not contained in the map. Examples: > SELECT element_at(array(1, 2, 3), 2); 2 > SELECT element_at(map(1, 'a', 2, 'b'), 2); b Since: 2.4.0 exists \u00b6 exists(expr, pred) - Tests whether a predicate holds for one or more elements in the array. 
Examples: > SELECT exists(array(1, 2, 3), x -> x % 2 == 0); true > SELECT exists(array(1, 2, 3), x -> x % 2 == 10); false > SELECT exists(array(1, null, 3), x -> x % 2 == 0); NULL > SELECT exists(array(0, null, 2, 3, null), x -> x IS NULL); true > SELECT exists(array(1, 2, 3), x -> x IS NULL); false Since: 2.4.0 filter \u00b6 filter(expr, func) - Filters the input array using the given predicate. Examples: > SELECT filter(array(1, 2, 3), x -> x % 2 == 1); [1,3] > SELECT filter(array(0, 2, 3), (x, i) -> x > i); [2,3] > SELECT filter(array(0, null, 2, 3, null), x -> x IS NOT NULL); [0,2,3] Note: The inner function may use the index argument since 3.0.0. Since: 2.4.0 forall \u00b6 forall(expr, pred) - Tests whether a predicate holds for all elements in the array. Examples: > SELECT forall(array(1, 2, 3), x -> x % 2 == 0); false > SELECT forall(array(2, 4, 8), x -> x % 2 == 0); true > SELECT forall(array(1, null, 3), x -> x % 2 == 0); false > SELECT forall(array(2, null, 8), x -> x % 2 == 0); NULL Since: 3.0.0 map_filter \u00b6 map_filter(expr, func) - Filters entries in a map using the function. Examples: > SELECT map_filter(map(1, 0, 2, 2, 3, -1), (k, v) -> k > v); {1:0,3:-1} Since: 3.0.0 map_zip_with \u00b6 map_zip_with(map1, map2, function) - Merges two given maps into a single map by applying function to the pair of values with the same key. For keys only present in one map, NULL will be passed as the value for the missing key. If an input map contains duplicated keys, only the first entry of the duplicated key is passed into the lambda function. 
Examples: > SELECT map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); {1:\"ax\",2:\"by\"} > SELECT map_zip_with(map('a', 1, 'b', 2), map('b', 3, 'c', 4), (k, v1, v2) -> coalesce(v1, 0) + coalesce(v2, 0)); {\"a\":1,\"b\":5,\"c\":4} Since: 3.0.0 reduce \u00b6 reduce(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function. Examples: > SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x); 6 > SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10); 60 Since: 3.4.0 reverse \u00b6 reverse(array) - Returns a reversed string or an array with reverse order of elements. Examples: > SELECT reverse('Spark SQL'); LQS krapS > SELECT reverse(array(2, 1, 4, 3)); [3,4,1,2] Note: Reverse logic for arrays is available since 2.4.0. Since: 1.5.0 size \u00b6 size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input. Examples: > SELECT size(array('b', 'd', 'c', 'a')); 4 > SELECT size(map('a', 1, 'b', 2)); 2 Since: 1.5.0 transform \u00b6 transform(expr, func) - Transforms elements in an array using the function. Examples: > SELECT transform(array(1, 2, 3), x -> x + 1); [2,3,4] > SELECT transform(array(1, 2, 3), (x, i) -> x + i); [1,3,5] Since: 2.4.0 transform_keys \u00b6 transform_keys(expr, func) - Transforms elements in a map using the function. 
Examples: > SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1); {2:1,3:2,4:3} > SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); {2:1,4:2,6:3} Since: 3.0.0 transform_values \u00b6 transform_values(expr, func) - Transforms values in the map using the function. Examples: > SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); {1:2,2:3,3:4} > SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); {1:2,2:4,3:6} Since: 3.0.0 try_element_at \u00b6 try_element_at(array, index) - Returns element of array at given (1-based) index. If Index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array. try_element_at(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map. Examples: > SELECT try_element_at(array(1, 2, 3), 2); 2 > SELECT try_element_at(map(1, 'a', 2, 'b'), 2); b Since: 3.3.0 zip_with \u00b6 zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function. 
Examples: > SELECT zip_with(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); [{\"y\":\"a\",\"x\":1},{\"y\":\"b\",\"x\":2},{\"y\":\"c\",\"x\":3}] > SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y); [4,6] > SELECT zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); [\"ad\",\"be\",\"cf\"] Since: 2.4.0","title":"Collection Functions"},{"location":"collection-functions/#collection-functions","text":"This page lists all collection functions available in Spark SQL.","title":"Collection Functions"},{"location":"collection-functions/#aggregate","text":"aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function. Examples: > SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x); 6 > SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10); 60 Since: 2.4.0","title":"aggregate"},{"location":"collection-functions/#array_sort","text":"array_sort(expr, func) - Sorts the input array. If func is omitted, sort in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array. Since 3.0.0 this function also sorts and returns the array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error. 
Examples: > SELECT array_sort(array(5, 6, 1), (left, right) -> case when left < right then -1 when left > right then 1 else 0 end); [1,5,6] > SELECT array_sort(array('bc', 'ab', 'dc'), (left, right) -> case when left is null and right is null then 0 when left is null then -1 when right is null then 1 when left < right then 1 when left > right then -1 else 0 end); [\"dc\",\"bc\",\"ab\"] > SELECT array_sort(array('b', 'd', null, 'c', 'a')); [\"a\",\"b\",\"c\",\"d\",null] Since: 2.4.0","title":"array_sort"},{"location":"collection-functions/#cardinality","text":"cardinality(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input. Examples: > SELECT cardinality(array('b', 'd', 'c', 'a')); 4 > SELECT cardinality(map('a', 1, 'b', 2)); 2 Since: 2.4.0","title":"cardinality"},{"location":"collection-functions/#concat","text":"concat(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN. Examples: > SELECT concat('Spark', 'SQL'); SparkSQL > SELECT concat(array(1, 2, 3), array(4, 5), array(6)); [1,2,3,4,5,6] Note: Concat logic for arrays is available since 2.4.0. Since: 1.5.0","title":"concat"},{"location":"collection-functions/#element_at","text":"element_at(array, index) - Returns element of array at given (1-based) index. If Index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices. element_at(map, key) - Returns value for given key. The function returns NULL if the key is not contained in the map. 
Examples: > SELECT element_at(array(1, 2, 3), 2); 2 > SELECT element_at(map(1, 'a', 2, 'b'), 2); b Since: 2.4.0","title":"element_at"},{"location":"collection-functions/#exists","text":"exists(expr, pred) - Tests whether a predicate holds for one or more elements in the array. Examples: > SELECT exists(array(1, 2, 3), x -> x % 2 == 0); true > SELECT exists(array(1, 2, 3), x -> x % 2 == 10); false > SELECT exists(array(1, null, 3), x -> x % 2 == 0); NULL > SELECT exists(array(0, null, 2, 3, null), x -> x IS NULL); true > SELECT exists(array(1, 2, 3), x -> x IS NULL); false Since: 2.4.0","title":"exists"},{"location":"collection-functions/#filter","text":"filter(expr, func) - Filters the input array using the given predicate. Examples: > SELECT filter(array(1, 2, 3), x -> x % 2 == 1); [1,3] > SELECT filter(array(0, 2, 3), (x, i) -> x > i); [2,3] > SELECT filter(array(0, null, 2, 3, null), x -> x IS NOT NULL); [0,2,3] Note: The inner function may use the index argument since 3.0.0. Since: 2.4.0","title":"filter"},{"location":"collection-functions/#forall","text":"forall(expr, pred) - Tests whether a predicate holds for all elements in the array. Examples: > SELECT forall(array(1, 2, 3), x -> x % 2 == 0); false > SELECT forall(array(2, 4, 8), x -> x % 2 == 0); true > SELECT forall(array(1, null, 3), x -> x % 2 == 0); false > SELECT forall(array(2, null, 8), x -> x % 2 == 0); NULL Since: 3.0.0","title":"forall"},{"location":"collection-functions/#map_filter","text":"map_filter(expr, func) - Filters entries in a map using the function. Examples: > SELECT map_filter(map(1, 0, 2, 2, 3, -1), (k, v) -> k > v); {1:0,3:-1} Since: 3.0.0","title":"map_filter"},{"location":"collection-functions/#map_zip_with","text":"map_zip_with(map1, map2, function) - Merges two given maps into a single map by applying function to the pair of values with the same key. For keys only present in one map, NULL will be passed as the value for the missing key. 
If an input map contains duplicated keys, only the first entry of the duplicated key is passed into the lambda function. Examples: > SELECT map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); {1:\"ax\",2:\"by\"} > SELECT map_zip_with(map('a', 1, 'b', 2), map('b', 3, 'c', 4), (k, v1, v2) -> coalesce(v1, 0) + coalesce(v2, 0)); {\"a\":1,\"b\":5,\"c\":4} Since: 3.0.0","title":"map_zip_with"},{"location":"collection-functions/#reduce","text":"reduce(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function. Examples: > SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x); 6 > SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10); 60 Since: 3.4.0","title":"reduce"},{"location":"collection-functions/#reverse","text":"reverse(array) - Returns a reversed string or an array with reverse order of elements. Examples: > SELECT reverse('Spark SQL'); LQS krapS > SELECT reverse(array(2, 1, 4, 3)); [3,4,1,2] Note: Reverse logic for arrays is available since 2.4.0. Since: 1.5.0","title":"reverse"},{"location":"collection-functions/#size","text":"size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input. Examples: > SELECT size(array('b', 'd', 'c', 'a')); 4 > SELECT size(map('a', 1, 'b', 2)); 2 Since: 1.5.0","title":"size"},{"location":"collection-functions/#transform","text":"transform(expr, func) - Transforms elements in an array using the function. 
Examples: > SELECT transform(array(1, 2, 3), x -> x + 1); [2,3,4] > SELECT transform(array(1, 2, 3), (x, i) -> x + i); [1,3,5] Since: 2.4.0","title":"transform"},{"location":"collection-functions/#transform_keys","text":"transform_keys(expr, func) - Transforms elements in a map using the function. Examples: > SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1); {2:1,3:2,4:3} > SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); {2:1,4:2,6:3} Since: 3.0.0","title":"transform_keys"},{"location":"collection-functions/#transform_values","text":"transform_values(expr, func) - Transforms values in the map using the function. Examples: > SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); {1:2,2:3,3:4} > SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); {1:2,2:4,3:6} Since: 3.0.0","title":"transform_values"},{"location":"collection-functions/#try_element_at","text":"try_element_at(array, index) - Returns element of array at given (1-based) index. If Index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array. try_element_at(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map. Examples: > SELECT try_element_at(array(1, 2, 3), 2); 2 > SELECT try_element_at(map(1, 'a', 2, 'b'), 2); b Since: 3.3.0","title":"try_element_at"},{"location":"collection-functions/#zip_with","text":"zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function. 
Examples: > SELECT zip_with(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); [{\"y\":\"a\",\"x\":1},{\"y\":\"b\",\"x\":2},{\"y\":\"c\",\"x\":3}] > SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y); [4,6] > SELECT zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); [\"ad\",\"be\",\"cf\"] Since: 2.4.0","title":"zip_with"},{"location":"conditional-functions/","text":"Conditional Functions \u00b6 This page lists all conditional functions available in Spark SQL. between \u00b6 input [NOT] between lower AND upper - evaluate if input is [not] in between lower and upper Arguments: input - An expression that is being compared with lower and upper bound. lower - Lower bound of the between check. upper - Upper bound of the between check. Examples: > SELECT 0.5 between 0.1 AND 1.0; true Since: 1.0.0 case \u00b6 CASE expr1 WHEN expr2 THEN expr3 [WHEN expr4 THEN expr5]* [ELSE expr6] END - When expr1 = expr2 , returns expr3 ; when expr1 = expr4 , return expr5 ; else return expr6 . Arguments: expr1 - the expression which is one operand of comparison. expr2, expr4 - the expressions each of which is the other operand of comparison. expr3, expr5, expr6 - the branch value expressions and else value expression should all be same type or coercible to a common type. Examples: > SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE '?' END FROM VALUES 1, 2, 3; one two ? > SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' END FROM VALUES 1, 2, 3; one two NULL Since: 1.0.1 coalesce \u00b6 coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null. Examples: > SELECT coalesce(NULL, 1, NULL); 1 Since: 1.0.0 if \u00b6 if(expr1, expr2, expr3) - If expr1 evaluates to true, then returns expr2 ; otherwise returns expr3 . Examples: > SELECT if(1 < 2, 'a', 'b'); a Since: 1.0.0 ifnull \u00b6 ifnull(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise. 
Examples: > SELECT ifnull(NULL, array('2')); [\"2\"] Since: 2.0.0 nanvl \u00b6 nanvl(expr1, expr2) - Returns expr1 if it's not NaN, or expr2 otherwise. Examples: > SELECT nanvl(cast('NaN' as double), 123); 123.0 Since: 1.5.0 nullif \u00b6 nullif(expr1, expr2) - Returns null if expr1 equals to expr2 , or expr1 otherwise. Examples: > SELECT nullif(2, 2); NULL Since: 2.0.0 nullifzero \u00b6 nullifzero(expr) - Returns null if expr is equal to zero, or expr otherwise. Examples: > SELECT nullifzero(0); NULL > SELECT nullifzero(2); 2 Since: 4.0.0 nvl \u00b6 nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise. Examples: > SELECT nvl(NULL, array('2')); [\"2\"] Since: 2.0.0 nvl2 \u00b6 nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise. Examples: > SELECT nvl2(NULL, 2, 1); 1 Since: 2.0.0 when \u00b6 CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2 ; else when expr3 = true, returns expr4 ; else returns expr5 . Arguments: expr1, expr3 - the branch condition expressions should all be boolean type. expr2, expr4, expr5 - the branch value expressions and else value expression should all be same type or coercible to a common type. Examples: > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END; 1.0 > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END; 2.0 > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END; NULL Since: 1.0.1 zeroifnull \u00b6 zeroifnull(expr) - Returns zero if expr is equal to null, or expr otherwise. 
Examples: > SELECT zeroifnull(NULL); 0 > SELECT zeroifnull(2); 2 Since: 4.0.0","title":"Conditional Functions"},{"location":"conditional-functions/#conditional-functions","text":"This page lists all conditional functions available in Spark SQL.","title":"Conditional Functions"},{"location":"conditional-functions/#between","text":"input [NOT] between lower AND upper - evaluate if input is [not] in between lower and upper Arguments: input - An expression that is being compared with lower and upper bound. lower - Lower bound of the between check. upper - Upper bound of the between check. Examples: > SELECT 0.5 between 0.1 AND 1.0; true Since: 1.0.0","title":"between"},{"location":"conditional-functions/#case","text":"CASE expr1 WHEN expr2 THEN expr3 [WHEN expr4 THEN expr5]* [ELSE expr6] END - When expr1 = expr2 , returns expr3 ; when expr1 = expr4 , return expr5 ; else return expr6 . Arguments: expr1 - the expression which is one operand of comparison. expr2, expr4 - the expressions each of which is the other operand of comparison. expr3, expr5, expr6 - the branch value expressions and else value expression should all be same type or coercible to a common type. Examples: > SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE '?' END FROM VALUES 1, 2, 3; one two ? > SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' END FROM VALUES 1, 2, 3; one two NULL Since: 1.0.1","title":"case"},{"location":"conditional-functions/#coalesce","text":"coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null. Examples: > SELECT coalesce(NULL, 1, NULL); 1 Since: 1.0.0","title":"coalesce"},{"location":"conditional-functions/#if","text":"if(expr1, expr2, expr3) - If expr1 evaluates to true, then returns expr2 ; otherwise returns expr3 . Examples: > SELECT if(1 < 2, 'a', 'b'); a Since: 1.0.0","title":"if"},{"location":"conditional-functions/#ifnull","text":"ifnull(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise. 
Examples: > SELECT ifnull(NULL, array('2')); [\"2\"] Since: 2.0.0","title":"ifnull"},{"location":"conditional-functions/#nanvl","text":"nanvl(expr1, expr2) - Returns expr1 if it's not NaN, or expr2 otherwise. Examples: > SELECT nanvl(cast('NaN' as double), 123); 123.0 Since: 1.5.0","title":"nanvl"},{"location":"conditional-functions/#nullif","text":"nullif(expr1, expr2) - Returns null if expr1 is equal to expr2 , or expr1 otherwise. Examples: > SELECT nullif(2, 2); NULL Since: 2.0.0","title":"nullif"},{"location":"conditional-functions/#nullifzero","text":"nullifzero(expr) - Returns null if expr is equal to zero, or expr otherwise. Examples: > SELECT nullifzero(0); NULL > SELECT nullifzero(2); 2 Since: 4.0.0","title":"nullifzero"},{"location":"conditional-functions/#nvl","text":"nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise. Examples: > SELECT nvl(NULL, array('2')); [\"2\"] Since: 2.0.0","title":"nvl"},{"location":"conditional-functions/#nvl2","text":"nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise. Examples: > SELECT nvl2(NULL, 2, 1); 1 Since: 2.0.0","title":"nvl2"},{"location":"conditional-functions/#when","text":"CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2 ; else when expr3 = true, returns expr4 ; else returns expr5 . Arguments: expr1, expr3 - the branch condition expressions should all be boolean type. expr2, expr4, expr5 - the branch value expressions and else value expression should all be the same type or coercible to a common type. Examples: > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END; 1.0 > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END; 2.0 > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END; NULL Since: 1.0.1","title":"when"},{"location":"conditional-functions/#zeroifnull","text":"zeroifnull(expr) - Returns zero if expr is equal to null, or expr otherwise. 
Examples: > SELECT zeroifnull(NULL); 0 > SELECT zeroifnull(2); 2 Since: 4.0.0","title":"zeroifnull"},{"location":"conversion-functions/","text":"Conversion Functions \u00b6 This page lists all conversion functions available in Spark SQL. bigint \u00b6 bigint(expr) - Casts the value expr to the target data type bigint . Since: 2.0.1 binary \u00b6 binary(expr) - Casts the value expr to the target data type binary . Since: 2.0.1 boolean \u00b6 boolean(expr) - Casts the value expr to the target data type boolean . Since: 2.0.1 cast \u00b6 cast(expr AS type) - Casts the value expr to the target data type type . expr :: type alternative casting syntax is also supported. Examples: > SELECT cast('10' as int); 10 > SELECT '10' :: int; 10 Since: 1.0.0 date \u00b6 date(expr) - Casts the value expr to the target data type date . Since: 2.0.1 decimal \u00b6 decimal(expr) - Casts the value expr to the target data type decimal . Since: 2.0.1 double \u00b6 double(expr) - Casts the value expr to the target data type double . Since: 2.0.1 float \u00b6 float(expr) - Casts the value expr to the target data type float . Since: 2.0.1 int \u00b6 int(expr) - Casts the value expr to the target data type int . Since: 2.0.1 smallint \u00b6 smallint(expr) - Casts the value expr to the target data type smallint . Since: 2.0.1 string \u00b6 string(expr) - Casts the value expr to the target data type string . Since: 2.0.1 time \u00b6 time(expr) - Casts the value expr to the target data type time . Since: 2.0.1 timestamp \u00b6 timestamp(expr) - Casts the value expr to the target data type timestamp . Since: 2.0.1 tinyint \u00b6 tinyint(expr) - Casts the value expr to the target data type tinyint . 
Since: 2.0.1","title":"Conversion Functions"},{"location":"conversion-functions/#conversion-functions","text":"This page lists all conversion functions available in Spark SQL.","title":"Conversion Functions"},{"location":"conversion-functions/#bigint","text":"bigint(expr) - Casts the value expr to the target data type bigint . Since: 2.0.1","title":"bigint"},{"location":"conversion-functions/#binary","text":"binary(expr) - Casts the value expr to the target data type binary . Since: 2.0.1","title":"binary"},{"location":"conversion-functions/#boolean","text":"boolean(expr) - Casts the value expr to the target data type boolean . Since: 2.0.1","title":"boolean"},{"location":"conversion-functions/#cast","text":"cast(expr AS type) - Casts the value expr to the target data type type . expr :: type alternative casting syntax is also supported. Examples: > SELECT cast('10' as int); 10 > SELECT '10' :: int; 10 Since: 1.0.0","title":"cast"},{"location":"conversion-functions/#date","text":"date(expr) - Casts the value expr to the target data type date . Since: 2.0.1","title":"date"},{"location":"conversion-functions/#decimal","text":"decimal(expr) - Casts the value expr to the target data type decimal . Since: 2.0.1","title":"decimal"},{"location":"conversion-functions/#double","text":"double(expr) - Casts the value expr to the target data type double . Since: 2.0.1","title":"double"},{"location":"conversion-functions/#float","text":"float(expr) - Casts the value expr to the target data type float . Since: 2.0.1","title":"float"},{"location":"conversion-functions/#int","text":"int(expr) - Casts the value expr to the target data type int . Since: 2.0.1","title":"int"},{"location":"conversion-functions/#smallint","text":"smallint(expr) - Casts the value expr to the target data type smallint . Since: 2.0.1","title":"smallint"},{"location":"conversion-functions/#string","text":"string(expr) - Casts the value expr to the target data type string . 
Since: 2.0.1","title":"string"},{"location":"conversion-functions/#time","text":"time(expr) - Casts the value expr to the target data type time . Since: 2.0.1","title":"time"},{"location":"conversion-functions/#timestamp","text":"timestamp(expr) - Casts the value expr to the target data type timestamp . Since: 2.0.1","title":"timestamp"},{"location":"conversion-functions/#tinyint","text":"tinyint(expr) - Casts the value expr to the target data type tinyint . Since: 2.0.1","title":"tinyint"},{"location":"csv-functions/","text":"Csv Functions \u00b6 This page lists all csv functions available in Spark SQL. from_csv \u00b6 from_csv(csvStr, schema[, options]) - Returns a struct value with the given csvStr and schema . Examples: > SELECT from_csv('1, 0.8', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} Since: 3.0.0 schema_of_csv \u00b6 schema_of_csv(csv[, options]) - Returns the schema of a CSV string in DDL format. Examples: > SELECT schema_of_csv('1,abc'); STRUCT<_c0: INT, _c1: STRING> Since: 3.0.0 to_csv \u00b6 to_csv(expr[, options]) - Returns a CSV string with a given struct value. Examples: > SELECT to_csv(named_struct('a', 1, 'b', 2)); 1,2 > SELECT to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); 26/08/2015 Since: 3.0.0","title":"Csv Functions"},{"location":"csv-functions/#csv-functions","text":"This page lists all csv functions available in Spark SQL.","title":"Csv Functions"},{"location":"csv-functions/#from_csv","text":"from_csv(csvStr, schema[, options]) - Returns a struct value with the given csvStr and schema . 
Examples: > SELECT from_csv('1, 0.8', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} Since: 3.0.0","title":"from_csv"},{"location":"csv-functions/#schema_of_csv","text":"schema_of_csv(csv[, options]) - Returns the schema of a CSV string in DDL format. Examples: > SELECT schema_of_csv('1,abc'); STRUCT<_c0: INT, _c1: STRING> Since: 3.0.0","title":"schema_of_csv"},{"location":"csv-functions/#to_csv","text":"to_csv(expr[, options]) - Returns a CSV string with a given struct value. Examples: > SELECT to_csv(named_struct('a', 1, 'b', 2)); 1,2 > SELECT to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); 26/08/2015 Since: 3.0.0","title":"to_csv"},{"location":"datetime-functions/","text":"Datetime Functions \u00b6 This page lists all datetime functions available in Spark SQL. add_months \u00b6 add_months(start_date, num_months) - Returns the date that is num_months after start_date . Examples: > SELECT add_months('2016-08-31', 1); 2016-09-30 Since: 1.5.0 convert_timezone \u00b6 convert_timezone([sourceTz, ]targetTz, sourceTs) - Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz . Arguments: sourceTz - the time zone for the input timestamp. If omitted, the current session time zone is used as the source time zone. targetTz - the time zone to which the input timestamp should be converted sourceTs - a timestamp without time zone Examples: > SELECT convert_timezone('Europe/Brussels', 'America/Los_Angeles', timestamp_ntz'2021-12-06 00:00:00'); 2021-12-05 15:00:00 > SELECT convert_timezone('Europe/Brussels', timestamp_ntz'2021-12-05 15:00:00'); 2021-12-06 00:00:00 Since: 3.4.0 curdate \u00b6 curdate() - Returns the current date at the start of query evaluation. All calls of curdate within the same query return the same value. 
Examples: > SELECT curdate(); 2022-09-06 Since: 3.4.0 current_date \u00b6 current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value. current_date - Returns the current date at the start of query evaluation. Examples: > SELECT current_date(); 2020-04-25 > SELECT current_date; 2020-04-25 Note: The syntax without parentheses has been supported since 2.0.1. Since: 1.5.0 current_time \u00b6 current_time([precision]) - Returns the current time at the start of query evaluation. All calls of current_time within the same query return the same value. current_time - Returns the current time at the start of query evaluation. Arguments: precision - An optional integer literal in the range [0..6], indicating how many fractional digits of seconds to include. If omitted, the default is 6. Examples: > SELECT current_time(); 15:49:11.914120 > SELECT current_time; 15:49:11.914120 > SELECT current_time(0); 15:49:11 > SELECT current_time(3); 15:49:11.914 > SELECT current_time(1+1); 15:49:11.91 Since: 4.1.0 current_timestamp \u00b6 current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value. current_timestamp - Returns the current timestamp at the start of query evaluation. Examples: > SELECT current_timestamp(); 2020-04-25 15:49:11.914 > SELECT current_timestamp; 2020-04-25 15:49:11.914 Note: The syntax without parentheses has been supported since 2.0.1. Since: 1.5.0 current_timezone \u00b6 current_timezone() - Returns the current session local timezone. Examples: > SELECT current_timezone(); Asia/Shanghai Since: 3.1.0 date_add \u00b6 date_add(start_date, num_days) - Returns the date that is num_days after start_date . Examples: > SELECT date_add('2016-07-30', 1); 2016-07-31 Since: 1.5.0 date_diff \u00b6 date_diff(endDate, startDate) - Returns the number of days from startDate to endDate . 
Examples: > SELECT date_diff('2009-07-31', '2009-07-30'); 1 > SELECT date_diff('2009-07-30', '2009-07-31'); -1 Since: 3.4.0 date_format \u00b6 date_format(timestamp, fmt) - Converts timestamp to a string value in the format specified by the date format fmt . Arguments: timestamp - A date/timestamp or string to be converted to the given format. fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT date_format('2016-04-08', 'y'); 2016 Since: 1.5.0 date_from_unix_date \u00b6 date_from_unix_date(days) - Create date from the number of days since 1970-01-01. Examples: > SELECT date_from_unix_date(1); 1970-01-02 Since: 3.1.0 date_part \u00b6 date_part(field, source) - Extracts a part of the date/timestamp or interval source. Arguments: field - selects which part of the source should be extracted, and supported string values are the same as the fields of the equivalent function EXTRACT . source - a date/timestamp or interval column from where field should be extracted Examples: > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT date_part('doy', DATE'2019-08-12'); 224 > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT date_part('days', interval 5 days 3 hours 7 minutes); 5 > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT date_part('MONTH', INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT date_part('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 Note: The date_part function is equivalent to the SQL-standard function EXTRACT(field FROM source) Since: 3.0.0 date_sub \u00b6 date_sub(start_date, num_days) - Returns the date that is num_days before start_date . 
Examples: > SELECT date_sub('2016-07-30', 1); 2016-07-29 Since: 1.5.0 date_trunc \u00b6 date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt . Arguments: fmt - the format representing the unit to be truncated to \"YEAR\", \"YYYY\", \"YY\" - truncate to the first date of the year that the ts falls in, the time part will be zeroed out \"QUARTER\" - truncate to the first date of the quarter that the ts falls in, the time part will be zeroed out \"MONTH\", \"MM\", \"MON\" - truncate to the first date of the month that the ts falls in, the time part will be zeroed out \"WEEK\" - truncate to the Monday of the week that the ts falls in, the time part will be zeroed out \"DAY\", \"DD\" - zero out the time part \"HOUR\" - zero out the minute and second with fraction part \"MINUTE\" - zero out the second with fraction part \"SECOND\" - zero out the second fraction part \"MILLISECOND\" - zero out the microseconds \"MICROSECOND\" - everything remains ts - datetime value or valid timestamp string Examples: > SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359'); 2015-01-01 00:00:00 > SELECT date_trunc('MM', '2015-03-05T09:32:05.359'); 2015-03-01 00:00:00 > SELECT date_trunc('DD', '2015-03-05T09:32:05.359'); 2015-03-05 00:00:00 > SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359'); 2015-03-05 09:00:00 > SELECT date_trunc('MILLISECOND', '2015-03-05T09:32:05.123456'); 2015-03-05 09:32:05.123 Since: 2.3.0 dateadd \u00b6 dateadd(start_date, num_days) - Returns the date that is num_days after start_date . Examples: > SELECT dateadd('2016-07-30', 1); 2016-07-31 Since: 3.4.0 datediff \u00b6 datediff(endDate, startDate) - Returns the number of days from startDate to endDate . Examples: > SELECT datediff('2009-07-31', '2009-07-30'); 1 > SELECT datediff('2009-07-30', '2009-07-31'); -1 Since: 1.5.0 datepart \u00b6 datepart(field, source) - Extracts a part of the date/timestamp or interval source. 
Arguments: field - selects which part of the source should be extracted, and supported string values are the same as the fields of the equivalent function EXTRACT . source - a date/timestamp or interval column from where field should be extracted Examples: > SELECT datepart('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT datepart('week', timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT datepart('doy', DATE'2019-08-12'); 224 > SELECT datepart('SECONDS', timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT datepart('days', interval 5 days 3 hours 7 minutes); 5 > SELECT datepart('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT datepart('MONTH', INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT datepart('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 Note: The datepart function is equivalent to the SQL-standard function EXTRACT(field FROM source) Since: 3.4.0 day \u00b6 day(date) - Returns the day of month of the date/timestamp. Examples: > SELECT day('2009-07-30'); 30 Since: 1.5.0 dayname \u00b6 dayname(date) - Returns the three-letter abbreviated day name from the given date. Examples: > SELECT dayname(DATE('2008-02-20')); Wed Since: 4.0.0 dayofmonth \u00b6 dayofmonth(date) - Returns the day of month of the date/timestamp. Examples: > SELECT dayofmonth('2009-07-30'); 30 Since: 1.5.0 dayofweek \u00b6 dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday). Examples: > SELECT dayofweek('2009-07-30'); 5 Since: 2.3.0 dayofyear \u00b6 dayofyear(date) - Returns the day of year of the date/timestamp. Examples: > SELECT dayofyear('2016-04-09'); 100 Since: 1.5.0 extract \u00b6 extract(field FROM source) - Extracts a part of the date or timestamp or time or interval source. 
Arguments: field - selects which part of the source should be extracted Supported string values of field for dates and timestamps are (case-insensitive): \"YEAR\", (\"Y\", \"YEARS\", \"YR\", \"YRS\") - the year field \"YEAROFWEEK\" - the ISO 8601 week-numbering year that the datetime falls in. For example, 2005-01-02 is part of the 53rd week of year 2004, so the result is 2004 \"QUARTER\", (\"QTR\") - the quarter (1 - 4) of the year that the datetime falls in \"MONTH\", (\"MON\", \"MONS\", \"MONTHS\") - the month field (1 - 12) \"WEEK\", (\"W\", \"WEEKS\") - the number of the ISO 8601 week-of-week-based-year. A week is considered to start on a Monday and week 1 is the first week with >3 days. In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-02 is part of the 53rd week of year 2004, while 2012-12-31 is part of the first week of 2013 \"DAY\", (\"D\", \"DAYS\") - the day of the month field (1 - 31) \"DAYOFWEEK\", (\"DOW\") - the day of the week for datetime as Sunday(1) to Saturday(7) \"DAYOFWEEK_ISO\", (\"DOW_ISO\") - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7) \"DOY\" - the day of the year (1 - 365/366) \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - The hour field (0 - 23) \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - the minutes field (0 - 59) \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - the seconds field, including fractional parts Supported string values of field for interval (which consists of months , days , microseconds ) are (case-insensitive): \"YEAR\", (\"Y\", \"YEARS\", \"YR\", \"YRS\") - the total months / 12 \"MONTH\", (\"MON\", \"MONS\", \"MONTHS\") - the total months % 12 \"DAY\", (\"D\", \"DAYS\") - the days part of interval \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - how many hours the microseconds contains \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - how many minutes are left after taking hours from microseconds \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - how many seconds with fractional part are left after taking hours and minutes from microseconds Supported string values of field for time (which consists of hour , minute , second ) are (case-insensitive): \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - The hour field (0 - 23) \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - the minutes field (0 - 59) \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - the seconds field, including fractional parts up to microsecond precision. Returns a DECIMAL(8, 6) precision value. source - a date or timestamp or time or interval column from where field should be extracted Examples: > SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT extract(week FROM timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT extract(doy FROM DATE'2019-08-12'); 224 > SELECT extract(SECONDS FROM timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT extract(days FROM interval 5 days 3 hours 7 minutes); 5 > SELECT extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 > SELECT extract(HOUR FROM time '09:08:01.000001'); 9 > SELECT extract(MINUTE FROM time '09:08:01.000001'); 8 > SELECT extract(SECOND FROM time '09:08:01.000001'); 1.000001 Note: The extract function is equivalent to date_part(field, source) . Since: 3.0.0 from_unixtime \u00b6 from_unixtime(unix_time[, fmt]) - Returns unix_time in the specified fmt . Arguments: unix_time - UNIX Timestamp to be converted to the provided format. fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. The 'yyyy-MM-dd HH:mm:ss' pattern is used if omitted. 
Examples: > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); 1969-12-31 16:00:00 > SELECT from_unixtime(0); 1969-12-31 16:00:00 Since: 1.5.0 from_utc_timestamp \u00b6 from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'. Examples: > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul'); 2016-08-31 09:00:00 Since: 1.5.0 hour \u00b6 hour(expr) - Returns the hour component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the hour of that timestamp. If expr is a TIME type (since 4.1.0), it returns the hour of the time-of-day. Examples: > SELECT hour('2018-02-14 12:58:59'); 12 > SELECT hour(TIME'13:59:59.999999'); 13 Since: 1.5.0 last_day \u00b6 last_day(date) - Returns the last day of the month which the date belongs to. Examples: > SELECT last_day('2009-01-12'); 2009-01-31 Since: 1.5.0 localtimestamp \u00b6 localtimestamp() - Returns the current timestamp without time zone at the start of query evaluation. All calls of localtimestamp within the same query return the same value. localtimestamp - Returns the current local date-time at the session time zone at the start of query evaluation. Examples: > SELECT localtimestamp(); 2020-04-25 15:49:11.914 Since: 3.4.0 make_date \u00b6 make_date(year, month, day) - Create date from year, month and day fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. 
Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 Examples: > SELECT make_date(2013, 7, 15); 2013-07-15 > SELECT make_date(2019, 7, NULL); NULL Since: 3.0.0 make_dt_interval \u00b6 make_dt_interval([days[, hours[, mins[, secs]]]]) - Make DayTimeIntervalType duration from days, hours, mins and secs. Arguments: days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. Examples: > SELECT make_dt_interval(1, 12, 30, 01.001001); 1 12:30:01.001001000 > SELECT make_dt_interval(2); 2 00:00:00.000000000 > SELECT make_dt_interval(100, null, 3); NULL Since: 3.2.0 make_interval \u00b6 make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - Make interval from years, months, weeks, days, hours, mins and secs. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative weeks - the number of weeks, positive or negative days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. Examples: > SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001); 100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds > SELECT make_interval(100, null, 3); NULL > SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001); 1 months 1 days 1 minutes 40.000001 seconds Since: 3.0.0 make_time \u00b6 make_time(hour, minute, second) - Create time from hour, minute and second fields. For invalid inputs it will throw an error. 
Arguments: hour - the hour to represent, from 0 to 23 minute - the minute to represent, from 0 to 59 second - the second to represent, from 0 to 59.999999 Examples: > SELECT make_time(6, 30, 45.887); 06:30:45.887 > SELECT make_time(NULL, 30, 0); NULL Since: 4.1.0 make_timestamp \u00b6 make_timestamp(year, month, day, hour, min, sec[, timezone]) - Create a timestamp with local time zone from year, month, day, hour, min, sec and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp(date[, time[, timezone]]) - Create timestamp from date and time fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date expression time - a time expression (optional). Default is 00:00:00. timezone - the time zone identifier (optional). For example, CET, UTC, etc. Examples: > SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT make_timestamp(DATE'2014-12-28'); 2014-12-28 00:00:00 > SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 3.0.0 make_timestamp_ltz \u00b6 make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Create a timestamp with local time zone from year, month, day, hour, min, sec and (optional) timezone fields. 
If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp_ltz(date, time[, timezone]) - Create a local date-time from date, time and (optional) timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. timezone - the time zone identifier. For example, CET, UTC, etc. date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT make_timestamp_ltz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT make_timestamp_ltz(null, 7, 22, 15, 30, 0); NULL > SELECT make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 3.4.0 make_timestamp_ntz \u00b6 make_timestamp_ntz(year, month, day, hour, min, sec) - Create local date-time from year, month, day, hour, min, sec fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp_ntz(date, time) - Create a local date-time from date and time fields. 
Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ntz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT make_timestamp_ntz(null, 7, 22, 15, 30, 0); NULL > SELECT make_timestamp_ntz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 Since: 3.4.0 make_ym_interval \u00b6 make_ym_interval([years[, months]]) - Make year-month interval from years, months. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative Examples: > SELECT make_ym_interval(1, 2); 1-2 > SELECT make_ym_interval(1, 0); 1-0 > SELECT make_ym_interval(-1, 1); -0-11 > SELECT make_ym_interval(2); 2-0 Since: 3.2.0 minute \u00b6 minute(expr) - Returns the minute component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the minute of that timestamp. If expr is a TIME type (since 4.1.0), it returns the minute of the time-of-day. Examples: > SELECT minute('2009-07-30 12:58:59'); 58 > SELECT minute(TIME'23:59:59.999999'); 59 Since: 1.5.0 month \u00b6 month(date) - Returns the month component of the date/timestamp. Examples: > SELECT month('2016-07-30'); 7 Since: 1.5.0 monthname \u00b6 monthname(date) - Returns the three-letter abbreviated month name from the given date. 
Examples: > SELECT monthname('2008-02-20'); Feb Since: 4.0.0 months_between \u00b6 months_between(timestamp1, timestamp2[, roundOff]) - If timestamp1 is later than timestamp2 , then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false. Examples: > SELECT months_between('1997-02-28 10:30:00', '1996-10-30'); 3.94959677 > SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false); 3.9495967741935485 Since: 1.5.0 next_day \u00b6 next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise it returns NULL. Examples: > SELECT next_day('2015-01-14', 'TU'); 2015-01-20 Since: 1.5.0 now \u00b6 now() - Returns the current timestamp at the start of query evaluation. Examples: > SELECT now(); 2020-04-25 15:49:11.914 Since: 1.6.0 quarter \u00b6 quarter(date) - Returns the quarter of the year for date, in the range 1 to 4. Examples: > SELECT quarter('2016-08-31'); 3 Since: 1.5.0 second \u00b6 second(expr) - Returns the second component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the second of that timestamp. If expr is a TIME type (since 4.1.0), it returns the second of the time-of-day. Examples: > SELECT second('2018-02-14 12:58:59'); 59 > SELECT second(TIME'13:25:59.999999'); 59 Since: 1.5.0 session_window \u00b6 session_window(time_column, gap_duration) - Generates session window given a timestamp specifying column and gap duration. 
See 'Types of time windows' in Structured Streaming guide doc for detailed explanation and examples. Arguments: time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType. gap_duration - A string specifying the timeout of the session represented as \"interval value\" (See Interval Literal for more details.) for the fixed gap duration, or an expression which is applied for each input and evaluated to the \"interval value\" for the dynamic gap duration. Examples: > SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1 > SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1 A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1 Since: 3.2.0 time_diff \u00b6 time_diff(unit, start, end) - Gets the difference between the times in the specified units. 
Arguments: unit - the unit of the difference between the given times \"HOUR\" \"MINUTE\" \"SECOND\" \"MILLISECOND\" \"MICROSECOND\" start - a starting TIME expression end - an ending TIME expression Examples: > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:28'); 0 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:29'); 1 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'12:00:00'); -8 Since: 4.1.0 time_from_micros \u00b6 time_from_micros(micros) - Creates a TIME value from microseconds since midnight. Arguments: micros - microseconds since midnight (0 to 86399999999) Examples: > SELECT time_from_micros(0); 00:00:00 > SELECT time_from_micros(52200000000); 14:30:00 > SELECT time_from_micros(52200500000); 14:30:00.5 > SELECT time_from_micros(86399999999); 23:59:59.999999 Since: 4.2.0 time_from_millis \u00b6 time_from_millis(millis) - Creates a TIME value from milliseconds since midnight. Arguments: millis - milliseconds since midnight (0 to 86399999) Examples: > SELECT time_from_millis(0); 00:00:00 > SELECT time_from_millis(52200000); 14:30:00 > SELECT time_from_millis(52200500); 14:30:00.5 > SELECT time_from_millis(86399999); 23:59:59.999 Since: 4.2.0 time_from_seconds \u00b6 time_from_seconds(seconds) - Creates a TIME value from seconds since midnight. Arguments: seconds - seconds since midnight (0 to 86399.999999). Supports decimals for fractional seconds. Examples: > SELECT time_from_seconds(0); 00:00:00 > SELECT time_from_seconds(52200); 14:30:00 > SELECT time_from_seconds(52200.5); 14:30:00.5 > SELECT time_from_seconds(86399.999999); 23:59:59.999999 Since: 4.2.0 time_to_micros \u00b6 time_to_micros(time) - Returns the number of microseconds since midnight for the given TIME value. 
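As a mental model for the time_from_* constructors above, each one just splits a count of units since midnight into hour/minute/second/fraction fields. A hedged pure-Python sketch (not Spark's code):

```python
from datetime import time

def time_from_micros(micros: int) -> time:
    # Split microseconds since midnight into h/m/s/us fields.
    if not 0 <= micros <= 86_399_999_999:
        raise ValueError('out of range')
    seconds, usec = divmod(micros, 1_000_000)
    minutes, sec = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, sec, usec)

def time_from_millis(millis: int) -> time:
    # Milliseconds are just microseconds at coarser precision.
    return time_from_micros(millis * 1_000)

print(time_from_micros(52_200_500_000))  # 14:30:00.500000
```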
Arguments: time - TIME value to convert Examples: > SELECT time_to_micros(TIME'00:00:00'); 0 > SELECT time_to_micros(TIME'14:30:00'); 52200000000 > SELECT time_to_micros(TIME'14:30:00.5'); 52200500000 > SELECT time_to_micros(TIME'23:59:59.999999'); 86399999999 Since: 4.2.0 time_to_millis \u00b6 time_to_millis(time) - Returns the number of milliseconds since midnight for the given TIME value. Arguments: time - TIME value to convert Examples: > SELECT time_to_millis(TIME'00:00:00'); 0 > SELECT time_to_millis(TIME'14:30:00'); 52200000 > SELECT time_to_millis(TIME'14:30:00.5'); 52200500 > SELECT time_to_millis(TIME'23:59:59.999'); 86399999 Since: 4.2.0 time_to_seconds \u00b6 time_to_seconds(time) - Returns the number of seconds since midnight for the given TIME value. Arguments: time - TIME value to convert Examples: > SELECT time_to_seconds(TIME'00:00:00'); 0.000000 > SELECT time_to_seconds(TIME'14:30:00'); 52200.000000 > SELECT time_to_seconds(TIME'14:30:00.5'); 52200.500000 > SELECT time_to_seconds(TIME'23:59:59.999999'); 86399.999999 Since: 4.2.0 time_trunc \u00b6 time_trunc(unit, time) - Returns time truncated to the unit . Arguments: unit - the unit to truncate to \"HOUR\" - zero out the minutes and seconds with fraction part \"MINUTE\" - zero out the seconds with fraction part \"SECOND\" - zero out the fraction part of seconds \"MILLISECOND\" - zero out the microseconds \"MICROSECOND\" - zero out the nanoseconds time - a TIME expression Examples: > SELECT time_trunc('HOUR', TIME'09:32:05.359'); 09:00:00 > SELECT time_trunc('MILLISECOND', TIME'09:32:05.123456'); 09:32:05.123 Since: 4.1.0 timestamp_micros \u00b6 timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch. Examples: > SELECT timestamp_micros(1230219000123123); 2008-12-25 07:30:00.123123 Since: 3.1.0 timestamp_millis \u00b6 timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch. 
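The time_to_* accessors above are the inverse mapping, and time_trunc simply zeroes out every field finer than the requested unit. An illustrative pure-Python sketch of those semantics (not Spark's implementation):

```python
from datetime import time

def time_to_micros(t: time) -> int:
    # Microseconds elapsed since midnight.
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond

def time_trunc(unit: str, t: time) -> time:
    # Keep fields down to `unit`; zero out everything finer.
    keep = {
        'HOUR': (t.hour, 0, 0, 0),
        'MINUTE': (t.hour, t.minute, 0, 0),
        'SECOND': (t.hour, t.minute, t.second, 0),
        'MILLISECOND': (t.hour, t.minute, t.second, t.microsecond // 1000 * 1000),
        'MICROSECOND': (t.hour, t.minute, t.second, t.microsecond),
    }
    return time(*keep[unit.upper()])

print(time_to_micros(time(14, 30, 0, 500000)))            # 52200500000
print(time_trunc('MILLISECOND', time(9, 32, 5, 123456)))  # 09:32:05.123000
```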
Examples: > SELECT timestamp_millis(1230219000123); 2008-12-25 07:30:00.123 Since: 3.1.0 timestamp_seconds \u00b6 timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch. Examples: > SELECT timestamp_seconds(1230219000); 2008-12-25 07:30:00 > SELECT timestamp_seconds(1230219000.123); 2008-12-25 07:30:00.123 Since: 3.1.0 to_date \u00b6 to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted. Arguments: date_str - A string to be parsed to date. fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_date('2009-07-30 04:17:52'); 2009-07-30 > SELECT to_date('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 Since: 1.5.0 to_time \u00b6 to_time(str[, format]) - Parses the str expression with the format expression to a time. If format is malformed or its application does not result in a well formed time, the function raises an error. By default, it follows casting rules to a time if the format is omitted. Arguments: str - A string to be parsed to time. format - Time format pattern to follow. See Datetime Patterns for valid time format patterns. Examples: > SELECT to_time('00:12:00'); 00:12:00 > SELECT to_time('12.10.05', 'HH.mm.ss'); 12:10:05 Since: 4.1.0 to_timestamp \u00b6 to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType . Arguments: timestamp_str - A string to be parsed to timestamp. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. 
Examples: > SELECT to_timestamp('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 2.2.0 to_timestamp_ltz \u00b6 to_timestamp_ltz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp with local time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. Arguments: timestamp_str - A string to be parsed to timestamp with local time zone. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_timestamp_ltz('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp_ltz('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 3.4.0 to_timestamp_ntz \u00b6 to_timestamp_ntz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp without time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. Arguments: timestamp_str - A string to be parsed to timestamp without time zone. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_timestamp_ntz('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp_ntz('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 3.4.0 to_unix_timestamp \u00b6 to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time. Arguments: timeExp - A date/timestamp or string which is returned as a UNIX timestamp. fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is \"yyyy-MM-dd HH:mm:ss\". See Datetime Patterns for valid date and time format patterns. 
Examples: > SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460098800 Since: 1.6.0 to_utc_timestamp \u00b6 to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'. Examples: > SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul'); 2016-08-30 15:00:00 Since: 1.5.0 trunc \u00b6 trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt . Arguments: date - date value or valid date string fmt - the format representing the unit to be truncated to \"YEAR\", \"YYYY\", \"YY\" - truncate to the first date of the year that the date falls in \"QUARTER\" - truncate to the first date of the quarter that the date falls in \"MONTH\", \"MM\", \"MON\" - truncate to the first date of the month that the date falls in \"WEEK\" - truncate to the Monday of the week that the date falls in Examples: > SELECT trunc('2019-08-04', 'week'); 2019-07-29 > SELECT trunc('2019-08-04', 'quarter'); 2019-07-01 > SELECT trunc('2009-02-12', 'MM'); 2009-02-01 > SELECT trunc('2015-10-27', 'YEAR'); 2015-01-01 Since: 1.5.0 try_make_interval \u00b6 try_make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - This is a special version of make_interval that performs the same operation, but returns NULL when an overflow occurs. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative weeks - the number of weeks, positive or negative days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. 
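The trunc unit table above maps each format to "snap back to the first day of the period". A minimal pure-Python sketch of that mapping, assuming the documented units only (not Spark's implementation):

```python
from datetime import date, timedelta

def trunc(d: date, fmt: str) -> date:
    fmt = fmt.upper()
    if fmt in ('YEAR', 'YYYY', 'YY'):
        return d.replace(month=1, day=1)
    if fmt == 'QUARTER':
        # First month of the quarter: 1, 4, 7, or 10.
        return d.replace(month=(d.month - 1) // 3 * 3 + 1, day=1)
    if fmt in ('MONTH', 'MM', 'MON'):
        return d.replace(day=1)
    if fmt == 'WEEK':
        return d - timedelta(days=d.weekday())  # back to Monday
    raise ValueError(fmt)

print(trunc(date(2019, 8, 4), 'week'))     # 2019-07-29
print(trunc(date(2019, 8, 4), 'quarter'))  # 2019-07-01
```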
Examples: > SELECT try_make_interval(100, 11, 1, 1, 12, 30, 01.001001); 100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds > SELECT try_make_interval(100, null, 3); NULL > SELECT try_make_interval(0, 1, 0, 1, 0, 0, 100.000001); 1 months 1 days 1 minutes 40.000001 seconds > SELECT try_make_interval(2147483647); NULL Since: 4.0.0 try_make_timestamp \u00b6 try_make_timestamp(year, month, day, hour, min, sec[, timezone]) - Try to create a timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType . The function returns NULL on invalid inputs. try_make_timestamp(date[, time[, timezone]]) - Try to create a timestamp from date, time, and timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. The value can be either an integer like 13 , or a fraction like 13.123. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date expression time - a time expression (optional). Default is 00:00:00. timezone - the time zone identifier (optional). For example, CET, UTC and etc. 
Examples: > SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp(DATE'2014-12-28'); 2014-12-28 00:00:00 > SELECT try_make_timestamp(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp(2019, 6, 30, 23, 59, 1); 2019-06-30 23:59:01 > SELECT try_make_timestamp(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp(2024, 13, 22, 15, 30, 0); NULL Since: 4.0.0 try_make_timestamp_ltz \u00b6 try_make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Try to create the current timestamp with local time zone from year, month, day, hour, min, sec and (optional) timezone fields. The function returns NULL on invalid inputs. try_make_timestamp_ltz(date, time[, timezone]) - Try to create the current timestamp with local time zone from date, time and (optional) timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. timezone - the time zone identifier. For example, CET, UTC and etc. 
date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp_ltz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp_ltz(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ltz(2024, 13, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 4.0.0 try_make_timestamp_ntz \u00b6 try_make_timestamp_ntz(year, month, day, hour, min, sec) - Try to create local date-time from year, month, day, hour, min, sec fields. The function returns NULL on invalid inputs. try_make_timestamp_ntz(date, time) - Create a local date-time from date and time fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. 
date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT try_make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ntz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp_ntz(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ntz(2024, 13, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ntz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 Since: 4.0.0 try_to_date \u00b6 try_to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. The function always returns null on an invalid input with/without ANSI SQL mode enabled. By default, it follows casting rules to a date if the fmt is omitted. Arguments: date_str - A string to be parsed to date. fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT try_to_date('2016-12-31'); 2016-12-31 > SELECT try_to_date('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 > SELECT try_to_date('foo', 'yyyy-MM-dd'); NULL Since: 4.0.0 try_to_time \u00b6 try_to_time(str[, format]) - Parses the str expression with the format expression to a time. If format is malformed or its application does not result in a well formed time, the function returns NULL. By default, it follows casting rules to a time if the format is omitted. Arguments: str - A string to be parsed to time. format - Time format pattern to follow. See Datetime Patterns for valid time format patterns. Examples: > SELECT try_to_time('00:12:00.001'); 00:12:00.001 > SELECT try_to_time('12.10.05.999999', 'HH.mm.ss.SSSSSS'); 12:10:05.999999 > SELECT try_to_time('foo', 'HH:mm:ss'); NULL Since: 4.1.0 try_to_timestamp \u00b6 try_to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. The function always returns null on an invalid input with/without ANSI SQL mode enabled. 
By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType . Arguments: timestamp_str - A string to be parsed to timestamp. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT try_to_timestamp('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT try_to_timestamp('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 > SELECT try_to_timestamp('foo', 'yyyy-MM-dd'); NULL Since: 3.4.0 unix_date \u00b6 unix_date(date) - Returns the number of days since 1970-01-01. Examples: > SELECT unix_date(DATE(\"1970-01-02\")); 1 Since: 3.1.0 unix_micros \u00b6 unix_micros(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC. Examples: > SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')); 1000000 Since: 3.1.0 unix_millis \u00b6 unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision. Examples: > SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')); 1000 Since: 3.1.0 unix_seconds \u00b6 unix_seconds(timestamp) - Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision. Examples: > SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')); 1 Since: 3.1.0 unix_timestamp \u00b6 unix_timestamp([timeExp[, fmt]]) - Returns the UNIX timestamp of current or specified time. Arguments: timeExp - A date/timestamp or string. If not provided, this defaults to current time. fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is \"yyyy-MM-dd HH:mm:ss\". See Datetime Patterns for valid date and time format patterns. 
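The unix_date / unix_micros / unix_seconds family above all measure distance from the 1970-01-01 epoch at different precisions, truncating anything finer. A hedged pure-Python model of those conversions:

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)

def unix_date(d: date) -> int:
    # Days since 1970-01-01.
    return (d - EPOCH_DATE).days

def unix_micros(ts: datetime) -> int:
    # Microseconds since the UTC epoch.
    return (ts - EPOCH_TS) // timedelta(microseconds=1)

def unix_seconds(ts: datetime) -> int:
    # Whole seconds since the UTC epoch; sub-second precision is dropped.
    return (ts - EPOCH_TS) // timedelta(seconds=1)

print(unix_date(date(1970, 1, 2)))                                      # 1
print(unix_micros(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000000
```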
Examples: > SELECT unix_timestamp(); 1476884637 > SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460041200 Since: 1.5.0 weekday \u00b6 weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday). Examples: > SELECT weekday('2009-07-30'); 3 Since: 2.4.0 weekofyear \u00b6 weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days. Examples: > SELECT weekofyear('2008-02-20'); 8 Since: 1.5.0 window \u00b6 window(time_column, window_duration[, slide_duration[, start_time]]) - Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. See 'Window Operations on Event Time' in Structured Streaming guide doc for detailed explanation and examples. Arguments: time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType. window_duration - A string specifying the width of the window represented as \"interval value\". (See Interval Literal for more details.) Note that the duration is a fixed length of time, and does not vary over time according to a calendar. slide_duration - A string specifying the sliding interval of the window represented as \"interval value\". A new window will be generated every slide_duration . Must be less than or equal to the window_duration . This duration is likewise absolute, and does not vary according to a calendar. start_time - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide start_time as 15 minutes . 
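The window-assignment rule above (epoch-aligned starts, optional slide and start offset, inclusive start, exclusive end) can be sketched in pure Python. This is an illustrative model of the documented bucketing, not Spark's implementation; `windows` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def windows(ts, width, slide=None, start=timedelta(0)):
    # All windows [s, s + width) containing ts, where window starts are
    # aligned to `slide` (offset by `start`) relative to the epoch.
    # Start is inclusive, end is exclusive.
    slide = slide or width  # no slide given => tumbling windows
    s = ts - (ts - EPOCH - start) % slide  # latest aligned start <= ts
    result = []
    while s + width > ts:
        result.append((s, s + width))
        s -= slide
    return list(reversed(result))

# Tumbling: 00:06 falls in exactly one 5-minute bucket, [00:05, 00:10).
print(windows(datetime(2021, 1, 1, 0, 6), timedelta(minutes=5)))
```

With `width='10 minutes', slide='5 minutes'` semantics, a row at 00:04:30 lands in two overlapping windows, matching the sliding-window example below.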
Examples: > SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2 A1 2021-01-01 00:05:00 2021-01-01 00:10:00 1 A2 2021-01-01 00:00:00 2021-01-01 00:05:00 1 > SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start; A1 2020-12-31 23:55:00 2021-01-01 00:05:00 2 A1 2021-01-01 00:00:00 2021-01-01 00:10:00 3 A1 2021-01-01 00:05:00 2021-01-01 00:15:00 1 A2 2020-12-31 23:55:00 2021-01-01 00:05:00 1 A2 2021-01-01 00:00:00 2021-01-01 00:10:00 1 Since: 2.0.0 window_time \u00b6 window_time(window_column) - Extract the time value from time/session window column which can be used for event time value of window. The extracted time is (window.end - 1) which reflects the fact that the aggregating windows have exclusive upper bound - [start, end) See 'Window Operations on Event Time' in Structured Streaming guide doc for detailed explanation and examples. Arguments: window_column - The column representing time/session window. 
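Because windows have an exclusive upper bound [start, end), the event-time value window_time extracts is simply end minus one microsecond. A one-line pure-Python model of that rule:

```python
from datetime import datetime, timedelta

def window_time(window_end: datetime) -> datetime:
    # The event-time of a window is its exclusive end minus 1 microsecond.
    return window_end - timedelta(microseconds=1)

print(window_time(datetime(2021, 1, 1, 0, 5)))  # 2021-01-01 00:04:59.999999
```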
Examples: > SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, window.start); A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 2 A1 2021-01-01 00:05:00 2021-01-01 00:10:00 2021-01-01 00:09:59.999999 1 A2 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 1 Since: 3.4.0 year \u00b6 year(date) - Returns the year component of the date/timestamp. Examples: > SELECT year('2016-07-30'); 2016 Since: 1.5.0","title":"Datetime Functions"},{"location":"datetime-functions/#datetime-functions","text":"This page lists all datetime functions available in Spark SQL.","title":"Datetime Functions"},{"location":"datetime-functions/#add_months","text":"add_months(start_date, num_months) - Returns the date that is num_months after start_date . Examples: > SELECT add_months('2016-08-31', 1); 2016-09-30 Since: 1.5.0","title":"add_months"},{"location":"datetime-functions/#convert_timezone","text":"convert_timezone([sourceTz, ]targetTz, sourceTs) - Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz . Arguments: sourceTz - the time zone for the input timestamp. If it is missed, the current session time zone is used as the source time zone. targetTz - the time zone to which the input timestamp should be converted sourceTs - a timestamp without time zone Examples: > SELECT convert_timezone('Europe/Brussels', 'America/Los_Angeles', timestamp_ntz'2021-12-06 00:00:00'); 2021-12-05 15:00:00 > SELECT convert_timezone('Europe/Brussels', timestamp_ntz'2021-12-05 15:00:00'); 2021-12-06 00:00:00 Since: 3.4.0","title":"convert_timezone"},{"location":"datetime-functions/#curdate","text":"curdate() - Returns the current date at the start of query evaluation. 
All calls of curdate within the same query return the same value. Examples: > SELECT curdate(); 2022-09-06 Since: 3.4.0","title":"curdate"},{"location":"datetime-functions/#current_date","text":"current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value. current_date - Returns the current date at the start of query evaluation. Examples: > SELECT current_date(); 2020-04-25 > SELECT current_date; 2020-04-25 Note: The syntax without braces has been supported since 2.0.1. Since: 1.5.0","title":"current_date"},{"location":"datetime-functions/#current_time","text":"current_time([precision]) - Returns the current time at the start of query evaluation. All calls of current_time within the same query return the same value. current_time - Returns the current time at the start of query evaluation. Arguments: precision - An optional integer literal in the range [0..6], indicating how many fractional digits of seconds to include. If omitted, the default is 6. Examples: > SELECT current_time(); 15:49:11.914120 > SELECT current_time; 15:49:11.914120 > SELECT current_time(0); 15:49:11 > SELECT current_time(3); 15:49:11.914 > SELECT current_time(1+1); 15:49:11.91 Since: 4.1.0","title":"current_time"},{"location":"datetime-functions/#current_timestamp","text":"current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value. current_timestamp - Returns the current timestamp at the start of query evaluation. Examples: > SELECT current_timestamp(); 2020-04-25 15:49:11.914 > SELECT current_timestamp; 2020-04-25 15:49:11.914 Note: The syntax without braces has been supported since 2.0.1. Since: 1.5.0","title":"current_timestamp"},{"location":"datetime-functions/#current_timezone","text":"current_timezone() - Returns the current session local timezone. 
Examples: > SELECT current_timezone(); Asia/Shanghai Since: 3.1.0","title":"current_timezone"},{"location":"datetime-functions/#date_add","text":"date_add(start_date, num_days) - Returns the date that is num_days after start_date . Examples: > SELECT date_add('2016-07-30', 1); 2016-07-31 Since: 1.5.0","title":"date_add"},{"location":"datetime-functions/#date_diff","text":"date_diff(endDate, startDate) - Returns the number of days from startDate to endDate . Examples: > SELECT date_diff('2009-07-31', '2009-07-30'); 1 > SELECT date_diff('2009-07-30', '2009-07-31'); -1 Since: 3.4.0","title":"date_diff"},{"location":"datetime-functions/#date_format","text":"date_format(timestamp, fmt) - Converts timestamp to a value of string in the format specified by the date format fmt . Arguments: timestamp - A date/timestamp or string to be converted to the given format. fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT date_format('2016-04-08', 'y'); 2016 Since: 1.5.0","title":"date_format"},{"location":"datetime-functions/#date_from_unix_date","text":"date_from_unix_date(days) - Create date from the number of days since 1970-01-01. Examples: > SELECT date_from_unix_date(1); 1970-01-02 Since: 3.1.0","title":"date_from_unix_date"},{"location":"datetime-functions/#date_part","text":"date_part(field, source) - Extracts a part of the date/timestamp or interval source. Arguments: field - selects which part of the source should be extracted, and supported string values are as same as the fields of the equivalent function EXTRACT . 
source - a date/timestamp or interval column from where field should be extracted Examples: > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT date_part('doy', DATE'2019-08-12'); 224 > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT date_part('days', interval 5 days 3 hours 7 minutes); 5 > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT date_part('MONTH', INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT date_part('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 Note: The date_part function is equivalent to the SQL-standard function EXTRACT(field FROM source) Since: 3.0.0","title":"date_part"},{"location":"datetime-functions/#date_sub","text":"date_sub(start_date, num_days) - Returns the date that is num_days before start_date . Examples: > SELECT date_sub('2016-07-30', 1); 2016-07-29 Since: 1.5.0","title":"date_sub"},{"location":"datetime-functions/#date_trunc","text":"date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt . 
Arguments: fmt - the format representing the unit to be truncated to \"YEAR\", \"YYYY\", \"YY\" - truncate to the first date of the year that the ts falls in, the time part will be zero out \"QUARTER\" - truncate to the first date of the quarter that the ts falls in, the time part will be zero out \"MONTH\", \"MM\", \"MON\" - truncate to the first date of the month that the ts falls in, the time part will be zero out \"WEEK\" - truncate to the Monday of the week that the ts falls in, the time part will be zero out \"DAY\", \"DD\" - zero out the time part \"HOUR\" - zero out the minute and second with fraction part \"MINUTE\"- zero out the second with fraction part \"SECOND\" - zero out the second fraction part \"MILLISECOND\" - zero out the microseconds \"MICROSECOND\" - everything remains ts - datetime value or valid timestamp string Examples: > SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359'); 2015-01-01 00:00:00 > SELECT date_trunc('MM', '2015-03-05T09:32:05.359'); 2015-03-01 00:00:00 > SELECT date_trunc('DD', '2015-03-05T09:32:05.359'); 2015-03-05 00:00:00 > SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359'); 2015-03-05 09:00:00 > SELECT date_trunc('MILLISECOND', '2015-03-05T09:32:05.123456'); 2015-03-05 09:32:05.123 Since: 2.3.0","title":"date_trunc"},{"location":"datetime-functions/#dateadd","text":"dateadd(start_date, num_days) - Returns the date that is num_days after start_date . Examples: > SELECT dateadd('2016-07-30', 1); 2016-07-31 Since: 3.4.0","title":"dateadd"},{"location":"datetime-functions/#datediff","text":"datediff(endDate, startDate) - Returns the number of days from startDate to endDate . Examples: > SELECT datediff('2009-07-31', '2009-07-30'); 1 > SELECT datediff('2009-07-30', '2009-07-31'); -1 Since: 1.5.0","title":"datediff"},{"location":"datetime-functions/#datepart","text":"datepart(field, source) - Extracts a part of the date/timestamp or interval source. 
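The date_trunc unit list above extends trunc to timestamps: coarser units snap back to the first date of the period and zero the time part, finer units only zero out sub-unit fields. A hedged pure-Python sketch of that table (not Spark's implementation):

```python
from datetime import datetime, timedelta

def date_trunc(fmt: str, ts: datetime) -> datetime:
    fmt = fmt.upper()
    midnight = dict(hour=0, minute=0, second=0, microsecond=0)
    if fmt in ('YEAR', 'YYYY', 'YY'):
        return ts.replace(month=1, day=1, **midnight)
    if fmt == 'QUARTER':
        return ts.replace(month=(ts.month - 1) // 3 * 3 + 1, day=1, **midnight)
    if fmt in ('MONTH', 'MM', 'MON'):
        return ts.replace(day=1, **midnight)
    if fmt == 'WEEK':
        return (ts - timedelta(days=ts.weekday())).replace(**midnight)  # Monday
    if fmt in ('DAY', 'DD'):
        return ts.replace(**midnight)
    if fmt == 'HOUR':
        return ts.replace(minute=0, second=0, microsecond=0)
    if fmt == 'MINUTE':
        return ts.replace(second=0, microsecond=0)
    if fmt == 'SECOND':
        return ts.replace(microsecond=0)
    if fmt == 'MILLISECOND':
        return ts.replace(microsecond=ts.microsecond // 1000 * 1000)
    if fmt == 'MICROSECOND':
        return ts  # everything remains
    raise ValueError(fmt)

print(date_trunc('MM', datetime(2015, 3, 5, 9, 32, 5, 359000)))  # 2015-03-01 00:00:00
```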
Arguments: field - selects which part of the source should be extracted, and supported string values are as same as the fields of the equivalent function EXTRACT . source - a date/timestamp or interval column from where field should be extracted Examples: > SELECT datepart('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT datepart('week', timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT datepart('doy', DATE'2019-08-12'); 224 > SELECT datepart('SECONDS', timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT datepart('days', interval 5 days 3 hours 7 minutes); 5 > SELECT datepart('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT datepart('MONTH', INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT datepart('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 Note: The datepart function is equivalent to the SQL-standard function EXTRACT(field FROM source) Since: 3.4.0","title":"datepart"},{"location":"datetime-functions/#day","text":"day(date) - Returns the day of month of the date/timestamp. Examples: > SELECT day('2009-07-30'); 30 Since: 1.5.0","title":"day"},{"location":"datetime-functions/#dayname","text":"dayname(date) - Returns the three-letter abbreviated day name from the given date. Examples: > SELECT dayname(DATE('2008-02-20')); Wed Since: 4.0.0","title":"dayname"},{"location":"datetime-functions/#dayofmonth","text":"dayofmonth(date) - Returns the day of month of the date/timestamp. Examples: > SELECT dayofmonth('2009-07-30'); 30 Since: 1.5.0","title":"dayofmonth"},{"location":"datetime-functions/#dayofweek","text":"dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday). Examples: > SELECT dayofweek('2009-07-30'); 5 Since: 2.3.0","title":"dayofweek"},{"location":"datetime-functions/#dayofyear","text":"dayofyear(date) - Returns the day of year of the date/timestamp. 
Examples: > SELECT dayofyear('2016-04-09'); 100 Since: 1.5.0","title":"dayofyear"},{"location":"datetime-functions/#extract","text":"extract(field FROM source) - Extracts a part of the date or timestamp or time or interval source. Arguments: field - selects which part of the source should be extracted Supported string values of field for dates and timestamps are(case insensitive): \"YEAR\", (\"Y\", \"YEARS\", \"YR\", \"YRS\") - the year field \"YEAROFWEEK\" - the ISO 8601 week-numbering year that the datetime falls in. For example, 2005-01-02 is part of the 53rd week of year 2004, so the result is 2004 \"QUARTER\", (\"QTR\") - the quarter (1 - 4) of the year that the datetime falls in \"MONTH\", (\"MON\", \"MONS\", \"MONTHS\") - the month field (1 - 12) \"WEEK\", (\"W\", \"WEEKS\") - the number of the ISO 8601 week-of-week-based-year. A week is considered to start on a Monday and week 1 is the first week with >3 days. In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. 
For example, 2005-01-02 is part of the 53rd week of year 2004, while 2012-12-31 is part of the first week of 2013 \"DAY\", (\"D\", \"DAYS\") - the day of the month field (1 - 31) \"DAYOFWEEK\",(\"DOW\") - the day of the week for datetime as Sunday(1) to Saturday(7) \"DAYOFWEEK_ISO\",(\"DOW_ISO\") - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7) \"DOY\" - the day of the year (1 - 365/366) \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - the hour field (0 - 23) \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - the minutes field (0 - 59) \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - the seconds field, including fractional parts Supported string values of field for interval (which consists of months , days , microseconds ) are (case-insensitive): \"YEAR\", (\"Y\", \"YEARS\", \"YR\", \"YRS\") - the total months / 12 \"MONTH\", (\"MON\", \"MONS\", \"MONTHS\") - the total months % 12 \"DAY\", (\"D\", \"DAYS\") - the days part of the interval \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - the number of hours in the microseconds part \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - the number of minutes remaining after taking hours from the microseconds part \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - the number of seconds, with fractional part, remaining after taking hours and minutes from the microseconds part Supported string values of field for time (which consists of hour , minute , second ) are (case-insensitive): \"HOUR\", (\"H\", \"HOURS\", \"HR\", \"HRS\") - the hour field (0 - 23) \"MINUTE\", (\"M\", \"MIN\", \"MINS\", \"MINUTES\") - the minutes field (0 - 59) \"SECOND\", (\"S\", \"SEC\", \"SECONDS\", \"SECS\") - the seconds field, including fractional parts up to microsecond precision. Returns a DECIMAL(8, 6) value.
source - a date, timestamp, time, or interval column from which field should be extracted Examples: > SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456'); 2019 > SELECT extract(week FROM timestamp'2019-08-12 01:00:00.123456'); 33 > SELECT extract(doy FROM DATE'2019-08-12'); 224 > SELECT extract(SECONDS FROM timestamp'2019-10-01 00:00:01.000001'); 1.000001 > SELECT extract(days FROM interval 5 days 3 hours 7 minutes); 5 > SELECT extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds); 30.001001 > SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); 11 > SELECT extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND); 55 > SELECT extract(HOUR FROM time '09:08:01.000001'); 9 > SELECT extract(MINUTE FROM time '09:08:01.000001'); 8 > SELECT extract(SECOND FROM time '09:08:01.000001'); 1.000001 Note: The extract function is equivalent to date_part(field, source) . Since: 3.0.0","title":"extract"},{"location":"datetime-functions/#from_unixtime","text":"from_unixtime(unix_time[, fmt]) - Returns unix_time in the specified fmt . Arguments: unix_time - UNIX Timestamp to be converted to the provided format. fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. The 'yyyy-MM-dd HH:mm:ss' pattern is used if omitted. Examples: > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); 1969-12-31 16:00:00 > SELECT from_unixtime(0); 1969-12-31 16:00:00 Since: 1.5.0","title":"from_unixtime"},{"location":"datetime-functions/#from_utc_timestamp","text":"from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
Examples: > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul'); 2016-08-31 09:00:00 Since: 1.5.0","title":"from_utc_timestamp"},{"location":"datetime-functions/#hour","text":"hour(expr) - Returns the hour component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the hour of that timestamp. If expr is a TIME type (since 4.1.0), it returns the hour of the time-of-day. Examples: > SELECT hour('2018-02-14 12:58:59'); 12 > SELECT hour(TIME'13:59:59.999999'); 13 Since: 1.5.0","title":"hour"},{"location":"datetime-functions/#last_day","text":"last_day(date) - Returns the last day of the month to which the date belongs. Examples: > SELECT last_day('2009-01-12'); 2009-01-31 Since: 1.5.0","title":"last_day"},{"location":"datetime-functions/#localtimestamp","text":"localtimestamp() - Returns the current local date-time, without time zone, at the session time zone at the start of query evaluation. All calls of localtimestamp within the same query return the same value. Examples: > SELECT localtimestamp(); 2020-04-25 15:49:11.914 Since: 3.4.0","title":"localtimestamp"},{"location":"datetime-functions/#make_date","text":"make_date(year, month, day) - Create a date from year, month and day fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 Examples: > SELECT make_date(2013, 7, 15); 2013-07-15 > SELECT make_date(2019, 7, NULL); NULL Since: 3.0.0","title":"make_date"},{"location":"datetime-functions/#make_dt_interval","text":"make_dt_interval([days[, hours[, mins[, secs]]]]) - Make a DayTimeIntervalType duration from days, hours, mins and secs.
Arguments: days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. Examples: > SELECT make_dt_interval(1, 12, 30, 01.001001); 1 12:30:01.001001000 > SELECT make_dt_interval(2); 2 00:00:00.000000000 > SELECT make_dt_interval(100, null, 3); NULL Since: 3.2.0","title":"make_dt_interval"},{"location":"datetime-functions/#make_interval","text":"make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - Make interval from years, months, weeks, days, hours, mins and secs. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative weeks - the number of weeks, positive or negative days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. Examples: > SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001); 100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds > SELECT make_interval(100, null, 3); NULL > SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001); 1 months 1 days 1 minutes 40.000001 seconds Since: 3.0.0","title":"make_interval"},{"location":"datetime-functions/#make_time","text":"make_time(hour, minute, second) - Create time from hour, minute and second fields. For invalid inputs it will throw an error. 
Arguments: hour - the hour to represent, from 0 to 23 minute - the minute to represent, from 0 to 59 second - the second to represent, from 0 to 59.999999 Examples: > SELECT make_time(6, 30, 45.887); 06:30:45.887 > SELECT make_time(NULL, 30, 0); NULL Since: 4.1.0","title":"make_time"},{"location":"datetime-functions/#make_timestamp","text":"make_timestamp(year, month, day, hour, min, sec[, timezone]) - Create a timestamp with local time zone from year, month, day, hour, min, sec and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp(date[, time[, timezone]]) - Create a timestamp from date and time fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date expression time - a time expression (optional). Default is 00:00:00. timezone - the time zone identifier (optional). For example, CET or UTC.
Examples: > SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT make_timestamp(DATE'2014-12-28'); 2014-12-28 00:00:00 > SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT make_timestamp(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 3.0.0","title":"make_timestamp"},{"location":"datetime-functions/#make_timestamp_ltz","text":"make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Create a timestamp with local time zone from year, month, day, hour, min, sec and (optional) timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp_ltz(date, time[, timezone]) - Create a local date-time from date, time and (optional) timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. timezone - the time zone identifier. For example, CET or UTC.
date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT make_timestamp_ltz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT make_timestamp_ltz(null, 7, 22, 15, 30, 0); NULL > SELECT make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 3.4.0","title":"make_timestamp_ltz"},{"location":"datetime-functions/#make_timestamp_ntz","text":"make_timestamp_ntz(year, month, day, hour, min, sec) - Create a local date-time from year, month, day, hour, min, sec fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. make_timestamp_ntz(date, time) - Create a local date-time from date and time fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT make_timestamp_ntz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT make_timestamp_ntz(null, 7, 22, 15, 30, 0); NULL > SELECT make_timestamp_ntz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 Since: 3.4.0","title":"make_timestamp_ntz"},{"location":"datetime-functions/#make_ym_interval","text":"make_ym_interval([years[, months]]) - Make year-month interval from years, months. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative Examples: > SELECT make_ym_interval(1, 2); 1-2 > SELECT make_ym_interval(1, 0); 1-0 > SELECT make_ym_interval(-1, 1); -0-11 > SELECT make_ym_interval(2); 2-0 Since: 3.2.0","title":"make_ym_interval"},{"location":"datetime-functions/#minute","text":"minute(expr) - Returns the minute component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the minute of that timestamp. If expr is a TIME type (since 4.1.0), it returns the minute of the time-of-day. Examples: > SELECT minute('2009-07-30 12:58:59'); 58 > SELECT minute(TIME'23:59:59.999999'); 59 Since: 1.5.0","title":"minute"},{"location":"datetime-functions/#month","text":"month(date) - Returns the month component of the date/timestamp. Examples: > SELECT month('2016-07-30'); 7 Since: 1.5.0","title":"month"},{"location":"datetime-functions/#monthname","text":"monthname(date) - Returns the three-letter abbreviated month name from the given date. Examples: > SELECT monthname('2008-02-20'); Feb Since: 4.0.0","title":"monthname"},{"location":"datetime-functions/#months_between","text":"months_between(timestamp1, timestamp2[, roundOff]) - If timestamp1 is later than timestamp2 , then the result is positive. 
If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false. Examples: > SELECT months_between('1997-02-28 10:30:00', '1996-10-30'); 3.94959677 > SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false); 3.9495967741935485 Since: 1.5.0","title":"months_between"},{"location":"datetime-functions/#next_day","text":"next_day(start_date, day_of_week) - Returns the first date which is later than start_date and falls on the indicated day of the week. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true; otherwise it returns NULL. Examples: > SELECT next_day('2015-01-14', 'TU'); 2015-01-20 Since: 1.5.0","title":"next_day"},{"location":"datetime-functions/#now","text":"now() - Returns the current timestamp at the start of query evaluation. Examples: > SELECT now(); 2020-04-25 15:49:11.914 Since: 1.6.0","title":"now"},{"location":"datetime-functions/#quarter","text":"quarter(date) - Returns the quarter of the year for date, in the range 1 to 4. Examples: > SELECT quarter('2016-08-31'); 3 Since: 1.5.0","title":"quarter"},{"location":"datetime-functions/#second","text":"second(expr) - Returns the second component of the given expression. If expr is a TIMESTAMP or a string that can be cast to timestamp, it returns the second of that timestamp. If expr is a TIME type (since 4.1.0), it returns the second of the time-of-day. Examples: > SELECT second('2018-02-14 12:58:59'); 59 > SELECT second(TIME'13:25:59.999999'); 59 Since: 1.5.0","title":"second"},{"location":"datetime-functions/#session_window","text":"session_window(time_column, gap_duration) - Generates a session window given a timestamp column and a gap duration.
See 'Types of time windows' in Structured Streaming guide doc for detailed explanation and examples. Arguments: time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType. gap_duration - A string specifying the timeout of the session represented as \"interval value\" (See Interval Literal for more details.) for the fixed gap duration, or an expression which is applied for each input and evaluated to the \"interval value\" for the dynamic gap duration. Examples: > SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1 > SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1 A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1 Since: 3.2.0","title":"session_window"},{"location":"datetime-functions/#time_diff","text":"time_diff(unit, start, end) - Gets the difference between the times in the specified units. 
Arguments: unit - the unit of the difference between the given times \"HOUR\" \"MINUTE\" \"SECOND\" \"MILLISECOND\" \"MICROSECOND\" start - a starting TIME expression end - an ending TIME expression Examples: > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:28'); 0 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:29'); 1 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'12:00:00'); -8 Since: 4.1.0","title":"time_diff"},{"location":"datetime-functions/#time_from_micros","text":"time_from_micros(micros) - Creates a TIME value from microseconds since midnight. Arguments: micros - microseconds since midnight (0 to 86399999999) Examples: > SELECT time_from_micros(0); 00:00:00 > SELECT time_from_micros(52200000000); 14:30:00 > SELECT time_from_micros(52200500000); 14:30:00.5 > SELECT time_from_micros(86399999999); 23:59:59.999999 Since: 4.2.0","title":"time_from_micros"},{"location":"datetime-functions/#time_from_millis","text":"time_from_millis(millis) - Creates a TIME value from milliseconds since midnight. Arguments: millis - milliseconds since midnight (0 to 86399999) Examples: > SELECT time_from_millis(0); 00:00:00 > SELECT time_from_millis(52200000); 14:30:00 > SELECT time_from_millis(52200500); 14:30:00.5 > SELECT time_from_millis(86399999); 23:59:59.999 Since: 4.2.0","title":"time_from_millis"},{"location":"datetime-functions/#time_from_seconds","text":"time_from_seconds(seconds) - Creates a TIME value from seconds since midnight. Arguments: seconds - seconds since midnight (0 to 86399.999999). Supports decimals for fractional seconds. Examples: > SELECT time_from_seconds(0); 00:00:00 > SELECT time_from_seconds(52200); 14:30:00 > SELECT time_from_seconds(52200.5); 14:30:00.5 > SELECT time_from_seconds(86399.999999); 23:59:59.999999 Since: 4.2.0","title":"time_from_seconds"},{"location":"datetime-functions/#time_to_micros","text":"time_to_micros(time) - Returns the number of microseconds since midnight for the given TIME value. 
Arguments: time - TIME value to convert Examples: > SELECT time_to_micros(TIME'00:00:00'); 0 > SELECT time_to_micros(TIME'14:30:00'); 52200000000 > SELECT time_to_micros(TIME'14:30:00.5'); 52200500000 > SELECT time_to_micros(TIME'23:59:59.999999'); 86399999999 Since: 4.2.0","title":"time_to_micros"},{"location":"datetime-functions/#time_to_millis","text":"time_to_millis(time) - Returns the number of milliseconds since midnight for the given TIME value. Arguments: time - TIME value to convert Examples: > SELECT time_to_millis(TIME'00:00:00'); 0 > SELECT time_to_millis(TIME'14:30:00'); 52200000 > SELECT time_to_millis(TIME'14:30:00.5'); 52200500 > SELECT time_to_millis(TIME'23:59:59.999'); 86399999 Since: 4.2.0","title":"time_to_millis"},{"location":"datetime-functions/#time_to_seconds","text":"time_to_seconds(time) - Returns the number of seconds since midnight for the given TIME value. Arguments: time - TIME value to convert Examples: > SELECT time_to_seconds(TIME'00:00:00'); 0.000000 > SELECT time_to_seconds(TIME'14:30:00'); 52200.000000 > SELECT time_to_seconds(TIME'14:30:00.5'); 52200.500000 > SELECT time_to_seconds(TIME'23:59:59.999999'); 86399.999999 Since: 4.2.0","title":"time_to_seconds"},{"location":"datetime-functions/#time_trunc","text":"time_trunc(unit, time) - Returns time truncated to the unit . Arguments: unit - the unit to truncate to \"HOUR\" - zero out the minutes and seconds with fraction part \"MINUTE\" - zero out the seconds with fraction part \"SECOND\" - zero out the fraction part of seconds \"MILLISECOND\" - zero out the microseconds \"MICROSECOND\" - zero out the nanoseconds time - a TIME expression Examples: > SELECT time_trunc('HOUR', TIME'09:32:05.359'); 09:00:00 > SELECT time_trunc('MILLISECOND', TIME'09:32:05.123456'); 09:32:05.123 Since: 4.1.0","title":"time_trunc"},{"location":"datetime-functions/#timestamp_micros","text":"timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch. 
Examples: > SELECT timestamp_micros(1230219000123123); 2008-12-25 07:30:00.123123 Since: 3.1.0","title":"timestamp_micros"},{"location":"datetime-functions/#timestamp_millis","text":"timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch. Examples: > SELECT timestamp_millis(1230219000123); 2008-12-25 07:30:00.123 Since: 3.1.0","title":"timestamp_millis"},{"location":"datetime-functions/#timestamp_seconds","text":"timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch. Examples: > SELECT timestamp_seconds(1230219000); 2008-12-25 07:30:00 > SELECT timestamp_seconds(1230219000.123); 2008-12-25 07:30:00.123 Since: 3.1.0","title":"timestamp_seconds"},{"location":"datetime-functions/#to_date","text":"to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted. Arguments: date_str - A string to be parsed to date. fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_date('2009-07-30 04:17:52'); 2009-07-30 > SELECT to_date('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 Since: 1.5.0","title":"to_date"},{"location":"datetime-functions/#to_time","text":"to_time(str[, format]) - Parses the str expression with the format expression to a time. If format is malformed or its application does not result in a well formed time, the function raises an error. By default, it follows casting rules to a time if the format is omitted. Arguments: str - A string to be parsed to time. format - Time format pattern to follow. See Datetime Patterns for valid time format patterns. 
Examples: > SELECT to_time('00:12:00'); 00:12:00 > SELECT to_time('12.10.05', 'HH.mm.ss'); 12:10:05 Since: 4.1.0","title":"to_time"},{"location":"datetime-functions/#to_timestamp","text":"to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType . Arguments: timestamp_str - A string to be parsed to timestamp. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_timestamp('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 2.2.0","title":"to_timestamp"},{"location":"datetime-functions/#to_timestamp_ltz","text":"to_timestamp_ltz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp with local time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. Arguments: timestamp_str - A string to be parsed to timestamp with local time zone. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_timestamp_ltz('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp_ltz('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 3.4.0","title":"to_timestamp_ltz"},{"location":"datetime-functions/#to_timestamp_ntz","text":"to_timestamp_ntz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp without time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. Arguments: timestamp_str - A string to be parsed to timestamp without time zone. fmt - Timestamp format pattern to follow. 
See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_timestamp_ntz('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT to_timestamp_ntz('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 Since: 3.4.0","title":"to_timestamp_ntz"},{"location":"datetime-functions/#to_unix_timestamp","text":"to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time. Arguments: timeExp - A date/timestamp or string which is returned as a UNIX timestamp. fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is \"yyyy-MM-dd HH:mm:ss\". See Datetime Patterns for valid date and time format patterns. Examples: > SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460098800 Since: 1.6.0","title":"to_unix_timestamp"},{"location":"datetime-functions/#to_utc_timestamp","text":"to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'. Examples: > SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul'); 2016-08-30 15:00:00 Since: 1.5.0","title":"to_utc_timestamp"},{"location":"datetime-functions/#trunc","text":"trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt . 
Arguments: date - date value or valid date string fmt - the format representing the unit to be truncated to \"YEAR\", \"YYYY\", \"YY\" - truncate to the first date of the year that the date falls in \"QUARTER\" - truncate to the first date of the quarter that the date falls in \"MONTH\", \"MM\", \"MON\" - truncate to the first date of the month that the date falls in \"WEEK\" - truncate to the Monday of the week that the date falls in Examples: > SELECT trunc('2019-08-04', 'week'); 2019-07-29 > SELECT trunc('2019-08-04', 'quarter'); 2019-07-01 > SELECT trunc('2009-02-12', 'MM'); 2009-02-01 > SELECT trunc('2015-10-27', 'YEAR'); 2015-01-01 Since: 1.5.0","title":"trunc"},{"location":"datetime-functions/#try_make_interval","text":"try_make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - This is a special version of make_interval that performs the same operation, but returns NULL when an overflow occurs. Arguments: years - the number of years, positive or negative months - the number of months, positive or negative weeks - the number of weeks, positive or negative days - the number of days, positive or negative hours - the number of hours, positive or negative mins - the number of minutes, positive or negative secs - the number of seconds with the fractional part in microsecond precision. Examples: > SELECT try_make_interval(100, 11, 1, 1, 12, 30, 01.001001); 100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds > SELECT try_make_interval(100, null, 3); NULL > SELECT try_make_interval(0, 1, 0, 1, 0, 0, 100.000001); 1 months 1 days 1 minutes 40.000001 seconds > SELECT try_make_interval(2147483647); NULL Since: 4.0.0","title":"try_make_interval"},{"location":"datetime-functions/#try_make_timestamp","text":"try_make_timestamp(year, month, day, hour, min, sec[, timezone]) - Try to create a timestamp from year, month, day, hour, min, sec and timezone fields. 
The result data type is consistent with the value of configuration spark.sql.timestampType . The function returns NULL on invalid inputs. try_make_timestamp(date[, time[, timezone]]) - Try to create a timestamp from date, time, and timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. The value can be either an integer like 13 , or a fraction like 13.123. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date expression time - a time expression (optional). Default is 00:00:00. timezone - the time zone identifier (optional). For example, CET or UTC. Examples: > SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp(DATE'2014-12-28'); 2014-12-28 00:00:00 > SELECT try_make_timestamp(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp(2019, 6, 30, 23, 59, 1); 2019-06-30 23:59:01 > SELECT try_make_timestamp(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp(2024, 13, 22, 15, 30, 0); NULL Since: 4.0.0","title":"try_make_timestamp"},{"location":"datetime-functions/#try_make_timestamp_ltz","text":"try_make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Try to create a timestamp with local time zone from year, month, day, hour, min, sec and (optional) timezone fields.
The function returns NULL on invalid inputs. try_make_timestamp_ltz(date, time[, timezone]) - Try to create a timestamp with local time zone from date, time and (optional) timezone fields. Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. timezone - the time zone identifier. For example, CET or UTC. date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET'); 2014-12-27 21:30:45.887 > SELECT try_make_timestamp_ltz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp_ltz(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ltz(2024, 13, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ltz(DATE'2014-12-28', TIME'6:30:45.887', 'CET'); 2014-12-27 21:30:45.887 Since: 4.0.0","title":"try_make_timestamp_ltz"},{"location":"datetime-functions/#try_make_timestamp_ntz","text":"try_make_timestamp_ntz(year, month, day, hour, min, sec) - Try to create a local date-time from year, month, day, hour, min, sec fields. The function returns NULL on invalid inputs. try_make_timestamp_ntz(date, time) - Try to create a local date-time from date and time fields.
Arguments: year - the year to represent, from 1 to 9999 month - the month-of-year to represent, from 1 (January) to 12 (December) day - the day-of-month to represent, from 1 to 31 hour - the hour-of-day to represent, from 0 to 23 min - the minute-of-hour to represent, from 0 to 59 sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp. date - a date to represent, from 0001-01-01 to 9999-12-31 time - a local time to represent, from 00:00:00 to 23:59:59.999999 Examples: > SELECT try_make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887); 2014-12-28 06:30:45.887 > SELECT try_make_timestamp_ntz(2019, 6, 30, 23, 59, 60); 2019-07-01 00:00:00 > SELECT try_make_timestamp_ntz(null, 7, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ntz(2024, 13, 22, 15, 30, 0); NULL > SELECT try_make_timestamp_ntz(DATE'2014-12-28', TIME'6:30:45.887'); 2014-12-28 06:30:45.887 Since: 4.0.0","title":"try_make_timestamp_ntz"},{"location":"datetime-functions/#try_to_date","text":"try_to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. The function always returns null on invalid input, regardless of whether ANSI SQL mode is enabled. By default, it follows casting rules to a date if the fmt is omitted. Arguments: date_str - A string to be parsed to date. fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT try_to_date('2016-12-31'); 2016-12-31 > SELECT try_to_date('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 > SELECT try_to_date('foo', 'yyyy-MM-dd'); NULL Since: 4.0.0","title":"try_to_date"},{"location":"datetime-functions/#try_to_time","text":"try_to_time(str[, format]) - Parses the str expression with the format expression to a time. If format is malformed or its application does not result in a well-formed time, the function returns NULL.
By default, it follows casting rules to a time if the format is omitted. Arguments: str - A string to be parsed to time. format - Time format pattern to follow. See Datetime Patterns for valid time format patterns. Examples: > SELECT try_to_time('00:12:00.001'); 00:12:00.001 > SELECT try_to_time('12.10.05.999999', 'HH.mm.ss.SSSSSS'); 12:10:05.999999 > SELECT try_to_time('foo', 'HH:mm:ss'); NULL Since: 4.1.0","title":"try_to_time"},{"location":"datetime-functions/#try_to_timestamp","text":"try_to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. The function always returns NULL on invalid input, regardless of whether ANSI SQL mode is enabled. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType . Arguments: timestamp_str - A string to be parsed to timestamp. fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns. Examples: > SELECT try_to_timestamp('2016-12-31 00:12:00'); 2016-12-31 00:12:00 > SELECT try_to_timestamp('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 00:00:00 > SELECT try_to_timestamp('foo', 'yyyy-MM-dd'); NULL Since: 3.4.0","title":"try_to_timestamp"},{"location":"datetime-functions/#unix_date","text":"unix_date(date) - Returns the number of days since 1970-01-01. Examples: > SELECT unix_date(DATE(\"1970-01-02\")); 1 Since: 3.1.0","title":"unix_date"},{"location":"datetime-functions/#unix_micros","text":"unix_micros(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC. Examples: > SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')); 1000000 Since: 3.1.0","title":"unix_micros"},{"location":"datetime-functions/#unix_millis","text":"unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision. 
Examples: > SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')); 1000 Since: 3.1.0","title":"unix_millis"},{"location":"datetime-functions/#unix_seconds","text":"unix_seconds(timestamp) - Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision. Examples: > SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')); 1 Since: 3.1.0","title":"unix_seconds"},{"location":"datetime-functions/#unix_timestamp","text":"unix_timestamp([timeExp[, fmt]]) - Returns the UNIX timestamp of current or specified time. Arguments: timeExp - A date/timestamp or string. If not provided, this defaults to current time. fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is \"yyyy-MM-dd HH:mm:ss\". See Datetime Patterns for valid date and time format patterns. Examples: > SELECT unix_timestamp(); 1476884637 > SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460041200 Since: 1.5.0","title":"unix_timestamp"},{"location":"datetime-functions/#weekday","text":"weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday). Examples: > SELECT weekday('2009-07-30'); 3 Since: 2.4.0","title":"weekday"},{"location":"datetime-functions/#weekofyear","text":"weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days. Examples: > SELECT weekofyear('2008-02-20'); 8 Since: 1.5.0","title":"weekofyear"},{"location":"datetime-functions/#window","text":"window(time_column, window_duration[, slide_duration[, start_time]]) - Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. 
See 'Window Operations on Event Time' in Structured Streaming guide doc for detailed explanation and examples. Arguments: time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType. window_duration - A string specifying the width of the window represented as \"interval value\". (See Interval Literal for more details.) Note that the duration is a fixed length of time, and does not vary over time according to a calendar. slide_duration - A string specifying the sliding interval of the window represented as \"interval value\". A new window will be generated every slide_duration . Must be less than or equal to the window_duration . This duration is likewise absolute, and does not vary according to a calendar. start_time - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide start_time as 15 minutes . 
Examples: > SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2 A1 2021-01-01 00:05:00 2021-01-01 00:10:00 1 A2 2021-01-01 00:00:00 2021-01-01 00:05:00 1 > SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start; A1 2020-12-31 23:55:00 2021-01-01 00:05:00 2 A1 2021-01-01 00:00:00 2021-01-01 00:10:00 3 A1 2021-01-01 00:05:00 2021-01-01 00:15:00 1 A2 2020-12-31 23:55:00 2021-01-01 00:05:00 1 A2 2021-01-01 00:00:00 2021-01-01 00:10:00 1 Since: 2.0.0","title":"window"},{"location":"datetime-functions/#window_time","text":"window_time(window_column) - Extract the time value from time/session window column which can be used for event time value of window. The extracted time is (window.end - 1) which reflects the fact that the aggregating windows have exclusive upper bound - [start, end) See 'Window Operations on Event Time' in Structured Streaming guide doc for detailed explanation and examples. Arguments: window_column - The column representing time/session window. 
Examples: > SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, window.start); A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 2 A1 2021-01-01 00:05:00 2021-01-01 00:10:00 2021-01-01 00:09:59.999999 1 A2 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 1 Since: 3.4.0","title":"window_time"},{"location":"datetime-functions/#year","text":"year(date) - Returns the year component of the date/timestamp. Examples: > SELECT year('2016-07-30'); 2016 Since: 1.5.0","title":"year"},{"location":"generator-functions/","text":"Generator Functions \u00b6 This page lists all generator functions available in Spark SQL. collations \u00b6 collations() - Get all of the Spark SQL string collations Examples: > SELECT * FROM collations() WHERE NAME = 'UTF8_BINARY'; SYSTEM BUILTIN UTF8_BINARY NULL NULL ACCENT_SENSITIVE CASE_SENSITIVE NO_PAD NULL Since: 4.0.0 explode \u00b6 explode(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. Examples: > SELECT explode(array(10, 20)); 10 20 > SELECT explode(collection => array(10, 20)); 10 20 Since: 1.0.0 explode_outer \u00b6 explode_outer(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. 
Examples: > SELECT * FROM explode_outer(array(10, 20)); 10 20 > SELECT * FROM explode_outer(collection => array(10, 20)); 10 20 Since: 3.4.0 inline \u00b6 inline(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. Examples: > SELECT inline(array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b > SELECT inline(input => array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b Since: 2.0.0 inline_outer \u00b6 inline_outer(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. Examples: > SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b > SELECT inline_outer(input => array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b Since: 2.0.0 posexplode \u00b6 posexplode(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. Examples: > SELECT * FROM posexplode(array(10,20)); 0 10 1 20 > SELECT * FROM posexplode(collection => array(10,20)); 0 10 1 20 Since: 3.5.0 posexplode_outer \u00b6 posexplode_outer(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. Examples: > SELECT * FROM posexplode_outer(array(10,20)); 0 10 1 20 > SELECT * FROM posexplode_outer(collection => array(10,20)); 0 10 1 20 Since: 3.5.0 sql_keywords \u00b6 sql_keywords() - Get Spark SQL keywords Examples: > SELECT * FROM sql_keywords() LIMIT 2; ADD false AFTER false Since: 3.5.0 stack \u00b6 stack(n, expr1, ..., exprk) - Separates expr1 , ..., exprk into n rows. Uses column names col0, col1, etc. 
by default unless specified otherwise. Examples: > SELECT stack(2, 1, 2, 3); 1 2 3 NULL Since: 2.0.0","title":"Generator Functions"},{"location":"generator-functions/#generator-functions","text":"This page lists all generator functions available in Spark SQL.","title":"Generator Functions"},{"location":"generator-functions/#collations","text":"collations() - Get all of the Spark SQL string collations Examples: > SELECT * FROM collations() WHERE NAME = 'UTF8_BINARY'; SYSTEM BUILTIN UTF8_BINARY NULL NULL ACCENT_SENSITIVE CASE_SENSITIVE NO_PAD NULL Since: 4.0.0","title":"collations"},{"location":"generator-functions/#explode","text":"explode(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. Examples: > SELECT explode(array(10, 20)); 10 20 > SELECT explode(collection => array(10, 20)); 10 20 Since: 1.0.0","title":"explode"},{"location":"generator-functions/#explode_outer","text":"explode_outer(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. Examples: > SELECT * FROM explode_outer(array(10, 20)); 10 20 > SELECT * FROM explode_outer(collection => array(10, 20)); 10 20 Since: 3.4.0","title":"explode_outer"},{"location":"generator-functions/#inline","text":"inline(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. 
Examples: > SELECT inline(array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b > SELECT inline(input => array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b Since: 2.0.0","title":"inline"},{"location":"generator-functions/#inline_outer","text":"inline_outer(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. Examples: > SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b > SELECT inline_outer(input => array(struct(1, 'a'), struct(2, 'b'))); 1 a 2 b Since: 2.0.0","title":"inline_outer"},{"location":"generator-functions/#posexplode","text":"posexplode(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. Examples: > SELECT * FROM posexplode(array(10,20)); 0 10 1 20 > SELECT * FROM posexplode(collection => array(10,20)); 0 10 1 20 Since: 3.5.0","title":"posexplode"},{"location":"generator-functions/#posexplode_outer","text":"posexplode_outer(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. Examples: > SELECT * FROM posexplode_outer(array(10,20)); 0 10 1 20 > SELECT * FROM posexplode_outer(collection => array(10,20)); 0 10 1 20 Since: 3.5.0","title":"posexplode_outer"},{"location":"generator-functions/#sql_keywords","text":"sql_keywords() - Get Spark SQL keywords Examples: > SELECT * FROM sql_keywords() LIMIT 2; ADD false AFTER false Since: 3.5.0","title":"sql_keywords"},{"location":"generator-functions/#stack","text":"stack(n, expr1, ..., exprk) - Separates expr1 , ..., exprk into n rows. 
Uses column names col0, col1, etc. by default unless specified otherwise. Examples: > SELECT stack(2, 1, 2, 3); 1 2 3 NULL Since: 2.0.0","title":"stack"},{"location":"hash-functions/","text":"Hash Functions \u00b6 This page lists all hash functions available in Spark SQL. crc32 \u00b6 crc32(expr) - Returns a cyclic redundancy check value of the expr as a bigint. Examples: > SELECT crc32('Spark'); 1557323817 Since: 1.5.0 hash \u00b6 hash(expr1, expr2, ...) - Returns a hash value of the arguments. Examples: > SELECT hash('Spark', array(123), 2); -1321691492 Since: 2.0.0 md5 \u00b6 md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr . Examples: > SELECT md5('Spark'); 8cde774d6f7333752ed72cacddb05126 Since: 1.5.0 sha \u00b6 sha(expr) - Returns a sha1 hash value as a hex string of the expr . Examples: > SELECT sha('Spark'); 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c Since: 1.5.0 sha1 \u00b6 sha1(expr) - Returns a sha1 hash value as a hex string of the expr . Examples: > SELECT sha1('Spark'); 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c Since: 1.5.0 sha2 \u00b6 sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr . SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256. Examples: > SELECT sha2('Spark', 256); 529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b Since: 1.5.0 xxhash64 \u00b6 xxhash64(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments. Hash seed is 42. Examples: > SELECT xxhash64('Spark', array(123), 2); 5602566077635097486 Since: 3.0.0","title":"Hash Functions"},{"location":"hash-functions/#hash-functions","text":"This page lists all hash functions available in Spark SQL.","title":"Hash Functions"},{"location":"hash-functions/#crc32","text":"crc32(expr) - Returns a cyclic redundancy check value of the expr as a bigint. 
Examples: > SELECT crc32('Spark'); 1557323817 Since: 1.5.0","title":"crc32"},{"location":"hash-functions/#hash","text":"hash(expr1, expr2, ...) - Returns a hash value of the arguments. Examples: > SELECT hash('Spark', array(123), 2); -1321691492 Since: 2.0.0","title":"hash"},{"location":"hash-functions/#md5","text":"md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr . Examples: > SELECT md5('Spark'); 8cde774d6f7333752ed72cacddb05126 Since: 1.5.0","title":"md5"},{"location":"hash-functions/#sha","text":"sha(expr) - Returns a sha1 hash value as a hex string of the expr . Examples: > SELECT sha('Spark'); 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c Since: 1.5.0","title":"sha"},{"location":"hash-functions/#sha1","text":"sha1(expr) - Returns a sha1 hash value as a hex string of the expr . Examples: > SELECT sha1('Spark'); 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c Since: 1.5.0","title":"sha1"},{"location":"hash-functions/#sha2","text":"sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr . SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256. Examples: > SELECT sha2('Spark', 256); 529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b Since: 1.5.0","title":"sha2"},{"location":"hash-functions/#xxhash64","text":"xxhash64(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments. Hash seed is 42. Examples: > SELECT xxhash64('Spark', array(123), 2); 5602566077635097486 Since: 3.0.0","title":"xxhash64"},{"location":"json-functions/","text":"Json Functions \u00b6 This page lists all json functions available in Spark SQL. from_json \u00b6 from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema . 
Examples: > SELECT from_json('{\"a\":1, \"b\":0.8}', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_json('{\"time\":\"26/08/2015\"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} > SELECT from_json('{\"teacher\": \"Alice\", \"student\": [{\"name\": \"Bob\", \"rank\": 1}, {\"name\": \"Charlie\", \"rank\": 2}]}', 'STRUCT<teacher: STRING, student: ARRAY<STRUCT<name: STRING, rank: INT>>>'); {\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]} Since: 2.2.0 get_json_object \u00b6 get_json_object(json_txt, path) - Extracts a json object from path . Examples: > SELECT get_json_object('{\"a\":\"b\"}', '$.a'); b > SELECT get_json_object('[{\"a\":\"b\"},{\"a\":\"c\"}]', '$[0].a'); b > SELECT get_json_object('[{\"a\":\"b\"},{\"a\":\"c\"}]', '$[*].a'); [\"b\",\"c\"] Since: 1.5.0 json_array_length \u00b6 json_array_length(jsonArray) - Returns the number of elements in the outermost JSON array. Arguments: jsonArray - A JSON array. NULL is returned in case of any other valid JSON string, NULL or an invalid JSON. Examples: > SELECT json_array_length('[1,2,3,4]'); 4 > SELECT json_array_length('[1,2,3,{\"f1\":1,\"f2\":[5,6]},4]'); 5 > SELECT json_array_length('[1,2'); NULL Since: 3.1.0 json_object_keys \u00b6 json_object_keys(json_object) - Returns all the keys of the outermost JSON object as an array. Arguments: json_object - A JSON object. If a valid JSON object is given, all the keys of the outermost object will be returned as an array. If it is any other valid JSON string, an invalid JSON string or an empty string, the function returns null. 
Examples: > SELECT json_object_keys('{}'); [] > SELECT json_object_keys('{\"key\": \"value\"}'); [\"key\"] > SELECT json_object_keys('{\"f1\":\"abc\",\"f2\":{\"f3\":\"a\", \"f4\":\"b\"}}'); [\"f1\",\"f2\"] Since: 3.1.0 json_tuple \u00b6 json_tuple(jsonStr, p1, p2, ..., pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string. Examples: > SELECT json_tuple('{\"a\":1, \"b\":2}', 'a', 'b'); 1 2 Since: 1.6.0 schema_of_json \u00b6 schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string. Examples: > SELECT schema_of_json('[{\"col\":0}]'); ARRAY<STRUCT<col: BIGINT>> > SELECT schema_of_json('[{\"col\":01}]', map('allowNumericLeadingZeros', 'true')); ARRAY<STRUCT<col: BIGINT>> Since: 2.4.0 to_json \u00b6 to_json(expr[, options]) - Returns a JSON string with a given struct value Examples: > SELECT to_json(named_struct('a', 1, 'b', 2)); {\"a\":1,\"b\":2} > SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); {\"time\":\"26/08/2015\"} > SELECT to_json(array(named_struct('a', 1, 'b', 2))); [{\"a\":1,\"b\":2}] > SELECT to_json(map('a', named_struct('b', 1))); {\"a\":{\"b\":1}} > SELECT to_json(map(named_struct('a', 1),named_struct('b', 2))); {\"[1]\":{\"b\":2}} > SELECT to_json(map('a', 1)); {\"a\":1} > SELECT to_json(array(map('a', 1))); [{\"a\":1}] Since: 2.2.0","title":"Json Functions"},{"location":"json-functions/#json-functions","text":"This page lists all json functions available in Spark SQL.","title":"Json Functions"},{"location":"json-functions/#from_json","text":"from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema . 
Examples: > SELECT from_json('{\"a\":1, \"b\":0.8}', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_json('{\"time\":\"26/08/2015\"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} > SELECT from_json('{\"teacher\": \"Alice\", \"student\": [{\"name\": \"Bob\", \"rank\": 1}, {\"name\": \"Charlie\", \"rank\": 2}]}', 'STRUCT<teacher: STRING, student: ARRAY<STRUCT<name: STRING, rank: INT>>>'); {\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]} Since: 2.2.0","title":"from_json"},{"location":"json-functions/#get_json_object","text":"get_json_object(json_txt, path) - Extracts a json object from path . Examples: > SELECT get_json_object('{\"a\":\"b\"}', '$.a'); b > SELECT get_json_object('[{\"a\":\"b\"},{\"a\":\"c\"}]', '$[0].a'); b > SELECT get_json_object('[{\"a\":\"b\"},{\"a\":\"c\"}]', '$[*].a'); [\"b\",\"c\"] Since: 1.5.0","title":"get_json_object"},{"location":"json-functions/#json_array_length","text":"json_array_length(jsonArray) - Returns the number of elements in the outermost JSON array. Arguments: jsonArray - A JSON array. NULL is returned in case of any other valid JSON string, NULL or an invalid JSON. Examples: > SELECT json_array_length('[1,2,3,4]'); 4 > SELECT json_array_length('[1,2,3,{\"f1\":1,\"f2\":[5,6]},4]'); 5 > SELECT json_array_length('[1,2'); NULL Since: 3.1.0","title":"json_array_length"},{"location":"json-functions/#json_object_keys","text":"json_object_keys(json_object) - Returns all the keys of the outermost JSON object as an array. Arguments: json_object - A JSON object. If a valid JSON object is given, all the keys of the outermost object will be returned as an array. If it is any other valid JSON string, an invalid JSON string or an empty string, the function returns null. 
Examples: > SELECT json_object_keys('{}'); [] > SELECT json_object_keys('{\"key\": \"value\"}'); [\"key\"] > SELECT json_object_keys('{\"f1\":\"abc\",\"f2\":{\"f3\":\"a\", \"f4\":\"b\"}}'); [\"f1\",\"f2\"] Since: 3.1.0","title":"json_object_keys"},{"location":"json-functions/#json_tuple","text":"json_tuple(jsonStr, p1, p2, ..., pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string. Examples: > SELECT json_tuple('{\"a\":1, \"b\":2}', 'a', 'b'); 1 2 Since: 1.6.0","title":"json_tuple"},{"location":"json-functions/#schema_of_json","text":"schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string. Examples: > SELECT schema_of_json('[{\"col\":0}]'); ARRAY<STRUCT<col: BIGINT>> > SELECT schema_of_json('[{\"col\":01}]', map('allowNumericLeadingZeros', 'true')); ARRAY<STRUCT<col: BIGINT>> Since: 2.4.0","title":"schema_of_json"},{"location":"json-functions/#to_json","text":"to_json(expr[, options]) - Returns a JSON string with a given struct value Examples: > SELECT to_json(named_struct('a', 1, 'b', 2)); {\"a\":1,\"b\":2} > SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); {\"time\":\"26/08/2015\"} > SELECT to_json(array(named_struct('a', 1, 'b', 2))); [{\"a\":1,\"b\":2}] > SELECT to_json(map('a', named_struct('b', 1))); {\"a\":{\"b\":1}} > SELECT to_json(map(named_struct('a', 1),named_struct('b', 2))); {\"[1]\":{\"b\":2}} > SELECT to_json(map('a', 1)); {\"a\":1} > SELECT to_json(array(map('a', 1))); [{\"a\":1}] Since: 2.2.0","title":"to_json"},{"location":"map-functions/","text":"Map Functions \u00b6 This page lists all map functions available in Spark SQL. map \u00b6 map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs. Examples: > SELECT map(1.0, '2', 3.0, '4'); {1.0:\"2\",3.0:\"4\"} Since: 2.0.0 map_concat \u00b6 map_concat(map, ...) 
- Returns the union of all the given maps Examples: > SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c')); {1:\"a\",2:\"b\",3:\"c\"} Since: 2.4.0 map_contains_key \u00b6 map_contains_key(map, key) - Returns true if the map contains the key. Examples: > SELECT map_contains_key(map(1, 'a', 2, 'b'), 1); true > SELECT map_contains_key(map(1, 'a', 2, 'b'), 3); false Since: 3.3.0 map_entries \u00b6 map_entries(map) - Returns an unordered array of all entries in the given map. Examples: > SELECT map_entries(map(1, 'a', 2, 'b')); [{\"key\":1,\"value\":\"a\"},{\"key\":2,\"value\":\"b\"}] Since: 3.0.0 map_from_arrays \u00b6 map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. All elements in keys should not be null Examples: > SELECT map_from_arrays(array(1.0, 3.0), array('2', '4')); {1.0:\"2\",3.0:\"4\"} Since: 2.4.0 map_from_entries \u00b6 map_from_entries(arrayOfEntries) - Returns a map created from the given array of entries. Examples: > SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b'))); {1:\"a\",2:\"b\"} Since: 2.4.0 map_keys \u00b6 map_keys(map) - Returns an unordered array containing the keys of the map. Examples: > SELECT map_keys(map(1, 'a', 2, 'b')); [1,2] Since: 2.0.0 map_values \u00b6 map_values(map) - Returns an unordered array containing the values of the map. Examples: > SELECT map_values(map(1, 'a', 2, 'b')); [\"a\",\"b\"] Since: 2.0.0 str_to_map \u00b6 str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and ':' for keyValueDelim . Both pairDelim and keyValueDelim are treated as regular expressions. 
Examples: > SELECT str_to_map('a:1,b:2,c:3', ',', ':'); {\"a\":\"1\",\"b\":\"2\",\"c\":\"3\"} > SELECT str_to_map('a'); {\"a\":null} Since: 2.0.1","title":"Map Functions"},{"location":"map-functions/#map-functions","text":"This page lists all map functions available in Spark SQL.","title":"Map Functions"},{"location":"map-functions/#map","text":"map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs. Examples: > SELECT map(1.0, '2', 3.0, '4'); {1.0:\"2\",3.0:\"4\"} Since: 2.0.0","title":"map"},{"location":"map-functions/#map_concat","text":"map_concat(map, ...) - Returns the union of all the given maps Examples: > SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c')); {1:\"a\",2:\"b\",3:\"c\"} Since: 2.4.0","title":"map_concat"},{"location":"map-functions/#map_contains_key","text":"map_contains_key(map, key) - Returns true if the map contains the key. Examples: > SELECT map_contains_key(map(1, 'a', 2, 'b'), 1); true > SELECT map_contains_key(map(1, 'a', 2, 'b'), 3); false Since: 3.3.0","title":"map_contains_key"},{"location":"map-functions/#map_entries","text":"map_entries(map) - Returns an unordered array of all entries in the given map. Examples: > SELECT map_entries(map(1, 'a', 2, 'b')); [{\"key\":1,\"value\":\"a\"},{\"key\":2,\"value\":\"b\"}] Since: 3.0.0","title":"map_entries"},{"location":"map-functions/#map_from_arrays","text":"map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. All elements in keys should not be null Examples: > SELECT map_from_arrays(array(1.0, 3.0), array('2', '4')); {1.0:\"2\",3.0:\"4\"} Since: 2.4.0","title":"map_from_arrays"},{"location":"map-functions/#map_from_entries","text":"map_from_entries(arrayOfEntries) - Returns a map created from the given array of entries. 
Examples: > SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b'))); {1:\"a\",2:\"b\"} Since: 2.4.0","title":"map_from_entries"},{"location":"map-functions/#map_keys","text":"map_keys(map) - Returns an unordered array containing the keys of the map. Examples: > SELECT map_keys(map(1, 'a', 2, 'b')); [1,2] Since: 2.0.0","title":"map_keys"},{"location":"map-functions/#map_values","text":"map_values(map) - Returns an unordered array containing the values of the map. Examples: > SELECT map_values(map(1, 'a', 2, 'b')); [\"a\",\"b\"] Since: 2.0.0","title":"map_values"},{"location":"map-functions/#str_to_map","text":"str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and ':' for keyValueDelim . Both pairDelim and keyValueDelim are treated as regular expressions. Examples: > SELECT str_to_map('a:1,b:2,c:3', ',', ':'); {\"a\":\"1\",\"b\":\"2\",\"c\":\"3\"} > SELECT str_to_map('a'); {\"a\":null} Since: 2.0.1","title":"str_to_map"},{"location":"math-functions/","text":"Math Functions \u00b6 This page lists all math functions available in Spark SQL. % \u00b6 expr1 % expr2, or mod(expr1, expr2) - Returns the remainder after expr1 / expr2 . Examples: > SELECT 2 % 1.8; 0.2 > SELECT MOD(2, 1.8); 0.2 Since: 1.0.0 * \u00b6 expr1 * expr2 - Returns expr1 * expr2 . Examples: > SELECT 2 * 3; 6 Since: 1.0.0 + \u00b6 expr1 + expr2 - Returns expr1 + expr2 . Examples: > SELECT 1 + 2; 3 Since: 1.0.0 - \u00b6 expr1 - expr2 - Returns expr1 - expr2 . Examples: > SELECT 2 - 1; 1 Since: 1.0.0 / \u00b6 expr1 / expr2 - Returns expr1 / expr2 . It always performs floating point division. Examples: > SELECT 3 / 2; 1.5 > SELECT 2L / 2L; 1.0 Since: 1.0.0 abs \u00b6 abs(expr) - Returns the absolute value of the numeric or interval value. 
Examples: > SELECT abs(-1); 1 > SELECT abs(INTERVAL -'1-1' YEAR TO MONTH); 1-1 Since: 1.2.0 acos \u00b6 acos(expr) - Returns the inverse cosine (a.k.a. arc cosine) of expr , as if computed by java.lang.Math.acos . Examples: > SELECT acos(1); 0.0 > SELECT acos(2); NaN Since: 1.4.0 acosh \u00b6 acosh(expr) - Returns inverse hyperbolic cosine of expr . Examples: > SELECT acosh(1); 0.0 > SELECT acosh(0); NaN Since: 3.0.0 asin \u00b6 asin(expr) - Returns the inverse sine (a.k.a. arc sine) of expr , as if computed by java.lang.Math.asin . Examples: > SELECT asin(0); 0.0 > SELECT asin(2); NaN Since: 1.4.0 asinh \u00b6 asinh(expr) - Returns inverse hyperbolic sine of expr . Examples: > SELECT asinh(0); 0.0 Since: 3.0.0 atan \u00b6 atan(expr) - Returns the inverse tangent (a.k.a. arc tangent) of expr , as if computed by java.lang.Math.atan . Examples: > SELECT atan(0); 0.0 Since: 1.4.0 atan2 \u00b6 atan2(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates ( exprX , exprY ), as if computed by java.lang.Math.atan2 . Arguments: exprY - coordinate on y-axis exprX - coordinate on x-axis Examples: > SELECT atan2(0, 0); 0.0 Since: 1.4.0 atanh \u00b6 atanh(expr) - Returns inverse hyperbolic tangent of expr . Examples: > SELECT atanh(0); 0.0 > SELECT atanh(2); NaN Since: 3.0.0 bin \u00b6 bin(expr) - Returns the string representation of the long value expr represented in binary. Examples: > SELECT bin(13); 1101 > SELECT bin(-13); 1111111111111111111111111111111111111111111111111111111111110011 > SELECT bin(13.3); 1101 Since: 1.5.0 bround \u00b6 bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode. Examples: > SELECT bround(2.5, 0); 2 > SELECT bround(25, -1); 20 Since: 2.0.0 cbrt \u00b6 cbrt(expr) - Returns the cube root of expr . 
Examples: > SELECT cbrt(27.0); 3.0 Since: 1.4.0 ceil \u00b6 ceil(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr . An optional scale parameter can be specified to control the rounding behavior. Examples: > SELECT ceil(-0.1); 0 > SELECT ceil(5); 5 > SELECT ceil(3.1411, 3); 3.142 > SELECT ceil(3.1411, -3); 1000 Since: 3.3.0 ceiling \u00b6 ceiling(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr . An optional scale parameter can be specified to control the rounding behavior. Examples: > SELECT ceiling(-0.1); 0 > SELECT ceiling(5); 5 > SELECT ceiling(3.1411, 3); 3.142 > SELECT ceiling(3.1411, -3); 1000 Since: 3.3.0 conv \u00b6 conv(num, from_base, to_base) - Convert num from from_base to to_base . Examples: > SELECT conv('100', 2, 10); 4 > SELECT conv(-10, 16, -10); -16 Since: 1.5.0 cos \u00b6 cos(expr) - Returns the cosine of expr , as if computed by java.lang.Math.cos . Arguments: expr - angle in radians Examples: > SELECT cos(0); 1.0 Since: 1.4.0 cosh \u00b6 cosh(expr) - Returns the hyperbolic cosine of expr , as if computed by java.lang.Math.cosh . Arguments: expr - hyperbolic angle Examples: > SELECT cosh(0); 1.0 Since: 1.4.0 cot \u00b6 cot(expr) - Returns the cotangent of expr , as if computed by 1/java.lang.Math.tan . Arguments: expr - angle in radians Examples: > SELECT cot(1); 0.6420926159343306 Since: 2.3.0 csc \u00b6 csc(expr) - Returns the cosecant of expr , as if computed by 1/java.lang.Math.sin . Arguments: expr - angle in radians Examples: > SELECT csc(1); 1.1883951057781212 Since: 3.3.0 degrees \u00b6 degrees(expr) - Converts radians to degrees. Arguments: expr - angle in radians Examples: > SELECT degrees(3.141592653589793); 180.0 Since: 1.4.0 div \u00b6 expr1 div expr2 - Divide expr1 by expr2 . It returns NULL if an operand is NULL or expr2 is 0. The result is casted to long. 
Examples: > SELECT 3 div 2; 1 > SELECT INTERVAL '1-1' YEAR TO MONTH div INTERVAL '-1' MONTH; -13 Since: 3.0.0 e \u00b6 e() - Returns Euler's number, e. Examples: > SELECT e(); 2.718281828459045 Since: 1.5.0 exp \u00b6 exp(expr) - Returns e to the power of expr . Examples: > SELECT exp(0); 1.0 Since: 1.4.0 expm1 \u00b6 expm1(expr) - Returns exp( expr ) - 1. Examples: > SELECT expm1(0); 0.0 Since: 1.4.0 factorial \u00b6 factorial(expr) - Returns the factorial of expr . expr is [0..20]. Otherwise, null. Examples: > SELECT factorial(5); 120 Since: 1.5.0 floor \u00b6 floor(expr[, scale]) - Returns the largest number after rounding down that is not greater than expr . An optional scale parameter can be specified to control the rounding behavior. Examples: > SELECT floor(-0.1); -1 > SELECT floor(5); 5 > SELECT floor(3.1411, 3); 3.141 > SELECT floor(3.1411, -3); 0 Since: 3.3.0 greatest \u00b6 greatest(expr, ...) - Returns the greatest value of all parameters, skipping null values. Examples: > SELECT greatest(10, 9, 2, 4, 3); 10 Since: 1.5.0 hex \u00b6 hex(expr) - Converts expr to hexadecimal. Examples: > SELECT hex(17); 11 > SELECT hex('Spark SQL'); 537061726B2053514C Since: 1.5.0 hypot \u00b6 hypot(expr1, expr2) - Returns sqrt( expr1 \u00b2 + expr2 \u00b2). Examples: > SELECT hypot(3, 4); 5.0 Since: 1.4.0 least \u00b6 least(expr, ...) - Returns the least value of all parameters, skipping null values. Examples: > SELECT least(10, 9, 2, 4, 3); 2 Since: 1.5.0 ln \u00b6 ln(expr) - Returns the natural logarithm (base e) of expr . Examples: > SELECT ln(1); 0.0 Since: 1.4.0 log \u00b6 log(base, expr) - Returns the logarithm of expr with base . Examples: > SELECT log(10, 100); 2.0 Since: 1.5.0 log10 \u00b6 log10(expr) - Returns the logarithm of expr with base 10. Examples: > SELECT log10(10); 1.0 Since: 1.4.0 log1p \u00b6 log1p(expr) - Returns log(1 + expr ). Examples: > SELECT log1p(0); 0.0 Since: 1.4.0 log2 \u00b6 log2(expr) - Returns the logarithm of expr with base 2. 
Examples: > SELECT log2(2); 1.0 Since: 1.4.0 mod \u00b6 expr1 % expr2, or mod(expr1, expr2) - Returns the remainder after expr1 / expr2 . Examples: > SELECT 2 % 1.8; 0.2 > SELECT MOD(2, 1.8); 0.2 Since: 2.3.0 negative \u00b6 negative(expr) - Returns the negated value of expr . Examples: > SELECT negative(1); -1 Since: 1.0.0 pi \u00b6 pi() - Returns pi. Examples: > SELECT pi(); 3.141592653589793 Since: 1.5.0 pmod \u00b6 pmod(expr1, expr2) - Returns the positive value of expr1 mod expr2 . Examples: > SELECT pmod(10, 3); 1 > SELECT pmod(-10, 3); 2 Since: 1.5.0 positive \u00b6 positive(expr) - Returns the value of expr . Examples: > SELECT positive(1); 1 Since: 1.5.0 pow \u00b6 pow(expr1, expr2) - Raises expr1 to the power of expr2 . Examples: > SELECT pow(2, 3); 8.0 Since: 1.4.0 power \u00b6 power(expr1, expr2) - Raises expr1 to the power of expr2 . Examples: > SELECT power(2, 3); 8.0 Since: 1.4.0 radians \u00b6 radians(expr) - Converts degrees to radians. Arguments: expr - angle in degrees Examples: > SELECT radians(180); 3.141592653589793 Since: 1.4.0 rand \u00b6 rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1). Examples: > SELECT rand(); 0.9629742951434543 > SELECT rand(0); 0.7604953758285915 > SELECT rand(null); 0.7604953758285915 Note: The function is non-deterministic in general case. Since: 1.5.0 randn \u00b6 randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution. Examples: > SELECT randn(); -0.3254147983080288 > SELECT randn(0); 1.6034991609278433 > SELECT randn(null); 1.6034991609278433 Note: The function is non-deterministic in general case. Since: 1.5.0 random \u00b6 random([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1). 
Examples: > SELECT random(); 0.9629742951434543 > SELECT random(0); 0.7604953758285915 > SELECT random(null); 0.7604953758285915 Note: The function is non-deterministic in general case. Since: 3.0.0 rint \u00b6 rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer. Examples: > SELECT rint(12.3456); 12.0 Since: 1.4.0 round \u00b6 round(expr, d) - Returns expr rounded to d decimal places using HALF_UP rounding mode. Examples: > SELECT round(2.5, 0); 3 Since: 1.5.0 sec \u00b6 sec(expr) - Returns the secant of expr , as if computed by 1/java.lang.Math.cos . Arguments: expr - angle in radians Examples: > SELECT sec(0); 1.0 Since: 3.3.0 sign \u00b6 sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive. Examples: > SELECT sign(40); 1.0 > SELECT sign(INTERVAL -'100' YEAR); -1.0 Since: 1.4.0 signum \u00b6 signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive. Examples: > SELECT signum(40); 1.0 > SELECT signum(INTERVAL -'100' YEAR); -1.0 Since: 1.4.0 sin \u00b6 sin(expr) - Returns the sine of expr , as if computed by java.lang.Math.sin . Arguments: expr - angle in radians Examples: > SELECT sin(0); 0.0 Since: 1.4.0 sinh \u00b6 sinh(expr) - Returns hyperbolic sine of expr , as if computed by java.lang.Math.sinh . Arguments: expr - hyperbolic angle Examples: > SELECT sinh(0); 0.0 Since: 1.4.0 sqrt \u00b6 sqrt(expr) - Returns the square root of expr . Examples: > SELECT sqrt(4); 2.0 Since: 1.1.1 tan \u00b6 tan(expr) - Returns the tangent of expr , as if computed by java.lang.Math.tan . Arguments: expr - angle in radians Examples: > SELECT tan(0); 0.0 Since: 1.4.0 tanh \u00b6 tanh(expr) - Returns the hyperbolic tangent of expr , as if computed by java.lang.Math.tanh . Arguments: expr - hyperbolic angle Examples: > SELECT tanh(0); 0.0 Since: 1.4.0 try_add \u00b6 try_add(expr1, expr2) - Returns the sum of expr1 and expr2 and the result is null on overflow. 
The acceptable input types are the same as the + operator. Examples: > SELECT try_add(1, 2); 3 > SELECT try_add(2147483647, 1); NULL > SELECT try_add(date'2021-01-01', 1); 2021-01-02 > SELECT try_add(date'2021-01-01', interval 1 year); 2022-01-01 > SELECT try_add(timestamp'2021-01-01 00:00:00', interval 1 day); 2021-01-02 00:00:00 > SELECT try_add(interval 1 year, interval 2 year); 3-0 Since: 3.2.0 try_divide \u00b6 try_divide(dividend, divisor) - Returns dividend / divisor . It always performs floating point division. Its result is always null if divisor is 0. dividend must be a numeric or an interval. divisor must be a numeric. Examples: > SELECT try_divide(3, 2); 1.5 > SELECT try_divide(2L, 2L); 1.0 > SELECT try_divide(1, 0); NULL > SELECT try_divide(interval 2 month, 2); 0-1 > SELECT try_divide(interval 2 month, 0); NULL Since: 3.2.0 try_mod \u00b6 try_mod(dividend, divisor) - Returns the remainder after dividend / divisor . dividend must be a numeric. divisor must be a numeric. Examples: > SELECT try_mod(3, 2); 1 > SELECT try_mod(2L, 2L); 0 > SELECT try_mod(3.0, 2.0); 1.0 > SELECT try_mod(1, 0); NULL Since: 4.0.0 try_multiply \u00b6 try_multiply(expr1, expr2) - Returns expr1 * expr2 and the result is null on overflow. The acceptable input types are the same as the * operator. Examples: > SELECT try_multiply(2, 3); 6 > SELECT try_multiply(-2147483648, 10); NULL > SELECT try_multiply(interval 2 year, 3); 6-0 Since: 3.3.0 try_subtract \u00b6 try_subtract(expr1, expr2) - Returns expr1 - expr2 and the result is null on overflow. The acceptable input types are the same as the - operator. 
Examples: > SELECT try_subtract(2, 1); 1 > SELECT try_subtract(-2147483648, 1); NULL > SELECT try_subtract(date'2021-01-02', 1); 2021-01-01 > SELECT try_subtract(date'2021-01-01', interval 1 year); 2020-01-01 > SELECT try_subtract(timestamp'2021-01-02 00:00:00', interval 1 day); 2021-01-01 00:00:00 > SELECT try_subtract(interval 2 year, interval 1 year); 1-0 Since: 3.3.0 unhex \u00b6 unhex(expr) - Converts hexadecimal expr to binary. Examples: > SELECT decode(unhex('537061726B2053514C'), 'UTF-8'); Spark SQL Since: 1.5.0 uniform \u00b6 uniform(min, max[, seed]) - Returns a random value with independent and identically distributed (i.i.d.) values within the specified range of numbers. The random seed is optional. The provided numbers specifying the minimum and maximum values of the range must be constant. If both of these numbers are integers, then the result will also be an integer. Otherwise, if one or both of these are floating-point numbers, then the result will also be a floating-point number. 
Examples: > SELECT uniform(10, 20, 0) > 0 AS result; true Since: 4.0.0 width_bucket \u00b6 width_bucket(value, min_value, max_value, num_bucket) - Returns the bucket number to which value would be assigned in an equiwidth histogram with num_bucket buckets, in the range min_value to max_value . Examples: > SELECT width_bucket(5.3, 0.2, 10.6, 5); 3 > SELECT width_bucket(-2.1, 1.3, 3.4, 3); 0 > SELECT width_bucket(8.1, 0.0, 5.7, 4); 5 > SELECT width_bucket(-0.9, 5.2, 0.5, 2); 3 > SELECT width_bucket(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); 1 > SELECT width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); 2 > SELECT width_bucket(INTERVAL '0' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10); 1 > SELECT width_bucket(INTERVAL '1' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10); 2 Since: 3.1.0","title":"Math Functions"},{"location":"math-functions/#math-functions","text":"This page lists all math functions available in Spark SQL.","title":"Math Functions"},{"location":"math-functions/#_1","text":"expr1 % expr2, or mod(expr1, expr2) - Returns the remainder after expr1 / expr2 . Examples: > SELECT 2 % 1.8; 0.2 > SELECT MOD(2, 1.8); 0.2 Since: 1.0.0","title":"%"},{"location":"math-functions/#_2","text":"expr1 * expr2 - Returns expr1 * expr2 . Examples: > SELECT 2 * 3; 6 Since: 1.0.0","title":"*"},{"location":"math-functions/#_3","text":"expr1 + expr2 - Returns expr1 + expr2 . Examples: > SELECT 1 + 2; 3 Since: 1.0.0","title":"+"},{"location":"math-functions/#-","text":"expr1 - expr2 - Returns expr1 - expr2 . Examples: > SELECT 2 - 1; 1 Since: 1.0.0","title":"-"},{"location":"math-functions/#_4","text":"expr1 / expr2 - Returns expr1 / expr2 . It always performs floating point division. Examples: > SELECT 3 / 2; 1.5 > SELECT 2L / 2L; 1.0 Since: 1.0.0","title":"/"},{"location":"math-functions/#abs","text":"abs(expr) - Returns the absolute value of the numeric or interval value. 
Examples: > SELECT abs(-1); 1 > SELECT abs(INTERVAL -'1-1' YEAR TO MONTH); 1-1 Since: 1.2.0","title":"abs"},{"location":"math-functions/#acos","text":"acos(expr) - Returns the inverse cosine (a.k.a. arc cosine) of expr , as if computed by java.lang.Math.acos . Examples: > SELECT acos(1); 0.0 > SELECT acos(2); NaN Since: 1.4.0","title":"acos"},{"location":"math-functions/#acosh","text":"acosh(expr) - Returns inverse hyperbolic cosine of expr . Examples: > SELECT acosh(1); 0.0 > SELECT acosh(0); NaN Since: 3.0.0","title":"acosh"},{"location":"math-functions/#asin","text":"asin(expr) - Returns the inverse sine (a.k.a. arc sine) of expr , as if computed by java.lang.Math.asin . Examples: > SELECT asin(0); 0.0 > SELECT asin(2); NaN Since: 1.4.0","title":"asin"},{"location":"math-functions/#asinh","text":"asinh(expr) - Returns inverse hyperbolic sine of expr . Examples: > SELECT asinh(0); 0.0 Since: 3.0.0","title":"asinh"},{"location":"math-functions/#atan","text":"atan(expr) - Returns the inverse tangent (a.k.a. arc tangent) of expr , as if computed by java.lang.Math.atan . Examples: > SELECT atan(0); 0.0 Since: 1.4.0","title":"atan"},{"location":"math-functions/#atan2","text":"atan2(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates ( exprX , exprY ), as if computed by java.lang.Math.atan2 . Arguments: exprY - coordinate on y-axis exprX - coordinate on x-axis Examples: > SELECT atan2(0, 0); 0.0 Since: 1.4.0","title":"atan2"},{"location":"math-functions/#atanh","text":"atanh(expr) - Returns inverse hyperbolic tangent of expr . Examples: > SELECT atanh(0); 0.0 > SELECT atanh(2); NaN Since: 3.0.0","title":"atanh"},{"location":"math-functions/#bin","text":"bin(expr) - Returns the string representation of the long value expr represented in binary. 
Examples: > SELECT bin(13); 1101 > SELECT bin(-13); 1111111111111111111111111111111111111111111111111111111111110011 > SELECT bin(13.3); 1101 Since: 1.5.0","title":"bin"},{"location":"math-functions/#bround","text":"bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode. Examples: > SELECT bround(2.5, 0); 2 > SELECT bround(25, -1); 20 Since: 2.0.0","title":"bround"},{"location":"math-functions/#cbrt","text":"cbrt(expr) - Returns the cube root of expr . Examples: > SELECT cbrt(27.0); 3.0 Since: 1.4.0","title":"cbrt"},{"location":"math-functions/#ceil","text":"ceil(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr . An optional scale parameter can be specified to control the rounding behavior. Examples: > SELECT ceil(-0.1); 0 > SELECT ceil(5); 5 > SELECT ceil(3.1411, 3); 3.142 > SELECT ceil(3.1411, -3); 1000 Since: 3.3.0","title":"ceil"},{"location":"math-functions/#ceiling","text":"ceiling(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr . An optional scale parameter can be specified to control the rounding behavior. Examples: > SELECT ceiling(-0.1); 0 > SELECT ceiling(5); 5 > SELECT ceiling(3.1411, 3); 3.142 > SELECT ceiling(3.1411, -3); 1000 Since: 3.3.0","title":"ceiling"},{"location":"math-functions/#conv","text":"conv(num, from_base, to_base) - Convert num from from_base to to_base . Examples: > SELECT conv('100', 2, 10); 4 > SELECT conv(-10, 16, -10); -16 Since: 1.5.0","title":"conv"},{"location":"math-functions/#cos","text":"cos(expr) - Returns the cosine of expr , as if computed by java.lang.Math.cos . Arguments: expr - angle in radians Examples: > SELECT cos(0); 1.0 Since: 1.4.0","title":"cos"},{"location":"math-functions/#cosh","text":"cosh(expr) - Returns the hyperbolic cosine of expr , as if computed by java.lang.Math.cosh . 
Arguments: expr - hyperbolic angle Examples: > SELECT cosh(0); 1.0 Since: 1.4.0","title":"cosh"},{"location":"math-functions/#cot","text":"cot(expr) - Returns the cotangent of expr , as if computed by 1/java.lang.Math.tan . Arguments: expr - angle in radians Examples: > SELECT cot(1); 0.6420926159343306 Since: 2.3.0","title":"cot"},{"location":"math-functions/#csc","text":"csc(expr) - Returns the cosecant of expr , as if computed by 1/java.lang.Math.sin . Arguments: expr - angle in radians Examples: > SELECT csc(1); 1.1883951057781212 Since: 3.3.0","title":"csc"},{"location":"math-functions/#degrees","text":"degrees(expr) - Converts radians to degrees. Arguments: expr - angle in radians Examples: > SELECT degrees(3.141592653589793); 180.0 Since: 1.4.0","title":"degrees"},{"location":"math-functions/#div","text":"expr1 div expr2 - Divide expr1 by expr2 . It returns NULL if an operand is NULL or expr2 is 0. The result is casted to long. Examples: > SELECT 3 div 2; 1 > SELECT INTERVAL '1-1' YEAR TO MONTH div INTERVAL '-1' MONTH; -13 Since: 3.0.0","title":"div"},{"location":"math-functions/#e","text":"e() - Returns Euler's number, e. Examples: > SELECT e(); 2.718281828459045 Since: 1.5.0","title":"e"},{"location":"math-functions/#exp","text":"exp(expr) - Returns e to the power of expr . Examples: > SELECT exp(0); 1.0 Since: 1.4.0","title":"exp"},{"location":"math-functions/#expm1","text":"expm1(expr) - Returns exp( expr ) - 1. Examples: > SELECT expm1(0); 0.0 Since: 1.4.0","title":"expm1"},{"location":"math-functions/#factorial","text":"factorial(expr) - Returns the factorial of expr . expr is [0..20]. Otherwise, null. Examples: > SELECT factorial(5); 120 Since: 1.5.0","title":"factorial"},{"location":"math-functions/#floor","text":"floor(expr[, scale]) - Returns the largest number after rounding down that is not greater than expr . An optional scale parameter can be specified to control the rounding behavior. 
Examples: > SELECT floor(-0.1); -1 > SELECT floor(5); 5 > SELECT floor(3.1411, 3); 3.141 > SELECT floor(3.1411, -3); 0 Since: 3.3.0","title":"floor"},{"location":"math-functions/#greatest","text":"greatest(expr, ...) - Returns the greatest value of all parameters, skipping null values. Examples: > SELECT greatest(10, 9, 2, 4, 3); 10 Since: 1.5.0","title":"greatest"},{"location":"math-functions/#hex","text":"hex(expr) - Converts expr to hexadecimal. Examples: > SELECT hex(17); 11 > SELECT hex('Spark SQL'); 537061726B2053514C Since: 1.5.0","title":"hex"},{"location":"math-functions/#hypot","text":"hypot(expr1, expr2) - Returns sqrt( expr1 \u00b2 + expr2 \u00b2). Examples: > SELECT hypot(3, 4); 5.0 Since: 1.4.0","title":"hypot"},{"location":"math-functions/#least","text":"least(expr, ...) - Returns the least value of all parameters, skipping null values. Examples: > SELECT least(10, 9, 2, 4, 3); 2 Since: 1.5.0","title":"least"},{"location":"math-functions/#ln","text":"ln(expr) - Returns the natural logarithm (base e) of expr . Examples: > SELECT ln(1); 0.0 Since: 1.4.0","title":"ln"},{"location":"math-functions/#log","text":"log(base, expr) - Returns the logarithm of expr with base . Examples: > SELECT log(10, 100); 2.0 Since: 1.5.0","title":"log"},{"location":"math-functions/#log10","text":"log10(expr) - Returns the logarithm of expr with base 10. Examples: > SELECT log10(10); 1.0 Since: 1.4.0","title":"log10"},{"location":"math-functions/#log1p","text":"log1p(expr) - Returns log(1 + expr ). Examples: > SELECT log1p(0); 0.0 Since: 1.4.0","title":"log1p"},{"location":"math-functions/#log2","text":"log2(expr) - Returns the logarithm of expr with base 2. Examples: > SELECT log2(2); 1.0 Since: 1.4.0","title":"log2"},{"location":"math-functions/#mod","text":"expr1 % expr2, or mod(expr1, expr2) - Returns the remainder after expr1 / expr2 . 
Examples: > SELECT 2 % 1.8; 0.2 > SELECT MOD(2, 1.8); 0.2 Since: 2.3.0","title":"mod"},{"location":"math-functions/#negative","text":"negative(expr) - Returns the negated value of expr . Examples: > SELECT negative(1); -1 Since: 1.0.0","title":"negative"},{"location":"math-functions/#pi","text":"pi() - Returns pi. Examples: > SELECT pi(); 3.141592653589793 Since: 1.5.0","title":"pi"},{"location":"math-functions/#pmod","text":"pmod(expr1, expr2) - Returns the positive value of expr1 mod expr2 . Examples: > SELECT pmod(10, 3); 1 > SELECT pmod(-10, 3); 2 Since: 1.5.0","title":"pmod"},{"location":"math-functions/#positive","text":"positive(expr) - Returns the value of expr . Examples: > SELECT positive(1); 1 Since: 1.5.0","title":"positive"},{"location":"math-functions/#pow","text":"pow(expr1, expr2) - Raises expr1 to the power of expr2 . Examples: > SELECT pow(2, 3); 8.0 Since: 1.4.0","title":"pow"},{"location":"math-functions/#power","text":"power(expr1, expr2) - Raises expr1 to the power of expr2 . Examples: > SELECT power(2, 3); 8.0 Since: 1.4.0","title":"power"},{"location":"math-functions/#radians","text":"radians(expr) - Converts degrees to radians. Arguments: expr - angle in degrees Examples: > SELECT radians(180); 3.141592653589793 Since: 1.4.0","title":"radians"},{"location":"math-functions/#rand","text":"rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1). Examples: > SELECT rand(); 0.9629742951434543 > SELECT rand(0); 0.7604953758285915 > SELECT rand(null); 0.7604953758285915 Note: The function is non-deterministic in general case. Since: 1.5.0","title":"rand"},{"location":"math-functions/#randn","text":"randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution. 
Examples: > SELECT randn(); -0.3254147983080288 > SELECT randn(0); 1.6034991609278433 > SELECT randn(null); 1.6034991609278433 Note: The function is non-deterministic in general case. Since: 1.5.0","title":"randn"},{"location":"math-functions/#random","text":"random([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1). Examples: > SELECT random(); 0.9629742951434543 > SELECT random(0); 0.7604953758285915 > SELECT random(null); 0.7604953758285915 Note: The function is non-deterministic in general case. Since: 3.0.0","title":"random"},{"location":"math-functions/#rint","text":"rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer. Examples: > SELECT rint(12.3456); 12.0 Since: 1.4.0","title":"rint"},{"location":"math-functions/#round","text":"round(expr, d) - Returns expr rounded to d decimal places using HALF_UP rounding mode. Examples: > SELECT round(2.5, 0); 3 Since: 1.5.0","title":"round"},{"location":"math-functions/#sec","text":"sec(expr) - Returns the secant of expr , as if computed by 1/java.lang.Math.cos . Arguments: expr - angle in radians Examples: > SELECT sec(0); 1.0 Since: 3.3.0","title":"sec"},{"location":"math-functions/#sign","text":"sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive. Examples: > SELECT sign(40); 1.0 > SELECT sign(INTERVAL -'100' YEAR); -1.0 Since: 1.4.0","title":"sign"},{"location":"math-functions/#signum","text":"signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive. Examples: > SELECT signum(40); 1.0 > SELECT signum(INTERVAL -'100' YEAR); -1.0 Since: 1.4.0","title":"signum"},{"location":"math-functions/#sin","text":"sin(expr) - Returns the sine of expr , as if computed by java.lang.Math.sin . 
Arguments: expr - angle in radians Examples: > SELECT sin(0); 0.0 Since: 1.4.0","title":"sin"},{"location":"math-functions/#sinh","text":"sinh(expr) - Returns hyperbolic sine of expr , as if computed by java.lang.Math.sinh . Arguments: expr - hyperbolic angle Examples: > SELECT sinh(0); 0.0 Since: 1.4.0","title":"sinh"},{"location":"math-functions/#sqrt","text":"sqrt(expr) - Returns the square root of expr . Examples: > SELECT sqrt(4); 2.0 Since: 1.1.1","title":"sqrt"},{"location":"math-functions/#tan","text":"tan(expr) - Returns the tangent of expr , as if computed by java.lang.Math.tan . Arguments: expr - angle in radians Examples: > SELECT tan(0); 0.0 Since: 1.4.0","title":"tan"},{"location":"math-functions/#tanh","text":"tanh(expr) - Returns the hyperbolic tangent of expr , as if computed by java.lang.Math.tanh . Arguments: expr - hyperbolic angle Examples: > SELECT tanh(0); 0.0 Since: 1.4.0","title":"tanh"},{"location":"math-functions/#try_add","text":"try_add(expr1, expr2) - Returns the sum of expr1 and expr2 and the result is null on overflow. The acceptable input types are the same as the + operator. Examples: > SELECT try_add(1, 2); 3 > SELECT try_add(2147483647, 1); NULL > SELECT try_add(date'2021-01-01', 1); 2021-01-02 > SELECT try_add(date'2021-01-01', interval 1 year); 2022-01-01 > SELECT try_add(timestamp'2021-01-01 00:00:00', interval 1 day); 2021-01-02 00:00:00 > SELECT try_add(interval 1 year, interval 2 year); 3-0 Since: 3.2.0","title":"try_add"},{"location":"math-functions/#try_divide","text":"try_divide(dividend, divisor) - Returns dividend / divisor . It always performs floating point division. Its result is always null if divisor is 0. dividend must be a numeric or an interval. divisor must be a numeric. 
Examples: > SELECT try_divide(3, 2); 1.5 > SELECT try_divide(2L, 2L); 1.0 > SELECT try_divide(1, 0); NULL > SELECT try_divide(interval 2 month, 2); 0-1 > SELECT try_divide(interval 2 month, 0); NULL Since: 3.2.0","title":"try_divide"},{"location":"math-functions/#try_mod","text":"try_mod(dividend, divisor) - Returns the remainder after dividend / divisor . dividend must be a numeric. divisor must be a numeric. Examples: > SELECT try_mod(3, 2); 1 > SELECT try_mod(2L, 2L); 0 > SELECT try_mod(3.0, 2.0); 1.0 > SELECT try_mod(1, 0); NULL Since: 4.0.0","title":"try_mod"},{"location":"math-functions/#try_multiply","text":"try_multiply(expr1, expr2) - Returns expr1 * expr2 and the result is null on overflow. The acceptable input types are the same as the * operator. Examples: > SELECT try_multiply(2, 3); 6 > SELECT try_multiply(-2147483648, 10); NULL > SELECT try_multiply(interval 2 year, 3); 6-0 Since: 3.3.0","title":"try_multiply"},{"location":"math-functions/#try_subtract","text":"try_subtract(expr1, expr2) - Returns expr1 - expr2 and the result is null on overflow. The acceptable input types are the same as the - operator. Examples: > SELECT try_subtract(2, 1); 1 > SELECT try_subtract(-2147483648, 1); NULL > SELECT try_subtract(date'2021-01-02', 1); 2021-01-01 > SELECT try_subtract(date'2021-01-01', interval 1 year); 2020-01-01 > SELECT try_subtract(timestamp'2021-01-02 00:00:00', interval 1 day); 2021-01-01 00:00:00 > SELECT try_subtract(interval 2 year, interval 1 year); 1-0 Since: 3.3.0","title":"try_subtract"},{"location":"math-functions/#unhex","text":"unhex(expr) - Converts hexadecimal expr to binary. Examples: > SELECT decode(unhex('537061726B2053514C'), 'UTF-8'); Spark SQL Since: 1.5.0","title":"unhex"},{"location":"math-functions/#uniform","text":"uniform(min, max[, seed]) - Returns a random value with independent and identically distributed (i.i.d.) values within the specified range of numbers. The random seed is optional. 
The provided numbers specifying the minimum and maximum values of the range must be constant. If both of these numbers are integers, then the result will also be an integer. Otherwise, if one or both of these are floating-point numbers, then the result will also be a floating-point number. Examples: > SELECT uniform(10, 20, 0) > 0 AS result; true Since: 4.0.0","title":"uniform"},{"location":"math-functions/#width_bucket","text":"width_bucket(value, min_value, max_value, num_bucket) - Returns the bucket number to which value would be assigned in an equiwidth histogram with num_bucket buckets, in the range min_value to max_value . Examples: > SELECT width_bucket(5.3, 0.2, 10.6, 5); 3 > SELECT width_bucket(-2.1, 1.3, 3.4, 3); 0 > SELECT width_bucket(8.1, 0.0, 5.7, 4); 5 > SELECT width_bucket(-0.9, 5.2, 0.5, 2); 3 > SELECT width_bucket(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); 1 > SELECT width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10); 2 > SELECT width_bucket(INTERVAL '0' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10); 1 > SELECT width_bucket(INTERVAL '1' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10); 2 Since: 3.1.0","title":"width_bucket"},{"location":"misc-functions/","text":"Misc Functions \u00b6 This page lists all misc functions available in Spark SQL. aes_decrypt \u00b6 aes_decrypt(expr, key[, mode[, padding[, aad]]]) - Returns a decrypted value of expr using AES in mode with padding . Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of ( mode , padding ) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM. Arguments: expr - The binary value to decrypt. key - The passphrase to use to decrypt the data. mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC. 
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC. aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption. Examples: > SELECT aes_decrypt(unhex('83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94'), '0000111122223333'); Spark > SELECT aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); Spark SQL > SELECT aes_decrypt(unbase64('3lmwu+Mw0H3fi5NDvcu9lg=='), '1234567890abcdef', 'ECB', 'PKCS'); Spark SQL > SELECT aes_decrypt(unbase64('2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo='), '1234567890abcdef', 'CBC'); Apache Spark > SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg='), 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT'); Spark > SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', 'This is an AAD mixed into the input'); Spark Since: 3.3.0 aes_encrypt \u00b6 aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]]) - Returns an encrypted value of expr using AES in the given mode with the specified padding . Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of ( mode , padding ) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM. Arguments: expr - The binary value to encrypt. 
key - The passphrase to use to encrypt the data. mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC. padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC. iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or ''. 16-byte array for CBC mode. 12-byte array for GCM mode. aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption. Examples: > SELECT hex(aes_encrypt('Spark', '0000111122223333')); 83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94 > SELECT hex(aes_encrypt('Spark SQL', '0000111122223333', 'GCM')); 6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210 > SELECT base64(aes_encrypt('Spark SQL', '1234567890abcdef', 'ECB', 'PKCS')); 3lmwu+Mw0H3fi5NDvcu9lg== > SELECT base64(aes_encrypt('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT')); 2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo= > SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT', unhex('00000000000000000000000000000000'))); AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg= > SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', unhex('000000000000000000000000'), 'This is an AAD mixed into the input')); AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4 Since: 3.3.0 assert_true \u00b6 assert_true(expr [, message]) - Throws an exception if expr is not true. Examples: > SELECT assert_true(0 < 1); NULL Since: 2.0.0 bitmap_bit_position \u00b6 bitmap_bit_position(child) - Returns the bit position for the given input child expression. 
Examples: > SELECT bitmap_bit_position(1); 0 > SELECT bitmap_bit_position(123); 122 Since: 3.5.0 bitmap_bucket_number \u00b6 bitmap_bucket_number(child) - Returns the bucket number for the given input child expression. Examples: > SELECT bitmap_bucket_number(123); 1 > SELECT bitmap_bucket_number(0); 0 Since: 3.5.0 bitmap_count \u00b6 bitmap_count(child) - Returns the number of set bits in the child bitmap. Examples: > SELECT bitmap_count(X '1010'); 2 > SELECT bitmap_count(X 'FFFF'); 16 > SELECT bitmap_count(X '0'); 0 Since: 3.5.0 current_catalog \u00b6 current_catalog() - Returns the current catalog. Examples: > SELECT current_catalog(); spark_catalog Since: 3.1.0 current_database \u00b6 current_database() - Returns the current database. Examples: > SELECT current_database(); default Since: 1.6.0 current_schema \u00b6 current_schema() - Returns the current database. Examples: > SELECT current_schema(); default Since: 3.4.0 current_user \u00b6 current_user() - user name of current execution context. Examples: > SELECT current_user(); mockingjay Since: 3.2.0 input_file_block_length \u00b6 input_file_block_length() - Returns the length of the block being read, or -1 if not available. Examples: > SELECT input_file_block_length(); -1 Since: 2.2.0 input_file_block_start \u00b6 input_file_block_start() - Returns the start offset of the block being read, or -1 if not available. Examples: > SELECT input_file_block_start(); -1 Since: 2.2.0 input_file_name \u00b6 input_file_name() - Returns the name of the file being read, or empty string if not available. Examples: > SELECT input_file_name(); Since: 1.5.0 java_method \u00b6 java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection. 
Examples: > SELECT java_method('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 Since: 2.0.0 monotonically_increasing_id \u00b6 monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs. Examples: > SELECT monotonically_increasing_id(); 0 Since: 1.4.0 raise_error \u00b6 raise_error( expr ) - Throws a USER_RAISED_EXCEPTION with expr as message. Examples: > SELECT raise_error('custom error message'); [USER_RAISED_EXCEPTION] custom error message Since: 3.1.0 reflect \u00b6 reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection. Examples: > SELECT reflect('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 Since: 2.0.0 session_user \u00b6 session_user() - user name of current execution context. Examples: > SELECT session_user(); mockingjay Since: 4.0.0 spark_partition_id \u00b6 spark_partition_id() - Returns the current partition id. Examples: > SELECT spark_partition_id(); 0 Since: 1.4.0 try_aes_decrypt \u00b6 try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed. 
Examples: > SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); Spark SQL > SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); NULL Since: 3.5.0 try_reflect \u00b6 try_reflect(class, method[, arg1[, arg2 ..]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception. Examples: > SELECT try_reflect('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 > SELECT try_reflect('java.net.URLDecoder', 'decode', '%'); NULL Since: 4.0.0 typeof \u00b6 typeof(expr) - Returns a DDL-formatted type string for the data type of the input. Examples: > SELECT typeof(1); int > SELECT typeof(array(1)); array<int> Since: 3.0.0 user \u00b6 user() - user name of current execution context. Examples: > SELECT user(); mockingjay Since: 3.4.0 uuid \u00b6 uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string. Examples: > SELECT uuid(); 46707d92-02f4-4817-8116-a4c3b23e6266 Note: The function is non-deterministic. Since: 2.3.0 version \u00b6 version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision. Examples: > SELECT version(); 3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40 Since: 3.0.0","title":"Misc Functions"},{"location":"misc-functions/#misc-functions","text":"This page lists all misc functions available in Spark SQL.","title":"Misc Functions"},{"location":"misc-functions/#aes_decrypt","text":"aes_decrypt(expr, key[, mode[, padding[, aad]]]) - Returns a decrypted value of expr using AES in mode with padding .
Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of ( mode , padding ) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM. Arguments: expr - The binary value to decrypt. key - The passphrase to use to decrypt the data. mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC. padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC. aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption. Examples: > SELECT aes_decrypt(unhex('83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94'), '0000111122223333'); Spark > SELECT aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); Spark SQL > SELECT aes_decrypt(unbase64('3lmwu+Mw0H3fi5NDvcu9lg=='), '1234567890abcdef', 'ECB', 'PKCS'); Spark SQL > SELECT aes_decrypt(unbase64('2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo='), '1234567890abcdef', 'CBC'); Apache Spark > SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg='), 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT'); Spark > SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', 'This is an AAD mixed into the input'); Spark Since: 3.3.0","title":"aes_decrypt"},{"location":"misc-functions/#aes_encrypt","text":"aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]]) - Returns an encrypted value of expr using AES in given mode with the specified padding . Key lengths of 16, 24 and 32 bytes are supported.
Supported combinations of ( mode , padding ) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM. Arguments: expr - The binary value to encrypt. key - The passphrase to use to encrypt the data. mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC. padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC. iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or ''. 16-byte array for CBC mode. 12-byte array for GCM mode. aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption. 
Examples: > SELECT hex(aes_encrypt('Spark', '0000111122223333')); 83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94 > SELECT hex(aes_encrypt('Spark SQL', '0000111122223333', 'GCM')); 6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210 > SELECT base64(aes_encrypt('Spark SQL', '1234567890abcdef', 'ECB', 'PKCS')); 3lmwu+Mw0H3fi5NDvcu9lg== > SELECT base64(aes_encrypt('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT')); 2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo= > SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT', unhex('00000000000000000000000000000000'))); AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg= > SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', unhex('000000000000000000000000'), 'This is an AAD mixed into the input')); AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4 Since: 3.3.0","title":"aes_encrypt"},{"location":"misc-functions/#assert_true","text":"assert_true(expr [, message]) - Throws an exception if expr is not true. Examples: > SELECT assert_true(0 < 1); NULL Since: 2.0.0","title":"assert_true"},{"location":"misc-functions/#bitmap_bit_position","text":"bitmap_bit_position(child) - Returns the bit position for the given input child expression. Examples: > SELECT bitmap_bit_position(1); 0 > SELECT bitmap_bit_position(123); 122 Since: 3.5.0","title":"bitmap_bit_position"},{"location":"misc-functions/#bitmap_bucket_number","text":"bitmap_bucket_number(child) - Returns the bucket number for the given input child expression. Examples: > SELECT bitmap_bucket_number(123); 1 > SELECT bitmap_bucket_number(0); 0 Since: 3.5.0","title":"bitmap_bucket_number"},{"location":"misc-functions/#bitmap_count","text":"bitmap_count(child) - Returns the number of set bits in the child bitmap. 
Examples: > SELECT bitmap_count(X '1010'); 2 > SELECT bitmap_count(X 'FFFF'); 16 > SELECT bitmap_count(X '0'); 0 Since: 3.5.0","title":"bitmap_count"},{"location":"misc-functions/#current_catalog","text":"current_catalog() - Returns the current catalog. Examples: > SELECT current_catalog(); spark_catalog Since: 3.1.0","title":"current_catalog"},{"location":"misc-functions/#current_database","text":"current_database() - Returns the current database. Examples: > SELECT current_database(); default Since: 1.6.0","title":"current_database"},{"location":"misc-functions/#current_schema","text":"current_schema() - Returns the current database. Examples: > SELECT current_schema(); default Since: 3.4.0","title":"current_schema"},{"location":"misc-functions/#current_user","text":"current_user() - user name of current execution context. Examples: > SELECT current_user(); mockingjay Since: 3.2.0","title":"current_user"},{"location":"misc-functions/#input_file_block_length","text":"input_file_block_length() - Returns the length of the block being read, or -1 if not available. Examples: > SELECT input_file_block_length(); -1 Since: 2.2.0","title":"input_file_block_length"},{"location":"misc-functions/#input_file_block_start","text":"input_file_block_start() - Returns the start offset of the block being read, or -1 if not available. Examples: > SELECT input_file_block_start(); -1 Since: 2.2.0","title":"input_file_block_start"},{"location":"misc-functions/#input_file_name","text":"input_file_name() - Returns the name of the file being read, or empty string if not available. Examples: > SELECT input_file_name(); Since: 1.5.0","title":"input_file_name"},{"location":"misc-functions/#java_method","text":"java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection. 
Examples: > SELECT java_method('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 Since: 2.0.0","title":"java_method"},{"location":"misc-functions/#monotonically_increasing_id","text":"monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs. Examples: > SELECT monotonically_increasing_id(); 0 Since: 1.4.0","title":"monotonically_increasing_id"},{"location":"misc-functions/#raise_error","text":"raise_error( expr ) - Throws a USER_RAISED_EXCEPTION with expr as message. Examples: > SELECT raise_error('custom error message'); [USER_RAISED_EXCEPTION] custom error message Since: 3.1.0","title":"raise_error"},{"location":"misc-functions/#reflect","text":"reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection. Examples: > SELECT reflect('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 Since: 2.0.0","title":"reflect"},{"location":"misc-functions/#session_user","text":"session_user() - user name of current execution context. Examples: > SELECT session_user(); mockingjay Since: 4.0.0","title":"session_user"},{"location":"misc-functions/#spark_partition_id","text":"spark_partition_id() - Returns the current partition id. 
Examples: > SELECT spark_partition_id(); 0 Since: 1.4.0","title":"spark_partition_id"},{"location":"misc-functions/#try_aes_decrypt","text":"try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed. Examples: > SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); Spark SQL > SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM'); NULL Since: 3.5.0","title":"try_aes_decrypt"},{"location":"misc-functions/#try_reflect","text":"try_reflect(class, method[, arg1[, arg2 ..]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception. Examples: > SELECT try_reflect('java.util.UUID', 'randomUUID'); c33fb387-8500-4bfa-81d2-6e0e3e930df2 > SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2'); a5cf6c42-0c85-418f-af6c-3e4e5b1328f2 > SELECT try_reflect('java.net.URLDecoder', 'decode', '%'); NULL Since: 4.0.0","title":"try_reflect"},{"location":"misc-functions/#typeof","text":"typeof(expr) - Returns a DDL-formatted type string for the data type of the input. Examples: > SELECT typeof(1); int > SELECT typeof(array(1)); array<int> Since: 3.0.0","title":"typeof"},{"location":"misc-functions/#user","text":"user() - user name of current execution context. Examples: > SELECT user(); mockingjay Since: 3.4.0","title":"user"},{"location":"misc-functions/#uuid","text":"uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string. Examples: > SELECT uuid(); 46707d92-02f4-4817-8116-a4c3b23e6266 Note: The function is non-deterministic.
Since: 2.3.0","title":"uuid"},{"location":"misc-functions/#version","text":"version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision. Examples: > SELECT version(); 3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40 Since: 3.0.0","title":"version"},{"location":"predicate-functions/","text":"Predicate Functions \u00b6 This page lists all predicate functions available in Spark SQL. ! \u00b6 ! expr - Logical not. Examples: > SELECT ! true; false > SELECT ! false; true > SELECT ! NULL; NULL Since: 1.0.0 != \u00b6 expr1 != expr2 - Returns true if expr1 is not equal to expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 != 2; true > SELECT 1 != '2'; true > SELECT true != NULL; NULL > SELECT NULL != NULL; NULL Since: 1.0.0 < \u00b6 expr1 < expr2 - Returns true if expr1 is less than expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 < 2; true > SELECT 1.1 < '1'; false > SELECT to_date('2009-07-30 04:17:52') < to_date('2009-07-30 04:17:52'); false > SELECT to_date('2009-07-30 04:17:52') < to_date('2009-08-01 04:17:52'); true > SELECT 1 < NULL; NULL Since: 1.0.0 <= \u00b6 expr1 <= expr2 - Returns true if expr1 is less than or equal to expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported.
For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 <= 2; true > SELECT 1.0 <= '1'; true > SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52'); true > SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-08-01 04:17:52'); true > SELECT 1 <= NULL; NULL Since: 1.0.0 <=> \u00b6 expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of them is null. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 <=> 2; true > SELECT 1 <=> '1'; true > SELECT true <=> NULL; false > SELECT NULL <=> NULL; true Since: 1.1.0 <> \u00b6 expr1 != expr2 - Returns true if expr1 is not equal to expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 != 2; true > SELECT 1 != '2'; true > SELECT true != NULL; NULL > SELECT NULL != NULL; NULL Since: 1.0.0 = \u00b6 expr1 = expr2 - Returns true if expr1 equals expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 = 2; true > SELECT 1 = '1'; true > SELECT true = NULL; NULL > SELECT NULL = NULL; NULL Since: 1.0.0 == \u00b6 expr1 == expr2 - Returns true if expr1 equals expr2 , or false otherwise.
Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 == 2; true > SELECT 1 == '1'; true > SELECT true == NULL; NULL > SELECT NULL == NULL; NULL Since: 1.0.0 > \u00b6 expr1 > expr2 - Returns true if expr1 is greater than expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 > 1; true > SELECT 2 > 1.1; true > SELECT to_date('2009-07-30 04:17:52') > to_date('2009-07-30 04:17:52'); false > SELECT to_date('2009-07-30 04:17:52') > to_date('2009-08-01 04:17:52'); false > SELECT 1 > NULL; NULL Since: 1.0.0 >= \u00b6 expr1 >= expr2 - Returns true if expr1 is greater than or equal to expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 >= 1; true > SELECT 2.0 >= '2.1'; false > SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-07-30 04:17:52'); true > SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-08-01 04:17:52'); false > SELECT 1 >= NULL; NULL Since: 1.0.0 and \u00b6 expr1 and expr2 - Logical AND. Examples: > SELECT true and true; true > SELECT true and false; false > SELECT true and NULL; NULL > SELECT false and NULL; false Since: 1.0.0 equal_null \u00b6 equal_null(expr1, expr2) - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of them is null.
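The null-safe comparison just described for equal_null (and for the <=> operator above) can be sketched outside Spark with Python's None standing in for SQL NULL. This is a minimal illustration of the semantics, not Spark's implementation; the cast-to-common-type step noted in the Arguments sections is deliberately omitted:

```python
# Sketch of the SQL equality semantics described above, with None as SQL NULL.
# Illustration only; Spark also applies implicit casts, which are omitted here.
def eq(a, b):
    # plain '=' semantics: result is NULL if either operand is NULL
    if a is None or b is None:
        return None
    return a == b

def equal_null(a, b):
    # null-safe '<=>' / equal_null semantics:
    # true if both operands are NULL, false if exactly one is NULL
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

print(eq(True, None))           # None (SQL NULL)
print(equal_null(3, 3))         # True
print(equal_null(True, None))   # False
print(equal_null(None, None))   # True
```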
Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT equal_null(3, 3); true > SELECT equal_null(1, '11'); false > SELECT equal_null(true, NULL); false > SELECT equal_null(NULL, 'abc'); false > SELECT equal_null(NULL, NULL); true Since: 3.4.0 ilike \u00b6 str ilike pattern[ ESCAPE escape] - Returns true if str matches pattern with escape case-insensitively, null if any arguments are null, false otherwise. Arguments: str - a string expression pattern - a string expression. The pattern is a string which is matched literally and case-insensitively, with the exception of the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions) Since Spark 2.0, string literals are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, in order to match \"\\abc\", the pattern should be \"\\abc\". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match \"\\abc\" should be \"\\abc\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any exist. escape - a character added since Spark 3.0. The default escape character is the '\\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.
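The _ / % / escape rules above can be sketched by translating a LIKE pattern into a regular expression. A minimal Python illustration follows, with the re module as a stand-in matcher; this is not Spark's implementation:

```python
import re

# Translate a SQL LIKE pattern to a regex, per the rules described above:
# '_' matches any single character, '%' matches zero or more characters,
# and the escape character makes the following special symbol literal.
def like_to_regex(pattern: str, escape: str = '\\') -> str:
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped symbol is literal
            i += 2
        elif c == '_':
            out.append('.')
            i += 1
        elif c == '%':
            out.append('.*')
            i += 1
        else:
            out.append(re.escape(c))
            i += 1
    return ''.join(out)

def ilike(s: str, pattern: str, escape: str = '\\') -> bool:
    # ilike = case-insensitive LIKE; fullmatch anchors the pattern
    return re.fullmatch(like_to_regex(pattern, escape), s, re.IGNORECASE) is not None

print(ilike('Spark', '_Park'))      # True
print(ilike('100%', '100/%', '/'))  # True: '/' escapes '%', so '%' is literal
```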
Examples: > SELECT ilike('Spark', '_Park'); true > SELECT '\\\\abc' AS S, S ilike r'\\\\abc', S ilike '\\\\\\\\abc'; \\abc true true > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\\Users\\John' ilike '\\%SystemDrive\\%\\\\users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\\\USERS\\\\John' ilike r'%SystemDrive%\\\\Users%'; true > SELECT '%SystemDrive%/Users/John' ilike '/%SYSTEMDrive/%//Users%' ESCAPE '/'; true Note: Use RLIKE to match with standard regular expressions. Since: 3.3.0 in \u00b6 expr1 in(expr2, expr3, ...) - Returns true if expr1 equals any valN. Arguments: expr1, expr2, expr3, ... - the arguments must be the same type. Examples: > SELECT 1 in(1, 2, 3); true > SELECT 1 in(2, 3, 4); false > SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3)); false > SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3)); true Since: 1.0.0 isnan \u00b6 isnan(expr) - Returns true if expr is NaN, or false otherwise. Examples: > SELECT isnan(cast('NaN' as double)); true Since: 1.5.0 isnotnull \u00b6 isnotnull(expr) - Returns true if expr is not null, or false otherwise. Examples: > SELECT isnotnull(1); true Since: 1.0.0 isnull \u00b6 isnull(expr) - Returns true if expr is null, or false otherwise. Examples: > SELECT isnull(1); false Since: 1.0.0 like \u00b6 str like pattern[ ESCAPE escape] - Returns true if str matches pattern with escape , null if any arguments are null, false otherwise. Arguments: str - a string expression pattern - a string expression. The pattern is a string which is matched literally, with the exception of the following special symbols: _ matches any one character in the input (similar to .
in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions) Since Spark 2.0, string literals are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, in order to match \"\\abc\", the pattern should be \"\\abc\". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match \"\\abc\" should be \"\\abc\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any exist. escape - a character added since Spark 3.0. The default escape character is the '\\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character. Examples: > SELECT like('Spark', '_park'); true > SELECT '\\\\abc' AS S, S like r'\\\\abc', S like '\\\\\\\\abc'; \\abc true true > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\\Users\\John' like '\\%SystemDrive\\%\\\\Users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\\\Users\\\\John' like r'%SystemDrive%\\\\Users%'; true > SELECT '%SystemDrive%/Users/John' like '/%SystemDrive/%//Users%' ESCAPE '/'; true Note: Use RLIKE to match with standard regular expressions. Since: 1.0.0 not \u00b6 not expr - Logical not. Examples: > SELECT not true; false > SELECT not false; true > SELECT not NULL; NULL Since: 1.0.0 or \u00b6 expr1 or expr2 - Logical OR. Examples: > SELECT true or false; true > SELECT false or false; false > SELECT true or NULL; true > SELECT false or NULL; NULL Since: 1.0.0 regexp \u00b6 regexp(str, regexp) - Returns true if str matches regexp , or false otherwise.
Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any exist. Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT regexp('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT regexp('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT regexp('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern. Since: 3.2.0 regexp_like \u00b6 regexp_like(str, regexp) - Returns true if str matches regexp , or false otherwise. Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\".
It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any exist. Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT regexp_like('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT regexp_like('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT regexp_like('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern. Since: 3.2.0 rlike \u00b6 rlike(str, regexp) - Returns true if str matches regexp , or false otherwise. Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any exist. Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT rlike('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT rlike('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT rlike('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern.
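The escaped-literal versus raw-string guidance repeated above can be illustrated with Python's re module standing in for Java's regex engine (illustration only; Spark compiles these patterns as Java regular expressions):

```python
import re

# The target string contains single backslashes: %SystemDrive%\Users\John
s = '%SystemDrive%\\Users\\John'

# Ordinary literal: each regex backslash must itself be escaped for the
# string literal, so one literal backslash in the pattern takes four characters.
escaped = '%SystemDrive%\\\\Users.*'

# Raw string: backslashes only need escaping for the regex engine.
raw = r'%SystemDrive%\\Users.*'

# Both spellings denote the same runtime pattern, and both match.
print(escaped == raw)                        # True
print(re.fullmatch(escaped, s) is not None)  # True
print(re.fullmatch(raw, s) is not None)      # True
```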
Since: 1.0.0","title":"Predicate Functions"},{"location":"predicate-functions/#predicate-functions","text":"This page lists all predicate functions available in Spark SQL.","title":"Predicate Functions"},{"location":"predicate-functions/#_1","text":"! expr - Logical not. Examples: > SELECT ! true; false > SELECT ! false; true > SELECT ! NULL; NULL Since: 1.0.0","title":"!"},{"location":"predicate-functions/#_2","text":"expr1 != expr2 - Returns true if expr1 is not equal to expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 != 2; true > SELECT 1 != '2'; true > SELECT true != NULL; NULL > SELECT NULL != NULL; NULL Since: 1.0.0","title":"!="},{"location":"predicate-functions/#_3","text":"expr1 < expr2 - Returns true if expr1 is less than expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 < 2; true > SELECT 1.1 < '1'; false > SELECT to_date('2009-07-30 04:17:52') < to_date('2009-07-30 04:17:52'); false > SELECT to_date('2009-07-30 04:17:52') < to_date('2009-08-01 04:17:52'); true > SELECT 1 < NULL; NULL Since: 1.0.0","title":"&lt;"},{"location":"predicate-functions/#_4","text":"expr1 <= expr2 - Returns true if expr1 is less than or equal to expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported.
Examples: > SELECT 2 <= 2; true > SELECT 1.0 <= '1'; true > SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52'); true > SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-08-01 04:17:52'); true > SELECT 1 <= NULL; NULL Since: 1.0.0","title":"&lt;="},{"location":"predicate-functions/#_5","text":"expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, and false if one of them is null. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 <=> 2; true > SELECT 1 <=> '1'; true > SELECT true <=> NULL; false > SELECT NULL <=> NULL; true Since: 1.1.0","title":"&lt;=&gt;"},{"location":"predicate-functions/#_6","text":"expr1 != expr2 - Returns true if expr1 is not equal to expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 1 != 2; true > SELECT 1 != '2'; true > SELECT true != NULL; NULL > SELECT NULL != NULL; NULL Since: 1.0.0","title":"&lt;&gt;"},{"location":"predicate-functions/#_7","text":"expr1 = expr2 - Returns true if expr1 equals expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. 
Examples: > SELECT 2 = 2; true > SELECT 1 = '1'; true > SELECT true = NULL; NULL > SELECT NULL = NULL; NULL Since: 1.0.0","title":"="},{"location":"predicate-functions/#_8","text":"expr1 == expr2 - Returns true if expr1 equals expr2 , or false otherwise. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 == 2; true > SELECT 1 == '1'; true > SELECT true == NULL; NULL > SELECT NULL == NULL; NULL Since: 1.0.0","title":"=="},{"location":"predicate-functions/#_9","text":"expr1 > expr2 - Returns true if expr1 is greater than expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT 2 > 1; true > SELECT 2 > 1.1; true > SELECT to_date('2009-07-30 04:17:52') > to_date('2009-07-30 04:17:52'); false > SELECT to_date('2009-07-30 04:17:52') > to_date('2009-08-01 04:17:52'); false > SELECT 1 > NULL; NULL Since: 1.0.0","title":"&gt;"},{"location":"predicate-functions/#_10","text":"expr1 >= expr2 - Returns true if expr1 is greater than or equal to expr2 . Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be ordered. For example, map type is not orderable, so it is not supported. For complex types such as array/struct, the data types of fields must be orderable. 
Examples: > SELECT 2 >= 1; true > SELECT 2.0 >= '2.1'; false > SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-07-30 04:17:52'); true > SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-08-01 04:17:52'); false > SELECT 1 >= NULL; NULL Since: 1.0.0","title":"&gt;="},{"location":"predicate-functions/#and","text":"expr1 and expr2 - Logical AND. Examples: > SELECT true and true; true > SELECT true and false; false > SELECT true and NULL; NULL > SELECT false and NULL; false Since: 1.0.0","title":"and"},{"location":"predicate-functions/#equal_null","text":"equal_null(expr1, expr2) - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, and false if one of them is null. Arguments: expr1, expr2 - the two expressions must be the same type or can be cast to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of fields must be orderable. Examples: > SELECT equal_null(3, 3); true > SELECT equal_null(1, '11'); false > SELECT equal_null(true, NULL); false > SELECT equal_null(NULL, 'abc'); false > SELECT equal_null(NULL, NULL); true Since: 3.4.0","title":"equal_null"},{"location":"predicate-functions/#ilike","text":"str ilike pattern[ ESCAPE escape] - Returns true if str matches pattern with escape case-insensitively, null if any arguments are null, false otherwise. Arguments: str - a string expression pattern - a string expression. The pattern is a string which is matched literally and case-insensitively, with the exception of the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions) Since Spark 2.0, string literals are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, in order to match \"\\abc\", the pattern should be \"\\abc\". 
When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match \"\\abc\" should be \"\\abc\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. escape - a character, added since Spark 3.0. The default escape character is '\\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character. Examples: > SELECT ilike('Spark', '_Park'); true > SELECT '\\\\abc' AS S, S ilike r'\\\\abc', S ilike '\\\\\\\\abc'; \\abc true true > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\\Users\\John' ilike '\\%SystemDrive\\%\\\\users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\\\USERS\\\\John' ilike r'%SystemDrive%\\\\Users%'; true > SELECT '%SystemDrive%/Users/John' ilike '/%SYSTEMDrive/%//Users%' ESCAPE '/'; true Note: Use RLIKE to match with standard regular expressions. Since: 3.3.0","title":"ilike"},{"location":"predicate-functions/#in","text":"expr1 in(expr2, expr3, ...) - Returns true if expr1 equals any of the other arguments. Arguments: expr1, expr2, expr3, ... - the arguments must be the same type. Examples: > SELECT 1 in(1, 2, 3); true > SELECT 1 in(2, 3, 4); false > SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3)); false > SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3)); true Since: 1.0.0","title":"in"},{"location":"predicate-functions/#isnan","text":"isnan(expr) - Returns true if expr is NaN, or false otherwise. 
Examples: > SELECT isnan(cast('NaN' as double)); true Since: 1.5.0","title":"isnan"},{"location":"predicate-functions/#isnotnull","text":"isnotnull(expr) - Returns true if expr is not null, or false otherwise. Examples: > SELECT isnotnull(1); true Since: 1.0.0","title":"isnotnull"},{"location":"predicate-functions/#isnull","text":"isnull(expr) - Returns true if expr is null, or false otherwise. Examples: > SELECT isnull(1); false Since: 1.0.0","title":"isnull"},{"location":"predicate-functions/#like","text":"str like pattern[ ESCAPE escape] - Returns true if str matches pattern with escape , null if any arguments are null, false otherwise. Arguments: str - a string expression pattern - a string expression. The pattern is a string which is matched literally, with the exception of the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions) Since Spark 2.0, string literals are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, in order to match \"\\abc\", the pattern should be \"\\abc\". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match \"\\abc\" should be \"\\abc\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. escape - a character, added since Spark 3.0. The default escape character is '\\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character. 
Examples: > SELECT like('Spark', '_park'); true > SELECT '\\\\abc' AS S, S like r'\\\\abc', S like '\\\\\\\\abc'; \\abc true true > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT '%SystemDrive%\\Users\\John' like '\\%SystemDrive\\%\\\\Users%'; true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\\\Users\\\\John' like r'%SystemDrive%\\\\Users%'; true > SELECT '%SystemDrive%/Users/John' like '/%SystemDrive/%//Users%' ESCAPE '/'; true Note: Use RLIKE to match with standard regular expressions. Since: 1.0.0","title":"like"},{"location":"predicate-functions/#not","text":"not expr - Logical not. Examples: > SELECT not true; false > SELECT not false; true > SELECT not NULL; NULL Since: 1.0.0","title":"not"},{"location":"predicate-functions/#or","text":"expr1 or expr2 - Logical OR. Examples: > SELECT true or false; true > SELECT false or false; false > SELECT true or NULL; true > SELECT false or NULL; NULL Since: 1.0.0","title":"or"},{"location":"predicate-functions/#regexp","text":"regexp(str, regexp) - Returns true if str matches regexp , or false otherwise. Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. 
Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT regexp('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT regexp('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT regexp('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern. Since: 3.2.0","title":"regexp"},{"location":"predicate-functions/#regexp_like","text":"regexp_like(str, regexp) - Returns true if str matches regexp , or false otherwise. Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT regexp_like('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT regexp_like('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT regexp_like('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern. 
Since: 3.2.0","title":"regexp_like"},{"location":"predicate-functions/#rlike","text":"rlike(str, regexp) - Returns true if str matches regexp , or false otherwise. Arguments: str - a string expression regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. Examples: > SET spark.sql.parser.escapedStringLiterals=true; spark.sql.parser.escapedStringLiterals true > SELECT rlike('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*'); true > SET spark.sql.parser.escapedStringLiterals=false; spark.sql.parser.escapedStringLiterals false > SELECT rlike('%SystemDrive%\\\\Users\\\\John', '%SystemDrive%\\\\\\\\Users.*'); true > SELECT rlike('%SystemDrive%\\\\Users\\\\John', r'%SystemDrive%\\\\Users.*'); true Note: Use LIKE to match with a simple string pattern. Since: 1.0.0","title":"rlike"},{"location":"protobuf-functions/","text":"Protobuf Functions \u00b6 This page lists all protobuf functions available in Spark SQL. from_protobuf \u00b6 from_protobuf(data, messageName, descFilePath, options) - Converts a binary Protobuf value into a Catalyst value. 
Examples: > SELECT from_protobuf(s, 'Person', '/path/to/descriptor.desc', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('name', name, 'id', id) AS s FROM VALUES ('John Doe', 1), (NULL, 2) tab(name, id)); [false] Note: The specified Protobuf schema must match the actual schema of the data being read, otherwise the behavior is undefined: it may fail or return an arbitrary result. To deserialize the data with a compatible and evolved schema, the expected Protobuf schema can be set via the corresponding option. Since: 4.0.0 to_protobuf \u00b6 to_protobuf(child, messageName, descFilePath, options) - Converts a Catalyst input value into its corresponding binary Protobuf result. Examples: > SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s); [true] Since: 4.0.0","title":"Protobuf Functions"},{"location":"protobuf-functions/#protobuf-functions","text":"This page lists all protobuf functions available in Spark SQL.","title":"Protobuf Functions"},{"location":"protobuf-functions/#from_protobuf","text":"from_protobuf(data, messageName, descFilePath, options) - Converts a binary Protobuf value into a Catalyst value. Examples: > SELECT from_protobuf(s, 'Person', '/path/to/descriptor.desc', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('name', name, 'id', id) AS s FROM VALUES ('John Doe', 1), (NULL, 2) tab(name, id)); [false] Note: The specified Protobuf schema must match the actual schema of the data being read, otherwise the behavior is undefined: it may fail or return an arbitrary result. To deserialize the data with a compatible and evolved schema, the expected Protobuf schema can be set via the corresponding option. Since: 4.0.0","title":"from_protobuf"},{"location":"protobuf-functions/#to_protobuf","text":"to_protobuf(child, messageName, descFilePath, options) - Converts a Catalyst input value into its corresponding binary Protobuf result. 
Examples: > SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s); [true] Since: 4.0.0","title":"to_protobuf"},{"location":"sketch-functions/","text":"Sketch Functions \u00b6 This page lists all sketch functions available in Spark SQL. approx_top_k_estimate \u00b6 approx_top_k_estimate(state, k) - Returns the top k items with their frequency. k - an optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. Examples: > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] Since: 4.1.0 hll_sketch_estimate \u00b6 hll_sketch_estimate(expr) - Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch. Examples: > SELECT hll_sketch_estimate(hll_sketch_agg(col)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 3.5.0 hll_union \u00b6 hll_union(first, second, allowDifferentLgConfigK) - Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Set allowDifferentLgConfigK to true to allow unions of sketches with different lgConfigK values (defaults to false). Examples: > SELECT hll_sketch_estimate(hll_union(hll_sketch_agg(col1), hll_sketch_agg(col2))) FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2); 6 Since: 3.5.0 kll_sketch_get_n_bigint \u00b6 kll_sketch_get_n_bigint(expr) - Returns the number of items collected in the sketch. 
Examples: > SELECT kll_sketch_get_n_bigint(kll_sketch_agg_bigint(col)) FROM VALUES (1), (2), (3), (4), (5) tab(col); 5 Since: 4.1.0 kll_sketch_get_n_double \u00b6 kll_sketch_get_n_double(expr) - Returns the number of items collected in the sketch. Examples: > SELECT kll_sketch_get_n_double(kll_sketch_agg_double(col)) FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); 5 Since: 4.1.0 kll_sketch_get_n_float \u00b6 kll_sketch_get_n_float(expr) - Returns the number of items collected in the sketch. Examples: > SELECT kll_sketch_get_n_float(kll_sketch_agg_float(col)) FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); 5 Since: 4.1.0 kll_sketch_get_quantile_bigint \u00b6 kll_sketch_get_quantile_bigint(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_quantile_bigint(kll_sketch_agg_bigint(col), 0.5) > 1 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0 kll_sketch_get_quantile_double \u00b6 kll_sketch_get_quantile_double(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. 
Examples: > SELECT kll_sketch_get_quantile_double(kll_sketch_agg_double(col), 0.5) > 1 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0 kll_sketch_get_quantile_float \u00b6 kll_sketch_get_quantile_float(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_quantile_float(kll_sketch_agg_float(col), 0.5) > 1 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0 kll_sketch_get_rank_bigint \u00b6 kll_sketch_get_rank_bigint(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_rank_bigint(kll_sketch_agg_bigint(col), 3) > 0.3 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0 kll_sketch_get_rank_double \u00b6 kll_sketch_get_rank_double(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. 
Examples: > SELECT kll_sketch_get_rank_double(kll_sketch_agg_double(col), 3.0) > 0.3 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0 kll_sketch_get_rank_float \u00b6 kll_sketch_get_rank_float(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_rank_float(kll_sketch_agg_float(col), 3.0) > 0.3 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0 kll_sketch_merge_bigint \u00b6 kll_sketch_merge_bigint(left, right) - Merges two sketch buffers together into one. Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_merge_bigint(kll_sketch_agg_bigint(col), kll_sketch_agg_bigint(col)))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0 kll_sketch_merge_double \u00b6 kll_sketch_merge_double(left, right) - Merges two sketch buffers together into one. Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_merge_double(kll_sketch_agg_double(col), kll_sketch_agg_double(col)))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0 kll_sketch_merge_float \u00b6 kll_sketch_merge_float(left, right) - Merges two sketch buffers together into one. 
Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_merge_float(kll_sketch_agg_float(col), kll_sketch_agg_float(col)))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0 kll_sketch_to_string_bigint \u00b6 kll_sketch_to_string_bigint(expr) - Returns human-readable summary information about this sketch. Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0 kll_sketch_to_string_double \u00b6 kll_sketch_to_string_double(expr) - Returns human-readable summary information about this sketch. Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0 kll_sketch_to_string_float \u00b6 kll_sketch_to_string_float(expr) - Returns human-readable summary information about this sketch. Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0 theta_difference \u00b6 theta_difference(first, second) - Subtracts two binary representations of Datasketches ThetaSketch objects from two input columns using a ThetaSketch AnotB object. Examples: > SELECT theta_sketch_estimate(theta_difference(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (5, 4), (1, 4), (2, 5), (2, 5), (3, 1) tab(col1, col2); 2 Since: 4.1.0 theta_intersection \u00b6 theta_intersection(first, second) - Intersects two binary representations of Datasketches ThetaSketch objects from two input columns using a ThetaSketch Intersect object. 
Examples: > SELECT theta_sketch_estimate(theta_intersection(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (5, 4), (1, 4), (2, 5), (2, 5), (3, 1) tab(col1, col2); 2 Since: 4.1.0 theta_sketch_estimate \u00b6 theta_sketch_estimate(expr) - Returns the estimated number of unique values given the binary representation of a Datasketches ThetaSketch. Examples: > SELECT theta_sketch_estimate(theta_sketch_agg(col)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 4.1.0 theta_union \u00b6 theta_union(first, second, lgNomEntries) - Merges two binary representations of Datasketches ThetaSketch objects using a ThetaSketch Union object. Users can set lgNomEntries to a value between 4 and 26 to find the union of sketches with different union buffer size values (defaults to 12). Examples: > SELECT theta_sketch_estimate(theta_union(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2); 6 Since: 4.1.0 tuple_difference_double \u00b6 tuple_difference_double(tupleSketch1, tupleSketch2) - Subtracts two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch AnotB object. Returns elements in the first sketch that are not in the second sketch. Examples: > SELECT tuple_sketch_estimate_double(tuple_difference_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (5, 5.0D, 4, 4.0D), (1, 1.0D, 4, 4.0D), (2, 2.0D, 5, 5.0D), (3, 3.0D, 1, 1.0D) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0 tuple_difference_integer \u00b6 tuple_difference_integer(tupleSketch1, tupleSketch2) - Subtracts two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch AnotB object. Returns elements in the first sketch that are not in the second sketch. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_difference_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (5, 5, 4, 4), (1, 1, 4, 4), (2, 2, 5, 5), (3, 3, 1, 1) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0 tuple_difference_theta_double \u00b6 tuple_difference_theta_double(tupleSketch, thetaSketch) - Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with double summary data type using a TupleSketch AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch. Examples: > SELECT tuple_sketch_estimate_double(tuple_difference_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (5, 5.0D, 4), (1, 1.0D, 4), (2, 2.0D, 5), (3, 3.0D, 1) tab(col1, val1, col2); 2.0 Since: 4.2.0 tuple_difference_theta_integer \u00b6 tuple_difference_theta_integer(tupleSketch, thetaSketch) - Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with integer summary data type using a TupleSketch AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch. Examples: > SELECT tuple_sketch_estimate_integer(tuple_difference_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (5, 5, 4), (1, 1, 4), (2, 2, 5), (3, 3, 1) tab(col1, val1, col2); 2.0 Since: 4.2.0 tuple_intersection_double \u00b6 tuple_intersection_double(tupleSketch1, tupleSketch2, mode) - Intersects two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch Intersection object. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). 
Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (1, 1.0D, 1, 4.0D), (2, 2.0D, 2, 5.0D), (3, 3.0D, 4, 6.0D) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0 tuple_intersection_integer \u00b6 tuple_intersection_integer(tupleSketch1, tupleSketch2, mode) - Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch Intersection object. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (1, 1, 1, 4), (2, 2, 2, 5), (3, 3, 4, 6) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0 tuple_intersection_theta_double \u00b6 tuple_intersection_theta_double(tupleSketch, thetaSketch, mode) - Intersects the binary representation of a Datasketches TupleSketch with double summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Intersection object. The ThetaSketch entries are assigned a default double summary value based on the mode: 0.0 for 'sum' mode, +Infinity for 'min' mode, -Infinity for 'max' mode, or 1.0 for 'alwaysone' mode. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1.0D, 1), (2, 2.0D, 2), (3, 3.0D, 4) tab(col1, val1, col2); 2.0 Since: 4.2.0 tuple_intersection_theta_integer \u00b6 tuple_intersection_theta_integer(tupleSketch, thetaSketch, mode) - Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Intersection object. 
The ThetaSketch entries are assigned a default integer summary value based on the mode: 0 for 'sum' mode, Integer.MAX_VALUE for 'min' mode, Integer.MIN_VALUE for 'max' mode, or 1 for 'alwaysone' mode. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1, 1), (2, 2, 2), (3, 3, 4) tab(col1, val1, col2); 2.0 Since: 4.2.0 tuple_sketch_estimate_double \u00b6 tuple_sketch_estimate_double(child) - Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch. The sketch's summary type must be a double. Examples: > SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary); 2.0 Since: 4.2.0 tuple_sketch_estimate_integer \u00b6 tuple_sketch_estimate_integer(child) - Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch. The sketch's summary type must be an integer. Examples: > SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary); 2.0 Since: 4.2.0 tuple_sketch_summary_double \u00b6 tuple_sketch_summary_double(child, mode) - Aggregates the summary values from a double summary type Datasketches TupleSketch. The mode can be 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_summary_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary); 6.0 Since: 4.2.0 tuple_sketch_summary_integer \u00b6 tuple_sketch_summary_integer(child, mode) - Aggregates the summary values from an integer summary type Datasketches TupleSketch. The mode can be 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). 
Examples: > SELECT tuple_sketch_summary_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary); 6 Since: 4.2.0 tuple_sketch_theta_double \u00b6 tuple_sketch_theta_double(child) - Returns the theta value (sampling rate) from a Datasketches TupleSketch. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0. The sketch's summary type must be a double. Examples: > SELECT tuple_sketch_theta_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (2, 2.0D), (3, 3.0D) tab(key, summary); 1.0 Since: 4.2.0 tuple_sketch_theta_integer \u00b6 tuple_sketch_theta_integer(child) - Returns the theta value (sampling rate) from a Datasketches TupleSketch. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0. The sketch's summary type must be an integer. Examples: > SELECT tuple_sketch_theta_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary); 1.0 Since: 4.2.0 tuple_union_double \u00b6 tuple_union_double(tupleSketch1, tupleSketch2, lgNomEntries, mode) - Merges two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch Union object. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_double(tuple_union_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (1, 1.0D, 4, 4.0D), (2, 2.0D, 5, 5.0D), (3, 3.0D, 6, 6.0D) tab(col1, val1, col2, val2); 6.0 Since: 4.2.0 tuple_union_integer \u00b6 tuple_union_integer(tupleSketch1, tupleSketch2, lgNomEntries, mode) - Merges two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch Union object. 
Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (1, 1, 4, 4), (2, 2, 5, 5), (3, 3, 6, 6) tab(col1, val1, col2, val2); 6.0 Since: 4.2.0 tuple_union_theta_double \u00b6 tuple_union_theta_double(tupleSketch, thetaSketch, lgNomEntries, mode) - Merges the binary representation of a Datasketches TupleSketch with double summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Union object. The ThetaSketch entries are assigned a default double summary value based on the mode: 0.0 for 'sum' mode, +Infinity for 'min' mode, -Infinity for 'max' mode, or 1.0 for 'alwaysone' mode. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_double(tuple_union_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1.0D, 4), (2, 2.0D, 5), (3, 3.0D, 6) tab(col1, val1, col2); 6.0 Since: 4.2.0 tuple_union_theta_integer \u00b6 tuple_union_theta_integer(tupleSketch, thetaSketch, lgNomEntries, mode) - Merges the binary representation of a Datasketches TupleSketch with integer summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Union object. The ThetaSketch entries are assigned a default integer summary value based on the mode: 0 for 'sum' mode, Integer.MAX_VALUE for 'min' mode, Integer.MIN_VALUE for 'max' mode, or 1 for 'alwaysone' mode. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1, 4), (2, 2, 5), (3, 3, 6) tab(col1, val1, col2); 6.0 Since: 4.2.0","title":"Sketch Functions"},{"location":"sketch-functions/#sketch-functions","text":"This page lists all sketch functions available in Spark SQL.","title":"Sketch Functions"},{"location":"sketch-functions/#approx_top_k_estimate","text":"approx_top_k_estimate(state, k) - Returns top k items with their frequency. k An optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. Examples: > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); [{\"item\":0,\"count\":2},{\"item\":4,\"count\":2},{\"item\":1,\"count\":2},{\"item\":2,\"count\":1},{\"item\":3,\"count\":1}] > SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' tab(expr); [{\"item\":\"c\",\"count\":4},{\"item\":\"d\",\"count\":2}] Since: 4.1.0","title":"approx_top_k_estimate"},{"location":"sketch-functions/#hll_sketch_estimate","text":"hll_sketch_estimate(expr) - Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch. Examples: > SELECT hll_sketch_estimate(hll_sketch_agg(col)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 3.5.0","title":"hll_sketch_estimate"},{"location":"sketch-functions/#hll_union","text":"hll_union(first, second, allowDifferentLgConfigK) - Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Set allowDifferentLgConfigK to true to allow unions of sketches with different lgConfigK values (defaults to false). 
Examples: > SELECT hll_sketch_estimate(hll_union(hll_sketch_agg(col1), hll_sketch_agg(col2))) FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2); 6 Since: 3.5.0","title":"hll_union"},{"location":"sketch-functions/#kll_sketch_get_n_bigint","text":"kll_sketch_get_n_bigint(expr) - Returns the number of items collected in the sketch. Examples: > SELECT kll_sketch_get_n_bigint(kll_sketch_agg_bigint(col)) FROM VALUES (1), (2), (3), (4), (5) tab(col); 5 Since: 4.1.0","title":"kll_sketch_get_n_bigint"},{"location":"sketch-functions/#kll_sketch_get_n_double","text":"kll_sketch_get_n_double(expr) - Returns the number of items collected in the sketch. Examples: > SELECT kll_sketch_get_n_double(kll_sketch_agg_double(col)) FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); 5 Since: 4.1.0","title":"kll_sketch_get_n_double"},{"location":"sketch-functions/#kll_sketch_get_n_float","text":"kll_sketch_get_n_float(expr) - Returns the number of items collected in the sketch. Examples: > SELECT kll_sketch_get_n_float(kll_sketch_agg_float(col)) FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); 5 Since: 4.1.0","title":"kll_sketch_get_n_float"},{"location":"sketch-functions/#kll_sketch_get_quantile_bigint","text":"kll_sketch_get_quantile_bigint(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. 
Examples: > SELECT kll_sketch_get_quantile_bigint(kll_sketch_agg_bigint(col), 0.5) > 1 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0","title":"kll_sketch_get_quantile_bigint"},{"location":"sketch-functions/#kll_sketch_get_quantile_double","text":"kll_sketch_get_quantile_double(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_quantile_double(kll_sketch_agg_double(col), 0.5) > 1 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0","title":"kll_sketch_get_quantile_double"},{"location":"sketch-functions/#kll_sketch_get_quantile_float","text":"kll_sketch_get_quantile_float(left, right) - Extracts a single value from the quantiles sketch representing the desired quantile given the input rank. The desired quantile can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_quantile_float(kll_sketch_agg_float(col), 0.5) > 1 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0","title":"kll_sketch_get_quantile_float"},{"location":"sketch-functions/#kll_sketch_get_rank_bigint","text":"kll_sketch_get_rank_bigint(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. 
Examples: > SELECT kll_sketch_get_rank_bigint(kll_sketch_agg_bigint(col), 3) > 0.3 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0","title":"kll_sketch_get_rank_bigint"},{"location":"sketch-functions/#kll_sketch_get_rank_double","text":"kll_sketch_get_rank_double(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_rank_double(kll_sketch_agg_double(col), 3.0) > 0.3 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0","title":"kll_sketch_get_rank_double"},{"location":"sketch-functions/#kll_sketch_get_rank_float","text":"kll_sketch_get_rank_float(left, right) - Extracts a single value from the quantiles sketch representing the desired rank given the input quantile. The desired rank can either be a single value or an array. In the latter case, the function will return an array of results of equal length to the input array. Examples: > SELECT kll_sketch_get_rank_float(kll_sketch_agg_float(col), 3.0) > 0.3 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0","title":"kll_sketch_get_rank_float"},{"location":"sketch-functions/#kll_sketch_merge_bigint","text":"kll_sketch_merge_bigint(left, right) - Merges two sketch buffers together into one. 
Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_merge_bigint(kll_sketch_agg_bigint(col), kll_sketch_agg_bigint(col)))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0","title":"kll_sketch_merge_bigint"},{"location":"sketch-functions/#kll_sketch_merge_double","text":"kll_sketch_merge_double(left, right) - Merges two sketch buffers together into one. Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_merge_double(kll_sketch_agg_double(col), kll_sketch_agg_double(col)))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0","title":"kll_sketch_merge_double"},{"location":"sketch-functions/#kll_sketch_merge_float","text":"kll_sketch_merge_float(left, right) - Merges two sketch buffers together into one. Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_merge_float(kll_sketch_agg_float(col), kll_sketch_agg_float(col)))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0","title":"kll_sketch_merge_float"},{"location":"sketch-functions/#kll_sketch_to_string_bigint","text":"kll_sketch_to_string_bigint(expr) - Returns human readable summary information about this sketch. Examples: > SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col); true Since: 4.1.0","title":"kll_sketch_to_string_bigint"},{"location":"sketch-functions/#kll_sketch_to_string_double","text":"kll_sketch_to_string_double(expr) - Returns human readable summary information about this sketch. 
Examples: > SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); true Since: 4.1.0","title":"kll_sketch_to_string_double"},{"location":"sketch-functions/#kll_sketch_to_string_float","text":"kll_sketch_to_string_float(expr) - Returns human readable summary information about this sketch. Examples: > SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); true Since: 4.1.0","title":"kll_sketch_to_string_float"},{"location":"sketch-functions/#theta_difference","text":"theta_difference(first, second) - Subtracts two binary representations of Datasketches ThetaSketch objects from two input columns using a ThetaSketch AnotB object. Examples: > SELECT theta_sketch_estimate(theta_difference(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (5, 4), (1, 4), (2, 5), (2, 5), (3, 1) tab(col1, col2); 2 Since: 4.1.0","title":"theta_difference"},{"location":"sketch-functions/#theta_intersection","text":"theta_intersection(first, second) - Intersects two binary representations of Datasketches ThetaSketch objects from two input columns using a ThetaSketch Intersect object. Examples: > SELECT theta_sketch_estimate(theta_intersection(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (5, 4), (1, 4), (2, 5), (2, 5), (3, 1) tab(col1, col2); 2 Since: 4.1.0","title":"theta_intersection"},{"location":"sketch-functions/#theta_sketch_estimate","text":"theta_sketch_estimate(expr) - Returns the estimated number of unique values given the binary representation of a Datasketches ThetaSketch. 
Examples: > SELECT theta_sketch_estimate(theta_sketch_agg(col)) FROM VALUES (1), (1), (2), (2), (3) tab(col); 3 Since: 4.1.0","title":"theta_sketch_estimate"},{"location":"sketch-functions/#theta_union","text":"theta_union(first, second, lgNomEntries) - Merges two binary representations of Datasketches ThetaSketch objects using a ThetaSketch Union object. Users can set lgNomEntries to a value between 4 and 26 to find the union of sketches with different union buffer size values (defaults to 12). Examples: > SELECT theta_sketch_estimate(theta_union(theta_sketch_agg(col1), theta_sketch_agg(col2))) FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2); 6 Since: 4.1.0","title":"theta_union"},{"location":"sketch-functions/#tuple_difference_double","text":"tuple_difference_double(tupleSketch1, tupleSketch2) - Subtracts two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch AnotB object. Returns elements in the first sketch that are not in the second sketch. Examples: > SELECT tuple_sketch_estimate_double(tuple_difference_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (5, 5.0D, 4, 4.0D), (1, 1.0D, 4, 4.0D), (2, 2.0D, 5, 5.0D), (3, 3.0D, 1, 1.0D) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0","title":"tuple_difference_double"},{"location":"sketch-functions/#tuple_difference_integer","text":"tuple_difference_integer(tupleSketch1, tupleSketch2) - Subtracts two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch AnotB object. Returns elements in the first sketch that are not in the second sketch. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_difference_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (5, 5, 4, 4), (1, 1, 4, 4), (2, 2, 5, 5), (3, 3, 1, 1) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0","title":"tuple_difference_integer"},{"location":"sketch-functions/#tuple_difference_theta_double","text":"tuple_difference_theta_double(tupleSketch, thetaSketch) - Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with double summary data type using a TupleSketch AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch. Examples: > SELECT tuple_sketch_estimate_double(tuple_difference_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (5, 5.0D, 4), (1, 1.0D, 4), (2, 2.0D, 5), (3, 3.0D, 1) tab(col1, val1, col2); 2.0 Since: 4.2.0","title":"tuple_difference_theta_double"},{"location":"sketch-functions/#tuple_difference_theta_integer","text":"tuple_difference_theta_integer(tupleSketch, thetaSketch) - Subtracts the binary representation of a Datasketches ThetaSketch from a TupleSketch with integer summary data type using a TupleSketch AnotB object. Returns elements in the TupleSketch that are not in the ThetaSketch. Examples: > SELECT tuple_sketch_estimate_integer(tuple_difference_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (5, 5, 4), (1, 1, 4), (2, 2, 5), (3, 3, 1) tab(col1, val1, col2); 2.0 Since: 4.2.0","title":"tuple_difference_theta_integer"},{"location":"sketch-functions/#tuple_intersection_double","text":"tuple_intersection_double(tupleSketch1, tupleSketch2, mode) - Intersects two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch Intersection object. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). 
Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (1, 1.0D, 1, 4.0D), (2, 2.0D, 2, 5.0D), (3, 3.0D, 4, 6.0D) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0","title":"tuple_intersection_double"},{"location":"sketch-functions/#tuple_intersection_integer","text":"tuple_intersection_integer(tupleSketch1, tupleSketch2, mode) - Intersects two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch Intersection object. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (1, 1, 1, 4), (2, 2, 2, 5), (3, 3, 4, 6) tab(col1, val1, col2, val2); 2.0 Since: 4.2.0","title":"tuple_intersection_integer"},{"location":"sketch-functions/#tuple_intersection_theta_double","text":"tuple_intersection_theta_double(tupleSketch, thetaSketch, mode) - Intersects the binary representation of a Datasketches TupleSketch with double summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Intersection object. The ThetaSketch entries are assigned a default double summary value based on the mode: 0.0 for 'sum' mode, +Infinity for 'min' mode, -Infinity for 'max' mode, or 1.0 for 'alwaysone' mode. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). 
Examples: > SELECT tuple_sketch_estimate_double(tuple_intersection_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1.0D, 1), (2, 2.0D, 2), (3, 3.0D, 4) tab(col1, val1, col2); 2.0 Since: 4.2.0","title":"tuple_intersection_theta_double"},{"location":"sketch-functions/#tuple_intersection_theta_integer","text":"tuple_intersection_theta_integer(tupleSketch, thetaSketch, mode) - Intersects the binary representation of a Datasketches TupleSketch with integer summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Intersection object. The ThetaSketch entries are assigned a default integer summary value based on the mode: 0 for 'sum' mode, Integer.MAX_VALUE for 'min' mode, Integer.MIN_VALUE for 'max' mode, or 1 for 'alwaysone' mode. Users can set mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_intersection_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1, 1), (2, 2, 2), (3, 3, 4) tab(col1, val1, col2); 2.0 Since: 4.2.0","title":"tuple_intersection_theta_integer"},{"location":"sketch-functions/#tuple_sketch_estimate_double","text":"tuple_sketch_estimate_double(child) - Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch. The sketch's summary type must be a double. Examples: > SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary); 2.0 Since: 4.2.0","title":"tuple_sketch_estimate_double"},{"location":"sketch-functions/#tuple_sketch_estimate_integer","text":"tuple_sketch_estimate_integer(child) - Returns the estimated number of unique values given the binary representation of a Datasketches TupleSketch. The sketch's summary type must be an integer. 
Examples: > SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary); 2.0 Since: 4.2.0","title":"tuple_sketch_estimate_integer"},{"location":"sketch-functions/#tuple_sketch_summary_double","text":"tuple_sketch_summary_double(child, mode) - Aggregates the summary values from a double summary type Datasketches TupleSketch. The mode can be 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_summary_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary); 6.0 Since: 4.2.0","title":"tuple_sketch_summary_double"},{"location":"sketch-functions/#tuple_sketch_summary_integer","text":"tuple_sketch_summary_integer(child, mode) - Aggregates the summary values from an integer summary type Datasketches TupleSketch. The mode can be 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_summary_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary); 6 Since: 4.2.0","title":"tuple_sketch_summary_integer"},{"location":"sketch-functions/#tuple_sketch_theta_double","text":"tuple_sketch_theta_double(child) - Returns the theta value (sampling rate) from a Datasketches TupleSketch. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0. The sketch's summary type must be a double. Examples: > SELECT tuple_sketch_theta_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (2, 2.0D), (3, 3.0D) tab(key, summary); 1.0 Since: 4.2.0","title":"tuple_sketch_theta_double"},{"location":"sketch-functions/#tuple_sketch_theta_integer","text":"tuple_sketch_theta_integer(child) - Returns the theta value (sampling rate) from a Datasketches TupleSketch. The theta value represents the effective sampling rate of the sketch, between 0.0 and 1.0. The sketch's summary type must be an integer. 
Examples: > SELECT tuple_sketch_theta_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary); 1.0 Since: 4.2.0","title":"tuple_sketch_theta_integer"},{"location":"sketch-functions/#tuple_union_double","text":"tuple_union_double(tupleSketch1, tupleSketch2, lgNomEntries, mode) - Merges two binary representations of Datasketches TupleSketch objects with double summary data type using a TupleSketch Union object. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_double(tuple_union_double(tuple_sketch_agg_double(col1, val1), tuple_sketch_agg_double(col2, val2))) FROM VALUES (1, 1.0D, 4, 4.0D), (2, 2.0D, 5, 5.0D), (3, 3.0D, 6, 6.0D) tab(col1, val1, col2, val2); 6.0 Since: 4.2.0","title":"tuple_union_double"},{"location":"sketch-functions/#tuple_union_integer","text":"tuple_union_integer(tupleSketch1, tupleSketch2, lgNomEntries, mode) - Merges two binary representations of Datasketches TupleSketch objects with integer summary data type using a TupleSketch Union object. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_integer(tuple_sketch_agg_integer(col1, val1), tuple_sketch_agg_integer(col2, val2))) FROM VALUES (1, 1, 4, 4), (2, 2, 5, 5), (3, 3, 6, 6) tab(col1, val1, col2, val2); 6.0 Since: 4.2.0","title":"tuple_union_integer"},{"location":"sketch-functions/#tuple_union_theta_double","text":"tuple_union_theta_double(tupleSketch, thetaSketch, lgNomEntries, mode) - Merges the binary representation of a Datasketches TupleSketch with double summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Union object. 
The ThetaSketch entries are assigned a default double summary value based on the mode: 0.0 for 'sum' mode, +Infinity for 'min' mode, -Infinity for 'max' mode, or 1.0 for 'alwaysone' mode. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_double(tuple_union_theta_double(tuple_sketch_agg_double(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1.0D, 4), (2, 2.0D, 5), (3, 3.0D, 6) tab(col1, val1, col2); 6.0 Since: 4.2.0","title":"tuple_union_theta_double"},{"location":"sketch-functions/#tuple_union_theta_integer","text":"tuple_union_theta_integer(tupleSketch, thetaSketch, lgNomEntries, mode) - Merges the binary representation of a Datasketches TupleSketch with integer summary data type with the binary representation of a Datasketches ThetaSketch using a TupleSketch Union object. The ThetaSketch entries are assigned a default integer summary value based on the mode: 0 for 'sum' mode, Integer.MAX_VALUE for 'min' mode, Integer.MIN_VALUE for 'max' mode, or 1 for 'alwaysone' mode. Users can set lgNomEntries to a value between 4 and 26 (defaults to 12) and mode to 'sum', 'min', 'max', or 'alwaysone' (defaults to 'sum'). Examples: > SELECT tuple_sketch_estimate_integer(tuple_union_theta_integer(tuple_sketch_agg_integer(col1, val1), theta_sketch_agg(col2))) FROM VALUES (1, 1, 4), (2, 2, 5), (3, 3, 6) tab(col1, val1, col2); 6.0 Since: 4.2.0","title":"tuple_union_theta_integer"},{"location":"st-functions/","text":"St Functions \u00b6 This page lists all st functions available in Spark SQL. st_asbinary \u00b6 st_asbinary(geo) - Returns the geospatial value (value of type GEOGRAPHY or GEOMETRY) in WKB format. Arguments: geo - A geospatial value, either a GEOGRAPHY or a GEOMETRY. 
Examples: > SELECT hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 Since: 4.1.0 st_geogfromwkb \u00b6 st_geogfromwkb(wkb) - Parses the WKB description of a geography and returns the corresponding GEOGRAPHY value. Arguments: wkb - A BINARY value in WKB format, representing a GEOGRAPHY value. Examples: > SELECT hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 Since: 4.1.0 st_geomfromwkb \u00b6 st_geomfromwkb(wkb[, srid]) - Parses the WKB description of a geometry and returns the corresponding GEOMETRY value. Arguments: wkb - A BINARY value in WKB format, representing a GEOMETRY value. srid - The optional SRID value of the geometry. Default is 0. Examples: > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040')); 0 > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040', 4326))); 0101000000000000000000F03F0000000000000040 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040', 4326)); 4326 Since: 4.1.0 st_setsrid \u00b6 st_setsrid(geo, srid) - Returns a new GEOGRAPHY or GEOMETRY value whose SRID is the specified SRID value. Arguments: geo - A GEOGRAPHY or GEOMETRY value. srid - The new SRID value of the geography or geometry. Examples: > SELECT st_srid(st_setsrid(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'), 4326)); 4326 > SELECT st_srid(st_setsrid(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040'), 3857)); 3857 Since: 4.1.0 st_srid \u00b6 st_srid(geo) - Returns the SRID of the input GEOGRAPHY or GEOMETRY value. 
Arguments: geo - A GEOGRAPHY or GEOMETRY value. Examples: > SELECT st_srid(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040')); 4326 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040')); 0 > SELECT st_srid(NULL); NULL Since: 4.1.0","title":"St Functions"},{"location":"st-functions/#st-functions","text":"This page lists all st functions available in Spark SQL.","title":"St Functions"},{"location":"st-functions/#st_asbinary","text":"st_asbinary(geo) - Returns the geospatial value (value of type GEOGRAPHY or GEOMETRY) in WKB format. Arguments: geo - A geospatial value, either a GEOGRAPHY or a GEOMETRY. Examples: > SELECT hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 Since: 4.1.0","title":"st_asbinary"},{"location":"st-functions/#st_geogfromwkb","text":"st_geogfromwkb(wkb) - Parses the WKB description of a geography and returns the corresponding GEOGRAPHY value. Arguments: wkb - A BINARY value in WKB format, representing a GEOGRAPHY value. Examples: > SELECT hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 Since: 4.1.0","title":"st_geogfromwkb"},{"location":"st-functions/#st_geomfromwkb","text":"st_geomfromwkb(wkb[, srid]) - Parses the WKB description of a geometry and returns the corresponding GEOMETRY value. Arguments: wkb - A BINARY value in WKB format, representing a GEOMETRY value. srid - The optional SRID value of the geometry. Default is 0. 
Examples: > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040'))); 0101000000000000000000F03F0000000000000040 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040')); 0 > SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040', 4326))); 0101000000000000000000F03F0000000000000040 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040', 4326)); 4326 Since: 4.1.0","title":"st_geomfromwkb"},{"location":"st-functions/#st_setsrid","text":"st_setsrid(geo, srid) - Returns a new GEOGRAPHY or GEOMETRY value whose SRID is the specified SRID value. Arguments: geo - A GEOGRAPHY or GEOMETRY value. srid - The new SRID value of the geography or geometry. Examples: > SELECT st_srid(st_setsrid(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'), 4326)); 4326 > SELECT st_srid(st_setsrid(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040'), 3857)); 3857 Since: 4.1.0","title":"st_setsrid"},{"location":"st-functions/#st_srid","text":"st_srid(geo) - Returns the SRID of the input GEOGRAPHY or GEOMETRY value. Arguments: geo - A GEOGRAPHY or GEOMETRY value. Examples: > SELECT st_srid(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040')); 4326 > SELECT st_srid(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040')); 0 > SELECT st_srid(NULL); NULL Since: 4.1.0","title":"st_srid"},{"location":"string-functions/","text":"String Functions \u00b6 This page lists all string functions available in Spark SQL. ascii \u00b6 ascii(str) - Returns the numeric value of the first character of str . Examples: > SELECT ascii('222'); 50 > SELECT ascii(2); 50 Since: 1.5.0 base64 \u00b6 base64(bin) - Converts the argument from a binary bin to a base 64 string. 
Examples: > SELECT base64('Spark SQL'); U3BhcmsgU1FM > SELECT base64(x'537061726b2053514c'); U3BhcmsgU1FM Since: 1.5.0 bit_length \u00b6 bit_length(expr) - Returns the bit length of string data or number of bits of binary data. Examples: > SELECT bit_length('Spark SQL'); 72 > SELECT bit_length(x'537061726b2053514c'); 72 Since: 2.3.0 btrim \u00b6 btrim(str) - Removes the leading and trailing space characters from str . btrim(str, trimStr) - Removes the leading and trailing trimStr characters from str . Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space Examples: > SELECT btrim(' SparkSQL '); SparkSQL > SELECT btrim(encode(' SparkSQL ', 'utf-8')); SparkSQL > SELECT btrim('SSparkSQLS', 'SL'); parkSQ > SELECT btrim(encode('SSparkSQLS', 'utf-8'), encode('SL', 'utf-8')); parkSQ Since: 3.2.0 char \u00b6 char(expr) - Returns the ASCII character having the binary equivalent to expr . If expr is larger than 256, the result is equivalent to chr(expr % 256). Examples: > SELECT char(65); A Since: 2.3.0 char_length \u00b6 char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT char_length('Spark SQL '); 10 > SELECT char_length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 2.3.0 character_length \u00b6 character_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. 
Examples: > SELECT character_length('Spark SQL '); 10 > SELECT character_length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 2.3.0 chr \u00b6 chr(expr) - Returns the ASCII character having the binary equivalent to expr . If expr is larger than 256, the result is equivalent to chr(expr % 256). Examples: > SELECT chr(65); A Since: 2.3.0 collate \u00b6 collate(expr, collationName) - Marks a given expression with the specified collation. Arguments: expr - String expression to perform collation on. collationName - Foldable string expression that specifies the collation name. Examples: > SELECT COLLATION('Spark SQL' collate UTF8_LCASE); SYSTEM.BUILTIN.UTF8_LCASE Since: 4.0.0 collation \u00b6 collation(expr) - Returns the collation name of a given expression. Arguments: expr - String expression to perform collation on. Examples: > SELECT collation('Spark SQL'); SYSTEM.BUILTIN.UTF8_BINARY Since: 4.0.0 concat_ws \u00b6 concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep , skipping null values. Examples: > SELECT concat_ws(' ', 'Spark', 'SQL'); Spark SQL > SELECT concat_ws('s'); > SELECT concat_ws('/', 'foo', null, 'bar'); foo/bar > SELECT concat_ws(null, 'Spark', 'SQL'); NULL Since: 1.5.0 contains \u00b6 contains(left, right) - Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type. Examples: > SELECT contains('Spark SQL', 'Spark'); true > SELECT contains('Spark SQL', 'SPARK'); false > SELECT contains('Spark SQL', null); NULL > SELECT contains(x'537061726b2053514c', x'537061726b'); true Since: 3.3.0 decode \u00b6 decode(bin, charset) - Decodes the first argument using the second argument character set. If either argument is null, the result will also be null. decode(expr, search, result [, search, result ] ... 
[, default]) - Compares expr to each search value in order. If expr is equal to a search value, decode returns the corresponding result. If no match is found, then it returns default. If default is omitted, it returns null. Arguments: bin - a binary expression to decode charset - one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32' to decode bin into a STRING. It is case insensitive. Examples: > SELECT decode(encode('abc', 'utf-8'), 'utf-8'); abc > SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); San Francisco > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); Non domestic > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle'); NULL > SELECT decode(null, 6, 'Spark', NULL, 'SQL', 4, 'rocks'); SQL Note: decode(expr, search, result [, search, result ] ... [, default]) is supported since 3.2.0 Since: 1.5.0 elt \u00b6 elt(n, input1, input2, ...) - Returns the n -th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices. Examples: > SELECT elt(1, 'scala', 'java'); scala > SELECT elt(2, 'a', 1); 1 Since: 2.0.0 encode \u00b6 encode(str, charset) - Encodes the first argument using the second argument character set. If either argument is null, the result will also be null. Arguments: str - a string expression charset - one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32' to encode str into a BINARY. It is case insensitive. Examples: > SELECT encode('abc', 'utf-8'); abc Since: 1.5.0 endswith \u00b6 endswith(left, right) - Returns a boolean. The value is True if left ends with right. Returns NULL if either input expression is NULL. 
Otherwise, returns False. Both left and right must be of STRING or BINARY type. Examples: > SELECT endswith('Spark SQL', 'SQL'); true > SELECT endswith('Spark SQL', 'Spark'); false > SELECT endswith('Spark SQL', null); NULL > SELECT endswith(x'537061726b2053514c', x'537061726b'); false > SELECT endswith(x'537061726b2053514c', x'53514c'); true Since: 3.3.0 find_in_set \u00b6 find_in_set(str, str_array) - Returns the index (1-based) of the given string ( str ) in the comma-delimited list ( str_array ). Returns 0, if the string was not found or if the given string ( str ) contains a comma. Examples: > SELECT find_in_set('ab','abc,b,ab,c,def'); 3 Since: 1.5.0 format_number \u00b6 format_number(expr1, expr2) - Formats the number expr1 like '#,###,###.##', rounded to expr2 decimal places. If expr2 is 0, the result has no decimal point or fractional part. expr2 also accepts a user-specified format. This is supposed to function like MySQL's FORMAT. Examples: > SELECT format_number(12332.123456, 4); 12,332.1235 > SELECT format_number(12332.123456, '##################.###'); 12332.123 Since: 1.5.0 format_string \u00b6 format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings. Examples: > SELECT format_string(\"Hello World %d %s\", 100, \"days\"); Hello World 100 days Since: 1.5.0 initcap \u00b6 initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space. Examples: > SELECT initcap('sPark sql'); Spark Sql Since: 1.5.0 instr \u00b6 instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str . Examples: > SELECT instr('SparkSQL', 'SQL'); 6 Since: 1.5.0 is_valid_utf8 \u00b6 is_valid_utf8(str) - Returns true if str is a valid UTF-8 string, otherwise returns false. 
Arguments: str - a string expression Examples: > SELECT is_valid_utf8('Spark'); true > SELECT is_valid_utf8(x'61'); true > SELECT is_valid_utf8(x'80'); false > SELECT is_valid_utf8(x'61C262'); false Since: 4.0.0 lcase \u00b6 lcase(str) - Returns str with all characters changed to lowercase. Examples: > SELECT lcase('SparkSql'); sparksql Since: 1.0.1 left \u00b6 left(str, len) - Returns the leftmost len ( len can be string type) characters from the string str . If len is less than or equal to 0, the result is an empty string. Examples: > SELECT left('Spark SQL', 3); Spa > SELECT left(encode('Spark SQL', 'utf-8'), 3); Spa Since: 2.3.0 len \u00b6 len(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT len('Spark SQL '); 10 > SELECT len(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 3.4.0 length \u00b6 length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT length('Spark SQL '); 10 > SELECT length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 1.5.0 levenshtein \u00b6 levenshtein(str1, str2[, threshold]) - Returns the Levenshtein distance between the two given strings. If threshold is set and the distance is greater than it, returns -1. Examples: > SELECT levenshtein('kitten', 'sitting'); 3 > SELECT levenshtein('kitten', 'sitting', 2); -1 Since: 1.5.0 locate \u00b6 locate(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos . The given pos and return value are 1-based. 
Examples: > SELECT locate('bar', 'foobarbar'); 4 > SELECT locate('bar', 'foobarbar', 5); 7 > SELECT POSITION('bar' IN 'foobarbar'); 4 Since: 1.5.0 lower \u00b6 lower(str) - Returns str with all characters changed to lowercase. Examples: > SELECT lower('SparkSql'); sparksql Since: 1.0.1 lpad \u00b6 lpad(str, len[, pad]) - Returns str , left-padded with pad to a length of len . If str is longer than len , the return value is shortened to len characters or bytes. If pad is not specified, str will be padded to the left with space characters if it is a character string, and with zeros if it is a byte sequence. Examples: > SELECT lpad('hi', 5, '??'); ???hi > SELECT lpad('hi', 1, '??'); h > SELECT lpad('hi', 5); hi > SELECT hex(lpad(unhex('aabb'), 5)); 000000AABB > SELECT hex(lpad(unhex('aabb'), 5, unhex('1122'))); 112211AABB Since: 1.5.0 ltrim \u00b6 ltrim(str) - Removes the leading space characters from str . Arguments: str - a string expression Examples: > SELECT ltrim(' SparkSQL '); SparkSQL Since: 1.5.0 luhn_check \u00b6 luhn_check(str) - Checks that a string of digits is valid according to the Luhn algorithm. This checksum function is widely applied on credit card numbers and government identification numbers to distinguish valid numbers from mistyped, incorrect numbers. Examples: > SELECT luhn_check('8112189876'); true > SELECT luhn_check('79927398713'); true > SELECT luhn_check('79927398714'); false Since: 3.5.0 make_valid_utf8 \u00b6 make_valid_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns a new string whose invalid UTF8 byte sequences are replaced using the UNICODE replacement character U+FFFD. 
Arguments: str - a string expression Examples: > SELECT make_valid_utf8('Spark'); Spark > SELECT make_valid_utf8(x'61'); a > SELECT make_valid_utf8(x'80'); \ufffd > SELECT make_valid_utf8(x'61C262'); a\ufffdb Since: 4.0.0 mask \u00b6 mask(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value. The function replaces characters with 'X' or 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed. Arguments: input - string value to mask. Supported types: STRING, VARCHAR, CHAR upperChar - character to replace upper-case characters with. Specify NULL to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify NULL to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify NULL to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify NULL to retain original character. Default value: NULL Examples: > SELECT mask('abcd-EFGH-8765-4321'); xxxx-XXXX-nnnn-nnnn > SELECT mask('abcd-EFGH-8765-4321', 'Q'); xxxx-QQQQ-nnnn-nnnn > SELECT mask('AbCD123-@$#', 'Q', 'q'); QqQQnnn-@$# > SELECT mask('AbCD123-@$#'); XxXXnnn-@$# > SELECT mask('AbCD123-@$#', 'Q'); QxQQnnn-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q'); QqQQnnn-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q', 'd'); QqQQddd-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q', 'd', 'o'); QqQQdddoooo > SELECT mask('AbCD123-@$#', NULL, 'q', 'd', 'o'); AqCDdddoooo > SELECT mask('AbCD123-@$#', NULL, NULL, 'd', 'o'); AbCDdddoooo > SELECT mask('AbCD123-@$#', NULL, NULL, NULL, 'o'); AbCD123oooo > SELECT mask(NULL, NULL, NULL, NULL, 'o'); NULL > SELECT mask(NULL); NULL > SELECT mask('AbCD123-@$#', NULL, NULL, NULL, NULL); AbCD123-@$# Since: 3.4.0 octet_length \u00b6 octet_length(expr) - Returns the byte length of string data or number of bytes of binary data. 
Examples: > SELECT octet_length('Spark SQL'); 9 > SELECT octet_length(x'537061726b2053514c'); 9 Since: 2.3.0 overlay \u00b6 overlay(input, replace, pos[, len]) - Replaces len characters of input , starting at position pos , with replace . If len is omitted, it defaults to the length of replace . Examples: > SELECT overlay('Spark SQL' PLACING '_' FROM 6); Spark_SQL > SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7); Spark CORE > SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0); Spark ANSI SQL > SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4); Structured SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6); Spark_SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7); Spark CORE > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0); Spark ANSI SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4); Structured SQL Since: 3.0.0 position \u00b6 position(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos . The given pos and return value are 1-based. Examples: > SELECT position('bar', 'foobarbar'); 4 > SELECT position('bar', 'foobarbar', 5); 7 > SELECT POSITION('bar' IN 'foobarbar'); 4 Since: 2.3.0 printf \u00b6 printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings. Examples: > SELECT printf(\"Hello World %d %s\", 100, \"days\"); Hello World 100 days Since: 1.5.0 quote \u00b6 quote(str) - Returns str enclosed by single quotes and each instance of single quote in it is preceded by a backslash. Examples: > SELECT quote('Don\\'t'); 'Don\\'t' Since: 4.1.0 randstr \u00b6 randstr(length[, seed]) - Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z. The random seed is optional. The string length must be a constant two-byte or four-byte integer (SMALLINT or INT, respectively). 
Examples: > SELECT randstr(3, 0) AS result; ceV Since: 4.0.0 regexp_count \u00b6 regexp_count(str, regexp) - Returns a count of the number of times that the regular expression pattern regexp is matched in the string str . Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Examples: > SELECT regexp_count('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en'); 2 > SELECT regexp_count('abcdefghijklmnopqrstuvwxyz', '[a-z]{3}'); 8 Since: 3.4.0 regexp_extract \u00b6 regexp_extract(str, regexp[, idx]) - Extracts the first string in str that matches the regexp expression and corresponds to the regex group index. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping any special characters in the pattern string. idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index. 
Examples: > SELECT regexp_extract('100-200', '(\\\\d+)-(\\\\d+)', 1); 100 > SELECT regexp_extract('100-200', r'(\\d+)-(\\d+)', 1); 100 Since: 1.5.0 regexp_extract_all \u00b6 regexp_extract_all(str, regexp[, idx]) - Extracts all strings in str that match the regexp expression and correspond to the regex group index. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping any special characters in the pattern string. idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index. Examples: > SELECT regexp_extract_all('100-200, 300-400', '(\\\\d+)-(\\\\d+)', 1); [\"100\",\"300\"] > SELECT regexp_extract_all('100-200, 300-400', r'(\\d+)-(\\d+)', 1); [\"100\",\"300\"] Since: 3.1.0 regexp_instr \u00b6 regexp_instr(str, regexp) - Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0. Arguments: str - a string expression. 
regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping any special characters in the pattern string. Examples: > SELECT regexp_instr(r\"\\abc\", r\"^\\\\abc$\"); 1 > SELECT regexp_instr('user@spark.apache.org', '@[^.]*'); 5 Since: 3.4.0 regexp_replace \u00b6 regexp_replace(str, regexp, rep[, position]) - Replaces all substrings of str that match regexp with rep . Arguments: str - a string expression to search for a regular expression pattern match. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping any special characters in the pattern string. rep - a string expression to replace matched substrings. position - a positive integer literal that indicates the position within str to begin searching. The default is 1. 
If position is greater than the number of characters in str , the result is str . Examples: > SELECT regexp_replace('100-200', '(\\\\d+)', 'num'); num-num > SELECT regexp_replace('100-200', r'(\\d+)', 'num'); num-num Since: 1.5.0 regexp_substr \u00b6 regexp_substr(str, regexp) - Returns the substring that matches the regular expression regexp within the string str . If the regular expression is not found, the result is null. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Examples: > SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en'); Steven > SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Jeck'); NULL Since: 3.4.0 repeat \u00b6 repeat(str, n) - Returns the string which repeats the given string value n times. Examples: > SELECT repeat('123', 2); 123123 Since: 1.5.0 replace \u00b6 replace(str, search[, replace]) - Replaces all occurrences of search with replace . Arguments: str - a string expression search - a string expression. If search is not found in str , str is returned unchanged. replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str . Examples: > SELECT replace('ABCabc', 'abc', 'DEF'); ABCDEF Since: 2.3.0 right \u00b6 right(str, len) - Returns the rightmost len ( len can be string type) characters from the string str . If len is less than or equal to 0, the result is an empty string. Examples: > SELECT right('Spark SQL', 3); SQL Since: 2.3.0 rpad \u00b6 rpad(str, len[, pad]) - Returns str , right-padded with pad to a length of len . If str is longer than len , the return value is shortened to len characters. If pad is not specified, str will be padded to the right with space characters if it is a character string, and with zeros if it is a binary string. Examples: > SELECT rpad('hi', 5, '??'); hi??? 
> SELECT rpad('hi', 1, '??'); h > SELECT rpad('hi', 5); hi > SELECT hex(rpad(unhex('aabb'), 5)); AABB000000 > SELECT hex(rpad(unhex('aabb'), 5, unhex('1122'))); AABB112211 Since: 1.5.0 rtrim \u00b6 rtrim(str) - Removes the trailing space characters from str . Arguments: str - a string expression Examples: > SELECT rtrim(' SparkSQL '); SparkSQL Since: 1.5.0 sentences \u00b6 sentences(str[, lang[, country]]) - Splits str into an array of arrays of words. Arguments: str - A STRING expression to be parsed. lang - An optional STRING expression with a language code from ISO 639 Alpha-2 (e.g. 'DE'), Alpha-3, or a language subtag of up to 8 characters. country - An optional STRING expression with a country code from ISO 3166 alpha-2 country code or a UN M.49 numeric-3 area code. Examples: > SELECT sentences('Hi there! Good morning.'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] > SELECT sentences('Hi there! Good morning.', 'en'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] > SELECT sentences('Hi there! Good morning.', 'en', 'US'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] Since: 2.0.0 soundex \u00b6 soundex(str) - Returns Soundex code of the string. Examples: > SELECT soundex('Miller'); M460 Since: 1.5.0 space \u00b6 space(n) - Returns a string consisting of n spaces. Examples: > SELECT concat(space(2), '1'); 1 Since: 1.5.0 split \u00b6 split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit . Arguments: str - a string expression to split. regex - a string representing a regular expression. The regex string should be a Java regular expression. limit - an integer expression which controls the number of times the regex is applied. limit > 0: The resulting array's length will not be more than limit , and the resulting array's last entry will contain all input beyond the last matched regex. 
limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size. Examples: > SELECT split('oneAtwoBthreeC', '[ABC]'); [\"one\",\"two\",\"three\",\"\"] > SELECT split('oneAtwoBthreeC', '[ABC]', -1); [\"one\",\"two\",\"three\",\"\"] > SELECT split('oneAtwoBthreeC', '[ABC]', 2); [\"one\",\"twoBthreeC\"] Since: 1.5.0 split_part \u00b6 split_part(str, delimiter, partNum) - Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of the split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, the str is not split. Examples: > SELECT split_part('11.12.13', '.', 3); 13 Since: 3.3.0 startswith \u00b6 startswith(left, right) - Returns a boolean. The value is True if left starts with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type. Examples: > SELECT startswith('Spark SQL', 'Spark'); true > SELECT startswith('Spark SQL', 'SQL'); false > SELECT startswith('Spark SQL', null); NULL > SELECT startswith(x'537061726b2053514c', x'537061726b'); true > SELECT startswith(x'537061726b2053514c', x'53514c'); false Since: 3.3.0 substr \u00b6 substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . 
Examples: > SELECT substr('Spark SQL', 5); k SQL > SELECT substr('Spark SQL', -3); SQL > SELECT substr('Spark SQL', 5, 1); k > SELECT substr('Spark SQL' FROM 5); k SQL > SELECT substr('Spark SQL' FROM -3); SQL > SELECT substr('Spark SQL' FROM 5 FOR 1); k > SELECT substr(encode('Spark SQL', 'utf-8'), 5); k SQL Since: 1.5.0 substring \u00b6 substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . substring(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . Examples: > SELECT substring('Spark SQL', 5); k SQL > SELECT substring('Spark SQL', -3); SQL > SELECT substring('Spark SQL', 5, 1); k > SELECT substring('Spark SQL' FROM 5); k SQL > SELECT substring('Spark SQL' FROM -3); SQL > SELECT substring('Spark SQL' FROM 5 FOR 1); k > SELECT substring(encode('Spark SQL', 'utf-8'), 5); k SQL Since: 1.5.0 substring_index \u00b6 substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim . If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim . Examples: > SELECT substring_index('www.apache.org', '.', 2); www.apache Since: 1.5.0 to_binary \u00b6 to_binary(str[, fmt]) - Converts the input str to a binary value based on the supplied fmt . fmt can be a case-insensitive string literal of \"hex\", \"utf-8\", \"utf8\", or \"base64\". By default, the binary format for conversion is \"hex\" if fmt is omitted. The function returns NULL if at least one of the input parameters is NULL. 
Examples: > SELECT to_binary('abc', 'utf-8'); abc Since: 3.3.0 to_char \u00b6 to_char(expr, format) - Convert expr to a string based on the format . Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns . If expr is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string. 
Examples: > SELECT to_char(454, '999'); 454 > SELECT to_char(454.00, '000D00'); 454.00 > SELECT to_char(12454, '99G999'); 12,454 > SELECT to_char(78.12, '$99.99'); $78.12 > SELECT to_char(-12454.8, '99G999D9S'); 12,454.8- > SELECT to_char(date'2016-04-08', 'y'); 2016 > SELECT to_char(x'537061726b2053514c', 'base64'); U3BhcmsgU1FM > SELECT to_char(x'537061726b2053514c', 'hex'); 537061726B2053514C > SELECT to_char(encode('abc', 'utf-8'), 'utf-8'); abc Since: 3.4.0 to_number \u00b6 to_number(expr, fmt) - Convert string 'expr' to a number based on the string format 'fmt'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not. 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets. ('<1>'). 
Examples: > SELECT to_number('454', '999'); 454 > SELECT to_number('454.00', '000.00'); 454.00 > SELECT to_number('12,454', '99,999'); 12454 > SELECT to_number('$78.12', '$99.99'); 78.12 > SELECT to_number('12,454.8-', '99,999.9S'); -12454.8 Since: 3.3.0 to_varchar \u00b6 to_varchar(expr, format) - Convert expr to a string based on the format . Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns . If expr is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string. 
Examples: > SELECT to_varchar(454, '999'); 454 > SELECT to_varchar(454.00, '000D00'); 454.00 > SELECT to_varchar(12454, '99G999'); 12,454 > SELECT to_varchar(78.12, '$99.99'); $78.12 > SELECT to_varchar(-12454.8, '99G999D9S'); 12,454.8- > SELECT to_varchar(date'2016-04-08', 'y'); 2016 > SELECT to_varchar(x'537061726b2053514c', 'base64'); U3BhcmsgU1FM > SELECT to_varchar(x'537061726b2053514c', 'hex'); 537061726B2053514C > SELECT to_varchar(encode('abc', 'utf-8'), 'utf-8'); abc Since: 3.5.0 translate \u00b6 translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. Examples: > SELECT translate('AaBbCc', 'abc', '123'); A1B2C3 Since: 1.5.0 trim \u00b6 trim(str) - Removes the leading and trailing space characters from str . trim(BOTH FROM str) - Removes the leading and trailing space characters from str . trim(LEADING FROM str) - Removes the leading space characters from str . trim(TRAILING FROM str) - Removes the trailing space characters from str . trim(trimStr FROM str) - Remove the leading and trailing trimStr characters from str . trim(BOTH trimStr FROM str) - Remove the leading and trailing trimStr characters from str . trim(LEADING trimStr FROM str) - Remove the leading trimStr characters from str . trim(TRAILING trimStr FROM str) - Remove the trailing trimStr characters from str . 
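The trimStr forms of trim above behave like Python's str.strip family, which also treats its argument as a set of characters to remove from the ends (a rough analogy, not Spark itself):

```python
# str.strip/lstrip/rstrip remove any of the given characters from the
# ends, mirroring trim(BOTH/LEADING/TRAILING trimStr FROM str).
s = 'SSparkSQLS'
print(s.strip('SL'))   # parkSQ
print(s.lstrip('SL'))  # parkSQLS
print(s.rstrip('SL'))  # SSparkSQ
```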
Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space BOTH, FROM - these are keywords to specify trimming string characters from both ends of the string LEADING, FROM - these are keywords to specify trimming string characters from the left end of the string TRAILING, FROM - these are keywords to specify trimming string characters from the right end of the string Examples: > SELECT trim(' SparkSQL '); SparkSQL > SELECT trim(BOTH FROM ' SparkSQL '); SparkSQL > SELECT trim(LEADING FROM ' SparkSQL '); SparkSQL > SELECT trim(TRAILING FROM ' SparkSQL '); SparkSQL > SELECT trim('SL' FROM 'SSparkSQLS'); parkSQ > SELECT trim(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ > SELECT trim(LEADING 'SL' FROM 'SSparkSQLS'); parkSQLS > SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS'); SSparkSQ Since: 1.5.0 try_to_binary \u00b6 try_to_binary(str[, fmt]) - This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed. Examples: > SELECT try_to_binary('abc', 'utf-8'); abc > select try_to_binary('a!', 'base64'); NULL > select try_to_binary('abc', 'invalidFormat'); NULL Since: 3.3.0 try_to_number \u00b6 try_to_number(expr, fmt) - Convert string 'expr' to a number based on the string format fmt . Returns NULL if the string 'expr' does not match the expected format. The format follows the same semantics as the to_number function. Examples: > SELECT try_to_number('454', '999'); 454 > SELECT try_to_number('454.00', '000.00'); 454.00 > SELECT try_to_number('12,454', '99,999'); 12454 > SELECT try_to_number('$78.12', '$99.99'); 78.12 > SELECT try_to_number('12,454.8-', '99,999.9S'); -12454.8 Since: 3.3.0 try_validate_utf8 \u00b6 try_validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns NULL. 
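The try_validate_utf8 behavior above can be mimicked with Python's UTF-8 decoder; None stands in for NULL (an analogy, not Spark's validator):

```python
# Sketch of try_validate_utf8: return the decoded string if the bytes
# form valid UTF-8, otherwise None (standing in for NULL).
def try_validate_utf8(raw):
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return None

print(try_validate_utf8(bytes([0x61])))              # a
print(try_validate_utf8(bytes([0x80])))              # None
print(try_validate_utf8(bytes([0x61, 0xC2, 0x62])))  # None
```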
Arguments: str - a string expression Examples: > SELECT try_validate_utf8('Spark'); Spark > SELECT try_validate_utf8(x'61'); a > SELECT try_validate_utf8(x'80'); NULL > SELECT try_validate_utf8(x'61C262'); NULL Since: 4.0.0 ucase \u00b6 ucase(str) - Returns str with all characters changed to uppercase. Examples: > SELECT ucase('SparkSql'); SPARKSQL Since: 1.0.1 unbase64 \u00b6 unbase64(str) - Converts the argument from a base 64 string str to a binary. Examples: > SELECT unbase64('U3BhcmsgU1FM'); Spark SQL Since: 1.5.0 upper \u00b6 upper(str) - Returns str with all characters changed to uppercase. Examples: > SELECT upper('SparkSql'); SPARKSQL Since: 1.0.1 validate_utf8 \u00b6 validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise throws an exception. Arguments: str - a string expression Examples: > SELECT validate_utf8('Spark'); Spark > SELECT validate_utf8(x'61'); a Since: 4.0.0 || \u00b6 expr1 || expr2 - Returns the concatenation of expr1 and expr2 . Examples: > SELECT 'Spark' || 'SQL'; SparkSQL > SELECT array(1, 2, 3) || array(4, 5) || array(6); [1,2,3,4,5,6] Note: || for arrays is available since 2.4.0. Since: 2.3.0","title":"String Functions"},{"location":"string-functions/#string-functions","text":"This page lists all string functions available in Spark SQL.","title":"String Functions"},{"location":"string-functions/#ascii","text":"ascii(str) - Returns the numeric value of the first character of str . Examples: > SELECT ascii('222'); 50 > SELECT ascii(2); 50 Since: 1.5.0","title":"ascii"},{"location":"string-functions/#base64","text":"base64(bin) - Converts the argument from a binary bin to a base 64 string. Examples: > SELECT base64('Spark SQL'); U3BhcmsgU1FM > SELECT base64(x'537061726b2053514c'); U3BhcmsgU1FM Since: 1.5.0","title":"base64"},{"location":"string-functions/#bit_length","text":"bit_length(expr) - Returns the bit length of string data or number of bits of binary data. 
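The base64/unbase64 round trip documented above can be mirrored with Python's standard library (illustration only):

```python
import base64

# base64('Spark SQL') and unbase64 equivalents via the stdlib.
encoded = base64.b64encode(b'Spark SQL').decode('ascii')
print(encoded)                                    # U3BhcmsgU1FM
print(base64.b64decode(encoded).decode('utf-8'))  # Spark SQL
```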
Examples: > SELECT bit_length('Spark SQL'); 72 > SELECT bit_length(x'537061726b2053514c'); 72 Since: 2.3.0","title":"bit_length"},{"location":"string-functions/#btrim","text":"btrim(str) - Removes the leading and trailing space characters from str . btrim(str, trimStr) - Remove the leading and trailing trimStr characters from str . Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space Examples: > SELECT btrim(' SparkSQL '); SparkSQL > SELECT btrim(encode(' SparkSQL ', 'utf-8')); SparkSQL > SELECT btrim('SSparkSQLS', 'SL'); parkSQ > SELECT btrim(encode('SSparkSQLS', 'utf-8'), encode('SL', 'utf-8')); parkSQ Since: 3.2.0","title":"btrim"},{"location":"string-functions/#char","text":"char(expr) - Returns the ASCII character having the binary equivalent to expr . If n is larger than 256 the result is equivalent to chr(n % 256) Examples: > SELECT char(65); A Since: 2.3.0","title":"char"},{"location":"string-functions/#char_length","text":"char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT char_length('Spark SQL '); 10 > SELECT char_length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 2.3.0","title":"char_length"},{"location":"string-functions/#character_length","text":"character_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. 
Examples: > SELECT character_length('Spark SQL '); 10 > SELECT character_length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 2.3.0","title":"character_length"},{"location":"string-functions/#chr","text":"chr(expr) - Returns the ASCII character having the binary equivalent to expr . If n is larger than 256 the result is equivalent to chr(n % 256) Examples: > SELECT chr(65); A Since: 2.3.0","title":"chr"},{"location":"string-functions/#collate","text":"collate(expr, collationName) - Marks a given expression with the specified collation. Arguments: expr - String expression to perform collation on. collationName - Foldable string expression that specifies the collation name. Examples: > SELECT COLLATION('Spark SQL' collate UTF8_LCASE); SYSTEM.BUILTIN.UTF8_LCASE Since: 4.0.0","title":"collate"},{"location":"string-functions/#collation","text":"collation(expr) - Returns the collation name of a given expression. Arguments: expr - String expression to perform collation on. Examples: > SELECT collation('Spark SQL'); SYSTEM.BUILTIN.UTF8_BINARY Since: 4.0.0","title":"collation"},{"location":"string-functions/#concat_ws","text":"concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep , skipping null values. Examples: > SELECT concat_ws(' ', 'Spark', 'SQL'); Spark SQL > SELECT concat_ws('s'); > SELECT concat_ws('/', 'foo', null, 'bar'); foo/bar > SELECT concat_ws(null, 'Spark', 'SQL'); NULL Since: 1.5.0","title":"concat_ws"},{"location":"string-functions/#contains","text":"contains(left, right) - Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left or right must be of STRING or BINARY type. 
Examples: > SELECT contains('Spark SQL', 'Spark'); true > SELECT contains('Spark SQL', 'SPARK'); false > SELECT contains('Spark SQL', null); NULL > SELECT contains(x'537061726b2053514c', x'537061726b'); true Since: 3.3.0","title":"contains"},{"location":"string-functions/#decode","text":"decode(bin, charset) - Decodes the first argument using the second argument character set. If either argument is null, the result will also be null. decode(expr, search, result [, search, result ] ... [, default]) - Compares expr to each search value in order. If expr is equal to a search value, decode returns the corresponding result. If no match is found, then it returns default. If default is omitted, it returns null. Arguments: bin - a binary expression to decode charset - one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32' to decode bin into a STRING. It is case insensitive. Examples: > SELECT decode(encode('abc', 'utf-8'), 'utf-8'); abc > SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); San Francisco > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); Non domestic > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle'); NULL > SELECT decode(null, 6, 'Spark', NULL, 'SQL', 4, 'rocks'); SQL Note: decode(expr, search, result [, search, result ] ... [, default]) is supported since 3.2.0 Since: 1.5.0","title":"decode"},{"location":"string-functions/#elt","text":"elt(n, input1, input2, ...) - Returns the n -th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices. 
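The search/result form of decode above can be sketched as a walk over pairs; None stands in for NULL and, per the docs, NULL matches NULL:

```python
# Sketch of decode(expr, search, result, ..., default): compare expr to
# each search value in order; an odd trailing argument is the default.
def decode_pairs(expr, *args):
    items = list(args)
    default = items.pop() if len(items) % 2 == 1 else None
    for search, result in zip(items[0::2], items[1::2]):
        if expr == search or (expr is None and search is None):
            return result
    return default

print(decode_pairs(2, 1, 'Southlake', 2, 'San Francisco', 'Non domestic'))  # San Francisco
print(decode_pairs(6, 1, 'Southlake', 2, 'San Francisco', 'Non domestic'))  # Non domestic
print(decode_pairs(None, 6, 'Spark', None, 'SQL', 4, 'rocks'))              # SQL
```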
Examples: > SELECT elt(1, 'scala', 'java'); scala > SELECT elt(2, 'a', 1); 1 Since: 2.0.0","title":"elt"},{"location":"string-functions/#encode","text":"encode(str, charset) - Encodes the first argument using the second argument character set. If either argument is null, the result will also be null. Arguments: str - a string expression charset - one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32' to encode str into a BINARY. It is case insensitive. Examples: > SELECT encode('abc', 'utf-8'); abc Since: 1.5.0","title":"encode"},{"location":"string-functions/#endswith","text":"endswith(left, right) - Returns a boolean. The value is True if left ends with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type. Examples: > SELECT endswith('Spark SQL', 'SQL'); true > SELECT endswith('Spark SQL', 'Spark'); false > SELECT endswith('Spark SQL', null); NULL > SELECT endswith(x'537061726b2053514c', x'537061726b'); false > SELECT endswith(x'537061726b2053514c', x'53514c'); true Since: 3.3.0","title":"endswith"},{"location":"string-functions/#find_in_set","text":"find_in_set(str, str_array) - Returns the index (1-based) of the given string ( str ) in the comma-delimited list ( str_array ). Returns 0, if the string was not found or if the given string ( str ) contains a comma. Examples: > SELECT find_in_set('ab','abc,b,ab,c,def'); 3 Since: 1.5.0","title":"find_in_set"},{"location":"string-functions/#format_number","text":"format_number(expr1, expr2) - Formats the number expr1 like '#,###,###.##', rounded to expr2 decimal places. If expr2 is 0, the result has no decimal point or fractional part. expr2 also accepts a user-specified format. This is supposed to function like MySQL's FORMAT. 
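The find_in_set lookup described above can be sketched directly in Python:

```python
# find_in_set sketch: 1-based position in a comma-delimited list, 0 when
# the needle is absent or itself contains a comma.
def find_in_set(needle, csv):
    if ',' in needle:
        return 0
    parts = csv.split(',')
    return parts.index(needle) + 1 if needle in parts else 0

print(find_in_set('ab', 'abc,b,ab,c,def'))  # 3
print(find_in_set('xy', 'abc,b,ab,c,def'))  # 0
```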
Examples: > SELECT format_number(12332.123456, 4); 12,332.1235 > SELECT format_number(12332.123456, '##################.###'); 12332.123 Since: 1.5.0","title":"format_number"},{"location":"string-functions/#format_string","text":"format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings. Examples: > SELECT format_string(\"Hello World %d %s\", 100, \"days\"); Hello World 100 days Since: 1.5.0","title":"format_string"},{"location":"string-functions/#initcap","text":"initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space. Examples: > SELECT initcap('sPark sql'); Spark Sql Since: 1.5.0","title":"initcap"},{"location":"string-functions/#instr","text":"instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str . Examples: > SELECT instr('SparkSQL', 'SQL'); 6 Since: 1.5.0","title":"instr"},{"location":"string-functions/#is_valid_utf8","text":"is_valid_utf8(str) - Returns true if str is a valid UTF-8 string, otherwise returns false. Arguments: str - a string expression Examples: > SELECT is_valid_utf8('Spark'); true > SELECT is_valid_utf8(x'61'); true > SELECT is_valid_utf8(x'80'); false > SELECT is_valid_utf8(x'61C262'); false Since: 4.0.0","title":"is_valid_utf8"},{"location":"string-functions/#lcase","text":"lcase(str) - Returns str with all characters changed to lowercase. Examples: > SELECT lcase('SparkSql'); sparksql Since: 1.0.1","title":"lcase"},{"location":"string-functions/#left","text":"left(str, len) - Returns the leftmost len ( len can be string type) characters from the string str . If len is less than or equal to 0, the result is an empty string. Examples: > SELECT left('Spark SQL', 3); Spa > SELECT left(encode('Spark SQL', 'utf-8'), 3); Spa Since: 2.3.0","title":"left"},{"location":"string-functions/#len","text":"len(expr) - Returns the character length of string data or number of bytes of binary data. 
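The instr convention above (1-based index, 0 when not found) maps onto Python's str.find, which is 0-based and returns -1 on a miss:

```python
# instr sketch: shifting str.find by one gives the documented 1-based
# index with 0 meaning not found.
def instr(s, sub):
    return s.find(sub) + 1

print(instr('SparkSQL', 'SQL'))  # 6
print(instr('SparkSQL', 'xyz'))  # 0
```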
The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT len('Spark SQL '); 10 > SELECT len(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 3.4.0","title":"len"},{"location":"string-functions/#length","text":"length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros. Examples: > SELECT length('Spark SQL '); 10 > SELECT length(x'537061726b2053514c'); 9 > SELECT CHAR_LENGTH('Spark SQL '); 10 > SELECT CHARACTER_LENGTH('Spark SQL '); 10 Since: 1.5.0","title":"length"},{"location":"string-functions/#levenshtein","text":"levenshtein(str1, str2[, threshold]) - Returns the Levenshtein distance between the two given strings. If threshold is set and the distance is greater than it, returns -1. Examples: > SELECT levenshtein('kitten', 'sitting'); 3 > SELECT levenshtein('kitten', 'sitting', 2); -1 Since: 1.5.0","title":"levenshtein"},{"location":"string-functions/#locate","text":"locate(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos . The given pos and return value are 1-based. Examples: > SELECT locate('bar', 'foobarbar'); 4 > SELECT locate('bar', 'foobarbar', 5); 7 > SELECT POSITION('bar' IN 'foobarbar'); 4 Since: 1.5.0","title":"locate"},{"location":"string-functions/#lower","text":"lower(str) - Returns str with all characters changed to lowercase. Examples: > SELECT lower('SparkSql'); sparksql Since: 1.0.1","title":"lower"},{"location":"string-functions/#lpad","text":"lpad(str, len[, pad]) - Returns str , left-padded with pad to a length of len . If str is longer than len , the return value is shortened to len characters or bytes. 
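The lpad rule above can be sketched for character strings; the pad is tiled and truncated, since Python's str.rjust only accepts a single fill character (a sketch, not Spark's implementation):

```python
# lpad sketch: tile the pad string, truncate to the missing width,
# prepend; over-long inputs are cut to len characters.
def lpad(s, n, pad=' '):
    if len(s) >= n:
        return s[:n]
    return (pad * n)[:n - len(s)] + s

print(lpad('hi', 5, '??'))  # ???hi
print(lpad('hi', 1, '??'))  # h
```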
If pad is not specified, str will be padded to the left with space characters if it is a character string, and with zeros if it is a byte sequence. Examples: > SELECT lpad('hi', 5, '??'); ???hi > SELECT lpad('hi', 1, '??'); h > SELECT lpad('hi', 5); hi > SELECT hex(lpad(unhex('aabb'), 5)); 000000AABB > SELECT hex(lpad(unhex('aabb'), 5, unhex('1122'))); 112211AABB Since: 1.5.0","title":"lpad"},{"location":"string-functions/#ltrim","text":"ltrim(str) - Removes the leading space characters from str . Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space Examples: > SELECT ltrim(' SparkSQL '); SparkSQL Since: 1.5.0","title":"ltrim"},{"location":"string-functions/#luhn_check","text":"luhn_check(str ) - Checks that a string of digits is valid according to the Luhn algorithm. This checksum function is widely applied on credit card numbers and government identification numbers to distinguish valid numbers from mistyped, incorrect numbers. Examples: > SELECT luhn_check('8112189876'); true > SELECT luhn_check('79927398713'); true > SELECT luhn_check('79927398714'); false Since: 3.5.0","title":"luhn_check"},{"location":"string-functions/#make_valid_utf8","text":"make_valid_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns a new string whose invalid UTF8 byte sequences are replaced using the UNICODE replacement character U+FFFD. Arguments: str - a string expression Examples: > SELECT make_valid_utf8('Spark'); Spark > SELECT make_valid_utf8(x'61'); a > SELECT make_valid_utf8(x'80'); \ufffd > SELECT make_valid_utf8(x'61C262'); a\ufffdb Since: 4.0.0","title":"make_valid_utf8"},{"location":"string-functions/#mask","text":"mask(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value. The function replaces characters with 'X' or 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed. 
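The Luhn algorithm behind luhn_check above is the standard checksum: double every second digit from the right, subtract 9 when the result exceeds 9, and require the total to be divisible by 10 (a textbook sketch, assumed to match for ASCII digit strings):

```python
# Standard Luhn checksum sketch for digit strings.
def luhn_check(s):
    if not s.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:       # every second digit from the right
            d = d * 2
            if d > 9:
                d = d - 9
        total += d
    return total % 10 == 0

print(luhn_check('79927398713'))  # True
print(luhn_check('79927398714'))  # False
```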
Arguments: input - string value to mask. Supported types: STRING, VARCHAR, CHAR upperChar - character to replace upper-case characters with. Specify NULL to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify NULL to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify NULL to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify NULL to retain original character. Default value: NULL Examples: > SELECT mask('abcd-EFGH-8765-4321'); xxxx-XXXX-nnnn-nnnn > SELECT mask('abcd-EFGH-8765-4321', 'Q'); xxxx-QQQQ-nnnn-nnnn > SELECT mask('AbCD123-@$#', 'Q', 'q'); QqQQnnn-@$# > SELECT mask('AbCD123-@$#'); XxXXnnn-@$# > SELECT mask('AbCD123-@$#', 'Q'); QxQQnnn-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q'); QqQQnnn-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q', 'd'); QqQQddd-@$# > SELECT mask('AbCD123-@$#', 'Q', 'q', 'd', 'o'); QqQQdddoooo > SELECT mask('AbCD123-@$#', NULL, 'q', 'd', 'o'); AqCDdddoooo > SELECT mask('AbCD123-@$#', NULL, NULL, 'd', 'o'); AbCDdddoooo > SELECT mask('AbCD123-@$#', NULL, NULL, NULL, 'o'); AbCD123oooo > SELECT mask(NULL, NULL, NULL, NULL, 'o'); NULL > SELECT mask(NULL); NULL > SELECT mask('AbCD123-@$#', NULL, NULL, NULL, NULL); AbCD123-@$# Since: 3.4.0","title":"mask"},{"location":"string-functions/#octet_length","text":"octet_length(expr) - Returns the byte length of string data or number of bytes of binary data. Examples: > SELECT octet_length('Spark SQL'); 9 > SELECT octet_length(x'537061726b2053514c'); 9 Since: 2.3.0","title":"octet_length"},{"location":"string-functions/#overlay","text":"overlay(input, replace, pos[, len]) - Replace input with replace that starts at pos and is of length len . 
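The overlay operation above reduces to simple slicing on 1-based positions; when len is omitted it defaults to the length of the replacement (a sketch for strings only):

```python
# overlay sketch: replace `length` characters of s starting at 1-based
# position pos with rep; default length is len(rep).
def overlay(s, rep, pos, length=None):
    if length is None:
        length = len(rep)
    return s[:pos - 1] + rep + s[pos - 1 + length:]

print(overlay('Spark SQL', '_', 6))             # Spark_SQL
print(overlay('Spark SQL', 'ANSI ', 7, 0))      # Spark ANSI SQL
print(overlay('Spark SQL', 'tructured', 2, 4))  # Structured SQL
```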
Examples: > SELECT overlay('Spark SQL' PLACING '_' FROM 6); Spark_SQL > SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7); Spark CORE > SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0); Spark ANSI SQL > SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4); Structured SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6); Spark_SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7); Spark CORE > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0); Spark ANSI SQL > SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4); Structured SQL Since: 3.0.0","title":"overlay"},{"location":"string-functions/#position","text":"position(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos . The given pos and return value are 1-based. Examples: > SELECT position('bar', 'foobarbar'); 4 > SELECT position('bar', 'foobarbar', 5); 7 > SELECT POSITION('bar' IN 'foobarbar'); 4 Since: 2.3.0","title":"position"},{"location":"string-functions/#printf","text":"printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings. Examples: > SELECT printf(\"Hello World %d %s\", 100, \"days\"); Hello World 100 days Since: 1.5.0","title":"printf"},{"location":"string-functions/#quote","text":"quote(str) - Returns str enclosed by single quotes and each instance of single quote in it is preceded by a backslash. Examples: > SELECT quote('Don\\'t'); 'Don\\'t' Since: 4.1.0","title":"quote"},{"location":"string-functions/#randstr","text":"randstr(length[, seed]) - Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z. The random seed is optional. The string length must be a constant two-byte or four-byte integer (SMALLINT or INT, respectively). 
Examples: > SELECT randstr(3, 0) AS result; ceV Since: 4.0.0","title":"randstr"},{"location":"string-functions/#regexp_count","text":"regexp_count(str, regexp) - Returns a count of the number of times that the regular expression pattern regexp is matched in the string str . Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Examples: > SELECT regexp_count('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en'); 2 > SELECT regexp_count('abcdefghijklmnopqrstuvwxyz', '[a-z]{3}'); 8 Since: 3.4.0","title":"regexp_count"},{"location":"string-functions/#regexp_extract","text":"regexp_extract(str, regexp[, idx]) - Extracts the first string in str that matches the regexp expression, corresponding to the regex group index. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index. 
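The group-index convention above carries over to Python's re module, which can serve as a quick mental model (Java and Python regex dialects differ in details):

```python
import re

# regexp_extract analogue: group 1 of the first match; group 0 is the
# entire match, as with idx = 0 above.
m = re.search('([0-9]+)-([0-9]+)', '100-200')
print(m.group(1))  # 100
print(m.group(0))  # 100-200
```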
Examples: > SELECT regexp_extract('100-200', '(\\\\d+)-(\\\\d+)', 1); 100 > SELECT regexp_extract('100-200', r'(\\d+)-(\\d+)', 1); 100 Since: 1.5.0","title":"regexp_extract"},{"location":"string-functions/#regexp_extract_all","text":"regexp_extract_all(str, regexp[, idx]) - Extracts all strings in str that match the regexp expression, corresponding to the regex group index. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index. Examples: > SELECT regexp_extract_all('100-200, 300-400', '(\\\\d+)-(\\\\d+)', 1); [\"100\",\"300\"] > SELECT regexp_extract_all('100-200, 300-400', r'(\\d+)-(\\d+)', 1); [\"100\",\"300\"] Since: 3.1.0","title":"regexp_extract_all"},{"location":"string-functions/#regexp_instr","text":"regexp_instr(str, regexp) - Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. 
Positions are 1-based, not 0-based. If no match is found, returns 0. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. Examples: > SELECT regexp_instr(r\"\\abc\", r\"^\\\\abc$\"); 1 > SELECT regexp_instr('user@spark.apache.org', '@[^.]*'); 5 Since: 3.4.0","title":"regexp_instr"},{"location":"string-functions/#regexp_replace","text":"regexp_replace(str, regexp, rep[, position]) - Replaces all substrings of str that match regexp with rep . Arguments: str - a string expression to search for a regular expression pattern match. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser, see the unescaping rules at String Literal . For example, to match \"\\abc\", a regular expression for regexp can be \"^\\abc$\". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match \"\\abc\" is \"^\\abc$\". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any. rep - a string expression to replace matched substrings. 
position - a positive integer literal that indicates the position within str to begin searching. The default is 1. If position is greater than the number of characters in str , the result is str . Examples: > SELECT regexp_replace('100-200', '(\\\\d+)', 'num'); num-num > SELECT regexp_replace('100-200', r'(\\d+)', 'num'); num-num Since: 1.5.0","title":"regexp_replace"},{"location":"string-functions/#regexp_substr","text":"regexp_substr(str, regexp) - Returns the substring that matches the regular expression regexp within the string str . If the regular expression is not found, the result is null. Arguments: str - a string expression. regexp - a string representing a regular expression. The regex string should be a Java regular expression. Examples: > SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en'); Steven > SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Jeck'); NULL Since: 3.4.0","title":"regexp_substr"},{"location":"string-functions/#repeat","text":"repeat(str, n) - Returns the string which repeats the given string value n times. Examples: > SELECT repeat('123', 2); 123123 Since: 1.5.0","title":"repeat"},{"location":"string-functions/#replace","text":"replace(str, search[, replace]) - Replaces all occurrences of search with replace . Arguments: str - a string expression search - a string expression. If search is not found in str , str is returned unchanged. replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str . Examples: > SELECT replace('ABCabc', 'abc', 'DEF'); ABCDEF Since: 2.3.0","title":"replace"},{"location":"string-functions/#right","text":"right(str, len) - Returns the rightmost len ( len can be string type) characters from the string str . If len is less than or equal to 0, the result is an empty string. 
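The replace-all-matches behavior of regexp_replace above corresponds to re.sub in Python, which also substitutes every non-overlapping match:

```python
import re

# regexp_replace analogue: re.sub replaces every non-overlapping match.
print(re.sub('[0-9]+', 'num', '100-200'))  # num-num
```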
Examples: > SELECT right('Spark SQL', 3); SQL Since: 2.3.0","title":"right"},{"location":"string-functions/#rpad","text":"rpad(str, len[, pad]) - Returns str , right-padded with pad to a length of len . If str is longer than len , the return value is shortened to len characters. If pad is not specified, str will be padded to the right with space characters if it is a character string, and with zeros if it is a binary string. Examples: > SELECT rpad('hi', 5, '??'); hi??? > SELECT rpad('hi', 1, '??'); h > SELECT rpad('hi', 5); hi > SELECT hex(rpad(unhex('aabb'), 5)); AABB000000 > SELECT hex(rpad(unhex('aabb'), 5, unhex('1122'))); AABB112211 Since: 1.5.0","title":"rpad"},{"location":"string-functions/#rtrim","text":"rtrim(str) - Removes the trailing space characters from str . Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space Examples: > SELECT rtrim(' SparkSQL '); SparkSQL Since: 1.5.0","title":"rtrim"},{"location":"string-functions/#sentences","text":"sentences(str[, lang[, country]]) - Splits str into an array of array of words. Arguments: str - A STRING expression to be parsed. lang - An optional STRING expression with a language code from ISO 639 Alpha-2 (e.g. 'DE'), Alpha-3, or a language subtag of up to 8 characters. country - An optional STRING expression with a country code from ISO 3166 alpha-2 country code or a UN M.49 numeric-3 area code. Examples: > SELECT sentences('Hi there! Good morning.'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] > SELECT sentences('Hi there! Good morning.', 'en'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] > SELECT sentences('Hi there! Good morning.', 'en', 'US'); [[\"Hi\",\"there\"],[\"Good\",\"morning\"]] Since: 2.0.0","title":"sentences"},{"location":"string-functions/#soundex","text":"soundex(str) - Returns Soundex code of the string. 
Examples: > SELECT soundex('Miller'); M460 Since: 1.5.0","title":"soundex"},{"location":"string-functions/#space","text":"space(n) - Returns a string consisting of n spaces. Examples: > SELECT concat(space(2), '1'); 1 Since: 1.5.0","title":"space"},{"location":"string-functions/#split","text":"split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit . Arguments: str - a string expression to split. regex - a string representing a regular expression. The regex string should be a Java regular expression. limit - an integer expression which controls the number of times the regex is applied. limit > 0: The resulting array's length will not be more than limit , and the resulting array's last entry will contain all input beyond the last matched regex. limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size. Examples: > SELECT split('oneAtwoBthreeC', '[ABC]'); [\"one\",\"two\",\"three\",\"\"] > SELECT split('oneAtwoBthreeC', '[ABC]', -1); [\"one\",\"two\",\"three\",\"\"] > SELECT split('oneAtwoBthreeC', '[ABC]', 2); [\"one\",\"twoBthreeC\"] Since: 1.5.0","title":"split"},{"location":"string-functions/#split_part","text":"split_part(str, delimiter, partNum) - Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of the split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, the str is not split. Examples: > SELECT split_part('11.12.13', '.', 3); 13 Since: 3.3.0","title":"split_part"},{"location":"string-functions/#startswith","text":"startswith(left, right) - Returns a boolean. The value is True if left starts with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type. 
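The split_part rules above (1-based selection, negative counting from the end, empty string when out of range) can be sketched as follows; null handling is omitted for brevity:

```python
# split_part sketch: 1-based part selection with negative indexing from
# the end; out-of-range partNum yields an empty string.
def split_part(s, delim, n):
    if n == 0:
        raise ValueError('partNum must not be 0')
    parts = [s] if delim == '' else s.split(delim)
    idx = n - 1 if n > 0 else n
    if -len(parts) <= idx < len(parts):
        return parts[idx]
    return ''

print(split_part('11.12.13', '.', 3))   # 13
print(split_part('11.12.13', '.', -1))  # 13
print(split_part('11.12.13', '.', 9))   # (empty string)
```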
Examples: > SELECT startswith('Spark SQL', 'Spark'); true > SELECT startswith('Spark SQL', 'SQL'); false > SELECT startswith('Spark SQL', null); NULL > SELECT startswith(x'537061726b2053514c', x'537061726b'); true > SELECT startswith(x'537061726b2053514c', x'53514c'); false Since: 3.3.0","title":"startswith"},{"location":"string-functions/#substr","text":"substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . Examples: > SELECT substr('Spark SQL', 5); k SQL > SELECT substr('Spark SQL', -3); SQL > SELECT substr('Spark SQL', 5, 1); k > SELECT substr('Spark SQL' FROM 5); k SQL > SELECT substr('Spark SQL' FROM -3); SQL > SELECT substr('Spark SQL' FROM 5 FOR 1); k > SELECT substr(encode('Spark SQL', 'utf-8'), 5); k SQL Since: 1.5.0","title":"substr"},{"location":"string-functions/#substring","text":"substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . substring(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len , or the slice of byte array that starts at pos and is of length len . Examples: > SELECT substring('Spark SQL', 5); k SQL > SELECT substring('Spark SQL', -3); SQL > SELECT substring('Spark SQL', 5, 1); k > SELECT substring('Spark SQL' FROM 5); k SQL > SELECT substring('Spark SQL' FROM -3); SQL > SELECT substring('Spark SQL' FROM 5 FOR 1); k > SELECT substring(encode('Spark SQL', 'utf-8'), 5); k SQL Since: 1.5.0","title":"substring"},{"location":"string-functions/#substring_index","text":"substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim . 
If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim . Examples: > SELECT substring_index('www.apache.org', '.', 2); www.apache Since: 1.5.0","title":"substring_index"},{"location":"string-functions/#to_binary","text":"to_binary(str[, fmt]) - Converts the input str to a binary value based on the supplied fmt . fmt can be a case-insensitive string literal of \"hex\", \"utf-8\", \"utf8\", or \"base64\". By default, the binary format for conversion is \"hex\" if fmt is omitted. The function returns NULL if at least one of the input parameters is NULL. Examples: > SELECT to_binary('abc', 'utf-8'); abc Since: 3.3.0","title":"to_binary"},{"location":"string-functions/#to_char","text":"to_char(expr, format) - Convert expr to a string based on the format . Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 
'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns . If expr is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string. Examples: > SELECT to_char(454, '999'); 454 > SELECT to_char(454.00, '000D00'); 454.00 > SELECT to_char(12454, '99G999'); 12,454 > SELECT to_char(78.12, '$99.99'); $78.12 > SELECT to_char(-12454.8, '99G999D9S'); 12,454.8- > SELECT to_char(date'2016-04-08', 'y'); 2016 > SELECT to_char(x'537061726b2053514c', 'base64'); U3BhcmsgU1FM > SELECT to_char(x'537061726b2053514c', 'hex'); 537061726B2053514C > SELECT to_char(encode('abc', 'utf-8'), 'utf-8'); abc Since: 3.4.0","title":"to_char"},{"location":"string-functions/#to_number","text":"to_number(expr, fmt) - Convert string 'expr' to a number based on the string format 'fmt'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). 
There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not. 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets. ('<1>'). Examples: > SELECT to_number('454', '999'); 454 > SELECT to_number('454.00', '000.00'); 454.00 > SELECT to_number('12,454', '99,999'); 12454 > SELECT to_number('$78.12', '$99.99'); 78.12 > SELECT to_number('12,454.8-', '99,999.9S'); -12454.8 Since: 3.3.0","title":"to_number"},{"location":"string-functions/#to_varchar","text":"to_varchar(expr, format) - Convert expr to a string based on the format . Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 
'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns . If expr is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string. Examples: > SELECT to_varchar(454, '999'); 454 > SELECT to_varchar(454.00, '000D00'); 454.00 > SELECT to_varchar(12454, '99G999'); 12,454 > SELECT to_varchar(78.12, '$99.99'); $78.12 > SELECT to_varchar(-12454.8, '99G999D9S'); 12,454.8- > SELECT to_varchar(date'2016-04-08', 'y'); 2016 > SELECT to_varchar(x'537061726b2053514c', 'base64'); U3BhcmsgU1FM > SELECT to_varchar(x'537061726b2053514c', 'hex'); 537061726B2053514C > SELECT to_varchar(encode('abc', 'utf-8'), 'utf-8'); abc Since: 3.5.0","title":"to_varchar"},{"location":"string-functions/#translate","text":"translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. Examples: > SELECT translate('AaBbCc', 'abc', '123'); A1B2C3 Since: 1.5.0","title":"translate"},{"location":"string-functions/#trim","text":"trim(str) - Removes the leading and trailing space characters from str . trim(BOTH FROM str) - Removes the leading and trailing space characters from str . trim(LEADING FROM str) - Removes the leading space characters from str . trim(TRAILING FROM str) - Removes the trailing space characters from str . trim(trimStr FROM str) - Remove the leading and trailing trimStr characters from str . 
trim(BOTH trimStr FROM str) - Remove the leading and trailing trimStr characters from str . trim(LEADING trimStr FROM str) - Remove the leading trimStr characters from str . trim(TRAILING trimStr FROM str) - Remove the trailing trimStr characters from str . Arguments: str - a string expression trimStr - the trim string characters to trim, the default value is a single space BOTH, FROM - these are keywords to specify trimming string characters from both ends of the string LEADING, FROM - these are keywords to specify trimming string characters from the left end of the string TRAILING, FROM - these are keywords to specify trimming string characters from the right end of the string Examples: > SELECT trim(' SparkSQL '); SparkSQL > SELECT trim(BOTH FROM ' SparkSQL '); SparkSQL > SELECT trim(LEADING FROM ' SparkSQL '); SparkSQL > SELECT trim(TRAILING FROM ' SparkSQL '); SparkSQL > SELECT trim('SL' FROM 'SSparkSQLS'); parkSQ > SELECT trim(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ > SELECT trim(LEADING 'SL' FROM 'SSparkSQLS'); parkSQLS > SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS'); SSparkSQ Since: 1.5.0","title":"trim"},{"location":"string-functions/#try_to_binary","text":"try_to_binary(str[, fmt]) - This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed. Examples: > SELECT try_to_binary('abc', 'utf-8'); abc > select try_to_binary('a!', 'base64'); NULL > select try_to_binary('abc', 'invalidFormat'); NULL Since: 3.3.0","title":"try_to_binary"},{"location":"string-functions/#try_to_number","text":"try_to_number(expr, fmt) - Convert string 'expr' to a number based on the string format fmt . Returns NULL if the string 'expr' does not match the expected format. The format follows the same semantics as the to_number function. 
Examples: > SELECT try_to_number('454', '999'); 454 > SELECT try_to_number('454.00', '000.00'); 454.00 > SELECT try_to_number('12,454', '99,999'); 12454 > SELECT try_to_number('$78.12', '$99.99'); 78.12 > SELECT try_to_number('12,454.8-', '99,999.9S'); -12454.8 Since: 3.3.0","title":"try_to_number"},{"location":"string-functions/#try_validate_utf8","text":"try_validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns NULL. Arguments: str - a string expression Examples: > SELECT try_validate_utf8('Spark'); Spark > SELECT try_validate_utf8(x'61'); a > SELECT try_validate_utf8(x'80'); NULL > SELECT try_validate_utf8(x'61C262'); NULL Since: 4.0.0","title":"try_validate_utf8"},{"location":"string-functions/#ucase","text":"ucase(str) - Returns str with all characters changed to uppercase. Examples: > SELECT ucase('SparkSql'); SPARKSQL Since: 1.0.1","title":"ucase"},{"location":"string-functions/#unbase64","text":"unbase64(str) - Converts the argument from a base 64 string str to a binary. Examples: > SELECT unbase64('U3BhcmsgU1FM'); Spark SQL Since: 1.5.0","title":"unbase64"},{"location":"string-functions/#upper","text":"upper(str) - Returns str with all characters changed to uppercase. Examples: > SELECT upper('SparkSql'); SPARKSQL Since: 1.0.1","title":"upper"},{"location":"string-functions/#validate_utf8","text":"validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise throws an exception. Arguments: str - a string expression Examples: > SELECT validate_utf8('Spark'); Spark > SELECT validate_utf8(x'61'); a Since: 4.0.0","title":"validate_utf8"},{"location":"string-functions/#_1","text":"expr1 || expr2 - Returns the concatenation of expr1 and expr2 . Examples: > SELECT 'Spark' || 'SQL'; SparkSQL > SELECT array(1, 2, 3) || array(4, 5) || array(6); [1,2,3,4,5,6] Note: || for arrays is available since 2.4.0. 
Since: 2.3.0","title":"||"},{"location":"struct-functions/","text":"Struct Functions \u00b6 This page lists all struct functions available in Spark SQL. named_struct \u00b6 named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values. Examples: > SELECT named_struct(\"a\", 1, \"b\", 2, \"c\", 3); {\"a\":1,\"b\":2,\"c\":3} Since: 1.5.0 struct \u00b6 struct(col1, col2, col3, ...) - Creates a struct with the given field values. Examples: > SELECT struct(1, 2, 3); {\"col1\":1,\"col2\":2,\"col3\":3} Since: 1.4.0","title":"Struct Functions"},{"location":"struct-functions/#struct-functions","text":"This page lists all struct functions available in Spark SQL.","title":"Struct Functions"},{"location":"struct-functions/#named_struct","text":"named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values. Examples: > SELECT named_struct(\"a\", 1, \"b\", 2, \"c\", 3); {\"a\":1,\"b\":2,\"c\":3} Since: 1.5.0","title":"named_struct"},{"location":"struct-functions/#struct","text":"struct(col1, col2, col3, ...) - Creates a struct with the given field values. Examples: > SELECT struct(1, 2, 3); {\"col1\":1,\"col2\":2,\"col3\":3} Since: 1.4.0","title":"struct"},{"location":"table-functions/","text":"Table Functions \u00b6 This page lists all table functions available in Spark SQL. python_worker_logs \u00b6 python_worker_logs() - Returns a table of logs collected from Python workers. Examples: > SET spark.sql.pyspark.worker.logging.enabled=true; spark.sql.pyspark.worker.logging.enabled true > SELECT * FROM python_worker_logs(); Since: 4.1.0 range \u00b6 range(start[, end[, step[, numSlices]]]) / range(end) - Returns a table of values within a specified range. Arguments: start - An optional BIGINT literal defaulted to 0, marking the first value generated. end - A BIGINT literal marking endpoint (exclusive) of the number generation. 
step - An optional BIGINT literal defaulted to 1, specifying the increment used when generating values. numSlices - An optional INTEGER literal specifying how the production of rows is spread across partitions. Examples: > SELECT * FROM range(1); +---+ | id| +---+ | 0| +---+ > SELECT * FROM range(0, 2); +---+ |id | +---+ |0 | |1 | +---+ > SELECT * FROM range(0, 4, 2); +---+ |id | +---+ |0 | |2 | +---+ Since: 2.0.0","title":"Table Functions"},{"location":"table-functions/#table-functions","text":"This page lists all table functions available in Spark SQL.","title":"Table Functions"},{"location":"table-functions/#python_worker_logs","text":"python_worker_logs() - Returns a table of logs collected from Python workers. Examples: > SET spark.sql.pyspark.worker.logging.enabled=true; spark.sql.pyspark.worker.logging.enabled true > SELECT * FROM python_worker_logs(); Since: 4.1.0","title":"python_worker_logs"},{"location":"table-functions/#range","text":"range(start[, end[, step[, numSlices]]]) / range(end) - Returns a table of values within a specified range. Arguments: start - An optional BIGINT literal defaulted to 0, marking the first value generated. end - A BIGINT literal marking endpoint (exclusive) of the number generation. step - An optional BIGINT literal defaulted to 1, specifying the increment used when generating values. numSlices - An optional INTEGER literal specifying how the production of rows is spread across partitions. Examples: > SELECT * FROM range(1); +---+ | id| +---+ | 0| +---+ > SELECT * FROM range(0, 2); +---+ |id | +---+ |0 | |1 | +---+ > SELECT * FROM range(0, 4, 2); +---+ |id | +---+ |0 | |2 | +---+ Since: 2.0.0","title":"range"},{"location":"url-functions/","text":"Url Functions \u00b6 This page lists all url functions available in Spark SQL. parse_url \u00b6 parse_url(url, partToExtract[, key]) - Extracts a part from a URL. 
Examples: > SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST'); spark.apache.org > SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY'); query=1 > SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query'); 1 Since: 2.0.0 try_parse_url \u00b6 try_parse_url(url, partToExtract[, key]) - This is a special version of parse_url that performs the same operation, but returns a NULL value instead of raising an error if the parsing cannot be performed. Examples: > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'HOST'); spark.apache.org > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY'); query=1 > SELECT try_parse_url('inva lid://spark.apache.org/path?query=1', 'QUERY'); NULL > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query'); 1 Since: 4.0.0 try_url_decode \u00b6 try_url_decode(str) - This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed. Arguments: str - a string expression to decode Examples: > SELECT try_url_decode('https%3A%2F%2Fspark.apache.org'); https://spark.apache.org Since: 4.0.0 url_decode \u00b6 url_decode(str) - Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme. Arguments: str - a string expression to decode Examples: > SELECT url_decode('https%3A%2F%2Fspark.apache.org'); https://spark.apache.org Since: 3.4.0 url_encode \u00b6 url_encode(str) - Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme. 
Arguments: str - a string expression to be translated Examples: > SELECT url_encode('https://spark.apache.org'); https%3A%2F%2Fspark.apache.org Since: 3.4.0","title":"Url Functions"},{"location":"url-functions/#url-functions","text":"This page lists all url functions available in Spark SQL.","title":"Url Functions"},{"location":"url-functions/#parse_url","text":"parse_url(url, partToExtract[, key]) - Extracts a part from a URL. Examples: > SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST'); spark.apache.org > SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY'); query=1 > SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query'); 1 Since: 2.0.0","title":"parse_url"},{"location":"url-functions/#try_parse_url","text":"try_parse_url(url, partToExtract[, key]) - This is a special version of parse_url that performs the same operation, but returns a NULL value instead of raising an error if the parsing cannot be performed. Examples: > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'HOST'); spark.apache.org > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY'); query=1 > SELECT try_parse_url('inva lid://spark.apache.org/path?query=1', 'QUERY'); NULL > SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query'); 1 Since: 4.0.0","title":"try_parse_url"},{"location":"url-functions/#try_url_decode","text":"try_url_decode(str) - This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed. Arguments: str - a string expression to decode Examples: > SELECT try_url_decode('https%3A%2F%2Fspark.apache.org'); https://spark.apache.org Since: 4.0.0","title":"try_url_decode"},{"location":"url-functions/#url_decode","text":"url_decode(str) - Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme. 
Arguments: str - a string expression to decode Examples: > SELECT url_decode('https%3A%2F%2Fspark.apache.org'); https://spark.apache.org Since: 3.4.0","title":"url_decode"},{"location":"url-functions/#url_encode","text":"url_encode(str) - Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme. Arguments: str - a string expression to be translated Examples: > SELECT url_encode('https://spark.apache.org'); https%3A%2F%2Fspark.apache.org Since: 3.4.0","title":"url_encode"},{"location":"variant-functions/","text":"Variant Functions \u00b6 This page lists all variant functions available in Spark SQL. is_variant_null \u00b6 is_variant_null(expr) - Check if a variant value is a variant null. Returns true if and only if the input is a variant null and false otherwise (including in the case of SQL NULL). Examples: > SELECT is_variant_null(parse_json('null')); true > SELECT is_variant_null(parse_json('\"null\"')); false > SELECT is_variant_null(parse_json('13')); false > SELECT is_variant_null(parse_json(null)); false > SELECT is_variant_null(variant_get(parse_json('{\"a\":null, \"b\":\"spark\"}'), \"$.c\")); false > SELECT is_variant_null(variant_get(parse_json('{\"a\":null, \"b\":\"spark\"}'), \"$.a\")); true Since: 4.0.0 parse_json \u00b6 parse_json(jsonStr) - Parse a JSON string as a Variant value. Throw an exception when the string is not valid JSON value. Examples: > SELECT parse_json('{\"a\":1,\"b\":0.8}'); {\"a\":1,\"b\":0.8} Since: 4.0.0 schema_of_variant \u00b6 schema_of_variant(v) - Returns schema in the SQL format of a variant. Examples: > SELECT schema_of_variant(parse_json('null')); VOID > SELECT schema_of_variant(parse_json('[{\"b\":true,\"a\":0}]')); ARRAY<OBJECT<a: BIGINT, b: BOOLEAN>> Since: 4.0.0 schema_of_variant_agg \u00b6 schema_of_variant_agg(v) - Returns the merged schema in the SQL format of a variant column. 
Examples: > SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('1'), ('2'), ('3') AS tab(j); BIGINT > SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('{\"a\": 1}'), ('{\"b\": true}'), ('{\"c\": 1.23}') AS tab(j); OBJECT<a: BIGINT, b: BOOLEAN, c: DECIMAL(3,2)> Since: 4.0.0 to_variant_object \u00b6 to_variant_object(expr) - Convert a nested input (array/map/struct) into a variant where maps and structs are converted to variant objects which are unordered unlike SQL structs. Input maps can only have string keys. Examples: > SELECT to_variant_object(named_struct('a', 1, 'b', 2)); {\"a\":1,\"b\":2} > SELECT to_variant_object(array(1, 2, 3)); [1,2,3] > SELECT to_variant_object(array(named_struct('a', 1))); [{\"a\":1}] > SELECT to_variant_object(array(map(\"a\", 2))); [{\"a\":2}] Since: 4.0.0 try_parse_json \u00b6 try_parse_json(jsonStr) - Parse a JSON string as a Variant value. Return NULL when the string is not a valid JSON value. Examples: > SELECT try_parse_json('{\"a\":1,\"b\":0.8}'); {\"a\":1,\"b\":0.8} > SELECT try_parse_json('{\"a\":1,'); NULL Since: 4.0.0 try_variant_get \u00b6 try_variant_get(v, path[, type]) - Extracts a sub-variant from v according to path , and then casts the sub-variant to type . When type is omitted, it defaults to variant . Returns null if the path does not exist or the cast fails. Examples: > SELECT try_variant_get(parse_json('{\"a\": 1}'), '$.a', 'int'); 1 > SELECT try_variant_get(parse_json('{\"a\": 1}'), '$.b', 'int'); NULL > SELECT try_variant_get(parse_json('[1, \"2\"]'), '$[1]', 'string'); 2 > SELECT try_variant_get(parse_json('[1, \"2\"]'), '$[2]', 'string'); NULL > SELECT try_variant_get(parse_json('[1, \"hello\"]'), '$[1]'); \"hello\" > SELECT try_variant_get(parse_json('[1, \"hello\"]'), '$[1]', 'int'); NULL Since: 4.0.0 variant_explode \u00b6 variant_explode(expr) - It separates a variant object/array into multiple rows containing its fields/elements. 
Its result schema is struct<pos int, key string, value variant> . pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values. Examples: > SELECT * from variant_explode(parse_json('[\"hello\", \"world\"]')); 0 NULL \"hello\" 1 NULL \"world\" > SELECT * from variant_explode(input => parse_json('{\"a\": true, \"b\": 3.14}')); 0 a true 1 b 3.14 Since: 4.0.0 variant_explode_outer \u00b6 variant_explode_outer(expr) - It separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant> . pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values. Examples: > SELECT * from variant_explode_outer(parse_json('[\"hello\", \"world\"]')); 0 NULL \"hello\" 1 NULL \"world\" > SELECT * from variant_explode_outer(input => parse_json('{\"a\": true, \"b\": 3.14}')); 0 a true 1 b 3.14 Since: 4.0.0 variant_get \u00b6 variant_get(v, path[, type]) - Extracts a sub-variant from v according to path , and then casts the sub-variant to type . When type is omitted, it defaults to variant . Returns null if the path does not exist. Throws an exception if the cast fails. 
Examples: > SELECT variant_get(parse_json('{\"a\": 1}'), '$.a', 'int'); 1 > SELECT variant_get(parse_json('{\"a\": 1}'), '$.b', 'int'); NULL > SELECT variant_get(parse_json('[1, \"2\"]'), '$[1]', 'string'); 2 > SELECT variant_get(parse_json('[1, \"2\"]'), '$[2]', 'string'); NULL > SELECT variant_get(parse_json('[1, \"hello\"]'), '$[1]'); \"hello\" Since: 4.0.0","title":"Variant Functions"},{"location":"variant-functions/#variant-functions","text":"This page lists all variant functions available in Spark SQL.","title":"Variant Functions"},{"location":"variant-functions/#is_variant_null","text":"is_variant_null(expr) - Check if a variant value is a variant null. Returns true if and only if the input is a variant null and false otherwise (including in the case of SQL NULL). Examples: > SELECT is_variant_null(parse_json('null')); true > SELECT is_variant_null(parse_json('\"null\"')); false > SELECT is_variant_null(parse_json('13')); false > SELECT is_variant_null(parse_json(null)); false > SELECT is_variant_null(variant_get(parse_json('{\"a\":null, \"b\":\"spark\"}'), \"$.c\")); false > SELECT is_variant_null(variant_get(parse_json('{\"a\":null, \"b\":\"spark\"}'), \"$.a\")); true Since: 4.0.0","title":"is_variant_null"},{"location":"variant-functions/#parse_json","text":"parse_json(jsonStr) - Parse a JSON string as a Variant value. Throw an exception when the string is not valid JSON value. Examples: > SELECT parse_json('{\"a\":1,\"b\":0.8}'); {\"a\":1,\"b\":0.8} Since: 4.0.0","title":"parse_json"},{"location":"variant-functions/#schema_of_variant","text":"schema_of_variant(v) - Returns schema in the SQL format of a variant. 
Examples: > SELECT schema_of_variant(parse_json('null')); VOID > SELECT schema_of_variant(parse_json('[{\"b\":true,\"a\":0}]')); ARRAY<OBJECT<a: BIGINT, b: BOOLEAN>> Since: 4.0.0","title":"schema_of_variant"},{"location":"variant-functions/#schema_of_variant_agg","text":"schema_of_variant_agg(v) - Returns the merged schema in the SQL format of a variant column. Examples: > SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('1'), ('2'), ('3') AS tab(j); BIGINT > SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('{\"a\": 1}'), ('{\"b\": true}'), ('{\"c\": 1.23}') AS tab(j); OBJECT<a: BIGINT, b: BOOLEAN, c: DECIMAL(3,2)> Since: 4.0.0","title":"schema_of_variant_agg"},{"location":"variant-functions/#to_variant_object","text":"to_variant_object(expr) - Convert a nested input (array/map/struct) into a variant where maps and structs are converted to variant objects which are unordered unlike SQL structs. Input maps can only have string keys. Examples: > SELECT to_variant_object(named_struct('a', 1, 'b', 2)); {\"a\":1,\"b\":2} > SELECT to_variant_object(array(1, 2, 3)); [1,2,3] > SELECT to_variant_object(array(named_struct('a', 1))); [{\"a\":1}] > SELECT to_variant_object(array(map(\"a\", 2))); [{\"a\":2}] Since: 4.0.0","title":"to_variant_object"},{"location":"variant-functions/#try_parse_json","text":"try_parse_json(jsonStr) - Parse a JSON string as a Variant value. Return NULL when the string is not a valid JSON value. Examples: > SELECT try_parse_json('{\"a\":1,\"b\":0.8}'); {\"a\":1,\"b\":0.8} > SELECT try_parse_json('{\"a\":1,'); NULL Since: 4.0.0","title":"try_parse_json"},{"location":"variant-functions/#try_variant_get","text":"try_variant_get(v, path[, type]) - Extracts a sub-variant from v according to path , and then casts the sub-variant to type . When type is omitted, it defaults to variant . Returns null if the path does not exist or the cast fails. 
Examples: > SELECT try_variant_get(parse_json('{\"a\": 1}'), '$.a', 'int'); 1 > SELECT try_variant_get(parse_json('{\"a\": 1}'), '$.b', 'int'); NULL > SELECT try_variant_get(parse_json('[1, \"2\"]'), '$[1]', 'string'); 2 > SELECT try_variant_get(parse_json('[1, \"2\"]'), '$[2]', 'string'); NULL > SELECT try_variant_get(parse_json('[1, \"hello\"]'), '$[1]'); \"hello\" > SELECT try_variant_get(parse_json('[1, \"hello\"]'), '$[1]', 'int'); NULL Since: 4.0.0","title":"try_variant_get"},{"location":"variant-functions/#variant_explode","text":"variant_explode(expr) - It separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant> . pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values. Examples: > SELECT * from variant_explode(parse_json('[\"hello\", \"world\"]')); 0 NULL \"hello\" 1 NULL \"world\" > SELECT * from variant_explode(input => parse_json('{\"a\": true, \"b\": 3.14}')); 0 a true 1 b 3.14 Since: 4.0.0","title":"variant_explode"},{"location":"variant-functions/#variant_explode_outer","text":"variant_explode_outer(expr) - It separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant> . pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values. 
Examples: > SELECT * from variant_explode_outer(parse_json('[\"hello\", \"world\"]')); 0 NULL \"hello\" 1 NULL \"world\" > SELECT * from variant_explode_outer(input => parse_json('{\"a\": true, \"b\": 3.14}')); 0 a true 1 b 3.14 Since: 4.0.0","title":"variant_explode_outer"},{"location":"variant-functions/#variant_get","text":"variant_get(v, path[, type]) - Extracts a sub-variant from v according to path , and then casts the sub-variant to type . When type is omitted, it defaults to variant . Returns null if the path does not exist. Throws an exception if the cast fails. Examples: > SELECT variant_get(parse_json('{\"a\": 1}'), '$.a', 'int'); 1 > SELECT variant_get(parse_json('{\"a\": 1}'), '$.b', 'int'); NULL > SELECT variant_get(parse_json('[1, \"2\"]'), '$[1]', 'string'); 2 > SELECT variant_get(parse_json('[1, \"2\"]'), '$[2]', 'string'); NULL > SELECT variant_get(parse_json('[1, \"hello\"]'), '$[1]'); \"hello\" Since: 4.0.0","title":"variant_get"},{"location":"vector-functions/","text":"Vector Functions \u00b6 This page lists all vector functions available in Spark SQL. vector_avg \u00b6 vector_avg(array) - Returns the element-wise mean of float vectors in a group. All vectors must have the same dimension. Examples: > SELECT vector_avg(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col); [2.0,3.0] Since: 4.2.0 vector_cosine_similarity \u00b6 vector_cosine_similarity(array1, array2) - Returns the cosine similarity between two float vectors. The vectors must have the same dimension. Examples: > SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 0.9746319 Since: 4.2.0 vector_inner_product \u00b6 vector_inner_product(array1, array2) - Returns the inner product (dot product) between two float vectors. The vectors must have the same dimension. 
Examples: > SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 32.0 Since: 4.2.0 vector_l2_distance \u00b6 vector_l2_distance(array1, array2) - Returns the Euclidean (L2) distance between two float vectors. The vectors must have the same dimension. Examples: > SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 5.196152 Since: 4.2.0 vector_norm \u00b6 vector_norm(vector, degree) - Returns the Lp norm of a float vector using the specified degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm). Examples: > SELECT vector_norm(array(3.0F, 4.0F), 2.0F); 5.0 > SELECT vector_norm(array(3.0F, 4.0F), 1.0F); 7.0 > SELECT vector_norm(array(3.0F, 4.0F), float('inf')); 4.0 Since: 4.2.0 vector_normalize \u00b6 vector_normalize(vector, degree) - Normalizes a float vector to unit length using the specified norm degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm). Examples: > SELECT vector_normalize(array(3.0F, 4.0F), 2.0F); [0.6,0.8] > SELECT vector_normalize(array(3.0F, 4.0F), 1.0F); [0.42857143,0.5714286] > SELECT vector_normalize(array(3.0F, 4.0F), float('inf')); [0.75,1.0] Since: 4.2.0 vector_sum \u00b6 vector_sum(array) - Returns the element-wise sum of float vectors in a group. All vectors must have the same dimension. Examples: > SELECT vector_sum(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col); [4.0,6.0] Since: 4.2.0","title":"Vector Functions"},{"location":"vector-functions/#vector-functions","text":"This page lists all vector functions available in Spark SQL.","title":"Vector Functions"},{"location":"vector-functions/#vector_avg","text":"vector_avg(array) - Returns the element-wise mean of float vectors in a group. All vectors must have the same dimension. 
Examples: > SELECT vector_avg(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col); [2.0,3.0] Since: 4.2.0","title":"vector_avg"},{"location":"vector-functions/#vector_cosine_similarity","text":"vector_cosine_similarity(array1, array2) - Returns the cosine similarity between two float vectors. The vectors must have the same dimension. Examples: > SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 0.9746319 Since: 4.2.0","title":"vector_cosine_similarity"},{"location":"vector-functions/#vector_inner_product","text":"vector_inner_product(array1, array2) - Returns the inner product (dot product) between two float vectors. The vectors must have the same dimension. Examples: > SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 32.0 Since: 4.2.0","title":"vector_inner_product"},{"location":"vector-functions/#vector_l2_distance","text":"vector_l2_distance(array1, array2) - Returns the Euclidean (L2) distance between two float vectors. The vectors must have the same dimension. Examples: > SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F)); 5.196152 Since: 4.2.0","title":"vector_l2_distance"},{"location":"vector-functions/#vector_norm","text":"vector_norm(vector, degree) - Returns the Lp norm of a float vector using the specified degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm). Examples: > SELECT vector_norm(array(3.0F, 4.0F), 2.0F); 5.0 > SELECT vector_norm(array(3.0F, 4.0F), 1.0F); 7.0 > SELECT vector_norm(array(3.0F, 4.0F), float('inf')); 4.0 Since: 4.2.0","title":"vector_norm"},{"location":"vector-functions/#vector_normalize","text":"vector_normalize(vector, degree) - Normalizes a float vector to unit length using the specified norm degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm). 
Examples: > SELECT vector_normalize(array(3.0F, 4.0F), 2.0F); [0.6,0.8] > SELECT vector_normalize(array(3.0F, 4.0F), 1.0F); [0.42857143,0.5714286] > SELECT vector_normalize(array(3.0F, 4.0F), float('inf')); [0.75,1.0] Since: 4.2.0","title":"vector_normalize"},{"location":"vector-functions/#vector_sum","text":"vector_sum(array) - Returns the element-wise sum of float vectors in a group. All vectors must have the same dimension. Examples: > SELECT vector_sum(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col); [4.0,6.0] Since: 4.2.0","title":"vector_sum"},{"location":"window-functions/","text":"Window Functions \u00b6 This page lists all window functions available in Spark SQL. cume_dist \u00b6 cume_dist() - Computes the position of a value relative to all values in the partition. Examples: > SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 0.6666666666666666 A1 1 0.6666666666666666 A1 2 1.0 A2 3 1.0 Since: 2.0.0 dense_rank \u00b6 dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence. Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. Examples: > SELECT a, b, dense_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 2 A2 3 1 Since: 2.0.0 lag \u00b6 lag(input[, offset[, default]]) - Returns the value of input at the offset th row before the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset th row is null, null is returned. 
If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned. Arguments: input - a string expression to evaluate offset rows before the current row. offset - an int expression which is the number of rows to jump back in the partition. default - a string expression to use when the offset row does not exist. Examples: > SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 NULL A1 1 1 A1 2 1 A2 3 NULL Since: 2.0.0 lead \u00b6 lead(input[, offset[, default]]) - Returns the value of input at the offset th row after the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default is returned. Arguments: input - a string expression to evaluate offset rows after the current row. offset - an int expression which is the number of rows to jump ahead in the partition. default - a string expression to use when the offset is larger than the window. The default value is null. Examples: > SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 2 A1 2 NULL A2 3 NULL Since: 2.0.0 nth_value \u00b6 nth_value(input[, offset]) - Returns the value of input at the row that is the offset th row from the beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip nulls when finding the offset th row. Otherwise, every row counts for the offset . If there is no such offset th row (e.g., when the offset is 10, the size of the window frame is less than 10), null is returned. Arguments: input - the target column or expression that the function operates on. 
offset - a positive int literal to indicate the offset in the window frame. It starts with 1. ignoreNulls - an optional specification that indicates the NthValue should skip null values in the determination of which row to use. Examples: > SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 1 A2 3 NULL Since: 3.1.0 ntile \u00b6 ntile(n) - Divides the rows for each window partition into n buckets ranging from 1 to at most n . Arguments: buckets - an int expression which is the number of buckets to divide the rows in. Default value is 1. Examples: > SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 2 A2 3 1 Since: 2.0.0 percent_rank \u00b6 percent_rank() - Computes the percentage ranking of a value in a group of values. Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. Examples: > SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 0.0 A1 1 0.0 A1 2 1.0 A2 3 0.0 Since: 2.0.0 rank \u00b6 rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. 
Examples: > SELECT a, b, rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 3 A2 3 1 Since: 2.0.0 row_number \u00b6 row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition. Examples: > SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 2 A1 2 3 A2 3 1 Since: 2.0.0","title":"Window Functions"},{"location":"window-functions/#window-functions","text":"This page lists all window functions available in Spark SQL.","title":"Window Functions"},{"location":"window-functions/#cume_dist","text":"cume_dist() - Computes the position of a value relative to all values in the partition. Examples: > SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 0.6666666666666666 A1 1 0.6666666666666666 A1 2 1.0 A2 3 1.0 Since: 2.0.0","title":"cume_dist"},{"location":"window-functions/#dense_rank","text":"dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence. Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. Examples: > SELECT a, b, dense_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 2 A2 3 1 Since: 2.0.0","title":"dense_rank"},{"location":"window-functions/#lag","text":"lag(input[, offset[, default]]) - Returns the value of input at the offset th row before the current row in the window. The default value of offset is 1 and the default value of default is null. 
If the value of input at the offset th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned. Arguments: input - a string expression to evaluate offset rows before the current row. offset - an int expression which is the number of rows to jump back in the partition. default - a string expression to use when the offset row does not exist. Examples: > SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 NULL A1 1 1 A1 2 1 A2 3 NULL Since: 2.0.0","title":"lag"},{"location":"window-functions/#lead","text":"lead(input[, offset[, default]]) - Returns the value of input at the offset th row after the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default is returned. Arguments: input - a string expression to evaluate offset rows after the current row. offset - an int expression which is the number of rows to jump ahead in the partition. default - a string expression to use when the offset is larger than the window. The default value is null. Examples: > SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 2 A1 2 NULL A2 3 NULL Since: 2.0.0","title":"lead"},{"location":"window-functions/#nth_value","text":"nth_value(input[, offset]) - Returns the value of input at the row that is the offset th row from the beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip nulls when finding the offset th row. Otherwise, every row counts for the offset . 
If there is no such offset th row (e.g., when the offset is 10, the size of the window frame is less than 10), null is returned. Arguments: input - the target column or expression that the function operates on. offset - a positive int literal to indicate the offset in the window frame. It starts with 1. ignoreNulls - an optional specification that indicates the NthValue should skip null values in the determination of which row to use. Examples: > SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 1 A2 3 NULL Since: 3.1.0","title":"nth_value"},{"location":"window-functions/#ntile","text":"ntile(n) - Divides the rows for each window partition into n buckets ranging from 1 to at most n . Arguments: buckets - an int expression which is the number of buckets to divide the rows in. Default value is 1. Examples: > SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 2 A2 3 1 Since: 2.0.0","title":"ntile"},{"location":"window-functions/#percent_rank","text":"percent_rank() - Computes the percentage ranking of a value in a group of values. Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. Examples: > SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 0.0 A1 1 0.0 A1 2 1.0 A2 3 0.0 Since: 2.0.0","title":"percent_rank"},{"location":"window-functions/#rank","text":"rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. 
Arguments: children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser. Examples: > SELECT a, b, rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 1 A1 2 3 A2 3 1 Since: 2.0.0","title":"rank"},{"location":"window-functions/#row_number","text":"row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition. Examples: > SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); A1 1 1 A1 1 2 A1 2 3 A2 3 1 Since: 2.0.0","title":"row_number"},{"location":"xml-functions/","text":"Xml Functions \u00b6 This page lists all xml functions available in Spark SQL. from_xml \u00b6 from_xml(xmlStr, schema[, options]) - Returns a struct value with the given xmlStr and schema . Examples: > SELECT from_xml('<p><a>1</a><b>0.8</b></p>', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_xml('<p><time>26/08/2015</time></p>', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} > SELECT from_xml('<p><teacher>Alice</teacher><student><name>Bob</name><rank>1</rank></student><student><name>Charlie</name><rank>2</rank></student></p>', 'STRUCT<teacher: STRING, student: ARRAY<STRUCT<name: STRING, rank: INT>>>'); {\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]} Since: 4.0.0 schema_of_xml \u00b6 schema_of_xml(xml[, options]) - Returns the schema of an XML string in DDL format. 
Examples: > SELECT schema_of_xml('<p><a>1</a></p>'); STRUCT<a: BIGINT> > SELECT schema_of_xml('<p><a attr=\"2\">1</a><a>3</a></p>', map('excludeAttribute', 'true')); STRUCT<a: ARRAY<BIGINT>> Since: 4.0.0 to_xml \u00b6 to_xml(expr[, options]) - Returns an XML string with a given struct value. Examples: > SELECT to_xml(named_struct('a', 1, 'b', 2)); <ROW> <a>1</a> <b>2</b> </ROW> > SELECT to_xml(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); <ROW> <time>26/08/2015</time> </ROW> Since: 4.0.0 xpath \u00b6 xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression. Examples: > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()'); [\"b1\",\"b2\",\"b3\"] > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b'); [null,null,null] Since: 2.0.0 xpath_boolean \u00b6 xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found. Examples: > SELECT xpath_boolean('<a><b>1</b></a>','a/b'); true Since: 2.0.0 xpath_double \u00b6 xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0 xpath_float \u00b6 xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0 xpath_int \u00b6 xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. 
Examples: > SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0 xpath_long \u00b6 xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. Examples: > SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0 xpath_number \u00b6 xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0 xpath_short \u00b6 xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. Examples: > SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0 xpath_string \u00b6 xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression. Examples: > SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c'); cc Since: 2.0.0","title":"Xml Functions"},{"location":"xml-functions/#xml-functions","text":"This page lists all xml functions available in Spark SQL.","title":"Xml Functions"},{"location":"xml-functions/#from_xml","text":"from_xml(xmlStr, schema[, options]) - Returns a struct value with the given xmlStr and schema . 
Examples: > SELECT from_xml('<p><a>1</a><b>0.8</b></p>', 'a INT, b DOUBLE'); {\"a\":1,\"b\":0.8} > SELECT from_xml('<p><time>26/08/2015</time></p>', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); {\"time\":2015-08-26 00:00:00} > SELECT from_xml('<p><teacher>Alice</teacher><student><name>Bob</name><rank>1</rank></student><student><name>Charlie</name><rank>2</rank></student></p>', 'STRUCT<teacher: STRING, student: ARRAY<STRUCT<name: STRING, rank: INT>>>'); {\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]} Since: 4.0.0","title":"from_xml"},{"location":"xml-functions/#schema_of_xml","text":"schema_of_xml(xml[, options]) - Returns the schema of an XML string in DDL format. Examples: > SELECT schema_of_xml('<p><a>1</a></p>'); STRUCT<a: BIGINT> > SELECT schema_of_xml('<p><a attr=\"2\">1</a><a>3</a></p>', map('excludeAttribute', 'true')); STRUCT<a: ARRAY<BIGINT>> Since: 4.0.0","title":"schema_of_xml"},{"location":"xml-functions/#to_xml","text":"to_xml(expr[, options]) - Returns an XML string with a given struct value. Examples: > SELECT to_xml(named_struct('a', 1, 'b', 2)); <ROW> <a>1</a> <b>2</b> </ROW> > SELECT to_xml(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); <ROW> <time>26/08/2015</time> </ROW> Since: 4.0.0","title":"to_xml"},{"location":"xml-functions/#xpath","text":"xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression. Examples: > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()'); [\"b1\",\"b2\",\"b3\"] > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b'); [null,null,null] Since: 2.0.0","title":"xpath"},{"location":"xml-functions/#xpath_boolean","text":"xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found. 
Examples: > SELECT xpath_boolean('<a><b>1</b></a>','a/b'); true Since: 2.0.0","title":"xpath_boolean"},{"location":"xml-functions/#xpath_double","text":"xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0","title":"xpath_double"},{"location":"xml-functions/#xpath_float","text":"xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0","title":"xpath_float"},{"location":"xml-functions/#xpath_int","text":"xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. Examples: > SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0","title":"xpath_int"},{"location":"xml-functions/#xpath_long","text":"xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. Examples: > SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0","title":"xpath_long"},{"location":"xml-functions/#xpath_number","text":"xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric. Examples: > SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3.0 Since: 2.0.0","title":"xpath_number"},{"location":"xml-functions/#xpath_short","text":"xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or if a match is found but the value is non-numeric. 
Examples: > SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)'); 3 Since: 2.0.0","title":"xpath_short"},{"location":"xml-functions/#xpath_string","text":"xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression. Examples: > SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c'); cc Since: 2.0.0","title":"xpath_string"}]}