Property Name	Default	Meaning	Since Version
`spark.sql.adaptive.advisoryPartitionSizeInBytes`	(value of `spark.sql.adaptive.shuffle.targetPostShuffleInputSize`)	The advisory size in bytes of the shuffle partition during adaptive optimization (when spark.sql.adaptive.enabled is true). It takes effect when Spark coalesces small shuffle partitions or splits skewed shuffle partition.	3.0.0
`spark.sql.adaptive.autoBroadcastJoinThreshold`	(none)	Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1 broadcasting can be disabled. The default value is same with spark.sql.autoBroadcastJoinThreshold. Note that, this config is used only in adaptive framework.	3.2.0
`spark.sql.adaptive.coalescePartitions.enabled`	true	When true and 'spark.sql.adaptive.enabled' is true, Spark will coalesce contiguous shuffle partitions according to the target size (specified by 'spark.sql.adaptive.advisoryPartitionSizeInBytes'), to avoid too many small tasks.	3.0.0
`spark.sql.adaptive.coalescePartitions.initialPartitionNum`	(none)	The initial number of shuffle partitions before coalescing. If not set, it equals to spark.sql.shuffle.partitions. This configuration only has an effect when 'spark.sql.adaptive.enabled' and 'spark.sql.adaptive.coalescePartitions.enabled' are both true.	3.0.0
`spark.sql.adaptive.coalescePartitions.minPartitionSize`	1MB	The minimum size of shuffle partitions after coalescing. This is useful when the adaptively calculated target size is too small during partition coalescing.	3.2.0
`spark.sql.adaptive.coalescePartitions.parallelismFirst`	true	When true, Spark does not respect the target size specified by 'spark.sql.adaptive.advisoryPartitionSizeInBytes' (default 64MB) when coalescing contiguous shuffle partitions, but adaptively calculate the target size according to the default parallelism of the Spark cluster. The calculated size is usually smaller than the configured target size. This is to maximize the parallelism and avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect the configured target size.	3.2.0
`spark.sql.adaptive.customCostEvaluatorClass`	(none)	The custom cost evaluator class to be used for adaptive execution. If not being set, Spark will use its own SimpleCostEvaluator by default.	3.2.0
`spark.sql.adaptive.enabled`	true	When true, enable adaptive query execution, which re-optimizes the query plan in the middle of query execution, based on accurate runtime statistics.	1.6.0
`spark.sql.adaptive.forceOptimizeSkewedJoin`	false	When true, force enable OptimizeSkewedJoin even if it introduces extra shuffle.	3.3.0
`spark.sql.adaptive.localShuffleReader.enabled`	true	When true and 'spark.sql.adaptive.enabled' is true, Spark tries to use local shuffle reader to read the shuffle data when the shuffle partitioning is not needed, for example, after converting sort-merge join to broadcast-hash join.	3.0.0
`spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold`	0b	Configures the maximum size in bytes per partition that can be allowed to build local hash map. If this value is not smaller than spark.sql.adaptive.advisoryPartitionSizeInBytes and all the partition size are not larger than this config, join selection prefer to use shuffled hash join instead of sort merge join regardless of the value of spark.sql.join.preferSortMergeJoin.	3.2.0
`spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled`	true	When true and 'spark.sql.adaptive.enabled' is true, Spark will optimize the skewed shuffle partitions in RebalancePartitions and split them to smaller ones according to the target size (specified by 'spark.sql.adaptive.advisoryPartitionSizeInBytes'), to avoid data skew.	3.2.0
`spark.sql.adaptive.optimizer.excludedRules`	(none)	Configures a list of rules to be disabled in the adaptive optimizer, in which the rules are specified by their rule names and separated by comma. The optimizer will log the rules that have indeed been excluded.	3.1.0
`spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor`	0.2	A partition will be merged during splitting if its size is small than this factor multiply spark.sql.adaptive.advisoryPartitionSizeInBytes.	3.3.0
`spark.sql.adaptive.skewJoin.enabled`	true	When true and 'spark.sql.adaptive.enabled' is true, Spark dynamically handles skew in shuffled join (sort-merge and shuffled hash) by splitting (and replicating if needed) skewed partitions.	3.0.0
`spark.sql.adaptive.skewJoin.skewedPartitionFactor`	5.0	A partition is considered as skewed if its size is larger than this factor multiplying the median partition size and also larger than 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'	3.0.0
`spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`	256MB	A partition is considered as skewed if its size in bytes is larger than this threshold and also larger than 'spark.sql.adaptive.skewJoin.skewedPartitionFactor' multiplying the median partition size. Ideally this config should be set larger than 'spark.sql.adaptive.advisoryPartitionSizeInBytes'.	3.0.0
`spark.sql.allowNamedFunctionArguments`	true	If true, Spark will turn on support for named parameters for all functions that has it implemented.	3.5.0
`spark.sql.ansi.doubleQuotedIdentifiers`	false	When true and 'spark.sql.ansi.enabled' is true, Spark SQL reads literals enclosed in double quoted (") as identifiers. When false they are read as string literals.	3.4.0
`spark.sql.ansi.enabled`	false	When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. For example, Spark will throw an exception at runtime instead of returning null results when the inputs to a SQL operator/function are invalid.For full details of this dialect, you can find them in the section "ANSI Compliance" of Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL standard directly, but their behaviors align with ANSI SQL's style	3.0.0
`spark.sql.ansi.enforceReservedKeywords`	false	When true and 'spark.sql.ansi.enabled' is true, the Spark SQL parser enforces the ANSI reserved keywords and forbids SQL queries that use reserved keywords as alias names and/or identifiers for table, view, function, etc.	3.3.0
`spark.sql.ansi.relationPrecedence`	false	When true and 'spark.sql.ansi.enabled' is true, JOIN takes precedence over comma when combining relation. For example, `t1, t2 JOIN t3` should result to `t1 X (t2 X t3)`. If the config is false, the result is `(t1 X t2) X t3`.	3.4.0
`spark.sql.autoBroadcastJoinThreshold`	10MB	Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1 broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command `ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan` has been run, and file-based data source tables where the statistics are computed directly on the files of data.	1.1.0
`spark.sql.avro.compression.codec`	snappy	Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2, xz and zstandard. Default codec is snappy.	2.4.0
`spark.sql.avro.deflate.level`	-1	Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.	2.4.0
`spark.sql.avro.filterPushdown.enabled`	true	When true, enable filter pushdown to Avro datasource.	3.1.0
`spark.sql.broadcastTimeout`	300	Timeout in seconds for the broadcast wait time in broadcast joins.	1.3.0
`spark.sql.bucketing.coalesceBucketsInJoin.enabled`	false	When true, if two bucketed tables with the different number of buckets are joined, the side with a bigger number of buckets will be coalesced to have the same number of buckets as the other side. Bigger number of buckets is divisible by the smaller number of buckets. Bucket coalescing is applied to sort-merge joins and shuffled hash join. Note: Coalescing bucketed table can avoid unnecessary shuffling in join, but it also reduces parallelism and could possibly cause OOM for shuffled hash join.	3.1.0
`spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio`	4	The ratio of the number of two buckets being coalesced should be less than or equal to this value for bucket coalescing to be applied. This configuration only has an effect when 'spark.sql.bucketing.coalesceBucketsInJoin.enabled' is set to true.	3.1.0
`spark.sql.catalog.spark_catalog`	(none)	A catalog implementation that will be used as the v2 interface to Spark's built-in v1 catalog: spark_catalog. This catalog shares its identifier namespace with the spark_catalog and must be consistent with it; for example, if a table can be loaded by the spark_catalog, this catalog must also return the table metadata. To delegate operations to the spark_catalog, implementations can extend 'CatalogExtension'.	3.0.0
`spark.sql.cbo.enabled`	false	Enables CBO for estimation of plan statistics when set true.	2.2.0
`spark.sql.cbo.joinReorder.dp.star.filter`	false	Applies star-join filter heuristics to cost based join enumeration.	2.2.0
`spark.sql.cbo.joinReorder.dp.threshold`	12	The maximum number of joined nodes allowed in the dynamic programming algorithm.	2.2.0
`spark.sql.cbo.joinReorder.enabled`	false	Enables join reorder in CBO.	2.2.0
`spark.sql.cbo.planStats.enabled`	false	When true, the logical plan will fetch row counts and column statistics from catalog.	3.0.0
`spark.sql.cbo.starSchemaDetection`	false	When true, it enables join reordering based on star schema detection.	2.2.0
`spark.sql.charAsVarchar`	false	When true, Spark replaces CHAR type with VARCHAR type in CREATE/REPLACE/ALTER TABLE commands, so that newly created/updated tables will not have CHAR type columns/fields. Existing tables with CHAR type columns/fields are not affected by this config.	3.3.0
`spark.sql.cli.print.header`	false	When set to true, spark-sql CLI prints the names of the columns in query output.	3.2.0
`spark.sql.columnNameOfCorruptRecord`	_corrupt_record	The name of internal column for storing raw/un-parsed JSON and CSV records that fail to parse.	1.2.0
`spark.sql.csv.filterPushdown.enabled`	true	When true, enable filter pushdown to CSV datasource.	3.0.0
`spark.sql.datetime.java8API.enabled`	false	If the configuration property is set to true, java.time.Instant and java.time.LocalDate classes of Java 8 API are used as external types for Catalyst's TimestampType and DateType. If it is set to false, java.sql.Timestamp and java.sql.Date are used for the same purpose.	3.0.0
`spark.sql.debug.maxToStringFields`	25	Maximum number of fields of sequence-like entries can be converted to strings in debug output. Any elements beyond the limit will be dropped and replaced by a "... N more fields" placeholder.	3.0.0
`spark.sql.defaultCatalog`	spark_catalog	Name of the default catalog. This will be the current catalog if users have not explicitly set the current catalog yet.	3.0.0
`spark.sql.error.messageFormat`	PRETTY	When PRETTY, the error message consists of textual representation of error class, message and query context. The MINIMAL and STANDARD formats are pretty JSON formats where STANDARD includes an additional JSON field `message`. This configuration property influences on error messages of Thrift Server and SQL CLI while running queries.	3.4.0
`spark.sql.execution.arrow.enabled`	false	(Deprecated since Spark 3.0, please set 'spark.sql.execution.arrow.pyspark.enabled'.)	2.3.0
`spark.sql.execution.arrow.fallback.enabled`	true	(Deprecated since Spark 3.0, please set 'spark.sql.execution.arrow.pyspark.fallback.enabled'.)	2.4.0
`spark.sql.execution.arrow.localRelationThreshold`	48MB	When converting Arrow batches to Spark DataFrame, local collections are used in the driver side if the byte size of Arrow batches is smaller than this threshold. Otherwise, the Arrow batches are sent and deserialized to Spark internal rows in the executors.	3.4.0
`spark.sql.execution.arrow.maxRecordsPerBatch`	10000	When using Apache Arrow, limit the maximum number of records that can be written to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.	2.3.0
`spark.sql.execution.arrow.pyspark.enabled`	(value of `spark.sql.execution.arrow.enabled`)	When true, make use of Apache Arrow for columnar data transfers in PySpark. This optimization applies to: 1. pyspark.sql.DataFrame.toPandas. 2. pyspark.sql.SparkSession.createDataFrame when its input is a Pandas DataFrame or a NumPy ndarray. The following data type is unsupported: ArrayType of TimestampType.	3.0.0
`spark.sql.execution.arrow.pyspark.fallback.enabled`	(value of `spark.sql.execution.arrow.fallback.enabled`)	When true, optimizations enabled by 'spark.sql.execution.arrow.pyspark.enabled' will fallback automatically to non-optimized implementations if an error occurs.	3.0.0
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled`	false	(Experimental) When true, make use of Apache Arrow's self-destruct and split-blocks options for columnar data transfers in PySpark, when converting from Arrow to Pandas. This reduces memory usage at the cost of some CPU time. This optimization applies to: pyspark.sql.DataFrame.toPandas when 'spark.sql.execution.arrow.pyspark.enabled' is set.	3.2.0
`spark.sql.execution.arrow.sparkr.enabled`	false	When true, make use of Apache Arrow for columnar data transfers in SparkR. This optimization applies to: 1. createDataFrame when its input is an R DataFrame 2. collect 3. dapply 4. gapply The following data types are unsupported: FloatType, BinaryType, ArrayType, StructType and MapType.	3.0.0
`spark.sql.execution.pandas.structHandlingMode`	legacy	The conversion mode of struct type when creating pandas DataFrame. When "legacy",1. when Arrow optimization is disabled, convert to Row object, 2. when Arrow optimization is enabled, convert to dict or raise an Exception if there are duplicated nested field names. When "row", convert to Row object regardless of Arrow optimization. When "dict", convert to dict and use suffixed key names, e.g., a_0, a_1, if there are duplicated nested field names, regardless of Arrow optimization.	3.5.0
`spark.sql.execution.pandas.udf.buffer.size`	(value of `spark.buffer.size`)	Same as `spark.buffer.size` but only applies to Pandas UDF executions. If it is not set, the fallback is `spark.buffer.size`. Note that Pandas execution requires more than 4 bytes. Lowering this value could make small Pandas UDF batch iterated and pipelined; however, it might degrade performance. See SPARK-27870.	3.0.0
`spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled`	true	When true, the traceback from Python UDFs is simplified. It hides the Python worker, (de)serialization, etc from PySpark in tracebacks, and only shows the exception messages from UDFs. Note that this works only with CPython 3.7+.	3.1.0
`spark.sql.execution.pythonUDF.arrow.enabled`	false	Enable Arrow optimization in regular Python UDFs. This optimization can only be enabled when the given function takes at least one argument.	3.4.0
`spark.sql.execution.pythonUDTF.arrow.enabled`	false	Enable Arrow optimization for Python UDTFs.	3.5.0
`spark.sql.execution.topKSortFallbackThreshold`	2147483632	In SQL queries with a SORT followed by a LIMIT like 'SELECT x FROM t ORDER BY y LIMIT m', if m is under this threshold, do a top-K sort in memory, otherwise do a global sort which spills to disk if necessary.	2.4.0
`spark.sql.files.ignoreCorruptFiles`	false	Whether to ignore corrupt files. If true, the Spark jobs will continue to run when encountering corrupted files and the contents that have been read will still be returned. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	2.1.1
`spark.sql.files.ignoreMissingFiles`	false	Whether to ignore missing files. If true, the Spark jobs will continue to run when encountering missing files and the contents that have been read will still be returned. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	2.3.0
`spark.sql.files.maxPartitionBytes`	128MB	The maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	2.0.0
`spark.sql.files.maxPartitionNum`	(none)	The suggested (not guaranteed) maximum number of split file partitions. If it is set, Spark will rescale each partition to make the number of partitions is close to this value if the initial number of partitions exceeds this value. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	3.5.0
`spark.sql.files.maxRecordsPerFile`	0	Maximum number of records to write out to a single file. If this value is zero or negative, there is no limit.	2.2.0
`spark.sql.files.minPartitionNum`	(none)	The suggested (not guaranteed) minimum number of split file partitions. If not set, the default value is `spark.sql.leafNodeDefaultParallelism`. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	3.1.0
`spark.sql.function.concatBinaryAsString`	false	When this option is set to false and all inputs are binary, `functions.concat` returns an output as binary. Otherwise, it returns as a string.	2.3.0
`spark.sql.function.eltOutputAsString`	false	When this option is set to false and all inputs are binary, `elt` returns an output as binary. Otherwise, it returns as a string.	2.3.0
`spark.sql.groupByAliases`	true	When true, aliases in a select list can be used in group by clauses. When false, an analysis exception is thrown in the case.	2.2.0
`spark.sql.groupByOrdinal`	true	When true, the ordinal numbers in group by clauses are treated as the position in the select list. When false, the ordinal numbers are ignored.	2.0.0
`spark.sql.hive.convertInsertingPartitionedTable`	true	When set to true, and `spark.sql.hive.convertMetastoreParquet` or `spark.sql.hive.convertMetastoreOrc` is true, the built-in ORC/Parquet writer is usedto process inserting into partitioned ORC/Parquet tables created by using the HiveSQL syntax.	3.0.0
`spark.sql.hive.convertMetastoreCtas`	true	When set to true, Spark will try to use built-in data source writer instead of Hive serde in CTAS. This flag is effective only if `spark.sql.hive.convertMetastoreParquet` or `spark.sql.hive.convertMetastoreOrc` is enabled respectively for Parquet and ORC formats	3.0.0
`spark.sql.hive.convertMetastoreInsertDir`	true	When set to true, Spark will try to use built-in data source writer instead of Hive serde in INSERT OVERWRITE DIRECTORY. This flag is effective only if `spark.sql.hive.convertMetastoreParquet` or `spark.sql.hive.convertMetastoreOrc` is enabled respectively for Parquet and ORC formats	3.3.0
`spark.sql.hive.convertMetastoreOrc`	true	When set to true, the built-in ORC reader and writer are used to process ORC tables created by using the HiveQL syntax, instead of Hive serde.	2.0.0
`spark.sql.hive.convertMetastoreParquet`	true	When set to true, the built-in Parquet reader and writer are used to process parquet tables created by using the HiveQL syntax, instead of Hive serde.	1.1.1
`spark.sql.hive.convertMetastoreParquet.mergeSchema`	false	When true, also tries to merge possibly different but compatible Parquet schemas in different Parquet data files. This configuration is only effective when "spark.sql.hive.convertMetastoreParquet" is true.	1.3.1
`spark.sql.hive.dropPartitionByName.enabled`	false	When true, Spark will get partition name rather than partition object to drop partition, which can improve the performance of drop partition.	3.4.0
`spark.sql.hive.filesourcePartitionFileCacheSize`	262144000	When nonzero, enable caching of partition file metadata in memory. All tables share a cache that can use up to specified num bytes for file metadata. This conf only has an effect when hive filesource partition management is enabled.	2.1.1
`spark.sql.hive.manageFilesourcePartitions`	true	When true, enable metastore partition management for file source tables as well. This includes both datasource and converted Hive tables. When partition management is enabled, datasource tables store partition in the Hive metastore, and use the metastore to prune partitions during query planning when spark.sql.hive.metastorePartitionPruning is set to true.	2.1.1
`spark.sql.hive.metastorePartitionPruning`	true	When true, some predicates will be pushed down into the Hive metastore so that unmatching partitions can be eliminated earlier.	1.5.0
`spark.sql.hive.metastorePartitionPruningFallbackOnException`	false	Whether to fallback to get all partitions from Hive metastore and perform partition pruning on Spark client side, when encountering MetaException from the metastore. Note that Spark query performance may degrade if this is enabled and there are many partitions to be listed. If this is disabled, Spark will fail the query instead.	3.3.0
`spark.sql.hive.metastorePartitionPruningFastFallback`	false	When this config is enabled, if the predicates are not supported by Hive or Spark does fallback due to encountering MetaException from the metastore, Spark will instead prune partitions by getting the partition names first and then evaluating the filter expressions on the client side. Note that the predicates with TimeZoneAwareExpression is not supported.	3.3.0
`spark.sql.hive.thriftServer.async`	true	When set to true, Hive Thrift server executes SQL queries in an asynchronous way.	1.5.0
`spark.sql.hive.verifyPartitionPath`	false	When true, check all the partition paths under the table's root directory when reading data stored in HDFS. This configuration will be deprecated in the future releases and replaced by spark.files.ignoreMissingFiles.	1.4.0
`spark.sql.inMemoryColumnarStorage.batchSize`	10000	Controls the size of batches for columnar caching. Larger batch sizes can improve memory utilization and compression, but risk OOMs when caching data.	1.1.1
`spark.sql.inMemoryColumnarStorage.compressed`	true	When set to true Spark SQL will automatically select a compression codec for each column based on statistics of the data.	1.0.1
`spark.sql.inMemoryColumnarStorage.enableVectorizedReader`	true	Enables vectorized reader for columnar caching.	2.3.1
`spark.sql.json.filterPushdown.enabled`	true	When true, enable filter pushdown to JSON datasource.	3.1.0
`spark.sql.jsonGenerator.ignoreNullFields`	true	Whether to ignore null fields when generating JSON objects in JSON data source and JSON functions such as to_json. If false, it generates null for null fields in JSON objects.	3.0.0
`spark.sql.leafNodeDefaultParallelism`	(none)	The default parallelism of Spark SQL leaf nodes that produce data, such as the file scan node, the local data scan node, the range node, etc. The default value of this config is 'SparkContext#defaultParallelism'.	3.2.0
`spark.sql.mapKeyDedupPolicy`	EXCEPTION	The policy to deduplicate map keys in builtin function: CreateMap, MapFromArrays, MapFromEntries, StringToMap, MapConcat and TransformKeys. When EXCEPTION, the query fails if duplicated map keys are detected. When LAST_WIN, the map key that is inserted at last takes precedence.	3.0.0
`spark.sql.maven.additionalRemoteRepositories`	https://maven-central.storage-download.googleapis.com/maven2/	A comma-delimited string config of the optional additional remote Maven mirror repositories. This is only used for downloading Hive jars in IsolatedClientLoader if the default Maven Central repo is unreachable.	3.0.0
`spark.sql.maxMetadataStringLength`	100	Maximum number of characters to output for a metadata string. e.g. file location in `DataSourceScanExec`, every value will be abbreviated if exceed length.	3.1.0
`spark.sql.maxPlanStringLength`	2147483632	Maximum number of characters to output for a plan string. If the plan is longer, further output will be truncated. The default setting always generates a full plan. Set this to a lower value such as 8k if plan strings are taking up too much memory or are causing OutOfMemory errors in the driver or UI processes.	3.0.0
`spark.sql.maxSinglePartitionBytes`	9223372036854775807b	The maximum number of bytes allowed for a single partition. Otherwise, The planner will introduce shuffle to improve parallelism.	3.4.0
`spark.sql.optimizer.collapseProjectAlwaysInline`	false	Whether to always collapse two adjacent projections and inline expressions even if it causes extra duplication.	3.3.0
`spark.sql.optimizer.dynamicPartitionPruning.enabled`	true	When true, we will generate predicate for partition column when it's used as join key	3.0.0
`spark.sql.optimizer.enableCsvExpressionOptimization`	true	Whether to optimize CSV expressions in SQL optimizer. It includes pruning unnecessary columns from from_csv.	3.2.0
`spark.sql.optimizer.enableJsonExpressionOptimization`	true	Whether to optimize JSON expressions in SQL optimizer. It includes pruning unnecessary columns from from_json, simplifying from_json + to_json, to_json + named_struct(from_json.col1, from_json.col2, ....).	3.1.0
`spark.sql.optimizer.excludedRules`	(none)	Configures a list of rules to be disabled in the optimizer, in which the rules are specified by their rule names and separated by comma. It is not guaranteed that all the rules in this configuration will eventually be excluded, as some rules are necessary for correctness. The optimizer will log the rules that have indeed been excluded.	2.4.0
`spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold`	10GB	Byte size threshold of the Bloom filter application side plan's aggregated scan size. Aggregated scan byte size of the Bloom filter application side needs to be over this value to inject a bloom filter.	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold`	10MB	Size threshold of the bloom filter creation side plan. Estimated size needs to be under this value to try to inject bloom filter.	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.enabled`	true	When true and if one side of a shuffle join has a selective predicate, we attempt to insert a bloom filter in the other side to reduce the amount of shuffle data.	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.expectedNumItems`	1000000	The default number of expected items for the runtime bloomfilter	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.maxNumBits`	67108864	The max number of bits to use for the runtime bloom filter	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.maxNumItems`	4000000	The max allowed number of expected items for the runtime bloom filter	3.3.0
`spark.sql.optimizer.runtime.bloomFilter.numBits`	8388608	The default number of bits to use for the runtime bloom filter	3.3.0
`spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled`	true	Enables runtime group filtering for group-based row-level operations. Data sources that replace groups of data (e.g. files, partitions) may prune entire groups using provided data source filters when planning a row-level operation scan. However, such filtering is limited as not all expressions can be converted into data source filters and some expressions can only be evaluated by Spark (e.g. subqueries). Since rewriting groups is expensive, Spark can execute a query at runtime to find what records match the condition of the row-level operation. The information about matching records will be passed back to the row-level operation scan, allowing data sources to discard groups that don't have to be rewritten.	3.4.0
`spark.sql.optimizer.runtimeFilter.number.threshold`	10	The total number of injected runtime filters (non-DPP) for a single query. This is to prevent driver OOMs with too many Bloom filters.	3.3.0
`spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled`	false	When true and if one side of a shuffle join has a selective predicate, we attempt to insert a semi join in the other side to reduce the amount of shuffle data.	3.3.0
`spark.sql.orc.aggregatePushdown`	false	If true, aggregates will be pushed down to ORC for optimization. Support MIN, MAX and COUNT as aggregate expression. For MIN/MAX, support boolean, integer, float and date type. For COUNT, support all data types. If statistics is missing from any ORC file footer, exception would be thrown.	3.3.0
`spark.sql.orc.columnarReaderBatchSize`	4096	The number of rows to include in a orc vectorized reader batch. The number should be carefully chosen to minimize overhead and avoid OOMs in reading data.	2.4.0
`spark.sql.orc.columnarWriterBatchSize`	1024	The number of rows to include in a orc vectorized writer batch. The number should be carefully chosen to minimize overhead and avoid OOMs in writing data.	3.4.0
`spark.sql.orc.compression.codec`	snappy	Sets the compression codec used when writing ORC files. If either `compression` or `orc.compress` is specified in the table-specific options/properties, the precedence would be `compression`, `orc.compress`, `spark.sql.orc.compression.codec`.Acceptable values include: none, uncompressed, snappy, zlib, lzo, zstd, lz4.	2.3.0
`spark.sql.orc.enableNestedColumnVectorizedReader`	true	Enables vectorized orc decoding for nested column.	3.2.0
`spark.sql.orc.enableVectorizedReader`	true	Enables vectorized orc decoding.	2.3.0
`spark.sql.orc.filterPushdown`	true	When true, enable filter pushdown for ORC files.	1.4.0
`spark.sql.orc.mergeSchema`	false	When true, the Orc data source merges schemas collected from all data files, otherwise the schema is picked from a random data file.	3.0.0
`spark.sql.orderByOrdinal`	true	When true, the ordinal numbers are treated as the position in the select list. When false, the ordinal numbers in order/sort by clause are ignored.	2.0.0
`spark.sql.parquet.aggregatePushdown`	false	If true, aggregates will be pushed down to Parquet for optimization. Support MIN, MAX and COUNT as aggregate expression. For MIN/MAX, support boolean, integer, float and date type. For COUNT, support all data types. If statistics is missing from any Parquet file footer, exception would be thrown.	3.3.0
`spark.sql.parquet.binaryAsString`	false	Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do not differentiate between binary data and strings when writing out the Parquet schema. This flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.	1.1.1
`spark.sql.parquet.columnarReaderBatchSize`	4096	The number of rows to include in a parquet vectorized reader batch. The number should be carefully chosen to minimize overhead and avoid OOMs in reading data.	2.4.0
`spark.sql.parquet.compression.codec`	snappy	Sets the compression codec used when writing Parquet files. If either `compression` or `parquet.compression` is specified in the table-specific options/properties, the precedence would be `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include: none, uncompressed, snappy, gzip, lzo, brotli, lz4, lz4raw, zstd.	1.1.1
`spark.sql.parquet.enableNestedColumnVectorizedReader`	true	Enables vectorized Parquet decoding for nested columns (e.g., struct, list, map). Requires spark.sql.parquet.enableVectorizedReader to be enabled.	3.3.0
`spark.sql.parquet.enableVectorizedReader`	true	Enables vectorized parquet decoding.	2.0.0
`spark.sql.parquet.fieldId.read.enabled`	false	Field ID is a native field of the Parquet schema spec. When enabled, Parquet readers will use field IDs (if present) in the requested Spark schema to look up Parquet fields instead of using column names	3.3.0
`spark.sql.parquet.fieldId.read.ignoreMissing`	false	When the Parquet file doesn't have any field IDs but the Spark read schema is using field IDs to read, we will silently return nulls when this flag is enabled, or error otherwise.	3.3.0
`spark.sql.parquet.fieldId.write.enabled`	true	Field ID is a native field of the Parquet schema spec. When enabled, Parquet writers will populate the field Id metadata (if present) in the Spark schema to the Parquet schema.	3.3.0
`spark.sql.parquet.filterPushdown`	true	Enables Parquet filter push-down optimization when set to true.	1.2.0
`spark.sql.parquet.inferTimestampNTZ.enabled`	true	When enabled, Parquet timestamp columns with annotation isAdjustedToUTC = false are inferred as TIMESTAMP_NTZ type during schema inference. Otherwise, all the Parquet timestamp columns are inferred as TIMESTAMP_LTZ types. Note that Spark writes the output schema into Parquet's footer metadata on file writing and leverages it on file reading. Thus this configuration only affects the schema inference on Parquet files which are not written by Spark.	3.4.0
`spark.sql.parquet.int96AsTimestamp`	true	Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.	1.3.0
`spark.sql.parquet.int96TimestampConversion`	false	This controls whether timestamp adjustments should be applied to INT96 data when converting to timestamps, for data written by Impala. This is necessary because Impala stores INT96 data with a different timezone offset than Hive & Spark.	2.3.0
`spark.sql.parquet.mergeSchema`	false	When true, the Parquet data source merges schemas collected from all data files, otherwise the schema is picked from the summary file or a random data file if no summary file is available.	1.5.0
`spark.sql.parquet.outputTimestampType`	INT96	Sets which Parquet timestamp type to use when Spark writes data to Parquet files. INT96 is a non-standard but commonly used timestamp type in Parquet. TIMESTAMP_MICROS is a standard timestamp type in Parquet, which stores number of microseconds from the Unix epoch. TIMESTAMP_MILLIS is also standard, but with millisecond precision, which means Spark has to truncate the microsecond portion of its timestamp value.	2.3.0
`spark.sql.parquet.recordLevelFilter.enabled`	false	If true, enables Parquet's native record-level filtering using the pushed down filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled and the vectorized reader is not used. You can ensure the vectorized reader is not used by setting 'spark.sql.parquet.enableVectorizedReader' to false.	2.3.0
`spark.sql.parquet.respectSummaryFiles`	false	When true, we make assumption that all part-files of Parquet are consistent with summary files and we will ignore them when merging schema. Otherwise, if this is false, which is the default, we will merge all part-files. This should be considered as expert-only option, and shouldn't be enabled before knowing what it means exactly.	1.5.0
`spark.sql.parquet.writeLegacyFormat`	false	If true, data will be written in a way of Spark 1.4 and earlier. For example, decimal values will be written in Apache Parquet's fixed-length byte array format, which other systems such as Apache Hive and Apache Impala use. If false, the newer format in Parquet will be used. For example, decimals will be written in int-based format. If Parquet output is intended for use with systems that do not support this newer format, set to true.	1.6.0
`spark.sql.parser.quotedRegexColumnNames`	false	When true, quoted Identifiers (using backticks) in SELECT statement are interpreted as regular expressions.	2.3.0
`spark.sql.pivotMaxValues`	10000	When doing a pivot without specifying values for the pivot column this is the maximum number of (distinct) values that will be collected without error.	1.6.0
`spark.sql.pyspark.inferNestedDictAsStruct.enabled`	false	PySpark's SparkSession.createDataFrame infers the nested dict as a map by default. When it set to true, it infers the nested dict as a struct.	3.3.0
`spark.sql.pyspark.jvmStacktrace.enabled`	false	When true, it shows the JVM stacktrace in the user-facing PySpark exception together with Python stacktrace. By default, it is disabled to hide JVM stacktrace and shows a Python-friendly exception only. Note that this is independent from log level settings.	3.0.0
`spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled`	false	PySpark's SparkSession.createDataFrame infers the element type of an array from all values in the array by default. If this config is set to true, it restores the legacy behavior of only inferring the type from the first array element.	3.4.0
`spark.sql.readSideCharPadding`	true	When true, Spark applies string padding when reading CHAR type columns/fields, in addition to the write-side padding. This config is true by default to better enforce CHAR type semantic in cases such as external tables.	3.4.0
`spark.sql.redaction.options.regex`	(?i)url	Regex to decide which keys in a Spark SQL command's options map contain sensitive information. The values of options whose names that match this regex will be redacted in the explain output. This redaction is applied on top of the global redaction configuration defined by spark.redaction.regex.	2.2.2
`spark.sql.redaction.string.regex`	(value of `spark.redaction.string.regex`)	Regex to decide which parts of strings produced by Spark contain sensitive information. When this regex matches a string part, that string part is replaced by a dummy value. This is currently used to redact the output of SQL explain commands. When this conf is not set, the value from `spark.redaction.string.regex` is used.	2.3.0
`spark.sql.repl.eagerEval.enabled`	false	Enables eager evaluation or not. When true, the top K rows of Dataset will be displayed if and only if the REPL supports the eager evaluation. Currently, the eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like Jupyter, the HTML table (generated by repr_html) will be returned. For plain Python REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned outputs are showed similar to R data.frame would.	2.4.0
`spark.sql.repl.eagerEval.maxNumRows`	20	The max number of rows that are returned by eager evaluation. This only takes effect when spark.sql.repl.eagerEval.enabled is set to true. The valid range of this config is from 0 to (Int.MaxValue - 1), so the invalid config like negative and greater than (Int.MaxValue - 1) will be normalized to 0 and (Int.MaxValue - 1).	2.4.0
`spark.sql.repl.eagerEval.truncate`	20	The max number of characters for each cell that is returned by eager evaluation. This only takes effect when spark.sql.repl.eagerEval.enabled is set to true.	2.4.0
`spark.sql.session.localRelationCacheThreshold`	67108864	The threshold for the size in bytes of local relations to be cached at the driver side after serialization.	3.5.0
`spark.sql.session.timeZone`	(value of local timezone)	The ID of session local timezone in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+\|-)HH', '(+\|-)HH:mm' or '(+\|-)HH:mm:ss', e.g '-08', '+01:00' or '-13:33:33'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.	2.2.0
`spark.sql.shuffle.partitions`	200	The default number of partitions to use when shuffling data for joins or aggregations. Note: For structured streaming, this configuration cannot be changed between query restarts from the same checkpoint location.	1.1.0
`spark.sql.shuffledHashJoinFactor`	3	The shuffle hash join can be selected if the data size of small side multiplied by this factor is still smaller than the large side.	3.3.0
`spark.sql.sources.bucketing.autoBucketedScan.enabled`	true	When true, decide whether to do bucketed scan on input tables based on query plan automatically. Do not use bucketed scan if 1. query does not have operators to utilize bucketing (e.g. join, group-by, etc), or 2. there's an exchange operator between these operators and table scan. Note when 'spark.sql.sources.bucketing.enabled' is set to false, this configuration does not take any effect.	3.1.0
`spark.sql.sources.bucketing.enabled`	true	When false, we will treat bucketed table as normal table	2.0.0
`spark.sql.sources.bucketing.maxBuckets`	100000	The maximum number of buckets allowed.	2.4.0
`spark.sql.sources.default`	parquet	The default data source to use in input/output.	1.3.0
`spark.sql.sources.parallelPartitionDiscovery.threshold`	32	The maximum number of paths allowed for listing files at driver side. If the number of detected paths exceeds this value during partition discovery, it tries to list the files with another Spark distributed job. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.	1.5.0
`spark.sql.sources.partitionColumnTypeInference.enabled`	true	When true, automatically infer the data types for partitioned columns.	1.5.0
`spark.sql.sources.partitionOverwriteMode`	STATIC	When INSERT OVERWRITE a partitioned data source table, we currently support 2 modes: static and dynamic. In static mode, Spark deletes all the partitions that match the partition specification(e.g. PARTITION(a=1,b)) in the INSERT statement, before overwriting. In dynamic mode, Spark doesn't delete partitions ahead, and only overwrite those partitions that have data written into it at runtime. By default we use static mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't affect Hive serde tables, as they are always overwritten with dynamic mode. This can also be set as an output option for a data source using key partitionOverwriteMode (which takes precedence over this setting), e.g. dataframe.write.option("partitionOverwriteMode", "dynamic").save(path).	2.3.0
`spark.sql.sources.v2.bucketing.enabled`	false	Similar to spark.sql.sources.bucketing.enabled, this config is used to enable bucketing for V2 data sources. When turned on, Spark will recognize the specific distribution reported by a V2 data source through SupportsReportPartitioning, and will try to avoid shuffle if necessary.	3.3.0
`spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled`	false	During a storage-partitioned join, whether to allow input partitions to be partially clustered, when both sides of the join are of KeyGroupedPartitioning. At planning time, Spark will pick the side with less data size based on table statistics, group and replicate them to match the other side. This is an optimization on skew join and can help to reduce data skewness when certain partitions are assigned large amount of data. This config requires both spark.sql.sources.v2.bucketing.enabled and spark.sql.sources.v2.bucketing.pushPartValues.enabled to be enabled	3.4.0
`spark.sql.sources.v2.bucketing.pushPartValues.enabled`	false	Whether to pushdown common partition values when spark.sql.sources.v2.bucketing.enabled is enabled. When turned on, if both sides of a join are of KeyGroupedPartitioning and if they share compatible partition keys, even if they don't have the exact same partition values, Spark will calculate a superset of partition values and pushdown that info to scan nodes, which will use empty partitions for the missing partition values on either side. This could help to eliminate unnecessary shuffles	3.4.0
`spark.sql.statistics.fallBackToHdfs`	false	When true, it will fall back to HDFS if the table statistics are not available from table metadata. This is useful in determining if a table is small enough to use broadcast joins. This flag is effective only for non-partitioned Hive tables. For non-partitioned data source tables, it will be automatically recalculated if table statistics are not available. For partitioned data source and partitioned Hive tables, It is 'spark.sql.defaultSizeInBytes' if table statistics are not available.	2.0.0
`spark.sql.statistics.histogram.enabled`	false	Generates histograms when computing column statistics if enabled. Histograms can provide better estimation accuracy. Currently, Spark only supports equi-height histogram. Note that collecting histograms takes extra cost. For example, collecting column statistics usually takes only one table scan, but generating equi-height histogram will cause an extra table scan.	2.3.0
`spark.sql.statistics.size.autoUpdate.enabled`	false	Enables automatic update for table size once table's data is changed. Note that if the total number of files of the table is very large, this can be expensive and slow down data change commands.	2.3.0
`spark.sql.storeAssignmentPolicy`	ANSI	When inserting a value into a column with different data type, Spark will perform type coercion. Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. With ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as converting `string` to `int` or `double` to `boolean`. With legacy policy, Spark allows the type coercion as long as it is a valid `Cast`, which is very loose. e.g. converting `string` to `int` or `double` to `boolean` is allowed. It is also the only behavior in Spark 2.x and it is compatible with Hive. With strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion, e.g. converting `double` to `int` or `decimal` to `double` is not allowed.	3.0.0
`spark.sql.streaming.checkpointLocation`	(none)	The default location for storing checkpoint data for streaming queries.	2.0.0
`spark.sql.streaming.continuous.epochBacklogQueueSize`	10000	The max number of entries to be stored in queue to wait for late epochs. If this parameter is exceeded by the size of the queue, stream will stop with an error.	3.0.0
`spark.sql.streaming.disabledV2Writers`		A comma-separated list of fully qualified data source register class names for which StreamWriteSupport is disabled. Writes to these sources will fall back to the V1 Sinks.	2.3.1
`spark.sql.streaming.fileSource.cleaner.numThreads`	1	Number of threads used in the file source completed file cleaner.	3.0.0
`spark.sql.streaming.forceDeleteTempCheckpointLocation`	false	When true, enable temporary checkpoint locations force delete.	3.0.0
`spark.sql.streaming.metricsEnabled`	false	Whether Dropwizard/Codahale metrics will be reported for active streaming queries.	2.0.2
`spark.sql.streaming.multipleWatermarkPolicy`	min	Policy to calculate the global watermark value when there are multiple watermark operators in a streaming query. The default value is 'min' which chooses the minimum watermark reported across multiple operators. Other alternative value is 'max' which chooses the maximum across multiple operators. Note: This configuration cannot be changed between query restarts from the same checkpoint location.	2.4.0
`spark.sql.streaming.noDataMicroBatches.enabled`	true	Whether streaming micro-batch engine will execute batches without data for eager state management for stateful streaming queries.	2.4.1
`spark.sql.streaming.numRecentProgressUpdates`	100	The number of progress updates to retain for a streaming query	2.1.1
`spark.sql.streaming.sessionWindow.merge.sessions.in.local.partition`	false	When true, streaming session window sorts and merge sessions in local partition prior to shuffle. This is to reduce the rows to shuffle, but only beneficial when there're lots of rows in a batch being assigned to same sessions.	3.2.0
`spark.sql.streaming.stateStore.stateSchemaCheck`	true	When true, Spark will validate the state schema against schema on existing state and fail query if it's incompatible.	3.1.0
`spark.sql.streaming.stopActiveRunOnRestart`	true	Running multiple runs of the same streaming query concurrently is not supported. If we find a concurrent active run for a streaming query (in the same or different SparkSessions on the same cluster) and this flag is true, we will stop the old streaming query run to start the new one.	3.0.0
`spark.sql.streaming.stopTimeout`	0	How long to wait in milliseconds for the streaming execution thread to stop when calling the streaming query's stop() method. 0 or negative values wait indefinitely.	3.0.0
`spark.sql.thriftServer.interruptOnCancel`	true	When true, all running tasks will be interrupted if one cancels a query. When false, all running tasks will remain until finished.	3.2.0
`spark.sql.thriftServer.queryTimeout`	0ms	Set a query duration timeout in seconds in Thrift Server. If the timeout is set to a positive value, a running query will be cancelled automatically when the timeout is exceeded, otherwise the query continues to run till completion. If timeout values are set for each statement via `java.sql.Statement.setQueryTimeout` and they are smaller than this configuration value, they take precedence. If you set this timeout and prefer to cancel the queries right away without waiting task to finish, consider enabling spark.sql.thriftServer.interruptOnCancel together.	3.1.0
`spark.sql.thriftserver.scheduler.pool`	(none)	Set a Fair Scheduler pool for a JDBC client session.	1.1.1
`spark.sql.thriftserver.ui.retainedSessions`	200	The number of SQL client sessions kept in the JDBC/ODBC web UI history.	1.4.0
`spark.sql.thriftserver.ui.retainedStatements`	200	The number of SQL statements kept in the JDBC/ODBC web UI history.	1.4.0
`spark.sql.timestampType`	TIMESTAMP_LTZ	Configures the default timestamp type of Spark SQL, including SQL DDL, Cast clause, type literal and the schema inference of data sources. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP WITH LOCAL TIME ZONE. Before the 3.4.0 release, Spark only supports the TIMESTAMP WITH LOCAL TIME ZONE type.	3.4.0
`spark.sql.tvf.allowMultipleTableArguments.enabled`	false	When true, allows multiple table arguments for table-valued functions, receiving the cartesian product of all the rows of these tables.	3.5.0
`spark.sql.ui.explainMode`	formatted	Configures the query explain mode used in the Spark SQL UI. The value can be 'simple', 'extended', 'codegen', 'cost', or 'formatted'. The default value is 'formatted'.	3.1.0
`spark.sql.variable.substitute`	true	This enables substitution using syntax like `${var}`, `${system:var}`, and `${env:var}`.	2.0.0