public interface PartitionReaderFactory extends Serializable

A factory used to create PartitionReader instances.
If Spark fails to execute any method in an implementation of this interface or in the returned
PartitionReader (by throwing an exception), the corresponding Spark task will fail and
be retried until the maximum number of retries is reached.

| Modifier and Type | Method and Description |
|---|---|
| default PartitionReader<GpuColumnBatch> | createColumnarReader(org.apache.spark.sql.execution.datasources.FilePartition partition): Returns a columnar partition reader to read data from the given FilePartition. |
| default boolean | supportColumnarReads(org.apache.spark.sql.execution.datasources.FilePartition partition): Returns true if the given FilePartition should be read by Spark in a columnar way. |
default PartitionReader<GpuColumnBatch> createColumnarReader(org.apache.spark.sql.execution.datasources.FilePartition partition)

Returns a columnar partition reader to read data from the given FilePartition.
Implementations will probably need to cast the input partition to the concrete
FilePartition class defined for the data source.

default boolean supportColumnarReads(org.apache.spark.sql.execution.datasources.FilePartition partition)
Returns true if the given FilePartition should be read by Spark in a columnar way.
This means implementations must also implement createColumnarReader(FilePartition)
for the input partitions for which this method returns true.
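The contract above (a factory that both reports columnar support and creates the matching reader) can be sketched as follows. This is a minimal, self-contained illustration: `FilePartition`, `GpuColumnBatch`, and `PartitionReader` are simplified stand-ins for the real Spark and GPU data source classes, and `FixedBatchReaderFactory` is a hypothetical implementation, not part of any actual API.

```java
import java.io.Serializable;

// Simplified stand-ins for the types this page references; the real classes
// (in org.apache.spark.sql.execution.datasources and the GPU data source)
// are richer than shown here. Checked exceptions are omitted for brevity.
class FilePartition implements Serializable {
    final String[] filePaths;
    FilePartition(String... filePaths) { this.filePaths = filePaths; }
}

class GpuColumnBatch {
    final int numRows;
    GpuColumnBatch(int numRows) { this.numRows = numRows; }
}

interface PartitionReader<T> extends AutoCloseable {
    boolean next();
    T get();
    @Override void close();
}

interface PartitionReaderFactory extends Serializable {
    PartitionReader<GpuColumnBatch> createColumnarReader(FilePartition partition);
    boolean supportColumnarReads(FilePartition partition);
}

// Hypothetical factory: reports columnar support for every partition (Spark 2.4
// does not allow mixing columnar and row-based partitions) and serves one
// fixed-size batch per file in the partition.
class FixedBatchReaderFactory implements PartitionReaderFactory {
    @Override
    public boolean supportColumnarReads(FilePartition partition) {
        return true;
    }

    @Override
    public PartitionReader<GpuColumnBatch> createColumnarReader(FilePartition partition) {
        return new PartitionReader<GpuColumnBatch>() {
            private int index = -1;

            @Override
            public boolean next() {
                // Advance to the next file; false once all files are consumed.
                return ++index < partition.filePaths.length;
            }

            @Override
            public GpuColumnBatch get() {
                // A real reader would decode the file at filePaths[index]
                // into a columnar batch; here a fixed-size batch stands in.
                return new GpuColumnBatch(1024);
            }

            @Override
            public void close() {
                // Release per-file or GPU resources here.
            }
        };
    }
}
```

Spark drives the reader with the usual iterator protocol: it calls `next()` until it returns false, fetching each batch with `get()`, and finally calls `close()`.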
As of Spark 2.4, Spark can only read all input partitions in a columnar way, or none of
them; a data source cannot mix columnar and row-based partitions. This may be relaxed in
future versions.

Copyright © 2020. All rights reserved.