feat: support type coercion in Parquet Reader #6458
alamb
left a comment
Thanks @e1ijah1! This is looking very cool.
I think to really complete this feature we should have an "end to end test" -- that is, actually creating parquet files with two different schemas and showing how they can be read as a single table using this feature.
Perhaps we could add a test to https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/parquet, following the model of
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/parquet/custom_reader.rs
@tustvold since you filed #6427 do you have some idea about how this feature would be used?
Marking as draft as we are waiting on feedback
The use-case is where you have one or more parquet files with different schemas. You can provide a file_schema to FileScanConfig (or to ListingTableConfig) and have the underlying data coerced to that schema on read. An example might be if a column has changed from a
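To make the idea of coercing on read concrete, here is a minimal std-only sketch. It is a toy model, not DataFusion's or arrow's API: `DataType`, `Column`, `cast`, `coerce_batch`, and `demo` are all hypothetical names invented for illustration, and the Int32 to Int64 widening is just an assumed example of a column type changing between files.

```rust
use std::collections::HashMap;

/// Toy stand-in for a column data type (hypothetical, not the arrow type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

/// Toy column: the values together with their physical type.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

impl Column {
    fn data_type(&self) -> DataType {
        match self {
            Column::Int32(_) => DataType::Int32,
            Column::Int64(_) => DataType::Int64,
        }
    }
}

/// Cast a column to the requested type. Only the lossless Int32 -> Int64
/// widening is supported in this sketch; anything else is an error.
fn cast(col: &Column, to: DataType) -> Result<Column, String> {
    match (col, to) {
        (Column::Int32(v), DataType::Int64) => {
            Ok(Column::Int64(v.iter().map(|&x| i64::from(x)).collect()))
        }
        (c, t) if c.data_type() == t => Ok(c.clone()),
        (c, t) => Err(format!("cannot cast {:?} to {:?}", c.data_type(), t)),
    }
}

/// Coerce one batch (as read from a single file) to the table schema.
/// Columns come back in table-schema order with table-schema types.
/// Assumption: a column missing from the file is reported as an error here.
fn coerce_batch(
    table_schema: &[(&str, DataType)],
    batch: &HashMap<String, Column>,
) -> Result<Vec<Column>, String> {
    table_schema
        .iter()
        .map(|(name, ty)| {
            let col = batch
                .get(*name)
                .ok_or_else(|| format!("missing column {}", name))?;
            cast(col, *ty)
        })
        .collect()
}

/// An "old" file stored `id` as Int32; the table schema now says Int64.
fn demo() -> Result<Vec<Column>, String> {
    let table_schema = [("id", DataType::Int64)];
    let old_file = HashMap::from([("id".to_string(), Column::Int32(vec![1, 2]))]);
    coerce_batch(&table_schema, &old_file)
}
```

A file that already stores `id` as Int64 passes through `coerce_batch` unchanged, so old and new files can be read as one table with a single schema.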
a853cce to 6eb5353
5287f6d to 030a631
tustvold
left a comment
Thank you, this looks good to me.
Getting this in to avoid conflicts with #6374
I filed a follow-on PR #6563 that avoids needing to recompute the mapping for each batch
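As a rough illustration of why reusing the mapping helps, the sketch below builds a column mapping once per file and then applies it to every batch. This is not the code from #6563: `SchemaMapping`, `new`, and `map_batch` are hypothetical names, columns are modelled as plain `Vec<i64>`, and treating a missing file column as `None` is an assumption made for the example.

```rust
/// For each table column, the index of the matching file column, if any.
/// Built once per file, then reused for every batch read from that file.
struct SchemaMapping {
    file_index: Vec<Option<usize>>,
}

impl SchemaMapping {
    /// The name lookup happens here, once, rather than once per batch.
    fn new(table_cols: &[&str], file_cols: &[&str]) -> Self {
        let file_index = table_cols
            .iter()
            .map(|t| file_cols.iter().position(|f| f == t))
            .collect();
        Self { file_index }
    }

    /// Reorder a batch's columns into table order using the cached indices.
    /// A table column that is absent from the file becomes `None`.
    fn map_batch(&self, batch: &[Vec<i64>]) -> Vec<Option<Vec<i64>>> {
        self.file_index
            .iter()
            .map(|&idx| idx.map(|i| batch[i].clone()))
            .collect()
    }
}
```

With this shape, the per-batch work is only index lookups; the comparison of column names against the table schema is paid once when the mapping is constructed.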
Which issue does this PR close?
Closes #6427.
Rationale for this change
What changes are included in this PR?
Support type coercion in Parquet Reader
Are these changes tested?
Are there any user-facing changes?