Parquet
Delta
Incremental Reads AS OF
So if you picture AS OF VERSION 5000 in Delta, the engine is not usually doing:
read 00000000000000000000.jsonread 00000000000000000001.json- ...
read 00000000000000005000.json
Instead it is more like:
- find the latest checkpoint at or before version 5000,
- load that checkpointed table state,
- replay only the JSON commits after that checkpoint up to 5000,
- get the active file set for version 5000,
- then plan the scan over those files
That is why the metadata machinery exists: it turns “huge append-only file pile” into “here is the exact active file set for this version.”