DataFusion is a new data processing engine written in the Rust programming language. It provides a SQL and a DataFrame API to transform datasets from multiple sources and in multiple file formats, similar to Spark. DataFusion uses Apache Arrow as the underlying memory model, an efficient in-memory columnar format. This choice has significant performance benefits compared to a row-level representation such as used by Apache Spark.
This is a companion discussion topic for the original entry at https://godatadriven.com/blog/making-joins-faster-in-datafusion-based-on-table-statistics/