Reladiff
Reladiff is a high-performance tool and library designed for diffing large datasets across databases. By executing the diff calculation within the database itself, Reladiff minimizes data transfer and achieves optimal performance.
This tool is specifically tailored for data professionals, DevOps engineers, and system administrators.
Reladiff is free, open-source, user-friendly, extensively tested, and delivers fast results, even at massive scale.
Key Features
Cross-Database Diff: Reladiff employs a divide-and-conquer algorithm, based on matching hashes, to efficiently identify modified segments and download only the necessary data for comparison. This approach ensures exceptional performance when differences are minimal.
⇄ Diffs across more than a dozen different databases (e.g. PostgreSQL -> Snowflake)!
🧠 Gracefully handles reduced precision (e.g., timestamp(9) -> timestamp(3)) by rounding according to the database specification.
🔥 Benchmarked to diff over 25M rows in under 10 seconds and over 1B rows in approximately 5 minutes, when the tables have no differences.
♾️ Capable of handling tables with tens of billions of rows.
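The divide-and-conquer idea can be illustrated with a minimal, self-contained sketch. This is not Reladiff's implementation: the two "tables" are plain Python dicts keyed by an integer id, and the names `checksum`, `segment`, `diff_range`, and `threshold` are hypothetical. In a real cross-database diff the checksum of each segment would be computed inside each database, so only the digests (and, for small mismatching segments, the rows) cross the network.

```python
import hashlib


def checksum(rows):
    """Hash a key-sorted list of (key, value) rows into a single digest."""
    h = hashlib.md5()
    for key, value in rows:
        h.update(f"{key}|{value}\n".encode())
    return h.hexdigest()


def segment(table, lo, hi):
    """Return the rows whose key falls in [lo, hi), sorted by key."""
    return sorted((k, v) for k, v in table.items() if lo <= k < hi)


def diff_range(a, b, lo, hi, threshold=4):
    """Yield ('-', row) for rows only in a and ('+', row) for rows only in b."""
    rows_a, rows_b = segment(a, lo, hi), segment(b, lo, hi)
    if checksum(rows_a) == checksum(rows_b):
        return  # hashes match: skip this whole segment
    if hi - lo <= threshold:
        # Segment is small enough: compare the rows directly.
        set_a, set_b = set(rows_a), set(rows_b)
        for row in sorted(set_a - set_b):
            yield ("-", row)
        for row in sorted(set_b - set_a):
            yield ("+", row)
        return
    # Hashes differ on a large segment: split it and recurse on each half.
    mid = (lo + hi) // 2
    yield from diff_range(a, b, lo, mid, threshold)
    yield from diff_range(a, b, mid, hi, threshold)


# Two 100-row tables that differ in one changed row and one deleted row.
table_a = {i: f"row-{i}" for i in range(100)}
table_b = dict(table_a)
table_b[42] = "changed"
del table_b[7]

result = list(diff_range(table_a, table_b, 0, 100))
for sign, row in result:
    print(sign, row)
```

Because matching segments are discarded after a single hash comparison, the amount of row data inspected grows with the number of differences rather than with the table size.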
Intra-Database Diff: When both tables reside in the same database, Reladiff compares them using a join operation, with additional optimizations for enhanced speed.
Supports materializing the diff into a local table.
Can collect various extra statistics about the tables.
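As a rough illustration of a join-based diff, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names (`orders_old`, `orders_new`, `orders_diff`) and the query are assumptions made for this example, not the SQL that Reladiff generates; the point is only that both the comparison and the materialized result stay inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_old (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE orders_new (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO orders_old VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO orders_new VALUES (1, 10), (2, 25), (4, 40);
""")

# Rows missing from, or different in, the other table.
# `IS NOT` is SQLite's NULL-safe inequality operator.
DIFF_SQL = """
    SELECT '-' AS sign, a.id AS id, a.amount AS amount
    FROM orders_old AS a LEFT JOIN orders_new AS b ON a.id = b.id
    WHERE b.id IS NULL OR a.amount IS NOT b.amount
    UNION ALL
    SELECT '+' AS sign, b.id AS id, b.amount AS amount
    FROM orders_new AS b LEFT JOIN orders_old AS a ON a.id = b.id
    WHERE a.id IS NULL OR a.amount IS NOT b.amount
"""

# Materialize the diff into a table in the same database.
conn.execute("CREATE TABLE orders_diff AS " + DIFF_SQL)

rows = conn.execute(
    "SELECT sign, id, amount FROM orders_diff ORDER BY id, sign DESC"
).fetchall()
for row in rows:
    print(row)
```

Sorting by `sign DESC` places the `-` row before the `+` row for the same id, since `-` sorts after `+` in SQLite's default binary collation.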
Threaded: Utilizes multiple threads to significantly boost performance during diffing operations.
Configurable: Offers numerous options for power-users to customize and optimize their usage.
Automation-Friendly: Outputs both JSON and git-like diffs (with + and -), facilitating easy integration into CI/CD pipelines.
Over a dozen databases supported: MySQL, PostgreSQL, Snowflake, BigQuery, Oracle, ClickHouse, and more. See full list.
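To show why the git-like and JSON outputs mentioned under Automation-Friendly suit CI/CD pipelines, here is a small sketch of how a script might render and act on a list of differences. The `(sign, row)` shape and the helper names `as_text`, `as_jsonl`, and `exit_code` are assumptions for illustration only; they do not describe Reladiff's exact output format.

```python
import json

# Hypothetical diff result: '-' rows exist only in the first table,
# '+' rows exist only in the second.
diff = [("-", (2, 20)), ("+", (2, 25)), ("+", (4, 40))]


def as_text(diff):
    """Render each difference as a git-like line prefixed with + or -."""
    return [f"{sign} {', '.join(map(str, row))}" for sign, row in diff]


def as_jsonl(diff):
    """Render each difference as one JSON document per line."""
    return [json.dumps([sign, list(row)]) for sign, row in diff]


def exit_code(diff):
    """Return a non-zero status when any difference exists, for CI gating."""
    return 1 if diff else 0


print("\n".join(as_text(diff)))
print("\n".join(as_jsonl(diff)))
```

A pipeline step could fail the build on a non-zero `exit_code(diff)` and archive the JSON lines for later inspection.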
Reladiff is a fork of an archived project called data-diff. Code that worked with data-diff should also work with Reladiff without any changes. There are two differences, however: Reladiff contains no tracking code, and it has no dbt integration.