Comparing Large Resultsets
Overview

CompareData allows you to compare data of large resultsets without having to fit all the data of the resultsets in the client machine memory provided that the data sources and client machine character sets/collation sequences are compatible for the comparison.

Default Comparison

By default, when you compare the data of two resultsets, all the resultset data is retrieved from each data source, and then each resultset data is sorted at the client machine using the comparison keys you set up for the comparison.

After both resultset data are retrieved and sorted, the data is compared at the client machine.

In order to use this method, both resultset data must fit in the client machine memory.

Note that eventhough ODBC drivers may convert data from the data source server character set to the character set used at the client machine, comparison resultsets can be safely assumed to be in the correct sorting order for comparing at the client machine because the resultset data is sorted at the client machine and not at the data source.

If your machine has more than 4 GB of memory and you are using 64-bit version of Windows and 64-bit ODBC drivers are available for the data sources you want to compare, you can use CompareData 64-bit version to have access to more memory. See x64.

Inline Comparison

In an inline comparison, the resultset data is sorted at the DBMS server typically using an SQL Order By clause and then a fixed-size block of rows is retrieved from each data source and then the rows of two blocks are compared at the client machine and only rows that satisfy a row limit criteria are kept at the client machine.  After two blocks of rows are compared, then the next two blocks of rows are retrieved and compared and so forth until the end of the resultsets is reached.

CompareData allows you to set up a limit on the number of rows you want to keep depending on their comparison result. For example, you may want to keep only the first 1000 rows that have no differences, all rows that are new, and all the rows that have differences.  Or you may set a limit of zero for all rows in which case no rows are kept at the client machine after a block rows is compared.

Since you can set limits on the number of rows to keep, you can compare very large resultsets as long as the rows you decide to keep will fit in the client machine memory.

A requirement for inline comparison is that since the data is sorted at the servers and compared at the client, both servers and the client must use compatible character sets and collation sequences. So that if 'A' is smaller than 'a' on one server, it is also smaller on the other server and also at the client machine where rows are compared. 

If none of the resultset comparison keys are character type, then this is not an issue, otherwise, you may get incorrect comparison results if any of the comparison key columns are character type and the two servers and the client machine character sets/collation sequences are not compatible.