I discovered an error scenario with SSIS merge joins.
I had a merge join on two datasets, based upon a join on two columns: COL_A and COL_B.
The merge join tool requires the inputs to be sorted, which they were.
However the change I made required a change to the order sequence.
Instead of it being Col_A = 1, Col_B = 2 it was changed to Col_B = 1, Col_A = 2.
The order sequence was specified logically in SSIS properties of the input data sources. However the SQL query for the 2nd data source was not updated, so it was left as
ORDER BY COL_A, COL_B
No errors were reported as they were both of type INT. However the JOIN failed and rows were being excluded that were present before.
So it is important to ensure that the data is PHYSICALLY sorted (either by using the SORT tool) or by the SQL query itself, not just sequence it in the properties of the data source. This is noted here with a warning:
If the sort options indicate that the data is sorted, but the data is
not actually sorted, the results of the merge or merge join operation
are unpredictable.
It's just a bit nasty because if you make this mistake, there are no warnings or errors and the first indication that something is wrong is when the output isn't what you expect.
You can also conclude that an SSIS join is different from a SQL join - it is dependent upon column positions rather than column names.
No comments:
Post a Comment