I am reverse engineering (Importing) SSIS packages into ER/Studio in order to produce Data Lineage diagrams. Would someone please help me understand the difference between Activity Driven, Source Driven, and Target Driven summary lineage. Specifically, what business problem is addressed by each lineage type? When would you use one over another?
Ok, so I get to answer my own question! Here is what I found for the benefit of the community....
When importing SSIS packages into ER/Studio in order to produce Data Lineage diagrams, we have 3 options:
Each of these is helpful for different reasons.
Source Driven imports start with the Source and build a diagram through transform(s) to Target(s). Each Source is only listed once. Source driven Lineages support Impact Analysis. These types of lineages answer questions like:
Target Driven imports start from the Target and build a diagram through transform(s) to Sources. Each Target is listed only once. Target drive lineages support Root Cause Analysis and answer questions like:
Activity Driven imports start from the Transform (Activity) and build a diagram with all of the sources and targets involved in the activity. Each Source, Transform (Activity), and Target is listed only once. Its a nice high level diagram that shows:
In addition to the data fields involved in the lineage, you can also see the code involved (but see below for a possible problem) and (you can) provide a business-level explanation of each transform.
Possible problem: In digging around, I found that the import failed to bring in the SQLTask information. Specifically the SQLTask:SqlStatementSource which would be nice to see as it is the SQL statement that actually transformed the data. These are located in the SSIS packages as "<DTS:Executables>". Since I am new to all of this, the problem might lie with what I used as parameters for the import.
Hope this was helpful for someone!