A LookUp cache does not change its data once built. But what if the underlying table upon which lookup was done changes the data after the lookup cache is created? Is there a way so that the cache always remain up-to-date even if the underlying table changes?
Let's think about this scenario. You are loading your target table through a mapping. Inside the mapping you have a Lookup and in the Lookup, you are actually looking up the same target table you are loading.
You may ask me, "So? What's the big deal? We all do it quite often...". And yes you are right.
There is no "big deal" because Informatica (generally) caches the lookup table in the very beginning of the mapping, so whatever record getting inserted to the target table through the mapping, will have no effect on the Lookup cache.
The lookup will still hold the previously cached data, even if the underlying target table is changing.
But what if you want your Informatica Lookup cache to get updated as and when the data in the underlying target table changes?
What if you want your lookup cache to always show the exact snapshot of the data in your target table at that point in time? Clearly this requirement will not be fullfilled in case you use a static cache. You will need a dynamic cache to handle this.
But in which scenario will someone need to use a dynamic cache? To understand this, let's first understand a static cache scenario.
Let's suppose you run a retail business and maintain all your customer information in a customer master table (RDBMS table). Every night, all the customers from your customer master table is loaded in to a Customer Dimension table in your data warehouse. Your source customer table is a transaction system table, probably in 3rd normal form, and does not store history. Meaning, if a customer changes his address, the old address is updated with the new address.
But your data warehouse table stores the history (may be in the form of SCD Type-II). There is a map that loads your data warehouse table from the source table. Typically you do a Lookup on target (static cache) and check with every incoming customer record to determine if the customer is already existing in target or not. If the customer is not already existing in target, you conclude the customer is new and INSERT the record whereas if the customer is already existing, you may want to update the target record with this new record (if the record is updated). This scenario - commonly known as 'UPSERT' (update else insert) scenario - is illustrated below.
A static Lookup Cache to determine if a source record is new or updatable
You don't need dynamic Lookup cache for the above type of scenario.
Notice in the previous example I mentioned that your source table is an RDBMS table. Generally speaking, this ensures that your source table does not have any duplicate record.
But, What if you had a flat file as source with many duplicate records in the same bunch of data that you are trying to load? (Or even a RDBMS table may also contain duplicate records)
Would the scenario be same if the bunch of data I am loading contains duplicate?
Unfortunately Not. Let's understand why from the below illustration. As you can see below, the new customer "Linda" has been entered twice in the source system - most likely mistakenly. The customer "Linda" is not present in your target system and hence does not exist in the target side lookup cache.
When you try to load the target table, Informatica processes row 3 and inserts it to target as customer "Linda" does not exist in target. Then Informatica processes row 4 and again inserts "Linda" into target since Informatica lookup's static cache can not detect that the customer "Linda" has already been inserted. This results into duplicate rows in target.
Here are some more examples when you may consider using dynamic lookup,
Once you have configured your lookup to use dynamic cache (we will see below how to do that), when Integration Service reads a row from the source, it updates the lookup cache by performing one of the following actions:
Notice that Integration Service actually flags the rows based on the above three conditions.
And that's a great thing, because, if you know the flag you can actually reroute the row to achieve different logic.
Fortunately, as soon as you create a dynamic lookup Informatica adds one extra port to the lookup. This new port is called:
Using the value of this port, the rows can be routed for insert, update or to do nothing. You just need to use a Router or Filter transformation followed by an Update Strategy.
Oh, forgot to tell you the actual values that you can expect in NewLookupRow port are:
When the Integration Service reads a row, it changes the lookup cache depending on the results of the lookup query and the Lookup transformation properties you define. It assigns the value 0, 1, or 2 to the NewLookupRow port to indicate if it inserts or updates the row in the cache, or makes no change.
Ok, I design a mapping for you to show Dynamic lookup implementation. I have given a full screenshot of the mapping. Since the screenshot is slightly bigger, so I link it below. Just click to expand the image.
Screen-shot of the mapping (click on the image to expand)
Screen-shot of the Lookup - Ports Tab (click on the image to expand)
Screen-shot of the Lookup - Properties Tab (click on the image to expand)
If you check the mapping screenshot, there I have used a router to reroute the INSERT group and UPDATE group. The router screenshot is also given below. New records are routed to the INSERT group and existing records are routed to the UPDATE group.
Screen-shot of the Router (click on the image to expand)
While using a dynamic lookup cache, we must associate each lookup/output port with an input/output port or a sequence ID. The Integration Service uses the data in the associated port to insert or update rows in the lookup cache. The Designer associates the input/output ports with the lookup/output ports used in the lookup condition.
When we select Sequence-ID in the Associated Port column, the Integration Service generates a sequence ID for each row it inserts into the lookup cache.
When the Integration Service creates the dynamic lookup cache, it tracks the range of values in the cache associated with any port using a sequence ID and it generates a key for the port by incrementing the greatest sequence ID existing value by one, when the inserting a new row of data into the cache.
When the Integration Service reaches the maximum number for a generated sequence ID, it starts over at one and increments each sequence ID by one until it reaches the smallest existing value minus one. If the Integration Service runs out of unique sequence ID numbers, the session fails.
The lookup/output port output value depends on whether we choose to output old or new values when the Integration Service updates a row:
Note: We can configure to output old or new values using the Output Old Value On Update transformation property.
If the input value is NULL and we select the Ignore Null inputs for Update property for the associated input port, the input value does not equal the lookup value or the value out of the input/output port. When you select the Ignore Null property, the lookup cache and the target table might become unsynchronized if you pass null values to the target. You must verify that you do not pass null values to the target.
When you update a dynamic lookup cache and target table, the source data might contain some null values. The Integration Service can handle the null values in the following ways:
If we know the source data contains null values, and we do not want the Integration Service to update the lookup cache or target with null values, then we need to check the Ignore Null property for the corresponding lookup/output port.
When we choose to ignore NULLs, we must verify that we output the same values to the target that the Integration Service writes to the lookup cache. We can Configure the mapping based on the value we want the Integration Service to output from the lookup/output ports when it updates a row in the cache, so that lookup cache and the target table might not become unsynchronized.
When we run a session that uses a dynamic lookup cache, the Integration Service compares the values in all lookup ports with the values in their associated input ports by default.
It compares the values to determine whether or not to update the row in the lookup cache. When a value in an input port differs from the value in the lookup port, the Integration Service updates the row in the cache.
But what if we don't want to compare all ports?
We can choose the ports we want the Integration Service to ignore when it compares ports. The Designer only enables this property for lookup/output ports when the port is not used in the lookup condition. We can improve performance by ignoring some ports during comparison. (Learn how to improve performance of lookup transformation here)
We might want to do this when the source data includes a column that indicates whether or not the row contains data we need to update. Select the Ignore in Comparison property for all lookup ports except the port that indicates whether or not to update the row in the cache and target table.
Note: We must configure the Lookup transformation to compare at least one port else the Integration Service fails the session when we ignore all ports.