Hi Lavanya,
Parsing is part of address cleansing, data cleansing, and text data processing. It works much like a lookup.
If you have one source field, parsing extracts particular values from that field and populates them into the target fields.
You can see my example above to understand parsing.
Let me explain how the same example works.
This is my input data about pizza information:
large mushrooms sausage pepperoni stuffed crust
medium deluxe thin crust
small vegetarian handtossed
personal hamburger bacon cheese pan
large cheese thin crust
handtossed sausage large
All of this sits in a single description field.
Here large, medium, small, and personal are pizza sizes.
thin crust, handtossed, pan, and stuffed crust are pizza crust details.
mushrooms, sausage, pepperoni, deluxe, vegetarian, hamburger, bacon, and cheese are toppings.
We want to extract these values from the description into the respective size, crust, and topping fields.
In normal data integration, we can prepare the lookup values with two fields, like below:
TYPE,VALUE
SIZE,large
SIZE,medium
SIZE,small
CRUST,thin crust
CRUST,handtossed
TOPPING,cheese
-------------------
------------------
First, the description holds a combination of all the values, so we have to do an element analysis of the source. That means we have to break the description into separate values. Breaking it up is hard because the criteria are difficult to maintain. If we treat the space as the delimiter, how do we extract values that contain spaces themselves, such as thin crust and stuffed crust? Sometimes a single field also has multiple values: mushrooms, sausage, and pepperoni are three different toppings in the first description.
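To make the delimiter problem concrete, here is a minimal Python sketch. It is only an illustration, not how any particular tool does it: the LOOKUP dictionary and the naive_parse function are hypothetical names, and the lookup holds only the sample rows listed above.

```python
# Hypothetical lookup (VALUE -> TYPE), built from the sample rows above.
LOOKUP = {
    "large": "SIZE",
    "medium": "SIZE",
    "small": "SIZE",
    "thin crust": "CRUST",
    "handtossed": "CRUST",
    "cheese": "TOPPING",
}


def naive_parse(description):
    """Split on spaces and look up each token on its own."""
    matched, unmatched = [], []
    for token in description.lower().split():
        if token in LOOKUP:
            matched.append((LOOKUP[token], token))
        else:
            unmatched.append(token)
    return matched, unmatched


matched, unmatched = naive_parse("large cheese thin crust")
print(matched)    # [('SIZE', 'large'), ('TOPPING', 'cheese')]
print(unmatched)  # ['thin', 'crust']
```

The size and the topping are found, but "thin crust" is split into two tokens and neither one matches the lookup value, so the crust is lost.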
Now come to Data Quality.
In this case we go with Data Quality. Here too we need to parse the values based on some criteria or rules.
The first five descriptions have the same order: size, topping, and crust. The last description has the order crust, topping, and size.
Because of the order, we have to build two types of rules to parse the values. For multiple values you also have to maintain the rule file. The Data Quality transform parses based on the rule file.
Address cleansing and text data processing work on the same criteria.
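As a rough idea of what rule-driven parsing has to achieve, here is a small Python sketch. It is an assumption-laden illustration and not the rule file format or the internal logic of the quality transform: RULES, PHRASES, and parse are hypothetical names, and the sketch simply tries the longest phrase first instead of using order-based rules.

```python
# Hypothetical "rule file": each target field and its allowed values,
# taken from the pizza example in this post.
RULES = {
    "SIZE": ["large", "medium", "small", "personal"],
    "CRUST": ["thin crust", "handtossed", "pan", "stuffed crust"],
    "TOPPING": ["mushrooms", "sausage", "pepperoni", "deluxe",
                "vegetarian", "hamburger", "bacon", "cheese"],
}

# Index every value by its words so multi-word values can be matched.
PHRASES = {
    tuple(value.split()): field
    for field, values in RULES.items()
    for value in values
}
MAX_WORDS = max(len(phrase) for phrase in PHRASES)


def parse(description):
    """Assign each word or phrase to a field, trying the longest phrase first."""
    words = description.lower().split()
    result = {"SIZE": [], "CRUST": [], "TOPPING": [], "UNMATCHED": []}
    i = 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in PHRASES:
                result[PHRASES[candidate]].append(" ".join(candidate))
                i += n
                break
        else:
            # No rule covers this word, so keep it aside for review.
            result["UNMATCHED"].append(words[i])
            i += 1
    return result


print(parse("large mushrooms sausage pepperoni stuffed crust"))
print(parse("handtossed sausage large"))
```

In this sketch the first description gives one size, three toppings, and the crust "stuffed crust", and the last description is parsed correctly even though its order is crust, topping, size. A real rule file may still need order-specific rules, as described above.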
I hope this helps you understand parsing.
Thanks & Regards,
Ramana.