Saturday, February 28, 2015

Create, Deploy, Run a Workflow with Mapping Task and Monitor Workflow in Informatica




A workflow is a graphical representation of a set of events, tasks, and decisions that define a business process. The Data Integration Service uses the instructions configured in the workflow to run the objects.

Workflow Objects
A workflow object is an event, task, or gateway. You add objects as you develop a workflow in the editor. Workflow objects are non-reusable.

Event: Starts or ends the workflow. An event represents something that happens when the workflow runs. The editor displays events as circles.
Start: Represents the beginning of the workflow. A workflow must contain one Start event.
End: Represents the end of the workflow. A workflow must contain one End event.

Task: Runs a single unit of work in the workflow, such as running a mapping, sending an email, or running a shell command. A task represents something that is performed during the workflow. The editor displays tasks as squares.
Assignment: Assigns a value to a user-defined workflow variable.
Command: Runs a single shell command or starts an external executable program.
Mapping: Runs a mapping.
Notification: Sends an email notification to specified recipients.

 

Exclusive gateway: Splits and merges paths in the workflow based on how the Data Integration Service evaluates expressions in conditional sequence flows. An Exclusive gateway represents a decision made in the workflow. The editor displays Exclusive gateways as diamonds.

Workflow Deployment

When you develop a workflow in the Developer tool, you create a workflow definition. To run an instance of the workflow, you add the workflow definition to an application. When you deploy the application, the Data Integration Service creates a separate set of run-time metadata in the Model repository for the workflow. If you change a workflow definition in the Developer tool after you deploy it, you must redeploy the application that contains the workflow definition for the changes to take effect.

Steps to create a workflow

1. Create a workflow by clicking File > New > Workflow.

2. Add objects to the workflow and configure the object properties. A workflow object is an event, task, or gateway.

3. Connect objects with sequence flows to specify the order in which the Data Integration Service runs the objects.

4. Define variables for the workflow to capture run-time information.

5. Validate the workflow to identify errors.

6. Add the workflow to an application and deploy the application to the Data Integration Service.

7. Run an instance of the workflow from the deployed application with the infacmd wfs command line program. For example:
infacmd wfs startWorkflow -dn MyDomain -sn MyDataIntSvs -un MyUser -pd MyPassword -a MyApplication -wf MyWorkflow -pf MyParameterFile.xml
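
The -pf option points to a parameter file that supplies workflow parameter values at run time. As a minimal sketch (the application, workflow, and parameter names here are placeholders, and the exact parameter file schema can vary by Informatica version), the file looks something like this:

<?xml version="1.0"?>
<root xmlns="http://www.informatica.com/Parameterization/1.0">
<!-- application that contains the deployed workflow -->
<application name="MyApplication">
<!-- workflow whose parameters we are setting -->
<workflow name="MyWorkflow">
<!-- InputFileDir is a hypothetical user-defined workflow parameter -->
<parameter name="InputFileDir">/data/input</parameter>
</workflow>
</application>
</root>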
You can also run an instance of the workflow from the deployed application in the Administrator tool.

8. Monitor the workflow instance in the Monitoring tool in Informatica Administrator.

Mapping Task

A Mapping task runs a mapping during a workflow. When you add a Mapping task to a workflow, you select a single mapping for the task to run. When you change the mapping, the Model Repository Service tracks the effects of these changes on all Mapping tasks that include the mapping.

Mapping Task Input
Mapping task input is the data that passes into a Mapping task from workflow parameters and variables. On the Mapping task Input tab, you can assign the following information to workflow parameters or variables:
User-defined mapping parameters: Assign a user-defined mapping parameter to task input to set its value from a workflow parameter value, a workflow variable value, or a literal value. The Input tab lists all parameters created for the mapping and for objects included in the mapping.
Mapping task configuration properties: Assign a Mapping task configuration property to task input to define the value of the property in a workflow parameter or variable. The Advanced tab lists the Mapping task configuration properties.

Mapping Task Output
Mapping task output is the data that passes from a Mapping task into workflow variables. When you configure a Mapping task, you specify the task output values that you want to assign to workflow variables on the Output tab. The Data Integration Service copies the Mapping task output values to workflow variables when the Mapping task completes.
General Outputs: Output data produced by all tasks, such as the task start time, end time, and whether the task ran successfully.
Mapping Task Outputs: The number of target rows processed, the number of source rows processed, and the number of error rows.

Mapping Task Advanced Properties
The Advanced tab for a Mapping task includes properties that the task uses to run the mapping. You can also assign a configuration property to task input. Then in the Mapping task Input tab, you can assign the property to a workflow parameter or variable.
Default Date Time Format: The date/time format that the Data Integration Service uses when the mapping converts strings to dates. Select one of the predefined formats, or type a valid date format string.
Optimizer Level: Controls the optimization methods that the Data Integration Service applies to a mapping, as follows:
0 (None). The Data Integration Service does not optimize the mapping.
1 (Minimal). The Data Integration Service applies the early projection optimization method to the mapping.
2 (Normal). The Data Integration Service applies the early projection, early selection, pushdown, and predicate optimization methods to the mapping.
3 (Full). The Data Integration Service applies the early projection, early selection, pushdown, predicate, cost-based, and semi-join optimization methods to the mapping.
The property has an integer datatype. Default is 2 (Normal).
High Precision: Runs the mapping with high precision. The property has a boolean datatype. Default is true.
Sort Order: The order in which the Data Integration Service sorts character data in the mapping. The property has a string datatype. Default is Binary.
Override Tracing Level: Overrides the tracing level for each transformation in the mapping. The tracing level determines the amount of information that the Data Integration Service sends to the mapping log files. The property has a string datatype. Default is normal.
 

Friday, February 13, 2015

Serializer in Data Transformation in Informatica - XML File to Comma Delimited Flatfile


A Serializer is the opposite of a Parser: it converts XML to other formats, such as a Microsoft Excel file, a text document, or an HTML document. You can create a Serializer in two ways: invert the configuration of an existing Parser, or edit the Script and insert a Serializer component. It is usually easier to create a Serializer than a Parser because the XML input is completely structured. The structure makes it easy to identify the required data and write it, in a sequential procedure, to the output.

A Serializer contains anchors that are analogous to the anchors we use in a Parser, but they work in the opposite direction. Serialization anchors read XML data and write the data to locations in the output document. The most important serialization anchors are ContentSerializer and StringSerializer.

ContentSerializer writes the content of a specified data holder to the output document. It is the inverse of a Content anchor, which reads content from a source document.

StringSerializer writes a predefined string to the output. It is the inverse of a Marker anchor, which finds a predefined string in a source document.

The Serializer executes the serialization anchors in the sequence of their definitions. Serialization anchors write data sequentially, always appending it to the end of the output document.

In this blog, we will convert a flat file in XML format to a comma-delimited flat file. The XML file that we use as the example source is given below:

<?xml version="1.0" encoding="windows-1252"?>
<EmpPdf>
<FirstName>Chris</FirstName>
<LastName>Boyd</LastName>
<Department>HR</Department>
<StartDate>2009-10-11</StartDate>
</EmpPdf>

The XSD is the same one we used in the Parser. In the Serializer, we use the output of the Parser (an XML file) as the source and convert it to a comma-delimited flat file.
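
For our sample data, the Serializer produces a single comma-delimited record. Assuming the four fields are written in document order, with StringSerializer anchors emitting the commas between them, the output flat file would look like this:

Chris,Boyd,HR,2009-10-11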


If you have any questions, feel free to leave comments on this page and I will get back to you as soon as I can.

Friday, February 6, 2015

Parsing Unstructured Data Using Data Processor Transformation in Informatica - PDF to XML


The Data Processor transformation processes unstructured and semi-structured file formats in a mapping. We can configure it to process HTML pages, XML, JSON, and PDF documents. We can also convert structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT. For example, if we have customer invoices in Microsoft Word files, we can configure a Data Processor transformation to parse the data from each Word file, extract the customer data to a Customer table, and extract the order information to an Orders table.

The Data Processor transformation has the following options:
Parser converts source documents to XML. The output of a Parser is always XML. The input can have any format, such as text, HTML, Word, PDF, or HL7.
Serializer converts an XML file to an output document of any format. The output of a Serializer can be any format, such as a text document, an HTML document, or a PDF.
Mapper converts an XML source document to another XML structure or schema. You can convert the same XML documents as in an XMap.
Transformer modifies data in any format. It adds, removes, converts, or changes text. Use Transformers with a Parser, Mapper, or Serializer. You can also run a Transformer as a stand-alone component.
Streamer splits large input documents, such as multi-gigabyte data streams, into segments. The Streamer processes documents that have multiple messages or records in them, such as HIPAA or EDI files.

In this blog, we will see how to extract data from a PDF document and create an XML file using the Data Processor transformation. Source documents that have a fixed page layout, such as bills, invoices, and account statements, can be parsed using the positional format to find the data fields. An anchor is a signpost that you place in a document, indicating the position of the data. The most commonly used anchors are the Marker and Content anchors, and they are often used as a pair:
Marker anchor: Labels a location in a document.
Content anchor: Retrieves text from the location. It stores the text that it extracts from a source document in a data holder.
 
I have a PDF document with employee data as shown below:

FirstName: Chris
Lastname: Boyd
Department: HR
StartDate: 2009-10-11
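
In Data Transformation terms, each label in this document becomes a Marker anchor and the value after it becomes a Content anchor. As a rough sketch of the logical pairing (not the exact Script syntax), the anchors map to data holders in the target schema like this:

Marker "FirstName:" -> Content -> data holder EmpPdf/FirstName
Marker "Lastname:" -> Content -> data holder EmpPdf/LastName
Marker "Department:" -> Content -> data holder EmpPdf/Department
Marker "StartDate:" -> Content -> data holder EmpPdf/StartDate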

You will also need an XML Schema Definition (XSD) file that contains the target XML schema. The XSD file looks like this:

<?xml version="1.0" encoding="Windows-1252"?>

<xs:schema attributeFormDefault="unqualified" elementFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="EmpPdf">

<xs:complexType>

<xs:sequence>

<xs:element name="FirstName" type="xs:string" />

<xs:element name="LastName" type="xs:string" />

<xs:element name="Department" type="xs:string" />

<xs:element name="StartDate" type="xs:date" />        

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>
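
For the sample PDF text above, the Parser output is an XML document that conforms to this schema. It should look like the XML that the Serializer post above uses as its source:

<?xml version="1.0" encoding="windows-1252"?>
<EmpPdf>
<FirstName>Chris</FirstName>
<LastName>Boyd</LastName>
<Department>HR</Department>
<StartDate>2009-10-11</StartDate>
</EmpPdf>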

Using the above data, I have explained in my video how to create a schema object and a Data Processor transformation and how to use them in a mapping.



If you have any questions, feel free to leave comments on this page and I will get back to you as soon as I can.