Monday, March 9, 2015

Command Task in Informatica Powercenter Express



A Command task runs a single shell command or starts an external executable program during the workflow. You can write commands to delete reject files, copy a file, or archive target files, and you can use workflow parameters and variables in the command.

When you run a workflow, the workflow passes input data to a Command task in parameters and variables. The Command task uses the input data to run the command. When the Command task finishes running, the task passes output data back to the workflow in variables.

The command syntax depends on whether the Data Integration Service runs on UNIX or Windows.
Windows: Use any DOS command or batch file. For example, to copy a file from the target folder to the source folder:
Copy C:\Informatica\PCExpress\tomcat\bin\target\xyz.txt C:\Informatica\PCExpress\tomcat\bin\source\xyz.txt
To create a batch file, write the above command in a file and save it as a *.bat file. Then use the file in the Command task.
UNIX: Use any UNIX command or shell script. For example, to copy a file from the target folder to the source folder (on UNIX the paths use forward slashes; the path below is illustrative):
cp /<Informatica_install_dir>/tomcat/bin/target/xyz.txt /<Informatica_install_dir>/tomcat/bin/source/xyz.txt
To create a shell script, write the above command in a file and save it as a *.sh file. Then use the file in the Command task.
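
As mentioned earlier, a Command task can also delete reject files or archive target files. A minimal shell script sketch for that kind of housekeeping is shown below; the directory paths and file patterns are assumptions for illustration only:

#!/bin/sh
# Sketch: archive target files and delete reject files after a run (illustrative paths).
TARGET_DIR=/path/to/informatica/target     # assumed location of target files
BAD_DIR=/path/to/informatica/bad           # assumed location of reject (*.bad) files
ARCHIVE_DIR=/path/to/archive               # assumed archive location

# Archive the target files with a date stamp, then remove the reject files.
tar -czf "$ARCHIVE_DIR/target_$(date +%Y%m%d).tar.gz" "$TARGET_DIR"/*.txt
rm -f "$BAD_DIR"/*.bad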

Command Task Input
Command task input is the data that passes into a Command task from workflow parameters and variables. Assign a Command task configuration property to task input to define the value of the property in a workflow parameter or variable.

Command Task Output
Command task output is the data that passes from a Command task into workflow variables. Command task outputs include general outputs and task-specific outputs. The Data Integration Service copies the Command task output values to workflow variables when the Command task completes.
If the task fails, the Data Integration Service copies the general task output values, but not the task-specific output values, to workflow variables.
General outputs include output data produced by all tasks, such as the task start time, end time, and whether the task ran successfully.
Command task outputs include data produced by the Command task when it runs the command:
Exit code returned by the command. A successful command returns 0; an unsuccessful command returns a non-zero value.
Standard error message returned by the command. By default, the first 1,024 characters of the error are returned. You can change the length of the standard error in the Command task advanced configuration properties.
Standard output returned by the command. By default, the first 1,024 characters of the output are returned. You can change the length of the standard output in the Command task advanced configuration properties.
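
For example, a minimal script sketch like the one below (the file path is an assumption for illustration) returns a non-zero exit code and writes a message to standard error when an expected file is missing; the workflow can then read these values through the Command task outputs:

#!/bin/sh
# Sketch: fail the Command task if the expected target file is missing (illustrative path).
if [ ! -f /path/to/target/xyz.txt ]; then
    echo "target file not found" 1>&2    # captured as the Standard error output
    exit 1                               # non-zero exit code marks the command as unsuccessful
fi
exit 0                                   # exit code 0 marks the command as successful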

Command Task Advanced Properties
The Advanced tab for a Command task includes properties that the task uses to run the command.
Working Directory
Directory from which to run the command. You can also specify the working directory in the command that you enter for the task. If you specify the directory in both locations, the directory specified in the command overrides the directory specified in this property.
Standard Output Length
Number of characters to return in the standard output for the command. Maximum is 32,768 characters. Default is 1,024 characters.
Standard Error Length
Number of characters to return in the standard error for the command. Maximum is 4,096 characters. Default is 1,024 characters.

Saturday, February 28, 2015

Create, Deploy, Run a Workflow with Mapping Task and Monitor Workflow in Informatica




A workflow is a graphical representation of a set of events, tasks, and decisions that define a business process. The Data Integration Service uses the instructions configured in the workflow to run the objects.

Workflow Objects
A workflow object is an event, task, or gateway. You add objects as you develop a workflow in the editor. Workflow objects are non-reusable.

Event Starts or ends the workflow. An event represents something that happens when the workflow runs. The editor displays events as circles.  
Start  Represents the beginning of the workflow. A workflow must contain one Start event. 
End  Represents the end of the workflow. A workflow must contain one End event. 

Task Runs a single unit of work in the workflow, such as running a mapping, sending an email, or running a shell command. A task represents something that is performed during the workflow. The editor displays tasks as squares.
Assignment  Assigns a value to a user-defined workflow variable. 
Command  Runs a single shell command or starts an external executable program. 
Mapping  Runs a mapping. 
Notification  Sends an email notification to specified recipients.

 

Exclusive gateway splits and merges paths in the workflow based on how the Data Integration Service evaluates expressions in conditional sequence flows. An Exclusive gateway represents a decision made in the workflow. The editor displays Exclusive gateways as diamonds.

Workflow Deployment

When you develop a workflow in the Developer tool, you create a workflow definition. To run an instance of the workflow, you add the workflow definition to an application. When you deploy a workflow, the Data Integration Service creates a separate set of run-time metadata in the Model repository for the workflow. If you make changes to a workflow definition in the Developer tool after you deploy it, you must redeploy the application that contains the workflow definition for the changes to take effect.

Steps to create a workflow

Create a workflow by clicking File > New > Workflow.

Add objects to the workflow and configure the object properties. A workflow object is an event, task, or gateway.

Connect objects with sequence flows to specify the order that the Data Integration Service runs the objects.

Define variables for the workflow to capture run-time information.

Validate the workflow to identify errors.

Add the workflow to an application and deploy the application to the Data Integration Service.

Run an instance of the workflow from the deployed application using the infacmd wfs command line program.
infacmd wfs startWorkflow -dn MyDomain -sn MyDataIntSvs -un MyUser -pd MyPassword -a MyApplication -wf MyWorkflow -pf MyParameterFile.xml
You can also run the workflow instance from the deployed application in the Administrator tool.

Monitor the workflow instance run in the Monitoring tool in Informatica Administrator.

Mapping Task

Mapping task runs a mapping during a workflow. When you add a Mapping task to a workflow, you select a single mapping for the task to run.  When you change the mapping, the Model Repository Service tracks the effects of these changes on all Mapping tasks that include the mapping.

Mapping Task Input
Mapping task input is the data that passes into a Mapping task from workflow parameters and variables. On the Mapping task Input tab, you can assign the following information to workflow parameters or variables:
User-defined mapping parameters Assign a user-defined mapping parameter to task input to define the user-defined mapping parameter value in a workflow parameter value, workflow variable value, or literal value. The Input tab lists all parameters created for the mapping and for objects included in the mapping.
Mapping task configuration properties Assign a Mapping task configuration property to task input to define the value of the property in a workflow parameter or variable. The Advanced tab lists the Mapping task configuration properties.

Mapping Task Output
Mapping task output is the data that passes from a Mapping task into workflow variables. When you configure a Mapping task, you specify the task output values that you want to assign to workflow variables on the Output tab. The Data Integration Service copies the Mapping task output values to workflow variables when the Mapping task completes.
General outputs include output data produced by all tasks, such as the task start time, end time, and whether the task ran successfully.
Mapping task outputs include the number of target rows processed, the number of source rows processed, and the number of error rows.

Mapping Task Advanced Properties
The Advanced tab for a Mapping task includes properties that the task uses to run the mapping. You can also assign a configuration property to task input. Then in the Mapping task Input tab, you can assign the property to a workflow parameter or variable.
Default Date Time Format Date/time format the Data Integration Service uses when the mapping converts strings to dates. Select one of the predefined formats, or type a valid date format string.
Optimizer Level Controls the optimization methods that the Data Integration Service applies to a mapping as follows:
0 (None). The Data Integration Service does not optimize the mapping.
1 (Minimal). The Data Integration Service applies the early projection optimization method to the mapping.
2 (Normal). The Data Integration Service applies the early projection, early selection, pushdown, and predicate optimization methods to the mapping.
3 (Full). The Data Integration Service applies the early projection, early selection, pushdown, predicate, cost-based, and semi-join optimization methods to the mapping.
The property has an integer datatype. Default is 2 (Normal).
High Precision Runs the mapping with high precision. The property has a boolean datatype. Default is true.
Sort Order Order in which the Data Integration Service sorts character data in the mapping. The property has a string datatype. Default is Binary.
Override Tracing Level Overrides the tracing level for each transformation in the mapping. The tracing level determines the amount of information the Data Integration Service sends to the mapping log files. The property has a string datatype. Default is normal.
 

Friday, February 13, 2015

Serializer in Data Transformation in Informatica - XML File to Comma Delimited Flatfile


A Serializer is the opposite of a Parser: it converts XML to other formats such as a Microsoft Excel document, a text document, or an HTML document. You can create a Serializer in two ways: invert the configuration of an existing Parser, or edit the Script and insert the Serializer component. It is usually easier to create a Serializer than a Parser because the XML input is completely structured. The structure makes it easy to identify the required data and write it, in a sequential procedure, to the output.

A Serializer contains anchors that are analogous to the anchors used in the Parser, but they work in the opposite direction. Serialization anchors read XML data and write the data to locations in the output document. The most important serialization anchors are ContentSerializer and StringSerializer.

ContentSerializer writes the content of a specified data holder to the output document. It is the inverse of a Content anchor, which reads content from a source document.

StringSerializer writes a predefined string to the output. It is the inverse of a Marker anchor, which finds a predefined string in a source document.

A Serializer executes the serialization anchors in the order in which they are defined. Serialization anchors write data sequentially, always appending it to the end of the output document.

In this blog, we will convert an XML file to a comma-delimited flat file. The XML file that we use as the example source is given below:

<?xml version="1.0" encoding="windows-1252"?>
<EmpPdf>
<FirstName>Chris</FirstName>
<LastName>Boyd</LastName>
<Department>HR</Department>
<StartDate>2009-10-11</StartDate>
</EmpPdf>

The XSD is the same as the one we used in the Parser. In the Serializer, we will use the output of the Parser (the XML file) as the source and convert it to a comma-delimited flat file.
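
For the sample XML record above, the comma-delimited output of the Serializer would look something like the following (the field order simply follows the order of the XML elements; the header line is shown only for illustration):

FirstName,LastName,Department,StartDate
Chris,Boyd,HR,2009-10-11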


If you have any questions, feel free to leave comments on this page and I will get back to you as soon as I can.

Friday, February 6, 2015

Parsing Unstructured Data Using Data Processor Transformation in Informatica - PDF to XML


The Data Processor transformation processes unstructured and semi-structured file formats in a mapping. We can configure it to process HTML pages, XML, JSON, and PDF documents. We can also convert structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT. For example, if we have customer invoices in Microsoft Word files, we can configure a Data Processor transformation to parse the data from each Word file and extract the customer data to a Customer table and the order information to an Orders table.

The Data Processor Transformation  has the following options:
Parser converts source documents to XML. The output of a Parser is always XML. The input can have any format, such as text, HTML, Word, PDF, or HL7.
Serializer converts an XML file to an output document of any format. The output of a Serializer can be any format, such as a text document, an HTML document, or a PDF.
Mapper converts an XML source document to another XML structure or schema. You can convert the same XML documents as in an XMap.
Transformer modifies the data in any format. It adds, removes, converts, or changes text. Use Transformers with a Parser, Mapper, or Serializer. You can also run a Transformer as a stand-alone component.
Streamer splits large input documents, such as multi-gigabyte data streams, into segments. The Streamer processes documents that have multiple messages or records in them, such as HIPAA or EDI files.

In this blog, we will see how to extract data from a PDF document and create an XML file using the Data Processor transformation. Source documents that have a fixed page layout, such as bills, invoices, and account statements, can be parsed using the positional format to find the data fields. An anchor is a signpost that you place in a document, indicating the position of the data. The most commonly used anchors are called Marker and Content anchors. These anchors are often used as a pair:
Marker anchor labels a location in a document.
Content anchor retrieves text from the location. It stores the text that it extracts from a source document in a data holder.
 
I have a PDF document with employee data as shown below:

FirstName: Chris
Lastname: Boyd
Department: HR
StartDate: 2009-10-11

You will also need an XML Schema Definition (XSD) file that contains the target XML schema. The XSD file looks like this:

<?xml version="1.0" encoding="Windows-1252"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="EmpPdf">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="Department" type="xs:string" />
        <xs:element name="StartDate" type="xs:date" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
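
Given the sample PDF content above and this schema, the parser produces an XML document along these lines (the same XML shown as the source in the Serializer post above):

<?xml version="1.0" encoding="windows-1252"?>
<EmpPdf>
<FirstName>Chris</FirstName>
<LastName>Boyd</LastName>
<Department>HR</Department>
<StartDate>2009-10-11</StartDate>
</EmpPdf>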

Using the above data, I have explained in my video how to create a schema object and a Data Processor transformation and use them in a mapping.



*In case of any questions, feel free to leave comments on this page and I will get back to you as soon as I can.

Thursday, January 22, 2015

Unconnected Lookup Transformation In Informatica - Role Playing Date Dimension


An unconnected Lookup transformation is a passive transformation; it returns only one column from each row through its return port. It is not connected to any other transformation in the mapping.

  • It receives input from the result of a :LKP expression in another transformation, such as an Expression, Aggregator, or Update Strategy transformation.
  • The Integration Service queries the lookup source or cache based on the lookup ports and condition in the transformation.
  • The :LKP expression passes parameters to the Lookup transformation and receives a result back from the lookup.
  • If there is no match for the lookup condition, the Integration Service returns NULL.
  • You can call the Lookup transformation more than once in a mapping.
  • Unlike a connected Lookup, it can use only a static cache.

The main advantage of an unconnected lookup is that you can call it from any other transformation, much like a function in other programming languages. The lookup cache is built only once, as part of session initialization, which makes it efficient because all subsequent lookups reuse the cache.
The only constraint is that the cache has to be static, that is, the data in the lookup table should not change once the lookup cache has been created.

Scenario
Look up Order Date, Due Date, and Ship Date in the Date dimension.
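
For example, with a single unconnected Lookup transformation on the Date dimension, an Expression transformation can call the lookup once per role-playing date through :LKP expressions. The transformation and port names below are hypothetical and shown only to illustrate the pattern:

Output port ORDER_DATE_KEY:  :LKP.LKP_DATE_DIM(ORDER_DATE)
Output port DUE_DATE_KEY:    :LKP.LKP_DATE_DIM(DUE_DATE)
Output port SHIP_DATE_KEY:   :LKP.LKP_DATE_DIM(SHIP_DATE)

Because the cache is built once during session initialization, all three calls reuse the same lookup cache.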



*In case of any questions, feel free to leave comments on this page and I will get back to you as soon as I can.

Friday, January 16, 2015

Extracting Twitter Data and Using it in a Mapping - Informatica Powercenter Express


In the previous blog, we saw how to create a Twitter app, configure open authentication, and create a Twitter connection. The next step is to create the data object.


Create New Data Object

A Twitter data object is a physical data object that represents data based on a Twitter resource such as Followers, Profile, and Place.
1. Create a new Data Object by clicking File -> New -> Data Object and choose Twitter Data Object.
2. Give the data object a name and location, and choose the connection created in the previous steps. When you create a new data object, the Selected Resources section is empty. Choose resources by clicking the Add button.
3. Various Twitter resources are shown, such as Search, Profile Lookup, Followers, and Friends. For now, select Search, as we are going to search for tweets based on keywords. Click OK and complete the data object creation by clicking Finish.

Create New Data Object Operation

Create a new data object operation on the data object just created by right-clicking Twitter_Data_Object -> New -> Data Object Operation.
Name the data object operation and select the resource that we added during the creation of the data object, that is, Search. Click the Finish button.
Informatica creates two operations: Search and Output. In the properties of the Output read operation, go to the Advanced tab. Use the Query Parameter property to specify the search criteria for the tweets, and use the Ends After property to limit the number of records fetched by the connection.
Query parameter: QUERY=Informatica


Twitter Data Object Operation Advanced Properties

Query Parameter is the Twitter search criteria. Each resource requires one of the following mandatory parameters:
Followers - screen_name,user_id
Friends - screen_name,user_id
Profile Lookup - screen_name,user_id
Search - QUERY
For example, you specify the screen name in the following query parameter to extract the Twitter profile of the Twitter screen name xyz: screen_name=xyz

Token File is the file name and format of a file that contains a list of access tokens and access secrets. The CSV file must be on the machine where Informatica Services is installed. The Data Integration Service uses the access token and access secret at run time to establish a connection. The Token File property applies to the Followers and Friends Twitter resources.

Ends After controls the number of rows requested. Specify a duration or a positive number:
You can specify the amount of time that the Data Integration Service runs the mapping. Enter the duration in the format hh:mm. Default is blank, which indicates that the Data Integration Service runs the mapping until you stop it or until there is no more data.
Alternatively, enter the number of rows as a positive number. The Data Integration Service requests a minimum of 100 rows. The maximum number of rows requested is based on the rows returned by the Twitter application.

Create  Mapping

In the Developer tool, you can create a mapping using the above data object as the source and any relational database or flat file as the target.

  

*In case of any questions, feel free to leave comments on this page and I will get back to you as soon as I can.