DataStage 8.1: DS Course Syllabus

DataStage Course Syllabus

Introduction to DataStage

Brief history of DataStage
Various versions available
Introduction to DataStage server components

Repository
DataStage server
DataStage package installer

Introduction to DataStage client components

DataStage administrator
DataStage designer
DataStage director
DataStage manager (removed in latest version)

Overview of IBM WebSphere DataStage and QualityStage Designer

How to connect to a project
IBM information server repository
Developing a job
Introduction to job properties
Introduction to job parameters
Introduction to table definitions
Importing and exporting from the repository

WebSphere DataStage Server Jobs

Introduction to server jobs
Handling databases in server jobs
Handling special characters (# and $)
Loading tables
Data type conversion: writing to Oracle
Data type conversion: reading from Oracle
Looking up an Oracle table
Updating an Oracle table
ODBC stage
UniVerse stage
Handling files in server jobs
Sequential file stage

How to use sequential file stage
Defining sequential file input data
Defining sequential file output data
How the sequential stage behaves
Folder stages

Handling processing stages in server jobs
Transformer stage

How to use transformer stage
Transformer editor components
The DataStage expression editor
Transformer stage properties
Overview of transformer function
Using transformer as a look up stage

Aggregator stage

How to use aggregator stage
Defining the input column sort order
Aggregating data

Merge stage
Sort stage

Parallel processing in DataStage

Client server architecture for data warehouse

Various server hardware available
SMP (symmetric multiprocessing)
Clusters
MPP (massively parallel processing)
CC-NUMA or NUMA (cache-coherent non-uniform memory architecture)

Types of parallel processing in DataStage

pipeline parallelism
partition parallelism
Combining pipeline and partition parallelism
Repartitioning data
Parallel processing environments
The configuration file

Types of partitioning techniques in DataStage

round robin
random
same
entire
hash by field
modulus
range
DB2
Auto
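
As a rough illustration of four of the partitioning techniques listed above, the sketch below splits a list of rows across n partitions in plain Python. It is not DataStage code and does not reproduce the engine's actual hashing; the function names are hypothetical, and zlib.crc32 is only a stand-in for whatever hash function the parallel engine uses.

```python
import zlib


def round_robin(rows, n):
    """Deal rows to partitions in turn: row 0 -> p0, row 1 -> p1, ..."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def modulus(rows, key, n):
    """Partition on an integer key column: partition = key value mod n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def hash_by_field(rows, key, n):
    """Partition on a hash of the key column, so equal keys stay together.

    crc32 is an assumed stand-in hash, chosen because it is deterministic.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


def entire(rows, n):
    """Give every partition a full copy of the data (typical for lookups)."""
    return [list(rows) for _ in range(n)]
```

Round robin gives evenly sized partitions but scatters related rows, while hash and modulus keep rows with the same key in one partition, which is what key-based stages such as join, aggregator and remove duplicates rely on.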

Types of collecting techniques in DataStage

round robin
ordered
sorted merge
auto
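
The collecting techniques above can be sketched the same way: each function takes a list of partitions and returns one combined stream. This is a plain Python illustration under the assumption that partitions are simple lists; the function names are hypothetical and none of this is DataStage API.

```python
import heapq
from itertools import chain, zip_longest


def ordered_collect(partitions):
    """Read all of partition 0, then all of partition 1, and so on."""
    return list(chain.from_iterable(partitions))


def round_robin_collect(partitions):
    """Take one row from each partition in turn, skipping exhausted ones."""
    missing = object()  # marker for partitions that have run out of rows
    out = []
    for group in zip_longest(*partitions, fillvalue=missing):
        out.extend(row for row in group if row is not missing)
    return out


def sorted_merge_collect(partitions, key=None):
    """Merge partitions that are each already sorted on the key.

    The result is globally sorted only if every input partition is sorted.
    """
    return list(heapq.merge(*partitions, key=key))
```

For the partitions [[1, 5, 9], [2, 3], [4, 8]], ordered collection yields 1, 5, 9, 2, 3, 4, 8; round robin yields 1, 2, 4, 5, 3, 8, 9; and sorted merge yields 1, 2, 3, 4, 5, 8, 9.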

Mechanics of partitioning and collecting

WebSphere DataStage parallel jobs

Introduction to DataStage parallel jobs
Difference between a passive stage and active stage
Handling metadata in DataStage

Runtime column propagation (RCP)
Table definitions
Schema files and partial schemas
Data types
Date and time formats
Complex data types

Handling Oracle enterprise stage in parallel jobs
handling special characters (# and $)
loading tables
type conversions writing to Oracle
updating an Oracle database
deleting rows from an Oracle database
loading an Oracle database
reading an Oracle database
performing a direct lookup on an Oracle database table
using SQL builder
Handling transformer stage in parallel jobs
how it is different from server transformer stage
creating and deleting columns
handling null values
defining constraints and handling otherwise links
specifying link order
defining local stage variables
what is a BASIC transformer stage
transformer functions
Combining data in DataStage parallel jobs
horizontal and vertical combining
join stage

inner
Left outer
Right outer
Full outer

Lookup stage
Merge stage
Comparison between join, merge and lookup stages
Partitioning in reference links
Aggregator stage
Funnel stage

Funnel mode
Sort funnel mode
Sequence

Most useful stages in DataStage parallel jobs

sort stage

sequential sort
Parallel sort
Total sort
Partitioning requirement

Remove duplicates stage
Modify stage

Dropping and keeping columns
Changing data type
Null handling

Pivot stage
Limitations in pivot stage
Copy stage
Filter stage
External filter stage
Switch stage
Compress stage
Expand stage
Encode stage
Decode stage
FTP enterprise stage
Generic stage
Surrogate key generator stage
SAS stage

Capturing changes in DataStage parallel jobs

change capture stage
Change apply stage
Difference stage
Compare stage
Slowly changing dimension stage

Handling develop / debug stages in DataStage parallel jobs

Head stage

Head stage
Head stage default behavior
Skipping data

Tail stage
Sample stage
Peek stage
Row generator stage

How to specify data to be generated
Generating data in parallel

Column generator stage
Write range map stage

How to perform range lookup in DataStage
Handling restructure stages in DataStage parallel jobs

Column import stage
Column export stage
Make subrecord stage

Split subrecord stage

Combine records stage
Promote subrecord stage
Make vector stage
Split vector stage

Handling XML files in DataStage parallel jobs

Introduction to XML files
Using the XML metadata importer
Using XML input stage

Validating documents and schemas
Processing namespaces
Supported XPath expressions

Using XML output stage

Processing namespaces
Supported XPath expressions
Aggregating input rows on output
Writing output to your file system
Processing NULLS and empty values
How repetition paths work

Using XML transformer stage

Optimizing performance in server and parallel jobs

WebSphere DataStage jobs and processes
Interpreting performance statistics in server jobs
Improving performance in server jobs

CPU-limited jobs: single-processor systems
CPU-limited jobs: multiprocessor systems
I/O-limited jobs
Hashed file stages
Hash file design

Inter-process stages in server jobs
Link collector stages in server jobs
Link partitioner stages in server jobs
Job design tips in parallel jobs

Processing large volumes of data
Modular development
Designing for good performance

Database sparse lookup vs. join
Improving performance in parallel jobs

Understanding a flow
Performance monitoring
Resolving bottlenecks
Ensuring data is evenly partitioned

Programming in DataStage
Introduction to programming components
Routines

Transform functions
Before/after subroutines
Custom UniVerse functions
ActiveX (OLE) functions
Subroutines
Creating a routine
Defining custom transforms

Transforms
Macros
Precedence rules
BASIC programming
Built in transforms and routines

Handling web services in DataStage

Introduction to web services technologies

Encoding requests and responses

Using the SOAP framework

Publishing web service operations

Accessing web services

What is the web service pack

Using the web service metadata importer

Using the web services transformer stage

Using the web services client stage

Creating web service routines

How to expose DataStage job as a web service

Using IBM information console

Job scheduling using job sequences in DataStage

Creating a job sequence

Overview of activity stages
Triggers
Expressions
Job activity properties
Routine activity properties
Email notification activity properties
Wait for file activity properties
Exception activity properties
Nested condition activity properties
Start loop activity properties
End loop activity properties
User variables activity properties
Compiling and restarting the job sequence

Some advanced concepts in DataStage

Achieving reusability in DataStage using containers

Types of containers
Local containers
Server shared containers
Parallel shared containers

Creating a shared container
Using shared containers in DataStage jobs
Converting shared containers to local containers
Deconstruction of shared containers
Specifying our own parallel stage

Defining custom stage
Defining build stage
Build stage macros
Defining wrapper stage

Usage of administrator client in DataStage

Adding environment variables
Setting job parameters default values
Changing license details
Handling projects

Buffer settings in DataStage
Multiple instances of jobs in DataStage
DataStage job control utility
Jobs: compilation, execution and checking of logs using DataStage tools
Handling multilingual data in DataStage
How to enable NLS on DataStage
Orchestrate architecture and commands
Orchestrate parallel processing framework in DataStage
Orchestrate utility in DataStage
Surrogate key generation using DataStage
Version control in DataStage
