DataStage Course Syllabus

Introduction to DataStage

  • Brief history of DataStage
  • Various versions available
  • Introduction to DataStage server components
    • Repository
    • DataStage server
    • DataStage package installer
  • Introduction to DataStage client components
    • DataStage Administrator
    • DataStage Designer
    • DataStage Director
    • DataStage Manager (removed in the latest version)

Overview of IBM WebSphere DataStage and QualityStage Designer

  • How to connect to a project
  • IBM Information Server repository
  • Developing a job
  • Introduction to job properties
  • Introduction to job parameters
  • Introduction to table definitions
  • Importing and exporting from the repository

WebSphere DataStage Server Jobs

  • Introduction to server jobs
  • Handling databases in server jobs
  • Handling special characters (# and $)
  • Loading tables
  • Data type conversion: writing to Oracle
  • Data type conversion: reading from Oracle
  • Looking up an Oracle table
  • Updating an Oracle table
  • ODBC stage
  • UniVerse stage
  • Handling files in server jobs
  • Sequential file stage
    • How to use sequential file stage
    • Defining sequential file input data
    • Defining sequential file output data
    • How the sequential stage behaves
  • Folder stage
  • Handling processing stages in server jobs
  • Transformer stage
    • How to use transformer stage
    • Transformer editor components
    • The DataStage expression editor
    • Transformer stage properties
    • Overview of transformer functions
    • Using the transformer as a lookup stage
  • Aggregator stage
    • How to use aggregator stage
    • Defining the input column sort order
    • Aggregating data
  • Merge stage
  • Sort stage

Parallel processing in DataStage

  • Client-server architecture for data warehouses
    • Various server hardware available
    • SMP (symmetric multiprocessing)
    • Clusters
    • MPP (massively parallel processing)
    • ccNUMA or NUMA (cache-coherent non-uniform memory architecture)

Types of parallel processing in DataStage

  • Pipeline parallelism
  • Partition parallelism
  • Combining pipeline and partition parallelism
  • Repartitioning data
  • Parallel processing environments
  • The configuration file

Types of partitioning techniques in DataStage

  • Round robin
  • Random
  • Same
  • Entire
  • Hash by field
  • Modulus
  • Range
  • DB2
  • Auto
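
To make a few of the partitioning techniques listed above concrete, here is a minimal Python sketch of round robin, hash by field, modulus, and entire partitioning. It is an illustration only, not how the DataStage engine implements them; the function names, the dictionary row format, and the use of CRC32 as the hash are assumptions made for this example.

```python
from zlib import crc32


def round_robin(rows, n):
    """Deal rows to n partitions in turn: row 0 -> p0, row 1 -> p1, ..."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_by_field(rows, n, key):
    """Send rows with the same key value to the same partition.

    CRC32 is used only because it is deterministic across runs;
    the real engine's hash function is not specified here.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        h = crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def modulus(rows, n, key):
    """Partition on an integer key: partition = key value mod n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def entire(rows, n):
    """Give every partition a full copy of the data set."""
    return [list(rows) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical sample data: six rows with an integer id and a region.
    sample = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(6)]
    for name, parts in [
        ("round robin", round_robin(sample, 3)),
        ("hash by region", hash_by_field(sample, 3, "region")),
        ("modulus on id", modulus(sample, 3, "id")),
    ]:
        print(name, [[r["id"] for r in p] for p in parts])
```

Round robin balances partition sizes but ignores content, while hash and modulus keep related rows together, which matters for stages that compare rows sharing a key.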

Types of collecting techniques in DataStage

  • Round robin
  • Ordered
  • Sorted merge
  • Auto
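
The collecting techniques listed above can be sketched in the same spirit. The Python below is a rough illustration under stated assumptions, not the engine's actual code: partitions are plain lists, the function names are made up for this example, and sorted merge assumes each partition is already sorted on the key.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marks the end of partitions that ran out early


def ordered(parts):
    """Read all rows from the first partition, then the second, and so on."""
    return list(chain.from_iterable(parts))


def round_robin_collect(parts):
    """Read one row from each partition in turn until all are exhausted."""
    out = []
    for group in zip_longest(*parts, fillvalue=_MISSING):
        out.extend(row for row in group if row is not _MISSING)
    return out


def sorted_merge(parts, key=None):
    """Merge partitions that are each already sorted on key into one sorted stream."""
    return list(heapq.merge(*parts, key=key))


if __name__ == "__main__":
    # Hypothetical partitions, each sorted internally.
    partitions = [[1, 2, 9], [3, 4], [5, 6, 7, 8]]
    print("ordered     ", ordered(partitions))
    print("round robin ", round_robin_collect(partitions))
    print("sorted merge", sorted_merge(partitions))
```

Only sorted merge preserves a global sort order across partitions; the other two are cheaper but leave the output order dependent on how the data was partitioned.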

Mechanics of partitioning and collecting in WebSphere DataStage parallel jobs

  • Introduction to DataStage parallel jobs
  • Difference between a passive stage and active stage
  • Handling metadata in DataStage
    • Runtime column propagation (RCP)
    • Table definitions
    • Schema files and partial schemas
    • Data types
    • Date and time formats
    • Complex data types
  • Handling the Oracle enterprise stage in parallel jobs
  • Handling special characters (# and $)
  • Loading tables
  • Type conversions when writing to Oracle
  • Updating an Oracle database
  • Deleting rows from an Oracle database
  • Loading an Oracle database
  • Reading an Oracle database
  • Performing a direct lookup on an Oracle database table
  • Using SQL builder
  • Handling the transformer stage in parallel jobs
  • How it differs from the server transformer stage
  • Creating and deleting columns
  • Handling null values
  • Defining constraints and handling otherwise links
  • Specifying link order
  • Defining local stage variables
  • What is a BASIC transformer stage
  • Transformer functions
  • Combining data in DataStage parallel jobs
  • Horizontal and vertical combining
  • Join stage
    • Inner
    • Left outer
    • Right outer
    • Full outer
  • Lookup stage
  • Merge stage
  • Comparison between join, merge, and lookup stages
  • Partitioning in reference links
  • Aggregator stage
  • Funnel stage
    • Continuous funnel mode
    • Sort funnel mode
    • Sequence mode

Most useful stages in DataStage parallel jobs

  • Sort stage
    • Sequential sort
    • Parallel sort
    • Total sort
    • Partitioning requirement
  • Remove duplicates stage
  • Modify stage
    • Dropping and keeping columns
    • Changing data type
    • Null handling
  • Pivot stage
    • Limitations in pivot stage
  • Copy stage
  • Filter stage
  • External filter stage
  • Switch stage
  • Compress stage
  • Expand stage
  • Encode stage
  • Decode stage
  • FTP enterprise stage
  • Generic stage
  • Surrogate key generator stage
  • SAS stage

Capturing changes in DataStage parallel jobs

  • Change capture stage
  • Change apply stage
  • Difference stage
  • Compare stage
  • Slowly changing dimension stage

Handling development/debug stages in DataStage parallel jobs

  • Head stage
    • Head stage default behavior
    • Skipping data
  • Tail stage
  • Sample stage
  • Peek stage
  • Row generator stage
    • How to specify data to be generated
    • Generating data in parallel
  • Column generator stage
  • Write range map stage
  • How to perform a range lookup in DataStage
  • Handling restructure stages in DataStage parallel jobs
    • Column import stage
    • Column export stage
    • Make subrecord stage
    • Split subrecord stage
    • Combine records stage
    • Promote subrecord stage
    • Make vector stage
    • Split vector stage

Handling XML files in DataStage parallel jobs

  • Introduction to XML files
  • Using the XML metadata importer
  • Using XML input stage
    • Validating documents and schemas
    • Processing namespaces
    • Supported XPath expressions
  • Using XML output stage
    • Processing namespaces
    • Supported XPath expressions
    • Aggregating input rows on output
    • Writing output to your file system
    • Processing NULLs and empty values
    • How repetition paths work
  • Using XML transformer stage

Optimizing performance in server and parallel jobs

WebSphere DataStage jobs and processes
Interpreting performance statistics in server jobs
Improving performance in server jobs

  • CPU-limited jobs: single-processor systems
  • CPU-limited jobs: multiprocessor systems
  • I/O-limited jobs
  • Hashed file stages
  • Hashed file design

Inter-process stages in server jobs
Link collector stages in server jobs
Link partitioner stages in server jobs
Job design tips in parallel jobs

  • Processing large volumes of data
  • Modular development
  • Designing for good performance

Database sparse lookup vs. join
Improving performance in parallel jobs

  • Understanding a flow
  • Performance monitoring
  • Resolving bottlenecks
  • Ensuring data is evenly partitioned

Programming in DataStage
Introduction to programming components
Routines

  • Transform functions
  • Before/after subroutines
  • Custom UniVerse functions
  • ActiveX (OLE) functions
  • Subroutines
  • Creating a routine
  • Defining custom transforms

Transforms
Macros
Precedence rules
BASIC programming
Built-in transforms and routines

Handling web services in DataStage

  • Introduction to web services technologies
  • Encoding requests and responses
  • Using the SOAP framework
  • Publishing web service operations
  • Accessing web services
  • What is the web service pack
  • Using the web service metadata importer
  • Using the web services transformer stage
  • Using the web services client stage
  • Creating web service routines
  • How to expose a DataStage job as a web service
  • Using IBM information console

Job scheduling using job sequences in DataStage

Creating a job sequence

  • Overview of activity stages
  • Triggers
  • Expressions
  • Job activity properties
  • Routine activity properties
  • Email notification activity properties
  • Wait for file activity properties
  • Exception activity properties
  • Nested condition activity properties
  • Start loop activity properties
  • End loop activity properties
  • User variables activity properties
  • Compiling and restarting the job sequence

Some advanced concepts in DataStage

  • Achieving reusability in DataStage using containers
    • Types of containers
    • Local containers
    • Server shared containers
    • Parallel shared containers
  • Creating shared containers
  • Using shared containers in DataStage jobs
  • Converting shared containers to local containers
  • Deconstruction of shared containers
  • Specifying our own parallel stage
    • Defining custom stage
    • Defining build stage
    • Build stage macros
    • Defining wrapper stage
  • Usage of the Administrator client in DataStage
    • Adding environment variables
    • Setting job parameters default values
    • Changing license details
    • Handling projects
  • Buffer settings in DataStage
  • Multiple instances of jobs in DataStage
  • DataStage job control utility
  • Jobs: compilation, execution, and checking of logs using DataStage tools
  • Handling multilingual data in DataStage
  • How to enable NLS on DataStage
  • Orchestrate architecture and commands
  • Orchestrate parallel processing framework in DataStage
  • Orchestrate utility in DataStage
  • Surrogate key generation using DataStage
  • Version control in DataStage