DS interview questions

1.       What is the difference between server jobs & parallel jobs?
Server jobs run on the DataStage server engine and process data on a single node, whereas parallel jobs are available if you have DataStage 6.0 PX or DataStage 7.0 installed and use the parallel engine to partition the work across multiple nodes. Parallel jobs are especially useful if you have large amounts of data to process.
2.       What is merging? And how do you use merge?
Merge combines two or more inputs into a single output based on one or more key columns. In DataStage it is done with the Merge stage, for example merging a master file with an update file that share the same key.
3.       How do we use NLS in DataStage? What are the advantages of NLS? Where can we use it? Explain briefly.
As per the manuals and documents, there are different levels of interfaces, such as the Teradata interface operators, DB2 interface operators, Oracle interface operators and SAS interface operators. Orchestrate National Language Support (NLS) makes it possible for you to process data in international languages using Unicode character sets; the International Components for Unicode (ICU) libraries provide the NLS functionality in Orchestrate. Operators with NLS functionality include: the Teradata interface operators, the switch operator, the filter operator, the DB2 interface operators, the Oracle interface operators, the SAS interface operators, the transform operator, the modify operator, the import and export operators, and the generator operator.
4.       What is APT_CONFIG in DataStage?
APT_CONFIG_FILE (not just APT_CONFIG) is the environment variable that points to the configuration file defining the nodes, scratch areas and temporary/resource areas for the specific project.
5.        What is OCI? And how is it used in the ETL tool?
OCI stands for Oracle Call Interface. The OCI stage connects to Oracle natively rather than through ODBC and, together with the Orabulk stage, is used when there are bulk volumes of data to retrieve or load, which makes the retrieval and load times much shorter.
6.       What is merge and how can it be done? Explain with a simple example taking 2 tables.
Merge is used to join two tables. It takes the key columns and sorts them in ascending or descending order. Consider two tables, Emp and Dept: if we want to join these two tables, DeptNo is the common key, so we give that column name as the key.
7.       What is Version Control?
Version Control stores different versions of DS jobs, runs different versions of the same job, reverts to a previous version of a job, and views version histories.
8.       What are the Repository Tables in DataStage and what are they?
A data warehouse is a repository (centralized as well as distributed) of data, able to answer any ad hoc, analytical, historical or complex query. Metadata is data about data; examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository; it insulates the data warehouse from changes in the schema of operational systems. In DataStage, for a build stage, the Interface tab's Input, Output and Transfer pages each have four tabs, and under the last one (Build) you can find the TABLE NAME. The DataStage client components are: Administrator - administers DataStage projects and conducts housekeeping on the server; Designer - creates DataStage jobs that are compiled into executable programs; Director - used to run and monitor the DataStage jobs; Manager - allows you to view and edit the contents of the repository.
9.       How can we pass parameters to a job by using a file?
You can do this by keeping the parameter values in a UNIX file and then calling the execution of the DataStage job from a script; the DS job has the parameters defined, and the script passes the values read from the file.
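A minimal sketch of that approach, assuming a parameter file of NAME=VALUE pairs and hypothetical project, job and path names (dsjob must be run in the DataStage engine environment, e.g. after sourcing dsenv):

    #!/bin/sh
    # Build -param arguments from NAME=VALUE pairs in a file and run the job.
    # params.txt, MyProject and MyJob are placeholders; the parameter names
    # must match the parameters defined in the job.
    PARAM_ARGS=""
    while IFS='=' read NAME VALUE
    do
        PARAM_ARGS="$PARAM_ARGS -param $NAME=$VALUE"
    done < /home/dsadm/params.txt

    dsjob -run $PARAM_ARGS MyProject MyJob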
10.    Where does a UNIX script of DataStage execute, on the client machine or on the server?
DataStage jobs are executed on the server machine only; nothing is stored or executed on the client machine.
11.    Default nodes for DataStage Parallel Edition
Actually the number of nodes depends on the number of processors in your system. If your system has two processors, we get two nodes by default.
12.  What happens if RCP is disabled?
In that case OSH has to perform an import and export of the column metadata every time the job runs, and the processing time of the job also increases.
13.  How do you run the same job for a number of input files that share the same metadata?
If the metadata for all the files is the same, create a job having the file name as a parameter, then use the same job in a routine and call the job with the different file names, or you can create a sequencer to do it.
14.  Scenario-based question: Suppose a sequencer controls 4 jobs (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows, but after the run only 5,000 rows have been loaded into the target table, the rest are not loaded, and the job aborts. How can you sort out the problem?
If the job sequencer synchronizes or controls the 4 jobs but job 1 has a problem, you should go to the Director and check what type of problem it is showing: a data type problem, a warning message, a job failure or a job abort. If the job failed, it usually means a data type problem.
15.  What is a Batch program and how is it generated?
A batch program is generated at run time and maintained by DataStage itself, but you can easily change it on the basis of your requirements (extraction, transformation, loading). The batch program that is generated depends on the nature of your job.
16.  In how many places can you call Routines?
You can call routines in four places: (i) transform functions (e.g. date transformations, string transformations); (ii) before & after subroutines; (iii) XML transformations; (iv) web-based transformations.
17.  How do you fix the error "OCI has fetched truncated data" in DataStage?
One suggestion is to use the Change Capture stage to identify the truncated data (members, please confirm).
18.  Importance of Surrogate Key in data warehousing?
A surrogate key is the primary key for a dimension table. Its main importance is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes going on in the source database.
19.  What's the difference between DataStage Developers and DataStage Designers? What are the skills required for this?
A DataStage developer is the one who codes the jobs. A DataStage designer is the one who designs the jobs, i.e. he deals with the blueprints and decides the stages that are required in developing the code.
20.  How do you merge two files in DS?
Either use a copy/concatenate command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.
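For the first case (identical metadata), the before-job command can be a simple shell concatenation; a small sketch with placeholder file names:

    # Concatenate two files that share the same layout into one input file
    cat /data/in/file1.txt /data/in/file2.txt > /data/in/merged.txt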
21.  How do we automate DS jobs?
We can call a DataStage batch job from the command prompt using 'dsjob' and pass all the parameters from the command prompt. Wrap this in a shell script and call the shell script from any of the schedulers available on the market. The second option is to schedule these jobs using the DataStage Director's own scheduler.
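A hedged sketch of such a wrapper script, using placeholder install path, project and job names; the -jobstatus option makes dsjob wait for the job and return an exit code that reflects its finishing status (check the exact codes for your version):

    #!/bin/sh
    # Wrapper a scheduler can call; exits non-zero if the job did not finish cleanly.
    . /apps/Ascential/DataStage/DSEngine/dsenv     # placeholder engine environment

    dsjob -run -jobstatus -param LOAD_DATE=`date +%Y%m%d` MyProject MyLoadJob
    RC=$?
    # With -jobstatus, 1 usually means finished OK and 2 finished with warnings;
    # anything higher is treated here as a failure.
    if [ $RC -gt 2 ]; then
        echo "MyLoadJob failed with status $RC" >&2
        exit 1
    fi
    exit 0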
22.  What is DS Director used for - did you use it?
DataStage Director is used to run, validate and monitor the jobs and to view their logs. You can go to DataStage Director from DataStage Designer itself.
23.  What is DS Manager used for - did you use it?
DataStage Manager is used for export and import purposes; the main use of export and import is sharing jobs and projects from one project to another.
24.  What are the types of Hashed File?
Hashed files are classified broadly into 2 types: a) Static - subdivided into 17 types based on the primary key pattern; b) Dynamic - subdivided into 2 types: i) Generic, ii) Specific. The default hashed file is Dynamic - Type 30.
25.  How do you eliminate duplicate rows?
Removal of duplicates is done in two ways: 1. use the Remove Duplicates stage, or 2. use GROUP BY on all the columns used in the SELECT, and the duplicates will go away.
26.  What about System variables? 
DataStage provides a set of variables containing useful system information that you can access from a transform or routine. System variables are read-only.

@DATE The internal date when the program started. See the Date function.

@DAY The day of the month extracted from the value in @DATE.

@FALSE The compiler replaces the value with 0.

@FM A field mark, Char (254).

@IM An item mark, Char (255).

@INROWNUM Input row counter. For use in constraints and derivations in Transformer stages.

@OUTROWNUM Output row counter (per link). For use in derivations in Transformer stages.

@LOGNAME The user login name.

@MONTH The current month extracted from the value in @DATE.

@NULL The null value.

@NULL.STR The internal representation of the null value, Char (128).

@PATH The pathname of the current DataStage project.

@SCHEMA The schema name of the current DataStage project.

@SM A subvalue mark (a delimiter used in Universe files), Char(252).

@SYSTEM.RETURN.CODE
Status codes returned by system processes or commands.

@TIME The internal time when the program started. See the Time function.

@TM A text mark (a delimiter used in Universe files), Char (251).

@TRUE The compiler replaces the value with 1.

@USERNO The user number.

@VM A value mark (a delimiter used in Universe files), Char (253).

@WHO The name of the current DataStage project directory.

@YEAR The current year extracted from @DATE.

REJECTED Can be used in the constraint expression of a Transformer stage of an output link. REJECTED is initially TRUE, but is set to FALSE whenever an output link is successfully written. 
27.  What is DS Designer used for - did you use it?
You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer's graphical interface lets you select stage icons, drop them onto the Designer canvas, and link them together to define the job.
28.  What is DS Administrator used for - did u use it?  
The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales. 
29.  Dimensional modeling is again subdivided into 2 types.
A) Star Schema - Simple & Much Faster. Denormalized form. B) Snowflake Schema - Complex with more Granularity. More normalized form. 
30.  How will you call an external function or subroutine from DataStage?
There is a DataStage option to call external programs, e.g. the ExecSH before/after subroutine.
31.  How do you pass a filename as the parameter for a job?
During job development we can create a parameter 'FILE_NAME', and the value can be passed in while running the job.
32.  How do you handle date conversions in DataStage? Convert an mm/dd/yyyy format to yyyy-dd-mm?
We use a) the "Iconv" function - internal conversion, and b) the "Oconv" function - external (output) conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(FieldName,"D/MDY[2,2,4]"),"D-YDM[4,2,2]").
33.  What's the difference between an operational data store (ODS) & a data warehouse?
ODS data is volatile, whereas data warehouse data is non-volatile, historical and time-variant. In simple terms, the ODS holds dynamic, near-current data.
34.  When should we use an ODS?
Data warehouses are typically read-only and batch-updated on a schedule; ODSs are maintained in more real time and trickle-fed constantly.
35.  What are the Job parameters?
Job parameters are used to provide administrative control and to change run-time values of a job without recompiling it. They are defined under Edit > Job Properties > Parameters; in the Parameters tab we can define the name, prompt, type and default value.
36.  How can we join an Oracle source and a sequential file?
A Join or Lookup stage can be used to join the Oracle source and the sequential file.
37.  What are the Iconv and Oconv functions?
Iconv and Oconv are conversion functions (commonly used for dates): Iconv converts data into the internal storage format, and Oconv converts from the internal format into an output format.
38.  Difference between a Hash file and a Sequential file?
A hash file stores the data based on a hash algorithm and a key value; a sequential file is just a file with no key column. A hash file is used as a reference for lookups; a sequential file cannot be used for lookups.
39.  How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx. Write a Perl program which does a simple rename of the strings, looking them up in the Excel file. Then import the new .dsx file, probably into a new project for testing.
40.  Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB, or does it do some kind of DELETE logic?
There is no TRUNCATE on ODBC stages. The 'Clear table' option issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options; they are radically different in permissions (TRUNCATE requires you to have ALTER TABLE permission, whereas DELETE doesn't).
41.  Tell me one situation from your last project where you faced a problem and how you solved it.
A. Jobs in which data was read directly from OCI stages were running extremely slowly. I had to stage the data before sending it to the transformer to make the jobs run faster.
B. A job aborted in the middle of loading some 500,000 rows. You have the option of either cleaning/deleting the loaded data and then running the fixed job, or restarting the job from the row at which it aborted. To make sure the load was proper, we opted for the former.
42.  The above might raise another question: why do we have to load the dimension tables first, then the fact tables?
As we load the dimension tables, the (primary) keys are generated, and these keys are the foreign keys in the fact tables.
43.  How will you determine the sequence of jobs to load into data warehouse?  
First we execute the jobs that load the data into Dimension tables, then Fact tables, then load the Aggregator tables (if any).
44.  What are the command line functions that import and export the DS jobs?
dsimport.exe - imports the DataStage components; dsexport.exe - exports the DataStage components.
45.  What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Use the crontab utility along with a shell script that invokes dsjob (or the DSExecute routine) with the proper parameters passed.
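For example, a crontab entry (placeholder paths and times) that runs such a wrapper script every night at 01:00 and appends its output to a log:

    # m h dom mon dow  command
    0 1 * * * /home/dsadm/scripts/run_nightly_load.sh >> /home/dsadm/logs/nightly_load.log 2>&1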
46.  How would you call an external Java function that is not supported by DataStage?
Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.
47.  What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?
A. Under Windows: use the 'WaitForFileActivity' under the Sequencers and then run the job. Maybe you can schedule the sequencer around the time the file is expected to arrive. B. Under UNIX: poll for the file; once the file has arrived, start the job.
48.  Describe the string functions in DS.
Functions like [] (the substring function) and ':' (the concatenation operator). Syntax: string[ [start,] length ] or string[ delimiter, instance, repeats ].
49.  How did you connect with DB2 in your last project?
Most of the time the data was sent to us in the form of flat files; the data was dumped and sent to us. In some cases where we needed to connect to DB2 for lookups, we used ODBC drivers to connect to DB2 (or DB2-UDB), depending on the situation.
50.  What are Sequencers?  
Sequencers are job control programs that execute other jobs with preset Job parameters      
51.  How did you handle an 'Aborted' sequencer?
In almost all cases we have to delete the data it inserted from the DB manually, fix the job, and then run the job again.
52.  What other performance tunings have you done in your last project to increase the performance of slowly running jobs?
Staged the data coming from ODBC/OCI/DB2UDB stages (or any database on the server) using hash/sequential files, for optimum performance and also for data recovery in case a job aborts. Tuned the OCI stage's 'Array Size' and 'Rows per Transaction' numerical values.
53.  How did you handle reject data?
Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data like duplicates of primary keys or null rows.
54.  If worked with DS6.0 and latest versions what are Link-Partitioner and Link-Collector used for? 
Link Partitioner - Used for partitioning the data. Link Collector - Used for collecting the partitioned data.
55.  What are Routines and where/how are they written and have you written any routines before?     
Routines: Routines are stored in the Routines branch of the DataStage Repository, where you can create, view, or edit them using the Routine dialog box. The following program components are classified as routines:

Transform functions. These are functions that you can use when defining custom transforms. DataStage has a number of built-in transform functions, which are located in the Routines > Examples > Functions branch of the Repository. You can also define your own transform functions in the Routine dialog box.

Before/After subroutines. When designing a job, you can specify a subroutine to run before or after the job, or before or after an active stage. DataStage has a number of built-in before/after subroutines, which are located in the Routines > Built-in > Before/After branch in the Repository. You can also define your own before/after subroutines using the Routine dialog box.

Custom UniVerse functions. These are specialized BASIC functions that have been defined outside DataStage. Using the Routine dialog box, you can get DataStage to create a wrapper that enables you to call these functions from within DataStage. These functions are stored under the Routines branch in the Repository; you specify the category when you create the routine. If NLS is enabled, you should be aware of any mapping requirements when using custom UniVerse functions: if a function uses data in a particular character set, it is your responsibility to map the data to and from Unicode.

ActiveX (OLE) functions. You can use ActiveX (OLE) functions as programming components within DataStage. Such functions are made accessible to DataStage by importing them, which creates a wrapper that enables you to call the functions. After import, you can view and edit the BASIC wrapper using the Routine dialog box. By default, such functions are located in the Routines > Class name branch in the Repository, but you can specify your own category when importing the functions.

When using the Expression Editor, all of these components appear under the DS Routines… command on the Suggest Operand menu. A special case of routine is the job control routine. Such a routine is used to set up a DataStage job that controls other DataStage jobs. Job control routines are specified in the Job control page on the Job Properties dialog box; job control routines are not stored under the Routines branch in the Repository.

Transforms: Transforms are stored in the Transforms branch of the DataStage Repository, where you can create, view or edit them using the Transform dialog box. Transforms specify the type of data transformed, the type it is transformed into, and the expression that performs the transformation. DataStage is supplied with a number of built-in transforms (which you cannot edit). You can also define your own custom transforms, which are stored in the Repository and can be used by other DataStage jobs. When using the Expression Editor, the transforms appear under the DS Transform… command on the Suggest Operand menu.

Functions: Functions take arguments and return a value. The word "function" is applied to many components in DataStage. BASIC functions are one of the fundamental building blocks of the BASIC language; when using the Expression Editor, you can access the BASIC functions via the Function… command on the Suggest Operand menu. DataStage BASIC functions are special BASIC functions that are specific to DataStage and are mostly used in job control routines; DataStage functions begin with DS to distinguish them from general BASIC functions, and when using the Expression Editor you can access them via the DS Functions… command on the Suggest Operand menu. The following items, although called "functions", are classified as routines and are described above: transform functions, custom UniVerse functions, and ActiveX (OLE) functions. When using the Expression Editor, they all appear under the DS Routines… command on the Suggest Operand menu.

Expressions: An expression is an element of code that defines a value. The word "expression" is used both as a specific part of BASIC syntax and to describe portions of code that you can enter when defining a job. Areas of DataStage where you can use such expressions are: defining breakpoints in the debugger; defining column derivations, key expressions and constraints in Transformer stages; and defining a custom transform. In each of these cases the DataStage Expression Editor guides you as to what programming elements you can insert into the expression.
56.  What are the Oconv() and Iconv() functions and where are they used?
Iconv() converts a string to an internal storage format; Oconv() converts an expression to an output format.
57.  How did u connect to DB2 in your last project? 
Using DB2 ODBC drivers.
58.  Do you know about METASTAGE?
In simple terms, metadata is data about data. MetaStage is Ascential's metadata management tool; it can hold metadata about anything used by DS (datasets, sequential files, etc.).
59.  Do you know about the INTEGRITY/QUALITY stage?
INTEGRITY/QualityStage is a data-integration tool from Ascential which is used to standardize and integrate data from different sources.
60.  What versions of DS you worked with?  
DS 7.0.2/6.0/5.2
61.  What are Static Hash files and Dynamic Hash files?
As the names themselves suggest what they mean: in general we use Type 30 dynamic hash files. The data file has a default size limit of 2 GB, and the overflow file is used if the data exceeds the 2 GB size.
62.  What is Hash file stage and what is it used for?   
Used for Look-ups. It is like a reference table. It is also used in-place of ODBC, OCI tables for better performance. 
63.  Have you ever been involved in updating the DS versions like DS 5.X? If so, tell us some of the steps you took in doing so.
Yes. The following are some of the steps I took: 1) Definitely take a backup of the whole project(s) by exporting the project as a .dsx file. 2) See that you are using the same parent folder for the new version as for your old jobs.
64.  Did you parameterize the job or hard-code the values in the jobs?
Always parameterize the job. Either the values come from Job Properties or from a 'Parameter Manager', a third-party tool. There is no way you should hard-code parameters in your jobs. The often-parameterized variables in a job are the DB DSN name, username, password, and dates.
65.  Tell me the environment in your last projects 
Give the OS of the Server and the OS of the Client of your recent most project
66.  How many jobs have you created in your last project?
100+ jobs every 6 months if you are in development; if you are in testing, around 40 jobs every 6 months, although it need not be the same number for everybody.
67.  What are the often-used stages or stages you worked with in your last project?
Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash, Aggregator, Sort.
68.  What are XML files and how do you read data from XML files, and what stage is to be used?
In the palette, under Real Time, there are stages such as XML Input, XML Output and XML Transformer.
70.  How do you pass the parameter to the job sequence if the job is running at night?
Two ways: 1. Set the default values of the parameters in the job sequencer and map these parameters to the job. 2. Run the job in the sequencer using the dsjob utility, where we can specify the values to be taken for each parameter.
71.  What happens if the job fails at night?
The job sequence aborts.
72.  What is SQL tuning? How do you do it?
SQL tuning is done in the database, for example by using optimizer hints.
73.  How do you track performance statistics and enhance them?
Through the Director's Monitor we can view the performance statistics.
75.  What are the difficulties faced in using DataStage? Or what are the constraints in using DataStage?
1) When the number of lookups is large. 2) Handling the situation where a job aborts partway through loading data for some reason.
76.  Differentiate database data and data warehouse data.
Data in a database is a) detailed or transactional, b) both readable and writable, and c) current. Data in a data warehouse, by contrast, is detailed as well as summarized, read-only (loaded in batch), and historical.
77.  Dimension Modeling types along with their significance 
Data Modeling is broadly classified into 2 types. a) E-R Diagrams (Entity - Relationships). b) Dimensional Modeling
78.  What is the flow of loading data into fact & dimension tables?
Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values. Dimension table - a table with a unique primary key. Load - data should be loaded into the dimension tables first, then into the fact table.
79.  Orchestrate vs. DataStage Parallel Extender?
Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate and integrated it with DataStage XE, releasing a new version, DataStage 6.0, i.e. Parallel Extender.
80.  Differentiate Primary Key and Partition Key.
A primary key is a combination of unique and not null; it can be a collection of key values, called a composite primary key. A partition key is just a part of the primary key. There are several partitioning methods such as Hash, DB2, Random etc.; while using hash partitioning we specify the partition key.
81.  How do you execute a DataStage job from the command line prompt?
Using the "dsjob" command, as follows: dsjob -run -jobstatus projectname jobname
82.  What are Stage Variables, Derivations and Constraints?
Stage variable - an intermediate processing variable that retains its value during the read and doesn't pass the value into a target column. Derivation - an expression that specifies the value to be passed on to the target column. Constraint - a condition that evaluates to true or false and controls whether a row flows down an output link.
83.  What is the default cache size? How do you change the cache size if needed?
The default cache size is 256 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab, and specifying the cache size there.
84.  Containers: Usage and Types?  
Container is a collection of stages used for the purpose of Reusability. There are 2 types of Containers. a) Local Container: Job Specific b) Shared Container: Used in any job within a project. 
85.  Compare and contrast ODBC and Plug-In stages.
ODBC: a) poor performance, b) can be used for a variety of databases, c) can handle stored procedures. Plug-in: a) good performance, b) database-specific (only one database), c) cannot handle stored procedures.
86.  How do you run a shell script within the scope of a DataStage job?
By using the "ExecSH" command in the Before/After job properties.
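For example, in the job properties you select ExecSH as the before-job (or after-job) subroutine and supply the command line as its input value; a hypothetical input value might be:

    sh /home/dsadm/scripts/prepare_input.sh /data/in/source.txt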
87.  Types of Parallel Processing?  
Parallel Processing is broadly classified into 2 types. a) SMP - Symmetrical Multi Processing. b) MPP - Massive Parallel Processing. 
88.  What does a Config File in parallel extender consist of?  
Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location. 
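A minimal sketch of a one-node configuration file (the node name, fastname and paths are placeholders; real files typically define one such entry per processing node):

    {
      node "node1" {
        fastname "etlserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
      }
    }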
89.  Functionality of Link Partitioner and Link Collector?
Link Partitioner: it splits data into various partitions or data flows using various partitioning methods. Link Collector: it collects the data coming from the partitions, merges it into a single data flow and loads it to the target.
90.  What are Modulus and Splitting in a Dynamic Hashed File?
In a dynamic hashed file the size of the file changes as data is added or deleted. The modulus is the current number of groups in the file; when the data grows past the split threshold, a group is split into two (splitting) and the modulus increases, and when data shrinks, groups are merged and the modulus decreases.
91.  Types of views in DataStage Director?
There are 3 types of views in DataStage Director: a) Job view - dates of jobs compiled; b) Log view - status of the job's last run; c) Status view - warning messages, event messages, and program-generated messages.


