ORCHESTRATE 7 RecordSchemaSyntax 20060302

Orchestrate 7.
Record Schema Syntax
syntax
record { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] }
(fieldname_1 : fieldtype { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] };
...
fieldname_n : fieldtype { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] };)
example
osh import -file inputFile.data
-schema record {record_length = fixed, delim = none, binary}
(
name:string[80] {padchar = ' '};
age:int8 {default = 127};
income:dfloat {text, width = 8};
)
> outputDS.ds
General Operator Record Schema

The syntax in this section applies to all Orchestrate operators except for the properties
enclosed in curly braces which apply only to the import and export operators.
See the sectionsImport/Export Operator Record Schema on page 5 and Generator
Operator Field Properties on page 10 for syntax specific to those operators.
Field Names
A field name may include single-byte and unicode characters. Its character-set encoding is
determined by the top-level osh option -input_charset. It can contain only alphanumeric
characters, as defined by the Unicode standard, and underscore characters. It must start
with a letter or an underscore character. It is case insensitive and can be of any length.
Field Data Types

Field Data
Type
Forms
Explanation
Default Value
integer
int8, int16, int32, int64

uint8, uint16, uint32, uint64
1, 2, 4, and 8-byte signed integers

1, 2, 4, and 8-byte unsigned
integers
0
0
floating point
sfloat
dfloat
single-precision floating-point
value
double-precision floating-point
value
0.0
0.0
Orchestrate 7.5 Record Schema Syntax
Field Data Types (continued)

Field Data
Type
Forms
Explanation
Default Value
string
for processing
single-byte
character data
(UTF-8)
string
string[max=n_codepoint_units]
string[n_codepoint_units]
string[ ...,
padchar = ASCII_value |
'ASCII_char' | null ]
variable-length string
variable-length with upper
bound
fixed-length string
optional padding character
specification
length = 0
length = 0
A codepoint is always equal to a

byte.
ustring
for processing
double-byte
unicode
character data
(UTF-16)
ustring
ustring[max=n_codepoint_units]
decimal
date/time
null (0x00)
characters or the
padchar, if
specified
variable-length unicode string

variable-length unicode string
with upper bound
fixed-length unicode string with
optional unicode padding
character specification
length = 0
length = 0
decimal[precision]
decimal[precision, scale]
precision is the total number of

digits in the decimal; scale is the
number of digits to the right of
the decimal point.
precision: 1 - 255 digits
0 <= scale < precision
scale = 0
date
time
time[microseconds]
timestamp
timestamp[microseconds]
date
time with second resolution
time with microsecond resolution
date/time with second resolution
date/time with microsecond
resolution
January 1, 0001
00:00:00
00:00:00
00:00:00 on
January 1, 0001
ustring[n_codepoint_units]
ustring[ ...,
padchar = unicode_value |
'unicode_char' | null ]
null (0x0000)
characters or the
padchar, if
specified
Unicode characters can be

included in the string
components of uformat syntax:
String%macroString %macroString
%macroString
raw
raw
raw[max=n]
raw[n]
raw[ ..., align=n]
variable-length
variable-length with upper
bound
fixed length
align on n-byte boundary where
n = 1, 2, or 4 (optional)
length = 0
length = 0
bytes = 0
Vector Fields
Vector fields cannot be tagged.

Vector Field Syntax
Explanation
fieldname[length] : fieldtype
fixed-length vector
fieldname[] :
fieldtype { vector_prefix=n }
variable length vector; use zero-based indexing to reference an

element
The vector_prefix syntax, where n = 1, 2, or 4,
can only be specified for the import and export operators.
length-field : uint32 { link };

fieldname[] : fieldtype
{ reference=length-field };
The link and reference syntax, where the length of the vector is given
by the value of the attached length-field, can only be specified for the
import and export operators.
Nullable Fields
Nullable Field Syntax
Explanation
fieldname : nullable fieldtype
nullable field
fieldname[] :
fieldtype { vector_prefix=n }
vector of nullable values

The vector_prefix syntax, where n = 1, 2, or 4,
can only be specified for the import and export operators.
Subrecord Fields
Subrecord fields are not nullable.

Subrecord Field Syntax
Explanation
fieldname : subrec
(subfieldname1:fieldtype;
Use dot addressing to reference a subrecord:
...
subfieldnamen:fieldtype;
);
fieldname.subfieldname
A subrec has its own naming scope. For example, fieldname and
subfieldname1 can have the same name.
A subrecord field can be placed inside a tagged field.
fieldname[n] : subrec
...
subfieldnamen:fieldtype;
);
fieldname : subrec
A vector of subrecords. Use zero-based indexing to reference a

vector element and dot addressing to reference a subrecord:
fieldname[n].subfieldname
Nested subrecords to any depth. To reference a subrecord, use dot
addressing. For example:
...
subfieldname2:subrec
(subfieldname2a:fieldtype;
fieldname.subfieldname2.subfieldname2b
...
);
subfieldname2b:fieldtype;);
Tagged Fields
Tagged fields are not nullable.

Tagged Field Syntax
Explanation
fieldname : tagged
(tagfield1:fieldtype;
The value of fieldname can be the data type of any one of the
tagfields. A tagfield can be any type except tagged or subrec.
...
tagfieldn:fieldtype;
);
link-fieldname : uint32 { link } ;
tag-fieldname : tagged
{ reference = link-fieldname }
(
fieldname 0: fieldtype
{ tagcase = 0 };
fieldname1 : fieldtype
{ tagcase = 1 };
...
fieldnamen : fieldtype
{ tagcase = n };
);
Use dot addressing to reference a tagged field: fieldname.tagfield.

A tagged field cannot be a vector. A subrecord field can be placed
inside a tagged field.
The link and reference syntax, where the length of the vector is
given by the value of the attached length-field, can only be specified
for the import and export operators. The attached link-fieldname
supplies a tagcase value for tag-fieldname. For example, if the value
of link-field is 1, the data type for tag-fieldname is the data type of
fieldname1.
Import/Export Operator Record Schema

In record schemas used for the import or export of data, you can use the record- and fieldlevel properties to specify how the data is read or written.
The default schema format for input specifies that each record is a single newlinedelimited string:
record { text, record_delim='\n' } ( rec: string; )
The default schema for export specifies that the fields are space-separated, with the record
terminated by a newline character:
record { text, delim=' ', record_delim='\n' } ()
Import/Export Record Properties

Import/Export Record
Properties
Explanation
Intact Properties
This property lets you improve performance by limiting the amount of processing done on each
record of the data. Only fields explicitly specified by the schema are processed. The example below
contains 80 characters of data, a character newline delimiter, and references only two fields:
record { intact, record_length = 81, record_delim='\n' }
( int : int32 { position=12, delim=,, text };
str : string[20] { position=40, delim=none };
)
intact
intact record, with no import field checking (faster)
check_intact
used only with intact, to enable import field checking

(slower)
intact=recordname
optionally specify an identifier for intact schema
Length Properties
record_length = length
fixed-length records
record_length = fixed
field lengths determine the record length
Delimiter Properties
record_delim = character | null
specifies a single ASCII or multi-byte Unicode character or

null (0x00) to delimit a record; the default is the newline
character
record_delim_string = "string"
specifies an ASCII or multi-byte Unicode string to delimit a

record
delim = ws | end | none |null |'x'
specifies the trailing delimiter of all fields in the record
delim_string
specifies a string of one or more ASCII or multi-byte Unicode

characters forming a trailing delimiter
Import/Export Record Properties (continued)

Import/Export Record
Properties
Explanation
Delimiter Properties (continued)

final_delim = ws | end | none|
null | 'x'
unique deliminter for the last field; precedes record_delim, if

any; the options are white space, end-of-record, no delimiter
(use field length), null (0x00), or a single-byte or unicode
character; the default is end-of-record.
final_delim_string = 'string'
single-byte or unicode string delimiter for the last field
Prefix Property
record_prefix = n
Specifies that a variable-length record is prefixed by a 1-, 2-,

or 4-byte length prefix.
Format Property
record_format=
{ type = implicit | varying
[, format = V, VB, VS, VBS, | VR] }
specifies that data consists of variable-length blocked records;

implicit denotes records with no explicit record boundaries;
the varying values are IBM blocked or spanned formats.
Record-Level Field Properties

These properties control the processing of the fields in the record. An example is a record with three
comma-separated fields. The last field is delimited by the end of the record:
record {delim=',', final_delim=end } (a: int32; b:int32; c:int32; )
fill = bytevalue | 'x' | 'string'
on export, fill the gaps between positioned fields with a byte

value from 0 - 255 or a character or string in single quotes;
ignored on import
final_delim = ws | end | none|

null | 'x'
unique deliminter for the last field; precedes record_delim, if

any; the options are white space, end-of-record, no delimiter
(use field length), null (0x00), or a single-byte or unicode
character; the default is end-of-record.
final_delim_string = 'string'
single-byte or unicode string delimiter for the last field
Import/Export Field Properties

R: A field property that has an R in its left-most column can also be used as a record
property to specify a default property for all the fields in a record. A field-level property
overrides a record-level property.
The general import/export field properties apply to all field types, except where noted.
General Import/Export Field Properties
General Import/Export Field

Property
Explanation
binary
binary format for data values

Binary format cannot be used with string or ustring fields.
text
, ascii | ebcdic
, import_ebcdic_as_ascii
, export_ebcdic_as_ascii
text representation of values

optionally specify character set
optionally convert character set on import
optionally convert character set on export
Text format cannot be used with raw fields or with ustring
fields. The charset property below offers similar functionality
for ustring fields.
charset = charset
At the field level, it defines the character set to be used for

ustring fields; at the record level it applies to all ustring fields
in the record and the other import/export properties that
support multi-byte Unicode character data.
prefix = n
n-byte length prefix where n = 1, 2, or 4
delim_string = "string"
a single-byte string or unicode string delimiter for the field
delim =
ws | end | none |null |'x'
delimiter for field

ws = whitespace which is the default
end = end of record
none = no delimiter (use field length)
null = 0x00
'x' = specified single-byte or unicode character
quote = single | double | 'x'
the field is enclosed by single-byte or unicode quote

characters
Quotes are removed on import, and added on export.
print_field
print each imported field value; intended for debugging
default = value
default value to be used if field import or export fails; may

contain unicode characters for ustring fields
drop
drop field on import; ignored on export
generate
set field to default value on export; ignored on import
position = nbytes | fieldname
nbytes is the offset of the field in the record where the first
byte = 0; fieldname uses the starting offset of the field
General Import/Export Field Properties (continued)

General Import/Export Field
Property
Explanation
skip = nbytes
skip nbytes after preceding field; can be negative to skip

backward
link
reference = linkfieldname
mark the field an attached field that supplies a value to the

reference field; used for vector and tagged fields
Type-Specific Import/Export Field Properties

Type-Specific Import/Export
Field Property
Explanation
Integer and Floating-Point Fields

R
width = nbytes |
max_width = nbytes
fixed or maximum text-format length
in_format = "format"
out_format = "format"
c_format = "format"
read and write using a C-style format string. For

example: c_format="%3d"
big_endian | little_endian |
native_endian
byte ordering; the default is native_endian
padchar = 'x' | null
single-byte or unicode pad character for fixed-length

fields (null = 0x00)
String and ustring Fields

R
width = nbytes |
max_width = nbytes
fixed or maximum length
padchar = 'x' | null
single-byte or unicode pad character for fixed-length

fields (null = 0x00)
import_ascii_as_ebcdic
Translate a string (not ustring) field from ASCII to

EBCDIC at import
Decimal Fields
R
binary
for decimals, binary is packed
decimal_separator =
ASCII_character
separates the integer and fraction components of a

decimal
width = nbytes |
max_width = nbytes
fixed or maximum length
precision = digits
the number of digits (1 - 255)
scale = digits
number of decimal digits (0 precision)
nofix_zero | fix_zero
disallow (the default) or allow all-zero values
Type-Specific Import/Export Field Properties (continued)

Field Property
Explanation
Decimal Fields (continued)

overpunch
specifies that the field has a leading or ending byte that

contains a character which specifies both the numeric
value of that byte and whether the number as a whole is
negatively or positively signed
packed
, check | nocheck
, signed | unsigned
packed format; the defaults are check and signed
separate, leading | trailing
separate sign byte; the default is leading
zoned, leading | trailing
sign nibble; the default is trailing
round = ceil | floor | trunc_zero |

round_inf
rounding type, where round_inf means the nearest

value and trunc_zero is the default value
Date Fields
native_endian
binary
a synonym for the julian property
charset charset
a character set for the string components in the String

components of the default_date_format property below
date_format =
"%yyyy-%mm-%dd"
date format string
default_date_format =
"String%yyyyString%mm
String&dd"
Strings provide support for international unicode

components
days_since =
"%yyyy-%mm-%dd"
days since date (binary format)
julian
Julian day number (binary format)
Time Fields
R
time_format = "%hh:%nn:%ss"
time format string
default_time_format =
String %hhString %nnString %ss

components
native_endian
binary
equivalent to midnight_seconds
midnight_seconds
seconds since midnight (binary format)
Generator Operator Field Properties
Type-Specific Import/Export Field Properties (continued)

Field Property
Explanation
Timestamp Fields
R
native_endian
timestamp_format =
"%yyyy-%mm-%dd%hh:%nn:%ss"
| "String%yyyyString%mm
String&ddString%hhString %nn
String%ss
timestamp format string

components
Vector Fields
vector_prefix = n
n-byte vector length prefix where n = 1, 2, or 4
Nullable Fields
null_field = "string"
single-byte character string or unicode string

representation of a null value. An example is:
null_field = "NULL\0\0\0"
null_length = value
,actual_length = nbytes
field length indicating a null

actual length to skip for a null (optional)

The generator operator chapter of the Orchestrate 7.5 Operators Reference contains
more details about these properties.
generator Operator Field Properties

Data Type
Generator Field Properties
Explanation
numeric
(also decimal, date,
time, timestamp)
cycle = {init = init_val ,

incr = incr_val , limit = limit_val }
|
random = {limit = limit_val ,
seed = seed_val , signed}
generate a repeating pattern of

values
date
epoch = 'date'
set the earliest date for the field;

use yyyy-mm-dd format
10
generate random values
invalids = percentage
percentage of fields with invalid

values
function = rundate
set the field to the current date
generator Operator Field Properties (continued)

Data Type
Generator Field Properties
Explanation
decimal
zeros = percentage
percentage of fields set to 0x00
percentage of fields set to 0xFF

(invalid)
raw
no options available
string
cycle = {value = 'string_1',

value = 'string_2 ', ... }
list of string values
alphabet = 'alpha_numeric_string '
sets all charaters to successive

characters in the string
s: string[3]
{alphabet = 'abc'};
produces aaa bbb ccc aaa ...
ustring
time
timestamp
nullable
cycle = {value = 'ustring_1',

value = 'ustring_2 ', ... }
list of ustring values
alphabet =
'alpha_numeric_ustring '
sets all charaters to successive

characters in the string
s: ustring[3]
{alphabet = 'abc'};
produces aaa bbb ccc aaa ...
scale = factor
specify a multiplier to the

increment value
percentage of invalid field values
epoch = 'date'
set the earliest date for the field;

use yyyy-mm-dd format
scale = factor
specify a multiplier to the

increment value
percentage of fields with invalid

values
nulls = percentage
percentage of fields set to null
nullseed = number
set the seed for the random

number generator used to
determine whether a field is null
11
Ascential Software Corporation

50 Washington Street
Westboro, MA 01581-1021
508 366-3888
508 389-8955 FAX
If you require assistance or have questions about Orchestrate, you can

contact Ascential Customer Support by:
- Calling (866) INFONOW
- Writing support@ascential.com for any Ascential Software product or
orch-support@ascential.com for Orchestrate-specific help.
- Logging onto the Ascential Support eService Web site at:
www.ascential.com:8080/eservice/index.html
For current information about Ascential and its products log onto:
http://www.ascentialsoftware.com/
2004, 1995-2004 Ascential Software Corporation. All rights reserved.
ASCENTIAL, Orchestrate and Orchestrate Hybrid Neural Network,
DataStage Parallel Extender are trademarks of Ascential Software
Corporation or its affiliates and may be registered in the United States or
other jurisdictions. UNIX is a registered trademark of the Open Group.
Other marks are the property of the owners of those marks.
This product may contain or utilize third party components subject to the
user documentation previously provided by Ascential Software
Corporation or contained herein.
12

ORCHESTRATE 7 RecordSchemaSyntax 20060302

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ORCHESTRATE 7 RecordSchemaSyntax 20060302

Uploaded by

Copyright:

Available Formats

Orchestrate 7.

Record Schema Syntax

General Operator Record Schema

Field Data Types

int8, int16, int32, int64

1, 2, 4, and 8-byte signed integers

Orchestrate 7.5 Record Schema Syntax

General Operator Record Schema

Field Data Types (continued)

A codepoint is always equal to a

variable-length unicode string

precision is the total number of

Unicode characters can be

Orchestrate 7.5 Record Schema Syntax

General Operator Record Schema

Vector fields cannot be tagged.

variable length vector; use zero-based indexing to reference an

length-field : uint32 { link };

fieldname : nullable fieldtype

vector of nullable values

Subrecord fields are not nullable.

Use dot addressing to reference a subrecord:

A vector of subrecords. Use zero-based indexing to reference a

Orchestrate 7.5 Record Schema Syntax

General Operator Record Schema

Tagged fields are not nullable.

Use dot addressing to reference a tagged field: fieldname.tagfield.

Orchestrate 7.5 Record Schema Syntax

Import/Export Operator Record Schema

Import/Export Operator Record Schema

Import/Export Record Properties

intact record, with no import field checking (faster)

used only with intact, to enable import field checking

optionally specify an identifier for intact schema

field lengths determine the record length

specifies a single ASCII or multi-byte Unicode character or

specifies an ASCII or multi-byte Unicode string to delimit a

delim = ws | end | none |null |'x'

specifies the trailing delimiter of all fields in the record

specifies a string of one or more ASCII or multi-byte Unicode

Orchestrate 7.5 Record Schema Syntax

Import/Export Operator Record Schema

Import/Export Record Properties (continued)

Delimiter Properties (continued)

unique deliminter for the last field; precedes record_delim, if

single-byte or unicode string delimiter for the last field

Specifies that a variable-length record is prefixed by a 1-, 2-,

specifies that data consists of variable-length blocked records;

Record-Level Field Properties

on export, fill the gaps between positioned fields with a byte

final_delim = ws | end | none|

unique deliminter for the last field; precedes record_delim, if

single-byte or unicode string delimiter for the last field

Orchestrate 7.5 Record Schema Syntax

Import/Export Field Properties

Import/Export Field Properties

General Import/Export Field Properties

General Import/Export Field

binary format for data values

text representation of values

At the field level, it defines the character set to be used for

n-byte length prefix where n = 1, 2, or 4

a single-byte string or unicode string delimiter for the field