Professional Documents
Culture Documents
syntax
record { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] }
(fieldname_1 : fieldtype { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] };
...
fieldname_n : fieldtype { prop1 [= arg1 ], prop2 [= arg2 ], ... propn [= argn ] };)
example
osh import -file inputFile.data
-schema record {record_length = fixed, delim = none, binary}
(
name:string[80] {padchar = ' '};
age:int8 {default = 127};
income:dfloat {text, width = 8};
)
> outputDS.ds
Field Names
A field name may include single-byte and unicode characters. Its character-set encoding is
determined by the top-level osh option -input_charset. It can contain only alphanumeric
characters, as defined by the Unicode standard, and underscore characters. It must start
with a letter or an underscore character. It is case insensitive and can be of any length.
Forms
Explanation
Default Value
integer
0
0
floating point
sfloat
dfloat
single-precision floating-point
value
double-precision floating-point
value
0.0
0.0
Forms
Explanation
Default Value
string
for processing
single-byte
character data
(UTF-8)
string
string[max=n_codepoint_units]
string[n_codepoint_units]
string[ ...,
padchar = ASCII_value |
'ASCII_char' | null ]
variable-length string
variable-length with upper
bound
fixed-length string
optional padding character
specification
length = 0
length = 0
ustring
ustring[max=n_codepoint_units]
decimal
date/time
null (0x00)
characters or the
padchar, if
specified
length = 0
length = 0
decimal[precision]
decimal[precision, scale]
scale = 0
date
time
time[microseconds]
timestamp
timestamp[microseconds]
date
time with second resolution
time with microsecond resolution
date/time with second resolution
date/time with microsecond
resolution
January 1, 0001
00:00:00
00:00:00
00:00:00 on
January 1, 0001
ustring[n_codepoint_units]
ustring[ ...,
padchar = unicode_value |
'unicode_char' | null ]
null (0x0000)
characters or the
padchar, if
specified
raw
raw[max=n]
raw[n]
raw[ ..., align=n]
variable-length
variable-length with upper
bound
fixed length
align on n-byte boundary where
n = 1, 2, or 4 (optional)
length = 0
length = 0
bytes = 0
Vector Fields
Explanation
fieldname[length] : fieldtype
fixed-length vector
fieldname[] :
fieldtype { vector_prefix=n }
The link and reference syntax, where the length of the vector is given
by the value of the attached length-field, can only be specified for the
import and export operators.
Nullable Fields
Nullable Field Syntax
Explanation
nullable field
fieldname[] :
fieldtype { vector_prefix=n }
Subrecord Fields
Explanation
fieldname : subrec
(subfieldname1:fieldtype;
...
subfieldnamen:fieldtype;
);
fieldname.subfieldname
A subrec has its own naming scope. For example, fieldname and
subfieldname1 can have the same name.
A subrecord field can be placed inside a tagged field.
fieldname[n] : subrec
(subfieldname1:fieldtype;
...
subfieldnamen:fieldtype;
);
fieldname : subrec
(subfieldname1:fieldtype;
...
subfieldname2:subrec
(subfieldname2a:fieldtype;
fieldname.subfieldname2.subfieldname2b
...
);
subfieldname2b:fieldtype;);
Tagged Fields
Explanation
fieldname : tagged
(tagfield1:fieldtype;
The value of fieldname can be the data type of any one of the
tagfields. A tagfield can be any type except tagged or subrec.
...
tagfieldn:fieldtype;
);
link-fieldname : uint32 { link } ;
tag-fieldname : tagged
{ reference = link-fieldname }
(
fieldname 0: fieldtype
{ tagcase = 0 };
fieldname1 : fieldtype
{ tagcase = 1 };
...
fieldnamen : fieldtype
{ tagcase = n };
);
Explanation
Intact Properties
This property lets you improve performance by limiting the amount of processing done on each
record of the data. Only fields explicitly specified by the schema are processed. The example below
contains 80 characters of data, a character newline delimiter, and references only two fields:
record { intact, record_length = 81, record_delim='\n' }
( int : int32 { position=12, delim=,, text };
str : string[20] { position=40, delim=none };
)
intact
check_intact
intact=recordname
Length Properties
record_length = length
fixed-length records
record_length = fixed
Delimiter Properties
record_delim = character | null
record_delim_string = "string"
delim_string
Explanation
final_delim_string = 'string'
Prefix Property
record_prefix = n
Format Property
record_format=
{ type = implicit | varying
[, format = V, VB, VS, VBS, | VR] }
final_delim_string = 'string'
The general import/export field properties apply to all field types, except where noted.
Explanation
binary
text
, ascii | ebcdic
, import_ebcdic_as_ascii
, export_ebcdic_as_ascii
charset = charset
prefix = n
delim_string = "string"
delim =
ws | end | none |null |'x'
print_field
default = value
drop
generate
nbytes is the offset of the field in the record where the first
byte = 0; fieldname uses the starting offset of the field
Explanation
skip = nbytes
link
reference = linkfieldname
Explanation
width = nbytes |
max_width = nbytes
in_format = "format"
out_format = "format"
c_format = "format"
big_endian | little_endian |
native_endian
width = nbytes |
max_width = nbytes
import_ascii_as_ebcdic
Decimal Fields
R
binary
decimal_separator =
ASCII_character
width = nbytes |
max_width = nbytes
precision = digits
scale = digits
nofix_zero | fix_zero
Explanation
packed
, check | nocheck
, signed | unsigned
Date Fields
big_endian | little_endian |
native_endian
binary
charset charset
date_format =
"%yyyy-%mm-%dd"
default_date_format =
"String%yyyyString%mm
String&dd"
days_since =
"%yyyy-%mm-%dd"
julian
Time Fields
R
time_format = "%hh:%nn:%ss"
default_time_format =
String %hhString %nnString %ss
big_endian | little_endian |
native_endian
binary
equivalent to midnight_seconds
midnight_seconds
Explanation
Timestamp Fields
R
big_endian | little_endian |
native_endian
timestamp_format =
"%yyyy-%mm-%dd%hh:%nn:%ss"
| "String%yyyyString%mm
String&ddString%hhString %nn
String%ss
Vector Fields
vector_prefix = n
Nullable Fields
null_field = "string"
null_length = value
,actual_length = nbytes
Explanation
numeric
(also decimal, date,
time, timestamp)
date
epoch = 'date'
10
invalids = percentage
function = rundate
Explanation
decimal
zeros = percentage
invalids = percentage
raw
no options available
string
ustring
time
timestamp
nullable
alphabet =
'alpha_numeric_ustring '
scale = factor
invalids = percentage
epoch = 'date'
scale = factor
invalids = percentage
nulls = percentage
nullseed = number
11
12