
Coding Tips for High Quality of Results

Module 6

Jan 31, 2011

Module Objective
Your objective:
To code your design for optimal Quality of Results

Topics:
Hardcoding compiler optimizations
Controlling expression size and dynamics
Facilitating scheduler optimizations
Current C-to-Silicon known problems and solutions
Miscellaneous issues

01/31/2011

SystemC Synthesis using C-to-Silicon Compiler

6-2

General Compiler Optimizations


Compilers automatically perform some of these optimizations:
Move loop-invariant code out of loop statements
Reduce operation strength

The tool may perform such optimizations, but you can guarantee them by coding them yourself!


Move Invariant Expressions out of the Loop


Move loop-invariant calculations to before or after the loop.

Loop-invariant expression inside the loop (schedules unnecessary operations):
for (i=0; i<10; i++) {
  max = a > b ? a : b;
  c[i] = max * b;
}

Loop-invariant expression moved out of the loop:
max = a > b ? a : b;
for (i=0; i<10; i++) {
  c[i] = max * b;
}

Invariant selection inside the loop (the unrolled loop generates a 4 x 320 mux):
int sum[10];
for (i=0; i<10; i++) {
  if (cond1) sum[i] += input1;
  else if (cond2) sum[i] += input2;
  else if (cond3) sum[i] += input3;
  else sum[i] = sum[i];
}

Invariant selection moved out of the loop (generates a 4 x 32 mux):
int sum[10], temp;
if (cond1) temp = input1;
else if (cond2) temp = input2;
else if (cond3) temp = input3;
else temp = 0;
for (i=0; i<10; i++)
  sum[i] += temp;
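Hoisting is only safe if the hoisted version computes the same result. The sketch below is a plain C++ model (no SystemC) of the sum[] example; the function names accumulate_inside and accumulate_hoisted are hypothetical. It checks that selecting the addend once before the loop matches re-selecting it on every iteration.

```cpp
#include <cassert>

// Before: the same 4-way selection is evaluated in every iteration, so an
// unrolled loop replicates the selection logic for each sum[i].
void accumulate_inside(bool cond1, bool cond2, bool cond3,
                       int input1, int input2, int input3, int sum[10]) {
  for (int i = 0; i < 10; i++) {
    if (cond1) sum[i] += input1;
    else if (cond2) sum[i] += input2;
    else if (cond3) sum[i] += input3;
    // else: sum[i] is unchanged (the slide's sum[i] = sum[i];)
  }
}

// After: the loop-invariant selection is made once into temp, and the loop
// only performs the additions.
void accumulate_hoisted(bool cond1, bool cond2, bool cond3,
                        int input1, int input2, int input3, int sum[10]) {
  int temp;
  if (cond1) temp = input1;
  else if (cond2) temp = input2;
  else if (cond3) temp = input3;
  else temp = 0;
  for (int i = 0; i < 10; i++) sum[i] += temp;
}
```

Adding 0 in the fall-through case is what lets the hoisted loop drop the per-iteration branch entirely.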

Reduce Operation Strength


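Convert multiplication and division by constants to shift operations to the extent practical.

Before (synthesis infers operations at least 6 bits wide, since the constant 48 needs 6 bits):
a = b * 48;
c = b / 48;

After (operation strength reduced; 48 = 16 * 3, so only the constant 3 remains):
a = (b << 4) * 3;
c = (b >> 4) / 3;

The division rewrite is exact when b is unsigned. For a negative signed b, the right shift rounds toward negative infinity while the division truncates toward zero, so the two forms can differ.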
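A quick software check of the strength-reduced forms, written in plain C++ (the function names are hypothetical). It uses unsigned 32-bit operands because the division rewrite is only exact for non-negative values.

```cpp
#include <cassert>
#include <cstdint>

// Original form: multiply and divide by the constant 48.
std::uint32_t mul48(std::uint32_t b) { return b * 48u; }
std::uint32_t div48(std::uint32_t b) { return b / 48u; }

// Strength-reduced form: the power-of-two factor 16 becomes a fixed shift,
// leaving only the small constant 3.
std::uint32_t mul48_reduced(std::uint32_t b) { return (b << 4) * 3u; }
std::uint32_t div48_reduced(std::uint32_t b) { return (b >> 4) / 3u; }
```

Both forms wrap modulo 2^32 for multiplication, and nested floor division by 16 and then 3 equals floor division by 48 for unsigned values, so they agree across the whole range.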


Controlling Expression Size and Dynamics


Synthesis tools automatically perform some of these optimizations:
Explicitly specify constant expressions
Explicitly size state variables
Explicitly size expressions
Control variable dynamics
Pad array inner dimensions to powers of 2

The tool may perform such optimizations, but you can guarantee them by coding them yourself!


Explicitly Specify Constants


Synthesis cannot always statically determine your design intent. Explicitly declare constants to clarify your design intent.

This code infers a barrel shifter:
sc_int<40> a, b;
sc_int<4> c;
c = 10;
...
a = b >> c;

This code infers a constant shift (wires):
sc_int<40> a, b;
const sc_int<4> c = 10;
...
a = b >> c;

Use a constant
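In plain C++, `constexpr` plays a role similar to `const sc_int<4> c = 10`: it makes the shift amount a compile-time fact. This is a minimal sketch under the assumption that sc_int<40> values can be modeled as std::int64_t; the names kShift, shift_variable, and shift_constant are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: sc_int<40> values are modeled as std::int64_t (no SystemC here).

// Shift amount known only at run time: hardware needs a barrel shifter.
std::int64_t shift_variable(std::int64_t b, int c) { return b >> c; }

// Shift amount fixed at compile time: the shift reduces to fixed wiring.
constexpr int kShift = 10;
constexpr std::int64_t shift_constant(std::int64_t b) { return b >> kShift; }

// Because the amount is a constant expression, the shift can even be
// evaluated entirely at compile time.
static_assert(shift_constant(std::int64_t{1} << 20) == (std::int64_t{1} << 10),
              "constant shift evaluated at compile time");
```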


Explicitly Size State Variables


Synthesis cannot always statically determine your design intent. Explicitly size state variables to clarify your design intent.

Synthesis infers a 32-bit counter:
int counter = 0;
...
counter++;
if (counter == 25) counter = 0;

Synthesis infers a 5-bit counter:
sc_uint<5> counter = 0;
...
counter++;
if (counter == 25) counter = 0;

Size the variable
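A modulo-25 counter has 25 states, so 5 bits (0 to 31) are enough. As an illustration only, a 5-bit unsigned bit-field stands in for sc_uint<5> in this plain C++ sketch; the struct name Mod25Counter is hypothetical.

```cpp
#include <cassert>

// Plain C++ analog of sc_uint<5>: a 5-bit unsigned bit-field (holds 0..31).
struct Mod25Counter {
  unsigned value : 5;
  Mod25Counter() : value(0) {}
  // Same update as the slide: increment, then wrap to 0 on reaching 25.
  void tick() {
    value++;
    if (value == 25) value = 0;
  }
};
```

The largest value ever stored is 25 (immediately reset to 0), which fits in the 5-bit field.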


Explicitly Size Expressions


Synthesis cannot always statically determine your design intent. Explicitly size expressions to clarify your design intent.

Synthesis infers a 64-bit comparator:
sc_uint<4> a, b;
...
if ((a-1) > b) ...
Synthesis assumes the maximum width for the literal, i.e. long long (1LL).

Synthesis infers a 4-bit comparator:
sc_uint<4> a, b;
...
if ((a-sc_uint<4>(1)) > b) ...

Explicitly size expressions only when needed, and be very careful not to induce errors!

Size the expression
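The warning about inducing errors is easy to demonstrate. This hypothetical plain C++ model (not SystemC semantics) assumes the wide version computes a - 1 as a signed 64-bit value and the 4-bit version wraps modulo 16. The two comparisons agree for every a from 1 to 15 but disagree when a is 0.

```cpp
#include <cassert>

// Wide version: the literal is treated as a signed long long (1LL),
// so a - 1 can become negative.
bool compare_wide(unsigned a, unsigned b) {
  return static_cast<long long>(a) - 1LL > static_cast<long long>(b);
}

// 4-bit version: the subtraction wraps modulo 16, as a 4-bit unsigned result would.
bool compare_4bit(unsigned a, unsigned b) {
  unsigned diff = (a - 1u) & 0xFu;
  return diff > b;
}
```

With a = 0, the wide form sees -1 (not greater than b), while the 4-bit form sees 15 (greater than any b below 15).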


Control Variable Dynamics


Synthesis cannot always statically determine your design intent. Explicitly control variable dynamics to clarify your design intent.

32-bit variable shift:
sc_in<bool> valid_in;
sc_in<sc_uint<5> > word_in;
...
unsigned shift(unsigned data) {
  while (!valid_in) wait();
  sc_uint<5> word = word_in;
  return data << (32-word);  // shift range is 32 to 17
}

16-bit variable shift:
sc_in<bool> valid_in;
sc_in<sc_uint<5> > word_in;
...
unsigned shift(unsigned data) {
  while (!valid_in) wait();
  sc_uint<5> word = word_in;
  unsigned preshift = data << 16;
  return preshift << (16-word);  // shift range is 16 to 1
}

Control variable dynamics
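The comments' shift ranges (32 to 17, and 16 to 1) imply that word stays in 0 to 15; this sketch assumes that range. It models both versions in plain C++ with hypothetical names and checks that a fixed 16-bit preshift followed by a smaller variable shift gives the same result as the single wide shift.

```cpp
#include <cassert>
#include <cstdint>

// Single variable shift: the shifter must cover amounts 17..32.
// In plain C++ a 32-bit shift by 32 is undefined, so word == 0 is not tested here.
std::uint32_t shift_direct(std::uint32_t data, unsigned word) {
  return data << (32u - word);
}

// Fixed 16-bit preshift (wiring) plus a variable shift covering only 1..16.
std::uint32_t shift_split(std::uint32_t data, unsigned word) {
  std::uint32_t preshift = data << 16;
  return preshift << (16u - word);
}
```

A side benefit in software: with word == 0 the split form stays well defined (both shifts are 16) and yields 0.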



Pad Array Inner Dimensions to Powers of 2


Simplify address calculation: concatenate instead of multiply and add.
If the array is mapped to registers, unused registers are removed.
If the array is mapped to RAM, unused RAM may remain.

Multiply and add: i*9+j
int A[3][9];
...
for (int i=0; i<3; ++i)
  for (int j=0; j<9; ++j)
    A[i][j] = ...

Concatenate: { i[1:0], j[3:0] }
int A[3][16];
...
for (int i=0; i<3; ++i)
  for (int j=0; j<9; ++j)
    A[i][j] = ...

Pad to power of two
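The address arithmetic behind this tip can be checked in plain C++. In this sketch (hypothetical function names), row-major addressing of the padded A[3][16] is i*16 + j, which equals the bit concatenation { i[1:0], j[3:0] } because j never exceeds 15 and so never carries into the bits of i.

```cpp
#include <cassert>

// Row-major address of A[i][j] for the unpadded int A[3][9]: multiply and add.
unsigned addr_unpadded(unsigned i, unsigned j) { return i * 9u + j; }

// Row-major address for the padded int A[3][16]: a shift and OR, i.e. wiring.
unsigned addr_padded(unsigned i, unsigned j) { return (i << 4) | j; }
```

The padded array occupies 3 x 16 = 48 locations instead of 27, so 21 entries are unused; that is the RAM cost the slide refers to.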


Facilitating Scheduler Optimizations


Synthesis tools need your direction for most of these optimizations:
Code an optimal control flow
Provide realistic timing constraints
Fully describe a datapath in as few threads as possible
Pass function arguments by value
Move local write/read arrays to the module body to suppress initialization
Separate I/O and computation to facilitate scheduling
Combine I/O and computation to facilitate resource sharing
Suppress resource sharing to improve timing
  Forcing signal semantics suppresses register sharing


Code an Optimal Control Flow


Eliminate code that is not reachable in the target operating environment.
  Such code is caused by input value constraints that synthesis does not know about.
Simplify and compact consecutive or nested if conditions to reduce the number of multiplexors.
Rewrite a cascaded if ... else if statement (priority implementation) as a switch statement (parallel implementation) where applicable.

Consecutive if conditions:
if (cond) do_this();
if (!cond) do_that();

Compacted:
if (cond) do_this();
else do_that();

Cascaded if ... else if (priority):
if (value==0) do_this();
else if (value==1) do_that();
else do_other();

switch (parallel):
switch (value) {
  case 0: do_this(); break;
  case 1: do_that(); break;
  default: do_other(); break;
}
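A rewrite from if ... else if to switch must not change behavior. This plain C++ sketch models do_this(), do_that(), and do_other() with a hypothetical Action enum and checks that both forms select the same branch. The switch qualifies for a parallel implementation because every case compares the same value against distinct constants, so the cases are mutually exclusive.

```cpp
#include <cassert>

// Hypothetical stand-ins for do_this(), do_that(), and do_other().
enum class Action { This, That, Other };

// Cascaded if ... else if: conditions are tested in priority order.
Action dispatch_if(int value) {
  if (value == 0) return Action::This;
  else if (value == 1) return Action::That;
  else return Action::Other;
}

// switch: distinct constant cases on one value, decodable in parallel.
Action dispatch_switch(int value) {
  switch (value) {
    case 0: return Action::This;
    case 1: return Action::That;
    default: return Action::Other;
  }
}
```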

Provide Realistic Timing Constraints


Do not overconstrain the clock! An overconstrained clock unnecessarily increases area and can hurt timing:
It can prevent resource sharing that would otherwise occur.
It can prevent operator rescheduling to a less-utilized pipeline state.
  If the operator delay exceeds the clock cycle, the tool will not move the operator,
  potentially leaving it bundled with other operations.

[Figure: clock constraint versus clock realized. Constrain latency and operations, not the clock.]

Fully Describe a Datapath in One Thread


The scheduler cannot share resources between threads (or, by extension, modules).
Group datapath operations into as few threads as practical. (You cannot always group operations that execute at different throughputs.)

Operations that can be grouped:
void my_module::proc1() {
  wait();
  for (;;) {
    if (cond) ya = a1 + a2;
    wait();
  }
}
void my_module::proc2() {
  wait();
  for (;;) {
    if (!cond) yb = b1 + b2;
    wait();
  }
}

Operations grouped into one thread:
void my_module::proc() {
  wait();
  for (;;) {
    if (cond) ya = a1 + a2;
    else yb = b1 + b2;
    wait();
  }
}

Reduce number of threads
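Why grouping helps: once both additions live in one thread under mutually exclusive conditions, a single adder can serve both. The sketch below is a hypothetical software model of one clock cycle of the merged thread (the struct MergedDatapath is not from the slide and is not tool output); it selects the operands first, which corresponds to a mux in front of the shared adder, and then adds once.

```cpp
#include <cassert>

// Hypothetical model of one clock cycle of the merged thread.
struct MergedDatapath {
  int ya = 0;
  int yb = 0;

  void cycle(bool cond, int a1, int a2, int b1, int b2) {
    int lhs = cond ? a1 : b1;  // operand mux in front of the shared adder
    int rhs = cond ? a2 : b2;
    int sum = lhs + rhs;       // the only addition performed this cycle
    if (cond) ya = sum;
    else yb = sum;
  }
};
```

In two separate threads, the scheduler would have to build two adders because it cannot share resources across threads.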


Pass Function Arguments by Value


Style               Declaration                     Rating     Effect
Pass-by-pointer     int func(int *in, int *out);    Accepted   May produce suboptimal timing
Pass-by-reference   int func(int &in, int &out);    Better     May produce suboptimal area
Pass-by-value       int func(int in);               Best       Most aggressive optimization
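The three styles compute the same thing; they differ in what the compiler must reason about. One plausible reason the by-value form optimizes best (an assumption, not stated on the slide) is that it involves no aliasing: nothing needs to check whether in and out refer to the same storage. The function names below are hypothetical.

```cpp
#include <cassert>

// Pass-by-pointer: the callee reads and writes through addresses.
void scale_by_pointer(const int *in, int *out) { *out = *in * 3; }

// Pass-by-reference: the same data flow, expressed with references.
void scale_by_reference(const int &in, int &out) { out = in * 3; }

// Pass-by-value: the input is copied in and the result is returned.
int scale_by_value(int in) { return in * 3; }
```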


Move Local Write/Read Arrays to Module Body


Synthesis initializes non-const function-local variables in accordance with C++ semantics, so it must schedule the initialization of non-const local arrays mapped to RAM.

Local array mapped to RAM (initialized on every call):
SC_MODULE (my_module) {
  ...
private:
  ...
};
void my_module::foo() {
  int array[100] = {};
  ...
}

Member array mapped to RAM (not initialized):
SC_MODULE (my_module) {
  ...
private:
  int array[100];
};
void my_module::foo() {
  ...
}

Make array member
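The two versions are not interchangeable in general, which this hypothetical plain C++ model (no SystemC) makes visible: the local array is re-zeroed on every call, which is the initialization synthesis must schedule, while the member array keeps its contents between calls.

```cpp
#include <cassert>

// Local version: the array is zero-initialized each time the function runs.
int count_calls_local() {
  int array[100] = {};
  array[0] += 1;
  return array[0];
}

// Member version: no per-call initialization; contents persist across calls.
struct MyModule {
  int array[100];
  MyModule() : array() {}  // zeroed once here only so this example is deterministic
  int count_calls_member() {
    array[0] += 1;
    return array[0];
  }
};
```

Moving the array into the module is therefore only a drop-in change when the function writes each element before reading it, as the "write/read" in the slide title suggests.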


Separate I/O and Computation to Facilitate Scheduling


Synthesis must schedule I/O operations in the cycle where they are coded. Separate I/O and computation to allow scheduling flexibility.

I/O and computation in one cycle:
while (true) {
  ...
  wait();
  ...
  result.write( subtract.read() ? a - b : a + b );
}

Flexible scheduling:
while (true) {
  bool sub = subtract.read();
  wait();
  ...
  result.write( sub ? a - b : a + b );
}

Generally separate out I/O ops



Combine I/O and Computation to Facilitate Sharing


Combining I/O and computation in one cycle can reduce resources.

Opcode registers are not shared. ALU is shared.
while (true) {
  op1 = opcode1.read();
  op2 = opcode2.read();
  ...
  opN = opcodeN.read();
  result1 = ALU(data,op1);
  result2 = ALU(data,op2);
  ...
  resultN = ALU(data,opN);
  wait(N);
}

Opcode register is shared (muxed). ALU is shared.
while (true) {
  op1 = opcode1.read();
  result1 = ALU(data,op1);
  wait();
  op2 = opcode2.read();
  result2 = ALU(data,op2);
  wait();
  ...
  opN = opcodeN.read();
  resultN = ALU(data,opN);
  wait();
}


Forcing Signal Semantics Suppresses Register Sharing


Prohibiting resource sharing can improve timing by removing multiplexors.
This is not a recommended style, but it can sometimes be useful.
Assume each ALU operation fully utilizes the clock cycle.

Register shared between cycles:
  ...
  int result1;
  int result2;
};
int module::func(int data_in) {
  result1 = ALU(data_in,opcode1);
  wait();
  result2 = ALU(result1,opcode2);
  wait();
  return ALU(result2,opcode3);
}

Registers not shared between cycles:
  ...
  sc_signal<int> result1;
  sc_signal<int> result2;
};
int module::func(int data_in) {
  result1 = ALU(data_in,opcode1);
  wait();
  result2 = ALU(result1,opcode2);
  wait();
  return ALU(result2,opcode3);
}


Known Problems and Solutions


Tips and current limitations specific to the C-to-Silicon Compiler:
Declare large classes as SystemC modules
Limit each pointer to a maximum of 16 objects


Declare Large Classes as SystemC Modules


The C-to-Silicon Compiler handles modules more efficiently than arbitrary classes. Converting arbitrary classes to modules may solve a capacity problem.

Potential capacity problem:
class mpeg_decoder {
  // A really big class
  ...
};
SC_MODULE(my_module) {
  ...
private:
  mpeg_decoder my_decoder;
};

May resolve the capacity problem:
SC_MODULE (mpeg_decoder) {
  // A really big class
  ...
};
SC_MODULE(my_module) {
  ...
  SC_CTOR(my_module) : my_decoder("my_decoder") {...}
private:
  mpeg_decoder my_decoder;
};

Make it a module

Limit Each Pointer to Maximum of 16 Objects


The C-to-Silicon Compiler tracks up to 16 objects that a pointer can point to. You can assign the addresses of those objects to the pointer any number of times, as long as it never refers to more than 16 distinct objects.

Cannot use 1 pointer for 20 objects:
SC_MODULE(...) {
  ...
private:
  int buf00[32];
  int buf01[32];
  ...
  int buf19[32];
  ...
  int *ptr00to19;
};

Use 1 pointer for a maximum of 16 objects:
SC_MODULE(...) {
  ...
private:
  int buf00[32];
  int buf01[32];
  ...
  int buf19[32];
  ...
  int *ptr00to15;
  int *ptr16to19;
};

Maximum of 16 objects


Coding for High QoR Quiz


1. Explain how explicitly sizing expressions might cause problems.

2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.


Coding for High QoR Quiz Solution


1. Explain how explicitly sizing expressions might cause problems.
When you explicitly size expressions, you can easily and inadvertently lose the more significant result bits of operations such as addition and multiplication.

2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.
Synthesis has no way of knowing how the target environment restricts the value ranges of data inputs and the combinations of control inputs, so it sometimes cannot strip design functionality that will never be used.
