Professional Documents
Culture Documents
Rev. 1631A–02/00
1
In addition to the new multiplication instruction, a few other The file “AVR201.asm” contains the application note
additions and improvements are made to the megaAVR source code of the 16-bit multiply routines.
processor core. One improvement that is particularly useful A listing of all implementations with key performance speci-
is the new instruction MOVW - Copy Register Word, which fications is given in Table 1.
makes a copy of one register pair into another register pair.
8-bit Multiplication
Doing an 8-bit multiply using the hardware multiplier is sim- Figure 1. Valid Register Usage
ple, as the examples in this section will clearly show. Just
load the operands into two registers (or only one for square MUL MULS MULSU
multiply) and execute one of the multiply instructions. The R0
FMUL
result will be placed in register pair R0:R1. However, note R1 FMULS
R2
that only the MUL instruction does not have register usage R3
FMULSU
R4
restrictions. Figure 1 shows the valid (operand) register R5
R6
usage for each of the multiply instructions. R7
R8
R9
Example 1 – Basic Usage R10
R11
R12
The first example shows an assembly code that reads the R13
R14
port B input value and multiplies this value with a constant R15
(5) before storing the result in register pair R17:R16. R16
R17
R16
R17
R16
R17
in r16,PINB ; Read pin values R18 R18 R18
R19 R19 R19
ldi r17,5 ; Load 5 into r17 R20 R20 R20
R21 R21 R21
mul r16,r17 ; r1:r0 = r17 * r16 R22 R22 R22
R23 R23 R23
movw r17:r16,r1:r0; Move the result to the r17:r16 R24 R24
R25 R25
; register pair R26 R26
R27 R27
Note the use of the new MOVW instruction. This example R28
R29
R28
R29
is valid for all of the multiply instructions. R30
R31
R30
R31
2 AVR201
AVR201
Example 2 – Special Cases the middle low byte of the result C. Equal notations are
This example shows some special cases of the MUL used for the remaining bytes.
instruction that are valid. The algorithm is basic for all multiplication. All of the partial
lds r0,variableA; Load r0 with SRAM variable A 16-bit results are shifted and added together. The sign
lds r1,variableB; Load r1 with SRAM variable B extension is necessary for signed numbers only, but note
mul r1,r0 ; r1:r0 = variable A * variable B that the carry propagation must still be done for unsigned
numbers.
lds r0,variableA; Load r0 with SRAM variable A
mul r0,r0 ; r1:r0 = square(variable A) Figure 2. 16-bit Multiplication, General Algorithm
Even though the operand is put in the result register pair
R1:R0, the operation gives the correct result since R1 and AH AL X BH BL
R0 are fetched in the first clock cycle and the result is
stored back in the second clock cycle.
3
Figure 3. 16-bit Multiplication, 16-bit Result Figure 4. 16-bit Multiplication, 32-bit Result
AH AL X BH BL AH AL X BH BL
(sign
AL * BL 1 ext) AL * BH 3
(sign
+ AL * BH 2 + ext) AH * BL 4
+ AH * BL 3 + AH * BH AL * BL 1+2
= CH CL = CH CMH CML CL
When one factor is negative and one factor is positive, for The 32-bit result of the unsigned multiplication of 672 and
example, A is negative and B is positive: 1844 will now be in the registers R19:R18:R17:R16. If
“muls16x16_32” is called instead of “mul16x16_32”, a
C = A • B = (216 - |A|) • |B| = (216 • |B|) - |A • B| = (216 - |A • B|) + signed multiplication will be executed. If “mul16x16_16” is
216 • (|B| - 1) called, the result will only be 16 bits long and will be stored
in the register pair R17:R16. In this example, the 16-bit
The MSBs will be discarded and the correct two’s comple- result will not be correct.
ment notation result will be C = 216 - |A • B|.
The product must be in the range 0 ≤ C ≤ 216 - 1 if unsigned 16-bit Multiply-accumulate Operation
numbers are used, and in the range -215 ≤ C ≤ 215 - 1 if
signed numbers are used. Figure 5. 16-bit Multiplication, 32-bit Accumulated Result
When doing integer multiplication in C language, this is
how it is done. The algorithm can be expanded to do 32-bit AH AL X BH BL
multiplication with 32-bit result.
4 AVR201
AVR201
Using the FMUL, FMULS and FMULSU instructions should Since we do not have a rest, the remaining three bits will be
not be more complex than the MUL, MULS and MULSU zero, and the final result is “1110 1000”, which is 1 + 0.5 +
instructions. However, one potential problem is to assign 0.25 + 0.0625 = 1.8125.
fractional variables right values in a simple way. The frac- To convert a negative fractional number, first add 2 to the
tion 0.75 (= 0.5 + 0.25) will, for example, be “0110 0000” if number and then use the same algorithm as already
8 bits are used. shown.
To convert a positive fractional number in the range [0, 2> 16-bit fractional numbers use a format similar to that of
(for example 1.8125) to the format used in the AVR, the fol- 8-bit fractional numbers; the high 8 bits have the same for-
lowing algorithm, illustrated by an example, should be mat as the 8-bit format. The low 8 bits are only an increase
used: of accuracy of the 8-bit format; while the 8-bit format has an
Is there a “1” in the number? accuracy of ±2-8, the16-bit format has an accuracy of ±2-16.
Yes, 1.8125 is higher than or equal to 1. Then again, the 32-bit fractional numbers are an increase
Byte is now “1xxx xxxx” of accuracy to the 16-bit fractional numbers. Note the
Is there a “0.5” in the rest? important difference between integers and fractional num-
0.8125 / 0.5 = 1.625 bers when extra byte(s) are used to store the number:
Yes, 1.625 is higher than or equal to 1. while the accuracy of the numbers is increased when frac-
Byte is now “11xx xxxx” tional numbers are used, the range of numbers that may be
represented is extended when integers are used.
Is there a “0.25” in the rest?
0.625 / 0.5 = 1.25 As mentioned earlier, using signed fractional numbers in
Yes, 1.25 is higher than or equal to 1. the range [-1, 1> has one main advantage to integers:
when multiplying two numbers in the range [-1, 1>, the
Byte is now “111x xxxx”
result will be in the range [-1, 1], and an approximation (the
Is there a “0.125” in the rest?
highest byte(s)) of the result may be stored in the same
0.25 / 0.5 = 0.5
number of bytes as the factors, with one exception: when
No, 0.5 is lower than 1. both factors are -1, the product should be 1, but since the
Byte is now “1110 xxxx” number 1 cannot be represented using this number format,
Is there a “0.0625” in the rest? the FMULS instruction will instead place the number -1 in
0.5 / 0.5 = 1 R1:R0. The user should therefore assure that at least one
Yes, 1 is higher than or equal to 1. of the operands is not -1 when using the FMULS instruc-
Byte is now “1110 1xxx” tion. The 16-bit x 16-bit fractional multiply also has this
restriction.
5
Example 5 – Basic Usage 8-bit x 8-bit = 16-bit Signed Note: Full orthogonality, i.e. any register pair can be used as
Fractional Multiply long as the result and the two operands do not share
register pairs. The routine is non-destructive to the
This example shows an assembly code that reads the
operands.
port B input value and multiplies this value with a fractional
mul16x16_16:
constant (-0.625) before storing the result in register pair
mul r22, r20 ; al * bl
R17:R16.
in r16,PINB ; Read pin values movw r17:r16, r1:r0
ldi r17,$B0 ; Load -0.625 into r17 mul r23, r20 ; ah * bl
fmuls r16,r17 ; r1:r0 = r17 * r16 add r17, r0
movw r17:r16,r1:r0; Move the result to the r17:r16 mul r21, r22 ; bh * al
; register pair add r17, r0
Note that the usage of the FMULS (and FMUL) instructions ret
is very similar to the usage of the MULS and MUL
instructions. Function
mul16x16_32
Example 6 – Multiply-accumulate Operation
Description
The example below uses data from the ADC. The ADC
should be configured so that the format of the ADC result is Unsigned multiply of two 16-bit numbers with a 32-bit
compatible with the fractional two’s complement format. result.
For the ATmega83/163, this means that the ADLAR bit in Usage
the ADMUX I/O register is set and a differential channel is R19:R18:R17:R16 = R23:R22 • R21:R20
used. (The ADC result is normalized to one.)
Statistics
ldi r23,$62 ; Load highbyte of
; fraction 0.771484375 Cycles: 17 + ret
ldi r22,$C0 ; Load lowbyte of Words: 13 + ret
; fraction 0.771484375 Register usage: R0 to R2 and R16 to R23 (11
in r20,ADCL ; Get lowbyte of ADC conversion registers)
in r21,ADCH ; Get highbyte of ADC conversion Note: Full orthogonality, i.e. any register pair can be used as
callfmac16x16_32;Call routine for signed fractional long as the 32-bit result and the two operands do not
; multiply accumulate share register pairs. The routine is non-destructive to the
operands.
The registers R19:R18:R17:R16 will be incremented with
mul16x16_32:
the result of the multiplication of 0.771484375 with the ADC
clr r2
conversion result. In this example, the ADC result is treated
as a signed fraction number. We could also treat it as a mul r23, r21 ; ah * bh
signed integer and call it “mac16x16_32” instead of movw r19:r18, r1:r0
“fmac16x16_32”. In this case, the 0.771484375 should be mul r22, r20 ; al * bl
replaced with an integer. movw r17:r16, r1:r0
mul r23, r20 ; ah * bl
add r17, r0
Implementations
adc r18, r1
Function adc r19, r2
mul16x16_16 mul r21, r22 ; bh * al
Description add r17, r0
6 AVR201
AVR201
Description
mul r22, r20 ; al * bl
Signed multiply of two 16-bit numbers with a 32-bit
add r16, r0
result.
adc r17, r1
Usage adc r18, r2
R19:R18:R17:R16 = R23:R22 • R21:R20 adc r19, r2
Statistics
Cycles: 19 + ret mulsu r23, r20 ; (signed)ah * bl
7
Function fmac16x16_32: ; Register usage optimized
fmuls16x16_32 clr r2
Description
fmuls r23, r21 ; ( (signed)ah * (signed)bh )
Signed fractional multiply of two 16-bit numbers with a << 1
32-bit result. add r18, r0
Usage adc r19, r1
R19:R18:R17:R16 = (R23:R22 • R21:R20) << 1
Statistics fmul r22, r20 ; ( al * bl ) << 1
adc r18, r2
Cycles: 20 + ret
adc r19, r2
Words: 16 + ret
add r16, r0
Register usage: R0 to R2 and R16 to R23 (11 adc r17, r1
registers)
adc r18, r2
Note: The routine is non-destructive to the operands. adc r19, r2
fmuls16x16_32:
clr r2 fmulsu r23, r20 ; ( (signed)ah * bl ) << 1
fmuls r23, r21 ; ( (signed)ah * (signed)bh ) << 1 sbc r19, r2
movw r19:r18, r1:r0 add r17, r0
fmul r22, r20 ; ( al * bl ) << 1 adc r18, r1
adc r18, r2 adc r19, r2
movw r17:r16, r1:r0
fmulsu r23, r20; ( (signed)ah * bl ) << 1 fmulsu r21, r22 ; ( (signed)bh * al ) << 1
sbc r19, r2 ; Sign extend sbc r19, r2
add r17, r0 add r17, r0
adc r18, r1 adc r18, r1
adc r19, r2 adc r19, r2
fmulsu r21, r22; ( (signed)bh * al ) << 1
sbc r19, r2 ; Sign extend ret
add r17, r0
adc r18, r1 fmac16x16_32_method_B: ; uses two temporary
adc r19, r2 registers (r4,r5), speed / Size optimized
ret ; but reduces cycles/words by 2
clr r2
Function
fmuls r23, r21 ; ( (signed)ah * (signed)bh )
fmac16x16_32
<< 1
Description movw r5:r4,r1:r0
Signed fractional multiply-accumulate of two 16-bit fmul r22, r20 ; ( al * bl ) << 1
numbers with a 32-bit result. adc r4, r2
Usage
R19:R18:R17:R16 += (R23:R22 • R21:R20) << 1 add r16, r0
8 AVR201
AVR201
ret
9
Atmel Headquarters Atmel Operations
Corporate Headquarters Atmel Colorado Springs
2325 Orchard Parkway 1150 E. Cheyenne Mtn. Blvd.
San Jose, CA 95131 Colorado Springs, CO 80906
TEL (408) 441-0311 TEL (719) 576-3300
FAX (408) 487-2600 FAX (719) 540-1759
Europe Atmel Rousset
Atmel U.K., Ltd. Zone Industrielle
Coliseum Business Centre 13106 Rousset Cedex
Riverside Way France
Camberley, Surrey GU15 3YL TEL (33) 4-4253-6000
England FAX (33) 4-4253-6001
TEL (44) 1276-686-677
FAX (44) 1276-686-697
Asia
Atmel Asia, Ltd.
Room 1219
Chinachem Golden Plaza
77 Mody Road Tsimhatsui
East Kowloon
Hong Kong
TEL (852) 2721-9778
FAX (852) 2722-1369
Japan
Atmel Japan K.K.
9F, Tonetsu Shinkawa Bldg.
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
TEL (81) 3-3523-3551
FAX (81) 3-3523-7581
Fax-on-Demand
North America:
1-(800) 292-8635
International:
1-(408) 441-0732
e-mail
literature@atmel.com
Web Site
http://www.atmel.com
BBS
1-(408) 436-4309