12.4 Synthesis of the Viterbi Decoder
In this section we return to the Viterbi decoder from Chapter 11. After an initial synthesis run that shows how logic synthesis works with a real example, we step back and study some of the issues and problems of using HDLs for logic synthesis.
12.4.1 ASIC I/O
Some logic synthesizers can include I/O cells automatically, but the designer may have to use directives to designate special pads (clock buffers, for example). It may also be necessary to use commands to set I/O cell features such as selection of pull-up resistor, slew rate, and so on. Unfortunately there are no standards in this area. Worse, there is currently no accepted way to set these parameters from an HDL. Designers may also use either generic technology-independent I/O models or instantiate I/O cells directly from an I/O cell library. Thus, for example, in the Compass tools the statement
asPadIn #(3,"1,2,3") u0 (in0, padin0);
uses a generic I/O cell model, asPadIn. This statement will generate three input pads (with pin numbers "1", "2", and "3") if in0 is a 3-bit-wide bus.
The next example illustrates the use of generic I/O cells from a standard-component library. These components are technology independent (so they may equally well be used with a 0.6 µm or 0.35 µm technology).
module allPads(padTri, padOut, clkOut, padBidir, padIn, padClk);
output padTri, padOut, clkOut; inout padBidir;
input [3:0] padIn;
input padClk;
wire [3:0] in;
//compass dontTouch u*
// asPadIn #(W, N, L, P) I (toCore, Pad) also asPadInInv
// asPadOut #(W, N, L, P) I (Pad, frCore)
// asPadTri #(W, N, S, L, P) I (Pad, frCore, OEN)
// asPadBidir #(W, N, S, L, P) I (Pad, toCore, frCore, OEN)
// asPadClk #(N, S, L) I (Clk, Pad) also asPadClkInv
// asPadVxx #(N, subnet) I (Vxx)
// W = width, integer (default=1)
// N = pin number string, e.g. "1:3,5:8"
// S = strength = {2, 4, 8, 16} in mA drive
// L = level = {cmos, ttl, schmitt} (default = cmos)
// P = pull-up resistor = {down, float, none, up}
// Vxx = {Vss, Vdd}
// subnet = connect supply to {pad, core, both}
asPadIn #(4,"1:4","","none") u1 (in, padIn);
asPadOut #(1,"5",13) u2 (padOut, d);
asPadTri #(1,"6",11) u3 (padTri, in[1], in[0]);
asPadBidir #(1,"7",2,"","") u4 (d, padBidir, in[3], in[2]);
asPadClk #(8) u5 (clk, padClk);
asPadOut #(1, "9") u6 (clkOut, clk);
asPadVdd #("10:11","pads") u7 (vddr);
asPadVss #("12,13","pads") u8 (vssr);
asPadVdd #("14","core") u9 (vddc);
asPadVss #("15","core") u10 (vssc);
asPadVdd #("16","both") u11 (vddb);
asPadVss #("17","both") u12 (vssb);
endmodule
The following code is an example of the contents of a generic model for a three-state I/O cell (provided in a standard-component library or in an I/O cell library):
module PadTri (Pad, I, Oen); // active-low output enable
parameter width = 1, pinNumbers = "", \strength = 1,
  level = "CMOS", externalVdd = 5;
output [width-1:0] Pad;
input [width-1:0] I;
input Oen;
assign #1 Pad = (Oen ? {width{1'bz}} : I);
endmodule
The module PadTri can be used for simulation and as the basis for synthesizing an I/O cell. However, the synthesizer also has to be told to synthesize an I/O cell connected to a bonding pad and the outside world and not just an internal three-state buffer. There is currently no standard mechanism for doing this, and every tool and every ASIC company handles it differently.
The following model is a generic model for a bidirectional pad. We could use this model as a basis for input-only and output-only I/O cell models.
module PadBidir (C, Pad, I, Oen); // active-low output enable
parameter width = 1, pinNumbers = "", \strength = 1,
  level = "CMOS", pull = "none", externalVdd = 5;
output [width-1:0] C;
inout [width-1:0] Pad;
input [width-1:0] I;
input Oen;
assign #1 Pad = Oen ? {width{1'bz}} : I;
assign #1 C = Pad;
endmodule
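As a rough sketch of how the bidirectional model might serve as a basis for simpler cells, the following keeps only the input half or only the output half of PadBidir. The module names PadIn and PadOut, their port order, and the choice of which parameters to retain are assumptions made for illustration; they are not cells from any particular library.

```verilog
// Hypothetical input-only pad: the "C = Pad" half of PadBidir.
module PadIn (C, Pad);
parameter width = 1, pinNumbers = "",
  level = "CMOS", pull = "none", externalVdd = 5;
output [width-1:0] C;    // signal to the core
input  [width-1:0] Pad;  // signal from the bonding pad
assign #1 C = Pad;
endmodule

// Hypothetical output-only pad: the driver half of PadBidir
// with the output enable removed (the pad is always driven).
module PadOut (Pad, I);
parameter width = 1, pinNumbers = "", \strength = 1,
  level = "CMOS", externalVdd = 5;
output [width-1:0] Pad;  // signal to the bonding pad
input  [width-1:0] I;    // signal from the core
assign #1 Pad = I;
endmodule
```

As with PadTri, these models only describe simulation behavior; the synthesizer would still need tool-specific direction to map them to real I/O cells.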
In Chapter 8 we used the halfgate example to demonstrate an FPGA design flow, including I/O. If the synthesis tool is not capable of synthesizing I/O cells, then we may have to instantiate them by hand; the following code is a hand-instantiated version of instances u2 through u5 in module allPads:
pc5o05 u2_2 (.PAD(padOut), .I(d));
pc5t04r u3_2 (.PAD(padTri), .I(in[1]), .OEN(in[0]));
pc5b01r u4_3 (.PAD(padBidir), .I(in[3]), .CIN(d), .OEN(in[2]));
pc5d01r u5_in_1 (.PAD(padClk), .CIN(u5toClkBuf[0]));
The designer must find the names of the I/O cells (pc5o05 and so on), and the names, positions, meanings, and defaults for the parameters from the cell-library documentation.
I/O cell models allow us to simulate the behavior of the synthesized logic inside an ASIC “all the way to the pads.” To simulate “outside the pads” at a system level, we should use these same I/O cell models. This is important in ASIC design. For example, the designers forgot to put pull-up resistors on the outputs of some of the SparcStation ASICs. This was one of the very few errors in a complex project, but an error that could have been caught if a system-level simulation had included complete I/O cell models for the ASICs.
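To illustrate the kind of problem a system-level simulation with I/O cell models can expose, here is a minimal sketch that compares a three-state pad with and without a pull-up. It repeats the generic PadTri model from earlier in this section so that it is self-contained; the testbench name pullup_tb, the signal names, and the use of the built-in pullup primitive to stand in for a pull-up resistor are assumptions made for illustration.

```verilog
`timescale 1ns/1ns
// Copy of the generic three-state pad model from this section.
module PadTri (Pad, I, Oen); // active-low output enable
parameter width = 1, pinNumbers = "", \strength = 1,
  level = "CMOS", externalVdd = 5;
output [width-1:0] Pad;
input [width-1:0] I;
input Oen;
assign #1 Pad = (Oen ? {width{1'bz}} : I);
endmodule

module pullup_tb; // hypothetical system-level testbench
reg data, oen;
wire pad_float, pad_pulled;

// Two identical pads driven by the same core signals.
PadTri u_float  (pad_float,  data, oen);
PadTri u_pulled (pad_pulled, data, oen);

// Only one of the two pads has a pull-up attached.
pullup (pad_pulled);

initial begin
  data = 0; oen = 1; // output disabled
  #5 $display("disabled: float=%b pulled=%b", pad_float, pad_pulled);
  oen = 0;           // output enabled, driving 0
  #5 $display("enabled:  float=%b pulled=%b", pad_float, pad_pulled);
end
endmodule
```

With the output disabled, the pad without a pull-up should float (z) while the other reads as a logic 1; once enabled, both pads should follow the driven value. A missing pull-up therefore shows up as a floating net in simulation rather than on the board.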
12.4.2 Flip-Flops
In Chapter 11 we used this D flip-flop model to simulate the Viterbi decoder:
module dff(D,Q,Clock,Reset); // N.B. reset is active-low
output Q;
input D,Clock,Reset;
parameter CARDINALITY = 1;
reg [CARDINALITY-1:0] Q;
wire [CARDINALITY-1:0] D;
always @(posedge Clock)
  if (Reset!==0) #1 Q=D;
always
  begin
    wait (Reset==0); Q=0;
    wait (Reset==1);
  end
endmodule
Most synthesis tools cannot synthesize this model because there are two wait statements in one always statement (the second always statement in the model). We could change the code to use flip-flops from the synthesizer standard-component library by using the following code:
asDff ff1 (.Q(y), .D(x), .Clk(clk), .Rst(vdd));
Unfortunately we would have to change all the flip-flop models from 'dff' to 'asDff' and the code would become dependent on a particular synthesis tool. Instead, to maintain independence from vendors, we shall use the following D flip-flop model for synthesis and simulation:
module dff(D, Q, Clk, Rst); // new flip-flop for Viterbi decoder
parameter width = 1, reset_value = 0;
input [width - 1 : 0] D;
output [width - 1 : 0] Q;
reg [width - 1 : 0] Q;
input Clk, Rst;
initial Q <= {width{1'bx}};
always @ ( posedge Clk or negedge Rst )
  if ( Rst == 0 ) Q <= #1 reset_value;
  else Q <= #1 D;
endmodule
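A quick way to convince ourselves that this model behaves as intended is a small simulation. The following is a minimal sketch, not part of the Viterbi decoder code: the testbench name dff_tb, the 3-bit width, the 10 ns clock period, and the stimulus timing are all assumptions chosen for illustration. The dff module is repeated so that the example is self-contained.

```verilog
`timescale 1ns/1ns
// Copy of the synthesizable flip-flop model from this section.
module dff(D, Q, Clk, Rst);
parameter width = 1, reset_value = 0;
input [width - 1 : 0] D;
output [width - 1 : 0] Q;
reg [width - 1 : 0] Q;
input Clk, Rst;
initial Q <= {width{1'bx}};
always @ ( posedge Clk or negedge Rst )
  if ( Rst == 0 ) Q <= #1 reset_value;
  else Q <= #1 D;
endmodule

module dff_tb; // hypothetical testbench
reg Clk, Rst;
reg [2:0] D;
wire [2:0] Q;

dff #(3, 0) u_dff (D, Q, Clk, Rst); // width = 3, reset_value = 0

always #5 Clk = ~Clk; // first rising edge at t = 5

initial begin
  Clk = 0; Rst = 1; D = 3'b101;
  #2 Rst = 0;                             // assert reset between clock edges
  #2 $display("after reset: Q = %b", Q);  // t = 4
  Rst = 1;                                // release reset before the clock edge
  #10 $display("after clock: Q = %b", Q); // t = 14
  $finish;
end
endmodule
```

The reset is asserted away from any clock edge, so if Q changes at that point the reset is acting asynchronously; Q should then take the value of D one time unit after the next rising clock edge.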
12.4.3 The Top-Level Model
The following code models the top-level Viterbi decoder and instantiates (with instance name v_1) a copy of the Verilog module viterbi from Chapter 11. The model uses generic input, output, power, and clock I/O cells from the standard-component library supplied with the synthesis software. The synthesizer will take these generic I/O cells and map them to I/O cells from a technology-specific library. We do not need three-state I/O cells or bidirectional I/O cells for the Viterbi ASIC.
/* This is the top-level module, viterbi_ASIC.v */
module viterbi_ASIC
  (padin0, padin1, padin2, padin3, padin4, padin5, padin6, padin7,
  padOut, padClk, padRes, padError);
input [2:0] padin0, padin1, padin2, padin3,
  padin4, padin5, padin6, padin7;
input padRes, padClk;
output padError;
output [2:0] padOut;
wire Error, Clk, Res;
wire [2:0] Out; // core
wire padError, padClk, padRes;
wire [2:0] padOut;
wire [2:0] in0,in1,in2,in3,in4,in5,in6,in7; // core
wire [2:0]
  padin0, padin1,padin2,padin3,padin4,padin5,padin6,padin7;
// Do not let the software mess with the pads.
//compass dontTouch u*
asPadIn #(3,"1,2,3") u0 (in0, padin0);
asPadIn #(3,"4,5,6") u1 (in1, padin1);
asPadIn #(3,"7,8,9") u2 (in2, padin2);
asPadIn #(3,"10,11,12") u3 (in3, padin3);
asPadIn #(3,"13,14,15") u4 (in4, padin4);
asPadIn #(3,"16,17,18") u5 (in5, padin5);
asPadIn #(3,"19,20,21") u6 (in6, padin6);
asPadIn #(3,"22,23,24") u7 (in7, padin7);
asPadVdd #("25","both") u25 (vddb);
asPadVss #("26","both") u26 (vssb);
asPadClk #("27") u27 (Clk, padClk);
asPadOut #(1,"28") u28 (padError, Error);
asPadIn #(1,"29") u29 (Res, padRes);
asPadOut #(3,"30,31,32") u30 (padOut, Out);
// Here is the core module:
viterbi v_1
  (in0,in1,in2,in3,in4,in5,in6,in7,Out,Clk,Res,Error);
endmodule
At this point we are ready to begin synthesis. In order to demonstrate how synthesis works, I am cheating here. The code that was presented in Chapter 11 has already been simulated and synthesized (requiring several iterations to produce error-free code). What I am doing is a little like the Galloping Gourmet’s television presentation: “And then we put the soufflé in the oven . . . and look at the soufflé that I prepared earlier.” The synthesis results for the Viterbi decoder are shown in Table 12.6. Normally the worst thing we can do is prepare a large amount of code, put it in the synthesis oven, close the door, push the “synthesize and optimize” button, and wait. Unfortunately, it is easy to do. In our case it works (at least we may think so at this point) because this is a small ASIC by today’s standards, only a few thousand gates. I made the bus widths small and chose this example so that the code was of a reasonable size. Modern ASICs may be over one million gates, hundreds of times more complicated than our Viterbi decoder example.
TABLE 12.6 Initial synthesis results of the Viterbi decoder ASIC.

Command: > optimize

Synthesizer output:

            Num    Gate Count  Tot Gate  Width     Total
Cell Name   Insts  Per Cell    Count     Per Cell  Width
---------   -----  ----------  --------  --------  --------
pc5c01      1      315.4       315.4     100.8     100.8
pc5d01r     26     315.4       8200.4    100.8     2620.8
pc5o06      4      315.4       1261.6    100.8     403.2
pv0f        1      315.4       315.4     100.8     100.8
pvdf        1      315.4       315.4     100.8     100.8
viterbi_p   1      1880.0      1880.0    18048.0   18048.0
The derived schematic for the synthesized core logic is shown in Figure 12.6. There are eight boxes in Figure 12.6 that represent the eight modules in the Verilog code. The schematics for each of these eight blocks are too complex to be useful. With practice it is possible to “see” the synthesized logic from reports such as Table 12.6. First we check the following cells at the top level:
FIGURE 12.6 The core logic of the Viterbi decoder ASIC. Bus names are abbreviated in this figure for clarity. For example the label m_out0-3 denotes the four buses: m_out0, m_out1, m_out2, and m_out3.
- pc5c01 is an I/O cell that drives the clock node into the logic core. ASIC designers also call an I/O cell a pad cell, and often refer to the pad cells (the bonding pads and associated logic) as just “the pads.” From the library data book we find this is a “core-driven, noninverting clock buffer capable of driving 125 pF.” This is a large logic cell and does not have a bonding pad, but is placed in a pad site (a slot in the ring of pads around the perimeter of the die) as if it were an I/O cell with a bonding pad.
- pc5d01r is a 5V CMOS input-only I/O cell with a bus repeater. Twenty-four of these I/O cells are used for the 24 inputs (in0 to in7). Two more are used for Res and Clk. The I/O cell for Clk receives the clock signal from the bonding pad and drives the clock buffer cell (pc5c01). The pc5c01 cell then buffers and drives the clock back into the core. The power-hungry clock buffer is placed in the pad ring near the VDD and VSS pads.
- pc5o06 is a CMOS output-only I/O cell with 6X drive strength (6 mA AC drive and 4 mA DC drive). There are four output pads: three pads for the signal outputs, outp[2:0], and one pad for the output signal, error.
- pv0f is a power pad that connects all VSS power buses on the chip.
- pvdf is a power pad that connects all VDD power buses on the chip.
- viterbi_p is the core logic. This cell takes its name from the top-level Verilog module (viterbi). The software has appended a "_p" suffix (the default) to prevent input files being accidentally overwritten.
The software does not tell us any of this directly. We learn what is going on by looking at the names and number of the synthesized cells, reading the synthesis tool documentation, and from experience. We shall learn more about I/O pads and the layout of power supply buses in Chapter 16.
Next we examine the cells used in the logic core. Most synthesis tools can produce reports, such as that shown in Table 12.7, which lists all the synthesized cells. The most important types of cells to check are the sequential elements: flip-flops and latches (I have omitted all but the sequential logic cells in Table 12.7). One of the most common mistakes in synthesis is to accidentally leave a variable unassigned under some conditions in the HDL. A variable that is not assigned in every situation must hold its previous value, so it requires memory and will generate unnecessary sequential logic. In the Viterbi decoder it is easy to identify the sequential logic cells that should be present in the synthesized logic because we used the module dff explicitly whenever we required a flip-flop. By scanning the code in Chapter 11 and counting the references to the dff model, we can see that the only flip-flops that should be inferred are the following:
- 24 (3 × 8) D flip-flops in instance subset_decode
- 132 (11 × 12) D flip-flops in instance path_memory that contains 11 instances of path (12 D flip-flops in each instance of path)
- 12 D flip-flops in instance pathin
- 20 (5 × 4) D flip-flops in instance metric
The total is 24 + 132 + 12 + 20 = 188 D flip-flops, which is the same as the number of dfctnb cell instances in Table 12.7.
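The unassigned-variable mistake mentioned above is easy to make. The following is a minimal sketch (the module name latch_demo and its signals are hypothetical and are not part of the Viterbi decoder) that contrasts an incompletely assigned variable with a fully assigned one.

```verilog
module latch_demo (sel, a, y_latch, y_comb); // hypothetical example
input sel, a;
output y_latch, y_comb;
reg y_latch, y_comb;

// y_latch is not assigned when sel == 0, so it must hold its
// previous value: a synthesizer will typically infer a latch.
always @(sel or a)
  if (sel) y_latch = a;

// y_comb is assigned on every path through the always statement,
// so only combinational logic is needed.
always @(sel or a)
  if (sel) y_comb = a;
  else y_comb = 1'b0;
endmodule
```

If code like the first always statement were present in the decoder, we would expect extra latch cells to appear in the area report alongside the 188 flip-flops, which is why checking the count of sequential cells is a useful sanity test.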
TABLE 12.7 Number of synthesized flip-flops in the Viterbi ASIC.

Command: > report area -flat

Synthesizer output:

            Num    Gate Count  Tot Gate  Width     Total
Cell Name   Insts  Per Cell    Count     Per Cell  Width
---------   -----  ----------  --------  --------  --------
...
dfctnb      188    5.8         1081.0    55.2      10377.6
...
---------   -----  ----------  --------  --------  --------
Totals:     1383               12716.5             25485.6
Table 12.6 gives the total width of the standard cells in the logic core after logic optimization as 18,048 µm. Since the standard-cell height for this library is 72 λ (21.6 µm), we can make a first estimate of the total logic cell area as

$$
(18{,}048\ \mu\text{m})\,(21.6\ \mu\text{m}) = 390\ \text{k}(\mu\text{m})^2
\approx \frac{390\ \text{k}(\mu\text{m})^2\ \text{mil}^2}{(25.4\ \mu\text{m})^2}
\approx 600\ \text{mil}^2
\tag{12.12}
$$
In the physical layout we shall need additional space for routing. The ratio of routing to logic cell area is called the routing factor. The routing factor depends primarily on whether we use two levels or three levels of metal. With two levels of metal the routing factor is typically between 1 and 2. With three levels of metal, where we may use over-the-cell routing, the routing factor is usually zero to 1. We thus expect a logic core area of 600–1000 mil² for the Viterbi decoder using this cell library.
From Table 12.6 we see the I/O cells in this library are 100.8 µm wide or approximately 4 mil (the width of a single pad site). From the I/O cell data book we find the I/O cell height is 650 µm (actually 648.825 µm) or approximately 26 mil. Each I/O cell thus occupies 104 mil². Our 33 pad sites will thus require approximately 3400 mil², which is larger than the estimated core logic area.
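The pad-area estimate above can be written out as a short worked calculation. The pad-site count is taken from the instance counts for the five pad cells in Table 12.6, and the rounded dimensions of 4 mil by 26 mil are the approximations used in the text.

```latex
\begin{align*}
N_{\text{pads}} &= 1 + 26 + 4 + 1 + 1 = 33 \\
A_{\text{cell}} &\approx (4\ \text{mil})(26\ \text{mil}) = 104\ \text{mil}^2 \\
A_{\text{pads}} &\approx 33 \times 104\ \text{mil}^2 = 3432\ \text{mil}^2 \approx 3400\ \text{mil}^2
\end{align*}
```

Compared with the roughly 600 mil² of logic cell area from Eq. 12.12, the pads take several times more area than the core logic.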
Let us go back and take a closer look at what it usually takes to get to this point. Remember we used an already prepared Verilog model for the Viterbi decoder.