13.7 Static Timing Analysis
We return to the comparator/MUX example to see how timing analysis is applied to sequential logic. We shall use the same input code (comp_mux.v in Section 13.2), but this time we shall target the design to an Actel FPGA.
Before routing we obtain the following static timing analysis:
Instance name in pin-->out pin tr total incr cell
--------------------------------------------------------------------
END_OF_PATH
outp_2_ R 27.26
OUT1 : D--->PAD R 27.26 7.55 OUTBUF
I_1_CM8 : S11--->Y R 19.71 4.40 CM8
I_2_CM8 : S11--->Y R 15.31 5.20 CM8
I_3_CM8 : S11--->Y R 10.11 4.80 CM8
IN1 : PAD--->Y R 5.32 5.32 INBUF
a_2_ R 0.00 0.00
BEGIN_OF_PATH
The estimated prelayout critical path delay is nearly 30 ns including the I/O-cell delays (ACT 3, worst-case, standard speed grade). This limits the operating frequency to 33 MHz (assuming we can get the signals to and from the chip pins with no further delays—highly unlikely). The operating frequency can be increased by pipelining the design as follows (by including three register stages: at the inputs, the outputs, and between the comparison and the select functions):
// comp_mux_rrr.v
module comp_mux_rrr(a, b, clock, outp);
input [2:0] a, b; output [2:0] outp;
input clock;
reg [2:0] a_r, a_rr, b_r, b_rr, outp;
reg sel_r;
// comparison between the first and second register stages
wire sel = ( a_r <= b_r ) ? 0 : 1;
// first register stage: capture the inputs
always @ (posedge clock) begin a_r <= a; b_r <= b; end
// second register stage: delay the data to match sel_r
always @ (posedge clock) begin a_rr <= a_r; b_rr <= b_r; end
// output register: select function
always @ (posedge clock) outp <= sel_r ? b_rr : a_rr;
always @ (posedge clock) sel_r <= sel;
endmodule
Following synthesis we optimize module comp_mux_rrr for maximum speed. Static timing analysis gives the following preroute critical paths:
---------------------INPAD to SETUP longest path---------------------
Rise delay, Worst case
Instance name in pin-->out pin tr total incr cell
--------------------------------------------------------------------
END_OF_PATH
D.a_r_ff_b2 R 4.52 0.00 DF1
INBUF_24 : PAD--->Y R 4.52 4.52 INBUF
a_2_ R 0.00 0.00
BEGIN_OF_PATH
---------------------CLOCK to SETUP longest path---------------------
Rise delay, Worst case
Instance name in pin-->out pin tr total incr cell
--------------------------------------------------------------------
END_OF_PATH
D.sel_r_ff R 9.99 0.00 DF1
I_1_CM8 : S10--->Y R 9.99 0.00 CM8
I_3_CM8 : S00--->Y R 9.99 4.40 CM8
a_r_ff_b1 : CLK--->Q R 5.60 5.60 DF1
BEGIN_OF_PATH
---------------------CLOCK to OUTPAD longest path--------------------
Rise delay, Worst case
Instance name in pin-->out pin tr total incr cell
--------------------------------------------------------------------
END_OF_PATH
outp_2_ R 11.95
OUTBUF_31 : D--->PAD R 11.95 7.55 OUTBUF
outp_ff_b2 : CLK--->Q R 4.40 4.40 DF1
BEGIN_OF_PATH
The timing analyzer has examined the following:
- Paths that start at an input pad and end on the data input of a sequential logic cell (the D input to a D flip-flop, for example). We might call this an entry path (or input-to-D path) to a pipelined design. The longest entry delay (or input-to-setup delay) is 4.52 ns.
- Paths that start at a clock input to a sequential logic cell and end at the data input of a sequential logic cell. This is a stage path (register-to-register path or clock-to-D path) in a pipeline stage. The longest stage delay (clock-to-D delay) is 9.99 ns.
- Paths that start at a sequential logic cell output and end at an output pad. This is an exit path (clock-to-output path) from the pipeline. The longest exit delay (clock-to-output delay) is 11.95 ns.
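The three path classes above bound the clock period, and the arithmetic can be sketched in a few lines. This is only an illustration: the delays are copied from the preroute reports above, the function and variable names are invented for the example, and the bound ignores flip-flop setup time, clock skew, and any off-chip delay. The text rounds the unpipelined delay of 27.26 ns up to "nearly 30 ns", which is where its 33 MHz figure comes from.

```python
# Preroute delays in ns, copied from the timing reports above.
UNPIPELINED_NS = 27.26  # pad-to-pad critical path of comp_mux
PIPELINED_NS = {
    "entry": 4.52,   # input pad -> D
    "stage": 9.99,   # clock -> D
    "exit": 11.95,   # clock -> output pad
}


def max_frequency_mhz(delay_ns: float) -> float:
    """Upper bound on clock frequency if delay_ns were the whole clock period."""
    return 1000.0 / delay_ns


# The slowest of the three path classes sets the limit for the pipelined design.
critical_name = max(PIPELINED_NS, key=PIPELINED_NS.get)
critical_ns = PIPELINED_NS[critical_name]

print(f"unpipelined: {max_frequency_mhz(UNPIPELINED_NS):.1f} MHz")
print(f"pipelined ({critical_name} path): {max_frequency_mhz(critical_ns):.1f} MHz")
print(f"ratio: {UNPIPELINED_NS / critical_ns:.2f}")
```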
By pipelining the design we added three clock periods of latency, but we increased the estimated operating speed. The longest prelayout critical path is now an exit delay of approximately 12 ns, which more than doubles the maximum operating frequency. Next, we route the registered version of the design. The Actel software informs us that the postroute maximum stage delay is 11.3 ns (close to the preroute estimate of 9.99 ns). To check this figure we can perform another timing analysis. This time we shall measure the stage delays (the start points are all clock pins, and the end points are all inputs to sequential cells, in our case the D input to a D flip-flop). We need to define the sets of nodes at which to start and end the timing analysis (similar to the path clusters we used to specify timing constraints in logic synthesis). In the Actel timing analyzer we can use predefined sets 'clock' (flip-flop clock pins) and 'gated' (flip-flop inputs) as follows:
timer> startset clock
timer> endset gated
timer> longest
1st longest path to all endpins
Rank Total Start pin First Net End Net End pin
0 11.3 a_r_ff_b2:CLK a_r_2_ block_0_OUT1 sel_r_ff:D
1 6.6 sel_r_ff:CLK sel_r DEF_NET_50 outp_ff_b0:D
... 8 similar lines omitted ...
We could try to reduce the long stage delay (11.3 ns), but we have already seen from the preroute timing estimates that an exit delay may be the critical path. Next, we check some other important timing parameters.
13.7.1 Hold Time
Hold-time problems can occur if there is clock skew between adjacent flip-flops, for example. We first need to check for the shortest stage delays using the same sets that we used to check the longest stage delays:
timer> shortest
1st shortest path to all endpins
Rank Total Start pin First Net End Net End pin
0 4.0 b_rr_ff_b1:CLK b_rr_1_ DEF_NET_48 outp_ff_b1:D
1 4.1 a_rr_ff_b2:CLK a_rr_2_ DEF_NET_46 outp_ff_b2:D
... 8 similar lines omitted ...
The shortest path delay, 4 ns, is between the clock input of a D flip-flop with instance name b_rr_ff_b1 (call this X) and the D input of flip-flop instance name outp_ff_b1 (Y). Due to clock skew, the clock signal may not arrive at both flip-flops simultaneously. Suppose the clock arrives at flip-flop Y 3 ns later than at flip-flop X. The D input to flip-flop Y changes 4 ns after the clock edge at X, so it is only stable for (4 – 3) = 1 ns after the clock edge at Y. To check for hold-time violations we thus need to find the clock skew corresponding to each clock-to-D path. This is tedious, and timing-analysis tools normally check hold-time requirements automatically, but we shall show the steps to illustrate the process.
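The hold check for one clock-to-D path can be written as a small calculation. The sketch below is an assumption-laden illustration, not the analyzer's method: skew is defined here as the clock arrival time at the capturing flip-flop minus the arrival time at the launching flip-flop, and the `hold_time_ns` parameter is hypothetical, since the hold time of the DF1 cell is not given in this section.

```python
def hold_slack_ns(shortest_path_ns: float, skew_ns: float,
                  hold_time_ns: float = 0.0) -> float:
    """Time by which the D input stays stable beyond the required hold time.

    shortest_path_ns: shortest clock-to-D delay from launch to capture flip-flop
    skew_ns: capture clock arrival minus launch clock arrival (positive = capture is late)
    hold_time_ns: hold requirement of the capturing flip-flop (hypothetical value)
    """
    return shortest_path_ns - skew_ns - hold_time_ns


# X = b_rr_ff_b1 launches, Y = outp_ff_b1 captures; path delay 4.0 ns, skew 3 ns.
stable_ns = hold_slack_ns(4.0, 3.0)
print(f"D input of Y is stable for {stable_ns:.1f} ns after its clock edge")

# With an assumed 1.5 ns hold requirement the same path would violate hold.
print("violation" if hold_slack_ns(4.0, 3.0, hold_time_ns=1.5) < 0 else "ok")
```

A negative result flags a hold-time violation; with zero skew the slack is simply the shortest path delay minus the hold time.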
13.7.2 Entry Delay
Before we can measure clock skew, we need to analyze the entry delays, including the clock tree. The synthesis tools automatically add I/O pads and the clock cells. This means that extra nodes are automatically added to the netlist with automatically generated names. The EDIF conversion tools may then modify these names. Before we can perform an analysis of entry delays and the clock network delay, we need to find the input node names. By looking for the EDIF 'rename' construct in the EDIF netlist we can associate the input and output node names in the behavioral Verilog model, comp_mux_rrr, with the EDIF names:
piron% grep rename comp_mux_rrr_o.edn
(port (rename a_2_ "a[2]") (direction INPUT))
... 8 similar lines renaming ports omitted ...
(net (rename a_rr_0_ "a_rr[0]") (joined
... 9 similar lines renaming nets omitted ...
piron%
Thus, for example, the EDIF conversion program has renamed input port a[2] to a_2_ because the design tools do not like the Verilog bus notation using square brackets. Next we find the connections between the ports and the added I/O cells by looking for 'PAD' in the Actel format netlist, which indicates a connection to a pad and the pins of the chip, as follows:
piron% grep PAD comp_mux_rrr_o.adl
NET DEF_NET_148; outp_2_, OUTBUF_31:PAD.
NET DEF_NET_151; outp_1_, OUTBUF_32:PAD.
NET DEF_NET_154; outp_0_, OUTBUF_33:PAD.
NET DEF_NET_127; a_2_, INBUF_24:PAD.
NET DEF_NET_130; a_1_, INBUF_25:PAD.
NET DEF_NET_133; a_0_, INBUF_26:PAD.
NET DEF_NET_136; b_2_, INBUF_27:PAD.
NET DEF_NET_139; b_1_, INBUF_28:PAD.
NET DEF_NET_142; b_0_, INBUF_29:PAD.
NET DEF_NET_145; clock, CLKBUF_30:PAD.
piron%
This tells us, for example, that the node we called clock in our behavioral model has been joined to a node (with automatically generated name) called CLKBUF_30:PAD, using a net (connection) named DEF_NET_145 (again automatically generated). This net is the connection between the node clock that is dangling in the behavioral model and the clock-buffer pad cell that the synthesis tools automatically added.
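The two grep steps can be combined into a short script that maps each Verilog port to its pad cell. The sketch below works only on sample lines copied from the listings above; the regular expressions are assumptions inferred from those few lines, not from the EDIF or Actel netlist format definitions, and the function names are invented for the example.

```python
import re

# Sample lines copied from the grep output above.
EDIF_LINES = [
    '(port (rename a_2_ "a[2]") (direction INPUT))',
    '(net (rename a_rr_0_ "a_rr[0]") (joined',
]
ADL_LINES = [
    "NET DEF_NET_148; outp_2_, OUTBUF_31:PAD.",
    "NET DEF_NET_127; a_2_, INBUF_24:PAD.",
    "NET DEF_NET_145; clock, CLKBUF_30:PAD.",
]

# Patterns inferred from the sample lines only (an assumption).
RENAME_RE = re.compile(r'\(rename\s+(\S+)\s+"([^"]+)"\)')
PAD_RE = re.compile(r"NET\s+(\S+);\s*(\S+),\s*(\S+):PAD\.")


def edif_renames(lines):
    """Map each EDIF name to the original Verilog name."""
    renames = {}
    for line in lines:
        for edif_name, verilog_name in RENAME_RE.findall(line):
            renames[edif_name] = verilog_name
    return renames


def pad_connections(lines):
    """Map each port name to (net name, pad-cell instance name)."""
    pads = {}
    for line in lines:
        match = PAD_RE.search(line)
        if match:
            net, port, instance = match.groups()
            pads[port] = (net, instance)
    return pads


renames = edif_renames(EDIF_LINES)
pads = pad_connections(ADL_LINES)
for port, (net, instance) in pads.items():
    # Fall back to the EDIF name when no rename entry was captured.
    print(f"{renames.get(port, port)} -> {instance}:PAD via {net}")
```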
13.7.3 Exit Delay
We now know that the clock-pad input is CLKBUF_30:PAD, so we can find the exit delays (the longest path between clock-pad input and an output) as follows (using the clock-pad input as the start set):
timer> startset clockpad
Working startset 'clockpad' contains 0 pins.
timer> addstart CLKBUF_30:PAD
Working startset 'clockpad' contains 2 pins.
I shall explain why this set contains two pins and not just one presently. Next, we define the end set and trace the longest exit paths as follows:
timer> endset outpad
Working endset 'outpad' contains 3 pins.
timer> longest
1st longest path to all endpins
Rank Total Start pin First Net End Net End pin
0 16.1 CLKBUF_30/U0:PAD DEF_NET_144 DEF_NET_154 OUTBUF_33:PAD
1 16.0 CLKBUF_30/U0:PAD DEF_NET_144 DEF_NET_151 OUTBUF_32:PAD
2 16.0 CLKBUF_30/U0:PAD DEF_NET_144 DEF_NET_148 OUTBUF_31:PAD
3 pins
This tells us we have three paths from the clock-pad input to the three output pins (outp[0], outp[1], and outp[2]). We can examine the longest exit delay in more detail as follows:
timer> expand 0
1st longest path to OUTBUF_33:PAD (rising) (Rank: 0)
Total Delay Typ Load Macro Start pin Net name
16.1 3.7 Tpd 0 OUTBUF OUTBUF_33:D DEF_NET_154
12.4 4.5 Tpd 1 DF1 outp_ff_b0:CLK DEF_NET_1530
7.9 7.9 Tpd 16 CLKEXT_0 CLKBUF_30/U0:PAD DEF_NET_144
The input-to-clock delay, $t_{IC}$, due to the clock-buffer cell (or macro) CLKEXT_0, instance name CLKBUF_30/U0, is 7.9 ns. The clock-to-Q delay, $t_{CQ}$, of flip-flop cell DF1, instance name outp_ff_b0, is 4.5 ns. The delay, $t_{QO}$, due to the output buffer cell OUTBUF, instance name OUTBUF_33, is 3.7 ns. The longest path between clock-pad input and the output, $t_{CO}$, is thus

$$ t_{CO} = t_{IC} + t_{CQ} + t_{QO} = 16.1\ \text{ns} . \tag{13.23} $$

This is the critical path and limits the operating frequency to (1 / 16.1 ns) ≈ 62 MHz.
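As a cross-check on the expanded path report, the running totals can be rebuilt from the per-cell delays. This is a minimal sketch using the three rows of the `expand 0` listing; the tuple layout, the helper name, and the 0.05 ns tolerance are choices made for this example.

```python
# (total_ns, delay_ns, macro) rows from the `expand 0` report, end of path first.
ROWS = [
    (16.1, 3.7, "OUTBUF"),
    (12.4, 4.5, "DF1"),
    (7.9, 7.9, "CLKEXT_0"),
]


def check_running_totals(rows, tol=0.05):
    """Accumulate delays from the start of the path and compare with each reported total."""
    running = 0.0
    for total, delay, macro in reversed(rows):
        running += delay
        if abs(running - total) > tol:
            raise ValueError(f"{macro}: reported {total}, accumulated {running:.2f}")
    return running


t_co_ns = check_running_totals(ROWS)
print(f"t_CO = {t_co_ns:.1f} ns, limit = {1000.0 / t_co_ns:.0f} MHz")
```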
When we created a start set using CLKBUF_30:PAD, the timing analyzer told us that this set consisted of two pins. We can list the names of the two pins as follows:
timer> showset clockpad
Pin name Net name Macro name
CLKBUF_30/U0:PAD <no net> CLKEXT_0
CLKBUF_30/U1:PAD DEF_NET_145 CLKTRI_0
2 pins
The clock-buffer instance name, CLKBUF_30/U0, is hierarchical (with a '/' hierarchy separator). This indicates that there is more than one instance inside the clock-buffer cell, CLKBUF_30. Instance CLKBUF_30/U0 is the input driver; instance CLKBUF_30/U1 is the output driver (which is disabled and unused in this case).
13.7.4 External Setup Time
Each of the six chip data inputs must satisfy the following set-up equation:

$$ t_{SU}(\text{external}) > t_{SU}(\text{internal}) - (\text{clock delay}) + (\text{data delay}) \tag{13.24} $$

(where both clock and data delays end at the same flip-flop instance). We find the clock delays in Eq. 13.24 using the clock input pin as the start set and the end set 'clock'. The timing analyzer tells us all 16 clock path delays are the same at 7.9 ns in our design, and the clock skew is thus zero. Actel’s clock distribution system minimizes clock skew, but clock skew will not always be zero. From the discussion in Section 13.7.1, we see there is no possibility of internal hold-time violations with a clock skew of zero.
Next, we find the data delays in Eq. 13.24 using a start set of all input pads and an end set of 'gated':
timer> longest
... lines omitted ...
1st longest path to all endpins
Rank Total Start pin First Net End Net End pin
10 10.0 INBUF_26:PAD DEF_NET_1320 DEF_NET_1320 a_r_ff_b0:D
11 9.7 INBUF_28:PAD DEF_NET_1380 DEF_NET_1380 b_r_ff_b1:D
12 9.4 INBUF_25:PAD DEF_NET_1290 DEF_NET_1290 a_r_ff_b1:D
13 9.3 INBUF_27:PAD DEF_NET_1350 DEF_NET_1350 b_r_ff_b2:D
14 9.2 INBUF_29:PAD DEF_NET_1410 DEF_NET_1410 b_r_ff_b0:D
15 9.1 INBUF_24:PAD DEF_NET_1260 DEF_NET_1260 a_r_ff_b2:D
16 pins
We are only interested in the last six paths of this analysis (rank 10–15) that describe the delays from each data input pad (a[0], a[1], a[2], b[0], b[1], b[2]) to the D input of a flip-flop. The maximum data delay, 10 ns, occurs on input buffer instance name INBUF_26 (pad 26); pin INBUF_26:PAD is node a_0_ in the EDIF file or input a[0] in our behavioral model. The six $t_{SU}$(external) equations corresponding to Eq. 13.24 may be reduced to the following worst-case relation:

$$ \begin{aligned} t_{SU}(\text{external})_{\max} &> t_{SU}(\text{internal}) - 7.9\ \text{ns} + \max(9.1\ \text{ns}, 10.0\ \text{ns}) \\ &> t_{SU}(\text{internal}) + 2.1\ \text{ns} \end{aligned} \tag{13.25} $$
We calculated the clock and data delay terms in Eq. 13.24 separately, but timing analyzers can normally perform a single analysis as follows:

$$ t_{SU}(\text{external})_{\max} > t_{SU}(\text{internal}) - (\text{clock delay} - \text{data delay})_{\min} . \tag{13.26} $$
Finally, we check that there is no external hold-time requirement. That is to say, we must check that $t_{SU}$(external) is never negative or

$$ \begin{aligned} t_{SU}(\text{external})_{\min} &> t_{SU}(\text{internal}) - (\text{clock delay} - \text{data delay})_{\max} > 0 \\ &> t_{SU}(\text{internal}) + 1.2\ \text{ns} > 0 . \end{aligned} \tag{13.27} $$
Since $t_{SU}$(internal) is always positive on Actel FPGAs, $t_{SU}$(external)$_{\min}$ is always positive for this design. In large ASICs, with large clock delays, it is possible to have external hold-time requirements on inputs. This is the reason that some FPGAs (Xilinx, for example) have programmable delay elements that deliberately increase the data delay and eliminate irksome external hold-time requirements.
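The 2.1 ns and 1.2 ns offsets in Eqs. 13.25 and 13.27 follow from the six data delays and the common 7.9 ns clock delay. The sketch below recomputes them; the pin-to-port mapping is taken from the PAD listing in Section 13.7.2, and the function and variable names are invented for the example.

```python
CLOCK_DELAY_NS = 7.9  # identical for all clock paths in this design

# Data delays (ns) from input pad to flip-flop D input, ranks 10-15 above.
DATA_DELAY_NS = {
    "a[0]": 10.0,  # INBUF_26
    "b[1]": 9.7,   # INBUF_28
    "a[1]": 9.4,   # INBUF_25
    "b[2]": 9.3,   # INBUF_27
    "b[0]": 9.2,   # INBUF_29
    "a[2]": 9.1,   # INBUF_24
}


def setup_offset_ns(data_delay_ns: float,
                    clock_delay_ns: float = CLOCK_DELAY_NS) -> float:
    """Amount added to t_SU(internal) in Eq. 13.24: data delay minus clock delay."""
    return data_delay_ns - clock_delay_ns


offsets = {pin: setup_offset_ns(delay) for pin, delay in DATA_DELAY_NS.items()}
worst_pin = max(offsets, key=offsets.get)  # sets t_SU(external) max, Eq. 13.25
best_pin = min(offsets, key=offsets.get)   # sets t_SU(external) min, Eq. 13.27

print(f"max offset: {offsets[worst_pin]:.1f} ns on {worst_pin}")
print(f"min offset: {offsets[best_pin]:.1f} ns on {best_pin}")
# A negative minimum offset would signal a possible external hold-time requirement.
print("all offsets positive:", all(value > 0 for value in offsets.values()))
```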