# A Scan Shifting Method based on Clock Gating of Multiple Groups for Low Power Scan Testing

Sungyoul Seo<sup>1</sup>, Yong Lee<sup>1</sup>, Joohwan Lee<sup>2</sup>, Sungho Kang<sup>1</sup>

<sup>1</sup>Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea

<sup>2</sup>Samsung Electronics, Korea

<sup>1</sup>E-mail: sungyoul@soc.yonsei.ac.kr

## Abstract

Since the advent of very-large-scale integration (VLSI) design, the high power consumption of scan-based testing has been one of the most serious problems. The large number of scan cells leads to excessive switching activity during scan shifting operations. In this paper, we present a new scan shifting method based on clock gating of multiple groups that reduces the toggle rate of the internal combinational logic. The method prevents the cumulative transitions caused by the shifting operations of the scan cells. In addition, existing compression schemes are compatible with the proposed method without modification of the decompression architecture. Experimental results on ITC'99 benchmark circuits and industrial circuits show that this shifting method reduces the scan shifting power in all cases. Despite the large power reduction, the burden of the extra logic is small enough to be disregarded.

## Keywords

Scan-based testing, low power scan testing, shifting power reduction, design-for-testability (DFT)

## 1. Introduction

As manufacturing technology has developed, design complexity has increased and feature sizes have scaled down rapidly. In modern chip designs, the number of logic gates exceeds one hundred million [1]. Such large designs contain a large number of scan cells, which cause a huge number of switching activities in test mode. These activities increase dynamic power consumption and IR-drop [2]. Unfortunately, power consumption is much higher during test operation than during functional operation [3]. This is because many state changes occur in the scan flip-flops when test patterns are loaded into and unloaded from the scan chains. These toggles propagate to the internal combinational logic, whose switching activity increases dramatically [4]. As a result, this problem may degrade scan test quality by causing structural damage to silicon, bonding wires, or packages [5].

To avoid such damage, two types of test power should be considered: average test power and peak test power. The average power is the ratio of consumed energy to test time [3], and high average power increases heat dissipation in the chip. As a result, the increased temperature and current density require expensive test packages that tolerate the excessive heat during test [6]. The peak power, which is the highest power in a cycle, leads to erroneous data transfer and failing test results [7]. Scan-based testing is still one of the important methods in the DFT field because it guarantees enhanced controllability and observability [8]. It achieves higher test coverage and shorter test time than alternatives such as functional test. However, it must be operated in a limited environment. For example, shift operations must run at greatly reduced frequencies so that the power threshold is not reached.

To improve the efficiency of scan-based testing, two major kinds of solutions for reducing excessive test power have been studied: automatic test pattern generation (ATPG)-based and DFT-based solutions [7]. The ATPG-based solution analyzes and/or controls the test patterns to reduce test power [9]. Many works have been published based on X-filling [10, 11], test pattern reordering [12], and low-power test pattern generation algorithms [13]. These works can be easily applied to the conventional test flow through ATPG without any modification of the original design. The DFT-based solution inserts extra DFT logic and/or modifies conventional structures [9]. This solution requires examining the trade-off between the additional area overhead and the achieved power reduction. Examples include scan cell gating [2, 14], scan chain modification [15], and scan clock gating [16]. These works tend to outperform the ATPG-based solution but increase the area overhead, for example through added control logic.

With respect to power reduction, the DFT-based solution generally outperforms the ATPG-based solution when the burden of the extra logic is tolerable. In this paper, we propose a new DFT-based solution that uses scan clock gating effectively, so extra hardware is added. However, the proposed method has low area overhead and low complexity while maintaining high controllability. The remainder of this paper is organized as follows. Section 2 describes the preliminaries of shift power estimation and clock gating. Section 3 introduces the proposed low-power scan shifting method. The experimental results are shown in Section 4, and we conclude the paper in Section 5.

## 2. Preliminaries

### 2.1 Scan Shifting Power Estimation

There are two kinds of power consumption in complementary metal-oxide-semiconductor (CMOS) integrated circuits (ICs): static power, due to leakage current, and dynamic power, due to charging and discharging of the load capacitance [17]. From the point of view of improving shifting speed, dynamic power dominates static power, especially when circuit components switch from 0 to 1 or vice versa [11].

Scan-based testing incurs two power problems: shifting power and capture power. Although capture power is also a serious problem, it is smaller than shifting power. This is because most of the switching activity is due to transitions in the scan chain when the test patterns are loaded and unloaded [18].

To estimate the scan shifting power, the weighted transition metric (WTM) was proposed in [11]. WTM is sufficient to estimate the power consumption incurred by scan shifting; the shift power of the $i^{th}$ pattern can be estimated as follows:

$$WTM_i = \sum_{j=1}^{N-1} \left( S_{i,j} \oplus S_{i,j+1} \right) \times j \tag{1}$$

where *N* is the number of scan cells in the scan chain and $S_{i,j}$ represents the logic state of the $j^{th}$ scan cell in the $i^{th}$ test pattern. In equation (1), the multiplication by *j* reflects that a transition generates cumulative switching activity on its way from the scan-in (SI) port to the $j^{th}$ scan cell. The proposed method blocks this cumulative transition by gating the scan clocks.
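As an illustration of equation (1), the following is a minimal Python sketch that evaluates WTM for one scan pattern and derives average and peak values over a pattern set. The function names and the list-of-bits pattern representation are assumptions made for this example and are not part of the original work.

```python
from typing import List, Sequence, Tuple


def weighted_transition_metric(pattern: Sequence[int]) -> int:
    """Evaluate equation (1) for one pattern.

    pattern[0] holds S_{i,1}, pattern[1] holds S_{i,2}, and so on,
    so len(pattern) is N, the number of scan cells in the chain.
    """
    total = 0
    for j in range(1, len(pattern)):  # j = 1 .. N-1
        transition = pattern[j - 1] ^ pattern[j]  # S_{i,j} XOR S_{i,j+1}
        total += transition * j  # weight the transition by its position j
    return total


def average_and_peak(patterns: List[Sequence[int]]) -> Tuple[float, int]:
    """Return (average WTM, peak WTM) over a non-empty pattern set."""
    values = [weighted_transition_metric(p) for p in patterns]
    return sum(values) / len(values), max(values)


if __name__ == "__main__":
    example = [1, 0, 0, 1, 1]
    print(weighted_transition_metric(example))
    print(average_and_peak([example, [0, 0, 0, 0, 0], [0, 1, 0, 1, 0]]))
```

For the pattern `[1, 0, 0, 1, 1]`, transitions occur at j = 1 and j = 3, so the metric is 1 + 3 = 4. A constant pattern has no transitions and yields 0.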

### 2.2 Clock Gating

Clock gating is one of the most widely used low-power management techniques in practice [19]. It blocks the clock of flip-flops that are not needed in the current cycle. Generally, AND gates are inserted between the clock signal port and the flip-flops, so the clock pulse is disabled when the other input of the AND gate is '0'. Clock gating is very effective at saving power in the registers and the clock line. In the scan shifting mode, it becomes even more efficient by disabling the unused scan cells. In this paper, all scan cells are divided into many groups so that clock gating can be applied.
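The AND-gate behavior described above can be modeled in a few lines of Python. This is only a behavioral sketch: the function names and the one-pulse-per-cycle abstraction are assumptions for illustration, and real clock gating cells usually need additional care (for example, glitch-free enable timing) that is not modeled here.

```python
from typing import Sequence


def gated_clock(clk: int, enable: int) -> int:
    """AND-gate clock gating: the pulse passes only when enable is 1."""
    return clk & enable


def count_clock_pulses(enables_per_cycle: Sequence[int]) -> int:
    """Count the clock pulses that reach a flip-flop.

    Each entry is the enable value during one clock cycle, and the source
    clock is assumed to pulse exactly once per cycle.
    """
    return sum(gated_clock(1, enable) for enable in enables_per_cycle)


if __name__ == "__main__":
    # The flip-flop is needed in only two of six cycles.
    enables = [1, 0, 0, 1, 0, 0]
    print("gated pulses:  ", count_clock_pulses(enables))
    print("ungated pulses:", count_clock_pulses([1] * len(enables)))
```

In this example the gated flip-flop receives two clock pulses instead of six, which is the source of the register and clock-line power savings.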

## 3. A Scan Shifting Method Based on Clock Gating of Multiple Groups

We propose a new scan shifting method that uses clock gating of multiple groups to minimize the number of enabled scan cells, thereby blocking the clock of unused scan cells. The scan test architecture, which is composed of $N \times M$ scan cells, is shown in Fig. 1, where N is the number of scan chains and M is the number of groups. The scan cells are named $Cell_{nm}$, which denotes the cell of the $m^{th}$ group in the $n^{th}$ scan chain. A demultiplexer (deMUX), an inverter, some OR, AND, and exclusive-OR (XOR) gates, and some wires are inserted to support the proposed method. A decompressor such as a linear decompressor or a broadcast scheme can be used to decompress the test data; the only addition is a counter that delivers a count value to the deMUX. Because there is no need to change the input test patterns or the timing of scan shifting, any decompressor can be used, and the existing compression method does not have to be modified in order to apply the proposed method.

Figure 1: A conceptual overview of the proposed architecture.

Figure 2: A design flow chart considering the proposed method.

Design and test flows are carried out simultaneously in modern VLSI design. For the proposed method, additional steps should be considered between scan cell reordering and physical optimization. The design flow that includes these additional steps is shown in Fig. 2. They fall into two steps: division of the scan cells, and insertion and modification of the logic design.

In the first step, the result of the scan cell reordering conducted in the previous step determines how many groups are created. The depth of the scan chain is the number of groups, and the position of a scan cell in its scan chain is its group number. After the scan cells are grouped, the extra logic, consisting of some gates and a counter, is inserted into the existing logic. For gating the scan clock, one AND gate and one OR gate are inserted per group. These gates are placed between the scan clock line and the scan clock ports, and they are controlled by the deMUX and the scan enable (SE) signal. Moreover, the scan out (SO) ports of each scan chain are connected to the XOR gates.
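The grouping rule in the first step (group number equals position in the chain) can be sketched as follows. The data layout, a list of chains that each hold cell names in reordered order, and the function name are assumptions for this illustration.

```python
from typing import Dict, List


def group_scan_cells(chains: List[List[str]]) -> Dict[int, List[str]]:
    """Assign each scan cell to the group given by its position in its chain.

    Group numbers start at 1, so the number of groups equals the depth of
    the deepest scan chain.
    """
    groups: Dict[int, List[str]] = {}
    for chain in chains:
        for position, cell in enumerate(chain, start=1):
            groups.setdefault(position, []).append(cell)
    return groups


if __name__ == "__main__":
    # Two scan chains of depth three, named Cell_nm as in the text.
    chains = [
        ["Cell_11", "Cell_12", "Cell_13"],
        ["Cell_21", "Cell_22", "Cell_23"],
    ]
    for number, cells in sorted(group_scan_cells(chains).items()):
        print(f"group {number}: {cells}")
```

With two chains of depth three, this yields three groups of two cells each; every group would then receive its own gated scan clock.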

Figure 3: A simple example of grouping the scan cells.

Figure 4: A timing diagram for the example of Fig. 3.

A simple example of the scan structure, with two scan chains and three groups, is shown in Fig. 3. Each group is driven by its own scan clock (SCK), and the scan cells in a scan chain are directly connected to the same SI. Hence, a value is loaded into a scan cell only when its group is activated by SCK. The corresponding timing diagram is presented in Fig. 4. In scan test mode, the SE signal determines whether the current test operation is the shifting mode or the launch-and-capture mode. When SE is '1', one SCK is activated according to the deMUX selection (SEL) value. When SE is '0', all SCKs are activated, and launch and capture are performed. In addition, all SO ports of a scan chain are connected to an XOR gate placed on that chain. Because the test responses can be observed through the XOR gate in every cycle, no test response is lost at the output stage.
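The SE and SEL behavior described above can be expressed as a small behavioral model. The exact gate wiring is an assumption inferred from the components named in the text (an inverter on SE, one OR gate and one AND gate per group, and a one-hot deMUX); the 0-based group index and all function names are hypothetical.

```python
from typing import List


def demux(sel: int, num_groups: int) -> List[int]:
    """One-hot deMUX output: only the line selected by `sel` is 1."""
    return [1 if group == sel else 0 for group in range(num_groups)]


def group_clocks(clk: int, se: int, sel: int, num_groups: int) -> List[int]:
    """Return the gated scan clock (SCK) of every group.

    Assumed wiring: SCK_m = CLK AND ((NOT SE) OR deMUX_m).
    SE = 1 (shift): only the selected group is clocked.
    SE = 0 (launch and capture): every group is clocked.
    """
    not_se = 1 - se  # inverter on the scan enable signal
    return [clk & (not_se | line) for line in demux(sel, num_groups)]


def observe_chain(scan_outs: List[int]) -> int:
    """XOR of the SO ports of one scan chain, observed once per cycle."""
    result = 0
    for bit in scan_outs:
        result ^= bit
    return result


if __name__ == "__main__":
    print(group_clocks(clk=1, se=1, sel=1, num_groups=3))  # shift mode
    print(group_clocks(clk=1, se=0, sel=1, num_groups=3))  # capture mode
    print(observe_chain([1, 0, 0]))
```

In shift mode only the selected group sees the clock pulse, whereas in launch-and-capture mode the inverted SE forces every OR gate output to 1 and all groups are clocked together.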

As mentioned above, all flip-flops in a scan chain are connected to one SI port. This removes the large number of switching activities caused by inserting the shifting patterns serially. Moreover, the SO port connection eliminates the shift-out power. The resulting power reduction is presented in Section 4.

**Table 1:** Information of the ITC'99 benchmark and industrial circuits.

| Circuit | # of<br>Scan<br>Cells | # of<br>Gates | # of<br>PIs | # of<br>POs | # of<br>Patterns | Test<br>Coverage<br>(%) |
|---------|-----------------------|---------------|-------------|-------------|------------------|-------------------------|
| b17     | 1,415                 | 32,326        | 37          | 97          | 664              | 99.97                   |
| b18     | 3,320                 | 114,621       | 36          | 23          | 1,325            | 99.94                   |
| b19     | 6,642                 | 231,320       | 21          | 30          | 1,392            | 99.92                   |
| b20     | 490                   | 20,226        | 32          | 22          | 1,421            | 99.78                   |
| CKT-1   | 116,671               | 4.2M          | 593         | 590         | 5,970            | 96.48                   |
| CKT-2   | 273,287               | 6.5M          | 1,470       | 1,845       | 6,331            | 97.01                   |
| CKT-3   | 161,303               | 9.3M          | 728         | 772         | 4,608            | 98.80                   |
| CKT-4   | 342,982               | 11.4M         | 774         | 600         | 9,279            | 97.00                   |
| CKT-5   | 219,246               | 16.3M         | 1,233       | 1,086       | 9,419            | 96.73                   |

**Table 2:** Power estimation comparison.

| Circuit | # of<br>Scan<br>Chains | Shift WTM<br>Basic Scan Test Mode<br>Avg. | Shift WTM<br>Basic Scan Test Mode<br>Peak | Shift WTM<br>Proposed Method<br>Avg. | Shift WTM<br>Proposed Method<br>Peak |
|---------|------------------------|-------------------------------------------|-------------------------------------------|--------------------------------------|--------------------------------------|
| b17     | 100                    | 1,716                                     | 10,802                                    | 698 (40.7%)                          | 854 (7.9%)                           |
| b18     | 100                    | 8,044                                     | 59,168                                    | 1,661 (20.6%)                        | 2,069 (3.5%)                         |
| b19     | 100                    | 33,957                                    | 210,292                                   | 3,410 (10.0%)                        | 4,386 (2.1%)                         |
| b20     | 100                    | 424                                       | 1,231                                     | 232 (54.7%)                          | 287 (23.3%)                          |
| CKT-1   | 499                    | 270,850                                   | 7,435,073                                 | 45,644 (16.9%)                       | 116,536 (1.6%)                       |
| CKT-2   | 1,140                  | 1,337,710                                 | 17,247,774                                | 164,262 (12.3%)                      | 270,508 (1.6%)                       |
| CKT-3   | 699                    | 476,682                                   | 8,672,658                                 | 56,638 (11.9%)                       | 151,295 (1.7%)                       |
| CKT-4   | 1,420                  | 1,415,085                                 | 21,378,144                                | 147,179 (10.4%)                      | 327,708 (1.5%)                       |
| CKT-5   | 932                    | 730,909                                   | 13,516,780                                | 149,440 (20.4%)                      | 219,246 (1.6%)                       |

## 4. Experimental Results

To examine the effect of the proposed method, experiments are performed on four ITC'99 benchmark circuits. In addition, five industrial circuits provided by Samsung Electronics are used to show the effectiveness of the proposed method on real designs. Information about these circuits is shown in Table 1. All test patterns are generated by TetraMAX [20], the ATPG tool of Synopsys, with dynamic compaction and adjacent-fill turned on. Adjacent-fill is generally known as a simple and efficient way to reduce power. Power is estimated with the WTM metric, and the area overhead is measured in terms of gate count.

The results of the WTM estimation are presented in Table 2. The basic scan test mode does not use any low-power technique. The numbers in parentheses are the ratio of the proposed method to the basic scan test mode. In this experiment, the WTM of the proposed method is much lower than that of the basic scan test mode for all circuits. In particular, the peak WTM of the proposed method is at most 1.7% of the basic scan test mode in the industrial circuits, which significantly mitigates the problem of erroneous data transfer. These results are expected because the proposed method does not produce cumulative transitions during the scan shifting mode; hence, the weight *j* in equation (1) is always 1 regardless of the position of the scan cell. For this reason, the maximum WTM of the proposed method is no more than the number of scan flip-flops, and the proposed method achieves low power in both average and peak terms. The method therefore depends less on the pattern format, and it can use existing test patterns generated for scan compression architectures such as linear decompression and broadcast-based test data compression, whereas most low-power testing techniques have tended to ignore this compatibility. The two main issues in scan-based testing can be resolved more easily when the low-power testing method and the test data compression method are combined.
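The ratios in Table 2 and the bound on the peak WTM can be checked with a short script. The sketch below copies the scan cell counts from Table 1 and the WTM values from Table 2; the tuple layout and function names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (circuit, scan cells, basic avg., basic peak, proposed avg., proposed peak)
Row = Tuple[str, int, int, int, int, int]

ROWS: List[Row] = [
    ("b17", 1_415, 1_716, 10_802, 698, 854),
    ("b18", 3_320, 8_044, 59_168, 1_661, 2_069),
    ("b19", 6_642, 33_957, 210_292, 3_410, 4_386),
    ("b20", 490, 424, 1_231, 232, 287),
    ("CKT-1", 116_671, 270_850, 7_435_073, 45_644, 116_536),
    ("CKT-2", 273_287, 1_337_710, 17_247_774, 164_262, 270_508),
    ("CKT-3", 161_303, 476_682, 8_672_658, 56_638, 151_295),
    ("CKT-4", 342_982, 1_415_085, 21_378_144, 147_179, 327_708),
    ("CKT-5", 219_246, 730_909, 13_516_780, 149_440, 219_246),
]


def ratio_percent(proposed: int, basic: int) -> float:
    """Ratio of the proposed method to the basic mode, in percent."""
    return round(100.0 * proposed / basic, 1)


def summarize(rows: List[Row]) -> Dict[str, Tuple[float, float, bool]]:
    """Map circuit -> (avg. ratio %, peak ratio %, peak <= scan cells)."""
    summary = {}
    for name, cells, b_avg, b_peak, p_avg, p_peak in rows:
        summary[name] = (
            ratio_percent(p_avg, b_avg),
            ratio_percent(p_peak, b_peak),
            p_peak <= cells,  # bound implied by j = 1 in equation (1)
        )
    return summary


if __name__ == "__main__":
    for name, (avg, peak, bounded) in summarize(ROWS).items():
        print(f"{name}: avg {avg}%, peak {peak}%, peak within bound: {bounded}")
```

For every circuit the proposed peak WTM stays at or below the number of scan cells, which agrees with the argument that each transition is weighted by 1 rather than by its position.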

**Table 3:** Hardware area overhead (%) comparison, where *N* is the number of scan chains.

| Circuit | N=10 | N=20 | N=30 | N=40 | N=50 | N=60 | N=70 | N=80 | N=90 | N=100 |
|---------|------|------|------|------|------|------|------|------|------|-------|
| b17     | 1.85 | 1.16 | 0.90 | 0.82 | 0.70 | 0.69 | 0.69 | 0.69 | 0.62 | 0.64  |
| b18     | 1.04 | 0.60 | 0.44 | 0.38 | 0.35 | 0.30 | 0.29 | 0.28 | 0.28 | 0.28  |
| b19     | 0.96 | 0.53 | 0.38 | 0.31 | 0.27 | 0.23 | 0.22 | 0.21 | 0.20 | 0.17  |
| b20     | 1.35 | 0.91 | 0.84 | 0.70 | 0.70 | 0.74 | 0.62 | 0.67 | 0.70 | 0.74  |
| CKT-1   | 0.83 | 0.42 | 0.28 | 0.22 | 0.17 | 0.15 | 0.13 | 0.11 | 0.10 | 0.09  |
| CKT-2   | 1.25 | 0.63 | 0.42 | 0.32 | 0.26 | 0.22 | 0.19 | 0.16 | 0.15 | 0.13  |
| CKT-3   | 0.52 | 0.26 | 0.18 | 0.13 | 0.11 | 0.09 | 0.08 | 0.07 | 0.06 | 0.06  |
| CKT-4   | 0.90 | 0.45 | 0.30 | 0.23 | 0.18 | 0.15 | 0.13 | 0.12 | 0.10 | 0.09  |
| CKT-5   | 0.40 | 0.20 | 0.14 | 0.10 | 0.08 | 0.07 | 0.06 | 0.05 | 0.05 | 0.04  |

The hardware area overheads for the ITC'99 benchmark and industrial circuits according to the number of scan chains are shown in Table 3. Note that the extra logic required by the proposed method is much smaller than the original circuits. In scan-based testing, the number of scan chains tends to increase continually. Hence, the results in Table 3 indicate that the proposed method remains applicable as this trend continues. The area overhead decreases rapidly as the number of scan chains increases, except for small circuits such as b17 and b20. These results show that the proposed method can be applied to large industrial designs.
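One way to read the downward trend in Table 3 is through the group count: Section 3 states that the chain depth equals the number of groups and that one AND gate and one OR gate are inserted per group. The sketch below computes only that part of the extra logic, assuming balanced scan chains. The deMUX, counter, inverter, and XOR gates are not modeled, so the script does not reproduce the percentages in Table 3; the function names are hypothetical.

```python
import math


def groups_for(scan_cells: int, num_chains: int) -> int:
    """Number of groups, i.e. the chain depth, assuming balanced chains."""
    return math.ceil(scan_cells / num_chains)


def clock_gating_gates(scan_cells: int, num_chains: int) -> int:
    """AND plus OR gates for clock gating: two gates per group."""
    return 2 * groups_for(scan_cells, num_chains)


if __name__ == "__main__":
    scan_cells_b19 = 6_642  # from Table 1
    for num_chains in range(10, 101, 10):
        print(
            f"N={num_chains:3d}: groups={groups_for(scan_cells_b19, num_chains):4d}, "
            f"gating gates={clock_gating_gates(scan_cells_b19, num_chains):5d}"
        )
```

Under this assumption, b19 needs 665 groups with 10 scan chains but only 67 groups with 100 scan chains, so the per-group gating logic shrinks by roughly a factor of ten.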

## 5. Conclusion

In this paper, we presented a new scan shifting method based on clock gating of multiple groups that reduces the toggle rate of the internal combinational logic. The proposed method achieves both low power consumption and low area overhead. Experimental results on various benchmark and industrial circuits show that the method is effective in all cases, regardless of circuit size. Moreover, the method is compatible with existing compression methods such as linear decompression and broadcast-based test data compression. In conclusion, the proposed method can be a viable solution for large circuits with a very small area overhead.

## 6. Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A1A03006255). In addition, this work was supported by the industrial-educational cooperation program of Samsung [2014-11-0799].

## 7. References

- [1] ITRS 2012 Edition Reports [online]. Available: http://www.itrs.net.
- [2] E. Alpaslan, Y. Huang, and X. Lin, "On Reducing Scan Shift Activity at RTL," IEEE Trans. Comput.-Aided Des. Integr. Circuit Syst., vol. 29, no. 7, pp. 1110–1120, Jul. 2010.
- [3] P. Girard, "Survey of Low-Power Testing of VLSI Circuit," IEEE Des. & Test of Comput., vol. 19, no. 3, pp. 80–90, May-June 2002.
- [4] W.-L. Li, P.-H. Wu, and J.-C. Rau, "Reducing switching activity by test slice difference technique for test volume compression," in Proc. IEEE Int. Symp. on Circuits and Syst., May 2009, pp. 2686–2989.

- [5] A. Chandra and K. Chakrabarty, "Combining low-power scan testing and test data compression for system-on-a-chip," in Proc. Des. Autom. Conf., June 2001, pp. 166–169.
- [6] X. Lin and Y. Huang, "Scan Shift Power Reduction by Freezing Power Sensitive Scan Cells," J. Electron Test., vol. 24, no. 4, pp. 327–334, Aug. 2008.
- [7] C. P. Ravikumar, M. Hirech, and X. Wen, "Test Strategies for Low Power Devices," in Proc. Des. Autom. & Test in Europe Conf. & Exhibition, Mar. 2008, pp. 728–733.
- [8] A. Jain and S. Subramanian, "Multi-CoDec Configurations for Low Power and High Quality Scan Test," in Proc. VLSI Des. Int. Conf., Jan. 2011, pp. 370–375.
- [9] W. Zhao, M. Tehranipoor, and S. Chakravarty, "Power-Safe Test Application Using An Effective Gating Approach Considering Current Limits," in Proc. IEEE VLSI Test Symp., May 2011, pp. 160–165.
- [10] J. Li, Q. Xu, and Y. Hu, "X-Filling for Simultaneous Shift- and Capture-Power Reduction in At-Speed Scan-Based Testing," IEEE Trans. Very Large Scale Integr. Syst., vol. 18, no. 7, pp. 1081–1092, Jul. 2010.
- [11] K. Sankaralingam, R. R. Oruganti, and N. A. Touba, "Static Compaction Techniques to Control Scan Vector Power Dissipation," in Proc. IEEE VLSI Test Symp., May 2000, pp. 35–40.
- [12] L.-C. Hsu and H.-M. Chen, "On Optimizing Scan Testing Power and Routing Cost in Scan Chain Design," in Proc. Int. Symp. Quality Electronic Des., Mar. 2006, pp. 451–456.
- [13] X. Wen, S. Kajihara, K. Miyase, T. Suzuki, K. K. Saluja, L.-T. Wang, K. S. Abdel-Hafez, and K. Kinoshita, "A New ATPG Method for Efficient Capture Power Reduction during Scan Testing," in Proc. IEEE VLSI Test Symp., Apr.-May 2006, pp. 58–65.
- [14] Y.-T. Lin, J.-L. Huang, and X. Wen, "A Transition Isolation Scan Cell Design for Low Shift and Capture Power," in Proc. IEEE Asian Test Symp., Nov. 2012, pp. 107–112.
- [15] S. Wang, K. Li, and S. Chen, "Scan-Chain Partition for High Test-Data Compressibility and Low Shift Power under Routing Constraint," IEEE Trans. Comput.-Aided Des. Integr. Circuit Syst., vol. 28, no. 5, pp. 716–727, May 2009.

- [16] D. Czysz, M. Kassab, X. Lin, G. Mrugalski, J. Rajski, and J. Tyszer, "Low-Power Scan Operation in Test Compression Environment," IEEE Trans. Comput.-Aided Des. Integr. Circuit Syst., vol. 28, no. 11, pp. 1742–1755, Nov. 2009.
- [17] P. Girard, "Low Power Testing of VLSI Circuits: Problems and Solutions," in Proc. Int. Symp. Quality Electronic Des., Mar. 2000, pp. 173–179.
- [18] M. Chen and A. Orailoglu, "Scan Power Reduction for Linear Test Compression Schemes Through Seed Selection," IEEE Trans. Very Large Scale Integr. Syst., vol. 20, no. 12, pp. 2170–2183, Dec. 2012.
- [19] H. Furukawa, X. Wen, K. Miyase, Y. Yamato, S. Kajihara, P. Girard, L.-T. Wang, and M. Tehranipoor, "CTX: A Clock-Gating-Based Test Relaxation and X-Filling Scheme for Reducing Yield Loss Risk in At-Speed Scan Testing," in Proc. IEEE Asian Test Symp., Nov. 2008, pp. 397–402.
- [20] TetraMAX ATPG User Guide, version I-2013.12-SP4. Synopsys Inc.