# A Statistic-Based Scan Chain Reordering for Energy-Quality Scalable Scan Test

Sungyoul Seo, Keewon Cho, Young-Woo Lee, and Sungho Kang, Senior Member, IEEE

Abstract—With rapid progress in process technology, the integration density of high-performance system-on-chip (SoC) designs is rising quickly. To incorporate hundreds of IP cores into a single chip, a modern SoC exceeds ten million gates and contains a large number of scan cells, which leads to excessive energy consumption during test. In this paper, we present an energy-quality (EQ) scalable scan test method based on a new scan chain reordering. The method consists of three stages: a new scan partitioning, a scan partition-based X-filling, and a statistic-based scan stitching, which together reduce test energy consumption without quality degradation. The proposed scan partitioning prevents excessive routing overhead. The scan chain reordering is then performed through a statistical analysis that considers EQ scalability. The method also covers two frequently used fault models: 1) stuck-at and 2) transition delay. Experimental results on ISCAS'89, ITC'99, and IWLS'05 OpenCores benchmark circuits show that, in most cases, the proposed scan chain reordering method achieves lower energy consumption and relieves the routing overhead compared with existing methods, without excessive runtime overhead.

*Index Terms*—Design for testability (DFT), test energy reduction, low power testing, scan-based testing, scan chain reordering, energy-quality (EQ) scalable test.

#### I. INTRODUCTION

As the manufacturing technology of recent system-on-chip (SoC) products advances rapidly, design complexity continues to increase. Modern SoC designs contain several intellectual property (IP) cores such as logic, memory, analog, high-speed I/O interface, and radio frequency (RF) blocks, requiring over a billion transistors [1] for high performance and computing power [2]. However, sub-90nm processes must overcome additional barriers of technology scaling, which threaten the correct operation and increase the energy consumption of SoCs [3]. For this reason, excessive energy has become a major bottleneck in SoC [4] and microprocessor [5] design and in testing [3]. Following this trend, the concept of energy-quality (EQ) scalable very large scale integration (VLSI) circuits and systems has been introduced, and it offers new opportunities to improve energy efficiency while meeting

Manuscript received December 29, 2017; revised March 29, 2018; accepted April 28, 2018. Date of publication May 7, 2018; date of current version September 11, 2018. This work was supported in part by the Ministry of Trade, Industry and Energy under Grant 10067813 and in part by the Korea Semiconductor Research Consortium support program for the development of the future semiconductor device. This paper was recommended by Guest Editor V. De. (*Corresponding author: Sungho Kang.*)

The authors are with the Electrical and Electronic Engineering Department, Yonsei University, Seoul 03722, South Korea (e-mail: sungyoul@soc.yonsei.ac.kr; ckw1505@soc.yonsei.ac.kr; roberto@soc.yonsei.ac.kr; shkang@yonsei.ac.kr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JETCAS.2018.2833846

the expected quality [4]. Here, the quality is defined as the reliable delivery of the expected result, and the energy refers to power consumption as determined by factors such as voltage and/or switching activity.

From the test perspective, scan-based testing is generally known to achieve higher test quality than the alternatives [6]. Therefore, almost all current VLSI systems include a scan design and are tested with scan-based tests, and the proposed method can be applied to any scan-based VLSI system. Although scan design improves the controllability and observability of the design during test, its high energy consumption presents a significant challenge [7], [8]. A considerable amount of switching activity occurs during shift-in and shift-out through the scan chains, and this amount is usually proportional to the number of flip-flops. Experiments indicate that the power consumption and the switching activity during scan-based test are, on average, 1.61X and 4.12X those during functional operation, respectively [9], [10]. The switching activity in the scan chains is forwarded to the combinational logic during the shift and/or capture of test patterns [11], and thus reliability is easily degraded by structural damage to the silicon, bonding wires, or packages [12].

There are two major considerations in overcoming the aforementioned energy problems: 1) shift power and 2) capture power. Shift power arises from differences between the values of adjacent scan cells while test patterns are loaded and unloaded [13]. To remain within the power budget of functional operation, the shift frequency must be reduced drastically [14]. Capture power, on the other hand, is due to the simultaneous switching of values within the same scan cells, which replace the input test pattern with the response during the capture phase [15]. It is important to reduce capture violations to prevent test failures [16].

In this paper, we propose a new scan chain reordering method that reduces the test energy consumption during the scan shifting phase while maintaining reliable quality, as part of an EQ scalable circuit and system. Here, the quality standard is defined by the test reliability and the test time, because excessive energy consumption can degrade the reliability and increase the test time. To satisfy the allowable energy consumption during scan-based testing, the test frequency is conventionally lowered significantly. This reduces the scan shift power without degrading reliability, but it increases the test cost. The novelty of the proposed scan chain

2156-3357 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

reordering method is that it is composed of three stages: scan partitioning, scan partition-based X-filling, and statistic-based scan stitching. The scan partitioning relieves the routing overhead, the X-filling produces as many identical bit streams as possible, and the statistic-based scan stitching reduces the energy consumption under low routing overhead. The performance of the proposed method is evaluated through experiments on International Symposium on Circuits and Systems (ISCAS)'89, International Test Conference (ITC)'99, and International Workshop on Logic and Synthesis (IWLS)'05 OpenCores benchmark circuits. The experimental results show that, relative to [13], the energy consumption during the scan shift mode is reduced by 66.6% and the scan chain length by 21.5% with negligible runtime overhead. Additionally, compared to [33], the energy consumption during the scan shift mode and the capture mode is reduced by approximately 30.6% and 55.7%, respectively, while the routing is increased by 19.9%.

The remainder of this paper is organized as follows. Section II describes the preliminaries: related work on scan-based testing for low energy consumption, the probability of switching activities, the estimation of energy consumption, and the EQ scaling strategy and motivation for the paper. Section III presents the newly proposed scan chain reordering method, which performs scan partitioning, X-filling, and statistic-based scan stitching. The experimental results are discussed in Section IV, and Section V concludes the paper.

#### II. PRELIMINARIES

#### A. Related Work

To improve the energy efficiency and reliability of scan-based testing, a low energy test solution is necessary. Accordingly, various low energy scan testing solutions have been studied, which fall into two categories: 1) automatic test pattern generation (ATPG)-based and 2) design-for-testability (DFT)-based [17].

Generally, an ATPG-based solution analyzes the test patterns and controls the configuration or the results of test pattern generation [18]. X-filling is a method in which X-bits (don't care bits) are filled with 0 or 1 according to a given goal. It is based on the observation that most test patterns from ATPG contain many X-bits. When the goal is to reduce shift and/or capture transitions, X-filling assigns 0 or 1 to the X-bits so that the resulting test patterns have a small number of toggles. For example, if an ATPG pattern is "11XXX111" and the goal is to reduce shift transitions, the pattern becomes "11111111" after X-filling.

Adjacent filling is one of the X-filling methods that can improve efficiency with respect to shift-in transitions: each X-bit is filled with the same 0 or 1 value as its adjacent bit. Li *et al.* [19] proposed a new X-filling method, namely "iFill," to reduce shift and capture power simultaneously. Chen *et al.* [20] proposed a physical-aware X-filling method, which determines the don't care bits with the layout information. Gulve and Singh [21] proposed an integer linear programming (ILP)-based X-filling technique to reduce the capture power of launch-on-capture (LOC) and launch-on-shift (LOS) schemes.
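The adjacent fill idea can be illustrated with a minimal Python sketch. The function names `adjacent_fill` and `transitions` are hypothetical, and the tie-handling rules (each X copies the nearest specified bit to its left, leading X-bits copy the first specified bit, and an all-X pattern becomes all 0s) are assumptions made for illustration, not the behavior of any particular ATPG tool.

```python
def adjacent_fill(pattern: str) -> str:
    """Fill each X with the nearest specified bit to its left.

    Assumptions for this sketch: leading X-bits copy the first specified
    bit to their right, and an all-X pattern is filled with 0.
    """
    bits = list(pattern)
    last = None
    for k, b in enumerate(bits):
        if b in "01":
            last = b
        elif last is not None:
            bits[k] = last
    # Any X left at this point is a leading X; copy the first specified bit.
    first = next((b for b in bits if b in "01"), "0")
    return "".join(first if b == "X" else b for b in bits)


def transitions(pattern: str) -> int:
    """Count adjacent bit pairs whose values differ."""
    return sum(a != b for a, b in zip(pattern, pattern[1:]))


filled = adjacent_fill("11XXX111")
print(filled, transitions(filled))  # 11111111 0
```

For the pattern quoted in the text, "11XXX111", the sketch yields "11111111" with no adjacent transitions.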

Low power test pattern generation is another type of ATPG-based solution. Wen *et al.* [22] presented a power-aware test generation for at-speed testing. Li *et al.* [23] reduced the capture power by proposing a new test pattern refinement and a low power test pattern regeneration method. These solutions reduce the test energy without requiring any additional DFT hardware. Nevertheless, they are difficult to apply to an actual test flow because their energy reduction is less effective than that of a DFT-based solution [15].

A DFT-based solution inserts additional DFT logics and/or modifies the scan architecture or cell structure [18], [24]. Scan clock gating disables the clock signals, blocking unnecessary transitions in the scan cell output during the scan shifting phase. Lin *et al.* [25] proposed a scan cell design, namely "transition-isolation design (TIDE)," for scan clock gating. Seo *et al.* [14] presented a new scan shifting method based on clock gating of multiple groups. Similarly, scan chain modification [26]–[28] is used to prevent cumulative transitions. These methods reduce the test energy, but they require much additional hardware and/or routing compared to the other methods. They also cause clock skew problems during functional operation and degrade test coverage during test operation [26].

Scan chain reordering is one of the DFT-based solutions. Bonhomme *et al.* [29] presented a heuristic ordering method, but it does not consider routing overhead. Later [30], they proposed an improved method that optimizes the energy efficiency under given routing constraints. Seo *et al.* [15] proposed a weighted Hamming distance-based scan reordering method with an X-filling method. Wu and Chao [13] described a power- and routing-aware scan reordering method, namely "PRORO." Li *et al.* [31] introduced a new scan chain reordering method that exploits two complementary connections. Cui *et al.* [32] described a K-means clustering-based scan reordering method under a routing constraint. These methods modify the scan chain path to reduce the switching activity during the scan shifting phase. Although scan chain reordering incurs a routing overhead, it is widely applied to reduce test power because of its high performance with a simple architecture. More recently, Pathak *et al.* [33] proposed a scan chain stitching method based on logic cluster controllability (LoCCo), which reduces the shift-in power and the computational time of the scan reordering flow.

In this paper, we propose a new scan chain reordering method composed of scan partitioning, scan partition-based X-filling, and statistic-based scan stitching. The proposed method considers shift power reduction together with the routing overhead and the computational time. Its performance is compared with that of existing methods by estimating the energy consumption on benchmark circuits using a widely used estimation method, as described in Section II-C.

#### B. Probability of Switching Activities

As previously mentioned, the probability of switching activities in sequential cells is significantly higher during scan test



Fig. 1. Switching probabilities of test data of s38417 benchmark circuits.

operation than during functional operation. This is because the probability is approximately 0.5 during a scan shifting phase, while it approximately corresponds to 0.2–0.3 in functional operation. To verify this characteristic, experiments are conducted on ISCAS'89 benchmark circuits in terms of the test pattern; the results are presented in Fig. 1.

Each line indicates the switching probability between each pair of adjacent scan cells; the blue solid line corresponds to the default option and the green dotted line to adjacent fill. The test data for this experiment were extracted from the s38417 circuit using the ATPG algorithm of TetraMAX. In the default case, the average probability of switching activity is 46.35%. Moreover, it exceeds 50.00% (above 12% up to 79%) when it reaches a peak. Although adjacent fill is widely used to reduce the switching probability by X-filling, it is still insufficient to meet the requirements of low energy scan test. Specifically, switching activity accumulates under the serial loading of the scan shift operation. Hence, the transitions of the input pattern are better placed at the front part, while those of the output pattern should be placed at the back part. This can be seen from the equation of the energy consumption estimation method presented in Section II-C.
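As a rough illustration of how a per-pair switching probability of this kind could be computed, the following Python sketch counts, for every pair of adjacent scan cells, the fraction of patterns in which the two cells hold different values. The function name `switching_probability` and the assumption that all patterns are fully specified (no remaining X-bits) and of equal length are ours; the sketch does not reproduce the TetraMAX-based experiment.

```python
def switching_probability(patterns):
    """Return one probability per adjacent cell pair (j, j+1).

    Assumes every pattern is a fully specified string of '0'/'1'
    characters and that all patterns have the same length.
    """
    n_cells = len(patterns[0])
    probs = []
    for j in range(n_cells - 1):
        # Number of patterns in which cells j and j+1 differ.
        toggles = sum(p[j] != p[j + 1] for p in patterns)
        probs.append(toggles / len(patterns))
    return probs


patterns = ["0101", "0011", "0000", "1111"]
print(switching_probability(patterns))  # [0.25, 0.5, 0.25]
```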

Various low energy test methods have been developed without adjacent fill. This paper shares the goals of these previous works but presents a new test method that uses scan partitioning, a new X-filling, and statistic-based scan stitching, and that outperforms the previous works.

#### C. Estimation of Energy Consumption

There are two types of energy consumption in complementary metal-oxide-semiconductor (CMOS) integrated circuits (ICs), namely static power and dynamic power. The former is due to leakage current, while the latter is mostly caused by the charging and discharging of load capacitance [34]. Dynamic power is more important than static power from the viewpoint of low energy scan testing, especially with respect to switching activity from 0 to 1 or vice versa [35]. Specifically, dynamic power management is preferred because it increases the energy efficiency with minimal loss of performance quality [36]. Therefore, we concentrate on two types of dynamic power: 1) the shift-in power and 2) the shift-out power.

To estimate scan shift power, the weighted transition metric (WTM), a widely preferred estimation method, was proposed in a previous work [35]. WTM accounts for all power consumption incurred by scan shifting. The shift-in power in the *i*-th pattern is estimated using the following equation:

$$S\_PWR_i = \sum_{j=1}^{N-1} (C_{i,j} \oplus C_{i,j+1}) \times j \tag{1}$$

where N indicates the number of scan cells in the scan chain and  $C_{i,j}$  represents the *j*-th scan cell value in the *i*-th input test pattern. When the shift-out power is estimated, the weighting factor *j* is replaced by (N - j). Therefore, it is more important to reduce the switching activity where the weighting factor is high, i.e., the input test data should reduce the switching activity toward the last cells (*j* close to N-1). In this paper, the scan shift power was calculated using (1).
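Equation (1) can be evaluated directly, as in the following minimal Python sketch. The function names `shift_in_wtm` and `shift_out_wtm` are hypothetical, patterns are assumed to be fully specified strings whose first character is scan cell 1, and the shift-out variant simply applies the (N - j) weight stated in the text.

```python
def shift_in_wtm(pattern: str) -> int:
    """Weighted transition metric of Eq. (1): a transition between
    cells j and j+1 (1-indexed) is weighted by j."""
    n = len(pattern)
    return sum((pattern[j - 1] != pattern[j]) * j for j in range(1, n))


def shift_out_wtm(response: str) -> int:
    """Same sum with the weight j replaced by (N - j), as the text
    describes for the shift-out power."""
    n = len(response)
    return sum((response[j - 1] != response[j]) * (n - j) for j in range(1, n))


# One transition near the front versus one near the back of a 4-cell chain.
print(shift_in_wtm("1000"), shift_in_wtm("0001"))    # 1 3
print(shift_out_wtm("1000"), shift_out_wtm("0001"))  # 3 1
```

The two printed lines show why the position of a transition matters: under Eq. (1) the same single toggle costs three times as much when it sits next to the last cell.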

#### D. EQ Scaling Strategy and Motivation

Following the standard expression for dynamic energy in digital circuits, the dynamic energy for the scan-based test can be expressed as follows:

$$E_{dyn} = \alpha \cdot C \cdot V^2 \tag{2}$$

where  $E_{dyn}$  represents the dynamic energy for the scan test,  $\alpha$  indicates the switching activity, *C* is the capacitance, and *V* denotes the supply voltage. Additionally, the dynamic power for the scan-based test can be expressed as follows:

$$P_{dyn} = \alpha \cdot C \cdot V^2 \cdot f \tag{3}$$

The only difference between (2) and (3) is whether the frequency *f* is included. Here, *C* and *V* are almost entirely dependent on the manufacturing technology, while  $\alpha$  can be controlled to meet the required dynamic energy. Generally, the power rail is specified during the initial design phase of the circuit and is insufficient to supply the power required during the scan test, so the test frequency must be lowered [37]. Besides, the ratio of the energy availability during the wafer



Fig. 2. Objective of the proposed EQ scalable testing.



Fig. 3. Flow chart for the proposed EQ scalable testing solution.

test is becoming lower than during the functional operation as the process technology shrinks [38]. As a result, the dynamic energy for the scan-based test must be lowered by controlling the switching activity to meet the energy and power limits. As an additional benefit, the test frequency can be raised because the proposed method lowers the dynamic energy.

Here, the quality is defined by the test reliability and the test frequency. To reach the target quality at low energy consumption, as shown in Fig. 2, the main purpose is to reduce the switching activity in (2) using the proposed scan chain reordering. In this figure,  $E_a$  indicates the allowable energy for the test and  $E_r$  is the energy reduced by the proposed method. Following (3), raising the test frequency of the conventional scan chain reordering method ( $f_{conv}$ ) raises the energy consumption. However, the test frequency can be increased only to a limited extent because of the restricted energy ( $E_a$ ). To overcome this limit while keeping the test quality reliable, the energy consumption must be lowered below  $E_a$, for example to  $E_r$. As a result, the proposed method makes it possible to increase the test frequency without degrading the test reliability.
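The relationship between switching activity and the achievable test frequency follows directly from (2) and (3) and can be sketched numerically. In the Python sketch below, the function names and all numeric values (power budget, capacitance, voltage, and the two switching activities) are hypothetical placeholders chosen only to show the proportionality; they are not measurements from the paper.

```python
def dynamic_energy(alpha: float, cap: float, vdd: float) -> float:
    """Eq. (2): E_dyn = alpha * C * V^2."""
    return alpha * cap * vdd ** 2


def dynamic_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """Eq. (3): P_dyn = alpha * C * V^2 * f."""
    return dynamic_energy(alpha, cap, vdd) * freq


def max_frequency(power_budget: float, alpha: float, cap: float, vdd: float) -> float:
    """Highest frequency that keeps Eq. (3) within a given power budget."""
    return power_budget / dynamic_energy(alpha, cap, vdd)


# Hypothetical values: halving the switching activity doubles the
# frequency that fits in the same power budget.
budget, cap, vdd = 1.0, 1e-9, 1.0
f_before = max_frequency(budget, alpha=0.50, cap=cap, vdd=vdd)
f_after = max_frequency(budget, alpha=0.25, cap=cap, vdd=vdd)
print(f_after / f_before)
```

Because *C* and *V* are fixed by the technology, the frequency headroom in this simplified model scales inversely with the switching activity, which is the lever the reordering targets.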

Conventional scan chain reordering methods outperform other methods in most cases, but their development has been restricted by several problems. Compared to other methods, they involve many additional considerations, including physical layout information, increased scan chain lengths, and runtime overhead. Moreover, one of the biggest problems is that almost all proposals were limited to stuck-at fault testing without considering transition delay faults. According to current trends in scan test, transition delay faults must also be tested.

For this reason, we address the problems discussed above. First, a new scan partitioning method is presented that accounts for routing overhead using the physical layout information. A new X-filling method is then proposed to facilitate the proposed scan cell reordering. Finally, a statistic-based scan stitching is proposed for low energy consumption. This method can greatly reduce the test energy consumption with negligible runtime overhead, while only slightly increasing the routing overhead compared to the state-of-the-art methods. Besides, the capture power is also reduced, whereas the existing methods do not consider both shift and capture power consumption. Therefore, both stuck-at and transition delay faults can be handled by the proposed method. The details are described in Section III.

#### III. PROPOSED METHOD

The proposed scan chain reordering for EQ scalable testing is set up as part of the DFT in a conventional back-end design flow, as shown in Fig. 3. In the conventional DFT setup, placement-aware scan reordering is generally used to reduce the total wire length. This reordering places connected scan cells near each other, so the routing length and the number of buffers and/or inverters can be reduced. In contrast, the proposed DFT setup flow focuses on reducing the switching activity caused by the scan cells rather than the total wire length. This flow is illustrated by the dark-gray blocks in Fig. 3. Unlike the conventional methods, the proposed scan chain reordering method proceeds in three steps: scan partitioning, scan partition-based X-filling, and statistic-based scan stitching. The new scan partitioning is performed first to relieve the scan routing overhead; it limits the range of the scan chain reordering by gathering nearby scan cells. Next, the new scan partition-based X-filling is executed using the scan partition information to produce identical bit streams to the maximum possible extent in each scan partition. Identical bit streams make it easy to reduce the transitions between adjacent scan cells. The statistic-based scan stitching is then performed using the fully filled test patterns in each scan partition. The scan stitching is executed according to the proposed statistic value (S-value), as described in Section III-C. Note that our method differs from the conventional DFT setup in its use of the proposed power-aware scan chain reordering.

#### A. Scan Partitioning

The goal of existing scan partitioning for low energy scan-based test is to make as many identical bit streams as possible in each partition, while the aim of the proposed



Fig. 4. Simple example for finding the smallest area of the rectangle. (a) Reference rectangle comprised of Scan Cell #1 and Scan Cell #2. (b) Rectangle comprised of the reference and Scan Cell #3. (c) Rectangle comprised of the reference and Scan Cell #4.

scan partitioning is to relieve the routing overhead by gathering the scan cells as closely as possible within a minimum-area rectangle. Each scan partition takes the form of a small rectangle, as shown in Fig. 5. Moreover, there is a single scan chain in every scan partition; hence, the increase in wire length is limited by restricting the range over which scan cells are stitched. Many identical bit streams are then generated by the scan partition-based X-filling, as will be described in Section III-B.

Algorithm 1 shows the proposed scan partitioning.  $N_{part}$  denotes the number of scan partitions, which is the same as the number of scan chains. Hence, the number of scan partitions is fully predetermined by the number of scan chains.  $P^{i}$  indicates the *i*-th partition,  $N[x]$  is the number of scan cells in  $P^{i}$  or  $L_{SC}$, and  $E[P^i]$  is the entry of the *i*-th scan partition.  $SC\{sel\}$  and  $SC\{r\}$  denote the selected and remaining scan cells, respectively, and  $L_{SC}$  is the scan cell list. The 10th line in Algorithm 1 determines which scan cell forms the smallest-area rectangle together with  $E[P^i]$  by iterating the scan cell selection over  $L_{SC}$. For example, in Fig. 4, "Scan Cell #1 and #2" are already included in the entry of a scan partition, and the number of scan cells is 3; hence  $E[P^{i}] = [SC\{1\}, SC\{2\}]$,  $L_{SC} = [SC\{3\}, SC\{4\}]$, and  $N[P^{i}] = 3$  (see Fig. 4(a)). There are two possible cases:  $SC\{sel\} = [SC\{3\}]$  (Fig. 4(b)) and  $SC\{sel\} = [SC\{4\}]$  (Fig. 4(c)). As both candidate sets must contain  $SC\{1\}$  and  $SC\{2\}$, the rectangle must be extended from that of Fig. 4(a). After selecting  $SC\{3\}$  and  $SC\{4\}$, the area of the extended rectangle is 256.75 and 346.50, respectively. Because the rectangle that includes  $SC\{3\}$  is smaller, the final entry is  $E[P^{i}] = [SC\{1\}, SC\{2\}, SC\{3\}]$.

Finally, the proposed scan partitioning method generates each small rectangle. This operation lowers the length of the scan path and saves computation time in the subsequent scan stitching method. Fig. 5 shows the result of a simple experiment, in which 5 scan partitions were created on an actual s15850 circuit. Almost all of the scan cells in each partition were closely placed as desired. The information of the scan partition is used in the scan partition-based X-filling as discussed below.



Fig. 5. A result of the proposed scan partitioning method.

Algorithm 1 Proposed Scan Partitioning Algorithm

- 1: Specify the number of scan partitions,  $N_{part}$, which is the same as the number of scan chains
- 2: Specify the number of scan cells in each scan partition,  $N[P^1] \sim N[P^{N_{part}}]$
- 3: Initialize entries of each scan partition,  $E[P^1] \sim E[P^{N_{part}}]$
- 4: Initialize scan cell list,  $L_{SC}$
- 5: **for** i = 1 to  $N_{part}$  **do**
- 6:  $E[P^i] \leftarrow$  the closest scan cell to (0,0) in  $L_{SC}$
- 7: **for** j = 1 to  $N[P^i] - 1$  **do**
- 8: **for** l = 1 to  $N[L_{SC}]$  **do**
- 9:  $SC\{r\} \leftarrow$  one of the remaining scan cells in  $L_{SC}$
- 10: **if** the area of the rectangle composed of the positions of  $E[P^i]$  and  $SC\{r\}$  is smallest **then**
- 11:  $L_{SC} \leftarrow SC\{sel\}$
- 12:  $SC\{sel\} \leftarrow SC\{r\}$
- 13: **else**
- 14:  $L_{SC} \leftarrow SC\{r\}$
- 15: **end if**
- 16: **end for**
- 17:  $E[P^{i}] \leftarrow SC\{sel\}$
- 18: **end for**
- 19: **end for**
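The greedy selection in Algorithm 1 can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation: the names `bounding_area` and `partition_scan_cells`, the dictionary-based cell list, and the example coordinates are hypothetical. Ties in area (for instance, axis-aligned cells that give a zero-area rectangle) are broken by insertion order, and the partition sizes are assumed to sum to no more than the number of cells.

```python
def bounding_area(points):
    """Area of the axis-aligned rectangle enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def partition_scan_cells(cells, sizes):
    """cells: dict mapping cell name -> (x, y) placement coordinate.
    sizes: number of scan cells wanted in each partition, in order.
    Returns one list of cell names per partition."""
    remaining = dict(cells)  # plays the role of the scan cell list L_SC
    partitions = []
    for size in sizes:
        # Line 6: seed the entry with the remaining cell closest to (0, 0).
        seed = min(remaining,
                   key=lambda c: remaining[c][0] ** 2 + remaining[c][1] ** 2)
        entry = [seed]
        points = [remaining.pop(seed)]
        for _ in range(size - 1):
            # Lines 8-16: pick the cell that keeps the rectangle smallest.
            best = min(remaining,
                       key=lambda c: bounding_area(points + [remaining[c]]))
            entry.append(best)
            points.append(remaining.pop(best))
        partitions.append(entry)
    return partitions


cells = {"SC1": (0, 0), "SC2": (1, 1), "SC3": (2, 1),
         "SC4": (10, 10), "SC5": (11, 11), "SC6": (12, 13)}
print(partition_scan_cells(cells, [3, 3]))
# [['SC1', 'SC2', 'SC3'], ['SC4', 'SC5', 'SC6']]
```

Each inner step scans all remaining cells, so the sketch runs in time roughly quadratic in the number of scan cells, which is adequate for illustrating the selection rule.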

#### B. Scan Partition-Based X-Filling

The post-ATPG X-filling combines the X-filling with test relaxation, so it requires a high X-bit ratio [39]. The test relaxation prevents a decrease in the test coverage and an increase in the number of test patterns. Test patterns include many X-bits; Touba [40] reported that the unspecified bit density of industrial circuits is 95–99%, even though the patterns are generated by ATPG with dynamic compaction. Given this characteristic, it is important to determine suitable fill values with the X-filling method.

The aim of the proposed scan partition-based X-filling is to produce identical bit streams in each scan partition. In previous works, adjacent fill is generally used to reduce shift-in transitions after reordering the scan cells, although it does not sufficiently account for shift-out transitions. The proposed scan partition-based X-filling is performed as shown in Algorithm 2. This procedure applies the X-filling to both

Algorithm 2 Procedure of Scan Partition-Based X-Filling

- 1: Analyze the distribution of specified bits in the input test patterns (stuck-at and transition delay fault) for each partition and pattern number.
- 2: Fill values for each scan partition and pattern number according to the result of step 1.
- 3: Perform pattern simulation.

| Pattern | Input test pattern partitioned by proposed scan partition | Filling value (one per partition) | Fully filled input test pattern by partition-based X-filling |
|---------|-----------------------------------------------------------|-----------------------------------|--------------------------------------------------------------|
| 1 | 00XX 0XX1 1XXX X1X1 | 0 1 1 1 | 0011 0011 1111 1111 |
| 2 | XX1X 1XXX X101 XXXX | 1 0 1 1 | 1111 1111 1101 0100 |
| 3 | 0X0X 01XX 1X01 X0XX | 0 0 0 1 | 0001 0111 1001 0000 |
| 4 | X0XX XX1X 0XXX X000 | 0 0 0 1 | 0011 0011 0000 0000 |
| 5 | X1X0 XXX0 11XX XXX0 | 1 0 1 0 | 1110 1100 1100 0100 |
| 6 | X1XX 1XXX X0X0 XX0X | 1 0 0 0 | 1100 1100 0000 0000 |
| 7 | 0X11 XXXX 1XXX XX0X | 0 0 1 1 | 0011 0011 1101 0100 |
| 8 | X11X 111X XXXX X1X1 | 1 1 1 1 | 1111 1111 1111 1111 |

In the original figure, arrows mark which bit positions belong to Partition #1 through Partition #4.

Fig. 6. Example of applying proposed X-filling method.

stuck-at and transition delay test patterns because both are considered in the proposed scan reordering. To analyze their tendency, the initial test patterns are needed; these are necessarily extracted in the course of the front-end design flow. Therefore, no additional pattern extraction work is required for the proposed X-filling.

For each partition, the input test pattern is analyzed by extracting, from the distribution of specified bits, whether the proportion of 0s is higher than that of 1s. Next, the unspecified bits are fully filled with 0s or 1s depending on the distribution result. Fig. 6 shows an example of this procedure. For example, the first input test pattern of 'Partition #1' has more 0s than 1s, so that pattern is fully filled with 0s, as shown on the right side of Fig. 6. Afterward, pattern simulation is performed by the ATPG tool to obtain the responses, because both the input test patterns and the responses are required to stitch the scan cells in the next stage, as described in Section III-C. The additional procedures in the partition-based X-filling incur negligible runtime overhead, which will be confirmed in Section IV-D. The scan partition-based X-filling has two advantages. One is that the similarity of the pattern bit streams is maintained in each scan partition. The other is that the X-filling is simple and has a low runtime overhead. The first advantage is especially useful for scan stitching by the proposed method.
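Steps 1 and 2 of the procedure described above (majority analysis followed by filling) can be sketched for a single pattern as follows. The function name `partition_x_fill`, the `partition_of` list that maps each bit position to a partition identifier, and the rule that a tie is filled with 0 are assumptions made for this illustration; the example input is synthetic and does not reproduce Fig. 6.

```python
def partition_x_fill(pattern, partition_of):
    """Fill the X-bits of one test pattern partition by partition.

    pattern: string over '0', '1', 'X'.
    partition_of: partition_of[k] is the partition id of bit k.
    Returns (filled_pattern, fill_value_per_partition).
    """
    # Step 1: count specified 0s and 1s in each partition.
    counts = {}
    for bit, part in zip(pattern, partition_of):
        zeros, ones = counts.get(part, (0, 0))
        if bit == "0":
            zeros += 1
        elif bit == "1":
            ones += 1
        counts[part] = (zeros, ones)
    # Step 2: choose the majority value (assumption: ties are filled with 0).
    fill = {part: ("1" if ones > zeros else "0")
            for part, (zeros, ones) in counts.items()}
    filled = "".join(fill[part] if bit == "X" else bit
                     for bit, part in zip(pattern, partition_of))
    return filled, fill


filled, fill = partition_x_fill("00XX1XX1", [0, 0, 0, 0, 1, 1, 1, 1])
print(filled, fill)  # 00001111 {0: '0', 1: '1'}
```

Because `partition_of` is an arbitrary mapping, the partitions need not occupy contiguous bit positions. Pattern simulation (step 3) is left to the ATPG tool and is not modeled here.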

#### C. Statistic-Based Scan Stitching

Scan chain reordering makes it possible to lower the energy consumption during test and to raise the scan shift frequency without damaging the chip. Nevertheless, if the designer considers only the energy consumption, excessive routing overhead results. Excessive routing causes setup/hold violations during the scan shift operation. Therefore, routing must be considered when a scan chain reordering method is developed, in order to prevent such violations.

The final stage stitches the scan cells to reduce the scan shift power with a minimal increase in the routing. For this purpose, the scan partitioning and scan partition-based X-filling are performed in advance. The scan partition information limits the range of each reconfigured scan chain. Specifically, each scan chain is stitched within its own scan partition, which relieves the wire length among the scan cells. Moreover, the fully filled test patterns produced by the scan partition-based X-filling improve the efficiency of the proposed statistic-based scan stitching in reducing the energy consumption. By maintaining several identical bit streams, the scan stitching procedure reduces the probability of switching activities in part of the input test pattern. Moreover, it considers both the stuck-at and transition delay test patterns. If more varied patterns are required, such as functional and bridge test patterns, they can simply be added to the stitching list after the proposed partition-based X-filling. For the EQ scalable testing, this step also requires the layout information, such as a DEF file.

The power and the routing length have different numerical ranges, so some previous methods applied a weight to each of the two values. However, these weights must change as the ranges of the values change, which makes their accuracy ambiguous. Instead, the proposed method uses a statistical analysis to reduce the shift power under routing overhead constraints. We define a new statistic between two scan cells, the *S*-value ( $S(C^i, C^j)$ ), which is estimated as follows:

$$S\left(C^{i},C^{j}\right) = Z_{dist}\left(C^{i},C^{j}\right) + Z_{pwr}\left(C^{i},C^{j}\right) \qquad (4)$$

where  $Z_{dist}(C^i, C^j)$  denotes the modified Z-value, which is also called the standard score, of the physical distance between the *i*-th and *j*-th scan cells.  $Z_{pwr}(C^i, C^j)$  is the corresponding modified Z-value of the WTM. The modified Z-values are respectively calculated as follows:

$$Z_{dist}\left(C^{i},C^{j}\right) = \left[\mu_{dist} - D\left(C^{i},C^{j}\right)\right]/\sigma_{dist} \qquad (5)$$

$$Z_{pwr}\left(C^{i},C^{j}\right) = \left[\mu_{pwr} - P\left(C^{i},C^{j}\right)\right]/\sigma_{pwr} \quad (6)$$

where the physical distance  $D(C^i, C^j)$  is obtained from the DEF file or a similar layout extraction file. The distance is simply written as follows:

$$D\left(C^{i},C^{j}\right) = \left|x_{s}^{i} - x_{s}^{j}\right| + \left|y_{s}^{i} - y_{s}^{j}\right| \qquad (7)$$

where  $x_s^i$  and  $y_s^i$  are the x and y coordinates, respectively, of the *i*-th scan cell. The scan shift power,  $P(C^i, C^j)$ , is represented as follows:

$$P\left(C^{i}, C^{j}\right) = WTM_{load, i, j} + WTM_{unload, i, j} \qquad (8)$$

where  $WTM_{load}$  and  $WTM_{unload}$  denote the scan load and unload of the WTM, respectively, as introduced in Section II-C. The statistics  $\mu$  and  $\sigma$  are the mean and standard deviation, respectively.
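As an illustration of (4)–(7), the following Python sketch computes S-values for a set of candidate scan cells. It rests on assumptions that the text does not fix: $\mu$ and $\sigma$ are taken over the current candidate set, $\sigma$ is the population standard deviation, and a Z-value is set to 0 when all candidates are equal. The function names and the toy numbers are hypothetical, so the magnitudes will not necessarily match those reported in Fig. 8.

```python
from statistics import mean, pstdev


def manhattan(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Physical distance of Eq. (7) between two placed scan cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def modified_z(values: list[float]) -> list[float]:
    """Modified Z-value of Eqs. (5)-(6): (mu - v) / sigma.

    Smaller distances or powers give larger scores. Assumption: when all
    candidates are equal (sigma == 0) every score is defined as 0.
    """
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(mu - v) / sigma for v in values]


def s_values(distances: list[float], powers: list[float]) -> list[float]:
    """S-value of Eq. (4) for each candidate: Z_dist + Z_pwr."""
    return [zd + zp for zd, zp in zip(modified_z(distances), modified_z(powers))]


# Hypothetical candidates: distances in micrometres, powers as WTM counts.
distances = [120.0, 480.0, 300.0]
powers = [8, 6, 2]
scores = s_values(distances, powers)
best = scores.index(max(scores))
print(best)  # index of the candidate with the largest S-value
```

Because both terms are standardized, the distance (hundreds of micrometres here) and the power (single-digit counts) contribute on the same scale without a hand-tuned weight.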

The proposed scan stitching method proceeds as shown in Fig. 7. Each scan chain is formed from the scan cells in a given scan partition, as mentioned above. First, the scan chain ordering method selects a scan partition for making a scan



Fig. 7. Procedure of the proposed statistic-based scan stitching method.

chain. It then selects a scan cell from the chosen scan partition, and calculates the *S*-value by using (4)–(8). The scan cell with the maximum *S*-value is stitched as the next scan cell. This procedure iterates until the scan chain is completely stitched, then repeats until all scan chains are formed.
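The iterative selection described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: statistics are recomputed over the remaining candidates at each step, the population standard deviation is used, Z-values are 0 when all candidates tie, and the pairwise power table `power` and the coordinates are hypothetical toy data (they are not the values of Fig. 8).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Modified Z-values (mu - v) / sigma; all zeros if the values tie."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (mu - v) / sigma for v in values]


def stitch_partition(coords, power, start):
    """Greedily order the scan cells of one partition.

    coords: cell name -> (x, y) placement (hypothetical layout data)
    power:  frozenset({a, b}) -> P(a, b), the pairwise shift power
    start:  name of the first scan cell of the chain
    """
    chain = [start]
    remaining = [c for c in coords if c != start]
    while remaining:
        cur = chain[-1]
        dist = [abs(coords[cur][0] - coords[c][0]) +
                abs(coords[cur][1] - coords[c][1]) for c in remaining]
        pwr = [power[frozenset((cur, c))] for c in remaining]
        s = [zd + zp for zd, zp in zip(z_scores(dist), z_scores(pwr))]
        # The candidate with the maximum S-value becomes the next cell.
        chain.append(remaining.pop(s.index(max(s))))
    return chain


coords = {"C1": (0, 0), "C2": (2, 2), "C3": (1, 1), "C4": (0, 3)}
power = {
    frozenset(("C1", "C2")): 8, frozenset(("C1", "C3")): 8,
    frozenset(("C1", "C4")): 2, frozenset(("C2", "C3")): 4,
    frozenset(("C2", "C4")): 6, frozenset(("C3", "C4")): 8,
}
print(stitch_partition(coords, power, "C1"))  # ['C1', 'C4', 'C2', 'C3']
```

In this toy case the closest cell to `C1` is `C3`, yet `C4` is chosen first because its much lower pairwise power outweighs the slightly longer wire; the next choice is decided by power alone since both remaining cells are equidistant from `C4`.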

For a more detailed explanation, an example of the shaded area in Fig. 7 is given. Consider a partition with four scan cells and three test patterns, where the first scan cell is  $C^1$ , as shown in Fig. 8. The left and right parts at the top of Fig. 8 show the x and y coordinates of the scan cells and a part of the test patterns, respectively. Based on this information, the S-values are compared in the bottom tables of Fig. 8. In the first step,  $C^4$  has the highest S-value (0.3738), although  $C^3$  is the closest to the first scan cell, so  $C^4$  is stitched to  $C^1$ . Next, the remaining two scan cells are at the same distance from  $C^4$ , so their Z-values for the distance in (5) are both 0, and the next stitched scan cell must be determined according to the power consumption,  $Z_{pwr}(C^4, C^i)$ . The values of  $P(C^4, C^i)$  for the two candidates are 6 and 8, according to the given weighted value of the WTM, 2. As a result, the Z-values for the power  $(Z_{pwr}(C^4, C^i))$  are 0.7092 and -0.7092, respectively (each S-value equals the corresponding Z-value for the power), so  $C^2$  is connected to  $C^4$ . Lastly,  $C^3$  is automatically connected to  $C^2$  because it is the only remaining scan cell. As demonstrated in this example, the sequential procedure lowers the shift power with a minimal increase in the routing overhead. The superiority of the method is confirmed in Section IV.

#### IV. EXPERIMENTAL RESULTS

The proposed method was evaluated on five ISCAS'89, five ITC'99, and two IWLS'05 OpenCores benchmark circuits. To compare the performance of the proposed and previous methods, the synthesis and DFT insertion flow of all logic was carried out using a Synopsys Design



Fig. 8. Example of statistic-based scan stitching by using the proposed method.

Compiler [41] and a DFT Compiler [42]. All generated and simulated test data were handled by Synopsys TetraMAX [43]. The layout information, such as the placement and routing information, was extracted by a Synopsys IC Compiler [44]. The experiments were performed using SAED 32nm technology, supported by the Synopsys ARMENIA Education Department.

#### A. Probability of Switching Activities

We first estimated the probability of switching activities between pairs of adjacent scan cells and compared it with that of the previous methods. These experiments were conducted on four ISCAS'89 benchmark circuits. It is important to reduce these switching activities because they directly affect the scan shift power. The results of the proposed method and two existing methods are presented in Fig. 9. Each vertical bar in this figure is the average switching probability for the stuck-at test patterns. The proposed method had the lowest switching probability in three out of the four cases. As previously mentioned, the default ATPG option (first bar in each case) does not consider the energy consumption during scan-based testing; therefore, it induces excessive switching activities during the scan shift phase. Adjacent filling (second bar in each case) is among the most widely used filling options for reducing the energy consumption. The average probability of switching activities was 15.45% with adjacent fill and 12.46% with the proposed method, which is 19.35% lower than that with adjacent fill.
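To make the metric concrete, the sketch below estimates a switching probability as the fraction of adjacent scan-cell pairs that hold different values, accumulated over all patterns. This definition, the string encoding of patterns in chain order, and the function names are assumptions for illustration; the exact estimator behind Fig. 9 is not spelled out in this section. The second helper reproduces the relative improvement quoted in the text from the two reported averages.

```python
def switching_probability(patterns: list[str]) -> float:
    """Fraction of adjacent-cell pairs with different values, over all patterns.

    Each pattern is a fully specified bit string given in scan chain order
    (assumed encoding).
    """
    toggles = 0
    pairs = 0
    for bits in patterns:
        for a, b in zip(bits, bits[1:]):
            toggles += a != b
            pairs += 1
    return toggles / pairs if pairs else 0.0


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Toy patterns: "0011" has one toggle, "0101" has three, out of six pairs.
print(switching_probability(["0011", "0101"]))

# Averages reported in the text: 15.45% (adjacent fill), 12.46% (proposed).
print(round(relative_reduction(15.45, 12.46), 2))  # prints 19.35
```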

#### B. Energy Consumption During Shift Mode

Although the results show that the proposed method significantly reduces the switching probability, this alone does not establish that the scan shift power is reduced, because, according to (1), where a transition occurs matters more than the raw number of transitions. To account for the location of transitions, the WTM was employed in the scan reordering and was also used for

TABLE I Comparison of Energy Consumption With a Previously Proposed Scan Reordering Method

| Benchmark | Circuit | # of Scan Cells | Fault Model | Total Scan Shift, [13] | Total Scan Shift, Proposed Method | Reduction Ratio |
|-----------|---------|-----------------|------------------|------------|-----------|--------|
| ISCAS'89  | s13207  | 669             | Stuck-at         | 15,057.81  | 3,551.09  | 76.42% |
|           |         |                 | Transition Delay | 15,714.68  | 3,573.51  | 77.26% |
|           | s15850  | 597             | Stuck-at         | 11,197.81  | 5,209.29  | 53.48% |
|           |         |                 | Transition Delay | 12,435.69  | 6,505.41  | 47.69% |
|           | s35932  | 1,728           | Stuck-at         | 99,447.11  | 41,403.47 | 58.37% |
|           |         |                 | Transition Delay | 134,446.68 | 63,681.32 | 52.63% |
|           | s38417  | 1,636           | Stuck-at         | 102,765.60 | 30,970.06 | 69.86% |
|           |         |                 | Transition Delay | 103,804.35 | 37,854.77 | 63.53% |
|           | s38584  | 1,452           | Stuck-at         | 83,743.66  | 27,507.70 | 67.15% |
|           |         |                 | Transition Delay | 88,642.45  | 20,057.49 | 77.37% |
| ITC'99    | b20     | 490             | Stuck-at         | 8,690.14   | 2,240.57  | 74.22% |
|           |         |                 | Transition Delay | 9,199.99   | 3,241.61  | 64.77% |
|           | b21     | 490             | Stuck-at         | 8,220.06   | 2,262.88  | 72.47% |
|           |         |                 | Transition Delay | 9,166.34   | 3,248.38  | 64.56% |
|           | b22     | 735             | Stuck-at         | 21,429.92  | 4,335.58  | 79.77% |
|           |         |                 | Transition Delay | 20,394.03  | 6,929.19  | 66.02% |
| Average Reduction Ratio |  |          |                  |            |           | 66.66% |



Fig. 9. Comparison of switching probability with the previous methods.

comparison with the previous works. The results are shown in Table I; these experiments used 10 scan chains.
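The point that the location of a transition matters more than the transition count can be seen with a short Python sketch of the weighted transition metric in its commonly used form, where a transition between bits $j$ and $j+1$ of a scan-in vector of length $L$ is weighted by $L - j$. This form and the bit-ordering convention are assumptions here, since the definition in Section II-C is not repeated in this part of the paper; the function name is hypothetical.

```python
def wtm_scan_in(bits: list[int]) -> int:
    """Weighted transition metric of one scan-in vector.

    Assumption: bits[0] is the first bit shifted into the chain, so a
    transition near the front of the vector travels through more scan
    cells and receives a larger weight.
    """
    length = len(bits)
    return sum((length - 1 - j) * (bits[j] != bits[j + 1])
               for j in range(length - 1))


# Both vectors contain exactly one transition, but at different positions.
print(wtm_scan_in([1, 0, 0, 0]))  # prints 3
print(wtm_scan_in([0, 0, 0, 1]))  # prints 1
```

Both toy vectors have the same number of transitions, yet their WTM values differ by a factor of three, which is why a lower switching probability alone does not guarantee lower shift power.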

For a proper comparison, we selected a state-of-the-art work [13] as the main comparison target. As one of the latest scan reordering methods, it considers both the scan shift power and the routing overhead. Compared with other low-power scan chain reordering methods, it also achieves outstandingly low energy consumption (about 50% reduction compared to the industrial tool) with relatively low routing overhead (about 7% growth compared to the industrial tool) using a single chain. Relative to [13], the proposed method reduced the energy consumption by approximately 66% on average. Note that the proposed method also outperformed [13] in all cases of the stuck-at fault testing, by an average of 68.96%.

Although transition delay faults were not considered in [13], the experiment was conducted in the same manner as for the stuck-at fault testing. Most existing methods, including the state-of-the-art methods [13] and [33], were proposed only for stuck-at fault testing, even though transition delay testing has become an essential test procedure. The proposed method again reduced the energy consumption for the transition delay test, by 47.7% to 70.4% compared to [13].

Additional experiments were performed on very large designs, including ITC'99 and OpenCores benchmark circuits, and the results are shown in Table II. These designs have over 2k flip-flops, so the results indicate the feasibility of applying the method to real designs. Both stuck-at and transition delay faults are considered as the fault models. The benchmark circuits in Table II have many more flip-flops and much higher gate counts than the other benchmark circuits. Here, the existing method [13] cannot be compared with the proposed method because its runtime increases exponentially with the number of scan cells and patterns owing to the analysis of the test responses. For this reason, we compare the performance against the conventional flow, adjacent filling. The proposed method reduced the energy consumption by approximately 40.59% on average compared to the adjacent filling, despite increases for the transition delay patterns of b18 and b19.
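The reduction ratios in Table II appear consistent with the usual definition (baseline minus proposed, divided by baseline). The short Python sketch below recomputes them from the tabulated totals and averages the eight per-row ratios; the dictionary layout and names are illustrative, and the averaging rule (an unweighted mean of the row ratios) is an assumption that happens to agree with the figure quoted in the text.

```python
# (adjacent fill, proposed method) total scan shift values copied from Table II
TABLE_II = {
    ("b18", "stuck-at"): (52619.03, 29435.29),
    ("b18", "transition"): (47286.57, 53644.35),
    ("b19", "stuck-at"): (181572.74, 102863.59),
    ("b19", "transition"): (154796.95, 173453.33),
    ("ac97", "stuck-at"): (57731.83, 38827.85),
    ("ac97", "transition"): (46959.94, 31292.42),
    ("des_perf", "stuck-at"): (1634042.77, 21096.42),
    ("des_perf", "transition"): (154267.90, 3131.06),
}


def reduction_ratio(baseline: float, proposed: float) -> float:
    """Reduction in percent; negative when the proposed value is larger."""
    return 100.0 * (baseline - proposed) / baseline


ratios = {key: reduction_ratio(*vals) for key, vals in TABLE_II.items()}
average = sum(ratios.values()) / len(ratios)

for key, value in ratios.items():
    print(key, round(value, 2))
print("average", round(average, 2))
```

A negative ratio, as for the transition delay patterns of b18 and b19, means the proposed ordering consumed more shift energy than adjacent fill for that row.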

Recently, industrial design trends favor a large number of short internal scan chains (multiple scan chains), because the test time can be reduced by increasing the number of patterns applied at the same time [43]. Therefore, the scan shift power must be reduced regardless of the number of scan chains. As shown in Fig. 10, the proposed method markedly reduces the shift power on the s38584 and ac97 benchmark circuits compared to the conventional adjacent filling. In this experiment, four different scan chain counts are used to observe the impact of the number of scan chains. The energy consumption is reduced in every case. Hence, the proposed method can improve the energy efficiency regardless of the

TABLE II Comparison of the Energy Consumption for Large Benchmark Circuits

| Benchmark | Circuit  | # of Scan Cells | Fault Model      | Total Scan Shift, Adjacent Fill | Total Scan Shift, Proposed Method | Reduction Ratio |
|-----------|----------|-----------------|------------------|--------------|------------|---------|
| ITC'99    | b18      | 3,320           | Stuck-at         | 52,619.03    | 29,435.29  | 44.06%  |
|           |          |                 | Transition Delay | 47,286.57    | 53,644.35  | -13.45% |
|           | b19      | 6,642           | Stuck-at         | 181,572.74   | 102,863.59 | 43.35%  |
|           |          |                 | Transition Delay | 154,796.95   | 173,453.33 | -12.05% |
| OpenCores | ac97     | 2,289           | Stuck-at         | 57,731.83    | 38,827.85  | 32.74%  |
|           |          |                 | Transition Delay | 46,959.94    | 31,292.42  | 33.36%  |
|           | des_perf | 8,808           | Stuck-at         | 1,634,042.77 | 21,096.42  | 98.71%  |
|           |          |                 | Transition Delay | 154,267.90   | 3,131.06   | 97.97%  |
| Average Reduction Ratio |  |           |                  |              |            | 40.59%  |



Fig. 10. Comparison of energy consumption with a variety of the number of scan chains.

number of scan chains, the fault models, and the size of the design. Moreover, the test reliability deteriorates drastically if the energy consumption exceeds the energy limit during the scan-based test. To prevent this degradation of the test reliability, the proposed method significantly reduces the energy consumption compared to the existing methods.

#### C. Energy Consumption During Capture Mode

Most low-energy scan testing solutions consider only stuck-at faults or only transition delay faults. Hence, most solutions do not consider the capture power for the transition delay test. During the capture mode of the input test patterns, all scan cell values are simultaneously replaced by the test responses. Additionally, during testing for transition delay faults, test responses are captured twice at the at-speed frequency, which is the same as the functional clock frequency. To compare the capture power consumption during the transition delay test, experiments were performed on the benchmark circuits, as shown in Table III.

The capture power is estimated by counting the transitions generated during the capture cycles. In all cases, the proposed method outperforms the other methods. For s38584, the reduction of the peak power is relatively small compared to the other circuits. However, during transition delay test, the

TABLE III Comparison of the Capture Power Estimation

| Circuit | Adjacent Fill, Avg. | Adjacent Fill, Peak | [33], Avg. | [33], Peak | Prop., Avg. | Prop., Peak |
|---------|-------|-------|-------|------|-------|------|
| s13207  | 311.3 | 558   | 209.0 | 336  | 96.5  | 284  |
| s15850  | 255.4 | 519   | 245.9 | 378  | 69.7  | 191  |
| s38417  | 764.2 | 1,378 | 521.6 | 860  | 233.0 | 567  |
| s38584  | 674.1 | 1,204 | 362.1 | 581  | 210.1 | 540  |

specific result shows that patterns with more than 400 transitions per pattern frequently emerge in [33], while the transition count is significantly reduced, by 81.6%, using the proposed method. Relative to the adjacent fill, which is frequently used for low energy consumption, the capture power is reduced by 70.0% on average. Moreover, it is also reduced by approximately 55.7% compared to [33].
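The two average reductions quoted above can be checked against the average capture power columns of Table III. The sketch below assumes that the averages are unweighted means of the per-circuit reduction ratios; the dictionary keys and function name are illustrative.

```python
# Average capture power per circuit, copied from Table III.
TABLE_III_AVG = {
    "s13207": {"adjacent": 311.3, "ref33": 209.0, "proposed": 96.5},
    "s15850": {"adjacent": 255.4, "ref33": 245.9, "proposed": 69.7},
    "s38417": {"adjacent": 764.2, "ref33": 521.6, "proposed": 233.0},
    "s38584": {"adjacent": 674.1, "ref33": 362.1, "proposed": 210.1},
}


def mean_reduction(baseline: str) -> float:
    """Mean per-circuit reduction (in percent) of the proposed method."""
    ratios = [
        (row[baseline] - row["proposed"]) / row[baseline]
        for row in TABLE_III_AVG.values()
    ]
    return 100.0 * sum(ratios) / len(ratios)


print(round(mean_reduction("adjacent"), 1))  # prints 70.0
print(round(mean_reduction("ref33"), 1))     # prints 55.7
```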

#### D. Routing Overhead

Next, we compare the scan chain length of the proposed method with that of [13]. The die dimensions and scan chain lengths in these experiments are given in  $\mu m$ . The results are shown in Table IV. The circuits, number of scan chains, and all other experimental conditions were the same as in Table I. Again, the proposed method obtained the best results in seven out of the eight cases. On average, the scan chain length was 21.5% shorter with the proposed method than with [13]. Moreover, the comparison target [13] generates only 7% more total wire length than the industrial solution, which is an acceptable overhead from the design perspective. Hence, the proposed method can relieve setup/hold violations at high scan shift frequencies while consuming less energy than [13]. According to these results, the proposed method is compatible with real designs with high scan shift frequencies without a heavy burden.

#### E. Hardware Overhead

Routing overhead increases the hardware overhead. Several buffers and/or inverters, whose area is even larger than the additional routing area, are inserted into the

TABLE IV Comparison of the Scan Chain Length With a Previously Proposed Scan Reordering Method

| Benchmark | Circuit | # of Scan Cells | # of SCs | Die Area            | Scan Chain Length, [13] | Scan Chain Length, Proposed Method | Reduction Ratio |
|-----------|---------|-----------------|----------|---------------------|------------|------------|--------|
| ISCAS'89  | s13207  | 669             | 10       | 2,440.10 x 2,359.39 | 247,261.90 | 209,533.22 | 15.3%  |
|           | s15850  | 597             | 10       | 1,840.00 x 1,840.00 | 124,420.82 | 121,168.47 | 3.4%   |
|           | s35932  | 1,728           | 10       | 4,360.02 x 4,400.00 | 969,672.45 | 960,598.50 | 0.9%   |
|           | s38417  | 1,636           | 10       | 2,160.12 x 2,160.00 | 286,895.60 | 334,609.00 | -16.6% |
|           | s38584  | 1,452           | 10       | 3,840.02 x 3,718.73 | 675,820.42 | 498,786.33 | 26.2%  |
| ITC'99    | b20     | 490             | 10       | 1,360.14 x 1,360.00 | 78,736.46  | 53,667.70  | 31.8%  |
|           | b21     | 490             | 10       | 1,360.14 x 1,360.00 | 81,086.68  | 52,573.00  | 35.2%  |
|           | b22     | 735             | 10       | 1,360.14 x 1,360.00 | 118,918.87 | 68,441.95  | 42.4%  |
| Average Reduction Ratio |  |          |          |                     |            |            | 21.5%  |

TABLE V Comparison of Hardware Area for Layout

| Benchmark | Circuit | Circuit w/o scan (µm<sup>2</sup>) | Circuit w/ scan & no scan reordering (µm<sup>2</sup>) | Circuit w/ scan & proposed method (µm<sup>2</sup>) |
|-----------|---------|-----------------------|-----------------------|-----------------------|
| ISCAS'89  | s13207  | 3,574,696.90 (1.0000) | 3,582,664.92 (1.0022) | 3,584,478.45 (1.0027) |
|           | s38417  | 3,249,393.77 (1.0000) | 3,269,483.43 (1.0062) | 3,278,972.39 (1.0091) |
|           | s38584  | 6,307,759.97 (1.0000) | 6,327,220.56 (1.0031) | 6,329,505.26 (1.0034) |
| ITC'99    | b22     | 1,688,548.02 (1.0000) | 1,698,652.92 (1.0059) | 1,701,263.30 (1.0075) |

design to block the setup/hold violations caused by the additional routing. To examine the effect of the proposed method on the hardware area, comparative experiments were performed on the benchmark circuits, as shown in Table V. Here, the third column is the circuit without any DFT design, the fourth column is the circuit with only the scan design, and the fifth column is the circuit with the scan design and the proposed scan reordering. In Section IV-D, b22 shows the best reduction ratio, while s38417 is worse than the existing method. However, the proposed method has little effect on the hardware area compared to the circuit with scan design, regardless of the circuit type. For s38417, the scan design and the proposed method together require only 0.91% growth compared to the original raw circuit. Specifically, the proposed method increases the hardware area by 0.3% compared to the scan design. Likewise, the area overhead of the other circuits is under 1% compared to the original circuit when the scan design and the proposed method are applied.

#### F. Runtime Overhead

Unlike the conventional DFT setup, the proposed method requires the scan partitioning and X-filling. These additional steps might raise concerns about increased runtime overhead. The estimated runtimes of the proposed method with 10 scan chains are illustrated in Fig. 11. In this graph, the unit is seconds and the axis is limited to 600 s, although b17 required over 3000 s in the previous work [13]. Regardless of the circuit



Fig. 11. Runtime overhead of proposed method.

size, the runtime of the scan partitioning and the partition-based X-filling was mostly under 1.0 s. Consequently, the runtime of the scan reordering is the factor that determines the total runtime. In the experiments on the large circuits, the scan reordering clearly dominated the other procedures. However, the longest runtime was 365 s, for the b19 circuit, while the other circuits took no more than 50 s. The runtimes of the ac97 and des\_perf benchmark circuits were only 8.28 and 12.45 s, respectively. In addition, the runtime overhead decreased as the number of scan chains increased, owing to the restricted range of scan stitching through the

TABLE VI Comparison of Three Main Factors With [33]

| Circuit | Shift Power (WTM), [33] | Shift Power (WTM), Prop. | Scan Chain Length (µm), [33] | Scan Chain Length (µm), Prop. |
|---------|----------|----------|-----------|-----------|
| s13207  | 3,772.2  | 3,565.9  | 192,643.9 | 209,533.2 |
| s15850  | 8,053.3  | 5,988.7  | 98,707.7  | 121,168.5 |
| s38584  | 30,739.5 | 21,701.7 | 431,856.0 | 498,786.3 |
| b20     | 7,418.1  | 2,844.0  | 43,053.5  | 53,667.7  |



Fig. 12. Tendency graph of proposed method and [13].

scan partition. In particular, the runtime of the previous work [13] increases exponentially, while the proposed method keeps the runtime steady. This result suits the industrial trend of increasing the number of scan chains. Relative to [13], the runtime overhead of the proposed method is negligible.

#### G. Comparison of Performances With [33]

To compare the performance of the proposed method with that of [33], experiments on the energy consumption and the scan chain length were performed on the benchmark circuits, as shown in Table VI. Here, the third and fifth columns are the results of the proposed method. The test patterns are generated for testing stuck-at and transition delay faults. Relative to [33], the scan shift power is reduced by approximately 30.6%, although the scan chain length is increased by 19.88% on average. In particular, the advantage of the proposed method over [33] grows as the circuit size increases. In addition, as presented in Section IV-E, since the increase in the scan chain length is less critical than the total scan shift power, it requires at most 0.91% growth of the hardware area to block the setup/hold violations.

#### H. Comparison of Overall Results

Finally, the overall performance was evaluated by comparing the routing length and the energy consumption of the proposed method and the comparison target [13], as shown in Fig. 12. In this figure, the green dots and line indicate the results and tendency of the existing method [13], while the navy dots and line denote those of the proposed method. Compared to [13], the proposed method obtained significantly lower energy consumption with a decrease in the routing length. As a result, the trend line of the proposed method moved closer to the ideal case.

In particular, the gap between the two lines grows as the size of the benchmark circuit increases. This is important when applying the method to industrial circuits because real designs have more gates, more scan cells, and more complex routing than the benchmark circuits. Nevertheless, the proposed method maintained a low routing overhead and improved the energy efficiency despite the larger logic.

#### V. CONCLUSION

In this paper, a new scan chain reordering method is presented for EQ scalable test. The method proceeds sequentially through scan partitioning, scan partition-based X-filling, and statistic-based scan stitching. It reduces the energy consumption during the scan shift without quality degradation. The main contributions of the proposed method are the reduction of the energy consumption during the scan-based test, the reduction of the scan chain length during routing, and the reduction of the runtime overhead. The statistical measure known as the S-value proved useful in balancing these three factors. Moreover, the proposed method is independent of the design environment, such as the number of scan chains and the size of the logic. The new scan chain reordering method covers frequently used fault models, including stuck-at and transition delay faults. The proposed method reduced the energy consumption by 66.6% and the scan chain length by 21.5% relative to a state-of-the-art method [13]. In addition, the energy consumption during the scan shift mode and the capture mode is reduced by approximately 30.6% and 55.7%, respectively, while the routing is increased by 19.9% compared to [33]. As a result, the proposed scan chain reordering method reduces the energy consumption without quality degradation, so the test time can be significantly reduced by increasing the test frequency.

#### REFERENCES

- [1] ITRS. (2013). Edition Reports. [Online]. Available: http://www.itrs2.net
- [2] G. Beanato *et al.*, "Design and testing strategies for modular 3-D multiprocessor systems using die-level through silicon via technology," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 2, pp. 295–306, Jun. 2012.
- [3] G. Karakonstantis, A. Chatterjee, and K. Roy, "Containing the nanometer 'Pandora-Box': Cross-layer design techniques for variation aware low power systems," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 1, no. 1, pp. 19–29, Mar. 2011.
- [4] M. Alioto, "Energy-quality scalable adaptive VLSI circuits and systems beyond approximate computing," in *Proc. Design Autom. Test Eur. Conf. Exhibit.*, Munich, Germany, Mar. 2017, pp. 127–132.
- [5] A. Raychowdhury *et al.*, "Error detection and correction in microprocessor core and memory due to fast dynamic voltage droops," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 1, no. 3, pp. 208–217, Sep. 2011.
- [6] P. Girard, X. Wen, and N. A. Touba, "Low-power testing," in System-on-Chip Test Architectures: Nanometer Design for Testability, L.-T. Wang, C. E. Stroud, and N. A. Touba, Eds. San Francisco, CA, USA: Morgan Kaufmann, 2007, ch. 1.
- [7] E. Alpaslan, Y. Huang, X. Lin, W.-T. Cheng, and J. Dworak, "On reducing scan shift activity at RTL," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 29, no. 7, pp. 1110–1120, Jul. 2010.

- [8] S. Remersaro, X. Lin, S. M. Reddy, I. Pomeranz, and J. Rajski, "Scan-based tests with low switching activity," *IEEE Des. Test Comput.*, vol. 24, no. 3, pp. 268–275, May 2007.
- [9] I. Pomeranz, "Skewed-load test cubes based on functional broadside tests for a low-power test set," *IEEE Trans. Very Large Scale Integr.* (VLSI) Syst., vol. 23, no. 3, pp. 593–597, Mar. 2015.
- [10] D. R. Bild *et al.*, "Temperature-aware test scheduling for multiprocessor systems-on-chip," in *Proc. Int. Conf. Comput.-Aided Design*, San Jose, CA, USA, Nov. 2008, pp. 59–66.
- [11] P. Girard, "Survey of low-power testing of VLSI circuits," *IEEE Des. Test Comput.*, vol. 19, no. 3, pp. 82–92, May/Jun. 2002.
- [12] A. Chandra and K. Chakrabarty, "Combining low-power scan testing and test data compression for system-on-a-chip," in *Proc. Design Autom. Conf.*, New York, NY, USA, Jun. 2001, pp. 166–169.
- [13] Y.-Z. Wu and M. C.-T. Chao, "Scan-cell reordering for minimizing scan-shift power based on nonspecified test cubes," ACM Trans. Design Autom. Electron. Syst., vol. 16, no. 1, Nov. 2010, Art. no. 10.
- [14] S. Seo, Y. Lee, J. Lee, and S. Kang, "A scan shifting method based on clock gating of multiple groups for low power scan testing," in *Proc. IEEE Int. Symp. Quality Electron. Design*, Santa Clara, CA, USA, Mar. 2015, pp. 162–166.
- [15] S. Seo *et al.*, "Scan chain reordering-aware X-filling and stitching for scan shift power reduction," in *Proc. IEEE Asian Test Symp.*, Mumbai, India, Nov. 2015, pp. 1–6.
- [16] S. Eggersglüß, S. Holst, D. Tille, K. Miyase, and X. Wen, "Formal test point insertion for region-based low-capture-power compact at-speed scan test," in *Proc. IEEE Asian Test Symp.*, Hiroshima, Japan, Nov. 2016, pp. 173–178.
- [17] C. P. Ravikumar, M. Hirech, and X. Wen, "Test strategies for low power devices," in *Proc. Design Autom. Test Eur. Conf. Exhibit.*, Munich, Germany, Mar. 2008, pp. 728–733.
- [18] W. Zhao, M. Tehranipoor, and S. Chakravarty, "Power-safe test application using an effective gating approach considering current limits," in *Proc. IEEE VLSI Test Symp.*, Dana Point, CA, USA, May 2011, pp. 160–165.
- [19] J. Li, Q. Xu, Y. Hu, and X. Li, "X-filling for simultaneous shift- and capture-power reduction in at-speed scan-based testing," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 7, pp. 1081–1092, Jul. 2010.
- [20] F.-W. Chen, S.-L. Chen, Y.-S. Lin, and T. T. Hwang, "A physical-location-aware fault redistribution for maximum IR-drop reduction," in *Proc. Asia South Pacific Design Autom. Conf.*, Yokohama, Japan, Jan. 2011, pp. 701–706.
- [21] R. Gulve and V. Singh, "ILP based don't care bits filling technique for reducing capture power," in *Proc. IEEE East-West Design Test Symp.*, Yerevan, Armenia, Oct. 2016, pp. 1–4.
- [22] X. Wen *et al.*, "Power-aware test generation with guaranteed launch safety for at-speed scan testing," in *Proc. IEEE VLSI Test Symp.*, Dana Point, CA, USA, May 2011, pp. 166–171.
- [23] Y.-H. Li, W.-C. Lien, I.-C. Lin, and K.-J. Lee, "Capture-power-safe test pattern determination for at-speed scan-based testing," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 33, no. 1, pp. 127–138, Jan. 2014.
- [24] S. Ahlawat and J. T. Tudu, "On minimization of test power through modified scan flip-flop," in *Proc. VLSI Design Test*, Guwahati, India, May 2016, pp. 1–6.
- [25] Y.-T. Lin, J.-L. Huang, and X. Wen, "A transition isolation scan cell design for low shift and capture power," in *Proc. IEEE Asian Test Symp.*, Niigata, Japan, Nov. 2012, pp. 107–112.
- [26] E. Arvaniti and Y. Tsiatouhas, "Low-power scan testing: A scan chain partitioning and scan hold based technique," *J. Electron. Test.*, vol. 30, no. 3, pp. 329–341, Jun. 2014.
- [27] H. Lim, W. Kang, S. Seo, Y. Lee, and S. Kang, "Low power scan bypass technique with test data reduction," in *Proc. IEEE Int. Symp. Quality Electron. Design*, Santa Clara, CA, USA, Mar. 2015, pp. 173–176.
- [28] T. Wu, L. Zhou, and H. Liu, "Reducing scan-shift power through scan partitioning and test vector reordering," in *Proc. IEEE Int. Conf. Electron. Circuits Syst.*, Tamil Nadu, India, Dec. 2014, pp. 498–501.
- [29] Y. Bonhomme, P. Girard, C. Landrault, and S. Pravossoudovitch, "Power driven chaining of flip-flops in scan architectures," in *Proc. Int. Test Conf.*, Baltimore, MD, USA, Oct. 2002, pp. 796–803.
- [30] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "Efficient scan chain design for power minimization during scan testing under routing constraint," in *Proc. Int. Test Conf.*, Charlotte, NC, USA, Sep./Oct. 2003, pp. 488–493.

- [31] M. Li, A. Cui, and T. Yu, "An improved scan cell ordering method using the scan cells with complementary outputs," in *Proc. Int. Symp. Integr. Circuits*, Singapore, Dec. 2014, pp. 103–106.
- [32] A. Cui, T. Yu, G. Qu, and M. Li, "An improved scan design for minimization of test power under routing constraint," in *Proc. IEEE Int. Symp. Circuits Syst.*, Lisbon, Portugal, May 2015, pp. 629–632.
- [33] S. Pathak, A. Grover, M. Pohit, and N. Bansal, "LoCCo-based scan chain stitching for low-power DFT," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 11, pp. 3227–3236, Nov. 2017.
- [34] P. Girard, "Low power testing of VLSI circuits: Problems and solutions," in *Proc. IEEE Int. Symp. Quality Electron. Design*, San Jose, CA, USA, Mar. 2000, pp. 173–179.
- [35] K. Sankaralingam, R. R. Oruganti, and N. A. Touba, "Static compaction techniques to control scan vector power dissipation," in *Proc. IEEE VLSI Test Symp.*, Montreal, QC, Canada, Apr./May 2000, pp. 35–40.
- [36] W. L. Bircher and L. K. John, "Core-level activity prediction for multicore power management," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 1, no. 3, pp. 218–227, Sep. 2011.
- [37] J. Saxena *et al.*, "A case study of IR-drop in structured at-speed testing," in *Proc. Int. Test Conf.*, Charlotte, NC, USA, Sep./Oct. 2003, pp. 1098–1104.
- [38] S. Kundu, T. M. Mak, and R. Galivanche, "Trends in manufacturing test methods and their implications," in *Proc. Int. Test Conf.*, Charlotte, NC, USA, Oct. 2004, pp. 679–687.
- [39] K. Miyase *et al.*, "Effective IR-drop reduction in at-speed scan testing using distribution-controlling X-identification," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design*, San Jose, CA, USA, Nov. 2008, pp. 52–58.
- [40] N. A. Touba, "Survey of test vector compression techniques," *IEEE Des. Test Comput.*, vol. 23, no. 4, pp. 294–303, Apr. 2006.
- [41] Design Compiler User Guide, Version I-2013.12-SP1. Synopsys Inc., Mountain View, CA, USA, Jan. 2014.
- [42] DFT Compiler User Guide, Version I-2013.12-SP1. Synopsys Inc., Mountain View, CA, USA, Jan. 2014.
- [43] TetraMAX ATPG User Guide, Version I-2013.12-SP4. Synopsys Inc., Mountain View, CA, USA, May 2014.
- [44] IC Compiler User Guide, Version H-2013.03-ICC-SP2. Synopsys Inc., Mountain View, CA, USA, May 2013.
- [45] Z. Zhang and R. D. McLeod, "An efficient multiple scan chain testing scheme," in *Proc. 6th Great Lakes Symp. VLSI*, Ames, IA, USA, Mar. 1996, pp. 294–297.



**Sungyoul Seo** received the B.S. degree in electronic engineering from Kwangwoon University, Seoul, South Korea, in 2013. He is currently pursuing the combined Ph.D. degree with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul.

His research interests include design for testability, scan-based testing, test data compression, low power testing, logic and memory testing, and built-off self test.



**Keewon Cho** received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2013, where he is currently pursuing the combined Ph.D. degree with the Department of Electrical and Electronic Engineering.

His current research interests include built-in self-repair, built-in self-testing, built-in redundancy analysis, redundancy analysis algorithms, reliability, and built-off self test.



**Young-Woo Lee** received the B.S. degree in electronic engineering from Inha University, Incheon, South Korea, in 2011, and the M.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, where he is currently pursuing the Ph.D. degree with the Department of Electrical and Electronic Engineering.

He was an Application Engineer with Teradyne, Seoul. His current research interests include SOC design and testing, test methodology, built-in redundancy analysis, and design for testability.



**Sungho Kang** (M'89–SM'15) received the B.S. degree in control and instrumentation engineering from Seoul National University, Seoul, South Korea, in 1986, and the M.S. and Ph.D. degrees in electrical and computer engineering from The University of Texas at Austin, Austin, TX, USA, in 1988 and 1992, respectively.

He was a Research Scientist with the Schlumberger Laboratory for Computer Science, Schlumberger Inc., Austin, TX, USA, and a Senior Staff Engineer with Semiconductor Systems Design Technology, Motorola Inc., Austin, TX, USA. Since 1994, he has been a Professor with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul. His current research interests include VLSI/SOC design and testing, design for testability, design for manufacturability, and fault tolerant computing.