IDEC Journal of Integrated Circuits And Systems jicas.idec.or.kr



Volume 5 • Number 4 • October 2019 ISSN -2384-2113 (Online)

IC DESIGN EDUCATION CENTER






# IDEC Journal of Integrated Circuits And Systems


# Editorial Committee

### **Editor in Chief**

In Cheol Park KAIST icpark@kaist.edu

### **Associate Editors**

Byung In Moon Kyungpook National University bihmoon@knu.ac.kr

Byung Sub Kim Pohang University of Science and Technology byungsub@postech.ac.kr

Goang Seog Choi Chosun University gschoigs@chosun.ac.kr

Hyoung Ho Ko Chungnam National University hhko@cnu.ac.kr

Jae Ha Kim Seoul National University jaeha@snu.ac.kr

Ji Hoon Kim Ewha Womans University jihoonkim@ewha.ac.kr

Kang Yoon Lee Sungkyunkwan University klee@skku.edu

Kwang Sub Yoon Inha University ksyoon@inha.ac.kr

Seung Tak Ryu KAIST stryu@kaist.ac.kr

Tae Wook Kim Yonsei University taewook.kim@yonsei.ac.kr



### **Overview**

Since its launch in the spring of 2015, the IDEC Journal of Integrated Circuits And Systems (JICAS) has mainly covered integrated circuit design research results from IDEC's MPW program. JICAS selects the best research papers from among all final reports, with the goal of improving the quality of the MPW program's research results. It aims to archive and share IDEC's integrated circuit design research.

# Editorial Assistant

Kyung Ok Lee (IC Design Education Center) 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea Tel : 82-42-350-8533 Fax : 82-42-350-8540 E-mail : kyungoklee@idec.or.kr

IDEC Journal of Integrated Circuits And Systems is published quarterly by the IC Design Education Center. Responsibility for the contents rests upon the authors and its members, not upon IDEC.




## A Fully-Integrated High-Voltage Generation IC for Implantable Medical Devices

Myeong Gyu Song and Hyouk Kyu Cha

## A Low-Power 8-bit Switched Capacitor Convolution Engine Optimized for Artificial Neural Networks in 65nm CMOS

Beom Kyu Seo and Jin Tae Kim

## Implementation of RVDT Signal Conditioner Based-on Costas Loop

Yong Heum Yeon, Seung Won Yang and Jong Yeol Lee

## Security SoC Architecture with Hardware-Based Pre-Authentication

Won Bae Kong, Pil Joo Choi and Dong Kyue Kim

## Two-negative feedback loop PLL with frequency voltage converter loop

Dae Hyun Moon and Young Shig Choi

# A Fully-Integrated High-Voltage Generation IC for Implantable Medical Devices

### Myeong Gyu Song<sup>1</sup> and Hyouk Kyu Cha<sup>a</sup>

Department of Electrical and Information Engineering, Seoul National University of Science and Technology E-mail : <sup>1</sup>aud1832@naver.com

Abstract - This work presents the design of a fully-integrated high-voltage charge pump IC for implantable medical devices in a 0.18-$\mu$m CMOS process. The charge pump generates a high-voltage DC supply of around 12.8 V for the neural stimulator circuit from a 3.2-V input, using on-chip pumping and load capacitors. The proposed hybrid charge pump IC comprises a high-efficiency feed-forward capacitive pumping path and an input-voltage-modulated feedback regulation path that maintains the output voltage for load currents of up to 300 $\mu$A. The proposed IC achieves around 46% power efficiency at the maximum load current.

*Keywords*—Charge pump, High-voltage, Implant device, Neural stimulation

### I. INTRODUCTION

Bidirectional implantable neural stimulation and recording can be used in the medical treatment of neural disorders such as deafness, blindness, and motion disorders [1]-[4]. By recording and analyzing neural signals before and/or after the stimulation, customized stimulation parameters can be determined for each individual, and the effects of the stimulation can be better understood. To enable this, both neural recording and stimulation circuits have to be provided for each electrode and integrated in a single IC with minimized area to achieve a small form factor for the overall medical implant device. In multi-array systems, small area and low power become important requirements that need to be addressed in the design process. The basic idea of stimulation is to deliver a controlled amount of charge to the tissue through the electrode and then recover it. However, due to the high impedance of the electrode-tissue interface, a high voltage compliance is needed to deliver a sufficient amount of charge. On the other hand, a low supply voltage is preferred for the neural recording circuits to avoid excessive power consumption. As several supply voltages are required within the IC, a fully-integrated high-voltage generation charge pump (HVGCP) circuit is included to generate the DC supply voltage for the stimulator circuits inside the IC so that the number of pins can be reduced. The HVGCP should be designed for high efficiency, considering the limited power that can be wirelessly transmitted to the implant from the external device. In addition, the HVGCP should be fully integrated so that external on-board capacitors, which increase the cost and the overall form factor of the implant device, are not needed. Besides an efficient feedforward charge pump, a regulation function is required to ensure that the output voltage does not change with varying load current. Among several schemes, the most widely used method is pulse frequency modulation (PFM) [5], [6], which controls the pumping clock frequency depending on the load condition. Although this method has been proven to work well, its efficiency can be degraded by switching loss, because the pumping frequency is high at heavy load current. This method also has limited load regulation performance at minimal load current.

Fig. 1. Block diagram of HVGCP IC

This work proposes a fully-integrated highly-efficient HVGCP employing a hybrid Dickson and Cockcroft-Walton four-stage charge pump core with an input voltage modulated [7] regulation loop to generate a reliable high voltage supply for the neural stimulator. The rest of this paper is organized as follows. The system architecture is presented in Section II. Section III describes the circuit design in detail and Section IV presents the experimental results. The conclusions are given in Section V.

### II. ARCHITECTURE

Figure 1 shows the architecture of the proposed system. This work includes the HV generation charge pump, regulation circuits, a non-overlap clock generator, and peripheral bias circuits. The neural stimulation circuit that follows, which will use the charge pump output voltage as its supply voltage, is assumed to have a stimulation current ranging from a few tens of microamperes to a maximum of 300 microamperes. The target output voltage of the charge pump is set to 12.8 V [8].

Fig. 2. Conventional charge pump architectures (feedforward path) (a) Dickson charge pump (b) Symmetric Dickson charge pump (c) Cockcroft-Walton charge pump (d) Symmetric Cockcroft-Walton charge pump

a. Corresponding author; hkcha@seoultech.ac.kr

Manuscript Received Sep. 02, 2019, Revised Sep. 09, 2019, Accepted Sep. 20, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (<u>http://creativecommons.org/licenses/by-nc/3.0</u>) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A 3.2-V DC voltage is first applied to the input of the low dropout (LDO) regulator, and after some voltage drop, the LDO output voltage is pumped to the 12.8 V output by the four-stage charge pump core. A four-stage hybrid charge pump core circuit based on the Dickson and Cockcroft-Walton architectures is proposed for the feedforward path to generate the high-voltage output with good power efficiency using a fixed 40 MHz pumping clock. For the regulation function, a feedback path consisting of resistors R<sub>F1</sub> and R<sub>F2</sub>, the error amplifier EA, and the LDO block is used to monitor and maintain the output voltage under varying output load current. The charge pump output voltage is fed back through the resistive divider to the error amplifier, and the output of this error amplifier is applied as the reference voltage of the LDO regulator circuit to control the charge pump core input voltage V<sub>IN CP</sub> depending on the load current. More details regarding the circuit design are discussed in the next section.



Fig. 3. (a) Proposed architecture of hybrid four-stage charge pump with two-stage Dickson and two-stage Cockcroft-Walton topologies (b) Circuit schematic of symmetrical latched core pumping circuit

### III. CIRCUIT DESIGN

Figure 2 shows well-established charge pump topologies that have been proposed previously. The Dickson architecture [9] in Fig. 2(a) and (b) is well known for its good efficiency but is limited in achieving a small area, as MOSCAPs cannot be used as the pumping capacitors in the later stages due to possible breakdown. On the other hand, the Cockcroft-Walton architecture [10] in Fig. 2(c) and (d) can employ MOSCAPs as its pumping capacitors without reliability issues, and thus high integration can be achieved. However, the Cockcroft-Walton architecture is known to have poorer power efficiency than the Dickson architecture [11].

Fig. 4. Circuit schematic of LDO circuit

Figure 3 shows the proposed four-stage charge pump core circuit. A hybrid four-stage topology using a two-stage Dickson architecture (Stages 1 and 2) and a two-stage Cockcroft-Walton architecture (Stages 3 and 4) is employed, which allows both metal-insulator-metal (MIM) capacitors and MOSCAPs to be used as pumping capacitors without reliability issues while achieving relatively good efficiency and a small layout area. The capacitors C<sub>1</sub> to C<sub>4</sub> are realized using MOSCAPs with values of 20 pF for C<sub>1</sub> and C<sub>2</sub>, and 10 pF for C<sub>3</sub> and C<sub>4</sub>. The capacitors C<sub>5</sub> to C<sub>8</sub> are MIM capacitors of 10 pF each. An MIM capacitor of 50 pF is used for the load capacitance. 3.3-V thick-oxide transistors are used as pumping switches in a cross-coupled connection. The core PMOS transistors have dimensions of 18 µm/0.36 µm and the NMOS transistors 10.8 µm/0.36 µm. The body terminal of each switch is always connected to the lower of its drain and source voltages (the higher for the PMOS transistors). A dynamic body bias circuit is utilized for this purpose. Taking the body bias circuit for the upper-left PMOS core transistor as an example, if the source terminal of the upper body bias transistor is at a higher potential than the source terminal of the lower body bias transistor, the lower body bias transistor is turned OFF while the upper body bias transistor is turned ON. This creates a short path between the source terminal of the upper body bias transistor and the body terminal of the core PMOS transistor, which connects the body terminal to the higher potential. A switch implemented using a deep n-well NMOS transistor needs two pairs of dynamic body bias transistors to make sure its substrate and n-well are always connected to the correct voltages during operation.
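The body-connection rule described above can be summarized behaviorally. The following is a minimal sketch of the selection rule only (higher terminal for PMOS, lower terminal for NMOS); the function name `body_voltage` and the example node voltages are illustrative assumptions and are not taken from the design.

```python
def body_voltage(v_source: float, v_drain: float, device: str) -> float:
    """Return the potential the dynamic body-bias pair ties the body to."""
    if device == "pmos":
        # PMOS body must sit at the higher of the two terminals.
        return max(v_source, v_drain)
    if device == "nmos":
        # NMOS body must sit at the lower of the two terminals.
        return min(v_source, v_drain)
    raise ValueError("device must be 'pmos' or 'nmos'")


# Hypothetical node voltages during one pumping phase (illustrative only).
print(body_voltage(6.4, 3.2, "pmos"))
print(body_voltage(6.4, 3.2, "nmos"))
```

Keeping the body at the extreme terminal in each phase keeps the source/drain-to-body junctions from being forward biased while the terminals swap roles between clock phases.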

The well-known output voltage of a conventional Dickson charge pump circuit is given by:

$$V_{OUT} = (V_{IN} - V_t) + N \cdot \left[V_{\Phi} \frac{C_p}{C_p + C_s} - V_t - \frac{I_L}{f_{OSC}(C_p + C_s)}\right] \tag{1}$$

where $V_{IN}$ is the input voltage of the charge pump, $N$ is the number of pumping stages, $V_t$ is the threshold voltage of the MOS transistor, $V_{\Phi}$ is the pumping clock signal amplitude, $f_{OSC}$ is the pumping clock frequency, $C_p$ is the capacitance of the pumping capacitor, $C_s$ is the stray capacitance, and $I_L$ is the load current.

For the latched symmetrical charge pump used in this design, Eq. (1) can be reduced to [7]:

$$V_{OUT} = (N+1)V_{IN} - N\left(\frac{I_L}{2f_{OSC}C_p}\right) \tag{2}$$

where both $V_t$ and $C_s$ are assumed to be zero or negligible and the voltage level of $V_{\Phi}$ is equal to $V_{IN}$. Eq. (2) shows that, as the load current changes, either the clock frequency or the input voltage can be controlled to regulate the output voltage. The disadvantage of controlling the clock frequency is that at very light load ($I_L \approx 0$), output regulation may not be achieved because the second term of Eq. (2) vanishes and the output no longer depends on the clock frequency. In addition, the efficiency may degrade at heavy load because the higher pumping frequency increases the dynamic switching loss. Thus, in this work, the input voltage modulation method is chosen to maintain the output voltage under varying load current. Input voltage modulation can achieve good efficiency for the charge pump because it uses a fixed low-frequency clock, and it is also advantageous in terms of load regulation performance.

Fig. 5. (a) Post-layout transient simulation plot of designed charge pump with load current switching between 0 and 300 $\mu$A (b) ripple characteristic of the output voltage

Fig. 6. Post-layout simulated output voltage and power efficiency versus change in load current

Fig. 7. (a) Chip micrograph and (b) Chip-on-board (COB) packaged IC and measurement PCB
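To make the input-voltage-modulation idea concrete, Eq. (2) can be inverted for the core input voltage needed to hold a target output. The sketch below is an idealized illustration, not the authors' design procedure: it assumes a single effective pumping capacitance of 10 pF for all stages (the actual design mixes 20 pF and 10 pF capacitors), zero threshold voltage and stray capacitance as in Eq. (2), and hypothetical function names `v_out` and `v_in_required`.

```python
def v_out(v_in, i_load, n_stages=4, f_osc=40e6, c_p=10e-12):
    """Eq. (2): ideal output of an N-stage latched symmetrical charge pump."""
    return (n_stages + 1) * v_in - n_stages * i_load / (2 * f_osc * c_p)


def v_in_required(v_target, i_load, n_stages=4, f_osc=40e6, c_p=10e-12):
    """Invert Eq. (2) for the core input voltage that holds v_target."""
    drop = n_stages * i_load / (2 * f_osc * c_p)
    return (v_target + drop) / (n_stages + 1)


# Sweep the load current at the fixed 40 MHz clock (c_p = 10 pF is an assumption).
for i_load in (0.0, 100e-6, 300e-6):
    v_in = v_in_required(12.8, i_load)
    print(f"I_L = {i_load * 1e6:5.0f} uA -> V_IN_CP = {v_in:.3f} V")
```

Under these assumptions the required core input rises from about 2.56 V at no load to about 2.86 V at 300 µA, which stays below the 3.2 V supply and so leaves room for the LDO dropout. The clock frequency never changes, which is the point of the scheme.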





Figure 4 presents the capacitor-less LDO regulator circuit for sub-charge pump 1. The LDO circuit comprises a power transistor $M_P$, feedback resistors $R_{L1}$ and $R_{L2}$, an error amplifier, a buffer, and a limiter. The limiter circuit is included to limit the current in $M_P$ during the early phase of charge pump start-up, when the pump draws a large amount of current. Much attention is given to meeting the stability requirement of the LDO, especially under light load at the charge pump output. Miller compensation is used in the error amplifier to meet the stability conditions. The non-overlapping clock generation circuit uses NAND gates and a chain of inverters to generate the CLK1 and CLK2 signals for charge pumping.
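The non-overlapping clock generator is described only at the block level, so the following is a behavioral sketch under an assumed textbook topology: two cross-coupled NAND gates whose outputs are fed back to each other through delay chains, followed by output inverters. The function name `non_overlap_clocks`, the discrete time step, and the parameter values are all hypothetical; the actual gate-level circuit may differ.

```python
from collections import deque


def non_overlap_clocks(n_steps, half_period, delay):
    """Behavioral model: cross-coupled NANDs with delayed feedback (delay >= 1)."""
    a, b = True, True                              # NAND outputs (active low)
    a_line = deque([True] * delay, maxlen=delay)   # inverter-chain delay on a
    b_line = deque([True] * delay, maxlen=delay)   # inverter-chain delay on b
    clk1, clk2 = [], []
    for t in range(n_steps):
        clk = (t // half_period) % 2 == 1          # input clock
        a_delayed, b_delayed = a_line[0], b_line[0]
        a_next = not (clk and b_delayed)           # NAND(clk, delayed b)
        b_next = not ((not clk) and a_delayed)     # NAND(~clk, delayed a)
        a_line.append(a)
        b_line.append(b)
        a, b = a_next, b_next
        clk1.append(not a)                         # output inverters: active-high phases
        clk2.append(not b)
    return clk1, clk2


clk1, clk2 = non_overlap_clocks(n_steps=40, half_period=10, delay=3)
overlap = sum(p and q for p, q in zip(clk1, clk2))
dead = sum((not p) and (not q) for p, q in zip(clk1, clk2))
print("overlapping steps:", overlap)
print("dead-time steps:", dead)
```

In this model one phase can only rise after the delayed copy of the other phase has fallen, so each input clock edge is followed by a short interval in which both phases are low. That dead time is what prevents both sets of pumping switches from conducting at once.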

Figure 5 shows the post-layout transient simulation of the designed regulated charge pump with the load current switching between 0 and 300 $\mu$A. Owing to the regulation loop, the drop in the output voltage is minimal and the output is maintained at the required voltage.

Figure 6 plots the post-layout simulated output voltage and power conversion efficiency versus load current. At 300 $\mu$A load current, the efficiency is 46.1%, and the maximum efficiency is over 50% at 400 $\mu$A load current.

At light load current, the power loss in the charge pump is dominated by the bias currents of the LDO/EA circuit and the voltage drops across the power transistor of the LDO and the pumping switches. As the load current increases, the effect of the bias current on the efficiency is reduced and the voltage drop across the LDO power transistor also decreases, which improves the efficiency.
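The reported efficiency can be translated into input power and supply current with simple arithmetic. The sketch below uses only figures quoted in the text; it assumes the output sits exactly at 12.8 V at full load, which is an idealization because the regulated output may be slightly lower there.

```python
V_IN = 3.2          # V, supply input (from the text)
V_OUT = 12.8        # V, target output (assumed to hold at full load)
I_LOAD = 300e-6     # A, maximum load current (from the text)
EFFICIENCY = 0.461  # post-layout simulated efficiency at 300 uA

p_out = V_OUT * I_LOAD        # power delivered to the stimulator
p_in = p_out / EFFICIENCY     # power drawn from the 3.2 V input
i_in = p_in / V_IN            # corresponding input current
p_loss = p_in - p_out         # power lost in LDO, switches, and bias circuits

print(f"P_out  = {p_out * 1e3:.2f} mW")
print(f"P_in   = {p_in * 1e3:.2f} mW")
print(f"I_in   = {i_in * 1e3:.2f} mA")
print(f"P_loss = {p_loss * 1e3:.2f} mW")
```

For reference, an ideal lossless 4x step-up would draw 4 x 300 µA = 1.2 mA from the input, so the gap to the computed input current reflects the losses discussed above.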

### IV. EXPERIMENTAL RESULTS

The proposed charge pump IC is fabricated in a 0.18 µm standard CMOS process. The chip micrograph and a photograph of the PCB measurement board are shown in Fig. 7(a) and (b), respectively. The core die area is 0.53 mm<sup>2</sup>. The IC is packaged using the chip-on-board method and soldered on a two-layer FR4 PCB. The 40 MHz pumping clock is applied externally using an arbitrary waveform generator.

Figure 8 shows the measured output voltage at the no-load condition. The input voltage of 3.2 V is boosted up to around 12.8 V. However, under heavier load, the regulation performance is observed to be degraded compared to the simulation results. Some additional leakage current within the IC and/or PCB is suspected to have caused this degradation. Improved layout and PCB design in the future should lessen it. Table I presents the performance summary of this work (post-layout simulation values are used for comparison) and compares it to previously reported charge pump ICs with integrated pumping and load capacitance. As the applications and requirements of the charge pumps all differ, a fair comparison is difficult. In summary, this work provides an HV output with good load regulation performance, small area, and comparable power efficiency.

| Parameter                        | This work      | [11]        | [12]        |
|----------------------------------|----------------|-------------|-------------|
| Number of stages                 | 4              | 4, 6        | 3, 5        |
| Input voltage (V)                | 3.2            | 1           | 1.8         |
| Output voltage (V)               | 12.8           | 3-6         | 5-8.5       |
| Ripple voltage (mV)              | 74*            | 40          | -           |
| Pumping cap. (pF)                | 10/20          | 6           | 2.5         |
| Load cap. (pF)                   | 50             | 54          | 30          |
| Max. load current (µA)           | 300            | 240         | 400         |
| Switching freq. (Hz)             | 40 M           | 10 k-20 M   | 100 M       |
| Power efficiency (%) @ Vout, max | 46.1 @ 300 µA* | 52 @ 240 µA | 46 @ 200 µA |
| Load regulation (V/mA)           | 0.77*          | 2.78        | 3.75        |
| Area (mm<sup>2</sup>)            | 0.53           | 0.5         | 1.3         |
| Process                          | 180 nm CMOS    | 180 nm CMOS | 180 nm CMOS |

Table I. Performance Summary

\*Post-sim. results
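The load-regulation figures in Table I can be read as an approximate output droop over each design's full load range. The sketch below assumes the droop is linear from zero to the maximum load current, which the source does not state; the helper name `droop_volts` is hypothetical, and the numbers are copied from Table I.

```python
# (load regulation in V/mA, maximum load current in A), values from Table I
designs = {
    "This work": (0.77, 300e-6),
    "[11]": (2.78, 240e-6),
    "[12]": (3.75, 400e-6),
}


def droop_volts(load_reg_v_per_ma, i_max_a):
    """Output droop at full load, assuming a linear V-I characteristic."""
    return load_reg_v_per_ma * (i_max_a * 1e3)


for name, (load_reg, i_max) in designs.items():
    print(f"{name:10s}: ~{droop_volts(load_reg, i_max):.3f} V droop at full load")
```

Because the three designs target different output voltages, the absolute droop is only a rough point of comparison, in line with the caveat above that a fair comparison is difficult.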

### V. CONCLUSIONS

A high-voltage hybrid-core charge pump IC with input-voltage-modulated regulation is proposed for neural stimulation applications using a 180 nm standard CMOS process. The proposed IC outputs 12.8 V from a 3.2 V input, supports load currents up to 300 $\mu$A, and achieves, in post-layout simulation, 0.77 V/mA load regulation and 46.1% power efficiency.

### ACKNOWLEDGMENT

This research was partly supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (MSIT), (NRF-2018R1C1B6003088) and Institute for Information & Communications Technology Promotion (IITP) grant funded by MSIT (No. 2017-0-00659). This work was also supported by IDEC for EDA Tool and MPW support.

### REFERENCES

- [1] K. Chen, Z. Yang, L. Hoang, J. Weiland, M. Humayun, and W. Liu, "An integrated 256-channel epiretinal prosthesis," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1946–1956, Sep. 2010.
- [2] H.-M. Lee, K. Y. Kwon, W. Li, and M. Ghovanloo, "A power-efficient switched-capacitor stimulating system for electrical/optical deep brain stimulation," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 360–374, Jan. 2015.
- [3] M. Yip, R. Jin, H. H. Nakajima, K. M. Stankovic, and A. P. Chandrakasan, "A fully-implantable cochlear implant SoC with piezoelectric middle-ear sensor and arbitrary waveform neural stimulation," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 214–229, Jan. 2015.
- [4] R. R. Harrison, P. T. Watkins, R. J. Kier, R. O. Lovejoy, D. J. Black, B. Greger, and F. Solzbacher, "A low-power integrated circuit for a wireless 100-electrode neural recording system," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 123–133, Jan. 2007.
- [5] W.-M. Chen *et al.*, "A fully integrated 8-channel closed-loop neural-prosthetic CMOS SoC for real-time epileptic seizure control," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 232–247, Jan. 2014.
- [6] J. Zhao *et al.*, "An integrated wireless power management and data telemetry IC for high-compliance-voltage electrical stimulation applications," *IEEE Trans. on Biomed. Circuits and Syst.*, vol. 10, no. 1, pp. 113–124, Feb. 2016.
- [7] A. Abdi, H. S. Kim, and H.-K. Cha, "A high-voltage generation charge pump IC using input voltage modulated regulation for neural implant devices," *IEEE Trans. on Circuits and Syst. II*, vol. 66, no. 3, pp. 342–346, Mar. 2019.
- [8] A. Abdi and H.-K. Cha, "A bidirectional neural interface CMOS analog front-end IC with embedded isolation switch for implantable devices," *Elsevier Microelectronics J.*, vol. 58, no. 12, pp. 70–75, Dec. 2016.
- [9] J. F. Dickson, "On-chip high-voltage generation in MNOS integrated circuits using an improved voltage multiplier technique," *IEEE J. Solid-State Circuits*, vol. SC-11, no. 3, pp. 374–378, Nov. 1976.
- [10] J. Cockcroft and E. Walton, "Production of high velocity positive ions," *Proc. Royal Society of London, Series A*, vol. 136, pp. 619–630, 1932.
- [11] J.-H. Tsai *et al.*, "A 1 V input, 3 V-to-6 V output, 58%-efficient integrated charge pump with a hybrid topology for area reduction and an improved efficiency by using parasitics," *IEEE J. Solid-State Circuits*, vol. 50, no. 11, pp. 2533–2548, Nov. 2015.
- [12] R. Pelliconi *et al.*, "Power efficient charge pump in deep submicron standard CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 6, pp. 1068–1071, Jun. 2003.



**Myeong Gyu Song** received the B.S. and M.S. degrees from the Department of Electrical and Information Engineering at Seoul National University of Science and Technology, Seoul, Korea, in 2017 and 2019, respectively. In 2019, he joined Hideep Inc., Seongnam-si, Korea, as an analog IC design engineer.

His research interests are CMOS analog IC design for both biomedical and consumer applications.



**Hyouk Kyu Cha** received the B.S. and Ph.D. degrees from the Department of Electrical Engineering and Computer Science at Korea Advanced Institute of Science and Technology (KAIST), in Daejeon, Korea, in 2003 and 2009, respectively.

From 2009 to 2012, he was with the Institute of Microelectronics (IME), Agency for Science, Technology, and Research (A\*STAR), Singapore, as a Scientist, where he was involved in the research and development of RF/analog ICs for biomedical applications.

Since 2012, he has been with the Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul, Korea, where he is now an Associate Professor.

His research interests and areas are CMOS analog/RF IC and system design for implantable biomedical devices.

# A Low-Power 8-bit Switched Capacitor Convolution Engine Optimized for Artificial Neural Networks in 65nm CMOS

### Beom Kyu Seo<sup>1</sup> and Jintae Kim<sup>a</sup>

Department of Electronics Engineering, Konkuk University E-mail : <sup>1</sup>bk.seo@msel.konkuk.ac.kr

Abstract - In this paper, we present a study on a neural network operator that performs low-resolution, low-power, and high-efficiency convolution operations in the analog domain. The proposed operator consists of a multiplying DAC (MDAC) with an integrator structure and a successive-approximation ADC (SAR ADC). The memory access frequency is lower than that of a digital implementation because the addition is performed together with the multiplication, and the information is stored in the form of charge on the opamp output terminal. A digital-input, digital-output operator consisting of the MDAC and ADC was designed in a 65nm CMOS process. Transistor-level simulation shows a power consumption of 30.11 uW at 33.3 MHz, which is equivalent to 2.21 TOPS/W, a better power efficiency than that of a conventional digital convolution operator.

*Keywords*—Convolutional Neural Network, Deep Learning, Switched-Capacitor

### I. INTRODUCTION

With the advent of deep learning technology, the advancement of artificial intelligence has accelerated. In addition to demonstrating excellent performance in image processing, deep learning has evolved rapidly since it exceeded human recognition rates in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 [1]. Convolution accounts for most of the operations in the convolutional neural networks (CNNs) that underlie deep learning, and GPUs are used to massively parallelize these operations. As a by-product of continuous research to improve the recognition rate, the complexity and the computational requirements of neural networks are steadily increasing. For example, AlexNet [2], released in 2012, consisted of only eight hidden layers, whereas GoogLeNet [3], released in 2014, had 22 hidden layers and ResNet [4], released in 2015, had up to 152 hidden layers.

The rest of this paper is organized as follows. Section II describes the overall architecture of the operator and the operation of the analog integrator-based circuit. Section III analyzes the power consumption and performance through SPICE simulation of the composite multiplier designed in a 65nm CMOS process. Section IV concludes the paper by summarizing the merits of the proposed circuit.

a. Corresponding author; jintae.kim@msel.konkuk.ac.kr

Manuscript Received Jun. 13, 2019, Revised Sep. 23, 2019, Accepted Sep. 30, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (<u>http://creativecommons.org/licenses/by-nc/3.0</u>) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### II. OVERALL ARCHITECTURE OF CONVOLUTION ENGINE

### A. Circuit Structure

Two multiplying DACs (MDACs) are pipelined as shown in Figure 1-(a). An ADC is added to this structure, which multiplies the data and the weight, to complete the operator. An 8-bit switched-capacitor architecture suitable for low-power operator design is adopted as the analog operation unit, and a relatively simple, low-power inverter-based amplifier is used as the opamp. Inverter-based amplifiers have been used in low-power ADCs, including audio ADCs [9]. Finally, the ADC is a successive-approximation ADC (SAR ADC), which has low power consumption and scales well to future process nodes [10].



Fig. 1. (a) Operation circuit with 8-bit input and output, (b) Timing diagram of operation circuit

The operation circuit works as follows. In each period of the two-phase non-overlapping clock, a data-weight pair is multiplied and the result is stored in the form of charge at the output of the multiply-accumulate (MAC) DAC. From the second operation onward, each product is added to the previously accumulated result at the MAC DAC output stage. As shown in Figure 1-(b), the SAR ADC operates once four products have been accumulated in this manner.
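The accumulate-then-convert sequence can be sketched behaviorally. The model below is an idealization rather than the circuit itself: it works with normalized values in the range -0.5 to +0.5 given for the data and weight in the text, ignores the analog gain, uses an ideal signed quantizer in place of the SAR ADC, and assumes the accumulator is reset after each conversion. The function name `mac_then_convert` is hypothetical.

```python
def mac_then_convert(data, weights, n_bits=8, group=4):
    """Accumulate `group` products, then quantize the sum once (ideal ADC)."""
    full_scale = group * 0.25          # largest |sum| when |x|, |w| <= 0.5
    levels = 2 ** (n_bits - 1) - 1     # signed code range, +/-127 for 8 bits
    codes = []
    acc = 0.0
    for k, (x, w) in enumerate(zip(data, weights), start=1):
        acc += x * w                   # product added as charge at the MAC DAC output
        if k % group == 0:             # conversion after every fourth product
            codes.append(round(acc / full_scale * levels))
            acc = 0.0                  # assumed reset before the next group
    return codes


print(mac_then_convert([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]))
print(mac_then_convert([0.5, -0.5, 0.25, 0.0], [0.5, 0.5, 0.5, 0.5]))
```

Only one conversion is needed per four multiply-accumulate steps, so the intermediate sums stay in the analog domain and never have to be written back to memory.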

Figure 2-(a) shows the internal circuit of the multiplying DAC in Figure 1. A pseudo-differential switched-capacitor structure is adopted, which takes a digital input value and applies the corresponding voltage to the amplifier output stage. The unit capacitor $C_u$ used to realize the 8-bit DAC operation is 748 aF, the minimum size provided in the process. The LSB sampling capacitor of each multiplying DAC uses two $C_u$ in series, which halves the size of the entire sampling capacitor. As shown in Figure 2-(b), the opamp is implemented as an inverter-based circuit. Its similarity to digital circuits is also an advantage for scaling to future process nodes.

The result of a single multiplication operation is the differential output $V_{DIFF,OUT}$,

$$V_{DIFF,OUT} = \frac{C_{sample}^2 \, V_{DD} \, X \, W}{C_{feedback}^2} \tag{1}$$

which is formed as a differential voltage at the output terminal of the MAC DAC. The full sampling capacitor $C_{sample}$ of the multiplying DAC is 95.7 fF and the feedback capacitor $C_{feedback}$ is 74.8 fF. $X$ and $W$ are the normalized values of the input data and the weight of the multiplication, each corresponding to the range -0.5 to +0.5.
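As a quick numerical check of Eq. (1), the capacitor values quoted in the text can be plugged in directly. The sketch assumes $V_{DD}$ = 1.2 V, taken from the supply voltage listed for this work in the performance comparison table; the function name `v_diff_out` is hypothetical.

```python
C_UNIT = 748e-18        # F, unit capacitor (from the text)
C_SAMPLE = 95.7e-15     # F, full sampling capacitor (from the text)
C_FEEDBACK = 74.8e-15   # F, feedback capacitor (from the text)
V_DD = 1.2              # V, assumed from the supply voltage in the comparison table


def v_diff_out(x, w):
    """Eq. (1): differential output for normalized data x and weight w."""
    gain = (C_SAMPLE / C_FEEDBACK) ** 2
    return gain * V_DD * x * w


print(f"C_sample / C_u   = {C_SAMPLE / C_UNIT:.1f}")
print(f"C_feedback / C_u = {C_FEEDBACK / C_UNIT:.1f}")
print(f"V_DIFF,OUT at x = w = 0.5: {v_diff_out(0.5, 0.5):.3f} V")
```

The sampling capacitor works out to roughly 128 unit capacitors, which is consistent with halving a 256-unit 8-bit array by building the LSB from two unit capacitors in series. At full-scale inputs the differential output is close to 0.49 V under these assumptions.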

### B. SAR ADC

The ADC operates only after all MAC operations have been completed, but it still accounts for about 10% of the total power consumption. In addition, the ADC is an overhead specific to the analog approach; a digital operator does not need it.

On the other hand, the result of the MAC operation in an artificial neural network passes through a post-processing function before being used as the input of the next hidden layer. ReLU, leaky ReLU, and Sigmoid are examples of such post-processing functions. To date, the ReLU shown in Figure 3-(a) has been recognized as the most efficient post-processing function [2]. When the ADC is designed with a differential structure, a difference between the $V_{CM}$ values at the two ends in Figure 4 causes undesirable effects such as offset, and additional techniques may be required to compensate for it [13]. However, if this effect is exploited instead, it produces the output shown in Figure 3-(b) and implements the ReLU function without additional power consumption. The differential output of the CDAC inside the ADC is

$$DAC_{OUTP} - DAC_{OUTN} = 2V_{CMP} - 2V_{CMN} + V_{INN} - V_{INP} \tag{2}$$

If $V_{CMP}$ is increased and $V_{CMN}$ is decreased by the same voltage, the nonlinear section of the CDAC output grows in proportion to the difference. Figure 3-(b) shows the SPICE-simulated characteristic of the entire operator when $V_{CMP} = 712$ mV, $V_{CMN} = 332$ mV, and $V_{CM} = 522$ mV. It confirms that when the analog output of the operator is negative, it is digitally converted to a specific DC value. To achieve this, the ADC adopts an asynchronous bottom-plate-sampling SAR ADC [14], which can provide an offset through adjustment of the $V_{CM}$ voltages.

Fig. 2. (a) Multiplying DAC internal structure, (b) Inverter based amplifier circuit

Fig. 3. (a) Post-processing function ReLU, (b) Implementation of ReLU function in ADC

Fig. 4. Back-end successive-approximation ADC (SAR ADC)
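The offset introduced by splitting the two common-mode voltages follows directly from Eq. (2). The sketch below simply evaluates that expression for the voltages quoted in the text; the function name `cdac_diff` is hypothetical, and how the offset maps onto the clipped region of Figure 3-(b) depends on circuit details that are not modeled here.

```python
def cdac_diff(v_inp, v_inn, v_cmp, v_cmn):
    """Eq. (2): differential CDAC output including the common-mode terms."""
    return 2 * v_cmp - 2 * v_cmn + v_inn - v_inp


# Equal common-mode voltages: no offset is added to the input difference.
offset_equal = cdac_diff(0.0, 0.0, v_cmp=0.522, v_cmn=0.522)

# Split common-mode voltages from the text: V_CMP = 712 mV, V_CMN = 332 mV.
offset_split = cdac_diff(0.0, 0.0, v_cmp=0.712, v_cmn=0.332)

print(f"offset with equal V_CM: {offset_equal:.3f} V")
print(f"offset with split V_CM: {offset_split:.3f} V")
```

With the quoted values the two common-mode voltages sit 190 mV above and below 522 mV, and the resulting offset term is 2 x (712 mV - 332 mV) = 760 mV.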

|                    | This work | [6]    | [16]                | [5]     | Stratix10 FPGA |
|--------------------|-----------|--------|---------------------|---------|----------------|
| Process (nm)       | 65        | 28     | 65                  | 65      | 14             |
| Operation method   | Analog    | Analog | Analog              | Digital | Digital        |
| Resolution (bit)   | 8         | 8      | Input: 7, Filter: 1 | 16      | 8              |
| Speed (Hz)         | 33.3 M    | 19.2 M | 364 M               | 250 M   | 920 M          |
| Supply voltage (V) | 1.2       | 1.0    | 1.2                 | 1.17    |                |
| Power (W)          | 30.11 µ   | 7.74 µ | 380.7 µ             | 278 m   |                |
| Efficiency (OPS/W) | 2.21 T    | 9.61 T | 28.1 T              | 302 G   | 400 G          |
| Area ($\mu m^2$)   | 0.092*    | 0.012* | 0.067*              | 16*     |                |

TABLE I. Performance comparison table

Efficiency = $\left(\frac{\text{Speed}}{\text{Power}}\right) \times (\text{number of operations in one period})$

\* Direct comparison is difficult because the number of arithmetic cores is different.
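Because the efficiency is reported in OPS/W, it corresponds to operations per second divided by power. The sketch below reproduces the figure for this work from the speed and power in the table; the value of two operations per clock period (one multiplication and one addition) is an assumption chosen because it matches the reported 2.21 TOPS/W, and the function name is hypothetical.

```python
def efficiency_ops_per_watt(speed_hz, power_w, ops_per_period):
    """Operations per second divided by power consumption."""
    return speed_hz * ops_per_period / power_w


# Speed and power for this work from the table; ops_per_period=2 is assumed.
eff = efficiency_ops_per_watt(33.3e6, 30.11e-6, ops_per_period=2)
print(f"{eff / 1e12:.2f} TOPS/W")
```

The same helper can be applied to the other columns, but the operations-per-period count for those designs is not given in the text, so no values are assumed for them here.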





### III. SIMULATION RESULT

In this paper, the operation and performance were verified through SPICE simulation. Power consumption was calculated by transistor-level PEX simulation. The operator was designed in a 65nm CMOS process and the layout is shown in Figure 5. The area of the computing core is $0.092 \ \mu\text{m}^2$.

Figure 5 shows the result of 2304 operations on all input data and some filter data from -127 to +127, and the operation error extracted from the largest output. In the simulation in Figure 6, only the multiplication operation was performed to collect data.

In some computation results and errors, an inverted staircase-type error appears every time the LSB changes. In this design, two LSB unit capacitors are connected in series to reduce the power consumption and the load impedance of the amplifier. The final computation error in the result of Figure 6 is limited to a 2 LSB range. This error does not affect the final recognition rate in artificial neural network computation: the previous study [6] confirms that recognition is maintained even with a 5 LSB operation error. Figure 8 shows the distribution of the hidden layer coefficient data of VGG-F [15], the simplest of the VGGNet artificial neural networks. Figure 7 shows that in the worst case, in which multiply-and-add results are accumulated 16 times, the error is about 4 LSB. Since the hidden layer coefficient data of an actual artificial neural network is distributed approximately normally, as shown in Figure 8, the worst case does not occur frequently.

Fig. 8. Hidden layer data distribution diagram of VGG-F artificial neural network

Table I compares the transistor-level PEX simulation results with conventional digital and analog operators. The proposed arithmetic unit is slower than the digital Stratix10 FPGA but has about 5 times higher computation efficiency. Compared with the conventional analog calculator [6], which uses a more advanced process, the proposed unit has lower power efficiency but faster computation speed. [16] has higher speed and efficiency than the arithmetic unit proposed in this paper. However, since it stores the hidden layer data in SRAM as binary data, its applicable range is limited to relatively simple data sets such as MNIST. The proposed unit can be applied to relatively complicated data sets such as CIFAR-10 because 8-bit operation is possible.

### IV. CONCLUSIONS

Artificial neural networks, which model the human brain, maintain reliable inference accuracy even at low resolution [7], and previous research shows that at low resolution the analog method is more efficient than the digital method [8]. Based on these results, we designed a low-resolution, high-efficiency artificial neural network computing circuit. The proposed arithmetic unit computes faster than the conventional analog arithmetic unit [6]. It also requires no additional energy for the addition in a MAC operation, and it saves the energy used for memory access compared with digital arithmetic units [5].

We designed a digital-input, digital-output artificial neural network 8-bit arithmetic circuit with a layout in 65nm CMOS process, a computation speed of 33.3MHz and a computation efficiency of 2.21TOPS/W.

### ACKNOWLEDGMENT

This study was carried out with support of the Nanomaterial Technology Development Project of the Ministry of Science and Technology(2016M3A7B4909668) and the support of the Industrial Innovation Technology Future Semiconductor Project (10080611) of the Ministry of Industry and Commerce.

### REFERENCES

- [1] O. Russakovsky, et al., "ImageNet Large Scale Visual Recognition Challenge", International Journal of Computer Vision, Vol. 115, Issue 3, pp. 211-252, December 2015.
- [2] A. Krizhevsky, I. Sutskever, and G. Hinton, "Imagenet classification with deep convolutional neural networks", In Advances in Neural Information Processing Systems 25, pp. 1106–1114, 2012.
- [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions", CVPR, 2015.
- [4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint arXiv: 1512.03385, 2015.
- [5] Y.H Chen and J.S Emer, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", IEEE Journal of Solid-State Circuits, VOL.52, pp. 127-138, January 2017.

- [6] D. Bankman and B. Murmann, "An 8-bit, 16 input, 3.2 pJ/op Switched-Capacitor Dot Product Circuit in 28-nm FDSOI CMOS", IEEE Asian Solid-State Circuits Conference, pp. 21-24, November 2016.
- [7] D. Miyashita, S. Kousai, T Suzuki, J. Deguchi, "Time-Domain Neural Network: A 48.5 TSOp/s/W Neuromorphic Chip Optimized for Deep Learning and CMOS Technology", IEEE Asian Solid-State Circuits Conference, pp. 25-28, November 2016.
- [8] R. Sarpeshkar, "Analog Versus Digital: Extrapolating from Electronics to Neurobiology", IEEE Neural Computation, pp. 1601-1638, October 1998.
- [9] T. Christen, "A 15-bit 140-µW Scalable-Bandwidth Inverter-Based Δ∑Modulator for a MEMS Microphone With Digital Output", IEEE Journal of Solid-State Circuits, VOL.48, pp. 1605-1614, July 2013.
- [10] H.W. Shin, J.M. Jeong, T.J. An, J.S Park, S.H. Lee, "A 0.16mm2 12b 30MS/s 0.18um CMOS SAR ADC Based on Low-Power Composite Switching", Journal of The Institute of Electronics and Information Engineers, Vol.53, NO.7, pp. 1027-1038, July 2016.
- [11] Y. Chae, G. Han, "Low Voltage, Low Power, Inverter-Based Switched-Capacitor Delta-Sigma Modulator", IEEE Journal of Solid-State Circuits, VOL.44, pp. 458-472, February 2009.
- [12] J.H. Choi, J.H. Seong, K.S. Yoon, "Design of a Inverter-Based 3rd Order ∆∑ Modulator Using 1.5bit Comparators", Journal of The Institute of Electronics and Information Engineers, Vol.53, NO.7, pp. 1039-1046, July 2016.
- [13] Y.S. Cho, H.S, Shim, S.H. Lee, "A Non-Calibrated 2x Interleaved 10b 120MS/s Pipeline SAR ADC with Minimized Channel Offset Mismatch", Journal of The Institute of Electronics and Information Engineers, Vol.52, NO.9, pp. 1631-1641, September 2015.
- [14] C-C. Liu, S.-J. Chang, G.-Y. Huang, and Y.-Z. Lin., "A 10-bit-50MS/s SAR ADC With a monotonic capacitor switching procedure", IEEE Journal of Solid-State Circuits, vol. 45, no. 4, pp. 731–740, March 2010.
- [15] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the Devil in the Details: Delving Deep into Convolutional Nets", arXiv:1405.3531, November 2014.
- [16] A. Biswas, A. P. Chandrakasan, "Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications", ISSCC, pp. 488-490, 2018.



**Beom Kyu Seo** received the B.S. and M.S. degrees in electrical engineering from Konkuk University, Seoul, Korea, in 2018. His research interests include converter circuits and neural networks for designing convolution engines. In particular, he is currently conducting research on low-power convolution engines for image processing neural networks.



**Jin Tae Kim** received the B.S. degree in Electrical Engineering from Seoul National University, Seoul, Korea, in 1997, and the M.S. and Ph.D. degrees in Electrical Engineering from University of California, Los Angeles, CA, in 2004 and 2008, respectively. He held various industry positions at Barcelona Design, CA, SiTime Corporation, CA, and Agilent Technologies, CA, as a key technical contributor for their high-speed A/D converters and timing IC products. He is currently an Associate Professor in Electronics Engineering Department at Konkuk University, Seoul, Korea, where he is focusing on low power mixed-signal IC designs for communication and sensor applications. Dr. Kim is a recipient of the IEEE Solid-State Circuits Predoctoral Fellowship in 2007.

# Implementation of RVDT Signal Conditioner Based-on Costas Loop

### Yong Heum Yeon<sup>1</sup>, Seung Won Yang and Jong Yeol Lee<sup>a</sup>

Department of Electronic Engineering Chonbuk National University E-mail : <sup>1</sup>yeon931@naver.com

Abstract - A rotary variable differential transformer (RVDT) shows superior performance compared with conventional rotation detection sensors. In this paper, we propose an RVDT signal conditioner based on the Costas loop. Because it applies digital signal processing techniques, the proposed RVDT signal conditioner needs no additional circuit for phase correction. We implement the proposed RVDT signal conditioner using the Magna Chip/SK Hynix 180 nm CMOS process. The proposed signal conditioner is successfully verified in a test environment implemented with an FPGA board and Matlab. It takes 0.045 seconds to demodulate the message signal at a sampling frequency of 160 kHz. Its area is 195470.7 equivalent gates, and its total power dissipation is 47.4 mW at a power supply voltage of 1.8 V and an operating frequency of 24 MHz.

### Keywords-DSP, LVDT, PLL, RVDT

### I. INTRODUCTION

Systems that convert input signals into useful output signals play an important role in industries that require high-precision, durable equipment with no tolerance for faults. In particular, systems that rely on changes in electric and magnetic force are affected by electromagnetic waves generated externally. High-precision devices are therefore shielded against electromagnetic waves so that they are not sensitive to such changes in the external environment. These magnetic conversion devices include the linear variable differential transformer (LVDT) and the RVDT, which measure linear displacement and angular displacement, respectively. The advantage of the RVDT is its excellent durability against vibration, impact, water, oil, dust and so on. In addition, the RVDT is widely used in industrial facilities because its non-contact structure secures reliability. RVDTs are used in various precision control systems, which require an external signal conditioner because the angular displacement of the RVDT sensor must be converted into an electrical signal. Typically, an RVDT analog signal conditioner is a separate module built on a printed circuit board (PCB) using commercial passive elements. Recently, digital signal conditioners that can be embedded in RVDT sensors have been researched [1][2][3][4].

Manuscript Received Apr. 24, 2019, Revised May. 28, 2019, Accepted Sep. 27, 2019

The RVDT consists of a primary coil, a secondary coil and an eccentric rotor. Sine waves of different magnitudes are generated in the primary and secondary coils according to the rotation of the rotor. Therefore, the rotation angle can be detected by comparing the phase difference between the primary and secondary sinusoidal waves. However, a phase mismatch between the waves can cause errors in the detected rotor position. Several methods have been proposed to compensate this phase error. This paper presents a Costas loop based RVDT signal conditioner, which finds the rotational information of the core from the output signal of an RVDT. This paper is organized as follows. Section II describes the Costas loop algorithm. Section III presents simulation results and circuit verification results for the RVDT digital signal processor, and the conclusion summarizes the improvements.

### II. ALGORITHM

An RVDT is an absolute position detector: an encoder sensor that converts the angle of one rotation into a value from 0 to 4047 and outputs high-precision position data as a binary position code or Gray code. As shown in Fig. 1, the RVDT detects the current rotational position from the phase difference between the sinusoidal waves of the primary coil and the secondary coil, which varies with the rotation of the eccentric rotor according to the magnetic induction principle of the transformer. In the RVDT, the four poles are reversely wound around the primary and secondary coils. As the core rotates, the displacement can be measured from the change of the mutual inductance induced in the primary and secondary coils [2][4].



Fig. 1. RVDT structure

a. Corresponding author; jong@jbnu.ac.kr

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

In the RVDT, the input signal m(t), which is the position signal of the rotor, is modulated with the carrier signal  $\cos(w_c t)$  and transmitted to the receiver, where its phase changes due to the external environment and the delay time. The receiving section performs demodulation using the Costas loop to compensate this phase change. As shown in Fig. 2, the Costas loop consists of a numerically controlled oscillator (NCO), three low-pass filters (LPFs) and three multipliers [5][6].



Fig. 2. Costas loop structure

Input signal s(t) of a Costas loop is

$$s(t) = m(t) \cdot \cos[w_c t + \Theta_i(t)] \tag{1}$$

The NCO has two output signals that have the same frequency as the carrier signal and the phase differences of zero and 90 degrees, respectively, as follows:

$$2\cos[w_c t + \Theta_o(t)] \tag{2}$$

$$2\sin[w_c t + \Theta_o(t)] \tag{3}$$

In order to compensate the phase difference, the signal output from the NCO is multiplied by the input signal. Outputs of MUL1 and MUL2 are calculated as follows:

$$\begin{aligned} m(t) \cdot \cos[w_c t + \Theta_i(t)] \cdot 2\cos[w_c t + \Theta_o(t)] &= m(t) \cdot \cos[2w_c t + \Theta_i(t) + \Theta_o(t)] \\ &\quad + m(t) \cdot \cos[\Theta_i(t) - \Theta_o(t)] \end{aligned} \tag{4}$$

$$\begin{aligned} m(t) \cdot \cos[w_c t + \Theta_i(t)] \cdot 2\sin[w_c t + \Theta_o(t)] &= m(t) \cdot \sin[2w_c t + \Theta_i(t) + \Theta_o(t)] \\ &\quad + m(t) \cdot \sin[\Theta_i(t) - \Theta_o(t)] \end{aligned} \tag{5}$$

The high-frequency components of (4) and (5) are removed by low-pass filtering (4) and (5) as follows.

$$i(t) = m(t) \cdot \cos[\Theta_g(t)] \tag{6}$$

$$q(t) = m(t) \cdot \sin[\Theta_g(t)] \tag{7}$$

where  $\Theta_g(t)$  is  $\Theta_i(t) - \Theta_o(t)$ . In MUL3, i(t) and q(t) are multiplied and low-pass filtered.

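$$m(t) \cdot \cos[\Theta_g(t)] \cdot m(t) \cdot \sin[\Theta_g(t)] = \tfrac{1}{2}\,(k + \Phi(t)) \cdot \sin[2\Theta_g(t)] \tag{8}$$

where  $m^2(t) = k + \Phi(t)$ , with  $k$  its constant part and  $\Phi(t)$  its time-varying part.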

Low-pass filtering (8) produces

$$(1/2)(k) \cdot \sin[2 \Theta_g(t)] \cong k \Theta_g(t) \tag{9}$$

The message signal m(t) is decoded by making the loop drive  $\Theta_g$  to zero, at which point the NCO generates a signal with the same frequency and phase as the carrier of the input signal. Since the phase is corrected automatically by the feedback through the NCO, no additional circuit for compensating the phase error is necessary.
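
As a worked illustration of (8) and (9), assume a unit-amplitude sinusoidal message  $m(t) = \sin(w_m t)$ . The symbol  $w_m$  is introduced here only for illustration, and  $k + \Phi(t)$  is read as  $m^2(t)$ . It is also assumed that LPF3 rejects the component at  $2w_m$ .

$$m^2(t) = \sin^2(w_m t) = \underbrace{\tfrac{1}{2}}_{k} + \underbrace{\left(-\tfrac{1}{2}\cos(2 w_m t)\right)}_{\Phi(t)}$$

$$\tfrac{1}{2}\,k\,\sin[2\Theta_g(t)] = \tfrac{1}{4}\sin[2\Theta_g(t)] \approx \tfrac{1}{2}\,\Theta_g(t) = k\,\Theta_g(t), \qquad |\Theta_g(t)| \ll 1$$

Under these assumptions the LPF3 output is proportional to the residual phase error and has the same sign, which is what lets the feedback pull  $\Theta_g$  toward zero.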

### III. IMPLEMENTATION

### A. Simulation

The above Costas loop is simulated by using MATLAB as in Fig. 3. The sampling frequency of the filter is 160 kHz and the number of the bits in the output of the NCO is 12 bits, which is the resolution of the Analog to Digital Converter (ADC). The system clock is generated by dividing 24 MHz to 2.4 MHz and is used for the ADC. The clock is also used for sine wave generation by dividing to 160kHz.
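
The clock relationships quoted in this section can be checked with a few integer divisions. The sketch below is only an arithmetic illustration: the constant and function names are hypothetical, the 10 kHz carrier is the value used in the simulation, and whether the 160 kHz clock is derived from the 2.4 MHz clock or directly from the 24 MHz clock is not stated, so both ratios are computed.

```python
SYSTEM_CLOCK_HZ = 24_000_000   # external clock
ADC_CLOCK_HZ = 2_400_000       # clock used for the ADC
SAMPLE_CLOCK_HZ = 160_000      # filter sampling / sine generation clock
CARRIER_HZ = 10_000            # carrier frequency used in the simulation


def divide_ratio(f_in_hz, f_out_hz):
    """Integer division ratio of a clock divider; fails if it is not exact."""
    ratio, remainder = divmod(f_in_hz, f_out_hz)
    if remainder:
        raise ValueError(f"{f_in_hz} Hz is not an integer multiple of {f_out_hz} Hz")
    return ratio


adc_divider = divide_ratio(SYSTEM_CLOCK_HZ, ADC_CLOCK_HZ)
sample_divider_from_adc = divide_ratio(ADC_CLOCK_HZ, SAMPLE_CLOCK_HZ)
sample_divider_from_system = divide_ratio(SYSTEM_CLOCK_HZ, SAMPLE_CLOCK_HZ)
samples_per_carrier_period = divide_ratio(SAMPLE_CLOCK_HZ, CARRIER_HZ)

print(adc_divider, sample_divider_from_adc,
      sample_divider_from_system, samples_per_carrier_period)
```

All ratios are exact integers, so each clock can be produced by a simple counter-based divider.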



Fig. 3. Costas loop model using Matlab

For the simulation of Fig. 3, a sine wave with a frequency of 300 Hz, shown in Fig. 4, is used as the message signal.



Fig. 4. Message signal

As shown in Fig. 5 and Fig. 6, the carrier signal has a frequency of 10 kHz and is multiplied with the message signal m(t) producing the modulated signal.



Fig. 5. Carrier signal



Fig. 6. Modulated signal

In order to demodulate the input signal in the Costas loop, the NCO generates  $2\cos[w_c t + \Theta_o(t)]$  and  $2\sin[w_c t + \Theta_o(t)]$ . The outputs of LPF1 and LPF2 are i(t) and q(t), respectively. These values are then multiplied and passed through LPF3. As shown in Fig. 7, the phase difference  $\Theta_g$  converges to zero as time passes.



Fig. 7. LPF3 output

The signal passed through LPF1 is a demodulated message signal. It can be seen that this signal is demodulated after 0.045 seconds as shown in Fig. 8.



Fig. 8. Demodulated signal
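
The behaviour described in this subsection can be reproduced qualitatively with a small pure-Python model. This is a minimal sketch and not the authors' Matlab model: the single-pole filters, their smoothing factors, the loop coefficient and the initial phase offset are all assumptions, and only the 160 kHz sampling rate, the 10 kHz carrier and the 300 Hz message are taken from the text. The quadrature reference is taken as  $-2\sin[w_c t + \Theta_o(t)]$ , a sign choice assumed here so that the LPF2 output equals  $+m(t)\sin[\Theta_g(t)]$  as in (7).

```python
import math

FS = 160_000.0      # sampling frequency in Hz (from the text)
FC = 10_000.0       # carrier frequency in Hz (from the text)
FM = 300.0          # message frequency in Hz (from the text)
THETA_IN = 0.8      # assumed unknown input phase offset in radians
ALPHA_ARM = 0.05    # assumed smoothing factor of LPF1 and LPF2
ALPHA_LOOP = 0.01   # assumed smoothing factor of LPF3
LOOP_GAIN = 0.002   # assumed loop coefficient applied to the LPF3 output


def run_costas_loop(duration_s=0.05):
    """Return the residual phase error and the LPF1 output samples."""
    theta_nco = 0.0               # NCO phase estimate, Theta_o
    i_f = q_f = e_f = 0.0         # states of LPF1, LPF2 and LPF3
    demodulated = []
    for n in range(int(duration_s * FS)):
        t = n / FS
        wt = 2.0 * math.pi * FC * t
        m = math.sin(2.0 * math.pi * FM * t)          # message m(t)
        s = m * math.cos(wt + THETA_IN)               # input s(t), eq. (1)
        i_raw = s * 2.0 * math.cos(wt + theta_nco)    # MUL1
        q_raw = s * -2.0 * math.sin(wt + theta_nco)   # MUL2 (assumed sign)
        i_f += ALPHA_ARM * (i_raw - i_f)              # LPF1 -> i(t)
        q_f += ALPHA_ARM * (q_raw - q_f)              # LPF2 -> q(t)
        e_f += ALPHA_LOOP * (i_f * q_f - e_f)         # MUL3 followed by LPF3
        theta_nco += LOOP_GAIN * e_f                  # feedback to the NCO
        demodulated.append(i_f)
    return THETA_IN - theta_nco, demodulated


phase_error, demodulated = run_costas_loop()
print(f"residual phase error after 0.05 s: {phase_error:.4f} rad")
```

With these assumed values the residual phase error shrinks toward zero and the LPF1 output approaches the message waveform. The locking time of this sketch depends entirely on the assumed gains, so it should not be compared directly with the 0.045 s reported for Fig. 8.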

### B. Implementation

The RVDT digital signal processor based on the Costas loop is implemented as follows. Based on the modeling results described above, the RVDT signal processor with the block diagram shown in Fig. 9 is described in Verilog HDL.



Fig. 9. Block diagram of proposed signal conditioner

When the 12-bit input is received by the input conversion block (Input CVT), it is converted to a 4-wire type signal if a 5-wire type signal is applied. A clock divider receives a 24 MHz clock from outside and generates 2.4 MHz and 160 kHz clocks. In the Costas loop block, the phase adjustment algorithm is executed to correct the phase error. The phase locking time is reduced by tuning the loop coefficient that scales the low-pass filter output fed to the NCO. Since it is impossible to make  $\Theta$  exactly 0 in hardware, a threshold value is used. When the threshold value is large, the locking time is shortened, but the phase difference correction may be inaccurate. In the opposite case, the locking time may become longer. Therefore, both the threshold value and the loop coefficient must be adjusted to change the locking time. The control block sets the gain and the offset used in the output adjustment block. For the linearity correction, fifth-order curve fitting is used, with the coefficients set from outside through the UART. The output adjustment block compensates the output by applying the gain and the offset set by the linearity correction block and the control block. All the control parameters and the coefficients are set by serial communication using the UART. The corrected digital output value is converted to serial data and output via the UART.
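
The output adjustment described above can be sketched as a fifth-order polynomial evaluation followed by a gain and an offset. The order of the two operations, the floating-point arithmetic and all names below are assumptions made for illustration; the hardware block would use fixed-point arithmetic with coefficients loaded over the UART.

```python
def evaluate_polynomial(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs run from highest order to constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def adjust_output(raw, linearity_coeffs, gain, offset):
    """Apply fifth-order linearity correction, then gain and offset (assumed order)."""
    if len(linearity_coeffs) != 6:
        raise ValueError("a fifth-order fit needs six coefficients")
    return gain * evaluate_polynomial(linearity_coeffs, raw) + offset


# Hypothetical coefficients (c5, c4, c3, c2, c1, c0): identity plus a small quadratic term.
example_coeffs = (0.0, 0.0, 0.0, 0.1, 1.0, 0.0)
corrected = adjust_output(0.5, example_coeffs, gain=2.0, offset=0.1)
print(corrected)
```

Horner's rule needs only five multiplications and five additions per sample, which is one reason it is a common choice for polynomial correction in hardware.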

After the hardware simulation is verified, synthesis is performed with Synopsys Design Compiler using a 180 nm standard CMOS process; TABLE I summarizes the implementation result. The total area of the synthesis result is 195470.7 equivalent gates, counted in NAND gates. We then extract the netlist from the synthesis result, which is the input to timing simulation using ModelSim and to P&R (Place & Route) using Synopsys Astro. Fig. 10 shows the layout after the P&R process. Thereafter, DRC (Design Rule Check) and LVS (Layout Versus Schematic) are performed. STA (Static Timing Analysis) is performed using PrimeTime. Fig. 11 shows the message signal waveform and the post-simulation output signal waveform.

TABLE I. Implementation Result

| Process Technology  | CMOS 180 nm 1P6M |                  |
|---------------------|------------------|------------------|
| Package             | MQFP 208-pin     |                  |
| Supply Voltage      | 3.3 V / 1.8 V    |                  |
| Operating frequency | 24 MHz           |                  |
| Sampling frequency  | 160 kHz          |                  |
| Area (GC)*          | Total            | 195470.7 (100%)  |
|                     | Input_CVT        | 432.3 (0.2%)     |
|                     | Controller       | 6071.5 (3.4%)    |
|                     | Costas_Loop      | 142549.9 (72.9%) |
|                     | Out_Adjust       | 44901.1 (23.0%)  |
|                     | UART             | 885.9 (0.5%)     |
| Power consumption   | 47.4 mW          |                  |
| Data bit width      | Input            | 12 bits          |
|                     | Output           | 12 bits          |
|                     | Filter           | 16 bits          |

\* The gate count (GC) is the equivalent 2-input NAND gate count.



Fig. 10. Layout of proposed signal conditioner

### C. Chip Test

Verification of the implemented RVDT signal conditioner proceeds as in Fig. 12(a). As shown in Fig. 12(b), the chip test environment is built using an FPGA board and an oscilloscope. First, we use MATLAB to generate two signals, the message and carrier signals, as 12-bit digital signals and store them in memory on the FPGA. Next, these two signals are multiplied to produce the input signal s(t), which is the modulated signal. The input signal and the carrier signal are applied to the in\_A and in\_B inputs of RVDT\_SC, respectively. The input signal is demodulated in the proposed RVDT\_SC signal conditioner. The demodulated signal is fed into a DAC on the FPGA board and checked using an oscilloscope, as shown in Fig. 13. The demodulated output signal is also checked by the signal verification module implemented in the FPGA.
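
The stimulus preparation for the chip test can be sketched as follows. This is an illustrative stand-in for the MATLAB step, not the authors' script: the signed full-scale range of ±2047, the rounding, the rescaling of the product back to 12 bits and all names are assumptions, while the 160 kHz sampling rate, 10 kHz carrier, 300 Hz message and 12-bit width come from the text.

```python
import math

FS = 160_000                        # sampling frequency in Hz
FC = 10_000                         # carrier frequency in Hz
FM = 300                            # message frequency in Hz
BITS = 12                           # stimulus word length
FULL_SCALE = 2 ** (BITS - 1) - 1    # assumed signed full scale: 2047


def quantize(x):
    """Map a value in [-1, 1] to an assumed signed 12-bit integer code."""
    return round(x * FULL_SCALE)


def make_stimulus(num_samples):
    """Return message, carrier and modulated sample lists as integer codes."""
    message, carrier, modulated = [], [], []
    for n in range(num_samples):
        t = n / FS
        m = quantize(math.sin(2.0 * math.pi * FM * t))
        c = quantize(math.cos(2.0 * math.pi * FC * t))
        message.append(m)
        carrier.append(c)
        # Rescale the product so the modulated signal also fits in 12 bits.
        modulated.append(round(m * c / FULL_SCALE))
    return message, carrier, modulated


message, carrier, modulated = make_stimulus(1600)   # 10 ms of samples
print(message[:4], carrier[:4], modulated[:4])
```

The three lists correspond to the message, carrier and modulated waveforms of Fig. 13(a) to (c); in the test setup the modulated and carrier sequences would be the ones applied to in_A and in_B.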



Fig. 11. Post-simulation result; (a) Message signal (b) post-simulation output signal






Fig. 12. Chip test; (a)Test block diagram (b)Test environment



Fig. 13. Chip test result; (a)Message signal (b)Carrier signal (c)Modulated signal (d)Demodulated signal

Table II lists the maximum linearity errors of several signal conditioners. The proposed structure shows better linearity than the structures in [7]-[10] and matches the linearity of the commercial signal conditioner in [11].

Table II. Maximum Linearity Errors

| Signal conditioners            | [7]  | [8]  | [9]  | [10] | [11] | proposed |
|--------------------------------|------|------|------|------|------|----------|
| Max. linearity<br>error (%FSO) | 0.14 | 0.18 | 0.16 | 0.2  | 0.01 | 0.01     |

### **IV. CONCLUSIONS**

In this paper, we present the RVDT signal conditioner based on the Costas loop. After verifying a Matlab model, we implemented the signal conditioner by using VerilogHDL, which is synthesized by using a standard 180nm CMOS process. The implemented RVDT digital signal processor can be applied to 4-wire and 5-wire systems, and there is no need for external devices required for phase correction. In addition, the linearity compensation coefficients, the gain, and the offset can be adjusted through the UART communication, thereby increasing the linearity.

### ACKNOWLEDGMENT

This work was supported by the IDEC.

### REFERENCES

- [1] S. M. Kim SoC implementation of RVDT signal conditioner with auto phase-correction based on DSP
- [2] D. Y. Sin, Y. G. Yang, C. S. Lee RVDT Phase Error Compensation for Absolute Displacement Measurement
- [3] G. H. An, S. H. Kim, Y. G. Yang, C. S. Lee Automation of phase error compensation for an absolute position detection
- [4] C. S. Lee. Frequency Domain Error Compensation of RVDT Sensor using FFT
- [5] S. M. Kim, Y. H. Seo, Y. R. Jin, S. I. Joe and J. Y. Lee. FPGA Implementation of RVDT Digital Signal Conditioner with Phase Auto-Correction based on DSP
- [6] H. C. Jeong, J. G. Eun Analysis of Modified Digital Costas Loop Part I : Performance in the Absence of Noise
- [7] V. Gunasekaran, B. George, S. Aniruddhan, D. Janardhanan, and R. V. Palur, Performance Analysis of Oscillator-Based Read-Out Circuit for LVDT
- [8] H. Ganesan, B. George, and S. Aniruddhan Design and Analysis of a Relaxation Oscillator-Based Interface Circuit for LVDT
- [9] Analog Devices. (1989) LVDT Signal Conditioner AD598. [Online]. Available: http://www.analog.com/media/en/technical-documentation/data-sheets/AD598.pdf
- [10] R. M. Ford, R. S. Weissbach, and D. R. Loker A Novel DSP-Based LVDT Signal Conditioner
- [11] Texas Instruments. (2019, Apr.) PGA970 LVDT Sensor Signal Conditioner. [Online]. Available: http://www.ti.com/lit/ds/symlink/pga970.pdf







**Yong Heum Yeon** is currently working toward the B.S. degree in the Department of Electronic Engineering at Chonbuk National University, Jeonju, Korea, in 2019. His main interests are embedded systems.

**Seung Won Yang** received the B.S. degree in electrical engineering from Kunsan National University, Gunsan, Korea, in 2008, and the M.S. and Ph.D. degrees in electrical engineering from Chonbuk National University, Jeonju, Korea, in 2010 and 2016, respectively. His research interests include multiprocessing and digital system implementation.

**Jong Yeol Lee** received the B.S., M.S., and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 1993, 1996, and 2002, respectively. In March 2004, he joined the Division of Electronic Engineering at Chonbuk National University. His research interests include embedded systems and their SoC implementations.

# Security SoC Architecture with Hardware-Based Pre-Authentication

Won Bae Kong<sup>1</sup>, Pil Joo Choi and Dong Kyue Kim<sup>a</sup>

Department of Electronic Engineering, Hanyang University

E-mail: <sup>1</sup>wbgong@hanyang.ac.kr

Abstract - Edge devices with limited power and processing performance need hardware-based security solutions in order to provide sufficient security services. Existing hardware-based security solutions separate hardware resources into a secure area and a normal area and determine whether each is accessible according to the secure mode of the processor. These solutions determine the secure mode based on the importance of the application or let the user decide. However, this makes it possible for unauthorized users to access the secure area when the device is stolen or replicated. To solve this problem, we propose a hardware-based pre-authentication protocol which determines whether the edge device is in a safe situation. The proposed pre-authentication protocol covers all stages of the chip: producing, issuing, and using. An SoC with the Core-A processor and a pre-authentication module was implemented as a hardware chip, and it was confirmed that Core-A enters secure mode after the pre-authentication protocol succeeds.

*Keywords*—Authentication protocol, Hardware security, Security SoC

### I. INTRODUCTION

With the recent advances in smartphones and IoT technology, tasks that were previously only available on PCs have become possible on edge devices. These tasks have expanded to include finance, healthcare, and transportation, which can affect users' personal information, property, and safety. As hackers who have been performing attacks in the PC environment also attack users through the security vulnerabilities of edge devices, it is essential to apply security solutions to edge devices. However, it is difficult to apply the heavy security software used on PCs to edge devices, which have limited power and processing performance. As a result, a variety of hardware-based security solutions [1-10] have been proposed that reduce the load on processing performance and can be used in low-power environments.

One class of hardware-based security solutions separates the hardware processing environment according to the application. Intel's software guard extensions (SGX) [1] and ARM's TrustZone [2] separate hardware resources into a normal area and a secure area, and control access to each area according to the secure mode of the processor. Intel's SGX allows the user to determine the processor's secure mode through software, and ARM's TrustZone determines the processor's secure mode based on the importance of the application. However, these solutions do not distinguish whether the current edge device is in a safe or dangerous situation, so the secure area remains accessible even when the edge device is stolen or replicated, which poses a security threat.

We propose a hardware-based pre-authentication protocol to determine whether the edge device is in a safe situation. Only processors on edge devices authenticated through the pre-authentication protocol can enter secure mode. This prevents the processor from entering the secure mode when the device is lost, stolen, or replicated. The authentication protocol meets the following security requirements:

- Object authentication: If the protocol succeeds, it must verify the identity of the object participating in it.
- Key exchange: If the protocol succeeds, participants in the protocol should be able to share a secure session key.
- Confidentiality: While the protocol is proceeding, sensitive information contained in messages should not be able to be identified by an attacker.
- Integrity and non-repudiation: Sensitive information in the protocol must not be tampered with by the attacker, and messages that are approved by each participant should not be denied subsequently.
- Prevent reuse attacks: If an attacker saves some of the messages from a performed protocol and then reuses them later, participants should be able to recognize them.
- Preventing man-in-the-middle attacks: When an attacker attempts a man-in-the-middle attack, there should be no additional information or permissions obtained by the attacker compared to normal circumstances.

This paper is organized as follows. Chapter 2 describes the subjects of the protocol and the pre-authentication protocol process, and Chapter 3 analyzes the security of the authentication protocol. Chapter 4 describes the hardware-based pre-authentication protocol and the SoC structure that was implemented, and Chapter 5 concludes the paper.

a. Corresponding author; dqkim@hanyang.ar.kr

Manuscript Received Sep. 02, 2019, Revised Sep. 09, 2019, Accepted Sep. 20, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### II. PRE-AUTHENTICATION PROTOCOL

In this section, we show the subject of the protocol and describe protocol process details.

### A. Protocol Subject

The subjects participating in the protocol are the chip, manufacturer, issuer, device, and trusted service manager (TSM). The relation between the subjects is shown in Fig. 1. Each subject is described below.



Fig. 1. The relation between subjects of pre-authentication protocol

### a. Chip

The security chip mounted inside the device has PUF-based keys that enable identification of the chip. The PUF-based keys are the private key  $(Pv_{chip})$  for public-key cryptography and the symmetric key  $(Sm_{chip})$  for symmetric-key cryptography. The chip includes a hardware module that performs cryptographic algorithms (public-key cryptography, symmetric-key cryptography, hash function, random number generation) and must be able to store private information securely and perform knowledge-based certification.

### b. Manufacturer (MF)

A chip manufacturer is responsible for everything involved in chip manufacturing, including physical chip manufacturing and the installation of the necessary software. It gives the chip a serial number (SN), and the connection between the chip and the manufacturer is considered as a trust interval.

### c. Issuer & Issuing Machine (IM)

The issuer is the subject that issues the chip to the user, such as a bank. The issuer issues the chip through the issuing machine, and the connection between the chip and the issuing machine is considered a trust interval. Once issued, the chip becomes available for use.

### d. Trusted Service Manager (TSM)

As the subject that ensures the reliability of the chip, the TSM manages the chip's ID and public key and issues a certificate of the chip's public key  $(Pb_{chip})$ . The TSM has its own private key  $(Pv_{TSM})$  and public key  $(Pb_{TSM})$  for public-key cryptography. The communication between the TSM and the issuer is considered a trust interval, whereas the communication between the TSM and the chip can be dangerous.

### e. Device

The device, which includes the chip, supports the operating system and the hardware for the I/O interface with the user and for communication with the TSM. The device is assumed to be unsafe because it can be attacked, for example by hacking; therefore, the communication channel between the device and the TSM is not safe either.

### **B.** Protocol Process

The pre-authentication process is divided into three main stages: the Manufacturing stage, in which security chips are manufactured in the factory; the Issuing stage, in which security chips are issued to the user; and the Using stage, in which security chips are used to operate the application. A device equipped with a security chip can activate the processor's secure-mode permission after performing pre-authentication at power-on or periodically.

### a. Manufacturing

In the manufacturing process, the manufacturer assigns an SN to each chip and manages the list of assigned SNs. The chip holds an SN and a PUF-based key $(Pv_{chip}, Sm_{chip})$ inside. The manufacturer delivers the finished chip to the issuer, such as a bank, and delivers the SN list of the manufactured chips to the TSM.

### b. Issuing

Issuing is the process of exchanging information between the chip and the TSM. This communication goes through the IM of the issuer. The IM is regarded as a reliable device, and the links between the chip and the IM and between the IM and the TSM are both regarded as trust intervals. The issuing process is divided into three primary steps:

- 1. The chip sends its SN to the TSM and generates $Pb_{chip}$ from $Pv_{chip}$.
- 2. The chip and the TSM exchange public keys: the TSM sends $Pb_{TSM}$ to the chip, and the chip sends $Pb_{chip}$ to the TSM.
- 3. The TSM generates a certificate $(Cert_{chip})$ for $Pb_{chip}$ and sends it to the chip.

After issuing is complete, the chip holds its SN, $Sm_{chip}$, $Pv_{chip}$, $Pb_{chip}$, $Cert_{chip}$, and $Pb_{TSM}$. The TSM manages the SN and $Pb_{chip}$ pairs as a list. The issued chip is delivered to the user and installed in the device.
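The three issuing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chip uses ECC in hardware, whereas the sketch substitutes a toy Schnorr signature over a 127-bit prime field so that it runs with the standard library alone. The group parameters `P` and `G`, all helper names, and the choice of signing the SN concatenated with $Pb_{chip}$ as the certificate body are hypothetical and are not taken from the paper. $Sm_{chip}$ is omitted because it plays no role in issuing.

```python
import hashlib
import secrets

# Toy group parameters (assumption, NOT secure): the Mersenne prime 2^127 - 1.
P = 2**127 - 1
G = 3
ORDER = P - 1  # G^(P-1) = 1 (mod P), so exponents may be reduced modulo P - 1


def public_key(private_key: int) -> int:
    """Derive a public key from a private key (stand-in for ECC point multiplication)."""
    return pow(G, private_key, P)


def _challenge(r: int, message: bytes) -> int:
    digest = hashlib.sha256(r.to_bytes(16, "big") + message).digest()
    return int.from_bytes(digest, "big") % ORDER


def sign(private_key: int, message: bytes):
    """Toy Schnorr signature (stand-in for the signature algorithm of [11])."""
    k = secrets.randbelow(ORDER - 1) + 1
    r = pow(G, k, P)
    s = (k + private_key * _challenge(r, message)) % ORDER
    return r, s


def verify(pub: int, message: bytes, signature) -> bool:
    r, s = signature
    return pow(G, s, P) == (r * pow(pub, _challenge(r, message), P)) % P


def cert_body(sn: bytes, pb_chip: int) -> bytes:
    # Hypothetical certificate content: the SN followed by the chip's public key.
    return sn + pb_chip.to_bytes(16, "big")


def issue(sn: bytes, pv_chip: int, pv_tsm: int):
    pb_chip = public_key(pv_chip)                     # step 1: chip derives Pb_chip
    pb_tsm = public_key(pv_tsm)                       # step 2: public keys exchanged via the IM
    cert_chip = sign(pv_tsm, cert_body(sn, pb_chip))  # step 3: TSM certifies Pb_chip
    chip_state = {"SN": sn, "Pv_chip": pv_chip, "Pb_chip": pb_chip,
                  "Cert_chip": cert_chip, "Pb_TSM": pb_tsm}
    tsm_list = {sn: pb_chip}
    return chip_state, tsm_list
```

After `issue(...)` returns, the chip can check its own certificate with `verify(chip_state["Pb_TSM"], cert_body(sn, pb_chip), cert_chip)`, which mirrors the state listed above for the end of the Issuing stage.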

### c. Using

The Using stage is the process of being certified by the TSM before the device's processor enters secure mode, so that users can safely use services such as mobile banking. Communication between the chip and the TSM is a non-trust interval because it passes through the user, the app, and the device, so it is protected by a security protocol. The authentication process for the chip and the device includes forming a secure channel session for further information exchange with the TSM. The Using stage is divided into six steps:

- 1. The chip authenticates the user and the device by comparing the device authentication information $(HK_{device})$ stored on the device with the encrypted authentication information stored in the chip.
- 2. The chip and the TSM authenticate each other using the SN, the other party's public key, and the certificate. The detailed steps of the signature and authentication algorithm [11] may vary depending on the public-key cryptography used.
- 3. The chip and the TSM generate a one-time shared key using public-key cryptography and a random number generator. The detailed key-sharing algorithm [12] may differ depending on the public-key cryptography. The one-time shared key is used as the session key for one session.
- 4. Using the session key, symmetric-key cryptography, and a message authentication code (MAC) algorithm, the chip and the TSM communicate securely.
- 5. Before the session ends, the chip sends its signature over the messages used in the session to the TSM.
- 6. The hash value $(HK_{device})$ of the session key used in this session is sent to the device, and the same value encrypted with symmetric-key cryptography under $Sm_{chip}$ ($EHK_{device}$) is stored in the chip. The hashed session key serves as the device's authentication information in the next authentication.

Once the chip and the device have been authenticated through the Using stage, the processor gains permission to enter secure mode. After the Using stage ends, the chip holds its SN, $Sm_{chip}$, $Pv_{chip}$, $Pb_{chip}$, $Cert_{chip}$, $Pb_{TSM}$, and $EHK_{device}$. The TSM holds the SN-$Pb_{chip}$ list, and the device holds $HK_{device}$.
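Steps 6 and 1 of the Using stage form a small loop: the end of one session produces the values that authenticate the device at the start of the next. The following is a minimal sketch of that loop, assuming SHA-256 as the hash and a toy XOR keystream in place of the chip's AES or SEED engine (the standard library has no block cipher). The function names are hypothetical.

```python
import hashlib
import hmac


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Toy stand-in for AES/SEED, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(4, "big")).digest()
        out.extend(d ^ k for d, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def close_session(session_key: bytes, sm_chip: bytes):
    """Step 6: HK_device goes to the device, EHK_device stays in the chip."""
    hk_device = hashlib.sha256(session_key).digest()
    ehk_device = toy_cipher(sm_chip, hk_device)
    return hk_device, ehk_device


def chip_accepts_device(hk_from_device: bytes, ehk_device: bytes, sm_chip: bytes) -> bool:
    """Step 1 of the next run: decrypt EHK_device and compare in constant time."""
    return hmac.compare_digest(toy_cipher(sm_chip, ehk_device), hk_from_device)
```

Because $EHK_{device}$ can only be decrypted with the PUF-based $Sm_{chip}$, a chip moved to a different device, or a device paired with a different chip, fails the comparison in step 1.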

### III. SECURITY ANALYSIS OF PROTOCOL

This chapter analyzes the security features provided by the proposed protocol. MF and IM, which are involved only in manufacturing and issuing, are always assumed to communicate safely. We analyze the security of the chip, the TSM, and the device, and describe the security features that the protocol satisfies.

### A. Object authentication

The Using stage of pre-authentication provides mutual authentication and key exchange between the chip and the TSM. The chip and the TSM can each verify the identity of the other party by using the public key stored in the Issuing stage, confirming that the other party is the authorized owner of the private key associated with that public key. The identity of the device can be confirmed by $HK_{device}$, which indicates that it is the same device that was connected in the previous protocol run.

Suppose that the TSM is attacked and critical information about the chip is leaked. Even so, the leaked SN and public key of the chip do not allow the attacker to impersonate a legitimate chip.

### B. Key exchange

After performing the Using stage of pre-authentication, the chip and the TSM securely share a session key through public-key cryptography [12].
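The idea of a one-time shared key can be illustrated with an ephemeral Diffie-Hellman exchange. This is a sketch under assumptions: the chip uses ECC-based key sharing [12], while the code below uses a toy finite-field group (`P`, `G`) so that it needs only the standard library, and it omits the signatures that authenticate the exchanged values in step 2. The function names are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy modulus (assumption, NOT secure)
G = 3


def ephemeral_keypair():
    """Fresh random exponent per session (the TRNG's role in hardware)."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def session_key(own_private: int, peer_public: int) -> bytes:
    """Both sides hash the same shared group element into a 256-bit key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def run_session():
    chip_priv, chip_pub = ephemeral_keypair()
    tsm_priv, tsm_pub = ephemeral_keypair()
    # Only chip_pub and tsm_pub cross the untrusted channel.
    return session_key(chip_priv, tsm_pub), session_key(tsm_priv, chip_pub)
```

Each call to `run_session()` draws new random exponents, so the two parties agree on a key that differs from session to session.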

### C. Confidentiality

Session keys are securely shared through public-key cryptography. Messages encrypted with the session key are protected between the chip and the TSM and cannot be read by an eavesdropper.

### D. Integrity and Non-repudiation

The MAC of each message, generated with the session key, is protected between the chip and the TSM, and verification fails if the message is tampered with in transit. The public-key signature ensures integrity and non-repudiation.
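Tamper detection with a session-keyed MAC can be sketched as follows. The paper does not state which MAC algorithm the chip uses, so HMAC-SHA-256 from the Python standard library is an assumption here, and encryption of the message body is left out to keep the example short. The function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA-256 output size in bytes


def protect(session_key: bytes, message: bytes) -> bytes:
    """Append a MAC computed with the session key."""
    tag = hmac.new(session_key, message, hashlib.sha256).digest()
    return message + tag


def check(session_key: bytes, packet: bytes) -> bytes:
    """Return the message if the MAC verifies, otherwise raise ValueError."""
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(session_key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: message altered or wrong session key")
    return message
```

A packet replayed in a later session is rejected for the same reason as a modified one: the receiver recomputes the tag with the new session key and the comparison fails.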

### E. Prevent reuse attacks

Reuse attacks are prevented because a random number generator is used to generate the shared key in public-key cryptography. The generated shared key is used only as a one-time session key.

### F. Prevent man-in-the-middle attacks

Since the chip and the TSM hold each other's public key in advance and the validity of every message sent is guaranteed by a signature, an attacker cannot impersonate either party or intervene in the exchange.

### IV. IMPLEMENTATION AND CHIP VERIFICATION

### A. SoC Implementation

We designed a security SoC with the hardware-based pre-authentication protocol. The structure of the entire security SoC is shown in Fig. 2. The hardware-based pre-authentication protocol module consists of a logic part that controls the protocol process, a via PUF cell [13] for chip recognition and key generation, and cryptographic modules that run the cryptographic algorithms. The rest of the SoC is built from the Core-A processor [14] and static random-access memory (SRAM), and it determines whether the processor may enter secure mode depending on whether the pre-authentication protocol succeeds.

The cryptographic modules consist of the symmetric-key algorithms AES [15] and SEED [16], the public-key algorithm ECC [17], the hash algorithms SHA1 [18] and SHA2 [18], and a true random number generator (TRNG) [19] for generating random numbers. Non-volatile (NV) memory and RAM are used for protocol control and parameter storage; both are implemented as SRAM. The universal asynchronous receiver/transmitter (UART) serial protocol is used as the external communication interface.



Fig. 2. The structure of implemented SoC

We implemented the security SoC using the Samsung 60 nm complementary metal-oxide-semiconductor (CMOS) application-specific integrated circuit (ASIC) technology library. The security SoC was implemented at a 20 MHz clock frequency. The post-synthesis results of the pre-authentication module are shown in Table I. The gate counts of the whole pre-authentication module and of each submodule are listed separately.

TABLE I. Gate Count of Pre-Authentication Protocol

| Module                   | Gate Count |
|--------------------------|------------|
| Pre-Authentication Total | 353,662    |
| AES                      | 33,897     |
| SEED                     | 14,700     |
| ECC                      | 139,034    |
| SHA1                     | 10,572     |
| SHA2                     | 15,194     |
| TRNG                     | 1,267      |
| NV                       | 1,409      |
| RAM                      | 86,513     |
| UART                     | 1,153 x 2  |
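As a quick reading aid for Table I, the short script below adds up the listed submodules and compares the sum with the reported total. The numbers are copied from the table; the variable names are arbitrary. The paper does not state what the gates outside the listed submodules belong to.

```python
# Gate counts copied from Table I (the UART entry is listed as 1,153 x 2).
gate_counts = {
    "AES": 33_897,
    "SEED": 14_700,
    "ECC": 139_034,
    "SHA1": 10_572,
    "SHA2": 15_194,
    "TRNG": 1_267,
    "NV": 1_409,
    "RAM": 86_513,
    "UART": 1_153 * 2,
}
total = 353_662

listed = sum(gate_counts.values())     # gates in the listed submodules
unlisted = total - listed              # gates not attributed to a listed submodule
ecc_share = gate_counts["ECC"] / total

print(listed, unlisted, f"{ecc_share:.1%}")  # 304892 48770 39.3%
```

The ECC engine alone accounts for roughly two fifths of the module, which makes it the natural first target for any area optimization.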

### B. Chip verification

We used a test board to verify the implemented chips. The chip was operated at 40 MHz, and the verification data was exchanged with a PC through UART communication. The test board shown in Fig. 3 is connected to the PC.

We wrote a software program that performs the operations of the manufacturer, issuer, device, and TSM to verify that the implemented chip performs the pre-authentication protocol correctly. We confirmed that the chip performs the protocol correctly and that the processor can enter secure mode only if authentication succeeds. The pre-authentication protocol is run through the software program shown in Fig. 4.



Fig. 3. Test board for verifying chip operation

[Screenshot of the AuthECC test program, with Manufacture, Issue, Repuk / Renewal, and Use panels, a UART Tx/Rx log, and the status message "Authentication succeeded".]

Fig. 4. The software program for verifying the pre-authentication protocol

### V. CONCLUSION

We proposed a way to protect the device's secure world in dangerous situations, such as when the device is stolen or replicated, by checking the device before the processor enters secure mode. We proposed the pre-authentication protocol, which determines whether the device is safe, and analyzed its security. The protocol provides object authentication of edge devices, safe key exchange, confidentiality and integrity of information, non-repudiation, and prevention of reuse attacks and man-in-the-middle attacks. It was implemented in a chip to verify that the processor enters secure mode only in safe situations in which the pre-authentication protocol has passed.

The hardware-based pre-authentication protocol we propose can authenticate the edge device, but the legitimacy of the software installed on the device remains unknown. If the application running on the device is tampered with, or if a malicious program is run to monitor what the user enters into the device, the user may still be exposed to security threats. Therefore, hardware-based security solutions such as SGX and TEE, or software-based security solutions, must be added for a completely secure edge device execution environment.


### ACKNOWLEDGMENT

This work is supported by IDEC.

### REFERENCES

- [1] V. Costan and S. Devadas, "Intel SGX Explained," *IACR Cryptology ePrint Archive* 2016, 086: 1-118, 2016.
- [2] ARM Ltd. TrustZone. Available online at: http://www.arm.com/products/processors/technologies/trustzone.php.
- [3] J. Burke, J. McDonald, and T. Austin, "Architectural support for fast symmetric-key cryptography," ACM SIGOPS Operating Systems Review, 34.5: 178-189, 2000.
- [4] S. Tillich, J. Großschädl, and A. Szekely, "An instruction set extension for fast and memory-efficient AES implementation," *IFIP International Conference on Communications and Multimedia Security*, Springer, Berlin, Heidelberg, pp. 11-21, 2005.
- [5] S. Tillich and J. Großschädl, "Instruction set extensions for efficient AES implementation on 32-bit processors," *International workshop on cryptographic hardware and embedded systems*, Springer, Berlin, Heidelberg, pp. 270-284, 2006.
- [6] Xilinx Corp., "CryptoBlaze: 8-Bit Security Microcontroller," Available online at: http://www.bdtic.com/download/Xilinx/xapp374.pdf.
- [7] K. Akdemir, M. Dixon, W. Feghali, et al., "Breakthrough AES performance with Intel AES new instructions," White paper, June 2010.
- [8] S. O'Melia and A. J. Elbirt, "Enhancing the performance of symmetric-key cryptography via instruction set extensions," *IEEE transactions on very large scale integration (VLSI) systems*, vol. 18, no. 11, pp. 1505-1518, 2009.
- [9] L. Wu, C. Weaver, and T. Austin, "CryptoManiac: a fast flexible architecture for secure communication," *Proceedings 28th Annual International Symposium on Computer Architecture*, Göteborg, Sweden, pp. 110–119, 2001.
- [10] R. Buchty, N. Heintze, and D. Oliva, "Cryptonite–A programmable crypto processor architecture for high-bandwidth applications," *International Conference on Architecture of Computing Systems*, Springer, Berlin, Heidelberg, pp. 184-198, 2004.
- [11] Kerry, C. F and C. R, "Digital signature standard (DSS)," *Federal Information Processing Standards Publication 186-4*, July 2013.
- [12] IEEE Std 1363a, "IEEE Standard Specifications for Public-Key Cryptography-Amendment 1: Additional Techniques," Mar 2004.
- [13] B. D. Choi, T. W. Kim, M. K. Lee, K. S. Chung and D. K. Kim, "Integrated circuit design for physical unclonable function using differential amplifiers." *Analog integrated circuits and signal processing* 66.3: 467-474, 2011.
- [14] J. H. Kim, D. H. You, K. S. Kwon, E. J. Bae, W. Son, and I. C. Park, "Design of high-performance 32-bit embedded processor," *2008 International SoC Design Conference*, IEEE, pp. III-54-III-55, 2008.
- [15] NIST, "Announcing the advanced encryption standard (AES)," *Federal Information Processing Standards Publication 197*, pp. 1-51, Nov 2001.
- [16] H. J. Lee, S. J. Lee, J. H. Yoon, D. H. Cheon, and J. I. Lee, "The SEED encryption algorithm," RFC 4269, 2005.
- [17] V. S. Miller, "Use of elliptic curves in cryptography," *Conference on the Theory and Application of Cryptographic Techniques*, pp. 417-426, 1985.
- [18] J. H. Burrows, *Secure hash standard*. Department of Commerce Washington DC, 1995.
- [19] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, and E. Barker, *A statistical test suite for random and pseudorandom number generators for cryptographic applications*. Booz-Allen and Hamilton Inc Mclean Va, 2001.



**Won Bae Kong** received the B.S. degree in electronic engineering from Hanyang University, Seoul, South Korea, in 2015. He is currently a Ph.D. candidate in electronics and computer engineering at Hanyang University. His research interests are in the areas of security SoC (System on Chip), secure crypto-processor, crypto-coprocessors, and information security systems.

**Pil Joo Choi** was born in Seoul, South Korea in 1982. He received the B.S., M.S., and Ph.D. degrees in electronic computer engineering from Hanyang University, Seoul, South Korea, in 2010, 2012, and 2018, respectively. He is currently a professor in Software Education Committee at Hanyang University. His research interests are in the areas of security SoC, crypto-coprocessors, and information security.



**Dong Kyue Kim** was born in Seoul, South Korea in 1968. He received the B.S., M.S. and Ph.D. degrees in computer engineering from Seoul National University in 1992, 1994, and 1999, respectively. From 1999 to 2005, he was an assistant professor in the Division of Computer Science and Engineering at Pusan National University. Since 2006, he has been a professor in the Department of Electronic Engineering at Hanyang University. His research interests are in the areas of security SoC, secure crypto-processor, crypto-coprocessors, and information security systems.

# Two-negative feedback loop PLL with frequency voltage converter loop

Dae Hyun Moon<sup>1</sup> and Young Shig Choi<sup>a</sup> Department of Electronic Engineering, Pukyong National University E-mail : <sup>1</sup>eogus2741@naver.com

Abstract - To reduce the phase noise and jitter of the conventional PLL, the proposed PLL uses a frequency-to-voltage converter (FVC). An inner negative feedback loop consisting of a voltage-controlled oscillator (VCO) and the FVC is nested inside a conventional outer PLL loop. When the output voltage of the loop filter (the VCO input voltage) changes, the output voltage of the FVC changes in the opposite direction at a much higher sampling frequency within the inner negative feedback loop. Thus, whenever the VCO output frequency varies, the FVC acts as a compensator, which reduces the VCO noise and improves the phase noise characteristic and the stability of the PLL. The design was simulated and verified with HSPICE in a 0.18 µm, 1.8 V CMOS process. Measurement results of the two-negative feedback loop PLL fabricated in a one-poly six-metal 0.18 µm CMOS process show approximately 20 dB improvement at 1 MHz offset from a 1 GHz carrier frequency.

*Keywords*—Frequency-to-voltage Converter, Phase Locked Loop, Phase Noise

### I. INTRODUCTION

The phase locked loop (PLL) is a core component of next-generation communication systems and digital chips because it is used as a frequency synthesizer in wireless communication and as a clock signal generator in digital chips that operate at high speed. The carrier frequency of next-generation mobile communications is increasing in order to transport a large amount of data in a short time, and the operating speed of semiconductor chips is also increasing. Therefore, new frequency synthesizers should have better phase noise characteristics, and clock signal generators should have very small jitter.

Typically, a PLL consists of a phase frequency detector (PFD), a charge pump (CP), a loop filter (LF), a voltage-controlled oscillator (VCO), and a divider [1]. A PLL is usually a third-order closed-loop system that includes a second-order LF consisting of two capacitors and one resistor. It has one low-frequency zero and three poles: two poles at the origin and one pole at high frequency. In general, a narrow bandwidth is advantageous for suppressing phase noise and spurs, but it requires a low-phase-noise VCO and takes a long time to lock. A higher-frequency pole can be introduced to suppress phase noise further. Careful placement of the poles and the zero is required to ensure stable operation of the PLL. The design of the PFD and CP must deal with linearity, dead zone, current mismatch, timing mismatch between the two PFD output signals, and charge sharing and injection, because these effects degrade the phase noise characteristics of PLLs. An LC-VCO is preferable to a ring VCO for a low-phase-noise PLL because of its better phase noise performance [2].

Various PLL and DLL architectures have been proposed to improve phase noise and jitter characteristics. An optimal loop bandwidth has been derived from a discrete-time PLL model to obtain good jitter performance [3], and jitter optimization in terms of PLL design parameters has also been derived [4]. These theories are helpful in PLL design, but their benefit in fabricated PLLs is limited because they are susceptible to process variations. A VCO realignment method has been introduced to periodically reduce the jitter accumulated in the VCO [5]. Although it improves phase noise by up to 10 dB, accurate realignment timing is not easy to implement in a PLL. Sub-sampling phase detectors have been used to improve phase noise and spur characteristics [6] [7]. In the sub-sampling scheme, PD/CP noise is not multiplied by N<sup>2</sup>, but the VCO noise, which is usually the largest in a PLL, cannot be suppressed. Various DLL-based clock generators and frequency synthesizers have been proposed to improve on the phase noise and jitter characteristics of PLLs [8]-[10]. The frequency synthesizer in [8] multiplies the reference frequency by a fixed multiplication factor. The clock generator in [9] can generate a wide range of clock signals by a fixed multiple multiplication factor. The clock multiplier in [10] combines a PLL with a recirculating DLL that has a low phase noise characteristic. These DLL-based frequency synthesizers and clock generators have difficulty generating multiple narrowly spaced channel frequencies. Various PLLs with this structure have been published. However, this structure still has the issue of how to accurately deliver periodic input signals to the VCO.

Previously published PLL structures have one negative feedback loop. To overcome the characteristic limits of a PLL with one negative feedback loop, a new scheme is required. Therefore, a new PLL architecture with two negative feedback loops is studied [11]. It can be applied to design a frequency synthesizer with outstanding phase noise characteristics for next-generation mobile communication systems and a clock signal generator with very small jitter for high-speed chips.

a. Corresponding author; choiys@pknu.ac.kr

Manuscript Received Jul. 24, 2019, Revised Sep. 05, 2019, Accepted Sep. 17, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### II. PROPOSED PLL

A phase locked loop is a circuit that usually produces a high frequency signal. The most commonly used PLL architecture is shown in Fig. 1.



Fig. 1. Conventional PLL

The PLL consists of a phase frequency detector (PFD), a charge pump (CP), a loop filter (LF), a voltage controlled oscillator (VCO), and a frequency divider (N).

In a conventional PLL, the PFD compares the phase and frequency of the signal generated by the VCO with those of an externally supplied reference frequency ( $F_{ref}$ ) signal and generates signals corresponding to the phase and frequency difference to drive the CP. The CP charges or discharges the LF, which generates the signal that drives the VCO; the VCO in turn generates the PLL output signal ( $F_{out}$ ).

In a PLL with a narrow bandwidth, it is not easy to reduce the phase noise of the VCO. Conversely, widening the bandwidth to reduce the phase noise of the VCO makes it difficult to reduce the phase noise of the other blocks. To overcome this limit of the conventional PLL, the proposed PLL uses a frequency-to-voltage converter (FVC) to reduce the phase noise of the VCO. The FVC is a circuit that takes the VCO output signal as its input and generates a voltage according to its frequency. The FVC, which is connected to the VCO in a negative feedback loop, operates to reduce jitter and phase noise.

Fig. 2 is a block diagram of the proposed two-negative feedback loop PLL. An FVC is added to the conventional PLL. With the FVC, the PLL has a different loop bandwidth, which reduces the VCO's phase noise.

Fig. 3 shows the transfer function of the proposed PLL. The characteristics of the PFD, CP, LF, VCO, FVC, and divider are expressed in the s-domain, and the characteristic of the FVC is added to the transfer function of a general PLL. Here, the PLL LPF (LPF<sub>E</sub> (s)) and the FVC filter (LPF<sub>I</sub> (s)) are implemented with two capacitors and a resistor, and with one capacitor, respectively.



Fig. 2. Proposed two-negative feedback loop PLL



Fig. 3. Transfer function of proposed PLL

The output phase of the overall loop can be expressed by the following equation.

$$\phi_{o} = \left(\phi_{i} - \frac{\phi_{o}}{N}\right) \frac{I_{p}}{2\pi} \frac{1}{C_{p}} \frac{s+z}{s(s+p)} \frac{K_{VCO1}}{s} + (-s\phi_{o}) K \frac{1}{sC_{y}} \frac{K_{VCO2}}{s}$$
(1)

The closed-loop and open-loop transfer functions are as follows.

$$\frac{\phi_o}{\phi_i} = \frac{\frac{I_{p}}{2\pi} \frac{1}{C_{p}} \frac{s+z}{s(s+p)} \frac{K_{VCO1}}{s}}{1 + \frac{1}{N} \frac{I_{p}}{2\pi} \frac{1}{C_{p}} \frac{s+z}{s(s+p)} \frac{K_{VCO1}}{s} + K \frac{1}{sC_{y}} \frac{K_{VCO2}}{s}}$$
(2)

$$T_{op} = \frac{1}{N} \frac{I_{p}}{2\pi} \frac{1}{C_{p}} \frac{s+z}{s(s+p)} \frac{K_{VCO1}}{s} + K \frac{1}{sC_{y}} \frac{K_{VCO2}}{s}$$
(3)

where K, $K_{VCO2}$, and $C_y$ are the FVC transfer function, the VCO gain for the FVC loop, and the FVC loop filter capacitor, respectively.

K,  $K_{VCO2}$ , and  $C_y$  are introduced by the FVC in the conventional transfer function. Adding the FVC circuit reduces the bandwidth in the entire closed loop transfer function and can reduce the jitter of the entire PLL as shown in Fig. 4. There is no requirement on the ratio between K<sub>VCO1</sub> and K<sub>VCO2</sub> because both loops work as complementary.

Fig. 5 (a) shows the VCO circuit as a ring oscillator. Fig. 5 (b) shows the CMOS circuit inside the VCO delay cell. Gain of  $K_{VCO1}$  and  $K_{VCO2}$  is generated by receiving the voltage of LF and the voltage of FVC. Fig. 6 (a) shows the FVC circuit, in which the  $\Phi_1$  and  $\Phi_2$  signals are generated by the VCO output signal as shown in Fig. 6 (b). Fig. 6 (c) shows the signal waveforms of  $\Phi_1$  and  $\Phi_2$ .



Fig. 4. Bode diagram of conventional and proposed PLL

In the second-order loop filter of Fig. 7 (a), excluding the FVC, the proportional and integral terms of the control signal V<sub>LF</sub> are produced by a resistor and two capacitors, as shown in V<sub>LF</sub> of Fig. 7 (c). Fig. 7 (b) shows the conceptual block diagram of the proposed two-negative-feedback-loop PLL. When the PFD detects a phase error, it asserts its UP or DN output so that the net pulse width is proportional to the error. Because these PFD outputs control the up and down currents of the CP, most of the current fed into C<sub>p</sub> is proportional to the phase error. This current gives rise to a voltage (V<sub>proportional</sub>) across the capacitor C<sub>p</sub>, whose net contribution to the VCO phase is proportional to the present phase error. After the UP/DN pulse, the charge stored in C<sub>p</sub> moves to C<sub>z</sub>. This current gives rise to a voltage (V<sub>integral</sub>) across the capacitor C<sub>z</sub> that equals the integral of all the current fed to the filter, and hence the sum of the phase errors. The sum of V<sub>proportional</sub> and V<sub>integral</sub> determines the VCO frequency and phase, as shown in V<sub>LF</sub> of Fig. 7 (c). With the FVC, the VCO input voltage is decreased or increased more abruptly. During the UP pulse, the CP dumps the error charge onto C<sub>p</sub>, giving rise to a voltage equal to V<sub>proportional</sub> + V<sub>integral</sub>, as shown in V<sub>LF</sub> + V<sub>FVC</sub> of Fig. 7 (c). This raises the VCO frequency, and the FVC output voltage then begins to decrease, leaving only V<sub>integral</sub> on the VCO. The shaded area of the effective input voltage, V<sub>LF</sub> + V<sub>FVC</sub>, becomes smaller than that of V<sub>LF</sub> for a conventional second-order LF, as shown in Fig. 7 (c). This means that the sum of the phase errors becomes smaller and the PLL becomes more stable. Therefore, the FVC works as a noise suppressor and stability enhancer.
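To give a feel for the magnitudes of the proportional and integral steps, the script below evaluates them for a single charge-pump pulse using the component values given in Section III. It is a rough sketch: the pulse width `DT` is a hypothetical value chosen for illustration, and the calculation idealizes the filter by assuming that all of the pulse charge first lands on C<sub>p</sub> and only afterwards redistributes through R<sub>z</sub>.

```python
I_P = 300e-6    # charge-pump current [A] (Section III)
R_Z = 2e3       # loop-filter resistor [ohm] (Section III)
C_Z = 200e-12   # loop-filter capacitor C_z [F] (Section III)
C_P = 20e-12    # loop-filter capacitor C_p [F] (Section III)
DT = 100e-12    # net UP pulse width [s], hypothetical value for illustration

charge = I_P * DT                       # charge delivered by one pulse
dv_peak = charge / C_P                  # initial step on C_p (proportional part)
dv_settled = charge / (C_P + C_Z)       # step left after charge sharing (integral part)
tau = R_Z * (C_P * C_Z) / (C_P + C_Z)   # time constant of the redistribution

print(f"peak step    : {dv_peak * 1e3:.3f} mV")
print(f"settled step : {dv_settled * 1e6:.1f} uV")
print(f"time constant: {tau * 1e9:.1f} ns")
```

With these assumptions, the proportional step is about eleven times larger than the integral step that remains, which is the ratio (C<sub>p</sub> + C<sub>z</sub>) / C<sub>p</sub>.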







Fig. 5. Voltage controlled oscillator (a) block diagram (b) delay cell circuit



Fig. 6. Frequency to voltage converter (a) circuit (b) control signal block (c) control signal timing



Fig. 7. (a) Second-order loop filter with the inner negative feedback loop consisting of a VCO and an FVC. (b) Conceptual block diagram. (c) Output signals of the LF and FVC and the effective VCO input voltage $V_{LF} + V_{FVC}$.

### III. SIMULATION AND MEASUREMENT RESULT

The proposed two-negative feedback loop PLL was simulated in HSPICE using 0.18 $\mu$m CMOS process parameters. The input frequency is 31.25 MHz and the divider ratio is 32, so the output frequency is 1 GHz. The CP current is 300 $\mu$A and the VCO gain is 850 MHz/V. The loop filter parameters are R<sub>z</sub> = 2 k$\Omega$, C<sub>z</sub> = 200 pF, and C<sub>p</sub> = 20 pF.
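As a rough cross-check of these values, the sketch below computes the output frequency and the zero and pole of the loop filter. It is a sketch under stated assumptions, not the authors' design procedure: it assumes the common passive topology in which R<sub>z</sub> is in series with C<sub>z</sub> and the pair is shunted by C<sub>p</sub>, and it uses a textbook first-order estimate of the outer-loop crossover frequency that ignores the FVC loop. All function names are illustrative.

```python
import math

# Values reported in the text.
F_REF = 31.25e6   # reference frequency, Hz
N_DIV = 32        # divider ratio
I_CP = 300e-6     # charge-pump current, A
K_VCO = 850e6     # VCO gain, Hz/V
R_Z = 2e3         # loop-filter resistor, ohm
C_Z = 200e-12     # loop-filter capacitor C_z, F
C_P = 20e-12      # loop-filter capacitor C_p, F


def output_frequency(f_ref, n_div):
    """Locked output frequency of an integer-N PLL."""
    return f_ref * n_div


def filter_zero_hz(r_z, c_z):
    """Zero of the assumed series R_z-C_z branch."""
    return 1.0 / (2.0 * math.pi * r_z * c_z)


def filter_pole_hz(r_z, c_z, c_p):
    """Non-zero pole of the assumed filter: R_z with C_z and C_p in series."""
    c_series = c_z * c_p / (c_z + c_p)
    return 1.0 / (2.0 * math.pi * r_z * c_series)


def crossover_estimate_hz(i_cp, k_vco, r_z, c_z, c_p, n_div):
    """First-order crossover estimate, taking the filter as a flat
    resistance R_z * C_z / (C_z + C_p) between its zero and its pole."""
    omega_c = i_cp * k_vco * r_z * c_z / (n_div * (c_z + c_p))  # rad/s
    return omega_c / (2.0 * math.pi)


if __name__ == "__main__":
    print(f"f_out     = {output_frequency(F_REF, N_DIV) / 1e9:.3f} GHz")
    print(f"f_zero    = {filter_zero_hz(R_Z, C_Z) / 1e3:.1f} kHz")
    print(f"f_pole    = {filter_pole_hz(R_Z, C_Z, C_P) / 1e6:.3f} MHz")
    fc = crossover_estimate_hz(I_CP, K_VCO, R_Z, C_Z, C_P, N_DIV)
    print(f"f_c (est) = {fc / 1e6:.3f} MHz")
```

The crossover estimate is only indicative, because the pole lies less than a decade above it and the flat-gain approximation is therefore coarse.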

Fig. 9 shows the simulation results of the conventional PLL.  $\Delta V_{LF}$ is the magnitude of the loop filter voltage fluctuation after lock and determines the phase noise characteristic.  $\Delta\Delta V_{LF}$ is the magnitude of the loop filter voltage fluctuation generated during one reference signal period after lock and determines the spur performance.



Fig. 8. Design flow

Fig. 8 shows the design flow of the proposed two-negative feedback loop PLL.

Fig. 9 (a) shows that the phase is locked at 3 $\mu$s. Fig. 9 (b) shows  $\Delta\Delta V_{LF}$, which determines the excess phase shift; its value is approximately 220 $\mu$V. Fig. 9 (c) shows an rms jitter of 1.066 ps. The VCO gain of the proposed PLL is 850/550 MHz/V, and the CP current and loop filter values are the same as those of the conventional PLL. The FVC current is 185 $\mu$A, and the capacitors C<sub>x</sub> and C<sub>y</sub> are 10 pF each.

Fig. 10 shows the simulation results of the proposed two-negative feedback loop PLL. Fig. 10 (a) shows the loop filter voltage and the FVC voltage waveforms. The loop is locked at 5 µs. The inner loop of the FVC makes the proposed PLL more stable and it does not increase locking time. Fig. 10 (b) shows  $\Delta\Delta V_{LF}$ and $\Delta\Delta V_{FVC}$. Although $\Delta\Delta V_{LF}$ is 345 µV, the actual effective voltage is lower than that of the conventional PLL because $\Delta\Delta V_{FVC}$ is subtracted through the FVC. Therefore, as shown in Fig. 10 (c), the rms jitter is 286.28 fs, which is much lower than that of the conventional PLL. Tables I and II show the simulation results of the conventional PLL and the proposed two-negative feedback loop PLL, respectively.
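To put the two simulated jitter figures side by side, the following sketch computes their ratio and expresses each as a fraction of the 1 GHz output period. It uses only the rms jitter values quoted in the text (1.066 ps and 286.28 fs); the helper names are illustrative and not part of the original work.

```python
F_OUT = 1e9            # output frequency, Hz
J_CONV = 1.066e-12     # simulated rms jitter of the conventional PLL, s
J_PROP = 286.28e-15    # simulated rms jitter of the proposed PLL, s


def improvement_ratio(jitter_ref_s, jitter_new_s):
    """How many times smaller the new jitter is than the reference jitter."""
    return jitter_ref_s / jitter_new_s


def jitter_in_ui(jitter_s, f_out_hz):
    """Jitter as a fraction of one output period (unit interval)."""
    return jitter_s * f_out_hz


if __name__ == "__main__":
    ratio = improvement_ratio(J_CONV, J_PROP)
    print(f"jitter reduction factor : {ratio:.2f}x")
    print(f"conventional            : {jitter_in_ui(J_CONV, F_OUT) * 100:.4f} % of a period")
    print(f"proposed                : {jitter_in_ui(J_PROP, F_OUT) * 100:.4f} % of a period")
```

With these inputs the simulated rms jitter of the proposed PLL is roughly 3.7 times smaller than that of the conventional PLL.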

Fig. 11 shows the top-level layout of the conventional and proposed two-negative feedback loop PLL in the Magna/Hynix 180 nm process, where all parameters are the same as in the pre-simulation. The overall size of the proposed PLL is  $650 \mu m \times 430 \mu m = 0.278 mm^2$  in a 1-poly 6-metal standard CMOS process.




Fig. 9. Conventional PLL (a)  $V_{LF}$  (b) enlarged  $V_{LF}$  (c) jitter



Fig. 12 shows the measurement results of the conventional and proposed two-negative feedback loop PLL. The proposed PLL shows approximately 20 dB lower phase noise at a 1 MHz offset from the 1 GHz carrier frequency. The reference spur shows no improvement.
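As a reading aid, and assuming the 20 dB figure is a difference between two single-sideband phase noise readings in dBc/Hz (a power quantity), the corresponding ratio of noise power densities at the 1 MHz offset is as follows. The symbols $S_{\text{proposed}}$ and $S_{\text{conventional}}$ are introduced here only for illustration.

$$
\frac{S_{\text{proposed}}}{S_{\text{conventional}}} = 10^{-20/10} = 10^{-2}
$$

That is, under this assumption the noise power density at that offset is about one hundredth of the conventional value.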



Fig. 10. Proposed PLL (a)  $V_{LF}$  and  $V_{FVC}$  (b) enlarged  $V_{LF}$  and  $V_{FVC}$  (c) jitter

TABLE I. Simulation results of Conventional PLL

| Parameter name        | Simulation result  |
|-----------------------|--------------------|
| Lock time             | 3µs                |
| $\Delta V_{LF}$       | 2.06mV             |
| $\Delta\Delta V_{LF}$ | 220μV              |
| Jitter (rms)          | 1.067ps            |

TABLE II. Simulation results of Proposed PLL

| Parameter name        | Simulation result  |
|-----------------------|--------------------|
| FVC current           | 185µA              |
| Lock time             | 5µs                |
| $\Delta V_{LF}$       | 2.56mV             |
| $\Delta\Delta V_{LF}$ | 345µV              |
| $\Delta V_{FVC}$      | 2.09mV             |
| Jitter (rms)          | 286.28fs           |




Fig. 11. Layout of the proposed PLL

### IV. CONCLUSIONS

The proposed two-negative feedback loop PLL introduces an FVC to reduce the phase noise and jitter of the conventional PLL by adding one more negative feedback loop. The outer negative feedback loop is the conventional PLL loop, and the inner negative feedback loop lies inside it. The independent inner loop provides negative feedback to the outer loop. It improves the phase noise characteristics and the stability of the PLL without requiring complicated auxiliary circuits or components that occupy a large area or consume a large amount of power. The measured phase noise improves by approximately 20 dB at a 1 MHz offset from the 1 GHz carrier frequency.




Fig. 12. Phase noise measurement result of the (a) conventional (b) proposed PLL.

### ACKNOWLEDGEMENT

The chip fabrication and EDA tool were supported by the IC Design Education Center (IDEC), Korea.

#### REFERENCES

- [1] Floyd M. Gardner, "Charge-Pump Phase-Lock Loops," *IEEE Trans. on Communications*, vol. COM-28, no. 11, pp. 1849-1858, Nov. 1980.
- [2] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits. New York: IEEE Press, 1996.
- [3] K. Lim, C. Park, D. Kim and B. Kim, "A Low-Noise Phase-Locked Loop Design by Loop Bandwidth Optimization," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 807-815, June 2000.
- [4] Mozhgan Mansuri and Chih-Kong Ken Yang, "Jitter Optimization Based on Phase-Locked Loop Design Parameters," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1375-1382, Nov. 2002.
- [5] Sheng Ye, Lars Jansson and Ian Galton, "A Multiple-Crystal Interface PLL with VCO Realignment to Reduce Phase Noise," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1795-1803, Dec. 2002.
- [6] X. Gao, E. Klumperink, M. Bohsali, and B. Nauta, "A low-noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by $N^2$," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253-3263, Dec. 2009.
- [7] X. Gao, E. Klumperink, G. Gerard, M. Bohsali, and B. Nauta, "Spur reduction techniques for phase-locked loops exploiting a sub-sampling phase detector," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1809-1821, Sept. 2010.
- [8] D. Foley and M. P. Flynn, "CMOS DLL-based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 417-423, Mar. 2001.
- [9] J. Kim, Y. Kwak, M. Kim, S. Kim and C. Kim, "A 120-MHz-1.8-GHz CMOS DLL-based Clock Generator for Dynamic Frequency Scaling," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2077-2082, Sept. 2006.
- [10] Sander L. Gierkink, "Low-Spur, Low-Phase-Noise Clock Multiplier based on a Combination of PLL and Recirculating DLL with Dual-Pulse Ring Oscillator and Self-Correcting Charge Pump," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2967-2976, Dec. 2008.
- [11] Young-Shig Choi, "Phase-Locked Loop with Two Negative Feedback Loops," U.S. Patent 8 547 150, Oct. 1, 2013.



**Dae Hyun Moon** received the B.S. and M.S. degrees in Electronics Engineering from Pukyong National University, Pusan, Korea, in 2017 and 2019, respectively. His main interest is PLL design.



**Young Shig Choi** received the B.S. degree in Electronics Engineering from Kyungpook National University in 1982, the M.S. degree from Texas A&M University in 1986, and the Ph.D. degree in Electrical Engineering from Arizona State University in 1993. From 1987 to 1999, he was with Hyundai Electronics (now SK Hynix) as a principal circuit design engineer, where he was involved in the development of communication and mixed-signal chips. In March 2003, he joined the faculty of the Dept. of Electronics, Pukyong National University, where he is currently a Professor. His current interests include PLL and DLL design.

**IDEC Journal of Integrated Circuits and Systems** Volume 5 • Number 4 • October 2019 Date of Publication October 1, 2019 **Printed by Simwon** 401, 12, Munjeong-ro, Seo-gu, Daejeon, Republic of Korea

IC Design Education Center (IDEC) 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea Tel. +82-42-350-8533 / Fax. +82-42-350-8540


