Time: Spring 2021 (second semester of sophomore year)
more info in lec/*.pdf
| subject | teacher |
|---|---|
| Introduction to VLSI CAD (超大型積體電路電腦輔助設計概論) | 邱瀝毅 |
more info in doc/*.docx
- OS
CentOS v6
- Software
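| Name | Function |
|---|---|
| NC Verilog | Simulates the HDL design as a real circuit and generates waveforms |
| nWave in Verdi | Views waveforms (*.fsdb) |
| Superlint | Checks for coding-style violations to help debugging |
| Design Vision | Circuit synthesis |
| HSPICE | Analog circuit simulation |
| Laker | Layout editor |
| Calibre | Layout verification: DRC, LVS, PEX |
| MobaXterm | Supports X11, SFTP, SSH, and other protocols so a remote machine can connect to the workstation |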
- `makefile` provided in lab6
| Description | Command |
|---|---|
| Run RTL Convolution simulation | make rtl0 |
| Run RTL Pooling simulation | make rtl1 |
| Run RTL simulation | make rtl_full |
| Run post-synthesis simulation | make syn_full |
| Dump waveform (no array) | make {rtlX, syn_full} FSDB=1 |
| Dump waveform (with array) | make {rtlX, syn_full} FSDB=2 |
| Open nWave without file pollution | make nWave |
| Open Superlint without file pollution | make superlint |
| Open DesignVision without file pollution | make dv |
| Synthesize your RTL code | make synthesize |
| Check correctness of your file structure | make check |
| Compress your homework to tar format | make tar |
| Count the total lines of your code | wc -l ./src/* ./include/* |
- compile: `ncverilog top_module.v`
- pre-simulate: `ncverilog top_module_tb.v +define+FSDB +access+r`
- synthesis
  - open Design Vision: `dv &`
  - change hierarchy: `current_design top`
  - read design constraints file: `source DC.sdc`
  - compile: Compile Design -> OK
  - generate reports: `report_timing`, `report_area`, `report_power`
  - generate SDF file: `write_sdf -version 2.1 -context verilog -load_delay net top_module_syn.sdf`
- post-simulate: `ncverilog top_module_tb.v +define+FSDB+syn +access+r`
- Superlint
  - open: `jg -superlint`, then File -> TclScripts -> Source
- Count the number of total lines: `wc -l filename`
- check file hierarchy: `sh check.sh`
4-to-2 priority encoder in gate-level
5-bit add/sub ripple carry adder in hierarchical coding
- call the FullAdder designed in Lab2
`` `include "File_Path/Filename" ``
8-to-1 multiplexer and testbench that needs to test all selected inputs and print results
- operations
| alu_op | operation | description |
|---|---|---|
| 01000 | NOT | ~src1 |
| 01001 | NAND | ~(src1&src2) |
| 01010 | MAX | max{src1, src2} |
| 01011 | MIN | min{src1, src2} |
| 01100 | ABS | \|src\| |
| 01101 | SLTS | (src1<src2)?1:0 |
| 01110 | SLL | src1<<src2 |
| 01111 | ROTL | src1 rotate left by "src2 bits" |
| 10000 | ADDU | unsigned(src1+src2) |
| 10001 | SRLU | unsigned(src1>>src2) |
- Port
| signal | type | bits | description |
|---|---|---|---|
| alu_enable | input | 1 | 0: disabled; 1: enabled |
| alu_op | input | 5 | opcode selecting which operation to execute |
| src1 | input | 32 | ALU source 1 |
| src2 | input | 32 | ALU source 2 |
| alu_out | output | 32 | ALU result |
| alu_overflow | output | 1 | 0->no;1->yes |
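As a cross-check for a testbench, the opcode table can be mirrored by a small software reference model. The sketch below is not the lab's RTL; it assumes signed comparison for MAX, MIN, and SLTS, takes ABS of `src1`, uses only the low 5 bits of `src2` as the shift or rotate amount, and does not model `alu_enable` or `alu_overflow`.

```python
MASK = 0xFFFFFFFF  # 32-bit data path


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(alu_op, src1, src2):
    """Return the 32-bit alu_out for one opcode (overflow flag not modeled)."""
    u1, u2 = src1 & MASK, src2 & MASK
    s1, s2 = to_signed(src1), to_signed(src2)
    sh = u2 & 31  # assumption: only the low 5 bits select the shift amount
    ops = {
        0b01000: lambda: ~u1,                             # NOT
        0b01001: lambda: ~(u1 & u2),                      # NAND
        0b01010: lambda: max(s1, s2),                     # MAX (assumed signed)
        0b01011: lambda: min(s1, s2),                     # MIN (assumed signed)
        0b01100: lambda: abs(s1),                         # ABS (assumed on src1)
        0b01101: lambda: 1 if s1 < s2 else 0,             # SLTS
        0b01110: lambda: u1 << sh,                        # SLL
        0b01111: lambda: (u1 << sh) | (u1 >> (32 - sh)),  # ROTL
        0b10000: lambda: u1 + u2,                         # unsigned add, wraps at 32 bits
        0b10001: lambda: u1 >> sh,                        # SRLU
    }
    return ops[alu_op]() & MASK
```

Masking the result once at the end keeps every operation inside 32 bits, which is what a fixed-width `alu_out` bus does implicitly.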
conversion formula : y = 0.3125r + 0.5625g + 0.125b
| input | output |
|---|---|
| 24 bit RGB color values | 8 bit grayscale values |
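The three coefficients are exact multiples of 1/16 (5/16, 9/16, 2/16), so the formula needs no real multiplier or divider: one 4-bit right shift is enough. A minimal sketch of that arithmetic follows; the byte order of the 24-bit pixel (R in the top byte) and truncation instead of rounding are assumptions, not taken from the lab handout.

```python
def rgb_to_gray(pixel):
    """Convert a packed 24-bit RGB value to an 8-bit grayscale value."""
    r = (pixel >> 16) & 0xFF  # assumption: R is the most significant byte
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # 0.3125 = 5/16, 0.5625 = 9/16, 0.125 = 2/16, so divide by 16 with a shift
    return (5 * r + 9 * g + 2 * b) >> 4
```

Because the weights sum to 16/16, the result can never exceed 255 and fits the 8-bit output without saturation logic.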
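Simulates writing to, accessing, and reading from a 64x32 register file.
The design is split into three phases
| Phase | Description |
|---|---|
| Phase0 | The user inserts money, and the machine first stores it in money_temp |
| Phase1 | A beverage is selected and its price is subtracted from money_temp |
| Phase2 | Change is returned (change = money_temp) and finish is pulled high so the user knows the transaction is complete. This part is written as combinational logic, kept separate from the sequential circuit |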
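The three phases can be walked through in software before writing the FSM. This is only a behavioral sketch: `coins` and `price` are hypothetical argument names, and the case of insufficient money is not described in the notes, so it is rejected here instead of being modeled.

```python
def vending_machine(coins, price):
    """Return (change, finish) for one transaction."""
    # Phase0: accumulate the inserted money in money_temp
    money_temp = 0
    for coin in coins:
        money_temp += coin

    # Phase1: subtract the price of the selected beverage
    if money_temp < price:
        # assumption: this case is outside the described behavior
        raise ValueError("insufficient money is not modeled")
    money_temp -= price

    # Phase2: combinational outputs derived from the stored value
    change = money_temp
    finish = True
    return change, finish
```

For example, `vending_machine([10, 10, 5], 20)` leaves 5 in `money_temp`, which is returned as change with `finish` high.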
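I have not taken any related course; I have only watched some popular-science videos on neural networks. Put plainly, though, this problem is just multiplying the corresponding elements of two matrices. For me the hard part was that multiplying negative numbers requires sign extension first. My approach was:
- Connect the individual inputs to arrays so they can be handled together in a `for` loop; there are four input cases: `w_w` and `if_w` both 1, only one of them 1 (two cases), and both 0
- Process each array element in a `for` loop
- Concatenate the result with zero bits up to 17 bits, then apply `sign extension`
- Finally multiply to get the result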
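Why sign extension has to come before the multiply is easiest to see with numbers. The sketch below is a generic illustration, not the 17-bit scheme used in the lab: it widens two 16-bit two's-complement patterns to 32 bits and keeps the low 32 bits of the product. The helper names and widths are assumptions.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a two's-complement bit pattern by replicating its sign bit."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # sign bit is set
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def signed_mul16(a, b):
    """Multiply two 16-bit patterns as signed numbers; return a 32-bit pattern."""
    a32 = sign_extend(a, 16, 32)
    b32 = sign_extend(b, 16, 32)
    return (a32 * b32) & 0xFFFFFFFF
```

With `a = 0xFFFF` (which is -1) and `b = 2`, multiplying the raw patterns gives `0x1FFFE`, while `signed_mul16` gives `0xFFFFFFFE`, the 32-bit pattern for -2.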
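Rectified Linear Unit function mapping (the rectified linear function; the main purpose of an activation function is to add nonlinearity to a neural network model)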
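In hardware, ReLU on a two's-complement value reduces to checking the sign bit. A minimal sketch, assuming a 16-bit data width (the width is not stated at this point in the notes):

```python
def relu16(x):
    """ReLU on a 16-bit two's-complement pattern: negative inputs map to 0."""
    x &= 0xFFFF
    return 0 if x & 0x8000 else x  # the MSB is the sign bit
```

Positive values pass through unchanged, so the block costs one bit test and a multiplexer.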
| CurrentState | NS (din = 0) | NS (din = 1) | qout |
|---|---|---|---|
| S0 = 00 | S2 | S1 | 1 |
| S1 = 01 | S1 | S0 | 0 |
| S2 = 10 | S3 | S2 | 0 |
| S3 = 11 | S3 | S1 | 1 |
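The table describes a Moore machine: `qout` depends only on the current state. A table-driven sketch can be used to predict the expected output sequence for a testbench. It reads the four rows as states S0 to S3 in encoding order 00, 01, 10, 11 and assumes the machine resets to S0.

```python
# next state for (din = 0, din = 1), copied from the table
NEXT = {
    "S0": ("S2", "S1"),
    "S1": ("S1", "S0"),
    "S2": ("S3", "S2"),
    "S3": ("S3", "S1"),
}
QOUT = {"S0": 1, "S1": 0, "S2": 0, "S3": 1}


def run_moore(din_bits, state="S0"):
    """Return (outputs, final_state); one output per clock, sampled before the transition."""
    outs = []
    for din in din_bits:
        outs.append(QOUT[state])  # Moore: output is a function of state only
        state = NEXT[state][din]
    return outs, state
```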
| Current State | Next State, output | |
|---|---|---|
| X | din = 0 | din = 1 |
| S0 = 00 | S1,0 | S2,0 |
| S1 = 01 | S1,1 | S2,0 |
| S2 = 11 | S2,0 | S0,1 |
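This second table is a Mealy machine: the output is attached to the transition, so it depends on both the current state and `din`. The sketch below encodes the table directly, again assuming a reset into S0.

```python
# (state, din) -> (next_state, output), copied from the table
MEALY = {
    ("S0", 0): ("S1", 0), ("S0", 1): ("S2", 0),
    ("S1", 0): ("S1", 1), ("S1", 1): ("S2", 0),
    ("S2", 0): ("S2", 0), ("S2", 1): ("S0", 1),
}


def run_mealy(din_bits, state="S0"):
    """Return (outputs, final_state) for a sequence of din values."""
    outs = []
    for din in din_bits:
        state, out = MEALY[(state, din)]  # output is produced on the transition
        outs.append(out)
    return outs, state
```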
- a `65536x24bits` random access memory
- a `16384x24bits` read only memory
- port
| signal | type | bits | description |
|---|---|---|---|
| clk | input | 1 | clock |
| rst | input | 1 | reset |
| clear | input | 1 | Set all registers to 0 |
| w_w | input | 1 | Write weight enable. When w_w is high, write w_in. |
| if_w | input | 1 | Write input feature map enable. When if_w is high, write if_in. |
| w_in | input | 16 | Input weight data |
| if_in | input | 16 | Input feature map data |
| out | output | 34 | Output data |
- Shift register
a cascade of flip-flops. The output of each flip-flop is connected to the input of the next flip-flop.
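The cascade can be pictured by tracking the flip-flop contents clock by clock. A minimal sketch, assuming a serial input at stage 0 and all stages reset to 0 (`n_stages` is a hypothetical parameter):

```python
def shift_register(bits, n_stages=4):
    """Clock serial input bits through the stages; return the contents after each clock."""
    q = [0] * n_stages  # assumption: every flip-flop starts at 0
    history = []
    for d in bits:
        q = [d] + q[:-1]  # each stage takes the previous stage's old output
        history.append(tuple(q))
    return history
```

Building the new list from the old one in a single expression mirrors the simultaneous update that nonblocking assignments give in RTL.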
- spec

  The system will be able to change RGB pictures to grayscale pictures
- function
  1. reads pixel from the input memory.
  2. computes new value of pixels
  3. writes the new value pixel back to the output memory.
  4. repeats the process step (1)-(3) until the last pixel of output memory is updated.
  5. flags `done` when step (4) is completed
- control signal

| signal | function |
|---|---|
| en_in_mem | enable input memory |
| in_mem_addr | input memory address |
| en_out_mem | enable output memory |
| out_mem_read | output memory read enable |
| out_mem_write | output memory write enable |
| out_mem_addr | output memory address |
| done | Stop the process |
| Original Image | Results |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
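- Waveform

  The first figure shows the waveform of the whole run. The second shows the very beginning: after rst = 1, in_mem_addr and out_mem_addr are initialized and count up from 0, while en_in_mem, en_out_mem, and out_mem_write are pulled high alternately with clk, entering the loop between the read-in (S_in_mem) and write-out (S_out_mem) states. When out_addr reaches 32'd479999, the whole 480000-pixel image has been processed, so done = 1 and the FSM stays in the single state S_done, which matches the overall flow of the state diagram designed above.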
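The S_in_mem / S_out_mem / S_done loop can be mimicked with a small cycle-counting model. This is a sketch under assumptions, not the submitted controller: one state per clock, a shared address counter, R in the top byte of each pixel, and `n_pixels >= 1`.

```python
def run_controller(in_mem, n_pixels):
    """Step the read/write state loop; return (out_mem, cycles spent before S_done)."""
    out_mem = [0] * n_pixels
    addr = 0
    pixel = 0
    cycles = 0
    state = "S_in_mem"
    while state != "S_done":
        cycles += 1
        if state == "S_in_mem":   # en_in_mem high: fetch one pixel
            pixel = in_mem[addr]
            state = "S_out_mem"
        else:                     # en_out_mem and out_mem_write high
            r, g, b = (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF
            out_mem[addr] = (5 * r + 9 * g + 2 * b) >> 4
            if addr == n_pixels - 1:
                state = "S_done"  # last pixel written: done goes high
            else:
                addr += 1
                state = "S_in_mem"
    return out_mem, cycles
```

Under these assumptions the run takes two cycles per pixel, so a 480000-pixel image would need 960000 cycles before `done`.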
| Timing (slack) | Area (total cell area) | Power (total) |
|---|---|---|
| 5.49 | 3839.52 | 0.1058 mW |
integrate all components that you have learned so far to form a simple convolution system.
1. reads pixel from the `IFM ROM` to convolution block and consider the padding problem.
2. computes new value of pixels.
3. writes the convolution result back to the `CONV RAM`.
4. repeats the process step (1)-(3) until the last pixel of `CONV RAM` is updated.
5. reads pixel from the `CONV RAM` to pooling block.
6. computes new value of pixels.
7. writes the new value pixel back to the `POOL RAM`.
8. repeats the process step (5)-(7) until the last pixel of `POOL RAM` is updated.
9. flags `done` when step (8) is completed.
| signal | function |
|---|---|
| ROM_IF_OE | read data from input feature map ROM |
| ROM_W_OE | read data from weight ROM |
| RAM_CONV_WE | store the data to CONV RAM |
| RAM_CONV_OE | read data from CONV RAM |
| RAM_POOL_WE | store the data to POOL RAM |
| RAM_POOL_OE | read data from POOL RAM |
| done | stop the process |
- Convolve the penguin image with a 3×3 weight map.
- Consider the boundary condition to handle the padding problem.
- Apply maximum pooling to the convolution result.
- Synthesize your `system.v` with the following constraints:
| Clock period | no more than 20 ns |
|---|---|
| Synthesized Verilog file | system_syn.v |
| Timing constraint file | system_syn.sdf |
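A software version of the two stages is handy for generating expected values to compare against the RAM contents. The sketch below is not the TA's golden model. It assumes zero padding, no kernel flip (the usual CNN convention), and 2x2 max pooling with stride 2 on an image with even dimensions; the pooling shape is inferred from the 65536-entry and 16384-entry result memories described in these notes.

```python
def conv3x3_zero_pad(img, w):
    """3x3 convolution with zero padding; the output has the same size as img."""
    h, wd = len(img), len(img[0])
    out = [[0] * wd for _ in range(h)]
    for r in range(h):
        for c in range(wd):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < wd:  # outside the image counts as 0
                        acc += img[rr][cc] * w[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def max_pool2x2(img):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]
```

Skipping out-of-range taps in `conv3x3_zero_pad` has the same effect as forcing the input to 0 on padded cycles.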
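- READ_9
  - General case: `pad_en` is on in cycles 1, 4, 7
  - Boundary cases
    - `row == 18'd0`: additionally on in cycles 2, 3
    - `row == 18'd255`: additionally on in cycles 8, 9
- READ_C
  - General case: `pad_en` is off in every cycle
  - Boundary cases
    - `column == 18'd255`: `pad_en` is on in cycles 1, 2, 3
    - `row == 18'd0`: on in cycle 1
    - `row == 18'd255`: on in cycle 3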
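The padding rules for the two read states can be collected into one predicate, which makes it easy to list the padded cycles for any window position. This is a sketch of the rules as written in these notes; the function signature and the `last` parameter are hypothetical.

```python
def pad_en(state, cycle, row, column, last=255):
    """Return True when the input should be forced to 0 in this cycle."""
    if state == "READ_9":
        pad = cycle in (1, 4, 7)          # general case
        if row == 0:
            pad = pad or cycle in (2, 3)  # top boundary
        if row == last:
            pad = pad or cycle in (8, 9)  # bottom boundary
        return pad
    if state == "READ_C":
        if column == last:
            return cycle in (1, 2, 3)     # right boundary: all three reads padded
        return (row == 0 and cycle == 1) or (row == last and cycle == 3)
    return False
```

For `row == 0` in READ_9 the padded cycles come out as 1, 2, 3, 4, 7, leaving cycles 5, 6, 8, 9 as real reads.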
- terminal
- image
| Original | Result |
|---|---|
| ![]() | ![]() |
cs[2:0]=READ_W
cs[2:0]=READ_9
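Nine data items are read, but because each address has to be issued one cycle early, count[3:0] counts from 0 to 9 as in the figure above, so the READ_9 state takes 10 cycles in total. pad_en is high in cycles 1, 2, 3, 4, and 7; the address does not matter in those cycles because the output is 0. In cycles 5, 6, 8, and 9 the addresses are 0, 1, 256, and 257 respectively, as shown above. ROM_IF_OE is pulled high to read the original penguin data from the ROM, and RAM_CONV_WE is pulled high to write the convolution result into RAM_CONV.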
cs[2:0]=READ_C

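The behavior is the same as cs[2:0]=READ_9 described above, except that only 3 data items need to be read. As in the figure above, count[3:0] counts from 0 to 3, so the state takes 3+1=4 cycles. Most of the time the FSM simply alternates between READ_C and WRITE_C.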

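When column == 18'd255, padding is pulled high for every read; the relative position at this point is the lower-right corner of the Input Feature Map. The FSM then jumps to READ_9 with row = row+1, column resets to 0 and counts up from zero again, and the loop repeats.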

When address == 18'd65535, the first phase (convolution) is complete and the FSM jumps to the next state, READ_P.
cs[2:0]=READ_P

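As before, the address has to be issued one cycle early. When pool_en is high, writing into Pooling.v is allowed; when pool_en is low, my design holds the value in Pooling.v. RAM_CONV_OE is pulled high to read in the data that the preceding convolution phase saved in RAM_CONV, and RAM_POOL_WE is pulled high to write the result into RAM_POOL.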

When column2 == 18'd254, row2 = row2+2, the column resets to 0 and counts up from zero again, and the loop repeats.

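When address2 == 18'd16383, the second phase (pooling) is complete; DONE is pulled high and the FSM stays in an infinite loop. This ends the full two-phase execution flow of the RTL code.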
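The address stepping of the pooling phase can be enumerated to confirm the end conditions quoted above. The sketch assumes a 256-wide, row-major CONV RAM (consistent with the addresses 0, 1, 256, 257 seen in the convolution phase) and 2x2 windows; the generator name is hypothetical.

```python
def pool_windows(size=256):
    """Yield (address2, four CONV RAM addresses) for every 2x2 pooling window."""
    address2 = 0
    for row2 in range(0, size, 2):         # row2 advances by 2 after column2 hits size - 2
        for column2 in range(0, size, 2):
            base = row2 * size + column2
            yield address2, [base, base + 1, base + size, base + size + 1]
            address2 += 1
```

With `size=256` the last window is written at address2 16383 and its last read is CONV RAM address 65535, matching the two termination values in these notes.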
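Coverage: 99% (2 errors in system.v)
Every error that could be resolved has been resolved; the two remaining errors are in system.v.
| Error code | Description |
|---|---|
| INP_NO_USE | RAM_POOL_Q is left unconnected. This wire carries data from RAM_POOL to system, a function that is not used in this design |
| RXT_XC_LDTH | Presumably caused by how the rst signal is wired |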
| Synthesizable clock period | Simulation time | Cell Area | Power |
|---|---|---|---|
| 10ns (TA default) | 4275325ns | 84011 | 1.3264mW |
Design inverter, NAND, and NOR circuits
| Circuit | Waveform verification |
|---|---|
| inverter | The signal turns 0 into 1 and 1 into 0 |
| NAND | AND followed by NOT |
| NOR | OR followed by NOT |
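The first half of this course was writing Verilog for digital circuit simulation and synthesis; the background it draws on is digital logic design, computer organization, and basic use of a Unix-like environment. The second half was layout, which draws on Electronics I and II. Because of the domestic COVID-19 outbreak, however, the second half only got as far as lab9, and the course was nearly over once we had drawn the inverter, NAND, and NOR layouts. That was a bit of a pity, but the second semester of sophomore year carries a heavy workload, so it also gave me breathing room to study Electronics and other subjects.
The more important or interesting circuits were
- the `grayscale conversion system` in part 5 of lab5
- the `simple convolution system` in lab6, which was also the final project

These taught me how to turn an algorithm into RTL code. The boundary conditions in lab6 were the main difficulty, and on top of that I found that the TA's testbench appears to delay the data read from ROM by one cycle. Getting all of this done took a lot of my time, but I learned a great deal and got a feel for designing something myself.
In fact, much of this assignment was done for us by the TAs: the shell scripts and makefile for the Linux environment, the golden data generated with a high-level language, the testbench verification, and the blocks of the system together with the port wiring between them. What we students completed was the FSM implementation of the circuits inside the blocks.
After this course I feel I should improve my coding ability and my command of Linux. I hope to become a designer who truly understands the whole design flow: given a spec that someone else has written in plain text, building everything from scratch on my own.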


















