Bus Controller Design

Bus Design

Bus Structure

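The cache inside the CPU connects over the bus to the other devices, including RAM, ROM and I/O devices (memory-mapped I/O). The bus can have multiple master devices, and each of them can initiate read and write requests. This gives the system DMA capability: other devices can take over the bus while it is idle and access memory directly. The idea was inspired by ZipCPU, which uses an established bus standard (Wishbone). In this project I plan to design a simpler bus of my own.
The bus consists of the following signals ("====" lines are multi-bit buses, "----" lines are single-bit signals):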

========address
========data
--------request
--------ready
--------r_w
--------clk

Bus Controller

The bus controller is designed as follows:

          ------------
request<--|  state    |--<DMA 0
          |           |-->Grant 0
          |   BUS     |--<DMA 1
          | Controller|-->Grant 1
          |           | ...
          |           |--<DMA 7
    clk>--|           |-->Grant 7
          ------------

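The controller arbitrates among the requests from all masters on the bus, using a fixed internal priority that decreases from DMA 0 to DMA 7. The controller notifies the winning master through its Grant X signal. That signal drives the tri-state buffers that connect the device to the bus, which lets the device communicate. Whenever any device raises a DMA request, the controller drives request high. This output is connected to the bus request line and tells the slave devices that a transfer is starting. The controller contains a state machine with two states. It enters the busy state on the first rising clock edge after a request appears. It returns to the idle state on the first rising edge after it receives ready from any slave. In the busy state the grant outputs are held constant while the granted device finishes its transfer, and only then is the bus free again. The controller is implemented in bus_control.v. Issuing a request and receiving data on the bus each complete within one clock cycle.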

module bus_control(
    dma, grant, req, ready, clk
);
input  [7:0] dma;     // request lines from the masters; DMA 0 has the highest priority
input        ready, clk;
output [7:0] grant;   // one-hot grant to the masters
output       req;     // drives the bus request line

//-------------------------- Module implementation -------------------------

//registered grant value.
reg [7:0] grant_reg;

//dma request queue: fixed-priority encoder, lowest index wins.
//Declared before the state machine, which reads grant_inner.
//Note: in casez a z on the dma input itself is also a don't-care,
//so unused dma inputs must be tied low at the top level.
reg [7:0] grant_inner;
always @(*)
begin
    casez (dma)
        8'b???????1 : grant_inner = 8'b00000001;
        8'b??????10 : grant_inner = 8'b00000010;
        8'b?????100 : grant_inner = 8'b00000100;
        8'b????1000 : grant_inner = 8'b00001000;
        8'b???10000 : grant_inner = 8'b00010000;
        8'b??100000 : grant_inner = 8'b00100000;
        8'b?1000000 : grant_inner = 8'b01000000;
        8'b10000000 : grant_inner = 8'b10000000;
        default     : grant_inner = 8'b00000000;
    endcase
end

//internal state machine: 0 = idle, 1 = busy.
reg state;

//Power-up values, so that state (and therefore grant) is not X.
initial
begin
    state     = 1'b0;
    grant_reg = 8'b0;
end

always @(posedge clk)
begin
    case (state)
    //Idle state: register the current grant value, and if there is a
    //request, jump to busy. The chosen device is then locked in, and
    //other devices' requests can't change the output.
    1'b0: begin
        grant_reg <= grant_inner;
        if (req)
            state <= 1'b1;
    end
    //Busy state: when a slave returns ready, jump back to idle.
    1'b1: begin
        if (ready)
            state <= 1'b0;
    end
    endcase
end

//When state == 0 (idle), grant is the combinational encoder output.
//When state == 1 (busy), grant is the registered value.
//This keeps the grant output stable once a device has been chosen.
assign grant = state ? grant_reg : grant_inner;

//req stays high while a grant is active, i.e. until ready is received.
assign req = |grant;

endmodule //bus_control
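The following minimal Python sketch models bus_control cycle by cycle, so the arbitration rule can be checked without a simulator. It is an illustration and not the author's code. The class and method names (BusControlModel, priority_grant, outputs, clock) are hypothetical. The casez priority table is replaced by the equivalent bit trick dma & -dma, which isolates the lowest set bit, so DMA 0 still has the highest priority.

```python
from __future__ import annotations


class BusControlModel:
    """Cycle-level sketch of bus_control: fixed-priority arbiter + idle/busy FSM."""

    def __init__(self) -> None:
        self.state = 0       # 0 = idle, 1 = busy
        self.grant_reg = 0   # grant captured while idle, held while busy

    @staticmethod
    def priority_grant(dma: int) -> int:
        """One-hot grant for the lowest-numbered active request (DMA 0 wins)."""
        dma &= 0xFF
        return dma & -dma    # isolates the lowest set bit; 0 if no request

    def outputs(self, dma: int) -> tuple[int, int]:
        """Combinational (grant, req) for the current cycle."""
        grant = self.grant_reg if self.state else self.priority_grant(dma)
        return grant, int(grant != 0)

    def clock(self, dma: int, ready: int) -> None:
        """Apply one rising clock edge with the given dma and ready inputs."""
        _, req = self.outputs(dma)
        if self.state == 0:
            self.grant_reg = self.priority_grant(dma)
            if req:
                self.state = 1
        elif ready:
            self.state = 0


if __name__ == "__main__":
    bc = BusControlModel()
    bc.clock(dma=0b10, ready=0)     # DMA 1 alone: the bus becomes busy
    print(bc.outputs(dma=0b11))     # DMA 0 cannot preempt: (2, 1)
    bc.clock(dma=0b11, ready=1)     # the slave answers, back to idle
    print(bc.outputs(dma=0b11))     # now DMA 0 wins: (1, 1)
```

The grant is locked in grant_reg while the controller is busy. A higher-priority request that arrives mid-transfer therefore only takes effect after the current slave returns ready.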

The bus controller simulation is shown below:
[Figure: bus controller simulation waveform]

Devices on the Bus

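A master device connects to the bus lines address, data, r_w and ready, and to the controller's DMA X and Grant X. To issue a request, it drives DMA X high. Once it wins arbitration, Grant X enables the tri-state buffers on its address, data, r_w and ready connections. The device can drive or read address, data and r_w only while the controller has selected it. Otherwise these connections are high impedance. The master does not know whether it has been selected. When it issues a request, it places the requested address, data and read/write flag on its output ports. The tri-state buffers on those ports are controlled by the Grant X signal returned by the controller. An unselected master cannot see the ready signal that a slave sends to another master, so it keeps waiting. As a result, other devices can use the bus for DMA transfers whenever the bus is idle, for example while the CPU keeps hitting in its cache.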
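The tri-state sharing described above can be sketched in Python as follows. This is a hypothetical behavioral model and is not part of the project. None stands for high impedance, tristate() plays the role of one granted buffer, and resolve() plays the role of the shared bus line. Contention means two enabled drivers with different values. The model reports contention as an error, whereas Verilog would resolve it to X.

```python
from __future__ import annotations

from typing import Optional, Sequence

Z = None  # high impedance


def tristate(enable: bool, value: int) -> Optional[int]:
    """A tri-state buffer: drives value when enabled, otherwise high-Z."""
    return value if enable else Z


def resolve(drivers: Sequence[Optional[int]], pull: Optional[int] = None) -> Optional[int]:
    """Value seen on a shared line driven by several tri-state buffers.

    No active driver: the pull value (None = floating, 0 = pulldown like tri0).
    Several active drivers must agree, otherwise the bus is in contention.
    """
    active = [d for d in drivers if d is not Z]
    if not active:
        return pull
    if any(v != active[0] for v in active):
        raise ValueError(f"bus contention: {active}")
    return active[0]


def masked_ready(grant: bool, ready: int) -> int:
    """A master only sees ready while it is granted (ready_inner in dummy_master)."""
    return ready if grant else 0


if __name__ == "__main__":
    grants = [True, False]          # master 0 granted, master 1 waiting
    addrs = [0x10, 0x25]            # both masters present an address
    print(resolve([tristate(g, a) for g, a in zip(grants, addrs)]))  # 16
    print(masked_ready(grants[1], 1))   # 0: master 1 keeps waiting
```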

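A slave device's interface must decode the address lines. When the address falls inside the device's own address range, the request is treated as directed at this device. Once the requested data is ready, the slave drives the data onto the bus and sets ready high.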
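The range check can be illustrated with a small Python sketch. The two windows are an assumption taken from the test bench later in this post (0x00 to 0x1F and 0x20 to 0x3F). The class and function names are hypothetical. offset() shows the local word index a slave would use inside its own memory.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlaveWindow:
    name: str
    start: int  # first address owned by the slave (inclusive)
    end: int    # last address owned by the slave (inclusive)

    def selected(self, address: int, request: bool) -> bool:
        # Same condition as dummy_slave: request && start <= address <= end
        return request and self.start <= address <= self.end

    def offset(self, address: int) -> int:
        # Local index into the slave's own memory
        return address - self.start


def decode(windows: Sequence[SlaveWindow], address: int, request: bool = True) -> Optional[str]:
    """Name of the slave that responds to address, or None if nobody does."""
    hits = [w.name for w in windows if w.selected(address, request)]
    if len(hits) > 1:
        raise ValueError(f"overlapping address windows: {hits}")
    return hits[0] if hits else None


WINDOWS = [SlaveWindow("slave_a", 0x00, 0x1F), SlaveWindow("slave_b", 0x20, 0x3F)]

if __name__ == "__main__":
    print(decode(WINDOWS, 0x1F), decode(WINDOWS, 0x20), decode(WINDOWS, 0x40))
    # slave_a slave_b None
```

If no window matches, no slave ever drives ready. With the controller described here, the requesting master and the controller would then wait indefinitely, so a real memory map may need a default slave or a timeout.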

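The cache controller inside the CPU is a master only: it can issue requests but cannot be the target of one. Most I/O devices can act as both master and slave, so they can initiate requests and also be the target of requests. Such devices need separate ports. The part that accepts requests uses a slave interface, and the part that issues requests uses a master interface.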

Testing

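How can the bus be designed and tested on its own? The approach is to design the master and slave interfaces and simulated master and slave devices. The simulated devices are then connected to the bus and the controller for simulation and testing.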

Slave Device Simulation

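First, I designed a module, dummy_slave, to simulate the behavior of a slave device on the bus. This slave behaves like a memory device: it takes several cycles to prepare its data and supports both reads and writes. It can also be reused later for testing the processor. The code is as follows: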

//module for dummy slave devices.
module dummy_slave(
    clk, address, data, request, ready_out, r_w
);
input         clk, r_w, request;   // r_w: 1 = write, 0 = read
input  [31:0] address;
inout  [31:0] data;
output        ready_out;

//-------------------------- Module implementation -------------------------
//dummy memory
reg [31:0] mem [0:32-1];

//internal state machine
reg [ 2:0] state;

//address range
reg [31:0] entry_start, entry_end;

//internal registers for bus signals
reg [31:0] addr_reg, data_reg;
reg        r_w_reg, selected_reg;

//internal ready signal (declared before the initial block that uses it)
reg        ready;

initial
begin
    entry_start  = 32'b0;
    entry_end    = 32'b11111;
    state        = 3'd0;
    //The internal ready is 0 when idle, and ready_out is Z unless this
    //device is selected. Because there are other devices on the bus, the
    //ready line should be a tri0 (pulldown) net at the top level.
    ready        = 1'b0;
    addr_reg     = 32'b0;
    data_reg     = 32'b0;
    r_w_reg      = 1'b0;
    selected_reg = 1'b0;
end

//selected if there is a request and the address is in range
reg selected;
always @(*)
begin
    if ((address >= entry_start) && (address <= entry_end) && request)
        selected = 1'b1;
    else
        selected = 1'b0;
end

//put ready_out in high Z when the device is not selected.
assign ready_out = (selected | selected_reg) ? ready : 1'bz;

//read is the continuous read data out.
//Only the low 5 bits index the 32-word memory.
wire [31:0] read;
assign read = mem[addr_reg[4:0]];

//implement inout data port.
//If the device is selected and the request is a read request, this device
//puts data onto the bus while ready is high. In any other condition the
//output is high Z.
assign data = (selected_reg & ~r_w_reg & ready) ? read : 32'bz;

//the state machine implements the dummy wait cycles and ready signal.
//one dummy operation takes four cycles from request to ready.
always @(posedge clk)
begin
    //If the device is idle and selected, register address, r_w and data.
    if ((state == 3'd0) && selected) begin
        state        <= 3'd1;
        //keep the ready line low.
        ready        <= 1'b0;
        //register the request
        addr_reg     <= address;
        r_w_reg      <= r_w;
        selected_reg <= selected;
        if (r_w)
            data_reg <= data;
    end
    //dummy write (a read needs no action here: read is continuous).
    else if (state == 3'd1) begin
        state <= 3'd2;
        if (r_w_reg)
            mem[addr_reg[4:0]] <= data_reg;
    end
    //dummy wait.
    else if (state == 3'd2) begin
        state <= 3'd3;
    end
    //operation ready: assert ready for one cycle.
    else if (state == 3'd3) begin
        state <= 3'd4;
        ready <= 1'b1;
    end
    //State 4: go back to idle, ready for the next request.
    //This branch also covers idle with no request for this device and the
    //unused state codes 5-7: clear all internal registers.
    else begin
        state        <= 3'd0;
        ready        <= 1'b0;
        selected_reg <= 1'b0;
        r_w_reg      <= 1'b0;
        data_reg     <= 32'b0;
        addr_reg     <= 32'b0;
    end
end
endmodule //dummy_slave
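The Python sketch below can be used to cross-check the timing of dummy_slave. It steps through the same five states (idle, write, wait, set ready, ready) one clock edge at a time. It is a simplified model and not the RTL. The names are hypothetical, high-Z is represented by None, and the helper transaction() assumes the master holds its request until it sees ready.

```python
from __future__ import annotations

from typing import Optional, Tuple


class DummySlaveModel:
    """Cycle-level sketch of dummy_slave's state machine (states 0..4)."""

    def __init__(self, size: int = 32) -> None:
        self.mem = [0] * size
        self.state = 0
        self.ready = 0
        self.addr = 0
        self.rw = 0      # 1 = write, 0 = read (as in the RTL)
        self.wdata = 0

    def bus_data(self) -> Optional[int]:
        """Data driven onto the bus: only in the ready cycle of a read."""
        if self.ready and not self.rw:
            return self.mem[self.addr]
        return None      # high-Z

    def clock(self, selected: bool, address: int = 0, r_w: int = 0, data: int = 0) -> None:
        """Apply one rising clock edge."""
        if self.state == 0:
            if selected:                     # register the request
                self.state, self.ready = 1, 0
                self.addr = address % len(self.mem)
                self.rw = r_w
                self.wdata = data if r_w else 0
        elif self.state == 1:                # dummy write
            self.state = 2
            if self.rw:
                self.mem[self.addr] = self.wdata
        elif self.state == 2:                # dummy wait
            self.state = 3
        elif self.state == 3:                # assert ready for one cycle
            self.state, self.ready = 4, 1
        else:                                # back to idle, clear registers
            self.state, self.ready = 0, 0
            self.addr = self.rw = self.wdata = 0


def transaction(slave: DummySlaveModel, address: int, r_w: int, data: int = 0) -> Tuple[int, Optional[int]]:
    """Hold the request until ready; return (edges until ready, data on the bus)."""
    edges = 0
    while not slave.ready:
        slave.clock(True, address, r_w, data)
        edges += 1
    value = slave.bus_data()
    slave.clock(True, address, r_w, data)    # the ready cycle ends, slave goes idle
    return edges, value


if __name__ == "__main__":
    s = DummySlaveModel()
    print(transaction(s, 5, r_w=1, data=0xABCD))   # (4, None)
    print(transaction(s, 5, r_w=0))                # (4, 43981)
```

In this model, as in the RTL, ready is observed four clock edges after the request is first sampled, which matches the "four cycles" comment in the code.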

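When synthesized in Quartus, this code produces an inferred module that uses the FPGA's on-chip memory, of type dual-port/single-clock RAM. The model I designed should really be a single-port RAM. I checked Altera's manual and also followed Quartus's code template, but synthesis still produced a dual-port RAM, and I have not been able to work out why. The synthesized result does meet my design requirements. Considering the difference between single-port and dual-port RAM, I suspect the dual-port RAM generated here shares the address and data lines between its read and write ports. The read port's data output would then be controlled by a tri-state buffer, so that it drives the bus only when the write signal is inactive. I don't know the details of Quartus synthesis, so I won't dwell on why the compiler did not infer a single-port RAM as I intended. If anyone knows the reason, please leave a comment.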

//Single-port RAM code template built into Quartus.
module single_port_RAM
#(parameter DATA_WIDTH=8, parameter ADDR_WIDTH=6)
(
    input [(DATA_WIDTH-1):0] data,
    input [(ADDR_WIDTH-1):0] addr,
    input we, clk,
    output [(DATA_WIDTH-1):0] q
);

    // Declare the RAM variable
    reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

    // Variable to hold the registered read address
    reg [ADDR_WIDTH-1:0] addr_reg;

    always @ (posedge clk)
    begin
        // Write
        if (we)
            ram[addr] <= data;

        addr_reg <= addr;
    end

    // Continuous assignment implies read returns NEW data.
    // This is the natural behavior of the TriMatrix memory
    // blocks in Single Port mode.
    assign q = ram[addr_reg];

endmodule
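For comparison, the behavior of the Quartus template above can be modeled in a few lines of Python. This is a hypothetical sketch, not Quartus code, and the data width is not modeled. On every clock edge, the single addr input serves as both the write address and the next registered read address. Because the read address is registered, q shows the newly written value right after a write, which is the "read returns NEW data" behavior the template's comment describes.

```python
class SinglePortRAMModel:
    """Sketch of single_port_RAM: synchronous write, registered read address."""

    def __init__(self, addr_width: int = 6) -> None:
        self.ram = [0] * (1 << addr_width)   # 2**ADDR_WIDTH words
        self.addr_reg = 0

    def clock(self, addr: int, data: int = 0, we: bool = False) -> None:
        """One rising edge: both nonblocking updates of the always block."""
        if we:
            self.ram[addr] = data
        self.addr_reg = addr                 # the same address feeds the read side

    @property
    def q(self) -> int:
        """Continuous read of the registered address (assign q = ram[addr_reg])."""
        return self.ram[self.addr_reg]


if __name__ == "__main__":
    r = SinglePortRAMModel()
    r.clock(addr=3, data=42, we=True)
    print(r.q)      # 42: read-during-write returns the new data
    r.clock(addr=4)
    print(r.q)      # 0
```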

Simulation result of dummy_slave:
[Figure: dummy_slave simulation waveform]

Master Device Simulation

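The dummy master will eventually be replaced by the cache and various DMA devices. Designing one is still worthwhile, because it can test the bus, especially the case where several masters issue requests at the same time.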

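Internally, the dummy master is a request queue. It sends a request to the bus controller and, once the controller grants it, puts the request on the bus. It then waits for the ready signal from the target device, takes the data, and queues again for the next request.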

module dummy_master(
    clk, request, ready, grant, address, data, r_w
);
input         clk, ready, grant;
output        request, r_w;     // r_w: 1 = write, 0 = read
output [31:0] address;
inout  [31:0] data;

//dummy requests: slot i is active when req[i] is 1.
reg [31:0] req_addr [0:32-1];
reg [31:0] req_r_w;
reg [31:0] req;
wire       ready_inner;

//dummy memory: write data for write slots, read results for read slots.
reg [31:0] mem [0:32-1];

//index of the current request slot (wraps around after 31).
reg [4:0] req_num;

//internal state machine: 0 = looking for a request, 1 = waiting for ready.
reg state;

//initial dummy operations.
initial
begin
    state   = 1'b0;
    req_num = 5'd0;
    //simulated operation is here.
end

always @(posedge clk)
begin
    case (state)
    1'b0: begin
        //active slot: wait for it to complete; empty slot: skip it.
        if (request) state <= 1'b1;
        else req_num <= req_num + 1'b1;
    end
    1'b1: begin
        if (ready_inner)
        begin
            //for a read, keep the data returned by the slave.
            if (~req_r_w[req_num])
                mem[req_num] <= data;
            state   <= 1'b0;
            req_num <= req_num + 1'b1;
        end
    end
    endcase
end

assign request = req[req_num];

//High Z when not granted
assign address = grant ? req_addr[req_num] : 32'bz;
assign r_w     = grant ? req_r_w[req_num]  : 1'bz;
wire [31:0] data_out = req_r_w[req_num] ? mem[req_num] : 32'bz;
assign data    = grant ? data_out : 32'bz;

//mask out the ready signal when not granted
assign ready_inner = grant ? ready : 1'b0;

endmodule //dummy_master
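The request-queue behavior of dummy_master can also be sketched in Python. This is a hypothetical model, not the RTL. Each Slot stands for one index of the req, req_r_w, req_addr and mem arrays. The slot list may be shorter than the 32 entries of the Verilog version, and the index wraps around at the end, like the 5-bit req_num.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    valid: bool      # req[i]: this slot holds a request
    write: bool      # req_r_w[i]: True = write, False = read
    address: int     # req_addr[i]
    data: int = 0    # mem[i]: data to write, or data that was read


class DummyMasterModel:
    """Cycle-level sketch of dummy_master's two-state request queue."""

    def __init__(self, slots: List[Slot]) -> None:
        self.slots = slots
        self.num = 0     # req_num
        self.state = 0   # 0 = pick a request, 1 = wait for ready

    @property
    def request(self) -> bool:
        return self.slots[self.num].valid

    def clock(self, grant: bool, ready: int, bus_data: int = 0) -> None:
        """One rising edge; ready is ignored unless this master is granted."""
        ready_inner = ready if grant else 0
        if self.state == 0:
            if self.request:
                self.state = 1
            else:        # empty slot: skip it
                self.num = (self.num + 1) % len(self.slots)
        elif ready_inner:
            slot = self.slots[self.num]
            if not slot.write:
                slot.data = bus_data   # keep the read result
            self.state = 0
            self.num = (self.num + 1) % len(self.slots)


if __name__ == "__main__":
    slots = [Slot(True, True, 0x03, 11), Slot(False, False, 0), Slot(True, False, 0x03)]
    m = DummyMasterModel(slots)
    m.clock(grant=True, ready=0)     # slot 0 is valid: start waiting
    m.clock(grant=False, ready=1)    # ready meant for someone else: ignored
    m.clock(grant=True, ready=1)     # write done, move on
    m.clock(grant=True, ready=0)     # slot 1 is empty: skipped
    m.clock(grant=True, ready=0)     # slot 2 is valid: start waiting
    m.clock(grant=True, ready=1, bus_data=11)
    print(slots[2].data, m.num)      # 11 0
```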

The simulation result is shown below:
[Figure: dummy_master simulation waveform]

Bus Transaction Test

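The test bench connects two DMA devices to the bus, attached to the controller's DMA[0] and DMA[1], so DMA[0] has the higher priority. Two slave devices (memory devices) are also attached to the bus, with address spaces 32'b0 -> 32'b11111 and 32'b100000 -> 32'b111111 respectively. The test bench is simple: it only connects nets to the instances of each module: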

//This module is the test bench for the bus
module bus_t(
    clk, address_o, data_o, request_o, ready_o, rw_o, DMA_o, grant_o, ready_inner
);
output [31:0] address_o, data_o;
output        request_o, ready_o, rw_o, ready_inner;
output [ 7:0] DMA_o, grant_o;

input clk;

//These wires are internal bus signals
wire [31:0] address, data;
wire        request, r_w;
//ready is driven only by the selected slave; pull it low otherwise.
tri0        ready;
wire [ 7:0] DMA, grant;

//Only DMA[0] and DMA[1] have masters attached. Tie the unused request
//lines low so they are not Z at the controller's casez input.
assign DMA[7:2] = 6'b0;

//these ports are used by the simulator output
assign address_o = address;
assign data_o    = data;
assign request_o = request;
assign ready_o   = ready;
assign rw_o      = r_w;
assign DMA_o     = DMA;
assign grant_o   = grant;

//instances of the bus components
bus_control    bus_control_0  (DMA, grant, request, ready, clk);
dummy_slave    dummy_slave_a  (clk, address, data, request, ready, r_w);
dummy_slave_1  dummy_slave_b  (clk, address, data, request, ready, r_w);
dummy_master   dummy_master_a (clk, DMA[0], ready, grant[0], address, data, r_w);
dummy_master_1 dummy_master_b (clk, DMA[1], ready, grant[1], address, data, r_w, ready_inner);
endmodule
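The rough Python sketch below shows what the fixed priority means for the two masters in this test bench. It is a simplified model under explicit assumptions and does not reproduce the simulation below. Every slave is idealized to return ready a fixed number of cycles after the grant (4 here, echoing dummy_slave). Each master re-requests immediately while it still has work. The function name simulate is hypothetical.

```python
from __future__ import annotations

from typing import Dict, List, Tuple


def simulate(pending: Dict[int, int], latency: int = 4, max_cycles: int = 1000) -> List[Tuple[int, int]]:
    """Fixed-priority bus with an idealized slave (latency >= 1).

    pending maps a master index (its DMA line) to the number of transactions
    it still has to issue. Returns (cycle, master) for every completed
    transaction, where cycle is the cycle in which ready is high.
    """
    pending = dict(pending)
    state, grant_reg, countdown = 0, 0, 0   # arbiter FSM and slave delay
    log: List[Tuple[int, int]] = []
    for cycle in range(max_cycles):
        dma = sum(1 << m for m, n in pending.items() if n > 0)
        grant = grant_reg if state else dma & -dma   # lowest index wins
        ready = state == 1 and countdown == 0
        if ready:                                    # granted master completes
            master = grant.bit_length() - 1
            pending[master] -= 1
            log.append((cycle, master))
        # rising clock edge
        if state == 0:
            grant_reg = grant
            if grant:
                state, countdown = 1, latency - 1
        elif ready:
            state = 0
        else:
            countdown -= 1
        if state == 0 and not any(pending.values()):
            break
    return log


if __name__ == "__main__":
    print(simulate({0: 2, 1: 2}))   # [(4, 0), (9, 0), (14, 1), (19, 1)]
```

In this model, master 1 only gets the bus once master 0 has nothing left to send. With fixed priority, a DMA 0 that requests continuously could starve the lower-priority lines.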

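The simulation output is:
[Figure: bus transaction simulation waveform]
The simulation results show that the four simulated devices carry out all 8 bus transactions correctly.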

Summary

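The bus isolates the CPU from the peripheral devices. Because the CPU I am designing uses memory-mapped I/O, it isolates I/O as well. With the bus in place, an I/O device is just another device attached to the bus, and different memory maps connecting ROM and RAM are easy to implement. Completing the bus design is therefore a milestone for this project, because it validates the development and design flow.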
