Very small 32-bit soft core processor written in VHDL, designed for blitting and complex DMA operations within an Avalon bus fabric.

esherriff/Praxos

Praxos Functional Description

Overview

The Praxos is a 32-bit soft core processor written in VHDL which has been designed for use as a flexible DMA controller in Avalon bus fabrics. The core’s feature set is inspired by the Microsemi CoreABC AMBA bus controller, though the actual instruction set and architecture are based more upon the 8-bit PicoBlaze and 16-bit Leros architectures. The Praxos instruction set provides direct access to a 32-bit Avalon memory mapped master interface, while also using faster internal program and data memory. To minimize logic usage, the core is an accumulator-based architecture with a register-memory style instruction set.

Features

  • 34 instructions with 3 addressing modes. Most instructions execute in 3 clock cycles; branches, data memory reads and Avalon bus operations take longer.
  • 2^32 x 36-bit program memory address space.
  • 2^28 x 32-bit data memory address space.
  • 2^16 x 32-bit IO address space (expandable to 2^32 x 32-bit).
  • 32-bit accumulator and 32-bit index register.
  • Direct, indirect (with offset) and immediate addressing modes.
  • 32-bit Avalon memory mapped master.

A typical Praxos core requires 450 logic elements and will run at over 80MHz in a Cyclone 10LP FPGA.

Architecture

The partial block diagram below shows the basic layout of the core.

Block diagram

The design makes use of the output registers within the program memory block RAM, which serve as an instruction register. The control signals from the FSM are registered to improve timing. The data RAM output registers are also used, which incurs an additional clock cycle delay to operations that read from data memory.

Each instruction is encoded as a single 36-bit word, which permits the loading of immediate 32-bit operands into the accumulator and index registers. The majority of instructions require 3 clock cycles to execute, while branches and data memory reads take 4 cycles. Avalon bus operations require a minimum of 5 clock cycles, assuming zero wait states.
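
The cycle counts above lend themselves to a quick back-of-envelope estimate. The sketch below is only an illustration: the class names, the example loop body, and the assumption that each Avalon wait state adds exactly one cycle are not taken from the core's documentation.

```python
# Minimum cycle costs quoted in the text; the Avalon figure assumes zero wait states.
CYCLES = {"alu": 3, "branch": 4, "dm_read": 4, "avalon": 5}


def estimate_cycles(instruction_classes, avalon_wait_states=0):
    """Sum the minimum cycle count for a sequence of instruction classes."""
    total = 0
    for cls in instruction_classes:
        total += CYCLES[cls]
        if cls == "avalon":
            # Assumption: every wait state stretches the bus access by one cycle.
            total += avalon_wait_states
    return total


# Hypothetical copy loop body: bus read, index update, bus write, branch back.
loop_body = ["avalon", "alu", "avalon", "branch"]
cycles_per_word = estimate_cycles(loop_body)
print(cycles_per_word)  # 17
```

Dividing the clock frequency by the result gives a rough upper bound on words moved per second for such a loop.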

Not shown on the block diagram are the branch unit and the negative flags, which are simply the top bit of the respective register. Four flags are implemented: accumulator zero (AZ), accumulator negative (AN), index negative (IN) and IO zero (IOZ). The number of bits evaluated by the IO zero flag is configured via the IO_FLAG_WIDTH generic, which sets the number of bits tested starting from the lsb of port_in (1 to 32).
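
As a behavioural illustration of the four flags, the following sketch derives them from register values. It is a software model only, not the VHDL implementation; the function name and argument names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def praxos_flags(acc, index, port_in, io_flag_width=32):
    """Return the AZ, AN, IN and IOZ flags for the given 32-bit values."""
    if not 1 <= io_flag_width <= 32:
        raise ValueError("IO_FLAG_WIDTH must be between 1 and 32")
    acc &= MASK32
    index &= MASK32
    io_mask = (1 << io_flag_width) - 1  # bits tested, starting from the lsb
    return {
        "AZ": acc == 0,                   # accumulator zero
        "AN": bool(acc >> 31),            # accumulator msb
        "IN": bool(index >> 31),          # index register msb
        "IOZ": (port_in & io_mask) == 0,  # low bits of port_in all zero
    }


# With IO_FLAG_WIDTH = 3 only port_in(2 downto 0) is tested, so bit 3 is ignored.
print(praxos_flags(acc=0, index=0x80000000, port_in=0b1000, io_flag_width=3))
```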

The depths of the program and data memories are also configured by generics. The program memory address width, PM_WIDTH, is configurable between 8 and 32 bits. DM_WIDTH sets the data memory address width between 8 and 28 bits. In reality it is expected that only small memories would be utilised, with the default 8 bits corresponding to a single M9K block for each memory.
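
The relationship between the address-width generics and block RAM usage can be checked with a little arithmetic. This sketch assumes an M9K block holds 9216 bits (including parity bits); the helper names are hypothetical, and real block usage depends on how the synthesis tool maps the memories.

```python
M9K_BITS = 9216  # assumed capacity of one M9K block, parity bits included


def memory_bits(addr_width, word_bits):
    """Total storage in bits for a memory with 2**addr_width words."""
    return (1 << addr_width) * word_bits


def praxos_memory_bits(pm_width=8, dm_width=8):
    """Return (program_bits, data_bits) for the given generics."""
    if not 8 <= pm_width <= 32:
        raise ValueError("PM_WIDTH must be between 8 and 32")
    if not 8 <= dm_width <= 28:
        raise ValueError("DM_WIDTH must be between 8 and 28")
    return memory_bits(pm_width, 36), memory_bits(dm_width, 32)


program_bits, data_bits = praxos_memory_bits()
print(program_bits, data_bits)  # 9216 8192
print(program_bits <= M9K_BITS and data_bits <= M9K_BITS)  # True
```

With the default widths, the 256 x 36-bit program memory exactly fills one block under this assumption, and the 256 x 32-bit data memory fits in another.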

Registers

Praxos programs rely on manipulating just two 32-bit registers, an accumulator and an index register. The index register is used only to generate addresses for the data memory and Avalon bus. When used in combination with a location in data memory, the index register can be used as a stack pointer to implement subroutines.

Arithmetic and Logic Unit

The ALU has three basic modes of operation: addition/subtraction, shifting/rotation and bit-wise logical. All ALU results are stored in the accumulator.
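
The shift and rotate operations listed in the instruction set table can be modelled in a few lines. This is a behavioural sketch, not the ALU's VHDL. It assumes that "left unchanged" means the vacated bit keeps its previous value, which makes SHRX an arithmetic shift right; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shl(acc, fill):
    """SHL0 / SHL1 / SHLX: fill is 0, 1 or "x" (keep the old lsb)."""
    lsb = (acc & 1) if fill == "x" else fill
    return ((acc << 1) & MASK32) | lsb


def shr(acc, fill):
    """SHR0 / SHR1 / SHRX: fill is 0, 1 or "x" (keep the old msb)."""
    acc &= MASK32
    msb = (acc >> 31) if fill == "x" else fill
    return (acc >> 1) | (msb << 31)


def rol(acc):
    """ROL: rotate left by one bit."""
    acc &= MASK32
    return ((acc << 1) & MASK32) | (acc >> 31)


def ror(acc):
    """ROR: rotate right by one bit."""
    acc &= MASK32
    return (acc >> 1) | ((acc & 1) << 31)


# ld# $80000000 followed by ror yields the second bit mask, $40000000.
print(hex(ror(0x80000000)))  # 0x40000000
```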

Input/Output (IO) Ports

Praxos supports 65536 32-bit IO locations, which may be used to attach additional hardware to the processor. This IO port can be used to attach the Praxos to a larger, master processor, either as an Avalon slave or directly to the master’s memory bus.

Instruction Set

The table below summarises the instructions implemented.

| Assembler Mnemonic | Description | Flags Affected |
|---|---|---|
| a) ADD K; b) ADD# K | a) Adds the contents of data memory address K to the accumulator. b) Adds the immediate operand K to the accumulator. K is a 28-bit unsigned value. | AZ, AN |
| a) SUB K; b) SUB# K | a) Subtracts the contents of data memory address K from the accumulator. b) Subtracts the immediate operand K from the accumulator. K is a 28-bit unsigned value. | AZ, AN |
| a) BUSR K; b) BUSRBx K; c) BUSRHx K; d) BUSRW K | a) Reads a value from the Avalon bus at address index + K(27 downto 0) to the accumulator. K(31 downto 28) sets the bus byte enable. b) x = 0-3, reads a 32-bit value from Avalon address index + K to the accumulator. x sets the byte lane enabled. c) x = 0-1, reads a 32-bit value from Avalon address index + K to the accumulator. x sets the half word enabled on the Avalon bus (upper or lower). d) Reads a 32-bit value from Avalon address index + K to the accumulator. All byte lanes are enabled. | AZ, AN |
| a) BUSW K; b) BUSWBx K; c) BUSWHx K; d) BUSWW K | a) Writes the accumulator to the Avalon bus at address index + K(27 downto 0). K(31 downto 28) sets the bus byte enable. b) x = 0-3, writes the accumulator to Avalon address index + K. x sets the byte lane enabled. c) x = 0-1, writes the accumulator to Avalon address index + K. x sets the half word enabled on the Avalon bus (upper or lower). d) Writes the accumulator to Avalon address index + K. All byte lanes are enabled. | AZ, AN |
| LD# K | Loads the accumulator with the 32-bit constant K. | AZ, AN |
| a) LD K; b) LDI K | Loads the accumulator with the contents of data memory location (a) K or (b) index + K. The result is stored in the accumulator. | AZ, AN |
| a) AND K; b) ANDI K | Performs the logical AND of the accumulator with the contents of data memory location (a) K or (b) index + K. The result is stored in the accumulator. | AZ, AN |
| a) OR K; b) ORI K | Performs the logical OR of the accumulator with the contents of data memory location (a) K or (b) index + K. The result is stored in the accumulator. | AZ, AN |
| a) XOR K; b) XORI K | Performs the logical XOR of the accumulator with the contents of data memory location (a) K or (b) index + K. The result is stored in the accumulator. | AZ, AN |
| a) SHL0; b) SHL1; c) SHLX | Shifts the accumulator left by one bit. The least significant bit is set to (a) 0, (b) 1 or (c) left unchanged. | AZ, AN |
| ROL | Rotates the accumulator left by one bit. | AZ, AN |
| a) SHR0; b) SHR1; c) SHRX | Shifts the accumulator right by one bit. The most significant bit is set to (a) 0, (b) 1 or (c) left unchanged. | AZ, AN |
| ROR | Rotates the accumulator right by one bit. | AZ, AN |
| OUT K | Writes the accumulator to the output port specified by K. | - |
| IN K | Reads the value from the IO port specified by K to the accumulator. | AZ, AN |
| a) ST K; b) STI K | Stores the accumulator to data memory location (a) K or (b) index + K. | - |
| ILD K | Loads the index register with the contents of data memory location K. | IN |
| ILD# K | Loads the index register with the 32-bit constant K. | IN |
| IST K | Writes the index register to data memory location K. | - |
| IADD# K | Adds the 32-bit constant K to the index register. | IN |
| PUSH | Decrements the index register, then writes the accumulator to the data memory location specified by the index register. | IN |
| POP | Writes the contents of the data memory location specified by the index register to the accumulator, then increments the index register. | IN |
| JAL K | Jump and Link. Writes the program counter to data memory location index + K, then writes the accumulator to the program counter. | - |
| BR K | Branch always. Adds the signed 28-bit constant K to the program counter. | - |
| BRZ K | If the accumulator = 0, adds the signed 28-bit constant K to the program counter. | - |
| BRNZ K | If the accumulator ≠ 0, adds the signed 28-bit constant K to the program counter. | - |
| BRP K | If the accumulator msb = 0, adds the signed 28-bit constant K to the program counter. | - |
| BRN K | If the accumulator msb = 1, adds the signed 28-bit constant K to the program counter. | - |
| BRIN K | If the index register msb = 1, adds the signed 28-bit constant K to the program counter. | - |
| BRIO K | If port_in(IO_FLAG_WIDTH-1 downto 0) = 0, adds the signed 28-bit constant K to the program counter. | - |
| NOP | No operation, 4 clock cycles (implemented internally as branch never). | - |
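
Two operand interpretations in the instruction set are worth illustrating: the signed 28-bit branch offset, and the split of the BUSR/BUSW operand into a 28-bit offset and a 4-bit byte enable. The sketch below is a hedged model with hypothetical function names. It assumes the bus offset K(27 downto 0) is unsigned and that the program counter wraps at PM_WIDTH bits; whether the offset is applied to the branch's own address or to the following one is not stated in this document.

```python
def sign_extend_28(k):
    """Interpret the low 28 bits of k as a two's complement value."""
    k &= 0x0FFFFFFF
    return k - (1 << 28) if k & (1 << 27) else k


def branch_target(pc, k, pm_width=8):
    """Add the signed 28-bit constant k to pc (wrap at PM_WIDTH is assumed)."""
    return (pc + sign_extend_28(k)) & ((1 << pm_width) - 1)


def bus_operand(index, k):
    """Split a BUSR/BUSW operand into (Avalon address, byte enable)."""
    byte_enable = (k >> 28) & 0xF  # K(31 downto 28)
    address = (index + (k & 0x0FFFFFFF)) & 0xFFFFFFFF  # index + K(27 downto 0)
    return address, byte_enable


print(sign_extend_28(0x0FFFFFFE))  # -2
print(branch_target(10, 0x0FFFFFFE))  # 8
print(bus_operand(0x1000, 0xF0000004))  # (4100, 15)
```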

Praxis Assembler

The assembler converts text files (.asm) containing assembly instructions and directives into a Memory Initialisation File (MIF), Mentor memory file (MEM) and a VHDL application image file. A program is assembled by running praxis.exe with the file to be assembled and <PM_WIDTH> as arguments, where PM_WIDTH specifies the width of the program memory address bus. Any assembly errors will be written to the console. The VHDL application image contains a vendor-agnostic VHDL package that can initialise the program memory. It also doubles as a listing file by providing the assembly listing as VHDL comments next to the corresponding opcodes.

Directives

Directives are used to instruct the assembler how to assemble the program. All directives are prefixed with a period character (.).

Currently the assembler supports only one directive, .EQU, which assigns a numerical value to an alphanumeric reference in the form .EQU X Y, where X is the label and Y is the value. .EQU is used to label both locations in data memory and constant values, depending upon the addressing mode of the instruction in which they appear as an operand.

Operators and functions

Currently the assembler supports only numerical constant values in decimal or hexadecimal format. Hexadecimal values must be prefixed with a $ character.

Labels

Labels are used to refer to program locations symbolically. A label consists of a @ character followed by an alphanumeric string. A line containing a label must also contain a valid assembly instruction; an otherwise empty line cannot be labelled.

Comments

Comments are initiated with a ; character and terminate at the next line break.
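
The syntax rules above (the .EQU directive, $-prefixed hexadecimal values, @ labels and ; comments) can be summarised as a small line parser. This is an illustrative sketch, not the Praxis assembler's actual code: the function names and the returned dictionary layout are hypothetical, and treating mnemonics as case-insensitive is an assumption.

```python
def parse_number(text):
    """Parse a decimal value, or a hexadecimal value prefixed with $."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text, 10)


def parse_line(line):
    """Classify one source line; returns None for blank or comment-only lines."""
    code = line.split(";", 1)[0].strip()  # a comment runs to the end of the line
    if not code:
        return None
    tokens = code.split()
    if tokens[0].upper() == ".EQU":
        if len(tokens) != 3:
            raise ValueError("expected: .EQU <name> <value>")
        return {"kind": "equ", "name": tokens[1], "value": parse_number(tokens[2])}
    label = None
    if tokens[0].startswith("@"):
        label = tokens[0][1:]
        tokens = tokens[1:]
        if not tokens:
            raise ValueError("a label must share its line with an instruction")
    operand = tokens[1] if len(tokens) > 1 else None
    return {"kind": "instr", "label": label,
            "mnemonic": tokens[0].upper(), "operand": operand}


print(parse_line(".EQU call_ret -1        ; handy constant"))
print(parse_line("@main\tild\tsp             ; load the index register with sp"))
```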

Jump and Link

The Praxos processor does not implement a traditional call stack using call and return operations. Instead it uses a single instruction, JAL, which can serve as either a call or a return operation when used appropriately in combination with operations on the index register. It is recommended that a data memory location be allocated for use as a stack pointer so that the index register can be used for other purposes besides maintaining the stack.

The JAL instruction accepts an operand in the form of an offset to the index register, but it does not update the index register. This must be done with a separate IADD# instruction following a call or return.

For example, the following code implements the equivalent of call and return from nested subroutines:

.EQU sp 0               ; allocate a stack pointer at DM(0)
.EQU av_addr1 1         ; allocate an Avalon address at DM(1)	
.EQU av_addr2 2         ; allocate an Avalon address at DM(2)
.EQU mask1 3            ; some more variables
.EQU	mask2 4
.EQU io 5
.EQU call_ret -1        ; handy constant
;
		ld# 0
		st av_addr1
		ld# 4
		st av_addr2
		ld#	$80000000      ; load a constant
		st	mask1          ; store it
		ror
		st	mask2
		ld#	0              ; load accumulator
		st	sp             ; initialise stack pointer
@main	ild	sp             ; load the index register with sp
		ld#	10             ; load subroutine parameter
		push               ; push the parameter
		ld#	dec            ; point the accumulator at @dec
		jal	call_ret       ; jump, link to sp-1
		iadd# 1            ; clear the parameter we pushed
		ist	sp             ; save the stack pointer
; do some unrelated stuff with the index register
		ild	av_addr1       ; load first Avalon address
		busrw	0          ; read from it
		iadd#	4          ; increment address
		ist	av_addr1       ; store address
		ild	av_addr2       ; load second Avalon address
		busww	0          ; write to it
		iadd#	-4         ; decrement address
		ist	av_addr2       ; store address
		ild	sp             ; load the stack pointer
; call from here as well
		ld	mask2          ; load a bit mask
		push               ; push bit mask
		ld#	tog            ; point at tog
		jal	call_ret       ; call
		iadd# 1            ; clear the parameter we pushed
		br	main           ; jump
; subroutine1
@dec	ldi	0              ; load the parameter
@dec_lp	sub#	1          ; decrement the parameter
		brnz	dec_lp     ; loop until zero
		iadd#	call_ret   ; push parent return address
		ld	mask1          ; load parameter
		push               ; push parameter
		ld#	tog            ; point at @tog
		jal	call_ret       ; call
		iadd# 1            ; clear pushed parameter
		pop                ; pop return address
		jal	call_ret       ; return
; subroutine2
@tog	ldi	0              ; load parameter
		xor	io             ; xor mask with data
		out	0              ; write to IO port
		st	io             ; store data
		ldi	call_ret       ; pop return address
		jal	call_ret       ; return

As can be seen from the above code, intelligent use of indexed addressing modes allows the management of both a call stack and stack frame. Functions may also push return values onto the stack before returning to the caller; return values can be saved to location -2 using sti and retrieved by the caller using ldi. Two important principles apply. First, a subroutine that needs to perform a call must allocate stack space for its caller’s return address (the iadd# call_ret in @dec). Second, any code that pushes parameters onto the stack before a call must clear them from the stack when the function returns (the iadd# 1 following each call). Both operations are conducted by simply updating the index register using the IADD# instruction.

When switching the index register to another purpose, it is stored to the sp location in data memory and restored afterwards.
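
The call and return convention described in this section can be traced with a small software model of PUSH, LDI, IADD# and JAL. This is a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, the link value written by JAL is assumed to be the address of the instruction after the JAL, and data memory addresses are assumed to wrap at DM_WIDTH bits.

```python
class StackModel:
    """Minimal model of the registers and data memory used by the convention."""

    def __init__(self, dm_width=8):
        self.dm_mask = (1 << dm_width) - 1
        self.dm = [0] * (1 << dm_width)
        self.acc = 0
        self.index = 0
        self.pc = 0

    def _addr(self, offset=0):
        # Assumption: the data memory address wraps at DM_WIDTH bits.
        return (self.index + offset) & self.dm_mask

    def push(self):
        self.index = (self.index - 1) & 0xFFFFFFFF
        self.dm[self._addr()] = self.acc

    def ldi(self, k):
        self.acc = self.dm[self._addr(k)]

    def iadd(self, k):
        self.index = (self.index + k) & 0xFFFFFFFF

    def jal(self, k):
        # Assumption: the stored link value is the address after the JAL.
        self.dm[self._addr(k)] = self.pc + 1
        self.pc = self.acc


m = StackModel()
m.index = 0x10           # stack pointer restored with ild sp
m.acc = 10
m.push()                 # parameter at DM(0x0F)
m.pc, m.acc = 5, 40      # JAL sits at address 5, subroutine at address 40
m.jal(-1)                # call: link written to DM(0x0E)
m.ldi(0)                 # callee reads its parameter
m.ldi(-1)                # callee fetches the return address
m.jal(-1)                # return
m.iadd(1)                # caller clears the pushed parameter
print(m.pc, hex(m.index))  # 6 0x10
```

The trace ends with the program counter at the instruction after the call and the index register back at its starting value, which is the invariant the two principles above are meant to preserve.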

Conclusion

A design for a small 32-bit CPU has been presented that meets the requirement for a means of rapidly moving data around an Avalon bus. There are several areas that would benefit from further work:

  1. The existing Praxis assembler lacks many useful features, such as support for arithmetic expressions or conditional assembly.

  2. Integration with the Neo430 as a new IO peripheral with corresponding C driver should also be considered.

  3. A testbench disassembler for RTL simulation of the core should be written.
