Synthesis starter

This page will be a guided walkthrough of the prepackaged iCE40 FPGA synthesis script - synth_ice40. We will take a simple design through each step, looking at the commands being called and what they do to the design. While synth_ice40 is specific to the iCE40 platform, most of the operations we will be discussing are common across the majority of FPGA synthesis scripts. Thus, this document will provide a good foundational understanding of how synthesis in Yosys is performed, regardless of the actual architecture being used.

See also

Advanced usage docs for Synth commands

Demo design

First, let’s quickly look at the design we’ll be synthesizing:

Listing 1 fifo.v
 1// address generator/counter
 2module addr_gen 
 3#(  parameter MAX_DATA=256,
 4	localparam AWIDTH = $clog2(MAX_DATA)
 5) ( input en, clk, rst,
 6	output reg [AWIDTH-1:0] addr
 7);
 8	initial addr <= 0;
 9
10	// async reset
11	// increment address when enabled
12	always @(posedge clk or posedge rst)
13		if (rst)
14			addr <= 0;
15		else if (en) begin
16			if (addr == MAX_DATA-1)
17				addr <= 0;
18			else
19				addr <= addr + 1;
20		end
21endmodule //addr_gen
22
23// Define our top level fifo entity
24module fifo 
25#(  parameter MAX_DATA=256,
26	localparam AWIDTH = $clog2(MAX_DATA)
27) ( input wen, ren, clk, rst,
28	input [7:0] wdata,
29	output reg [7:0] rdata,
30	output reg [AWIDTH:0] count
31);
32	// fifo storage
33	// sync read before write
34	wire [AWIDTH-1:0] waddr, raddr;
35	reg [7:0] data [MAX_DATA-1:0];
36	always @(posedge clk) begin
37		if (wen)
38			data[waddr] <= wdata;
39		rdata <= data[raddr];
40	end // storage
41
42	// addr_gen for both write and read addresses
43	addr_gen #(.MAX_DATA(MAX_DATA))
44	fifo_writer (
45		.en     (wen),
46		.clk    (clk),
47		.rst    (rst),
48		.addr   (waddr)
49	);
50
51	addr_gen #(.MAX_DATA(MAX_DATA))
52	fifo_reader (
53		.en     (ren),
54		.clk    (clk),
55		.rst    (rst),
56		.addr   (raddr)
57	);
58
59	// status signals
60	initial count <= 0;
61
62	always @(posedge clk or posedge rst) begin
63		if (rst)
64			count <= 0;
65		else if (wen && !ren)
66			count <= count + 1;
67		else if (ren && !wen)
68			count <= count - 1;
69	end
70
71endmodule

Loading the design

Let’s load the design into Yosys. From the command line, we can call yosys fifo.v. This will open an interactive Yosys shell session and immediately parse the code from fifo.v and convert it into an Abstract Syntax Tree (AST). If you are interested in how this happens, there is more information in the document, The Verilog and AST frontends. For now, suffice it to say that we do this to simplify further processing of the design. You should see something like the following:

$ yosys fifo.v

-- Parsing `fifo.v' using frontend ` -vlog2k' --

1. Executing Verilog-2005 frontend: fifo.v
Parsing Verilog input from `fifo.v' to AST representation.
Storing AST representation for module `$abstract\addr_gen'.
Storing AST representation for module `$abstract\fifo'.
Successfully finished Verilog frontend.

See also

Advanced usage docs for Loading a design

Elaboration

Now that we are in the interactive shell, we can call Yosys commands directly. Our overall goal is to call synth_ice40 -top fifo, but for now we can run each of the commands individually for a better sense of how each part contributes to the flow. We will also start with just a single module: addr_gen.

At the bottom of the help output for synth_ice40 is the complete list of commands called by this script. Let’s start with the section labeled begin:

Listing 2 begin section
read_verilog -D ICE40_HX -lib -specify +/ice40/cells_sim.v
hierarchy -check -top <top>
proc

read_verilog -D ICE40_HX -lib -specify +/ice40/cells_sim.v loads the iCE40 cell models which allows us to include platform specific IP blocks in our design. PLLs are a common example of this, where we might need to reference SB_PLL40_CORE directly rather than being able to rely on mapping passes later. Since our simple design doesn’t use any of these IP blocks, we can skip this command for now. Because these cell models will also be needed once we start mapping to hardware we will still need to load them later.

Note

+/ is a dynamic reference to the Yosys share directory. By default, this is /usr/local/share/yosys. If using a locally built version of Yosys from the source directory, this will be the share folder in the same directory.

The addr_gen module

Since we’re just getting started, let’s instead begin with hierarchy -top addr_gen. This command declares that the top level module is addr_gen, and everything else can be discarded.

Listing 3 addr_gen module source
 2module addr_gen 
 3#(  parameter MAX_DATA=256,
 4	localparam AWIDTH = $clog2(MAX_DATA)
 5) ( input en, clk, rst,
 6	output reg [AWIDTH-1:0] addr
 7);
 8	initial addr <= 0;
 9
10	// async reset
11	// increment address when enabled
12	always @(posedge clk or posedge rst)
13		if (rst)
14			addr <= 0;
15		else if (en) begin
16			if (addr == MAX_DATA-1)
17				addr <= 0;
18			else
19				addr <= addr + 1;
20		end
21endmodule //addr_gen

Note

hierarchy should always be the first command after the design has been read. By specifying the top module, hierarchy will also set the (* top *) attribute on it. This is used by other commands that need to know which module is the top.

Listing 4 hierarchy -top addr_gen output
yosys> hierarchy -top addr_gen

2. Executing HIERARCHY pass (managing design hierarchy).

3. Executing AST frontend in derive mode using pre-parsed AST for module `\addr_gen'.
Generating RTLIL representation for module `\addr_gen'.

3.1. Analyzing design hierarchy..
Top module:  \addr_gen

3.2. Analyzing design hierarchy..
Top module:  \addr_gen
Removing unused module `$abstract\fifo'.
Removing unused module `$abstract\addr_gen'.
Removed 2 unused modules.

Our addr_gen circuit now looks like this:

../_images/addr_gen_hier.svg

Fig. 2 addr_gen module after hierarchy

Simple operations like addr + 1 and addr == MAX_DATA-1 can be extracted from our always @ block in addr_gen module source. This gives us the highlighted $add and $eq cells we see. But control logic (like the if .. else) and memory elements (like the addr <= 0) are not so straightforward. These get put into “processes”, shown in the schematic as PROC. Note how the second line refers to the line numbers of the start/end of the corresponding always @ block. In the case of an initial block, we instead see the PROC referring to line 0.

To handle these, let us now introduce the next command: proc - translate processes to netlists. proc is a macro command like synth_ice40. Rather than modifying the design directly, it instead calls a series of other commands. In the case of proc, these sub-commands work to convert the behavioral logic of processes into multiplexers and registers. Let’s see what happens when we run it. For now, we will call proc -noopt to prevent some automatic optimizations which would normally happen.

../_images/addr_gen_proc.svg

Fig. 3 addr_gen module after proc -noopt

There are now a few new cells from our always @, which have been highlighted. The if statements are now modeled with $mux cells, while the register uses an $adff cell. If we look at the terminal output we can also see all of the different proc_* commands being called. We will look at each of these in more detail in Converting process blocks.
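As a rough sketch of what the macro expands to, the sequence below is the list of sub-commands that help proc reports in recent Yosys releases. Treat the exact list as an assumption: it has changed between versions (proc_rom, for example, is a later addition), so check help proc on your own install.

```yoscrypt
# approximately what `proc -noopt` runs, in order
proc_clean
proc_rmdead
proc_prune
proc_init
proc_arst
proc_rom
proc_mux
proc_dlatch
proc_dff
proc_memwr
proc_clean
# plain `proc` (without -noopt) additionally finishes with:
# opt_expr -keepdc
```

Running these one at a time and inspecting the design between steps is a reasonable way to see which sub-command introduces the $mux and $adff cells.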

Notice how in the top left of addr_gen module after proc -noopt we have a floating wire, generated from the initial assignment of 0 to the addr wire. However, this initial assignment is not synthesizable, so this will need to be cleaned up before we can generate the physical hardware. We can do this now by calling clean. We’re also going to call opt_expr now, which would normally be called at the end of proc. We can call both commands at the same time by separating them with a semicolon and space: opt_expr; clean.

../_images/addr_gen_clean.svg

Fig. 4 addr_gen module after opt_expr; clean

You may also notice that the highlighted $eq cell input of 255 has changed to 8'11111111. Constant values are presented in the format <bit_width>'<bits>, with 32-bit values instead using the decimal number. This indicates that the constant input has been reduced from 32-bit wide to 8-bit wide. This is a side-effect of running opt_expr, which performs constant folding and simple expression rewriting. For more on why this happens, refer to Optimization passes and the section on opt_expr.

Note

clean - remove unused cells and wires can also be called with two semicolons after any command, for example we could have called opt_expr;; instead of opt_expr; clean. You may notice some scripts will end each line with ;;. It is beneficial to run clean before inspecting intermediate products to remove disconnected parts of the circuit which have been left over, and in some cases can reduce the processing required in subsequent commands.
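The two spellings described in this note can be compared side by side. This is only an illustrative sketch; in practice you would pick one of the two forms rather than run both.

```yoscrypt
# explicit form: run opt_expr, then clean
opt_expr; clean

# shorthand form: the trailing ;; runs clean after opt_expr
opt_expr;;
```

The built-in help for clean also describes a ;;; token, which runs clean in -purge mode; that is not needed for this walkthrough.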

See also

Advanced usage docs for

The full example

Let’s now go back and check on our full design by using hierarchy -check -top fifo. By passing the -check option there we are also telling the hierarchy command that if the design includes any non-blackbox modules without an implementation it should return an error.

Note that if we tried to run this command now then we would get an error. This is because we already removed all of the modules other than addr_gen. We could restart our shell session, but instead let’s use two new commands:

Listing 5 reloading fifo.v and running hierarchy -check -top fifo
yosys> design -reset

yosys> read_verilog fifo.v

11. Executing Verilog-2005 frontend: fifo.v
Parsing Verilog input from `fifo.v' to AST representation.
Generating RTLIL representation for module `\addr_gen'.
Generating RTLIL representation for module `\fifo'.
Successfully finished Verilog frontend.

yosys> hierarchy -check -top fifo

12. Executing HIERARCHY pass (managing design hierarchy).

12.1. Analyzing design hierarchy..
Top module:  \fifo
Used module:     \addr_gen
Parameter \MAX_DATA = 256

12.2. Executing AST frontend in derive mode using pre-parsed AST for module `\addr_gen'.
Parameter \MAX_DATA = 256
Generating RTLIL representation for module `$paramod\addr_gen\MAX_DATA=s32'00000000000000000000000100000000'.
Parameter \MAX_DATA = 256
Found cached RTLIL representation for module `$paramod\addr_gen\MAX_DATA=s32'00000000000000000000000100000000'.

12.3. Analyzing design hierarchy..
Top module:  \fifo
Used module:     $paramod\addr_gen\MAX_DATA=s32'00000000000000000000000100000000

12.4. Analyzing design hierarchy..
Top module:  \fifo
Used module:     $paramod\addr_gen\MAX_DATA=s32'00000000000000000000000100000000
Removing unused module `\addr_gen'.
Removed 1 unused modules.

Notice how this time we didn’t see any of those $abstract modules? That’s because when we ran yosys fifo.v, the first command Yosys called was read_verilog -defer fifo.v. The -defer option there tells read_verilog to only read the abstract syntax tree and defer actual compilation to a later hierarchy command. This is useful in cases where the default parameters of modules yield invalid code which is not synthesizable. This is why Yosys defers compilation automatically and is one of the reasons why hierarchy should always be the first command after loading the design. If we know that our design won’t run into this issue, we can skip the -defer.
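To reproduce the deferred behaviour from inside the shell, a minimal sketch (assuming fifo.v is in the current directory) would be:

```yoscrypt
# start from an empty design
design -reset
# only store the AST; modules appear as $abstract\... until hierarchy runs
read_verilog -defer fifo.v
# elaborate from the chosen top module, deriving parametrized modules as needed
hierarchy -check -top fifo
```

With this variant the $abstract modules should show up again in the hierarchy output, as they did for addr_gen earlier.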

Note

The number before a command’s output increments with each command run. Don’t worry if your numbers don’t match ours! The output you are seeing comes from the same script that was used to generate the images in this document, included in the source as fifo.ys. There are extra commands being run which you don’t see, but feel free to try them yourself, or play around with different commands. You can always start over with a clean slate by calling exit or hitting ctrl+d (i.e. EOF) and re-launching the Yosys interactive terminal. ctrl+c (i.e. SIGINT) will also end the terminal session but will return an error code rather than exiting gracefully.

We can also run proc now to finish off the full begin section. Because the design schematic is quite large, we will be showing just the data path for the rdata output. If you would like to see the entire design for yourself, you can do so with show - generate schematics using graphviz. Note that the show command only works with a single module, so you may need to call it with show fifo. The Displaying schematics section in Scripting in Yosys has more on how to use show.
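A schematic limited to the rdata data path can be approximated with a selection, following the same pattern as the show call that appears later in the wreduce output. The rdata_proc prefix is just a hypothetical file name chosen for this sketch.

```yoscrypt
proc
# o:rdata selects the rdata output port; %ci* expands the selection to its input cone
# -format dot writes rdata_proc.dot instead of opening a viewer
show -notitle -format dot -prefix rdata_proc o:rdata %ci*
```

The resulting .dot file can then be rendered with Graphviz outside of Yosys.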

../_images/rdata_proc.svg

Fig. 5 rdata output after proc

The highlighted fifo_reader block contains an instance of the addr_gen module after proc -noopt that we looked at earlier. Notice how the type is shown as $paramod\\addr_gen\\MAX_DATA=s32'.... This is a “parametric module”: an instance of the addr_gen module with the MAX_DATA parameter set to the given value.

The other highlighted block is a $memrd cell. At this stage of synthesis we don’t yet know what type of memory is going to be implemented, but we do know that rdata <= data[raddr]; could be implemented as a read from memory. Note that the $memrd cell here is asynchronous, with both the clock and enable signal undefined; shown with the 1'x inputs.

See also

Advanced usage docs for Converting process blocks

Flattening

At this stage of a synthesis flow there are a few other commands we could run. In synth_ice40 we get these:

Listing 6 flatten section
flatten
tribuf -logic
deminout

First off is flatten. Flattening the design like this can allow for optimizations between modules which would otherwise be missed. Let’s run flatten;; on our design.

Listing 7 output of flatten;;
yosys> flatten

15. Executing FLATTEN pass (flatten design).
Deleting now unused module $paramod\addr_gen\MAX_DATA=s32'00000000000000000000000100000000.
<suppressed ~2 debug messages>

yosys> clean
Removed 3 unused cells and 25 unused wires.
../_images/rdata_flat.svg

Fig. 6 rdata output after flatten;;

The pieces have moved around a bit, but we can see addr_gen module after proc -noopt from earlier has replaced the fifo_reader block in rdata output after proc. We can also see that the addr output has been renamed to fifo_reader.addr and merged with the raddr wire feeding into the $memrd cell. This wire merging happened during the call to clean which we can see in the output of flatten;;.

Note

flatten and clean would normally be combined into a single yosys> flatten;; output, but they appear separately here as a side effect of using echo for generating the terminal style output.

Depending on the target architecture, this stage of synthesis might also see commands such as tribuf with the -logic option and deminout. These remove tristate and inout constructs respectively, replacing them with logic suitable for mapping to an FPGA. Since we do not have any such constructs in our example, running these commands does not change our design.

The coarse-grain representation

At this stage, the design is in coarse-grain representation. It still looks recognizable, and cells are word-level operators with parametrizable width. This is the stage of synthesis where we do things like const propagation, expression rewriting, and trimming unused parts of wires.

This is also where we convert our FSMs and hard blocks like DSPs or memories. Such elements have to be inferred from patterns in the design and there are special passes for each. Detection of these patterns can also be affected by optimizations and other transformations done previously.

Note

While the iCE40 flow had a flatten section and put proc in the begin section, some synthesis scripts will instead include these in this section.

Part 1

In the iCE40 flow, we start with the following commands:

Listing 8 coarse section (part 1)
opt_expr
opt_clean
check
opt -nodffe -nosdff
fsm
opt

We’ve already come across opt_expr, and opt_clean is the same as clean but with more verbose output. The check pass identifies a few obvious problems which will cause errors later. Calling it here lets us fail faster rather than wasting time on something we know is impossible.

Next up is opt -nodffe -nosdff performing a set of simple optimizations on the design. This command also ensures that only a specific subset of FF types are included, in preparation for the next command: fsm - extract and optimize finite state machines. Both opt and fsm are macro commands which are explored in more detail in Optimization passes and FSM handling respectively.

Up until now, the data path for rdata has remained the same since rdata output after flatten;;. However the next call to opt does cause a change. Specifically, the call to opt_dff without the -nodffe -nosdff options is able to fold one of the $mux cells into the $adff to form an $adffe cell; highlighted below:

Listing 9 output of opt_dff
yosys> opt_dff

17. Executing OPT_DFF pass (perform DFF optimizations).
Adding EN signal on $procdff$55 ($adff) from module fifo (D = $0\count[8:0], Q = \count).
Adding EN signal on $flatten\fifo_writer.$procdff$60 ($adff) from module fifo (D = $flatten\fifo_writer.$procmux$51_Y, Q = \fifo_writer.addr).
Adding EN signal on $flatten\fifo_reader.$procdff$60 ($adff) from module fifo (D = $flatten\fifo_reader.$procmux$51_Y, Q = \fifo_reader.addr).
../_images/rdata_adffe.svg

Fig. 7 rdata output after opt_dff
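One way to confirm the change without a schematic is to count the cells of the new type. This is a sketch that assumes the design has already been through the begin and flatten sections.

```yoscrypt
opt -nodffe -nosdff
fsm
# the full opt run includes opt_dff with enable detection turned on
opt
# report how many $adffe cells now exist
select -count t:$adffe
```

The count should correspond to the "Adding EN signal" lines in the opt_dff output above, one per converted $adff.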

See also

Advanced usage docs for

Part 2

The next group of commands performs a series of optimizations:

Listing 10 coarse section (part 2)
wreduce
peepopt
opt_clean
share
techmap -map +/cmp2lut.v -D LUT_WIDTH=4
opt_expr
opt_clean
memory_dff [-no-rw-check]

First up is wreduce - reduce the word size of operations if possible. If we run this we get the following:

Listing 11 output of wreduce
yosys> wreduce

19. Executing WREDUCE pass (reducing word size of cells).
Removed top 31 bits (of 32) from port B of cell fifo.$add$fifo.v:66$27 ($add).
Removed top 23 bits (of 32) from port Y of cell fifo.$add$fifo.v:66$27 ($add).
Removed top 31 bits (of 32) from port B of cell fifo.$sub$fifo.v:68$30 ($sub).
Removed top 23 bits (of 32) from port Y of cell fifo.$sub$fifo.v:68$30 ($sub).
Removed top 1 bits (of 2) from port B of cell fifo.$auto$opt_dff.cc:195:make_patterns_logic$66 ($ne).
Removed cell fifo.$flatten\fifo_writer.$procmux$53 ($mux).
Removed top 31 bits (of 32) from port B of cell fifo.$flatten\fifo_writer.$add$fifo.v:19$34 ($add).
Removed top 24 bits (of 32) from port Y of cell fifo.$flatten\fifo_writer.$add$fifo.v:19$34 ($add).
Removed cell fifo.$flatten\fifo_reader.$procmux$53 ($mux).
Removed top 31 bits (of 32) from port B of cell fifo.$flatten\fifo_reader.$add$fifo.v:19$34 ($add).
Removed top 24 bits (of 32) from port Y of cell fifo.$flatten\fifo_reader.$add$fifo.v:19$34 ($add).
Removed top 23 bits (of 32) from wire fifo.$add$fifo.v:66$27_Y.
Removed top 24 bits (of 32) from wire fifo.$flatten\fifo_reader.$add$fifo.v:19$34_Y.

yosys> show -notitle -format dot -prefix rdata_wreduce o:rdata %ci*

20. Generating Graphviz representation of design.
Writing dot description to `rdata_wreduce.dot'.
Dumping selected parts of module fifo to page 1.

yosys> opt_clean

21. Executing OPT_CLEAN pass (remove unused cells and wires).
Finding unused cells or wires in module \fifo..
Removed 0 unused cells and 4 unused wires.
<suppressed ~1 debug messages>


Looking at the data path for rdata, the most relevant of these width reductions are the ones affecting fifo.$flatten\fifo_reader.$add$fifo.v. That is the $add cell incrementing the fifo_reader address. We can look at the schematic and see the output of that cell has now changed.

../_images/rdata_wreduce.svg

Fig. 8 rdata output after wreduce

The next two (new) commands are peepopt - collection of peephole optimizers and share - perform sat-based resource sharing. Neither of these affects our design, and they’re explored in more detail in Optimization passes, so let’s skip over them. techmap -map +/cmp2lut.v -D LUT_WIDTH=4 optimizes certain comparison operators by converting them to LUTs instead. The usage of techmap is explored more in Technology mapping.

Our next command to run is memory_dff - merge input/output DFFs into memory read ports.

Listing 12 output of memory_dff
yosys> memory_dff

22. Executing MEMORY_DFF pass (merging $dff cells to $memrd).
Checking read port `\data'[0] in module `\fifo': merging output FF to cell.
    Write port 0: non-transparent.
../_images/rdata_memrdv2.svg

Fig. 9 rdata output after memory_dff

As the title suggests, memory_dff has merged the output $dff into the $memrd cell and converted it to a $memrd_v2 (highlighted). This has also connected the CLK port to the clk input as it is now a synchronous memory read with appropriate enable (EN=1'1) and reset (ARST=1'0 and SRST=1'0) inputs.
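To look at the merged read port in text form rather than in the schematic, the cell can be dumped directly. A minimal sketch, assuming memory_dff has just been run on the flattened fifo module:

```yoscrypt
memory_dff
# print the RTLIL for every $memrd_v2 cell, including its CLK, EN, ARST and SRST connections
dump t:$memrd_v2
```

This makes it easy to check the constant enable and reset inputs described above.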

See also

Advanced usage docs for

Part 3

The third part of the synth_ice40 flow is a series of commands for mapping to DSPs. By default, the iCE40 flow will not map to the hardware DSP blocks; this mapping is only performed if the script is called with the -dsp flag: synth_ice40 -dsp. While our example has nothing that could be mapped to DSPs, we can still take a quick look at the commands here and describe what they do.

Listing 13 coarse section (part 3)
wreduce t:$mul
techmap -map +/mul2dsp.v -map +/ice40/dsp_map.v -D DSP_A_MAXWIDTH=16 -D DSP_B_MAXWIDTH=16 -D DSP_A_MINWIDTH=2 -D DSP_B_MINWIDTH=2 -D DSP_Y_MINWIDTH=11 -D DSP_NAME=$__MUL16X16    (if -dsp)
select a:mul2dsp                  (if -dsp)
setattr -unset mul2dsp            (if -dsp)
opt_expr -fine                    (if -dsp)
wreduce                           (if -dsp)
select -clear                     (if -dsp)
ice40_dsp                         (if -dsp)
chtype -set $mul t:$__soft_mul    (if -dsp)

wreduce t:$mul performs width reduction again, this time targeting only cells of type $mul. techmap -map +/mul2dsp.v -map +/ice40/dsp_map.v ... -D DSP_NAME=$__MUL16X16 uses techmap to map $mul cells to $__MUL16X16 which are, in turn, mapped to the iCE40 SB_MAC16. Any multipliers which aren’t compatible with conversion to $__MUL16X16 are relabelled to $__soft_mul before chtype changes them back to $mul.

During the mul2dsp conversion, some of the intermediate signals are marked with the attribute mul2dsp. By calling select a:mul2dsp we restrict the following commands to only operate on the cells and wires used for these signals. setattr removes the now unnecessary mul2dsp attribute. opt_expr we’ve already come across for const folding and simple expression rewriting; the -fine option just enables more fine-grain optimizations. Then we perform width reduction a final time and clear the selection.

Finally we have ice40_dsp: similar to the memory_dff command we saw in the previous section, this merges any surrounding registers into the SB_MAC16 cell. This includes not just the input/output registers, but also pipeline registers and even a post-adder where applicable: turning a multiply + add into a single multiply-accumulate.

See also

Advanced usage docs for Technology mapping

Part 4

That brings us to the fourth and final part for the iCE40 synthesis flow:

Listing 14 coarse section (part 4)
alumacc
opt
memory -nomap [-no-rw-check]
opt_clean

Where before each type of arithmetic operation had its own cell, e.g. $add, we now want to extract these into $alu and $macc cells which can help identify opportunities for reusing logic. We do this by running alumacc, which we can see produce the following changes in our example design:

Listing 15 output of alumacc
yosys> alumacc

24. Executing ALUMACC pass (create $alu and $macc cells).
Extracting $alu and $macc cells in module fifo:
  creating $macc model for $add$fifo.v:66$27 ($add).
  creating $macc model for $flatten\fifo_reader.$add$fifo.v:19$34 ($add).
  creating $macc model for $flatten\fifo_writer.$add$fifo.v:19$34 ($add).
  creating $macc model for $sub$fifo.v:68$30 ($sub).
  creating $alu model for $macc $sub$fifo.v:68$30.
  creating $alu model for $macc $flatten\fifo_writer.$add$fifo.v:19$34.
  creating $alu model for $macc $flatten\fifo_reader.$add$fifo.v:19$34.
  creating $alu model for $macc $add$fifo.v:66$27.
  creating $alu cell for $add$fifo.v:66$27: $auto$alumacc.cc:485:replace_alu$80
  creating $alu cell for $flatten\fifo_reader.$add$fifo.v:19$34: $auto$alumacc.cc:485:replace_alu$83
  creating $alu cell for $flatten\fifo_writer.$add$fifo.v:19$34: $auto$alumacc.cc:485:replace_alu$86
  creating $alu cell for $sub$fifo.v:68$30: $auto$alumacc.cc:485:replace_alu$89
  created 4 $alu and 0 $macc cells.
../_images/rdata_alumacc.svg

Fig. 10 rdata output after alumacc

Once these cells have been inserted, the call to opt can combine cells which are now identical but may have been missed due to e.g. the difference between $add and $sub.

The other new command in this part is memory - translate memories to basic cells. memory is another macro command which we examine in more detail in Memory handling. For this document, let us focus just on the step most relevant to our example: memory_collect. Up until this point, our memory reads and our memory writes have been totally disjoint cells; operating on the same memory only in the abstract. memory_collect combines all of the reads and writes for a memory block into a single cell.

../_images/rdata_coarse.svg

Fig. 11 rdata output after memory_collect

Looking at the schematic after running memory_collect we see that our $memrd_v2 cell has been replaced with a $mem_v2 cell named data, the same name that we used in fifo.v. Where before we had a single set of signals for address and enable, we now have one set for reading (RD_*) and one for writing (WR_*), as well as both WR_DATA input and RD_DATA output.
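The step discussed here can be tried in isolation. Note that this sketch is a simplification and not what synth_ice40 actually runs: memory -nomap calls several other sub-commands around memory_collect (see help memory for the list in your version).

```yoscrypt
alumacc
opt
# merge the separate read and write port cells of each memory into one cell
memory_collect
opt_clean
# report how many combined memory cells exist after collection
select -count t:$mem_v2
```

For the complete behaviour of this part of the flow, run memory -nomap instead of memory_collect.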

See also

Advanced usage docs for

Final note

Having now reached the end of the coarse-grain representation, we could also have gotten here by running synth_ice40 -top fifo -run :map_ram after loading the design. The -run <from_label>:<to_label> option with an empty <from_label> starts from the begin section, while the <to_label> runs up to but not including the map_ram section.
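Put together as a script, the shortcut looks like the following sketch, which assumes a fresh session with fifo.v in the current directory:

```yoscrypt
design -reset
read_verilog fifo.v
# run everything from the begin label and stop when the map_ram label is reached
synth_ice40 -top fifo -run :map_ram
```

According to the synth_ice40 help text, an empty <to_label> is synonymous with the end of the command list, so a later synth_ice40 -top fifo -run map_ram: would be expected to continue through the remaining sections.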

Hardware mapping

The remaining sections each map a different type of hardware and are much more architecture dependent than the previous sections. As such we will only be looking at each section very briefly.

If you skipped calling read_verilog -D ICE40_HX -lib -specify +/ice40/cells_sim.v earlier, do it now.

Memory blocks

Mapping to hard memory blocks uses a combination of memory_libmap and techmap.

Listing 16 map_ram section
memory_libmap -lib +/ice40/brams.txt -lib +/ice40/spram.txt [-no-auto-huge] [-no-auto-block]    (-no-auto-huge unless -spram, -no-auto-block if -nobram)
techmap -map +/ice40/brams_map.v -map +/ice40/spram_map.v
ice40_braminit
../_images/rdata_map_ram.svg

Fig. 12 rdata output after map_ram section

The map_ram section converts the generic $mem_v2 into the iCE40 SB_RAM40_4K (highlighted). We can also see the memory address has been remapped, and the data bits have been reordered (or swizzled). There is also now a $mux cell controlling the value of rdata. In fifo.v we wrote our memory as read-before-write; however, the SB_RAM40_4K has undefined behaviour when reading from and writing to the same address in the same cycle. As a result, extra logic is added so that the generated circuit matches the behaviour of the Verilog. Synchronous SDP with undefined collision behavior describes how we could change our Verilog to match our hardware instead.

If we run memory_libmap under the debug command we can see candidates which were identified for mapping, along with the costs of each and what logic requires emulation.

yosys> debug memory_libmap -lib +/ice40/brams.txt -lib +/ice40/spram.txt -no-auto-huge
4. Executing MEMORY_LIBMAP pass (mapping memories to cells).
Memory fifo.data mapping candidates (post-geometry):
- logic fallback
  - cost: 2048.000000
- $__ICE40_RAM4K_:
  - option HAS_BE 0
  - emulation score: 7
  - replicates (for ports): 1
  - replicates (for data): 1
  - mux score: 0
  - demux score: 0
  - cost: 78.000000
  - abits 11 dbits 2 4 8 16
  - chosen base width 8
  - swizzle 0 1 2 3 4 5 6 7
  - emulate read-first behavior
  - write port 0: port group W
    - widths 2 4 8
  - read port 0: port group R
    - widths 2 4 8 16
    - emulate transparency with write port 0
- $__ICE40_RAM4K_:
  - option HAS_BE 1
  - emulation score: 7
  - replicates (for ports): 1
  - replicates (for data): 1
  - mux score: 0
  - demux score: 0
  - cost: 78.000000
  - abits 11 dbits 2 4 8 16
  - byte width 1
  - chosen base width 8
  - swizzle 0 1 2 3 4 5 6 7
  - emulate read-first behavior
  - write port 0: port group W
    - widths 16
  - read port 0: port group R
    - widths 2 4 8 16
    - emulate transparency with write port 0
Memory fifo.data mapping candidates (after post-geometry prune):
- logic fallback
  - cost: 2048.000000
- $__ICE40_RAM4K_:
  - option HAS_BE 0
  - emulation score: 7
  - replicates (for ports): 1
  - replicates (for data): 1
  - mux score: 0
  - demux score: 0
  - cost: 78.000000
  - abits 11 dbits 2 4 8 16
  - chosen base width 8
  - swizzle 0 1 2 3 4 5 6 7
  - emulate read-first behavior
  - write port 0: port group W
    - widths 2 4 8
  - read port 0: port group R
    - widths 2 4 8 16
    - emulate transparency with write port 0
mapping memory fifo.data via $__ICE40_RAM4K_

The $__ICE40_RAM4K_ cell is defined in the file techlibs/ice40/brams.txt, with the mapping to SB_RAM40_4K done by techmap using techlibs/ice40/brams_map.v. Any leftover memory cells are then converted into flip flops (the logic fallback) with memory_map.

Listing 17 map_ffram section
opt -fast -mux_undef -undriven -fine
memory_map
opt -undriven -fine
../_images/rdata_map_ffram.svg

Fig. 13 rdata output after map_ffram section

Note

The visual clutter on the RDATA output port (highlighted) is an unfortunate side effect of opt_clean on the swizzled data bits. In connecting the $mux input port directly to RDATA to reduce the number of wires, the $techmap579\data.0.0.RDATA wire becomes more visually complex.

See also

Advanced usage docs for

Arithmetic

Uses techmap to map basic arithmetic logic to hardware. This sees somewhat of an explosion in cells as multi-bit $mux and $adffe are replaced with single-bit $_MUX_ and $_DFFE_PP0P_ cells, while the $alu is replaced with primitive $_OR_ and $_NOT_ gates and a $lut cell.

Listing 18 map_gates section
ice40_wrapcarry
techmap -map +/techmap.v -map +/ice40/arith_map.v
opt -fast
abc -dff -D 1    (only if -retime)
ice40_opt
../_images/rdata_map_gates.svg

Fig. 14 rdata output after map_gates section

See also

Advanced usage docs for Technology mapping

Flip-flops

Convert FFs to the types supported in hardware with dfflegalize, and then use techmap to map them. In our example, this converts the $_DFFE_PP0P_ cells to SB_DFFER.

We also run simplemap here to convert any remaining cells which could not be mapped to hardware into gate-level primitives. This includes optimizing $_MUX_ cells where one of the inputs is a constant 1'0, replacing it instead with an $_AND_ cell.

Listing 19 map_ffs section
dfflegalize -cell $_DFF_?_ 0 -cell $_DFFE_?P_ 0 -cell $_DFF_?P?_ 0 -cell $_DFFE_?P?P_ 0 -cell $_SDFF_?P?_ 0 -cell $_SDFFCE_?P?P_ 0 -cell $_DLATCH_?_ x -mince -1
techmap -map +/ice40/ff_map.v
opt_expr -mux_undef
simplemap
ice40_opt -full
../_images/rdata_map_ffs.svg

Fig. 15 rdata output after map_ffs section

See also

Advanced usage docs for Technology mapping

LUTs

abc and techmap are used to map LUTs, converting primitive cell types to use $lut and SB_CARRY cells. Note that the iCE40 flow uses abc9 rather than abc. For more on what these do, and what the difference between these two commands is, refer to The ABC toolbox.

Listing 20 map_luts section
abc          (only if -abc2)
ice40_opt    (only if -abc2)
techmap -map +/ice40/latches_map.v
simplemap                                   (if -noabc or -flowmap)
techmap -map +/gate2lut.v -D LUT_WIDTH=4    (only if -noabc)
flowmap -maxlut 4    (only if -flowmap)
read_verilog -D ICE40_HX -icells -lib -specify +/ice40/abc9_model.v
abc9  -W 250
ice40_wrapcarry -unwrap
techmap -map +/ice40/ff_map.v
clean
opt_lut -tech ice40

Fig. 16 rdata output after map_luts section

Finally, we use techmap to map the generic $lut cells to iCE40 SB_LUT4 cells.

Listing 21 map_cells section
techmap -map +/ice40/cells_map.v    (skip if -vpr)
clean

Fig. 17 rdata output after map_cells section

See also

Advanced usage docs for

Other cells

The following commands may also be used for mapping other cells:

hilomap

Some architectures require special driver cells for driving a constant hi or lo value. This command replaces simple constants with instances of such driver cells.

iopadmap

Top-level inputs and outputs must usually be implemented using special I/O-pad cells. This command inserts such cells into the design.

These commands tend to either be in the map_cells section or after the check section depending on the flow.

Final steps

The next section of the iCE40 synth flow performs some sanity checking and final tidy up:

Listing 22 check section
autoname
hierarchy -check
stat
check -noinit
blackbox =A:whitebox

The new commands here are:

The output from stat is useful for checking resource utilization. It lists the cells used in the design and the number of each, as well as the number of other resources used, such as wires and processes. For this design, the final call to stat should look something like the following:

yosys> stat -top fifo

17. Printing statistics.

=== fifo ===

   Number of wires:                 94
   Number of wire bits:            260
   Number of public wires:          94
   Number of public wire bits:     260
   Number of ports:                  7
   Number of port bits:             29
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                138
     $scopeinfo                      2
     SB_CARRY                       26
     SB_DFF                         26
     SB_DFFER                       25
     SB_LUT4                        58
     SB_RAM40_4K                     1

Note that the -top fifo here is optional. stat will automatically use the module with the top attribute set, and that attribute was set on fifo when we called hierarchy. If no module is marked top, then stats will be shown for each module selected.

The stat output is also useful as a kind of sanity check. Since we have already run proc, we wouldn’t expect there to be any processes. We also expect data to use hard memory; if, instead of an SB_RAM40_4K, we saw a high number of flip-flops being used, we might suspect something was wrong.
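This kind of check can also be scripted. The following is a minimal sketch, not part of Yosys, that parses the cell listing from the final stat output above and looks for the block RAM cell. The names parse_cell_counts and STAT_CELLS are hypothetical, and the parser assumes each cell line has the form "name count" as shown in this output.

```python
import re

# Cell listing copied from the final stat output above.
STAT_CELLS = """\
     $scopeinfo                      2
     SB_CARRY                       26
     SB_DFF                         26
     SB_DFFER                       25
     SB_LUT4                        58
     SB_RAM40_4K                     1
"""

def parse_cell_counts(text: str) -> dict:
    """Map each cell type in a stat cell listing to its count."""
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(\S+)\s+(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

cell_counts = parse_cell_counts(STAT_CELLS)
total_cells = sum(cell_counts.values())

# The data memory should have been mapped to at least one block RAM.
uses_block_ram = cell_counts.get("SB_RAM40_4K", 0) > 0

# Count flip-flop cells (types beginning with SB_DFF) for comparison.
flip_flops = sum(n for name, n in cell_counts.items() if name.startswith("SB_DFF"))

print(total_cells, uses_block_ram, flip_flops)
```

The total matches the "Number of cells" line in the stat output, and the flip-flop count stays far below the 2048 bits the memory would need if it had been built from registers.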

If we instead called stat immediately after read_verilog fifo.v we would see something very different:

yosys> stat

2. Printing statistics.

=== fifo ===

   Number of wires:                 28
   Number of wire bits:            219
   Number of public wires:           9
   Number of public wire bits:      45
   Number of ports:                  7
   Number of port bits:             29
   Number of memories:               1
   Number of memory bits:         2048
   Number of processes:              3
   Number of cells:                  9
     $add                            1
     $logic_and                      2
     $logic_not                      2
     $memrd                          1
     $sub                            1
     addr_gen                        2

=== addr_gen ===

   Number of wires:                  8
   Number of wire bits:             60
   Number of public wires:           4
   Number of public wire bits:      11
   Number of ports:                  4
   Number of port bits:             11
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              2
   Number of cells:                  2
     $add                            1
     $eq                             1

Notice how fifo and addr_gen are listed separately, and the statistics for fifo show two addr_gen instances. Because this is before the memory has been mapped, we also see that there is 1 memory with 2048 memory bits, matching our 8-bit wide data memory with 256 entries (\(8 \times 256 = 2048\)).
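The memory and port figures in this output can be reproduced from the parameters in fifo.v. The sketch below is a worked calculation only; DATA_WIDTH is a hypothetical name for the hard-coded 8-bit data width, and clog2 is a small helper standing in for Verilog's $clog2.

```python
def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1, standing in for Verilog's $clog2."""
    return (n - 1).bit_length()

MAX_DATA = 256    # default parameter value in fifo.v
DATA_WIDTH = 8    # hypothetical name; fifo.v hard-codes [7:0]
AWIDTH = clog2(MAX_DATA)

# One memory of MAX_DATA entries, each DATA_WIDTH bits wide.
memory_bits = DATA_WIDTH * MAX_DATA

# fifo ports: wen, ren, clk, rst (1 bit each), wdata, rdata, count[AWIDTH:0].
fifo_port_bits = 4 + 2 * DATA_WIDTH + (AWIDTH + 1)

# addr_gen ports: en, clk, rst (1 bit each), addr[AWIDTH-1:0].
addr_gen_port_bits = 3 + AWIDTH

print(AWIDTH, memory_bits, fifo_port_bits, addr_gen_port_bits)
```

These values line up with the "Number of memory bits" and "Number of port bits" lines reported for fifo and addr_gen.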

Synthesis output

The iCE40 synthesis flow has the following output modes available:

As an example, if we called synth_ice40 -top fifo -json fifo.json, our synthesized fifo design would be written to fifo.json. We can then read the design back into Yosys with read_json, but make sure to use design -reset or open a new interactive terminal first. The JSON output can also be loaded into nextpnr for place and route, but that is beyond the scope of this documentation.
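The JSON netlist can also be inspected outside of Yosys. The sketch below assumes the usual layout of the JSON output, with a top-level "modules" object whose entries each hold a "cells" object keyed by cell name, and counts cells by type. The function name count_cell_types is hypothetical, and the sample netlist is a hand-written stand-in rather than real synthesis output.

```python
import json
from collections import Counter

def count_cell_types(netlist_text: str, module: str) -> Counter:
    """Count the cells of each type in one module of a JSON netlist."""
    netlist = json.loads(netlist_text)
    cells = netlist["modules"][module]["cells"]
    return Counter(cell["type"] for cell in cells.values())

# Hand-written stand-in for a synthesized netlist (not real output).
SAMPLE_NETLIST = json.dumps({
    "modules": {
        "fifo": {
            "cells": {
                "lut_a": {"type": "SB_LUT4"},
                "lut_b": {"type": "SB_LUT4"},
                "ram_0": {"type": "SB_RAM40_4K"},
            }
        }
    }
})

sample_counts = count_cell_types(SAMPLE_NETLIST, "fifo")
print(dict(sample_counts))

# With a real file, the text would come from disk instead, for example:
# count_cell_types(open("fifo.json").read(), "fifo")
```

Run against a real fifo.json, the counts should agree with the cell listing printed by stat.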