Coarse arithmetics

The $macc cell type represents a generalized multiply and accumulate operation. The cell is purely combinational. It outputs the result of summing up a sequence of products and other injected summands.

Y = 0 +- a0factor1 * a0factor2 +- a1factor1 * a1factor2 +- ...
     + B[0] + B[1] + ...

The A port consists of concatenated pairs of multiplier inputs (“factors”). A zero-length factor2 acts as a constant 1, turning factor1 into a simple summand.

In this pseudocode, u(foo) means an unsigned int that’s foo bits long.

struct A {
   u(CONFIG.mul_info[0].factor1_len) a0factor1;
   u(CONFIG.mul_info[0].factor2_len) a0factor2;
   u(CONFIG.mul_info[1].factor1_len) a1factor1;
   u(CONFIG.mul_info[1].factor2_len) a1factor2;
   ...
};

The cell’s CONFIG parameter determines the layout of cell port A. The CONFIG parameter carries the following information:

struct CONFIG {
   u4 num_bits;
   struct mul_info {
      bool is_signed;
      bool is_subtract;
      u(num_bits) factor1_len;
      u(num_bits) factor2_len;
   }[num_ports];
};

Cell port B is a concatenation of 1-bit-wide unsigned integers that are also added to the sum.
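The summation rule above can be modelled in a few lines of Python. This is a minimal sketch, not the Yosys implementation: the function name `macc_sum` and the tuple layout of `terms` are made up for illustration, factors are passed as already-decoded integers, and the result is assumed to wrap to `y_width` bits as a fixed-width output would.

```python
def macc_sum(terms, b_bits, y_width):
    """Hypothetical helper: evaluate the $macc sum on decoded operands.

    terms   -- list of (factor1, factor2, is_subtract) tuples; factor2=None
               models a zero-length factor2, which acts as a constant 1
    b_bits  -- list of 0/1 values taken from cell port B
    y_width -- width of the output in bits
    """
    total = 0
    for factor1, factor2, is_subtract in terms:
        # A zero-length factor2 turns factor1 into a plain summand.
        product = factor1 if factor2 is None else factor1 * factor2
        total += -product if is_subtract else product
    # Every bit of B is added as an unsigned 1-bit value.
    total += sum(b_bits)
    # Wrap to the output width (two's complement for negative totals).
    return total % (1 << y_width)
```

For example, `macc_sum([(3, 4, False), (5, None, True)], [1, 0, 1], 8)` evaluates 3 * 4 - 5 + 1 + 0 + 1 and returns 9.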

yosys> help $alu

Arithmetic logic unit

A building block that supports binary addition and subtraction directly and comparison operations indirectly. Typically created by the alumacc pass, which transforms $add, $sub, $lt, $le, $ge, $gt, $eq, $eqx, $ne and $nex cells into $alu cells.

Properties: is_evaluable

Simulation model (verilog)

Listing 216 simlib.v

module \$alu (A, B, CI, BI, X, Y, CO);

    parameter A_SIGNED = 0;
    parameter B_SIGNED = 0;
    parameter A_WIDTH = 1;
    parameter B_WIDTH = 1;
    parameter Y_WIDTH = 1;

    input [A_WIDTH-1:0] A;      // Input operand
    input [B_WIDTH-1:0] B;      // Input operand
    output [Y_WIDTH-1:0] X;     // A xor B (sign-extended, optional B inversion,
                                //          used in combination with
                                //          reduction-AND for $eq/$ne ops)
    output [Y_WIDTH-1:0] Y;     // Sum

    input CI;                   // Carry-in (set for $sub)
    input BI;                   // Invert-B (set for $sub)
    output [Y_WIDTH-1:0] CO;    // Carry-out

    wire [Y_WIDTH-1:0] AA, BB;

    generate
        if (A_SIGNED && B_SIGNED) begin:BLOCK1
            assign AA = $signed(A), BB = BI ? ~$signed(B) : $signed(B);
        end else begin:BLOCK2
            assign AA = $unsigned(A), BB = BI ? ~$unsigned(B) : $unsigned(B);
        end
    endgenerate

    // this is 'x' if Y and CO should be all 'x', and '0' otherwise
    wire y_co_undef = ^{A, A, B, B, CI, CI, BI, BI};

    assign X = AA ^ BB;
    // Full adder
    assign Y = (AA + BB + CI) ^ {Y_WIDTH{y_co_undef}};

    function get_carry;
        input a, b, c;
        get_carry = (a&b) | (a&c) | (b&c);
    endfunction

    genvar i;
    generate
        assign CO[0] = get_carry(AA[0], BB[0], CI) ^ y_co_undef;
        for (i = 1; i < Y_WIDTH; i = i+1) begin:BLOCK3
            assign CO[i] = get_carry(AA[i], BB[i], CO[i-1]) ^ y_co_undef;
        end
    endgenerate

endmodule
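The following Python sketch mirrors the simulation model above for fully defined inputs, to make the roles of CI, BI, X and CO concrete. It is an illustration under assumptions, not part of Yosys: the function name `alu` and its argument order are hypothetical, and the x-propagation done through `y_co_undef` is left out.

```python
def alu(a, b, ci, bi, a_width, b_width, y_width, a_signed=False, b_signed=False):
    """Hypothetical model of $alu for defined inputs. Returns (X, Y, CO)."""
    mask = (1 << y_width) - 1

    def extend(value, width, signed):
        # Keep `width` bits, then sign- or zero-extend to y_width bits.
        value &= (1 << width) - 1
        if signed and value >> (width - 1):
            value -= 1 << width
        return value & mask

    # As in the listing, sign extension is used only if both operands are signed.
    both_signed = a_signed and b_signed
    aa = extend(a, a_width, both_signed)
    bb = extend(b, b_width, both_signed)
    if bi:
        bb = ~bb & mask  # optional inversion of B

    x = aa ^ bb
    y = (aa + bb + ci) & mask

    # Ripple the carry through every bit position to build the CO vector.
    co = 0
    carry = ci & 1
    for i in range(y_width):
        a_bit = (aa >> i) & 1
        b_bit = (bb >> i) & 1
        carry = (a_bit & b_bit) | (a_bit & carry) | (b_bit & carry)
        co |= carry << i
    return x, y, co
```

With BI = CI = 1 the model computes A - B in two's complement. X is then all ones exactly when A equals B, which is what the reduction-AND mentioned in the port comment tests. For unsigned operands whose width equals Y_WIDTH, the top carry-out bit is 0 exactly when A < B; this is the usual borrow argument and not necessarily the exact wiring that alumacc generates.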

yosys> help $fa

Properties: is_evaluable

Simulation model (verilog)

Listing 217 simlib.v

module \$fa (A, B, C, X, Y);

    parameter WIDTH = 1;

    input [WIDTH-1:0] A, B, C;
    output [WIDTH-1:0] X, Y;

    wire [WIDTH-1:0] t1, t2, t3;

    assign t1 = A ^ B, t2 = A & B, t3 = C & t1;
    assign Y = t1 ^ C, X = (t2 | t3) ^ (Y ^ Y);

endmodule
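Read bit by bit, the listing above is a full adder applied independently to each of the WIDTH positions, so the three inputs are reduced to a sum word Y and a carry word X. The Python sketch below restates that for defined inputs; the function name `fa` is hypothetical. The `^ (Y ^ Y)` term of the listing is omitted because it is zero for defined values and appears to serve only to propagate undefined bits.

```python
def fa(a, b, c, width):
    """Hypothetical model of $fa for defined inputs. Returns (X, Y)."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    t1 = a ^ b   # per-bit partial sum of A and B
    t2 = a & b   # carry generated by A and B
    t3 = c & t1  # carry generated when C meets the partial sum
    y = t1 ^ c   # per-bit sum
    x = t2 | t3  # per-bit carry, not yet shifted into the next position
    return x, y
```

Because each bit position satisfies a + b + c = y + 2x, the whole words satisfy A + B + C = Y + 2 * X, for example `fa(5, 3, 6, 3)` returns X = 7 and Y = 0, and 5 + 3 + 6 = 0 + 2 * 7.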

yosys> help $lcu

Lookahead carry unit

A building block dedicated to fast computation of carry-bits used in binary arithmetic operations. By replacing the ripple carry structure used in full-adder blocks, the more significant bits of the sum can be expected to be computed more quickly. Typically created during techmap of $alu cells (see the “_90_alu” rule in +/techmap.v).

Properties: is_evaluable

Simulation model (verilog)

Listing 218 simlib.v

module \$lcu (P, G, CI, CO);

    parameter WIDTH = 1;

    input [WIDTH-1:0] P;    // Propagate
    input [WIDTH-1:0] G;    // Generate
    input CI;               // Carry-in

    output reg [WIDTH-1:0] CO; // Carry-out

    integer i;
    always @* begin
        CO = 'bx;
        if (^{P, G, CI} !== 1'bx) begin
            CO[0] = G[0] || (P[0] && CI);
            for (i = 1; i < WIDTH; i = i+1)
                CO[i] = G[i] || (P[i] && CO[i-1]);
        end
    end

endmodule
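The recurrence in the listing above, CO[i] = G[i] | (P[i] & CO[i-1]), can be tried out with a short Python sketch. The names `lcu` and `add_with_lcu` are hypothetical, inputs are assumed to be fully defined, and the choice P = A xor B, G = A and B is the textbook propagate/generate definition rather than a statement about what any particular techmap rule emits.

```python
def lcu(p, g, ci, width):
    """Hypothetical model of $lcu for defined inputs. Returns the CO vector."""
    co = 0
    carry = ci & 1
    for i in range(width):
        # A position produces a carry if it generates one, or if it
        # propagates the carry arriving from the position below.
        carry = ((g >> i) & 1) | (((p >> i) & 1) & carry)
        co |= carry << i
    return co


def add_with_lcu(a, b, ci, width):
    """Add two unsigned words using lcu() for the carries. Returns (sum, carry_out)."""
    mask = (1 << width) - 1
    p = (a ^ b) & mask  # propagate: exactly one input bit is set
    g = (a & b) & mask  # generate: both input bits are set
    co = lcu(p, g, ci, width)
    # The carry into bit i is CO[i-1]; the carry into bit 0 is CI.
    carries_in = ((co << 1) | (ci & 1)) & mask
    return p ^ carries_in, (co >> (width - 1)) & 1
```

The software loop is of course sequential; the point of the sketch is only the data dependency, which hardware can flatten into lookahead logic.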

yosys> help $macc

Multiply and accumulate. A building block for summing any number of negated and unnegated signals and arithmetic products of pairs of signals. Cell port A concatenates pairs of signals to be multiplied together. When the second signal in a pair is zero length, a constant 1 is used instead as the second factor. Cell port B concatenates 1-bit-wide signals to also be summed, such as “carry in” in adders. Typically created by the alumacc pass, which transforms $add and $mul into $macc cells.

Properties: is_evaluable

Simulation model (verilog)

Listing 219 simlib.v

module \$macc (A, B, Y);

    parameter A_WIDTH = 0;
    parameter B_WIDTH = 0;
    parameter Y_WIDTH = 0;
    // CONFIG determines the layout of A, as explained below
    parameter CONFIG = 4'b0000;
    parameter CONFIG_WIDTH = 4;

    // In the terms used for this cell, the term "port" has mixed meanings. To disambiguate:
    // A cell port is for example the A input (it is constructed in C++ as cell->setPort(ID::A, ...))
    // Multiplier ports are pairs of multiplier inputs ("factors").
    // If the second signal in such a pair is zero length, no multiplication is necessary, and the first signal is just added to the sum.
    input [A_WIDTH-1:0] A; // Cell port A is the concatenation of all multiplier ports
    input [B_WIDTH-1:0] B; // Cell port B is the concatenation of single-bit unsigned signals to be also added to the sum
    output reg [Y_WIDTH-1:0] Y; // Output sum

    // Xilinx XSIM does not like $clog2() below..
    function integer my_clog2;
        input integer v;
        begin
            if (v > 0)
                v = v - 1;
            my_clog2 = 0;
            while (v) begin
                v = v >> 1;
                my_clog2 = my_clog2 + 1;
            end
        end
    endfunction

    // Number of bits in each factor length field of CONFIG (one field per factor in cell port A)
    localparam integer num_bits = CONFIG[3:0] > 0 ? CONFIG[3:0] : 1;
    // Number of multiplier ports
    localparam integer num_ports = (CONFIG_WIDTH-4) / (2 + 2*num_bits);
    // Minimum bit width of an induction variable to iterate over all bits of cell port A
    localparam integer num_abits = my_clog2(A_WIDTH) > 0 ? my_clog2(A_WIDTH) : 1;

    // In this pseudocode, u(foo) means an unsigned int that's foo bits long.
    // The CONFIG parameter carries the following information:
    //    struct CONFIG {
    //        u4 num_bits;
    //        struct port_field {
    //            bool is_signed;
    //            bool is_subtract;
    //            u(num_bits) factor1_len;
    //            u(num_bits) factor2_len;
    //        }[num_ports];
    //    };

    // The A cell port carries the following information:
    //    struct A {
    //        u(CONFIG.port_field[0].factor1_len) port0factor1;
    //        u(CONFIG.port_field[0].factor2_len) port0factor2;
    //        u(CONFIG.port_field[1].factor1_len) port1factor1;
    //        u(CONFIG.port_field[1].factor2_len) port1factor2;
    //        ...
    //    };
    // and ceil(log2(sizeof(A))) is num_abits (at least 1).
    // No factor1 may have a zero length.
    // A factor2 having a zero length implies factor2 is replaced with a constant 1.

    // Additionally, B is an array of 1-bit-wide unsigned integers to also be summed up.
    // Finally, we have:
    // Y = port0factor1 * port0factor2 + port1factor1 * port1factor2 + ...
    //     + B[0] + B[1] + ...

    function [2*num_ports*num_abits-1:0] get_port_offsets;
        input [CONFIG_WIDTH-1:0] cfg;
        integer i, cursor;
        begin
            cursor = 0;
            get_port_offsets = 0;
            for (i = 0; i < num_ports; i = i+1) begin
                get_port_offsets[(2*i + 0)*num_abits +: num_abits] = cursor;
                cursor = cursor + cfg[4 + i*(2 + 2*num_bits) + 2 +: num_bits];
                get_port_offsets[(2*i + 1)*num_abits +: num_abits] = cursor;
                cursor = cursor + cfg[4 + i*(2 + 2*num_bits) + 2 + num_bits +: num_bits];
            end
        end
    endfunction

    localparam [2*num_ports*num_abits-1:0] port_offsets = get_port_offsets(CONFIG);

    `define PORT_IS_SIGNED   (0 + CONFIG[4 + i*(2 + 2*num_bits)])
    `define PORT_DO_SUBTRACT (0 + CONFIG[4 + i*(2 + 2*num_bits) + 1])
    `define PORT_SIZE_A      (0 + CONFIG[4 + i*(2 + 2*num_bits) + 2 +: num_bits])
    `define PORT_SIZE_B      (0 + CONFIG[4 + i*(2 + 2*num_bits) + 2 + num_bits +: num_bits])
    `define PORT_OFFSET_A    (0 + port_offsets[2*i*num_abits +: num_abits])
    `define PORT_OFFSET_B    (0 + port_offsets[2*i*num_abits + num_abits +: num_abits])

    integer i, j;
    reg [Y_WIDTH-1:0] tmp_a, tmp_b;

    always @* begin
        Y = 0;
        for (i = 0; i < num_ports; i = i+1)
        begin
            tmp_a = 0;
            tmp_b = 0;

            for (j = 0; j < `PORT_SIZE_A; j = j+1)
                tmp_a[j] = A[`PORT_OFFSET_A + j];

            if (`PORT_IS_SIGNED && `PORT_SIZE_A > 0)
                for (j = `PORT_SIZE_A; j < Y_WIDTH; j = j+1)
                    tmp_a[j] = tmp_a[`PORT_SIZE_A-1];

            for (j = 0; j < `PORT_SIZE_B; j = j+1)
                tmp_b[j] = A[`PORT_OFFSET_B + j];

            if (`PORT_IS_SIGNED && `PORT_SIZE_B > 0)
                for (j = `PORT_SIZE_B; j < Y_WIDTH; j = j+1)
                    tmp_b[j] = tmp_b[`PORT_SIZE_B-1];

            if (`PORT_SIZE_B > 0)
                tmp_a = tmp_a * tmp_b;

            if (`PORT_DO_SUBTRACT)
                Y = Y - tmp_a;
            else
                Y = Y + tmp_a;
        end
        for (i = 0; i < B_WIDTH; i = i+1) begin
            Y = Y + B[i];
        end
    end

    `undef PORT_IS_SIGNED
    `undef PORT_DO_SUBTRACT
    `undef PORT_SIZE_A
    `undef PORT_SIZE_B
    `undef PORT_OFFSET_A
    `undef PORT_OFFSET_B

endmodule
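To make the bit layout concrete, the Python sketch below packs a CONFIG value and then evaluates the cell in the same order as the listing above: a 4-bit num_bits field first, then per multiplier port one is_signed bit, one is_subtract bit and two length fields of num_bits bits each, all counted from the least significant bit. The names `pack_macc_config` and `macc` are hypothetical, inputs are assumed to be fully defined, and every factor is assumed to be no wider than Y_WIDTH.

```python
def pack_macc_config(num_bits, ports):
    """Hypothetical helper: build (CONFIG, CONFIG_WIDTH) from a list of
    (is_signed, is_subtract, factor1_len, factor2_len) tuples."""
    stride = 2 + 2 * num_bits
    config = num_bits & 0xF
    for i, (is_signed, is_subtract, len1, len2) in enumerate(ports):
        field = (int(is_signed) | (int(is_subtract) << 1)
                 | (len1 << 2) | (len2 << (2 + num_bits)))
        config |= field << (4 + i * stride)
    return config, 4 + len(ports) * stride


def macc(a, b, config, config_width, b_width, y_width):
    """Hypothetical model of $macc for defined inputs. Returns Y."""
    num_bits = (config & 0xF) or 1  # the listing treats 0 as 1
    stride = 2 + 2 * num_bits
    num_ports = (config_width - 4) // stride
    len_mask = (1 << num_bits) - 1

    def take(offset, length, signed):
        # Slice `length` bits out of A and interpret them as signed if asked.
        value = (a >> offset) & ((1 << length) - 1)
        if signed and length > 0 and (value >> (length - 1)) & 1:
            value -= 1 << length
        return value

    y = 0
    cursor = 0  # running bit offset into cell port A
    for i in range(num_ports):
        field = config >> (4 + i * stride)
        is_signed = field & 1
        is_subtract = (field >> 1) & 1
        len1 = (field >> 2) & len_mask
        len2 = (field >> (2 + num_bits)) & len_mask
        term = take(cursor, len1, is_signed)
        cursor += len1
        if len2 > 0:  # a zero-length factor2 means "multiply by 1"
            term *= take(cursor, len2, is_signed)
        cursor += len2
        y += -term if is_subtract else term
    for i in range(b_width):
        y += (b >> i) & 1  # each bit of B is an unsigned summand
    return y & ((1 << y_width) - 1)
```

As a usage example, take two multiplier ports with num_bits = 3: an unsigned 4x4 product of 3 and 5, and a signed, subtracted 4-bit summand holding -2. With A = 0xE53 and B = 0b101, `macc` returns 15 + 2 + 2 = 19 for an 8-bit output.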

yosys> help $macc_v2

Multiply and add. This cell represents a generic fused multiply-add operation; it supersedes the earlier $macc cell.

Properties: is_evaluable

Simulation model (verilog)

Listing 220 simlib.v

module \$macc_v2 (A, B, C, Y);

    parameter NPRODUCTS = 0;
    parameter NADDENDS = 0;
    parameter A_WIDTHS = 16'h0000;
    parameter B_WIDTHS = 16'h0000;
    parameter C_WIDTHS = 16'h0000;
    parameter Y_WIDTH = 0;

    parameter PRODUCT_NEGATED = 1'bx;
    parameter ADDEND_NEGATED = 1'bx;
    parameter A_SIGNED = 1'bx;
    parameter B_SIGNED = 1'bx;
    parameter C_SIGNED = 1'bx;

    function integer sum_widths1;
        input [(16*NPRODUCTS)-1:0] widths;
        integer i;
        begin
            sum_widths1 = 0;
            for (i = 0; i < NPRODUCTS; i = i+1) begin
                sum_widths1 = sum_widths1 + widths[16*i+:16];
            end
        end
    endfunction

    function integer sum_widths2;
        input [(16*NADDENDS)-1:0] widths;
        integer i;
        begin
            sum_widths2 = 0;
            for (i = 0; i < NADDENDS; i = i+1) begin
                sum_widths2 = sum_widths2 + widths[16*i+:16];
            end
        end
    endfunction

    input [sum_widths1(A_WIDTHS)-1:0] A; // concatenation of LHS factors
    input [sum_widths1(B_WIDTHS)-1:0] B; // concatenation of RHS factors
    input [sum_widths2(C_WIDTHS)-1:0] C; // concatenation of summands
    output reg [Y_WIDTH-1:0] Y; // output sum

    integer i, j, ai, bi, ci, aw, bw, cw;
    reg [Y_WIDTH-1:0] product;
    reg [Y_WIDTH-1:0] addend, oper_a, oper_b;

    always @* begin
        Y = 0;
        ai = 0;
        bi = 0;
        for (i = 0; i < NPRODUCTS; i = i+1)
        begin
            aw = A_WIDTHS[16*i+:16];
            bw = B_WIDTHS[16*i+:16];

            oper_a = 0;
            oper_b = 0;
            for (j = 0; j < Y_WIDTH && j < aw; j = j + 1)
                oper_a[j] = A[ai + j];
            for (j = 0; j < Y_WIDTH && j < bw; j = j + 1)
                oper_b[j] = B[bi + j];
            // A_SIGNED[i] == B_SIGNED[i] as RTLIL invariant
            if (A_SIGNED[i] && B_SIGNED[i]) begin
                for (j = aw; j > 0 && j < Y_WIDTH; j = j + 1)
                    oper_a[j] = oper_a[j - 1];
                for (j = bw; j > 0 && j < Y_WIDTH; j = j + 1)
                    oper_b[j] = oper_b[j - 1];
            end

            product = oper_a * oper_b;

            if (PRODUCT_NEGATED[i])
                Y = Y - product;
            else
                Y = Y + product;

            ai = ai + aw;
            bi = bi + bw;
        end

        ci = 0;
        for (i = 0; i < NADDENDS; i = i+1)
        begin
            cw = C_WIDTHS[16*i+:16];

            addend = 0;
            for (j = 0; j < Y_WIDTH && j < cw; j = j + 1)
                addend[j] = C[ci + j];
            if (C_SIGNED[i]) begin
                for (j = cw; j > 0 && j < Y_WIDTH; j = j + 1)
                    addend[j] = addend[j - 1];
            end

            if (ADDEND_NEGATED[i])
                Y = Y - addend;
            else
                Y = Y + addend;

            ci = ci + cw;
        end
    end

endmodule
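The evaluation order of the listing above can be restated as a Python sketch. For readability it takes the per-operand widths and flags as plain lists instead of the packed parameters (the listing reads widths as 16-bit fields and flags as one bit per operand). The function name `macc_v2` and the tuple layouts are hypothetical, and inputs are assumed to be fully defined.

```python
def macc_v2(a, b, c, products, addends, y_width):
    """Hypothetical model of $macc_v2 for defined inputs. Returns Y.

    products -- list of (a_width, b_width, is_signed, is_negated) tuples
    addends  -- list of (c_width, is_signed, is_negated) tuples
    """
    def take(value, offset, width, signed):
        # Slice `width` bits and interpret them as signed if requested.
        # A zero-width slice is 0, as in the listing.
        result = (value >> offset) & ((1 << width) - 1)
        if signed and width > 0 and (result >> (width - 1)) & 1:
            result -= 1 << width
        return result

    y = 0
    a_off = b_off = 0
    for a_width, b_width, signed, negated in products:
        product = (take(a, a_off, a_width, signed)
                   * take(b, b_off, b_width, signed))
        y += -product if negated else product
        a_off += a_width
        b_off += b_width

    c_off = 0
    for c_width, signed, negated in addends:
        addend = take(c, c_off, c_width, signed)
        y += -addend if negated else addend
        c_off += c_width

    # Wrap to the output width, matching the fixed-width Y register.
    return y & ((1 << y_width) - 1)
```

For instance, with products 3 * 5 (unsigned) and a negated signed product (-2) * 3, plus a negated 4-bit addend 7 and a 1-bit addend 1, the call `macc_v2(227, 53, 23, [(4, 4, False, False), (4, 4, True, True)], [(4, False, True), (1, False, False)], 8)` returns 15 + 6 - 7 + 1 = 15. Unlike the $macc model, this listing always multiplies, so a zero-width factor yields a zero product rather than acting as a constant 1.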