Generated Verilog code is working funny for +>> on Vec

Hi, I ran into a weird case while trying to create a FIFO queue. Simulation in Haskell works correctly, but when I test the generated Verilog code with a testbench I get nonsensical outputs. I have tried to isolate one of those cases with a simple example here. Any insight would be highly appreciated :pray:

So, for a simple Mealy machine like the one below:

module Rough where

import Clash.Prelude

type State = Vec 6 Int
type Input = Int
type Output = State

fx :: State -> Input -> (State, Output)
fx state input = (nextState, nextState)
    where 
        nextState = input +>> state

machine :: HiddenClockResetEnable dom => Signal dom Input -> Signal dom Output
machine = mealy fx (repeat 0)

topEntity :: Clock System -> Reset System -> Enable System 
    -> Signal System Input -> Signal System Output
topEntity = exposeClockResetEnable machine


--- TEST --
inputs :: HiddenClockResetEnable dom => Signal dom Input
inputs = fromList [0,0,1,2,3,4,5,6,7,8]

outputs = sampleN @System 10 (machine inputs)

The outputs from the test give me the expected result. However, generating Verilog and simulating it in a testbench gives a weird output.

Here is the generated Verilog code.

/* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE.
** GENERATED BY CLASH 1.8.2. DO NOT MODIFY.
*/
`default_nettype none
`timescale 100fs/100fs
module topEntity
    ( // Inputs
      input wire  eta // clock
    , input wire  eta1 // reset
    , input wire  eta2 // enable
    , input wire signed [63:0] c$arg

      // Outputs
    , output wire signed [63:0] result_0
    , output wire signed [63:0] result_1
    , output wire signed [63:0] result_2
    , output wire signed [63:0] result_3
    , output wire signed [63:0] result_4
    , output wire signed [63:0] result_5
    );
  // rough.hs:14:1-78
  reg [383:0] c$ds_app_arg = {64'sd0,   64'sd0,   64'sd0,   64'sd0,   64'sd0,   64'sd0};
  wire [767:0] result_12;
  // rough.hs:10:1-2
  wire [383:0] nextState;
  wire [447:0] nextState_projection;
  wire [383:0] result;

  // register begin
  always @(posedge eta or  posedge  eta1) begin : c$ds_app_arg_register
    if ( eta1) begin
      c$ds_app_arg <= {64'sd0,   64'sd0,   64'sd0,   64'sd0,   64'sd0,   64'sd0};
    end else if (eta2) begin
      c$ds_app_arg <= result_12[767:384];
    end
  end
  // register end

  assign result = result_12[383:0];

  assign result_12 = {nextState,   nextState};

  assign nextState_projection = ({c$arg,c$ds_app_arg});

  assign nextState = nextState_projection[447:64];

  assign result_0 = $signed(result[383:320]);

  assign result_1 = $signed(result[319:256]);

  assign result_2 = $signed(result[255:192]);

  assign result_3 = $signed(result[191:128]);

  assign result_4 = $signed(result[127:64]);

  assign result_5 = $signed(result[63:0]);


endmodule

Here is my Verilog testbench.

`timescale 1us/1ns
module testbench;
    reg clk;
    reg rst;
    reg en;

    reg signed [63:0] data;

    wire signed [63:0] mem0;
    wire signed [63:0] mem1;
    wire signed [63:0] mem2;
    wire signed [63:0] mem3;
    wire signed [63:0] mem4;
    wire signed [63:0] mem5;

    topEntity monitor (clk, rst, en,
                       data, 
                       mem0, mem1, mem2, mem3, mem4, mem5
                    );

    always begin
        #1 clk = ~clk;
    end

    initial begin
        clk = 0;
        rst = 0;
        en = 1;
        
        $printtimescale(testbench);
        $dumpvars(0, testbench);

        #1;
        data = 1;
        #2;
        data = 2;
        #2;
        data = 3;
        #2;
        data = 4;
        #2
        data = 5;
        #2
        data = 6;
        #2
        data = 7;
        #2
        data = 8;
        #2;
        $finish;
    end

endmodule


It gives me the following output.

This output doesn’t make sense to me as the newest value of data gets propagated 2 steps forward instead of being shifted one by one.

Furthermore when I disable the enable signal I get the following result:

Here, the first output gets updated even with enable = false, suggesting something might be going wrong.

At first glance, I don’t see any obvious mistake in the generated code.

Any ideas?

> This output doesn’t make sense to me as the newest value of data gets propagated 2 steps forward instead of being shifted one by one.

You are changing your input at the exact moment of the active clock edge. Try changing it before the edge. For example, toggle your clock every 10 units and set your data on the falling edges. This makes it much clearer what is happening, since the input will be stable before the active clock edge hits.

> Here, the first output gets updated even with enable = false, suggesting something might be going wrong.

Your Mealy machine outputs nextState regardless of whether enable is set. The only thing setting enable to False does is ensure that nextState is not stored in the register.
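To make the enable behaviour concrete, here is a rough plain-Haskell model using ordinary lists instead of Clash's Vec and Signal. The names shiftIn, step and run are made up for this sketch and are not Clash functions; shiftIn only imitates what +>> does on a fixed-length vector.

```haskell
-- Plain-list model, NOT Clash: a register with an enable feeding fx.
-- shiftIn imitates (+>>): push a value in at the front, drop the last one.
shiftIn :: a -> [a] -> [a]
shiftIn x xs = x : init xs

-- One clock cycle. Returns the register contents after the edge
-- and the output visible during this cycle.
step :: [Int] -> (Bool, Int) -> ([Int], [Int])
step stored (en, input) = (stored', out)
  where
    out     = shiftIn input stored        -- combinational: always follows the input
    stored' = if en then out else stored  -- the enable only gates the register

-- Feed a list of (enable, input) pairs through the model, one per cycle.
run :: [Int] -> [(Bool, Int)] -> [[Int]]
run _ []       = []
run s (x : xs) = let (s', o) = step s x in o : run s' xs
```

With a 3-element register, run [0,0,0] [(False,1),(False,2),(True,3),(True,4)] gives [[1,0,0],[2,0,0],[3,0,0],[4,3,0]]. The first element tracks the input in every cycle, while the rest of the vector only starts moving once the enable is True.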

Thanks Rowan. That makes sense.

So, if I make the input signals stable by the time of the edge trigger, I get this:

As fx is combinational, the output comes instantly. That makes sense. Thanks for pointing it out.

However, the output is still mysterious to me.

Before the first clk rise we have:

prev state: [0,0,0,0,0,0]
input: 1
output: [1,0,0,0,0,0]

When the clock rises:
prev state: [0,0,0,0,0,0] as the input to register needs 1 cycle to settle
input: 1
output: [1,0,0,0,0,0] as the output

However, both mem0 and mem1 are 1 here, which is confusing.

I could ignore mem0 and assume that I have one less memory unit, i.e. I have mem1 to mem5. This could be a workaround for me. But I don’t understand the reason, which makes me uneasy.

Thanks again!

PS: Also why isn’t mem0 initialized to 0 in the beginning :thinking: ?

Clash code for context:

> prev state: [0,0,0,0,0,0] as the input to register needs 1 cycle to settle
> input: 1
> output: [1,0,0,0,0,0] as the output

These simulations don’t simulate propagation delays of registers, so an infinitesimally small delay after the clock rises, the state will already be [1,0,...]. And if you shift that and tack a 1 at the front, you get the output you see in the simulation.
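As a rough illustration of that zero-delay behaviour, the following plain-Haskell sketch (ordinary lists, not Clash; shiftIn and aroundEdge are names invented here) computes what the output looks like just before and just after a rising edge while the input is held steady.

```haskell
-- Plain-list model, NOT Clash. shiftIn imitates (+>>) on a fixed-length vector.
shiftIn :: a -> [a] -> [a]
shiftIn x xs = x : init xs

-- Output of fx just before and just after a rising clock edge,
-- assuming the input does not change across the edge.
aroundEdge :: Int -> [Int] -> ([Int], [Int])
aroundEdge input stored = (before, after)
  where
    before  = shiftIn input stored   -- register still holds the old state
    stored' = before                 -- the edge latches nextState with no delay
    after   = shiftIn input stored'  -- fx re-evaluates on the new state at once
```

aroundEdge 1 [0,0,0,0,0,0] evaluates to ([1,0,0,0,0,0],[1,1,0,0,0,0]), which matches mem0 and mem1 both being 1 right after the first edge.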

> PS: Also why isn’t mem0 initialized to 0 in the beginning :thinking: ?

I expect it is, but you can’t see it as you immediately overwrite it with your data which is X.

[edit]
I’m sure it is initialised to 0, because it’s shifted into mem1 as a 0 right at the beginning. It’s mem5 that we aren’t sure of because its contents have been dropped by the shift :-).
[/edit]

Ah yes, that makes sense.

Thanks a lot @Rowan & @DigitalBrains :slight_smile:

Actually, mem5 doesn’t serve a purpose and would be optimised out if you synthesized the design. The useful memory units are mem0 through mem4.

The output of the Mealy machine never observes the mem5 register; that register’s output is unconnected. The output of the mem0 register comes out of fx as mem1, the output of mem1 as mem2, …, and the output of the mem5 register doesn’t come out at all.

The code is equivalent to

type State = Vec 5 Int
type Input = Int
type Output = Vec 6 Int

fx :: State -> Input -> (State, Output)
fx state input = (init output, output)
    where 
        output = input :> state

or

fx state input = (nextState, nextState :< last state)
    where
      nextState = input +>> state

although I like that one less.
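If you want to convince yourself of the equivalence without Clash, a plain-list sketch like the one below should do. Everything here is a stand-in: fxOrig mirrors the original 6-register fx, fxFive mirrors the 5-register version, and runMealy is a hand-rolled list version of what mealy does, not the Clash function.

```haskell
-- Plain-list model, NOT Clash. Lists stand in for Vec.

-- Original design: 6 stored elements, output equals nextState.
fxOrig :: [Int] -> Int -> ([Int], [Int])
fxOrig state input = (nextState, nextState)
  where
    nextState = input : init state   -- list version of input +>> state

-- 5 stored elements, 6-element output.
fxFive :: [Int] -> Int -> ([Int], [Int])
fxFive state input = (init output, output)
  where
    output = input : state           -- list version of input :> state

-- Minimal Mealy runner over a list of inputs, one input per cycle.
runMealy :: (s -> i -> (s, o)) -> s -> [i] -> [o]
runMealy _ _ []       = []
runMealy f s (i : is) = let (s', o) = f s i in o : runMealy f s' is
```

For the inputs from the first post, runMealy fxOrig (replicate 6 0) [0,0,1,2,3,4,5,6,7,8] and runMealy fxFive (replicate 5 0) [0,0,1,2,3,4,5,6,7,8] produce the same list of outputs, ending in [8,7,6,5,4,3].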

Interesting. I would have never thought of that.
Thanks for the insights.

On a side note, for my original issue with the queue implementation: in the end, I chose to scrap the Mealy-based design in favor of explicit registers. Returning (State, Output) from the transition function made it very easy for me to make such mistakes (the output being completely combinational, etc.).

Now it looks something like below and works pretty well. Gotta love those applicative functors :slight_smile:

...
buffer = register (repeat 0 :: QMem) nextBufferSignal
nextBufferSignal = nextBuffer <$> buffer <*> bundle (input, cursor)

nextBuffer :: QMem -> (QInput, QCursor) -> QMem
nextBuffer buf ((push, pop, qData), cur) = out
    where
        out = case (push, pop) of
            (True, _) -> if cur /= length buf then qData +>> buf else buf
            (_, _) -> buf
...