Hanging while testing clash-protocols code

Hello all,

I was wondering if I could ask for some help on the following issue.

I have a trivial topEntity function, using clash-protocols, that plugs into an Axi 4 Stream:

roundTripTopEntity 
    :: Clock System 
    -> Reset System 
    -> (Signal System (Maybe (Axi4StreamM2S AxisConfig Bool)), Signal System Axi4StreamS2M) 
    -> (Signal System Axi4StreamS2M, Signal System (Maybe (Axi4StreamM2S AxisConfig Bool)))
roundTripTopEntity clk rst = withClockResetEnable clk rst enableGen $ toSignals circuitRoundTrip

circuitRoundTrip 
  :: forall dom cfg. (AxiCfg cfg, HiddenClockResetEnable dom)
  => Circuit
      (Axi4Stream dom cfg Bool)
      (Axi4Stream dom cfg Bool)
circuitRoundTrip = 
  let
    proxy = Proxy @(Axi4Stream dom cfg Bool)
    fromDf = DfConv.dfToDfConvInp proxy
    toDf = DfConv.dfToDfConvOtp proxy
  in fromDf |> toDf 

And I have a means of testing this, which feeds in Axi4StreamM2S values, only moving on to the next one if they are ack-ed via a tready signal:

topEntityRunner
  :: forall dom cfg n.
     ( HiddenClockResetEnable dom
     , NFDataX (Axi4StreamM2S cfg Bool)
     , KnownNat n
     )
  => AxiSTopEntity dom cfg
  -> Vec n (Axi4StreamM2S cfg Bool)  -- Data to input
  -> Signal dom (Maybe (Axi4StreamM2S cfg Bool))
topEntityRunner theTopEntity inputVec =
  let
    m2sGenerator :: Signal dom (Maybe (Axi4StreamM2S cfg Bool))
    m2sGenerator = axiM2SGenerator inputVec outS2M

    s2MInput :: Signal dom Axi4StreamS2M
    s2MInput = pure $ Axi4StreamS2M True

    (outS2M, outM2S) = theTopEntity clockGen resetGen (m2sGenerator, s2MInput)
  in outM2S

axiM2SGenerator
  :: (HiddenClockResetEnable dom, NFDataX (Axi4StreamM2S cfg Bool), KnownNat n)
  => Vec n (Axi4StreamM2S cfg Bool)           -- Data to input
  -> Signal dom Axi4StreamS2M              -- the acks we get back
  -> Signal dom (Maybe (Axi4StreamM2S cfg Bool))
axiM2SGenerator initialQueue =
  mealy axiStateMachine (initialQueue, 0)

axiStateMachine
  :: (KnownNat n)
  => (Vec n (Axi4StreamM2S cfg Bool), Index (n + 1))
  -> Axi4StreamS2M
  -> ((Vec n (Axi4StreamM2S cfg Bool), Index (n + 1)), Maybe (Axi4StreamM2S cfg Bool))
axiStateMachine state@(queue, n) !s2mVal =
  let out = case (n == maxBound, _tready s2mVal) of 
              (True, _) -> (state, Nothing)
              (False, True) -> ((queue, n+1), Just $ queue !! n )
              (False, False) -> (state, Just $ queue !! n)
  in out

If I then write a test which uses the latter to test the former:

it "topEntityRunner" $ do
	let clk = systemClockGen
	    rst = systemResetGen
	    en  = enableGen
	    axiChunk isLast = 
	      Axi4StreamM2S 
	      { _tdata = 0x01 :> 0x02 :> 0x03 :> 0x04 :> 0x05 :> 0x06 :> 0x07 :> 0x08 :> Nil
	      , _tkeep = True :> True :> True :> True :> True :> True :> True :> True :> Nil  
	      , _tstrb = True :> True :> True :> True :> True :> True :> True :> True :> Nil
	      , _tid = 0
	      , _tdest = 0
	      , _tuser = False
	      , _tlast = isLast
	      }
	    axiChunks :: Vec 20 (Axi4StreamM2S AxiCfg64 Bool) = 
	      replicate d19 (axiChunk False) ++ axiChunk True :> Nil
	    outSignal = withClockResetEnable clk rst en $
	                  topEntityRunner roundTripTopEntity axiChunks
	    sampledOutput = withClockResetEnable clk rst en $
	                      sampleWithResetN d1 20 outSignal
	toList axiChunks `shouldBe` catMaybes sampledOutput

For some reason, though, this hangs during testing. There’s two notable things about this:

  • it doesn’t hang if the code isn’t in any sense recursive, ie if the data I’m feeding in doesn’t depend at all on the tready signals coming out.
  • it doesn’t hang if I use an even more trivial (combinatorial) topEntity function, below:
trivialTopEntity clk rst (m2sIn, s2mIn) =
  let s2mOut = s2mIn
      m2sOut = m2sIn
  in (s2mOut, m2sOut)

Any help gratefully appreciated.

A key rule of protocols in clash-protocols that use an axi-handshake like backpressure system is that the forward cannot depend on the backward. If you do this you can get combinational loops and thus hangs. See this explicit rule here:

For axi stream.

Edit: Quickly looking over the code, I think you can fix it by explicitly factoring out the output from the case inside axiStateMachine:

axiStateMachine
  :: (KnownNat n)
  => (Vec n (Axi4StreamM2S cfg Bool), Index (n + 1))
  -> Axi4StreamS2M
  -> ((Vec n (Axi4StreamM2S cfg Bool), Index (n + 1)), Maybe (Axi4StreamM2S cfg Bool))
axiStateMachine state@(queue, n) !s2mVal =
  let
    finished = n == maxBound
    outFwd = toMaybe (not finished) (queue !! n)
    nextState = case (finished, _tready s2mVal) of 
                         (False, True) -> (queue, n+1)
                         _ -> state
  in (nextState, outFwd)

Thanks for the response. Unfortunately I get the same result from your axiStateMachine as I do for my own (ie … not a good one). I think my original function was obeying the implicit contract already - it doesn’t wait for tready before trying to send a value, it just moves on to a new value if it sees that tready was true when that value was sent.

GHC cannot float a computation like this outside the case. So to eval snd out(aka forward) of axiStateMachine it needs to force the case statement which includes _tready s2mVal (aka backward). Which means it violates the contract.

Consider when _tready s2mVal is bottom. In the code I gave forward is still fully defined. In the case you have it will also be bottom.

I will take a deeper look what’s going on here later today.

I believe this is a bug in fromDfCircuit/toDfCircuit which is used by dfToDfConvInp/dfToDfConvOtp. If I replace them with simpler version inside clash-protocols itself and then it works. Except for the problem I noted I don’t see anything wrong with what you are doing.

1 Like

Thank you for looking into it. Is this a problem that might manifest in actual production code, or is it something that’s quite specific to this circumstance?

The example I have above, of using fromDf |> toDf, is what I am intending to use in production code (except more like, fromDf |> doSomeWorkHere |> toDf). Should I avoid this paradigm in my production code because it might be buggy, or should I just re-write my test code to avoid this narrow circumstance?

I think it will manifest itself in actual production code. Unfortunately I don’t have the bandwidth currently to actually go deeper what is wrong with the implementation. If you are up for it you could apply this patch to clash-protocols which replaces toDfCircuit/fromDfCircuit:

diff --git a/clash-protocols/src/Protocols/Axi4/Stream.hs b/clash-protocols/src/Protocols/Axi4/Stream.hs
index 2204121..a7d437b 100644
--- a/clash-protocols/src/Protocols/Axi4/Stream.hs
+++ b/clash-protocols/src/Protocols/Axi4/Stream.hs
@@ -12,11 +12,14 @@ Types and instance declarations for the AXI4-stream protocol.
 -}
 module Protocols.Axi4.Stream where
 
+import qualified Prelude as P
+
 -- base
 import Control.DeepSeq (NFData)
 import Data.Hashable (Hashable, hashWithSalt)
 import qualified Data.Maybe as Maybe
 import Data.Proxy
+import Data.Coerce (coerce)
 
 -- clash-prelude
 import Clash.Prelude hiding (concat, length, take)
@@ -125,19 +128,26 @@ instance
     FwdPayload (Axi4Stream dom conf userType) =
       Axi4StreamM2S conf userType
 
-  toDfCircuit proxy = DfConv.toDfCircuitHelper proxy s0 blankOtp stateFn
+  toDfCircuit _ = fromSignals go
    where
-    s0 = ()
-    blankOtp = Nothing
-    stateFn ack _ otpItem =
-      pure (otpItem, Nothing, Maybe.isJust otpItem C.&& _tready ack)
-
-  fromDfCircuit proxy = DfConv.fromDfCircuitHelper proxy s0 blankOtp stateFn
+    go (fwdIn, bwdIn) =
+      (
+        ( fmap coerce bwdIn
+        , pure (deepErrorX "Axi4Stream toDfCircuit: undefined")
+        )
+      , Df.dataToMaybe <$> P.fst fwdIn
+      )
+
+  fromDfCircuit _ = fromSignals go
    where
-    s0 = ()
-    blankOtp = Axi4StreamS2M{_tready = False}
-    stateFn m2s ack _ =
-      pure (Axi4StreamS2M{_tready = ack}, m2s, False)
+    go (fwdIn, bwdIn) =
+      ( coerce <$> P.fst bwdIn
+      ,
+        ( fmap Df.maybeToData fwdIn
+        , pure (deepErrorX "Axi4Stream fromDfCircuit: undefined")
+        )
+      )
+
 
 instance
   (KnownAxi4StreamConfig conf, NFDataX userType, KnownDomain dom) =>

Ok, thank you for your help. I’ve worked around it by writing specific fromAxi / toAxi functions and since this seems to work, I’m happy to continue like that.

Also take a look at the PacketStream protocol. Which is a more restricted/simpler version of Axi4Stream. It has a lot more combinators available. clash-protocols/clash-protocols/src/Protocols/PacketStream at main · clash-lang/clash-protocols · GitHub

Actually I was premature when I said this was working OK. I’ve made the changes you suggested to clash-protocols, but I’m still getting similar behaviour. On the plus side I think I’ve reduced the problem down to a very minimal example.

Here’s a circuit which works both in (Haskell-based) testing and simulation (with icarus) - the tests are just making sure that it passes through whatever data it’s given:

type AxiDataWidth = 64
type AxisConfig = 'Axi4StreamConfig (AxiDataWidth `Div` 8) 0 0

type AxiSTopEntity dom cfg =
  Clock dom 
  -> Reset dom 
  -> (Signal dom (Maybe (Axi4StreamM2S cfg Bool)), Signal dom Axi4StreamS2M) 
  -> (Signal dom Axi4StreamS2M, Signal dom (Maybe (Axi4StreamM2S cfg Bool)))

topEntity :: AxiSTopEntity System AxisConfig
topEntity clk rst = withClockResetEnable clk rst enableGen $ toSignals ethAxisCircuit

ethAxisCircuit 
  :: forall dom cfg. _
  => Circuit
      (Axi4Stream dom cfg Bool)
      (Axi4Stream dom cfg Bool)
ethAxisCircuit = 
  let
    proxy = Proxy @(Axi4Stream dom cfg Bool)
    fromDf = DfConv.dfToDfConvInp proxy
    toDf = DfConv.dfToDfConvOtp proxy
    asAxi = fromDf |> toDf 
  in
    asAxi

axiPassthrough
    :: forall dom cfg. _
    => Circuit
      (Df dom (Axi4StreamM2S cfg Bool))
      (Df dom (Axi4StreamM2S cfg Bool))
axiPassthrough = Df.compander () (\() raw -> ((), Just raw, True))

However, if I add a trivial piece of logic:

axiPassthrough
    :: forall dom cfg. _
    => Circuit
      (Df dom (Axi4StreamM2S cfg Bool))
      (Df dom (Axi4StreamM2S cfg Bool))
axiPassthrough = Df.compander () (\() raw -> ((), Just raw, True))

and make my ethAxisCircuit into fromDf |> axiPassthrough |> toDf, then the haskell-based tests will pass, but it fails in simulation, returning x on the same tick as valid data is passed in.

This was my own attempt to write a fromAxi/toAxi function - if you can suggest any other similar work-around then I’d be perfectly happy with that, I’m not too concerned about architectural consistency:

fromAxi 
  :: Circuit 
    (Axi4Stream dom cfg Bool) 
    (Df dom (Axi4StreamM2S cfg Bool))
fromAxi = 
  let
    go (Nothing, _) = (Axi4StreamS2M { _tready = False }, Df.NoData)
    go (Just axiM2S, Ack ack) = (Axi4StreamS2M { _tready = ack }, Df.Data axiM2S)
  in Circuit (unbundle . fmap go . bundle)

toAxi 
  :: Circuit 
    (Df dom (Axi4StreamM2S cfg Bool))
    (Axi4Stream dom cfg Bool) 
toAxi = 
  let
    go (Df.NoData, _) = (Ack False, Nothing)
    go (Df.Data m2s, s2m) = (Ack $ _tready s2m, Just m2s)
  in Circuit (unbundle . fmap go . bundle)

That’s odd. Could you perhaps upload to github or somewhere else a complete reproducer including the testbench with the behavior you described?

Sure, let me set one up now.

Ok - I’ve put a minimal example here: GitHub - harris-chris/clash-problem-example

There’s instructions in the Readme as to how to re-create the problem. I put in a nix environment with the necessary tooling (if you happen to use it)

I had to take some time to get the project working since ghdl does not work on aarch64 and I ran into some other flake issues. So in the end I copied the nix environment from clash-cores and changed to include everything in your flake.

Anyway the reason for the discrepancy is that your axiPassthrough is using Df.compander. Which includes a forceResetSanity. forceResetSanity ensure that when a circuit is in reset it is not ready and produces no values. In my new branch I created a variant that includes a version of compander without forceResetSanity which matches the output as if no axiPassthrough was included.

I tried to take a look why rx_tready becomes undefined with forceResetSanity.
To do this I simplified to a Df dom (), a protocol with only handshaking signals and no payload. And then just adding the other inputs/outputs to the verilog so cocotb is happy. There I see the same behavior. So I made a circuit diagram using digitaljs:

If you trace all the signals there doesn’t seem to be any way for rx_tready to become undefined unless one of the inputs is also undefined.

Honestly I have no experience with cocotb so I would start looking there.

Here for the changes I made:

See: Comparing harris-chris:main...rowanG077:main · harris-chris/clash-problem-example · GitHub

Thank you very much, this is really appreciated. I’m now playing around and the issue is, as you say, something to do with resets and undefined signals in cocotb. I’d been glossing over the question of reset behaviour, I need to think about what I’m expecting to happen and figure things out from there. Anyway - thank you for taking the time to provide such a detailed analysis.

Oke. good to hear. Also maybe it will help you:

We have a specific protocol for packetized streams that is very close to axi stream, packetstream. It has way more combinators implemented and in general is easier to work with IMO. There is also a full ethernet core almost ™ finished which can work upto UDP with some restrictions.