How to dynamically shift a BitVector?

I am trying to implement a simple processor that should support shifts. I did not find anything helpful in the documentation of BitVector or Bits, so I checked how other people do this (e.g., here or here). The trick seems to be to unpack the BitVector into an Int and then use that Int as the shift amount.

I wrote this Clash code:

-- b is zero-extended to BitVector 64 and unpacked to an Int shift amount
shifter :: BitVector 16 -> BitVector 16 -> BitVector 16
shifter a b = shiftL a $ unpack $ extend b

which gives me the following Verilog code:

module shifter
    ( // Inputs
      input wire [15:0] a
    , input wire [15:0] b

      // Outputs
    , output wire [15:0] result
    );
  wire [63:0] eta;
  wire signed [63:0] ds;
  wire signed [63:0] c$ds_app_arg;

  assign eta = {48'b000000000000000000000000000000000000000000000000,b};

  assign ds = $signed(c$ds_app_arg);

  assign c$ds_app_arg = $unsigned(eta[0+:64]);

  assign result = a << (ds);
endmodule

This seems very wasteful (i.e., it uses a 64-bit intermediate value eta).

I made the following black box to make things less wasteful:

{-# ANN myShiftL (InlineYamlPrimitive [Verilog] [__i|
  BlackBox:
    name: myShiftL
    kind: Expression
    type: 'myShiftL :: BitVector 16 -> BitVector 16 -> BitVector 16'
    template: ~ARG[0] << ~ARG[1]|]) #-}
{-# OPAQUE myShiftL #-}

This generates the expected code (note that myShiftL is wrapped in a function t, an arbitrary name, because otherwise the black box was not used and the Clash definition was synthesized instead):

module t
    ( // Inputs
      input wire [15:0] c$arg
    , input wire [15:0] c$arg_0

      // Outputs
    , output wire [15:0] result
    );
  assign result = c$arg << c$arg_0;
endmodule

I am not the first person who wants to do dynamic shifting, and I do not think it is that uncommon. Yet I haven’t seen anyone use a black box like the one above. Am I missing some pitfall here?
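One pitfall worth checking with any black box: Clash simulates the Haskell definition, while the generated HDL uses the template, so the two have to agree, including for shift amounts of 16 or more. The plain-GHC sketch below uses Word16 as a stand-in for BitVector 16; verilogShl16 and myShiftLModel are hypothetical names, and the Haskell body of myShiftL is an assumption because the post does not show it.

```haskell
import Data.Bits (shiftL)
import Data.Word (Word16)

-- Hypothetical model of Verilog's `a << b` for two 16-bit unsigned operands
-- in a 16-bit context: vacated bits are filled with zeros, so any shift
-- amount of 16 or more yields 0.
verilogShl16 :: Word16 -> Word16 -> Word16
verilogShl16 a b
  | b >= 16   = 0
  | otherwise = fromInteger ((toInteger a * 2 ^ b) `mod` (2 ^ (16 :: Int)))

-- Hypothetical Haskell body for the black box (not shown in the post);
-- this is the kind of definition Clash would use in simulation.
myShiftLModel :: Word16 -> Word16 -> Word16
myShiftLModel a b = a `shiftL` fromIntegral b

-- Compare both for every possible shift amount and a few bit patterns.
agreesWithVerilog :: Bool
agreesWithVerilog =
  and [ myShiftLModel a b == verilogShl16 a b
      | a <- [0x0001, 0x8001, 0xA5A5, 0xFFFF]
      , b <- [minBound .. maxBound] ]
```

With this candidate definition both sides give 0 for amounts of 16 or more, so simulation and the template would agree.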

I do not have Vivado available to check the test bench I wrote, and Verilator failed to build a binary from the generated test bench .v file. I will try to get Vivado up and running before fiddling with Verilator.

Please let me know whether my solution is fine, or help me come up with another one.

The reason 64 bits are used for the shift amount is that Clash uses the standard Haskell functions from the Bits typeclass, which defines the shift amount as an Int:
shiftL :: Bits a => a -> Int -> a
And in Clash an Int is (by default) a 64-bit signed value.
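A plain-GHC model of the same computation may make this concrete. It uses Word16 as a stand-in for BitVector 16 (an assumption for illustration, not Clash code); shifterModel and intWidth are hypothetical names. Because the 16-bit amount is zero-extended, the Int passed to shiftL is never negative, and any amount of 16 or more shifts every bit out.

```haskell
import Data.Bits (finiteBitSize, shiftL)
import Data.Word (Word16)

-- Word16 stands in for Clash's BitVector 16; this is a plain-GHC model.
type BV16 = Word16

-- Mirrors `shiftL a $ unpack $ extend b`: the 16-bit shift amount is
-- zero-extended to an Int before being handed to shiftL.
shifterModel :: BV16 -> BV16 -> BV16
shifterModel a b = a `shiftL` fromIntegral b

-- Width of the Int that carries the shift amount (64 on a typical
-- 64-bit GHC installation).
intWidth :: Int
intWidth = finiteBitSize (0 :: Int)
```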

Looks like your myShiftL would work.
But if you look carefully, you may notice it is also “wasteful”.
To represent all meaningful shift amounts of a 16-bit input you only need 4 bits, so you could reduce it to:
minimalShiftL :: BitVector 16 -> BitVector 4 -> BitVector 16
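The number of bits needed is the ceiling of log2 of the data width: amounts 0 to 15 fit in 4 bits, and for a 64-bit datapath amounts 0 to 63 fit in 6 bits. A small sketch of that arithmetic in plain Haskell (shiftAmountBits is a hypothetical helper); in Clash the same width could presumably be written at the type level, e.g. as BitVector (CLog 2 16).

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize)

-- Number of bits needed to encode every meaningful shift amount
-- 0 .. (width - 1) for a value of the given width.
shiftAmountBits :: Int -> Int
shiftAmountBits width
  | width < 1 = error "shiftAmountBits: width must be at least 1"
  | otherwise = finiteBitSize width - countLeadingZeros (width - 1)
```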

But these things are likely not as “wasteful” as you might think.
Or at least the waste might not be where you think it is.

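In general, synthesis tools are very good at optimizing this kind of code.
It is very likely that all three options, the built-in shiftL, your myShiftL and the hypothetical minimalShiftL, result in the same or equivalent hardware when synthesized for an FPGA, since the constant zero upper bits of eta are trivially optimized away (up to the handling of shift amounts of 16 or more, which minimalShiftL cannot express).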

The “waste” is some extra (but likely small) amount of time the synthesis tool has to spend on this optimization.
And possibly the extra time a human needs to understand the generated code (but hopefully nobody has to).

I know that my version uses too many bits as well, but that is what the ISA provides. I will probably just extract the lowest four bits and feed only those into my black box version.
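Extracting only the lowest four bits changes the behaviour for shift amounts of 16 or more: the amount is effectively taken modulo 16, so a shift by 16 becomes a shift by 0 instead of producing 0. Whether that is acceptable depends on what the ISA specifies (RISC-V, for example, does use only the low bits of the shift register). A plain-GHC sketch comparing the options, with Word16 standing in for BitVector 16; all three function names are hypothetical.

```haskell
import Data.Bits (shiftL, (.&.))
import Data.Word (Word16)

-- Full semantics, as in the original shifter: amounts >= 16 give 0.
fullShiftL :: Word16 -> Word16 -> Word16
fullShiftL a b = a `shiftL` fromIntegral b

-- Only the lowest four bits of the amount are used, so the amount is
-- effectively taken modulo 16 (a shift by 16 becomes a shift by 0).
lowNibbleShiftL :: Word16 -> Word16 -> Word16
lowNibbleShiftL a b = a `shiftL` fromIntegral (b .&. 0xF)

-- Keeps the full semantics with a 4-bit shift amount: if any bit above
-- the lowest four is set, the result is forced to 0.
saturatingShiftL :: Word16 -> Word16 -> Word16
saturatingShiftL a b
  | b .&. 0xFFF0 /= 0 = 0
  | otherwise         = a `shiftL` fromIntegral (b .&. 0xF)
```

The saturating variant costs one extra OR over the upper bits of the amount but matches the original shifter exactly.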

To find out whether there really is no difference in the synthesized result, I’ll make the shift function a parameter and then compare the variants in LUTs or area.
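A sketch of that idea in plain GHC Haskell, with Word16 standing in for BitVector 16: the shift is passed in as a function, and before spending time on synthesis runs it can be worth checking that the variants agree functionally. ShiftImpl, Op, alu, libraryShift and barrelShift are hypothetical names; barrelShift shows one common hardware structure for a 16-bit shifter (four conditional stages plus a zero check), not necessarily what a given tool infers. In Clash, the equivalent would presumably be an extra function argument with one top entity per variant.

```haskell
import Data.Bits (shiftL, testBit, (.&.))
import Data.Word (Word16)

-- A shift implementation: value -> shift amount -> result.
type ShiftImpl = Word16 -> Word16 -> Word16

-- Reference variant: the library shift (amounts >= 16 give 0).
libraryShift :: ShiftImpl
libraryShift a b = a `shiftL` fromIntegral b

-- Hand-written barrel shifter: four conditional stages (by 1, 2, 4, 8)
-- driven by the low four bits, plus a zero check on the upper bits.
barrelShift :: ShiftImpl
barrelShift a b
  | b .&. 0xFFF0 /= 0 = 0
  | otherwise         = foldl stage a [0 .. 3]
  where
    stage acc i = if testBit b i then acc `shiftL` (2 ^ i) else acc

-- Hypothetical datapath slice parameterized by the shift implementation.
data Op = Add | Shl deriving (Eq, Show)

alu :: ShiftImpl -> Op -> Word16 -> Word16 -> Word16
alu _     Add a b = a + b
alu shift Shl a b = shift a b

-- Check the variants agree on every shift amount before comparing LUTs.
variantsAgree :: Bool
variantsAgree =
  and [ alu libraryShift Shl a b == alu barrelShift Shl a b
      | a <- [0x0001, 0x8001, 0xA5A5, 0xFFFF]
      , b <- [minBound .. maxBound] ]
```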

Please report back with your findings.

We decided to support 64-bit numbers as well, so the issue is gone, since I need that width anyway.

I do have another project in mind where this could actually make a difference. I will report back on that, but it will take some time.


Please note that Haskell’s Bits class specifies that shiftL and shiftR take an Int (a 64-bit signed number in Clash), yet requires that the shift amount be non-negative.

While that is weird, it poses no problem: we have dedicated shiftL and shiftR instructions and specified that negative shift amounts result in an error.
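A minimal plain-GHC sketch of that convention, assuming a 64-bit datapath and that the register holding the shift amount is read as a signed value. isaShiftL is a hypothetical name, and returning 0 for amounts of 64 or more is an assumption, not something stated in the thread.

```haskell
import Data.Bits (shiftL)
import Data.Int (Int64)
import Data.Word (Word64)

-- Hypothetical ISA-level shift: a negative amount is reported as an error
-- instead of being passed on to shiftL (whose amount must be non-negative).
-- Amounts of 64 or more are assumed to shift every bit out.
isaShiftL :: Word64 -> Int64 -> Either String Word64
isaShiftL a n
  | n < 0     = Left ("negative shift amount: " ++ show n)
  | n >= 64   = Right 0
  | otherwise = Right (a `shiftL` fromIntegral n)
```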