I recently started using Chisel, a hardware construction language from UC Berkeley implemented as a Scala DSL. Although it still has some rough edges, it's definitely usable and I really like how it can map the same description into Verilog or a cycle-accurate C++ simulation.
I'm currently working on making hardware accelerators for irregular applications (like graph traversal and sparse matrix operations - oh wait, we can do one with the other). Although simulation goes a long way for testing and evaluation, I prefer FPGA implementations (even though it takes so much more time and effort, the end product actually is a computer). I found that Chisel also helps out quite a bit with productivity there. Along the way, I made a little something that someone else might find useful: a collection of AXI4 interface definitions and simple peripherals in Chisel. Here's the GitHub repo:
I started working on this because I wanted more hands-on experience with Chisel and a faster way of making my hardware prototypes work with the AXI interfaces found on e.g. Xilinx FPGAs and programmable SoCs. There are 50+ signals/ports on a full-blown AXI interface, which can be daunting. However, these can be organized into five channels (read address, read data, write address, write data and write response), each based on the decoupled (ready/valid) abstraction, which fits nicely with Chisel's Decoupled-style interfaces and custom types.
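To give a rough idea of what this grouping looks like, here is a minimal sketch of an AXI4-Lite master interface built from Decoupled channels. It is not the code from the repository: the class and field names are made up for illustration, and it assumes a recent Chisel 3 release (chisel3 package syntax, which differs from older Chisel versions).

```scala
import chisel3._
import chisel3.util._

// Hypothetical names throughout; a sketch, not the repository's definitions.

// Payload of an address channel (same shape for read and write in AXI4-Lite).
class AxiLiteAddr(addrBits: Int) extends Bundle {
  val addr = UInt(addrBits.W)
  val prot = UInt(3.W) // protection type
}

// Payload of the write data channel.
class AxiLiteWriteData(dataBits: Int) extends Bundle {
  val data = UInt(dataBits.W)
  val strb = UInt((dataBits / 8).W) // one strobe bit per byte lane
}

// Payload of the read data channel.
class AxiLiteReadData(dataBits: Int) extends Bundle {
  val data = UInt(dataBits.W)
  val resp = UInt(2.W) // read response code
}

// Five channels seen from the master side. Decoupled adds the ready/valid
// pair to each payload; Flipped marks channels driven by the slave.
class AxiLiteMasterIF(addrBits: Int, dataBits: Int) extends Bundle {
  val writeAddr = Decoupled(new AxiLiteAddr(addrBits))
  val writeData = Decoupled(new AxiLiteWriteData(dataBits))
  val writeResp = Flipped(Decoupled(UInt(2.W))) // write response code
  val readAddr  = Decoupled(new AxiLiteAddr(addrBits))
  val readData  = Flipped(Decoupled(new AxiLiteReadData(dataBits)))
}
```

A slave-side port can then be declared as Flipped(new AxiLiteMasterIF(32, 32)), so both ends share a single definition. A full AXI4 interface adds burst-related fields (length, size, burst type, IDs, last) to the same five-channel skeleton.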
It is worth mentioning that the code here targets the Verilog backend and hardware synthesis only, since there are no testbenches. The generated Verilog should be straightforward to use with the Xilinx IP packager. The peripherals aren't extensively tested, but they performed as expected on a ZedBoard (pushed through Vivado for synthesis).
Right now the repository is a haphazard collection of Chisel source files, with varying amounts of comments in each:
SimpleReg - a translation of the AXI Lite slave template (register file) generated by Vivado
SumAccel - read 3 consecutive words from address 0x10000000 using AXI Lite master and sum them (the result can be read through the AXI Lite slave interface)
HPSumAccel - read a specified number of words from a specified address in large bursts and sum them. Note that the number of words should be a multiple of the burst length (set to 512 32-bit words by default), since this doesn't handle transfers that don't divide evenly into whole bursts. The result and total elapsed cycles can be read through the slave interface.
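The burst-length restriction on HPSumAccel is easy to check on the host side before starting a transfer. Here is a small plain-Scala sketch of that arithmetic; BurstPlan and burstAddresses are hypothetical helpers (not part of the repository), assuming the default of 512 32-bit words per burst mentioned above.

```scala
// Hypothetical host-side helper; not part of the repository.
object BurstPlan {
  val DefaultBurstWords = 512 // default burst length from the text, in 32-bit words
  val BytesPerWord = 4

  // Returns the start address of every burst, or None when wordCount is not
  // a positive multiple of burstWords (the case the accelerator cannot handle).
  def burstAddresses(base: Long, wordCount: Int,
                     burstWords: Int = DefaultBurstWords): Option[Seq[Long]] =
    if (wordCount <= 0 || burstWords <= 0 || wordCount % burstWords != 0) None
    else {
      val burstBytes = burstWords.toLong * BytesPerWord
      Some((0 until wordCount / burstWords).map(i => base + i * burstBytes))
    }
}
```

For example, BurstPlan.burstAddresses(0x10000000L, 1024) yields two bursts starting at 0x10000000 and 0x10000800, while a request for 1000 words returns None. In that case the caller would have to pad the buffer or sum the leftover words separately.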
I realize I could have used DMA IP cores to do most of this, but keeping the interfacing in Chisel has some benefits like tighter integration with the peripherals. Plus it's been a good learning experience :)