Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ToDo list or list of differences? #1

Open
iDoka opened this issue Mar 23, 2022 · 11 comments
Open

ToDo list or list of differences? #1

iDoka opened this issue Mar 23, 2022 · 11 comments

Comments

@iDoka
Copy link

iDoka commented Mar 23, 2022

Hi Lawrie and thanks for that great work!

As you mentioned:

This implementation has been done from the specification, without access to any Raspberry Pi HDL. It is currently incomplete, but some programs run in simulation and on open source FPGA boards.
The current method of configuring and controlling PIO from a top-level module is different from that used on the RP2040 chip, and will probably be changed for closer compatibility.

As you mentioned in README, your implementation slight incomplete and have differences than PIO in RP2040 chip.

Would be great to known todo bullets and/or details of differences between two implementations to better understanding how to achive almost 100% compatibility with PIO in RP2040 chip. 🙏🏻

Have a nice hacking!

@lawrie
Copy link
Owner

lawrie commented Jul 11, 2022

Sorry about the late reply. It is quite a while since I worked on this, so I don't remember very accurately.

  • I think there is an error in autopull/autopush where an extra cycle is being used that shouldn't be
  • Merging the results of multiple state machines to produce the output pin state is not implemented
  • Combining Fifos together to produce a deeper Fifo is not implemented

@iDoka
Copy link
Author

iDoka commented Jul 11, 2022

Lawrie, thanks for your reply!
Would be great to add this limitations in README.

@xobs
Copy link
Contributor

xobs commented Dec 27, 2022

Note that the second item in your bullet points -- merging the results -- is handled by the patchset in xobs@6170f34

@xobs
Copy link
Contributor

xobs commented Dec 29, 2022

Also note that interrupts do not appeared to be wired up -- irq0 and irq1 are not attached to anything.

@lawrie
Copy link
Owner

lawrie commented Dec 29, 2022

Sean, many thanks for making the improvements - I have merged the pull request. It is exciting that this in going to be included in the MPW-8 shuttle run.

@bunnie
Copy link
Contributor

bunnie commented Apr 30, 2023

Do you happen to have any more info on point 1?

I think there is an error in autopull/autopush where an extra cycle is being used that shouldn't be

Any hints on how I could try to produce/test for this particular condition, and what sorts of things it would impact?

@lawrie
Copy link
Owner

lawrie commented Apr 30, 2023

Bunnie, It is some time since I worked on this, so I cannot remember the full details. I remember when I ran the simulations or looked at output with a logic analyser or scope, in some cases there was an extra clock cycle that shouldn't have been there, so a transition took an extra clock cycle. I seem to remember that it occurred with the spi example, but it probably happens on anything that uses autopull or autopush.

@bunnie
Copy link
Contributor

bunnie commented Apr 30, 2023

Thanks! That's very helpful!

@bunnie
Copy link
Contributor

bunnie commented May 3, 2023

I think I found what you''re talking about:

image

Does this extended clock cycle at the end of the SPI transaction jog your memory? Mostly just want to make sure I am covering the base that you described in the to-do list. If this doesn't seem like the issue you were worried about, I will keep searching for it.

Regardless, the fix to this is basically to "look-ahead" during an auto-pull. I think the problem is right now that when you do an auto-pull from a FIFO, it fetches the value into the output shift register, and then it pushes it to the pins, so the instruction ends up taking an extra cycle longer to execute than it should. I added a set of bypasses that, in the case the OSR is being loaded, it "reaches around" to the input value and presents it to the output (it also pre-shifts the value as well). Basically I hoist the shift computation an cycle earlier. This is going to be pretty terrible for the critical path (effectively doubles it) but also I think the block doesn't need to close at such tight timing levels for my application, so this is fine.

The actual fix is a bit complicated, and I've had to layer a lot of changes on your API anyways to integrate the block for testing (I implemented the full RPi-compatible register set so I can run their test programs directly on my RV32 core), and I also implemented some of the other to-do's that were missing on your list. However, this is the specific commit that seems to fix it:

buncram@ebc18c5

I'm not quite sure how to merge these larger changes back into this repo, because the client I'm doing this for is requiring me to wire this into an APB bus (but a least it's all permissively open source licensed), so the top level pio.v file has forked into this systemverilog monstrosity:

https://github.com/buncram/cram-soc/blob/cram/deps/pio/pio_ahb.sv

Once I get everything working, I'll try to clean it up and turn it into a module that can be cleanly integrated as a LiteX peripheral with some rube-goldberg AXI-to-APB bridge in between, but I have a lot more test vectors to write and run...

@lawrie
Copy link
Owner

lawrie commented May 3, 2023

Again, I can't remember the full details, but what you have identified in the SPI example looks like what I remember seeing. I think I did look briefly at a solution to the autopull/push issue and it did not look simple. Your solution sounds convincing.

If I had continued working on the project I would have got it working with an RV32 core. I mainly use picorv32 or VexRiscv. There are a variety of buses that could be used for such integration.

I am very pleased to see all these fixes and improvement done to my code. I merged your pull request.

I did look at how FIFO chaining might be done and it didn't look too hard, but I did not get round to doing it.

Sean Cross will probably be interested in your fixes and improvements as he incorporated the code in the Google Skywater MPW-8 shuttle.

I was originally going to use Amaranth HDL (nmigen) rather than Verilog, but I decided to try a Verilog version first. I might still be interested in doing an amaranth version sometime.

@bunnie
Copy link
Contributor

bunnie commented May 5, 2023

Just as a side note for anyone reading along, I did find an interesting bit of documentation on pages 336/337 of the RP2040 manual, that solves the autpull extra cycle mystery. Instead of hoisting the shift in the OSR and pre-computing everything (which creates a horrific critical path), it turns out they do a look-ahead computation of the count, and if it looks like on the next iteration they need to do a pull they trigger the pull simultaneous with the OSR update from the OUT instruction. Basically a small pipelining trick.

Unfortunately adding the look-ahead count output means I had to change the module interfaces, which means a PR will not apply cleanly into the repo as-is, so for now I'm working along a fork of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants