Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consumed_by_tx_id on ma_tx_out #1913

Open
Saghen opened this issue Nov 29, 2024 · 5 comments
Open

consumed_by_tx_id on ma_tx_out #1913

Saghen opened this issue Nov 29, 2024 · 5 comments
Assignees

Comments

@Saghen
Copy link

Saghen commented Nov 29, 2024

As far as I'm aware, it's currently not possible to get all unspent multi asset transaction outputs for a specific multi asset without doing one of:

  • Scanning all rows of ma_tx_out for a given multi asset joined with tx_out filtered on consumed_by_tx_id
  • Scanning all consumed_by_tx_id IS NULL on tx_out joined with ma_tx_out filtered for a given multi asset
  • Creating a materialized view

The former ends up being far more performant, but still leads to slow queries for multi assets with many transaction outputs. Here's an example for the /assets/:asset/addresses query from Blockfrost:

SELECT
  txo.address AS "address",
  SUM(quantity) AS "quantity"
FROM ma_tx_out mto
  JOIN multi_asset ma ON (mto.ident = ma.id)
  JOIN tx_out txo ON (txo.id = mto.tx_out_id)
  JOIN tx ON (tx.id = txo.tx_id)
WHERE txo.consumed_by_tx_id IS NULL
  AND (encode(ma.policy, 'hex') || encode(ma.name, 'hex')) = $1
  -- don't count utxos that are part of transaction that failed script validation at stage 2
  AND tx.valid_contract = 'true'
GROUP BY txo.address
ORDER BY MIN(tx.id) ASC
LIMIT 100;

Would it make sense to include a consumed_by_tx_id field on ma_tx_out alongside the existing field on tx_out for this case?

@kderme
Copy link
Contributor

kderme commented Nov 30, 2024

Would it help to have an optional tx_out.ma_tx_out field where all multiassets are stored? It could be in the form of json, to allow other more fine grained queries, using the jsonb postgres type?
I'm not quite sure how possible or easy to implement this could be.

@Saghen
Copy link
Author

Saghen commented Nov 30, 2024

That would avoid a JOIN on the majority of our tx_out queries as well, since we almost always get the multi assets, so that sounds great!

@sgillespie sgillespie self-assigned this Dec 3, 2024
@rdlrt
Copy link

rdlrt commented Dec 7, 2024

Would it help to have an optional tx_out.ma_tx_out field where all multiassets are stored? It could be in the form of json, to allow other more fine grained queries, using the jsonb postgres type? I'm not quite sure how possible or easy to implement this could be.

As per my understanding, that would likely not work well for querying. ma_tx_out is a already over 120GB and only increasing exponentially.

Adding it to tx_out will negatively impact tx_out performance itself (essentially drastically increase tuning params and in turn resources, especially IOPs).

Not to mention - the indexing you'd need to filter tx_out by assets in this case would add tremendous cost to query planner.

A better alternative would be adding similar consumed (or spent) field to ma_tx_out, as part of using tx_out's consumed_by_tx_id config preset. Having said that, looking at few years down the line - projects should really start splitting out a full-history instance [useful for explorers and projects that show history tx details including input addresses] from utxo-only instances , which are far more performant for such querying. One can use bootstrap to initially sync dbsync w/o tx_out|ma_tx_out and then once synched switch to prune utxos. A combination of the above would really help out users query consumption.

CC: @sgillespie - since you're working on this, to consider feasibility of spent field instead of above approach.

@kderme
Copy link
Contributor

kderme commented Dec 9, 2024

Adding a new updateable field in ma_tx_out can be very slow, since this table is huge and updating is already quite slow for tx_out.

DBSync schema is quite normalised and requires joins. Denormalising it by merging tables together can help for specific queries which fetch all the MA from a tx or output. It's true that queries based on MA policy won't work that well or won't work at all (we can try with jsonb or postgres arrays but it will be worse than the current solution). So it really depends on what are the frequent queries. If users have both queries based on policy and based on tx/output I guess the current schema would be ideal, or even allow both a MA table and a MA field in tx_out at the cost of more disk.

One can use bootstrap to initially sync dbsync w/o tx_out|ma_tx_out and then once synched switch to prune utxos.

We want to use the bootstrap idea more, even for live stake, live voting power etc. The consumed option is a compromise: users still get the full history, but it's also relatively fast to query the current state.

By the way with the opportunity of this issue and other older ones, I've considered the possibility of porting db-sync to Mongodb, which provides better support for denormalised data.

@rdlrt
Copy link

rdlrt commented Dec 18, 2024

So it really depends on what are the frequent queries. If users have both queries based on policy and based on tx/output I guess the current schema would be ideal, or even allow both a MA table and a MA field in tx_out at the cost of more disk.

From koios queries (ordered based on past 20 days):
policy => addresses (alongwith their balance)
asset => utxos
asset => addresses (alongwith their balance)
asset => info (current balance and meta info)
asset => txs

The change above will likely not make sense for many of these, especially if ma_tx_out stays and we're 'adding' more data to tx_out. The current layout atleast keeps tx_out away from vast spam of ma_tx_out entries - hence, I mentioned it's a bad idea to change that (or add an option over complicating further - we hardly have enough test data coverage across various options).

We want to use the bootstrap idea more, even for live stake, live voting power etc. The consumed option is a compromise: users still get the full history, but it's also relatively fast to query the current state.

Agreed - (sorry my next statement will be out-of-topic on this issue) I'd say that's a preset that makes most sense (part missing is starting dbsync with bootstrap option, once on tip automatically stop taking ledger snapshots and start pruning). It is potentially also useful to allow pruning frequency configurable as deletes can take a while, some may prefer more frequent deletes, while others may prefer waiting for it once per epoch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants