`consumed_by_tx_id` on `ma_tx_out` #1913

Saghen · 2024-11-29T21:04:27Z

As far as I'm aware, it's currently not possible to get all unspent multi asset transaction outputs for a specific multi asset without doing one of:

Scanning all rows of ma_tx_out for a given multi asset joined with tx_out filtered on consumed_by_tx_id
Scanning all consumed_by_tx_id IS NULL on tx_out joined with ma_tx_out filtered for a given multi asset
Creating a materialized view

The former ends up being far more performant, but still leads to slow queries for multi assets with many transaction outputs. Here's an example for the /assets/:asset/addresses query from Blockfrost:

SELECT
  txo.address AS "address",
  SUM(quantity) AS "quantity"
FROM ma_tx_out mto
  JOIN multi_asset ma ON (mto.ident = ma.id)
  JOIN tx_out txo ON (txo.id = mto.tx_out_id)
  JOIN tx ON (tx.id = txo.tx_id)
WHERE txo.consumed_by_tx_id IS NULL
  AND (encode(ma.policy, 'hex') || encode(ma.name, 'hex')) = $1
  -- don't count utxos that are part of transaction that failed script validation at stage 2
  AND tx.valid_contract = 'true'
GROUP BY txo.address
ORDER BY MIN(tx.id) ASC
LIMIT 100;

Would it make sense to include a consumed_by_tx_id field on ma_tx_out alongside the existing field on tx_out for this case?

The text was updated successfully, but these errors were encountered:

kderme · 2024-11-30T09:16:10Z

Would it help to have an optional tx_out.ma_tx_out field where all multiassets are stored? It could be in the form of json, to allow other more fine grained queries, using the jsonb postgres type?
I'm not quite sure how possible or easy to implement this could be.

Saghen · 2024-11-30T16:44:23Z

That would avoid a JOIN on the majority of our tx_out queries as well, since we almost always get the multi assets, so that sounds great!

rdlrt · 2024-12-07T01:21:21Z

Would it help to have an optional tx_out.ma_tx_out field where all multiassets are stored? It could be in the form of json, to allow other more fine grained queries, using the jsonb postgres type? I'm not quite sure how possible or easy to implement this could be.

As per my understanding, that would likely not work well for querying. ma_tx_out is a already over 120GB and only increasing exponentially.

Adding it to tx_out will negatively impact tx_out performance itself (essentially drastically increase tuning params and in turn resources, especially IOPs).

Not to mention - the indexing you'd need to filter tx_out by assets in this case would add tremendous cost to query planner.

A better alternative would be adding similar consumed (or spent) field to ma_tx_out, as part of using tx_out's consumed_by_tx_id config preset. Having said that, looking at few years down the line - projects should really start splitting out a full-history instance [useful for explorers and projects that show history tx details including input addresses] from utxo-only instances , which are far more performant for such querying. One can use bootstrap to initially sync dbsync w/o tx_out|ma_tx_out and then once synched switch to prune utxos. A combination of the above would really help out users query consumption.

CC: @sgillespie - since you're working on this, to consider feasibility of spent field instead of above approach.

kderme · 2024-12-09T17:30:54Z

Adding a new updateable field in ma_tx_out can be very slow, since this table is huge and updating is already quite slow for tx_out.

DBSync schema is quite normalised and requires joins. Denormalising it by merging tables together can help for specific queries which fetch all the MA from a tx or output. It's true that queries based on MA policy won't work that well or won't work at all (we can try with jsonb or postgres arrays but it will be worse than the current solution). So it really depends on what are the frequent queries. If users have both queries based on policy and based on tx/output I guess the current schema would be ideal, or even allow both a MA table and a MA field in tx_out at the cost of more disk.

One can use bootstrap to initially sync dbsync w/o tx_out|ma_tx_out and then once synched switch to prune utxos.

We want to use the bootstrap idea more, even for live stake, live voting power etc. The consumed option is a compromise: users still get the full history, but it's also relatively fast to query the current state.

By the way with the opportunity of this issue and other older ones, I've considered the possibility of porting db-sync to Mongodb, which provides better support for denormalised data.

rdlrt · 2024-12-18T07:30:48Z

So it really depends on what are the frequent queries. If users have both queries based on policy and based on tx/output I guess the current schema would be ideal, or even allow both a MA table and a MA field in tx_out at the cost of more disk.

From koios queries (ordered based on past 20 days):
policy => addresses (alongwith their balance)
asset => utxos
asset => addresses (alongwith their balance)
asset => info (current balance and meta info)
asset => txs

The change above will likely not make sense for many of these, especially if ma_tx_out stays and we're 'adding' more data to tx_out. The current layout atleast keeps tx_out away from vast spam of ma_tx_out entries - hence, I mentioned it's a bad idea to change that (or add an option over complicating further - we hardly have enough test data coverage across various options).

We want to use the bootstrap idea more, even for live stake, live voting power etc. The consumed option is a compromise: users still get the full history, but it's also relatively fast to query the current state.

Agreed - (sorry my next statement will be out-of-topic on this issue) I'd say that's a preset that makes most sense (part missing is starting dbsync with bootstrap option, once on tip automatically stop taking ledger snapshots and start pruning). It is potentially also useful to allow pruning frequency configurable as deletes can take a while, some may prefer more frequent deletes, while others may prefer waiting for it once per epoch.

sgillespie self-assigned this Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`consumed_by_tx_id` on `ma_tx_out` #1913

`consumed_by_tx_id` on `ma_tx_out` #1913

Saghen commented Nov 29, 2024 •

edited

Loading

kderme commented Nov 30, 2024

Saghen commented Nov 30, 2024 •

edited

Loading

rdlrt commented Dec 7, 2024 •

edited

Loading

kderme commented Dec 9, 2024 •

edited

Loading

rdlrt commented Dec 18, 2024

consumed_by_tx_id on ma_tx_out #1913

consumed_by_tx_id on ma_tx_out #1913

Comments

Saghen commented Nov 29, 2024 • edited Loading

kderme commented Nov 30, 2024

Saghen commented Nov 30, 2024 • edited Loading

rdlrt commented Dec 7, 2024 • edited Loading

kderme commented Dec 9, 2024 • edited Loading

rdlrt commented Dec 18, 2024

`consumed_by_tx_id` on `ma_tx_out` #1913

`consumed_by_tx_id` on `ma_tx_out` #1913

Saghen commented Nov 29, 2024 •

edited

Loading

Saghen commented Nov 30, 2024 •

edited

Loading

rdlrt commented Dec 7, 2024 •

edited

Loading

kderme commented Dec 9, 2024 •

edited

Loading