Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Creating a unified mechanism to generate blocklists for ckBTC and ckETH #3401

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

THLO
Copy link
Contributor

@THLO THLO commented Jan 10, 2025

This PR adds a Python script to generate blocklists for ckBTC and ckETH based on the OFAC SDN list.

There are some differences in the produced blocklist files compared to the current files:

Differences to the current ckBTC blocklist file:

  1. The new blocklist file has a comment at the beginning about the script that was used to generate it.
  2. The new blocklist file has a more concise error message for the invalid Bitcoin address in the OFAC SDN list.

Differences to the current ckETH blocklist file:

  1. The new blocklist file has a comment at the beginning about the script that was used to generate it, replacing the old comment.
  2. Since Ethereum addresses are case-insensitive, all addresses are now consistently in lower case.
  3. The parameter from_address in the function is_blocked has been changed to address to make it consistent with the corresponding function for ckBTC.

Corresponding ticket: https://dfinity.atlassian.net/browse/XC-250

@THLO THLO requested a review from a team as a code owner January 10, 2025 14:22
@github-actions github-actions bot added the chore label Jan 10, 2025
Copy link
Member

@gregorydemay gregorydemay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @THLO for making this script! Some minor comments and python newbie questions.

rs/cross-chain/scripts/generate_blocklist.py Outdated Show resolved Hide resolved
#
# 2) Run this script as follows:
#
# python generate_blocklist.py [currency (BTC or ETH)] [path to the SDN.XML file]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we specify the output file as a 3rd argument instead of using the constant BLOCKLIST_FILENAME?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, just write output to stdout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can introduce a third argument.
Note that the script writes stuff to stdout: The written file is supposed to be a valid Rust source file whereas the extracted addresses are written to stdout, which could be used for other purposes including sanity checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the argument.

Comment on lines 66 to 71
blocklist_file.write('#[cfg(test)]\nmod tests;\n\nuse bitcoin::Address;\n\n')
blocklist_file.write('/// The script to generate this file, including information about the source data, can be found here:\n')
blocklist_file.write('/// /rs/cross-chain/scripts/generate_blocklist.py\n\n')
blocklist_file.write('/// BTC is not accepted from nor sent to addresses on this list.\n')
blocklist_file.write('/// NOTE: Keep it sorted!\n')
blocklist_file.write('pub const BTC_ADDRESS_BLOCKLIST: &[&str] = &[\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a multiline string concept in Python, similar to what Rust has? This could simplify readability. Similar comment for the other preamble/postamble.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Python uses """ for multi-line strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll try to improve readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def store_blocklist(currency, addresses):
blocklist_file = open(BLOCKLIST_FILENAME, 'w')

if currency == 'BTC':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe that's overkill for now but it might make sense to have a writer per currency, so that we don't have to intertwine if/else statements for preamble/body/postamble

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that it would probably be overkill. I'll see what I can do (without spending too much effort).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't take all that much effort. Please check out the new version!

if currency == 'ETH':
address = address.lower()
if address in INVALID_ADDRESSES:
blocklist_file.write(' // ' + address_prefix + '"' + address[offset:] + '"' + address_suffix + ' (Invalid address prefix)\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I would handle invalid addresses in that script:

  1. Separation of concerns: the purpose of that script is to parse the SDN into a list of addresses that can be understood by our Rust code. Our Rust unit tests should catch any invalid address.
  2. The stated reason there (Invalid address prefix) cannot work for all invalid addresses the script tries cannot really determine the reason why an address is invalid (it could try to parse the addresses but I'm not sure it's worth it, in particular because the parsing will be done by another library that we use in our canisters.)

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added it for the sake of convenience. Otherwise, we always run into the issue that the Rust unit tests first need to catch the invalid address(es), which requires an additional update step.
I expect/hope that the guys compiling the list won't introduce many more invalid addresses.

But your comment regarding separation of concerns is valid, of course. If you feel that this concern trumps convenience, then I can remove the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove the check for the sake of simplicity.

for address in addresses:
# Ethereum addresses are case-insensitive. By contrast, only Bech32 Bitcoin addresses are case-insensitive.
if currency == 'ETH':
address = address.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave the address as is: while the address is case-insensitive, the case sensitivity is used as an optional checksum (see EIP-55). Also I have the impression that the generated list may no longer be sorted and this also produces a large diff with the existing blocklist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @gregorydemay that we shouldn't do the case conversion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll remove the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

addresses = []

# Generate the currency-specific suffix.
currency_suffix = 'XBT' if currency == 'BTC' else 'ETH'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumb question: why is XBT used for BTC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"XBT" is the Bitcoin ticker that complies with the ISO 4217 standard. According to this standard, currencies not tied to a specific nation begin with the letter “X”, and "BT" stands for "Bitcoin". 🙂

blocklist_file.close()

if __name__ == '__main__':
if len(sys.argv) < 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm a bit late to the party. Just a suggestion: the Python argparse library is pretty powerful and e.g. generates useful help/error message for the script CLI. Could be useful here to e.g. restrict currency in {BTC, ETH} and provide a default value for filename.

}'''

def format_address(self, address):
return 'ethereum_address!("' + address[2:] + '")'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a small suggestion: I find f-strings generally easier to read, it is for sure also a matter of taste :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants