Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support minLength and maxLength for stringMatching #5562

Open
TomerAberbach opened this issue Dec 26, 2024 · 3 comments
Open

Support minLength and maxLength for stringMatching #5562

TomerAberbach opened this issue Dec 26, 2024 · 3 comments

Comments

@TomerAberbach
Copy link
Contributor

TomerAberbach commented Dec 26, 2024

🚀 Feature Request

stringMatching generates based on the regular expression, but doesn't provide a way to set a min or max length on the generated strings.

Motivation

Sometimes you want to test specific lengths of a string matching a regex, but modifying a regular expression to have a specific min and/or max length can be pretty hard and confusing. It would be a lot easier to just be able to pass minLength and maxLength like the string arbitrary.

Example

const nameRegex = /^[a-z]+$/

// Test different lengths without having to modify regex
fc.stringMatching(nameRegex, { minLength: 10 })
fc.stringMatching(nameRegex, { maxLength: 10 })
fc.stringMatching(nameRegex, { minLength: 5, maxLength: 10 })
@dubzzz
Copy link
Owner

dubzzz commented Jan 6, 2025

So far the only option I have for it is to play with .filter but unfortunately performance-wise it will not be ideal. I'd need to think about a way to make such constraint more optimal before offering it on the arbitrary.

@gruhn
Copy link
Contributor

gruhn commented Jan 7, 2025

Hmm, interesting problem. Regular expressions are closed under intersection. So one idea might be to "just" compute the intersection of nameRegex and for example /^.{5,10}$/. That gives a new regular expression which is restricted to strings with min length 5 and max length 10. I think this can be done in ~O(n^2). Not super cheap but it's an upfront cost and maybe negligible for small user defined regular expressions.

One way to do it might be:

  1. Convert both regular expressions to NFAs (I think: O(n))
  2. Compute NFA intersection (I think: O(n^2))
  3. Convert resulting NFA back to regular expression (I think: O(n))

Any route involving DFAs seems to come with the danger of exponential blow up.

For the regular expressions from formal language theory, this is maybe not too complicated but I can imagine you need a ton of case distinctions for the the full JavaScript regular expressions. So not sure if this is the best approach.

@TomerAberbach
Copy link
Contributor Author

This certainly sounds quite tricky! Thanks for all the research here :)

By the way @dubzzz, in case it changes priority, in addition to needing this for typespec-fast-check, we actually coincidentally needed the same exact thing at Stainless (where I work).

We were trying to automatically generate example values for OpenAPI spec schemas like the following (this is just an example schema):

type: string
pattern: [a-z]+
minLength: 3
maxLength: 10

I ended doing fc.stringMatching(...).filter(s => s.length >= minLength && s.length <= maxLength) and passing that to fc.sample.

It's still very quick, but I only needed to generate one value from fc.sample so that probably helped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants