Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS: do fewer regexp matches in SensitiveActions #16306

Merged
merged 1 commit into from
Apr 24, 2024
Merged

Conversation

nickrolfe
Copy link
Contributor

@nickrolfe nickrolfe commented Apr 23, 2024

It's generally faster to do name.regexpMatch("a|b|c") than name.regexpMatch(["a", "b", "c"]).

SensitiveDataHeuristics.qll is shared with Python/Ruby/Swift, but I haven't modified it in a way that would affect those languages. This approach would likely help those languages as well, but I don't know if any of them have the kinds of tuple counts we see with JS.

Before:

[2024-04-22 23:40:12] Evaluated non-recursive predicate _SensitiveActions::writesProperty/2#18c4f0d1_SensitiveDataHeuristics::HeuristicNames::maybeSensitive__#shared@b4f553uj in 2938ms (size: 1420).
Evaluated relational algebra for predicate _SensitiveActions::writesProperty/2#18c4f0d1_SensitiveDataHeuristics::HeuristicNames::maybeSensitive__#shared@b4f553uj with tuple counts:
        3573353  ~0%    {4} r1 = JOIN `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e` WITH `SensitiveActions::writesProperty/2#18c4f0d1` CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0, Rhs.1
           1420  ~0%    {4}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.3,Lhs.1
           1420  ~0%    {3}    | SCAN OUTPUT In.0, In.2, In.3
                        return r1

[2024-04-22 17:56:53] Evaluated non-recursive predicate _SensitiveActions::SensitiveVariableAccess#f2802ef9_SensitiveDataHeuristics::HeuristicNames::maybeSe__#shared@b0b8f7fs in 9711ms (size: 2268).
Evaluated relational algebra for predicate _SensitiveActions::SensitiveVariableAccess#f2802ef9_SensitiveDataHeuristics::HeuristicNames::maybeSe__#shared@b0b8f7fs with tuple counts:
        9498951  ~3%    {4} r1 = JOIN `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e` WITH SensitiveActions::SensitiveVariableAccess#f2802ef9 CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0, Rhs.1
           2268  ~0%    {4}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.3,Lhs.1
           2268  ~5%    {3}    | SCAN OUTPUT In.0, In.2, In.3
                        return r1

[2024-04-23 00:27:43] Evaluated non-recursive predicate _SensitiveActions::SensitiveFunctionName#30a0ea9b_SensitiveDataHeuristics::HeuristicNames::maybeSens__#shared@9c190645 in 350ms (size: 216).
Evaluated relational algebra for predicate _SensitiveActions::SensitiveFunctionName#30a0ea9b_SensitiveDataHeuristics::HeuristicNames::maybeSens__#shared@9c190645 with tuple counts:
        361354  ~1%    {3} r1 = JOIN `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e` WITH SensitiveActions::SensitiveFunctionName#30a0ea9b CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0
           216  ~0%    {3}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.2,Lhs.1
           216  ~3%    {2}    | SCAN OUTPUT In.0, In.2
                       return r1

After:

[2024-04-22 23:56:39] Evaluated non-recursive predicate _SensitiveActions::writesProperty/2#18c4f0d1_SensitiveDataHeuristics::HeuristicNames::maybeSensitive__#shared@6979a5od in 511ms (size: 1419).
Evaluated relational algebra for predicate _SensitiveActions::writesProperty/2#18c4f0d1_SensitiveDataHeuristics::HeuristicNames::maybeSensitive__#shared@6979a5od with tuple counts:
             1  ~0%    {1} r1 = AGGREGATE `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_#concat_range`, `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_1_#concat_term` ON In.2, In.3 WITH CONCAT<0 ASC> OUTPUT , Agg.0
        510479  ~0%    {5}    | JOIN WITH `SensitiveActions::writesProperty/2#18c4f0d1` CARTESIAN PRODUCT OUTPUT _, Rhs.0, Rhs.1, Lhs.0, _
        510479  ~1%    {3}    | REWRITE WITH Tmp.0 := "(?:", Tmp.4 := ")", Out.0 := (Tmp.0 ++ In.3 ++ Tmp.4) KEEPING 3
          1419  ~0%    {3}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.2,Lhs.0
          1419  ~0%    {2}    | SCAN OUTPUT In.1, In.2
                       return r1
[2024-04-22 23:07:11] Evaluated non-recursive predicate _SensitiveActions::SensitiveVariableAccess#f2802ef9_SensitiveDataHeuristics::HeuristicNames::maybeSe__#shared@7789803b in 3598ms (size: 2267).
Evaluated relational algebra for predicate _SensitiveActions::SensitiveVariableAccess#f2802ef9_SensitiveDataHeuristics::HeuristicNames::maybeSe__#shared@7789803b with tuple counts:
              1  ~0%    {1} r1 = AGGREGATE `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_#concat_range`, `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_1_#concat_term` ON In.2, In.3 WITH CONCAT<0 ASC> OUTPUT , Agg.0
        1356993  ~1%    {5}    | JOIN WITH SensitiveActions::SensitiveVariableAccess#f2802ef9 CARTESIAN PRODUCT OUTPUT _, Rhs.0, Rhs.1, Lhs.0, _
        1356993  ~0%    {3}    | REWRITE WITH Tmp.0 := "(?:", Tmp.4 := ")", Out.0 := (Tmp.0 ++ In.3 ++ Tmp.4) KEEPING 3
           2267  ~0%    {3}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.2,Lhs.0
           2267  ~0%    {2}    | SCAN OUTPUT In.1, In.2
                        return r1

[2024-04-23 00:30:46] Evaluated non-recursive predicate _SensitiveActions::SensitiveFunctionName#30a0ea9b_SensitiveDataHeuristics::HeuristicNames::maybeSens__#shared@1ec408nh in 84ms (size: 215).
Evaluated relational algebra for predicate _SensitiveActions::SensitiveFunctionName#30a0ea9b_SensitiveDataHeuristics::HeuristicNames::maybeSens__#shared@1ec408nh with tuple counts:
            1  ~0%    {1} r1 = AGGREGATE `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_#concat_range`, `SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp/1#a7ec008e_1_1_#concat_term` ON In.2, In.3 WITH CONCAT<0 ASC> OUTPUT , Agg.0
        51622  ~0%    {4}    | JOIN WITH SensitiveActions::SensitiveFunctionName#30a0ea9b CARTESIAN PRODUCT OUTPUT _, Rhs.0, Lhs.0, _
        51622  ~0%    {2}    | REWRITE WITH Tmp.0 := "(?:", Tmp.3 := ")", Out.0 := (Tmp.0 ++ In.2 ++ Tmp.3) KEEPING 2
          215  ~0%    {2}    | JOIN WITH PRIMITIVE regexpMatch#bb ON Lhs.1,Lhs.0
          215  ~1%    {1}    | SCAN OUTPUT In.1
                      return r1

@github-actions github-actions bot added the JS label Apr 23, 2024
@nickrolfe nickrolfe force-pushed the nickrolfe/js-sensitive branch from a5d8cca to 003d208 Compare April 23, 2024 14:31
@nickrolfe nickrolfe added the no-change-note-required This PR does not need a change note label Apr 23, 2024
@nickrolfe nickrolfe marked this pull request as ready for review April 23, 2024 15:05
@nickrolfe nickrolfe requested review from a team as code owners April 23, 2024 15:05
@nickrolfe
Copy link
Contributor Author

DCA shows some stage timing improvements for SensitiveActions.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Maybe do a followup where you use the new predicate in the other languages?

@nickrolfe nickrolfe merged commit af72c08 into main Apr 24, 2024
36 checks passed
@nickrolfe nickrolfe deleted the nickrolfe/js-sensitive branch April 24, 2024 08:49
@geoffw0
Copy link
Contributor

geoffw0 commented Apr 24, 2024

The excessive calls to regexpMatch had been on my mind in Swift as well. I'd welcome a follow-up to this including Swift - or if you don't have time I'll at least create an issue to track this.

@nickrolfe
Copy link
Contributor Author

I'm doing a follow-up for Ruby, because it's very similar to the JS version, particularly in its use of the nameIndicatesSensitiveData predicate.

Swift and Python look different enough – and don't use nameIndicatesSensitiveData – that I think I will have to leave them alone.

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 24, 2024

I've created an issue for doing this in Swift.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants