I Can Kill Your Browser with a Simple Regex

Maciek Rząsa, @mjrzasa, Toptal


Regex Optimization Exercises

Presentation

Intro

Test these two simple pages that use regexes:

  1. Simple input, without JS (validation)
  2. Editable regex (search)

Part 1: Warm-up

1.1. Repetition

Regex101
You need to match HTML tags. (In general, it’s a bad idea to parse HTML with regexes, but it’s a good learning example.)

  1. Add text inside or after the tag and check whether the step count changes; look at the debugger.
  2. Add another tag and check the result.
  3. There are two solutions: limit the repetition with the lazy quantifier .*?, or limit its scope with the negated class [^>]. Try both, add text inside or after the tag, and see how the step count changes.
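To see what the two fixes change, here is a minimal sketch in Python, whose re module backtracks much like browser regex engines do. The starting pattern <.*> and the sample text are assumptions (the exercise’s own pattern lives on Regex101). The greedy version runs to the last > in the input, so two tags collapse into one match; the lazy and negated-class versions stop at the first >.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Assumed starting pattern: greedy .* runs to the end of the input and then
# backtracks to the LAST '>', so everything between the tags is swallowed.
greedy = re.compile(r'<.*>')
# Fix 1: the lazy quantifier stops at the first '>' it can reach.
lazy = re.compile(r'<.*?>')
# Fix 2: the negated class can never step over a '>' at all.
scoped = re.compile(r'<[^>]*>')

print(greedy.findall(text))  # ['<b>bold</b> and <i>italic</i>']
print(lazy.findall(text))    # ['<b>', '</b>', '<i>', '</i>']
print(scoped.findall(text))  # ['<b>', '</b>', '<i>', '</i>']
```

On this input both fixes return the same matches, so the difference between them shows up in the step count. One behavioral difference remains: [^>] also crosses line breaks, while . does not.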

1.2. Alternative

Regex101
Optimize a regex that finds certain product-related CSS classes: product-size, product-column, product-info, and product IDs with the digits 1, 2, 3.
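A common way to optimize alternation is to factor out the shared prefix, so the engine reads product- once and then branches on the suffix. The sketch below assumes the IDs look like product-1, product-2, product-3 and that the naive regex simply lists every class; both are guesses, since the exercise’s pattern is on Regex101.

```python
import re

# Hypothetical naive version: every alternative repeats the 'product-' prefix,
# so after a failed branch the engine may re-read the same characters.
naive = re.compile(r'product-size|product-column|product-info|product-1|product-2|product-3')
# Factored version: match the shared prefix once, then branch on the suffix;
# the single-digit IDs collapse into one character class.
factored = re.compile(r'product-(?:size|column|info|[123])')

html = 'class="product-info product-2 product-list product-column"'
print(naive.findall(html))     # ['product-info', 'product-2', 'product-column']
print(factored.findall(html))  # ['product-info', 'product-2', 'product-column']
```

Neither version checks word boundaries, so product-12 would still match as product-1; adding \b at the end of the pattern prevents that.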

Part 2: Understanding Catastrophe

2.1. Exponential

Regex101
Watch the step count and the debugger while making these changes:

  1. Add more "a" characters at the beginning.
  2. Remove the "b".
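The blow-up can also be reproduced outside Regex101. This sketch assumes a pattern of the classic nested-quantifier shape (a+)+b, which may differ from the exercise’s, and times Python’s backtracking re on strings of a with no b. A run of n a’s can be split between the inner and outer + in 2^(n-1) ways, and a failing match has to try them all, so expect the nested time to roughly double with every extra a, while a+b (the same set of strings, without nesting) stays flat. Exact timings depend on the machine.

```python
import re
import time

# Hypothetical pattern with nested quantifiers (assumed, not the exercise's).
nested = re.compile(r'(a+)+b')
# Same set of strings, but only one way to consume the a's.
flat = re.compile(r'a+b')

def timed_match(rx, text):
    """Return (matched, seconds) for a single rx.match(text) call."""
    start = time.perf_counter()
    matched = rx.match(text) is not None
    return matched, time.perf_counter() - start

for n in (12, 14, 16, 18):
    text = 'a' * n  # no trailing b, so every split must be tried and rejected
    ok, t_nested = timed_match(nested, text)
    _, t_flat = timed_match(flat, text)
    print(f'n={n} matched={ok} nested={t_nested:.4f}s flat={t_flat:.6f}s')
```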

2.2. Just Polynomial

Regex101
Watch the step count and the debugger while making these changes:

  1. Add more "a" characters at the beginning.
  2. Remove the "b" and add more "a" characters.
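Polynomial backtracking comes from adjacent quantifiers that compete for the same characters rather than from nesting. A minimal sketch, assuming a pattern like a*a*b (hypothetical, not necessarily the exercise’s): on a string of n a’s with no b, the engine tries every way of splitting the a’s between the two a*, on the order of n²/2 combinations, so doubling n should roughly quadruple the time.

```python
import re
import time

# Hypothetical pattern: two adjacent quantifiers over the same character.
adjacent = re.compile(r'a*a*b')
# a*a* describes exactly the same strings as a*, with only one way to match.
single = re.compile(r'a*b')

def seconds(rx, text):
    """Wall-clock time of a single rx.match(text) call."""
    start = time.perf_counter()
    rx.match(text)
    return time.perf_counter() - start

for n in (2000, 4000, 8000):
    text = 'a' * n  # no b, so both patterns must fail
    print(f'n={n} adjacent={seconds(adjacent, text):.4f}s single={seconds(single, text):.6f}s')
```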

2.3. Arithmetic operations

Regex101
You have a regex matching simple arithmetic operations: two numbers separated by a plus or minus sign and ending with an equals sign, e.g. 12+34= or 32121-23=

  1. Extend the regex to allow three numbers and two signs (e.g. 12+322-1=).
  2. Extend the regex to allow an operation of any length (e.g. 12+322-1+223-2323+...=).
  3. Remove the equals sign from the test string and check the steps in the debugger.

When you’re done, go to the editable regex demo page and see the impact of a hastily written regex on user experience. To see timing stats, open the browser console. Optimize the regex (you can start in Regex101 and then apply it to the demo page).
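One way the any-length version can go wrong, sketched in Python under the assumption that it was written as ^(\d+[+-]?)+=$ (a guess at a typical hasty solution, not the exercise’s pattern). Making the sign optional lets a single long number be split into many \d+ tokens, so when the equals sign is missing the engine backtracks exponentially. Requiring a sign at the start of every repetition removes that ambiguity.

```python
import re
import time

# Hypothetical "hasty" any-length version: the optional sign lets one long
# number be split into many \d+ tokens.
hasty = re.compile(r'^(\d+[+-]?)+=$')
# One possible rewrite: a first number, then zero or more (sign, number) pairs.
strict = re.compile(r'^\d+(?:[+-]\d+)*=$')

valid = '12+322-1+223-2323='
broken = valid[:-1]  # the same operation with the equals sign removed

def seconds(rx, text):
    """Wall-clock time of a single rx.match(text) call."""
    start = time.perf_counter()
    rx.match(text)
    return time.perf_counter() - start

for n in (12, 14, 16, 18):
    digits = '1' * n  # one long number and no '=' at the end
    print(f'n={n} hasty={seconds(hasty, digits):.4f}s strict={seconds(strict, digits):.6f}s')
```

The hasty version also accepts malformed input such as 12+=, which the stricter one rejects.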

2.4. Wrong way to parse CSV

Regex101
You have a regex matching the 6th column (Tea) in a CSV file (again, there are better ways to parse CSV…). It’s rather slow even on valid input. Optimize it.
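A minimal sketch of the usual fix, assuming the slow regex has the shape ^(.*?,){5}Tea (hypothetical; the real one is on Regex101). Because .*? may also consume commas, a row whose 6th field is not Tea makes the engine retry every way of assigning the row’s commas to the five fields, and it can even report a false positive when Tea sits in a later column. Scoping each field to [^,\r\n]* leaves exactly one way to read the first five fields.

```python
import re

# Hypothetical slow version: .*? can also swallow commas.
slow = re.compile(r'^(.*?,){5}Tea')
# Scoped version: a field cannot contain a comma or a line break.
fast = re.compile(r'^(?:[^,\r\n]*,){5}Tea')

rows = [
    '1,2,3,4,5,Tea,7',     # Tea in the 6th column
    '1,2,3,4,5,Coffee,7',  # another drink in the 6th column
    '1,2,3,4,5,6,7,Tea',   # Tea only in the 8th column
]
for row in rows:
    print(row, bool(slow.match(row)), bool(fast.match(row)))
```

The last row shows the slow version also accepting Tea in the 8th column, while the scoped version rejects it.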

Part 3: Into the Wild

3.1. Cloudflare

Regex101
Analyze the regex that stalled Cloudflare’s servers. Optimize it to remove the catastrophic backtracking.
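Cloudflare’s public post-mortem of the July 2019 outage singled out the fragment .*(?:.*=.*) as the source of the backtracking. The sketch below times just that fragment in Python’s re (not the engine Cloudflare ran) against a line with no =. With search(), every start position combines two leading .* groups, so expect roughly cubic growth; dropping the redundant .* gives the same matches with roughly quadratic growth. Checking for = before running the regex could remove most of the remaining cost.

```python
import re
import time

# The fragment named in the post-mortem (non-capturing group kept as-is).
culprit = re.compile(r'.*(?:.*=.*)')
# Same matches with one .* fewer: .*.* describes the same strings as .*
simpler = re.compile(r'.*=.*')

def seconds(rx, text):
    """Wall-clock time of a single rx.search(text) call."""
    start = time.perf_counter()
    rx.search(text)
    return time.perf_counter() - start

for n in (100, 200, 400):
    text = 'x' * n  # no '=' anywhere, so every start position fails
    print(f'n={n} culprit={seconds(culprit, text):.4f}s simpler={seconds(simpler, text):.4f}s')
```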

3.2. Discourse customer

Regex101
Analyze the regex that Sam Saffron found in code written by a Discourse customer. It replaces a space before certain characters with an HTML thin space (French typography). See how it performs on unexpected input. Optimize it.
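The shape of this problem can be sketched with a hypothetical pattern (not the customer’s actual regex): a whitespace run that must be followed by French high punctuation, \s+(?=[!?;:]). On a long run of spaces that is not followed by punctuation, re.sub starts a new attempt at every space and each attempt rescans the rest of the run, which is quadratic. A negative lookbehind (?<!\s) lets a match start only at the beginning of a run. The replacement uses the HTML entity &thinsp;.

```python
import re
import time

THIN_SPACE = '&thinsp;'  # HTML entity for a thin space

# Hypothetical pattern: whitespace run followed by high punctuation.
naive = re.compile(r'\s+(?=[!?;:])')
# Same replacements, but a match may only start where a whitespace run starts.
guarded = re.compile(r'(?<!\s)\s+(?=[!?;:])')

text = 'Salut !  Et toi ?'
print(naive.sub(THIN_SPACE, text))    # Salut&thinsp;!  Et toi&thinsp;?
print(guarded.sub(THIN_SPACE, text))  # Salut&thinsp;!  Et toi&thinsp;?

padding = ' ' * 3000 + 'x'  # long whitespace run NOT followed by punctuation
for rx in (naive, guarded):
    start = time.perf_counter()
    rx.sub(THIN_SPACE, padding)
    print(rx.pattern, f'{time.perf_counter() - start:.4f}s')
```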

3.3. Microsoft

Regex101
Analyze the regex used in a Microsoft project to match Windows usernames. Find an input that causes catastrophic backtracking. Optimize it.
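A small timing harness helps when hunting for a bad input: grow the input step by step and watch how the time scales. The sketch below uses a hypothetical username-like pattern with nested quantifiers, ^(\w+\s?)*$, not the Microsoft regex from the exercise, and an attack string made of a long valid run followed by one character that forces failure. The safe variant describes the same strings but gives each word only one way to match.

```python
import re
import time

def time_match(rx, text):
    """Seconds spent in a single rx.match(text) call."""
    start = time.perf_counter()
    rx.match(text)
    return time.perf_counter() - start

def probe(rx, make_input, sizes):
    """Time rx on inputs built by make_input(n) for each n in sizes."""
    return [time_match(rx, make_input(n)) for n in sizes]

def attack(n):
    """A long valid-looking run plus one character that forces failure."""
    return 'a' * n + '!'

# Hypothetical username-like pattern (NOT the regex from the exercise).
risky = re.compile(r'^(\w+\s?)*$')
# Same strings: words separated by single whitespace, optional trailing one.
safe = re.compile(r'^(?:\w+(?:\s\w+)*\s?)?$')

sizes = (12, 14, 16, 18)
for n, t_risky, t_safe in zip(sizes, probe(risky, attack, sizes), probe(safe, attack, sizes)):
    print(f'n={n} risky={t_risky:.4f}s safe={t_safe:.6f}s')
```

If the time keeps multiplying as n grows by a constant step, the backtracking is exponential; if doubling n multiplies the time by a constant factor, it is polynomial.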

Cheatsheet

Operators:

Good practices

References