regex

Regular expression functions for pattern matching and text processing. The function signatures follow Python’s re module conventions.

Available Functions

Function                                        Description
match(pattern, string, flags=0)                 Match pattern at the beginning of string
search(pattern, string, flags=0)                Search for pattern anywhere in string
findall(pattern, string, flags=0)               Find all non-overlapping matches
finditer(pattern, string, flags=0)              Find all matches as Match objects
sub(pattern, repl, string, count=0, flags=0)    Replace pattern matches with repl
split(pattern, string, maxsplit=0, flags=0)     Split string by pattern
compile(pattern, flags=0)                       Compile pattern into a regex object
escape(string)                                  Escape special regex characters in string
fullmatch(pattern, string, flags=0)             Check whether pattern matches the entire string

Match Objects

The re.match() and re.search() functions return a Match object on success, or None if no match is found. Match objects provide the following methods:

Method        Description
group(n=0)    Returns the nth matched group (0 = full match)
groups()      Returns a tuple of all capturing groups (excluding group 0)
start(n=0)    Returns the start position of group n (0 = full match)
end(n=0)      Returns the end position of group n (0 = full match)
span(n=0)     Returns a (start, end) tuple for group n (0 = full match)

Example:

import re

# Search with capturing groups
m = re.search(r'(\w+)@(\w+)\.(\w+)', 'Email: user@example.com')
if m:
    print(m.group(0))   # 'user@example.com' (full match)
    print(m.group(1))   # 'user' (first group)
    print(m.group(2))   # 'example' (second group)
    print(m.group(3))   # 'com' (third group)
    print(m.groups())   # ('user', 'example', 'com')
    print(m.start())    # 7 (position where match starts)
    print(m.end())      # 23 (position where match ends)
    print(m.span())     # (7, 23)
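The position methods also accept a group number. The following is a minimal sketch, assuming n selects a capturing group the same way it does in Python's re module; the phone string is only an illustration.

```python
import re

m = re.search(r'(\d+)-(\d+)', "Phone: 555-1234")
if m:
    print(m.span(0))   # (7, 15) span of the full match
    print(m.span(1))   # (7, 10) span of the first group, "555"
    print(m.start(2))  # 11, where the second group "1234" starts
    print(m.end(2))    # 15, where the second group ends
```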

Constants (Flags)

The regex library provides the following flags that can be passed to functions:

Flag             Shorthand    Value    Description
re.IGNORECASE    re.I         2        Case-insensitive matching
re.MULTILINE     re.M         8        ^ and $ match at line boundaries
re.DOTALL        re.S         16       . matches newlines

Flags can be combined using the bitwise OR operator (|):

import re

# Combine IGNORECASE and MULTILINE
m = re.match("hello", "HELLO\nWORLD", re.I | re.M)
if m:
    print(m.group(0))  # "HELLO"
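To see what each flag contributes, it can help to run the same pattern with no flags, with one flag, and with both. This is an illustrative sketch; the sample text is made up for the comparison.

```python
import re

text = "Hello\nhello\nHELLO"

# No flags: ^ anchors only at the start of the string and matching is case-sensitive
plain = re.findall("^hello", text)
print(plain)  # []

# MULTILINE only: ^ matches at the start of every line, still case-sensitive
multi = re.findall("^hello", text, re.M)
print(multi)  # ["hello"]

# IGNORECASE and MULTILINE combined
both = re.findall("^hello", text, re.I | re.M)
print(both)  # ["Hello", "hello", "HELLO"]
```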

Functions

re.match(pattern, string, flags=0)

Checks if the pattern matches at the beginning of the string.

Parameters:

  • pattern: Regular expression pattern
  • string: String to search
  • flags: Optional flags (default: 0)

Returns: Match object if pattern matches at start, or None if no match

Example:

import re

m = re.match("[0-9]+", "123abc")
if m:
    print("String starts with digits:", m.group(0))  # "123"

m = re.match("[0-9]+", "abc123")
if m == None:
    print("Pattern must match at start")

# Case-insensitive matching
m = re.match("hello", "HELLO world", re.I)
if m:
    print("Case-insensitive match:", m.group(0))  # "HELLO"

re.search(pattern, string, flags=0)

Searches for the first occurrence of the pattern anywhere in the string.

Parameters:

  • pattern: Regular expression pattern
  • string: String to search
  • flags: Optional flags (default: 0)

Returns: Match object for the first match, or None if no match found

Example:

import re

m = re.search(r'\w+@\w+\.\w+', "Contact: user@example.com")
if m:
    print(m.group(0))  # "user@example.com"

result = re.search("[0-9]+", "no numbers")
print(result)  # None

# Case-insensitive search
m = re.search("world", "HELLO WORLD", re.I)
if m:
    print(m.group(0))  # "WORLD"

# Using capturing groups
m = re.search(r'(\d+)-(\d+)', "Phone: 555-1234")
if m:
    print(m.group(0))  # "555-1234"
    print(m.group(1))  # "555"
    print(m.group(2))  # "1234"
    print(m.groups())  # ("555", "1234")
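The difference between re.match() and re.search() is easiest to see on a single input where the pattern does not occur at the start. A short sketch with an invented sample string:

```python
import re

text = "order 42 shipped"

# match() only looks at the beginning of the string
at_start = re.match("[0-9]+", text)
print(at_start)  # None

# search() scans forward until it finds the first occurrence
anywhere = re.search("[0-9]+", text)
if anywhere:
    print(anywhere.group(0))  # "42"
    print(anywhere.span())    # (6, 8)
```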

re.findall(pattern, string, flags=0)

Finds all occurrences of the pattern in the string.

Parameters:

  • pattern: Regular expression pattern
  • string: String to search
  • flags: Optional flags (default: 0)

Returns: List of strings (all matches)

Example:

import re

phones = re.findall("[0-9]{3}-[0-9]{4}", "Call 555-1234 or 555-5678")
print(phones)  # ["555-1234", "555-5678"]

# Case-insensitive findall
words = re.findall("a+", "aAbBaAa", re.I)
print(words)  # ["aA", "aAa"]
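When the pattern contains more than one capturing group, Python's re.findall() returns one tuple of groups per match instead of the full matched string. The backreference example in the RE2 limitations section unpacks findall results as pairs, which suggests Scriptling behaves the same way; treat that as an assumption and verify it in your environment. A minimal sketch:

```python
import re

# Two capturing groups: each result holds (key, value) rather than "key=value"
pairs = re.findall(r'(\w+)=(\d+)', "width=640, height=480")
for key, value in pairs:
    print(key, value)
    # width 640
    # height 480
```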

re.finditer(pattern, string, flags=0)

Finds all occurrences of the pattern in the string and returns Match objects.

Parameters:

  • pattern: Regular expression pattern
  • string: String to search
  • flags: Optional flags (default: 0)

Returns: List of Match objects (all matches)

Example:

import re

matches = re.finditer("[0-9]{3}-[0-9]{4}", "Call 555-1234 or 555-5678")
for match in matches:
    print(match.group(0))  # "555-1234", "555-5678"
    print(match.start())   # 5, 17
    print(match.end())     # 13, 25

# With capturing groups
matches = re.finditer(r'(\d+)-(\d+)', "555-1234, 888-9999")
for match in matches:
    print(match.group(0))  # "555-1234", "888-9999"
    print(match.group(1))  # "555", "888"
    print(match.group(2))  # "1234", "9999"
    print(match.groups())  # ("555", "1234"), ("888", "9999")
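Because each Match object carries its position, finditer is useful when the text around the matches has to be preserved. The sketch below wraps every phone number in brackets by stitching together the slices between matches; it assumes string slicing works as in Python.

```python
import re

text = "Call 555-1234 or 555-5678"
out = ""
last = 0
for match in re.finditer("[0-9]{3}-[0-9]{4}", text):
    # Copy the text before this match, then the bracketed match itself
    out = out + text[last:match.start()] + "[" + match.group(0) + "]"
    last = match.end()
# Copy whatever follows the final match
out = out + text[last:]
print(out)  # "Call [555-1234] or [555-5678]"
```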

re.sub(pattern, repl, string, count=0, flags=0)

Replaces occurrences of the pattern in the string with the replacement. The replacement can be either a string or a function. This follows Python’s re.sub() function signature.

Parameters:

  • pattern: Regular expression pattern
  • repl: Replacement string or function that takes a Match object and returns a string
  • string: String to modify
  • count: Maximum number of replacements (0 = all, default: 0)
  • flags: Optional flags (default: 0)

Returns: String (modified text)

Example:

import re

# String replacement
text = re.sub("[0-9]+", "XXX", "Price: 100")
print(text)  # "Price: XXX"

# Replace multiple occurrences
result = re.sub("[0-9]+", "#", "a1b2c3")
print(result)  # "a#b#c#"

# Limit replacements with count
result = re.sub("[0-9]+", "X", "a1b2c3", 2)
print(result)  # "aXbXc3"

# Case-insensitive replacement
result = re.sub("hello", "hi", "Hello HELLO hello", 0, re.I)
print(result)  # "hi hi hi"

# Function replacement - uppercase all words
result = re.sub(r'(\w+)', lambda m: m.group(1).upper(), "hello world")
print(result)  # "HELLO WORLD"

# Function replacement - swap first and last name
result = re.sub(r'(\w+) (\w+)', lambda m: m.group(2) + " " + m.group(1), "John Doe")
print(result)  # "Doe John"

# Function replacement - format inline code
backtick = chr(96)
result = re.sub(backtick + r'([^' + backtick + r']+)' + backtick,
                lambda m: "[" + m.group(1) + "]",
                "test `code` here")
print(result)  # "test [code] here"
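The replacement function does not have to be a lambda. A named function works the same way and is easier to read when the logic grows. In this sketch, double is a hypothetical helper that computes each replacement from the matched text:

```python
import re

def double(m):
    # m is the Match object for one occurrence of the pattern
    return str(int(m.group(0)) * 2)

result = re.sub("[0-9]+", double, "3 apples and 10 pears")
print(result)  # "6 apples and 20 pears"
```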

re.split(pattern, string, maxsplit=0, flags=0)

Splits the string by occurrences of the pattern.

Parameters:

  • pattern: Regular expression pattern
  • string: String to split
  • maxsplit: Maximum number of splits (0 = all, default: 0)
  • flags: Optional flags (default: 0)

Returns: List of strings (split parts)

Example:

import re

parts = re.split("[,;]", "one,two;three")
print(parts)  # ["one", "two", "three"]

# Limit splits
parts = re.split("[,;]", "a,b;c;d", 2)
print(parts)  # ["a", "b", "c;d"]
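A pattern can also absorb the whitespace around a separator, which a plain string split cannot do. A small sketch with made-up input:

```python
import re

# \s* on both sides swallows any spaces around the comma or semicolon
fields = re.split(r'\s*[,;]\s*', "alpha , beta;gamma ;  delta")
print(fields)  # ["alpha", "beta", "gamma", "delta"]
```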

re.compile(pattern, flags=0)

Compiles a regular expression pattern for validation and caching.

Parameters:

  • pattern: Regular expression pattern
  • flags: Optional flags (default: 0)

Returns: Regex object (compiled pattern) or error if invalid

Example:

import re

pattern = re.compile("[0-9]+")  # Validates and caches the pattern
print(type(pattern))  # "Regex"

# Compile with flags
pattern = re.compile("hello", re.I)
print(type(pattern))  # "Regex"

# Compile with multiple flags
pattern = re.compile("hello", re.I | re.M)
print(type(pattern))  # "Regex"

Compiled Pattern Methods

The Regex object returned by re.compile() provides the following methods:

  • pattern.match(string) - Match at start of string
  • pattern.search(string) - Search anywhere in string
  • pattern.findall(string) - Find all matches as strings
  • pattern.finditer(string) - Find all matches as Match objects

Example:

import re

pattern = re.compile(r'\d+')
m = pattern.match("123abc")  # Match at start
if m:
    print(m.group(0))  # "123"

matches = pattern.findall("a1b2c3")  # ["1", "2", "3"]

match_objects = pattern.finditer("a1b2c3")
for match in match_objects:
    print(match.group(0), match.start(), match.end())
    # "1" 1 2
    # "2" 3 4
    # "3" 5 6
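Compiling pays off when the same pattern is applied to many inputs. The following sketch filters a list of strings with one compiled pattern; the date format and the sample lines are illustrative only.

```python
import re

# Compile once, reuse for every line
date = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')

lines = ["2024-01-15", "15/01/2024", "1999-12-31"]
years = []
for line in lines:
    m = date.match(line)
    if m:
        years.append(m.group(1))  # first group is the four-digit year

print(years)  # ["2024", "1999"]
```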

re.escape(string)

Escapes special regex characters in a string.

Parameters:

  • string: String to escape

Returns: String (escaped text)

Example:

import re

escaped = re.escape("a.b+c")
print(escaped)  # "a\.b\+c"
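Escaping matters most when the text to find comes from user input and may contain regex metacharacters. The sketch below searches for the same literal text with and without re.escape(); needle and text are example values.

```python
import re

needle = "1+1=2"
text = "We know 1+1=2"

# Unescaped: "+" is a quantifier, so the pattern looks for "11=2" and fails
raw = re.search(needle, text)
print(raw)  # None

# Escaped: every character is matched literally
safe = re.search(re.escape(needle), text)
if safe:
    print(safe.group(0))  # "1+1=2"
```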

re.fullmatch(pattern, string, flags=0)

Checks if the pattern matches the entire string.

Parameters:

  • pattern: Regular expression pattern
  • string: String to match
  • flags: Optional flags (default: 0)

Returns: Boolean (True if entire string matches, False otherwise)

Example:

import re

if re.fullmatch("[0-9]+", "123"):
    print("Entire string is digits")  # This prints

if re.fullmatch("[0-9]+", "123abc"):
    print("This won't print - doesn't match entire string")

# Case-insensitive fullmatch
if re.fullmatch("hello", "HELLO", re.I):
    print("Case-insensitive full match")  # This prints
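A typical use for re.fullmatch() is input validation. In this sketch, is_hex_color is a hypothetical helper; it uses the result only in an if statement, so it does not depend on whether the return value is a Boolean or a Match object.

```python
import re

def is_hex_color(value):
    # fullmatch rejects partial matches, so no ^ or $ anchors are needed
    if re.fullmatch("#[0-9a-fA-F]{6}", value):
        return True
    return False

print(is_hex_color("#1a2B3c"))    # True
print(is_hex_color("#1a2B3c00"))  # False (trailing characters)
print(is_hex_color("1a2B3c"))     # False (missing "#")
```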

Regular Expression Syntax

Scriptling uses Go’s regexp syntax, which is similar to Perl/Python:

Basic Patterns

  • . - Any character (newlines only with DOTALL flag)
  • \d - Digit (0-9)
  • \D - Non-digit
  • \w - Word character (a-z, A-Z, 0-9, _)
  • \W - Non-word character
  • \s - Whitespace
  • \S - Non-whitespace

Quantifiers

  • * - Zero or more
  • + - One or more
  • ? - Zero or one
  • {n} - Exactly n times
  • {n,} - n or more times
  • {n,m} - Between n and m times
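The quantifiers are greedy, meaning they take as many repetitions as the bounds allow. A short comparison on fixed input:

```python
import re

optional = re.findall("ab?", "a ab abb")
print(optional)  # ["a", "ab", "ab"]

zero_or_more = re.findall("ab*", "a ab abb")
print(zero_or_more)  # ["a", "ab", "abb"]

one_or_more = re.findall("ab+", "a ab abb")
print(one_or_more)  # ["ab", "abb"]

# {2,3} takes three digits where it can; the leftover "4" is too short to match
bounded = re.findall("[0-9]{2,3}", "1 22 333 4444")
print(bounded)  # ["22", "333", "444"]
```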

Character Classes

  • [abc] - Any of a, b, or c
  • [^abc] - Not a, b, or c
  • [a-z] - Any lowercase letter
  • [A-Z] - Any uppercase letter
  • [0-9] - Any digit

Anchors

  • ^ - Start of string (or line with MULTILINE flag)
  • $ - End of string (or line with MULTILINE flag)
  • \b - Word boundary
  • \B - Not word boundary
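Word boundaries restrict a match to whole words. The sketch below compares a bare pattern with one wrapped in \b on an example sentence:

```python
import re

text = "cat concat cats cat."

# Without boundaries, "cat" is also found inside "concat" and "cats"
loose = re.findall("cat", text)
print(len(loose))  # 4

# With \b on both sides, only the standalone words match
whole = re.findall(r'\bcat\b', text)
print(len(whole))  # 2
```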

Inline Flags

You can also use inline flag modifiers in your patterns:

  • (?i) - Case-insensitive
  • (?m) - Multiline mode
  • (?s) - Dotall mode (. matches newlines)
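Inline modifiers can stand in for the flags argument, which is convenient when the pattern is stored as a plain string. The sketch below assumes several modifiers may be combined in one group, such as (?im), as Go's regexp syntax allows.

```python
import re

text = "Error: disk full\nok\nERROR: timeout"

# Flags passed as an argument
with_flags = re.findall("^error", text, re.I | re.M)
print(with_flags)  # ["Error", "ERROR"]

# The same behavior requested inside the pattern
inline = re.findall("(?im)^error", text)
print(inline)  # ["Error", "ERROR"]
```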

Usage Examples

import re

# Basic matching at start of string
m = re.match("[0-9]+", "123abc")
if m:
    print("String starts with:", m.group(0))  # "123"

# Search anywhere in string
m = re.search(r'\w+@\w+\.\w+', "Contact: user@example.com")
if m:
    print("Email:", m.group(0))  # "user@example.com"

# Search with groups
m = re.search(r'(\w+)@(\w+)\.(\w+)', "Contact: user@example.com")
if m:
    print("User:", m.group(1))    # "user"
    print("Domain:", m.group(2))  # "example"
    print("TLD:", m.group(3))     # "com"
    print("Groups:", m.groups())  # ("user", "example", "com")

# Find all matches
numbers = re.findall("[0-9]+", "abc123def456")
# ["123", "456"]

# Find all matches as Match objects
matches = re.finditer("[0-9]+", "abc123def456")
for match in matches:
    print(match.group(0), match.start(), match.end())
    # "123" 3 6
    # "456" 9 12

# Replace text
text = re.sub("[0-9]+", "XXX", "Price: 100")
# "Price: XXX"

# Replace with count limit
text = re.sub("[0-9]+", "X", "1 2 3 4 5", 3)
# "X X X 4 5"

# Split by pattern
parts = re.split("[,;]", "one,two;three")
# ["one", "two", "three"]

# Compile pattern (validates and caches)
pattern = re.compile("[0-9]+")
# Regex object

# Use compiled pattern
matches = pattern.finditer("abc123def456")
for match in matches:
    print(match.group(0))  # "123", "456"

# Escape special characters
escaped = re.escape("a.b+c*d?")
# "a\.b\+c\*d\?"

# Full match entire string
if re.fullmatch("[0-9]+", "123"):
    print("String contains only digits")

# Case-insensitive matching with flag
m = re.match("hello", "HELLO world", re.I)
if m:
    print("Case-insensitive match:", m.group(0))

# Case-insensitive matching with inline flag
m = re.match("(?i)hello", "HELLO world")
if m:
    print("Inline flag match:", m.group(0))

# Multiline matching
text = "line1\nline2\nline3"
matches = re.findall("^line", text, re.M)
# ["line", "line", "line"]

# Dotall - dot matches newlines
m = re.search("a.*b", "a\nb", re.S)
if m:
    print("Dotall match:", m.group(0))  # "a\nb"

Notes

  • Patterns use Go’s regexp engine (RE2)
  • re.match() and re.search() return Match objects (not strings) like Python
  • All functions are case-sensitive by default
  • Use re.I or re.IGNORECASE flag for case-insensitive matching
  • Alternatively, use (?i) at the start of pattern for case-insensitive matching
  • Backslashes in patterns need to be escaped in regular Scriptling strings ("\\d+"); the examples in this document use raw strings (r'\d+') to avoid doubling them
  • The count parameter in re.sub() limits the number of replacements (0 = replace all)
  • The maxsplit parameter in re.split() limits the number of splits
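The note on backslashes can be illustrated by writing the same pattern both ways. This sketch assumes raw strings behave as they do in Python, which is how the examples in this document use them.

```python
import re

escaped = "\\d+\\.\\d+"   # regular string: every backslash is doubled
raw = r'\d+\.\d+'         # raw string: backslashes are written once

print(escaped == raw)  # True

versions = re.findall(raw, "v1.5 and v2.10")
print(versions)  # ["1.5", "2.10"]
```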

RE2 Limitations (Differences from Python re)

Scriptling uses Go’s RE2 engine, which intentionally omits some features found in Python’s re module (which uses a backtracking engine):

Feature                            Python re            Scriptling (RE2)    Workaround
Backreferences (\1, \2)            Supported            Not supported       Restructure pattern to avoid them
Lookahead ((?=...))                Supported            Not supported       Restructure pattern or post-filter results
Lookbehind ((?<=...))              Supported            Not supported       Restructure pattern or post-filter results
Negative lookahead ((?!...))       Supported            Not supported       Restructure pattern or post-filter results
Negative lookbehind ((?<!...))     Supported            Not supported       Restructure pattern or post-filter results
Atomic groups ((?>...))            Supported (3.11+)    Not supported       Not needed with RE2 (no backtracking)
Possessive quantifiers (*+, ++)    Supported (3.11+)    Not supported       Not needed with RE2 (no backtracking)
Named backreferences ((?P=name))   Supported            Not supported       Restructure pattern to avoid them
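For lookahead and lookbehind, the "restructure or post-filter" workaround usually means capturing the surrounding context as a group and deciding in code. The following is a sketch of that idea rather than a drop-in replacement; the currency text is invented for illustration.

```python
import re

text = "10 USD, 20 EUR, 30 USD"

# Python-only patterns (rejected by RE2):
#   r'\d+(?= USD)'   amounts followed by " USD"
#   r'\d+(?! USD)'   amounts not followed by " USD"

usd = []
other = []
# Capture the amount and the currency, then filter on the second group
for m in re.finditer(r'(\d+) ([A-Z]{3})', text):
    if m.group(2) == "USD":
        usd.append(m.group(1))
    else:
        other.append(m.group(1))

print(usd)    # ["10", "30"]
print(other)  # ["20"]
```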

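The most common issue is backreferences: patterns like r'<(h\d)>.*?</\1>' that use \1 to match the same text as a capturing group will fail with a compile error. Rewrite them to repeat the group's pattern explicitly, keeping in mind that the repeated pattern no longer guarantees that both occurrences match the same text: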

import re

# Sample input for illustration
html = "<h1>Title</h1><h2>Subtitle</h2>"

# Python - uses backreference \1 to match closing tag
# pattern = r'<(h\d)>(.*?)</\1>'  # Does NOT work in Scriptling

# Scriptling - repeat the pattern instead
matches = re.findall(r'<(h\d)>(.*?)</(?:h\d)>', html, re.IGNORECASE | re.DOTALL)
for tag, content in matches:
    print(tag, content)
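If the opening and closing tags must actually agree, one option is to capture both and compare them after matching. This is a sketch of the post-filter approach with an invented html sample. It is not fully equivalent to a backreference, because a mismatched pair is discarded rather than causing the engine to keep looking for the correct closing tag.

```python
import re

html = "<h1>Title</h1><h2>Broken</h3>"

valid = []
# Group 1 is the opening tag, group 3 the closing tag
for m in re.finditer(r'<(h\d)>(.*?)</(h\d)>', html):
    if m.group(1) == m.group(3):
        valid.append((m.group(1), m.group(2)))

print(valid)  # [("h1", "Title")]
```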