Use regex

The Use regex step matches patterns using Regular Expression to find or replace values. Regular Expressions, or RegEx for short, are useful for matching patterns in text or numbers (or anything, really). RegExr.com is an excellent resource to use when composing your regular expression.

Input/output

For our input data, we'll use a list sales orders.

Our output after using this Use Regex step is a new column "Sales Order Number" with the replaced [A-Z] values by the letter A.

Settings

Pick the dataset, type the "Target column" the target column name or index to find the regex pattern,

then, type the "Regex expression" Example: [A-Z],

then, type in "Replace value" the value to replace the value with the found one. Pick "Empty Text" if you'd like this step to replace the located values with blanks (i.e. replacing it with empty row cells).

Helpful tips

Again, we recommend RegExr.com as a useful tool when working with RegEx. We particularly find their "Community Patterns" section useful where you can find RegEx patterns that others have used before.

Characters

  • . any character, except a newline

  • \w any word

  • \d any digit

  • \s any whitespace

  • \W anything except a word

  • \D anything except a digit

  • \S anything except whitespace

  • [abc] any of a, b, and/or c - you can use dashes in here too such as [a-z] or [0-9]

  • [^abc] not a, b, nor c - you can use dashes in here too such as [a-z] or [0-9]

  • [a-g] any character between a and g

  • [0-5] any digit between 0 and 5

Quantifiers and Alternators

  • a any amount of the letter a in a row. 0 or more

  • a+ 1 or more of the letter a in a row

  • a? 0 or 1 of the letter a

  • a{5} exactly 5 of the letter a in a row

  • a{2,} 2 or more of the letter a in a row

  • a{1,3} between 1 and 3 (inclusive) of the letter a in a row

  • a+? 1 or more of the letter a in a row, but match as few as possible

  • a{2,}? 2 or more of the letter a in a row, but match as few as possible

  • ab|cd match either ab or cd in a cell

Anchors

Anchors help you define how an expression is related to the beginning or end of a cell

  • ^abc the cell that starts with abc

  • def$ the cell that ends with def

  • ^$ a blank cell

Escaped characters

Using a backslash, you can indicate invisible characters, or escape any character that normally has a special purpose

  • \. escape a dot so that it is seen as an actual dot

  • \\ escape a backslash so that it is seen as an actual backslash

  • \t find a tab in your text

  • \r find a newline in your text

  • \n find a different type of newline in your text

Groups & Lookarounds

Groups are used to capture bits of text and then interact with them in the replacement function.

  • (abc) capture the group that contains abc within a cell

  • \1 back reference to the first capture group

  • (?:abc) non-capturing group that contains abc within a cell

  • (?=abc) positive lookahead

  • (?!abc) negative lookahead

Last updated