
Details on regular expressions as a data source

The message batch generator can populate fields by creating text strings that match regular expressions. A regular expression is a compact syntax for describing a certain set of strings. For example, the regular expression cat|dog describes the two strings cat and dog, while the regular expression cats? describes the two strings cat and cats. Here are some examples of the kinds of strings you can generate from regular expressions:

cat cat cat cat cat 
cat dog cat dog cat 
cat cats cat cats cat 
'grr' 'grrr' 'grrrr' 'grrrrr' 'grrrrrr' 
mewl mewlmewl mewlmewlmewl mewlmewlmewlmewl mewlmewlmewlmewlmewl 
hottt hotttt hottt hotttt hottt 
a b c d e 
0 1 2 3 4 
a c d e a 

The preference page at Window->Preferences->OHF H3ET->Batch Generator->Regex Batch Data Source sets the following required options:


The 'Regex choice strategy' options determine the order in which strings are generated from the regular expression; that is, how the generator behaves when it encounters a choice in a regular expression, such as an alternation ('|') or a quantifier (such as '*' or {2,3}). If 'Random' is selected, each choice is made randomly. If 'Increasing' is selected, the first string generated from the regular expression takes the first available option, the second string takes the second option, and so on; after the last option, the generator starts over from the first. 'Decreasing' works like 'Increasing' but starts at the last available option and works backwards. Here is a sample of the different behaviors, using the same regular expression to generate 5 strings:

Random: b d a c d

Increasing: a b c d a

Decreasing: d c b a d
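
The three strategies can be sketched in a few lines of Python. This is only an illustration, not the generator's actual implementation: the function name `choice_sequence` and the `seed` parameter are hypothetical, and the four options a, b, c, d are an assumption, since the regular expression behind the sample above is not given.

```python
import random
from itertools import cycle, islice


def choice_sequence(options, strategy, count, seed=None):
    """Return `count` picks from `options` under the named strategy (hypothetical helper)."""
    if strategy == "increasing":
        # First option first; wrap around after the last one.
        return list(islice(cycle(options), count))
    if strategy == "decreasing":
        # Last option first; wrap around after the first one.
        return list(islice(cycle(reversed(options)), count))
    if strategy == "random":
        # Each pick is independent of the previous ones.
        rng = random.Random(seed)
        return [rng.choice(options) for _ in range(count)]
    raise ValueError(f"unknown strategy: {strategy!r}")


options = ["a", "b", "c", "d"]  # assumed: four alternatives, e.g. from a|b|c|d
print(" ".join(choice_sequence(options, "increasing", 5)))  # a b c d a
print(" ".join(choice_sequence(options, "decreasing", 5)))  # d c b a d
print(" ".join(choice_sequence(options, "random", 5)))      # varies per run
```

The wrap-around in the two ordered strategies is what makes the fifth string repeat the first one in the sample.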

The second option on the preference page, 'Upper bound for infinite closures', puts an upper limit on the size of the strings created by unbounded quantifiers like '*' or '+', since these could otherwise generate arbitrarily large strings. For example, an upper bound of 3 means that the longest string that can be generated from a regular expression like ab* is abbb.
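
As a rough sketch of the effect of this setting, the Python snippet below lists the strings a bounded closure could produce. It assumes the bound counts repetitions of the quantified expression, which matches the ab* example above but is not confirmed for other cases; `expand_closure` is a hypothetical name, not part of the tool.

```python
def expand_closure(prefix, unit, minimum, upper_bound):
    """List prefix + unit repeated k times, for k from minimum to upper_bound."""
    return [prefix + unit * k for k in range(minimum, upper_bound + 1)]


# ab* with an upper bound of 3: zero to three copies of b
print(expand_closure("a", "b", 0, 3))  # ['a', 'ab', 'abb', 'abbb']

# ab+ with the same bound (assumed to behave alike): one to three copies of b
print(expand_closure("a", "b", 1, 3))  # ['ab', 'abb', 'abbb']
```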

The third option on the preference page, "Character class for '.'", allows the user to set how the '.' operator is expanded. In usual regular expression usage, '.' represents any character. However, if regular expressions are used to generate strings, then not all characters are necessarily desirable (such as space or punctuation characters, or characters with non-English diacritical marks). You can therefore either specify a character class (without the []) which will be accessed with the '.' shortcut, or you can leave this field blank, in which case all possible characters will be used.
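
To illustrate what a bracket-less character class might expand to, here is a minimal Python sketch. The helper `expand_char_class` and the sample value `a-cx0-2` are hypothetical, and the parsing is deliberately simplified: it handles plain characters and X-Y ranges only, with no escapes or negation, and may differ from how the generator reads the field.

```python
import random


def expand_char_class(spec):
    """Expand a bracket-less class such as 'a-cx0-2' into a list of characters."""
    chars = []
    i = 0
    while i < len(spec):
        # A range looks like X-Y: a start character, a hyphen, an end character.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            chars.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            # Anything else stands for itself.
            chars.append(spec[i])
            i += 1
    return chars


allowed = expand_char_class("a-cx0-2")
print(allowed)  # ['a', 'b', 'c', 'x', '0', '1', '2']

# Each '.' in the regular expression would then draw from this list.
print(random.choice(allowed))
```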

Finally, if you use regular expressions as a source of data, the message batch generator requires that the total number of files be limited (that is, on the final page of the wizard, you must select a maximum number of files to create instead of opting to use the entire data source).


The generator supports the following special operators for generating sample strings:

Expression   Description
.            any character
()           groups the expressions inside the parentheses
?            0 or 1 of the preceding expression
*            0 or more of the preceding expression
+            1 or more of the preceding expression
{n,}         n or more of the preceding expression
{n,m}        between n and m of the preceding expression (n must not be greater than m)
[xyz]        any character inside the brackets
[a-n]        any character in the range
\            treat whatever comes next as an ordinary character, and not as a special operator
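
One way to get a feel for these operators is to try them in a general-purpose regular expression engine. The sketch below uses Python's `re` module as a stand-in; the generator's own dialect may differ in details, and the patterns and sample strings are illustrative choices, not taken from the tool.

```python
import re

# (pattern, a string it should match, a string it should not match)
CASES = [
    (r"c.t", "cat", "ct"),         # .      any character
    (r"(ab)+", "abab", "aba"),     # ()     grouping
    (r"cats?", "cats", "catss"),   # 0 or 1 (as in the cats? example)
    (r"ab*", "a", "b"),            # *      0 or more
    (r"ab+", "abb", "a"),          # +      1 or more
    (r"a{2,}", "aaa", "a"),        # {n,}   n or more
    (r"a{2,3}", "aa", "aaaa"),     # {n,m}  between n and m
    (r"[xyz]", "y", "a"),          # [xyz]  any listed character
    (r"[a-n]", "k", "p"),          # [a-n]  any character in the range
    (r"a\.b", "a.b", "axb"),       # \      next character taken literally
]

for pattern, good, bad in CASES:
    # fullmatch requires the whole string to fit the pattern.
    assert re.fullmatch(pattern, good), (pattern, good)
    assert not re.fullmatch(pattern, bad), (pattern, bad)

print("all operator checks passed")
```

The same kind of check can be run against strings produced by the generator to confirm that they fit the expression they came from.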
