# Details on regular expressions as a data source

## Revision as of 05:28, 16 August 2007

The message batch generator can populate fields by creating text strings that match regular expressions. A regular expression is a compact syntax for describing a set of strings. For example, the regular expression `cat|dog` describes the two strings `cat` and `dog`, while the regular expression `cats?` describes the two strings `cat` and `cats`. Here are some examples of the kinds of strings you can generate from regular expressions:
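The idea that a regular expression "describes a set of strings" can be tried out with Python's standard `re` module, used here only as a convenient stand-in for the batch generator. The hypothetical helper `matches` keeps the candidates that match the whole expression, not just a prefix of it.

```python
import re


def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that match the entire pattern (not just a prefix)."""
    compiled = re.compile(pattern)
    return [s for s in candidates if compiled.fullmatch(s)]


words = ["cat", "dog", "cats", "dogs", "cat dog"]
print(matches(r"cat|dog", words))  # ['cat', 'dog']
print(matches(r"cats?", words))    # ['cat', 'cats']
```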

Regular expression | Five generated strings |
---|---|
`cat` | `cat cat cat cat cat` |
`cat\|dog` | `cat dog cat dog cat` |
`cats?` | `cat cats cat cats cat` |
`'grrr*'` | `'grr' 'grrr' 'grrrr' 'grrrrr' 'grrrrrr'` |
`(mewl)+` | `mewl mewlmewl mewlmewlmewl mewlmewlmewlmewl mewlmewlmewlmewlmewl` |
`hot{3,4}` | `hottt hotttt hottt hotttt hottt` |
`[a-z]` | `a b c d e` |
`[0-4]` | `0 1 2 3 4` |
`[ac-e]` | `a c d e a` |
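The examples above can be checked mechanically. The sketch below again uses Python's `re` module as a stand-in for the generator's own parser, on the assumption that these basic constructs (alternation, `?`, `*`, `+`, `{n,m}`, groups and bracket ranges) mean the same thing in both. The hypothetical `check_samples` returns every listed sample that does not fully match its expression; note that the single quotes in `'grrr*'` are ordinary characters.

```python
import re

# Each example expression, paired with the five strings listed for it.
EXAMPLES = {
    r"cat": ["cat"] * 5,
    r"cat|dog": ["cat", "dog", "cat", "dog", "cat"],
    r"cats?": ["cat", "cats", "cat", "cats", "cat"],
    r"'grrr*'": ["'grr'", "'grrr'", "'grrrr'", "'grrrrr'", "'grrrrrr'"],
    r"(mewl)+": ["mewl" * k for k in range(1, 6)],
    r"hot{3,4}": ["hottt", "hotttt", "hottt", "hotttt", "hottt"],
    r"[a-z]": list("abcde"),
    r"[0-4]": list("01234"),
    r"[ac-e]": list("acdea"),
}


def check_samples(examples: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (pattern, sample) pairs where the sample does NOT fully match."""
    failures = []
    for pattern, samples in examples.items():
        compiled = re.compile(pattern)
        for s in samples:
            if compiled.fullmatch(s) is None:
                failures.append((pattern, s))
    return failures


print(check_samples(EXAMPLES))  # []
```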

There is a preference page, accessed through `Window->Preferences->OHF H3ET->Batch Generator->Regex Batch Data Source`, that sets some required options:

The 'Regex choice strategy' options determine the order in which strings are generated from the regular expression; that is, how the generator behaves when it encounters a choice in a regular expression, such as an alternation ('|') or a quantifier (such as '*' or '{2,3}'). If 'Random' is selected, each choice is made randomly. If 'Increasing' is selected, then the first time a string is generated from the regular expression the first available option is taken, the second time the second option, and so on; after the last option, it starts over with the first. 'Decreasing' works like 'Increasing' but starts at the last available option and works backwards. Here is a sample of the three behaviors, each generating 5 strings from the same regular expression:

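Strategy | Regular expression | Five generated strings |
---|---|---|
Random | `a\|b\|c\|d` | `b d a c d` |
Increasing | `a\|b\|c\|d` | `a b c d a` |
Decreasing | `a\|b\|c\|d` | `d c b a d` |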
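The three strategies can be sketched for a single choice point, here the four branches of `a|b|c|d`. This is a minimal illustration, not the generator's actual code: the function name `choices` is hypothetical, and a real generator would apply the strategy at every choice point in the expression, not just one alternation.

```python
import random
from itertools import cycle, islice
from typing import Iterator, Optional, Sequence


def choices(options: Sequence[str], strategy: str,
            rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield one option per generated string, following the named strategy.

    'Increasing' walks the options first to last and wraps around,
    'Decreasing' walks them last to first and wraps around,
    'Random' picks independently each time.
    """
    rng = rng or random.Random()
    if strategy == "Increasing":
        yield from cycle(options)
    elif strategy == "Decreasing":
        yield from cycle(list(reversed(options)))
    elif strategy == "Random":
        while True:
            yield rng.choice(options)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")


alternatives = ["a", "b", "c", "d"]  # the branches of a|b|c|d
print(list(islice(choices(alternatives, "Increasing"), 5)))  # ['a', 'b', 'c', 'd', 'a']
print(list(islice(choices(alternatives, "Decreasing"), 5)))  # ['d', 'c', 'b', 'a', 'd']
print(list(islice(choices(alternatives, "Random", random.Random(0)), 5)))
```

The random run is seeded only so that it is repeatable; its exact output depends on Python's random number generator, not on anything in the batch generator.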

The second option on the preference page, 'Upper bound for infinite closures', puts an upper limit on the size of strings created by quantifiers such as '*' or '+', which could otherwise generate arbitrarily long strings. For example, an upper bound of 3 means that the longest string that can be generated from the regular expression `ab*` is `abbb`.
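The `ab*` example can be reproduced with a short sketch. The helper `expand_closure` is hypothetical, and it assumes, consistent with the `abbb` example, that the bound caps the number of repetitions of the quantified expression rather than the total string length.

```python
def expand_closure(prefix: str, repeated: str, min_count: int, upper_bound: int) -> list[str]:
    """All strings for prefix + repeated* (min_count=0) or prefix + repeated+ (min_count=1),
    with the number of repetitions capped at upper_bound."""
    return [prefix + repeated * k for k in range(min_count, upper_bound + 1)]


strings = expand_closure("a", "b", 0, 3)  # models ab* with an upper bound of 3
print(strings)                # ['a', 'ab', 'abb', 'abbb']
print(max(strings, key=len))  # abbb
```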

Finally, if you use regular expressions as a source of data, the message batch generator requires that the total number of files be limited; that is, on the final page of the wizard, you must select a maximum number of files to create instead of opting to use the entire data source.
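The page does not say why this limit is required. One plausible reading, and it is only an assumption, is that a regex data source never runs out: the 'Increasing' and 'Decreasing' strategies wrap around after the last option, and 'Random' can keep choosing forever, so "the entire data source" has no end. The sketch below models that with hypothetical helpers and takes exactly one value per file up to the chosen maximum.

```python
from itertools import cycle, islice


def regex_data_source(values: list[str]):
    """Hypothetical stand-in for a regex data source. Like the 'Increasing'
    strategy, it wraps around after the last option, so it never runs out."""
    return cycle(values)


def values_for_files(source, max_files: int) -> list[str]:
    """Take one value per file, stopping at the chosen maximum number of files."""
    return list(islice(source, max_files))


print(values_for_files(regex_data_source(["cat", "dog"]), 3))  # ['cat', 'dog', 'cat']
```

Without the `max_files` cap, iterating over such a source would never terminate.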

## Reference

The generator supports the following special operators for generating sample strings:

Expression | Description |
---|---|
`()` | groups the expressions inside the parentheses |
`?` | 0 or 1 of the preceding expression |
`*` | 0 or more of the preceding expression |
`+` | 1 or more of the preceding expression |
`{n,}` | n or more of the preceding expression |
`{n,m}` | between n and m of the preceding expression (n must not be greater than m) |
`[xyz]` | any character inside the brackets |
`[a-n]` | any character in the range |
`\` | treat whatever comes next as an ordinary character, not as a special operator |
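The quantifier and bracket rows of the reference can be read as "how many repetitions are allowed" and "which characters are allowed". The sketch below is a hypothetical model of those two ideas, not the generator's parser. It assumes that unbounded quantifiers (`*`, `+`, `{n,}`) are capped by the 'Upper bound for infinite closures' setting, and that `{n,}` is capped at the larger of n and that bound; the page does not specify the latter. Escapes and other bracket features are not handled.

```python
def expand_class(body: str) -> list[str]:
    """Expand the inside of a bracket expression, e.g. 'ac-e' -> ['a', 'c', 'd', 'e'].

    A '-' between two characters is an inclusive range; anything else is literal.
    """
    chars: list[str] = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = body[i], body[i + 2]
            chars.extend(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def repetition_counts(quantifier: str, upper_bound: int) -> range:
    """Repetition counts a quantifier allows, with unbounded ones capped at upper_bound."""
    if quantifier == "?":
        return range(0, 2)
    if quantifier == "*":
        return range(0, upper_bound + 1)
    if quantifier == "+":
        return range(1, upper_bound + 1)
    if quantifier.startswith("{") and quantifier.endswith("}"):
        low, comma, high = quantifier[1:-1].partition(",")
        if not comma:
            raise ValueError(f"unsupported quantifier: {quantifier!r}")
        n = int(low)
        # Assumption: '{n,}' is capped at the larger of n and the configured bound.
        m = int(high) if high else max(n, upper_bound)
        if n > m:
            raise ValueError("n must not be greater than m")
        return range(n, m + 1)
    raise ValueError(f"unsupported quantifier: {quantifier!r}")


print(expand_class("ac-e"))                                     # ['a', 'c', 'd', 'e']
print(list(repetition_counts("{3,4}", 3)))                      # [3, 4]
print(["ho" + "t" * k for k in repetition_counts("{3,4}", 3)])  # ['hottt', 'hotttt']
```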