Software Carpentry
Regular Expressions


Introduction


You Can Skip This Lecture If...


A Simple Example


This or That


Precedence


Escaping Special Characters


Raw Strings


Sequences


Making Something Optional


Character Sets


Abbreviations


Special Cases


Anchoring


Extracting Matches


Match Objects


Match Groups


Reversing Columns


Compiling


Finding Title Case Words


Finding All Matches


Reference Material


But Wait, There's More


Summary


Exercises

Exercise 10.1:

By default, regular expression matches are greedy: the first term in the RE matches as much as it can while still letting the overall match succeed, then the second term does the same, and so on. As a result, if you apply the RE «X(.*)X(.*)» to the string "XaX and XbX", the first group will contain "aX and Xb", and the second group will be empty.
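The greedy behavior described above can be checked directly. This is a minimal sketch that assumes Python's standard `re` module; the variable names `match`, `first`, and `second` are illustrative only.

```python
import re

# Greedy matching: the first (.*) grabs as much text as it can while
# still leaving one "X" for the rest of the pattern to match.
match = re.search(r'X(.*)X(.*)', 'XaX and XbX')
first, second = match.group(1), match.group(2)

print(repr(first))   # 'aX and Xb'
print(repr(second))  # ''
```

Printing with `repr` makes the empty second group visible as `''` rather than as a blank line.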

It's also possible to make REs match reluctantly, i.e., to have the parts match as little as possible rather than as much as possible. Find out how to do this, and then modify the RE in the previous paragraph so that the first group winds up containing "a", and the second group " and XbX".

Exercise 10.2:

What is the easiest way to write a case-insensitive regular expression? (Hint: read the documentation on compilation options.)

Exercise 10.3:

What does the VERBOSE option do when compiling a regular expression? Use it to rewrite some of the REs in this lecture in a more readable way.

Exercise 10.4:

What does the DOTALL option do when compiling a regular expression? Use it to get rid of the call to string.split in the example that finds words ending in vowels.