Tuesday 3 November 2009

Regexes from the perspective of a noob. Also autumn images.



Pictures this time are from a visit last week to Anglesey Abbey. I've no idea what the round-leaved shrub is called but it is lovely, isn't it?
The purple berries likewise. I did buy one of these a couple of years ago but it failed so comprehensively that I can't even remember where I planted it. The last photo is a Japanese Maple. But you knew that.

There comes a time, for any programmer, when they have to get their hands dirty with regular expressions. Yuk. They look like maths but are not really, as the rules are fairly random. This cannot be helped, as the system grew organically and is used by too many languages and too much legacy code to change.

One thing I dislike very much and will change in my own code is seeing a regex like this:

string matchString = @"^(?\d{4}-\d{2}-\d{2})\s*(?



What on earth does that mean? (Actually I had to break this line up so it is readable. In my code editor it is all on one line but of course that wraps and "pre" tags don't.) You look in tutorial books and on MSDN and other places and all the examples seem to be like this. Do you know, you are allowed to approach this like a normal programmer, not a machine, and break it all up into smaller chunks, each of which has a sensible name and a comment?


string dateMatch = @"(?\d{4}-\d{2}-\d{2})"; // 0000-00-00 pattern
string timeMatch = @"(?

Then add them all together:

string matchString = @"^" + dateMatch + @"\s*" + timeMatch + findExit + exitCodeMatch; 

Isn't that so much nicer? I for one certainly feel that regexes are slightly less likely to come back and haunt me later if I compose them in this way. But as I said in the title, I am still a noob in this area.
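For anyone who wants to try the idea end to end, here is a minimal sketch of the same approach as a complete program. The group names (date, time, exitCode), the time format and the "exit code" wording in the sample line are all assumptions made up for illustration; they are not the patterns from the exercise.

```csharp
using System;
using System.Text.RegularExpressions;

static class LogLineDemo
{
    // Each chunk gets a sensible name and a comment.
    // The group names and formats below are hypothetical.
    const string DateMatch = @"(?<date>\d{4}-\d{2}-\d{2})";   // 0000-00-00 pattern
    const string TimeMatch = @"(?<time>\d{2}:\d{2}:\d{2})";   // 00:00:00 pattern (assumed)
    const string FindExit = @".*?exit code\s+";               // skip ahead to the exit code (assumed wording)
    const string ExitCodeMatch = @"(?<exitCode>\d+)";         // one or more digits

    // Then add them all together.
    const string MatchString =
        "^" + DateMatch + @"\s*" + TimeMatch + FindExit + ExitCodeMatch;

    // Returns the captured pieces, or "no match" if the line does not fit.
    public static string Describe(string line)
    {
        Match m = Regex.Match(line, MatchString);
        if (!m.Success)
        {
            return "no match";
        }
        return m.Groups["date"].Value + " | "
             + m.Groups["time"].Value + " | "
             + m.Groups["exitCode"].Value;
    }

    static void Main()
    {
        // prints: 2009-11-03 | 14:05:59 | 0
        Console.WriteLine(Describe("2009-11-03 14:05:59 backup finished, exit code 0"));
    }
}
```

A side benefit of naming the groups is that the code reading the match says Groups["date"] rather than Groups[1], so it keeps working if a chunk is later added or reordered.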


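By the way, the @ signs are a C# feature (verbatim string literals) and mean yes, I do want those backslashes, exactly as typed. Without them the compiler treats \d and \s as escape sequences it doesn't recognise, and the code won't compile.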
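A small sketch of the difference, in case it helps. The class and constant names are made up for the example. The alternative to the @ sign is to double every backslash, which gets tiresome quickly in a regex.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    // Regular literal: every backslash has to be doubled.
    public const string Escaped = "\\d{4}";

    // Verbatim literal: the backslash is kept exactly as typed.
    public const string Verbatim = @"\d{4}";

    // A plain "\d{4}" with no @ and no doubling is rejected by the
    // compiler as an unrecognized escape sequence.

    static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);              // prints True: same characters
        Console.WriteLine(Regex.IsMatch("2009", Verbatim));  // prints True: four digits found
    }
}
```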

The more eagle-eyed of you may notice that this example comes from an exercise in the MCTS Microsoft .Net Framework - Application Development Foundation book. Don't worry, I composed the regexes myself and there may well have been a better way of doing it.