
Multiline regex pattern

Task: Parse a file and capture whatever text appears between a pair of double quotes like the following:

"Catch me"

This is not difficult. You could use the following regex:

".*"

This will match any characters that appear between the double quotes.
But is it really “any”?
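
As a quick check of the single-line case, here is a minimal Python sketch. The sample sentence is made up for illustration and is not part of the original task:

```python
import re

# Hypothetical sample text with one quoted span on a single line.
text = 'He said "Catch me" and left.'

# ".*" matches a double quote, any run of non-newline characters,
# and a closing double quote.
match = re.search(r'".*"', text)
print(match.group(0))  # "Catch me"
```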

Not quite. Suppose you have to deal with text that spans several lines (that is, text containing CR / LF characters), as in the following:

"Catch me
if you can"

The . special character, usually described as “any character”, in fact means “any character except newlines”, so it won’t work in our case. You also can’t put it inside a character set such as [.\s], hoping that it would mean:
. match any character
\s plus any whitespace character (including newlines)
The problem is that inside [] special characters like . or * or ? lose their meaning and are treated as literals.
Python addresses this issue with the re.DOTALL option, which makes the dot match really any character, including newlines.
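
The three cases above can be sketched in Python as follows. The variable names are only illustrative:

```python
import re

text = '"Catch me\nif you can"'

# Without flags, "." stops at the newline, so no quoted span is found.
no_flag = re.search(r'".*"', text)
print(no_flag)  # None

# Inside a character set the dot is a literal ".", so [.\s] only
# matches dots and whitespace, never letters.
char_set = re.search(r'"[.\s]*"', text)
print(char_set)  # None

# With re.DOTALL the dot also matches "\n".
dotall = re.search(r'".*"', text, re.DOTALL)
print(dotall.group(0) == text)  # True
```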
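If you are working in another language whose regex library lacks such an option, you can use the following trick. (.NET, used in the C# example below, does offer RegexOptions.Singleline for the same purpose, but the trick works there too.)
"[\w\W]*"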

The [\w\W] means: “match any word character or any non-word character”. You could get the same effect by combining other special characters, but I find this way especially clear: since you are joining two complementary sets, there is no doubt that you will truly match any character.
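
A small sketch of the same trick in Python, where no flag is needed because \w and \W together cover every character, including the newline:

```python
import re

text = '"Catch me\nif you can"'

# [\w\W] is the union of two complementary sets, so it matches
# any character at all, newlines included.
match = re.search(r'"[\w\W]*"', text)
print(match.group(0) == text)  # True
```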

Making things more complex

If you have a file like this:

"Catch me
if you can"
"other line"
"and the last one"

The regex will match everything from the first double quote to the last one.

When you use * you are saying “match zero or more occurrences of the preceding element”. Because * is a greedy quantifier, the regex engine will choose the longest possible match. To solve this, use the non-greedy quantifier *? as in this regex:
"[\w\W]*?"
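
To see the difference between the two quantifiers, here is a minimal Python comparison on the four-line sample, using re.findall to collect every match:

```python
import re

text = '"Catch me\nif you can"\n"other line"\n"and the last one"'

# Greedy: one match running from the first quote to the last one.
greedy = re.findall(r'"[\w\W]*"', text)
print(len(greedy))  # 1

# Non-greedy: each match stops at the nearest closing quote.
lazy = re.findall(r'"[\w\W]*?"', text)
print(len(lazy))  # 3
```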
To capture everything between the double quotes (but not the quotes themselves) in a group, so that you can, for instance, iterate over the results, just add parentheses inside the quotes:

"([\w\W]*?)"
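
As a short illustration in Python: when the pattern contains exactly one group, re.findall returns only the group contents, so the surrounding quotes are dropped.

```python
import re

text = '"Catch me\nif you can"\n"other line"\n"and the last one"'

# With a single capturing group, findall returns the captured text only.
captured = re.findall(r'"([\w\W]*?)"', text)
for item in captured:
    print("Captured: %r" % item)
```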


Code examples

In C# (Visual C# 2010) we need to do the following:


using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
	class TestRegularExpressions
	{
		static void Main()
		{
			// Doubled "" escape a double quote inside a verbatim string
			// "?<quoted_line>" gives the capture group a simple name
			// @ marks a verbatim string literal, so C# does not process backslash escapes (convenient when writing regex patterns)
			string pattern = @"""(?<quoted_line>[\w\W]*?)""";

			Regex regex = new Regex(pattern);

			string text = new System.IO.StreamReader(@"c:\Users\adrian\test.txt").ReadToEnd();
			/*
			* Suppose that c:\\Users\\adrian\\test.txt has the following content:
			*
			"Catch me
			if you can"
			"other line"
			"and the last one"
			*/

			Match m = regex.Match(text);

			// iterate over all the matches
			while (m.Success)
			{
				Console.WriteLine("Captured line: " + m.Groups["quoted_line"]);
				m = m.NextMatch();
			}

			Console.WriteLine();

		}
	}
}

That will print:
Captured line: Catch me
if you can
Captured line: other line
Captured line: and the last one


Of course, in Python you need much less effort to get the same result.

'''
Created on 25/12/2009

capturing regex groups example

@author: adrian
'''

import re

pattern = r'"(?P<quoted_line>.*?)"'

text = """"Catch me
if you can"
"other line"
"and the last one" """

# Retrieve group(s) by name
for m in re.finditer(pattern, text, re.DOTALL):
    print("Captured line: %s " % m.group("quoted_line"))

The output is the same as before:

Captured line: Catch me
if you can
Captured line: other line
Captured line: and the last one

Note the differences between Python and C#:

  • As mentioned previously, in Python you can use re.DOTALL to make the dot match newlines as well.
  • To name a group quoted_line, in Python you write ?P<quoted_line> instead of the C# version ?<quoted_line>
  • You write less and get more!