It should be easy. Let's say we want to find all the scripts on a web page with a regular expression like this:
<script.*</script>
This generally works, but only on the assumption that the page contains a single script tag. If there are more, for example:
<html><head><script>first script</script></head><body>example body<script>second script</script></body></html>
the result of the match is:
<script>first script</script></head><body>example body<script>second script</script>
instead of the expected:
<script>first script</script>
The reason is the greedy nature of the .* quantifier: it matches as much text as possible, so the match runs from the first <script up to the last </script> on the page.
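The greedy behaviour is easy to reproduce. Here is a minimal sketch in Python, assuming the standard re module and the example page from above stored in a variable named html (the variable names are only illustrative):

```python
import re

# The example page from above, kept on a single line.
html = ("<html><head><script>first script</script></head>"
        "<body>example body<script>second script</script></body></html>")

# Greedy: .* grabs as much as it can, so the match ends at the LAST </script>.
greedy_match = re.search(r"<script.*</script>", html)
print(greedy_match.group(0))
# <script>first script</script></head><body>example body<script>second script</script>
```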
The solution is to use the non-greedy quantifier .*?, which matches as little text as possible.
So the regular expression should look like this:
<script.*?</script>
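As a quick check, here is a small Python sketch of the non-greedy pattern applied to the same example page. It assumes the standard re module; the names html and scripts are just placeholders for illustration:

```python
import re

# The example page from above, kept on a single line.
html = ("<html><head><script>first script</script></head>"
        "<body>example body<script>second script</script></body></html>")

# Non-greedy: .*? stops at the FIRST </script> it can reach,
# so findall returns each script element separately.
scripts = re.findall(r"<script.*?</script>", html)
print(scripts)
# ['<script>first script</script>', '<script>second script</script>']
```

Note that by default . does not match a newline, so a script body that spans several lines would also need the re.DOTALL flag passed to re.findall.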
Thanks to the Regular Expression HOWTO for explaining this.