How to write a find-all function (with regex) in awk or sed?


I have a bash function that runs Python and prints every match of a regex found on stdin:

    function find-all() {
        python -c "import re
    import sys
    print '\n'.join(re.findall('$1', sys.stdin.read()))"
    }

When I use it like this:

    find-all 'href="([^"]*)"' < index.html

it should return the first group of the regex (the value of each href attribute in the file index.html).

How can I write this in sed or awk?

Tags: python bash sed awk

asked Sep 14 '10 at 9:29 by jcubic, edited Sep 15 '10 at 8:23

The regex support in sed, awk and grep differs from the Perl-like regular expressions in Python. Why do you want to reimplement this using GNU utilities? – MattH Sep 14 '10 at 9:53

I suggest you use grep -o.

    -o, --only-matching
        Show only the part of a matching line that matches PATTERN.

E.g.:

    $ cat > foo
    test test test test bar baz test
    $ grep -o test foo
    test
    test
    test
    test
    test

Update: If you were extracting href attributes from HTML files, using a command like:

    $ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html
    href="style.css"
    href="iehacks.css"
    href="old/"

You could extract the values by using cut and sed like this:

    $ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html | cut -f2 -d'=' | sed -e 's/"//g'
    style.css
    iehacks.css
    old/

But you'd be better off using HTML/XML parsers for reliability.
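As a variation on the cut/sed pipeline above, the value can also be pulled out with a single sed back-reference, which avoids truncating values that themselves contain an `=` (for example `page?id=3`). This is only a minimal sketch, assuming a grep that supports `-o`; the function name `href_values` and the sample input are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every href="..." attribute read from stdin.
href_values() {
    # grep -o puts each match on its own line; sed then keeps only the text
    # between the quotes via the \1 back-reference.
    grep -o 'href="[^"]*"' | sed 's/^href="\(.*\)"$/\1/'
}

# Sample input, invented for this example.
printf '%s\n' '<a href="style.css">x</a> <a href="page?id=3">y</a>' | href_values
```

Because sed strips only the fixed `href="` prefix and the closing quote, whatever sits between the quotes is printed unchanged. The same caveat about real HTML parsers still applies.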

It works fine, but when I use grep -o -E 'href="([^"]*)"' it returns the whole matched string, not the first group (from the parentheses). – jcubic Sep 14 '10 at 10:34

Yes, it will. You didn't mention that as a requirement. What are your requirements? – MattH Sep 14 '10 at 11:00

Here's a gawk implementation (not tested with other awks):

find_all.sh

    awk -v "patt=$1" '
      function find_all(str, patt) {
        while (match(str, patt, a) > 0) {
          for (i=0; i in a; i++) print a[i]
          str = substr(str, RSTART+RLENGTH)
        }
      }
      $0 ~ patt {find_all($0, patt)}
    ' -

Then:

    echo 'asdf href="href1" asdf asdf href="href2" asdfasdf asdfasdfasdfasdf href="href3" asdfasdfasdf' | find_all.sh 'href="([^"]+)"'

outputs:

    href="href1"
    href1
    href="href2"
    href2
    href="href3"
    href3

Change i=0 to i=1 if you only want to print the captured groups. With i=0 you'll get output even if you have no parentheses in your pattern.
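The loop above relies on gawk's three-argument match(). For awks without that extension, the same scan-and-advance idea can be sketched with the two-argument match(), at the cost of printing only the whole match (no captured groups). The function name `find_matches` and the sample input are hypothetical. The sketch also stops scanning a line when the pattern matches the empty string, since the loop would otherwise never advance.

```shell
#!/bin/sh
# Hypothetical portable variant: print every whole match of the ERE in $1, one per line.
# Note: awk -v processes backslash escapes in the value, so patterns containing
# backslashes would need extra escaping.
find_matches() {
    awk -v patt="$1" '
    {
        str = $0
        while (match(str, patt) > 0) {
            if (RLENGTH == 0) break          # guard: an empty match would never advance
            print substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH)
        }
    }'
}

# Sample input, invented for this example.
printf '%s\n' 'asdf href="href1" asdf href="href2"' | find_matches 'href="[^"]+"'
```

This behaves like grep -o for a single pattern. To get just the attribute value, the output could be piped through sed or cut as in the grep answer.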

