How do I crop a very large text file between the first and last occurrence of a string (Linux)?


The following script will do all of what you asked, including the bonus. Put this script in the top-level directory that contains all the possible files with the 'uid' you want to crop. The script recursively searches this directory, crops every file that matches, and puts the result in a new file with a .crp extension appended (see the example below). I took special care to make sure that this script works with whatever filename you throw at it, whether it contains spaces, newlines, or anything else.

    #!/bin/bash
    uid="1111-ABCD-1111-SOME-GUID"

    while IFS= read -r -d $'\0' file; do
        # delete everything after the last match, then everything before
        # the first match, then print what is left
        printf "%s\n" "?$uid?+1,\$d" "1,/$uid/-1d" "%p" | ex -s "$file" > "$file".crp
        echo "$file being cropped"
    done  # the input redirection is missing from this post; the loop expects a
          # NUL-delimited list of the files that contain $uid

Example run:

    $ ./uid.sh
    ./sample1.txt being cropped
    ./subdir/sample2.txt being cropped

    $ cat ./sample1.txt.crp
    line three containing my session id: 1111-ABCD-1111-SOME-GUID blaa blaa blaa
    line four containing other session id: 2222-ABCD-1111-SOME-GUID
    line five blaa blaa blaa
    line six containing other session id: 3333-ABCD-1111-SOME-GUID blaa blaa blaa
    line seven containing other session id: 2222-ABCD-1111-SOME-GUID
    line eight containing my session id: 1111-ABCD-1111-SOME-GUID blaa blaa blaa
    line nine containing other session id: 3333-ABCD-1111-SOME-GUID
    line ten containing my session id: 1111-ABCD-1111-SOME-GUID
    line eleven
    line twelve containing other session id: 3333-ABCD-1111-SOME-GUID blaa blaa blaa
    line thirteen containing my session id: 1111-ABCD-1111-SOME-GUID

    $ cat ./subdir/sample2.txt.crp
    line three containing my session id: 1111-ABCD-1111-SOME-GUID blaa blaa blaa foo bar
    line eight containing my session id: 1111-ABCD-1111-SOME-GUID blaa blaa blaa baz
    line ten containing my session id: 1111-ABCD-1111-SOME-GUID

As you can see in the example above, my script found two files that matched, one of which was in a sub-directory below the top-level directory.

SessionId is like a guid and appears many times in the file. – TiGz Dec 16 '09 at 10:39

That doesn't help. I need to see an example of what a sessionid can look like and where it can change and how. – SiegeX Dec 16 '09 at 10:48

Well, an example is like: 4934FF07-436E-8D2A-C7C2-A3328B371005_1260470734931_143, however I don't see how that helps. Bear in mind that the log file contains many different session ids from many different sessions, but at any particular time I am only interested in a particular instance, i.e. the sid needs to be an input to the script (however, I can do that manually if needs be). – TiGz Dec 16 '09 at 10:56

@SiegeX: I may be wrong, but I think TiGz means he would like the regex to match the files to check, not the sessionId within the files. – Grundlefleck Dec 16 '09 at 11:23

I misunderstood your original 'bonus' question; I had thought that you wanted to use a regex to pick out the sessionid, not which file it was in. Question for you: can the relevant sessionid be in a single file more than two times? Meaning I would need to skip past all the middle occurrences and proceed to crop until the very last occurrence? – SiegeX Dec 16 '09 at 11:27

I'd probably do this using cat and awk. Something like:

    cat *.log | awk 'BEGIN { sidFound = 0; } { if (*check for SID here*) { sidFound = !sidFound; } if (sidFound) { print $0 } }'

cat is not needed. – ghostdog74 Dec 16 '09 at 12:46

He mentions in the question that he may need to scan multiple log files for this Session ID. cat provides an easy way of scanning multiple files at once. – Adam Luchjenbroers Dec 16 '09 at 12:50

awk can take in file input as well. --> awk '{blah blah}' *.log – ghostdog74 Dec 16 '09 at 13:14

Either a few lines of Perl, or:

    grep -no

(make a note of the first and last line numbers with your session ID on)

    awk 'NR==3,NR==935'

(where 3 and 935 are the first and last line numbers returned from the grep command)

I can't currently think of a way to make that a one-liner.

This is definitely not what he is looking for. – ghostdog74 Dec 16 '09 at 13:15

I'd propose something like this:

    # SESSION_ID and INPUT_FILE are placeholder names; the original
    # arguments were lost from this post
    # Find all occurrences of the session id in the input file
    grep -n "$SESSION_ID" "$INPUT_FILE" > /tmp/grep.$$
    # Get the line number of the first appearance of the session id
    FIRST_LINE=$(head -1 /tmp/grep.$$ | cut -d: -f1)
    # Get the line number of the last appearance of the session id
    LAST_LINE=$(tail -1 /tmp/grep.$$ | cut -d: -f1)
    # Display only the part (inclusive) between the first and last session id
    sed -n "${FIRST_LINE},${LAST_LINE}p" "$INPUT_FILE"

This retrieves the line numbers of the first and last occurrences of your pattern in the input file and then uses sed to display only the lines between them (inclusive). It can be optimised (grepping only once), but it should work.
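As a rough sketch of the "grepping only once" idea: the function below runs grep a single time, keeps the line numbers in a variable instead of a temporary file, and tells sed to quit right after the last matching line. The name crop_between and its argument order are made up for illustration, and it assumes the session id is a plain string rather than a regex (hence grep -F).

```bash
# crop_between SID FILE  (hypothetical helper, not from the answer above)
# Prints FILE from the first to the last line containing SID, inclusive.
crop_between() {
    local sid file lines first last
    sid=$1
    file=$2
    # -n prefixes each match with its line number, -F takes SID literally
    lines=$(grep -nF -- "$sid" "$file" | cut -d: -f1)
    # No match at all: print nothing and report failure
    [ -n "$lines" ] || return 1
    first=$(printf '%s\n' "$lines" | head -n 1)
    last=$(printf '%s\n' "$lines" | tail -n 1)
    # Print the range, then quit so sed does not read the rest of the file
    sed -n "${first},${last}p;${last}q" "$file"
}
```

Something like `crop_between 1111-ABCD-1111-SOME-GUID sample1.txt > sample1.txt.crp` would then write the cropped copy. The file is still read twice (once by grep, once by sed), but the second read stops at the last match.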

There's no need to open the input file 3 times. – ghostdog74 Dec 16 '09 at 15:11

The first two openings can be merged if you keep the result of grep in a temporary file, ok. However, with linear stream processor programs, I don't see how you can know in advance whether the current line must be printed or not, i.e. whether there's another occurrence of the session id later in the file. Keep in mind that it's the same pattern which starts AND ends the portion to dump, as far as I understood. – Zeograd Dec 16 '09 at 15:38

For this question, one has to have better control over file manipulation, like opening the file, searching through it, storing lines in temp memory, printing out when necessary, etc. – ghostdog74 Dec 16 '09 at 15:58
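A single pass does seem possible if the lines after each match are held back until the next match shows up, which is roughly the "storing lines in temp memory" idea. A minimal sketch with awk, assuming the session id is a plain string without backslashes (awk -v interprets escape sequences); crop_stream is a hypothetical name:

```bash
# crop_stream SID [FILE...]  (hypothetical helper; reads stdin if no FILE)
# Prints the input from the first to the last line containing SID.
crop_stream() {
    local sid
    sid=$1
    shift
    awk -v sid="$sid" '
        index($0, sid) {            # this line contains the session id
            printf "%s", pending    # held lines are now known to be inside the range
            pending = ""
            print
            seen = 1
            next
        }
        # After the first match, hold each line until another match confirms it
        seen { pending = pending $0 ORS }
    ' "$@"
}
```

Lines before the first match are dropped straight away, and whatever is still held at end of input (everything after the last match) is never printed. The catch is memory: the whole tail after the last match sits in the buffer until the end, so on a very large file the two-pass grep and sed approach may be the safer choice. Several files given at once are treated as one continuous stream.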

The following Perl script (session_id.pl) does the job:

    #!/usr/bin/perl
    my $session_id = '1111-ABCD-1111-SOME-GUID';
    while (<>) {
        if ( /$session_id/ ... /$session_id/ ) {
            print;
        }
    }

Make it executable and run it:

    ./session_id.pl

What about:

    sed -n "/$session_id/,/$session_id/p" file.txt
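One thing worth checking with a range like this: sed closes the range at the next matching line and opens a new one at the match after that, so the result depends on how many times the id occurs. The small demo below uses a made-up id (SID) and made-up sample lines with three occurrences to show the effect.

```bash
# demo_sed_range: hypothetical demo, feeds seven sample lines through the
# same kind of sed range as above.
demo_sed_range() {
    local session_id
    session_id=SID
    printf '%s\n' before 'first SID' middle 'second SID' gap 'third SID' after |
        sed -n "/$session_id/,/$session_id/p"
}
```

Running `demo_sed_range` prints "first SID", "middle", "second SID", "third SID" and "after": the line "gap" between the second and third occurrence is dropped, and the third occurrence starts a range that never closes, so it runs to the end of the input. With exactly two occurrences the command does what was asked.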

