Select up to N (random, or the first N) rows for each unique value of a column using unix or awk (no sql)?

This will return a constant number of rows (two in this case) for each unique value in column 2, but I'm pretty sure this isn't quite what you expected. Your input data is in the file 'test.txt'.

    $ sort -k2 -u test.txt > a.tmp; sort a.tmp a.tmp
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    6 00108.padded.fasta 2348
    6 00108.padded.fasta 2348

It's not clear what you expect if your input has only one row for a given unique value in column 2. If you still want two rows in the output, then this will work.

Thanks for this. To answer your question: I should have said I want to extract up to N rows for each given unique value in column 2, i.e. if N=2 and there is only one row for a given unique value in column 2, I would like to output this only one row. – caroleS Apr 20 at 4:47
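For that 'up to N' behaviour, a per-key counter in awk is one possible sketch. Assuming the same test.txt and N=2 used elsewhere on this page, this keeps the first N rows seen for each unique value in column 2:

    # print a row only while fewer than n rows have been seen for its column-2 value
    awk -v n=2 'count[$2]++ < n' test.txt

Groups with fewer than N rows are printed in full, because their counter never reaches n.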

    #!/bin/bash
    # tested with bash 4
    declare -A assoc
    declare -a count
    while read -r line
    do
        array=($line)
        assoc[${array[0]}]+="${array[*]}|"   # append the whole row to this key's list
    done < file
    n=2
    for key in "${!assoc[@]}"
    do
        # split the stored rows back apart and print the first n of them
        IFS='|' read -r -a count <<< "${assoc[$key]%|}"
        printf '%s\n' "${count[@]:0:n}"
    done

    $ bash N.sh
    6 00108.padded.fasta 2348
    6 00108.padded.fasta 2348
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986

Thanks for this. I can't make this work. I replaced 'file' by my input file, but if I run it as row.sh > output.txt it gives an empty output file. Moreover, I added a precision in my question: if N=2 but for one given unique value in column 2 I have only one row, it should output this single row. So it is 'up to N' rather than N. Thanks for your help – caroleS Apr 20 at 7:09
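The question title also allows picking N random rows per value rather than the first N. One way to sketch that, assuming GNU shuf is available and reusing the test.txt name and N=2 from this page, is to shuffle the input before applying a per-key counter:

    # shuffle the rows, then keep the first n that turn up for each column-2 value
    shuf test.txt | awk -v n=2 'count[$2]++ < n'

Where shuf is not installed, GNU sort -R gives a similar effect.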

Here is a small script for your purpose:

    #!/usr/bin/ksh
    awk '{ print $0 > ($2 ".yourfile") }' yourfile
    for i in *.yourfile
    do
        awk 'NR <= 2' "$i"
    done
    rm *.yourfile
    if [ $? -eq 0 ]
    then
        echo "remove temp files successful"
    fi

    $ ./script.sh
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    6 00108.padded.fasta 2348
    remove temp files successful
    torinoco! DBL:/oo_dgfqausr/test/dfqwrk12/vijay

Thanks for this script. I tried it but it did not work. Must have done something wrong, but I am a beginner so... here is what I ran: awk '{ print $0 >$2 }' row.txt, then for i in row.txt do awk 'NR … -eq 0 then echo "remove temp files successful" fi. I just obtained the same info as in the input file, so no selection of 4 rows for each given unique value in column 2. – caroleS Apr 20 at 16:45
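For reference, a minimal sketch of how the temp-file idea above might be adapted to the setup described in this comment; the input name row.txt and the value 4 are taken from the comment, and the .tmp suffix is arbitrary:

    # write one temp file per unique column-2 value
    awk '{ print $0 > ($2 ".tmp") }' row.txt
    # print the first 4 rows of each group, then remove the temp files
    for f in *.tmp
    do
        awk 'NR <= 4' "$f"
    done
    rm -f *.tmp

The loop needs to run over the generated per-group files (*.tmp here) rather than over the original input file, so that the second awk sees each group separately.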

This is my file test.awk:

    $1 >= n { $1 = n; }
    $1 > 0  { while ($1-- > 0) print; }

This is my test file test.txt:

    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    6 00108.padded.fasta 2348
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    1 00001.padded.fasta 1000

And this is the command line that gives you up to 'n' lines of output:

    $ sort test.txt | uniq -c | awk -v n=2 -f test.awk | cut -f 1 -d " " --complement
    1 00001.padded.fasta 1000
    3 00017.padded.fasta 1769
    3 00017.padded.fasta 1769
    5 00059.padded.fasta 2986
    5 00059.padded.fasta 2986
    6 00108.padded.fasta 2348

To change the number of lines, change the value assigned to 'n': n=4, n=3, etc.
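To see why this works, it may help to look at the intermediate stage. With the test file above, sort test.txt | uniq -c should produce roughly the following (exact spacing of the counts may differ); test.awk then caps the leading count at n, prints each line that many times, and cut strips the count field:

    $ sort test.txt | uniq -c
          1 1 00001.padded.fasta 1000
          4 3 00017.padded.fasta 1769
          6 5 00059.padded.fasta 2986
          1 6 00108.padded.fasta 2348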
