Is SQL GROUP BY a design flaw?

You don't have to group by the exactly the same thing you're selecting, e.g.

You don't have to group by the exactly the same thing you're selecting, e.g. : SQL:select priority,count(*) from rule_class group by priority PRIORITY COUNT(*) 70 1 50 4 30 1 90 2 10 4 SQL:select decode(priority,50,'Norm','Odd'),count(*) from rule_class group by priority DECO COUNT(*) Odd 1 Norm 4 Odd 1 Odd 2 Odd 4 SQL:select decode(priority,50,'Norm','Odd'),count(*) from rule_class group by decode(priority,50,'Norm','Odd') DECO COUNT(*) Norm 4 Odd 8.

1 Of course. All non-aggregate attributes in the SELECT must appear in the GROUP BY but not all attributes in the GROUP BY must appear in the SELECT. I feel dumb for not checking the converse.

But there is still a case for making GROUP BY optional in the presence of a mix of aggregate and non-aggregate SELECT columns, a common use case. I really hate the mysql behavior of just randomly choosing a column when the SELECT attribute isn't accounted for in the GROUP BY – Samuel Danielson Feb 22 '10 at 16:37 Er, I mean when mysql just chooses an arbitrary tuple. – Samuel Danielson Feb 22 '10 at 16:45.

I agree its verbose that the group by list shouldn't implicitly be the same as then non-aggregated select columns. In Sas there are data aggregation operations that are more succinct. Also : it's hard to come up with an example where it would be useful to have a longer list of columns in the group list than the select list.

The best I can come up with is ... create table people ( Nam char(10) ,Adr char(10) ) insert into people values ('Peter', 'Tibet') insert into people values ('Peter', 'OZ') insert into people values ('Peter', 'OZ') insert into people values ('Joe', 'NY') insert into people values ('Joe', 'Texas') insert into people values ('Joe', 'France') -- Give me people where there is a duplicate address record select * from people where nam in ( select nam from People group by nam, adr -- group list different from select list having count(*) > 1 ).

If you issue just regarding to easier way to write scripts. Here is one tip: In MS SQL MGMS write you query in text something like select * from my_table after that select text right click and "Design Query in Editor.." Sql studio will open new editor with filed up all fields after that again right click and select "Add Gruop BY" Sql MGM studio will add code for you . I fund this method extremely useful for insert statements.

When I need to write script for insert a lot of fields in table, I just do select * from table_where_want_to_insert and after that change type of select statement to insert.

I would say if you have a large number of items in the group by clause then perhaps the core info should be pulled out into a tabular sub-query which you inner join into. There is a probably a performance hit, but it makes for neater code. Select id, count(a), b, c, d from table group by id, b, c, d becomes select id, myCount, b, c, d from table t inner join ( select id, count(*) as myCount from table group by id ) as myCountTable on myCountTable.Id = t.

Id That said, I'm interested to hear counter-arguments for doing this as opposed to having a large group by clause.

It makes for performance improvements too, as it typically means less sort/temp/work space being used. – araqnid Mar 15 at 14:17.

There is one more reason for why does SQL requires that I specify on which attributes to group. Lets sat we have two simple tables: friend and car, where we store info about our friends and their cars. And lets say we want to show all our friends's data (from table friend) and for everyone of our friends, how many cars they own now, have sold, have crashed and the total number.

Oh, and we want the elders first, younger last. We'd do something like: SELECT f. Id , f.

Firstname , f. Lastname , f. Birthdate , COUNT(NOT c.

Sold AND NOT c. Crashed) AS owned , COUNT(c. Sold) AS sold , COUNT(c.

Crashed) AS crashed , COUNT(c. Friendid) AS totalcars FROM friend f LEFT JOIN car c Id = c. Friendid GROUP BY f.

Id , f. Firstname , f. Lastname , f.

Birthdate ORDER BY f. Birthdate DESC But do we really need all those fields in the GROUP BY? Isn't every friend uniquely determined by his id?

In other words, aren't the firstname, lastname and birthdate functionally dependend on the f.Id? Why not just do (as we can in MySQL): SELECT f. Id , f.

Firstname , f. Lastname , f. Birthdate , COUNT(NOT c.

Sold AND NOT c. Crashed) AS owned , COUNT(c. Sold) AS sold , COUNT(c.

Crashed) AS crashed , COUNT(c. Friendid) AS totalcars FROM friend f LEFT JOIN car c Id = c. Friendid GROUP BY f.

Id ORDER BY f. Birthdate And what if we had 20 fields in the SELECT (plus ORDER BY) parts? Isn't the second query shorter, clearer and probably faster (in the RDBMS that accept it)?

I say yes.So, do the SQL 1999 and 2003 specs say, if this article is correct: Debunking group by myths.

1 The most recent SQL standards say, in effect, that you can include columns in the SELECT list that aren't in the GROUP BY list if and only if those columns are functionally dependent on columns that are in the GROUP BY list. The objections to MySQL usually have to do with its allowing you to include columns in the SELECT list that aren't functionally dependent on any columns in the GROUP BY list. – Catcall Apr 10 at 0:09 @Catcall: I agree on that.

MySQL's deviation from this and the "random" rows it fetches when one uses this "feature", opens the door to problems. – ypercube Apr 10 at 7:14.

This thread provides some useful explanations. social.msdn.microsoft.com/Forums/en/tran....

I'd say it is more likely to be a language design choice that decisions be explicit, not implicit. For instance, what if I wish to group the data in a different order than that in which I output the columns? Or if I want to group by columns that aren't included in the columns selected?

Or if I want to output grouped columns only and not use aggregate functions? Only by explicitly stating my preferences in the group by clause are my intentions clear. You also have to remember that SQL is a very old language (1970).

Look at how Linq flipped everything around in order to make Intellisense work - it looks obvious to us now, but SQL predates IDEs and so couldn't have taken into account such issues.

1 for mentioning the importance of the ordering of the fields in an ORDER BY. A ... GROUP BY a, be may deliver different results when compared to ... GROUP BY b, a – Dirk May 18 at 22:01.

I Agree I quite agree with the question. I asked the same one here. I honestly think it's a language flaw.

I realise that there are arguments against that, but I have yet to use a GROUP BY clause containing anything other than all the non-aggregated fields from the SELECT clause in the real world.

The "superflous" attributes determine the ordering of the result. Consider create table gb ( a number, be varchar(3), c varchar(3) ); insert into gb values ( 3, 'foo', 'foo'); insert into gb values ( 1, 'foo', 'foo'); insert into gb values ( 0, 'foo', 'foo'); insert into gb values ( 20, 'foo', 'bar'); insert into gb values ( 11, 'foo', 'bar'); insert into gb values ( 13, 'foo', 'bar'); insert into gb values ( 170, 'bar', 'foo'); insert into gb values ( 144, 'bar', 'foo'); insert into gb values ( 130, 'bar', 'foo'); insert into gb values (2002, 'bar', 'bar'); insert into gb values (1111, 'bar', 'bar'); insert into gb values (1331, 'bar', 'bar'); This statement select sum(a), b, c from gb group by b, c; results in 44 foo bar 444 bar foo 4 foo foo 4444 bar bar while this one select sum(a), b, c from gb group by c, b; results in 444 bar foo 44 foo bar 4 foo foo 4444 bar bar.

4 The only thing that determines ordering is an ORDER BY clause. If you don't explicitly state the desired order the data should be returned, the database can order it however is most convenient to it. This will vary from database to database.

Without an ORDER BY clause, any appearance of order is entirely coincidental. – Chris Latta Feb 22 '10 at 14:03 Yes, thats right. I should have written: influences – René Nyffenegger Feb 22 '10 at 15:08 -1 GROUP BY does not imply SORT.

Parallelism or hash aggregates could both lead to results not sorted in the way you appear to expect. Additionally in SQL Server (and quite possibly other RDBMSs too) if you GROUP BY PK, Other Functionally Dependant Cols it completely ignores these other columns in the GROUP BY operation they are just required in the GROUP BY to be syntactically valid. – Martin Smith Apr 9 at 23:29.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions