COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better?



I often find these three variants:

    SELECT COUNT(*) FROM Foo;
    SELECT COUNT(1) FROM Foo;
    SELECT COUNT(PrimaryKey) FROM Foo;

As far as I can see, they all do the same thing, and I find myself using the three in my codebase. However, I don't like to do the same thing different ways. To which one should I stick?

Is any one of them better than the two others?

Tags: sql, select, count. Asked Apr 26 '10 at 1:10 by zneak.

+1, I didn't even know SELECT COUNT(PrimaryKey) FROM Foo; was even an option. – Anthony Forloney Apr 26 '10 at 1:16

IMO, if you don't know the difference, pick one and stick with it. If you can't be right, at least be consistent. – Frank Farmer Apr 26 '10 at 1:18

@Anthony Forloney: let's make it clear that PrimaryKey refers to the name of your primary key field, and that it's not some magical keyword. – zneak Apr 26 '10 at 1:21

@zneak, Yeah, I realized that when MySQL threw me an error: Unknown column "primarykey" in 'field list'. Good job, me. – Anthony Forloney Apr 26 '10 at 1:24

Possible duplicate of stackoverflow.com/questions/1221559/count-vs-count1 – gbn Apr 26 '10 at 19:14

Use * for all your queries that need to count everything, even for joins:

    SELECT boss.boss_id, COUNT(subordinate.*)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

(PostgreSQL accepts COUNT(subordinate.*); several other systems, MySQL and SQL Server among them, reject it, and there you count a NOT NULL column of subordinate instead.)

But don't use a bare COUNT(*) for LEFT JOINs, as that will return 1 even if the subordinate table has no row matching the parent table:

    SELECT boss.boss_id, COUNT(*)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

Don't be fooled by those who claim that the * in COUNT fetches the entire row from your table and is therefore slow. The * in SELECT COUNT(*) and the * in SELECT * have no bearing on each other; they are entirely different things that just happen to share a common token, i.e. *.
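The LEFT JOIN pitfall above can be checked with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient engine; the table contents and the name column are made up for illustration, and a NOT NULL column of subordinate stands in for the less portable COUNT(subordinate.*).

```python
import sqlite3

# Hypothetical data: boss 1 has two subordinates, boss 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE subordinate (
        subordinate_id INTEGER PRIMARY KEY,
        boss_id INTEGER NOT NULL REFERENCES boss(boss_id)
    );
    INSERT INTO boss VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO subordinate VALUES (10, 1), (11, 1);
""")

rows = conn.execute("""
    SELECT boss.boss_id,
           COUNT(*)                   AS star_count,
           COUNT(subordinate.boss_id) AS subordinate_count
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id
    ORDER BY boss.boss_id
""").fetchall()

for boss_id, star_count, subordinate_count in rows:
    # COUNT(*) counts the joined row even when the subordinate side is all NULL.
    print(boss_id, star_count, subordinate_count)
```

For the boss without subordinates, COUNT(*) reports 1 (the NULL-extended row) while the column count reports 0.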

In fact, if it were not permitted to give a field the same name as its table, the RDBMS language designers could have given COUNT(tableNameHere) the same semantics as COUNT(*). For counting rows we could have this:

    SELECT COUNT(emp) FROM emp

And they could make it simpler:

    SELECT COUNT() FROM emp

And for LEFT JOINs, we could have this:

    SELECT boss.boss_id, COUNT(subordinate)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

But they cannot do that (COUNT(tableNameHere)), since the SQL standard permits naming a field with the same name as its table:

    CREATE TABLE fruit -- ORM-friendly name
    (
        fruit_id int NOT NULL,
        fruit varchar(50), /* same name as table name, and let's say someone forgot to put NOT NULL */
        shape varchar(50) NOT NULL,
        color varchar(50) NOT NULL
    )

Also, it is not good practice to make a field nullable. Say you have the values 'Banana', 'Apple', NULL, 'Pears' in the fruit field.

This will not count all fruits; it will yield only 3, not 4:

    SELECT count(fruit) FROM fruit

Some RDBMSs do follow that principle, though, and accept a table name as COUNT's parameter for counting the table's rows. This will work in PostgreSQL, as long as there is no subordinate field in either of the two tables below, i.e. as long as there is no name conflict between field name and table name:

    SELECT boss.boss_id, COUNT(subordinate)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

But that could cause confusion later if we add a subordinate field to the table, as it will then count the field (which could be nullable), not the table rows.
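The nullable fruit column can be reproduced in a few lines. This is a minimal sketch using Python's sqlite3 module; the shape and color values are invented filler, since the answer only specifies the four fruit values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit (
        fruit_id INTEGER NOT NULL,
        fruit    VARCHAR(50),           -- nullable, as in the answer
        shape    VARCHAR(50) NOT NULL,  -- values below are made up
        color    VARCHAR(50) NOT NULL
    );
    INSERT INTO fruit VALUES
        (1, 'Banana', 'long',  'yellow'),
        (2, 'Apple',  'round', 'red'),
        (3, NULL,     'round', 'orange'),
        (4, 'Pears',  'pear',  'green');
""")

# COUNT(*) counts rows; COUNT(fruit) skips the row whose fruit is NULL.
all_rows, named_rows = conn.execute(
    "SELECT COUNT(*), COUNT(fruit) FROM fruit"
).fetchone()
print(all_rows, named_rows)
```

The two counts differ by exactly the number of rows where fruit is NULL, which is why counting a nullable column is a poor substitute for counting rows.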

So to be on the safe side, use:

    SELECT boss.boss_id, COUNT(subordinate.*)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

EDIT

As for COUNT(1) in particular, it is a one-trick pony; it works well only on a single-table query:

    SELECT COUNT(1) FROM tbl

But when you use joins, that trick won't work on multi-table queries without its semantics being confused, and in particular you cannot write:

    -- count the subordinates that belong to boss
    SELECT boss.boss_id, COUNT(subordinate.1)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

So what's the meaning of COUNT(1) here?

    SELECT boss.boss_id, COUNT(1)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

Is it this...?

    -- counting all the subordinates only
    SELECT boss.boss_id, COUNT(subordinate.boss_id)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

Or this...?

    -- or does COUNT(1) also count 1 for a boss regardless of whether the boss has a subordinate
    SELECT boss.boss_id, COUNT(*)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

It isn't hard to infer (though some could be confused) that COUNT(1) is the same as COUNT(*) regardless of the type of join.
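To settle the "which one is it?" question empirically, the sketch below runs all three counts side by side over a LEFT JOIN. It assumes Python's sqlite3 module and a tiny hypothetical data set; other engines are expected to agree, but only SQLite is exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY);
    CREATE TABLE subordinate (
        subordinate_id INTEGER PRIMARY KEY,
        boss_id INTEGER NOT NULL REFERENCES boss(boss_id)
    );
    INSERT INTO boss VALUES (1), (2);                 -- boss 2 has no subordinates
    INSERT INTO subordinate VALUES (10, 1), (11, 1);
""")

rows = conn.execute("""
    SELECT boss.boss_id,
           COUNT(1)                   AS count_one,
           COUNT(*)                   AS count_star,
           COUNT(subordinate.boss_id) AS count_column
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id
    ORDER BY boss.boss_id
""").fetchall()

for row in rows:
    print(row)
```

COUNT(1) tracks COUNT(*) in every group, including the boss with no subordinates; only the column count drops to 0 there.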

But for a LEFT JOIN result, we cannot mold COUNT(1) to work as COUNT(subordinate.boss_id) or COUNT(subordinate.*). So just use either of the following:

    -- count the subordinates that belong to boss
    SELECT boss.boss_id, COUNT(subordinate.boss_id)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

    -- count the subordinates that belong to boss
    SELECT boss.boss_id, COUNT(subordinate.*)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id

Use either COUNT(field) or COUNT(*), and stick with it consistently. In short, don't use COUNT(1) for anything.

Thanks for the advice. Are there reasons behind those imperatives? – zneak Apr 26 '10 at 1:54

Imperatives regarding using * consistently? Yes, there are: to make things simpler and consistent. In fact, if the SQL standard ruled that a field could not have the same name as its table, there could be only two forms of COUNT, one being COUNT() and the other COUNT(tableOrAliasedTablenameHere); life could be simpler. If we think about it, we don't count fields, we count rows. And if the SQL standard ruled that there could be no nullable fields in the database, we really wouldn't need the COUNT(fieldnameHere) construct at all. – Michael Buen Apr 26 '10 at 2:13

If it's only for the sake of consistency, then you could also use COUNT(1) instead of COUNT(*). Is one of them better than the other? – zneak Apr 26 '10 at 2:53

COUNT(1) looks like a magic number, one that is used when someone already has a grasp of what is going on under the hood. It could lead to abuse (i.e. if there's malicious intent): since all of COUNT(0), COUNT(1), COUNT(2), COUNT(42) (you get the gist) are the same as COUNT(*), somebody could obfuscate the code and use COUNT(2), for example, so the next maintainer could have a hard time deducing what those COUNTs do. Someone will only start to use COUNT(1) when he/she already gleans that COUNT(1) is the same as COUNT(*). Nobody started their database career with COUNT(1). – Michael Buen Apr 26 '10 at 3:07

If it's for consistency's sake, COUNT(*) is better; it doesn't have any arcane semantics compared to COUNT(1). Everybody starts with SELECT * FROM tbl. And it naturally occurs to them (and the rest of us) that when they need to get that query's count, they will just wrap COUNT around the *. It won't be second nature to us to use COUNT(1). – Michael Buen Apr 26 '10 at 3:18
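The claim in the comments that any non-null constant behaves like COUNT(*) is easy to verify. A minimal sketch, assuming Python's sqlite3 module and a hypothetical three-row table named tbl:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY);
    INSERT INTO tbl VALUES (1), (2), (3);
""")

# Every argument here is a non-null constant, so each row is counted once.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(0), COUNT(1), COUNT(2), COUNT(42) FROM tbl"
).fetchone()
print(counts)
```

All five values come back equal to the row count, which is the obfuscation risk described above: the constant carries no meaning.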

Two of them always produce the same answer:

- COUNT(*) counts the number of rows
- COUNT(1) also counts the number of rows

Assuming that 'pk' is a primary key and that no nulls are allowed in the values, then

- COUNT(pk) also counts the number of rows

However, if 'pk' is not constrained to be not null, then it produces a different answer:

- COUNT(possibly_null) counts the number of rows with non-null values in the column possibly_null.
- COUNT(DISTINCT pk) also counts the number of rows (because a primary key does not allow duplicates).
- COUNT(DISTINCT possibly_null_or_dup) counts the number of distinct non-null values in the column possibly_null_or_dup.
- COUNT(DISTINCT possibly_duplicated) counts the number of distinct (necessarily non-null) values in the column possibly_duplicated when that has the NOT NULL clause on it.

Normally, I write COUNT(*); it is the original recommended notation for SQL. Similarly, with the EXISTS clause, I normally write WHERE EXISTS(SELECT * FROM ...) because that was the original recommended notation.

There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.
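The list of COUNT variants above can be exercised in one query. The sketch below assumes Python's sqlite3 module; the table name t and its four rows are hypothetical, and the column names are borrowed from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        pk                   INTEGER PRIMARY KEY,
        possibly_null        TEXT,
        possibly_null_or_dup TEXT,
        possibly_duplicated  TEXT NOT NULL
    );
    INSERT INTO t VALUES
        (1, 'a',  'x',  'p'),
        (2, NULL, 'x',  'p'),
        (3, 'c',  NULL, 'q'),
        (4, 'd',  'y',  'q');
""")

labels = [
    "COUNT(*)",
    "COUNT(1)",
    "COUNT(pk)",
    "COUNT(possibly_null)",
    "COUNT(DISTINCT pk)",
    "COUNT(DISTINCT possibly_null_or_dup)",
    "COUNT(DISTINCT possibly_duplicated)",
]
values = conn.execute("SELECT " + ", ".join(labels) + " FROM t").fetchone()
result = dict(zip(labels, values))

for label, value in result.items():
    print(f"{label:40} {value}")
```

With this data the row-counting forms all return 4, COUNT(possibly_null) returns 3, and the two DISTINCT counts over the duplicated columns return 2.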

I didn't even know COUNT(DISTINCT) worked, though it makes sense. Is it specific to a SQL flavor, or is it widely supported? – zneak Apr 26 '10 at 1:36

@zneak: COUNT(DISTINCT x) has been in SQL since SQL-86 (the first standard), so I would be surprised to find any SQL DBMS that did not support it. – Jonathan Leffler Apr 26 '10 at 1:40

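This will depend on the type of database you are using, as well as the type of table in some cases. For example, in MySQL, count(*) will be fast on a MyISAM table but slow on an InnoDB table. Under InnoDB you should use count(1) or count(pk). (The MySQL reference manual, however, states that InnoDB handles COUNT(*) and COUNT(1) the same way, with no performance difference.)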

Interesting. Is there any benchmark on the matter you could link to? – zneak Apr 26 '10 at 1:26

Sorry, not aware of any specific benchmarks. It's just a commonly known "issue" when dealing with MySQL. A Google search should point at numerous articles discussing it. Basically, MyISAM stores the total number of rows with the table information whilst InnoDB doesn't. I believe count(*) can be fast for InnoDB under certain situations, but I find it safer to always use count(key): no surprise performance issues that way. – Jarod Elliott Apr 26 '10 at 1:34

count(key) can have surprise performance issues (unique keys and non-primary foreign keys can be nullable); look at Mitch Wheat's answer. Just think of COUNT(*) as a C language function COUNT(), i.e. the * is ignored. If there is a performance difference between COUNT(*) and COUNT(key) on a non-optimizing RDBMS, I think it's the latter that would suffer, since the parameter key in COUNT(key) gets evaluated on each row, while the * is just a directive for the RDBMS to fetch the metadata for the count, or the cardinality of the table. Read Mitch Wheat's answer. – Hao Apr 26 '10 at 6:19

COUNT(*) and COUNT(1) are actually subtly different (though they do the same thing). COUNT(*) is the cardinality of the table (i.e. the number of rows), and COUNT(exp) is the count of the non-NULL occurrences of the expression, which happens to be a constant in this case. I prefer COUNT(*).

At least on Oracle they are all the same: oracledba.co.uk/tips/count_speed.htm.

That's good news. – zneak Apr 26 '10 at 1:22.

Asked and answered before...

Books Online says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )".

"1" is a non-null expression, so it's the same as COUNT(*). The optimiser recognises it as trivial and so gives the same plan. A PK is unique and non-null (in SQL Server at least), so COUNT(PK) = COUNT(*).

This is a similar myth to EXISTS (SELECT * ... or EXISTS (SELECT 1 ...

And see the ANSI 92 spec, section 6.5, General Rules, case 1:

a) If COUNT(*) is specified, then the result is the cardinality of T.

b) Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning- null value eliminated in set function.
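Rule (b) says the argument is evaluated per row and nulls are dropped before counting. The sketch below illustrates that with Python's sqlite3 module; the table t and its nullable score column are hypothetical, and SQLite is only one engine, so treat this as an illustration of the rule rather than a conformance test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, score INTEGER);
    INSERT INTO t VALUES (1, 10), (2, NULL), (3, 30);
""")

star, one, column, expression, null_literal = conn.execute("""
    SELECT COUNT(*),          -- cardinality of the table
           COUNT(1),          -- constant is never NULL, so every row counts
           COUNT(score),      -- the NULL score is eliminated
           COUNT(score * 2),  -- NULL * 2 is NULL, so it is eliminated too
           COUNT(NULL)        -- every value is NULL, nothing is left to count
    FROM t
""").fetchone()

print(star, one, column, expression, null_literal)
```

COUNT(1) matches COUNT(*) only because the constant 1 can never be NULL; swap in an expression that can be NULL and the count shrinks accordingly.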

I feel the performance characteristics change from DBMS to DBMS. It's all about how they choose to implement it. Since I have worked extensively on Oracle, I'll answer from that perspective.

- COUNT(*): fetches the entire row into the result set before passing it on to the count function; the count function aggregates 1 if the row is not null.
- COUNT(1): does not fetch any row; instead, count is called with a constant value 1 for each row in the table where the WHERE clause matches.
- COUNT(PK): PKs in Oracle are indexed, which means Oracle has to read only the index. Normally one row in the index B+ tree is many times smaller than the actual row. So, considering the disk IOPS rate, Oracle can fetch many times more rows from the index with a single block transfer than it can entire rows. This leads to higher throughput for the query.

From this you can see that the first count is the slowest and the last count is the fastest in Oracle.

Fortunately they have been sensible enough to change that after you left - oracledba.co.uk/tips/count_speed.htm – OrangeDog Feb 17 '11 at 11:20.

Obviously, Jonathan Leffler told the complete story. But in similar cases I recommend using SQL Profiler to check what's happening. There you can find your answer.

– zneak Apr 26 '10 at 9:21

@zneak - Go to Start Menu > All Programs > Microsoft SQL Server 2xxx > Performance Tools > SQL Profiler – Nasser Hadjloo Apr 27 '10 at 6:24

Hajloo: Unfortunately, I don't use SQL Server; and even if I did somehow, I guess I can't run SQL Profiler from a non-Windows machine. Thanks for the tip, though. – zneak Apr 27 '10 at 22:08

