Limited SELECT count(*)

Posted on May 13, 2010, 3:52 pm, by Hazan Ilan, under Cool MySQL Queries.

A SELECT statement with COUNT returns the number of rows retrieved by the SELECT statement (see mysql select count).

For performance reasons, the desired result is to limit that count.
Including a LIMIT clause in the SELECT statement will not work since it only restricts the number of rows returned, which is always one.

The solution, what I call “Limited-Count”, is done by limiting a non-count SELECT statement and wrapping it in COUNT(*).

For example:
If your count statement looks like:

SELECT COUNT(*) FROM a_table WHERE some_conditions;

You can limit the results by moving the query into:

SELECT COUNT(*) FROM (SELECT 1 FROM a_table WHERE some_conditions LIMIT x) t;

Notes:
1. COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause (this is true only for MyISAM tables. For InnoDB tables, it’s slow – thanks for Roland Bouman’s comment)
2. If you need to know how many rows your non-count SELECT statement would have returned without the LIMIT, you could consider using the information function, FOUND_ROWS(). It will fetch the total rows number without running the statement again.

Tags: count(*), limit, limited count, performance, select
Comment (RSS) | Trackback

24 Comments

Gerry says:
May 13, 2010 at 5:45 pm
I used a different solution to this issue, which can become a performance issue under the right conditions: use TRIGGERS w/ an auxiliary table to hold the counts. Then you can just do a ‘SELECT count WHERE some_condition’. SELECT COUNT(*) that would take several seconds (even minutes in some cases), were reduced to fractions of a second.
My $.02
G
josh says:
May 13, 2010 at 6:32 pm
I have a hard time seeing how this is beneficial. Perhaps you can post some benchmarks and explain when/where this query is useful.
Roland Bouman says:
May 13, 2010 at 6:48 pm
Hi there!
a few comments:
“For performance, the desired result is to limit that count.”
Well, perhaps…but what good is a count if you put an arbitrary limit on it?
“COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause.”
This is true for MyISAM tables. For InnoDB tables, it’s slow.
Bella says:
May 14, 2010 at 12:47 am
nice post thank your for sharing ?
school grants says:
May 14, 2010 at 1:41 am
Pretty nice post. I just stumbled upon your blog and wanted to say that I have really enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!
Anse says:
May 14, 2010 at 1:59 am
Alternatively, to avoid the subselect, try this one:
SELECT IF(COUNT(*) > 100, 100, COUNT(*)) FROM a_table WHERE some_conditions
Shlomi Noach says:
May 14, 2010 at 8:10 am
Hi,
Be careful. To use FOUND_ROWS() you need to supply with the CALC_FOUND_ROWS hint. What MySQL will do is actually calculate all found rows, and only return you with your LIMITed number.
Performance-wise, this means you may be waiting for a query to return with 100,000 rows, even though you only expect 20 in your LIMIT clause.
Regards
Psih says:
May 14, 2010 at 11:15 am
When your query is optimized and indexes are up and running, using FOUND_ROWS() with SQL_CALC_FOUND_ROWS slows things down when you have much data. IT slowes so hard, that 2 separate queries are 2-3 times faster. Just search the google for it.
Hazan Ilan says:
May 16, 2010 at 12:01 am
To Josh:
This limited count is useful for showing list with a limited roller. If your Roller shows only 10 pages at a time you don’t need to perform a full count(*) query. A limited count(*) is sufficient and much more performed.
Hazan Ilan says:
May 16, 2010 at 12:17 am
To Anse,
This will not bring the required performance improvement.
MySQL will need to calculate the full count(*) before returning the answer.
Ilan.
Anse says:
May 16, 2010 at 3:51 am
Yes, exactly, performance breaks as soon as there are InnoDB tables involved, or if there is a WHERE clause.
A similar approach, although slightly different topic than discussed here, is the one used for HeidiSQL’s data grid:
SELECT col1, … FROM table [WHERE …] [ORDER BY col1] LIMIT 0, 1000
The crux is that COUNT(*) is slow in too many cases, so HeidiSQL avoids using it at all. Instead, chunks of 1000 rows are fetched while you scroll down the grid. If a chunk returns less than 1000 rows this is the indicator for the end of a result set.
scholarships for women says:
May 18, 2010 at 10:51 pm
Keep up the good work, I like your writing.
Eran Binkin says:
May 19, 2010 at 3:25 pm
Nice post.
student scholarships says:
May 21, 2010 at 3:52 am
Valuable info. Lucky me I found your site by accident, I bookmarked it.
Steve says:
May 28, 2010 at 11:37 am
Keep up the good work, I like your writing.
Emily says:
June 1, 2010 at 2:59 pm
To Josh:
This limited count is useful for showing list with a limited roller. If your Roller shows only 10 pages at a time you don’t need to perform a full count(*) query. A limited count(*) is sufficient and much more performed.
Binod Suman says:
June 4, 2010 at 8:34 am
Thanks a lot, this below SQL worked perfectly in my project. But I am still surprise why count(*) does not work with limit clause.
Any how thank a lot of this better workaround.
SELECT COUNT(1) FROM (SELECT 1 FROM subscriber LIMIT 18, 2000)s
Thanks,
Binod Suman
Bangalore,India
Hazan Ilan says:
June 6, 2010 at 12:06 am
The Count will return only one row of result. The limit clause is limiting the number of result rows. Hence, any limit that greater than one will will still show one result which is the only result (that count(*) returns).
XPerez says:
June 8, 2010 at 1:09 am
Firts of all it’s to have good indexes, and it’s better to count a field using these indexes:
SELECT COUNT(1) FROM users WHERE postal_code = ‘23456’
If you have indexes on postal_code, these count takes few time.
Also, take a look that I make count(1), not count(*), because I’m saying to mysql that I don’t need fields, only count, and the query it’s executed quickest.
Remember that the best policy to make a count before to listing results (and paging), on large sites and large tables, it’s to have a general count in a separate table, almost for those tables that you need to make full scan (or limited where). Also, you can obtain full count on base mysql tables and indexes.
In relation to LIMIT, on all queries it’s better to use a LIMIT:
SELECT count(1) FROM users WHERE postal_code=’67677′ LIMIT 1
SELECT count(1) FROM users LIMIT 1
Even when you have 2 or 3 millions of users, these counts will be made on less that a sec.
Remember ever to use cache in your code, and to save those queries that can take a long time.
Hazan Ilan says:
June 9, 2010 at 1:09 am
Thanks for your reply,
Dont you think that the usage of Limit with a count query is redundant? (because the limit applies only on the result and a count results is always one)
XPerez says:
June 9, 2010 at 2:36 am
Dear Hazan,
Yes, would be redundant, but the parser understands better the command, takes less time and assures to reserve limited memory results.
This can be checked and tested for a 1 milion record table, selecting conditional to am indexed field, example:
SELECT count(1) FROM cities
WHERE country_id = 50
LIMIT 1
Stephen Kimberling says:
June 21, 2010 at 2:31 am
Looking forward to reading more. Great blog post.Really looking forward to read more.
Ashley says:
September 8, 2010 at 5:22 pm
hey, nice blog…really like it and added to bookmarks. keep up with good work
Angelin says:
February 27, 2012 at 4:26 am
Awsome job!