Limited SELECT count(*)
A SELECT statement with COUNT returns the number of rows retrieved by the SELECT statement (see mysql select count).
For performance reasons, the desired result is to limit that count.
Including a LIMIT clause in the SELECT statement will not work since it only restricts the number of rows returned, which is always one.
The solution, what I call “Limited-Count”, is done by limiting a non-count SELECT statement and wrapping it in COUNT(*).
For example:
If your count statement looks like:
SELECT COUNT(*) FROM a_table WHERE some_conditions; |
You can limit the results by moving the query into:
SELECT COUNT(*) FROM (SELECT 1 FROM a_table WHERE some_conditions LIMIT x) t; |
Notes:
1. COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause (this is true only for MyISAM tables. For InnoDB tables, it’s slow – thanks for Roland Bouman’s comment)
2. If you need to know how many rows your non-count SELECT statement would have returned without the LIMIT, you could consider using the information function, FOUND_ROWS(). It will fetch the total rows number without running the statement again.
I used a different solution to this issue, which can become a performance issue under the right conditions: use TRIGGERS w/ an auxiliary table to hold the counts. Then you can just do a ‘SELECT count WHERE some_condition’. SELECT COUNT(*) that would take several seconds (even minutes in some cases), were reduced to fractions of a second.
My $.02
G
I have a hard time seeing how this is beneficial. Perhaps you can post some benchmarks and explain when/where this query is useful.
Hi there!
a few comments:
“For performance, the desired result is to limit that count.”
Well, perhaps…but what good is a count if you put an arbitrary limit on it?
“COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause.”
This is true for MyISAM tables. For InnoDB tables, it’s slow.
nice post thank your for sharing ?
Pretty nice post. I just stumbled upon your blog and wanted to say that I have really enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!
Alternatively, to avoid the subselect, try this one:
SELECT IF(COUNT(*) > 100, 100, COUNT(*)) FROM a_table WHERE some_conditions
Hi,
Be careful. To use FOUND_ROWS() you need to supply with the CALC_FOUND_ROWS hint. What MySQL will do is actually calculate all found rows, and only return you with your LIMITed number.
Performance-wise, this means you may be waiting for a query to return with 100,000 rows, even though you only expect 20 in your LIMIT clause.
Regards
When your query is optimized and indexes are up and running, using FOUND_ROWS() with SQL_CALC_FOUND_ROWS slows things down when you have much data. IT slowes so hard, that 2 separate queries are 2-3 times faster. Just search the google for it.
To Josh:
This limited count is useful for showing list with a limited roller. If your Roller shows only 10 pages at a time you don’t need to perform a full count(*) query. A limited count(*) is sufficient and much more performed.
To Anse,
This will not bring the required performance improvement.
MySQL will need to calculate the full count(*) before returning the answer.
Ilan.
Yes, exactly, performance breaks as soon as there are InnoDB tables involved, or if there is a WHERE clause.
A similar approach, although slightly different topic than discussed here, is the one used for HeidiSQL’s data grid:
SELECT col1, … FROM table [WHERE …] [ORDER BY col1] LIMIT 0, 1000
The crux is that COUNT(*) is slow in too many cases, so HeidiSQL avoids using it at all. Instead, chunks of 1000 rows are fetched while you scroll down the grid. If a chunk returns less than 1000 rows this is the indicator for the end of a result set.
Keep up the good work, I like your writing.
Nice post.
Valuable info. Lucky me I found your site by accident, I bookmarked it.
Keep up the good work, I like your writing.
To Josh:
This limited count is useful for showing list with a limited roller. If your Roller shows only 10 pages at a time you don’t need to perform a full count(*) query. A limited count(*) is sufficient and much more performed.
Thanks a lot, this below SQL worked perfectly in my project. But I am still surprise why count(*) does not work with limit clause.
Any how thank a lot of this better workaround.
SELECT COUNT(1) FROM (SELECT 1 FROM subscriber LIMIT 18, 2000)s
Thanks,
Binod Suman
Bangalore,India
The Count will return only one row of result. The limit clause is limiting the number of result rows. Hence, any limit that greater than one will will still show one result which is the only result (that count(*) returns).
Firts of all it’s to have good indexes, and it’s better to count a field using these indexes:
SELECT COUNT(1) FROM users WHERE postal_code = ‘23456’
If you have indexes on postal_code, these count takes few time.
Also, take a look that I make count(1), not count(*), because I’m saying to mysql that I don’t need fields, only count, and the query it’s executed quickest.
Remember that the best policy to make a count before to listing results (and paging), on large sites and large tables, it’s to have a general count in a separate table, almost for those tables that you need to make full scan (or limited where). Also, you can obtain full count on base mysql tables and indexes.
In relation to LIMIT, on all queries it’s better to use a LIMIT:
SELECT count(1) FROM users WHERE postal_code=’67677′ LIMIT 1
SELECT count(1) FROM users LIMIT 1
Even when you have 2 or 3 millions of users, these counts will be made on less that a sec.
Remember ever to use cache in your code, and to save those queries that can take a long time.
Thanks for your reply,
Dont you think that the usage of Limit with a count query is redundant? (because the limit applies only on the result and a count results is always one)
Dear Hazan,
Yes, would be redundant, but the parser understands better the command, takes less time and assures to reserve limited memory results.
This can be checked and tested for a 1 milion record table, selecting conditional to am indexed field, example:
SELECT count(1) FROM cities
WHERE country_id = 50
LIMIT 1
Looking forward to reading more. Great blog post.Really looking forward to read more.
hey, nice blog…really like it and added to bookmarks. keep up with good work
Awsome job!