When ONLY_FULL_GROUP_BY Won’t See the Query Is - TopicsExpress



          

When ONLY_FULL_GROUP_BY Won’t See the Query Is Deterministic… ift.tt/1BGHRrP This is a follow-up post to my recent announcement of only_full_group_by improvements in 5.7, where I’d like to address some tricky corner cases where GROUP BY contains non-column expressions (functions). In the SQL standard, GROUP BY should contain only table columns and never expressions. So this example query where we want to count how many people have the same name is illegal in the SQL2011 standard because the GROUP BY clause contains a function:CREATE TABLE people (first_name VARCHAR(100), last_name VARCHAR(100)); SELECT CONCAT(first_name, , last_name), COUNT(*) FROM people GROUP BY CONCAT(first_name, , last_name);But MySQL does allow expressions in a GROUP BY clause, and it’s a very convenient addition to the standard. The logic of the only_full_group_by SQL mode notices that the selected expression is equal to the grouping expression, so it concludes that the result is not random and thus accepts the query. However, let’s try this:SELECT UPPER(CONCAT(first_name, , last_name)), COUNT(*) FROM people GROUP BY CONCAT(first_name, , last_name);As you’ll see, it returns: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column test.people.first_name which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by Indeed, the selected expression is not equal to the grouping one anymore. A human brain instantly recognizes that it’s still deterministic, as it’s simply the UPPER() value of the grouping expression, but for MySQL to recognize it too it should analyze the selected expression’s parse tree, trying to spot the grouping expression at all possible places in that tree. As you can imagine, the selected expression could be rather complex. For example:SELECT UPPER(CONCAT(first_name, , last_name)), SUBSTR(CONCAT(first_name, , last_name), 0.6 * LENGTH(CONCAT(first_name, , last_name))), COUNT(*) FROM people GROUP BY CONCAT(first_name, , last_name);MySQL would have to: Start with UPPER: See if the UPPER value is the CONCAT of GROUP BY — no it’s not So go down to arguments of UPPER There it finds CONCAT, which is equal to the GROUP BY one — OK! Now check SUBSTR: It’s not CONCAT So go down to the arguments CONCAT is there in the first argument — fine The second argument is a multiplication, which is not CONCAT So go down to its arguments We have 0.6, which is not CONCAT but rather a constant and deterministic Then LENGTH, which is not CONCAT So let’s check its arguments CONCAT is there, good. Finally done! All these comparisons of functions have a runtime cost and an implementation cost, so we decided not to go down this path. After all, only_full_group_by had always rejected such a query—which is a non-Standard one—so we chose not to address these specific cases. Thus, MySQL simply observes that the selected expression is not the grouping one and then it collects columns referenced by the selected expression to see if they’re equal to, or functionally dependent on, some grouping columns. But if there’s no grouping column — bail out! One could take a step back and realize that grouping on first_name and last_name (the columns) is more natural than grouping on CONCAT, and it also yields a Standard-compliant query:SELECT UPPER(CONCAT(first_name, , last_name)), COUNT(*) FROM people GROUP BY first_name, last_name;But what if you do have a query which really must group on an expression (likely something less trivial than my examples)? We saw that UPPER and GROUP BY CONCAT don’t fit together. A first idea is thus to group on a column; for that, the Standard-compliant solution is to use a derived table:SELECT UPPER(CONC), COUNT(*) FROM (SELECT CONCAT(first_name, , last_name) AS CONC FROM people) AS derived GROUP BY CONC;It makes CONCAT’s output a column of the derived table and thus the grouping happens on this column and finally UPPER uses this column value. However, the derived table will have as many rows as people; as it will be materialized internally it will form a big temporary table, which creates space and speed issues. It would be better if the derived table contained only the groups. So we’re set for grouping on CONCAT, keeping UPPER separated from the grouping query:SELECT UPPER(CONC), COU FROM (SELECT CONCAT(first_name, , last_name) AS CONC, COUNT(*) AS COU FROM people GROUP BY CONCAT(first_name, , last_name)) AS derived;Another potential solution, albeit a less safe one, is to add ANY_VALUE around UPPER() (or even turn only_full_group_by off!):SELECT ANY_VALUE(UPPER(CONCAT(first_name, , last_name))), COUNT(*) FROM people GROUP BY CONCAT(first_name, , last_name);One case I have seen recently, brought up by Roland Bouman, is when one wants to sort the output in a particular order, for example one wants NULLs first, so adds ISNULL in ORDER BY:SELECT CONCAT(first_name, , last_name), COUNT(*) FROM people GROUP BY CONCAT(first_name, , last_name) ORDER BY ISNULL(CONCAT(first_name, , last_name)), CONCAT(first_name, , last_name);This time, the selected expression is equal to that of GROUP BY, so it has no problem; but the ISNULL expression is the new problem! To ensure a deterministic order of results, only_full_group_by has to validate this ISNULL, and, just like for the previous UPPER call, it will not see that the argument of the ISNULL call is the same as the grouping expression. Here, because the problem is in the ORDER BY clause, and because ORDER BY clauses are allowed to reference expressions of the select list by aliases in MySQL, the simplest solution is to simply use an alias:SELECT CONCAT(first_name, , last_name) AS CONC, COUNT(*) FROM people GROUP BY CONC ORDER BY ISNULL(CONC), CONC;MySQL will see that ISNULL’s argument is an alias (CONC) and that it matches the same alias in the GROUP BY clause — the alias is then treated as a “grouping pseudo-column”. Matching aliases is much simpler than matching expressions, so we have been able to implement that in the only_full_group_by logic. Thus the above query will pass the only_full_group_by restrictions. It’s also much more readable with aliases than without, in my humble opinion; it might even execute slightly faster, as the value of CONC, computed in GROUP BY, is now reused by ISNULL, leading to less calls to the CONCAT function. The only potential drawback is that using aliases in GROUP BY clauses is not Standard-compliant, so it may cause issues if you need to support various DBMS options with your application. I hope that this blog post has given you enough safe and usable tips for those types of queries. If you know of other tricks, they are warmly welcome, and you might just drop them in a comment here (thanks in advance!). As always, THANK YOU for using MySQL! from Planet MySQL ift.tt/1du18ol
Posted on: Wed, 07 Jan 2015 08:30:40 +0000

Trending Topics



Recently Viewed Topics




© 2015