Oracle - Substituindo subqueries por funções analíticas - Um Exemplo prático

Muitas vezes nos vemos na necessidade de identificar o último registro de uma tabela baseando-se numa regra específica. Veja o exemplo abaixo:

Criando a Tabela de exemplo:

Create table plz_nota_fiscal
   (nr_nota_fiscal      number
   ,Tipo_nota_fiscal    number
   ,Data_nota_fiscal    date
   ,Valor_nota_fiscal   number);
   
create index plz_nf01 on plz_nota_fiscal (tipo_nota_fiscal);

Inserindo um conjunto de registros para testes:

DECLARE
   V_MOD NUMBER;
BEGIN
   FOR x IN 1..10 LOOP
      FOR y IN 1..50 LOOP
         --> Modificador para diferenciar números
         if x = 1 then
            v_mod := 5;
         elsif x = 2 then
            v_mod := 13;
         elsif x = 3 then
            v_mod := 19;
         elsif x = 4 then
            v_mod := 43;
         else
            v_mod := 198;
         end if;
         --> Insere a nota
         insert into PLZ_NOTA_FISCAL
            (nr_nota_fiscal
            ,tipo_nota_fiscal
            ,data_nota_fiscal
            ,valor_nota_fiscal)
         values
            (y+v_mod
            ,x
            ,sysdate+y+x
            ,y+x);
      END LOOP;
   END LOOP;
   commit;
END;
/

Agora, precisa identificar somente as últimas notas geradas para cada tipo específico de nota:

Select nr_nota_fiscal
  from PLZ_NOTA_FISCAL
 where tipo_nota_fiscal = 3
   and nr_nota_fiscal = (select MAX(nr_nota_fiscal)
                           from PLZ_NOTA_FISCAL
                          where tipo_nota_fiscal = 3)

Agora, vejamos como fica o plano de ação para esta query:

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |    26 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID  | PLZ_NOTA_FISCAL |     1 |    26 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN            | PLZ_NF01        |     2 |       |     1   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE              |                 |     1 |    26 |            |          |
|   4 |    TABLE ACCESS BY INDEX ROWID| PLZ_NOTA_FISCAL |    50 |  1300 |     1   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | PLZ_NF01        |     2 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NR_NOTA_FISCAL"= (SELECT MAX("NR_NOTA_FISCAL") 
                                   FROM "PLZ_NOTA_FISCAL"
                                  WHERE "TIPO_NOTA_FISCAL"=3))
   2 - access("TIPO_NOTA_FISCAL"=3)
   5 - access("TIPO_NOTA_FISCAL"=3)

Agora vamos testar de outra maneira, só que desta vez usando as funções analíticas disponíveis no Oracle:

select tipo_nota_fiscal
     , ultimo_numero
  from (select nr_nota_fiscal
             , tipo_nota_fiscal
             , data_nota_fiscal
             , valor_nota_fiscal
             , MAX(nr_nota_fiscal) over (partition by tipo_nota_fiscal) ultimo_numero
          from PLZ_NOTA_FISCAL)
 where nr_nota_fiscal   = ultimo_numero
   and tipo_nota_fiscal = 3

Agora, vejamos como fica o plano de acesso para esta query:

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |    50 |  1950 |     1   (0)| 00:00:01 |
|*  1 |  VIEW                         |                 |    50 |  1950 |     1   (0)| 00:00:01 |
|   2 |   WINDOW BUFFER               |                 |    50 |  1300 |     1   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| PLZ_NOTA_FISCAL |    50 |  1300 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | PLZ_NF01        |     2 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NR_NOTA_FISCAL"="ULTIMO_NUMERO")
   4 - access("TIPO_NOTA_FISCAL"=3)

Resultado da comparação entre os planos de acesso:

Na busca usando subquery, ele gerou um custo maior e um plano de acesso onde acessou duas vezes a mesma tabela e o mesmo índice.

Na busca usando funções analíticas, ele gerou um custo menor e um plano de acesso simplificado.

Agora, Vamos fazer o teste retornando o último número de notas fiscais para cada tipo existente:

Primeiro, com a subquery:

Select nr_nota_fiscal
  from PLZ_NOTA_FISCAL
 where (tipo_nota_fiscal, nr_nota_fiscal) in (select tipo_nota_fiscal
                                                   , MAX(nr_nota_fiscal)
                                                from PLZ_NOTA_FISCAL
                                               group by tipo_nota_fiscal)

Vejamos o plano e acesso para esta query:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 | 25000 |  1562K|     5  (40)| 00:00:01 |
|*  1 |  FILTER                        |                 |       |       |            |          |
|   2 |   HASH GROUP BY                |                 | 25000 |  1562K|     5  (40)| 00:00:01 |
|   3 |    NESTED LOOPS                |                 |       |       |            |          |
|   4 |     NESTED LOOPS               |                 | 25000 |  1562K|     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL         | PLZ_NOTA_FISCAL |   500 | 13000 |     3   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN          | PLZ_NF01        |    50 |       |     0   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID| PLZ_NOTA_FISCAL |    50 |  1900 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NR_NOTA_FISCAL"=MAX("NR_NOTA_FISCAL"))
   6 - access("TIPO_NOTA_FISCAL"="TIPO_NOTA_FISCAL")

Agora, usando a função analítica:

select tipo_nota_fiscal
     , ultimo_numero
  from (select nr_nota_fiscal
             , tipo_nota_fiscal
             , data_nota_fiscal
             , valor_nota_fiscal
             , MAX(nr_nota_fiscal) over (partition by tipo_nota_fiscal) ultimo_numero
          from PLZ_NOTA_FISCAL)
 where nr_nota_fiscal   = ultimo_numero;
 
 ---------------------------------------------------------------------------------------
 | Id  | Operation           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
 ---------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |                 |   500 | 19500 |     4  (25)| 00:00:01 |
 |*  1 |  VIEW               |                 |   500 | 19500 |     4  (25)| 00:00:01 |
 |   2 |   WINDOW SORT       |                 |   500 | 13000 |     4  (25)| 00:00:01 |
 |   3 |    TABLE ACCESS FULL| PLZ_NOTA_FISCAL |   500 | 13000 |     3   (0)| 00:00:01 |
 ---------------------------------------------------------------------------------------
 
 Predicate Information (identified by operation id):
 ---------------------------------------------------
     1 - filter("NR_NOTA_FISCAL"="ULTIMO_NUMERO")

Resultado da comparação entre os planos de acesso:

Na busca pela subquery, ele retornou um número bem maior de linhas por causa da cardinalidade entre elas, fazendo acesso full na tabela.

Na busca pela função analítica, ele retornou o número exato de bytes existentes na tabela por não gerar cardinalidade, o que resulta num custo menor, apesar do acesso full.

Apesar da pouca diferença do custo, a diferença no consumo de CPU bem como no número de linhas e bytes processados foi significativo.

Conclusão

Substituir a subquery por uma função analítica nestes casos pode gerar um ganho significativo de desempenho, especialmente se esse comando é executando dentro de um loop.