开发者

Sort chars in varchar2 alphabetically

开发者 https://www.devze.com 2022-12-30 21:08 出处:网络
I\'m looking for a function that would sort chars in varchar2 alphabetically. Is there something built-in into oracle that I can use or I need to create custom i开发者_JAVA技巧n PL/SQL ? From an ans

I'm looking for a function that would sort chars in varchar2 alphabetically.

Is there something built-in into oracle that I can use or I need to create custom i开发者_JAVA技巧n PL/SQL ?


From an answer at http://forums.oracle.com/forums/thread.jspa?messageID=1791550 this might work, but don't have 10g to test on...

SELECT MIN(permutations)
FROM (SELECT REPLACE (SYS_CONNECT_BY_PATH (n, ','), ',') permutations
    FROM (SELECT LEVEL l, SUBSTR ('&col', LEVEL, 1) n
        FROM DUAL
        CONNECT BY LEVEL <= LENGTH ('&col')) yourtable
    CONNECT BY NOCYCLE l != PRIOR l)
WHERE LENGTH (permutations) = LENGTH ('&col')

In the example col is defined in SQL*Plus, but if you make this a function you can pass it in, or could rework it to take a table column directly I suppose.

I'd take that as a start point rather than a solution; the original question was about anagrams so it's designed to find all permutations, so something similar but simplified might be possible. I suspect this doesn't scale very well for large values.


So eventually I went PL/SQL route, because after searching for some time I realized that there is no build-in function that I can use.

Here is what I came up with. Its based on the future of associative array which is that Oracle keeps the keys in sorted order.

create or replace function sort_chars(p_string in varchar2) return varchar deterministic
as
     rv varchar2(4000);
     ch  varchar2(1);
     type vcArray is table of varchar(4000) index by varchar2(1);
     sorted vcArray;

     key varchar2(1);

begin
     for i in 1 .. length(p_string)
     loop
        ch := substr(p_string, i, 1);

        if (sorted.exists(ch))
        then 
            sorted(ch) := sorted(ch) || ch;
        else
            sorted(ch) := ch;
        end if;
     end loop;


    rv := '';
    key  := sorted.FIRST;
    WHILE key IS NOT NULL LOOP
        rv := rv || sorted(key);
        key := sorted.NEXT(key);
    END LOOP;

     return rv;
end;

Simple performance test:

set timing on;

create table test_sort_fn as 
select t1.object_name || rownum as test from user_objects t1, user_objects t2;

select count(distinct test) from  test_sort_fn;

select count (*)  from (select sort_chars(test)  from test_sort_fn);


Table created.
Elapsed: 00:00:01.32

COUNT(DISTINCTTEST)
-------------------
             384400
1 row selected.
Elapsed: 00:00:00.57

  COUNT(*)
----------
    384400
1 row selected.
Elapsed: 00:00:00.06


You could use the following query:

select listagg(letter) 
    within group (order by UPPER(letter), ASCII(letter) DESC) 
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);

The subquery splits the string into records (single character each) using regexp_substr, and the outer query merges the records into one string using listagg, after sorting them.

Here you should be careful, because alphabetical sorting depends on your database configuration, as Cine pointed.

In the above example the letters are sorted ascending "alphabetically" and descending by ascii code, which - in my case - results in "aAbBcCdDeEfFgG". The result in your case may be different.

You may also sort the letters using nlssort - it would give you better control of the sorting order, as you would get independent of your database configuration.

select listagg(letter) 
    within group (order by nlssort(letter, 'nls_sort=german') 
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);

The query above would give you also "aAbBcCdDeEfFgG", but if you changed "german" to "spanish", you would get "AaBbCcDdEeFfGg" instead.


You should remember that there is no common agreement what "alphabetically" means. It all depends on which country it is, and who is looking at your data and what context it is in.

For instance in DK, there are a large number of different sortings of a,aa,b,c,æ,ø,å

  • per the alphabet: a,aa,b,c,æ,ø,å
  • for some dictionary: a,aa,å,b,c,æ,ø
  • for other dictionaries: a,b,c,æ,ø,aa,å
  • per Microsoft standard: a,b,c,æ,ø,aa,å

check out http://www.siao2.com/2006/04/27/584439.aspx for more info. Which also happens to be a great blog for issues as these.


Assuming you don't mind having the characters returned 1 per row:

select substr(str, r, 1) X from (
select 'CAB' str,
       rownum r
from dual connect by level <= 4000
) where r <= length(str) order by X;

X
=
A
B
C


For people using Oracle 10g, select listagg within group won't work. The accepted answer does work, but it generates every possible permutation of the input string, which results in terrible performance - my Oracle database struggles with an input string only 10 characters long.

Here is another alternative working for Oracle 10g. It's similar to Jeffrey Kemp's answer, only the result isn't splitted into rows:

select replace(wm_concat(ch), ',', '') from (
    select substr('CAB', level, 1) ch from dual
    connect by level <= length('CAB')
    order by ch
);
-- output: 'ABC'

wm_concat simply concatenates records from different rows into a single string, using commas as separator (that's why we are also doing a replace later).

Please note that, if your input string had commas, they will be lost. Also, wm_concat is an undocumented feature, and according to this answer it has been removed in Oracle 12c. Use it only if you're stuck with 10g and don't have a better option (such as listagg, if you can use 11g instead).


From Oracle 12, you can use:

SELECT *
FROM   table_name t
       CROSS JOIN LATERAL (
         SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
                  ORDER BY SUBSTR(t.value, LEVEL, 1)
                ) AS ordered_value
         FROM   DUAL
         CONNECT BY LEVEL <= LENGTH(t.value)
       )

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'ZYX' FROM DUAL UNION ALL
SELECT 'HELLO world' FROM DUAL UNION ALL
SELECT 'aæøåbcæøåaæøå' FROM DUAL;

Outputs:

VALUE ORDERED_VALUE
ZYX XYZ
HELLO world EHLLOdlorw
aæøåbcæøåaæøå aabcåååæææøøø

If you want to change how the letters are sorted then use NLSSORT:

SELECT *
FROM   table_name t
       CROSS JOIN LATERAL (
         SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
                  ORDER BY NLSSORT(SUBSTR(t.value, LEVEL, 1), 'NLS_SORT = BINARY_AI')
                ) AS ordered_value
         FROM   DUAL
         CONNECT BY LEVEL <= LENGTH(t.value)
       )

Outputs:

VALUE ORDERED_VALUE
ZYX XYZ
HELLO world dEHLLlOorw
aæøåbcæøåaæøå aaåååbcøøøæææ

db<>fiddle here

0

精彩评论

暂无评论...
验证码 换一张
取 消