Home » SQL & PL/SQL » SQL & PL/SQL » how to get the rows of different values of single column
how to get the rows of different values of single column [message #684197] Wed, 21 April 2021 07:14 Go to next message
Sekhar6617
Messages: 22
Registered: March 2021
Junior Member
Hi Team,

Requesting you please help me with the query for selecting rows with different values of single column.
Below is the scenario explained with example. In this below example PO_ID = 2675 should be returned as a output since it has two distinct date fields.


CREATE TABLE PO_TBL (
    PO_ID int,
    DATE DATE,
    AMOUNT  INT, 
    Address varchar(255),
    City varchar(255)
);
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 1023.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 125.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/25/2021', 78.00, 'TEXAS', 'TEXAS');

INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 231.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 349.00, 'TEXAS', 'TEXAS');
OUTPUT:

PO_ID     DATE         
2675      02/01/2021
2675      02/25/2021   


I have tried with below sql query but results are not as expected.

SELECT A.PO_ID, A.DATE, COUNT(A.DATE) FROM PO_TBL A
WHERE A.DATE NOT IN
(SELECT DISTINCT B.DATE FROM PO_TBL B 
WHERE A.PO_ID = B.PO_ID
AND A.DATE <> B.DATE)
GROUP BY A.PO_ID, A.DATE
Thank you.

Regards
Sekhar
Re: how to get the rows of different values of single column [message #684198 is a reply to message #684197] Wed, 21 April 2021 07:54 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.

DATE is not a valid column name:
SQL> CREATE TABLE PO_TBL (
  2      PO_ID int,
  3      DATE DATE,
  4      AMOUNT  INT,
  5      Address varchar(255),
  6      City varchar(255)
  7  );
    DATE DATE,
    *
ERROR at line 3:
ORA-00904: : invalid identifier
SQL> CREATE TABLE PO_TBL (
  2      PO_ID int,
  3      my_DATE DATE,
  4      AMOUNT  INT,
  5      Address varchar(255),
  6      City varchar(255)
  7  );

Table created.

SQL> INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS')
                                 *
ERROR at line 1:
ORA-01843: not a valid month
ALWAYS use TO_DATE with a format mask to insert a date from a string.
Also note: how Oracle would know if 02/01 is January, 2nd or February, 1st?
Correct syntax is:
SQL> INSERT INTO PO_TBL VALUES (2675, TO_DATE('02/01/2021','MM/DD/YYYY'), 665.00, 'TEXAS', 'TEXAS');

1 row created.
Now for the question:
SQL> select * from PO_TBL order by 1, 2;
     PO_ID MY_DATE                 AMOUNT ADDRESS    CITY
---------- ------------------- ---------- ---------- ----------
      2675 01/02/2021 00:00:00        665 TEXAS      TEXAS
      2675 01/02/2021 00:00:00       1023 TEXAS      TEXAS
      2675 01/02/2021 00:00:00        125 TEXAS      TEXAS
      2675 25/02/2021 00:00:00         78 TEXAS      TEXAS
      5360 01/02/2021 00:00:00        231 TEXAS      TEXAS
      5360 01/02/2021 00:00:00        349 TEXAS      TEXAS

6 rows selected.

SQL> select distinct po_id, my_date
  2  from po_tbl
  3  where po_id in ( select po_id from po_tbl
  4                   group by po_id
  5                   having count(distinct my_date) > 1 )
  6  order by po_id, my_date
  7  /
     PO_ID MY_DATE
---------- -------------------
      2675 01/02/2021 00:00:00
      2675 25/02/2021 00:00:00

2 rows selected.
Re: how to get the rows of different values of single column [message #684199 is a reply to message #684198] Wed, 21 April 2021 12:21 Go to previous messageGo to next message
Sekhar6617
Messages: 22
Registered: March 2021
Junior Member
Thank you Michel.
Sure, I will take care of Data Base version and will follow all your instructions.

The given solution working perfectly. Thank you.

Regards
Sekhar
Re: how to get the rows of different values of single column [message #684200 is a reply to message #684198] Wed, 21 April 2021 12:27 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Here's another solution:
orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);

          PO_ID MY_DATE
--------------- -------------------
           2675 2021-02-25:00:00:00
           2675 2021-02-01:00:00:00

I love finding a reason to use the ANY or ALL operators. On my system, the cost is lower than Michel's query:
orclz> set autot trace exp
orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 2711566224

------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     6 |   264 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE        |        |     6 |   264 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN SEMI    |        |     6 |   264 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("P"."PO_ID"="Q"."PO_ID")
       filter("P"."MY_DATE"<>"Q"."MY_DATE")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

orclz>
orclz> select distinct po_id, my_date
  2      from po_tbl
  3    where po_id in ( select po_id from po_tbl
  4                       group by po_id
  5                       having count(distinct my_date) > 1 )
  6      order by po_id, my_date;

Execution Plan
----------------------------------------------------------
Plan hash value: 4065634225

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |    18 |  1008 |     8  (25)| 00:00:01 |
|   1 |  SORT UNIQUE NOSORT     |           |    18 |  1008 |     8  (25)| 00:00:01 |
|*  2 |   FILTER                |           |       |       |            |          |
|   3 |    SORT GROUP BY        |           |    18 |  1008 |     8  (25)| 00:00:01 |
|*  4 |     HASH JOIN           |           |    18 |  1008 |     7  (15)| 00:00:01 |
|   5 |      VIEW               | VM_NWVW_2 |     6 |   132 |     4  (25)| 00:00:01 |
|   6 |       HASH GROUP BY     |           |     6 |   132 |     4  (25)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| PO_TBL    |     6 |   132 |     3   (0)| 00:00:01 |
|   8 |      TABLE ACCESS FULL  | PO_TBL    |     6 |   204 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(COUNT("$vm_col_1")>1)
   4 - access("PO_ID"="$vm_col_2")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

orclz>
Re: how to get the rows of different values of single column [message #684204 is a reply to message #684200] Thu, 22 April 2021 08:31 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
Single path through PO_TBL:

WITH T AS (
           SELECT  PO_ID,
                   MY_DATE,
                   COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
                   ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
             FROM  PO_TBL
          )
SELECT  PO_ID,
        MY_DATE
  FROM  T
  WHERE CNT > 1
    AND RN = 1
/

     PO_ID MY_DATE
---------- ----------
      2675 02/01/2021
      2675 02/25/2021

Execution Plan
----------------------------------------------------------
Plan hash value: 118720869

------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     6 |   288 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |        |     6 |   288 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |        |     6 |   132 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL>
SY.
Re: how to get the rows of different values of single column [message #684206 is a reply to message #684204] Thu, 22 April 2021 12:24 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

I hesitated to post a solution with analytic functions because this is one of the cases where they are most often less efficient.
If you create an index on the table like:
SQL> create index PO_TBL_idx on PO_TBL (po_id,my_date) compress 1;

Index created.
Then the previous queries use the index when the latest one does not (I removed the costs as they are meaningless for such a small table):
SQL> set autot trace exp
SQL> select distinct p.po_id,p.my_Date
  2  from po_tbl p
  3  where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id)
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1247696620

----------------------------------------
| Id  | Operation         | Name       |
----------------------------------------
|   0 | SELECT STATEMENT  |            |
|   1 |  HASH UNIQUE      |            |
|*  2 |   HASH JOIN       |            |
|   3 |    INDEX FULL SCAN| PO_TBL_IDX |
|   4 |    INDEX FULL SCAN| PO_TBL_IDX |
----------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("P"."PO_ID"="Q"."PO_ID")
       filter("P"."MY_DATE"<>"Q"."MY_DATE")

SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7            )
  8  SELECT  PO_ID,
  9          MY_DATE
 10    FROM  T
 11    WHERE CNT > 1
 12      AND RN = 1
 13  /

Execution Plan
----------------------------------------------------------
Plan hash value: 118720869

--------------------------------------
| Id  | Operation           | Name   |
--------------------------------------
|   0 | SELECT STATEMENT    |        |
|*  1 |  VIEW               |        |
|   2 |   WINDOW SORT       |        |
|   3 |    TABLE ACCESS FULL| PO_TBL |
--------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)
Then index should be far smaller than the table (about 10 bytes per row in the index over several hundred for the table).
Of course, best query depends on the table cardinality and data.
(Note: I used a 11.2.0.4 db for this, maybe a later version knows to use the index with analytic functions.)

Re: how to get the rows of different values of single column [message #684207 is a reply to message #684204] Thu, 22 April 2021 12:25 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Now that is pretty good! I wish I were more familiar with the analytic functions. However, I wonder about scale. I did a couple of experiments, and if I index PO_ID the JW and MC solutions use the index but the SY solution does not. I tried hacking the object stat's to make the CBO think the table was huge in a very crude way:
EXEC DBMS_STATS.SET_TABLE_STATS(user,'PO_TBL',numrows=>100000000,numblks=>10000000)
and it did look as though my solution was best Smile But I wouldn't think that bears much relation to reality.
Re: how to get the rows of different values of single column [message #684208 is a reply to message #684207] Thu, 22 April 2021 12:27 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Ha-ha, same thought at same time. Smile

Re: how to get the rows of different values of single column [message #684209 is a reply to message #684208] Thu, 22 April 2021 16:43 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
Michel, reason why John's solution is using index is simple - optimizer understands where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id) will not produce TRUE for rows where PO_ID is null or MY_DATE is null. And, BTW, it makes this solution a non-working solution for NULL PO_ID, althought most likely it is PK. Anyway, if we want analytics to take advantage of indexing PO_ID, MY_DATE and both are nullable then all we need is either add where po_id is not null and my_date is not null (but that's again could be an issue if PO_ID can be NULL) or add a constant to index to make sure all rows are indexed:

SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE);

Index created.

SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7               WHERE PO_ID IS NOT NULL
  8                 AND MY_DATE IS NOT NULL
  9            )
 10  SELECT  PO_ID,
 11          MY_DATE
 12    FROM  T
 13    WHERE CNT > 1
 14      AND RN = 1
 15  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     6 |   288 |     1   (0)| 00:00:01 |
|*  1 |  VIEW             |            |     6 |   288 |     1   (0)| 00:00:01 |
|   2 |   WINDOW BUFFER   |            |     6 |   132 |     1   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN| PO_TBL_IDX |     6 |   132 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)
   3 - filter("PO_ID" IS NOT NULL AND "MY_DATE" IS NOT NULL)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL> SET AUTOTRACE OFF

SQL> DROP INDEX PO_TBL_IDX;

Index dropped.

SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE,1);

Index created.

SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7            )
  8  SELECT  PO_ID,
  9          MY_DATE
 10    FROM  T
 11    WHERE CNT > 1
 12      AND RN = 1
 13  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     6 |   288 |     1   (0)| 00:00:01 |
|*  1 |  VIEW             |            |     6 |   288 |     1   (0)| 00:00:01 |
|   2 |   WINDOW BUFFER   |            |     6 |   132 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN| PO_TBL_IDX |     6 |   132 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL>

SY.
Re: how to get the rows of different values of single column [message #684292 is a reply to message #684209] Mon, 03 May 2021 07:23 Go to previous message
Sekhar6617
Messages: 22
Registered: March 2021
Junior Member
Thank you all for your suggestions.

Regards
Sekhar
Previous Topic: sql regex_instr
Next Topic: multiple queries and 1 output window/result
Goto Forum:
  


Current Time: Fri Mar 29 01:38:09 CDT 2024