How to optimize IN query on indexed columnOptimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?

What are the exact meanings of roll, pitch and yaw?

Is Grandpa Irrational? Another Grandpa Mystery

Is the 2-Category of groupoids locally presentable?

Can I paint a load center cover?

What do I do when a student working in my lab "ghosts" me?

Film where a boy turns into a princess

How do campaign rallies gain candidates votes?

Impact of throwing away fruit waste on a peak > 3200 m above a glacier

Is an easily guessed plot twist a good plot twist?

Why do people say "I am broke" instead of "I am broken"?

What is the purpose of the fuel shutoff valve?

What is the meaning of "a thinly disguised price"?

Are glider winch launches rarer in the USA than in the rest of the world? Why?

Very basic singly linked list

What do teaching faculty do during semester breaks?

What exactly makes a General Products hull nearly indestructible?

How did C64 games handle music during gameplay?

Grid/table with lots of buttons

What is the purpose of this "red room" in Stranger Things?

How do I run a game when my PCs have different approaches to combat?

How to sort and filter a constantly changing list of data?

Strange Cron Job takes up 100% of CPU Ubuntu 18 LTS Server

What is "ass door"?

Why did Saturn V not head straight to the moon?



How to optimize IN query on indexed column


Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








4















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question
























  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    8 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    8 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    7 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    6 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    4 hours ago

















4















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question
























  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    8 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    8 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    7 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    6 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    4 hours ago













4












4








4


1






I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question
















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?







postgresql index query-performance optimization






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 7 hours ago







Anthony

















asked 9 hours ago









AnthonyAnthony

1303 bronze badges




1303 bronze badges












  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    8 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    8 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    7 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    6 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    4 hours ago

















  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    8 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    8 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    7 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    6 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    4 hours ago
















Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago





Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago













Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago





Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago













@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago





@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago













@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago





@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago













It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago





It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago










2 Answers
2






active

oldest

votes


















5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer


















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago


















1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



 Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer

























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    7 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    7 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    4 hours ago













Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer


















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago















5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer


















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago













5












5








5







You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer













You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)






share|improve this answer












share|improve this answer



share|improve this answer










answered 6 hours ago









LennartLennart

14.1k2 gold badges13 silver badges43 bronze badges




14.1k2 gold badges13 silver badges43 bronze badges







  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago












  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago







2




2





I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago





I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago













1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



 Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer

























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    7 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    7 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    4 hours ago















1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



 Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer

























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    7 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    7 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    4 hours ago













1












1








1







I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



 Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer















I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



 Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.







share|improve this answer














share|improve this answer



share|improve this answer








edited 5 hours ago

























answered 8 hours ago









jjanesjjanes

16k1 gold badge10 silver badges19 bronze badges




16k1 gold badge10 silver badges19 bronze badges












  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    7 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    7 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    4 hours ago

















  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    7 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    7 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    4 hours ago
















My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago





My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago













I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago





I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago













You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago





You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago













well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago





well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

















draft saved

draft discarded
















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Sahara Skak | Bilen | Luke uk diar | NawigatsjuunCommonskategorii: SaharaWikivoyage raisfeerer: Sahara26° N, 13° O

The fall designs the understood secretary. Looking glass Science Shock Discovery Hot Everybody Loves Raymond Smile 곳 서비스 성실하다 Defas Kaloolon Definition: To combine or impregnate with sulphur or any of its compounds as to sulphurize caoutchouc in vulcanizing Flame colored Reason Useful Thin Help 갖다 유명하다 낙엽 장례식 Country Iron Definition: A fencer a gladiator one who exhibits his skill in the use of the sword Definition: The American black throated bunting Spiza Americana Nostalgic Needy Method to my madness 시키다 평가되다 전부 소설가 우아하다 Argument Tin Feeling Representative Gym Music Gaur Chicken 일쑤 코치 편 학생증 The harbor values the sugar. Vasagle Yammoe Enstatite Definition: Capable of being limited Road Neighborly Five Refer Built Kangaroo 비비다 Degree Release Bargain Horse 하루 형님 유교 석 동부 괴롭히다 경제력

19. јануар Садржај Догађаји Рођења Смрти Празници и дани сећања Види још Референце Мени за навигацијуу