How to optimize IN query on indexed columnOptimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?

What are the exact meanings of roll, pitch and yaw?

Is Grandpa Irrational? Another Grandpa Mystery

Is the 2-Category of groupoids locally presentable?

Can I paint a load center cover?

What do I do when a student working in my lab "ghosts" me?

Film where a boy turns into a princess

How do campaign rallies gain candidates votes?

Impact of throwing away fruit waste on a peak > 3200 m above a glacier

Is an easily guessed plot twist a good plot twist?

Why do people say "I am broke" instead of "I am broken"?

What is the purpose of the fuel shutoff valve?

What is the meaning of "a thinly disguised price"?

Are glider winch launches rarer in the USA than in the rest of the world? Why?

Very basic singly linked list

What do teaching faculty do during semester breaks?

What exactly makes a General Products hull nearly indestructible?

How did C64 games handle music during gameplay?

Grid/table with lots of buttons

What is the purpose of this "red room" in Stranger Things?

How do I run a game when my PCs have different approaches to combat?

How to sort and filter a constantly changing list of data?

Strange Cron Job takes up 100% of CPU Ubuntu 18 LTS Server

What is "ass door"?

Why did Saturn V not head straight to the moon?

How to optimize IN query on indexed column

Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
 Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Rows Removed by Index Recheck: 5067635
 Filter: (sale_date = 1970)
 Heap Blocks: exact=38679 lossy=496680
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
 Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Planning time: 0.165 ms
 Execution time: 63524.100 ms
(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 
 WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
 Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
 Rows Removed by Index Recheck: 5067635
 Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
 Heap Blocks: exact=38679 lossy=496680
 -> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
 Index Cond: ((color_code)::text = 'red'::text)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
 Index Cond: ((color_code)::text = 'green'::text)
 Planning time: 0.162 ms
 Execution time: 18896.785 ms
(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
 -> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
 -> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
 Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
 Rows Removed by Filter: 4196103
 Planning time: 0.161 ms
 Execution time: 97069.896 ms
(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago

add a comment |

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
 Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Rows Removed by Index Recheck: 5067635
 Filter: (sale_date = 1970)
 Heap Blocks: exact=38679 lossy=496680
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
 Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Planning time: 0.165 ms
 Execution time: 63524.100 ms
(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 
 WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
 Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
 Rows Removed by Index Recheck: 5067635
 Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
 Heap Blocks: exact=38679 lossy=496680
 -> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
 Index Cond: ((color_code)::text = 'red'::text)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
 Index Cond: ((color_code)::text = 'green'::text)
 Planning time: 0.162 ms
 Execution time: 18896.785 ms
(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
 -> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
 -> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
 Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
 Rows Removed by Filter: 4196103
 Planning time: 0.161 ms
 Execution time: 97069.896 ms
(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago

add a comment |

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
 Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Rows Removed by Index Recheck: 5067635
 Filter: (sale_date = 1970)
 Heap Blocks: exact=38679 lossy=496680
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
 Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Planning time: 0.165 ms
 Execution time: 63524.100 ms
(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 
 WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
 Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
 Rows Removed by Index Recheck: 5067635
 Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
 Heap Blocks: exact=38679 lossy=496680
 -> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
 Index Cond: ((color_code)::text = 'red'::text)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
 Index Cond: ((color_code)::text = 'green'::text)
 Planning time: 0.162 ms
 Execution time: 18896.785 ms
(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
 -> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
 -> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
 Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
 Rows Removed by Filter: 4196103
 Planning time: 0.161 ms
 Execution time: 97069.896 ms
(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
 Recheck Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Rows Removed by Index Recheck: 5067635
 Filter: (sale_date = 1970)
 Heap Blocks: exact=38679 lossy=496680
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
 Index Cond: ((color_code)::text = ANY ('red,green'::text[]))
 Planning time: 0.165 ms
 Execution time: 63524.100 ms
(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 
 WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
 -> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
 Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
 Rows Removed by Index Recheck: 5067635
 Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
 Heap Blocks: exact=38679 lossy=496680
 -> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
 Index Cond: ((color_code)::text = 'red'::text)
 -> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
 Index Cond: ((color_code)::text = 'green'::text)
 Planning time: 0.162 ms
 Execution time: 18896.785 ms
(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date) 



 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
 -> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
 -> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
 Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('red,green'::text[])))
 Rows Removed by Filter: 4196103
 Planning time: 0.161 ms
 Execution time: 97069.896 ms
(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

postgresql index query-performance optimization

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

edited 7 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

asked 9 hours ago

Anthony

1303 bronze badges

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago

add a comment |

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
8 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
8 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
7 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
6 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
4 hours ago

add a comment |

2 Answers
2

active

oldest

votes

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
 and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red | 1970
(1 row)

However:

with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
 and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red | 1970
 green | 1969
(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
 and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

 Heap Blocks: exact=38679 lossy=496680

This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.

Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
 and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red | 1970
(1 row)

However:

with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
 and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red | 1970
 green | 1969
(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
 and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
 and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red | 1970
(1 row)

However:

with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
 and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red | 1970
 green | 1969
(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
 and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
 and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red | 1970
(1 row)

However:

with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
 and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red | 1970
 green | 1969
(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
 and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
 and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red | 1970
(1 row)

However:

with T (color_code, sale_date) as ( 
 values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
 and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red | 1970
 green | 1969
(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
 and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 6 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

 Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

 Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

 Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

 Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

add a comment |

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
7 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
7 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
4 hours ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Xjyuk

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

Sahara Skak | Bilen | Luke uk diar | NawigatsjuunCommonskategorii: SaharaWikivoyage raisfeerer: Sahara26° N, 13° O

19. јануар Садржај Догађаји Рођења Смрти Празници и дани сећања Види још Референце Мени за навигацијуу

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Sahara Skak | Bilen | Luke uk diar | NawigatsjuunCommonskategorii: SaharaWikivoyage raisfeerer: Sahara26° N, 13° O

19. јануар Садржај Догађаји Рођења Смрти Празници и дани сећања Види још Референце Мени за навигацијуу

2 Answers
2

2 Answers
2

2 Answers
2