<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[pganalyze Blog]]></title><description><![CDATA[Monitoring Postgres and tuning query performance]]></description><link>https://pganalyze.com</link><generator>GatsbyJS</generator><lastBuildDate>Mon, 13 Apr 2026 17:57:53 GMT</lastBuildDate><atom:link href="https://pganalyze.com/feed.xml" rel="self" type="application/rss+xml"/><item><title><![CDATA[Waiting for Postgres 19: Reduced timing overhead for EXPLAIN ANALYZE with RDTSC]]></title><description><![CDATA[In today’s E122 of “5mins of Postgres” we're talking about the upcoming Postgres 19 release, and how a change in the Postgres instrumentation handling reduces overhead of timing measurements in EXPLAIN ANALYZE using the RDTSC instruction, and why this will allow turning on auto_explain.log_timing for more workloads. We dive into the recently committed change that I (Lukas) authored together with Andres Freund and David Geier. See the full transcript with examples below. Share this episode: Click here to share this…]]></description><link>https://pganalyze.com/blog/5mins-postgres-19-reduced-timing-overhead-explain-analyze</link><guid isPermaLink="false">https://pganalyze.com/blog/5mins-postgres-19-reduced-timing-overhead-explain-analyze</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Sat, 11 Apr 2026 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;In today’s E122 of “5mins of Postgres” we&apos;re talking about the upcoming Postgres 19 release, and how a change in the Postgres instrumentation handling reduces overhead of timing measurements in EXPLAIN ANALYZE using the RDTSC instruction, and why this will allow turning on &lt;code &gt;auto_explain.log_timing&lt;/code&gt; for more workloads.&lt;/p&gt;
&lt;p&gt;We dive into the &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=294520c44487ecaade7a6ea8781b973f9ed03909&quot;&gt;recently committed&lt;/a&gt; change that I (Lukas) authored together with Andres Freund and David Geier. See the full transcript with examples below.&lt;/p&gt;
&lt;iframe
    width=&quot;750&quot;
    height=&quot;421&quot;
    src=&quot;https://www.youtube-nocookie.com/embed/4EgdLMxkCrE&quot;
    frameborder=&quot;0&quot;
    modestbranding=&quot;1&quot; controls=&quot;0&quot; allownetworking=&quot;internal&quot;
    allow=&quot;autoplay; encrypted-media&quot;
    allowfullscreen
&gt;
&lt;/iframe&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;p&gt;&lt;strong&gt;Share this episode:&lt;/strong&gt; Click here to share this episode &lt;a href=&quot;https://www.LinkedIn.com/shareArticle?mini=true&amp;#x26;url=https://pganalyze.com/blog/5mins-postgres-19-reduced-timing-overhead-explain-analyze&amp;#x26;title=Waiting%20for%20Postgres%2019%20Reduced%20timing%20overhead%20for%20EXPLAIN%20ANALYZE%20with%20RDTSC&amp;#x26;source=LinkedIn&quot;&gt;on LinkedIn&lt;/a&gt;. Feel free to &lt;a href=&quot;https://pganalyze.com/newsletter&quot;&gt;sign up for our newsletter&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/channel/UCDV_1Dz2Ixgl1nT_3DUZVFw&quot;&gt;subscribe to our YouTube channel&lt;/a&gt;.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-problem-of-slow-timing-measurements&quot;&gt;The problem of slow timing measurements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#rdtsc-vs-rdtscp&quot;&gt;RDTSC vs RDTSCP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-new-timing_clock_source-postgres-setting&quot;&gt;The new timing_clock_source Postgres setting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#live-demo-on-postgres-19-development-branch&quot;&gt;Live demo on Postgres 19 development branch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-we-have-discussed-in-this-episode-of-5mins-of-postgres&quot;&gt;What we have discussed in this episode of 5mins of Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Transcript&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Welcome back to 5mins of Postgres! Today we talk about a change in the upcoming Postgres 19 release that will lower timing overhead for EXPLAIN ANALYZE.&lt;/p&gt;
&lt;p&gt;This is a change that I contributed myself together with Andres Freund and David Geier, and we&apos;ve worked on this change for a couple of years now actually. But in this release, we basically sat down and we really figured out all the little details that make this work. Now, this was committed recently to the Postgres 19 development branch, and to be clear, it might still be taken out of the final release if any issues are found, but right now, I think there&apos;s a decent chance it stays in.&lt;/p&gt;
&lt;p&gt;Postgres 19 will be released in September or October, and feature freeze just happened and the beta release will come out sometime in May this year. Now let me show you a little bit more about what this change is about.&lt;/p&gt;
&lt;h2 id=&quot;the-problem-of-slow-timing-measurements&quot; &gt;&lt;a href=&quot;#the-problem-of-slow-timing-measurements&quot; aria-label=&quot;the problem of slow timing measurements permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The problem of slow timing measurements&lt;/h2&gt;
&lt;p&gt;Back in 2020, &lt;a href=&quot;https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de&quot;&gt;Andres Freund started a mailing list thread&lt;/a&gt; where he was basically saying when you run EXPLAIN ANALYZE on a query, it looks a lot slower than it actually is. So in this example here, Andres created a table with 50 million rows:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; lotsarows&lt;span &gt;(&lt;/span&gt;&lt;span &gt;key&lt;/span&gt; &lt;span &gt;int&lt;/span&gt; &lt;span &gt;not&lt;/span&gt; &lt;span &gt;null&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; lotsarows &lt;span &gt;SELECT&lt;/span&gt; generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;50000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
VACUUM FREEZE lotsarows&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Very simple table, and then he ran a &lt;code &gt;COUNT(*)&lt;/code&gt; on that table:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;SELECT count(*) FROM lotsarows;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If I run the &lt;code &gt;COUNT(*)&lt;/code&gt; without any EXPLAIN, I get a run time of about 1,900 milliseconds. If I run EXPLAIN ANALYZE with TIMING OFF (and, back in that release, also with BUFFERS OFF), I get a runtime of about 2,300 milliseconds. Now, if I turn TIMING ON, the runtime more than doubles compared to the actual time. Instead of my query taking 1,900 milliseconds, the query now takes 4,200 milliseconds:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-- best of three:
SELECT count(*) FROM lotsarows;
Time: 1923.394 ms (00:01.923)

-- best of three:
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 2319.830 ms (00:02.320)

-- best of three:
EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 4202.649 ms (00:04.203)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And first of all, that&apos;s a problem because it skews what my actual performance is. If I&apos;m doing testing with EXPLAIN ANALYZE, and I don&apos;t recognize that timing has overhead, I basically think my query is slower than it actually is. The other issue is that if you run auto_explain, usually we recommend people turn log_timing off. Just for example, here in pganalyze&apos;s install instructions, we recommend that people use auto_explain, but today we always tell them to turn timing off, because we think it is not safe to use on most production systems without knowing your workload better.&lt;/p&gt;
&lt;p&gt;If we look at the problem here in more detail, Andres did a little profile to see where that overhead is coming from:&lt;/p&gt;
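&lt;p&gt;As a rough illustration of that recommendation, here is a minimal sketch of a session-level auto_explain setup with timing turned off. The one-second threshold is an assumption picked for the example, not a pganalyze recommendation, and loading the module this way requires superuser privileges:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Load auto_explain for the current session only.
LOAD &apos;auto_explain&apos;;

-- Illustrative threshold (an assumption): log plans of statements slower than 1 second.
SET auto_explain.log_min_duration = &apos;1s&apos;;

-- Collect EXPLAIN ANALYZE data, such as actual row counts, for logged plans.
SET auto_explain.log_analyze = on;

-- Skip per-node timing, which is where the clock-read overhead comes from.
SET auto_explain.log_timing = off;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With log_timing off, logged plans still carry actual row counts, just not per-node times.&lt;/p&gt;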
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-   95.49%     0.00%  postgres     postgres                 [.] agg_retrieve_direct (inlined)
   - agg_retrieve_direct (inlined)
      - 79.27% fetch_input_tuple
         - ExecProcNode (inlined)
            - 75.72% ExecProcNodeInstr
               + 25.22% SeqNext
               - 21.74% InstrStopNode
                  + 17.80% __GI___clock_gettime (inlined)
               - 21.44% InstrStartNode
                  + 19.23% __GI___clock_gettime (inlined)
               + 4.06% ExecScan
      + 13.09% advance_aggregates (inlined)
        1.06% MemoryContextReset&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;rdtsc-vs-rdtscp&quot; &gt;&lt;a href=&quot;#rdtsc-vs-rdtscp&quot; aria-label=&quot;rdtsc vs rdtscp permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;RDTSC vs RDTSCP&lt;/h2&gt;
&lt;p&gt;So first of all, in that profile we see the InstrStartNode and InstrStopNode calls. So those are basically calls that get added by Postgres when instrumentation is on, so when I&apos;m running an EXPLAIN ANALYZE, and we can see that most of that time is spent in the clock_gettime function. On a modern Linux system, this is not actually a syscall. Instead, it directly calls &lt;code &gt;RDTSCP&lt;/code&gt;. &lt;code &gt;RDTSCP&lt;/code&gt; is basically a special instruction on the CPU that gets what&apos;s called the timestamp counter.&lt;/p&gt;
&lt;p&gt;And think of the timestamp counter as a value that keeps going up, that basically counts cycles, but it counts cycles in a way that isn&apos;t influenced by power level changes or other issues that might cause it to be skewed. So it&apos;s actually pretty reliable. Now the problem is that &lt;code &gt;RDTSCP&lt;/code&gt; waits until all prior instructions have finished, and when we say instructions, we mean CPU instructions. And so basically what happens is that the timing itself is not just getting the time, but it&apos;s also blocking other activity from occurring.&lt;/p&gt;
&lt;p&gt;It&apos;s blocking the CPU from basically running things in parallel effectively. Now, there is a different instruction called &lt;code &gt;RDTSC&lt;/code&gt; without the P. And this instruction basically does not have this blocking of other concurrent instructions. And so when you have this in the picture, then it actually drastically lowers the performance overhead of the timing.&lt;/p&gt;
&lt;p&gt;In this particular example Andres ran at the time, instead of the query taking 4,200 milliseconds, it actually took only 2,600 milliseconds:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                          QUERY PLAN                                                           │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=846239.20..846239.21 rows=1 width=8) (actual time=2610.235..2610.235 rows=1 loops=1)                         │
│   -&gt;  Seq Scan on lotsarows  (cost=0.00..721239.16 rows=50000016 width=0) (actual time=0.006..1512.886 rows=50000000 loops=1) │
│ Planning Time: 0.028 ms                                                                                                       │
│ Execution Time: 2610.256 ms                                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

Time: 2610.589 ms (00:02.611)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This was mainly a prototype at the time. There were a lot of complexities, and part of the reason why this took so long to get implemented is that we needed to make sure it works on all the different kinds of systems that Postgres gets used on.&lt;/p&gt;
&lt;h2 id=&quot;the-new-timing_clock_source-postgres-setting&quot; &gt;&lt;a href=&quot;#the-new-timing_clock_source-postgres-setting&quot; aria-label=&quot;the new timing_clock_source postgres setting permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The new timing_clock_source Postgres setting&lt;/h2&gt;
&lt;p&gt;One of the things we ended up adding based on discussions on the mailing lists is a new setting to control whether this gets used or not. So with the &lt;a href=&quot;https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-TIME&quot;&gt;new &quot;timing_clock_source&quot; setting&lt;/a&gt;, you basically control whether you automatically use the TSC clock source on x86-64 CPUs that are modern enough that have the right instructions. You can force the old way of using the system clock, or you can explicitly set the TSC clock source.&lt;/p&gt;
&lt;p&gt;Now in Postgres, we&apos;re basically splitting this into two different use cases. For things like EXPLAIN ANALYZE, where we don&apos;t necessarily care about a very short, exactly precise measurement and it&apos;s more about the cumulative time taken, we use the RDTSC instruction. In other cases, where we care about higher precision and it&apos;s still a short run time, we use the RDTSCP instruction, which has higher overhead. There is a lot of supporting code to make this work in different environments. If you&apos;re interested in how that works, look at the &quot;instr_time.c&quot; file.&lt;/p&gt;
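&lt;p&gt;To make the setting a bit more concrete, here is a small sketch of inspecting and switching it within a session. It only uses the &lt;code &gt;system&lt;/code&gt; and &lt;code &gt;tsc&lt;/code&gt; values that appear in this episode, and since this is a development-branch feature, names and behavior may still change before the release:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Show the clock source setting currently in effect for this session.
SHOW timing_clock_source;

-- Force the old way of using the system clock.
SET timing_clock_source = &apos;system&apos;;

-- Explicitly request the TSC clock source.
SET timing_clock_source = &apos;tsc&apos;;

-- Go back to whatever the server is configured to use by default.
RESET timing_clock_source;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;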
&lt;h2 id=&quot;live-demo-on-postgres-19-development-branch&quot; &gt;&lt;a href=&quot;#live-demo-on-postgres-19-development-branch&quot; aria-label=&quot;live demo on postgres 19 development branch permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Live demo on Postgres 19 development branch&lt;/h2&gt;
&lt;p&gt;I want to show you an actual example of what this improvement now looks like in the 19 branch. So here I have an SSH client because my machine right now actually is a MacBook. And this initial release will only be focused on getting the fast timing in for x86-64. ARM has a similar instruction, but there are some outstanding issues for ARM machines. So right now I&apos;m connected here via SSH to a different machine. This machine sits right next to me, it&apos;s this little Framework Desktop here, but that one is an x86 machine.&lt;/p&gt;
&lt;p&gt;And so now what I can do here is I have my Postgres branch already built. I&apos;m first going to run the pg_test_timing utility, it basically measures that overhead of timing. Now here we get three different measurements:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;System clock source: clock_gettime (CLOCK_MONOTONIC)
Average loop time including overhead: 18.80 ns
Histogram of timing durations:
   &amp;lt;= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      12.7533    12.7533   20353931
      31      87.2357    99.9890  139225930
...

Clock source: RDTSCP
Average loop time including overhead: 16.94 ns
Histogram of timing durations:
   &amp;lt;= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      31.1807    31.1807   55204578
      31      68.8159    99.9966  121836600
...

Fast clock source: RDTSC
Average loop time including overhead: 11.69 ns
Histogram of timing durations:
   &amp;lt;= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      83.5188    83.5188  214321443
      31      16.4789    99.9977   42287217
...

TSC frequency in use: 2993629 kHz
TSC frequency from calibration: 2994357 kHz
TSC clock source will be used by default, unless timing_clock_source is set to &apos;system&apos;.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We get the built-in clock source, called &lt;code &gt;clock_gettime&lt;/code&gt;. That took 18.8 nanoseconds to get a time measurement. Next we&apos;re checking with &lt;code &gt;RDTSCP&lt;/code&gt;, which, again, blocks out-of-order execution. That one takes 16.9 nanoseconds. And then if we&apos;re running with &lt;code &gt;RDTSC&lt;/code&gt;, it takes 11.7 nanoseconds. So clearly &lt;code &gt;RDTSC&lt;/code&gt; has less overhead here: in this test timing program the loop time is roughly 38% lower than with the system clock (18.80 ns vs. 11.69 ns). I also see which frequency gets used, and then I also see whether that new clock source will be used by default. If I don&apos;t want to use it, I would have to set &lt;code &gt;timing_clock_source&lt;/code&gt; to &lt;code &gt;system&lt;/code&gt; explicitly.&lt;/p&gt;
&lt;p&gt;The only reason why that would make sense by the way, is if for some reason your TSC is emulated in a certain way so the timing measurements are not stable. And then &lt;code &gt;timing_clock_source = system&lt;/code&gt; might provide you those stable measurements.&lt;/p&gt;
&lt;p&gt;Now I can run a psql client, show you the actual example. I already have that table that Andres created as an example here as well. First of all, I&apos;ll turn on &lt;code &gt;\timing&lt;/code&gt;. This is on the psql side, just gives me the run time. Now I&apos;m doing a &lt;code &gt;SELECT COUNT(*)&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres=# SELECT count(*) FROM lotsarows;
  count   
----------
 50000000
(1 row)

Time: 268.466 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is a more modern machine, so this takes the same 50 million rows, just goes a little faster. So I have about 260 - 270 milliseconds of runtime here.&lt;/p&gt;
&lt;p&gt;If I run with &lt;code &gt;EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF)&lt;/code&gt;, let&apos;s start with that. I&apos;m not doing a lot of extra work really. I&apos;m just counting how many rows got returned:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres=# EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF) SELECT count(*) FROM lotsarows;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual rows=1.00 loops=1)
   -&gt;  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -&gt;  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual rows=1.00 loops=3)
               -&gt;  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual rows=16666666.67 loops=3)
 Planning Time: 0.174 ms
 Execution Time: 297.043 ms
(8 rows)

Time: 297.535 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That&apos;s pretty simple.&lt;/p&gt;
&lt;p&gt;And then if I now turn &lt;code &gt;TIMING ON&lt;/code&gt;, this time with the TSC clock source, I get a measurement of about 350 milliseconds:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres=# EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=349.687..351.719 rows=1.00 loops=1)
   -&gt;  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=349.606..351.709 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -&gt;  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=347.932..347.933 rows=1.00 loops=3)
               -&gt;  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.149..201.918 rows=16666666.67 loops=3)
 Planning Time: 0.186 ms
 Execution Time: 351.773 ms
(8 rows)

Time: 352.171 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I&apos;m still seeing, I would say, about 20 - 25% overhead here. So it&apos;s not free, but it&apos;s substantially better than with the system clock source.&lt;/p&gt;
&lt;p&gt;If I do &lt;code &gt;SET timing_clock_source = system&lt;/code&gt;, and I do the timing again, you see a drastic difference:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;SET timing_clock_source = &apos;system&apos;;
EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=799.624..801.496 rows=1.00 loops=1)
   -&gt;  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=799.535..801.488 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -&gt;  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=797.885..797.887 rows=1.00 loops=3)
               -&gt;  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.073..417.005 rows=16666666.67 loops=3)
 Planning Time: 0.115 ms
 Execution Time: 801.529 ms
(8 rows)

Time: 801.979 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Just for clarity, if I just did a regular &lt;code &gt;SELECT COUNT(*)&lt;/code&gt; here, it would take me about 260 milliseconds to run the actual query:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres=# SELECT count(*) FROM lotsarows;
  count   
----------
 50000000
(1 row)

Time: 263.824 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And with the old system clock source, I get a run time of 800 milliseconds, versus 355 milliseconds with the new TSC clock source:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;SET timing_clock_source = &apos;tsc&apos;;
EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=353.401..355.238 rows=1.00 loops=1)
   -&gt;  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=353.292..355.229 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -&gt;  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=351.081..351.082 rows=1.00 loops=3)
               -&gt;  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.131..200.584 rows=16666666.67 loops=3)
 Planning Time: 0.150 ms
 Execution Time: 355.291 ms
(8 rows)

Time: 355.690 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
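&lt;p&gt;If you want to put numbers on these runs yourself, the relative overhead can be worked out directly in psql. This is just arithmetic on the psql timings shown in this demo (263.824 ms plain, 297.535 ms with TIMING OFF, 355.690 ms with TSC, 801.979 ms with the system clock). The column names are made up for the example, and single runs like these are noisy, so treat the result as a rough indication:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Overhead in percent, relative to EXPLAIN ANALYZE with TIMING OFF
-- and relative to the plain query, for both clock sources.
SELECT
  round(100.0 * (355.690 - 297.535) / 297.535, 1) AS tsc_vs_timing_off_pct,
  round(100.0 * (801.979 - 297.535) / 297.535, 1) AS system_vs_timing_off_pct,
  round(100.0 * (355.690 - 263.824) / 263.824, 1) AS tsc_vs_plain_pct,
  round(100.0 * (801.979 - 263.824) / 263.824, 1) AS system_vs_plain_pct;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;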
&lt;p&gt;So a drastic difference, and to me this also changes things for many systems, where I would now feel comfortable running auto_explain with log_timing on, just because most queries are not this extreme. To be clear, many realistic queries have much less repetition of these instrumentation start and stop functions.&lt;/p&gt;
&lt;p&gt;Previously you would&apos;ve seen 5-10% overhead on average, now you&apos;ll probably see 2-3% on average, which for many systems is a good trade-off to have the full instrumentation data available in auto_explain.&lt;/p&gt;
&lt;p&gt;There are many other new features coming up, and you&apos;ll hear more about them in upcoming episodes.&lt;/p&gt;
&lt;p&gt;I hope you learned something new from E122 of 5mins of Postgres. Feel free to &lt;a href=&quot;https://www.youtube.com/channel/UCDV_1Dz2Ixgl1nT_3DUZVFw&quot;&gt;subscribe to our YouTube channel&lt;/a&gt;, &lt;a href=&quot;https://pganalyze.com/newsletter&quot;&gt;sign up for our newsletter&lt;/a&gt; or &lt;a href=&quot;https://www.linkedin.com/company/pganalyze/&quot;&gt;follow us on LinkedIn&lt;/a&gt; to get updates about new episodes!&lt;/p&gt;
&lt;h2 id=&quot;what-we-have-discussed-in-this-episode-of-5mins-of-postgres&quot; &gt;&lt;a href=&quot;#what-we-have-discussed-in-this-episode-of-5mins-of-postgres&quot; aria-label=&quot;what we have discussed in this episode of 5mins of postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What we have discussed in this episode of 5mins of Postgres&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=294520c44487ecaade7a6ea8781b973f9ed03909&quot;&gt;Postgres 19 commit - instrumentation: Use Time-Stamp Counter on x86-64 to lower overhead&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de&quot;&gt;Postgres pgsql-hackers mailinglist discussion: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-TIME&quot;&gt;The new timing_clock_source Postgres setting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/common/instr_time.c&quot;&gt;Timing instrumentation in instr_time.c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/docs/explain/setup/amazon_rds/03_review_settings&quot;&gt;Recommended auto_explain settings by pganalyze&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[The Dilemma of the ‘AI DBA’]]></title><description><![CDATA[Like many in the industry, my perspective on AI tools has shifted considerably over the past year, specifically when it comes to software engineering tasks. Going from “this is nice, but doesn’t really solve complex tasks for me” to “this actually works pretty well for certain use cases.” But the more capable these tools become, the sharper one dilemma gets: you can hand off the work, but an AI agent won’t ultimately be responsible when the database goes down and your app stops working. For…]]></description><link>https://pganalyze.com/blog/the-ai-dba-dilemma</link><guid isPermaLink="false">https://pganalyze.com/blog/the-ai-dba-dilemma</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Like many in the industry, my perspective on AI tools has shifted considerably over the past year, specifically when it comes to software engineering tasks. Going from “this is nice, but doesn’t really solve complex tasks for me” to “this actually works pretty well for certain use cases.” But the more capable these tools become, the sharper one dilemma gets: you can hand off the work, but an AI agent won’t ultimately be responsible when the database goes down and your app stops working.&lt;/p&gt;
&lt;p&gt;For databases, the terms ‘AI DBA’ and ‘self-driving database’ have become marketing buzzwords with the promise of having an agent that can handle creating indexes, optimizing data models, and tuning parameter settings, leaving humans free to focus on higher-value work. The appeal is understandable. Databases are hard; Postgres can behave in odd ways; and, &lt;strong&gt;if an agent can absorb that complexity, why invest in becoming an expert yourself?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While I’m a big believer in automating routine tasks, I worry the ‘AI DBA’ discourse is missing the mark in terms of the practical, grounded truth of how to use AI tools effectively, especially in production, and who’s responsible when incidents happen.&lt;/p&gt;
&lt;p&gt;If we let the AI do it all willy-nilly, then we accumulate cognitive debt and lose important context, making it harder to take responsibility for the outcome. But there is hope yet, and it comes in the form of enabling engineers instead of replacing DBAs.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-the-ai-dba-framing-gets-it-wrong&quot;&gt;How the ‘AI DBA’ framing gets it wrong&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-llms-are-actually-good-at&quot;&gt;What LLMs are actually good at&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#lets-enable-engineers-and-dbas-to-own-responsibility-for-their-database&quot;&gt;Let&apos;s enable engineers and DBAs to own responsibility for their database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#looking-ahead&quot;&gt;Looking ahead&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;how-the-ai-dba-framing-gets-it-wrong&quot; &gt;&lt;a href=&quot;#how-the-ai-dba-framing-gets-it-wrong&quot; aria-label=&quot;how the ai dba framing gets it wrong permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How the ‘AI DBA’ framing gets it wrong&lt;/h2&gt;
&lt;p&gt;Framing the role of AI in databases as an ‘AI DBA’ makes a critical mistake: it conflates doing the work with owning the outcome. DevOps gave us a useful precedent here. It didn&apos;t remove responsibility from teams: it moved it closer to them. A feature isn&apos;t done when it&apos;s merged: it&apos;s done when it works in production. That same standard should apply to the database: a deployment isn&apos;t done until it performs in production. AI doesn&apos;t change that bar.&lt;/p&gt;
&lt;p&gt;Let’s imagine we have a database team today, with titles like “DBA” or “data platform engineer”:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/8711fdf26bb22c7e6ddc6941fe5ec689/1df5b/today.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Diagram showing application and data platform teams&quot;
        title=&quot;Diagram showing application and data platform teams&quot;
        src=&quot;https://pganalyze.com/static/8711fdf26bb22c7e6ddc6941fe5ec689/1d69c/today.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;And let’s say our plan is to replace parts of that team with our new ‘AI DBA’ agent, which can do the work well enough and is available at all times:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/d0210406485b755e2aa56789f95da9a2/1df5b/tomorrow-with-ai-dba.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Diagram showing the data platform team replaced by AI DBAs&quot;
        title=&quot;Diagram showing the data platform team replaced by AI DBAs&quot;
        src=&quot;https://pganalyze.com/static/d0210406485b755e2aa56789f95da9a2/1d69c/tomorrow-with-ai-dba.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;But what happens in that scenario if we have the ‘AI DBA’ agent in the picture? Does it magically fix all production problems? Today it would struggle with even having production access in the first place, because giving production credentials to an autonomous AI agent does not absolve you of its decisions.&lt;/p&gt;
&lt;h2 id=&quot;what-llms-are-actually-good-at&quot; &gt;&lt;a href=&quot;#what-llms-are-actually-good-at&quot; aria-label=&quot;what llms are actually good at permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What LLMs are actually good at&lt;/h2&gt;
&lt;p&gt;Even if models improve significantly, they are still LLMs. You can&apos;t hold an agent accountable. It needs approvals for high-risk actions. Which means in any realistic scenario, responsibility falls back on either the infrastructure team or the application team — and we&apos;ve just made the handoff murkier.&lt;/p&gt;
&lt;p&gt;Worse, framing the problem as ‘nobody wants to do DBA work, so let&apos;s replace the DBA’ sends a clear message to experienced database engineers: your expertise isn&apos;t valued here. And beyond the question of accountability, it creates serious problems in practice.&lt;/p&gt;
&lt;p&gt;If we think back to why tools like Claude Code have had such tremendous success over the last year, it’s because they put engineers in the driver’s seat - and made them more effective at what they’re already doing: quickly cross-referencing different pieces of source code, letting the LLM write code for CRUD tasks, exploring different ways of solving a problem, or investigating production incidents from different data sources effectively, whilst quickly going back to the source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What does this mean for working with Postgres databases?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Rather than replacing database experts with an AI agent, we should focus on the tasks LLMs genuinely excel at today: retrieving information across different tools, locating the source code file that produced a query, reviewing pull requests automatically for bad patterns, and providing basic fluency for someone unfamiliar with the database. We should then apply that focus to enabling engineers who work with databases but whose day-to-day job isn&apos;t the database.&lt;/p&gt;
&lt;h2 id=&quot;lets-enable-engineers-and-dbas-to-own-responsibility-for-their-database&quot; &gt;&lt;a href=&quot;#lets-enable-engineers-and-dbas-to-own-responsibility-for-their-database&quot; aria-label=&quot;lets enable engineers and dbas to own responsibility for their database permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Let&apos;s enable engineers and DBAs to own responsibility for their database&lt;/h2&gt;
&lt;p&gt;The role of the DBA or data platform engineer needs to change. Successful teams already focus on enabling application engineers, instead of being gatekeepers to changes. The future is specific, purpose-built tools, owned by platform teams, made to be reliable for production use:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/d9ebc881396c99dbf13051b813a2a1a4/1df5b/tomorrow-with-enabling.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Diagram showing AI tools next to both application and data platform team members, calling out individual tools&quot;
        title=&quot;Diagram showing AI tools next to both application and data platform team members, calling out individual tools&quot;
        src=&quot;https://pganalyze.com/static/d9ebc881396c99dbf13051b813a2a1a4/1d69c/tomorrow-with-enabling.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If we get it right, AI tools can help us collect evidence for performance optimizations, so that when the application engineer goes to the data platform team for help, they bring the information necessary to facilitate effective investigative work.&lt;/p&gt;
&lt;p&gt;AI tools can also help us bridge the gap in the other direction: data platform engineers can put themselves in the shoes of the application engineer and become familiar with the codebase by asking things like &quot;Where did this query get called?&quot; or &quot;Does this field get used somewhere?&quot;&lt;/p&gt;
&lt;p&gt;To enable organizations to roll out AI tools not just in development, but in production use too, we need to be clear on what is being done - and write code that abstracts production information and possibly actions in a safe way. Whether that means specific tool calls, sandboxing, or providing restricted access via a CLI, it needs to be curated to suit an organization’s environment.&lt;/p&gt;
&lt;p&gt;The data platform team should own and provide safe, reliable tools that enable engineers across the organization to use AI tools effectively with production statistics and metadata, and be responsible for their own database.&lt;/p&gt;
&lt;h2 id=&quot;looking-ahead&quot; &gt;&lt;a href=&quot;#looking-ahead&quot; aria-label=&quot;looking ahead permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Looking ahead&lt;/h2&gt;
&lt;p&gt;At pganalyze we build the best monitoring and optimization tools for Postgres, to enable both engineers and platform teams to work better together. One of the ways we do that is by making sure you have reliable monitoring data about your production system. Which query was running yesterday? What EXPLAIN plan was being used? Did the plan switch unexpectedly?&lt;/p&gt;
&lt;p&gt;And it turns out that data is pretty useful when working with AI tools. The &lt;a href=&quot;https://pganalyze.com/docs/mcp&quot;&gt;pganalyze MCP Server&lt;/a&gt;, now in early access, enables safe sharing of specific information about production databases, whilst keeping in mind specific workflows, and enabling engineers to work better.&lt;/p&gt;
&lt;p&gt;There is more to come later this year. Our aim is to focus on automating the tedious tasks, whilst staying grounded in what actually works for production systems. Sometimes it makes sense to use an AI tool, and sometimes deterministic logic is the best choice. I’m excited to keep working with teams, hearing what works for them, and discovering new best practices together.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;With thanks to Maciek Sakrejda, Bison Hubert and Laura Kelso for input and reviews on this article.&lt;/em&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[How we used pg_query to rewrite queries to fix bad query plans]]></title><description><![CDATA[Rewriting SQL queries programmatically is harder than it looks. As a human, adding an extra AND condition to a WHERE clause is simple enough. But doing the same thing in code quickly gets complicated. You might try regex, but the real difficulty is coming up with a pattern that works for every variation of a query. AI could generate plausible rewrites, but it's hard to guarantee correctness. These rewrites may look valid, but SQL has many subtle corner cases, so it's difficult to prove that the…]]></description><link>https://pganalyze.com/blog/rewriting-queries-with-pg-query</link><guid isPermaLink="false">https://pganalyze.com/blog/rewriting-queries-with-pg-query</guid><dc:creator><![CDATA[Keiko Oda]]></dc:creator><pubDate>Mon, 06 Oct 2025 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Rewriting SQL queries programmatically is harder than it looks. As a human, adding an extra AND condition to a WHERE clause is simple enough. But doing the same thing in code quickly gets complicated. You might try regex, but the real difficulty is coming up with a pattern that works for every variation of a query. AI could generate plausible rewrites, but it&apos;s hard to guarantee correctness. These rewrites may look valid, but SQL has many subtle corner cases, so it&apos;s difficult to prove that the transformed query always behaves identically.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#query-rewrite-101&quot;&gt;Query rewrite 101&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#example-1---add-0-to-order-by-to-avoid-index-misuse&quot;&gt;Example #1 - Add +0 to ORDER BY to avoid index misuse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#example-2---transform-multiple-or-clauses-to-any&quot;&gt;Example #2 - Transform multiple OR clauses to ANY&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;As we are developing the new &lt;a href=&quot;https://pganalyze.com/webinars/introducing-query-advisor&quot;&gt;Query Advisor feature&lt;/a&gt; in pganalyze, we need a way to take query insights one step further: not only highlight potential issues, but also suggest alternative query patterns. To do that safely, we turn to &lt;a href=&quot;https://github.com/pganalyze/pg_query/&quot;&gt;pg_query&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Using the pg_query open source library, you can parse a query into a structured parse tree, tweak it at the tree level, and then regenerate valid SQL. It ensures the output is deterministic and syntactically correct. With a recent change, it will also support pretty-printing with configurable indentation and line length, making rewrites more powerful and easier to read.&lt;/p&gt;
&lt;p&gt;In this post, we will show a few examples of how you can use pg_query to rewrite queries, starting from a simple demonstration and then moving on to real-world patterns that benefit from rewriting.&lt;/p&gt;
&lt;h2 id=&quot;query-rewrite-101&quot; &gt;&lt;a href=&quot;#query-rewrite-101&quot; aria-label=&quot;query rewrite 101 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Query rewrite 101&lt;/h2&gt;
&lt;p&gt;Let&apos;s walk through a really simple case of using pg_query to rewrite a query. Bindings are available for &lt;a href=&quot;https://github.com/pganalyze/pg_query/&quot;&gt;Ruby&lt;/a&gt;, &lt;a href=&quot;https://github.com/pganalyze/pg_query.rs&quot;&gt;Rust&lt;/a&gt; and &lt;a href=&quot;https://github.com/pganalyze/pg_query_go&quot;&gt;Go&lt;/a&gt;, as well as community-maintained ports for Node.js and Python.
If you want to learn more about the basics of pg_query, check out &lt;a href=&quot;https://pganalyze.com/blog/pg-query-postgres-16&quot;&gt;our past blog post&lt;/a&gt;.
In this blog post, we&apos;ll use the Ruby bindings.&lt;/p&gt;
&lt;p&gt;Let&apos;s start with a simple query:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&quot;pg_query&quot;&lt;/span&gt;
parsed_query &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT id FROM tbl1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; #&amp;lt;PgQuery::ParserResult:0x000000015da1dc50&lt;/span&gt;
&lt;span &gt;#  @aliases=nil,&lt;/span&gt;
&lt;span &gt;#  @cte_names=nil,&lt;/span&gt;
&lt;span &gt;#  @functions=nil,&lt;/span&gt;
&lt;span &gt;#  @query=&quot;SELECT id FROM tbl1&quot;,&lt;/span&gt;
&lt;span &gt;#  @tables=nil,&lt;/span&gt;
&lt;span &gt;#  @tree=&lt;/span&gt;
&lt;span &gt;#   &amp;lt;PgQuery::ParseResult: version: 170005, stmts: [&amp;lt;PgQuery::RawStmt: stmt: &amp;lt;PgQuery::Node: select_stmt:&lt;/span&gt;
&lt;span &gt;#     &amp;lt;PgQuery::SelectStmt: distinct_clause: [],&lt;/span&gt;
&lt;span &gt;#       target_list: [&amp;lt;PgQuery::Node: res_target: &amp;lt;PgQuery::ResTarget: name: &quot;&quot;, indirection: [], val: &amp;lt;PgQuery::Node: column_ref: &amp;lt;PgQuery::ColumnRef: fields: [&amp;lt;PgQuery::Node: string: &amp;lt;PgQuery::String: sval: &quot;id&quot;&gt;&gt;], location: 7&gt;&gt;, location: 7&gt;&gt;],&lt;/span&gt;
&lt;span &gt;#       from_clause: [&amp;lt;PgQuery::Node: range_var: &amp;lt;PgQuery::RangeVar: catalogname: &quot;&quot;, schemaname: &quot;&quot;, relname: &quot;tbl1&quot;, inh: true, relpersistence: &quot;p&quot;, location: 15&gt;&gt;],&lt;/span&gt;
&lt;span &gt;#       group_clause: [], group_distinct: false, window_clause: [], values_lists: [], sort_clause: [],&lt;/span&gt;
&lt;span &gt;#       limit_option: :LIMIT_OPTION_DEFAULT, locking_clause: [], op: :SETOP_NONE, all: false&gt;&gt;,&lt;/span&gt;
&lt;span &gt;#     stmt_location: 0, stmt_len: 0&gt;]&gt;,&lt;/span&gt;
&lt;span &gt;#  @warnings=[]&gt;&lt;/span&gt;
parsed_query&lt;span &gt;.&lt;/span&gt;tables
&lt;span &gt;# =&gt; [&quot;tbl1&quot;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, &lt;code &gt;parsed_query&lt;/code&gt; is a parse result that contains a parse tree. It also exposes useful methods, such as &lt;code &gt;tables&lt;/code&gt;, which tells us which tables are used in the query.&lt;/p&gt;
&lt;p&gt;The parse tree for this query looks like the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pganalyze.com/b7abc1b7897d1e7ad435819ac4f3d1a4/parse_tree_select_id.svg&quot; alt=&quot;Diagram of the parse tree for SELECT id FROM tbl1&quot;&gt;&lt;/p&gt;
&lt;p&gt;We can either walk the tree to visit nodes (which we&apos;ll cover later), or drill down directly to a specific node. For example, to reach the table name of the from clause:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;from_clause&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;range_var&lt;span &gt;.&lt;/span&gt;relname
&lt;span &gt;# =&gt; &quot;tbl1&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Updating this lets us change the table name. After updating the node, we can call deparse to generate SQL again:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;from_clause&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;range_var&lt;span &gt;.&lt;/span&gt;relname &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&quot;tbl2&quot;&lt;/span&gt;
parsed_query&lt;span &gt;.&lt;/span&gt;deparse
&lt;span &gt;# =&gt; &quot;SELECT id FROM tbl2&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that we have the basic idea of how rewriting works, let&apos;s move on to more practical examples.&lt;/p&gt;
&lt;h2 id=&quot;example-1---add-0-to-order-by-to-avoid-index-misuse&quot; &gt;&lt;a href=&quot;#example-1---add-0-to-order-by-to-avoid-index-misuse&quot; aria-label=&quot;example 1   add 0 to order by to avoid index misuse permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Example #1 - Add +0 to ORDER BY to avoid index misuse&lt;/h2&gt;
&lt;p&gt;When you find a slow query using ORDER BY combined with LIMIT, it&apos;s important to check whether the planner is picking the right index. This is something we verify in the Query Advisor feature.&lt;/p&gt;
&lt;p&gt;Let&apos;s start with a simple query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; items &lt;span &gt;WHERE&lt;/span&gt; object_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this query, when the items table has an object_id index (e.g. &lt;code &gt;items_object_id_idx&lt;/code&gt;), the planner will usually use it, and the query should finish quickly, as long as the index is selective.&lt;/p&gt;
&lt;p&gt;Now, let&apos;s add an ORDER BY:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; items &lt;span &gt;WHERE&lt;/span&gt; object_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt; &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; id &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In some cases, this can cause the planner to choose a plan like &quot;Index Scan using items_pkey on items&quot;, reading rows in &lt;code &gt;id&lt;/code&gt; order and discarding every row that does not match &lt;code &gt;object_id = 123&lt;/code&gt;. If many rows are removed by that filter, the query can become significantly slower.&lt;/p&gt;
&lt;p&gt;A simple workaround is to add &quot;+0&quot; to the ORDER BY id. Because the sort key is then an expression rather than the bare indexed column, the planner can no longer use the primary key index (&lt;code &gt;items_pkey&lt;/code&gt;) to produce the ordering.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; items &lt;span &gt;WHERE&lt;/span&gt; object_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt; &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; id &lt;span &gt;+&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&apos;s create a parse tree from the query (without &quot;+0&quot;) and look at the &quot;ORDER BY id&quot; part:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;parsed_query &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT * FROM items WHERE object_id = 123 ORDER BY id LIMIT 1&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;sort_clause&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;sort_by&lt;span &gt;.&lt;/span&gt;node
&lt;span &gt;# =&gt; &amp;lt;PgQuery::Node: column_ref: &amp;lt;PgQuery::ColumnRef: fields: [&amp;lt;PgQuery::Node: string: &amp;lt;PgQuery::String: sval: &quot;id&quot;&gt;&gt;], location: 51&gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It&apos;s a bit hard to read, but the sort_by node here is a &lt;code &gt;ColumnRef&lt;/code&gt; node pointing to id. To add &quot;+0&quot;, we replace it with an &lt;code &gt;A_Expr&lt;/code&gt; node that represents a binary expression with id on the left and 0 on the right.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pganalyze.com/a8181f5bacd9e2920c46833c450a0793/parse_tree_order_by_rewrite.svg&quot; alt=&quot;Diagram of the parse tree for ORDER BY rewrite&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the code below, we create a new &lt;code &gt;A_Expr&lt;/code&gt; node:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;sort_by_node &lt;span &gt;=&lt;/span&gt; parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;sort_clause&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;sort_by&lt;span &gt;.&lt;/span&gt;node
new_node &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
  a_expr&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;A_Expr&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
    kind&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:AEXPR_OP&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;string&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;String&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;sval&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;+&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    lexpr&lt;span &gt;:&lt;/span&gt; sort_by_node&lt;span &gt;.&lt;/span&gt;dup&lt;span &gt;,&lt;/span&gt; &lt;span &gt;# Note: to reuse existing nodes, make sure to duplicate to avoid accidentally modifying the original tree&lt;/span&gt;
    rexpr&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;a_const&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;A_Const&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ival&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Integer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ival&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, we assign the new node and deparse the query. Don&apos;t forget to use the new pretty-printing options:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;sort_clause&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;sort_by&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;=&lt;/span&gt; new_node
opts &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;DeparseOpts&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;pretty_print&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; indent_size&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; trailing_newline&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
parsed_query&lt;span &gt;.&lt;/span&gt;deparse&lt;span &gt;(&lt;/span&gt;opts&lt;span &gt;:&lt;/span&gt; opts&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;SELECT *\nFROM items\nWHERE object_id = 123\nORDER BY id + 0\nLIMIT 1\n&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For more on why this rewrite helps, see our blog post &lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit&quot;&gt;Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;example-2---transform-multiple-or-clauses-to-any&quot; &gt;&lt;a href=&quot;#example-2---transform-multiple-or-clauses-to-any&quot; aria-label=&quot;example 2   transform multiple or clauses to any permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Example #2 - Transform multiple OR clauses to ANY&lt;/h2&gt;
&lt;p&gt;With Postgres 18, the planner can transform certain chains of OR comparisons into ANY, which can produce a better plan (&lt;a href=&quot;https://postgr.es/c/ae4569161&quot;&gt;https://postgr.es/c/ae4569161&lt;/a&gt;).
This happens at the planner level, but let&apos;s take a look at how we can do the same thing explicitly using a pg_query rewrite.&lt;/p&gt;
&lt;p&gt;Let&apos;s look at a query that compares the id column to multiple constants:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;EXPLAIN SELECT id FROM items WHERE id = 41 OR id = 42 OR id = 43;
                                  QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on items  (cost=12.89..24.37 rows=3 width=8)
   Recheck Cond: ((id = 41) OR (id = 42) OR (id = 43))
   -&gt;  BitmapOr  (cost=12.89..12.89 rows=3 width=0)
         -&gt;  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 41)
         -&gt;  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 42)
         -&gt;  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 43)
(9 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, let&apos;s rewrite it with ANY:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;EXPLAIN SELECT id FROM items WHERE id = ANY(&apos;{41,42,43}&apos;);
                                   QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using items_pkey on items  (cost=0.29..12.92 rows=3 width=8)
   Index Cond: (id = ANY (&apos;{41,42,43}&apos;::bigint[]))
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice how the estimated total cost dropped from &lt;strong&gt;24.37&lt;/strong&gt; to &lt;strong&gt;12.92&lt;/strong&gt;. The query returns the same results, but instead of three Bitmap Index Scans combined by a BitmapOr, it uses a single Index Only Scan. Let&apos;s take a closer look at the parse tree.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pganalyze.com/418e8a95b332142ccea7dbf798fc9c42/parse_tree_multiple_or_clauses.svg&quot; alt=&quot;Diagram of the parse tree for multiple OR clauses&quot;&gt;&lt;/p&gt;
&lt;p&gt;At the parse tree level, the first query is represented as a bool_expr (&lt;code &gt;BoolExpr&lt;/code&gt;) with &lt;code &gt;OR_EXPR&lt;/code&gt;, containing three equality expressions (&lt;code &gt;id = 41&lt;/code&gt;, &lt;code &gt;id = 42&lt;/code&gt;, &lt;code &gt;id = 43&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The ANY form, on the other hand, is represented as an a_expr (&lt;code &gt;A_Expr&lt;/code&gt;) with &lt;code &gt;ANY&lt;/code&gt;. This corresponds to &lt;code &gt;= ANY(array)&lt;/code&gt;, with the column id on the left and an array of constants on the right (&lt;code &gt;{41,42,43}&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pganalyze.com/9881d6497aefb5f3b824dd3a7b374737/parse_tree_any_a_expr.svg&quot; alt=&quot;Diagram of the parse tree for ANY a_expr&quot;&gt;&lt;/p&gt;
&lt;p&gt;The rewrite steps are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find an OR expression made up of multiple &lt;code &gt;id = &amp;lt;const&gt;&lt;/code&gt; comparisons&lt;/li&gt;
&lt;li&gt;Collect all the constants&lt;/li&gt;
&lt;li&gt;Replace the &lt;code &gt;BoolExpr&lt;/code&gt; with an &lt;code &gt;A_Expr&lt;/code&gt; representing &lt;code &gt;id = ANY(array)&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To simplify the replace step, the example code below keeps the &lt;code &gt;BoolExpr&lt;/code&gt; node and replaces the matching args elements within it with a single &lt;code &gt;A_Expr&lt;/code&gt; node, collapsing the OR chain into &lt;code &gt;= ANY(...)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In Example #1, we just drilled down to a node and swapped it. This time, let&apos;s walk the whole tree, find any matching pattern, and rewrite it.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;transform_or_to_any&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;query&lt;span &gt;)&lt;/span&gt;
  parsed_query &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;query&lt;span &gt;)&lt;/span&gt;
  parsed_query&lt;span &gt;.&lt;/span&gt;walk&lt;span &gt;!&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;node&lt;span &gt;|&lt;/span&gt;
    &lt;span &gt;# Find the BoolExpr node with OR_EXPR&lt;/span&gt;
    &lt;span &gt;next&lt;/span&gt; &lt;span &gt;unless&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;is_a&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;BoolExpr&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;boolop &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:OR_EXPR&lt;/span&gt;
    keep_as_is &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    group_by_lexpr &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
    node&lt;span &gt;.&lt;/span&gt;args&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;arg&lt;span &gt;|&lt;/span&gt;
      &lt;span &gt;# Note: only group when the arg is ColumnRef = A_Const (e.g. col1 = 123)&lt;/span&gt;
      &lt;span &gt;# For other cases (e.g. col1 IS TRUE, col1 != 345), leave it as is&lt;/span&gt;
      &lt;span &gt;if&lt;/span&gt; arg&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:a_expr&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt;
          arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;name&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:string&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt;
          arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;name&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;.&lt;/span&gt;string&lt;span &gt;.&lt;/span&gt;sval &lt;span &gt;==&lt;/span&gt; &lt;span &gt;&apos;=&apos;&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt;
          arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;lexpr&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:column_ref&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt;
          arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;rexpr&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:a_const&lt;/span&gt;
        &lt;span &gt;# In order to use this as a hash key, remove the location info by setting to 0&lt;/span&gt;
        arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;lexpr&lt;span &gt;.&lt;/span&gt;inner&lt;span &gt;.&lt;/span&gt;location &lt;span &gt;=&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;
        group_by_lexpr&lt;span &gt;[&lt;/span&gt;arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;lexpr&lt;span &gt;]&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;&lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
        group_by_lexpr&lt;span &gt;[&lt;/span&gt;arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;lexpr&lt;span &gt;]&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; arg&lt;span &gt;.&lt;/span&gt;a_expr&lt;span &gt;.&lt;/span&gt;rexpr&lt;span &gt;.&lt;/span&gt;dup
      &lt;span &gt;else&lt;/span&gt;
        keep_as_is &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; arg&lt;span &gt;.&lt;/span&gt;dup
      &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;# Skip this node unless at least one column (lexpr) is compared to more than one constant&lt;/span&gt;
    &lt;span &gt;next&lt;/span&gt; &lt;span &gt;unless&lt;/span&gt; group_by_lexpr&lt;span &gt;.&lt;/span&gt;any&lt;span &gt;?&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;k&lt;span &gt;,&lt;/span&gt; v&lt;span &gt;|&lt;/span&gt; v&lt;span &gt;.&lt;/span&gt;length &lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;1&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;

    &lt;span &gt;# Create new args with AEXPR_OP_ANY a_expr for grouped args&lt;/span&gt;
    any_args &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    group_by_lexpr&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;lexpr&lt;span &gt;,&lt;/span&gt; rexprs&lt;span &gt;|&lt;/span&gt;
      &lt;span &gt;if&lt;/span&gt; rexprs&lt;span &gt;.&lt;/span&gt;length &lt;span &gt;==&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;
        keep_as_is &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
          a_expr&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;A_Expr&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
            kind&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:AEXPR_OP&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;string&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;String&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;sval&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;=&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            lexpr&lt;span &gt;:&lt;/span&gt; lexpr&lt;span &gt;,&lt;/span&gt;
            rexpr&lt;span &gt;:&lt;/span&gt; rexprs&lt;span &gt;.&lt;/span&gt;first
          &lt;span &gt;)&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;else&lt;/span&gt;
        any_args &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
          a_expr&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;A_Expr&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
            kind&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:AEXPR_OP_ANY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;string&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;String&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;sval&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;=&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            lexpr&lt;span &gt;:&lt;/span&gt; lexpr&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;# Note: QueryParameters.values_to_array is a helper that is not shown in this post; it is assumed to build an array constant such as &apos;{41,42,43}&apos; from the collected values&lt;/span&gt;
            rexpr&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;a_const&lt;span &gt;:&lt;/span&gt; &lt;span &gt;QueryParameters&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;values_to_array&lt;span &gt;(&lt;/span&gt;rexprs&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
          &lt;span &gt;)&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
    node&lt;span &gt;.&lt;/span&gt;args&lt;span &gt;.&lt;/span&gt;replace&lt;span &gt;(&lt;/span&gt;any_args &lt;span &gt;+&lt;/span&gt; keep_as_is&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  parsed_query&lt;span &gt;.&lt;/span&gt;deparse
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now let&apos;s try the transform on some variations:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# Simple case&lt;/span&gt;
transform_or_to_any&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT id FROM items WHERE id = 41 OR id = 42 OR id = 43&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;SELECT id FROM items WHERE id = ANY(&apos;{41,42,43}&apos;)&quot;&lt;/span&gt;

&lt;span &gt;# With AND and OR&lt;/span&gt;
transform_or_to_any&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT id FROM items WHERE id = 41 OR id = 42 AND id = 43 OR id = 44&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;SELECT id FROM items WHERE id = ANY(&apos;{41,44}&apos;) OR (id = 42 AND id = 43)&quot;&lt;/span&gt;

&lt;span &gt;# ORs in subqueries or UNIONs&lt;/span&gt;
transform_or_to_any&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;~&lt;/span&gt;&lt;span &gt;SQL&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; id &lt;span &gt;FROM&lt;/span&gt; items &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;SELECT&lt;/span&gt; id &lt;span &gt;FROM&lt;/span&gt; items2 &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;41&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;42&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;UNION&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; id &lt;span &gt;FROM&lt;/span&gt; items3 &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;43&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;44&lt;/span&gt;
&lt;span &gt;SQL&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;SELECT id FROM items WHERE id IN (SELECT id FROM items2 WHERE id = ANY(&apos;{41,42}&apos;)) UNION SELECT id FROM items3 WHERE id = ANY(&apos;{43,44}&apos;)&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see that it transforms ORs properly no matter where they appear in the query.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The examples we looked at, such as adding an expression to influence index usage or transforming multiple OR clauses into ANY, show how query rewriting with pg_query can solve real-world problems in a safe and consistent way. These are only a few cases, and the same approach can be applied to many other kinds of transformations.&lt;/p&gt;
&lt;p&gt;In developing the Query Advisor feature, rewriting the query using pg_query has been an essential piece. We hope that sharing these examples encourages you to explore what is possible with this library in your own projects.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O]]></title><description><![CDATA[With the Postgres 18 Beta 1 release this week, a multi-year effort and significant architectural shift in Postgres is taking shape: Asynchronous I/O (AIO). These capabilities are still under active development, but they represent a fundamental change in how Postgres handles I/O, offering the potential for significant performance gains, particularly in cloud environments where latency is often the bottleneck. Why asynchronous I/O matters How Postgres 17’s read streams paved the way New io_method…]]></description><link>https://pganalyze.com/blog/postgres-18-async-io</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-18-async-io</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Wed, 07 May 2025 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;With the &lt;a href=&quot;https://www.postgresql.org/about/news/postgresql-18-beta-1-released-3070/&quot;&gt;Postgres 18 Beta 1&lt;/a&gt; release this week, a multi-year effort and significant architectural shift in Postgres is taking shape: &lt;strong&gt;Asynchronous I/O (AIO)&lt;/strong&gt;. These capabilities are still under active development, but they represent a fundamental change in how Postgres handles I/O, offering the potential for significant performance gains, particularly in cloud environments where latency is often the bottleneck.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#why-asynchronous-io-matters&quot;&gt;Why asynchronous I/O matters&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-postgres-17s-read-streams-paved-the-way&quot;&gt;How Postgres 17’s read streams paved the way&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#new-io_method-setting-in-postgres-18&quot;&gt;New io_method setting in Postgres 18&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#io_method--sync&quot;&gt;io_method = sync&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#io_method--worker&quot;&gt;io_method = worker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#io_method--io_uring&quot;&gt;io_method = io_uring&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#asynchronous-io-in-action&quot;&gt;Asynchronous I/O in action&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#benchmark-on-aws-doubling-read-performance--even-greater-gains-from-io_uring&quot;&gt;Benchmark on AWS: Doubling read performance &amp;#x26; even greater gains from io_uring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tuning-effective_io_concurrency&quot;&gt;Tuning effective_io_concurrency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#monitoring-ios-in-flight-with-pg_aios&quot;&gt;Monitoring I/Os in flight with pg_aios&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#heads-up-async-io-makes-io-timing-information-hard-to-interpret&quot;&gt;Heads Up: Async I/O makes I/O timing information hard to interpret&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#in-summary&quot;&gt;In summary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;While some features may still be adjusted or dropped during the beta period before the final release, now is the best time to test and validate how Postgres 18 performs in practice. In Postgres 18, AIO is limited to read operations; writes remain synchronous, though support may expand in future versions.&lt;/p&gt;
&lt;p&gt;In this post, we explain what asynchronous I/O is, how it works in Postgres 18, and what it means for performance optimization.&lt;/p&gt;
&lt;h2 id=&quot;why-asynchronous-io-matters&quot; &gt;&lt;a href=&quot;#why-asynchronous-io-matters&quot; aria-label=&quot;why asynchronous io matters permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why asynchronous I/O matters&lt;/h2&gt;
&lt;p&gt;Postgres has historically operated under a synchronous I/O model, meaning every read request is a blocking system call. The database must pause and wait for the operating system to return the data before continuing. This design introduces unnecessary waits on I/O, especially in cloud environments where storage is often network-attached (e.g. Amazon EBS) and I/O can have over 1ms of latency.&lt;/p&gt;
&lt;p&gt;In a simplified model, we can illustrate the difference like this, ignoring any prefetching/batching the Linux kernel might do:&lt;/p&gt;
&lt;p&gt;&lt;span &gt;&lt;span &gt;&lt;/span&gt;
  &lt;img alt=&quot;Diagram showing synchronous vs asynchronous I/O model with concurrent requests&quot; title=&quot;In the asynchronous I/O model, multiple read requests can be in flight simultaneously&quot; src=&quot;https://pganalyze.com/static/cd0be5dde105345bb288ac73655b90f1/1d69c/sync_vs_async.png&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; /&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;You can picture synchronous I/O like an imaginary librarian who retrieves one book at a time, returning before fetching the next. This inefficiency compounds as the number of physical reads for a logical operation increases.&lt;/p&gt;
&lt;p&gt;Asynchronous I/O eliminates that bottleneck by allowing programs to issue multiple read requests concurrently, without waiting for prior reads to return. In an async program flow, I/O requests are scheduled to be read into a memory location and the program waits for completion of those reads, instead of issuing each read individually.&lt;/p&gt;
&lt;h3 id=&quot;how-postgres-17s-read-streams-paved-the-way&quot; &gt;&lt;a href=&quot;#how-postgres-17s-read-streams-paved-the-way&quot; aria-label=&quot;how postgres 17s read streams paved the way permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How Postgres 17’s read streams paved the way&lt;/h3&gt;
&lt;p&gt;The work of implementing asynchronous I/O in Postgres has been many years in the making. Postgres 17 laid essential groundwork &lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-17-streaming-io&quot;&gt;with the introduction of read stream APIs&lt;/a&gt;. These internal changes standardized how read operations were issued across different subsystems and streamlined the use of &lt;code &gt;posix_fadvise()&lt;/code&gt; to request that the operating system prefetch data in advance.&lt;/p&gt;
&lt;p&gt;However, this advisory mechanism only hinted to the kernel to load data into the OS page cache, not into Postgres’ own shared buffers. Postgres still had to issue syscalls for each read, and OS readahead behaviour is not always consistent.&lt;/p&gt;
&lt;p&gt;The upcoming Postgres 18 release removes this indirection. With true asynchronous reads, data is fetched directly into shared buffers by the database itself, bypassing reliance on kernel-level heuristics and enabling more predictable, higher-throughput I/O behavior.&lt;/p&gt;
&lt;h2 id=&quot;new-io_method-setting-in-postgres-18&quot; &gt;&lt;a href=&quot;#new-io_method-setting-in-postgres-18&quot; aria-label=&quot;new io_method setting in postgres 18 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;New io_method setting in Postgres 18&lt;/h2&gt;
&lt;p&gt;To control the mechanism used for asynchronous I/O, Postgres 18 introduces a new configuration parameter: &lt;code &gt;io_method&lt;/code&gt;. This setting determines how read operations are dispatched under the hood, and whether they’re handled synchronously, offloaded to I/O workers, or submitted directly to the kernel via &lt;code &gt;io_uring&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;io_method&lt;/code&gt; setting is configured at the server level (for example in postgresql.conf) and cannot be changed without restarting the server. It controls which I/O implementation Postgres will use and is essential to understand when tuning I/O performance in Postgres 18. There are three possible settings for &lt;code &gt;io_method&lt;/code&gt;, with the current default (as of Beta 1) being &lt;code &gt;worker&lt;/code&gt;.&lt;/p&gt;
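&lt;p&gt;As a minimal sketch, assuming a Postgres 18 server and superuser access, the active method can be inspected and changed with standard SQL commands. &lt;code &gt;ALTER SYSTEM&lt;/code&gt; persists the value in postgresql.auto.conf, and the new method still only takes effect after a restart:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Show the I/O method currently in use
SHOW io_method;

-- Persist a different method; the value here is only an illustration
ALTER SYSTEM SET io_method = &apos;io_uring&apos;;

-- Check whether the change is still waiting for a server restart
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = &apos;io_method&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;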
&lt;h3 id=&quot;io_method--sync&quot; &gt;&lt;a href=&quot;#io_method--sync&quot; aria-label=&quot;io_method  sync permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;io_method = sync&lt;/h3&gt;
&lt;p&gt;The &lt;code &gt;sync&lt;/code&gt; setting in Postgres 18 mirrors the synchronous behavior of Postgres 17. Reads are still synchronous and blocking, using &lt;code &gt;posix_fadvise()&lt;/code&gt; to achieve read-ahead in the Linux kernel.&lt;/p&gt;
&lt;h3 id=&quot;io_method--worker&quot; &gt;&lt;a href=&quot;#io_method--worker&quot; aria-label=&quot;io_method  worker permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;io_method = worker&lt;/h3&gt;
&lt;p&gt;The &lt;code &gt;worker&lt;/code&gt; setting utilizes dedicated &lt;strong&gt;I/O worker processes&lt;/strong&gt; running in the background that retrieve data independently of query execution. The main backend process enqueues read requests, and these workers interact with the Linux kernel to fetch data, which is then delivered into shared buffers, &lt;strong&gt;without blocking the main process&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The number of I/O workers can be configured through the new &lt;code &gt;io_workers&lt;/code&gt; setting, and defaults to &lt;code &gt;3&lt;/code&gt;. These workers are always running, and shared across all connections and databases.&lt;/p&gt;
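&lt;p&gt;The following sketch shows one way to adjust the worker count. It assumes &lt;code &gt;io_method = worker&lt;/code&gt;, that &lt;code &gt;io_workers&lt;/code&gt; can be applied with a configuration reload, and that the worker processes are reported in &lt;code &gt;pg_stat_activity&lt;/code&gt; with a backend type of &lt;code &gt;io worker&lt;/code&gt;; verify these against your build. The value 6 is purely illustrative, not a recommendation:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Current number of I/O worker processes
SHOW io_workers;

-- Raise the worker count (illustrative value) and reload the configuration
ALTER SYSTEM SET io_workers = 6;
SELECT pg_reload_conf();

-- List the running I/O worker processes (backend_type value is an assumption)
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = &apos;io worker&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;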
&lt;h3 id=&quot;io_method--io_uring&quot; &gt;&lt;a href=&quot;#io_method--io_uring&quot; aria-label=&quot;io_method  io_uring permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;io_method = io_uring&lt;/h3&gt;
&lt;p&gt;This Linux-specific method uses &lt;strong&gt;&lt;code &gt;io_uring&lt;/code&gt;&lt;/strong&gt;, a high-performance I/O interface introduced in kernel version 5.1. Asynchronous I/O has been available in Linux since kernel version 2.5, but it was largely considered inefficient and hard to use. &lt;code &gt;io_uring&lt;/code&gt; establishes a &lt;strong&gt;shared ring buffer&lt;/strong&gt; between Postgres and the kernel, minimizing syscall overhead. This is the most efficient option, &lt;strong&gt;eliminating the need for I/O worker processes entirely&lt;/strong&gt;, but is only available on newer Linux kernels and requires file systems and configurations compatible with &lt;code &gt;io_uring&lt;/code&gt; support.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important note:&lt;/strong&gt; As of Postgres 18 Beta 1, asynchronous I/O is supported for sequential scans, bitmap heap scans, and maintenance operations like &lt;code &gt;VACUUM&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&quot;asynchronous-io-in-action&quot; &gt;&lt;a href=&quot;#asynchronous-io-in-action&quot; aria-label=&quot;asynchronous io in action permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Asynchronous I/O in action&lt;/h2&gt;
&lt;p&gt;Asynchronous I/O delivers the most noticeable gains in cloud environments where storage is network-attached, such as Amazon EBS volumes. In these setups, individual disk reads often take multiple milliseconds, introducing substantial latency compared to local SSDs.&lt;/p&gt;
&lt;p&gt;With traditional synchronous I/O, each of these reads blocks query execution until the data arrives, leading to idle CPU time and degraded throughput. By contrast, asynchronous I/O allows Postgres to issue multiple read requests in parallel and continue processing while waiting for results. This reduces query latency and enables much more efficient use of available I/O bandwidth and CPU cycles.&lt;/p&gt;
&lt;h3 id=&quot;benchmark-on-aws-doubling-read-performance--even-greater-gains-from-io_uring&quot; &gt;&lt;a href=&quot;#benchmark-on-aws-doubling-read-performance--even-greater-gains-from-io_uring&quot; aria-label=&quot;benchmark on aws doubling read performance  even greater gains from io_uring permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Benchmark on AWS: Doubling read performance &amp;#x26; even greater gains from io_uring&lt;/h3&gt;
&lt;p&gt;To evaluate the performance impact of asynchronous I/O, we benchmarked a representative workload on AWS, comparing Postgres 17 with Postgres 18 using different &lt;code &gt;io_method&lt;/code&gt; settings. The workload remained identical across versions, allowing us to isolate the effects of the new I/O infrastructure.&lt;/p&gt;
&lt;p&gt;We tested on an AWS c7i.8xlarge instance (32 vCPUs, 64 GB RAM) with a dedicated 100 GB &lt;code &gt;io2&lt;/code&gt; EBS volume for Postgres, provisioned with 20,000 IOPS. The test table was 3.5 GB in size:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; test&lt;span &gt;(&lt;/span&gt;id &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; test &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;100000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# \dt+
                                   List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size   | Description 
--------+------+-------+----------+-------------+---------------+---------+-------------
 public | test | table | postgres | permanent   | heap          | 3458 MB | 
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Between test runs we cleared the OS page cache (&lt;code &gt;sync; echo 3 &gt; /proc/sys/vm/drop_caches&lt;/code&gt;) and restarted Postgres to gather cold cache results. Warm cache results represent running the query a second time. We repeated the complete test run three times for each configuration, retaining the best result.&lt;/p&gt;
&lt;p&gt;Whilst we also tested with parallel query, to keep results easier to understand all results below are with parallel query turned off (&lt;code &gt;max_parallel_workers_per_gather = 0&lt;/code&gt;).&lt;/p&gt;
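&lt;p&gt;A session setup along these lines should reproduce the measurement conditions. This is a sketch that assumes the timings shown were collected with the &lt;code &gt;\timing&lt;/code&gt; meta-command in &lt;code &gt;psql&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Disable parallel query for the current session only
SET max_parallel_workers_per_gather = 0;

-- psql meta-command: print the elapsed time after each statement
\timing on

-- The benchmark query used throughout this post
SELECT COUNT(*) FROM test;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;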
&lt;p&gt;&lt;strong&gt;Cold cache results:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Postgres 17, using synchronous I/O, established the baseline. It showed consistent read latency, but throughput was limited by the need to complete each I/O request before issuing the next:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 15830.880 ms (00:15.831)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Postgres 18, when configured with &lt;code &gt;io_method = sync&lt;/code&gt;, performed nearly identically, confirming that behavior remains unchanged without enabling asynchronous I/O:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 15071.089 ms (00:15.071)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, when we switch to the &lt;code &gt;worker&lt;/code&gt; method with 3 I/O workers (the default), a clear improvement shows:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 10051.975 ms (00:10.052)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
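&lt;p&gt;The number of worker processes is controlled by the &lt;code &gt;io_workers&lt;/code&gt; setting. The following sketch shows one way to experiment with it. The value 8 is an arbitrary illustration, not a recommendation, and we assume here that &lt;code &gt;io_workers&lt;/code&gt; can be changed with a configuration reload rather than a restart:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Raise the number of I/O worker processes (the default is 3)
ALTER SYSTEM SET io_workers = 8;

-- Ask the server to re-read its configuration
SELECT pg_reload_conf();

-- Count the I/O worker processes that are currently running
SELECT count(*) FROM pg_stat_activity WHERE backend_type = &apos;io worker&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;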
&lt;p&gt;We observed some gains by raising the number of I/O workers, but the biggest improvement comes when utilizing &lt;code &gt;io_uring&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 5723.423 ms (00:05.723)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we graph this (measuring runtime in ms, lower is better), it’s clear that Postgres 18 performs significantly better in cold cache situations:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Read performance comparison between Postgres 17 and 18 with different io_method settings&quot;
        title=&quot;Read performance comparison between Postgres 17 and 18 with different io_method settings&quot;
        src=&quot;https://pganalyze.com/static/506febf39b7d14c7ba413260d30b63cc/1d69c/runtime-compared.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;For cold cache tests, both &lt;code &gt;worker&lt;/code&gt; and &lt;code &gt;io_uring&lt;/code&gt; delivered a consistent &lt;strong&gt;2-3x improvement&lt;/strong&gt; in read performance compared to the legacy &lt;code &gt;sync&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;Whilst &lt;code &gt;worker&lt;/code&gt; offers a slight benefit for warm cache tests due to its parallelism, &lt;code &gt;io_uring&lt;/code&gt; consistently performed better in cold cache tests, and its lower syscall overhead and reduced process coordination make &lt;strong&gt;&lt;code &gt;io_uring&lt;/code&gt; the recommended setting&lt;/strong&gt; for maximizing I/O performance in Postgres 18.&lt;/p&gt;
&lt;p&gt;This performance shift for disk reads has meaningful implications for infrastructure planning, especially in cloud environments. By reducing I/O wait time, asynchronous reads can substantially increase query throughput while reducing latency and CPU overhead. For read-heavy workloads, this may translate into smaller instance sizes or better utilization of existing resources.&lt;/p&gt;
&lt;h3 id=&quot;tuning-effective_io_concurrency&quot; &gt;&lt;a href=&quot;#tuning-effective_io_concurrency&quot; aria-label=&quot;tuning effective_io_concurrency permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Tuning effective_io_concurrency&lt;/h3&gt;
&lt;p&gt;In Postgres 18, &lt;code &gt;effective_io_concurrency&lt;/code&gt; becomes more interesting, but only when used with an asynchronous &lt;code &gt;io_method&lt;/code&gt; such as &lt;code &gt;worker&lt;/code&gt; or &lt;code &gt;io_uring&lt;/code&gt;. Previously, this setting merely advised the OS to prefetch data using &lt;code &gt;posix_fadvise&lt;/code&gt;. Now, it directly controls how many asynchronous read-ahead requests Postgres issues internally.&lt;/p&gt;
&lt;p&gt;The number of blocks read ahead is influenced by both &lt;code &gt;effective_io_concurrency&lt;/code&gt; and &lt;code &gt;io_combine_limit&lt;/code&gt;, following the general formula:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;maximum read-ahead = effective_io_concurrency × io_combine_limit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This gives DBAs and engineers greater control over I/O behavior. The optimal value requires benchmarking, as it depends on your I/O subsystem. For example, higher values may benefit cloud environments with high latency that also support high concurrency, like AWS EBS with high provisioned IOPS.&lt;/p&gt;
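&lt;p&gt;To see what the formula works out to on a given server, the two settings can be multiplied directly from &lt;code &gt;pg_settings&lt;/code&gt;. This is a sketch that assumes &lt;code &gt;pg_settings&lt;/code&gt; reports &lt;code &gt;io_combine_limit&lt;/code&gt; in units of blocks. With the Postgres 18 defaults (16 and 128kB, which is 16 blocks at the standard 8kB block size) the result should be 256 blocks, or 2MB of read-ahead:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Multiply the two settings to get the maximum read-ahead in blocks
SELECT c.setting::int                  AS effective_io_concurrency,
       l.setting::int                  AS io_combine_limit_blocks,
       c.setting::int * l.setting::int AS max_read_ahead_blocks
  FROM pg_settings c
 CROSS JOIN pg_settings l
 WHERE c.name = &apos;effective_io_concurrency&apos;
   AND l.name = &apos;io_combine_limit&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;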
&lt;p&gt;When doing our benchmarks, we also tested higher &lt;code &gt;effective_io_concurrency&lt;/code&gt; (between 16 and 128) but did not see a meaningful difference. However, that is likely due to the simple test query used.&lt;/p&gt;
&lt;p&gt;It’s worth noting that the default for &lt;code &gt;effective_io_concurrency&lt;/code&gt; was 1 in Postgres 17 and has been raised to 16 in Postgres 18, &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff79b5b2ab&quot;&gt;based on benchmarks done by the Postgres community&lt;/a&gt;.&lt;/p&gt;
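&lt;p&gt;Since &lt;code &gt;effective_io_concurrency&lt;/code&gt; can be changed per session, different values can be compared without touching the server configuration. A minimal sketch, where the value 64 is only an example from the range we tested:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Try a higher value for the current session only
SET effective_io_concurrency = 64;

-- Re-run the query under test and compare runtime and buffer counts
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM test;

-- Return to the server-wide setting
RESET effective_io_concurrency;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;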
&lt;h3 id=&quot;monitoring-ios-in-flight-with-pg_aios&quot; &gt;&lt;a href=&quot;#monitoring-ios-in-flight-with-pg_aios&quot; aria-label=&quot;monitoring ios in flight with pg_aios permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Monitoring I/Os in flight with pg_aios&lt;/h3&gt;
&lt;p&gt;As mentioned, previous versions of Postgres with synchronous I/O made it easy to spot read delays: the backend process would block while waiting for disk access, and monitoring tools like pganalyze can reliably surface &lt;code &gt;IO / DataFileRead&lt;/code&gt; as a wait event during these stalls.&lt;/p&gt;
&lt;p&gt;For example, here we can clearly see these wait events with synchronous I/O in Postgres 17:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Screenshot of pganalyze showing wait events in Postgres 17&quot;
        title=&quot;pganalyze interface showing clear IO / DataFileRead wait events in Postgres 17&quot;
        src=&quot;https://pganalyze.com/static/67303bca18e1ab006c16c26979172b33/1d69c/wait_events_io_read.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;With asynchronous I/O in Postgres 18, backend wait behavior changes. When using &lt;code &gt;io_method = worker&lt;/code&gt;, the backend process delegates reads to a separate I/O worker. As a result, the backend may appear idle or show the new &lt;code &gt;IO / AioIoCompletion&lt;/code&gt; wait event, while the I/O worker shows the actual I/O wait events:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; backend_type&lt;span &gt;,&lt;/span&gt; state&lt;span &gt;,&lt;/span&gt; wait_event_type&lt;span &gt;,&lt;/span&gt; wait_event
  &lt;span &gt;FROM&lt;/span&gt; pg_stat_activity
 &lt;span &gt;WHERE&lt;/span&gt; backend_type &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;client backend&apos;&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; backend_type &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;io worker&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;  backend_type  | state  | wait_event_type |   wait_event    
----------------+--------+-----------------+-----------------
 client backend | active | IO              | AioIoCompletion
 io worker      |        | IO              | DataFileRead
 io worker      |        | IO              | DataFileRead
 io worker      |        | IO              | DataFileRead
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With &lt;code &gt;io_method = io_uring&lt;/code&gt;, read operations are submitted directly to the kernel and completed asynchronously. The backend does not block on a traditional I/O syscall, so this activity is not visible from the Postgres side, even though I/O is in progress.&lt;/p&gt;
&lt;p&gt;To help with debugging of I/O requests in flight, the new &lt;code &gt;pg_aios&lt;/code&gt; view can show Postgres internal state, even when using &lt;code &gt;io_uring&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_aios&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;  pid  | io_id | io_generation |    state     | operation |    off    | length | target | handle_data_len | raw_result | result  |                   target_desc                    | f_sync | f_localmem | f_buffered 
-------+-------+---------------+--------------+-----------+-----------+--------+--------+-----------------+------------+---------+--------------------------------------------------+--------+------------+------------
 91452 |     1 |          4781 | SUBMITTED    | read      | 996278272 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383760..383775 in file &quot;base/16384/16389&quot; | f      | f          | t
 91452 |     2 |          4785 | SUBMITTED    | read      | 996147200 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383744..383759 in file &quot;base/16384/16389&quot; | f      | f          | t
 91452 |     3 |          4796 | SUBMITTED    | read      | 996409344 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383776..383791 in file &quot;base/16384/16389&quot; | f      | f          | t
 91452 |     4 |          4802 | SUBMITTED    | read      | 996016128 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383728..383743 in file &quot;base/16384/16389&quot; | f      | f          | t
 91452 |     5 |          3175 | COMPLETED_IO | read      | 995885056 | 131072 | smgr   |              16 |     131072 | UNKNOWN | blocks 383712..383727 in file &quot;base/16384/16389&quot; | f      | f          | t
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
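&lt;p&gt;When many I/Os are in flight, a grouped view can be easier to read than the raw rows. The following sketch only uses columns that appear in the output above:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Summarize in-flight I/Os per backend, state, and operation
SELECT pid,
       state,
       operation,
       count(*)    AS ios,
       sum(length) AS total_bytes
  FROM pg_aios
 GROUP BY pid, state, operation
 ORDER BY pid, state;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;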
&lt;p&gt;Understanding these behavior changes and the impact of asynchronous execution is essential when optimizing I/O performance in Postgres 18.&lt;/p&gt;
&lt;h2 id=&quot;heads-up-async-io-makes-io-timing-information-hard-to-interpret&quot; &gt;&lt;a href=&quot;#heads-up-async-io-makes-io-timing-information-hard-to-interpret&quot; aria-label=&quot;heads up async io makes io timing information hard to interpret permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Heads Up: Async I/O makes I/O timing information hard to interpret&lt;/h2&gt;
&lt;p&gt;Asynchronous I/O introduces a shift in how execution timing is reported. When the backend no longer blocks directly on disk reads (as is the case with &lt;code &gt;worker&lt;/code&gt; or &lt;code &gt;io_uring&lt;/code&gt;) the complete time spent doing I/O may not be reflected in &lt;code &gt;EXPLAIN ANALYZE&lt;/code&gt; output. This can make I/O-bound queries seem to require less I/O effort than previously.&lt;/p&gt;
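&lt;p&gt;Note that &lt;code &gt;I/O Timings&lt;/code&gt; lines only appear in &lt;code &gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt; output when &lt;code &gt;track_io_timing&lt;/code&gt; is enabled. A minimal session setup, assuming a role that is allowed to change this parameter:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Requires superuser, or a role that has been granted SET on this parameter
SET track_io_timing = on;

-- Confirm the setting for the current session
SHOW track_io_timing;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;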
&lt;p&gt;First, let&apos;s run the earlier query in &lt;code &gt;EXPLAIN ANALYZE&lt;/code&gt; on a cold cache in Postgres 17:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT COUNT(*) FROM test;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared read=442478
   I/O Timings: shared read=14779.316
   -&gt;  Seq Scan on test  (cost=0.00..1442478.32 rows=100000032 width=0) (actual rows=100000001 loops=1)
         Buffers: shared read=442478
         I/O Timings: shared read=14779.316
 Planning:
   Buffers: shared hit=13 read=6
   I/O Timings: shared read=3.182
 Planning Time: 8.136 ms
 Execution Time: 18006.405 ms
(11 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We&apos;ve read 442,478 buffers in 14.8 seconds.&lt;/p&gt;
&lt;p&gt;And now, we repeat the test on Postgres 18 with the default settings (&lt;code &gt;io_method = worker&lt;/code&gt;):&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;test=# EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT COUNT(*) FROM test;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual rows=1.00 loops=1)
   Buffers: shared read=442478
   I/O Timings: shared read=7218.835
   -&gt;  Seq Scan on test  (cost=0.00..1442478.32 rows=100000032 width=0) (actual rows=100000001.00 loops=1)
         Buffers: shared read=442478
         I/O Timings: shared read=7218.835
 Planning:
   Buffers: shared hit=13 read=6
   I/O Timings: shared read=2.709
 Planning Time: 2.925 ms
 Execution Time: 10480.827 ms
(11 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We&apos;ve read 442,478 buffers in 7.2 seconds.&lt;/p&gt;
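&lt;p&gt;Converting both measurements into an apparent read rate makes the difference easier to see. This sketch assumes the default 8kB block size and takes the buffer count and I/O timings from the two plans above. It should come out at roughly 234 MB/s for Postgres 17 and roughly 479 MB/s for Postgres 18, even though the same amount of data was read in both cases:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- 442,478 buffers of 8 kB each, divided by the reported I/O time in seconds
SELECT round(442478 * 8 / 1024.0, 1)             AS mb_read,
       round(442478 * 8 / 1024.0 / 14.779316, 1) AS pg17_mb_per_sec,
       round(442478 * 8 / 1024.0 / 7.218835, 1)  AS pg18_worker_mb_per_sec;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;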
&lt;p&gt;Whilst with parallel query we get a summary of all the I/O time across all parallel workers, no such summarization occurs with I/O workers. What we are seeing is the wait time for the I/O to be completed, ignoring any parallelism that may happen behind the scenes.&lt;/p&gt;
&lt;p&gt;This is technically not a behavior change: even in Postgres 17, the time reported was the time spent waiting on I/Os, not the time spent performing the I/O. For example, kernel I/O time for readahead was never accounted for.&lt;/p&gt;
&lt;p&gt;Historically, I/O timing was often treated as a measure of I/O effort, in preference to shared buffer read counts alone, because timing helps distinguish an actual disk read from an OS page cache hit. Now, in Postgres 18, interpreting I/O timing requires more caution: asynchronous I/O can hide I/O overhead in query plans.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;To summarize, the upcoming release of Postgres 18 marks the beginning of a major evolution in how I/O is handled. While currently limited to reads, asynchronous I/O already opens the door to significant performance improvements in high-latency cloud environments.&lt;/p&gt;
&lt;p&gt;But some of these gains come with tradeoffs. Engineering teams will need to adjust their observability practices, learn new semantics for timing and wait events, and perhaps revisit tuning parameters with previously limited impact, like &lt;code &gt;effective_io_concurrency&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;in-summary&quot; &gt;&lt;a href=&quot;#in-summary&quot; aria-label=&quot;in summary permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;In summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Asynchronous I/O support in Postgres 18 introduces &lt;code &gt;worker&lt;/code&gt; (as the default) and &lt;code &gt;io_uring&lt;/code&gt; options under the new &lt;code &gt;io_method&lt;/code&gt; setting.&lt;/li&gt;
&lt;li&gt;Benchmarks show up to a 2-3x throughput improvement for read-heavy workloads in cloud environments.&lt;/li&gt;
&lt;li&gt;Observability practices need to evolve: &lt;code &gt;EXPLAIN ANALYZE&lt;/code&gt; may underreport I/O effort, and new views like &lt;code &gt;pg_aios&lt;/code&gt; will help provide insights.&lt;/li&gt;
&lt;li&gt;Tools like pganalyze will be adapting to these changes to continue surfacing relevant performance insights.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As Postgres development continues, future versions (19 and beyond) may bring asynchronous write support, further reducing I/O bottlenecks in modern workloads, and enabling production use of Direct I/O.&lt;/p&gt;
&lt;h3 id=&quot;references&quot; &gt;&lt;a href=&quot;#references&quot; aria-label=&quot;references permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-IO-METHOD&quot;&gt;PostgreSQL &lt;code &gt;io_method&lt;/code&gt; GUC (Postgres 18)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-EFFECTIVE-IO-CONCURRENCY&quot;&gt;PostgreSQL &lt;code &gt;effective_io_concurrency&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/storage-buffer.html&quot;&gt;PostgreSQL Shared Buffers and Buffer Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW&quot;&gt;&lt;code &gt;pg_stat_activity&lt;/code&gt; View&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/devel/monitoring-stats.html#PG-STAT-IO-VIEW&quot;&gt;&lt;code &gt;pg_stat_io&lt;/code&gt; View&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/devel/monitoring-stats.html#PG-AIOS-VIEW&quot;&gt;&lt;code &gt;pg_aios&lt;/code&gt; View (New in Postgres 18)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://man7.org/linux/man-pages/man2/posix_fadvise.2.html&quot;&gt;&lt;code &gt;posix_fadvise()&lt;/code&gt; System Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.man7.org/linux/man-pages/man7/io_uring.7.html&quot;&gt;Linux io_uring Man Page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-17-streaming-io&quot;&gt;5mins of Postgres: Waiting for Postgres 17: Streaming I/O for sequential scans &amp;#x26; ANALYZE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres vs. SQL Server: B-Tree Index Differences & the Benefit of Deduplication]]></title><description><![CDATA[When it comes to optimizing query performance, indexing is one of the most powerful tools available to database engineers. Both PostgreSQL and Microsoft SQL Server (or Azure SQL) use B-Tree indexes as their default indexing structure, but the way each system implements, maintains, and uses those indexes varies in subtle but important ways. In this blog post, we explore key areas where PostgreSQL and SQL Server diverge: how their B-Tree indexes implementations behave under the hood and how they…]]></description><link>https://pganalyze.com/blog/postgresql-vs-sql-server-btree-index-deduplication</link><guid isPermaLink="false">https://pganalyze.com/blog/postgresql-vs-sql-server-btree-index-deduplication</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 03 Apr 2025 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;When it comes to optimizing query performance, indexing is one of the most powerful tools available to database engineers. Both PostgreSQL and Microsoft SQL Server (or Azure SQL) use B-Tree indexes as their default indexing structure, but the way each system implements, maintains, and uses those indexes varies in subtle but important ways.&lt;/p&gt;
&lt;p&gt;In this blog post, we explore key areas where PostgreSQL and SQL Server diverge: how their B-Tree index implementations behave under the hood and how they store and access data on disk. We&apos;ll also benchmark the impact of deduplication of values on index size in each database system.&lt;/p&gt;
&lt;p&gt;We&apos;ve also included a comprehensive reference guide at the end (see &lt;a href=&quot;#comparison-table-postgresql-vs-sql-server-indexing&quot;&gt;Postgres vs. SQL Server Index Comparison Table&lt;/a&gt;). Whether you&apos;re optimizing queries or planning a migration, these differences can have a meaningful impact on both performance and indexing strategy.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-b-tree-indexing-works-in-postgresql-vs-sql-server&quot;&gt;How B-Tree indexing works in PostgreSQL vs. SQL Server&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgresqls-b-tree-deduplication&quot;&gt;PostgreSQL&apos;s B-Tree deduplication&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#benchmarking-b-tree-indexes-on-postgresql-vs-sql-server&quot;&gt;Benchmarking B-Tree indexes on PostgreSQL vs. SQL Server&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#postgresql-test-setup&quot;&gt;PostgreSQL Test Setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#sql-server-test-setup&quot;&gt;SQL Server Test Setup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#benchmark-results-postgresqls-deduplication-reduces-index-size&quot;&gt;Benchmark results: PostgreSQL&apos;s deduplication reduces index size&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#comparison-table-postgresql-vs-sql-server-indexing&quot;&gt;Comparison Table: PostgreSQL vs. SQL Server Indexing&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#choosing-the-right-index-for-your-workload&quot;&gt;Choosing the right index for your workload&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#references&quot;&gt;References:&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;how-b-tree-indexing-works-in-postgresql-vs-sql-server&quot; &gt;&lt;a href=&quot;#how-b-tree-indexing-works-in-postgresql-vs-sql-server&quot; aria-label=&quot;how b tree indexing works in postgresql vs sql server permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How B-Tree indexing works in PostgreSQL vs. SQL Server&lt;/h2&gt;
&lt;p&gt;At a high level, both databases use B-Tree indexes to speed up equality and range queries. B-Trees maintain sorted order and are balanced for consistent read performance. But while the concept is similar in both databases, the way it&apos;s implemented has important performance consequences.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;SQL Server: Clustered vs Nonclustered Index&quot;
        title=&quot;SQL Server: Clustered vs Nonclustered Index&quot;
        src=&quot;https://pganalyze.com/static/4ee84e2e9209e2207b8ca662ad406f03/1d69c/sql_server_index_types.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;SQL Server uses clustered indexes to physically order the table&apos;s data by the indexed column. When a clustered index is defined, the rows in the table are stored in the same order as the index itself. Nonclustered indexes are stored separately and point to rows using a row locator, either a RID or the clustered key. This physical ordering can be beneficial for range scans or pagination queries, but it also means you&apos;re limited to one clustered index per table. More importantly, SQL Server stores each index entry in full, even if multiple entries have identical values on the same page. There&apos;s no deduplication, so indexes with many repeated values can grow large and consume excessive I/O.&lt;/p&gt;
&lt;div &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Postgres B-Tree Index&quot; title=&quot;Postgres B-Tree Index&quot; src=&quot;https://pganalyze.com/static/cea9823e4460df530b3a23e0787ba953/e9beb/btree_index_postgres.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/div&gt;
&lt;p&gt;PostgreSQL does not have clustered indexes in the SQL Server sense. All PostgreSQL tables are stored as unordered heaps, and indexes are purely logical structures that point to tuples in the heap. This design gives PostgreSQL some flexibility: it allows for easier index maintenance and avoids the complications of physical reordering.&lt;/p&gt;
&lt;p&gt;However, it also means that you can&apos;t rely on an index to define how the table is physically laid out. If query performance depends on reading data in a particular order, Postgres does allow you to run the &lt;code &gt;CLUSTER&lt;/code&gt; command, but it requires a full table lock. In production environments, you can use tools like &lt;code &gt;pg_repack&lt;/code&gt; to achieve a similar result.&lt;/p&gt;
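&lt;p&gt;As a brief sketch of what this looks like in practice, using a hypothetical &lt;code &gt;orders&lt;/code&gt; table and index name: keep in mind that &lt;code &gt;CLUSTER&lt;/code&gt; is a one-time rewrite, and rows inserted or updated afterwards are not kept in index order.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table and index, for illustration only
CREATE TABLE orders (id bigint, created_at timestamptz);
CREATE INDEX orders_created_at_idx ON orders (created_at);

-- Rewrite the table in index order (takes an ACCESS EXCLUSIVE lock while running)
CLUSTER orders USING orders_created_at_idx;

-- Refresh planner statistics after the rewrite
ANALYZE orders;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;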
&lt;p&gt;So while both databases use B-Tree indexes as their default, SQL Server&apos;s tight coupling between index and physical storage creates a different set of expectations and limitations. PostgreSQL&apos;s index model has some performance downsides (since there is no clustered index implementation), but distinct features like deduplication make it perform better in other situations.&lt;/p&gt;
&lt;h3 id=&quot;postgresqls-b-tree-deduplication&quot; &gt;&lt;a href=&quot;#postgresqls-b-tree-deduplication&quot; aria-label=&quot;postgresqls b tree deduplication permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL&apos;s B-Tree deduplication&lt;/h3&gt;
&lt;p&gt;Deduplication was introduced in PostgreSQL version 13 and addresses a common inefficiency in traditional B-Tree indexes. When many rows share the same indexed value—think status codes, boolean flags, or timestamps—standard B-Trees store each value and its corresponding tuple pointer individually. This results in bloated index pages and increased maintenance cost, especially for write-heavy workloads.&lt;/p&gt;
&lt;p&gt;PostgreSQL deduplicates repeated values within a single index page by default. Instead of storing the same key value multiple times, it stores it once and maintains a compact structure that tracks all matching heap pointers. This reduces index size significantly and improves cache performance, since more index entries fit in memory.&lt;/p&gt;
&lt;p&gt;SQL Server does not support deduplication. Each index entry is stored independently, even if the values are identical. In datasets with skewed distributions, PostgreSQL&apos;s approach leads to more compact, more efficient indexes, with fewer pages and less disk I/O.&lt;/p&gt;
&lt;h3 id=&quot;benchmarking-b-tree-indexes-on-postgresql-vs-sql-server&quot; &gt;&lt;a href=&quot;#benchmarking-b-tree-indexes-on-postgresql-vs-sql-server&quot; aria-label=&quot;benchmarking b tree indexes on postgresql vs sql server permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Benchmarking B-Tree indexes on PostgreSQL vs. SQL Server&lt;/h3&gt;
&lt;p&gt;To understand how PostgreSQL&apos;s index deduplication affects real-world performance and storage, we ran a benchmark comparing B-Tree index sizes across PostgreSQL and SQL Server under varying levels of data duplication. Each test created a table of 10 million rows with differing levels of value repetition, ranging from entirely unique values to repeated values at a 1000x factor.&lt;/p&gt;
&lt;p&gt;Here&apos;s how we structured the test in both databases, so you can reproduce it yourself.&lt;/p&gt;
&lt;h4 id=&quot;postgresql-test-setup&quot; &gt;&lt;a href=&quot;#postgresql-test-setup&quot; aria-label=&quot;postgresql test setup permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL Test Setup&lt;/h4&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_1&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_10&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_100&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_1000&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_1 &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_10 &lt;span &gt;SELECT&lt;/span&gt; val &lt;span &gt;/&lt;/span&gt; &lt;span &gt;10&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; x&lt;span &gt;(&lt;/span&gt;val&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_100 &lt;span &gt;SELECT&lt;/span&gt; val &lt;span &gt;/&lt;/span&gt; &lt;span &gt;100&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; x&lt;span &gt;(&lt;/span&gt;val&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_1000 &lt;span &gt;SELECT&lt;/span&gt; val &lt;span &gt;/&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; x&lt;span &gt;(&lt;/span&gt;val&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1_idx &lt;span &gt;ON&lt;/span&gt; factor_1&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_10_idx &lt;span &gt;ON&lt;/span&gt; factor_10&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_100_idx &lt;span &gt;ON&lt;/span&gt; factor_100&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1000_idx &lt;span &gt;ON&lt;/span&gt; factor_1000&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1_idx_no_dup_fill100 &lt;span &gt;ON&lt;/span&gt; factor_1&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items &lt;span &gt;=&lt;/span&gt; &lt;span &gt;off&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;fillfactor&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_10_idx_no_dup_fill100 &lt;span &gt;ON&lt;/span&gt; factor_10&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items &lt;span &gt;=&lt;/span&gt; &lt;span &gt;off&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;fillfactor&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_100_idx_no_dup_fill100 &lt;span &gt;ON&lt;/span&gt; factor_100&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items &lt;span &gt;=&lt;/span&gt; &lt;span &gt;off&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;fillfactor&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1000_idx_no_dup_fill100 &lt;span &gt;ON&lt;/span&gt; factor_1000&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items &lt;span &gt;=&lt;/span&gt; &lt;span &gt;off&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;fillfactor&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
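&lt;p&gt;One way to read back the resulting index sizes is a catalog query like the sketch below. It relies only on the index names created above and on built-in size functions; it is not necessarily how the numbers in this post were collected.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- List every benchmark index with its on-disk size
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
WHERE c.relkind = &apos;i&apos;              -- indexes only
  AND c.relname LIKE &apos;factor%&apos;     -- indexes from the setup above
ORDER BY c.relname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;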
&lt;h4 id=&quot;sql-server-test-setup&quot; &gt;&lt;a href=&quot;#sql-server-test-setup&quot; aria-label=&quot;sql server test setup permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SQL Server Test Setup&lt;/h4&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_1&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_10&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_100&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; factor_1000&lt;span &gt;(&lt;/span&gt;col &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_1 &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_10 &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;value&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;10&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_100 &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;value&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;100&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; factor_1000 &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;value&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; GENERATE_SERIES&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1_idx &lt;span &gt;ON&lt;/span&gt; factor_1&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_10_idx &lt;span &gt;ON&lt;/span&gt; factor_10&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_100_idx &lt;span &gt;ON&lt;/span&gt; factor_100&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; factor_1000_idx &lt;span &gt;ON&lt;/span&gt; factor_1000&lt;span &gt;(&lt;/span&gt;col&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
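&lt;p&gt;For the SQL Server side, a comparable size check can be sketched with the system views &lt;code &gt;sys.indexes&lt;/code&gt; and &lt;code &gt;sys.dm_db_partition_stats&lt;/code&gt;. This is an assumption about how one might measure the sizes, not the method confirmed by the benchmark; it counts used 8 KB pages per index.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Sum the used pages per benchmark index and convert to kilobytes
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS size_kb   -- SQL Server pages are 8 KB
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.name LIKE &apos;factor%&apos;                     -- indexes from the setup above
GROUP BY i.object_id, i.name
ORDER BY i.name;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;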
&lt;h3 id=&quot;benchmark-results-postgresqls-deduplication-reduces-index-size&quot; &gt;&lt;a href=&quot;#benchmark-results-postgresqls-deduplication-reduces-index-size&quot; aria-label=&quot;benchmark results postgresqls deduplication reduces index size permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Benchmark results: PostgreSQL&apos;s deduplication reduces index size&lt;/h3&gt;
&lt;p&gt;When we benchmarked index sizes across PostgreSQL and SQL Server, we saw a sharp divergence as data duplication increased. With values repeated 1,000 times, a PostgreSQL index using deduplication was &lt;strong&gt;3x smaller&lt;/strong&gt; than the same index created with deduplication turned off. Compared to SQL Server, which does not support deduplication and stores each repeated value in full, PostgreSQL consistently produced smaller, more efficient indexes.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;pganalyze-sql-server-postgresql-btree-index-size-benchmark.png&quot;
        title=&quot;pganalyze-sql-server-postgresql-btree-index-size-benchmark.png&quot;
        src=&quot;https://pganalyze.com/static/023f81d98fc55b7c3a409be0f9ca868e/1d69c/pganalyze-sql-server-postgresql-btree-index-size-benchmark.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This difference matters. Low-cardinality columns such as status flags and categorical fields, as well as frequently repeating timestamps, are common in production systems. When these values repeat across millions of rows, large indexes can quickly become a performance bottleneck, slowing scans, increasing I/O, and inflating memory usage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL&apos;s deduplication reduces index size significantly&lt;/strong&gt;, making it easier to keep indexes in memory and reduce disk pressure. For teams moving from SQL Server to PostgreSQL, or simply scaling out workloads with heavily used indexes, this optimization isn&apos;t just theoretical. It has a direct impact on resource usage, query performance, and overall operational efficiency.&lt;/p&gt;
&lt;h2 id=&quot;comparison-table-postgresql-vs-sql-server-indexing&quot; &gt;&lt;a href=&quot;#comparison-table-postgresql-vs-sql-server-indexing&quot; aria-label=&quot;comparison table postgresql vs sql server indexing permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Comparison Table: PostgreSQL vs. SQL Server Indexing&lt;/h2&gt;
&lt;p&gt;Index implementations for both B-Tree and other index types vary significantly between PostgreSQL and SQL Server. We&apos;ve put together a comprehensive index comparison table to serve as a reference for your SQL Server to PostgreSQL migrations.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Certain index types exist in SQL Server but not in PostgreSQL or vice versa. We&apos;ve noted supportability as follows: 🟢 Supported index type  🔴 Not supported index type.)&lt;/em&gt;&lt;/p&gt;
&lt;table &gt;
  &lt;thead&gt;
    &lt;tr &gt;
      &lt;th &gt;Index Type&lt;/th&gt;
      &lt;th &gt;Use Case Example&lt;/th&gt;
      &lt;th &gt;PostgreSQL&lt;/th&gt;
      &lt;th &gt;SQL Server&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;B-Tree&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Best for general-purpose indexing, equality and range queries (e.g., filtering users by age or date).&lt;/td&gt;
      &lt;td &gt;🟢 Default index type, supports equality &amp; range queries, sorting, and pattern matching with prefixes.&lt;/td&gt;
      &lt;td &gt;🟢 On SQL Server the default structure for clustered and nonclustered indexes is a B-Tree.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;Clustered&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Automatically orders table rows by the index key; best for frequently sorted queries.&lt;/td&gt;
      &lt;td &gt;🔴 PostgreSQL does not have clustered indexes; instead, you can use the &lt;code&gt;CLUSTER&lt;/code&gt; command to order the table based on a nonclustered index; however, this order will not be preserved as new data gets inserted.&lt;/td&gt;
      &lt;td &gt;🟢 Equivalent to PostgreSQL B-Tree; sorts &amp; stores data in order based on key.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;Nonclustered&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Useful for indexes that speed up searches without affecting physical storage order.&lt;/td&gt;
      &lt;td &gt;🟢 In PostgreSQL all indexes are nonclustered.&lt;/td&gt;
      &lt;td &gt;🟢 Can be created on heap or a clustered index; stores data separately from the table.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;Hash&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Optimized for exact match lookups, like searching by user ID or email address.&lt;/td&gt;
      &lt;td &gt;🟢 In PostgreSQL, hash indexes can only index a single column. While you can create multiple indexes to support a query, typically a multi-column B-Tree index is more effective.&lt;/td&gt;
      &lt;td &gt;🟢 Used for memory-optimized tables; requires a fixed bucket count.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;Filtered / Partial&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Efficient for indexing a subset of data, such as active users only.&lt;/td&gt;
      &lt;td &gt;🟢 PostgreSQL can use Partial Indexes to index only a subset of rows.&lt;/td&gt;
      &lt;td &gt;🟢 A Filtered Index is a nonclustered index that indexes only a subset of table rows.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;BRIN&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Best for very large tables where data is naturally ordered, such as time-series data.&lt;/td&gt;
      &lt;td &gt;🟢 Stores summaries of block ranges; best for large, sequentially stored data.&lt;/td&gt;
      &lt;td &gt;🔴 N/A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;Full-text&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Used for natural language searches, such as searching text in articles or product reviews.&lt;/td&gt;
      &lt;td &gt;🟢 PostgreSQL supports Full-Text Search using GIN indexes on &lt;code&gt;tsvector&lt;/code&gt; columns.&lt;/td&gt;
      &lt;td &gt;🟢 SQL Server uses an inverted index for text-based queries, similar to PostgreSQL GIN.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;GIN&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Great for indexing JSONB, arrays, and full-text search (e.g., searching product descriptions).&lt;/td&gt;
      &lt;td &gt;🟢 Inverted index; best for JSON, full-text search, and arrays.&lt;/td&gt;
      &lt;td &gt;🔴 Partial capability via Full-text index.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;Vector&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Efficiently perform similarity search or nearest neighbor search across high-dimensional data, most commonly in AI and machine learning applications.&lt;/td&gt;
      &lt;td &gt;🟢 PostgreSQL doesn&apos;t include vector support natively, but the open-source extension &lt;a href=&quot;https://github.com/pgvector/pgvector&quot;&gt;pgvector&lt;/a&gt; enables vector storage and indexing.&lt;/td&gt;
      &lt;td &gt;🔴 SQL Server does not natively support vector indexing or search. Microsoft recommends using its Azure AI Search instead.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;XML&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Optimized for querying and storing XML documents.&lt;/td&gt;
      &lt;td &gt;🔴 PostgreSQL does not support indexes directly on XML types; however, expression indexes can be used on subsets of the XML data. For unstructured documents, JSONB is the recommended data type.&lt;/td&gt;
      &lt;td &gt;🟢 SQL Server has dedicated indexes on XML data types.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;Spatial&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Used for geographic queries, e.g., finding locations within a radius.&lt;/td&gt;
      &lt;td &gt;🟢 In PostgreSQL, spatial indexing and queries are provided by the open source &lt;a href=&quot;https://postgis.net/&quot;&gt;PostGIS&lt;/a&gt; extension.&lt;/td&gt;
      &lt;td &gt;🟢 SQL Server has built in spatial data types.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;SP-GiST&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Used for hierarchical data structures like tree-based searches (e.g., routing networks).&lt;/td&gt;
      &lt;td &gt;🟢 Supports non-balanced tree structures like quadtrees &amp; k-d trees, good for hierarchical data.&lt;/td&gt;
      &lt;td &gt;🔴 N/A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td &gt;&lt;strong&gt;GiST&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Ideal for geometric and full-text search queries, e.g., finding nearby locations.&lt;/td&gt;
      &lt;td &gt;🟢 Infrastructure for specialized indexes; used for geometric &amp; full-text search.&lt;/td&gt;
      &lt;td &gt;🔴 N/A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr &gt;
      &lt;td &gt;&lt;strong&gt;Columnstore&lt;/strong&gt;&lt;/td&gt;
      &lt;td &gt;Best for OLAP workloads and analytical queries (e.g., data warehousing).&lt;/td&gt;
      &lt;td &gt;🔴 While PostgreSQL has different extensions that offer columnar storage, like Citus and Timescale, these are relatively recent implementations and may be limited by use case.&lt;/td&gt;
      &lt;td &gt;🟢 SQL Server has built-in columnar storage implemented as an index type since SQL Server 2012.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
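&lt;p&gt;To make a few of the PostgreSQL rows above concrete, here is a short sketch of the corresponding &lt;code &gt;CREATE INDEX&lt;/code&gt; statements. The &lt;code &gt;events&lt;/code&gt; table and all index names are hypothetical and chosen only for illustration.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table, for illustration only
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status text NOT NULL,
    created_at timestamptz NOT NULL,
    payload jsonb
);

-- Partial index: only rows matching the predicate are indexed
CREATE INDEX events_active_idx ON events (created_at) WHERE status = &apos;active&apos;;

-- BRIN index: block-range summaries, suited to naturally ordered data
CREATE INDEX events_created_brin_idx ON events USING brin (created_at);

-- GIN index: inverted index over the keys and values of a JSONB column
CREATE INDEX events_payload_gin_idx ON events USING gin (payload);

-- Hash index: single column, equality lookups only
CREATE INDEX events_status_hash_idx ON events USING hash (status);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;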
&lt;h2 id=&quot;choosing-the-right-index-for-your-workload&quot; &gt;&lt;a href=&quot;#choosing-the-right-index-for-your-workload&quot; aria-label=&quot;choosing the right index for your workload permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Choosing the right index for your workload&lt;/h2&gt;
&lt;p&gt;Understanding the differences between PostgreSQL and SQL Server indexing is crucial when optimizing query performance, planning a migration, or designing a high-performance database. Choosing the right indexing strategy requires deep knowledge of query execution patterns and performance trade-offs. Many teams manually experiment with different indexing strategies, which can lead to over-indexing, redundant indexes, or missed optimization opportunities.&lt;/p&gt;
&lt;p&gt;Instead of trial and error, &lt;a href=&quot;https://pganalyze.com/blog/index-advisor-v3&quot;&gt;&lt;strong&gt;pganalyze Index Advisor&lt;/strong&gt;&lt;/a&gt; automatically detects missing indexes, redundant indexes, and optimal column order for multicolumn indexes by applying a constraint programming model against real query execution data. This removes the guesswork and ensures that PostgreSQL databases are indexed for maximum performance.&lt;/p&gt;
&lt;h2 id=&quot;references&quot; &gt;&lt;a href=&quot;#references&quot; aria-label=&quot;references permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;References:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/indexes-types.html#INDEXES-TYPES-BTREE&quot;&gt;PostgreSQL Documentation: 17: 11.2. Index Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/16/btree-implementation.html#BTREE-DEDUPLICATION&quot;&gt;PostgreSQL Documentation: 16: 67.4. Deduplication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://learn.microsoft.com/en-us/sql/relational-databases/indexes/indexes?view=sql-server-ver16&quot;&gt;Microsoft SQL Server Documentation: Indexes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide?view=sql-server-ver16&quot;&gt;Microsoft Blog: SQL Server and Azure SQL index architecture and design guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/index-advisor-v3&quot;&gt;pganalyze Blog: Introducing pganalyze Index Advisor 3.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Comparing EXPLAIN Plans is hard (and how pganalyze does it)]]></title><description><![CDATA[The Postgres EXPLAIN
command
is invaluable when trying to understand query performance. SQL is a declarative
language, and the Postgres query planner will decide the most efficient way to
execute a query. However, plan selection is based on statistics, configuration
settings, and heuristics—not a crystal ball. Sometimes there's a substantial gap
between what the planner thinks is most efficient and reality. In those
situations, EXPLAIN can help Postgres users understand the planner's "reasoning…]]></description><link>https://pganalyze.com/blog/understanding-how-to-compare-postgres-explain-plans</link><guid isPermaLink="false">https://pganalyze.com/blog/understanding-how-to-compare-postgres-explain-plans</guid><dc:creator><![CDATA[Maciek Sakrejda]]></dc:creator><pubDate>Thu, 06 Feb 2025 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;The Postgres &lt;a href=&quot;https://pganalyze.com/docs/explain/basics-of-postgres-query-planning&quot;&gt;EXPLAIN
command&lt;/a&gt;
is invaluable when trying to understand query performance. SQL is a declarative
language, and the Postgres query planner will decide the most efficient way to
execute a query. However, plan selection is based on statistics, configuration
settings, and heuristics—not a crystal ball. Sometimes there&apos;s a substantial gap
between what the planner thinks is most efficient and reality. In those
situations, EXPLAIN can help Postgres users understand the planner&apos;s &quot;reasoning&quot;
in selecting a particular plan.&lt;/p&gt;
&lt;p&gt;In this post, we&apos;ll walk through EXPLAIN plan fundamentals, why it&apos;s helpful to
compare EXPLAIN plans and the challenges presented by existing tools. We&apos;ll also
discuss how that influenced our product roadmap at pganalyze to create a
text-based diff interface, which we first rolled out as part of the &lt;a href=&quot;https://pganalyze.com/blog/introducing-postgres-query-tuning-workbooks#viewing-plans-explain-insights-and-comparing-plans&quot;&gt;beta
release of Query Tuning
Workbooks&lt;/a&gt;
earlier this year. Now, we&apos;re expanding that same functionality to the EXPLAIN
plan list under query details and adding a new comparison metric, buffers.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#existing-plan-comparisons&quot;&gt;Existing plan comparisons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#building-a-bespoke-explain-plan-comparison&quot;&gt;Building a bespoke EXPLAIN plan comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#in-summary&quot;&gt;In Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;existing-plan-comparisons&quot; &gt;&lt;a href=&quot;#existing-plan-comparisons&quot; aria-label=&quot;existing plan comparisons permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Existing plan comparisons&lt;/h2&gt;
&lt;p&gt;Sometimes, a single query can end up being
executed with several different plans (e.g., due to statistics that vary with
query parameters), and understanding a suboptimal plan is often easier when
contrasted with a &quot;good&quot; plan. One can figure out the differences and what&apos;s
causing them, and rewrite the query to pick a more optimal plan.&lt;/p&gt;
&lt;p&gt;Unfortunately, Postgres plans are not easy to understand, let alone to compare.
We wanted to provide an easier way to review the differences between plans. The
EXPLAIN command goes back all the way to
&lt;a href=&quot;https://www.postgresql.org/docs/current/history.html&quot;&gt;Postgres95&lt;/a&gt;, the first
community open-source, SQL-based release. But comparing EXPLAIN output still
seems to be a fairly ad-hoc process now, thirty years later.&lt;/p&gt;
&lt;p&gt;Take a simple query like&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_class &lt;span &gt;WHERE&lt;/span&gt; relname &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;pg_class&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By default, you will likely get a regular index scan:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                                          	QUERY PLAN                                                          	 
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=273) (actual time=0.033..0.035 rows=1 loops=1)
   Index Cond: (relname = &apos;pg_class&apos;::name)
   Buffers: shared hit=3
 Planning Time: 0.127 ms
 Execution Time: 0.060 ms
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If regular index scans are disabled, you&apos;ll get a bitmap index scan followed by
a bitmap heap scan:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                                        	QUERY PLAN                                                        	 
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pg_class  (cost=4.28..8.29 rows=1 width=273) (actual time=0.027..0.029 rows=1 loops=1)
   Recheck Cond: (relname = &apos;pg_class&apos;::name)
   Heap Blocks: exact=1
   Buffers: shared hit=3
   -&gt;  Bitmap Index Scan on pg_class_relname_nsp_index  (cost=0.00..4.28 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1)
     	Index Cond: (relname = &apos;pg_class&apos;::name)
     	Buffers: shared hit=2
 Planning Time: 0.157 ms
 Execution Time: 0.087 ms
(9 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
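&lt;p&gt;To reproduce both plans yourself, a session along the following lines should work. This is a sketch: it assumes the &lt;code &gt;enable_indexscan&lt;/code&gt; planner setting is what was switched off for the second plan, and exact costs, timings, and buffer counts will differ on your system.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Plan A: default planner settings
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pg_class WHERE relname = &apos;pg_class&apos;;

-- Plan B: discourage regular index scans for this session only
-- (assumption: this is the setting behind the second plan above)
SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pg_class WHERE relname = &apos;pg_class&apos;;

-- Restore the default afterwards
RESET enable_indexscan;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;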
&lt;p&gt;Comparing something like this by looking at the two plans side-by-side is pretty
straightforward because the plan is small, but once you need to compare larger
plans, you may want a better mechanism. There are no EXPLAIN-specific comparison
tools, but the diff utility (today most often &lt;a href=&quot;https://www.gnu.org/software/diffutils/&quot;&gt;GNU diff&lt;/a&gt;) has been around
since the early seventies (Wikipedia has &lt;a href=&quot;https://en.wikipedia.org/wiki/Diff&quot;&gt;a nice overview of the
history&lt;/a&gt;), and is still a go-to tool for
comparing text files. But the diff output for the plans above is not very usable:&lt;/p&gt;
&lt;div  data-language=&quot;diff&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;1,4c1,5&lt;/span&gt;
&lt;span &gt;&lt;span &gt;&lt;&lt;/span&gt;                                                           	QUERY PLAN                                                          	 
&lt;span &gt;&lt;&lt;/span&gt; ---------------------------------------------------------------------------------------------------------------------------------------
&lt;span &gt;&lt;&lt;/span&gt;  Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=273) (actual time=0.033..0.035 rows=1 loops=1)
&lt;span &gt;&lt;&lt;/span&gt;	Index Cond: (relname = &apos;pg_class&apos;::name)
&lt;/span&gt;&lt;span &gt;---&lt;/span&gt;
&lt;span &gt;&lt;span &gt;&gt;&lt;/span&gt;                                                         	QUERY PLAN                                                        	 
&lt;span &gt;&gt;&lt;/span&gt; -----------------------------------------------------------------------------------------------------------------------------------
&lt;span &gt;&gt;&lt;/span&gt;  Bitmap Heap Scan on pg_class  (cost=4.28..8.29 rows=1 width=273) (actual time=0.027..0.029 rows=1 loops=1)
&lt;span &gt;&gt;&lt;/span&gt;	Recheck Cond: (relname = &apos;pg_class&apos;::name)
&lt;span &gt;&gt;&lt;/span&gt;	Heap Blocks: exact=1
&lt;/span&gt;&lt;span &gt;6,8c7,12&lt;/span&gt;
&lt;span &gt;&lt;span &gt;&lt;&lt;/span&gt;  Planning Time: 0.127 ms
&lt;span &gt;&lt;&lt;/span&gt;  Execution Time: 0.060 ms
&lt;span &gt;&lt;&lt;/span&gt; (5 rows)
&lt;/span&gt;&lt;span &gt;---&lt;/span&gt;
&lt;span &gt;&lt;span &gt;&gt;&lt;/span&gt;	-&gt;  Bitmap Index Scan on pg_class_relname_nsp_index  (cost=0.00..4.28 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1)
&lt;span &gt;&gt;&lt;/span&gt;      	Index Cond: (relname = &apos;pg_class&apos;::name)
&lt;span &gt;&gt;&lt;/span&gt;      	Buffers: shared hit=2
&lt;span &gt;&gt;&lt;/span&gt;  Planning Time: 0.157 ms
&lt;span &gt;&gt;&lt;/span&gt;  Execution Time: 0.087 ms
&lt;span &gt;&gt;&lt;/span&gt; (9 rows)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The diff flags lines that changed because the plan structure differs, but it
also flags lines that differ only in cost estimates, timing, I/O counts, or
other details that are irrelevant to the comparison.&lt;/p&gt;
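&lt;p&gt;One partial workaround, shown here as a sketch rather than a recommendation, is to ask Postgres to omit the volatile fields before diffing. The options below are standard &lt;code &gt;EXPLAIN&lt;/code&gt; options (&lt;code &gt;SUMMARY&lt;/code&gt; requires Postgres 10 or newer), but the resulting output still has to be compared by hand or with a generic diff tool:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Plan structure only: no cost estimates, and the query is not executed
EXPLAIN (COSTS OFF)
SELECT * FROM pg_class WHERE relname = &apos;pg_class&apos;;

-- Execute the query, but suppress costs, per-node timing, buffer counts,
-- and the planning/execution time summary
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT * FROM pg_class WHERE relname = &apos;pg_class&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;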
&lt;h2 id=&quot;building-a-bespoke-explain-plan-comparison&quot; &gt;&lt;a href=&quot;#building-a-bespoke-explain-plan-comparison&quot; aria-label=&quot;building a bespoke explain plan comparison permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Building a bespoke EXPLAIN plan comparison&lt;/h2&gt;
&lt;p&gt;We experimented with a couple of different approaches to improve this experience
when using pganalyze for recording and comparing query plans. We settled on an
interface built on a text-based diff of the text output (inspired by diff and
GitHub&apos;s git changeset rendering), but optimized for understanding the most
important EXPLAIN plan differences:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/0c3d814ebab0a6fe1fac17b407e8e67e/273a9/explain-comparison.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;explain-comparison&quot;
        title=&quot;explain-comparison&quot;
        src=&quot;https://pganalyze.com/static/0c3d814ebab0a6fe1fac17b407e8e67e/1d69c/explain-comparison.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The plans in the comparison are rendered to focus on the plan structure (since
this is usually what leads to the biggest performance differences between
plans). Changes in runtime or I/O will not show up as a difference between
plans, but you can select a comparison metric to focus on, and see the values of
that metric for each node in the plan. You can also click on a node in either
Plan A or Plan B to see details about that node, just like when viewing full
EXPLAIN plans.&lt;/p&gt;
&lt;p&gt;As mentioned, we first introduced EXPLAIN plan comparison as part of the &lt;a href=&quot;https://pganalyze.com/blog/introducing-postgres-query-tuning-workbooks#viewing-plans-explain-insights-and-comparing-plans&quot;&gt;Query
Tuning Workbooks feature we launched in
beta&lt;/a&gt;.
When tuning a query, being able to compare plans easily is extremely useful.&lt;/p&gt;
&lt;p&gt;Today we&apos;re extending this functionality to the query EXPLAIN plan list, for
plans captured through &lt;a href=&quot;https://pganalyze.com/docs/explain/setup&quot;&gt;Automated
EXPLAIN&lt;/a&gt;. When multiple distinct plans
for a query exist, it can be hard to understand what the differences are. Now,
you can select two plans on the Query Detail page for a specific query to see
their comparison:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/7db4a6261f3262d83f323bbaf034d086/273a9/explain-comparison-selection.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;explain-comparison-selection&quot;
        title=&quot;explain-comparison-selection&quot;
        src=&quot;https://pganalyze.com/static/7db4a6261f3262d83f323bbaf034d086/1d69c/explain-comparison-selection.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;As part of this release, we&apos;re also adding buffers used as one of the &lt;a href=&quot;https://pganalyze.com/docs/explain/plan-comparison#plan-metrics&quot;&gt;execution
metrics&lt;/a&gt; to comparisons. Buffer
usage can be tricky to compare because buffer hits &lt;a href=&quot;https://pganalyze.com/blog/5mins-explain-analyze-buffers-nested-loops&quot;&gt;can be
double-counted&lt;/a&gt;
in Postgres&apos; current statistics accounting. The sources of double-counting are
limited, though: most of it happens in Nested Loop joins, and sometimes
with Index Scans. So the buffer count can&apos;t always reliably answer &quot;how much data
did this query load&quot; with a warm cache, but it is still useful for comparing
two plans with a similar structure.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/3006fc383facb79fbd8a3ba2f9440c08/273a9/explain-comparison-buffers.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;explain-comparison-with-buffers&quot;
        title=&quot;explain-comparison-with-buffers&quot;
        src=&quot;https://pganalyze.com/static/3006fc383facb79fbd8a3ba2f9440c08/1d69c/explain-comparison-buffers.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;in-summary&quot; &gt;&lt;a href=&quot;#in-summary&quot; aria-label=&quot;in summary permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;In Summary&lt;/h2&gt;
&lt;p&gt;We&apos;re excited to expand our EXPLAIN comparison feature beyond Query Tuning
Workbooks, and we hope you&apos;ll find this feature useful. If you&apos;re an existing
user, you can find the feature on the EXPLAIN Plans tab of the Query Detail page
under Query Performance. If you&apos;re new to pganalyze, visit our &lt;a href=&quot;https://pganalyze.com/docs/install&quot;&gt;Getting Started
Guide&lt;/a&gt; and &lt;a href=&quot;https://app.pganalyze.com/users/sign_up&quot;&gt;sign up for a free trial
today&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Replacing Oracle Hints: Best Practices with pg_hint_plan on PostgreSQL]]></title><description><![CDATA[If you're migrating from Oracle Database to PostgreSQL, you're likely accustomed to using hints to optimize queries. In Oracle, these are special directives embedded in SQL (like ) that steer the optimizer's execution plan. They can be extremely useful but also introduce complexity and “hint debt” over time. PostgreSQL takes a very different approach to query optimization. Rather than supporting built-in hints, the Postgres community, historically, has emphasized relying on its cost-based…]]></description><link>https://pganalyze.com/blog/migrating-from-oracle-hints-to-pg-hint-plan-on-postgresql</link><guid isPermaLink="false">https://pganalyze.com/blog/migrating-from-oracle-hints-to-pg-hint-plan-on-postgresql</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Wed, 05 Feb 2025 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;If you&apos;re migrating from Oracle Database to PostgreSQL, you&apos;re likely accustomed to using &lt;strong&gt;hints&lt;/strong&gt; to optimize queries. In Oracle, these are special directives embedded in SQL (like &lt;code &gt;/*+ INDEX(...) */&lt;/code&gt;) that steer the optimizer&apos;s execution plan. They can be extremely useful but also introduce complexity and “hint debt” over time.&lt;/p&gt;
&lt;p&gt;PostgreSQL takes a very different approach to query optimization. Rather than supporting built-in hints, the Postgres community has historically emphasized relying on the cost-based planner to choose execution plans based on statistics, indexes, and configuration parameters. In practice, that often works, but there are cases where the planner is stubborn and keeps picking a bad plan. This is particularly complicated during migrations, because performance may depend on a particular execution plan that was previously forced with an Oracle hint.&lt;/p&gt;
&lt;p&gt;So you might ask yourself: &lt;strong&gt;how do you replicate or replace Oracle hints when you migrate to Postgres?&lt;/strong&gt; That&apos;s where the &lt;a href=&quot;https://github.com/ossc-db/pg_hint_plan&quot;&gt;&lt;strong&gt;pg_hint_plan&lt;/strong&gt;&lt;/a&gt; extension comes in.&lt;/p&gt;
&lt;p&gt;In this post, we&apos;ll explore the differences between Oracle&apos;s hint system and PostgreSQL&apos;s planner with pg_hint_plan, discuss when you still need hints in your Postgres queries, and walk through best practices for using pg_hint_plan effectively, including &lt;a href=&quot;#using-pganalyze-to-test-query-hints&quot;&gt;how pganalyze can help&lt;/a&gt;.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#when-and-when-not-to-use-hints&quot;&gt;When (and when not) to use hints&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#relying-on-postgresqls-cost-based-planner&quot;&gt;Relying on PostgreSQL&apos;s cost-based planner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#root-causes-of-postgres-planner-problems&quot;&gt;Root causes of Postgres planner problems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-hints-can-help&quot;&gt;When hints can help&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#mapping-oracle-hints-to-pg_hint_plan&quot;&gt;Mapping Oracle hints to pg_hint_plan&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#access-path-or-index-hints&quot;&gt;Access path (or index) hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#join-operation-hints&quot;&gt;Join operation hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#join-order-hints&quot;&gt;Join order hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#parallel--degree-of-parallelism-hints&quot;&gt;Parallel / degree of parallelism hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#query-transformation--subquery-hints&quot;&gt;Query transformation &amp;#x26; subquery hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#result-cache-and-other-specialized-hints&quot;&gt;Result cache and other specialized hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#additional-pg_hint_plan-features-no-oracle-equivalent&quot;&gt;Additional pg_hint_plan Features (no Oracle equivalent)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#best-practices-for-debugging-pg_hint_plan-hints&quot;&gt;Best practices for debugging pg_hint_plan hints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-pganalyze-to-test-query-hints&quot;&gt;Using pganalyze to test query hints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#documentation&quot;&gt;Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#5mins-of-postgres-episodes-on-planner-quirks&quot;&gt;5mins of Postgres episodes on planner quirks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#webinars--ebooks&quot;&gt;Webinars &amp;#x26; eBooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#blog-posts&quot;&gt;Blog posts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;when-and-when-not-to-use-hints&quot; &gt;&lt;a href=&quot;#when-and-when-not-to-use-hints&quot; aria-label=&quot;when and when not to use hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;When (and when not) to use hints&lt;/h2&gt;
&lt;p&gt;It might be tempting to migrate all Oracle hints into pg_hint_plan, but this can be overkill and sometimes even counterproductive in PostgreSQL. Let&apos;s talk about where hints fit into a well-tuned Postgres environment.&lt;/p&gt;
&lt;h3 id=&quot;relying-on-postgresqls-cost-based-planner&quot; &gt;&lt;a href=&quot;#relying-on-postgresqls-cost-based-planner&quot; aria-label=&quot;relying on postgresqls cost based planner permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Relying on PostgreSQL&apos;s cost-based planner&lt;/h3&gt;
&lt;p&gt;PostgreSQL is built around a cost-based planner that typically selects efficient execution paths without manual intervention. It uses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Statistics&lt;/strong&gt; on table sizes, column data distribution, etc.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Planner cost settings&lt;/strong&gt; like &lt;code &gt;random_page_cost&lt;/code&gt; and &lt;code &gt;cpu_tuple_cost&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server configuration parameters&lt;/strong&gt; such as &lt;code &gt;enable_seqscan&lt;/code&gt;, &lt;code &gt;work_mem&lt;/code&gt;, and &lt;code &gt;effective_cache_size&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The philosophy behind PostgreSQL&apos;s planner is that if your statistics, indexes, and cost parameters are well-tuned, the engine can usually figure out the best plan on its own, and there is rarely a need to rely on hints.&lt;/p&gt;
&lt;p&gt;However, this system isn&apos;t perfect, and Postgres sometimes picks sub-optimal plans, as we&apos;ve talked about in our Postgres &lt;a href=&quot;https://pganalyze.com/blog/migrating-from-oracle-hints-to-pg-hint-plan-on-postgresql#5mins-of-postgres-episodes-on-planner-quirks&quot;&gt;planner quirks series&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;root-causes-of-postgres-planner-problems&quot; &gt;&lt;a href=&quot;#root-causes-of-postgres-planner-problems&quot; aria-label=&quot;root causes of postgres planner problems permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Root causes of Postgres planner problems&lt;/h3&gt;
&lt;p&gt;A common cause of bad Postgres query plans is out-of-date or incorrect statistics. Statistics about table columns and the selectivity of query filters &lt;a href=&quot;https://pganalyze.com/webinars/how-to-optimize-slow-queries-with-EXPLAIN&quot;&gt;are critical for the planner&lt;/a&gt; to make good decisions. Frequent &lt;code &gt;ANALYZE&lt;/code&gt; operations, combined with tuned statistics target settings and &lt;code &gt;CREATE STATISTICS&lt;/code&gt;, ensure that the system captures current information about data distributions.&lt;/p&gt;
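&lt;p&gt;As a rough illustration of those three tools, assume a hypothetical &lt;code &gt;orders&lt;/code&gt; table with a skewed &lt;code &gt;status&lt;/code&gt; column and correlated &lt;code &gt;city&lt;/code&gt; and &lt;code &gt;zip_code&lt;/code&gt; columns. The table, column, and statistics names are made up for this example, and the statistics target of 1000 is an arbitrary value:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Collect a larger sample for a column with a skewed distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

-- Tell the planner that city and zip_code are not independent
CREATE STATISTICS orders_city_zip_stats (dependencies, ndistinct)
  ON city, zip_code FROM orders;

-- Refresh statistics so the changes above take effect
ANALYZE orders;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;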
&lt;p&gt;A thoughtfully designed schema with &lt;a href=&quot;https://pganalyze.com/blog/index-advisor-v3&quot;&gt;well-chosen indexes&lt;/a&gt; and, when appropriate, table partitioning, often provides a bigger performance boost than manual hints, which can only do so much on a large table.&lt;/p&gt;
&lt;p&gt;Settings such as &lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-work-mem-tuning&quot;&gt;&lt;code &gt;work_mem&lt;/code&gt;&lt;/a&gt;, &lt;code &gt;random_page_cost&lt;/code&gt;, and &lt;code &gt;effective_cache_size&lt;/code&gt; have a significant impact on the decisions the planner makes, yet they are often left at their default values, which can cause bad query plans. &lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;Optimizing these settings&lt;/a&gt; can resolve many query performance challenges without introducing hints. When the planner&apos;s cost model aligns well with the realities of your hardware and data, it typically arrives at better plans.&lt;/p&gt;
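&lt;p&gt;A quick way to check whether these settings are still at their defaults is to query the &lt;code &gt;pg_settings&lt;/code&gt; view. This is only a sketch; which values are appropriate depends on your hardware and workload:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- source = &apos;default&apos; means the setting still has its built-in value
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN (&apos;work_mem&apos;, &apos;random_page_cost&apos;, &apos;effective_cache_size&apos;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;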
&lt;h3 id=&quot;when-hints-can-help&quot; &gt;&lt;a href=&quot;#when-hints-can-help&quot; aria-label=&quot;when hints can help permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;When hints can help&lt;/h3&gt;
&lt;p&gt;Despite the strengths of PostgreSQL&apos;s planner, there are times when hints prove beneficial. In fact, forcing a certain plan for debugging can offer valuable insight into why the planner&apos;s default choice might be less than ideal, and which part of the query plan had inaccurate costs, often caused by statistics issues.&lt;/p&gt;
&lt;p&gt;Legacy Oracle queries often rely heavily on hints, and adjusting them or restructuring the schema might be too risky or time-intensive. In such cases, &lt;a href=&quot;https://github.com/ossc-db/pg_hint_plan/tree/master&quot;&gt;pg_hint_plan&lt;/a&gt; can replicate specific behaviors from Oracle without a total rewrite. Hints also help in highly complex queries or unusual data distributions that consistently lead the planner astray. They are likewise useful as a temporary patch while deeper issues, such as missing statistics or incorrectly set parameters, are being addressed.&lt;/p&gt;
&lt;p&gt;When statistical accuracy, schema design, and parameter tuning are all properly addressed in Postgres, hints become an added layer of complexity rather than a necessity. Use them sparingly, focusing on special cases that truly require hard-coded logic.&lt;/p&gt;
&lt;h2 id=&quot;mapping-oracle-hints-to-pg_hint_plan&quot; &gt;&lt;a href=&quot;#mapping-oracle-hints-to-pg_hint_plan&quot; aria-label=&quot;mapping oracle hints to pg_hint_plan permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Mapping Oracle hints to pg_hint_plan&lt;/h2&gt;
&lt;p&gt;Both Oracle hints and pg_hint_plan hints are embedded in SQL statements using &lt;code &gt;/*+ ... */&lt;/code&gt;. They can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Force the use of specific indexes or join methods (e.g., nested loops)&lt;/li&gt;
&lt;li&gt;Enable or disable parallel execution&lt;/li&gt;
&lt;li&gt;Override other plan choices&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These hints can be very direct: &lt;em&gt;“Use index X on this table,”&lt;/em&gt; or &lt;em&gt;“Join table A and B using a Nested Loop Join.”&lt;/em&gt; This level of control is sometimes essential when the database optimizer doesn&apos;t pick an optimal plan on its own or when you need consistent performance across different instances.&lt;/p&gt;
&lt;p&gt;When you do decide to replicate Oracle hints in Postgres, you&apos;ll likely look for direct equivalents. pg_hint_plan supports many—but not all—Oracle-like hints. pg_hint_plan primarily controls scan methods, join methods, join order, and query parallelism. Many of Oracle&apos;s advanced hints for rewriting queries, star transformations, dynamic sampling, and specialized caching are simply not available or applicable in Postgres.&lt;/p&gt;
&lt;p&gt;Instead, in Postgres, you often achieve similar behavior by tuning planner GUCs (like &lt;code &gt;enable_hashjoin&lt;/code&gt;, &lt;code &gt;enable_nestloop&lt;/code&gt;), rewriting queries, materializing parts of the query with the &lt;code &gt;MATERIALIZED&lt;/code&gt; keyword for CTEs, or using indexes/constraints that nudge the Postgres planner.&lt;/p&gt;
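&lt;p&gt;For example, the two sketches below use hypothetical &lt;code &gt;orders&lt;/code&gt; and &lt;code &gt;customers&lt;/code&gt; tables (names and columns are assumptions for illustration). The first discourages nested loop joins for a single transaction; the second uses a materialized CTE (Postgres 12 and newer) so that the CTE is computed on its own instead of being merged into the outer query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Scope the planner setting to one transaction with SET LOCAL
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
ROLLBACK;

-- Keep the planner from inlining the CTE into the outer query
WITH open_orders AS MATERIALIZED (
  SELECT id, customer_id
  FROM orders
  WHERE status = &apos;open&apos;
)
SELECT c.name, r.id
FROM open_orders r
JOIN customers c ON c.id = r.customer_id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;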
&lt;p&gt;Let&apos;s review some common situations and map them from Oracle Database hints to pg_hint_plan syntax or other Postgres alternatives.&lt;/p&gt;
&lt;h3 id=&quot;access-path-or-index-hints&quot; &gt;&lt;a href=&quot;#access-path-or-index-hints&quot; aria-label=&quot;access path or index hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Access path (or index) hints&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;33%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th width=&quot;34%&quot;&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;FULL(table)&lt;/code&gt;&lt;br&gt;Force a full table scan&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;SeqScan(table)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Forces Postgres to use a sequential scan (called Full Table Scan on Oracle) on the named table.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;INDEX(table [index])&lt;/code&gt;&lt;br&gt;Force index scan&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;IndexScan(table [index])&lt;/code&gt; &lt;i&gt;or&lt;/i&gt; &lt;code&gt;IndexOnlyScan(table [index])&lt;/code&gt; &lt;i&gt;or&lt;/i&gt; &lt;code&gt;BitmapScan(table [index])&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;pg_hint_plan has separate hints for regular index scans, index-only scans, or bitmap index scans.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;INDEX_FFS(table index)&lt;/code&gt;&lt;br&gt;Fast full index scan&lt;/td&gt;
      &lt;td&gt;No direct equivalent. &lt;code&gt;IndexOnlyScan&lt;/code&gt; is approximate.&lt;/td&gt;
      &lt;td&gt;Postgres can answer a query from the index by using an IndexOnlyScan, if all filtered and returned columns are indexed. However, Postgres sometimes still checks the table to verify visibility of deleted rows (this cannot be turned off).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;INDEX_DESC(table [index])&lt;/code&gt;&lt;br&gt;Reverse index scan&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;IndexScan&lt;/code&gt; with an &lt;code&gt;ORDER BY ... DESC&lt;/code&gt; in the query itself.&lt;/td&gt;
      &lt;td&gt;pg_hint_plan can&apos;t directly enforce a &lt;i&gt;descending&lt;/i&gt; index scan; you typically rely on query order or an index with the right sort order.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;NO_INDEX(table [index])&lt;/code&gt;&lt;br&gt;Disallow index&lt;/td&gt;
      &lt;td&gt;No equivalent.&lt;/td&gt;
      &lt;td&gt;No equivalent to disallow individual indexes.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;INDEX_JOIN(table)&lt;/code&gt;&lt;br&gt;Use index join&lt;/td&gt;
      &lt;td&gt;No equivalent.&lt;/td&gt;
      &lt;td&gt;PostgreSQL does not have a direct &quot;index join&quot; concept like Oracle.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In Oracle, you might have:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;/*+ INDEX(table1 idx_table1_col) */&lt;/span&gt; 
       col1&lt;span &gt;,&lt;/span&gt; col2
&lt;span &gt;FROM&lt;/span&gt;   table1
&lt;span &gt;WHERE&lt;/span&gt;  col1 &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;something&apos;&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; col2 &lt;span &gt;FETCH FIRST&lt;/span&gt; &lt;span &gt;1&lt;/span&gt; &lt;span &gt;ROW ONLY&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In PostgreSQL with pg_hint_plan, you&apos;d translate it to:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*+
  IndexScan(table1 idx_table1_col)
*/&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; col1&lt;span &gt;,&lt;/span&gt; col2
&lt;span &gt;FROM&lt;/span&gt;   table1
&lt;span &gt;WHERE&lt;/span&gt;  col1 &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;something&apos;&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; col2 &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;join-operation-hints&quot; &gt;&lt;a href=&quot;#join-operation-hints&quot; aria-label=&quot;join operation hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Join operation hints&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;33%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th width=&quot;34%&quot;&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;USE_NL(table1 table2)&lt;/code&gt;&lt;br&gt;Use nested loops&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;NestLoop(table1 table2)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Forces a Nested Loop Join between the two named tables.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;USE_HASH(table1 table2)&lt;/code&gt;&lt;br&gt;Use hash join&lt;br&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;HashJoin(table1 table2)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Forces a Hash Join between the two named tables.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;USE_MERGE(table1 table2)&lt;/code&gt;&lt;br&gt;Use sort-merge join&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;MergeJoin(table1 table2)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Forces a Merge Join between the two named tables.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;USE_NL_WITH_INDEX(t1 idx1)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;NestLoop(table1 table2)&lt;/code&gt; + &lt;code&gt;IndexScan(table1 index1)&lt;/code&gt; + &lt;code&gt;Leading((table2 table1))&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;In order to perform what Postgres calls a &lt;a href=&quot;https://pganalyze.com/blog/how-postgres-chooses-index#parameterized-index-scans-or-why-nested-loop-are-sometimes-a-good-join-type&quot;&gt;Parameterized Index Scan&lt;/a&gt;, the hints must force three things: a Nested Loop join, the join order (via &lt;code&gt;Leading&lt;/code&gt;), and the use of the correct index. Note that the &lt;code&gt;Leading&lt;/code&gt; hint requires an extra pair of parentheses to force the join direction: the first table listed is the outer table, followed by the inner table (which is the one the index scan is on).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;
        &lt;code&gt;NO_USE_NL(t1 [t2...])&lt;/code&gt;&lt;br&gt;&lt;br&gt;
        &lt;code&gt;NO_USE_MERGE(t1 [t2...])&lt;/code&gt;&lt;br&gt;&lt;br&gt;
        &lt;code&gt;NO_USE_HASH(t1 [t2...])&lt;/code&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;code&gt;NoNestLoop(t1 t2 [t3...])&lt;/code&gt;&lt;br&gt;&lt;br&gt;
        &lt;code&gt;NoMergeJoin(t1 t2 [t3...])&lt;/code&gt;&lt;br&gt;&lt;br&gt;
        &lt;code&gt;NoHashJoin(t1 t2 [t3...])&lt;/code&gt;
      &lt;/td&gt;
      &lt;td&gt;
        pg_hint_plan instructs PostgreSQL&apos;s query planner not to use a Nested Loop/Merge/Hash join for the listed tables (which need to include both the inner and the outer table), whereas the Oracle hint tells the optimizer not to use a Nested Loop/Merge/Hash join for each specified table where it is the inner table of the join.
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
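&lt;p&gt;As a minimal sketch of the &lt;code&gt;USE_NL_WITH_INDEX&lt;/code&gt; translation from the table above, the three hints could be combined in a single comment as shown here. The tables &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; and the index &lt;code&gt;idx_orders_customer_id&lt;/code&gt; are hypothetical names chosen for illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;/*+
  NestLoop(customers orders)
  IndexScan(orders idx_orders_customer_id)
  Leading((customers orders))
*/
-- Hypothetical schema: customers is the outer table, and orders is the inner
-- table that gets probed through idx_orders_customer_id for each outer row.
EXPLAIN
SELECT customers.name, orders.total
FROM   customers
JOIN   orders ON orders.customer_id = customers.id
WHERE  customers.region = &apos;EMEA&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the hints apply, you would expect the plan to show a Nested Loop whose inner side is an Index Scan with a condition that references the outer table. Whether that is actually faster depends on your data, so compare it against the unhinted plan.&lt;/p&gt;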
&lt;h3 id=&quot;join-order-hints&quot; &gt;&lt;a href=&quot;#join-order-hints&quot; aria-label=&quot;join order hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Join order hints&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;33%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th width=&quot;34%&quot;&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;ORDERED&lt;/code&gt;&lt;br&gt;Join in the order of tables in the FROM clause&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;Set(join_collapse_limit 1)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;In Postgres, setting the &lt;code&gt;join_collapse_limit&lt;/code&gt; setting to &quot;1&quot; will force Postgres to join the tables in the order they are listed in the query. You can set this either via pg_hint_plan or a regular &lt;code&gt;SET&lt;/code&gt; command before running the query. See &lt;a href=&quot;https://www.postgresql.org/docs/current/explicit-joins.html&quot;&gt;examples in the Postgres documentation&lt;/a&gt;.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;LEADING(t1 t2 ... tN)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;Leading(t1 t2 ... tN)&lt;/code&gt;&lt;br&gt;&lt;br&gt;&lt;code&gt;Leading(((t1 t2) t3))&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;pg_hint_plan supports &lt;code&gt;Leading(...)&lt;/code&gt; to fix the join order. You can list multiple tables in the desired join sequence. Use the syntax with additional parentheses around each pair to specify which table is used as the outer table (listed first) and which as the inner table (listed second).&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
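&lt;p&gt;The two approaches above could look like the following sketch. The tables &lt;code&gt;t1&lt;/code&gt;, &lt;code&gt;t2&lt;/code&gt; and &lt;code&gt;t3&lt;/code&gt; and their join columns are hypothetical placeholders:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;/*+ Leading(((t1 t2) t3)) */
-- Hypothetical tables. Joins t1 (outer) with t2 (inner) first,
-- then joins that result (outer) with t3 (inner).
EXPLAIN
SELECT *
FROM   t1
JOIN   t2 ON t2.t1_id = t1.id
JOIN   t3 ON t3.t2_id = t2.id;

/*+ Set(join_collapse_limit 1) */
-- Rough counterpart of Oracle&apos;s ORDERED: explicit JOINs are kept
-- in the order in which they are written in the query.
EXPLAIN
SELECT *
FROM   t1
JOIN   t2 ON t2.t1_id = t1.id
JOIN   t3 ON t3.t2_id = t2.id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The hint comment is placed at the very start of each statement, with the explanatory SQL comments after it, so that the hint block is the first comment in the statement.&lt;/p&gt;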
&lt;h3 id=&quot;parallel--degree-of-parallelism-hints&quot; &gt;&lt;a href=&quot;#parallel--degree-of-parallelism-hints&quot; aria-label=&quot;parallel  degree of parallelism hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parallel / degree of parallelism hints&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;33%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th width=&quot;34%&quot;&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PARALLEL(table, n)&lt;/code&gt;&lt;br&gt;Parallel degree n&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;Parallel(table n hard)&lt;/code&gt;&lt;/td&gt;
      &lt;td width=&quot;34%&quot;&gt;pg_hint_plan by default (&quot;soft&quot;) only sets the configured maximum number of workers (&lt;code &gt;max_parallel_workers_per_gather&lt;/code&gt;) but won&apos;t force a parallel plan if the costs are not in its favor. You can force a parallel plan by specifying the third argument as &lt;code&gt;hard&lt;/code&gt;, which matches Oracle&apos;s behaviour when specifying a specific parallel degree.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;NO_PARALLEL(table)&lt;/code&gt;&lt;br&gt;Disallow parallel&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;Parallel(table 0)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;pg_hint_plan inhibits parallel execution on the table when the number of workers is set to zero.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Example usage in &lt;strong&gt;pg_hint_plan&lt;/strong&gt;, increasing the parallel workers from the default of 2 (&lt;code &gt;max_parallel_workers_per_gather&lt;/code&gt;) to 4 just for this query&apos;s use of the &quot;sales&quot; table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*+
  Parallel(sales 4)
*/&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
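&lt;p&gt;For comparison, here is a sketch of the other two variants from the table, again using the &quot;sales&quot; table. The &lt;code&gt;count(*)&lt;/code&gt; queries are only illustrative stand-ins for your own query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;/*+ Parallel(sales 4 hard) */
-- &quot;hard&quot; asks for a parallel plan with 4 workers on sales,
-- even if the planner&apos;s cost estimate would favor a serial plan.
EXPLAIN SELECT count(*) FROM sales;

/*+ Parallel(sales 0) */
-- Zero workers: inhibits parallel execution on sales (like NO_PARALLEL).
EXPLAIN SELECT count(*) FROM sales;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;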
&lt;h3 id=&quot;query-transformation--subquery-hints&quot; &gt;&lt;a href=&quot;#query-transformation--subquery-hints&quot; aria-label=&quot;query transformation  subquery hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Query transformation &amp;#x26; subquery hints&lt;/h3&gt;
&lt;p&gt;Oracle has many hints controlling query transformations (like unnesting subqueries, merging views, star transformations, etc.). pg_hint_plan does not provide direct equivalents for these transformations; PostgreSQL&apos;s planner transformations are generally not hint-based but either controlled automatically or by GUC parameters.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;20%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;UNNEST / NO_UNNEST&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;PostgreSQL decides automatically on subquery unnesting (lateral joins, subquery flattening, etc.), and pg_hint_plan cannot influence this. However, queries can be rewritten to use a CTE with the &lt;code&gt;NOT MATERIALIZED&lt;/code&gt; keyword, which will behave similarly to Oracle&apos;s &lt;code&gt;UNNEST&lt;/code&gt;, or with &lt;code&gt;MATERIALIZED&lt;/code&gt;, which will behave like &lt;code&gt;NO_UNNEST&lt;/code&gt;. &lt;a href=&quot;https://www.postgresql.org/docs/current/queries-with.html&quot;&gt;See Postgres documentation&lt;/a&gt;.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;MERGE&lt;/code&gt; / &lt;code&gt;NO_MERGE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;In Postgres, views are inlined automatically as if they were a subquery; there is no fine-grained hint for controlling this.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PUSH_SUBQ&lt;/code&gt; / &lt;code&gt;NO_PUSH_SUBQ&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;No direct control over subquery execution in &lt;code&gt;pg_hint_plan&lt;/code&gt;.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;STAR_TRANSFORMATION&lt;/code&gt; / &lt;code&gt;NO_STAR_TRANSFORMATION&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Oracle&apos;s star transformations for data warehouse schemas have no direct counterpart in Postgres.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;FACT&lt;/code&gt; / &lt;code&gt;NO_FACT&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Oracle uses these for star schemas; not applicable in Postgres.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
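&lt;p&gt;To illustrate the CTE rewrite mentioned for &lt;code&gt;UNNEST&lt;/code&gt; / &lt;code&gt;NO_UNNEST&lt;/code&gt;, here is a minimal sketch. The &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; tables and their columns are hypothetical:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical schema. NOT MATERIALIZED allows the planner to fold the CTE
-- into the outer query, comparable in spirit to Oracle&apos;s UNNEST.
WITH recent_orders AS NOT MATERIALIZED (
  SELECT customer_id, total
  FROM   orders
  WHERE  created_at &gt; now() - interval &apos;7 days&apos;
)
SELECT customers.name, recent_orders.total
FROM   customers
JOIN   recent_orders ON recent_orders.customer_id = customers.id;

-- MATERIALIZED computes the CTE as a separate step and acts as an
-- optimization fence, comparable in spirit to Oracle&apos;s NO_UNNEST.
WITH recent_orders AS MATERIALIZED (
  SELECT customer_id, total
  FROM   orders
  WHERE  created_at &gt; now() - interval &apos;7 days&apos;
)
SELECT customers.name, recent_orders.total
FROM   customers
JOIN   recent_orders ON recent_orders.customer_id = customers.id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both keywords require Postgres 12 or newer. Neither is a pg_hint_plan hint, so they work without the extension.&lt;/p&gt;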
&lt;h3 id=&quot;result-cache-and-other-specialized-hints&quot; &gt;&lt;a href=&quot;#result-cache-and-other-specialized-hints&quot; aria-label=&quot;result cache and other specialized hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Result cache and other specialized hints&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th width=&quot;33%&quot;&gt;Oracle Hint&lt;/th&gt;
      &lt;th width=&quot;20%&quot;&gt;pg_hint_plan Equivalent&lt;/th&gt;
      &lt;th width=&quot;47%&quot;&gt;Notes&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;RESULT_CACHE&lt;/code&gt; / &lt;code&gt;NO_RESULT_CACHE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;PostgreSQL does not have a built-in query result cache like Oracle.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;OPT_PARAM(...)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code&gt;Set(...)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Postgres parameters are typically set at the session level (&quot;SET&quot; command) or via &quot;Set&quot; hints in pg_hint_plan. Note the parameters that can be set differ between Oracle and Postgres.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;DYNAMIC_SAMPLING(...)&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;The Postgres statistics system relies on a separate ANALYZE of the table outside of query execution and does not have an equivalent of dynamic sampling.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;QB_NAME&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;pg_hint_plan does not offer an equivalent to Oracle&apos;s query block functionality for hints.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PUSH_PRED&lt;/code&gt; / &lt;code&gt;NO_PUSH_PRED&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Postgres handles predicate pushdown automatically based on heuristics for subqueries; no direct hint.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;USE_CONCAT&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Oracle uses this to force expansion of &lt;code&gt;OR&lt;/code&gt; clauses into &lt;code&gt;UNION ALL&lt;/code&gt; queries. Postgres does not perform this transformation automatically, so a manual rewrite of the query is needed. &lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-UNION-subquery-pull-up-performance&quot;&gt;See our blog post for an example&lt;/a&gt;.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;NO_QUERY_TRANSFORMATION&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
      &lt;td&gt;Postgres&apos; transformations during the planning process cannot be turned off or modified via hints.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
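&lt;p&gt;As a sketch of the manual rewrite that replaces &lt;code&gt;USE_CONCAT&lt;/code&gt;, assume a hypothetical &lt;code&gt;tickets&lt;/code&gt; table with separately indexed &lt;code&gt;assignee_id&lt;/code&gt; and &lt;code&gt;reporter_id&lt;/code&gt; columns. The &lt;code&gt;OR&lt;/code&gt; condition can be split into two branches, each of which can use its own index:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Original query (hypothetical schema): one OR condition across two columns.
SELECT id
FROM   tickets
WHERE  assignee_id = 42 OR reporter_id = 42;

-- Manual rewrite: one branch per condition. The second branch excludes rows
-- already returned by the first one; IS DISTINCT FROM also keeps rows where
-- assignee_id is NULL, so the result matches the original query.
SELECT id
FROM   tickets
WHERE  assignee_id = 42
UNION ALL
SELECT id
FROM   tickets
WHERE  reporter_id = 42
  AND  assignee_id IS DISTINCT FROM 42;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using plain &lt;code&gt;UNION&lt;/code&gt; instead would also remove duplicates, but it deduplicates the whole result, which adds work and changes the output if the original query could legitimately return identical rows.&lt;/p&gt;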
&lt;h3 id=&quot;additional-pg_hint_plan-features-no-oracle-equivalent&quot; &gt;&lt;a href=&quot;#additional-pg_hint_plan-features-no-oracle-equivalent&quot; aria-label=&quot;additional pg_hint_plan features no oracle equivalent permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Additional pg_hint_plan Features (no Oracle equivalent)&lt;/h3&gt;
&lt;p&gt;pg_hint_plan has additional hints that don&apos;t map to Oracle hints but can be helpful:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code &gt;Rows(table1 table2 [ n ])&lt;/code&gt;: Tells the planner to assume a join between &lt;code &gt;table1&lt;/code&gt; and &lt;code &gt;table2&lt;/code&gt; returns &lt;code &gt;n&lt;/code&gt; rows (replacing or adjusting the statistics-derived estimate), influencing join order and plan choices.&lt;/li&gt;
&lt;li&gt;&lt;code &gt;Memoize(table1 table2)&lt;/code&gt; / &lt;code &gt;NoMemoize(table1 table2)&lt;/code&gt;: Influences whether the Memoize functionality is applied to the given join tables. Memoize can sometimes cause Postgres planner costs to be off, and as such the “NoMemoize” hint can be useful to avoid query plans that might favor a Nested Loop Join.&lt;/li&gt;
&lt;/ol&gt;
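&lt;p&gt;A short sketch of both hints follows, using hypothetical &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; tables. In pg_hint_plan, the row correction is written with a prefix: &lt;code&gt;#n&lt;/code&gt; sets the estimate to an absolute value, while &lt;code&gt;+n&lt;/code&gt;, &lt;code&gt;-n&lt;/code&gt; and &lt;code&gt;*n&lt;/code&gt; adjust the planner&apos;s own estimate:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;/*+ Rows(orders customers #100) */
-- Hypothetical tables. Assume the join returns exactly 100 rows.
EXPLAIN
SELECT *
FROM   orders
JOIN   customers ON customers.id = orders.customer_id;

/*+ Rows(orders customers *10) */
-- Multiply the planner&apos;s own row estimate for the join by 10.
EXPLAIN
SELECT *
FROM   orders
JOIN   customers ON customers.id = orders.customer_id;

/*+ NoMemoize(orders customers) */
-- Keep the planner from placing a Memoize node in this join.
EXPLAIN
SELECT *
FROM   orders
JOIN   customers ON customers.id = orders.customer_id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;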
&lt;h2 id=&quot;best-practices-for-debugging-pg_hint_plan-hints&quot; &gt;&lt;a href=&quot;#best-practices-for-debugging-pg_hint_plan-hints&quot; aria-label=&quot;best practices for debugging pg_hint_plan hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Best practices for debugging pg_hint_plan hints&lt;/h2&gt;
&lt;p&gt;Sometimes a pg_hint_plan hint won&apos;t take effect, and it&apos;s not always clear why: Postgres will always give you a plan, even if it had to ignore the hints to produce one.&lt;/p&gt;
&lt;p&gt;The most common problems can be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specifying multiple hint comments (if you have multiple hints you must specify them all in one &lt;code &gt;/*+ ... */&lt;/code&gt; comment)&lt;/li&gt;
&lt;li&gt;Using incorrect pg_hint_plan syntax (e.g. &lt;code &gt;NestedLoop&lt;/code&gt; instead of &lt;code &gt;NestLoop&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The planner not having a viable path to use the hint (e.g. because the requested index can&apos;t be used for a given expression)&lt;/li&gt;
&lt;li&gt;Re-used table names not having unique aliases in a query (you need to assign an alias to each table in such situations)&lt;/li&gt;
&lt;li&gt;Targeting a child partition in a hint (hints for partitioned tables must target the parent table, not its children)&lt;/li&gt;
&lt;li&gt;Subqueries that do not have an assigned name (i.e. are not a CTE) can &lt;a href=&quot;https://pg-hint-plan.readthedocs.io/en/latest/hint_details.html#subqueries&quot;&gt;only be hinted in some cases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
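&lt;p&gt;Two of these pitfalls, multiple hint comments and missing aliases, can be avoided as in the following sketch of a self-join. The &lt;code&gt;employees&lt;/code&gt; table and the index &lt;code&gt;idx_employees_manager_id&lt;/code&gt; are hypothetical names:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;/*+ HashJoin(staff manager) IndexScan(staff idx_employees_manager_id) */
-- Hypothetical schema. Both hints live in one comment, and each hint refers
-- to the alias (staff, manager) rather than the re-used table name employees.
EXPLAIN
SELECT staff.name, manager.name AS manager_name
FROM   employees staff
JOIN   employees manager ON manager.id = staff.manager_id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;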
&lt;p&gt;However, by default you may not see any clear indication of a problem, since pg_hint_plan does not show any debug output by default.&lt;/p&gt;
&lt;p&gt;To better understand why hints may not have been used, you can enable the &lt;code &gt;pg_hint_plan.debug_print&lt;/code&gt; setting. This will give you output like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; pg_hint_plan&lt;span &gt;.&lt;/span&gt;debug_print &lt;span &gt;=&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;  
&lt;span &gt;/*+ NestedLoop(table1 table2) */&lt;/span&gt; &lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; …&lt;span &gt;;&lt;/span&gt;  &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;INFO:  pg_hint_plan: hint syntax error at or near &quot;NestedLoop&quot;.  
DETAIL:  Unrecognized hint keyword &quot;NestedLoop&quot;.  
                                          QUERY PLAN                                        	   
----------------------------------------------------------------------------------------------------  
…  &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Additionally, you can show more detailed output about hint usage by setting the client log level (&lt;code &gt;client_min_messages&lt;/code&gt;) to &lt;code &gt;LOG&lt;/code&gt;, which will tell you which hints were used successfully:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; client_min_messages &lt;span &gt;=&lt;/span&gt; LOG&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;/*+ NestLoop(table1 table2) IndexScan(table3) */&lt;/span&gt; &lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; table1 &lt;span &gt;JOIN&lt;/span&gt; table2 
&lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;table2_id &lt;span &gt;=&lt;/span&gt; table2&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; table1_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;123&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;LOG:  pg_hint_plan:
used hint:
NestLoop(table1 table2)
not used hint:
IndexScan(table3)
duplication hint:
error hint:
                                        QUERY PLAN                                     	 
----------------------------------------------------------------------------------------------
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find additional aspects to consider in the &lt;a href=&quot;https://pg-hint-plan.readthedocs.io/en/latest/hint_details.html&quot;&gt;pg_hint_plan documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;using-pganalyze-to-test-query-hints&quot; &gt;&lt;a href=&quot;#using-pganalyze-to-test-query-hints&quot; aria-label=&quot;using pganalyze to test query hints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using pganalyze to test query hints&lt;/h2&gt;
&lt;p&gt;Oftentimes Oracle-to-Postgres migrations run into challenges when on a deadline to complete pre-production performance testing or right after going live. In such situations, pganalyze can help you quickly iterate on different hints and benchmark query plans using &lt;a href=&quot;https://pganalyze.com/blog/introducing-postgres-query-tuning-workbooks&quot;&gt;Query Tuning Workbooks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the following example, we compared a baseline query with a query variant that uses pg_hint_plan to choose a particular index. From these results, it&apos;s clear that implementing the hint improves performance by more than 60%, plus it&apos;s documented for the whole team to see why the change was made.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;compare-plan-with-hint&quot;
        title=&quot;compare-plan-with-hint&quot;
        src=&quot;https://pganalyze.com/static/f1d371d15d792cbdf5c535c057ff0a36/1d69c/compare-plan-with-hint.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;By iterating through this process of identifying slow queries, testing variants, and implementing optimizations, you avoid guesswork, ensure that each hint actually benefits your application, and prevent adding unnecessary complexity to your database.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Migrating Oracle hints to PostgreSQL can be a tricky process, but pg_hint_plan provides a valuable tool for those times when you really need to guide Postgres&apos; planner. Nonetheless, remember that PostgreSQL is intended to make sound decisions based on strong statistics, strategic indexing, and well-chosen cost parameters, which can all be optimized using pganalyze. Hints should serve as a targeted solution, not the default approach.&lt;/p&gt;
&lt;h2 id=&quot;references&quot; &gt;&lt;a href=&quot;#references&quot; aria-label=&quot;references permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;References&lt;/h2&gt;
&lt;h3 id=&quot;documentation&quot; &gt;&lt;a href=&quot;#documentation&quot; aria-label=&quot;documentation permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Documentation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ossc-db/pg_hint_plan&quot;&gt;pg_hint_plan GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pg-hint-plan.readthedocs.io/en/latest/index.html&quot;&gt;pg_hint_plan Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/Comments.html#GUID-D316D545-89E2-4D54-977F-FC97815CD62E&quot;&gt;Oracle Database - Hint documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/docs/query-tuning&quot;&gt;pganalyze Query Tuning Workbooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/explicit-joins.html&quot;&gt;PostgreSQL Documentation: 17: 14.3. Controlling the Planner with Explicit JOIN Clauses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/queries-with.html&quot;&gt;PostgreSQL Documentation: 17: 7.8. WITH Queries (Common Table Expressions)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;5mins-of-postgres-episodes-on-planner-quirks&quot; &gt;&lt;a href=&quot;#5mins-of-postgres-episodes-on-planner-quirks&quot; aria-label=&quot;5mins of postgres episodes on planner quirks permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;5mins of Postgres episodes on planner quirks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-planner-join-equivalence-class-in-any-filters&quot;&gt;JOIN Equivalence Classes and IN/ANY filters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-planner-jsonb-selectivity&quot;&gt;How to fix bad JSONB selectivity estimates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit&quot;&gt;The impact of ORDER BY + LIMIT on index usage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;webinars--ebooks&quot; &gt;&lt;a href=&quot;#webinars--ebooks&quot; aria-label=&quot;webinars  ebooks permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Webinars &amp;#x26; eBooks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/webinars/how-to-optimize-slow-queries-with-EXPLAIN&quot;&gt;How to Optimize Slow Queries with EXPLAIN to Fix Bad Query Plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;Best Practices for Optimizing Postgres Query Performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;blog-posts&quot; &gt;&lt;a href=&quot;#blog-posts&quot; aria-label=&quot;blog posts permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Blog posts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-work-mem-tuning&quot;&gt;The surprising logic of the Postgres work_mem setting, and how to tune it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/index-advisor-v3&quot;&gt;Introducing pganalyze Index Advisor 3.0 - A workload-aware system for finding missing indexes in Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/how-postgres-chooses-index&quot;&gt;How Postgres Chooses Which Index To Use For A Query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-UNION-subquery-pull-up-performance&quot;&gt;Speed up Postgres queries with UNIONs and subquery pull-up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/introducing-postgres-query-tuning-workbooks&quot;&gt;Introducing Query Tuning Workbooks: Safely Tune Postgres Queries on Production with pganalyze&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Introducing pg_query for Postgres 16 - Parsing SQL/JSON, Windows support, PL/pgSQL parse mode & more]]></title><description><![CDATA[Parsing SQL queries and turning them into a syntax tree is not a simple task. Especially when you want to support special syntax that is specific to a particular database engine, like Postgres. And when you’re working with queries day in day out, like we do at pganalyze, understanding the actual intent of a query, which tables it scans, which columns it filters on, and such, is essential. Almost 10 years ago, we determined that in order to create the best product for monitoring and optimizing…]]></description><link>https://pganalyze.com/blog/pg-query-postgres-16</link><guid isPermaLink="false">https://pganalyze.com/blog/pg-query-postgres-16</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 11 Jan 2024 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Parsing SQL queries and turning them into a syntax tree is not a simple task. Especially when you want to support special syntax that is specific to a particular database engine, like Postgres. And when you’re working with queries day in day out, like we do at pganalyze, &lt;strong&gt;understanding the actual intent of a query, which tables it scans, which columns it filters on, and such, is essential.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Almost 10 years ago, we determined that in order to create the best product for monitoring and optimizing Postgres, we needed to parse queries the way that Postgres does. &lt;strong&gt;We released the first version of pg_query back in 2014&lt;/strong&gt;, and have seen many different projects outside of pganalyze utilize our open-source project. For example, to support migration use cases, create linting tools, or check which queries an application executes (see our &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser&quot;&gt;post from 2021&lt;/a&gt; for some examples). And to name just one vanity metric, the &lt;a href=&quot;https://rubygems.org/gems/pg_query&quot;&gt;Ruby binding for pg_query&lt;/a&gt; has been downloaded an incredible 34 million times!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Today, we’re excited to announce the new &lt;a href=&quot;https://github.com/pganalyze/libpg_query/releases/tag/16-5.1.0&quot;&gt;pg_query release&lt;/a&gt;&lt;/strong&gt; based on the Postgres 16 parser, which introduces support for running on Windows (a frequently requested addition), alternate query parse modes (e.g. to parse PL/pgSQL assignments), as well as parsing and deparsing new Postgres syntax, such as SQL/JSON. We’ve released updated &lt;a href=&quot;https://github.com/pganalyze/pg_query&quot;&gt;Ruby&lt;/a&gt;, &lt;a href=&quot;https://github.com/pganalyze/pg_query.rs&quot;&gt;Rust&lt;/a&gt; and &lt;a href=&quot;https://github.com/pganalyze/pg_query_go&quot;&gt;Go&lt;/a&gt; bindings, and expect bindings maintained by the community, such as for Node.js and Python, to be updated soon as well.&lt;/p&gt;
&lt;p&gt;In this post, we showcase how to use pg_query in your application, and a few benefits of the new release. But first, let’s go back to the basics - how does pg_query work?&lt;/p&gt;
&lt;h2 id=&quot;pg_query-the-postgres-parser-as-a-standalone-c-library&quot; &gt;&lt;a href=&quot;#pg_query-the-postgres-parser-as-a-standalone-c-library&quot; aria-label=&quot;pg_query the postgres parser as a standalone c library permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;pg_query, the Postgres parser as a standalone C library&lt;/h2&gt;
&lt;p&gt;At its core, pg_query is all about making the “raw_parser” function from Postgres available. We’ve &lt;a href=&quot;https://pganalyze.com/blog/parse-postgresql-queries-in-ruby&quot;&gt;written about this in more detail in the original pg_query announcement&lt;/a&gt;, but the quick summary is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We apply a small number of patches on top of Postgres, e.g. to help with parsing $n parameter references in queries from pg_stat_statements&lt;/li&gt;
&lt;li&gt;We utilize libclang to build a tree of dependencies between functions and global variables in the Postgres source code&lt;/li&gt;
&lt;li&gt;In some cases, we apply mocks to avoid entering parts of Postgres we don’t need (e.g., functions that access the file system)&lt;/li&gt;
&lt;li&gt;We locate all the source code necessary for the functions we want to call (like “raw_parser”), and remove all other code, to make sure the compiler doesn’t do unnecessary work, or pull in functionality we don’t need&lt;/li&gt;
&lt;li&gt;From the built-in node definitions (which are C structs), we automatically create output functions for JSON and protocol buffers, to make it convenient to write bindings in other programming languages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Overall, this results in a library that can parse SQL text and return a Postgres parse tree for you to work with and modify, whilst supporting the full syntax that Postgres itself supports.&lt;/p&gt;
&lt;p&gt;From an end user perspective that means you can, for example in the Ruby library, use the following code to parse a query, and find out which table it&apos;s querying:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;pg_query&apos;&lt;/span&gt;
parsed_query &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM users&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
puts parsed_query&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;from_clause&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;.&lt;/span&gt;range_var&lt;span &gt;.&lt;/span&gt;inspect
&lt;span &gt;# =&gt; &amp;lt;PgQuery::RangeVar: catalogname: &quot;&quot;, schemaname: &quot;&quot;, relname: &quot;users&quot;, inh: true, relpersistence: &quot;p&quot;, location: 14&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The parse tree structs are automatically generated as &lt;a href=&quot;https://github.com/pganalyze/libpg_query/blob/16-latest/protobuf/pg_query.proto&quot;&gt;protocol buffer definitions&lt;/a&gt; based on Postgres’ internal structs located in &lt;a href=&quot;https://github.com/postgres/postgres/blob/REL_16_STABLE/src/include/nodes/parsenodes.h&quot;&gt;parsenodes.h&lt;/a&gt; and adjacent files, and the language-specific bindings can use each language’s protobuf libraries to have properly typed structs as well.&lt;/p&gt;
&lt;p&gt;The main change in the core parsing functionality in this release is that we’ve added support for compiling libpg_query on Windows (with either MSVC, or an MSYS2 stack using MinGW/etc), a &lt;a href=&quot;https://github.com/pganalyze/libpg_query/issues/44&quot;&gt;frequently requested feature&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;using-query-fingerprints-to-identify-queries-across-servers&quot; &gt;&lt;a href=&quot;#using-query-fingerprints-to-identify-queries-across-servers&quot; aria-label=&quot;using query fingerprints to identify queries across servers permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using query fingerprints to identify queries across servers&lt;/h2&gt;
&lt;p&gt;Besides parsing itself, there was another major use case that we needed to solve for pganalyze: &lt;strong&gt;The ability to group queries together.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Postgres itself generates a “queryid” to support this. Originally part of pg_stat_statements, it has been part of Postgres core since Postgres 14, and is generated when “compute_query_id” is enabled (automatically done when using pg_stat_statements). However, the Postgres queryid has its flaws: Besides not always grouping together as well as it could (e.g. in the case of IN lists), it’s not portable. If you ran the same query on two different servers, you would get two different query IDs. This difference in query IDs is primarily explained by the fact that Postgres determines which tables a query references based on the relation OIDs. But those OIDs are not stable across servers, as they are internal identifiers.&lt;/p&gt;
&lt;p&gt;With the &lt;strong&gt;pg_query fingerprint&lt;/strong&gt; we intentionally went another way: We utilize the name (and schema) of the table, as it is present in the raw parse tree that pg_query has access to, when generating a unique identifier for a query.&lt;/p&gt;
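&lt;p&gt;To make the portability problem concrete, here is a minimal Ruby sketch. It does not use pg_query or any Postgres internals: the two catalogs, the OID values, the helper names and the use of a truncated SHA-1 are all assumptions made up for illustration. It only shows why an identifier derived from a server-local OID differs between servers, while one derived from the schema and table name does not:&lt;/p&gt;

```ruby
require 'digest'

# Hypothetical catalogs: the same table has a different OID on each server.
SERVER_A = { 'public.users' => 16_384 }.freeze
SERVER_B = { 'public.users' => 24_601 }.freeze

# Illustrative identifier derived from the server-local OID.
def oid_based_id(catalog, table)
  Digest::SHA1.hexdigest("scan:#{catalog.fetch(table)}")[0, 16]
end

# Illustrative identifier derived from the schema-qualified name only.
def name_based_id(catalog, table)
  raise KeyError, table unless catalog.key?(table)

  Digest::SHA1.hexdigest("scan:#{table}")[0, 16]
end

# OID-based identifiers disagree across servers (prints false).
puts oid_based_id(SERVER_A, 'public.users') == oid_based_id(SERVER_B, 'public.users')
# Name-based identifiers agree across servers (prints true).
puts name_based_id(SERVER_A, 'public.users') == name_based_id(SERVER_B, 'public.users')
```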
&lt;p&gt;There are of course many other parts of a query we also take into consideration, e.g. referenced columns, expressions, functions, etc. To enable grouping we do not include constant values in the fingerprint, to ensure that two similar queries get the same fingerprint:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;fingerprint&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM users WHERE id = 1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;a0ead580058af585&quot;&lt;/span&gt;
&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;fingerprint&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM users WHERE id = 2&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;a0ead580058af585&quot;&lt;/span&gt;
&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;fingerprint&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM users WHERE email = $1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;e213d9d32c7097d5&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
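&lt;p&gt;The effect of leaving out constant values can be approximated with a deliberately naive, text-based sketch. This is not how pg_query computes fingerprints (it works on the parse tree, not on the SQL text), and the toy_fingerprint helper, its regular expressions and the truncated SHA-1 are assumptions for illustration only, so its output will not match real pg_query fingerprints:&lt;/p&gt;

```ruby
require 'digest'

# Naive stand-in for a fingerprint: replace literals in the SQL text with a
# placeholder, then hash what is left. pg_query itself hashes the parse tree.
def toy_fingerprint(sql)
  normalized = sql.gsub(/'[^']*'/, '?')         # string literals
  normalized = normalized.gsub(/\$\d+/, '?')    # $n parameter references
  normalized = normalized.gsub(/\b\d+\b/, '?')  # integer literals
  normalized = normalized.gsub(/\s+/, ' ').strip.downcase
  Digest::SHA1.hexdigest(normalized)[0, 16]
end

a = toy_fingerprint('SELECT * FROM users WHERE id = 1')
b = toy_fingerprint('SELECT * FROM users WHERE id = 2')
c = toy_fingerprint('SELECT * FROM users WHERE email = $1')

puts a == b # same shape, different constant: prints true
puts a == c # different column: prints false
```

&lt;p&gt;A text-based approach like this breaks down quickly (comments, quoted identifiers, IN lists of varying length), which is one reason to work from the parse tree instead.&lt;/p&gt;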
&lt;p&gt;What else can we use fingerprints for? One use case that we’ve heard about from pganalyze customers is to use query fingerprints to help identify the same query on both the application side and the database.&lt;/p&gt;
&lt;p&gt;Specifically, they use pg_query in application-side tracing to tag each query, and then, when looking at a slow trace, use that data in pganalyze to find more detailed information about database-side performance. This also inspired our &lt;a href=&quot;https://pganalyze.com/docs/opentelemetry&quot;&gt;recent integration with OpenTelemetry&lt;/a&gt;, which solves the same use case in a slightly different way.&lt;/p&gt;
&lt;h2 id=&quot;utilizing-deparsing-to-upgrade-queries-to-postgres-16-sqljson-syntax&quot; &gt;&lt;a href=&quot;#utilizing-deparsing-to-upgrade-queries-to-postgres-16-sqljson-syntax&quot; aria-label=&quot;utilizing deparsing to upgrade queries to postgres 16 sqljson syntax permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Utilizing deparsing to upgrade queries to Postgres 16 SQL/JSON syntax&lt;/h2&gt;
&lt;p&gt;Now to something new in the Postgres 16 release! In Postgres 16, one of the bigger syntax changes was the addition of SQL/JSON. pg_query fully supports it, for both parsing and deparsing (which turns a syntax tree back into a SQL statement).&lt;/p&gt;
&lt;p&gt;We can use the pg_query deparser to write the equivalent of a codemod for SQL statements, that rewrites the legacy syntax into the more standard SQL/JSON syntax.&lt;/p&gt;
&lt;p&gt;For example, imagine we have many places where we build JSON objects manually in SQL using the “json_build_object” function, and wanted to replace that with the new JSON_OBJECT syntax:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;q &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT json_build_object(&apos;key1&apos;, 1, &apos;key2&apos;, &apos;val&apos;);&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
q&lt;span &gt;.&lt;/span&gt;walk&lt;span &gt;!&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;node&lt;span &gt;|&lt;/span&gt;
  &lt;span &gt;next&lt;/span&gt; &lt;span &gt;unless&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;is_a&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;node &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:func_call&lt;/span&gt;
  func_name &lt;span &gt;=&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;func_call&lt;span &gt;.&lt;/span&gt;funcname&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;string&lt;span &gt;.&lt;/span&gt;sval
  &lt;span &gt;if&lt;/span&gt; func_name &lt;span &gt;==&lt;/span&gt; &lt;span &gt;&apos;json_build_object&apos;&lt;/span&gt;
    exprs &lt;span &gt;=&lt;/span&gt; node&lt;span &gt;.&lt;/span&gt;func_call&lt;span &gt;.&lt;/span&gt;args&lt;span &gt;.&lt;/span&gt;each_slice&lt;span &gt;(&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;map &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;key&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;|&lt;/span&gt;
      &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Node&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;from&lt;span &gt;(&lt;/span&gt;
        &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;JsonKeyValue&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
          key&lt;span &gt;:&lt;/span&gt; key&lt;span &gt;,&lt;/span&gt;
          value&lt;span &gt;:&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;JsonValueExpr&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;raw_expr&lt;span &gt;:&lt;/span&gt; value&lt;span &gt;)&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
    node&lt;span &gt;.&lt;/span&gt;inner &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;JsonObjectConstructor&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;exprs&lt;span &gt;:&lt;/span&gt; exprs&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;
q&lt;span &gt;.&lt;/span&gt;deparse
&lt;span &gt;# =&gt; &quot;SELECT JSON_OBJECT(&apos;key1&apos;: 1, &apos;key2&apos;: &apos;val&apos;)&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With each release, we test the pg_query deparser for completeness against the full set of Postgres regression tests, so whether it is SQL/JSON or other new syntax, you can rest assured that pg_query supports it.&lt;/p&gt;
&lt;h2 id=&quot;alternate-parse-modes-to-work-with-plpgsql-expressions&quot; &gt;&lt;a href=&quot;#alternate-parse-modes-to-work-with-plpgsql-expressions&quot; aria-label=&quot;alternate parse modes to work with plpgsql expressions permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Alternate parse modes to work with PL/pgSQL expressions&lt;/h2&gt;
&lt;p&gt;Since &lt;a href=&quot;https://github.com/postgres/postgres/commit/844fe9f159a948377907a63d0ef3fb16dc51ce50&quot;&gt;Postgres 14&lt;/a&gt;, PL/pgSQL expressions have been parsed through the regular “raw_parser” functionality, by passing a special mode flag that allows for PL/pgSQL-specific syntax.&lt;/p&gt;
&lt;p&gt;We didn’t support this in pg_query before, but thanks to &lt;a href=&quot;https://github.com/pganalyze/libpg_query/pull/216&quot;&gt;a contribution by Landan Cheruka&lt;/a&gt;, there is now a way to parse PL/pgSQL expressions directly with pg_query.&lt;/p&gt;
&lt;p&gt;Let’s first use parse_plpgsql to parse a function definition, using an example taken from the Postgres documentation:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; &lt;span &gt;REPLACE&lt;/span&gt; &lt;span &gt;FUNCTION&lt;/span&gt; cs_fmt_browser_version&lt;span &gt;(&lt;/span&gt;v_name &lt;span &gt;varchar&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                                              	  v_version &lt;span &gt;varchar&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;RETURNS&lt;/span&gt; &lt;span &gt;varchar&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; $$
&lt;span &gt;BEGIN&lt;/span&gt;
  &lt;span &gt;IF&lt;/span&gt; v_version &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;THEN&lt;/span&gt;
    &lt;span &gt;RETURN&lt;/span&gt; v_name&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;END&lt;/span&gt; &lt;span &gt;IF&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;RETURN&lt;/span&gt; v_name &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;/&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; v_version&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;END&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;$$&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;&quot;PLpgSQL_function&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;&quot;datums&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
      &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_var&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;refname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;v_name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;datatype&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;typname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;UNKNOWN&quot;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_var&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;refname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;v_version&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;datatype&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;typname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;UNKNOWN&quot;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_var&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;refname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;found&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;datatype&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;PLpgSQL_type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;typname&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;UNKNOWN&quot;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;action&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;&quot;PLpgSQL_stmt_block&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
        &lt;span &gt;&quot;body&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;PLpgSQL_stmt_if&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&quot;PLpgSQL_expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;query&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;v_version IS NULL&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;parseMode&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
              &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;then_body&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
                &lt;span &gt;{&lt;/span&gt;
                  &lt;span &gt;&quot;PLpgSQL_stmt_return&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                    &lt;span &gt;&quot;expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                      &lt;span &gt;&quot;PLpgSQL_expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;query&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;v_name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;parseMode&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
                    &lt;span &gt;}&lt;/span&gt;
...
            &lt;span &gt;&quot;PLpgSQL_stmt_return&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&quot;PLpgSQL_expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;&quot;query&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;v_name || &apos;/&apos; || v_version&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;parseMode&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this function parse tree, you can see the different PLpgSQL_expr nodes, but each expression itself is stored as plain text. We can now use the new pg_query_parse_opts function to turn that text into a parse tree:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&lt;span &gt;#&lt;/span&gt;&lt;span &gt;include&lt;/span&gt; &lt;span &gt;&amp;lt;pg_query.h&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span &gt;&lt;span &gt;#&lt;/span&gt;&lt;span &gt;include&lt;/span&gt; &lt;span &gt;&amp;lt;stdio.h&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span &gt;&lt;span &gt;#&lt;/span&gt;&lt;span &gt;include&lt;/span&gt; &lt;span &gt;&amp;lt;stdlib.h&gt;&lt;/span&gt;&lt;/span&gt;

&lt;span &gt;int&lt;/span&gt; &lt;span &gt;main&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  PgQueryParseResult result&lt;span &gt;;&lt;/span&gt;

  result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;pg_query_parse_opts&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;v_name || &apos;/&apos; || v_version&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; PG_QUERY_PARSE_PLPGSQL_EXPR&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;result&lt;span &gt;.&lt;/span&gt;error&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;printf&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;error: %s at %d\n&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; result&lt;span &gt;.&lt;/span&gt;error&lt;span &gt;-&gt;&lt;/span&gt;message&lt;span &gt;,&lt;/span&gt; result&lt;span &gt;.&lt;/span&gt;error&lt;span &gt;-&gt;&lt;/span&gt;cursorpos&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt; &lt;span &gt;else&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;printf&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;%s\n&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; result&lt;span &gt;.&lt;/span&gt;parse_tree&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;

  &lt;span &gt;pg_query_free_parse_result&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;result&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And that gives us a regular parse tree to work with:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;{&lt;/span&gt;
	&lt;span &gt;&quot;version&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;160001&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
	&lt;span &gt;&quot;stmts&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    	&lt;span &gt;{&lt;/span&gt;
        	&lt;span &gt;&quot;stmt&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
            	&lt;span &gt;&quot;SelectStmt&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                	&lt;span &gt;&quot;targetList&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
                    	&lt;span &gt;{&lt;/span&gt;
                        	&lt;span &gt;&quot;ResTarget&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                            	&lt;span &gt;&quot;val&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                                	&lt;span &gt;&quot;A_Expr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                                    	&lt;span &gt;&quot;kind&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;AEXPR_OP&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
…&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We’re still in the process of updating language bindings to support optionally using these parse modes, and would be curious to hear about more use cases for working with PL/pgSQL and pg_query.&lt;/p&gt;
&lt;h2 id=&quot;a-shout-out-to-the-community&quot; &gt;&lt;a href=&quot;#a-shout-out-to-the-community&quot; aria-label=&quot;a shout out to the community permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;A shout-out to the community&lt;/h2&gt;
&lt;p&gt;pg_query wouldn’t be the same without the community!&lt;/p&gt;
&lt;p&gt;We want to expressly call out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/lelit&quot;&gt;Lele Gaifax&lt;/a&gt; for maintaining the Python binding “pglast” and proactively testing libpg_query PRs&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/lcheruka&quot;&gt;Landan Cheruka&lt;/a&gt; for adding &lt;a href=&quot;https://github.com/pganalyze/libpg_query/pull/216&quot;&gt;support for alternate parse modes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/anuraaga&quot;&gt;Anuraag Agrawal&lt;/a&gt; for contributions to enable use in WebAssembly (see &lt;a href=&quot;https://github.com/wasilibs/go-pgquery&quot;&gt;pg_query_go without cgo&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/emin100&quot;&gt;Mehmet Emin KARAKAŞ&lt;/a&gt; for the many deparser improvements over the years&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/psteinroe&quot;&gt;Philipp Steinrötter&lt;/a&gt; for creating the &lt;a href=&quot;https://github.com/supabase/postgres_lsp&quot;&gt;Postgres Language Server&lt;/a&gt; based on pg_query.rs, and giving lots of good feedback on how things could work better&lt;/li&gt;
&lt;li&gt;And everyone else who contributed to libpg_query and related projects!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking ahead, we hope to continue our conversations with the Postgres community on how we could upstream parts of pg_query, so that a query parsing library could be provided directly as part of Postgres.&lt;/p&gt;
&lt;h2 id=&quot;in-conclusion&quot; &gt;&lt;a href=&quot;#in-conclusion&quot; aria-label=&quot;in conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;In conclusion&lt;/h2&gt;
&lt;p&gt;We’re excited about the &lt;strong&gt;&lt;a href=&quot;https://github.com/pganalyze/libpg_query&quot;&gt;new pg_query version&lt;/a&gt;&lt;/strong&gt;, and we’re always happy to hear about new use cases you find for using it to work with Postgres queries. If you have ideas on how pg_query could be better, feel free to &lt;a href=&quot;https://github.com/pganalyze/libpg_query&quot;&gt;open an issue on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And if you’ve benefited from pg_query in the past, and have not yet tried out pganalyze to optimize your Postgres performance, you can &lt;a href=&quot;https://app.pganalyze.com/users/sign_up&quot;&gt;try out pganalyze with our free 14-day trial&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres 16: Cumulative I/O statistics with pg_stat_io]]></title><description><![CDATA[One of the most common questions I get from people running Postgres databases at scale is:
How do I optimize the I/O operations of my database? Historically, getting a complete picture of all the I/O produced by a Postgres server has been challenging. To start with, Postgres splits its I/O activity into writing the WAL stream, and reads/writes to the data directory. The real challenge is understanding second-order effects around writes: Typically the write to the data directory happens after the…]]></description><link>https://pganalyze.com/blog/pg-stat-io</link><guid isPermaLink="false">https://pganalyze.com/blog/pg-stat-io</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Tue, 14 Feb 2023 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;One of the most common questions I get from people running Postgres databases at scale is:&lt;br /&gt;
&lt;strong&gt;How do I optimize the I/O operations of my database?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Historically, getting a complete picture of all the I/O produced by a Postgres server has been challenging. To start with, Postgres splits its I/O activity into writing the WAL stream, and reads/writes to the data directory. &lt;strong&gt;The real challenge is understanding second-order effects around writes&lt;/strong&gt;: Typically the write to the data directory happens after the transaction commits, and understanding which process actually writes to the data directory (and when) is hard.&lt;/p&gt;
&lt;p&gt;This whole situation has become an even bigger challenge in the cloud, when faced with provisioned IOPS, or worse, having to pay for individual I/Os like on Amazon Aurora. Often the solution has been to look at parts of the system that have instrumentation (such as individual queries), to get at least some sense for where the activity is happening.&lt;/p&gt;
&lt;p&gt;Last weekend, a &lt;strong&gt;major improvement to the visibility into I/O activity&lt;/strong&gt; &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a9c70b46dbe152e094f137f7e6ba9cd3a638ee25&quot;&gt;was committed&lt;/a&gt; to the upcoming Postgres 16 by Andres Freund, and authored by Melanie Plageman, with documentation contributed by Samay Sharma. My colleague Maciek Sakrejda and I have reviewed this patch through its various iterations, and we&apos;re very excited about what it brings to Postgres observability.&lt;/p&gt;
&lt;p&gt;Welcome, &lt;strong&gt;pg_stat_io&lt;/strong&gt;. Let&apos;s take a look:&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#querying-system-wide-io-statistics-in-postgres&quot;&gt;Querying system-wide I/O statistics in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#use-cases-for-pg_stat_io&quot;&gt;Use cases for pg_stat_io&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#tracking-write-io-activity-in-postgres&quot;&gt;Tracking Write I/O activity in Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improve-workload-stability-and-sizing-shared_buffers-by-monitoring-shared-buffer-evictions&quot;&gt;Improve workload stability and sizing shared_buffers by monitoring shared buffer evictions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tracking-cumulative-io-activity-by-autovacuum-and-manual-vacuums&quot;&gt;Tracking cumulative I/O activity by autovacuum and manual VACUUMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#visibility-into-bulk-readwrite-strategies-sequential-scans-and-copy&quot;&gt;Visibility into bulk read/write strategies (sequential scans and COPY)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#sneak-peek-visualizing-pg_stat_io-in-pganalyze&quot;&gt;Sneak peek: Visualizing pg_stat_io in pganalyze&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-future-of-io-observability-in-postgres&quot;&gt;The future of I/O observability in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;querying-system-wide-io-statistics-in-postgres&quot; &gt;&lt;a href=&quot;#querying-system-wide-io-statistics-in-postgres&quot; aria-label=&quot;querying system wide io statistics in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Querying system-wide I/O statistics in Postgres&lt;/h2&gt;
&lt;p&gt;Let&apos;s start with a local Postgres built fresh from the development branch. To build it I followed the &lt;a href=&quot;https://wiki.postgresql.org/wiki/Meson&quot;&gt;new cheatsheet for using the Meson build system&lt;/a&gt; (also new in Postgres 16), which significantly speeds up the build and test process. Note that Postgres 16 is still under heavy development, not even at beta stage, and should definitely not be used in production.&lt;/p&gt;
&lt;p&gt;We can start by querying &lt;code &gt;pg_stat_io&lt;/code&gt; to get a sense for which information is tracked, omitting rows that are empty:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stat_io &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;reads&lt;/span&gt; &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; writes &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; extends &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;    backend_type     | io_object | io_context |  reads   | writes  | extends | op_bytes | evictions |  reuses  | fsyncs |          stats_reset          
---------------------+-----------+------------+----------+---------+---------+----------+-----------+----------+--------+-------------------------------
 autovacuum launcher | relation  | normal     |       19 |       5 |         |     8192 |        13 |          |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker   | relation  | normal     |    15972 |    2494 |    2894 |     8192 |     17430 |          |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker   | relation  | vacuum     |  5754853 | 3006563 |       0 |     8192 |      2056 |  5752594 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | bulkread   | 25832582 |  626900 |         |     8192 |    753962 | 25074439 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | bulkwrite  |     4654 | 2858085 | 3259572 |     8192 |    998220 |  2209070 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | normal     |   960291 |  376524 |  159497 |     8192 |   1103707 |          |      0 | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | vacuum     |   128710 |       0 |       0 |     8192 |      1221 |   127489 |        | 2023-02-13 11:50:27.583875-08
 background worker   | relation  | bulkread   | 39059938 |  590896 |         |     8192 |    802939 | 38253662 |        | 2023-02-13 11:50:27.583875-08
 background worker   | relation  | normal     |   257533 |  118972 |       0 |     8192 |    256437 |          |      0 | 2023-02-13 11:50:27.583875-08
 background writer   | relation  | normal     |          |  243142 |         |     8192 |           |          |      0 | 2023-02-13 11:50:27.583875-08
 checkpointer        | relation  | normal     |          |  390141 |         |     8192 |           |          |  18812 | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | bulkwrite  |        0 |       0 |       8 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | normal     |      689 |     983 |     470 |     8192 |         0 |          |      0 | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | vacuum     |       10 |       0 |       0 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
(14 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;At a high level, this information can be interpreted as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Statistics are tracked for a given backend type, I/O object type (i.e. whether it&apos;s a temporary table), and I/O context (more on that later)&lt;/li&gt;
&lt;li&gt;The main statistics are counting I/O operations: &lt;strong&gt;reads&lt;/strong&gt;, &lt;strong&gt;writes&lt;/strong&gt; and &lt;strong&gt;extends&lt;/strong&gt; (a special kind of write to resize data files)&lt;/li&gt;
&lt;li&gt;For each I/O operation the size in bytes is noted to help interpret the statistics (currently always block size, i.e., usually 8kB)&lt;/li&gt;
&lt;li&gt;Additionally, the number of shared buffer evictions, ring buffer re-uses and fsync calls are tracked&lt;/li&gt;
&lt;/ul&gt;
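&lt;p&gt;Because every row carries &lt;code &gt;op_bytes&lt;/code&gt;, the counters can be converted into an approximate data volume. The following is a minimal sketch, assuming the column names shown in the output above (they come from a development build and may differ in a released version):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Approximate logical I/O volume per backend type.
-- NULL counters (operations that do not apply to a row) are treated as 0.
SELECT backend_type,
       pg_size_pretty(sum(coalesce(reads, 0) * op_bytes))   AS read_volume,
       pg_size_pretty(sum(coalesce(writes, 0) * op_bytes))  AS write_volume,
       pg_size_pretty(sum(coalesce(extends, 0) * op_bytes)) AS extend_volume
FROM pg_stat_io
GROUP BY backend_type
ORDER BY sum(coalesce(reads, 0) + coalesce(writes, 0)) DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;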
&lt;p&gt;&lt;strong&gt;On Postgres 16, this system-wide information will always be available.&lt;/strong&gt; You can find the complete details of each field in the &lt;a href=&quot;https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW&quot;&gt;Postgres documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Note that &lt;code &gt;pg_stat_io&lt;/code&gt; shows logical I/O operations issued by Postgres. Whilst this often eventually maps to an actual I/O to a disk (especially in the case of writes), the operating system has its own caching and batching mechanisms, and will, for example, often split an 8kB write into two individual 4kB writes to the file system.&lt;/p&gt;
&lt;p&gt;Generally we can assume that this captures all I/O issued by Postgres, except for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I/O for writing the Write-Ahead-Log (WAL)&lt;/li&gt;
&lt;li&gt;Special cases such as tables being moved between tablespaces&lt;/li&gt;
&lt;li&gt;Temporary files (such as used for sorts, or extensions like &lt;code &gt;pg_stat_statements&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that temporary relations are tracked (they are not the same as temporary files): In &lt;code &gt;pg_stat_io&lt;/code&gt; these are marked as &lt;code &gt;io_object = &apos;temp relation&apos;&lt;/code&gt; - you may know them as &quot;local buffers&quot; from other statistics views.&lt;/p&gt;
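&lt;p&gt;As a small sketch, I/O on temporary relations could be inspected like this, again assuming the column names from the development build used in this post:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- I/O on temporary relations (&quot;local buffers&quot;), per backend type and context
SELECT backend_type, io_context, reads, writes, extends
FROM pg_stat_io
WHERE io_object = &apos;temp relation&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;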
&lt;p&gt;With the basics in place, we can take a closer look at some use cases and learn why this matters.&lt;/p&gt;
&lt;h2 id=&quot;use-cases-for-pg_stat_io&quot; &gt;&lt;a href=&quot;#use-cases-for-pg_stat_io&quot; aria-label=&quot;use cases for pg_stat_io permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Use cases for pg_stat_io&lt;/h2&gt;
&lt;h3 id=&quot;tracking-write-io-activity-in-postgres&quot; &gt;&lt;a href=&quot;#tracking-write-io-activity-in-postgres&quot; aria-label=&quot;tracking write io activity in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Tracking Write I/O activity in Postgres&lt;/h3&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/62bff7b564c8c03267aab13d95edf800/ca98b/write_lifecycle.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Lifecycle of a write in Postgres&quot; title=&quot;Lifecycle of a write in Postgres&quot; src=&quot;https://pganalyze.com/static/62bff7b564c8c03267aab13d95edf800/1d69c/write_lifecycle.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;Lifecycle of a write in Postgres, and what is currently not visible in most statistics&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;When looking at a write in Postgres, we need to look beyond what a client sees as the query runtime, or something like pg_stat_statements can track. Postgres has a complex set of mechanisms that guarantee durability of writes, whilst allowing
clients to return quickly, trusting that the server has persisted the data in a crash safe manner.&lt;/p&gt;
&lt;p&gt;The first thing that Postgres does to persist data is to &lt;strong&gt;write it to the WAL.&lt;/strong&gt; Once this has succeeded, the client
will receive confirmation that the write has been successful. But what happens afterwards is where the additional
statistics tracking comes in handy.&lt;/p&gt;
&lt;p&gt;For example, if you look at a given INSERT statement in pg_stat_statements, the &lt;code &gt;shared_blks_written&lt;/code&gt; field is often going to tell you next to nothing, because the actual write to the data directory typically occurs at a later time, in order to batch writes for efficiency and to avoid I/O spikes.&lt;/p&gt;
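&lt;p&gt;To see this for yourself, a query along these lines can help. It is only a sketch: it assumes the &lt;code &gt;pg_stat_statements&lt;/code&gt; extension is installed, and the &lt;code &gt;ILIKE&lt;/code&gt; filter is a crude illustration rather than a reliable way to find all INSERTs:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Compare pages dirtied vs. pages actually written by INSERT statements.
-- shared_blks_written is often far lower than shared_blks_dirtied.
SELECT left(query, 60) AS query,
       calls,
       shared_blks_dirtied,
       shared_blks_written
FROM pg_stat_statements
WHERE query ILIKE &apos;insert%&apos;
ORDER BY shared_blks_dirtied DESC
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;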
&lt;p&gt;In addition to writing the WAL, &lt;strong&gt;Postgres will also update the shared (or local) buffers for the write.&lt;/strong&gt; Such an update
will mark the buffer page in question as &quot;dirty&quot;.&lt;/p&gt;
&lt;p&gt;Then, in most cases, another process is responsible for actually
writing the dirty page to the data directory. There are three main process types to consider:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The background writer:&lt;/strong&gt; Runs continuously in the background to write out (some) dirty pages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The checkpointer:&lt;/strong&gt; Runs on a scheduled basis, or based on amount of WAL written, and writes out all dirty pages not yet written&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;All other process types&lt;/strong&gt;, including regular client backends: Write out dirty pages if they need to evict the buffer page in question&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The main thing to understand is when the third case occurs - because &lt;strong&gt;it can drastically slow down queries&lt;/strong&gt;. Even a simple &quot;SELECT&quot; might have to suddenly write to disk, before it has enough space in shared buffers to read in its data.&lt;/p&gt;
&lt;p&gt;Historically you were already able to see some of this activity through the &lt;code &gt;pg_stat_bgwriter&lt;/code&gt; view, specifically the fields named &lt;code &gt;buffers_&lt;/code&gt;. However, this was incomplete, did not consider autovacuum activity explicitly, and did not let you understand the root cause of a write (e.g. a buffer eviction).&lt;/p&gt;
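&lt;p&gt;For comparison, this is roughly what the older view offers. The column list below is what I would expect on Postgres 15 and earlier, and it may differ in other releases:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Buffers written by the checkpointer, the background writer and backends,
-- without any breakdown by I/O context or reason for the write
SELECT buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_alloc
FROM pg_stat_bgwriter;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;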
&lt;p&gt;With &lt;code &gt;pg_stat_io&lt;/code&gt; you can simply look at the &lt;code &gt;writes&lt;/code&gt; field, and see both an accurate aggregate number, as well as exactly which process in Postgres actually ended up writing your data to disk.&lt;/p&gt;
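&lt;p&gt;One way to get that breakdown is to aggregate &lt;code &gt;writes&lt;/code&gt; per backend type. This is a sketch that assumes the column names shown earlier in this post:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Which process types wrote data to disk, and their share of all writes
SELECT backend_type,
       sum(writes) AS writes,
       round(100.0 * sum(writes) / nullif(sum(sum(writes)) OVER (), 0), 1) AS pct_of_writes
FROM pg_stat_io
WHERE writes IS NOT NULL
GROUP BY backend_type
ORDER BY sum(writes) DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A high share for &lt;code &gt;client backend&lt;/code&gt; would point at the third case described above.&lt;/p&gt;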
&lt;h3 id=&quot;improve-workload-stability-and-sizing-shared_buffers-by-monitoring-shared-buffer-evictions&quot; &gt;&lt;a href=&quot;#improve-workload-stability-and-sizing-shared_buffers-by-monitoring-shared-buffer-evictions&quot; aria-label=&quot;improve workload stability and sizing shared_buffers by monitoring shared buffer evictions permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Improve workload stability and sizing shared_buffers by monitoring shared buffer evictions&lt;/h3&gt;
&lt;p&gt;One of the most important situations that &lt;code &gt;pg_stat_io&lt;/code&gt; helps clarify is when a buffer page in shared buffers is evicted. Since shared buffers is a fixed-size pool of pages (each 8kB in size, on most Postgres systems), what is cached inside it matters a great deal - &lt;strong&gt;especially when your working set exceeds shared buffers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;By default, if you&apos;re on a self-managed Postgres, the &lt;code &gt;shared_buffers&lt;/code&gt; setting is set to 128MB - or about 16,000 pages. Let&apos;s imagine you end up having loaded something through a very inefficient index scan, that ended up consuming all 128MB.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What happens when you suddenly read something completely different?&lt;/strong&gt; Postgres has to go and remove some of the old data from cache - also known as evicting a buffer page.&lt;/p&gt;
&lt;p&gt;This eviction has two main effects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Data that was in Postgres buffer cache before, is no longer in the cache (note it may still be in the OS page cache)&lt;/li&gt;
&lt;li&gt;If the page that was evicted was marked as &quot;dirty&quot;, the process evicting it also has to write the old page to disk&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both of these aspects matter for sizing shared buffers, and &lt;code &gt;pg_stat_io&lt;/code&gt; can clearly show this by tracking &lt;code &gt;evictions&lt;/code&gt; for each backend type across the system. Further, if you see a sudden spike in evictions, and then suddenly a lot of &lt;code &gt;reads&lt;/code&gt;, it can help you infer that the cached data that was evicted, was actually needed again shortly afterwards. If in doubt, you can use the &lt;code &gt;pg_buffercache&lt;/code&gt; extension to look at the current shared buffers contents in detail.&lt;/p&gt;
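&lt;p&gt;A minimal sketch of such a check, assuming the column names from the development build used in this post, restricts the view to regular shared buffer activity:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Shared buffer evictions per backend type, next to reads and writes,
-- limited to the &apos;normal&apos; context (i.e. excluding ring buffer strategies)
SELECT backend_type, evictions, reads, writes
FROM pg_stat_io
WHERE io_object = &apos;relation&apos;
  AND io_context = &apos;normal&apos;
  AND evictions &gt; 0
ORDER BY evictions DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;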
&lt;h3 id=&quot;tracking-cumulative-io-activity-by-autovacuum-and-manual-vacuums&quot; &gt;&lt;a href=&quot;#tracking-cumulative-io-activity-by-autovacuum-and-manual-vacuums&quot; aria-label=&quot;tracking cumulative io activity by autovacuum and manual vacuums permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Tracking cumulative I/O activity by autovacuum and manual VACUUMs&lt;/h3&gt;
&lt;p&gt;It&apos;s a fact that every Postgres server needs the occasional VACUUM - whether you schedule it manually, or have autovacuum take care of it for you. It helps clean up dead rows and makes space re-usable, and it freezes pages to prevent transaction ID wraparound.&lt;/p&gt;
&lt;p&gt;But there is such a thing as VACUUMing too often. If not tuned correctly, VACUUM and autovacuum can have a dramatic effect on I/O activity. Historically the best bet was to look at the output of &lt;code &gt;log_autovacuum_min_duration&lt;/code&gt;, which will give you information like this:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;  LOG:  automatic vacuum of table &quot;mydb.pg_toast.pg_toast_42593&quot;: index scans: 0
        pages: 0 removed, 13594 remain, 13594 scanned (100.00% of total)
        tuples: 0 removed, 54515 remain, 0 are dead but not yet removable
        removable cutoff: 11915, which was 6 XIDs old when operation ended
        new relfrozenxid: 11915, which is 4139 XIDs ahead of previous value
        frozen: 13594 pages from table (100.00% of total) had 54515 tuples frozen
        index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
        avg read rate: 0.113 MB/s, avg write rate: 0.113 MB/s
        buffer usage: 13614 hits, 13602 misses, 13600 dirtied
        WAL usage: 40786 records, 13600 full page images, 113072608 bytes
        system usage: CPU: user: 0.26 s, system: 0.52 s, elapsed: 939.84 s&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;From the &lt;code &gt;buffer usage&lt;/code&gt; you can determine that this single VACUUM had to read 13602 pages, and marked 13600 pages as dirty. But what if we want to get a more complete picture, and across all our VACUUMs?&lt;/p&gt;
&lt;p&gt;With &lt;code &gt;pg_stat_io&lt;/code&gt;, you can now see a system-wide measurement of the impact of VACUUM, by looking at everything marked as &lt;code &gt;io_context = &apos;vacuum&apos;&lt;/code&gt;, or associated with the &lt;code &gt;autovacuum worker&lt;/code&gt; backend type:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stat_io &lt;span &gt;WHERE&lt;/span&gt; backend_type &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;autovacuum worker&apos;&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;io_context &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;vacuum&apos;&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;reads&lt;/span&gt; &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; writes &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; extends &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;    backend_type    | io_object | io_context |  reads  | writes  | extends | op_bytes | evictions | reuses  | fsyncs |          stats_reset          
--------------------+-----------+------------+---------+---------+---------+----------+-----------+---------+--------+-------------------------------
 autovacuum worker  | relation  | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-13 11:50:27.583875-08
 autovacuum worker  | relation  | normal     |   16306 |    2494 |    2915 |     8192 |     17785 |         |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker  | relation  | vacuum     | 5824251 | 3028684 |       0 |     8192 |      2588 | 5821460 |        | 2023-02-13 11:50:27.583875-08
 client backend     | relation  | vacuum     |  128710 |       0 |       0 |     8192 |      1221 |  127489 |        | 2023-02-13 11:50:27.583875-08
 standalone backend | relation  | vacuum     |      10 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-13 11:50:27.583875-08
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this particular example, the autovacuum worker has, in the &lt;code &gt;vacuum&lt;/code&gt; context alone, read 44.4 GB of data (5,824,251 buffer pages) and written 23.1 GB (3,028,684 buffer pages).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you track these statistics over time&lt;/strong&gt;, it will help you have a crystal-clear picture of whether autovacuum is to blame for an I/O spike during business hours. It will also help you make changes to tune autovacuum with more confidence, e.g. making autovacuum more aggressive to prevent bloat.&lt;/p&gt;
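&lt;p&gt;Since the counters are cumulative, tracking them over time means storing periodic snapshots and comparing them. The sketch below shows one possible approach; the &lt;code &gt;io_snapshots&lt;/code&gt; table name is hypothetical, and the column names are those of the development build used in this post:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical snapshot table with the same columns as pg_stat_io
CREATE TABLE IF NOT EXISTS io_snapshots AS
SELECT now() AS captured_at, * FROM pg_stat_io
WITH NO DATA;

-- Run periodically (for example from a scheduled job) to record a snapshot
INSERT INTO io_snapshots
SELECT now(), * FROM pg_stat_io;

-- Autovacuum reads and writes between consecutive snapshots.
-- Deltas turn negative if the statistics were reset in between.
SELECT captured_at,
       reads  - lag(reads)  OVER w AS reads_delta,
       writes - lag(writes) OVER w AS writes_delta
FROM io_snapshots
WHERE backend_type = &apos;autovacuum worker&apos;
  AND io_context = &apos;vacuum&apos;
WINDOW w AS (PARTITION BY io_object ORDER BY captured_at)
ORDER BY captured_at;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;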
&lt;h3 id=&quot;visibility-into-bulk-readwrite-strategies-sequential-scans-and-copy&quot; &gt;&lt;a href=&quot;#visibility-into-bulk-readwrite-strategies-sequential-scans-and-copy&quot; aria-label=&quot;visibility into bulk readwrite strategies sequential scans and copy permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Visibility into bulk read/write strategies (sequential scans and COPY)&lt;/h3&gt;
&lt;p&gt;Have you ever used COPY in Postgres to load data? Or read data from a table using a sequential scan? You may not know that in most cases, this data does not pass through shared buffers in the regular way. Instead, Postgres uses a special dedicated ring buffer that ensures that most of shared buffers is undisturbed by such large activities.&lt;/p&gt;
&lt;p&gt;Before &lt;code &gt;pg_stat_io&lt;/code&gt;, it was near impossible to understand this activity in Postgres, as &lt;strong&gt;there was simply no tracking for it&lt;/strong&gt;. Now, we can finally see both bulk reads (typically large sequential scans) and bulk writes (typically COPY in), and the I/O activity they cause.&lt;/p&gt;
&lt;p&gt;You can simply filter for the new &lt;code &gt;bulkwrite&lt;/code&gt; and &lt;code &gt;bulkread&lt;/code&gt; values in &lt;code &gt;io_context&lt;/code&gt;, and have visibility into this activity:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stat_io &lt;span &gt;WHERE&lt;/span&gt; io_context &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;bulkread&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;bulkwrite&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;reads&lt;/span&gt; &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; writes &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;OR&lt;/span&gt; extends &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;    backend_type    | io_object | io_context |  reads   | writes  | extends | op_bytes | evictions |  reuses  | fsyncs |          stats_reset          
--------------------+-----------+------------+----------+---------+---------+----------+-----------+----------+--------+-------------------------------
 client backend     | relation  | bulkread   | 25900458 |  627059 |         |     8192 |    754610 | 25141667 |        | 2023-02-13 11:50:27.583875-08
 client backend     | relation  | bulkwrite  |     4654 | 2858085 | 3259572 |     8192 |    998220 |  2209070 |        | 2023-02-13 11:50:27.583875-08
 background worker  | relation  | bulkread   | 39059938 |  590896 |         |     8192 |    802939 | 38253662 |        | 2023-02-13 11:50:27.583875-08
 standalone backend | relation  | bulkwrite  |        0 |       0 |       8 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example, there is 495 GB of bulk read activity, and 21 GB of bulk write activity we had no good way of identifying before. However, and most importantly, we don&apos;t have to worry about the &lt;code &gt;evictions&lt;/code&gt; count here - these are all evictions from the special bulk read / bulk write ring buffer, not from regular shared buffers.&lt;/p&gt;
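&lt;p&gt;These totals follow from the output above by multiplying the page counts by the 8192 byte block size. As a worked check using the literal numbers from the table (bulk reads by client backends and background workers, and bulk writes by client backends):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Convert page counts from the example output into GB (1 GB = 1024^3 bytes).
-- The bigint cast avoids integer overflow when multiplying by the block size.
SELECT round((25900458 + 39059938)::bigint * 8192 / (1024.0 * 1024 * 1024), 1) AS bulk_read_gb,
       round(2858085::bigint * 8192 / (1024.0 * 1024 * 1024), 1)               AS bulk_write_gb;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;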
&lt;h2 id=&quot;sneak-peek-visualizing-pg_stat_io-in-pganalyze&quot; &gt;&lt;a href=&quot;#sneak-peek-visualizing-pg_stat_io-in-pganalyze&quot; aria-label=&quot;sneak peek visualizing pg_stat_io in pganalyze permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Sneak peek: Visualizing pg_stat_io in pganalyze&lt;/h2&gt;
&lt;p&gt;It&apos;s still a while until Postgres 16 will be released (usually September or October each year), but to help test things (and because it&apos;s exciting!) &lt;strong&gt;I took a quick stab at updating pganalyze in an experimental branch&lt;/strong&gt; to collect &lt;code &gt;pg_stat_io&lt;/code&gt; metrics and visualize them over time.&lt;/p&gt;
&lt;p&gt;Here is a very early look at what this may look like in the future:&lt;/p&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/9643f0e4baa16706490fda146c7a3791/56fb6/pganalyze_pg_stat_io.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Screenshot of experimental pg_stat_io view in pganalyze&quot; title=&quot;Screenshot of experimental pg_stat_io view in pganalyze&quot; src=&quot;https://pganalyze.com/static/9643f0e4baa16706490fda146c7a3791/1d69c/pganalyze_pg_stat_io.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;Experimental view of what pg_stat_io could look like when visualized over time&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Even though this is just running locally on my laptop, we can already see a clear pattern: most of the time, writes are done by the checkpointer and background writer processes. We can also see my &lt;code &gt;checkpoint_timeout&lt;/code&gt; being set to &lt;code &gt;5min&lt;/code&gt; (the default), with both &lt;strong&gt;writes and fsyncs happening like clockwork&lt;/strong&gt; - note the workload is periodic every 10 minutes, so every second checkpoint has less work to do.&lt;/p&gt;
&lt;p&gt;However, we can also clearly see a spike in activity - and that spike can be easily explained: To generate more database activity, I triggered a big daily background process around 8:10pm UTC. The large amount of data read caused the working set to momentarily exceed shared buffers, which led to a large number of buffer evictions, which in turn &lt;strong&gt;forced the client backend to write out buffer pages unexpectedly&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;On this system I have a very small &lt;code &gt;shared_buffers&lt;/code&gt; setting (the default, 128 MB). I should probably increase shared_buffers...&lt;/p&gt;
&lt;h2 id=&quot;the-future-of-io-observability-in-postgres&quot; &gt;&lt;a href=&quot;#the-future-of-io-observability-in-postgres&quot; aria-label=&quot;the future of io observability in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The future of I/O observability in Postgres&lt;/h2&gt;
&lt;p&gt;A lot of the groundwork for &lt;code &gt;pg_stat_io&lt;/code&gt; actually happened previously in Postgres 15, through the new cumulative statistics system using shared memory.&lt;/p&gt;
&lt;p&gt;Before Postgres 15, statistics tracking had to go through the statistics collector (an obscure process that received UDP packets from the individual Postgres processes), which was slow and error-prone. This historically limited the ability to collect more advanced statistics easily. As the addition of &lt;code &gt;pg_stat_io&lt;/code&gt; shows, it is now much easier to track additional information about how Postgres operates.&lt;/p&gt;
&lt;p&gt;The immediate improvements already being discussed include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tracking of system-wide buffer cache hits (to allow calculating an accurate buffer cache hit ratio)&lt;/li&gt;
&lt;li&gt;Cumulative system-wide I/O times (not just I/O counts as currently present in &lt;code &gt;pg_stat_io&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Better cumulative WAL statistics (i.e. going beyond what pg_stat_wal offers)&lt;/li&gt;
&lt;li&gt;Additional I/O tracking for tables and indexes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our team at pganalyze is excited to have helped shape the new &lt;code &gt;pg_stat_io&lt;/code&gt; view, and we look forward to continuing to work with the community on making Postgres better.&lt;/p&gt;
&lt;p&gt;Share this article: If you&apos;d like to share this article with your peers, you can &lt;a href=&quot;https://twitter.com/intent/tweet?text=Waiting%20for%20Postgres%2016:%20Cumulative%20I/O%20statistics%20with%20pg_stat_io%20-%20Check%20out%20this%20article%20by%20%40pganalyze%20%20and%20learn%20about%20querying%20system%20wide%20I/O%20statistics%20in%20Postgres%3A%20https%3A%2F%2Fpganalyze.com%2Fblog%2Fpg-stat-io&quot;&gt;tweet about it here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PS:&lt;/strong&gt; If you&apos;re interested in learning more about optimizing Postgres I/O performance and costs you can &lt;a href=&quot;https://pganalyze.com/webinars/optimizing-postgres-io-performance-and-costs&quot;&gt;check out our webinar recording&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Lock monitoring in Postgres: Find blocking queries in a lock tree with pganalyze]]></title><description><![CDATA[Postgres databases power many mission critical applications, and applications expect consistent query performance. If even a single query takes longer than expected, it can lead to unhappy users, or delayed background processes. We can use  to debug a slow query, but there is one Postgres problem it won't tell us about: Blocked queries. You may also know this as "blocked sessions" from other database systems. This is when one query holds a lock on a table and the other is waiting for those locks…]]></description><link>https://pganalyze.com/blog/postgres-lock-monitoring</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-lock-monitoring</guid><dc:creator><![CDATA[Keiko Oda]]></dc:creator><pubDate>Thu, 01 Dec 2022 13:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Postgres databases power many mission critical applications, and applications expect consistent query performance. If even a single query takes longer than expected, it can lead to unhappy users, or delayed background processes. We can use &lt;code &gt;EXPLAIN&lt;/code&gt; to debug a slow query, but there is one Postgres problem it won&apos;t tell us about: Blocked queries. You may also know this as &quot;blocked sessions&quot; from other database systems. This is when one query holds a lock on a table and the other is waiting for those locks to be released.&lt;/p&gt;
&lt;p&gt;Historically, the solution for Postgres lock monitoring was to run a set of queries provided by the community to debug the issue. These queries either look at the &lt;code &gt;pg_locks&lt;/code&gt; view in Postgres, or use the newer &lt;code &gt;pg_blocking_pids()&lt;/code&gt; function to walk the lock tree in Postgres. But this involves a lot of manual work, as well as being present when the problem occurs. If a problem happened earlier in the day and resolved itself, the lock information is already gone.&lt;/p&gt;
&lt;p&gt;Today, we&apos;re excited to announce a better method for Postgres lock monitoring and alerting. The new pganalyze Lock Monitoring feature automatically detects locking/blocking queries as they happen, can alert you of production incidents in near-real time, and keeps a history of past locking incidents to help you understand an earlier locking problem.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#introducing-the-new-pganalyze-lock-monitoring-feature&quot;&gt;Introducing the new pganalyze Lock Monitoring feature&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#identifying-postgres-connections-that-block-queries-and-lead-to-cascading-lock-waits&quot;&gt;Identifying Postgres connections that block queries, and lead to cascading lock waits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#long-running-migrations-that-hold-exclusive-locks-for-too-long&quot;&gt;Long running migrations that hold exclusive locks for too long&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#get-alerted-of-blockinglocking-query-problems-in-near-real-time&quot;&gt;Get alerted of blocking/locking query problems in near-real time&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#behind-the-scenes-pg_blocking_pids&quot;&gt;Behind the scenes: pg_blocking_pids()&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#try-the-new-pganalyze-lock-monitoring-features-now&quot;&gt;Try the new pganalyze Lock Monitoring features now&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;introducing-the-new-pganalyze-lock-monitoring-feature&quot; &gt;&lt;a href=&quot;#introducing-the-new-pganalyze-lock-monitoring-feature&quot; aria-label=&quot;introducing the new pganalyze lock monitoring feature permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Introducing the new pganalyze Lock Monitoring feature&lt;/h2&gt;
&lt;figure&gt;
&lt;img src=&quot;https://pganalyze.com/d994506102718bab9993366d181c5b55/lock_monitoring.gif&quot; alt=&quot;Demonstration of how an idle connection progresses and blocks DELETE FROM and ALTER TABLE queries which in turn block 3 other SELECT queries&quot;&gt;
&lt;figcaption&gt;
Demonstration of how an idle connection progresses and blocks DELETE FROM and ALTER TABLE queries which in turn block 3 other SELECT queries
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Previously, pganalyze already collected &lt;a href=&quot;https://pganalyze.com/blog/postgres-connection-tracing-wait-event-analysis-and-vacuum-monitoring&quot;&gt;wait events&lt;/a&gt;. These events tell you what Postgres connections are waiting on, and the Wait Event History makes it easy to find outliers over time. With the &lt;strong&gt;new extended Connections page&lt;/strong&gt;, you can now see at a glance which connection is blocking other queries, quickly jump to historic locking problems, and see indirect relationships.&lt;/p&gt;
&lt;p&gt;For example, when you have many &quot;Waiting for Lock&quot; connections, the database is likely having some trouble. It can be challenging to identify why a query is blocked. The new pganalyze Lock Monitoring feature lets you follow &lt;strong&gt;the whole story&lt;/strong&gt;, from queries that are waiting to the connection that is causing the lock waits in the first place, and helps you prioritize the issues you should resolve first.&lt;/p&gt;
&lt;p&gt;Next, let&apos;s look at two typical examples of blocked queries that you would encounter with a production application:&lt;/p&gt;
&lt;h3 id=&quot;identifying-postgres-connections-that-block-queries-and-lead-to-cascading-lock-waits&quot; &gt;&lt;a href=&quot;#identifying-postgres-connections-that-block-queries-and-lead-to-cascading-lock-waits&quot; aria-label=&quot;identifying postgres connections that block queries and lead to cascading lock waits permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Identifying Postgres connections that block queries, and lead to cascading lock waits&lt;/h3&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/99b23140456d2a2110bdd84bef0b76eb/d165a/cascading_lock_waits_connection_traces.jpg&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Long-running query with lots of operations on multiple tables holding locks longer than usual&quot; title=&quot;Long-running query with lots of operations on multiple tables holding locks longer than usual&quot; src=&quot;https://pganalyze.com/static/99b23140456d2a2110bdd84bef0b76eb/acb04/cascading_lock_waits_connection_traces.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;
Long-running query with lots of operations on multiple tables holding locks longer than usual
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This first example is from a production situation we encountered on the pganalyze application database itself. The bottom two queries (PID: 24825 and 30051) were waiting for tuples (row versions) that the bolded query (PID: 48665) was also going to lock, and the bolded query had priority in the lock tree. That query, in turn, was waiting for a deletion performed by the long-running query above (PID: 27542).&lt;/p&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/e55f63474dee8f87ee38228851189e09/2cefc/cascading_lock_waits_lock_tree.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Lock tree of cascading lock waits&quot; title=&quot;Lock tree of cascading lock waits&quot; src=&quot;https://pganalyze.com/static/e55f63474dee8f87ee38228851189e09/1d69c/cascading_lock_waits_lock_tree.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;Lock tree of cascading lock waits&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here, we had one recurring long-running query that was combining &lt;code &gt;DELETE&lt;/code&gt;s on multiple tables, and holding locks longer than usual. Therefore it was not a situation of &quot;somebody ran a bad query by accident&quot; or &quot;somebody wrote a migration that takes an exclusive lock on the table for a long time&quot;, but rather that we needed to re-think how to avoid this query in the first place.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Specifically, in this application we saw two possible solutions here:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Split up the query into multiple smaller queries, and potentially only delete a subset of rows at a time&lt;/li&gt;
&lt;li&gt;Use table partitioning to avoid a pattern where a daily &lt;code &gt;DELETE&lt;/code&gt; is necessary&lt;/li&gt;
&lt;/ol&gt;
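&lt;p&gt;As a rough illustration of the first option, the sketch below deletes rows in small batches so that no single statement holds row locks for long. It is not the code we ended up running: the &lt;code &gt;delete_in_batches&lt;/code&gt; helper, the &lt;code &gt;snapshots&lt;/code&gt; table and the &lt;code &gt;expired&lt;/code&gt; column are hypothetical names, and it assumes each batch runs as its own statement and transaction. The demo at the end uses an in-memory array in place of a database connection.&lt;/p&gt;

```ruby
# Hypothetical helper: repeatedly calls the given block, which is expected to
# delete at most `batch_size` rows and return how many rows it removed.
# Stops as soon as a batch comes back short, meaning nothing is left to delete.
def delete_in_batches(batch_size:, pause_seconds: 0)
  total = 0
  loop do
    deleted = yield(batch_size)
    total += deleted
    break unless deleted >= batch_size
    # Optional pause so other queries get a chance to acquire their locks.
    sleep(pause_seconds) if pause_seconds.positive?
  end
  total
end

# A statement a caller might run per batch (hypothetical table and column):
BATCH_DELETE_SQL =
  "DELETE FROM snapshots WHERE id IN (SELECT id FROM snapshots WHERE expired LIMIT $1)"

# Demo with an in-memory array standing in for the table.
rows = (1..25).to_a
total = delete_in_batches(batch_size: 10) { |limit| rows.shift(limit).size }
puts "deleted #{total} rows in batches of 10"
```

&lt;p&gt;In a real application the block would execute the batch statement against the database and return the number of affected rows. Smaller batches shorten each lock hold at the cost of more round trips.&lt;/p&gt;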
&lt;p&gt;Let&apos;s take a look at another common locking situation: Schema migrations with slow DDL statements.&lt;/p&gt;
&lt;h3 id=&quot;long-running-migrations-that-hold-exclusive-locks-for-too-long&quot; &gt;&lt;a href=&quot;#long-running-migrations-that-hold-exclusive-locks-for-too-long&quot; aria-label=&quot;long running migrations that hold exclusive locks for too long permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Long running migrations that hold exclusive locks for too long&lt;/h3&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/fdcf0b4aa32993634533b89dcb7e20c5/c1b63/long_running_migration.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Connection holding an AccessExclusive lock on the articles table, taken by an earlier DDL statement in the same transaction, blocking other queries on the table&quot; title=&quot;Connection holding an AccessExclusive lock on the articles table, taken by an earlier DDL statement in the same transaction, blocking other queries on the table&quot; src=&quot;https://pganalyze.com/static/fdcf0b4aa32993634533b89dcb7e20c5/1d69c/long_running_migration.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;
Connection holding an AccessExclusive lock on the articles table, taken by an earlier DDL statement in the same transaction, blocking other queries on the table
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The second example is a common locking situation: long running migrations. There are several types of migrations that can lock the table and block other queries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Let&apos;s look at the following scenario:&lt;/strong&gt; a new column called &lt;code &gt;data&lt;/code&gt; is introduced to the table &lt;code &gt;articles&lt;/code&gt;, and that column needs to be backfilled. A migration script in Rails could look like this:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;AddDataToArticles&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;7.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    add_column &lt;span &gt;:articles&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:data&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:text&lt;/span&gt;
    &lt;span &gt;Article&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;update_all data&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;backfilling_value&quot;&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Another way of looking at this Rails migration is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start transaction (happens automatically in Rails migrations)&lt;/li&gt;
&lt;li&gt;Add column to the table (this is very fast, but takes an exclusive lock)&lt;/li&gt;
&lt;li&gt;Run the backfill query (this is slow)&lt;/li&gt;
&lt;li&gt;Commit the transaction and release the locks&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Since this migration will happen in one transaction, that &lt;code &gt;Article.update_all data: &quot;backfilling_value&quot;&lt;/code&gt; would happen inside of the transaction (PID: 44248), with the query being &lt;code &gt;UPDATE articles SET data = $1&lt;/code&gt; like you can see in the screenshot. That transaction would hold the exclusive lock on the articles table from the &lt;code &gt;add_column :articles, :data, :text&lt;/code&gt; part.&lt;/p&gt;
&lt;p&gt;Now, what effect would this have on this database? With this example, backfilling 70M rows took almost 10 minutes. During this time, any queries that include the articles table (even a simple &lt;code &gt;SELECT&lt;/code&gt;) had to wait for the migration to be done. If this was a web application, this migration would have taken the application down for 10 minutes!&lt;/p&gt;
&lt;p&gt;It is very important that we don&apos;t write the migration like this to begin with. However, in case we do run such a migration, it is helpful to quickly know the lock information, so we can take the appropriate action, like canceling a query/migration. As a side note, you can find great examples of &quot;bad migrations&quot; in the &lt;a href=&quot;https://github.com/ankane/strong_migrations&quot;&gt;strong migrations&lt;/a&gt; project.&lt;/p&gt;
&lt;h2 id=&quot;get-alerted-of-blockinglocking-query-problems-in-near-real-time&quot; &gt;&lt;a href=&quot;#get-alerted-of-blockinglocking-query-problems-in-near-real-time&quot; aria-label=&quot;get alerted of blockinglocking query problems in near real time permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Get alerted of blocking/locking query problems in near-real time&lt;/h2&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/69bc65fbf856c113b46ebda8004d16b0/2cefc/lock_monitoring_alert.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Lock Monitoring alert&quot; title=&quot;Lock Monitoring alert&quot; src=&quot;https://pganalyze.com/static/69bc65fbf856c113b46ebda8004d16b0/1d69c/lock_monitoring_alert.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;Lock Monitoring alert&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The new pganalyze Lock Monitoring feature includes a new &quot;Blocking Queries&quot; alert that will notify you when a query is blocking other queries for more than a specified time threshold. By default, the alert will trigger only when the query is blocking 3 or more other queries for more than 5 minutes, and consider it critical after 10 minutes. You can configure this based on your operational standards to something as low as 1 query being blocked for 10 seconds, if you would like to get notified for &lt;em&gt;any&lt;/em&gt; query being blocked right away (we don&apos;t recommend actually setting it this low for most environments).&lt;/p&gt;
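&lt;p&gt;To make the default rule concrete, here is a minimal sketch of how such a threshold check could be expressed. This is not the pganalyze implementation: the &lt;code &gt;blocking_alert_level&lt;/code&gt; function, its parameter names and the returned symbols are hypothetical, and treating the thresholds as strict &quot;more than&quot; boundaries is an assumption.&lt;/p&gt;

```ruby
# Hypothetical constants, copied from the defaults described in the text.
DEFAULT_MIN_BLOCKED_QUERIES = 3
DEFAULT_WARNING_AFTER_SECONDS = 5 * 60
DEFAULT_CRITICAL_AFTER_SECONDS = 10 * 60

# Returns :none, :warning or :critical for a single blocking query.
# Boundary handling (strictly greater than the threshold) is an assumption.
def blocking_alert_level(blocked_count:, blocking_seconds:,
                         min_blocked: DEFAULT_MIN_BLOCKED_QUERIES,
                         warning_after: DEFAULT_WARNING_AFTER_SECONDS,
                         critical_after: DEFAULT_CRITICAL_AFTER_SECONDS)
  # Too few blocked queries: never alert, no matter how long they wait.
  return :none unless blocked_count >= min_blocked
  return :critical if blocking_seconds > critical_after
  return :warning if blocking_seconds > warning_after
  :none
end

# Default rule: 3 blocked queries for a bit over 5 minutes.
puts blocking_alert_level(blocked_count: 3, blocking_seconds: 301)

# Aggressive custom rule: 1 blocked query for more than 10 seconds.
puts blocking_alert_level(blocked_count: 1, blocking_seconds: 11,
                          min_blocked: 1, warning_after: 10, critical_after: 60)
```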
&lt;p&gt;By default, these new alerts will show up in the pganalyze UI. Based on your preferences you can enable notifications to be sent by email, Slack or PagerDuty. You can learn more about &lt;a href=&quot;https://pganalyze.com/docs/checks&quot;&gt;our alerts and checkups here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;behind-the-scenes-pg_blocking_pids&quot; &gt;&lt;a href=&quot;#behind-the-scenes-pg_blocking_pids&quot; aria-label=&quot;behind the scenes pg_blocking_pids permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Behind the scenes: pg_blocking_pids()&lt;/h2&gt;
&lt;p&gt;To obtain the lock information, the pganalyze collector uses the &lt;code &gt;pg_blocking_pids()&lt;/code&gt; function. This function returns the list of PIDs a particular query is waiting for (is blocked by):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;test_db&lt;span &gt;=&lt;/span&gt;&lt;span &gt;# SELECT pid, pg_blocking_pids(pid) FROM pg_stat_activity WHERE wait_event_type = &apos;Lock&apos;;&lt;/span&gt;
  pid  &lt;span &gt;|&lt;/span&gt; pg_blocking_pids 
&lt;span &gt;-------+------------------&lt;/span&gt;
 &lt;span &gt;81175&lt;/span&gt; &lt;span &gt;|&lt;/span&gt; {&lt;span &gt;33219&lt;/span&gt;}
 &lt;span &gt;81189&lt;/span&gt; &lt;span &gt;|&lt;/span&gt; {&lt;span &gt;33219&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;81175&lt;/span&gt;}
 &lt;span &gt;85112&lt;/span&gt; &lt;span &gt;|&lt;/span&gt; {&lt;span &gt;81189&lt;/span&gt;}
 &lt;span &gt;85128&lt;/span&gt; &lt;span &gt;|&lt;/span&gt; {&lt;span &gt;81189&lt;/span&gt;}
 &lt;span &gt;85146&lt;/span&gt; &lt;span &gt;|&lt;/span&gt; {&lt;span &gt;81189&lt;/span&gt;}
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;5&lt;/span&gt; &lt;span &gt;rows&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling this function uses the Postgres lock manager, which can be a heavily utilized component on busy Postgres systems. To keep overhead at a minimum, the collector only calls this function when a query is already in the &quot;Waiting for Lock&quot; state, and we know that there is a reason to get additional information. In our benchmarks as well as tests on production systems, we have observed no negative performance impact from tracking this additional data. You can disable this feature by passing the &lt;code &gt;--no-postgres-locks&lt;/code&gt; option to the pganalyze collector, if needed.&lt;/p&gt;
&lt;figure&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/d99300bff8874615b76e8f98d4123f42/2cefc/pg_blocking_pids_lock_tree.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Lock tree based on the result of pg_blocking_pids()&quot; title=&quot;Lock tree based on the result of pg_blocking_pids()&quot; src=&quot;https://pganalyze.com/static/d99300bff8874615b76e8f98d4123f42/1d69c/pg_blocking_pids_lock_tree.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;figcaption&gt;Lock tree based on the result of pg_blocking_pids()&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In case you are calling the &lt;code &gt;pg_blocking_pids()&lt;/code&gt; function manually, be careful to look at the lock tree (as shown in the diagram) to detect other connections that have priority for acquiring the lock. If you are using the pganalyze Lock Monitoring feature, this is done automatically for you.&lt;/p&gt;
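&lt;p&gt;If you do want to walk the lock tree by hand, the following sketch shows one way to do it, using the sample output from the query above. It is an illustration only, not how the pganalyze collector works internally: the &lt;code &gt;root_blockers&lt;/code&gt; and &lt;code &gt;blocked_descendants&lt;/code&gt; helpers are hypothetical, and the input is assumed to be a hash of waiting PID to the array returned by &lt;code &gt;pg_blocking_pids()&lt;/code&gt;.&lt;/p&gt;

```ruby
# Sample data taken from the query output shown earlier:
# waiting pid => pids returned by pg_blocking_pids(pid)
BLOCKED_BY = {
  81175 => [33219],
  81189 => [33219, 81175],
  85112 => [81189],
  85128 => [81189],
  85146 => [81189]
}.freeze

# Hypothetical helper: PIDs that block others but are not waiting on a lock
# themselves. These are the roots of the lock tree.
def root_blockers(blocked_by)
  blocked_by.values.flatten.uniq.reject { |pid| blocked_by.key?(pid) }.sort
end

# Hypothetical helper: every PID that is directly or indirectly blocked by
# `root`, found with a breadth-first walk over the blocking relationships.
def blocked_descendants(blocked_by, root)
  seen = []
  queue = [root]
  until queue.empty?
    current = queue.shift
    blocked_by.each do |pid, blockers|
      next unless blockers.include?(current)
      next if seen.include?(pid)
      seen.push(pid)
      queue.push(pid)
    end
  end
  seen.sort
end

root_blockers(BLOCKED_BY).each do |root|
  waiting = blocked_descendants(BLOCKED_BY, root)
  puts "PID #{root} is blocking #{waiting.size} connections: #{waiting.join(", ")}"
end
```

&lt;p&gt;For the sample data, the only root is PID 33219, and all five waiting connections trace back to it, even though three of them are only directly blocked by PID 81189.&lt;/p&gt;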
&lt;h2 id=&quot;try-the-new-pganalyze-lock-monitoring-features-now&quot; &gt;&lt;a href=&quot;#try-the-new-pganalyze-lock-monitoring-features-now&quot; aria-label=&quot;try the new pganalyze lock monitoring features now permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Try the new pganalyze Lock Monitoring features now&lt;/h2&gt;
&lt;p&gt;If you are an existing pganalyze customer on the current Scale or Enterprise Cloud plans, you can start using the new pganalyze Lock Monitoring features today, or if you are not yet using pganalyze you can &lt;a href=&quot;https://app.pganalyze.com/users/sign_up&quot;&gt;sign up for a free 14-day trial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To collect the necessary locking/blocking information, make sure to upgrade to the pganalyze collector version v0.46.0 or newer. A new Enterprise Server release that includes this feature will be available soon.&lt;/p&gt;
&lt;p&gt;We also want to extend our thanks to our early access group that reached out in response to the pganalyze newsletter. We&apos;ve already incorporated feedback, and are looking to add more improvements, such as identifying which individual object a lock is being held on—the whole table, a particular row, or a virtual transaction ID. We are planning to keep iterating on this new set of features and would love to hear your feedback.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DLock%20monitoring%20in%20Postgres:%20Find%20blocking%20queries%20in%20a%20lock%20tree%20with%20pganalyze%22%20-%20In%20this%20article,%20%40pganalyze%20share%20their%20new%20feature,%20which%20automatically%20detects%20locking/blocking%20queries%20as%20they%20happen,%20can%20alert%20you%20of%20production%20incidents,%20and%20more%3A%20https://pganalyze.com/blog/postgres-lock-monitoring&quot;&gt;Share this on Twitter&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[How Postgres Chooses Which Index To Use For A Query]]></title><description><![CDATA[Using Postgres sometimes feels like magic. But sometimes the magic is too much, such as when you are trying to understand the reason behind a seemingly bad Postgres query plan. I've often times found myself in a situation where I asked myself: "Postgres, what are you thinking?". Staring at an EXPLAIN plan, seeing a , and being puzzled as to why Postgres isn't doing what I am expecting. This has led me down the path of reading the Postgres source, in search for answers. Why is Postgres choosing a…]]></description><link>https://pganalyze.com/blog/how-postgres-chooses-index</link><guid isPermaLink="false">https://pganalyze.com/blog/how-postgres-chooses-index</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Fri, 01 Apr 2022 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Index usage of a Hash Join compared to a Nested Loop Join&quot; title=&quot;Index usage of a Hash Join compared to a Nested Loop Join&quot; src=&quot;https://pganalyze.com/static/d92dffc7749fada6f63b9488297dee5b/1d69c/nested_loop_vs_hash_join.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Using Postgres sometimes feels like magic. But sometimes the magic is too much, such as when you are trying to understand the reason behind a seemingly bad Postgres query plan.&lt;/p&gt;
&lt;p&gt;I&apos;ve often found myself in a situation where I asked myself: &lt;strong&gt;&quot;Postgres, what are you thinking?&quot;&lt;/strong&gt;. Staring at an EXPLAIN plan, seeing a &lt;code &gt;Sequential Scan&lt;/code&gt;, and being puzzled as to why Postgres isn&apos;t doing what I am expecting.&lt;/p&gt;
&lt;p&gt;This has led me down the path of reading the Postgres source, in search of answers. Why is Postgres choosing a particular index over another one, or not choosing an index altogether?&lt;/p&gt;
&lt;p&gt;In this blog post I aim to give an introduction to how the Postgres planner analyzes your query, and how it decides which indexes to use. Additionally, &lt;strong&gt;we’ll look at a puzzling situation&lt;/strong&gt; where the join type can impact which indexes are being used.&lt;/p&gt;
&lt;p&gt;We’ll look at a lot of Postgres source code, but if you are short on time, you might want to jump to &lt;a href=&quot;#understanding-b-tree-index-cost-estimates&quot;&gt;how B-tree index costing works&lt;/a&gt;, and &lt;a href=&quot;#parameterized-index-scans-or-why-nested-loop-are-sometimes-a-good-join-type&quot;&gt;why Nested Loop Joins impact index usage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We’ll also talk about an &lt;a href=&quot;#new-features-coming-soon-to-pganalyze&quot;&gt;upcoming pganalyze feature&lt;/a&gt; at the very end!&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#a-tour-of-postgres-parse-analysis-and-early-stages-of-planning&quot;&gt;A tour of Postgres: Parse analysis and early stages of planning&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#four-levels-of-planning-a-query&quot;&gt;Four levels of planning a query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#breaking-down-a-query-into-tables-being-scanned-reloptinfo-and-restrictinfo-structs&quot;&gt;Breaking down a query into tables being scanned (RelOptInfo and RestrictInfo structs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#choosing-different-paths-and-scan-methods&quot;&gt;Choosing different paths and scan methods&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#where-index-scans-are-made&quot;&gt;Where Index Scans are made&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#creating-the-two-types-of-index-scans-plain-vs-parameterized&quot;&gt;Creating the two types of index scans: plain vs parameterized&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#understanding-b-tree-index-cost-estimates&quot;&gt;Understanding B-tree index cost estimates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#parameterized-index-scans-or-why-nested-loop-are-sometimes-a-good-join-type&quot;&gt;Parameterized Index Scans, or: Why Nested Loop are sometimes a good join type&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#new-features-coming-soon-to-pganalyze&quot;&gt;New features coming soon to pganalyze&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#other--helpful-resources&quot;&gt;Other  helpful resources&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;a-tour-of-postgres-parse-analysis-and-early-stages-of-planning&quot; &gt;&lt;a href=&quot;#a-tour-of-postgres-parse-analysis-and-early-stages-of-planning&quot; aria-label=&quot;a tour of postgres parse analysis and early stages of planning permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;A tour of Postgres: Parse analysis and early stages of planning&lt;/h2&gt;
&lt;p&gt;To start with, let’s look at a query’s lifecycle in Postgres. There are four important steps in how a query is handled:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Parsing: Turning query text into an Abstract Syntax Tree (AST)&lt;/li&gt;
&lt;li&gt;Parse analysis: Turning table names into actual references to table objects&lt;/li&gt;
&lt;li&gt;Planning: Finding and creating the optimal query plan&lt;/li&gt;
&lt;li&gt;Execution: Executing the query plan&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For understanding how the planner chooses which indexes to use, let’s first take a look at what parse analysis does.&lt;/p&gt;
&lt;p&gt;Whilst there are multiple entry points into parse analysis, depending on whether you have query parameters or not, the core function in parse analysis is &lt;code &gt;transformStmt&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/d22646922d66012705e0e2948cfb5b4a07092a29/src/backend/parser/analyze.c#L313&quot;&gt;source&lt;/a&gt;):&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
* transformStmt -
*    recursively transform a Parse tree into a Query tree.
*/&lt;/span&gt;
Query &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;transformStmt&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ParseState &lt;span &gt;*&lt;/span&gt;pstate&lt;span &gt;,&lt;/span&gt; Node &lt;span &gt;*&lt;/span&gt;parseTree&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This takes the raw parse tree output (from the first step), and returns a Query struct. It has a lot of specific cases, as it handles both regular SELECTs as well as UPDATEs and other DML statements. Note that utility statements (DDL, etc) mostly get passed through to the execution phase.&lt;/p&gt;
&lt;p&gt;Since we are interested in tables and indexes, let’s take a closer look at how parse analysis handles the FROM clause:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;void&lt;/span&gt;
&lt;span &gt;transformFromClause&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ParseState &lt;span &gt;*&lt;/span&gt;pstate&lt;span &gt;,&lt;/span&gt; List &lt;span &gt;*&lt;/span&gt;frmList&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
   ListCell   &lt;span &gt;*&lt;/span&gt;fl&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;/*
    * The grammar will have produced a list of RangeVars, RangeSubselects,
    * RangeFunctions, and/or JoinExprs. Transform each one (possibly adding
    * entries to the rtable), check for duplicate refnames, and then add it
    * to the joinlist and namespace.
    */&lt;/span&gt;
   &lt;span &gt;foreach&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;fl&lt;span &gt;,&lt;/span&gt; frmList&lt;span &gt;)&lt;/span&gt;
   &lt;span &gt;{&lt;/span&gt;
       …
 
       n &lt;span &gt;=&lt;/span&gt; &lt;span &gt;transformFromClauseItem&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;pstate&lt;span &gt;,&lt;/span&gt; n&lt;span &gt;,&lt;/span&gt;
                                   &lt;span &gt;&amp;amp;&lt;/span&gt;nsitem&lt;span &gt;,&lt;/span&gt;
                                   &lt;span &gt;&amp;amp;&lt;/span&gt;namespace&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
…
&lt;span &gt;/*
* transformFromClauseItem -
*    Transform a FROM-clause item, adding any required entries to the
*    range table list being built in the ParseState, and return the
*    transformed item ready to include in the joinlist.  Also build a
*    ParseNamespaceItem list describing the names exposed by this item.
*    This routine can recurse to handle SQL92 JOIN expressions.
*/&lt;/span&gt;
&lt;span &gt;static&lt;/span&gt; Node &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;transformFromClauseItem&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ParseState &lt;span &gt;*&lt;/span&gt;pstate&lt;span &gt;,&lt;/span&gt; Node &lt;span &gt;*&lt;/span&gt;n&lt;span &gt;,&lt;/span&gt;
                       ParseNamespaceItem &lt;span &gt;*&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;top_nsitem&lt;span &gt;,&lt;/span&gt;
                       List &lt;span &gt;*&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;namespace&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Postgres already distinguishes between the range table list (essentially a list of all the tables referenced by the query) and the joinlist. This distinction will also be visible at a later point in the planner.&lt;/p&gt;
&lt;p&gt;Note that at this point Postgres has not yet made up its mind which indexes to use - it just decided that the FROM reference you called “foobar” is actually the table “foobar” in the “public” schema with OID 16424.&lt;/p&gt;
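&lt;p&gt;To make the range table idea concrete, here is a minimal sketch in C. The &lt;code &gt;ToyRangeTblEntry&lt;/code&gt; struct and &lt;code &gt;toy_rt_fetch&lt;/code&gt; function are hypothetical stand-ins and not Postgres code. The only Postgres convention the sketch relies on is that range table indexes are 1-based.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified range table entry: one per table that the
 * query references, filled in once parse analysis has resolved the name.
 */
typedef struct ToyRangeTblEntry
{
    const char  *schema_name;   /* e.g. "public" */
    const char  *rel_name;      /* e.g. "foobar" */
    unsigned int relid;         /* OID of the resolved table */
} ToyRangeTblEntry;

/*
 * Look up an entry by its 1-based range table index.
 * Returns NULL when the index is out of range.
 */
const ToyRangeTblEntry *
toy_rt_fetch(int rtindex, const ToyRangeTblEntry *rtable, int rtable_len)
{
    assert(rtable != NULL);

    if (rtindex < 1 || rtindex > rtable_len)
        return NULL;

    return &rtable[rtindex - 1];
}
```

&lt;p&gt;With a range table whose first entry is &lt;code &gt;public.foobar&lt;/code&gt; with OID 16424, &lt;code &gt;toy_rt_fetch(1, ...)&lt;/code&gt; returns that entry. Later planning stages can then refer to the table by its index instead of by name.&lt;/p&gt;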
&lt;p&gt;This information now gets stored in the Query struct, which is the result of the parse analysis phase. This Query struct is then passed into the planner, and that’s where it gets interesting.&lt;/p&gt;
&lt;h3 id=&quot;four-levels-of-planning-a-query&quot; &gt;&lt;a href=&quot;#four-levels-of-planning-a-query&quot; aria-label=&quot;four levels of planning a query permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Four levels of planning a query&lt;/h3&gt;
&lt;p&gt;Commonly we would start with the &lt;code &gt;standard_planner&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/db0d67db2401eb6238ccc04c6407a4fd4f985832/src/backend/optimizer/plan/planner.c#L282&quot;&gt;source&lt;/a&gt;) function as an entry point into the Postgres planner:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;PlannedStmt &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;standard_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;Query &lt;span &gt;*&lt;/span&gt;parse&lt;span &gt;,&lt;/span&gt; &lt;span &gt;const&lt;/span&gt; &lt;span &gt;char&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;query_string&lt;span &gt;,&lt;/span&gt; &lt;span &gt;int&lt;/span&gt; cursorOptions&lt;span &gt;,&lt;/span&gt;
                ParamListInfo boundParams&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This takes our &lt;code &gt;Query&lt;/code&gt; struct, and ultimately returns a &lt;code &gt;PlannedStmt&lt;/code&gt;. For reference, the &lt;code &gt;PlannedStmt&lt;/code&gt; struct (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/include/nodes/plannodes.h#L43&quot;&gt;source&lt;/a&gt;) looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/* ----------------
*      PlannedStmt node
*
* The output of the planner is a Plan tree headed by a PlannedStmt node.
* PlannedStmt holds the &quot;one time&quot; information needed by the executor.
* ----------------
*/&lt;/span&gt;
&lt;span &gt;typedef&lt;/span&gt; &lt;span &gt;struct&lt;/span&gt; &lt;span &gt;PlannedStmt&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
   NodeTag     type&lt;span &gt;;&lt;/span&gt;
 
   CmdType     commandType&lt;span &gt;;&lt;/span&gt;    &lt;span &gt;/* select|insert|update|delete|utility */&lt;/span&gt;
 
…
 
   &lt;span &gt;struct&lt;/span&gt; &lt;span &gt;Plan&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;planTree&lt;span &gt;;&lt;/span&gt;      &lt;span &gt;/* tree of Plan nodes */&lt;/span&gt;
 
…&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The tree of plan nodes is what you would be familiar with if you’ve looked at an EXPLAIN output before - ultimately EXPLAIN is based on walking that plan tree and showing you a text/JSON/etc version of it.&lt;/p&gt;
&lt;p&gt;The core function of the planner is best described in these lines of &lt;code &gt;standard_planner&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/* primary planning entry point (may recurse for subqueries) */&lt;/span&gt;
root &lt;span &gt;=&lt;/span&gt; &lt;span &gt;subquery_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;glob&lt;span &gt;,&lt;/span&gt; parse&lt;span &gt;,&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                        false&lt;span &gt;,&lt;/span&gt; tuple_fraction&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;/* Select best Path and turn it into a Plan */&lt;/span&gt;
final_rel &lt;span &gt;=&lt;/span&gt; &lt;span &gt;fetch_upper_rel&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; UPPERREL_FINAL&lt;span &gt;,&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
best_path &lt;span &gt;=&lt;/span&gt; &lt;span &gt;get_cheapest_fractional_path&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;final_rel&lt;span &gt;,&lt;/span&gt; tuple_fraction&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

top_plan &lt;span &gt;=&lt;/span&gt; &lt;span &gt;create_plan&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; best_path&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The planner first creates what are called “paths” using the &lt;code &gt;subquery_planner&lt;/code&gt; (which may recursively call itself), and then the planner picks the best path. Based on this best path, the actual plan tree is constructed.&lt;/p&gt;
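&lt;p&gt;The idea of generating candidate paths and then keeping the cheapest one can be sketched in a few lines of C. &lt;code &gt;ToyPath&lt;/code&gt; and &lt;code &gt;toy_cheapest_path&lt;/code&gt; are hypothetical names for illustration. The real planner tracks far more per path than a label and two costs, so treat this as a simplified model only.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner path. */
typedef struct ToyPath
{
    const char *label;          /* e.g. "SeqScan" or "IndexScan" */
    double      startup_cost;   /* cost before the first row is returned */
    double      total_cost;     /* cost to return all rows */
} ToyPath;

/*
 * Return the candidate with the lowest total cost.
 * On a tie, the earlier entry in the array wins.
 */
const ToyPath *
toy_cheapest_path(const ToyPath *paths, size_t npaths)
{
    const ToyPath *best;
    size_t         i;

    assert(paths != NULL && npaths > 0);

    best = &paths[0];
    for (i = 1; i < npaths; i++)
    {
        if (paths[i].total_cost < best->total_cost)
            best = &paths[i];
    }

    return best;
}
```

&lt;p&gt;Given a sequential scan with a total cost of 155.0 and an index scan with a total cost of 8.31, the function returns the index scan, and only that winning path would be turned into a plan node.&lt;/p&gt;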
&lt;p&gt;For understanding how the planner chose which indexes to use, we must therefore look at paths, not at plan nodes. Let’s see what &lt;code &gt;subquery_planner&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/plan/planner.c#L596&quot;&gt;source&lt;/a&gt;) does:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*--------------------
* subquery_planner
*    Invokes the planner on a subquery.  We recurse to here for each
*    sub-SELECT found in the query tree.
…
*/&lt;/span&gt;
PlannerInfo &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;subquery_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerGlobal &lt;span &gt;*&lt;/span&gt;glob&lt;span &gt;,&lt;/span&gt; Query &lt;span &gt;*&lt;/span&gt;parse&lt;span &gt;,&lt;/span&gt;
                PlannerInfo &lt;span &gt;*&lt;/span&gt;parent_root&lt;span &gt;,&lt;/span&gt;
                bool hasRecursion&lt;span &gt;,&lt;/span&gt; &lt;span &gt;double&lt;/span&gt; tuple_fraction&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As described in the comment, this handles each sub-SELECT separately - but note that even if the original query contains a written sub-SELECT, the planner may optimize it away by pulling it up into the parent planning process, if possible.&lt;/p&gt;
&lt;p&gt;For the purposes of focusing on index choice, here are the two key parts of &lt;code &gt;subquery_planner&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * Do the main planning.  If we have an inherited target relation, that
 * needs special processing, else go straight to grouping_planner.
 */&lt;/span&gt;
&lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;parse&lt;span &gt;-&gt;&lt;/span&gt;resultRelation &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span &gt;rt_fetch&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;parse&lt;span &gt;-&gt;&lt;/span&gt;resultRelation&lt;span &gt;,&lt;/span&gt; parse&lt;span &gt;-&gt;&lt;/span&gt;rtable&lt;span &gt;)&lt;/span&gt;&lt;span &gt;-&gt;&lt;/span&gt;inh&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;inheritance_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;else&lt;/span&gt;
    &lt;span &gt;grouping_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; false&lt;span &gt;,&lt;/span&gt; tuple_fraction&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

…

&lt;span &gt;/*
 * Make sure we&apos;ve identified the cheapest Path for the final rel.  (By
 * doing this here not in grouping_planner, we include initPlan costs in
 * the decision, though it&apos;s unlikely that will change anything.)
 */&lt;/span&gt;
&lt;span &gt;set_cheapest&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;final_rel&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This method also optimizes for the cheapest path - we’ll see more on that in a moment. But for now, let’s go deeper down the rabbit hole and look at &lt;code &gt;grouping_planner&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/plan/planner.c#L1253&quot;&gt;source&lt;/a&gt;):&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/* --------------------
 * grouping_planner
 *    Perform planning steps related to grouping, aggregation, etc.
 *
 * This function adds all required top-level processing to the scan/join
 * Path(s) produced by query_planner.
 *
 * --------------------
 */&lt;/span&gt;
&lt;span &gt;static&lt;/span&gt; &lt;span &gt;void&lt;/span&gt;
&lt;span &gt;grouping_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; bool inheritance_update&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;double&lt;/span&gt; tuple_fraction&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Reading through its code, it turns out we’re still not there. It’s actually &lt;code &gt;query_planner&lt;/code&gt; that we are looking for, as described in this comment:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;RelOptInfo &lt;span &gt;*&lt;/span&gt;current_rel&lt;span &gt;;&lt;/span&gt;
…
&lt;span &gt;/*
* Generate the best unsorted and presorted paths for the scan/join
* portion of this Query, ie the processing represented by the
* FROM/WHERE clauses.  (Note there may not be any presorted paths.)
* We also generate (in standard_qp_callback) pathkey representations
* of the query&apos;s sort clause, distinct clause, etc.
*/&lt;/span&gt;
current_rel &lt;span &gt;=&lt;/span&gt; &lt;span &gt;query_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; standard_qp_callback&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;qp_extra&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Before we dive into the &lt;code &gt;query_planner&lt;/code&gt; method, let’s pause for a moment and look at what the result of &lt;code &gt;query_planner&lt;/code&gt; is, the &lt;code &gt;RelOptInfo&lt;/code&gt; struct:&lt;/p&gt;
&lt;h3 id=&quot;breaking-down-a-query-into-tables-being-scanned-reloptinfo-and-restrictinfo-structs&quot; &gt;&lt;a href=&quot;#breaking-down-a-query-into-tables-being-scanned-reloptinfo-and-restrictinfo-structs&quot; aria-label=&quot;breaking down a query into tables being scanned reloptinfo and restrictinfo structs permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Breaking down a query into tables being scanned (RelOptInfo and RestrictInfo structs)&lt;/h3&gt;
&lt;p&gt;In the Postgres planner, &lt;code &gt;RelOptInfo&lt;/code&gt; is best described as the internal representation of a particular table that is being scanned (with either a sequential scan, or an index scan).&lt;/p&gt;
&lt;p&gt;When trying to understand how Postgres interprets your query, adding debug information that shows RelOptInfo would be the closest that you can get to seeing which tables Postgres is going to scan, and how it makes a decision between different scan methods, such as an Index Scan.&lt;/p&gt;
&lt;p&gt;RelOptInfo (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/include/nodes/pathnodes.h#L674&quot;&gt;source&lt;/a&gt;) has many details to it, but the key parts for our focus on indexing are these:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*----------
* RelOptInfo
*      Per-relation information for planning/optimization
…
*      pathlist - List of Path nodes, one for each potentially useful
*                 method of generating the relation
… 
*      baserestrictinfo - List of RestrictInfo nodes, containing info about
*                  each non-join qualification clause in which this relation
*                  participates (only used for base rels)
…
*      joininfo  - List of RestrictInfo nodes, containing info about each
*                  join clause in which this relation participates
…
*/&lt;/span&gt;
&lt;span &gt;typedef&lt;/span&gt; &lt;span &gt;struct&lt;/span&gt; &lt;span &gt;RelOptInfo&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   List       &lt;span &gt;*&lt;/span&gt;pathlist&lt;span &gt;;&lt;/span&gt;       &lt;span &gt;/* Path structures */&lt;/span&gt;
…
   List       &lt;span &gt;*&lt;/span&gt;baserestrictinfo&lt;span &gt;;&lt;/span&gt;   &lt;span &gt;/* RestrictInfo structures (if base rel) */&lt;/span&gt;
…
   List       &lt;span &gt;*&lt;/span&gt;joininfo&lt;span &gt;;&lt;/span&gt;       &lt;span &gt;/* RestrictInfo structures for join clauses
                                * involving this rel */&lt;/span&gt;
…
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Before we interpret this, let’s look at &lt;code &gt;RestrictInfo&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/include/nodes/pathnodes.h#L2067&quot;&gt;source&lt;/a&gt;):&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
* (WHERE or JOIN/ON clause).  Since the restriction clauses are logically
* ANDed, we can use any one of them or any subset of them to filter out
* tuples, without having to evaluate the rest.
..
*/&lt;/span&gt;
&lt;span &gt;typedef&lt;/span&gt; &lt;span &gt;struct&lt;/span&gt; &lt;span &gt;RestrictInfo&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
   NodeTag     type&lt;span &gt;;&lt;/span&gt;
   Expr       &lt;span &gt;*&lt;/span&gt;clause&lt;span &gt;;&lt;/span&gt;         &lt;span &gt;/* the represented clause of WHERE or JOIN */&lt;/span&gt;
…
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A note on terminology: This references “base relations”, which are relations (aka tables) that are looked at on their own, rather than in the context of a JOIN.&lt;/p&gt;
&lt;p&gt;In the code sample, &lt;code &gt;RestrictInfo&lt;/code&gt; is how our WHERE clause and JOIN conditions get represented. This is the part that is key to understanding how Postgres compares your query against the indexes that exist.&lt;/p&gt;
&lt;p&gt;You can think about it this way - for each table that’s included in the query, Postgres generates two lists of “restriction” clauses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Base restriction clauses&lt;/strong&gt;: Typically part of your WHERE clause, and are expressions that involve only the table itself - for example &lt;code &gt;users.id = 123&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Join clauses&lt;/strong&gt;: Typically part of your JOIN clause, and expressions that involve multiple tables - for example &lt;code &gt;users.id = comments.user_id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that Postgres calls these “restriction” clauses because they restrict (or filter) the amount of data that is being returned from your table. &lt;strong&gt;And how can we effectively filter data from a table? By using an index!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The base restriction clauses will typically be used to filter down the amount of data that is being returned from the table. But join clauses oftentimes will not, as they are only used as part of the matching of rows that happens during the JOIN operation.&lt;/p&gt;
&lt;p&gt;The one exception to this is &lt;a href=&quot;https://pganalyze.com/docs/explain/join-nodes/nested-loop&quot;&gt;Nested Loop Joins&lt;/a&gt; - but we’ll come back to that.&lt;/p&gt;
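&lt;p&gt;One way to picture the split between base restriction clauses and join clauses is to count how many tables a clause references. The sketch below does that with a plain bitmask. All names in it (&lt;code &gt;toy_relid_bit&lt;/code&gt;, &lt;code &gt;toy_classify_clause&lt;/code&gt;, &lt;code &gt;ToyClauseKind&lt;/code&gt;) are hypothetical, and the 32-table limit is an assumption of the sketch, not of Postgres.&lt;/p&gt;

```c
#include <assert.h>

typedef enum ToyClauseKind
{
    TOY_CLAUSE_CONSTANT,        /* references no table at all */
    TOY_CLAUSE_BASE_RESTRICT,   /* references exactly one table */
    TOY_CLAUSE_JOIN             /* references two or more tables */
} ToyClauseKind;

/* Turn a 1-based range table index into a single-bit mask. */
unsigned int
toy_relid_bit(int rtindex)
{
    assert(rtindex >= 1 && rtindex < 32);
    return 1u << rtindex;
}

/*
 * Classify a clause by the set of tables it references, where each set
 * bit in relids stands for one referenced table.
 */
ToyClauseKind
toy_classify_clause(unsigned int relids)
{
    if (relids == 0)
        return TOY_CLAUSE_CONSTANT;

    /* x & (x - 1) clears the lowest set bit; zero means only one bit was set */
    if ((relids & (relids - 1)) == 0)
        return TOY_CLAUSE_BASE_RESTRICT;

    return TOY_CLAUSE_JOIN;
}
```

&lt;p&gt;If &lt;code &gt;users&lt;/code&gt; has range table index 1 and &lt;code &gt;comments&lt;/code&gt; has index 2, then &lt;code &gt;users.id = 123&lt;/code&gt; references only the first bit and is classified as a base restriction, while &lt;code &gt;users.id = comments.user_id&lt;/code&gt; references both bits and is classified as a join clause.&lt;/p&gt;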
&lt;h3 id=&quot;choosing-different-paths-and-scan-methods&quot; &gt;&lt;a href=&quot;#choosing-different-paths-and-scan-methods&quot; aria-label=&quot;choosing different paths and scan methods permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Choosing different paths and scan methods&lt;/h3&gt;
&lt;p&gt;Let’s go back to &lt;code &gt;query_planner&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/plan/planmain.c#L55&quot;&gt;source&lt;/a&gt;), and what it does:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
* query_planner
*    Generate a path (that is, a simplified plan) for a basic query,
*    which may involve joins but not any fancier features.
*
* Since query_planner does not handle the toplevel processing (grouping,
* sorting, etc) it cannot select the best path by itself.  Instead, it
* returns the RelOptInfo for the top level of joining, and the caller
* (grouping_planner) can choose among the surviving paths for the rel.
…
*/&lt;/span&gt;
RelOptInfo &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;query_planner&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt;
             query_pathkeys_callback qp_callback&lt;span &gt;,&lt;/span&gt; &lt;span &gt;void&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;qp_extra&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   &lt;span &gt;/*
    * Construct RelOptInfo nodes for all base relations used in the query.
    */&lt;/span&gt;
   &lt;span &gt;add_base_rels_to_query&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;Node &lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; parse&lt;span &gt;-&gt;&lt;/span&gt;jointree&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
…
   &lt;span &gt;/*
    * Ready to do the primary planning.
    */&lt;/span&gt;
   final_rel &lt;span &gt;=&lt;/span&gt; &lt;span &gt;make_one_rel&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; joinlist&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;return&lt;/span&gt; final_rel&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The main point of &lt;code &gt;query_planner&lt;/code&gt; itself is to create a set of &lt;code &gt;RelOptInfo&lt;/code&gt; nodes, do a bunch of processing involving them, and then pass them to &lt;code &gt;make_one_rel&lt;/code&gt;. As that name says, it creates one “final rel”, which is also a &lt;code &gt;RelOptInfo&lt;/code&gt; node, that is then used to create our final plan.&lt;/p&gt;
&lt;p&gt;We’ve looked at a bunch of code already, but now it’s time to get to the exciting part!&lt;/p&gt;
&lt;p&gt;The implementation of &lt;code &gt;make_one_rel&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/path/allpaths.c#L153&quot;&gt;source&lt;/a&gt;) sits in a file with the important-sounding name of &lt;code &gt;allpaths.c&lt;/code&gt; - and as referenced earlier, when we talk about plan choices, we need to understand which path is chosen, as that is then used to create a plan node.&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * make_one_rel
 *    Finds all possible access paths for executing a query, returning a
 *    single rel that represents the join of all base rels in the query.
 */&lt;/span&gt;
RelOptInfo &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;make_one_rel&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; List &lt;span &gt;*&lt;/span&gt;joinlist&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   &lt;span &gt;/*
    * Compute size estimates and consider_parallel flags for each base rel.
    */&lt;/span&gt;
   &lt;span &gt;set_base_rel_sizes&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
…
 
   &lt;span &gt;/*
    * Generate access paths for each base rel.
    */&lt;/span&gt;
   &lt;span &gt;set_base_rel_pathlists&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
   &lt;span &gt;/*
    * Generate access paths for the entire join tree.
    */&lt;/span&gt;
   rel &lt;span &gt;=&lt;/span&gt; &lt;span &gt;make_rel_from_joinlist&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; joinlist&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;return&lt;/span&gt; rel&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Paths are chosen in three steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Estimate the sizes of the involved tables&lt;/li&gt;
&lt;li&gt;Find the best path for each base relation&lt;/li&gt;
&lt;li&gt;Find the best path for the entire join tree&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first step is mainly concerned with size estimates as they relate to the output of scanning the relation. This impacts the cost and rows numbers you are familiar with from EXPLAIN - and this may impact joins, but typically should not directly impact index usage.&lt;/p&gt;
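&lt;p&gt;As a rough model of where the rows number comes from, the estimate for a base relation can be thought of as the table’s row count multiplied by the fraction of rows expected to pass its restriction clauses (the selectivity), with a floor of one row. The function below is a hypothetical sketch of that arithmetic. Its name and its simple rounding are assumptions, and the real planner’s statistics handling is far more involved.&lt;/p&gt;

```c
#include <assert.h>

/*
 * Estimate the rows returned by scanning a table: the total number of
 * rows times the fraction expected to pass the restriction clauses.
 */
double
toy_estimate_rows(double reltuples, double selectivity)
{
    double rows;

    assert(reltuples >= 0.0);
    assert(selectivity >= 0.0 && selectivity <= 1.0);

    rows = reltuples * selectivity;

    /* Never estimate fewer than one row */
    if (rows <= 1.0)
        return 1.0;

    /* Round half up to a whole number of rows (simplified rounding) */
    return (double) (long long) (rows + 0.5);
}
```

&lt;p&gt;For a table of 1000 rows and a selectivity of 0.25 this yields 250 rows. With a very selective clause, say 0.0001, the raw product is 0.1 and the result is clamped to 1.&lt;/p&gt;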
&lt;p&gt;Now step 2 is key to our goal here. And &lt;code &gt;set_base_rel_pathlists&lt;/code&gt; ultimately calls &lt;code &gt;set_plain_rel_pathlist&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/path/allpaths.c#L767&quot;&gt;source&lt;/a&gt;), which finally looks like what we are interested in:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * set_plain_rel_pathlist
 *    Build access paths for a plain relation (no subquery, no inheritance)
 */&lt;/span&gt;
&lt;span &gt;static&lt;/span&gt; &lt;span &gt;void&lt;/span&gt;
&lt;span &gt;set_plain_rel_pathlist&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; RelOptInfo &lt;span &gt;*&lt;/span&gt;rel&lt;span &gt;,&lt;/span&gt; RangeTblEntry &lt;span &gt;*&lt;/span&gt;rte&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
   …
 
   &lt;span &gt;/* Consider sequential scan */&lt;/span&gt;
   &lt;span &gt;add_path&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;rel&lt;span &gt;,&lt;/span&gt; &lt;span &gt;create_seqscan_path&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;,&lt;/span&gt; required_outer&lt;span &gt;,&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;/* If appropriate, consider parallel sequential scan */&lt;/span&gt;
   &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;rel&lt;span &gt;-&gt;&lt;/span&gt;consider_parallel &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; required_outer &lt;span &gt;==&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
       &lt;span &gt;create_plain_partial_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;/* Consider index scans */&lt;/span&gt;
   &lt;span &gt;create_index_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   &lt;span &gt;/* Consider TID scans */&lt;/span&gt;
   &lt;span &gt;create_tidscan_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
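&lt;p&gt;Each of the calls above proposes candidate paths for the relation, and &lt;code &gt;add_path&lt;/code&gt; decides which candidates are worth keeping. The sketch below models that pruning with a simplified rule: a path is discarded when another path is at least as cheap on both startup cost and total cost. The names &lt;code &gt;ToyScanPath&lt;/code&gt;, &lt;code &gt;ToyRel&lt;/code&gt; and &lt;code &gt;toy_add_path&lt;/code&gt; are hypothetical, and the real &lt;code &gt;add_path&lt;/code&gt; also weighs sort order, parameterization, row counts and parallel safety, none of which are modeled here.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PATHS 8

/* Hypothetical, simplified candidate path for scanning one table. */
typedef struct ToyScanPath
{
    const char *label;
    double      startup_cost;
    double      total_cost;
} ToyScanPath;

/* Hypothetical stand-in for a relation and its list of surviving paths. */
typedef struct ToyRel
{
    ToyScanPath paths[TOY_MAX_PATHS];
    size_t      npaths;
} ToyRel;

/* True when a is at least as cheap as b on both cost measures. */
static int
toy_dominates(const ToyScanPath *a, const ToyScanPath *b)
{
    return a->startup_cost <= b->startup_cost &&
           a->total_cost <= b->total_cost;
}

/*
 * Add a candidate unless an existing path dominates it, and drop any
 * existing paths that the candidate dominates.
 * Returns 1 if the candidate was kept, 0 if it was rejected.
 */
int
toy_add_path(ToyRel *rel, ToyScanPath candidate)
{
    size_t i;
    size_t kept = 0;

    assert(rel != NULL);

    for (i = 0; i < rel->npaths; i++)
    {
        if (toy_dominates(&rel->paths[i], &candidate))
            return 0;
    }

    /* Compact the list, skipping paths the candidate dominates */
    for (i = 0; i < rel->npaths; i++)
    {
        if (!toy_dominates(&candidate, &rel->paths[i]))
            rel->paths[kept++] = rel->paths[i];
    }

    assert(kept < TOY_MAX_PATHS);
    rel->paths[kept++] = candidate;
    rel->npaths = kept;
    return 1;
}
```

&lt;p&gt;Note that under this rule a sequential scan and an index scan can both survive: the sequential scan may start returning rows sooner while the index scan is cheaper overall. Choosing between the survivors is left to a later step.&lt;/p&gt;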
&lt;h2 id=&quot;where-index-scans-are-made&quot; &gt;&lt;a href=&quot;#where-index-scans-are-made&quot; aria-label=&quot;where index scans are made permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Where Index Scans are made&lt;/h2&gt;
&lt;h3 id=&quot;creating-the-two-types-of-index-scans-plain-vs-parameterized&quot; &gt;&lt;a href=&quot;#creating-the-two-types-of-index-scans-plain-vs-parameterized&quot; aria-label=&quot;creating the two types of index scans plain vs parameterized permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating the two types of index scans: plain vs parameterized&lt;/h3&gt;
&lt;p&gt;Let’s look at &lt;code &gt;create_index_paths&lt;/code&gt; (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/path/indxpath.c#L235&quot;&gt;source&lt;/a&gt;), since we want to see how indexes are picked:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
* create_index_paths()
*    Generate all interesting index paths for the given relation.
*    Candidate paths are added to the rel&apos;s pathlist (using add_path).
*
* To be considered for an index scan, an index must match one or more
* restriction clauses or join clauses from the query&apos;s qual condition,
* or match the query&apos;s ORDER BY condition, or have a predicate that
* matches the query&apos;s qual condition.
*
* There are two basic kinds of index scans.  A &quot;plain&quot; index scan uses
* only restriction clauses (possibly none at all) in its indexqual,
* so it can be applied in any context.  A &quot;parameterized&quot; index scan uses
* join clauses (plus restriction clauses, if available) in its indexqual.
* When joining such a scan to one of the relations supplying the other
* variables used in its indexqual, the parameterized scan must appear as
* the inner relation of a nestloop join; it can&apos;t be used on the outer side,
* nor in a merge or hash join.
…
*/&lt;/span&gt;
&lt;span &gt;void&lt;/span&gt;
&lt;span &gt;create_index_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; RelOptInfo &lt;span &gt;*&lt;/span&gt;rel&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   &lt;span &gt;/* Examine each index in turn */&lt;/span&gt;
   &lt;span &gt;foreach&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;lc&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;-&gt;&lt;/span&gt;indexlist&lt;span &gt;)&lt;/span&gt;
   &lt;span &gt;{&lt;/span&gt;
       IndexOptInfo &lt;span &gt;*&lt;/span&gt;index &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;IndexOptInfo &lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;lfirst&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;lc&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
       …
 
       &lt;span &gt;/*
        * Ignore partial indexes that do not match the query.
        */&lt;/span&gt;
       &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;index&lt;span &gt;-&gt;&lt;/span&gt;indpred &lt;span &gt;!=&lt;/span&gt; NIL &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;index&lt;span &gt;-&gt;&lt;/span&gt;predOK&lt;span &gt;)&lt;/span&gt;
           &lt;span &gt;continue&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
       &lt;span &gt;/*
        * Identify the restriction clauses that can match the index.
        */&lt;/span&gt;
       &lt;span &gt;match_restriction_clauses_to_index&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; index&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;rclauseset&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
       &lt;span &gt;/*
        * Build index paths from the restriction clauses.  These will be
        * non-parameterized paths.  Plain paths go directly to add_path(),
        * bitmap paths are added to bitindexpaths to be handled below.
        */&lt;/span&gt;
       &lt;span &gt;get_index_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;,&lt;/span&gt; index&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;rclauseset&lt;span &gt;,&lt;/span&gt;
                       &lt;span &gt;&amp;amp;&lt;/span&gt;bitindexpaths&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
       &lt;span &gt;/*
        * Identify the join clauses that can match the index.  For the moment
        * we keep them separate from the restriction clauses.
        */&lt;/span&gt;
       &lt;span &gt;match_join_clauses_to_index&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;,&lt;/span&gt; index&lt;span &gt;,&lt;/span&gt;
                                   &lt;span &gt;&amp;amp;&lt;/span&gt;jclauseset&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;joinorclauses&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
…
       &lt;span &gt;/*
        * If we found any plain or eclass join clauses, build parameterized
        * index paths using them.
        */&lt;/span&gt;
       &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;jclauseset&lt;span &gt;.&lt;/span&gt;nonempty &lt;span &gt;||&lt;/span&gt; eclauseset&lt;span &gt;.&lt;/span&gt;nonempty&lt;span &gt;)&lt;/span&gt;
           &lt;span &gt;consider_index_join_clauses&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;,&lt;/span&gt; index&lt;span &gt;,&lt;/span&gt;
                                       &lt;span &gt;&amp;amp;&lt;/span&gt;rclauseset&lt;span &gt;,&lt;/span&gt;
                                       &lt;span &gt;&amp;amp;&lt;/span&gt;jclauseset&lt;span &gt;,&lt;/span&gt;
                                       &lt;span &gt;&amp;amp;&lt;/span&gt;eclauseset&lt;span &gt;,&lt;/span&gt;
                                       &lt;span &gt;&amp;amp;&lt;/span&gt;bitjoinpaths&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
   &lt;span &gt;}&lt;/span&gt;
 
…
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There is a lot to take in here, and that is with the BitmapOr/BitmapAnd index scan handling already removed from this code sample.&lt;/p&gt;
&lt;p&gt;First of all, &lt;strong&gt;this builds two types of index scans&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Plain index scans&lt;/strong&gt;, which only use the base restriction clauses&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parameterized index scans&lt;/strong&gt;, which use both base restriction clauses and join clauses&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We’ll talk more about the second case in a moment.&lt;/p&gt;
&lt;p&gt;Other key aspects to understand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partial indexes (i.e. those with an attached WHERE clause on the index definition) are matched against the set of restriction clauses and discarded here if they don’t match&lt;/li&gt;
&lt;li&gt;Each index is considered both for an Index Scan and an Index Only Scan (through the “build_index_paths” function), as well as for a Bitmap Heap Scan / Bitmap Index Scan&lt;/li&gt;
&lt;li&gt;Each potential way of using an index gets a cost assigned - and this cost decides whether Postgres actually chooses the index (see earlier notion of the “best path”), or not&lt;/li&gt;
&lt;/ul&gt;
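&lt;p&gt;To make the last two points concrete, here is a minimal sketch of “skip partial indexes that don’t match, then keep the cheapest candidate”. The &lt;code &gt;ToyPath&lt;/code&gt; struct and the &lt;code &gt;cheapest_usable_path&lt;/code&gt; function are hypothetical names made up for this illustration, not Postgres APIs. The real &lt;code &gt;add_path&lt;/code&gt; is more involved, since it can keep several paths around (for example ones with a useful sort order), so treat this as a simplified model only.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner path; not a Postgres struct. */
typedef struct ToyPath
{
    const char *name;             /* label, e.g. "Seq Scan" */
    double      total_cost;       /* estimated cost in planner cost units */
    bool        is_partial_index; /* index definition has a WHERE predicate */
    bool        pred_ok;          /* predicate is implied by the query */
} ToyPath;

/*
 * Return the cheapest usable path, or NULL if no path qualifies.
 * Partial indexes whose predicate does not match the query are skipped,
 * loosely mirroring the "indpred != NIL && !predOK" check shown earlier.
 */
static const ToyPath *
cheapest_usable_path(const ToyPath *paths, size_t npaths)
{
    const ToyPath *best = NULL;

    for (size_t i = 0; i < npaths; i++)
    {
        const ToyPath *p = &paths[i];

        assert(p->total_cost >= 0.0);

        if (p->is_partial_index && !p->pred_ok)
            continue;           /* this index cannot answer the query */

        if (best == NULL || p->total_cost < best->total_cost)
            best = p;
    }

    return best;
}
```

&lt;p&gt;Given a sequential scan, a cheap partial index that does not match the query, and a regular index scan, this sketch ignores the partial index and returns whichever of the remaining two has the lower &lt;code &gt;total_cost&lt;/code&gt;.&lt;/p&gt;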
&lt;p&gt;For understanding how costing works, you can look at the &lt;code &gt;cost_index&lt;/code&gt; function (&lt;a href=&quot;https://github.com/postgres/postgres/blob/9f91344223aad903ff70301f40183691a89f6cd4/src/backend/optimizer/path/costsize.c#L492&quot;&gt;source&lt;/a&gt;), which gets called from &lt;code &gt;build_index_paths&lt;/code&gt; through a few hoops.&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
* cost_index
*    Determines and returns the cost of scanning a relation using an index.
…
* In addition to rows, startup_cost and total_cost, cost_index() sets the
* path&apos;s indextotalcost and indexselectivity fields.  These values will be
* needed if the IndexPath is used in a BitmapIndexScan.
*/&lt;/span&gt;
&lt;span &gt;void&lt;/span&gt;
&lt;span &gt;cost_index&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;IndexPath &lt;span &gt;*&lt;/span&gt;path&lt;span &gt;,&lt;/span&gt; PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; &lt;span &gt;double&lt;/span&gt; loop_count&lt;span &gt;,&lt;/span&gt;
          bool partial_path&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   &lt;span &gt;/*
    * Call index-access-method-specific code to estimate the processing cost
    * for scanning the index, as well as the selectivity of the index (ie,
    * the fraction of main-table tuples we will have to retrieve) and its
    * correlation to the main-table tuple order.
    */&lt;/span&gt;
   &lt;span &gt;amcostestimate&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; path&lt;span &gt;,&lt;/span&gt; loop_count&lt;span &gt;,&lt;/span&gt;
                  &lt;span &gt;&amp;amp;&lt;/span&gt;indexStartupCost&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;indexTotalCost&lt;span &gt;,&lt;/span&gt;
                  &lt;span &gt;&amp;amp;&lt;/span&gt;indexSelectivity&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;indexCorrelation&lt;span &gt;,&lt;/span&gt;
                  &lt;span &gt;&amp;amp;&lt;/span&gt;index_pages&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Whilst there are other factors in costing an index scan, the main responsibility falls to the &lt;a href=&quot;https://www.postgresql.org/docs/current/indexam.html&quot;&gt;Index Access Method&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;understanding-b-tree-index-cost-estimates&quot; &gt;&lt;a href=&quot;#understanding-b-tree-index-cost-estimates&quot; aria-label=&quot;understanding b tree index cost estimates permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Understanding B-tree index cost estimates&lt;/h3&gt;
&lt;p&gt;The most common index access method (or index type) is B-tree, so let’s look at &lt;code &gt;btcostestimate&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;void&lt;/span&gt;
&lt;span &gt;btcostestimate&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; IndexPath &lt;span &gt;*&lt;/span&gt;path&lt;span &gt;,&lt;/span&gt; &lt;span &gt;double&lt;/span&gt; loop_count&lt;span &gt;,&lt;/span&gt;
              Cost &lt;span &gt;*&lt;/span&gt;indexStartupCost&lt;span &gt;,&lt;/span&gt; Cost &lt;span &gt;*&lt;/span&gt;indexTotalCost&lt;span &gt;,&lt;/span&gt;
              Selectivity &lt;span &gt;*&lt;/span&gt;indexSelectivity&lt;span &gt;,&lt;/span&gt; &lt;span &gt;double&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;indexCorrelation&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;double&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;indexPages&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
…
   &lt;span &gt;/*
    * For a btree scan, only leading &apos;=&apos; quals plus inequality quals for the
    * immediately next attribute contribute to index selectivity (these are
    * the &quot;boundary quals&quot; that determine the starting and stopping points of
    * the index scan).
    */&lt;/span&gt;
   indexBoundQuals &lt;span &gt;=&lt;/span&gt; …
 
   &lt;span &gt;/*
    * If the index is partial, AND the index predicate with the
    * index-bound quals to produce a more accurate idea of the number of
    * rows covered by the bound conditions.
    */&lt;/span&gt;
   selectivityQuals &lt;span &gt;=&lt;/span&gt; &lt;span &gt;add_predicate_to_index_quals&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;index&lt;span &gt;,&lt;/span&gt; indexBoundQuals&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
 
   btreeSelectivity &lt;span &gt;=&lt;/span&gt; &lt;span &gt;clauselist_selectivity&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; selectivityQuals&lt;span &gt;,&lt;/span&gt;
                                             index&lt;span &gt;-&gt;&lt;/span&gt;rel&lt;span &gt;-&gt;&lt;/span&gt;relid&lt;span &gt;,&lt;/span&gt;
                                             JOIN_INNER&lt;span &gt;,&lt;/span&gt;
                                             &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
   numIndexTuples &lt;span &gt;=&lt;/span&gt; btreeSelectivity &lt;span &gt;*&lt;/span&gt; index&lt;span &gt;-&gt;&lt;/span&gt;rel&lt;span &gt;-&gt;&lt;/span&gt;tuples&lt;span &gt;;&lt;/span&gt;
…
   costs&lt;span &gt;.&lt;/span&gt;numIndexTuples &lt;span &gt;=&lt;/span&gt; numIndexTuples&lt;span &gt;;&lt;/span&gt;
   &lt;span &gt;genericcostestimate&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; path&lt;span &gt;,&lt;/span&gt; loop_count&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;costs&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
…&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, a lot revolves around determining how many index tuples the scan will match, since reading those tuples is the most expensive part of querying a B-tree index.&lt;/p&gt;
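&lt;p&gt;The arithmetic behind &lt;code &gt;numIndexTuples&lt;/code&gt; in the excerpt above can be sketched in a few lines. This is a simplified model with hypothetical function names: it assumes the clauses are statistically independent, so their selectivities are simply multiplied. The real &lt;code &gt;clauselist_selectivity&lt;/code&gt; handles more cases than this.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/*
 * Combine per-clause selectivities by multiplication. This assumes the
 * clauses are independent of each other, which is a simplification.
 * An empty clause list yields 1.0 (every row qualifies).
 */
static double
toy_clauselist_selectivity(const double *selectivities, size_t nclauses)
{
    double result = 1.0;

    for (size_t i = 0; i < nclauses; i++)
    {
        assert(selectivities[i] >= 0.0 && selectivities[i] <= 1.0);
        result *= selectivities[i];
    }

    return result;
}

/* Estimated index tuples visited: combined selectivity times table rows. */
static double
toy_num_index_tuples(const double *selectivities, size_t nclauses,
                     double rel_tuples)
{
    return toy_clauselist_selectivity(selectivities, nclauses) * rel_tuples;
}
```

&lt;p&gt;For example, two boundary clauses with selectivities 0.5 and 0.25 on a table of 1000 rows give an estimate of 125 index tuples, while a scan without any boundary clause is estimated to visit all 1000.&lt;/p&gt;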
&lt;p&gt;The first step is determining the boundaries of the index scan, as it relates to the data stored in the index. In particular this is relevant for multi-column B-tree indexes, where only a subset of the columns might match the query.&lt;/p&gt;
&lt;p&gt;You may have heard before about the best practice of ordering B-tree columns so the columns that are queried by an equality comparison (“=” operator) are put first, followed by one optional inequality comparison (a range operator such as “&amp;#x3C;” or “&gt;”), followed by any other columns. This recommendation is based on the physical structure of the B-tree index, and the cost model also reflects this constraint.&lt;/p&gt;
&lt;p&gt;Put differently: The more specific you are with matching equality comparisons, the fewer parts of the index have to be scanned. This is represented here by the calculation of “btreeSelectivity”. If this number is small, the cost of the index scan will be lower, as determined by “genericcostestimate” based on the estimated number of index tuples being scanned.&lt;/p&gt;
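&lt;p&gt;The “leading equality columns plus one inequality” rule from the source comment can be written down as a small function. This is a simplified sketch with hypothetical names (&lt;code &gt;ToyQualKind&lt;/code&gt;, &lt;code &gt;count_boundary_columns&lt;/code&gt;). It only distinguishes equality and range clauses and ignores other clause types that the real &lt;code &gt;btcostestimate&lt;/code&gt; has to deal with.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* What kind of clause, if any, the query has on one index column. */
typedef enum ToyQualKind
{
    QUAL_NONE,      /* no usable clause on this column */
    QUAL_EQUALITY,  /* column = value */
    QUAL_RANGE      /* column <, <=, > or >= value */
} ToyQualKind;

/*
 * Count the leading index columns whose clauses act as boundary quals:
 * every leading equality column, plus at most one range column directly
 * after them. Clauses on later columns do not narrow the scanned range.
 */
static size_t
count_boundary_columns(const ToyQualKind *columns, size_t ncolumns)
{
    size_t n = 0;

    assert(columns != NULL || ncolumns == 0);

    while (n < ncolumns && columns[n] == QUAL_EQUALITY)
        n++;

    if (n < ncolumns && columns[n] == QUAL_RANGE)
        n++;

    return n;
}
```

&lt;p&gt;For an index on (a, b, c) and a query with an equality on a, a range on b and an equality on c, this returns 2: the clause on c can still filter index entries, but in this model it does not reduce how much of the index is scanned. If the first column has no clause at all, the result is 0.&lt;/p&gt;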
&lt;p&gt;&lt;strong&gt;For creating the ideal B-tree index, you would:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Focus on indexing columns used in equality comparisons&lt;/li&gt;
&lt;li&gt;Index the columns with the best selectivity (i.e. being most specific), so that only a small portion of the index has to be scanned&lt;/li&gt;
&lt;li&gt;Involve a small number of columns (possibly only one), to keep the index size small - and thus reduce the total number of pages in the index&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you follow these steps, you will create a B-tree index that has a low cost, and that Postgres should choose.&lt;/p&gt;
&lt;p&gt;Now, there is one more thing we wanted to talk about, and that involves the notion of Parameterized Index Scans:&lt;/p&gt;
&lt;h3 id=&quot;parameterized-index-scans-or-why-nested-loop-are-sometimes-a-good-join-type&quot; &gt;&lt;a href=&quot;#parameterized-index-scans-or-why-nested-loop-are-sometimes-a-good-join-type&quot; aria-label=&quot;parameterized index scans or why nested loop are sometimes a good join type permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parameterized Index Scans, or: Why Nested Loops are sometimes a good join type&lt;/h3&gt;
&lt;p&gt;As noted earlier, when Postgres looks at the potential index scans, it creates both plain index scans, and parameterized index scans.&lt;/p&gt;
&lt;p&gt;Plain index scans only use the parts of your query that reference the table itself, which are typically the conditions found in the WHERE clause.&lt;/p&gt;
&lt;p&gt;Parameterized index scans, on the other hand, also use the parts of your query that reference two different tables. You will often find these conditions in the JOIN clause.&lt;/p&gt;
&lt;p&gt;Let’s take a look at a practical example. Assume the following schema and indexes:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; t1 &lt;span &gt;(&lt;/span&gt;
  id &lt;span &gt;bigint&lt;/span&gt; &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  field &lt;span &gt;text&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; t2 &lt;span &gt;(&lt;/span&gt;
  id &lt;span &gt;bigint&lt;/span&gt; &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  t1_id &lt;span &gt;bigint&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  other_field &lt;span &gt;text&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; t1_field_idx &lt;span &gt;ON&lt;/span&gt; t1&lt;span &gt;(&lt;/span&gt;field&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; t2_t1_id_idx &lt;span &gt;ON&lt;/span&gt; t2&lt;span &gt;(&lt;/span&gt;t1_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And this query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; t1
&lt;span &gt;JOIN&lt;/span&gt; t2 &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;t1&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;=&lt;/span&gt; t2&lt;span &gt;.&lt;/span&gt;t1_id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;WHERE&lt;/span&gt; t1&lt;span &gt;.&lt;/span&gt;field &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;123&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We have two tables to scan - t1 and t2.&lt;/p&gt;
&lt;p&gt;For t1, we can utilize a plain index scan on the &lt;code &gt;t1_field_idx&lt;/code&gt; index - and that will perform well, since the query contains a specific value that ideally matches a small number of rows.&lt;/p&gt;
&lt;p&gt;When we run an EXPLAIN on the query, the simplest plan will look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; t1
&lt;span &gt;JOIN&lt;/span&gt; t2 &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;t1&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;=&lt;/span&gt; t2&lt;span &gt;.&lt;/span&gt;t1_id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;WHERE&lt;/span&gt; t1&lt;span &gt;.&lt;/span&gt;field &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;123&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Hash Join  (cost=13.74..37.26 rows=5 width=88)
   Hash Cond: (t2.t1_id = t1.id)
   -&gt;  Seq Scan on t2  (cost=0.00..20.70 rows=1070 width=48)
   -&gt;  Hash  (cost=13.67..13.67 rows=6 width=40)
         -&gt;  Bitmap Heap Scan on t1  (cost=4.20..13.67 rows=6 width=40)
               Recheck Cond: (field = &apos;123&apos;::text)
               -&gt;  Bitmap Index Scan on t1_field_idx  (cost=0.00..4.20 rows=6 width=0)
                     Index Cond: (field = &apos;123&apos;::text)
(8 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or put visually:&lt;/p&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Index scans of an example Hash Join&quot; title=&quot;Index scans of an example Hash Join&quot; src=&quot;https://pganalyze.com/static/8889223c0f4a4b4b0c07c5f35bdf24eb/f8067/hash_join.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;As we can see, Postgres uses a Sequential Scan on t2. Let’s add some more data into the tables, to see if that changes the plan:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; t1 &lt;span &gt;SELECT&lt;/span&gt; val&lt;span &gt;,&lt;/span&gt; val::&lt;span &gt;text&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; x&lt;span &gt;(&lt;/span&gt;val&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; t2 &lt;span &gt;SELECT&lt;/span&gt; val&lt;span &gt;,&lt;/span&gt; val&lt;span &gt;,&lt;/span&gt; val::&lt;span &gt;text&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; x&lt;span &gt;(&lt;/span&gt;val&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that we are effectively creating exactly one entry that matches the &lt;code &gt;t1.field = &apos;123&apos;&lt;/code&gt; condition, and we are also creating exactly one t2 entry for each t1 entry.&lt;/p&gt;
&lt;p&gt;If we re-run the EXPLAIN, we get the following plan:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Nested Loop  (cost=0.55..16.60 rows=1 width=30)
   -&gt;  Index Scan using t1_field_idx on t1  (cost=0.28..8.29 rows=1 width=11)
         Index Cond: (field = &apos;123&apos;::text)
   -&gt;  Index Scan using t2_t1_id_idx on t2  (cost=0.28..8.29 rows=1 width=19)
         Index Cond: (t1_id = t1.id)
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Index scans of an example Nested Loop Join&quot; title=&quot;Index scans of an example Nested Loop Join&quot; src=&quot;https://pganalyze.com/static/cfc4b253e446e331010d8a28b593864b/50383/nested_loop_join.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;As you can see, we now get an index scan on &lt;code &gt;t2_t1_id_idx&lt;/code&gt;. This shows a Parameterized Index Scan in action - this is only possible because the join chosen by Postgres is a Nested Loop - not a Hash Join or Merge Join.&lt;/p&gt;
&lt;p&gt;A quick summary of how different join types impact index usage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Merge Join:&lt;/strong&gt; Needs sorted output from the scan node (thus can benefit from a sorted index like B-tree), but doesn&apos;t use the JOIN clause to restrict the data when scanning the table&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hash Join:&lt;/strong&gt; Doesn’t need sorted output, and doesn’t use the JOIN clause to restrict the data when scanning the table&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nested Loop Join:&lt;/strong&gt; Doesn’t need sorted output from the scan node, but &lt;strong&gt;for one of the two tables&lt;/strong&gt; uses the JOIN clause to restrict the data when scanning the table&lt;/li&gt;
&lt;/ul&gt;
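&lt;p&gt;To see why the Nested Loop can win here, consider this toy model of the inner side of the join. It is a sketch under stated assumptions, not how Postgres implements it: the “index” on &lt;code &gt;t2.t1_id&lt;/code&gt; is a sorted array of unique keys, a binary search stands in for descending the B-tree, and all function names are hypothetical. Each outer row supplies the value for one probe, which is the role &lt;code &gt;t1.id&lt;/code&gt; plays in the &lt;code &gt;Index Cond: (t1_id = t1.id)&lt;/code&gt; line of the plan above.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/*
 * Probe a sorted array of unique keys (our stand-in for the index on
 * t2.t1_id) for one value. Sets *found to 1 on a match, 0 otherwise, and
 * returns how many entries were compared along the way.
 */
static size_t
probe_sorted_keys(const long *keys, size_t nkeys, long wanted, int *found)
{
    size_t lo = 0;
    size_t hi = nkeys;
    size_t visited = 0;

    assert(found != NULL);
    *found = 0;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        visited++;
        if (keys[mid] == wanted)
        {
            *found = 1;
            break;
        }
        if (keys[mid] < wanted)
            lo = mid + 1;
        else
            hi = mid;
    }

    return visited;
}

/*
 * Nested loop with a parameterized inner scan: one index probe per outer
 * row. Returns the total number of inner entries compared and stores the
 * number of joined rows in *matches.
 */
static size_t
nested_loop_inner_visits(const long *outer_ids, size_t nouter,
                         const long *inner_keys, size_t ninner,
                         size_t *matches)
{
    size_t visited = 0;

    assert(matches != NULL);
    *matches = 0;

    for (size_t i = 0; i < nouter; i++)
    {
        int found;

        visited += probe_sorted_keys(inner_keys, ninner, outer_ids[i], &found);
        if (found)
            *matches += 1;
    }

    return visited;
}
```

&lt;p&gt;With the 1001 rows inserted into t2 above and a single outer row from t1, one probe compares at most 10 keys in this model, whereas the Hash Join plan has to read every row of t2 to build or probe its hash table. The picture flips when the outer side returns many rows, since every additional outer row costs another probe.&lt;/p&gt;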
&lt;p&gt;Understanding what’s in your WHERE, your JOIN clause and your likely JOIN type is key, as all three will impact index usage.&lt;/p&gt;
&lt;p&gt;If you see a surprising Sequential Scan, check whether the only index scans available for that table were parameterized index scans (which require a Nested Loop Join), and how the plan changes when you add an additional WHERE clause.&lt;/p&gt;
&lt;h2 id=&quot;new-features-coming-soon-to-pganalyze&quot; &gt;&lt;a href=&quot;#new-features-coming-soon-to-pganalyze&quot; aria-label=&quot;new features coming soon to pganalyze permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;New features coming soon to pganalyze&lt;/h2&gt;
&lt;p&gt;If you find you’re having a hard time reasoning about all of this, you are not alone!&lt;/p&gt;
&lt;p&gt;The reason we’ve spent a lot of time looking through these parts of the Postgres source code is that they form the basis of a new upcoming version of the Index Advisor.&lt;/p&gt;
&lt;p&gt;And as part of the new Index Advisor, we’ll show you additional information for all scans on a table, to help you assess how Postgres uses existing indexes, and what the best indexing strategy might be.&lt;/p&gt;
&lt;p&gt;Here is a sneak peek from our current design iteration:&lt;/p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Upcoming pganalyze Scans UI&quot; title=&quot;Upcoming pganalyze Scans UI&quot; src=&quot;https://pganalyze.com/static/9dfa02d25d55a229d1fe898a27b1c2e7/1d69c/pganalyze_scans.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;p&gt;The same WHERE clause and JOIN clause data from the Postgres planner is shown in the Scans list, to help you make an assessment of how Postgres builds Plain Index Scans and Parameterized Index Scans for your queries.&lt;/p&gt;
&lt;p&gt;But more on this another day!&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this post, we chased through the Postgres source code until we found the place where indexing decisions happen. We’ve looked at B-tree costing in particular, and at the puzzling case of how Nested Loops can affect index usage by allowing the use of Parameterized Index Scans.&lt;/p&gt;
&lt;p&gt;If you optimize your queries, it helps to understand which tables you are scanning, and what the involved WHERE and JOIN clauses are. Additionally, it’s important to understand the different join types, and that only Nested Loop joins can use the JOIN clause as an index condition, while Merge Joins can still benefit from the sort order of an index.&lt;/p&gt;
&lt;h2 id=&quot;other--helpful-resources&quot; &gt;&lt;a href=&quot;#other--helpful-resources&quot; aria-label=&quot;other  helpful resources permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Other  helpful resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/postgres-create-index&quot;&gt;Using Postgres CREATE INDEX: Understanding operator classes, index types &amp;#x26; more&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/gin-index&quot;&gt;Understanding Postgres GIN Indexes: The Good and the Bad&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/deconstructing-the-postgres-planner&quot;&gt;How we deconstructed the Postgres planner to find indexing opportunities&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/introducing-pganalyze-index-advisor&quot;&gt;A better way to index your Postgres database: pganalyze Index Advisor&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/postgres-indexing&quot;&gt;Effective Indexing in Postgres eBook&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;Best Practices for Optimizing Postgres Query Performance eBook&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-optimizing-postgres-text-search-trigrams-gist-indexes&quot;&gt;5mins of Postgres E6: Optimizing Postgres Text Search with Trigrams and GiST indexes&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres in 2021: An Observer's Year In Review]]></title><description><![CDATA[Every January, the pganalyze team takes time to sit down to reflect on the year gone by. Of course, we are thinking about pganalyze, our customers and how we can improve our product. But, more importantly, we always take a bird's-eye view at what has happened in our industry, and specifically in the Postgres community. As you can imagine: A lot! So we thought: Instead of trying to summarize everything, let's review what happened with the Postgres project, and what is most exciting from our…]]></description><link>https://pganalyze.com/blog/postgres-2021-year-in-review</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-2021-year-in-review</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Fri, 07 Jan 2022 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Every January, the pganalyze team takes time to sit down to reflect on the year gone by. Of course, we are thinking about pganalyze, our customers and how we can improve our product. But, more importantly, we always take a bird&apos;s-eye view at what has happened in our industry, and specifically in the Postgres community. As you can imagine: A lot!&lt;/p&gt;
&lt;p&gt;So we thought: Instead of trying to summarize everything, &lt;strong&gt;let&apos;s review what happened with the Postgres project, and what is most exciting from our personal perspective&lt;/strong&gt;. Coincidentally, a new Postgres &lt;a href=&quot;https://commitfest.postgresql.org/&quot;&gt;Commitfest&lt;/a&gt; has just started, so it&apos;s the perfect time to read about new functionality that is being proposed by the PostgreSQL community.&lt;/p&gt;
&lt;p&gt;The following are my own thoughts on the past year of Postgres, and a few of the things that I&apos;m excited about looking ahead. Let&apos;s take a look:&lt;/p&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/01f82442d85bf3fb837f7eb6e385542b/ec605/postgres_2021_year_in_review_pganalyze.jpg&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Postgres: 2021 Year In Review&quot; title=&quot;Postgres: 2021 Year In Review&quot; src=&quot;https://pganalyze.com/static/01f82442d85bf3fb837f7eb6e385542b/acb04/postgres_2021_year_in_review_pganalyze.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#postgres-performance-sometimes-its-the-small-things&quot;&gt;Postgres Performance: Sometimes it&apos;s the small things&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#does-autovacuum-dream-of-64-bit-transaction-ids&quot;&gt;Does autovacuum dream of 64-bit Transaction IDs?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#explain-nested-loops-can-be-deceiving&quot;&gt;EXPLAIN: Nested Loops can be deceiving&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#extended-statistics-help-the-postgres-planner-do-its-job-better&quot;&gt;Extended Statistics: Help the Postgres planner do its job better&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#crustaceous-postgres-using-rust-for-extensions--more&quot;&gt;Crustaceous Postgres: Using Rust For Extensions &amp;#x26; more&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#other-highlights-from-postgres-development-in-2021&quot;&gt;Other highlights from Postgres development in 2021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;postgres-performance-sometimes-its-the-small-things&quot; &gt;&lt;a href=&quot;#postgres-performance-sometimes-its-the-small-things&quot; aria-label=&quot;postgres performance sometimes its the small things permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres Performance: Sometimes it&apos;s the small things&lt;/h2&gt;
&lt;p&gt;To start with, I wanted to look at one very specific change that I actually hadn&apos;t noticed until recently.&lt;/p&gt;
&lt;p&gt;Specifically: The performance of &lt;code &gt;IN&lt;/code&gt; clauses, and the work done to improve performance for long &lt;code &gt;IN&lt;/code&gt; lists in Postgres 14.&lt;/p&gt;
&lt;p&gt;First, let&apos;s set up a test table with some data that we can query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; tbl &lt;span &gt;(&lt;/span&gt;
    id &lt;span &gt;int&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; tbl &lt;span &gt;SELECT&lt;/span&gt; i &lt;span &gt;FROM&lt;/span&gt; generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;100000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; n&lt;span &gt;(&lt;/span&gt;i&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, we run a very simple query with a long &lt;code &gt;IN&lt;/code&gt; list on Postgres 13:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres&lt;span &gt;=&lt;/span&gt;&lt;span &gt;# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);&lt;/span&gt;
 count 
&lt;span &gt;-------&lt;/span&gt;
  &lt;span &gt;1000&lt;/span&gt;
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt; &lt;span &gt;row&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

&lt;span &gt;Time&lt;/span&gt;: &lt;span &gt;360.520&lt;/span&gt; ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is noticeably slow. With Postgres 14 however:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;postgres&lt;span &gt;=&lt;/span&gt;&lt;span &gt;# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);&lt;/span&gt;
 count 
&lt;span &gt;-------&lt;/span&gt;
  &lt;span &gt;1000&lt;/span&gt;
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt; &lt;span &gt;row&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

&lt;span &gt;Time&lt;/span&gt;: &lt;span &gt;12.246&lt;/span&gt; ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;An amazing improvement of nearly 30x!&lt;/strong&gt; Note that this is most pronounced with Sequential Scans, or in other situations where the executor makes a lot of comparisons, i.e. when the expression shows up as a &lt;code &gt;Filter&lt;/code&gt; clause.&lt;/p&gt;
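&lt;p&gt;If you want to reproduce this comparison yourself, one possible approach is to let psql generate the long list for you. The following is a minimal sketch that assumes the &lt;code &gt;tbl&lt;/code&gt; table from above and an interactive psql session. It uses &lt;code &gt;string_agg&lt;/code&gt; to build the 1000 values and the psql &lt;code &gt;\gexec&lt;/code&gt; meta-command to run the generated statement. Your timings will differ depending on hardware and configuration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;\timing on

-- Build &quot;SELECT count(*) FROM tbl WHERE id IN (1,2,...,1000)&quot; and execute it
SELECT format(
    &apos;SELECT count(*) FROM tbl WHERE id IN (%s)&apos;,
    string_agg(i::text, &apos;,&apos;)
)
FROM generate_series(1, 1000) AS n(i) \gexec&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the same script against a Postgres 13 and a Postgres 14 server should make the difference visible without having to paste the list by hand.&lt;/p&gt;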
&lt;p&gt;The reason I like this change is that it demonstrates what the Postgres community does well: Refine the existing system,
and optimize clear inefficiencies, without requiring users to change their queries.&lt;/p&gt;
&lt;p&gt;Of course, there are many other exciting performance efforts, here are a few:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://pganalyze.com/blog/postgres-14-performance-monitoring#improved-active-and-idle-connection-scaling-in-postgres-14&quot;&gt;Connection scaling improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://blog.jooq.org/postgresql-14s-enable_memoize-for-improved-performance-of-nested-loop-joins/&quot;&gt;Memoization of Nested Loops&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://www.postgresql.org/docs/14/libpq-pipeline-mode.html&quot;&gt;libpq pipelining&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3499/&quot;&gt;libpq compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3316/&quot;&gt;Asynchronous I/O and direct I/O&lt;/a&gt; (see also this &lt;a href=&quot;https://speakerdeck.com/azuredbpostgres/asynchronous-io-for-postgresql-pgcon-2020-andres-freund&quot;&gt;presentation&lt;/a&gt; by Andres Freund)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;does-autovacuum-dream-of-64-bit-transaction-ids&quot; &gt;&lt;a href=&quot;#does-autovacuum-dream-of-64-bit-transaction-ids&quot; aria-label=&quot;does autovacuum dream of 64 bit transaction ids permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Does autovacuum dream of 64-bit Transaction IDs?&lt;/h2&gt;
&lt;p&gt;Now, on to a much bigger topic. If you&apos;ve scaled Postgres, you&apos;ve likely come to meet the archenemy of a large Postgres installation: VACUUM, or rather its cousin, autovacuum, which cleans up dead tuples from your tables and advances the transaction ID horizon in Postgres.&lt;/p&gt;
&lt;p&gt;Much has been said (&lt;a href=&quot;https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;https://blog.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;https://www.joyent.com/blog/manta-postmortem-7-27-2015&quot;&gt;3&lt;/a&gt;) about what happens when you hit &lt;strong&gt;Transaction ID (TXID) Wraparound&lt;/strong&gt;, a situation in which Postgres is unable to start a new transaction. A recent blog post on Notion&apos;s motivation for sharding their Postgres deployment puts it well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;More worrying was the prospect of transaction ID (TXID) wraparound, a safety mechanism in which Postgres would stop processing all writes to avoid clobbering existing data.
Realizing that TXID wraparound would pose an existential threat to the product, our infrastructure team doubled down and got to work.
&lt;br/&gt;&lt;br/&gt;
&lt;em&gt;- &lt;a href=&quot;https://www.notion.so/blog/sharding-postgres-at-notion&quot;&gt;Garrett Fidalgo - Herding elephants: Lessons learned from sharding Postgres at Notion&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The root cause here is actually very simple: Transaction IDs are stored as 32-bit integers in Postgres. For example, each row in a table carries a transaction ID that identifies when the row first became visible to other transactions.&lt;/p&gt;
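&lt;p&gt;To get a feel for this on your own system, here is a small sketch using built-in catalog columns. The first query assumes the &lt;code &gt;tbl&lt;/code&gt; table from the earlier example still exists. The percentage in the second query is an illustrative approximation relative to 2^31 and not an exact threshold, since Postgres stops accepting new transactions somewhat before that point:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- The inserting transaction ID is visible as the hidden xmin system column
SELECT xmin, id FROM tbl LIMIT 3;

-- How far each database is from the 2^31 (about 2.1 billion) XID limit
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_of_limit
FROM pg_database
ORDER BY xid_age DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;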
&lt;p&gt;&lt;strong&gt;Most people would agree that moving from 32-bit to 64-bit Transaction IDs is a good idea.&lt;/strong&gt; There have been multiple attempts over the years, but in recent weeks a new patch by Maxim Orlov has kickstarted a &lt;a href=&quot;https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyjVWA@mail.gmail.com&quot;&gt;new discussion&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Whilst the community&apos;s motivation to fix this is certainly there, the early reviews give a glimpse of what needs to be considered when moving to 64-bit TXIDs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;32-bit systems will have issues with atomic read/write of shared transaction ID variables&lt;/li&gt;
&lt;li&gt;Extremely long-running transactions could fail if they exceed the new &quot;short transaction ID&quot; boundary (which remains at 32-bit in this patch)&lt;/li&gt;
&lt;li&gt;On-disk format - keeping compatibility with the old format vs rewriting all data when an old cluster is upgraded (this patch tries to avoid changing the on-disk format)&lt;/li&gt;
&lt;li&gt;Multixact freeze still needs to happen at a somewhat regular frequency (one of the activities that VACUUM takes care of today)&lt;/li&gt;
&lt;li&gt;Memory overhead of larger 64-bit IDs in hot code paths (e.g. those optimized by recent connection scalability improvements)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And Peter Geoghegan &lt;a href=&quot;https://www.postgresql.org/message-id/flat/CAH2-Wzk68iW_z0rb8VxEchQavHLPLPXv_Vkx954B%3DBmqSrL_mQ%40mail.gmail.com#4d0f09cc88ae1ee58ba3278e827a82dc&quot;&gt;puts it succinctly&lt;/a&gt; in reviewing the patch:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I believe that a good solution to the problem that this patch tries to
solve needs to be more ambitious. I think that we need to return to
first principles, rather than extending what we have already.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Despite the email thread being titled &quot;Add 64-bit XIDs into PostgreSQL 15&quot;, given these concerns, it&apos;s extremely unlikely that a change like this would make it into Postgres 15 at this point - but one can dream, and look ahead to Postgres 16.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Looking for something you can use today?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Postgres 14 brought two great improvements in the area of VACUUM and bloat reduction: (1) &lt;a href=&quot;https://www.postgresql.org/docs/14/btree-implementation.html#BTREE-DELETION&quot;&gt;The new bottom-up index deletion for B-tree indexes&lt;/a&gt;, (2) The new VACUUM &quot;emergency mode&quot; that provides better protection against impending TXID Wraparound.&lt;/p&gt;
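&lt;p&gt;As a rough sketch of how you might keep an eye on this, the following queries show the &lt;code &gt;vacuum_failsafe_age&lt;/code&gt; setting that controls the emergency mode in Postgres 14 and newer, along with the tables in the current database that have the oldest unfrozen transaction IDs. The limit of 10 rows is an arbitrary choice for illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Postgres 14+: XID age at which VACUUM switches to its failsafe behavior
SHOW vacuum_failsafe_age;

-- Tables with the oldest unfrozen transaction IDs in the current database
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN (&apos;r&apos;, &apos;m&apos;, &apos;t&apos;)
ORDER BY xid_age DESC
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;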
&lt;h2 id=&quot;explain-nested-loops-can-be-deceiving&quot; &gt;&lt;a href=&quot;#explain-nested-loops-can-be-deceiving&quot; aria-label=&quot;explain nested loops can be deceiving permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;EXPLAIN: Nested Loops can be deceiving&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://wiki.postgresql.org/wiki/CommitFest&quot;&gt;Commitfests&lt;/a&gt; are about encouraging code reviews, first and foremost. Whilst looking through patches, I noticed a small one, which adds additional information about &lt;a href=&quot;https://commitfest.postgresql.org/36/2765/&quot;&gt;Nested Loops to EXPLAIN&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The patch was initially proposed back in 2020, and saw some minor refactorings in 2021, but no-one had reviewed it yet in this Commitfest. So I took the opportunity to &lt;a href=&quot;https://www.postgresql.org/message-id/flat/CAP53Pkw1d%2BsmuPvsVDecSnfphyZ46zrkSNjNEbSF3HA6-EsFkA%40mail.gmail.com#68986e7b42340acec3dc61f7af36cdf2&quot;&gt;review it&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First, to understand what the patch aims to do, let&apos;s look at a common EXPLAIN output for a Nested Loop:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   -&gt;  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   -&gt;  Append (actual rows=5 loops=5)
         -&gt;  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 &amp;lt; tbl1.col1)
         -&gt;  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 &amp;lt; tbl1.col1)
         -&gt;  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 &amp;lt; tbl1.col1)
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Based on this we might assume that each loop produces 5 rows, as the existing &quot;actual rows&quot; statistic shows the average across all loops.&lt;/p&gt;
&lt;p&gt;But this example already shows where the math doesn&apos;t add up: The parent &lt;a href=&quot;https://pganalyze.com/docs/explain/other-nodes/append&quot;&gt;Append&lt;/a&gt; node returns 5 rows on average, but the child node &quot;actual rows&quot; add up to 6. And the top &lt;a href=&quot;https://pganalyze.com/docs/explain/join-nodes/nested-loop&quot;&gt;Nested Loop&lt;/a&gt; node returns 23 rows, but we can&apos;t see clearly which index these rows are being found in.&lt;/p&gt;
&lt;p&gt;With the patch in place, we get an extra line with &lt;code &gt;Loop&lt;/code&gt; information:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   -&gt;  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   -&gt;  Append (actual rows=5 loops=5)
         Loop Min Rows: 2  Max Rows: 6  Total Rows: 23
         -&gt;  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Loop Min Rows: 2  Max Rows: 2  Total Rows: 10
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 &amp;lt; tbl1.col1)
         -&gt;  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Loop Min Rows: 2  Max Rows: 3  Total Rows: 11
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 &amp;lt; tbl1.col1)
         -&gt;  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Loop Min Rows: 1  Max Rows: 1  Total Rows: 2
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 &amp;lt; tbl1.col1)
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see how much clearer the picture is with this. We can understand that both &lt;code &gt;tprt1_idx&lt;/code&gt; and &lt;code &gt;tprt2_idx&lt;/code&gt; contributed about equally to the result. We can also see that some loop iterations returned few rows (2), whereas others returned more (6). When &lt;code &gt;TIMING&lt;/code&gt; is turned on, you also get information on the min/max time of the loop iterations.&lt;/p&gt;
&lt;p&gt;Given the prevalence of slow query plans that contain a Nested Loop, this appears to be a very useful patch. The main open item with this patch appears to be the slight overhead caused by collecting additional statistics - something to be discussed further on the mailing list.&lt;/p&gt;
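&lt;p&gt;To make the arithmetic concrete, here is a small sketch that recomputes the per-loop averages from the &quot;Total Rows&quot; and &quot;loops&quot; values in the patched output above. The numbers are copied from that plan, and the use of &lt;code &gt;round()&lt;/code&gt; is an assumption that approximates how EXPLAIN displays the averaged row count:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- total_rows and loops are taken from the patched EXPLAIN output
SELECT node,
       total_rows,
       loops,
       round(total_rows::numeric / loops) AS avg_rows_per_loop
FROM (VALUES
    (&apos;Append&apos;,    23, 5),
    (&apos;tprt1_idx&apos;, 10, 5),
    (&apos;tprt2_idx&apos;, 11, 4),
    (&apos;tprt3_idx&apos;,  2, 2)
) AS t(node, total_rows, loops);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The rounded averages come out as 5, 2, 3 and 1, matching the &quot;actual rows&quot; values in the plan, while the child totals (10 + 11 + 2) sum to the 23 rows that the Append node returned overall.&lt;/p&gt;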
&lt;p&gt;Interested in other EXPLAIN improvements? Here&apos;s what happened recently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://pganalyze.com/blog/postgres-14-performance-monitoring#monitor-queries-with-the-built-in-postgres-query_id&quot;&gt;pg_stat_statements queryid is now built into core&lt;/a&gt;, and shows in EXPLAIN output&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3050/&quot;&gt;Showing applied extended statistics in EXPLAIN&lt;/a&gt; (to quote my colleague Maciek: &quot;Oh neat, that&apos;s pretty cool.&quot;)&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3298/&quot;&gt;Showing I/O timings spent reading/writing temp buffers in EXPLAIN&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;extended-statistics-help-the-postgres-planner-do-its-job-better&quot; &gt;&lt;a href=&quot;#extended-statistics-help-the-postgres-planner-do-its-job-better&quot; aria-label=&quot;extended statistics help the postgres planner do its job better permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Extended Statistics: Help the Postgres planner do its job better&lt;/h2&gt;
&lt;p&gt;Going back to what you can use today: Extended Statistics on Expressions, released in Postgres 14.&lt;/p&gt;
&lt;p&gt;Let&apos;s back up there for a moment. If you are not familiar, &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createstatistics.html&quot;&gt;extended statistics&lt;/a&gt; allow you to collect additional statistics about table contents, so the Postgres planner can provide better query plans.&lt;/p&gt;
&lt;p&gt;The general syntax is like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;STATISTICS&lt;/span&gt; &lt;span &gt;[&lt;/span&gt; &lt;span &gt;IF&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;EXISTS&lt;/span&gt; &lt;span &gt;]&lt;/span&gt; statistics_name
    &lt;span &gt;[&lt;/span&gt; &lt;span &gt;(&lt;/span&gt; statistics_kind &lt;span &gt;[&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt; &lt;span &gt;]&lt;/span&gt; &lt;span &gt;)&lt;/span&gt; &lt;span &gt;]&lt;/span&gt;
    &lt;span &gt;ON&lt;/span&gt; column_name&lt;span &gt;,&lt;/span&gt; column_name &lt;span &gt;[&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; table_name&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Before Postgres 14 you could already create extended statistics that help the planner understand the correlation between two columns, which is often necessary to avoid selectivity mis-estimates.&lt;/p&gt;
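&lt;p&gt;As a minimal sketch of such multi-column statistics, consider a hypothetical &lt;code &gt;addresses&lt;/code&gt; table in which the zip code largely determines the city. The table, column and statistics names here are made up for illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table where zip largely determines city
CREATE TABLE addresses (
    city text,
    zip  text
);

CREATE STATISTICS addresses_city_zip (dependencies, ndistinct)
    ON city, zip FROM addresses;

-- Extended statistics are only populated once the table is analyzed
ANALYZE addresses;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;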
&lt;p&gt;With the new extended statistics for expressions, you can inform the planner how selective a particular expression is, which in turn leads to better query plans. Here is an example of how to use this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; tbl &lt;span &gt;(&lt;/span&gt;
    a timestamptz
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;STATISTICS&lt;/span&gt; st &lt;span &gt;ON&lt;/span&gt; date_trunc&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;month&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; a&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; tbl&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will cause Postgres to collect statistics not only about &lt;code &gt;a&lt;/code&gt; itself (which it does by default), but also about the results of the expression that uses the &lt;code &gt;date_trunc&lt;/code&gt; function. You can find a complete example in the &lt;a href=&quot;https://www.postgresql.org/docs/14/sql-createstatistics.html&quot;&gt;Postgres docs&lt;/a&gt;.&lt;/p&gt;
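&lt;p&gt;If you are curious what was collected, one way to check is the &lt;code &gt;pg_stats_ext_exprs&lt;/code&gt; view that was added in Postgres 14. This sketch assumes the &lt;code &gt;tbl&lt;/code&gt; table and the &lt;code &gt;st&lt;/code&gt; statistics object from the example above, and that the table contains some data:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Statistics are gathered the next time the table is analyzed
ANALYZE tbl;

-- Inspect what was collected for the expression
SELECT statistics_name, expr, n_distinct
FROM pg_stats_ext_exprs
WHERE statistics_name = &apos;st&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;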
&lt;p&gt;In addition to this, there are many changes in-flight that are being discussed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3245/&quot;&gt;Improve selectivity estimates when extended statistics are present&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2831/&quot;&gt;Extended statistics for Var op Var clauses / Expr op Expr&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3055/&quot;&gt;Estimating JOINs using extended statistics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;crustaceous-postgres-using-rust-for-extensions--more&quot; &gt;&lt;a href=&quot;#crustaceous-postgres-using-rust-for-extensions--more&quot; aria-label=&quot;crustaceous postgres using rust for extensions  more permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Crustaceous Postgres: Using Rust For Extensions &amp;#x26; more&lt;/h2&gt;
&lt;p&gt;A side topic that isn&apos;t actually about Postgres development itself, but still pretty exciting on a larger scale: &lt;strong&gt;Postgres and Rust&lt;/strong&gt;. As you probably know, Postgres itself is written in C, and that is unlikely to change.&lt;/p&gt;
&lt;p&gt;However, there are two great examples of Rust being used to augment the Postgres ecosystem.&lt;/p&gt;
&lt;p&gt;First, you can write Postgres extensions in Rust using &lt;a href=&quot;https://github.com/zombodb/pgx&quot;&gt;pgx&lt;/a&gt;, and by now this approach has matured to the point that even established extension authors such as the TimescaleDB team have started adopting Rust for some of their projects, such as the &lt;a href=&quot;https://github.com/timescale/timescaledb-toolkit&quot;&gt;TimescaleDB toolkit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Second, there are new systems being developed that build on Postgres, that utilize Rust as their language of choice, e.g. for networked services. The most interesting development in 2021 in this regard is the work of the team at &lt;a href=&quot;https://github.com/zenithdb/zenith&quot;&gt;ZenithDB&lt;/a&gt;, that is working on an Apache 2.0-licensed variant of a shared disk-type scale-out architecture (similar to Amazon Aurora), built on Postgres, with services written in Rust.&lt;/p&gt;
&lt;h2 id=&quot;other-highlights-from-postgres-development-in-2021&quot; &gt;&lt;a href=&quot;#other-highlights-from-postgres-development-in-2021&quot; aria-label=&quot;other highlights from postgres development in 2021 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Other highlights from Postgres development in 2021&lt;/h2&gt;
&lt;p&gt;A single post could never do everything justice, but here are a few more things that caught my attention:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Better Security:&lt;/strong&gt; No more (unexpected) &lt;a href=&quot;https://www.depesz.com/2021/09/10/waiting-for-postgresql-15-revoke-public-create-from-public-schema-now-owned-by-pg_database_owner/&quot;&gt;creation of objects in the public schema&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What&apos;s in your JSONB, really?&lt;/strong&gt; Find out by &lt;a href=&quot;https://commitfest.postgresql.org/36/3500/&quot;&gt;Collecting statistics about JSONB columns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ANALYZE + Partitioning:&lt;/strong&gt; Did you know partitioned parent tables may need manual ANALYZE? &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e1efc5b465c844969a0ed0d07e1364f3ce424d8c&quot;&gt;With Postgres 14 it&apos;s easier to keep track of it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Filtering Logical Replication:&lt;/strong&gt; Want to filter your data &lt;a href=&quot;https://commitfest.postgresql.org/36/3230/&quot;&gt;by columns&lt;/a&gt;, or &lt;a href=&quot;https://commitfest.postgresql.org/36/2906/&quot;&gt;by rows&lt;/a&gt;? Postgres 15 may allow you to do both!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And even more things that are pretty cool:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://www.postgresql.org/docs/14/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION&quot;&gt;SCRAM-SHA-256 is now the default for password encryption&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://www.cybertec-postgresql.com/en/finally-a-system-level-read-all-data-role-for-postgresql/&quot;&gt;pg_read_all_data and pg_write_all_data roles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3138/&quot;&gt;Support for NSS as a libpq TLS backend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3414/&quot;&gt;Non-superuser subscription owners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3458/&quot;&gt;Support issuing SSL certificates for multiple IP addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JSON(B)&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;https://blog.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14&quot;&gt;JSON subscript operators&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73aa0a@sigaev.ru&quot;&gt;Custom TOASTer for JSONB, and Pluggable Toast&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2902/&quot;&gt;JSON_TABLE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2901/&quot;&gt;SQL/JSON&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2482/&quot;&gt;jsonpath syntax extensions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partitioning&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3052/&quot;&gt;Merging statistics from partition children instead of re-sampling everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2815/&quot;&gt;CREATE INDEX CONCURRENTLY on partitioned table&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3071/&quot;&gt;Lazy JIT IR code generation to increase JIT speed with partitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3478/&quot;&gt;AcquireExecutorLocks() and run-time pruning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/31/2694/&quot;&gt;Automatic partition creation&lt;/a&gt; (sadly this patch has no recent progress)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logical Replication&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Postgres 14: &lt;a href=&quot;http://amitkapila16.blogspot.com/2021/09/logical-replication-improvements-in.html&quot;&gt;Multiple improvements &amp;#x26; performance fixes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Postgres 15: &lt;a href=&quot;https://www.depesz.com/2021/11/16/waiting-for-postgresql-15-allow-publishing-the-tables-of-schema/&quot;&gt;Allow publishing all tables in a schema&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Postgres 15: &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8d74fc96db5fd547e077bf9bf4c3b67f821d71cd&quot;&gt;pg_stat_subscription_workers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/2968/&quot;&gt;Logical Decoding on standbys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3393/&quot;&gt;Synchronize Logical Replication slots to standbys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In Development: &lt;a href=&quot;https://commitfest.postgresql.org/36/3155/&quot;&gt;Logical decoding and replication of sequences&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&amp;#x26; more :)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You may also enjoy &lt;a href=&quot;https://commitfest.postgresql.org/36/&quot;&gt;looking at the current Commitfest&lt;/a&gt;, to make up your own mind.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The above might feel quite extensive, but it is still far from everything that happened with Postgres in 2021. I&apos;m excited to be part of such a vibrant community that keeps making Postgres better, and am eager to see what&apos;s to come for Postgres in 2022!&lt;/p&gt;
&lt;p&gt;At pganalyze we&apos;re committed to providing the best &lt;strong&gt;Postgres monitoring and observability&lt;/strong&gt; to help you uncover deep insights about Postgres performance. Whether your Postgres runs in the cloud, your on-premises data center, or a Raspberry Pi: &lt;a href=&quot;https://app.pganalyze.com/users/sign_up&quot;&gt;You can give pganalyze a try&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DPostgres%20in%202021%3A%20An%20Observer%27s%20Year%20In%20Review%E2%80%9D%20-%20In%20this%20article%2C%20%40pganalyze%20take%20a%20look%20at%20Postgres%2014%2C%20explaining%20nested%20loops%2C%20extended%20statistics%20for%20expressions%2C%2064-bit%20transaction%20IDs%2C%20and%20more%20exciting%20Postgres%20patches%20from%202021%3A%20https%3A%2F%2Fpganalyze.com%2Fblog%2Fpostgres-2021-year-in-review&quot;&gt;Share this on Twitter&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[The Fastest Way To Load Data Into Postgres With Ruby on Rails]]></title><description><![CDATA[Data migration is a delicate and sometimes complicated and time-consuming process. Whether you are loading data from a legacy application to a new application or you just want to move data from one database to another, you’ll most likely need to create a migration script that will be accurate, efficient, and fast to help with the process — especially if you are planning to load a huge amount of data. There are several ways you can load data from an old Rails app or other application to Rails. In…]]></description><link>https://pganalyze.com/blog/fastest-way-importing-data-into-postgres-with-ruby-rails</link><guid isPermaLink="false">https://pganalyze.com/blog/fastest-way-importing-data-into-postgres-with-ruby-rails</guid><dc:creator><![CDATA[Eze Sunday Eze]]></dc:creator><pubDate>Tue, 14 Dec 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Data migration is a delicate and sometimes complicated and time-consuming process. Whether you are loading data from a legacy application to a new application or you just want to move data from one database to another, you’ll most likely need to create a migration script that will be accurate, efficient, and fast to help with the process — especially if you are planning to load a huge amount of data.&lt;/p&gt;
&lt;p&gt;There are several ways you can load data from an old Rails app or other application to Rails. In this article, I’ll explain a few ways to load data to a PostgreSQL database with Rails. We’ll go over their pros and cons, so you can choose the method that works best for your situation.&lt;/p&gt;
&lt;p&gt;Postgres is an innovative database. According to &lt;a href=&quot;https://pgconf.in/files/presentations/2019/02-01Future_of_PostgreSQL_5.pdf&quot;&gt;a recent study by DB-Engines (PDF)&lt;/a&gt;, &lt;strong&gt;PostgreSQL’s popularity rating increased by 65 percent between January 2016 and January 2019&lt;/strong&gt;, while the ratings of MySQL, SQL Server, and Oracle each decreased by 10 to 16 percent during the same period.&lt;/p&gt;
&lt;p&gt;PostgreSQL has a strong reputation for handling large data sets. However, with the wrong tools and solutions, its powers can be undermined. So what’s the fastest way to load data to a Postgres database in your Rails app? Let’s look at four different methods, and then we’ll see which is the fastest.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#inserting-one-record-at-a-time-to-load-data-to-your-postgres-database&quot;&gt;Inserting one record at a time to load data to your Postgres database&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#pros-of-single-row-inserts-with-postgres&quot;&gt;Pros of single-row inserts with Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#cons-of-single-row-inserts-with-postgres&quot;&gt;Cons of single-row inserts with Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#bulk-inserts-with-active-record-import-to-load-data-to-your-postgres-database&quot;&gt;Bulk Inserts with Active Record Import to load data to your Postgres database&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#pros-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot;&gt;Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#cons-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot;&gt;Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-postgresql-copy-with-activerecord-copy-to-load-data-to-your-postgres-database&quot;&gt;Using PostgreSQL Copy with Activerecord-copy to load data to your Postgres database&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#pros-of-using-postgresql-copy-with-activerecord-copy&quot;&gt;Pros of using PostgreSQL Copy with Activerecord-copy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#cons-of-using-postgresql-copy-with-activerecord-copy&quot;&gt;Cons of using PostgreSQL Copy with Activerecord-copy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#4-using-background-jobs--to-load-data-to-your-postgres-database&quot;&gt;4. Using background jobs  to load data to your Postgres database&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#final-thoughts-about-loading-large-data-sets-into-a-postgresql-database-with-rails&quot;&gt;Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#speed-comparison-of-different-ways-to-load-data-into-postgres-with-rails&quot;&gt;Speed comparison of different ways to load data into Postgres with Rails&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#other-articles-and-resources-you-might-like&quot;&gt;Other articles and resources you might like&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p &gt;
&lt;img src=&quot;https://pganalyze.com/662907c95c4f480497bd6b611ef1a61e/rails-insert-methods.svg&quot; alt=&quot;Diagram of Ruby on Rails Insert Methods&quot;&gt;
&lt;/p&gt;
&lt;h2 id=&quot;inserting-one-record-at-a-time-to-load-data-to-your-postgres-database&quot; &gt;&lt;a href=&quot;#inserting-one-record-at-a-time-to-load-data-to-your-postgres-database&quot; aria-label=&quot;inserting one record at a time to load data to your postgres database permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Inserting one record at a time to load data to your Postgres database&lt;/h2&gt;
&lt;p&gt;One easy way to load data to a Postgres database is to &lt;strong&gt;loop through the data and insert them one at a time.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here’s some sample code that does this in Rails, assuming we have the source data in a CSV file:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# lib/tasks/one_record_at_a_time.rake&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;csv&apos;&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&quot;benchmark&quot;&lt;/span&gt;

namespace &lt;span &gt;:import&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
   desc &lt;span &gt;&quot;imports data from csv to postgresql&quot;&lt;/span&gt;
   task &lt;span &gt;:single_record&lt;/span&gt; &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;:environment&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
       &lt;span &gt;#This function loops over the content of the csv file and creates a new record for each of them.&lt;/span&gt;
       &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;insert_user&lt;/span&gt;&lt;/span&gt;
           &lt;span &gt;CSV&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;foreach&lt;span &gt;(&lt;/span&gt;filename&lt;span &gt;,&lt;/span&gt; headers&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;row&lt;span &gt;|&lt;/span&gt;
               &lt;span &gt;User&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;create&lt;span &gt;(&lt;/span&gt;row&lt;span &gt;.&lt;/span&gt;to_h&lt;span &gt;)&lt;/span&gt;
           &lt;span &gt;end&lt;/span&gt;
       &lt;span &gt;end&lt;/span&gt;
       puts &lt;span &gt;Benchmark&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;realtime &lt;span &gt;{&lt;/span&gt;insert_user &lt;span &gt;}&lt;/span&gt; &lt;span &gt;#Here we are using benchmark to measure the speed&lt;/span&gt;
   &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But there’s a problem with this approach. Inserting records one at a time into a PostgreSQL database is &lt;strong&gt;extremely slow&lt;/strong&gt;. I ran this Rake task to insert over a million records and measured it with &lt;a href=&quot;https://github.com/ruby/benchmark&quot;&gt;Benchmark&lt;/a&gt;. The run took over 1.3 hours, &lt;em&gt;which is a long time&lt;/em&gt;. Both the database and the application incur overhead when processing rows one by one, and every row adds the latency of a round trip to the database.&lt;/p&gt;
&lt;p&gt;We’ll see a better approach in the next section, but for now, here’s a summary of the pros and cons of single-row inserts:&lt;/p&gt;
&lt;h3 id=&quot;pros-of-single-row-inserts-with-postgres&quot; &gt;&lt;a href=&quot;#pros-of-single-row-inserts-with-postgres&quot; aria-label=&quot;pros of single row inserts with postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Pros of single-row inserts with Postgres&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Doesn’t require an external dependency&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;cons-of-single-row-inserts-with-postgres&quot; &gt;&lt;a href=&quot;#cons-of-single-row-inserts-with-postgres&quot; aria-label=&quot;cons of single row inserts with postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Cons of single-row inserts with Postgres&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Very slow&lt;/li&gt;
&lt;li&gt;Might lock your session for a long time&lt;/li&gt;
&lt;li&gt;Not suitable for inserting large data sets&lt;/li&gt;
&lt;li&gt;If one insert fails, you’re stuck with partially loaded data&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;bulk-inserts-with-active-record-import-to-load-data-to-your-postgres-database&quot; &gt;&lt;a href=&quot;#bulk-inserts-with-active-record-import-to-load-data-to-your-postgres-database&quot; aria-label=&quot;bulk inserts with active record import to load data to your postgres database permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Bulk Inserts with Active Record Import to load data to your Postgres database&lt;/h2&gt;
&lt;p&gt;Running a bulk insert query is a better and faster way to load data into your Postgres database, and the Rails gem &lt;a href=&quot;https://github.com/zdennis/activerecord-import&quot;&gt;&lt;code &gt;activerecord-import&lt;/code&gt;&lt;/a&gt; makes it easy to load massive data in bulk in a way that the Active Record ORM can understand and manipulate.&lt;/p&gt;
&lt;p&gt;Instead of hitting your database multiple times, processing transactions, and doing all the back and forth with your app and database, &lt;strong&gt;the Active Record Import gem allows you to build up large insert queries and run them at once.&lt;/strong&gt;&lt;/p&gt;
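&lt;p&gt;To make the idea concrete, here is a minimal sketch in plain Ruby of what a multi-row insert looks like. It only builds the SQL text and the parameter list, it does not talk to a database, and it is not a description of how activerecord-import works internally. The &lt;code &gt;users&lt;/code&gt; table name, the column names, and the sample rows are assumptions for illustration.&lt;/p&gt;

```ruby
# Illustration only: the shape of a single multi-row INSERT statement.
# Table name, columns, and rows are hypothetical sample data.
columns = ["first_name", "last_name"]
rows = [["Peter", "Joseph"], ["Banabas", "Bob Jones"]]

# Build one "($1, $2)" placeholder group per row, numbering consecutively.
groups = rows.each_index.map do |i|
  first = i * columns.size
  numbered = (1..columns.size).map { |offset| "$#{first + offset}" }
  "(#{numbered.join(', ')})"
end

sql = "INSERT INTO users (#{columns.join(', ')}) VALUES #{groups.join(', ')}"
params = rows.flatten

puts sql
# prints: INSERT INTO users (first_name, last_name) VALUES ($1, $2), ($3, $4)
```

&lt;p&gt;One statement carrying many rows means one round trip and one parse for the whole batch, instead of one per row.&lt;/p&gt;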
&lt;p&gt;You can install the Active Record Import gem by adding &lt;code &gt;gem &apos;activerecord-import&apos;&lt;/code&gt; to your &lt;code &gt;Gemfile&lt;/code&gt; and running &lt;code &gt;bundle install&lt;/code&gt; in your terminal. This gem adds &lt;code &gt;import&lt;/code&gt; to Active Record classes. That means you’ll only need to call the import method on your model classes to load the data into your database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here is an example:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# lib/tasks/active_record_import.rake&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;csv&apos;&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&quot;benchmark&quot;&lt;/span&gt;

namespace &lt;span &gt;:import&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
   desc &lt;span &gt;&quot;imports data from csv to postgresql&quot;&lt;/span&gt;
   users &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
   task &lt;span &gt;:batch_record&lt;/span&gt; &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;:environment&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
       &lt;span &gt;CSV&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;foreach&lt;span &gt;(&lt;/span&gt;filename&lt;span &gt;,&lt;/span&gt; headers&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;row&lt;span &gt;|&lt;/span&gt;
           users &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; row&lt;span &gt;.&lt;/span&gt;to_h
       &lt;span &gt;end&lt;/span&gt;
       newusers &lt;span &gt;=&lt;/span&gt; users&lt;span &gt;.&lt;/span&gt;map &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;attrs&lt;span &gt;|&lt;/span&gt;
           &lt;span &gt;User&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;attrs&lt;span &gt;)&lt;/span&gt;
       &lt;span &gt;end&lt;/span&gt;
       time &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Benchmark&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;realtime &lt;span &gt;{&lt;/span&gt;&lt;span &gt;User&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;import&lt;span &gt;(&lt;/span&gt;newusers&lt;span &gt;)&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
       puts time
   &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice how we collect the CSV rows in an array (&lt;code &gt;users&lt;/code&gt;), turn them into unsaved &lt;code &gt;User&lt;/code&gt; objects (&lt;code &gt;newusers&lt;/code&gt;), and pass that array to the import method on the User model: &lt;code &gt;User.import(newusers)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That’s really all that needs to be done. If you prefer, you can instead pass an array of column names and an array of values to the import method: &lt;code &gt;User.import columns, values&lt;/code&gt;, where &lt;code &gt;columns&lt;/code&gt; is an array like &lt;code &gt;[&quot;first_name&quot;, &quot;last_name&quot;]&lt;/code&gt; and &lt;code &gt;values&lt;/code&gt; is an array of rows like &lt;code &gt;[ [&apos;Peter&apos;, &apos;Joseph&apos;], [&apos;Banabas&apos;, &apos;Bob Jones&apos;] ]&lt;/code&gt;.&lt;/p&gt;
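&lt;p&gt;As a rough sketch of the columns-and-values form, the snippet below builds both arrays from CSV data in plain Ruby. The inline CSV text and the email addresses are made-up sample data standing in for the real source file, and the final import call appears only as a comment because it needs a Rails app with activerecord-import installed.&lt;/p&gt;

```ruby
require "csv"

# Hypothetical inline CSV standing in for the real source file.
csv_text = "first_name,last_name,email\n" \
           "Peter,Joseph,peter@example.com\n" \
           "Banabas,Bob Jones,bob@example.com\n"

# The subset of columns we want to load.
columns = ["first_name", "last_name"]

# Keep only the chosen columns from each row, in the same order as `columns`.
values = CSV.parse(csv_text, headers: true).map do |row|
  row.to_h.values_at(*columns)
end

p values

# In a Rails app with activerecord-import installed, the two arrays
# would then be passed along as: User.import columns, values
```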
&lt;p&gt;I measured loading a million records into a Postgres database with Rails using this method, and it took only 5.1 minutes. Remember the first method took 1.3 hours, or 78 minutes? &lt;strong&gt;This method is roughly 15x faster.&lt;/strong&gt; That’s impressive.&lt;/p&gt;
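&lt;p&gt;As a quick sanity check on that comparison, the arithmetic can be written out in a few lines of Ruby. The two timings are the ones reported above; the variable names are only for illustration.&lt;/p&gt;

```ruby
# Timings reported in this article, converted to minutes.
single_row_minutes = 1.3 * 60 # 1.3 hours for single-row inserts
bulk_import_minutes = 5.1     # bulk insert with activerecord-import

# Ratio of the two run times.
speedup = single_row_minutes / bulk_import_minutes

puts "Bulk import is about #{speedup.round(1)}x as fast"
# prints: Bulk import is about 15.3x as fast
```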
&lt;h3 id=&quot;pros-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot; &gt;&lt;a href=&quot;#pros-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot; aria-label=&quot;pros of bulk inserts with active record in ruby on rails and postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Follows Active Record Associations, meaning Rails ORM is able to do its magic with the loaded data&lt;/li&gt;
&lt;li&gt;Faster to load data into your PostgreSQL database&lt;/li&gt;
&lt;li&gt;Doesn’t have per-row overhead&lt;/li&gt;
&lt;li&gt;If the insert fails, your transaction will roll back the insert&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;cons-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot; &gt;&lt;a href=&quot;#cons-of-bulk-inserts-with-active-record-in-ruby-on-rails-and-postgres&quot; aria-label=&quot;cons of bulk inserts with active record in ruby on rails and postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The activerecord-import gem might conflict with other gems that add an &lt;code &gt;.import&lt;/code&gt; method to Active Record models. If that happens, you can use the &lt;code &gt;.bulk_import&lt;/code&gt; method, which the gem also adds to your model classes, as an alternative.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Batch import made our load roughly 15x faster. That is impressive, but there is still a faster way to load data into a Postgres database.&lt;/p&gt;
&lt;h2 id=&quot;using-postgresql-copy-with-activerecord-copy-to-load-data-to-your-postgres-database&quot; &gt;&lt;a href=&quot;#using-postgresql-copy-with-activerecord-copy-to-load-data-to-your-postgres-database&quot; aria-label=&quot;using postgresql copy with activerecord copy to load data to your postgres database permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using PostgreSQL Copy with Activerecord-copy to load data to your Postgres database&lt;/h2&gt;
&lt;p&gt;COPY is the fastest way to load data into a PostgreSQL database. Like a bulk insert, it sends many rows in a single operation, and it also avoids some of the overhead of repeatedly parsing and planning &lt;code &gt;INSERT&lt;/code&gt; statements.&lt;/p&gt;
&lt;p&gt;The gem &lt;a href=&quot;https://github.com/lfittl/activerecord-copy/&quot;&gt;activerecord-copy&lt;/a&gt; provides an easy-to-use interface for implementing COPY in your Rails app. To install it, add the line &lt;code &gt;gem &apos;activerecord-copy&apos;&lt;/code&gt; to your Gemfile and run &lt;code &gt;bundle install&lt;/code&gt; in your terminal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here is a sample Rake task showing how you can use it:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# lib/tasks/active_record_copy.rake&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;csv&apos;&lt;/span&gt;
&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&quot;benchmark&quot;&lt;/span&gt;
namespace &lt;span &gt;:copy&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
   desc &lt;span &gt;&quot;imports data from csv to postgresql&quot;&lt;/span&gt;
   task &lt;span &gt;:data&lt;/span&gt; &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;:environment&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
       &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;insert_user&lt;/span&gt;&lt;/span&gt;
           users &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
           &lt;span &gt;CSV&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;foreach&lt;span &gt;(&lt;/span&gt;filename&lt;span &gt;,&lt;/span&gt; headers&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; header_converters&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:symbol&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;row&lt;span &gt;|&lt;/span&gt;
               users &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; row
           &lt;span &gt;end&lt;/span&gt;
           time &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Time&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;now&lt;span &gt;.&lt;/span&gt;getutc
          
           &lt;span &gt;User&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;copy_from_client &lt;span &gt;[&lt;/span&gt;&lt;span &gt;:first_name&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:last_name&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:email&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:created_at&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:updated_at&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;copy&lt;span &gt;|&lt;/span&gt;
               users&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;d&lt;span &gt;|&lt;/span&gt;
                   copy &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;d&lt;span &gt;[&lt;/span&gt;&lt;span &gt;:first_name&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; d&lt;span &gt;[&lt;/span&gt;&lt;span &gt;:last_name&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; d&lt;span &gt;[&lt;/span&gt;&lt;span &gt;:email&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;,&lt;/span&gt;time&lt;span &gt;,&lt;/span&gt; time &lt;span &gt;]&lt;/span&gt;
               &lt;span &gt;end&lt;/span&gt;
           &lt;span &gt;end&lt;/span&gt;
       &lt;span &gt;end&lt;/span&gt;
       puts &lt;span &gt;Benchmark&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;realtime &lt;span &gt;{&lt;/span&gt;insert_user&lt;span &gt;}&lt;/span&gt;
   &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The activerecord-copy gem adds a &lt;code &gt;copy_from_client&lt;/code&gt; method to all your model classes, as shown in the snippet above (you’ll have to define the columns and their values as shown).&lt;/p&gt;
&lt;p&gt;Note that when you use the activerecord-copy gem, Rails does not set time stamps for you automatically, because COPY writes rows directly rather than saving Active Record objects. That is why the task above creates one with &lt;code &gt;time = Time.now.getutc&lt;/code&gt; and writes it to both &lt;code &gt;created_at&lt;/code&gt; and &lt;code &gt;updated_at&lt;/code&gt; for every row.&lt;/p&gt;
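&lt;p&gt;Here is a small sketch of that time stamp handling in plain Ruby, without Rails: capture one UTC time stamp up front and reuse it for both time stamp columns of every row. The inline CSV data and the &lt;code &gt;rows&lt;/code&gt; name are assumptions for illustration. The headers are read as symbols via &lt;code &gt;header_converters: :symbol&lt;/code&gt; so that lookups like &lt;code &gt;row[:first_name]&lt;/code&gt; find their values. In the Rake task above, each of these arrays would be appended to &lt;code &gt;copy&lt;/code&gt; inside the &lt;code &gt;copy_from_client&lt;/code&gt; block.&lt;/p&gt;

```ruby
require "csv"

# Hypothetical inline CSV standing in for the real source file.
csv_text = "first_name,last_name,email\n" \
           "Peter,Joseph,peter@example.com\n" \
           "Banabas,Bob Jones,bob@example.com\n"

# One time stamp shared by every row, used for created_at and updated_at.
now = Time.now.utc

# Convert headers to symbols so row[:first_name] style lookups work.
table = CSV.parse(csv_text, headers: true, header_converters: :symbol)

# Column order: first_name, last_name, email, created_at, updated_at.
rows = table.map do |row|
  [row[:first_name], row[:last_name], row[:email], now, now]
end

rows.each { |r| p r }
```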
&lt;h3 id=&quot;pros-of-using-postgresql-copy-with-activerecord-copy&quot; &gt;&lt;a href=&quot;#pros-of-using-postgresql-copy-with-activerecord-copy&quot; aria-label=&quot;pros of using postgresql copy with activerecord copy permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Pros of using PostgreSQL Copy with Activerecord-copy&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Doesn’t have per-row overhead&lt;/li&gt;
&lt;li&gt;If the insert fails, your transaction will roll back the insert&lt;/li&gt;
&lt;li&gt;Super fast&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;cons-of-using-postgresql-copy-with-activerecord-copy&quot; &gt;&lt;a href=&quot;#cons-of-using-postgresql-copy-with-activerecord-copy&quot; aria-label=&quot;cons of using postgresql copy with activerecord copy permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Cons of using PostgreSQL Copy with Activerecord-copy&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Manually set time stamps (created_at, updated_at, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I measured &lt;code &gt;activerecord-copy&lt;/code&gt; by loading over one million records, as I did for the other methods, and it took about 1.5 minutes. That is far faster than the other methods we’ve seen in this article.&lt;/p&gt;
&lt;h2 id=&quot;4-using-background-jobs--to-load-data-to-your-postgres-database&quot; &gt;&lt;a href=&quot;#4-using-background-jobs--to-load-data-to-your-postgres-database&quot; aria-label=&quot;4 using background jobs  to load data to your postgres database permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;4. Using background jobs  to load data to your Postgres database&lt;/h2&gt;
&lt;p&gt;If you frequently load new data to your database, one great way to improve your app’s performance is to run your data loading using a background job. There are several tools that make this possible, for example, Rails’ &lt;a href=&quot;https://github.com/collectiveidea/delayed_job&quot;&gt;delayed_job&lt;/a&gt; gem, &lt;a href=&quot;https://github.com/mperham/sidekiq&quot;&gt;sidekiq&lt;/a&gt;, and &lt;a href=&quot;https://github.com/resque/resque&quot;&gt;resque&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Just as Active Record abstracts over database adapters, &lt;strong&gt;Rails provides Active Job so you can use any of these supported backends within your Rails app without bothering about the job-specific implementation&lt;/strong&gt;. So you could write your data loading code with Active Record and run it in a background job using Active Job and the delayed_job adapter. That way, you&apos;ll be running your data loading in the background.&lt;/p&gt;
&lt;p&gt;Let’s walk through how to set up your Active Job to run your background process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Since you’re going to use the delayed_job adapter,  install the &lt;a href=&quot;https://github.com/collectiveidea/delayed_job_active_record&quot;&gt;delayed_job_active_record gem&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Add  &lt;code &gt;gem &apos;delayed_job_active_record&apos;&lt;/code&gt;  to your Gemfile.&lt;/li&gt;
&lt;li&gt;Run &lt;code &gt;bundle install&lt;/code&gt; on your terminal/command line.&lt;/li&gt;
&lt;li&gt;Run the following command to create a delayed job migration for the delayed jobs table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;rails g delayed_job:active_record
rake db:migrate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Generate an Active Job by running the following command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;rails generate job import_data&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;6&quot;&gt;
&lt;li&gt;Open the file created in your &lt;code &gt;app/jobs&lt;/code&gt; directory—&lt;code &gt;app/jobs/import_data_job.rb&lt;/code&gt;—and add your data loading code:&lt;/li&gt;
&lt;/ol&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# app/jobs/import_data_job.rb&lt;/span&gt;
&lt;span &gt;class&lt;/span&gt; &lt;span &gt;ImportDataJob&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationJob&lt;/span&gt;
   queue_as &lt;span &gt;:default&lt;/span&gt;
   &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;perform&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;args&lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;# Write your code here to load records to the database. You can use any of the fast methods we&apos;ve discussed.&lt;/span&gt;
   &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;7&quot;&gt;
&lt;li&gt;In order for Rails to be aware of the Active Job adapter you want to use, you need to add the adapter to your config file. Note that the adapter is named &lt;code &gt;:delayed_job&lt;/code&gt;, not after the &lt;code &gt;delayed_job_active_record&lt;/code&gt; gem. Just add this line: &lt;code &gt;config.active_job.queue_adapter = :delayed_job&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;   &lt;span &gt;# config/application.rb&lt;/span&gt;
   &lt;span &gt;module&lt;/span&gt; &lt;span &gt;YourApp&lt;/span&gt;
     &lt;span &gt;class&lt;/span&gt; &lt;span &gt;Application&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;Rails&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Application&lt;/span&gt;
       &lt;span &gt;# Be sure to have the adapter&apos;s gem in your Gemfile&lt;/span&gt;
       &lt;span &gt;# and follow the adapter&apos;s specific installation&lt;/span&gt;
       &lt;span &gt;# and deployment instructions.&lt;/span&gt;
       config&lt;span &gt;.&lt;/span&gt;active_job&lt;span &gt;.&lt;/span&gt;queue_adapter &lt;span &gt;=&lt;/span&gt; &lt;span &gt;:delayed_job&lt;/span&gt;
     &lt;span &gt;end&lt;/span&gt;
   &lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Depending on when you want the job to run, you can enqueue it to run immediately or at a specific time, following the instructions in the &lt;a href=&quot;https://guides.rubyonrails.org/active_job_basics.html&quot;&gt;Active Job documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One way you can do this is to allow the job to run asynchronously. Create a Rake task, add &lt;code &gt;ImportDataJob.perform_later&lt;/code&gt; to the task, and run it. Example:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;namespace &lt;span &gt;:active_jobs&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
   desc &lt;span &gt;&quot;imports data from sql to postgresql&quot;&lt;/span&gt;
   task &lt;span &gt;:import&lt;/span&gt; &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;:environment&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
       &lt;span &gt;ImportDataJob&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;perform_later
   &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
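&lt;p&gt;Once this is done, you can run the task &lt;code &gt;rake active_jobs:import&lt;/code&gt; on your terminal. Keep in mind that this only enqueues the job: a delayed_job worker (for example, one started with &lt;code &gt;rake jobs:work&lt;/code&gt;) has to be running to pick it up and do the actual import.&lt;/p&gt;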
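&lt;p&gt;To see why a background job makes data loading feel instant, here is a minimal plain-Ruby sketch of the enqueue-then-work pattern. It is not Active Job or delayed_job: the in-memory &lt;code &gt;Queue&lt;/code&gt;, the single worker thread, the &lt;code &gt;nil&lt;/code&gt; stop signal and the 0.2 second sleep standing in for the import are all assumptions made for illustration.&lt;/p&gt;

```ruby
# Hypothetical stand-in for a job backend (NOT Active Job or delayed_job).
jobs = Queue.new
processed = []

# The worker runs separately from the code that enqueues jobs.
worker = Thread.new do
  # Queue#pop blocks until a job arrives; nil is our made-up stop signal.
  while (job = jobs.pop)
    processed.push(job.call)
  end
end

started = Time.now
# The lambda stands in for a slow data import.
jobs.push(-> { sleep 0.2; :import_finished })
# Enqueueing returns right away; the import has not finished yet.
enqueue_seconds = Time.now - started

jobs.push(nil) # ask the worker to stop once the queued job is done
worker.join

puts "enqueue took #{enqueue_seconds.round(4)}s"
puts "processed: #{processed.inspect}"
```

&lt;p&gt;The caller only pays for pushing the job onto the queue, so the perceived time is tiny. The import itself still takes as long as the loading method you pick.&lt;/p&gt;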
&lt;h2 id=&quot;final-thoughts-about-loading-large-data-sets-into-a-postgresql-database-with-rails&quot; &gt;&lt;a href=&quot;#final-thoughts-about-loading-large-data-sets-into-a-postgresql-database-with-rails&quot; aria-label=&quot;final thoughts about loading large data sets into a postgresql database with rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails&lt;/h2&gt;
&lt;p&gt;When considering how to optimize your database performance, it’s best to first figure out the optimization options the database has already provided. As you may have noticed, most of the tools and techniques in this article leverage the hidden power of the PostgreSQL database. Sometimes, it might just be your implementation slowing down your database performance.&lt;/p&gt;
&lt;h2 id=&quot;speed-comparison-of-different-ways-to-load-data-into-postgres-with-rails&quot; &gt;&lt;a href=&quot;#speed-comparison-of-different-ways-to-load-data-into-postgres-with-rails&quot; aria-label=&quot;speed comparison of different ways to load data into postgres with rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Speed comparison of different ways to load data into Postgres with Rails&lt;/h2&gt;
&lt;p&gt;Here’s a table summarizing the various speeds of the methods discussed in this article.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Time taken&lt;/th&gt;
&lt;th&gt;Number of records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One record at a time insert&lt;/td&gt;
&lt;td&gt;1.3 hours&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk inserts with Activerecord Import&lt;/td&gt;
&lt;td&gt;5.1 minutes&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL Copy with Activerecord-copy&lt;/td&gt;
&lt;td&gt;1.5 minutes&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using Background Jobs&lt;/td&gt;
&lt;td&gt;&amp;#x3C; 1 sec (perceived)&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You’ve learned that if you’re loading a huge amount of data into your PostgreSQL database, one insert at a time is slow and shouldn’t even be considered. For ultimate performance, you want to use COPY. Of course, you’ve also seen the caveats of each method, and you should weigh all the pros and cons before making your final decision.&lt;/p&gt;
&lt;h2 id=&quot;other-articles-and-resources-you-might-like&quot; &gt;&lt;a href=&quot;#other-articles-and-resources-you-might-like&quot; aria-label=&quot;other articles and resources you might like permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Other articles and resources you might like&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/postgres-row-level-security-ruby-rails&quot;&gt;Using Postgres Row-Level Security in Ruby on Rails&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/custom-postgres-data-types-ruby-rails&quot;&gt;Creating Custom Postgres Data Types in Rails&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;Efficient Search in Rails with Postgres (PDF eBook)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/postgis-rails-geocoder&quot;&gt;PostGIS vs. Geocoder in Rails&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/active-record-subqueries-rails&quot;&gt;Advanced Active Record: Using Subqueries in Rails&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/full-text-search-ruby-rails-postgres&quot;&gt;Full Text Search in Milliseconds with Rails and PostgreSQL&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/materialized-views-ruby-rails&quot;&gt;Effectively Using Materialized Views in Ruby on Rails&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/similarity-in-postgres-and-ruby-on-rails-using-trigrams&quot;&gt;Similarity in Postgres and Rails using Trigrams&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres&quot;&gt;Efficient GraphQL queries in Ruby on Rails &amp;#x26; Postgres&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Understanding Postgres GIN Indexes: The Good and the Bad]]></title><description><![CDATA[Adding, tuning and removing indexes is an essential part of maintaining an application that uses a database. Oftentimes, our applications rely on sophisticated database features and data types, such as JSONB, array types or full text search in Postgres. A simple B-tree index does not work in such situations, for example to index a JSONB column. Instead, we need to look beyond, to GIN indexes. Almost 15 years ago to the dot, GIN indexes were added in Postgres 8.2, and they have since become an…]]></description><link>https://pganalyze.com/blog/gin-index</link><guid isPermaLink="false">https://pganalyze.com/blog/gin-index</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 02 Dec 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Diagram of GIN index structure&quot; title=&quot;Diagram of GIN index structure&quot; src=&quot;https://pganalyze.com/static/718f52cb037c0a56a45cb32a73db791e/1d69c/gin_diagram.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Adding, tuning and removing indexes is an essential part of maintaining an application that uses a database. Oftentimes, our applications rely on sophisticated database features and data types, such as JSONB, array types or full text search in Postgres. &lt;strong&gt;A simple B-tree index does not work in such situations, for example to index a JSONB column&lt;/strong&gt;. Instead, we need to look beyond, to GIN indexes.&lt;/p&gt;
&lt;p&gt;Almost 15 years ago to the day, &lt;a href=&quot;http://www.sai.msu.su/~megera/wiki/Gin&quot;&gt;GIN indexes were added in Postgres 8.2&lt;/a&gt;, and they have since become an essential tool in the application DBA’s toolbox. GIN indexes can seem like magic, as they can index what a normal B-tree cannot, such as JSONB data types and full text search. With this great power comes great responsibility, as GIN indexes can have adverse effects if used carelessly.&lt;/p&gt;
&lt;p&gt;In this article, we’ll take an in-depth look at GIN indexes in Postgres, building on, and referencing many great articles that have been written over the years by the community. We’ll start by reviewing &lt;strong&gt;what GIN indexes can do, how they are structured, and their most common use cases&lt;/strong&gt;, such as for indexing JSONB columns, or to support &lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;Postgres full text search&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But, understanding the fundamentals is only part of the puzzle. It’s much better when we can also learn from real world examples on busy databases. We’ll review a specific situation that the GitLab database team found themselves in this year, as it relates to write overhead caused by GIN indexes on a busy table with more than 1000 updates per minute.&lt;/p&gt;
&lt;p&gt;And we’ll conclude with a review of the trade-offs between the GIN write overhead and the possible performance gains. Plus: We’ve added support for GIN index recommendations to the pganalyze Index Advisor.&lt;/p&gt;
&lt;p&gt;To start with, let’s review what a GIN index looks like:&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#gin-index-in-postgres-what-is-it-actually&quot;&gt;GIN Index in Postgres: What is it actually?&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#indexing-tsvector-columns-for-postgres-full-text-search&quot;&gt;Indexing tsvector columns for Postgres full text search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#indexing-like-searches-with-trigrams-and-gin_trgm_ops&quot;&gt;Indexing LIKE searches with Trigrams and gin_trgm_ops&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgresql-jsonb-and-gin-indexes&quot;&gt;PostgreSQL, JSONB and GIN Indexes&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#postgres-gin-index-for-jsonb-columns-using-jsonb_ops-and-jsonb_path_ops&quot;&gt;Postgres GIN index for JSONB columns using jsonb_ops and jsonb_path_ops&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#multi-column-gin-indexes-and-combining-gin-and-b-tree-indexes&quot;&gt;Multi-Column GIN Indexes, and Combining GIN and B-tree indexes&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-downside-of-gin-indexes-expensive-updates&quot;&gt;The downside of GIN Indexes: Expensive Updates&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#gin-trigram-indexes-a-lesson-from-gitlab&quot;&gt;GIN trigram indexes: A lesson from GitLab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#measuring-gin-pending-list-overhead-and-size&quot;&gt;Measuring GIN pending list overhead and size&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#strategies-for-dealing-with-gin-pending-list-update-issues&quot;&gt;Strategies for dealing with GIN pending list update issues&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#gin-index-support-in-the-pganalyze-index-advisor&quot;&gt;GIN index support in the pganalyze Index Advisor&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#other-helpful-resources&quot;&gt;Other helpful resources&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;gin-index-in-postgres-what-is-it-actually&quot; &gt;&lt;a href=&quot;#gin-index-in-postgres-what-is-it-actually&quot; aria-label=&quot;gin index in postgres what is it actually permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;GIN Index in Postgres: What is it actually?&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;“The GIN index type was designed to &lt;strong&gt;deal with data types that are subdividable and you want to search for individual component values&lt;/strong&gt; (array elements, lexemes in a text document, etc)” - &lt;a href=&quot;https://www.postgresql.org/message-id/flat/26038.1559516834%40sss.pgh.pa.us#ccb004aefc151d913e7a274a9b30c631&quot;&gt;Tom Lane&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The GIN index type was initially created by Teodor Sigaev and Oleg Bartunov, first released in Postgres 8.2, on December 5, 2006 - almost 15 years ago. Since then, GIN has seen many improvements, but the fundamental structure remains similar. GIN stands for &quot;Generalized Inverted iNdex&quot;. &quot;Inverted&quot; refers to the way that the index structure is set up, building a table-encompassing tree of all column values, where a single row can be represented in &lt;strong&gt;many places&lt;/strong&gt; within the tree. By comparison, a B-tree index generally has &lt;strong&gt;one location&lt;/strong&gt; where an index entry points to a specific row.&lt;/p&gt;
&lt;p&gt;Another way of explaining GIN indexes comes from a &lt;a href=&quot;https://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf&quot;&gt;presentation by Oleg Bartunov and Alexander Korotkov&lt;/a&gt; at PGConf.EU 2012 in Prague. They describe a GIN index like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:&lt;/p&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Example of how GIN is structured like a book&apos;s table of contents&quot; title=&quot;Example of how GIN is structured like a book&apos;s table of contents&quot; src=&quot;https://pganalyze.com/static/888d381b466ef22724d3053f47c7a4f1/1d69c/gin_table_of_contents.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
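&lt;p&gt;The inverted structure can be sketched in a few lines of plain Ruby. This is only a toy model under stated assumptions: the sample rows, the whitespace word splitting and the &lt;code &gt;search&lt;/code&gt; helper are made up for illustration, and a real GIN index stores its entries in on-disk tree structures, not in a Hash.&lt;/p&gt;

```ruby
# Hypothetical table contents: row id mapped to a text column.
rows = {
  1 => "compensation of accelerometers",
  2 => "accelerometers in phones",
  3 => "compensation plans"
}

# Inverted index: every distinct word points at all rows containing it,
# so a single row shows up in many places in the index.
index = Hash.new { |hash, key| hash[key] = [] }
rows.each do |id, text|
  text.split.uniq.each { |word| index[word].push(id) }
end

# A multi-word search combines entries by intersecting their row id lists.
def search(index, *words)
  words.map { |word| index.fetch(word, []) }
       .reduce { |left, right| left.intersection(right) }
end

p index["compensation"]
p search(index, "compensation", "accelerometers")
```

&lt;p&gt;Row 1 appears under three different words, which is the &quot;many places&quot; property described above, while the combined search only returns the rows present in every matching entry.&lt;/p&gt;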
&lt;p&gt;It’s important to note that the exact mapping of a column of a given data type is dependent on the GIN index operator class. That means, instead of having a uniform representation of data in the index, like with B-trees, a GIN index can have very different index contents depending on which data type and operator class you are using. Some data types, such as JSONB, have more than one GIN operator class, so that you can pick the index structure that best fits specific query patterns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before we move on, one more thing to know:&lt;/strong&gt; GIN indexes only support Bitmap Index Scans (not Index Scan or Index Only Scan), due to the fact that they only store parts of the row values in each index page. Don’t be surprised when EXPLAIN always shows Bitmap Index / Heap Scans for your GIN indexes.&lt;/p&gt;
&lt;p&gt;Let’s take a look at a few examples:&lt;/p&gt;
&lt;h3 id=&quot;indexing-tsvector-columns-for-postgres-full-text-search&quot; &gt;&lt;a href=&quot;#indexing-tsvector-columns-for-postgres-full-text-search&quot; aria-label=&quot;indexing tsvector columns for postgres full text search permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Indexing tsvector columns for Postgres full text search&lt;/h3&gt;
&lt;p&gt;The initial motivation for GIN indexes was full text search. Before GIN was added, there was no way to index full text search in Postgres, instead requiring a very slow sequential scan of the table.&lt;/p&gt;
&lt;p&gt;We’ve previously written about &lt;a href=&quot;https://pganalyze.com/blog/full-text-search-django-postgres&quot;&gt;Postgres full text search with Django&lt;/a&gt;, as well as how to do it with &lt;a href=&quot;https://pganalyze.com/blog/full-text-search-ruby-rails-postgres&quot;&gt;Ruby on Rails&lt;/a&gt; on the pganalyze blog.&lt;/p&gt;
&lt;p&gt;A simple example for a full text search index looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; pgweb_idx &lt;span &gt;ON&lt;/span&gt; pgweb &lt;span &gt;USING&lt;/span&gt; GIN &lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; body&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses an expression index to create a GIN index that contains the indexed tsvector values for each row. You can then query like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; title
&lt;span &gt;FROM&lt;/span&gt; pgweb
&lt;span &gt;WHERE&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; body&lt;span &gt;)&lt;/span&gt; @@ to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;friend&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As described in the &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-indexes.html&quot;&gt;Postgres documentation&lt;/a&gt;, the tsvector GIN index structure is focused on lexemes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“GIN indexes are the preferred text search index type. As inverted indexes, they contain an index entry for each word (lexeme), with a compressed list of matching locations. Multi-word searches can find the first match, then use the index to remove rows that are lacking additional words.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GIN indexes are the best starting point when using Postgres Full Text Search. There are situations where a GiST index might be preferred (see the &lt;a href=&quot;https://www.postgresql.org/docs/14/textsearch-indexes.html&quot;&gt;Postgres documentation&lt;/a&gt; for details), and if you run your own server you could also consider the newer &lt;a href=&quot;https://github.com/postgrespro/rum&quot;&gt;RUM index types&lt;/a&gt; available through an extension.&lt;/p&gt;
&lt;p&gt;Let&apos;s see what else GIN has to offer:&lt;/p&gt;
&lt;h3 id=&quot;indexing-like-searches-with-trigrams-and-gin_trgm_ops&quot; &gt;&lt;a href=&quot;#indexing-like-searches-with-trigrams-and-gin_trgm_ops&quot; aria-label=&quot;indexing like searches with trigrams and gin_trgm_ops permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Indexing LIKE searches with Trigrams and gin_trgm_ops&lt;/h3&gt;
&lt;p&gt;Sometimes Full Text Search isn&apos;t the right fit, but you find yourself needing to index a LIKE search on a particular column:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; test_trgm &lt;span &gt;(&lt;/span&gt;t &lt;span &gt;text&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test_trgm &lt;span &gt;WHERE&lt;/span&gt; t &lt;span &gt;LIKE&lt;/span&gt; &lt;span &gt;&apos;%foo%bar&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Due to the nature of the LIKE operation, which supports arbitrary wildcard expressions, this is fundamentally hard to index. However, the &lt;code &gt;pg_trgm&lt;/code&gt; extension can help. When you create an index like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; trgm_idx &lt;span &gt;ON&lt;/span&gt; test_trgm &lt;span &gt;USING&lt;/span&gt; gin &lt;span &gt;(&lt;/span&gt;t gin_trgm_ops&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Postgres will split the row values into trigrams, allowing indexed searches:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test_trgm &lt;span &gt;WHERE&lt;/span&gt; t &lt;span &gt;LIKE&lt;/span&gt; &lt;span &gt;&apos;%foo%bar&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                               QUERY PLAN                               
------------------------------------------------------------------------
 Bitmap Heap Scan on test_trgm  (cost=16.00..20.02 rows=1 width=32)
   Recheck Cond: (t ~~ &apos;%foo%bar&apos;::text)
   -&gt;  Bitmap Index Scan on trgm_idx  (cost=0.00..16.00 rows=1 width=0)
         Index Cond: (t ~~ &apos;%foo%bar&apos;::text)
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Effectiveness of this method varies with the exact data set. But when it works, it can speed up searches on arbitrary text data quite significantly.&lt;/p&gt;
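&lt;p&gt;To make the trigram idea more concrete, here is a rough plain-Ruby approximation. It assumes the commonly documented pg_trgm behavior (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space). The sample rows and the simplified set of required trigrams are made up for illustration, and the real extension derives trigrams from a LIKE pattern more carefully than this.&lt;/p&gt;

```ruby
require "set"

# Approximate trigram extraction for a single text value.
def trigrams(text)
  words = text.downcase.scan(/[[:alnum:]]+/)
  list = words.flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  list.to_set
end

rows = ["foo bar", "food barn", "bar"]

# Simplified assumption: LIKE '%foo%bar' needs at least these trigrams.
required = Set["foo", "bar"]

# Step 1 (index lookup): keep rows that contain every required trigram.
candidates = rows.select { |row| required.subset?(trigrams(row)) }

# Step 2 (recheck): run the actual pattern against the candidate rows only.
matches = candidates.select { |row| row.match?(/foo.*bar\z/) }

p trigrams("foo").to_a
p candidates
p matches
```

&lt;p&gt;In this sketch &quot;food barn&quot; survives the trigram lookup but fails the recheck. That is the role of the Recheck Cond line in the plan above: the index narrows the search down to candidate rows, and the original condition is then evaluated on those rows.&lt;/p&gt;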
&lt;h2 id=&quot;postgresql-jsonb-and-gin-indexes&quot; &gt;&lt;a href=&quot;#postgresql-jsonb-and-gin-indexes&quot; aria-label=&quot;postgresql jsonb and gin indexes permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL, JSONB and GIN Indexes&lt;/h2&gt;
&lt;p&gt;JSONB was added to Postgres in version 9.4, about eight years after GIN indexes were introduced, and it shows the flexibility of the GIN index type that it is the preferred way to index JSONB columns.&lt;/p&gt;
&lt;h3 id=&quot;postgres-gin-index-for-jsonb-columns-using-jsonb_ops-and-jsonb_path_ops&quot; &gt;&lt;a href=&quot;#postgres-gin-index-for-jsonb-columns-using-jsonb_ops-and-jsonb_path_ops&quot; aria-label=&quot;postgres gin index for jsonb columns using jsonb_ops and jsonb_path_ops permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres GIN index for JSONB columns using jsonb_ops and jsonb_path_ops&lt;/h3&gt;
&lt;p&gt;With JSONB in Postgres we gain the flexibility of not having to define our schema upfront, but instead we can dynamically add data to a column in our table in JSON format.&lt;/p&gt;
&lt;p&gt;The most basic GIN index example for JSONB looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; test &lt;span &gt;(&lt;/span&gt;
  id bigserial &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;data&lt;/span&gt; jsonb
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; test&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;{&quot;field&quot;: &quot;value1&quot;}&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; test&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;{&quot;field&quot;: &quot;value2&quot;}&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; test&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;{&quot;other_field&quot;: &quot;value42&quot;}&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; test &lt;span &gt;USING&lt;/span&gt; gin&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see with EXPLAIN, Postgres is able to use this index, for example when querying for all rows that have the &lt;code &gt;field&lt;/code&gt; key defined:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;data&lt;/span&gt; ? &lt;span &gt;&apos;field&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.00..12.01 rows=1 width=40)
   Recheck Cond: (data ? &apos;field&apos;::text)
   -&gt;  Bitmap Index Scan on test_data_idx  (cost=0.00..8.00 rows=1 width=0)
         Index Cond: (data ? &apos;field&apos;::text)
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The way this gets stored is based on the keys and values of the JSONB data. In the above test data, the default &lt;code &gt;jsonb_ops&lt;/code&gt; operator class would store the following values in the GIN index, as separate entries: &lt;code &gt;field&lt;/code&gt;, &lt;code &gt;other_field&lt;/code&gt;, &lt;code &gt;value1&lt;/code&gt;, &lt;code &gt;value2&lt;/code&gt;, &lt;code &gt;value42&lt;/code&gt;. Depending on the search the GIN index will combine multiple index entries to satisfy the specific query conditions.&lt;/p&gt;
&lt;p&gt;Now, we can also use the non-default &lt;code &gt;jsonb_path_ops&lt;/code&gt; operator class with a JSONB GIN index. This uses an optimized GIN index structure that would instead store the above data as three individual entries using a hash function: &lt;code &gt;hashfn(field, value1)&lt;/code&gt;, &lt;code &gt;hashfn(field, value2)&lt;/code&gt; and &lt;code &gt;hashfn(other_field, value42)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;jsonb_path_ops&lt;/code&gt; class is intended to efficiently support containment queries. First we specify the operator class during index creation:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; test &lt;span &gt;USING&lt;/span&gt; gin&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt; jsonb_path_ops&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And then we can use it for queries such as the following:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;data&lt;/span&gt; @&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&apos;{&quot;field&quot;: &quot;value1&quot;}&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.00..12.01 rows=1 width=40)
   Recheck Cond: (data @&gt; &apos;{&quot;field&quot;: &quot;value1&quot;}&apos;::jsonb)
   -&gt;  Bitmap Index Scan on test_data_idx1  (cost=0.00..8.00 rows=1 width=0)
         Index Cond: (data @&gt; &apos;{&quot;field&quot;: &quot;value1&quot;}&apos;::jsonb)
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
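&lt;p&gt;As a rough sketch of the trade-off between the two operator classes (an illustration based on the operator lists in the Postgres documentation, not output from the test above): containment with &lt;code &gt;@&gt;&lt;/code&gt; is supported by both classes, whilst key-existence operators such as &lt;code &gt;?&lt;/code&gt; and &lt;code &gt;?|&lt;/code&gt; are only supported by the default &lt;code &gt;jsonb_ops&lt;/code&gt; class. On a table as small as this one the planner may prefer a sequential scan regardless of which indexes exist.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Containment: can be served by a GIN index built with either jsonb_ops or jsonb_path_ops
EXPLAIN SELECT * FROM test WHERE data @&gt; &apos;{&quot;field&quot;: &quot;value1&quot;}&apos;;

-- Key existence: needs the default jsonb_ops index, because jsonb_path_ops
-- stores hashes of each value together with its path, not the individual keys
EXPLAIN SELECT * FROM test WHERE data ? &apos;field&apos;;
EXPLAIN SELECT * FROM test WHERE data ?| array[&apos;field&apos;, &apos;other_field&apos;];&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;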
&lt;p&gt;As you can see, it’s easy to index a JSONB column. Note that you could technically also index JSONB with other index types by taking specific parts of the data. For example, we could use a B-tree expression index to index the values stored under the &lt;code &gt;field&lt;/code&gt; key:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; test &lt;span &gt;USING&lt;/span&gt; &lt;span &gt;btree&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt; &lt;span &gt;&apos;field&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Postgres query planner will then use the specific expression index behind the scenes, if your query matches the expression:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;data&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&apos;field&apos;&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;value1&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                 QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Scan using test_expr_idx on test  (cost=0.13..8.15 rows=1 width=40)
   Index Cond: ((data -&gt;&gt; &apos;field&apos;::text) = &apos;value1&apos;::text)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
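&lt;p&gt;The match has to be on the same expression. As a hedged illustration using the &lt;code &gt;test&lt;/code&gt; table from above: the index was built on &lt;code &gt;data -&gt;&gt; &apos;field&apos;&lt;/code&gt;, which returns text, so conditions written with a different operator are different expressions and would not be expected to use this particular index:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Same expression as the index definition, so the B-tree expression index is a candidate
EXPLAIN SELECT * FROM test WHERE data -&gt;&gt; &apos;field&apos; = &apos;value1&apos;;

-- Different expression: -&gt; returns jsonb rather than text, so it does not match the index
EXPLAIN SELECT * FROM test WHERE data -&gt; &apos;field&apos; = &apos;&quot;value1&quot;&apos;::jsonb;

-- Containment is not covered by the B-tree expression index either (a GIN index could serve it)
EXPLAIN SELECT * FROM test WHERE data @&gt; &apos;{&quot;field&quot;: &quot;value1&quot;}&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;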
&lt;p&gt;There is one more thing we should look at with finding the right GIN index, and that is multi-column GIN indexes.&lt;/p&gt;
&lt;h2 id=&quot;multi-column-gin-indexes-and-combining-gin-and-b-tree-indexes&quot; &gt;&lt;a href=&quot;#multi-column-gin-indexes-and-combining-gin-and-b-tree-indexes&quot; aria-label=&quot;multi column gin indexes and combining gin and b tree indexes permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Multi-Column GIN Indexes, and Combining GIN and B-tree indexes&lt;/h2&gt;
&lt;p&gt;Oftentimes you’ll have queries that filter on a column whose data type is ideal for GIN indexes, such as JSONB, but that also filter on another column that is more of a typical B-tree index candidate:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; records &lt;span &gt;(&lt;/span&gt;
  id bigserial &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  customer_id int4&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;data&lt;/span&gt; jsonb
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; records &lt;span &gt;WHERE&lt;/span&gt; customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;data&lt;/span&gt; @&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&apos;{ &quot;location&quot;: &quot;New York&quot; }&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In addition you might have a query like the following:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; records &lt;span &gt;WHERE&lt;/span&gt; customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And you are considering which index to create for the two queries combined.&lt;/p&gt;
&lt;p&gt;There are two fundamental strategies you can take:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(1) Create two separate indexes, one on &lt;code &gt;customer_id&lt;/code&gt; using a B-tree, and one on &lt;code &gt;data&lt;/code&gt; using GIN
&lt;ul&gt;
&lt;li&gt;In this situation, for the first query, Postgres might use BitmapAnd to combine the index search results from both indexes to find the affected rows&lt;/li&gt;
&lt;li&gt;Whilst the idea of using two separate indexes sounds great in theory, in practice it often turns out to be the worse performing option. You can find some discussions about this on the &lt;a href=&quot;https://www.postgresql.org/message-id/flat/56B332B6.1040109%40promani.be&quot;&gt;Postgres mailing lists&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;(2) Create one multi-column GIN index on both &lt;code &gt;customer_id&lt;/code&gt; and &lt;code &gt;data&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;Note that multi-column GIN indexes don’t help much with making the index more effective, but they can help cover multiple queries with the same index&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
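&lt;p&gt;For reference, the first strategy could be sketched as follows, using the &lt;code &gt;records&lt;/code&gt; table from above. This is a minimal sketch: whether the planner actually combines both indexes, or picks only one of them, depends on your data and statistics, so it is worth checking with EXPLAIN on a realistic data set.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Strategy (1): two separate single-column indexes
CREATE INDEX ON records USING btree (customer_id);
CREATE INDEX ON records USING gin (data);

-- Check whether the planner combines them (for example with a BitmapAnd node) or uses just one
EXPLAIN SELECT * FROM records WHERE customer_id = 123 AND data @&gt; &apos;{ &quot;location&quot;: &quot;New York&quot; }&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;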
&lt;p&gt;For implementing the second strategy, we need the help of the “btree_gin” extension in Postgres (part of contrib) that contains operator classes for data types that are not subdividable.&lt;/p&gt;
&lt;p&gt;You can create the extension and the multi-column index like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; EXTENSION btree_gin&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; records &lt;span &gt;USING&lt;/span&gt; gin &lt;span &gt;(&lt;/span&gt;&lt;span &gt;data&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; customer_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that index column order does not matter for GIN indexes. And as we can see, this gets used during query planning:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; records &lt;span &gt;WHERE&lt;/span&gt; customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;123&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;data&lt;/span&gt; @&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&apos;{ &quot;location&quot;: &quot;New York&quot; }&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on records  (cost=16.01..20.03 rows=1 width=41)
   Recheck Cond: ((customer_id = 123) AND (data @&gt; &apos;{&quot;location&quot;: &quot;New York&quot;}&apos;::jsonb))
   -&gt;  Bitmap Index Scan on records_customer_id_data_idx  (cost=0.00..16.01 rows=1 width=0)
         Index Cond: ((customer_id = 123) AND (data @&gt; &apos;{&quot;location&quot;: &quot;New York&quot;}&apos;::jsonb))
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It’s rather uncommon to use multi-column GIN indexes, but depending on your workload it might make sense. Remember that larger indexes mean more I/O, making index lookups slower, and writes more expensive.&lt;/p&gt;
&lt;h2 id=&quot;the-downside-of-gin-indexes-expensive-updates&quot; &gt;&lt;a href=&quot;#the-downside-of-gin-indexes-expensive-updates&quot; aria-label=&quot;the downside of gin indexes expensive updates permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The downside of GIN Indexes: Expensive Updates&lt;/h2&gt;
&lt;p&gt;As you saw in the examples above, GIN indexes are special because they often contain multiple index entries per single row that is being inserted. This is essential to enable the use cases that GIN supports, but causes one significant problem: Updating the index is expensive.&lt;/p&gt;
&lt;p&gt;Because a single row can cause tens or, in the worst case, hundreds of index entries to be updated, it’s important to understand the special &lt;code &gt;fastupdate&lt;/code&gt; mechanism of GIN indexes.&lt;/p&gt;
&lt;p&gt;By default, &lt;code &gt;fastupdate&lt;/code&gt; is enabled for GIN indexes. It causes index updates to be deferred so that many of them can be applied together in one batch, reducing the overhead for a single INSERT or UPDATE at the expense of having to do the work at a later point.&lt;/p&gt;
&lt;p&gt;The data that is deferred is kept in the special &lt;strong&gt;pending list&lt;/strong&gt;, which then gets flushed to the main index structure in one of three situations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;code &gt;gin_pending_list_limit&lt;/code&gt; (default of 4MB) is reached during a regular index update&lt;/li&gt;
&lt;li&gt;Explicit call to the &lt;code &gt;gin_clean_pending_list&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Autovacuum on the table with the GIN index (GIN pending list cleanup happens at the end of vacuum)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As you can imagine this can be quite an expensive operation, which is why one symptom of index write overhead with GIN can be that every Nth INSERT or UPDATE statement suddenly is a lot slower, in case you run into the first scenario above, where the &lt;code &gt;gin_pending_list_limit&lt;/code&gt; is reached.&lt;/p&gt;
&lt;p&gt;This exact situation happened to the team at GitLab recently. Let’s look at a real life example of where GIN updates became a problem.&lt;/p&gt;
&lt;h3 id=&quot;gin-trigram-indexes-a-lesson-from-gitlab&quot; &gt;&lt;a href=&quot;#gin-trigram-indexes-a-lesson-from-gitlab&quot; aria-label=&quot;gin trigram indexes a lesson from gitlab permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;GIN trigram indexes: A lesson from GitLab&lt;/h3&gt;
&lt;p&gt;The team at GitLab often publishes their discussions of database optimizations publicly, and we can learn a lot from these interactions. &lt;a href=&quot;https://gitlab.com/gitlab-org/gitlab/-/issues/336930&quot;&gt;A recent example discussed&lt;/a&gt; a GIN trigram index that occasionally caused updates to merge requests to be quite slow:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“We can see there are a number of slow updates for updating a merge request. The interesting thing here is that we see very little locking statements (locking is logged after 5 seconds waiting), which suggests something else is occurring to make these slow.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was determined to be caused by the GIN pending list:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Anecdotally, cleaning the gin index pending-list for the description field on the merge_requests table can cost multiple seconds.  The overhead does increase when there are more pending entries to write to the index.  In this informal survey of manually running gin_clean_pending_list( &apos;index_merge_requests_on_description_trigram&apos;::regclass ) the duration varied between 465 ms and 3155 ms.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The team further investigated, and determined that the GIN pending list was flushed a very high number of times during business hours:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“this gin index&apos;s pending list fills up roughly once every 2.7 seconds during the peak hours of a normal weekday.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you want to read the full story, GitLab’s Matt Smiley has done an &lt;a href=&quot;https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4725#note_596146675&quot;&gt;excellent analysis of the problem they’ve encountered&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As we can see, getting good data about the actual overhead of GIN pending list updates is critical.&lt;/p&gt;
&lt;h3 id=&quot;measuring-gin-pending-list-overhead-and-size&quot; &gt;&lt;a href=&quot;#measuring-gin-pending-list-overhead-and-size&quot; aria-label=&quot;measuring gin pending list overhead and size permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Measuring GIN pending list overhead and size&lt;/h3&gt;
&lt;p&gt;To validate whether the GIN pending list is a problem on a busy table, we can do a few things:&lt;/p&gt;
&lt;p&gt;First, we could utilize the &lt;code &gt;pgstatginindex&lt;/code&gt; function together with something like psql’s \watch command to keep a close eye on a particular index:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; EXTENSION pgstattuple&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pgstatginindex&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;myindex&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; version | pending_pages | pending_tuples 
---------+---------------+----------------
       2 |             0 |              0
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
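&lt;p&gt;To watch the pending list grow and shrink over time, a query along the following lines could be used in psql. This is a sketch under a few assumptions: &lt;code &gt;myindex&lt;/code&gt; is a placeholder for your GIN index name, the 5 second interval is arbitrary, and the computed size is only a rough figure to compare against &lt;code &gt;gin_pending_list_limit&lt;/code&gt;.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Run in psql: \watch re-executes the query every 5 seconds
SELECT pending_pages,
       pending_tuples,
       -- approximate size of the pending list, derived from the server block size
       pg_size_pretty(pending_pages * current_setting(&apos;block_size&apos;)::bigint) AS pending_size
  FROM pgstatginindex(&apos;myindex&apos;)
\watch 5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the pending list repeatedly climbs towards the limit and then drops back to zero during normal traffic, that suggests flushes are happening as part of regular INSERT or UPDATE statements.&lt;/p&gt;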
&lt;p&gt;Second, if you run your own database server, you can use “perf” &lt;a href=&quot;https://wiki.postgresql.org/wiki/Profiling_with_perf#Dynamic_tracepoints&quot;&gt;dynamic tracepoints&lt;/a&gt; to measure calls to the &lt;code &gt;ginInsertCleanup&lt;/code&gt; function in Postgres:&lt;/p&gt;
&lt;div  data-language=&quot;sh&quot;&gt;&lt;pre &gt;&lt;code &gt;sudo perf probe -x /usr/lib/postgresql/14/bin/postgres ginInsertCleanup
sudo perf stat -a -e probe_postgres:ginInsertCleanup -- sleep 60&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;An alternate method, using DTrace, was described in a &lt;a href=&quot;https://www.youtube.com/watch?v=Brt41xnMZqo&amp;#x26;t=1949s&quot;&gt;2019 PGCon talk&lt;/a&gt;. The authors of that talk also ended up visualizing different &lt;code &gt;gin_pending_list_limit&lt;/code&gt; and &lt;code &gt;work_mem&lt;/code&gt; settings:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;DTrace measurements of GIN pending list flushes&quot; title=&quot;DTrace measurements of GIN pending list flushes&quot; src=&quot;https://pganalyze.com/static/5389af77457315017a70d95953877cd4/1d69c/gin_dtrace.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;As they discovered, the memory limit during flushing of the pending list makes quite a noticeable difference.&lt;/p&gt;
&lt;p&gt;If you don&apos;t have the luxury of direct access to your database server, you can &lt;a href=&quot;https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4725#note_596146675&quot;&gt;estimate how often the pending list&lt;/a&gt; fills up based on the average size of index tuples and other statistics.&lt;/p&gt;
&lt;p&gt;Now, if we determine that we have a problem, what can we do about it?&lt;/p&gt;
&lt;h3 id=&quot;strategies-for-dealing-with-gin-pending-list-update-issues&quot; &gt;&lt;a href=&quot;#strategies-for-dealing-with-gin-pending-list-update-issues&quot; aria-label=&quot;strategies for dealing with gin pending list update issues permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Strategies for dealing with GIN pending list update issues&lt;/h3&gt;
&lt;p&gt;There are several ways to resolve issues like the one GitLab encountered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(1) Reduce &lt;code &gt;gin_pending_list_limit&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;Have more frequent, smaller flushes&lt;/li&gt;
&lt;li&gt;This may sound odd - but &lt;code &gt;gin_pending_list_limit&lt;/code&gt; started out as being determined by work_mem (instead of being its own setting), and is only configurable separately since Postgres 9.5 - explaining the 4MB default, which may be too high in some cases&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;(2) Increase &lt;code &gt;gin_pending_list_limit&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;Have more opportunities to clean up the list outside of the regular workload&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;(3) Turn off &lt;code &gt;fastupdate&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;Take the overhead with each individual INSERT/UPDATE&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;(4) Tune autovacuum to run more often on the table, in order to clean the pending list&lt;/li&gt;
&lt;li&gt;(5) Explicitly call &lt;code &gt;gin_clean_pending_list()&lt;/code&gt;, instead of relying on autovacuum&lt;/li&gt;
&lt;li&gt;(6) Drop the GIN index
&lt;ul&gt;
&lt;li&gt;If you have alternate ways of indexing the data, for example using expression indexes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on your workload one or multiple of these approaches could be a good fit.&lt;/p&gt;
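&lt;p&gt;Several of these options map to index or table storage parameters. The statements below are a minimal sketch, assuming a GIN index named &lt;code &gt;myindex&lt;/code&gt; on a table named &lt;code &gt;mytable&lt;/code&gt; (both placeholders); the numeric values are purely illustrative and not recommendations.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- (1) / (2): override the pending list limit for a single index (value in kilobytes)
ALTER INDEX myindex SET (gin_pending_list_limit = 1024);

-- (3): stop adding new entries to the pending list for this index
ALTER INDEX myindex SET (fastupdate = off);

-- (4): let autovacuum trigger earlier on this table
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.01);

-- (5): flush the pending list explicitly, for example from a scheduled job;
-- also useful after (3), since entries that are already pending are not flushed automatically
SELECT gin_clean_pending_list(&apos;myindex&apos;::regclass);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;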
&lt;p&gt;In addition, it’s important to ensure you have sufficient memory available during the GIN pending list cleanup. The memory limit used for the pending list flush can be confusing, and is not related to the size of &lt;code &gt;gin_pending_list_limit&lt;/code&gt;. Instead it uses the following Postgres settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code &gt;work_mem&lt;/code&gt; during regular INSERT/UPDATE&lt;/li&gt;
&lt;li&gt;&lt;code &gt;maintenance_work_mem&lt;/code&gt; during &lt;code &gt;gin_clean_pending_list()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;&lt;code &gt;autovacuum_work_mem&lt;/code&gt; during autovacuum&lt;/li&gt;
&lt;/ul&gt;
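&lt;p&gt;For a manual cleanup, this means the memory available to the flush can be raised for just the current session. A small sketch, again assuming a placeholder index name &lt;code &gt;myindex&lt;/code&gt; and an illustrative memory value:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Applies to the current session only; 256MB is an illustrative value
SET maintenance_work_mem = &apos;256MB&apos;;

-- Returns the number of pages removed from the pending list
SELECT gin_clean_pending_list(&apos;myindex&apos;::regclass);

-- Go back to the configured default for the rest of the session
RESET maintenance_work_mem;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;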
&lt;p&gt;Last but not least, you may want to consider partitioning or sharding a table that encounters problems like this. It may not be the easiest thing to do, but scaling GIN indexes to heavy write workloads is quite a tricky business.&lt;/p&gt;
&lt;h2 id=&quot;gin-index-support-in-the-pganalyze-index-advisor&quot; &gt;&lt;a href=&quot;#gin-index-support-in-the-pganalyze-index-advisor&quot; aria-label=&quot;gin index support in the pganalyze index advisor permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;GIN index support in the pganalyze Index Advisor&lt;/h2&gt;
&lt;p&gt;Not sure if your workload could utilize a GIN index, or which index to create for your queries?&lt;/p&gt;
&lt;p&gt;We have now added initial support for GIN and GiST index recommendations to the &lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;pganalyze Index Advisor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here is an example of a GIN index recommendation for an existing &lt;code &gt;tsvector&lt;/code&gt; column:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor example with GIN index recommendation&quot; title=&quot;pganalyze Index Advisor example with GIN index recommendation&quot; src=&quot;https://pganalyze.com/static/438113e00dfe6d2bf03fbf617b46b853/1d69c/index_advisor.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Note that the costing and size estimation logic for GIN and GiST indexes is still being actively developed.&lt;/p&gt;
&lt;p&gt;We recommend trying out the Index Advisor recommendation on your own system to assess its effectiveness, as well as monitoring the production table for write overhead after you have added an index. You may also need to tweak your queries to make use of a particular index.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;GIN indexes are powerful, and often the only way to index certain queries and data types. But with great power comes great responsibility. Use GIN indexes wisely, especially on tables that are heavily written to.&lt;/p&gt;
&lt;p&gt;And when you are not sure which GIN index could work, try out the &lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;pganalyze Index Advisor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to share this article with your peers, feel free to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DUnderstanding%20Postgres%20GIN%20Indexes%3A%20The%20Good%20and%20the%20Bad%22%20-%20In%20this%20article,%20%40pganalyze%20shows%20how%20to%20index%20JSONB,%20text%20search%20and%20more%20with%20GIN,%20and%20why%20index%20updates%20can%20get%20expensive%3A%20https://pganalyze.com/blog/gin-index&quot;&gt;tweet it&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;other-helpful-resources&quot; &gt;&lt;a href=&quot;#other-helpful-resources&quot; aria-label=&quot;other helpful resources permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Other helpful resources&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/postgres-create-index&quot;&gt;Using Postgres CREATE INDEX: Understanding operator classes, index types &amp;#x26; more&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/deconstructing-the-postgres-planner&quot;&gt;How we deconstructed the Postgres planner to find indexing opportunities&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;Efficient Search in Rails with Postgres (PDF eBook)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/full-text-search-django-postgres&quot;&gt;Efficient Postgres Full Text Search in Django&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/full-text-search-ruby-rails-postgres&quot;&gt;Full Text Search in Milliseconds with Rails and PostgreSQL&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/pagination-django-postgres&quot;&gt;Efficient Pagination in Django and Postgres&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/postgres-indexing&quot;&gt;eBook: Effective Indexing in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/webinars/how-to-reason-about-indexing-your-postgres-database&quot;&gt;Webinar: How To Reason About Indexing Your Postgres Database&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/blog/5mins-postgres-for-app-developers-tables-indexes&quot;&gt;5mins of Postgres E17: Demystifying Postgres for application developers: A mental model for tables and indexes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;pganalyze Index Advisor for Postgres&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres Views in Django]]></title><description><![CDATA[At my first job, we worked with a lot of data. I quickly found that when there's a lot of data, there are bound to be some long, convoluted SQL queries. Many of ours contained multiple joins, conditionals, and filters. One of the ways we kept the complexity manageable was to create Postgres views for common queries. Postgres views allow you to query against the results of another query. Views can be composed of columns from one or more tables or even other views, and they are easy to work with…]]></description><link>https://pganalyze.com/blog/postgresql-views-django-python</link><guid isPermaLink="false">https://pganalyze.com/blog/postgresql-views-django-python</guid><dc:creator><![CDATA[Josh Alletto]]></dc:creator><pubDate>Tue, 16 Nov 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;At my first job, we worked with a lot of data. I quickly found that when there&apos;s a lot of data, there are bound to be some long, convoluted SQL queries. Many of ours contained multiple joins, conditionals, and filters. One of the ways we kept the complexity manageable was to create Postgres views for common queries.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createview.html&quot;&gt;Postgres views&lt;/a&gt; allow you to query against the results of another query. Views can be composed of columns from one or more tables or even other views, and they are easy to work with in a Django app. In this article, you’ll learn about the two different types of Postgres views and how to decide when and if you should use them. Finally, you’ll create a view and set up a Django app to use it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pganalyze.com/504c5c924f1313f41499308a148b04b8/postgresql-views-in-django-pganalyze.svg&quot; alt=&quot;PostgreSQL views made of data from columns in multiple tables&quot;&gt;&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#why-postgres-views&quot;&gt;Why Postgres views?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#materialized-views-in-postgres&quot;&gt;Materialized Views in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#creating-a-materialized-view-in-postgres&quot;&gt;Creating a Materialized View in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-postgres-views-in-django-and-python&quot;&gt;Using Postgres Views in Django and Python&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-model&quot;&gt;The Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;why-postgres-views&quot; &gt;&lt;a href=&quot;#why-postgres-views&quot; aria-label=&quot;why postgres views permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why Postgres views?&lt;/h2&gt;
&lt;p&gt;One reason to use a view is that it helps &lt;strong&gt;cut down on complexity&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;For example, your customer data may be spread across several tables: a &lt;code &gt;customers&lt;/code&gt; table, an &lt;code &gt;emails&lt;/code&gt; table, and an &lt;code &gt;addresses&lt;/code&gt; table. Addresses could reference more data in &lt;code &gt;cities&lt;/code&gt; and &lt;code &gt;states&lt;/code&gt; tables. This is an effective schema for your data, but you have to join all these tables every time you want to get a complete view of a customer. This may not be bad if you only do this occasionally, but it’s quite cumbersome if you’re going to query it often. Even if you only want two or three records, you still need to perform all these joins.&lt;/p&gt;
&lt;p&gt;You can solve this problem by creating a view that looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;VIEW&lt;/span&gt; complete_customer_data &lt;span &gt;AS&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt;
  concat&lt;span &gt;(&lt;/span&gt;customers&lt;span &gt;.&lt;/span&gt;first_name&lt;span &gt;,&lt;/span&gt;&lt;span &gt;&apos; &apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; customers&lt;span &gt;.&lt;/span&gt;last_name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; customer_name&lt;span &gt;,&lt;/span&gt;
  addresses&lt;span &gt;.&lt;/span&gt;street_address &lt;span &gt;AS&lt;/span&gt; street&lt;span &gt;,&lt;/span&gt;
  addresses&lt;span &gt;.&lt;/span&gt;zip_code &lt;span &gt;AS&lt;/span&gt; zip&lt;span &gt;,&lt;/span&gt;
  cities&lt;span &gt;.&lt;/span&gt;city &lt;span &gt;AS&lt;/span&gt; city&lt;span &gt;,&lt;/span&gt;
  states&lt;span &gt;.&lt;/span&gt;state &lt;span &gt;AS&lt;/span&gt; state
&lt;span &gt;FROM&lt;/span&gt; customers
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; addresses &lt;span &gt;ON&lt;/span&gt; customers&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;=&lt;/span&gt; addresses&lt;span &gt;.&lt;/span&gt;customer_id
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; cities &lt;span &gt;ON&lt;/span&gt; addresses&lt;span &gt;.&lt;/span&gt;city_id &lt;span &gt;=&lt;/span&gt; cities&lt;span &gt;.&lt;/span&gt;id
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; states &lt;span &gt;ON&lt;/span&gt; cities&lt;span &gt;.&lt;/span&gt;state_id &lt;span &gt;=&lt;/span&gt; states&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, a view is just a query. Now, if you want to query all your customers from “Chicago,” you can query the view, which is much easier to write and more readable.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; complete_customer_data 
&lt;span &gt;WHERE&lt;/span&gt; city &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;Chicago&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
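&lt;p&gt;To try the idea without a Postgres server, here is a minimal sketch that uses Python&apos;s built-in &lt;code &gt;sqlite3&lt;/code&gt; module as a stand-in. The trimmed-down schema and the sample rows are assumptions made for illustration, and SQLite uses &lt;code &gt;||&lt;/code&gt; for the concatenation, but the view behaves the same way: it stores the query, not the results.&lt;/p&gt;

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        city_id INTEGER REFERENCES cities(id)
    );

    -- Hypothetical sample data.
    INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO cities VALUES (1, 'Chicago'), (2, 'Austin');
    INSERT INTO addresses VALUES (1, 1, 1), (2, 2, 2);

    -- The view stores only the query text, not its results.
    CREATE VIEW complete_customer_data AS
    SELECT customers.first_name || ' ' || customers.last_name AS customer_name,
           cities.city AS city
    FROM customers
      INNER JOIN addresses ON customers.id = addresses.customer_id
      INNER JOIN cities ON addresses.city_id = cities.id;
    """
)

# Querying the view runs the joins behind the scenes.
chicago_customers = conn.execute(
    "SELECT customer_name FROM complete_customer_data WHERE city = ?", ("Chicago",)
).fetchall()
print(chicago_customers)
```

&lt;p&gt;The calling code only needs to know the view name and its columns; the joins stay in one place.&lt;/p&gt;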
&lt;h2 id=&quot;materialized-views-in-postgres&quot; &gt;&lt;a href=&quot;#materialized-views-in-postgres&quot; aria-label=&quot;materialized views in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Materialized Views in Postgres&lt;/h2&gt;
&lt;p&gt;Views are great for simplifying code, but with large datasets, you&apos;re not really saving any time when you run them because a view is only as fast as its underlying query. For costly queries and large datasets, this can be a drawback.&lt;/p&gt;
&lt;p&gt;A better solution when performance is a concern might be to create a &lt;strong&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/rules-materializedviews.html&quot;&gt;materialized view&lt;/a&gt;&lt;/strong&gt;. Materialized views store the results of the query on disk, much like a regular table, so queries against the view read the stored rows instead of rerunning the underlying query. This makes running queries against the view much faster.&lt;/p&gt;
&lt;p&gt;The drawback to materialized views is that the cached results do not automatically update when the data in the base tables changes. So in the example above, if a customer changed their address and we made our view a materialized view, &lt;strong&gt;we would not see the change until we &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html&quot;&gt;refreshed&lt;/a&gt; the view.&lt;/strong&gt; This reruns the query and caches the new results. You’ll see an example of this in the next section.&lt;/p&gt;
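&lt;p&gt;The staleness is easy to reproduce in a small sketch. SQLite has no materialized views, so the example below emulates one with a snapshot table that is rebuilt on demand; the table and function names are hypothetical, and this is only an illustration of the behavior, not how Postgres implements it.&lt;/p&gt;

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (customer_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO addresses VALUES (1, 'Chicago');
    """
)


def refresh_snapshot(conn):
    """Rebuild the snapshot table, mimicking REFRESH MATERIALIZED VIEW."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS customer_city_snapshot;
        CREATE TABLE customer_city_snapshot AS
        SELECT customer_id, city FROM addresses;
        """
    )


def snapshot_city(conn, customer_id):
    """Read a customer's city from the stored snapshot, not the base table."""
    row = conn.execute(
        "SELECT city FROM customer_city_snapshot WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


refresh_snapshot(conn)  # initial population, like CREATE MATERIALIZED VIEW

# The customer moves: the base table changes, but the snapshot does not.
conn.execute("UPDATE addresses SET city = 'Austin' WHERE customer_id = 1")
stale = snapshot_city(conn, 1)

refresh_snapshot(conn)  # rerun the query and store the new results
fresh = snapshot_city(conn, 1)
print(stale, fresh)
```

&lt;p&gt;The first read still returns the old city; only the read after the refresh reflects the move.&lt;/p&gt;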
&lt;h2 id=&quot;creating-a-materialized-view-in-postgres&quot; &gt;&lt;a href=&quot;#creating-a-materialized-view-in-postgres&quot; aria-label=&quot;creating a materialized view in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating a Materialized View in Postgres&lt;/h2&gt;
&lt;p&gt;Imagine you have an online store and want to send out coupons to customers, offering them different deals based on how often they shop and where they live.&lt;/p&gt;
&lt;p&gt;You will start with a query that tracks customers by order frequency, how much they ordered, and where they are ordering from. Rather than rewriting this query each time, you can create a view that allows you to find a subset of customers. For example, you might want to see all customers who live in Texas that have bought more than three products in the past five months. Since this query needs to check against all of your customers and all of their orders, it will take a long time to run, so use a materialized view that you can refresh as often as you need to.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; MATERIALIZED &lt;span &gt;VIEW&lt;/span&gt; customer_order_volume &lt;span &gt;AS&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt;
  concat&lt;span &gt;(&lt;/span&gt;customers&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;-&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; orders&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; unique_id&lt;span &gt;,&lt;/span&gt;
  concat&lt;span &gt;(&lt;/span&gt;customers&lt;span &gt;.&lt;/span&gt;first_name&lt;span &gt;,&lt;/span&gt;&lt;span &gt;&apos; &apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; customers&lt;span &gt;.&lt;/span&gt;last_name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; customer_name&lt;span &gt;,&lt;/span&gt; 
  orders&lt;span &gt;.&lt;/span&gt;created_on &lt;span &gt;AS&lt;/span&gt; purchase_date&lt;span &gt;,&lt;/span&gt;
  addresses&lt;span &gt;.&lt;/span&gt;city &lt;span &gt;AS&lt;/span&gt; city&lt;span &gt;,&lt;/span&gt;
  addresses&lt;span &gt;.&lt;/span&gt;state &lt;span &gt;AS&lt;/span&gt; state&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;count&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;products&lt;span &gt;.&lt;/span&gt;product_name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; order_size&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;sum&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;products&lt;span &gt;.&lt;/span&gt;product_cost&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; order_cost
&lt;span &gt;FROM&lt;/span&gt; orders
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; customers &lt;span &gt;ON&lt;/span&gt; orders&lt;span &gt;.&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; customers&lt;span &gt;.&lt;/span&gt;id
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; products_orders po &lt;span &gt;ON&lt;/span&gt; orders&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;=&lt;/span&gt; po&lt;span &gt;.&lt;/span&gt;order_id
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; products &lt;span &gt;ON&lt;/span&gt; po&lt;span &gt;.&lt;/span&gt;product_id &lt;span &gt;=&lt;/span&gt; products&lt;span &gt;.&lt;/span&gt;id
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; addresses &lt;span &gt;ON&lt;/span&gt; addresses&lt;span &gt;.&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; orders&lt;span &gt;.&lt;/span&gt;customer_id
&lt;span &gt;GROUP&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; unique_id&lt;span &gt;,&lt;/span&gt; customer_name&lt;span &gt;,&lt;/span&gt; purchase_date&lt;span &gt;,&lt;/span&gt; city&lt;span &gt;,&lt;/span&gt; state&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This view combines the customer ID and the order ID (&lt;code &gt;customers.id&lt;/code&gt; and &lt;code &gt;orders.id&lt;/code&gt;) to create a unique identifier for each row. This will help you out later in the tutorial.&lt;/p&gt;
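&lt;p&gt;The &lt;code &gt;&apos;-&apos;&lt;/code&gt; separator in that &lt;code &gt;concat&lt;/code&gt; call matters. The short Python sketch below mirrors the expression with a hypothetical helper, &lt;code &gt;make_unique_id&lt;/code&gt;, to show how two different customer and order pairs could collide without it:&lt;/p&gt;

```python
def make_unique_id(customer_id, order_id, separator="-"):
    """Mirror concat(customers.id, separator, orders.id) from the view."""
    return f"{customer_id}{separator}{order_id}"


# Without a separator, two different (customer, order) pairs collide.
print(make_unique_id(1, 23, separator=""))  # 123
print(make_unique_id(12, 3, separator=""))  # 123

# With the separator, the identifiers stay distinct.
print(make_unique_id(1, 23))  # 1-23
print(make_unique_id(12, 3))  # 12-3
```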
&lt;p&gt;You can query materialized views the same way you queried the regular view, but this time, the view’s results have been cached, so the underlying query doesn’t run again.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; customer_order_volume
&lt;span &gt;WHERE&lt;/span&gt; state &lt;span &gt;in&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;TX&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;IL&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;OH&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; state&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you want to refresh the data, run:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;REFRESH MATERIALIZED &lt;span &gt;VIEW&lt;/span&gt; customer_order_volume&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html&quot;&gt;Refreshing a view&lt;/a&gt; like this is the fastest method, but you risk blocking other connections trying to read from the view during the refresh. If you want to be able to refresh the view without interrupting read access, you’ll need to do a concurrent refresh:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;REFRESH MATERIALIZED &lt;span &gt;VIEW&lt;/span&gt; CONCURRENTLY customer_order_volume&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This only works if your materialized view has a unique index that covers all rows: an index on a column, or a comma-separated list of columns, from the view. You need to create it explicitly:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;UNIQUE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; customer_order_volume&lt;span &gt;(&lt;/span&gt;unique_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also remove the view if you don&apos;t need it anymore:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;DROP&lt;/span&gt; MATERIALIZED &lt;span &gt;VIEW&lt;/span&gt; customer_order_volume&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;using-postgres-views-in-django-and-python&quot; &gt;&lt;a href=&quot;#using-postgres-views-in-django-and-python&quot; aria-label=&quot;using postgres views in django and python permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using Postgres Views in Django and Python&lt;/h2&gt;
&lt;p&gt;First, the bad news: as of this writing, Django&apos;s ORM cannot create views for you. You’ll have to write some raw SQL for Django to run during the migration.&lt;/p&gt;
&lt;p&gt;The good news is that once the view is created, it&apos;s relatively easy to use it in Django. You just need to set up a model like you would for any other table in the database. In the following sections, you’ll create a materialized view and a method to refresh it. If you want to use a regular view, the process is the same; you just won’t need the refresh method.&lt;/p&gt;
&lt;h3 id=&quot;the-model&quot; &gt;&lt;a href=&quot;#the-model&quot; aria-label=&quot;the model permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Model&lt;/h3&gt;
&lt;p&gt;The model attributes should reflect the columns returned by your view just like they would for any other table.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CustomerOrderVolume&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    unique_id     &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; primary_key&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    customer_name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    city          &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    state         &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    purchase_date &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;DateField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    order_size    &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    order_cost    &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;FloatField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;class&lt;/span&gt; &lt;span &gt;Meta&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        managed &lt;span &gt;=&lt;/span&gt; &lt;span &gt;False&lt;/span&gt;
        db_table&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;customer_order_volume&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Most notable here is the &lt;code &gt;Meta&lt;/code&gt; class. Setting &lt;code &gt;managed&lt;/code&gt; to &lt;code &gt;False&lt;/code&gt; tells Django you don&apos;t need it to create the table in the migration. You also need to explicitly set the &lt;code &gt;db_table&lt;/code&gt; name so that Django knows which table to run queries on.&lt;/p&gt;
&lt;p&gt;The last thing to note about the model is that you need to set one of your fields as a primary key. Otherwise, Django will expect a column called &lt;code &gt;id&lt;/code&gt; and throw an error when it doesn&apos;t find one. In this case, you can again take advantage of the unique ID field you’ll create for the view.&lt;/p&gt;
&lt;p&gt;Create your migration as usual. After the migration is created, add a &lt;code &gt;RunSQL&lt;/code&gt; operation to the &lt;code &gt;operations&lt;/code&gt; list of the migration to create the view:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations&lt;span &gt;,&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;

    initial &lt;span &gt;=&lt;/span&gt; &lt;span &gt;True&lt;/span&gt;

    dependencies &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;

    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        migrations&lt;span &gt;.&lt;/span&gt;CreateModel&lt;span &gt;(&lt;/span&gt;
            name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;CustomerOrderVolume&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            fields&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;unique_id&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; primary_key&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; serialize&lt;span &gt;=&lt;/span&gt;&lt;span &gt;False&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;customer_name&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;city&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;state&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;purchase_date&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;DateField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;order_size&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;order_cost&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;FloatField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            options&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&apos;db_table&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;customer_order_volume&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&apos;managed&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;False&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        migrations&lt;span &gt;.&lt;/span&gt;RunSQL&lt;span &gt;(&lt;/span&gt;
            &lt;span &gt;&quot;&quot;&quot;
            CREATE MATERIALIZED VIEW customer_order_volume AS
                SELECT
                concat(customers.id, &apos;-&apos;, orders.id) AS unique_id,
                concat(customers.first_name,&apos; &apos;, customers.last_name) AS customer_name, 
                orders.created_on AS purchase_date,
                addresses.city AS city,
                addresses.state AS state,
                count(products.product_name) AS order_size,
                sum(products.product_cost) AS order_cost
                FROM orders
                INNER JOIN customers ON orders.customer_id = customers.id
                INNER JOIN products_orders po ON orders.id = po.order_id
                INNER JOIN products ON po.product_id = products.id
                INNER JOIN addresses ON addresses.customer_id = orders.customer_id
                GROUP BY unique_id, customer_name, purchase_date, city, state;
            &quot;&quot;&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;DROP MATERIALIZED VIEW customer_order_volume;&quot;&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
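&lt;p&gt;Supply the &lt;code &gt;RunSQL&lt;/code&gt; operation with SQL to create the view and, as the second argument, SQL to drop it when the migration is reversed. When you run the migrations, Django won’t create a &lt;code &gt;customer_order_volume&lt;/code&gt; table because you set &lt;code &gt;managed&lt;/code&gt; to &lt;code &gt;False&lt;/code&gt;, but it will run the raw SQL and create the view for you. If you plan to refresh the view concurrently, also create the unique index shown earlier, for example in a second &lt;code &gt;RunSQL&lt;/code&gt; operation.&lt;/p&gt;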
&lt;p&gt;Finally, create a refresh method that you can call anytime you want to update your materialized view. I chose to create it as a class method, but this is not required. You can do this anywhere since all you are doing is executing raw SQL.&lt;/p&gt;
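&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection&lt;span &gt;,&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CustomerOrderVolume&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;# ... fields and Meta as defined above ...&lt;/span&gt;
    &lt;span &gt;@classmethod&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;refresh_view&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;cls&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;with&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; cursor&lt;span &gt;:&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;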
&lt;p&gt;This method can be called whenever you want to repopulate your view’s data. This could be done via a cron job that runs at night when traffic to the site is low.&lt;/p&gt;
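&lt;p&gt;If you would rather keep the refresh logic outside the model, one possible shape is a small standalone helper. The sketch below is an assumption, not part of Django: &lt;code &gt;refresh_materialized_view&lt;/code&gt; and its parameters are hypothetical names, and it only relies on the connection offering &lt;code &gt;cursor()&lt;/code&gt; as a context manager, as &lt;code &gt;django.db.connection&lt;/code&gt; does in the method above.&lt;/p&gt;

```python
import re

# Conservative pattern for unquoted SQL identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def refresh_materialized_view(connection, view_name, concurrently=True):
    """Refresh a materialized view and return the SQL that was executed.

    `connection` must provide cursor() usable as a context manager,
    for example django.db.connection.
    """
    # Identifiers cannot be passed as query parameters, so validate
    # the name before formatting it into the statement.
    if not _IDENTIFIER.fullmatch(view_name):
        raise ValueError(f"unsafe view name: {view_name!r}")

    # CONCURRENTLY requires a unique index on the materialized view.
    keyword = "CONCURRENTLY " if concurrently else ""
    sql = f"REFRESH MATERIALIZED VIEW {keyword}{view_name}"

    with connection.cursor() as cursor:
        cursor.execute(sql)
    return sql
```

&lt;p&gt;In a Django project you might call it as &lt;code &gt;refresh_materialized_view(connection, &quot;customer_order_volume&quot;)&lt;/code&gt; after &lt;code &gt;from django.db import connection&lt;/code&gt;, for instance from a script that your cron job runs. Passing &lt;code &gt;concurrently=False&lt;/code&gt; falls back to the plain, blocking refresh.&lt;/p&gt;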
&lt;p&gt;Now, you can test the view from the Django shell:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;In &lt;span &gt;[&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; c &lt;span &gt;=&lt;/span&gt; CustomerOrderVolume&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;all&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
In &lt;span &gt;[&lt;/span&gt;&lt;span &gt;4&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; c
Out&lt;span &gt;[&lt;/span&gt;&lt;span &gt;4&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;CustomerOrderVolume&lt;span &gt;:&lt;/span&gt; CustomerOrderVolume &lt;span &gt;object&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;Jonathan Griffith&lt;span &gt;)&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;CustomerOrderVolume&lt;span &gt;:&lt;/span&gt; CustomerOrderVolume &lt;span &gt;object&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;Stephanie Fernandez&lt;span &gt;)&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;CustomerOrderVolume&lt;span &gt;:&lt;/span&gt; CustomerOrderVolume &lt;span &gt;object&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;Austin Burns&lt;span &gt;)&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see that Django returns a query set just like it would with any other model. Similarly, you can filter and access attributes on the objects just as you&apos;d expect:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;In &lt;span &gt;[&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; order &lt;span &gt;=&lt;/span&gt; c&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
In &lt;span &gt;[&lt;/span&gt;&lt;span &gt;4&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; order
Out&lt;span &gt;[&lt;/span&gt;&lt;span &gt;4&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;CustomerOrderVolume&lt;span &gt;:&lt;/span&gt; CustomerOrderVolume &lt;span &gt;object&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;Adam Turner&lt;span &gt;)&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
In &lt;span &gt;[&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; order&lt;span &gt;.&lt;/span&gt;purchase_date
Out&lt;span &gt;[&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; datetime&lt;span &gt;.&lt;/span&gt;datetime&lt;span &gt;(&lt;/span&gt;&lt;span &gt;2020&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;7&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;9&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;20&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;50&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;43&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;895459&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is a good start. From here, a helpful addition would be a database table that keeps track of how often the view gets refreshed. You could set up a cron job to run your refresh function for you at night or on the weekends, or it could be &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/topics/signals/&quot;&gt;called from a signal&lt;/a&gt; when the underlying models are updated. Be aware that the refresh might take a while if you have a lot of underlying data, so you probably don’t want to call it too frequently.&lt;/p&gt;
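&lt;p&gt;As a minimal sketch of the refresh-tracking idea above: the function below refreshes a materialized view and records the time in a log table. The function name, the &lt;code &gt;view_refresh_log&lt;/code&gt; table, and its columns are hypothetical and not part of the example project; the cursor can be any DB-API cursor, such as the one returned by Django’s &lt;code &gt;connection.cursor()&lt;/code&gt;.&lt;/p&gt;

```python
import datetime


def refresh_view_and_log(cursor, view_name, log_table="view_refresh_log"):
    """Refresh a materialized view and record when the refresh happened.

    `cursor` is any DB-API cursor. `view_refresh_log` is a hypothetical
    table with `view_name` and `refreshed_at` columns.
    """
    # Identifiers cannot be passed as query parameters, so only accept
    # plain names to avoid SQL injection through view_name or log_table.
    for name in (view_name, log_table):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")

    refreshed_at = datetime.datetime.now(datetime.timezone.utc)
    cursor.execute(f"REFRESH MATERIALIZED VIEW {view_name}")
    cursor.execute(
        f"INSERT INTO {log_table} (view_name, refreshed_at) VALUES (%s, %s)",
        [view_name, refreshed_at],
    )
    return refreshed_at
```

&lt;p&gt;A cron job or signal handler could call this function, and the log table then shows how often (and how recently) each view was refreshed.&lt;/p&gt;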
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this post, you saw the two different types of views available in PostgreSQL, and the reasons you might want to create a view for your application. Views are useful if you want to limit the amount of code you write each time you query the database, cut down on the complexity of a large query, or cache the results of a costly query. Whatever your reason, once your view is created, it&apos;s just a matter of setting up your Django model correctly to get it working in your Python application. Finally, don’t forget to create a refresh method to update the view if you elect to use a materialized view.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article we’d appreciate it if you’d &lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DUsing%20PostgreSQL%20Views%20in%20Django%22%20-%20In%20this%20article,%20%40pganalyze%20share%20how%20views%20differ%20from%20materialized%20views%20and%20show%20how%20you%20can%20use%20%23Postgres%20views%20to%20make%20querying%20aggregated%20data%20easier%20and%20faster%3A%20https://pganalyze.com/blog/postgresql-views-django-python&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PS:&lt;/strong&gt; If you are interested in learning about views and materialized views in Ruby on Rails check out our article about it here: &lt;a href=&quot;https://pganalyze.com/blog/materialized-views-ruby-rails&quot;&gt;Effectively Using Materialized Views in Ruby on Rails&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[How we deconstructed the Postgres planner to find indexing opportunities]]></title><description><![CDATA[Everyone who has used Postgres has directly or indirectly used the Postgres planner. The Postgres planner is central to determining how a query gets executed, whether indexes get used, how tables are joined, and more. When Postgres asks itself "How do we run this query?”, the planner answers. And just like Postgres has evolved over decades, the planner has not stood still either. It can sometimes be challenging to understand what exactly the Postgres planner does, and which data it bases its…]]></description><link>https://pganalyze.com/blog/deconstructing-the-postgres-planner</link><guid isPermaLink="false">https://pganalyze.com/blog/deconstructing-the-postgres-planner</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Tue, 02 Nov 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Everyone who has used Postgres has directly or indirectly used the Postgres planner. The Postgres planner is central to determining how a query gets executed, whether indexes get used, how tables are joined, and more. When Postgres asks itself &lt;em&gt;&quot;How do we run this query?”&lt;/em&gt;, the planner answers.&lt;/p&gt;
&lt;p&gt;And just like Postgres has evolved over decades, the planner has not stood still either. &lt;strong&gt;It can sometimes be challenging to understand what exactly the Postgres planner does&lt;/strong&gt;, and which data it bases its decisions on.&lt;/p&gt;
&lt;p&gt;Earlier this year we set out to gain a deep understanding of the planner to improve indexing tools for Postgres. Based on this work we launched the first iteration of the &lt;a href=&quot;https://pganalyze.com/blog/introducing-pganalyze-index-advisor&quot;&gt;pganalyze Index Advisor&lt;/a&gt; over a month ago, and have received an incredible amount of feedback and overall response.&lt;/p&gt;
&lt;p&gt;In this post we take a closer look at &lt;strong&gt;how we extracted the planner into a standalone library&lt;/strong&gt;, just like we did with &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser&quot;&gt;pg_query&lt;/a&gt;. We then assess how this approach compares to an actual running server, and what is possible now that we can run the planner code. Based on this we look at how we used its decision making know-how to find indexing opportunities, and review the topic of clause selectivity, and how we incorporated feedback from a Postgres community member.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#planning-a-postgres-query-without-a-running-database-server&quot;&gt;Planning a Postgres query without a running database server&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-accurate-is-this-planning-process&quot;&gt;How accurate is this planning process?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#finding-multiple-possible-plan-paths-not-just-the-best-path&quot;&gt;Finding multiple possible plan paths, not just the best path&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#making-index-recommendations-based-on-restriction-clauses&quot;&gt;Making index recommendations based on restriction clauses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#understanding-postgres-clause-selectivity&quot;&gt;Understanding Postgres clause selectivity&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-we-incorporated-postgres-community-feedback&quot;&gt;How we incorporated Postgres community feedback&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#creating-the-best-index-vs-creating-good-enough-indexes&quot;&gt;Creating the best index, vs creating “good enough” indexes&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#join-us-for-design-research-sessions&quot;&gt;Join us for design research sessions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor architecture&quot; title=&quot;pganalyze Index Advisor architecture&quot; src=&quot;https://pganalyze.com/static/0f186601ce07fcb5307920523985884e/1d69c/index_advisor_architecture_short.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;h2 id=&quot;planning-a-postgres-query-without-a-running-database-server&quot; &gt;&lt;a href=&quot;#planning-a-postgres-query-without-a-running-database-server&quot; aria-label=&quot;planning a postgres query without a running database server permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Planning a Postgres query without a running database server&lt;/h2&gt;
&lt;p&gt;At pganalyze we offer performance recommendations for production database systems, without requiring complex installation steps or version upgrades. Whilst Postgres’ extension system is very capable, and we have many ideas on what we could track or do inside Postgres itself, &lt;strong&gt;we intentionally decided not to focus on a Postgres extension for giving index advice&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There are three main reasons for not creating an extension:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Index decisions often happen during development, where the database that you are working with is not production sized&lt;/li&gt;
&lt;li&gt;Not everyone has direct access to the production database - it’s important we create tooling that can be used by the whole development team&lt;/li&gt;
&lt;li&gt;Adopting a new Postgres extension on a production database is risky, especially if the code is new - and you may not be able to install custom extensions (e.g. on Amazon RDS)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We’ve thus focused on creating something that runs separately from Postgres, but knows how Postgres works. Our approach is inspired by our work on &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser&quot;&gt;pg_query&lt;/a&gt;, and enables planning a query solely based on the query text, the schema definition, and table statistics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We utilized libclang to automatically extract source code from Postgres&lt;/strong&gt;, &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser#using-libclang-to-extract-c-source-code-from-postgres&quot;&gt;just like we&apos;ve done for pg_query&lt;/a&gt;. Whilst for pg_query we extracted a little bit over 100,000 lines of Postgres source, for the planner we extracted almost 470,000 lines of Postgres source, more than 4x the amount of code. For reference, Postgres itself is almost 1,000,000 lines of source code (as determined by &lt;code &gt;sloccount&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Examples of code from Postgres we didn’t use:&lt;/strong&gt; The executor (except for some initialization routines), the storage subsystem, frontend code, and various specialized code paths.&lt;/p&gt;
&lt;p&gt;A good amount of engineering time later, we ended up with a seemingly simple function in a C library that takes a query and a schema definition (including table statistics), and returns a result similar to an EXPLAIN plan:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * Plan the provided query utilizing the schema definition and the
 * provided table statistics, and return an EXPLAIN-like result.
 */&lt;/span&gt;
PgPlanResult &lt;span &gt;pg_plan&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;char&lt;/span&gt;&lt;span &gt;*&lt;/span&gt; query&lt;span &gt;,&lt;/span&gt; &lt;span &gt;const&lt;/span&gt; &lt;span &gt;char&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;schema_and_statistics&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
  …
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;This function is deterministic&lt;/strong&gt;: when you pass the same set of input parameters, you will always get the same output.&lt;/p&gt;
&lt;p&gt;This required some additional modifications to the extracted code (we have about 90 small patches to adjust certain code paths), especially in places where Postgres does the rare on-demand checking of file sizes, or looks at the B-tree meta page. All of these are instead fixed input parameters, defined using &lt;code &gt;SET&lt;/code&gt; commands in the schema definition.&lt;/p&gt;
&lt;h3 id=&quot;how-accurate-is-this-planning-process&quot; &gt;&lt;a href=&quot;#how-accurate-is-this-planning-process&quot; aria-label=&quot;how accurate is this planning process permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How accurate is this planning process?&lt;/h3&gt;
&lt;p&gt;Let’s take a look at one of our own test queries:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;WITH&lt;/span&gt; unused_indexes &lt;span &gt;AS&lt;/span&gt; MATERIALIZED &lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;SELECT&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;last_used_at&lt;span &gt;,&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;database_id&lt;span &gt;,&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;table_id 
    &lt;span &gt;FROM&lt;/span&gt; schema_indexes
         &lt;span &gt;JOIN&lt;/span&gt; schema_tables &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;schema_indexes&lt;span &gt;.&lt;/span&gt;table_id &lt;span &gt;=&lt;/span&gt; schema_tables&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
   &lt;span &gt;WHERE&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;database_id &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
         &lt;span &gt;AND&lt;/span&gt; schema_tables&lt;span &gt;.&lt;/span&gt;invalidated_at_snapshot_id &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;
         &lt;span &gt;AND&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;invalidated_at_snapshot_id &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;
         &lt;span &gt;AND&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;is_valid
         &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;is_unique &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;&amp;lt;&gt;&lt;/span&gt; &lt;span &gt;ALL&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;schema_indexes&lt;span &gt;.&lt;/span&gt;&lt;span &gt;columns&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
         &lt;span &gt;AND&lt;/span&gt; schema_indexes&lt;span &gt;.&lt;/span&gt;last_used_at &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;now&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;-&lt;/span&gt; &lt;span &gt;&apos;14 day&apos;&lt;/span&gt;::&lt;span &gt;interval&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;last_used_at&lt;span &gt;,&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;database_id&lt;span &gt;,&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;table_id 
  &lt;span &gt;FROM&lt;/span&gt; unused_indexes ui
 &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;COALESCE&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
         &lt;span &gt;SELECT&lt;/span&gt; size_bytes
           &lt;span &gt;FROM&lt;/span&gt; schema_index_stats_35d sis
          &lt;span &gt;WHERE&lt;/span&gt; sis&lt;span &gt;.&lt;/span&gt;schema_index_id &lt;span &gt;=&lt;/span&gt; ui&lt;span &gt;.&lt;/span&gt;id
                &lt;span &gt;AND&lt;/span&gt; collected_at &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;2021-10-31 06:40:04&apos;&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;32768&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query is used inside the pganalyze application to find indexes that were not in use in the last 14 days. Running &lt;code &gt;EXPLAIN (FORMAT JSON)&lt;/code&gt; for the query on our production system, we get a result like this:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt; &lt;span &gt;[&lt;/span&gt;
   &lt;span &gt;{&lt;/span&gt;
     &lt;span &gt;&quot;Plan&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
       &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;CTE Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       …
       &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3172.85&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3311.01&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
       &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;11&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;60&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(COALESCE((SubPlan 2), &apos;0&apos;::bigint) &gt; 32768)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;Plans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
         &lt;span &gt;{&lt;/span&gt;
           &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Nested Loop&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           …
           &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1.12&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3172.85&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;32&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;63&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
           &lt;span &gt;&quot;Inner Unique&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;Plans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
             &lt;span &gt;{&lt;/span&gt;
               &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Parent Relationship&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Outer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               …
               &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;index_schema_indexes_on_database_id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               …
               &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.56&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2581.00&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;69&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;63&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(database_id = 1)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(is_valid AND (NOT is_unique) AND (last_used_at &amp;lt; &apos;2021-10-17&apos;::date) AND (0 &amp;lt;&gt; ALL (columns)))&quot;&lt;/span&gt;
             &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
             &lt;span &gt;{&lt;/span&gt;
               &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Parent Relationship&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Inner&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               …
               &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;schema_tables_pkey&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               …
               &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.56&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;8.58&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;8&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(id = schema_indexes.table_id)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
               &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(invalidated_at_snapshot_id IS NULL)&quot;&lt;/span&gt;
             &lt;span &gt;}&lt;/span&gt;
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that we are intentionally running EXPLAIN without ANALYZE, since we care about the cost-based estimation model used by the planner.&lt;/p&gt;
&lt;p&gt;And now, running the same query, with its schema definition and production statistics (but not the actual table data!) provided to the &lt;code &gt;pg_plan&lt;/code&gt; function:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;[&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;&quot;Plan&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;CTE Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      …
      &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3181.43&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3324.07&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;11&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;60&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(COALESCE((SubPlan 2), &apos;0&apos;::bigint) &gt; 32768)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;Plans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        &lt;span &gt;{&lt;/span&gt;
          &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Nested Loop&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          …
          &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1.12&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3181.43&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;33&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;63&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Inner Unique&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;Plans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
            &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Parent Relationship&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Outer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
              …
              &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;index_schema_indexes_on_database_id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              …
              &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.56&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2581.00&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;70&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;63&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(database_id = 1)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(is_valid AND (NOT is_unique) AND (last_used_at &amp;lt; &apos;2021-10-17&apos;::date) AND (0 &amp;lt;&gt; ALL (columns)))&quot;&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Parent Relationship&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Inner&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              …
              &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;schema_tables_pkey&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              …
              &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.56&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;8.58&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Plan Rows&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Plan Width&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;8&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(id = schema_indexes.table_id)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(invalidated_at_snapshot_id IS NULL)&quot;&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;
            …&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, for this query &lt;strong&gt;the plan cost estimation is within a 1% margin of the actual production estimates&lt;/strong&gt;. That means, we provided the Postgres planner the exact same input parameters as used on the actual database server, and the cost calculation matched almost to the dot.&lt;/p&gt;
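&lt;p&gt;To make the comparison concrete, here is a small sketch that computes the relative difference between the two sets of total costs. The numbers are copied from the two plans above; the helper name and the node labels are only illustrative.&lt;/p&gt;

```python
def relative_difference(expected, actual):
    """Return |actual - expected| as a fraction of expected."""
    return abs(actual - expected) / expected


# (node, production EXPLAIN total cost, pg_plan total cost), copied from
# the two plans shown above.
total_costs = [
    ("CTE Scan", 3311.01, 3324.07),
    ("Nested Loop", 3172.85, 3181.43),
    ("Index Scan (outer)", 2581.00, 2581.00),
    ("Index Scan (inner)", 8.58, 8.58),
]

differences = {
    node: relative_difference(production, standalone)
    for node, production, standalone in total_costs
}

for node, diff in differences.items():
    print(f"{node}: {diff:.2%}")
```

&lt;p&gt;The largest gap is at the top-level CTE Scan, at roughly 0.4%, while both index scans have identical total costs.&lt;/p&gt;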
&lt;p&gt;Now that we’ve established a basis for running the planner and getting cost estimates, let’s look at what we can do with this.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;https://pganalyze.com/ebooks/postgres-indexing&quot;&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Effective Indexing eBook promotion banner&quot; title=&quot;Effective Indexing eBook promotion banner&quot; src=&quot;https://pganalyze.com/static/b24fdd95dbc38757fe354c86d9ad9aaa/acb04/promo_ebook.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id=&quot;finding-multiple-possible-plan-paths-not-just-the-best-path&quot; &gt;&lt;a href=&quot;#finding-multiple-possible-plan-paths-not-just-the-best-path&quot; aria-label=&quot;finding multiple possible plan paths not just the best path permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding multiple possible plan paths, not just the best path&lt;/h3&gt;
&lt;p&gt;When the Postgres planner plans a query, it works under time pressure. That is, any extra work spent searching for a better plan makes planning itself slower. To be fast, the planner quickly throws away plan options it does not consider worth pursuing.&lt;/p&gt;
&lt;p&gt;That unfortunately means we can’t just run EXPLAIN with a flag that says “show me all possible plan variants” - the planner code is simply not written in a way that makes this possible, at least not today.&lt;/p&gt;
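&lt;p&gt;The trade-off can be illustrated with a deliberately simplified sketch: one function keeps only the cheapest candidate, the other keeps every candidate for later analysis. This is not how the planner is implemented (it compares paths on more than a single cost number), and the path names and costs below are made up.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    description: str
    total_cost: float


def cheapest_only(paths):
    """Keep just the cheapest path, discarding the alternatives."""
    return min(paths, key=lambda path: path.total_cost)


def ranked_alternatives(paths):
    """Keep every path, ordered from cheapest to most expensive."""
    return sorted(paths, key=lambda path: path.total_cost)


# Hypothetical candidates for scanning one table; names and costs are made up.
candidates = [
    CandidatePath("Seq Scan", 4200.0),
    CandidatePath("Index Scan on idx_a", 350.0),
    CandidatePath("Index Scan on idx_b", 2600.0),
]

best = cheapest_only(candidates)
alternatives = ranked_alternatives(candidates)
```

&lt;p&gt;Keeping only &lt;code &gt;best&lt;/code&gt; is cheap, but the information about why the other candidates lost is gone; keeping &lt;code &gt;alternatives&lt;/code&gt; costs more time and memory, which is acceptable outside of a live server.&lt;/p&gt;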
&lt;p&gt;However, with our &lt;code &gt;pg_plan&lt;/code&gt; logic running outside the server itself, we do not have these strict speed requirements, and can therefore spend more time looking at alternatives and keeping them around for analysis. For example, here is the internal information we have for a scan node on a table, that illustrates the different paths that could be taken to fulfill the query:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;    &lt;span &gt;&quot;Scans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
      &lt;span &gt;{&lt;/span&gt;
        &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;&quot;Relation OID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;16398&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;&quot;Restriction Clauses&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
	      …
        &lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;&quot;Plans&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;Plan&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;schema_indexes_table_id_name_idx&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              …
              &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.68&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;352.94&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(table_id = id)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(is_valid AND (NOT is_unique) AND (last_used_at &amp;lt; &apos;2021-10-17&apos;::date) AND (database_id = 1) AND (0 &amp;lt;&gt; ALL (columns)))&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;Plan&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Index Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;index_schema_indexes_on_database_id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              ...
              &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.56&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2581.00&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Index Cond&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(database_id = 1)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(is_valid AND (NOT is_unique) AND (last_used_at &amp;lt; &apos;2021-10-17&apos;::date) AND (0 &amp;lt;&gt; ALL (columns)))&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;Plan&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Node ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Node Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Seq Scan&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              ...
              &lt;span &gt;&quot;Startup Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.00&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Total Cost&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;3933763.60&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
              &lt;span &gt;&quot;Filter&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;((invalidated_at_snapshot_id IS NULL) AND is_valid AND (NOT is_unique) AND (last_used_at &amp;lt; &apos;2021-10-17&apos;::date) AND (database_id = 1) AND (0 &amp;lt;&gt; ALL (columns)))&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the &lt;code &gt;Seq Scan&lt;/code&gt; option was clearly more expensive and not considered. You can also see the different index options and their costs.&lt;/p&gt;
&lt;p&gt;What is especially interesting with this plan is that there was actually a cheaper index scan available, but Postgres did not end up using it in the final plan. This is because the Nested Loop ended up being cheaper by using the &lt;code &gt;schema_indexes&lt;/code&gt; table as the outer table in the nested loop. The first index could only have been used if the Nested Loop relationship were inverted. That is, if &lt;code &gt;table_id&lt;/code&gt; values were used as the input to the &lt;code &gt;schema_indexes&lt;/code&gt; scan, instead of &lt;code &gt;table_id&lt;/code&gt; values being the output that is matched against the &lt;code &gt;schema_tables&lt;/code&gt; table&apos;s &lt;code &gt;id&lt;/code&gt; column.&lt;/p&gt;
&lt;p&gt;As you can see, this data can be especially useful when determining why a particular index wasn’t used, or to consider how to consolidate indexes. In the pganalyze Index Advisor this is surfaced visually in the advanced analysis view:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor advanced analysis&quot; title=&quot;pganalyze Index Advisor advanced analysis&quot; src=&quot;https://pganalyze.com/static/c201ca2f20ace7ff5e151ea54eaf52d4/1d69c/index_advisor_index_options.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Note we also indicate the individual filter clauses for the scan, and show which indexes are matching each clause.&lt;/p&gt;
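&lt;p&gt;For comparison, stock Postgres only offers a rough manual approximation of this side-by-side comparison: discourage a scan type with the &lt;code &gt;enable_*&lt;/code&gt; planner settings and compare the cost estimates that &lt;code &gt;EXPLAIN&lt;/code&gt; reports. The following is a minimal sketch; the query is a hypothetical reconstruction from the filter shown above, not the actual query behind this plan:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical query, reconstructed from the Filter in the plan above.
-- Baseline: the plan the planner picks on its own.
EXPLAIN
SELECT * FROM schema_indexes
 WHERE is_valid AND NOT is_unique
   AND last_used_at &amp;lt; &apos;2021-10-17&apos;::date
   AND database_id = 1
   AND 0 &amp;lt;&gt; ALL (columns);

BEGIN;
-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;

-- Same query again: with index-based scans discouraged, the planner
-- will typically fall back to a Seq Scan and report its estimated cost.
EXPLAIN
SELECT * FROM schema_indexes
 WHERE is_valid AND NOT is_unique
   AND last_used_at &amp;lt; &apos;2021-10-17&apos;::date
   AND database_id = 1
   AND 0 &amp;lt;&gt; ALL (columns);
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This only reveals one alternative at a time, and the &lt;code &gt;enable_*&lt;/code&gt; settings discourage a plan type rather than forbid it, so the result is best treated as a rough comparison.&lt;/p&gt;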
&lt;h3 id=&quot;making-index-recommendations-based-on-restriction-clauses&quot; &gt;&lt;a href=&quot;#making-index-recommendations-based-on-restriction-clauses&quot; aria-label=&quot;making index recommendations based on restriction clauses permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Making index recommendations based on restriction clauses&lt;/h3&gt;
&lt;p&gt;In addition to comparing different existing indexes, we can use the data available to the planner to ask the question &lt;em&gt;“What would the best index look like?”&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;For a scan like the above example, we get a list of restriction clauses, which is a combination of the WHERE clauses as well as the JOIN condition. For index scans to work as expected, one or more of the clauses need to match the index definition.&lt;/p&gt;
&lt;p&gt;The data looks like this for each scan:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;        &lt;span &gt;&quot;Restriction Clauses&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Expression&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;schema_indexes.is_valid&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Selectivity&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.9926&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Relation Column&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;is_valid&quot;&lt;/span&gt;
          &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;
            &lt;span &gt;&quot;ID&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Expression&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;(schema_indexes.database_id = 1)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Selectivity&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.0001&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;OpExpr&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
              &lt;span &gt;&quot;Operator&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&quot;Oid&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;416&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&quot;Name&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;=&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&quot;Left Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;bigint&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&quot;Right Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;integer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&quot;Result Type&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;boolean&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&quot;Source Func&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;int84eq&quot;&lt;/span&gt;
              &lt;span &gt;}&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;&quot;Relation Column&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;database_id&quot;&lt;/span&gt;
          &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
         ...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using this data we then attempt a best guess at making a new index, run a &lt;code &gt;CREATE INDEX&lt;/code&gt; command behind the scenes, and re-run the Postgres planner to reconsider the new index. If the cost of the new scan improves on the initial scan we make a recommendation and note the difference in estimated cost.&lt;/p&gt;
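&lt;p&gt;On a regular Postgres server, a similar what-if check can be done by hand. The following is a minimal sketch, not the Index Advisor&apos;s implementation: the query, the index name and the index columns are assumptions made for illustration. Since &lt;code &gt;CREATE INDEX&lt;/code&gt; blocks writes to the table while it runs, this belongs on a test copy of the database:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;BEGIN;

-- Cost estimate before the candidate index exists.
EXPLAIN
SELECT * FROM schema_indexes
 WHERE database_id = 1
   AND last_used_at &amp;lt; &apos;2021-10-17&apos;::date;

-- Hypothetical candidate index (name and column list are illustrative).
CREATE INDEX candidate_idx
    ON schema_indexes (database_id, last_used_at);

-- Re-plan the same query and compare the estimated Total Cost.
EXPLAIN
SELECT * FROM schema_indexes
 WHERE database_id = 1
   AND last_used_at &amp;lt; &apos;2021-10-17&apos;::date;

-- Discard the index again; nothing is kept.
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the second &lt;code &gt;EXPLAIN&lt;/code&gt; reports a lower total cost and uses the new index, the candidate is worth a closer look.&lt;/p&gt;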
&lt;p&gt;In summary, you can imagine the Index Advisor working roughly like this:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor architecture&quot; title=&quot;pganalyze Index Advisor architecture&quot; src=&quot;https://pganalyze.com/static/9ca071020e082c267e1de228e0ba4727/1d69c/index_advisor_architecture.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;h2 id=&quot;understanding-postgres-clause-selectivity&quot; &gt;&lt;a href=&quot;#understanding-postgres-clause-selectivity&quot; aria-label=&quot;understanding postgres clause selectivity permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Understanding Postgres clause selectivity&lt;/h2&gt;
&lt;p&gt;If you look closely at the earlier advanced analysis screenshot, you will notice a new field that we’ve just made available in a new Index Advisor update: &lt;strong&gt;Selectivity&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is Selectivity?&lt;/strong&gt; It indicates what fraction of rows of the table will be matched by the particular clause of the query. This information is then used by Postgres to estimate the row count that a node returns, as well as determine the cost of that plan node.&lt;/p&gt;
&lt;p&gt;Selectivity estimations are front and center to how the planner operates, but they are unfortunately hidden behind the scenes, and historically one would have had to resort to counting/filtering the actual data to confirm how frequent certain values are, or do manual queries against the Postgres catalog.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How does the planner know the selectivity?&lt;/strong&gt; Counting actual table rows would be far too expensive in time-sensitive situations. Instead, the planner primarily relies on the &lt;code &gt;pg_statistic&lt;/code&gt; table (often accessed through the &lt;code &gt;pg_stats&lt;/code&gt; view for debugging), which keeps the table statistics collected by the &lt;code &gt;ANALYZE&lt;/code&gt; command in Postgres. You can learn more about how the Postgres planner uses statistics in the &lt;a href=&quot;https://www.postgresql.org/docs/current/planner-stats.html&quot;&gt;Postgres documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The data in the &lt;code &gt;pg_stats&lt;/code&gt; view can be queried like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stats &lt;span &gt;WHERE&lt;/span&gt; tablename &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;z&apos;&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; attname &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;a&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | z
attname                | a
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 17
most_common_vals       | {2,3,7,12,13,4,1,5,11,14,9,6,10,8,15,16,0}
most_common_freqs      | {0.0653,0.06446667,0.063766666,0.06363333,0.063533336,0.063433334,0.0629,0.061966665,0.061833333,0.0618,0.0611,0.0605,0.0604,0.060366668,0.059666667,0.0332,0.032133333}
histogram_bounds       | 
correlation            | 0.061594862
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example you can see that there are a total of 17 distinct values (&lt;code &gt;n_distinct&lt;/code&gt;), with the values 1 through 15 having roughly equal frequency (about 6% of rows each), and 0 and 16 being less frequent at about 3% each (&lt;code &gt;most_common_vals&lt;/code&gt;/&lt;code &gt;most_common_freqs&lt;/code&gt;). None of the rows have NULL values (&lt;code &gt;null_frac&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Now, to have accurate plans in the Index Advisor, this same information can be provided using the new special SET commands in the schema definition:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;avg_width&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;4&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;correlation&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;0.061594862&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;most_common_freqs&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;{0.0653,0.06446667,0.063766666,0.06363333,0.063533336,0.063433334,0.0629,0.061966665,0.061833333,0.0618,0.0611,0.0605,0.0604,0.060366668,0.059666667,0.0332,0.032133333}&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;most_common_vals&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;{2,3,7,12,13,4,1,5,11,14,9,6,10,8,15,16,0}&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;n_distinct&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;17&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SET&lt;/span&gt; pganalyze&lt;span &gt;.&lt;/span&gt;null_frac&lt;span &gt;.&lt;/span&gt;&lt;span &gt;public&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;z&lt;span &gt;.&lt;/span&gt;a &lt;span &gt;=&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can learn how to retrieve this information, as well as all the available settings, in the &lt;a href=&quot;https://pganalyze.com/docs/index-advisor/standalone/settings&quot;&gt;Index Advisor documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Based on this data we can now calculate the selectivity of a clause like &lt;code &gt;z.a = 12&lt;/code&gt; to determine that it is &lt;code &gt;0.0636&lt;/code&gt;. Or put differently, the planner estimates that 6.36% of the table would match this condition. This same information is now directly visible in the Index Advisor, when viewing the advanced analysis.&lt;/p&gt;
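&lt;p&gt;To check such a number by hand on a regular Postgres server, the frequency can be looked up in &lt;code &gt;pg_stats&lt;/code&gt;. This is a minimal sketch that assumes the table &lt;code &gt;public.z&lt;/code&gt; from the example above with an integer column &lt;code &gt;a&lt;/code&gt;, and it only covers the simple case of an equality comparison against a value listed in &lt;code &gt;most_common_vals&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- most_common_vals has the pseudo-type anyarray, so it is cast via text
-- to int[] before being unnested side by side with its frequencies.
SELECT v.val, v.freq AS selectivity
  FROM pg_stats s,
       unnest(s.most_common_vals::text::int[], s.most_common_freqs) AS v(val, freq)
 WHERE s.schemaname = &apos;public&apos;
   AND s.tablename = &apos;z&apos;
   AND s.attname = &apos;a&apos;
   AND v.val = 12;

-- The planner&apos;s row estimate is roughly this selectivity multiplied
-- by the table&apos;s row count estimate, which is stored in pg_class.
SELECT reltuples FROM pg_class WHERE oid = &apos;public.z&apos;::regclass;

-- Compare with the &quot;rows&quot; estimate that EXPLAIN reports.
EXPLAIN SELECT * FROM z WHERE a = 12;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With the statistics shown above, the first query should return the &lt;code &gt;0.06363333&lt;/code&gt; entry for the value 12, which rounds to the &lt;code &gt;0.0636&lt;/code&gt; mentioned earlier.&lt;/p&gt;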
&lt;h3 id=&quot;how-we-incorporated-postgres-community-feedback&quot; &gt;&lt;a href=&quot;#how-we-incorporated-postgres-community-feedback&quot; aria-label=&quot;how we incorporated postgres community feedback permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How we incorporated Postgres community feedback&lt;/h3&gt;
&lt;p&gt;At this point we’d also like to give a shout-out to Hubert Lubaczewski (aka “depesz”), who &lt;a href=&quot;https://www.depesz.com/2021/10/22/why-is-it-hard-to-automatically-suggest-what-index-to-create/&quot;&gt;reviewed the initial version of the index advisor&lt;/a&gt;, had some critical feedback, and provided an example we could investigate further.&lt;/p&gt;
&lt;p&gt;Based on the improvements we&apos;ve made, we now take selectivity estimates into account for index suggestions. In particular, we give priority to columns with low selectivity, i.e. those whose clauses match a small fraction of rows. Note this requires use of &lt;code &gt;SET&lt;/code&gt; commands in addition to the raw schema data for the best results.&lt;/p&gt;
&lt;p&gt;With these recent changes the pganalyze Index Advisor recommendation matches depesz&apos;s handcrafted index suggestion in the blog post:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Example result based on clause selectivity&quot; title=&quot;Example result based on clause selectivity&quot; src=&quot;https://pganalyze.com/static/199517e98ae9ffc34d5544934e7b0b13/1d69c/example_selectivity.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;This is a good example of how our planner-based index advisor approach can be improved and tuned, as its behavior is modeled on Postgres itself.&lt;/p&gt;
&lt;h2 id=&quot;creating-the-best-index-vs-creating-good-enough-indexes&quot; &gt;&lt;a href=&quot;#creating-the-best-index-vs-creating-good-enough-indexes&quot; aria-label=&quot;creating the best index vs creating good enough indexes permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating the best index, vs creating “good enough” indexes&lt;/h2&gt;
&lt;p&gt;Another question that came up in multiple conversations is &lt;em&gt;“Should I just create all indexes that the Index Advisor recommends for each query?”&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Unless you have just a handful of queries, the answer to that is no - you shouldn’t just create every index, because that would slow down writes to the table, as they have to update each index separately.&lt;/p&gt;
&lt;p&gt;Today, the best way to use the Index Advisor for a whole database is to try out different CREATE INDEX statements, making sure to update the schema definition with each index you add, so that the Index Advisor bases its recommendations on the existing indexes.&lt;/p&gt;
&lt;p&gt;But we are taking this a step further. The work we are currently doing in this area is focused on two aspects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Utilize query workload data from &lt;code &gt;pg_stat_statements&lt;/code&gt; to weight common queries more heavily in index recommendations, and come up with “good enough” indexes that cover more queries&lt;/li&gt;
&lt;li&gt;Estimate the write overhead of a new index, based on the number of updates/deletes/inserts on a table, as well as the estimated index size&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this, not only are we targeting better summary recommendations, but we also want to help you determine when you can consolidate indexes, where you have two existing very similar indexes.&lt;/p&gt;
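&lt;p&gt;The raw inputs for both aspects can already be inspected with standard Postgres statistics views. The queries below are only a sketch of where that data lives; how the Index Advisor will weight it is not described here, and the limit of 20 rows is an arbitrary choice:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Most frequently executed statements
-- (requires the pg_stat_statements extension to be installed and loaded).
SELECT calls, query
  FROM pg_stat_statements
 ORDER BY calls DESC
 LIMIT 20;

-- Write activity per table, a rough proxy for index maintenance overhead.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
 ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC
 LIMIT 20;

-- On-disk size of the existing indexes.
SELECT relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 ORDER BY pg_relation_size(indexrelid) DESC
 LIMIT 20;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;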
&lt;p&gt;Curious to learn more? Sign up to join us for a design research session:&lt;/p&gt;
&lt;h3 id=&quot;join-us-for-design-research-sessions&quot; &gt;&lt;a href=&quot;#join-us-for-design-research-sessions&quot; aria-label=&quot;join us for design research sessions permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Join us for design research sessions&lt;/h3&gt;
&lt;p&gt;If you are up for testing early prototypes and answering questions to help us understand your workflows better, then we would like to invite you to our design research sessions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.userinterviews.com/projects/C9tzb5mnxA/apply&quot;&gt;pganalyze Design Research Sign-Up&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Realistically, there will always be a trial-and-error aspect in making indexing decisions. But good tools can help guide you in those decisions, no matter your level of Postgres know-how. Our goal with the &lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;pganalyze Index Advisor&lt;/a&gt; is to make indexing an activity that can be done by the whole team, and where it’s easy to get a &lt;code &gt;CREATE INDEX&lt;/code&gt; statement to start working from.&lt;/p&gt;
&lt;p&gt;As you see, the Index Advisor is based on the core logic of Postgres itself, and that forms the basis for making complex assessments behind the scenes. We believe in an iterative process and sharing what we’ve learned, and hope to continue the conversation on how to make indexing better for Postgres.&lt;/p&gt;
&lt;p&gt;We&apos;ve recently made a number of updates to the Index Advisor. Give it a try, and use the &lt;a href=&quot;https://pganalyze.com/docs/index-advisor/standalone/settings&quot;&gt;new SET table statistics syntax&lt;/a&gt; for best results. Encounter an issue with the Index Advisor? You can provide feedback through our &lt;a href=&quot;https://github.com/pganalyze/index-advisor-feedback/discussions&quot;&gt;dedicated discussion board on GitHub&lt;/a&gt;, or send us a support request for in-app functionality.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor promotion banner&quot; title=&quot;pganalyze Index Advisor promotion banner&quot; src=&quot;https://pganalyze.com/static/7dad04148f9e0117c49a306ff9ab40b1/acb04/promo_index_advisor.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you want to share this article with your peers, feel free to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DHow%20we%20deconstructed%20the%20Postgres%20planner%20to%20find%20indexing%20opportunities%22%20-%20In%20this%20article,%20%40pganalyze%20shares%20how%20they%20built%20their%20new%20index%20advisor%20for%20%23Postgres%20and%20how%20they%20run%20the%20Postgres%20planner%20as%20a%20library%3A%20https://pganalyze.com/blog/deconstructing-the-postgres-planner&quot;&gt;tweet it&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[A better way to index your Postgres database: pganalyze Index Advisor]]></title><description><![CDATA[When you run an application with a relational database attached, you will no doubt have encountered this question: Which indexes should I create? For some of us, indexing comes naturally, and B-tree, GIN and GIST are words of everyday use. And for some of us it’s more challenging to find out which index to create, taking a lot of time to get right. But what unites us is that creating and tweaking indexes is part of our job when we use a relational database such as Postgres in production. We need…]]></description><link>https://pganalyze.com/blog/introducing-pganalyze-index-advisor</link><guid isPermaLink="false">https://pganalyze.com/blog/introducing-pganalyze-index-advisor</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 23 Sep 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Screenshot of the new pganalyze Index Advisor&quot; title=&quot;Screenshot of the new pganalyze Index Advisor&quot; src=&quot;https://pganalyze.com/static/8b658545c93873c38a1d2fb2d9be699d/1d69c/header-image.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;When you run an application with a relational database attached, you will no doubt have encountered this question: Which indexes should I create?&lt;/p&gt;
&lt;p&gt;For some of us, indexing comes naturally, and B-tree, GIN and GIST are words of everyday use. And for some of us it’s more challenging to &lt;a href=&quot;https://pganalyze.com/blog/postgres-create-index&quot;&gt;find out which index to create&lt;/a&gt;, taking a lot of time to get right. But what unites us is that creating and tweaking indexes is part of our job when we use a relational database such as Postgres in production. We need to get indexes right, in order to make sure our application performs well.&lt;/p&gt;
&lt;p&gt;There are multiple ways to determine which indexes get used in your Postgres database. For example, you may choose to query the &lt;code &gt;pg_stat_user_indexes&lt;/code&gt; view. There are Postgres extensions like HypoPG to try out hypothetical indexes on your database server. And some of us may decide to go ahead and simply &lt;a href=&quot;https://twitter.com/craigkerstiens/status/851817428833009664&quot;&gt;index every column on every table&lt;/a&gt;.&lt;/p&gt;
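&lt;p&gt;As a minimal sketch of the first option, the following query lists the indexes in the current database, with the least-used ones first. The ordering is an illustrative choice, and the counters only reflect activity since the statistics were last reset:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- idx_scan counts how many index scans were started on each index.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 ORDER BY idx_scan ASC, indexrelname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;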
&lt;p&gt;But the reality nowadays is that modern apps are complex, and applications built on Postgres grow at an incredible pace. This makes indexing more important, but also more challenging than ever. As developers we want to focus on what matters, and not spend hours investigating which Postgres index to create.&lt;/p&gt;
&lt;p&gt;At the beginning of this year we set out to improve the status quo for indexing with Postgres. And today, after many months of effort and having published an &lt;a href=&quot;https://pganalyze.com/ebooks/postgres-indexing&quot;&gt;eBook about Indexing in Postgres&lt;/a&gt;, we’re excited to announce the new &lt;a href=&quot;https://pganalyze.com/postgres-index-advisor&quot;&gt;pganalyze Index Advisor for Postgres&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before we dive into all the details, let’s take a step back and ask ourselves “How could we determine which index to create?”&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#postgres-indexing-is-machine-learning-the-answer&quot;&gt;Postgres Indexing: Is machine learning the answer?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-postgres-determines-when-to-use-an-index&quot;&gt;How Postgres determines when to use an index&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#creating-the-best-postgres-index-for-your-query&quot;&gt;Creating the best Postgres index for your query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#review-existing-indexes-with-the-index-advisor&quot;&gt;Review existing indexes with the Index Advisor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#try-out-the-index-advisor-for-free-with-the-standalone-tool&quot;&gt;Try out the Index Advisor for free with the standalone tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#automatic-index-advisor-for-your-production-queries-in-pganalyze&quot;&gt;Automatic index advisor for your production queries in pganalyze&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#pganalyze-index-advisor-and-new-pricing-plans&quot;&gt;pganalyze Index Advisor and new pricing plans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;postgres-indexing-is-machine-learning-the-answer&quot; &gt;&lt;a href=&quot;#postgres-indexing-is-machine-learning-the-answer&quot; aria-label=&quot;postgres indexing is machine learning the answer permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres Indexing: Is machine learning the answer?&lt;/h2&gt;
&lt;p&gt;It’s 2021, and of course we had to ask ourselves - is this a problem that requires ML and AI? Couldn’t we just train a model to create the right indexes for us?&lt;/p&gt;
&lt;p&gt;We turned to GitHub CoPilot, the most sophisticated AI-based helper that exists today for developers, and asked it to create an index for a real world query in our own Postgres database:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Screenshot of GitHub CoPilot trying to recommend an index&quot; title=&quot;Screenshot of GitHub CoPilot trying to recommend an index&quot; src=&quot;https://pganalyze.com/static/dd9c227bc19c128d86e050fe701192c8/1d69c/copilot.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Suffice it to say that indexing like this is &lt;strong&gt;not effective&lt;/strong&gt;. You will end up with significant overhead due to indexing almost everything, including columns that are not even referenced in the query.&lt;/p&gt;
&lt;p&gt;Whilst this ML model will certainly improve, and there is research on more purpose-built solutions for databases, the point is: &lt;strong&gt;ML is not the magic solution we are looking for.&lt;/strong&gt; We need more than just machine learning to know which indexes to create.&lt;/p&gt;
&lt;p&gt;In fact, from our own experience, knowing which index to create does not require an ML model at all. Finding the best index can be done with a &lt;strong&gt;deterministic approach&lt;/strong&gt; that takes into account production database queries and schema statistics, and builds on a detailed understanding of how Postgres works.&lt;/p&gt;
&lt;p&gt;And who knows best how Postgres works? Postgres itself!&lt;/p&gt;
&lt;h2 id=&quot;how-postgres-determines-when-to-use-an-index&quot; &gt;&lt;a href=&quot;#how-postgres-determines-when-to-use-an-index&quot; aria-label=&quot;how postgres determines when to use an index permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How Postgres determines when to use an index&lt;/h2&gt;
&lt;p&gt;We started out by asking ourselves the question: How does Postgres decide which index to use? We can find this logic in the &lt;a href=&quot;https://www.postgresql.org/docs/current/planner-optimizer.html&quot;&gt;Postgres planner&lt;/a&gt;, which takes a parsed query and turns it into an execution plan.&lt;/p&gt;
&lt;p&gt;Specifically, we decided to look at the function create_index_paths(..), where you can see that Postgres loops over all indexes on a particular table, and decides which indexes can be used:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;void&lt;/span&gt;
&lt;span &gt;create_index_paths&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PlannerInfo &lt;span &gt;*&lt;/span&gt;root&lt;span &gt;,&lt;/span&gt; RelOptInfo &lt;span &gt;*&lt;/span&gt;rel&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
	&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;

	&lt;span &gt;/* Skip the whole mess if no indexes */&lt;/span&gt;
	&lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;rel&lt;span &gt;-&gt;&lt;/span&gt;indexlist &lt;span &gt;==&lt;/span&gt; NIL&lt;span &gt;)&lt;/span&gt;
		&lt;span &gt;return&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

	&lt;span &gt;/* Bitmap paths are collected and then dealt with at the end */&lt;/span&gt;
	bitindexpaths &lt;span &gt;=&lt;/span&gt; bitjoinpaths &lt;span &gt;=&lt;/span&gt; joinorclauses &lt;span &gt;=&lt;/span&gt; NIL&lt;span &gt;;&lt;/span&gt;

	&lt;span &gt;/* Examine each index in turn */&lt;/span&gt;
	&lt;span &gt;foreach&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;lc&lt;span &gt;,&lt;/span&gt; rel&lt;span &gt;-&gt;&lt;/span&gt;indexlist&lt;span &gt;)&lt;/span&gt;
	&lt;span &gt;{&lt;/span&gt;
		IndexOptInfo &lt;span &gt;*&lt;/span&gt;index &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;IndexOptInfo &lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;lfirst&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;lc&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

		&lt;span &gt;/*
		 * Ignore partial indexes that do not match the query.
		 * (generate_bitmap_or_paths() might be able to do something with
		 * them, but that&apos;s of no concern here.)
		 */&lt;/span&gt;
		&lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;index&lt;span &gt;-&gt;&lt;/span&gt;indpred &lt;span &gt;!=&lt;/span&gt; NIL &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;index&lt;span &gt;-&gt;&lt;/span&gt;predOK&lt;span &gt;)&lt;/span&gt;
			&lt;span &gt;continue&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
   &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Going into all this logic would likely fill multiple books, and it is based on decades of academic research. Clearly, Postgres is very sophisticated about determining which indexes can be used for a given query. Amongst the core decisions it makes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does the index match the columns used in the query?&lt;/li&gt;
&lt;li&gt;Does the query’s operator match the operator class of the index?&lt;/li&gt;
&lt;li&gt;Does the index have a sort order that can be used by the query to avoid an explicit Sort step?&lt;/li&gt;
&lt;li&gt;Does the query condition match a partial index condition?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And many other requirements and heuristics that need extensive knowledge of Postgres’ inner workings.&lt;/p&gt;
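&lt;p&gt;To make these checks more concrete, here is a minimal sketch of index definitions that each target one of the questions above. The &lt;code &gt;issues&lt;/code&gt; table, its columns and the index names are hypothetical and only serve as an illustration; they are not taken from the planner code or from a real schema.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table, for illustration only
CREATE TABLE issues (
  id bigserial PRIMARY KEY,
  title text,
  state text,
  created_at timestamptz,
  deleted_at timestamptz
);

-- Column match: usable for filters such as WHERE state = &apos;open&apos;
CREATE INDEX issues_state_idx ON issues (state);

-- Operator class: text_pattern_ops targets pattern operators,
-- for example WHERE title LIKE &apos;Fix%&apos;
CREATE INDEX issues_title_pattern_idx ON issues (title text_pattern_ops);

-- Sort order: can let ORDER BY created_at DESC skip an explicit Sort step
CREATE INDEX issues_created_at_idx ON issues (created_at DESC);

-- Partial index: only considered when the query implies deleted_at IS NULL
CREATE INDEX issues_active_state_idx ON issues (state) WHERE deleted_at IS NULL;

-- EXPLAIN shows which of these indexes, if any, the planner picks
EXPLAIN SELECT * FROM issues
 WHERE state = &apos;open&apos; AND deleted_at IS NULL
 ORDER BY created_at DESC
 LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which plan you get depends on table statistics and data distribution, so the EXPLAIN output will differ between databases.&lt;/p&gt;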
&lt;p&gt;At pganalyze, we looked at this, and other functions, and we asked ourselves: &lt;em&gt;&lt;strong&gt;What if we used the Postgres planner to tell us which index it would like to see, based on a given query?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;That is, instead of asking “does this index match this query?”, we are asking “what’s the perfect index for this query?”. Perfect as in: ticks all the boxes in terms of operators/operator classes, columns and data types, and can be used to fulfill the query filter and join clauses of the query, if possible.&lt;/p&gt;
&lt;p&gt;This logic based on the Postgres planner is the centerpiece of the new &lt;a href=&quot;https://pganalyze.com/postgres-index-advisor&quot;&gt;pganalyze Index Advisor&lt;/a&gt;. Our index advisor is available in the pganalyze app, but we also decided to provide a &lt;strong&gt;free, standalone version available to anyone&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Simply paste your query and schema data and get insights on whether existing indexes are useful, or learn why indexes you thought might help are ignored. Note that data uploaded to the standalone pganalyze Index Advisor stays local within your browser, unless you explicitly use the share functionality.&lt;/p&gt;
&lt;p&gt;Going forward in this article, when you see examples and screenshots of the index advisor for Postgres, we are showing the public, standalone tool.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;https://pganalyze.com/ebooks/postgres-indexing&quot;&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Effective Indexing eBook promotion banner&quot; title=&quot;Effective Indexing eBook promotion banner&quot; src=&quot;https://pganalyze.com/static/b24fdd95dbc38757fe354c86d9ad9aaa/acb04/promo_ebook.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id=&quot;creating-the-best-postgres-index-for-your-query&quot; &gt;&lt;a href=&quot;#creating-the-best-postgres-index-for-your-query&quot; aria-label=&quot;creating the best postgres index for your query permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating the best Postgres index for your query&lt;/h2&gt;
&lt;p&gt;Let’s go back to our earlier example, and run it through the pganalyze Index Advisor:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor Example showing better recommendation than GitHub CoPilot&quot; title=&quot;pganalyze Index Advisor Example showing better recommendation than GitHub CoPilot&quot; src=&quot;https://pganalyze.com/static/4c827dde6b120965840518cb121d4b51/1d69c/issues_example.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;As you can see, we get a recommendation for a single multi-column index that covers all columns that are in the WHERE clause, except for the column that’s inside the OR condition. This is the best index that we can create to ensure the query runs fast.&lt;/p&gt;
&lt;p&gt;At launch the index advisor is focused on recommending B-tree indexes, with support for other index types coming soon.&lt;/p&gt;
&lt;p&gt;Note that the index advisor also understands common query patterns, like filtering out records based on a &lt;code &gt;deleted_at&lt;/code&gt; column, and recommends partial indexes for these queries:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Example of a partial index in pganalyze Index Advisor&quot; title=&quot;Example of a partial index in pganalyze Index Advisor&quot; src=&quot;https://pganalyze.com/static/9178eaa81f801c6e25b500d46bca9027/1d69c/partial_index_example.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;h2 id=&quot;review-existing-indexes-with-the-index-advisor&quot; &gt;&lt;a href=&quot;#review-existing-indexes-with-the-index-advisor&quot; aria-label=&quot;review existing indexes with the index advisor permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Review existing indexes with the Index Advisor&lt;/h2&gt;
&lt;p&gt;The pganalyze Index Advisor is also able to determine how different existing indexes perform, to help you understand which index Postgres will most likely use.&lt;/p&gt;
&lt;p&gt;For example, imagine a schema and index definition like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; events&lt;span &gt;(&lt;/span&gt;
  id bigserial &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  created_at timestamptz&lt;span &gt;,&lt;/span&gt;
  severity &lt;span &gt;smallint&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  organization_id &lt;span &gt;bigint&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  description &lt;span &gt;text&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  details jsonb
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; events&lt;span &gt;(&lt;/span&gt;organization_id&lt;span &gt;,&lt;/span&gt; severity&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We want to understand how effective this index is for queries that filter only on the “severity” column, without looking up a particular organization.&lt;/p&gt;
&lt;p&gt;With the index advisor, we can see the cost difference between the indexes, and that Postgres prefers using the single-column index in most situations:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Example of an existing index comparison in pganalyze Index Advisor&quot; title=&quot;Example of an existing index comparison in pganalyze Index Advisor&quot; src=&quot;https://pganalyze.com/static/116afeaf141d2fe48d8d6ca43b303d41/1d69c/existing_index.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;This can be explained by the fact that single-column indexes are usually smaller, and it’s more efficient, especially in older Postgres releases, to find index records when the queried column is listed first in the column list. You may still choose to use a multi-column index, but this helps you understand the trade-off.&lt;/p&gt;
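&lt;p&gt;If you want to check this trade-off on a test database, one possible approach is to compare the planner’s estimated costs with and without a single-column index. This sketch assumes the &lt;code &gt;events&lt;/code&gt; table from above; the index name &lt;code &gt;events_severity_idx&lt;/code&gt; and the filter value are hypothetical.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical single-column index to compare against the
-- existing (organization_id, severity) index
CREATE INDEX events_severity_idx ON events (severity);
ANALYZE events;

-- Estimated plan and cost with both indexes available
EXPLAIN SELECT * FROM events WHERE severity = 3;

-- Estimated plan and cost with only the multi-column index.
-- DROP INDEX is transactional, so ROLLBACK restores the index.
-- It holds an exclusive lock on the table until the transaction
-- ends, so avoid running this on a production database.
BEGIN;
DROP INDEX events_severity_idx;
EXPLAIN SELECT * FROM events WHERE severity = 3;
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On a small or empty table the planner may choose a Sequential Scan in both cases, so the comparison is most useful with realistic data volumes.&lt;/p&gt;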
&lt;p&gt;
&lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;pganalyze Index Advisor promotion banner&quot; title=&quot;pganalyze Index Advisor promotion banner&quot; src=&quot;https://pganalyze.com/static/7dad04148f9e0117c49a306ff9ab40b1/acb04/promo_index_advisor.jpg&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id=&quot;try-out-the-index-advisor-for-free-with-the-standalone-tool&quot; &gt;&lt;a href=&quot;#try-out-the-index-advisor-for-free-with-the-standalone-tool&quot; aria-label=&quot;try out the index advisor for free with the standalone tool permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Try out the Index Advisor for free with the standalone tool&lt;/h2&gt;
&lt;p&gt;Want to try out the index advisor yourself? As mentioned above, we developed a standalone version of the index advisor that runs fully in your web browser, powered by our self-contained Postgres planner compiled to WebAssembly.&lt;/p&gt;
&lt;p&gt;You can simply go to &lt;a href=&quot;https://pganalyze.com/index-advisor&quot;&gt;https://pganalyze.com/index-advisor&lt;/a&gt;, paste your query and schema, and get your recommendations. If you don’t have a query and schema ready, for example because you are reading this on your mobile phone, you can take a look at how it works with a set of examples we added for your convenience.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Standalone pganalyze Index Advisor tool&quot; title=&quot;Standalone pganalyze Index Advisor tool&quot; src=&quot;https://pganalyze.com/static/6ff47e98c45f1c5dbfb29ee1753fc2c0/1d69c/standalone_start.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;We’ve also ensured that the standalone tool is ready for collaboration. If you want to share index recommendations with your team, simply click the [Share] button. After you confirm, this uploads the result of the index advisor to the pganalyze servers for sharing, and gives you a unique URL to share. Note that unless you share, all data stays local within your web browser.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Share function of standalone pganalyze Index Advisor tool&quot; title=&quot;Share function of standalone pganalyze Index Advisor tool&quot; src=&quot;https://pganalyze.com/static/968b5ff0baa5f327151a350d2e2f3921/1d69c/standalone_share.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Of course, copying query texts can be tedious and a lot of work. But, if you are a &lt;a href=&quot;https://pganalyze.com&quot;&gt;pganalyze&lt;/a&gt; customer, we already have your query information in our app. The second part of today’s launch is about the new &lt;strong&gt;in-app pganalyze Index Advisor&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&quot;automatic-index-advisor-for-your-production-queries-in-pganalyze&quot; &gt;&lt;a href=&quot;#automatic-index-advisor-for-your-production-queries-in-pganalyze&quot; aria-label=&quot;automatic index advisor for your production queries in pganalyze permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Automatic index advisor for your production queries in pganalyze&lt;/h2&gt;
&lt;p&gt;With the new Index Advisor in pganalyze, you can now see at a glance what index recommendations exist for each of your queries. You can simply go to the query details page for your queries, and see what the Index Advisor recommends:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;In-app screenshot of pganalyze Index Advisor&quot; title=&quot;In-app screenshot of pganalyze Index Advisor&quot; src=&quot;https://pganalyze.com/static/9716fe29cfa7a4c02ff8c358f54cdd98/1d69c/in_app_advisor.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;This is really nice, but we already have work underway to help you get an even better assessment of index usage summarized &lt;strong&gt;across your whole database&lt;/strong&gt;. But more on that soon (sign up for the newsletter if you want to get updates about this).&lt;/p&gt;
&lt;h2 id=&quot;pganalyze-index-advisor-and-new-pricing-plans&quot; &gt;&lt;a href=&quot;#pganalyze-index-advisor-and-new-pricing-plans&quot; aria-label=&quot;pganalyze index advisor and new pricing plans permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;pganalyze Index Advisor and new pricing plans&lt;/h2&gt;
&lt;p&gt;The pganalyze Index Advisor represents a significant improvement to the core functionality of pganalyze, and introduces additional sophisticated processing for each query received by pganalyze. We are therefore taking this moment to introduce both a &lt;a href=&quot;https://pganalyze.com/pricing&quot;&gt;new Production and a new Scale plan&lt;/a&gt;. In addition to the &lt;strong&gt;Index Advisor&lt;/strong&gt;, the new Scale plan also features &lt;strong&gt;SAML-based Single Sign On&lt;/strong&gt; in early access, to integrate with identity providers such as Okta.&lt;/p&gt;
&lt;p&gt;If you are an existing pganalyze customer on (what is now) a legacy plan, you can try out the Index Advisor until the end of October 2021. Trying out the Index Advisor requires no changes to your existing pganalyze integration.&lt;/p&gt;
&lt;p&gt;If you do not have an account with us yet and sign up for a new trial, the pganalyze Index Advisor will be activated for your 14-day trial. Try it out today in the pganalyze app, or &lt;a href=&quot;https://app.pganalyze.com/users/sign_up&quot;&gt;start a new trial&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;All of us at pganalyze are excited to share the new pganalyze Index Advisor with you. Try out the standalone tool or explore the new in-app functionality today. We hope the standalone tool is a service you will come back to time and again, and get value from. Feel free to bookmark it!&lt;/p&gt;
&lt;p&gt;You can provide feedback through our &lt;a href=&quot;https://github.com/pganalyze/index-advisor-feedback/discussions&quot;&gt;dedicated discussion board on GitHub&lt;/a&gt;, or send us a support request for in-app functionality. We look forward to hearing from you.&lt;/p&gt;
&lt;p&gt;If you want to share this article with your peers, feel free to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%E2%80%9DA%20better%20way%20for%20indexing%20your%20Postgres%20database%22%20-%20In%20this%20article,%20%40pganalyze%20share%20how%20they%20approached%20building%20their%20new%20index%20advisor%20for%20%23Postgres%20and%20give%20you%20free%20access%20to%20it%3A%20https://pganalyze.com/blog/introducing-pganalyze-index-advisor&quot;&gt;tweet it&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Using Postgres CREATE INDEX: Understanding operator classes, index types & more]]></title><description><![CDATA[Most developers working with databases know the challenge: New code gets deployed to production, and suddenly the application is slow. We investigate, look at our APM tools and our database monitoring, and we find out that the new code caused a new query to be issued. We investigate further, and discover the query is not able to use an index. But what makes an index usable by a query, and how can we add the right index in Postgres? In this post we’ll look at the practical aspects of using the…]]></description><link>https://pganalyze.com/blog/postgres-create-index</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-create-index</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 12 Aug 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Most developers working with databases know the challenge: New code gets deployed to production, and suddenly the application is slow. We investigate, look at our APM tools and our database monitoring, and we find out that the new code caused a new query to be issued. We investigate further, and discover the query is not able to use an index.&lt;/p&gt;
&lt;p&gt;But what makes an index usable by a query, and how can we add the right index in Postgres?&lt;/p&gt;
&lt;p&gt;In this post we’ll look at the practical aspects of using the &lt;code &gt;CREATE INDEX&lt;/code&gt; command, as well as how you can &lt;strong&gt;analyze a PostgreSQL query for its operators and data types&lt;/strong&gt;, so you can choose the best index definition.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/b276e42dae661e98b7fbb885bd7609ac/aa440/postgres-create-index.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Representation of a Postgres query compared to the matching index definition&quot; title=&quot;Representation of a Postgres query compared to the matching index definition&quot; src=&quot;https://pganalyze.com/static/b276e42dae661e98b7fbb885bd7609ac/1d69c/postgres-create-index.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-do-you-create-an-index-in-postgres&quot;&gt;How do you create an index in Postgres?&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#parse-analysis-how-postgres-interprets-your-query&quot;&gt;Parse analysis: How Postgres interprets your query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#looking-behind-the-scenes-operators-and-data-types&quot;&gt;Looking behind the scenes: Operators and data types&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#finding-the-right-index-type&quot;&gt;Finding the right index type&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#specifying-operator-classes-during-create-index&quot;&gt;Specifying operator classes during CREATE INDEX&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#specifying-multiple-columns-when-adding-a-postgres-index&quot;&gt;Specifying multiple columns when adding a Postgres index&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-functions-and-expressions-in-an-index-definition&quot;&gt;Using functions and expressions in an index definition&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#specifying-a-where-clause-to-create-partial-postgresql-indexes&quot;&gt;Specifying a WHERE clause to create partial PostgreSQL indexes&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-include-to-create-a-covering-index-for-index-only-scans&quot;&gt;Using INCLUDE to create a covering index for Index-Only Scans&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#adding-and-dropping-postgresql-indexes-safely-on-production&quot;&gt;Adding and dropping PostgreSQL indexes safely on production&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;how-do-you-create-an-index-in-postgres&quot; &gt;&lt;a href=&quot;#how-do-you-create-an-index-in-postgres&quot; aria-label=&quot;how do you create an index in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How do you create an index in Postgres?&lt;/h2&gt;
&lt;p&gt;Before we dive into the internals, let’s set the stage and look at the most basic way of creating an index in Postgres. The essence of adding an index is this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;table&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;column1&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For an actual example, let’s say we have a query on our users table that looks for a particular email address:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; users &lt;span &gt;WHERE&lt;/span&gt; users&lt;span &gt;.&lt;/span&gt;email &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;test@example.com&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can see this query is searching for values in the “email” column - so the index we should create is on that particular column:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we run this command, Postgres will create an index for us.&lt;/p&gt;
&lt;p&gt;It&apos;s important to &lt;strong&gt;remember that indexes are redundant data structures&lt;/strong&gt;. If you drop an index you don&apos;t lose any data. The primary benefit of an index is to allow faster searching of particular rows in a table. The alternative to having an index is to have Postgres scan each row individually (&quot;Sequential Scan&quot;), which is of course very slow for large tables.&lt;/p&gt;
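&lt;p&gt;To see both points in practice, here is a small sketch you could run on a scratch database. The table definition, the generated sample data and the index name are assumptions made for this example, and the plan choices noted in the comments are typical expectations rather than guaranteed output.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical minimal users table with sample data
CREATE TABLE users (
  id bigserial PRIMARY KEY,
  email text,
  deleted_at timestamptz
);
INSERT INTO users (email)
SELECT &apos;user&apos; || i || &apos;@example.com&apos;
  FROM generate_series(1, 100000) AS i;
ANALYZE users;

-- Without an index on email, expect a Sequential Scan
EXPLAIN SELECT * FROM users WHERE users.email = &apos;user42@example.com&apos;;

CREATE INDEX users_email_idx ON users (email);

-- With the index in place, expect an Index Scan
-- (or a Bitmap Index Scan) instead
EXPLAIN SELECT * FROM users WHERE users.email = &apos;user42@example.com&apos;;

-- Dropping the index removes only the redundant structure
DROP INDEX users_email_idx;
SELECT count(*) FROM users;  -- still returns 100000&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;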
&lt;p&gt;Let&apos;s take a look behind the scenes of how Postgres determines whether to use an index.&lt;/p&gt;
&lt;h3 id=&quot;parse-analysis-how-postgres-interprets-your-query&quot; &gt;&lt;a href=&quot;#parse-analysis-how-postgres-interprets-your-query&quot; aria-label=&quot;parse analysis how postgres interprets your query permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parse analysis: How Postgres interprets your query&lt;/h3&gt;
&lt;p&gt;When Postgres runs our query, it steps through multiple stages. At a high level, they are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Parsing (see our &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser#how-pg_query-turns-a-postgres-statement-into-a-parse-tree&quot;&gt;blog post on the Postgres parser&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Parse analysis&lt;/li&gt;
&lt;li&gt;Planning&lt;/li&gt;
&lt;li&gt;Execution&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Throughout these stages the query is no longer just text - it&apos;s represented as a tree. Each stage modifies and annotates the tree structure, until it&apos;s finally executed. For understanding Postgres index usage, we need to first understand what &lt;strong&gt;parse analysis&lt;/strong&gt; does.&lt;/p&gt;
&lt;p&gt;Let&apos;s pick a slightly more complex example:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; users &lt;span &gt;WHERE&lt;/span&gt; users&lt;span &gt;.&lt;/span&gt;email &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;test@example.com&apos;&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; users&lt;span &gt;.&lt;/span&gt;deleted_at &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can look at the result of parse analysis by turning on the &lt;code &gt;debug_print_parse&lt;/code&gt; setting, and then looking at the Postgres logs (not recommended on production databases):&lt;/p&gt;
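&lt;p&gt;As a minimal sketch, assuming you are connected to a scratch (non-production) database with a role that is allowed to change these settings, you could enable the output for just your own session. Setting &lt;code &gt;client_min_messages&lt;/code&gt; to &lt;code &gt;log&lt;/code&gt; additionally sends the message to your client, so you don&apos;t have to find it in the server log:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Print the result of parse analysis for each statement in this session
SET debug_print_parse = on;
-- Optional: also send LOG-level messages to the client
SET client_min_messages = log;

SELECT * FROM users WHERE users.email = &apos;test@example.com&apos; AND users.deleted_at IS NULL;

-- Turn the debugging output off again
RESET debug_print_parse;
RESET client_min_messages;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For our example query, the logged parse tree looks roughly like this:&lt;/p&gt;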
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;LOG:  parse tree:
DETAIL:     {QUERY 
	   ...
	      :quals 
	         {BOOLEXPR 
	         :boolop and 
	         :args (
	            {OPEXPR 
	            :opno 98 
	            :opfuncid 67 
	            :opresulttype 16 
	            :opretset false 
	            :opcollid 0 
	            :inputcollid 100 
	            :args (
	               ...
	            )
	            :location 38
	            }
	            {NULLTEST 
	            :arg 
	               ...
	            :nulltesttype 0 
	            :argisrow false 
	            :location 80
	            }
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This format is a bit hard to read - let’s look at it in a more visual way, and with names instead of OIDs:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/ded2511f7ca6069b5cb8495723df4156/aa440/postgres-parse-analysis.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Parse analysis visualization of the query&quot; title=&quot;Parse analysis visualization of the query&quot; src=&quot;https://pganalyze.com/static/ded2511f7ca6069b5cb8495723df4156/1d69c/postgres-parse-analysis.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;We can see two important parse nodes here, one for each expression in the &lt;code &gt;WHERE&lt;/code&gt; clause: the &lt;code &gt;OpExpr&lt;/code&gt; node and the &lt;code &gt;NullTest&lt;/code&gt; node. For now, let&apos;s focus on the &lt;code &gt;OpExpr&lt;/code&gt; node.&lt;/p&gt;
&lt;h3 id=&quot;looking-behind-the-scenes-operators-and-data-types&quot; &gt;&lt;a href=&quot;#looking-behind-the-scenes-operators-and-data-types&quot; aria-label=&quot;looking behind the scenes operators and data types permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Looking behind the scenes: Operators and data types&lt;/h3&gt;
&lt;p&gt;It&apos;s important to remember that Postgres is an &lt;strong&gt;object-relational&lt;/strong&gt; database system. That is, it&apos;s designed from the ground up to be extensible. Many of the references that are added in parse analysis are not hard-coded logic, but instead reference actual database objects in the Postgres catalog tables.&lt;/p&gt;
&lt;p&gt;The two most important objects to know about are &lt;strong&gt;data types&lt;/strong&gt; and &lt;strong&gt;operators&lt;/strong&gt;. You are most likely familiar with data types in Postgres, for example you have used them when specifying the schema for your table. Operators in Postgres define how particular comparisons between one or two values, for example in a WHERE clause, are implemented.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;OpExpr&lt;/code&gt; node represents an expression that uses an operator to compare one or two values of a given type. In this case you can see we are using the &lt;code &gt;=(text, text)&lt;/code&gt; operator. This operator utilizes the &lt;code &gt;=&lt;/code&gt; symbol as its name, and has a &lt;code &gt;text&lt;/code&gt; data type on the left and right of the operator.&lt;/p&gt;
&lt;p&gt;We can query the &lt;code &gt;pg_operator&lt;/code&gt; table to see details about it, including which function implements the operator:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; oid&lt;span &gt;,&lt;/span&gt; oid::regoperator&lt;span &gt;,&lt;/span&gt; oprcode&lt;span &gt;,&lt;/span&gt; oprnegate::regoperator
  &lt;span &gt;FROM&lt;/span&gt; pg_operator
 &lt;span &gt;WHERE&lt;/span&gt; oprname &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;=&apos;&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; oprleft &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;text&apos;&lt;/span&gt;::regtype &lt;span &gt;AND&lt;/span&gt; oprright &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;text&apos;&lt;/span&gt;::regtype&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; oid |     oid      | oprcode |   oprnegate   
-----+--------------+---------+---------------
  98 | =(text,text) | texteq  | &amp;lt;&gt;(text,text)
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And if you really want to know what’s happening, you can look up the operator&apos;s underlying &lt;code &gt;texteq&lt;/code&gt; function in &lt;a href=&quot;https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/varlena.c#L1745&quot;&gt;the Postgres source&lt;/a&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * Comparison functions for text strings.
 */&lt;/span&gt;
Datum
&lt;span &gt;texteq&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PG_FUNCTION_ARGS&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;lc_collate_is_c&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;collid&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;
		collid &lt;span &gt;==&lt;/span&gt; DEFAULT_COLLATION_OID &lt;span &gt;||&lt;/span&gt;
		&lt;span &gt;pg_newlocale_from_collation&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;collid&lt;span &gt;)&lt;/span&gt;&lt;span &gt;-&gt;&lt;/span&gt;deterministic&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;{&lt;/span&gt;
        &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    	result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;memcmp&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;VARDATA_ANY&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;targ1&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;VARDATA_ANY&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;targ2&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
				  len1 &lt;span &gt;-&lt;/span&gt; VARHDRSZ&lt;span &gt;)&lt;/span&gt; &lt;span &gt;==&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
        &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;else&lt;/span&gt;
	&lt;span &gt;{&lt;/span&gt;
        &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
        result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;text_cmp&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;arg1&lt;span &gt;,&lt;/span&gt; arg2&lt;span &gt;,&lt;/span&gt; collid&lt;span &gt;)&lt;/span&gt; &lt;span &gt;==&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
        &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That function illustrates nicely how Postgres considers the collation to determine whether it can do a fast comparison that simply compares bytes, or whether it has to do a more expensive full text comparison. As we can see from the source, using a C locale for your collation can yield performance benefits.&lt;/p&gt;
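&lt;p&gt;To check which collation your own database uses by default, a query along these lines against the &lt;code &gt;pg_database&lt;/code&gt; catalog should work. Treat it as a starting point: on newer Postgres versions that use the ICU provider, additional columns describe the effective locale.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Show the default collation and character classification of the current database
SELECT datname, datcollate, datctype
  FROM pg_database
 WHERE datname = current_database();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;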
&lt;p&gt;Of course you can also define your own custom operators that work on your own custom data types. Postgres is extensible like that, and that’s actually pretty neat.&lt;/p&gt;
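&lt;p&gt;As a small illustration of that extensibility, here is a sketch of a custom operator on an existing data type. The function name &lt;code &gt;text_eq_ci&lt;/code&gt; and the operator name &lt;code &gt;===&lt;/code&gt; are made up for this example, and the &lt;code &gt;FUNCTION&lt;/code&gt; keyword assumes Postgres 11 or newer:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical helper function: case-insensitive text equality
CREATE FUNCTION text_eq_ci(text, text) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT lower($1) = lower($2) $$;

-- Hypothetical operator that is implemented by the function above
CREATE OPERATOR === (
    LEFTARG = text,
    RIGHTARG = text,
    FUNCTION = text_eq_ci
);

-- The operator can now be used like any built-in one
SELECT &apos;Test@Example.com&apos; === &apos;test@example.com&apos; AS matches;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After this, the new operator shows up in &lt;code &gt;pg_operator&lt;/code&gt; just like &lt;code &gt;=(text,text)&lt;/code&gt; does. Note that an operator defined this way is not usable by any index until it is added to an operator class.&lt;/p&gt;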
&lt;p&gt;&lt;strong&gt;Operators are essential for creating the right index.&lt;/strong&gt; The operator that is used by an expression is the most important detail, besides the column name, that indicates whether a particular index can be used.&lt;/p&gt;
&lt;p&gt;You can think of operators as the &quot;how&quot; we want to search the table for values. For example, we may use a simple &lt;code &gt;=&lt;/code&gt; operator to match values for equality against an input value. Or we may utilize a more complex operator, such as &lt;code &gt;@@&lt;/code&gt; to perform a text search on a tsvector column.&lt;/p&gt;
&lt;h2 id=&quot;finding-the-right-index-type&quot; &gt;&lt;a href=&quot;#finding-the-right-index-type&quot; aria-label=&quot;finding the right index type permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding the right index type&lt;/h2&gt;
&lt;p&gt;When you think of an index type, it&apos;s important to remember that it&apos;s ultimately a specific data structure that supports a specific, limited set of search operators. For example, the most common index type in Postgres, the B-tree index, supports the &lt;code &gt;=&lt;/code&gt; operator as well as the range comparison operators (&lt;code &gt;&amp;lt;&lt;/code&gt;, &lt;code &gt;&amp;lt;=&lt;/code&gt;, &lt;code &gt;&gt;=&lt;/code&gt;, &lt;code &gt;&gt;&lt;/code&gt;), and the &lt;code &gt;~&lt;/code&gt; and &lt;code &gt;~*&lt;/code&gt; operators in some cases. It does not support any other operators.&lt;/p&gt;
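&lt;p&gt;You can verify this list yourself by asking the catalog which operators belong to a B-tree operator family. The following sketch looks at the &lt;code &gt;text_ops&lt;/code&gt; family, chosen here only as an example. The strategy numbers 1 through 5 stand for &lt;code &gt;&amp;lt;&lt;/code&gt;, &lt;code &gt;&amp;lt;=&lt;/code&gt;, &lt;code &gt;=&lt;/code&gt;, &lt;code &gt;&gt;=&lt;/code&gt; and &lt;code &gt;&gt;&lt;/code&gt; respectively:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- List the operators that the B-tree text_ops operator family supports
SELECT amop.amopstrategy AS strategy,
       amop.amopopr::regoperator AS opfamily_operator
  FROM pg_am am,
       pg_opfamily opf,
       pg_amop amop
 WHERE opf.opfmethod = am.oid AND amop.amopfamily = opf.oid
       AND am.amname = &apos;btree&apos; AND opf.opfname = &apos;text_ops&apos;
 ORDER BY amop.amoplefttype, amop.amoprighttype, amop.amopstrategy;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;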
&lt;p&gt;Let&apos;s say we have a &lt;code &gt;tsvector&lt;/code&gt; column on our &lt;code &gt;users&lt;/code&gt; table, and we use the &lt;code &gt;@@&lt;/code&gt; operator to search the column:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; users &lt;span &gt;WHERE&lt;/span&gt; about_text_search @@ to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;index&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Even if we create an index with the default settings, which produces a B-tree index, Postgres keeps doing a sequential scan:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users&lt;span &gt;(&lt;/span&gt;about_text_search&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;pgaweb=# EXPLAIN SELECT * FROM users WHERE about_text_search @@ to_tsquery(&apos;index&apos;);
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on users  (cost=10000000000.00..10000000006.51 rows=1 width=4463)
   Filter: (about_text_search @@ to_tsquery(&apos;index&apos;::text))
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is because a B-tree index does not have the correct data structure to support text searches. There is no operator class that matches B-tree indexes and the &lt;code &gt;@@(tsvector,tsquery)&lt;/code&gt; operator.&lt;/p&gt;
&lt;p&gt;Like earlier, thanks to Postgres extensibility, we can introspect the system to understand operator classes. &lt;strong&gt;Which index type can support the &lt;code &gt;@@&lt;/code&gt; operator on a tsvector column?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We can query the internal tables to answer this question:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; am&lt;span &gt;.&lt;/span&gt;amname &lt;span &gt;AS&lt;/span&gt; index_method&lt;span &gt;,&lt;/span&gt;
       opf&lt;span &gt;.&lt;/span&gt;opfname &lt;span &gt;AS&lt;/span&gt; opfamily_name&lt;span &gt;,&lt;/span&gt;
       amop&lt;span &gt;.&lt;/span&gt;amopopr::regoperator &lt;span &gt;AS&lt;/span&gt; opfamily_operator
  &lt;span &gt;FROM&lt;/span&gt; pg_am am&lt;span &gt;,&lt;/span&gt;
       pg_opfamily opf&lt;span &gt;,&lt;/span&gt;
       pg_amop amop
 &lt;span &gt;WHERE&lt;/span&gt; opf&lt;span &gt;.&lt;/span&gt;opfmethod &lt;span &gt;=&lt;/span&gt; am&lt;span &gt;.&lt;/span&gt;oid &lt;span &gt;AND&lt;/span&gt; amop&lt;span &gt;.&lt;/span&gt;amopfamily &lt;span &gt;=&lt;/span&gt; opf&lt;span &gt;.&lt;/span&gt;oid
       &lt;span &gt;AND&lt;/span&gt; amop&lt;span &gt;.&lt;/span&gt;amopopr &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;@@(tsvector,tsquery)&apos;&lt;/span&gt;::regoperator&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; index_method | opfamily_name |  opfamily_operator   
--------------+---------------+----------------------
 gist         | tsvector_ops  | @@(tsvector,tsquery)
 gin          | tsvector_ops  | @@(tsvector,tsquery)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Looks like we need either a GIN or GIST index! We can create a GIN index like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;USING&lt;/span&gt; gin &lt;span &gt;(&lt;/span&gt;about_text_search&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And voilà, it can be used by the query:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;=# EXPLAIN SELECT * FROM users WHERE about_text_search @@ to_tsquery(&apos;index&apos;);
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=8.25..12.51 rows=1 width=4463)
   Recheck Cond: (about_text_search @@ to_tsquery(&apos;index&apos;::text))
   -&gt;  Bitmap Index Scan on users_about_text_search_idx1  (cost=0.00..8.25 rows=1 width=0)
         Index Cond: (about_text_search @@ to_tsquery(&apos;index&apos;::text))
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What&apos;s that &lt;code &gt;tsvector_ops&lt;/code&gt; name we saw in the internal Postgres table?&lt;/p&gt;
&lt;p&gt;That&apos;s how index types are linked to operators, using operator families and operator classes. For a given operator, there can be multiple different operator classes - an operator class defines how data is represented for a particular index type, and how the search operation for that index works to implement the operator used in a query.&lt;/p&gt;
&lt;h2 id=&quot;specifying-operator-classes-during-create-index&quot; &gt;&lt;a href=&quot;#specifying-operator-classes-during-create-index&quot; aria-label=&quot;specifying operator classes during create index permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Specifying operator classes during CREATE INDEX&lt;/h2&gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/b241326f6c407f6efae549183a798b2a/aa440/postgres-operator-class.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;CREATE INDEX command with default operator class&quot; title=&quot;CREATE INDEX command with default operator class&quot; src=&quot;https://pganalyze.com/static/b241326f6c407f6efae549183a798b2a/1d69c/postgres-operator-class.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;p&gt;For example let’s look at &lt;code &gt;=(text,text)&lt;/code&gt;, which is the operator used in an earlier query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; am&lt;span &gt;.&lt;/span&gt;amname &lt;span &gt;AS&lt;/span&gt; index_method&lt;span &gt;,&lt;/span&gt;
       opf&lt;span &gt;.&lt;/span&gt;opfname &lt;span &gt;AS&lt;/span&gt; opfamily_name&lt;span &gt;,&lt;/span&gt;
       amop&lt;span &gt;.&lt;/span&gt;amopopr::regoperator &lt;span &gt;AS&lt;/span&gt; opfamily_operator
  &lt;span &gt;FROM&lt;/span&gt; pg_am am&lt;span &gt;,&lt;/span&gt;
       pg_opfamily opf&lt;span &gt;,&lt;/span&gt;
       pg_amop amop
 &lt;span &gt;WHERE&lt;/span&gt; opf&lt;span &gt;.&lt;/span&gt;opfmethod &lt;span &gt;=&lt;/span&gt; am&lt;span &gt;.&lt;/span&gt;oid &lt;span &gt;AND&lt;/span&gt; amop&lt;span &gt;.&lt;/span&gt;amopfamily &lt;span &gt;=&lt;/span&gt; opf&lt;span &gt;.&lt;/span&gt;oid
       &lt;span &gt;AND&lt;/span&gt; amop&lt;span &gt;.&lt;/span&gt;amopopr &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;=(text,text)&apos;&lt;/span&gt;::regoperator&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; index_method |  opfamily_name   | opfamily_operator 
--------------+------------------+-------------------
 btree        | text_ops         | =(text,text)
 hash         | text_ops         | =(text,text)
 btree        | text_pattern_ops | =(text,text)
 hash         | text_pattern_ops | =(text,text)
 spgist       | text_ops         | =(text,text)
 brin         | text_minmax_ops  | =(text,text)
 gist         | gist_text_ops    | =(text,text)
(7 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see there is a default operator class (&lt;code &gt;text_ops&lt;/code&gt;) that gets used when you don’t explicitly specify it - for text columns the default operator class is often all you need.&lt;/p&gt;
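&lt;p&gt;To see which operator class is the default for a given data type and index type, you can look at the &lt;code &gt;opcdefault&lt;/code&gt; flag in &lt;code &gt;pg_opclass&lt;/code&gt;. A sketch for the &lt;code &gt;text&lt;/code&gt; type might look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- List operator classes declared for the text type, and whether each is
-- the default for its index method
SELECT am.amname AS index_method,
       opc.opcname AS opclass_name,
       opc.opcdefault AS is_default
  FROM pg_opclass opc,
       pg_am am
 WHERE opc.opcmethod = am.oid
       AND opc.opcintype = &apos;text&apos;::regtype
 ORDER BY am.amname, opc.opcname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;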
&lt;p&gt;But there are cases where we want to set a particular operator class. For example, let&apos;s say we run a LIKE query on our database, and our database happens to use the en_US.UTF-8 collation - in that case, you will see the LIKE query is not actually able to use an index:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;pgaweb=# EXPLAIN SELECT * FROM users WHERE email LIKE &apos;lukas@%&apos;;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on users  (cost=10000000000.00..10000000001.26 rows=1 width=4463)
   Filter: ((email)::text ~~ &apos;lukas@%&apos;::text)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Generally, LIKE queries are challenging to index. If the pattern has no leading wildcard, however, you can create an index that works for them, as long as you either (1) use the C locale on your database (effectively saying you don’t want language-specific text sorting/comparison), or (2) use the &lt;code &gt;text_pattern_ops&lt;/code&gt; operator class.&lt;/p&gt;
&lt;p&gt;Let’s create the same index, but this time specify the &lt;code &gt;text_pattern_ops&lt;/code&gt; operator class:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email text_pattern_ops&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;pgaweb=# EXPLAIN SELECT * FROM users WHERE email LIKE &apos;lukas@%&apos;;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Index Scan using users_email_idx on users  (cost=0.14..8.16 rows=1 width=4463)
   Index Cond: (((email)::text ~&gt;=~ &apos;lukas@&apos;::text) AND ((email)::text ~&amp;lt;~ &apos;lukasA&apos;::text))
   Filter: ((email)::text ~~ &apos;lukas@%&apos;::text)
(3 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the same &lt;code &gt;LIKE&lt;/code&gt; query can now use the index.&lt;/p&gt;
&lt;p&gt;Now that we know our index type and operator class for our columns, let&apos;s look at a few other aspects of creating an index.&lt;/p&gt;
&lt;h2 id=&quot;specifying-multiple-columns-when-adding-a-postgres-index&quot; &gt;&lt;a href=&quot;#specifying-multiple-columns-when-adding-a-postgres-index&quot; aria-label=&quot;specifying multiple columns when adding a postgres index permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Specifying multiple columns when adding a Postgres index&lt;/h2&gt;
&lt;p&gt;One essential feature is the option to add multiple columns to an index definition.&lt;/p&gt;
&lt;p&gt;You can do so like this:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;CREATE INDEX ON [table] ([column_a], [column_b]);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But what does that actually do? It turns out that it depends on the index type. Each index type has a different representation for multiple columns in its data structure. And some index types, such as Hash or SP-GiST, do not support multiple columns at all.&lt;/p&gt;
&lt;p&gt;However with the most common index type, B-tree, multi-column indexes work well, and they are commonly used. &lt;strong&gt;The most important thing to know for multi-column B-tree indexes&lt;/strong&gt;: Column order matters. If you have some queries that only utilize &lt;code &gt;column_a&lt;/code&gt;, but all queries utilize &lt;code &gt;column_b&lt;/code&gt;, you should put &lt;code &gt;column_b&lt;/code&gt; first in your index definition. If you don’t follow this rule, you will end up with queries doing a lot more work because they have to skip over all the earlier columns that they can’t filter on. With GIST indexes on the other hand, this does not matter - and you can specify columns in any order.&lt;/p&gt;
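&lt;p&gt;To make the column order rule concrete, here is a sketch using a made-up &lt;code &gt;memberships&lt;/code&gt; table (the table and its columns are assumptions for illustration). Every query filters on &lt;code &gt;organization_id&lt;/code&gt;, and only some also filter on &lt;code &gt;email&lt;/code&gt;, so &lt;code &gt;organization_id&lt;/code&gt; goes first:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table, for illustration only
CREATE TABLE memberships (
    organization_id bigint NOT NULL,
    email text NOT NULL
);

-- The column used by all queries comes first
CREATE INDEX ON memberships (organization_id, email);

-- Both queries can search the index by its leading column
EXPLAIN SELECT * FROM memberships WHERE organization_id = 1;
EXPLAIN SELECT * FROM memberships
 WHERE organization_id = 1 AND email = &apos;test@example.com&apos;;

-- This query cannot narrow the search by the leading column, so it
-- typically has to read much more of the index, or the planner picks
-- a sequential scan instead
EXPLAIN SELECT * FROM memberships WHERE email = &apos;test@example.com&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On an empty or very small table the planner may choose a sequential scan for all three queries, so it is best to try this with realistic data volumes.&lt;/p&gt;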
&lt;p&gt;Another decision to make is: &lt;strong&gt;Should I create multiple indexes, one for each column I’m querying by, or should I create a single multi-column index?&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;table&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;column_a&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;table&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;column_b&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;--- or&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;table&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;column_a&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;column_b&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When looking at an individual query, the answer will almost always be: Create a single multi-column index that matches the query. It will be faster than having multiple indexes.&lt;/p&gt;
&lt;p&gt;But if you have a larger workload, it may make sense to create multiple single-column indexes. Be aware that Postgres will have to do more work in that case, and you should verify what indexes actually get chosen by looking at your EXPLAIN plans.&lt;/p&gt;
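&lt;p&gt;One way to check this is to create the single-column indexes and look at the plan for a query that filters on both columns. The &lt;code &gt;audit_events&lt;/code&gt; table below is a made-up example. Depending on table size and statistics, Postgres may combine both indexes with a &lt;code &gt;BitmapAnd&lt;/code&gt; node, or use only one index and filter the remaining rows:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table, for illustration only
CREATE TABLE audit_events (
    user_id bigint,
    event_type text
);

-- Two single-column indexes instead of one multi-column index
CREATE INDEX ON audit_events (user_id);
CREATE INDEX ON audit_events (event_type);

-- Inspect which index (or combination of indexes) the planner chooses
EXPLAIN SELECT * FROM audit_events
 WHERE user_id = 1 AND event_type = &apos;login&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;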
&lt;h2 id=&quot;using-functions-and-expressions-in-an-index-definition&quot; &gt;&lt;a href=&quot;#using-functions-and-expressions-in-an-index-definition&quot; aria-label=&quot;using functions and expressions in an index definition permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using functions and expressions in an index definition&lt;/h2&gt;
&lt;p&gt;Stepping back from specific index types for a moment: Postgres has a useful feature that applies to all index types. Instead of indexing a particular column&apos;s value, you can index an expression that references the column&apos;s data.&lt;/p&gt;
&lt;p&gt;For example, we might typically compare our user email addresses with the &lt;code &gt;lower(..)&lt;/code&gt; function:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; users &lt;span &gt;WHERE&lt;/span&gt; lower&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; $&lt;span &gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you were to run EXPLAIN on this, you would notice that Postgres is not able to use a simple index on &lt;code &gt;email&lt;/code&gt; here - since it doesn’t match the expression.&lt;/p&gt;
&lt;p&gt;But since &lt;code &gt;lower(..)&lt;/code&gt; is what’s called an &quot;immutable&quot; function, we can use it to create an expression index that stores the lower-case form of every email value:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;lower&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now our query will be able to use the index. Note that this does not work for all functions. For example, if you were to create an index on &lt;code &gt;now()&lt;/code&gt;, it would fail:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;&lt;span &gt;now&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;ERROR:  functions in index expression must be marked IMMUTABLE&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
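&lt;p&gt;If you are unsure whether a function qualifies, you can look up its volatility in the &lt;code &gt;pg_proc&lt;/code&gt; catalog. In the &lt;code &gt;provolatile&lt;/code&gt; column, &lt;code &gt;i&lt;/code&gt; means immutable, &lt;code &gt;s&lt;/code&gt; means stable, and &lt;code &gt;v&lt;/code&gt; means volatile. As a quick sketch:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Compare the volatility of lower() and now()
SELECT oid::regprocedure AS function_signature,
       provolatile
  FROM pg_proc
 WHERE proname IN (&apos;lower&apos;, &apos;now&apos;)
 ORDER BY proname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here &lt;code &gt;lower(text)&lt;/code&gt; is marked immutable, whereas &lt;code &gt;now()&lt;/code&gt; is only stable, which is why it is rejected in an index expression.&lt;/p&gt;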
&lt;p&gt;Additionally, &lt;strong&gt;remember that expression indexes only work when they match the query.&lt;/strong&gt; If we only have an index on &lt;code &gt;lower(email)&lt;/code&gt;, a query that simply references &lt;code &gt;email&lt;/code&gt; won’t be able to use the index.&lt;/p&gt;
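&lt;p&gt;A short sketch of the difference, assuming the &lt;code &gt;users&lt;/code&gt; table from the earlier examples and no other index on &lt;code &gt;email&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;CREATE INDEX ON users (lower(email));

-- The WHERE clause matches the indexed expression, so the index is a candidate
EXPLAIN SELECT * FROM users WHERE lower(email) = &apos;test@example.com&apos;;

-- The WHERE clause references the plain column, so the expression index
-- cannot be used for this query
EXPLAIN SELECT * FROM users WHERE email = &apos;test@example.com&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;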
&lt;h2 id=&quot;specifying-a-where-clause-to-create-partial-postgresql-indexes&quot; &gt;&lt;a href=&quot;#specifying-a-where-clause-to-create-partial-postgresql-indexes&quot; aria-label=&quot;specifying a where clause to create partial postgresql indexes permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Specifying a WHERE clause to create partial PostgreSQL indexes&lt;/h2&gt;
&lt;p&gt;Let’s return to an example we saw at the beginning of the post - but now let’s look at the &lt;code &gt;NullTest&lt;/code&gt; expression:&lt;/p&gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/ded2511f7ca6069b5cb8495723df4156/aa440/postgres-parse-analysis.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Parse analysis visualization of the query&quot; title=&quot;Parse analysis visualization of the query&quot; src=&quot;https://pganalyze.com/static/ded2511f7ca6069b5cb8495723df4156/1d69c/postgres-parse-analysis.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;p&gt;Here we are making sure we only get rows that are not yet marked as deleted by our application. Depending on your workload, this may be a very large number of rows that needs to be skipped over.&lt;/p&gt;
&lt;p&gt;Whilst you could create an index that includes the &lt;code &gt;deleted_at&lt;/code&gt; column, it would be quite wasteful to have all these index entries that you don’t actually want to ever look at.&lt;/p&gt;
&lt;p&gt;Postgres has a better way: With partial indexes, you can restrict which rows get entries in the index. Rows that don&apos;t satisfy the restriction are not added to the index, which saves space. And during query execution, this also acts as a significant time saver in many cases, since the planner can do a simple check to determine which partial indexes match, and ignore all that don&apos;t match.&lt;/p&gt;
&lt;p&gt;In practice, all you need to do is add a &lt;code &gt;WHERE&lt;/code&gt; clause to your index definition:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; deleted_at &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are reasons why you may not want to do that though:&lt;/p&gt;
&lt;p&gt;First, adding this restriction means that only queries that contain &lt;code &gt;deleted_at IS NULL&lt;/code&gt; will be able to use the index. That means you may need two indexes, one with that restriction and the other without.&lt;/p&gt;
&lt;p&gt;Second, adding hundreds or thousands of partial indexes causes overhead in the Postgres planner, as it has to do a more expensive analysis to determine which indexes can be used.&lt;/p&gt;
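&lt;p&gt;To see the first restriction in action without a Postgres server at hand, here is a minimal sketch that uses SQLite from Python’s standard library as a stand-in, since SQLite also supports partial indexes. This is an assumption-laden illustration: the index name &lt;code &gt;users_email_active_idx&lt;/code&gt; and the &lt;code &gt;plan&lt;/code&gt; helper are made up for this example, and SQLite’s planner output is not what Postgres’ &lt;code &gt;EXPLAIN&lt;/code&gt; prints. The matching rule it demonstrates is the same idea, though: only the query that repeats the index’s &lt;code &gt;WHERE&lt;/code&gt; condition can use the partial index.&lt;/p&gt;

```python
import sqlite3

# SQLite is only a stand-in here; Postgres plans and output look different.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)"
)
# Hypothetical index name, chosen for this example.
conn.execute(
    "CREATE INDEX users_email_active_idx ON users (email) WHERE deleted_at IS NULL"
)


def plan(sql):
    """Return the query plan text that SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of every plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


# Repeats the index condition, so the partial index is a candidate.
with_filter = plan(
    "SELECT id FROM users WHERE email = 'test@example.com' AND deleted_at IS NULL"
)
# Omits the index condition, so the partial index cannot be used.
without_filter = plan("SELECT id FROM users WHERE email = 'test@example.com'")

print(with_filter)
print(without_filter)
```

&lt;p&gt;The first plan should mention the partial index, and the second should not. On Postgres you would run &lt;code &gt;EXPLAIN&lt;/code&gt; on both queries to make the same comparison.&lt;/p&gt;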
&lt;h2 id=&quot;using-include-to-create-a-covering-index-for-index-only-scans&quot; &gt;&lt;a href=&quot;#using-include-to-create-a-covering-index-for-index-only-scans&quot; aria-label=&quot;using include to create a covering index for index only scans permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using INCLUDE to create a covering index for Index-Only Scans&lt;/h2&gt;
&lt;p&gt;Last but not least, let’s talk about a more recent addition to Postgres: The &lt;code &gt;INCLUDE&lt;/code&gt; keyword that can be added to &lt;code &gt;CREATE INDEX&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Before we look at what this keyword does, let’s understand the difference between an Index Scan and an Index-Only Scan. An Index Scan uses the index to locate matching rows and then fetches each row from the table. An Index-Only Scan is possible when all data that is needed can be retrieved from the index itself - instead of having to fetch it from the table (the heap).&lt;/p&gt;
&lt;p&gt;Note that Index-Only Scans only pay off when the table has been recently VACUUMed. Postgres relies on the visibility map, which VACUUM maintains, to skip the visibility check against the table. When too many pages are not marked all-visible, Postgres has to check visibility in the table for most index entries, and the planner will prefer an Index Scan instead in most cases.&lt;/p&gt;
&lt;p&gt;Let&apos;s look at two examples - one query that matches an index fully, and one that does not (because of the target list):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;,&lt;/span&gt; id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;=# EXPLAIN SELECT id FROM users WHERE email = &apos;test@example.com&apos;;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Index Only Scan using users_email_id_idx on users  (cost=0.14..4.16 rows=1 width=4)
   Index Cond: (email = &apos;test@example.com&apos;::text)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;=# EXPLAIN SELECT id, fullname FROM users WHERE email = &apos;test@example.com&apos;;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Index Scan using users_email_id_idx on users  (cost=0.14..8.15 rows=1 width=520)
   Index Cond: ((email)::text = &apos;test@example.com&apos;::text)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, to get an Index Only Scan for the second query we can create an index that includes that column at the end - and that makes Postgres use an Index Only Scan:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;,&lt;/span&gt; id&lt;span &gt;,&lt;/span&gt; fullname&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;=# EXPLAIN SELECT id, fullname FROM users WHERE email = &apos;test@example.com&apos;;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Index Only Scan using users_email_id_fullname_idx on users  (cost=0.14..4.16 rows=1 width=520)
   Index Cond: (email = &apos;test@example.com&apos;::text)
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, this approach has a few drawbacks: It doesn’t work for unique indexes (since any added column would change what is checked for uniqueness), and it bloats the key data stored in the index for searching.&lt;/p&gt;
&lt;p&gt;For B-tree indexes the new INCLUDE keyword is the better approach:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;,&lt;/span&gt; id&lt;span &gt;)&lt;/span&gt; INCLUDE &lt;span &gt;(&lt;/span&gt;fullname&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This keeps the overhead for such additional columns slightly lower, works without problems with UNIQUE constraint indexes, and clearly communicates the intent: That you only added a column in order to support Index Only Scans.&lt;/p&gt;
&lt;p&gt;This is a feature best used sparingly: Adding more data to the index means larger index entries and a larger index, which on its own can be a problem - it’s usually not a good idea to just add a lot of columns to the INCLUDE clause for an index.&lt;/p&gt;
&lt;h2 id=&quot;adding-and-dropping-postgresql-indexes-safely-on-production&quot; &gt;&lt;a href=&quot;#adding-and-dropping-postgresql-indexes-safely-on-production&quot; aria-label=&quot;adding and dropping postgresql indexes safely on production permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Adding and dropping PostgreSQL indexes safely on production&lt;/h2&gt;
&lt;p&gt;I’ll end with a warning: Creating indexes on production databases requires a bit of thought. Not just which index definition to use, but also how to create them, and when to take the I/O impact of the new index being built.&lt;/p&gt;
&lt;p&gt;The most important thing: Remember that a plain &lt;code &gt;CREATE INDEX&lt;/code&gt; locks the table against writes for the entire duration of the index build. Reads can continue, but every &lt;code &gt;INSERT&lt;/code&gt;, &lt;code &gt;UPDATE&lt;/code&gt; and &lt;code &gt;DELETE&lt;/code&gt; on that table is blocked until the build finishes. That’s why Postgres has the special &lt;code &gt;CONCURRENTLY&lt;/code&gt; keyword. When you create an index on a table on production that already has data, always specify this keyword:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; CONCURRENTLY &lt;span &gt;ON&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; deleted_at &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
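&lt;p&gt;The same applies when dropping an index with &lt;code &gt;DROP INDEX&lt;/code&gt;: a plain &lt;code &gt;DROP INDEX&lt;/code&gt; takes an exclusive lock on the table that blocks both reads and writes until the drop completes. Adding &lt;code &gt;CONCURRENTLY&lt;/code&gt; reduces the locking requirements, making it safer to use this operation on production.&lt;/p&gt;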
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this post you should have gotten a fundamental understanding of how operators and operator classes relate to indexing, and why knowing these concepts is essential to creating the best index for complex queries. We also looked at a few complementary features of the &lt;code &gt;CREATE INDEX&lt;/code&gt; command that are typically needed when reasoning about which index to create.&lt;/p&gt;
&lt;p&gt;There are actually a few things we didn’t talk about: Adding indexes to specific tablespaces, using index storage parameters (especially useful for GIN index types!) and specifying the sort order for a particular column. I encourage you to take a further look at the &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createindex.html&quot;&gt;Postgres documentation for these topics&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22Postgres%20CREATE%20INDEX:%20Operator%20classes,%20index%20types%20and%20more%22%20-%20This%20post%20by%20%40pganalyze%20offers%20a%20behind%20the%20scenes%20look%20at%20adding%20the%20best%20index%20and%20explains%20how%20to%20match%20index%20definitions%20to%20queries%3A%20https://pganalyze.com/blog/postgres-create-index&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Efficient Pagination: PostgreSQL and Django]]></title><description><![CDATA[You could say most web frameworks take a naive approach to pagination. Using PostgreSQL’s COUNT, LIMIT, and OFFSET features for pagination works fine for the majority of web applications, but if you have tables with a million records or more, performance degrades quickly. Django is an excellent framework for building web applications, but its default pagination method falls into this trap at scale. In this article, I’ll help you understand Django’s pagination limitations and offer three…]]></description><link>https://pganalyze.com/blog/pagination-django-postgres</link><guid isPermaLink="false">https://pganalyze.com/blog/pagination-django-postgres</guid><dc:creator><![CDATA[Ryan Westerberg]]></dc:creator><pubDate>Tue, 20 Jul 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;You could say most web frameworks take a naive approach to pagination. Using PostgreSQL’s &lt;code &gt;COUNT&lt;/code&gt;, &lt;code &gt;LIMIT&lt;/code&gt;, and &lt;code &gt;OFFSET&lt;/code&gt; features for pagination works fine for the majority of web applications, but if you have tables with a million records or more, performance degrades quickly.&lt;/p&gt;
&lt;p&gt;Django is an excellent framework for building web applications, but its default pagination method falls into this trap at scale. In this article, I’ll help you understand Django’s pagination limitations and offer three alternative methods that will improve your application’s performance. Along the way, you’ll see the tradeoffs and use cases for each method so you can decide which is the best fit for your application.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#understanding-naive-postgresql-pagination&quot;&gt;Understanding Naive PostgreSQL Pagination&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#performance-of-naive-postgresql-pagination&quot;&gt;Performance of Naive PostgreSQL Pagination&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#pagination-handling-in-django&quot;&gt;Pagination Handling in Django&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-1--removing-the-count-query&quot;&gt;PostgreSQL Pagination in Django: Option 1 – Removing the COUNT Query&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-2--approximating-the-count&quot;&gt;PostgreSQL Pagination in Django: Option 2 – Approximating the COUNT&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-3--keyset-pagination&quot;&gt;PostgreSQL Pagination in Django: Option 3 – Keyset Pagination&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#example-of-using-keyset-pagination&quot;&gt;Example of using keyset pagination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#trade-offs-of-keyset-pagination&quot;&gt;Trade-offs of keyset pagination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-the-dj-pagination-plugin-for-django&quot;&gt;Using the dj-pagination plugin for Django&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;understanding-naive-postgresql-pagination&quot; &gt;&lt;a href=&quot;#understanding-naive-postgresql-pagination&quot; aria-label=&quot;understanding naive postgresql pagination permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Understanding Naive PostgreSQL Pagination&lt;/h2&gt;
&lt;p&gt;Let’s have a look at an example of the type of pagination control that a web application might use:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/f8f05650640b79720865888a54c86aaa/b5a09/typical_pagination_navigation.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Typical pagination navigation in a web application&quot; title=&quot;Typical pagination navigation in a web application&quot; src=&quot;https://pganalyze.com/static/f8f05650640b79720865888a54c86aaa/1d69c/typical_pagination_navigation.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;In this control, the user may go to the previous and next pages or jump directly to a specific page. The query to get the tenth page using Postgres’ &lt;code &gt;LIMIT&lt;/code&gt; and &lt;code &gt;OFFSET&lt;/code&gt; approach might look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; users
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; created_at &lt;span &gt;DESC&lt;/span&gt;
&lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;10&lt;/span&gt;
&lt;span &gt;OFFSET&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
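&lt;p&gt;Notice that the &lt;code &gt;OFFSET&lt;/code&gt; is the number of rows to skip: multiply the page number you want by the &lt;code &gt;LIMIT&lt;/code&gt; if you count pages from zero, or use &lt;code &gt;(page - 1) * LIMIT&lt;/code&gt; if you count pages from one.&lt;/p&gt;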
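&lt;p&gt;As a minimal sketch of that arithmetic in Python (the &lt;code &gt;limit_offset&lt;/code&gt; helper and its &lt;code &gt;first_page&lt;/code&gt; parameter are hypothetical names used only for illustration), the offset is the number of rows on all the pages before the one you want. Whether that works out to the page number times the &lt;code &gt;LIMIT&lt;/code&gt;, or one page less, depends on whether your application counts pages from zero or from one:&lt;/p&gt;

```python
def limit_offset(page, per_page, first_page=0):
    """Return the (LIMIT, OFFSET) pair for one page of results.

    first_page is the number given to the first page: 0 for zero-based
    page numbers, 1 for one-based page numbers.
    """
    if first_page > page or 1 > per_page:
        raise ValueError("page or per_page is out of range")
    # OFFSET is the number of rows on all the pages that come before this one.
    return per_page, (page - first_page) * per_page


print(limit_offset(10, 10))                # (10, 100), as in the query above
print(limit_offset(10, 10, first_page=1))  # (10, 90)
```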
&lt;p&gt;There&apos;s one more query needed to display our pagination control. You must know the number of records in the table. Without that information, you won’t know how many pages you need to seek through.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;count&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; users&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
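&lt;p&gt;Turning that row count into the number of pages for the control is a ceiling division. Here is a small Python sketch, with &lt;code &gt;page_count&lt;/code&gt; being a hypothetical helper name rather than anything from a framework:&lt;/p&gt;

```python
def page_count(total_rows, per_page):
    """Number of pages needed to show total_rows rows, per_page at a time."""
    if 0 > total_rows or 1 > per_page:
        raise ValueError("total_rows or per_page is out of range")
    # Ceiling division: a final, partially filled page still counts as a page.
    return -(-total_rows // per_page)


print(page_count(10_000_000, 10))  # 1000000
print(page_count(101, 10))         # 11
```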
&lt;p&gt;You might see how this approach can become a real performance issue very quickly.&lt;/p&gt;
&lt;h2 id=&quot;performance-of-naive-postgresql-pagination&quot; &gt;&lt;a href=&quot;#performance-of-naive-postgresql-pagination&quot; aria-label=&quot;performance of naive postgresql pagination permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Performance of Naive PostgreSQL Pagination&lt;/h2&gt;
&lt;p&gt;To better understand the performance bottleneck of &lt;code &gt;LIMIT&lt;/code&gt; and &lt;code &gt;OFFSET&lt;/code&gt;, you can import some test data and try it out. First, create a table large enough to encounter slowdowns:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;
    id &lt;span &gt;serial&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    name &lt;span &gt;varchar&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; users
&lt;span &gt;SELECT&lt;/span&gt;
    &lt;span &gt;--- Ten million records&lt;/span&gt;
    generate_series&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;10000000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; id&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;--- Example: &quot;e6f2c6842d146c518185e1e47add9532&quot;&lt;/span&gt;
    substr&lt;span &gt;(&lt;/span&gt;md5&lt;span &gt;(&lt;/span&gt;random&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;text&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;50&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; name&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you run the query to get the tenth page of results, the response is nearly instant. On my 2018 MacBook Pro with the latest version of Postgres, I see data in 89ms.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/6e1d03c4f7ebac6dc78e246ca9de4886/b5a09/naive_pagination_response_time.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Response time of the tenth page of results&quot; title=&quot;Response time of the tenth page of results&quot; src=&quot;https://pganalyze.com/static/6e1d03c4f7ebac6dc78e246ca9de4886/1d69c/naive_pagination_response_time.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, for queries farther in, the wait times increase&lt;/strong&gt;. With a LIMIT of 10 and an OFFSET of 5,000,000, a response takes 2.62 seconds.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/1ca7a7f72f12b41264da223299b420fb/b5a09/naive_pagination_response_time_millions.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Response time of the five millionth record&quot; title=&quot;Response time of the five millionth record&quot; src=&quot;https://pganalyze.com/static/1ca7a7f72f12b41264da223299b420fb/1d69c/naive_pagination_response_time_millions.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Finally, you can look at the &lt;code &gt;COUNT&lt;/code&gt; query, which runs on all 10 million rows. That query now takes a lethargic 4.45 seconds.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/e4d439c2aebe476a2ee2b0ec6305fbf4/b5a09/pagination_django_count_query.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Count query on a large database table&quot; title=&quot;Count query on a large database table&quot; src=&quot;https://pganalyze.com/static/e4d439c2aebe476a2ee2b0ec6305fbf4/1d69c/pagination_django_count_query.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;The slow performance in these examples is caused by &lt;a href=&quot;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&quot;&gt;the way that &lt;code &gt;OFFSET&lt;/code&gt; and &lt;code &gt;COUNT&lt;/code&gt; work&lt;/a&gt;. Getting to the specified page using &lt;code &gt;OFFSET&lt;/code&gt; requires the database to read and discard every row that comes before the page you want, and &lt;code &gt;COUNT&lt;/code&gt; has to scan all rows. Therefore, the performance degrades the farther you peer into the table.&lt;/p&gt;
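&lt;p&gt;To make that cost concrete, here is a toy model in Python. It is not how Postgres executes a query, just a sketch under the assumption that the rows arrive already sorted, and &lt;code &gt;fetch_page&lt;/code&gt; is a hypothetical helper written for this illustration. It mimics what &lt;code &gt;LIMIT&lt;/code&gt; and &lt;code &gt;OFFSET&lt;/code&gt; have to do: produce the skipped rows, throw them away, and only then collect the page.&lt;/p&gt;

```python
def fetch_page(ordered_rows, limit, offset):
    """Mimic LIMIT/OFFSET over rows that are already in the requested order.

    Returns the page and the number of rows that had to be visited.
    """
    page = []
    visited = 0
    for row in ordered_rows:
        if len(page) == limit:
            break  # the page is full, stop reading
        visited += 1
        if visited > offset:
            page.append(row)  # past the skipped rows, keep this one
    return page, visited


rows = range(1, 1001)  # stand-in for a sorted table of 1,000 rows

early_page, early_cost = fetch_page(rows, limit=10, offset=100)
late_page, late_cost = fetch_page(rows, limit=10, offset=900)

print(early_cost)  # 110
print(late_cost)   # 910
```

&lt;p&gt;Both calls return ten rows, but the work grows with the offset: the number of rows visited is the offset plus the limit.&lt;/p&gt;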
&lt;p&gt;The naive approach to pagination using &lt;code &gt;COUNT&lt;/code&gt;, &lt;code &gt;LIMIT&lt;/code&gt;, and &lt;code &gt;OFFSET&lt;/code&gt; is only a viable solution for tables under a million rows. In large tables, you can &lt;strong&gt;expect to see slow queries and consequently a poor user experience&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&quot;pagination-handling-in-django&quot; &gt;&lt;a href=&quot;#pagination-handling-in-django&quot; aria-label=&quot;pagination handling in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Pagination Handling in Django&lt;/h2&gt;
&lt;p&gt;Now that you have some background knowledge on the performance of pagination queries in Postgres, you can start to understand why pagination slows down in Django. To demonstrate how Django handles pagination, I created a new application with a User model and inserted 10 million records by adapting our earlier query.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/models.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;User&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I used the admin site to test the pagination speed since it works out of the box.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/admin.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib &lt;span &gt;import&lt;/span&gt; admin
&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; User

admin&lt;span &gt;.&lt;/span&gt;site&lt;span &gt;.&lt;/span&gt;register&lt;span &gt;(&lt;/span&gt;User&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that the User model is presented in the admin panel, I can see the table with 10 million records.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/56c1ef33a5086d5b6f5dea6e8ff6697c/b5a09/pagination_django_admin_panel.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;10,000,000 records in the Django admin panel&quot; title=&quot;10,000,000 records in the Django admin panel&quot; src=&quot;https://pganalyze.com/static/56c1ef33a5086d5b6f5dea6e8ff6697c/1d69c/pagination_django_admin_panel.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Using &lt;a href=&quot;https://django-debug-toolbar.readthedocs.io/en/latest/&quot;&gt;django-debug-toolbar&lt;/a&gt; I can peer into the SQL queries that Django is generating in real-time. There are two queries used to generate this UI:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;-- Count the total number of records - 2.43 seconds&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; &lt;span &gt;&quot;__count&quot;&lt;/span&gt;
  &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;

&lt;span &gt;-- Get first page of items - 2ms&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;
  &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;
 &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;DESC&lt;/span&gt;
 &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;These queries should look familiar because they are almost identical to the naive pagination queries above. Strangely, the count query is triggered twice, which means that when you load the Django admin panel, you have to wait for the database to count every single row of the table &lt;strong&gt;two times&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When you click on page 99,999, Django will fire off two count queries again and another pagination query using &lt;code &gt;LIMIT&lt;/code&gt; and &lt;code &gt;OFFSET&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;-- Get page 99,999 (100 results per page) - 13.34 seconds&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
       &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;
  &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;
 &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; &lt;span &gt;&quot;users_user&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;DESC&lt;/span&gt;
 &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;100&lt;/span&gt;
&lt;span &gt;OFFSET&lt;/span&gt; &lt;span &gt;9999900&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query takes a whopping &lt;strong&gt;13 seconds&lt;/strong&gt; to finish!&lt;/p&gt;
&lt;p&gt;Clearly, the naive approach to pagination in Django is slow for large tables. Over time, your database tables will likely grow, and as they reach tens of millions of records, you and your customers are going to start to notice these terrible load times. So what can you do about it?&lt;/p&gt;
&lt;p&gt;In the following sections, I’ll show you three options for improving your pagination performance in a Django application.&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;https://pganalyze.com/ce7c8be5616e94d6a5e02ed98330a930/django_pagination_performance_comparison.svg&quot; alt=&quot;Comparing the performance of pagination methods in Django and Postgres&quot;&gt;
&lt;/p&gt;
&lt;h2 id=&quot;postgresql-pagination-in-django-option-1--removing-the-count-query&quot; &gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-1--removing-the-count-query&quot; aria-label=&quot;postgresql pagination in django option 1  removing the count query permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL Pagination in Django: Option 1 – Removing the COUNT Query&lt;/h2&gt;
&lt;p&gt;The &lt;code &gt;COUNT&lt;/code&gt; query dominates the loading time for the first page of results. When skipping to later pages, the offset query is the slowest, but I’ll focus on improving the &lt;code &gt;COUNT&lt;/code&gt; in this first option.&lt;/p&gt;
&lt;p&gt;This may come as a surprise, but one solution is to remove the count query completely.&lt;/p&gt;
&lt;p&gt;Won’t that break the UI!?&lt;/p&gt;
&lt;p&gt;Sort of… In this case, it might be reasonable not to know how many pages are in the Users table. It’s not often that users will find themselves at the five millionth page of a table of records. Navigating to the next and previous page is typically enough control. Using a search box or filter is likely a better method for finding the record you want. Look at &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.search_fields&quot;&gt;Django’s search_fields&lt;/a&gt; for information on enabling search and filtering in the admin panel and feel free to read through this pganalyze article about &lt;a href=&quot;https://pganalyze.com/blog/full-text-search-django-postgres&quot;&gt;Full Text Search in Django&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The most well-known example of this type of pagination is the Google search results page. At the bottom of the page, a truncated pagination control shows direct links to only the first 10 pages:&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/a82958405c72e62ebe2c5d0f3905aef0/b5a09/google_pagination.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;No count of the total results in the Google pagination&quot; title=&quot;No count of the total results in the Google pagination&quot; src=&quot;https://pganalyze.com/static/a82958405c72e62ebe2c5d0f3905aef0/1d69c/google_pagination.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;This doesn’t mean that Google is preventing you from seeing the rest of the billions of results. It’s simply telling you that a refined search term would be a better way to get to those results than pagination.&lt;/p&gt;
&lt;p&gt;If getting rid of the &lt;code &gt;COUNT&lt;/code&gt; makes sense in your application, Django makes it easy to hide. First, &lt;a href=&quot;https://hakibenita.com/optimizing-the-django-admin-paginator&quot;&gt;override the count property of the default Paginator&lt;/a&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/paginator.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;core&lt;span &gt;.&lt;/span&gt;paginator &lt;span &gt;import&lt;/span&gt; Paginator
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;utils&lt;span &gt;.&lt;/span&gt;functional &lt;span &gt;import&lt;/span&gt; cached_property

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;UserPaginator&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;Paginator&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    
    &lt;span &gt;@cached_property&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;count&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; &lt;span &gt;9999999999&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that the placeholder value is much larger than the number of results you expect to have.&lt;/p&gt;
&lt;p&gt;Django responds to this adjustment by displaying the first few pages in the pagination component, as you would expect. However, the links to the last few pages are calculated from the fake count, so they point to pages that don’t exist. When you click on a page that doesn’t exist, Django will take you to the last page no matter how many records you have.&lt;/p&gt;
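&lt;p&gt;To see where those trailing page links come from, here is a small sketch of the page arithmetic that a count-based paginator performs. The helper name is made up for illustration, and the page size of 100 is an assumption (it matches the admin’s default &lt;code &gt;list_per_page&lt;/code&gt;); this is not Django’s actual code.&lt;/p&gt;

```python
FAKE_COUNT = 9999999999  # the placeholder returned by UserPaginator.count


def num_pages(count, per_page):
    """Number of pages a count-based paginator would report (ceiling division)."""
    return -(-count // per_page)


# With the fake count, the pagination control believes there are
# one hundred million pages, however many rows really exist.
print(num_pages(FAKE_COUNT, 100))
```

&lt;p&gt;Because the page total is derived only from the count, the control has no way of knowing where the real data ends.&lt;/p&gt;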
&lt;p&gt;After you override the default paginator, import it into your admin model:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/admin.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib &lt;span &gt;import&lt;/span&gt; admin
&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; User
&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;paginator &lt;span &gt;import&lt;/span&gt; UserPaginator

&lt;span &gt;@admin&lt;span &gt;.&lt;/span&gt;register&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;User&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;class&lt;/span&gt; &lt;span &gt;UserTableAdmin&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;admin&lt;span &gt;.&lt;/span&gt;ModelAdmin&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    show_full_result_count &lt;span &gt;=&lt;/span&gt; &lt;span &gt;False&lt;/span&gt;
    paginator &lt;span &gt;=&lt;/span&gt; UserPaginator&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that I also set &lt;code &gt;show_full_result_count&lt;/code&gt; to &lt;code &gt;False&lt;/code&gt;. This will turn off the second count query that I noted earlier.&lt;/p&gt;
&lt;p&gt;After updating my application with these changes, I reduced the time for the first page &lt;strong&gt;from ~5 seconds to 8ms&lt;/strong&gt;. Keep in mind that this table is still suffering from slow &lt;code &gt;OFFSET&lt;/code&gt; queries though. Jumping to page 50,0000 took 18 seconds. Before I show you how to address the &lt;code &gt;OFFSET&lt;/code&gt; problem, I’ll show you one more method to improve the &lt;code &gt;COUNT&lt;/code&gt; query.&lt;/p&gt;
&lt;h2 id=&quot;postgresql-pagination-in-django-option-2--approximating-the-count&quot; &gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-2--approximating-the-count&quot; aria-label=&quot;postgresql pagination in django option 2  approximating the count permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL Pagination in Django: Option 2 – Approximating the COUNT&lt;/h2&gt;
&lt;p&gt;Another way to reduce the time spent on the &lt;code &gt;COUNT&lt;/code&gt; query is to use some built-in Postgres features to estimate the total number of records when the count takes too long. You can see a thorough implementation of the approach &lt;a href=&quot;https://gist.github.com/noviluni/d86adfa24843c7b8ed10c183a9df2afe&quot;&gt;in this gist&lt;/a&gt;, which overrides the &lt;code &gt;count&lt;/code&gt; property in Django’s &lt;code &gt;Paginator&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;The first thing to do is to set a &lt;a href=&quot;https://postgresqlco.nf/en/doc/param/statement_timeout&quot;&gt;statement_timeout&lt;/a&gt; on the query and fall back to an estimated count. You can use &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/topics/db/transactions/&quot;&gt;atomic transactions&lt;/a&gt; to set the timeout to 150ms.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    &lt;span &gt;try&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;with&lt;/span&gt; transaction&lt;span &gt;.&lt;/span&gt;atomic&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; cursor&lt;span &gt;:&lt;/span&gt;
            &lt;span &gt;# Limit to 150 ms&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SET LOCAL statement_timeout TO 150;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
            &lt;span &gt;return&lt;/span&gt; &lt;span &gt;super&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count
    &lt;span &gt;except&lt;/span&gt; OperationalError&lt;span &gt;:&lt;/span&gt;
            &lt;span &gt;pass&lt;/span&gt;
&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the &lt;code &gt;count&lt;/code&gt; query returns before the time limit is up, then the real value is used. However, if the query takes longer, Postgres cancels it and the paginator falls back to an approximate value stored in the &lt;a href=&quot;https://www.postgresql.org/docs/current/catalog-pg-class.html&quot;&gt;&lt;code &gt;pg_class&lt;/code&gt;&lt;/a&gt; metadata. That metadata is updated when commands like &lt;code &gt;VACUUM&lt;/code&gt;, &lt;code &gt;ANALYZE&lt;/code&gt; and &lt;code &gt;CREATE INDEX&lt;/code&gt; are called, or when autovacuum runs on the table.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
        &lt;span &gt;with&lt;/span&gt; transaction&lt;span &gt;.&lt;/span&gt;atomic&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; cursor&lt;span &gt;:&lt;/span&gt;
            &lt;span &gt;# Obtain estimated values (only valid with PostgreSQL)&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;
                    &lt;span &gt;&quot;SELECT reltuples FROM pg_class WHERE relname = %s&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                    &lt;span &gt;[&lt;/span&gt;self&lt;span &gt;.&lt;/span&gt;object_list&lt;span &gt;.&lt;/span&gt;query&lt;span &gt;.&lt;/span&gt;model&lt;span &gt;.&lt;/span&gt;_meta&lt;span &gt;.&lt;/span&gt;db_table&lt;span &gt;]&lt;/span&gt;
            &lt;span &gt;)&lt;/span&gt;
            estimate &lt;span &gt;=&lt;/span&gt; &lt;span &gt;int&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;fetchone&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
            &lt;span &gt;return&lt;/span&gt; estimate
&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After implementing this method, the exact &lt;code &gt;COUNT&lt;/code&gt; query is cut off after 150ms, and the fallback only has to read a single row of catalog metadata.&lt;/p&gt;
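&lt;p&gt;The two excerpts above fit together as a “try the exact count, fall back to the estimate” pattern. Below is a minimal sketch of that control flow in plain Python, without Django or a database. The exception class and both callables are hypothetical stand-ins: in the gist, the exact count is &lt;code &gt;super().count&lt;/code&gt;, the estimate comes from &lt;code &gt;pg_class.reltuples&lt;/code&gt;, and the timeout surfaces as an &lt;code &gt;OperationalError&lt;/code&gt;.&lt;/p&gt;

```python
class QueryTimedOut(Exception):
    """Stand-in for the OperationalError raised when statement_timeout fires."""


def count_with_fallback(exact_count, estimated_count):
    """Return (count, is_exact): try the exact count, else use the estimate."""
    try:
        return exact_count(), True
    except QueryTimedOut:
        # On recent Postgres versions reltuples can be negative for a table
        # that has never been vacuumed or analyzed, so clamp at zero.
        return max(int(estimated_count()), 0), False


def slow_exact_count():
    # Simulates a COUNT query that exceeds the time limit.
    raise QueryTimedOut("canceling statement due to statement timeout")


print(count_with_fallback(lambda: 42, lambda: 40.0))
print(count_with_fallback(slow_exact_count, lambda: 5000000.0))
```

&lt;p&gt;Returning a flag alongside the number is an optional design choice: it lets the UI label the total as approximate when the estimate was used.&lt;/p&gt;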
&lt;h2 id=&quot;postgresql-pagination-in-django-option-3--keyset-pagination&quot; &gt;&lt;a href=&quot;#postgresql-pagination-in-django-option-3--keyset-pagination&quot; aria-label=&quot;postgresql pagination in django option 3  keyset pagination permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;PostgreSQL Pagination in Django: Option 3 – Keyset Pagination&lt;/h2&gt;
&lt;p&gt;To solve the slow &lt;code &gt;OFFSET&lt;/code&gt; problem, you can replace it with &lt;a href=&quot;https://use-the-index-luke.com/no-offset&quot;&gt;keyset (or seek) pagination&lt;/a&gt;. In keyset pagination, each page is fetched by an ordered field like an &lt;code &gt;id&lt;/code&gt; or &lt;code &gt;created_at&lt;/code&gt; date. Instead of iteratively counting pages, as &lt;code &gt;OFFSET&lt;/code&gt; does, keyset pagination filters directly by the ordered field.&lt;/p&gt;
&lt;h3 id=&quot;example-of-using-keyset-pagination&quot; &gt;&lt;a href=&quot;#example-of-using-keyset-pagination&quot; aria-label=&quot;example of using keyset pagination permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Example of using keyset pagination&lt;/h3&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;user&quot;&lt;/span&gt; &lt;span &gt;-- quoted because user is a reserved word in Postgres&lt;/span&gt;
    &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;60&lt;/span&gt; &lt;span &gt;-- The last item in the previous page&lt;/span&gt;
    &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; id &lt;span &gt;DESC&lt;/span&gt;
    &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;10&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This approach is a little different from &lt;code &gt;OFFSET&lt;/code&gt; because you must know the value you are starting at. Imagine that you are on page four of the table, and you know the first &lt;code &gt;id&lt;/code&gt; of page five. Instead of counting all the records, you can go directly to that &lt;code &gt;id&lt;/code&gt; and return the next ten items.&lt;/p&gt;
&lt;p&gt;This works because indexes can support a query like this efficiently. Asking a B-tree index on &lt;code &gt;id&lt;/code&gt; to return 10 entries before a certain &lt;code &gt;id&lt;/code&gt; only requires loading 10 index entries. Contrast that to using &lt;code &gt;OFFSET&lt;/code&gt;, where all entries up to the offset and then the specified limit need to be loaded, making high offsets very expensive.&lt;/p&gt;
&lt;p&gt;When using keyset pagination together with the right index, you will see a &lt;strong&gt;significant performance boost&lt;/strong&gt;. The &lt;a href=&quot;https://en.wikipedia.org/wiki/Time_complexity&quot;&gt;time complexity&lt;/a&gt; of fetching a page no longer depends on how deep into the table that page is: each page costs one index seek plus a fixed number of entries. For example, seeking the last page of the large table generated above takes just 78ms.&lt;/p&gt;
&lt;p&gt;This method also guards against sparse data. If a user is deleted and the order is not sequential in the table anymore, keyset pagination is not affected; it will skip the missing value with no problem.&lt;/p&gt;
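&lt;p&gt;Here is a small, self-contained sketch of the idea. It uses Python’s built-in &lt;code &gt;sqlite3&lt;/code&gt; module as a stand-in for Postgres so that it runs anywhere, and it pages in ascending &lt;code &gt;id&lt;/code&gt; order rather than descending. The table layout and the &lt;code &gt;fetch_page&lt;/code&gt; helper are assumptions made for illustration.&lt;/p&gt;

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return the next page of (id, name) rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 31)],
)
# Delete a row to leave a gap in the id sequence.
conn.execute("DELETE FROM users WHERE id = 12")

first_page = fetch_page(conn, after_id=0, page_size=10)
# The cursor for the next page is simply the last id seen on this page.
second_page = fetch_page(conn, after_id=first_page[-1][0], page_size=10)

print([row[0] for row in second_page])
```

&lt;p&gt;The second page still contains ten rows even though &lt;code &gt;id&lt;/code&gt; 12 is missing, because the query seeks past the last seen value instead of counting positions.&lt;/p&gt;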
&lt;h3 id=&quot;trade-offs-of-keyset-pagination&quot; &gt;&lt;a href=&quot;#trade-offs-of-keyset-pagination&quot; aria-label=&quot;trade offs of keyset pagination permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Trade-offs of keyset pagination&lt;/h3&gt;
&lt;p&gt;Keyset pagination comes with a couple of trade-offs. Without the offset, you don’t know exactly how many pages there are in the table or which page number you are currently on. Additionally, keyset pagination requires a sortable field on your model. Sequential IDs and date fields work well.&lt;/p&gt;
&lt;h3 id=&quot;using-the-dj-pagination-plugin-for-django&quot; &gt;&lt;a href=&quot;#using-the-dj-pagination-plugin-for-django&quot; aria-label=&quot;using the dj pagination plugin for django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using the dj-pagination plugin for Django&lt;/h3&gt;
&lt;p&gt;Unfortunately, I could not find a library that extends Django’s core &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/paginator/&quot;&gt;Paginator&lt;/a&gt; to add keyset pagination. The admin table requires a paginator of that type, so I&apos;ll demonstrate the same generated data in a new application view using the &lt;a href=&quot;https://dj-pagination.readthedocs.io/en/latest/&quot;&gt;dj-pagination plugin&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Add the application to your &lt;code &gt;INSTALLED_APPS&lt;/code&gt; and &lt;code &gt;MIDDLEWARE&lt;/code&gt; settings; &lt;a href=&quot;https://dj-pagination.readthedocs.io/en/latest/&quot;&gt;the docs explain the setup well&lt;/a&gt;. Then, create a view in your Users app:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/views.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;shortcuts &lt;span &gt;import&lt;/span&gt; render
&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; User


&lt;span &gt;def&lt;/span&gt; &lt;span &gt;index&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;request&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    context &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
        &lt;span &gt;&apos;users&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; User&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;order_by&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;id&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;all&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;return&lt;/span&gt; render&lt;span &gt;(&lt;/span&gt;request&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;users/index.html&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; context&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This code makes the users &lt;code &gt;QuerySet&lt;/code&gt; available in the view context and tells it to render the template file. Next, add the template directory to your &lt;code &gt;TEMPLATES&lt;/code&gt; setting object and add your new template file:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;{# {project}/templates/users/index.html #}&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;span &gt;%&lt;/span&gt; load pagination_tags &lt;span &gt;%&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;

&lt;span &gt;{&lt;/span&gt;&lt;span &gt;%&lt;/span&gt; autopaginate users &lt;span &gt;%&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;

&lt;span &gt;{&lt;/span&gt;&lt;span &gt;%&lt;/span&gt; &lt;span &gt;for&lt;/span&gt; user &lt;span &gt;in&lt;/span&gt; users &lt;span &gt;%&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;{&lt;/span&gt;&lt;span &gt;{&lt;/span&gt; user&lt;span &gt;.&lt;/span&gt;name &lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;&lt;span &gt;%&lt;/span&gt; endfor &lt;span &gt;%&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;

&lt;span &gt;{&lt;/span&gt;&lt;span &gt;%&lt;/span&gt; paginate &lt;span &gt;%&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In a real application, you would add additional markup and styling to show the list of users, but this demonstrates the tags that &lt;code &gt;dj-pagination&lt;/code&gt; makes available to you.&lt;/p&gt;
&lt;p&gt;Now that you have a view and template, create a &lt;code &gt;urls.py&lt;/code&gt; file to route requests to your view:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# {project}/users/urls.py&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;urls &lt;span &gt;import&lt;/span&gt; path
&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt; &lt;span &gt;import&lt;/span&gt; views

app_name &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;users&apos;&lt;/span&gt;
urlpatterns &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;# ex: /users/&lt;/span&gt;
    path&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; views&lt;span &gt;.&lt;/span&gt;index&lt;span &gt;,&lt;/span&gt; name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;index&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, add it to your root URLs:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# various imports...&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;urls &lt;span &gt;import&lt;/span&gt; include&lt;span &gt;,&lt;/span&gt; path

urlpatterns &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;# routes...&lt;/span&gt;
    path&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;users/&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; include&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;users.urls&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, when you point your browser to the &lt;code &gt;/users&lt;/code&gt; page, you will see a list of usernames with simple pagination controls. The generated query returns results in less than 100ms, regardless of which page I navigate to.&lt;/p&gt;
&lt;p&gt;If you are looking for a way to have more control over keyset pagination, &lt;a href=&quot;https://github.com/nitely/django-infinite-scroll-pagination&quot;&gt;django-infinite-scroll-pagination&lt;/a&gt; might be worth a look.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, you learned about how pagination works in Django. While naive pagination performs well for small tables, this method quickly degrades in performance as your table grows to millions of rows. Furthermore, jumping to a record deep in the table will be very slow in a query that uses &lt;code &gt;OFFSET&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The good news is that you can speed things up by altering the &lt;code &gt;COUNT&lt;/code&gt; query. Additionally, switching to keyset pagination keeps page lookups fast no matter how deep into the table they go. Django makes it easy to alter its default configuration, giving you the power to build a performant pagination solution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22Efficient%20Pagination%20in%20Django%20and%20Postgres%22%20-%20This%20post%20by%20%40pganalyze%20walks%20through%203%20different%20methods%20for%20pagination%20with%20Django%20and%20Postgres%20and%20explains%20their%20benefits%20and%20tradeoffs%20so%20you%20can%20decide%20which%20one%20is%20the%20best%20fit%20for%20your%20application%3A%20https://pganalyze.com/blog/pagination-django-postgres&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[PostgreSQL Partitioning in Django]]></title><description><![CDATA[Postgres 10 introduced partitioning to improve performance for very large database tables. You will typically start to see the performance benefits with tables of 1 million or more records, but the technical complexity usually doesn’t pay off unless you’re dealing with hundreds of gigabytes of data. Though there are several advantages to partitioning, it requires more tables, which can become cumbersome to work with, especially if you change your data structure in the future. Please note: If you…]]></description><link>https://pganalyze.com/blog/postgresql-partitioning-django</link><guid isPermaLink="false">https://pganalyze.com/blog/postgresql-partitioning-django</guid><dc:creator><![CDATA[Josh Alletto]]></dc:creator><pubDate>Thu, 08 Jul 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Postgres 10 introduced &lt;a href=&quot;https://www.postgresql.org/docs/11/ddl-partitioning.html&quot;&gt;partitioning&lt;/a&gt; to improve performance for very large database tables. You will typically start to see the performance benefits with tables of 1 million or more records, but the technical complexity usually doesn’t pay off unless you’re dealing with hundreds of gigabytes of data.&lt;/p&gt;
&lt;p&gt;Though there are several advantages to partitioning, it requires more tables, which can become cumbersome to work with, especially if you change your data structure in the future. Please note: &lt;strong&gt;If you are just starting out with a small database, you probably don&apos;t need partitioning.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That said, if you think you may have a legitimate reason to partition your Postgres database and you want to use Django to manage it, this article is the right one for you.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#what-is-database-partitioning&quot;&gt;What is Database Partitioning?&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#list-partitioning-in-postgres&quot;&gt;List Partitioning in Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#range-partitioning-in-postgres&quot;&gt;Range Partitioning in Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#hash-partitions-in-postgres&quot;&gt;Hash Partitions in Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-does-postgresql-partitioning-work&quot;&gt;How does PostgreSQL partitioning work?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#creating-a-partitioned-table-in-postgresql&quot;&gt;Creating a partitioned table in PostgreSQL&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#comparing-partitioned-postgres-table-performance-with-python-and-faker&quot;&gt;Comparing Partitioned Postgres Table Performance with Python and Faker&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgres-data-partitioning-in-django&quot;&gt;Postgres Data Partitioning in Django&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;
&lt;img src=&quot;https://pganalyze.com/aa1e9c646c9fbbf96c9886c7e7470066/partitioned-vs-non-partitioned-tables-postgres.svg&quot; alt=&quot;Comparing performance of partitioned vs. non-partitioned Postgres database tables&quot;&gt;
&lt;/p&gt;
&lt;h2 id=&quot;what-is-database-partitioning&quot; &gt;&lt;a href=&quot;#what-is-database-partitioning&quot; aria-label=&quot;what is database partitioning permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What is Database Partitioning?&lt;/h2&gt;
&lt;p&gt;You may hear partitioning and think it is &lt;a href=&quot;https://wiki.postgresql.org/wiki/Built-in_Sharding&quot;&gt;similar to sharding&lt;/a&gt;, where a database or table is spread out across several different nodes. In fact, partitioning in PostgreSQL involves splitting a single table up into several different tables, but partitioning is performed on the same node. Partitioning allows you to organize the data into subsets that are easier for the query planner to traverse. This can vastly increase the speed of lookups, deletes, and inserts.&lt;/p&gt;
&lt;p&gt;To that end, the way you choose to partition the data should reflect the way you want to access your data. In other words, if you are frequently accessing records based on when they were created, you should probably partition based on creation date. If you&apos;re regularly grabbing subsets of data based on a region or country, you may want to partition based on the records’ locations.&lt;/p&gt;
&lt;p&gt;There are three types of partitioning supported by PostgreSQL:&lt;/p&gt;
&lt;h3 id=&quot;list-partitioning-in-postgres&quot; &gt;&lt;a href=&quot;#list-partitioning-in-postgres&quot; aria-label=&quot;list partitioning in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;List Partitioning in Postgres&lt;/h3&gt;
&lt;p&gt;List Partitioning allows you to &lt;strong&gt;explicitly state which values you would like to put into each partition&lt;/strong&gt;. For example, you could partition a table of North American climate data by country with a United States, Canada, and Mexico partition. Since you can create partitions of a partition, you could further split these tables up by state or province.&lt;/p&gt;
&lt;h3 id=&quot;range-partitioning-in-postgres&quot; &gt;&lt;a href=&quot;#range-partitioning-in-postgres&quot; aria-label=&quot;range partitioning in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Range Partitioning in Postgres&lt;/h3&gt;
&lt;p&gt;Range Partitions are the most useful and the kind I’ll use most in this tutorial. They allow you to &lt;strong&gt;specify partitions based on a range of numbers or dates&lt;/strong&gt;. A table for storing measurements on an hourly basis might be partitioned by date and time. This would make looking up new measurements or deleting older measurements much faster.&lt;/p&gt;
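&lt;p&gt;As a rough mental model, routing a row to a range partition is a lookup against sorted boundaries, where each partition covers its lower bound inclusively and its upper bound exclusively. The sketch below imitates that in plain Python; the monthly bounds and partition names are invented for illustration, and Postgres performs this routing internally.&lt;/p&gt;

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical monthly partitions. Each bound is the inclusive start of one
# partition and the exclusive end of the previous one.
BOUNDS = [
    datetime(2021, 1, 1),
    datetime(2021, 2, 1),
    datetime(2021, 3, 1),
    datetime(2021, 4, 1),
]
NAMES = ["measurements_2021_01", "measurements_2021_02", "measurements_2021_03"]


def partition_for(recorded_at):
    """Return the name of the partition whose range contains recorded_at."""
    index = bisect_right(BOUNDS, recorded_at) - 1
    if index not in range(len(NAMES)):
        raise ValueError(f"no partition covers {recorded_at}")
    return NAMES[index]


print(partition_for(datetime(2021, 2, 14, 9, 30)))
```

&lt;p&gt;A query that filters on the partition key can skip every partition whose range cannot match, which is why lookups and bulk deletes of old data become cheaper.&lt;/p&gt;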
&lt;h3 id=&quot;hash-partitions-in-postgres&quot; &gt;&lt;a href=&quot;#hash-partitions-in-postgres&quot; aria-label=&quot;hash partitions in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Hash Partitions in Postgres&lt;/h3&gt;
&lt;p&gt;Hash Partitions split data by &lt;strong&gt;specifying a modulus and a remainder for each partition&lt;/strong&gt;. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder. This comes in handy if there isn&apos;t a clear way to organize the data, or you want a pseudo-random breakdown of your data.&lt;/p&gt;
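&lt;p&gt;To make the modulus and remainder idea more concrete, here is a minimal sketch that spreads a hypothetical &lt;code &gt;events&lt;/code&gt; table (the table and its columns are assumptions, not part of this tutorial&apos;s dataset) across four hash partitions:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table, for illustration only.
CREATE TABLE events (
  id bigint NOT NULL,
  payload text
) PARTITION BY HASH (id);

-- With a modulus of 4, the remainders 0 through 3 together cover every row.
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;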
&lt;h2 id=&quot;how-does-postgresql-partitioning-work&quot; &gt;&lt;a href=&quot;#how-does-postgresql-partitioning-work&quot; aria-label=&quot;how does postgresql partitioning work permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How does PostgreSQL partitioning work?&lt;/h2&gt;
&lt;p&gt;One important thing to understand about a partitioned table is that the partitions themselves are also tables. They are created individually, and you can query them separately, though you would rarely want to use this feature.&lt;/p&gt;
&lt;p&gt;The other thing to understand is that &lt;strong&gt;the partitioned table - the table you will split into smaller tables - doesn&apos;t hold any data&lt;/strong&gt;. It exists as a parent to the partitions and a blueprint for the table schema.&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;https://pganalyze.com/f9f0ad95a0e0056bc08e89cc7454c6fa/postgresql-partitioning-in-django.svg&quot; alt=&quot;Partitions and the partitioned table in Postgres&quot;&gt;
&lt;/p&gt;
&lt;h2 id=&quot;creating-a-partitioned-table-in-postgresql&quot; &gt;&lt;a href=&quot;#creating-a-partitioned-table-in-postgresql&quot; aria-label=&quot;creating a partitioned table in postgresql permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating a partitioned table in PostgreSQL&lt;/h2&gt;
&lt;p&gt;The best way to understand partitions and see some of their benefits is to consider an example. Start by setting up a new &lt;code &gt;people&lt;/code&gt; table, which you’ll compare to the partitioned table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people &lt;span &gt;(&lt;/span&gt;
  id BIGSERIAL &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  full_name &lt;span &gt;text&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  birth_date &lt;span &gt;date&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now create another table with the same columns but partitioned:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned &lt;span &gt;(&lt;/span&gt;
  id BIGSERIAL&lt;span &gt;,&lt;/span&gt;
  full_name &lt;span &gt;text&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  birth_date &lt;span &gt;date&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; birth_date&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt; &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; RANGE &lt;span &gt;(&lt;/span&gt;birth_date&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, we&apos;ve created a table partitioned by &lt;code &gt;RANGE&lt;/code&gt; that uses birth dates to decide which partition each record belongs to. You could just as easily do this for &lt;code &gt;created_on&lt;/code&gt; timestamps or an &lt;code &gt;int&lt;/code&gt; column like a measurement value or record ID. Note that we had to define the primary key to include both the &lt;code &gt;id&lt;/code&gt; column and the &lt;code &gt;birth_date&lt;/code&gt; column we are partitioning by, since &lt;strong&gt;primary keys on a partitioned table always need to include the partition column(s)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Remember, a partitioned table on its own doesn&apos;t contain any data. You need to create the tables that represent the partitions themselves. In this case, split the data up into chunks of fifty years:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned_birthdays_1800_to_1850 &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;OF&lt;/span&gt; people_partitioned
    &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1800-01-01&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1850-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned_birthdays_1850_to_1900 &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;OF&lt;/span&gt; people_partitioned
    &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1850-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1900-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned_birthdays_1900_to_1950 &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;OF&lt;/span&gt; people_partitioned
    &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1900-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1950-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned_birthdays_1950_to_2000 &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;OF&lt;/span&gt; people_partitioned
    &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1950-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;2000-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; people_partitioned_birthdays_2000_to_2050 &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;OF&lt;/span&gt; people_partitioned
    &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;2000-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;2050-12-31&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
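&lt;p&gt;Each partition is declared a &lt;code &gt;PARTITION OF&lt;/code&gt; the &lt;code &gt;people_partitioned&lt;/code&gt; table and specifies the range of values it holds. The lower bound of a range is inclusive and the upper bound is exclusive, so a birth date of &lt;code &gt;1850-12-31&lt;/code&gt; belongs to the &lt;code &gt;1850_to_1900&lt;/code&gt; partition rather than the &lt;code &gt;1800_to_1850&lt;/code&gt; one. It&apos;s best to give the tables descriptive names.&lt;/p&gt;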
&lt;p&gt;Now, you can insert data into the &lt;code &gt;people_partitioned&lt;/code&gt; table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; people_partitioned &lt;span &gt;(&lt;/span&gt;full_name&lt;span &gt;,&lt;/span&gt; birth_date&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Bob Sponge&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;2000-08-21&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you query the &lt;code &gt;people_partitioned&lt;/code&gt; table, you’ll get the data you just inserted:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; full_name&lt;span &gt;,&lt;/span&gt; birth_date &lt;span &gt;FROM&lt;/span&gt; people_partitioned&lt;span &gt;;&lt;/span&gt;

full_name  &lt;span &gt;|&lt;/span&gt; birth_date
&lt;span &gt;-----------+-----------&lt;/span&gt;
Bob Sponge &lt;span &gt;|&lt;/span&gt; &lt;span &gt;2000&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;08&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;21&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To ensure the record went into the right partition, query the partition table directly:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; full_name&lt;span &gt;,&lt;/span&gt; birth_date &lt;span &gt;FROM&lt;/span&gt; people_partitioned_birthdays_2000_to_2050&lt;span &gt;;&lt;/span&gt;

full_name  &lt;span &gt;|&lt;/span&gt; birth_date
&lt;span &gt;-----------+-----------&lt;/span&gt;
Bob Sponge &lt;span &gt;|&lt;/span&gt; &lt;span &gt;2000&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;08&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;21&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, records are stored in the individual tables but accessible through the top-level partitioned table as well. This makes accessing the data relatively straightforward as you don’t have to keep track of which data is in each partition.&lt;/p&gt;
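&lt;p&gt;One caveat, sketched here under the assumption that no catch-all partition exists yet: Postgres rejects a row whose &lt;code &gt;birth_date&lt;/code&gt; falls outside every declared range. On Postgres 11 or later, a default partition can collect such rows. The name &lt;code &gt;people_partitioned_default&lt;/code&gt; and the sample row are just examples:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Catch-all partition for rows that match none of the declared ranges.
CREATE TABLE people_partitioned_default PARTITION OF people_partitioned DEFAULT;

-- 1799-06-01 is before the earliest range, so this row lands in the default partition.
-- Without the default partition, the same insert would fail with an error.
INSERT INTO people_partitioned (full_name, birth_date) VALUES (&apos;Early Bird&apos;, &apos;1799-06-01&apos;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;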
&lt;h2 id=&quot;comparing-partitioned-postgres-table-performance-with-python-and-faker&quot; &gt;&lt;a href=&quot;#comparing-partitioned-postgres-table-performance-with-python-and-faker&quot; aria-label=&quot;comparing partitioned postgres table performance with python and faker permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Comparing Partitioned Postgres Table Performance with Python and Faker&lt;/h2&gt;
&lt;p&gt;Next, I used Python and &lt;a href=&quot;https://faker.readthedocs.io/en/master/&quot;&gt;Faker&lt;/a&gt; to populate each table with ten million rows of random data. To compare performance, I ran the same &lt;code &gt;SELECT&lt;/code&gt; query against each table for anyone born after 1901 and before 1920 (that is, from 1902 through 1919):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; people &lt;span &gt;WHERE&lt;/span&gt; EXTRACT&lt;span &gt;(&lt;/span&gt;&lt;span &gt;year&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; birth_date&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;1901&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; EXTRACT&lt;span &gt;(&lt;/span&gt;&lt;span &gt;year&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; birth_date&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;1920&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The query returned 1,313,997 rows of data. Against the unpartitioned table it took &lt;code &gt;4.109&lt;/code&gt; seconds, while the partitioned table returned the exact same rows in &lt;code &gt;2.878&lt;/code&gt; seconds, a difference of &lt;code &gt;1.23&lt;/code&gt; seconds.&lt;/p&gt;
&lt;p&gt;This dataset is relatively small compared to what you would typically see in a partitioned database, but if you had to run a similar query on 500 million rows, you can see how partitioning could make a big difference.&lt;/p&gt;
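&lt;p&gt;One thing worth checking in your own setup: Postgres can only skip partitions when the filter compares the partition key itself to values. A query that wraps &lt;code &gt;birth_date&lt;/code&gt; in &lt;code &gt;EXTRACT&lt;/code&gt; generally has to visit every partition. A sketch of the same filter written as a plain date range, which I have not benchmarked here, could look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Same rows as the EXTRACT version (1902 through 1919), expressed on the partition key.
-- With the partitions defined earlier, the plan should list only
-- people_partitioned_birthdays_1900_to_1950.
EXPLAIN
SELECT * FROM people_partitioned
WHERE birth_date BETWEEN &apos;1902-01-01&apos; AND &apos;1919-12-31&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code &gt;EXPLAIN&lt;/code&gt; on both forms of the query is a quick way to see which partitions Postgres actually scans.&lt;/p&gt;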
&lt;p&gt;Next, delete everybody born in 1990:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;DELETE&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; people &lt;span &gt;WHERE&lt;/span&gt; EXTRACT&lt;span &gt;(&lt;/span&gt;&lt;span &gt;year&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; birth_date&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1990&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this case, 73,015 rows were deleted from each table. The non-partitioned table took &lt;code &gt;5.431&lt;/code&gt; seconds and the partitioned table finished deleting the same rows in &lt;code &gt;3.688&lt;/code&gt; seconds, about &lt;code &gt;1.74&lt;/code&gt; seconds faster.&lt;/p&gt;
&lt;p&gt;A great use case for partitioning is data that accumulates quickly in large quantities like up-to-the-minute weather data. This data is very relevant near the time it is collected but becomes much less useful a week later. &lt;strong&gt;Because the partitions are just tables, you can just drop irrelevant tables, making deletes even faster.&lt;/strong&gt;&lt;/p&gt;
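&lt;p&gt;Using the partitions created earlier, removing a whole range of data could be sketched as one of the following. Which option fits depends on whether you still need the rows:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Option 1: remove a partition and all of its rows in one step.
DROP TABLE people_partitioned_birthdays_1800_to_1850;

-- Option 2: detach a partition so it becomes a standalone table.
-- The rows stay available for archiving, but no longer show up in people_partitioned.
ALTER TABLE people_partitioned
    DETACH PARTITION people_partitioned_birthdays_1850_to_1900;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;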
&lt;h2 id=&quot;postgres-data-partitioning-in-django&quot; &gt;&lt;a href=&quot;#postgres-data-partitioning-in-django&quot; aria-label=&quot;postgres data partitioning in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres Data Partitioning in Django&lt;/h2&gt;
&lt;p&gt;Django&apos;s ORM &lt;strong&gt;doesn&apos;t have built-in support for partitioned tables&lt;/strong&gt;, so if you want to use partitions in your application, it&apos;s going to take a little extra work.&lt;/p&gt;
&lt;p&gt;One way to use partitions is to roll &lt;a href=&quot;https://www.endpoint.com/blog/2016/09/17/executing-custom-sql-in-django-migration&quot;&gt;your own migrations that run raw SQL&lt;/a&gt;. This will work, but it means you&apos;re going to have to manually manage the migrations for all changes you make to the table in the future.&lt;/p&gt;
&lt;p&gt;Another option is to use a package called &lt;a href=&quot;https://django-postgres-extra.readthedocs.io/en/master/table_partitioning.html&quot;&gt;django-postgres-extra&lt;/a&gt;. Django-postgres-extra offers support for several PostgreSQL features that are not built into Django’s ORM, for example, support for &lt;code &gt;TRUNCATE TABLE&lt;/code&gt; and table partitioning.&lt;/p&gt;
&lt;p&gt;After you install the package, add it to your installed apps:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;INSTALLED_APPS &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
    &lt;span &gt;&apos;django.contrib.messages&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&apos;django.contrib.staticfiles&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&apos;psqlextra&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, set your partitioned model to inherit from &lt;code &gt;PostgresPartitionedModel&lt;/code&gt; from &lt;code &gt;psqlextra&lt;/code&gt;. You&apos;ll also need to add a nested &lt;code &gt;PartitioningMeta&lt;/code&gt; class to define what kind of partition you would like to use (Range, List, Hash) and the column you&apos;d like to partition by:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models
&lt;span &gt;from&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;types &lt;span &gt;import&lt;/span&gt; PostgresPartitioningMethod
&lt;span &gt;from&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; PostgresPartitionedModel

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Person&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;PostgresPartitionedModel&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;class&lt;/span&gt; &lt;span &gt;PartitioningMeta&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        method &lt;span &gt;=&lt;/span&gt; PostgresPartitioningMethod&lt;span &gt;.&lt;/span&gt;RANGE
        key &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;birth_date&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    
    full_name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;TextField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    birth_date &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;DateField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Create the migration with &lt;code &gt;python manage.py pgmakemigrations&lt;/code&gt;. You should get a file that looks something like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# Generated by Django 3.1.2 on 2020-10-13 23:34&lt;/span&gt;

&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations&lt;span &gt;,&lt;/span&gt; models
&lt;span &gt;import&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations&lt;span &gt;.&lt;/span&gt;add_default_partition
&lt;span &gt;import&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations&lt;span &gt;.&lt;/span&gt;create_partitioned_model
&lt;span &gt;import&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;manager&lt;span &gt;.&lt;/span&gt;manager
&lt;span &gt;import&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;partitioned
&lt;span &gt;import&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;types


&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;

    initial &lt;span &gt;=&lt;/span&gt; &lt;span &gt;True&lt;/span&gt;

    dependencies &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;

    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations&lt;span &gt;.&lt;/span&gt;create_partitioned_model&lt;span &gt;.&lt;/span&gt;PostgresCreatePartitionedModel&lt;span &gt;(&lt;/span&gt;
            name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Person&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            fields&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;id&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;AutoField&lt;span &gt;(&lt;/span&gt;auto_created&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; primary_key&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; serialize&lt;span &gt;=&lt;/span&gt;&lt;span &gt;False&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; verbose_name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;ID&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;full_name&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;TextField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;birth_date&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;DateField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            options&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&apos;abstract&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;False&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&apos;base_manager_name&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;objects&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            partitioning_options&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;
                &lt;span &gt;&apos;method&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;types&lt;span &gt;.&lt;/span&gt;PostgresPartitioningMethod&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;RANGE&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                &lt;span &gt;&apos;key&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;birth_date&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            bases&lt;span &gt;=&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;psqlextra&lt;span &gt;.&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;partitioned&lt;span &gt;.&lt;/span&gt;PostgresPartitionedModel&lt;span &gt;,&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            managers&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;
                &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;objects&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;manager&lt;span &gt;.&lt;/span&gt;manager&lt;span &gt;.&lt;/span&gt;PostgresManager&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            &lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations&lt;span &gt;.&lt;/span&gt;add_default_partition&lt;span &gt;.&lt;/span&gt;PostgresAddDefaultPartition&lt;span &gt;(&lt;/span&gt;
            model_name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Person&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
            name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;default&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, create some empty migration files - one for each partition.&lt;/p&gt;
&lt;p&gt;You can create an empty migration with &lt;code &gt;python manage.py makemigrations --empty yourappname&lt;/code&gt;. Then, use &lt;code &gt;django-postgres-extra&lt;/code&gt; to set up the migrations:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations
&lt;span &gt;from&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations &lt;span &gt;import&lt;/span&gt; PostgresAddRangePartition

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    dependencies &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;people&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;0001_initial&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;
    
    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        PostgresAddRangePartition&lt;span &gt;(&lt;/span&gt;
           model_name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;person&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;people_partitioned_birthdays_1800_to_1850&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           from_values&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;1800-01-01&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           to_values&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;1850-12-31&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, &lt;strong&gt;you&apos;ll need to create one of these for every partition you need&lt;/strong&gt;. To get our example from the section above to work, I would need five migrations in addition to the one created for the model.&lt;/p&gt;
&lt;p&gt;Creating a migration to delete one of your partitions is basically the same:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations&lt;span &gt;,&lt;/span&gt; models

&lt;span &gt;from&lt;/span&gt; psqlextra&lt;span &gt;.&lt;/span&gt;backend&lt;span &gt;.&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;operations &lt;span &gt;import&lt;/span&gt; PostgresDeleteRangePartition

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        PostgresDeleteRangePartition&lt;span &gt;(&lt;/span&gt;
           model_name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;person&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;people_partitioned_birthdays_1800_to_1850&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, when you use the Django model, the data will be stored across the partitions, but the ORM will work as you expect it to for any Django application.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, you learned about the different types of partitions available in PostgreSQL. You saw that a partition is just a table that links to a parent table and helps organize data so that it can be accessed faster. Finally, you saw that even though Django&apos;s ORM does not natively support partitioning, it is possible to use the feature with the help of the &lt;a href=&quot;https://django-postgres-extra.readthedocs.io/en/master/table_partitioning.html&quot;&gt;django-postgres-extra package&lt;/a&gt;. It is also possible to create your own migrations and set it up that way.&lt;/p&gt;
&lt;p&gt;No matter how you decide to go about it, it’s important to remember that you shouldn&apos;t use partitions unless you are sure it&apos;s the right move for your project. &lt;strong&gt;Partitions will make individual tables smaller but give you more tables to manage and for Postgres to search&lt;/strong&gt;. They are typically best suited to large tables, roughly 100 GB or more in size.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Josh is a former educator turned developer with a proven ability to learn quickly and adapt to different roles. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He&apos;s always looking for a new challenge and a dedicated team to collaborate with.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[GeoDjango and PostGIS in Django]]></title><description><![CDATA[In this article, I’ll introduce you to spatial data in PostgreSQL and Django. You’ll learn how to use PostGIS and GeoDjango to create, store, and manipulate geographic data (both raster and vector) in a Python web application. Spatial data is any geographic data that contains information related to the earth, such as rivers, boundaries, cities, or natural landmarks. It describes the contours, topology, size, and shape of these features. Maps are a common method of visualizing spatial data, which…]]></description><link>https://pganalyze.com/blog/geodjango-postgis</link><guid isPermaLink="false">https://pganalyze.com/blog/geodjango-postgis</guid><dc:creator><![CDATA[Adeyinka Adegbenro]]></dc:creator><pubDate>Thu, 24 Jun 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;In this article, I’ll introduce you to spatial data in PostgreSQL and Django. You’ll learn how to use &lt;a href=&quot;https://postgis.net/&quot;&gt;PostGIS&lt;/a&gt; and &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/&quot;&gt;GeoDjango&lt;/a&gt; to create, store, and manipulate geographic data (both raster and vector) in a Python web application.&lt;/p&gt;
&lt;p&gt;Spatial data is &lt;strong&gt;any geographic data that contains information related to the earth&lt;/strong&gt;, such as rivers, boundaries, cities, or natural landmarks. It describes the contours, topology, size, and shape of these features. Maps are a common method of visualizing spatial data, which is typically represented in vector or raster form. Along the way, you’ll see several use cases for spatial data that you’re likely to encounter as a software developer.&lt;/p&gt;
&lt;p&gt;If you are interested in reading about PostGIS in Rails I can recommend our &lt;a href=&quot;https://pganalyze.com/blog/postgis-rails-geocoder&quot;&gt;PostGIS vs. Geocoder in Rails&lt;/a&gt; article on the pganalyze blog where we compare PostGIS in Rails with Geocoder and highlight a couple of the areas where you&apos;ll want to (or need to) reach for one over the other.&lt;/p&gt;
&lt;h2 id=&quot;vector-data-vs-raster-data&quot; &gt;&lt;a href=&quot;#vector-data-vs-raster-data&quot; aria-label=&quot;vector data vs raster data permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Vector data vs. raster data&lt;/h2&gt;
&lt;p&gt;
&lt;img src=&quot;https://pganalyze.com/46e5c50440b998463f17090eb675b8f0/raster-vs-vector.svg&quot; alt=&quot;Raster vs. Vector spatial data&quot;&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Vector data&lt;/strong&gt; is a representation of the earth using points, lines, and polygons. A point represents a single, discrete location using an “x” and a “y” coordinate. Connected points create lines, which may be used to describe roads, streams, and networks. Polygons are formed from a closed sequence of lines and represent features with an enclosed area like buildings, islands, and borders. Vector data types are more common in relational databases than raster data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Raster data&lt;/strong&gt;, on the other hand, is a representation of geographic data in pixels. It typically refers to imagery of the earth taken from aircraft or satellites. It is usually stored as a grid of rows and columns with relevant metadata, such as measurements and resolution. Raster data is &lt;a href=&quot;https://www.esri.com/content/dam/esrisites/en-us/media/pdf/teach-with-gis/raster-faster.pdf&quot;&gt;faster and less expensive to create&lt;/a&gt; than vector data types.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#vector-data-vs-raster-data&quot;&gt;Vector data vs. raster data&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#spatial-data-in-postgres-with-postgis&quot;&gt;Spatial data in Postgres with PostGIS&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#geodjango-for-spatial-data-in-django&quot;&gt;GeoDjango for spatial data in Django&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#saving-polygons-using-geosgeometry&quot;&gt;Saving polygons Using GEOSGeometry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#saving-models-with-raster-fields-using-gdalraster&quot;&gt;Saving Models with Raster Fields Using GDALRaster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#searching-for-points-in-space-using-geometry-lookups&quot;&gt;Searching for Points in Space Using Geometry Lookups&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#calculating-the-distance-between-points&quot;&gt;Calculating the distance between points&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#more-about-geodjango-and-postgis&quot;&gt;More about GeoDjango and PostGIS&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;spatial-data-in-postgres-with-postgis&quot; &gt;&lt;a href=&quot;#spatial-data-in-postgres-with-postgis&quot; aria-label=&quot;spatial data in postgres with postgis permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Spatial data in Postgres with PostGIS&lt;/h2&gt;
&lt;p&gt;Whenever you need to answer questions about your geographic environment, such as “How far is the hospital?”, “Where is the closest store?”, “How high is that skyscraper?”, or “What is the fastest route?”, spatial data is likely to come into play.&lt;/p&gt;
&lt;p&gt;Spatial data is also &lt;strong&gt;used in statistics for analyzing patterns and relationships between elements&lt;/strong&gt;. For example, when analyzing the spread of a disease in a geographical area, hot zones can be identified and quarantined using spatial data. Spatial data can also help identify the source of an outbreak, plan the zoning of cities, and much more. Because more software applications are dependent on location, the way you manage and store spatial data is more critical than ever.&lt;/p&gt;
&lt;p&gt;PostgreSQL ships with a few basic geometric types, but on its own it does not provide full support for geospatial data, such as spatial reference systems and geographic functions. This is where PostGIS comes in. &lt;a href=&quot;https://postgis.net/&quot;&gt;PostGIS&lt;/a&gt; is a free, open-source extension that adds spatial data capabilities to PostgreSQL databases. PostGIS allows you to &lt;strong&gt;store spatial data and use its library of functions&lt;/strong&gt; to manipulate it. A database with PostGIS can store geographic coordinates, lines, and shapes and query them using spatial functions.&lt;/p&gt;
&lt;p&gt;If you use a Database-as-a-Service provider such as Amazon RDS or Google Cloud SQL, PostGIS is likely to already be installed. If you run your own server, check the &lt;a href=&quot;https://postgis.net/install/&quot;&gt;PostGIS website&lt;/a&gt; for details. Once installed, enabling PostGIS is as simple as:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; EXTENSION postgis&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, let&apos;s see how we can work with geospatial data in Django.&lt;/p&gt;
&lt;h2 id=&quot;geodjango-for-spatial-data-in-django&quot; &gt;&lt;a href=&quot;#geodjango-for-spatial-data-in-django&quot; aria-label=&quot;geodjango for spatial data in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;GeoDjango for spatial data in Django&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/&quot;&gt;GeoDjango&lt;/a&gt; is a Django module used for creating geographic applications. It can be used to manage a spatial database in Python. It comes integrated with Django, but can be used as a standalone framework as well. It aims to make it as easy as possible to create location-based web applications.&lt;/p&gt;
&lt;p&gt;In the following sections, you’ll see four different use cases for GeoDjango. These will illustrate how you can create, store, and retrieve spatial data in a Django application backed by a Postgres database that uses PostGIS. You’ll also see how to use spatial data for common operations like finding the distance between two locations in space.&lt;/p&gt;
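&lt;p&gt;Before the models in the following sections will work, the Django project itself has to be pointed at the PostGIS backend. The fragment below is a minimal sketch of the relevant settings, not a complete configuration: the database name, user, host, and the &lt;code &gt;app&lt;/code&gt; entry are placeholders assumed for illustration, while &lt;code &gt;django.contrib.gis&lt;/code&gt; and the &lt;code &gt;django.contrib.gis.db.backends.postgis&lt;/code&gt; engine are the parts GeoDjango relies on.&lt;/p&gt;

```python
# settings.py (fragment): a hypothetical GeoDjango configuration.
# Names marked "placeholder" are assumptions, not values from this article.

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.gis",  # enables GeoDjango model fields and spatial lookups
    "app",                 # placeholder: the app that holds the example models
]

DATABASES = {
    "default": {
        # PostGIS-aware backend, used instead of django.db.backends.postgresql
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "geodjango_demo",  # placeholder database name
        "USER": "geo_user",        # placeholder database user
        "HOST": "localhost",       # placeholder host
        "PORT": "5432",
    }
}
```

&lt;p&gt;With the plain PostgreSQL engine, fields such as &lt;code &gt;PolygonField&lt;/code&gt; would not have a spatial backend to map to, which is why the engine is swapped rather than extended.&lt;/p&gt;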
&lt;h3 id=&quot;saving-polygons-using-geosgeometry&quot; &gt;&lt;a href=&quot;#saving-polygons-using-geosgeometry&quot; aria-label=&quot;saving polygons using geosgeometry permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Saving polygons Using GEOSGeometry&lt;/h3&gt;
&lt;p&gt;A polygon is a type of vector data: a connected sequence of points that forms an enclosed shape. You can add a polygon to a spatial database in Django using &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/geos/#geosgeometry&quot;&gt;GEOSGeometry&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;GEOSGeometry&lt;/code&gt; class comes from the &lt;a href=&quot;https://trac.osgeo.org/geos/&quot;&gt;GEOS API&lt;/a&gt;. It takes two arguments: a string that represents the geometry being saved, and an optional second argument, &lt;a href=&quot;http://dcx.sap.com/sqla170/en/html/3c207ab56c5f1014a95ba9268e096e6a.html&quot;&gt;an SRID (spatial reference identifier) number&lt;/a&gt;. The SRID is a unique identifier for the coordinate system the coordinates are expressed in, which determines how they map to real-world locations. When performing geospatial operations such as calculating distance or area, it is important to use geometries with the same SRID as the one used in the database to ensure a correct result.&lt;/p&gt;
&lt;p&gt;To save a polygon to a spatial database using &lt;code &gt;GEOSGeometry&lt;/code&gt;, make sure a &lt;code &gt;PolygonField&lt;/code&gt; is defined on your model. Suppose you have a &lt;code &gt;Bank&lt;/code&gt; model that represents all the banks in a state with a &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/model-api/#polygonfield&quot;&gt;PolygonField&lt;/a&gt; (&lt;code &gt;poly&lt;/code&gt;) that outlines the physical boundary and shape of a particular bank branch:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Bank&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;20&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    address &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;128&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    zip_code &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    poly &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;PolygonField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;name&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To store data on such a field with GEOSGeometry, you can run the following:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; app&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Bank
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;geos &lt;span &gt;import&lt;/span&gt; GEOSGeometry
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; polygon &lt;span &gt;=&lt;/span&gt; GEOSGeometry&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;POLYGON ((-98.503358 29.335668, -98.503086 29.335668, -98.503086 29.335423, -98.503358 29.335423, -98.503358 29.335668))&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; srid&lt;span &gt;=&lt;/span&gt;&lt;span &gt;4326&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; bank &lt;span &gt;=&lt;/span&gt; Bank&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Suntrust Bank&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; address&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;144 Monsourd Blvd, San Antonio Texas, USA&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; zip_code&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;78221&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; poly&lt;span &gt;=&lt;/span&gt;polygon&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; bank&lt;span &gt;.&lt;/span&gt;save&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the &lt;code &gt;GEOSGeometry&lt;/code&gt; class, you have created a &lt;code &gt;Polygon&lt;/code&gt; object that represents an outline of a certain Suntrust bank in San Antonio, Texas. Each coordinate pair in the &lt;code &gt;POLYGON&lt;/code&gt; string is written as longitude followed by latitude and defines a “corner” of the building’s outline. The first pair is repeated at the end so that the outline forms a closed ring.&lt;/p&gt;
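&lt;p&gt;If your corner coordinates live in a Python list rather than a hand-written string, the WKT text can be assembled programmatically. The helper below is a hypothetical sketch in plain Python (it is not part of GeoDjango, and &lt;code &gt;polygon_wkt&lt;/code&gt; is a name chosen for illustration). It assumes the corners are given as longitude/latitude tuples and closes the ring by repeating the first corner when needed.&lt;/p&gt;

```python
def polygon_wkt(corners):
    """Build a WKT POLYGON string from (lon, lat) corners, closing the ring if needed."""
    ring = list(corners)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings must end where they start
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({coords}))"


# The four corners of the bank outline used in the example above.
corners = [
    (-98.503358, 29.335668),
    (-98.503086, 29.335668),
    (-98.503086, 29.335423),
    (-98.503358, 29.335423),
]
wkt = polygon_wkt(corners)
print(wkt)
```

&lt;p&gt;The resulting string can then be passed to &lt;code &gt;GEOSGeometry&lt;/code&gt; together with the SRID, exactly as in the shell session above.&lt;/p&gt;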
&lt;h3 id=&quot;saving-models-with-raster-fields-using-gdalraster&quot; &gt;&lt;a href=&quot;#saving-models-with-raster-fields-using-gdalraster&quot; aria-label=&quot;saving models with raster fields using gdalraster permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Saving Models with Raster Fields Using GDALRaster&lt;/h3&gt;
&lt;p&gt;When working with raster data, you need a field for storing a raster (called a &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/model-api/#rasterfield&quot;&gt;RasterField&lt;/a&gt;). Raster functionality used to be bundled with the core PostGIS extension, but as of &lt;a href=&quot;https://postgis.net/2019/10/20/postgis-3.0.0/&quot;&gt;PostGIS 3.0&lt;/a&gt;, it has been split out into a separate extension. After installation, make sure the extension is enabled in your database by running:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; EXTENSION postgis_raster&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, suppose you have a model called &lt;code &gt;Elevation&lt;/code&gt; with a raster field on it. The Elevation model would represent the vertical and horizontal dimension of different surfaces, and the &lt;code &gt;RasterField&lt;/code&gt; on it (&lt;code &gt;rast&lt;/code&gt;, as seen below) would be a field that takes in an abstracted raster object describing the elevation. For example, it could be a satellite mapping of the terrain of a hill:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Elevation&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    rast &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;RasterField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;RasterField&lt;/code&gt; stores a &lt;code &gt;GDALRaster&lt;/code&gt; object. &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/gdal/&quot;&gt;GDALRaster&lt;/a&gt; is an object that supports the reading of spatial file formats such as raster files. It can be instantiated with &lt;strong&gt;two inputs&lt;/strong&gt;. The first parameter can be &lt;strong&gt;a string representing a file path, a dictionary describing a new raster, or a bytes object representing the raster&lt;/strong&gt;. The second parameter &lt;strong&gt;specifies whether the raster should be opened in “write mode.”&lt;/strong&gt; If you don’t use write mode, you cannot modify the raster data.&lt;/p&gt;
&lt;p&gt;Below, &lt;code &gt;GDALRaster&lt;/code&gt; takes in the path to the &lt;code &gt;raster.tif&lt;/code&gt; file, opens the file, and abstracts it into a &lt;code &gt;GDALRaster&lt;/code&gt; object that can be stored in the model’s &lt;code &gt;RasterField&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;gdal &lt;span &gt;import&lt;/span&gt; GDALRaster
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rast &lt;span &gt;=&lt;/span&gt; GDALRaster&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;/path/to/raster/raster.tif&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; write&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rast&lt;span &gt;.&lt;/span&gt;name
&lt;span &gt;&apos;/path/to/raster/raster.tif&apos;&lt;/span&gt;

&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rast&lt;span &gt;.&lt;/span&gt;width&lt;span &gt;,&lt;/span&gt; rast&lt;span &gt;.&lt;/span&gt;height &lt;span &gt;# this file has 163 by 174 pixels&lt;/span&gt;
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;163&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;174&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; topography &lt;span &gt;=&lt;/span&gt; Elevation&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Mount Fuji&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; rast&lt;span &gt;=&lt;/span&gt;rast&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; topography&lt;span &gt;.&lt;/span&gt;save&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this way, you can store a raster loaded from a &lt;code &gt;.tif&lt;/code&gt; image file representing the terrain of Mount Fuji.&lt;/p&gt;
&lt;p&gt;A new raster can also be created using raw data from a Python &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/gdal/#the-ds-input-dictionary&quot;&gt;dictionary&lt;/a&gt; containing parameters such as width, height, scale, origin, srid, and bands. Below, you can see how to define a new raster that describes a canyon with a width and height of 10 pixels and a single band, which represents one layer of data in the raster:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rst &lt;span &gt;=&lt;/span&gt; GDALRaster&lt;span &gt;(&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&apos;width&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;10&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;height&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;10&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;name&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;canyon&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;srid&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;4326&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;bands&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;data&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;range&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;100&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rst&lt;span &gt;.&lt;/span&gt;name
&lt;span &gt;&apos;canyon&apos;&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; topography &lt;span &gt;=&lt;/span&gt; Elevation&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Canyon&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; rast&lt;span &gt;=&lt;/span&gt;rst&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; topography&lt;span &gt;.&lt;/span&gt;save&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
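&lt;p&gt;To picture what the band holds, it can help to lay the 100 values out as a grid. The sketch below uses only plain Python and assumes the flat sequence fills the raster row by row (row-major order), which is the conventional layout for raster bands. Treat that as an assumption and verify it against the band data of your own raster.&lt;/p&gt;

```python
width, height = 10, 10
band_data = list(range(width * height))  # same values as range(100) in the example above

# Split the flat sequence into `height` rows of `width` pixels each (row-major order).
rows = [band_data[r * width:(r + 1) * width] for r in range(height)]

top_left = rows[0][0]                       # first pixel of the first row
second_row_start = rows[1][0]               # first pixel of the second row
bottom_right = rows[height - 1][width - 1]  # last pixel of the last row
print(top_left, second_row_start, bottom_right)
```

&lt;p&gt;The number of values in a band has to match width times height, which is why a 10 by 10 raster is paired with 100 values here.&lt;/p&gt;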
&lt;h3 id=&quot;searching-for-points-in-space-using-geometry-lookups&quot; &gt;&lt;a href=&quot;#searching-for-points-in-space-using-geometry-lookups&quot; aria-label=&quot;searching for points in space using geometry lookups permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Searching for Points in Space Using Geometry Lookups&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/db-api/#geometry-lookups&quot;&gt;Geometry Lookups&lt;/a&gt; help you find points, lines, and polygons within another geometry. For example, you can use geometry lookups to determine if a point lies within a polygon&apos;s surface.&lt;/p&gt;
&lt;p&gt;First, create a &lt;code &gt;Country&lt;/code&gt; model defined as follows:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Country&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    area &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    pop2005 &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Population 2005&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    fips &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;FIPS Code&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    iso2 &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;2 Digit ISO&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    iso3 &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;3 Digit ISO&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    un &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;United Nations Code&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    region &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Region Code&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    subregion &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;IntegerField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Sub-Region Code&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    lon &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;FloatField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    lat &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;FloatField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;# GeoDjango-specific: a geometry field (MultiPolygonField)&lt;/span&gt;
    mpoly &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;MultiPolygonField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;# Returns the string representation of the model.&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;name &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code &gt;Country&lt;/code&gt; represents a table that stores the boundaries of the world’s countries. Next, you can use GeoDjango to check whether a particular &lt;code &gt;Point&lt;/code&gt; lies within the &lt;code &gt;mpoly&lt;/code&gt; geometry of one of the countries in the database:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; app&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Country
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;geos &lt;span &gt;import&lt;/span&gt; Point
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; point &lt;span &gt;=&lt;/span&gt; Point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;954158.1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;4215137.1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; srid&lt;span &gt;=&lt;/span&gt;&lt;span &gt;32140&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Country&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;mpoly__contains&lt;span &gt;=&lt;/span&gt;point&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;Country&lt;span &gt;:&lt;/span&gt; United States&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also check whether a point is inside one particular country directly in Python, once that country has been loaded from the database. The code below defines a &lt;code &gt;Point&lt;/code&gt; object that represents a location in Valdagrone, San Marino. Then, it tests that &lt;code &gt;Point&lt;/code&gt; against the country’s geometry using the &lt;code &gt;contains&lt;/code&gt; method:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; san_marino &lt;span &gt;=&lt;/span&gt; Country&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;get&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;San Marino&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pnt &lt;span &gt;=&lt;/span&gt; Point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;12.4604&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;43.9420&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# Valdagrone, San Marino&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; san_marino&lt;span &gt;.&lt;/span&gt;mpoly&lt;span &gt;.&lt;/span&gt;contains&lt;span &gt;(&lt;/span&gt;pnt&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;True&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
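&lt;p&gt;To build some intuition for what a containment test does, here is a small ray-casting sketch in plain Python. It is an illustration only and not how GEOS or PostGIS implement &lt;code &gt;contains&lt;/code&gt;: it handles a single simple ring, ignores holes and edge cases such as points exactly on the boundary, and the function name &lt;code &gt;point_in_ring&lt;/code&gt; is made up for this example. The ring reuses the bank outline from earlier in the article.&lt;/p&gt;

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing to the right flips the state
    return inside


bank_outline = [
    (-98.503358, 29.335668),
    (-98.503086, 29.335668),
    (-98.503086, 29.335423),
    (-98.503358, 29.335423),
    (-98.503358, 29.335668),
]

print(point_in_ring(-98.5032, 29.3355, bank_outline))  # a point inside the outline
print(point_in_ring(-98.5040, 29.3355, bank_outline))  # a point west of the outline
```

&lt;p&gt;An odd number of crossings means the point is inside the ring, an even number means it is outside. In a real application you would rely on the GeoDjango lookups shown above, which also take care of multipolygons, holes, and spatial indexes.&lt;/p&gt;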
&lt;h3 id=&quot;calculating-the-distance-between-points&quot; &gt;&lt;a href=&quot;#calculating-the-distance-between-points&quot; aria-label=&quot;calculating the distance between points permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Calculating the distance between points&lt;/h3&gt;
&lt;p&gt;Finally, GeoDjango can be used to calculate the distance between two points. Assuming you know two point coordinates and want to find the distance between them, you could run the following in your Python shell:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;geos &lt;span &gt;import&lt;/span&gt; GEOSGeometry
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; point1 &lt;span &gt;=&lt;/span&gt; GEOSGeometry&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SRID=4326;POINT(-167.8522796630859 65.55173492431641)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;transform&lt;span &gt;(&lt;/span&gt;&lt;span &gt;900913&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clone&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# Tin City, Alaska&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; point2 &lt;span &gt;=&lt;/span&gt; GEOSGeometry&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SRID=4326;POINT(-165.4089813232422 64.50033569335938)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;transform&lt;span &gt;(&lt;/span&gt;&lt;span &gt;900913&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clone&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# Nome, Alaska&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; distance &lt;span &gt;=&lt;/span&gt; point1&lt;span &gt;.&lt;/span&gt;distance&lt;span &gt;(&lt;/span&gt;point2&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# in meters&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; distance &lt;span &gt;/&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt; &lt;span &gt;# in Kilometers&lt;/span&gt;
&lt;span &gt;388.3890308954561&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
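&lt;p&gt;This example uses the &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/contrib/gis/gdal/#django.contrib.gis.gdal.OGRGeometry.transform&quot;&gt;transform&lt;/a&gt; method to convert the Point coordinates from latitude/longitude decimal degrees (SRID 4326) to a projected coordinate system (SRID 900913, a Web Mercator projection) whose units are meters, so that &lt;code &gt;distance&lt;/code&gt; returns meters rather than degrees. Keep in mind that Web Mercator stretches distances at high latitudes, so treat the result as an approximation.&lt;/p&gt;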
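&lt;p&gt;To get a feel for what the projection does to the numbers, here is a minimal pure-Python sketch that does not use GeoDjango at all. It assumes that SRID 900913 behaves like the standard spherical Web Mercator projection with a radius of 6378137 meters, and it compares the planar distance in that projection with a great-circle (haversine) distance on the same sphere. The function names are made up for illustration.&lt;/p&gt;

```python
import math

# Assumption: sphere radius used by spherical Web Mercator, in meters.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to Web Mercator x/y in projected meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere of the same radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Same coordinates as in the shell session above (lon, lat).
tin_city = (-167.8522796630859, 65.55173492431641)
nome = (-165.4089813232422, 64.50033569335938)

x1, y1 = to_web_mercator(*tin_city)
x2, y2 = to_web_mercator(*nome)

# Straight-line distance between the projected points.
mercator_km = math.hypot(x2 - x1, y2 - y1) / 1000
# Distance along the surface of the sphere.
great_circle_km = haversine_m(*tin_city, *nome) / 1000

print(round(mercator_km), round(great_circle_km))
```

&lt;p&gt;At around 65 degrees north the projected distance comes out at more than twice the great-circle distance, because Web Mercator scales lengths by roughly 1 / cos(latitude). If you need ground distances at high latitudes, a projection suited to the region or a geodesic calculation is likely a better choice.&lt;/p&gt;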
&lt;p&gt;To illustrate a more Django-specific example, you could create a model for cities in the United States that looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Cities&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    feature &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;20&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;30&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    county &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;20&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    state &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;20&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    the_geom &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;PointField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;# Returns the string representation of the model.&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;name &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To calculate the distance between the cities &lt;strong&gt;Point Hope&lt;/strong&gt; and &lt;strong&gt;Point Lay&lt;/strong&gt;, you can use the models like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; app&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Cities
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pt_hope &lt;span &gt;=&lt;/span&gt; Cities&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;get&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Point Hope&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pt_lay &lt;span &gt;=&lt;/span&gt; Cities&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;get&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;Point Lay&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pt_hope_meters &lt;span &gt;=&lt;/span&gt; pt_hope&lt;span &gt;.&lt;/span&gt;the_geom&lt;span &gt;.&lt;/span&gt;transform&lt;span &gt;(&lt;/span&gt;&lt;span &gt;900913&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clone&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pt_lay_meters &lt;span &gt;=&lt;/span&gt; pt_lay&lt;span &gt;.&lt;/span&gt;the_geom&lt;span &gt;.&lt;/span&gt;transform&lt;span &gt;(&lt;/span&gt;&lt;span &gt;900913&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clone&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pt_hope_meters&lt;span &gt;.&lt;/span&gt;distance&lt;span &gt;(&lt;/span&gt;pt_lay_meters&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;594946.4349305361&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;GeoDjango also provides some &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/db-api/#distance-lookups&quot;&gt;distance lookup&lt;/a&gt; functions such as &lt;code &gt;distance_lt&lt;/code&gt;, &lt;code &gt;distance_lte&lt;/code&gt;, &lt;code &gt;distance_gt&lt;/code&gt;, &lt;code &gt;distance_gte&lt;/code&gt; and &lt;code &gt;dwithin&lt;/code&gt;. For example:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;geos &lt;span &gt;import&lt;/span&gt; Point
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;gis&lt;span &gt;.&lt;/span&gt;measure &lt;span &gt;import&lt;/span&gt; D
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pnt &lt;span &gt;=&lt;/span&gt; Point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;163.0928955078125&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;69.72028350830078&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# Point Lay&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; dist &lt;span &gt;=&lt;/span&gt; Cities&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;the_geom__distance_lte&lt;span &gt;=&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;pnt&lt;span &gt;,&lt;/span&gt; D&lt;span &gt;(&lt;/span&gt;km&lt;span &gt;=&lt;/span&gt;&lt;span &gt;7&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# find all cities within 7 kilometers of Point Lay&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; dist &lt;span &gt;=&lt;/span&gt; Cities&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;the_geom__distance_gte&lt;span &gt;=&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;pnt&lt;span &gt;,&lt;/span&gt; D&lt;span &gt;(&lt;/span&gt;mi&lt;span &gt;=&lt;/span&gt;&lt;span &gt;20&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# find all cities greater than or equal to 20 miles away from Point Lay&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this way, you can use GeoDjango to find the distance between two models having location points or two raw point objects. Combining this method with vector or raster data about roads, you could build complex distance calculations for driving, walking, or biking into your application.&lt;/p&gt;
&lt;h2 id=&quot;more-about-geodjango-and-postgis&quot; &gt;&lt;a href=&quot;#more-about-geodjango-and-postgis&quot; aria-label=&quot;more about geodjango and postgis permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;More about GeoDjango and PostGIS&lt;/h2&gt;
&lt;p&gt;Spatial data has many important real-world use cases. In this post, you’ve seen how PostGIS and GeoDjango can help you use spatial data to build location-aware web applications, but there’s still much more to learn about the topic. Be sure to check out the &lt;a href=&quot;https://postgis.net/workshops/postgis-intro/&quot;&gt;PostGIS Introduction Documentation&lt;/a&gt; and &lt;a href=&quot;https://docs.djangoproject.com/en/3.2/ref/contrib/gis/db-api/&quot;&gt;GeoDjango API&lt;/a&gt; for more information and examples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22Using%20GeoDjango%20and%20PostGIS%20in%20Django%22%20-%20This%20article%20by%20%40pganalyze%20shows%20how%20to%20get%20started%20with%20GeoDjango%20and%20PostGIS%20to%20work%20with%20geospatial%20data%20in%20Postgres%3A%20https://pganalyze.com/blog/geodjango-postgis&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Using Postgres Row-Level Security in Ruby on Rails]]></title><description><![CDATA[Securing access to your Postgres database is more important than ever. With applications growing more complex, often times using multiple programming languages and frameworks within the same app, it can be challenging to ensure access to customer data is handled consistently. For example, if you are building a SaaS application where different companies use the application, you don't want users of Company A to see the data of users in Company B by accident. Sure, you could use create a separate…]]></description><link>https://pganalyze.com/blog/postgres-row-level-security-ruby-rails</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-row-level-security-ruby-rails</guid><dc:creator><![CDATA[Eze Sunday Eze]]></dc:creator><pubDate>Tue, 25 May 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Securing access to your Postgres database is more important than ever. With applications growing more complex, often times using multiple programming languages and frameworks within the same app, it can be challenging to ensure access to customer data is handled consistently. For example, if you are building a SaaS application where different companies use the application, you don&apos;t want users of Company A to see the data of users in Company B by accident.&lt;/p&gt;
&lt;p&gt;Sure, you could create a separate Postgres schema for each customer, or try to ensure the &lt;code &gt;WHERE&lt;/code&gt; clause of every single query includes the particular company, but what if you forget a &lt;code &gt;WHERE&lt;/code&gt; clause? That means users from Company A will be able to see or manipulate data from Company B, and maybe other companies, at some point. You don&apos;t want that to happen.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Row-Level Security (RLS)&lt;/strong&gt; solves this problem. It is an additional layer of security that allows you to limit access to database rows based on the currently logged in database user or other attributes of a Postgres connection. With RLS, you wouldn&apos;t even need to add a &lt;code &gt;WHERE&lt;/code&gt; clause to your queries to limit access to certain rows because users will be able to access only rows that the Row-Level Security policy allows them to have access to.&lt;/p&gt;
&lt;p&gt;In this post, you are going to learn how Row-Level Security works with Postgres and how you can implement it in your Rails app. As a side note: Should you be interested in learning how to use Row-Level Security with Python and Django, you can read our dedicated article about it here: &lt;a href=&quot;https://pganalyze.com/blog/postgres-row-level-security-django-python&quot;&gt;Using Postgres Row-Level Security in Python and Django&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/556a19e96ff00acbb165a3bd523cc5e5/db7ce/postgres-row-level-security-rails.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;SQL statements for setting up row-level security&quot; title=&quot;SQL statements for setting up row-level security&quot; src=&quot;https://pganalyze.com/static/556a19e96ff00acbb165a3bd523cc5e5/1d69c/postgres-row-level-security-rails.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#row-level-security-in-postgres&quot;&gt;Row-Level Security in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-to-create-a-postgres-rls-policy&quot;&gt;How to Create a Postgres RLS Policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-postgres-session-variables-in-row-level-security-policies&quot;&gt;Using Postgres session variables in Row-Level Security policies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#testing-rls-permissions-with-different-customers&quot;&gt;Testing RLS permissions with different customers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#row-level-security-in-ruby-on-rails&quot;&gt;Row-Level Security in Ruby on Rails&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#creating-rls-enabled-tables-in-rails-migrations&quot;&gt;Creating RLS enabled tables in Rails migrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-separate-users-for-migrations-in-rails&quot;&gt;Using separate users for migrations in Rails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-the-customer-id-in-rails&quot;&gt;Setting the Customer ID in Rails&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#performance-implications-of-using-rls-in-postgres&quot;&gt;Performance Implications of using RLS in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;row-level-security-in-postgres&quot; &gt;&lt;a href=&quot;#row-level-security-in-postgres&quot; aria-label=&quot;row level security in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Row-Level Security in Postgres&lt;/h2&gt;
&lt;p&gt;Row-Level Security is an advanced security feature that was first released in &lt;a href=&quot;https://www.postgresql.org/docs/current/ddl-rowsecurity.html&quot;&gt;PostgreSQL 9.5&lt;/a&gt;. Instead of adding restrictions to an entire table, with RLS, we can add fine-grained access restrictions for individual rows based on policies. You can imagine RLS like an implicit &lt;code &gt;WHERE&lt;/code&gt; clause that automatically gets added to all your reads and writes on specific tables.&lt;/p&gt;
&lt;p&gt;There are trade-offs to consider with RLS, and it may not always be the best fit because of implementation complexity and performance implications. We&apos;ll get to these later on, but let&apos;s take a look at how RLS works first.&lt;/p&gt;
&lt;h3 id=&quot;how-to-create-a-postgres-rls-policy&quot; &gt;&lt;a href=&quot;#how-to-create-a-postgres-rls-policy&quot; aria-label=&quot;how to create a postgres rls policy permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How to Create a Postgres RLS Policy&lt;/h3&gt;
&lt;p&gt;For this example we assume our customers store financial records with us, and we are looking to use RLS for ensuring no data gets shared by accident with other customers.&lt;/p&gt;
&lt;p&gt;Let&apos;s create a &lt;code &gt;transactions&lt;/code&gt; table to start with:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    id uuid &lt;span &gt;PRIMARY&lt;/span&gt; &lt;span &gt;KEY&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;DEFAULT&lt;/span&gt; gen_random_uuid&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    customer_id &lt;span &gt;bigint&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    description &lt;span &gt;text&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    amount_cents &lt;span &gt;bigint&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    created_at timestamptz &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We could use the &lt;code &gt;GRANT&lt;/code&gt; mechanism in Postgres to restrict access, but that only works in an all-or-nothing approach - it doesn&apos;t let you restrict access to certain rows in the table.&lt;/p&gt;
&lt;p&gt;This is what Row-Level Security helps us with. Enable RLS on the &lt;code &gt;transactions&lt;/code&gt; table we just created using the &lt;code &gt;ALTER TABLE&lt;/code&gt; command:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt; &lt;span &gt;ENABLE&lt;/span&gt; &lt;span &gt;ROW&lt;/span&gt; &lt;span &gt;LEVEL&lt;/span&gt; SECURITY&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since we have not created a policy yet, this will &lt;strong&gt;enable a default-deny policy on the table&lt;/strong&gt;, meaning all access is denied. However, the table owner, superusers and roles with the &lt;code &gt;BYPASSRLS&lt;/code&gt; attribute will not be subject to this policy.&lt;/p&gt;
&lt;p&gt;Now, we&apos;ll need to create a policy that defines the database access for our application user, depending on which end customer is currently logged in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For mapping an end customer to an RLS policy, we have two options:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;(1) Create separate database users for each customer, and check the &lt;code &gt;current_user&lt;/code&gt; in the RLS policy&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;(2) Use a session variable that indicates which customer is logged in, e.g. by calling &lt;code &gt;SET rls.customer_id = 42&lt;/code&gt;, and then checking that in the policy using &lt;code &gt;current_setting&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a security and isolation perspective, using separate database users is clearly better, but it ends up being complicated to manage in practice. This is especially the case when using a framework like Ruby on Rails that would have to maintain per-user connection pools. You can take a look at the &lt;a href=&quot;https://www.postgresql.org/docs/current/ddl-rowsecurity.html&quot;&gt;Postgres documentation&lt;/a&gt; to see an example of how RLS with separate database users works.&lt;/p&gt;
&lt;h3 id=&quot;using-postgres-session-variables-in-row-level-security-policies&quot; &gt;&lt;a href=&quot;#using-postgres-session-variables-in-row-level-security-policies&quot; aria-label=&quot;using postgres session variables in row level security policies permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using Postgres session variables in Row-Level Security policies&lt;/h3&gt;
&lt;p&gt;For this post we&apos;ll focus on using session variables for determining which end customer is currently logged in, and checking that variable in our RLS policy.&lt;/p&gt;
&lt;p&gt;We&apos;ll use the variable &lt;code &gt;rls.customer_id&lt;/code&gt; to identify the current customer ID. Note that you can use any variable name, e.g. &lt;code &gt;myapp.user_id&lt;/code&gt; - just make sure the name doesn&apos;t conflict with any Postgres config settings.&lt;/p&gt;
&lt;p&gt;To start, we&apos;ll create a new Postgres database user for our application. It&apos;s generally a good practice to keep administrative Postgres users separate from regular users, and this is especially important with RLS since the administrative user would typically be the table owner, and that user would by default always have full access to the table.&lt;/p&gt;
&lt;p&gt;Let&apos;s create the user on our database:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;USER&lt;/span&gt; app_user&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;GRANT&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;INSERT&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;UPDATE&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;DELETE&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt; &lt;span &gt;TO&lt;/span&gt; app_user&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If we connect with this user to the database, and attempt to query the transactions table, we&apos;ll get an empty result.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; id | customer_id | description | amount_cents | created_at 
----+-------------+-------------+--------------+------------
(0 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is by design - if the RLS policy denies access for a SELECT you will simply get an empty result. You can imagine the default-deny RLS policy as a &lt;code &gt;WHERE false&lt;/code&gt; clause that will always return nothing.&lt;/p&gt;
&lt;p&gt;When we attempt to insert data we can see the RLS policy in effect more easily:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;,&lt;/span&gt; amount_cents&lt;span &gt;,&lt;/span&gt; created_at&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;test&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;4200&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;2020-01-01 00:00:00&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;ERROR:  new row violates row-level security policy for table &quot;transactions&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&apos;s create a policy to replace the default-deny policy, that allows access based on the current value of the &lt;code &gt;rls.customer_id&lt;/code&gt; session variable:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; POLICY transactions_app_user
  &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;
  &lt;span &gt;TO&lt;/span&gt; app_user
  &lt;span &gt;USING&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;NULLIF&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;current_setting&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;rls.customer_id&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;TRUE&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;bigint&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This permits &lt;code &gt;SELECT&lt;/code&gt;, &lt;code &gt;INSERT&lt;/code&gt;, &lt;code &gt;UPDATE&lt;/code&gt; and &lt;code &gt;DELETE&lt;/code&gt; access if the value of the &lt;code &gt;customer_id&lt;/code&gt; column matches the &lt;code &gt;rls.customer_id&lt;/code&gt; session variable. Since session variables use the text type, we need to cast the value to &lt;code &gt;bigint&lt;/code&gt; in the policy definition for comparison, and use &lt;code &gt;NULLIF&lt;/code&gt; to turn an empty value into &lt;code &gt;NULL&lt;/code&gt; before the cast. An empty string cannot be cast to &lt;code &gt;bigint&lt;/code&gt;, whereas &lt;code &gt;NULL&lt;/code&gt; never matches any &lt;code &gt;customer_id&lt;/code&gt;, meaning no access.&lt;/p&gt;
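&lt;p&gt;If it helps to reason about the policy outside of Postgres, the following is a toy in-memory model written in Python. It is only a sketch of the semantics described above, not how Postgres implements RLS: a plain dict stands in for the session settings, &lt;code &gt;None&lt;/code&gt; stands in for SQL &lt;code &gt;NULL&lt;/code&gt;, and all function names are hypothetical.&lt;/p&gt;

```python
def current_customer_id(session_settings):
    """Mimic NULLIF(current_setting('rls.customer_id', TRUE), '')::bigint."""
    raw = session_settings.get("rls.customer_id")  # None models an unset variable
    if raw is None or raw == "":
        return None  # NULL never equals a customer_id, so no row qualifies
    return int(raw)  # session variables are text, hence the cast


def policy_allows(row, session_settings):
    """Model of the USING expression: customer_id = current customer id."""
    allowed = current_customer_id(session_settings)
    return allowed is not None and row["customer_id"] == allowed


def select_all(table, session_settings):
    """Reads behave as if the policy were an extra WHERE clause."""
    return [row for row in table if policy_allows(row, session_settings)]


def insert(table, row, session_settings):
    """Writes are rejected when the new row would not pass the policy."""
    if not policy_allows(row, session_settings):
        raise PermissionError("new row violates row-level security policy")
    table.append(row)


transactions = []
session = {"rls.customer_id": "1"}
insert(transactions, {"customer_id": 1, "description": "test"}, session)

try:
    insert(transactions, {"customer_id": 2, "description": "test2"}, session)
    rejected = False
except PermissionError:
    rejected = True

visible_to_1 = select_all(transactions, session)
visible_to_2 = select_all(transactions, {"rls.customer_id": "2"})
visible_unset = select_all(transactions, {})
print(rejected, len(visible_to_1), len(visible_to_2), len(visible_unset))
```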
&lt;p&gt;When we connect now, we would still get the same error by default, since &lt;code &gt;rls.customer_id&lt;/code&gt; would not be set yet. However, when we set the &lt;code &gt;rls.customer_id&lt;/code&gt;, our query will succeed:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; rls&lt;span &gt;.&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;,&lt;/span&gt; amount_cents&lt;span &gt;,&lt;/span&gt; created_at&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;test&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;4200&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;2020-01-01 00:00:00&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;testing-rls-permissions-with-different-customers&quot; &gt;&lt;a href=&quot;#testing-rls-permissions-with-different-customers&quot; aria-label=&quot;testing rls permissions with different customers permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Testing RLS permissions with different customers&lt;/h3&gt;
&lt;p&gt;If we attempt to add a record for a different customer, that will fail, since it violates the RLS policy:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;,&lt;/span&gt; amount_cents&lt;span &gt;,&lt;/span&gt; created_at&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;test2&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2300&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;2020-01-01 00:00:00&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;ERROR:  new row violates row-level security policy for table &quot;transactions&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we query the table, we can see the row we just added:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                  id                  | customer_id | description | amount_cents |     created_at      
--------------------------------------+-------------+-------------+--------------+---------------------
 bfd4b810-487d-4622-af24-73d284fb90d4 |           1 | test        |         4200 | 2020-01-01 00:00:00
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, if we change the customer ID, the data of the other customer is no longer visible:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; rls&lt;span &gt;.&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; id | customer_id | description | amount_cents | created_at 
----+-------------+-------------+--------------+------------
(0 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see how this provides protection against accidentally inserting or querying the data for the wrong customer. If you consistently set (and reset!) the &lt;code &gt;rls.customer_id&lt;/code&gt; variable, all queries only see data for that particular customer.&lt;/p&gt;
&lt;p&gt;Now, the big caveat with this approach is that &lt;strong&gt;SQL injection could enable an attacker to issue their own SET command&lt;/strong&gt;, and thereby access other customers&apos; data. The session-variable-based approach is only safe when you protect yourself against SQL injection. Modern frameworks like Ruby on Rails are generally good at this, but you may want to consider running additional tools like &lt;a href=&quot;https://brakemanscanner.org/docs/warning_types/sql_injection/&quot;&gt;brakeman&lt;/a&gt; to ensure hand-written queries are correctly sanitized.&lt;/p&gt;
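&lt;p&gt;One small piece of this is making sure the &lt;code &gt;SET&lt;/code&gt; statement itself cannot become an injection vector. The following is a minimal sketch in plain Ruby, assuming customer IDs are positive integers; the helper name &lt;code &gt;build_set_customer_id_sql&lt;/code&gt; is hypothetical and not part of Rails. It does not protect against injection elsewhere in the application.&lt;/p&gt;

```ruby
# Hypothetical helper: builds the SET statement only from values that
# parse strictly as base-10 integers, and rejects everything else.
def build_set_customer_id_sql(customer_id)
  id = Integer(customer_id.to_s, 10) # raises ArgumentError for "2; SET ..."
  raise ArgumentError, 'customer_id must be positive' unless id.positive?

  "SET rls.customer_id = #{id}"
end

puts build_set_customer_id_sql('2')

begin
  build_set_customer_id_sql('2; SET rls.customer_id = 1')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```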
&lt;p&gt;Let&apos;s see how you can implement RLS in your Rails app.&lt;/p&gt;
&lt;h2 id=&quot;row-level-security-in-ruby-on-rails&quot; &gt;&lt;a href=&quot;#row-level-security-in-ruby-on-rails&quot; aria-label=&quot;row level security in ruby on rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Row-Level Security in Ruby on Rails&lt;/h2&gt;
&lt;p&gt;Ruby on Rails does not provide any built-in integration with RLS, and as mentioned earlier, it&apos;s complicated to use an RLS setup where you have one database user per end customer, since Rails would have to keep separate connection pools for each user. The session-variable-based approach, however, is fairly straightforward to implement.&lt;/p&gt;
&lt;p&gt;First of all, let&apos;s review what we need to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(1) Set up our database and tables for RLS through Rails migrations&lt;/li&gt;
&lt;li&gt;(2) Use a different user for our application than for our migrations&lt;/li&gt;
&lt;li&gt;(3) Set the customer ID when entering a customer-specific context, and reset the customer ID when exiting that context (to avoid leaks)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&apos;s take a look at the migration first:&lt;/p&gt;
&lt;h3 id=&quot;creating-rls-enabled-tables-in-rails-migrations&quot; &gt;&lt;a href=&quot;#creating-rls-enabled-tables-in-rails-migrations&quot; aria-label=&quot;creating rls enabled tables in rails migrations permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating RLS enabled tables in Rails migrations&lt;/h3&gt;
&lt;p&gt;To keep things simple, we&apos;ll assume that you are adding the &lt;code &gt;Transaction&lt;/code&gt; model and associated &lt;code &gt;transactions&lt;/code&gt; table to the application. This example assumes you have dropped the table we created manually earlier.&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;rails g model transaction&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Transaction&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateTransactions&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    create_table &lt;span &gt;:transactions&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; id&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:uuid&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;t&lt;span &gt;|&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;bigint &lt;span &gt;:customer_id&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;text &lt;span &gt;:description&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;bigint &lt;span &gt;:amount_cents&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;timestamptz &lt;span &gt;:created_at&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;

    &lt;span &gt;# Grant application user permissions on the table (this migration should run as the admin user)&lt;/span&gt;
    reversible &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;dir&lt;span &gt;|&lt;/span&gt;
      dir&lt;span &gt;.&lt;/span&gt;up &lt;span &gt;do&lt;/span&gt;
        execute &lt;span &gt;&apos;GRANT SELECT, INSERT, UPDATE, DELETE ON transactions TO app_user&apos;&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
      dir&lt;span &gt;.&lt;/span&gt;down &lt;span &gt;do&lt;/span&gt;
        execute &lt;span &gt;&apos;REVOKE SELECT, INSERT, UPDATE, DELETE ON transactions FROM app_user&apos;&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;

    &lt;span &gt;# Define RLS policy&lt;/span&gt;
    reversible &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;dir&lt;span &gt;|&lt;/span&gt;
      dir&lt;span &gt;.&lt;/span&gt;up &lt;span &gt;do&lt;/span&gt;
        execute &lt;span &gt;&apos;ALTER TABLE transactions ENABLE ROW LEVEL SECURITY&apos;&lt;/span&gt;
        execute &lt;span &gt;&quot;CREATE POLICY transactions_app_user ON transactions TO app_user USING (customer_id = NULLIF(current_setting(&apos;rls.customer_id&apos;, TRUE), &apos;&apos;)::bigint)&quot;&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
      dir&lt;span &gt;.&lt;/span&gt;down &lt;span &gt;do&lt;/span&gt;
        execute &lt;span &gt;&apos;DROP POLICY transactions_app_user ON transactions&apos;&lt;/span&gt;
        execute &lt;span &gt;&apos;ALTER TABLE transactions DISABLE ROW LEVEL SECURITY&apos;&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
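&lt;p&gt;If several tables need the same kind of policy, the SQL strings passed to &lt;code &gt;execute&lt;/code&gt; could be generated rather than written by hand. This is only a sketch: &lt;code &gt;rls_statements&lt;/code&gt; is a hypothetical helper, and it assumes the table, role and column names are trusted identifiers that come from your own migration code.&lt;/p&gt;

```ruby
# Hypothetical helper: returns the two "up" statements from the migration
# above, parameterized by table and role. Identifiers are assumed to come
# from trusted migration code, so they are checked rather than quoted.
IDENTIFIER = /\A[a-z_][a-z0-9_]*\z/

def rls_statements(table, role, column: 'customer_id', setting: 'rls.customer_id')
  [table, role, column].each do |name|
    raise ArgumentError, "unexpected identifier: #{name}" unless IDENTIFIER.match?(name)
  end

  [
    "ALTER TABLE #{table} ENABLE ROW LEVEL SECURITY",
    "CREATE POLICY #{table}_#{role} ON #{table} TO #{role} " \
      "USING (#{column} = NULLIF(current_setting('#{setting}', TRUE), '')::bigint)"
  ]
end

rls_statements('transactions', 'app_user').each { |sql| puts sql }
```

&lt;p&gt;For &lt;code &gt;transactions&lt;/code&gt; and &lt;code &gt;app_user&lt;/code&gt; this produces the same two statements as the &lt;code &gt;dir.up&lt;/code&gt; block of the migration.&lt;/p&gt;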
&lt;p&gt;Before we run the migration, let&apos;s make sure we use two separate database users for migrations and the actual application.&lt;/p&gt;
&lt;h3 id=&quot;using-separate-users-for-migrations-in-rails&quot; &gt;&lt;a href=&quot;#using-separate-users-for-migrations-in-rails&quot; aria-label=&quot;using separate users for migrations in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using separate users for migrations in Rails&lt;/h3&gt;
&lt;p&gt;Whilst Rails now has built-in support for multiple database connections, it&apos;s not really suited for running the migrations with a different user. Luckily, there is a simple solution that works in most Rails versions.&lt;/p&gt;
&lt;p&gt;Typically a Rails production app has a Procfile that is used to define which process types can be created for the app. On your local machine &lt;a href=&quot;https://github.com/ddollar/foreman&quot;&gt;foreman&lt;/a&gt; can be used to handle the Procfile. The simplest Procfile looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;yml&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;web&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; bundle exec puma &lt;span &gt;-&lt;/span&gt;C ./config/puma.rb
&lt;span &gt;console&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; bundle exec rails console
&lt;span &gt;migrate&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; bundle exec rake db&lt;span &gt;:&lt;/span&gt;migrate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We&apos;ll assume that you commonly specify the database connection using the &lt;code &gt;DATABASE_URL&lt;/code&gt; variable, as would be the case when using Heroku for example. Through a bash variable substitution we can use a separate environment variable called &lt;code &gt;DATABASE_URL_ADMIN&lt;/code&gt; for database migrations:&lt;/p&gt;
&lt;div  data-language=&quot;yml&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;web&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; bundle exec puma &lt;span &gt;-&lt;/span&gt;C ./config/puma.rb
&lt;span &gt;console&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; bundle exec rails console
&lt;span &gt;migrate&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; DATABASE_URL=$&lt;span &gt;{&lt;/span&gt;DATABASE_URL_ADMIN&lt;span &gt;:&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;$DATABASE_URL&lt;span &gt;}&lt;/span&gt; bundle exec rake db&lt;span &gt;:&lt;/span&gt;migrate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
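&lt;p&gt;The &lt;code &gt;${DATABASE_URL_ADMIN:-$DATABASE_URL}&lt;/code&gt; expansion falls back to &lt;code &gt;DATABASE_URL&lt;/code&gt; when &lt;code &gt;DATABASE_URL_ADMIN&lt;/code&gt; is unset or empty. The same rule is sketched in Ruby below purely for illustration; the method name is hypothetical, and neither Rails nor foreman uses it.&lt;/p&gt;

```ruby
# Hypothetical helper mirroring the shell expansion ${ADMIN:-$DEFAULT}:
# use the admin URL when it is set and non-empty, else the regular URL.
def migration_database_url(env = ENV)
  admin = env['DATABASE_URL_ADMIN']
  admin.nil? || admin.empty? ? env['DATABASE_URL'] : admin
end

app_url = 'postgresql://app_user@127.0.0.1:5432/rlstest'
admin_url = 'postgresql://app_admin@127.0.0.1:5432/rlstest'

# Both variables set: migrations would connect as the admin user.
puts migration_database_url({ 'DATABASE_URL' => app_url, 'DATABASE_URL_ADMIN' => admin_url })

# Admin variable missing: falls back to the application URL.
puts migration_database_url({ 'DATABASE_URL' => app_url })
```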
&lt;p&gt;For local testing we can configure both database connection variables in our &lt;code &gt;.env&lt;/code&gt; file:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;DATABASE_URL=postgresql://app_user@127.0.0.1:5432/rlstest
DATABASE_URL_ADMIN=postgresql://app_admin@127.0.0.1:5432/rlstest&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we call foreman, it will result in the migrations running as the admin user:&lt;/p&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;foreman run migrate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Similarly, on Heroku, you could have this run as part of your release command, or manually trigger the &lt;code &gt;migrate&lt;/code&gt; process type.&lt;/p&gt;
&lt;p&gt;Again, &lt;strong&gt;this separation is important so we can ensure the application always sets a particular customer ID for queries&lt;/strong&gt;, and does not get the &quot;free for all&quot; that table owners get which permits access on the whole table.&lt;/p&gt;
&lt;p&gt;In case you want to run some of your code as the admin user you could set up a separate connection for that using Rails&apos; &lt;a href=&quot;https://guides.rubyonrails.org/active_record_multiple_databases.html&quot;&gt;multiple database connections feature&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;setting-the-customer-id-in-rails&quot; &gt;&lt;a href=&quot;#setting-the-customer-id-in-rails&quot; aria-label=&quot;setting the customer id in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Setting the Customer ID in Rails&lt;/h3&gt;
&lt;p&gt;To ensure we access the database with the correct customer ID, we can first add helpers to the &lt;code &gt;ApplicationRecord&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Base&lt;/span&gt;
  &lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;abstract_class &lt;span &gt;=&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;

  &lt;span &gt;SET_CUSTOMER_ID_SQL&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;SET rls.customer_id = %s&apos;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;freeze
  &lt;span &gt;RESET_CUSTOMER_ID_SQL&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;RESET rls.customer_id&apos;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;freeze
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;with_customer_id&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;amp;&lt;/span&gt;block&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;begin&lt;/span&gt;
      connection&lt;span &gt;.&lt;/span&gt;execute format&lt;span &gt;(&lt;/span&gt;&lt;span &gt;SET_CUSTOMER_ID_SQL&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;quote&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
      block&lt;span &gt;.&lt;/span&gt;call
    &lt;span &gt;ensure&lt;/span&gt;
      connection&lt;span &gt;.&lt;/span&gt;execute &lt;span &gt;RESET_CUSTOMER_ID_SQL&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And then we can add an around filter to our &lt;code &gt;ApplicationController&lt;/code&gt;, making sure the correct customer gets set based on an existing &lt;code &gt;current_user&lt;/code&gt; method (e.g. from an authentication library like Devise).&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;ApplicationController&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActionController&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Base&lt;/span&gt;
  around_action &lt;span &gt;:with_customer_id&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;with_customer_id&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;ApplicationRecord&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;with_customer_id&lt;span &gt;(&lt;/span&gt;current_user&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
      &lt;span &gt;yield&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this filter in place &lt;strong&gt;all queries within the request will automatically be limited to the current customer&lt;/strong&gt; - thanks to RLS.&lt;/p&gt;
&lt;p&gt;We can also use this when querying data in the console:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ApplicationRecord&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;with_customer_id&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
  puts &lt;span &gt;Transaction&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;all&lt;span &gt;.&lt;/span&gt;to_a&lt;span &gt;.&lt;/span&gt;inspect
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;   (1.4ms)  SET rls.customer_id = 1
  Transaction Load (1.6ms)  SELECT &quot;transactions&quot;.* FROM &quot;transactions&quot;
[#&amp;lt;Transaction id: &quot;bfd4b810-487d-4622-af24-73d284fb90d4&quot;, customer_id: 1, description: &quot;test&quot;, amount_cents: 4200, created_at: &quot;2020-01-01 00:00:00.000000000 +0000&quot;&gt;]
   (1.5ms)  RESET rls.customer_id&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
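&lt;p&gt;The &lt;code &gt;ensure&lt;/code&gt; clause in &lt;code &gt;with_customer_id&lt;/code&gt; is what makes the &lt;code &gt;RESET&lt;/code&gt; run even when the block raises. The sketch below shows that behavior without a database. &lt;code &gt;RecordingConnection&lt;/code&gt; is a hypothetical stand-in that only records SQL, and &lt;code &gt;Integer()&lt;/code&gt; takes the place of &lt;code &gt;connection.quote&lt;/code&gt;.&lt;/p&gt;

```ruby
# Stand-in for a database connection; it only records the SQL it is given.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements.push(sql)
  end
end

# Same shape as with_customer_id above, but run against the stand-in.
def with_customer_id(connection, customer_id)
  connection.execute("SET rls.customer_id = #{Integer(customer_id)}")
  yield
ensure
  connection.execute('RESET rls.customer_id')
end

conn = RecordingConnection.new
begin
  with_customer_id(conn, 1) do
    conn.execute('SELECT * FROM transactions')
    raise 'simulated failure inside the request'
  end
rescue RuntimeError
  # The error still reaches the caller; RESET has already been recorded.
end

puts conn.statements
```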
&lt;p&gt;It&apos;s important that your application keeps using the same Postgres connection for its queries, as the one that we issued the &lt;code &gt;SET&lt;/code&gt; command on. Rails currently makes this quite straightforward, as the same connection will be used within a single Rails web request. Connections are returned to the pool after a request has finished (and we would have called &lt;code &gt;RESET rls.customer_id&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;If you have code that directly interacts with the Rails connection pool you should review the &lt;a href=&quot;https://api.rubyonrails.org/v6.1/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html&quot;&gt;Rails connection pool documentation&lt;/a&gt; or consider using a transaction and &lt;code &gt;SET LOCAL&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Any custom code that interacts with the Rails connection pool, as well as third-party connection poolers such as pgbouncer in transaction pooling mode, carries a risk that the security context gets mixed up, since the queries could run on a different connection than the one that received the &lt;code &gt;SET&lt;/code&gt; command. In those cases, using a wrapping transaction together with &lt;code &gt;SET LOCAL&lt;/code&gt; is the safest approach.&lt;/p&gt;
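&lt;p&gt;A rough sketch of the &lt;code &gt;SET LOCAL&lt;/code&gt; variant follows. It assumes a connection object that responds to &lt;code &gt;transaction&lt;/code&gt; and &lt;code &gt;execute&lt;/code&gt;, as the Active Record connection does. The &lt;code &gt;RecordingConnection&lt;/code&gt; class and the &lt;code &gt;with_customer_id_in_transaction&lt;/code&gt; name are hypothetical and exist only so the example runs without a database.&lt;/p&gt;

```ruby
# Stand-in exposing execute and transaction, the two methods this sketch
# relies on. It records statements instead of sending them to Postgres.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements.push(sql)
  end

  def transaction
    execute('BEGIN')
    result = yield
    execute('COMMIT')
    result
  rescue StandardError
    execute('ROLLBACK')
    raise
  end
end

# Hypothetical variant of with_customer_id: SET LOCAL only lasts until the
# surrounding transaction ends, so no explicit RESET is issued.
def with_customer_id_in_transaction(connection, customer_id)
  connection.transaction do
    connection.execute("SET LOCAL rls.customer_id = #{Integer(customer_id)}")
    yield
  end
end

conn = RecordingConnection.new
with_customer_id_in_transaction(conn, 2) do
  conn.execute('SELECT * FROM transactions')
end
puts conn.statements
```

&lt;p&gt;Because the setting and the queries share one transaction, a pooler in transaction pooling mode keeps them on the same server connection, and the setting is discarded at commit or rollback.&lt;/p&gt;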
&lt;h2 id=&quot;performance-implications-of-using-rls-in-postgres&quot; &gt;&lt;a href=&quot;#performance-implications-of-using-rls-in-postgres&quot; aria-label=&quot;performance implications of using rls in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Performance Implications of using RLS in Postgres&lt;/h2&gt;
&lt;p&gt;Now, you might wonder - &lt;strong&gt;why doesn&apos;t everyone use RLS with their Rails applications?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are multiple reasons why you might choose not to use RLS in your application, such as the additional complexity and maintenance overhead, or if your data model is not a good fit. One thing we haven&apos;t looked at yet is performance.&lt;/p&gt;
&lt;p&gt;First of all, the good news is that Postgres has gotten better over time with considering RLS during query planning, especially since &lt;a href=&quot;https://www.postgresql.org/docs/10/release-10.html#id-1.11.6.22.5.3.5&quot;&gt;Postgres 10&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, there are still some things to consider with regards to performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(1) Keep the &lt;code &gt;USING&lt;/code&gt; clause of RLS policies simple, to avoid non-obvious performance issues&lt;/li&gt;
&lt;li&gt;(2) When using custom functions in your queries, make sure to mark them as &lt;code &gt;LEAKPROOF&lt;/code&gt; - this allows the planner to run them early, before RLS restrictions apply&lt;/li&gt;
&lt;li&gt;(3) Ensure the columns referenced in the RLS policy &lt;code &gt;USING&lt;/code&gt; clause are indexed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To illustrate that last point, let&apos;s look at the EXPLAIN plan of a query from earlier:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; rls&lt;span &gt;.&lt;/span&gt;customer_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;ANALYZE&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Seq Scan on transactions  (cost=0.00..28.22 rows=4 width=72) (actual time=0.012..0.027 rows=1 loops=1)
   Filter: (customer_id = (NULLIF(current_setting(&apos;rls.customer_id&apos;::text, true), &apos;&apos;::text))::bigint)
 Planning Time: 0.073 ms
 Execution Time: 0.094 ms
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, Postgres automatically adds the implicit &lt;code &gt;WHERE&lt;/code&gt; clause based on the RLS policy that applies. That seems relatively straightforward. However, note that we see a Sequential Scan here. It&apos;s important to include the columns used by RLS policies in your indices, for example by creating a new index on &lt;code &gt;customer_id&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;transactions&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;customer_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Postgres monitoring tools such as &lt;code &gt;auto_explain&lt;/code&gt; can be very helpful for finding outliers caused by bad plans that result from RLS policies.&lt;/p&gt;
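&lt;p&gt;As a rough illustration of what to look for in such plans, the sketch below scans EXPLAIN text output for a &lt;code &gt;Seq Scan&lt;/code&gt; node whose filter calls &lt;code &gt;current_setting(&lt;/code&gt;. The &lt;code &gt;rls_seq_scans&lt;/code&gt; helper is hypothetical and only handles the simple plan shape shown earlier. It requires Ruby 2.7 or newer for &lt;code &gt;filter_map&lt;/code&gt;.&lt;/p&gt;

```ruby
# Hypothetical helper: returns the tables that appear under a "Seq Scan"
# node whose following line is a Filter calling current_setting(...).
def rls_seq_scans(plan_text)
  lines = plan_text.lines.map { |line| line.strip }
  lines.each_cons(2).filter_map do |node, detail|
    match = node.match(/Seq Scan on (\w+)/)
    next unless match
    next unless detail.start_with?('Filter:')

    match[1] if detail.include?('current_setting(')
  end
end

# Plan lines taken from the EXPLAIN output above (timings omitted).
plan = [
  'Seq Scan on transactions  (cost=0.00..28.22 rows=4 width=72)',
  "  Filter: (customer_id = (NULLIF(current_setting('rls.customer_id'::text, true), ''::text))::bigint)",
  'Planning Time: 0.073 ms'
].join("\n")

p rls_seq_scans(plan)
```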
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we&apos;ve learned how RLS works in Postgres, and how it can be used with Ruby on Rails. RLS is not complicated to use, but it is a separate layer of access control, which may seem non-obvious when you are used to a single database user with full permissions accessing the database. If you decide to use RLS, it&apos;s also a good idea to review the performance implications.&lt;/p&gt;
&lt;p&gt;If you prefer to write less SQL yourself, you may want to take a look at the &lt;a href=&quot;https://github.com/suus-io/rls_rails&quot;&gt;rls_rails&lt;/a&gt; library that provides useful helpers for both database migrations as well as setting of the current customer (or tenant) ID.&lt;/p&gt;
&lt;p&gt;In case you determined that RLS is too complicated, but you would like a similar guarantee that every query is constrained to a specific tenant, you may want to take a look at &lt;a href=&quot;https://github.com/citusdata/activerecord-multi-tenant&quot;&gt;activerecord-multi-tenant&lt;/a&gt; which automatically rewrites your queries on the Rails side to include a &lt;code &gt;tenant_id&lt;/code&gt;, before they get sent to Postgres.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/intent/tweet?text=Learn%20how%20to%20create%20and%20implement%20a%20row%20level%20security%20policy%20with%20%23Rails%2C%20allowing%20to%20limit%20the%20database%20rows%20a%20user%20can%20access.%0D%0A%0D%0Ahttps%3A%2F%2Fpganalyze.com%2Fblog%2Fpostgres-row-level-security-ruby-rails&quot;&gt;Share this post on Twitter&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Eze is a software developer and technical writer trying to make sense of the world—building amazing stuff and documenting every step of the journey.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[A look at Postgres 14: Performance and Monitoring Improvements]]></title><description><![CDATA[The first beta release of the upcoming Postgres 14 release was made available yesterday. In this article we'll take a first look at what's in the beta, with an emphasis on one major performance improvement, as well as three monitoring improvements that caught our attention. Before we get started, I wanted to highlight what always strikes me as an important unique aspect of Postgres: Compared to most other open-source database systems, Postgres is not the project of a single company, but rather…]]></description><link>https://pganalyze.com/blog/postgres-14-performance-monitoring</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-14-performance-monitoring</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Fri, 21 May 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;The first beta release of the upcoming Postgres 14 release was &lt;a href=&quot;https://www.postgresql.org/about/news/postgresql-14-beta-1-released-2213/&quot;&gt;made available yesterday&lt;/a&gt;. In this article we&apos;ll take a first look at what&apos;s in the beta, with an emphasis on one major performance improvement, as well as three monitoring improvements that caught our attention.&lt;/p&gt;
&lt;p&gt;Before we get started, I wanted to highlight what always strikes me as an important unique aspect of Postgres: Compared to most other open-source database systems, &lt;strong&gt;Postgres is not the project of a single company&lt;/strong&gt;, but rather many individuals coming together to work on a new release, year after year. And that includes everyone who tries out the beta releases, and &lt;a href=&quot;https://www.postgresql.org/developer/beta/&quot;&gt;reports bugs to the Postgres project&lt;/a&gt;. We hope this post inspires you to do your own testing and benchmarking.&lt;/p&gt;
&lt;p&gt;Now, I&apos;m personally most excited about &lt;strong&gt;better connection scaling in Postgres 14&lt;/strong&gt;. For this post we ran a detailed benchmark comparing Postgres 13.3 to 14 beta1 (note that the connection count is log scale):&lt;/p&gt;
&lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/91c9d9c27c4ca41d34699b52f3861794/22252/connection_scaling.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Connection Scaling Benchmark Numbers comparing Postgres 13.3 and Postgres 14 beta1&quot; title=&quot;Connection Scaling Benchmark Numbers comparing Postgres 13.3 and Postgres 14 beta1&quot; src=&quot;https://pganalyze.com/static/91c9d9c27c4ca41d34699b52f3861794/1d69c/connection_scaling.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#improved-active-and-idle-connection-scaling-in-postgres-14&quot;&gt;Improved Active and Idle Connection Scaling in Postgres 14&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#dive-into-memory-use-with-pg_backend_memory_contexts&quot;&gt;Dive into memory use with pg_backend_memory_contexts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#track-wal-activity-with-pg_stat_wal&quot;&gt;Track WAL activity with pg_stat_wal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#monitor-queries-with-the-built-in-postgres-query_id&quot;&gt;Monitor queries with the built-in Postgres query_id&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#and-200-other-improvements-in-the-postgres-14-release&quot;&gt;And 200+ other improvements in the Postgres 14 release!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;improved-active-and-idle-connection-scaling-in-postgres-14&quot; &gt;&lt;a href=&quot;#improved-active-and-idle-connection-scaling-in-postgres-14&quot; aria-label=&quot;improved active and idle connection scaling in postgres 14 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Improved Active and Idle Connection Scaling in Postgres 14&lt;/h2&gt;
&lt;p&gt;Postgres 14 brings significant improvements for those of us that need a high number of database connections. The Postgres connection model relies on processes instead of threads. This has some important benefits, but it also has overhead at large connection counts. With this new release, scaling active and idle connections has gotten significantly better, and will be a major improvement for the most demanding applications.&lt;/p&gt;
&lt;p&gt;For our test, we&apos;ve used two 96 vCore AWS instances (c5.24xlarge), one running Postgres 13.3, and one running Postgres 14 beta1. Both of these use Ubuntu 20.04, with the default system settings, but the Postgres connection limit has been increased to 11,000 connections.&lt;/p&gt;
&lt;p&gt;We use &lt;a href=&quot;https://www.postgresql.org/docs/current/pgbench.html&quot;&gt;pgbench&lt;/a&gt; to test connection scaling of active connections. To start, we initialize the database with pgbench scale factor 200:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;# Postgres 13.3
$ pgbench -i -s 200
...
done in 127.71 s (drop tables 0.02 s, create tables 0.02 s, client-side generate 81.74 s, vacuum 2.63 s, primary keys 43.30 s).&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;# Postgres 14 beta1
$ pgbench -i -s 200
...
done in 77.33 s (drop tables 0.02 s, create tables 0.02 s, client-side generate 48.19 s, vacuum 2.70 s, primary keys 26.40 s).&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Already here we can see that Postgres 14 does much better in the initial data load: it finishes in about 77 seconds instead of 128, roughly 40% less time.&lt;/p&gt;
&lt;p&gt;We now launch read-only pgbench with a varying set of active connections, showing 5,000 concurrent connections as an example of a very active workload:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;# Postgres 13.3
$ pgbench -S -c 5000 -j 96 -M prepared -T30
...
tps = 417847.658491 (excluding connections establishing)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;# Postgres 14 beta1
$ pgbench -S -c 5000 -j 96 -M prepared -T30
...
tps = 495108.316805 (without initial connection time)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the throughput of Postgres 14 at 5,000 active connections is about 18% higher (roughly 495,000 TPS versus 418,000 TPS). &lt;strong&gt;At 10,000 active connections the improvement is 50% over Postgres 13&lt;/strong&gt;, and at lower connection counts you can also see consistent improvements.&lt;/p&gt;
&lt;p&gt;Note that you will usually see a noticeable TPS drop when the number of connections exceeds the number of CPUs. This is most likely due to CPU scheduling overhead, and not a limitation in Postgres itself. Now, most workloads don&apos;t actually have this many active connections, but rather a high number of idle connections.&lt;/p&gt;
&lt;p&gt;The original author of this work, &lt;a href=&quot;https://twitter.com/andresfreundtec&quot;&gt;Andres Freund&lt;/a&gt;, ran a benchmark on the throughput of a single active query, whilst also running 10,000 idle connections. The query went from 15,000 TPS to almost 35,000 TPS - that&apos;s over 2x better than in Postgres 13. You can find all the details in &lt;strong&gt;&lt;a href=&quot;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462#fn:1&quot;&gt;Andres Freund&apos;s original post introducing these improvements&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
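&lt;p&gt;To get a feel for whether your own workload is dominated by active or idle connections, a minimal sketch like the following can help. It only relies on the standard &lt;code &gt;pg_stat_activity&lt;/code&gt; view and is not part of the benchmark above:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Count client connections by state (active, idle, idle in transaction, ...)
SELECT state, COUNT(*) AS connections
  FROM pg_stat_activity
 WHERE backend_type = &apos;client backend&apos;
 GROUP BY state
 ORDER BY connections DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;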
&lt;h2 id=&quot;dive-into-memory-use-with-pg_backend_memory_contexts&quot; &gt;&lt;a href=&quot;#dive-into-memory-use-with-pg_backend_memory_contexts&quot; aria-label=&quot;dive into memory use with pg_backend_memory_contexts permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Dive into memory use with pg_backend_memory_contexts&lt;/h2&gt;
&lt;p&gt;Have you ever been curious why a certain Postgres connection is taking up a higher amount of memory? With the new &lt;code &gt;pg_backend_memory_contexts&lt;/code&gt; view you can take a close look at what exactly is allocated for a given Postgres process.&lt;/p&gt;
&lt;p&gt;To start, we can calculate how much memory is used by our current connection in total:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; pg_size_pretty&lt;span &gt;(&lt;/span&gt;&lt;span &gt;SUM&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;used_bytes&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_backend_memory_contexts&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; pg_size_pretty 
----------------
 939 kB
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, let&apos;s dive a bit deeper. When we query the table for the top 5 entries by memory usage, you will notice there is actually a lot of detailed information:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_backend_memory_contexts &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; used_bytes &lt;span &gt;DESC&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;          name           | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes 
-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
 CacheMemoryContext      |       | TopMemoryContext |     1 |      524288 |             7 |      64176 |           0 |     460112
 Timezones               |       | TopMemoryContext |     1 |      104120 |             2 |       2616 |           0 |     101504
 TopMemoryContext        |       |                  |     0 |       68704 |             5 |      13952 |          12 |      54752
 WAL record construction |       | TopMemoryContext |     1 |       49768 |             2 |       6360 |           0 |      43408
 MessageContext          |       | TopMemoryContext |     1 |       65536 |             4 |      22824 |           0 |      42712
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A memory context in Postgres is a memory region that is used for allocations to support activities such as query planning or query execution. Once Postgres completes work in a context, the whole context can be freed, simplifying memory handling. Through the use of memory contexts the Postgres source actually avoids doing manual &lt;code &gt;free&lt;/code&gt; calls for the most part (even though it&apos;s written in C), instead relying on memory contexts to clean up memory in groups. The top memory context here, CacheMemoryContext is used for many long-lived caches in Postgres.&lt;/p&gt;
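&lt;p&gt;Since contexts form a tree, it can be useful to roll them up by their parent. The following is a small sketch that only uses the columns shown in the output above, and groups the current connection&apos;s contexts by parent context:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Summarize memory contexts of the current backend by parent context
SELECT parent,
       COUNT(*) AS contexts,
       pg_size_pretty(SUM(total_bytes)) AS total,
       pg_size_pretty(SUM(used_bytes)) AS used
  FROM pg_backend_memory_contexts
 GROUP BY parent
 ORDER BY SUM(total_bytes) DESC
 LIMIT 5;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;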
&lt;p&gt;We can illustrate the impact of loading additional tables into a connection by running a query on a new table, and then querying the view again:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; test3&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_backend_memory_contexts &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; used_bytes &lt;span &gt;DESC&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;          name           | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes 
-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
 CacheMemoryContext      |       | TopMemoryContext |     1 |      524288 |             7 |      61680 |           1 |     462608
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the new view shows that simply having queried a table on this connection retains about 2.4 kB of memory (462,608 versus 460,112 used bytes in CacheMemoryContext), even after the query has finished. This caching of table information is done to speed up future queries, but can sometimes cause surprising amounts of memory usage for multi-tenant databases with many different schemas. You can now illustrate such issues easily through this new monitoring view.&lt;/p&gt;
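&lt;p&gt;If you want to track this cache growth yourself, one possible approach (a sketch, not the only way to measure it) is to run a query like this before and after touching new tables, and compare the results. It looks at CacheMemoryContext together with its direct child contexts:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Memory held by the long-lived caches of the current connection,
-- including contexts directly below CacheMemoryContext
SELECT COUNT(*) AS contexts,
       pg_size_pretty(SUM(total_bytes)) AS total,
       pg_size_pretty(SUM(used_bytes)) AS used
  FROM pg_backend_memory_contexts
 WHERE name = &apos;CacheMemoryContext&apos;
    OR parent = &apos;CacheMemoryContext&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;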
&lt;p&gt;If you&apos;d like to access this information for processes other than the current one, you can use the new &lt;code &gt;pg_log_backend_memory_contexts&lt;/code&gt; function which will cause the specified process to output its own memory consumption to the Postgres log:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; pg_log_backend_memory_contexts&lt;span &gt;(&lt;/span&gt;&lt;span &gt;10377&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;LOG:  logging memory contexts of PID 10377
STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
LOG:  level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
LOG:  level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used
LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
LOG:  level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used
LOG:  level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used
LOG:  level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
...
LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
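&lt;p&gt;To find a PID to pass to the function, you can look at &lt;code &gt;pg_stat_activity&lt;/code&gt;. The sketch below assumes you are connected as a superuser, which Postgres 14 requires for calling &lt;code &gt;pg_log_backend_memory_contexts&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- List other client backends, so you can pick a PID to inspect
SELECT pid, usename, application_name, state
  FROM pg_stat_activity
 WHERE backend_type = &apos;client backend&apos;
   AND pid != pg_backend_pid();

-- Ask every idle client backend to write its memory contexts to the log
SELECT pid, pg_log_backend_memory_contexts(pid)
  FROM pg_stat_activity
 WHERE backend_type = &apos;client backend&apos;
   AND state = &apos;idle&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keep in mind that the second query can produce a lot of log output on a server with many connections.&lt;/p&gt;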
&lt;h2 id=&quot;track-wal-activity-with-pg_stat_wal&quot; &gt;&lt;a href=&quot;#track-wal-activity-with-pg_stat_wal&quot; aria-label=&quot;track wal activity with pg_stat_wal permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Track WAL activity with pg_stat_wal&lt;/h2&gt;
&lt;p&gt;Building on the WAL monitoring capabilities in Postgres 13, the new release brings a new server-wide summary view for WAL information, called &lt;code &gt;pg_stat_wal&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You can use this to monitor WAL writes over time more easily:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stat_wal&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]----+------------------------------
wal_records      | 3334645
wal_fpi          | 8480
wal_bytes        | 282414530
wal_buffers_full | 799
wal_write        | 429769
wal_sync         | 428912
wal_write_time   | 0
wal_sync_time    | 0
stats_reset      | 2021-05-21 07:33:22.941452+00&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this new view we can get summary information such as how many Full Page Images (FPI) were written to the WAL, which can give you insights on when Postgres generated a lot of WAL records due to a checkpoint. Secondly, you can use the new &lt;code &gt;wal_buffers_full&lt;/code&gt; counter to quickly see when the &lt;code &gt;wal_buffers&lt;/code&gt; setting is set too low, which can cause unnecessary I/O that can be prevented by raising wal_buffers to a higher value.&lt;/p&gt;
&lt;p&gt;You can also get more details of the I/O impact of WAL writes by enabling the optional &lt;code &gt;track_wal_io_timing&lt;/code&gt; setting, which then gives you the exact I/O times for WAL writes, and WAL file syncs to disk. Note this setting can have noticeable overhead, so it&apos;s best turned off (the default) unless needed.&lt;/p&gt;
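&lt;p&gt;As an illustration of how these counters might be put to use, here is a sketch that derives a few ratios from the view, followed by one way to turn on WAL I/O timing. The derived column names are our own, not part of Postgres:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Derive a few summary numbers from pg_stat_wal
SELECT wal_records,
       wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_size,
       round(wal_bytes / nullif(wal_records, 0), 1) AS avg_bytes_per_record,
       round(100.0 * wal_fpi / nullif(wal_records, 0), 2) AS fpi_per_100_records,
       wal_buffers_full,
       stats_reset
  FROM pg_stat_wal;

-- Optionally collect WAL write and sync timings (adds overhead)
ALTER SYSTEM SET track_wal_io_timing = &apos;on&apos;;
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since all counters are cumulative since &lt;code &gt;stats_reset&lt;/code&gt;, comparing two snapshots taken some time apart gives a better picture than a single reading.&lt;/p&gt;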
&lt;h2 id=&quot;monitor-queries-with-the-built-in-postgres-query_id&quot; &gt;&lt;a href=&quot;#monitor-queries-with-the-built-in-postgres-query_id&quot; aria-label=&quot;monitor queries with the built in postgres query_id permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Monitor queries with the built-in Postgres query_id&lt;/h2&gt;
&lt;p&gt;In a recent &lt;a href=&quot;https://www.timescale.com/state-of-postgres-results/#top-three&quot;&gt;survey done by TimescaleDB&lt;/a&gt; in March and April 2021, the &lt;code &gt;pg_stat_statements&lt;/code&gt; extension was named one of the top three extensions the surveyed user base uses with Postgres. &lt;code &gt;pg_stat_statements&lt;/code&gt; is bundled with Postgres, and with Postgres 14 one of the important features of the extension got merged into core Postgres:&lt;/p&gt;
&lt;p&gt;The calculation of the &lt;code &gt;query_id&lt;/code&gt;, which uniquely identifies a query, whilst ignoring constant values. Thus, if you run the same query again it will have the same &lt;code &gt;query_id&lt;/code&gt;, enabling you to identify workload patterns on the database. Previously this information was only available with &lt;code &gt;pg_stat_statements&lt;/code&gt;, which shows aggregate statistics about queries that have finished executing, but now this is available with &lt;code &gt;pg_stat_activity&lt;/code&gt; as well as in log files.&lt;/p&gt;
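&lt;p&gt;First we have to enable the new &lt;code &gt;compute_query_id&lt;/code&gt; setting and reload the Postgres configuration afterwards (for example with &lt;code &gt;SELECT pg_reload_conf()&lt;/code&gt;; a full restart should not be needed for this setting):&lt;/p&gt;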
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; SYSTEM &lt;span &gt;SET&lt;/span&gt; compute_query_id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;on&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you use &lt;code &gt;pg_stat_statements&lt;/code&gt;, query IDs will be calculated automatically, through the default &lt;code &gt;compute_query_id&lt;/code&gt; setting of &lt;code &gt;auto&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;With query IDs enabled, we can look at &lt;code &gt;pg_stat_activity&lt;/code&gt; during a pgbench run and see why this is helpful as compared to just looking at query text:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; query&lt;span &gt;,&lt;/span&gt; query_id &lt;span &gt;FROM&lt;/span&gt; pg_stat_activity &lt;span &gt;WHERE&lt;/span&gt; backend_type &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;client backend&apos;&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                 query                                  |      query_id      
------------------------------------------------------------------------+--------------------
 UPDATE pgbench_tellers SET tbalance = tbalance + -4416 WHERE tid = 3;  | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -2979 WHERE tid = 10; | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + 2560 WHERE tid = 6;   | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -65 WHERE tid = 7;    | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -136 WHERE tid = 9;   | 885704527939071629
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All of these queries are the same from an application perspective, but their text is slightly different, making it hard to find patterns in the workload. With the query ID, however, we can clearly count how many queries of each kind are running, and assess performance problems more easily. For example, we can group by the query ID to see what&apos;s keeping the database busy:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; state&lt;span &gt;,&lt;/span&gt; query_id &lt;span &gt;FROM&lt;/span&gt; pg_stat_activity &lt;span &gt;WHERE&lt;/span&gt; backend_type &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;client backend&apos;&lt;/span&gt; &lt;span &gt;GROUP&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;3&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; count | state  |       query_id       
-------+--------+----------------------
    40 | active |   885704527939071629
     9 | active |  7660508830961861980
     1 | active | -7810315603562552972
     1 | active | -3907106720789821134
(4 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you run this on your own system you may find that the query ID is different from the one shown here. This is due to query IDs being dependent on the internal representation of a Postgres query, which can be architecture dependent, and also considers internal IDs of tables instead of their names.&lt;/p&gt;
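&lt;p&gt;Because the same ID is used by &lt;code &gt;pg_stat_statements&lt;/code&gt;, you can also connect what is running right now with historic statistics. The following is a sketch that assumes the &lt;code &gt;pg_stat_statements&lt;/code&gt; extension is installed in the current database:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- For each running query, show aggregate statistics for the same query ID
SELECT a.pid,
       a.state,
       a.query_id,
       s.calls,
       round(s.mean_exec_time::numeric, 3) AS mean_exec_time_ms,
       s.query AS normalized_query
  FROM pg_stat_activity a
  JOIN pg_stat_statements s
    ON s.queryid = a.query_id
   AND s.dbid = a.datid
   AND s.userid = a.usesysid
 WHERE a.backend_type = &apos;client backend&apos;
   AND a.state = &apos;active&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;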
&lt;p&gt;The query ID information is also available in &lt;code &gt;log_line_prefix&lt;/code&gt; through the new %Q option, making it easier to get auto_explain output that&apos;s linked to a query:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;2021-05-21 08:18:02.949 UTC [7176] [user=postgres,db=postgres,app=pgbench,query=885704527939071629] LOG:  duration: 59.827 ms  plan:
	Query Text: UPDATE pgbench_tellers SET tbalance = tbalance + -1902 WHERE tid = 6;
	Update on pgbench_tellers  (cost=4.14..8.16 rows=0 width=0) (actual time=59.825..59.826 rows=0 loops=1)
	  -&gt;  Bitmap Heap Scan on pgbench_tellers  (cost=4.14..8.16 rows=1 width=10) (actual time=0.009..0.011 rows=1 loops=1)
	        Recheck Cond: (tid = 6)
	        Heap Blocks: exact=1
	        -&gt;  Bitmap Index Scan on pgbench_tellers_pkey  (cost=0.00..4.14 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
	              Index Cond: (tid = 6)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
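&lt;p&gt;A prefix like the one in this log line could be configured roughly as follows. This is a sketch reconstructed from the output above, not necessarily the exact setting that was used:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- %m = timestamp with milliseconds, %p = process ID, %u = user,
-- %d = database, %a = application name, %Q = query ID (new in Postgres 14)
ALTER SYSTEM SET log_line_prefix = &apos;%m [%p] [user=%u,db=%d,app=%a,query=%Q] &apos;;

-- log_line_prefix can be changed with a configuration reload
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;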
&lt;p&gt;&lt;strong&gt;Want to link &lt;code &gt;auto_explain&lt;/code&gt; and &lt;code &gt;pg_stat_statements&lt;/code&gt;, and can&apos;t wait for Postgres 14?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We built our &lt;a href=&quot;https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser#fingerprints-in-pg_query-a-better-way-to-check-if-two-queries-are-identical&quot;&gt;own open-source query fingerprint mechanism&lt;/a&gt; that uniquely identifies queries based on their text. This is used in pganalyze for matching EXPLAIN plans to queries, and you can also use this in your own scripts, with any Postgres version.&lt;/p&gt;
&lt;h2 id=&quot;and-200-other-improvements-in-the-postgres-14-release&quot; &gt;&lt;a href=&quot;#and-200-other-improvements-in-the-postgres-14-release&quot; aria-label=&quot;and 200 other improvements in the postgres 14 release permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;And 200+ other improvements in the Postgres 14 release!&lt;/h2&gt;
&lt;p&gt;These are just some of the many improvements in the new Postgres release. You can find more on what&apos;s new in the &lt;a href=&quot;https://www.postgresql.org/docs/14/release-14.html&quot;&gt;release notes&lt;/a&gt;, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The new predefined roles &lt;code &gt;pg_read_all_data&lt;/code&gt;/&lt;code &gt;pg_write_all_data&lt;/code&gt; give global read or write access&lt;/li&gt;
&lt;li&gt;Automatic cancellation of long-running queries if the client disconnects&lt;/li&gt;
&lt;li&gt;Vacuum now skips index vacuuming when the number of removable index entries is insignificant&lt;/li&gt;
&lt;li&gt;Per-index information is now included in autovacuum logging output&lt;/li&gt;
&lt;li&gt;Partitions can now be detached in a non-blocking manner with &lt;code &gt;ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
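&lt;p&gt;To give an idea of what two of these look like in practice, here is a short sketch. The role name &lt;code &gt;reporting_user&lt;/code&gt; and the table names &lt;code &gt;measurements&lt;/code&gt; and &lt;code &gt;measurements_2020&lt;/code&gt; are hypothetical and only used for illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Give a (hypothetical) reporting role read access to all data
GRANT pg_read_all_data TO reporting_user;

-- Detach a (hypothetical) partition without blocking queries on the parent.
-- Note: this cannot be run inside a transaction block.
ALTER TABLE measurements DETACH PARTITION measurements_2020 CONCURRENTLY;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;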
&lt;p&gt;And many more. &lt;strong&gt;Now is the time to help test!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Download beta1 from the &lt;a href=&quot;https://www.postgresql.org/download/&quot;&gt;official package repositories&lt;/a&gt;, or build it from source. We can all contribute to making Postgres 14 a stable release in a few months from now.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;At pganalyze, we&apos;re excited about Postgres 14, and hope this post got you interested as well! Postgres shows again how many small improvements make it a stable, trustworthy database, that is built by the community, for the community.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/intent/tweet?text=%22An%20early%20look%20at%20%23Postgres14%20Performance%20and%20Monitoring%20Improvements%22%20-%20Here,%20%40pganalyze%20looks%20at%20idle%20and%20active%20connection%20scaling,%20memory%20monitoring,%20query%20IDs,%20and%20more%3A%20https%3A%2F%2Fpganalyze.com%2Fblog%2Fpostgres-14-performance-monitoring&quot;&gt;Share this post on Twitter&lt;/a&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Creating Custom Postgres Data Types in Rails]]></title><description><![CDATA[Postgres ships with the most widely used common data types, like integers and text, built in, but it's also flexible enough to allow you to define your own data types if your project demands it. Say you're saving price data and you want to ensure that it’s never negative. You might create a  type that you could then use to define columns on multiple tables. Or maybe you have data that makes more sense grouped together, like GPS coordinates. Postgres allows you to create a type to hold that data…]]></description><link>https://pganalyze.com/blog/custom-postgres-data-types-ruby-rails</link><guid isPermaLink="false">https://pganalyze.com/blog/custom-postgres-data-types-ruby-rails</guid><dc:creator><![CDATA[Josh Alletto]]></dc:creator><pubDate>Thu, 22 Apr 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Postgres ships with the most widely used common data types, like integers and text, built in, but it&apos;s also flexible enough to allow you to define your own data types if your project demands it.&lt;/p&gt;
&lt;p&gt;Say you&apos;re saving price data and you want to ensure that it’s never negative. You might create a &lt;code &gt;not_negative_int&lt;/code&gt; type that you could then use to define columns on multiple tables. Or maybe you have data that makes more sense grouped together, like GPS coordinates. Postgres allows you to &lt;strong&gt;create a type to hold that data together in one column&lt;/strong&gt; rather than spread it across multiple columns.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#custom-data-types-in-postgres&quot;&gt;Custom Data Types in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#composite-types&quot;&gt;Composite Types&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#custom-types-in-rails&quot;&gt;Custom Types in Rails&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#using-the-active-record-attributes-api-to-register-new-custom-types-in-rails&quot;&gt;Using the Active Record Attributes API to Register new Custom Types in Rails&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;In Rails, all attributes pass through the attributes API when they’re entered by the user or read from the database. Rails 5 introduced the &lt;a href=&quot;https://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html&quot;&gt;Attributes API&lt;/a&gt;, allowing you to define your own attribute types and use them in your application.&lt;/p&gt;
&lt;p&gt;In this tutorial, you&apos;ll learn how to &lt;strong&gt;work with two of the most common custom types available in PostgreSQL&lt;/strong&gt;. You&apos;ll also see how to incorporate them into your Rails application using the Attributes API.&lt;/p&gt;
&lt;p&gt;Should you be interested in learning how to create custom Postgres data types in Django, we&apos;ve got you covered! Just read our dedicated article about it here: &lt;a href=&quot;https://pganalyze.com/blog/custom-postgres-data-types-django-python&quot;&gt;Creating Custom Postgres Data Types in Django&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;img src=&quot;https://pganalyze.com/b92a49ab9de2d43bbab7c9de6ac8a962/custom_postgres_data_types_ruby_rails_header_pganalyze.svg&quot; alt=&quot;Postgres Custom Data Type Example&quot;&gt;
&lt;/p&gt;
&lt;h2 id=&quot;custom-data-types-in-postgres&quot; &gt;&lt;a href=&quot;#custom-data-types-in-postgres&quot; aria-label=&quot;custom data types in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Custom Data Types in Postgres&lt;/h2&gt;
&lt;p&gt;There are two custom data types you&apos;ll learn about in this post:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createdomain.html&quot;&gt;Domain types&lt;/a&gt;: These allow you to put certain restrictions on a data type that can be reused later.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createtype.html&quot;&gt;Composite types&lt;/a&gt;: These let you group data together to form a new type.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;First, let&apos;s take a look at how to create a domain type. Say you want to ensure a username doesn&apos;t contain a &lt;code &gt;!&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; DOMAIN string_without_bang &lt;span &gt;as&lt;/span&gt; &lt;span &gt;VARCHAR&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;CHECK&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;value&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;&lt;span &gt;~&lt;/span&gt; &lt;span &gt;&apos;!&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After that, you can use this domain type when you create the users table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;
 id &lt;span &gt;serial&lt;/span&gt; &lt;span &gt;primary&lt;/span&gt; &lt;span &gt;key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
 user_name string_without_bang
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s try creating a user with a username that contains an exclamation point. You&apos;ll see an error message:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; users&lt;span &gt;(&lt;/span&gt;user_name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;coolguy!!&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ERROR:  value for domain string_without_bang violates check constraint &quot;string_without_bang_check&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
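&lt;p&gt;If you just want to try out a domain&apos;s rules without creating a table first, one quick way (a small sketch using the domain defined above) is to cast a value to the domain type directly:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Passes the check and returns the value
SELECT &apos;coolguy&apos;::string_without_bang;

-- Fails the check, just like the INSERT above
SELECT &apos;coolguy!!&apos;::string_without_bang;
-- ERROR:  value for domain string_without_bang violates check constraint &quot;string_without_bang_check&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;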
&lt;p&gt;You can even use a domain in the definition of another domain:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; DOMAIN email_with_check &lt;span &gt;AS&lt;/span&gt; string_without_bang &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;CHECK&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;value&lt;/span&gt; &lt;span &gt;~&lt;/span&gt; &lt;span &gt;&apos;@&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; email_addresses &lt;span &gt;(&lt;/span&gt;
  user_id &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  email email_with_check
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; email_addresses&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;frank!@gmail.com&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ERROR:  value for domain email_with_check violates check constraint &quot;string_without_bang_check&quot;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; email_addresses&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;joshgmail.com&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ERROR:  value for domain email_with_check violates check constraint &quot;email_with_check_check&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
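&lt;p&gt;Because these checks live in the database, a violation only surfaces as an error at insert time. If you want to catch bad input earlier in application code, one option is to mirror the checks in plain Ruby. The sketch below is only an illustration under that assumption: the &lt;code &gt;DomainChecks&lt;/code&gt; module and its method names are hypothetical, and the domain in PostgreSQL remains the source of truth.&lt;/p&gt;

```ruby
# Plain-Ruby mirror of the two domain CHECK constraints above.
# DomainChecks and its method names are hypothetical, for illustration only.
module DomainChecks
  module_function

  # string_without_bang: NOT NULL, and the value must not contain "!"
  def string_without_bang?(value)
    return false if value.nil?

    !value.include?("!")
  end

  # email_with_check: the base domain's check applies first, then "@" is required
  def email_with_check?(value)
    return false unless string_without_bang?(value)

    value.include?("@")
  end
end

puts DomainChecks.email_with_check?("frank!@gmail.com") # fails the inherited check
puts DomainChecks.email_with_check?("joshgmail.com")    # fails the "@" check
puts DomainChecks.email_with_check?("josh@gmail.com")   # passes both checks
```

&lt;p&gt;The second method calls the first, which matches how the nested domain behaves: a value has to satisfy the base domain before the additional check matters.&lt;/p&gt;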
&lt;h2 id=&quot;composite-types&quot; &gt;&lt;a href=&quot;#composite-types&quot; aria-label=&quot;composite types permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Composite Types&lt;/h2&gt;
&lt;p&gt;Composite types allow you to group different pieces of data together into one column. They&apos;re useful for information that has more meaning when grouped together, like RGB color values or the dimensions of a package.&lt;/p&gt;
&lt;p&gt;Let’s start by creating a dimensions type:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TYPE&lt;/span&gt; dimensions &lt;span &gt;as&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  depth &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  width &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  height &lt;span &gt;integer&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, let’s create a table using this new type. Try also using the domain type you created previously:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; orders &lt;span &gt;(&lt;/span&gt;
  product string_without_bang&lt;span &gt;,&lt;/span&gt;
  dims dimensions
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Add some data and take a look at the output when you query the table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; orders&lt;span &gt;(&lt;/span&gt;product&lt;span &gt;,&lt;/span&gt; dims&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;widget&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;88&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;101&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; orders&lt;span &gt;;&lt;/span&gt;

 product &lt;span &gt;|&lt;/span&gt;    dims     
&lt;span &gt;---------+-------------&lt;/span&gt;
 widget  &lt;span &gt;|&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;88&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;101&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You&apos;ll see that all the data related to the dimensions of the package is &lt;strong&gt;saved together in the dims column&lt;/strong&gt;. But don&apos;t worry, you can still access the individual integers:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;dims&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;width &lt;span &gt;FROM&lt;/span&gt; orders&lt;span &gt;;&lt;/span&gt;

 width 
&lt;span &gt;-------&lt;/span&gt;
  &lt;span &gt;88&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;custom-types-in-rails&quot; &gt;&lt;a href=&quot;#custom-types-in-rails&quot; aria-label=&quot;custom types in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Custom Types in Rails&lt;/h2&gt;
&lt;p&gt;In order to use our custom types in Rails, &lt;strong&gt;you’ll have to do two things&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create the migration that sets the types up for us in the database.&lt;/li&gt;
&lt;li&gt;Tell Rails how to handle your new type so you can easily work with it in Ruby.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently, Rails doesn&apos;t offer any built-in solution for creating types in migrations, so you&apos;ll have to run some raw SQL. The migration below runs the same SQL you ran above to create the types directly in PostgreSQL, then immediately uses them to build the orders table:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateOrders&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;up&lt;/span&gt;&lt;/span&gt;
    execute &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&amp;lt;~&lt;/span&gt;SQL&lt;/span&gt;
      CREATE TYPE dimensions as (
        depth integer,
        width integer,
        height integer
      );
      CREATE DOMAIN string_without_bang as VARCHAR NOT NULL CHECK (value !~ &apos;!&apos;);
    &lt;span &gt;SQL&lt;/span&gt;&lt;/span&gt;
    create_table &lt;span &gt;:orders&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;t&lt;span &gt;|&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;column &lt;span &gt;:product&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:string_without_bang&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;column &lt;span &gt;:dims&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:dimensions&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;down&lt;/span&gt;&lt;/span&gt;
    drop_table &lt;span &gt;:orders&lt;/span&gt;
    execute &lt;span &gt;&quot;DROP TYPE dimensions&quot;&lt;/span&gt;
    execute &lt;span &gt;&quot;DROP DOMAIN string_without_bang&quot;&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You&apos;ll need to use &lt;a href=&quot;https://www.gapintelligence.com/blog/up-and-down-a-rails-migration/&quot;&gt;&lt;code &gt;up&lt;/code&gt; and &lt;code &gt;down&lt;/code&gt;&lt;/a&gt; methods here since you’re running some raw SQL that Rails won&apos;t be able to easily undo on its own if you want to do a rollback.&lt;/p&gt;
&lt;p&gt;Run the migrations, and you&apos;ll see output that looks similar to this:&lt;/p&gt;
&lt;div  data-language=&quot;bash&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;==&lt;/span&gt; &lt;span &gt;20210211230550&lt;/span&gt; Orders: migrating &lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;
-- execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;CREATE TYPE dimensions as (&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;  depth integer,&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;  width integer,&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;  height integer&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;);&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;CREATE DOMAIN string_without_bang as VARCHAR NOT NULL CHECK (value !~ &apos;!&apos;);&lt;span  title=&quot;\n&quot;&gt;\n&lt;/span&gt;&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
   -&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;.0012s
-- create_table&lt;span &gt;(&lt;/span&gt;:Orders&lt;span &gt;)&lt;/span&gt;
   -&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;.0058s
&lt;span &gt;==&lt;/span&gt; &lt;span &gt;20210211230550&lt;/span&gt; Orders: migrated &lt;span &gt;(&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;.0071s&lt;span &gt;)&lt;/span&gt; &lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;&lt;span &gt;==&lt;/span&gt;

unknown OID &lt;span &gt;25279&lt;/span&gt;: failed to recognize &lt;span &gt;type&lt;/span&gt; of &lt;span &gt;&apos;dims&apos;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt; It will be treated as String.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that the migration succeeded, but &lt;strong&gt;Rails does not know what to do with the composite type&lt;/strong&gt;, so it will treat it as a string. If you check the database directly, you&apos;ll see that the type for the dims column is what you expect:&lt;/p&gt;
&lt;div  data-language=&quot;psql&quot;&gt;&lt;pre &gt;&lt;code &gt;=# \d orders
 Column  |        Type         |
---------+---------------------+
 id      | bigint              |
 product | string_without_bang |
 dims    | dimensions          |&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Right now, if you create a new order, you&apos;ll need to enter the dims data as a properly formatted string like this:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;001&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Order&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;create product&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;hat&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; dims&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;(1,2,3)&apos;&lt;/span&gt;
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;002&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;dims
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&quot;(1,2,3)&quot;&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;003&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This setup doesn&apos;t allow you to update the individual elements without overwriting the entire string. What&apos;s needed here is a dimensions class with methods that know how to deal with this new data type. Luckily, &lt;strong&gt;Rails has a solution for this&lt;/strong&gt;.&lt;/p&gt;
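&lt;p&gt;To make the limitation concrete, here is a rough sketch of what changing a single element looks like while the column is still a plain string. It uses only core Ruby; the &lt;code &gt;raw&lt;/code&gt; variable is a stand-in for the value that &lt;code &gt;o.dims&lt;/code&gt; returned in the console session above.&lt;/p&gt;

```ruby
# Changing only the width of a composite value that Rails exposes as a String.
# `raw` stands in for what o.dims returns before a custom type is registered.
raw = "(1,2,3)"

# Strip the parentheses and split the literal into depth, width and height.
depth, width, height = raw.delete("()").split(",").map { |part| Integer(part) }

# Change one element, then rebuild the whole literal for the database.
width = 9
updated = "(#{depth},#{width},#{height})"

puts updated
```

&lt;p&gt;Every caller that wants to touch one dimension has to repeat this parse and rebuild step, which is the kind of work a dedicated class can hide.&lt;/p&gt;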
&lt;h3 id=&quot;using-the-active-record-attributes-api-to-register-new-custom-types-in-rails&quot; &gt;&lt;a href=&quot;#using-the-active-record-attributes-api-to-register-new-custom-types-in-rails&quot; aria-label=&quot;using the active record attributes api to register new custom types in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using the Active Record Attributes API to Register new Custom Types in Rails&lt;/h3&gt;
&lt;p&gt;You can use the Active Record &lt;a href=&quot;https://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html&quot;&gt;Attributes API&lt;/a&gt; to register the new type and control what it looks like when leaving and entering the database.&lt;/p&gt;
&lt;p&gt;Start by creating a &lt;code &gt;Dimension&lt;/code&gt; class that takes a string in its &lt;code &gt;initialize&lt;/code&gt; method. Data will come in from the database as a string with parentheses, such as &lt;code &gt;&quot;(1,2,3)&quot;&lt;/code&gt;, so you&apos;ll need to parse it and then set some instance variables. Notice the code also includes a &lt;code &gt;to_s&lt;/code&gt; method that converts the data back into the parenthesized string that the database will understand.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Dimension&lt;/span&gt;
  attr_accessor &lt;span &gt;:depth&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:width&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:height&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;initialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;values&lt;span &gt;)&lt;/span&gt;
    dims &lt;span &gt;=&lt;/span&gt; values &lt;span &gt;?&lt;/span&gt; sanitize_string&lt;span &gt;(&lt;/span&gt;values&lt;span &gt;)&lt;/span&gt; &lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; 
    &lt;span &gt;@depth&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; dims&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    &lt;span &gt;@width&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; dims&lt;span &gt;[&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    &lt;span &gt;@height&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; dims&lt;span &gt;[&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;sanitize_string&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;values&lt;span &gt;)&lt;/span&gt;
    values&lt;span &gt;.&lt;/span&gt;delete&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;()&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;split&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;,&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;map&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&amp;amp;&lt;/span&gt;&lt;span &gt;:to_i&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;to_s&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;&quot;(&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;depth&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;,&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;width&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;,&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;height&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;)&quot;&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This class will act as a &lt;strong&gt;wrapper&lt;/strong&gt; for the dimension type and will make it easier to work with, but you still need to tell Rails how to handle it when saving to the database and instantiating your order objects. Just like Rails knows how to take a Ruby string or integer and pass it off to PostgreSQL in a way it can save and understand, you need to tell Rails how to handle your new dimension type. You can do that by creating a &lt;code &gt;DimensionType&lt;/code&gt; that inherits from &lt;code &gt;ActiveRecord::Type::Value&lt;/code&gt; and setting up a few methods.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;DimensionType&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Type&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Value&lt;/span&gt;
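  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;cast&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# Pass an existing Dimension through unchanged; parse anything else (String or nil).&lt;/span&gt;
    value&lt;span &gt;.&lt;/span&gt;is_a&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Dimension&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;?&lt;/span&gt; value &lt;span &gt;:&lt;/span&gt; &lt;span &gt;Dimension&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;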

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;serialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;)&lt;/span&gt;
    value&lt;span &gt;.&lt;/span&gt;to_s
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;changed_in_place&lt;/span&gt;&lt;/span&gt;&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;raw_old_value&lt;span &gt;,&lt;/span&gt; new_value&lt;span &gt;)&lt;/span&gt;
    raw_old_value &lt;span &gt;!=&lt;/span&gt; serialize&lt;span &gt;(&lt;/span&gt;new_value&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;#cast&lt;/code&gt; method gets called by Active Record when setting an attribute in the model. You can use your new dimension class for this.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;#serialize&lt;/code&gt; method converts your dimension object to a type that PostgreSQL can understand. This is why you set up your &lt;code &gt;to_s&lt;/code&gt; method in your &lt;code &gt;Dimension&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;Finally, &lt;code &gt;#changed_in_place?&lt;/code&gt; takes care of comparing the raw value in the database with your new value. This is what gets called whenever Active Record tries to decide if it needs to make an update to the database. &lt;code &gt;raw_old_value&lt;/code&gt; will always be a string because it&apos;s read directly from the database. &lt;code &gt;new_value&lt;/code&gt;, in this case, will be an instance of &lt;code &gt;Dimension&lt;/code&gt;, so it needs to be converted to a string in order to make the comparison.&lt;/p&gt;
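&lt;p&gt;The comparison can be tried outside of Rails with a small sketch. The assumptions here: &lt;code &gt;Dimension&lt;/code&gt; is a trimmed copy of the wrapper class above, and &lt;code &gt;PlainDimensionType&lt;/code&gt; is a hypothetical stand-in that keeps the same two methods but drops the &lt;code &gt;ActiveRecord::Type::Value&lt;/code&gt; superclass so it runs as plain Ruby. It is meant to illustrate the logic, not how Active Record invokes it internally.&lt;/p&gt;

```ruby
# Trimmed copy of the Dimension wrapper above (same parsing and to_s behaviour).
class Dimension
  attr_accessor :depth, :width, :height

  def initialize(values)
    @depth, @width, @height = values.delete("()").split(",").map { |v| v.to_i }
  end

  def to_s
    "(#{depth},#{width},#{height})"
  end
end

# Hypothetical stand-in for DimensionType without the ActiveRecord superclass,
# so the comparison logic can run outside Rails.
class PlainDimensionType
  def serialize(value)
    value.to_s
  end

  # True when the object no longer serializes to the raw database string.
  def changed_in_place?(raw_old_value, new_value)
    raw_old_value != serialize(new_value)
  end
end

type = PlainDimensionType.new
raw_from_db = "(4,9,1)"            # the string the database returned
dims = Dimension.new(raw_from_db)  # the object your code works with

unchanged = type.changed_in_place?(raw_from_db, dims)
dims.width = 10                    # mutate the object in place
changed = type.changed_in_place?(raw_from_db, dims)

puts unchanged
puts changed
```

&lt;p&gt;Before the mutation the serialized object equals the raw string, so nothing is reported as changed. After setting the width, the two strings differ and the attribute counts as dirty.&lt;/p&gt;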
&lt;p&gt;The last piece of the puzzle will be to tell the order model to use the new &lt;code &gt;DimensionType&lt;/code&gt; for the &lt;code &gt;dims&lt;/code&gt; attribute.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Order&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  attribute &lt;span &gt;:dims&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;DimensionType&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s open a Rails console and test out the new type:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;001&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Order&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;#&amp;lt;Order id: nil, product: nil, dims: #&amp;lt;Dimension:0x00007fda47295e40 @depth=0, @width=0, @height=0&gt;&gt; &lt;/span&gt;
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;002&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;product &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;a wig&apos;&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&quot;a wig&quot;&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;003&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;dims&lt;span &gt;.&lt;/span&gt;width &lt;span &gt;=&lt;/span&gt; &lt;span &gt;9&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;9&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;004&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;dims&lt;span &gt;.&lt;/span&gt;depth &lt;span &gt;=&lt;/span&gt; &lt;span &gt;4&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;4&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;005&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;dims&lt;span &gt;.&lt;/span&gt;height &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;1&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;006&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; o&lt;span &gt;.&lt;/span&gt;save
&lt;span &gt;D&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;2021&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;02&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;22&lt;/span&gt;T09&lt;span &gt;:&lt;/span&gt;&lt;span &gt;55&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;41.588716&lt;/span&gt; &lt;span &gt;#79057] DEBUG -- :    (0.2ms)  BEGIN&lt;/span&gt;
&lt;span &gt;D&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;2021&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;02&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;22&lt;/span&gt;T09&lt;span &gt;:&lt;/span&gt;&lt;span &gt;55&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;41.607391&lt;/span&gt; &lt;span &gt;#79057] DEBUG -- :   Order Create (0.8ms)  INSERT INTO &quot;orders&quot; (&quot;product&quot;, &quot;dims&quot;) VALUES ($1, $2) RETURNING &quot;id&quot;  [[&quot;product&quot;, &quot;a wig&quot;], [&quot;dims&quot;, &quot;(4,9,1)&quot;]]&lt;/span&gt;
&lt;span &gt;D&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;2021&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;02&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;22&lt;/span&gt;T09&lt;span &gt;:&lt;/span&gt;&lt;span &gt;55&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;41.610457&lt;/span&gt; &lt;span &gt;#79057] DEBUG -- :    (1.2ms)  COMMIT&lt;/span&gt;
 &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;true&lt;/span&gt; 
&lt;span &gt;2.6&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt; &lt;span &gt;:&lt;/span&gt;&lt;span &gt;007&lt;/span&gt; &lt;span &gt;&gt;&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;See if you can follow the same process to set up the &lt;code &gt;string_without_bang&lt;/code&gt; type!&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we walked through how to create two different custom data types in PostgreSQL.&lt;/p&gt;
&lt;p&gt;The first, a &lt;strong&gt;domain type&lt;/strong&gt;, allows you to create checks on your data and reuse those checks on multiple columns. The second, the &lt;strong&gt;composite type&lt;/strong&gt;, lets you group data together in a meaningful way for storage in a single column. Finally, we learned how to hook into the Rails Attributes API to help instantiate your new type as an object that Ruby knows how to use.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Josh Alletto is an instructor at Code Platoon. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He&apos;s always looking for a new challenge and a dedicated team to collaborate with.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Introducing pg_query 2.0: The easiest way to parse Postgres queries]]></title><description><![CDATA[The query parser is a core component of Postgres: the database needs to understand what data you're asking for in order to return the right results. But this functionality is also useful for all sorts of other tools that work with Postgres queries. A few years ago, we released pg_query to support this functionality in a standalone C library. pganalyze uses pg_query to parse and analyze every SQL query that runs on your Postgres database. Our initial motivation was to create pg_query for checking…]]></description><link>https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser</link><guid isPermaLink="false">https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 18 Mar 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/75f87b67b2112c7051bbb7a763b21a0b/aa440/query_parsing_intro_image.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Parsing of &amp;quot;SELECT * FROM mytable&amp;quot; SQL statement into the associated Postgres parse tree&quot;
        title=&quot;Parsing of &amp;quot;SELECT * FROM mytable&amp;quot; SQL statement into the associated Postgres parse tree&quot;
        src=&quot;https://pganalyze.com/static/75f87b67b2112c7051bbb7a763b21a0b/1d69c/query_parsing_intro_image.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The query parser is a core component of Postgres: the database needs to understand what data you&apos;re asking for in order to return the right results. But this functionality is also useful for all sorts of other tools that work with Postgres queries. A few years ago, &lt;a href=&quot;https://pganalyze.com/blog/parse-postgresql-queries-in-ruby&quot;&gt;we released pg_query&lt;/a&gt; to support this functionality in a standalone C library.&lt;/p&gt;
&lt;p&gt;pganalyze uses pg_query to &lt;strong&gt;parse and analyze every SQL query that runs on your Postgres database&lt;/strong&gt;. Our initial motivation was to create pg_query for checking which tables a query references, or what kind of statement it is. Since then we&apos;ve expanded its use in pganalyze itself. pganalyze now truncates query text in a smart manner in the query overview. The &lt;a href=&quot;https://github.com/pganalyze/collector&quot;&gt;pganalyze-collector&lt;/a&gt; supports collecting EXPLAIN plans, and uses pg_query to support log-based EXPLAIN. And we link together &lt;code &gt;pg_stat_statements&lt;/code&gt; and &lt;code &gt;auto_explain&lt;/code&gt; data in pganalyze using query fingerprints (another pg_query feature we&apos;ll discuss in detail in a later section).&lt;/p&gt;
&lt;h2 id=&quot;postgres-community-tools-build-on-pg_query&quot; &gt;&lt;a href=&quot;#postgres-community-tools-build-on-pg_query&quot; aria-label=&quot;postgres community tools build on pg_query permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres community tools build on pg_query&lt;/h2&gt;
&lt;p&gt;What we didn&apos;t expect at the time was the tremendous interest we&apos;ve seen from the community. &lt;strong&gt;The Ruby library alone has received over 3.5 million downloads in its lifetime.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thanks to many contributors, pg_query now has bindings for other languages beyond Ruby and Go, such as Python (&lt;a href=&quot;https://pypi.org/project/pglast/&quot;&gt;pglast&lt;/a&gt;, maintained by &lt;a href=&quot;https://github.com/lelit&quot;&gt;Lele Gaifax&lt;/a&gt;), Node.js (&lt;a href=&quot;https://www.npmjs.com/package/pgsql-parser&quot;&gt;pgsql-parser&lt;/a&gt;, maintained by &lt;a href=&quot;https://github.com/pyramation&quot;&gt;Dan Lynch&lt;/a&gt;) and even OCaml. There are also many notable third-party projects that use pg_query to parse Postgres queries. Here are some of our favorites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://sqlc.dev/&quot;&gt;sqlc&lt;/a&gt; provides type-safe, SQL-based database access in Go&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://korban.net/posts/postgres/2017-09-18-debugging-complex-postgres-queries-with-pgdebug/&quot;&gt;pgdebug&lt;/a&gt; lets you debug complex CTEs and execute parts as a standalone query&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/cloudspannerecosystem/harbourbridge&quot;&gt;Google&apos;s HarbourBridge&lt;/a&gt; uses pg_query for helping customers trial Spanner from Postgres sources&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/cwida/duckdb/tree/master/third_party/libpg_query&quot;&gt;DuckDB&lt;/a&gt; uses a forked version of pg_query for their parsing layer&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gitlab.com/gitlab-org/gitlab/-/blob/a36e2684/Gemfile#L310&quot;&gt;GitLab&lt;/a&gt; uses pg_query for normalizing queries in their internal error reporting&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/splitgraph/splitgraph/blob/b1784d3a2009c3ee3a027372c46dcc730ce2ca78/splitgraph/core/sql/__init__.py&quot;&gt;Splitgraph&lt;/a&gt; uses pg_query via the pglast Python binding to parse the SQL statements in Splitfiles&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/purcell/sqlint&quot;&gt;sqlint&lt;/a&gt; lints your SQL files for correctness&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Today, it&apos;s time to bring pg_query to the next level.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&quot;announcing-pg_query-20-better--faster-parsing-with-postgres-13-support&quot; &gt;&lt;a href=&quot;#announcing-pg_query-20-better--faster-parsing-with-postgres-13-support&quot; aria-label=&quot;announcing pg_query 20 better  faster parsing with postgres 13 support permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Announcing pg_query 2.0: Better &amp;#x26; faster parsing, with Postgres 13 support&lt;/h2&gt;
&lt;p&gt;We&apos;re excited to announce the next major version of pg_query, &lt;strong&gt;pg_query 2.0.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this version, you&apos;ll find support for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parsing the PostgreSQL 13 query syntax&lt;/li&gt;
&lt;li&gt;Deparser as part of the core C library, to turn modified parse trees back into SQL&lt;/li&gt;
&lt;li&gt;New parse tree format based on &lt;a href=&quot;https://developers.google.com/protocol-buffers&quot;&gt;Protocol Buffers&lt;/a&gt; (Protobuf)&lt;/li&gt;
&lt;li&gt;Improved, faster query fingerprinting mechanism&lt;/li&gt;
&lt;li&gt;And much more!&lt;/li&gt;
&lt;/ul&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#postgres-community-tools-build-on-pg_query&quot;&gt;Postgres community tools build on pg_query&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#announcing-pg_query-20-better--faster-parsing-with-postgres-13-support&quot;&gt;Announcing pg_query 2.0: Better &amp;#x26; faster parsing, with Postgres 13 support&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-pg_query-turns-a-postgres-statement-into-a-parse-tree&quot;&gt;How pg_query turns a Postgres statement into a parse tree&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#using-libclang-to-extract-c-source-code-from-postgres&quot;&gt;Using LibClang to extract C source code from Postgres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#turning-postgres-parser-c-structs-into-json-and-protobufs&quot;&gt;Turning Postgres parser C structs into JSON and Protobufs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-pg_query-20-adds-support-for-protocol-buffers&quot;&gt;Why pg_query 2.0 adds support for Protocol Buffers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#turning-parse-trees-back-into-sql-using-a-deparser&quot;&gt;Turning parse trees back into SQL using a deparser&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-pg_query-deparser-with-coverage-for-all-postgres-regression-tests&quot;&gt;The pg_query deparser with coverage for all Postgres regression tests&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#fingerprints-in-pg_query-a-better-way-to-check-if-two-queries-are-identical&quot;&gt;Fingerprints in pg_query: A better way to check if two queries are identical&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-did-we-create-our-own-query-fingerprint-concept&quot;&gt;Why did we create our own query fingerprint concept?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#additional-changes-for-pg_query-20&quot;&gt;Additional changes for pg_query 2.0&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;To start, let&apos;s revisit how pg_query actually works.&lt;/p&gt;
&lt;h2 id=&quot;how-pg_query-turns-a-postgres-statement-into-a-parse-tree&quot; &gt;&lt;a href=&quot;#how-pg_query-turns-a-postgres-statement-into-a-parse-tree&quot; aria-label=&quot;how pg_query turns a postgres statement into a parse tree permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How pg_query turns a Postgres statement into a parse tree&lt;/h2&gt;
&lt;p&gt;There are many ways to parse SQL, but the scope for pg_query is very specific: to parse the full Postgres query syntax, the same way Postgres does. The only reliable way to do this is to &lt;strong&gt;use the Postgres parser itself&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;pg_query isn&apos;t the first project to do this. For example, pgpool has a copy of the Postgres parser as well. But we needed an easily maintainable, self-contained version of the parser in a standalone C library. This would let us, and the Postgres community, use the parser from almost any language by writing a simple wrapper.&lt;/p&gt;
&lt;p&gt;How did we do this? We started by &lt;a href=&quot;https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/parser/parser.c#L42&quot;&gt;looking at the Postgres source&lt;/a&gt;, where you will find a function called &lt;code &gt;raw_parser&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * raw_parser
 *		Given a query in string form, do lexical and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.  The immediate elements
 * of the list are always RawStmt nodes.
 */&lt;/span&gt;
List &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;raw_parser&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;char&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;str&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After raw parsing, Postgres goes into parse analysis. In that phase Postgres identifies the types of columns, maps table names to the schema and more. After that, Postgres does planning (see our introduction to &lt;a href=&quot;https://pganalyze.com/docs/explain/basics-of-postgres-query-planning&quot;&gt;Postgres query planning&lt;/a&gt;), and then executes the query based on the query plan.&lt;/p&gt;
&lt;p&gt;For pg_query, all we need is the raw parser. Looking at the code, &lt;strong&gt;we discovered a problem&lt;/strong&gt;. The parser code still depends on a lot of Postgres code, such as for memory management or error handling. We needed a repeatable way to extract just enough source code to compile and run the parser.&lt;/p&gt;
&lt;p&gt;Thus the idea was born to automatically extract the Postgres parser code and its dependencies.&lt;/p&gt;
&lt;h3 id=&quot;using-libclang-to-extract-c-source-code-from-postgres&quot; &gt;&lt;a href=&quot;#using-libclang-to-extract-c-source-code-from-postgres&quot; aria-label=&quot;using libclang to extract c source code from postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using LibClang to extract C source code from Postgres&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Our goal:&lt;/strong&gt; A set of self-contained C files that represent a copy of Postgres&apos; &lt;code &gt;raw_parser&lt;/code&gt; function. But we don&apos;t want to copy the code manually. Luckily we can use &lt;a href=&quot;https://clang.llvm.org/docs/Tooling.html#libclang&quot;&gt;LibClang&lt;/a&gt; to parse C code, and understand its dependencies.&lt;/p&gt;
&lt;p&gt;The details of this could fill many pages, but here is a simplified version of how this works:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Each translation unit (.c file) in the source is analyzed via LibClang&apos;s Ruby binding:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;ffi/clang&apos;&lt;/span&gt;

index &lt;span &gt;=&lt;/span&gt; &lt;span &gt;FFI&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Clang&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Index&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;new&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;true&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
translation_unit &lt;span &gt;=&lt;/span&gt; index&lt;span &gt;.&lt;/span&gt;parse_translation_unit&lt;span &gt;(&lt;/span&gt;file&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;... CFLAGS ...&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. The analysis walks through the file and records each C function, as well as the symbols it references:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;translation_unit&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;visit_children &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;cursor&lt;span &gt;,&lt;/span&gt; parent&lt;span &gt;|&lt;/span&gt;
  &lt;span &gt;@file_to_symbol_positions&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;location&lt;span &gt;.&lt;/span&gt;file&lt;span &gt;]&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;&lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;@file_to_symbol_positions&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;location&lt;span &gt;.&lt;/span&gt;file&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;spelling&lt;span &gt;]&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;extent&lt;span &gt;.&lt;/span&gt;start&lt;span &gt;.&lt;/span&gt;offset&lt;span &gt;,&lt;/span&gt; cursor&lt;span &gt;.&lt;/span&gt;extent&lt;span &gt;.&lt;/span&gt;&lt;span &gt;end&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;offset&lt;span &gt;]&lt;/span&gt;
  cursor&lt;span &gt;.&lt;/span&gt;visit_children &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;child_cursor&lt;span &gt;,&lt;/span&gt; parent&lt;span &gt;|&lt;/span&gt;
    &lt;span &gt;if&lt;/span&gt; child_cursor&lt;span &gt;.&lt;/span&gt;kind &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:cursor_decl_ref_expr&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; child_cursor&lt;span &gt;.&lt;/span&gt;kind &lt;span &gt;==&lt;/span&gt; &lt;span &gt;:cursor_call_expr&lt;/span&gt;
      &lt;span &gt;@references&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;spelling&lt;span &gt;]&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;&lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
      &lt;span &gt;(&lt;/span&gt;&lt;span &gt;@references&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;cursor&lt;span &gt;.&lt;/span&gt;spelling&lt;span &gt;]&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; child_cursor&lt;span &gt;.&lt;/span&gt;spelling&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;uniq&lt;span &gt;!&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
    &lt;span &gt;:recurse&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. We resolve the required C functions and their code, starting from the top-level function we are looking for:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;deep_resolve&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;method_name&lt;span &gt;,&lt;/span&gt; depth&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; trail&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; global_resolved_by_parent&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; static_resolved_by_parent&lt;span &gt;:&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; static_base_filename&lt;span &gt;:&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
  global_dependents &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;@references&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;method_name&lt;span &gt;]&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  global_dependents&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;symbol&lt;span &gt;|&lt;/span&gt;
    deep_resolve&lt;span &gt;(&lt;/span&gt;symbol&lt;span &gt;,&lt;/span&gt; depth&lt;span &gt;:&lt;/span&gt; depth &lt;span &gt;+&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; trail&lt;span &gt;:&lt;/span&gt; trail &lt;span &gt;+&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;method_name&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; global_resolved_by_parent&lt;span &gt;:&lt;/span&gt; global_resolved_by_parent &lt;span &gt;+&lt;/span&gt; global_dependents&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

deep_resolve&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;raw_parser&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;4. We write out just the portions of the C code that are required (see &lt;a href=&quot;https://github.com/pganalyze/libpg_query/blob/6517eedf6c3c6c53a14ecd8f01410bb8fc3c8ec1/scripts/extract_source.rb#L354&quot;&gt;details here&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With this, we have a working Postgres parser!&lt;/p&gt;
&lt;p&gt;You can find the full details in the &lt;a href=&quot;https://github.com/pganalyze/libpg_query/blob/13-latest/scripts/extract_source.rb&quot;&gt;pg_query source&lt;/a&gt;.&lt;/p&gt;
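&lt;p&gt;To make step 3 more concrete, here is a minimal, self-contained sketch of transitive dependency resolution. It is not the actual extraction script: the reference graph and every function name in it are made up for illustration, and &lt;code &gt;resolve_all&lt;/code&gt; is a hypothetical simplification of &lt;code &gt;deep_resolve&lt;/code&gt; that only tracks which symbols have already been visited.&lt;/p&gt;

```ruby
require 'set'

# Made-up reference graph: each function name maps to the symbols it uses.
# In the real script this information comes from the LibClang analysis.
REFERENCES = {
  'parse_entry'   => ['scan', 'alloc'],
  'scan'          => ['alloc', 'report_error'],
  'alloc'         => ['report_error'],
  'report_error'  => [],
  'unused_helper' => ['alloc']
}.freeze

# Collect every symbol reachable from root, visiting each symbol only once
# so that shared or cyclic dependencies cannot cause endless recursion.
def resolve_all(root, references, resolved = Set.new)
  return resolved if resolved.include?(root)

  resolved.add(root)
  (references[root] || []).each do |symbol|
    resolve_all(symbol, references, resolved)
  end
  resolved
end

required = resolve_all('parse_entry', REFERENCES)
puts required.to_a.sort.join(', ')
# prints: alloc, parse_entry, report_error, scan
```

&lt;p&gt;Anything that is not reachable from the entry point (here &lt;code &gt;unused_helper&lt;/code&gt;) is left out, which is what keeps the extracted source small.&lt;/p&gt;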
&lt;p&gt;Once we can call the Postgres parser in our standalone library, we can get the result as a parse tree, represented as Postgres parser C structs. But now we needed to make this useful in other languages, such as Ruby or Go.&lt;/p&gt;
&lt;h3 id=&quot;turning-postgres-parser-c-structs-into-json-and-protobufs&quot; &gt;&lt;a href=&quot;#turning-postgres-parser-c-structs-into-json-and-protobufs&quot; aria-label=&quot;turning postgres parser c structs into json and protobufs permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Turning Postgres parser C structs into JSON and Protobufs&lt;/h3&gt;
&lt;p&gt;It&apos;s a little-known fact, but Postgres actually has a text representation of a query parse tree. It&apos;s rarely used directly, being reserved for internal communication and debugging. The easiest way to see an example is by looking at the &lt;code &gt;adbin&lt;/code&gt; field in &lt;a href=&quot;https://www.postgresql.org/docs/13/catalog-pg-attrdef.html&quot;&gt;pg_attrdef&lt;/a&gt;, which shows the internal representation for an expression of a column default value (to contrast, &lt;code &gt;pg_get_expr&lt;/code&gt; shows the expression in SQL):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; adbin&lt;span &gt;,&lt;/span&gt; pg_get_expr&lt;span &gt;(&lt;/span&gt;adbin&lt;span &gt;,&lt;/span&gt; adrelid&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_attrdef &lt;span &gt;WHERE&lt;/span&gt; adrelid &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;mytable&apos;&lt;/span&gt;::regclass &lt;span &gt;AND&lt;/span&gt; adnum &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
adbin       | {FUNCEXPR :funcid 480 :funcresulttype 23 :funcretset false :funcvariadic false :funcformat 2 :funccollid 0 :inputcollid 0 :args ({FUNCEXPR :funcid 1574 :funcresulttype 20 :funcretset false :funcvariadic false :funcformat 0 :funccollid 0 :inputcollid 0 :args ({CONST :consttype 2205 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 68 :constvalue 4 [ -27 10 -122 1 0 0 0 0 ]}) :location 60}) :location -1}
pg_get_expr | nextval(&apos;mytable_id_seq&apos;::regclass)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This text format is not useful for working with a parse tree in other languages. Thus, we needed a more portable format to export the parse tree from C, and import it in another language such as Ruby.&lt;/p&gt;
&lt;p&gt;The initial version of pg_query used JSON for this. JSON is great, since you can parse it in pretty much any programming language. Thus, in this new pg_query release, we still support JSON.&lt;/p&gt;
&lt;p&gt;We&apos;re also introducing support for a new schema-based format, using Protocol Buffers (Protobuf).&lt;/p&gt;
&lt;h3 id=&quot;why-pg_query-20-adds-support-for-protocol-buffers&quot; &gt;&lt;a href=&quot;#why-pg_query-20-adds-support-for-protocol-buffers&quot; aria-label=&quot;why pg_query 20 adds support for protocol buffers permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why pg_query 2.0 adds support for Protocol Buffers&lt;/h3&gt;
&lt;p&gt;Whilst JSON is convenient for passing around the parse tree, it has a few problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;JSON is slower to parse than a binary format&lt;/li&gt;
&lt;li&gt;Memory usage can become an issue with complex parse trees&lt;/li&gt;
&lt;li&gt;Building logic around a tree of JSON data is error-prone, as one needs to add a lot of checks to identify each node and its supported fields&lt;/li&gt;
&lt;li&gt;It&apos;s hard to instantiate new parse tree nodes, for example to use for deparsing back into a SQL statement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In pg_query 1.0, accessing the value of a &quot;SELECT 1&quot; would have looked like this with the Ruby binding:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT 1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
result&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;RawStmt&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;stmt&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;SelectStmt&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;targetList&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;ResTarget&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;val&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;# =&gt; {&quot;A_Const&quot;=&gt;{&quot;val&quot;=&gt;{&quot;Integer&quot;=&gt;{&quot;ival&quot;=&gt;1}}, &quot;location&quot;=&gt;7}}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is how Protobuf improves the parse tree handling in Ruby:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT 1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
result&lt;span &gt;.&lt;/span&gt;tree&lt;span &gt;.&lt;/span&gt;stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;stmt&lt;span &gt;.&lt;/span&gt;select_stmt&lt;span &gt;.&lt;/span&gt;target_list&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;res_target&lt;span &gt;.&lt;/span&gt;val&lt;span &gt;.&lt;/span&gt;a_const
&lt;span &gt;# =&gt; &amp;lt;PgQuery::A_Const: val: &amp;lt;PgQuery::Node: integer: &amp;lt;PgQuery::Integer: ival: 1&gt;&gt;, location: 7&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note how we have a full class definition for each parse tree node type, making interaction with the tree nodes significantly easier.&lt;/p&gt;
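&lt;p&gt;As a rough illustration of why typed nodes are less error-prone, the following sketch compares a plain Hash (the node printed by the pg_query 1.0 example above) with typed objects. It does not use pg_query at all: &lt;code &gt;IntegerNode&lt;/code&gt; and &lt;code &gt;AConstNode&lt;/code&gt; are hypothetical Ruby Structs standing in for generated node classes, and the misspelled field &lt;code &gt;ivall&lt;/code&gt; is deliberate.&lt;/p&gt;

```ruby
# Hash-based node, copied from the pg_query 1.0 output shown above.
hash_node = { 'A_Const' => { 'val' => { 'Integer' => { 'ival' => 1 } }, 'location' => 7 } }

# A misspelled key is not an error: dig quietly returns nil.
typo_result = hash_node.dig('A_Const', 'val', 'Integer', 'ivall')

# Hypothetical typed stand-ins for per-node classes (plain Structs here).
IntegerNode = Struct.new(:ival)
AConstNode = Struct.new(:val, :location)
typed_node = AConstNode.new(IntegerNode.new(1), 7)

# With a class per node type, the same typo raises NoMethodError right away.
typo_error = begin
  typed_node.val.ivall
  nil
rescue NoMethodError => e
  e.class
end

puts typo_result.inspect   # prints: nil
puts typed_node.val.ival   # prints: 1
puts typo_error            # prints: NoMethodError
```

&lt;p&gt;With the Hash, a mistake like this only shows up later as an unexpected &lt;code &gt;nil&lt;/code&gt;. With one class per node type it fails at the line that contains the typo.&lt;/p&gt;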
&lt;p&gt;Now, let&apos;s say I want to change a parse tree and turn it back into a SQL statement. For this, I need a deparser.&lt;/p&gt;
&lt;h2 id=&quot;turning-parse-trees-back-into-sql-using-a-deparser&quot; &gt;&lt;a href=&quot;#turning-parse-trees-back-into-sql-using-a-deparser&quot; aria-label=&quot;turning parse trees back into sql using a deparser permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Turning parse trees back into SQL using a deparser&lt;/h2&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/ecd4126fc167bfe7a01adca9808529e3/aa440/deparser.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Illustration of deparsing Postgres parse tree back into SELECT statement&quot;
        title=&quot;Illustration of deparsing Postgres parse tree back into SELECT statement&quot;
        src=&quot;https://pganalyze.com/static/ecd4126fc167bfe7a01adca9808529e3/1d69c/deparser.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Postgres itself has deparser logic in many places. For example, postgres_fdw has a deparser to generate the query to send to the remote server. But the deparser code in Postgres requires a post-parse-analysis parse tree (one that directly references relation OIDs, etc). That means we can&apos;t make use of it in pg_query, which works with raw parse trees.&lt;/p&gt;
&lt;p&gt;For many years now, the Ruby pg_query library has had a deparser. Over the years we&apos;ve had many community contributions to make it complete. The third-party libraries for Python and Node.js also have their own deparser. These efforts were all done in parallel, without sharing code. And the Go library is missing a deparser altogether.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How can we reduce the duplicated effort in the community?&lt;/strong&gt; By creating a new portable deparser for raw parse trees. This avoids having duplicate efforts for every pg_query-based library.&lt;/p&gt;
&lt;h3 id=&quot;the-pg_query-deparser-with-coverage-for-all-postgres-regression-tests&quot; &gt;&lt;a href=&quot;#the-pg_query-deparser-with-coverage-for-all-postgres-regression-tests&quot; aria-label=&quot;the pg_query deparser with coverage for all postgres regression tests permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The pg_query deparser with coverage for all Postgres regression tests&lt;/h3&gt;
&lt;p&gt;pg_query 2.0 features a new deparser, written in C. This was by far the biggest undertaking of this new release. The new deparser is able to generate all SQL queries used in the Postgres regression tests (which the pg_query parser can of course parse), and more.&lt;/p&gt;
&lt;p&gt;Here is how it works, using the Go library as an example, which previously did not have a deparser:&lt;/p&gt;
&lt;div  data-language=&quot;go&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;package&lt;/span&gt; main

&lt;span &gt;import&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&quot;fmt&quot;&lt;/span&gt;
  pg_query &lt;span &gt;&quot;github.com/pganalyze/pg_query_go/v2&quot;&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;

&lt;span &gt;func&lt;/span&gt; &lt;span &gt;main&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;// Parse a query&lt;/span&gt;
  result&lt;span &gt;,&lt;/span&gt; err &lt;span &gt;:=&lt;/span&gt; pg_query&lt;span &gt;.&lt;/span&gt;&lt;span &gt;Parse&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT 42&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;if&lt;/span&gt; err &lt;span &gt;!=&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;panic&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;err&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;

  &lt;span &gt;// Modify the parse tree&lt;/span&gt;
  result&lt;span &gt;.&lt;/span&gt;Stmts&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;Stmt&lt;span &gt;.&lt;/span&gt;&lt;span &gt;GetSelectStmt&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;GetTargetList&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;GetResTarget&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;Val &lt;span &gt;=&lt;/span&gt;
    pg_query&lt;span &gt;.&lt;/span&gt;&lt;span &gt;MakeAConstStrNode&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;Hello World&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

  &lt;span &gt;// Deparse back into a query&lt;/span&gt;
  stmt&lt;span &gt;,&lt;/span&gt; err &lt;span &gt;:=&lt;/span&gt; pg_query&lt;span &gt;.&lt;/span&gt;&lt;span &gt;Deparse&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;result&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;if&lt;/span&gt; err &lt;span &gt;!=&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;panic&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;err&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
  fmt&lt;span &gt;.&lt;/span&gt;&lt;span &gt;Printf&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;%s\n&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; stmt&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will output the following:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;SELECT &apos;Hello World&apos;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;First, the deparsing step encodes the Go structs into the new Protobuf format. Then, the C library decodes this into Postgres parse tree C structs. Last but not least, the C library&apos;s new deparser turns the C structs into the SQL query text.&lt;/p&gt;
&lt;p&gt;Stepping away from deparsing, let&apos;s take a look at the new fingerprinting mechanism:&lt;/p&gt;
&lt;h2 id=&quot;fingerprints-in-pg_query-a-better-way-to-check-if-two-queries-are-identical&quot; &gt;&lt;a href=&quot;#fingerprints-in-pg_query-a-better-way-to-check-if-two-queries-are-identical&quot; aria-label=&quot;fingerprints in pg_query a better way to check if two queries are identical permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Fingerprints in pg_query: A better way to check if two queries are identical&lt;/h2&gt;
&lt;p&gt;Let&apos;s start with the motivation for query fingerprints. pganalyze needs to link together Postgres statistics across different data sources, for example matching queries from &lt;code &gt;pg_stat_statements&lt;/code&gt; with the Postgres &lt;code &gt;auto_explain&lt;/code&gt; logs. You can see the fingerprint in pganalyze on the query details page:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/f1ff8b6c20366e040045fc97d416895b/acf8f/pganalyze_query_details.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;pganalyze Query Details page showing a query and its associated fingerprint value&quot;
        title=&quot;pganalyze Query Details page showing a query and its associated fingerprint value&quot;
        src=&quot;https://pganalyze.com/static/f1ff8b6c20366e040045fc97d416895b/1d69c/pganalyze_query_details.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This query can be represented differently depending on which part of Postgres you look at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_stat_statements: &lt;code &gt;SELECT &quot;abalance&quot; FROM &quot;pgbench_accounts&quot; WHERE &quot;aid&quot; = ?&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;auto_explain: &lt;code &gt;SELECT abalance FROM pgbench_accounts WHERE aid = 4674588&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A simple text comparison would not be sufficient to determine that these queries are identical.&lt;/p&gt;
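&lt;p&gt;To make this concrete, here is a minimal Python sketch. The &lt;code &gt;naive_normalize&lt;/code&gt; helper is hypothetical and is not how pg_query works (fingerprints are computed from the parse tree, not from text). It only illustrates that the two texts differ in quoting and constants rather than in structure:&lt;/p&gt;

```python
import re

pgss_text = 'SELECT "abalance" FROM "pgbench_accounts" WHERE "aid" = ?'
auto_explain_text = 'SELECT abalance FROM pgbench_accounts WHERE aid = 4674588'


def naive_normalize(query):
    """Hypothetical helper: strip identifier quotes and mask integer constants."""
    unquoted = query.replace('"', '')
    return re.sub(r'\b\d+\b', '?', unquoted)


# A plain string comparison treats the two texts as different queries.
texts_equal = pgss_text == auto_explain_text

# After crude normalization both texts collapse to the same string.
normalized_equal = naive_normalize(pgss_text) == naive_normalize(auto_explain_text)

print(texts_equal, normalized_equal)
```

&lt;p&gt;A text-based approach like this quickly breaks down on string constants, comments, or differences in whitespace and casing, which is one reason to work on the parse tree instead.&lt;/p&gt;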
&lt;h3 id=&quot;why-did-we-create-our-own-query-fingerprint-concept&quot; &gt;&lt;a href=&quot;#why-did-we-create-our-own-query-fingerprint-concept&quot; aria-label=&quot;why did we create our own query fingerprint concept permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why did we create our own query fingerprint concept?&lt;/h3&gt;
&lt;p&gt;Postgres already has the concept of a &quot;queryid&quot;, calculated based on the post-parse analysis tree. It&apos;s used in places such as &lt;code &gt;pg_stat_statements&lt;/code&gt; to distinguish the different query entries.&lt;/p&gt;
&lt;p&gt;However, this queryid is not available everywhere today; for example, you can&apos;t get it with &lt;code &gt;auto_explain&lt;/code&gt; plans. It&apos;s also not portable between databases, as it depends on specific relation OIDs: even if you have the exact same queries on your staging and production systems, they will have different queryid values. Finally, the queryid can&apos;t be generated outside the context of a Postgres server. Thus, pganalyze has its own mechanism, called a query fingerprint.&lt;/p&gt;
&lt;p&gt;Fingerprints identify a Postgres query based on its raw parse tree alone. We&apos;ve open-sourced this mechanism in pg_query:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;fingerprint&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT a, b FROM c&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;fb1f305bea85c2f6&quot;&lt;/span&gt;

&lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;fingerprint&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT b, a FROM c&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# =&gt; &quot;fb1f305bea85c2f6&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This mechanism does not need a running server, so all you need as input is a valid Postgres query.&lt;/p&gt;
&lt;p&gt;With pg_query 2.0, we&apos;ve made a few enhancements to the fingerprint functionality:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use the faster XXH3 hash function, instead of SHA-1.&lt;/strong&gt; pg_query 1.0 used the outdated cryptographic hash function SHA-1. Cryptographic guarantees are not needed for this use case, and XXH3 is much faster.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Contain the fingerprint in a 64-bit value, instead of 136 bits.&lt;/strong&gt; We&apos;ve determined that 64-bit precision is good enough for query fingerprints. Postgres itself thinks so too, since it uses 64-bit for the Postgres queryid. We often use data from &lt;code &gt;pg_stat_statements&lt;/code&gt;, so there is little benefit to more bits. Using a smaller data type also means better performance for pganalyze.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fix edge cases where two almost identical queries had different fingerprints.&lt;/strong&gt; Fingerprints should ignore differences between queries that do not change the query intent. We&apos;ve addressed a few cases where this was not working as expected. You can look at the corresponding &lt;a href=&quot;https://github.com/pganalyze/libpg_query/wiki/Fingerprinting#version-30-based-on-postgresql-13&quot;&gt;wiki page&lt;/a&gt; to understand these changes in more detail.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
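&lt;p&gt;As a rough back-of-the-envelope check of the 64-bit claim, the following Python sketch applies the standard birthday approximation &lt;code &gt;n(n-1)/2^(bits+1)&lt;/code&gt;. It assumes an ideal, uniformly distributed hash, and the query counts are made-up illustrations rather than pganalyze figures:&lt;/p&gt;

```python
def collision_probability(distinct_queries, bits=64):
    """Birthday-bound approximation of at least one collision for an ideal hash."""
    n = distinct_queries
    return n * (n - 1) / 2 ** (bits + 1)


# Illustrative query counts only (assumed, not measured).
for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n} distinct queries: p ~ {collision_probability(n):.3e}")
```

&lt;p&gt;Under these assumptions, even a million distinct queries yields a collision estimate on the order of one in tens of millions.&lt;/p&gt;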
&lt;h2 id=&quot;additional-changes-for-pg_query-20&quot; &gt;&lt;a href=&quot;#additional-changes-for-pg_query-20&quot; aria-label=&quot;additional changes for pg_query 20 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Additional changes for pg_query 2.0&lt;/h2&gt;
&lt;p&gt;A few other things about the new release:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The pg_query library now resides in the pganalyze organization on GitHub. This makes it clear who maintains and funds the core development. We will continue to make pg_query available under the BSD 3-clause license.&lt;/li&gt;
&lt;li&gt;pg_query has a new method for splitting queries. This can be useful when you want to split a multi-statement string into its component statements, for example &lt;code &gt;SELECT &apos;;&apos;; SELECT &apos;foo&apos;&lt;/code&gt; into &lt;code &gt;SELECT &apos;;&apos;&lt;/code&gt; and &lt;code &gt;SELECT &apos;foo&apos;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;There is a new function available to access the Postgres scanner. This includes the location of comments in a query text. One could envision building a syntax highlighter based on this, or extracting comments from queries whilst ignoring comment-like tokens in a constant value.&lt;/li&gt;
&lt;/ul&gt;
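&lt;p&gt;To illustrate why splitting queries needs more than a string split, here is a small Python sketch using the multi-statement example from the list above. The &lt;code &gt;split_outside_quotes&lt;/code&gt; helper is hypothetical and only understands single-quoted constants; it is not the pg_query implementation:&lt;/p&gt;

```python
multi_statement = "SELECT ';'; SELECT 'foo'"

# Naive approach: split on every semicolon, ignoring string constants.
naive_parts = [part.strip() for part in multi_statement.split(";") if part.strip()]


def split_outside_quotes(sql):
    """Hypothetical helper: split on semicolons that are not inside '...' constants."""
    statements, current, in_string = [], [], False
    for char in sql:
        if char == "'":
            in_string = not in_string  # also copes with '' escapes, which toggle twice
        if char == ";" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return [s for s in statements if s]


print(naive_parts)                            # three broken fragments
print(split_outside_quotes(multi_statement))  # the two intended statements
```

&lt;p&gt;Even this quote-aware version ignores comments, dollar-quoted strings and quoted identifiers, which is where relying on the actual Postgres scanner and parser pays off.&lt;/p&gt;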
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The new pg_query 2.0 is &lt;a href=&quot;https://github.com/pganalyze/libpg_query&quot;&gt;available today&lt;/a&gt;, with bindings for &lt;a href=&quot;https://github.com/pganalyze/pg_query_go&quot;&gt;Go&lt;/a&gt; and &lt;a href=&quot;https://github.com/pganalyze/pg_query&quot;&gt;Ruby&lt;/a&gt; available to start. We are also working on a new pganalyze-maintained Rust binding that we&apos;ll have news about soon.&lt;/p&gt;
&lt;p&gt;Help us get the word out by &lt;a href=&quot;https://twitter.com/intent/tweet?text=Introducing%20pg_query%202.0%20-%20The%20easiest%20way%20to%20parse%20Postgres%20queries%3A%20https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser&quot;&gt;sharing this post on Twitter&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Efficient Postgres Full Text Search in Django]]></title><description><![CDATA[In this article, we'll take a look at making use of the built-in, natural language based Postgres Full Text Search in Django. Internet users have gotten increasingly discerning when it comes to search. When they type a keyword into your website's search bar, they expect to find logically ranked results, including related matches and misspellings. Because users are used to these sophisticated search systems, developers have to build applications that use more than simple  queries.  Postgres Full…]]></description><link>https://pganalyze.com/blog/full-text-search-django-postgres</link><guid isPermaLink="false">https://pganalyze.com/blog/full-text-search-django-postgres</guid><dc:creator><![CDATA[Adeyinka Adegbenro]]></dc:creator><pubDate>Wed, 24 Feb 2021 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;In this article, we&apos;ll take a look at making use of the &lt;strong&gt;built-in, natural language based&lt;/strong&gt; Postgres Full Text Search in Django. Internet users have gotten increasingly discerning when it comes to search. When they type a keyword into your website&apos;s search bar, they expect to find logically ranked results, including related matches and misspellings.&lt;/p&gt;
&lt;p&gt;Because users are used to these sophisticated search systems, developers have to build applications that use more than simple &lt;code &gt;LIKE&lt;/code&gt; queries.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/9485e7c9ff9b771c87eefffae64dabb5/aa440/query_comparison.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Comparison of Sequential Scan for LIKE vs Index Scan for Full Text Search&quot;
        title=&quot;Comparison of Sequential Scan for LIKE vs Index Scan for Full Text Search&quot;
        src=&quot;https://pganalyze.com/static/9485e7c9ff9b771c87eefffae64dabb5/1d69c/query_comparison.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch.html&quot;&gt;Postgres Full Text Search&lt;/a&gt; has been available since Postgres 8.3. It can be used to &lt;strong&gt;find records based on semantics and knowledge of the language&lt;/strong&gt; rather than simple string matching, is very flexible, and unlike other search options such as &lt;code &gt;LIKE&lt;/code&gt;, it performs well for partial matches.&lt;/p&gt;
&lt;p&gt;While &lt;code &gt;LIKE&lt;/code&gt; can be supported by indexes, that usually won’t work well when the &lt;code &gt;%&lt;/code&gt; wildcard is used on the left side of the search term. Typically this means the query planner falls back to a sequential scan for a query with a leading wildcard like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; title &lt;span &gt;FROM&lt;/span&gt; film &lt;span &gt;WHERE&lt;/span&gt; description &lt;span &gt;LIKE&lt;/span&gt; &lt;span &gt;&apos;%brilliant&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Searching multiple columns also takes additional effort, as each column needs to be queried separately using &lt;code &gt;LIKE&lt;/code&gt;. There are more flexible alternatives, such as &lt;code &gt;SIMILAR TO&lt;/code&gt; and POSIX regular expression search, but those are still difficult to use when you want to catch different variations of a word (e.g., “jump,” “jumps,” “jumped,” and “jumping”).&lt;/p&gt;
&lt;p&gt;Because &lt;code &gt;LIKE&lt;/code&gt; and other simple methods lack language support and cannot handle word variations or ranking, Postgres Full Text Search (FTS) is generally a better option when implementing search directly in the database. With FTS, your searches will match all instances of the word, its plural, and the word&apos;s various tenses. Because FTS is bundled into Postgres, there’s no need to install extra software and no extra cost for using a third-party search provider. Additionally, all your data are stored in one place, which reduces your web application’s complexity.&lt;/p&gt;
&lt;p&gt;This article will show you how to use Full Text Search in raw PostgreSQL queries and implement equivalent queries in Django using the Postgres driver. Along the way, you’ll see some of the use cases for the various Full Text Search methods that Postgres provides.&lt;/p&gt;
&lt;p&gt;Should you be interested in learning more about Full Text Search with Rails, please check out our article here: &lt;a href=&quot;https://pganalyze.com/blog/full-text-search-ruby-rails-postgres&quot;&gt;Full Text Search in Milliseconds with Rails and PostgreSQL&lt;/a&gt;.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#core-concepts-of-postgres-full-text-search&quot;&gt;Core Concepts of Postgres Full Text Search&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#tsvector-data-type&quot;&gt;tsvector data type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tsquery-data-type&quot;&gt;tsquery data type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#searching&quot;&gt;Searching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#ranking&quot;&gt;Ranking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#weighting&quot;&gt;Weighting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-postgresql-full-text-search-in-django&quot;&gt;Using PostgreSQL Full Text Search in Django&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#searching-a-single-field&quot;&gt;Searching a Single field&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#searchvector&quot;&gt;SearchVector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#searchquery&quot;&gt;SearchQuery&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#searchrank&quot;&gt;SearchRank&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#optimizing-search-performance-in-django&quot;&gt;Optimizing Search Performance in Django&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#using-gin-indexes&quot;&gt;Using GIN indexes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#generated-columns-in-postgres-12&quot;&gt;Generated Columns in Postgres 12+&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#comparing-query-performance&quot;&gt;Comparing Query Performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;core-concepts-of-postgres-full-text-search&quot; &gt;&lt;a href=&quot;#core-concepts-of-postgres-full-text-search&quot; aria-label=&quot;core concepts of postgres full text search permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Core Concepts of Postgres Full Text Search&lt;/h2&gt;
&lt;p&gt;PostgreSQL provides several native functions for Full Text Search. In the following sections, you’ll see how to use them to “vectorize” your results and search queries so that you can use Postgres’s Full Text Search features.&lt;/p&gt;
&lt;h3 id=&quot;tsvector-data-type&quot; &gt;&lt;a href=&quot;#tsvector-data-type&quot; aria-label=&quot;tsvector data type permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;tsvector data type&lt;/h3&gt;
&lt;p&gt;Before a text or document can be searched using FTS, you need to convert it to an acceptable data type, known as a &lt;code &gt;tsvector&lt;/code&gt;. To convert plain text to a &lt;code &gt;tsvector&lt;/code&gt;, use the Postgres &lt;code &gt;to_tsvector&lt;/code&gt; function. This function reduces the original text to a set of word skeletons known as lexemes.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Lexeme&quot;&gt;Lexemes&lt;/a&gt; are important because they help match related words. For instance, the words &lt;em&gt;satisfy&lt;/em&gt;, &lt;em&gt;satisfying&lt;/em&gt; and &lt;em&gt;satisfied&lt;/em&gt; would convert to &lt;em&gt;satisfi&lt;/em&gt;. This means a search for &lt;em&gt;satisfy&lt;/em&gt; will return results containing any of the other terms as well. &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS&quot;&gt;Stop words&lt;/a&gt; such as “a,” “on,” “of,” “you,” “who,” etc. are removed because they appear too frequently to be relevant in searches. The &lt;code &gt;to_tsvector&lt;/code&gt; function returns the lexemes, along with a digit that denotes each word’s position in the text.&lt;/p&gt;
&lt;p&gt;Note that the output of the function is language-dependent. You should tell PostgreSQL to treat the text as English (or whatever language your results are stored in). To convert the sentence “A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank” to a &lt;code &gt;tsvector&lt;/code&gt;, run the following:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; search&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You’ll see output like the following:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                    search
----------------------------------------------------------------------
&apos;chase&apos;:12 &apos;documentari&apos;:3 &apos;fanci&apos;:2 &apos;frisbe&apos;:6 &apos;lumberjack&apos;:9 &apos;monkey&apos;:14 &apos;must&apos;:11 &apos;shark&apos;:17 &apos;tank&apos;:18&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This shows each word’s root as well as its position in the text. For example, the word &lt;em&gt;fanciful&lt;/em&gt;, the second word in the text, has been broken down into the lexeme “fanci”, so you see &lt;code &gt;&apos;fanci&apos;:2&lt;/code&gt;.&lt;/p&gt;
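&lt;p&gt;The position bookkeeping can be sketched in a few lines of Python. This is a toy model, not the Postgres implementation: the stop-word list is an assumed, tiny subset, and no stemming is applied, so it yields &lt;code &gt;fanciful&lt;/code&gt; rather than &lt;code &gt;fanci&lt;/code&gt;. It does show that stop words are dropped while the remaining words keep their original positions:&lt;/p&gt;

```python
import re

# Assumed, deliberately tiny stop-word list; Postgres ships a much longer one.
STOP_WORDS = {"a", "and", "in", "of", "who"}


def toy_tsvector(text):
    """Map each non-stop word to its 1-based positions (no stemming applied)."""
    positions = {}
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        if word not in STOP_WORDS:
            positions.setdefault(word, []).append(position)
    return dict(sorted(positions.items()))


sentence = ("A Fanciful Documentary of a Frisbee And a Lumberjack "
            "who must Chase a Monkey in A Shark Tank")
print(toy_tsvector(sentence))
```

&lt;p&gt;The positions line up with the Postgres output above (for example, 2 for the second word and 18 for the last one), because removed stop words still count toward the position of the words that follow them.&lt;/p&gt;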
&lt;h3 id=&quot;tsquery-data-type&quot; &gt;&lt;a href=&quot;#tsquery-data-type&quot; aria-label=&quot;tsquery data type permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;tsquery data type&lt;/h3&gt;
&lt;p&gt;Text search systems have two major components: the text being searched and the keyword being searched for. In the case of FTS, both components must be vectorized. You saw how searchable data is converted to a &lt;code &gt;tsvector&lt;/code&gt; in the previous section, so now you’ll see how search terms are vectorized into &lt;code &gt;tsquery&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;Postgres offers functions such as &lt;code &gt;to_tsquery&lt;/code&gt;, &lt;code &gt;plainto_tsquery&lt;/code&gt; and &lt;code &gt;phraseto_tsquery&lt;/code&gt; that convert text to &lt;code &gt;tsquery&lt;/code&gt; values. Search terms can be combined with the &lt;code &gt;&amp;amp;&lt;/code&gt; (AND), &lt;code &gt;|&lt;/code&gt; (OR), and &lt;code &gt;!&lt;/code&gt; (NOT) operators, and parentheses can be used to group operators and determine their order. &lt;code &gt;to_tsquery&lt;/code&gt; converts the search terms to tokens and discards stop words.&lt;/p&gt;
&lt;p&gt;The following query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;a &amp;amp; beautifully &amp;amp; very &amp;amp; quickly&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; search&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Returns the lexemes “beauti” and “quick” because “a” and “very” are stop words:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                    search
----------------------------------------------------------------------
&apos;beauti&apos; &amp;amp; &apos;quick&apos;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;searching&quot; &gt;&lt;a href=&quot;#searching&quot; aria-label=&quot;searching permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Searching&lt;/h3&gt;
&lt;p&gt;Now that you know how to create a &lt;code &gt;tsvector&lt;/code&gt; from your text data and a &lt;code &gt;tsquery&lt;/code&gt; from your search terms, you can perform a full-text search using the &lt;code &gt;@@&lt;/code&gt; operator.&lt;/p&gt;
&lt;p&gt;For example, you can run the following query to compare two strings:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;John&apos;&apos;s performance was found wanting&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; @@ to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;want&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query returns &lt;code &gt;TRUE&lt;/code&gt;, indicating that there was a match: &lt;code &gt;want&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you had a &lt;code &gt;film&lt;/code&gt; table containing movie titles and descriptions, you could use Full Text Search to find all films with a description containing the word &lt;em&gt;epic&lt;/em&gt; and either of the words &lt;em&gt;tale&lt;/em&gt; or &lt;em&gt;story&lt;/em&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; title&lt;span &gt;,&lt;/span&gt; description &lt;span &gt;FROM&lt;/span&gt; film
&lt;span &gt;WHERE&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;description&lt;span &gt;)&lt;/span&gt; @@ to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;epic &amp;amp; (story | tale)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;10&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This would give you results similar to this:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/d5399405d52107d5e214420347c5456c/a8a6f/search_result.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Query result&quot;
        title=&quot;Query result&quot;
        src=&quot;https://pganalyze.com/static/d5399405d52107d5e214420347c5456c/1d69c/search_result.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
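&lt;p&gt;The boolean structure of that search query, &lt;em&gt;epic&lt;/em&gt; AND (&lt;em&gt;story&lt;/em&gt; OR &lt;em&gt;tale&lt;/em&gt;), can be mirrored in a short Python sketch. The film names and lexeme sets below are hypothetical, and the sketch assumes the descriptions have already been reduced to lexemes:&lt;/p&gt;

```python
def matches_epic_story_or_tale(lexemes):
    """Mirror the boolean structure: epic AND (story OR tale)."""
    return "epic" in lexemes and ("story" in lexemes or "tale" in lexemes)


# Hypothetical, already-normalized lexeme sets for three film descriptions.
films = {
    "film_a": {"epic", "tale", "monkey"},
    "film_b": {"epic", "drama"},
    "film_c": {"story", "shark"},
}

matching = sorted(
    name for name, lexemes in films.items() if matches_epic_story_or_tale(lexemes)
)
print(matching)
```

&lt;p&gt;Only the first film matches: the second lacks both &lt;em&gt;story&lt;/em&gt; and &lt;em&gt;tale&lt;/em&gt;, and the third lacks &lt;em&gt;epic&lt;/em&gt;.&lt;/p&gt;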
&lt;h3 id=&quot;ranking&quot; &gt;&lt;a href=&quot;#ranking&quot; aria-label=&quot;ranking permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Ranking&lt;/h3&gt;
&lt;p&gt;Ranking search results can ensure that the most relevant results are shown first. Postgres provides &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING&quot;&gt;two functions for ranking&lt;/a&gt; search results: &lt;code &gt;ts_rank&lt;/code&gt; and &lt;code &gt;ts_rank_cd&lt;/code&gt;. &lt;code &gt;ts_rank&lt;/code&gt; considers the frequency of words, while &lt;code &gt;ts_rank_cd&lt;/code&gt; (“cd” means “cover density”) considers the position of search terms within the text being searched.&lt;/p&gt;
&lt;p&gt;If you run the following query, you’ll see that &lt;code &gt;rank1&lt;/code&gt; and &lt;code &gt;rank2&lt;/code&gt; are ranked the same (&lt;code &gt;0.06078271&lt;/code&gt;) because each search query is found once in the sentence:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dolphins are to water as elephants are to forest&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank1&lt;span &gt;,&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dolphins are to water as elephants are to forest&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank2&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The more tokens that match the text, the higher the rank. In the following example, &lt;code &gt;rank1&lt;/code&gt; has a higher rank than &lt;code &gt;rank2&lt;/code&gt; because the tokens “elephant” and “dolphin” are both found in the sentence while “snake” is not:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dolphins are to water as elephants are to forest&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant &amp;amp; dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank1&lt;span &gt;,&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dolphins are to water as elephants are to forest&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin &amp;amp; snake&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank2&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;weighting&quot; &gt;&lt;a href=&quot;#weighting&quot; aria-label=&quot;weighting permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Weighting&lt;/h3&gt;
&lt;p&gt;You can also give some parts of a document more relevance than others by weighting them. For instance, when searching the film table, the highest weight could be given to the movie title and a lower weight to the description, using the &lt;code &gt;setweight&lt;/code&gt; function:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;--- Set the weights&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; weight&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;--- Run the query&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; elephant_rank&lt;span &gt;,&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;elephant&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;dolphin&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; dolphin_rank&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code &gt;elephant_rank&lt;/code&gt; is higher because the query matched “elephant”, which carries weight A, while &lt;code &gt;dolphin_rank&lt;/code&gt; matched “dolphin”, which carries the lower weight B.&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;| elephant_rank | dolphin_rank |
|---------------|--------------|
| 0.6079271     | 0.24317084   |
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code &gt;ts_rank&lt;/code&gt; takes an optional first argument, &lt;strong&gt;weights&lt;/strong&gt;. When this argument is omitted, it defaults to “{0.1, 0.2, 0.4, 1.0}” in the order D, C, B, A: A has the highest weight of 1.0, B has 0.4, C has 0.2, and D has 0.1. You can set the weight of any of A, B, C, or D to a different value between 0 and 1.0. This gives you fine-grained control over how results are ranked, so that users see the most relevant results for their queries first.&lt;/p&gt;
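&lt;p&gt;To make the ordering of the weights array concrete, here is a minimal Python sketch. The helper name &lt;code &gt;build_weights&lt;/code&gt; is hypothetical (it is not part of Postgres or Django); it only arranges per-label overrides into the D, C, B, A order and rejects values outside 0 to 1.0. The last lines check that the two ranks printed earlier differ by roughly the ratio of the default B and A weights.&lt;/p&gt;

```python
# Default ts_rank weights, listed in the order Postgres expects: D, C, B, A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def build_weights(overrides=None):
    """Return a [D, C, B, A] list, applying overrides keyed by label.

    Hypothetical helper for illustration only.
    """
    weights = dict(DEFAULT_WEIGHTS)
    for label, value in (overrides or {}).items():
        if label not in weights:
            raise ValueError(f"unknown weight label: {label!r}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"weight for {label} must be between 0 and 1.0")
        weights[label] = value
    return [weights[label] for label in ("D", "C", "B", "A")]


# Raise B so that B-weighted matches count almost as much as A-weighted ones.
custom = build_weights({"B": 0.8})

# The ranks shown earlier (0.6079271 for A, 0.24317084 for B) differ by
# roughly the ratio of the default weights, 0.4 / 1.0.
ratio = 0.24317084 / 0.6079271
```

&lt;p&gt;In SQL, such a list is passed as the first argument, for example &lt;code &gt;ts_rank(&apos;{0.1, 0.2, 0.8, 1.0}&apos;, vector, query)&lt;/code&gt;, where &lt;code &gt;vector&lt;/code&gt; and &lt;code &gt;query&lt;/code&gt; stand in for your own expressions.&lt;/p&gt;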
&lt;p&gt;Now that you’ve seen how to use Postgres’ Full Text Search functions, you’re ready to start applying these ideas to your Django app.&lt;/p&gt;
&lt;h2 id=&quot;using-postgresql-full-text-search-in-django&quot; &gt;&lt;a href=&quot;#using-postgresql-full-text-search-in-django&quot; aria-label=&quot;using postgresql full text search in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using PostgreSQL Full Text Search in Django&lt;/h2&gt;
&lt;p&gt;Using Postgres’ Full Text Search in Django is a practical way to add accurate, fast search that is easy to maintain. To demonstrate Full Text Search in Django, consider a PostgreSQL database &lt;code &gt;dvdrental&lt;/code&gt;, with a &lt;code &gt;film&lt;/code&gt; table, and an equivalent &lt;code &gt;Film&lt;/code&gt; model in a Django application implemented like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Film&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    film_id &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;AutoField&lt;span &gt;(&lt;/span&gt;primary_key&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    title &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    description &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;TextField&lt;span &gt;(&lt;/span&gt;blank&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&apos;, &apos;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;join&lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;film_id=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; &lt;span &gt;str&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;.&lt;/span&gt;film_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;title=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;title&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;description&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the remainder of this article, you’ll see how to search a single database table field, search multiple fields, rank search results, and optimize the performance of Full Text Search using vector fields and indexes. You can run the commands from your &lt;a href=&quot;https://www.python.org/shell/&quot;&gt;Python shell&lt;/a&gt; (in a Django project, start it with &lt;code &gt;python manage.py shell&lt;/code&gt; so that the models can be imported).&lt;/p&gt;
&lt;h3 id=&quot;searching-a-single-field&quot; &gt;&lt;a href=&quot;#searching-a-single-field&quot; aria-label=&quot;searching a single field permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Searching a Single Field&lt;/h3&gt;
&lt;p&gt;The simplest way to start using Full Text Search in Django is the &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/#the-search-lookup&quot;&gt;search lookup&lt;/a&gt;. To search the &lt;code &gt;description&lt;/code&gt; column on the &lt;code &gt;Film&lt;/code&gt; model, append &lt;code &gt;__search&lt;/code&gt; to the column name when filtering the model:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;appname&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Film
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;description__search&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;An epic tale&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;8&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Airport Pollock&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Tale of a Moose And a Girl who must Confront a Monkey &lt;span &gt;in&lt;/span&gt; Ancient India&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
    &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;97&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Bride Intrigue&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Tale of a Robot And a Monkey who must Vanquish a Man &lt;span &gt;in&lt;/span&gt; New Orleans&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under the hood, Django converts the &lt;code &gt;description&lt;/code&gt; field to a &lt;code &gt;tsvector&lt;/code&gt; and converts the search term to a &lt;code &gt;tsquery&lt;/code&gt;. You can check the underlying query to verify this:&lt;/p&gt;
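&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; pprint &lt;span &gt;import&lt;/span&gt; pp
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pp&lt;span &gt;(&lt;/span&gt;connection&lt;span &gt;.&lt;/span&gt;queries&lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;sql&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;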
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT &quot;film&quot;.&quot;film_id&quot;, &quot;film&quot;.&quot;title&quot;, &quot;film&quot;.&quot;description&quot;, &apos;&lt;/span&gt;
 &lt;span &gt;&apos;&quot;film&quot;.&quot;release_year&quot;, &quot;film&quot;.&quot;language_id&quot;, &quot;film&quot;.&quot;rental_duration&quot;, &apos;&lt;/span&gt;
 &lt;span &gt;&apos;&quot;film&quot;.&quot;rental_rate&quot;, &quot;film&quot;.&quot;length&quot;, &quot;film&quot;.&quot;replacement_cost&quot;, &apos;&lt;/span&gt;
 &lt;span &gt;&apos;&quot;film&quot;.&quot;rating&quot;, &quot;film&quot;.&quot;last_update&quot;, &quot;film&quot;.&quot;special_features&quot;, &apos;&lt;/span&gt;
 &lt;span &gt;&apos;&quot;film&quot;.&quot;fulltext&quot;, &quot;film&quot;.&quot;index_column&quot;, &quot;film&quot;.&quot;vector_column&quot; FROM &quot;film&quot; &apos;&lt;/span&gt;
 &lt;span &gt;&apos;WHERE to_tsvector(COALESCE(&quot;film&quot;.&quot;description&quot;, \&apos;\&apos;)) @@ &apos;&lt;/span&gt;
 &lt;span &gt;&quot;plainto_tsquery(&apos;An epic tale&apos;) LIMIT 21&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
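&lt;p&gt;Two Django settings matter for the shell session above: the &lt;code &gt;search&lt;/code&gt; lookup is only available when &lt;code &gt;django.contrib.postgres&lt;/code&gt; is listed in &lt;code &gt;INSTALLED_APPS&lt;/code&gt;, and &lt;code &gt;connection.queries&lt;/code&gt; is only populated when &lt;code &gt;DEBUG&lt;/code&gt; is &lt;code &gt;True&lt;/code&gt;. A minimal settings fragment might look like the following; the app label &lt;code &gt;films&lt;/code&gt; is a placeholder assumed for this sketch, not a name taken from the article.&lt;/p&gt;

```python
# settings.py (fragment, intended for local development only)

# connection.queries is only recorded while DEBUG is True.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.postgres",  # registers the __search lookup
    "films",  # placeholder: the app that defines the Film model
]
```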
&lt;h3 id=&quot;searchvector&quot; &gt;&lt;a href=&quot;#searchvector&quot; aria-label=&quot;searchvector permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SearchVector&lt;/h3&gt;
&lt;p&gt;If you want to build the &lt;code &gt;tsvector&lt;/code&gt; yourself, for example to search more than one column, you can use the Django &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/#searchvector&quot;&gt;SearchVector&lt;/a&gt; class. For example, to search for the term “love” in both the title and description columns, you can run the following in your Python shell:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;search &lt;span &gt;import&lt;/span&gt; SearchVector
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;annotate&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;title&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;love&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;374&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Graffiti Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer &lt;span &gt;in&lt;/span&gt; Berlin&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;448&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Idaho Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Fast&lt;span &gt;-&lt;/span&gt;Paced Drama of a Student And a Crocodile who must Meet a Database Administrator &lt;span &gt;in&lt;/span&gt; The Outback&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
 &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;458&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Indian Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut &lt;span &gt;in&lt;/span&gt; An Abandoned Fun House&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;511&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Lawrence Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer &lt;span &gt;in&lt;/span&gt; Berlin&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;535&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Love Suicides&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist &lt;span &gt;in&lt;/span&gt; An Abandoned Fun House&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;536&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Lovely Jingle&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile &lt;span &gt;in&lt;/span&gt; The Outback&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you inspect the underlying query, you can see how Django uses &lt;code &gt;to_tsvector&lt;/code&gt; to query both the &lt;code &gt;title&lt;/code&gt; and &lt;code &gt;description&lt;/code&gt; fields in the database:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pp&lt;span &gt;(&lt;/span&gt;connection&lt;span &gt;.&lt;/span&gt;queries&lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;sql&apos;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;SELECT &quot;film&quot;.&quot;film_id&quot;, &quot;film&quot;.&quot;title&quot;, &quot;film&quot;.&quot;description&quot;, &apos;&lt;/span&gt;
 &lt;span &gt;&apos;to_tsvector(\&apos;english\&apos;::regconfig, COALESCE(&quot;film&quot;.&quot;title&quot;, \&apos;\&apos;) || \&apos; \&apos; &apos;&lt;/span&gt;
 &lt;span &gt;&apos;|| COALESCE(&quot;film&quot;.&quot;description&quot;, \&apos;\&apos;)) AS &quot;search&quot; FROM &quot;film&quot; WHERE &apos;&lt;/span&gt;
 &lt;span &gt;&apos;to_tsvector(\&apos;english\&apos;::regconfig, COALESCE(&quot;film&quot;.&quot;title&quot;, \&apos;\&apos;) || \&apos; \&apos; &apos;&lt;/span&gt;
 &lt;span &gt;&apos;|| COALESCE(&quot;film&quot;.&quot;description&quot;, \&apos;\&apos;)) @@ &apos;&lt;/span&gt;
 &lt;span &gt;&quot;plainto_tsquery(&apos;english&apos;::regconfig, &apos;love&apos;) LIMIT 21&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;searchquery&quot; &gt;&lt;a href=&quot;#searchquery&quot; aria-label=&quot;searchquery permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SearchQuery&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/#searchquery&quot;&gt;SearchQuery&lt;/a&gt; is Django’s abstraction over the &lt;code &gt;to_tsquery&lt;/code&gt;, &lt;code &gt;plainto_tsquery&lt;/code&gt; and &lt;code &gt;phraseto_tsquery&lt;/code&gt; functions in Postgres. There are several ways to use the SearchQuery class, including passing two keywords that must both match:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;story beautiful&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or searching for a specific phrase:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;mad scientist&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; search_type&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;phrase&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
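&lt;p&gt;The difference between these two search types is how the terms are joined in the resulting &lt;code &gt;tsquery&lt;/code&gt;: a plain search requires all terms to be present anywhere in the document (joined with &lt;code &gt;&amp;amp;&lt;/code&gt;), while a phrase search requires them to follow one another (joined with &lt;code &gt;&amp;lt;-&gt;&lt;/code&gt;). The following pure-Python sketch imitates only that joining step. The function &lt;code &gt;sketch_tsquery&lt;/code&gt; and its tiny stop word list are made up for illustration; the real &lt;code &gt;plainto_tsquery&lt;/code&gt; and &lt;code &gt;phraseto_tsquery&lt;/code&gt; functions also stem each word, which is skipped here.&lt;/p&gt;

```python
# A tiny stand-in for the stop words a text search configuration removes.
# Assumption for this sketch; the real English list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of"}


def sketch_tsquery(text, search_type="plain"):
    """Imitate how plain and phrase searches join their terms.

    Hypothetical helper: it lowercases the text, drops stop words and
    joins the remaining terms. No stemming is performed.
    """
    terms = [word for word in text.lower().split() if word not in STOP_WORDS]
    operator = {"plain": " & ", "phrase": " <-> "}[search_type]
    return operator.join(f"'{term}'" for term in terms)


plain = sketch_tsquery("story beautiful")
phrase = sketch_tsquery("mad scientist", search_type="phrase")
print(plain)   # 'story' & 'beautiful'
print(phrase)  # 'mad' <-> 'scientist'
```

&lt;p&gt;With the &lt;code &gt;english&lt;/code&gt; configuration, Postgres would additionally reduce the words to their stems, so the actual query terms can look different from the input words.&lt;/p&gt;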
&lt;p&gt;Unlike SearchVector, SearchQuery supports boolean operators. With &lt;code &gt;search_type=&quot;raw&quot;&lt;/code&gt;, the query string can combine search terms with the same boolean operators you used in Postgres:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(&apos;epic&apos; | &apos;beautiful&apos; | &apos;brilliant&apos;) &amp;amp; (&apos;tale&apos; | &apos;story&apos;)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; search_type&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;raw&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using SearchVector and SearchQuery together in a search allows you to create powerful custom searches in Django:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; vector &lt;span &gt;=&lt;/span&gt; SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;title&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# search the title and description columns..&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; query &lt;span &gt;=&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(&apos;epic&apos; | &apos;beautiful&apos; | &apos;brilliant&apos;) &amp;amp; (&apos;tale&apos; | &apos;story&apos;)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; search_type&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;raw&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# ..with the search term&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;annotate&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;vector&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;query&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;
    &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;8&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Airport Pollock&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Tale of a Moose And a Girl who must Confront a Monkey &lt;span &gt;in&lt;/span&gt; Ancient India&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;30&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Anything Savannah&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Story of a Pastry Chef And a Woman who must Chase a Feminist &lt;span &gt;in&lt;/span&gt; An Abandoned Fun House&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;46&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Autumn Crow&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Beautiful Tale of a Dentist And a Mad Cow who must Battle a Moose &lt;span &gt;in&lt;/span&gt; The Sahara Desert&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;97&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Bride Intrigue&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Tale of a Robot And a Monkey who must Vanquish a Man &lt;span &gt;in&lt;/span&gt; New Orleans&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
   &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;196&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Cruelty Unforgiven&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Brilliant Tale of a Car And a Moose who must Battle a Dentist &lt;span &gt;in&lt;/span&gt; Nigeria&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
   &lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;202&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Daddy Pittsburgh&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Epic Story of a A Shark And a Student who must Confront a Explorer &lt;span &gt;in&lt;/span&gt; The Gulf of Mexico&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
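&lt;p&gt;Raw query strings like the one above are easy to get wrong when they are assembled by hand. As a sketch, a small helper can build them from groups of alternatives: terms inside a group are joined with &lt;code &gt;|&lt;/code&gt; (OR) and the groups are joined with &lt;code &gt;&amp;amp;&lt;/code&gt; (AND). The name &lt;code &gt;build_raw_query&lt;/code&gt; is hypothetical, and the helper assumes simple single-word terms; with &lt;code &gt;search_type=&quot;raw&quot;&lt;/code&gt; the string must be valid &lt;code &gt;tsquery&lt;/code&gt; syntax, so user input would need to be validated first.&lt;/p&gt;

```python
def build_raw_query(groups):
    """Build a tsquery string: OR within each group, AND between groups.

    Hypothetical helper for illustration; it only accepts alphanumeric
    single-word terms so that no tsquery operator can slip in.
    """
    parts = []
    for group in groups:
        for term in group:
            if not term.isalnum():
                raise ValueError(f"unsupported search term: {term!r}")
        parts.append("(" + " | ".join(f"'{term}'" for term in group) + ")")
    return " & ".join(parts)


raw = build_raw_query([["epic", "beautiful", "brilliant"], ["tale", "story"]])
print(raw)  # ('epic' | 'beautiful' | 'brilliant') & ('tale' | 'story')
```

&lt;p&gt;The resulting string can then be passed on as &lt;code &gt;SearchQuery(raw, search_type=&quot;raw&quot;)&lt;/code&gt;.&lt;/p&gt;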
&lt;h3 id=&quot;searchrank&quot; &gt;&lt;a href=&quot;#searchrank&quot; aria-label=&quot;searchrank permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SearchRank&lt;/h3&gt;
&lt;p&gt;Using SearchVector and SearchQuery to generate Full Text Search queries in Django is a great start, but a robust search feature likely needs custom rankings as well. Search results can be ranked in a Django app using the &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/#searchrank&quot;&gt;SearchRank&lt;/a&gt; class.&lt;/p&gt;
&lt;p&gt;Here&apos;s an example using the default ranking (most matches):&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;search &lt;span &gt;import&lt;/span&gt; SearchQuery&lt;span &gt;,&lt;/span&gt; SearchRank&lt;span &gt;,&lt;/span&gt; SearchVector
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; vector &lt;span &gt;=&lt;/span&gt; SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;title&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; query &lt;span &gt;=&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(&apos;epic&apos; | &apos;beautiful&apos; | &apos;brilliant&apos;) &amp;amp; (&apos;tale&apos; | &apos;story&apos;)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; search_type&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;raw&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;annotate&lt;span &gt;(&lt;/span&gt;rank&lt;span &gt;=&lt;/span&gt;SearchRank&lt;span &gt;(&lt;/span&gt;vector&lt;span &gt;,&lt;/span&gt; query&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order_by&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;-rank&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also add weights to each field in your SearchVector:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; vector &lt;span &gt;=&lt;/span&gt; SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;title&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; weight&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;description&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; weight&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;B&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; query &lt;span &gt;=&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(&apos;epic&apos; | &apos;beautiful&apos; | &apos;brilliant&apos;) &amp;amp; (&apos;tale&apos; | &apos;story&apos;)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; search_type&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;raw&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;annotate&lt;span &gt;(&lt;/span&gt;rank&lt;span &gt;=&lt;/span&gt;SearchRank&lt;span &gt;(&lt;/span&gt;vector&lt;span &gt;,&lt;/span&gt; query&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order_by&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;-rank&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This makes matches in the &lt;code &gt;title&lt;/code&gt; field count more than those in the &lt;code &gt;description&lt;/code&gt;.&lt;/p&gt;
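&lt;p&gt;To build some intuition for what the weights do, here is a minimal pure-Python sketch. It is not the formula &lt;code &gt;ts_rank&lt;/code&gt; uses; it only borrows the default numeric weights Postgres assigns to the labels D, C, B and A. The &lt;code &gt;toy_rank&lt;/code&gt; function and the sample films are hypothetical.&lt;/p&gt;

```python
# Simplified illustration only: this is NOT the formula Postgres' ts_rank uses.
# The numbers are the default weights Postgres assigns to labels D, C, B and A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def toy_rank(weighted_fields, term):
    """Sum the weight of every field whose text contains the search term.

    weighted_fields is a list of (text, weight_label) pairs, loosely mirroring
    SearchVector('title', weight='A') + SearchVector('description', weight='B').
    """
    score = 0.0
    for text, label in weighted_fields:
        if term.lower() in text.lower().split():
            score += DEFAULT_WEIGHTS[label]
    return score


# Hypothetical films: the term appears in the title of one, the description of the other.
title_match = [("An Epic Voyage", "A"), ("Two sailors cross the sea", "B")]
description_match = [("Sea Voyage", "A"), ("An epic story of two sailors", "B")]

print(toy_rank(title_match, "epic"))        # 1.0
print(toy_rank(description_match, "epic"))  # 0.4
```

&lt;p&gt;The film whose title contains the term scores higher, which is the ordering you would expect from the weighted &lt;code &gt;SearchRank&lt;/code&gt; query.&lt;/p&gt;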
&lt;h2 id=&quot;optimizing-search-performance-in-django&quot; &gt;&lt;a href=&quot;#optimizing-search-performance-in-django&quot; aria-label=&quot;optimizing search performance in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Optimizing Search Performance in Django&lt;/h2&gt;
&lt;p&gt;To get the best performance from Postgres Full Text Search, create an indexed column that stores a &lt;code &gt;tsvector&lt;/code&gt;. Searching this column can be much faster than generating a &lt;code &gt;tsvector&lt;/code&gt; with SearchVector on the fly: the text has already been converted and stored, so no conversion is needed at query time.&lt;/p&gt;
&lt;h3 id=&quot;using-gin-indexes&quot; &gt;&lt;a href=&quot;#using-gin-indexes&quot; aria-label=&quot;using gin indexes permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using GIN indexes&lt;/h3&gt;
&lt;p&gt;To implement this new column in Django, you’ll need to add the &lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/#searchvectorfield&quot;&gt;SearchVectorField&lt;/a&gt; class to the model. To index this field, use a &lt;a href=&quot;https://www.postgresql.org/docs/12/textsearch-indexes.html&quot;&gt;GIN index&lt;/a&gt; as recommended for Full Text Search by PostgreSQL.&lt;/p&gt;
&lt;p&gt;PostgreSQL provides two main index types to speed up full text search: GIN (Generalized Inverted Index) and GiST (Generalized Search Tree). A GiST index is faster to build and useful for frequently updated fields, but it is lossy (i.e. it sometimes returns false positives that Postgres has to recheck against the row). GIN scales well and isn’t lossy for standard queries, but it stores only the lexemes and not their weights. Learn more about their &lt;a href=&quot;http://www.sai.msu.su/~megera/postgres/fts/doc/indexes-fts.html&quot;&gt;differences and use cases here&lt;/a&gt;.&lt;/p&gt;
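&lt;p&gt;The idea behind an inverted index can be sketched in a few lines of plain Python. This is a toy model, not how GIN is implemented internally: it maps each word to the set of row ids containing it, whereas Postgres indexes normalized lexemes. The &lt;code &gt;films&lt;/code&gt; dictionary and &lt;code &gt;build_inverted_index&lt;/code&gt; are hypothetical names used for illustration.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical rows standing in for the film table: id -> text.
films = {
    1: "graffiti love",
    2: "epic tale of a sumo wrestler",
    3: "love story in berlin",
}


def build_inverted_index(rows):
    """Map each word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.split():
            index[word].add(row_id)
    return index


index = build_inverted_index(films)

# A lookup is a single dictionary access instead of a scan over every row.
print(sorted(index["love"]))  # [1, 3]
# Requiring two terms is a set intersection of their row-id sets.
print(sorted(index["love"].intersection(index["berlin"])))  # [3]
```

&lt;p&gt;Because the index is keyed by word, the cost of a lookup depends on the number of matching rows rather than on the size of the table.&lt;/p&gt;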
&lt;p&gt;To add this field and index to a model, use the GinIndex and SearchVectorField classes like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;search &lt;span &gt;import&lt;/span&gt; SearchVectorField
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;indexes &lt;span &gt;import&lt;/span&gt; GinIndex &lt;span &gt;# add the Postgres recommended GIN index &lt;/span&gt;

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Film&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    film_id &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;AutoField&lt;span &gt;(&lt;/span&gt;primary_key&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    title &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    description &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;TextField&lt;span &gt;(&lt;/span&gt;blank&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    vector_column &lt;span &gt;=&lt;/span&gt; SearchVectorField&lt;span &gt;(&lt;/span&gt;null&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;  &lt;span &gt;# new field&lt;/span&gt;

    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&apos;, &apos;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;join&lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&apos;film_id=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; &lt;span &gt;str&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;.&lt;/span&gt;film_id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;title=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;title&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description=&apos;&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;description&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

    &lt;span &gt;class&lt;/span&gt; &lt;span &gt;Meta&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        indexes &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;GinIndex&lt;span &gt;(&lt;/span&gt;fields&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;vector_column&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# add index&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, run the migrations for your Django app:&lt;/p&gt;
&lt;div  data-language=&quot;bash&quot;&gt;&lt;pre &gt;&lt;code &gt;./manage.py makemigrations &lt;span &gt;&amp;lt;&lt;/span&gt;your_app_name&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; ./manage.py migrate &lt;span &gt;&amp;lt;&lt;/span&gt;your_app_name&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, you need a way to make sure that whenever the &lt;code &gt;title&lt;/code&gt; or &lt;code &gt;description&lt;/code&gt; fields on the Film table are updated, the &lt;code &gt;vector_column&lt;/code&gt; field is automatically recomputed and stored. For this, you can use &lt;a href=&quot;https://www.postgresql.org/docs/12/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS&quot;&gt;PostgreSQL triggers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Since there&apos;s no way to use triggers in the Django model directly, add a SQL command in a new migration file:&lt;/p&gt;
&lt;div  data-language=&quot;bash&quot;&gt;&lt;pre &gt;&lt;code &gt; ./manage.py makemigrations &lt;span &gt;&amp;lt;&lt;/span&gt;your_app_name&lt;span &gt;&gt;&lt;/span&gt; -n create_trigger --empty
 &lt;span &gt;# Migrations for &apos;&amp;lt;your_app_name&gt;&apos;:&lt;/span&gt;
 &lt;span &gt;# &amp;lt;your_app_name&gt;/migrations/0003_create_trigger.py&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
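&lt;p&gt;Open the auto-generated file and add a trigger that fires before every &lt;code &gt;INSERT&lt;/code&gt; and before any &lt;code &gt;UPDATE&lt;/code&gt; of the listed columns. The trigger computes the &lt;code &gt;vector_column&lt;/code&gt; field for new and changed rows, and the final &lt;code &gt;UPDATE&lt;/code&gt; statement makes it run once for every existing row:&lt;/p&gt;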
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;

    dependencies &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;&amp;lt;your_app_name&gt;&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;0002_auto_20210224_0325&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;

    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        migrations&lt;span &gt;.&lt;/span&gt;RunSQL&lt;span &gt;(&lt;/span&gt;
            sql&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;&apos;&apos;
              CREATE TRIGGER vector_column_trigger
              BEFORE INSERT OR UPDATE OF title, description, vector_column
              ON film
              FOR EACH ROW EXECUTE PROCEDURE
              tsvector_update_trigger(
                vector_column, &apos;pg_catalog.english&apos;, title, description
              );

              UPDATE film SET vector_column = NULL;
            &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;

            reverse_sql &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos;
              DROP TRIGGER IF EXISTS vector_column_trigger
              ON film;
            &apos;&apos;&apos;&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
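&lt;p&gt;It may look odd that the migration sets &lt;code &gt;vector_column&lt;/code&gt; to &lt;code &gt;NULL&lt;/code&gt;. The point is that the statement touches every row, so the trigger runs for each one and overwrites the &lt;code &gt;NULL&lt;/code&gt; with a freshly computed value. The following plain-Python simulation sketches that behavior; the functions and the sample row are hypothetical, and the word splitting is a crude stand-in for real lexeme normalization.&lt;/p&gt;

```python
def before_write_trigger(row):
    """Toy stand-in for tsvector_update_trigger: rebuild the vector from the text columns."""
    text = " ".join([row.get("title") or "", row.get("description") or ""])
    # Real Postgres would normalize words into lexemes; here we only lowercase and split.
    row["vector_column"] = sorted(set(text.lower().split()))
    return row


def update(table, **changes):
    """Apply the changes to every row, running the trigger before each row is stored."""
    for row in table:
        row.update(changes)
        before_write_trigger(row)


# Hypothetical existing row, created before the trigger existed.
film = [{"title": "Graffiti Love", "description": None, "vector_column": None}]

# Mirrors: UPDATE film SET vector_column = NULL;
update(film, vector_column=None)
print(film[0]["vector_column"])  # ['graffiti', 'love']
```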
&lt;p&gt;Run the migrate command for your app again:&lt;/p&gt;
&lt;div  data-language=&quot;bash&quot;&gt;&lt;pre &gt;&lt;code &gt;python manage.py migrate &lt;span &gt;&amp;lt;&lt;/span&gt;your_app_name&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now your database should have a new column called &lt;code &gt;vector_column&lt;/code&gt; that contains an indexed &lt;code &gt;tsvector&lt;/code&gt; for each film’s title and description.&lt;/p&gt;
&lt;h3 id=&quot;generated-columns-in-postgres-12&quot; &gt;&lt;a href=&quot;#generated-columns-in-postgres-12&quot; aria-label=&quot;generated columns in postgres 12 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Generated Columns in Postgres 12+&lt;/h3&gt;
&lt;p&gt;Running Postgres 12 or newer? You can make use of the &lt;a href=&quot;https://www.postgresql.org/docs/13/ddl-generated-columns.html&quot;&gt;Generated Columns&lt;/a&gt; feature to avoid using triggers for updating the &lt;code &gt;tsvector&lt;/code&gt; column, by creating the column like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; film &lt;span &gt;ADD&lt;/span&gt; &lt;span &gt;COLUMN&lt;/span&gt; vector_column tsvector GENERATED ALWAYS &lt;span &gt;AS&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;coalesce&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;title&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;
  setweight&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;coalesce&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;description&lt;span &gt;,&lt;/span&gt;&lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt; STORED&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
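&lt;p&gt;Note that, at the time of writing, Django does not have official support for generated columns (tagged as &quot;wontfix&quot; in the &lt;a href=&quot;https://code.djangoproject.com/ticket/31300&quot;&gt;bug tracker&lt;/a&gt;), so you have to create the column manually in a migration. Django 5.0 and later add a &lt;code &gt;GeneratedField&lt;/code&gt;, but the raw SQL migration works regardless of Django version:&lt;/p&gt;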
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; migrations

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;migrations&lt;span &gt;.&lt;/span&gt;Migration&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;

    dependencies &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;&amp;lt;your_app_name&gt;&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;0002_auto_20210224_0326&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;

    operations &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;
        migrations&lt;span &gt;.&lt;/span&gt;RunSQL&lt;span &gt;(&lt;/span&gt;
            sql&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;&apos;&apos;
              ALTER TABLE film ADD COLUMN vector_column tsvector GENERATED ALWAYS AS (
                setweight(to_tsvector(&apos;english&apos;, coalesce(title, &apos;&apos;)), &apos;A&apos;) ||
                setweight(to_tsvector(&apos;english&apos;, coalesce(description,&apos;&apos;)), &apos;B&apos;)
              ) STORED;
            &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;

            reverse_sql &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos;
              ALTER TABLE film DROP COLUMN vector_column;
            &apos;&apos;&apos;&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;comparing-query-performance&quot; &gt;&lt;a href=&quot;#comparing-query-performance&quot; aria-label=&quot;comparing query performance permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Comparing Query Performance&lt;/h3&gt;
&lt;p&gt;Now that you have added the new column to optimize performance, you can compare the non-indexed Full Text Search to using the SearchVectorField &lt;code &gt;vector_column&lt;/code&gt;. Using your Python shell, import your model and search the title and description without using the &lt;code &gt;vector_column&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection&lt;span &gt;,&lt;/span&gt; reset_queries
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt;your_app_name&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Film
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;search &lt;span &gt;import&lt;/span&gt; SearchVector&lt;span &gt;,&lt;/span&gt; SearchQuery
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; pprint &lt;span &gt;import&lt;/span&gt; pprint &lt;span &gt;as&lt;/span&gt; pp

&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; vector &lt;span &gt;=&lt;/span&gt; SearchVector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;title&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;description&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; config&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; query &lt;span &gt;=&lt;/span&gt; SearchQuery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;love&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;annotate&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;vector&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;search&lt;span &gt;=&lt;/span&gt;query&lt;span &gt;)&lt;/span&gt;
 &lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;374&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Graffiti Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer &lt;span &gt;in&lt;/span&gt; Berlin&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pp&lt;span &gt;(&lt;/span&gt;connection&lt;span &gt;.&lt;/span&gt;queries&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# SQL and runtime of the queries run so far (only recorded when DEBUG=True)&lt;/span&gt;
&lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&apos;sql&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;SELECT &quot;film&quot;.&quot;film_id&quot;, &quot;film&quot;.&quot;title&quot;, &quot;film&quot;.&quot;description&quot;, &apos;&lt;/span&gt;
         &lt;span &gt;&apos;&quot;film&quot;.&quot;vector_column&quot;, to_tsvector(\&apos;english\&apos;::regconfig, &apos;&lt;/span&gt;
         &lt;span &gt;&apos;COALESCE(&quot;film&quot;.&quot;title&quot;, \&apos;\&apos;) || \&apos; \&apos; || &apos;&lt;/span&gt;
         &lt;span &gt;&apos;COALESCE(&quot;film&quot;.&quot;description&quot;, \&apos;\&apos;)) AS &quot;search&quot; FROM &quot;film&quot; WHERE &apos;&lt;/span&gt;
         &lt;span &gt;&apos;to_tsvector(\&apos;english\&apos;::regconfig, COALESCE(&quot;film&quot;.&quot;title&quot;, \&apos;\&apos;) &apos;&lt;/span&gt;
         &lt;span &gt;&apos;|| \&apos; \&apos; || COALESCE(&quot;film&quot;.&quot;description&quot;, \&apos;\&apos;)) @@ &apos;&lt;/span&gt;
         &lt;span &gt;&quot;plainto_tsquery(&apos;love&apos;) LIMIT 21&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&apos;time&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;0.045&apos;&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; reset_queries&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# clears the recorded queries, so connection.queries only shows queries run after this point&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, this query takes 0.045 seconds.&lt;/p&gt;
&lt;p&gt;Now, check how long it takes to search against the indexed SearchVectorField &lt;code &gt;vector_column&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; Film&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;vector_column&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;love&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
 &lt;span &gt;&amp;lt;&lt;/span&gt;QuerySet &lt;span &gt;[&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt;Film&lt;span &gt;:&lt;/span&gt; film_id&lt;span &gt;=&lt;/span&gt;&lt;span &gt;374&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;=&lt;/span&gt;Graffiti Love&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;=&lt;/span&gt;A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer &lt;span &gt;in&lt;/span&gt; Berlin&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; pp&lt;span &gt;(&lt;/span&gt;connection&lt;span &gt;.&lt;/span&gt;queries&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&apos;sql&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;SELECT &quot;film&quot;.&quot;film_id&quot;, &quot;film&quot;.&quot;title&quot;, &quot;film&quot;.&quot;description&quot;, &apos;&lt;/span&gt;
         &lt;span &gt;&apos;&quot;film&quot;.&quot;vector_column&quot; FROM &quot;film&quot; WHERE &quot;film&quot;.&quot;vector_column&quot; @@ &apos;&lt;/span&gt;
         &lt;span &gt;&quot;plainto_tsquery(&apos;love&apos;) LIMIT 21&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&apos;time&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;0.001&apos;&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; reset_queries&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;On a table with 1,003 rows, query execution time went down from 0.045s to 0.001s!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This could scale up to make a significant difference if you’re dealing with millions of records. The only downside is that saving your data into the new column and indexing it will make writes take slightly longer. Still, this is usually a price worth paying as users expect search to be as fast as possible.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Postgres offers a wide range of tools to support FTS, and in this article, you’ve seen how some of them work. You also saw how to use these tools in a Django application and leverage the SearchVectorField class with a GIN index to optimize performance. While there’s certainly more to building a fast, accurate search application, having a strong understanding of Postgres’ Full Text Search features will help you understand if it’s the best option for you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch.html&quot;&gt;PostgreSQL Full Text Search Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/search/&quot;&gt;Django Full Text Search Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/full-text-search-ruby-rails-postgres&quot;&gt;Full Text Search in Milliseconds with Rails and PostgreSQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;eBook: Efficient Search in Rails with Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Adeyinka works as a Software Engineer at BriteCore, and is based in Lagos, Nigeria. She loves researching and writing in-depth technical content. You can find her on &lt;a href=&quot;https://github.com/AdeyinkaAdegbenro&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Creating Custom Postgres Data Types in Django]]></title><description><![CDATA[Postgres allows you to define custom data types when the default types provided don't fit your needs. There are many situations where these custom data types come in handy. For example, if you have multiple columns in several tables that should be an  between 0 and 255, you could use a custom data type so that you only have to define the constraints once. Or, if you have complex data - like metadata about a file - and you want to save it to a single column instead of spreading it across several…]]></description><link>https://pganalyze.com/blog/custom-postgres-data-types-django-python</link><guid isPermaLink="false">https://pganalyze.com/blog/custom-postgres-data-types-django-python</guid><dc:creator><![CDATA[Josh Alletto]]></dc:creator><pubDate>Tue, 15 Dec 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;
&lt;span  &gt;
      &lt;a  src=&quot;https://pganalyze.com/static/2c0091f36c4cb013824095ab46137ac5/c1b63/custom_postgres_data_types_django_header_pganalyze.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Postgres Custom Data Type Example&quot; title=&quot;Postgres Custom Data Type Example&quot; src=&quot;https://pganalyze.com/static/2c0091f36c4cb013824095ab46137ac5/1d69c/custom_postgres_data_types_django_header_pganalyze.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;Postgres allows you to define custom data types when the default types provided don&apos;t fit your needs. There are many situations where these custom data types come in handy.&lt;/p&gt;
&lt;p&gt;For example, if you have multiple columns in several tables that should be an &lt;code &gt;int&lt;/code&gt; between 0 and 255, you could use a custom data type so that you only have to define the constraints once. Or, if you have complex data - like metadata about a file - and you want to save it to a single column instead of spreading it across several, custom data types can help.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#custom-domains-in-postgres&quot;&gt;Custom Domains in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#composite-types-in-postgres&quot;&gt;Composite Types in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#custom-types-in-django&quot;&gt;Custom Types in Django&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#registering-a-type-with-psycopg2&quot;&gt;Registering a type with psycopg2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#representing-composite-types-as-a-python-class&quot;&gt;Representing composite types as a Python class&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-django-fields&quot;&gt;Using Django fields&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;No matter how you decide to define your datatype, Django has the functionality to allow you to map custom column data to model attributes. You can achieve this by extending the Django &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.Field&quot;&gt;field class&lt;/a&gt;. In this walkthrough, we&apos;ll see &lt;strong&gt;how to create custom types in Postgres&lt;/strong&gt; and then use them in Django to &lt;strong&gt;ensure consistent data types across your application&lt;/strong&gt;. We will do this by walking you through an &lt;a href=&quot;https://github.com/pganalyze-resources/django-custom-data-types-example&quot;&gt;example project&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Should you be interested in &lt;a href=&quot;https://pganalyze.com/blog/custom-postgres-data-types-ruby-rails&quot;&gt;how to create custom Postgres data types in Rails&lt;/a&gt;, check out my dedicated article about it!&lt;/p&gt;
&lt;h2 id=&quot;custom-domains-in-postgres&quot; &gt;&lt;a href=&quot;#custom-domains-in-postgres&quot; aria-label=&quot;custom domains in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Custom Domains in Postgres&lt;/h2&gt;
&lt;p&gt;There are several different kinds of custom data types in Postgres, including &lt;a href=&quot;https://www.postgresql.org/docs/9.5/datatype-enum.html&quot;&gt;enums&lt;/a&gt; and &lt;a href=&quot;https://www.postgresql.org/docs/9.5/rangetypes.html&quot;&gt;range types&lt;/a&gt;. The two we’ll use in our project today are called domain types and composite types.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, let’s take a look at &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createdomain.html&quot;&gt;domain types&lt;/a&gt;. Domains are a way of adding restrictions to an existing type so that it can be reused in columns across tables. They are particularly useful for columns like email addresses, phone numbers, or &lt;a href=&quot;https://postgis.net/docs/stdaddr.html&quot;&gt;street addresses&lt;/a&gt;, where you might find yourself repeating the same checks over and over. A custom domain allows you to define those checks once and then reuse them, which makes them easier to manage and maintain.&lt;/p&gt;
&lt;p&gt;For our example project, we&apos;ll start by creating a custom data type that performs a check to ensure a string doesn&apos;t contain any spaces:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; DOMAIN string_no_spaces &lt;span &gt;as&lt;/span&gt; &lt;span &gt;VARCHAR&lt;/span&gt; &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;CHECK&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;value&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;&lt;span &gt;~&lt;/span&gt; &lt;span &gt;&apos;\s&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can use this type on as many tables or in as many columns as we like. For example, say we don’t want to allow spaces in user names for a chat app:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; users &lt;span &gt;(&lt;/span&gt;
  id &lt;span &gt;serial&lt;/span&gt; &lt;span &gt;primary&lt;/span&gt; &lt;span &gt;key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
 user_name string_no_spaces
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now if you try to add a value with a space, Postgres will throw an error:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;INSERT INTO users(user_name) VALUES (&apos;I am a      bad user name&apos;);
-- ERROR:  value for domain string_no_spaces violates check constraint &quot;string_no_spaces_check&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can also reuse this domain in the definition of another domain. For example:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; DOMAIN email_with_check &lt;span &gt;AS&lt;/span&gt; string_no_spaces &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt; &lt;span &gt;CHECK&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;value&lt;/span&gt; &lt;span &gt;~&lt;/span&gt; &lt;span &gt;&apos;@&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; email_addresses &lt;span &gt;(&lt;/span&gt;
  user_id &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  email email_with_check
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; email_addresses&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;josh @gmail.com&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ERROR:  value for domain email_with_check violates check constraint &quot;string_no_spaces_check&quot;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; email_addresses&lt;span &gt;(&lt;/span&gt;email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;joshgmail.com&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ERROR:  value for domain email_with_check violates check constraint &quot;email_with_check_check&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, we&apos;ve created a new check to ensure an email contains &lt;code &gt;@&lt;/code&gt; and we&apos;ve used &lt;code &gt;string_no_spaces&lt;/code&gt; as our base type. This allows us to inherit the no-spaces check. Now any value of type &lt;code &gt;email_with_check&lt;/code&gt; must contain &lt;code &gt;@&lt;/code&gt; and cannot contain spaces.&lt;/p&gt;
&lt;h2 id=&quot;composite-types-in-postgres&quot; &gt;&lt;a href=&quot;#composite-types-in-postgres&quot; aria-label=&quot;composite types in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Composite Types in Postgres&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;second&lt;/strong&gt; kind of custom data type we’ll look at today is called a &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createtype.html&quot;&gt;composite type&lt;/a&gt;. A composite type is essentially a group of data that can be held in a single column. Composite types can be helpful if you have a set of related values that you don&apos;t want to spread over multiple columns. Perhaps this data only makes sense when grouped together, like the &lt;a href=&quot;https://til.hashrocket.com/posts/3693c7fc13-creating-custom-types-in-postgresql&quot;&gt;dimensions of a package&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/RGB_color_model&quot;&gt;RGB color data&lt;/a&gt; is another good example because it doesn&apos;t make much sense on its own - 255 is just an &lt;code &gt;int&lt;/code&gt; - but coupled with some labels and two other numbers (&lt;code &gt;red: 255, green: 0, blue: 0&lt;/code&gt;), it becomes the color red. Every time we access a color, we&apos;ll want to have all three of these values returned, so it saves us from having to query multiple columns for a group of data that is only meaningful when combined.&lt;/p&gt;
&lt;p&gt;Let&apos;s start by creating a new RGB color value type:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TYPE&lt;/span&gt; rgb_color_value &lt;span &gt;as&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  red &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  green &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  blue &lt;span &gt;integer&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we can create a new table and use both our domain and custom data type for the columns:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; colors &lt;span &gt;(&lt;/span&gt;
  name string_no_spaces&lt;span &gt;,&lt;/span&gt;
  rgb rgb_color_value
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; colors&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; rgb&lt;span &gt;)&lt;/span&gt; &lt;span &gt;VALUES&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;pink&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;252&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;15&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;192&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; colors&lt;span &gt;;&lt;/span&gt;

 name &lt;span &gt;|&lt;/span&gt; rgb   
&lt;span &gt;------+---------&lt;/span&gt;
 pink &lt;span &gt;|&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;252&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;15&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;192&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can even access the individual values. For example, if all we want is the green value:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;rgb&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;green &lt;span &gt;FROM&lt;/span&gt; colors&lt;span &gt;;&lt;/span&gt;

  green 
 &lt;span &gt;------&lt;/span&gt;
   &lt;span &gt;15&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;custom-types-in-django&quot; &gt;&lt;a href=&quot;#custom-types-in-django&quot; aria-label=&quot;custom types in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Custom Types in Django&lt;/h2&gt;
&lt;p&gt;Let&apos;s use our &lt;code &gt;string_no_spaces&lt;/code&gt; domain and our &lt;code &gt;rgb_color_value&lt;/code&gt; composite type to create a Django model to define a color. &lt;code &gt;rgb_color_value&lt;/code&gt; is going to take the most work, so we&apos;ll start there and then come back to &lt;code &gt;string_no_spaces&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;registering-a-type-with-psycopg2&quot; &gt;&lt;a href=&quot;#registering-a-type-with-psycopg2&quot; aria-label=&quot;registering a type with psycopg2 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Registering a type with psycopg2&lt;/h3&gt;
&lt;p&gt;We&apos;ll use the &lt;a href=&quot;https://www.psycopg.org/docs/&quot;&gt;psycopg2&lt;/a&gt; database adapter in this example. I won&apos;t go into how to set it up here, but I recommend &lt;a href=&quot;https://www.digitalocean.com/community/tutorials/how-to-use-postgresql-with-your-django-application-on-ubuntu-14-04&quot;&gt;this tutorial&lt;/a&gt;. It does a good job covering the setup if you aren&apos;t familiar with it yet.&lt;/p&gt;
&lt;p&gt;We&apos;ll need to start by &lt;a href=&quot;https://www.psycopg.org/docs/extensions.html#psycopg2.extensions.register_adapter&quot;&gt;registering and creating an adapter&lt;/a&gt; for our new type so that psycopg2 knows how to handle it. After we register it, psycopg will return values from the database as a named tuple.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection
&lt;span &gt;from&lt;/span&gt; psycopg2&lt;span &gt;.&lt;/span&gt;extras &lt;span &gt;import&lt;/span&gt; register_composite

Rgb &lt;span &gt;=&lt;/span&gt; register_composite&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;rgb_color_value&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;,&lt;/span&gt;
  globally&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;type&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above code will handle data coming to our app from the database, but &lt;strong&gt;we&apos;ll also need to tell psycopg what to do with data sent to the database&lt;/strong&gt;. That&apos;s where the adapter comes in:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection
&lt;span &gt;from&lt;/span&gt; psycopg2&lt;span &gt;.&lt;/span&gt;extras &lt;span &gt;import&lt;/span&gt; register_composite
&lt;span &gt;from&lt;/span&gt; psycopg2&lt;span &gt;.&lt;/span&gt;extensions &lt;span &gt;import&lt;/span&gt; register_adapter&lt;span &gt;,&lt;/span&gt; adapt&lt;span &gt;,&lt;/span&gt; AsIs

Rgb &lt;span &gt;=&lt;/span&gt; register_composite&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;rgb_color_value&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;,&lt;/span&gt;
  globally&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;type&lt;/span&gt;

&lt;span &gt;def&lt;/span&gt; &lt;span &gt;rgb_adapter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
  &lt;span &gt;# getquoted() returns bytes on Python 3, so decode before formatting&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; AsIs&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(%s, %s, %s)::rgb_color_value&quot;&lt;/span&gt; &lt;span &gt;%&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    adapt&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;red&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;getquoted&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;decode&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    adapt&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;green&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;getquoted&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;decode&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    adapt&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;blue&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;getquoted&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;decode&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

register_adapter&lt;span &gt;(&lt;/span&gt;Rgb&lt;span &gt;,&lt;/span&gt; rgb_adapter&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that psycopg knows about our new data type and how to handle it, we can create the same functionality for Django.&lt;/p&gt;
&lt;h3 id=&quot;representing-composite-types-as-a-python-class&quot; &gt;&lt;a href=&quot;#representing-composite-types-as-a-python-class&quot; aria-label=&quot;representing composite types as a python class permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Representing composite types as a Python class&lt;/h3&gt;
&lt;p&gt;We want to be able to do things with our objects like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;rgb &lt;span &gt;=&lt;/span&gt; Rgb&lt;span &gt;(&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

my_color_object&lt;span &gt;.&lt;/span&gt;rgb &lt;span &gt;=&lt;/span&gt; rgb 

my_color_object&lt;span &gt;.&lt;/span&gt;save&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To do that, we&apos;ll need to start with a Python class that represents an RGB value.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Rgb&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__init__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; red&lt;span &gt;,&lt;/span&gt; green&lt;span &gt;,&lt;/span&gt; blue&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        self&lt;span &gt;.&lt;/span&gt;red &lt;span &gt;=&lt;/span&gt; red
        self&lt;span &gt;.&lt;/span&gt;green &lt;span &gt;=&lt;/span&gt; green
        self&lt;span &gt;.&lt;/span&gt;blue &lt;span &gt;=&lt;/span&gt; blue&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We&apos;ll come back to this class in a bit, but first, we need to talk about fields.&lt;/p&gt;
&lt;h3 id=&quot;using-django-fields&quot; &gt;&lt;a href=&quot;#using-django-fields&quot; aria-label=&quot;using django fields permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using Django fields&lt;/h3&gt;
&lt;p&gt;You are probably familiar with many of &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/models/fields/&quot;&gt;Django&apos;s built-in model fields&lt;/a&gt; like &lt;code &gt;models.CharField&lt;/code&gt; or &lt;code &gt;models.IntegerField&lt;/code&gt;. You&apos;ve also probably noticed that many of these fields correspond to data types we often use in Postgres (&lt;code &gt;varchar&lt;/code&gt;, &lt;code &gt;int&lt;/code&gt; etc.).&lt;/p&gt;
&lt;p&gt;For custom data types, Django allows us to create our own fields and then use them in our models:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;RgbField&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Field&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;db_type&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&apos;rgb_color_value&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All custom fields inherit from &lt;code &gt;models.Field&lt;/code&gt;. You can also inherit from existing fields like &lt;code &gt;models.CharField&lt;/code&gt; (which itself inherits from &lt;code &gt;models.Field&lt;/code&gt;). This is helpful if your custom type behaves similarly to an existing type. Since ours doesn&apos;t, we&apos;ll inherit directly from &lt;code &gt;models.Field&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, we need to override &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/howto/custom-model-fields/#converting-values-to-python-objects&quot;&gt;three methods&lt;/a&gt;. The first, &lt;code &gt;from_db_value()&lt;/code&gt;, is called when data is loaded from the database. This is the method that will receive the named tuple we set up with psycopg2 earlier. The second, &lt;code &gt;to_python()&lt;/code&gt;, gets called during deserialization and when a form is cleaned. These two need to return an instance of the &lt;code &gt;Rgb&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;The last method we need to override is &lt;code &gt;get_prep_value&lt;/code&gt;, where we&apos;ll convert our &lt;code &gt;Rgb&lt;/code&gt; object back into a tuple before handing it off to Psycopg to save to the database. When we&apos;re done, our field class should look like this:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;RgbField&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Field&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
  
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;from_db_value&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;,&lt;/span&gt; expression&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
      &lt;span &gt;if&lt;/span&gt; value &lt;span &gt;is&lt;/span&gt; &lt;span &gt;None&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
          &lt;span &gt;return&lt;/span&gt; value
      &lt;span &gt;return&lt;/span&gt; Rgb&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;red&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;green&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;blue&lt;span &gt;)&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;to_python&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
      &lt;span &gt;if&lt;/span&gt; &lt;span &gt;isinstance&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;,&lt;/span&gt; Rgb&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
          &lt;span &gt;return&lt;/span&gt; value

      &lt;span &gt;if&lt;/span&gt; value &lt;span &gt;is&lt;/span&gt; &lt;span &gt;None&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
          &lt;span &gt;return&lt;/span&gt; value

      &lt;span &gt;return&lt;/span&gt; Rgb&lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;red&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;green&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;blue&lt;span &gt;)&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;get_prep_value&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
      &lt;span &gt;if&lt;/span&gt; value &lt;span &gt;is&lt;/span&gt; &lt;span &gt;None&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
          &lt;span &gt;return&lt;/span&gt; value
      &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;value&lt;span &gt;.&lt;/span&gt;red&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;green&lt;span &gt;,&lt;/span&gt; value&lt;span &gt;.&lt;/span&gt;blue&lt;span &gt;)&lt;/span&gt;
  
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;db_type&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
      &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&apos;rgb_color_value&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The checks I put in place above are suggestions from the &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/howto/custom-model-fields/#converting-values-to-python-objects&quot;&gt;Django docs&lt;/a&gt;.&lt;/p&gt;
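&lt;p&gt;The Django docs also note that &lt;code &gt;to_python()&lt;/code&gt; may be handed a string, for example during deserialization. The field above assumes the value already has &lt;code &gt;red&lt;/code&gt;, &lt;code &gt;green&lt;/code&gt; and &lt;code &gt;blue&lt;/code&gt; attributes, so a string would fail. As a minimal sketch, assuming the string uses the Postgres composite text format such as &lt;code &gt;(252,15,192)&lt;/code&gt;, a hypothetical helper named &lt;code &gt;parse_rgb&lt;/code&gt; (not part of the original project) could look like this:&lt;/p&gt;

```python
class Rgb:
    # Same shape as the Rgb class defined earlier in the article.
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue


def parse_rgb(text):
    """Parse a string such as '(252,15,192)' into an Rgb instance.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    # Drop surrounding whitespace and parentheses, then split the channels.
    parts = text.strip().strip("()").split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values: %r" % text)
    # int() tolerates spaces around each number and rejects non-numeric text.
    red, green, blue = (int(part) for part in parts)
    return Rgb(red, green, blue)


color = parse_rgb("(252,15,192)")
print(color.red, color.green, color.blue)
```

&lt;p&gt;Inside &lt;code &gt;to_python()&lt;/code&gt;, a branch that checks &lt;code &gt;isinstance(value, str)&lt;/code&gt; and returns &lt;code &gt;parse_rgb(value)&lt;/code&gt; before the final &lt;code &gt;return&lt;/code&gt; would route strings through this helper.&lt;/p&gt;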
&lt;p&gt;Finally, we can create our model using our brand new Rgb field:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Color&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    rgb &lt;span &gt;=&lt;/span&gt; RgbField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But wait! Didn&apos;t we create a special &lt;code &gt;string_no_spaces&lt;/code&gt; domain that we want to use for the &lt;code &gt;name&lt;/code&gt; attribute?&lt;/p&gt;
&lt;p&gt;Since this type is just a string with checks at the database level, all we need to do is create the field with the appropriate &lt;code &gt;db_type&lt;/code&gt; method:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;StringNoSpacesField&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Field&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;db_type&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; connection&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&apos;string_no_spaces&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can update our model and run our migrations:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Color&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    rgb &lt;span &gt;=&lt;/span&gt; RgbField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; StringNoSpacesField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&apos;s confirm that everything is working as expected. In the Python shell (I&apos;m using &lt;a href=&quot;https://django-extensions.readthedocs.io/en/latest/shell_plus.html&quot;&gt;shell plus&lt;/a&gt;), we&apos;ll create a new color:&lt;/p&gt;
&lt;div  data-language=&quot;bash&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; from colors.models &lt;span &gt;import&lt;/span&gt; Rgb
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; rgb &lt;span &gt;=&lt;/span&gt; Rgb&lt;span &gt;(&lt;/span&gt;&lt;span &gt;255&lt;/span&gt;, &lt;span &gt;0&lt;/span&gt;, &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; c &lt;span &gt;=&lt;/span&gt; Color.objects.create&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&apos;red&apos;&lt;/span&gt;, &lt;span &gt;rgb&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;rgb&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; c.rgb
&lt;span &gt;&amp;lt;&lt;/span&gt;colors.models.Rgb object at 0x104e3d6d&lt;span &gt;&lt;span &gt;8&lt;/span&gt;&gt;&lt;/span&gt;
&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; c.rgb.red
&lt;span &gt;255&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
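&lt;p&gt;The shell prints the default object representation for &lt;code &gt;c.rgb&lt;/code&gt;, which says little about the color itself. As an optional sketch, adding &lt;code &gt;__repr__&lt;/code&gt; and &lt;code &gt;__eq__&lt;/code&gt; to the &lt;code &gt;Rgb&lt;/code&gt; class could make shell output and comparisons easier to read. Both methods are additions for illustration and are not part of the original example:&lt;/p&gt;

```python
class Rgb:
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

    def __repr__(self):
        # Readable form for the shell instead of the default object address.
        return "Rgb(red=%d, green=%d, blue=%d)" % (self.red, self.green, self.blue)

    def __eq__(self, other):
        # Two colors are equal when all three channels match.
        # Note: defining __eq__ without __hash__ makes instances unhashable.
        if not isinstance(other, Rgb):
            return NotImplemented
        return (self.red, self.green, self.blue) == (other.red, other.green, other.blue)


print(repr(Rgb(255, 0, 0)))
```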
&lt;p&gt;If you try to create a color with a name that has a space in it, you will get an error like this:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;django.db.utils.IntegrityError: value for domain string_no_spaces violates check constraint &quot;string_no_spaces_check&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
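&lt;p&gt;The domain in Postgres remains the source of truth, but this error only appears once the statement reaches the database. As a rough sketch, the same rule could be mirrored in Python so that bad names are rejected earlier. The function name &lt;code &gt;validate_no_spaces&lt;/code&gt; is hypothetical, and it raises a plain &lt;code &gt;ValueError&lt;/code&gt; to stay self-contained. In a Django project one would more likely raise &lt;code &gt;django.core.exceptions.ValidationError&lt;/code&gt; and pass the function to the field through its &lt;code &gt;validators&lt;/code&gt; argument:&lt;/p&gt;

```python
import re

# Matches any whitespace character, similar in spirit to the '\s' pattern
# used in the string_no_spaces domain. The two regex engines may differ
# slightly on which characters count as whitespace.
WHITESPACE = re.compile(r"\s")


def validate_no_spaces(value):
    """Raise ValueError if the value contains whitespace (hypothetical helper)."""
    if WHITESPACE.search(value):
        raise ValueError("%r must not contain whitespace" % value)
    return value


print(validate_no_spaces("red"))
```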
&lt;p&gt;Now, let&apos;s check the database and make sure everything got saved as the correct type:&lt;/p&gt;
&lt;div  data-language=&quot;psql&quot;&gt;&lt;pre &gt;&lt;code &gt;customdt=# SELECT pg_typeof(rgb), pg_typeof(name) FROM colors_color;
    pg_typeof    |    pg_typeof     
-----------------+------------------
 rgb_color_value | string_no_spaces
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;From here, we could ensure that only numbers from 0 to 255 are entered by &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/howto/custom-model-fields/#writing-a-field-subclass&quot;&gt;overriding the &lt;code &gt;__init__&lt;/code&gt; method&lt;/a&gt; and adding checks at the Postgres level. We could also create a new type for storing the hex code for each color in addition to the RGB value.&lt;/p&gt;
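&lt;p&gt;As one possible sketch of the range check on the Python side, the &lt;code &gt;Rgb&lt;/code&gt; class could reject out-of-range values in its constructor. Note that this places the check in the value class rather than in the field, which is a simplification chosen for illustration, and the error messages are made up for this example:&lt;/p&gt;

```python
class Rgb:
    def __init__(self, red, green, blue):
        # Validate each channel before storing it.
        for name, value in (("red", red), ("green", green), ("blue", blue)):
            if not isinstance(value, int) or value not in range(256):
                raise ValueError("%s must be an integer from 0 to 255, got %r" % (name, value))
        self.red = red
        self.green = green
        self.blue = blue


try:
    Rgb(256, 0, 0)
except ValueError as error:
    print(error)
```

&lt;p&gt;A check like this only protects code paths that go through the Python class, so a matching constraint in Postgres would still be needed to guard writes made outside of Django.&lt;/p&gt;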
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we saw how to create new data types in Postgres and bring them into a Django application. We created a &lt;code &gt;string_no_spaces&lt;/code&gt; type with &lt;code &gt;CREATE DOMAIN&lt;/code&gt; to help us set up some database level checks on columns. We used &lt;code &gt;CREATE TYPE&lt;/code&gt; to create a brand new composite data type called &lt;code &gt;rgb_color_value&lt;/code&gt; that allowed us to group the data for a color value and save it to a single column.&lt;/p&gt;
&lt;p&gt;We then registered our new data types with psycopg2 so that the database adapter knew how to handle them. Finally, we took a look at the Django Field class. We learned how to control values coming to and from our database adapter to ensure our custom data type matches its corresponding Python class for use inside of our Django application.&lt;/p&gt;
&lt;p&gt;As mentioned above, you can find all resources talked about here on our
&lt;a href=&quot;https://github.com/pganalyze-resources/django-custom-data-types-example&quot;&gt;resources repository on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22Creating%20Custom%20%23Postgres%20Data%20Types%20in%20%23Django%22%20-%20Here,%20%40pganalyze%20show%20how%20to%20set%20up%20database%20level%20checks%20on%20columns,%20create%20new%20composite%20data%20types,%20register%20new%20data%20types%20with%20psycopg2,%20and%20explain%20the%20Django%20Field%20Class%3A%20https%3A%2F%2Fpganalyze.com%2Fblog%2Fcustom-postgres-data-types-django-python&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Josh is a former educator turned developer with a proven ability to learn quickly and adapt to different roles. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He&apos;s always looking for a new challenge and a dedicated team to collaborate with.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[PostGIS vs. Geocoder in Rails]]></title><description><![CDATA[This article sets out to compare PostGIS in Rails with Geocoder and to highlight a couple of the areas where you'll want to (or need to) reach for one over the other. I will also present some of the terminology and libraries that I found along the way of working on this project and article as I set out to understand PostGIS better and how it is integrated with Rails. If you are interested in learning how to work with geospatial data with PostGIS in Django I recommend having a look at our blog…]]></description><link>https://pganalyze.com/blog/postgis-rails-geocoder</link><guid isPermaLink="false">https://pganalyze.com/blog/postgis-rails-geocoder</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Thu, 01 Oct 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;This article sets out to compare PostGIS in Rails with Geocoder and to highlight a couple of the areas where you&apos;ll want to (or need to) reach for one over the other. I will also present some of the terminology and libraries that I found along the way of working on this project and article as I set out to understand PostGIS better and how it is integrated with Rails.&lt;/p&gt;
&lt;p&gt;If you are interested in learning how to work with geospatial data with PostGIS in Django I recommend having a look at our blog post &lt;a href=&quot;https://pganalyze.com/blog/geodjango-postgis&quot;&gt;Using GeoDjango and PostGIS in Django&lt;/a&gt; here.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#installing-postgis&quot;&gt;Installing PostGIS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#activerecord-postgis-adapter&quot;&gt;ActiveRecord PostGIS Adapter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#our-example-data&quot;&gt;Our Example Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#building-a-geo-helper-class-with-postgis&quot;&gt;Building a Geo Helper Class with PostGIS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#finding-nearby-records-with-postgis-and-geocoder&quot;&gt;Finding Nearby Records with PostGIS and Geocoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#finding-records-within-a-bounding-box-with-postgis-and-geocoder&quot;&gt;Finding Records Within a Bounding Box with PostGIS and Geocoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#finding-records-within-a-polygon-with-postgis-and-geocoder&quot;&gt;Finding Records Within a Polygon with PostGIS and Geocoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#finding-nearby-related-records-with-postgis-and-geocoder&quot;&gt;Finding Nearby Related Records with PostGIS and Geocoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;PostGIS vs. Geocoder in Rails&quot;
        title=&quot;PostGIS vs. Geocoder in Rails&quot;
        src=&quot;https://pganalyze.com/static/383f2659b144f300d98f78a94aefe750/acb04/postgis_rails_geocoder_pganalyze.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;
Picture via &lt;a href=&quot;https://unsplash.com/@anniespratt&quot;&gt;Annie Spratt&lt;/a&gt; on Unsplash&lt;/p&gt;
&lt;p&gt;I have built a number of Rails applications over the years that show locations on a map, have nearby search functionality, and I had never used &lt;a href=&quot;https://postgis.net/&quot;&gt;PostGIS&lt;/a&gt; before! How was this possible? The reason is that there is a Ruby gem named &lt;a href=&quot;https://github.com/alexreisner/geocoder&quot;&gt;Geocoder&lt;/a&gt; which enables you to do these sorts of queries, and it&apos;s quite efficient! That said, there is a reason that PostGIS exists. For more complex geo queries I’d recommend reaching beyond Geocoder to PostGIS.&lt;/p&gt;
&lt;p&gt;As an example, if you wanted to find homes which have a school within 1km of them, or if you wanted to draw an oddly shaped polygon on a map and search within it, this is the world where PostGIS shines and makes these complex geo queries possible.&lt;/p&gt;
&lt;p&gt;In this article we will be covering:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostGIS in Rails setup&lt;/li&gt;
&lt;li&gt;Finding nearby records (Geocoder + PostGIS)&lt;/li&gt;
&lt;li&gt;Finding records within a bounding box (Geocoder + PostGIS)&lt;/li&gt;
&lt;li&gt;Finding records within a polygon (PostGIS)&lt;/li&gt;
&lt;li&gt;Finding nearby related records (PostGIS)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The source code referenced in this article can be &lt;a href=&quot;https://github.com/pganalyze-resources/rails-postgis-demo&quot;&gt;found here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;installing-postgis&quot; &gt;&lt;a href=&quot;#installing-postgis&quot; aria-label=&quot;installing postgis permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Installing PostGIS&lt;/h2&gt;
&lt;p&gt;Postgres comes with a number of built-in extensions that you can enable, but unfortunately PostGIS (Spatial and Geographic objects for Postgres) isn&apos;t one of them. In order to enable this extension, you will have to use a Postgres install with PostGIS support. I recommend using the &lt;a href=&quot;https://registry.hub.docker.com/r/postgis/postgis&quot;&gt;official postgis docker image&lt;/a&gt;, but luckily many hosted Postgres solutions come with PostGIS already available. If you are not sure, you can query the available extensions with the following query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;select&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; pg_available_extensions
&lt;span &gt;where&lt;/span&gt; name &lt;span &gt;like&lt;/span&gt; &lt;span &gt;&apos;%postgis%&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you&apos;d like to see if the extension is &lt;em&gt;already&lt;/em&gt; enabled, you can run this query:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;select&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; pg_extension&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And finally, to enable this extension, you can use the command &lt;code &gt;create extension postgis&lt;/code&gt;, but since we&apos;re working within Rails, there is a Gem that will take care of this step for us as we&apos;ll see below.&lt;/p&gt;
&lt;h2 id=&quot;activerecord-postgis-adapter&quot; &gt;&lt;a href=&quot;#activerecord-postgis-adapter&quot; aria-label=&quot;activerecord postgis adapter permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;ActiveRecord PostGIS Adapter&lt;/h2&gt;
&lt;p&gt;If you have confirmed that your version of Postgres supports the &lt;code &gt;postgis&lt;/code&gt; extension, you&apos;re ready to integrate it with your Rails application. This can be done by using the &lt;a href=&quot;https://github.com/rgeo/activerecord-postgis-adapter&quot;&gt;activerecord-postgis-adapter&lt;/a&gt; gem. Two things need to be done to get up and running. First, set the &lt;code &gt;adapter&lt;/code&gt; within &lt;code &gt;config/database.yml&lt;/code&gt; to &lt;code &gt;postgis&lt;/code&gt;. Next, if this is a new application, you can run &lt;code &gt;rails db:create&lt;/code&gt; as normal, but if it is an existing one, you&apos;ll have to run the command &lt;code &gt;rake db:gis:setup&lt;/code&gt;, which enables the postgis extension in your database.&lt;/p&gt;
&lt;h2 id=&quot;our-example-data&quot; &gt;&lt;a href=&quot;#our-example-data&quot; aria-label=&quot;our example data permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Our Example Data&lt;/h2&gt;
&lt;p&gt;We&apos;ll be working with sample data for a realtor website that allows us to find homes in a variety of ways, including homes that are near a local school. There are two tables: &lt;code &gt;homes&lt;/code&gt; and &lt;code &gt;schools&lt;/code&gt;. The Rails migration to create these tables is below:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateHomes&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    create_table &lt;span &gt;:homes&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;t&lt;span &gt;|&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;string &lt;span &gt;:name&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;string &lt;span &gt;:status&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;bigint &lt;span &gt;:price&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;integer &lt;span &gt;:beds&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; default&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;integer &lt;span &gt;:baths&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; default&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;st_point &lt;span &gt;:coords&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; geographic&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;float &lt;span &gt;:longitude&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;float &lt;span &gt;:latitude&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;timestamps

      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;:coords&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:gist&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;%i[latitude longitude]&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;:status&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;:price&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;

    create_table &lt;span &gt;:schools&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;t&lt;span &gt;|&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;st_point &lt;span &gt;:coords&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; geographic&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;

      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;:coords&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:gist&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;timestamps
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By using &lt;code &gt;activerecord-postgis-adapter&lt;/code&gt; we are able to define PostGIS columns within our migration file. When working with PostGIS you can store a point (latitude + longitude) as a single column of type &lt;code &gt;st_point&lt;/code&gt;, whereas when working with &lt;a href=&quot;https://github.com/alexreisner/geocoder&quot;&gt;Geocoder&lt;/a&gt; the latitude and longitude are stored as floats in separate columns. Because we are comparing the two approaches, we will store the data both ways, but typically you would choose one approach or the other.&lt;/p&gt;
&lt;p&gt;PostGIS &lt;strong&gt;geographic&lt;/strong&gt; columns can be indexed using &lt;a href=&quot;https://www.postgresql.org/docs/current/gist-intro.html&quot;&gt;GiST&lt;/a&gt; style indexes. GiST indexes are used instead of B-Tree indexes when working with geographic data because, unlike numbers, letters, or dates, coordinates cannot easily be sorted along a single axis in a way that would allow the database to speed up common geographic operations.&lt;/p&gt;
&lt;p&gt;The example project for this article contains a seeds file (run with &lt;code &gt;rake db:seed&lt;/code&gt;) which will generate 100k homes and 100 schools in and around the Atlanta, Georgia area.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Download Free eBook: Efficient Search in Rails with Postgres&quot;
        title=&quot;Download Free eBook: Efficient Search in Rails with Postgres&quot;
        src=&quot;https://pganalyze.com/static/3e8bb134d6b5689ee9d20a10e6699b6c/acb04/ebook_promo_rails_search.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;building-a-geo-helper-class-with-postgis&quot; &gt;&lt;a href=&quot;#building-a-geo-helper-class-with-postgis&quot; aria-label=&quot;building a geo helper class with postgis permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Building a Geo Helper Class with PostGIS&lt;/h2&gt;
&lt;p&gt;The Rails PostGIS adapter is based on a library named &lt;a href=&quot;https://github.com/rgeo/rgeo&quot;&gt;RGeo&lt;/a&gt;, which, while incredibly powerful, I found a little bit confusing due to a lack of documentation. I ended up building a small helper class to generate different geo objects for me. The first thing to define is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Spatial_reference_system&quot;&gt;SRID&lt;/a&gt; (spatial reference system identifier). Just as the imperial and metric systems let people measure and weigh amounts against an agreed-upon reference, coordinates need a coordinate reference system to ensure that a given latitude and longitude mean the same place on earth to different people. &lt;a href=&quot;https://spatialreference.org/ref/epsg/wgs-84/&quot;&gt;4326&lt;/a&gt; is the identifier of the spatial reference system used for GPS satellite navigation and the one we will be using within this article.&lt;/p&gt;
&lt;p&gt;One last thing to define is what &lt;a href=&quot;https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry&quot;&gt;WKT&lt;/a&gt; is. Well-known Text representation of geometry is a string representation of a point, line string, and polygon (among other things) that we will be using in our examples in this article. This is the format Postgres (PostGIS) receives and displays geographic data types in.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Geo&lt;/span&gt;
  &lt;span &gt;SRID&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;4326&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;factory&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;@@factory&lt;/span&gt; &lt;span &gt;||&lt;/span&gt;&lt;span &gt;=&lt;/span&gt; &lt;span &gt;RGeo&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Geographic&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;spherical_factory&lt;span &gt;(&lt;/span&gt;srid&lt;span &gt;:&lt;/span&gt; &lt;span &gt;SRID&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;pairs_to_points&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;pairs&lt;span &gt;)&lt;/span&gt;
    pairs&lt;span &gt;.&lt;/span&gt;map &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;pair&lt;span &gt;|&lt;/span&gt; point&lt;span &gt;(&lt;/span&gt;pair&lt;span &gt;[&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; pair&lt;span &gt;[&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;point&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;longitude&lt;span &gt;,&lt;/span&gt; latitude&lt;span &gt;)&lt;/span&gt;
    factory&lt;span &gt;.&lt;/span&gt;point&lt;span &gt;(&lt;/span&gt;longitude&lt;span &gt;,&lt;/span&gt; latitude&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;line_string&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
    factory&lt;span &gt;.&lt;/span&gt;line_string&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;polygon&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
    line &lt;span &gt;=&lt;/span&gt; line_string&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
    factory&lt;span &gt;.&lt;/span&gt;polygon&lt;span &gt;(&lt;/span&gt;line&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;to_wkt&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;feature&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;&quot;srid=&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;&lt;span &gt;SRID&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;;&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;feature&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
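&lt;p&gt;To make the WKT format less abstract, here is a minimal plain-Ruby sketch that builds the same kind of strings by hand, without RGeo. The &lt;code &gt;EwktSketch&lt;/code&gt; module and its method names are hypothetical and exist only for illustration; in the article the &lt;code &gt;Geo&lt;/code&gt; class and RGeo do this work.&lt;/p&gt;

```ruby
# Plain-Ruby illustration of WKT strings with an SRID prefix.
# EwktSketch is a hypothetical name, not part of RGeo or the Geo class above.
module EwktSketch
  SRID = 4326

  # WKT writes coordinates as "x y", so longitude comes first, then latitude.
  def self.point(longitude, latitude)
    "POINT (#{longitude} #{latitude})"
  end

  # pairs is an array of [longitude, latitude]. A polygon ring has to be
  # closed, so the first pair is appended again when the caller left it off.
  def self.polygon(pairs)
    ring = pairs.first == pairs.last ? pairs : pairs + [pairs.first]
    "POLYGON ((#{ring.map { |lon, lat| "#{lon} #{lat}" }.join(', ')}))"
  end

  # Prefix the WKT with the SRID, mirroring what Geo.to_wkt does.
  def self.to_ewkt(wkt)
    "srid=#{SRID};#{wkt}"
  end
end

puts EwktSketch.to_ewkt(EwktSketch.point(-84.38633, 33.753746))
puts EwktSketch.to_ewkt(EwktSketch.polygon([[0, 0], [0, 1], [1, 1]]))
```

&lt;p&gt;The point to take away is the ordering: longitude first, latitude second, with the SRID in front so PostGIS knows which reference system the numbers belong to.&lt;/p&gt;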
&lt;h2 id=&quot;finding-nearby-records-with-postgis-and-geocoder&quot; &gt;&lt;a href=&quot;#finding-nearby-records-with-postgis-and-geocoder&quot; aria-label=&quot;finding nearby records with postgis and geocoder permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding Nearby Records with PostGIS and Geocoder&lt;/h2&gt;
&lt;p&gt;One of the most common geo queries used in applications is to find all records within X distance from a known point (the user&apos;s location, an event, a search, etc...). Because we installed &lt;code &gt;Geocoder&lt;/code&gt; and added &lt;code &gt;reverse_geocoded_by :latitude, :longitude&lt;/code&gt; to our &lt;code &gt;Home&lt;/code&gt; class, we can use the &lt;code &gt;near&lt;/code&gt; method to find all homes within 5km of a given latitude and longitude (the one used here happens to be Atlanta, Georgia). Geocoder likes to have arrays with latitude and then longitude, as opposed to PostGIS which &lt;strong&gt;prefers the exact opposite&lt;/strong&gt; order!&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;near&lt;span &gt;(&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;33.753746&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.386330&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:all&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# ~5ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query ran in about 5ms on my computer (searching through 100k records)... pretty fast! It is fast because we added an index on the latitude and longitude fields, and because Geocoder applies a bounding box filter that can use that index. Remember the Spatial Reference System (SRID) that we mentioned above? Because our coordinates do not lie on a &lt;a href=&quot;https://en.wikipedia.org/wiki/Cartesian_coordinate_system&quot;&gt;Cartesian plane&lt;/a&gt;, we can’t use a standard distance formula to calculate the &lt;a href=&quot;https://www.mathsisfun.com/algebra/distance-2-points.html&quot;&gt;distance between two points&lt;/a&gt;. Although we won’t venture further into the math, the query below takes into account the Earth’s spherical nature when calculating the distance between two coordinates specified by latitude and longitude. &lt;a href=&quot;https://www.movable-type.co.uk/scripts/latlong.html&quot;&gt;This article&lt;/a&gt; dives into more detail on these calculations if you are interested.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;homes&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;homes&lt;span &gt;.&lt;/span&gt;latitude &lt;span &gt;BETWEEN&lt;/span&gt; &lt;span &gt;33.708779919704064&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;33.798712080295935&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; homes&lt;span &gt;.&lt;/span&gt;longitude &lt;span &gt;BETWEEN&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.44041260768655&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.33224739231345&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;6371.0&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; ASIN&lt;span &gt;(&lt;/span&gt;SQRT&lt;span &gt;(&lt;/span&gt;POWER&lt;span &gt;(&lt;/span&gt;SIN&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;33.753746&lt;/span&gt; &lt;span &gt;-&lt;/span&gt; homes&lt;span &gt;.&lt;/span&gt;latitude&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; PI&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;180&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;+&lt;/span&gt; COS&lt;span &gt;(&lt;/span&gt;&lt;span &gt;33.753746&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; PI&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;180&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; COS&lt;span &gt;(&lt;/span&gt;homes&lt;span &gt;.&lt;/span&gt;latitude &lt;span &gt;*&lt;/span&gt; PI&lt;span 
&gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;180&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; POWER&lt;span &gt;(&lt;/span&gt;SIN&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.38633&lt;/span&gt; &lt;span &gt;-&lt;/span&gt; homes&lt;span &gt;.&lt;/span&gt;longitude&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; PI&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;180&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;BETWEEN&lt;/span&gt; &lt;span &gt;0.0&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
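&lt;p&gt;The SQL above does two things: a cheap latitude/longitude window that the index can serve, followed by a great-circle (haversine) distance check. As a rough plain-Ruby sketch of that arithmetic, assuming an Earth radius of 6371 km as seen in the query, the hypothetical &lt;code &gt;NearSketch&lt;/code&gt; module below reproduces both steps. It is an illustration of the math, not Geocoder&apos;s actual implementation.&lt;/p&gt;

```ruby
# Sketch of the two steps visible in the generated SQL.
# NearSketch and its method names are hypothetical.
module NearSketch
  EARTH_RADIUS_KM = 6371.0
  # Length of one degree of latitude on a sphere of that radius (about 111.19 km).
  KM_PER_DEGREE_LAT = EARTH_RADIUS_KM * Math::PI / 180

  def self.to_rad(degrees)
    degrees * Math::PI / 180
  end

  # Step 1: latitude/longitude ranges that enclose a circle of radius_km.
  # A degree of longitude shrinks with cos(latitude), so the longitude
  # range is wider than the latitude range away from the equator.
  def self.bounding_box(lat, lon, radius_km)
    lat_delta = radius_km / KM_PER_DEGREE_LAT
    lon_delta = radius_km / (KM_PER_DEGREE_LAT * Math.cos(to_rad(lat)))
    { lat: (lat - lat_delta)..(lat + lat_delta),
      lon: (lon - lon_delta)..(lon + lon_delta) }
  end

  # Step 2: haversine distance in km, the same shape as the ASIN(SQRT(...)) term.
  def self.haversine_km(lat1, lon1, lat2, lon2)
    a = Math.sin(to_rad(lat1 - lat2) / 2)**2 +
        Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) *
        Math.sin(to_rad(lon1 - lon2) / 2)**2
    EARTH_RADIUS_KM * 2 * Math.asin(Math.sqrt(a))
  end
end

box = NearSketch.bounding_box(33.753746, -84.38633, 5)
puts box[:lat] # close to the latitude BETWEEN bounds in the SQL
puts box[:lon] # close to the longitude BETWEEN bounds in the SQL
puts NearSketch.haversine_km(0, 0, 0, 1) # one degree along the equator, roughly 111.19 km
```

&lt;p&gt;The window alone would return homes in the corners of the square that are farther than 5km away, which is why the distance check is still applied to the rows that survive the window.&lt;/p&gt;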
&lt;p&gt;We&apos;ll have to build our own &lt;code &gt;near&lt;/code&gt; query when working with PostGIS, but don&apos;t worry, it&apos;s pretty straightforward! The &lt;code &gt;g_near&lt;/code&gt; method lives within the &lt;code &gt;Home&lt;/code&gt; model, and takes advantage of the &lt;a href=&quot;https://postgis.net/docs/ST_DWithin.html&quot;&gt;ST_DWithin&lt;/a&gt; function provided by PostGIS. Remember that we have to convert our point into the correct WKT format so that PostGIS understands the data we are passing it.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;g_near&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;point&lt;span &gt;,&lt;/span&gt; distance&lt;span &gt;)&lt;/span&gt;
  where&lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&apos;ST_DWithin(coords, :point, :distance)&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;{&lt;/span&gt; point&lt;span &gt;:&lt;/span&gt; &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_wkt&lt;span &gt;(&lt;/span&gt;point&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; distance&lt;span &gt;:&lt;/span&gt; distance &lt;span &gt;*&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;# wants meters not kms&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;g_near&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.386330&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.753746&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count &lt;span &gt;# ~5ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query performs just about as fast as the Geocoder version (because of our GiST index on the &lt;code &gt;coords&lt;/code&gt; column), but is definitely a little easier on the eyes to read.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;homes&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;ST_DWithin&lt;span &gt;(&lt;/span&gt;coords&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;srid=4326;POINT (-84.38633 33.753746)&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;5000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;finding-records-within-a-bounding-box-with-postgis-and-geocoder&quot; &gt;&lt;a href=&quot;#finding-records-within-a-bounding-box-with-postgis-and-geocoder&quot; aria-label=&quot;finding records within a bounding box with postgis and geocoder permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding Records Within a Bounding Box with PostGIS and Geocoder&lt;/h2&gt;
&lt;p&gt;Geocoder provides us a way to find all records within a bounding box (roughly a rectangle, ignoring projection onto a sphere), and we just have to pass it the bottom left (south west) and top right (north east) coordinates.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;within_bounding_box&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;33.7250057553&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.4224209302&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;33.774350796&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.3570139222&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count &lt;span &gt;# ~5ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because it can use the index on latitude and longitude, it is quite efficient.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;homes&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;homes&lt;span &gt;.&lt;/span&gt;latitude &lt;span &gt;BETWEEN&lt;/span&gt; &lt;span &gt;33.7250057553&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;33.774350796&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; homes&lt;span &gt;.&lt;/span&gt;longitude &lt;span &gt;BETWEEN&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.4224209302&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.3570139222&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To perform a bounding box query using PostGIS, we&apos;ll create a method named &lt;code &gt;g_within_box&lt;/code&gt; inside of the &lt;code &gt;Home&lt;/code&gt; model, and utilize a PostGIS function named &lt;a href=&quot;https://postgis.net/docs/ST_MakeEnvelope.html&quot;&gt;ST_MakeEnvelope&lt;/a&gt; along with the &lt;code &gt;&amp;amp;&amp;amp;&lt;/code&gt; operator.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;g_within_box&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;sw_point&lt;span &gt;,&lt;/span&gt; ne_point&lt;span &gt;)&lt;/span&gt;
  where&lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&quot;coords &amp;amp;&amp;amp; ST_MakeEnvelope(:sw_lon, :sw_lat, :ne_lon, :ne_lat, &lt;span &gt;&lt;span &gt;#{&lt;/span&gt;
      &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;SRID&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;/span&gt;)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;{&lt;/span&gt;
      sw_lon&lt;span &gt;:&lt;/span&gt; sw_point&lt;span &gt;.&lt;/span&gt;longitude&lt;span &gt;,&lt;/span&gt;
      sw_lat&lt;span &gt;:&lt;/span&gt; sw_point&lt;span &gt;.&lt;/span&gt;latitude&lt;span &gt;,&lt;/span&gt;
      ne_lon&lt;span &gt;:&lt;/span&gt; ne_point&lt;span &gt;.&lt;/span&gt;longitude&lt;span &gt;,&lt;/span&gt;
      ne_lat&lt;span &gt;:&lt;/span&gt; ne_point&lt;span &gt;.&lt;/span&gt;latitude
    &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;g_within_box&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.4224209302&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.7250057553&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;point&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.3570139222&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.774350796&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count &lt;span &gt;# ~5ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, this version performs about as efficiently as the Geocoder version.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;homes&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;coords &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; ST_MakeEnvelope&lt;span &gt;(&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.4224209302&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.7250057553&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.3570139222&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.774350796&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;4326&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
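&lt;p&gt;One detail that is easy to get wrong when moving between the two versions is coordinate order: the Geocoder call takes its corners as latitude, longitude pairs, while ST_MakeEnvelope takes longitude first. A minimal sketch of the reordering, assuming a hypothetical &lt;code &gt;envelope_args&lt;/code&gt; helper that is not part of the article&apos;s code:&lt;/p&gt;

```ruby
# Hypothetical helper (not part of the article's Geo class): converts
# Geocoder-style [latitude, longitude] corners into the argument order
# that ST_MakeEnvelope expects (xmin, ymin, xmax, ymax: longitude first).
def envelope_args(sw_pair, ne_pair)
  sw_lat, sw_lon = sw_pair
  ne_lat, ne_lon = ne_pair
  { sw_lon: sw_lon, sw_lat: sw_lat, ne_lon: ne_lon, ne_lat: ne_lat }
end

# The same corners used in the Geocoder example above.
args = envelope_args([33.7250057553, -84.4224209302], [33.774350796, -84.3570139222])
puts args.values.join(", ")
```

&lt;p&gt;The resulting hash uses the same keys as the bind parameters in &lt;code &gt;g_within_box&lt;/code&gt;, so it could be passed to &lt;code &gt;where&lt;/code&gt; in a similar way.&lt;/p&gt;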
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/0f8374c9f7ad161492697445d965c753/9c7c2/postgis_rails_geocoder_bounding-box_pganalyze.jpg&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;PostGIS vs. Geocoder in Rails - Bounding Box&quot;
        title=&quot;PostGIS vs. Geocoder in Rails - Bounding Box&quot;
        src=&quot;https://pganalyze.com/static/0f8374c9f7ad161492697445d965c753/acb04/postgis_rails_geocoder_bounding-box_pganalyze.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;finding-records-within-a-polygon-with-postgis-and-geocoder&quot; &gt;&lt;a href=&quot;#finding-records-within-a-polygon-with-postgis-and-geocoder&quot; aria-label=&quot;finding records within a polygon with postgis and geocoder permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding Records Within a Polygon with PostGIS and Geocoder&lt;/h2&gt;
&lt;p&gt;We&apos;re now in territory that &lt;strong&gt;requires&lt;/strong&gt; PostGIS. To find records inside a &lt;a href=&quot;https://en.wikipedia.org/wiki/Polygon&quot;&gt;polygon&lt;/a&gt;, we can create a method named &lt;code &gt;g_within_polygon&lt;/code&gt; in our &lt;code &gt;Home&lt;/code&gt; model, using our &lt;code &gt;Geo&lt;/code&gt; helper class and the &lt;a href=&quot;https://postgis.net/docs/ST_Covers.html&quot;&gt;ST_Covers&lt;/a&gt; function from PostGIS. The polygon in this example is a triangle whose last point is the same as its first, thereby &quot;closing&quot; the shape.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;g_within_polygon&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
  polygon &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;polygon&lt;span &gt;(&lt;/span&gt;points&lt;span &gt;)&lt;/span&gt;
  where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;ST_Covers(:polygon, coords)&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; polygon&lt;span &gt;:&lt;/span&gt; &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_wkt&lt;span &gt;(&lt;/span&gt;polygon&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;g_within_polygon&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;Geo&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;pairs_to_points&lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;[&lt;/span&gt;
      &lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.39731626974567&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.75570358345219&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.33139830099567&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.86524376001825&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.25243406759724&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.770545357734925&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;[&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;84.39731626974567&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;33.75570358345219&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
    &lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count &lt;span &gt;# ~5ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query remains efficient due to the use of our GiST index, searching through 100k records in about 5ms.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;homes&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;ST_Covers&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;srid=4326;POLYGON ((-84.39731626974567 33.75570358345219, -84.33139830099567 33.86524376001825, -84.25243406759724 33.770545357734925, -84.39731626974567 33.75570358345219))&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; coords&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
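&lt;p&gt;The closing rule is easy to forget when building polygons by hand. As a rough sketch, the two hypothetical helpers below (they are not the article&apos;s &lt;code &gt;Geo&lt;/code&gt; methods, which are assumed to do something equivalent) close a ring if needed and render it as WKT from longitude, latitude pairs:&lt;/p&gt;

```ruby
# Hypothetical helpers for illustration; the article's Geo class is assumed
# to do something equivalent through its own polygon and to_wkt methods.

# Appends the first point to the end unless the ring is already closed.
def close_ring(pairs)
  return pairs if pairs.first == pairs.last
  pairs + [pairs.first]
end

# Builds a WKT polygon string from [longitude, latitude] pairs.
def polygon_wkt(pairs)
  ring = close_ring(pairs).map { |lon, lat| "#{lon} #{lat}" }.join(", ")
  "POLYGON ((#{ring}))"
end

# The triangle from the example above, given without its closing point.
triangle = [
  [-84.39731626974567, 33.75570358345219],
  [-84.33139830099567, 33.86524376001825],
  [-84.25243406759724, 33.770545357734925]
]
puts polygon_wkt(triangle)
```

&lt;p&gt;Note that this sketch omits the &lt;code &gt;srid=4326;&lt;/code&gt; prefix seen in the generated SQL, so the SRID would still need to be supplied separately.&lt;/p&gt;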
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/ce2f065e32b345867f58e3bc79c5159c/c222a/postgis_rails_geocoder_polygon_pganalyze.jpg&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;PostGIS vs. Geocoder in Rails - Polygon&quot;
        title=&quot;PostGIS vs. Geocoder in Rails - Polygon&quot;
        src=&quot;https://pganalyze.com/static/ce2f065e32b345867f58e3bc79c5159c/acb04/postgis_rails_geocoder_polygon_pganalyze.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;finding-nearby-related-records-with-postgis-and-geocoder&quot; &gt;&lt;a href=&quot;#finding-nearby-related-records-with-postgis-and-geocoder&quot; aria-label=&quot;finding nearby related records with postgis and geocoder permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding Nearby Related Records with PostGIS and Geocoder&lt;/h2&gt;
&lt;p&gt;Using PostGIS it is also possible to find &lt;strong&gt;related&lt;/strong&gt; nearby records. What do I mean by that? Let&apos;s try to find &lt;code &gt;available&lt;/code&gt; homes that are within 1km of a school. This can be done by joining to the &lt;code &gt;schools&lt;/code&gt; table and utilizing &lt;a href=&quot;https://postgis.net/docs/ST_DWithin.html&quot;&gt;ST_DWithin&lt;/a&gt; for the &lt;code &gt;on&lt;/code&gt; clause. Starting with the SQL we&apos;d like to produce:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  &lt;span &gt;count&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;DISTINCT&lt;/span&gt; homes&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt;
  homes
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; schools &lt;span &gt;ON&lt;/span&gt; ST_DWithin &lt;span &gt;(&lt;/span&gt;homes&lt;span &gt;.&lt;/span&gt;coords&lt;span &gt;,&lt;/span&gt; schools&lt;span &gt;.&lt;/span&gt;coords&lt;span &gt;,&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;WHERE&lt;/span&gt;
  homes&lt;span &gt;.&lt;/span&gt;&lt;span &gt;status&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;available&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Within the &lt;code &gt;Home&lt;/code&gt; model of our Rails application, we can create two scopes that allow us to find these homes. We&apos;re able to join 100k homes to the schools table (100 schools) based on their proximity in approximately 16ms.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Home&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  scope &lt;span &gt;:available&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; where&lt;span &gt;(&lt;/span&gt;status&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;available&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
  scope &lt;span &gt;:near_school&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        lambda &lt;span &gt;{&lt;/span&gt;
          select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;DISTINCT ON (homes.id) homes.*&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;joins&lt;span &gt;(&lt;/span&gt;
            &lt;span &gt;&apos;INNER JOIN schools ON ST_DWithin (homes.coords, schools.coords, 1000)&apos;&lt;/span&gt;
          &lt;span &gt;)&lt;/span&gt;
        &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;
&lt;span &gt;# Example using the scopes declared above&lt;/span&gt;
&lt;span &gt;Home&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;available&lt;span &gt;.&lt;/span&gt;near_school&lt;span &gt;.&lt;/span&gt;count&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;distinct homes.id&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;# 16ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;We&apos;ve only scratched the surface of what you can do with PostGIS, yet we were able to cover a ton of functionality that is common among websites that allow you to filter results based on their location. That said, if PostGIS isn&apos;t available as an extension on your version of Postgres, or you don&apos;t need the power that PostGIS provides, Geocoder offers you a great alternative.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22PostGIS%20vs.%20Geocoder%20in%20Rails%22%20-%20This%20article%20by%20%40pganalyze%20compares%20PostGIS%20in%20%23Rails%20with%20Geocoder%20and%20highlights%20areas%20where%20you%27ll%20want%20to%20reach%20for%20one%20over%20the%20other%3A%20https%3A%2F%2Fpganalyze.com%2Fblog%2Fpostgis-rails-geocoder&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the &lt;a src=&quot;https://pganalyze.com/&quot;&gt;pganalyze&lt;/a&gt; blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Lessons Learned from Running Postgres 13: Better Performance, Monitoring & More]]></title><description><![CDATA[Postgres 13 is almost here. It's been in beta since May, and the general availability release is
coming any day. We've been following Postgres 13 closely here at pganalyze, and have been running
the beta in one of our staging environments for several months now. There are no big new features in Postgres 13, but there are a lot of small but important incremental
improvements. Let's take a look. Performance Smaller Indexes with B-Tree Deduplication Extended Statistics Improvements in Postgres 1…]]></description><link>https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability</guid><dc:creator><![CDATA[Maciek Sakrejda]]></dc:creator><pubDate>Mon, 21 Sep 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;div &gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/1bb1d2f91b1933a7b1b6c23448c93116/aa440/postgres_13.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;postgres 13&quot; title=&quot;Astronaut writing Postgres 13&quot; src=&quot;https://pganalyze.com/static/1bb1d2f91b1933a7b1b6c23448c93116/1d69c/postgres_13.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/div&gt;
&lt;p&gt;Postgres 13 is almost here. It&apos;s been in beta since May, and the general availability release is
coming any day. We&apos;ve been following Postgres 13 closely here at pganalyze, and have been running
the beta in one of our staging environments for several months now.&lt;/p&gt;
&lt;p&gt;There are no big new features in Postgres 13, but there are a lot of small but important incremental
improvements. Let&apos;s take a look.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#performance&quot;&gt;Performance&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#smaller-indexes-with-b-tree-deduplication&quot;&gt;Smaller Indexes with B-Tree Deduplication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#extended-statistics-improvements-in-postgres-13&quot;&gt;Extended Statistics Improvements in Postgres 13&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#parallel-vacuum--better-support-for-append-only-workloads&quot;&gt;Parallel VACUUM &amp;#x26; Better Support for Append-only Workloads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#incremental-sorting&quot;&gt;Incremental Sorting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#monitoring&quot;&gt;Monitoring&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#wal-usage-stats&quot;&gt;WAL Usage Stats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#better-statement-logging-in-postgres-13&quot;&gt;Better Statement Logging in Postgres 13&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#more-planning-information&quot;&gt;More Planning Information&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#usability&quot;&gt;Usability&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#glossary&quot;&gt;Glossary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#better-uuid-support&quot;&gt;Better UUID Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#psql-improvements&quot;&gt;psql Improvements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;performance&quot; &gt;&lt;a href=&quot;#performance&quot; aria-label=&quot;performance permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Performance&lt;/h2&gt;
&lt;p&gt;Postgres 13 performance improvements include both built-in optimizations and heuristics that will make
your database run better out of the box, as well as additional features to give you more flexibility
in optimizing your schema and queries.&lt;/p&gt;
&lt;h3 id=&quot;smaller-indexes-with-b-tree-deduplication&quot; &gt;&lt;a href=&quot;#smaller-indexes-with-b-tree-deduplication&quot; aria-label=&quot;smaller indexes with b tree deduplication permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Smaller Indexes with B-Tree Deduplication&lt;/h3&gt;
&lt;svg xmlns:xl=&quot;http://www.w3.org/1999/xlink&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot; xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot; version=&quot;1.1&quot; viewBox=&quot;222 230.5 630.072 116&quot; width=&quot;630.072&quot; height=&quot;116&quot;&gt;
  &lt;defs&gt;
    &lt;font-face font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; panose-1=&quot;2 11 5 3 2 2 2 2 2 4&quot; units-per-em=&quot;1000&quot; underline-position=&quot;-75&quot; underline-thickness=&quot;50&quot; slope=&quot;0&quot; x-height=&quot;468&quot; cap-height=&quot;708&quot; ascent=&quot;1e3&quot; descent=&quot;-365.9973&quot; font-weight=&quot;400&quot;&gt;
      &lt;font-face-src&gt;
        &lt;font-face-name name=&quot;AvenirNext-Regular&quot;/&gt;
      &lt;/font-face-src&gt;
    &lt;/font-face&gt;
    &lt;font-face font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; panose-1=&quot;2 11 8 3 2 2 2 2 2 4&quot; units-per-em=&quot;1000&quot; underline-position=&quot;-75&quot; underline-thickness=&quot;50&quot; slope=&quot;0&quot; x-height=&quot;498&quot; cap-height=&quot;708&quot; ascent=&quot;1e3&quot; descent=&quot;-365.9973&quot; font-weight=&quot;700&quot;&gt;
      &lt;font-face-src&gt;
        &lt;font-face-name name=&quot;AvenirNext-Bold&quot;/&gt;
      &lt;/font-face-src&gt;
    &lt;/font-face&gt;
  &lt;/defs&gt;
  &lt;metadata&gt; Produced by OmniGraffle 7.17.2\n2020-09-20 19:29:10 +0000&lt;/metadata&gt;
  &lt;g id=&quot;Canvas_1&quot; stroke-dasharray=&quot;none&quot; stroke-opacity=&quot;1&quot; stroke=&quot;none&quot; fill=&quot;none&quot; fill-opacity=&quot;1&quot;&gt;
    &lt;title&gt;Canvas 1&lt;/title&gt;
    &lt;g id=&quot;Canvas_1_Layer_1&quot;&gt;
      &lt;title&gt;Layer 1&lt;/title&gt;
      &lt;g id=&quot;Graphic_2&quot;&gt;
        &lt;rect x=&quot;232&quot; y=&quot;240.5&quot; width=&quot;250.5&quot; height=&quot;39.5&quot; fill=&quot;#d8d5df&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_3&quot;&gt;
        &lt;rect x=&quot;232&quot; y=&quot;297&quot; width=&quot;89.5&quot; height=&quot;39.5&quot; fill=&quot;#d8eef0&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_9&quot;&gt;
        &lt;text transform=&quot;translate(244.128 251.026)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;218 MB&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_10&quot;&gt;
        &lt;text transform=&quot;translate(244.128 309.013)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;67 MB&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_13&quot;&gt;
        &lt;text transform=&quot;translate(502 251.026)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;deduplicate_items=off&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_14&quot;&gt;
        &lt;text transform=&quot;translate(502 307.526)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;deduplicate_items=on   &lt;/tspan&gt;
          &lt;tspan font-family=&quot;Avenir Next&quot; font-size=&quot;16&quot; font-weight=&quot;700&quot; fill=&quot;black&quot; y=&quot;16&quot;&gt;(new in Postgres 13)&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
    &lt;/g&gt;
  &lt;/g&gt;
&lt;/svg&gt;
&lt;p&gt;Postgres 13 introduces a way for B-Tree indexes to &lt;a href=&quot;https://www.postgresql.org/docs/13/btree-implementation.html#BTREE-DEDUPLICATION&quot;&gt;avoid storing duplicate entries in some situations&lt;/a&gt;.
In general, a B-Tree index consists of a tree of indexed values, with each leaf node pointing to a
particular row version. Because each leaf points to one row version, if you are indexing non-unique
values, those values need to be repeated.&lt;/p&gt;
&lt;p&gt;The de-duplication mechanism avoids that by having a leaf node point to several row versions if possible,
which leads to smaller indexes.&lt;/p&gt;
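&lt;p&gt;To make the idea concrete, here is a deliberately simplified model in Ruby. It is only a sketch: real B-Tree pages are considerably more involved, and the sample values and pointers are made up for illustration.&lt;/p&gt;

```ruby
# Simplified model for illustration only; real B-Tree pages are more involved.
# Each entry is [indexed_value, row_version_pointer] (made-up sample data).
entries = [
  [42, "(0,1)"],
  [42, "(0,2)"],
  [42, "(3,7)"],
  [99, "(1,4)"]
]

# Without deduplication: the key is stored once per row version.
keys_without_dedup = entries.length

# With deduplication: each distinct key is stored once, followed by
# a list of the row versions it points to (a "posting list").
posting_lists = entries
  .group_by { |value, _pointer| value }
  .transform_values { |pairs| pairs.map { |_value, pointer| pointer } }
keys_with_dedup = posting_lists.length

puts "keys stored without deduplication: #{keys_without_dedup}"
puts "keys stored with deduplication: #{keys_with_dedup}"
```

&lt;p&gt;The more often a value repeats, the more key copies are saved, which is why indexes on low-cardinality columns tend to benefit the most.&lt;/p&gt;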
&lt;p&gt;Here is an example from our own pganalyze application schema: We have a &lt;code &gt;queries&lt;/code&gt; table to
track all the queries we monitor, and a &lt;code &gt;database_id&lt;/code&gt; field to track which database they belong to. We
index &lt;code &gt;database_id&lt;/code&gt; (so we can quickly fetch queries for a specific database), and because each database
typically has more than one query, there is a lot of duplication in this index.&lt;/p&gt;
&lt;p&gt;New B-Tree indexes in Postgres 13 use the deduplication feature by default, but if for some reason,
you need to turn it off, you can control it with the &lt;code &gt;deduplicate_items&lt;/code&gt; storage parameter. Here we
create the same index in two different ways, with deduplication explicitly on and off (though again,
you don&apos;t need to specify &lt;code &gt;on&lt;/code&gt;—this is the default):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; CONCURRENTLY queries_db_id_idx_no_dedup &lt;span &gt;ON&lt;/span&gt; queries&lt;span &gt;(&lt;/span&gt;database_id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items&lt;span &gt;=&lt;/span&gt;&lt;span &gt;off&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;INDEX&lt;/span&gt; CONCURRENTLY queries_db_id_idx_yes_dedup &lt;span &gt;ON&lt;/span&gt; queries&lt;span &gt;(&lt;/span&gt;database_id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;WITH&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;deduplicate_items&lt;span &gt;=&lt;/span&gt;&lt;span &gt;on&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;SELECT&lt;/span&gt; relname&lt;span &gt;,&lt;/span&gt; pg_size_pretty&lt;span &gt;(&lt;/span&gt;pg_relation_size&lt;span &gt;(&lt;/span&gt;oid&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_class
&lt;span &gt;WHERE&lt;/span&gt; relname &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;queries_db_id_idx_no_dedup&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;queries_db_id_idx_yes_dedup&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;           relname           | pg_size_pretty 
-----------------------------+----------------
 queries_db_id_idx_no_dedup  | 218 MB
 queries_db_id_idx_yes_dedup | 67 MB
(2 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With deduplication, the new index is more than &lt;strong&gt;three times smaller&lt;/strong&gt;! Smaller indexes are faster to load
from disk, and take up less space in memory, meaning there&apos;s more room for your data.&lt;/p&gt;
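&lt;p&gt;As a quick check of that claim, the arithmetic on the two sizes reported above can be sketched as follows (the variable names are just for illustration):&lt;/p&gt;

```ruby
# Index sizes taken from the pg_size_pretty output above, in MB.
no_dedup_mb = 218.0
dedup_mb = 67.0

# How many times smaller the deduplicated index is.
ratio = (no_dedup_mb / dedup_mb).round(2)

# Share of the original index size that deduplication saved.
saved_percent = ((1 - dedup_mb / no_dedup_mb) * 100).round(1)

puts "#{ratio}x smaller, #{saved_percent}% less space"
```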
&lt;p&gt;One interesting note here is that the index entries point to row &lt;em&gt;versions&lt;/em&gt; (as in, a row the way it
exists in one specific &lt;a href=&quot;https://www.postgresql.org/docs/13/mvcc.html&quot;&gt;MVCC&lt;/a&gt; state), not rows themselves,
so this feature &lt;strong&gt;can improve index size even for unique indexes&lt;/strong&gt;, where one would not expect any duplication
to occur.&lt;/p&gt;
&lt;p&gt;Note that deduplication is not possible in all cases (see the documentation linked above for details), and that
if you upgrade via &lt;code &gt;pg_upgrade&lt;/code&gt;, you will need to reindex before you can take advantage of it.&lt;/p&gt;
&lt;h3 id=&quot;extended-statistics-improvements-in-postgres-13&quot; &gt;&lt;a href=&quot;#extended-statistics-improvements-in-postgres-13&quot; aria-label=&quot;extended statistics improvements in postgres 13 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Extended Statistics Improvements in Postgres 13&lt;/h3&gt;
&lt;p&gt;Postgres 10 introduced the concept of &lt;a href=&quot;https://www.postgresql.org/docs/13/planner-stats.html#PLANNER-STATS-EXTENDED&quot;&gt;extended statistics&lt;/a&gt;. Postgres keeps some statistics about the &quot;shape&quot; of your data to ensure it can plan queries efficiently,
but the statistics kept by default cannot track things like inter-column dependencies. Extended statistics
were introduced to address that: These are database objects (like indexes) that you create manually with
&lt;code &gt;CREATE STATISTICS&lt;/code&gt; to give the query planner more information for more specific situations. These would be
expensive for Postgres to determine automatically, but armed with an understanding of the semantics of your
schema, you can provide that additional info. Used carefully, this can lead to
&lt;a href=&quot;https://build.affinity.co/how-we-used-postgres-extended-statistics-to-achieve-a-3000x-speedup-ea93d3dcdc61&quot;&gt;massive performance improvements&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Postgres 13 brings a number of small but important improvements to extended statistics, including
support for using them with &lt;code &gt;OR&lt;/code&gt; clauses and in &lt;code &gt;IN&lt;/code&gt;/&lt;code &gt;ANY&lt;/code&gt; constant lists, allowing consideration
of multiple extended statistics objects in planning a query, and support for
&lt;a href=&quot;https://www.postgresql.org/docs/13/sql-alterstatistics.html&quot;&gt;setting a statistics target&lt;/a&gt; for
extended statistics:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; &lt;span &gt;STATISTICS&lt;/span&gt; table_stx &lt;span &gt;SET&lt;/span&gt; &lt;span &gt;STATISTICS&lt;/span&gt; &lt;span &gt;1000&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As with the regular statistics target, this is a trade-off between additional planning time (and longer &lt;code &gt;ANALYZE&lt;/code&gt; runs) and more precise plans. We recommend applying this in a targeted manner and checking EXPLAIN plans to confirm plan changes.&lt;/p&gt;
&lt;h3 id=&quot;parallel-vacuum--better-support-for-append-only-workloads&quot; &gt;&lt;a href=&quot;#parallel-vacuum--better-support-for-append-only-workloads&quot; aria-label=&quot;parallel vacuum  better support for append only workloads permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parallel VACUUM &amp;#x26; Better Support for Append-only Workloads&lt;/h3&gt;
&lt;p&gt;Postgres multi-version concurrency control means you need to run &lt;code &gt;VACUUM&lt;/code&gt; regularly (usually you can rely
on the autovacuum process, though it may need some tuning). In Postgres 13, one notable improvement is
that multiple indexes for a single table can be vacuumed in parallel. This can lead to big performance
improvements in &lt;code &gt;VACUUM&lt;/code&gt; work. Parallel &lt;code &gt;VACUUM&lt;/code&gt; is the default and can be controlled with the &lt;code &gt;PARALLEL&lt;/code&gt; option:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;VACUUM &lt;span &gt;(&lt;/span&gt;PARALLEL &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; VERBOSE&lt;span &gt;)&lt;/span&gt; queries&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;INFO:  vacuuming &quot;public.queries&quot;
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
INFO:  scanned index &quot;index_queries_on_database_id&quot; to remove 1403418 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.98 s, system: 0.15 s, elapsed: 2.37 s
INFO:  scanned index &quot;index_queries_on_last_occurred_at&quot; to remove 1403418 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.88 s, system: 0.27 s, elapsed: 2.60 s
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Parallel VACUUM occurs when all of the following are true:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sufficient parallel workers are available, based on the system-wide limit set by &lt;a href=&quot;https://www.postgresql.org/docs/13/runtime-config-resource.html#GUC-MAX-PARALLEL-WORKERS-MAINTENANCE&quot;&gt;&lt;code &gt;max_parallel_maintenance_workers&lt;/code&gt;&lt;/a&gt; (defaults to 2)&lt;/li&gt;
&lt;li&gt;There are multiple indexes on the table (one index can be processed by one worker at a time)&lt;/li&gt;
&lt;li&gt;Index types support it (all built-in index types support parallelism to some extent)&lt;/li&gt;
&lt;li&gt;The indexes are large enough to exceed &lt;a href=&quot;https://www.postgresql.org/docs/13/runtime-config-query.html#GUC-MIN-PARALLEL-INDEX-SCAN-SIZE&quot;&gt;&lt;code &gt;min_parallel_index_scan_size&lt;/code&gt;&lt;/a&gt; (defaults to 512 kB)&lt;/li&gt;
&lt;/ul&gt;
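&lt;p&gt;To check these conditions for a given table, something like the following may help. It reuses the &lt;code &gt;queries&lt;/code&gt; table from the example above; the column aliases are arbitrary.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Current limits relevant to parallel VACUUM
SHOW max_parallel_maintenance_workers;
SHOW min_parallel_index_scan_size;

-- Indexes on the table and their sizes, largest first
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_index
WHERE  indrelid = &apos;queries&apos;::regclass
ORDER  BY pg_relation_size(indexrelid) DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;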
&lt;p&gt;Be aware that &lt;strong&gt;parallel VACUUM is currently not supported for autovacuum.&lt;/strong&gt; This new feature is intended for use in manual VACUUM runs that need to complete quickly, such as when insufficient autovacuum tuning has led to an imminent TXID wraparound, and you need to intervene to fix it.&lt;/p&gt;
&lt;p&gt;On that note, an important &lt;code &gt;autovacuum&lt;/code&gt; improvement in Postgres 13 is that the autovacuum background process can now be triggered by &lt;code &gt;INSERT&lt;/code&gt; statements for append-only tables. The main purpose of VACUUM is to clean up old versions of updated and deleted rows, but it is also essential to set pages as all-visible for MVCC bookkeeping. All-visible pages allow index-only scans to avoid checking visibility status row-by-row, making them faster.&lt;/p&gt;
&lt;p&gt;We make extensive use of append-only tables at pganalyze for our timeseries data, and this improvement will make our lives considerably easier, avoiding the occasional manual VACUUM run on these tables. This new behavior can be controlled by the &lt;code &gt;autovacuum_vacuum_insert_threshold&lt;/code&gt; and &lt;code &gt;autovacuum_vacuum_insert_scale_factor&lt;/code&gt; variables.&lt;/p&gt;
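&lt;p&gt;Both settings can also be overridden per table as storage parameters. The sketch below assumes a hypothetical append-only table named &lt;code &gt;query_samples&lt;/code&gt;; the values are illustrative rather than recommendations. Roughly speaking, an insert-driven autovacuum is triggered once the number of rows inserted since the last vacuum exceeds the threshold plus the scale factor times the table&apos;s row count.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical append-only table; values are illustrative only
ALTER TABLE query_samples SET (
  autovacuum_vacuum_insert_threshold = 100000,
  autovacuum_vacuum_insert_scale_factor = 0.05
);

-- Inserts since the last vacuum, and when autovacuum last ran
SELECT relname, n_ins_since_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = &apos;query_samples&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;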
&lt;h3 id=&quot;incremental-sorting&quot; &gt;&lt;a href=&quot;#incremental-sorting&quot; aria-label=&quot;incremental sorting permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Incremental Sorting&lt;/h3&gt;
&lt;p&gt;Sorting data is a common database task, and Postgres has a number of features to avoid unnecessary work
here. For example, if you have a B-Tree index on a column, and you query your table ordered by that column,
it can just scan that index in order to get sorted data.&lt;/p&gt;
&lt;p&gt;In Postgres 13, this is improved to handle partially sorted data. If you have an index on &lt;code &gt;(a, b)&lt;/code&gt; (or
the data is already sorted by &lt;code &gt;(a, b)&lt;/code&gt; for another reason), and you issue a query to order by &lt;code &gt;(a, b, c)&lt;/code&gt;,
Postgres understands that the input data is already partially sorted, and can avoid re-sorting the whole
dataset. This is especially useful if you have a &lt;code &gt;LIMIT&lt;/code&gt; in your query, since this can avoid even more
work.&lt;/p&gt;
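&lt;p&gt;A quick way to see whether the planner picks this up is to compare plans with the feature on and off. The following is only a sketch: the &lt;code &gt;events&lt;/code&gt; table and its columns are hypothetical, and whether an incremental sort is chosen depends on your data and statistics.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table with an index covering only the first two sort keys
CREATE INDEX events_a_b_idx ON events (a, b);

-- If the planner chooses it, the plan contains an &quot;Incremental Sort&quot; node
EXPLAIN SELECT * FROM events ORDER BY a, b, c LIMIT 10;

-- For comparison, incremental sort can be disabled for the session
SET enable_incremental_sort = off;
EXPLAIN SELECT * FROM events ORDER BY a, b, c LIMIT 10;
RESET enable_incremental_sort;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;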
&lt;h2 id=&quot;monitoring&quot; &gt;&lt;a href=&quot;#monitoring&quot; aria-label=&quot;monitoring permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Monitoring&lt;/h2&gt;
&lt;p&gt;Monitoring improvements in Postgres 13 include more details on &lt;code &gt;WAL&lt;/code&gt; usage, more options for logging your
queries, and more information on query planning.&lt;/p&gt;
&lt;h3 id=&quot;wal-usage-stats&quot; &gt;&lt;a href=&quot;#wal-usage-stats&quot; aria-label=&quot;wal usage stats permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;WAL Usage Stats&lt;/h3&gt;
&lt;p&gt;The write-ahead log (&lt;code &gt;WAL&lt;/code&gt;) ensures your data stays consistent in the event of a crash, even mid-write. This guarantee
is fundamental to a database: after a crash, each transaction either committed or did not commit, and you don&apos;t
have to worry about in-between states. But on a busy system, &lt;code &gt;WAL&lt;/code&gt; writes can often be a bottleneck. To help
diagnose this, Postgres 13 includes more information on &lt;code &gt;WAL&lt;/code&gt; usage from your queries.&lt;/p&gt;
&lt;p&gt;&lt;code &gt;EXPLAIN&lt;/code&gt; now supports information about &lt;code &gt;WAL&lt;/code&gt; records generated during execution:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;ANALYZE&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; WAL&lt;span &gt;)&lt;/span&gt; &lt;span &gt;DELETE&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; users&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Delete on users  (cost=0.00..5409.00 rows=100000 width=6) (actual time=108.910..108.911 rows=0 loops=1)
   WAL: records=100000 fpi=741 bytes=11425721
   -&gt;  Seq Scan on users  (cost=0.00..5409.00 rows=100000 width=6) (actual time=8.519..51.850 rows=100000 loops=1)
 Planning Time: 6.083 ms
 Execution Time: 108.955 ms
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see that the &lt;code &gt;WAL&lt;/code&gt; line includes the number of records generated, the number of full page images (fpi), and
the number of &lt;code &gt;WAL&lt;/code&gt; bytes generated. Only non-zero values are printed in the default text format.&lt;/p&gt;
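&lt;p&gt;Keep in mind that &lt;code &gt;EXPLAIN ANALYZE&lt;/code&gt; actually executes the statement. If you only want the measurements, one option is to wrap the call in a transaction and roll it back, as in this sketch using the same &lt;code &gt;users&lt;/code&gt; table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;BEGIN;
EXPLAIN (ANALYZE, WAL) DELETE FROM users;
ROLLBACK;  -- undo the delete; the WAL figures were still reported&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;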
&lt;p&gt;This is also available in &lt;code &gt;pg_stat_statements&lt;/code&gt;. For example, on our staging environment, here is what we ran to get
the statement that produced the most &lt;code &gt;WAL&lt;/code&gt; records:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; query&lt;span &gt;,&lt;/span&gt; calls&lt;span &gt;,&lt;/span&gt; wal_records&lt;span &gt;,&lt;/span&gt; wal_fpi&lt;span &gt;,&lt;/span&gt; wal_bytes &lt;span &gt;FROM&lt;/span&gt; pg_stat_statements
  &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; wal_records &lt;span &gt;DESC&lt;/span&gt; &lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]---------------------------------------------------------------------------------------------
query       | CREATE TEMPORARY TABLE upsert_data (server_id uuid NOT NULL, backend_id uuid NOT NULL,
            | query_start timestamp NOT NULL, query_fingerprint bytea NOT NULL, query_text text NOT NULL)
calls       | 7974948
wal_records | 966960816
wal_fpi     | 1018412
wal_bytes   | 100086092097&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Like many other values in &lt;code &gt;pg_stat_statements&lt;/code&gt;, the &lt;code &gt;wal_records&lt;/code&gt;, &lt;code &gt;wal_fpi&lt;/code&gt;, and &lt;code &gt;wal_bytes&lt;/code&gt; values here are
cumulative since the last &lt;code &gt;pg_stat_statements_reset&lt;/code&gt; call.&lt;/p&gt;
&lt;p&gt;This info can help you identify your write-heavy queries and optimize as necessary. Note that write-heavy
queries can also affect replication: If you see replication lag, you can use these new features to
understand better which statements are causing it.&lt;/p&gt;
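&lt;p&gt;Because the counters are cumulative, a frequently called statement can dominate the totals even when each call writes little. A variation on the query above, offered as a sketch, ranks by total bytes and adds a per-call average (the &lt;code &gt;wal_bytes_per_call&lt;/code&gt; alias is our own):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Top statements by WAL volume, with an average per call
SELECT query, calls, wal_bytes,
       round(wal_bytes / NULLIF(calls, 0)) AS wal_bytes_per_call
FROM   pg_stat_statements
ORDER  BY wal_bytes DESC
LIMIT  5;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;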
&lt;h3 id=&quot;better-statement-logging-in-postgres-13&quot; &gt;&lt;a href=&quot;#better-statement-logging-in-postgres-13&quot; aria-label=&quot;better statement logging in postgres 13 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Better Statement Logging in Postgres 13&lt;/h3&gt;
&lt;p&gt;Settings like &lt;code &gt;log_min_duration_statement&lt;/code&gt; are great to help you understand your slow queries, but how
slow is slow? Is the reporting query that runs overnight slow compared to the 5s query that runs
in the context of a web request? Is that 5s query, that runs once in a rarely-used endpoint, slow
compared to a 100ms query that runs twenty times to load your home page?&lt;/p&gt;
&lt;p&gt;Until now, &lt;code &gt;log_min_duration_statement&lt;/code&gt; was one blunt tool for all these situations, but Postgres 13 brings some
flexibility with sampling-based statement logging. You can set &lt;a href=&quot;https://www.postgresql.org/docs/13/runtime-config-logging.html#GUC-LOG-MIN-DURATION-SAMPLE&quot;&gt;&lt;code &gt;log_min_duration_sample&lt;/code&gt;&lt;/a&gt; to enable sampling, and then either set
&lt;code &gt;log_statement_sample_rate&lt;/code&gt; or &lt;code &gt;log_transaction_sample_rate&lt;/code&gt; to control sampling.&lt;/p&gt;
&lt;p&gt;Both of these settings work in a similar manner: they range from 0 to 1, and determine the chance that
a statement will be randomly selected for logging. The former applies to individual statements, the latter
determines logging for all statements in a transaction. If both &lt;code &gt;log_min_duration_statement&lt;/code&gt; and
&lt;code &gt;log_min_duration_sample&lt;/code&gt; are set, the former should be a higher threshold that logs everything,
and the latter can be a lower threshold that logs only occasionally.&lt;/p&gt;
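&lt;p&gt;As an illustration of that two-threshold setup, the following sketch logs every statement slower than 5 seconds and roughly one in ten statements slower than 100 milliseconds. The specific thresholds and rate are assumptions chosen for the example, and &lt;code &gt;ALTER SYSTEM&lt;/code&gt; requires superuser privileges.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Always log statements that take 5s or longer
ALTER SYSTEM SET log_min_duration_statement = &apos;5s&apos;;

-- Additionally sample statements that take 100ms or longer ...
ALTER SYSTEM SET log_min_duration_sample = &apos;100ms&apos;;
-- ... logging about 10% of them
ALTER SYSTEM SET log_statement_sample_rate = 0.1;

-- Apply the changes without a restart
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;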
&lt;p&gt;Another great statement logging improvement is being able to &lt;strong&gt;log parameters for failed statements&lt;/strong&gt;
with &lt;a href=&quot;https://www.postgresql.org/docs/13/runtime-config-logging.html#GUC-LOG-PARAMETER-MAX-LENGTH-ON-ERROR&quot;&gt;&lt;code &gt;log_parameter_max_length_on_error&lt;/code&gt;&lt;/a&gt;. Here&apos;s an example of setting this to &lt;code &gt;-1&lt;/code&gt; (unlimited)
and trying to run &lt;code &gt;SELECT pg_sleep($1)&lt;/code&gt; (with parameter &lt;code &gt;$1&lt;/code&gt; set to &lt;code &gt;3&lt;/code&gt;) on a connection with a
&lt;code &gt;statement_timeout&lt;/code&gt; of &lt;code &gt;1s&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;2020-09-17 12:23:03.161 PDT [321349] maciek@maciek ERROR:  canceling statement due to statement timeout
2020-09-17 12:23:03.161 PDT [321349] maciek@maciek CONTEXT:  extended query with parameters: $1 = &apos;3&apos;
2020-09-17 12:23:03.161 PDT [321349] maciek@maciek STATEMENT:  select pg_sleep($1)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The timeout case is especially useful: Since both the query text and the parameters are now available
in the logs, you could run &lt;code &gt;EXPLAIN&lt;/code&gt; on any failed query to figure out what query plan caused
it to hit the timeout (N.B.: you are not guaranteed to get the same plan that failed, but depending
on your workload, the odds are pretty good).&lt;/p&gt;
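&lt;p&gt;One way to do that from the logged text and parameters is to recreate the statement as a prepared statement and explain it. This is a sketch based on the log lines above; the statement name &lt;code &gt;timed_out_stmt&lt;/code&gt; is arbitrary.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Statement text and parameter value taken from the log lines above
PREPARE timed_out_stmt(float8) AS SELECT pg_sleep($1);

-- Plain EXPLAIN shows the plan without executing the statement
EXPLAIN EXECUTE timed_out_stmt(3);

DEALLOCATE timed_out_stmt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;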
&lt;h3 id=&quot;more-planning-information&quot; &gt;&lt;a href=&quot;#more-planning-information&quot; aria-label=&quot;more planning information permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;More Planning Information&lt;/h3&gt;
&lt;p&gt;The usual culprit in slow queries is the query execution itself, but with a complex schema and an
elaborate query, planning can take significant time as well. Postgres 13 introduces two
changes that make it easier to keep an eye on planning:&lt;/p&gt;
&lt;p&gt;First, the &lt;code &gt;BUFFERS&lt;/code&gt; option to &lt;code &gt;EXPLAIN&lt;/code&gt; now covers query planning as well as execution.
Postgres manages memory for your data and indexes using a &quot;buffer pool&quot;, and the &lt;code &gt;BUFFERS&lt;/code&gt; option can
show you which parts of your query are using that memory and how. The
&lt;a href=&quot;https://www.postgresql.org/docs/13/sql-explain.html&quot;&gt;EXPLAIN documentation&lt;/a&gt; has some more details. In
the output below, the second &lt;code &gt;Buffers&lt;/code&gt; line, under &lt;code &gt;Planning Time&lt;/code&gt;, is the new planning-time figure:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;ANALYZE&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; BUFFERS&lt;span &gt;)&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_class&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..16.86 rows=386 width=265) (actual time=0.014..0.120 rows=390 loops=1)
   Buffers: shared hit=13
 Planning Time: 1.021 ms
   Buffers: shared hit=118
 Execution Time: 0.316 ms
(5 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Second, &lt;code &gt;pg_stat_statements&lt;/code&gt; will keep track of time spent planning if you enable the
&lt;a href=&quot;https://www.postgresql.org/docs/13/pgstatstatements.html#id-1.11.7.38.8&quot;&gt;&lt;code &gt;pg_stat_statements.track_planning&lt;/code&gt;&lt;/a&gt;
setting:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; query&lt;span &gt;,&lt;/span&gt; plans&lt;span &gt;,&lt;/span&gt; total_plan_time&lt;span &gt;,&lt;/span&gt;
       min_plan_time&lt;span &gt;,&lt;/span&gt; max_plan_time&lt;span &gt;,&lt;/span&gt; mean_plan_time&lt;span &gt;,&lt;/span&gt; stddev_plan_time
&lt;span &gt;FROM&lt;/span&gt;   pg_stat_statements &lt;span &gt;WHERE&lt;/span&gt; queryid &lt;span &gt;=&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;7012080368802260371&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]----+----------------------------------------------------------------------
query            | SELECT query, plans, total_plan_time,                                +
                 |        min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time+
                 | FROM   pg_stat_statements WHERE queryid = $1
plans            | 1
total_plan_time  | 0.083102
min_plan_time    | 0.083102
max_plan_time    | 0.083102
mean_plan_time   | 0.083102
stddev_plan_time | 0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is turned off by default due to performance overhead for certain workloads, but if you suspect planning time is
an issue, it&apos;s definitely worth checking out. For more details on the performance regression, see
&lt;a href=&quot;https://www.postgresql.org/message-id/flat/2895b53b033c47ccb22972b589050dd9%40EX13D05UWC001.ant.amazon.com&quot;&gt;this mailing list discussion&lt;/a&gt;; this is expected to be resolved in the future and the default may change.&lt;/p&gt;
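&lt;p&gt;If you do want to try it, a minimal sketch looks like the following. It assumes &lt;code &gt;pg_stat_statements&lt;/code&gt; is already loaded and that you are connected as a superuser.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Requires superuser; pg_stat_statements must already be loaded
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

-- Statements with the most cumulative planning time
SELECT query, plans, total_plan_time, mean_plan_time, mean_exec_time
FROM   pg_stat_statements
ORDER  BY total_plan_time DESC
LIMIT  5;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;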
&lt;h2 id=&quot;usability&quot; &gt;&lt;a href=&quot;#usability&quot; aria-label=&quot;usability permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Usability&lt;/h2&gt;
&lt;p&gt;Postgres 13 usability improvements include better documentation, better built-in UUID support, and some handy
&lt;code &gt;psql&lt;/code&gt; enhancements.&lt;/p&gt;
&lt;h3 id=&quot;glossary&quot; &gt;&lt;a href=&quot;#glossary&quot; aria-label=&quot;glossary permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Glossary&lt;/h3&gt;
&lt;p&gt;TOAST? Tuple? Postmaster?&lt;/p&gt;
&lt;p&gt;Any complex system will develop its own jargon, and Postgres is no exception. Some of it comes from the
database field in general, some of it is Postgres-specific. Having dedicated language to talk precisely
about specific technical concepts is very useful, but it can be confusing for newcomers.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Tuple&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;A collection of attributes in a fixed order. That order may be defined by the table (or other relation) where the tuple is contained, in which case the tuple is often called a row. It may also be defined by the structure of a result set, in which case it is sometimes called a record.&lt;/em&gt;
- &lt;a href=&quot;https://www.postgresql.org/docs/13/glossary.html&quot;&gt;PostgreSQL Glossary&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You are likely familiar with the terms above, but if you ever run across something you are unclear on,
those and many others are now documented &lt;a href=&quot;https://www.postgresql.org/docs/13/glossary.html&quot;&gt;in a new glossary&lt;/a&gt;.
And now that there&apos;s an established place to do so, we can look forward to other technical terms being added
here in the future.&lt;/p&gt;
&lt;h3 id=&quot;better-uuid-support&quot; &gt;&lt;a href=&quot;#better-uuid-support&quot; aria-label=&quot;better uuid support permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Better UUID Support&lt;/h3&gt;
&lt;p&gt;If you use UUIDs in your system (and you should consider it—they&apos;re pretty handy), you&apos;re probably
pretty familiar with the &lt;code &gt;uuid-ossp&lt;/code&gt; extension. The base &lt;code &gt;uuid&lt;/code&gt; type is built in, but by default,
there&apos;s no simple mechanism to automatically generate new ones. The &lt;code &gt;uuid-ossp&lt;/code&gt; extension ships with Postgres,
but must be enabled explicitly to create UUID-generation functions like the common &lt;code &gt;uuid_generate_v4&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Postgres 13 ships with a &lt;code &gt;gen_random_uuid&lt;/code&gt; function that is equivalent to &lt;code &gt;uuid_generate_v4&lt;/code&gt;, but available
by default. If you were only using &lt;code &gt;uuid-ossp&lt;/code&gt; for that function, you no longer need the extension:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;=&gt; \dx
     List of installed extensions
 Name | Version | Schema | Description
------+---------+--------+-------------
(0 rows)

=&gt; SELECT gen_random_uuid();
           gen_random_uuid
--------------------------------------
 07b45dae-e92e-4f91-8661-5fc0ef947d03
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
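&lt;p&gt;The most common use is probably as a column default. A short sketch, with a hypothetical &lt;code &gt;api_tokens&lt;/code&gt; table:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical table; the id is generated without any extension
CREATE TABLE api_tokens (
  id    uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  label text NOT NULL
);

-- The generated UUID is returned to the caller
INSERT INTO api_tokens (label) VALUES (&apos;example&apos;) RETURNING id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;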
&lt;h3 id=&quot;psql-improvements&quot; &gt;&lt;a href=&quot;#psql-improvements&quot; aria-label=&quot;psql improvements permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;psql Improvements&lt;/h3&gt;
&lt;p&gt;There are a number of small &lt;code &gt;psql&lt;/code&gt; improvements in Postgres 13. My favorite is that &lt;code &gt;\e&lt;/code&gt;, the command
to invoke your &lt;code &gt;$EDITOR&lt;/code&gt; on the current query buffer, will now display the query text when you save
and exit (unless you directly submit it by ending with a semicolon or &lt;code &gt;\g&lt;/code&gt;). Previously, the query
text was saved, but hidden. Compare opening your editor and saving &lt;code &gt;SELECT 1&lt;/code&gt; in &lt;code &gt;psql&lt;/code&gt; 11:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;maciek=# \e
maciek-# ;
 ?column? 
----------
        1
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;versus &lt;code &gt;psql&lt;/code&gt; 13:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;maciek=# \e
maciek=# select 1
maciek-# ;
 ?column? 
----------
        1
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It&apos;s now clear what query text will be submitted when you complete your query.&lt;/p&gt;
&lt;p&gt;Postgres 13 also includes additional ways to customize your &lt;code &gt;psql&lt;/code&gt; prompt. You can do so, as always, with
&lt;a href=&quot;https://www.postgresql.org/docs/13/app-psql.html#APP-PSQL-PROMPTING&quot;&gt;&lt;code &gt;\set&lt;/code&gt;&lt;/a&gt; (typically in your &lt;code &gt;.psqlrc&lt;/code&gt;),
but there are a couple of new substitutions available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code &gt;%x&lt;/code&gt; will display the status of the current transaction: an empty string for no
transaction, &lt;code &gt;*&lt;/code&gt; when in an open transaction, &lt;code &gt;!&lt;/code&gt; when in a failed transaction, or &lt;code &gt;?&lt;/code&gt; when the
transaction state is unknown (typically when there is no connection to the server)&lt;/li&gt;
&lt;li&gt;&lt;code &gt;%w&lt;/code&gt; will pad &lt;code &gt;PROMPT2&lt;/code&gt; (used when more input is expected) to be the same width as &lt;code &gt;PROMPT1&lt;/code&gt; to keep things
nicely aligned&lt;/li&gt;
&lt;/ul&gt;
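&lt;p&gt;For example, the following lines could go in your &lt;code &gt;.psqlrc&lt;/code&gt;. The particular layout is just one possibility: it shows the user, database, and transaction status on the first line, and leaves continuation lines indented to match.&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;\set PROMPT1 &apos;%n@%/%R%x%# &apos;
\set PROMPT2 &apos;%w&apos;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;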
&lt;p&gt;There are &lt;a href=&quot;https://www.postgresql.org/docs/13/release-13.html#id-1.11.6.5.5.10.2&quot;&gt;some other small improvements&lt;/a&gt;
as well. And these are all client-side changes, so they will also work if you are using a new &lt;code &gt;psql&lt;/code&gt; with an older
server!&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;These are just a few of the many small improvements that come with Postgres 13. There are many others,
like partial TOAST decompression, trusted extensions (so you can enable them without being
superuser), PL/pgSQL performance improvements, and more. You can check out the &lt;a href=&quot;https://www.postgresql.org/docs/13/release-13.html&quot;&gt;full release notes&lt;/a&gt; on the Postgres web site.&lt;/p&gt;
&lt;p&gt;We&apos;re very excited for this release. We already support monitoring Postgres 13 in &lt;a href=&quot;https://pganalyze.com&quot;&gt;pganalyze&lt;/a&gt;,
and are working on incorporating the new monitoring features directly into the product to give you better
insights into your database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://twitter.com/intent/tweet?text=%22Lessons%20Learned%20from%20Running%20%23Postgres13%20-%20via%20%40pganalyze%3A%20Parallel%20VACUUM,%20improved%20WAL%20Usage%20Stats,%20Extended%20Statistics%20Improvements,%20and%20more%20--%3E%20https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Using Postgres Row-Level Security in Python and Django]]></title><description><![CDATA[Postgres introduced row-level security in 2016 to give database administrators a way to limit the rows a user can access, adding an extra layer of data protection. What's nice about RLS is that if a user tries to select or alter a row they don't have access to, their query will return 0 rows, rather than throwing a permissions error. This way, a user can use , and they will only receive the rows they have access to with no knowledge of rows they don't. Most examples of RLS limit row access by…]]></description><link>https://pganalyze.com/blog/postgres-row-level-security-django-python</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-row-level-security-django-python</guid><dc:creator><![CDATA[Josh Alletto]]></dc:creator><pubDate>Thu, 13 Aug 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Postgres introduced &lt;a href=&quot;https://www.postgresql.org/docs/9.5/ddl-rowsecurity.html&quot;&gt;row-level security&lt;/a&gt; in 2016 to give database administrators a way to limit the rows a user can access, adding an extra layer of data protection. What&apos;s nice about RLS is that if a user tries to select or alter a row they don&apos;t have access to, their query will return 0 rows, rather than throwing a permissions error. 
This way, a user can use &lt;code &gt;select * from table_name&lt;/code&gt;, and they will only receive the rows they have access to with no knowledge of rows they don&apos;t.&lt;/p&gt;
&lt;p&gt;Most examples of RLS limit row access by database user. This can be a powerful feature. In this article, we will have a look at how you can make this happen for your Django app. The problem most people run into when trying to implement row level security is that most web applications, including Django applications, connect to the database with a single user, which makes it hard to take advantage of row level security.&lt;/p&gt;
&lt;p&gt;One way to get around this is to create a database user for each application user. We’ll start with just the database layer. We’ll build out our tables and create a couple of users, then write our first row level security policy to limit which rows those users can access. Once we have an understanding of how RLS works in Postgres, we’ll expand our project out into Django and see how we can handle working with policies and multiple database users in a web application.&lt;/p&gt;
&lt;p&gt;By the way, if you are interested in using Row-Level Security in Ruby on Rails, we have a dedicated article for that here: &lt;a href=&quot;https://pganalyze.com/blog/postgres-row-level-security-ruby-rails&quot;&gt;Using Postgres Row-Level Security in Ruby on Rails&lt;/a&gt;.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-to-use-rls-at-the-database-level&quot;&gt;How to use RLS at the database level&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#how-to-use-postgres-row-level-security-in-django&quot;&gt;How to Use Postgres Row-Level Security in Django&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#models&quot;&gt;Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#django-signals-creating-our-database-user&quot;&gt;Django Signals: Creating Our Database User&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#django-middleware-setting-current-user&quot;&gt;Django Middleware: Setting Current User&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/f45266cf24990e288a48e78e6d593ef5/29114/postgres-row-level-security-django.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Postgres Row-Level Security in Python and Django&quot;
        title=&quot;Postgres Row-Level Security in Python and Django&quot;
        src=&quot;https://pganalyze.com/static/f45266cf24990e288a48e78e6d593ef5/1d69c/postgres-row-level-security-django.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-to-use-rls-at-the-database-level&quot; &gt;&lt;a href=&quot;#how-to-use-rls-at-the-database-level&quot; aria-label=&quot;how to use rls at the database level permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How to use RLS at the database level&lt;/h2&gt;
&lt;p&gt;Before we get to the Django side of things, let&apos;s take a look at how RLS works in Postgres. We&apos;ll keep it simple and say we are building an app to help our salespeople keep track of their clients, and we want to make sure no salesperson can access the clients of another salesperson. (These are very competitive, cutthroat salespeople.)&lt;/p&gt;
&lt;p&gt;First, let&apos;s set up our tables and populate them with some data:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; salespeople &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;serial&lt;/span&gt; &lt;span &gt;primary&lt;/span&gt; &lt;span &gt;key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; name &lt;span &gt;text&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; clients &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;serial&lt;/span&gt; &lt;span &gt;primary&lt;/span&gt; &lt;span &gt;key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; name &lt;span &gt;text&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; salesperson_id &lt;span &gt;integer&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; salespeople &lt;span &gt;(&lt;/span&gt;name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;values&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Picard&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; salespeople &lt;span &gt;(&lt;/span&gt;name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;values&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Crusher&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; clients &lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; salesperson_id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;values&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;client1&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; clients &lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; salesperson_id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;values&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;client2&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;INSERT&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; clients &lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; salesperson_id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;values&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;client3&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, we have two salespeople. &lt;code &gt;Picard&lt;/code&gt; has one client, and &lt;code &gt;Crusher&lt;/code&gt; has two clients.&lt;/p&gt;
&lt;p&gt;Next, we are going to need some database users, one for each salesperson. Because two salespeople might share the same name, we are going to use the &lt;code &gt;id&lt;/code&gt; to create Postgres users. We are also going to create a role called &lt;code &gt;salespeople&lt;/code&gt;. This will be the role we grant permissions on, and all of our salespeople can inherit from it.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; ROLE &lt;span &gt;&quot;1&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; ROLE &lt;span &gt;&quot;2&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; ROLE salespeople&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;GRANT&lt;/span&gt; &lt;span &gt;select&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;insert&lt;/span&gt; &lt;span &gt;ON&lt;/span&gt; clients &lt;span &gt;TO&lt;/span&gt; salespeople&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;GRANT&lt;/span&gt; salespeople &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;&quot;1&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;GRANT&lt;/span&gt; salespeople &lt;span &gt;TO&lt;/span&gt; &lt;span &gt;&quot;2&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This setup will come in handy in the next section when we have to deal with Django&apos;s tables in addition to the ones we create for our models.&lt;/p&gt;
&lt;p&gt;Now we are ready to set up RLS on our &lt;code &gt;clients&lt;/code&gt; table. Our policy will limit access to the Postgres &lt;code &gt;current_user&lt;/code&gt; so that they can only view rows where &lt;code &gt;current_user&lt;/code&gt; matches &lt;code &gt;salesperson_id&lt;/code&gt;.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; clients &lt;span &gt;ENABLE&lt;/span&gt; &lt;span &gt;ROW&lt;/span&gt; &lt;span &gt;LEVEL&lt;/span&gt; SECURITY&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;CREATE&lt;/span&gt; POLICY salesperson_clients &lt;span &gt;ON&lt;/span&gt; clients &lt;span &gt;USING&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;salesperson_id::&lt;span &gt;text&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;current_user&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we create the policy, we give it a name, &lt;code &gt;salesperson_clients&lt;/code&gt;, and enter the table we want to set the policy on, &lt;code &gt;clients&lt;/code&gt;. Next, we define the policy. In this case, it is very simple: the &lt;code &gt;salesperson_id&lt;/code&gt; on the table must be equal to the value of &lt;code &gt;current_user&lt;/code&gt;. We have to cast the &lt;code &gt;salesperson_id&lt;/code&gt; from an integer to text because &lt;code &gt;current_user&lt;/code&gt; returns the role name, which is a string and not an integer. (Role names are identifiers, which is also why the numeric names above had to be double-quoted when we created the roles.)&lt;/p&gt;
&lt;p&gt;Right now, we are logged in as the &lt;code &gt;postgres&lt;/code&gt; user.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;session_user&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;current_user&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; session_user  | current_user  
---------------+---------------
 postgres      | postgres
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If we query our &lt;code &gt;clients&lt;/code&gt; table, we will be able to see all the rows because &lt;strong&gt;RLS policies do not apply to superusers.&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; clients&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; id |  name   | salesperson_id 
----+---------+----------------
  1 | client1 |              1
  2 | client2 |              2
  3 | client3 |              2
(3 rows)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But if we change the current user, we only get the rows that belong to that user.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; ROLE &lt;span &gt;&quot;1&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;session_user&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;current_user&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; session_user | current_user 
--------------+--------------
 postgres     | 1
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; clients&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt; id |  name   | salesperson_id 
----+---------+----------------
  1 | client1 |              1
(1 row)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
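&lt;p&gt;To make the effect of the policy concrete, here is a minimal plain-Python model of the filter that Postgres applies for us. It is only an illustration under stated assumptions: the &lt;code &gt;visible_rows&lt;/code&gt; helper and the in-memory &lt;code &gt;CLIENTS&lt;/code&gt; list are hypothetical names, and the real filtering happens inside the database, not in application code.&lt;/p&gt;

```python
# Toy model of the salesperson_clients policy; Postgres does the real filtering.
# The rows mirror the clients table populated earlier in the article.
CLIENTS = [
    {"id": 1, "name": "client1", "salesperson_id": 1},
    {"id": 2, "name": "client2", "salesperson_id": 2},
    {"id": 3, "name": "client3", "salesperson_id": 2},
]


def visible_rows(rows, current_user, is_superuser=False):
    """Return the rows a role could see under the policy (hypothetical helper)."""
    if is_superuser:
        # Superusers bypass row-level security, so nothing is filtered.
        return list(rows)
    # Mirrors USING (salesperson_id::text = current_user).
    return [row for row in rows if str(row["salesperson_id"]) == current_user]


if __name__ == "__main__":
    print(visible_rows(CLIENTS, "1"))
    print(len(visible_rows(CLIENTS, "postgres", is_superuser=True)))
```

&lt;p&gt;A role whose name matches no &lt;code &gt;salesperson_id&lt;/code&gt; simply gets an empty list back, which corresponds to the &quot;0 rows instead of a permissions error&quot; behavior described at the start of the article.&lt;/p&gt;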
&lt;h2 id=&quot;how-to-use-postgres-row-level-security-in-django&quot; &gt;&lt;a href=&quot;#how-to-use-postgres-row-level-security-in-django&quot; aria-label=&quot;how to use postgres row level security in django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How to Use Postgres Row-Level Security in Django&lt;/h2&gt;
&lt;p&gt;Now, how can we translate this to a Django application?&lt;/p&gt;
&lt;p&gt;First, we will need to create a database user for each app user we create. One way to accomplish this would be to override the &lt;code &gt;save&lt;/code&gt; method on the Salesperson model, but this is a great opportunity to take advantage of &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/signals/&quot;&gt;Django signals&lt;/a&gt;, so we&apos;ll create a signal that creates the database user after a new salesperson is saved.&lt;/p&gt;
&lt;p&gt;Next, we&apos;ll have to figure out how to switch to the correct user when a salesperson logs in. For this, we can use a middleware that gets the &lt;code &gt;salesperson_id&lt;/code&gt; and sets the role in the database.&lt;/p&gt;
&lt;h3 id=&quot;models&quot; &gt;&lt;a href=&quot;#models&quot; aria-label=&quot;models permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Models&lt;/h3&gt;
&lt;p&gt;Our models reflect exactly what we set up in our earlier database example. Here I chose to make Salesperson a proxy of Django&apos;s built-in &lt;code &gt;User&lt;/code&gt; model, but this is not required.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;auth&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; User

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Salesperson&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;User&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;class&lt;/span&gt; &lt;span &gt;Meta&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        proxy &lt;span &gt;=&lt;/span&gt; &lt;span &gt;True&lt;/span&gt;
    
&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Client&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;50&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    salesperson &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;ForeignKey&lt;span &gt;(&lt;/span&gt;Salesperson&lt;span &gt;,&lt;/span&gt; on_delete&lt;span &gt;=&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;CASCADE&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;django-signals-creating-our-database-user&quot; &gt;&lt;a href=&quot;#django-signals-creating-our-database-user&quot; aria-label=&quot;django signals creating our database user permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Django Signals: Creating Our Database User&lt;/h3&gt;
&lt;p&gt;We want to create a new database user every time a new salesperson record is created. We can use Django signals to execute some code after a new record is saved. If you&apos;re not familiar with signals, the Django docs on this topic are easy to understand. If this piqued your interest, &lt;a href=&quot;https://simpleisbetterthancomplex.com/tutorial/2016/07/28/how-to-create-django-signals.html&quot;&gt;this article&lt;/a&gt; goes into more detail.&lt;/p&gt;
&lt;p&gt;Here is the code for the signal itself, but you&apos;ll have to reference the above article to get it registered in your app:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; &lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Salesperson
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db&lt;span &gt;.&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;signals &lt;span &gt;import&lt;/span&gt; post_save
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection

&lt;span &gt;def&lt;/span&gt; &lt;span &gt;create_db_user&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;sender&lt;span &gt;,&lt;/span&gt; instance&lt;span &gt;,&lt;/span&gt; created&lt;span &gt;,&lt;/span&gt; &lt;span &gt;**&lt;/span&gt;kwargs&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;if&lt;/span&gt; created&lt;span &gt;:&lt;/span&gt;
        user_id &lt;span &gt;=&lt;/span&gt; instance&lt;span &gt;.&lt;/span&gt;&lt;span &gt;id&lt;/span&gt;
        &lt;span &gt;with&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; cursor&lt;span &gt;:&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;f&apos;CREATE ROLE &quot;&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt;user_id&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&quot;&apos;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;f&apos;GRANT salespeople TO &quot;&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt;user_id&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&quot;&apos;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

post_save&lt;span &gt;.&lt;/span&gt;connect&lt;span &gt;(&lt;/span&gt;create_db_user&lt;span &gt;,&lt;/span&gt; sender&lt;span &gt;=&lt;/span&gt;Salesperson&lt;span &gt;)&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Receivers of the &lt;code &gt;post_save&lt;/code&gt; signal are passed a keyword argument &lt;code &gt;created&lt;/code&gt;, a boolean that is true only when a new record was created. Checking it avoids running the code every time we update the record and ensures it will only run when we create a new salesperson. From there, we can get the user id from the instance and use &lt;code &gt;django.db.connection&lt;/code&gt; to run our SQL to create the role and grant permissions.&lt;/p&gt;
&lt;p&gt;It&apos;s very important to note that if you want to use Django&apos;s built-in &lt;code &gt;User&lt;/code&gt; model and the authentication that comes with it, you&apos;ll need to grant the &lt;code &gt;salespeople&lt;/code&gt; role permissions on the &lt;code &gt;django_admin_log&lt;/code&gt; and &lt;code &gt;auth_user&lt;/code&gt; tables. That&apos;s why it&apos;s so helpful to have this parent role that all individual users inherit from.&lt;/p&gt;
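&lt;p&gt;Because a role name is an identifier, it cannot be passed as an ordinary bound query parameter, which is why the signal interpolates the id into the SQL string. The sketch below shows one way to keep that interpolation safe by refusing anything that is not an integer. The &lt;code &gt;role_statements&lt;/code&gt; helper and its &lt;code &gt;parent_role&lt;/code&gt; parameter are hypothetical names introduced for illustration; they are not part of Django.&lt;/p&gt;

```python
def role_statements(user_id, parent_role="salespeople"):
    """Build the SQL the signal runs for one new salesperson (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise ValueError("user_id must be an integer")
    # Only allow a plain identifier for the parent role as well.
    if not parent_role.isidentifier():
        raise ValueError("parent_role must be a plain identifier")
    return [
        f'CREATE ROLE "{user_id}"',
        f'GRANT {parent_role} TO "{user_id}"',
    ]


if __name__ == "__main__":
    for statement in role_statements(3):
        print(statement)
```

&lt;p&gt;Inside the signal receiver, each returned string could then be passed to &lt;code &gt;cursor.execute&lt;/code&gt; in turn.&lt;/p&gt;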
&lt;h3 id=&quot;django-middleware-setting-current-user&quot; &gt;&lt;a href=&quot;#django-middleware-setting-current-user&quot; aria-label=&quot;django middleware setting current user permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Django Middleware: Setting Current User&lt;/h3&gt;
&lt;p&gt;Now, we can write a &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/topics/http/middleware/&quot;&gt;middleware&lt;/a&gt; to switch the database user to the current application user making the request.&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; connection

&lt;span &gt;class&lt;/span&gt; &lt;span &gt;RlsMiddleware&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;object&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__init__&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; get_response&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        self&lt;span &gt;.&lt;/span&gt;get_response &lt;span &gt;=&lt;/span&gt; get_response
        
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__call__&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;self&lt;span &gt;,&lt;/span&gt; request&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        user_id &lt;span &gt;=&lt;/span&gt; request&lt;span &gt;.&lt;/span&gt;user&lt;span &gt;.&lt;/span&gt;&lt;span &gt;id&lt;/span&gt;
        &lt;span &gt;with&lt;/span&gt; connection&lt;span &gt;.&lt;/span&gt;cursor&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; cursor&lt;span &gt;:&lt;/span&gt;
            cursor&lt;span &gt;.&lt;/span&gt;execute&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;f&apos;SET ROLE &quot;&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt;user_id&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&quot; &apos;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

        response &lt;span &gt;=&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;get_response&lt;span &gt;(&lt;/span&gt;request&lt;span &gt;)&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; response&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We get the user id from the request object. After that, the code looks pretty similar to our signal. We use the Django &lt;code &gt;db&lt;/code&gt; connection again to set the role to the corresponding database user, which should match the application user&apos;s id. Don&apos;t forget to &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/topics/http/middleware/#activating-middleware&quot;&gt;register your middleware&lt;/a&gt; in &lt;code &gt;settings.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now we can use all of Django&apos;s built-in query methods while maintaining row-level security in Postgres. What is particularly cool is that, with the role set, all we need to do to get all of a salesperson&apos;s clients is call &lt;code &gt;Client.objects.all()&lt;/code&gt;, and we can be sure that only the clients related to the salesperson will be returned. If a salesperson tries to query for a client that doesn&apos;t belong to them, they&apos;ll get zero results.&lt;/p&gt;
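&lt;p&gt;Two details are worth handling before relying on a middleware like this. An anonymous request has no user id, so there is no role to switch to, and a database connection that is reused across requests keeps whatever role was set last unless it is reset. The following is a minimal sketch of a variant that covers both cases, not a drop-in replacement. The &lt;code &gt;connection&lt;/code&gt; constructor argument and the &lt;code &gt;FakeConnection&lt;/code&gt; and &lt;code &gt;demo&lt;/code&gt; names are assumptions made so the example runs without Django or Postgres; in a real project you would use &lt;code &gt;django.db.connection&lt;/code&gt; as in the middleware above.&lt;/p&gt;

```python
from contextlib import contextmanager
from types import SimpleNamespace


class RlsMiddlewareSketch:
    """Variant of RlsMiddleware that skips anonymous users and resets the role."""

    def __init__(self, get_response, connection):
        self.get_response = get_response
        # Assumption: injected for the sketch; in Django use django.db.connection.
        self.connection = connection

    def __call__(self, request):
        user_id = getattr(request.user, "id", None)
        if user_id is None:
            # Anonymous request: leave the connection's role untouched.
            return self.get_response(request)
        with self.connection.cursor() as cursor:
            cursor.execute(f'SET ROLE "{int(user_id)}"')
        try:
            return self.get_response(request)
        finally:
            # Reused connections would otherwise keep the previous user's role.
            with self.connection.cursor() as cursor:
                cursor.execute("RESET ROLE")


class FakeConnection:
    """Stand-in that records SQL instead of talking to Postgres."""

    def __init__(self):
        self.statements = []

    @contextmanager
    def cursor(self):
        yield self

    def execute(self, sql):
        self.statements.append(sql)


def demo():
    """Run one logged-in and one anonymous request and return the SQL issued."""
    conn = FakeConnection()
    middleware = RlsMiddlewareSketch(lambda request: "ok", conn)
    middleware(SimpleNamespace(user=SimpleNamespace(id=1)))
    middleware(SimpleNamespace(user=SimpleNamespace(id=None)))
    return conn.statements


if __name__ == "__main__":
    print(demo())
```

&lt;p&gt;The &lt;code &gt;try&lt;/code&gt;/&lt;code &gt;finally&lt;/code&gt; block means the role is reset even when the view raises an exception.&lt;/p&gt;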
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we were able to create a simple but powerful row-level security policy and, with the help of Django middleware and Django signals, implement the policy at the application level. We saw how to create a database user each time we created a new application user, and looked at setting the database role to the correct user after login, ensuring each application user only had access to the rows that belonged to them.&lt;/p&gt;
&lt;p&gt;There are a few caveats here. For one, using the ids &lt;code &gt;1, 2, 3&lt;/code&gt; is probably not a good idea in production. You&apos;d want to set up some kind of UUID or some other identifier. Also, creating a new database user for every application user becomes hard to scale at a certain point. Row level security can be a useful tool for limiting access at the database level, and we just scratched the surface of what&apos;s possible.&lt;/p&gt;
&lt;p&gt;Still, you should be sure RLS is the right solution for your application before trying to implement it. In particular, the performance implications of row-level security, and how the Postgres planner treats it in query plans, should not be overlooked. This has been &lt;a href=&quot;https://medium.com/@cazzer/designing-the-most-performant-row-level-security-strategy-in-postgres-a06084f31945&quot;&gt;significantly improved in Postgres 10&lt;/a&gt;, but it&apos;s still essential to &lt;a href=&quot;https://pganalyze.com/postgres-explain&quot;&gt;monitor your Postgres query plans&lt;/a&gt; when using RLS.&lt;/p&gt;
&lt;p&gt;In many cases, RLS is not needed, and you’ll be able to secure your data using the &lt;a href=&quot;https://coderbook.com/@marcus/how-to-restrict-access-with-django-permissions/&quot;&gt;security measures&lt;/a&gt; already built into Django.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article you might want to &lt;a href=&quot;https://ctt.ac/K5png&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Josh is a former educator turned developer with a proven ability to learn quickly and adapt to different roles. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He&apos;s always looking for a new challenge and a dedicated team to collaborate with.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres JSONB Fields in Django]]></title><description><![CDATA[I remember the first time I built user preferences into an app. At first, users just needed to be able to opt in or out of our weekly emails. "No big deal," I thought, "I'll just add a new field on the Users table." For a while, that was fine. A few weeks later, my boss asked me if we could let users opt into push notifications. Fine, that's just one more column on the database. Can't hurt, right? You probably see where this is going. Within months, my user table had 40 columns, and while…]]></description><link>https://pganalyze.com/blog/postgres-jsonb-django-python</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres-jsonb-django-python</guid><dc:creator><![CDATA[Karl Hughes]]></dc:creator><pubDate>Thu, 30 Jul 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;I remember the first time I built user preferences into an app. At first, users just needed to be able to opt in or out of our weekly emails. &quot;No big deal,&quot; I thought, &quot;I&apos;ll just add a new field on the Users table.&quot; For a while, that was fine. A few weeks later, my boss asked me if we could let users opt into push notifications. Fine, that&apos;s just one more column on the database. Can&apos;t hurt, right?&lt;/p&gt;
&lt;p&gt;You probably see where this is going.&lt;/p&gt;
&lt;p&gt;Within months, my user table had 40 columns, and while &lt;a href=&quot;https://nerderati.com/2017/01/03/postgresql-tables-can-have-at-most-1600-columns/&quot;&gt;Postgres can handle it&lt;/a&gt;, it gets pretty tricky for new devs to keep up with all of them. You can imagine it looked pretty similar to this settings screen of Quora.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/76b4cc659c95583e08ca8db052dd9d05/5d72a/quora_notification_preferences.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Quora&amp;#39;s notification preferences contain dozens of options&quot;
        title=&quot;Quora&amp;#39;s notification preferences contain dozens of options&quot;
        src=&quot;https://pganalyze.com/static/76b4cc659c95583e08ca8db052dd9d05/1d69c/quora_notification_preferences.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;Fortunately, there is rich support in Postgres for &lt;a href=&quot;https://www.postgresql.org/docs/current/datatype-json.html&quot;&gt;JSON fields&lt;/a&gt;, which can be very handy in situations like mine. Both JSON data types (&lt;code &gt;json&lt;/code&gt; and &lt;code &gt;jsonb&lt;/code&gt;) allow you to store entire objects or lists directly in your database. This means that you can store any number of user preferences in one column.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#why-two-types-of-postgres-json-fields&quot;&gt;Why two types of Postgres JSON fields?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#querying-jsonb-data-in-postgres&quot;&gt;Querying JSONB data in Postgres&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#django-support-for-jsonb-fields&quot;&gt;Django support for JSONB fields&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#creating-jsonb-fields-using-migrations&quot;&gt;Creating JSONB fields using migrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#adding-data-to-jsonb-fields&quot;&gt;Adding data to JSONB fields&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#querying-jsonb-fields&quot;&gt;Querying JSONB fields&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#limitations-of-jsonb-fields-with-postgres-and-django&quot;&gt;Limitations of JSONB fields with Postgres and Django&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;why-two-types-of-postgres-json-fields&quot; &gt;&lt;a href=&quot;#why-two-types-of-postgres-json-fields&quot; aria-label=&quot;why two types of postgres json fields permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why two types of Postgres JSON fields?&lt;/h2&gt;
&lt;p&gt;JSON support in Postgres gives you the flexibility of a document store database like &lt;a href=&quot;https://www.mongodb.com/&quot;&gt;Mongo&lt;/a&gt; with the speed and structure of a relational database. JSON support is powerful, but because it comes in two types (&lt;code &gt;json&lt;/code&gt; and &lt;code &gt;jsonb&lt;/code&gt;), it&apos;s helpful to understand which is the right choice for your application.
The &lt;code &gt;json&lt;/code&gt; data type was &lt;a href=&quot;https://datavirtuality.com/blog-json-in-postgresql/&quot;&gt;added in Postgres 9.2&lt;/a&gt; and enhanced in 9.3. This new data type allowed you to store &lt;a href=&quot;https://www.json.org/json-en.html&quot;&gt;JSON&lt;/a&gt; directly in your database and even query it. The problem was that &lt;code &gt;json&lt;/code&gt; data was stored as validated &lt;code &gt;text&lt;/code&gt;, an exact copy of the input that has to be reparsed on every query, so it was slow to query.&lt;/p&gt;
&lt;p&gt;Postgres &lt;a href=&quot;https://www.compose.com/articles/faster-operations-with-the-jsonb-data-type-in-postgresql/&quot;&gt;introduced &lt;code &gt;jsonb&lt;/code&gt; in 9.4&lt;/a&gt; to combat this issue. Unlike &lt;code &gt;json&lt;/code&gt; fields, &lt;code &gt;jsonb&lt;/code&gt; fields are stored in a &lt;strong&gt;binary structure&lt;/strong&gt; rather than text strings. While this means that writes are slightly slower, querying from &lt;code &gt;jsonb&lt;/code&gt; fields is significantly faster. It also allows you to index &lt;code &gt;jsonb&lt;/code&gt; fields. This makes &lt;code &gt;jsonb&lt;/code&gt; the preferred format for most JSON data stored in Postgres, and the typical choice for Django applications.&lt;/p&gt;
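&lt;p&gt;One practical consequence of the binary format is that &lt;code &gt;jsonb&lt;/code&gt; does not keep the original text: insignificant whitespace and key order are not preserved, and when a key is repeated only the last value is kept. The sketch below is only an analogy, with plain Python standing in for Postgres and a made-up input string, but parsing and re-serializing with the standard &lt;code &gt;json&lt;/code&gt; module shows a similar kind of normalization:&lt;/p&gt;

```python
import json

# Made-up input: irregular spacing and a repeated "sms" key.
raw = '{"sms":   false,  "daily_email": true, "sms": true}'

# Parsing keeps only the last value for a repeated key, which mirrors jsonb.
parsed = json.loads(raw)

# Re-serializing yields a normalized form; the original spacing is gone.
# sort_keys is just a stand-in here: jsonb uses its own internal key order.
normalized = json.dumps(parsed, sort_keys=True)

print(normalized)
```

&lt;p&gt;If you need the stored document to come back byte for byte as it was sent, that is one of the few cases where the plain &lt;code &gt;json&lt;/code&gt; type may still be the better fit.&lt;/p&gt;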
&lt;h2 id=&quot;querying-jsonb-data-in-postgres&quot; &gt;&lt;a href=&quot;#querying-jsonb-data-in-postgres&quot; aria-label=&quot;querying jsonb data in postgres permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Querying JSONB data in Postgres&lt;/h2&gt;
&lt;p&gt;The query syntax for accessing JSON in Postgres is not typical SQL. You have to use the specific &lt;a href=&quot;https://www.postgresql.org/docs/current/functions-json.html&quot;&gt;JSON operators and functions&lt;/a&gt;, so queries on JSON data look different from other Postgres queries.&lt;/p&gt;
&lt;p&gt;For example, if you stored the following data in a Postgres table called &lt;code &gt;profiles&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;preferences&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code &gt;Mike&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code &gt;{&quot;sms&quot;: false, &quot;daily_email&quot;: true, &quot;weekly_email&quot;: true}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code &gt;Lucy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code &gt;{&quot;sms&quot;: true, &quot;daily_email&quot;: false, &quot;weekly_email&quot;: false}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code &gt;Harriet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code &gt;{&quot;sms&quot;: true, &quot;daily_email&quot;: true, &quot;weekly_email&quot;: true}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;And you wanted to query all users who have opted into your &lt;code &gt;daily_email&lt;/code&gt;, you&apos;d write a query like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;select&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; profiles
&lt;span &gt;where&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;preferences&lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&apos;daily_email&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;boolean&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query returns the rows for &lt;code &gt;Mike&lt;/code&gt; and &lt;code &gt;Harriet&lt;/code&gt;.
I&apos;m pretty good with SQL, but using JSON operators always slows me down. Fortunately, Django offers support for JSONB fields, so you don&apos;t have to become an expert at querying JSON in Postgres.&lt;/p&gt;
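&lt;p&gt;If the operator syntax feels opaque, it can help to see what the filter does in ordinary code. The following is a rough Python equivalent of the query above, run against the three example rows held in memory; the &lt;code &gt;opted_into&lt;/code&gt; helper is a name invented for this illustration, and nothing here talks to Postgres:&lt;/p&gt;

```python
import json

# The three example rows, with preferences held as JSON text.
rows = [
    (1, "Mike", '{"sms": false, "daily_email": true, "weekly_email": true}'),
    (2, "Lucy", '{"sms": true, "daily_email": false, "weekly_email": false}'),
    (3, "Harriet", '{"sms": true, "daily_email": true, "weekly_email": true}'),
]


def opted_into(rows, key):
    """Return the names of rows whose preferences have `key` set to true."""
    names = []
    for _id, name, preferences in rows:
        prefs = json.loads(preferences)
        # .get() returns None for a missing key, much like ->> yields NULL.
        if prefs.get(key) is True:
            names.append(name)
    return names


daily_subscribers = opted_into(rows, "daily_email")
print(daily_subscribers)
```

&lt;p&gt;The difference, of course, is that Postgres evaluates the filter inside the database, so only the matching rows are sent to your application.&lt;/p&gt;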
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Download Free eBook: How To Get 3x Faster Postgres&quot;
        title=&quot;Download Free eBook: How To Get 3x Faster Postgres&quot;
        src=&quot;https://pganalyze.com/static/c15d0b3082bebd2680b86cc948555f76/acb04/ebook_promo_query_performance.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;django-support-for-jsonb-fields&quot; &gt;&lt;a href=&quot;#django-support-for-jsonb-fields&quot; aria-label=&quot;django support for jsonb fields permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Django support for JSONB fields&lt;/h2&gt;
&lt;p&gt;Since Django 1.9, the popular Python framework has supported &lt;code &gt;jsonb&lt;/code&gt; and &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/fields&quot;&gt;several other Postgres-specific fields&lt;/a&gt;. Native Django support means that creating &lt;code &gt;jsonb&lt;/code&gt; fields, using them in your models, inserting data into them, and querying from them are all possible with Django&apos;s ORM. Let&apos;s take a look at how you can get started using &lt;code &gt;jsonb&lt;/code&gt; with Django.&lt;/p&gt;
&lt;h3 id=&quot;creating-jsonb-fields-using-migrations&quot; &gt;&lt;a href=&quot;#creating-jsonb-fields-using-migrations&quot; aria-label=&quot;creating jsonb fields using migrations permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating JSONB fields using migrations&lt;/h3&gt;
&lt;p&gt;Django&apos;s Postgres module comes with several field classes that you can import and add to your models. If you want to use a JSON field, import the &lt;code &gt;JSONField&lt;/code&gt; class and use it for your model&apos;s property. In this example, we&apos;ll call the field &lt;code &gt;preferences&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;db &lt;span &gt;import&lt;/span&gt; models
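&lt;span &gt;# On Django 3.1+, use models.JSONField instead; this contrib import was removed in Django 4.0&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; django&lt;span &gt;.&lt;/span&gt;contrib&lt;span &gt;.&lt;/span&gt;postgres&lt;span &gt;.&lt;/span&gt;fields &lt;span &gt;import&lt;/span&gt; JSONField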
 
&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Profile&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;models&lt;span &gt;.&lt;/span&gt;Model&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
    name &lt;span &gt;=&lt;/span&gt; models&lt;span &gt;.&lt;/span&gt;CharField&lt;span &gt;(&lt;/span&gt;max_length&lt;span &gt;=&lt;/span&gt;&lt;span &gt;200&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    preferences &lt;span &gt;=&lt;/span&gt; JSONField&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
 
    &lt;span &gt;def&lt;/span&gt; &lt;span &gt;__str__&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;self&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;
        &lt;span &gt;return&lt;/span&gt; self&lt;span &gt;.&lt;/span&gt;name&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Django only supports the &lt;code &gt;jsonb&lt;/code&gt; column type, so when you run your migrations, Django will create a table definition like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;create&lt;/span&gt; &lt;span &gt;table&lt;/span&gt; app_profile
&lt;span &gt;(&lt;/span&gt;
    id &lt;span &gt;serial&lt;/span&gt; &lt;span &gt;not&lt;/span&gt; &lt;span &gt;null&lt;/span&gt; &lt;span &gt;constraint&lt;/span&gt; app_profile_pkey &lt;span &gt;primary&lt;/span&gt; &lt;span &gt;key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    name &lt;span &gt;varchar&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;200&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;not&lt;/span&gt; &lt;span &gt;null&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    preferences jsonb &lt;span &gt;not&lt;/span&gt; &lt;span &gt;null&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;adding-data-to-jsonb-fields&quot; &gt;&lt;a href=&quot;#adding-data-to-jsonb-fields&quot; aria-label=&quot;adding data to jsonb fields permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Adding data to JSONB fields&lt;/h3&gt;
&lt;p&gt;Because JSON fields don&apos;t enforce a particular schema, Django will convert any valid Python data type (dictionary, list, string, number, boolean) into the appropriate JSON. For example, if you want to add a new row to the &lt;code &gt;app_profile&lt;/code&gt; table created above, you can run the following in your Django application:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;from&lt;/span&gt; app&lt;span &gt;.&lt;/span&gt;models &lt;span &gt;import&lt;/span&gt; Profile
 
&lt;span &gt;# Create a Profile with preferences&lt;/span&gt;
p &lt;span &gt;=&lt;/span&gt; Profile&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;Tanner&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; preferences&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&apos;sms&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;False&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;daily_email&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;True&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;weekly_email&apos;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;True&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
p&lt;span &gt;.&lt;/span&gt;save&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create a new user named &lt;code &gt;Tanner&lt;/code&gt; who will receive our &lt;code &gt;daily_email&lt;/code&gt; and &lt;code &gt;weekly_email&lt;/code&gt;, but no &lt;code &gt;sms&lt;/code&gt; messages.&lt;/p&gt;
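&lt;p&gt;To get a feel for the conversion from Python values to JSON, you can serialize the same dictionary with Python&apos;s standard &lt;code &gt;json&lt;/code&gt; module. This is only an approximation of what ends up in the column (the field may use its own encoder, and Postgres then stores the value in its binary format), but the mapping of &lt;code &gt;True&lt;/code&gt; and &lt;code &gt;False&lt;/code&gt; to JSON booleans is the same:&lt;/p&gt;

```python
import json

# The same preferences dictionary passed to Profile(...) above.
preferences = {"sms": False, "daily_email": True, "weekly_email": True}

# Python booleans become JSON true/false; dict keys become JSON object keys.
as_json = json.dumps(preferences)

print(as_json)
```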
&lt;h3 id=&quot;querying-jsonb-fields&quot; &gt;&lt;a href=&quot;#querying-jsonb-fields&quot; aria-label=&quot;querying jsonb fields permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Querying JSONB fields&lt;/h3&gt;
&lt;p&gt;Django uses the double underscore pattern from &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/topics/db/queries/#field-lookups&quot;&gt;field lookups&lt;/a&gt; to query JSON object keys. For example, if you want to get all the Profiles for users who have opted into our &lt;code &gt;daily_email&lt;/code&gt;, you&apos;d use the following code:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;results &lt;span &gt;=&lt;/span&gt; Profile&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;preferences__daily_email&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you want to check the SQL query that Django runs, you can print it from the &lt;code &gt;query&lt;/code&gt; property on the results object:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;print&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;results&lt;span &gt;.&lt;/span&gt;query&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# Output:&lt;/span&gt;
SELECT &lt;span &gt;&quot;app_profile&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;app_profile&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;app_profile&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;preferences&quot;&lt;/span&gt; 
FROM &lt;span &gt;&quot;app_profile&quot;&lt;/span&gt; WHERE &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;app_profile&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;preferences&quot;&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; daily_email&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;true&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the query is slightly different from the one I manually wrote above (I cast &lt;code &gt;daily_email&lt;/code&gt; to a &lt;code &gt;boolean&lt;/code&gt;), but it accomplishes the same thing.
You can also filter records based on the keys they contain. For example, if some user accounts were created before you added the &lt;code &gt;sms&lt;/code&gt; option, you might want to find them and let the users know about the new option. You can use the &lt;code &gt;isnull&lt;/code&gt; field lookup on the &lt;code &gt;sms&lt;/code&gt; key in your JSON data:&lt;/p&gt;
&lt;div  data-language=&quot;python&quot;&gt;&lt;pre &gt;&lt;code &gt;results &lt;span &gt;=&lt;/span&gt; Profile&lt;span &gt;.&lt;/span&gt;objects&lt;span &gt;.&lt;/span&gt;&lt;span &gt;filter&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;preferences__sms__isnull&lt;span &gt;=&lt;/span&gt;&lt;span &gt;True&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are many other ways to filter queries using JSON fields, so be sure to &lt;a href=&quot;https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/fields/#querying-jsonfield&quot;&gt;check out the official Django docs for more&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;limitations-of-jsonb-fields-with-postgres-and-django&quot; &gt;&lt;a href=&quot;#limitations-of-jsonb-fields-with-postgres-and-django&quot; aria-label=&quot;limitations of jsonb fields with postgres and django permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Limitations of JSONB fields with Postgres and Django&lt;/h2&gt;
&lt;p&gt;It&apos;s worth noting that &lt;code &gt;jsonb&lt;/code&gt; fields come with some drawbacks. I&apos;ve already mentioned that it takes slightly longer to write data to &lt;code &gt;jsonb&lt;/code&gt; fields than &lt;code &gt;json&lt;/code&gt; because the JSON string must be converted to binary, but there are other reasons to avoid &lt;code &gt;jsonb&lt;/code&gt; fields.&lt;/p&gt;
&lt;p&gt;First, if your data needs to enforce a strict schema, JSON may not be an ideal choice. While you &lt;a href=&quot;https://blog.hagander.net/json-field-constraints-228/&quot;&gt;can use check constraints&lt;/a&gt; to enforce the use of specific fields, this isn&apos;t natively supported in Django, so you&apos;ll need to write your own migrations to accomplish this.&lt;/p&gt;
&lt;p&gt;A better way to address this shortcoming is by writing Django validation rules to enforce the structure you want. If you don&apos;t want to write the validation rules yourself, there&apos;s a popular package called &lt;a href=&quot;https://python-jsonschema.readthedocs.io/en/stable/&quot;&gt;&lt;code &gt;jsonschema&lt;/code&gt;&lt;/a&gt; that I&apos;d recommend.&lt;/p&gt;
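&lt;p&gt;As a minimal sketch of what a hand-written rule might look like, the function below checks that preferences are an object containing only known keys with boolean values. The allowed keys are an assumption based on the example data, and the function raises a plain &lt;code &gt;ValueError&lt;/code&gt; so that it runs without Django; in a real project you would raise &lt;code &gt;django.core.exceptions.ValidationError&lt;/code&gt; instead and pass the function in the field&apos;s &lt;code &gt;validators&lt;/code&gt; list.&lt;/p&gt;

```python
# Hypothetical schema: the keys used in this article's example data.
ALLOWED_KEYS = {"sms", "daily_email", "weekly_email"}


def validate_preferences(value):
    """Raise ValueError unless `value` is a dict of known keys to booleans."""
    if not isinstance(value, dict):
        raise ValueError("preferences must be a JSON object")
    unknown = set(value) - ALLOWED_KEYS
    if unknown:
        raise ValueError("unknown preference keys: " + ", ".join(sorted(unknown)))
    for key, flag in value.items():
        if not isinstance(flag, bool):
            raise ValueError("preference " + repr(key) + " must be true or false")


# A valid value passes silently; an invalid one raises.
validate_preferences({"sms": False, "daily_email": True})
```

&lt;p&gt;Keep in mind that Django runs field validators during &lt;code &gt;full_clean()&lt;/code&gt; (for example in forms), not automatically on every &lt;code &gt;save()&lt;/code&gt;, so a check like this complements a database constraint rather than replacing it.&lt;/p&gt;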
&lt;p&gt;Another drawback to using JSON fields is handling changes to the shape of your data. If you want to add a new column to a database table in Postgres using Django, you simply update your model and run a migration. If you want to add a new field to a JSON column, it isn&apos;t quite as straightforward.&lt;/p&gt;
&lt;p&gt;A pattern I&apos;ve used before is to create a custom migration that loops through the affected records and updates each one individually. This naive method works for relatively small datasets, but it might not be a good idea if you need to update 1 million profiles in a production database. In that case, it might be better to write your code to handle the existence or absence of the key or run a &lt;a href=&quot;https://www.freecodecamp.org/news/how-to-update-objects-inside-jsonb-arrays-with-postgresql-5c4e03be256a/&quot;&gt;batch update on the JSON object&lt;/a&gt;.&lt;/p&gt;
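&lt;p&gt;Handling a missing key in application code can be as simple as merging the stored value over a set of defaults whenever you read it. Here is one way that might look; the default values and the &lt;code &gt;effective_preferences&lt;/code&gt; helper are assumptions made up for this example, not part of Django:&lt;/p&gt;

```python
# Assumed defaults for any key a stored record does not contain yet.
DEFAULT_PREFERENCES = {"sms": False, "daily_email": True, "weekly_email": True}


def effective_preferences(stored):
    """Merge stored preferences over the defaults so every key has a value."""
    merged = dict(DEFAULT_PREFERENCES)
    # Records created before a key existed simply fall back to its default.
    merged.update(stored or {})
    return merged


# An older record that predates the "sms" option.
old_record = {"daily_email": False, "weekly_email": True}
print(effective_preferences(old_record))
```

&lt;p&gt;With this approach no data migration is needed when you add an option, at the cost of the default living in your code rather than in the database.&lt;/p&gt;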
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;While JSON data types come with some drawbacks, they are useful in situations where you need more flexibility in your data structure. Thanks to Django&apos;s native support for &lt;code &gt;jsonb&lt;/code&gt;, you can get started using JSON data in your web applications without learning all the native Postgres query operators.&lt;/p&gt;
&lt;p&gt;Next time you need more flexibility in your data model and want to benefit from the strengths of Postgres, give &lt;code &gt;jsonb&lt;/code&gt; fields a try.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the author&lt;/h2&gt;
&lt;p&gt;Karl Hughes is a technology team leader and software engineer. He is currently the founder of &lt;a href=&quot;https://draft.dev/?utm_source=pganalyze&quot;&gt;Draft&lt;/a&gt;, where he helps create technical content for engineering blogs.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Building SVG Components in React]]></title><description><![CDATA[React is well known as a great tool for building complex applications
from HTML and CSS, but that same approach can also be used with SVG to
build sophisticated custom UI elements. In this article, we'll give a brief overview of SVG, when to use it
(and when not to), and how to use it effectively in a React
application. We'll also briefly touch on how to integrate with d3
(which comes in very useful when working with SVG).  We relied heavily on SVG to build the charting updates we launched…]]></description><link>https://pganalyze.com/blog/building-svg-components-in-react</link><guid isPermaLink="false">https://pganalyze.com/blog/building-svg-components-in-react</guid><dc:creator><![CDATA[Maciek Sakrejda]]></dc:creator><pubDate>Thu, 09 Jul 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;React is well known as a great tool for building complex applications
from HTML and CSS, but that same approach can also be used with SVG to
build sophisticated custom UI elements.&lt;/p&gt;
&lt;p&gt;In this article, we&apos;ll give a brief overview of SVG, when to use it
(and when not to), and how to use it effectively in a React
application. We&apos;ll also briefly touch on how to integrate with d3
(which comes in very useful when working with SVG).&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/bb52c1a35ef422e8f97742e58b884df3/e8f1b/vacuum_list_example.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;React SVG example&quot;
        title=&quot;React SVG example&quot;
        src=&quot;https://pganalyze.com/static/bb52c1a35ef422e8f97742e58b884df3/1d69c/vacuum_list_example.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We relied heavily on SVG to build the charting updates we launched
recently in pganalyze (check out my &lt;a href=&quot;https://pganalyze.com/blog/introducing-new-charts-and-date-picker&quot;&gt;blog post about
these&lt;/a&gt;
if you missed it), and we would like to share how we work with SVG in
React. At the end of the post, we link to a simple but functional
charting example based on our new charting code. We stripped it down
to make it easier to follow, and we think it&apos;s a great introduction to
building SVG components in React.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-svg&quot;&gt;What is SVG?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-to-use-svg&quot;&gt;When to use SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#svg-in-react&quot;&gt;SVG in React&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#handling-layout-in-svg&quot;&gt;Handling Layout in SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#sizing-in-svg&quot;&gt;Sizing in SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#svg-and-d3&quot;&gt;SVG and d3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#interactivity&quot;&gt;Interactivity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#styling-svg&quot;&gt;Styling SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#embedding-html-in-svg&quot;&gt;Embedding HTML in SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#full-example&quot;&gt;Full Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h3 id=&quot;what-is-svg&quot; &gt;&lt;a href=&quot;#what-is-svg&quot; aria-label=&quot;what is svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What is SVG?&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Scalable_Vector_Graphics&quot;&gt;SVG&lt;/a&gt; is an
XML-based vector graphics format. It is commonly used for icons and
illustrations, but the similarities to HTML make it a great fit to
extend your UI. Like HTML, SVG consists of a DOM tree of elements
which can be styled with CSS, can be scripted and animated, and can
dispatch events on user interaction. SVG is well-supported in modern
browsers, including Firefox, Safari, Chrome, and Edge. All of these
support embedding SVG directly in HTML, and React supports using SVG
elements to build your components.&lt;/p&gt;
&lt;p&gt;A thorough overview of SVG is beyond the scope of this post, but let&apos;s
review the salient features in the context of building UI components.
The actual elements available fall in a few different categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;simple lines and shapes, like &lt;code &gt;rect&lt;/code&gt;, &lt;code &gt;circle&lt;/code&gt;, and &lt;code &gt;line&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;more complex lines and shapes, like &lt;code &gt;polygon&lt;/code&gt; and &lt;code &gt;path&lt;/code&gt; (check out
&lt;a href=&quot;https://www.sitepoint.com/closer-look-svg-path-data/&quot;&gt;Joni Trythall&apos;s
post&lt;/a&gt; on
everything you can do with the path data attribute!)&lt;/li&gt;
&lt;li&gt;text, like the simple &lt;code &gt;text&lt;/code&gt;, the fancy &lt;code &gt;textPath&lt;/code&gt; (essentially
&lt;code &gt;text&lt;/code&gt; along an arbitrary &lt;code &gt;path&lt;/code&gt;), and the handy &lt;code &gt;title&lt;/code&gt; (for simple
tooltips similar to HTML&apos;s &lt;code &gt;title&lt;/code&gt; attribute)&lt;/li&gt;
&lt;li&gt;special elements to combine and manipulate these, like &lt;code &gt;mask&lt;/code&gt;,
&lt;code &gt;clipPath&lt;/code&gt;, and &lt;code &gt;pattern&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;other odds and ends, like the familiar anchor (&lt;code &gt;a&lt;/code&gt;) from
HTML (with the neat feature that it can conform to whatever shape it
is wrapping, and only that part is interactive).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mozilla&apos;s MDN has a good
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG&quot;&gt;reference&lt;/a&gt; to all
the element types available.&lt;/p&gt;
&lt;p&gt;The event system is very similar to what you&apos;re already used to in
HTML. Some events are different, but many familiar ones like &lt;code &gt;onClick&lt;/code&gt;,
&lt;code &gt;onMouseEnter&lt;/code&gt;, &lt;code &gt;onFocus&lt;/code&gt;, and &lt;code &gt;onKeyUp&lt;/code&gt; are there. Registering event
handlers is the same: SVG elements expose &lt;code &gt;onEvent&lt;/code&gt; attributes and you
can add your callback there. If you&apos;re using TypeScript, note that
you&apos;ll need to parameterize generic React synthetic event types with
&lt;code &gt;SVGElement&lt;/code&gt; or just &lt;code &gt;Element&lt;/code&gt; instead of &lt;code &gt;HTMLElement&lt;/code&gt;. E.g.:&lt;/p&gt;
&lt;div  data-language=&quot;ts&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;const&lt;/span&gt; handleClick &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;e&lt;span &gt;:&lt;/span&gt; React&lt;span &gt;.&lt;/span&gt;MouseEvent&lt;span &gt;&amp;lt;&lt;/span&gt;Element&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;void&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;console&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;log&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;clicked&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; e&lt;span &gt;.&lt;/span&gt;currentTarget&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;when-to-use-svg&quot; &gt;&lt;a href=&quot;#when-to-use-svg&quot; aria-label=&quot;when to use svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;When to use SVG&lt;/h3&gt;
&lt;p&gt;If you think SVG might be a good fit for some section of your UI,
there&apos;s a good chance you&apos;re right, but you should consider your
options. There are always trade-offs. If you don&apos;t go with SVG, your
other likely options in a React app are going to be sticking with
HTML, or using Canvas. You can &lt;a href=&quot;https://pattle.github.io/simpsons-in-css/&quot;&gt;do a
lot&lt;/a&gt; with some plain divs
and CSS, so HTML may be suitable for more than you think. That said,
if you feel like your use case is pushing HTML to the breaking point
(or at least into a cryptic forest of obscure tags and esoteric
styling), maybe it&apos;s not the right fit. Remember &lt;a href=&quot;https://en.wikiquote.org/wiki/Brian_Kernighan&quot;&gt;Kernighan&apos;s
Law&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Everyone knows that debugging is twice as hard as writing a program in
the first place. So if you&apos;re as clever as you can be when you write it,
how will you ever debug it?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;On the other end of the spectrum there&apos;s Canvas. Its &lt;a href=&quot;https://en.wikipedia.org/wiki/Immediate_mode_(computer_graphics)&quot;&gt;immediate
mode&lt;/a&gt;
paradigm means it can perform much better with huge datasets, but that
also makes it awkward to work with in React, and harder to script rich
interactivity. SVG (and HTML) have a &lt;a href=&quot;https://en.wikipedia.org/wiki/Retained_mode&quot;&gt;retained
mode&lt;/a&gt; model that&apos;s better
suited to building UIs.&lt;/p&gt;
&lt;p&gt;As a rule of thumb, if it&apos;s reasonable to stick with HTML, &lt;strong&gt;stick
with HTML&lt;/strong&gt;. If not, and you expect to work with a &lt;strong&gt;modest number of
data points&lt;/strong&gt; (the threshold will vary based on your UX needs and your
performance expectations), &lt;strong&gt;SVG is a good bet&lt;/strong&gt;. It will allow you to
build these components in a manner similar to building HTML
components, and to maintain a consistent look and feel with the rest
of your app. Otherwise, consider Canvas or even WebGL.&lt;/p&gt;
&lt;h3 id=&quot;svg-in-react&quot; &gt;&lt;a href=&quot;#svg-in-react&quot; aria-label=&quot;svg in react permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SVG in React&lt;/h3&gt;
&lt;p&gt;The mechanics of using SVG elements in React are straightforward: Just
write a standard component and return an SVG tag instead of an HTML
tag. You only need to ensure you&apos;re nesting tags correctly and only
putting SVG elements inside an &lt;code &gt;&amp;lt;svg&gt;&lt;/code&gt; tag (just as you should ensure
you&apos;re not putting block-level elements in a &lt;code &gt;&amp;lt;p&gt;&lt;/code&gt;). There&apos;s no
special class to extend, no extra options to handle.&lt;/p&gt;
&lt;p&gt;Here is a trivial SVG component:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;Greeting&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt;name&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;text&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;hello &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;name&lt;span &gt;}&lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;text&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses the SVG &lt;code &gt;&amp;lt;text&gt;&lt;/code&gt; element instead of the &lt;code &gt;&amp;lt;span&gt;&lt;/code&gt; you might
expect in an HTML component, but as you can see, it&apos;s otherwise
identical to an HTML component. To use this, just wrap it in an
&lt;code &gt;&amp;lt;svg&gt;&lt;/code&gt; element:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;Main&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;svg&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;g&lt;/span&gt; &lt;span &gt;transform&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;translate(20,20)&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Greeting&lt;/span&gt;&lt;/span&gt; &lt;span &gt;name&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;Maciek&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;g&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
    &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;svg&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(Don&apos;t worry about the &lt;code &gt;&amp;lt;g&gt;&lt;/code&gt; element for now; we&apos;ll cover that next.)&lt;/p&gt;
&lt;h3 id=&quot;handling-layout-in-svg&quot; &gt;&lt;a href=&quot;#handling-layout-in-svg&quot; aria-label=&quot;handling layout in svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Handling Layout in SVG&lt;/h3&gt;
&lt;p&gt;A big difference between HTML and SVG is layout: In HTML, the normal
layout flow positions elements on the page automatically according to
a set of &lt;del&gt;simple&lt;/del&gt; rules. In SVG, it&apos;s up to you to place each
individual element exactly where it&apos;s supposed to go. There is &lt;strong&gt;no
built-in positioning&lt;/strong&gt; mechanism at all, and the order of the tags
really only determines what gets drawn on top of what, with later
elements painted over earlier ones (much like z-index in CSS; SVG has
no explicit z-index).&lt;/p&gt;
&lt;p&gt;The mechanism for this positioning is a coordinate system that&apos;s
standard in computer graphics: The origin is in the upper-left corner,
and positive x and y values move elements to the right and down,
respectively. You can think of it like an HTML document where all
elements are &lt;code &gt;position: absolute&lt;/code&gt;, and &lt;code &gt;x&lt;/code&gt; and &lt;code &gt;y&lt;/code&gt; are &lt;code &gt;left&lt;/code&gt; and
&lt;code &gt;top&lt;/code&gt;. Here&apos;s an example:&lt;/p&gt;
&lt;div&gt;
  &lt;svg-layout-example /&gt;
&lt;/div&gt;
&lt;p&gt;As you can see, the &lt;code &gt;x&lt;/code&gt; and &lt;code &gt;y&lt;/code&gt; offsets mean slightly different things
for different types of elements. For &lt;code &gt;rect&lt;/code&gt;, it&apos;s the upper-left
corner. For &lt;code &gt;circle&lt;/code&gt; and &lt;code &gt;ellipse&lt;/code&gt;, it&apos;s the center (in fact, circles
and ellipses use &lt;code &gt;cx&lt;/code&gt; and &lt;code &gt;cy&lt;/code&gt; attributes instead of &lt;code &gt;x&lt;/code&gt; and &lt;code &gt;y&lt;/code&gt; to
make this clearer). For text, it&apos;s a reference point, and the text&apos;s
placement relative to that point is configurable via attributes like
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/dominant-baseline&quot;&gt;dominant-baseline&lt;/a&gt;
and
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/text-anchor&quot;&gt;text-anchor&lt;/a&gt;
(by default, the text starts at the x position, with the baseline at
the y position).&lt;/p&gt;
&lt;p&gt;At first blush, this seems like it would make any non-trivial SVG
component a nightmare to put together, but we can use some SVG
features and some conventions to help us build complex modular
components that work well together.&lt;/p&gt;
&lt;p&gt;Let&apos;s assume we&apos;re working with a certain explicit &lt;code &gt;width&lt;/code&gt; and
&lt;code &gt;height&lt;/code&gt; for our component. This can get tricky if you need your
component to be responsive, but we&apos;ll hand-wave around it for now; we
discuss that in more detail below.&lt;/p&gt;
&lt;p&gt;In general, we&apos;ve found a good approach for SVG components is to have
parents size and position their children by subdividing the parent&apos;s
own width and height. Parents determine each child&apos;s desired &lt;code &gt;width&lt;/code&gt;
and &lt;code &gt;height&lt;/code&gt; and pass those as props. We could also pass &lt;code &gt;x&lt;/code&gt; and &lt;code &gt;y&lt;/code&gt;
to have children position themselves, but SVG provides a handy element
that simplifies this: the group (&lt;code &gt;&amp;lt;g&gt;&lt;/code&gt;). As its name suggests, the
element is a way to apply a set of properties to a group of children.
Most relevant for us is the &lt;code &gt;transform&lt;/code&gt; attribute, specifically its
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform#Translate&quot;&gt;&lt;code &gt;translate&lt;/code&gt;&lt;/a&gt;
value. This lets us establish a new origin local to the group, offset
from the parent origin (which may be another &lt;code &gt;&amp;lt;g&gt;&lt;/code&gt;!) by the specified
x and y values. This is perfect for positioning children, since you
can easily do so in the parent. The children themselves can pretend
they&apos;re positioned at the origin, so they only have to worry about
their width and height, and what to render within that space. In fact,
this pattern is so useful, we have a helper component to do this:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;Translate&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt;x&lt;span &gt;=&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;y&lt;span &gt;=&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;children&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;!&lt;/span&gt;x &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;y&lt;span &gt;)&lt;/span&gt; &lt;span &gt;return&lt;/span&gt; children&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;g&lt;/span&gt; &lt;span &gt;transform&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&lt;span &gt;`&lt;/span&gt;&lt;span &gt;translate(&lt;/span&gt;&lt;span &gt;&lt;span &gt;${&lt;/span&gt;x&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;&lt;span &gt;${&lt;/span&gt;y&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;`&lt;/span&gt;&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;children&lt;span &gt;}&lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;g&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Even though subdividing space like this in SVG component hierarchies
is a good rule of thumb, sometimes it may not be a good fit for some
part of your UI. Since &lt;code &gt;&amp;lt;g&gt;&lt;/code&gt; is &lt;strong&gt;not&lt;/strong&gt; a bounded container like a
sized &lt;code &gt;&amp;lt;div&gt;&lt;/code&gt; with &lt;code &gt;overflow: hidden&lt;/code&gt;, the &lt;code &gt;width&lt;/code&gt; and &lt;code &gt;height&lt;/code&gt;
pattern is just metadata for children to follow as a guideline. If
that pattern gets in the way, you can break the rules and have
children draw outside these bounds (though the result may be harder to
reason about).&lt;/p&gt;
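&lt;p&gt;As a rough sketch of the subdividing idea, a parent could compute
child boxes in its own coordinate system before rendering anything. The
&lt;code &gt;Box&lt;/code&gt; type and the &lt;code &gt;splitColumns&lt;/code&gt; helper are hypothetical names
used for illustration only; they are not part of our charting code:&lt;/p&gt;

```typescript
// Hypothetical layout helper: names and shapes are illustrative only.
type Box = { x: number; y: number; width: number; height: number };

// Split a parent's area into side-by-side columns sized by relative weights.
// Each returned box is expressed in the parent's local coordinate system,
// so x/y can drive a translate(x,y) group and width/height become child props.
function splitColumns(parentWidth: number, parentHeight: number, weights: number[]): Box[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let x = 0;
  return weights.map((w) => {
    const width = (parentWidth * w) / total;
    const box = { x, y: 0, width, height: parentHeight };
    x += width; // the next column starts where this one ends
    return box;
  });
}

// A narrow legend column next to a plot area three times as wide.
const [legendBox, plotBox] = splitColumns(800, 400, [1, 3]);
console.log(legendBox, plotBox);
```

&lt;p&gt;Each box&apos;s &lt;code &gt;x&lt;/code&gt; and &lt;code &gt;y&lt;/code&gt; can feed a translating group, while
&lt;code &gt;width&lt;/code&gt; and &lt;code &gt;height&lt;/code&gt; are passed to the child as props.&lt;/p&gt;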
&lt;p&gt;Another useful pattern is to have components specify all the sizing
and positioning metadata in constants at the top of the component:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;export&lt;/span&gt; &lt;span &gt;const&lt;/span&gt; &lt;span &gt;Chart&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt; data &lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; viewBoxWidth &lt;span &gt;=&lt;/span&gt; &lt;span &gt;800&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; viewBoxHeight &lt;span &gt;=&lt;/span&gt; &lt;span &gt;400&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; paddingX &lt;span &gt;=&lt;/span&gt; &lt;span &gt;6&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; paddingY &lt;span &gt;=&lt;/span&gt; &lt;span &gt;4&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; bottomAxisHeight &lt;span &gt;=&lt;/span&gt; &lt;span &gt;30&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; leftAxisWidth &lt;span &gt;=&lt;/span&gt; &lt;span &gt;50&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; bodyHeight &lt;span &gt;=&lt;/span&gt; viewBoxHeight &lt;span &gt;-&lt;/span&gt; bottomAxisHeight &lt;span &gt;-&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; paddingY&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; bodyWidth &lt;span &gt;=&lt;/span&gt; viewBoxWidth &lt;span &gt;-&lt;/span&gt; leftAxisWidth &lt;span &gt;-&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; paddingX&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; leftAxis &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    pos&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      x&lt;span &gt;:&lt;/span&gt; paddingX&lt;span &gt;,&lt;/span&gt;
      y&lt;span &gt;:&lt;/span&gt; paddingY&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    size&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      width&lt;span &gt;:&lt;/span&gt; leftAxisWidth&lt;span &gt;,&lt;/span&gt;
      height&lt;span &gt;:&lt;/span&gt; bodyHeight&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; bottomAxis &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    pos&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      x&lt;span &gt;:&lt;/span&gt; paddingX &lt;span &gt;+&lt;/span&gt; leftAxisWidth&lt;span &gt;,&lt;/span&gt;
      y&lt;span &gt;:&lt;/span&gt; paddingY &lt;span &gt;+&lt;/span&gt; bodyHeight&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    size&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      width&lt;span &gt;:&lt;/span&gt; bodyWidth&lt;span &gt;,&lt;/span&gt;
      height&lt;span &gt;:&lt;/span&gt; bottomAxisHeight&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; body &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    pos&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      x&lt;span &gt;:&lt;/span&gt; leftAxis&lt;span &gt;.&lt;/span&gt;pos&lt;span &gt;.&lt;/span&gt;x &lt;span &gt;+&lt;/span&gt; leftAxisWidth&lt;span &gt;,&lt;/span&gt;
      y&lt;span &gt;:&lt;/span&gt; paddingY&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    size&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      width&lt;span &gt;:&lt;/span&gt; bodyWidth&lt;span &gt;,&lt;/span&gt;
      height&lt;span &gt;:&lt;/span&gt; bodyHeight&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;// chart logic code omitted&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;svg&lt;/span&gt; &lt;span &gt;width&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;100%&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;height&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;400&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;viewBox&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&lt;span &gt;`&lt;/span&gt;&lt;span &gt;0 0 &lt;/span&gt;&lt;span &gt;&lt;span &gt;${&lt;/span&gt;viewBoxWidth&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt; &lt;/span&gt;&lt;span &gt;&lt;span &gt;${&lt;/span&gt;viewBoxHeight&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;`&lt;/span&gt;&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;body&lt;span &gt;.&lt;/span&gt;pos&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;/* chart body omitted */&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;leftAxis&lt;span &gt;.&lt;/span&gt;pos&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &amp;lt;LeftAxis &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;leftAxis&lt;span &gt;.&lt;/span&gt;size&lt;span &gt;}&lt;/span&gt;&lt;span &gt; /* other props omitted */ /&gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;bottomAxis&lt;span &gt;.&lt;/span&gt;pos&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &amp;lt;BottomAxis &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;bottomAxis&lt;span &gt;.&lt;/span&gt;size&lt;span &gt;}&lt;/span&gt;&lt;span &gt; /* other props omitted */ /&gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
    &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;svg&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This may look tedious and verbose at first, but having the &lt;strong&gt;layout
defined in one place&lt;/strong&gt;, where it can be tweaked centrally and
cross-referenced across child components, &lt;strong&gt;will make your life much
easier&lt;/strong&gt; as you inevitably adjust these. Plus, grouping related
properties and using destructuring to apply them to children
simplifies things a bit. It&apos;s worth the extra verbosity to avoid
having to hunt for magic constants across a complex component, and
updating all the different occurrences (while making sure you avoid
updating constants for unrelated properties that may have the same
value).&lt;/p&gt;
&lt;h3 id=&quot;sizing-in-svg&quot; &gt;&lt;a href=&quot;#sizing-in-svg&quot; aria-label=&quot;sizing in svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Sizing in SVG&lt;/h3&gt;
&lt;p&gt;In the layout discussion above, we assumed that an explicit width and
height are provided to our SVG element. This is reasonable if you have
a fixed-size element, but that means your element is not responsive or
even resizable. Fortunately, there are two approaches we can take to
work around this.&lt;/p&gt;
&lt;p&gt;The first relies on the fact that the coordinate system discussed above is a
simplification. The actual mechanism is more complex (you can read a
great overview from Sara Soueidan
&lt;a href=&quot;https://www.sarasoueidan.com/blog/svg-coordinate-systems/&quot;&gt;here&lt;/a&gt;),
but the most relevant part for us is the &lt;code &gt;viewBox&lt;/code&gt; attribute of the
&lt;code &gt;svg&lt;/code&gt; element. This defines the actual coordinate system to be used
for layout inside the SVG element in terms of arbitrary units (if
unset, this defaults to the actual width and height of the element).
It also supports an x and y offset for the coordinate system (relative
to the upper left of the element), but you can likely leave these as
zero. The full syntax is&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;  viewBox=&quot;&amp;lt;xOffset&gt; &amp;lt;yOffset&gt; &amp;lt;width&gt; &amp;lt;height&gt;&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This lets us size our SVG component however we like (e.g.,
&lt;code &gt;width=&quot;100%&quot;&lt;/code&gt;), but still work in terms of subdividing a specific
width and height inside the component. One thing to note is that font
size will be relative to this viewBox coordinate system (that is, the
size of the font will vary based on the ratio of viewBox coordinate
system width to actual width). Another caveat is that if the aspect
ratio of your viewBox width and height does not match the actual
aspect ratio of the component, you&apos;ll probably want to tweak the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/preserveAspectRatio&quot;&gt;preserveAspectRatio&lt;/a&gt;
attribute. If set to &quot;none&quot;, the coordinate system (and content) will
stretch to fit the dimensions of the actual component. This will
distort proportional width and height (so that, e.g., squares will no
longer be square), but if that&apos;s not a concern in your component, this
may be the simplest way to go.&lt;/p&gt;
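&lt;p&gt;To make the scaling concrete, here is a minimal sketch of how many
rendered pixels one viewBox unit covers. The &lt;code &gt;viewBoxScale&lt;/code&gt; helper is an
assumed name, not an SVG or React API, and it models only the default
&quot;meet&quot; behavior and &quot;none&quot;, ignoring alignment and &quot;slice&quot;:&lt;/p&gt;

```typescript
// Illustrative only: the type and function names are assumptions.
type AxisScale = { sx: number; sy: number };

// Rendered pixels per viewBox unit along each axis.
// "meet" (the default behavior) scales uniformly so the viewBox fits inside
// the element; "none" stretches each axis independently.
function viewBoxScale(
  viewBoxWidth: number,
  viewBoxHeight: number,
  actualWidth: number,
  actualHeight: number,
  mode: "meet" | "none" = "meet",
): AxisScale {
  const sx = actualWidth / viewBoxWidth;
  const sy = actualHeight / viewBoxHeight;
  if (mode === "none") return { sx, sy };
  const uniform = Math.min(sx, sy);
  return { sx: uniform, sy: uniform };
}

// An 800x400 viewBox rendered into a 1200x400 element.
const fitted = viewBoxScale(800, 400, 1200, 400);
const stretched = viewBoxScale(800, 400, 1200, 400, "none");
// A font size of 12 viewBox units renders about 12 * fitted.sy pixels tall.
console.log(fitted, stretched, 12 * fitted.sy);
```

&lt;p&gt;With &quot;none&quot;, the two factors differ whenever the aspect ratios differ,
which is exactly the distortion described above.&lt;/p&gt;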
&lt;p&gt;Another approach is to measure your component before you draw
anything, e.g., using a hook like &lt;a href=&quot;https://github.com/streamich/react-use/blob/master/docs/useMeasure.md&quot;&gt;useMeasure&lt;/a&gt;.
This is more complicated (it requires pulling in a dependency or
writing a similar hook yourself) and it delays rendering until the
component is sized, but it allows you to work in the actual
dimensions, ensuring no aspect ratio distortion.&lt;/p&gt;
&lt;h3 id=&quot;svg-and-d3&quot; &gt;&lt;a href=&quot;#svg-and-d3&quot; aria-label=&quot;svg and d3 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;SVG and d3&lt;/h3&gt;
&lt;p&gt;Whenever visualizing data in JavaScript, d3 is a great tool to
consider. However, since both d3 and React have strong opinions about
how to handle the DOM, getting them to play together nicely can be
tricky. A good rule of thumb is to use &lt;a href=&quot;https://wattenberger.com/blog/react-and-d3&quot;&gt;d3 for layout and React for
rendering&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In our pganalyze charts, we use d3 for scales, helpers for stacking
area series data, bisectors for finding data points near the cursor,
and for generating path data (the &lt;code &gt;d&lt;/code&gt; attribute) for line and area
charts. Almost everything else is plain React and SVG. Amelia
Wattenberger&apos;s blog, linked above, has a separate post that&apos;s a great
&lt;a href=&quot;https://wattenberger.com/blog/d3&quot;&gt;overview of the different d3
modules&lt;/a&gt;. Many of these are still
useful when working with React, but the rendering-oriented ones may be
more trouble than they&apos;re worth.&lt;/p&gt;
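&lt;p&gt;To illustrate what a bisector buys you when tracking the cursor,
here is a plain TypeScript stand-in. It is not d3&apos;s implementation;
&lt;code &gt;nearestIndex&lt;/code&gt; is a hypothetical name, and the sketch assumes the x
values are sorted in ascending order and the array is not empty:&lt;/p&gt;

```typescript
// Plain-TypeScript stand-in for the bisector idea; not d3's implementation.
// Assumes xs is sorted ascending and non-empty.
function nearestIndex(xs: number[], target: number): number {
  let lo = 0;
  let hi = xs.length - 1;
  // Binary search for the first index whose value is >= target
  // (or the last index if every value is smaller).
  while (hi > lo) {
    const mid = Math.floor((lo + hi) / 2);
    if (target > xs[mid]) lo = mid + 1;
    else hi = mid;
  }
  // The closest point is either that index or the one just before it.
  if (lo > 0) {
    const distBefore = target - xs[lo - 1];
    const distAfter = xs[lo] - target;
    return distBefore > distAfter ? lo : lo - 1;
  }
  return lo;
}

const sampleXs = [0, 10, 20, 30];
console.log(nearestIndex(sampleXs, 14)); // 1, since 10 is closer to 14 than 20 is
```

&lt;p&gt;In a chart, the target would be the cursor position converted back
into data space, and the result would select the point to highlight.&lt;/p&gt;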
&lt;p&gt;The one exception is that we do use d3 selection to take advantage of
d3&apos;s axis convenience functions. They are isolated in their own Axis
components, and we found it&apos;s okay to let d3 handle rendering as long
as it&apos;s not competing with React. Since our Axis components have a
simple interface and are rarely updated once a chart is mounted, we
use the &lt;code &gt;useLayoutEffect&lt;/code&gt; hook to have d3 render the axis via the
helper function (and remove a previous render if there was one).
Here&apos;s the code:&lt;/p&gt;
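&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;import&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; useLayoutEffect&lt;span &gt;,&lt;/span&gt; useRef &lt;span &gt;}&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&quot;react&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;import&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; extent &lt;span &gt;}&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&quot;d3-array&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;import&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; axisBottom &lt;span &gt;}&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&quot;d3-axis&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;import&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; select &lt;span &gt;}&lt;/span&gt; &lt;span &gt;from&lt;/span&gt; &lt;span &gt;&quot;d3-selection&quot;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;// d3 draws the axis into the group element that React mounts via ref&lt;/span&gt;
&lt;span &gt;const&lt;/span&gt; &lt;span &gt;BottomAxis&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt; scale&lt;span &gt;,&lt;/span&gt; width &lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;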
  &lt;span &gt;const&lt;/span&gt; ref &lt;span &gt;=&lt;/span&gt; &lt;span &gt;useRef&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;null&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;useLayoutEffect&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; host &lt;span &gt;=&lt;/span&gt; &lt;span &gt;select&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ref&lt;span &gt;.&lt;/span&gt;current&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    host&lt;span &gt;.&lt;/span&gt;&lt;span &gt;select&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;g&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;remove&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; axisGenerator &lt;span &gt;=&lt;/span&gt; &lt;span &gt;axisBottom&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;scale&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;start&lt;span &gt;,&lt;/span&gt; end&lt;span &gt;]&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;extent&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;scale&lt;span &gt;.&lt;/span&gt;&lt;span &gt;range&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;start &lt;span &gt;==&lt;/span&gt; &lt;span &gt;null&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; end &lt;span &gt;==&lt;/span&gt; &lt;span &gt;null&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;return&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; pxPerTick &lt;span &gt;=&lt;/span&gt; &lt;span &gt;80&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; tickCount &lt;span &gt;=&lt;/span&gt; Math&lt;span &gt;.&lt;/span&gt;&lt;span &gt;ceil&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;end &lt;span &gt;-&lt;/span&gt; start&lt;span &gt;)&lt;/span&gt; &lt;span &gt;/&lt;/span&gt; pxPerTick&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    axisGenerator&lt;span &gt;.&lt;/span&gt;&lt;span &gt;ticks&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;tickCount&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

    &lt;span &gt;const&lt;/span&gt; group &lt;span &gt;=&lt;/span&gt; host&lt;span &gt;.&lt;/span&gt;&lt;span &gt;append&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;g&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;group&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;call&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;axisGenerator&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;scale&lt;span &gt;,&lt;/span&gt; width&lt;span &gt;]&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;g&lt;/span&gt; &lt;span &gt;ref&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;ref&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
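&lt;p&gt;The tick-count arithmetic in that effect can be checked in isolation.
This is a small worked example with an assumed helper name
(&lt;code &gt;tickCountFor&lt;/code&gt;); note that d3 treats the requested count as a hint, so
the number of ticks it actually renders may differ:&lt;/p&gt;

```typescript
// Assumed helper name; mirrors the tick-count arithmetic in the axis effect.
function tickCountFor(rangeStart: number, rangeEnd: number, pxPerTick = 80): number {
  // One tick per pxPerTick pixels of axis length, rounded up.
  return Math.ceil((rangeEnd - rangeStart) / pxPerTick);
}

console.log(tickCountFor(0, 700)); // 700 / 80 = 8.75, rounded up to 9
```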
&lt;p&gt;We use &lt;code &gt;useLayoutEffect&lt;/code&gt; instead of plain &lt;code &gt;useEffect&lt;/code&gt; since we want to
update the DOM with the new configuration &lt;strong&gt;before&lt;/strong&gt; the browser
&quot;paints&quot; the DOM updates. For more details on the differences, check
out &lt;a href=&quot;https://kentcdodds.com/blog/useeffect-vs-uselayouteffect&quot;&gt;this
overview&lt;/a&gt;
from Kent C. Dodds.&lt;/p&gt;
&lt;p&gt;Another tricky aspect of working with d3 in React is plugins. There
are &lt;a href=&quot;https://github.com/d3/d3/wiki/Plugins&quot;&gt;a number of great d3
plugins&lt;/a&gt;, but many of them
don&apos;t really fit into the &quot;d3 for layout, React for rendering&quot; paradigm,
because they&apos;re designed around d3, not React. We used a couple of
plugins in our old code, but we found they didn&apos;t fit our new
approach, so we decided to remove them and reimplement their
functionality in our own components. Having more control over these
features was worth having to write some extra code. If you&apos;re
considering using d3 plugins, think about how they will integrate with
the rest of your code.&lt;/p&gt;
&lt;h3 id=&quot;interactivity&quot; &gt;&lt;a href=&quot;#interactivity&quot; aria-label=&quot;interactivity permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Interactivity&lt;/h3&gt;
&lt;p&gt;A killer feature of SVG is the similarity of the event model to plain
HTML, making it easy to build complex interactive interfaces. However,
when combined with React&apos;s component architecture, it&apos;s easy to cause
unnecessary re-renders of SVG components. Unnecessary re-renders can
be a much bigger problem than slow renders, because the former can
happen much more often. If you&apos;re re-rendering a significant part of
your component on mouse move events, for example, it will be hard to
get that to perform well, no matter how fast the individual pieces
render.&lt;/p&gt;
&lt;p&gt;Fortunately, SVG has a great way to &lt;strong&gt;avoid unnecessary renders&lt;/strong&gt;: you
can separate rendering and interactivity concerns into two different
layers. Because both layers are defined by the same data, it&apos;s fairly
easy to keep them in sync. Think of it as two mirror universes. In
one, the data and props determine what&apos;s drawn on screen, but nothing
is interactive. In the other, no data is rendered, but mouse events
(or touch or keyboard events) are captured and mapped back to data
(d3&apos;s &lt;a href=&quot;https://github.com/d3/d3-scale#continuous_invert&quot;&gt;scale.invert&lt;/a&gt;
is great here), which can then be used to display tooltips or respond
to click events. In the UI, this feels like a single set of
interactive elements, and it can avoid a lot of re-renders (especially
for any hover behavior) and keep the UI snappy. We have a full example
below, but think of it like this:&lt;/p&gt;
&lt;div&gt;
  &lt;svg-mouse-layers-example /&gt;
&lt;/div&gt;
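&lt;p&gt;A minimal sketch of the &quot;map mouse events back to data&quot; step, in plain JavaScript so it runs without d3. The &lt;code &gt;linearScale&lt;/code&gt; helper is a hand-rolled stand-in for a d3 linear scale and its &lt;code &gt;invert&lt;/code&gt; method, and &lt;code &gt;nearestPoint&lt;/code&gt; is a hypothetical name; both are assumptions for illustration, not code from the example repository:&lt;/p&gt;

```javascript
// Hand-rolled stand-in for a d3 linear scale (hypothetical helper, not d3 itself).
const linearScale = ([d0, d1], [r0, r1]) => {
  // Forward mapping: data value -> pixel position.
  const scale = (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  // Reverse mapping: pixel position -> data value.
  scale.invert = (pixel) => d0 + ((pixel - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
};

// Return the data point whose x value is closest to the mouse position.
// Assumes `data` is non-empty.
const nearestPoint = (data, xScale, mouseX) => {
  const target = xScale.invert(mouseX);
  return data.reduce((best, pt) =>
    Math.abs(pt.x - target) < Math.abs(best.x - target) ? pt : best
  );
};

const data = [
  { x: 0, y: 5 },
  { x: 10, y: 7 },
  { x: 20, y: 3 },
];
const xScale = linearScale([0, 20], [0, 400]); // 20 data units across 400px

// 190px inverts to 9.5 in data space, which is closest to x = 10.
console.log(nearestPoint(data, xScale, 190)); // { x: 10, y: 7 }
```

&lt;p&gt;With a real d3 scale, the hand-rolled helper would be replaced by the scale the chart already uses for rendering, which is what keeps the two layers in sync.&lt;/p&gt;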
&lt;p&gt;Note that the static data rendering only needs to happen once per new
set of data—if you have a lot of data points, this can make a
big difference. (Depending on how you design your component, you may
need to use &lt;a href=&quot;https://reactjs.org/docs/react-api.html#reactmemo&quot;&gt;React.memo&lt;/a&gt;
to avoid extra renders.)&lt;/p&gt;
&lt;p&gt;Another pattern we adopted to improve both performance and UI is to
map mouse events back to the data, and only respond if the mapped data
changes. That is, let&apos;s say your cursor is at (20,10) and this maps to
data point X. If you move to (21,10) but the closest data point is
still X, the UI does not react (other than the mouse pointer itself
moving, obviously). We don&apos;t move the tooltip (it&apos;s snapped to the
nearest data point, not to the cursor, and is always at a fixed
height) and there are no other UI changes. We found this less
distracting in the UI (why move things around if nothing meaningful
happened?), and it helps avoid tooltip re-renders.&lt;/p&gt;
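&lt;p&gt;To make the effect concrete, here is a small framework-free sketch of the &quot;only respond if the mapped data changes&quot; idea. The names &lt;code &gt;createHoverTracker&lt;/code&gt; and &lt;code &gt;toIndex&lt;/code&gt; are hypothetical, and the 50px spacing between data points is an assumption chosen to keep the arithmetic simple:&lt;/p&gt;

```javascript
// Hypothetical tracker: calls onChange only when the hovered data index changes.
const createHoverTracker = (toIndex, onChange) => {
  let current; // undefined means "nothing hovered yet"
  return (mouseX) => {
    const next = toIndex(mouseX);
    if (next !== current) {
      current = next;
      onChange(next);
    }
  };
};

// Assume data points are spaced every 50px, so the nearest index is a rounding.
const toIndex = (mouseX) => Math.round(mouseX / 50);

const updates = [];
const handleMove = createHoverTracker(toIndex, (index) => updates.push(index));

// Five mouse positions, but only two distinct nearest data points.
[20, 21, 22, 30, 31].forEach(handleMove);
console.log(updates); // [ 0, 1 ]
```

&lt;p&gt;Five move events produce only two updates here, which is the saving the pattern is after.&lt;/p&gt;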
&lt;p&gt;An important part of interactivity is avoiding interactions with
unwanted elements. For elements like tooltips and anything else that
pops up near the cursor, setting the &lt;code &gt;pointerEvents&lt;/code&gt; attribute to
&lt;code &gt;none&lt;/code&gt; will ensure these are not photobombing your pointer events. If
you don&apos;t do this, and these elements show up under the cursor, they
may force &lt;code &gt;mouseLeave&lt;/code&gt; events on the component where you were
previously tracking the mouse, forcing that element to hide them as
soon as they show up. You should generally consider adding that
attribute to anything non-interactive. It can be set on the &lt;code &gt;&amp;lt;g&gt;&lt;/code&gt;
element as well.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/static/87bc33aec48befe3510bb2b501da6f33/1439b/tooltip_example.png&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;&lt;img alt=&quot;Tooltip example&quot; title=&quot;Tooltip example&quot; src=&quot;https://pganalyze.com/static/87bc33aec48befe3510bb2b501da6f33/1d69c/tooltip_example.png&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here is our &lt;code &gt;Mouse&lt;/code&gt; component which tracks which data point we&apos;re
hovering over (if any) and re-renders its children whenever that
changes. It also takes a click callback to handle clicks on data
points. Note that for flexibility, mapping from screen coordinates to
data points happens with another callback provided as a prop:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;export&lt;/span&gt; &lt;span &gt;const&lt;/span&gt; &lt;span &gt;Mouse&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt; width&lt;span &gt;,&lt;/span&gt; height&lt;span &gt;,&lt;/span&gt; onClick&lt;span &gt;,&lt;/span&gt; children&lt;span &gt;,&lt;/span&gt; toDataPoint &lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;hoverPt&lt;span &gt;,&lt;/span&gt; setHoverPt&lt;span &gt;]&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;useState&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;undefined&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; &lt;span &gt;handleMouseMove&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;e&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; mouse &lt;span &gt;=&lt;/span&gt; &lt;span &gt;getMouse&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;e&lt;span &gt;,&lt;/span&gt; width&lt;span &gt;,&lt;/span&gt; height&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;const&lt;/span&gt; newPt &lt;span &gt;=&lt;/span&gt; &lt;span &gt;toDataPoint&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;mouse&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

    &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;!&lt;/span&gt;&lt;span &gt;pointsEqual&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;hoverPt&lt;span &gt;,&lt;/span&gt; newPt&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;setHoverPt&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;newPt&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; &lt;span &gt;handleMouseLeave&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;setHoverPt&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;undefined&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; &lt;span &gt;handleMouseUp&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    onClick &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; hoverPt &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;onClick&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;hoverPt&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;rect&lt;/span&gt;
        &lt;span &gt;width&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;width&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;height&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;height&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;pointerEvents&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;all&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;fill&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;none&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;stroke&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;none&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;onMouseMove&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;handleMouseMove&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;onMouseLeave&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;handleMouseLeave&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;
        &lt;span &gt;onMouseUp&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;handleMouseUp&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;
      &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;children &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;children&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;hoverPt&lt;span &gt;)&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
    &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;const&lt;/span&gt; &lt;span &gt;getMouse&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;e&lt;span &gt;,&lt;/span&gt; width&lt;span &gt;,&lt;/span&gt; height&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; dims &lt;span &gt;=&lt;/span&gt; e&lt;span &gt;.&lt;/span&gt;currentTarget&lt;span &gt;.&lt;/span&gt;&lt;span &gt;getBoundingClientRect&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; rawX &lt;span &gt;=&lt;/span&gt; e&lt;span &gt;.&lt;/span&gt;clientX &lt;span &gt;-&lt;/span&gt; dims&lt;span &gt;.&lt;/span&gt;left&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; rawY &lt;span &gt;=&lt;/span&gt; e&lt;span &gt;.&lt;/span&gt;clientY &lt;span &gt;-&lt;/span&gt; dims&lt;span &gt;.&lt;/span&gt;top&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; x &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;rawX &lt;span &gt;/&lt;/span&gt; dims&lt;span &gt;.&lt;/span&gt;width&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; width&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; y &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;rawY &lt;span &gt;/&lt;/span&gt; dims&lt;span &gt;.&lt;/span&gt;height&lt;span &gt;)&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; height&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; x&lt;span &gt;,&lt;/span&gt; y &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;const&lt;/span&gt; &lt;span &gt;pointsEqual&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;p1&lt;span &gt;,&lt;/span&gt; p2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;!&lt;/span&gt;p1 &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span &gt;!&lt;/span&gt;p2&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;p1 &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; p2 &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; p1&lt;span &gt;.&lt;/span&gt;x &lt;span &gt;===&lt;/span&gt; p2&lt;span &gt;.&lt;/span&gt;x &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; p1&lt;span &gt;.&lt;/span&gt;y &lt;span &gt;===&lt;/span&gt; p2&lt;span &gt;.&lt;/span&gt;y&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can then render anything that does depend on mouse position (like
the tooltip) through the render prop pattern:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Mouse&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;body&lt;span &gt;.&lt;/span&gt;size&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;onClick&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;handleClick&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;toDataPoint&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;mapToDataPoint&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
  &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;pt&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;// N.B.: Tooltip just returns `null` if pt is `undefined`&lt;/span&gt;
    &lt;span &gt;return&lt;/span&gt; &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Tooltip&lt;/span&gt;&lt;/span&gt; &lt;span &gt;point&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;pt&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;xScale&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;xScale&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;yScale&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;yScale&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;body&lt;span &gt;.&lt;/span&gt;size&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
&lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;span &gt;Mouse&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
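&lt;p&gt;The &lt;code &gt;getMouse&lt;/code&gt; helper above divides by the rendered size and multiplies by the logical size, presumably so that coordinates stay correct when the SVG is drawn at a different size than its logical dimensions. A quick way to check that arithmetic is to call it with a stub event. The &lt;code &gt;fakeEvent&lt;/code&gt; object and its numbers are made up for illustration:&lt;/p&gt;

```javascript
// Same scaling as getMouse above, repeated so the snippet runs on its own.
const getMouse = (e, width, height) => {
  const dims = e.currentTarget.getBoundingClientRect();
  const rawX = e.clientX - dims.left;
  const rawY = e.clientY - dims.top;
  return {
    x: (rawX / dims.width) * width,
    y: (rawY / dims.height) * height,
  };
};

// Stub event: a 600x300 chart drawn at half size (300x150 on screen),
// with its top-left corner at (100, 50) in the viewport.
const fakeEvent = {
  clientX: 250,
  clientY: 125,
  currentTarget: {
    getBoundingClientRect: () => ({ left: 100, top: 50, width: 300, height: 150 }),
  },
};

// The cursor is 150px right of and 75px below the chart's corner on screen,
// which is the center of the chart in its own 600x300 coordinate space.
console.log(getMouse(fakeEvent, 600, 300)); // { x: 300, y: 150 }
```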
&lt;h3 id=&quot;styling-svg&quot; &gt;&lt;a href=&quot;#styling-svg&quot; aria-label=&quot;styling svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Styling SVG&lt;/h3&gt;
&lt;p&gt;SVG can be styled with CSS just like HTML, but note that many of the
actual styles themselves are different: &lt;code &gt;fill&lt;/code&gt; instead of
&lt;code &gt;background-color&lt;/code&gt; (and instead of &lt;code &gt;color&lt;/code&gt; for text, somewhat
confusingly), &lt;code &gt;stroke-width&lt;/code&gt; instead of &lt;code &gt;border-width&lt;/code&gt;, etc. Aside
from that, familiar rules and selectors apply. Many styles can also be
specified via element attributes, and that may be preferable if you
need prop-level control over things like color or stroke width.&lt;/p&gt;
&lt;h3 id=&quot;embedding-html-in-svg&quot; &gt;&lt;a href=&quot;#embedding-html-in-svg&quot; aria-label=&quot;embedding html in svg permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Embedding HTML in SVG&lt;/h3&gt;
&lt;p&gt;One of the lesser-known features of SVG is that you can embed HTML
inside an SVG document with the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Element/foreignObject&quot;&gt;foreignObject&lt;/a&gt;
tag. This is very useful for elements like legends or tooltips that
benefit from the more user-friendly text layout capabilities of HTML.
You can use standard HTML and CSS in these components, and even use React
elements.&lt;/p&gt;
&lt;p&gt;One tricky aspect of this is that &lt;code &gt;foreignObject&lt;/code&gt; is a standard SVG
element, so it needs to be sized explicitly (like any other element).
This makes it hard to size things like tooltips: you may not know how
much space the label or value to display may need. But let&apos;s revisit
the concept of overlays we discussed earlier. The HTML component does
not need to just be the visible tooltip: a div is transparent out of
the box, so we can stack a transparent wrapper div in front of our
other content, and lay the tooltip out within it. The tooltip can then
size itself to fit the items contained therein. If you adjust tip
positioning based on the position along the x axis, you have almost
half the width of the graph to play with (if that&apos;s not enough, you
should probably rethink your tooltips).&lt;/p&gt;
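&lt;p&gt;As a worked example of that positioning rule, the sketch below computes the overlay for a point on each half of a chart. &lt;code &gt;tipOverlayLayout&lt;/code&gt; is a hypothetical helper that mirrors the arithmetic used in the &lt;code &gt;Tooltip&lt;/code&gt; component in this post, and the 600x300 chart size is an assumption:&lt;/p&gt;

```javascript
// Hypothetical helper mirroring the overlay arithmetic: the overlay spans from
// the hovered point to whichever edge of the chart is farther away.
const tipOverlayLayout = (screenX, width, height, tipY = 50) => {
  const placeRight = screenX < width / 2;
  return {
    placeRight,
    size: { width: placeRight ? width - screenX : screenX, height },
    pos: { x: placeRight ? screenX : 0, y: tipY },
  };
};

// Point in the left half: the overlay starts at x=100 and is 500 wide.
console.log(tipOverlayLayout(100, 600, 300));

// Point in the right half: the overlay starts at x=0 and is 450 wide.
console.log(tipOverlayLayout(450, 600, 300));
```

&lt;p&gt;Because the overlay always extends toward the farther edge, it never ends up narrower than half the chart in this sketch.&lt;/p&gt;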
&lt;p&gt;One other issue we found is that some browsers (most notably, Safari)
run into rendering issues with &lt;code &gt;foreignObject&lt;/code&gt; in some cases. &lt;a href=&quot;https://bugs.webkit.org/show_bug.cgi?id=23113&quot;&gt;This
bug&lt;/a&gt; details the
problem. The bug is eleven years old and has several duplicates, so
it&apos;s probably not getting fixed soon, but we found that setting
&lt;code &gt;position: fixed&lt;/code&gt; on the top-most element in &lt;code &gt;foreignObject&lt;/code&gt; worked
around these issues (and has no other layout impact, since in this
case, &lt;code &gt;fixed&lt;/code&gt; will function just like the default &lt;code &gt;static&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Here is our &lt;code &gt;Tooltip&lt;/code&gt; component:&lt;/p&gt;
&lt;div  data-language=&quot;jsx&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;Tooltip&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;&lt;span &gt;{&lt;/span&gt; point&lt;span &gt;,&lt;/span&gt; xScale&lt;span &gt;,&lt;/span&gt; yScale&lt;span &gt;,&lt;/span&gt; width&lt;span &gt;,&lt;/span&gt; height &lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;=&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;if&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;!&lt;/span&gt;point&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;return&lt;/span&gt; &lt;span &gt;null&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; tipY &lt;span &gt;=&lt;/span&gt; &lt;span &gt;50&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;const&lt;/span&gt; screenX &lt;span &gt;=&lt;/span&gt; &lt;span &gt;xScale&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;point&lt;span &gt;.&lt;/span&gt;x&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; screenY &lt;span &gt;=&lt;/span&gt; &lt;span &gt;yScale&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;point&lt;span &gt;.&lt;/span&gt;y&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; time &lt;span &gt;=&lt;/span&gt; &lt;span &gt;new&lt;/span&gt; &lt;span &gt;Date&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;point&lt;span &gt;.&lt;/span&gt;x&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;toLocaleString&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; value &lt;span &gt;=&lt;/span&gt; point&lt;span &gt;.&lt;/span&gt;y&lt;span &gt;.&lt;/span&gt;&lt;span &gt;toFixed&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; tipContent &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;span&lt;/span&gt; &lt;span &gt;className&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;styles&lt;span &gt;.&lt;/span&gt;tooltipLabel&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;Time&lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;span&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;: &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;time&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;span&lt;/span&gt; &lt;span &gt;className&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;styles&lt;span &gt;.&lt;/span&gt;tooltipLabel&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;Value&lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;span&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;: &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;value&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
    &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

  &lt;span &gt;const&lt;/span&gt; placeRight &lt;span &gt;=&lt;/span&gt; screenX &lt;span &gt;&amp;lt;&lt;/span&gt; width &lt;span &gt;/&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; tipOverlay&lt;span &gt;:&lt;/span&gt; Layout &lt;span &gt;=&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    size&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      width&lt;span &gt;:&lt;/span&gt; placeRight &lt;span &gt;?&lt;/span&gt; width &lt;span &gt;-&lt;/span&gt; screenX &lt;span &gt;:&lt;/span&gt; screenX&lt;span &gt;,&lt;/span&gt;
      height&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    pos&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      x&lt;span &gt;:&lt;/span&gt; placeRight &lt;span &gt;?&lt;/span&gt; screenX &lt;span &gt;:&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      y&lt;span &gt;:&lt;/span&gt; tipY&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;const&lt;/span&gt; tipStyles &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;styles&lt;span &gt;.&lt;/span&gt;tooltip&lt;span &gt;,&lt;/span&gt; placeRight &lt;span &gt;?&lt;/span&gt; styles&lt;span &gt;.&lt;/span&gt;tooltipRight &lt;span &gt;:&lt;/span&gt; styles&lt;span &gt;.&lt;/span&gt;tooltipLeft&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;join&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot; &quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;g&lt;/span&gt; &lt;span &gt;pointerEvents&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;none&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;circle&lt;/span&gt; &lt;span &gt;cx&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;screenX&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;cy&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;screenY&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;r&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;fill&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;none&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;stroke&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;blue&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;strokeWidth&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;tipOverlay&lt;span &gt;.&lt;/span&gt;pos&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;foreignObject&lt;/span&gt; &lt;span &gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;...&lt;/span&gt;tipOverlay&lt;span &gt;.&lt;/span&gt;size&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
          &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;div&lt;/span&gt; &lt;span &gt;className&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;styles&lt;span &gt;.&lt;/span&gt;tooltipContainer&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
            &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;div&lt;/span&gt; &lt;span &gt;className&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;tipStyles&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
              &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;tipContent&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
            &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
          &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;div&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
        &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;foreignObject&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;&lt;span &gt;Translate&lt;/span&gt;&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;/* line indicating hover point */&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;
      &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&lt;/span&gt;line&lt;/span&gt; &lt;span &gt;x1&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;screenX&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;y1&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;x2&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;screenX&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;y2&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;height&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;stroke&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;&lt;/span&gt;darkslategray&lt;span &gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span &gt;strokeWidth&lt;/span&gt;&lt;span &gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &lt;span &gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;span &gt;
    &lt;/span&gt;&lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;/&lt;/span&gt;g&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And yes, you can embed SVG inside this embedded HTML, and then embed
HTML again, ad infinitum, just for kicks.&lt;/p&gt;
&lt;h3 id=&quot;full-example&quot; &gt;&lt;a href=&quot;#full-example&quot; aria-label=&quot;full example permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Full Example&lt;/h3&gt;
&lt;p&gt;Here is a full working example pulling together many of the concepts
discussed above:&lt;/p&gt;
&lt;div&gt;
  &lt;svg-example /&gt;
&lt;/div&gt;
&lt;p&gt;You can check out the source
&lt;a href=&quot;https://github.com/pganalyze/react-svg-example&quot;&gt;here&lt;/a&gt;, though we
reviewed most of it piece by piece in the various sections above.&lt;/p&gt;
&lt;h3 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;SVG can be a great way to extend your app&apos;s UI, and works well
out-of-the-box in React. We&apos;ve found it invaluable in rebuilding the
charting components in pganalyze, and we&apos;ll reach for it again
whenever it seems like a good fit. If you&apos;d like to see all we&apos;ve
discussed in action, applied to real-world use cases, you can check
out the charts in the &lt;a href=&quot;https://pganalyze.com/product-tour&quot;&gt;pganalyze app&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Advanced Active Record: Using Subqueries in Rails]]></title><description><![CDATA[Active Record provides a great balance between the ability to perform simple queries simply, and also the ability to access the raw SQL sometimes required to get our jobs done. In this article, we will see a number of real-life examples of business needs that may arise at our jobs. They will come in the form of a request for data from someone else at the company, where we will first translate the request into SQL, and then into the Rails code necessary to find those records. We will be covering…]]></description><link>https://pganalyze.com/blog/active-record-subqueries-rails</link><guid isPermaLink="false">https://pganalyze.com/blog/active-record-subqueries-rails</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Wed, 24 Jun 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Active Record provides a great balance between the ability to perform simple queries simply, and also the ability to access the raw SQL sometimes required to get our jobs done. In this article, we will see a number of real-life examples of business needs that may arise at our jobs.&lt;/p&gt;
&lt;p&gt;They will come in the form of a request for data from someone else at the company, where we will first translate the request into SQL, and then into the Rails code necessary to find those records. We will be covering five different types of subqueries to help us find the requested data.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#working-with-active-record-in-rails&quot;&gt;Working with Active Record in Rails&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#what-are-subqueries-in-rails&quot;&gt;What are Subqueries in Rails&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#an-overview-of-our-data&quot;&gt;An Overview of our Data&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-where-subquery&quot;&gt;The Where Subquery&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#where-not-exists&quot;&gt;Where Not Exists&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-select-subquery&quot;&gt;The Select Subquery&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-from-subquery&quot;&gt;The From Subquery&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#the-having-subquery&quot;&gt;The Having Subquery&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#you-might-also-be-interested-in&quot;&gt;You might also be interested in&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;Let&apos;s take a look at why subqueries matter:&lt;/p&gt;
&lt;svg version=&quot;1.1&quot; xmlns:xl=&quot;http://www.w3.org/1999/xlink&quot; xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot; viewBox=&quot;29.5 167 879.5 452&quot; width=&quot;879.5&quot; height=&quot;452&quot;&gt;
  &lt;defs&gt;
    &lt;font-face font-family=&quot;Helvetica Neue&quot; font-size=&quot;16&quot; panose-1=&quot;2 0 8 3 0 0 0 9 0 4&quot; units-per-em=&quot;1000&quot; underline-position=&quot;-100&quot; underline-thickness=&quot;50&quot; slope=&quot;0&quot; x-height=&quot;524&quot; cap-height=&quot;722&quot; ascent=&quot;975.0061&quot; descent=&quot;-216.99524&quot; font-weight=&quot;700&quot;&gt;
      &lt;font-face-src&gt;
        &lt;font-face-name name=&quot;HelveticaNeue-Bold&quot;/&gt;
      &lt;/font-face-src&gt;
    &lt;/font-face&gt;
    &lt;marker orient=&quot;auto&quot; overflow=&quot;visible&quot; markerUnits=&quot;strokeWidth&quot; id=&quot;FilledArrow_Marker&quot; stroke-linejoin=&quot;miter&quot; stroke-miterlimit=&quot;10&quot; viewBox=&quot;-1 -4 10 8&quot; markerWidth=&quot;10&quot; markerHeight=&quot;8&quot; color=&quot;#cc0102&quot;&gt;
      &lt;g&gt;
        &lt;path d=&quot;M 8 0 L 0 -3 L 0 3 Z&quot; fill=&quot;currentColor&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
    &lt;/marker&gt;
    &lt;marker orient=&quot;auto&quot; overflow=&quot;visible&quot; markerUnits=&quot;strokeWidth&quot; id=&quot;Arrow_Marker&quot; stroke-linejoin=&quot;miter&quot; stroke-miterlimit=&quot;10&quot; viewBox=&quot;-1 -4 10 8&quot; markerWidth=&quot;10&quot; markerHeight=&quot;8&quot; color=&quot;#346591&quot;&gt;
      &lt;g&gt;
        &lt;path d=&quot;M 8 0 L 0 -3 L 0 3 Z&quot; fill=&quot;none&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
    &lt;/marker&gt;
    &lt;marker orient=&quot;auto&quot; overflow=&quot;visible&quot; markerUnits=&quot;strokeWidth&quot; id=&quot;FilledArrow_Marker_2&quot; stroke-linejoin=&quot;miter&quot; stroke-miterlimit=&quot;10&quot; viewBox=&quot;-1 -4 10 8&quot; markerWidth=&quot;10&quot; markerHeight=&quot;8&quot; color=&quot;#cb0200&quot;&gt;
      &lt;g&gt;
        &lt;path d=&quot;M 8 0 L 0 -3 L 0 3 Z&quot; fill=&quot;currentColor&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
    &lt;/marker&gt;
    &lt;font-face font-family=&quot;Monaco&quot; font-size=&quot;14&quot; units-per-em=&quot;1000&quot; underline-position=&quot;-37.597656&quot; underline-thickness=&quot;75.68359&quot; slope=&quot;0&quot; x-height=&quot;545.41016&quot; cap-height=&quot;757.8125&quot; ascent=&quot;1e3&quot; descent=&quot;-250&quot; font-weight=&quot;400&quot;&gt;
      &lt;font-face-src&gt;
        &lt;font-face-name name=&quot;Monaco&quot;/&gt;
      &lt;/font-face-src&gt;
    &lt;/font-face&gt;
    &lt;font-face font-family=&quot;Monaco&quot; font-size=&quot;13&quot; units-per-em=&quot;1000&quot; underline-position=&quot;-37.597656&quot; underline-thickness=&quot;75.68359&quot; slope=&quot;0&quot; x-height=&quot;545.41016&quot; cap-height=&quot;757.8125&quot; ascent=&quot;1e3&quot; descent=&quot;-250&quot; font-weight=&quot;400&quot;&gt;
      &lt;font-face-src&gt;
        &lt;font-face-name name=&quot;Monaco&quot;/&gt;
      &lt;/font-face-src&gt;
    &lt;/font-face&gt;
  &lt;/defs&gt;
  &lt;g id=&quot;Canvas_1&quot;  fill-opacity=&quot;1&quot; fill=&quot;none&quot; stroke-opacity=&quot;1&quot; stroke=&quot;none&quot;&gt;
    &lt;title&gt;Canvas 1&lt;/title&gt;
    &lt;g id=&quot;Canvas_1: Layer 1&quot;&gt;
      &lt;title&gt;Layer 1&lt;/title&gt;
      &lt;g id=&quot;Graphic_2&quot;&gt;
        &lt;rect x=&quot;765.5&quot; y=&quot;177&quot; width=&quot;133.5&quot; height=&quot;43.75&quot; fill=&quot;#326691&quot;/&gt;
        &lt;text transform=&quot;translate(770.5 189.14294)&quot; fill=&quot;white&quot;&gt;
          &lt;tspan font-family=&quot;Helvetica Neue&quot; font-size=&quot;16&quot; font-weight=&quot;700&quot; fill=&quot;white&quot; x=&quot;27.67&quot; y=&quot;16&quot;&gt;Postgres&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_3&quot;&gt;
        &lt;rect x=&quot;39.5&quot; y=&quot;177&quot; width=&quot;133.5&quot; height=&quot;43.75&quot; fill=&quot;#cc0100&quot;/&gt;
        &lt;text transform=&quot;translate(44.5 189.14294)&quot; fill=&quot;white&quot;&gt;
          &lt;tspan font-family=&quot;Helvetica Neue&quot; font-size=&quot;16&quot; font-weight=&quot;700&quot; fill=&quot;white&quot; x=&quot;42.958&quot; y=&quot;16&quot;&gt;Rails&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_9&quot;&gt;
        &lt;text transform=&quot;translate(309.538 466.27576)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Helvetica Neue&quot; font-size=&quot;16&quot; font-weight=&quot;700&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;Advanced Active Record with Subqueries:&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_12&quot;&gt;
        &lt;line x1=&quot;106.25&quot; y1=&quot;220.75&quot; x2=&quot;106.25&quot; y2=&quot;608.5&quot; stroke=&quot;#c00&quot;  stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_13&quot;&gt;
        &lt;line x1=&quot;831.75&quot; y1=&quot;220.75&quot; x2=&quot;831.75&quot; y2=&quot;608.5&quot; stroke=&quot;#326690&quot;  stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_15&quot;&gt;
        &lt;line x1=&quot;117.5&quot; y1=&quot;296.5&quot; x2=&quot;811.1&quot; y2=&quot;296.5&quot; marker-end=&quot;url(#FilledArrow_Marker)&quot; stroke=&quot;#cc0102&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_17&quot;&gt;
        &lt;line x1=&quot;821&quot; y1=&quot;328&quot; x2=&quot;127.4&quot; y2=&quot;328&quot; marker-end=&quot;url(#Arrow_Marker)&quot; stroke=&quot;#346591&quot;  stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_18&quot;&gt;
        &lt;line x1=&quot;117.5&quot; y1=&quot;375&quot; x2=&quot;811.1&quot; y2=&quot;376.97186&quot; marker-end=&quot;url(#FilledArrow_Marker_2)&quot; stroke=&quot;#cb0200&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_20&quot;&gt;
        &lt;text transform=&quot;translate(350.01 224.35327)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Helvetica Neue&quot; font-size=&quot;16&quot; font-weight=&quot;700&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;16&quot;&gt;Simple usage of Active Record:&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_21&quot;&gt;
        &lt;rect x=&quot;94.5&quot; y=&quot;293&quot; width=&quot;22.5&quot; height=&quot;119&quot; fill=&quot;#cc0100&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_25&quot;&gt;
        &lt;rect x=&quot;821&quot; y=&quot;293&quot; width=&quot;22.5&quot; height=&quot;38.25049&quot; fill=&quot;#326691&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_40&quot;&gt;
        &lt;text transform=&quot;translate(320.5 273.54273)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;14&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;14&quot;&gt;SELECT AVG(salary) FROM employees&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_43&quot;&gt;
        &lt;text transform=&quot;translate(431.8181 309.16504)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;13&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;13&quot;&gt;99306.4&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_44&quot;&gt;
        &lt;text transform=&quot;translate(265.8911 352.83105)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;14&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;14&quot;&gt;SELECT * FROM employees WHERE salary &amp;gt; 99306.4&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_48&quot;&gt;
        &lt;line x1=&quot;821&quot; y1=&quot;408.7495&quot; x2=&quot;127.4&quot; y2=&quot;408.7495&quot; marker-end=&quot;url(#Arrow_Marker)&quot; stroke=&quot;#346591&quot;  stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_47&quot;&gt;
        &lt;rect x=&quot;821&quot; y=&quot;372.64905&quot; width=&quot;22.5&quot; height=&quot;38.25049&quot; fill=&quot;#326691&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_46&quot;&gt;
        &lt;text transform=&quot;translate(435.71875 389.91455)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;13&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;13&quot;&gt;Result&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_53&quot;&gt;
        &lt;line x1=&quot;117.5&quot; y1=&quot;534&quot; x2=&quot;811.1&quot; y2=&quot;534&quot; marker-end=&quot;url(#FilledArrow_Marker_2)&quot; stroke=&quot;#cb0200&quot; stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_52&quot;&gt;
        &lt;text transform=&quot;translate(151.27197 511.0972)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;14&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;14&quot;&gt;SELECT * FROM employees WHERE salary &amp;gt; (SELECT AVG(salary) FROM employees)&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Line_51&quot;&gt;
        &lt;line x1=&quot;821&quot; y1=&quot;565.0156&quot; x2=&quot;127.4&quot; y2=&quot;565.0156&quot; marker-end=&quot;url(#Arrow_Marker)&quot; stroke=&quot;#346591&quot;  stroke-width=&quot;1&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_50&quot;&gt;
        &lt;rect x=&quot;821&quot; y=&quot;529.76514&quot; width=&quot;22.5&quot; height=&quot;38.25049&quot; fill=&quot;#326691&quot;/&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_49&quot;&gt;
        &lt;text transform=&quot;translate(438.71875 546.1807)&quot; fill=&quot;black&quot;&gt;
          &lt;tspan font-family=&quot;Monaco&quot; font-size=&quot;13&quot; font-weight=&quot;400&quot; fill=&quot;black&quot; x=&quot;0&quot; y=&quot;13&quot;&gt;Result&lt;/tspan&gt;
        &lt;/text&gt;
      &lt;/g&gt;
      &lt;g id=&quot;Graphic_54&quot;&gt;
        &lt;rect x=&quot;94.5&quot; y=&quot;530.0156&quot; width=&quot;22.5&quot; height=&quot;38.25049&quot; fill=&quot;#cc0100&quot;/&gt;
      &lt;/g&gt;
    &lt;/g&gt;
  &lt;/g&gt;
&lt;/svg&gt;
&lt;p&gt;In the first case, without subqueries, we are going to the database twice: First to get the average salary, and then again to get the result set. With a subquery, we can avoid the extra roundtrip, getting the result directly with a single query.&lt;/p&gt;
&lt;h2 id=&quot;working-with-active-record-in-rails&quot; &gt;&lt;a href=&quot;#working-with-active-record-in-rails&quot; aria-label=&quot;working with active record in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Working with Active Record in Rails&lt;/h2&gt;
&lt;p&gt;Active Record is a little like a walled garden. It protects us as developers (and our users) from the harsh realities of what lies beyond those walls: differences in SQL between databases (MySQL, Postgres, SQLite) and the need to properly escape strings to avoid &lt;a href=&quot;https://en.wikipedia.org/wiki/SQL_injection&quot;&gt;SQL injection attacks&lt;/a&gt;. It also provides an elegant abstraction for interacting with our database using the language of our choice, Ruby.&lt;/p&gt;
&lt;p&gt;But SQL is extremely powerful! By understanding the SQL that Active Record is executing, we can open the gate in our walled garden to &lt;strong&gt;reach beyond what we may think is possible to accomplish in Rails&lt;/strong&gt;, taking advantage of optimizations and flexibility that may be difficult to achieve otherwise.&lt;/p&gt;
&lt;h2 id=&quot;what-are-subqueries-in-rails&quot; &gt;&lt;a href=&quot;#what-are-subqueries-in-rails&quot; aria-label=&quot;what are subqueries in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What are Subqueries in Rails&lt;/h2&gt;
&lt;p&gt;In this article, we will be learning how to use subqueries in Active Record. Subqueries are what their name implies: A query within a query. We will look at how to embed subqueries into the &lt;code &gt;SELECT&lt;/code&gt;, &lt;code &gt;FROM&lt;/code&gt;, &lt;code &gt;WHERE&lt;/code&gt;, and &lt;code &gt;HAVING&lt;/code&gt; clauses of SQL, to meet the demands of our business counterparts who are asking to view data in different and interesting ways.&lt;/p&gt;
&lt;p&gt;We&apos;ll be playing the role of a developer fielding questions from HR. They are asking for reports about our employees at BCE (Best Company Ever), and we&apos;ll do our best to find the data they need using Active Record.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/pganalyze/subqueries-rails-example&quot;&gt;source code for this article&lt;/a&gt; is available on GitHub.&lt;/p&gt;
&lt;h2 id=&quot;an-overview-of-our-data&quot; &gt;&lt;a href=&quot;#an-overview-of-our-data&quot; aria-label=&quot;an overview of our data permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;An Overview of our Data&lt;/h2&gt;
&lt;p&gt;Our database has 4 tables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;roles&lt;/strong&gt;: The job roles of our employees (Finance, Engineering, Sales, HR, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;employees&lt;/strong&gt;: The people that work for BCE&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;performance_reviews&lt;/strong&gt;: Performance reviews carried out by an employee&apos;s manager, giving them a score between 0 and 100&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vacations&lt;/strong&gt;: Keeping track of when employees have taken vacation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using &lt;a href=&quot;https://dbdiagram.io/&quot;&gt;https://dbdiagram.io/&lt;/a&gt; we&apos;re able to see how these tables relate to each other:&lt;/p&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/d18c3ddf9bf418f5b36c85d3e8f623a5/c2d9c/subqueries-rails-diagram-dark.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;An Overview of how 4 tables in our database relate to each other&quot; title=&quot;An Overview of how 4 tables in our database relate to each other&quot; src=&quot;https://pganalyze.com/static/d18c3ddf9bf418f5b36c85d3e8f623a5/1d69c/subqueries-rails-diagram-dark.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;If you are following along, the &lt;code &gt;rails db:seed&lt;/code&gt; command will generate 1,000 employees, 1,000 vacations, and 10,000 performance reviews.&lt;/p&gt;
&lt;h2 id=&quot;the-where-subquery&quot; &gt;&lt;a href=&quot;#the-where-subquery&quot; aria-label=&quot;the where subquery permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Where Subquery&lt;/h2&gt;
&lt;p&gt;Now that we have our data set and we’re ready to go, let’s help our HR team with their first request:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Leigh, could you find us all the employees that make &lt;em&gt;more than the average salary&lt;/em&gt; at BCE?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here we will use a subquery within the &lt;code &gt;WHERE&lt;/code&gt; clause to find the employees that match HR&apos;s request:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; employees
&lt;span &gt;WHERE&lt;/span&gt;
  employees&lt;span &gt;.&lt;/span&gt;salary &lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;salary&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; employees&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My first attempt at replicating the query above looked like this:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;salary &gt; :avg&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; avg&lt;span &gt;:&lt;/span&gt; &lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;average&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:salary&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;But what it produced was two queries&lt;/em&gt;: one to find the average, and a second to query employees with a salary greater than that number. Not technically wrong, but &lt;strong&gt;it doesn&apos;t line up with the SQL we were going for.&lt;/strong&gt; Two round trips to the database server may also cost performance, and the results can be inconsistent if, say, a new employee making $1B/year is hired between queries one and two. Although this is unlikely in this particular scenario, it’s a risk worth keeping in mind.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;-- find the average&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;AVG&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;employees&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;employees&quot;&lt;/span&gt;
&lt;span &gt;-- find the employees&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;employees&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;employees&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;salary &lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;99306.4&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
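&lt;p&gt;The consistency risk of the two-query approach can be sketched in plain Ruby, without a database. This is only an illustration: the rows, names, and the &lt;code &gt;average_salary&lt;/code&gt; helper are hypothetical stand-ins, not the article&apos;s seed data or any Active Record API.&lt;/p&gt;

```ruby
# In-memory stand-in for the employees table (hypothetical rows).
employees = [
  { name: "Alice", salary: 90_000.0 },
  { name: "Bob", salary: 110_000.0 }
]

# Helper: arithmetic mean of the salary column.
average_salary = ->(rows) { rows.sum { |row| row[:salary] } / rows.size }

# Round trip 1: fetch the average.
stale_avg = average_salary.call(employees)

# A new, very highly paid employee is hired between the two queries.
employees.push({ name: "Carol", salary: 1_000_000_000.0 })

# Round trip 2: filter using the average computed before the hire.
two_query_names = employees
  .select { |row| row[:salary] > stale_avg }
  .map { |row| row[:name] }

# Single statement: the average and the filter see the same rows.
fresh_avg = average_salary.call(employees)
one_query_names = employees
  .select { |row| row[:salary] > fresh_avg }
  .map { |row| row[:name] }

puts two_query_names.inspect
puts one_query_names.inspect
```

&lt;p&gt;With the stale average, Bob still counts as above average; once the average and the filter are evaluated over the same rows, only Carol does.&lt;/p&gt;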
&lt;p&gt;What we shouldn’t forget about &lt;a href=&quot;https://guides.rubyonrails.org/active_record_querying.html&quot;&gt;Active Record&lt;/a&gt; is that certain methods, such as &lt;code &gt;average(:salary)&lt;/code&gt;, actually execute the query and return a result, while other methods implement &lt;a href=&quot;https://en.wikipedia.org/wiki/Method_chaining&quot;&gt;Method Chaining&lt;/a&gt;, allowing you to chain multiple Active Record methods together, building up more complex SQL statements prior to their execution.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;salary &gt; (:avg)&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; avg&lt;span &gt;:&lt;/span&gt; &lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;avg(salary)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This produces the SQL we want, but note that we had to wrap the &lt;code &gt;:avg&lt;/code&gt; placeholder in parentheses, because the database expects subqueries to be wrapped in parentheses as well.&lt;/p&gt;
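&lt;p&gt;The difference between methods that execute right away and methods that only build up a query can be sketched with a toy query builder. &lt;code &gt;ToyRelation&lt;/code&gt; below is hypothetical and heavily simplified; it is not how Active Record is implemented, and the value returned by &lt;code &gt;average&lt;/code&gt; is hard-coded for illustration.&lt;/p&gt;

```ruby
# Toy stand-in for an Active Record relation (hypothetical, simplified).
class ToyRelation
  attr_reader :executed_queries

  def initialize(table, columns: "*", conditions: [], log: [])
    @table = table
    @columns = columns
    @conditions = conditions
    @executed_queries = log
  end

  # Chainable: returns a new relation and runs nothing.
  def select(columns)
    ToyRelation.new(@table, columns: columns, conditions: @conditions, log: @executed_queries)
  end

  # Chainable: a relation passed as a bind value is inlined as SQL text.
  def where(template, binds = {})
    sql = binds.reduce(template) do |text, (name, value)|
      rendered = value.respond_to?(:to_sql) ? value.to_sql : value.to_s
      text.sub(":#{name}", rendered)
    end
    ToyRelation.new(@table, columns: @columns, conditions: @conditions + [sql], log: @executed_queries)
  end

  def to_sql
    sql = "SELECT #{@columns} FROM #{@table}"
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql
  end

  # Terminal: "executes" immediately (logged here) and returns a plain number.
  def average(column)
    @executed_queries.push(select("AVG(#{column})").to_sql)
    99_306.4 # hard-coded result, for illustration only
  end
end

log = []
employees = ToyRelation.new("employees", log: log)

# Eager: average runs its own query first, then its result is interpolated.
eager = employees.where("salary > :avg", avg: employees.average(:salary))

# Lazy: select only builds SQL, so it ends up embedded as a subquery.
lazy = employees.where("salary > (:avg)", avg: employees.select("avg(salary)"))

puts eager.to_sql
puts lazy.to_sql
puts log.inspect
```

&lt;p&gt;In this sketch only the eager version adds an entry to the query log before the outer statement is even built; the lazy version stays a single SQL string until something asks for its results.&lt;/p&gt;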
&lt;p&gt;Because the seed data is generated randomly, your results will vary from mine, but I am seeing &lt;em&gt;487&lt;/em&gt; matching employees, getting a result that looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;#&amp;lt;ActiveRecord::Relation [#&amp;lt;Employee id: 4, role_id: 5, name: &quot;Bob Williams&quot;, salary: 127053.0, created_at: &quot;2020-04-26 18:42:53&quot;, updated_at: &quot;2020-04-26 18:42:53&quot;&gt;, #&amp;lt;Employee id: 5, role_id: 4, name: &quot;Bob Florez&quot;, salary: 149218.0, created_at: &quot;2020-04-26 18:42:53&quot;, updated_at: &quot;2020-04-26 18:42:53&quot;&gt;, ...]&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;where-not-exists&quot; &gt;&lt;a href=&quot;#where-not-exists&quot; aria-label=&quot;where not exists permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Where Not Exists&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Leigh, we would like to encourage employees to have a healthy work-life balance, and were hoping you could provide us with a list of all the &lt;em&gt;employees who have yet to take any vacation time&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For this case, &lt;code &gt;NOT EXISTS&lt;/code&gt; is a perfect fit, since it only matches records that &lt;strong&gt;do not&lt;/strong&gt; have a match in the subquery. An alternative is to perform a left outer join, only choosing the records with no matches on the right side. This is referred to as an &lt;a href=&quot;https://gerardnico.com/data/type/relation/sql/anti_join&quot;&gt;anti-join&lt;/a&gt;, where the purpose of the join is to find records that &lt;strong&gt;do not&lt;/strong&gt; have a matching record.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; employees
&lt;span &gt;WHERE&lt;/span&gt;
  &lt;span &gt;NOT&lt;/span&gt; &lt;span &gt;EXISTS&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; vacations
    &lt;span &gt;WHERE&lt;/span&gt; vacations&lt;span &gt;.&lt;/span&gt;employee_id &lt;span &gt;=&lt;/span&gt; employees&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you&apos;re interested in the &lt;strong&gt;LEFT OUTER JOIN&lt;/strong&gt; equivalent, it might look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; employees&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt;
  employees
  &lt;span &gt;LEFT&lt;/span&gt; &lt;span &gt;OUTER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; vacations &lt;span &gt;ON&lt;/span&gt; vacations&lt;span &gt;.&lt;/span&gt;employee_id &lt;span &gt;=&lt;/span&gt; employees&lt;span &gt;.&lt;/span&gt;id
&lt;span &gt;WHERE&lt;/span&gt; vacations&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;IS&lt;/span&gt; &lt;span &gt;NULL&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The subquery depends on a match between the &lt;code &gt;employees.id&lt;/code&gt; column and the &lt;code &gt;vacations.employee_id&lt;/code&gt; column, making it a &lt;a href=&quot;https://learnsql.com/blog/correlated-sql-subqueries-newbies/&quot;&gt;correlated subquery&lt;/a&gt;. Because Rails follows standard naming conventions for table names (the downcased plural form of our model name), we know the outer query reads from a table called &lt;code &gt;employees&lt;/code&gt;, so we can reference it in our subquery&apos;s condition without too much difficulty.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;NOT EXISTS (:vacations)&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  vacations&lt;span &gt;:&lt;/span&gt; &lt;span &gt;Vacation&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;1&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;employees.id = vacations.employee_id&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using my seed data, I am seeing &lt;em&gt;369&lt;/em&gt; employees that have yet to take any vacations.&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;#&amp;lt;ActiveRecord::Relation [#&amp;lt;Employee id: 2, role_id: 2, name: &quot;Alice Florez&quot;, salary: 86920.0, created_at: &quot;2020-04-26 18:42:53&quot;, updated_at: &quot;2020-04-26 18:42:53&quot;&gt;, #&amp;lt;Employee id: 5, role_id: 4, name: &quot;Bob Florez&quot;, salary: 149218.0, created_at: &quot;2020-04-26 18:42:53&quot;, updated_at: &quot;2020-04-26 18:42:53&quot;&gt;, ...]&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
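&lt;p&gt;As a rough, database-free illustration of what the correlated &lt;code &gt;NOT EXISTS&lt;/code&gt; check does, the following plain Ruby sketch filters one array against another. The in-memory &lt;code &gt;employees&lt;/code&gt; and &lt;code &gt;vacations&lt;/code&gt; arrays are hypothetical stand-ins for the two tables, not the seed data used in this article.&lt;/p&gt;

```ruby
# Hypothetical in-memory stand-ins for the employees and vacations tables.
employees = [
  { id: 1, name: "Joe" },
  { id: 2, name: "Alice" },
  { id: 3, name: "Amanda" }
]
vacations = [
  { id: 10, employee_id: 1 },
  { id: 11, employee_id: 1 },
  { id: 12, employee_id: 3 }
]

# Correlated check: for each employee, look for any vacation row that
# references that employee, and keep the employee only when none exists.
without_vacations = employees.reject do |employee|
  vacations.any? { |vacation| vacation[:employee_id] == employee[:id] }
end

puts without_vacations.map { |employee| employee[:name] }.inspect
```

&lt;p&gt;Only Alice has no matching vacation row, so she is the only employee kept. The database version expresses the same idea, but lets Postgres choose an efficient plan instead of scanning an array for every employee.&lt;/p&gt;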
&lt;h2 id=&quot;the-select-subquery&quot; &gt;&lt;a href=&quot;#the-select-subquery&quot; aria-label=&quot;the select subquery permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Select Subquery&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Leigh, could you provide us with a list of employees, &lt;em&gt;including the average salary&lt;/em&gt; of a BCE employee, and how much this &lt;em&gt;employee&apos;s salary differs from the average&lt;/em&gt;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  &lt;span &gt;*&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;(&lt;/span&gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;salary&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; employees&lt;span &gt;)&lt;/span&gt; avg_salary&lt;span &gt;,&lt;/span&gt;
  salary &lt;span &gt;-&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;salary&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; employees&lt;span &gt;)&lt;/span&gt; avg_difference
&lt;span &gt;FROM&lt;/span&gt; employees&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because the subquery is repeated, we can save ourselves a little bit of hassle by placing the subquery SQL into a variable that we&apos;ll embed into the outer query. The &lt;code &gt;to_sql&lt;/code&gt; method is perfect for this, but it&apos;s also fantastic to peek into the SQL that Rails is producing without actually executing the query.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;avg_sql &lt;span &gt;=&lt;/span&gt; &lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;avg(salary)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_sql

&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;*&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;(&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;avg_sql&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;) avg_salary&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;salary - (&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;avg_sql&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;) avg_difference&quot;&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query does not limit the results in any way, but instead selects two additional columns (&lt;code &gt;avg_salary&lt;/code&gt; and &lt;code &gt;avg_difference&lt;/code&gt;). Looking at the first three results, I am seeing:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;[&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Joe Serna&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;86340.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;99306.4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_difference&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;12966.399999999994&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Alice Florez&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;86920.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;99306.4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_difference&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;12386.399999999994&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Amanda Florez&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;93600.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;99306.4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_difference&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;-&lt;/span&gt;&lt;span &gt;5706.399999999994&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As with any SQL query, there are often many ways to arrive at the same result. In this example we used subqueries to find the average employee salary, but it may have been better to use &lt;a href=&quot;https://www.postgresql.org/docs/current/tutorial-window.html&quot;&gt;window functions&lt;/a&gt; instead. They give us the same result, but provide a simpler query which is actually more performant as well. Even on a small dataset of 1000 employees, this query takes approximately 12ms vs 18ms for the subquery equivalent.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  &lt;span &gt;*&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;salary&lt;span &gt;)&lt;/span&gt; &lt;span &gt;OVER&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; avg_salary&lt;span &gt;,&lt;/span&gt;
  salary &lt;span &gt;-&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;salary&lt;span &gt;)&lt;/span&gt; &lt;span &gt;OVER&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; avg_difference
&lt;span &gt;FROM&lt;/span&gt;
  employees&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The window function approach is actually easier to write in Rails as well!&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;*&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;avg(salary) OVER () avg_salary&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;salary - avg(salary) OVER () avg_difference&quot;&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
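&lt;p&gt;To picture what an empty &lt;code &gt;OVER ()&lt;/code&gt; window does, here is a minimal plain Ruby sketch: one aggregate is computed over all rows and then attached to every row. The three employees and their round salaries are made up for illustration.&lt;/p&gt;

```ruby
# Hypothetical rows; the salaries are chosen so the arithmetic is easy to follow.
employees = [
  { name: "Joe", salary: 80_000.0 },
  { name: "Alice", salary: 100_000.0 },
  { name: "Amanda", salary: 120_000.0 }
]

# The aggregate is computed once over the whole set of rows...
avg_salary = employees.sum { |employee| employee[:salary] } / employees.size

# ...and every row keeps its own columns plus the shared aggregate.
rows = employees.map do |employee|
  employee.merge(
    avg_salary: avg_salary,
    avg_difference: employee[:salary] - avg_salary
  )
end

rows.each { |row| puts row.inspect }
```

&lt;p&gt;Unlike a &lt;code &gt;GROUP BY&lt;/code&gt;, no rows are collapsed: three rows go in and three rows come out, each carrying the same &lt;code &gt;avg_salary&lt;/code&gt; of 100000.0.&lt;/p&gt;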
&lt;h2 id=&quot;the-from-subquery&quot; &gt;&lt;a href=&quot;#the-from-subquery&quot; aria-label=&quot;the from subquery permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The From Subquery&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Leigh, we&apos;d like to know the &lt;em&gt;average performance review score&lt;/em&gt; given across all our managers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After clarifying with HR, they are looking to take the average score each manager has given, and then take the average of those averages. In other words, the average average. When you are dealing with an &lt;strong&gt;aggregate of aggregates&lt;/strong&gt;, it needs to be accomplished in two steps. This can be done using a subquery as the &lt;code &gt;FROM&lt;/code&gt; clause, essentially giving us a temporary table to then select from, allowing us to find the average of those averages.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;avg_score&lt;span &gt;)&lt;/span&gt; reviewer_avg
&lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;SELECT&lt;/span&gt; reviewer_id&lt;span &gt;,&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;score&lt;span &gt;)&lt;/span&gt; avg_score
  &lt;span &gt;FROM&lt;/span&gt; performance_reviews
  &lt;span &gt;GROUP&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; reviewer_id&lt;span &gt;)&lt;/span&gt; reviewer_avgs&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To keep our Ruby code clean, we&apos;ll place the subquery into a variable which can then be embedded into the main query.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;from_sql &lt;span &gt;=&lt;/span&gt;
  &lt;span &gt;PerformanceReview&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:reviewer_id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;avg(score) avg_score&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;group&lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;:reviewer_id&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_sql

&lt;span &gt;PerformanceReview&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;avg(avg_score) reviewer_avg&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;from&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&quot;(&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;from_sql&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;) as reviewer_avgs&quot;&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;take&lt;span &gt;.&lt;/span&gt;reviewer_avg&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result of this query is &lt;code &gt;50.652&lt;/code&gt;. This makes sense given that the seed data used a random value between 1 and 100 (&lt;code &gt;rand(1..100)&lt;/code&gt;).&lt;/p&gt;
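&lt;p&gt;It is worth seeing why the two-step approach matters: the average of per-reviewer averages is generally not the same number as the plain average over all reviews. The following plain Ruby sketch, using a small made-up set of reviews, computes both.&lt;/p&gt;

```ruby
# Hypothetical reviews: reviewer 1 wrote three reviews, reviewer 2 wrote one.
reviews = [
  { reviewer_id: 1, score: 90 },
  { reviewer_id: 1, score: 70 },
  { reviewer_id: 1, score: 80 },
  { reviewer_id: 2, score: 20 }
]

# Step 1 (the inner query): one average per reviewer.
per_reviewer = reviews.group_by { |review| review[:reviewer_id] }.map do |_reviewer_id, rows|
  rows.sum { |review| review[:score] }.to_f / rows.size
end

# Step 2 (the outer query): the average of those averages.
reviewer_avg = per_reviewer.sum / per_reviewer.size

# For comparison: a single average over every review.
overall_avg = reviews.sum { |review| review[:score] }.to_f / reviews.size

puts "average of averages: #{reviewer_avg}"
puts "overall average:     #{overall_avg}"
```

&lt;p&gt;Here the average of averages is 50.0 while the overall average is 65.0, because the second approach gives more weight to the reviewer who wrote more reviews. HR asked for the first number, which is why the subquery in the &lt;code &gt;FROM&lt;/code&gt; clause is needed.&lt;/p&gt;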
&lt;h2 id=&quot;the-having-subquery&quot; &gt;&lt;a href=&quot;#the-having-subquery&quot; aria-label=&quot;the having subquery permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Having Subquery&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Leigh, certain reviewers are consistently giving low performance review scores. Could you find us a list of all the managers whose average score is 25% below our company average? We need to find out what is happening.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We will start by joining the &lt;code &gt;employees&lt;/code&gt; table to the &lt;code &gt;performance_reviews&lt;/code&gt; table where the employee is the reviewer (a manager), and then take the average of the scores they have given. Then we will filter these managers using a &lt;code &gt;HAVING&lt;/code&gt; clause to only include those whose average score is less than 75% of the company average, in other words more than 25% below it.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  employees&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;score&lt;span &gt;)&lt;/span&gt; avg_score&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;(&lt;/span&gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;score&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; performance_reviews&lt;span &gt;)&lt;/span&gt; company_avg
&lt;span &gt;FROM&lt;/span&gt;
  employees
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; performance_reviews
    &lt;span &gt;ON&lt;/span&gt; performance_reviews&lt;span &gt;.&lt;/span&gt;reviewer_id &lt;span &gt;=&lt;/span&gt; employees&lt;span &gt;.&lt;/span&gt;id
&lt;span &gt;GROUP&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; employees&lt;span &gt;.&lt;/span&gt;id
&lt;span &gt;HAVING&lt;/span&gt;
  &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;score&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;0.75&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
    &lt;span &gt;(&lt;/span&gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;avg&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;score&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;FROM&lt;/span&gt; performance_reviews&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You&apos;ll notice that the above SQL actually includes &lt;strong&gt;two&lt;/strong&gt; subqueries, and that they are identical. In the Rails version, we save the subquery SQL to a variable (&lt;code &gt;avg_sql&lt;/code&gt;) so that we can reuse it both within the &lt;code &gt;SELECT&lt;/code&gt; portion of the query and within the &lt;code &gt;HAVING&lt;/code&gt; clause.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;avg_sql &lt;span &gt;=&lt;/span&gt; &lt;span &gt;PerformanceReview&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;avg(score)&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_sql

&lt;span &gt;Employee&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;joins&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:employee_reviews&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;select&lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;&apos;employees.*&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&apos;avg(score) avg_score&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;(&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;avg_sql&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;) company_avg&quot;&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;group&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;employees.id&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;having&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;avg(score) &amp;lt; 0.75 * (&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;avg_sql&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;)&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result of this query gives me 103 employees, and the first three of them look like:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;[&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;173&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Bob Williams&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;109206.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_score&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;23.75&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;company_avg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;50.04&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;390&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Bob Serna&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;127559.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_score&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;26.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;company_avg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;50.04&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; 
  &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;802&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;role_id&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;Alice Halliday&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;salary&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;94956.0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;avg_score&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;35.88&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;company_avg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;50.04&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
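&lt;p&gt;The grouping and filtering logic of the &lt;code &gt;HAVING&lt;/code&gt; clause can be sketched in plain Ruby as well. The review data below is invented for the example, and the 0.75 factor mirrors the one used in the query above.&lt;/p&gt;

```ruby
# Hypothetical reviews written by three managers.
reviews = [
  { reviewer_id: 1, score: 20 },
  { reviewer_id: 1, score: 30 },
  { reviewer_id: 2, score: 60 },
  { reviewer_id: 2, score: 70 },
  { reviewer_id: 3, score: 50 },
  { reviewer_id: 3, score: 70 }
]

# The subquery: a single company-wide average, computed once and reused.
company_avg = reviews.sum { |review| review[:score] }.to_f / reviews.size
threshold = 0.75 * company_avg

# GROUP BY reviewer: one average score per manager.
avg_by_reviewer = reviews.group_by { |review| review[:reviewer_id] }.transform_values do |rows|
  rows.sum { |review| review[:score] }.to_f / rows.size
end

# HAVING: keep only the groups whose average falls below the threshold.
low_scorers = avg_by_reviewer.select { |_reviewer_id, avg_score| threshold > avg_score }

puts "company average: #{company_avg}, threshold: #{threshold}"
puts low_scorers.inspect
```

&lt;p&gt;With a company average of 50.0 the threshold is 37.5, so only reviewer 1 (average 25.0) is kept. Note that the filter runs on the grouped averages rather than on individual reviews, which is exactly the difference between &lt;code &gt;HAVING&lt;/code&gt; and &lt;code &gt;WHERE&lt;/code&gt;.&lt;/p&gt;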
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article we saw a number of (somewhat) real-life examples of business needs being translated first into SQL, and then into the Rails code necessary to find those records. A backend developer&apos;s career will most likely consist of hundreds of similar requests!&lt;/p&gt;
&lt;p&gt;Active Record gives us the ability to perform simple queries simply, but also lets us access the raw SQL which is sometimes required to get our jobs done. Subqueries are a perfect example of that, and we saw how to create subqueries in Rails and Active Record in the &lt;code &gt;SELECT&lt;/code&gt;, &lt;code &gt;FROM&lt;/code&gt;, &lt;code &gt;WHERE&lt;/code&gt;, and &lt;code &gt;HAVING&lt;/code&gt; clauses of an SQL statement. As we have seen in the examples above, with the expressiveness of Active Record, one doesn’t have to resort to writing completely in SQL to use a subquery.&lt;/p&gt;
&lt;h2 id=&quot;you-might-also-be-interested-in&quot; &gt;&lt;a href=&quot;#you-might-also-be-interested-in&quot; aria-label=&quot;you might also be interested in permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;You might also be interested in&lt;/h2&gt;
&lt;p&gt;Learn more about how to make the most of Postgres and Ruby on Rails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;eBook: Best Practices for Optimizing Postgres Query Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/materialized-views-ruby-rails&quot;&gt;Effectively Using Materialized Views in Ruby on Rails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres&quot;&gt;Efficient GraphQL queries in Ruby on Rails &amp;#x26; Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the &lt;a href=&quot;https://pganalyze.com/&quot;&gt;pganalyze&lt;/a&gt; blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Full Text Search in Milliseconds with Rails and PostgreSQL]]></title><description><![CDATA[Imagine the following scenario: You have a database full of job titles and descriptions, and you’re trying to find the best match. Typically you’d start by using an ILIKE expression, but this requires the search phrase to be an exact match. Then you might use trigrams, allowing spelling mistakes and inexact matches based on word similarity, but this makes it difficult to search using multiple words. What you really want to use is Full Text Search, providing the benefits of ILIKE and trigrams…]]></description><link>https://pganalyze.com/blog/full-text-search-ruby-rails-postgres</link><guid isPermaLink="false">https://pganalyze.com/blog/full-text-search-ruby-rails-postgres</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Thu, 16 Apr 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p &gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/9d4c1a81657f389003bb2790dc2af0ec/2cefc/header.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Postgres Full Text Search Example&quot; title=&quot;Postgres Full Text Search Example&quot; src=&quot;https://pganalyze.com/static/9d4c1a81657f389003bb2790dc2af0ec/1d69c/header.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Imagine the following scenario:&lt;/strong&gt; You have a database full of job titles and descriptions, and you’re trying to find the best match. Typically you’d start by using an &lt;a href=&quot;https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE&quot;&gt;ILIKE expression&lt;/a&gt;, but this requires the search phrase to be an exact match. Then you might use &lt;a href=&quot;https://pganalyze.com/blog/similarity-in-postgres-and-ruby-on-rails-using-trigrams&quot;&gt;trigrams&lt;/a&gt;, allowing spelling mistakes and inexact matches based on word similarity, but this makes it difficult to search using multiple words. What you really want to use is &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch.html&quot;&gt;Full Text Search&lt;/a&gt;, providing the benefits of ILIKE and trigrams, with the added ability to easily search through large documents using natural language.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-foundations-of-full-text-search&quot;&gt;The Foundations of Full Text Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#implementing-postgres-full-text-search-in-rails&quot;&gt;Implementing Postgres Full Text Search in Rails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#configuring-pg_search&quot;&gt;Configuring pg_search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#optimizing-full-text-search-queries-in-rails&quot;&gt;Optimizing Full Text Search Queries in Rails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#you-might-also-be-interested-in&quot;&gt;You might also be interested in&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the author&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;To summarize, here is a quick overview of popular built-in Postgres search options:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Postgres Feature&lt;/th&gt;
&lt;th&gt;Typical Use Case&lt;/th&gt;
&lt;th&gt;Can be indexed?&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LIKE/ILIKE&lt;/td&gt;
&lt;td&gt;Wildcard-style search for small data&lt;/td&gt;
&lt;td&gt;Sometimes&lt;/td&gt;
&lt;td&gt;Unpredictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pg_trgm&lt;/td&gt;
&lt;td&gt;Similarity search for names, etc&lt;/td&gt;
&lt;td&gt;Yes (GIN/GIST)&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Text Search&lt;/td&gt;
&lt;td&gt;Natural language search&lt;/td&gt;
&lt;td&gt;Yes (GIN/GIST)&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this article, we are going to learn about the inner workings of Full Text Search in Postgres and how to easily integrate Full Text Search into your Rails application using a fantastic gem named &lt;a href=&quot;https://rubygems.org/gems/pg_search&quot;&gt;pg_search&lt;/a&gt;. We will learn how to search multiple columns at once, to give one column precedence over another, and how to optimize our Full Text Search implementation, taking a single query from &lt;strong&gt;130ms&lt;/strong&gt; to &lt;strong&gt;7ms&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The full source code used in this article can &lt;a href=&quot;https://github.com/pganalyze/full-text-search-rails&quot;&gt;be found here&lt;/a&gt;. Instructions on how to run this application locally and how to load the sample data referenced within this article can be found in the README.&lt;/p&gt;
&lt;p&gt;If you are interested in efficient &lt;a href=&quot;https://pganalyze.com/blog/full-text-search-django-postgres&quot;&gt;Full Text Search in Postgres with Django&lt;/a&gt;, you can read our article about it.&lt;/p&gt;
&lt;h2 id=&quot;the-foundations-of-full-text-search&quot; &gt;&lt;a href=&quot;#the-foundations-of-full-text-search&quot; aria-label=&quot;the foundations of full text search permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Foundations of Full Text Search&lt;/h2&gt;
&lt;p&gt;Let&apos;s break down the basics of Full Text Search, defining and explaining some of the most common terms you&apos;ll run into. Taking the text “looking for the right words”, we can see how Postgres stores this data internally, using the &lt;code &gt;to_tsvector&lt;/code&gt; function:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;looking for the right words&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- &apos;look&apos;:1 &apos;right&apos;:4 &apos;word&apos;:5&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the above SQL we have some text, often referred to as a &lt;code &gt;document&lt;/code&gt; when talking about Full Text Search. A document &lt;em&gt;must&lt;/em&gt; be parsed and converted into a special data type called a &lt;code &gt;tsvector&lt;/code&gt;, which we did using the function &lt;code &gt;to_tsvector&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code &gt;tsvector&lt;/code&gt; data type is made up of &lt;a href=&quot;https://en.wikipedia.org/wiki/Lexeme&quot;&gt;lexemes&lt;/a&gt;. Lexemes are the &lt;a href=&quot;https://github.com/Casecommons/pg_search#normalization&quot;&gt;normalized key words&lt;/a&gt; extracted from the document, and they are what a search is matched against. In this case we used the &lt;code &gt;english&lt;/code&gt; language dictionary to normalize the words, breaking them down to their root. This means that &lt;code &gt;words&lt;/code&gt; became &lt;code &gt;word&lt;/code&gt;, and &lt;code &gt;looking&lt;/code&gt; became &lt;code &gt;look&lt;/code&gt;, while &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS&quot;&gt;very common words&lt;/a&gt; such as &lt;code &gt;for&lt;/code&gt; and &lt;code &gt;the&lt;/code&gt; were removed completely, to avoid false positives.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;looking for the right words&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; @@ to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;words&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- TRUE&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;@@&lt;/code&gt; operator allows us to check if a query (data type &lt;code &gt;tsquery&lt;/code&gt;) exists within a document (data type &lt;code &gt;tsvector&lt;/code&gt;). Much like &lt;code &gt;tsvector&lt;/code&gt;, &lt;code &gt;tsquery&lt;/code&gt; is also normalized prior to searching the document for matches.&lt;/p&gt;
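&lt;p&gt;As a small sketch of that normalization (the search term &lt;code &gt;wrong&lt;/code&gt; is only an illustrative choice), we can look at the &lt;code &gt;tsquery&lt;/code&gt; on its own and then try a term that does not appear in the document:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- The query term is stemmed the same way the document was
SELECT to_tsquery(&apos;english&apos;, &apos;words&apos;);
-- &apos;word&apos;

-- A term whose lexeme is not in the document does not match
SELECT to_tsvector(&apos;english&apos;, &apos;looking for the right words&apos;) @@ to_tsquery(&apos;english&apos;, &apos;wrong&apos;);
-- FALSE&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because both sides are reduced to the same lexeme, a search for &lt;code &gt;words&lt;/code&gt; and a search for &lt;code &gt;word&lt;/code&gt; should match the same documents.&lt;/p&gt;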
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;looking for the right words&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;words&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
   &lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- 0.06079271&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;ts_rank&lt;/code&gt; function takes a &lt;code &gt;tsvector&lt;/code&gt; and a &lt;code &gt;tsquery&lt;/code&gt;, returning a number that can be used when sorting the matching records, allowing us to sort the results from highest to lowest ranking.&lt;/p&gt;
&lt;p&gt;Now that you have seen a few examples, let’s have a look at one last one before getting to Rails. Below is a query that searches through the &lt;code &gt;jobs&lt;/code&gt; table, where we are storing the &lt;code &gt;title&lt;/code&gt; and &lt;code &gt;description&lt;/code&gt; of each job. Here we are searching for the words &lt;code &gt;ruby&lt;/code&gt; and &lt;code &gt;rails&lt;/code&gt;, grabbing the 3 highest ranking results.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  id&lt;span &gt;,&lt;/span&gt;
  title&lt;span &gt;,&lt;/span&gt;
  ts_rank&lt;span &gt;(&lt;/span&gt;
    to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;ruby &amp;amp; rails&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank
&lt;span &gt;FROM&lt;/span&gt; jobs
&lt;span &gt;WHERE&lt;/span&gt;
  to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; title&lt;span &gt;)&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;)&lt;/span&gt; @@
  to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;ruby &amp;amp; rails&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; rank &lt;span &gt;DESC&lt;/span&gt;
&lt;span &gt;LIMIT&lt;/span&gt; &lt;span &gt;3&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The highest ranking result is a job with the title &quot;Ruby on Rails Developer&quot;... perfect! The full results of this query are:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;[&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;title&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Ruby on Rails Developer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;rank&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.40266925&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;109&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;title&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Senior Ruby Developer - Remote&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;rank&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.26552397&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;&quot;id&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;151&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;title&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Team-Lead Developer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    &lt;span &gt;&quot;rank&quot;&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;0.14533159&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query is actually concatenating (using &lt;code &gt;||&lt;/code&gt;) two &lt;code &gt;tsvector&lt;/code&gt; fields together. This allows us to search both the &lt;code &gt;title&lt;/code&gt; and the &lt;code &gt;description&lt;/code&gt; at the same time. Later, we&apos;ll see how to give additional weight (precedence) to the &lt;code &gt;title&lt;/code&gt; column.&lt;/p&gt;
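&lt;p&gt;To get a feel for what &lt;code &gt;||&lt;/code&gt; does to two &lt;code &gt;tsvector&lt;/code&gt; values, here is a minimal sketch using hand-written &lt;code &gt;tsvector&lt;/code&gt; literals rather than data from the &lt;code &gt;jobs&lt;/code&gt; table (it follows the example in the Postgres documentation). Positions from the second vector are shifted so that they come after the positions of the first:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Concatenate two tsvector literals; &apos;b&apos; appears in both
SELECT &apos;a:1 b:2&apos;::tsvector || &apos;c:1 d:2 b:3&apos;::tsvector;
-- &apos;a&apos;:1 &apos;b&apos;:2,5 &apos;c&apos;:3 &apos;d&apos;:4&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result is a single &lt;code &gt;tsvector&lt;/code&gt; that behaves as if both texts had been one document, which is why one &lt;code &gt;@@&lt;/code&gt; check can cover both columns.&lt;/p&gt;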
&lt;h2 id=&quot;implementing-postgres-full-text-search-in-rails&quot; &gt;&lt;a href=&quot;#implementing-postgres-full-text-search-in-rails&quot; aria-label=&quot;implementing postgres full text search in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Implementing Postgres Full Text Search in Rails&lt;/h2&gt;
&lt;p&gt;With a basic understanding of Full Text Search under our belts, it&apos;s time to take our knowledge over to Rails. We will be using the &lt;a href=&quot;https://rubygems.org/gems/pg_search&quot;&gt;pg_search&lt;/a&gt; Gem, which can be used in two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/Casecommons/pg_search#multi-search&quot;&gt;Multi Search&lt;/a&gt;: Search across multiple models and return a single array of results. Imagine having three models: Product, Brand, and Review. Using &lt;strong&gt;Multi Search&lt;/strong&gt; we could search across all of them at the same time, seeing a single set of search results. This would be perfect for adding &lt;a href=&quot;https://en.wikipedia.org/wiki/Federated_search&quot;&gt;federated search&lt;/a&gt; functionality to your app.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/Casecommons/pg_search#pg_search_scope&quot;&gt;Search Scope&lt;/a&gt;: Search within a single model, but with greater flexibility.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We will be focusing on the &lt;strong&gt;Search Scope&lt;/strong&gt; approach in this article, as it lets us dive into the configuration options available when working with Full Text Search in Rails. Let&apos;s add the Gem to our &lt;code &gt;Gemfile&lt;/code&gt; and get started:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# Gemfile&lt;/span&gt;
gem &lt;span &gt;&apos;pg_search&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;~&gt; 2.3&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&gt;= 2.3.2&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With that done, we can include a module in our &lt;code &gt;Job&lt;/code&gt; model, and define our first searchable field:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Job&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  &lt;span &gt;include&lt;/span&gt; &lt;span &gt;PgSearch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Model&lt;/span&gt;
  pg_search_scope &lt;span &gt;:search_title&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; against&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:title&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This adds a class-level method to &lt;code &gt;Job&lt;/code&gt;, allowing us to find jobs with the following line, which automatically returns them ranked from best match to worst.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Job&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;search_title&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;Ruby on Rails&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If we append &lt;code &gt;to_sql&lt;/code&gt; to the above Ruby statement, we can see the SQL that is being generated. I have to warn you, it’s a bit messy, but that is because it handles not only searching, but also putting the results in the correct order using the &lt;code &gt;ts_rank&lt;/code&gt; function.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt;
  &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt;
      &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; pg_search_id&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;(&lt;/span&gt;ts_rank&lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;coalesce&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;title&quot;&lt;/span&gt;::&lt;span &gt;text&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;Ruby&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;on&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;Rails&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; rank
    &lt;span &gt;FROM&lt;/span&gt;
      &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;
    &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;coalesce&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;title&quot;&lt;/span&gt;::&lt;span &gt;text&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; @@ &lt;span &gt;(&lt;/span&gt;to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;Ruby&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;on&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&amp;amp;&amp;amp;&lt;/span&gt; to_tsquery&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;simple&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;&apos;&apos; &apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos;Rails&apos;&lt;/span&gt; &lt;span &gt;||&lt;/span&gt; &lt;span &gt;&apos; &apos;&apos;&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; pg_search_5d9a17cb70b9733aadc073 &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span 
&gt;=&lt;/span&gt; pg_search_5d9a17cb70b9733aadc073&lt;span &gt;.&lt;/span&gt;pg_search_id
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt;
  pg_search_5d9a17cb70b9733aadc073&lt;span &gt;.&lt;/span&gt;rank &lt;span &gt;DESC&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;jobs&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;ASC&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;configuring-pg_search&quot; &gt;&lt;a href=&quot;#configuring-pg_search&quot; aria-label=&quot;configuring pg_search permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Configuring pg_search&lt;/h2&gt;
&lt;p&gt;There are a number of ways you can configure pg_search: From support for &lt;a href=&quot;https://github.com/Casecommons/pg_search#prefix-postgresql-84-and-newer-only&quot;&gt;prefixes&lt;/a&gt; and &lt;a href=&quot;https://github.com/Casecommons/pg_search#negation&quot;&gt;negation&lt;/a&gt;, to specifying which language dictionary to use when normalizing the document, as well as adding multiple, weighted columns.&lt;/p&gt;
&lt;p&gt;By default &lt;code &gt;pg_search&lt;/code&gt; uses the &lt;code &gt;simple&lt;/code&gt; dictionary, which only lowercases words and does no stemming or stop-word removal. If we want to normalize our document using the &lt;code &gt;english&lt;/code&gt; dictionary instead, searching across both the &lt;code &gt;title&lt;/code&gt; and &lt;code &gt;description&lt;/code&gt;, it looks like this:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Job&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  &lt;span &gt;include&lt;/span&gt; &lt;span &gt;PgSearch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Model&lt;/span&gt;
  pg_search_scope &lt;span &gt;:search_job&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  against&lt;span &gt;:&lt;/span&gt; &lt;span &gt;%i[title description]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; tsearch&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; dictionary&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;english&apos;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
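&lt;p&gt;To see what the dictionary choice changes at the SQL level, we can run the example text from earlier through both configurations. This is only a sketch on a literal string, not on the &lt;code &gt;jobs&lt;/code&gt; data:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- simple: every word is kept, no stemming, no stop-word removal
SELECT to_tsvector(&apos;simple&apos;, &apos;looking for the right words&apos;);
-- &apos;for&apos;:2 &apos;looking&apos;:1 &apos;right&apos;:4 &apos;the&apos;:3 &apos;words&apos;:5

-- english: words are stemmed and stop words are dropped
SELECT to_tsvector(&apos;english&apos;, &apos;looking for the right words&apos;);
-- &apos;look&apos;:1 &apos;right&apos;:4 &apos;word&apos;:5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With &lt;code &gt;simple&lt;/code&gt;, a search for &lt;code &gt;word&lt;/code&gt; would not match this text, because only the lexeme &lt;code &gt;words&lt;/code&gt; is stored.&lt;/p&gt;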
&lt;p&gt;We can perform a search in the same way we did before: &lt;code &gt;Job.search_job(&quot;Ruby on Rails&quot;)&lt;/code&gt;. If we want to give higher precedence to the &lt;code &gt;title&lt;/code&gt; column, we can add &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-controls.html&quot;&gt;weighting scores&lt;/a&gt; to each of the columns, with possible values of A, B, C, and D (A being the highest).&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Job&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  &lt;span &gt;include&lt;/span&gt; &lt;span &gt;PgSearch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Model&lt;/span&gt;
  pg_search_scope &lt;span &gt;:search_job&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  against&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; title&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; tsearch&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; dictionary&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;english&apos;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
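&lt;p&gt;At the SQL level, weights are labels attached to each lexeme by the Postgres &lt;code &gt;setweight&lt;/code&gt; function. The following is a sketch on a literal title, chosen here only for illustration, showing what a weighted &lt;code &gt;tsvector&lt;/code&gt; looks like:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Label every lexeme of the title with weight A
SELECT setweight(to_tsvector(&apos;english&apos;, &apos;Ruby on Rails Developer&apos;), &apos;A&apos;);
-- &apos;develop&apos;:4A &apos;rail&apos;:3A &apos;rubi&apos;:1A&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When ranking, &lt;code &gt;ts_rank&lt;/code&gt; multiplies matches by a per-label factor. Its documented defaults are 1.0 for A, 0.4 for B, 0.2 for C, and 0.1 for D, so a match in an A-weighted &lt;code &gt;title&lt;/code&gt; counts for more than the same match in a B-weighted &lt;code &gt;description&lt;/code&gt;.&lt;/p&gt;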
&lt;p&gt;When you start combining columns, weighting them, and choosing which dictionary provides the best results, it really comes down to trial and error. Play around with it, try some queries and see if the results you get back match with what you are expecting!&lt;/p&gt;
&lt;h2 id=&quot;optimizing-full-text-search-queries-in-rails&quot; &gt;&lt;a href=&quot;#optimizing-full-text-search-queries-in-rails&quot; aria-label=&quot;optimizing full text search queries in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Optimizing Full Text Search Queries in Rails&lt;/h2&gt;
&lt;p&gt;We have a problem! The query that is produced by &lt;code &gt;Job.search_job(&quot;Ruby on Rails&quot;)&lt;/code&gt; takes an astounding &lt;strong&gt;130ms&lt;/strong&gt;. That may not &lt;em&gt;seem&lt;/em&gt; like such a large number, but there are only 145 records in my database. Imagine if there were thousands! The majority of the time is spent in the &lt;code &gt;to_tsvector&lt;/code&gt; function. We can verify this by running the streamlined query below, which takes almost as much time to execute as the full query which actually finds the matching jobs:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; to_tsvector&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; jobs&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ~130ms&lt;/span&gt;

&lt;span &gt;SELECT&lt;/span&gt; description &lt;span &gt;FROM&lt;/span&gt; jobs&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;-- ~15ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This tells me that the slowness is in re-parsing and normalizing the document into a &lt;code &gt;tsvector&lt;/code&gt; data type every single time the query is executed. The folks at thoughtbot have a &lt;a href=&quot;https://thoughtbot.com/blog/optimizing-full-text-search-with-postgres-tsvector-columns-and-triggers&quot;&gt;great article about Full Text Search optimizations&lt;/a&gt;, where they add a pre-calculated &lt;code &gt;tsvector&lt;/code&gt; column, keeping it up-to-date with triggers. This is great because it &lt;strong&gt;allows us to avoid re-parsing our document for every query&lt;/strong&gt; and also lets us index this column!&lt;/p&gt;
&lt;p&gt;There is a similar but slightly different approach I want to cover today which I learned by reading through the Postgres documentation. It also involves adding a pre-calculated &lt;code &gt;tsvector&lt;/code&gt; column, but is done using a &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-tables.html&quot;&gt;stored generated column&lt;/a&gt;. &lt;strong&gt;This means we don&apos;t need any triggers!&lt;/strong&gt; It should be noted that this approach is &lt;strong&gt;only available in Postgres 12 and above.&lt;/strong&gt; If you are using version 11 or earlier, the approach in the thoughtbot article is probably still the best one.&lt;/p&gt;
&lt;p&gt;As we are venturing into the territory of more custom Postgres functionality, not easily supported by the Rails schema file in Ruby, we&apos;ll want to &lt;a href=&quot;https://edgeguides.rubyonrails.org/active_record_migrations.html#types-of-schema-dumps&quot;&gt;switch the schema format&lt;/a&gt; from &lt;code &gt;:ruby&lt;/code&gt; to &lt;code &gt;:sql&lt;/code&gt;. This line can be added to the &lt;code &gt;application.rb&lt;/code&gt; file:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;config&lt;span &gt;.&lt;/span&gt;active_record&lt;span &gt;.&lt;/span&gt;schema_format &lt;span &gt;=&lt;/span&gt; &lt;span &gt;:sql&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, let&apos;s generate a migration to add a new column to the &lt;code &gt;jobs&lt;/code&gt; table which will be automatically generated based on the &lt;code &gt;setweight&lt;/code&gt; and &lt;code &gt;to_tsvector&lt;/code&gt; functions:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;AddSearchableColumnToJobs&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;up&lt;/span&gt;&lt;/span&gt;
    execute &lt;span &gt;&lt;span &gt;&lt;span &gt;&amp;lt;&amp;lt;-&lt;/span&gt;SQL&lt;/span&gt;
      ALTER TABLE jobs
      ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector(&apos;english&apos;, coalesce(title, &apos;&apos;)), &apos;A&apos;) ||
        setweight(to_tsvector(&apos;english&apos;, coalesce(description,&apos;&apos;)), &apos;B&apos;)
      ) STORED;
    &lt;span &gt;SQL&lt;/span&gt;&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;down&lt;/span&gt;&lt;/span&gt;
    remove_column &lt;span &gt;:jobs&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:searchable&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that, as of the writing of this article, Postgres always &lt;strong&gt;requires a generated column to be a “Stored” column&lt;/strong&gt;. That means it actually occupies space in your table and gets written on each INSERT/UPDATE. This also means that adding a generated column rewrites the table to compute the value for every existing row. The &lt;code &gt;ALTER TABLE&lt;/code&gt; holds an exclusive lock on the table while that rewrite runs, so it &lt;strong&gt;may block other operations&lt;/strong&gt; on your database.&lt;/p&gt;
&lt;p&gt;With our &lt;code &gt;tsvector&lt;/code&gt; column added (which is giving precedence to the &lt;code &gt;title&lt;/code&gt; over the &lt;code &gt;description&lt;/code&gt;, is using the &lt;code &gt;english&lt;/code&gt; dictionary, and is coalescing &lt;code &gt;null&lt;/code&gt; values into empty strings), we&apos;re ready to add an index to it. Either &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch-indexes.html&quot;&gt;GIN or GiST indexes&lt;/a&gt; can be used to speed up full text searches, but the Postgres documentation recommends &lt;code &gt;GIN&lt;/code&gt; as the preferred index type. &lt;code &gt;GiST&lt;/code&gt; indexes are lossy, which means they &lt;strong&gt;may produce false matches&lt;/strong&gt; that Postgres then has to recheck against the actual table rows. We&apos;ll &lt;a href=&quot;https://thoughtbot.com/blog/how-to-create-postgres-indexes-concurrently-in&quot;&gt;add it concurrently&lt;/a&gt; to avoid locking issues when adding an index to large tables.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;AddIndexToSearchableJobs&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  disable_ddl_transaction&lt;span &gt;!&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    add_index &lt;span &gt;:jobs&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:searchable&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:gin&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; algorithm&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:concurrently&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The last thing we need to do is to tell &lt;code &gt;pg_search&lt;/code&gt; to &lt;a href=&quot;https://github.com/Casecommons/pg_search#using-tsvector-columns&quot;&gt;use our tsvector&lt;/a&gt; &lt;code &gt;searchable&lt;/code&gt; column, rather than re-parsing the &lt;code &gt;title&lt;/code&gt; and &lt;code &gt;description&lt;/code&gt; fields each time. This is done by adding the &lt;code &gt;tsvector_column&lt;/code&gt; option to &lt;code &gt;tsearch&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Job&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  &lt;span &gt;include&lt;/span&gt; &lt;span &gt;PgSearch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Model&lt;/span&gt;
  pg_search_scope &lt;span &gt;:search_job&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  against&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; title&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;A&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; description&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;B&apos;&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
                  using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                    tsearch&lt;span &gt;:&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
                      dictionary&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;english&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; tsvector_column&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;searchable&apos;&lt;/span&gt;
                    &lt;span &gt;}&lt;/span&gt;
                  &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this optimization done, we have gone from around &lt;strong&gt;130ms&lt;/strong&gt; to &lt;strong&gt;7ms&lt;/strong&gt; per query... not bad at all!&lt;/p&gt;
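&lt;p&gt;One way to check whether the new index is actually being used is to run a hand-written query against the &lt;code &gt;searchable&lt;/code&gt; column through &lt;code &gt;EXPLAIN ANALYZE&lt;/code&gt;. This is a sketch, not the exact SQL that &lt;code &gt;pg_search&lt;/code&gt; generates, and the index name mentioned below assumes the Rails default naming:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Search the pre-calculated column directly and show the plan
EXPLAIN ANALYZE
SELECT id, title
FROM jobs
WHERE searchable @@ to_tsquery(&apos;english&apos;, &apos;ruby &amp;amp; rails&apos;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On a larger table we would expect the plan to include a bitmap scan on &lt;code &gt;index_jobs_on_searchable&lt;/code&gt;. On a table as small as this one, the planner may still prefer a sequential scan, in which case most of the speed-up comes from no longer calling &lt;code &gt;to_tsvector&lt;/code&gt; for every row.&lt;/p&gt;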
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Let’s have a look at a real-life data set. We can gauge the precision of our approach by looking at my database: out of 145 jobs pulled from the &lt;a href=&quot;https://jobs.github.com/&quot;&gt;GitHub&lt;/a&gt; and &lt;a href=&quot;https://news.ycombinator.com/jobs&quot;&gt;Hacker News&lt;/a&gt; job APIs, searching for &quot;Ruby on Rails&quot; returns the following results:&lt;/p&gt;
&lt;div  data-language=&quot;json&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;[&lt;/span&gt;
  &lt;span &gt;&quot;Ruby on Rails Developer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;Senior Ruby Developer - Remote&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;Gobble (YC W14) – Senior Full Stack Software Engineers – Toronto, On&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;DevOps (Remote - Europe)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;CareRev (YC S16) Is Hiring a Senior Back End Engineer in Los Angeles&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;Software Engineer, Full Stack (Rails, React)&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;Software Engineer&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;&quot;Technology Solutions Developer&quot;&lt;/span&gt;
&lt;span &gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;To summarize:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We have shown how to use Postgres&apos; Full Text Search within Rails and how to customize it in terms of both functionality and performance. We ended up with a performant and flexible solution right inside the database we were already using.&lt;/p&gt;
&lt;p&gt;Many use cases for Full Text Search can be implemented directly inside Postgres, avoiding the need to install and maintain additional services such as Elasticsearch.&lt;/p&gt;
&lt;h2 id=&quot;you-might-also-be-interested-in&quot; &gt;&lt;a href=&quot;#you-might-also-be-interested-in&quot; aria-label=&quot;you might also be interested in permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;You might also be interested in&lt;/h2&gt;
&lt;p&gt;Learn more about how to make the most of Postgres and Ruby on Rails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;eBook: Best Practices for Optimizing Postgres Query Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/materialized-views-ruby-rails&quot;&gt;Effectively Using Materialized Views in Ruby on Rails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres&quot;&gt;Efficient GraphQL queries in Ruby on Rails &amp;#x26; Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the &lt;a href=&quot;https://pganalyze.com/&quot;&gt;pganalyze&lt;/a&gt; blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Effectively Using Materialized Views in Ruby on Rails]]></title><description><![CDATA[It's every developer's nightmare: SQL queries that get large and unwieldy. This can happen fairly quickly with the addition of multiple joins, a subquery and some complicated filtering logic. I have personally seen queries grow to nearly one hundred lines long in both the financial services and health industries. Luckily Postgres provides two ways to encapsulate large queries: Views and Materialized Views. In this article, we will cover in detail how to utilize both views and materialized views…]]></description><link>https://pganalyze.com/blog/materialized-views-ruby-rails</link><guid isPermaLink="false">https://pganalyze.com/blog/materialized-views-ruby-rails</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Thu, 16 Jan 2020 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;It&apos;s every developer&apos;s nightmare: SQL queries that get large and unwieldy. This can happen fairly quickly with the addition of multiple joins, a subquery and some complicated filtering logic. I have personally seen queries grow to nearly one hundred lines long in both the financial services and health industries.&lt;/p&gt;
&lt;p&gt;Luckily Postgres provides two ways to encapsulate large queries: &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createview.html&quot;&gt;Views&lt;/a&gt; and &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-creatematerializedview.html&quot;&gt;Materialized Views&lt;/a&gt;. In this article, we will cover in detail how to utilize both views and materialized views within &lt;strong&gt;Ruby on Rails&lt;/strong&gt;, and we can even take a look at creating and modifying them with database migrations.&lt;/p&gt;
&lt;p&gt;Our example will be a real-world sized dataset of hockey teams and their top scorers. If you&apos;d like to follow along, the source code covered in this article &lt;a href=&quot;https://github.com/pganalyze/materialized-views-demo&quot;&gt;can be found here&lt;/a&gt;.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-a-view&quot;&gt;What is a view?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-makes-a-view-materialized&quot;&gt;What makes a view materialized?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#creating-a-materialized-view&quot;&gt;Creating a materialized view&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#utilizing-a-materialized-view&quot;&gt;Utilizing a materialized view&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#refreshing-a-materialized-view&quot;&gt;Refreshing a materialized view&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-to-use-views-vs-materialized-views&quot;&gt;When to use views vs. materialized views?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#migrating-views&quot;&gt;Migrating views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#testing-with-materialized-views&quot;&gt;Testing with materialized views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;We&apos;ll also talk a bit about the performance benefits that a &lt;strong&gt;Materialized View&lt;/strong&gt; can bring to your application:&lt;/p&gt;
&lt;p &gt;
&lt;span  &gt;
      &lt;a  href=&quot;https://pganalyze.com/static/e8d6f0687816761fa4f44cf291976afc/d698c/header.png&quot;  target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
    &lt;span  &gt;&lt;/span&gt;
  &lt;img  alt=&quot;Visualization of Materialized View Plan Difference&quot; title=&quot;Visualization of Materialized View Plan Difference&quot; src=&quot;https://pganalyze.com/static/e8d6f0687816761fa4f44cf291976afc/1d69c/header.png&quot;    loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;
  &lt;/a&gt;
    &lt;/span&gt;
&lt;/p&gt;
&lt;h2 id=&quot;what-is-a-view&quot; &gt;&lt;a href=&quot;#what-is-a-view&quot; aria-label=&quot;what is a view permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What is a view?&lt;/h2&gt;
&lt;p&gt;A view allows us to query against the result of another query, providing a powerful way of abstracting away a complex query full of joins, conditions, groupings, and any other clause that can be added to an SQL query. The query below isn&apos;t overly complex, but it &lt;em&gt;does&lt;/em&gt; include 3 joins and groups by several fields to aggregate the number of goals each player scored per team and season.&lt;/p&gt;
&lt;p&gt;It takes approximately 450ms to execute on my computer. I am using seed data that generates &lt;a href=&quot;https://www.nhl.com/info/teams&quot;&gt;31 teams&lt;/a&gt;, each playing 200 games in a season, scoring 20 goals per game... a little unrealistic, but I wanted the dataset used to be substantial!&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt;
  players&lt;span &gt;.&lt;/span&gt;name &lt;span &gt;AS&lt;/span&gt; player_name&lt;span &gt;,&lt;/span&gt;
  players&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;AS&lt;/span&gt; player_id&lt;span &gt;,&lt;/span&gt;
  players&lt;span &gt;.&lt;/span&gt;position &lt;span &gt;AS&lt;/span&gt; player_position&lt;span &gt;,&lt;/span&gt;
  matches&lt;span &gt;.&lt;/span&gt;season &lt;span &gt;AS&lt;/span&gt; season&lt;span &gt;,&lt;/span&gt;
  teams&lt;span &gt;.&lt;/span&gt;name &lt;span &gt;AS&lt;/span&gt; team_name&lt;span &gt;,&lt;/span&gt;
  teams&lt;span &gt;.&lt;/span&gt;id &lt;span &gt;AS&lt;/span&gt; team_id&lt;span &gt;,&lt;/span&gt;
  &lt;span &gt;count&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;goals&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;AS&lt;/span&gt; goal_count
&lt;span &gt;FROM&lt;/span&gt; goals
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; players &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;goals&lt;span &gt;.&lt;/span&gt;player_id &lt;span &gt;=&lt;/span&gt; players&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; matches &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;goals&lt;span &gt;.&lt;/span&gt;match_id &lt;span &gt;=&lt;/span&gt; matches&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;INNER&lt;/span&gt; &lt;span &gt;JOIN&lt;/span&gt; teams &lt;span &gt;ON&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;goals&lt;span &gt;.&lt;/span&gt;team_id &lt;span &gt;=&lt;/span&gt; teams&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;GROUP&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; players&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; teams&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; matches&lt;span &gt;.&lt;/span&gt;season&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A view allows us to take the final result of this query, and query &lt;em&gt;against&lt;/em&gt; that as if it were any other table. You can see why views can come in handy in many different scenarios. They allow for the succinct abstraction of a complicated query, and allow us to re-use this logic in a simple-to-understand way.&lt;/p&gt;
&lt;p&gt;Now, we could make a new view by running &lt;code &gt;CREATE VIEW&lt;/code&gt; in Postgres. But, as we all know, one-off schema changes are hard to keep track of. Instead, let&apos;s try something that&apos;s closer to how Rails does things. What does that look like? First, we&apos;ll create a view using &lt;a href=&quot;https://github.com/scenic-views/scenic&quot;&gt;Scenic&lt;/a&gt;. Scenic gives us the ability to define migrations that create, update, or drop views, just as you&apos;re used to doing with regular tables in Rails.&lt;/p&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;rails g scenic:view top_scorers&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will generate two files. The first is named &lt;code &gt;db/views/top_scorers_v01.sql&lt;/code&gt;, and in it we will paste the SQL for the underlying query (from above). The second is &lt;code &gt;db/migrate/[date]_create_top_scorers.rb&lt;/code&gt;, which holds the migration that creates (or, on rollback, drops) our view:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateTopScorers&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    create_view &lt;span &gt;:top_scorers&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With the view in place, we can now query against it. This query takes approximately 50ms to execute.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; top_scorers
&lt;span &gt;WHERE&lt;/span&gt;
  team_name &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;Toronto Maple Leafs&apos;&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; goal_count &lt;span &gt;DESC&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By creating a model in Rails, we can interact with the view much like we would with a typical model backed by a table. Let&apos;s start by defining the model, letting Rails know it is &lt;strong&gt;read-only&lt;/strong&gt;. Whilst some views &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createview.html#SQL-CREATEVIEW-UPDATABLE-VIEWS&quot;&gt;can be updated&lt;/a&gt;, this view contains a top-level &lt;code &gt;GROUP BY&lt;/code&gt; clause and thus can&apos;t be updated.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# app/models/top_scorer.rb&lt;/span&gt;
&lt;span &gt;class&lt;/span&gt; &lt;span &gt;TopScorer&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;readonly&lt;/span&gt;&lt;/span&gt;&lt;span &gt;?&lt;/span&gt;
    &lt;span &gt;true&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can perform the same SQL query using the &lt;code &gt;TopScorer&lt;/code&gt; model:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;TopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;team_name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;Toronto Maple Leafs&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order&lt;span &gt;(&lt;/span&gt;goal_count&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:desc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;what-makes-a-view-materialized&quot; &gt;&lt;a href=&quot;#what-makes-a-view-materialized&quot; aria-label=&quot;what makes a view materialized permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What makes a view materialized?&lt;/h2&gt;
&lt;p&gt;A regular view still performs the underlying query which defined it. &lt;strong&gt;It will only be as efficient as its underlying query is&lt;/strong&gt;. This means, if the larger query discussed above takes 450ms to execute, executing &lt;code &gt;SELECT * FROM top_scorers&lt;/code&gt; will also take 450ms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Materialized views take regular views to the next level&lt;/strong&gt;, though they aren&apos;t without their drawbacks. The difference is that they store the result of the original query on disk, much like a regular table, where it acts as a cache. When you query a materialized view, you aren&apos;t querying the source data, but rather the stored result.&lt;/p&gt;
&lt;p&gt;This can provide serious performance benefits, especially considering you can index materialized views. But, when the underlying data from the source tables is updated, the materialized view becomes out of date, serving up an older cached version of the data. We can resolve this by refreshing the materialized view, which we&apos;ll get to in a bit.&lt;/p&gt;
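&lt;p&gt;To make the trade-off concrete, here is a minimal plain-Ruby sketch of the two behaviours. It is not Postgres or Rails code: the &lt;code &gt;PlainView&lt;/code&gt; and &lt;code &gt;MaterializedView&lt;/code&gt; classes and the sample &lt;code &gt;goals&lt;/code&gt; array are hypothetical stand-ins invented for illustration. The first re-runs its query on every read, while the second serves a stored result until &lt;code &gt;refresh&lt;/code&gt; is called.&lt;/p&gt;

```ruby
# Hypothetical stand-ins for illustration only; not Postgres or Rails APIs.

# A regular view: re-runs the underlying query on every read.
class PlainView
  def initialize(query)
    @query = query
  end

  def rows
    @query.call
  end
end

# A materialized view: stores the query result and serves the stored copy
# until refresh is called explicitly.
class MaterializedView
  def initialize(query)
    @query = query
    refresh
  end

  def refresh
    @rows = @query.call
  end

  def rows
    @rows
  end
end

goals = [{ player: "A", count: 1 }]
# The "underlying query": total goals across all rows of the source data.
total_goals = -> { goals.sum { |goal| goal[:count] } }

view = PlainView.new(total_goals)
mat_view = MaterializedView.new(total_goals)

goals.push({ player: "B", count: 2 }) # the source data changes

view_total = view.rows      # recomputed from the current source data
stale_total = mat_view.rows # still the result stored before the change
mat_view.refresh
fresh_total = mat_view.rows # matches the source data again
```

&lt;p&gt;In this sketch the regular view always reflects the new row, whereas the materialized view only does so after the explicit refresh. That is the same staleness you accept in Postgres in exchange for faster reads.&lt;/p&gt;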
&lt;h2 id=&quot;creating-a-materialized-view&quot; &gt;&lt;a href=&quot;#creating-a-materialized-view&quot; aria-label=&quot;creating a materialized view permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Creating a materialized view&lt;/h2&gt;
&lt;p&gt;Creating a materialized view begins the same way as our regular view, by executing a command to generate a new view migration: &lt;code &gt;rails g scenic:view mat_top_scorers&lt;/code&gt;. This produces two files, the first of which contains the SQL to produce the underlying view of the data. The difference is in the migration, which passes &lt;code &gt;materialized: true&lt;/code&gt; to the &lt;code &gt;create_view&lt;/code&gt; method. Also &lt;strong&gt;notice that we are able to add indexes to the materialized view&lt;/strong&gt;.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateMatTopScorers&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    create_view &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; materialized&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;

    add_index &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:player_name&lt;/span&gt;
    add_index &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:player_id&lt;/span&gt;
    add_index &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:team_name&lt;/span&gt;
    add_index &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:team_id&lt;/span&gt;
    add_index &lt;span &gt;:mat_top_scorers&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:season&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;utilizing-a-materialized-view&quot; &gt;&lt;a href=&quot;#utilizing-a-materialized-view&quot; aria-label=&quot;utilizing a materialized view permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Utilizing a materialized view&lt;/h2&gt;
&lt;p&gt;Like a regular view, we are able to define an ActiveRecord model that can query it. Also notice that we can define relationships which point to other ActiveRecord models. If you didn&apos;t know, you might not even realize it is pointing to a materialized view, except for the &lt;code &gt;readonly?&lt;/code&gt; method which was defined.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;MatTopScorer&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  belongs_to &lt;span &gt;:player&lt;/span&gt;
  belongs_to &lt;span &gt;:team&lt;/span&gt;
  &lt;span &gt;# No belongs_to :match here: the view groups by season and exposes no match_id column&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;top_scorer_for_season&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;season&lt;span &gt;)&lt;/span&gt;
    where&lt;span &gt;(&lt;/span&gt;season&lt;span &gt;:&lt;/span&gt; season&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order&lt;span &gt;(&lt;/span&gt;goal_count&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:desc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;first
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;readonly&lt;/span&gt;&lt;/span&gt;&lt;span &gt;?&lt;/span&gt;
    &lt;span &gt;true&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&apos;s take our new materialized view for a spin! Running the query &lt;code &gt;select * from mat_top_scorers&lt;/code&gt; takes 5ms, compared to 450ms for the same query against the regular view: &lt;strong&gt;90x faster&lt;/strong&gt;! The Ruby code below, which took 50ms as a view, takes under 1ms to execute!&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;team_name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;Toronto Maple Leafs&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order&lt;span &gt;(&lt;/span&gt;goal_count&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:desc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For a side-by-side comparison, this performs the same query on both views:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;irb&lt;span &gt;(&lt;/span&gt;main&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;001&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;TopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;team_name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;Toronto Maple Leafs&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count
   &lt;span &gt;(&lt;/span&gt;&lt;span &gt;60.2&lt;/span&gt;ms&lt;span &gt;)&lt;/span&gt;  &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;top_scorers&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;top_scorers&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;team_name&quot;&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; $&lt;span &gt;1&lt;/span&gt;  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;team_name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;Toronto Maple Leafs&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;30&lt;/span&gt;

irb&lt;span &gt;(&lt;/span&gt;main&lt;span &gt;)&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;002&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;team_name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&apos;Toronto Maple Leafs&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count
   &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1.3&lt;/span&gt;ms&lt;span &gt;)&lt;/span&gt;  &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;mat_top_scorers&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;mat_top_scorers&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;team_name&quot;&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; $&lt;span &gt;1&lt;/span&gt;  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;team_name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;Toronto Maple Leafs&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;30&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
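&lt;p&gt;As a quick sanity check, the speedups can be worked out directly from the timings quoted in this article. The &lt;code &gt;speedup&lt;/code&gt; helper below is hypothetical and only performs the division. Timings will of course differ on other machines and datasets.&lt;/p&gt;

```ruby
# Hypothetical helper: how many times faster is after_ms than before_ms?
def speedup(before_ms, after_ms)
  (before_ms / after_ms.to_f).round(1)
end

# Full scan: 450ms as a view vs. 5ms as a materialized view.
full_scan = speedup(450, 5)

# Filtered count from the console session: 60.2ms vs. 1.3ms.
filtered_count = speedup(60.2, 1.3)

puts "full scan: #{full_scan}x, filtered count: #{filtered_count}x"
```

&lt;p&gt;The full scan works out to 90x, and the filtered count from the console session to roughly 46x.&lt;/p&gt;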
&lt;h2 id=&quot;refreshing-a-materialized-view&quot; &gt;&lt;a href=&quot;#refreshing-a-materialized-view&quot; aria-label=&quot;refreshing a materialized view permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Refreshing a materialized view&lt;/h2&gt;
&lt;p&gt;As mentioned previously, materialized views store the underlying query&apos;s result on disk, much like a regular table. This is what gives us the speed improvements and the ability to add indexes. The downside is that we have to control when the stored result is refreshed. Let&apos;s modify the &lt;code &gt;MatTopScorer&lt;/code&gt; model and add a &lt;code &gt;refresh&lt;/code&gt; method that can be called any time the data is to be refreshed. You will need to figure out how often it makes sense to update the data for your specific use case, depending on how often the data changes and how quickly those changes need to be visible to the end user.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;MatTopScorer&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  belongs_to &lt;span &gt;:player&lt;/span&gt;
  belongs_to &lt;span &gt;:team&lt;/span&gt;
  &lt;span &gt;# No belongs_to :match here: the view groups by season and exposes no match_id column&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;refresh&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;Scenic&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;database&lt;span &gt;.&lt;/span&gt;refresh_materialized_view&lt;span &gt;(&lt;/span&gt;table_name&lt;span &gt;,&lt;/span&gt; concurrently&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; cascade&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;top_scorer_for_season&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;season&lt;span &gt;)&lt;/span&gt;
    where&lt;span &gt;(&lt;/span&gt;season&lt;span &gt;:&lt;/span&gt; season&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;order&lt;span &gt;(&lt;/span&gt;goal_count&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:desc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;first
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;readonly&lt;/span&gt;&lt;/span&gt;&lt;span &gt;?&lt;/span&gt;
    &lt;span &gt;true&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
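&lt;p&gt;A refresh ultimately comes down to Postgres&apos; &lt;code &gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; statement, which also has a &lt;code &gt;CONCURRENTLY&lt;/code&gt; variant. The sketch below uses a hypothetical helper, &lt;code &gt;refresh_statement&lt;/code&gt; (not part of Scenic, which builds and runs the statement for you), purely to show the two forms side by side.&lt;/p&gt;

```ruby
# Hypothetical helper for illustration; it only builds the SQL text.
def refresh_statement(view_name, concurrently: false)
  parts = ["REFRESH MATERIALIZED VIEW"]
  parts.push("CONCURRENTLY") if concurrently
  parts.push(view_name)
  parts.join(" ")
end

blocking = refresh_statement("mat_top_scorers")
non_blocking = refresh_statement("mat_top_scorers", concurrently: true)

puts blocking
puts non_blocking
```

&lt;p&gt;According to the Postgres documentation, a plain refresh can block connections that try to read from the materialized view while it is being rebuilt. The concurrent form avoids that, but is only allowed if the materialized view has at least one unique index. The indexes in the migration above are not unique, so before switching to &lt;code &gt;concurrently: true&lt;/code&gt; you would need to add one, for example on the combination of &lt;code &gt;player_id&lt;/code&gt;, &lt;code &gt;team_id&lt;/code&gt; and &lt;code &gt;season&lt;/code&gt;, which are the grouping columns of the underlying query.&lt;/p&gt;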
&lt;p&gt;To schedule the refresh, I like to use the &lt;a href=&quot;https://github.com/javan/whenever&quot;&gt;whenever gem&lt;/a&gt;. Let&apos;s call a rake task to refresh the materialized view every hour:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# config/schedule.rb&lt;/span&gt;
every &lt;span &gt;1.&lt;/span&gt;hour &lt;span &gt;do&lt;/span&gt;
  rake &lt;span &gt;&quot;refreshers:mat_top_scorers&quot;&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The rake task is simple, only calling the &lt;code &gt;refresh&lt;/code&gt; method defined on the &lt;code &gt;MatTopScorer&lt;/code&gt; model.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;# lib/tasks/refreshers.rake&lt;/span&gt;
namespace &lt;span &gt;:refreshers&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
  desc &lt;span &gt;&quot;Refresh materialized view for top scorers&quot;&lt;/span&gt;
  task mat_top_scorers&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:environment&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    &lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;refresh
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;when-to-use-views-vs-materialized-views&quot; &gt;&lt;a href=&quot;#when-to-use-views-vs-materialized-views&quot; aria-label=&quot;when to use views vs materialized views permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;When to use views vs. materialized views?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Views&lt;/strong&gt; focus on abstracting away complexity and encouraging reuse. Views allow you to interact with the &lt;em&gt;result&lt;/em&gt; of a query as if it were a table itself, but they do not provide a performance benefit, because the underlying query is still executed on every read. That makes them a good fit for sharing logic while keeping real-time access to the source data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Materialized Views&lt;/strong&gt; are related to views, but go a step further. You get all the abstraction and reuse of a view, but the result of the underlying query is cached, so reads can be much faster, at the cost of the data being only as fresh as the last refresh. Materialized views are especially useful for reporting dashboards, for example, because they can be indexed to allow for performant filtering.&lt;/p&gt;
&lt;p&gt;If the purpose of the view is to provide a cleaner interface to complicated joins and query logic, and performance isn&apos;t too much of an issue, by all means stick with a regular view. Views have the advantage of always being &lt;em&gt;real-time&lt;/em&gt;, since they simply reference the real underlying data rather than a cached copy of it.&lt;/p&gt;
&lt;p&gt;If your purpose is to provide a cleaner interface in addition to performance improvements, and you can live with the data being not quite real-time, then creating it as a materialized view can provide some great benefits.&lt;/p&gt;
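&lt;p&gt;To make the trade-off concrete, here is a rough plain-Ruby analogy rather than real Postgres behavior. The &lt;code &gt;Snapshot&lt;/code&gt; class, the &lt;code &gt;goal_counts&lt;/code&gt; lambda and the sample data are all made up for illustration, and &lt;code &gt;tally&lt;/code&gt; assumes Ruby 2.7 or newer. The lambda plays the role of a view (recomputed on every read), while the snapshot plays the role of a materialized view (stored until refreshed):&lt;/p&gt;

```ruby
# Plain-Ruby analogy only; no database is involved and all names are hypothetical.
goals = [
  { player: "Ana", season: 2019 },
  { player: "Ana", season: 2019 },
  { player: "Bo", season: 2019 }
]

# "View": the query runs again on every call, so it always sees current data.
goal_counts = -> { goals.map { |goal| goal[:player] }.tally }

# "Materialized view": the query result is stored and only changes on refresh.
class Snapshot
  attr_reader :rows

  def initialize(query)
    @query = query
    refresh
  end

  def refresh
    @rows = @query.call
  end
end

cached_counts = Snapshot.new(goal_counts)

# New source data arrives after the snapshot was taken.
goals.push({ player: "Bo", season: 2019 })
goals.push({ player: "Bo", season: 2019 })

live_result = goal_counts.call     # Ana: 2, Bo: 3 (recomputed just now)
stale_result = cached_counts.rows  # Ana: 2, Bo: 1 (snapshot from before)

cached_counts.refresh
fresh_result = cached_counts.rows  # Ana: 2, Bo: 3 (snapshot rebuilt)
```

&lt;p&gt;The stale read is the price of not re-running the query, which is the same trade-off you accept when scheduling refreshes of a materialized view.&lt;/p&gt;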
&lt;h2 id=&quot;migrating-views&quot; &gt;&lt;a href=&quot;#migrating-views&quot; aria-label=&quot;migrating views permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Migrating views&lt;/h2&gt;
&lt;p&gt;It&apos;s easy to &lt;a href=&quot;https://github.com/scenic-views/scenic#cool-but-what-if-i-need-to-change-that-view&quot;&gt;migrate views&lt;/a&gt; in Scenic. Views are versioned by default, so running the generator again for a view with the same name creates a v2. As when we first generated the view earlier in this article, this produces two files: a new SQL definition and a migration. Scenic also provides a way to &lt;a href=&quot;https://github.com/scenic-views/scenic#i-dont-need-this-view-anymore-make-it-go-away&quot;&gt;drop a view&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;testing-with-materialized-views&quot; &gt;&lt;a href=&quot;#testing-with-materialized-views&quot; aria-label=&quot;testing with materialized views permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Testing with materialized views&lt;/h2&gt;
&lt;p&gt;Views and materialized views aren&apos;t particularly challenging to test, but you do need to remember that neither type of view contains any original data in and of itself. Each is either a live view of an underlying query or, in the case of materialized views, a cached copy of its result.&lt;/p&gt;
&lt;p&gt;Let&apos;s see how we would populate and then test our &lt;code &gt;MatTopScorer&lt;/code&gt; model using &lt;a href=&quot;https://rspec.info/&quot;&gt;RSpec&lt;/a&gt; and &lt;a href=&quot;https://github.com/thoughtbot/factory_bot&quot;&gt;factory_bot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After creating some test data using factory_bot, we&apos;ll call a method which is supposed to return the top scorer for a given season. It returns &lt;code &gt;nil&lt;/code&gt;, and that is expected. The underlying data exists, but because materialized views must be refreshed, something we haven&apos;t done yet, there is no data to be found.&lt;/p&gt;
&lt;p&gt;After calling &lt;code &gt;MatTopScorer.refresh&lt;/code&gt;, we&apos;re now able to retrieve the expected result.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;RSpec&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;describe &lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; type&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:model&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
  describe &lt;span &gt;&quot;.top_scorer_for_season&quot;&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    it &lt;span &gt;&quot;finds top scorer&quot;&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
      &lt;span &gt;# create some data using factory_bot helper methods&lt;/span&gt;
      match &lt;span &gt;=&lt;/span&gt; create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:match&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
      player &lt;span &gt;=&lt;/span&gt; create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:player&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
      goal &lt;span &gt;=&lt;/span&gt; create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:goal&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; match&lt;span &gt;:&lt;/span&gt; match&lt;span &gt;,&lt;/span&gt; player&lt;span &gt;:&lt;/span&gt; player&lt;span &gt;)&lt;/span&gt;

      &lt;span &gt;# without any data in materialized view&lt;/span&gt;
      expect&lt;span &gt;(&lt;/span&gt;&lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;top_scorer_for_season&lt;span &gt;(&lt;/span&gt;match&lt;span &gt;.&lt;/span&gt;season&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to eq&lt;span &gt;(&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;

      &lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;refresh

      &lt;span &gt;# with data in materialized view&lt;/span&gt;
      top_scorer &lt;span &gt;=&lt;/span&gt; &lt;span &gt;MatTopScorer&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;top_scorer_for_season&lt;span &gt;(&lt;/span&gt;match&lt;span &gt;.&lt;/span&gt;season&lt;span &gt;)&lt;/span&gt;
      expect&lt;span &gt;(&lt;/span&gt;top_scorer&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to be_present
      expect&lt;span &gt;(&lt;/span&gt;top_scorer&lt;span &gt;.&lt;/span&gt;player&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to eq&lt;span &gt;(&lt;/span&gt;player&lt;span &gt;)&lt;/span&gt;
      expect&lt;span &gt;(&lt;/span&gt;top_scorer&lt;span &gt;.&lt;/span&gt;goal_count&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to eq&lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;With the help of Scenic, using views and materialized views feels right at home in Rails. Truthfully, I haven&apos;t used views as much as I have used materialized views. In particular, I&apos;ve found materialized views incredibly useful when building searchable reporting dashboards.&lt;/p&gt;
&lt;p&gt;The ability to group and summarize data by geographic region, category, or date, combined with the right indexes, has given me an efficient way to report on large amounts of data without relying on external reporting systems or causing excessive load on the production database.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Similarity in Postgres and Rails using Trigrams]]></title><description><![CDATA[You typed "postgras", did you mean "postgres"? Use the best tool for the job. It seems like solid advice, but there's something to say about keeping things simple. There is a training and maintenance cost that comes with supporting an ever growing number of tools. It may be better advice to use an existing tool that works well, although not perfect, until it hurts. It all depends on your specific case. Postgres is an amazing relational database, and it supports more features than you might…]]></description><link>https://pganalyze.com/blog/similarity-in-postgres-and-ruby-on-rails-using-trigrams</link><guid isPermaLink="false">https://pganalyze.com/blog/similarity-in-postgres-and-ruby-on-rails-using-trigrams</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Tue, 19 Nov 2019 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/90cd2f76d32d33a522ec1069621e6651/2cefc/postgres_trigrams.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Trigram Example&quot;
        title=&quot;Trigram Example&quot;
        src=&quot;https://pganalyze.com/static/90cd2f76d32d33a522ec1069621e6651/1d69c/postgres_trigrams.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;You typed &lt;strong&gt;&quot;postgras&quot;&lt;/strong&gt;, did you mean &lt;strong&gt;&quot;postgres&quot;&lt;/strong&gt;?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Use the best tool for the job.&lt;/em&gt; It seems like solid advice, but there&apos;s something to be said for keeping things simple. There is a training and maintenance cost that comes with supporting an ever-growing number of tools. It may be better advice to use an existing tool that works well, even if not perfectly, until it hurts. It all depends on your specific case.&lt;/p&gt;
&lt;p&gt;Postgres is an amazing relational database, and it supports more features than you might initially think! It has &lt;a href=&quot;https://www.postgresql.org/docs/current/textsearch.html&quot;&gt;full text search&lt;/a&gt;, &lt;a href=&quot;https://www.postgresql.org/docs/current/datatype-json.html&quot;&gt;JSON documents&lt;/a&gt;, and support for similarity matching through its &lt;a href=&quot;https://www.postgresql.org/docs/current/pgtrgm.html&quot;&gt;pg_trgm&lt;/a&gt; module.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#what-are-trigrams&quot;&gt;What are Trigrams?&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#postgres-trigram-example&quot;&gt;Postgres Trigram example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#ruby-trigram-example&quot;&gt;Ruby Trigram example&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#using-trigrams-in-rails&quot;&gt;Using Trigrams in Rails&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#showing-the-closest-matches-for-a-term-based-on-its-similarity&quot;&gt;Showing the closest matches for a term based on its similarity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;Today, we will break down how to use &lt;strong&gt;pg_trgm&lt;/strong&gt; as a lightweight, built-in similarity matcher. Why are we doing this? Well, before reaching for a tool purpose-built for search such as &lt;a href=&quot;https://www.elastic.co&quot;&gt;Elasticsearch&lt;/a&gt;, and potentially complicating development by adding another tool to your stack, it&apos;s worth seeing if Postgres suits your application&apos;s needs! You may be surprised!&lt;/p&gt;
&lt;p&gt;In this article, we will look at how it works under the covers, and how to use it efficiently in your Rails app.&lt;/p&gt;
&lt;h2 id=&quot;what-are-trigrams&quot; &gt;&lt;a href=&quot;#what-are-trigrams&quot; aria-label=&quot;what are trigrams permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What are Trigrams?&lt;/h2&gt;
&lt;p&gt;Trigrams, a special case of &lt;a href=&quot;https://en.wikipedia.org/wiki/N-gram&quot;&gt;n-grams&lt;/a&gt; with n = 3, break text down into groups of three consecutive letters. Take the word &lt;code &gt;postgres&lt;/code&gt; as an example: it is made up of &lt;strong&gt;six&lt;/strong&gt; groups: pos, ost, stg, tgr, gre, res.&lt;/p&gt;
&lt;p&gt;This process of breaking a piece of text into smaller groups allows you to compare the groups of one word to the groups of another word. Knowing how many groups are shared between the two words allows you to make a comparison between them based on how similar their groups are.&lt;/p&gt;
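&lt;p&gt;As a minimal sketch of that idea, the hypothetical &lt;code &gt;strict_trigrams&lt;/code&gt; helper below splits a word into plain groups of three letters and finds the groups two words have in common. It assumes Ruby 2.7 or newer for &lt;code &gt;Array#intersection&lt;/code&gt;, and it is not how Postgres does it internally:&lt;/p&gt;

```ruby
# Hypothetical helper: plain groups of three consecutive letters.
def strict_trigrams(word)
  word.downcase.chars.each_cons(3).map { |letters| letters.join }
end

postgres_groups = strict_trigrams("postgres")
postgras_groups = strict_trigrams("postgras")

# Groups that appear in both words.
shared_groups = postgres_groups.intersection(postgras_groups)

p postgres_groups
# ["pos", "ost", "stg", "tgr", "gre", "res"]
p shared_groups
# ["pos", "ost", "stg", "tgr"]
```

&lt;p&gt;Four of the six groups are shared, which is what makes the two words look similar despite the typo.&lt;/p&gt;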
&lt;h3 id=&quot;postgres-trigram-example&quot; &gt;&lt;a href=&quot;#postgres-trigram-example&quot; aria-label=&quot;postgres trigram example permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Postgres Trigram example&lt;/h3&gt;
&lt;p&gt;Postgres&apos; &lt;code &gt;pg_trgm&lt;/code&gt; module comes with a number of functions and operators to compare strings. &lt;strong&gt;We&apos;ll look at the &lt;code &gt;show_trgm&lt;/code&gt; and &lt;code &gt;similarity&lt;/code&gt; functions, along with the &lt;code &gt;%&lt;/code&gt; operator below:&lt;/strong&gt;&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;select&lt;/span&gt;
  show_trgm&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;postgras&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; tri1&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-- {&quot;  p&quot;,&quot; po&quot;,&quot;as &quot;,gra,ost,pos,ras,stg,tgr}&lt;/span&gt;
  show_trgm&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;postgres&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; tri2&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-- {&quot;  p&quot;,&quot; po&quot;,&quot;es &quot;,gre,ost,pos,res,stg,tgr}&lt;/span&gt;
  similarity&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;postgras&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;&apos;postgres&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-- 0.5&lt;/span&gt;
  &lt;span &gt;&apos;postgras&apos;&lt;/span&gt; &lt;span &gt;%&lt;/span&gt; &lt;span &gt;&apos;postgres&apos;&lt;/span&gt; &lt;span &gt;-- TRUE&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;show_trgm&lt;/code&gt; function isn&apos;t one you&apos;d necessarily use day-to-day, but it&apos;s good to see how Postgres breaks a string down into trigrams. You&apos;ll notice something interesting here: two spaces are added to the beginning of the string, and a single space is added to the end.&lt;/p&gt;
&lt;p&gt;This is done for a couple of reasons:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The first reason&lt;/strong&gt; is that it allows trigram calculations on words with fewer than three characters, such as &lt;code &gt;Hi&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Secondly&lt;/strong&gt;, it ensures the first and last characters are not overly de-emphasized for comparisons. If we used only strict triplets, the first and last letters in longer words would each occur in only a single group; with padding, they occur in three groups (for the first letter) and two (for the last). Because the last letter still appears in fewer groups than any other letter, a typo there changes fewer trigrams, which means that &lt;code &gt;postgres&lt;/code&gt; and &lt;code &gt;postgrez&lt;/code&gt; are more similar than &lt;code &gt;postgres&lt;/code&gt; and &lt;code &gt;postgras&lt;/code&gt;, even though they are both off by a single character.&lt;/p&gt;
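&lt;p&gt;The effect of the padding can be checked with a small sketch. The &lt;code &gt;padded_trigrams&lt;/code&gt; and &lt;code &gt;trigram_similarity&lt;/code&gt; helpers are hypothetical stand-ins that only mimic the padding and the shared-versus-total ratio described here (assuming Ruby 2.7 or newer for &lt;code &gt;intersection&lt;/code&gt; and &lt;code &gt;union&lt;/code&gt;); they are not the Postgres implementation:&lt;/p&gt;

```ruby
# Hypothetical helper: pad with two leading spaces and one trailing space,
# then split into groups of three characters.
def padded_trigrams(word)
  "  #{word.downcase} ".chars.each_cons(3).map { |letters| letters.join }
end

# Hypothetical helper: shared trigrams divided by all distinct trigrams.
def trigram_similarity(word1, word2)
  tri1 = padded_trigrams(word1)
  tri2 = padded_trigrams(word2)
  tri1.intersection(tri2).size.to_f / tri1.union(tri2).size
end

groups = padded_trigrams("postgres")
p groups.first(3) # the three groups that contain the first letter
# ["  p", " po", "pos"]
p groups.last(2)  # the two groups that contain the last letter
# ["res", "es "]

postgras_score = trigram_similarity("postgres", "postgras") # 6 shared of 12
postgrez_score = trigram_similarity("postgres", "postgrez") # 7 shared of 11

p postgras_score
# 0.5
p postgrez_score.round(2)
# 0.64
```

&lt;p&gt;The typo in the last letter costs two trigrams, while the typo one position earlier costs three, which is why &lt;code &gt;postgrez&lt;/code&gt; scores higher.&lt;/p&gt;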
&lt;p&gt;The &lt;code &gt;similarity&lt;/code&gt; function compares the trigrams from two strings and outputs a similarity number between 0 and 1. 1 means a perfect match, and 0 means no shared trigrams.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lastly&lt;/strong&gt;, we have the &lt;code &gt;%&lt;/code&gt; operator, which gives you a boolean of whether two strings are similar. By default, Postgres uses a threshold of 0.3 when making this decision, but you can change it through the &lt;code &gt;pg_trgm.similarity_threshold&lt;/code&gt; setting.&lt;/p&gt;
&lt;h3 id=&quot;ruby-trigram-example&quot; &gt;&lt;a href=&quot;#ruby-trigram-example&quot; aria-label=&quot;ruby trigram example permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Ruby Trigram example&lt;/h3&gt;
&lt;p&gt;You don&apos;t need to know how trigrams are built in order to use them in Postgres, but it doesn&apos;t hurt to dive deeper and expand your knowledge. Let&apos;s take a look at how to implement something similar ourselves in Ruby.&lt;/p&gt;
&lt;p&gt;The first method will take a string, and output an array of trigrams, adding two spaces to the front, and one to the back of the original string, just like Postgres does.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;trigram&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;word&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt; &lt;span &gt;if&lt;/span&gt; word&lt;span &gt;.&lt;/span&gt;strip &lt;span &gt;==&lt;/span&gt; &lt;span &gt;&quot;&quot;&lt;/span&gt;

  parts &lt;span &gt;=&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  padded &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&quot;  &lt;span &gt;&lt;span &gt;#{&lt;/span&gt;word&lt;span &gt;}&lt;/span&gt;&lt;/span&gt; &quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;downcase
  padded&lt;span &gt;.&lt;/span&gt;chars&lt;span &gt;.&lt;/span&gt;each_cons&lt;span &gt;(&lt;/span&gt;&lt;span &gt;3&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;w&lt;span &gt;|&lt;/span&gt; parts &lt;span &gt;&amp;lt;&lt;/span&gt;&lt;span &gt;&amp;lt;&lt;/span&gt; w&lt;span &gt;.&lt;/span&gt;join &lt;span &gt;}&lt;/span&gt;
  parts
&lt;span &gt;end&lt;/span&gt;

p trigram&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;postgras&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# [&quot;  p&quot;, &quot; po&quot;, &quot;pos&quot;, &quot;ost&quot;, &quot;stg&quot;, &quot;tgr&quot;, &quot;gra&quot;, &quot;ras&quot;, &quot;as &quot;]&lt;/span&gt;
p trigram&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;postgres&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# [&quot;  p&quot;, &quot; po&quot;, &quot;pos&quot;, &quot;ost&quot;, &quot;stg&quot;, &quot;tgr&quot;, &quot;gre&quot;, &quot;res&quot;, &quot;es &quot;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next up, we&apos;ll compare the trigrams from our two words together, giving a ratio of how similar they are:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;similarity&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;word1&lt;span &gt;,&lt;/span&gt; word2&lt;span &gt;)&lt;/span&gt;
  tri1 &lt;span &gt;=&lt;/span&gt; trigram&lt;span &gt;(&lt;/span&gt;word1&lt;span &gt;)&lt;/span&gt;
  tri2 &lt;span &gt;=&lt;/span&gt; trigram&lt;span &gt;(&lt;/span&gt;word2&lt;span &gt;)&lt;/span&gt;

  &lt;span &gt;return&lt;/span&gt; &lt;span &gt;0.0&lt;/span&gt; &lt;span &gt;if&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;tri1&lt;span &gt;,&lt;/span&gt; tri2&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;any&lt;span &gt;?&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;arr&lt;span &gt;|&lt;/span&gt; arr&lt;span &gt;.&lt;/span&gt;size &lt;span &gt;==&lt;/span&gt; &lt;span &gt;0&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;

  &lt;span &gt;# Find number of trigrams shared between them&lt;/span&gt;
  same_size &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;tri1 &lt;span &gt;&amp;amp;&lt;/span&gt; tri2&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;size
  &lt;span &gt;# Find unique total trigrams in both arrays&lt;/span&gt;
  all_size &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;tri1 &lt;span &gt;|&lt;/span&gt; tri2&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;size

  same_size&lt;span &gt;.&lt;/span&gt;to_f &lt;span &gt;/&lt;/span&gt; all_size
&lt;span &gt;end&lt;/span&gt;

p similarity&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;postgras&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;postgres&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# 0.5&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that we have our similarity calculator, we can implement a simple &lt;code &gt;similar?&lt;/code&gt; method, which checks if the similarity is at or above the threshold of 0.3:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;similar&lt;/span&gt;&lt;/span&gt;&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;word1&lt;span &gt;,&lt;/span&gt; word2&lt;span &gt;)&lt;/span&gt;
  similarity&lt;span &gt;(&lt;/span&gt;word1&lt;span &gt;,&lt;/span&gt; word2&lt;span &gt;)&lt;/span&gt; &lt;span &gt;&gt;=&lt;/span&gt; &lt;span &gt;0.3&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

p similar&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;postgras&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;postgres&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;# true&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
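&lt;p&gt;Putting the pieces together, the &quot;did you mean&quot; question from the introduction can be sketched in plain Ruby. To keep the snippet runnable on its own, it redefines compact helpers under hypothetical names (&lt;code &gt;padded_trigrams&lt;/code&gt;, &lt;code &gt;trigram_similarity&lt;/code&gt;, &lt;code &gt;suggestions&lt;/code&gt;); the word list is made up, and Ruby 2.7 or newer is assumed for &lt;code &gt;intersection&lt;/code&gt; and &lt;code &gt;union&lt;/code&gt;:&lt;/p&gt;

```ruby
# Hypothetical helper: padded trigrams, as in the method shown earlier.
def padded_trigrams(word)
  "  #{word.downcase} ".chars.each_cons(3).map { |letters| letters.join }
end

# Hypothetical helper: shared trigrams divided by all distinct trigrams.
def trigram_similarity(word1, word2)
  tri1 = padded_trigrams(word1)
  tri2 = padded_trigrams(word2)
  tri1.intersection(tri2).size.to_f / tri1.union(tri2).size
end

# Returns the known words at or above the threshold, most similar first.
def suggestions(input, known_words, threshold: 0.3)
  known_words
    .map { |word| [word, trigram_similarity(input, word)] }
    .select { |_word, score| score >= threshold }
    .sort_by { |_word, score| -score }
    .map { |word, _score| word }
end

known_words = ["mysql", "postfix", "postgis", "postgres"]

matches = suggestions("postgras", known_words)
p matches
# ["postgres", "postgis", "postfix"]

strict_matches = suggestions("postgras", known_words, threshold: 0.45)
p strict_matches
# ["postgres"]
```

&lt;p&gt;Raising the threshold trades recall for precision: at 0.3 even &lt;code &gt;postfix&lt;/code&gt; counts as a match, while at 0.45 only &lt;code &gt;postgres&lt;/code&gt; remains.&lt;/p&gt;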
&lt;h2 id=&quot;using-trigrams-in-rails&quot; &gt;&lt;a href=&quot;#using-trigrams-in-rails&quot; aria-label=&quot;using trigrams in rails permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Using Trigrams in Rails&lt;/h2&gt;
&lt;p&gt;There aren&apos;t many gotchas when using these similarity functions and operators within your Rails app, but there are a couple!&lt;/p&gt;
&lt;p&gt;Below we have a migration to create a &lt;code &gt;cities&lt;/code&gt; table. To keep queries that use the similarity operator fast, the index on the &lt;code &gt;name&lt;/code&gt; column needs to be either a &lt;code &gt;gin&lt;/code&gt; or &lt;code &gt;gist&lt;/code&gt; index. We do this by indicating &lt;code &gt;using: :gin&lt;/code&gt;. &lt;strong&gt;In addition to that, we have to pass the opclass option&lt;/strong&gt; &lt;code &gt;opclass: :gin_trgm_ops&lt;/code&gt;, so Postgres knows which operator class to use for the &lt;code &gt;gin&lt;/code&gt; index.&lt;/p&gt;
&lt;p&gt;Unless you have already enabled the &lt;code &gt;pg_trgm&lt;/code&gt; extension, you will most likely receive an error, but this is easily fixed by adding &lt;code &gt;enable_extension :pg_trgm&lt;/code&gt; to your migration.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;CreateCities&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Migration&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;6.0&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;change&lt;/span&gt;&lt;/span&gt;
    enable_extension &lt;span &gt;:pg_trgm&lt;/span&gt;

    create_table &lt;span &gt;:cities&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;t&lt;span &gt;|&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;string &lt;span &gt;:name&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;
      t&lt;span &gt;.&lt;/span&gt;timestamps
      t&lt;span &gt;.&lt;/span&gt;index &lt;span &gt;:name&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; opclass&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:gin_trgm_ops&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; using&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:gin&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that we have the &lt;code &gt;pg_trgm&lt;/code&gt; extension enabled, and have correctly indexed the table, we can use the similarity operator &lt;code &gt;%&lt;/code&gt; inside of our where clauses, such as in the scope below:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;City&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  scope &lt;span &gt;:name_similar&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;name % :name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; name&lt;span &gt;:&lt;/span&gt; name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;

&lt;span &gt;City&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;name_similar&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;Torono&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;count
&lt;span &gt;# SELECT COUNT(*) FROM &quot;cities&quot; WHERE (name % &apos;Torono&apos;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
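&lt;p&gt;To make the comparison behind the &lt;code &gt;%&lt;/code&gt; operator more concrete, here is a rough plain-Ruby sketch of how a trigram similarity score can be computed. It assumes the rules described in the pg_trgm documentation: lowercase the text, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams across both strings. The method names &lt;code &gt;trigrams&lt;/code&gt; and &lt;code &gt;trigram_similarity&lt;/code&gt; are made up for this illustration; in the application itself, Postgres does this work.&lt;/p&gt;

```ruby
require "set"

# Hypothetical helper: split text into pg_trgm-style trigrams.
# Assumption: words are runs of alphanumeric characters, lowercased,
# padded with two leading spaces and one trailing space.
def trigrams(text)
  words = text.downcase.scan(/[[:alnum:]]+/)
  words.each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set.add(padded[i, 3]) }
  end
end

# Hypothetical helper: shared trigrams divided by all distinct trigrams.
def trigram_similarity(a, b)
  left = trigrams(a)
  right = trigrams(b)
  all = left.union(right)
  return 0.0 if all.empty?

  left.intersection(right).size.to_f / all.size
end

puts trigram_similarity("Torono", "Toronto")      # => 0.5
puts trigram_similarity("Deer Lake", "Dease Lake") # => 0.5
```

&lt;p&gt;According to the pg_trgm documentation, the &lt;code &gt;%&lt;/code&gt; operator matches when this score exceeds &lt;code &gt;pg_trgm.similarity_threshold&lt;/code&gt;, which defaults to 0.3, so the misspelled &lt;code &gt;Torono&lt;/code&gt; would still match &lt;code &gt;Toronto&lt;/code&gt;.&lt;/p&gt;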
&lt;h2 id=&quot;showing-the-closest-matches-for-a-term-based-on-its-similarity&quot; &gt;&lt;a href=&quot;#showing-the-closest-matches-for-a-term-based-on-its-similarity&quot; aria-label=&quot;showing the closest matches for a term based on its similarity permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Showing the closest matches for a term based on its similarity&lt;/h2&gt;
&lt;p&gt;We may not want to only limit by similarity using the &lt;code &gt;%&lt;/code&gt; operator, but also &lt;strong&gt;order the results from most similar to least similar&lt;/strong&gt;. Take the example query and its result below:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;select&lt;/span&gt; name&lt;span &gt;,&lt;/span&gt; similarity&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dease Lake&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;from&lt;/span&gt; cities
&lt;span &gt;where&lt;/span&gt; name &lt;span &gt;%&lt;/span&gt; &lt;span &gt;&apos;Dease Lake&apos;&lt;/span&gt;
&lt;span &gt;order&lt;/span&gt; &lt;span &gt;by&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;desc&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This query finds cities whose names are similar to &lt;code &gt;Dease Lake&lt;/code&gt;. It returns seven results, even though one of them is an exact match. Ideally then, we wouldn&apos;t just filter by similarity, but also order the results so that the closest match comes first.&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;Dease Lake  1
Deer Lake   0.5
Lake Louise 0.375
Lynn Lake   0.33333334
Red Lake    0.33333334
Cat Lake    0.33333334
Baker Lake  0.3125&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can do this by updating our scope to order by similarity. We have to be careful here, because the &lt;code &gt;similarity&lt;/code&gt; function needs the user input (&lt;code &gt;&apos;Dease Lake&apos;&lt;/code&gt; in our example) passed in as part of the &lt;code &gt;ORDER BY&lt;/code&gt; clause. To avoid SQL injection attacks and to ensure safe string quoting, we&apos;ll use the &lt;code &gt;quote_string&lt;/code&gt; method provided by &lt;code &gt;ActiveRecord::Base.connection&lt;/code&gt;.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;City&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;ApplicationRecord&lt;/span&gt;
  scope &lt;span &gt;:name_similar&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    quoted_name &lt;span &gt;=&lt;/span&gt; &lt;span &gt;ActiveRecord&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Base&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;connection&lt;span &gt;.&lt;/span&gt;quote_string&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;)&lt;/span&gt;
    where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;name % :name&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; name&lt;span &gt;:&lt;/span&gt; name&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      order&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Arel&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;sql&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;similarity(name, &apos;&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;quoted_name&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&apos;) DESC&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now when we use the &lt;code &gt;name_similar&lt;/code&gt; scope, the result will be ordered with the most similar city first, allowing us to find &lt;code &gt;Dease Lake&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;City&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;name_similar&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;Dease Lake&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;.&lt;/span&gt;name
&lt;span &gt;# =&gt; Dease Lake&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the SQL produced looks like:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;cities&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;cities&quot;&lt;/span&gt;
&lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;name &lt;span &gt;%&lt;/span&gt; &lt;span &gt;&apos;Dease Lake&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; similarity&lt;span &gt;(&lt;/span&gt;name&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&apos;Dease Lake&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;DESC&lt;/span&gt;
&lt;span &gt;LIMIT&lt;/span&gt; $&lt;span &gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we took a dive into the &lt;code &gt;pg_trgm&lt;/code&gt; extension, seeing first what trigrams actually are, and then how we can practically use similarity functions and operators in our Rails apps. &lt;strong&gt;This allows us to improve keyword searching&lt;/strong&gt;, by finding similar, rather than exact matches. We also managed to accomplish all of this &lt;strong&gt;without adding an additional backend service&lt;/strong&gt;, or too much additional complexity to our application.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Efficient GraphQL queries in Ruby on Rails & Postgres]]></title><description><![CDATA[GraphQL puts the user in control of their own destiny. Yes, they are confined to your schema, but beyond that they can access the data in any which way. Will they ask only for the "events", or also for the "category" of each event? We don't really know! In REST based APIs we know ahead of time what will be rendered, and can plan ahead by generating the required data efficiently, often by eager-loading the data we know we'll need. In this article, we will discuss what N+1 queries are, how they…]]></description><link>https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres</link><guid isPermaLink="false">https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres</guid><dc:creator><![CDATA[Leigh Halliday]]></dc:creator><pubDate>Tue, 24 Sep 2019 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;GraphQL puts the user in control of their own destiny. Yes, they are confined to your schema, but beyond that they can access the data in any which way. Will they ask only for the &quot;events&quot;, or also for the &quot;category&quot; of each event? We don&apos;t really know! 
In REST based APIs we know ahead of time what will be rendered, and can plan ahead by generating the required data efficiently, often by &lt;a href=&quot;https://apidock.com/rails/ActiveRecord/QueryMethods/includes&quot;&gt;eager-loading&lt;/a&gt; the data we know we&apos;ll need.&lt;/p&gt;
&lt;p&gt;In this article, we will discuss what &lt;strong&gt;N+1 queries&lt;/strong&gt; are, how they are easily produced in &lt;a href=&quot;https://graphql.org/learn/&quot;&gt;GraphQL&lt;/a&gt;, and how to solve them using the &lt;a href=&quot;https://github.com/Shopify/graphql-batch&quot;&gt;graphql-batch&lt;/a&gt; gem along with a few custom batch loaders.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/pganalyze/graphql-batch-example&quot;&gt;source code for this article&lt;/a&gt; is available on GitHub.&lt;/p&gt;
&lt;div &gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-are-n1-queries&quot;&gt;What are N+1 queries?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#n1-queries-in-graphql&quot;&gt;N+1 queries in GraphQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#optimizing-graphql-queries&quot;&gt;Optimizing GraphQL queries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#batch-loading-single-records&quot;&gt;Batch loading single records&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#batch-loading-many-records&quot;&gt;Batch loading many records&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#batch-loading-many-records-more-efficiently&quot;&gt;Batch loading many records more efficiently&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#batch-loading-active-storage-attachments&quot;&gt;Batch loading active storage attachments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#about-the-author&quot;&gt;About the Author&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;what-are-n1-queries&quot; &gt;&lt;a href=&quot;#what-are-n1-queries&quot; aria-label=&quot;what are n1 queries permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What are N+1 queries?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.sitepoint.com/silver-bullet-n1-problem/&quot;&gt;N+1 queries&lt;/a&gt; can occur when you have one-to-many relationships in your models. In our example, each &lt;code &gt;Event&lt;/code&gt; belongs to a &lt;code &gt;Category&lt;/code&gt;. Let&apos;s say that you load the last five events and want to print the &lt;strong&gt;category name&lt;/strong&gt; for each of them.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Event&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;last&lt;span &gt;(&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;event&lt;span &gt;|&lt;/span&gt; puts event&lt;span &gt;.&lt;/span&gt;category&lt;span &gt;.&lt;/span&gt;name &lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Seems simple enough! Unfortunately, we just produced six queries: one to find the events, and one more for each of the five events to find its category. This is an easy problem to solve in Rails by using &lt;strong&gt;eager-loading&lt;/strong&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;Event&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;includes&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:category&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;last&lt;span &gt;(&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;event&lt;span &gt;|&lt;/span&gt; puts event&lt;span &gt;.&lt;/span&gt;category&lt;span &gt;.&lt;/span&gt;name &lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By using the &lt;code &gt;includes&lt;/code&gt; method we&apos;ve been able to knock our queries down from six to two: The first to find the events, and the second to find the categories for those events.&lt;/p&gt;
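&lt;p&gt;The query counts above can be reproduced without a database. The following is a minimal sketch in plain Ruby, assuming a toy in-memory store that counts every lookup it receives. &lt;code &gt;FakeDb&lt;/code&gt; and its methods are hypothetical stand-ins for illustration only and are not part of ActiveRecord.&lt;/p&gt;

```ruby
# Hypothetical in-memory "database" that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(events, categories)
    @events = events
    @categories = categories
    @query_count = 0
  end

  # One query: the most recent events.
  def last_events(limit)
    @query_count += 1
    @events.last(limit)
  end

  # One query per call: a single category by id.
  def find_category(id)
    @query_count += 1
    @categories.find { |category| category[:id] == id }
  end

  # One query for many ids, like a WHERE id IN (...) lookup.
  def find_categories(ids)
    @query_count += 1
    @categories.select { |category| ids.include?(category[:id]) }
  end
end

categories = (1..5).map { |i| { id: i, name: "Category #{i}" } }
events = (1..5).map { |i| { id: i, name: "Event #{i}", category_id: i } }

# N+1: one query for the events, then one per event for its category.
lazy_db = FakeDb.new(events, categories)
lazy_names = lazy_db.last_events(5).map do |event|
  lazy_db.find_category(event[:category_id])[:name]
end

# Eager loading: one query for the events, one for all their categories.
eager_db = FakeDb.new(events, categories)
loaded = eager_db.last_events(5)
category_ids = loaded.map { |event| event[:category_id] }.uniq
by_id = eager_db.find_categories(category_ids).to_h { |c| [c[:id], c] }
eager_names = loaded.map { |event| by_id[event[:category_id]][:name] }

puts "N+1 queries:   #{lazy_db.query_count}"  # prints 6
puts "eager queries: #{eager_db.query_count}" # prints 2
```

&lt;p&gt;Both approaches return the same category names; only the number of round trips differs, and the gap grows with the number of events.&lt;/p&gt;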
&lt;h2 id=&quot;n1-queries-in-graphql&quot; &gt;&lt;a href=&quot;#n1-queries-in-graphql&quot; aria-label=&quot;n1 queries in graphql permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;N+1 queries in GraphQL&lt;/h2&gt;
&lt;p&gt;As we mentioned earlier, in GraphQL, the user is in charge of their own destiny. They may or may not ask for the category name for each event. The query below will produce N+1 SQL queries as it finds the category for each event:&lt;/p&gt;
&lt;div  data-language=&quot;graphql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;events&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;id&lt;/span&gt;
    &lt;span &gt;name&lt;/span&gt;
    &lt;span &gt;category&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;id&lt;/span&gt;
      &lt;span &gt;name&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;optimizing-graphql-queries&quot; &gt;&lt;a href=&quot;#optimizing-graphql-queries&quot; aria-label=&quot;optimizing graphql queries permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Optimizing GraphQL queries&lt;/h2&gt;
&lt;p&gt;Yes, we could solve the N+1 query in the previous example by eager-loading the &lt;code &gt;category&lt;/code&gt; relationship, but if the user didn&apos;t actually want the category, why load it? We don&apos;t know what the user will ask for. There just so happens to be a better way: &lt;strong&gt;lazy-loading&lt;/strong&gt; data only as it&apos;s needed, using the graphql-batch gem.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/__xuorig__&quot;&gt;Marc-André Giroux&lt;/a&gt; has written a very thorough article about the &lt;a href=&quot;https://medium.com/@__xuorig__/the-graphql-dataloader-pattern-visualized-3064a00f319f&quot;&gt;GraphQL Dataloader Pattern&lt;/a&gt; which I highly recommend reading before continuing.&lt;/p&gt;
&lt;h2 id=&quot;batch-loading-single-records&quot; &gt;&lt;a href=&quot;#batch-loading-single-records&quot; aria-label=&quot;batch loading single records permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Batch loading single records&lt;/h2&gt;
&lt;p&gt;The simplest case for batch loading data is the example of each event belonging to a category. Inside of our &lt;code &gt;EventType&lt;/code&gt; class, there is a field called &lt;code &gt;category&lt;/code&gt; which allows the user to access the category of an event.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;EventType&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;BaseObject&lt;/span&gt;
  field &lt;span &gt;:category&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;CategoryType&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;category&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;# avoid `object.category`&lt;/span&gt;
    &lt;span &gt;RecordLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Category&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;load&lt;span &gt;(&lt;/span&gt;object&lt;span &gt;.&lt;/span&gt;category_id&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By using the &lt;code &gt;RecordLoader&lt;/code&gt; class to load the category, we avoid loading each category right away, and instead load all of the required categories with a single query. The resulting query may look like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;categories&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;categories&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;categories&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;$&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Looking at the &lt;code &gt;RecordLoader&lt;/code&gt; class we can see how it works. The &lt;code &gt;perform&lt;/code&gt; method will receive all of the ids for a single model (Category in this case), load the records in a single SQL query, and then call the fulfill method for each of them. The &lt;code &gt;fulfill&lt;/code&gt; method resolves the promise, which is basically like putting a face to a name... you gave me an ID, and I&apos;ve fulfilled my promise to provide you with the corresponding record.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;RecordLoader&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Loader&lt;/span&gt;
  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;initialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;model&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;@model&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; model
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;perform&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;ids&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# Find all ids for this model and fulfill their promises&lt;/span&gt;
    &lt;span &gt;@model&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;:&lt;/span&gt; ids&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;record&lt;span &gt;|&lt;/span&gt; fulfill&lt;span &gt;(&lt;/span&gt;record&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; record&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
    &lt;span &gt;# Handle cases where a record was not found and fulfill the value as nil&lt;/span&gt;
    ids&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;id&lt;span &gt;|&lt;/span&gt; fulfill&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;unless&lt;/span&gt; fulfilled&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can write a test for this class to ensure that it finds the records correctly, keeping in mind that in order for the lazy/promise based code to function correctly it needs to be wrapped inside something called an &lt;code &gt;executor&lt;/code&gt;.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;describe &lt;span &gt;RecordLoader&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
  it &lt;span &gt;&apos;loads&apos;&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    event &lt;span &gt;=&lt;/span&gt; create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:event&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;batch &lt;span &gt;do&lt;/span&gt;
      &lt;span &gt;RecordLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Event&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;load&lt;span &gt;(&lt;/span&gt;event&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
    expect&lt;span &gt;(&lt;/span&gt;result&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to eq&lt;span &gt;(&lt;/span&gt;event&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
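&lt;p&gt;The deferred loading that &lt;code &gt;RecordLoader&lt;/code&gt; relies on can be sketched in plain Ruby. This is not how graphql-batch is implemented (the gem uses real promises and an executor); it is a simplified illustration that assumes lambdas as stand-ins for promises. &lt;code &gt;ToyLoader&lt;/code&gt;, &lt;code &gt;fetch_many&lt;/code&gt; and &lt;code &gt;batch_calls&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```ruby
# Hypothetical, heavily simplified batch loader. Lambdas play the role
# of promises: nothing is fetched until the first one is called.
class ToyLoader
  attr_reader :batch_calls

  # fetch_many: a callable taking an array of ids and returning the
  # matching records (hashes with an :id key).
  def initialize(fetch_many)
    @fetch_many = fetch_many
    @pending = []
    @results = {}
    @batch_calls = 0
  end

  # Remember the id and return a lambda that resolves it later.
  def load(id)
    @pending.push(id) unless @results.key?(id) || @pending.include?(id)
    -> { resolve(id) }
  end

  private

  # Fetch everything still pending in one batch, then return the record.
  def resolve(id)
    perform(@pending.shift(@pending.size)) unless @pending.empty?
    @results[id]
  end

  def perform(ids)
    @batch_calls += 1
    @fetch_many.call(ids).each { |record| @results[record[:id]] = record }
    # Ids without a matching record resolve to nil.
    ids.each { |id| @results[id] = nil unless @results.key?(id) }
  end
end

categories = { 1 => { id: 1, name: "Music" }, 2 => { id: 2, name: "Sports" } }
loader = ToyLoader.new(->(ids) { categories.values_at(*ids).compact })

deferred = [1, 2, 1, 99].map { |id| loader.load(id) }
records = deferred.map { |promise| promise.call }

puts records.map { |record| record ? record[:name] : "nil" }.join(", ")
# prints: Music, Sports, Music, nil
puts loader.batch_calls # prints 1
```

&lt;p&gt;Four loads, including a duplicate and a missing id, end up as a single batched lookup, which mirrors what &lt;code &gt;perform&lt;/code&gt; and &lt;code &gt;fulfill&lt;/code&gt; achieve in the real loader.&lt;/p&gt;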
&lt;h2 id=&quot;batch-loading-many-records&quot; &gt;&lt;a href=&quot;#batch-loading-many-records&quot; aria-label=&quot;batch loading many records permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Batch loading many records&lt;/h2&gt;
&lt;p&gt;We&apos;ve covered the case where we are batch loading &lt;strong&gt;a single record at a time&lt;/strong&gt;, but how do we handle the reverse scenario? Suppose we are displaying categories along with the first five events for each category. This would also produce N+1 queries, so let&apos;s see how we can solve it using a batch loader. The GraphQL query would look something like this:&lt;/p&gt;
&lt;div  data-language=&quot;graphql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;{&lt;/span&gt;
  &lt;span &gt;categories&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
    &lt;span &gt;id&lt;/span&gt;
    &lt;span &gt;name&lt;/span&gt;
    &lt;span &gt;events&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;first&lt;/span&gt;&lt;span &gt;:&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;{&lt;/span&gt;
      &lt;span &gt;id&lt;/span&gt;
      &lt;span &gt;name&lt;/span&gt;
    &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;}&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I have created a custom loader called &lt;code &gt;ForeignKeyLoader&lt;/code&gt; for this purpose. It will load the &lt;code &gt;events&lt;/code&gt; using the foreign key &lt;code &gt;category_id&lt;/code&gt;. I also added the ability to pass a lambda to merge in additional scopes into the query that will be run.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;CategoryType&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;BaseObject&lt;/span&gt;
  field &lt;span &gt;:events&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;EventType&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    argument &lt;span &gt;:first&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;Int&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; required&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; default_value&lt;span &gt;:&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;events&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;first&lt;span &gt;:&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;ForeignKeyLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Event&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:category_id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; merge&lt;span &gt;:&lt;/span&gt; &lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; order&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:asc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      load&lt;span &gt;(&lt;/span&gt;object&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;then&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;records&lt;span &gt;|&lt;/span&gt;
        records&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;(&lt;/span&gt;first&lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The query that gets produced looks something like:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;
&lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;category_id&quot;&lt;/span&gt; &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;$&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; $&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;ASC&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice in this case that we call the &lt;code &gt;then&lt;/code&gt; method to execute some code after the promise has been resolved. Here we see the first issue with this approach: we only wanted five events for each category, but our query loads &lt;strong&gt;ALL&lt;/strong&gt; events for each category, and only then narrows them down to the first five using the &lt;code &gt;first&lt;/code&gt; method on the resulting Array. If a category has thousands of events, every one of them is fetched from the database only for most of them to be thrown away.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;ForeignKeyLoader&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Loader&lt;/span&gt;
  attr_reader &lt;span &gt;:model&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:foreign_key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:merge&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;loader_key_for&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;group_args&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# avoiding including the `merge` lambda in loader key&lt;/span&gt;
    &lt;span &gt;# each lambda is unique which defeats the purpose of&lt;/span&gt;
    &lt;span &gt;# grouping queries together&lt;/span&gt;
    &lt;span &gt;[&lt;/span&gt;&lt;span &gt;self&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;concat&lt;span &gt;(&lt;/span&gt;group_args&lt;span &gt;.&lt;/span&gt;slice&lt;span &gt;(&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;initialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;model&lt;span &gt;,&lt;/span&gt; foreign_key&lt;span &gt;,&lt;/span&gt; merge&lt;span &gt;:&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;@model&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; model
    &lt;span &gt;@foreign_key&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; foreign_key
    &lt;span &gt;@merge&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; merge
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;perform&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;foreign_ids&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# find all the records&lt;/span&gt;
    scope &lt;span &gt;=&lt;/span&gt; model&lt;span &gt;.&lt;/span&gt;where&lt;span &gt;(&lt;/span&gt;foreign_key &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; foreign_ids&lt;span &gt;)&lt;/span&gt;
    scope &lt;span &gt;=&lt;/span&gt; scope&lt;span &gt;.&lt;/span&gt;merge&lt;span &gt;(&lt;/span&gt;merge&lt;span &gt;)&lt;/span&gt; &lt;span &gt;if&lt;/span&gt; merge&lt;span &gt;.&lt;/span&gt;present&lt;span &gt;?&lt;/span&gt;
    records &lt;span &gt;=&lt;/span&gt; scope&lt;span &gt;.&lt;/span&gt;to_a

    foreign_ids&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;foreign_id&lt;span &gt;|&lt;/span&gt;
      &lt;span &gt;# find the records required to fulfill each promise&lt;/span&gt;
      matching_records &lt;span &gt;=&lt;/span&gt; records&lt;span &gt;.&lt;/span&gt;select &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;r&lt;span &gt;|&lt;/span&gt;
        foreign_id &lt;span &gt;==&lt;/span&gt; r&lt;span &gt;.&lt;/span&gt;send&lt;span &gt;(&lt;/span&gt;foreign_key&lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
      fulfill&lt;span &gt;(&lt;/span&gt;foreign_id&lt;span &gt;,&lt;/span&gt; matching_records&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
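&lt;p&gt;The comment in &lt;code &gt;loader_key_for&lt;/code&gt; says that each lambda is unique. Here is a minimal plain-Ruby sketch of why that matters for grouping. The helper names &lt;code &gt;key_with_merge&lt;/code&gt; and &lt;code &gt;key_without_merge&lt;/code&gt; are hypothetical and only stand in for the two ways a loader key could be built; they are not part of graphql-batch.&lt;/p&gt;

```ruby
# Hypothetical, simplified stand-ins for two ways of building a loader key.
def key_with_merge(*group_args)
  group_args
end

def key_without_merge(*group_args)
  # Same idea as `group_args.slice(0,2)` in the loader: keep model and foreign key only.
  group_args.slice(0, 2)
end

# Two lambdas with identical bodies are still different objects.
merge_a = lambda { |scope| scope }
merge_b = lambda { |scope| scope }

with_a = key_with_merge(:Event, :category_id, merge: merge_a)
with_b = key_with_merge(:Event, :category_id, merge: merge_b)
puts with_a == with_b

without_a = key_without_merge(:Event, :category_id, merge: merge_a)
without_b = key_without_merge(:Event, :category_id, merge: merge_b)
puts without_a == without_b
```

&lt;p&gt;The first comparison is false and the second is true: once the lambda is left out of the key, both calls land in the same group and can share one query.&lt;/p&gt;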
&lt;h2 id=&quot;batch-loading-many-records-more-efficiently&quot; &gt;&lt;a href=&quot;#batch-loading-many-records-more-efficiently&quot; aria-label=&quot;batch loading many records more efficiently permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Batch loading many records more efficiently&lt;/h2&gt;
&lt;p&gt;It turns out that there &lt;em&gt;is&lt;/em&gt; a way to perform a query that says &quot;find me the first N records for each X&quot; (for example, the first 5 events for each category), and that involves using &lt;a href=&quot;https://www.postgresql.org/docs/9.3/functions-window.html&quot;&gt;Postgres Window Functions&lt;/a&gt;. While researching this concept, I found this article about &lt;a href=&quot;https://spin.atomicobject.com/2016/03/12/select-top-n-per-group-postgresql/&quot;&gt;window functions&lt;/a&gt; useful, along with this article about bringing &lt;a href=&quot;https://blog.codeship.com/folding-postgres-window-functions-into-rails/&quot;&gt;window functions into Rails&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following query produces the data that we want... we just need to figure out how to write a batch loader that generates the same result.&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;
&lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
  &lt;span &gt;SELECT&lt;/span&gt;
    &lt;span &gt;*&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
    row_number&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;OVER&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
      &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; category_id &lt;span &gt;ORDER&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; start_time &lt;span &gt;desc&lt;/span&gt;
    &lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; rank
  &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;
  &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;events&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;category_id&quot;&lt;/span&gt; &lt;span &gt;IN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;3&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;4&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;)&lt;/span&gt; &lt;span &gt;as&lt;/span&gt; events
&lt;span &gt;WHERE&lt;/span&gt; rank &lt;span &gt;&amp;lt;=&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
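&lt;p&gt;To make the semantics of that query concrete, here is a minimal plain-Ruby sketch of what the &lt;code &gt;row_number()&lt;/code&gt; window plus the outer rank filter compute. The &lt;code &gt;Row&lt;/code&gt; struct and the &lt;code &gt;top_n_per_group&lt;/code&gt; helper are hypothetical names used only for illustration, and &lt;code &gt;start_time&lt;/code&gt; is assumed to be a plain integer here. In the real loader Postgres does this work, so only the top rows ever leave the database.&lt;/p&gt;

```ruby
# Hypothetical in-memory stand-in for rows of the events table.
Row = Struct.new(:id, :category_id, :start_time)

# Mimics row_number() OVER (PARTITION BY category_id ORDER BY start_time DESC)
# followed by the outer filter that keeps rows whose rank is at most n.
def top_n_per_group(rows, n)
  rows.group_by { |row| row.category_id }.flat_map do |_category_id, group|
    ranked = group.sort_by { |row| -row.start_time } # newest first
    ranked.first(n)
  end
end

rows = [
  Row.new(1, 1, 100), Row.new(2, 1, 300), Row.new(3, 1, 200),
  Row.new(4, 2, 50),  Row.new(5, 2, 75)
]

top = top_n_per_group(rows, 2)
puts top.map { |row| row.id }.inspect
```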
&lt;p&gt;For this we&apos;ll create a batch loader called &lt;code &gt;WindowKeyLoader&lt;/code&gt; which is used like:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;CategoryType&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;BaseObject&lt;/span&gt;
  field &lt;span &gt;:events&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;[&lt;/span&gt;&lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;EventType&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    argument &lt;span &gt;:first&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;Int&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; required&lt;span &gt;:&lt;/span&gt; &lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; default_value&lt;span &gt;:&lt;/span&gt; &lt;span &gt;5&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;events&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;first&lt;span &gt;:&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;WindowKeyLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;Event&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:category_id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      limit&lt;span &gt;:&lt;/span&gt; first&lt;span &gt;,&lt;/span&gt;
      order_col&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:start_time&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      order_dir&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:desc&lt;/span&gt;
    &lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;load&lt;span &gt;(&lt;/span&gt;object&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see the difference already: we no longer need to slice off the first N array elements in the &lt;code &gt;then&lt;/code&gt; block of the resolved promise, because the database only returns the rows we asked for. The actual batch loader class looks like:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;WindowKeyLoader&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Loader&lt;/span&gt;
  attr_reader &lt;span &gt;:model&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:foreign_key&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:limit&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:order_col&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:order_dir&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;initialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;model&lt;span &gt;,&lt;/span&gt; foreign_key&lt;span &gt;,&lt;/span&gt; limit&lt;span &gt;:&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; order_col&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; order_dir&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:asc&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;@model&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; model
    &lt;span &gt;@foreign_key&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; foreign_key
    &lt;span &gt;@limit&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; limit
    &lt;span &gt;@order_col&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; order_col
    &lt;span &gt;@order_dir&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; order_dir
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;perform&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;foreign_ids&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# build the sub-query, limiting results by foreign key at this point&lt;/span&gt;
    &lt;span &gt;# we don&apos;t want to execute this query but get its SQL to be used later&lt;/span&gt;
    ranked_from &lt;span &gt;=&lt;/span&gt; model&lt;span &gt;.&lt;/span&gt;
      select&lt;span &gt;(&lt;/span&gt;&quot;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        row_number&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;OVER&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;
          &lt;span &gt;PARTITION&lt;/span&gt; &lt;span &gt;BY&lt;/span&gt; &lt;span &gt;#{foreign_key} ORDER BY #{order_col} #{order_dir}&lt;/span&gt;
        &lt;span &gt;)&lt;/span&gt; as rank&quot;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      where&lt;span &gt;(&lt;/span&gt;foreign_key &lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt; foreign_ids&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      to_sql

    &lt;span &gt;# use the sub-query from above to query records which have a rank&lt;/span&gt;
    &lt;span &gt;# value less than or equal to our limit&lt;/span&gt;
    records &lt;span &gt;=&lt;/span&gt; model&lt;span &gt;.&lt;/span&gt;
      from&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;(&lt;span &gt;&lt;span &gt;#{&lt;/span&gt;ranked_from&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;) as &lt;span &gt;&lt;span &gt;#{&lt;/span&gt;model&lt;span &gt;.&lt;/span&gt;table_name&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      where&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;rank &amp;lt;= ?&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; limit&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      to_a

    &lt;span &gt;# match records and fulfill promises&lt;/span&gt;
    foreign_ids&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;foreign_id&lt;span &gt;|&lt;/span&gt;
      matching_records &lt;span &gt;=&lt;/span&gt; records&lt;span &gt;.&lt;/span&gt;select &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;r&lt;span &gt;|&lt;/span&gt;
        foreign_id &lt;span &gt;==&lt;/span&gt; r&lt;span &gt;.&lt;/span&gt;send&lt;span &gt;(&lt;/span&gt;foreign_key&lt;span &gt;)&lt;/span&gt;
      &lt;span &gt;end&lt;/span&gt;
      fulfill&lt;span &gt;(&lt;/span&gt;foreign_id&lt;span &gt;,&lt;/span&gt; matching_records&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
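&lt;p&gt;The matching step above scans the whole &lt;code &gt;records&lt;/code&gt; array once per foreign id. For large batches, one possible alternative is to group the records by foreign key a single time and then look each id up. The sketch below assumes plain Ruby objects: &lt;code &gt;Record&lt;/code&gt; and &lt;code &gt;match_records&lt;/code&gt; are hypothetical names, and the hash it returns stands in for the calls to &lt;code &gt;fulfill&lt;/code&gt;.&lt;/p&gt;

```ruby
# Hypothetical stand-in for an ActiveRecord model instance.
Record = Struct.new(:id, :category_id)

# Groups records by foreign key once, then looks each requested id up.
# Ids with no matching records map to an empty array, as `select` would return.
def match_records(records, foreign_ids, foreign_key)
  grouped = records.group_by { |record| record.public_send(foreign_key) }
  foreign_ids.to_h { |foreign_id| [foreign_id, grouped.fetch(foreign_id, [])] }
end

records = [Record.new(10, 1), Record.new(11, 2), Record.new(12, 1)]
matches = match_records(records, [1, 2, 3], :category_id)
puts matches.keys.inspect
```

&lt;p&gt;Inside &lt;code &gt;perform&lt;/code&gt;, each pair in that hash would be passed to &lt;code &gt;fulfill(foreign_id, matching_records)&lt;/code&gt;.&lt;/p&gt;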
&lt;p&gt;We&apos;re able to test the &lt;code &gt;WindowKeyLoader&lt;/code&gt; by creating three events for a category but only asking for the first two of them:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;describe &lt;span &gt;WindowKeyLoader&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
  it &lt;span &gt;&apos;loads&apos;&lt;/span&gt; &lt;span &gt;do&lt;/span&gt;
    category &lt;span &gt;=&lt;/span&gt; create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:category&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
    events &lt;span &gt;=&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;1.&lt;/span&gt;&lt;span &gt;.3&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to_a&lt;span &gt;.&lt;/span&gt;map &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;n&lt;span &gt;|&lt;/span&gt;
      create&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:event&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; name&lt;span &gt;:&lt;/span&gt; &lt;span &gt;&quot;Event &lt;span &gt;&lt;span &gt;#{&lt;/span&gt;n&lt;span &gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; category&lt;span &gt;:&lt;/span&gt; category&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;

    result &lt;span &gt;=&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;batch &lt;span &gt;do&lt;/span&gt;
      &lt;span &gt;WindowKeyLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;
        &lt;span &gt;Event&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        &lt;span &gt;:category_id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
        limit&lt;span &gt;:&lt;/span&gt; &lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; order_col&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:id&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; order_dir&lt;span &gt;:&lt;/span&gt; &lt;span &gt;:asc&lt;/span&gt;
      &lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;load&lt;span &gt;(&lt;/span&gt;category&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;end&lt;/span&gt;

    expect&lt;span &gt;(&lt;/span&gt;result&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;to eq&lt;span &gt;(&lt;/span&gt;events&lt;span &gt;.&lt;/span&gt;first&lt;span &gt;(&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;batch-loading-active-storage-attachments&quot; &gt;&lt;a href=&quot;#batch-loading-active-storage-attachments&quot; aria-label=&quot;batch loading active storage attachments permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Batch loading active storage attachments&lt;/h2&gt;
&lt;p&gt;You may run into situations where you&apos;re loading polymorphic data, or other types of relationships which don&apos;t exactly fit into the mold of your standard has-many or belongs-to relationships. One case is with &lt;a href=&quot;https://edgeguides.rubyonrails.org/active_storage_overview.html&quot;&gt;ActiveStorage&lt;/a&gt;. In the code below we&apos;ll load an image URL for an event:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;EventType&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;Types&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;BaseObject&lt;/span&gt;
  field &lt;span &gt;:image&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;String&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; null&lt;span &gt;:&lt;/span&gt; &lt;span &gt;true&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;image&lt;/span&gt;&lt;/span&gt;
    &lt;span &gt;# produces 2N + 1 queries... yikes!&lt;/span&gt;
    &lt;span &gt;# url_for(object.image.variant({ quality: 75 }))&lt;/span&gt;

    &lt;span &gt;AttachmentLoader&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;for&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:Event&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:image&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;load&lt;span &gt;(&lt;/span&gt;object&lt;span &gt;.&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;then&lt;/span&gt; &lt;span &gt;do&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;image&lt;span &gt;|&lt;/span&gt;
      url_for&lt;span &gt;(&lt;/span&gt;image&lt;span &gt;.&lt;/span&gt;variant&lt;span &gt;(&lt;/span&gt;&lt;span &gt;{&lt;/span&gt; quality&lt;span &gt;:&lt;/span&gt; &lt;span &gt;75&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;if&lt;/span&gt; image
    &lt;span &gt;end&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This data is stored using a polymorphic relationship: loading the image first loads an &lt;code &gt;ActiveStorage::Attachment&lt;/code&gt; record, which then needs to load an &lt;code &gt;ActiveStorage::Blob&lt;/code&gt; record in order to produce the image URL. Done naively for N events, that adds up to 2N + 1 queries: one for the events, plus one attachment query and one blob query per event. Our &lt;code &gt;AttachmentLoader&lt;/code&gt; cuts this down to just two queries, no matter how many images you load.&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;class&lt;/span&gt; &lt;span &gt;AttachmentLoader&lt;/span&gt; &lt;span &gt;&amp;lt;&lt;/span&gt; &lt;span &gt;GraphQL&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Batch&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Loader&lt;/span&gt;
  attr_reader &lt;span &gt;:record_type&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;:attachment_name&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;initialize&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;record_type&lt;span &gt;,&lt;/span&gt; attachment_name&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;@record_type&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; record_type
    &lt;span &gt;@attachment_name&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; attachment_name
  &lt;span &gt;end&lt;/span&gt;

  &lt;span &gt;def&lt;/span&gt; &lt;span &gt;&lt;span &gt;perform&lt;/span&gt;&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;record_ids&lt;span &gt;)&lt;/span&gt;
    &lt;span &gt;# find records and fulfill promises&lt;/span&gt;
    &lt;span &gt;ActiveStorage&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;:&lt;/span&gt;&lt;span &gt;Attachment&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      includes&lt;span &gt;(&lt;/span&gt;&lt;span &gt;:blob&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      where&lt;span &gt;(&lt;/span&gt;record_type&lt;span &gt;:&lt;/span&gt; record_type&lt;span &gt;,&lt;/span&gt; record_id&lt;span &gt;:&lt;/span&gt; record_ids&lt;span &gt;,&lt;/span&gt; name&lt;span &gt;:&lt;/span&gt; attachment_name&lt;span &gt;)&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
      &lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;record&lt;span &gt;|&lt;/span&gt; fulfill&lt;span &gt;(&lt;/span&gt;record&lt;span &gt;.&lt;/span&gt;record_id&lt;span &gt;,&lt;/span&gt; record&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;

    &lt;span &gt;# fulfill unfound records&lt;/span&gt;
    record_ids&lt;span &gt;.&lt;/span&gt;&lt;span &gt;each&lt;/span&gt; &lt;span &gt;{&lt;/span&gt; &lt;span &gt;|&lt;/span&gt;id&lt;span &gt;|&lt;/span&gt; fulfill&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;,&lt;/span&gt; &lt;span &gt;nil&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;unless&lt;/span&gt; fulfilled&lt;span &gt;?&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;id&lt;span &gt;)&lt;/span&gt; &lt;span &gt;}&lt;/span&gt;
  &lt;span &gt;end&lt;/span&gt;
&lt;span &gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this case we &lt;em&gt;are&lt;/em&gt; taking advantage of eager-loading, because for each attachment we will need its corresponding blob record.&lt;/p&gt;
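&lt;p&gt;The loader also makes two passes over its input: it fulfills every id that has an attachment, then fulfills the remaining ids with &lt;code &gt;nil&lt;/code&gt; so that every requested id gets a value. A minimal plain-Ruby sketch of that bookkeeping follows. The &lt;code &gt;Attachment&lt;/code&gt; struct and &lt;code &gt;resolve_attachments&lt;/code&gt; helper are hypothetical, and a hash stands in for the promises.&lt;/p&gt;

```ruby
# Hypothetical stand-in for the ActiveStorage::Attachment rows the query returns.
Attachment = Struct.new(:record_id, :name)

# Mirrors the two passes in `perform`: record every attachment that was found,
# then fill in nil for each requested id that is still missing.
def resolve_attachments(record_ids, found)
  results = {}
  found.each { |attachment| results[attachment.record_id] = attachment }
  record_ids.each { |id| results[id] = nil unless results.key?(id) }
  results
end

found = [Attachment.new(1, 'image'), Attachment.new(3, 'image')]
results = resolve_attachments([1, 2, 3], found)
puts results.keys.sort.inspect
puts results[2].inspect
```

&lt;p&gt;Because an id can resolve to &lt;code &gt;nil&lt;/code&gt;, any code that consumes the result needs to cope with a missing attachment.&lt;/p&gt;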
&lt;p&gt;&lt;a href=&quot;https://pganalyze.com/ebooks/efficient-search-in-rails-with-postgres&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Download Free eBook: Efficient Search in Rails with Postgres&quot;
        title=&quot;Download Free eBook: Efficient Search in Rails with Postgres&quot;
        src=&quot;https://pganalyze.com/static/3e8bb134d6b5689ee9d20a10e6699b6c/acb04/ebook_promo_rails_search.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;GraphQL &lt;em&gt;can&lt;/em&gt; be as efficient as REST, but it requires approaching optimizations from a different angle. Instead of upfront optimizations, we lazy-load data only when required, loading it in batches to avoid excess trips to the database. In this article, we covered techniques to load single records, multiple records, and records with different types of relationships, as is the case with Active Storage, which has a polymorphic relationship.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Share this article:&lt;/strong&gt; If you liked this article we&apos;d appreciate it if you&apos;d &lt;a href=&quot;https://ctt.ac/Esi54&quot;&gt;tweet it to your peers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;about-the-author&quot; &gt;&lt;a href=&quot;#about-the-author&quot; aria-label=&quot;about the author permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;About the Author&lt;/h2&gt;
&lt;p&gt;Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at &lt;a href=&quot;https://www.flipgive.com&quot;&gt;FlipGive&lt;/a&gt; as a full-stack developer. He writes about Ruby and React on &lt;a href=&quot;https://www.leighhalliday.com&quot;&gt;his blog&lt;/a&gt; and publishes React tutorials on &lt;a href=&quot;https://youtube.com/leighhalliday&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures]]></title><description><![CDATA[Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11 In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.  Just-In-Time compilation (JIT) in Postgres 11 Just-In-Time compilation (JIT) for query execution was added in Postgres 11. It's not going to be enabled for queries by default, similar to parallel…]]></description><link>https://pganalyze.com/blog/postgres11-jit-compilation-auto-prewarm-sql-stored-procedures</link><guid isPermaLink="false">https://pganalyze.com/blog/postgres11-jit-compilation-auto-prewarm-sql-stored-procedures</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Thu, 04 Oct 2018 12:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Everyone’s favorite database, PostgreSQL, has a new release coming out soon: &lt;strong&gt;&lt;a href=&quot;https://www.postgresql.org/docs/11/static/release-11.html&quot;&gt;Postgres 11&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    href=&quot;https://pganalyze.com/static/f510dbb938b762fb8a629528636a45d6/09ede/jit_performance.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;JIT Performance in Postgres 11&quot;
        title=&quot;JIT Performance in Postgres 11&quot;
        src=&quot;https://pganalyze.com/static/f510dbb938b762fb8a629528636a45d6/1d69c/jit_performance.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;just-in-time-compilation-jit-in-postgres-11&quot; &gt;&lt;a href=&quot;#just-in-time-compilation-jit-in-postgres-11&quot; aria-label=&quot;just in time compilation jit in postgres 11 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Just-In-Time compilation (JIT) in Postgres 11&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.postgresql.org/docs/11/static/jit-reason.html&quot;&gt;Just-In-Time compilation (JIT)&lt;/a&gt; for query execution was added in Postgres 11. It&apos;s not going to be enabled for queries by default, similar to parallel query in Postgres 9.6, but can be very helpful for CPU-bound workloads and analytical queries.&lt;/p&gt;
&lt;p&gt;Specifically, JIT currently aims to optimize two essential parts of query execution: Expression evaluation and tuple deforming. To quote the &lt;a href=&quot;https://www.postgresql.org/docs/11/static/jit-reason.html&quot;&gt;Postgres documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Expression evaluation&lt;/strong&gt; is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tuple deforming&lt;/strong&gt; is the process of transforming an on-disk tuple into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Often you will have a workload that is mixed, where some queries will benefit from JIT, and some will be slowed down by the overhead.&lt;/p&gt;
&lt;p&gt;Here is how you can monitor JIT performance using EXPLAIN and &lt;code &gt;auto_explain&lt;/code&gt;, as well as how you can determine whether your queries are benefiting from JIT optimization.&lt;/p&gt;
&lt;h3 id=&quot;monitoring-jit-with-explain--auto_explain&quot; &gt;&lt;a href=&quot;#monitoring-jit-with-explain--auto_explain&quot; aria-label=&quot;monitoring jit with explain  auto_explain permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Monitoring JIT with EXPLAIN / auto_explain&lt;/h3&gt;
&lt;p&gt;First of all, you will need to make sure that your Postgres packages are compiled with JIT support (&lt;code &gt;--with-llvm&lt;/code&gt; configuration switch). Assuming that you have Postgres binaries compiled like that, the &lt;code &gt;jit&lt;/code&gt; configuration parameter controls whether JIT is actually being used.&lt;/p&gt;
&lt;p&gt;For this example, we’re working with one of our staging databases, and have picked a relatively simple query that can benefit from JIT:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; log_lines
 &lt;span &gt;WHERE&lt;/span&gt; log_classification &lt;span &gt;=&lt;/span&gt; &lt;span &gt;65&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;details&lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&apos;new_dead_tuples&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;integer&lt;/span&gt; &lt;span &gt;&gt;=&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For context, the table &lt;code &gt;log_lines&lt;/code&gt; is an internal log event statistics table of pganalyze, which is typically indexed per-server, but in this case we want to run an analytical query across all servers to count interesting &lt;a href=&quot;https://pganalyze.com/docs/log-insights/autovacuum/A65&quot;&gt;autovacuum completed log events&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First, if we run the query with &lt;code &gt;jit = off&lt;/code&gt;, we will get an execution plan and runtime like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;ANALYZE&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; BUFFERS&lt;span &gt;)&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; log_lines
    &lt;span &gt;WHERE&lt;/span&gt; log_classification &lt;span &gt;=&lt;/span&gt; &lt;span &gt;65&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;details&lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&apos;new_dead_tuples&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;integer&lt;/span&gt; &lt;span &gt;&gt;=&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                        QUERY PLAN                                                        │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=649724.03..649724.04 rows=1 width=8) (actual time=3498.939..3498.939 rows=1 loops=1)                    │
│   Buffers: shared hit=1538 read=386328                                                                                   │
│   I/O Timings: read=1098.036                                                                                             │
│   -&amp;gt;  Seq Scan on log_lines  (cost=0.00..649675.55 rows=19393 width=0) (actual time=0.028..3437.032 rows=667063 loops=1) │
│         Filter: ((log_classification = 65) AND (((details -&amp;gt;&amp;gt; &amp;#39;new_dead_tuples&amp;#39;::text))::integer &amp;gt;= 0))                  │
│         Rows Removed by Filter: 14396065                                                                                 │
│         Buffers: shared hit=1538 read=386328                                                                             │
│         I/O Timings: read=1098.036                                                                                       │
│ Planning Time: 0.095 ms                                                                                                  │
│ Execution Time: 3499.089 ms                                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

Time: 3499.580 ms (00:03.500)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note the usage of EXPLAIN&apos;s &lt;code &gt;BUFFERS&lt;/code&gt; option so we can compare whether any caching behavior affects our benchmarking. We can also see that I/O time was only 1,098 ms out of 3,499 ms, so roughly two thirds of the runtime is spent outside of I/O, and this query is mostly CPU bound.&lt;/p&gt;
&lt;p&gt;For comparison, when we enable JIT, we can see the following:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SET&lt;/span&gt; jit &lt;span &gt;=&lt;/span&gt; &lt;span &gt;on&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;EXPLAIN&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;&lt;span &gt;ANALYZE&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; BUFFERS&lt;span &gt;)&lt;/span&gt; &lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;COUNT&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;*&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; log_lines
    &lt;span &gt;WHERE&lt;/span&gt; log_classification &lt;span &gt;=&lt;/span&gt; &lt;span &gt;65&lt;/span&gt; &lt;span &gt;AND&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;details&lt;span &gt;-&lt;/span&gt;&lt;span &gt;&gt;&gt;&lt;/span&gt;&lt;span &gt;&apos;new_dead_tuples&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;::&lt;span &gt;integer&lt;/span&gt; &lt;span &gt;&gt;=&lt;/span&gt; &lt;span &gt;0&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                        QUERY PLAN                                                         │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=649724.03..649724.04 rows=1 width=8) (actual time=2816.497..2816.498 rows=1 loops=1)                     │
│   Buffers: shared hit=1570 read=386296                                                                                    │
│   I/O Timings: read=1154.438                                                                                              │
│   -&amp;gt;  Seq Scan on log_lines  (cost=0.00..649675.55 rows=19393 width=0) (actual time=78.912..2759.717 rows=667063 loops=1) │
│         Filter: ((log_classification = 65) AND (((details -&amp;gt;&amp;gt; &amp;#39;new_dead_tuples&amp;#39;::text))::integer &amp;gt;= 0))                   │
│         Rows Removed by Filter: 14396065                                                                                  │
│         Buffers: shared hit=1570 read=386296                                                                              │
│         I/O Timings: read=1154.438                                                                                        │
│ Planning Time: 0.095 ms                                                                                                   │
│ JIT:                                                                                                                      │
│   Functions: 4                                                                                                            │
│   Options: Inlining true, Optimization true, Expressions true, Deforming true                                             │
│   Timing: Generation 1.044 ms, Inlining 14.205 ms, Optimization 46.678 ms, Emission 17.868 ms, Total 79.795 ms            │
│ Execution Time: 2817.713 ms                                                                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(14 rows)

Time: 2818.250 ms (00:02.818)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this case, JIT cuts the execution time from 3,499 ms to 2,818 ms, which is about a &lt;strong&gt;25%&lt;/strong&gt; speed-up (roughly 20% less runtime), due to spending less CPU time, without any extra effort on our end. We can also see that the JIT tasks themselves added about 80 ms to the runtime.&lt;/p&gt;
&lt;p&gt;You can fine tune whether JIT is used for a particular query by the &lt;code &gt;jit_above_cost&lt;/code&gt; parameter which applies to the total cost of the query as determined by the Postgres planner. The cost is &lt;code &gt;649724&lt;/code&gt; in the above EXPLAIN output, which exceeds the default &lt;code &gt;jit_above_cost&lt;/code&gt; threshold of &lt;code &gt;100000&lt;/code&gt;. In a future post we&apos;ll walk through more examples of when using JIT can be beneficial.&lt;/p&gt;
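&lt;p&gt;To experiment with the threshold, a session-level sketch could look like the following. The value &lt;code &gt;1000000&lt;/code&gt; is an arbitrary illustration rather than a recommendation, and the inlining and optimization steps have their own, separate thresholds:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Current thresholds (Postgres 11 defaults: 100000, 500000, 500000)
SHOW jit_above_cost;
SHOW jit_inline_above_cost;
SHOW jit_optimize_above_cost;

-- Raise the threshold for this session only, so that the example query
-- (total cost of about 649724) is executed without JIT
SET jit_above_cost = 1000000;

-- Go back to the configured value
RESET jit_above_cost;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Re-running the EXPLAIN from above after the &lt;code &gt;SET&lt;/code&gt; should produce a plan without the &lt;code &gt;JIT:&lt;/code&gt; section, which makes it easy to compare both variants within one session.&lt;/p&gt;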
&lt;p&gt;You can gather these JIT statistics either for individual queries that you are interested in (using EXPLAIN), or automatically collect them for all of your queries using the &lt;code &gt;auto_explain&lt;/code&gt; extension. If you want to learn more about how to enable &lt;code &gt;auto_explain&lt;/code&gt; we recommend reviewing our guide about it: &lt;a href=&quot;https://pganalyze.com/docs/log-insights/setup/tuning-log-config-settings&quot;&gt;pganalyze Log Insights - Tuning Log Config Settings&lt;/a&gt;.&lt;/p&gt;
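&lt;p&gt;For a quick test in a single session, a minimal sketch might look like this. It assumes a superuser connection, and the 1 second threshold is only an example value:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Load the module into the current session only
LOAD &apos;auto_explain&apos;;

-- Log the plan of every statement that runs longer than 1 second
SET auto_explain.log_min_duration = &apos;1s&apos;;

-- Include actual timings and row counts, like EXPLAIN ANALYZE
SET auto_explain.log_analyze = on;

-- Include buffer usage, like EXPLAIN (ANALYZE, BUFFERS)
SET auto_explain.log_buffers = on;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keep in mind that &lt;code &gt;auto_explain.log_analyze&lt;/code&gt; adds timing overhead to the statements it instruments, so it is worth measuring the impact before enabling it broadly.&lt;/p&gt;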
&lt;p&gt;&lt;strong&gt;Fun fact:&lt;/strong&gt; As part of the writing of this article we ran experiments with JIT and &lt;code &gt;auto_explain&lt;/code&gt;, and discovered that JIT information wasn’t included with &lt;code &gt;auto_explain&lt;/code&gt;, but only with regular EXPLAINs. Luckily, we were able to &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b076eb7669d7279d0f446305c2e12dffd6bc3347&quot;&gt;contribute a bug fix to Postgres&lt;/a&gt;, which has been merged and will be part of the Postgres 11 release.&lt;/p&gt;
&lt;h2 id=&quot;preventing-cold-caches-auto-prewarm-in-postgres-11&quot; &gt;&lt;a href=&quot;#preventing-cold-caches-auto-prewarm-in-postgres-11&quot; aria-label=&quot;preventing cold caches auto prewarm in postgres 11 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Preventing cold caches: Auto prewarm in Postgres 11&lt;/h2&gt;
&lt;p&gt;A neat feature that will help you improve performance right after restarting Postgres is the new autoprewarm background worker functionality.&lt;/p&gt;
&lt;p&gt;If you are not familiar with &lt;a href=&quot;https://www.postgresql.org/docs/11/static/pgprewarm.html&quot;&gt;pg_prewarm&lt;/a&gt;, it&apos;s an extension that&apos;s bundled with Postgres (much like &lt;code &gt;pg_stat_statements&lt;/code&gt;), that you can use to preload data that’s on disk into the Postgres buffer cache.&lt;/p&gt;
&lt;p&gt;It is often very useful to ensure that a certain table is cached before the first production query hits the database, to avoid an overly slow response due to data being loaded from disk.&lt;/p&gt;
&lt;p&gt;Previously, you needed to manually specify which relations (i.e. tables) and which page offsets to preload, which was cumbersome, and hard to automate.&lt;/p&gt;
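&lt;p&gt;For reference, the manual approach looks roughly like the following sketch. It reuses the &lt;code &gt;log_lines&lt;/code&gt; table from earlier purely as an example, and the block range in the second call is an arbitrary illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the whole table into the Postgres buffer cache;
-- the return value is the number of blocks that were prewarmed
SELECT pg_prewarm(&apos;log_lines&apos;);

-- Load only blocks 0 through 999 of the main fork
SELECT pg_prewarm(&apos;log_lines&apos;, &apos;buffer&apos;, &apos;main&apos;, 0, 999);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;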
&lt;h3 id=&quot;caching-tables-with-autoprewarm&quot; &gt;&lt;a href=&quot;#caching-tables-with-autoprewarm&quot; aria-label=&quot;caching tables with autoprewarm permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Caching tables with autoprewarm&lt;/h3&gt;
&lt;p&gt;Starting in Postgres 11, you can instead have this done automatically, by adding &lt;code &gt;pg_prewarm&lt;/code&gt; to &lt;code &gt;shared_preload_libraries&lt;/code&gt; like this:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;shared_preload_libraries = &amp;#39;pg_prewarm,pg_stat_statements&amp;#39;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Doing this will automatically save information on which tables/indices are in the buffer cache (and which parts of them) every 300 seconds to a file called &lt;code &gt;autoprewarm.blocks&lt;/code&gt;, and use that information after Postgres restarts to reload the previously cached data from disk into the buffer cache, thus improving initial query performance.&lt;/p&gt;
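&lt;p&gt;Once the library is loaded, you can inspect the related settings and trigger a dump by hand. This is a sketch that assumes the &lt;code &gt;pg_prewarm&lt;/code&gt; extension has also been created in the current database, so that its SQL functions are available:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Whether the autoprewarm worker is enabled, and how often it dumps
SHOW pg_prewarm.autoprewarm;
SHOW pg_prewarm.autoprewarm_interval;

-- Write autoprewarm.blocks right away instead of waiting for the
-- next interval; returns the number of block records written
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT autoprewarm_dump_now();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;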
&lt;hr&gt;
&lt;h2 id=&quot;stored-procedures-in-postgres-11&quot; &gt;&lt;a href=&quot;#stored-procedures-in-postgres-11&quot; aria-label=&quot;stored procedures in postgres 11 permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Stored procedures in Postgres 11&lt;/h2&gt;
&lt;p&gt;Postgres has had database server-side functions for a long time, with a variety of supported languages. You might have used the term “procedures” before to refer to such functions, as they are similar to what’s called “Stored Procedures” in other database systems such as Oracle.&lt;/p&gt;
&lt;p&gt;However, one detail that is sometimes missed is that the existing functions in Postgres always run within a single transaction. There was no way to begin, commit, or roll back a transaction within a function, as they were not allowed to run outside of a transaction context.&lt;/p&gt;
&lt;p&gt;Starting in Postgres 11, you will have the ability to use &lt;code &gt;CREATE PROCEDURE&lt;/code&gt; instead of &lt;code &gt;CREATE FUNCTION&lt;/code&gt; to create procedures.&lt;/p&gt;
&lt;h3 id=&quot;benefits-of-using-stored-procedures&quot; &gt;&lt;a href=&quot;#benefits-of-using-stored-procedures&quot; aria-label=&quot;benefits of using stored procedures permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Benefits of using stored procedures&lt;/h3&gt;
&lt;p&gt;Compared to regular functions, procedures can do more than just query or modify data: They also have the ability to begin/commit/rollback transactions within the procedure.&lt;/p&gt;
&lt;p&gt;Particularly for those moving over from Oracle to PostgreSQL, the new procedure functionality can be a significant time saver. You can find some examples of how to convert procedures between those two relational database systems in the &lt;a href=&quot;https://www.postgresql.org/docs/11/static/plpgsql-porting.html&quot;&gt;Postgres documentation&lt;/a&gt;.&lt;/p&gt;
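&lt;p&gt;One pattern that transaction control enables is batch processing with a commit after every batch, so that locks are released and progress is kept even if a later batch fails. The following is a minimal sketch under stated assumptions: the &lt;code &gt;events&lt;/code&gt; table, its &lt;code &gt;id&lt;/code&gt; and &lt;code &gt;created_at&lt;/code&gt; columns, and the procedure name are all hypothetical:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Hypothetical example: delete old rows in batches, committing each batch
CREATE PROCEDURE purge_old_events(batch_size int) LANGUAGE plpgsql AS $$
DECLARE
  deleted int;
BEGIN
  LOOP
    DELETE FROM events
     WHERE id IN (SELECT id FROM events
                   WHERE created_at &amp;lt; now() - interval &apos;30 days&apos;
                   LIMIT batch_size);
    -- Number of rows removed by the DELETE above
    GET DIAGNOSTICS deleted = ROW_COUNT;
    -- End the current transaction; the next statement starts a new one
    COMMIT;
    EXIT WHEN deleted = 0;
  END LOOP;
END $$;

CALL purge_old_events(1000);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that a procedure can only execute transaction control statements when &lt;code &gt;CALL&lt;/code&gt; itself is not run inside an explicit transaction block.&lt;/p&gt;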
&lt;h3 id=&quot;how-to-use-stored-procedures&quot; &gt;&lt;a href=&quot;#how-to-use-stored-procedures&quot; aria-label=&quot;how to use stored procedures permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How to use stored procedures&lt;/h3&gt;
&lt;p&gt;First, let’s create a simple procedure that handles some tables:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;PROCEDURE&lt;/span&gt; my_table_task&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;LANGUAGE&lt;/span&gt; plpgsql &lt;span &gt;AS&lt;/span&gt; $$
&lt;span &gt;DECLARE&lt;/span&gt;
&lt;span &gt;BEGIN&lt;/span&gt;
  &lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; table_committed &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;COMMIT&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; table_rolled_back &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;ROLLBACK&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;END&lt;/span&gt; $$&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can then call this procedure like this, using the new CALL statement:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;=# CALL my_table_task();
CALL
Time: 1.573 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here you can see the benefit of procedures: despite the rollback, the overall execution is successful. The first table was created because its transaction was committed, but the second one was not, since its transaction was rolled back.&lt;/p&gt;
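&lt;p&gt;One way to verify the outcome, assuming the procedure above was called once in a database where neither table existed before, is to look both names up with &lt;code &gt;to_regclass&lt;/code&gt;, which returns NULL for relations that do not exist:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- The committed table resolves to its name,
-- the rolled back table resolves to NULL
SELECT to_regclass(&apos;table_committed&apos;)   AS committed,
       to_regclass(&apos;table_rolled_back&apos;) AS rolled_back;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;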
&lt;h3 id=&quot;be-careful-transaction-timestamps-and-xact_start-for-procedures&quot; &gt;&lt;a href=&quot;#be-careful-transaction-timestamps-and-xact_start-for-procedures&quot; aria-label=&quot;be careful transaction timestamps and xact_start for procedures permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Be careful: Transaction timestamps and xact_start for procedures&lt;/h3&gt;
&lt;p&gt;Expanding on how transactions work inside procedures, there is currently an oddity with the transaction timestamp, which for example you can see in &lt;code &gt;xact_start&lt;/code&gt;. When we expand the procedure like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;PROCEDURE&lt;/span&gt; my_table_task&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;LANGUAGE&lt;/span&gt; plpgsql &lt;span &gt;AS&lt;/span&gt; $$
&lt;span &gt;DECLARE&lt;/span&gt;
  clock_str &lt;span &gt;TEXT&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  tx_str &lt;span &gt;TEXT&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;BEGIN&lt;/span&gt;
  &lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; table_committed &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; clock_timestamp&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; clock_str&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; transaction_timestamp&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; tx_str&lt;span &gt;;&lt;/span&gt;
    RAISE NOTICE &lt;span &gt;&apos;After 1st CREATE TABLE: % clock, % xact&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clock_str&lt;span &gt;,&lt;/span&gt; tx_str&lt;span &gt;;&lt;/span&gt;
    PERFORM pg_sleep&lt;span &gt;(&lt;/span&gt;&lt;span &gt;5&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;COMMIT&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;CREATE&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; table_rolled_back &lt;span &gt;(&lt;/span&gt;id &lt;span &gt;int&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; clock_timestamp&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; clock_str&lt;span &gt;;&lt;/span&gt;
    &lt;span &gt;SELECT&lt;/span&gt; transaction_timestamp&lt;span &gt;(&lt;/span&gt;&lt;span &gt;)&lt;/span&gt; &lt;span &gt;INTO&lt;/span&gt; tx_str&lt;span &gt;;&lt;/span&gt;
    RAISE NOTICE &lt;span &gt;&apos;After 2nd CREATE TABLE: % clock, % xact&apos;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; clock_str&lt;span &gt;,&lt;/span&gt; tx_str&lt;span &gt;;&lt;/span&gt;
  &lt;span &gt;ROLLBACK&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;END&lt;/span&gt; $$&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And then call the procedure, we see the following:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;=# CALL my_table_task();
NOTICE:  00000: After 1st CREATE TABLE: 2018-10-03 22:17:26 clock, 2018-10-03 22:17:26 xact
NOTICE:  00000: After 2nd CREATE TABLE: 2018-10-03 22:17:31 clock, 2018-10-03 22:17:26 xact
CALL
Time: 5022.598 ms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Despite there being two transactions in the procedure, the transaction start timestamp is that of when the procedure got called, not when the embedded transaction actually started.&lt;/p&gt;
&lt;p&gt;You will see the same problem with the &lt;code &gt;xact_start&lt;/code&gt; field in &lt;code &gt;pg_stat_activity&lt;/code&gt;, causing monitoring scripts to potentially detect false positives for long running transactions. This issue is &lt;a href=&quot;https://www.postgresql.org/message-id/flat/20180920234040.GC29981%40momjian.us&quot;&gt;currently in discussion&lt;/a&gt; and likely to be changed before the final release.&lt;/p&gt;
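&lt;p&gt;For illustration, a typical monitoring check of this kind could look like the sketch below. The 5 minute threshold is an arbitrary example; a long-running procedure that commits regularly could still show up here, because of the behavior described above:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Sessions whose current transaction appears older than 5 minutes
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
   AND now() - xact_start &amp;gt; interval &apos;5 minutes&apos;
 ORDER BY xact_start;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;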
&lt;h3 id=&quot;how-often-does-my-stored-procedure-get-called&quot; &gt;&lt;a href=&quot;#how-often-does-my-stored-procedure-get-called&quot; aria-label=&quot;how often does my stored procedure get called permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How often does my stored procedure get called?&lt;/h3&gt;
&lt;p&gt;Now, if you want to monitor the performance of procedures, it gets a bit difficult. Whilst regular functions can be tracked using &lt;code &gt;track_functions = all&lt;/code&gt; (or &lt;code &gt;pl&lt;/code&gt; to track only procedural-language functions), there is no such facility for procedures. You can however track the execution of CALL statements using &lt;code &gt;pg_stat_statements&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; query&lt;span &gt;,&lt;/span&gt; calls&lt;span &gt;,&lt;/span&gt; total_time &lt;span &gt;FROM&lt;/span&gt; pg_stat_statements &lt;span &gt;WHERE&lt;/span&gt; query &lt;span &gt;LIKE&lt;/span&gt; &lt;span &gt;&apos;CALL%&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;┌────────────┬───────┬────────────┐
│   query    │ calls │ total_time │
├────────────┼───────┼────────────┤
│ CALL abc() │     4 │    5.62299 │
└────────────┴───────┴────────────┘&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In addition, when you enable &lt;code &gt;pg_stat_statements.track = all&lt;/code&gt;, queries that are called from within a procedure will be tracked, and made available in &lt;a href=&quot;https://pganalyze.com&quot;&gt;Postgres query performance monitoring tools such as pganalyze&lt;/a&gt;.&lt;/p&gt;
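&lt;p&gt;As a sketch of how this could be tried out in a single session, assuming a superuser connection and &lt;code &gt;pg_stat_statements&lt;/code&gt; being loaded via &lt;code &gt;shared_preload_libraries&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Also track statements nested inside procedures and functions
SET pg_stat_statements.track = &apos;all&apos;;

CALL my_table_task();

-- The statements executed inside the procedure now show up as well
-- (the column is named total_exec_time starting in Postgres 13)
SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;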
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Postgres 11 is going to be the best Postgres release yet, and we are excited to put it into use.&lt;/p&gt;
&lt;p&gt;Whilst common wisdom is to not upgrade right after a release, we encourage you to try out the new release early, help the community find bugs (just like we did!), and make sure that your performance monitoring systems are ready to handle the new features that were added.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;PS: If this article was useful to you and you want to share it with your peers you can tweet it by clicking &lt;a href=&quot;https://ctt.ac/JbyV9&quot;&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Postgres Log Monitoring 101: Deadlocks, Checkpoint Tuning & Blocked Queries]]></title><description><![CDATA[Those of us who operate production PostgreSQL databases have many jobs to do - and often there isn't enough time
to take a regular look at the Postgres log files. However, often times those logs contain critical details on how new application code is affecting the database due to locking issues, or how certain configuration parameters cause the database to produce I/O spikes. This post highlights three common performance problems you can find by looking at, and automatically filtering your…]]></description><link>https://pganalyze.com/blog/postgresql-log-monitoring-101-deadlocks-checkpoints-blocked-queries</link><guid isPermaLink="false">https://pganalyze.com/blog/postgresql-log-monitoring-101-deadlocks-checkpoints-blocked-queries</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Mon, 12 Feb 2018 00:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Those of us who operate production PostgreSQL databases have many jobs to do - and often there isn&apos;t enough time
to take a regular look at the Postgres log files.&lt;/p&gt;
&lt;p&gt;However, those logs often contain critical details on how new application code is affecting the database due to locking issues, or how certain configuration parameters cause the database to produce I/O spikes.&lt;/p&gt;
&lt;p&gt;This post highlights three common performance problems you can find by looking at, and automatically filtering, your Postgres logs.&lt;/p&gt;
&lt;h2 id=&quot;blocked-queries&quot; &gt;&lt;a href=&quot;#blocked-queries&quot; aria-label=&quot;blocked queries permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Blocked Queries&lt;/h2&gt;
&lt;p&gt;Among the most performance-relevant log events are blocked queries, i.e. queries that are waiting for locks that another query has taken. On systems that have problems with locks you will often also see very high CPU utilization that can&apos;t be explained.&lt;/p&gt;
&lt;p&gt;First, in order to enable logging of lock waits, set &lt;code &gt;log_lock_waits = on&lt;/code&gt; in your Postgres config. This will emit a log event like the following if a query has been waiting for longer than &lt;code &gt;deadlock_timeout&lt;/code&gt; (default 1s):&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;LOG: process 123 still waiting for ShareLock on transaction 12345678 after 1000.606 ms
STATEMENT: SELECT * FROM table WHERE id = 1 FOR UPDATE;
CONTEXT: while updating tuple (1,3) in relation &amp;quot;table&amp;quot;
DETAIL: Process holding the lock: 456. Wait queue: 123.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This tells us that we&apos;re seeing lock contention on updates for &lt;code &gt;table&lt;/code&gt;, as another transaction holds a lock on the same row we&apos;re trying to update. You can often see this caused by complex transactions that hold locks for too long. One frequent anti-pattern in a typical web app is to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open a transaction&lt;/li&gt;
&lt;li&gt;Update a timestamp field (e.g. &lt;code &gt;updated_at&lt;/code&gt; in Ruby on Rails)&lt;/li&gt;
&lt;li&gt;Make an API call to an external service&lt;/li&gt;
&lt;li&gt;Commit the transaction&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The lock on the row that you updated in Step 2 will be held all the way until the commit in Step 4, which means that if the API call takes a few seconds, you will be holding a lock on that row for that entire time. If you have any concurrency in your system that affects the same rows, you will see lock contention, and the above lock notice for the queries in Step 2.&lt;/p&gt;
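&lt;p&gt;One possible way to avoid this, sketched below, is to make the external API call before the transaction is opened, so the row lock is only held for the duration of the update itself. The &lt;code &gt;items&lt;/code&gt; table is a hypothetical example, and whether this reordering is safe depends on what your application needs to do when the API call fails:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- The application makes the external API call here,
-- before any transaction has been opened

BEGIN;
-- The row lock is taken here ...
UPDATE items SET updated_at = now() WHERE id = 1;
-- ... and released right away, since nothing slow happens in between
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;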
&lt;p&gt;Often, however, you have to go back to a development or staging system with full query logging to understand the full context of the transaction that&apos;s causing the problem.&lt;/p&gt;
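&lt;p&gt;If the problem is happening right now, you can also look at the live system. The following sketch assumes Postgres 9.6 or newer, where &lt;code &gt;pg_blocking_pids()&lt;/code&gt; is available, and lists sessions that are currently waiting on a lock together with the processes blocking them:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Sessions that are blocked, and the PIDs holding the locks they wait for
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start   AS waiting_for,
       query
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;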
&lt;h2 id=&quot;deadlocks&quot; &gt;&lt;a href=&quot;#deadlocks&quot; aria-label=&quot;deadlocks permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Deadlocks&lt;/h2&gt;
&lt;p&gt;Related to blocked queries, but slightly different, are deadlocks, which result in a cancelled query due to it deadlocking against another query.&lt;/p&gt;
&lt;p&gt;The easiest way to reproduce a deadlock is doing the following:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;--- session 1&lt;/span&gt;
&lt;span &gt;BEGIN&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;table&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt; &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;UPDATE&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;

&lt;span &gt;--- session 2&lt;/span&gt;
&lt;span &gt;BEGIN&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;table&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;UPDATE&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;table&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;1&lt;/span&gt; &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;UPDATE&lt;/span&gt;&lt;span &gt;;&lt;/span&gt; &lt;span &gt;--- this will block waiting for session 1 to finish&lt;/span&gt;

&lt;span &gt;--- session 1&lt;/span&gt;
&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;table&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; id &lt;span &gt;=&lt;/span&gt; &lt;span &gt;2&lt;/span&gt; &lt;span &gt;FOR&lt;/span&gt; &lt;span &gt;UPDATE&lt;/span&gt;&lt;span &gt;;&lt;/span&gt; &lt;span &gt;--- this can never finish as it deadlocks against session 2&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, after &lt;code &gt;deadlock_timeout&lt;/code&gt; Postgres will notice the locking problem. In this case it decides that the wait will never finish, cancels one of the queries, and emits the following to the logs:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;2018-02-12 09:24:52.176 UTC [3098] ERROR:  deadlock detected
2018-02-12 09:24:52.176 UTC [3098] DETAIL:  Process 3098 waits for ShareLock on transaction 219201; blocked by process 3099.
	Process 3099 waits for ShareLock on transaction 219200; blocked by process 3098.
	Process 3098: SELECT * FROM table WHERE id = 2 FOR UPDATE;
	Process 3099: SELECT * FROM table WHERE id = 1 FOR UPDATE;
2018-02-12 09:24:52.176 UTC [3098] HINT:  See server log for query details.
2018-02-12 09:24:52.176 UTC [3098] CONTEXT:  while locking tuple (0,1) in relation &amp;quot;table&amp;quot;
2018-02-12 09:24:52.176 UTC [3098] STATEMENT:  SELECT * FROM table WHERE id = 2 FOR UPDATE;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You might think that deadlocks never happen in production, but the unfortunate truth is that heavy use of ORM frameworks can hide the circular dependency situation that produces deadlocks, and it&apos;s certainly something to watch out for when you make use of complex transactions.&lt;/p&gt;
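&lt;p&gt;One common way to avoid the circular dependency is to have every transaction take its row locks in the same order. As a sketch, assuming a hypothetical &lt;code &gt;my_table&lt;/code&gt; in place of the placeholder table used above, both sessions could lock the rows they need in a single statement, ordered by &lt;code &gt;id&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Run by every session that needs both rows (my_table is a hypothetical name)
BEGIN;
SELECT * FROM my_table WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- ... do the actual work on both rows here ...
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this pattern the second session simply waits for the first one to commit, instead of each session holding one lock while waiting for the other. It reduces the risk rather than eliminating it, so you should still watch the logs for &lt;code &gt;deadlock detected&lt;/code&gt;.&lt;/p&gt;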
&lt;p&gt;&lt;a src=&quot;https://pganalyze.com/ebooks/monitoring-postgres-logs&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Download Free eBook: The Top 6 Postgres Log Events To Monitor&quot;
        title=&quot;Download Free eBook: The Top 6 Postgres Log Events To Monitor&quot;
        src=&quot;https://pganalyze.com/static/d5520b49175a81a398bfb64c836919c5/acb04/ebook_promo_log_events.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;checkpoints&quot; &gt;&lt;a href=&quot;#checkpoints&quot; aria-label=&quot;checkpoints permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Checkpoints&lt;/h2&gt;
&lt;p&gt;Last but not least, checkpoints. For those unfamiliar, checkpointing is the mechanism by which PostgreSQL persists all changes to the data directory, which before were only in shared buffers and the WAL. It&apos;s what gives you a consistent copy of your data in one place (the data directory).&lt;/p&gt;
&lt;p&gt;Because checkpoints have to write out all the changes you&apos;ve submitted to the database (which before were already written to the WAL), they can produce quite a lot of I/O, in particular when you are actively loading data.&lt;/p&gt;
&lt;p&gt;The easiest way to produce a checkpoint is to call &lt;code &gt;CHECKPOINT&lt;/code&gt;, but very few people would do that frequently in production. Instead Postgres has a mechanism that automatically triggers a checkpoint, most commonly due to either &lt;code &gt;time&lt;/code&gt;, or &lt;code &gt;xlog&lt;/code&gt;. After turning on &lt;code &gt;log_checkpoints = 1&lt;/code&gt; you can see this in the logs like this:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;Feb 09 08:30:07am PST 12772 LOG: checkpoint starting: time
Feb 09 08:15:50am PST 12772 LOG: checkpoint starting: xlog
Feb 09 08:10:39am PST 12772 LOG: checkpoint starting: xlog&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or when visualized over time, it can look like this:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/6bb89bdedc1b3b85d87d722a9985a14a/58354/checkpoint_starting.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Log Insights: Checkpoint Starting analysis&quot;
        title=&quot;Log Insights: Checkpoint Starting analysis&quot;
        src=&quot;https://pganalyze.com/static/6bb89bdedc1b3b85d87d722a9985a14a/1d69c/checkpoint_starting.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Occasionally Postgres will also output the following warning, which hints at the tuning you can do:&lt;/p&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;Feb 09 10:21:11am PST 5677 LOG: checkpoints are occurring too frequently (17 seconds apart)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With checkpoints you want to avoid having them occur too frequently, as each checkpoint will produce significant I/O, as well as cause all changes that are written to WAL right after to be written as a &lt;a href=&quot;https://www.postgresql.org/docs/10/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES&quot;&gt;full-page write&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ideally you would see checkpoints spaced out evenly and usually started by &lt;code &gt;time&lt;/code&gt; instead of &lt;code &gt;xlog&lt;/code&gt;. You can influence this behavior by the following config settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code &gt;checkpoint_timeout&lt;/code&gt; - the time after which a &lt;code &gt;time&lt;/code&gt; checkpoint will be kicked off (defaults to every 5 minutes)&lt;/li&gt;
&lt;li&gt;&lt;code &gt;max_wal_size&lt;/code&gt; - the maximum amount of WAL that will be accumulated before an &lt;code &gt;xlog&lt;/code&gt; checkpoint gets triggered (defaults to 1 GB)&lt;/li&gt;
&lt;li&gt;&lt;code &gt;checkpoint_completion_target&lt;/code&gt; - how quickly a checkpoint finishes (defaults to &lt;code &gt;0.5&lt;/code&gt; which means it will finish in half the time of &lt;code &gt;checkpoint_timeout&lt;/code&gt;, i.e. 2.5 minutes)&lt;/li&gt;
&lt;/ul&gt;
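&lt;p&gt;Before changing any of these settings, it can help to check how checkpoints have been triggered so far. A minimal sketch, assuming Postgres 16 or older, where these counters live in &lt;code &gt;pg_stat_bgwriter&lt;/code&gt; (newer releases expose them in &lt;code &gt;pg_stat_checkpointer&lt;/code&gt; instead):&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- checkpoints_timed: started because checkpoint_timeout elapsed
-- checkpoints_req:   requested, e.g. because max_wal_size was reached
SELECT checkpoints_timed, checkpoints_req, stats_reset
  FROM pg_stat_bgwriter;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If &lt;code &gt;checkpoints_req&lt;/code&gt; grows much faster than &lt;code &gt;checkpoints_timed&lt;/code&gt;, most checkpoints are not time-based, which matches the &lt;code &gt;xlog&lt;/code&gt; entries you would see in the logs.&lt;/p&gt;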
&lt;p&gt;On many production systems I&apos;ve seen &lt;code &gt;max_wal_size&lt;/code&gt; increased to support higher write rates, &lt;code &gt;checkpoint_timeout&lt;/code&gt; increased slightly to avoid overly frequent time-based checkpoints, and &lt;code &gt;checkpoint_completion_target&lt;/code&gt; set to &lt;code &gt;0.9&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You should however tune all of this based on your own system, and the logs, so you can choose what&apos;s correct for your setup. Also note that less frequent checkpoints mean recovery of the server is going to take longer, as Postgres will have to replay all WAL, starting from the previous checkpoint, when booting after a crash.&lt;/p&gt;
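&lt;p&gt;As a sketch of how such a change could be applied: the values for &lt;code &gt;max_wal_size&lt;/code&gt; and &lt;code &gt;checkpoint_timeout&lt;/code&gt; below are illustrative assumptions, not recommendations, and should be derived from your own logs:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Log every checkpoint so you can verify the effect of the change
ALTER SYSTEM SET log_checkpoints = on;
-- Illustrative values only, pick your own based on write volume
ALTER SYSTEM SET max_wal_size = &apos;4GB&apos;;
ALTER SYSTEM SET checkpoint_timeout = &apos;10min&apos;;
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
-- All four settings take effect on reload, no restart needed
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Afterwards, keep an eye on the &lt;code &gt;checkpoint starting&lt;/code&gt; log lines to confirm that checkpoints are now mostly triggered by &lt;code &gt;time&lt;/code&gt;.&lt;/p&gt;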
&lt;h2 id=&quot;conclusion&quot; &gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Postgres log files contain a treasure of useful data you can analyze in order to make your system behave faster, as well as debug production issues. This data is readily available, but often difficult to parse.&lt;/p&gt;
&lt;p&gt;This article tries to point the way towards which log lines are worth filtering for on production systems.&lt;/p&gt;
&lt;p&gt;If you don&apos;t want to bother with setting up your own filters in a third party logging system, try out &lt;a href=&quot;https://pganalyze.com/blog/postgres-log-monitoring-with-pganalyze/&quot;&gt;pganalyze Postgres Log Insights&lt;/a&gt;: a real-time PostgreSQL log analysis and log monitoring system built into pganalyze.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Visualizing & Tuning Postgres Autovacuum]]></title><description><![CDATA[In this post we'll take a deep dive into one of the mysteries of PostgreSQL: VACUUM and autovacuum. The Postgres autovacuum logic can be tricky to understand and tune - it has many moving parts,
and is hard to understand, in particular for application developers who don't spend
all day looking at database documentation. But luckily there are recent improvements in Postgres, in particular the addition of
pg_stat_progress_vacuum
in Postgres 9.6, that make understanding autovacuum and VACUUM…]]></description><link>https://pganalyze.com/blog/visualizing-and-tuning-postgres-autovacuum</link><guid isPermaLink="false">https://pganalyze.com/blog/visualizing-and-tuning-postgres-autovacuum</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Tue, 28 Nov 2017 00:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/939f208fc12f8c026ee0fb8e800af11c/5df5d/timeline_short.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;VACUUM timeline visualization&quot;
        title=&quot;VACUUM timeline visualization&quot;
        src=&quot;https://pganalyze.com/static/939f208fc12f8c026ee0fb8e800af11c/1d69c/timeline_short.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;In this post we&apos;ll take a deep dive into one of the mysteries of PostgreSQL: VACUUM and autovacuum.&lt;/p&gt;
&lt;p&gt;The Postgres autovacuum logic can be tricky to understand and tune - it has many moving parts,
and is hard to understand, in particular for application developers who don&apos;t spend
all day looking at database documentation.&lt;/p&gt;
&lt;p&gt;But luckily there are recent improvements in Postgres, in particular the addition of
&lt;a href=&quot;https://www.postgresql.org/docs/10/static/progress-reporting.html&quot;&gt;pg_stat_progress_vacuum&lt;/a&gt;
in Postgres 9.6, that make understanding autovacuum and VACUUM
behavior a bit easier.&lt;/p&gt;
&lt;p&gt;In this post we describe an approach to autovacuum tuning that is based on sampling
these statistics over time, visualizing them, and then making tuning decisions based on data.
The visualizations shown are all screenshots of real data, and are available for
early access in pganalyze.&lt;/p&gt;
&lt;h2 id=&quot;why-vacuum&quot; &gt;&lt;a href=&quot;#why-vacuum&quot; aria-label=&quot;why vacuum permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Why VACUUM?&lt;/h2&gt;
&lt;p&gt;First of all, a quick 101 on why we need VACUUM:&lt;/p&gt;
&lt;p&gt;When you perform UPDATE and DELETE operations on a table in Postgres,
the database has to keep around the old row data for concurrently running queries and transactions,
due to its MVCC model. Once all concurrent transactions that have seen these old rows have finished,
they effectively become dead rows which will need to be removed.&lt;/p&gt;
&lt;p&gt;VACUUM is the process by which PostgreSQL cleans up these dead rows, and turns the space they have
occupied into usable space again, to be used for future writes.&lt;/p&gt;
&lt;p&gt;A more detailed description can be found in the &lt;a href=&quot;https://www.postgresql.org/docs/10/static/routine-vacuuming.html&quot;&gt;PostgreSQL documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;which-tables-have-vacuum-running&quot; &gt;&lt;a href=&quot;#which-tables-have-vacuum-running&quot; aria-label=&quot;which tables have vacuum running permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Which tables have VACUUM running?&lt;/h2&gt;
&lt;p&gt;The easiest thing you can check on a running PostgreSQL system is which VACUUM
operations are running right now. In all Postgres versions this information shows up in the &lt;code &gt;pg_stat_activity&lt;/code&gt; view.
Look for query values that start with &quot;autovacuum: &quot;, or that contain the word &quot;VACUUM&quot;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; pid&lt;span &gt;,&lt;/span&gt; query &lt;span &gt;FROM&lt;/span&gt; pg_stat_activity &lt;span &gt;WHERE&lt;/span&gt; query &lt;span &gt;LIKE&lt;/span&gt; &lt;span &gt;&apos;autovacuum: %&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;-------+----------------------------------------------------------------------------
 10469 | autovacuum: VACUUM ANALYZE public.schema_columns
 12848 | autovacuum: VACUUM public.replication_follower_stats (to prevent wraparound)
 28626 | autovacuum: VACUUM public.schema_index_stats (to prevent wraparound)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Based on sampling this data, we can generate a timeline view that helps us distinguish
tables that are frequently vacuumed, from tables that have long running vacuums, to
tables that don&apos;t get vacuumed much at all.&lt;/p&gt;
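&lt;p&gt;A sampling query for this could look roughly like the following sketch. It is an assumption about how one might collect the data, not necessarily how pganalyze does it, and you would run it on a fixed interval and store the rows for later visualization:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- One sample of all running autovacuum and manual VACUUM processes
SELECT now() AS sampled_at,
       pid,
       now() - xact_start AS running_for,
       query
  FROM pg_stat_activity
 WHERE query LIKE &apos;autovacuum: %&apos;
    OR (query ILIKE &apos;vacuum%&apos; AND pid &lt;&gt; pg_backend_pid());&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code &gt;running_for&lt;/code&gt; column makes it easy to spot long running vacuums even from a single sample.&lt;/p&gt;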
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/0b80ae48b95f283af7b2d8c9a80c4049/7970d/timeline.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;VACUUM timeline visualization with details&quot;
        title=&quot;VACUUM timeline visualization with details&quot;
        src=&quot;https://pganalyze.com/static/0b80ae48b95f283af7b2d8c9a80c4049/1d69c/timeline.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;In the screenshot you can see the top 10 tables (by frequency) colored the same way,
and in particular the table that&apos;s colored light yellow stands out as effectively
having VACUUM running continuously.&lt;/p&gt;
&lt;p&gt;We can also see that one manual VACUUM was started by the DBA user (colored in cyan),
and that it ran much quicker than the same colored version started by autovacuum
earlier in the day.&lt;/p&gt;
&lt;h2 id=&quot;when-does-autovacuum-run&quot; &gt;&lt;a href=&quot;#when-does-autovacuum-run&quot; aria-label=&quot;when does autovacuum run permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;When does autovacuum run?&lt;/h2&gt;
&lt;p&gt;Another question that frequently comes up is, why did autovacuum decide to start
VACUUMing a table?&lt;/p&gt;
&lt;p&gt;There are essentially two major reasons:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) To prevent Transaction ID wraparound&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The number of non-frozen transaction IDs has reached &quot;autovacuum_freeze_max_age&quot;
(default 200 million transactions), and VACUUM is required to prevent
transaction ID wraparound.&lt;/p&gt;
&lt;p&gt;We won&apos;t go too much into detail on tuning this parameter in this post, but rather reserve this as a
follow-on topic.&lt;/p&gt;
&lt;p&gt;Note that this can&apos;t be disabled, so it will cause autovacuum to start VACUUM,
even if it is otherwise disabled. If you keep cancelling autovacuum processes
started for this reason you will eventually have to perform a manual VACUUM,
as Postgres will shut down the database otherwise.&lt;/p&gt;
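&lt;p&gt;To see how close your tables are to this limit, you can compare the age of each table&apos;s &lt;code &gt;relfrozenxid&lt;/code&gt; with the setting. A minimal sketch, run in the database you want to inspect:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Tables with the oldest non-frozen transaction IDs in the current database
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting(&apos;autovacuum_freeze_max_age&apos;)::bigint AS freeze_max_age
  FROM pg_class c
 WHERE c.relkind IN (&apos;r&apos;, &apos;m&apos;, &apos;t&apos;)
 ORDER BY age(c.relfrozenxid) DESC
 LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Tables whose &lt;code &gt;xid_age&lt;/code&gt; approaches &lt;code &gt;freeze_max_age&lt;/code&gt; are the ones for which autovacuum will soon start a VACUUM to prevent wraparound.&lt;/p&gt;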
&lt;p&gt;&lt;strong&gt;2) To mark dead rows &amp;#x26; enable re-use for new data&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As you run UPDATEs and DELETEs, dead rows will accumulate, as described earlier
in the post. Once the number of dead rows (or tuples) has exceeded the threshold,
autovacuum will start a VACUUM run.&lt;/p&gt;
&lt;p&gt;The following formula is used to decide whether vacuuming is needed:&lt;/p&gt;
&lt;div  data-language=&quot;text&quot;&gt;&lt;pre &gt;&lt;code &gt;vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By default the base threshold is 50 rows, and the scale factor is 20%. That means
a table will be vacuumed as soon as the number of dead rows exceeds 50 plus 20% of
the rows in the table.&lt;/p&gt;
&lt;p&gt;In order to understand when this gets triggered, you can look at the &lt;code &gt;n_live_tup&lt;/code&gt; and &lt;code &gt;n_dead_tup&lt;/code&gt;
values in &lt;a href=&quot;https://www.postgresql.org/docs/10/static/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW&quot;&gt;pg_stat_user_tables&lt;/a&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; pg_stat_user_tables &lt;span &gt;WHERE&lt;/span&gt; relname &lt;span &gt;=&lt;/span&gt; &lt;span &gt;&apos;backend_states&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;-[ RECORD 1 ]-------+------------------------------
relid               | 732156523
schemaname          | public
relname             | backend_states
...
n_live_tup          | 23047184
n_dead_tup          | 108373
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
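&lt;p&gt;For the table above, the default settings give a threshold of roughly 50 + 0.2 * 23,047,184, or about 4.6 million dead rows, so the 108,373 dead rows are still far below it. The following sketch computes this for the tables with the most dead rows. It assumes the global settings apply (per-table overrides are ignored), and it uses &lt;code &gt;reltuples&lt;/code&gt; from &lt;code &gt;pg_class&lt;/code&gt; as the row count, which is an estimate that can differ from &lt;code &gt;n_live_tup&lt;/code&gt;:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Dead rows vs. the autovacuum threshold, using global settings only
SELECT s.relname,
       s.n_dead_tup,
       current_setting(&apos;autovacuum_vacuum_threshold&apos;)::bigint
         + current_setting(&apos;autovacuum_vacuum_scale_factor&apos;)::float8
           * greatest(c.reltuples, 0) AS vacuum_threshold
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid
 ORDER BY s.n_dead_tup DESC
 LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;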
&lt;p&gt;We can then take this information, together with the autovacuum settings, and visualize it:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/043b1a8340aa3f989198da109eaacd97/0f882/vacuum_table.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;VACUUM table&quot;
        title=&quot;VACUUM table&quot;
        src=&quot;https://pganalyze.com/static/043b1a8340aa3f989198da109eaacd97/1d69c/vacuum_table.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Here you can see that as soon as the dead tuples (grey/red area) reach the threshold (grey line),
a VACUUM process kicks off (red line in the lower graph).&lt;/p&gt;
&lt;p&gt;On a table where VACUUM can&apos;t keep up, which results in bloat due to dead rows,
this would instead look like this:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/59e3ed24aa959efa2876311d1a2f64f2/e4ba2/vacuum_table_frequent.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;VACUUM table alternative&quot;
        title=&quot;VACUUM table alternative&quot;
        src=&quot;https://pganalyze.com/static/59e3ed24aa959efa2876311d1a2f64f2/1d69c/vacuum_table_frequent.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-fast-does-autovacuum-run&quot; &gt;&lt;a href=&quot;#how-fast-does-autovacuum-run&quot; aria-label=&quot;how fast does autovacuum run permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How fast does autovacuum run?&lt;/h2&gt;
&lt;p&gt;A VACUUM process that was started by autovacuum is artificially throttled in the default
PostgreSQL configuration, so it doesn&apos;t fully utilize the CPU and I/O available.&lt;/p&gt;
&lt;p&gt;That is the correct way to operate for most systems, as you wouldn&apos;t want VACUUM to
slow down application queries during business hours.&lt;/p&gt;
&lt;p&gt;The system that Postgres follows for this is that every VACUUM operation accumulates
cost, which you can think of as points that get added up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#GUC-VACUUM-COST-PAGE-HIT&quot;&gt;vacuum_cost_page_hit&lt;/a&gt; (cost for vacuuming a page found in the buffer cache, default 1)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#GUC-VACUUM-COST-PAGE-MISS&quot;&gt;vacuum_cost_page_miss&lt;/a&gt; (cost for vacuuming a page retrieved from disk, default 10)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#GUC-VACUUM-COST-PAGE-DIRTY&quot;&gt;vacuum_cost_page_dirty&lt;/a&gt; (cost for writing back a modified page to disk, default 20)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the sum of costs has reached autovacuum_vacuum_cost_limit (default 200 for autovacuum, with throttling disabled for manual VACUUM),
the VACUUM process will sleep and do nothing for autovacuum_vacuum_cost_delay (default 20 ms).&lt;/p&gt;
&lt;p&gt;With the default parameters, that means that autovacuum will at most write 4MB/s to disk, and read 8MB/s from disk or the OS page cache.&lt;/p&gt;
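&lt;p&gt;These numbers follow directly from the settings. The sketch below redoes the arithmetic in SQL, assuming the defaults quoted above and the standard 8 kB block size. The constants are hard-coded rather than read from the server, so you can plug in your own values:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Upper bound on autovacuum I/O for a given throttling configuration
WITH settings AS (
  SELECT 200  AS cost_limit,     -- cost budget per round
         20   AS cost_delay_ms,  -- sleep after each round
         10   AS page_miss,      -- cost of reading a page from disk
         20   AS page_dirty,     -- cost of dirtying a page
         8192 AS block_size      -- bytes per page
)
SELECT cost_limit * (1000.0 / cost_delay_ms) AS cost_per_second,
       cost_limit * (1000.0 / cost_delay_ms) / page_miss
         * block_size / (1024 * 1024) AS max_read_mb_per_s,
       cost_limit * (1000.0 / cost_delay_ms) / page_dirty
         * block_size / (1024 * 1024) AS max_write_mb_per_s
  FROM settings;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With the defaults this works out to a budget of 10,000 cost points per second, which is 1,000 page reads or 500 page writes, i.e. a bit under 8MB/s read and a bit under 4MB/s written.&lt;/p&gt;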
&lt;p&gt;&lt;a src=&quot;https://pganalyze.com/ebooks/optimizing-postgres-query-performance&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Download Free eBook: How To Get 3x Faster Postgres&quot;
        title=&quot;Download Free eBook: How To Get 3x Faster Postgres&quot;
        src=&quot;https://pganalyze.com/static/c15d0b3082bebd2680b86cc948555f76/acb04/ebook_promo_query_performance.jpg&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-far-has-this-vacuum-made-progress&quot; &gt;&lt;a href=&quot;#how-far-has-this-vacuum-made-progress&quot; aria-label=&quot;how far has this vacuum made progress permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;How far has this VACUUM made progress?&lt;/h2&gt;
&lt;p&gt;VACUUM runs through three different major phases as part of its operation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scanning Heap&lt;/li&gt;
&lt;li&gt;Vacuuming Indices&lt;/li&gt;
&lt;li&gt;Vacuuming Heap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As well as a few &lt;a href=&quot;https://www.postgresql.org/docs/10/static/progress-reporting.html#VACUUM-PHASES&quot;&gt;minor phases&lt;/a&gt; that are usually really quick.&lt;/p&gt;
&lt;p&gt;The &quot;Vacuuming Indices&quot; and &quot;Vacuuming Heap&quot; phases might run multiple times if the
&lt;code &gt;autovacuum_work_mem&lt;/code&gt; setting is too low for all dead tuples
to be held in memory at once.&lt;/p&gt;
&lt;p&gt;Based on sampling &lt;a href=&quot;https://www.postgresql.org/docs/10/static/progress-reporting.html&quot;&gt;pg_stat_progress_vacuum&lt;/a&gt; we can visualize in detail what goes on:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/c799301db2bace33f8bc62383323647b/1acf3/vacuum_detail.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;VACUUM details&quot;
        title=&quot;VACUUM details&quot;
        src=&quot;https://pganalyze.com/static/c799301db2bace33f8bc62383323647b/1d69c/vacuum_detail.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This works even whilst an autovacuum or manual VACUUM is still running, and so we can
get a visual indication of how long we will roughly have to wait for it to finish.&lt;/p&gt;
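&lt;p&gt;If you want to check progress by hand, a query along these lines can help. This is a minimal sketch, assuming Postgres 9.6 or newer and that you are connected to the database in which the VACUUM is running, so that &lt;code &gt;relid&lt;/code&gt; resolves to a table name:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Current phase and heap scan progress of all running VACUUMs
SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,
       p.heap_blks_scanned,
       p.heap_blks_total,
       round(100.0 * p.heap_blks_scanned
             / nullif(p.heap_blks_total, 0), 1) AS pct_scanned,
       p.index_vacuum_count
  FROM pg_stat_progress_vacuum p;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;An &lt;code &gt;index_vacuum_count&lt;/code&gt; above 1 indicates that the index vacuuming phase has already run more than once for this VACUUM, which is the situation described above for a too low &lt;code &gt;autovacuum_work_mem&lt;/code&gt;.&lt;/p&gt;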
&lt;h2 id=&quot;what-should-i-tune-first&quot; &gt;&lt;a href=&quot;#what-should-i-tune-first&quot; aria-label=&quot;what should i tune first permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What should I tune first?&lt;/h2&gt;
&lt;p&gt;In general, one might think that VACUUM is an expensive operation, and you&apos;d want
to only run it infrequently, maybe even as a nightly maintenance task.&lt;/p&gt;
&lt;p&gt;That however is often the wrong way to approach it, as rarely run VACUUMs are much
more expensive since they have more work to do, and it also means your system
will spend more time in a sub-optimal state.&lt;/p&gt;
&lt;p&gt;Instead, try to have VACUUM run more often, in proportion to UPDATEs and DELETEs your
application performs. Frequently run VACUUMs will be faster, as there is less work to perform.&lt;/p&gt;
&lt;p&gt;There are two primary tunings you should consider on production Postgres databases:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) Lower autovacuum_vacuum_scale_factor on tables with old, inactive data&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For tables with a lot of old, inactive data, consider lowering the threshold by
which autovacuum is triggered. Since the calculation is based on the number of
total rows in the table, autovacuum will not notice even if most of the recent rows
have been modified, since the overall number of dead rows will still be way below
the default threshold of 20%.&lt;/p&gt;
&lt;p&gt;However, you will see the impact of dead rows on your query performance, as the
dead rows have to be scanned over when reading data. Reducing the scale factor to
keep down the total number of dead rows can make sense in such cases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) Adjust autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay for bigger machines&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The default settings for throttling are quite conservative on modern systems. Unless
you run on the smallest instance type, or with the cheapest storage, it often makes sense
to speed up autovacuum a bit.&lt;/p&gt;
&lt;p&gt;In addition, for small tables that have a lot of updates/deletes, it can happen that autovacuum is not
able to keep up, and that you will see new VACUUMs start pretty much right after
the previous one was finished. In such cases adjusting the throttling on a per-table basis
might also make sense.&lt;/p&gt;
&lt;p&gt;Note that most autovacuum configuration
settings can be overridden on a per-table basis:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;ALTER&lt;/span&gt; &lt;span &gt;TABLE&lt;/span&gt; my_table &lt;span &gt;SET&lt;/span&gt; &lt;span &gt;(&lt;/span&gt;autovacuum_vacuum_scale_factor &lt;span &gt;=&lt;/span&gt; &lt;span &gt;0.05&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It often makes sense to review that table&apos;s particular statistics, e.g. how often
the table is updated and how many dead tuples it accumulates, before modifying
autovacuum settings.&lt;/p&gt;
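&lt;p&gt;The throttling settings can be overridden per table in the same way. In the sketch below, &lt;code &gt;my_table&lt;/code&gt; and the chosen values are illustrative assumptions, not recommendations:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Let autovacuum work faster on one busy table (illustrative values)
ALTER TABLE my_table SET (
  autovacuum_vacuum_cost_limit = 1000,
  autovacuum_vacuum_cost_delay = 10
);

-- Review which overrides are currently set on the table
SELECT relname, reloptions FROM pg_class WHERE relname = &apos;my_table&apos;;

-- Go back to the global settings
ALTER TABLE my_table RESET (
  autovacuum_vacuum_cost_limit,
  autovacuum_vacuum_cost_delay
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;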
&lt;hr&gt;
&lt;p&gt;The visualizations shown in this post are based on real data, and are now available
for early access to all pganalyze customers on the Scale plan and higher.&lt;/p&gt;
&lt;p&gt;Reach out to have this feature enabled for your account - we&apos;d be happy to walk you
through it, and help you tune autovacuum on your database.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Whats New in Postgres 10: Monitoring Improvements]]></title><description><![CDATA[Postgres 10 has been stamped on Monday, and will most likely be released this week, so this seems like a good time
to review what this new release brings in terms of Monitoring functionality built into the database. In this post you'll see a few things that we find exciting about the new release, as well as
some tips on what to adjust, whether you use a hosted Postgres monitoring tool like pganalyze,
or if you've written your own scripts. New "pg_monitor" Monitoring Role Most users of Postgres…]]></description><link>https://pganalyze.com/blog/whats-new-in-postgres-10-monitoring-improvements</link><guid isPermaLink="false">https://pganalyze.com/blog/whats-new-in-postgres-10-monitoring-improvements</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Wed, 04 Oct 2017 00:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;Postgres 10 has been stamped on Monday, and will most likely be released this week, so this seems like a good time
to review what this new release brings in terms of Monitoring functionality built into the database.&lt;/p&gt;
&lt;p&gt;In this post you&apos;ll see a few things that we find exciting about the new release, as well as
some tips on what to adjust, whether you use a hosted Postgres monitoring tool like pganalyze,
or if you&apos;ve written your own scripts.&lt;/p&gt;
&lt;h2 id=&quot;new-pg_monitor-monitoring-role&quot; &gt;&lt;a href=&quot;#new-pg_monitor-monitoring-role&quot; aria-label=&quot;new pg_monitor monitoring role permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;New &quot;pg_monitor&quot; Monitoring Role&lt;/h2&gt;
&lt;p&gt;Most users of Postgres obviously don&apos;t want to give monitoring tools superuser access, but in
the past this was often required, as many Postgres statistics views (e.g. pg_stat_statements)
only show the values for the current user, unless you are superuser.&lt;/p&gt;
&lt;p&gt;This meant that you had to work around the restriction with &lt;code &gt;SECURITY DEFINER&lt;/code&gt; functions that query
the statistics views as superuser, but could be called from a restricted user.&lt;/p&gt;
&lt;p&gt;Now, you can use the monitoring role in Postgres 10 to instead give a user specific
access to monitor statistics views, without giving out any other access.&lt;/p&gt;
&lt;p&gt;It&apos;s as simple as:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;GRANT&lt;/span&gt; pg_monitor &lt;span &gt;TO&lt;/span&gt; monitoring_user&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And afterwards that user can simply access statistics views without running into &lt;code &gt;&amp;lt;insufficient privilege&gt;&lt;/code&gt; issues like before.&lt;/p&gt;
&lt;p&gt;This also works with pganalyze out of the box, so once you upgrade to 10 you can
simply grant the monitoring role to the pganalyze user, and drop the helper
functions we&apos;ve previously asked you to create.&lt;/p&gt;
&lt;p&gt;A subset of often used views that the monitoring role now grants you access to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_stat_statements&lt;/li&gt;
&lt;li&gt;pg_stat_activity&lt;/li&gt;
&lt;li&gt;pg_stat_replication&lt;/li&gt;
&lt;li&gt;pg_stat_progress_vacuum&lt;/li&gt;
&lt;li&gt;.. &lt;a href=&quot;https://www.postgresql.org/docs/10/static/monitoring-stats.html&quot;&gt;and more&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that there are more &lt;a href=&quot;https://www.postgresql.org/docs/10/static/default-roles.html&quot;&gt;fine-grained roles&lt;/a&gt; you can assign, should you want to.&lt;/p&gt;
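&lt;p&gt;As a sketch of the more restrictive variant, the following grants only &lt;code &gt;pg_read_all_stats&lt;/code&gt;, one of the default roles that &lt;code &gt;pg_monitor&lt;/code&gt; is a member of, and then checks the membership. &lt;code &gt;monitoring_user&lt;/code&gt; is the same example user as above:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Only the statistics views, without the other privileges that pg_monitor bundles
GRANT pg_read_all_stats TO monitoring_user;

-- Check that the membership is in place
SELECT pg_has_role(&apos;monitoring_user&apos;, &apos;pg_read_all_stats&apos;, &apos;MEMBER&apos;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;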
&lt;h2 id=&quot;renaming-of-xlog-to-wal-and-location-to-lsn&quot; &gt;&lt;a href=&quot;#renaming-of-xlog-to-wal-and-location-to-lsn&quot; aria-label=&quot;renaming of xlog to wal and location to lsn permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Renaming of &quot;xlog&quot; to &quot;wal&quot;, and &quot;location&quot; to &quot;lsn&quot;&lt;/h2&gt;
&lt;p&gt;If you&apos;ve written your own monitoring scripts to check replication lag, and other
statistics that have to do with WAL or LSNs, you&apos;ll need to update some function names.&lt;/p&gt;
&lt;p&gt;In this new release, besides the WAL directory being renamed from &quot;pg_xlog&quot; to &quot;pg_wal&quot;,
all system administration functions have also been renamed to match this change. In addition,
functions that previously had &quot;location&quot; in their name now use &quot;lsn&quot; instead.&lt;/p&gt;
&lt;p&gt;You are most likely going to run into this with the often used &lt;code &gt;pg_current_xlog_location&lt;/code&gt; (now &lt;code &gt;pg_current_wal_lsn&lt;/code&gt;), as well as the helper method &lt;code &gt;pg_xlog_location_diff&lt;/code&gt; (now &lt;code &gt;pg_wal_lsn_diff&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Also note that the &lt;code &gt;sent_location&lt;/code&gt;, &lt;code &gt;write_location&lt;/code&gt;, etc. fields in &lt;code &gt;pg_stat_replication&lt;/code&gt; have been renamed to &lt;code &gt;sent_lsn&lt;/code&gt;, &lt;code &gt;write_lsn&lt;/code&gt; and so forth.&lt;/p&gt;
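&lt;p&gt;For illustration, a typical replication lag check written against the new names might look like the following sketch. It is meant to be run on the primary, and the &lt;code &gt;replay_lag_bytes&lt;/code&gt; alias is just an example name:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Postgres 10 and newer: bytes of WAL each standby still has to replay
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
  FROM pg_stat_replication;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On 9.6 and older, the same query would use &lt;code &gt;pg_xlog_location_diff&lt;/code&gt;, &lt;code &gt;pg_current_xlog_location&lt;/code&gt; and &lt;code &gt;replay_location&lt;/code&gt; instead.&lt;/p&gt;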
&lt;h2 id=&quot;wait-events--non-client-connections-in-pg_stat_activity&quot; &gt;&lt;a href=&quot;#wait-events--non-client-connections-in-pg_stat_activity&quot; aria-label=&quot;wait events  non client connections in pg_stat_activity permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Wait Events &amp;#x26; Non-Client Connections in pg_stat_activity&lt;/h2&gt;
&lt;p&gt;The &lt;code &gt;pg_stat_activity&lt;/code&gt; view and its underlying data structure have been thoroughly improved in this release. The view now shows not just client connections and autovacuum, but also other background workers that are running in the system:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; pid&lt;span &gt;,&lt;/span&gt; backend_type&lt;span &gt;,&lt;/span&gt; backend_start &lt;span &gt;FROM&lt;/span&gt; pg_stat_activity &lt;span &gt;WHERE&lt;/span&gt; backend_type &lt;span &gt;!=&lt;/span&gt; &lt;span &gt;&apos;client backend&apos;&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt; pid |    backend_type     |         backend_start         
-----+---------------------+-------------------------------
  58 | autovacuum launcher | 2017-10-03 21:02:45.458053+00
  60 | background worker   | 2017-10-03 21:02:45.459172+00
  56 | background writer   | 2017-10-03 21:02:45.457657+00
  55 | checkpointer        | 2017-10-03 21:02:45.457491+00
  57 | walwriter           | 2017-10-03 21:02:45.457817+00&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you have previously written monitoring scripts that rely on counting the number of entries in pg_stat_activity, you should filter the view by &lt;code &gt;backend_type = &apos;client backend&apos;&lt;/code&gt;, or switch to using &lt;code &gt;numbackends&lt;/code&gt; from &lt;code &gt;pg_stat_database&lt;/code&gt;.&lt;/p&gt;
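&lt;p&gt;Both options can be sketched as follows. The &lt;code &gt;client_connections&lt;/code&gt; alias is illustrative:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Option 1: count only regular client connections
SELECT count(*) AS client_connections
  FROM pg_stat_activity
 WHERE backend_type = &apos;client backend&apos;;

-- Option 2: per-database backend counts from pg_stat_database
SELECT datname, numbackends
  FROM pg_stat_database
 WHERE datname IS NOT NULL;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;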
&lt;p&gt;In addition to this, the new release also brings an additional 115 wait events (visible in &lt;code &gt;wait_event_type&lt;/code&gt; and &lt;code &gt;wait_event&lt;/code&gt; in &lt;code &gt;pg_stat_activity&lt;/code&gt;), in particular more than 60 new I/O related events which help you understand better what a query is busy with.&lt;/p&gt;
&lt;p&gt;You can find the full list of wait events in the &lt;a href=&quot;https://www.postgresql.org/docs/10/static/monitoring-stats.html#wait-event-table&quot;&gt;Postgres documentation&lt;/a&gt;.&lt;/p&gt;
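&lt;p&gt;As a quick way to see which wait events are currently showing up, a query along these lines can be used. It is a point-in-time sample rather than a history, and the &lt;code &gt;backends&lt;/code&gt; alias is illustrative:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Number of backends per wait event at this moment
SELECT wait_event_type,
       wait_event,
       count(*) AS backends
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL
 GROUP BY wait_event_type, wait_event
 ORDER BY backends DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;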
&lt;h2 id=&quot;amcheck&quot; &gt;&lt;a href=&quot;#amcheck&quot; aria-label=&quot;amcheck permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;amcheck&lt;/h2&gt;
&lt;p&gt;Last but not least, a useful feature for consistency checking got added in this release. Initially developed by Peter Geoghegan and battle-tested at Heroku Postgres, this new tool allows you to check a B-Tree index for corruption as well as verify that invariants in the structure of the index are as expected.&lt;/p&gt;
&lt;p&gt;The extension first needs to be created with &lt;code &gt;CREATE EXTENSION amcheck&lt;/code&gt; and can then be run by a superuser like this:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; bt_index_check&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&apos;my_test_index&apos;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;&lt;span &gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt; bt_index_check
----------------
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;An empty result indicates that the index is consistent, as would be expected.&lt;/p&gt;
&lt;p&gt;Note that amcheck accesses the index through the shared buffer cache, so it might not show problems at the disk level right away. See more details on its &lt;a href=&quot;https://www.postgresql.org/docs/10/static/amcheck.html&quot;&gt;documentation page&lt;/a&gt;.&lt;/p&gt;
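&lt;p&gt;To check more than one index at a time, the function can be called from a catalog query. The following sketch is adapted from the example on the amcheck documentation page and checks every valid B-Tree index in the current database:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- Run bt_index_check on all valid, ready B-Tree indexes
SELECT c.relname,
       bt_index_check(c.oid)
  FROM pg_index i
  JOIN pg_class c ON i.indexrelid = c.oid
  JOIN pg_am am ON c.relam = am.oid
 WHERE am.amname = &apos;btree&apos;
   -- skip temporary relations, which may belong to other sessions
   AND c.relpersistence != &apos;t&apos;
   AND i.indisready
   AND i.indisvalid;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The query raises an error at the first index that fails the check, so on a healthy database it simply returns one row per index.&lt;/p&gt;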
&lt;hr&gt;
&lt;p&gt;This concludes a short overview of new monitoring functionality in Postgres 10.&lt;/p&gt;
&lt;p&gt;Note that there are many other amazing new features like parallel query, logical replication and declarative partitioning that are not covered in this post.&lt;/p&gt;
&lt;p&gt;If this article proved useful to you, you might also be interested in our &lt;a href=&quot;https://pganalyze.com/blog/postgresql-log-monitoring-101-deadlocks-checkpoints-blocked-queries&quot;&gt;Postgres Log Monitoring 101&lt;/a&gt; article where we take a closer look at Deadlocks, Checkpoint Tuning, and Blocked Queries.&lt;/p&gt; ]]&gt;</content:encoded></item><item><title><![CDATA[Introducing pg_query: Parse PostgreSQL queries in Ruby]]></title><description><![CDATA[In this article we'll take a look at the new pg_query Ruby library. pg_query is a Ruby library I wrote to help you parse SQL queries and work with the PostgreSQL parse tree. We use this extension inside pganalyze to provide contextual information for each query and find columns which might need an index. At the end of this article you'll also find monitor.rb - a ready-to-use example that filters pg_stat_statements output and restricts it to only show a specific table. Existing Solutions to Parse…]]></description><link>https://pganalyze.com/blog/parse-postgresql-queries-in-ruby</link><guid isPermaLink="false">https://pganalyze.com/blog/parse-postgresql-queries-in-ruby</guid><dc:creator><![CDATA[Lukas Fittl]]></dc:creator><pubDate>Tue, 17 Jun 2014 00:00:00 GMT</pubDate><content:encoded>&lt;![CDATA[ &lt;p&gt;In this article we&apos;ll take a look at the new &lt;strong&gt;&lt;a href=&quot;https://github.com/pganalyze/pg_query&quot;&gt;pg_query&lt;/a&gt;&lt;/strong&gt; Ruby library.&lt;/p&gt;
&lt;p&gt;pg_query is a Ruby library I wrote to help you parse SQL queries and work with the PostgreSQL parse tree. We use this extension inside &lt;a href=&quot;https://pganalyze.com&quot;&gt;pganalyze&lt;/a&gt; to provide contextual information for each query and find columns which might need an index.&lt;/p&gt;
&lt;p&gt;At the end of this article you&apos;ll also find &lt;strong&gt;&lt;a href=&quot;https://gist.github.com/lfittl/301542602607b738b23f&quot;&gt;monitor.rb&lt;/a&gt;&lt;/strong&gt; - a ready-to-use example that filters pg_stat_statements output and restricts it to only show a specific table.&lt;/p&gt;
&lt;h2 id=&quot;existing-solutions-to-parse-sql-queries&quot; &gt;&lt;a href=&quot;#existing-solutions-to-parse-sql-queries&quot; aria-label=&quot;existing solutions to parse sql queries permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Existing Solutions to Parse SQL Queries&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;http://xkcd.com/208/&quot;&gt;&lt;span
      
      
    &gt;
      &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;xkcd comic on regular expressions&quot;
        title=&quot;xkcd comic on regular expressions&quot;
        src=&quot;https://pganalyze.com/static/e6b0aa1e4ff445198ecb4cef11709213/bc962/xkcd_regexp.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
    &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After a longer period of research on this problem, we&apos;ve come to a few realizations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Obviously, using regular expressions for parsing any complex language is &lt;a href=&quot;http://stackoverflow.com/a/1732454&quot;&gt;a bad idea&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;None of the existing parsers work really well, or are maintained. For example &lt;a href=&quot;https://github.com/andialbrecht/sqlparse&quot;&gt;sqlparse&lt;/a&gt; is focused on re-indenting and beautifying SQL - not on actually working with the query.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Writing and maintaining our own SQL parser is a bad idea. SQL is complex, even for simple things like &lt;a href=&quot;http://www.postgresql.org/docs/current/static/sql-select.html&quot;&gt;SELECT&lt;/a&gt;. And don&apos;t get me started on Common Table Expressions, sub-queries and other fun features.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Our conclusion:&lt;/strong&gt; The only way to correctly parse all valid SQL queries that PostgreSQL understands, now and in the future, is to use PostgreSQL itself.&lt;/p&gt;
&lt;p&gt;And in general, PostgreSQL turns out to have a pretty good SQL parser - other SQL databases &lt;a href=&quot;https://www.youtube.com/watch?v=ZvmMzI0X7fE#t=4m15s&quot;&gt;even use it as a reference implementation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So we&apos;ve pretty much determined that we wanted to use the PostgreSQL parser itself - but how do we access it?&lt;/p&gt;
&lt;h2 id=&quot;accessing-the-postgresql-parser&quot; &gt;&lt;a href=&quot;#accessing-the-postgresql-parser&quot; aria-label=&quot;accessing the postgresql parser permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Accessing the PostgreSQL Parser&lt;/h2&gt;
&lt;p&gt;Let&apos;s get the PostgreSQL server source, go down the rabbit hole and find what we need:&lt;/p&gt;
&lt;div  data-language=&quot;c&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;/*
 * raw_parser
 * Given a query in string form, do lexical
 * and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.
 */&lt;/span&gt;
List &lt;span &gt;*&lt;/span&gt;
&lt;span &gt;raw_parser&lt;/span&gt;&lt;span &gt;(&lt;/span&gt;&lt;span &gt;const&lt;/span&gt; &lt;span &gt;char&lt;/span&gt; &lt;span &gt;*&lt;/span&gt;str&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;{&lt;/span&gt;
	&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;
&lt;span &gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is the C function that takes a query and returns a parse tree as C structs.&lt;/p&gt;
&lt;p&gt;Luckily this function is fairly independent. It does not need pg_catalog access (tables, indices, statistics, etc.), since it runs before the query is rewritten, planned and executed:&lt;/p&gt;
&lt;p&gt;&lt;span
      
      
    &gt;
      &lt;a
    
    src=&quot;https://pganalyze.com/static/9acf49ec25e6461b6ce43b2e8fd2793b/db783/query_execution.png&quot;
    
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    
    
  &gt;&lt;/span&gt;
  &lt;img
        
        alt=&quot;Diagram of query execution flow in Postgres&quot;
        title=&quot;Diagram of query execution flow in Postgres&quot;
        src=&quot;https://pganalyze.com/static/9acf49ec25e6461b6ce43b2e8fd2793b/db783/query_execution.png&quot;
        
        
        
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately &lt;a href=&quot;https://github.com/postgres/postgres/blob/0a7832005792fa6dad171f9cadb8d587fe0dd800/src/backend/parser/parser.c#L35&quot;&gt;&lt;code &gt;raw_parser(...)&lt;/code&gt;&lt;/a&gt; is not exposed or included in any of the PostgreSQL libraries - and it&apos;s quite difficult to extract the parser from PostgreSQL without taking a whole lot of other code with you.&lt;/p&gt;
&lt;p&gt;The pgpool project &lt;a href=&quot;http://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob;f=src/parser/gram.y;hb=HEAD&quot;&gt;has actually done this&lt;/a&gt;, but they do need to update that code for every new major release. We&apos;ve therefore turned to a slightly different approach:&lt;/p&gt;
&lt;p&gt;We use the PostgreSQL server code directly - by &lt;strong&gt;statically linking the code into our own shared library.&lt;/strong&gt; Through a bit of linking magic, we &lt;a href=&quot;https://github.com/pganalyze/pg_query/blob/e80afe63a2ae10695608ab8d53b10cd7beb32124/ext/pg_query/pg_query.c#L33&quot;&gt;simply call the internal parser functions&lt;/a&gt;, and expose that function through a Ruby interface, to be used like this:&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;pg_query&apos;&lt;/span&gt;

pp &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT 1&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;#&amp;lt;PgQuery:0x007f8cdaa8f8b8&lt;/span&gt;
 &lt;span &gt;@parsetree&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;
  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;SELECT&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
     &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;distinctClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;intoClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;targetList&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
       &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;RESTARGET&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;indirection&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;val&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;A_CONST&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;val&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;1&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;7&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;7&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;fromClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;whereClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;groupClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;havingClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;windowClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;valuesLists&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;sortClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;limitOffset&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;limitCount&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;lockingClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;withClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;op&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;all&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;larg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;rarg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
 &lt;span &gt;@query&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;SELECT 1&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
 &lt;span &gt;@warnings&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result is a PostgreSQL parse tree as used by PostgreSQL internally.&lt;/p&gt;
&lt;h2 id=&quot;parsing-normalized-queries&quot; &gt;&lt;a href=&quot;#parsing-normalized-queries&quot; aria-label=&quot;parsing normalized queries permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parsing Normalized Queries&lt;/h2&gt;
&lt;p&gt;Now, to the interesting part. Assume we collect pg_stat_statements queries like this one:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;SELECT&lt;/span&gt; &lt;span &gt;&quot;users&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;*&lt;/span&gt; &lt;span &gt;FROM&lt;/span&gt; &lt;span &gt;&quot;users&quot;&lt;/span&gt; &lt;span &gt;WHERE&lt;/span&gt; &lt;span &gt;&quot;users&quot;&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;&lt;span &gt;&quot;id&quot;&lt;/span&gt; &lt;span &gt;=&lt;/span&gt; ?&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that the actual value has been replaced by the &lt;code &gt;?&lt;/code&gt; character. Unfortunately, the PostgreSQL parser can&apos;t parse queries normalized in this manner. It would simply return a syntax error.&lt;/p&gt;
&lt;p&gt;At first, we simply replaced all occurrences of &lt;code &gt;?&lt;/code&gt; with &lt;code &gt;$0&lt;/code&gt; (a parameter reference) before parsing, so that the query can be parsed correctly.&lt;/p&gt;
&lt;p&gt;There are however a few problems with that kind of &quot;dumb&quot; string replacement - most prominently: we&apos;re breaking all operators containing &lt;code &gt;?&lt;/code&gt;, for example those for &lt;a href=&quot;http://www.postgresql.org/docs/devel/static/functions-json.html&quot;&gt;JSONB&lt;/a&gt; in 9.4.&lt;/p&gt;
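&lt;p&gt;To make the problem concrete, here is a small sketch using the JSONB &quot;key exists&quot; operator. The JSON value is made up for illustration:&lt;/p&gt;
&lt;div  data-language=&quot;sql&quot;&gt;&lt;pre &gt;&lt;code &gt;-- ? is a real operator here: does the JSONB value have a top-level key &apos;a&apos;?
SELECT &apos;{&quot;a&quot;: 1}&apos;::jsonb ? &apos;a&apos;;

-- After blindly replacing every ? with $0, the operator is gone
-- and the statement no longer means the same thing:
-- SELECT &apos;{&quot;a&quot;: 1}&apos;::jsonb $0 &apos;a&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first statement returns true, while the rewritten one cannot be interpreted as intended, since plain string replacement cannot tell an operator apart from a placeholder.&lt;/p&gt;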
&lt;p&gt;Our improved solution to this: &lt;a href=&quot;https://github.com/pganalyze/postgres/compare/REL9_3_STABLE...pg_query?w=1#diff-3&quot;&gt;We&apos;ve patched the PostgreSQL parser&lt;/a&gt; to support &lt;code &gt;?&lt;/code&gt; as a parameter reference (identical with &lt;code &gt;$0&lt;/code&gt;).&lt;/p&gt;
&lt;div  data-language=&quot;ruby&quot;&gt;&lt;pre &gt;&lt;code &gt;&lt;span &gt;require&lt;/span&gt; &lt;span &gt;&apos;pg_query&apos;&lt;/span&gt;

pp &lt;span &gt;PgQuery&lt;/span&gt;&lt;span &gt;.&lt;/span&gt;parse&lt;span &gt;(&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM x WHERE y = ?&quot;&lt;/span&gt;&lt;span &gt;)&lt;/span&gt;
&lt;span &gt;#&amp;lt;PgQuery:0x007f8cdaaaae10&lt;/span&gt;
 &lt;span &gt;@parsetree&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;
  &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;SELECT&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
     &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;distinctClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;intoClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;targetList&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
       &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;RESTARGET&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;indirection&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;val&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;COLUMNREF&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;fields&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;A_STAR&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;7&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;7&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;fromClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
       &lt;span &gt;[&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;RANGEVAR&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
          &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;schemaname&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;relname&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;x&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;inhOpt&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;2&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;relpersistence&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;&quot;p&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;alias&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
           &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;14&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;whereClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
       &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;AEXPR&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;
         &lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;name&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;=&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;lexpr&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;COLUMNREF&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;fields&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;&quot;y&quot;&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;22&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;rexpr&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;PARAMREF&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;{&lt;/span&gt;&lt;span &gt;&quot;number&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt; &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;26&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
          &lt;span &gt;&quot;location&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;24&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;groupClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;havingClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;windowClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;valuesLists&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;sortClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;limitOffset&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;limitCount&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;lockingClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;withClause&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;op&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;0&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;all&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;false&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;larg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
      &lt;span &gt;&quot;rarg&quot;&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;span &gt;nil&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;}&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
 &lt;span &gt;@query&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;&quot;SELECT * FROM x WHERE y = ?&quot;&lt;/span&gt;&lt;span &gt;,&lt;/span&gt;
 &lt;span &gt;@warnings&lt;/span&gt;&lt;span &gt;=&lt;/span&gt;&lt;span &gt;[&lt;/span&gt;&lt;span &gt;]&lt;/span&gt;&lt;span &gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Unfortunately, right now, this parser change limits the usage of &lt;code &gt;?&lt;/code&gt; in operators to those in core - specifically JSONB and geometric operators. If you use third-party extensions or custom operators that contain &lt;code &gt;?&lt;/code&gt;, pg_query likely won&apos;t be able to parse those queries.&lt;/p&gt;
&lt;h2 id=&quot;the-result&quot; &gt;&lt;a href=&quot;#the-result&quot; aria-label=&quot;the result permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The Result&lt;/h2&gt;
&lt;p&gt;As a proof of concept, I wrote &lt;strong&gt;&lt;a href=&quot;https://gist.github.com/lfittl/301542602607b738b23f&quot;&gt;monitor.rb&lt;/a&gt;&lt;/strong&gt;, a Ruby script that shows the current information stored inside pg_stat_statements in a top-like manner, filtered by a specific table:&lt;/p&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;monitor.rb -d sampledb -t &lt;span &gt;users&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div  data-language=&quot;code&quot;&gt;&lt;pre &gt;&lt;code &gt;AVG     | QUERY
--------------------------------------------------------------------------------
1.5ms   | SELECT &amp;quot;users&amp;quot;.* FROM &amp;quot;users&amp;quot;
0.1ms   | SELECT &amp;quot;users&amp;quot;.* FROM &amp;quot;users&amp;quot; WHERE &amp;quot;users&amp;quot;.&amp;quot;id&amp;quot; = ? ORDER BY &amp;quot;users&amp;quot;.&amp;quot;id&amp;quot; ASC LIMIT ?
0.1ms   | UPDATE &amp;quot;users&amp;quot; SET &amp;quot;fullname&amp;quot; = $1, &amp;quot;updated_at&amp;quot; = $2 WHERE &amp;quot;users&amp;quot;.&amp;quot;id&amp;quot; = ?
0.0ms   | SELECT &amp;quot;users&amp;quot;.* FROM &amp;quot;users&amp;quot; WHERE &amp;quot;users&amp;quot;.&amp;quot;id&amp;quot; = $1 LIMIT 1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This could be easily extended to highlight queries accessing large tables, potentially missing indices, etc.&lt;/p&gt;
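&lt;p&gt;As a rough sketch of the top-like view above, the following Ruby snippet filters statement statistics by table and prints the average runtime per call. It is not the code from monitor.rb: the sample rows and their numbers are made up, &lt;code &gt;referenced_tables&lt;/code&gt; and &lt;code &gt;report&lt;/code&gt; are hypothetical helpers, and a naive regular expression stands in for the parse-tree based table lookup that pg_query would provide.&lt;/p&gt;

```ruby
# Hypothetical rows, shaped like a subset of pg_stat_statements columns.
# The numbers are made up for illustration.
SAMPLE_ROWS = [
  { query: 'SELECT "users".* FROM "users"', calls: 10, total_time: 15.0 },
  { query: 'SELECT "posts".* FROM "posts" WHERE "posts"."id" = ?', calls: 4, total_time: 2.0 },
  { query: 'UPDATE "users" SET "fullname" = $1 WHERE "users"."id" = ?', calls: 20, total_time: 2.0 }
].freeze

# Naive stand-in (an assumption, not pg_query) for parse-tree based table
# extraction: picks up the identifier following FROM, JOIN, UPDATE or INTO.
def referenced_tables(query)
  query.scan(/\b(?:FROM|JOIN|UPDATE|INTO)\s+"?(\w+)"?/i).flatten.uniq
end

# Returns top-like report lines for statements touching the given table,
# slowest average first.
def report(rows, table)
  matching = rows.select { |row| referenced_tables(row[:query]).include?(table) }
  sorted = matching.sort_by { |row| -(row[:total_time] / row[:calls]) }
  sorted.map do |row|
    avg = format('%.1fms', row[:total_time] / row[:calls])
    format('%-7s | %s', avg, row[:query])
  end
end

puts 'AVG     | QUERY'
puts '-' * 80
puts report(SAMPLE_ROWS, 'users')
```

&lt;p&gt;The regular expression gets schema-qualified names and CTE references wrong, which is where a real parse tree helps.&lt;/p&gt;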
&lt;h2 id=&quot;going-forward&quot; &gt;&lt;a href=&quot;#going-forward&quot; aria-label=&quot;going forward permalink&quot; &gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Going Forward&lt;/h2&gt;
&lt;p&gt;As you can see, PostgreSQL parse trees are quite useful - and there are many more analysis/grouping options that could be explored.&lt;/p&gt;
&lt;p&gt;If you enjoyed reading this, please give &lt;a href=&quot;https://github.com/pganalyze/pg_query&quot;&gt;pg_query&lt;/a&gt; a try. Simply install it using:&lt;/p&gt;
&lt;div  data-language=&quot;shell&quot;&gt;&lt;pre &gt;&lt;code &gt;gem &lt;span &gt;install&lt;/span&gt; pg_query&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;During installation of the library, a full PostgreSQL server is compiled, so it might take 5-10 minutes. Using a gem cache is advised for deployment.&lt;/p&gt;
&lt;p&gt;Interested in support for other languages? &lt;a href=&quot;mailto:lukas@pganalyze.com&quot;&gt;Drop me a line&lt;/a&gt; and I&apos;d love to chat about how we can add support for Python, Perl, you name it.&lt;/p&gt;
&lt;p&gt;Furthermore, we&apos;ll try to get some of our patches upstream for PostgreSQL 9.5 - this specifically relates to our changes in outfuncs.c, supporting additional query nodes and JSON output. Your help and feedback are appreciated.&lt;/p&gt;
&lt;p&gt;And of course, if you build something cool with this, let us know! :)&lt;/p&gt; ]]&gt;</content:encoded></item></channel></rss>