pganalyze Blog

Waiting for Postgres 19: Reduced timing overhead for EXPLAIN ANALYZE with RDTSC

Lukas Fittl — Sat, 11 Apr 2026 12:00:00 GMT

In today’s E122 of “5mins of Postgres” we're talking about the upcoming Postgres 19 release, and how a change in the Postgres instrumentation handling reduces overhead of timing measurements in EXPLAIN ANALYZE using the RDTSC instruction, and why this will allow turning on auto_explain.log_timing for more workloads.

We dive into the recently committed change that I (Lukas) authored together with Andres Freund and David Geier. See the full transcript with examples below.

The problem of slow timing measurements
RDTSC vs RDTSCP
The new timing_clock_source Postgres setting
Live demo on Postgres 19 development branch
What we have discussed in this episode of 5mins of Postgres

Transcript

Welcome back to 5mins of Postgres! Today we talk about a change in the upcoming Postgres 19 release that will lower timing overhead for EXPLAIN ANALYZE.

This is a change that I contributed myself together with Andres Freund and David Geier, and we've worked on this change for a couple of years now actually. But in this release, we basically sat down and we really figured out all the little details that make this work. Now, this was committed recently to the Postgres 19 development branch, and to be clear, it might still be taken out of the final release if any issues are found, but right now, I think there's a decent chance it stays in.

Postgres 19 will be released in September or October. Feature freeze just happened, and the beta release will come out sometime in May this year. Now let me show you a little bit more about what this change is about.

The problem of slow timing measurements

Back in 2020, Andres Freund started a mailing list thread where he was basically saying when you run EXPLAIN ANALYZE on a query, it looks a lot slower than it actually is. So in this example here, Andres created a table with 50 million rows:

CREATE TABLE lotsarows(key int not null);
INSERT INTO lotsarows SELECT generate_series(1, 50000000);
VACUUM FREEZE lotsarows;

Very simple table, and then he ran a COUNT(*) on that table:

SELECT count(*) FROM lotsarows;

If I run the COUNT(*) without any EXPLAIN, I get a run time of about 1,900 milliseconds. If I run EXPLAIN ANALYZE with TIMING OFF, and back in that release also with BUFFERS OFF, I get a runtime of about 2,300 milliseconds. Now, if I turn TIMING ON, the runtime more than doubles compared to the actual time. Instead of my query taking 1,900 milliseconds, the query now takes 4,200 milliseconds:

-- best of three:
SELECT count(*) FROM lotsarows;
Time: 1923.394 ms (00:01.923)

-- best of three:
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM lotsarows;
Time: 2319.830 ms (00:02.320)

-- best of three:
EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
Time: 4202.649 ms (00:04.203)
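
To put those three timings in proportion, here is a small Ruby sketch that computes the relative overhead from the numbers above. The helper name overhead_pct is made up for illustration; only the millisecond values come from the psql output.

```ruby
# Relative overhead of a measured run over a baseline run, in percent.
# (Hypothetical helper, not part of Postgres or psql.)
def overhead_pct(measured_ms, baseline_ms)
  ((measured_ms - baseline_ms) / baseline_ms * 100).round(1)
end

# Timings in milliseconds, copied from the psql output above.
baseline_ms   = 1923.394 # plain SELECT count(*)
timing_off_ms = 2319.830 # EXPLAIN (ANALYZE, TIMING OFF)
timing_on_ms  = 4202.649 # EXPLAIN (ANALYZE, TIMING ON)

puts overhead_pct(timing_off_ms, baseline_ms) # 20.6
puts overhead_pct(timing_on_ms, baseline_ms)  # 118.5
puts overhead_pct(timing_on_ms, timing_off_ms) # 81.2
```

In other words, most of the slowdown in this example comes from turning timing on, not from EXPLAIN ANALYZE itself.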

And first of all, that's a problem because it skews what my actual performance is. If I'm doing testing with EXPLAIN ANALYZE, and I don't recognize that timing has overhead, I basically think my query is slower than it actually is. The other issue is that if you run auto_explain, usually we recommend people turn log_timing off. Just for example, here in pganalyze's install instructions, we recommend people use auto_explain, but we always tell people today to turn timing off, because we think that this is not safe to use on most production systems without knowing your workload better.

If we look at the problem here in more detail, Andres basically did a little profile and looked at where that overhead is coming from:

-   95.49%     0.00%  postgres     postgres                 [.] agg_retrieve_direct (inlined)
   - agg_retrieve_direct (inlined)
      - 79.27% fetch_input_tuple
         - ExecProcNode (inlined)
            - 75.72% ExecProcNodeInstr
               + 25.22% SeqNext
               - 21.74% InstrStopNode
                  + 17.80% __GI___clock_gettime (inlined)
               - 21.44% InstrStartNode
                  + 19.23% __GI___clock_gettime (inlined)
               + 4.06% ExecScan
      + 13.09% advance_aggregates (inlined)
        1.06% MemoryContextReset
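
The pattern behind this profile is one clock read before and one clock read after every row that passes through an instrumented node. The following Ruby sketch is not Postgres code; it only mimics that per-row pattern with Process.clock_gettime, and all method names in it are hypothetical. The absolute numbers depend on the machine and on Ruby's own overhead, so no expected output is shown.

```ruby
# Read a monotonic clock in nanoseconds. This only stands in for the
# clock read Postgres performs in InstrStartNode / InstrStopNode.
def now_ns
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
end

# Hypothetical "plan node": counts rows without any instrumentation.
def count_rows(row_count)
  total = 0
  row_count.times { total += 1 }
  total
end

# Same loop, but with a clock read before and after every row,
# accumulating the time attributed to the node.
def count_rows_instrumented(row_count)
  total = 0
  node_time_ns = 0
  row_count.times do
    started = now_ns
    total += 1
    node_time_ns += now_ns - started
  end
  [total, node_time_ns]
end

# Wall-clock duration of a block, in milliseconds.
def measure_ms
  started = now_ns
  yield
  (now_ns - started) / 1_000_000.0
end

rows = 1_000_000
plain_ms = measure_ms { count_rows(rows) }
instrumented_ms = measure_ms { count_rows_instrumented(rows) }
puts format("plain: %.1f ms, instrumented: %.1f ms", plain_ms, instrumented_ms)
```

When the work per row is tiny, as in a sequential scan feeding a count, the two clock reads can easily cost more than the work they measure.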

RDTSC vs RDTSCP

So first of all, in that profile we see the InstrStartNode and InstrStopNode calls. So those are basically calls that get added by Postgres when instrumentation is on, so when I'm running an EXPLAIN ANALYZE, and we can see that most of that time is spent in the clock_gettime function. On a modern Linux system, this is not actually a syscall. Instead, it directly calls RDTSCP. RDTSCP is basically a special instruction on the CPU that gets what's called the timestamp counter.

And think of the timestamp counter as a value that keeps going up, that basically counts cycles, but it counts cycles in a way that isn't influenced by power level changes or other issues that might cause it to be skewed. So it's actually pretty reliable. Now the problem is that what RDTSCP does is it waits until all prior instructions have finished, and when we say instructions, we mean CPU instructions. And so basically what happens is that the timing itself is not just getting the time, but it's also blocking other activity from occurring.

It's blocking the CPU from basically running things in parallel effectively. Now, there is a different instruction called RDTSC without the P. And this instruction basically does not have this blocking of other concurrent instructions. And so when you have this in the picture, then it actually drastically lowers the performance overhead of the timing.

In this particular example Andres ran at the time, instead of the query taking 4,200 milliseconds, it actually took only 2,600 milliseconds:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                          QUERY PLAN                                                           │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=846239.20..846239.21 rows=1 width=8) (actual time=2610.235..2610.235 rows=1 loops=1)                         │
│   ->  Seq Scan on lotsarows  (cost=0.00..721239.16 rows=50000016 width=0) (actual time=0.006..1512.886 rows=50000000 loops=1) │
│ Planning Time: 0.028 ms                                                                                                       │
│ Execution Time: 2610.256 ms                                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

Time: 2610.589 ms (00:02.611)

This was mainly a prototype at the time. A lot of the complexity, and part of the reason why this took so long to get implemented, is that we needed to make sure this works on all the different kinds of systems that Postgres gets used on.

The new timing_clock_source Postgres setting

One of the things we ended up adding based on discussions on the mailing lists is a new setting to control whether this gets used or not. So with the new "timing_clock_source" setting, you basically control whether you automatically use the TSC clock source on x86-64 CPUs that are modern enough to have the right instructions. You can force the old way of using the system clock, or you can explicitly set the TSC clock source.

In Postgres, we're now basically splitting this into two different use cases. For things like EXPLAIN ANALYZE, where we don't necessarily care about a very precise measurement of a short interval and it's more about the cumulative time taken, we use the RDTSC instruction. In other cases, where we care about higher precision for a short run time, we use the RDTSCP instruction, which has higher overhead. There is a lot of supporting code to make this work in different environments; if you're interested in how that works, look at the "instr_time.c" file.

Live demo on Postgres 19 development branch

I want to show you an actual example of what this improvement looks like in the 19 branch. So here I have an SSH client because my machine right now actually is a MacBook. And this initial release will only be focused on getting the fast timing in for x86-64. ARM has a similar instruction, but there are some outstanding issues for ARM machines. So right now I'm connected here via SSH to a different machine. This machine sits right next to me, it's this little Framework Desktop here, but that one is an x86 machine.

And so now what I can do here is I have my Postgres branch already built. I'm first going to run the pg_test_timing utility, it basically measures that overhead of timing. Now here we get three different measurements:

System clock source: clock_gettime (CLOCK_MONOTONIC)
Average loop time including overhead: 18.80 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      12.7533    12.7533   20353931
      31      87.2357    99.9890  139225930
...

Clock source: RDTSCP
Average loop time including overhead: 16.94 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      31.1807    31.1807   55204578
      31      68.8159    99.9966  121836600
...

Fast clock source: RDTSC
Average loop time including overhead: 11.69 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0       0.0000     0.0000          0
       1       0.0000     0.0000          0
       3       0.0000     0.0000          0
       7       0.0000     0.0000          0
      15      83.5188    83.5188  214321443
      31      16.4789    99.9977   42287217
...

TSC frequency in use: 2993629 kHz
TSC frequency from calibration: 2994357 kHz
TSC clock source will be used by default, unless timing_clock_source is set to 'system'.

We get the built-in clock source, clock_gettime. That took 18.8 nanoseconds to get a time measurement. Now we're checking with RDTSCP, which again waits for prior instructions to finish. That one takes 16.9 nanoseconds. And then if we're running with RDTSC, it takes 11.7 nanoseconds. So clearly RDTSC has less overhead here: each measurement takes roughly 40% less time than with the system clock in this test timing program. I also see which frequency gets used, and then I also see whether that new clock source will be used by default. If I don't want to use it, I would have to set timing_clock_source to system explicitly.

The only reason why that would make sense by the way, is if for some reason your TSC is emulated in a certain way so the timing measurements are not stable. And then timing_clock_source = system might provide you those stable measurements.
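
As a rough check on the pg_test_timing output above, this Ruby sketch compares the three average loop times and converts between TSC ticks and nanoseconds using the reported frequency. It is plain arithmetic on the printed numbers, not how Postgres itself does the conversion, and the helper names are made up for illustration.

```ruby
# Percentage of time saved per measurement relative to the system clock.
def saving_pct(source_ns, system_ns)
  ((system_ns - source_ns) / system_ns * 100).round(1)
end

# A frequency of 1 kHz means 1,000 ticks per second,
# so ticks per nanosecond = kHz / 1,000,000.
def ticks_to_ns(ticks, freq_khz)
  ticks * 1_000_000.0 / freq_khz
end

def ns_to_ticks(ns, freq_khz)
  ns * freq_khz / 1_000_000.0
end

# Values copied from the pg_test_timing output above.
system_ns = 18.80 # clock_gettime (CLOCK_MONOTONIC)
rdtscp_ns = 16.94
rdtsc_ns  = 11.69
tsc_frequency_khz = 2_993_629

puts saving_pct(rdtscp_ns, system_ns) # 9.9
puts saving_pct(rdtsc_ns, system_ns)  # 37.8

# One RDTSC-based loop iteration spans about this many TSC ticks:
puts ns_to_ticks(rdtsc_ns, tsc_frequency_khz).round(1) # 35.0
```

At roughly 3 GHz, one tick is about a third of a nanosecond, which is why the counter is fine-grained enough for per-row instrumentation.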

Now I can run a psql client, show you the actual example. I already have that table that Andres created as an example here as well. First of all, I'll turn on \timing. This is on the psql side, just gives me the run time. Now I'm doing a SELECT COUNT(*):

postgres=# SELECT count(*) FROM lotsarows;
  count   
----------
 50000000
(1 row)

Time: 268.466 ms

This is a more modern machine, so this takes the same 50 million rows, just goes a little faster. So I have about 260 - 270 milliseconds of runtime here.

If I run with EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF), let's start with that. I'm not doing a lot of extra work really. I'm just counting how many rows got returned:

postgres=# EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF) SELECT count(*) FROM lotsarows;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual rows=1.00 loops=1)
   ->  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual rows=1.00 loops=3)
               ->  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual rows=16666666.67 loops=3)
 Planning Time: 0.174 ms
 Execution Time: 297.043 ms
(8 rows)

Time: 297.535 ms

That's pretty simple.

And then if I now turn TIMING ON (this is with the TSC clock source), I get a measurement of about 350 milliseconds:

postgres=# EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=349.687..351.719 rows=1.00 loops=1)
   ->  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=349.606..351.709 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=347.932..347.933 rows=1.00 loops=3)
               ->  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.149..201.918 rows=16666666.67 loops=3)
 Planning Time: 0.186 ms
 Execution Time: 351.773 ms
(8 rows)

Time: 352.171 ms

I'm still seeing, I would say about a 20 - 25% overhead here. So it's not free, but it's substantially better than with the system clock source.

If I do SET timing_clock_source = system, and I do the timing again, you see a drastic difference:

SET timing_clock_source = 'system';
EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;

                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=799.624..801.496 rows=1.00 loops=1)
   ->  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=799.535..801.488 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=797.885..797.887 rows=1.00 loops=3)
               ->  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.073..417.005 rows=16666666.67 loops=3)
 Planning Time: 0.115 ms
 Execution Time: 801.529 ms
(8 rows)

Time: 801.979 ms

Just for clarity, if I just did a regular SELECT count(*) here, it would take me 260 milliseconds to run the actual query:

postgres=# SELECT count(*) FROM lotsarows;
  count   
----------
 50000000
(1 row)

Time: 263.824 ms

And with the old timing clock source, I get a run time of 800 milliseconds. Versus with the new TSC clock source, I get 355 milliseconds:

SET timing_clock_source = 'tsc';
EXPLAIN (ANALYZE, TIMING ON, BUFFERS OFF) SELECT count(*) FROM lotsarows;

                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=482655.97..482655.98 rows=1 width=8) (actual time=353.401..355.238 rows=1.00 loops=1)
   ->  Gather  (cost=482655.75..482655.96 rows=2 width=8) (actual time=353.292..355.229 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=481655.75..481655.76 rows=1 width=8) (actual time=351.081..351.082 rows=1.00 loops=3)
               ->  Parallel Seq Scan on lotsarows  (cost=0.00..429572.40 rows=20833340 width=0) (actual time=0.131..200.584 rows=16666666.67 loops=3)
 Planning Time: 0.150 ms
 Execution Time: 355.291 ms
(8 rows)

Time: 355.690 ms

So a drastic difference, and to me this also makes a difference for many systems, where I would now feel comfortable using auto_explain with log_timing on, because most queries are not this extreme. To be clear, many realistic queries have much less repetition over just these instrumentation start and stop functions.

Previously you would've seen 5-10% overhead on average, now you'll probably see 2-3% on average, which for many systems is a good trade-off to have the full instrumentation data available in auto_explain.
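
For the count(*) demo specifically, the time that TIMING ON adds on top of the TIMING OFF run can be worked out from the psql timings shown earlier. The sketch below is only arithmetic on single runs from one machine, so treat the ratio as indicative; the helper name added_ms is hypothetical.

```ruby
# Extra time that TIMING ON adds on top of the TIMING OFF run, in ms.
def added_ms(timing_on_ms, timing_off_ms)
  (timing_on_ms - timing_off_ms).round(1)
end

# psql timings in milliseconds, copied from the demo output above.
timing_off_ms = 297.535 # EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF)
tsc_ms        = 352.171 # TIMING ON with the TSC clock source
system_ms     = 801.979 # TIMING ON with timing_clock_source = 'system'

tsc_added    = added_ms(tsc_ms, timing_off_ms)
system_added = added_ms(system_ms, timing_off_ms)

puts tsc_added                            # 54.6
puts system_added                         # 504.4
puts((system_added / tsc_added).round(1)) # 9.2
```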

There are many other new features coming up; you'll hear more about them in upcoming episodes.

I hope you learned something new from E122 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn to get updates about new episodes!

What we have discussed in this episode of 5mins of Postgres

The Dilemma of the ‘AI DBA’

Lukas Fittl — Wed, 11 Mar 2026 00:00:00 GMT

Like many in the industry, my perspective on AI tools has shifted considerably over the past year, specifically when it comes to software engineering tasks. Going from “this is nice, but doesn’t really solve complex tasks for me” to “this actually works pretty well for certain use cases.” But the more capable these tools become, the sharper one dilemma gets: you can hand off the work, but an AI agent won’t ultimately be responsible when the database goes down and your app stops working.

For databases, the terms ‘AI DBA’ and ‘self-driving database’ have become marketing buzzwords with the promise of having an agent that can handle creating indexes, optimizing data models, and tuning parameter settings, leaving humans free to focus on higher-value work. The appeal is understandable. Databases are hard; Postgres can behave in odd ways; and, if an agent can absorb that complexity, why invest in becoming an expert yourself?

While I’m a big believer in automating routine tasks, I worry the ‘AI DBA’ discourse is missing the mark in terms of the practical, grounded truth of how to use AI tools effectively, especially in production, and who’s responsible when incidents happen.

If we let the AI do it all willy-nilly, then we accumulate cognitive debt and lose important context, making it harder to take responsibility for the outcome. But there is hope yet, and it comes in the form of enabling engineers instead of replacing DBAs.

How the ‘AI DBA’ framing gets it wrong
What LLMs are actually good at
Let's enable engineers and DBAs to own responsibility for their database
Looking ahead

How the ‘AI DBA’ framing gets it wrong

Framing the role of AI in databases as an ‘AI DBA’ makes a critical mistake: it conflates doing the work with owning the outcome. DevOps gave us a useful precedent here. It didn't remove responsibility from teams: it moved it closer to them. A feature isn't done when it's merged: it's done when it works in production. That same standard should apply to the database: a deployment isn't done until it performs in production. AI doesn't change that bar.

Let’s imagine we have a database team today, with titles like “DBA” or “data platform engineer”:

And let’s say our plan here is that we can replace parts of that team with our new ‘AI DBA’ agent, that can do the work in a good enough way, and is available at all times:

But what happens in that scenario if we have the ‘AI DBA’ agent in the picture? Does it magically fix all production problems? Today it would struggle with even having production access in the first place, because giving production credentials to an autonomous AI agent does not absolve you of its decisions.

What LLMs are actually good at

Even if models improve significantly, they are still LLMs. You can't hold an agent accountable. It needs approvals for high-risk actions. Which means in any realistic scenario, responsibility falls back on either the infrastructure team or the application team — and we've just made the handoff murkier.

Worse, framing the problem as ‘nobody wants to do DBA work, so let's replace the DBA’ sends a clear message to experienced database engineers: your expertise isn't valued here. And beyond the question of accountability, it creates serious problems in practice.

If we think back to why tools like Claude Code have had such tremendous success over the last year, it’s because they put engineers in the driver’s seat and made them more effective at what they’re already doing: quickly cross-referencing different pieces of source code, letting the LLM write code for CRUD tasks, exploring different ways of solving a problem, or investigating production incidents from different data sources effectively, whilst quickly going back to the source.

What does this mean for working with Postgres databases?

Rather than replacing database experts with an AI agent, we should focus on the tasks LLMs genuinely excel at today: information retrieval across different tools, locating the source code file that produced a query, reviewing pull requests automatically for bad patterns, and providing basic fluency for someone unfamiliar with the database. We should then apply that focus to enabling engineers who work with databases but whose day-to-day job isn't the database.

Let's enable engineers and DBAs to own responsibility for their database

The role of the DBA or data platform engineer needs to change. Successful teams already focus on enabling application engineers, instead of being gatekeepers to changes. The future is specific, purpose-built tools, owned by platform teams, made to be reliable for production use:

If we get it right, AI tools can help us collect evidence for performance optimizations, so that when the application engineer goes to the data platform team for help, they bring the information necessary to facilitate effective investigative work.

AI tools can also help us bridge the gap in the other direction: data platform engineers can put on the shoes of the application engineer and become familiar with the codebase, by asking things like "Where did this query get called?" or "Does this field get used somewhere?"

To enable organizations to roll out AI tools not just in development, but in production use too, we need to be clear on what is being done - and write code that abstracts production information and possibly actions in a safe way. Whether that means specific tool calls, sandboxing, or providing restricted access via a CLI, it needs to be curated to suit an organization’s environment.

The data platform team should own and provide safe, reliable tools that enable engineers across the organization to use AI tools effectively with production statistics and metadata, and be responsible for their own database.

Looking ahead

At pganalyze we build the best monitoring and optimization tools for Postgres, to enable both engineers and platform teams to work better together. One of the ways we do that is we make sure you have reliable monitoring data about your production system. Which query was running yesterday? What EXPLAIN plan was being used? Did the plan switch unexpectedly?

And it turns out that data is pretty useful when working with AI tools. The pganalyze MCP Server, now in early access, enables safe sharing of specific information about production databases, whilst keeping in mind specific workflows, and enabling engineers to work better.

There is more to come later this year. Our aim is to focus on automating the tedious tasks, whilst staying grounded in what actually works for production systems. Sometimes it makes sense to use an AI tool, and sometimes deterministic logic is the best choice. I’m excited to keep working with teams, hearing what works for them, and discovering new best practices together.

With thanks to Maciek Sakrejda, Bison Hubert and Laura Kelso for input and reviews on this article.

How we used pg_query to rewrite queries to fix bad query plans

Keiko Oda — Mon, 06 Oct 2025 12:00:00 GMT

Rewriting SQL queries programmatically is harder than it looks. As a human, adding an extra AND condition to a WHERE clause is simple enough. But doing the same thing in code quickly gets complicated. You might try regex, but the real difficulty is coming up with a pattern that works for every variation of a query. AI could generate plausible rewrites, but it's hard to guarantee correctness. These rewrites may look valid, but SQL has many subtle corner cases, so it's difficult to prove that the transformed query always behaves identically.
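
To make the regex problem concrete, here is a deliberately naive Ruby sketch. The helper naive_add_condition is hypothetical and exists only to show the failure modes; it is not something pganalyze or pg_query provides.

```ruby
# Naive attempt: treat everything after WHERE as the condition and
# append the extra condition with AND.
def naive_add_condition(sql, condition)
  sql.sub(/WHERE (.+)$/i) { "WHERE #{Regexp.last_match(1)} AND #{condition}" }
end

# Works for the simplest case:
puts naive_add_condition("SELECT id FROM items WHERE object_id = 123", "deleted_at IS NULL")
# SELECT id FROM items WHERE object_id = 123 AND deleted_at IS NULL

# Produces invalid SQL once a trailing clause is present:
puts naive_add_condition("SELECT id FROM items WHERE object_id = 123 ORDER BY id LIMIT 1", "deleted_at IS NULL")
# SELECT id FROM items WHERE object_id = 123 ORDER BY id LIMIT 1 AND deleted_at IS NULL

# Produces valid SQL with a different meaning, because AND binds
# more tightly than OR:
puts naive_add_condition("SELECT id FROM items WHERE a = 1 OR b = 2", "deleted_at IS NULL")
# SELECT id FROM items WHERE a = 1 OR b = 2 AND deleted_at IS NULL
```

Each failure can be patched with a more elaborate pattern, but subqueries, comments, and string literals containing the word WHERE keep adding cases, which is why working on a parse tree is the more dependable route.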

Query rewrite 101
Example #1 - Add +0 to ORDER BY to avoid index misuse
Example #2 - Transform multiple OR clauses to ANY
Conclusion

As we are developing the new Query Advisor feature in pganalyze, we need a way to take query insights one step further: not only highlight potential issues, but also suggest alternative query patterns. To do that safely, we turn to pg_query.

Using the pg_query open source library, you can parse a query into a structured parse tree, tweak it at the tree level, and then regenerate valid SQL. It ensures the output is deterministic and syntactically correct. With a recent change, it will also support pretty-printing with configurable indentation and line length, making rewrites more powerful and easier to read.

In this post, we will show a few examples of how you can use pg_query to rewrite queries, starting from a simple demonstration and then moving on to real-world patterns that benefit from rewriting.

Query rewrite 101

Let's walk through a really simple case of using pg_query to rewrite a query. Bindings are available for Ruby, Rust and Go, as well as community-maintained ports for Node.js and Python. If you want to learn more about the basics of pg_query, check out our past blog post. In this blog post, we'll use the Ruby bindings.

Let's start with a simple query:

require "pg_query"
parsed_query = PgQuery.parse("SELECT id FROM tbl1")
# => #<PgQuery::ParserResult:0x000000015da1dc50
#  @aliases=nil,
#  @cte_names=nil,
#  @functions=nil,
#  @query="SELECT id FROM tbl1",
#  @tables=nil,
#  @tree=
#   <PgQuery::ParseResult: version: 170005, stmts: [<PgQuery::RawStmt: stmt: <PgQuery::Node: select_stmt:
#     <PgQuery::SelectStmt: distinct_clause: [],
#       target_list: [<PgQuery::Node: res_target: <PgQuery::ResTarget: name: "", indirection: [], val: <PgQuery::Node: column_ref: <PgQuery::ColumnRef: fields: [<PgQuery::Node: string: <PgQuery::String: sval: "id">>], location: 7>>, location: 7>>],
#       from_clause: [<PgQuery::Node: range_var: <PgQuery::RangeVar: catalogname: "", schemaname: "", relname: "tbl1", inh: true, relpersistence: "p", location: 15>>],
#       group_clause: [], group_distinct: false, window_clause: [], values_lists: [], sort_clause: [],
#       limit_option: :LIMIT_OPTION_DEFAULT, locking_clause: [], op: :SETOP_NONE, all: false>>,
#     stmt_location: 0, stmt_len: 0>]>,
#  @warnings=[]>
parsed_query.tables
# => ["tbl1"]

Here, parsed_query is a parse result that contains a parse tree. It also exposes useful methods, such as tables, which tells us which tables are used in the query.

The parse tree for this query looks like the following:

We can either walk the tree to visit nodes (which we'll cover later), or drill down directly to a specific node. For example, to reach the table name of the from clause:

parsed_query.tree.stmts[0].stmt.select_stmt.from_clause[0].range_var.relname
# => "tbl1"

Updating this lets us change the table name. After updating the node, we can call deparse to generate SQL again:

parsed_query.tree.stmts[0].stmt.select_stmt.from_clause[0].range_var.relname = "tbl2"
parsed_query.deparse
# => "SELECT id FROM tbl2"

Now that we have the basic idea of how rewriting works, let's move on to more practical examples.

Example #1 - Add +0 to ORDER BY to avoid index misuse

When you find a slow query using ORDER BY combined with LIMIT, it's important to check whether the planner is picking the right index. This is something we verify in the Query Advisor feature.

Let's start with a simple query:

SELECT * FROM items WHERE object_id = 123 LIMIT 1

With this query, when the items table has an object_id index (e.g. items_object_id_idx), the planner will usually use it, and the query should finish quickly, as long as the index is selective.

Now, let's add an ORDER BY:

SELECT * FROM items WHERE object_id = 123 ORDER BY id LIMIT 1

In some cases, this can cause the planner to choose a plan like "Index Scan Backward using items_pkey on items", and then filter out rows where object_id = 123. If many rows are removed by that filter, the query can become significantly slower.

A simple workaround is to add "+0" to the ORDER BY id. This prevents the planner from using the primary key index (items_pkey).

SELECT * FROM items WHERE object_id = 123 ORDER BY id + 0 LIMIT 1

Let's create a parse tree from the query (without "+0") and look at the "ORDER BY id" part:

parsed_query = PgQuery.parse('SELECT * FROM items WHERE object_id = 123 ORDER BY id LIMIT 1')
parsed_query.tree.stmts[0].stmt.select_stmt.sort_clause[0].sort_by.node
# => <PgQuery::Node: column_ref: <PgQuery::ColumnRef: fields: [<PgQuery::Node: string: <PgQuery::String: sval: "id">>], location: 51>>

It's a bit hard to read, but the sort_by node here is a ColumnRef node pointing to id. To add "+0", we replace it with an A_Expr node that represents a binary expression with id on the left and 0 on the right.

The code below creates a new A_Expr node:

sort_by_node = parsed_query.tree.stmts[0].stmt.select_stmt.sort_clause[0].sort_by.node
new_node = PgQuery::Node.new(
  a_expr: PgQuery::A_Expr.new(
    kind: :AEXPR_OP,
    name: [PgQuery::Node.new(string: PgQuery::String.new(sval: '+'))],
    lexpr: sort_by_node.dup, # Note: to reuse existing nodes, make sure to duplicate to avoid accidentally modifying the original tree
    rexpr: PgQuery::Node.new(a_const: PgQuery::A_Const.new(ival: PgQuery::Integer.new(ival: 0)))
  )
)

Finally, we assign the new node and deparse the query. Don't forget to use the new pretty-printing options:

parsed_query.tree.stmts[0].stmt.select_stmt.sort_clause[0].sort_by.node = new_node
opts = PgQuery::DeparseOpts.new(pretty_print: true, indent_size: 2, trailing_newline: true)
parsed_query.deparse(opts: opts)
# => "SELECT *\nFROM items\nWHERE object_id = 123\nORDER BY id + 0\nLIMIT 1\n"

For more on why this rewrite helps, see our blog post Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage.

Example #2 - Transform multiple OR clauses to ANY

With Postgres 18, the planner can transform certain chains of OR comparisons into ANY, which can produce a better plan (https://postgr.es/c/ae4569161). This happens at the planner level, but let's take a look at how we can do the same thing explicitly using a pg_query rewrite.

Let's look at a query that compares the id column to multiple constants:

EXPLAIN SELECT id FROM items WHERE id = 41 OR id = 42 OR id = 43;
                                  QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on items  (cost=12.89..24.37 rows=3 width=8)
   Recheck Cond: ((id = 41) OR (id = 42) OR (id = 43))
   ->  BitmapOr  (cost=12.89..12.89 rows=3 width=0)
         ->  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 41)
         ->  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 42)
         ->  Bitmap Index Scan on items_pkey  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: (id = 43)
(9 rows)

Now, let's rewrite it with ANY:

EXPLAIN SELECT id FROM items WHERE id = ANY('{41,42,43}');
                                   QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using items_pkey on items  (cost=0.29..12.92 rows=3 width=8)
   Index Cond: (id = ANY ('{41,42,43}'::bigint[]))
(2 rows)

Notice how the cost dropped from 24.37 down to 12.92. The query returns the same results, but instead of three Bitmap Index Scans, it uses a single Index Only Scan. Let's take a closer look at the parse tree.

At the parse tree level, the first query is represented as a bool_expr (BoolExpr) with OR_EXPR, containing three equality expressions (id = 41, id = 42, id = 43).

The ANY form, on the other hand, is represented as an a_expr (A_Expr) with ANY. This corresponds to = ANY(array), with the column id on the left and an array of constants on the right ({41,42,43}).

The rewrite steps are:

Find an OR expression made up of multiple id = <const> comparisons
Collect all the constants
Replace the BoolExpr with an A_Expr representing id = ANY(array)

To keep the replace step simple, the example code below does not swap out the BoolExpr node itself. Instead, it replaces the matching args elements within the BoolExpr node with a single A_Expr node, collapsing the OR chain into = ANY(...).

In Example #1, we just drilled down to a node and swapped it. This time, let's walk the whole tree, find any matching pattern, and rewrite it.

def transform_or_to_any(query)
  parsed_query = PgQuery.parse(query)
  parsed_query.walk! do |node|
    # Find the BoolExpr node with OR_EXPR
    next unless node.is_a?(PgQuery::BoolExpr) && node.boolop == :OR_EXPR
    keep_as_is = []
    group_by_lexpr = {}
    node.args.each do |arg|
      # Note: only group when the arg is ColumnRef = A_Const (e.g. col1 = 123)
      # For other cases (e.g. col1 IS TRUE, col1 != 345), leave it as is
      if arg.node == :a_expr &&
          arg.a_expr.name.first.node == :string &&
          arg.a_expr.name.first.string.sval == '=' &&
          arg.a_expr.lexpr.node == :column_ref &&
          arg.a_expr.rexpr.node == :a_const
        # In order to use this as a hash key, remove the location info by setting to 0
        arg.a_expr.lexpr.inner.location = 0
        group_by_lexpr[arg.a_expr.lexpr] ||= []
        group_by_lexpr[arg.a_expr.lexpr] << arg.a_expr.rexpr.dup
      else
        keep_as_is << arg.dup
      end
    end
    # No multiple ORs with the same column (lexpr)
    next unless group_by_lexpr.any? { |k, v| v.length > 1 }

    # Create new args with AEXPR_OP_ANY a_expr for grouped args
    any_args = []
    group_by_lexpr.each do |lexpr, rexprs|
      if rexprs.length == 1
        keep_as_is << PgQuery::Node.new(
          a_expr: PgQuery::A_Expr.new(
            kind: :AEXPR_OP,
            name: [PgQuery::Node.new(string: PgQuery::String.new(sval: '='))],
            lexpr: lexpr,
            rexpr: rexprs.first
          )
        )
      else
        any_args << PgQuery::Node.new(
          a_expr: PgQuery::A_Expr.new(
            kind: :AEXPR_OP_ANY,
            name: [PgQuery::Node.new(string: PgQuery::String.new(sval: '='))],
            lexpr: lexpr,
            rexpr: PgQuery::Node.new(a_const: QueryParameters.values_to_array(rexprs))
          )
        )
      end
    end
    node.args.replace(any_args + keep_as_is)
  end

  parsed_query.deparse
end

Now let's try the transform on some variations:

# Simple case
transform_or_to_any('SELECT id FROM items WHERE id = 41 OR id = 42 OR id = 43')
# => "SELECT id FROM items WHERE id = ANY('{41,42,43}')"

# With AND and OR
transform_or_to_any('SELECT id FROM items WHERE id = 41 OR id = 42 AND id = 43 OR id = 44')
# => "SELECT id FROM items WHERE id = ANY('{41,44}') OR (id = 42 AND id = 43)"

# ORs in subqueries or UNIONs
transform_or_to_any(<<~SQL)
SELECT id FROM items WHERE id IN (SELECT id FROM items2 WHERE id = 41 OR id = 42)
UNION
SELECT id FROM items3 WHERE id = 43 OR id = 44
SQL
# => "SELECT id FROM items WHERE id IN (SELECT id FROM items2 WHERE id = ANY('{41,42}')) UNION SELECT id FROM items3 WHERE id = ANY('{43,44}')"

You can see that it transforms ORs properly no matter where they appear in the query.

Conclusion

The examples we looked at, such as adding an expression to influence index usage or transforming multiple OR clauses into ANY, show how query rewriting with pg_query can solve real-world problems in a safe and consistent way. These are only a few cases, and the same approach can be applied to many other kinds of transformations.

In developing the Query Advisor feature, rewriting the query using pg_query has been an essential piece. We hope that sharing these examples encourages you to explore what is possible with this library in your own projects.


Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O

Lukas Fittl — Wed, 07 May 2025 12:00:00 GMT

With the Postgres 18 Beta 1 release this week, a multi-year effort and significant architectural shift in Postgres is taking shape: Asynchronous I/O (AIO). These capabilities are still under active development, but they represent a fundamental change in how Postgres handles I/O, offering the potential for significant performance gains, particularly in cloud environments where latency is often the bottleneck.

Why asynchronous I/O matters
- How Postgres 17’s read streams paved the way
New io_method setting in Postgres 18
Asynchronous I/O in action
Heads Up: Async I/O makes I/O timing information hard to interpret
Conclusion
- In summary
- References

While some features may still be adjusted or dropped during the beta period before the final release, now is the best time to test and validate how Postgres 18 performs in practice. In Postgres 18 AIO is limited to read operations; writes remain synchronous, though support may expand in future versions.

In this post, we explain what asynchronous I/O is, how it works in Postgres 18, and what it means for performance optimization.

Why asynchronous I/O matters

Postgres has historically operated under a synchronous I/O model, meaning every read request is a blocking system call. The database must pause and wait for the operating system to return the data before continuing. This design introduces unnecessary waits on I/O, especially in cloud environments where storage is often network-attached (e.g. Amazon EBS) and I/O can have over 1ms of latency.

In a simplified model, we can illustrate the difference like this, ignoring any prefetching/batching the Linux kernel might do:

You can picture synchronous I/O like an imaginary librarian who retrieves one book at a time, returning before fetching the next. This inefficiency compounds as the number of physical reads for a logical operation increases.

Asynchronous I/O eliminates that bottleneck by allowing programs to issue multiple read requests concurrently, without waiting for prior reads to return. In an async program flow, I/O requests are scheduled to be read into a memory location and the program waits for completion of those reads, instead of issuing each read individually.
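To make the difference concrete, here is a minimal Ruby sketch of the two program flows. It is purely illustrative: fake_read, SIMULATED_LATENCY and the use of threads are assumptions made for this example, and none of it reflects how Postgres issues I/O internally.

```ruby
# Conceptual sketch only: read latency is simulated with sleep.
SIMULATED_LATENCY = 0.02 # seconds per "read" (hypothetical value)

# Stand-in for a blocking read of one block.
def fake_read(block_no)
  sleep(SIMULATED_LATENCY)
  block_no
end

# Monotonic wall-clock time taken by the given block, in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Synchronous model: each read must finish before the next one is issued.
def read_synchronously(blocks)
  blocks.map { |b| fake_read(b) }
end

# Asynchronous model: issue all reads up front, then wait for completion.
def read_concurrently(blocks)
  blocks.map { |b| Thread.new { fake_read(b) } }.map(&:value)
end

blocks = (1..5).to_a
sync_time = elapsed { read_synchronously(blocks) }
async_time = elapsed { read_concurrently(blocks) }
puts format("synchronous: %.3fs, concurrent: %.3fs", sync_time, async_time)
```

In the synchronous flow the total time grows with the number of reads, whilst in the concurrent flow it stays close to the latency of a single read.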

How Postgres 17’s read streams paved the way

The work to implement asynchronous I/O in Postgres has been many years in the making. Postgres 17 introduced an essential internal abstraction: the read stream APIs. These internal changes standardized how read operations were issued across different subsystems and streamlined the use of posix_fadvise() to request that the operating system prefetch data in advance.

However, this advisory mechanism only hinted to the kernel to load data into the OS page cache, not into Postgres’ own shared buffers. Postgres still had to issue syscalls for each read, and OS readahead behaviour is not always consistent.

The upcoming Postgres 18 release removes this indirection. With true asynchronous reads, data is fetched directly into shared buffers by the database itself, bypassing reliance on kernel-level heuristics and enabling more predictable, higher-throughput I/O behavior.

New io_method setting in Postgres 18

To control the mechanism used for asynchronous I/O, Postgres 18 introduces a new configuration parameter: io_method. This setting determines how read operations are dispatched under the hood, and whether they’re handled synchronously, offloaded to I/O workers, or submitted directly to the kernel via io_uring.

The io_method setting must be set in postgresql.conf and cannot be changed without restarting. It controls which I/O implementation Postgres will use and is essential to understand when tuning I/O performance in Postgres 18. There are three possible settings for io_method, with the current default (as of Beta 1) being worker.

io_method = sync

The sync setting in Postgres 18 mirrors the synchronous behavior of Postgres 17. Reads are still synchronous and blocking, using posix_fadvise() to achieve read-ahead in the Linux kernel.

io_method = worker

The worker setting utilizes dedicated I/O worker processes running in the background that retrieve data independently of query execution. The main backend process enqueues read requests, and these workers interact with the Linux kernel to fetch data, which is then delivered into shared buffers, without blocking the main process.

The number of I/O workers can be configured through the new io_workers setting, and defaults to 3. These workers are always running, and shared across all connections and databases.

io_method = io_uring

This Linux-specific method uses io_uring, a high-performance I/O interface introduced in kernel version 5.1. Asynchronous I/O has been available in Linux since kernel version 2.5, but it was largely considered inefficient and hard to use. io_uring establishes a shared ring buffer between Postgres and the kernel, minimizing syscall overhead. This is the most efficient option, eliminating the need for I/O worker processes entirely, but is only available on newer Linux kernels and requires file systems and configurations compatible with io_uring support.

Important note: As of the Postgres 18 Beta 1, asynchronous I/O is supported for sequential scans, bitmap heap scans, and maintenance operations like VACUUM.

Asynchronous I/O in action

Asynchronous I/O delivers the most noticeable gains in cloud environments where storage is network-attached, such as Amazon EBS volumes. In these setups, individual disk reads often take multiple milliseconds, introducing substantial latency compared to local SSDs.

With traditional synchronous I/O, each of these reads blocks query execution until the data arrives, leading to idle CPU time and degraded throughput. By contrast, asynchronous I/O allows Postgres to issue multiple read requests in parallel and continue processing while waiting for results. This reduces query latency and enables much more efficient use of available I/O bandwidth and CPU cycles.

Benchmark on AWS: Doubling read performance & even greater gains from io_uring

To evaluate the performance impact of asynchronous I/O, we benchmarked a representative workload on AWS, comparing Postgres 17 with Postgres 18 using different io_method settings. The workload remained identical across versions, allowing us to isolate the effects of the new I/O infrastructure.

We tested on an AWS c7i.8xlarge instance (32 vCPUs, 64 GB RAM), with a dedicated 100GB io2 EBS volume for Postgres, with 20,000 provisioned IOPS. The test table was 3.5GB in size:

CREATE TABLE test(id int);
INSERT INTO test SELECT * FROM generate_series(0, 100000000);

test=# \dt+
                                   List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size   | Description 
--------+------+-------+----------+-------------+---------------+---------+-------------
 public | test | table | postgres | permanent   | heap          | 3458 MB | 
(1 row)

Between test runs we cleared the OS page cache (sync; echo 3 > /proc/sys/vm/drop_caches) and restarted Postgres to gather cold cache results. Warm cache results represent running the query a second time. We repeated the complete test run three times for each configuration and kept the best result.

Whilst we also tested with parallel query, to keep results easier to understand all results below are with parallel query turned off (max_parallel_workers_per_gather = 0).

Cold cache results:

Postgres 17, using synchronous I/O, established the baseline. It showed consistent read latency, but throughput was limited by the need to complete each I/O request before issuing the next:

test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 15830.880 ms (00:15.831)

Postgres 18, when configured with io_method = sync, performed nearly identically, confirming that behavior remains unchanged without enabling asynchronous I/O:

test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 15071.089 ms (00:15.071)

However, when we switch to the worker method with 3 I/O workers (the default), a clear improvement shows:

test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 10051.975 ms (00:10.052)

We observed some gains by raising the number of I/O workers, but the biggest improvement comes when utilizing io_uring:

test=# SELECT COUNT(*) FROM test;
   count   
-----------
 100000001
(1 row)

Time: 5723.423 ms (00:05.723)

When we graph this (measuring runtime in ms, lower is better), it’s clear that Postgres 18 performs significantly better in cold cache situations:

For cold cache tests, both worker and io_uring delivered a consistent 2-3x improvement in read performance compared to the legacy sync method.
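As a quick sanity check, the speedups can be recomputed from the four cold cache timings quoted above. This Ruby sketch only covers those single runs (the worker run used the default of 3 I/O workers), so it will not capture results from other configurations we tested. The hash labels and the speedups helper are names made up for this example.

```ruby
# Cold cache runtimes in milliseconds, copied from the psql output above.
COLD_CACHE_MS = {
  "Postgres 17 (sync)" => 15830.880,
  "Postgres 18 (sync)" => 15071.089,
  "Postgres 18 (worker, 3)" => 10051.975,
  "Postgres 18 (io_uring)" => 5723.423
}.freeze

# Speedup of each configuration relative to the given baseline entry.
def speedups(timings, baseline)
  base = timings.fetch(baseline)
  timings.transform_values { |ms| (base / ms).round(2) }
end

speedups(COLD_CACHE_MS, "Postgres 17 (sync)").each do |name, factor|
  puts format("%-26s %.2fx", name, factor)
end
```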

Whilst worker offers a slight benefit for warm cache tests due to its parallelism, io_uring consistently performed better in cold cache tests, and its lower syscall overhead and reduced process coordination would make io_uring the recommended setting for maximizing I/O performance in Postgres 18.

This performance shift for disk reads has meaningful implications for infrastructure planning, especially in cloud environments. By reducing I/O wait time, asynchronous reads can substantially increase query throughput and reduce both latency and CPU overhead. For read-heavy workloads, this may translate into smaller instance sizes or better utilization of existing resources.

Tuning effective_io_concurrency

In Postgres 18, effective_io_concurrency becomes more interesting, but only when used with an asynchronous io_method such as worker or io_uring. Previously, this setting merely advised the OS to prefetch data using posix_fadvise. Now, it directly controls how many asynchronous read-ahead requests Postgres issues internally.

The number of blocks read ahead is influenced by both effective_io_concurrency and io_combine_limit, following the general formula:

maximum read-ahead = effective_io_concurrency × io_combine_limit

This gives DBAs and engineers greater control over I/O behavior. The optimal value requires benchmarking, as it depends on your I/O subsystem. For example, higher values may benefit cloud environments with high latency that also support high concurrency, like AWS EBS with high provisioned IOPS.
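To get a feel for the numbers, here is a small Ruby sketch of the formula. It assumes the standard 8kB block size and an io_combine_limit of 128kB (16 blocks), which we believe is the default; check SHOW io_combine_limit on your own system. The helper names are made up for this example.

```ruby
BLOCK_SIZE_BYTES = 8192 # assumes the standard 8kB Postgres block size

# Upper bound on read-ahead in blocks, following the formula above.
# io_combine_limit_blocks is expressed in blocks, not bytes.
def max_read_ahead_blocks(effective_io_concurrency, io_combine_limit_blocks)
  effective_io_concurrency * io_combine_limit_blocks
end

# Converts a number of blocks to MiB.
def blocks_to_mib(blocks)
  blocks * BLOCK_SIZE_BYTES / (1024.0 * 1024.0)
end

# 16 is the Postgres 18 default for effective_io_concurrency mentioned in
# this post; 16 blocks (128kB) for io_combine_limit is an assumption.
[16, 64, 128].each do |eic|
  blocks = max_read_ahead_blocks(eic, 16)
  puts format("effective_io_concurrency=%-3d -> %4d blocks (%.1f MiB)",
              eic, blocks, blocks_to_mib(blocks))
end
```

Under these assumptions, the default settings allow up to 256 blocks (2 MiB) of read-ahead.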

When doing our benchmarks, we also tested higher effective_io_concurrency (between 16 and 128) but did not see a meaningful difference. However, that is likely due to the simple test query used.

It’s worth noting that the previous default of effective_io_concurrency was 1 in Postgres 17, which is now raised to 16, based on benchmarks done by the Postgres community.

Monitoring I/Os in flight with pg_aios

As mentioned, previous versions of Postgres with synchronous I/O made it easy to spot read delays: the backend process would block while waiting for disk access, and monitoring tools like pganalyze can reliably surface IO / DataFileRead as a wait event during these stalls.

For example, here we can see wait events clearly in Postgres 17 synchronous I/O.

With asynchronous I/O in Postgres 18, backend wait behavior changes. When using io_method = worker, the backend process delegates reads to a separate I/O worker. As a result, the backend may appear idle or show the new IO / AioIoCompletion wait event, while the I/O worker shows the actual I/O wait events:

SELECT backend_type, state, wait_event_type, wait_event
  FROM pg_stat_activity
 WHERE backend_type = 'client backend' OR backend_type = 'io worker';

  backend_type  | state  | wait_event_type |   wait_event    
----------------+--------+-----------------+-----------------
 client backend | active | IO              | AioIoCompletion
 io worker      |        | IO              | DataFileRead
 io worker      |        | IO              | DataFileRead
 io worker      |        | IO              | DataFileRead
(4 rows)

With io_method = io_uring, read operations are submitted directly to the kernel and completed asynchronously. The backend does not block on a traditional I/O syscall, so this activity is not visible from the Postgres side, even though I/O is in progress.

To help with debugging of I/O requests in flight, the new pg_aios view can show Postgres internal state, even when using io_uring:

SELECT * FROM pg_aios;

  pid  | io_id | io_generation |    state     | operation |    off    | length | target | handle_data_len | raw_result | result  |                   target_desc                    | f_sync | f_localmem | f_buffered 
-------+-------+---------------+--------------+-----------+-----------+--------+--------+-----------------+------------+---------+--------------------------------------------------+--------+------------+------------
 91452 |     1 |          4781 | SUBMITTED    | read      | 996278272 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383760..383775 in file "base/16384/16389" | f      | f          | t
 91452 |     2 |          4785 | SUBMITTED    | read      | 996147200 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383744..383759 in file "base/16384/16389" | f      | f          | t
 91452 |     3 |          4796 | SUBMITTED    | read      | 996409344 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383776..383791 in file "base/16384/16389" | f      | f          | t
 91452 |     4 |          4802 | SUBMITTED    | read      | 996016128 | 131072 | smgr   |              16 |            | UNKNOWN | blocks 383728..383743 in file "base/16384/16389" | f      | f          | t
 91452 |     5 |          3175 | COMPLETED_IO | read      | 995885056 | 131072 | smgr   |              16 |     131072 | UNKNOWN | blocks 383712..383727 in file "base/16384/16389" | f      | f          | t
(5 rows)

Understanding these behavior changes and the impact of asynchronous execution is essential when optimizing I/O performance in Postgres 18.

Heads Up: Async I/O makes I/O timing information hard to interpret

Asynchronous I/O introduces a shift in how execution timing is reported. When the backend no longer blocks directly on disk reads (as is the case with worker or io_uring) the complete time spent doing I/O may not be reflected in EXPLAIN ANALYZE output. This can make I/O-bound queries seem to require less I/O effort than previously.

First, let's run the earlier query in EXPLAIN ANALYZE on a cold cache in Postgres 17:

test=# EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT COUNT(*) FROM test;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared read=442478
   I/O Timings: shared read=14779.316
   ->  Seq Scan on test  (cost=0.00..1442478.32 rows=100000032 width=0) (actual rows=100000001 loops=1)
         Buffers: shared read=442478
         I/O Timings: shared read=14779.316
 Planning:
   Buffers: shared hit=13 read=6
   I/O Timings: shared read=3.182
 Planning Time: 8.136 ms
 Execution Time: 18006.405 ms
(11 rows)

We've read 442,478 buffers in 14.8 seconds.

And now, we repeat the test on Postgres 18 with the default settings (io_method = worker):

test=# EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT COUNT(*) FROM test;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual rows=1.00 loops=1)
   Buffers: shared read=442478
   I/O Timings: shared read=7218.835
   ->  Seq Scan on test  (cost=0.00..1442478.32 rows=100000032 width=0) (actual rows=100000001.00 loops=1)
         Buffers: shared read=442478
         I/O Timings: shared read=7218.835
 Planning:
   Buffers: shared hit=13 read=6
   I/O Timings: shared read=2.709
 Planning Time: 2.925 ms
 Execution Time: 10480.827 ms
(11 rows)

We've read 442,478 buffers in 7.2 seconds.
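Expressed as a read rate, the two plans look quite different. The Ruby sketch below derives an "apparent" rate from the buffer counts and I/O timings in the plans, assuming the standard 8kB block size. The helper name is made up for this example, and the result should be read with the caveats that follow in mind.

```ruby
BUFFER_SIZE_BYTES = 8192 # assumes the standard 8kB Postgres block size

# Apparent read rate in MiB/s, given buffers read and reported I/O time in ms.
# "Apparent" because the reported time is time spent waiting, not I/O effort.
def apparent_mib_per_sec(buffers, io_time_ms)
  mib = buffers * BUFFER_SIZE_BYTES / (1024.0 * 1024.0)
  mib / (io_time_ms / 1000.0)
end

pg17 = apparent_mib_per_sec(442_478, 14779.316)
pg18 = apparent_mib_per_sec(442_478, 7218.835)
puts format("Postgres 17: %.0f MiB/s, Postgres 18 (worker): %.0f MiB/s", pg17, pg18)
```

The same number of buffers was read in both cases; only the time the backend spent waiting changed.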

Whilst with parallel query we get a summary of all the I/O time across all parallel workers, no such summarization occurs with I/O workers. What we are seeing is the wait time for the I/O to be completed, ignoring any parallelism that may happen behind the scenes.

This is technically not a behaviour change, since even in Postgres 17 the time reported was the time spent waiting on I/Os, not the time spent performing the I/O, e.g. kernel I/O time for readahead was never accounted for.

Historically I/O timing was often equated with I/O effort, instead of just looking at shared buffer read counts, in order to distinguish from an OS page cache hit. Now, in Postgres 18, interpreting I/O timing requires more caution: asynchronous I/O can hide I/O overhead in query plans.

Conclusion

To summarize, the upcoming release of Postgres 18 marks the beginning of a major evolution in how I/O is handled. While currently limited to reads, asynchronous I/O already opens the door to significant performance improvements in high-latency cloud environments.

But some of these gains come with tradeoffs. Engineering teams will need to adjust their observability practices, learn new semantics for timing and wait events, and perhaps revisit tuning parameters with previously limited impact, like effective_io_concurrency.

In summary

Asynchronous I/O support in Postgres 18 introduces worker (as the default) and io_uring options under the new io_method setting.
Benchmarks show up to a 2-3x throughput improvement for read-heavy workloads in cloud environments.
Observability practices need to evolve: EXPLAIN ANALYZE may underreport I/O effort, and new views like pg_aios will help provide insights.
Tools like pganalyze will be adapting to these changes to continue surfacing relevant performance insights.

As Postgres development continues, future versions (19 and beyond) may bring asynchronous write support, further reducing I/O bottlenecks in modern workloads, and enabling production use of Direct I/O.

References


Postgres vs. SQL Server: B-Tree Index Differences & the Benefit of Deduplication

Lukas Fittl — Thu, 03 Apr 2025 12:00:00 GMT

When it comes to optimizing query performance, indexing is one of the most powerful tools available to database engineers. Both PostgreSQL and Microsoft SQL Server (or Azure SQL) use B-Tree indexes as their default indexing structure, but the way each system implements, maintains, and uses those indexes varies in subtle but important ways.

In this blog post, we explore key areas where PostgreSQL and SQL Server diverge: how their B-Tree index implementations behave under the hood and how they store and access data on disk. We'll also benchmark the impact of deduplication of values on index size in each database system.

We've also included a comprehensive reference guide at the end (see Postgres vs. SQL Server Index Comparison Table). Whether you're optimizing queries or planning a migration, these differences can have a meaningful impact on both performance and indexing strategy.

How B-Tree indexing works in PostgreSQL vs. SQL Server
Comparison Table: PostgreSQL vs. SQL Server Indexing
Choosing the right index for your workload
References:

How B-Tree indexing works in PostgreSQL vs. SQL Server

At a high level, both databases use B-Tree indexes to speed up equality and range queries. B-Trees maintain sorted order and are balanced for consistent read performance. But while the concept is similar in both databases, the way it's implemented has important performance consequences.

SQL Server uses clustered indexes to physically order the table's data by the indexed column. When a clustered index is defined, the rows in the table are stored in the same order as the index itself. Nonclustered indexes are stored separately and point to rows using a row locator, either a RID or the clustered key. This physical ordering can be beneficial for range scans or pagination queries, but it also means you're limited to one clustered index per table. More importantly, SQL Server stores each index entry in full, even if multiple entries have identical values on the same page. There's no deduplication, so indexes with many repeated values can grow large and consume excessive I/O.

PostgreSQL does not have clustered indexes in the SQL Server sense. All PostgreSQL tables are stored as unordered heaps, and indexes are purely logical structures that point to tuples in the heap. This design gives PostgreSQL some flexibility: it allows for easier index maintenance and avoids the complications of physical reordering.

However, it also means that you can't rely on an index to define how the table is physically laid out. If query performance depends on reading data in a particular order, Postgres does allow you to run the CLUSTER command, but it requires a full table lock. In production environments, you can use tools like pg_repack to achieve a similar result.

So while both databases use B-Tree indexes as their default, SQL Server's tight coupling between index and physical storage creates a different set of expectations and limitations. PostgreSQL's index model has some performance downsides (since there is no clustered index implementation), but distinct features like deduplication make it perform better in other situations.

PostgreSQL's B-Tree deduplication

Deduplication was introduced in PostgreSQL version 13 and addresses a common inefficiency in traditional B-Tree indexes. When many rows share the same indexed value—think status codes, boolean flags, or timestamps—standard B-Trees store each value and its corresponding tuple pointer individually. This results in bloated index pages and increased maintenance cost, especially for write-heavy workloads.

PostgreSQL deduplicates repeated values within a single index page by default. Instead of storing the same key value multiple times, it stores it once and maintains a compact structure that tracks all matching heap pointers. This reduces index size significantly and improves cache performance, since more index entries fit in memory.

SQL Server does not support deduplication. Each index entry is stored independently, even if the values are identical. In datasets with skewed distributions, PostgreSQL's approach leads to more compact, more efficient indexes, with fewer pages and less disk I/O.
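The idea can be sketched in a few lines of Ruby. This is a conceptual model only, not the Postgres on-disk format: IndexEntry, the sample page contents, and the helper names are all made up for this example. It shows how grouping entries by key turns many stored key copies into one key plus a list of heap pointers (a posting list).

```ruby
# Conceptual model only: this is not the Postgres on-disk format.
# tid stands in for a heap pointer, written as [block, offset].
IndexEntry = Struct.new(:key, :tid)

# Without deduplication: one stored key per index entry.
def stored_keys_without_dedup(entries)
  entries.length
end

# With deduplication: one stored key per distinct value, each carrying
# the list of heap pointers that share that value.
def posting_lists(entries)
  entries.group_by(&:key).transform_values { |group| group.map(&:tid) }
end

# Hypothetical contents of a single index page.
entries = [
  IndexEntry.new("active", [0, 1]),
  IndexEntry.new("active", [0, 2]),
  IndexEntry.new("active", [3, 7]),
  IndexEntry.new("archived", [1, 4]),
  IndexEntry.new("archived", [2, 9]),
  IndexEntry.new("active", [5, 1])
]

lists = posting_lists(entries)
puts "keys stored without dedup: #{stored_keys_without_dedup(entries)}"
puts "keys stored with dedup:    #{lists.size}"
lists.each { |key, tids| puts "  #{key}: #{tids.inspect}" }
```

The heap pointers still have to be stored for every row, so the savings come from not repeating the key value itself.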

Benchmarking B-Tree indexes on PostgreSQL vs. SQL Server

To understand how PostgreSQL's index deduplication affects real-world performance and storage, we ran a benchmark comparing B-Tree index sizes across PostgreSQL and SQL Server under varying levels of data duplication. Each test created a table of 10 million rows with differing levels of value repetition, ranging from entirely unique values to repeated values at a 1000x factor.

Here's how we structured the test in both databases, so you can reproduce it yourself.

PostgreSQL Test Setup

CREATE TABLE factor_1(col int);  
CREATE TABLE factor_10(col int);  
CREATE TABLE factor_100(col int);  
CREATE TABLE factor_1000(col int);

INSERT INTO factor_1 SELECT * FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_10 SELECT val / 10 FROM GENERATE_SERIES(1, 10000000) x(val);  
INSERT INTO factor_100 SELECT val / 100 FROM GENERATE_SERIES(1, 10000000) x(val);  
INSERT INTO factor_1000 SELECT val / 1000 FROM GENERATE_SERIES(1, 10000000) x(val);

CREATE INDEX factor_1_idx ON factor_1(col);  
CREATE INDEX factor_10_idx ON factor_10(col);  
CREATE INDEX factor_100_idx ON factor_100(col);  
CREATE INDEX factor_1000_idx ON factor_1000(col);

CREATE INDEX factor_1_idx_no_dup_fill100 ON factor_1(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_10_idx_no_dup_fill100 ON factor_10(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_100_idx_no_dup_fill100 ON factor_100(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_1000_idx_no_dup_fill100 ON factor_1000(col) WITH (deduplicate_items = off, fillfactor = 100);

SQL Server Test Setup

CREATE TABLE factor_1(col int);  
CREATE TABLE factor_10(col int);  
CREATE TABLE factor_100(col int);  
CREATE TABLE factor_1000(col int);

INSERT INTO factor_1 SELECT * FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_10 SELECT value / 10 FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_100 SELECT value / 100 FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_1000 SELECT value / 1000 FROM GENERATE_SERIES(1, 10000000);

CREATE INDEX factor_1_idx ON factor_1(col);  
CREATE INDEX factor_10_idx ON factor_10(col);  
CREATE INDEX factor_100_idx ON factor_100(col);  
CREATE INDEX factor_1000_idx ON factor_1000(col);
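If it is not obvious how the integer division produces duplicates, this Ruby sketch mirrors the val / factor expression at a smaller scale (10,000 rows instead of 10 million). The helper names are made up for this example, and it relies on integer division behaving the same for positive values in Ruby and in both databases.

```ruby
# Mirrors SELECT val / factor FROM GENERATE_SERIES(1, n) at a smaller scale.
def generate_column(n, factor)
  (1..n).map { |val| val / factor } # integer division, as with int columns
end

# Counts distinct values and the highest number of repeats for one value.
def duplication_summary(n, factor)
  counts = generate_column(n, factor).tally
  { distinct_values: counts.size, max_repeats: counts.values.max }
end

[1, 10, 100, 1000].each do |factor|
  puts "factor #{factor}: #{duplication_summary(10_000, factor)}"
end
```

Each value ends up repeated factor times, apart from the first and last values in the series.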

Benchmark results: PostgreSQL's deduplication reduces index size

When we benchmarked index sizes across PostgreSQL and SQL Server, we saw a sharp divergence as data duplication increased. With values repeated 1,000 times, a PostgreSQL index using deduplication was 3x smaller than the same index created with deduplication turned off. Compared to SQL Server, which does not support deduplication and stores each repeated value in full, PostgreSQL consistently produced smaller, more efficient indexes.

This difference matters. Columns with many repeated values, like status flags, timestamps, and categorical fields, are common in production systems. When these values repeat across millions of rows, large indexes can quickly become a performance bottleneck, slowing scans, increasing I/O, and inflating memory usage.

PostgreSQL's deduplication reduces index size significantly, making it easier to keep indexes in memory and reduce disk pressure. For teams moving from SQL Server to PostgreSQL, or simply scaling out workloads with heavily used indexes, this optimization isn't just theoretical. It has a direct impact on resource usage, query performance, and overall operational efficiency.

Comparison Table: PostgreSQL vs. SQL Server Indexing

Index implementations for both B-Tree and other index types vary significantly between PostgreSQL and SQL Server. We've put together a comprehensive index comparison table to serve as a reference for your SQL Server to PostgreSQL migrations.

(Certain index types exist in SQL Server but not in PostgreSQL or vice versa. We've noted supportability as follows: 🟢 Supported index type 🔴 Not supported index type.)

Index Type	Use Case Example	PostgreSQL	SQL Server
B-Tree	Best for general-purpose indexing, equality and range queries (e.g., filtering users by age or date).	🟢 Default index type, supports equality & range queries, sorting, and pattern matching with prefixes.	🟢 On SQL Server the default structure for clustered and nonclustered indexes is a B-Tree.
Clustered	Automatically orders table rows by the index key; best for frequently sorted queries.	🔴 PostgreSQL does not have clustered indexes; instead, you can use the `CLUSTER` command to order the table based on a nonclustered index; however, this order will not be preserved as new data gets inserted.	🟢 Equivalent to PostgreSQL B-Tree; sorts & stores data in order based on key.
Nonclustered	Useful for indexes that speed up searches without affecting physical storage order.	🟢 In PostgreSQL all indexes are nonclustered.	🟢 Can be created on heap or a clustered index; stores data separately from the table.
Hash	Optimized for exact match lookups, like searching by user ID or email address.	🟢 In PostgreSQL, hash indexes can only index a single column. While you can create multiple indexes to support a query, typically a multi-column B-Tree index is more effective.	🟢 Used for memory-optimized tables; requires a fixed bucket count.
Filtered / Partial	Efficient for indexing a subset of data, such as active users only.	🟢 PostgreSQL can use Partial Indexes to index only a subset of rows.	🟢 A Filtered Index is a nonclustered index that indexes only a subset of table rows.
BRIN	Best for very large tables where data is naturally ordered, such as time-series data.	🟢 Stores summaries of block ranges; best for large, sequentially stored data.	🔴 N/A
Full-text	Used for natural language searches, such as searching text in articles or product reviews.	🟢 PostgreSQL supports Full-Text Search using GIN indexes on `tsvector` columns.	🟢 SQL Server uses an inverted index for text-based queries, similar to PostgreSQL GIN.
GIN	Great for indexing JSONB, arrays, and full-text search (e.g., searching product descriptions).	🟢 Inverted index; best for JSON, full-text search, and arrays.	🔴 Partial capability via Full-text index.
Vector	Efficiently perform similarity search or nearest neighbor search across high-dimensional data, most commonly in AI and machine learning applications.	🟢 PostgreSQL doesn't include vector support natively, but the open-source extension pgvector enables vector storage and indexing.	🔴 SQL Server does not natively support vector indexing or search. Microsoft recommends using its Azure AI Search instead.
XML	Optimized for querying and storing XML documents.	🔴 PostgreSQL does not support indexes directly on XML types; however, expression indexes can be used on subsets of the XML data. For unstructured documents, JSONB is the recommended data type.	🟢 SQL Server has dedicated indexes on XML data types.
Spatial	Used for geographic queries, e.g., finding locations within a radius.	🟢 In PostgreSQL spatial indexing queries are provided by the open source PostGIS extension.	🟢 SQL Server has built in spatial data types.
SP-GiST	Used for hierarchical data structures like tree-based searches (e.g., routing networks).	🟢 Supports non-balanced tree structures like quadtrees & k-d trees, good for hierarchical data.	🔴 N/A
GiST	Ideal for geometric and full-text search queries, e.g., finding nearby locations.	🟢 Infrastructure for specialized indexes; used for geometric & full-text search.	🔴 N/A
Columnstore	Best for OLAP workloads and analytical queries (e.g., data warehousing).	🔴 While PostgreSQL has different extensions that offer columnar storage, like Citus and Timescale, it's a relatively recent implementation and may be limited by use case.	🟢 SQL Server has built-in columnar storage implemented as an index type since SQL Server 2012.
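For the PostgreSQL column of the table, the following sketch shows what creating several of these index types can look like. All table, column, and index names are hypothetical, and which index is appropriate depends on your queries:

```sql
-- B-Tree (the default index type)
CREATE INDEX users_created_at_idx ON users (created_at);

-- One-time physical reordering by an index; not maintained for new rows
CLUSTER users USING users_created_at_idx;

-- Hash index on a single column, for equality lookups
CREATE INDEX users_email_hash_idx ON users USING hash (email);

-- Partial index covering only a subset of rows
CREATE INDEX users_active_idx ON users (last_login_at) WHERE active;

-- BRIN index for large, naturally ordered data such as time series
CREATE INDEX events_recorded_at_brin ON events USING brin (recorded_at);

-- GIN index on a JSONB column
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- GIN index for full-text search over a tsvector expression
CREATE INDEX articles_fts_idx ON articles USING gin (to_tsvector('english', body));
```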

Choosing the right index for your workload

Understanding the differences between PostgreSQL and SQL Server indexing is crucial when optimizing query performance, planning a migration, or designing a high-performance database. Choosing the right indexing strategy requires deep knowledge of query execution patterns and performance trade-offs. Many teams manually experiment with different indexing strategies, which can lead to over-indexing, redundant indexes, or missed optimization opportunities.

Instead of trial and error, pganalyze Index Advisor automatically detects missing indexes, redundant indexes, and optimal column order for multicolumn indexes by applying a constraint programming model against real query execution data. This removes the guesswork and ensures that PostgreSQL databases are indexed for maximum performance.


Comparing EXPLAIN Plans is hard (and how pganalyze does it)

Maciek Sakrejda — Thu, 06 Feb 2025 12:00:00 GMT

The Postgres EXPLAIN command is invaluable when trying to understand query performance. SQL is a declarative language, and the Postgres query planner will decide the most efficient way to execute a query. However, plan selection is based on statistics, configuration settings, and heuristics—not a crystal ball. Sometimes there's a substantial gap between what the planner thinks is most efficient and reality. In those situations, EXPLAIN can help Postgres users understand the planner's "reasoning" in selecting a particular plan.

In this post, we'll walk through EXPLAIN plan fundamentals, why it's helpful to compare EXPLAIN plans, and the challenges presented by existing tools. We'll also discuss how that influenced our product roadmap at pganalyze to create a text-based diff interface, which we first rolled out as part of the beta release of Query Tuning Workbooks earlier this year. Now, we're expanding that same functionality to the EXPLAIN plan list under query details and adding a new comparison metric, buffers.

Existing plan comparisons
Building a bespoke EXPLAIN plan comparison
In Summary

Existing plan comparisons

Sometimes, a single query can end up being executed with several different plans (e.g., due to statistics that vary with query parameters), and understanding a suboptimal plan is often easier when contrasted with a "good" plan. One can figure out the differences and what's causing them, and rewrite the query to pick a more optimal plan.

Unfortunately, Postgres plans are not easy to understand, let alone to compare. We wanted to provide an easier way to review the differences between plans. The EXPLAIN command goes back all the way to Postgres95, the first community open-source, SQL-based release. But comparing EXPLAIN output still seems to be a fairly ad-hoc process now, thirty years later.

Take a simple query like

SELECT * FROM pg_class WHERE relname = 'pg_class'

By default, you will likely get a regular index scan:

                                                          	QUERY PLAN                                                          	 
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=273) (actual time=0.033..0.035 rows=1 loops=1)
   Index Cond: (relname = 'pg_class'::name)
   Buffers: shared hit=3
 Planning Time: 0.127 ms
 Execution Time: 0.060 ms
(5 rows)

If regular index scans are disabled, you'll get a bitmap index scan followed by a bitmap heap scan:

                                                        	QUERY PLAN                                                        	 
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pg_class  (cost=4.28..8.29 rows=1 width=273) (actual time=0.027..0.029 rows=1 loops=1)
   Recheck Cond: (relname = 'pg_class'::name)
   Heap Blocks: exact=1
   Buffers: shared hit=3
   ->  Bitmap Index Scan on pg_class_relname_nsp_index  (cost=0.00..4.28 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1)
     	Index Cond: (relname = 'pg_class'::name)
     	Buffers: shared hit=2
 Planning Time: 0.157 ms
 Execution Time: 0.087 ms
(9 rows)
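If you want to reproduce both plans yourself, a minimal sketch could look like this. It assumes that regular index scans were disabled through the `enable_indexscan` planner setting (the text above does not say how it was done), and the exact costs, timings, and even the chosen plan may differ on your system:

```sql
-- Default settings: the planner will typically pick a regular index scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pg_class WHERE relname = 'pg_class';

-- Discourage regular index scans for this session only
SET enable_indexscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pg_class WHERE relname = 'pg_class';

-- Restore the default afterwards
RESET enable_indexscan;
```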

Comparing something like this by looking at the two plans side-by-side is pretty straightforward because the plan is small, but once you need to compare larger plans, you may want a better mechanism. There are no EXPLAIN-specific comparison tools, but the diff utility has been around since the early seventies (Wikipedia has a nice overview of the history), and is still a go-to tool for comparing text files. But the diff output of the plans above is not very usable:

1,4c1,5
<                                                           	QUERY PLAN                                                          	 
< ---------------------------------------------------------------------------------------------------------------------------------------
<  Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=273) (actual time=0.033..0.035 rows=1 loops=1)
<	Index Cond: (relname = 'pg_class'::name)
---
>                                                         	QUERY PLAN                                                        	 
> -----------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on pg_class  (cost=4.28..8.29 rows=1 width=273) (actual time=0.027..0.029 rows=1 loops=1)
>	Recheck Cond: (relname = 'pg_class'::name)
>	Heap Blocks: exact=1
6,8c7,12
<  Planning Time: 0.127 ms
<  Execution Time: 0.060 ms
< (5 rows)
---
>	->  Bitmap Index Scan on pg_class_relname_nsp_index  (cost=0.00..4.28 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1)
>      	Index Cond: (relname = 'pg_class'::name)
>      	Buffers: shared hit=2
>  Planning Time: 0.157 ms
>  Execution Time: 0.087 ms
> (9 rows)

It shows us lines that are due to differences in plan structure, but also some differences due to cost estimates, timing or I/O differences, or other irrelevant details.
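Some of this noise can be reduced at the source. As a minimal sketch (this is a general Postgres technique, not a description of how pganalyze compares plans), the standard `EXPLAIN` options below suppress cost estimates, per-node timings, and the planning/execution summary, so that a plain text diff focuses more on plan structure:

```sql
-- Structure only: no cost estimates, and the query is not executed
EXPLAIN (COSTS OFF)
SELECT * FROM pg_class WHERE relname = 'pg_class';

-- Executed plan with actual row counts, but without costs, timings,
-- or the Planning Time / Execution Time summary lines
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT * FROM pg_class WHERE relname = 'pg_class';
```

Row counts and other per-node details can still differ between runs, so this only narrows the diff rather than making it purely structural.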

Building a bespoke EXPLAIN plan comparison

We experimented with a couple of different approaches to improve this experience when using pganalyze for recording and comparing query plans. We settled on an interface built on a text-based diff of the text output (inspired by diff and GitHub's git changeset rendering), but optimized for understanding the most important EXPLAIN plan differences.

The plans in the comparison are rendered to focus on the plan structure (since this is usually what leads to the biggest performance differences between plans). Changes in runtime or I/O will not show up as a difference between plans, but you can select a comparison metric to focus on, and see the values of that metric for each node in the plan. You can also click on a node in either Plan A or Plan B to see details about that node, just like when viewing full EXPLAIN plans.

As mentioned, we first introduced EXPLAIN plan comparison as part of the Query Tuning Workbooks feature we launched in beta. When tuning a query, being able to compare plans easily is extremely useful.

Today we're extending this functionality to the query EXPLAIN plan list, for plans captured through Automated EXPLAIN. When multiple distinct plans for a query exist, it can be hard to understand what the differences are. Now, you can select two plans on the Query Detail page for a specific query to see their comparison.

As part of this release, we're also adding buffers used as one of the execution metrics to comparisons. Buffer usage can be tricky to compare because buffer hits can be double-counted in Postgres' current statistics accounting. But the sources of double-counting are somewhat limited: most of that happens in Nested Loop joins, and sometimes with Index Scans. Buffer counts can't always reliably be used to determine "how much data did this query load" with a warm cache, but they can still be useful to compare two plans with a similar structure.

In Summary

We're excited to expand our EXPLAIN comparison feature beyond Query Tuning Workbooks, and we hope you'll find this feature useful. If you're an existing user, you can find the feature on the EXPLAIN Plans tab of the Query Detail page under Query Performance. If you're new to pganalyze, visit our Getting Started Guide and sign up for a free trial today.


Replacing Oracle Hints: Best Practices with pg_hint_plan on PostgreSQL

Lukas Fittl — Wed, 05 Feb 2025 12:00:00 GMT

If you're migrating from Oracle Database to PostgreSQL, you're likely accustomed to using hints to optimize queries. In Oracle, these are special directives embedded in SQL (like /*+ INDEX(...) */) that steer the optimizer's execution plan. They can be extremely useful but also introduce complexity and “hint debt” over time.

PostgreSQL takes a very different approach to query optimization. Rather than supporting built-in hints, the Postgres community has historically emphasized relying on its cost-based planner to choose execution plans based on statistics, indexes, and configuration parameters. In practice, that works in many cases, but there are situations where the planner is stubborn and keeps picking a bad plan. In migration situations, this is particularly complicated, because performance may be dependent on a particular execution plan that was previously specified using an Oracle hint.

So you might ask yourself: how do you replicate or replace Oracle hints when you migrate to Postgres? That's where the pg_hint_plan extension comes in.

In this post, we'll explore the differences between Oracle's hint system and PostgreSQL's planner with pg_hint_plan, discuss when you still need hints in your Postgres queries, and walk through best practices for using pg_hint_plan effectively, including how pganalyze can help.

When (and when not) to use hints
Mapping Oracle hints to pg_hint_plan
Best practices for debugging pg_hint_plan hints
Using pganalyze to test query hints
Conclusion
References

When (and when not) to use hints

It might be tempting to migrate all Oracle hints into pg_hint_plan, but this can be overkill and sometimes even counterproductive in PostgreSQL. Let's talk about where hints fit into a well-tuned Postgres environment.

Relying on PostgreSQL's cost-based planner

PostgreSQL is built around a cost-based planner that typically selects efficient execution paths without manual intervention. It uses:

Statistics on table sizes, column data distribution, etc.
Planner cost settings like random_page_cost and cpu_tuple_cost
Server configuration parameters such as enable_seqscan, work_mem, and effective_cache_size

The philosophy behind PostgreSQL's planner is that if your statistics, indexes, and cost parameters are well-tuned, the engine can usually figure out the best plan on its own, and there is rarely a need to rely on hints.

However, this system isn't perfect, and Postgres sometimes picks sub-optimal plans, as we've talked about in our Postgres planner quirks series.

Root causes of Postgres planner problems

A common cause of bad Postgres query plans is out-of-date or incorrect statistics. Statistics about table columns and the selectivity of query filters are critical for the planner to make good decisions. Frequent ANALYZE operations, combined with tuned statistics target settings and the use of CREATE STATISTICS, ensure that the system captures current information about data distributions.
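As a rough illustration of these three tools, here is a sketch using a hypothetical `orders` table with `status` and `region` columns. The statistics target of 1000 is an arbitrary example value (the default is 100), not a recommendation:

```sql
-- Refresh planner statistics for a table
ANALYZE orders;

-- Raise the per-column statistics target for a column with a skewed distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

-- Tell the planner about correlated columns with extended statistics
CREATE STATISTICS orders_status_region_stats (dependencies, ndistinct)
  ON status, region FROM orders;

-- Both changes only take effect once the table is analyzed again
ANALYZE orders;
```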

A thoughtfully designed schema with well-chosen indexes and, when appropriate, table partitioning, often provides a bigger performance boost than manual hints, which can only do so much on a large table.

Settings such as work_mem, random_page_cost, and effective_cache_size have a significant impact on the decisions the planner makes, yet they are often set at the default value, which can cause bad query plans. Optimizing these settings can resolve many query performance challenges without introducing hints. When the planner's cost model aligns well with the realities of your hardware and data, it typically arrives at better plans.
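One low-risk way to experiment with these settings is to change them for a single transaction and look at the resulting plan. The values and the query below are purely illustrative; appropriate settings depend on your hardware and workload:

```sql
BEGIN;

-- SET LOCAL limits each change to the current transaction
SET LOCAL work_mem = '64MB';
SET LOCAL random_page_cost = 1.1;
SET LOCAL effective_cache_size = '8GB';

-- Re-run the query of interest (hypothetical here) and compare the chosen plan
EXPLAIN
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;

-- Discard the temporary settings
ROLLBACK;
```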

When hints can help

Despite the strengths of PostgreSQL's planner, there are times when hints prove beneficial. In fact, forcing a certain plan for debugging can offer valuable insight into why the planner's default choice might be less than ideal, and which part of the query plan had inaccurate costs, often caused by statistics issues.

Legacy Oracle queries often rely heavily on hints, and adjusting them or restructuring the schema might be too risky or time-intensive. In such cases, pg_hint_plan can replicate specific behaviors from Oracle without a total rewrite. Hints also help in highly complex queries or unusual data distributions that consistently lead the planner astray. They are likewise useful as a temporary patch while deeper issues, such as missing statistics or incorrectly set parameters, are being addressed.

When statistical accuracy, schema design, and parameter tuning are all properly addressed in Postgres, hints become an added layer of complexity rather than a necessity. Use them sparingly, focusing on special cases that truly require hard-coded logic.

Mapping Oracle hints to pg_hint_plan

Both Oracle hints and pg_hint_plan hints are embedded in SQL statements using /*+ ... */. They can:

Force the use of specific indexes or join methods (e.g., nested loops)
Enable or disable parallel execution
Override other plan choices

These hints can be very direct: “Use index X on this table,” or “Join table A and B using a Nested Loop Join.” This level of control is sometimes essential when the database optimizer doesn't pick an optimal plan on its own or when you need consistent performance across different instances.

When you do decide to replicate Oracle hints in Postgres, you'll likely look for direct equivalents. pg_hint_plan supports many—but not all—Oracle-like hints. pg_hint_plan primarily controls scan methods, join methods, join order, and query parallelism. Many of Oracle's advanced hints for rewriting queries, star transformations, dynamic sampling, and specialized caching are simply not available or applicable in Postgres.

Instead, in Postgres, you often achieve similar behavior by tuning planner GUCs (like enable_hashjoin, enable_nestloop), rewriting queries, materializing parts of the query with the MATERIALIZED keyword for CTEs, or using indexes/constraints that nudge the Postgres planner.

Let's review some common situations and map them from Oracle Database hints to pg_hint_plan syntax or other Postgres alternatives.

Access path (or index) hints

Oracle Hint	pg_hint_plan Equivalent	Notes
`FULL(table)` Force a full table scan	`SeqScan(table)`	Forces Postgres to use a sequential scan (called Full Table Scan on Oracle) on the named table.
`INDEX(table [index])` Force index scan	`IndexScan(table [index])` or `IndexOnlyScan(table [index])` or `BitmapScan(table [index])`	pg_hint_plan has separate hints for regular index scans, index-only scans, or bitmap index scans.
`INDEX_FFS(table index)` Fast full index scan	No direct equivalent. `IndexOnlyScan` is approximate.	Postgres can answer a query from the index by using an IndexOnlyScan, if all filtered and returned columns are indexed. However, Postgres sometimes still checks the table to verify visibility of deleted rows (this cannot be turned off).
`INDEX_DESC(table [index])` Reverse index scan	`IndexScan` with an `ORDER BY ... DESC` in the query itself.	pg_hint_plan can't directly enforce a descending index scan; you typically rely on query order or an index with the right sort order.
`NO_INDEX(table [index])` Disallow index	No equivalent.	No equivalent to disallow individual indexes.
`INDEX_JOIN(table)` Use index join	No equivalent.	PostgreSQL does not have a direct "index join" concept like Oracle.

In Oracle, you might have:

SELECT /*+ INDEX(table1 idx_table1_col) */ 
       col1, col2
FROM   table1
WHERE  col1 = 'something'
ORDER BY col2 FETCH FIRST 1 ROW ONLY;

In PostgreSQL with pg_hint_plan, you'd translate it to:

/*+
  IndexScan(table1 idx_table1_col)
*/
SELECT col1, col2
FROM   table1
WHERE  col1 = 'something'
ORDER BY col2 LIMIT 1;

Join operation hints

Oracle Hint	pg_hint_plan Equivalent	Notes
`USE_NL(table1 table2)` Use nested loops	`NestLoop(table1 table2)`	Forces a Nested Loop Join between the two named tables.
`USE_HASH(table1 table2)` Use hash join	`HashJoin(table1 table2)`	Forces a Hash Join between the two named tables.
`USE_MERGE(table1 table2)` Use sort-merge join	`MergeJoin(table1 table2)`	Forces a Merge Join between the two named tables.
`USE_NL_WITH_INDEX(table1 index1)`	`NestLoop(table1 table2)` + `IndexScan(table1 index1)` + `Leading((table2 table1))`	To get what Postgres calls a Parameterized Index Scan, the hints must force a Nested Loop Join, the join order (via `Leading`), and the use of the correct index. Note that the `Leading` hint requires an extra pair of parentheses to force the ordering. The first table listed is the outer table, followed by the inner table (which is the one the index scan is on).
`NO_USE_NL(t1 [t2...])` `NO_USE_MERGE(t1 [t2...])` `NO_USE_HASH(t1 [t2...])`	`NoNestLoop(t1 t2 [t3...])` `NoMergeJoin(t1 t2 [t3...])` `NoHashJoin(t1 t2 [t3...])`	pg_hint_plan instructs PostgreSQL's query planner not to use a Nested Loop/Merge/Hash join for the listed tables (which need to include both the inner and the outer table), while the Oracle hint tells the optimizer not to use a Nested Loop/Merge/Hash join for each specified table where it is the inner table of the join.
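As a sketch of the `USE_NL_WITH_INDEX` translation from the table above, the following combines all three hints in a single comment. The tables, aliases, and index name (`customers`, `orders`, `orders_customer_id_idx`) are hypothetical; note that the hints refer to the aliases used in the query:

```sql
/*+
  Leading((c o))
  NestLoop(c o)
  IndexScan(o orders_customer_id_idx)
*/
-- customers (c) is the outer table; orders (o) is the inner table,
-- scanned through the index once per outer row
EXPLAIN
SELECT c.name, o.total
FROM   customers c
JOIN   orders o ON o.customer_id = c.id
WHERE  c.region = 'EMEA';
```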

Join order hints

Oracle Hint	pg_hint_plan Equivalent	Notes
`ORDERED` Join in the order of tables in the FROM clause	`Set(join_collapse_limit 1)`	In Postgres, setting `join_collapse_limit` to 1 will force Postgres to join tables in the order they are listed in the query, provided the query uses explicit `JOIN` syntax. You can set this either via pg_hint_plan or a regular `SET` command before running the query. See examples in the Postgres documentation.
`LEADING(t1 t2 ... tN)`	`Leading(t1 t2 ... tN)` `Leading(((t1 t2) t3))`	pg_hint_plan supports `Leading(...)` to fix the join order. You can list multiple tables in the desired join sequence. Use the syntax with additional parentheses around each pair to specify which table is used as the inner vs outer table.
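The options in this table could be used as follows. This is a sketch with hypothetical tables `t1`, `t2`, and `t3`; the `join_collapse_limit` approach applies to joins written with explicit `JOIN` syntax:

```sql
/*+ Set(join_collapse_limit 1) */
-- Keep the join order exactly as written in the query
EXPLAIN
SELECT *
FROM   t1
JOIN   t2 ON t2.t1_id = t1.id
JOIN   t3 ON t3.t2_id = t2.id;

/*+ Leading(((t1 t2) t3)) */
-- Spell out the join order: join t1 (outer) with t2 (inner) first,
-- then join that result (outer) with t3 (inner)
EXPLAIN
SELECT *
FROM   t1
JOIN   t2 ON t2.t1_id = t1.id
JOIN   t3 ON t3.t2_id = t2.id;

-- Without pg_hint_plan, the same setting can be applied for the session
SET join_collapse_limit = 1;
```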

Parallel / degree of parallelism hints

Oracle Hint	pg_hint_plan Equivalent	Notes
`PARALLEL(table, n)` Parallel degree n	`Parallel(table n hard)`	pg_hint_plan by default ("soft") only sets the configured maximum number of workers (`max_parallel_workers_per_gather`) but won't force a parallel plan if the costs are not in its favor. You can force a parallel plan by specifying the third argument as `hard`, which matches Oracle's behaviour when specifying a specific parallel degree.
`NO_PARALLEL(table)` Disallow parallel	`Parallel(table 0)`	pg_hint_plan inhibits parallel execution on the table when the number of workers is set to zero.

Example usage in pg_hint_plan, increasing the parallel workers from the default of 2 (max_parallel_workers_per_gather) to 4 just for this query's use of the "sales" table:

/*+
  Parallel(sales 4)
*/
SELECT ...

Query transformation & subquery hints

Oracle has many hints controlling query transformations (like unnesting subqueries, merging views, star transformations, etc.). pg_hint_plan does not provide direct equivalents for these transformations; PostgreSQL's planner transformations are generally not hint-based but either controlled automatically or by GUC parameters.

Oracle Hint	pg_hint_plan Equivalent	Notes
`UNNEST / NO_UNNEST`	None	PostgreSQL decides automatically on subquery unnesting (lateral joins, subquery flattening, etc.), and pg_hint_plan cannot influence this. However, queries can be rewritten to use a CTE with the `NOT MATERIALIZED` keyword, which will behave similarly to Oracle's `UNNEST`, or `MATERIALIZED`, which will behave like `NO_UNNEST`. See Postgres documentation.
`MERGE` / `NO_MERGE`	None	In Postgres, views are inlined automatically as if they were a subquery; there is no fine-grained hint for controlling this.
`PUSH_SUBQ` / `NO_PUSH_SUBQ`	None	No direct control over subquery execution in `pg_hint_plan`.
`STAR_TRANSFORMATION` / `NO_STAR_TRANSFORMATION`	None	Oracle's star transformations for data warehouse schemas have no direct counterpart in Postgres.
`FACT` / `NO_FACT`	None	Oracle uses these for star schemas; not applicable in Postgres.
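To illustrate the CTE rewrite mentioned for `UNNEST` / `NO_UNNEST`, here is a sketch with a hypothetical `orders` table. The `MATERIALIZED` and `NOT MATERIALIZED` keywords require Postgres 12 or newer:

```sql
-- NOT MATERIALIZED: the CTE may be folded into the outer query,
-- so the filter on customer_id can be applied inside it
WITH recent_orders AS NOT MATERIALIZED (
  SELECT * FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- MATERIALIZED: the CTE is computed once on its own, acting as an
-- optimization fence between it and the outer query
WITH recent_orders AS MATERIALIZED (
  SELECT * FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;
```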

Result cache and other specialized hints

Oracle Hint	pg_hint_plan Equivalent	Notes
`RESULT_CACHE` / `NO_RESULT_CACHE`	None	PostgreSQL does not have a built-in query result cache like Oracle.
`OPT_PARAM(...)`	`Set(...)`	Postgres parameters are typically set at the session level ("SET" command) or via "Set" hints in pg_hint_plan. Note the parameters that can be set differ between Oracle and Postgres.
`DYNAMIC_SAMPLING(...)`	None	Postgres statistics system works based on a separate ANALYZE of the table outside of query execution and does not have an equivalent of dynamic sampling.
`QB_NAME`	None	pg_hint_plan does not offer an equivalent to Oracle's query block functionality for hints.
`PUSH_PRED` / `NO_PUSH_PRED`	None	Postgres handles predicate pushdown automatically based on heuristics for subqueries; no direct hint.
`USE_CONCAT`	None	Oracle uses this to force expansion of `OR` clauses into `UNION ALL` queries. Postgres does not support doing this transformation automatically; a manual rewrite of the query is needed. See our blog post for an example.
`NO_QUERY_TRANSFORMATION`	None	Postgres's transformations during the planning process cannot be turned off or modified via hints.

Additional pg_hint_plan Features (no Oracle equivalent)

pg_hint_plan has additional hints that don't map to Oracle hints but can be helpful:

Rows(table1 table2 #n): Tells the planner to assume a join between table1 and table2 returns n rows, influencing join order and plan choices. Besides replacing the statistics-derived estimate with an absolute value (#n), the estimate can also be adjusted relatively (+n, -n, or *n).
Memoize(table1 table2) / NoMemoize(table1 table2): Influences whether the Memoize functionality is applied to the given join tables. Memoize can sometimes cause Postgres planner costs to be off, and as such the “NoMemoize” hint can be useful to avoid query plans that might favor a Nested Loop Join.
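A sketch of the `Rows` hint in use, with hypothetical `orders` and `customers` tables. Per the pg_hint_plan documentation, the row correction is written as `#n` for an absolute value or, for example, `*n` for a multiplier:

```sql
/*+ Rows(o c #100) */
-- Assume the join between orders and customers returns 100 rows
EXPLAIN
SELECT *
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.status = 'open';

/*+ Rows(o c *10) */
-- Scale the planner's own estimate for the same join by a factor of 10
EXPLAIN
SELECT *
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.status = 'open';
```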

Best practices for debugging pg_hint_plan hints

Sometimes a pg_hint_plan hint won't take effect, and it's not always clear why that might be, as Postgres will always give you a plan, even if the pg_hint_plan hints did not take effect.

The most common problems can be:

Specifying multiple hint comments (if you have multiple hints you must specify them all in one /*+ ... */ comment)
Using incorrect pg_hint_plan syntax (e.g. NestedLoop instead of NestLoop)
The planner not having a viable path to use the hint (e.g. because the requested index can't be used for a given expression)
Re-used table names not having unique aliases in a query (you need to assign an alias to each table in such situations)
Hints for partitioned tables targeting the child partitions instead of the partitioned parent table (hints must target the parent)
Subqueries that do not have an assigned name (i.e. are not a CTE) can only be hinted in some cases

However, you may not see any clear indication of a problem, since pg_hint_plan does not show any debug output by default.

To understand better why hints may not have been used, you can enable the pg_hint_plan.debug_print setting. This will give you output like this:

SET pg_hint_plan.debug_print = true;  
/*+ NestedLoop(table1 table2) */ EXPLAIN SELECT * FROM …;

INFO:  pg_hint_plan: hint syntax error at or near "NestedLoop".  
DETAIL:  Unrecognized hint keyword "NestedLoop".  
                                          QUERY PLAN                                        	   
----------------------------------------------------------------------------------------------------  
…

Additionally you can show more detailed output about hint usage by raising the client log level (client_min_messages) to LOG, which will tell you which hints were used successfully:

SET client_min_messages = LOG;

/*+ NestLoop(table1 table2) IndexScan(table3) */ EXPLAIN SELECT * FROM table1 JOIN table2 
ON (table2_id = table2.id) WHERE table1_id = '123';

LOG:  pg_hint_plan:
used hint:
NestLoop(table1 table2)
not used hint:
IndexScan(table3)
duplication hint:
error hint:
                                        QUERY PLAN                                     	 
----------------------------------------------------------------------------------------------
...

You can find additional aspects to consider in the pg_hint_plan documentation.

Using pganalyze to test query hints

Oracle-to-Postgres migrations often run into challenges under deadline pressure, either while completing pre-production performance testing or right after going live. In such situations, pganalyze can help you quickly iterate on different hints and benchmark query plans using Query Tuning Workbooks.

In the following example, we compared a baseline query with a query variant that uses pg_hint_plan to choose a particular index. From these results, it's clear that implementing the hint improves performance by more than 60%, plus it's documented for the whole team to see why the change was made.

By iterating through this process of identifying slow queries, testing variants, and implementing optimizations, you avoid guesswork, ensure that each hint actually benefits your application, and prevent adding unnecessary complexity to your database.

Conclusion

Migrating Oracle hints to PostgreSQL can be a tricky process, but pg_hint_plan provides a valuable tool for those times when you really need to guide Postgres' planner. Nonetheless, remember that PostgreSQL is intended to make sound decisions based on strong statistics, strategic indexing, and well-chosen cost parameters, which can all be optimized using pganalyze. Hints should serve as a targeted solution, not the default approach.

References

Documentation

5mins of Postgres episodes on planner quirks

Webinars & eBooks

Blog posts


Introducing pg_query for Postgres 16 - Parsing SQL/JSON, Windows support, PL/pgSQL parse mode & more

Lukas Fittl — Thu, 11 Jan 2024 12:00:00 GMT

Parsing SQL queries and turning them into a syntax tree is not a simple task, especially when you want to support special syntax that is specific to a particular database engine, like Postgres. And when you’re working with queries day in, day out, like we do at pganalyze, understanding the actual intent of a query, which tables it scans, which columns it filters on, and such, is essential.

Almost 10 years ago, we determined that in order to create the best product for monitoring and optimizing Postgres, we needed to parse queries the way that Postgres does. We released the first version of pg_query back in 2014, and have seen many different projects outside of pganalyze utilize our open-source project. For example, to support migration use cases, create linting tools, or check which queries an application executes (see our post from 2021 for some examples). And to name just one vanity metric, the Ruby binding for pg_query has been downloaded an incredible 34 million times!

Today, we’re excited to announce the new pg_query release based on the Postgres 16 parser, which introduces support for running on Windows (a frequently requested addition), alternate query parse modes (e.g. to parse PL/pgSQL assignments), as well as parsing and deparsing new Postgres syntax, such as SQL/JSON. We’ve released updated Ruby, Rust and Go bindings, and expect bindings maintained by the community, such as for Node.js and Python, to be updated soon as well.

In this post, we showcase how to use pg_query in your application, and a few benefits of the new release. But first, let’s go back to the basics - how does pg_query work?

pg_query, the Postgres parser as a standalone C library

At its core, pg_query is all about making the “raw_parser” function from Postgres available. We’ve written about this in more detail in the original pg_query announcement, but the quick summary is:

We apply a small number of patches on top of Postgres, e.g. to help with parsing $n parameter references in queries from pg_stat_statements
We utilize libclang to build a tree of dependencies between functions and global variables in the Postgres source code
In some cases, we apply mocks to avoid entering parts of Postgres we don’t need (e.g., functions that access the file system)
We locate all the source code necessary for the functions we want to call (like “raw_parser”), and remove all other code, to make sure the compiler doesn’t do unnecessary work, or pull in functionality we don’t need
From the built-in node definitions (which are C structs), we automatically create output functions for JSON and protocol buffers, to make it convenient to write bindings in other programming languages

Overall, this results in a library that can parse SQL text and return a Postgres parse tree for you to work with and modify, whilst supporting the full syntax that Postgres itself supports.

From an end user perspective that means you can, for example in the Ruby library, use the following code to parse a query, and find out which table it's querying:

require 'pg_query'
parsed_query = PgQuery.parse("SELECT * FROM users")
puts parsed_query.tree.stmts.first.stmt.select_stmt.from_clause.first.range_var.inspect
# => <PgQuery::RangeVar: catalogname: "", schemaname: "", relname: "users", inh: true, relpersistence: "p", location: 14>

The parse tree structs are automatically generated as protocol buffer definitions based on Postgres’ internal structs located in parsenodes.h and adjacent files, and the language-specific bindings can use each language’s protobuf libraries to have properly typed structs as well.

The main change in the core parsing functionality in this release is that we’ve added support for compiling libpg_query on Windows (with either MSVC, or an MSYS2 stack using MinGW/etc), a frequently requested feature.

Using query fingerprints to identify queries across servers

Besides parsing itself, there was another major use case that we needed to solve for pganalyze: The ability to group queries together.

Postgres itself generates a “queryid” to support this. Originally part of pg_stat_statements, it has been part of Postgres core since Postgres 14, and is generated when “compute_query_id” is enabled (automatically done when using pg_stat_statements). However, the Postgres queryid has its flaws: Besides not always grouping together as well as it could (e.g. in the case of IN lists), it’s not portable. If you ran the same query on two different servers, you would get two different query IDs. This difference in query IDs is primarily explained by the fact that Postgres determines which tables a query references based on the relation OIDs. But those OIDs are not stable across servers, as they are internal identifiers.

With the pg_query fingerprint we intentionally went another way: We utilize the name (and schema) of the table, as it is present in the raw parse tree that pg_query has access to, when generating a unique identifier for a query.

There are of course many other parts of a query we also take into consideration, e.g. referenced columns, expressions, functions, etc. To enable grouping we do not include constant values in the fingerprint, to ensure that two similar queries get the same fingerprint:

PgQuery.fingerprint("SELECT * FROM users WHERE id = 1")
# => "a0ead580058af585"
PgQuery.fingerprint("SELECT * FROM users WHERE id = 2")
# => "a0ead580058af585"
PgQuery.fingerprint("SELECT * FROM users WHERE email = $1")
# => "e213d9d32c7097d5"
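To make the idea of "ignore the constants, keep the structure" more concrete, here is a toy sketch in plain Ruby. To be clear, this is not how pg_query computes fingerprints (it hashes the parse tree, not the query text), and the toy_fingerprint helper, its regular expressions and the choice of SHA-1 are all assumptions made just for illustration:

```ruby
require 'digest'

# Toy illustration only: pg_query fingerprints the parse tree, this hashes text.
# toy_fingerprint is a hypothetical helper, not part of the pg_query API.
def toy_fingerprint(sql)
  normalized = sql
    .gsub(/'(?:[^']|'')*'/, '?')   # replace string literals with a placeholder
    .gsub(/(?<!\$)\b\d+\b/, '?')   # replace numeric literals, but keep $1, $2, ...
    .gsub(/\s+/, ' ')              # collapse whitespace
    .strip
    .downcase
  Digest::SHA1.hexdigest(normalized)[0, 16]
end

a = toy_fingerprint("SELECT * FROM users WHERE id = 1")
b = toy_fingerprint("SELECT * FROM users WHERE id = 2")
c = toy_fingerprint("SELECT * FROM users WHERE email = $1")

puts a == b # => true, only the constant differs
puts a == c # => false, a different column is filtered on
```

Because the input only contains names (no OIDs), the same query text produces the same value on any server, which is the property we care about for the real fingerprint as well.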

What else can we use fingerprints for? One use case that we’ve heard about from pganalyze customers is to use query fingerprints to help identify the same query on both the application side and the database.

Specifically, by using pg_query in application side tracing to tag a query, and then, when looking at a slow trace, using that data in pganalyze to find more detailed information about database-side performance. This also inspired our recent integration with OpenTelemetry, which solves the same use case in a slightly different way.

Utilizing deparsing to upgrade queries to Postgres 16 SQL/JSON syntax

Now to something new in the Postgres 16 release! In Postgres 16, one of the bigger syntax changes was the addition of SQL/JSON. And pg_query fully supports that, both for parsing, as well as deparsing (which allows you to turn a syntax tree back into a SQL statement).

We can use the pg_query deparser to write the equivalent of a codemod for SQL statements, that rewrites the legacy syntax into the more standard SQL/JSON syntax.

For example, imagine we have many places where we build JSON objects manually in SQL using the “json_build_object” function, and wanted to replace that with the new JSON_OBJECT syntax:

q = PgQuery.parse("SELECT json_build_object('key1', 1, 'key2', 'val');")
q.walk! do |node|
  next unless node.is_a?(PgQuery::Node) && node.node == :func_call
  func_name = node.func_call.funcname[0].string.sval
  if func_name == 'json_build_object'
    exprs = node.func_call.args.each_slice(2).map do |key, value|
      PgQuery::Node.from(
        PgQuery::JsonKeyValue.new(
          key: key,
          value: PgQuery::JsonValueExpr.new(raw_expr: value)
        )
      )
    end
    node.inner = PgQuery::JsonObjectConstructor.new(exprs: exprs)
  end
end
q.deparse
# => "SELECT JSON_OBJECT('key1': 1, 'key2': 'val')"

Each release, we test the pg_query deparser for completeness with the full set of Postgres regression tests, and be it SQL/JSON, or other new syntax, you can rest assured that pg_query supports it.

Alternate parse modes to work with PL/pgSQL expressions

Since Postgres 14, PL/pgSQL expressions are now parsed through the regular “raw_parser” functionality, by passing a special mode flag that then allows for PL/pgSQL specific syntax.

We didn’t support this in pg_query before, but thanks to a contribution by Landan Cheruka, there is now a way to parse PL/pgSQL expressions directly with pg_query.

Let’s first utilize parse_plpgsql to parse a function definition, the example taken from the Postgres documentation:

CREATE OR REPLACE FUNCTION cs_fmt_browser_version(v_name varchar,
                                                  v_version varchar)
RETURNS varchar AS $$
BEGIN
  IF v_version IS NULL THEN
    RETURN v_name;
  END IF;
  RETURN v_name || '/' || v_version;
END;$$;

{
  "PLpgSQL_function": {
    "datums": [
      { "PLpgSQL_var": { "refname": "v_name", "datatype": { "PLpgSQL_type": { "typname": "UNKNOWN" } } } },
      { "PLpgSQL_var": { "refname": "v_version", "datatype": { "PLpgSQL_type": { "typname": "UNKNOWN" } } } },
      { "PLpgSQL_var": { "refname": "found", "datatype": { "PLpgSQL_type": { "typname": "UNKNOWN" } } } }
    ],
    "action": {
      "PLpgSQL_stmt_block": {
        "body": [
          {
            "PLpgSQL_stmt_if": {
              "cond": {
                "PLpgSQL_expr": { "query": "v_version IS NULL", "parseMode": 2 }
              },
              "then_body": [
                {
                  "PLpgSQL_stmt_return": {
                    "expr": {
                      "PLpgSQL_expr": { "query": "v_name", "parseMode": 2 }
                    }
...
            "PLpgSQL_stmt_return": {
              "expr": {
                "PLpgSQL_expr": { "query": "v_name || '/' || v_version", "parseMode": 2 }
...

In this function parse tree, you can see the different PLpgSQL_expr expressions, but the actual expression is just text. We can now use the new pg_query_parse_opts function to turn that text into a parse tree:

#include <pg_query.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  PgQueryParseResult result;

  result = pg_query_parse_opts("v_name || '/' || v_version", PG_QUERY_PARSE_PLPGSQL_EXPR);

  if (result.error) {
    printf("error: %s at %d\n", result.error->message, result.error->cursorpos);
  } else {
    printf("%s\n", result.parse_tree);
  }

  pg_query_free_parse_result(result);

  return 0;
}

And that gives us a regular parse tree to work with:

{
  "version": 160001,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
…

We’re still in the process of updating language bindings to support optionally using these parse modes, and would be curious to hear about more use cases for working with PL/pgSQL and pg_query.

A shout-out to the community

pg_query wouldn’t be the same without the community!

We want to expressly call out:

Lele Gaifax for maintaining the Python binding “pglast” and proactively testing libpg_query PRs
Landan Cheruka for adding support for alternate parse modes
Anuraag Agrawal for contributions to enable use in WebAssembly (see pg_query_go without cgo)
Mehmet Emin KARAKAŞ for the many deparser improvements over the years
Philipp Steinrötter for creating the Postgres Language Server based on pg_query.rs, and giving lots of good feedback on how things could work better
And everyone else who contributed to libpg_query and related projects!

Looking ahead, we’re also looking forward to continued conversations with the Postgres community on how we could upstream parts of pg_query as a core part of Postgres, so a query parsing library could be provided directly as part of Postgres.

In conclusion

We’re excited about the new pg_query version, and we’re always happy to hear about new use cases you find for using it to work with Postgres queries. If you have ideas on how pg_query could be better, feel free to open an issue on GitHub.

And if you’ve benefited from pg_query in the past, and have not yet tried out pganalyze to optimize your Postgres performance, you can try out pganalyze with our free 14-day trial.

Postgres 16: Cumulative I/O statistics with pg_stat_io

Lukas Fittl — Tue, 14 Feb 2023 12:00:00 GMT

One of the most common questions I get from people running Postgres databases at scale is:
How do I optimize the I/O operations of my database?

Historically, getting a complete picture of all the I/O produced by a Postgres server has been challenging. To start with, Postgres splits its I/O activity into writing the WAL stream, and reads/writes to the data directory. The real challenge is understanding second-order effects around writes: Typically the write to the data directory happens after the transaction commits, and understanding which process actually writes to the data directory (and when) is hard.

This whole situation has become an even bigger challenge in the cloud, when faced with provisioned IOPS, or worse, having to pay for individual I/Os like on Amazon Aurora. Often the solution has been to look at parts of the system that have instrumentation (such as individual queries), to get at least some sense for where the activity is happening.

Last weekend, a major improvement to the visibility into I/O activity was committed to the upcoming Postgres 16 by Andres Freund, and authored by Melanie Plageman, with documentation contributed by Samay Sharma. My colleague Maciek Sakrejda and I have reviewed this patch through its various iterations, and we're very excited about what it brings to Postgres observability.

Welcome, pg_stat_io. Let's take a look:

Querying system-wide I/O statistics in Postgres
Use cases for pg_stat_io
Sneak peek: Visualizing pg_stat_io in pganalyze
The future of I/O observability in Postgres

Querying system-wide I/O statistics in Postgres

Let's start by using a local Postgres built fresh from the development branch. Note that Postgres 16 is still under heavy development, not even at beta stage, and should definitely not be used in production. For this I followed the new cheatsheet for using the Meson build system (also new in Postgres 16), which significantly speeds up the build and test process.

We can start by querying pg_stat_io to get a sense for which information is tracked, omitting rows that are empty:

SELECT * FROM pg_stat_io WHERE reads <> 0 OR writes <> 0 OR extends <> 0;

    backend_type     | io_object | io_context |  reads   | writes  | extends | op_bytes | evictions |  reuses  | fsyncs |          stats_reset          
---------------------+-----------+------------+----------+---------+---------+----------+-----------+----------+--------+-------------------------------
 autovacuum launcher | relation  | normal     |       19 |       5 |         |     8192 |        13 |          |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker   | relation  | normal     |    15972 |    2494 |    2894 |     8192 |     17430 |          |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker   | relation  | vacuum     |  5754853 | 3006563 |       0 |     8192 |      2056 |  5752594 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | bulkread   | 25832582 |  626900 |         |     8192 |    753962 | 25074439 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | bulkwrite  |     4654 | 2858085 | 3259572 |     8192 |    998220 |  2209070 |        | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | normal     |   960291 |  376524 |  159497 |     8192 |   1103707 |          |      0 | 2023-02-13 11:50:27.583875-08
 client backend      | relation  | vacuum     |   128710 |       0 |       0 |     8192 |      1221 |   127489 |        | 2023-02-13 11:50:27.583875-08
 background worker   | relation  | bulkread   | 39059938 |  590896 |         |     8192 |    802939 | 38253662 |        | 2023-02-13 11:50:27.583875-08
 background worker   | relation  | normal     |   257533 |  118972 |       0 |     8192 |    256437 |          |      0 | 2023-02-13 11:50:27.583875-08
 background writer   | relation  | normal     |          |  243142 |         |     8192 |           |          |      0 | 2023-02-13 11:50:27.583875-08
 checkpointer        | relation  | normal     |          |  390141 |         |     8192 |           |          |  18812 | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | bulkwrite  |        0 |       0 |       8 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | normal     |      689 |     983 |     470 |     8192 |         0 |          |      0 | 2023-02-13 11:50:27.583875-08
 standalone backend  | relation  | vacuum     |       10 |       0 |       0 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
(14 rows)

At a high level, this information can be interpreted as:

Statistics are tracked for a given backend type, I/O object type (i.e. whether it's a temporary table), and I/O context (more on that later)
The main statistics are counting I/O operations: reads, writes and extends (a special kind of write to resize data files)
For each I/O operation the size in bytes is noted to help interpret the statistics (currently always block size, i.e., usually 8kB)
Additionally, the number of shared buffer evictions, ring buffer re-uses and fsync calls are tracked

On Postgres 16, this system-wide information will always be available. You can find the complete details of each field in the Postgres documentation.

Note that pg_stat_io shows logical I/O operations issued by Postgres. Whilst this often eventually maps to an actual I/O to a disk (especially in the case of writes), the operating system has its own caching and batching mechanism, and will for example often split up an 8kB write to become two individual 4kB writes to the file system.

Generally we can assume that this captures all I/O issued by Postgres, except for:

I/O for writing the Write-Ahead-Log (WAL)
Special cases such as tables being moved between tablespaces
Temporary files (such as used for sorts, or extensions like pg_stat_statements)

Note that temporary relations are tracked (they are not the same as temporary files): In pg_stat_io these are marked as io_object = "temp relation" - you may otherwise be familiar with them being called "local buffers" in other statistics views.

With the basics in place, we can take a closer look at some use cases and learn why this matters.

Use cases for pg_stat_io

Tracking Write I/O activity in Postgres

Lifecycle of a write in Postgres, and what is currently not visible in most statistics

When looking at a write in Postgres, we need to look beyond what a client sees as the query runtime, or something like pg_stat_statements can track. Postgres has a complex set of mechanisms that guarantee durability of writes, whilst allowing clients to return quickly, trusting that the server has persisted the data in a crash safe manner.

The first thing that Postgres does to persist data is to write it to the WAL. Once this has succeeded, the client will receive confirmation that the write has been successful. But what happens afterwards is where the additional statistics tracking comes in handy.

For example, if you look at a given INSERT statement in pg_stat_statements, the shared_blks_written field is often going to tell you next to nothing, because the actual write to the data directory typically occurs at a later time, in order to batch writes for efficiency and to avoid I/O spikes.

In addition to writing the WAL, Postgres will also update the shared (or local) buffers for the write. Such an update will mark the buffer page in question as "dirty".

Then, in most cases, another process is responsible for actually writing the dirty page to the data directory. There are three main process types to consider:

The background writer: Runs continuously in the background to write out (some) dirty pages
The checkpointer: Runs on a scheduled basis, or based on amount of WAL written, and writes out all dirty pages not yet written
All other process types, including regular client backends: Write out dirty pages if they need to evict the buffer page in question

The main thing to understand is when the third case occurs - because it can drastically slow down queries. Even a simple "SELECT" might have to suddenly write to disk, before it has enough space in shared buffers to read in its data.

Historically you were already able to see some of this activity through the pg_stat_bgwriter view, specifically the fields whose names start with buffers_ (for example buffers_checkpoint, buffers_clean and buffers_backend). However, this was incomplete, did not consider autovacuum activity explicitly, and did not let you understand the root cause of a write (e.g. a buffer eviction).

With pg_stat_io you can simply look at the writes field, and see both an accurate aggregate number, as well as exactly which process in Postgres actually ended up writing your data to disk.
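As a rough sketch of that kind of breakdown, the snippet below sums the writes column per backend type, using the numbers from the sample pg_stat_io output earlier in this post. The ROWS constant and the writes_by_backend helper are names made up for this example; in practice you would run the equivalent GROUP BY directly against the view.

```ruby
# [backend_type, io_context, writes], copied from the sample output above.
# Empty cells (NULL) in the writes column are simply left out.
ROWS = [
  ['autovacuum launcher', 'normal',    5],
  ['autovacuum worker',   'normal',    2_494],
  ['autovacuum worker',   'vacuum',    3_006_563],
  ['client backend',      'bulkread',  626_900],
  ['client backend',      'bulkwrite', 2_858_085],
  ['client backend',      'normal',    376_524],
  ['client backend',      'vacuum',    0],
  ['background worker',   'bulkread',  590_896],
  ['background worker',   'normal',    118_972],
  ['background writer',   'normal',    243_142],
  ['checkpointer',        'normal',    390_141],
  ['standalone backend',  'bulkwrite', 0],
  ['standalone backend',  'normal',    983],
  ['standalone backend',  'vacuum',    0]
].freeze

# Hypothetical helper: total writes per backend type, across all I/O contexts.
def writes_by_backend(rows)
  rows.each_with_object(Hash.new(0)) do |(backend, _context, writes), acc|
    acc[backend] += writes
  end
end

totals = writes_by_backend(ROWS)
grand_total = totals.values.sum

# Print backend types from most to fewest writes, with their share of the total.
totals.sort_by { |_, writes| -writes }.each do |backend, writes|
  puts format('%-20s %10d  %5.1f%%', backend, writes, 100.0 * writes / grand_total)
end
```

Keeping the io_context around (instead of summing it away) is what lets you tell a regular buffer eviction apart from a ring buffer write, which we'll get to below.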

Improve workload stability and sizing shared_buffers by monitoring shared buffer evictions

One of the most important things pg_stat_io helps clarify is how often a buffer page in shared buffers is evicted. Since shared buffers is a fixed size pool of pages (each 8kB in size, on most Postgres systems), what is cached inside it matters a great deal - especially when your working set exceeds shared buffers.

By default, if you're on a self-managed Postgres, the shared_buffers setting is set to 128MB - or about 16,000 pages. Let's imagine you end up having loaded something through a very inefficient index scan, that ended up consuming all 128MB.

What happens when you suddenly read something completely different? Postgres has to go and remove some of the old data from cache - also known as evicting a buffer page.

This eviction has two main effects:

Data that was in Postgres buffer cache before, is no longer in the cache (note it may still be in the OS page cache)
If the page that was evicted was marked as "dirty", the process evicting it also has to write the old page to disk

Both of these aspects matter for sizing shared buffers, and pg_stat_io can clearly show this by tracking evictions for each backend type across the system. Further, if you see a sudden spike in evictions, and then suddenly a lot of reads, it can help you infer that the cached data that was evicted, was actually needed again shortly afterwards. If in doubt, you can use the pg_buffercache extension to look at the current shared buffers contents in detail.

Tracking cumulative I/O activity by autovacuum and manual VACUUMs

It's a fact that every Postgres server needs the occasional VACUUM - whether you schedule it manually, or have autovacuum take care of it for you. It helps clean up dead rows and makes space re-usable, and it freezes pages to prevent transaction ID wraparound.

But there is such a thing as VACUUMing too often. If not tuned correctly, VACUUM and autovacuum can have a dramatic effect on I/O activity. Historically the best bet was to look at the output of log_autovacuum_min_duration, which will give you information like this:

  LOG:  automatic vacuum of table "mydb.pg_toast.pg_toast_42593": index scans: 0
        pages: 0 removed, 13594 remain, 13594 scanned (100.00% of total)
        tuples: 0 removed, 54515 remain, 0 are dead but not yet removable
        removable cutoff: 11915, which was 6 XIDs old when operation ended
        new relfrozenxid: 11915, which is 4139 XIDs ahead of previous value
        frozen: 13594 pages from table (100.00% of total) had 54515 tuples frozen
        index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
        avg read rate: 0.113 MB/s, avg write rate: 0.113 MB/s
        buffer usage: 13614 hits, 13602 misses, 13600 dirtied
        WAL usage: 40786 records, 13600 full page images, 113072608 bytes
        system usage: CPU: user: 0.26 s, system: 0.52 s, elapsed: 939.84 s

From the buffer usage you can determine that this single VACUUM had to read 13602 pages, and marked 13600 pages as dirty. But what if we want to get a more complete picture, and across all our VACUUMs?

With pg_stat_io, you can now see a system-wide measurement of the impact of VACUUM, by looking at everything marked as io_context = 'vacuum', or associated to the autovacuum worker backend type:

SELECT * FROM pg_stat_io WHERE backend_type = 'autovacuum worker' OR (io_context = 'vacuum' AND (reads <> 0 OR writes <> 0 OR extends <> 0));

    backend_type    | io_object | io_context |  reads  | writes  | extends | op_bytes | evictions | reuses  | fsyncs |          stats_reset          
--------------------+-----------+------------+---------+---------+---------+----------+-----------+---------+--------+-------------------------------
 autovacuum worker  | relation  | bulkread   |       0 |       0 |         |     8192 |         0 |       0 |        | 2023-02-13 11:50:27.583875-08
 autovacuum worker  | relation  | normal     |   16306 |    2494 |    2915 |     8192 |     17785 |         |      0 | 2023-02-13 11:50:27.583875-08
 autovacuum worker  | relation  | vacuum     | 5824251 | 3028684 |       0 |     8192 |      2588 | 5821460 |        | 2023-02-13 11:50:27.583875-08
 client backend     | relation  | vacuum     |  128710 |       0 |       0 |     8192 |      1221 |  127489 |        | 2023-02-13 11:50:27.583875-08
 standalone backend | relation  | vacuum     |      10 |       0 |       0 |     8192 |         0 |       0 |        | 2023-02-13 11:50:27.583875-08
(5 rows)

In this particular example, in sum, the autovacuum worker has read 44.4 GB of data (5,824,251 buffer pages), and written 23.1 GB (3,028,684 buffer pages).
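For reference, this is the conversion behind those numbers: the page count times op_bytes (8192), expressed in units of 1024^3 bytes. The pages_to_gib helper is just a name chosen for this sketch.

```ruby
BLOCK_SIZE = 8192 # bytes per I/O operation, the op_bytes column in the output above

# Hypothetical helper: convert a number of buffer pages to GiB (1024^3 bytes).
def pages_to_gib(pages, block_size = BLOCK_SIZE)
  pages * block_size / (1024.0**3)
end

puts pages_to_gib(5_824_251).round(1) # reads in the vacuum context  => 44.4
puts pages_to_gib(3_028_684).round(1) # writes in the vacuum context => 23.1
```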

If you track these statistics over time, it will help you have a crystal-clear picture of whether autovacuum is to blame for an I/O spike during business hours. It will also help you make changes to tune autovacuum with more confidence, e.g. making autovacuum more aggressive to prevent bloat.
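Since the counters in pg_stat_io are cumulative, tracking them over time means taking snapshots and subtracting. A minimal sketch, using the two readings of the autovacuum worker / vacuum row that appear in the outputs in this post (both have the same stats_reset timestamp, so subtracting them is meaningful). The EARLIER and LATER constants and the counter_delta helper are names made up for this example:

```ruby
# The "autovacuum worker | relation | vacuum" row from the first and the
# second query output in this post (same stats_reset in both).
EARLIER = { reads: 5_754_853, writes: 3_006_563, evictions: 2_056, reuses: 5_752_594 }.freeze
LATER   = { reads: 5_824_251, writes: 3_028_684, evictions: 2_588, reuses: 5_821_460 }.freeze

# Hypothetical helper: activity between two snapshots of cumulative counters.
def counter_delta(before, after)
  after.to_h { |key, value| [key, value - before.fetch(key)] }
end

counter_delta(EARLIER, LATER).each do |counter, delta|
  puts "#{counter}: +#{delta}"
end
# reads: +69398
# writes: +22121
# evictions: +532
# reuses: +68866
```

If stats_reset changes between two snapshots, the counters started over from zero, and the delta for that interval should be thrown away rather than reported as a negative number.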

Visibility into bulk read/write strategies (sequential scans and COPY)

Have you ever used COPY in Postgres to load data? Or read data from a table using a sequential scan? You may not know that in most cases, this data does not pass through shared buffers in the regular way. Instead, Postgres uses a special dedicated ring buffer that ensures that most of shared buffers is undisturbed by such large activities.

Before pg_stat_io, it was near impossible to understand this activity in Postgres, as there was simply no tracking for it. Now, we can finally see both bulk reads (typically large sequential scans) and bulk writes (typically COPY in), and the I/O activity they cause.

You can simply filter for the new bulkwrite and bulkread values in io_context, and have visibility into this activity:

SELECT * FROM pg_stat_io WHERE io_context IN ('bulkread', 'bulkwrite') AND (reads <> 0 OR writes <> 0 OR extends <> 0);

    backend_type    | io_object | io_context |  reads   | writes  | extends | op_bytes | evictions |  reuses  | fsyncs |          stats_reset          
--------------------+-----------+------------+----------+---------+---------+----------+-----------+----------+--------+-------------------------------
 client backend     | relation  | bulkread   | 25900458 |  627059 |         |     8192 |    754610 | 25141667 |        | 2023-02-13 11:50:27.583875-08
 client backend     | relation  | bulkwrite  |     4654 | 2858085 | 3259572 |     8192 |    998220 |  2209070 |        | 2023-02-13 11:50:27.583875-08
 background worker  | relation  | bulkread   | 39059938 |  590896 |         |     8192 |    802939 | 38253662 |        | 2023-02-13 11:50:27.583875-08
 standalone backend | relation  | bulkwrite  |        0 |       0 |       8 |     8192 |         0 |        0 |        | 2023-02-13 11:50:27.583875-08
(4 rows)

In this example, there is 495 GB of bulk read activity, and 21 GB of bulk write activity we had no good way of identifying before. However, and most importantly, we don't have to worry about the evictions count here - these are all evictions from the special bulk read / bulk write ring buffer, not from regular shared buffers.
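Here is one way to arrive at those two totals from the rows above. This sketch assumes that the numbers count reads in the bulkread context and writes in the bulkwrite context, in units of 1024^3 bytes and rounded down; that reading matches the output, but it is an assumption. BULK_ROWS, OP_BYTES and the gib helper are names made up for this example.

```ruby
# Rows from the bulkread/bulkwrite output above (only the columns we need).
BULK_ROWS = [
  { backend: 'client backend',     context: 'bulkread',  reads: 25_900_458, writes: 627_059 },
  { backend: 'client backend',     context: 'bulkwrite', reads: 4_654,      writes: 2_858_085 },
  { backend: 'background worker',  context: 'bulkread',  reads: 39_059_938, writes: 590_896 },
  { backend: 'standalone backend', context: 'bulkwrite', reads: 0,          writes: 0 }
].freeze
OP_BYTES = 8192 # the op_bytes column

# Hypothetical helper: sum one counter for one I/O context, converted to GiB.
def gib(rows, context, column)
  pages = rows.select { |row| row[:context] == context }.sum { |row| row[column] }
  pages * OP_BYTES / (1024.0**3)
end

puts gib(BULK_ROWS, 'bulkread', :reads).floor   # => 495
puts gib(BULK_ROWS, 'bulkwrite', :writes).floor # => 21
```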

Sneak peek: Visualizing pg_stat_io in pganalyze

It's still a while until Postgres 16 will be released (usually September or October each year), but to help test things (and because it's exciting!) I took a quick stab at updating pganalyze in an experimental branch to collect pg_stat_io metrics and visualize them over time.

Here is a very early look at what this may look like in the future:

Experimental view of what pg_stat_io could look like when visualized over time

Even though this is just running locally on my laptop, already we can see a clear pattern where writes are done by the checkpointer and background writer processes, most of the time. We can also see my checkpoint_timeout being set to 5min (the default), with both writes and fsyncs happening like clockwork - note the workload is periodic every 10 minutes, so every second checkpoint has less work to do.

However, we can also clearly see a spike in activity - and that spike can be easily explained: To generate more database activity, I triggered a big daily background process around 8:10pm UTC. The high amount of data read caused the working set to momentarily exceed shared buffers, and caused a large amount of buffer evictions, which then forced the client backend to write out buffer pages unexpectedly.

On this system I have a very small shared_buffers setting (the default, 128 MB). I should probably increase shared_buffers...

The future of I/O observability in Postgres

A lot of the ground work for pg_stat_io actually happened previously in Postgres 15, through the new cumulative statistics system using shared memory.

Before Postgres 15, statistics tracking had to go through the statistics collector (an obscure process that received UDP packets from the individual Postgres processes), which was slow and error prone. This historically limited the ability to collect more advanced statistics easily. As the addition of pg_stat_io shows, it is now much easier to track additional information about how Postgres operates.

Amongst the immediate improvements that are already being discussed are:

Tracking of system-wide buffer cache hits (to allow calculating an accurate buffer cache hit ratio)
Cumulative system-wide I/O times (not just I/O counts as currently present in pg_stat_io)
Better cumulative WAL statistics (i.e. going beyond what pg_stat_wal offers)
Additional I/O tracking for tables and indexes

Our team at pganalyze is excited to have helped shape the new pg_stat_io view, and we look forward to continuing to work with the community on making Postgres better.

PS: If you're interested in learning more about optimizing Postgres I/O performance and costs you can check out our webinar recording.

Lock monitoring in Postgres: Find blocking queries in a lock tree with pganalyze

Keiko Oda — Thu, 01 Dec 2022 13:00:00 GMT

Postgres databases power many mission critical applications, and applications expect consistent query performance. If even a single query takes longer than expected, it can lead to unhappy users, or delayed background processes. We can use EXPLAIN to debug a slow query, but there is one Postgres problem it won't tell us about: Blocked queries. You may also know this as "blocked sessions" from other database systems. This is when one query holds a lock on a table and another is waiting for that lock to be released.

Historically, the solution for Postgres lock monitoring was to run a set of queries provided by the community to debug the issue. These queries either look at the pg_locks view in Postgres, or use the newer pg_blocking_pids() function to walk the lock tree in Postgres. But this involves a lot of manual work, as well as being present when the problem occurs. If a problem happened earlier in the day and resolved itself, the lock information is already gone.

Today, we're excited to announce a better method for Postgres lock monitoring and alerting. The new pganalyze Lock Monitoring feature automatically detects locking/blocking queries as they happen, can alert you of production incidents in near-real time, and keeps a history of past locking incidents to help you understand an earlier locking problem.

Introducing the new pganalyze Lock Monitoring feature
- Identifying Postgres connections that block queries, and lead to cascading lock waits
- Long running migrations that hold exclusive locks for too long
Get alerted of blocking/locking query problems in near-real time
Behind the scenes: pg_blocking_pids()
Try the new pganalyze Lock Monitoring features now

Introducing the new pganalyze Lock Monitoring feature

Demonstration of how an idle connection progresses and blocks DELETE FROM and ALTER TABLE queries which in turn block 3 other SELECT queries

Previously, pganalyze already collected wait events. These events tell you what Postgres connections are waiting on, and the Wait Event History makes it easy to find outliers over time. With the new extended Connections page, you can now easily discover which connection is blocking other queries right inside the Connections page, and quickly jump to historic locking problems and see indirect relationships.

For example, when you have many "Waiting for Lock" connections, the database is likely having some trouble. It can be challenging to identify why a query is blocked. The new pganalyze Lock Monitoring feature lets you follow the whole story, from queries that are waiting to the connection that is causing the lock waits in the first place, and helps you prioritize the issues you should resolve first.

Next, let's look at two typical examples of blocked queries that you would encounter with a production application:

Identifying Postgres connections that block queries, and lead to cascading lock waits

Long-running query with lots of operations on multiple tables holding locks longer than usual

This first example is from a production situation we encountered on the pganalyze application database itself. The bottom two queries (PID: 24825 and 30051) were waiting for tuples—row versions—that the bolded query (PID: 48665) was also going to lock—and it had priority in the lock tree. That query itself was waiting for a deletion from the long-running query above (PID: 27542).

Lock tree of cascading lock waits
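To illustrate what "walking the lock tree" means, here is a small sketch that follows the waits in this example back to their root. The BLOCKED_BY mapping is hand-written from the PIDs described above, in the shape of what pg_blocking_pids() reports for each waiting backend, and root_blockers is a hypothetical helper for this sketch, not how pganalyze implements it.

```ruby
# Waiting PID => PIDs that block it, written by hand from the example above.
BLOCKED_BY = {
  24825 => [48665],
  30051 => [48665],
  48665 => [27542]
}.freeze

# Hypothetical helper: follow the chain of blockers until we reach PIDs that
# are not waiting on anyone themselves. `seen` guards against cycles.
def root_blockers(pid, blocked_by, seen = [])
  return [] if seen.include?(pid)

  blockers = blocked_by.fetch(pid, [])
  return [pid] if blockers.empty?

  blockers.flat_map { |blocker| root_blockers(blocker, blocked_by, seen + [pid]) }.uniq
end

BLOCKED_BY.each_key do |pid|
  puts "#{pid} is ultimately waiting on #{root_blockers(pid, BLOCKED_BY).join(', ')}"
end
# 24825 is ultimately waiting on 27542
# 30051 is ultimately waiting on 27542
# 48665 is ultimately waiting on 27542
```

All three waiting connections lead back to the same long-running query, which is why resolving that one query is the thing to prioritize.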

Here, we had one recurring long-running query that was combining DELETEs on multiple tables, and holding locks longer than usual. Therefore it was not a situation of "somebody ran a bad query by accident" or "somebody wrote a migration that takes an exclusive lock on the table for a long time", but rather that we needed to re-think how to avoid this query in the first place.

Specifically, in this application we saw two possible solutions here:

Split up the query into multiple smaller queries, and potentially only delete a subset of rows at a time
Use table partitioning to avoid a pattern where a daily DELETE is necessary
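
To make the first option concrete, here is a minimal sketch of splitting one large DELETE into id-range batches. The table name `events`, the `created_at` retention filter, and both helper names are hypothetical and only serve as illustration; this is not the query from the pganalyze application. The idea is that each statement runs in its own short transaction, so row locks are held only briefly.

```python
from typing import Iterator, List, Tuple


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) id ranges that together cover min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


def batched_delete_statements(min_id: int, max_id: int, batch_size: int) -> List[str]:
    """Build one small DELETE per id range instead of one large DELETE.

    The table `events` and the `created_at` filter are hypothetical.
    """
    return [
        f"DELETE FROM events WHERE id BETWEEN {low} AND {high} "
        "AND created_at < now() - interval '30 days'"
        for low, high in batch_ranges(min_id, max_id, batch_size)
    ]
```

For example, `batched_delete_statements(1, 10, 4)` produces three statements, covering ids 1 to 4, 5 to 8, and 9 to 10.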

Let's take a look at another common locking situation: Schema migrations with slow DDL statements.

Long running migrations that hold exclusive locks for too long

Connection holding an AccessExclusive lock on the articles table, taken by an earlier DDL statement in the same transaction, blocking other queries on the table

The second example is a common locking situation: long running migrations. There are several types of migrations that can lock the table and block other queries.

Let's look at the following scenario: a new column called data is introduced to the table articles, and that column needs to be backfilled. A migration script in Rails could look like this:

class AddDataToArticles < ActiveRecord::Migration[7.0]
  def change
    add_column :articles, :data, :text
    Article.update_all data: "backfilling_value"
  end
end

Another way of looking at this Rails migration is:

Start transaction (happens automatically in Rails migrations)
Add column to the table (this is very fast, but takes an exclusive lock)
Run the backfill query (this is slow)
Commit the transaction and release the locks

Since this migration will happen in one transaction, that Article.update_all data: "backfilling_value" would happen inside of the transaction (PID: 44248), with the query being UPDATE articles SET data = $1 like you can see in the screenshot. That transaction would hold the exclusive lock on the articles table from the add_column :articles, :data, :text part.

Now, what effect would this have on this database? With this example, backfilling 70M rows took almost 10 minutes. During this time, any queries that include the articles table (even a simple SELECT) had to wait for the migration to be done. If this was a web application, this migration would have taken the application down for 10 minutes!

It is very important that we don't write the migration like this to begin with. However, in case we do run such a migration, it is helpful to quickly know the lock information, so we can take the appropriate action, like canceling a query/migration. As a side note, you can find great examples of "bad migrations" in the strong migrations project.

Get alerted of blocking/locking query problems in near-real time

Lock Monitoring alert

The new pganalyze Lock Monitoring feature includes a new "Blocking Queries" alert that will notify you when a query is blocking other queries for more than a specified time threshold. By default, the alert will trigger only when the query is blocking 3 or more other queries for more than 5 minutes, and consider it critical after 10 minutes. You can configure this based on your operational standards to something as low as 1 query being blocked for 10 seconds, if you would like to get notified for any query being blocked right away (we don't recommend actually setting it this low for most environments).
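
As a rough mental model of the default thresholds described above, the rule could be written like this. The class and function names are made up for illustration, and how exact boundary values (such as precisely 5 minutes) are treated is an assumption; this is not the pganalyze implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockingAlertConfig:
    """Thresholds mirroring the defaults described above (names are hypothetical)."""
    min_blocked_queries: int = 3
    warning_after_seconds: int = 5 * 60
    critical_after_seconds: int = 10 * 60


def alert_level(blocked_queries: int, blocking_seconds: float,
                config: BlockingAlertConfig = BlockingAlertConfig()) -> str:
    """Return "none", "warning" or "critical" for a single blocking query."""
    if blocked_queries < config.min_blocked_queries:
        return "none"
    # Assumption: thresholds are exclusive ("more than" N seconds).
    if blocking_seconds > config.critical_after_seconds:
        return "critical"
    if blocking_seconds > config.warning_after_seconds:
        return "warning"
    return "none"
```

With the defaults, a query that has blocked three others for six minutes is a warning, while the very low setting mentioned above would correspond to `BlockingAlertConfig(1, 10, 20)` or similar.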

By default, these new alerts will show up in the pganalyze UI. Based on your preferences you can enable notifications to be sent by email, Slack or PagerDuty. You can learn more about our alerts and checkups here.

Behind the scenes: pg_blocking_pids()

To obtain the lock information, the pganalyze collector uses the pg_blocking_pids() function. This function returns the list of PIDs a particular query is waiting for (is blocked by):

test_db=# SELECT pid, pg_blocking_pids(pid) FROM pg_stat_activity WHERE wait_event_type = 'Lock';
  pid  | pg_blocking_pids 
-------+------------------
 81175 | {33219}
 81189 | {33219,81175}
 85112 | {81189}
 85128 | {81189}
 85146 | {81189}
(5 rows)

Calling this function uses the Postgres lock manager, which can be a heavily utilized component on busy Postgres systems. To keep overhead at a minimum, the collector only calls this function when a query is already in the "Waiting for Lock" state, and we know that there is a reason to get additional information. In our benchmarks as well as tests on production systems, we have observed no negative performance impact from tracking this additional data. You can disable this feature by passing the --no-postgres-locks option to the pganalyze collector, if needed.

Lock tree based on the result of pg_blocking_pids()

In case you are calling the pg_blocking_pids() function manually, be careful to look at the lock tree (as shown in the diagram) to detect other connections that have priority for acquiring the lock. If you are using the pganalyze Lock Monitoring feature, this is done automatically for you.
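
If you do want to walk the result by hand, the sample output above can be turned into a small lock tree. The following is a simplified sketch in Python with hypothetical helper names, not the logic pganalyze uses. It treats every PID returned by pg_blocking_pids() the same, even though Postgres reports both connections that hold a conflicting lock and connections that are ahead in the wait queue.

```python
from collections import defaultdict

# pid -> pids it is blocked by, copied from the sample query output above
blocked_by = {
    81175: [33219],
    81189: [33219, 81175],
    85112: [81189],
    85128: [81189],
    85146: [81189],
}


def root_blockers(blocked_by):
    """Return pids that block others but are not waiting on a lock themselves."""
    blockers = {pid for pids in blocked_by.values() for pid in pids}
    return sorted(blockers - set(blocked_by))


def blocked_transitively(blocked_by, root):
    """Return every pid that directly or indirectly waits on `root`."""
    waiters = defaultdict(set)  # blocker pid -> pids waiting on it
    for pid, blockers in blocked_by.items():
        for blocker in blockers:
            waiters[blocker].add(pid)
    seen, stack = set(), [root]
    while stack:
        for pid in waiters[stack.pop()]:
            if pid not in seen:
                seen.add(pid)
                stack.append(pid)
    return sorted(seen)
```

With the sample data, PID 33219 is the only root blocker, and all five waiting connections depend on it directly or indirectly.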

Try the new pganalyze Lock Monitoring features now

If you are an existing pganalyze customer on the current Scale or Enterprise Cloud plans, you can start using the new pganalyze Lock Monitoring features today, or if you are not yet using pganalyze you can sign up for a free 14-day trial.

To collect the necessary locking/blocking information, make sure to upgrade to the pganalyze collector version v0.46.0 or newer. A new Enterprise Server release including this will be released soon.

We also want to extend our thanks to our early access group that reached out in response to the pganalyze newsletter. We've already incorporated feedback, and are looking to add more improvements, such as identifying which individual object a lock is being held on—the whole table, a particular row, or a virtual transaction ID. We are planning to keep iterating on this new set of features and would love to hear your feedback.


How Postgres Chooses Which Index To Use For A Query

Lukas Fittl — Fri, 01 Apr 2022 12:00:00 GMT

Using Postgres sometimes feels like magic. But sometimes the magic is too much, such as when you are trying to understand the reason behind a seemingly bad Postgres query plan.

I've often found myself in a situation where I asked: "Postgres, what are you thinking?" I would stare at an EXPLAIN plan, see a Sequential Scan, and be puzzled as to why Postgres wasn't doing what I expected.

This has led me down the path of reading the Postgres source, in search of answers. Why is Postgres choosing a particular index over another one, or not choosing an index altogether?

In this blog post I aim to give an introduction to how the Postgres planner analyzes your query, and how it decides which indexes to use. Additionally, we’ll look at a puzzling situation where the join type can impact which indexes are being used.

We’ll look at a lot of Postgres source code, but if you are short on time, you might want to jump to how B-tree index costing works, and why Nested Loop Joins impact index usage.

We’ll also talk about an upcoming pganalyze feature at the very end!

A tour of Postgres: Parse analysis and early stages of planning
Where Index Scans are made
New features coming soon to pganalyze
Conclusion
Other helpful resources

A tour of Postgres: Parse analysis and early stages of planning

To start with, let’s look at a query’s lifecycle in Postgres. There are four important steps in how a query is handled:

Parsing: Turning query text into an Abstract Syntax Tree (AST)
Parse analysis: Turning table names into actual references to table objects
Planning: Finding and creating the optimal query plan
Execution: Executing the query plan

For understanding how the planner chooses which indexes to use, let’s first take a look at what parse analysis does.

Whilst there are multiple entry points into parse analysis, depending on whether you have query parameters or not, the core function in parse analysis is transformStmt (source):

/*
* transformStmt -
*    recursively transform a Parse tree into a Query tree.
*/
Query *
transformStmt(ParseState *pstate, Node *parseTree)
{

This takes the raw parse tree output (from the first step), and returns a Query struct. It has a lot of specific cases, as it handles both regular SELECTs as well as UPDATEs and other DML statements. Note that utility statements (DDL, etc) mostly get passed through to the execution phase.

Since we are interested in tables and indexes, let’s take a closer look at how parse analysis handles the FROM clause:

void
transformFromClause(ParseState *pstate, List *frmList)
{
   ListCell   *fl;
 
   /*
    * The grammar will have produced a list of RangeVars, RangeSubselects,
    * RangeFunctions, and/or JoinExprs. Transform each one (possibly adding
    * entries to the rtable), check for duplicate refnames, and then add it
    * to the joinlist and namespace.
    */
   foreach(fl, frmList)
   {
       …
 
       n = transformFromClauseItem(pstate, n,
                                   &nsitem,
                                   &namespace);
 
…
/*
* transformFromClauseItem -
*    Transform a FROM-clause item, adding any required entries to the
*    range table list being built in the ParseState, and return the
*    transformed item ready to include in the joinlist.  Also build a
*    ParseNamespaceItem list describing the names exposed by this item.
*    This routine can recurse to handle SQL92 JOIN expressions.
*/
static Node *
transformFromClauseItem(ParseState *pstate, Node *n,
                       ParseNamespaceItem **top_nsitem,
                       List **namespace)
{
…

At this stage, Postgres already distinguishes between the range table list (essentially a list of all the tables referenced by the query) and the joinlist. This distinction will also be visible at a later point in the planner.

Note that at this point Postgres has not yet made up its mind which indexes to use - it just decided that the FROM reference you called “foobar” is actually the table “foobar” in the “public” schema with OID 16424.

This information now gets stored in the Query struct, which is the result of the parse analysis phase. This Query struct is then passed into the planner, and that’s where it gets interesting.

Four levels of planning a query

Commonly we would start with the standard_planner (source) function as an entry point into the Postgres planner:

PlannedStmt *
standard_planner(Query *parse, const char *query_string, int cursorOptions,
                ParamListInfo boundParams)
{

This takes our Query struct, and ultimately returns a PlannedStmt. For reference, the PlannedStmt struct (source) looks like this:

/* ----------------
*      PlannedStmt node
*
* The output of the planner is a Plan tree headed by a PlannedStmt node.
* PlannedStmt holds the "one time" information needed by the executor.
* ----------------
*/
typedef struct PlannedStmt
{
   NodeTag     type;
 
   CmdType     commandType;    /* select|insert|update|delete|utility */
 
…
 
   struct Plan *planTree;      /* tree of Plan nodes */
 
…

The tree of plan nodes is what you would be familiar with if you’ve looked at an EXPLAIN output before - ultimately EXPLAIN is based on walking that plan tree and showing you a text/JSON/etc version of it.
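
To illustrate the idea of walking a plan tree, here is a toy sketch. The `PlanNode` class and `explain_lines` function are hypothetical stand-ins, not Postgres data structures, and the output only mimics the general shape of text-format EXPLAIN.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """A minimal stand-in for a plan node: a label plus child nodes."""
    node_type: str
    children: List["PlanNode"] = field(default_factory=list)


def explain_lines(node, depth=0):
    """Walk the plan tree depth-first and return one text line per node."""
    prefix = "" if depth == 0 else "  " * depth + "->  "
    lines = [prefix + node.node_type]
    for child in node.children:
        lines.extend(explain_lines(child, depth + 1))
    return lines


# Illustrative tree: a join node with two scan nodes below it
plan = PlanNode("Nested Loop", [
    PlanNode("Index Scan using t1_field_idx on t1"),
    PlanNode("Index Scan using t2_t1_id_idx on t2"),
])
```

Calling `"\n".join(explain_lines(plan))` gives the join node on the first line, followed by one indented `->` line per child scan.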

The core function of the planner is best described in these lines of standard_planner:

/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL,
                        false, tuple_fraction);

/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

top_plan = create_plan(root, best_path);

The planner first creates what are called “paths” using the subquery_planner (which may recursively call itself), and then the planner picks the best path. Based on this best path, the actual plan tree is constructed.
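
The "generate paths, then pick the best one" idea can be modeled in a few lines. This is a deliberately simplified sketch: the `Path` class, the function name and the cost numbers are hypothetical, and the real planner also weighs startup cost, sort order and the requested tuple fraction when comparing paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A candidate way of producing a relation's rows (heavily simplified)."""
    description: str
    startup_cost: float
    total_cost: float


def cheapest_total_path(paths):
    """Pick the path with the lowest total cost."""
    if not paths:
        raise ValueError("no paths to choose from")
    return min(paths, key=lambda p: p.total_cost)


# Hypothetical candidates and costs, for illustration only
candidates = [
    Path("Seq Scan on t2", startup_cost=0.00, total_cost=35.50),
    Path("Index Scan using t2_t1_id_idx on t2", startup_cost=0.28, total_cost=8.29),
]
best = cheapest_total_path(candidates)
```

Here `best` is the index scan path, since its total cost is lower; only that winning path would then be turned into a plan node.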

For understanding how the planner chose which indexes to use, we must therefore look at paths, not at plan nodes. Let’s see what subquery_planner (source) does:

/*--------------------
* subquery_planner
*    Invokes the planner on a subquery.  We recurse to here for each
*    sub-SELECT found in the query tree.
…
*/
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
                PlannerInfo *parent_root,
                bool hasRecursion, double tuple_fraction)
{

As described in the comment, this handles each sub-SELECT separately - but note that even if the original query contains a written sub-SELECT, the planner may optimize it away by pulling it up into the parent planning process, if possible.

For the purposes of focusing on index choice, here are the two key parts of subquery_planner:

/*
 * Do the main planning.  If we have an inherited target relation, that
 * needs special processing, else go straight to grouping_planner.
 */
if (parse->resultRelation &&
    rt_fetch(parse->resultRelation, parse->rtable)->inh)
    inheritance_planner(root);
else
    grouping_planner(root, false, tuple_fraction);

…

/*
 * Make sure we've identified the cheapest Path for the final rel.  (By
 * doing this here not in grouping_planner, we include initPlan costs in
 * the decision, though it's unlikely that will change anything.)
 */
set_cheapest(final_rel);

This method also optimizes for the cheapest path - we’ll see more on that in a moment. But for now, let’s go deeper down the rabbit hole and look at grouping_planner (source):

/* --------------------
 * grouping_planner
 *    Perform planning steps related to grouping, aggregation, etc.
 *
 * This function adds all required top-level processing to the scan/join
 * Path(s) produced by query_planner.
 *
 * --------------------
 */
static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
                double tuple_fraction)
{

Reading through its code, it turns out we’re still not there. It’s actually query_planner that we are looking for, as described in this comment:

RelOptInfo *current_rel;
…
/*
* Generate the best unsorted and presorted paths for the scan/join
* portion of this Query, ie the processing represented by the
* FROM/WHERE clauses.  (Note there may not be any presorted paths.)
* We also generate (in standard_qp_callback) pathkey representations
* of the query's sort clause, distinct clause, etc.
*/
current_rel = query_planner(root, standard_qp_callback, &qp_extra);

Before we dive into the query_planner method, let’s pause for a moment and look at what the result of query_planner is, the RelOptInfo struct:

Breaking down a query into tables being scanned (RelOptInfo and RestrictInfo structs)

In the Postgres planner, RelOptInfo is best described as the internal representation of a particular table that is being scanned (with either a sequential scan, or an index scan).

When trying to understand how Postgres interprets your query, adding debug information that shows RelOptInfo would be the closest that you can get to seeing which tables Postgres is going to scan, and how it makes a decision between different scan methods, such as an Index Scan.

RelOptInfo (source) has many details to it, but the key parts for our focus on indexing are these:

/*----------
* RelOptInfo
*      Per-relation information for planning/optimization
…
*      pathlist - List of Path nodes, one for each potentially useful
*                 method of generating the relation
… 
*      baserestrictinfo - List of RestrictInfo nodes, containing info about
*                  each non-join qualification clause in which this relation
*                  participates (only used for base rels)
…
*      joininfo  - List of RestrictInfo nodes, containing info about each
*                  join clause in which this relation participates
…
*/
typedef struct RelOptInfo
{
…
   List       *pathlist;       /* Path structures */
…
   List       *baserestrictinfo;   /* RestrictInfo structures (if base rel) */
…
   List       *joininfo;       /* RestrictInfo structures for join clauses
                                * involving this rel */
…
}

Before we interpret this, let’s look at RestrictInfo (source):

/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
* (WHERE or JOIN/ON clause).  Since the restriction clauses are logically
* ANDed, we can use any one of them or any subset of them to filter out
* tuples, without having to evaluate the rest.
..
*/
typedef struct RestrictInfo
{
   NodeTag     type;
   Expr       *clause;         /* the represented clause of WHERE or JOIN */
…
}

A note on terminology: This references “base relations”, which are relations (aka tables) that are looked at solely on their individual basis, as compared to in the context of a JOIN.

In the code sample, RestrictInfo is how our WHERE clause and JOIN conditions get represented. This is the part that is key to understanding how Postgres compares your query against the indexes that exist.

You can think about it this way - for each table that’s included in the query, Postgres generates two lists of “restriction” clauses:

Base restriction clauses: Typically part of your WHERE clause, and are expressions that involve only the table itself - for example users.id = 123
Join clauses: Typically part of your JOIN clause, and expressions that involve multiple tables - for example users.id = comments.user_id

Note the reason that Postgres calls these “restriction” clauses is because they restrict (or filter) the amount of data that is being returned from your table. And how can we effectively filter data from a table? By using an index!

The base restriction clauses will typically be used to filter down the amount of data that is being returned from the table. But join clauses oftentimes will not, as they are only used as part of the matching of rows that happens during the JOIN operation.

The one exception to this are Nested Loop Joins - but we’ll come back to that.
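
As a toy model of these two lists, the split can be thought of as "how many tables does a clause mention?". The function below is a hypothetical sketch, not how Postgres represents RestrictInfo nodes; clauses are plain strings paired with the set of tables they reference.

```python
from collections import defaultdict


def classify_clauses(clauses):
    """Split AND-ed clauses into per-table base restrictions and join clauses.

    `clauses` is a list of (clause_text, tables_referenced) pairs.
    """
    base_restrictions = defaultdict(list)
    join_clauses = defaultdict(list)
    for text, tables in clauses:
        if len(tables) == 1:
            (table,) = tables
            base_restrictions[table].append(text)
        else:
            # A clause mentioning several tables is a join clause for each of them
            for table in tables:
                join_clauses[table].append(text)
    return dict(base_restrictions), dict(join_clauses)


# The two example clauses from the text above
clauses = [
    ("users.id = 123", {"users"}),
    ("users.id = comments.user_id", {"users", "comments"}),
]
base, joins = classify_clauses(clauses)
```

In this example, `base` only has an entry for users, while `joins` lists the join clause under both users and comments.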

Choosing different paths and scan methods

Let’s go back to query_planner (source), and what it does:

/*
* query_planner
*    Generate a path (that is, a simplified plan) for a basic query,
*    which may involve joins but not any fancier features.
*
* Since query_planner does not handle the toplevel processing (grouping,
* sorting, etc) it cannot select the best path by itself.  Instead, it
* returns the RelOptInfo for the top level of joining, and the caller
* (grouping_planner) can choose among the surviving paths for the rel.
…
*/
RelOptInfo *
query_planner(PlannerInfo *root,
             query_pathkeys_callback qp_callback, void *qp_extra)
{
…
   /*
    * Construct RelOptInfo nodes for all base relations used in the query.
    */
   add_base_rels_to_query(root, (Node *) parse->jointree);
 
…
   /*
    * Ready to do the primary planning.
    */
   final_rel = make_one_rel(root, joinlist);
 
   return final_rel;
}

The main point of query_planner itself is to create a set of RelOptInfo nodes, do a bunch of processing involving them, and then pass them to make_one_rel. As that name says, it creates one “final rel”, which is also a RelOptInfo node, that is then used to create our final plan.

We’ve looked at a bunch of code already, but now it’s time to get to the exciting part!

The implementation of make_one_rel (source) sits in a file with the important sounding name of allpaths.c - and as referenced earlier, when we talk about plan choices, we need to understand which path is chosen, as that is used to then create a plan node.

/*
 * make_one_rel
 *    Finds all possible access paths for executing a query, returning a
 *    single rel that represents the join of all base rels in the query.
 */
RelOptInfo *
make_one_rel(PlannerInfo *root, List *joinlist)
{
…
   /*
    * Compute size estimates and consider_parallel flags for each base rel.
    */
   set_base_rel_sizes(root);
 
…
 
   /*
    * Generate access paths for each base rel.
    */
   set_base_rel_pathlists(root);
   /*
    * Generate access paths for the entire join tree.
    */
   rel = make_rel_from_joinlist(root, joinlist);
 
   return rel;
}

Paths are chosen in three steps:

Estimate the sizes of the involved tables
Find the best path for each base relation
Find the best path for the entire join tree

The first step is mainly concerned with size estimates as they relate to the output of scanning the relation. This impacts the cost and rows numbers you are familiar with from EXPLAIN - and this may impact joins, but typically should not directly impact index usage.

Now step 2 is key to our goal here. And set_base_rel_pathlists ultimately calls set_plain_rel_pathlist (source), which finally looks like what we are interested in:

/*
 * set_plain_rel_pathlist
 *    Build access paths for a plain relation (no subquery, no inheritance)
 */
static void
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
   …
 
   /* Consider sequential scan */
   add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
 
   /* If appropriate, consider parallel sequential scan */
   if (rel->consider_parallel && required_outer == NULL)
       create_plain_partial_paths(root, rel);
 
   /* Consider index scans */
   create_index_paths(root, rel);
 
   /* Consider TID scans */
   create_tidscan_paths(root, rel);
}

Where Index Scans are made

Creating the two types of index scans: plain vs parameterized

Let’s look at create_index_paths (source), since we want to see how indexes are picked:

/*
* create_index_paths()
*    Generate all interesting index paths for the given relation.
*    Candidate paths are added to the rel's pathlist (using add_path).
*
* To be considered for an index scan, an index must match one or more
* restriction clauses or join clauses from the query's qual condition,
* or match the query's ORDER BY condition, or have a predicate that
* matches the query's qual condition.
*
* There are two basic kinds of index scans.  A "plain" index scan uses
* only restriction clauses (possibly none at all) in its indexqual,
* so it can be applied in any context.  A "parameterized" index scan uses
* join clauses (plus restriction clauses, if available) in its indexqual.
* When joining such a scan to one of the relations supplying the other
* variables used in its indexqual, the parameterized scan must appear as
* the inner relation of a nestloop join; it can't be used on the outer side,
* nor in a merge or hash join.
…
*/
void
create_index_paths(PlannerInfo *root, RelOptInfo *rel)
{
…
   /* Examine each index in turn */
   foreach(lc, rel->indexlist)
   {
       IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
 
       …
 
       /*
        * Ignore partial indexes that do not match the query.
        */
       if (index->indpred != NIL && !index->predOK)
           continue;
 
       /*
        * Identify the restriction clauses that can match the index.
        */
       match_restriction_clauses_to_index(root, index, &rclauseset);
 
       /*
        * Build index paths from the restriction clauses.  These will be
        * non-parameterized paths.  Plain paths go directly to add_path(),
        * bitmap paths are added to bitindexpaths to be handled below.
        */
       get_index_paths(root, rel, index, &rclauseset,
                       &bitindexpaths);
 
       /*
        * Identify the join clauses that can match the index.  For the moment
        * we keep them separate from the restriction clauses.
        */
       match_join_clauses_to_index(root, rel, index,
                                   &jclauseset, &joinorclauses);
…
       /*
        * If we found any plain or eclass join clauses, build parameterized
        * index paths using them.
        */
       if (jclauseset.nonempty || eclauseset.nonempty)
           consider_index_join_clauses(root, rel, index,
                                       &rclauseset,
                                       &jclauseset,
                                       &eclauseset,
                                       &bitjoinpaths);
   }
 
…
}

There are a lot of things to take in here - and we’ve already removed BitmapOr/BitmapAnd index scans from this code sample.

First of all, this builds two types of index scans:

Plain index scans, that only use the base restriction clauses
Parameterized index scans, that use both base restriction clauses and join clauses

We’ll talk more about the second case in a moment.

Other key aspects to understand:

Partial indexes (i.e. those with an attached WHERE clause on the index definition) are matched against the set of restriction clauses and discarded here if they don’t match
Each index is both considered for an Index Scan and Index Only Scan (through the “build_index_paths” method), as well as for a Bitmap Heap Scan / Bitmap Index Scan
Each potential way of using an index gets a cost assigned - and this cost decides whether Postgres actually chooses the index (see earlier notion of the “best path”), or not
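
The partial index check can be approximated with a very small sketch. Everything here is hypothetical (the `Index` class, the index names and the clause strings), and the matching is much cruder than what Postgres does: a predicate only counts as satisfied when the same clause text appears in the query, whereas the planner's implication test can also prove cases such as a narrower range implying a wider one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    name: str
    columns: tuple
    predicate: frozenset = frozenset()  # empty means "not a partial index"


def usable_indexes(indexes, restriction_clauses):
    """Keep full indexes, and partial indexes whose predicate the query satisfies.

    Simplification: a predicate clause is satisfied only if the exact same
    clause text is among the query's restriction clauses.
    """
    clauses = set(restriction_clauses)
    return [ix.name for ix in indexes if ix.predicate <= clauses]


# Hypothetical indexes on a users table
indexes = [
    Index("users_email_idx", ("email",)),
    Index("users_active_email_idx", ("email",), frozenset({"deleted_at IS NULL"})),
]
```

A query that filters only on email can use just `users_email_idx` in this model; once the query also contains `deleted_at IS NULL`, the partial index becomes a candidate as well.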

For understanding how costing works, you can look at the cost_index function (source), which gets called from build_index_paths through a few hoops.

/*
* cost_index
*    Determines and returns the cost of scanning a relation using an index.
…
* In addition to rows, startup_cost and total_cost, cost_index() sets the
* path's indextotalcost and indexselectivity fields.  These values will be
* needed if the IndexPath is used in a BitmapIndexScan.
*/
void
cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
          bool partial_path)
{
…
   /*
    * Call index-access-method-specific code to estimate the processing cost
    * for scanning the index, as well as the selectivity of the index (ie,
    * the fraction of main-table tuples we will have to retrieve) and its
    * correlation to the main-table tuple order.
    */
   amcostestimate(root, path, loop_count,
                  &indexStartupCost, &indexTotalCost,
                  &indexSelectivity, &indexCorrelation,
                  &index_pages);

Whilst there are other factors in costing an index scan, the main responsibility falls to the Index Access Method.

Understanding B-tree index cost estimates

The most common index access method (or index type) is B-tree, so let’s look at btcostestimate:

void
btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
              Cost *indexStartupCost, Cost *indexTotalCost,
              Selectivity *indexSelectivity, double *indexCorrelation,
              double *indexPages)
{
…
   /*
    * For a btree scan, only leading '=' quals plus inequality quals for the
    * immediately next attribute contribute to index selectivity (these are
    * the "boundary quals" that determine the starting and stopping points of
    * the index scan).
    */
   indexBoundQuals = …
 
   /*
    * If the index is partial, AND the index predicate with the
    * index-bound quals to produce a more accurate idea of the number of
    * rows covered by the bound conditions.
    */
   selectivityQuals = add_predicate_to_index_quals(index, indexBoundQuals);
 
   btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
                                             index->rel->relid,
                                             JOIN_INNER,
                                             NULL);
   numIndexTuples = btreeSelectivity * index->rel->tuples;
…
   costs.numIndexTuples = numIndexTuples;
   genericcostestimate(root, path, loop_count, &costs);
…

As you can see a lot revolves around determining how many index tuples will be matched by the scan - as that’s the main expensive portion of querying a B-tree index.

The first step is determining the boundaries of the index scan, as it relates to the data stored in the index. In particular this is relevant for multi-column B-tree indexes, where only a subset of the columns might match the query.

You may have heard before about the best practice of ordering B-tree columns so the columns that are queried by an equality comparison (“=” operator) are put first, followed by one optional inequality comparison (a range operator such as “<” or “>=”), followed by any other columns. This recommendation is based on the physical structure of the B-tree index, and the cost model also reflects this constraint.

Put differently: The more specific you are with matching equality comparisons, the fewer parts of the index have to be scanned. This is represented here by the calculation of “btreeSelectivity”. If this number is small, the cost of the index scan will be lower, as determined by “genericcostestimate” based on the estimated number of index tuples being scanned.
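
The boundary rule quoted in the btcostestimate comment can be sketched in a few lines. This is a simplified model with hypothetical column names and function names: it only knows about the five basic comparison operators and ignores things like NULL tests and array conditions.

```python
def boundary_quals(index_columns, quals):
    """Return the quals that bound a B-tree scan, following the comment above.

    `quals` maps a column name to a list of (operator, value) pairs. Columns
    are walked in index order; the walk stops after the first column that has
    no "=" qual, so later columns cannot narrow the scanned range.
    """
    supported = {"=", "<", "<=", ">", ">="}
    bound = []
    for column in index_columns:
        column_quals = [(column, op, value)
                        for op, value in quals.get(column, [])
                        if op in supported]
        bound.extend(column_quals)
        if not any(op == "=" for _, op, _ in column_quals):
            break
    return bound


def estimated_index_tuples(selectivity, table_tuples):
    """Mirrors: numIndexTuples = btreeSelectivity * index->rel->tuples"""
    return selectivity * table_tuples


# Hypothetical query conditions
quals = {
    "tenant_id": [("=", 1)],
    "created_at": [(">", "2022-01-01")],
    "status": [("=", "open")],
}
```

For an index on (tenant_id, created_at, status), the equality on tenant_id and the range on created_at bound the scan, but status does not. For an index on (created_at, tenant_id), only the created_at range counts, which is why putting equality columns first tends to give a smaller scanned portion.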

For creating the ideal B-tree index, you would:

Focus on indexing columns used in equality comparisons
Index the columns with the best selectivity (i.e. being most specific), so that only a small portion of the index has to be scanned
Involve a small number of columns (possibly only one), to keep the index size small - and thus reduce the total number of pages in the index

If you follow these steps, you will create a B-tree index that has a low cost, and that Postgres should choose.

Now, there is one more thing we wanted to talk about, and that involves the notion of Parameterized Index Scans:

Parameterized Index Scans, or: Why Nested Loops are sometimes a good join type

As noted earlier, when Postgres looks at the potential index scans, it creates both plain index scans, and parameterized index scans.

Plain index scans only involve parts of your query that involve the table itself, and would typically reference the clauses found in the WHERE clause.

Parameterized index scans on the other hand involve the part of your query that references two different tables. Oftentimes you would find these clauses in the JOIN clause.

Let’s take a look at a practical example. Assume the following schema and indexes:

CREATE TABLE t1 (
  id bigint PRIMARY KEY,
  field text
);
CREATE TABLE t2 (
  id bigint PRIMARY KEY,
  t1_id bigint,
  other_field text
);
CREATE INDEX t1_field_idx ON t1(field);
CREATE INDEX t2_t1_id_idx ON t2(t1_id);

And this query:

SELECT *
FROM t1
JOIN t2 ON (t1.id = t2.t1_id)
WHERE t1.field = '123'

We have two tables to scan - t1 and t2.

For t1, we can utilize a plain index scan on the t1_field_idx index - and that will perform well, since we have a specific value that is present in the query, that ideally matches a small number of rows.

When we run an EXPLAIN on the query, the simplest plan will look like this:

EXPLAIN SELECT *
FROM t1
JOIN t2 ON (t1.id = t2.t1_id)
WHERE t1.field = '123';

                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Hash Join  (cost=13.74..37.26 rows=5 width=88)
   Hash Cond: (t2.t1_id = t1.id)
   ->  Seq Scan on t2  (cost=0.00..20.70 rows=1070 width=48)
   ->  Hash  (cost=13.67..13.67 rows=6 width=40)
         ->  Bitmap Heap Scan on t1  (cost=4.20..13.67 rows=6 width=40)
               Recheck Cond: (field = '123'::text)
               ->  Bitmap Index Scan on t1_field_idx  (cost=0.00..4.20 rows=6 width=0)
                     Index Cond: (field = '123'::text)
(8 rows)

As we can see Postgres uses a Sequential Scan on t2. Let’s add some more data into the tables, to see if that changes the plan:

INSERT INTO t1 SELECT val, val::text FROM generate_series(0, 1000) AS x(val);
INSERT INTO t2 SELECT val, val, val::text FROM generate_series(0, 1000) AS x(val);

Note that we are effectively creating exactly one entry that matches the t1.field = '123' condition, and we are also creating exactly one t2 entry for each t1 entry.

If we re-run the EXPLAIN, we get the following plan:

                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Nested Loop  (cost=0.55..16.60 rows=1 width=30)
   ->  Index Scan using t1_field_idx on t1  (cost=0.28..8.29 rows=1 width=11)
         Index Cond: (field = '123'::text)
   ->  Index Scan using t2_t1_id_idx on t2  (cost=0.28..8.29 rows=1 width=19)
         Index Cond: (t1_id = t1.id)
(5 rows)

As you can see, we now get an index scan on t2_t1_id_idx. This shows a Parameterized Index Scan in action - this is only possible because the join chosen by Postgres is a Nested Loop - not a Hash Join or Merge Join.

A quick summary of how different join types impact index usage:

Merge Join: Needs sorted output from the scan node (thus can benefit from a sorted index like B-tree), but doesn't use the JOIN clause to restrict the data when scanning the table
Hash Join: Doesn’t need sorted output, and doesn’t use the JOIN clause to restrict the data when scanning the table
Nested Loop Join: Doesn’t need sorted output from the scan node, but for one of the two tables uses the JOIN clause to restrict the data when scanning the table
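To make the difference more tangible, here is a small toy model in Ruby. It is only a sketch, not how the Postgres executor is implemented: the arrays stand in for the t1 and t2 tables from the example above, and the T2_BY_T1_ID hash is a hypothetical stand-in for the t2_t1_id_idx index. It counts how many t2 rows each join strategy has to visit.

```ruby
# Toy model of the two tables from the example above (not Postgres internals).
T1 = (0..1000).map { |v| { id: v, field: v.to_s } }
T2 = (0..1000).map { |v| { id: v, t1_id: v, other_field: v.to_s } }

# Stand-in for the t2_t1_id_idx index: join key => matching t2 rows.
T2_BY_T1_ID = T2.group_by { |row| row[:t1_id] }

# Nested Loop: for each matching t1 row, the join key is passed into the
# inner scan as a parameter, so only matching t2 rows are visited.
def nested_loop_join(field_value)
  visited = 0
  result = []
  T1.select { |r| r[:field] == field_value }.each do |outer|
    T2_BY_T1_ID.fetch(outer[:id], []).each do |inner|
      visited += 1
      result << outer.merge(t2_id: inner[:id])
    end
  end
  [result, visited]
end

# Hash Join: build a hash from the filtered t1 rows, then scan all of t2
# and probe the hash. The join clause does not restrict the t2 scan.
def hash_join(field_value)
  built = T1.select { |r| r[:field] == field_value }.group_by { |r| r[:id] }
  visited = 0
  result = []
  T2.each do |inner|
    visited += 1
    built.fetch(inner[:t1_id], []).each do |outer|
      result << outer.merge(t2_id: inner[:id])
    end
  end
  [result, visited]
end

nl_rows, nl_visited = nested_loop_join("123")
hj_rows, hj_visited = hash_join("123")
puts "nested loop: #{nl_rows.size} row(s), #{nl_visited} t2 row(s) visited"
puts "hash join:   #{hj_rows.size} row(s), #{hj_visited} t2 row(s) visited"
```

Both strategies return the same single row, but only the nested loop gets to use the join key to narrow down which t2 rows it looks at.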

Understanding what’s in your WHERE, your JOIN clause and your likely JOIN type is key, as all three will impact index usage.

If you see a surprising Sequential Scan, you might want to review whether all possible index scans were parameterized index scans, and how the plan changes when you add an additional WHERE clause.

New features coming soon to pganalyze

If you find you’re having a hard time reasoning about all of this, you are not alone!

The reason we’ve spent a lot of time looking through these parts of the Postgres source code, is because they form the basis of a new upcoming version of the Index Advisor.

And as part of the new Index Advisor, we’ll show you additional information for all scans on a table, to help you assess how Postgres uses existing indexes, and what the best indexing strategy might be.

Here is a sneak peek from our current design iteration:

The same WHERE clause and JOIN clause data from the Postgres planner is shown in the Scans list, to help you make an assessment of how Postgres builds Plain Index Scans and Parameterized Index Scans for your queries.

But more on this another day!

Conclusion

In this post we’ve gone down and chased through the Postgres source code until we’ve found the place where indexing decisions happen. We’ve looked at B-tree costing in particular, and looked at a puzzling case of how Nested Loops can affect index usage, by allowing the use of Parameterized Index Scans.

If you optimize your queries, it helps to understand which tables you are scanning, and what the involved WHERE and JOIN clauses are. Additionally, it’s important to understand the different join types, and that only Nested Loop joins can use the JOIN clause to restrict an index scan on one of the tables, via a Parameterized Index Scan.

Postgres in 2021: An Observer's Year In Review

Lukas Fittl — Fri, 07 Jan 2022 12:00:00 GMT

Every January, the pganalyze team takes time to sit down to reflect on the year gone by. Of course, we are thinking about pganalyze, our customers and how we can improve our product. But, more importantly, we always take a bird's-eye view of what has happened in our industry, and specifically in the Postgres community. As you can imagine: A lot!

So we thought: Instead of trying to summarize everything, let's review what happened with the Postgres project, and what is most exciting from our personal perspective. Coincidentally, a new Postgres Commitfest has just started, so it's the perfect time to read about new functionality that is being proposed by the PostgreSQL community.

The following are my own thoughts on the past year of Postgres, and a few of the things that I'm excited about looking ahead. Let's take a look:

Postgres Performance: Sometimes it's the small things
Does autovacuum dream of 64-bit Transaction IDs?
EXPLAIN: Nested Loops can be deceiving
Extended Statistics: Help the Postgres planner do its job better
Crustaceous Postgres: Using Rust For Extensions & more
Other highlights from Postgres development in 2021
Conclusion

Postgres Performance: Sometimes it's the small things

To start with, I wanted to look at one very specific change that I actually hadn't noticed until recently.

Specifically: The performance of IN clauses, and the work done to improve performance for long IN lists in Postgres 14.

First, let's set up a test table with some data that we can query:

CREATE TABLE tbl (
    id int
);

INSERT INTO tbl SELECT i FROM generate_series(1,100000) n(i);

Now, we run a very simple query with a long IN list on Postgres 13:

postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count 
-------
  1000
(1 row)

Time: 360.520 ms

This is noticeably slow. With Postgres 14 however:

postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count 
-------
  1000
(1 row)

Time: 12.246 ms

An amazing 30x improvement! Note that this is most pronounced with Sequential Scans, or other situations where the executor makes a lot of comparisons, i.e. when the expression shows up as a Filter clause.
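As far as I understand the change, Postgres 14 can evaluate a long constant IN list with a hash table lookup instead of comparing each row against the list values one by one. The following Ruby sketch is only an analogy for that idea, not the Postgres implementation. The rows array stands in for tbl.id, and the absolute timings will of course differ from the psql numbers above.

```ruby
require "set"
require "benchmark"

rows = (1..100_000).to_a                  # stand-in for tbl.id
in_list = (1..1000).map { |i| i * 100 }   # 1000 values, all present in rows

# Linear search: each row is compared against the list values one by one.
linear_count = nil
linear_time = Benchmark.realtime do
  linear_count = rows.count { |id| in_list.include?(id) }
end

# Hashed lookup: build a set once, then probe it once per row.
hashed_count = nil
hashed_time = Benchmark.realtime do
  lookup = in_list.to_set
  hashed_count = rows.count { |id| lookup.include?(id) }
end

puts "linear: count=#{linear_count} time=#{(linear_time * 1000).round(1)} ms"
puts "hashed: count=#{hashed_count} time=#{(hashed_time * 1000).round(1)} ms"
```

Both variants count the same 1000 rows. The difference is in how much comparison work is done per row, which is why the effect is largest when the expression is evaluated for many rows.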

The reason I like this change is that it demonstrates what the Postgres community does well: Refine the existing system, and optimize clear inefficiencies, without requiring users to change their queries.

Of course, there are many other exciting performance efforts, here are a few:

Postgres 14: Connection scaling improvements
Postgres 14: Memoization of Nested Loops
Postgres 14: libpq pipelining
In Development: libpq compression
In Development: Asynchronous I/O and direct I/O (see also this presentation by Andres Freund)

Does autovacuum dream of 64-bit Transaction IDs?

Now, on to a much bigger topic. If you've scaled Postgres, you've likely come to meet the archenemy of a large Postgres installation: VACUUM, or rather its cousin, autovacuum, which cleans up dead tuples from your tables and advances the transaction ID horizon in Postgres.

Much has been said (1, 2, 3) about what happens when you hit Transaction ID (TXID) Wraparound, a situation in which Postgres is unable to start a new transaction. A recent blog post illustrating Notion's motivation to shard their Postgres deployment, puts it well:

More worrying was the prospect of transaction ID (TXID) wraparound, a safety mechanism in which Postgres would stop processing all writes to avoid clobbering existing data. Realizing that TXID wraparound would pose an existential threat to the product, our infrastructure team doubled down and got to work.

- Garrett Fidalgo - Herding elephants: Lessons learned from sharding Postgres at Notion

The root cause here is actually very simple. Transaction IDs are stored as 32-bit integers in Postgres, for example on individual rows in a table, to identify when the row first became visible to other transactions.
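To get a feeling for what a 32-bit counter means in practice, here is a small Ruby sketch of wraparound-aware comparison. It is a simplification under stated assumptions: it ignores the special reserved transaction IDs that Postgres has, and the rate of 1,000 transactions per second is a made-up number purely for illustration.

```ruby
XID_SPACE = 2**32

# Wraparound-aware "a is older than b" for 32-bit counters: interpret the
# difference modulo 2**32 as a signed 32-bit number.
def xid_precedes?(a, b)
  diff = (a - b) % XID_SPACE
  diff >= 2**31 # equivalent to the signed 32-bit difference being negative
end

puts XID_SPACE                            # number of distinct 32-bit values
puts xid_precedes?(100, 200)              # true
puts xid_precedes?(4_294_967_000, 50)     # true: 50 was assigned after the counter wrapped
puts xid_precedes?(100, 100 + 2**31 + 1)  # false: more than 2**31 apart, the order flips

# Hypothetical workload: 1,000 transactions per second.
days_until_half_space = (2**31 / 1000.0 / 86_400).round(1)
puts days_until_half_space
```

The last comparison shows why old rows need to be dealt with before they fall more than about two billion transactions behind: past that point, a modular comparison can no longer tell "old" from "new".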

Most people would agree that moving from 32-bit to 64-bit Transaction IDs is a good idea. There have been multiple attempts over the years, but in the last weeks a new patch by Maxim Orlov has kickstarted a new discussion.

Whilst the community's motivation to fix this is certainly there, the early reviews give a glimpse of what needs to be considered when moving to 64-bit TXIDs:

32-bit systems will have issues with atomic read/write of shared transaction ID variables
Extremely long-running transactions could fail if they exceed the new "short transaction ID" boundary (which remains at 32-bit in this patch)
On-disk format - keeping compatibility with the old format vs rewriting all data when an old cluster is upgraded (this patch tries to avoid changing the on-disk format)
Multixact freeze still needs to happen at a somewhat regular frequency (one of the activities that VACUUM takes care of today)
Memory overhead of larger 64-bit IDs in hot code paths (e.g. those optimized by recent connection scalability improvements)

And Peter Geoghegan puts it succinctly in reviewing the patch:

I believe that a good solution to the problem that this patch tries to solve needs to be more ambitious. I think that we need to return to first principles, rather than extending what we have already.

Despite the email thread being titled "Add 64-bit XIDs into PostgreSQL 15", given these concerns, it's extremely unlikely that a change like this would make it into Postgres 15 at this point - but one can dream, and look ahead to Postgres 16.

Looking for something you can use today?

Postgres 14 brought two great improvements in the area of VACUUM and bloat reduction: (1) The new bottom-up index deletion for B-tree indexes, (2) The new VACUUM "emergency mode" that provides better protection against impending TXID Wraparound.

EXPLAIN: Nested Loops can be deceiving

Commitfests are about encouraging code reviews, first and foremost. Whilst looking through patches, I noticed a small one, which adds additional information about Nested Loops to EXPLAIN.

The patch was initially proposed back in 2020, and saw some minor refactorings in 2021, but no-one had reviewed it yet in this Commitfest. So I took the opportunity to review it.

First, to understand what the patch aims to do, let's look at a common EXPLAIN output for a Nested Loop:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
...

Based on this we might assume that each loop produces 5 rows, as the existing "actual rows" statistic shows the average across all loops.

But this example shows well where the math already doesn't add up: The parent Append node returns 5 rows on average, but the child node "actual rows" add up to 6. And the top Nested Loop node returns 23 rows, but we can't see clearly which index these rows are being found in.

With the patch in place, we get an extra row with Loop information:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         Loop Min Rows: 2  Max Rows: 6  Total Rows: 23
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Loop Min Rows: 2  Max Rows: 2  Total Rows: 10
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Loop Min Rows: 2  Max Rows: 3  Total Rows: 11
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Loop Min Rows: 1  Max Rows: 1  Total Rows: 2
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
...

You can see how much clearer the picture is with this. We can understand that both tprt1_idx and tprt2_idx contributed about equally to the result. We can also see that some loop iterations return fewer rows (2), while others return more (6). When TIMING is turned on, you also get the min/max time of the loop iterations.
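The relationship between the per-loop average and the new totals can be checked with a bit of arithmetic. This Ruby sketch takes the "Total Rows" and "loops" values from the patched output above and divides them, assuming the displayed "actual rows" value is rounded to the nearest integer (which is what the numbers suggest).

```ruby
# Total rows and loop counts, copied from the patched EXPLAIN output above.
nodes = {
  "Append"    => { total_rows: 23, loops: 5 },
  "tprt1_idx" => { total_rows: 10, loops: 5 },
  "tprt2_idx" => { total_rows: 11, loops: 4 },
  "tprt3_idx" => { total_rows: 2,  loops: 2 },
}

# "actual rows" is the per-loop average, rounded to a whole number.
averages = nodes.transform_values do |n|
  (n[:total_rows].to_f / n[:loops]).round
end

averages.each { |name, avg| puts "#{name}: actual rows=#{avg}" }
```

For example, tprt2_idx returns 11 rows over 4 loops, or 2.75 per loop, which shows up as "actual rows=3". Rounding like this is why the child row counts don't add up to the parent's.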

Given the prevalence of slow query plans that contain a Nested Loop, this appears to be a very useful patch. The main open item with this patch appears to be the slight overhead caused by collecting additional statistics - something to be discussed further on the mailing list.

Interested in other EXPLAIN improvements? Here's what happened recently:

Postgres 14: pg_stat_statements queryid is now built into core, and shows in EXPLAIN output
In Development: Showing applied extended statistics in EXPLAIN (to quote my colleague Maciek: "Oh neat, that's pretty cool.")
In Development: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Extended Statistics: Help the Postgres planner do its job better

Going back to what you can use today: Extended Statistics on Expressions, released in Postgres 14.

Let's back up there for a moment. If you are not familiar, extended statistics allow you to collect additional statistics about table contents, so the Postgres planner can provide better query plans.

The general syntax is like this:

CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
    [ ( statistics_kind [, ... ] ) ]
    ON column_name, column_name [, ...]
    FROM table_name

Before Postgres 14 you could already create extended statistics that help the planner understand the correlation between two columns, which is often necessary to avoid selectivity mis-estimates.

With the new extended statistics for expressions, you can inform the planner how selective a particular expression is, which in turn leads to better query plans. Here is an example of how to use this:

CREATE TABLE tbl (
    a timestamptz
);
CREATE STATISTICS st ON date_trunc('month', a) FROM tbl;

This will cause Postgres to collect statistics not only about a itself (which it does by default), but also about the results of the expression that uses the date_trunc function. You can find a complete example in the Postgres docs.

In addition to this, there are many changes in-flight that are being discussed:

In Development: Improve selectivity estimates when extended statistics are present
In Development: Extended statistics for Var op Var clauses / Expr op Expr
In Development: Estimating JOINs using extended statistics

Crustaceous Postgres: Using Rust For Extensions & more

A side topic that isn't actually about Postgres development itself, but still pretty exciting on a larger scale: Postgres and Rust. As you probably know, Postgres itself is written in C, and that is unlikely to change.

However there are two great examples of Rust being used to augment the Postgres ecosystem.

First, you can write Postgres extensions in Rust using pgx, and by now this approach has matured to the point that even established extension authors such as the TimescaleDB team have started adopting Rust for some of their projects, such as the TimescaleDB toolkit.

Second, there are new systems being developed that build on Postgres, that utilize Rust as their language of choice, e.g. for networked services. The most interesting development in 2021 in this regard is the work of the team at ZenithDB, that is working on an Apache 2.0-licensed variant of a shared disk-type scale-out architecture (similar to Amazon Aurora), built on Postgres, with services written in Rust.

Conclusion

The above might feel quite extensive, but that's not nearly all of the things that have happened with Postgres in 2021. I'm excited to be part of such a vibrant community contributing to making Postgres continuously better and am eager to see what's to come for Postgres in 2022!

At pganalyze we're committed to providing the best Postgres monitoring and observability to help you uncover deep insights about Postgres performance. Whether your Postgres runs in the cloud, your on-premises data center, or a Raspberry Pi: You can give pganalyze a try.

The Fastest Way To Load Data Into Postgres With Ruby on Rails

Eze Sunday Eze — Tue, 14 Dec 2021 12:00:00 GMT

Data migration is a delicate and sometimes complicated and time-consuming process. Whether you are loading data from a legacy application to a new application or you just want to move data from one database to another, you’ll most likely need to create a migration script that will be accurate, efficient, and fast to help with the process — especially if you are planning to load a huge amount of data.

There are several ways you can load data from an old Rails app or other application to Rails. In this article, I’ll explain a few ways to load data to a PostgreSQL database with Rails. We’ll go over their pros and cons, so you can choose the method that works best for your situation.

Postgres is an innovative database. According to a recent study by DB-Engines (PDF), PostgreSQL’s popularity rating increased by 65 percent from January 2016–January 2019, while the rating of MySQL, SQL Server, and Oracle decreased by 10–16 percent during the same period.

PostgreSQL has a strong reputation for handling large data sets. However, with the wrong tools and solutions, its powers can be undermined. So what’s the fastest way to load data to a Postgres database in your Rails app? Let’s look at four different methods, and then we’ll see which is the fastest.

Inserting one record at a time to load data to your Postgres database
- Pros of single-row inserts with Postgres
- Cons of single-row inserts with Postgres
Bulk Inserts with Active Record Import to load data to your Postgres database
- Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres
- Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres
Using PostgreSQL Copy with Activerecord-copy to load data to your Postgres database
- Pros of using PostgreSQL Copy with Activerecord-copy
- Cons of using PostgreSQL Copy with Activerecord-copy
Using background jobs to load data to your Postgres database
Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails
Speed comparison of different ways to load data into Postgres with Rails
Other articles and resources you might like

Inserting one record at a time to load data to your Postgres database

One easy way to load data to a Postgres database is to loop through the data and insert them one at a time.

Here’s a sample code to do this in Rails, assuming we have the source data in a CSV file:

# lib/tasks/one_record_at_a_time.rake
require 'csv'
require "benchmark"

namespace :import do
   desc "imports data from csv to postgresql"
   task :single_record => :environment do
       # Loop over the content of the CSV file and create a new record for each row.
       def insert_user(filename)
           CSV.foreach(filename, headers: true) do |row|
               User.create(row.to_h)
           end
       end
       # Path to the source CSV file (example location; adjust to your data)
       filename = Rails.root.join("db", "users.csv")
       puts Benchmark.realtime { insert_user(filename) } # Measure the elapsed time with Benchmark
   end
end

But there’s a problem with this approach. Inserting data one row at a time into a PostgreSQL database is extremely slow. I ran this Rake task to insert over a million records and measured it with Benchmark. The report came back with a result of over 1.3 hours. That’s a long time. There's overhead in both the database and the application in processing rows one-by-one, and additional latency in waiting for the database round trip for each row.

We’ll see a better approach in the next section, but for now, here’s a summary of the pros and cons of single-row inserts:

Pros of single-row inserts with Postgres

Doesn’t require an external dependency

Cons of single-row inserts with Postgres

Very slow
Might lock your session for a long time
Not suitable for inserting large data sets
If one insert fails, you’re stuck with partially loaded data

Bulk Inserts with Active Record Import to load data to your Postgres database

Running a bulk insert query is a better and faster way to load data into your Postgres database, and the Rails gem activerecord-import makes it easy to load massive data in bulk in a way that the Active Record ORM can understand and manipulate.

Instead of hitting your database multiple times, processing transactions, and doing all the back and forth with your app and database, the Active Record Import gem allows you to build up large insert queries and run them at once.

You can install the Active Record Import gem by adding gem 'activerecord-import' to your Gemfile and running bundle install in your terminal. This gem adds import to Active Record classes. That means you’ll only need to call the import method on your model classes to load the data into your database.

Here is an example:

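# lib/tasks/active_record_import.rake
require 'csv'
require "benchmark"

namespace :import do
   desc "imports data from csv to postgresql"
   task :batch_record => :environment do
       # Path to the source CSV file (example location; adjust to your data)
       filename = Rails.root.join("db", "users.csv")
       users = []
       CSV.foreach(filename, headers: true) do |row|
           users << row.to_h
       end
       newusers = users.map do |attrs|
           User.new(attrs)
       end
       # One import call for all records, instead of one INSERT per record
       time = Benchmark.realtime {User.import(newusers)}
       puts time
   end
end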

Notice how we’re building up the record in an array—users—and passing the array to the import method on the User model— User.import(newusers).

That’s really all that needs to be done. However, you can choose to pass only some specific columns and the values in an array to the import method if you want to. For example, User.import columns values where the columns will be an array like ["first_name", "last_name"], while the values will be an array like [ ['Peter', 'Joseph'], ['Banabas', 'Bob Jones'] ].
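As a rough illustration of the columns/values form, here is a standalone Ruby sketch that builds both arrays from CSV data. The CSV content and the email column are made up for this example, and the actual User.import call is only shown as a comment, since it needs a Rails app with the activerecord-import gem loaded.

```ruby
require "csv"

# Hypothetical CSV content; in the Rake tasks above this would come from a file.
csv_data = <<~CSV
  first_name,last_name,email
  Peter,Joseph,peter@example.com
  Banabas,Bob Jones,banabas@example.com
CSV

columns = ["first_name", "last_name"]

# Pick only the wanted columns from each row, in the same order as `columns`.
values = CSV.parse(csv_data, headers: true).map do |row|
  row.to_h.values_at(*columns)
end

p columns
p values
# In a Rails app with activerecord-import loaded, these would be passed on as:
#   User.import columns, values
```

Passing plain arrays like this skips building a User object per row, at the cost of having to keep the column order and the value order in sync yourself.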

I analyzed loading a million records into a Postgres database with Rails using this method, and it took only 5.1 minutes. Remember the first method took 1.3 hours (78 minutes)? That makes this method roughly 15x faster. That’s impressive.

Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres

Follows Active Record Associations, meaning Rails ORM is able to do its magic with the loaded data
Faster to load data into your PostgreSQL database
Doesn’t have per-row overhead
If insert fails, your transaction will rollback the insert

Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres

The activerecord-import gem might conflict with other gems that add .import method to the Active Record model. However, in cases where this might happen, you can use the .bulk_import method also attached to your model classes as an alternative.

See how batch import made the load roughly 15x faster? That was incredible, right? But there is still a faster way to load data to a Postgres database.

Using PostgreSQL Copy with Activerecord-copy to load data to your Postgres database

COPY is the fastest way to load data to a PostgreSQL database; it uses the combined power of a bulk insert and avoids some of the overhead of repeatedly parsing and planning an INSERT.

The gem activerecord-copy provides an easy-to-use interface for implementing COPY in your Rails app. You’ll need to add the line gem 'activerecord-copy' to your Gemfile and run bundle install in your terminal to install the gem and get ready to use it.

Here is a sample Rake task showing how you can use it:

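# lib/tasks/active_record_copy.rake
require 'csv'
require "benchmark"
namespace :copy do
   desc "imports data from csv to postgresql"
   task :data => :environment do
       def insert_user(filename)
           users = []
           CSV.foreach(filename, headers: true) do |row|
               users << row
           end
           # Rails does not set time stamps for COPY, so create one ourselves
           time = Time.now.getutc

           User.copy_from_client [:first_name, :last_name, :email, :created_at, :updated_at] do |copy|
               users.each do |d|
                   # CSV rows are keyed by header strings, not symbols
                   copy << [d["first_name"], d["last_name"], d["email"], time, time]
               end
           end
       end
       # Path to the source CSV file (example location; adjust to your data)
       filename = Rails.root.join("db", "users.csv")
       puts Benchmark.realtime { insert_user(filename) }
   end
end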

The activerecord-copy gem adds a copy_from_client method to all your model classes, as shown in the snippet above (you’ll have to define the columns and their values as shown).

Note that when you use the activerecord-copy gem, Rails will not set the time stamps (created_at, updated_at) for you automatically. That’s why the snippet above creates one with time = Time.now.getutc and passes it in for both columns.

Pros of using PostgreSQL Copy with Activerecord-copy

Doesn’t have per-row overhead
If insert fails, your transaction will rollback the insert
Super fast

Cons of using PostgreSQL Copy with Activerecord-copy

Manually set time stamps (created_at, updated_at, etc.)

I analyzed the activerecord-copy performance with a load of over one million records, as I did for the other methods, and it took about 1.5 minutes. Insanely fast compared to the other methods we’ve seen in this article.

Using background jobs to load data to your Postgres database

If you frequently load new data to your database, one great way to improve your app’s performance is to run your data loading using a background job. There are several tools that make this possible, for example, Rails’ delayed_job gem, sidekiq, and resque.

However, just as Active Record abstracts over different databases, Rails provides Active Job so you can use any of these supported adapters within your Rails app without worrying about job-specific implementation details. So you could set up a script that uses Active Record and run it in a background job using Active Job and the delayed_job adapter. That way, your data loading runs in the background.

Let’s walk through how to set up your Active Job to run your background process:

Since you’re going to use the delayed_job adapter, install the delayed_job_active_record gem.
Add gem 'delayed_job_active_record' to your Gemfile.
Run bundle install on your terminal/command line.
Run the following command to create a delayed job migration for the delayed jobs table:

rails g delayed_job:active_record
rake db:migrate

Generate an Active Job by running the following command:

rails generate job import_data

Open the file created in your app/jobs directory—app/jobs/import_data_job.rb—and add your data loading code:

# app/jobs/import_data_job.rb
class ImportDataJob < ApplicationJob
   queue_as :default
   def perform(*args)
   # Write your code here to load records to the database. You can use any of the fast methods we've discussed.
   end
end

In order for Rails to be aware of the Active Job adapter you want to use, you need to add the adapter to your config file. Just add this line: config.active_job.queue_adapter = :delayed_job.

   # config/application.rb
   module YourApp
     class Application < Rails::Application
       # Be sure to have the adapter's gem in your Gemfile
       # and follow the adapter's specific installation
       # and deployment instructions.
       config.active_job.queue_adapter = :delayed_job
     end
   end

Depending on how often you want the job to run, you can set the job to be enqueued at a specific time or immediately, following the instructions in the Active Job documentation.

One way you can do this is to allow the job to run asynchronously. Create a Rake task, add ImportDataJob.perform_later to the task, and run it. Example:

namespace :active_jobs do
   desc "imports data from sql to postgresql"
   task :import => :environment do
       ImportDataJob.perform_later
   end
end

Once this is done, you can now run the task rake active_jobs:import on your terminal.

Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails

When considering how to optimize your database performance, it’s best to first figure out the optimization options the database has already provided. As you may have noticed, most of the tools and techniques in this article leverage the hidden power of the PostgreSQL database. Sometimes, it might just be your implementation slowing down your database performance.

Speed comparison of different ways to load data into Postgres with Rails

Here’s a table summarizing the various speeds of the methods discussed in this article.

Method	Speed	Amount of records
One record at a time insert	1.3 hours	1,000,000
Bulk inserts with Activerecord Import	5.1 minutes	1,000,000
PostgreSQL Copy with Activerecord-copy	1.5 minutes	1,000,000
Using Background Jobs	< 1 sec (perceived)	1,000,000

You’ve learned that if you’re loading a huge amount of data into your PostgreSQL database, one insert at a time is slow and shouldn’t even be considered. For ultimate performance, you want to use COPY. Of course, you’ve also seen the caveats of each method, and you should weigh all the pros and cons before making your final decision.
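To put the measured times from the table into relation, here is a quick Ruby calculation of the speedup relative to single-row inserts. It only uses the numbers from the table above (with 1.3 hours converted to 78 minutes) and leaves out the background job row, since that one only changes the perceived time.

```ruby
# Measured load times from the table above, in minutes.
minutes = {
  "One record at a time" => 1.3 * 60,
  "activerecord-import"  => 5.1,
  "activerecord-copy"    => 1.5,
}

baseline = minutes["One record at a time"]

# Speedup factor of each method relative to single-row inserts.
speedups = minutes.transform_values { |m| (baseline / m).round(1) }

speedups.each { |method, factor| puts "#{method}: #{factor}x" }
```

So bulk inserts come out at roughly 15x, and COPY at roughly 52x, compared to inserting one record at a time.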

Understanding Postgres GIN Indexes: The Good and the Bad

Lukas Fittl — Thu, 02 Dec 2021 12:00:00 GMT

Adding, tuning and removing indexes is an essential part of maintaining an application that uses a database. Oftentimes, our applications rely on sophisticated database features and data types, such as JSONB, array types or full text search in Postgres. A simple B-tree index does not work in such situations, for example to index a JSONB column. Instead, we need to look beyond, to GIN indexes.

Almost 15 years ago to the day, GIN indexes were added in Postgres 8.2, and they have since become an essential tool in the application DBA’s toolbox. GIN indexes can seem like magic, as they can index what a normal B-tree cannot, such as JSONB data types and full text search. With this great power comes great responsibility, as GIN indexes can have adverse effects if used carelessly.

In this article, we’ll take an in-depth look at GIN indexes in Postgres, building on, and referencing many great articles that have been written over the years by the community. We’ll start by reviewing what GIN indexes can do, how they are structured, and their most common use cases, such as for indexing JSONB columns, or to support Postgres full text search.

But, understanding the fundamentals is only part of the puzzle. It’s much better when we can also learn from real world examples on busy databases. We’ll review a specific situation that the GitLab database team found themselves in this year, as it relates to write overhead caused by GIN indexes on a busy table with more than 1000 updates per minute.

And we’ll conclude with a review of the trade-offs between the GIN write overhead and the possible performance gains. Plus: We’ve added support for GIN index recommendations to the pganalyze Index Advisor.

To start with, let’s review what a GIN index looks like:

GIN Index in Postgres: What is it actually?
- Indexing tsvector columns for Postgres full text search
- Indexing LIKE searches with Trigrams and gin_trgm_ops
PostgreSQL, JSONB and GIN Indexes
- Postgres GIN index for JSONB columns using jsonb_ops and jsonb_path_ops
Multi-Column GIN Indexes, and Combining GIN and B-tree indexes
The downside of GIN Indexes: Expensive Updates
GIN index support in the pganalyze Index Advisor
Conclusion
Other helpful resources

GIN Index in Postgres: What is it actually?

“The GIN index type was designed to deal with data types that are subdividable and you want to search for individual component values (array elements, lexemes in a text document, etc)” - Tom Lane

The GIN index type was initially created by Teodor Sigaev and Oleg Bartunov, first released in Postgres 8.2, on December 5, 2006 - almost 15 years ago. Since then, GIN has seen many improvements, but the fundamental structure remains similar. GIN stands for "Generalized Inverted iNdex". "Inverted" refers to the way that the index structure is set up, building a table-encompassing tree of all column values, where a single row can be represented in many places within the tree. By comparison, a B-tree index generally has one location where an index entry points to a specific row.
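The "inverted" idea can be sketched in a few lines of Ruby. This is a toy model with made-up documents, not the on-disk GIN structure: each word maps to the set of row IDs that contain it, so a single row shows up under several keys.

```ruby
require "set"

# Toy documents keyed by row ID (a stand-in for heap pointers).
docs = {
  1 => "a friend in need",
  2 => "need more coffee",
  3 => "coffee with a friend",
}

# Build the inverted index: each word maps to the set of rows containing it,
# so one row appears under several keys.
index = Hash.new { |h, k| h[k] = Set.new }
docs.each do |row_id, text|
  text.split.uniq.each { |word| index[word] << row_id }
end

# Searching for several words intersects their row ID sets.
def search(index, *words)
  words.map { |w| index.fetch(w, Set.new) }.reduce(:&).to_a.sort
end

p search(index, "friend")            # rows containing "friend"
p search(index, "friend", "coffee")  # rows containing both words
```

Searching for "friend" finds rows 1 and 3, and adding "coffee" narrows it down to row 3. It also hints at why writes are more involved than with a B-tree: adding or changing one row touches one index entry per word.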

Another way of explaining GIN indexes comes from a presentation by Oleg Bartunov and Alexander Korotkov at PGConf.EU 2012 in Prague. They describe a GIN index like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

It’s important to note that the exact mapping of a column of a given data type is dependent on the GIN index operator class. That means, instead of having a uniform representation of data in the index, like with B-trees, a GIN index can have very different index contents depending on which data type and operator class you are using. Some data types, such as JSONB, have more than one GIN operator class, so you can pick the index structure that best fits specific query patterns.
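To make the "inverted" idea concrete, here is a minimal Python sketch. It is not how Postgres lays out GIN data on disk; the function names `build_inverted_index` and `search_all`, the sample rows, and the whitespace tokenizer are all assumptions made for illustration. Each key maps to the set of row IDs that contain it, and a multi-word search intersects those sets:

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercased word to the set of row IDs that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index


def search_all(index, words):
    """Return the row IDs that contain every one of the given words."""
    postings = [index.get(word.lower(), set()) for word in words]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample data: row ID -> text
rows = {
    1: "compensation of thermal drift",
    2: "accelerometers with temperature compensation",
    3: "gyroscopes and accelerometers",
}
index = build_inverted_index(rows)
matches = search_all(index, ["compensation", "accelerometers"])
print(sorted(matches))
```

Note how row 2 appears under several keys at once, which is the property that distinguishes this structure from a B-tree, where each row has a single index entry.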

Before we move on, one more thing to know: GIN indexes only support Bitmap Index Scans (not Index Scan or Index Only Scan), due to the fact that they only store parts of the row values in each index page. Don’t be surprised when EXPLAIN always shows Bitmap Index / Heap Scans for your GIN indexes.

Let’s take a look at a few examples:

Indexing tsvector columns for Postgres full text search

The initial motivation for GIN indexes was full text search. Before GIN was added, there was no way to index full text search in Postgres, instead requiring a very slow sequential scan of the table.

We’ve previously written about Postgres full text search with Django, as well as how to do it with Ruby on Rails on the pganalyze blog.

A simple example for a full text search index looks like this:

CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body));

This uses an expression index to create a GIN index that contains the indexed tsvector values for each row. You can then query like this:

SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');

As described in the Postgres documentation, the tsvector GIN index structure is focused on lexemes:

“GIN indexes are the preferred text search index type. As inverted indexes, they contain an index entry for each word (lexeme), with a compressed list of matching locations. Multi-word searches can find the first match, then use the index to remove rows that are lacking additional words.”

GIN indexes are the best starting point when using Postgres Full Text Search. There are situations where a GIST index might be preferred (see the Postgres documentation for details), and if you run your own server you could also consider the newer RUM index types available through an extension.

Let's see what else GIN has to offer:

Indexing LIKE searches with Trigrams and gin_trgm_ops

Sometimes Full Text Search isn't the right fit, but you find yourself needing to index a LIKE search on a particular column:

CREATE TABLE test_trgm (t text);
SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';

Due to the nature of the LIKE operation, which supports arbitrary wildcard expressions, this is fundamentally hard to index. However, the pg_trgm extension can help. When you create an index like this:

CREATE INDEX trgm_idx ON test_trgm USING gin (t gin_trgm_ops);

Postgres will split the row values into trigrams, allowing indexed searches:

EXPLAIN SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';

                               QUERY PLAN                               
------------------------------------------------------------------------
 Bitmap Heap Scan on test_trgm  (cost=16.00..20.02 rows=1 width=32)
   Recheck Cond: (t ~~ '%foo%bar'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..16.00 rows=1 width=0)
         Index Cond: (t ~~ '%foo%bar'::text)
(4 rows)

Effectiveness of this method varies with the exact data set. But when it works, it can speed up searches on arbitrary text data quite significantly.
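As a rough illustration of what "splitting into trigrams" means, the following Python sketch approximates the extraction. It assumes the padding rule described in the pg_trgm documentation (each word gets two spaces prefixed and one space suffixed); the `trigrams` function name and the simplified ASCII-only word splitting are assumptions, and the real extension handles more cases:

```python
import re


def trigrams(text):
    """Approximate pg_trgm: lowercase, split into words, pad, take 3-char windows."""
    result = set()
    # Simplification: treat runs of ASCII letters and digits as words
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one space after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


print(sorted(trigrams("cat")))
```

A GIN index with gin_trgm_ops stores entries like these for every row, so longer fixed fragments in a LIKE pattern give the index more trigrams to narrow down candidate rows, which are then rechecked against the actual pattern.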

PostgreSQL, JSONB and GIN Indexes

JSONB was added to Postgres in version 9.4, about eight years after GIN indexes were introduced - and it shows the flexibility of the GIN index type that GIN is the preferred way to index JSONB columns.

Postgres GIN index for JSONB columns using jsonb_ops and jsonb_path_ops

With JSONB in Postgres we gain the flexibility of not having to define our schema upfront, but instead we can dynamically add data to a column in our table in JSON format.

The most basic GIN index example for JSONB looks like this:

CREATE TABLE test (
  id bigserial PRIMARY KEY,
  data jsonb
);
INSERT INTO test(data) VALUES ('{"field": "value1"}');
INSERT INTO test(data) VALUES ('{"field": "value2"}');
INSERT INTO test(data) VALUES ('{"other_field": "value42"}');
CREATE INDEX ON test USING gin(data);

As you can see with EXPLAIN, this is able to use the index, for example when querying for all rows that have the field key defined:

EXPLAIN SELECT * FROM test WHERE data ? 'field';

                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.00..12.01 rows=1 width=40)
   Recheck Cond: (data ? 'field'::text)
   ->  Bitmap Index Scan on test_data_idx  (cost=0.00..8.00 rows=1 width=0)
         Index Cond: (data ? 'field'::text)
(4 rows)

The way this gets stored is based on the keys and values of the JSONB data. In the above test data, the default jsonb_ops operator class would store the following values in the GIN index, as separate entries: field, other_field, value1, value2, value42. Depending on the search the GIN index will combine multiple index entries to satisfy the specific query conditions.

Now, we can also use the non-default jsonb_path_ops operator class with a JSONB GIN index. This uses an optimized GIN index structure that would instead store the above data as three individual entries using a hash function: hashfn(field, value1), hashfn(field, value2) and hashfn(other_field, value42).
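The difference between the two operator classes can be sketched in Python as follows. This is only a model of the description above, not Postgres code: it handles flat, top-level string values only, and the SHA-1 based hash is a stand-in for whatever hash function Postgres uses internally. The function names are hypothetical.

```python
import hashlib
import json


def jsonb_ops_entries(doc):
    """Keys and values become separate index entries."""
    entries = set()
    for key, value in doc.items():
        entries.add(key)
        entries.add(value)
    return entries


def jsonb_path_ops_entries(doc):
    """One hashed entry per (key, value) pair; the hash is a stand-in."""
    entries = set()
    for key, value in doc.items():
        digest = hashlib.sha1(f"{key}\x00{value}".encode()).hexdigest()[:8]
        entries.add(digest)
    return entries


docs = [
    json.loads('{"field": "value1"}'),
    json.loads('{"field": "value2"}'),
    json.loads('{"other_field": "value42"}'),
]
ops = set().union(*(jsonb_ops_entries(d) for d in docs))
path_ops = set().union(*(jsonb_path_ops_entries(d) for d in docs))
print(sorted(ops))
print(len(path_ops))
```

Because the hashed entries no longer contain the bare key, this model also suggests why jsonb_path_ops cannot answer a key-existence query such as `data ? 'field'`, while it can answer containment queries.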

The jsonb_path_ops class is intended to efficiently support containment queries. First we specify the operator class during index creation:

CREATE INDEX ON test USING gin(data jsonb_path_ops);

And then we can use it for queries such as the following:

EXPLAIN SELECT * FROM test WHERE data @> '{"field": "value1"}';

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.00..12.01 rows=1 width=40)
   Recheck Cond: (data @> '{"field": "value1"}'::jsonb)
   ->  Bitmap Index Scan on test_data_idx1  (cost=0.00..8.00 rows=1 width=0)
         Index Cond: (data @> '{"field": "value1"}'::jsonb)
(4 rows)

As you can see it’s easy to index a JSONB column. Note that you could technically also index JSONB with other index types by taking specific parts of the data. For example, we could use a B-tree expression index to index the field keys:

CREATE INDEX ON test USING btree ((data ->> 'field'));

The Postgres query planner will then use the specific expression index behind the scenes, if your query matches the expression:

EXPLAIN SELECT * FROM test WHERE data->>'field' = 'value1';

                                 QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Scan using test_expr_idx on test  (cost=0.13..8.15 rows=1 width=40)
   Index Cond: ((data ->> 'field'::text) = 'value1'::text)
(2 rows)

There is one more thing we should look at with finding the right GIN index, and that is multi-column GIN indexes.

Multi-Column GIN Indexes, and Combining GIN and B-tree indexes

Oftentimes you’ll have queries that filter on a column whose data type is ideal for GIN indexes, such as JSONB, while also filtering on another column that is more of a typical B-tree index candidate:

CREATE TABLE records (
  id bigserial PRIMARY KEY,
  customer_id int4,
  data jsonb
);

SELECT * FROM records WHERE customer_id = 123 AND data @> '{ "location": "New York" }';

In addition you might have a query like the following:

SELECT * FROM records WHERE customer_id = 123;

And you are considering which index to create for the two queries combined.

There are two fundamental strategies you can take:

(1) Create two separate indexes, one on customer_id using a B-tree, and one on data using GIN
- In this situation, for the first query, Postgres might use BitmapAnd to combine the index search results from both indexes to find the affected rows
- Whilst the idea of using two separate indexes sounds great in theory, in practice it often turns out to be the worse performing option. You can find some discussions about this on the Postgres mailing lists.
(2) Create one multi-column GIN index on both customer_id and data
- Note that multi-column GIN indexes don’t help much with making the index more effective, but they can help cover multiple queries with the same index

For implementing the second strategy, we need the help of the “btree_gin” extension in Postgres (part of contrib) that contains operator classes for data types that are not subdividable.

You can create the extension and the multi-column index like this:

CREATE EXTENSION btree_gin;
CREATE INDEX ON records USING gin (data, customer_id);

Note that index column order does not matter for GIN indexes. And as we can see, this gets used during query planning:

EXPLAIN SELECT * FROM records WHERE customer_id = 123 AND data @> '{ "location": "New York" }';

                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on records  (cost=16.01..20.03 rows=1 width=41)
   Recheck Cond: ((customer_id = 123) AND (data @> '{"location": "New York"}'::jsonb))
   ->  Bitmap Index Scan on records_customer_id_data_idx  (cost=0.00..16.01 rows=1 width=0)
         Index Cond: ((customer_id = 123) AND (data @> '{"location": "New York"}'::jsonb))
(4 rows)

It’s rather uncommon to use multi-column GIN indexes, but depending on your workload it might make sense. Remember that larger indexes mean more I/O, making index lookups slower, and writes more expensive.

The downside of GIN Indexes: Expensive Updates

As you saw in the examples above, GIN indexes are special because they often contain multiple index entries per single row that is being inserted. This is essential to enable the use cases that GIN supports, but causes one significant problem: Updating the index is expensive.

Because a single row can cause tens or, in the worst case, hundreds of index entries to be updated, it’s important to understand the special fastupdate mechanism of GIN indexes.

By default fastupdate is enabled for GIN indexes, and it causes index updates to be deferred, so they can occur at a point where multiple updates have to be made, reducing the overhead for a single UPDATE, at the expense of having to do the work at a later point.

The data that is deferred is kept in the special pending list, which then gets flushed to the main index structure in one of three situations:

The gin_pending_list_limit (default of 4MB) is reached during a regular index update
Explicit call to the gin_clean_pending_list function
Autovacuum on the table with the GIN index (GIN pending list cleanup happens at the end of vacuum)

As you can imagine, this flush can be quite an expensive operation. One symptom of GIN write overhead is therefore that every Nth INSERT or UPDATE statement is suddenly a lot slower: that is the first scenario above, where the statement that reaches gin_pending_list_limit has to flush the pending list itself.
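The "every Nth statement is slow" pattern can be reproduced with a toy model in Python. This is a sketch under simplifying assumptions: the limit is counted in entries rather than bytes (the real gin_pending_list_limit is a size), and the class and attribute names are hypothetical.

```python
class PendingListSimulation:
    """Toy model of GIN fastupdate: cheap appends, occasional expensive flush."""

    def __init__(self, limit_entries):
        self.limit_entries = limit_entries  # stand-in for gin_pending_list_limit
        self.pending = []
        self.main_index = set()
        self.flushes = 0

    def insert(self, entries):
        """Add one row's index entries; return True if this insert paid for a flush."""
        self.pending.extend(entries)
        if len(self.pending) >= self.limit_entries:
            self.flush()
            return True
        return False

    def flush(self):
        """Move all pending entries into the main index structure."""
        self.main_index.update(self.pending)
        self.pending.clear()
        self.flushes += 1


sim = PendingListSimulation(limit_entries=100)
# Each simulated row produces 10 index entries
slow_inserts = [
    n for n in range(1, 101)
    if sim.insert([f"row{n}-entry{i}" for i in range(10)])
]
print(slow_inserts)
```

In this model, most inserts only append to the pending list, and the unlucky insert that crosses the limit does the deferred work for all the others.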

This exact situation happened to the team at GitLab recently. Let’s look at a real life example of where GIN updates became a problem.

GIN trigram indexes: A lesson from GitLab

The team at GitLab often publishes their discussions of database optimizations publicly, and we can learn a lot from these interactions. A recent example discussed a GIN trigram index that caused merge requests to be quite slow occasionally:

“We can see there are a number of slow updates for updating a merge request. The interesting thing here is that we see very little locking statements (locking is logged after 5 seconds waiting), which suggests something else is occurring to make these slow.”

This was determined to be caused by the GIN pending list:

“Anecdotally, cleaning the gin index pending-list for the description field on the merge_requests table can cost multiple seconds. The overhead does increase when there are more pending entries to write to the index. In this informal survey of manually running gin_clean_pending_list( 'index_merge_requests_on_description_trigram'::regclass ) the duration varied between 465 ms and 3155 ms.”

The team further investigated, and determined that the GIN pending list was flushed a very high number of times during business hours:

“this gin index's pending list fills up roughly once every 2.7 seconds during the peak hours of a normal weekday.”

If you want to read the full story, GitLab’s Matt Smiley has done an excellent analysis of the problem they’ve encountered.

As we can see, getting good data about the actual overhead of GIN pending list updates is critical.

Measuring GIN pending list overhead and size

To validate whether the GIN pending list is a problem on a busy table, we can do a few things:

First, we could utilize the pgstatginindex function together with something like psql’s \watch command to keep a close eye on a particular index:

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatginindex('myindex');

 version | pending_pages | pending_tuples 
---------+---------------+----------------
       2 |             0 |              0
(1 row)

Second, if you run your own database server, you can use “perf” dynamic tracepoints to measure calls to the ginInsertCleanup function in Postgres:

sudo perf probe -x /usr/lib/postgresql/14/bin/postgres ginInsertCleanup
sudo perf stat -a -e probe_postgres:ginInsertCleanup -- sleep 60

An alternate method, using DTrace, was described in a 2019 PGCon talk. The authors of that talk also ended up visualizing different gin_pending_list_limit and work_mem settings:

As they discovered, the memory limit during flushing of the pending list makes quite a noticeable difference.

If you don't have the luxury of direct access to your database server, you can estimate how often the pending list fills up based on the average size of index tuples and other statistics.
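One way to do such a back-of-the-envelope estimate is sketched below in Python. The formula is a simplification that ignores page and tuple overhead, and every input except the 4MB default is a made-up number; you would substitute measurements from your own table.

```python
def seconds_between_flushes(limit_kb, entries_per_row, bytes_per_entry, rows_per_second):
    """Estimate how long it takes for writes to fill the pending list."""
    bytes_per_second = entries_per_row * bytes_per_entry * rows_per_second
    return (limit_kb * 1024) / bytes_per_second


estimate = seconds_between_flushes(
    limit_kb=4096,        # default gin_pending_list_limit (4MB)
    entries_per_row=200,  # hypothetical: index entries produced per written row
    bytes_per_entry=32,   # hypothetical: average size of one pending entry
    rows_per_second=50,   # hypothetical: write rate on the table
)
print(round(estimate, 1))
```

If the estimate comes out at a few seconds, as in the GitLab case above, the pending list is likely being flushed by regular statements rather than by autovacuum.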

Now, if we determine that we have a problem, what can we do about it?

Strategies for dealing with GIN pending list update issues

There are multiple alternate ways you can resolve issues like the one GitLab encountered:

(1) Reduce gin_pending_list_limit
- Have more frequent, smaller flushes
- This may sound odd - but gin_pending_list_limit started out as being determined by work_mem (instead of being its own setting), and is only configurable separately since Postgres 9.5 - explaining the 4MB default, which may be too high in some cases
(2) Increase gin_pending_list_limit
- Have more opportunities to clean up the list outside of the regular workload
(3) Turning off fastupdate
- Taking the overhead with each individual INSERT/UPDATE
(4) Tune autovacuum to run more often on the table, in order to clean the pending list
(5) Explicitly calling gin_clean_pending_list(), instead of relying on Autovacuum
(6) Drop the GIN index
- If you have alternate ways of indexing the data, for example using expression indexes

Depending on your workload one or multiple of these approaches could be a good fit.

In addition, it’s important to ensure you have sufficient memory available during the GIN pending list cleanup. The memory limit used for the pending list flush can be confusing, and is not related to the size of gin_pending_list_limit. Instead it uses the following Postgres settings:

work_mem during regular INSERT/UPDATE
maintenance_work_mem during gin_clean_pending_list() call
autovacuum_work_mem during autovacuum

Last but not least, you may want to consider partitioning or sharding a table that encounters problems like this. It may not be the easiest thing to do, but scaling GIN indexes to heavy write workloads is quite a tricky business.

GIN index support in the pganalyze Index Advisor

Not sure if your workload could utilize a GIN index, or which index to create for your queries?

We have now added initial support for GIN and GIST index recommendations to the pganalyze Index Advisor.

Here is an example of a GIN index recommendation for an existing tsvector column:

Note that the costing and size estimation logic for GIN and GIST indexes is still being actively developed.

We recommend trying out the Index Advisor recommendation on your own system to assess its effectiveness, as well as monitoring the production table for write overhead after you have added an index. You may also need to tweak your queries to make use of a particular index.

Conclusion

GIN indexes are powerful, and often the only way to index certain queries and data types. But with great power comes great responsibility. Use GIN indexes wisely, especially on tables that are heavily written to.

And when you are not sure which GIN index could work, try out the pganalyze Index Advisor.

If you want to share this article with your peers, feel free to tweet it.

Other helpful resources

Using Postgres CREATE INDEX: Understanding operator classes, index types & more

How we deconstructed the Postgres planner to find indexing opportunities

Efficient Search in Rails with Postgres (PDF eBook)

Efficient Postgres Full Text Search in Django

Full Text Search in Milliseconds with Rails and PostgreSQL

Efficient Pagination in Django and Postgres

eBook: Effective Indexing in Postgres

Webinar: How To Reason About Indexing Your Postgres Database

5mins of Postgres E17: Demystifying Postgres for application developers: A mental model for tables and indexes

pganalyze Index Advisor for Postgres


Postgres Views in Django

Josh Alletto — Tue, 16 Nov 2021 12:00:00 GMT

At my first job, we worked with a lot of data. I quickly found that when there's a lot of data, there are bound to be some long, convoluted SQL queries. Many of ours contained multiple joins, conditionals, and filters. One of the ways we kept the complexity manageable was to create Postgres views for common queries.

Postgres views allow you to query against the results of another query. Views can be composed of columns from one or more tables or even other views, and they are easy to work with in a Django app. In this article, you’ll learn about the two different types of Postgres views and how to decide when and if you should use them. Finally, you’ll create a view and set up a Django app to use it.

Why Postgres views?
Materialized Views in Postgres
Creating a Materialized View in Postgres
Using Postgres Views in Django and Python
- The Model
Conclusion

Why Postgres views?

One reason to use a view is that they help cut down on complexity.

For example, your customer data may be spread across several tables: a customers table, an emails table, and an addresses table. Addresses could reference more data in a cities and states table. This is an effective schema for your data, but you have to join all these tables every time you want to get a complete view of a customer. This may not be bad if you only do this occasionally, but it’s quite cumbersome if you’re going to query it often. Even if you only want two or three records, you still need to perform all these joins.

You can solve this problem by creating a view that looks like this:

CREATE VIEW complete_customer_data AS
SELECT
  concat(customers.first_name,' ', customers.last_name) AS customer_name,
  addresses.street_address AS street,
  addresses.zip_code AS zip,
  cities.city AS city,
  states.state AS state
FROM customers
  INNER JOIN addresses ON customers.id = addresses.customer_id
  INNER JOIN cities ON addresses.city_id = cities.id
  INNER JOIN states ON cities.state_id = states.id;

As you can see, a view is just a query. Now, if you want to query all your customers from “Chicago,” you can query the view, which is much easier to write and more readable.

SELECT * FROM complete_customer_data 
WHERE city = 'Chicago';

Materialized Views in Postgres

Views are great for simplifying code, but with large datasets, you're not really saving any time when you run them because a view is only as fast as its underlying query. For costly queries and large datasets, this can be a drawback.

A better solution when performance is a concern might be to create a materialized view. Materialized views store the results of the query on disk, much like a regular table. This makes running queries against the view much faster, because the underlying query is not executed again.

The drawback to materialized views is that the cached results do not automatically update when the data in the base tables changes. So in the example above, if a customer changed their address and we made our view a materialized view, we would not see the change until we refreshed the view. This reruns the query and caches the new results. You’ll see an example of this in the next section.
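The staleness trade-off can be modeled with a few lines of Python. This is only an analogy, with a hypothetical `MaterializedViewSketch` class and an in-memory dictionary standing in for the base table:

```python
class MaterializedViewSketch:
    """Caches a query's result until refresh() is called explicitly."""

    def __init__(self, query):
        self.query = query          # callable that computes fresh results
        self.cached = query()       # like CREATE MATERIALIZED VIEW: run once, store

    def read(self):
        return self.cached          # reads never re-run the query

    def refresh(self):
        self.cached = self.query()  # like REFRESH MATERIALIZED VIEW


addresses = {"alice": "Chicago"}    # stand-in for the base table
view = MaterializedViewSketch(lambda: dict(addresses))

addresses["alice"] = "Austin"       # base data changes
stale = view.read()["alice"]        # still the old value
view.refresh()
fresh = view.read()["alice"]        # new value after the refresh
print(stale, fresh)
```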

Creating a Materialized View in Postgres

Imagine you have an online store and want to send out coupons to customers, offering them different deals based on how often they shop and where they live.

You will start with a query that tracks customers by order frequency, how much they ordered, and where they are ordering from. Rather than rewriting this query each time, you can create a view that allows you to find a subset of customers. For example, you might want to see all customers who live in Texas that have bought more than three products in the past five months. Since this query needs to check against all of your customers and all of their orders, it will take a long time to run, so use a materialized view that you can refresh as often as you need to.

CREATE MATERIALIZED VIEW customer_order_volume AS
SELECT
  concat(customers.id, '-', orders.id) AS unique_id,
  concat(customers.first_name,' ', customers.last_name) AS customer_name, 
  orders.created_on AS purchase_date,
  addresses.city AS city,
  addresses.state AS state,
  count(products.product_name) AS order_size,
  sum(products.product_cost) AS order_cost
FROM orders
  INNER JOIN customers ON orders.customer_id = customers.id
  INNER JOIN products_orders po ON orders.id = po.order_id
  INNER JOIN products ON po.product_id = products.id
  INNER JOIN addresses ON addresses.customer_id = orders.customer_id
GROUP BY unique_id, customer_name, purchase_date, city, state;

This view combines the customer_id and order_id to create a unique identifier for each row. This will help you out later in the tutorial.
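The '-' separator in that identifier matters. A short Python sketch (the `unique_id` helper is hypothetical and merely mimics the concat call) shows how two different customer and order pairs collide without it:

```python
def unique_id(customer_id, order_id, separator="-"):
    """Mimic concat(customers.id, separator, orders.id) from the view."""
    return f"{customer_id}{separator}{order_id}"


# Customer 1 with order 23, and customer 12 with order 3
with_separator = {unique_id(1, 23), unique_id(12, 3)}
without_separator = {unique_id(1, 23, ""), unique_id(12, 3, "")}
print(len(with_separator), len(without_separator))
```

With the separator the two rows keep distinct identifiers; without it both collapse into the same string, which would violate a unique index on that column.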

You can query materialized views the same way you queried the regular view, but this time, the view’s results have been cached, so the underlying query doesn’t run again.

SELECT * FROM customer_order_volume
WHERE state in ('TX', 'IL', 'OH')
ORDER BY state;

When you want to refresh the data, run:

REFRESH MATERIALIZED VIEW customer_order_volume;

Refreshing a view like this is the fastest method, but you risk blocking other connections trying to read from the view during the refresh. If you want to be able to refresh the view without interrupting read access, you’ll need to do a concurrent refresh:

REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume;

This only works if your view has a unique identifier: a column or comma separated list of columns from the view. You need to explicitly set it by creating an index on your materialized view:

CREATE UNIQUE INDEX ON customer_order_volume(unique_id);

You can also remove the view if you don't need it anymore:

DROP MATERIALIZED VIEW customer_order_volume;

Using Postgres Views in Django and Python

First, the bad news: as of this writing, Django's ORM cannot create views for you. You’ll have to write some raw SQL for Django to run during the migration.

The good news is that once the view is created, it's relatively easy to use it in Django. You just need to set up a model like you would for any other table in the database. In the following sections, you’ll create a materialized view and a method to refresh it. If you want to use a regular view, the process is the same, you just won’t need the refresh method.

The Model

The model attributes should reflect the columns returned by your view just like they would for any other table.

from django.db import models

class CustomerOrderVolume(models.Model):
    unique_id     = models.CharField(max_length=255, primary_key=True)
    customer_name = models.CharField(max_length=255)
    city          = models.CharField(max_length=255)
    state         = models.CharField(max_length=255)
    purchase_date = models.DateField()
    order_size    = models.IntegerField()
    order_cost    = models.FloatField()

    class Meta:
        managed = False
        db_table='customer_order_volume'

Most notable here is the Meta class. Setting managed to False tells Django not to create the table during migrations. You also need to set db_table explicitly so that Django knows which table (here, the view) to run queries against.

The last thing to note about the model is that you need to set one of your fields as a primary key. Otherwise, Django will expect a column called id and throw an error when it doesn't find one. In this case, you can again take advantage of the unique_id column you defined in the view.

Create your migration as usual. After the migration is created, add a RunSQL operation to the operations list of the migration to create the view:

from django.db import migrations, models

class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='CustomerOrderVolume',
            fields=[
                ('unique_id', models.CharField(max_length=255, primary_key=True, serialize=False)),
                ('customer_name', models.CharField(max_length=255)),
                ('city', models.CharField(max_length=255)),
                ('state', models.CharField(max_length=255)),
                ('purchase_date', models.DateField()),
                ('order_size', models.IntegerField()),
                ('order_cost', models.FloatField()),
            ],
            options={
                'db_table': 'customer_order_volume',
                'managed': False,
            },
        ),
        migrations.RunSQL(
            """
            CREATE MATERIALIZED VIEW customer_order_volume AS
                SELECT
                concat(customers.id, '-', orders.id) AS unique_id,
                concat(customers.first_name,' ', customers.last_name) AS customer_name, 
                orders.created_on AS purchase_date,
                addresses.city AS city,
                addresses.state AS state,
                count(products.product_name) AS order_size,
                sum(products.product_cost) AS order_cost
                FROM orders
                INNER JOIN customers ON orders.customer_id = customers.id
                INNER JOIN products_orders po ON orders.id = po.order_id
                INNER JOIN products ON po.product_id = products.id
                INNER JOIN addresses ON addresses.customer_id = orders.customer_id
                GROUP BY unique_id, customer_name, purchase_date, city, state;
            """,
            "DROP MATERIALIZED VIEW customer_order_volume;"
        )
    ]

Supply the RunSQL operation with SQL to create the view and, as the second argument, SQL to drop it again when the migration is reversed. When you run the migrations, Django won’t create a customer_order_volume table because you set managed to False, but it will run the raw SQL and create the view for you.

Finally, create a refresh method that you can call anytime you want to update your materialized view. I chose to create it as a class method, but this is not required. You can do this anywhere since all you are doing is executing raw SQL.

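from django.db import connection  # add to the imports in models.py

    # Inside the CustomerOrderVolume model class:
    @classmethod
    def refresh_view(cls):
        # CONCURRENTLY requires the unique index on unique_id
        with connection.cursor() as cursor:
            cursor.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume")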

This method can be called whenever you want to repopulate your view’s data. This could be done via a cron job that runs at night when traffic to the site is low.

Now, you can test the view from the Django shell:

In [3]: c = CustomerOrderVolume.objects.all()
In [4]: c
Out[4]: <QuerySet [<CustomerOrderVolume: CustomerOrderVolume object (Jonathan Griffith)>, <CustomerOrderVolume: CustomerOrderVolume object (Stephanie Fernandez)>, <CustomerOrderVolume: CustomerOrderVolume object (Austin Burns)>, ....

You can see that Django returns a query set just like it would with any other model. Similarly, you can filter and access attributes on the objects just as you'd expect:

In [3]: order = c.first()
In [4]: order
Out[4]: <CustomerOrderVolume: CustomerOrderVolume object (Adam Turner)>
In [5]: order.purchase_date
Out[5]: datetime.datetime(2020, 7, 9, 20, 50, 43, 895459)

This is a good start. From here, a helpful addition would be a database table that keeps track of how often the view gets refreshed. You could set up a cron job to run your refresh function for you at night or on the weekends, or it could be called from a signal when the underlying models are updated. Be aware that the refresh might take a while if you have a lot of underlying data, so you probably don’t want to call it too frequently.
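One way to avoid refreshing too often is a small throttle around the refresh call. The sketch below is plain Python with a hypothetical `RefreshThrottle` class; it keeps its history in memory, whereas the tracking table suggested above would persist it. In a Django app, the callback passed in would be something like CustomerOrderVolume.refresh_view.

```python
from datetime import datetime, timedelta


class RefreshThrottle:
    """Skip refreshes that come too soon after the previous one."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_refresh = None
        self.history = []  # timestamps of refreshes that actually ran

    def maybe_refresh(self, refresh, now):
        """Call refresh() unless one ran within min_interval; return True if it ran."""
        if self.last_refresh is not None and now - self.last_refresh < self.min_interval:
            return False
        refresh()
        self.last_refresh = now
        self.history.append(now)
        return True


calls = []  # records each time the stand-in refresh callback runs
throttle = RefreshThrottle(min_interval=timedelta(hours=1))
start = datetime(2021, 11, 16, 2, 0)
results = [
    throttle.maybe_refresh(lambda: calls.append("refresh"), start),
    throttle.maybe_refresh(lambda: calls.append("refresh"), start + timedelta(minutes=10)),
    throttle.maybe_refresh(lambda: calls.append("refresh"), start + timedelta(hours=2)),
]
print(results)
```

Passing the current time in as an argument keeps the helper easy to test, and the history list shows what a refresh log would record.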

Conclusion

In this post, you saw the two different types of views available in Postgres, and the reasons you might want to create a view for your application. Views are useful if you want to limit the amount of code you write each time you query the database, cut down on the complexity of a large query, or cache the results of a costly query. Whatever your reason, once your view is created, it's just a matter of setting up your Django model correctly to get it working in your Python application. Finally, don’t forget to create a refresh method to update the view if you elect to use a materialized view.

Share this article: If you liked this article we’d appreciate it if you’d tweet it to your peers.

PS: If you are interested in learning about views and materialized views in Ruby on Rails check out our article about it here: Effectively Using Materialized Views in Ruby on Rails


How we deconstructed the Postgres planner to find indexing opportunities

Lukas Fittl — Tue, 02 Nov 2021 12:00:00 GMT

Everyone who has used Postgres has directly or indirectly used the Postgres planner. The Postgres planner is central to determining how a query gets executed, whether indexes get used, how tables are joined, and more. When Postgres asks itself "How do we run this query?”, the planner answers.

And just like Postgres has evolved over decades, the planner has not stood still either. It can sometimes be challenging to understand what exactly the Postgres planner does, and which data it bases its decisions on.

Earlier this year we set out to gain a deep understanding of the planner to improve indexing tools for Postgres. Based on this work we launched the first iteration of the pganalyze Index Advisor over a month ago, and have received an incredible amount of feedback and overall response.

In this post we take a closer look at how we extracted the planner into a standalone library, just like we did with pg_query. We then assess how this approach compares to an actually running server, and what is possible now that we can run the planner code. Based on this we look at how we used its decision making know-how to find indexing opportunities, and review the topic of clause selectivity, and how we incorporated feedback by a Postgres community member.

Planning a Postgres query without a running database server
Understanding Postgres clause selectivity
- How we incorporated Postgres community feedback
Creating the best index, vs creating “good enough” indexes
- Join us for design research sessions
Conclusion

Planning a Postgres query without a running database server

At pganalyze we offer performance recommendations for production database systems, without requiring complex installation steps or version upgrades. Whilst Postgres’ extension system is very capable, and we have many ideas on what we could track or do inside Postgres itself, we intentionally decided not to focus on a Postgres extension for giving index advice.

There are three top motivations for not creating an extension:

Index decisions often happen during development, where the database that you are working with is not production sized
Not everyone has direct access to the production database - it’s important we create tooling that can be used by the whole development team
Adopting a new Postgres extension on a production database is risky, especially if the code is new - and you may not be able to install custom extensions (e.g. on Amazon RDS)

We’ve thus focused on creating something that runs separately from Postgres, but knows how Postgres works. Our approach is inspired by our work on pg_query, and enables planning a query solely based on the query text, the schema definition, and table statistics.

We utilized libclang to automatically extract source code from Postgres, just like we've done for pg_query. Whilst for pg_query we extracted a little bit over 100,000 lines of Postgres source, for the planner we extracted almost 470,000 lines of Postgres source, more than 4x the amount of code. For reference, Postgres itself is almost 1,000,000 lines of source code (as determined by sloccount).

Examples of code from Postgres we didn’t use: The executor (except for some initialization routines), the storage subsystem, frontend code, and various specialized code paths.

A good amount of engineering time later, we ended up with a seemingly simple function in a C library that takes a query and a schema definition (including table statistics), and returns a result similar to an EXPLAIN plan:

/*
 * Plan the provided query utilizing the schema definition and the
 * provided table statistics, and return an EXPLAIN-like result.
 */
PgPlanResult pg_plan(const char *query, const char *schema_and_statistics)
{
  …
}

This function is deterministic: when you pass the same set of input parameters, you will always get the same output.

This required some additional modifications to the extracted code (we have about 90 small patches to adjust certain code paths), especially in places where Postgres does the rare on-demand checking of file sizes, or looks at the B-tree meta page. Each of these values is instead a fixed input parameter, defined using SET commands in the schema definition.

How accurate is this planning process?

Let’s take a look at one of our own test queries:

WITH unused_indexes AS MATERIALIZED (
  SELECT schema_indexes.id, schema_indexes.name, schema_indexes.last_used_at, schema_indexes.database_id, schema_indexes.table_id 
    FROM schema_indexes
         JOIN schema_tables ON (schema_indexes.table_id = schema_tables.id)
   WHERE schema_indexes.database_id IN (1)
         AND schema_tables.invalidated_at_snapshot_id IS NULL
         AND schema_indexes.invalidated_at_snapshot_id IS NULL
         AND schema_indexes.is_valid
         AND NOT schema_indexes.is_unique AND 0 <> ALL (schema_indexes.columns)
         AND schema_indexes.last_used_at < now() - '14 day'::interval
)
SELECT ui.id, ui.name, ui.last_used_at, ui.database_id, ui.table_id 
  FROM unused_indexes ui
 WHERE COALESCE((
         SELECT size_bytes
           FROM schema_index_stats_35d sis
          WHERE sis.schema_index_id = ui.id
                AND collected_at = '2021-10-31 06:40:04' LIMIT 1), 0) > 32768

This query is used inside the pganalyze application to find indexes that were not in use in the last 14 days. Running EXPLAIN (FORMAT JSON) for the query on our production system, we get a result like this:

 [
   {
     "Plan": {
       "Node Type": "CTE Scan",
       …
       "Startup Cost": 3172.85,
       "Total Cost": 3311.01, 
       "Plan Rows": 11,
       "Plan Width": 60,
       "Filter": "(COALESCE((SubPlan 2), '0'::bigint) > 32768)",
       "Plans": [
         {
           "Node Type": "Nested Loop",
           …
           "Startup Cost": 1.12,
           "Total Cost": 3172.85,
           "Plan Rows": 32,
           "Plan Width": 63, 
           "Inner Unique": true,
           "Plans": [
             {
               "Node Type": "Index Scan",
               "Parent Relationship": "Outer",
               …
               "Index Name": "index_schema_indexes_on_database_id",
               …
               "Startup Cost": 0.56,
               "Total Cost": 2581.00,
               "Plan Rows": 69,
               "Plan Width": 63,
               "Index Cond": "(database_id = 1)",
               "Filter": "(is_valid AND (NOT is_unique) AND (last_used_at < '2021-10-17'::date) AND (0 <> ALL (columns)))"
             },
             {
               "Node Type": "Index Scan",
               "Parent Relationship": "Inner",
               …
               "Index Name": "schema_tables_pkey",
               …
               "Startup Cost": 0.56,
               "Total Cost": 8.58,
               "Plan Rows": 1,
               "Plan Width": 8,
               "Index Cond": "(id = schema_indexes.table_id)",
               "Filter": "(invalidated_at_snapshot_id IS NULL)"
             }
...

Note that we are intentionally running EXPLAIN without ANALYZE, since we care about the cost-based estimation model used by the planner.

And now, running the same query, with its schema definition and production statistics (but not the actual table data!) provided to the pg_plan function:

[
  {
    "Plan": {
      "Node ID": 0,
      "Node Type": "CTE Scan",
      …
      "Startup Cost": 3181.43,
      "Total Cost": 3324.07,
      "Plan Rows": 11,
      "Plan Width": 60,
      "Filter": "(COALESCE((SubPlan 2), '0'::bigint) > 32768)",
      "Plans": [
        {
          "Node ID": 1,
          "Node Type": "Nested Loop",
          …
          "Startup Cost": 1.12,
          "Total Cost": 3181.43,
          "Plan Rows": 33,
          "Plan Width": 63,
          "Inner Unique": true,
          "Plans": [
            {
              "Node ID": 2,
              "Node Type": "Index Scan",
              "Parent Relationship": "Outer", 
              …
              "Index Name": "index_schema_indexes_on_database_id",
              …
              "Startup Cost": 0.56,
              "Total Cost": 2581.00,
              "Plan Rows": 70,
              "Plan Width": 63,
              "Index Cond": "(database_id = 1)",
              "Filter": "(is_valid AND (NOT is_unique) AND (last_used_at < '2021-10-17'::date) AND (0 <> ALL (columns)))",
            },
            {
              "Node ID": 3,
              "Node Type": "Index Scan",
              "Parent Relationship": "Inner",
              …
              "Index Name": "schema_tables_pkey",
              …
              "Startup Cost": 0.56,
              "Total Cost": 8.58,
              "Plan Rows": 1,
              "Plan Width": 8,
              "Index Cond": "(id = schema_indexes.table_id)",
              "Filter": "(invalidated_at_snapshot_id IS NULL)",
            }
            …

As you can see, for this query the plan cost estimation is within a 1% margin of the actual production estimates. In other words, given the exact same input parameters as used on the actual database server, the cost calculation of the standalone planner matched almost to the dot.
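To make the "within 1%" claim concrete, here is a small Python sketch that compares the cost figures copied from the two EXPLAIN outputs above. The helper name `relative_difference` is ours and not part of any pganalyze tooling.

```python
# Cost estimates copied from the two EXPLAIN outputs above:
# (label, estimate on the production server, estimate from pg_plan).
estimates = [
    ("CTE Scan startup cost", 3172.85, 3181.43),
    ("CTE Scan total cost", 3311.01, 3324.07),
    ("Nested Loop total cost", 3172.85, 3181.43),
    ("Outer Index Scan total cost", 2581.00, 2581.00),
    ("Inner Index Scan total cost", 8.58, 8.58),
]


def relative_difference(production: float, standalone: float) -> float:
    """Absolute difference between the two estimates, relative to production."""
    return abs(standalone - production) / production


for label, production, standalone in estimates:
    print(f"{label}: {relative_difference(production, standalone):.2%}")

largest = max(relative_difference(p, s) for _, p, s in estimates)
```

Every compared value differs by less than half a percent, and the two index scans match exactly.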

Now that we’ve established a basis for running the planner and getting cost estimates, let’s look at what we can do with this.

Finding multiple possible plan paths, not just the best path

When the Postgres planner plans a query, it works under time pressure: any extra work spent searching for a better plan makes planning itself slower. To be fast, the planner quickly throws away plan options it does not consider worth pursuing.

That unfortunately means we can’t just run EXPLAIN with a flag that says “show me all possible plan variants” - the planner code is simply not written in a way that makes this possible, at least not today.

However, with our pg_plan logic running outside the server itself, we do not have these strict speed requirements, and can therefore spend more time looking at alternatives and keeping them around for analysis. For example, here is the internal information we have for a scan node on a table, that illustrates the different paths that could be taken to fulfill the query:

    "Scans": [
      {
        "Node ID": 2,
        "Relation OID": 16398,
        "Restriction Clauses": [
          …
        ],
        "Plans": [
          {
            "Plan": {
              "Node Type": "Index Scan",
              "Index Name": "schema_indexes_table_id_name_idx",
              …
              "Startup Cost": 0.68,
              "Total Cost": 352.94,
              "Index Cond": "(table_id = id)",
              "Filter": "(is_valid AND (NOT is_unique) AND (last_used_at < '2021-10-17'::date) AND (database_id = 1) AND (0 <> ALL (columns)))",
            },
          },
          {
            "Plan": {
              "Node Type": "Index Scan",
              "Index Name": "index_schema_indexes_on_database_id",
              ...
              "Startup Cost": 0.56,
              "Total Cost": 2581.00,
              "Index Cond": "(database_id = 1)",
              "Filter": "(is_valid AND (NOT is_unique) AND (last_used_at < '2021-10-17'::date) AND (0 <> ALL (columns)))",
            },
          },
          {
            "Plan": {
              "Node ID": 0,
              "Node Type": "Seq Scan",
              ...
              "Startup Cost": 0.00,
              "Total Cost": 3933763.60,
              "Filter": "((invalidated_at_snapshot_id IS NULL) AND is_valid AND (NOT is_unique) AND (last_used_at < '2021-10-17'::date) AND (database_id = 1) AND (0 <> ALL (columns)))",
            },
          },

As you can see, the Seq Scan option was clearly more expensive and was not chosen. You can also see the different index options and their costs.

What is especially interesting with this plan is that there was actually a cheaper index scan available, but Postgres did not end up using it in the final plan. This is because the Nested Loop ended up being cheaper by using the schema_indexes table as the outer table in the nested loop. The first index could only have been used if the Nested Loop relationship was inverted. That is, if table_id values were used as the input to the schema_indexes scan, instead of table_id values being the output that's matched against the id column of the schema_tables table.
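The point that the cheapest scan is not necessarily the one in the final plan can be illustrated with a few lines of Python. The costs are copied from the scan alternatives above. The data layout is a simplification we made up for illustration, not the actual pg_plan output format.

```python
# Alternative scan paths for schema_indexes, with total costs from the output above.
scan_paths = [
    {"node": "Index Scan", "index": "schema_indexes_table_id_name_idx", "total_cost": 352.94},
    {"node": "Index Scan", "index": "index_schema_indexes_on_database_id", "total_cost": 2581.00},
    {"node": "Seq Scan", "index": None, "total_cost": 3933763.60},
]

# Rank the alternatives by the cost of the scan taken in isolation.
ranked = sorted(scan_paths, key=lambda path: path["total_cost"])
cheapest_scan = ranked[0]

# The index used in the final plan shown earlier in this post.
chosen_in_final_plan = "index_schema_indexes_on_database_id"

print("Cheapest scan in isolation:", cheapest_scan["index"])
print("Scan used in the final plan:", chosen_in_final_plan)
```

Ranking scans in isolation is not enough: the cheapest path has the Index Cond (table_id = id), so it depends on values supplied by the other side of the join, and the planner compares the cost of the whole join rather than of each scan separately.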

As you can see, this data can be especially useful when determining why a particular index wasn’t used, or to consider how to consolidate indexes. In the pganalyze Index Advisor this is surfaced visually in the advanced analysis view:

Note we also indicate the individual filter clauses for the scan, and show which indexes are matching each clause.

Making index recommendations based on restriction clauses

In addition to comparing different existing indexes, we can use the data available to the planner to ask the question “What would the best index look like?”.

For a scan like the above example, we get a list of restriction clauses, which is a combination of the WHERE clauses as well as the JOIN condition. For index scans to work as expected, one or more of the clauses need to match the index definition.

The data looks like this for each scan:

        "Restriction Clauses": [
          {
            "ID": 1,
            "Expression": "schema_indexes.is_valid",
            "Selectivity": 0.9926,
            "Relation Column": "is_valid"
          },
          {
            "ID": 2,
            "Expression": "(schema_indexes.database_id = 1)",
            "Selectivity": 0.0001,
            "OpExpr": {
              "Operator": {
                "Oid": 416,
                "Name": "=",
                "Left Type": "bigint",
                "Right Type": "integer",
                "Result Type": "boolean",
                "Source Func": "int84eq"
              }
            },
            "Relation Column": "database_id"
          },
         ...

Using this data we then attempt a best guess at making a new index, run a CREATE INDEX command behind the scenes, and re-run the Postgres planner to reconsider the new index. If the cost of the new scan improves on the initial scan we make a recommendation and note the difference in estimated cost.

In summary, you can imagine the Index Advisor working roughly like this:
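As a rough Python sketch of that loop: take the cost of the current scan, try each candidate index, re-plan, and keep the candidate only if the estimated cost improves. The function `recommend_index`, the `plan_cost_with` callback and the toy costs are hypothetical stand-ins for real planner runs, and they are not the Index Advisor's actual interface.

```python
from typing import Callable, Optional, Sequence, Tuple


def recommend_index(
    baseline_cost: float,
    candidate_indexes: Sequence[str],
    plan_cost_with: Callable[[str], float],
) -> Optional[Tuple[str, float]]:
    """Return (index definition, estimated cost improvement), or None."""
    best: Optional[Tuple[str, float]] = None
    for definition in candidate_indexes:
        # Stand-in for: run CREATE INDEX behind the scenes, then re-run the planner.
        cost = plan_cost_with(definition)
        if cost < baseline_cost and (best is None or cost < best[1]):
            best = (definition, cost)
    if best is None:
        return None  # no candidate improved on the existing scan
    return best[0], baseline_cost - best[1]


# Made-up costs standing in for real planner runs (assumption for illustration).
toy_costs = {
    "CREATE INDEX ON schema_indexes (database_id, is_valid)": 1200.0,
    "CREATE INDEX ON schema_indexes (last_used_at)": 2900.0,
}
recommendation = recommend_index(2581.00, list(toy_costs), toy_costs.__getitem__)
print(recommendation)
```

The second candidate is rejected because its estimated cost is higher than that of the initial scan.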

Understanding Postgres clause selectivity

If you look closely at the earlier advanced analysis screenshot, you will notice a new field that we’ve just made available in a new Index Advisor update: Selectivity.

What is Selectivity? It indicates what fraction of rows of the table will be matched by the particular clause of the query. This information is then used by Postgres to estimate the row count that a node returns, as well as determine the cost of that plan node.
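As a small worked example in Python: by default the planner treats clauses as independent, so the selectivities of clauses combined with AND are multiplied, and the result is multiplied by the table's row count to get a row estimate. The two selectivities are taken from the restriction clauses shown earlier. The table size of one million rows is a made-up number for illustration.

```python
import math


def estimated_rows(table_rows: float, selectivities) -> float:
    """Row estimate for AND-ed clauses, assuming the clauses are independent."""
    return table_rows * math.prod(selectivities)


hypothetical_table_rows = 1_000_000  # assumption, not from the production system
clause_selectivities = [0.9926, 0.0001]  # is_valid, database_id = 1

rows = estimated_rows(hypothetical_table_rows, clause_selectivities)
print(f"Estimated rows: {rows:.0f}")
```

The database_id clause does almost all of the filtering here, while is_valid barely narrows the result at all.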

Selectivity estimates are central to how the planner operates, but they are unfortunately hidden behind the scenes. Historically, one would have had to resort to counting or filtering the actual data to confirm how frequent certain values are, or run manual queries against the Postgres catalog.

How does the planner know the selectivity? Counting actual table rows would be very expensive in time-sensitive situations. Instead, it primarily relies on the pg_statistic table (often accessed through the pg_stats view for debugging), which keeps table statistics collected by the ANALYZE command in Postgres. You can learn more about how the Postgres planner uses statistics in the Postgres documentation.

The data in the pg_stats view can be queried like this:

SELECT * FROM pg_stats WHERE tablename = 'z' AND attname = 'a';

-[ RECORD 1 ]----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | z
attname                | a
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 17
most_common_vals       | {2,3,7,12,13,4,1,5,11,14,9,6,10,8,15,16,0}
most_common_freqs      | {0.0653,0.06446667,0.063766666,0.06363333,0.063533336,0.063433334,0.0629,0.061966665,0.061833333,0.0618,0.0611,0.0605,0.0604,0.060366668,0.059666667,0.0332,0.032133333}
histogram_bounds       | 
correlation            | 0.061594862
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   |

In this example you can see that there are a total of 17 distinct values (n_distinct), with values between 1 and 15 having roughly equal frequency (about 6% each), and 0 and 16 being about half as frequent (about 3% each), as shown in most_common_vals/most_common_freqs. None of the rows have NULL values (null_frac).

Now, to have accurate plans in the Index Advisor, this same information can be provided using the new special SET commands in the schema definition:

SET pganalyze.avg_width.public.z.a = 4;
SET pganalyze.correlation.public.z.a = 0.061594862;
SET pganalyze.most_common_freqs.public.z.a = '{0.0653,0.06446667,0.063766666,0.06363333,0.063533336,0.063433334,0.0629,0.061966665,0.061833333,0.0618,0.0611,0.0605,0.0604,0.060366668,0.059666667,0.0332,0.032133333}';
SET pganalyze.most_common_vals.public.z.a = '{2,3,7,12,13,4,1,5,11,14,9,6,10,8,15,16,0}';
SET pganalyze.n_distinct.public.z.a = 17;
SET pganalyze.null_frac.public.z.a = 0;

You can learn how to retrieve this information, as well as all the available settings, in the Index Advisor documentation.
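As a minimal sketch of the mapping, the following Python helper turns one column's pg_stats values into SET commands in the format shown above. The function name `to_set_commands` is hypothetical, and the officially documented way of retrieving these settings may differ.

```python
def to_set_commands(schema: str, table: str, column: str, stats: dict) -> list:
    """Render column statistics as SET pganalyze.<stat>.<schema>.<table>.<column> lines."""
    commands = []
    for name in sorted(stats):
        value = stats[name]
        if isinstance(value, str):
            # Array literals are passed as quoted strings, as in the example above.
            rendered = "'" + value.replace("'", "''") + "'"
        else:
            rendered = repr(value)
        commands.append(f"SET pganalyze.{name}.{schema}.{table}.{column} = {rendered};")
    return commands


# Values copied from the pg_stats record for public.z.a shown above.
stats_for_a = {
    "avg_width": 4,
    "correlation": 0.061594862,
    "most_common_vals": "{2,3,7,12,13,4,1,5,11,14,9,6,10,8,15,16,0}",
    "n_distinct": 17,
    "null_frac": 0,
}

commands = to_set_commands("public", "z", "a", stats_for_a)
print("\n".join(commands))
```

The most_common_freqs entry is left out only to keep the example short. It would be rendered the same way as most_common_vals.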

Based on this data we can now calculate the selectivity of a clause like z.a = 12 to determine that it is 0.0636. Or put differently, the planner estimates that 6.36% of the table would match this condition. This same information is now directly visible in the Index Advisor, when viewing the advanced analysis.
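The lookup behind that number can be sketched in Python. This is a simplified model of how an equality clause is estimated from the most-common-values list, written by us for illustration. It is not the planner's actual code, and it assumes n_distinct is given as a positive count.

```python
most_common_vals = [2, 3, 7, 12, 13, 4, 1, 5, 11, 14, 9, 6, 10, 8, 15, 16, 0]
most_common_freqs = [
    0.0653, 0.06446667, 0.063766666, 0.06363333, 0.063533336, 0.063433334,
    0.0629, 0.061966665, 0.061833333, 0.0618, 0.0611, 0.0605, 0.0604,
    0.060366668, 0.059666667, 0.0332, 0.032133333,
]


def eq_selectivity(value, mcv, freqs, null_frac=0.0, n_distinct=17):
    """Simplified selectivity estimate for a `column = value` clause."""
    if value in mcv:
        # The value is in the most-common-values list: use its recorded frequency.
        return freqs[mcv.index(value)]
    # Otherwise spread the remaining fraction over the other distinct values.
    remaining = max(0.0, 1.0 - sum(freqs) - null_frac)
    other_distinct = n_distinct - len(mcv)
    if other_distinct > 1:
        remaining /= other_distinct
    return remaining


selectivity = eq_selectivity(12, most_common_vals, most_common_freqs)
print(f"z.a = 12 matches an estimated {selectivity:.2%} of the table")
```

Since all 17 distinct values appear in the most-common-values list, a value outside the list is estimated to match essentially no rows.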

How we incorporated Postgres community feedback

At this point we’d also like to give a shout-out to Hubert Lubaczewski (aka “depesz”), who reviewed the initial version of the index advisor, had some critical feedback, and provided an example we could investigate further.

Based on improvements we've made, we now take selectivity estimates into account for index suggestions. In particular, we give priority to columns whose clauses have a low selectivity value, i.e. those that match a small fraction of rows. Note this requires use of SET commands in addition to the raw schema data for the best results.
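One simple way to picture "giving priority" is to sort the clauses by their selectivity value, so the most restrictive column comes first. The sketch below does that for the two restriction clauses shown earlier. The plain sort is our assumption for illustration, since the exact rule the Index Advisor applies is not described here.

```python
# Selectivity values copied from the restriction clauses shown earlier.
clauses = [
    {"column": "is_valid", "selectivity": 0.9926},
    {"column": "database_id", "selectivity": 0.0001},
]

# Lower selectivity value = fewer matching rows = higher priority.
ordered = sorted(clauses, key=lambda clause: clause["selectivity"])
index_columns = [clause["column"] for clause in ordered]

print(index_columns)
```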

With these recent changes the pganalyze Index Advisor recommendation matches depesz's handcrafted index suggestion in the blog post:

This is a good example of how our planner-based index advisor approach can be improved and tuned, as its behavior is modeled on Postgres itself.

Creating the best index, vs creating “good enough” indexes

Another question that came up in multiple conversations is “Should I just create all indexes that the Index Advisor recommends for each query?”.

Unless you have just a handful of queries, the answer to that is no - you shouldn’t just create every index, because that would slow down writes to the table, as they have to update each index separately.

Today, the best way to utilize the index advisor for a whole database is to try out different CREATE INDEX statements - and make sure to update the schema definition with your index definition, to have the Index Advisor make a determination based on the existing indexes.

But we are taking this a step further. The work we are currently doing in this area is focused on two aspects:

Utilize query workload data from pg_stat_statements to weigh common queries heavier in index recommendations, and come up with “good enough” indexes that cover more queries
Estimate the write overhead of a new index, based on the number of updates/deletes/inserts on a table, as well as the estimated index size

With this, not only are we targeting better summary recommendations, but we also want to help you determine when you can consolidate indexes, where you have two existing very similar indexes.

Curious to learn more? Sign up to join us for a design research session:

Join us for design research sessions

If you are up for testing early prototypes and answering questions to help us understand your workflows better, then we would like to invite you to our design research sessions:

pganalyze Design Research Sign-Up

Conclusion

Realistically, there will always be a trial-and-error aspect in making indexing decisions. But good tools can help guide you in those decisions, no matter your level of Postgres know-how. Our goal with the pganalyze Index Advisor is to make indexing an activity that can be done by the whole team, and where it’s easy to get a CREATE INDEX statement to start working from.

As you see, the Index Advisor is based on the core logic of Postgres itself, and that forms the basis for making complex assessments behind the scenes. We believe in an iterative process and sharing what we’ve learned, and hope to continue the conversation on how to make indexing better for Postgres.

We've recently made a number of updates to the Index Advisor. Give it a try, and use the new SET table statistics syntax for best results. Encounter an issue with the Index Advisor? You can provide feedback through our dedicated discussion board on GitHub, or send us a support request for in-app functionality.



A better way to index your Postgres database: pganalyze Index Advisor

Lukas Fittl — Thu, 23 Sep 2021 12:00:00 GMT

When you run an application with a relational database attached, you will no doubt have encountered this question: Which indexes should I create?

For some of us, indexing comes naturally, and B-tree, GIN and GiST are words of everyday use. And for some of us it’s more challenging to find out which index to create, taking a lot of time to get right. But what unites us is that creating and tweaking indexes is part of our job when we use a relational database such as Postgres in production. We need to get indexes right, in order to make sure our application performs well.

There are multiple ways to determine which indexes get used in your Postgres database. For example, you may choose to query the pg_stat_user_indexes view. There are Postgres extensions like HypoPG to try out hypothetical indexes on your database server. And some of us may decide to go ahead and simply index every column on every table.

But the reality nowadays is that modern apps are complex, and applications built on Postgres grow at an incredible pace. This makes indexing more important, but also more challenging than ever. As developers we want to focus on what matters, and not spend hours investigating which Postgres index to create.

At the beginning of this year we set out to improve the status quo for indexing with Postgres. And today, after many months of effort and having published an eBook about indexing in Postgres, we’re excited to announce the new pganalyze Index Advisor for Postgres.

Before we dive into all the details, let’s take a step back and ask ourselves “How could we determine which index to create?”

Postgres Indexing: Is machine learning the answer?
How Postgres determines when to use an index
Creating the best Postgres index for your query
Review existing indexes with the Index Advisor
Try out the Index Advisor for free with the standalone tool
Automatic index advisor for your production queries in pganalyze
pganalyze Index Advisor and new pricing plans
Conclusion

Postgres Indexing: Is machine learning the answer?

It’s 2021, and of course we had to ask ourselves - is this a problem that requires ML and AI? Couldn’t we just train a model to create the right indexes for us?

We turned to GitHub Copilot, the most sophisticated AI-based helper that exists today for developers, and asked it to create an index for a real world query in our own Postgres database:

Suffice it to say that indexing like this is not effective. You will end up with significant overhead due to indexing almost everything, including columns that are not even referenced in the query.

Whilst this ML model will certainly improve, and there is research on more purpose-built solutions for databases, the point is: ML is not the magic solution we are looking for. We need more than just machine learning to know which indexes to create.

In fact, from our own experience, knowing which index to create does not require an ML model at all. Knowing how to create the best index can be done with a deterministic approach, that takes into account production database queries and schema statistics, and has a detailed understanding of how Postgres works.

And who knows best how Postgres works? Postgres itself!

How Postgres determines when to use an index

We started out by asking ourselves the question: How does Postgres decide which index to use? We can find this logic in the Postgres planner, which takes a parsed query and turns it into an execution plan.

Specifically, we decided to look at the function create_index_paths(..), where you can see that Postgres loops over all indexes on a particular table, and decides which indexes can be used:

void
create_index_paths(PlannerInfo *root, RelOptInfo *rel)
{
	...

	/* Skip the whole mess if no indexes */
	if (rel->indexlist == NIL)
		return;

	/* Bitmap paths are collected and then dealt with at the end */
	bitindexpaths = bitjoinpaths = joinorclauses = NIL;

	/* Examine each index in turn */
	foreach(lc, rel->indexlist)
	{
		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

		/*
		 * Ignore partial indexes that do not match the query.
		 * (generate_bitmap_or_paths() might be able to do something with
		 * them, but that's of no concern here.)
		 */
		if (index->indpred != NIL && !index->predOK)
			continue;
   ...

Going into all this logic would likely fill multiple books, and it is based on decades of academic research. Clearly, Postgres is very sophisticated about determining which indexes can be used for a given query. Amongst the core decisions it makes are:

Does the index match the columns used in the query?
Does the query’s operator match the operator class of the index?
Does the index have a sort order that can be used by the query to avoid an explicit Sort step?
Does the query condition match a partial index condition?

And many other requirements and heuristics that need extensive knowledge of Postgres’ inner workings.
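To give a feel for these checks, here is a deliberately tiny Python model of the first, second and fourth question. All class and function names are hypothetical. The real planner works on operator classes and proves that a query implies a partial index predicate, while this sketch only compares strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Clause:
    column: str
    operator: str


@dataclass(frozen=True)
class Index:
    leading_column: str
    supported_operators: FrozenSet[str]
    predicate: Optional[str] = None  # WHERE condition of a partial index, if any


def index_matches(index: Index, clause: Clause, query_predicates=frozenset()) -> bool:
    """Toy version of 'can this index be used for this clause?'."""
    # Loosely mirrors the partial index check in create_index_paths():
    # skip partial indexes whose predicate the query does not satisfy.
    if index.predicate is not None and index.predicate not in query_predicates:
        return False
    # The column and the operator both have to match what the index supports.
    return (
        clause.column == index.leading_column
        and clause.operator in index.supported_operators
    )


btree_ops = frozenset({"<", "<=", "=", ">=", ">"})
email_index = Index("email", btree_ops)
partial_index = Index("email", btree_ops, predicate="deleted_at IS NULL")
clause = Clause("email", "=")

print(index_matches(email_index, clause))
print(index_matches(partial_index, clause))
print(index_matches(partial_index, clause, frozenset({"deleted_at IS NULL"})))
```

The partial index only becomes usable once the query itself contains the matching deleted_at IS NULL condition.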

At pganalyze, we looked at this, and other functions, and we asked ourselves: What if we used the Postgres planner to tell us which index it would like to see, based on a given query?

That is, instead of asking “does this index match this query?”, we are asking “what’s the perfect index for this query?”. Perfect as in: ticks all the boxes in terms of operators/operator classes, columns and data types, and can be used to fulfill the query filter and join clauses of the query, if possible.

This logic based on the Postgres planner is the centerpiece of the new pganalyze Index Advisor. Our index advisor is available in the pganalyze app, but we also decided to provide a free, standalone version available to anyone.

Simply paste your query and schema data and get insights on whether existing indexes are useful, or learn why indexes you thought might help are ignored. Note that data uploaded to the standalone pganalyze Index Advisor stays local within your browser, unless you explicitly use the share functionality.

Going forward in this article, when you see examples and screenshots of the index advisor for Postgres, we are showing the public, standalone tool.

Creating the best Postgres index for your query

Let’s go back to our earlier example, and run it through the pganalyze Index Advisor:

As you can see, we get a recommendation for a single multi-column index that covers all columns that are in the WHERE clause, except for the column that’s inside the OR condition. This is the best index that we can create to ensure the query runs fast.

At launch the index advisor is focused on recommending B-tree indexes, with support for other index types coming soon.

Note that the index advisor also understands common query patterns like filtering out records based on a deleted_at column, and recommends partial indexes for these queries:

Review existing indexes with the Index Advisor

The pganalyze Index Advisor is also able to determine how different existing indexes perform, to help you understand which index Postgres will most likely use.

For example, imagine a schema and index definition like this:

CREATE TABLE events(
  id bigserial PRIMARY KEY,
  created_at timestamptz,
  severity smallint,
  organization_id bigint,
  description text,
  details jsonb
);
CREATE INDEX ON events(organization_id, severity);

We want to understand how effective this index is for queries that only query the “severity” column, without looking up a particular organization.

With the index advisor, we can see the cost difference between the indexes, and that Postgres prefers using the single-column index in most situations:

This can be explained by the fact that single-column indexes are usually smaller, and it’s more efficient, especially in older Postgres releases, to find index records when the queried column is listed first in the column list. You may still choose to use a multi-column index, but this helps you understand the trade-off.

Try out the Index Advisor for free with the standalone tool

Want to try out the index advisor yourself? As mentioned above, we developed a standalone version of the index advisor that runs fully in your web browser, powered by our self-contained Postgres planner compiled to WebAssembly.

You can simply go to https://pganalyze.com/index-advisor, paste your query and schema, and get your recommendations. If you don’t have a query and schema ready, for example because you are reading this on your mobile phone, you can take a look at how it works with a set of examples we added for your convenience.

We’ve also ensured that the standalone tool is ready for collaboration. If you want to share index recommendations with your team, simply click the [Share] button. After you confirm, this uploads the result of the index advisor to the pganalyze servers for sharing, and gives you a unique URL to share. Note that unless you share, all data stays local within your web browser.

Of course, copying query texts can be tedious and a lot of work. But, if you are a pganalyze customer, we already have your query information in our app. The second part of today’s launch is about the new in-app pganalyze Index Advisor.

Automatic index advisor for your production queries in pganalyze

With the new Index Advisor in pganalyze, you can now see at a glance what index recommendations exist for each of your queries. You can simply go to the query details page for your queries, and see what the Index Advisor recommends:

This is really nice, but we already have work underway to help you get an even better assessment of index usage summarized across your whole database. But more on that soon (sign up for the newsletter if you want to get updates about this).

pganalyze Index Advisor and new pricing plans

The pganalyze Index Advisor represents a significant improvement to the core functionality of pganalyze, and introduces additional sophisticated processing for each query received by pganalyze. We are therefore taking this moment to introduce both a new Production and a new Scale plan. In addition to the Index Advisor, the new Scale plan also features SAML-based Single Sign On in early access, to integrate with identity providers such as Okta.

If you are an existing pganalyze customer on (what is now) a legacy plan, you can try out the Index Advisor until the end of October 2021. Trying out the Index Advisor requires no changes to your existing pganalyze integration.

If you do not have an account with us at the moment but sign up for a new trial, the pganalyze Index Advisor will be activated for your 14-day trial. Try it out today in the pganalyze app, or start a new trial.

Conclusion

All of us at pganalyze are excited to share the new pganalyze Index Advisor with you. Try out the standalone tool or explore the new in-app functionality today. We hope the standalone tool is a service you will come back to time and again and get value out of. Feel free to bookmark it!

You can provide feedback through our dedicated discussion board on GitHub, or send us a support request for in-app functionality. We look forward to hearing from you.



Using Postgres CREATE INDEX: Understanding operator classes, index types & more

Lukas Fittl — Thu, 12 Aug 2021 12:00:00 GMT

Most developers working with databases know the challenge: New code gets deployed to production, and suddenly the application is slow. We investigate, look at our APM tools and our database monitoring, and we find out that the new code caused a new query to be issued. We investigate further, and discover the query is not able to use an index.

But what makes an index usable by a query, and how can we add the right index in Postgres?

In this post we’ll look at the practical aspects of using the CREATE INDEX command, as well as how you can analyze a PostgreSQL query for its operators and data types, so you can choose the best index definition.

How do you create an index in Postgres?
- Parse analysis: How Postgres interprets your query
- Looking behind the scenes: Operators and data types
Finding the right index type
Specifying operator classes during CREATE INDEX
Specifying multiple columns when adding a Postgres index
Using functions and expressions in an index definition
Specifying a WHERE clause to create partial PostgreSQL indexes
Using INCLUDE to create a covering index for Index-Only Scans
Adding and dropping PostgreSQL indexes safely on production
Conclusion

How do you create an index in Postgres?

Before we dive into the internals, let’s set the stage and look at the most basic way of creating an index in Postgres. The essence of adding an index is this:

CREATE INDEX ON [table] ([column1]);

For an actual example, let’s say we have a query on our users table that looks for a particular email address:

SELECT * FROM users WHERE users.email = 'test@example.com';

We can see this query is searching for values in the “email” column - so the index we should create is on that particular column:

CREATE INDEX ON users (email);

When we run this command, Postgres will create an index for us.

It's important to remember that indexes are redundant data structures. If you drop an index you don't lose any data. The primary benefit of an index is to allow faster searching of particular rows in a table. The alternative to having an index is to have Postgres scan each row individually ("Sequential Scan"), which is of course very slow for large tables.

Let's take a look behind the scenes of how Postgres determines whether to use an index.

Parse analysis: How Postgres interprets your query

When Postgres runs our query, it steps through multiple stages. At a high level, they are:

Parsing (see our blog post on the Postgres parser)
Parse analysis
Planning
Execution

Throughout these stages the query is no longer just text - it's represented as a tree. Each stage modifies and annotates the tree structure, until it's finally executed. For understanding Postgres index usage, we need to first understand what parse analysis does.

Let's pick a slightly more complex example:

SELECT * FROM users WHERE users.email = 'test@example.com' AND users.deleted_at IS NULL;

We can look at the result of parse analysis by turning on the debug_print_parse setting, and then looking at the Postgres logs (not recommended on production databases):

LOG:  parse tree:
DETAIL:     {QUERY 
	   ...
	      :quals 
	         {BOOLEXPR 
	         :boolop and 
	         :args (
	            {OPEXPR 
	            :opno 98 
	            :opfuncid 67 
	            :opresulttype 16 
	            :opretset false 
	            :opcollid 0 
	            :inputcollid 100 
	            :args (
	               ...
	            )
	            :location 38
	            }
	            {NULLTEST 
	            :arg 
	               ...
	            :nulltesttype 0 
	            :argisrow false 
	            :location 80
	            }
...

This format is a bit hard to read - let’s look at it in a more visual way, and with names instead of OIDs:

We can see two important parse nodes here, one for each expression in the WHERE clause: the OpExpr node and the NullTest node. For now, let's focus on the OpExpr node.

Looking behind the scenes: Operators and data types

It's important to remember that Postgres is an object-relational database system. That is, it's designed from the ground up to be extensible. Many of the references that are added in parse analysis are not hard-coded logic, but instead reference actual database objects in the Postgres catalog tables.

The two most important objects to know about are data types and operators. You are most likely familiar with data types in Postgres, for example you have used them when specifying the schema for your table. Operators in Postgres define how particular comparisons between one or two values, for example in a WHERE clause, are implemented.

The OpExpr node represents an expression that uses an operator to compare one or two values of a given type. In this case you can see we are using the =(text, text) operator. This operator utilizes the = symbol as its name, and has a text data type on the left and right of the operator.

We can query the pg_operator table to see details about it, including which function implements the operator:

SELECT oid, oid::regoperator, oprcode, oprnegate::regoperator
  FROM pg_operator
 WHERE oprname = '=' AND oprleft = 'text'::regtype AND oprright = 'text'::regtype;

 oid |     oid      | oprcode |   oprnegate   
-----+--------------+---------+---------------
  98 | =(text,text) | texteq  | <>(text,text)
(1 row)

And if you really want to know what’s happening, you can look up the operator's underlying texteq function in the Postgres source:

/*
 * Comparison functions for text strings.
 */
Datum
texteq(PG_FUNCTION_ARGS)
{
    ...
    if (lc_collate_is_c(collid) ||
		collid == DEFAULT_COLLATION_OID ||
		pg_newlocale_from_collation(collid)->deterministic)
    {
        ...
    	result = (memcmp(VARDATA_ANY(targ1), VARDATA_ANY(targ2),
				  len1 - VARHDRSZ) == 0);
        ...
    }
    else
	{
        ...
        result = (text_cmp(arg1, arg2, collid) == 0);
        ...
    }
    ...
}

That function illustrates nicely how Postgres considers the collation to determine whether it can do a fast comparison that simply compares bytes, or whether it has to do a more expensive full text comparison. As we can see from the source, using a C locale for your collation can yield performance benefits.

Of course you can also define your own custom operators that work on your own custom data types. Postgres is extensible like that, and that’s actually pretty neat.

Operators are essential for creating the right index. The operator that is used by an expression is the most important detail, besides the column name, that indicates whether a particular index can be used.

You can think of operators as the "how" we want to search the table for values. For example, we may use a simple = operator to match values for equality against an input value. Or we may utilize a more complex operator, such as @@ to perform a text search on a tsvector column.

Finding the right index type

When you think of an index type, it's important to remember that it's ultimately a specific data structure that supports a specific, limited set of search operators. For example, the most common index type in Postgres, the B-tree index, supports the = operator as well as the range comparison operators (<, <=, >=, >), and the ~ and ~* operators in some cases. It does not support any other operators.

Let's say we have a tsvector column on our users table, and we use the @@ operator to search the column:

SELECT * FROM users WHERE about_text_search @@ to_tsquery('index');

Even if I create an index, it keeps doing a sequential scan:

CREATE INDEX ON users(about_text_search);

pgaweb=# EXPLAIN SELECT * FROM users WHERE about_text_search @@ to_tsquery('index');
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on users  (cost=10000000000.00..10000000006.51 rows=1 width=4463)
   Filter: (about_text_search @@ to_tsquery('index'::text))
(2 rows)

This is because a B-tree index does not have the correct data structure to support text searches. There is no operator class that matches B-Tree indexes and the @@(tsvector,tsquery) operator.

Like earlier, thanks to Postgres extensibility, we can introspect the system to understand operator classes. Which index type can support the @@ operator on a tsvector column?

We can query the internal tables to answer this question:

SELECT am.amname AS index_method,
       opf.opfname AS opfamily_name,
       amop.amopopr::regoperator AS opfamily_operator
  FROM pg_am am,
       pg_opfamily opf,
       pg_amop amop
 WHERE opf.opfmethod = am.oid AND amop.amopfamily = opf.oid
       AND amop.amopopr = '@@(tsvector,tsquery)'::regoperator;

 index_method | opfamily_name |  opfamily_operator   
--------------+---------------+----------------------
 gist         | tsvector_ops  | @@(tsvector,tsquery)
 gin          | tsvector_ops  | @@(tsvector,tsquery)
(2 rows)

Looks like we need either a GIN or GIST index! We can create a GIN index like this:

CREATE INDEX ON users USING gin (about_text_search);

And voilà, it can be used by the query:

=# EXPLAIN SELECT * FROM users WHERE about_text_search @@ to_tsquery('index');
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=8.25..12.51 rows=1 width=4463)
   Recheck Cond: (about_text_search @@ to_tsquery('index'::text))
   ->  Bitmap Index Scan on users_about_text_search_idx1  (cost=0.00..8.25 rows=1 width=0)
         Index Cond: (about_text_search @@ to_tsquery('index'::text))
(4 rows)

What's that tsvector_ops name we saw in the internal Postgres table?

That's how index types are linked to operators, using operator families and operator classes. For a given operator, there can be multiple different operator classes - an operator class defines how data is represented for a particular index type, and how the search operation for that index works to implement the operator used in a query.

Specifying operator classes during CREATE INDEX

For example let’s look at =(text,text), which is the operator used in an earlier query:

SELECT am.amname AS index_method,
       opf.opfname AS opfamily_name,
       amop.amopopr::regoperator AS opfamily_operator
  FROM pg_am am,
       pg_opfamily opf,
       pg_amop amop
 WHERE opf.opfmethod = am.oid AND amop.amopfamily = opf.oid
       AND amop.amopopr = '=(text,text)'::regoperator;

 index_method |  opfamily_name   | opfamily_operator 
--------------+------------------+-------------------
 btree        | text_ops         | =(text,text)
 hash         | text_ops         | =(text,text)
 btree        | text_pattern_ops | =(text,text)
 hash         | text_pattern_ops | =(text,text)
 spgist       | text_ops         | =(text,text)
 brin         | text_minmax_ops  | =(text,text)
 gist         | gist_text_ops    | =(text,text)
(7 rows)

You can see there is a default operator class (text_ops) that gets used when you don’t explicitly specify it - for text columns the default operator class is often all you need.

But there are cases where we want to set a particular operator class. For example, let's say we run a LIKE query on our database, and our database happens to use the en_US.UTF-8 collation - in that case, you will see the LIKE query is not actually able to use an index:

CREATE INDEX ON users (email);

pgaweb=# EXPLAIN SELECT * FROM users WHERE email LIKE 'lukas@%';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on users  (cost=10000000000.00..10000000001.26 rows=1 width=4463)
   Filter: ((email)::text ~~ 'lukas@%'::text)
(2 rows)

Generally, LIKE queries are challenging to index, but if you do not have a leading wildcard, an index can be created that works for them - but you need to either (1) use the C locale on your database (effectively saying you don’t want language-specific text sorting/comparison), or (2) use the text_pattern_ops operator class.

Let’s create the same index, but this time specify the text_pattern_ops operator class:

CREATE INDEX ON users (email text_pattern_ops);

pgaweb=# EXPLAIN SELECT * FROM users WHERE email LIKE 'lukas@%';
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Index Scan using users_email_idx on users  (cost=0.14..8.16 rows=1 width=4463)
   Index Cond: (((email)::text ~>=~ 'lukas@'::text) AND ((email)::text ~<~ 'lukasA'::text))
   Filter: ((email)::text ~~ 'lukas@%'::text)
(3 rows)

As you can see now the same LIKE query can use the index.

Now that we know our index type and operator class for our columns, let's look at a few other aspects of creating an index.

Specifying multiple columns when adding a Postgres index

One essential feature is the option to add multiple columns to an index definition.

You can do it simply like that:

CREATE INDEX ON [table] ([column_a], [column_b]);

But what does that actually do? Turns out it’s dependent on the index type. Each index type has a different representation for multiple columns in its data structure. And some index types, such as Hash or SP-GiST, do not support multiple columns.

However with the most common index type, B-tree, multi-column indexes work well, and they are commonly used. The most important thing to know for multi-column B-tree indexes: Column order matters. If you have some queries that only utilize column_a, but all queries utilize column_b, you should put column_b first in your index definition. If you don’t follow this rule, you will end up with queries doing a lot more work because they have to skip over all the earlier columns that they can’t filter on. With GIST indexes on the other hand, this does not matter - and you can specify columns in any order.
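To make the column order rule more tangible, here is a minimal sketch in Python. It is only a toy model, not how Postgres stores a B-tree: the index is a sorted list of (column_b, column_a) tuples with made-up values, and the function names are hypothetical. It shows that a filter on the leading column narrows down to one contiguous range, while a filter on only the second column has to look at every entry.

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for a B-tree on (column_b, column_a): entries are kept sorted
# by the first column, then by the second. The values are made up.
index_entries = sorted((b, a) for b in range(100) for a in range(100))


def lookup_leading(entries, b_value):
    """Filter on the leading column: binary search to one contiguous range."""
    lo = bisect_left(entries, (b_value,))
    hi = bisect_right(entries, (b_value, float("inf")))
    return entries[lo:hi], hi - lo  # (matches, entries examined)


def lookup_trailing(entries, a_value):
    """Filter only on the second column: matches are scattered across the
    sorted list, so this toy model examines every entry."""
    matches = [entry for entry in entries if entry[1] == a_value]
    return matches, len(entries)


matches_b, examined_b = lookup_leading(index_entries, 42)
matches_a, examined_a = lookup_trailing(index_entries, 42)
print(len(matches_b), examined_b)
print(len(matches_a), examined_a)
```

Both lookups return the same number of matches, but the second one examines the whole list to find them. That is the extra work the column order rule is meant to avoid.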

Another decision to make is: Should I create multiple indexes, one for each column I’m querying by, or should I create a single multi-column index?

CREATE INDEX ON [table] ([column_a]);
CREATE INDEX ON [table] ([column_b]);
--- or
CREATE INDEX ON [table] ([column_a], [column_b]);

When looking at an individual query, the answer will almost always be: Create a single multi-column index that matches the query. It will be faster than having multiple indexes.

But if you have a larger workload, it may make sense to create multiple single-column indexes. Be aware that Postgres will have to do more work in that case, and you should verify what indexes actually get chosen by looking at your EXPLAIN plans.

Using functions and expressions in an index definition

Stepping back from specific index types for a moment: Postgres has a universal feature that applies to all index types, that's pretty useful: Instead of indexing a particular column's value, you can index an expression that references the column's data.

For example, we might typically compare our user email addresses with the lower(..) function:

SELECT * FROM users WHERE lower(email) = $1

If you were to run EXPLAIN on this, you would notice that Postgres is not able to use a simple index on email here - since it doesn’t match the expression.

But since lower(..) is what’s called an "immutable" function, we can use it to create an expression index that indexes all values of email in their lower-case form:

CREATE INDEX ON users (lower(email));

Now our query will be able to use the index. Note that this does not work for all functions. For example, if you were to create an index on now(), it would fail:

CREATE INDEX ON users (now());

ERROR:  functions in index expression must be marked IMMUTABLE

Additionally, remember that expression indexes only work when they match the query. If we only have an index on lower(email), a query that simply references email won’t be able to use the index.
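If you want to try the matching rule without a Postgres server at hand, the following sketch uses Python's built-in sqlite3 module as a stand-in. This is an assumption made purely so the example is self-contained: SQLite is a different database with a different planner and different plan output, and the table contents, the index name and the plan helper are all made up for illustration. The general idea, that an expression index only helps queries using the same expression, shows up there as well.

```python
import sqlite3

# SQLite stands in for Postgres here so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(1000)],
)
conn.execute("CREATE INDEX users_lower_email_idx ON users (lower(email))")


def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Same expression as the index definition: the index can be used.
matching_plan = plan(
    "SELECT * FROM users WHERE lower(email) = ?", ("user1@example.com",)
)
# Plain column reference: the expression index does not match.
plain_plan = plan("SELECT * FROM users WHERE email = ?", ("User1@Example.com",))

print(matching_plan)
print(plain_plan)
```

On Postgres you would run EXPLAIN on both queries instead and compare the plans in the same way.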

Specifying a WHERE clause to create partial PostgreSQL indexes

Let’s return to an example we saw at the beginning of the post - but now let’s look at the NullTest expression:

Here we are making sure we only get rows that are not yet marked as deleted by our application. Depending on your workload, this may be a very large number of rows that needs to be skipped over.

Whilst you could create an index that includes the deleted_at column, it would be quite wasteful to have all these index entries that you don’t actually want to ever look at.

Postgres has a better way: With partial indexes, you can restrict for which rows the index has index entries. When the restriction does not apply, the row won’t be saved to the index, saving space. And during query execution, this also acts as a significant time saver in many cases, since the planner can do a simple check to determine which partial indexes match, and ignore all that don't match.

In practice, all you need to do is add a WHERE clause to your index definition:

CREATE INDEX ON users(email) WHERE deleted_at IS NULL;

There are reasons why you may not want to do that though:

First, adding this restriction means that only queries that contain deleted_at IS NULL will be able to use the index. That means you may need two indexes, one with that restriction and the other without.

Second, adding hundreds or thousands of partial indexes causes overhead in the Postgres planner, as it has to do a more expensive analysis to determine which indexes can be used.
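The first caveat can be sketched in a self-contained way with Python's built-in sqlite3 module. Treat this as an illustration under stated assumptions rather than Postgres behavior: SQLite is used only because it runs without a server, its rules for matching partial indexes are simpler than those of the Postgres planner, and the table contents, the index name and the plan helper are made up.

```python
import sqlite3

# SQLite stands in for Postgres here so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, deleted_at) VALUES (?, ?)",
    [
        (f"user{i}@example.com", None if i % 10 == 0 else "2021-08-01")
        for i in range(1000)
    ],
)
# Only rows that are not marked as deleted get an index entry.
conn.execute(
    "CREATE INDEX users_active_email_idx ON users (email) WHERE deleted_at IS NULL"
)


def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


with_predicate = plan(
    "SELECT * FROM users WHERE email = ? AND deleted_at IS NULL",
    ("user10@example.com",),
)
without_predicate = plan(
    "SELECT * FROM users WHERE email = ?", ("user10@example.com",)
)

print(with_predicate)
print(without_predicate)
```

Only the query that repeats the index's WHERE condition can use the partial index. The other query would need a second, unrestricted index.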

Using INCLUDE to create a covering index for Index-Only Scans

Last but not least, let’s talk about a more recent addition to Postgres: The INCLUDE keyword that can be added to CREATE INDEX.

Before we look at what this keyword does, let’s understand the difference between an Index Scan and an Index-Only Scan. An Index-Only Scan is possible when all data that is needed can be retrieved from the index itself - instead of having to fetch it from the table.

Note that Index-Only Scans only work when the table has been recently VACUUMed - otherwise Postgres will need to check visibility too often for each index entry, and therefore does not opt to use Index-Only Scans, preferring an Index Scan instead in most cases.

Let's look at two examples - one query that matches an index fully, and one that does not (because of the target list):

CREATE INDEX ON users (email, id);

=# EXPLAIN SELECT id FROM users WHERE email = 'test@example.com';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Index Only Scan using users_email_id_idx on users  (cost=0.14..4.16 rows=1 width=4)
   Index Cond: (email = 'test@example.com'::text)
(2 rows)

=# EXPLAIN SELECT id, fullname FROM users WHERE email = 'test@example.com';
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Index Scan using users_email_id_idx on users  (cost=0.14..8.15 rows=1 width=520)
   Index Cond: ((email)::text = 'test@example.com'::text)
(2 rows)

Now, to get an Index Only Scan for the second query we can create an index that includes that column at the end - and that makes Postgres use an Index Only Scan:

CREATE INDEX ON users (email, id, fullname);

=# EXPLAIN SELECT id, fullname FROM users WHERE email = 'test@example.com';
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Index Only Scan using users_email_id_fullname_idx on users  (cost=0.14..4.16 rows=1 width=520)
   Index Cond: (email = 'test@example.com'::text)
(2 rows)

However, doing this has a few restrictions: It doesn’t work if you have unique indexes (since any column would modify what’s being checked for being unique), and it bloats the data stored in the index for searching.

For B-tree indexes the new INCLUDE keyword is the better approach:

CREATE INDEX ON users (email, id) INCLUDE (fullname);

This keeps the overhead for such additional columns slightly lower, works without problems with UNIQUE constraint indexes, and clearly communicates the intent: That you only added a column in order to support Index Only Scans.

This is a feature best used sparingly: Adding more data to the index means larger index values, which on its own can be a problem - it’s usually not a good idea to just add a lot of columns to the INCLUDE clause for an index.

Adding and dropping PostgreSQL indexes safely on production

I’ll end with a warning: Creating indexes on production databases requires a bit of thought. Not just which index definition to use, but also how to create them, and when to take the I/O impact of the new index being built.

The most important thing: Remember that Postgres will take a lock when you simply run CREATE INDEX, which blocks all writes (but not reads) to that table for the duration of the index build. That’s why Postgres has the special CONCURRENTLY keyword. When you create an index on a table on production that already has data, always specify this keyword:

CREATE INDEX CONCURRENTLY ON users (email) WHERE deleted_at IS NULL;

This is the same when dropping an index with DROP INDEX - adding CONCURRENTLY reduces the locking requirements, making it safer to use this operation on production.

Conclusion

In this post you should have gotten a fundamental understanding of how operators and operator classes relate to indexing, and why knowing these concepts is essential to creating the best index for complex queries. We also looked at a few complementary features of the CREATE INDEX command that are typically needed when reasoning about which index to create.

There are actually a few things we didn’t talk about: Adding indexes to specific tablespaces, using index storage parameters (especially useful for GIN index types!) and specifying the sort order for a particular column. I encourage you to take a further look at the Postgres documentation for these topics.

Share this article: If you liked this article you might want to tweet it to your peers.

]]>

Efficient Pagination: PostgreSQL and Django

Ryan Westerberg — Tue, 20 Jul 2021 12:00:00 GMT

You could say most web frameworks take a naive approach to pagination. Using PostgreSQL’s COUNT, LIMIT, and OFFSET features for pagination works fine for the majority of web applications, but if you have tables with a million records or more, performance degrades quickly.

Django is an excellent framework for building web applications, but its default pagination method falls into this trap at scale. In this article, I’ll help you understand Django’s pagination limitations and offer three alternative methods that will improve your application’s performance. Along the way, you’ll see the tradeoffs and use cases for each method so you can decide which is the best fit for your application.

Understanding Naive PostgreSQL Pagination
Performance of Naive PostgreSQL Pagination
Pagination Handling in Django
PostgreSQL Pagination in Django: Option 1 – Removing the COUNT Query
PostgreSQL Pagination in Django: Option 2 – Approximating the COUNT
PostgreSQL Pagination in Django: Option 3 – Keyset Pagination
Conclusion

Understanding Naive PostgreSQL Pagination

Let’s have a look at an example of the type of pagination control that a web application might use:

In this control, the user may go to the previous and next pages or jump directly to a specific page. The query to get the tenth page using Postgres’ LIMIT and OFFSET approach might look like this:

SELECT *
FROM users
ORDER BY created_at DESC
LIMIT 10
OFFSET 100

Notice that to get the correct OFFSET, you must multiply the page number you want by the LIMIT.

There's one more query needed to display our pagination control. You must know the number of records in the table. Without that information, you won’t know how many pages you need to seek through.

SELECT count(*)
FROM users

You might see how this approach can become a real performance issue very quickly.

Performance of Naive PostgreSQL Pagination

To better understand the performance bottleneck of LIMIT and OFFSET, you can import some test data and try it out. First, create a table large enough to encounter slowdowns:

CREATE TABLE USERS (
    id serial,
    name varchar(50)
);

INSERT INTO users
SELECT
    --- Ten million records
    generate_series(1,10000000) AS id,
    --- Example: "e6f2c6842d146c518185e1e47add9532"
    substr(md5(random()::text), 0, 50) AS name;

When you run the query to get the tenth page of results, the response is nearly instant. On my 2018 MacBook Pro with the latest version of Postgres, I see data in 89ms.

However, for queries farther in, the wait times increase. With a LIMIT of 10 and an OFFSET of 5,000,000, a response takes 2.62 seconds.

Finally, you can look at the COUNT query, which runs on all 10 million rows. That query now takes a lethargic 4.45 seconds.

The slow performance in these examples is caused by the way that OFFSET and COUNT work. Getting to the specified page using OFFSET requires the database to step through every row before the offset, only to discard it, and COUNT has to visit every row in the table. Therefore, the performance degrades the farther you peer into the table.

The naive approach to pagination using COUNT, LIMIT, and OFFSET is only a viable solution for tables under a million rows. In large tables, you can expect to see slow queries and consequently a poor user experience.

Pagination Handling in Django

Now that you have some background knowledge on the performance of pagination queries in Postgres, you can start to understand why pagination slows down in Django. To demonstrate how Django handles pagination, I created a new application with a User model and inserted 10 million records by adapting our earlier query.

# {project}/users/models.py
from django.db import models

class User(models.Model):
    name = models.CharField(max_length=50)

I used the admin site to test the pagination speed since it works out of the box.

# {project}/users/admin.py
from django.contrib import admin
from .models import User

admin.site.register(User)

Now that the User model is presented in the admin panel, I can see the table with 10 million records.

Using django-debug-toolbar I can peer into the SQL queries that Django is generating in real-time. There are two queries used to generate this UI:

-- Count the total number of records - 2.43 seconds
SELECT COUNT(*) AS "__count"
  FROM "users_user"

-- Get first page of items - 2ms
SELECT "users_user"."id",
       "users_user"."name"
  FROM "users_user"
 ORDER BY "users_user"."id" DESC
 LIMIT 100

These queries should look familiar because they are almost identical to the naive pagination queries above. Strangely, the count query is triggered twice, which means that when you load the Django admin panel, you have to wait for the database to count every single row of the table two times.

When you click on page 99,999, Django will fire off two count queries again and another pagination query using LIMIT and OFFSET:

-- Get the 99,999 page (100 results per page) - 13.34 seconds
SELECT "users_user"."id",
       "users_user"."name"
  FROM "users_user"
 ORDER BY "users_user"."id" DESC
 LIMIT 100
OFFSET 9999900

This query takes a whopping 13 seconds to finish!

Clearly, the naive approach to pagination in Django is slow for large tables. Over time, your database tables will likely grow, and as they reach tens of millions of records, you and your customers are going to start to notice these terrible load times. So what can you do about it?

In the following sections, I’ll show you three options for improving your pagination performance in a Django application.

PostgreSQL Pagination in Django: Option 1 – Removing the COUNT Query

The COUNT query dominates the loading time for the first page of results. When skipping to later pages, the offset query is the slowest, but in this first option I’ll focus on improving the COUNT.

This may come as a surprise, but one solution is to remove the count query completely.

Won’t that break the UI!?

Sort of… In this case, it might be reasonable not to know how many pages are in the Users table. It’s not often that users will find themselves at the five millionth page of a table of records. Navigating to the next and previous page is typically enough control. Using a search box or filter is likely a better method for finding the record you want. Look at Django’s search_fields for information on enabling search and filtering in the admin panel and feel free to read through this pganalyze article about Full Text Search in Django.

The most well-known example of this type of pagination is the Google search results page. At the bottom of the page, a truncated pagination control shows direct links to only the first 10 pages:

This doesn’t mean that Google is preventing you from seeing the rest of the billions of results. It’s simply telling you that a refined search term would be a better way to get to those results than pagination.

If getting rid of the COUNT makes sense in your application, Django makes it easy to hide. First, override the count property of the default Paginator:

# {project}/users/paginator.py
from django.core.paginator import Paginator
from django.utils.functional import cached_property

class UserPaginator(Paginator):
    
    @cached_property
    def count(self):
        return 9999999999

Notice the placeholder value is a number much larger than you expect to have results for.

Django responds to this adjustment by displaying the first few pages in the pagination component, as you would expect. However, the links to the last few pages are derived from the fake count. When you click on a page that doesn’t exist, Django will take you to the last page no matter how many records you have.

After you override the default paginator, import it into your admin model:

# {project}/users/admin.py
from django.contrib import admin
from .models import User
from .paginator import UserPaginator

@admin.register(User)
class UserTableAdmin(admin.ModelAdmin):
    show_full_result_count = False
    paginator = UserPaginator

Note that I also set show_full_result_count to False. This will turn off the second count query that I noted earlier.

After updating my application with these changes, I reduced the time for the first page from ~5 seconds to 8ms. Keep in mind that this table is still suffering from slow OFFSET queries though. Jumping to page 50,000 took 18 seconds. Before I show you how to address the OFFSET problem, I’ll show you one more method to improve the COUNT query.

PostgreSQL Pagination in Django: Option 2 – Approximating the COUNT

Another way to reduce the time spent on the COUNT query is to use some built-in Postgres features to estimate the total number of records when the count takes too long. You can see a thorough implementation of the approach in this gist which overloads the count method in Django’s Paginator class.

The first thing to do is to set a statement_timeout on the query and fall back to an estimated count. You can use atomic transactions to set the timeout to 150ms.

...
    try:
        with transaction.atomic(), connection.cursor() as cursor:
            # Limit to 150 ms
            cursor.execute('SET LOCAL statement_timeout TO 150;')
            return super().count
    except OperationalError:
        pass
...

If the count method returns data before the time limit is up, then the real value is used. However, if the query takes longer, Django will fall back to an approximate value stored in the pg_class metadata. That metadata is updated when commands like VACUUM, ANALYZE and CREATE INDEX are called, or autovacuum runs on the table.

...
        with transaction.atomic(), connection.cursor() as cursor:
            # Obtain estimated values (only valid with PostgreSQL)
            cursor.execute(
                    "SELECT reltuples FROM pg_class WHERE relname = %s",
                    [self.object_list.query.model._meta.db_table]
            )
            estimate = int(cursor.fetchone()[0])
            return estimate
...

After implementing this method, the maximum time you will spend loading the COUNT is 150ms.
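The control flow of the two snippets above can be condensed into a small framework-independent sketch. This is not the code from the gist. The function count_with_fallback and its callable arguments are hypothetical names used for illustration, and Python's built-in TimeoutError stands in for the OperationalError that Django raises when Postgres cancels a statement.

```python
def count_with_fallback(exact_count, estimated_count, timeout_errors=(TimeoutError,)):
    """Return the exact count unless it times out, then use the estimate.

    exact_count and estimated_count are hypothetical zero-argument callables.
    In a Django paginator they would wrap super().count (run under a
    statement_timeout) and the pg_class reltuples query.
    """
    try:
        return exact_count()
    except timeout_errors:
        return estimated_count()


def slow_count():
    # Simulates COUNT(*) being cancelled by statement_timeout.
    raise TimeoutError("canceling statement due to statement timeout")


fast_result = count_with_fallback(lambda: 1234, lambda: 1200)
slow_result = count_with_fallback(slow_count, lambda: 10_000_000)
print(fast_result)  # exact value, the count finished in time
print(slow_result)  # estimate, the count was cancelled
```

Passing the two counts as callables keeps the expensive query from running unless it is actually requested, and it makes the fallback easy to test without a database.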

PostgreSQL Pagination in Django: Option 3 – Keyset Pagination

To solve the slow OFFSET problem, you can replace it with keyset (or seek) pagination. In keyset pagination, each page is fetched by an ordered field like an id or created_at date. Instead of iteratively counting pages, as OFFSET does, keyset pagination filters directly by the ordered field.

Example of using keyset pagination

SELECT *
    FROM users
    WHERE id < 60 -- The last item in the previous page
    ORDER BY id DESC
    LIMIT 10

This approach is a little different from OFFSET because you must know the value you are starting at. Imagine that you are on page four of the table, and you know the first id of page five. Instead of counting all the records, you can go directly to that id and return the next ten items.

This works because indexes can support a query like this efficiently. Asking a B-tree index on id to return 10 entries before a certain id only requires loading 10 index entries. Contrast that to using OFFSET, where all entries up to the offset and then the specified limit need to be loaded, making high offsets very expensive.
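
To make the seek pattern concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module purely so that it runs without a Postgres server; the users table, its contents, and the fetch_page helper are assumptions made for illustration, but the query shape is the same as in the SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user-{i}") for i in range(1, 101)],
)


def fetch_page(conn, before_id=None, page_size=10):
    """Return one page of ids in descending order using keyset pagination."""
    if before_id is None:
        # First page: start from the highest id.
        rows = conn.execute(
            "SELECT id FROM users ORDER BY id DESC LIMIT ?", (page_size,)
        )
    else:
        # Later pages: seek directly past the last id already shown.
        rows = conn.execute(
            "SELECT id FROM users WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, page_size),
        )
    return [row[0] for row in rows]


first_page = fetch_page(conn)
second_page = fetch_page(conn, before_id=first_page[-1])
print(first_page)
print(second_page)
```

Each call only needs the last id of the previous page, which is why the cost does not grow with the page number.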

When using keyset pagination together with the right index, you will see a significant performance boost. The time to fetch a page stays roughly constant no matter how deep into the table you seek, because the index lookup does not depend on how many rows precede the page. For example, seeking the last page of the large table generated above takes just 78ms.

This method also guards against sparse data. If a user is deleted and the order is not sequential in the table anymore, keyset pagination is not affected; it will skip the missing value with no problem.

Trade-offs of keyset pagination

Keyset pagination comes with a couple of trade-offs. Without the offset, you don’t know exactly how many pages there are in the table or which page number you are currently on. Additionally, keyset pagination requires a sortable field on your model. Sequential IDs and date fields work well.

Using the dj-pagination plugin for Django

Unfortunately, I could not find a library that extends Django’s core Paginator to add keyset pagination. The admin table requires a paginator of that type, so I'll demonstrate the same generated data in a new application view using the dj-pagination plugin.

Add the application to your INSTALLED_APPS and middleware; the docs explain this well. Then, create a view in your Users app:

# {project}/users/views.py
from django.shortcuts import render
from .models import User


def index(request):
    context = {
        'users': User.objects.order_by('id').all()
    }
    return render(request, 'users/index.html', context)

This code makes the users QuerySet available in the view context and tells it to render the template file. Next, add the template directory to your TEMPLATES setting object and add your new template file:

# {project}/templates/users/index.html
{% load pagination_tags %}

{% autopaginate users %}

{% for user in users %}
    {{ user.name }}
{% endfor %}

{% paginate %}

In a real application, you would add additional markup and styling to show the list of users, but this demonstrates the tags that dj-pagination makes available to you.

Now that you have a view and template, create a urls.py file to route requests to your view:

# {project}/users/urls.py
from django.urls import path
from . import views

app_name = 'users'
urlpatterns = [
    # ex: /users/
    path('', views.index, name='index'),
]

Finally, add it to your root URLs:

from django.urls import include, path
# various imports...

urlpatterns = [
    # routes...
    path('users/', include('users.urls')),
]

Now, when you point your browser to the /users page, you will see a list of usernames with simple pagination controls. The generated query returns results in less than 100ms, regardless of which page I navigate to.

If you are looking for a way to have more control over keyset pagination, django-infinite-scroll-pagination might be worth a look.

Conclusion

In this article, you learned about how pagination works in Django. While naive pagination performs well for small tables, this method quickly degrades in performance as your table grows to millions of rows. Furthermore, jumping to a record deep in the table will be very slow in a query that uses OFFSET.

The good news is that you can speed things up by altering the COUNT query. Additionally, switching to keyset pagination will improve the performance of page lookups and make them work in constant time. Django makes it easy to alter its default configuration, giving you the power to build a performant solution for pagination in Django.



PostgreSQL Partitioning in Django

Josh Alletto — Thu, 08 Jul 2021 12:00:00 GMT

Postgres 10 introduced declarative partitioning to improve performance for very large database tables. You will typically start to see the performance benefits with tables of 1 million or more records, but the technical complexity usually doesn’t pay off unless you’re dealing with hundreds of gigabytes of data.

Though there are several advantages to partitioning, it requires more tables, which can become cumbersome to work with, especially if you change your data structure in the future. Please note: If you are just starting out with a small database, you probably don't need partitioning.

That said, if you think you may have a legitimate reason to partition your Postgres database and you want to use Django to manage it, this article is the right one for you.

What is Database Partitioning?
How does PostgreSQL partitioning work?
Creating a partitioned table in PostgreSQL
Comparing Partitioned Postgres Table Performance with Python and Faker
Postgres Data Partitioning in Django
Conclusion
About the Author

What is Database Partitioning?

You may hear partitioning and think it is similar to sharding, where a database or table is spread out across several different nodes. In fact, partitioning in PostgreSQL involves splitting a single table up into several different tables, but partitioning is performed on the same node. Partitioning allows you to organize the data into subsets that are easier for the query planner to traverse. This can vastly increase the speed of lookups, deletes, and inserts.

To that end, the way you choose to partition the data should reflect the way you want to access your data. In other words, if you are frequently accessing records based on when they were created, you should probably partition based on creation date. If you're regularly grabbing subsets of data based on a region or country, you may want to partition based on the records’ locations.

There are three types of partitioning supported by PostgreSQL:

List Partitioning in Postgres

List Partitioning allows you to explicitly state which values you would like to put into each partition. For example, you could partition a table of North American climate data by country with a United States, Canada, and Mexico partition. Since you can create partitions of a partition, you could further split these tables up by state or province.

Range Partitioning in Postgres

Range Partitions are the most useful and the kind I’ll use most in this tutorial. They allow you to specify partitions based on a range of numbers or dates. A table for storing measurements on an hourly basis might be partitioned by date and time. This would make looking up new measurements or deleting older measurements much faster.

Hash Partitions in Postgres

Hash Partitions split data by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder. This comes in handy if there isn't a clear way to organize the data, or you want a pseudo-random breakdown of your data.
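
The modulus and remainder rule can be illustrated with a few lines of Python. This is only a sketch of the routing idea: zlib.crc32 is a stand-in hash chosen so the example is deterministic, whereas Postgres uses its own internal hash functions, and the partition_for helper and the sample keys are hypothetical.

```python
import zlib

MODULUS = 4  # number of hash partitions in this sketch


def partition_for(key: str, modulus: int = MODULUS) -> int:
    """Return the remainder that decides which partition a key belongs to."""
    # zlib.crc32 is only a stand-in; Postgres uses its own hash functions.
    hash_value = zlib.crc32(key.encode("utf-8"))
    return hash_value % modulus


partitions = {remainder: [] for remainder in range(MODULUS)}
for name in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    partitions[partition_for(name)].append(name)

for remainder, names in partitions.items():
    print(f"remainder {remainder}: {names}")
```

Every key maps to exactly one remainder, so each row has exactly one home partition, and the same key always lands in the same place.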

How does PostgreSQL partitioning work?

One important thing to understand about a partitioned table is that the partitions themselves are also tables. They are created individually, and you can query them separately, though you would rarely want to use this feature.

The other thing to understand is that the partitioned table - the table you will split into smaller tables - doesn't hold any data. It exists as a parent to the partitions and a blueprint for the table schema.

Creating a partitioned table in PostgreSQL

The best way to understand partitions and see some of their benefits is to consider an example. Start by setting up a new people table, which you’ll compare to the partitioned table:

CREATE TABLE people (
  id BIGSERIAL PRIMARY KEY,
  full_name text NOT NULL,
  birth_date date NOT NULL
);

Now create another table with the same columns but partitioned:

CREATE TABLE people_partitioned (
  id BIGSERIAL,
  full_name text NOT NULL,
  birth_date date NOT NULL,
  PRIMARY KEY (id, birth_date)
) PARTITION BY RANGE (birth_date);

Here, we've created a RANGE partition that uses birth dates to delimit records in each partition. You could just as easily do this for created_on timestamps or an int column like a measurement value or record ID. Note that we had to define the primary key to include both the id column and the birth_date column we are partitioning by, since primary keys always need to include the partition column(s).

Remember, a partitioned table on its own doesn't contain any data. You need to create the tables that represent the partitions themselves. In this case, split the data up into chunks of fifty years:

CREATE TABLE people_partitioned_birthdays_1800_to_1850 PARTITION OF people_partitioned
    FOR VALUES FROM ('1800-01-01') TO ('1850-01-01');

CREATE TABLE people_partitioned_birthdays_1850_to_1900 PARTITION OF people_partitioned
    FOR VALUES FROM ('1850-01-01') TO ('1900-01-01');

CREATE TABLE people_partitioned_birthdays_1900_to_1950 PARTITION OF people_partitioned
    FOR VALUES FROM ('1900-01-01') TO ('1950-01-01');

CREATE TABLE people_partitioned_birthdays_1950_to_2000 PARTITION OF people_partitioned
    FOR VALUES FROM ('1950-01-01') TO ('2000-01-01');

CREATE TABLE people_partitioned_birthdays_2000_to_2050 PARTITION OF people_partitioned
    FOR VALUES FROM ('2000-01-01') TO ('2050-01-01');

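Each partition table is declared a PARTITION OF the people_partitioned table and includes the range of values you want to include in that table. The lower bound of each range is inclusive and the upper bound is exclusive, so adjacent partitions can share a boundary value without overlapping. It's best to give the tables descriptive names.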
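
How a row finds its range partition can be sketched in plain Python. The sketch below assumes hypothetical fifty-year boundaries that fall on January 1, treats each range as inclusive at the bottom and exclusive at the top, and uses a made-up partition_for helper; it illustrates the routing rule and is not how Postgres implements it internally.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical boundaries: each partition covers [lower, upper).
BOUNDS = [
    date(1800, 1, 1), date(1850, 1, 1), date(1900, 1, 1),
    date(1950, 1, 1), date(2000, 1, 1), date(2050, 1, 1),
]


def partition_for(birth_date: date) -> str:
    """Return a label for the partition whose range contains birth_date."""
    # bisect_right counts the bounds <= birth_date, so a value equal to a
    # boundary is placed in the partition that starts at that boundary.
    index = bisect_right(BOUNDS, birth_date)
    if index == 0 or index == len(BOUNDS):
        raise ValueError(f"no partition accepts {birth_date}")
    lower, upper = BOUNDS[index - 1], BOUNDS[index]
    return f"{lower.year}_to_{upper.year}"


print(partition_for(date(2000, 8, 21)))
print(partition_for(date(1850, 1, 1)))
```

A date outside every range raises an error here, which mirrors what happens when you insert a row that no partition accepts and there is no default partition.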

Now, you can insert data into the people_partitioned table:

INSERT INTO people_partitioned (full_name, birth_date) VALUES ('Bob Sponge', '2000-08-21');

If you query the people_partitioned table, you’ll get the data you just inserted:

SELECT full_name, birth_date FROM people_partitioned;

full_name  | birth_date
-----------+-----------
Bob Sponge | 2000-08-21

To ensure the record went into the right partition, query the partition table directly:

SELECT full_name, birth_date FROM people_partitioned_birthdays_2000_to_2050;

full_name  | birth_date
-----------+-----------
Bob Sponge | 2000-08-21

As you can see, records are stored in the individual tables but accessible through the top-level partitioned table as well. This makes accessing the data relatively straightforward as you don’t have to keep track of which data is in each partition.

Comparing Partitioned Postgres Table Performance with Python and Faker

Next, I used Python and Faker to populate each table with ten million rows of random data. To compare the performance on each table, run a SELECT query for anyone born after 1901 and before 1920:

SELECT * FROM people WHERE EXTRACT(year FROM birth_date) > 1901 AND EXTRACT(year FROM birth_date) < 1920;

The query returned 1,313,997 rows of data. Our unpartitioned table ran the query in 4.109 seconds while the partitioned table returned the exact same rows in 2.878 seconds, a difference of 1.23 seconds.

This dataset is relatively small compared to what you would typically see in a partitioned database, but if you had to run a similar query on 500 million rows, you can see how partitioning could make a big difference.

Next, delete everybody born in 1990:

DELETE FROM people WHERE EXTRACT(year FROM birth_date) = 1990;

In this case, 73,015 rows were deleted from each table. The non-partitioned table took 5.431 seconds, and the partitioned table finished deleting the same rows in 3.688 seconds, 1.74 seconds faster.

A great use case for partitioning is data that accumulates quickly in large quantities like up-to-the-minute weather data. This data is very relevant near the time it is collected but becomes much less useful a week later. Because the partitions are just tables, you can just drop irrelevant tables, making deletes even faster.

Postgres Data Partitioning in Django

Django's ORM doesn't have built-in support for partitioned tables, so if you want to use partitions in your application, it's going to take a little extra work.

One way to use partitions is to roll your own migrations that run raw SQL. This will work, but it means you're going to have to manually manage the migrations for all changes you make to the table in the future.

Another option is to use a package called django-postgres-extra. Django-postgres-extra offers support for several PostgreSQL features that are not built into Django’s ORM, for example, support for TRUNCATE TABLE and table partitioning.

After you install the package, add it to your installed apps:

INSTALLED_APPS = [
    ...
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'psqlextra',
    ...
]

Next, set your partitioned model to inherit from PostgresPartitionedModel from psqlextra. You'll also need to add an inner PartitioningMeta class that defines what kind of partitioning you would like to use (range, list, or hash) and the column you'd like to partition by:

from django.db import models
from psqlextra.types import PostgresPartitioningMethod
from psqlextra.models import PostgresPartitionedModel

class Person(PostgresPartitionedModel):
    class PartitioningMeta:
        method = PostgresPartitioningMethod.RANGE
        key = ["birth_date"]
    
    full_name = models.TextField()
    birth_date = models.DateField()

Create the migration with python manage.py pgmakemigrations. You should get a file that looks something like this:

# Generated by Django 3.1.2 on 2020-10-13 23:34

from django.db import migrations, models
import psqlextra.backend.migrations.operations.add_default_partition
import psqlextra.backend.migrations.operations.create_partitioned_model
import psqlextra.manager.manager
import psqlextra.models.partitioned
import psqlextra.types


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        psqlextra.backend.migrations.operations.create_partitioned_model.PostgresCreatePartitionedModel(
            name='Person',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('full_name', models.TextField()),
                ('birth_date', models.DateField()),
            ],
            options={
                'abstract': False,
                'base_manager_name': 'objects',
            },
            partitioning_options={
                'method': psqlextra.types.PostgresPartitioningMethod['RANGE'],
                'key': ['birth_date'],
            },
            bases=(psqlextra.models.partitioned.PostgresPartitionedModel,),
            managers=[
                ('objects', psqlextra.manager.manager.PostgresManager()),
            ],
        ),
        psqlextra.backend.migrations.operations.add_default_partition.PostgresAddDefaultPartition(
            model_name='Person',
            name='default',
        ),
    ]

Next, create some empty migration files - one for each partition.

You can create an empty migration with python manage.py makemigrations --empty yourappname. Then, use django-postgres-extra to set up the migrations:

from django.db import migrations
from psqlextra.backend.migrations.operations import PostgresAddRangePartition

class Migration(migrations.Migration):
    dependencies = [
        ('people', '0001_initial'),
    ]

    operations = [
        PostgresAddRangePartition(
            model_name="person",
            name="people_partitioned_birthdays_1800_to_1850",
            from_values='1800-01-01',  # inclusive lower bound
            to_values='1850-01-01',    # exclusive upper bound
        ),
    ]

Again, you'll need to create one of these for every partition you need. To get our example from the section above to work, I would need five migrations in addition to the one created for the model.

Creating a migration to delete one of your partitions is basically the same:

from django.db import migrations
from psqlextra.backend.migrations.operations import PostgresDeleteRangePartition

class Migration(migrations.Migration):
    operations = [
        PostgresDeleteRangePartition(
            model_name="person",
            name="people_partitioned_birthdays_1800_to_1850",
        ),
    ]

Now, when you use the Django model, the data will be stored across the partitions, but the ORM will work as you expect it to for any Django application.

Conclusion

In this article, you learned about the different types of partitions available in PostgreSQL. You saw that a partition is just a table that links to a parent table and helps organize data so that it can be accessed faster. Finally, you saw that even though Django's ORM does not natively support partitioning, it is possible to use the feature with the help of the django-postgres-extra package. It is also possible to create your own migrations and set it up that way.

No matter how you decide to go about it, it’s important to remember that you shouldn't use partitions unless you are sure it's the right move for your project. Partitions will make individual tables smaller but give you more tables to manage and for Postgres to search. They are typically best used for tables larger than 100 GB.


About the Author

Josh is a former educator turned developer with a proven ability to learn quickly and adapt to different roles. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He's always looking for a new challenge and a dedicated team to collaborate with.


GeoDjango and PostGIS in Django

Adeyinka Adegbenro — Thu, 24 Jun 2021 12:00:00 GMT

In this article, I’ll introduce you to spatial data in PostgreSQL and Django. You’ll learn how to use PostGIS and GeoDjango to create, store, and manipulate geographic data (both raster and vector) in a Python web application.

Spatial data is any geographic data that contains information related to the earth, such as rivers, boundaries, cities, or natural landmarks. It describes the contours, topology, size, and shape of these features. Maps are a common method of visualizing spatial data, which is typically represented in vector or raster form. Along the way, you’ll see several use cases for spatial data that you’re likely to encounter as a software developer.

If you are interested in reading about PostGIS in Rails I can recommend our PostGIS vs. Geocoder in Rails article on the pganalyze blog where we compare PostGIS in Rails with Geocoder and highlight a couple of the areas where you'll want to (or need to) reach for one over the other.

Vector data vs. raster data

Vector data is a representation of the earth using points, lines, and polygons. A point is used to represent small, discrete areas using an “x” and “y” coordinate. Connected points create lines, which may be used to describe roads, streams, and networks. Polygons are formed from an enclosed connection of lines and represent features with an enclosed area like buildings, islands, and borders. Vector data types are more common in relational databases than raster data.

Raster data, on the other hand, is a representation of geographic data in pixels. It typically refers to aerial or satellite imagery of the earth. Rasters are usually stored as a grid of rows and columns with relevant metadata, such as measurements and resolution. Raster data is faster and less expensive to create than vector data types.

Vector data vs. raster data
Spatial data in Postgres with PostGIS
GeoDjango for spatial data in Django
More about GeoDjango and PostGIS

Spatial data in Postgres with PostGIS

Whenever you need to answer questions about your geographic environment, such as “How far is the hospital?”, “Where is the closest store?”, “How high is that skyscraper?”, or “What is the fastest route?”, spatial data is likely to come into play.

Spatial data is also used in statistics for analyzing patterns and relationships between elements. For example, when analyzing the spread of a disease in a geographical area, hot zones can be identified and quarantined using spatial data. These data can be used to identify the source of an outbreak, the zoning of cities, and much more. Because more software applications are dependent on location, the manner with which you manage and store spatial data is more critical than ever.

PostgreSQL, on its own, ships only basic geometric types and does not support geographic coordinate systems or spatial analysis. This is where PostGIS comes in. PostGIS is a free, open-source extension that adds spatial data capabilities to PostgreSQL databases. PostGIS allows you to store spatial data and use its library of functions to manipulate it. A database with PostGIS can store geographic coordinates, lines, and shapes and query them using spatial functions.

If you use a Database-as-a-Service provider such as Amazon RDS or Google Cloud SQL, PostGIS is likely to already be installed. If you run your own server, check the PostGIS website for details. Once installed, enabling PostGIS is as simple as:

CREATE EXTENSION postgis;

Now, let's see how we can work with geospatial data in Django.

GeoDjango for spatial data in Django

GeoDjango is a Django module used for creating geographic applications. It can be used to manage a spatial database in Python. It comes integrated with Django, but can be used as a standalone framework as well. It aims to make it as easy as possible to create location-based web applications.

In the following sections, you’ll see four different use cases for GeoDjango. These will illustrate how you can create, store, and retrieve spatial data in a Django application backed by a Postgres database that uses PostGIS. You’ll also see how to use spatial data for common operations like finding the distance between two locations in space.

Saving polygons Using GEOSGeometry

A polygon is a type of vector data: a connection of Points that form an enclosed shape. You can add a polygon to a spatial database in Django using GEOSGeometry.

The GEOSGeometry class comes from the GEOS API. It takes two arguments, the first argument being a string input which represents the geometry being saved, and a second optional argument, an SRID (spatial reference identifier) number. The SRID is a unique identifier that defines what coordinate system you would like to use and describes how to convert data to real-world locations. When performing geospatial functions such as finding distance and area data, it is important to use data with the same SRID as the one used in the database to ensure the correct result.

To save a Polygon to a spatial database using GEOSGeometry, make sure a Polygon field is defined on your model. Suppose you have a Bank model that represents all the banks in a state with a PolygonField (poly) that outlines the physical real-life boundary and shape of a particular bank branch:

from django.contrib.gis.db import models

class Bank(models.Model):
    name = models.CharField(max_length=20)
    address = models.CharField(max_length=128)
    zip_code = models.CharField(max_length=5)
    poly = models.PolygonField()

    def __str__(self):
        return self.name

To store data on such a field with GEOSGeometry, you can run the following:

>>> from app.models import Bank
>>> from django.contrib.gis.geos import GEOSGeometry
>>> polygon = GEOSGeometry('POLYGON ((-98.503358 29.335668, -98.503086 29.335668, -98.503086 29.335423, -98.503358 29.335423, -98.503358 29.335668))', srid=4326)
>>> bank = Bank(name='Suntrust Bank', address='144 Monsourd Blvd, San Antonio Texas, USA',zip_code='78221', poly=polygon)
>>> bank.save()

Using the GEOSGeometry class, you have created a Polygon object that represents an outline of a certain Suntrust bank in San Antonio, Texas. Each coordinate given to the POLYGON parameter defines a “corner” of the building’s outline.
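
If you start from a list of corner coordinates rather than a ready-made string, you can build the WKT input yourself. The helper below is a hypothetical sketch (polygon_wkt is not part of GeoDjango); it relies on the fact that a WKT polygon ring must be closed, meaning the last coordinate repeats the first.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


def polygon_wkt(corners: List[Coordinate]) -> str:
    """Build a WKT POLYGON string from a list of corner coordinates."""
    if len(corners) < 3:
        raise ValueError("a polygon needs at least three corners")
    ring = list(corners)
    # WKT requires a closed ring: the last coordinate repeats the first.
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    points = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({points}))"


corners = [
    (-98.503358, 29.335668),
    (-98.503086, 29.335668),
    (-98.503086, 29.335423),
    (-98.503358, 29.335423),
]
print(polygon_wkt(corners))
```

The resulting string can be passed to GEOSGeometry together with an SRID, in the same way as the hard-coded polygon above.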

Saving Models with Raster Fields Using GDALRaster

When working with raster data, you need a field for storing a raster (called a RasterField). Raster functionality has always been part of PostGIS, but as of PostGIS 3.0, it has been split out into a separate extension. After installation, make sure the extension is enabled in your database by running:

CREATE EXTENSION postgis_raster;

Now, suppose you have a model called Elevation with a raster field on it. The Elevation model would represent the vertical and horizontal dimension of different surfaces, and the RasterField on it (rast, as seen below) would be a field that takes in an abstracted raster object describing the elevation. For example, it could be a satellite mapping of the terrain of a hill:

from django.contrib.gis.db import models

class Elevation(models.Model):
    name = models.CharField(max_length=100)
    rast = models.RasterField()

The RasterField stores a GDALRaster object. GDALRaster is an object that supports the reading of spatial file formats such as raster files. It can be instantiated with two inputs. The first parameter can be a string representing a file path, a dictionary describing a new raster, or a bytes object containing the raster data. The second parameter specifies whether the raster should be opened in “write mode.” If you don’t use write mode, you cannot modify the raster data.

Below, GDALRaster takes in the raster.tif file, reads it as a file object and abstracts it into a GDALRaster object that can be stored in the model’s RasterField:

>>> from django.contrib.gis.gdal import GDALRaster
>>> rast = GDALRaster('/path/to/raster/raster.tif', write=True)
>>> rast.name
'/path/to/raster/raster.tif'

>>> rast.width, rast.height # this file has 163 by 174 pixels
(163, 174)

>>> topography = Elevation(name='Mount Fuji', rast=rast)
>>> topography.save()

In this way, you can store a raster’s .tif image file representing the terrain of Mount Fuji.

A new raster can also be created from raw data using a Python dictionary containing parameters such as width, height, srid, and bands. Below, you can see how to define a new raster that describes a canyon with a width and height of 10 pixels and a single band, where each band represents one layer of data in the raster:

>>> rst = GDALRaster({'width': 10, 'height': 10, 'name': 'canyon', 'srid': 4326, 'bands': [{"data": range(100)}]})
>>> rst.name
'canyon'
>>> topography = Elevation(name='Canyon', rast=rst)
>>> topography.save()

Searching for Points in Space Using Geometry Lookups

Geometry Lookups help you find points, lines, and polygons within another geometry. For example, you can use geometry lookups to determine if a point lies within a polygon's surface.

First, create a Country model defined as follows:

from django.contrib.gis.db import models

class Country(models.Model):
    name = models.CharField(max_length=50)
    area = models.IntegerField()
    pop2005 = models.IntegerField('Population 2005')
    fips = models.CharField('FIPS Code', max_length=2, null=True)
    iso2 = models.CharField('2 Digit ISO', max_length=2)
    iso3 = models.CharField('3 Digit ISO', max_length=3)
    un = models.IntegerField('United Nations Code')
    region = models.IntegerField('Region Code')
    subregion = models.IntegerField('Sub-Region Code')
    lon = models.FloatField()
    lat = models.FloatField()

    # GeoDjango-specific: a geometry field (MultiPolygonField)
    mpoly = models.MultiPolygonField()

    # Returns the string representation of the model.
    def __str__(self):
        return self.name

Country represents a table that stores the boundaries of world countries. Next, you can use GeoDjango to check whether a particular Point lies within the mpoly field of any of the countries in the database:

>>> from app.models import Country
>>> from django.contrib.gis.geos import Point
>>> point = Point(954158.1, 4215137.1, srid=32140)
>>> Country.objects.filter(mpoly__contains=point)
<QuerySet [<Country: United States>]>

You can also do a spatial lookup to determine if a point is inside a particular country. Run the code below to define a Point object that represents a location in Valdagrone, San Marino. Then, you can search for this Point using the contains method:

>>> san_marino = Country.objects.get(name='San Marino')
>>> pnt = Point(12.4604, 43.9420) # Valdagrone, San Marino
>>> san_marino.mpoly.contains(pnt)
True
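
GeoDjango delegates the actual containment check to GEOS and PostGIS. To build intuition for what such a check involves, here is a simplified ray-casting sketch in plain Python. It is not the algorithm those libraries use, the point_in_polygon helper is hypothetical, it works on flat x/y coordinates, and it does not treat points that sit exactly on an edge specially.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]


def point_in_polygon(point: Coordinate, polygon: List[Coordinate]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    # Walk every edge (previous vertex -> current vertex) of the polygon.
    previous = polygon[-1]
    for current in polygon:
        (x1, y1), (x2, y2) = previous, current
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            crossing_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < crossing_x:
                inside = not inside
        previous = current
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), square))
print(point_in_polygon((5.0, 2.0), square))
```

An odd number of edge crossings means the point is inside, and an even number means it is outside.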

Calculating the distance between points

Finally, GeoDjango can be used to calculate the distance between two points. Assuming you know two point coordinates and want to find the distance between them, you could run the following in your Python shell:

>>> from django.contrib.gis.geos import GEOSGeometry
>>> point1 = GEOSGeometry('SRID=4326;POINT(-167.8522796630859 65.55173492431641)').transform(900913, clone=True) # Tin City, Alaska
>>> point2 = GEOSGeometry('SRID=4326;POINT(-165.4089813232422 64.50033569335938)').transform(900913, clone=True) # Nome, Alaska
>>> distance = point1.distance(point2) # in meters
>>> distance / 1000 # in Kilometers
388.3890308954561

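This example uses the transform method to convert the Point coordinates from latitude/longitude decimal degrees to a projected coordinate system measured in meters. Note that SRID 900913 is a Web Mercator projection, which stretches distances increasingly toward the poles, so at high latitudes such as Alaska the result overstates the true ground distance.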
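
As a cross-check that does not depend on any projection, you can compute a great-circle distance directly from the longitude and latitude values. The sketch below uses the haversine formula with a spherical Earth of radius 6371 km, which is an approximation; the haversine_km helper is hypothetical and not part of GeoDjango. At high latitudes its result can differ noticeably from planar distances measured in a Web Mercator projection.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


tin_city = (-167.8522796630859, 65.55173492431641)  # Tin City, Alaska
nome = (-165.4089813232422, 64.50033569335938)      # Nome, Alaska
print(round(haversine_km(*tin_city, *nome), 1))
```

Inside the database, a geography column or a projection suited to the region serves the same purpose.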

To illustrate a more Django-specific example, you could create a model for cities in the United States that looks like this:

from django.contrib.gis.db import models

class Cities(models.Model):
    feature = models.CharField(max_length=20)
    name = models.CharField(max_length=30)
    county = models.CharField(max_length=20)
    state = models.CharField(max_length=20)
    the_geom = models.PointField()

    # Returns the string representation of the model.
    def __str__(self):
        return self.name

To calculate the distance between the cities Point Hope and Point Lay, you can use the models like this:

>>> from app.models import Cities
>>> pt_hope = Cities.objects.get(name='Point Hope')
>>> pt_lay = Cities.objects.get(name='Point Lay')
>>> pt_hope_meters = pt_hope.the_geom.transform(900913, clone=True)
>>> pt_lay_meters = pt_lay.the_geom.transform(900913, clone=True)
>>> pt_hope_meters.distance(pt_lay_meters)
594946.4349305361

GeoDjango also provides some distance lookup functions such as distance_lt, distance_lte, distance_gt, distance_gte and dwithin. For example:

>>> from django.contrib.gis.geos import Point
>>> from django.contrib.gis.measure import D
>>> pnt = Point(-163.0928955078125, 69.72028350830078) # Point Lay
>>> dist = Cities.objects.filter(the_geom__distance_lte=(pnt, D(km=7))) # find all cities within 7 kilometers of Point Lay
>>> dist = Cities.objects.filter(the_geom__distance_gte=(pnt, D(mi=20))) # find all cities greater than or equal to 20 miles away from Point Lay

In this way, you can use GeoDjango to find the distance between two models having location points or two raw point objects. Combining this method with vector or raster data about roads, you could build complex distance calculations for driving, walking, or biking into your application.

Using Postgres Row-Level Security in Ruby on Rails

Eze Sunday Eze — Tue, 25 May 2021 12:00:00 GMT

Securing access to your Postgres database is more important than ever. With applications growing more complex, oftentimes using multiple programming languages and frameworks within the same app, it can be challenging to ensure access to customer data is handled consistently. For example, if you are building a SaaS application where different companies use the application, you don't want users of Company A to see the data of users in Company B by accident.

Sure, you could create a separate Postgres schema for each customer, or try to ensure the WHERE clause of every single query includes the particular company—but what if you forget a WHERE clause? That means users from company A will be able to see or manipulate data from company B and maybe other companies, at some point. You don't want that to happen.

Row-Level Security (RLS) solves this problem. It is an additional layer of security that allows you to limit access to database rows based on the currently logged in database user or other attributes of a Postgres connection. With RLS, you wouldn't even need to add a WHERE clause to your queries to limit access to certain rows because users will be able to access only rows that the Row-Level Security policy allows them to have access to.

In this post, you are going to learn how Row-Level Security works with Postgres and how you can implement it in your Rails app. As a side note: Should you be interested in learning how to use Row-Level Security with Python and Django, you can read our dedicated article about it here: Using Postgres Row-Level Security in Python and Django.

Row-Level Security in Postgres
Row-Level Security in Ruby on Rails
Performance Implications of using RLS in Postgres
Conclusion
About the Author

Row-Level Security in Postgres

Row-Level Security is an advanced security feature that was first released in PostgreSQL 9.5. Instead of adding restrictions to an entire table, with RLS, we can add fine-grained access restrictions for individual rows based on policies. You can imagine RLS like an implicit WHERE clause that automatically gets added to all your reads and writes on specific tables.

There are trade-offs to consider with RLS, and it may not always be the best fit because of implementation complexity and performance implications. We'll get to these later on, but let's take a look at how RLS works first.

How to Create a Postgres RLS Policy

For this example we assume our customers store financial records with us, and we are looking to use RLS for ensuring no data gets shared by accident with other customers.

Let's create a transactions table to start with:

CREATE TABLE transactions (
    id uuid PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
    customer_id bigint NOT NULL,
    description text NOT NULL,
    amount_cents bigint NOT NULL,
    created_at timestamptz NOT NULL
);

We could use the GRANT mechanism in Postgres to restrict access, but that only works in an all-or-nothing approach - it doesn't let you restrict access to certain rows in the table.

This is what Row-Level Security helps us with. Enable RLS on the transactions table we just created using the ALTER TABLE command:

ALTER TABLE transactions ENABLE ROW LEVEL SECURITY;

Since we have not created a policy yet, this will enable a default-deny policy on the table, meaning all access is denied. However, the table owner, superusers and roles with the BYPASSRLS attribute will not be subject to this policy.

Now, we'll need to create a policy that defines the database access for our application user, depending on which end customer is currently logged in.

For mapping an end customer to an RLS policy, we have two options:

(1) Create separate database users for each customer, and check the current_user in the RLS policy
(2) Use a session variable that indicates which customer is logged in, e.g. by calling SET rls.customer_id = 42, and then checking that in the policy using current_setting

From a security and isolation perspective, using separate database users is clearly better, but it ends up being complicated to manage in practice. This is especially the case when using a framework like Ruby on Rails that would have to maintain per-user connection pools. You can take a look at the Postgres documentation to see an example of how RLS with separate database users works.

Using Postgres session variables in Row-Level Security policies

For this post we'll focus on using session variables for determining which end customer is currently logged in, and checking that variable in our RLS policy.

We'll use the variable rls.customer_id to identify the current customer ID. Note that you can use any variable name, e.g. myapp.user_id - just make sure the name doesn't conflict with any Postgres config settings.

To start, we'll create a new Postgres database user for our application. It's generally a good practice to keep administrative Postgres users separate from regular users, and this is especially important with RLS since the administrative user would typically be the table owner, and that user would by default always have full access to the table.

Let's create the user on our database:

CREATE USER app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON transactions TO app_user;

If we connect with this user to the database, and attempt to query the transactions table, we'll get an empty result.

SELECT * FROM transactions;

 id | customer_id | description | amount_cents | created_at 
----+-------------+-------------+--------------+------------
(0 rows)

This is by design - if the RLS policy denies access for a SELECT you will simply get an empty result. You can imagine the default-deny RLS policy as a WHERE false clause that will always return nothing.

When we attempt to insert data we can see the RLS policy in effect more easily:

INSERT INTO transactions(customer_id, description, amount_cents, created_at)
  VALUES (1, 'test', 4200, '2020-01-01 00:00:00');

ERROR:  new row violates row-level security policy for table "transactions"

Let's create a policy to replace the default-deny policy, that allows access based on the current value of the rls.customer_id session variable:

CREATE POLICY transactions_app_user
  ON transactions
  TO app_user
  USING (customer_id = NULLIF(current_setting('rls.customer_id', TRUE), '')::bigint);

This permits SELECT, INSERT, UPDATE and DELETE access if the value of the customer_id column matches the rls.customer_id session variable. Since session variables use the text type, we need to cast the value to bigint in the policy definition for comparison. We use NULLIF to turn an empty value into NULL, meaning no access, since an empty string cannot be cast to bigint.
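To make the NULLIF and cast behavior concrete, here is a small Ruby model of how the policy expression evaluates. This is only an illustration of the logic, not how Postgres runs policies, and the method name row_visible? is made up for this sketch:

```ruby
# Models the USING expression:
#   customer_id = NULLIF(current_setting('rls.customer_id', TRUE), '')::bigint
# `setting` stands in for the session variable's text value (nil if never set).
def row_visible?(row_customer_id, setting)
  # NULLIF(x, '') turns the empty string into NULL (nil here)
  value = setting == '' ? nil : setting
  # A comparison with NULL yields NULL, which the policy treats as "no access"
  return false if value.nil?

  # The ::bigint cast of the text value
  row_customer_id == Integer(value, 10)
end

puts row_visible?(1, '1') # matching customer
puts row_visible?(1, '2') # different customer
puts row_visible?(1, '')  # variable reset to the empty string
puts row_visible?(1, nil) # variable never set
```

Only the first call reports the row as visible. Every other case falls through to "no access", which is the default-deny behavior we want.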

When we connect now, we would still get the same error by default, since rls.customer_id would not be set yet. However, when we set the rls.customer_id, our query will succeed:

SET rls.customer_id = 1;
INSERT INTO transactions(customer_id, description, amount_cents, created_at)
  VALUES (1, 'test', 4200, '2020-01-01 00:00:00');

Testing RLS permissions with different customers

If we attempted to add a record for a different user, that would fail, since it violates the RLS policy:

INSERT INTO transactions(customer_id, description, amount_cents, created_at)
  VALUES (2, 'test2', 2300, '2020-01-01 00:00:00');

ERROR:  new row violates row-level security policy for table "transactions"

When we query the table, we can see the row we just added:

SELECT * FROM transactions;

                  id                  | customer_id | description | amount_cents |     created_at      
--------------------------------------+-------------+-------------+--------------+---------------------
 bfd4b810-487d-4622-af24-73d284fb90d4 |           1 | test        |         4200 | 2020-01-01 00:00:00
(1 row)

However, if we change the customer ID, the data of the other customer is no longer visible:

SET rls.customer_id = 2;
SELECT * FROM transactions;

 id | customer_id | description | amount_cents | created_at 
----+-------------+-------------+--------------+------------
(0 rows)

You can see how this provides protection against accidentally inserting or querying the data for the wrong customer. If you consistently set (and reset!) the rls.customer_id variable, it ensures that all queries made are only seeing data for that particular customer.

Now, the big caveat with this approach is that SQL injection could enable an attacker to issue their own SET command, and thereby access other customers' data. The session variable based approach is only safe when you protect yourself against SQL injections. Modern frameworks like Ruby on Rails are generally good at this, but you may want to consider running additional tools like brakeman to ensure hand-written queries are correctly sanitized.
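One extra safeguard is to never build the SET statement from anything other than a validated integer. Here is a minimal sketch of that idea. The helper name set_customer_id_sql is hypothetical, and this complements rather than replaces the framework's own quoting:

```ruby
# Hypothetical helper: builds the SET statement only from a validated integer.
def set_customer_id_sql(customer_id)
  # Integer(..., 10) raises ArgumentError for anything that is not a plain
  # base-10 integer, e.g. "1; SET rls.customer_id = 2" or an empty string
  id = Integer(customer_id.to_s, 10)
  "SET rls.customer_id = #{id}"
end

puts set_customer_id_sql('42')

begin
  set_customer_id_sql('1; SET rls.customer_id = 2')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because only the parsed integer is interpolated, an injected string never reaches the database.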

Let's see how you can implement RLS in your Rails app.

Row-Level Security in Ruby on Rails

Ruby on Rails does not provide any built-in integration with RLS, and as mentioned earlier it's complicated to use an RLS setup where you have one database user per end customer, since Rails would have to keep separate connection pools for each user. The session variable based approach, however, is fairly straightforward to implement.

First of all, let's review what we need to do:

(1) Set up our database and tables for RLS through Rails migrations
(2) Use a different user for our application than for our migrations
(3) Set the customer ID when entering a customer-specific context, and reset the customer ID when exiting that context (to avoid leaks)

Let's take a look at the migration first:

Creating RLS enabled tables in Rails migrations

To keep things simple, we'll assume that you are adding the Transaction model and associated transactions table to the application. This example assumes you have dropped the table we created manually earlier.

rails g model transaction

class Transaction < ApplicationRecord
end

class CreateTransactions < ActiveRecord::Migration[6.1]
  def change
    create_table :transactions, id: :uuid do |t|
      t.bigint :customer_id
      t.text :description
      t.bigint :amount_cents
      t.timestamptz :created_at
    end

    # Grant application user permissions on the table (this migration should run as the admin user)
    reversible do |dir|
      dir.up do
        execute 'GRANT SELECT, INSERT, UPDATE, DELETE ON transactions TO app_user'
      end
      dir.down do
        execute 'REVOKE SELECT, INSERT, UPDATE, DELETE ON transactions FROM app_user'
      end
    end

    # Define RLS policy
    reversible do |dir|
      dir.up do
        execute 'ALTER TABLE transactions ENABLE ROW LEVEL SECURITY'
        execute "CREATE POLICY transactions_app_user ON transactions TO app_user USING (customer_id = NULLIF(current_setting('rls.customer_id', TRUE), '')::bigint)"
      end
      dir.down do
        execute 'DROP POLICY transactions_app_user ON transactions'
        execute 'ALTER TABLE transactions DISABLE ROW LEVEL SECURITY'
      end
    end
  end
end

Before we run the migration, let's make sure we use two separate database users for migrations and the actual application.

Using separate users for migrations in Rails

Whilst Rails now has built-in support for multiple database connections, it's not really suited for running the migrations with a different user. Luckily there is a simple solution to this, that works in most Rails versions.

Typically a Rails production app has a Procfile that is used to define which process types can be created for the app. On your local machine foreman can be used to handle the Procfile. The simplest Procfile looks like this:

web: bundle exec puma -C ./config/puma.rb
console: bundle exec rails console
migrate: bundle exec rake db:migrate

We'll assume that you commonly specify the database connection using the DATABASE_URL variable, as would be the case when using Heroku for example. Through a bash variable substitution we can use a separate environment variable called DATABASE_URL_ADMIN for database migrations:

web: bundle exec puma -C ./config/puma.rb
console: bundle exec rails console
migrate: DATABASE_URL=${DATABASE_URL_ADMIN:-$DATABASE_URL} bundle exec rake db:migrate
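If the ${VAR:-default} syntax is unfamiliar, the same fallback can be expressed in Ruby. This is just a sketch to show what the substitution does. The method name migration_database_url is made up, and the Procfile does not need it:

```ruby
# Mirrors ${DATABASE_URL_ADMIN:-$DATABASE_URL}: use the admin URL when it is
# set and non-empty, otherwise fall back to the regular URL.
def migration_database_url(env = ENV)
  admin = env['DATABASE_URL_ADMIN']
  (admin.nil? || admin.empty?) ? env['DATABASE_URL'] : admin
end

# Plain hashes stand in for the process environment here.
puts migration_database_url(
  'DATABASE_URL' => 'postgresql://app_user@127.0.0.1:5432/rlstest',
  'DATABASE_URL_ADMIN' => 'postgresql://app_admin@127.0.0.1:5432/rlstest'
)
puts migration_database_url(
  'DATABASE_URL' => 'postgresql://app_user@127.0.0.1:5432/rlstest'
)
```

The first call picks the admin URL and the second falls back to the application URL, so environments without a separate admin user keep working.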

For local testing we can configure both database connection variables in our .env file:

DATABASE_URL=postgresql://app_user@127.0.0.1:5432/rlstest
DATABASE_URL_ADMIN=postgresql://app_admin@127.0.0.1:5432/rlstest

When we call foreman, it will result in the migrations running as the admin user:

foreman run migrate

Similarly, on Heroku, you could have this run as part of your release command, or manually trigger the migrate process type.

Again, this separation is important so we can ensure the application always sets a particular customer ID for queries, and does not get the "free for all" that table owners get which permits access on the whole table.

In case you want to run some of your code as the admin user you could set up a separate connection for that using Rails' multiple database connections feature.

Setting the Customer ID in Rails

To ensure we access the database with the correct customer ID, we can first add helpers to the ApplicationRecord:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  SET_CUSTOMER_ID_SQL = 'SET rls.customer_id = %s'.freeze
  RESET_CUSTOMER_ID_SQL = 'RESET rls.customer_id'.freeze
  def self.with_customer_id(customer_id, &block)
    begin
      connection.execute format(SET_CUSTOMER_ID_SQL, connection.quote(customer_id))
      block.call
    ensure
      connection.execute RESET_CUSTOMER_ID_SQL
    end
  end
end

And then we can add an around filter to our ApplicationController, making sure the correct customer gets set based on an existing current_user method (e.g. from an authentication library like Devise).

class ApplicationController < ActionController::Base
  around_action :with_customer_id

  def with_customer_id
    ApplicationRecord.with_customer_id(current_user.id) do
      yield
    end
  end
end

With this filter in place all queries within the request will automatically be limited to the current customer - thanks to RLS.

We can also use this when querying data in the console:

ApplicationRecord.with_customer_id(1) do
  puts Transaction.all.to_a.inspect
end

   (1.4ms)  SET rls.customer_id = 1
  Transaction Load (1.6ms)  SELECT "transactions".* FROM "transactions"
[#<Transaction id: "bfd4b810-487d-4622-af24-73d284fb90d4", customer_id: 1, description: "test", amount_cents: 4200, created_at: "2020-01-01 00:00:00.000000000 +0000">]
   (1.5ms)  RESET rls.customer_id

It's important that your application keeps using the same Postgres connection for its queries, as the one that we issued the SET command on. Rails currently makes this quite straightforward, as the same connection will be used within a single Rails web request. Connections are returned to the pool after a request has finished (and we would have called RESET rls.customer_id).

If you have code that directly interacts with the Rails connection pool you should review the Rails connection pool documentation or consider using a transaction and SET LOCAL.

Any custom code that interacts with the Rails connection pool, as well as third-party connection poolers such as pgbouncer in transaction pooling mode, carries a risk that the security context gets mixed up, since a different connection could run the queries than the one that used the SET command. In those cases using a wrapping transaction together with SET LOCAL is the safest approach.
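Here is a rough sketch of what the SET LOCAL variant could look like. The helper name with_customer_id_local is hypothetical. To keep the example runnable without a database, it uses a small stand-in class that only records SQL, with method names that mirror the ones an ActiveRecord connection provides (transaction, execute, quote):

```ruby
# Stand-in for a database connection so the sketch runs without Postgres.
# It only records the SQL it is given.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def transaction
    @statements << 'BEGIN'
    result = yield
    @statements << 'COMMIT'
    result
  rescue StandardError
    @statements << 'ROLLBACK'
    raise
  end

  def execute(sql)
    @statements << sql
  end

  # Simplified quoting for the stand-in: everything becomes a string literal.
  def quote(value)
    "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Hypothetical helper: scopes the customer ID to one transaction, so the
# setting disappears at COMMIT or ROLLBACK without an explicit RESET.
def with_customer_id_local(connection, customer_id)
  connection.transaction do
    connection.execute("SET LOCAL rls.customer_id = #{connection.quote(customer_id)}")
    yield
  end
end

conn = RecordingConnection.new
with_customer_id_local(conn, 1) { conn.execute('SELECT * FROM transactions') }
puts conn.statements
```

In a Rails app you would pass the real connection instead of the stand-in. Since the setting lives and dies with the transaction, it works with transaction pooling as long as all queries for that customer run inside the block.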

Performance Implications of using RLS in Postgres

Now, you might wonder - why doesn't everyone use RLS with their Rails applications?

There are multiple reasons why you might choose not to use RLS in your application, such as the additional complexity and maintenance overhead, or if your data model is not a good fit. One thing we haven't looked at yet is performance.

First of all, the good news is that Postgres has gotten better over time with considering RLS during query planning, especially since Postgres 10.

However, there are still some things to consider with regards to performance:

(1) Keep the USING clause of RLS policies simple, to avoid non-obvious performance issues
(2) When using custom functions in your queries, mark them as LEAKPROOF where that is safe to do - this allows the planner to run them early before RLS restrictions apply
(3) Ensure the columns referenced in the RLS policy USING clause are indexed

To illustrate that last point, let's look at the EXPLAIN plan of a query from earlier:

SET rls.customer_id = 1;
EXPLAIN ANALYZE SELECT * FROM transactions;

                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Seq Scan on transactions  (cost=0.00..28.22 rows=4 width=72) (actual time=0.012..0.027 rows=1 loops=1)
   Filter: (customer_id = (NULLIF(current_setting('rls.customer_id'::text, true), ''::text))::bigint)
 Planning Time: 0.073 ms
 Execution Time: 0.094 ms
(4 rows)

As you can see, Postgres automatically adds the implicit WHERE clause based on the RLS policy that applies. That seems relatively straightforward. However, note that we see a Sequential Scan here. It's important to include the columns used by RLS policies in your indexes, for example by creating a new index on customer_id:

CREATE INDEX ON transactions(customer_id);

Postgres monitoring tools such as auto_explain can be very helpful for finding outliers caused by bad plans that result from RLS.

Conclusion

In this article, we've learned how RLS works in Postgres, and how it can be used with Ruby on Rails. RLS is not complicated to use, but it is a separate layer of access control, which may seem non-obvious when you are used to a single database user with full permissions accessing the database. If you decide to use RLS, it's also a good idea to review the performance implications.

If you prefer to write less SQL yourself, you may want to take a look at the rls_rails library that provides useful helpers for both database migrations as well as setting of the current customer (or tenant) ID.

In case you determined that RLS is too complicated, but you would like a similar guarantee that every query is constrained to a specific tenant, you may want to take a look at activerecord-multi-tenant which automatically rewrites your queries on the Rails side to include a tenant_id, before they get sent to Postgres.

About the Author

Eze is a software developer and technical writer trying to make sense of the world—building amazing stuff and documenting every step of the journey.

A look at Postgres 14: Performance and Monitoring Improvements

Lukas Fittl — Fri, 21 May 2021 12:00:00 GMT

The first beta release of the upcoming Postgres 14 release was made available yesterday. In this article we'll take a first look at what's in the beta, with an emphasis on one major performance improvement, as well as three monitoring improvements that caught our attention.

Before we get started, I wanted to highlight what always strikes me as an important unique aspect of Postgres: Compared to most other open-source database systems, Postgres is not the project of a single company, but rather many individuals coming together to work on a new release, year after year. And that includes everyone who tries out the beta releases, and reports bugs to the Postgres project. We hope this post inspires you to do your own testing and benchmarking.

Now, I'm personally most excited about better connection scaling in Postgres 14. For this post we ran a detailed benchmark comparing Postgres 13.3 to 14 beta1 (note that the connection count is log scale):

Improved Active and Idle Connection Scaling in Postgres 14
Dive into memory use with pg_backend_memory_contexts
Track WAL activity with pg_stat_wal
Monitor queries with the built-in Postgres query_id
And 200+ other improvements in the Postgres 14 release!
Conclusion

Improved Active and Idle Connection Scaling in Postgres 14

Postgres 14 brings significant improvements for those of us that need a high number of database connections. The Postgres connection model relies on processes instead of threads. This has some important benefits, but it also has overhead at large connection counts. With this new release, scaling active and idle connections has gotten significantly better, and will be a major improvement for the most demanding applications.

For our test, we've used two 96 vCore AWS instances (c5.24xlarge), one running Postgres 13.3, and one running Postgres 14 beta1. Both of these use Ubuntu 20.04, with the default system settings, but the Postgres connection limit has been increased to 11,000 connections.

We use pgbench to test connection scaling of active connections. To start, we initialize the database with pgbench scale factor 200:

# Postgres 13.3
$ pgbench -i -s 200
...
done in 127.71 s (drop tables 0.02 s, create tables 0.02 s, client-side generate 81.74 s, vacuum 2.63 s, primary keys 43.30 s).

# Postgres 14 beta1
$ pgbench -i -s 200
...
done in 77.33 s (drop tables 0.02 s, create tables 0.02 s, client-side generate 48.19 s, vacuum 2.70 s, primary keys 26.40 s).

Already here we can see that Postgres 14 does much better in the initial data load.

We now launch read-only pgbench with a varying set of active connections, showing 5,000 concurrent connections as an example of a very active workload:

# Postgres 13.3
$ pgbench -S -c 5000 -j 96 -M prepared -T30
...
tps = 417847.658491 (excluding connections establishing)

# Postgres 14 beta1
$ pgbench -S -c 5000 -j 96 -M prepared -T30
...
tps = 495108.316805 (without initial connection time)

As you can see, the throughput of Postgres 14 at 5000 active connections is about 20% higher. At 10,000 active connections the improvement is 50% over Postgres 13, and at lower connection counts you can also see consistent improvements.
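The percentages can be double-checked from the pgbench output shown above. Here is a small Ruby sketch of that arithmetic. The helper name percent_change is made up for this example:

```ruby
# Relative change from one measurement to another, in percent.
def percent_change(from, to)
  (to - from) / from * 100.0
end

# Figures copied from the pgbench output above.
tps_gain = percent_change(417_847.658491, 495_108.316805) # read-only TPS, 5,000 connections
load_change = percent_change(127.71, 77.33)               # initial data load, in seconds

puts format('throughput change: %+.1f%%', tps_gain)
puts format('load time change:  %+.1f%%', load_change)
```

For the 5,000 connection run this works out to roughly 18.5% more throughput, and the initial data load takes a bit under 40% less time on Postgres 14 beta1.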

Note that you will usually see a noticeable TPS drop when the number of connections exceeds the number of CPUs. This is most likely due to CPU scheduling overhead, and not a limitation in Postgres itself. Now, most workloads don't actually have this many active connections, but rather a high number of idle connections.

The original author of this work, Andres Freund, ran a benchmark on the throughput of a single active query, whilst also running 10,000 idle connections. The query went from 15,000 TPS to almost 35,000 TPS - that's over 2x better than in Postgres 13. You can find all the details in Andres Freund's original post introducing these improvements.

Dive into memory use with pg_backend_memory_contexts

Have you ever been curious why a certain Postgres connection is taking up a higher amount of memory? With the new pg_backend_memory_contexts view you can take a close look at what exactly is allocated for a given Postgres process.

To start, we can calculate how much memory is used by our current connection in total:

SELECT pg_size_pretty(SUM(used_bytes)) FROM pg_backend_memory_contexts;

 pg_size_pretty 
----------------
 939 kB
(1 row)

Now, let's dive a bit deeper. When we query the table for the top 5 entries by memory usage, you will notice there is actually a lot of detailed information:

SELECT * FROM pg_backend_memory_contexts ORDER BY used_bytes DESC LIMIT 5;

          name           | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes 
-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
 CacheMemoryContext      |       | TopMemoryContext |     1 |      524288 |             7 |      64176 |           0 |     460112
 Timezones               |       | TopMemoryContext |     1 |      104120 |             2 |       2616 |           0 |     101504
 TopMemoryContext        |       |                  |     0 |       68704 |             5 |      13952 |          12 |      54752
 WAL record construction |       | TopMemoryContext |     1 |       49768 |             2 |       6360 |           0 |      43408
 MessageContext          |       | TopMemoryContext |     1 |       65536 |             4 |      22824 |           0 |      42712
(5 rows)

A memory context in Postgres is a memory region that is used for allocations to support activities such as query planning or query execution. Once Postgres completes work in a context, the whole context can be freed, simplifying memory handling. Through the use of memory contexts the Postgres source actually avoids doing manual free calls for the most part (even though it's written in C), instead relying on memory contexts to clean up memory in groups. The top memory context here, CacheMemoryContext is used for many long-lived caches in Postgres.

We can illustrate the impact of loading additional tables into a connection by running a query on a new table, and then querying the view again:

SELECT * FROM test3;
SELECT * FROM pg_backend_memory_contexts ORDER BY used_bytes DESC LIMIT 5;

          name           | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes 
-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
 CacheMemoryContext      |       | TopMemoryContext |     1 |      524288 |             7 |      61680 |           1 |     462608
...

As you can see, simply having queried a table on this connection retains about 2.5 kB of memory (used_bytes for CacheMemoryContext grew from 460112 to 462608), even after the query has finished. This caching of table information is done to speed up future queries, but can sometimes cause surprising amounts of memory usage for multi-tenant databases with many different schemas. You can now illustrate such issues easily through this new monitoring view.

If you'd like to access this information for processes other than the current one, you can use the new pg_log_backend_memory_contexts function which will cause the specified process to output its own memory consumption to the Postgres log:

SELECT pg_log_backend_memory_contexts(10377);

LOG:  logging memory contexts of PID 10377
STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
LOG:  level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
LOG:  level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used
LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
LOG:  level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used
LOG:  level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used
LOG:  level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
...
LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used
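Since this output ends up in the log rather than in a query result, a little parsing helps when you want to rank contexts by size. Here is a minimal Ruby sketch, assuming the line format shown above. The regular expression and the helper name parse_memory_context are our own and not part of Postgres:

```ruby
# Matches lines such as:
#   level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
CONTEXT_LINE = /level: (\d+); (.+?): (\d+) total in (\d+) blocks; (\d+) free \((\d+) chunks\); (\d+) used/

# Returns a hash for a memory context line, or nil for any other log line.
def parse_memory_context(line)
  m = CONTEXT_LINE.match(line)
  return nil unless m

  { level: m[1].to_i, name: m[2], total_bytes: m[3].to_i,
    free_bytes: m[5].to_i, used_bytes: m[7].to_i }
end

# A few lines copied from the log output above.
log = [
  'LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used',
  'LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used',
  'LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used',
  'LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used'
]

contexts = log.map { |line| parse_memory_context(line) }.compact
contexts.sort_by { |c| -c[:used_bytes] }.each do |c|
  puts format('%-20s %8d bytes used', c[:name], c[:used_bytes])
end
```

The "Grand total" line is skipped on purpose, since it summarizes the other lines rather than describing a context of its own.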

Track WAL activity with pg_stat_wal

Building on the WAL monitoring capabilities in Postgres 13, the new release brings a new server-wide summary view for WAL information, called pg_stat_wal.

You can use this to monitor WAL writes over time more easily:

SELECT * FROM pg_stat_wal;

-[ RECORD 1 ]----+------------------------------
wal_records      | 3334645
wal_fpi          | 8480
wal_bytes        | 282414530
wal_buffers_full | 799
wal_write        | 429769
wal_sync         | 428912
wal_write_time   | 0
wal_sync_time    | 0
stats_reset      | 2021-05-21 07:33:22.941452+00

With this new view we can get summary information such as how many Full Page Images (FPI) were written to the WAL, which can give you insights on when Postgres generated a lot of WAL records due to a checkpoint. Secondly, you can use the new wal_buffers_full counter to quickly see when the wal_buffers setting is set too low, which can cause unnecessary I/O that can be prevented by raising wal_buffers to a higher value.
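Because these are cumulative counters, the interesting numbers usually come from ratios or from the difference between two snapshots. Here is a minimal Ruby sketch of both. The helper names are made up, the first snapshot uses the values shown above, and the second snapshot is invented purely for illustration:

```ruby
# Derived ratios from a single pg_stat_wal snapshot (keys match the view's columns).
def wal_summary(stats)
  {
    avg_bytes_per_record: stats[:wal_bytes].to_f / stats[:wal_records],
    fpi_per_1000_records: stats[:wal_fpi] * 1000.0 / stats[:wal_records]
  }
end

# The counters only grow until stats_reset, so activity during an interval
# is the difference between two snapshots.
def wal_delta(earlier, later)
  later.map { |key, value| [key, value - earlier[key]] }.to_h
end

# Values copied from the pg_stat_wal output above.
first = { wal_records: 3_334_645, wal_fpi: 8_480, wal_bytes: 282_414_530, wal_buffers_full: 799 }
# Hypothetical second snapshot taken some time later, invented for this example.
second = { wal_records: 3_400_000, wal_fpi: 8_500, wal_bytes: 288_000_000, wal_buffers_full: 799 }

p wal_summary(first)
p wal_delta(first, second)
```

A wal_buffers_full delta that stays at zero between snapshots suggests the wal_buffers setting is keeping up during that interval.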

You can also get more details of the I/O impact of WAL writes by enabling the optional track_wal_io_timing setting, which then gives you the exact I/O times for WAL writes, and WAL file syncs to disk. Note this setting can have noticeable overhead, so it's best turned off (the default) unless needed.

Monitor queries with the built-in Postgres query_id

In a recent survey done by TimescaleDB in March and April 2021, the pg_stat_statements extension was named one of the top three extensions the surveyed user base uses with Postgres. pg_stat_statements is bundled with Postgres, and with Postgres 14 one of the important features of the extension got merged into core Postgres:

The calculation of the query_id, which uniquely identifies a query, whilst ignoring constant values. Thus, if you run the same query again it will have the same query_id, enabling you to identify workload patterns on the database. Previously this information was only available with pg_stat_statements, which shows aggregate statistics about queries that have finished executing, but now this is available with pg_stat_activity as well as in log files.

First we have to enable the new compute_query_id setting and restart Postgres afterwards:

ALTER SYSTEM SET compute_query_id = 'on';

If you use pg_stat_statements, query IDs will be calculated automatically, through the default compute_query_id setting of auto.

With query IDs enabled, we can look at pg_stat_activity during a pgbench run and see why this is helpful as compared to just looking at query text:

SELECT query, query_id FROM pg_stat_activity WHERE backend_type = 'client backend' LIMIT 5;

                                 query                                  |      query_id      
------------------------------------------------------------------------+--------------------
 UPDATE pgbench_tellers SET tbalance = tbalance + -4416 WHERE tid = 3;  | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -2979 WHERE tid = 10; | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + 2560 WHERE tid = 6;   | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -65 WHERE tid = 7;    | 885704527939071629
 UPDATE pgbench_tellers SET tbalance = tbalance + -136 WHERE tid = 9;   | 885704527939071629
(5 rows)

All of these queries are the same from an application perspective, but their text is slightly different, making it hard to find patterns in the workload. With the query ID however we can clearly identify the number of certain kinds of queries, and assess performance problems more easily. For example, we can group by the query ID to see what's keeping the database busy:

SELECT COUNT(*), state, query_id FROM pg_stat_activity WHERE backend_type = 'client backend' GROUP BY 2, 3;

 count | state  |       query_id       
-------+--------+----------------------
    40 | active |   885704527939071629
     9 | active |  7660508830961861980
     1 | active | -7810315603562552972
     1 | active | -3907106720789821134
(4 rows)

When you run this on your own system you may find that the query ID is different from the one shown here. This is due to query IDs being dependent on the internal representation of a Postgres query, which can be architecture dependent, and also considers internal IDs of tables instead of their names.
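To get an intuition for why ignoring constants makes queries group together, here is a deliberately crude Ruby sketch that works on query text. It is not how Postgres computes query_id (that is based on the internal representation of the query, as noted above), and the helper names normalize_query and text_fingerprint are made up for this example:

```ruby
require 'digest'

# Crude text-based normalization: replaces quoted strings and numeric
# literals with a placeholder. Real query IDs are not computed this way.
def normalize_query(sql)
  sql.gsub(/'(?:[^']|'')*'/, '?').gsub(/-?\b\d+(?:\.\d+)?\b/, '?')
end

# Short hash of the normalized text, used only as a grouping key here.
def text_fingerprint(sql)
  Digest::SHA1.hexdigest(normalize_query(sql))[0, 16]
end

# Query texts copied from the pg_stat_activity output above.
queries = [
  'UPDATE pgbench_tellers SET tbalance = tbalance + -4416 WHERE tid = 3;',
  'UPDATE pgbench_tellers SET tbalance = tbalance + -2979 WHERE tid = 10;',
  'UPDATE pgbench_tellers SET tbalance = tbalance + 2560 WHERE tid = 6;'
]

puts normalize_query(queries.first)
queries.group_by { |q| text_fingerprint(q) }.each do |fingerprint, group|
  puts "#{fingerprint}: #{group.size} queries"
end
```

All three statements collapse into a single group, which is the same effect the built-in query_id gives you without any text processing on your side.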

The query ID information is also available in log_line_prefix through the new %Q option, making it easier to get auto_explain output that's linked to a query:

2021-05-21 08:18:02.949 UTC [7176] [user=postgres,db=postgres,app=pgbench,query=885704527939071629] LOG:  duration: 59.827 ms  plan:
	Query Text: UPDATE pgbench_tellers SET tbalance = tbalance + -1902 WHERE tid = 6;
	Update on pgbench_tellers  (cost=4.14..8.16 rows=0 width=0) (actual time=59.825..59.826 rows=0 loops=1)
	  ->  Bitmap Heap Scan on pgbench_tellers  (cost=4.14..8.16 rows=1 width=10) (actual time=0.009..0.011 rows=1 loops=1)
	        Recheck Cond: (tid = 6)
	        Heap Blocks: exact=1
	        ->  Bitmap Index Scan on pgbench_tellers_pkey  (cost=0.00..4.14 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
	              Index Cond: (tid = 6)

Want to link auto_explain and pg_stat_statements, and can't wait for Postgres 14?

We built our own open-source query fingerprint mechanism that uniquely identifies queries based on their text. This is used in pganalyze for matching EXPLAIN plans to queries, and you can also use this in your own scripts, with any Postgres version.

And 200+ other improvements in the Postgres 14 release!

These are just some of the many improvements in the new Postgres release. You can find more on what's new in the release notes, such as:

The new predefined roles pg_read_all_data/pg_write_all_data give global read or write access
Automatic cancellation of long-running queries if the client disconnects
Vacuum now skips index vacuuming when the number of removable index entries is insignificant
Per-index information is now included in autovacuum logging output
Partitions can now be detached in a non-blocking manner with ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY

And many more. Now is the time to help test!

Download beta1 from the official package repositories, or build it from source. We can all contribute to making Postgres 14 a stable release in a few months from now.

Conclusion

At pganalyze, we're excited about Postgres 14, and hope this post got you interested as well! Postgres shows again how many small improvements make it a stable, trustworthy database, that is built by the community, for the community.



Creating Custom Postgres Data Types in Rails

Josh Alletto — Thu, 22 Apr 2021 12:00:00 GMT

Postgres ships with the most widely used common data types, like integers and text, built in, but it's also flexible enough to allow you to define your own data types if your project demands it.

Say you're saving price data and you want to ensure that it’s never negative. You might create a not_negative_int type that you could then use to define columns on multiple tables. Or maybe you have data that makes more sense grouped together, like GPS coordinates. Postgres allows you to create a type to hold that data together in one column rather than spread it across multiple columns.

Custom Data Types in Postgres
Composite Types
Custom Types in Rails
- Using the Active Record Attributes API to Register new Custom Types in Rails
Conclusion
About the Author

In Rails, all attributes pass through the attributes API when they’re entered by the user or read from the database. Rails 5 introduced the Attributes API, allowing you to define your own attribute types and use them in your application.

In this tutorial, you'll learn how to work with two of the most common custom types available in PostgreSQL. You'll also see how to incorporate them into your Rails application using the Attributes API.

Should you be interested in learning how to create custom Postgres data types in Django, we've got you covered! Just read our dedicated article about it here: Creating Custom Postgres Data Types in Django.

Custom Data Types in Postgres

There are two custom data types you'll learn about in this post:

Domain types: These allow you to put certain restrictions on a data type that can be reused later.
Composite types: These let you group data together to form a new type.

First, let's take a look at how to create a domain type. Say you want to ensure a username doesn't contain a !:

CREATE DOMAIN string_without_bang as VARCHAR NOT NULL CHECK (value !~ '!');

After that, you can use this domain type when you create the users table:

CREATE TABLE users (
 id serial primary key, 
 user_name string_without_bang
);

Let’s try creating a user with a username that contains an exclamation point. You'll see an error message:

INSERT INTO users(user_name) VALUES ('coolguy!!');
-- ERROR:  value for domain string_without_bang violates check constraint "string_without_bang_check"

You can even use a domain in the definition of another domain:

CREATE DOMAIN email_with_check AS string_without_bang NOT NULL CHECK (value ~ '@');

CREATE TABLE email_addresses (
  user_id integer,
  email email_with_check
);

INSERT INTO email_addresses(email) VALUES ('frank!@gmail.com');
-- ERROR:  value for domain email_with_check violates check constraint "string_without_bang_check"

INSERT INTO email_addresses(email) VALUES ('joshgmail.com');
-- ERROR:  value for domain email_with_check violates check constraint "email_with_check_check"

Composite Types

Composite types allow you to group different pieces of data together into one column. They're useful for information that has more meaning when grouped together, like RGB color values or the dimensions of a package.

Let’s start by creating a dimensions type:

CREATE TYPE dimensions as (
  depth integer,
  width integer,
  height integer
);

Next, let’s create a table using this new type. Try also using the domain type you created previously:

CREATE TABLE orders (
  product string_without_bang,
  dims dimensions
);

Add some data and take a look at the output when you query the table:

INSERT INTO orders(product, dims) VALUES('widget', (50,88,101));

SELECT * FROM orders;

 product |    dims     
---------+-------------
 widget  | (50,88,101)

You'll see that all the data related to the dimensions of the package is saved together in the dims column. But don't worry, you'll still be able to access the individual ints.

SELECT (dims).width FROM orders;

 width 
-------
  88

Custom Types in Rails

In order to use our custom types in Rails, you’ll have to do two things:

Create the migration that sets the types up for us in the database.
Tell Rails how to handle your new type so you can easily work with it in Ruby.

Currently, Rails doesn't offer any built-in solution for creating types in migrations, so you'll have to run some raw SQL. The code below runs exactly what you ran above to create the type directly in PostgreSQL, then immediately uses the types to build the orders table:

class CreateOrders < ActiveRecord::Migration[6.1]
  def up
    execute <<~SQL
      CREATE TYPE dimensions as (
        depth integer,
        width integer,
        height integer
      );
      CREATE DOMAIN string_without_bang as VARCHAR NOT NULL CHECK (value !~ '!');
    SQL
    create_table :orders do |t|
      t.column :product, :string_without_bang
      t.column :dims, :dimensions
    end
  end

  def down
    drop_table :orders
    execute "DROP TYPE dimensions"
    execute "DROP DOMAIN string_without_bang"
  end
end

You'll need to use up and down methods here since you’re running some raw SQL that Rails won't be able to easily undo on its own if you want to do a rollback.

Run the migrations, and you'll see output that looks similar to this:

== 20210211230550 Orders: migrating =========================================
-- execute("CREATE TYPE dimensions as (\n  depth integer,\n  width integer,\n  height integer\n);\nCREATE DOMAIN string_without_bang as VARCHAR NOT NULL CHECK (value !~ '!');\n")
   -> 0.0012s
-- create_table(:Orders)
   -> 0.0058s
== 20210211230550 Orders: migrated (0.0071s) ================================

unknown OID 25279: failed to recognize type of 'dims'. It will be treated as String.

Notice that the migration succeeded, but Rails does not know what to do with the composite type, so it will treat it as a string. If you check the database directly, you'll see that the type for the dims column is what you expect:

=# \d orders
 Column  |        Type         |
---------+---------------------+
 id      | bigint              |
 product | string_without_bang |
 dims    | dimensions          |

Right now, if you create a new order, you'll need to enter the dims data as a properly formatted string like this:

2.6.3 :001 > o = Order.create product: 'hat', dims: '(1,2,3)'
2.6.3 :002 > o.dims
 => "(1,2,3)" 
2.6.3 :003 >

This setup doesn't allow you to update the individual elements without having to completely override the entire string. What's needed here is a dimensions class that has methods that know how to deal with this new data type. Luckily, Rails has a solution for this.

Using the Active Record Attributes API to Register new Custom Types in Rails

You can use the Active Record Attributes API to register the new type and control what it looks like when leaving and entering the database.

Start by creating a Dimension class that takes in a string in the initialize method. Data will come in from the database as a string with parentheses "(1,2,3)", so you'll need to parse it and then set some instance variables. Notice the code also includes a to_s method that converts the data back into the parenthesized string that the database will understand.

class Dimension
  attr_accessor :depth, :width, :height

  def initialize(values)
    dims = values ? sanitize_string(values) : [0,0,0] 
    @depth = dims[0]
    @width = dims[1]
    @height = dims[2]
  end

  def sanitize_string(values)
    values.delete("()").split(',').map(&:to_i)
  end

  def to_s
    "(#{depth},#{width},#{height})"
  end
end

This class will act as a wrapper for the dimension type and will make it easier to work with, but you still need to tell Rails how to handle it when saving to the database and instantiating your order objects. Just like Rails knows how to take a Ruby string type or a Ruby int type and pass it off to PostgreSQL in a way it can save and understand, you need to tell Rails how to handle your new dimension type. You can do that by creating a DimensionType that inherits from ActiveRecord::Type::Value and setting up a few methods.

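class DimensionType < ActiveRecord::Type::Value
  # Called when an attribute is assigned or loaded from the database.
  # Values that are already a Dimension are passed through unchanged.
  def cast(value)
    value.is_a?(Dimension) ? value : Dimension.new(value)
  end

  # Converts the Dimension into the string form PostgreSQL expects.
  def serialize(value)
    value.to_s
  end

  # Compares the raw database string with the current in-memory value.
  def changed_in_place?(raw_old_value, new_value)
    raw_old_value != serialize(new_value)
  end
end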

The #cast method gets called by Active Record when setting an attribute in the model. You can use your new dimension class for this.

The #serialize method converts your dimension object to a type that PostgreSQL can understand. This is why you set up your to_s method in your Dimension class.

Finally, #changed_in_place? takes care of comparing the raw value in the database with your new value. This is what gets called whenever Active Record tries to decide if it needs to make an update to the database. raw_old_value will always be a string because it's read directly from the database. new_value, in this case, will be an instance of Dimension, so it needs to be converted to a string in order to make the comparison.
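
To see how these three methods interact without booting Rails, here is a minimal sketch in plain Ruby. PlainDimensionType is a hypothetical stand-in that does not inherit from ActiveRecord::Type::Value, and the is_a? guard in cast is an assumption added so that passing in an existing Dimension also works.

```ruby
# Compact copy of the Dimension wrapper so the sketch is self-contained.
class Dimension
  attr_accessor :depth, :width, :height

  def initialize(values)
    @depth, @width, @height = values ? values.delete("()").split(",").map(&:to_i) : [0, 0, 0]
  end

  def to_s
    "(#{depth},#{width},#{height})"
  end
end

# Hypothetical stand-in for DimensionType. The real class inherits from
# ActiveRecord::Type::Value, which is left out here on purpose.
class PlainDimensionType
  def cast(value)
    value.is_a?(Dimension) ? value : Dimension.new(value)
  end

  def serialize(value)
    value.to_s
  end

  def changed_in_place?(raw_old_value, new_value)
    raw_old_value != serialize(new_value)
  end
end

type = PlainDimensionType.new
raw = "(1,2,3)"                          # what the database would hand back
dims = type.cast(raw)
puts type.changed_in_place?(raw, dims)   # false: nothing was modified yet
dims.width = 9
puts type.serialize(dims)                # (1,9,3)
puts type.changed_in_place?(raw, dims)   # true: an UPDATE would be needed
```

This is the same comparison that lets Active Record notice the in-place mutations in the console session further down.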

The last piece of the puzzle will be to tell the order model to use the new DimensionType for the dims attribute.

class Order < ApplicationRecord
  attribute :dims, DimensionType.new
end

Let’s open a Rails console and test out the new type:

2.6.3 :001 > o = Order.new
 => #<Order id: nil, product: nil, dims: #<Dimension:0x00007fda47295e40 @depth="0", @width="0", @height="0">> 
2.6.3 :002 > o.product = 'a wig'
 => "a wig" 
2.6.3 :003 > o.dims.width = 9
 => 9 
2.6.3 :004 > o.dims.depth = 4
 => 4 
2.6.3 :005 > o.dims.height = 1
 => 1 
2.6.3 :006 > o.save
D, [2021-02-22T09:55:41.588716 #79057] DEBUG -- :    (0.2ms)  BEGIN
D, [2021-02-22T09:55:41.607391 #79057] DEBUG -- :   Order Create (0.8ms)  INSERT INTO "orders" ("product", "dims") VALUES ($1, $2) RETURNING "id"  [["product", "a wig"], ["dims", "(4,9,1)"]]
D, [2021-02-22T09:55:41.610457 #79057] DEBUG -- :    (1.2ms)  COMMIT
 => true 
2.6.3 :007 >

See if you can follow the same process to set up the string_without_bang!

Conclusion

In this article, we walked through how to create two kinds of custom data types in PostgreSQL.

The first, a domain type, allows you to create checks on your data and reuse those checks on multiple columns. The second, the composite type, lets you group data together in a meaningful way for storage in a single column. Finally, we learned how to hook into the Rails Attributes API to help instantiate your new type as an object that Ruby knows how to use.


About the Author

Josh Alletto is an instructor at Code Platoon. In 2018 he changed careers from education to tech and has been excited to find that his communication and presentation skills have transferred over to his new technical career. He's always looking for a new challenge and a dedicated team to collaborate with.

]]>

Introducing pg_query 2.0: The easiest way to parse Postgres queries

Lukas Fittl — Thu, 18 Mar 2021 12:00:00 GMT

The query parser is a core component of Postgres: the database needs to understand what data you're asking for in order to return the right results. But this functionality is also useful for all sorts of other tools that work with Postgres queries. A few years ago, we released pg_query to support this functionality in a standalone C library.

pganalyze uses pg_query to parse and analyze every SQL query that runs on your Postgres database. Our initial motivation was to create pg_query for checking which tables a query references, or what kind of statement it is. Since then we've expanded its use in pganalyze itself. pganalyze now truncates query text in a smart manner in the query overview. The pganalyze-collector supports collecting EXPLAIN plans, and uses pg_query to support log-based EXPLAIN. And we link together pg_stat_statements and auto_explain data in pganalyze using query fingerprints (another pg_query feature we'll discuss in detail in a later section).

Postgres community tools build on pg_query

What we didn't expect at the time was the tremendous interest we've seen from the community. The Ruby library alone has received over 3.5 million downloads in its lifetime.

Thanks to many contributors, pg_query now has bindings for other languages beyond Ruby and Go, such as Python (pglast, maintained by Lele Gaifax), Node.js (pgsql-parser, maintained by Dan Lynch) and even OCaml. There are also many notable third-party projects that use pg_query to parse Postgres queries. Here are some of our favorites:

sqlc provides type-safe SQL-based database access in Go
pgdebug lets you debug complex CTEs and execute parts as a standalone query
Google's HarbourBridge uses pg_query for helping customers trial Spanner from Postgres sources
DuckDB uses a forked version of pg_query for their parsing layer
GitLab uses pg_query for normalizing queries in their internal error reporting
Splitgraph uses pg_query via the pglast Python binding to parse the SQL statements in Splitfiles
sqlint lints your SQL files for correctness

Today, it's time to bring pg_query to the next level.

Announcing pg_query 2.0: Better & faster parsing, with Postgres 13 support

We're excited to announce the next major version of pg_query, pg_query 2.0.

In this version, you'll find support for:

Parsing the PostgreSQL 13 query syntax
Deparser as part of the core C library, to turn modified parse trees back into SQL
New parse tree format based on Protocol Buffers (Protobuf)
Improved, faster query fingerprinting mechanism
And much more!

Postgres community tools build on pg_query
Announcing pg_query 2.0: Better & faster parsing, with Postgres 13 support
How pg_query turns a Postgres statement into a parse tree
Turning parse trees back into SQL using a deparser
- The pg_query deparser with coverage for all Postgres regression tests
Fingerprints in pg_query: A better way to check if two queries are identical
- Why did we create our own query fingerprint concept?
Additional changes for pg_query 2.0
Conclusion

To start, let's revisit how pg_query actually works.

How pg_query turns a Postgres statement into a parse tree

There are many ways to parse SQL, but the scope for pg_query is very specific: to be able to parse the full Postgres query syntax, the same way as Postgres does. The only reliable way to do this is to use the Postgres parser itself.

pg_query isn't the first project to do this. For example, pgpool has a copy of the Postgres parser as well. But we needed an easily maintainable, self-contained version of the parser in a standalone C library. This would let us, and the Postgres community, use the parser from almost any language by writing a simple wrapper.

How did we do this? We started by looking at the Postgres source. Looking at the source, you will find the function called raw_parser:

/*
 * raw_parser
 *		Given a query in string form, do lexical and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.  The immediate elements
 * of the list are always RawStmt nodes.
 */
List *
raw_parser(const char *str)
{
    ...

After raw parsing, Postgres goes into parse analysis. In that phase Postgres identifies the types of columns, maps table names to the schema and more. After that, Postgres does planning (see our introduction to Postgres query planning), and then executes the query based on the query plan.

For pg_query, all we need is the raw parser. Looking at the code, we discovered a problem. The parser code still depends on a lot of Postgres code, such as for memory management or error handling. We needed a repeatable way to extract just enough source code to compile and run the parser.

Thus the idea was born to automatically extract the Postgres parser code and its dependencies.

Using LibClang to extract C source code from Postgres

Our goal: A set of self-contained C files that represent a copy of Postgres' raw_parser function. But we don't want to copy the code manually. Luckily we can use LibClang to parse C code, and understand its dependencies.

The details of this could fill many pages, but here is a simplified version of how this works:

1. Each translation unit (.c file) in the source is analyzed via LibClang's Ruby binding:

require 'ffi/clang'

index = FFI::Clang::Index.new(true, true)
translation_unit = index.parse_translation_unit(file, ['... CFLAGS ...'])

2. The analysis walks through the file and marks each C method, as well as the symbols it references:

translation_unit.cursor.visit_children do |cursor, parent|
  @file_to_symbol_positions[cursor.location.file] ||= {}
  @file_to_symbol_positions[cursor.location.file][cursor.spelling] = [cursor.extent.start.offset, cursor.extent.end.offset]
  cursor.visit_children do |child_cursor, parent|
    if child_cursor.kind == :cursor_decl_ref_expr || child_cursor.kind == :cursor_call_expr
      @references[cursor.spelling] ||= []
      (@references[cursor.spelling] << child_cursor.spelling).uniq!
    end
    :recurse
  end
end

3. We resolve required C methods and their code, based on the top-level method we are looking for:

def deep_resolve(method_name, depth: 0, trail: [], global_resolved_by_parent: [], static_resolved_by_parent: [], static_base_filename: nil)
  ...
  global_dependents = (@references[method_name] || [])
  global_dependents.each do |symbol|
    deep_resolve(symbol, depth: depth + 1, trail: trail + [method_name], global_resolved_by_parent: global_resolved_by_parent + global_dependents)
  end
  ...
end

deep_resolve('raw_parser')

4. We write out just the portions of the C code that are required (see details here)
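
The resolution in step 3 is essentially a walk over a reference graph. Here is a minimal sketch of that idea in plain Ruby. The REFERENCES map is made up for illustration and is not the real Postgres dependency graph, and resolve is a simplified, hypothetical version of deep_resolve that leaves out the static/global distinction.

```ruby
# Hypothetical reference map: function name => symbols it references.
# In the real extraction this is filled in by the LibClang visitor.
REFERENCES = {
  "raw_parser"   => ["scanner_init", "base_yyparse", "palloc"],
  "scanner_init" => ["palloc"],
  "base_yyparse" => ["lappend", "errstart"],
  "lappend"      => ["palloc"],
  "unrelated_fn" => ["errstart"]
}.freeze

# Collect every symbol reachable from `name`, visiting each symbol only once
# so that shared or circular references do not cause endless recursion.
def resolve(name, references, resolved = [])
  return resolved if resolved.include?(name)

  resolved << name
  (references[name] || []).each { |symbol| resolve(symbol, references, resolved) }
  resolved
end

needed = resolve("raw_parser", REFERENCES)
puts needed.sort
```

Only symbols reachable from raw_parser end up in the result, so unrelated_fn is never written out.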

With this, we have a working Postgres parser!

You can find the full details in the pg_query source.

Once we can call the Postgres parser in our standalone library, we can get the result as a parse tree, represented as Postgres parser C structs. But now we needed to make this useful in other languages, such as Ruby or Go.

Turning Postgres parser C structs into JSON and Protobufs

It's a little-known fact, but Postgres actually has a text representation of a query parse tree. It's rarely used directly, being reserved for internal communication and debugging. The easiest way to see an example is by looking at the adbin field in pg_attrdef, which shows the internal representation of the expression for a column default value (in contrast, pg_get_expr shows the expression in SQL):

SELECT adbin, pg_get_expr(adbin, adrelid) FROM pg_attrdef WHERE adrelid = 'mytable'::regclass AND adnum = 1;

-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
adbin       | {FUNCEXPR :funcid 480 :funcresulttype 23 :funcretset false :funcvariadic false :funcformat 2 :funccollid 0 :inputcollid 0 :args ({FUNCEXPR :funcid 1574 :funcresulttype 20 :funcretset false :funcvariadic false :funcformat 0 :funccollid 0 :inputcollid 0 :args ({CONST :consttype 2205 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 68 :constvalue 4 [ -27 10 -122 1 0 0 0 0 ]}) :location 60}) :location -1}
pg_get_expr | nextval('mytable_id_seq'::regclass)

This text format is not useful for working with a parse tree in other languages. Thus, we needed a more portable format to export the parse tree from C, and import it in another language such as Ruby.

The initial version of pg_query used JSON for this. JSON is great, since you can parse it in pretty much any programming language. Thus, in this new pg_query release, we still support JSON.

We're also introducing support for a new schema-based format, using Protocol Buffers (Protobuf).

Why pg_query 2.0 adds support for Protocol Buffers

Whilst JSON is convenient for passing around the parse tree, it has a few problems:

JSON is slower to parse than a binary format
Memory usage can become an issue with complex parse trees
Building logic around a tree of JSON data is error-prone, as one needs to add a lot of checks to identify each node and its supported fields
It's hard to instantiate new parse tree nodes, for example to use for deparsing back into a SQL statement

In pg_query 1.0, accessing the value of a "SELECT 1" would have looked like this with the Ruby binding:

result = PgQuery.parse("SELECT 1")
result.tree[0]['RawStmt']['stmt']['SelectStmt']['targetList'][0]['ResTarget']['val']
# => {"A_Const"=>{"val"=>{"Integer"=>{"ival"=>1}}, "location"=>7}}

Here is how Protobuf improves the parse tree handling in Ruby:

result = PgQuery.parse("SELECT 1")
result.tree.stmts[0].stmt.select_stmt.target_list[0].res_target.val.a_const
# => <PgQuery::A_Const: val: <PgQuery::Node: integer: <PgQuery::Integer: ival: 1>>, location: 7>

Note how we have a full class definition for each parse tree node type, making interaction with the tree nodes significantly easier.
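
To make the "lot of checks" problem with the JSON format concrete, here is a small sketch in plain Ruby. The hash is written by hand to mirror the pg_query 1.0 output for "SELECT 1" shown above, so no pg_query gem is needed; SELECT_ONE_TREE is just an illustrative name.

```ruby
# Hand-written hash mirroring the pg_query 1.0 JSON shape for "SELECT 1".
SELECT_ONE_TREE = [
  { "RawStmt" => { "stmt" => { "SelectStmt" => { "targetList" => [
    { "ResTarget" => { "val" => { "A_Const" => {
      "val" => { "Integer" => { "ival" => 1 } }, "location" => 7
    } } } }
  ] } } } }
].freeze

# dig returns nil as soon as one level is missing, instead of raising
# NoMethodError the way chained [] calls on nil would.
value = SELECT_ONE_TREE.dig(0, "RawStmt", "stmt", "SelectStmt", "targetList", 0,
                            "ResTarget", "val", "A_Const", "val", "Integer", "ival")
missing = SELECT_ONE_TREE.dig(0, "RawStmt", "stmt", "InsertStmt", "targetList")

puts value.inspect    # 1
puts missing.inspect  # nil
```

Even with dig, every key is a plain string, so a typo in a node name quietly yields nil. The generated Protobuf classes avoid this because each node type has its own class with known fields.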

Now, let's say I want to change a parse tree and turn it back into a SQL statement. For this, I need a deparser.

Turning parse trees back into SQL using a deparser

Postgres itself has deparser logic in many places. For example postgres_fdw has a deparser to generate the query to send to the remote server. But, the deparser code in Postgres requires a post-parse analysis parse tree (that directly references relation OIDs, etc). That means we can't make use of it in pg_query, which works with raw parse trees.

For many years now, the Ruby pg_query library has had a deparser. Over the years we've had many community contributions to make it complete. The third-party libraries for Python and Node.js also have their own deparser. These efforts were all done in parallel, without sharing code. And the Go library is missing a deparser altogether.

How can we reduce the duplicated effort in the community? By creating a new portable deparser for raw parse trees. This avoids having duplicate efforts for every pg_query-based library.

The pg_query deparser with coverage for all Postgres regression tests

pg_query 2.0 features a new deparser, written in C. This was by far the biggest undertaking of this new release. The new deparser is able to generate all SQL queries used in the Postgres regression tests (which the pg_query parser can of course parse), and more.

Here is how it works, using the Go library as an example, which previously did not have a deparser:

package main

import (
  "fmt"
  pg_query "github.com/pganalyze/pg_query_go/v2"
)

func main() {
  // Parse a query
  result, err := pg_query.Parse("SELECT 42")
  if err != nil {
    panic(err)
  }

  // Modify the parse tree
  result.Stmts[0].Stmt.GetSelectStmt().GetTargetList()[0].GetResTarget().Val =
    pg_query.MakeAConstStrNode("Hello World", -1)

  // Deparse back into a query
  stmt, err := pg_query.Deparse(result)
  if err != nil {
    panic(err)
  }
  fmt.Printf("%s\n", stmt)
}

This will output the following:

SELECT 'Hello World'

First, the deparsing step encodes the Go structs into the new Protobuf format. Then, the C library decodes this into Postgres parse tree C structs. Last but not least, the C library's new deparser turns the C structs into the SQL query text.

Stepping away from deparsing, let's take a look at the new fingerprinting mechanism:

Fingerprints in pg_query: A better way to check if two queries are identical

Let's start with the motivation for query fingerprints. pganalyze needs to link together Postgres statistics across different data sources, for example queries from pg_stat_statements with the Postgres auto_explain logs. You can see the fingerprint in pganalyze on the query details page:

This query can be represented differently depending on which part of Postgres you look at:

pg_stat_statements: SELECT "abalance" FROM "pgbench_accounts" WHERE "aid" = ?
auto_explain: SELECT abalance FROM pgbench_accounts WHERE aid = 4674588

A simple text comparison would not be sufficient to determine that these queries are identical.

Why did we create our own query fingerprint concept?

Postgres already has the concept of a "queryid", calculated based on the post-parse analysis tree. It's used in places such as pg_stat_statements to distinguish the different query entries.

But, this queryid is not available everywhere today, e.g. you can't get it with auto_explain plans. It's also not portable between databases, as it's dependent on specific relation OIDs. Even if you have the exact same queries on your staging and production system, they will have different queryid values. And the queryid can't be generated outside the context of a Postgres server. Thus, pganalyze has its own mechanism, called a query fingerprint.

Fingerprints identify a Postgres query based on its raw parse tree alone. We've open-sourced this mechanism in pg_query:

PgQuery.fingerprint('SELECT a, b FROM c')
# => "fb1f305bea85c2f6"

PgQuery.fingerprint('SELECT b, a FROM c')
# => "fb1f305bea85c2f6"

This mechanism does not need a running server, so all you need as input is a valid Postgres query.
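
To give a feel for how a tree-based fingerprint can ignore irrelevant differences, here is a toy sketch in plain Ruby. It is not the pg_query algorithm: the trees are hypothetical, heavily simplified nested hashes, the normalization rules are assumptions for illustration (skip locations, mask constants, ignore target list order), and a truncated SHA-256 stands in for XXH3, which is not in Ruby's standard library.

```ruby
require "digest"

# Build a canonical string for a (toy) parse tree.
def canonical(node, key = nil)
  case node
  when Hash
    parts = node.reject { |k, _| k == :location }   # positions don't matter
                .sort_by { |k, _| k.to_s }
                .map { |k, v| k == :const ? "const:?" : "#{k}:#{canonical(v, k)}" }
    "{#{parts.join(',')}}"
  when Array
    items = node.map { |child| canonical(child) }
    items = items.sort if key == :targets           # target list order is ignored
    "[#{items.join(',')}]"
  else
    node.to_s
  end
end

# 16 hex characters = 64 bits, matching the size used by pg_query 2.0.
def toy_fingerprint(tree)
  Digest::SHA256.hexdigest(canonical(tree))[0, 16]
end

# SELECT a, b FROM c WHERE id = 1
TREE_AB = { select: { targets: [{ column: "a", location: 7 }, { column: "b", location: 10 }],
                      from: [{ table: "c", location: 17 }],
                      where: { op: "=", column: "id", const: 1, location: 25 } } }.freeze
# SELECT b, a FROM c WHERE id = 2
TREE_BA = { select: { targets: [{ column: "b", location: 7 }, { column: "a", location: 10 }],
                      from: [{ table: "c", location: 17 }],
                      where: { op: "=", column: "id", const: 2, location: 25 } } }.freeze
# SELECT a, b FROM d WHERE id = 1
TREE_OTHER = { select: { targets: [{ column: "a", location: 7 }, { column: "b", location: 10 }],
                         from: [{ table: "d", location: 17 }],
                         where: { op: "=", column: "id", const: 1, location: 25 } } }.freeze

puts toy_fingerprint(TREE_AB) == toy_fingerprint(TREE_BA)     # true
puts toy_fingerprint(TREE_AB) == toy_fingerprint(TREE_OTHER)  # false
```

The real implementation walks the actual Postgres parse tree and handles many more node types and edge cases than this sketch.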

With pg_query 2.0, we've done a few enhancements to the fingerprint functionality:

Use the faster XXH3 hash function, instead of SHA-1. pg_query 1.0 used the outdated cryptographic hash function SHA-1. Cryptographic guarantees are not needed for this use case, and XXH3 is much faster.
Contain the fingerprint in a 64-bit value, instead of 136 bits. We've determined that 64-bit precision is good enough for query fingerprints. Postgres itself thinks so too, since it uses 64-bit for the Postgres queryid. We often use data from pg_stat_statements, so there is little benefit to more bits. Using a smaller data type also means better performance for pganalyze.
Fix edge cases where two almost identical queries had different fingerprints. Fingerprints should ignore query differences, when they result in the same query intent. We've addressed a few cases where this was not working as expected. You can look at the corresponding wiki page to understand these changes in more detail.

Additional changes for pg_query 2.0

A few other things about the new release:

The pg_query library now resides in the pganalyze organization on GitHub. This makes it clear who maintains and funds the core development. We will continue to make pg_query available under the BSD 3-clause license.
pg_query has a new method for splitting queries. This can be useful when you want to split a multi-statement string into its component statements, for example SELECT ';'; SELECT 'foo' into SELECT ';' and SELECT 'foo'.
There is a new function available to access the Postgres scanner. This includes the location of comments in a query text. One could envision building a syntax highlighter based on this, or extracting comments from queries whilst ignoring comment-like tokens in a constant value.
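
To illustrate why splitting cannot simply cut at every semicolon, here is a toy splitter in plain Ruby. It is not how pg_query splits queries: split_statements is a hypothetical helper that only tracks single-quoted strings and ignores comments, dollar quoting and quoted identifiers, which is exactly why a real scanner or parser is the safer choice.

```ruby
# Toy splitter: cut on semicolons that are outside single-quoted strings.
def split_statements(sql)
  statements = []
  current = +""
  in_string = false

  sql.each_char do |char|
    # A doubled quote ('') toggles twice, so escaped quotes stay inside the string.
    in_string = !in_string if char == "'"

    if char == ";" && !in_string
      statements << current.strip unless current.strip.empty?
      current = +""
    else
      current << char
    end
  end

  statements << current.strip unless current.strip.empty?
  statements
end

p split_statements("SELECT ';'; SELECT 'foo'")
# => ["SELECT ';'", "SELECT 'foo'"]
```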

Conclusion

The new pg_query 2.0 is available today, with bindings for Go and Ruby available to start. We are also working on a new pganalyze-maintained Rust binding that we'll have news about soon.

Help us get the word out by sharing this post on Twitter.

]]>

Efficient Postgres Full Text Search in Django

Adeyinka Adegbenro — Wed, 24 Feb 2021 12:00:00 GMT

In this article, we'll take a look at making use of the built-in, natural language based Postgres Full Text Search in Django. Internet users have gotten increasingly discerning when it comes to search. When they type a keyword into your website's search bar, they expect to find logically ranked results, including related matches and misspellings.

Because users are used to these sophisticated search systems, developers have to build applications that use more than simple LIKE queries.

Postgres Full Text Search has been available since Postgres 8.3. It can be used to find records based on semantics and knowledge of the language rather than simple string matching, is very flexible, and unlike other search options such as LIKE, it performs well for partial matches.

While LIKE can be supported by indexes, that usually won’t work well when the % wildcard is used on the left side of the search term. Typically this means the query planner reverts to sequential scans when using wildcards like this:

SELECT title FROM film WHERE description LIKE '%brilliant';

When searching multiple columns, it also takes additional effort, as each column needs to be queried separately using LIKE. There are more flexible alternatives, such as SIMILAR TO and POSIX regular expression search, but those are still difficult to use when you want to catch different variations of a word (e.g., “jump,” “jumps,” “jumped,” and “jumping”).

Because LIKE and other simple methods lack language support and cannot handle word variations or ranking, Postgres Full Text Search (FTS) is generally a better option when implementing search directly in the database. With FTS, your searches will match all instances of the word, its plural, and the word's various tenses. Because FTS is bundled into Postgres, there’s no need to install extra software and no extra cost for using a third-party search provider. Additionally, all your data are stored in one place, which reduces your web application’s complexity.

This article will show you how to use Full Text Search in raw PostgreSQL queries and implement equivalent queries in Django using the Postgres driver. Along the way, you’ll see some of the use cases for the various Full Text Search methods that Postgres provides.

Should you be interested in learning more about Full Text Search with Rails, please check out our article here: Full Text Search in Milliseconds with Rails and PostgreSQL.

Core Concepts of Postgres Full Text Search
Using PostgreSQL Full Text Search in Django
Optimizing Search Performance in Django
Conclusion
About the Author

Core Concepts of Postgres Full Text Search

PostgreSQL provides several native functions for Full Text Search. In the following sections, you’ll see how to use them to “vectorize” your results and search queries so that you can use Postgres’s Full Text Search features.

tsvector data type

Before a text or document can be searched using FTS, you need to convert it to an acceptable data type, known as a tsvector. To convert plain text to a tsvector, use the Postgres to_tsvector function. This function reduces the original text to a set of word skeletons known as lexemes.

Lexemes are important because they help match related words. For instance, the words satisfy, satisfying and satisfied would convert to satisfi. This means a search for satisfy will return results containing any of the other terms as well. Stop words such as “a,” “on,” “of,” “you,” “who,” etc. are removed because they appear too frequently to be relevant in searches. The to_tsvector function returns the lexemes, along with a digit that denotes each word’s position in the text.

Note that the output of the function is language-dependent. You should tell PostgreSQL to treat the text as English (or whatever language your results are stored in). To convert the sentence “A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank” to a tsvector, run the following:

SELECT to_tsvector('english', 'A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank') AS search;

You’ll see output like the following:

                    search
----------------------------------------------------------------------
'chase':12 'documentari':3 'fanci':2 'frisbe':6 'lumberjack':9 'monkey':14 'must':11 'shark':17 'tank':18

This shows each word’s root as well as its position in the text. For example, the word fanciful, the second word in the text, has been broken down into the lexeme “fanci”, so you see ’fanci’:2.
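To make the idea of lexemes and positions more concrete, here is a toy Python sketch that mimics the shape of the to_tsvector output. It is only an illustration: the stop-word list is a tiny hand-picked assumption, the toy_tsvector name is made up for this example, and there is no stemming, so "fanciful" stays "fanciful" instead of becoming "fanci".

```python
import re

# Hypothetical, hand-picked stop-word list for this example only;
# Postgres ships its own, much longer, per-language lists.
STOP_WORDS = {"a", "and", "in", "of", "who"}


def toy_tsvector(text):
    """Map each non-stop word to the 1-based positions where it occurs."""
    positions = {}
    words = re.findall(r"[a-z0-9]+", text.lower())
    for position, word in enumerate(words, start=1):
        if word in STOP_WORDS:
            continue  # stop words are dropped but still consume a position
        positions.setdefault(word, []).append(position)
    return positions


vector = toy_tsvector(
    "A Fanciful Documentary of a Frisbee And a Lumberjack "
    "who must Chase a Monkey in A Shark Tank"
)
print(vector["fanciful"], vector["tank"])
```

Because stop words still count towards the position, "fanciful" lands on position 2 and "tank" on position 18, matching the positions in the Postgres output.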

tsquery data type

Text search systems have two major components: the text being searched and the keyword being searched for. In the case of FTS, both components must be vectorized. You saw how searchable data is converted to a tsvector in the previous section, so now you’ll see how search terms are vectorized into tsquery values.

Postgres offers functions that will convert text fields to tsquery values such as to_tsquery, plainto_tsquery and phraseto_tsquery. Search terms can also be combined with the & (AND), | (OR), and ! (NOT) operators, and parentheses can be used to group operators and determine their order. to_tsquery converts the search terms to tokens and discards stop words.

The following query:

SELECT to_tsquery('english', 'a & beautifully & very & quickly') AS search;

Returns the lexemes “beauti” and “quick” because “a” and “very” are stop words:

                    search
----------------------------------------------------------------------
'beauti' & 'quick'

Searching

Now that you know how to create a tsvector from your text data and a tsquery from your search terms, you can perform a full-text search using the @@ operator.

For example, you can run the following query to compare two strings:

SELECT to_tsvector('english', 'John''s performance was found wanting') @@ to_tsquery('english', 'want');

This query returns true because “wanting” is reduced to the lexeme want, which matches the search term.
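Conceptually, @@ checks whether the lexemes in the tsvector satisfy the boolean expression in the tsquery. The toy Python sketch below shows that idea for the &, | and ! operators. The nested-tuple query format and the matches helper are assumptions made for this example, not how Postgres represents queries internally.

```python
def matches(lexemes, query):
    """Evaluate a tiny boolean query against a set of lexemes.

    A query is either a lexeme string or a tuple of the form
    ("and", q1, q2), ("or", q1, q2) or ("not", q).
    """
    if isinstance(query, str):
        return query in lexemes
    operator, *operands = query
    if operator == "and":
        return all(matches(lexemes, operand) for operand in operands)
    if operator == "or":
        return any(matches(lexemes, operand) for operand in operands)
    if operator == "not":
        return not matches(lexemes, operands[0])
    raise ValueError(f"unknown operator: {operator}")


# Roughly the lexemes of "John's performance was found wanting".
sentence = {"john", "perform", "found", "want"}

print(matches(sentence, "want"))                               # want
print(matches(sentence, ("and", "want", ("not", "perform"))))  # want & !perform
print(matches(sentence, ("or", "snake", "john")))              # snake | john
```

The first and third queries match, while the second does not, because the sentence does contain the lexeme "perform".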

If you had a film table containing movie titles and descriptions, you could use Full Text Search to find all films with a description containing the word epic and either of the words tale or story:

SELECT title, description FROM film
WHERE to_tsvector(description) @@ to_tsquery('epic & (story | tale)')
LIMIT 10;

This returns up to ten films whose descriptions contain epic together with either tale or story.

Ranking

Ranking search results can ensure that the most relevant results are shown first. Postgres provides two functions for ranking search results: ts_rank and ts_rank_cd. ts_rank considers the frequency of matching words, while ts_rank_cd (“cd” means “cover density”) also takes into account how close the matching search terms are to each other in the text being searched.

If you run the following query, you’ll see that rank1 and rank2 are ranked the same (0.06079271) because each search term is found once in the sentence:

SELECT
  ts_rank(
    to_tsvector('english', 'Dolphins are to water as elephants are to forest'), 
    to_tsquery('english', 'elephant')
  ) AS rank1,
  ts_rank(
    to_tsvector('english', 'Dolphins are to water as elephants are to forest'), 
    to_tsquery('english', 'dolphin')
  ) AS rank2;

The more tokens that match the text, the higher the rank. In the following example, rank1 has a higher rank than rank2 because the tokens “elephant” and “dolphin” are both found in the sentence while “snake” is not:

SELECT
  ts_rank(
    to_tsvector('english', 'Dolphins are to water as elephants are to forest'), 
    to_tsquery('english', 'elephant & dolphin')
  ) AS rank1,
  ts_rank(
    to_tsvector('english', 'Dolphins are to water as elephants are to forest'), 
    to_tsquery('english', 'dolphin & snake')
  ) AS rank2;

Weighting

You can also make some parts of the text count more than others by weighting them. For instance, when searching the film table, the highest weight could be given to the movie title and less weight could be given to the description using the setweight function:

--- Set the weights
SELECT setweight(to_tsvector('english', 'elephant'), 'A') || setweight(to_tsvector('english', 'dolphin'), 'B') AS weight;

--- Run the query
SELECT
  ts_rank(
    setweight(to_tsvector('english', 'elephant'), 'A') || setweight(to_tsvector('english', 'dolphin'), 'B'), 
    to_tsquery('english', 'elephant')
  ) AS elephant_rank,
  ts_rank(
    setweight(to_tsvector('english', 'elephant'), 'A') || setweight(to_tsvector('english', 'dolphin'), 'B'),
    to_tsquery('english', 'dolphin')
  ) AS dolphin_rank;

elephant_rank is ranked higher because it matched elephant which has a higher weight of A, compared to dolphin_rank which has a weight of B.

| elephant_rank | dolphin_rank |
|---------------|--------------|
| 0.6079271     | 0.24317084   |

ts_rank takes an optional first argument, weights. When this argument is omitted, it defaults to “{0.1, 0.2, 0.4, 1.0}” in the order D, C, B, A. By default, A has the highest weight of 1.0, B has 0.4, C has 0.2, and D has 0.1. You can set the weight of any of A, B, C, or D to a different value using any decimal between 0 and 1.0. This allows you to have fine-grained control over how results are ranked and ensure that users see the most relevant results for their queries.
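The elephant_rank and dolphin_rank values above can be reproduced with a little arithmetic. The sketch below assumes that, for a query with a single term that occurs once in the text, ts_rank works out to the label's weight multiplied by 6/π². That assumption is consistent with the numbers in this article, but it is not a general description of how ts_rank scores longer queries; the single_match_rank helper is hypothetical.

```python
import math

# Assumption for this sketch: one query term, matched once in the text,
# scores weight * 6 / pi**2.
NORMALIZATION = 6 / math.pi ** 2

# Default ts_rank weights, keyed by label.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def single_match_rank(label, weights=DEFAULT_WEIGHTS):
    """Rank of one matching lexeme carrying the given weight label."""
    return weights[label] * NORMALIZATION


for label in "AB":
    print(label, round(single_match_rank(label), 8))
```

Passing a different weights mapping, for example one where B is 0.8 instead of 0.4, shows how raising a label's weight moves matches with that label up the ranking.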

Now that you’ve seen how to use Postgres’ Full Text Search functions, you’re ready to start applying these ideas to your Django app.

Using PostgreSQL Full Text Search in Django

Using Postgres’ Full Text Search in Django is an ideal way to add accurate and fast search because it is easy to maintain and fast to work with. To demonstrate Full Text Search in Django, consider a PostgreSQL database dvdrental, with a film table, and an equivalent Film model in a Django application implemented like this:

from django.db import models

class Film(models.Model):
    film_id = models.AutoField(primary_key=True)
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True, null=True)

    def __str__(self):
        return ', '.join(['film_id=' + str(self.film_id), 'title=' + self.title, 'description=' + self.description])

In the remainder of this article, you’ll see how to search a single database table field, search multiple fields, rank search results, and optimize the performance of Full Text Search using vector fields and indexes. You can run the commands from your Python shell.

Searching a Single field

The simplest way to start using Full Text Search in Django is by using the search lookup, which requires 'django.contrib.postgres' in your INSTALLED_APPS. To search the description column on the Film model, append __search to the column name when filtering the model:

>>> from <appname>.models import Film
>>> Film.objects.filter(description__search='An epic tale')
<QuerySet [
    <Film: film_id=8, title=Airport Pollock, description=A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India>, 
    <Film: film_id=97, title=Bride Intrigue, description=A Epic Tale of a Robot And a Monkey who must Vanquish a Man in New Orleans>..]>

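Under the hood, Django converts the description field to a tsvector and converts the search term to a tsquery. You can check the underlying query to verify this (Django only records queries in connection.queries when DEBUG is True):

>>> from django.db import connection
>>> from pprint import pprint as pp
>>> pp(connection.queries[0]['sql'])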
('SELECT "film"."film_id", "film"."title", "film"."description", '
 '"film"."release_year", "film"."language_id", "film"."rental_duration", '
 '"film"."rental_rate", "film"."length", "film"."replacement_cost", '
 '"film"."rating", "film"."last_update", "film"."special_features", '
 '"film"."fulltext", "film"."index_column", "film"."vector_column" FROM "film" '
 'WHERE to_tsvector(COALESCE("film"."description", \'\')) @@ '
 "plainto_tsquery('An epic tale') LIMIT 21")

SearchVector

If you want to use the tsvector on its own, you can use the Django SearchVector class. For example, to search for the term “love” in both the title and description columns, you can run the following in your Python shell:

>>> from django.contrib.postgres.search import SearchVector
>>> Film.objects.annotate(search=SearchVector('title', 'description', config='english')).filter(search='love')
<QuerySet [<Film: film_id=374, title=Graffiti Love, description=A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin>, 
<Film: film_id=448, title=Idaho Love, description=A Fast-Paced Drama of a Student And a Crocodile who must Meet a Database Administrator in The Outback>,
 <Film: film_id=458, title=Indian Love, description=A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut in An Abandoned Fun House>, 
<Film: film_id=511, title=Lawrence Love, description=A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer in Berlin>, 
<Film: film_id=535, title=Love Suicides, description=A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist in An Abandoned Fun House>, 
<Film: film_id=536, title=Lovely Jingle, description=A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile in The Outback>]>

When you inspect the underlying query, you can see how Django uses to_tsvector to query both the title and description fields in the database:

>>> pp(connection.queries[0]['sql'])
('SELECT "film"."film_id", "film"."title", "film"."description", '
 'to_tsvector(\'english\'::regconfig, COALESCE("film"."title", \'\') || \' \' '
 '|| COALESCE("film"."description", \'\')) AS "search" FROM "film" WHERE '
 'to_tsvector(\'english\'::regconfig, COALESCE("film"."title", \'\') || \' \' '
 '|| COALESCE("film"."description", \'\')) @@ '
 "plainto_tsquery('english'::regconfig, 'love') LIMIT 21")

SearchQuery

SearchQuery is the abstraction of the to_tsquery, plainto_tsquery and phraseto_tsquery functions in Postgres. There are several ways to use the SearchQuery class including using two keywords in a search:

>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery("story beautiful")

Or searching for a specific phrase:

>>> SearchQuery("mad scientist", search_type="phrase")

Unlike SearchVector, SearchQuery supports boolean operators. The boolean operators combine search terms using logic just like they did in Postgres:

>>> SearchQuery("('epic' | 'beautiful' | 'brilliant') & ('tale' | 'story')", search_type="raw")

Using SearchVector and SearchQuery together in a search allows you to create powerful custom searches in Django:

>>> vector = SearchVector('title', 'description', config='english') # search the title and description columns..
>>> query = SearchQuery("('epic' | 'beautiful' | 'brilliant') & ('tale' | 'story')", search_type="raw") # ..with the search term
>>> Film.objects.annotate(search=vector).filter(search=query)
<QuerySet [
    <Film: film_id=8, title=Airport Pollock, description=A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India>, <Film: film_id=30, title=Anything Savannah, description=A Epic Story of a Pastry Chef And a Woman who must Chase a Feminist in An Abandoned Fun House>,
    <Film: film_id=46, title=Autumn Crow, description=A Beautiful Tale of a Dentist And a Mad Cow who must Battle a Moose in The Sahara Desert>, <Film: film_id=97, title=Bride Intrigue, description=A Epic Tale of a Robot And a Monkey who must Vanquish a Man in New Orleans>, 
   <Film: film_id=196, title=Cruelty Unforgiven, description=A Brilliant Tale of a Car And a Moose who must Battle a Dentist in Nigeria>, 
   <Film: film_id=202, title=Daddy Pittsburgh, description=A Epic Story of a A Shark And a Student who must Confront a Explorer in The Gulf of Mexico>...]>

SearchRank

Using SearchVector and SearchQuery to generate Full Text Search queries in Django is a great start, but a robust search feature likely needs custom rankings as well. Search results can be ranked in a Django app using the SearchRank class.

Here's an example using the default ranking (most matches):

>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('title', 'description', config='english')
>>> query = SearchQuery("('epic' | 'beautiful' | 'brilliant') & ('tale' | 'story')", search_type="raw")
>>> Film.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')

You can also add weights to each field in your SearchVector:

>>> vector = SearchVector('title', weight='A') + SearchVector('description', config='english', weight='B')
>>> query = SearchQuery("('epic' | 'beautiful' | 'brilliant') & ('tale' | 'story')", search_type="raw")
>>> Film.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')

This makes matches in the title field count more than those in the description.

Optimizing Search Performance in Django

To get the best performance from a Postgres Full Text Search, you need to create an indexed column that can store a tsvector datatype. Performing a search on this new column will be orders of magnitude faster than generating a tsvector with SearchVector on the fly. If the text has been pre-converted and stored in a column, there's no need for runtime conversion.

Using GIN indexes

To implement this new column in Django, you’ll need to add the SearchVectorField class to the model. To index this field, use a GIN index as recommended for Full Text Search by PostgreSQL.

PostgreSQL provides two main index types to speed up full text search: GIN (Generalized Inverted Index) and GiST (Generalized Search Tree). A GiST index is faster to build and useful for frequently updated fields, but it can be lossy (i.e. it sometimes returns false positives that Postgres has to recheck against the table row). GIN is still very scalable, and while it isn’t lossy, it doesn’t allow you to store weights. Learn more about their differences and use cases here.
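To build some intuition for why an inverted index speeds up search, here is a conceptual Python sketch. It is not how GIN is implemented on disk; the film ids, lexemes, and the build_inverted_index helper are assumptions for illustration only.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lexeme to the set of row ids whose lexemes contain it."""
    index = defaultdict(set)
    for row_id, lexemes in documents.items():
        for lexeme in lexemes:
            index[lexeme].add(row_id)
    return index


# Hypothetical rows: film_id -> lexemes already produced by to_tsvector.
films = {
    8: {"epic", "tale", "moos", "girl"},
    46: {"beauti", "tale", "dentist"},
    202: {"epic", "stori", "shark"},
}
index = build_inverted_index(films)

# 'epic & tale' becomes a set intersection instead of a scan over every row.
print(sorted(index["epic"] & index["tale"]))
```

The lookup touches only the entries for "epic" and "tale", which is why a search against a pre-computed, indexed column does not need to re-read and re-parse every row.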

To add this field and index to a model, use the GinIndex and SearchVectorField classes like this:

from django.db import models

from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex # add the Postgres recommended GIN index

class Film(models.Model):
    film_id = models.AutoField(primary_key=True)
    title = models.CharField(max_length=255)
    description = models.TextField(blank=True, null=True)
    vector_column = SearchVectorField(null=True)  # new field

    def __str__(self):
        return ', '.join(['film_id=' + str(self.film_id), 'title=' + self.title, 'description=' + self.description])

    class Meta:
        indexes = (GinIndex(fields=["vector_column"]),) # add index

Now, run the migrations for your Django app:

./manage.py makemigrations <your_app_name> && ./manage.py migrate <your_app_name>

Next, you need a way to make sure that any time the title or description field on the Film table is updated, the vector_column field is automatically computed and stored. For this, you can use PostgreSQL triggers.

Since there's no way to use triggers in the Django model directly, add a SQL command in a new migration file:

 ./manage.py makemigrations <your_app_name> -n create_trigger --empty
 # Migrations for '<your_app_name>':
 # <your_app_name>/migrations/0003_create_trigger.py

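Open the auto-generated file and add a trigger that fires on INSERT and UPDATE. The trigger computes vector_column for new and changed rows, and the UPDATE statement at the end of the migration touches every existing row so that the trigger fills in the column for those rows too: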

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('<your_app_name>', '0002_auto_20210224_0325'),
    ]

    operations = [
        migrations.RunSQL(
            sql='''
              CREATE TRIGGER vector_column_trigger
              BEFORE INSERT OR UPDATE OF title, description, vector_column
              ON film
              FOR EACH ROW EXECUTE PROCEDURE
              tsvector_update_trigger(
                vector_column, 'pg_catalog.english', title, description
              );

              UPDATE film SET vector_column = NULL;
            ''',

            reverse_sql = '''
              DROP TRIGGER IF EXISTS vector_column_trigger
              ON film;
            '''
        ),
    ]

Run the migrate command for your app again:

python manage.py migrate <your_app_name>

Now your database should have a new column called vector_column that contains an indexed tsvector for each film’s title and description.

Generated Columns in Postgres 12+

Running Postgres 12 or newer? You can make use of the Generated Columns feature to avoid using triggers for updating the tsvector column, by creating the column like this:

ALTER TABLE film ADD COLUMN vector_column tsvector GENERATED ALWAYS AS (
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(description,'')), 'B')
) STORED;

Note that when this article was written, Django did not have official support for generated columns (tagged as "wontfix" in the bug tracker); Django 5.0 has since added a GeneratedField. With the raw SQL approach, you create the column manually in a migration:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('<your_app_name>', '0002_auto_20210224_0326'),
    ]

    operations = [
        migrations.RunSQL(
            sql='''
              ALTER TABLE film ADD COLUMN vector_column tsvector GENERATED ALWAYS AS (
                setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
                setweight(to_tsvector('english', coalesce(description,'')), 'B')
              ) STORED;
            ''',

            reverse_sql = '''
              ALTER TABLE film DROP COLUMN vector_column;
            '''
        ),
    ]

Comparing Query Performance

Now that you have added the new column to optimize performance, you can compare the non-indexed Full Text Search to using the SearchVectorField vector_column. Using your Python shell, import your model and search the title and description without using the vector_column:

>>> from django.db import connection, reset_queries
>>> from <your_app_name>.models import Film
>>> from django.contrib.postgres.search import SearchVector, SearchQuery
>>> from pprint import pprint as pp

>>> vector = SearchVector('title', 'description', config='english')
>>> query = SearchQuery('love')
>>> Film.objects.annotate(search=vector).filter(search=query)
<QuerySet [<Film: film_id=374, title=Graffiti Love, description=A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin>,
...
>>> pp(connection.queries) # Runtime of last run query
[{'sql': 'SELECT "film"."film_id", "film"."title", "film"."description", '
         '"film"."vector_column", to_tsvector(\'english\'::regconfig, '
         'COALESCE("film"."title", \'\') || \' \' || '
         'COALESCE("film"."description", \'\')) AS "search" FROM "film" WHERE '
         'to_tsvector(\'english\'::regconfig, COALESCE("film"."title", \'\') '
         '|| \' \' || COALESCE("film"."description", \'\')) @@ '
         "plainto_tsquery('love') LIMIT 21",
  'time': '0.045'}]
>>> reset_queries() # clears time of last query from memory, so we can re-use connection.queries for new queries

As you can see, this query takes 0.045 seconds.

Now, check how long it takes to search against the indexed SearchVectorField vector_column:

>>> Film.objects.filter(vector_column='love')
 <QuerySet [<Film: film_id=374, title=Graffiti Love, description=A Unbelieveable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin>, 
...
>>> pp(connection.queries)
[{'sql': 'SELECT "film"."film_id", "film"."title", "film"."description", '
         '"film"."vector_column" FROM "film" WHERE "film"."vector_column" @@ '
         "plainto_tsquery('love') LIMIT 21",
  'time': '0.001'}]
>>> reset_queries()

On a table with 1,003 rows, query execution time went down from 0.045s to 0.001s!

This could scale up to make a significant difference if you’re dealing with millions of records. The only downside is that saving your data into the new column and indexing it will make writes take slightly longer. Still, this is usually a price worth paying as users expect search to be as fast as possible.

Conclusion

Postgres offers a wide range of tools to support FTS, and in this article, you’ve seen how some of them work. You also saw how to use these tools in a Django application and leverage the SearchVectorField class with a GIN index to optimize performance. While there’s certainly more to building a fast, accurate search application, having a strong understanding of Postgres’ Full Text Search features will help you understand if it’s the best option for you.

About the Author

Adeyinka works as a Software Engineer at BriteCore, and is based in Lagos, Nigeria. She loves researching and writing in-depth technical content. You can find her on GitHub.

Creating Custom Postgres Data Types in Django

Josh Alletto — Tue, 15 Dec 2020 12:00:00 GMT

Postgres allows you to define custom data types when the default types provided don't fit your needs. There are many situations where these custom data types come in handy.

For example, if you have multiple columns in several tables that should be an int between 0 and 255, you could use a custom data type so that you only have to define the constraints once. Or, if you have complex data - like metadata about a file - and you want to save it to a single column instead of spreading it across several, custom data types can help.

Custom Domains in Postgres
Composite Types in Postgres
Custom Types in Django
Conclusion
About the Author

No matter how you decide to define your datatype, Django has the functionality to allow you to map custom column data to model attributes. You can achieve this by extending the Django field class. In this walkthrough, we'll see how to create custom types in Postgres and then use them in Django to ensure consistent data types across your application. We will do this by walking you through an example project.

Should you be interested in how to create custom Postgres data types in Rails, check out my dedicated article about it!

Custom Domains in Postgres

There are several different kinds of custom data types in Postgres, including enums and range types. The two we’ll use in our project today are called domain types and composite types.

First, let’s take a look at domain types. Domains are a way of adding restrictions to an existing type so that it can be reused in columns across tables. They are particularly useful for columns like email addresses, phone numbers, or street addresses, where you might find yourself repeating the same checks over and over. A custom domain allows you to define those checks once and then reuse them, making them easier to manage and maintain.

For our example project, we'll start by creating a custom data type that performs a check to ensure a string doesn't contain any spaces:

CREATE DOMAIN string_no_spaces as VARCHAR NOT NULL CHECK (value !~ '\s');

Now we can use this type in as many tables and columns as we like. For example, say we don’t want to allow spaces in user_names for a chat app:

CREATE TABLE users (
  id serial primary key,
  user_name string_no_spaces
);

Now if you try to add a value with a space, Postgres will throw an error:

INSERT INTO users(user_name) VALUES ('I am a      bad user name');
-- ERROR:  value for domain string_no_spaces violates check constraint "string_no_spaces_check"

We can also reuse this domain in the definition of another domain. For example:

CREATE DOMAIN email_with_check AS string_no_spaces NOT NULL CHECK (value ~ '@');

CREATE TABLE email_addresses (
  user_id integer,
  email email_with_check
);

INSERT INTO email_addresses(email) VALUES ('josh @gmail.com');
-- ERROR:  value for domain email_with_check violates check constraint "string_no_spaces_check"

INSERT INTO email_addresses(email) VALUES ('joshgmail.com');
-- ERROR:  value for domain email_with_check violates check constraint "email_with_check_check"

Here, we've created a new check to ensure an email contains @ and we've used string_no_spaces as our base type. This allows us to inherit the no spaces check. Now data of datatype email_with_check must contain @ and cannot contain spaces.
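To see how the two checks stack, here is a small Python sketch that mirrors them on the application side. It is only an illustration of the inheritance: the function names and error messages are made up for this example, and the real enforcement still happens in Postgres.

```python
import re


def check_string_no_spaces(value):
    """Mirror of NOT NULL CHECK (value !~ '\\s'): reject NULL and whitespace."""
    if value is None or re.search(r"\s", value):
        raise ValueError("violates string_no_spaces_check")
    return value


def check_email_with_check(value):
    """Run the base domain's check first, then CHECK (value ~ '@')."""
    check_string_no_spaces(value)
    if "@" not in value:
        raise ValueError("violates email_with_check_check")
    return value


for candidate in ("josh@gmail.com", "josh @gmail.com", "joshgmail.com"):
    try:
        check_email_with_check(candidate)
        print(candidate, "ok")
    except ValueError as error:
        print(candidate, error)
```

As with the domains, a value containing a space fails the inherited check before the @ check is ever reached.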

Composite Types in Postgres

The second kind of custom data type we’ll look at today is called a composite type. A composite type is essentially a group of data that can be held in a single column. Composite types can be helpful if you have lists of data that you don't want to be spread over multiple columns. Perhaps this data only makes sense when grouped together like the dimensions of a package.

RGB color data is another good example because it doesn't make much sense on its own - 255 is just an int - but coupled with some labels and two other numbers (red: 255, green: 0, blue: 0), it becomes the color red. Every time we access a color, we'll want to have all three of these values returned, so it saves us from having to query multiple columns for a group of data that is only meaningful when combined.

Let's start by creating a new RGB color value type:

CREATE TYPE rgb_color_value as (
  red integer,
  green integer,
  blue integer
);

Next, we can create a new table and use both our domain and custom data type for the columns:

CREATE TABLE colors (
  name string_no_spaces,
  rgb rgb_color_value
);

INSERT INTO colors(name, rgb) VALUES('pink', (252,15,192));

SELECT * FROM colors;

 name |     rgb
------+--------------
 pink | (252,15,192)

We can even access the individual values. For example, if all we want is the green value:

SELECT (rgb).green FROM colors;

 green
-------
    15

Custom Types in Django

Let's use our string_no_spaces domain and our rgb_color_value composite type to create a Django model to define a color. rgb_color_value is going to take the most work, so we'll start there and then come back to string_no_spaces.

Registering a type with psycopg2

We'll use the psycopg2 database adapter in this example. I won't go into how to set it up here, but I recommend this tutorial. It does a good job covering the setup if you aren't familiar with it yet.

We'll need to start by registering and creating an adapter for our new type so that psycopg2 knows how to handle it. After we register it, psycopg will return values from the database as a named tuple.

from django.db import connection
from psycopg2.extras import register_composite

Rgb = register_composite(
  'rgb_color_value',
  connection.cursor().cursor,
  globally=True
).type

The above code will handle data coming to our app from the database, but we'll also need to tell psycopg what to do with data sent to the database. That's where the adapter comes in:

from django.db import connection
from psycopg2.extras import register_composite
from psycopg2.extensions import register_adapter, adapt, AsIs

Rgb = register_composite(
  'rgb_color_value',
  connection.cursor().cursor,
  globally=True
).type

def rgb_adapter(value):
  # getquoted() returns bytes on Python 3, so decode before formatting
  return AsIs("(%s, %s, %s)::rgb_color_value" % (
    adapt(value.red).getquoted().decode(),
    adapt(value.green).getquoted().decode(),
    adapt(value.blue).getquoted().decode()
  ))

register_adapter(Rgb, rgb_adapter)

Now that psycopg knows about our new data type and how to handle it, we can create the same functionality for Django.

Representing composite types as a Python class

We want to be able to do things with our objects like this:

rgb = Rgb(255, 0, 0)

my_color_object.rgb = rgb 

my_color_object.save()

To do that, we'll need to start with a Python class that represents an RGB value.

class Rgb:
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

We'll come back to this class in a bit, but first, we need to talk about fields.

Using Django fields

You are probably familiar with many of Django's built-in model fields like models.CharField or models.IntegerField. You've also probably noticed that many of these fields correspond to data types we often use in Postgres (varchar, int etc.).

For custom data types, Django allows us to create our fields and then use them in our models:

from django.db import models

class RgbField(models.Field):
    
    def db_type(self, connection):
        return 'rgb_color_value'

All custom fields inherit from models.Field. You can also inherit from existing fields like models.CharField (which itself inherits from models.Field). This is helpful if your custom type behaves similarly to an existing type. Since ours doesn't, we'll inherit directly from models.Field.

Next, we need to override three methods so that our field converts between database values and our Rgb class. The first, from_db_value(), is called when data is loaded from the database. This is the method that will receive the named tuple we set up with psycopg2 earlier. The second, to_python(), gets called during deserialization. These two need to return an instance of the Rgb class.

The last method we need to override is get_prep_value, where we'll convert our Rgb object back into a tuple before handing it off to Psycopg to save to the database. When we're done, our field class should look like this:

class RgbField(models.Field):

    def from_db_value(self, value, expression, connection):
        if value is None:
            return value
        return Rgb(value.red, value.green, value.blue)

    def to_python(self, value):
        if isinstance(value, Rgb):
            return value

        if value is None:
            return value

        return Rgb(value.red, value.green, value.blue)

    def get_prep_value(self, value):
        # Pass NULLs through instead of failing on value.red
        if value is None:
            return value
        return (value.red, value.green, value.blue)

    def db_type(self, connection):
        return 'rgb_color_value'

The checks I put in place above are suggestions from the Django docs.

Finally, we can create our model using our brand new Rgb field:

class Color(models.Model):
    rgb = RgbField()
    name = models.CharField()

But wait! Didn't we create a special string_no_spaces domain that we want to use for the name attribute?

Since this type is just a string with checks at the database level, all we need to do is create the field with the appropriate db_type method:

class StringNoSpacesField(models.Field):
    
    def db_type(self, connection):
        return 'string_no_spaces'

Now we can update our model and run our migrations:

class Color(models.Model):
    rgb = RgbField()
    name = StringNoSpacesField()

Let's confirm that everything is working as expected. In the Python shell (I'm using shell plus), we'll create a new color:

>>> from colors.models import Color, Rgb
>>> rgb = Rgb(255, 0, 0)
>>> c = Color.objects.create(name='red', rgb=rgb)
>>> c.rgb
<colors.models.Rgb object at 0x104e3d6d8>
>>> c.rgb.red
255

If you try to create a color with a name that has a space in it, you will get an error like this:

django.db.utils.IntegrityError: value for domain string_no_spaces violates check constraint "string_no_spaces_check"

Now, let's check the database and make sure everything got saved as the correct type:

customdt=# SELECT pg_typeof(rgb), pg_typeof(name) FROM colors_color;
    pg_typeof    |    pg_typeof
-----------------+------------------
 rgb_color_value | string_no_spaces
(1 row)

From here, we could ensure that only numbers from 0 to 255 are entered by overriding the __init__ method and adding checks at the Postgres level. We could also create a new type for storing the hex code for each color in addition to the RGB value.
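As a rough sketch of the Python half of that idea, the variant of our Rgb class below validates each channel in its constructor. The exact checks and the error message are assumptions for illustration; a matching CHECK on the Postgres side would still be needed to protect data written outside of Django.

```python
class Rgb:
    """RGB value that rejects channels outside 0-255 (hypothetical variant)."""

    def __init__(self, red, green, blue):
        for name, channel in (("red", red), ("green", green), ("blue", blue)):
            if not isinstance(channel, int) or not 0 <= channel <= 255:
                raise ValueError(
                    f"{name} must be an integer from 0 to 255, got {channel!r}"
                )
        self.red = red
        self.green = green
        self.blue = blue


pink = Rgb(252, 15, 192)
print(pink.red, pink.green, pink.blue)

try:
    Rgb(256, 0, 0)
except ValueError as error:
    print(error)
```

Because the check lives in the constructor, every place that builds an Rgb, including from_db_value() and to_python(), gets the same validation for free.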

Conclusion

In this article, we saw how to create new data types in Postgres and bring them into a Django application. We created a string_no_spaces type with CREATE DOMAIN to help us set up some database level checks on columns. We used CREATE TYPE to create a brand new composite data type called rgb_color_value that allowed us to group the data for a color value and save it to a single column.

We then registered our new data types with psycopg2 so that the database adapter knew how to handle them. Finally, we took a look at the Django Field class. We learned how to control values coming to and from our database adapter to ensure our custom data type matches its corresponding Python class for use inside of our Django application.

As mentioned above, you can find all resources talked about here on our resources repository on GitHub.

PostGIS vs. Geocoder in Rails

Leigh Halliday — Thu, 01 Oct 2020 12:00:00 GMT

This article sets out to compare PostGIS in Rails with Geocoder and to highlight a couple of the areas where you'll want to (or need to) reach for one over the other. I will also present some of the terminology and libraries that I found along the way of working on this project and article as I set out to understand PostGIS better and how it is integrated with Rails.

If you are interested in learning how to work with geospatial data with PostGIS in Django I recommend having a look at our blog post Using GeoDjango and PostGIS in Django here.

Installing PostGIS
ActiveRecord PostGIS Adapter
Our Example Data
Building a Geo Helper Class with PostGIS
Finding Nearby Records with PostGIS and Geocoder
Finding Records Within a Bounding Box with PostGIS and Geocoder
Finding Records Within a Polygon with PostGIS and Geocoder
Finding Nearby Related Records with PostGIS and Geocoder
Conclusion
About the Author

Picture via Annie Spratt on Unsplash

Over the years I have built a number of Rails applications that show locations on a map and offer nearby search, all without ever using PostGIS! How was this possible? The reason is that there is a Ruby gem named Geocoder which enables you to do these sorts of queries, and it's quite efficient! That said, there is a reason that PostGIS exists. For more complex geo queries I’d recommend reaching beyond Geocoder to PostGIS.

As an example, if you wanted to find homes which have a school within 1km of them, or if you wanted to draw an oddly shaped polygon on a map and search within it, this is the world where PostGIS shines and makes these complex geo queries possible.

In this article we will be covering:

PostGIS in Rails setup
Finding nearby records (Geocoder + PostGIS)
Finding records within a bounding box (Geocoder + PostGIS)
Finding records within a polygon (PostGIS)
Finding nearby related records (PostGIS)

The source code referenced in this article can be found here.

Installing PostGIS

Postgres comes with a number of built-in extensions that you can enable, but unfortunately PostGIS (Spatial and Geographic objects for Postgres) isn't one of them. In order to enable this extension, you will have to use a Postgres install with PostGIS support. I recommend using the official PostGIS Docker image, but luckily many hosted Postgres solutions come with PostGIS already available. If you are not sure, you can query the available extensions with the following query:

select *
from pg_available_extensions
where name like '%postgis%'

If you'd like to see if the extension is already enabled, you can run this query:

select * from pg_extension

And finally, to enable this extension, you can use the command create extension postgis, but since we're working within Rails, there is a Gem that will take care of this step for us as we'll see below.

ActiveRecord PostGIS Adapter

If you have confirmed that your version of Postgres supports the postgis extension, you're ready to integrate it with your Rails application. This can be done by using the activerecord-postgis-adapter gem. Two things need to be done to get up and running. The first is to update the adapter within config/database.yml to be set to postgis. Next, if this is a new application, you can run rails db:create as normal, but if it is an existing one, you'll have to run the command rake db:gis:setup. This command enables the postgis extension in your database.

Our Example Data

We'll be working with sample data for a realtor website that allows us to find homes in a variety of ways, including homes that are nearby a local school. There are two models: homes and schools. The Rails migration to create these tables is below:

class CreateHomes < ActiveRecord::Migration[6.0]
  def change
    create_table :homes do |t|
      t.string :name, null: false
      t.string :status, null: false
      t.bigint :price, null: false
      t.integer :beds, null: false, default: 0
      t.integer :baths, null: false, default: 0
      t.st_point :coords, null: false, geographic: true
      t.float :longitude, null: false
      t.float :latitude, null: false
      t.timestamps

      t.index :coords, using: :gist
      t.index %i[latitude longitude]
      t.index :status
      t.index :price
    end

    create_table :schools do |t|
      t.st_point :coords, null: false, geographic: true

      t.index :coords, using: :gist
      t.timestamps
    end
  end
end

By using activerecord-postgis-adapter we are able to define PostGIS columns within our migration file. When working with PostGIS you can store a point (latitude + longitude) as a single column of type st_point, whereas when working with Geocoder the latitude and longitude are stored as floats in separate columns. Because we are comparing the two approaches, we will store the data both ways, but typically you would choose one approach or the other.

PostGIS geographic columns can be indexed using GiST style indexes. GiST indexes are required over B-Tree indexes when working with geographic data because coordinates cannot be easily sorted along a single axis (the way numbers, letters, or dates can) in a way that would allow the database to speed up common geographic operations.

The example project for this article contains a seeds file (run with rake db:seed) which will generate 100k homes and 100 schools in and around the Atlanta, Georgia area.

Building a Geo Helper Class with PostGIS

The Rails PostGIS adapter is based on a library named RGeo, which while incredibly powerful, I found a little bit confusing due to a lack of documentation. I ended up building a small helper class to generate different geo objects for me. The first thing to point out is what an SRID (Spatial Reference System Identifier) is. Just like the imperial and metric systems are used to measure and weigh amounts using an agreed upon reference point, coordinates also need a coordinate reference system to ensure that the latitude and longitude that one uses means the same thing to different people when referring to a single place on earth. SRID 4326 (WGS 84) is the spatial reference system used for GPS satellite navigation systems and the one we will be using within this article.

One last thing to define is what WKT is. Well-known Text representation of geometry is a string representation of a point, line string, and polygon (among other things) that we will be using in our examples in this article. This is the format Postgres (PostGIS) receives and displays geographic data types in.

class Geo
  SRID = 4326

  def self.factory
    @@factory ||= RGeo::Geographic.spherical_factory(srid: SRID)
  end

  def self.pairs_to_points(pairs)
    pairs.map { |pair| point(pair[0], pair[1]) }
  end

  def self.point(longitude, latitude)
    factory.point(longitude, latitude)
  end

  def self.line_string(points)
    factory.line_string(points)
  end

  def self.polygon(points)
    line = line_string(points)
    factory.polygon(line)
  end

  def self.to_wkt(feature)
    "srid=#{SRID};#{feature}"
  end
end

Finding Nearby Records with PostGIS and Geocoder

One of the most common geo queries used in applications is to find all records within X distance from a known point (the user's location, an event, a search, etc...). Because we installed Geocoder and added reverse_geocoded_by :latitude, :longitude to our Home class, we can use the near method to find all homes within 5km of this latitude and longitude (which happens to be Atlanta, Georgia). Geocoder likes to have arrays with latitude and then longitude, as opposed to PostGIS which prefers the exact opposite order!

Home.near([33.753746, -84.386330], 5).count(:all) # ~5ms

This query ran in about 5ms on my computer (searching through 100k records)... pretty fast! The reason it is fast is because we added an index on the latitude and longitude fields, but also because Geocoder applies a bounding box filter which utilises the index. Remember the Spatial Reference System (SRID) that we mentioned above? Because our coordinates do not take place on a Cartesian plane, we can’t use a standard distance formula to calculate the distance between two points. Although we won’t venture further into the math of this query below, it takes into consideration the Earth’s spherical nature when calculating the distance between two coordinates as specified by latitude and longitude. This article dives into more detail on these calculations if you are interested.

SELECT COUNT(*) FROM "homes" WHERE (homes.latitude BETWEEN 33.708779919704064 AND 33.798712080295935 AND homes.longitude BETWEEN -84.44041260768655 AND -84.33224739231345 AND (6371.0 * 2 * ASIN(SQRT(POWER(SIN((33.753746 - homes.latitude) * PI() / 180 / 2), 2) + COS(33.753746 * PI() / 180) * COS(homes.latitude * PI() / 180) * POWER(SIN((-84.38633 - homes.longitude) * PI() / 180 / 2), 2)))) BETWEEN 0.0 AND 5)
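If the SQL above is hard to parse, here is the same math as a small sketch, written in Python purely for readability (it is not Geocoder's code). The distance expression is the haversine formula with an Earth radius of 6371 km, and the function names are my own. The bounding_box helper is an assumption about how the BETWEEN bounds are derived, although it yields bounds that match the values in the query to several decimal places.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same constant that appears in the generated SQL


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    half_dlat = math.radians(lat1 - lat2) / 2
    half_dlon = math.radians(lon1 - lon2) / 2
    a = (math.sin(half_dlat) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(half_dlon) ** 2)
    return EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Cheap rectangular pre-filter (assumed derivation, see lead-in)."""
    km_per_degree_lat = EARTH_RADIUS_KM * math.pi / 180
    dlat = radius_km / km_per_degree_lat
    # A degree of longitude shrinks with the cosine of the latitude
    dlon = dlat / math.cos(math.radians(lat))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)
```

The rectangle is what lets the B-Tree index on latitude and longitude do most of the work; the more expensive haversine expression only has to run on the rows that survive that filter.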

We'll have to build our own near query when working with PostGIS, but don't worry, it's pretty straightforward! The g_near method lives within the Home model, and takes advantage of the ST_DWithin function provided by PostGIS. Remember that we have to convert our point into the correct WKT format so that PostGIS understands the data we are passing it.

def self.g_near(point, distance)
  where(
    'ST_DWithin(coords, :point, :distance)',
    { point: Geo.to_wkt(point), distance: distance * 1000 } # wants meters not kms
  )
end

Home.g_near(Geo.point(-84.386330, 33.753746), 5).count # ~5ms

This query performs just about as fast as the Geocoder version (because of our GiST index on the coords column), but is definitely a little easier to read.

SELECT COUNT(*) FROM "homes" WHERE (ST_DWithin(coords, 'srid=4326;POINT (-84.38633 33.753746)', 5000))

Finding Records Within a Bounding Box with PostGIS and Geocoder

Geocoder provides us a way to find all records within a bounding box (roughly a rectangle, ignoring projection onto a sphere), and we just have to pass it the bottom left (south west) and top right (north east) coordinates.

Home.within_bounding_box(
  [33.7250057553, -84.4224209302],
  [33.774350796, -84.3570139222]
).count # ~5ms

Because it can use the index on latitude and longitude, it is quite efficient.

SELECT COUNT(*) FROM "homes" WHERE (homes.latitude BETWEEN 33.7250057553 AND 33.774350796 AND homes.longitude BETWEEN -84.4224209302 AND -84.3570139222)

To perform a bounding box query using PostGIS, we'll create a method named g_within_box inside of the Home model, and utilize a PostGIS function named ST_MakeEnvelope along with the && operator.

def self.g_within_box(sw_point, ne_point)
  where(
    "coords && ST_MakeEnvelope(:sw_lon, :sw_lat, :ne_lon, :ne_lat, #{
      Geo::SRID
    })",
    {
      sw_lon: sw_point.longitude,
      sw_lat: sw_point.latitude,
      ne_lon: ne_point.longitude,
      ne_lat: ne_point.latitude
    }
  )
end

Home.g_within_box(
  Geo.point(-84.4224209302, 33.7250057553),
  Geo.point(-84.3570139222, 33.774350796)
).count # ~5ms

Again, this version performs at about the same efficiency as the Geocoder version.

SELECT COUNT(*) FROM "homes" WHERE (coords && ST_MakeEnvelope(-84.4224209302, 33.7250057553, -84.3570139222, 33.774350796, 4326))

Finding Records Within a Polygon with PostGIS and Geocoder

We're now into territory that requires PostGIS. To find records inside of a polygon, along with the help of our Geo class helper and the ST_Covers function from PostGIS, we can create a method named g_within_polygon in our Home model. This polygon is a triangle, where the last point is the same as the first one, thereby "closing" the shape of the polygon.

def self.g_within_polygon(points)
  polygon = Geo.polygon(points)
  where('ST_Covers(:polygon, coords)', polygon: Geo.to_wkt(polygon))
end

Home.g_within_polygon(
  Geo.pairs_to_points(
    [
      [-84.39731626974567, 33.75570358345219],
      [-84.33139830099567, 33.86524376001825],
      [-84.25243406759724, 33.770545357734925],
      [-84.39731626974567, 33.75570358345219]
    ]
  )
).count # ~5ms

This query remains efficient due to the use of our GiST index, searching through 100k records in about 5ms.

SELECT COUNT(*) FROM "homes" WHERE (ST_Covers('srid=4326;POLYGON ((-84.39731626974567 33.75570358345219, -84.33139830099567 33.86524376001825, -84.25243406759724 33.770545357734925, -84.39731626974567 33.75570358345219))', coords))
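To make the "closing" rule concrete, here is a rough sketch of how such a polygon literal could be assembled by hand, shown in Python rather than Ruby for illustration. The polygon_ewkt function is hypothetical and not part of RGeo or the Geo helper above. It takes [longitude, latitude] pairs, appends the first point again if the ring is not already closed, and prefixes the SRID the same way Geo.to_wkt does.

```python
def polygon_ewkt(pairs, srid=4326):
    """Build an 'srid=...;POLYGON ((...))' string from (lon, lat) pairs.

    Hypothetical helper for illustration only.
    """
    ring = [tuple(pair) for pair in pairs]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    # A polygon ring must end on the point it started from
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"srid={srid};POLYGON (({coords}))"
```

In the article's Ruby code this work is handled by RGeo, so the points passed to Geo.polygon already include the repeated closing point.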

Finding Nearby Related Records with PostGIS and Geocoder

Using PostGIS it is also possible to find related nearby records. What do I mean by that? Let's try to find available homes that are within 1km of a school. This can be done by joining to the schools table and utilizing ST_DWithin for the on clause. Starting with the SQL we'd like to produce:

SELECT
  count(DISTINCT homes.id)
FROM
  homes
  INNER JOIN schools ON ST_DWithin (homes.coords, schools.coords, 1000)
WHERE
  homes.status = 'available'

Within the Home model of our Rails application, we can create two scopes that allow us to find these homes. We're able to join 100k homes to the schools table (100 schools) based on their proximity in approximately 16ms.

class Home < ApplicationRecord
  scope :available, -> { where(status: 'available') }
  scope :near_school,
        lambda {
          select('DISTINCT ON (homes.id) homes.*').joins(
            'INNER JOIN schools ON ST_DWithin (homes.coords, schools.coords, 1000)'
          )
        }
end
# Example using the scopes declared above
Home.available.near_school.count('distinct homes.id') # 16ms

Conclusion

We've only scratched the surface of what you can do with PostGIS, yet we were able to cover a ton of functionality that is common among websites that allow you to filter results based on their location. That said, if PostGIS isn't available as an extension on your version of Postgres, or you aren't requiring the power that PostGIS provides, Geocoder offers you a great alternative.

About the Author

Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at FlipGive as a full-stack developer. He writes about Ruby and React on his blog and publishes React tutorials on YouTube.

Lessons Learned from Running Postgres 13: Better Performance, Monitoring & More

Maciek Sakrejda — Mon, 21 Sep 2020 12:00:00 GMT

Postgres 13 is almost here. It's been in beta since May, and the general availability release is coming any day. We've been following Postgres 13 closely here at pganalyze, and have been running the beta in one of our staging environments for several months now.

There are no big new features in Postgres 13, but there are a lot of small but important incremental improvements. Let's take a look.

Performance
Monitoring
Usability
Conclusion

Performance

Postgres 13 performance improvements include both built-in optimizations and heuristics that will make your database run better out of the box, as well as additional features to give you more flexibility in optimizing your schema and queries.

Smaller Indexes with B-Tree Deduplication

Postgres 13 introduces a way for B-Tree indexes to avoid storing duplicate entries in some situations. In general, a B-Tree index consists of a tree of indexed values, with each leaf node pointing to a particular row version. Because each leaf points to one row version, if you are indexing non-unique values, those values need to be repeated.

The de-duplication mechanism avoids that by having a leaf node point to several row versions if possible, which leads to smaller indexes.
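As a toy model of the idea (this is not Postgres's on-disk format, and the function names are made up for illustration), compare one index entry per row version with one entry per distinct key that carries a list of row pointers, which Postgres calls a posting list:

```python
from collections import defaultdict


def plain_leaf_entries(rows):
    """One (key, tid) entry per row version; duplicate keys are repeated."""
    return sorted((key, tid) for tid, key in rows)


def deduplicated_leaf_entries(rows):
    """One entry per distinct key, holding a sorted list of row pointers."""
    posting = defaultdict(list)
    for tid, key in rows:
        posting[key].append(tid)
    return [(key, sorted(tids)) for key, tids in sorted(posting.items())]


# Toy data: (row pointer, database_id) pairs with many repeated database_ids
rows = [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20)]
```

For the toy data, the plain form stores the key five times while the deduplicated form stores it twice. The more rows share a key, the larger the saving.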

Here is an example from our own pganalyze application schema: We have a queries table to track all the queries we monitor, and a database_id field to track which database they belong to. We index database_id (so we can quickly fetch queries for a specific database), and because each database typically has more than one query, there is a lot of duplication in this index.

New B-Tree indexes in Postgres 13 use the deduplication feature by default, but if for some reason, you need to turn it off, you can control it with the deduplicate_items storage parameter. Here we create the same index in two different ways, with deduplication explicitly on and off (though again, you don't need to specify on—this is the default):

CREATE INDEX CONCURRENTLY queries_db_id_idx_no_dedup ON queries(database_id)
WITH (deduplicate_items=off);

CREATE INDEX CONCURRENTLY queries_db_id_idx_yes_dedup ON queries(database_id)
WITH (deduplicate_items=on);

SELECT relname, pg_size_pretty(pg_relation_size(oid)) FROM pg_class
WHERE relname IN ('queries_db_id_idx_no_dedup', 'queries_db_id_idx_yes_dedup');

           relname           | pg_size_pretty 
-----------------------------+----------------
 queries_db_id_idx_no_dedup  | 218 MB
 queries_db_id_idx_yes_dedup | 67 MB
(2 rows)

With deduplication, the new index is more than three times smaller! Smaller indexes are faster to load from disk, and take up less space in memory, meaning there's more room for your data.

One interesting note here is that the index entries point to row versions (as in, a row the way it exists in one specific MVCC state), not rows themselves, so this feature can improve index size even for unique indexes, where one would not expect any duplication to occur.

Note that deduplication is not possible in all cases (see the Postgres B-Tree documentation for details), and that you will need to reindex before you can take advantage of it if upgrading via pg_upgrade.

Extended Statistics Improvements in Postgres 13

Postgres 10 introduced the concept of extended statistics. Postgres keeps some statistics about the "shape" of your data to ensure it can plan queries efficiently, but the statistics kept by default cannot track things like inter-column dependencies. Extended statistics were introduced to address that: These are database objects (like indexes) that you create manually with CREATE STATISTICS to give the query planner more information for more specific situations. These would be expensive for Postgres to determine automatically, but armed with an understanding of the semantics of your schema, you can provide that additional info. Used carefully, this can lead to massive performance improvements.

Postgres 13 brings a number of small but important improvements to extended statistics, including support for using them with OR clauses and in IN/ANY constant lists, allowing consideration of multiple extended statistics objects in planning a query, and support for setting a statistics target for extended statistics:

ALTER STATISTICS table_stx SET STATISTICS 1000;

Like with the regular statistics target, this is a trade-off between additional planning time (and longer ANALYZE runs), versus having more precise plans. We recommend using this in a targeted manner using EXPLAIN plans to confirm plan changes.

Parallel VACUUM & Better Support for Append-only Workloads

Postgres multi-version concurrency control means you need to run VACUUM regularly (usually you can rely on the autovacuum process, though it may need some tuning). In Postgres 13, one notable improvement is that multiple indexes for a single table can be vacuumed in parallel. This can lead to big performance improvements in VACUUM work. Parallel VACUUM is the default and can be controlled with the PARALLEL option:

VACUUM (PARALLEL 2, VERBOSE) queries;

INFO:  vacuuming "public.queries"
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
INFO:  scanned index "index_queries_on_database_id" to remove 1403418 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.98 s, system: 0.15 s, elapsed: 2.37 s
INFO:  scanned index "index_queries_on_last_occurred_at" to remove 1403418 row versions by parallel vacuum worker
DETAIL:  CPU: user: 0.88 s, system: 0.27 s, elapsed: 2.60 s
...

Parallel VACUUM occurs when the following is true:

Sufficient parallel workers are available, based on the system-wide limit set by max_parallel_maintenance_workers (defaults to 2)
There are multiple indexes on the table (one index can be processed by one worker at a time)
Index types support it (all built-in index types support parallelism to some extent)
The indexes are large enough to exceed min_parallel_index_scan_size (defaults to 512 kB)

Be aware that parallel VACUUM is currently not supported for autovacuum. This new feature is intended for use in manual VACUUM runs that need to complete quickly, such as when insufficient autovacuum tuning has led to an imminent TXID wraparound, and you need to intervene to fix it.

On that note, an important autovacuum improvement in Postgres 13 is that the autovacuum background process can now be triggered by INSERT statements for append-only tables. The main purpose of VACUUM is to clean up old versions of updated and deleted rows, but it is also essential to set pages as all-visible for MVCC bookkeeping. All-visible pages allow index-only scans to avoid checking visibility status row-by-row, making them faster.

We make extensive use of append-only tables at pganalyze for our timeseries data, and this improvement will make our lives considerably easier, avoiding the occasional manual VACUUM run on these tables. This new behavior can be controlled by the autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor variables.
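To get a feel for how the two settings interact, here is a small sketch of the trigger condition as I understand it from the Postgres documentation: a table becomes eligible once the inserts since the last vacuum exceed the threshold plus the scale factor times the table's estimated row count. The function and argument names are mine, and the defaults shown (1000 and 0.2) are the documented Postgres 13 defaults.

```python
def insert_vacuum_due(inserts_since_vacuum, reltuples,
                      insert_threshold=1000, insert_scale_factor=0.2):
    """Return True if an insert-triggered autovacuum would be due.

    Sketch only: real autovacuum also honors per-table storage
    parameters and other triggers (updates, deletes, wraparound).
    """
    limit = insert_threshold + insert_scale_factor * reltuples
    return inserts_since_vacuum > limit
```

For a table with 1,000,000 rows and default settings, the limit works out to 201,000 inserted rows. On very large append-only tables you may want a lower scale factor so that vacuums happen more often.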

Incremental Sorting

Sorting data is a common database task, and Postgres has a number of features to avoid unnecessary work here. For example, if you have a B-Tree index on a column, and you query your table ordered by that column, it can just scan that index in order to get sorted data.

In Postgres 13, this is improved to handle partially sorted data. If you have an index on (a, b) (or the data is already sorted by (a, b) for another reason), and you issue a query to order by (a, b, c), Postgres understands that the input data is already partially sorted, and can avoid re-sorting the whole dataset. This is especially useful if you have a LIMIT in your query, since this can avoid even more work.
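The idea can be sketched in a few lines of Python (an illustration of the technique, not how the executor is implemented): if the rows already arrive ordered by (a, b), we only need to sort each run of equal (a, b) values by c, and we can stop as soon as a LIMIT is satisfied.

```python
from itertools import groupby, islice


def incremental_sort(rows):
    """Yield (a, b, c) rows in (a, b, c) order.

    Assumes the input is already ordered by (a, b), for example
    because it comes from an index scan on (a, b).
    """
    for _, group in groupby(rows, key=lambda row: (row[0], row[1])):
        # Only this small group is held in memory and sorted
        yield from sorted(group, key=lambda row: row[2])


presorted = [(1, 1, 3), (1, 1, 1), (1, 2, 2), (2, 1, 5), (2, 1, 4)]
first_two = list(islice(incremental_sort(presorted), 2))
```

Because the generator is lazy, asking for the first two rows only sorts the first group, which mirrors why the LIMIT case benefits the most.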

Monitoring

Monitoring improvements in Postgres 13 include more details on WAL usage, more options for logging your queries, and more information on query planning.

WAL Usage Stats

The write-ahead log (WAL) ensures your data stays consistent in the event of a crash, even mid-write. Consistency is a fundamental property of databases—it ensures your transaction either committed or did not commit; you don't have to worry about in-between states. But on a busy system, WAL writes can often be a bottleneck. To help diagnose this, Postgres 13 includes more information on WAL usage from your queries.

EXPLAIN now supports information about WAL records generated during execution:

EXPLAIN (ANALYZE, WAL) DELETE FROM users;

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Delete on users  (cost=0.00..5409.00 rows=100000 width=6) (actual time=108.910..108.911 rows=0 loops=1)
   WAL: records=100000 fpi=741 bytes=11425721
   ->  Seq Scan on users  (cost=0.00..5409.00 rows=100000 width=6) (actual time=8.519..51.850 rows=100000 loops=1)
 Planning Time: 6.083 ms
 Execution Time: 108.955 ms
(5 rows)

You can see that the WAL line includes the number of records generated, the number of full page images (fpi), and the number of WAL bytes generated. Only non-zero values are printed in the default text format.

This is also available in pg_stat_statements. For example, on our staging environment, here is what we ran to get the statement that produced the most WAL records:

SELECT query, calls, wal_records, wal_fpi, wal_bytes FROM pg_stat_statements
  ORDER BY wal_records DESC LIMIT 1;

-[ RECORD 1 ]---------------------------------------------------------------------------------------------
query       | CREATE TEMPORARY TABLE upsert_data (server_id uuid NOT NULL, backend_id uuid NOT NULL,
            | query_start timestamp NOT NULL, query_fingerprint bytea NOT NULL, query_text text NOT NULL)
calls       | 7974948
wal_records | 966960816
wal_fpi     | 1018412
wal_bytes   | 100086092097

Like many other values in pg_stat_statements, the wal_records, wal_fpi, and wal_bytes values here are cumulative since the last pg_stat_statements_reset call.

This info can help you identify your write-heavy queries and optimize as necessary. Note that write-heavy queries can also affect replication: If you see replication lag, you can use these new features to understand better which statements are causing it.

Better Statement Logging in Postgres 13

Settings like log_min_duration_statement are great to help you understand your slow queries, but how slow is slow? Is the reporting query that runs overnight slow compared to the 5s query that runs in the context of a web request? Is that 5s query, that runs once in a rarely-used endpoint, slow compared to a 100ms query that runs twenty times to load your home page?

Until now, log_min_duration_statement was one blunt tool for all these situations, but Postgres 13 brings some flexibility with sampling-based statement logging. You can set log_min_duration_sample to enable sampling, and then either set log_statement_sample_rate or log_transaction_sample_rate to control sampling.

Both of these settings work in a similar manner: they range from 0 to 1, and determine the chance that a statement will be randomly selected for logging. The former applies to individual statements, the latter determines logging for all statements in a transaction. If both log_min_duration_statement and log_min_duration_sample are set, the former should be a higher threshold that logs everything, and the latter can be a lower threshold that logs only occasionally.
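Here is a rough sketch of how the two thresholds combine for statement-level sampling. It is a simplification under my own naming, not the server's actual code, and it leaves out log_transaction_sample_rate entirely. As with the real settings, -1 disables a threshold.

```python
import random


def should_log(duration_ms, min_duration_statement, min_duration_sample,
               statement_sample_rate, rng=random.random):
    """Decide whether a finished statement gets logged (simplified sketch)."""
    # At or above the higher threshold: always log
    if min_duration_statement >= 0 and duration_ms >= min_duration_statement:
        return True
    # At or above the lower threshold: log only a random sample
    if min_duration_sample >= 0 and duration_ms >= min_duration_sample:
        return rng() < statement_sample_rate
    return False
```

For example, with log_min_duration_statement set to 5000 ms, log_min_duration_sample set to 100 ms, and log_statement_sample_rate set to 0.1, every statement of five seconds or longer is logged, roughly one in ten statements between 100 ms and five seconds is logged, and anything faster is skipped.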

Another great statement logging improvement is being able to log parameters for failed statements with log_parameter_max_length_on_error. Here's an example of setting this to -1 (unlimited) and trying to run SELECT pg_sleep($1) (with parameter $1 set to 3) on a connection with a statement_timeout of 1s:

2020-09-17 12:23:03.161 PDT [321349] maciek@maciek ERROR:  canceling statement due to statement timeout
2020-09-17 12:23:03.161 PDT [321349] maciek@maciek CONTEXT:  extended query with parameters: $1 = '3'
2020-09-17 12:23:03.161 PDT [321349] maciek@maciek STATEMENT:  select pg_sleep($1)

The timeout case is especially useful: Since both the query text and the parameters are now available in the logs, you could run EXPLAIN on any failed query to figure out what query plan caused it to hit the time-out (N.B.: you are not guaranteed to get the same plan that failed, but depending on your workload, the odds are pretty good).

More Planning Information

The usual culprit in slow queries is the query execution itself, but with a complex schema and an elaborate query, planning can take significant time as well. Postgres 13 introduces two new changes that make it easier to keep an eye on planning:

First, the BUFFERS option to EXPLAIN gives you more information on memory usage during query planning. Postgres manages memory for your data and indexes using a "buffer pool", and the BUFFERS option can show you which parts of your query are using that memory and how. The EXPLAIN documentation has some more details. New in Postgres 13 is the ability to see how buffers are used during query planning:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_class;

                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..16.86 rows=386 width=265) (actual time=0.014..0.120 rows=390 loops=1)
   Buffers: shared hit=13
 Planning Time: 1.021 ms
   Buffers: shared hit=118
 Execution Time: 0.316 ms
(5 rows)

Second, pg_stat_statements will keep track of time spent planning if you enable the pg_stat_statements.track_planning setting:

SELECT query, plans, total_plan_time,
       min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time
FROM   pg_stat_statements WHERE queryid = -7012080368802260371;

-[ RECORD 1 ]----+----------------------------------------------------------------------
query            | SELECT query, plans, total_plan_time,                                +
                 |        min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time+
                 | FROM   pg_stat_statements WHERE queryid = $1
plans            | 1
total_plan_time  | 0.083102
min_plan_time    | 0.083102
max_plan_time    | 0.083102
mean_plan_time   | 0.083102
stddev_plan_time | 0

This is turned off by default due to performance overhead for certain workloads, but if you suspect planning time is an issue, it's definitely worth checking out. For more details on the performance regression, see this mailing list discussion; this is expected to be resolved in the future and the default may change.

Usability

Postgres 13 usability improvements include better documentation, better built-in UUID support, and some handy psql enhancements.

Glossary

TOAST? Tuple? Postmaster?

Any complex system will develop its own jargon, and Postgres is no exception. Some of it comes from the database field in general, some of it is Postgres-specific. Having dedicated language to talk precisely about specific technical concepts is very useful, but it can be confusing for newcomers.

Tuple

A collection of attributes in a fixed order. That order may be defined by the table (or other relation) where the tuple is contained, in which case the tuple is often called a row. It may also be defined by the structure of a result set, in which case it is sometimes called a record. - PostgreSQL Glossary

You are likely familiar with the terms above, but if you ever run across something you are unclear on, those and many others are now documented in a new glossary. And now that there's an established place to do so, we can look forward to other technical terms being added here in the future.

Better UUID Support

If you use UUIDs in your system (and you should consider it—they're pretty handy), you're probably pretty familiar with the uuid-ossp extension. The base uuid type is built in, but by default, there's no simple mechanism to automatically generate new ones. The uuid-ossp extension ships with Postgres, but must be enabled explicitly to create UUID-generation functions like the common uuid_generate_v4.

Postgres 13 ships with a gen_random_uuid function that is equivalent to uuid_generate_v4, but available by default. If you were only using uuid-ossp for that function, you no longer need the extension:

=> \dx
     List of installed extensions
 Name | Version | Schema | Description
------+---------+--------+-------------
(0 rows)

=> SELECT gen_random_uuid();
           gen_random_uuid
--------------------------------------
 07b45dae-e92e-4f91-8661-5fc0ef947d03
(1 row)
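If you are curious what "equivalent to uuid_generate_v4" means in practice, both functions produce version 4 (random) UUIDs. As a quick client-side check, sketched in Python with the standard library's uuid module (the variable names are mine), you can parse the value returned above and compare it with one generated locally:

```python
import uuid

# The value returned by gen_random_uuid() in the psql session above
from_postgres = uuid.UUID("07b45dae-e92e-4f91-8661-5fc0ef947d03")

# A random UUID generated on the client for comparison
from_python = uuid.uuid4()

same_kind = (from_postgres.version == from_python.version == 4
             and from_postgres.variant == from_python.variant == uuid.RFC_4122)
```

The version is encoded in the first digit of the third group (the 4 in 4f91), so you can also spot it by eye.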

psql Improvements

There are a number of small psql improvements in Postgres 13. My favorite is that \e, the command to invoke your $EDITOR on the current query buffer, will now display the query text when you save and exit (unless you directly submit it by ending with a semicolon or \g). Previously, the query text was saved, but hidden. Compare opening your editor and saving SELECT 1 in psql 11:

maciek=# \e
maciek-# ;
 ?column? 
----------
        1
(1 row)

versus psql 13:

maciek=# \e
maciek=# select 1
maciek-# ;
 ?column? 
----------
        1
(1 row)

It's now clear what query text will be submitted when you complete your query.

Postgres 13 also includes additional ways to customize your psql prompt. You can do so, as always, with \set (typically in your .psqlrc), but there are a couple of new substitutions available:

%x will display the status of the current transaction: an empty string for no transaction, * when in an open transaction, ! when in a failed transaction, or ? when the transaction state is unknown (typically when there is no connection to the server)
%w will pad PROMPT2 (used when more input is expected) to be the same width as PROMPT1 to keep things nicely aligned

There are some other small improvements as well. And these are all client-side changes, so they will also work if you are using a new psql with an older server!

Conclusion

These are just a few of the many small improvements that come with Postgres 13. There are many others, like partial TOAST decompression, trusted extensions (so you can enable them without being superuser), PL/pgSQL performance improvements, and more. You can check out the full release notes on the Postgres web site.

We're very excited for this release. We already support monitoring Postgres 13 in pganalyze, and are already working on incorporating the new monitoring features directly into the product to give you better insights into your database.


Using Postgres Row-Level Security in Python and Django

Josh Alletto — Thu, 13 Aug 2020 12:00:00 GMT

Postgres introduced row-level security in 2016 to give database administrators a way to limit the rows a user can access, adding an extra layer of data protection. What's nice about RLS is that if a user tries to select or alter a row they don't have access to, their query will return 0 rows, rather than throwing a permissions error. This way, a user can use select * from table_name, and they will only receive the rows they have access to with no knowledge of rows they don't.

Most examples of RLS limit row access by database user. This can be a powerful feature. In this article, we will have a look at how you can make this happen for your Django app. The problem most people run into when trying to implement row level security is that most web applications, including Django applications, connect to the database with a single user, which makes it hard to take advantage of row level security.

One way to get around this is to create a database user for each application user. We’ll start with just the database layer. We’ll build out our tables and create a couple of users, then write our first row level security policy to limit which rows those users can access. Once we have an understanding of how RLS works in Postgres, we’ll expand our project out into Django and see how we can handle working with policies and multiple database users in a web application.

By the way, if you are interested in using Row-Level Security in Ruby on Rails, we have a dedicated article for that here: Using Postgres Row-Level Security in Ruby on Rails.

How to use RLS at the database level
How to Use Postgres Row-Level Security in Django
Conclusion
About the Author

How to use RLS at the database level

Before we get to the Django side of things, let's take a look at how RLS works in Postgres. We'll keep it simple and say we are building an app to help our salespeople keep track of their clients, and we want to make sure no salesperson can access the clients of another salesperson. (These are very competitive, cutthroat salespeople).

First, let's set up our tables and populate them with some data:

CREATE TABLE salespeople (id serial primary key, name text);
CREATE TABLE clients (id serial primary key, name text, salesperson_id integer);

INSERT INTO salespeople (name) values ('Picard');
INSERT INTO salespeople (name) values ('Crusher');

INSERT INTO clients (name, salesperson_id) values ('client1', 1);
INSERT INTO clients (name, salesperson_id) values ('client2', 2);
INSERT INTO clients (name, salesperson_id) values ('client3', 2);

Now, we have two salespeople. Picard has one client, and Crusher has two clients.

Next, we are going to need some database users, one for each salesperson. Because two salespeople might share the same name, we are going to use the id to create Postgres users. We are also going to create a role called salespeople. This will be the role we grant permissions on, and all of our salespeople can inherit from it.

CREATE ROLE "1";
CREATE ROLE "2";
CREATE ROLE salespeople;

GRANT select, insert ON clients TO salespeople;
GRANT salespeople TO "1";
GRANT salespeople TO "2";

This setup will come in handy in the next section when we have to deal with Django's tables in addition to the ones we create for our models.

Now we are ready to set up RLS on our clients table. Our policy will limit access to the Postgres current_user so that they can only view rows where current_user matches salesperson_id.

ALTER TABLE clients ENABLE ROW LEVEL SECURITY;
CREATE POLICY salesperson_clients ON clients USING (salesperson_id::text = current_user);

When we create the policy, we give it a name, salesperson_clients, and enter the table we want to set the policy on, clients. Next, we define the policy. In this case, it is very simple: the salesperson_id on the table must be equal to the value of current_user. We have to convert the salesperson_id from an integer to text because our current_user must be a string (we can't create Postgres users with integers as names).
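
To make the effect of the USING clause concrete, here is a small model of it in plain Python. It is only a sketch of the filtering behaviour, not how Postgres implements policies; CLIENTS and visible_clients are illustrative names, and the rows mirror the INSERT statements from earlier.

```python
# Rows of the clients table from the example above.
CLIENTS = [
    {"id": 1, "name": "client1", "salesperson_id": 1},
    {"id": 2, "name": "client2", "salesperson_id": 2},
    {"id": 3, "name": "client3", "salesperson_id": 2},
]


def visible_clients(rows, current_user):
    """Mimic USING (salesperson_id::text = current_user).

    Rows that fail the check are silently left out; no error is raised.
    """
    return [row for row in rows if str(row["salesperson_id"]) == current_user]


print([row["name"] for row in visible_clients(CLIENTS, "1")])  # ['client1']
print([row["name"] for row in visible_clients(CLIENTS, "2")])  # ['client2', 'client3']
print(visible_clients(CLIENTS, "3"))                           # []
```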

Right now, we are logged in as the postgres user.

SELECT session_user, current_user;

 session_user  | current_user  
---------------+---------------
 postgres      | postgres
(1 row)

If we query our clients table, we will be able to see all the rows because RLS policies do not apply to superusers.

SELECT * FROM clients;

 id |  name   | salesperson_id 
----+---------+----------------
  1 | client1 |              1
  2 | client2 |              2
  3 | client3 |              2
(3 rows)

But if we change the current user, we only get the rows that belong to that user.

SET ROLE "1";
SELECT session_user, current_user;

 session_user | current_user 
--------------+--------------
 postgres     | 1
(1 row)

SELECT * FROM clients;

 id |  name   | salesperson_id 
----+---------+----------------
  1 | client1 |              1
(1 row)

How to Use Postgres Row-Level Security in Django

Now, how can we translate this to a Django application?

First, we will need to create a database user for each app user we create. One way to accomplish this would be to override the save method on the Salesperson model, but this is a great opportunity to take advantage of Django signals, so we'll create a signal that creates the database user after a new salesperson is saved.

Next, we'll have to figure out how to switch to the correct user when a salesperson logs in. For this, we can use a middleware that gets the salesperson_id and sets the role in the database.

Models

Our models reflect exactly what we set up in our earlier database example. Here I chose to make Salesperson a proxy of Django's built-in User model, but this is not required.

from django.db import models
from django.contrib.auth.models import User

class Salesperson(User):
    class Meta:
        proxy = True
    
class Client(models.Model):
    name = models.CharField(max_length=50)
    salesperson = models.ForeignKey(Salesperson, on_delete=models.CASCADE)

Django Signals: Creating Our Database User

We want to create a new database user every time a new salesperson record is created. We can use Django signals to execute some code after a new record is saved. If you're not familiar with signals, the Django docs on this topic are easy to understand. If this piqued your interest, this article goes into more detail.

Here is the code for the signal itself, but you'll have to reference the above article to get it registered in your app:

from .models import Salesperson
from django.db.models.signals import post_save
from django.db import connection

def create_db_user(sender, instance, created, **kwargs):
    if created:
        user_id = instance.id
        with connection.cursor() as cursor:
            cursor.execute(f'CREATE ROLE "{user_id}"')
            cursor.execute(f'GRANT salespeople TO "{user_id}"')

post_save.connect(create_db_user, sender=Salesperson)

The post_save signal can take a named argument created, which is a boolean. This avoids running the code every time we update the record and ensures it will only run when we create a new salesperson. From there, we can get the user id from the instance and use django.db.connection to run our SQL to create the role and grant permissions.
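
Role names are identifiers, so they cannot be passed as bound query parameters and have to be interpolated into the statement text. With integer primary keys that is safe, but if you ever switch to another kind of identifier, it helps to build the statements in one place and quote the name. A minimal sketch, assuming integer ids; quote_ident and role_statements are hypothetical helper names, not Django APIs:

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier: wrap in double quotes, double any inner ones."""
    return '"' + name.replace('"', '""') + '"'


def role_statements(user_id: int) -> List[str]:
    """Build the two statements the signal handler runs for a new salesperson."""
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise TypeError("user_id must be an integer primary key")
    role = quote_ident(str(user_id))
    return [f"CREATE ROLE {role}", f"GRANT salespeople TO {role}"]


print(role_statements(3))
# ['CREATE ROLE "3"', 'GRANT salespeople TO "3"']
```

Each returned string could then be handed to cursor.execute in the signal handler.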

It's very important to note that if you want to use Django's built-in User model and the authentication that comes with it, you'll need to grant the salespeople role permissions on the django_admin_log and auth_user tables. That's why it's so helpful to have this parent role that all individual users inherit from.

Django Middleware: Setting Current User

Now, we can write a middleware to switch the database user to the current application user making the request.

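from django.db import connection

class RlsMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # request.user is populated by Django's AuthenticationMiddleware,
        # so this middleware must be listed after it in settings.py
        user_id = request.user.id
        with connection.cursor() as cursor:
            # Switch to the Postgres role named after the application user's id
            cursor.execute(f'SET ROLE "{user_id}"')

        response = self.get_response(request)
        return response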

We get the user id from the request object. After that, the code looks pretty similar to our signal. We use the Django db connection again to set the role to the corresponding database user, which should match the application user's id. Don't forget to register your middleware in settings.py.
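
One case the middleware does not cover is a request without a logged-in user: Django's AnonymousUser has an id of None, so the statement would name a role that does not exist. A minimal sketch of one way to choose the statement; role_sql is a hypothetical helper, and falling back to RESET ROLE is an assumption that is only appropriate if the connection's login role is itself restricted:

```python
from typing import Optional


def role_sql(user_id: Optional[int]) -> str:
    """Pick the statement to run at the start of a request."""
    if user_id is None:
        # Assumption: anonymous requests fall back to the login role.
        return "RESET ROLE"
    # int() guards against anything but a numeric id reaching the SQL text.
    return f'SET ROLE "{int(user_id)}"'


print(role_sql(1))     # SET ROLE "1"
print(role_sql(None))  # RESET ROLE
```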

Now we can use all of Django's built-in query methods while maintaining row-level security in Postgres. What is particularly cool is that, with the role set, all we need to do to get all of a salesperson's clients is call Client.objects.all(), and we can be sure that only the clients related to the salesperson will be returned. If a salesperson tries to query for a client that doesn't belong to them, they'll get zero results.

Conclusion

In this article we were able to create a simple but powerful row level security policy and, with the help of Django middleware and Django signals, implement the policy at the application level. We saw how to create database users each time we created a new application user, and looked at setting the database role to the correct user after log in, ensuring each application user only had access to the rows that belonged to them.

There are a few caveats here. For one, using the ids 1, 2, 3 is probably not a good idea in production. You'd want to set up some kind of UUID or some other identifier. Also, creating a new database user for every application user becomes hard to scale at a certain point. Row level security can be a useful tool for limiting access at the database level, and we just scratched the surface of what's possible.

Still, you should be sure RLS is the right solution for your application before trying to implement it. In particular, the performance implications of row-level security, and how the Postgres planner treats it for query plans, should not be overlooked. This has been significantly improved in Postgres 10, but it's still essential to monitor your Postgres query plans when using RLS.

In many cases, RLS is not needed, and you’ll be able to secure your data using the security measures already built into Django.


About the Author


Postgres JSONB Fields in Django

Karl Hughes — Thu, 30 Jul 2020 12:00:00 GMT

I remember the first time I built user preferences into an app. At first, users just needed to be able to opt in or out of our weekly emails. "No big deal," I thought, "I'll just add a new field on the Users table." For a while, that was fine. A few weeks later, my boss asked me if we could let users opt into push notifications. Fine, that's just one more column on the database. Can't hurt, right?

You probably see where this is going.

Within months, my user table had 40 columns, and while Postgres can handle it, it gets pretty tricky for new devs to keep up with all of them. You can imagine it looked pretty similar to this settings screen of Quora.

Fortunately, there is rich support in Postgres for JSON fields, which can be very handy in situations like mine. Both JSON data types (json and jsonb) allow you to store entire objects or lists directly in your database. This means that you can store any number of user preferences in one column.

Why two types of Postgres JSON fields?
Querying JSONB data in Postgres
Django support for JSONB fields
Limitations of JSONB fields with Postgres and Django
Conclusion
About the author

Why two types of Postgres JSON fields?

JSON support in Postgres gives you the flexibility of a document store database like Mongo with the speed and structure of a relational database. JSON support is powerful, but because it comes in two types (json and jsonb), it's helpful to understand which is the right choice for your application. The json data type was added in Postgres 9.2 and enhanced in 9.3. This new data type allowed you to store JSON directly in your database and even query it. The problem was that json data was stored as a special kind of text field, so it was slow to query.

Postgres introduced jsonb in 9.4 to combat this issue. Unlike json fields, jsonb fields are stored in a binary structure rather than text strings. While this means that writes are slightly slower, querying from jsonb fields is significantly faster. It also allows you to index jsonb fields. This makes jsonb the preferred format for most JSON data stored in Postgres, and the typical choice for Django applications.

Querying JSONB data in Postgres

The query syntax for accessing JSON in Postgres is not typical SQL. You have to use the specific JSON operators and functions, so queries on JSON data look different from other Postgres queries.

For example, if you stored the following data in a Postgres table called profiles:

 id | name    | preferences
----+---------+------------------------------------------------------------
  1 | Mike    | {"sms": false, "daily_email": true, "weekly_email": true}
  2 | Lucy    | {"sms": true, "daily_email": false, "weekly_email": false}
  3 | Harriet | {"sms": true, "daily_email": true, "weekly_email": true}

And you wanted to query all users who have opted into your daily_email, you'd write a query like this:

select * from profiles
where (preferences->>'daily_email')::boolean = true;

Which would give you back the rows for Mike and Harriet. I'm pretty good with SQL, but using JSON operators always slows me down. Fortunately, Django offers support for JSONB fields, so you don't have to become an expert at querying JSON in Postgres.
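
If the operators feel opaque, it can help to see the same filter written out in plain Python. This is only a sketch of what the query computes, not how Postgres evaluates it; PROFILES and opted_in are illustrative names, and the rows are the three from the table above.

```python
import json

# (id, name, preferences) with preferences kept as JSON text
PROFILES = [
    (1, "Mike", '{"sms": false, "daily_email": true, "weekly_email": true}'),
    (2, "Lucy", '{"sms": true, "daily_email": false, "weekly_email": false}'),
    (3, "Harriet", '{"sms": true, "daily_email": true, "weekly_email": true}'),
]


def opted_in(rows, key):
    """Return the names whose preferences have `key` set to true."""
    names = []
    for _id, name, raw in rows:
        prefs = json.loads(raw)      # parse the stored document
        if prefs.get(key) is True:   # a missing key counts as "not opted in"
            names.append(name)
    return names


print(opted_in(PROFILES, "daily_email"))  # ['Mike', 'Harriet']
```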

Django support for JSONB fields

Since Django 1.9, the popular Python framework has supported jsonb and several other Postgres-specific fields. Native Django support means that creating jsonb fields, using them in your models, inserting data into them, and querying from them are all possible with Django's ORM. Let's take a look at how you can get started using jsonb with Django.

Creating JSONB fields using migrations

Django's Postgres module comes with several field classes that you can import and add to your models. If you want to use a JSON field, import the JSONField class and use it for your model's property. In this example, we'll call the field preferences:

from django.db import models
from django.contrib.postgres.fields import JSONField
 
class Profile(models.Model):
    name = models.CharField(max_length=200)
    preferences = JSONField()
 
    def __str__(self):
        return self.name

Django only supports the jsonb column type, so when you run your migrations, Django will create a table definition like this:

create table app_profile
(
    id serial not null constraint app_profile_pkey primary key,
    name varchar(200) not null,
    preferences jsonb not null
);

Adding data to JSONB fields

Because JSON fields don't enforce a particular schema, Django will convert any valid Python data type (dictionary, list, string, number, boolean) into the appropriate JSON. For example, if you want to add a new row to the app_profile table created above, you can run the following in your Django application:

from app.models import Profile
 
# Create a Profile with preferences
p = Profile(name="Tanner", preferences={'sms': False, 'daily_email': True, 'weekly_email': True})
p.save()

This will create a new user named Tanner who will receive our daily_email and weekly_email, but no sms messages.

Querying JSONB fields

Django uses the double underscore pattern from field lookups to query JSON object keys. For example, if you want to get all the Profiles for users who have opted into our daily_email, you'd use the following code:

results = Profile.objects.filter(preferences__daily_email=True)

If you want to check the SQL query that Django runs, you can print it from the query property on the results object:

print(results.query)
# Output:
SELECT "app_profile"."id", "app_profile"."name", "app_profile"."preferences" 
FROM "app_profile" WHERE ("app_profile"."preferences" -> daily_email) = 'true'

As you can see, the query is slightly different from the one I manually wrote above (I cast daily_email to a boolean), but it accomplishes the same thing. You can also filter records based on the keys they contain. For example, if some user accounts were created before you added the sms option, you might want to find them and let the users know about the new option. You can use the isnull field lookup on the sms key in your JSON data:

results = Profile.objects.filter(preferences__sms__isnull=True)

There are many other ways to filter queries using JSON fields, so be sure to check out the official Django docs for more.

Limitations of JSONB fields with Postgres and Django

It's worth noting that jsonb fields come with some drawbacks. I've already mentioned that it takes slightly longer to write data to jsonb fields than json because the JSON string must be converted to binary, but there are other reasons to avoid jsonb fields.

First, if your data needs to enforce a strict schema, JSON may not be an ideal choice. While you can use check constraints to enforce the use of specific fields, this isn't natively supported in Django, so you'll need to write your own migrations to accomplish this.

A better way to address this shortcoming is by writing Django validation rules to enforce the structure you want. If you don't want to write the validation rules yourself, there's a popular package called jsonschema that I'd recommend.
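
For a structure as small as our preferences object, a hand-written check can be enough. Here is a minimal sketch in plain Python; EXPECTED_KEYS and validate_preferences are hypothetical names, and the rule that all three keys are required booleans is an assumption made for illustration:

```python
# Assumed schema: every key is required and must be a boolean.
EXPECTED_KEYS = {"sms": bool, "daily_email": bool, "weekly_email": bool}


def validate_preferences(prefs):
    """Return a list of problems; an empty list means the value is acceptable."""
    if not isinstance(prefs, dict):
        return ["preferences must be a JSON object"]
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in prefs:
            problems.append(f"missing key: {key}")
        elif not isinstance(prefs[key], expected_type):
            problems.append(f"{key} must be {expected_type.__name__}")
    for key in prefs:
        if key not in EXPECTED_KEYS:
            problems.append(f"unexpected key: {key}")
    return problems


print(validate_preferences({"sms": False, "daily_email": True, "weekly_email": True}))
# []
print(validate_preferences({"sms": "yes", "daily_email": True}))
# ['sms must be bool', 'missing key: weekly_email']
```

In a Django project, a function like this could be called from a field validator that raises ValidationError when the list is not empty.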

Another drawback to using JSON fields is handling changes to the shape of your data. If you want to add a new column to a database table in Postgres using Django, you simply update your model and run a migration. If you want to add a new field to a JSON column, it isn't quite as straightforward.

A pattern I've used before is to create a custom migration that loops through the affected records and updates each one individually. This naive method works for relatively small datasets, but it might not be a good idea if you need to update 1 million profiles in a production database. In that case, it might be better to write your code to handle the existence or absence of the key or run a batch update on the JSON object.
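
Handling a missing key in code can be as simple as overlaying the stored value on a set of defaults whenever you read it. A minimal sketch; DEFAULT_PREFERENCES and effective_preferences are hypothetical names, and the default values are assumptions:

```python
# Assumed defaults for keys that older rows may not have yet.
DEFAULT_PREFERENCES = {"sms": False, "daily_email": False, "weekly_email": False}


def effective_preferences(stored):
    """Overlay stored values on the defaults without mutating either dict."""
    return {**DEFAULT_PREFERENCES, **stored}


# A profile saved before the "sms" option existed
old_profile = {"daily_email": True, "weekly_email": True}
print(effective_preferences(old_profile))
# {'sms': False, 'daily_email': True, 'weekly_email': True}
```

With this approach, existing rows never need to be rewritten when a new option is introduced.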

Conclusion

While JSON data types come with some drawbacks, they are useful in situations where you need more flexibility in your data structure. Thanks to Django's native support for jsonb, you can get started using JSON data in your web applications without learning all the native Postgres query operators.

Next time you need more flexibility in your data model and want to benefit from the strengths of Postgres, give jsonb fields a try.


About the author

Karl Hughes is a technology team leader and software engineer. He is currently the founder of Draft, where he helps create technical content for engineering blogs.


Building SVG Components in React

Maciek Sakrejda — Thu, 09 Jul 2020 12:00:00 GMT

React is well known as a great tool for building complex applications from HTML and CSS, but that same approach can also be used with SVG to build sophisticated custom UI elements.

In this article, we'll give a brief overview of SVG, when to use it (and when not to), and how to use it effectively in a React application. We'll also briefly touch on how to integrate with d3 (which comes in very useful when working with SVG).

We relied heavily on SVG to build the charting updates we launched recently in pganalyze (check out my blog post about these if you missed it), and we would like to share how we work with SVG in React. At the end of the post, we link to a simple but functional charting example based on our new charting code. We stripped it down to make it easier to follow, and we think it's a great introduction to building SVG components in React.

What is SVG?
When to use SVG
SVG in React
Handling Layout in SVG
Sizing in SVG
SVG and d3
Interactivity
Styling SVG
Embedding HTML in SVG
Full Example
Conclusion

What is SVG?

SVG is an XML-based vector graphics format. It is commonly used for icons and illustrations, but the similarities to HTML make it a great fit to extend your UI. Like HTML, SVG consists of a DOM tree of elements which can be styled with CSS, can be scripted and animated, and can dispatch events on user interaction. SVG is well-supported in modern browsers, including Firefox, Safari, Chrome, and Edge. All of these support embedding SVG directly in HTML, and React supports using SVG elements to build your components.

A thorough overview of SVG is beyond the scope of this post, but let's review the salient features in the context of building UI components. The actual elements available fall in a few different categories:

simple lines and shapes, like rect, circle, and line
more complex lines and shapes, like polygon and path (check out Joni Trythall's post on everything you can do with the path data attribute!)
text, like the simple text, the fancy textPath (essentially text along an arbitrary path), and the handy title (for simple tooltips similar to HTML's title attribute)
special elements to combine and manipulate these, like mask, clipPath, and pattern
other odds and ends, like the familiar anchor (a) from HTML (with the neat feature that it can conform to whatever shape it is wrapping, and only that part is interactive).

Mozilla's MDN has a good reference to all the element types available.

The event system is very similar to what you're already used to in HTML. Some events are different, but many familiar ones like onClick, onMouseEnter, onFocus, and onKeyUp are there. Registering event handlers is the same—svg elements expose onEvent attributes and you can add your callback there. If you're using TypeScript, note that you'll need to parameterize generic React synthetic event types with SVGElement or just Element instead of HTMLElement. E.g.:

const handleClick = (e: React.MouseEvent<Element>): void => {
  console.log('clicked', e.currentTarget)
}

When to use SVG

If you think SVG might be a good fit for some section of your UI, there's a good chance you're right, but you should consider your options. There are always trade-offs. If you don't go with SVG, your other likely options in a React app are going to be sticking with HTML, or using Canvas. You can do a lot with some plain divs and CSS, so HTML may be suitable for more than you think. That said, if you feel like your use case is pushing HTML to the breaking point (or at least into a cryptic forest of obscure tags and esoteric styling), maybe it's not the right fit. Remember Kernighan's Law:

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

On the other end of the spectrum there's Canvas. Its immediate mode paradigm means it can perform much better with huge datasets, but that also makes it awkward to work with in React, and harder to script rich interactivity. SVG (and HTML) have a retained mode model that's better suited to building UIs.

As a rule of thumb, if it's reasonable to stick with HTML, stick with HTML. If not, and you expect to work with a modest number of data points (the threshold will vary based on your UX needs and your performance expectations), SVG is a good bet. It will allow you to build these components in a manner similar to building HTML components, and to maintain a consistent look and feel with the rest of your app. Otherwise, consider Canvas or even WebGL.

SVG in React

The mechanics of using SVG elements in React are straightforward: Just write a standard component and return an SVG tag instead of an HTML tag. You only need to ensure you're nesting tags correctly and only putting SVG elements inside an <svg> tag (just as you should ensure you're not putting block-level elements in a <p>). There's no special class to extend, no extra options to handle.

Here is a trivial SVG component:

const Greeting = ({name}) => {
  return <text>hello {name}</text>;
}

This uses the SVG <text> element instead of the <span> you might expect in an HTML component, but as you can see, it's otherwise identical to an HTML component. To use this, just wrap it in an <svg> element:

const Main = () => {
  return (
    <svg>
      <g transform="translate(20,20)">
        <Greeting name="Maciek" />
      </g>
    </svg>
  )
}

(Don't worry about the <g> element for now; we'll cover that next.)

Handling Layout in SVG

A big difference between HTML and SVG is layout: In HTML, the normal layout flow positions elements on the page automatically according to a set of ~~simple~~ rules. In SVG, it's up to you to place each individual element exactly where it's supposed to go. There is no built-in positioning mechanism at all, and the order of the tags really only determines what gets drawn on top of what (like z-index in HTML; SVG has no explicit z-index.)

The mechanism for this positioning is a coordinate system that's standard in computer graphics: The origin is in the upper-left corner, and positive x and y values move elements to the right and down, respectively. You can think of it like an HTML document where all elements are position: absolute, and x and y are left and top. Here's an example:

As you can see, the x and y offsets mean slightly different things for different types of elements. For rect, it's the upper-left corner. For circle and ellipse, it's the center (in fact, circles and ellipses use cx and cy attributes instead of x and y to make this clearer). For text, it's a reference point, and the text's placement relative to that point is configurable via attributes like dominant-baseline and text-anchor (by default, the text starts at the x position, with the baseline at the y position.)

At first blush, this seems like it would make any non-trivial SVG component a nightmare to put together, but we can use some SVG features and some conventions to help us build complex modular components that work well together.

Let's assume we're working with a certain explicit width and height for our component. This can get tricky if you need your component to be responsive, but we'll hand-wave around it for now; we discuss that in more detail below.

In general, we've found a good approach for SVG components is to have parents size and position their children by subdividing the parent's own width and height. Parents determine each child's desired width and height and pass those as props. We could also pass x and y to have children position themselves, but SVG provides a handy element that simplifies this: the group (<g>). As its name suggests, the element is a way to apply a set of properties to a group of children. Most relevant for us is the transform attribute, specifically its translate value. This lets us establish a new origin local to the group, offset from the parent origin (which may be another <g>!) by the specified x and y values. This is perfect for positioning children, since you can easily do so in the parent. The children themselves can pretend they're positioned at the origin, so they only have to worry about their width and height, and what to render within that space. In fact, this pattern is so useful, we have a helper component to do this:

const Translate = ({x=0,y=0,children}) => {
  if (!x && !y) return children;
  return <g transform={`translate(${x},${y})`}>{children}</g>;
}

Even though subdividing space like this in SVG component hierarchies is a good rule of thumb, sometimes it may not be a good fit for some part of your UI. Since <g> is not a bounded container like a sized <div> with overflow: hidden, the width and height pattern is just metadata for children to follow as a guideline. If that pattern gets in the way, you can break the rules and have children draw outside these bounds (though the result may be harder to reason about).

Another useful pattern is to have components specify all the sizing and positioning metadata in constants at the top of the component:

export const Chart = ({ data }) => {
  const viewBoxWidth = 800;
  const viewBoxHeight = 400;
  const paddingX = 6;
  const paddingY = 4;
  const bottomAxisHeight = 30;
  const leftAxisWidth = 50;
  const bodyHeight = viewBoxHeight - bottomAxisHeight - 2 * paddingY;
  const bodyWidth = viewBoxWidth - leftAxisWidth - 2 * paddingX;
  const leftAxis = {
    pos: {
      x: paddingX,
      y: paddingY,
    },
    size: {
      width: leftAxisWidth,
      height: bodyHeight,
    },
  };
  const bottomAxis = {
    pos: {
      x: paddingX + leftAxisWidth,
      y: paddingY + bodyHeight,
    },
    size: {
      width: bodyWidth,
      height: bottomAxisHeight,
    },
  };
  const body = {
    pos: {
      x: leftAxis.pos.x + leftAxisWidth,
      y: paddingY,
    },
    size: {
      width: bodyWidth,
      height: bodyHeight,
    },
  };
  // chart logic code omitted
  return (
    <svg width="100%" height="400" viewBox={`0 0 ${viewBoxWidth} ${viewBoxHeight}`}>
      <Translate {...body.pos}>
        {/* chart body omitted */}
      </Translate>
      <Translate {...leftAxis.pos}>
        <LeftAxis {...leftAxis.size} /* other props omitted */ />
      </Translate>
      <Translate {...bottomAxis.pos}>
        <BottomAxis {...bottomAxis.size} /* other props omitted */ />
      </Translate>
    </svg>
  );
};

This may look tedious and verbose at first, but having the layout defined in one place, where it can be tweaked centrally and cross-referenced across child components, will make your life much easier as you inevitably adjust these. Plus, grouping related properties and using destructuring to apply them to children simplifies things a bit. It's worth the extra verbosity to avoid having to hunt for magic constants across a complex component, and updating all the different occurrences (while making sure you avoid updating constants for unrelated properties that may have the same value).
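
A side benefit of keeping the layout in named constants is that the arithmetic is easy to check. The sketch below redoes the numbers from the Chart component in Python (used here only for the arithmetic; the right and bottom helpers are illustrative) and asserts that the three regions meet edge to edge:

```python
VIEW_BOX_WIDTH, VIEW_BOX_HEIGHT = 800, 400
PADDING_X, PADDING_Y = 6, 4
BOTTOM_AXIS_HEIGHT, LEFT_AXIS_WIDTH = 30, 50

body_width = VIEW_BOX_WIDTH - LEFT_AXIS_WIDTH - 2 * PADDING_X       # 738
body_height = VIEW_BOX_HEIGHT - BOTTOM_AXIS_HEIGHT - 2 * PADDING_Y  # 362

# (x, y, width, height) for each region, mirroring the component's constants
left_axis = (PADDING_X, PADDING_Y, LEFT_AXIS_WIDTH, body_height)
body = (PADDING_X + LEFT_AXIS_WIDTH, PADDING_Y, body_width, body_height)
bottom_axis = (PADDING_X + LEFT_AXIS_WIDTH, PADDING_Y + body_height,
               body_width, BOTTOM_AXIS_HEIGHT)


def right(rect):
    return rect[0] + rect[2]


def bottom(rect):
    return rect[1] + rect[3]


# The regions touch without overlapping and stop one padding short of the viewBox
assert right(left_axis) == body[0]
assert bottom(body) == bottom_axis[1]
assert right(body) == VIEW_BOX_WIDTH - PADDING_X
assert bottom(bottom_axis) == VIEW_BOX_HEIGHT - PADDING_Y
```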

Sizing in SVG

In the layout discussion above, we assumed that an explicit width and height are provided to our SVG element. This is reasonable if you have a fixed-size element, but that means your element is not responsive or even resizable. Fortunately, there are two approaches we can take to work around this.

The first is that the coordinate system discussed above is a simplification. The actual mechanism is more complex (you can read a great overview from Sara Soueidan here), but the most relevant part for us is the viewBox attribute of the svg element. This defines the actual coordinate system to be used for layout inside the SVG element in terms of arbitrary units (if unset, this defaults to the actual width and height of the element). It also supports an x and y offset for the coordinate system (relative to the upper left of the element), but you can likely leave these as zero. The full syntax is

  viewBox="<xOffset> <yOffset> <width> <height>"

This lets us size our SVG component however we like (e.g., width="100%"), but still work in terms of subdividing a specific width and height inside the component. One thing to note is that font size will be relative to this viewBox coordinate system (that is, the size of the font will vary based on the ratio of viewBox coordinate system width to actual width). Another caveat is that if the aspect ratio of your viewBox width and height does not match the actual aspect ratio of the component, you'll probably want to tweak the preserveAspectRatio property. If set to "none", the coordinate system (and content) will stretch to fit the dimensions of the actual component. This will distort proportional width and height (so that, e.g., squares will no longer be square), but if that's not a concern in your component, this may be the simplest way to go.

Another approach is to measure your component before you draw anything, e.g., using a hook like useMeasure. This is more complicated (it requires pulling in a dependency or writing your own hook like this) and it delays rendering until the component is sized, but it allows you to work in the actual dimensions, ensuring no aspect ratio distortion.

SVG and d3

Whenever visualizing data in JavaScript, d3 is a great tool to consider. However, since both d3 and React have strong opinions about how to handle the DOM, getting them to play together nicely can be tricky. A good rule of thumb is to use d3 for layout and React for rendering.

In our pganalyze charts, we use d3 for scales, helpers for stacking area series data, bisectors for finding data points near the cursor, and for generating path data (the d attribute) for line and area charts. Almost everything else is plain React and SVG. Amelia Wattenberger's blog, linked above, has a separate post that's a great overview of the different d3 modules. Many of these are still useful when working with React, but the rendering-oriented ones may be more trouble than they're worth.

The one exception is that we do use d3 selection to take advantage of d3's axis convenience functions. They are isolated in their own Axis components, and we found it's okay to let d3 handle rendering as long as it's not competing with React. Since our Axis components have a simple interface and are rarely updated once a chart is mounted, we use the useLayoutEffect hook to have d3 render the axis via the helper function (and remove a previous render if there was one). Here's the code:

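// Assumes the umbrella d3 package; the individual modules
// (d3-axis, d3-array, d3-selection) export the same functions
import { useLayoutEffect, useRef } from "react";
import { axisBottom, extent, select } from "d3";

const BottomAxis = ({ scale, width }) => {
  const ref = useRef(null);
  useLayoutEffect(() => {
    const host = select(ref.current);
    // Remove the axis drawn by a previous render, if any
    host.select("g").remove();
    const axisGenerator = axisBottom(scale);
    const [start, end] = extent(scale.range());
    if (start == null || end == null) {
      return;
    }
    // Aim for roughly one tick per 80px of axis length
    const pxPerTick = 80;
    const tickCount = Math.ceil((end - start) / pxPerTick);
    axisGenerator.ticks(tickCount);

    // Let d3 render the axis into a fresh group
    const group = host.append("g");
    group.call(axisGenerator);
  }, [scale, width]);

  return <g ref={ref} />;
};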

We use useLayoutEffect instead of plain useEffect since we want to update the DOM with the new configuration before the browser "paints" the DOM updates. For more details on the differences, check out this overview from Kent Dodds.

Another tricky aspect of working with d3 in React is plugins. There are a number of great d3 plugins, but many of them don't really fit into the "d3 for layout, React for rendering" paradigm, because they're designed around d3, not React. We used a couple of plugins in our old code, but we found they didn't fit our new approach, so we decided to remove them and reimplement their functionality in our own components. Having more control over these features was worth having to write some extra code. If you're considering using d3 plugins, think about how they will integrate with the rest of your code.

Interactivity

A killer feature of SVG is the similarity of the event model to plain HTML, making it easy to build complex interactive interfaces. However, when combined with React's component architecture, it's easy to cause unnecessary re-renders of SVG components. Unnecessary re-renders can be a much bigger problem than slow renders, because the former can happen much more often. If you're re-rendering a significant part of your component on mouse move events, for example, it will be hard to get that to perform well, no matter how fast the individual pieces render.

Fortunately, SVG has a great way to avoid unnecessary renders: you can separate rendering and interactivity concerns into two different layers. Because both layers are defined by the same data, it's fairly easy to keep them in sync. Think of it as two mirror universes. In one, the data and props determine what's drawn on screen, but nothing is interactive. In the other, no data is rendered, but mouse events (or touch or keyboard events) are captured and mapped back to data (d3's scale.invert is great here), which can then be used to display tooltips or respond to click events. In the UI, this feels like a single set of interactive elements, and it can avoid a lot of re-renders (especially for any hover behavior) and keep the UI snappy. We have a full example below, but think of it like this:

Note that the static data rendering only needs to happen once per new set of data—if you have a lot of data points, this can make a big difference. (Depending on how you design your component, you may need to use React.memo to avoid extra renders.)

Another pattern we adopted to improve both performance and UI is to map mouse events back to the data, and only respond if the mapped data changes. That is, let's say your cursor is at (20,10) and this maps to data point X. If you move to (21,10) but the closest data point is still X, the UI does not react (other than the mouse pointer itself moving, obviously). We don't move the tooltip (it's snapped to the nearest data point, not to the cursor, and is always at a fixed height) and there are no other UI changes. We found this less distracting in the UI (why move things around if nothing meaningful happened?), and it helps avoid tooltip re-renders.

An important part of interactivity is avoiding interactions with unwanted elements. For elements like tooltips and anything else that pops up near the cursor, setting the pointerEvents attribute to none will ensure these are not photobombing your pointer events. If you don't do this, and these elements show up under the cursor, they may force mouseLeave events on the component where you were previously tracking the mouse, forcing that element to hide them as soon as they show up. You should generally consider adding that attribute to anything non-interactive. It can be set on the <g> element as well.

Here is our Mouse component which tracks which data point we're hovering over (if any) and re-renders its children whenever that changes. It also takes a click callback to handle clicks on data points. Note that for flexibility, mapping from screen coordinates to data points happens with another callback provided as a prop:

import { useState } from "react";

export const Mouse = ({ width, height, onClick, children, toDataPoint }) => {
  const [hoverPt, setHoverPt] = useState(undefined);
  const handleMouseMove = (e) => {
    const mouse = getMouse(e, width, height);
    const newPt = toDataPoint(mouse);

    if (!pointsEqual(hoverPt, newPt)) {
      setHoverPt(newPt);
    }
  };
  const handleMouseLeave = () => {
    setHoverPt(undefined);
  };
  const handleMouseUp = () => {
    onClick && hoverPt && onClick(hoverPt);
  };

  return (
    <>
      <rect
        width={width}
        height={height}
        pointerEvents="all"
        fill="none"
        stroke="none"
        onMouseMove={handleMouseMove}
        onMouseLeave={handleMouseLeave}
        onMouseUp={handleMouseUp}
      />
      {children && children(hoverPt)}
    </>
  );
};

const getMouse = (e, width, height) => {
  const dims = e.currentTarget.getBoundingClientRect();
  const rawX = e.clientX - dims.left;
  const rawY = e.clientY - dims.top;
  const x = (rawX / dims.width) * width;
  const y = (rawY / dims.height) * height;
  return { x, y };
};

const pointsEqual = (p1, p2) => {
  return (!p1 && !p2) || (p1 && p2 && p1.x === p2.x && p1.y === p2.y);
};

You can then render anything that does depend on mouse position (like the tooltip) through the render prop pattern:

<Mouse {...body.size} onClick={handleClick} toDataPoint={mapToDataPoint}>
  {(pt) => {
    // N.B.: Tooltip just returns `null` if pt is `undefined`
    return <Tooltip point={pt} xScale={xScale} yScale={yScale} {...body.size} />;
  }}
</Mouse>

Styling SVG

SVG can be styled with CSS just like HTML, but note that many of the actual styles themselves are different: fill instead of background-color (and instead of color for text, somewhat confusingly), stroke-width instead of border-width, etc. Aside from that, familiar rules and selectors apply. Many styles can also be specified via element attributes, and that may be preferable if you need prop-level control over things like color or stroke width.

Embedding HTML in SVG

One of the lesser-known features of SVG is that you can embed HTML inside an SVG document with the foreignObject tag. This is very useful for elements like legends or tooltips that benefit from the more user-friendly text layout capabilities of HTML. You can use standard HTML CSS in these components, and even use React elements.

One tricky aspect of this is that foreignObject is a standard SVG element, so it needs to be sized explicitly (like any other element). This makes it hard to size things like tooltips: you may not know how much space the label or value to display may need. But let's revisit the concept of overlays we discussed earlier. The HTML component does not need to just be the visible tooltip: a div is transparent out of the box, so we can stack a transparent wrapper div in front of our other content, and lay the tooltip out within it. The tooltip can then size itself to fit the items contained therein. If you adjust tip positioning based on the position along the x axis, you have almost half the width of the graph to play with (if that's not enough, you should probably rethink your tooltips).

One other issue we found is that some browsers (most notably, Safari) run into rendering issues with foreignObject in some cases. This bug details the problem. The bug is eleven years old and has several duplicates, so it's probably not getting fixed soon, but we found that setting position: fixed on the top-most element in foreignObject worked around these issues (and has no other layout impact, since in this case, fixed will function just like the default static).

Here is our Tooltip component:

const Tooltip = ({ point, xScale, yScale, width, height }) => {
  if (!point) {
    return null;
  }
  const tipY = 50;

  const screenX = xScale(point.x);
  const screenY = yScale(point.y);
  const time = new Date(point.x).toLocaleString();
  const value = point.y.toFixed(3);
  const tipContent = (
    <>
      <div>
        <span className={styles.tooltipLabel}>Time</span>: {time}
      </div>
      <div>
        <span className={styles.tooltipLabel}>Value</span>: {value}
      </div>
    </>
  );

  const placeRight = screenX < width / 2;
  const tipOverlay: Layout = {
    size: {
      width: placeRight ? width - screenX : screenX,
      height,
    },
    pos: {
      x: placeRight ? screenX : 0,
      y: tipY,
    },
  };
  const tipStyles = [styles.tooltip, placeRight ? styles.tooltipRight : styles.tooltipLeft].join(" ");
  return (
    <g pointerEvents="none">
      <circle cx={screenX} cy={screenY} r={3} fill="none" stroke="blue" strokeWidth={3} />
      <Translate {...tipOverlay.pos}>
        <foreignObject {...tipOverlay.size}>
          <div className={styles.tooltipContainer}>
            <div className={tipStyles}>
              {tipContent}
            </div>
          </div>
        </foreignObject>
      </Translate>
      {/* line indicating hover point */}
      <line x1={screenX} y1={0} x2={screenX} y2={height} stroke="darkslategray" strokeWidth={1} />
    </g>
  );
};

And yes, you can embed SVG inside this embedded HTML, and then embed HTML again, ad infinitum, just for kicks.

Full Example

Here is a full working example pulling together many of the concepts discussed above:

You can check out the source here, though we reviewed most of it piece by piece in the various sections above.

Conclusion

SVG can be a great way to extend your app's UI, and works well out-of-the-box in React. We've found it invaluable in rebuilding the charting components in pganalyze, and we'll reach for it again whenever it seems like a good fit. If you'd like to see all we've discussed in action, applied to real world use cases, you can check out the charts in the pganalyze app.


Advanced Active Record: Using Subqueries in Rails

Leigh Halliday — Wed, 24 Jun 2020 12:00:00 GMT

Active Record provides a great balance between the ability to perform simple queries simply, and also the ability to access the raw SQL sometimes required to get our jobs done. In this article, we will see a number of real-life examples of business needs that may arise at our jobs.

They will come in the form of a request for data from someone else at the company, where we will first translate the request into SQL, and then into the Rails code necessary to find those records. We will be covering five different types of subqueries to help us find the requested data.

Working with Active Record in Rails
What are Subqueries in Rails
An Overview of our Data
The Where Subquery
- Where Not Exists
The Select Subquery
The From Subquery
The Having Subquery
Conclusion
You might also be interested in
About the author

Let's take a look at why subqueries matter:

In the first case, without subqueries, we are going to the database twice: First to get the average salary, and then again to get the result set. With a subquery, we can avoid the extra roundtrip, getting the result directly with a single query.

Working with Active Record in Rails

Active Record is a little like a walled garden. It protects us as developers (and our users) from the harsh realities of what lies beyond those walls: Differences in SQL between databases (MySQL, Postgres, SQLite), knowing how to properly escape strings to avoid SQL injection attacks, and generally providing an elegant abstraction to interact with our database using the language of our choice, Ruby.

But, SQL is extremely powerful! By understanding the SQL that Active Record is executing, we can open the gate in our walled garden to reach beyond what you may think is possible to accomplish in Rails, taking advantage of optimizations and flexibility that may be difficult to achieve otherwise.

What are Subqueries in Rails

In this article, we will be learning how to use subqueries in Active Record. Subqueries are what their name implies: A query within a query. We will look at how to embed subqueries into the SELECT, FROM, WHERE, and HAVING clauses of SQL, to meet the demands of our business counterparts who are asking to view data in different and interesting ways.

We'll be playing the role of a developer fielding questions from HR. They are asking for reports about our employees at BCE (Best Company Ever), and we'll do our best to find the data they need using Active Record.

The source code for this article is available on GitHub.

An Overview of our Data

Our database has 4 tables:

roles: The job roles of our employees (Finance, Engineering, Sales, HR, etc...)
employees: The people that work for BCE
performance_reviews: Performance reviews carried out by an employee's manager, giving them a score between 0 and 100
vacations: Keeping track of when employees have taken vacation

Using https://dbdiagram.io/ we're able to see how these tables relate to each other:

If you are following along, the rails db:seed command will generate 1,000 employees, 1,000 vacations, and 10,000 performance reviews.

The Where Subquery

Now that we have our data set and we’re ready to go, let’s help our HR team with their first request:

Leigh, could you find us all the employees that make more than the average salary at BCE?

Here we will use a subquery within the WHERE clause to find the employees that match HR's request:

SELECT *
FROM employees
WHERE
  employees.salary > (
    SELECT avg(salary)
    FROM employees)

My first attempt at replicating the query above looked like this:

Employee.where('salary > :avg', avg: Employee.average(:salary))

But what it produced was two queries: One to find the average, and a second to query employees with a salary greater than that number. Not technically wrong, but it doesn't line up with the SQL we were going for. There is also a potential performance impact of two round-trip requests to the database server, along with potential inconsistencies if a new employee making $1B/year is hired between queries one and two. Although this is unlikely in this particular scenario, it’s something to consider as a potential risk.

-- find the average
SELECT AVG("employees"."salary") FROM "employees"
-- find the employees
SELECT "employees".* FROM "employees" WHERE (salary > 99306.4)

What we shouldn’t forget about Active Record is that certain methods, such as average(:salary), actually execute the query and return a result, while other methods implement Method Chaining, allowing you to chain multiple Active Record methods together, building up more complex SQL statements prior to their execution. Passing a relation built with select('avg(salary)') instead of the already-computed average lets Active Record embed it as a subquery:

Employee.where('salary > (:avg)', avg: Employee.select('avg(salary)'))

This produces the SQL we want, but note that we had to wrap the placeholder condition :avg in brackets, because the database wants subqueries wrapped in brackets as well.

Because the seed data is generated randomly, your results will vary from mine, but I am seeing 487 matching employees, getting a result that looks like this:

#<ActiveRecord::Relation [#<Employee id: 4, role_id: 5, name: "Bob Williams", salary: 127053.0, created_at: "2020-04-26 18:42:53", updated_at: "2020-04-26 18:42:53">, #<Employee id: 5, role_id: 4, name: "Bob Florez", salary: 149218.0, created_at: "2020-04-26 18:42:53", updated_at: "2020-04-26 18:42:53">, ...]>

Where Not Exists

Leigh, we would like to encourage employees to have a healthy work-life balance, and were hoping you could provide us with a list of all the employees who have yet to take any vacation time.

For this case, NOT EXISTS is a perfect fit, since it only matches records that do not have a match in the subquery. An alternative is to perform a left outer join, only choosing the records with no matches on the right side. This is referred to as an anti-join, where the purpose of the join is to find records that do not have a matching record.

SELECT *
FROM employees
WHERE
  NOT EXISTS (
    SELECT 1
    FROM vacations
    WHERE vacations.employee_id = employees.id)

If you're interested in the LEFT OUTER JOIN equivalent, it might look like this:

SELECT employees.*
FROM
  employees
  LEFT OUTER JOIN vacations ON vacations.employee_id = employees.id
WHERE vacations.id IS NULL

The subquery depends on a match between the employees.id column and the vacations.employee_id column, making it a correlated subquery. Because Rails follows standard naming conventions when querying (the downcased plural form of our model), we can add the above condition into our subquery without too much difficulty.

Employee.where(
  'NOT EXISTS (:vacations)',
  vacations: Vacation.select('1').where('employees.id = vacations.employee_id')
)

Using my seed data, I am seeing 369 employees that have yet to take any vacations.

#<ActiveRecord::Relation [#<Employee id: 2, role_id: 2, name: "Alice Florez", salary: 86920.0, created_at: "2020-04-26 18:42:53", updated_at: "2020-04-26 18:42:53">, #<Employee id: 5, role_id: 4, name: "Bob Florez", salary: 149218.0, created_at: "2020-04-26 18:42:53", updated_at: "2020-04-26 18:42:53">, ...]>
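
To make the anti-join semantics concrete, here is a small plain-Ruby sketch that uses in-memory hashes instead of database tables. The sample rows are made up for illustration, and this is not how Active Record or Postgres evaluates the query; it only shows that the NOT EXISTS form and the LEFT OUTER JOIN form select the same employees.

```ruby
require 'set'

# Hypothetical sample rows standing in for the employees and vacations tables.
employees = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Carol' }
]
vacations = [
  { id: 10, employee_id: 1 },
  { id: 11, employee_id: 1 },
  { id: 12, employee_id: 3 }
]

# NOT EXISTS: keep an employee only if no vacation row points at them.
vacationing_ids = vacations.map { |v| v[:employee_id] }.to_set
not_exists = employees.reject { |e| vacationing_ids.include?(e[:id]) }

# LEFT OUTER JOIN ... WHERE vacations.id IS NULL: pair every employee with
# their vacations (or a single nil when there are none), then keep the nils.
left_join = employees.flat_map do |e|
  matches = vacations.select { |v| v[:employee_id] == e[:id] }
  (matches.empty? ? [nil] : matches).map { |v| [e, v] }
end
anti_join = left_join.select { |_e, v| v.nil? }.map(&:first)

puts not_exists.map { |e| e[:name] }.inspect # ["Bob"]
puts anti_join.map { |e| e[:name] }.inspect  # ["Bob"]
```

Note that the join form produces one row per vacation for employees who did take time off (Alice appears twice before filtering), which is why the IS NULL filter is what turns it into an anti-join.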

The Select Subquery

Leigh, could you provide us with a list of employees, including the average salary of a BCE employee, and how much this employee's salary differs from the average?

SELECT
  *,
  (SELECT avg(salary)
    FROM employees) avg_salary,
  salary - (
    SELECT avg(salary)
    FROM employees) avg_difference
FROM employees

Because the subquery is repeated, we can save ourselves a little bit of hassle by placing the subquery SQL into a variable that we'll embed into the outer query. The to_sql method is perfect for this, but it's also a fantastic way to peek into the SQL that Rails is producing without actually executing the query.

avg_sql = Employee.select('avg(salary)').to_sql

Employee.select(
  '*',
  "(#{avg_sql}) avg_salary",
  "salary - (#{avg_sql}) avg_difference"
)

This query does not limit the results in any way, but instead selects two additional columns (avg_salary and avg_difference). Looking at the first three results, I am seeing:

[
  {"id"=>1, "role_id"=>1, "name"=>"Joe Serna", "salary"=>86340.0, "avg_salary"=>99306.4, "avg_difference"=>-12966.399999999994}, 
  {"id"=>2, "role_id"=>2, "name"=>"Alice Florez", "salary"=>86920.0, "avg_salary"=>99306.4, "avg_difference"=>-12386.399999999994}, 
  {"id"=>3, "role_id"=>3, "name"=>"Amanda Florez", "salary"=>93600.0, "avg_salary"=>99306.4, "avg_difference"=>-5706.399999999994}
]

As with any SQL query, there are often many ways to arrive at the same result. In this example we used subqueries to find the average employee salary, but it may have been better to use window functions instead. They give us the same result, but provide a simpler query which is actually more performant as well. Even on a small dataset of 1000 employees, this query takes approximately 12ms vs 18ms for the subquery equivalent.

SELECT
  *,
  avg(salary) OVER () AS avg_salary,
  salary - avg(salary) OVER () AS avg_difference
FROM
  employees

The window function approach is actually easier to write in Rails as well!

Employee.select(
  '*',
  "avg(salary) OVER () avg_salary",
  "salary - avg(salary) OVER () avg_difference"
)

The From Subquery

Leigh, we'd like to know the average performance review score given across all our managers.

After clarifying with HR, they are looking to take the average score each manager has given, and then take the average of those averages. In other words, the average average. When you are dealing with an aggregate of aggregates, it needs to be accomplished in two steps. This can be done using a subquery as the FROM clause, essentially giving us a temporary table to then select from, allowing us to find the average of those averages.

SELECT avg(avg_score) reviewer_avg
FROM (
  SELECT reviewer_id, avg(score) avg_score
  FROM performance_reviews
  GROUP BY reviewer_id) reviewer_avgs

To keep our Ruby code clean, we'll place the subquery into a variable which can then be embedded into the main query.

from_sql =
  PerformanceReview.select(:reviewer_id, 'avg(score) avg_score').group(
    :reviewer_id
  ).to_sql

PerformanceReview.select('avg(avg_score) reviewer_avg').from(
  "(#{from_sql}) as reviewer_avgs"
).take.reviewer_avg

The result of this query is 50.652. This makes sense given that the seed data used a random value between 1 and 100 (rand(1..100)).
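
It may help to see why the "average of averages" is a different number from a plain average over all reviews. The following plain-Ruby sketch mimics the two-step query with in-memory data; the review rows are invented for illustration and deliberately lopsided, so the two results diverge more than they would with the evenly distributed seed data.

```ruby
# Hypothetical review rows: reviewer 1 wrote three reviews, reviewer 2 only one.
reviews = [
  { reviewer_id: 1, score: 10 },
  { reviewer_id: 1, score: 20 },
  { reviewer_id: 1, score: 30 },
  { reviewer_id: 2, score: 90 }
]

# Plain average over all rows: every review counts equally.
overall_avg = reviews.sum { |r| r[:score] } / reviews.size.to_f

# Inner query: GROUP BY reviewer_id with avg(score) per group.
reviewer_avgs = reviews
  .group_by { |r| r[:reviewer_id] }
  .transform_values { |rows| rows.sum { |r| r[:score] } / rows.size.to_f }

# Outer query: average of the per-reviewer averages, so every reviewer
# counts equally, no matter how many reviews they wrote.
avg_of_avgs = reviewer_avgs.values.sum / reviewer_avgs.size

puts overall_avg # 37.5
puts avg_of_avgs # 55.0
```

The inner group_by plays the role of the FROM subquery: it has to be fully computed before the outer average can be taken.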

The Having Subquery

Leigh, certain reviewers are consistently giving low performance review scores. Could you find us a list of all the managers whose average score is 25% below our company average? We need to find out what is happening.

We will start by joining the employees table to the performance_reviews table where the employee is the reviewer (a manager), and then take their average score. Then we will filter these managers using a HAVING clause to only include those whose average score is less than 75% of the company average (that is, more than 25% below it).

SELECT
  employees.*,
  avg(score) avg_score,
  (SELECT avg(score)
    FROM performance_reviews) company_avg
FROM
  employees
  INNER JOIN performance_reviews
    ON performance_reviews.reviewer_id = employees.id
GROUP BY employees.id
HAVING
  avg(score) < 0.75 *
    (SELECT avg(score)
    FROM performance_reviews)

You'll notice that I actually included two subqueries in the above SQL, both computing the company-wide average score. In the Rails version, we save that SQL to a variable (avg_sql), which lets us reuse it both within the SELECT portion of the query and within the HAVING clause.

avg_sql = PerformanceReview.select('avg(score)').to_sql

Employee.joins(:employee_reviews).select(
  'employees.*',
  'avg(score) avg_score',
  "(#{avg_sql}) company_avg"
).group('employees.id').having("avg(score) < 0.75 * (#{avg_sql})")

The result of this query gives me 103 employees, and the first three of them look like:

[
  {"id"=>173, "role_id"=>1, "name"=>"Bob Williams", "salary"=>109206.0, "avg_score"=>23.75, "company_avg"=>50.04}, 
  {"id"=>390, "role_id"=>5, "name"=>"Bob Serna", "salary"=>127559.0, "avg_score"=>26.0, "company_avg"=>50.04}, 
  {"id"=>802, "role_id"=>4, "name"=>"Alice Halliday", "salary"=>94956.0, "avg_score"=>35.88, "company_avg"=>50.04}
]
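
The HAVING condition is just a comparison applied to each group after aggregation. As a rough sketch of that arithmetic in plain Ruby (the names and numbers below are hypothetical, not taken from the seed data):

```ruby
# Hypothetical per-reviewer averages, i.e. what GROUP BY employees.id
# combined with avg(score) would produce for three managers.
avg_by_reviewer = { 'Ann' => 30.0, 'Ben' => 38.0, 'Cy' => 52.0 }

# Hypothetical company-wide average, i.e. the result of the subquery.
company_avg = 50.0

# HAVING avg(score) < 0.75 * (SELECT avg(score) FROM performance_reviews)
threshold = 0.75 * company_avg
low_scorers = avg_by_reviewer.select { |_name, avg| avg < threshold }

puts threshold                # 37.5
puts low_scorers.keys.inspect # ["Ann"]
```

Ben's average of 38.0 is below the company average, but not by more than 25%, so only Ann is returned.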

Conclusion

In this article we were able to see a number of (somewhat) real-life examples of real business needs translating first into SQL, and then into the Rails code necessary to find those records. A backend developer's career will most likely consist of hundreds of similar requests!

Active Record gives us the ability to perform simple queries simply, but also lets us access the raw SQL which is sometimes required to get our jobs done. Subqueries are a perfect example of that, and we saw how to create subqueries in Rails and Active Record in the SELECT, FROM, WHERE, and HAVING clauses of an SQL statement. As we have seen in the examples above, with the expressiveness of Active Record, one doesn’t have to resort to writing completely in SQL to use a subquery.


Full Text Search in Milliseconds with Rails and PostgreSQL

Leigh Halliday — Thu, 16 Apr 2020 12:00:00 GMT

Imagine the following scenario: You have a database full of job titles and descriptions, and you’re trying to find the best match. Typically you’d start by using an ILIKE expression, but this requires the search phrase to be an exact match. Then you might use trigrams, allowing spelling mistakes and inexact matches based on word similarity, but this makes it difficult to search using multiple words. What you really want to use is Full Text Search, providing the benefits of ILIKE and trigrams, with the added ability to easily search through large documents using natural language.

The Foundations of Full Text Search
Implementing Postgres Full Text Search in Rails
Configuring pg_search
Optimizing Full Text Search Queries in Rails
Conclusion
You might also be interested in
About the author

To summarize, here is a quick overview of popular built-in Postgres search options:

Postgres Feature	Typical Use Case	Can be indexed?	Performance
LIKE/ILIKE	Wildcard-style search for small data	Sometimes	Unpredictable
pg_trgm	Similarity search for names, etc	Yes (GIN/GIST)	Good
Full Text Search	Natural language search	Yes (GIN/GIST)	Good

In this article, we are going to learn about the inner workings of Full Text Search in Postgres and how to easily integrate Full Text Search into your Rails application using a fantastic gem named pg_search. We will learn how to search multiple columns at once, to give one column precedence over another, and how to optimize our Full Text Search implementation, taking a single query from 130ms to 7ms.

The full source code used in this article can be found here. Instructions on how to run this application locally and how to load the sample data referenced within this article can be found in the README.

If you are interested in efficient Full Text Search in Postgres with Django, you can read our article about it.

The Foundations of Full Text Search

Let's break down the basics of Full Text Search, defining and explaining some of the most common terms you'll run into. Taking the text “looking for the right words”, we can see how Postgres stores this data internally, using the to_tsvector function:

SELECT to_tsvector('english', 'looking for the right words');
-- 'look':1 'right':4 'word':5

In the above SQL we have some text, often referred to as a document when talking about Full Text Search. A document must be parsed and converted into a special data type called a tsvector, which we did using the function to_tsvector.

The tsvector data type is comprised of lexemes. Lexemes are normalized key words which were contained in the document that will be used when searching through it. In this case we used the english language dictionary to normalize the words, breaking them down to their root. This means that words became word, and looking became look, with very common words such as for and the being removed completely, to avoid false positives.

SELECT to_tsvector('english', 'looking for the right words') @@ to_tsquery('english', 'words');
-- TRUE

The @@ operator allows us to check if a query (data type tsquery) exists within a document (data type tsvector). Much like tsvector, tsquery is also normalized prior to searching the document for matches.

SELECT
  ts_rank(
    to_tsvector('english', 'looking for the right words'),
    to_tsquery('english', 'words')
   );
-- 0.06079271

The ts_rank function takes a tsvector and a tsquery, returning a number that can be used when sorting the matching records, allowing us to sort the results from highest to lowest ranking.

Now that you have seen a few examples, let’s have a look at one last one before getting to Rails. Below, you can see an example of a query which searches through the jobs table, where we are storing the title and description of each job. Here we are searching for the words ruby and rails, grabbing the 3 highest ranking results.

SELECT
  id,
  title,
  ts_rank(
    to_tsvector('english', title) || to_tsvector('english', description),
    to_tsquery('english', 'ruby & rails')
  ) AS rank
FROM jobs
WHERE
  to_tsvector('english', title) || to_tsvector('english', description) @@
  to_tsquery('english', 'ruby & rails')
ORDER BY rank DESC
LIMIT 3

The highest ranking result is a job with the title "Ruby on Rails Developer"... perfect! The full results of this query are:

[
  {
    "id": 1,
    "title": "Ruby on Rails Developer",
    "rank": 0.40266925
  },
  {
    "id": 109,
    "title": "Senior Ruby Developer - Remote",
    "rank": 0.26552397
  },
  {
    "id": 151,
    "title": "Team-Lead Developer",
    "rank": 0.14533159
  }
]

This query is actually concatenating (using ||) two tsvector fields together. This allows us to search both the title and the description at the same time. Later, we'll see how to give additional weight (precedence) to the title column.

Implementing Postgres Full Text Search in Rails

With a basic understanding of Full Text Search under our belts, it's time to take our knowledge over to Rails. We will be using the pg_search Gem, which can be used in two ways:

Multi Search: Search across multiple models and return a single array of results. Imagine having three models: Product, Brand, and Review. Using Multi Search we could search across all of them at the same time, seeing a single set of search results. This would be perfect for adding federated search functionality to your app.
Search Scope: Search within a single model, but with greater flexibility.

We will be focusing on the Search Scope approach in this article, as it lets us dive into the configuration options available when working with Full Text Search in Rails. Let's add the Gem to our Gemfile and get started:

# Gemfile
gem 'pg_search', '~> 2.3', '>= 2.3.2'

With that done, we can include a module in our Job model, and define our first searchable field:

class Job < ApplicationRecord
  include PgSearch::Model
  pg_search_scope :search_title, against: :title
end

This adds a class level method to Job, allowing us to find jobs with the following line, which automatically returns them ranked from best match to worst.

Job.search_title('Ruby on Rails')

If we append to_sql to the above Ruby statement, we can see the SQL that is being generated. I have to warn you, it’s a bit messy, but that is because it handles not only searching, but also putting the results in the correct order using the ts_rank function.

SELECT
  "jobs".*
FROM
  "jobs"
  INNER JOIN (
    SELECT
      "jobs"."id" AS pg_search_id,
      (ts_rank((to_tsvector('simple', coalesce("jobs"."title"::text, ''))), (to_tsquery('simple', ''' ' || 'Ruby' || ' ''') && to_tsquery('simple', ''' ' || 'on' || ' ''') && to_tsquery('simple', ''' ' || 'Rails' || ' ''')), 0)) AS rank
    FROM
      "jobs"
    WHERE ((to_tsvector('simple', coalesce("jobs"."title"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Ruby' || ' ''') && to_tsquery('simple', ''' ' || 'on' || ' ''') && to_tsquery('simple', ''' ' || 'Rails' || ' ''')))) AS pg_search_5d9a17cb70b9733aadc073 ON "jobs"."id" = pg_search_5d9a17cb70b9733aadc073.pg_search_id
ORDER BY
  pg_search_5d9a17cb70b9733aadc073.rank DESC,
  "jobs"."id" ASC

Configuring pg_search

There are a number of ways you can configure pg_search: From support for prefixes and negation, to specifying which language dictionary to use when normalizing the document, as well as adding multiple, weighted columns.

By default pg_search uses the simple dictionary, which only lowercases words and does no stemming or stop-word removal. If we wanted to normalize our document using the english dictionary, searching across both the title and description, it would look like:

class Job < ApplicationRecord
  include PgSearch::Model
  pg_search_scope :search_job,
                  against: %i[title description],
                  using: { tsearch: { dictionary: 'english' } }
end

We can perform a search in the same way we did before: Job.search_job("Ruby on Rails"). If we wanted to give higher precedence to the title column, we can add weighting scores to each of the columns, with possible values of: A, B, C, D.

class Job < ApplicationRecord
  include PgSearch::Model
  pg_search_scope :search_job,
                  against: { title: 'A', description: 'B' },
                  using: { tsearch: { dictionary: 'english' } }
end

When you start combining columns, weighting them, and choosing which dictionary provides the best results, it really comes down to trial and error. Play around with it, try some queries and see if the results you get back match with what you are expecting!

Optimizing Full Text Search Queries in Rails

We have a problem! The query that is produced by Job.search_job("Ruby on Rails") takes an astounding 130ms. That may not seem like such a large number, but it is astounding because there are only 145 records in my database. Imagine if there were thousands! The majority of time is spent in the to_tsvector function. We can verify this by running this streamlined query below, which takes almost as much time to execute as the full query which actually finds the matching jobs:

SELECT to_tsvector('english', description) FROM jobs;
-- ~130ms

SELECT description FROM jobs;
-- ~15ms

This tells me that the slowness is in re-parsing and normalizing the document into a tsvector data type every single time the query is executed. The folks at thoughtbot have a great article about Full Text Search optimizations, where they add a pre-calculated tsvector column, keeping it up-to-date with triggers. This is great because it allows us to avoid re-parsing our document for every query and also lets us index this column!

There is a similar but slightly different approach I want to cover today which I learned by reading through the Postgres documentation. It also involves adding a pre-calculated tsvector column, but is done using a stored generated column. This means we don't need any triggers! It should be noted that this approach is only available in Postgres 12 and above. If you are using version 11 or earlier, the approach in the thoughtbot article is probably still the best one.

As we are venturing into the territory of more custom Postgres functionality, not easily supported by the Rails schema file in Ruby, we'll want to switch the schema format from :ruby to :sql. This line can be added to the application.rb file:

config.active_record.schema_format = :sql

Now, let's generate a migration to add a new column to the jobs table which will be automatically generated based on the setweight and to_tsvector functions:

class AddSearchableColumnToJobs < ActiveRecord::Migration[6.0]
  def up
    execute <<-SQL
      ALTER TABLE jobs
      ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description,'')), 'B')
      ) STORED;
    SQL
  end

  def down
    remove_column :jobs, :searchable
  end
end

Note that, as of the writing of this article, Postgres always requires a generated column to be a “Stored” column. That means it actually occupies space in your table and gets written on each INSERT/UPDATE. This also means that when you add a generated column to a table, it will require a rewrite of the table to actually set the values for all existing rows. This may block other operations on your database.

With our tsvector column added (which is giving precedence to the title over the description, is using the english dictionary, and is coalescing null values into empty strings), we're ready to add an index to it. Either GIN or GiST indexes can be used to speed up full text searches, but Postgres recommends GIN as the preferred index due to GiST searches being lossy, which may produce false matches. We'll add it concurrently to avoid locking issues when adding an index to large tables.

class AddIndexToSearchableJobs < ActiveRecord::Migration[6.0]
  disable_ddl_transaction!

  def change
    add_index :jobs, :searchable, using: :gin, algorithm: :concurrently
  end
end

The last thing we need to do is to tell pg_search to use our tsvector searchable column, rather than re-parsing the title and description fields each time. This is done by adding the tsvector_column option to tsearch:

class Job < ApplicationRecord
  include PgSearch::Model
  pg_search_scope :search_job,
                  against: { title: 'A', description: 'B' },
                  using: {
                    tsearch: {
                      dictionary: 'english', tsvector_column: 'searchable'
                    }
                  }
end

With this optimization done, we have gone from around 130ms to 7ms per query... not bad at all!

Conclusion

Let’s have a look at a real-life data set. We can prove the precision of our approach by looking at my database: Out of 145 jobs pulled from the GitHub and Hacker News job APIs, searching for "Ruby on Rails" returns the following results:

[
  "Ruby on Rails Developer",
  "Senior Ruby Developer - Remote",
  "Gobble (YC W14) – Senior Full Stack Software Engineers – Toronto, On",
  "DevOps (Remote - Europe)",
  "CareRev (YC S16) Is Hiring a Senior Back End Engineer in Los Angeles",
  "Software Engineer, Full Stack (Rails, React)",
  "Software Engineer",
  "Technology Solutions Developer"
]

To summarize:

We have shown how to use Postgres' Full Text Search within Rails, and how to customize it in terms of both functionality and performance. We ended up with a performant and flexible solution right inside the database we were already using.

Many use cases for Full Text Search can be implemented directly inside Postgres, avoiding the need to install and maintain additional services such as Elasticsearch.

About the author

Effectively Using Materialized Views in Ruby on Rails

Leigh Halliday — Thu, 16 Jan 2020 12:00:00 GMT

It's every developer's nightmare: SQL queries that get large and unwieldy. This can happen fairly quickly with the addition of multiple joins, a subquery and some complicated filtering logic. I have personally seen queries grow to nearly one hundred lines long in both the financial services and health industries.

Luckily Postgres provides two ways to encapsulate large queries: Views and Materialized Views. In this article, we will cover in detail how to utilize both views and materialized views within Ruby on Rails, and we can even take a look at creating and modifying them with database migrations.

Our example will be a real-world sized dataset of hockey teams and their top scorers. If you'd like to follow along, the source code covered in this article can be found here.

What is a view?
What makes a view materialized?
Creating a materialized view
Utilizing a materialized view
Refreshing a materialized view
When to use views vs. materialized views?
Migrating views
Testing with materialized views
Conclusion
About the Author

We'll also talk a bit about the performance benefits that a Materialized View can bring to your application.

What is a view?

A view allows us to query against the result of another query, providing a powerful way of abstracting away a complex query full of joins, conditions, groupings, and any other clause that can be added to an SQL query. Looking at the query below, it isn't overly complex, but it does include 3 joins, grouping by a number of fields to aggregate the numbers of goals scored for a player each season.

It takes approximately 450ms to execute on my computer. I am using seed data that generates 31 teams, each playing 200 games in a season, scoring 20 goals per game... a little unrealistic, but I wanted the dataset used to be substantial!

SELECT
  players.name AS player_name,
  players.id AS player_id,
  players.position AS player_position,
  matches.season AS season,
  teams.name AS team_name,
  teams.id AS team_id,
  count(goals.id) AS goal_count
FROM goals
  INNER JOIN players ON (goals.player_id = players.id)
  INNER JOIN matches ON (goals.match_id = matches.id)
  INNER JOIN teams ON (goals.team_id = teams.id)
GROUP BY players.id, teams.id, matches.season

A view allows us to take the final result of this query, and query against that as if it were any other table. You can see why views can come in handy in many different scenarios. They allow for the succinct abstraction of a complicated query, and allow us to re-use this logic in a simple to understand way.

Now, we could make a new view by running CREATE VIEW in Postgres. But, as we all know, one-off schema changes are hard to keep track of. Instead, let's try something that's closer to how Rails does things. What does that look like? First things first, we'll create a view using Scenic. Scenic gives us the ability to define migrations that create, update, or drop views, just as you're used to doing with regular tables in Rails.

rails g scenic:view top_scorers

This will generate two files. The first is named db/views/top_scorers_v01.sql, and in it we will paste the SQL for the underlying query (from above). The second is db/migrate/[date]_create_top_scorers.rb, and this is where the migration will live to migrate/rollback the creation of our view:

class CreateTopScorers < ActiveRecord::Migration[6.0]
  def change
    create_view :top_scorers
  end
end

With the view in place, we can now query against it. This query takes approximately 50ms to execute.

SELECT *
FROM top_scorers
WHERE
  team_name = 'Toronto Maple Leafs'
ORDER BY goal_count DESC

By creating a model in Rails, we can interact with it much like we would be able to with a typical model which is backed by a table. First things first, let's define the model, letting Rails know it is read-only. Whilst some views can be updated, this view contains a top-level GROUP BY clause and thus can't be updated.

# app/models/top_scorer.rb
class TopScorer < ApplicationRecord
  def readonly?
    true
  end
end

Now we can perform the same SQL query using the TopScorer model:

TopScorer.where(team_name: 'Toronto Maple Leafs').order(goal_count: :desc)

What makes a view materialized?

A regular view still performs the underlying query which defined it. It will only be as efficient as its underlying query is. This means, if the larger query discussed above takes 450ms to execute, executing SELECT * FROM top_scorers will also take 450ms.

Materialized views take regular views to the next level, though they aren't without their drawbacks. The difference is that they save the result of the original query to disk, much like a cached table. When you query a materialized view, you aren't querying the source data, but rather the cached result.

This can provide serious performance benefits, especially considering you can index materialized views. But, when the underlying data from the source tables is updated, the materialized view becomes out of date, serving up an older cached version of the data. We can resolve this by refreshing the materialized view, which we'll get to in a bit.
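
To make the trade-off concrete, here is a minimal in-memory sketch in plain Ruby. It does not touch Postgres at all: the goals array, the top_scorers_query lambda and the MaterializedSnapshot class are all made-up names that only mimic the behaviour described above, where a view re-runs its query on every read and a materialized view serves a stored snapshot until it is refreshed.

```ruby
# Hypothetical in-memory stand-in for the "source table".
goals = [
  { player: "Ada" },
  { player: "Ada" },
  { player: "Grace" }
]

# The "underlying query": number of goals per player.
top_scorers_query = -> { goals.group_by { |g| g[:player] }.transform_values(&:size) }

# A regular view simply re-runs the underlying query on every read.
view = top_scorers_query

# A materialized view keeps a snapshot that only changes on refresh.
class MaterializedSnapshot
  attr_reader :rows

  def initialize(query)
    @query = query
    refresh
  end

  def refresh
    @rows = @query.call
  end
end

mat_view = MaterializedSnapshot.new(top_scorers_query)

# New source data arrives after the snapshot was taken.
goals << { player: "Grace" }
goals << { player: "Grace" }

puts view.call["Grace"]     # 3 (always current)
puts mat_view.rows["Grace"] # 1 (stale until refreshed)

mat_view.refresh
puts mat_view.rows["Grace"] # 3
```

Reads from the snapshot are cheap because no aggregation happens at read time, which is the same reason the real materialized view is faster, and also the reason it can be out of date.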

Creating a materialized view

Materialized views begin the same way as our regular view: by executing a command to generate a new view migration, rails g scenic:view mat_top_scorers. This produces two files, the first of which contains the SQL to produce the underlying view of the data. The difference is in the migration, which passes materialized: true to the create_view method. Also notice that we are able to add indexes to the materialized view.

class CreateMatTopScorers < ActiveRecord::Migration[6.0]
  def change
    create_view :mat_top_scorers, materialized: true

    add_index :mat_top_scorers, :player_name
    add_index :mat_top_scorers, :player_id
    add_index :mat_top_scorers, :team_name
    add_index :mat_top_scorers, :team_id
    add_index :mat_top_scorers, :season
  end
end

Utilizing a materialized view

Like a regular view, we are able to define an ActiveRecord model that can query it. Also notice that we can define relationships which point to other ActiveRecord models. If you didn't know, you might not even realize it is pointing to a materialized view, except for the readonly? method which was defined.

class MatTopScorer < ApplicationRecord
  belongs_to :player
  belongs_to :team
  belongs_to :match

  def self.top_scorer_for_season(season)
    where(season: season).order(goal_count: :desc).first
  end

  def readonly?
    true
  end
end

Let's take our new materialized view for a spin! Running the query select * from mat_top_scorers, which took 450ms as a view, takes 5ms as a materialized view, 90x faster! The Ruby code below, which took 50ms as a view, takes under 1ms to execute!

MatTopScorer.where(team_name: 'Toronto Maple Leafs').order(goal_count: :desc)

For a side-by-side comparison, this performs the same query on both views:

irb(main):001:0> TopScorer.where(team_name: 'Toronto Maple Leafs').count
   (60.2ms)  SELECT COUNT(*) FROM "top_scorers" WHERE "top_scorers"."team_name" = $1  [["team_name", "Toronto Maple Leafs"]]
=> 30

irb(main):002:0> MatTopScorer.where(team_name: 'Toronto Maple Leafs').count
   (1.3ms)  SELECT COUNT(*) FROM "mat_top_scorers" WHERE "mat_top_scorers"."team_name" = $1  [["team_name", "Toronto Maple Leafs"]]
=> 30

Refreshing a materialized view

As mentioned previously, materialized views cache the underlying query's result in a physical table. This is what gives us the speed improvements and the ability to add indexes. The downside is that we have to control when the cache is refreshed. Modifying the MatTopScorer model, let's add a refresh method that can be called any time the data is to be refreshed. You will need to figure out how often it makes sense to update the data for your specific use-case, depending on how often the data is changing and how quickly those changes need to be reflected to the end user.

class MatTopScorer < ApplicationRecord
  belongs_to :player
  belongs_to :team
  belongs_to :match

  def self.refresh
    Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: false)
  end

  def self.top_scorer_for_season(season)
    where(season: season).order(goal_count: :desc).first
  end

  def readonly?
    true
  end
end

To schedule the refresh, I like to use the whenever gem. Let's call a rake task to refresh the materialized view every hour:

# config/schedule.rb
every 1.hour do
  rake "refreshers:mat_top_scorers"
end

The rake task is simple, only calling the refresh method defined on the MatTopScorer model.

# lib/tasks/refreshers.rake
namespace :refreshers do
  desc "Refresh materialized view for top scorers"
  task mat_top_scorers: :environment do
    MatTopScorer.refresh
  end
end

When to use views vs. materialized views?

Views focus on abstracting away complexity and encouraging reuse. Views allow you to interact with the result of a query as if it were a table itself, but they do not provide a performance benefit, as the underlying query is still executed. That makes them perfect for sharing logic while still having real-time access to the source data.

Materialized Views are related to views, but go a step further. You get all the abstraction and reuse of a view, but the underlying data is cached, providing serious performance benefits. Materialized views are especially useful for reporting dashboards, for example, because they can be indexed to allow for performant filtering.

If the purpose of the view is to provide a cleaner interface to complicated joins and query logic, and performance isn't too much of an issue, by all means stick with a regular view. Views have the advantage of always being real-time, since they simply reference the real underlying data rather than a cached copy of it.

If your purpose is to provide a cleaner interface in addition to performance improvements, and you can live with the data being not quite real-time, then creating it as a materialized view can provide some great benefits.

Migrating views

It's easy to migrate views in Scenic. Views are versioned by default in Scenic and generating a view with the same name will create a v2, providing two files, just like it did the first time we generated a view (earlier in this article). Likewise, Scenic also provides a way to drop a view.

Testing with materialized views

Views and materialized views aren't particularly challenging to test, but testing them does require remembering that neither type of view contains any original data in and of itself. They are either a live view of an underlying query or, in the case of materialized views, a cached view of an underlying query.

Let's see how we would populate and then test our MatTopScorer model in RSpec and factory_bot.

After creating some test data using factory_bot, we'll call a method which is supposed to return the top scorer for a given season. It returns nil, and that is expected. The underlying data exists, but because materialized views must be refreshed, something we haven't done yet, there is no data to be found.

After calling MatTopScorer.refresh, we're now able to retrieve the expected result.

RSpec.describe MatTopScorer, type: :model do
  describe ".top_scorer_for_season" do
    it "finds top scorer" do
      # create some data using factory_bot helper methods
      match = create(:match)
      player = create(:player)
      goal = create(:goal, match: match, player: player)

      # without any data in materialized view
      expect(MatTopScorer.top_scorer_for_season(match.season)).to eq(nil)

      MatTopScorer.refresh

      # with data in materialized view
      top_scorer = MatTopScorer.top_scorer_for_season(match.season)
      expect(top_scorer).to be_present
      expect(top_scorer.player).to eq(player)
      expect(top_scorer.goal_count).to eq(1)
    end
  end
end

Conclusion

With the help of Scenic, using views and materialized views feels right at home in Rails. Truthfully, I haven't used views as much as I have used materialized views. In particular, I've found materialized views incredibly useful when building searchable reporting dashboards.

The ability to group and summarize data by geographic region, category, grouped by date, in combination with adding the correct indexes has provided an efficient way to report on large amounts of data without relying on external reporting systems or causing excessive load on the production database.

About the Author

Similarity in Postgres and Rails using Trigrams

Leigh Halliday — Tue, 19 Nov 2019 12:00:00 GMT

You typed "postgras", did you mean "postgres"?

Use the best tool for the job. It seems like solid advice, but there's something to be said for keeping things simple. There is a training and maintenance cost that comes with supporting an ever growing number of tools. It may be better advice to use an existing tool that works well, although not perfectly, until it hurts. It all depends on your specific case.

Postgres is an amazing relational database, and it supports more features than you might initially think! It has full text search, JSON documents, and support for similarity matching through its pg_trgm module.

What are Trigrams?
- Postgres Trigram example
- Ruby Trigram example
Using Trigrams in Rails
Showing the closest matches for a term based on its similarity
Conclusion
About the Author

Today, we will break down how to use pg_trgm for a light-weight, built-in similarity matcher. Why are we doing this? Well, before reaching for a tool purpose-built for search such as Elasticsearch, potentially complicating development by adding another tool to your development stack, it's worth seeing if Postgres suits your application's needs! You may be surprised!

In this article, we will look at how it works under the covers, and how to use it efficiently in your Rails app.

What are Trigrams?

Trigrams, a special case of n-grams (with n = 3), break text down into groups of three consecutive letters. Let's see an example: postgres. It is made up of six groups: pos, ost, stg, tgr, gre, res.

This process of breaking a piece of text into smaller groups allows you to compare the groups of one word to the groups of another word. Knowing how many groups are shared between the two words allows you to make a comparison between them based on how similar their groups are.
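
As a quick illustration, the strict (unpadded) trigrams described above can be produced in a couple of lines of Ruby. The strict_trigrams helper is a name I made up for this sketch, and it deliberately ignores the padding that Postgres adds, which we'll get to shortly.

```ruby
# Split a word into groups of three consecutive letters (no padding).
def strict_trigrams(word)
  word.chars.each_cons(3).map(&:join)
end

p strict_trigrams("postgres")
# ["pos", "ost", "stg", "tgr", "gre", "res"]

# Groups shared between the correct spelling and a misspelling.
p strict_trigrams("postgres") & strict_trigrams("postgras")
# ["pos", "ost", "stg", "tgr"]
```

Four of the six groups are shared, which is the kind of overlap a similarity score is built on.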

Postgres Trigram example

Postgres' pg_trgm module comes with a number of functions and operators to compare strings. We'll look at the show_trgm and similarity functions, along with the % operator below:

select
  show_trgm('postgras') as tri1, -- {"  p"," po","as ",gra,ost,pos,ras,stg,tgr}
  show_trgm('postgres') as tri2, -- {"  p"," po","es ",gre,ost,pos,res,stg,tgr}
  similarity('postgras','postgres'), -- 0.5
  'postgras' % 'postgres' -- TRUE

The show_trgm function isn't one you'd necessarily use day-to-day, but it's good to see how Postgres breaks a string down into trigrams. You'll notice something interesting here, that two spaces are added to the beginning of the string, and a single space is added to the end.

This is done for a couple of reasons:

The first reason is that it allows trigram calculations on words with fewer than three characters, such as Hi.

Secondly, it ensures the first and last characters are not overly de-emphasized for comparisons. If we used only strict triplets, the first and last letters in longer words would each occur in only a single group: with padding they occur in three (for the first letter) and two (for the last). Since a letter in the middle of a word occurs in three trigrams and the last letter in only two, a typo in the last letter changes fewer trigrams. This means that postgres and postgrez are more similar than postgres and postgras, even though they are both off by a single character.

The similarity function compares the trigrams from two strings and outputs a similarity number between 0 and 1. 1 means a perfect match, and 0 means no shared trigrams.

Lastly, we have the % operator, which gives you a boolean of whether two strings are similar. By default, Postgres uses a threshold of 0.3 when making this decision, but you can always change it through the pg_trgm.similarity_threshold setting.

Ruby Trigram example

You don't need to know how to build a trigram in order to use them in Postgres, but it doesn't hurt to dive deeper and expand your knowledge. Let's take a look at how to implement something similar ourselves in Ruby.

The first method will take a string, and output an array of trigrams, adding two spaces to the front, and one to the back of the original string, just like Postgres does.

def trigram(word)
  return [] if word.strip == ""

  parts = []
  padded = "  #{word} ".downcase
  padded.chars.each_cons(3) { |w| parts << w.join }
  parts
end

p trigram("postgras")
# ["  p", " po", "pos", "ost", "stg", "tgr", "gra", "ras", "as "]
p trigram("postgres")
# ["  p", " po", "pos", "ost", "stg", "tgr", "gre", "res", "es "]

Next up, we'll compare the trigrams from our two words together, giving a ratio of how similar they are:

def similarity(word1, word2)
  tri1 = trigram(word1)
  tri2 = trigram(word2)

  return 0.0 if [tri1, tri2].any? { |arr| arr.size == 0 }

  # Find number of trigrams shared between them
  same_size = (tri1 & tri2).size
  # Find unique total trigrams in both arrays
  all_size = (tri1 | tri2).size

  same_size.to_f / all_size
end

p similarity("postgras", "postgres")
# 0.5

Now that we have our similarity calculator, we can implement a simple similar? method, which checks if the similarity is above the threshold of 0.3:

def similar?(word1, word2)
  similarity(word1, word2) >= 0.3
end

p similar?("postgras", "postgres")
# true
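
With these helpers we can also check the earlier claim that a typo in the last letter hurts less than a typo elsewhere in the word. The sketch below repeats compact versions of trigram and similarity so that it runs on its own. The threshold keyword argument on similar? is my own addition for illustration: it mimics the idea of an adjustable threshold, but it does not talk to Postgres.

```ruby
# Compact copies of the helpers above, so this snippet is self-contained.
def trigram(word)
  return [] if word.strip == ""

  "  #{word} ".downcase.chars.each_cons(3).map(&:join)
end

def similarity(word1, word2)
  tri1 = trigram(word1)
  tri2 = trigram(word2)
  return 0.0 if tri1.empty? || tri2.empty?

  (tri1 & tri2).size.to_f / (tri1 | tri2).size
end

# Hypothetical variant of similar? with an adjustable threshold.
def similar?(word1, word2, threshold: 0.3)
  similarity(word1, word2) >= threshold
end

last_letter_typo = similarity("postgres", "postgrez") # 7 shared of 11 unique trigrams
inner_typo       = similarity("postgres", "postgras") # 6 shared of 12 unique trigrams

puts last_letter_typo > inner_typo
# true
puts similar?("postgres", "postgras", threshold: 0.6)
# false
```

Raising the threshold makes matching stricter: postgras scores 0.5 against postgres, so it passes the default of 0.3 but not 0.6.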

Using Trigrams in Rails

There aren't too many gotchas in order to use these similarity functions and operators within your Rails app, but there are a couple!

Below we have a migration to create a cities table. When indexing the name column, to ensure that querying with the similarity operator stays fast, we'll need to ensure that we use either a gin or gist index. We do this by indicating using: :gin. In addition to that, we have to pass the opclass option opclass: :gin_trgm_ops, so it knows which type of gin index to create.

Unless you have already enabled the pg_trgm extension, you will most likely receive an error, but this is easily fixed by adding enable_extension :pg_trgm to your migration.

class CreateCities < ActiveRecord::Migration[6.0]
  def change
    enable_extension :pg_trgm

    create_table :cities do |t|
      t.string :name, null: false
      t.timestamps
      t.index :name, opclass: :gin_trgm_ops, using: :gin
    end
  end
end

Now that we have the pg_trgm extension enabled, and have correctly indexed the table, we can use the similarity operator % inside of our where clauses, such as in the scope below:

class City < ApplicationRecord
  scope :name_similar, ->(name) { where("name % :name", name: name) }
end

City.name_similar("Torono").count
# SELECT COUNT(*) FROM "cities" WHERE (name % 'Torono')

Showing the closest matches for a term based on its similarity

We may not want to only limit by similarity using the % operator, but also order the results from most similar to least similar. Take the example query and its result below:

select name, similarity(name, 'Dease Lake')
from cities
where name % 'Dease Lake'
order by 2 desc

This query finds cities whose name is similar to Dease Lake. We get seven results back, even though one of them is an exact match. Ideally then, we wouldn't just filter our query by similarity, but also put the results in the correct order.

Dease Lake  1
Deer Lake   0.5
Lake Louise 0.375
Lynn Lake   0.33333334
Red Lake    0.33333334
Cat Lake    0.33333334
Baker Lake  0.3125

We can do this by updating our scope to order by similarity. We have to be careful about this, because in order to use the similarity function, we need to pass in the user input of 'Dease Lake'. To avoid SQL injection attacks and to ensure safe string quoting, we'll use the quote_string method from ActiveRecord::Base.

class City < ApplicationRecord
  scope :name_similar, ->(name) {
    quoted_name = ActiveRecord::Base.connection.quote_string(name)
    where("name % :name", name: name).
      order(Arel.sql("similarity(name, '#{quoted_name}') DESC"))
  }
end

Now when we use the name_similar scope, the result will be ordered with the most similar city first, allowing us to find Dease Lake:

City.name_similar("Dease Lake").first.name
# => Dease Lake

And the SQL produced looks like:

SELECT "cities".*
FROM "cities"
WHERE (name % 'Dease Lake')
ORDER BY similarity(name, 'Dease Lake') DESC
LIMIT $1

Conclusion

In this article, we took a dive into the pg_trgm extension, seeing first what trigrams actually are, and then how we can practically use similarity functions and operators in our Rails apps. This allows us to improve keyword searching, by finding similar, rather than exact matches. We also managed to accomplish all of this without adding an additional backend service, or too much additional complexity to our application.

About the Author

Efficient GraphQL queries in Ruby on Rails & Postgres

Leigh Halliday — Tue, 24 Sep 2019 12:00:00 GMT

GraphQL puts the user in control of their own destiny. Yes, they are confined to your schema, but beyond that they can access the data in any which way. Will they ask only for the "events", or also for the "category" of each event? We don't really know! In REST-based APIs we know ahead of time what will be rendered, and can plan ahead by generating the required data efficiently, often by eager-loading the data we know we'll need.

In this article, we will discuss what N+1 queries are, how they are easily produced in GraphQL, and how to solve them using the graphql-batch gem along with a few custom batch loaders.

The source code for this article is available on GitHub.

What are N+1 queries?
N+1 queries in GraphQL
Optimizing GraphQL queries
Batch loading single records
Batch loading many records
Batch loading many records more efficiently
Batch loading active storage attachments
Conclusion
About the Author

What are N+1 queries?

N+1 queries can occur when you have one-to-many relationships in your models. Each Event belongs to a Category. Let's say that you find the last five events and you want to get the category name for each of them.

Event.last(5).each { |event| puts event.category.name }

Seems simple enough! Unfortunately, we just produced six queries: one query to find the events, and then one more query per event to find its category's name. This is an easy problem to solve in Rails by using eager-loading:

Event.includes(:category).last(5).each { |event| puts event.category.name }

By using the includes method we've been able to knock our queries down from six to two: The first to find the events, and the second to find the categories for those events.
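
If you'd like to see the query counts without a database, here is a small plain-Ruby sketch. FakeDB and its methods are hypothetical stand-ins that only count how many "queries" are issued. They are not ActiveRecord, but the access pattern is the same as in the two snippets above.

```ruby
# A tiny in-memory stand-in for a database that counts "queries".
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @categories = { 1 => "Music", 2 => "Sports" }
    @events = [
      { id: 1, category_id: 1 }, { id: 2, category_id: 2 },
      { id: 3, category_id: 1 }, { id: 4, category_id: 2 },
      { id: 5, category_id: 1 }
    ]
  end

  # One query to fetch the last n events.
  def last_events(n)
    @query_count += 1
    @events.last(n)
  end

  # One query per call to fetch a single category name.
  def find_category(id)
    @query_count += 1
    @categories.fetch(id)
  end

  # One query to fetch several categories at once (like WHERE id IN (...)).
  def find_categories(ids)
    @query_count += 1
    @categories.slice(*ids)
  end
end

# N+1: one query for the events, then one per event for its category.
db = FakeDB.new
db.last_events(5).each { |event| db.find_category(event[:category_id]) }
n_plus_one_queries = db.query_count

# Eager loading: one query for the events, one for all of their categories.
db = FakeDB.new
events = db.last_events(5)
categories = db.find_categories(events.map { |event| event[:category_id] }.uniq)
events.each { |event| categories.fetch(event[:category_id]) }
eager_queries = db.query_count

puts "N+1: #{n_plus_one_queries} queries, eager: #{eager_queries} queries"
# N+1: 6 queries, eager: 2 queries
```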

N+1 queries in GraphQL

As we mentioned earlier, in GraphQL, the user is in charge of their own destiny. They may or may not ask for the category name for each event. The query below will produce N+1 SQL queries as it finds the category for each event:

{
  events {
    id
    name
    category {
      id
      name
    }
  }
}

Optimizing GraphQL queries

Yes, we could solve the N+1 query in the previous example by eager-loading the category relationship, but if the user didn't actually want the category, why load it? We don't know what the user will ask for. There just so happens to be a better way, by lazy-loading data only as it's needed using the graphql-batch gem.

Marc-André Giroux has written a very thorough article about the GraphQL Dataloader Pattern which I highly recommend reading before continuing.

Batch loading single records

The simplest case for batch loading data is the example of each event belonging to a category. Inside of our EventType class, there is a field called category which allows the user to access the category of an event.

class Types::EventType < Types::BaseObject
  field :category, Types::CategoryType, null: false

  def category
    # avoid `object.category`
    RecordLoader.for(Category).load(object.category_id)
  end
end

By using the RecordLoader class to load the category, we actually avoid loading the category right away, and instead load all of the required categories with a single query. The query it ends up producing may end up looking like:

SELECT "categories".* FROM "categories" WHERE "categories"."id" IN ($1, $2, $3, $4, $5)

Looking at the RecordLoader class we can see how it works. The perform method will receive all of the ids for a single model (Category in this case), load the records in a single SQL query, and then call the fulfill method for each of them. The fulfill method resolves the promise, which is basically like putting a face to a name... you gave me an ID, and I've fulfilled my promise to provide you with the corresponding record.

class RecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end

  def perform(ids)
    # Find all ids for this model and fulfill their promises
    @model.where(id: ids).each { |record| fulfill(record.id, record) }
    # Handle cases where a record was not found and fulfill the value as nil
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

We can write a test for this class to ensure that it finds the records correctly, keeping in mind that in order for the lazy/promise based code to function correctly it needs to be wrapped inside something called an executor.

describe RecordLoader do
  it 'loads' do
    event = create(:event)
    result = GraphQL::Batch.batch do
      RecordLoader.for(Event).load(event.id)
    end
    expect(result).to eq(event)
  end
end
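
If the deferral mechanics feel abstract, the following toy loader may help. It is a sketch in plain Ruby and not how graphql-batch is implemented: ToyLoader, its lambda-based handles, and the CATEGORIES hash are all hypothetical. It only shows the core idea that load collects keys and the first attempt to read a value triggers one batched fetch.

```ruby
# Toy stand-in for a batch loader; not the graphql-batch implementation.
class ToyLoader
  attr_reader :batches

  def initialize(&fetch)
    @fetch = fetch   # block taking an array of keys, returning { key => value }
    @pending = []
    @results = {}
    @batches = 0
  end

  # Register a key and return a lazy handle instead of the value.
  def load(key)
    @pending << key unless @results.key?(key) || @pending.include?(key)
    -> { resolve(key) }
  end

  private

  def resolve(key)
    flush unless @results.key?(key)
    @results[key]
  end

  # Run one batched fetch for everything requested so far.
  def flush
    return if @pending.empty?

    keys = @pending
    @pending = []
    @batches += 1
    found = @fetch.call(keys)
    keys.each { |k| @results[k] = found[k] } # keys that were not found resolve to nil
  end
end

CATEGORIES = { 1 => "Music", 2 => "Sports" }.freeze
loader = ToyLoader.new { |ids| CATEGORIES.slice(*ids) }

handles = [1, 2, 1, 3].map { |id| loader.load(id) } # nothing is fetched yet
values = handles.map(&:call)                        # the first call triggers a single batch

puts "values: #{values.inspect}, batches: #{loader.batches}"
```

Like RecordLoader, the toy version resolves ids without a matching record to nil rather than leaving them unresolved.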

Batch loading many records

We've covered the case where we are batch loading a single record at a time, but how do we handle the reverse scenario? We are displaying categories along with the first five events for each category, which would also produce an N+1 query, so let's see how we can solve it using a batch loader. The query we're discussing would look something like this:

{
  categories {
    id
    name
    events(first: 5) {
      id
      name
    }
  }
}

I have created a custom loader called ForeignKeyLoader for this purpose. It will load the events using the foreign key category_id. I also added the ability to pass a lambda to merge in additional scopes into the query that will be run.

class Types::CategoryType < Types::BaseObject
  field :events, [Types::EventType], null: false do
    argument :first, Int, required: false, default_value: 5
  end

  def events(first:)
    ForeignKeyLoader.for(Event, :category_id, merge: -> { order(id: :asc) }).
      load(object.id).then do |records|
        records.first(first)
      end
  end
end

The query that gets produced looks something like:

SELECT "events".*
FROM "events"
WHERE "events"."category_id" IN ($1, $2, $3, $4, $5)
ORDER BY "events"."id" ASC

Notice in this case that we call the then method to execute some code after the promise has been resolved. Here we see the first issue with this method... we only wanted five events for each category, but our query will load ALL events for each category, and then, using the first method on the resulting Array, narrow it down to only the first five events. If there are thousands of events, we could run into some serious issues.

class ForeignKeyLoader < GraphQL::Batch::Loader
  attr_reader :model, :foreign_key, :merge

  def self.loader_key_for(*group_args)
    # avoiding including the `merge` lambda in loader key
    # each lambda is unique which defeats the purpose of
    # grouping queries together
    [self].concat(group_args.slice(0,2))
  end

  def initialize(model, foreign_key, merge: nil)
    @model = model
    @foreign_key = foreign_key
    @merge = merge
  end

  def perform(foreign_ids)
    # find all the records
    scope = model.where(foreign_key => foreign_ids)
    scope = scope.merge(merge) if merge.present?
    records = scope.to_a

    foreign_ids.each do |foreign_id|
      # find the records required to fulfill each promise
      matching_records = records.select do |r|
        foreign_id == r.send(foreign_key)
      end
      fulfill(foreign_id, matching_records)
    end
  end
end

Batch loading many records more efficiently

It turns out that there is a way to perform a query that says "find me the first N records for each X" (find me the first 5 records for each category), and that involves using Postgres Window Functions. While researching this concept, this article about window functions was useful along with this article about bringing window functions into Rails.

The following query produces the data that we want... we just need to figure out how to write a batch loader that generates the same result.

SELECT "events".*
FROM (
  SELECT
    *,
    row_number() OVER (
      PARTITION BY category_id ORDER BY start_time desc
    ) as rank
  FROM "events"
  WHERE "events"."category_id" IN (1, 2, 3, 4, 5)
) as events
WHERE rank <= 5
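
To make the semantics of that query concrete, here is roughly what it computes, expressed in plain Ruby over an in-memory array. This is only an illustration: top_n_per_group and the sample rows are hypothetical, and the real work should stay in the database.

```ruby
# Keep the `limit` highest rows (by `order_by`) within each `partition_by` group,
# mirroring row_number() OVER (PARTITION BY ... ORDER BY ... desc) <= limit.
def top_n_per_group(rows, partition_by:, order_by:, limit:)
  rows.group_by { |r| r[partition_by] }.flat_map do |_key, group|
    group.sort_by { |r| r[order_by] }.reverse.first(limit)
  end
end

# Hypothetical rows; start_time is simplified to an integer.
rows = [
  { id: 1, category_id: 1, start_time: 10 },
  { id: 2, category_id: 1, start_time: 30 },
  { id: 3, category_id: 1, start_time: 20 },
  { id: 4, category_id: 2, start_time: 5 }
]

top_two = top_n_per_group(rows, partition_by: :category_id, order_by: :start_time, limit: 2)
puts top_two.map { |r| r[:id] }.inspect
```

Category 1 contributes its two latest events and category 2 its only event, no matter how many rows each category has in total.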

For this we'll create a batch loader called WindowKeyLoader which is used like:

class Types::CategoryType < Types::BaseObject
  field :events, [Types::EventType], null: false do
    argument :first, Int, required: false, default_value: 5
  end

  def events(first:)
    WindowKeyLoader.for(Event, :category_id,
      limit: first,
      order_col: :start_time,
      order_dir: :desc
    ).load(object.id)
  end
end

You can see the difference already. I am no longer required to slice the first N array elements in the then block of the resolved promise. The actual batch loader class looks like:

class WindowKeyLoader < GraphQL::Batch::Loader
  attr_reader :model, :foreign_key, :limit, :order_col, :order_dir

  def initialize(model, foreign_key, limit:, order_col: :id, order_dir: :asc)
    @model = model
    @foreign_key = foreign_key
    @limit = limit
    @order_col = order_col
    @order_dir = order_dir
  end

  def perform(foreign_ids)
    # build the sub-query, limiting results by foreign key at this point
    # we don't want to execute this query but get its SQL to be used later
    ranked_from = model.
      select("*,
        row_number() OVER (
          PARTITION BY #{foreign_key} ORDER BY #{order_col} #{order_dir}
        ) as rank").
      where(foreign_key => foreign_ids).
      to_sql

    # use the sub-query from above to query records which have a rank
    # value less than or equal to our limit
    records = model.
      from("(#{ranked_from}) as #{model.table_name}").
      where("rank <= ?", limit).
      to_a

    # match records and fulfill promises
    foreign_ids.each do |foreign_id|
      matching_records = records.select do |r|
        foreign_id == r.send(foreign_key)
      end
      fulfill(foreign_id, matching_records)
    end
  end
end

We're able to test the WindowKeyLoader by creating three events for a category but only asking for the first two of them:

describe WindowKeyLoader do
  it 'loads' do
    category = create(:category)
    events = (1..3).to_a.map do |n|
      create(:event, name: "Event #{n}", category: category)
    end

    result = GraphQL::Batch.batch do
      WindowKeyLoader.for(
        Event,
        :category_id,
        limit: 2, order_col: :id, order_dir: :asc
      ).load(category.id)
    end

    expect(result).to eq(events.first(2))
  end
end

Batch loading active storage attachments

You may run into situations where you're loading polymorphic data, or other types of relationships which don't exactly fit into the mold of your standard has-many or belongs-to relationships. One case is with ActiveStorage. In the code below we'll load an image URL for an event:

class Types::EventType < Types::BaseObject
  field :image, String, null: true

  def image
    # produces 2N + 1 queries... yikes!
    # url_for(object.image.variant({ quality: 75 }))

    AttachmentLoader.for(:Event, :image).load(object.id).then do |image|
      # the loader fulfills with nil when an event has no attached image
      url_for(image.variant({ quality: 75 })) if image
    end
  end
end

This data is stored using a polymorphic relationship that loads an ActiveStorage::Attachment record, which then needs to load an ActiveStorage::Blob record in order to produce the image url. It ends up producing a 2N + 1 query... yikes! Our AttachmentLoader is able to completely optimize this field by cutting it down to just two queries to load as many images as you'd like.

class AttachmentLoader < GraphQL::Batch::Loader
  attr_reader :record_type, :attachment_name

  def initialize(record_type, attachment_name)
    @record_type = record_type
    @attachment_name = attachment_name
  end

  def perform(record_ids)
    # find records and fulfill promises
    ActiveStorage::Attachment.
      includes(:blob).
      where(record_type: record_type, record_id: record_ids, name: attachment_name).
      each { |record| fulfill(record.record_id, record) }

    # fulfill unfound records
    record_ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

In this case we are taking advantage of eager-loading, because for each attachment we will need its corresponding blob record.

Conclusion

GraphQL can be as efficient as REST, but requires approaching optimizations from a different angle. Instead of upfront optimizations, we lazy-load data only when required, loading it in batches to avoid excess trips to the database. In this article, we covered techniques to load single records, multiple records, and records with different types of relationships, as is the case with Active Storage which has a polymorphic relationship.


About the Author


Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures

Lukas Fittl — Thu, 04 Oct 2018 12:00:00 GMT

Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11

In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.

Just-In-Time compilation (JIT) in Postgres 11

Just-In-Time compilation (JIT) for query execution was added in Postgres 11. It's not going to be enabled for queries by default, similar to parallel query in Postgres 9.6, but can be very helpful for CPU-bound workloads and analytical queries.

Specifically, JIT currently aims to optimize two essential parts of query execution: Expression evaluation and tuple deforming. To quote the Postgres documentation:

Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.

Tuple deforming is the process of transforming an on-disk tuple into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.

Often you will have a workload that is mixed, where some queries will benefit from JIT, and some will be slowed down by the overhead.

Here is how you can monitor JIT performance using EXPLAIN and auto_explain, as well as how you can determine whether your queries are benefiting from JIT optimization.

Monitoring JIT with EXPLAIN / auto_explain

First of all, you will need to make sure that your Postgres packages are compiled with JIT support (--with-llvm configuration switch). Assuming that you have Postgres binaries compiled like that, the jit configuration parameter controls whether JIT is actually being used.

For this example, we’re working with one of our staging databases, and pick a relatively simple query that can benefit from JIT:

SELECT COUNT(*) FROM log_lines
 WHERE log_classification = 65 AND (details->>'new_dead_tuples')::integer >= 0;

For context, the table log_lines is an internal log event statistics table of pganalyze, which is typically indexed per-server, but in this case we want to run an analytical query across all servers to count interesting autovacuum completed log events.

First, if we run the query with jit = off, we will get an execution plan and runtime like this:

EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM log_lines
    WHERE log_classification = 65 AND (details->>'new_dead_tuples')::integer >= 0;

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                        QUERY PLAN                                                        │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=649724.03..649724.04 rows=1 width=8) (actual time=3498.939..3498.939 rows=1 loops=1)                    │
│   Buffers: shared hit=1538 read=386328                                                                                   │
│   I/O Timings: read=1098.036                                                                                             │
│   ->  Seq Scan on log_lines  (cost=0.00..649675.55 rows=19393 width=0) (actual time=0.028..3437.032 rows=667063 loops=1) │
│         Filter: ((log_classification = 65) AND (((details ->> 'new_dead_tuples'::text))::integer >= 0))                  │
│         Rows Removed by Filter: 14396065                                                                                 │
│         Buffers: shared hit=1538 read=386328                                                                             │
│         I/O Timings: read=1098.036                                                                                       │
│ Planning Time: 0.095 ms                                                                                                  │
│ Execution Time: 3499.089 ms                                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

Time: 3499.580 ms (00:03.500)

Note the usage of EXPLAIN's BUFFERS option so we can compare whether any caching behavior affects our benchmarking. We can also see that I/O time was 1,098 ms out of 3,499 ms, so this query is definitely CPU bound.

For comparison, when we enable JIT, we can see the following:

SET jit = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM log_lines
    WHERE log_classification = 65 AND (details->>'new_dead_tuples')::integer >= 0;

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                        QUERY PLAN                                                         │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=649724.03..649724.04 rows=1 width=8) (actual time=2816.497..2816.498 rows=1 loops=1)                     │
│   Buffers: shared hit=1570 read=386296                                                                                    │
│   I/O Timings: read=1154.438                                                                                              │
│   ->  Seq Scan on log_lines  (cost=0.00..649675.55 rows=19393 width=0) (actual time=78.912..2759.717 rows=667063 loops=1) │
│         Filter: ((log_classification = 65) AND (((details ->> 'new_dead_tuples'::text))::integer >= 0))                   │
│         Rows Removed by Filter: 14396065                                                                                  │
│         Buffers: shared hit=1570 read=386296                                                                              │
│         I/O Timings: read=1154.438                                                                                        │
│ Planning Time: 0.095 ms                                                                                                   │
│ JIT:                                                                                                                      │
│   Functions: 4                                                                                                            │
│   Options: Inlining true, Optimization true, Expressions true, Deforming true                                             │
│   Timing: Generation 1.044 ms, Inlining 14.205 ms, Optimization 46.678 ms, Emission 17.868 ms, Total 79.795 ms            │
│ Execution Time: 2817.713 ms                                                                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(14 rows)

Time: 2818.250 ms (00:02.818)

In this case, JIT cuts execution time by roughly 20% (3,499 ms down to 2,818 ms, a speed-up of about 1.24x), due to spending less CPU time, without any extra effort on our end. We can also see that the JIT tasks themselves added about 80 ms to the runtime.
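
For reference, this is the arithmetic behind that comparison, using the Execution Time values and the JIT Total timing from the two EXPLAIN outputs above. The variable names are ours.

```ruby
without_jit_ms = 3499.089 # Execution Time with jit = off
with_jit_ms    = 2817.713 # Execution Time with jit = on
jit_total_ms   = 79.795   # "Total" in the JIT Timing line

# Share of the original runtime that was saved.
runtime_reduction = (without_jit_ms - with_jit_ms) / without_jit_ms
# How many times faster the JIT run is.
speedup_factor = without_jit_ms / with_jit_ms
# Share of the JIT run that was spent on JIT compilation itself.
jit_share = jit_total_ms / with_jit_ms

puts format("runtime reduction: %.1f%%", runtime_reduction * 100)
puts format("speed-up factor:   %.2fx", speedup_factor)
puts format("JIT overhead:      %.1f%% of the JIT run", jit_share * 100)
```

Note that "percent less runtime" and "percent faster" are different numbers for the same measurement, so it helps to state which one you mean when comparing runs.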

You can fine tune whether JIT is used for a particular query by the jit_above_cost parameter which applies to the total cost of the query as determined by the Postgres planner. The cost is 649724 in the above EXPLAIN output, which exceeds the default jit_above_cost threshold of 100000. In a future post we'll walk through more examples of when using JIT can be beneficial.
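
As a rough mental model, the cost checks can be sketched like this. The function below is our own simplification and not Postgres code. The jit_inline_above_cost and jit_optimize_above_cost settings are not discussed in this post; their default of 500000 is taken from the Postgres documentation, and we assume that a value of -1 disables the respective check.

```ruby
# Documented defaults; -1 is assumed to disable the respective check.
JIT_DEFAULTS = {
  jit_above_cost: 100_000,
  jit_inline_above_cost: 500_000,
  jit_optimize_above_cost: 500_000
}.freeze

# Simplified model: compare the plan's total cost against each threshold.
def jit_decision(total_cost, jit: true, settings: JIT_DEFAULTS)
  above = ->(name) { settings[name] >= 0 && total_cost > settings[name] }
  perform = jit && above.call(:jit_above_cost)
  {
    jit: perform,
    inlining: perform && above.call(:jit_inline_above_cost),
    optimization: perform && above.call(:jit_optimize_above_cost)
  }
end

puts jit_decision(649_724.04).inspect # total cost of the example query
puts jit_decision(200_000).inspect
puts jit_decision(50_000).inspect
```

With the example query's total cost, the model agrees with the "Inlining true, Optimization true" options shown in the EXPLAIN output.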

You can gather these JIT statistics either for individual queries that you are interested in (using EXPLAIN), or automatically collect it for all of your queries using the auto_explain extension. If you want to learn more about how to enable auto_explain we recommend reviewing our guide about it: pganalyze Log Insights - Tuning Log Config Settings.

Fun fact: As part of the writing of this article we ran experiments with JIT and auto_explain, and discovered that JIT information wasn’t included with auto_explain, but only with regular EXPLAINs. Luckily, we were able to contribute a bug fix to Postgres, which has been merged and will be part of the Postgres 11 release.

Preventing cold caches: Auto prewarm in Postgres 11

A neat feature that will help you improve performance right after restarting Postgres, is the new autoprewarm background worker functionality.

If you are not familiar with pg_prewarm, it’s an extension that’s bundled with Postgres (much like pg_stat_statements), which you can use to preload data that’s on disk into the Postgres buffer cache.

It is often very useful to ensure that a certain table is cached before the first production query hits the database, to avoid an overly slow response due to data being loaded from disk.

Previously, you needed to manually specify which relations (i.e. tables) and which page offsets to preload, which was cumbersome, and hard to automate.

Caching tables with autoprewarm

Starting in Postgres 11, you can instead have this done automatically, by adding pg_prewarm to shared_preload_libraries like this:

shared_preload_libraries = 'pg_prewarm,pg_stat_statements'

Doing this will automatically save information on which tables/indices are in the buffer cache (and which parts of them) every 300 seconds to a file called autoprewarm.blocks, and use that information after Postgres restarts to reload the previously cached data from disk into the buffer cache, thus improving initial query performance.

Stored procedures in Postgres 11

Postgres has had database server-side functions for a long time, with a variety of supported languages. You might have used the term “procedures” before to refer to such functions, as they are similar to what’s called “Stored Procedures” in other database systems such as Oracle.

However, one detail that is sometimes missed, is that the existing functions in Postgres were always running within the same transaction. There was no way to begin, commit, or rollback a transaction within a function, as they were not allowed to run outside of a transaction context.

Starting in Postgres 11, you will have the ability to use CREATE PROCEDURE instead of CREATE FUNCTION to create procedures.

Benefits of using stored procedures

Compared to regular functions, procedures can do more than just query or modify data: They also have the ability to begin/commit/rollback transactions within the procedure.

Particularly for those moving over from Oracle to PostgreSQL, the new procedure functionality can be a significant time saver. You can find some examples of how to convert procedures between those two relational database systems in the Postgres documentation.

How to use stored procedures

First, let’s create a simple procedure that handles some tables:

CREATE PROCEDURE my_table_task() LANGUAGE plpgsql AS $$
DECLARE
BEGIN
  CREATE TABLE table_committed (id int);
  COMMIT;
  CREATE TABLE table_rolled_back (id int);
  ROLLBACK;
END $$;

We can then call this procedure like this, using the new CALL statement:

=# CALL my_table_task();
CALL
Time: 1.573 ms

Here you can see the benefit of procedures - despite the rollback the overall execution is successful, and the first table got created, but the second one was not since the transaction was rolled back.

Be careful: Transaction timestamps and xact_start for procedures

Expanding on how transactions work inside procedures, there is currently an oddity with the transaction timestamp, which for example you can see in xact_start. When we expand the procedure like this:

CREATE PROCEDURE my_table_task() LANGUAGE plpgsql AS $$
DECLARE
  clock_str TEXT;
  tx_str TEXT;
BEGIN
  CREATE TABLE table_committed (id int);
    SELECT clock_timestamp() INTO clock_str;
    SELECT transaction_timestamp() INTO tx_str;
    RAISE NOTICE 'After 1st CREATE TABLE: % clock, % xact', clock_str, tx_str;
    PERFORM pg_sleep(5);
  COMMIT;
  CREATE TABLE table_rolled_back (id int);
    SELECT clock_timestamp() INTO clock_str;
    SELECT transaction_timestamp() INTO tx_str;
    RAISE NOTICE 'After 2nd CREATE TABLE: % clock, % xact', clock_str, tx_str;
  ROLLBACK;
END $$;

And then call the procedure, we see the following:

=# CALL my_table_task();
NOTICE:  00000: After 1st CREATE TABLE: 2018-10-03 22:17:26 clock, 2018-10-03 22:17:26 xact
NOTICE:  00000: After 2nd CREATE TABLE: 2018-10-03 22:17:31 clock, 2018-10-03 22:17:26 xact
CALL
Time: 5022.598 ms

Despite there being two transactions in the procedure, the transaction start timestamp is that of when the procedure got called, not when the embedded transaction actually started.

You will see the same problem with the xact_start field in pg_stat_activity, causing monitoring scripts to potentially detect false positives for long running transactions. This issue is currently in discussion and likely to be changed before the final release.

How often does my stored procedure get called?

Now, if you want to monitor the performance of procedures, it gets a bit difficult. Whilst regular functions can be tracked using track_functions = on, there is no such facility for procedures. You can however track the execution of CALL statements using pg_stat_statements:

SELECT query, calls, total_time FROM pg_stat_statements WHERE query LIKE 'CALL%';

┌────────────┬───────┬────────────┐
│   query    │ calls │ total_time │
├────────────┼───────┼────────────┤
│ CALL abc() │     4 │    5.62299 │
└────────────┴───────┴────────────┘

In addition, when you enable pg_stat_statements.track = all, queries that are called from within a procedure will be tracked, and made available in Postgres query performance monitoring tools such as pganalyze.

Conclusion

Postgres 11 is going to be the best Postgres release yet, and we are excited to put it into use.

Whilst common wisdom is to not upgrade right after a release, we encourage you to try out the new release early, help the community find bugs (just like we did!), and make sure that your performance monitoring systems are ready to handle the new features that were added.



Postgres Log Monitoring 101: Deadlocks, Checkpoint Tuning & Blocked Queries

Lukas Fittl — Mon, 12 Feb 2018 00:00:00 GMT

Those of us who operate production PostgreSQL databases have many jobs to do - and often there isn't enough time to take a regular look at the Postgres log files.

However, often times those logs contain critical details on how new application code is affecting the database due to locking issues, or how certain configuration parameters cause the database to produce I/O spikes.

This post highlights three common performance problems you can find by looking at, and automatically filtering your Postgres logs.

Blocked Queries

Some of the most performance-relevant log events are blocked queries, where a query waits for locks that another query has taken. On systems that have problems with locks you will often also see very high CPU utilization that can't be explained.

First, in order to enable logging of lock waits, set log_lock_waits = on in your Postgres config. This will emit a log event like the following if a query has been waiting for longer than deadlock_timeout (default 1s):

LOG: process 123 still waiting for ShareLock on transaction 12345678 after 1000.606 ms
STATEMENT: SELECT table WHERE id = 1 FOR UPDATE;
CONTEXT: while updating tuple (1,3) in relation "table"
DETAIL: Process holding the lock: 456. Wait queue: 123.

This tells us that we're seeing lock contention on updates for table, as another transaction holds a lock on the same row we're trying to update. You can often see this caused by complex transactions that hold locks for too long. One frequent anti-pattern in a typical web app is to:

Open a transaction
Update a timestamp field (e.g. updated_at in Ruby on Rails)
Make an API call to an external service
Commit the transaction

The lock on the row that you updated in Step 2 will be held all the way until Step 4, which means that if the API call takes a few seconds, you will be holding a lock on that row for that whole time. If you have any concurrency in your system that affects the same rows, you will see lock contention, and the above lock notice for the queries in Step 2.

Often, however, you have to go back to a development or staging system with full query logging to understand the full context of the transaction that's causing the problem.
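
The lock wait event shown earlier in this section can be picked out of a log stream with a small filter. Below is a minimal sketch in Ruby: the regular expression is based only on the sample line above, so treat it as an assumption about the message format rather than a complete parser, and parse_lock_wait is a name we made up.

```ruby
# Pattern derived from the sample log line; other lock types or targets may need adjustments.
LOCK_WAIT = /process (?<pid>\d+) still waiting for (?<lock>\w+) on (?<target>.+?) after (?<ms>[\d.]+) ms/

# Returns a hash describing the lock wait, or nil for unrelated log lines.
def parse_lock_wait(line)
  m = LOCK_WAIT.match(line)
  return nil unless m

  { pid: m[:pid].to_i, lock: m[:lock], target: m[:target], waited_ms: m[:ms].to_f }
end

line = "LOG: process 123 still waiting for ShareLock on transaction 12345678 after 1000.606 ms"
event = parse_lock_wait(line)
puts event.inspect
```

Collecting these events per process ID and per statement makes it easier to spot which transactions are repeatedly involved in lock contention.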

Deadlocks

Related to blocked queries, but slightly different, are deadlocks, which result in a cancelled query due to it deadlocking against another query.

The easiest way to reproduce a deadlock is doing the following:

--- session 1
BEGIN;
SELECT * FROM table WHERE id = 1 FOR UPDATE;

--- session 2
BEGIN;
SELECT * FROM table WHERE id = 2 FOR UPDATE;
SELECT * FROM table WHERE id = 1 FOR UPDATE; --- this will block waiting for session 1 to finish

--- session 1
SELECT * FROM table WHERE id = 2 FOR UPDATE; --- this can never finish as it deadlocks against session 2

Again, after deadlock_timeout Postgres will notice the locking problem. In this case it decides that this will never finish, and emits the following to the logs:

2018-02-12 09:24:52.176 UTC [3098] ERROR:  deadlock detected
2018-02-12 09:24:52.176 UTC [3098] DETAIL:  Process 3098 waits for ShareLock on transaction 219201; blocked by process 3099.
	Process 3099 waits for ShareLock on transaction 219200; blocked by process 3098.
	Process 3098: SELECT * FROM table WHERE id = 2 FOR UPDATE;
	Process 3099: SELECT * FROM table WHERE id = 1 FOR UPDATE;
2018-02-12 09:24:52.176 UTC [3098] HINT:  See server log for query details.
2018-02-12 09:24:52.176 UTC [3098] CONTEXT:  while locking tuple (0,1) in relation "table"
2018-02-12 09:24:52.176 UTC [3098] STATEMENT:  SELECT * FROM table WHERE id = 2 FOR UPDATE;

You might think that deadlocks never happen in production, but the unfortunate truth is that heavy use of ORM frameworks can hide the circular dependency situation that produces deadlocks, and it's certainly something to watch out for when you make use of complex transactions.
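
The circular dependency is visible in the DETAIL lines of the log output above: each process waits for a lock held by the other. Here is a small sketch in Ruby that extracts those "waits for" edges and follows them to find the cycle. It assumes each process is blocked by exactly one other process, which holds for this sample but is a simplification, and the function names are ours.

```ruby
# Pattern derived from the DETAIL lines in the sample output.
WAITS = /Process (?<waiter>\d+) waits for .*?; blocked by process (?<holder>\d+)\./

# Build a map of waiting pid => blocking pid.
def wait_edges(lines)
  lines.each_with_object({}) do |line, edges|
    m = WAITS.match(line)
    edges[m[:waiter].to_i] = m[:holder].to_i if m
  end
end

# Follow the edges from `start`; reaching an already visited pid means a cycle.
def deadlock_cycle(edges, start)
  path = []
  pid = start
  while pid && !path.include?(pid)
    path << pid
    pid = edges[pid]
  end
  pid ? path.drop(path.index(pid)) : nil
end

detail = [
  "2018-02-12 09:24:52.176 UTC [3098] DETAIL:  Process 3098 waits for ShareLock on transaction 219201; blocked by process 3099.",
  "\tProcess 3099 waits for ShareLock on transaction 219200; blocked by process 3098.",
  "\tProcess 3098: SELECT * FROM table WHERE id = 2 FOR UPDATE;"
]

edges = wait_edges(detail)
cycle = deadlock_cycle(edges, 3098)
puts "edges: #{edges.inspect}, cycle: #{cycle.inspect}"
```

A plain blocked query has no such cycle, which is why it can eventually proceed once the lock holder finishes, while a deadlock cannot.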

Checkpoints

Last but not least, checkpoints. For those unfamiliar, checkpointing is the mechanism by which PostgreSQL persists all changes to the data directory, which before were only in shared buffers and the WAL. It's what gives you a consistent copy of your data in one place (the data directory).

Due to the fact that checkpoints have to write out all the changes you've submitted to the database (which before were already written to the WAL), they can produce quite a lot of I/O - in particular when you are actively loading data.

The easiest way to produce a checkpoint is to call CHECKPOINT, but very few people would do that frequently in production. Instead Postgres has a mechanism that automatically triggers a checkpoint, most commonly due to either time, or xlog. After turning on log_checkpoints = 1 you can see this in the logs like this:

Feb 09 08:30:07am PST 12772 LOG: checkpoint starting: time
Feb 09 08:15:50am PST 12772 LOG: checkpoint starting: xlog
Feb 09 08:10:39am PST 12772 LOG: checkpoint starting: xlog

Or when visualized over time, it can look like this:

Occasionally Postgres will also output the following warning, which hints at the tuning you can do:

Feb 09 10:21:11am PST 5677 LOG: checkpoints are occurring too frequently (17 seconds apart)

With checkpoints you want to avoid having them occur too frequently, as each checkpoint will produce significant I/O, as well as cause all changes that are written to WAL right after to be written as a full-page write.
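
A quick way to see what triggers your checkpoints is to tally the reason at the end of each "checkpoint starting" line. The sketch below uses the sample log lines from above. The function name is ours, and the reason text may differ between Postgres versions, so the pattern captures whatever follows the colon.

```ruby
CHECKPOINT_START = /checkpoint starting: (?<reason>.+)\z/

# Count how often each checkpoint trigger reason appears in the given log lines.
def checkpoint_reasons(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    m = CHECKPOINT_START.match(line.strip)
    counts[m[:reason]] += 1 if m
  end
end

log = [
  "Feb 09 08:30:07am PST 12772 LOG: checkpoint starting: time",
  "Feb 09 08:15:50am PST 12772 LOG: checkpoint starting: xlog",
  "Feb 09 08:10:39am PST 12772 LOG: checkpoint starting: xlog",
  "Feb 09 10:21:11am PST 5677 LOG: checkpoints are occurring too frequently (17 seconds apart)"
]

reasons = checkpoint_reasons(log)
puts reasons.inspect
```

If xlog-triggered checkpoints clearly outnumber time-triggered ones, that is a hint to look at the settings discussed next.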

Ideally you would see checkpoints spaced out evenly and usually started by time instead of xlog. You can influence this behavior by the following config settings:

checkpoint_timeout - the time after which a time checkpoint will be kicked off (defaults to every 5 minutes)
max_wal_size - the maximum amount of WAL that will be accumulated before an xlog checkpoint gets triggered (defaults to 1 GB)
checkpoint_completion_target - how quickly a checkpoint finishes (defaults to 0.5 which means it will finish in half the time of checkpoint_timeout, i.e. 2.5 minutes)

On many production systems I've seen max_wal_size be increased to support higher write rates, checkpoint_timeout to be slightly increased as well to avoid too frequent time-based checkpoints, as well as setting checkpoint_completion_target to 0.9.
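
To see what changing checkpoint_completion_target means in practice, here is the arithmetic from the list above as a tiny Ruby helper. The helper name is ours, and it only models the target duration that the checkpoint writes are spread over.

```ruby
# Time window (in seconds) that a time-based checkpoint aims to spread its writes over.
def checkpoint_spread_seconds(checkpoint_timeout_s, completion_target)
  checkpoint_timeout_s * completion_target
end

default_spread = checkpoint_spread_seconds(300, 0.5) # 5 minute timeout, target 0.5
tuned_spread   = checkpoint_spread_seconds(300, 0.9) # same timeout, target 0.9

puts format("target 0.5: %.1f minutes", default_spread / 60.0)
puts format("target 0.9: %.1f minutes", tuned_spread / 60.0)
```

A higher target spreads the same amount of checkpoint I/O over a longer window, which smooths out I/O spikes.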

You should however tune all of this based on your own system, and the logs, so you can choose what's correct for your setup. Also note that less frequent checkpoints mean recovery of the server is going to take longer, as Postgres will have to replay all WAL, starting from the previous checkpoint, when booting after a crash.

Conclusion

Postgres log files contain a treasure of useful data you can analyze in order to make your system behave faster, as well as debug production issues. This data is readily available, but often difficult to parse.

This article tries to point the way towards which log lines are worth filtering for on production systems.

If you don't want to bother with setting up your own filters in a third party logging system, try out pganalyze Postgres Log Insights: a real-time PostgreSQL log analysis and log monitoring system built into pganalyze.


Visualizing & Tuning Postgres Autovacuum

Lukas Fittl — Tue, 28 Nov 2017 00:00:00 GMT

In this post we'll take a deep dive into one of the mysteries of PostgreSQL: VACUUM and autovacuum.

The Postgres autovacuum logic has many moving parts and can be tricky to understand and tune, in particular for application developers who don't spend all day looking at database documentation.

But luckily there are recent improvements in Postgres, in particular the addition of pg_stat_progress_vacuum in Postgres 9.6, that make understanding autovacuum and VACUUM behavior a bit easier.

In this post we describe an approach to autovacuum tuning that is based on sampling these statistics over time, visualizing them, and then making tuning decisions based on data. The visualizations shown are all screenshots of real data, and are available for early access in pganalyze.

Why VACUUM?

First of all, why we need VACUUM, 101:

When you perform UPDATE and DELETE operations on a table in Postgres, the database has to keep around the old row data for concurrently running queries and transactions, due to its MVCC model. Once all concurrent transactions that have seen these old rows have finished, they effectively become dead rows which will need to be removed.

VACUUM is the process by which PostgreSQL cleans up these dead rows, and turns the space they have occupied into usable space again, to be used for future writes.

A more detailed description can be found in the PostgreSQL documentation.

Which tables have VACUUM running?

The easiest thing you can check on a running PostgreSQL system is which VACUUM operations are running right now. In all Postgres versions this information shows up in the pg_stat_activity view. Look for query values that start with "autovacuum: ", or which contain the word "VACUUM":

SELECT pid, query FROM pg_stat_activity WHERE query LIKE 'autovacuum: %';

  pid  | query
-------+----------------------------------------------------------------------------
 10469 | autovacuum: VACUUM ANALYZE public.schema_columns
 12848 | autovacuum: VACUUM public.replication_follower_stats (to prevent wraparound)
 28626 | autovacuum: VACUUM public.schema_index_stats (to prevent wraparound)

Based on sampling this data, we can generate a timeline view that helps us distinguish tables that are frequently vacuumed, from tables that have long running vacuums, to tables that don't get vacuumed much at all.

In the screenshot you can see the top 10 tables (by frequency) colored the same way, and in particular the table that's colored light yellow stands out as effectively having VACUUM running continuously.

We can also see that one manual VACUUM was started by the DBA user (colored in cyan), and that it ran much quicker than the same colored version started by autovacuum earlier in the day.

When does autovacuum run?

Another question that frequently comes up is, why did autovacuum decide to start VACUUMing a table?

There are essentially two major reasons:

1) To prevent Transaction ID wraparound

The number of non-frozen transaction IDs has reached "autovacuum_freeze_max_age" (default 200 million transactions), and VACUUM is required to prevent transaction ID wraparound.

We won't go too much into detail on tuning this parameter in this post, but rather reserve this as a follow-on topic.

Note that this can't be disabled, so it will cause autovacuum to start VACUUM, even if it is otherwise disabled. If you keep cancelling autovacuum processes started for this reason you will eventually have to perform a manual VACUUM, as Postgres will shut down the database otherwise.

2) To mark dead rows & enable re-use for new data

As you run UPDATEs and DELETEs, dead rows will accumulate, as described earlier in the post. Once the number of dead rows (or tuples) has exceeded the threshold, autovacuum will start a VACUUM run.

The following formula is used to decide whether vacuuming is needed:

vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples

By default the base threshold is 50 rows, and the scale factor is 20%. That means a table will be vacuumed as soon as the number of dead rows exceeds 20% of all rows in the table plus 50 rows.

In order to understand when this gets triggered, you can look at the n_live_tup and n_dead_tup values in pg_stat_user_tables:

SELECT * FROM pg_stat_user_tables WHERE relname = 'backend_states';

-[ RECORD 1 ]-------+------------------------------
relid               | 732156523
schemaname          | public
relname             | backend_states
...
n_live_tup          | 23047184
n_dead_tup          | 108373
...
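
To make the formula concrete, here is a minimal Ruby sketch that plugs the numbers from the output above into it. It assumes the default settings, and it uses n_live_tup as a stand-in for the table's row count (Postgres itself works from its own row count estimate, so treat the result as approximate). The method name vacuum_threshold is made up for this example.

```ruby
# Defaults quoted in the text above.
BASE_THRESHOLD = 50    # autovacuum_vacuum_threshold
SCALE_FACTOR   = 0.2   # autovacuum_vacuum_scale_factor

# Dead-row count above which autovacuum would start a VACUUM.
# (Hypothetical helper, not part of any library.)
def vacuum_threshold(n_tuples, base: BASE_THRESHOLD, scale: SCALE_FACTOR)
  base + scale * n_tuples
end

# Values taken from the pg_stat_user_tables output above.
n_live_tup = 23_047_184
n_dead_tup = 108_373

threshold = vacuum_threshold(n_live_tup)
puts "threshold:   #{threshold.round} dead rows"
puts "dead so far: #{n_dead_tup}"
puts "vacuum due:  #{n_dead_tup > threshold}"
```

With the defaults this table would need roughly 4.6 million dead rows before autovacuum kicks in. Passing scale: 0.05 instead brings that down to roughly 1.15 million.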

We can then take this information, together with the autovacuum settings, and visualize it:

Here you can see that as soon as the dead tuples (grey/red area) reach the threshold (grey line), a VACUUM process kicks off (red line in the lower graph).

On a table that can't keep up with VACUUM, which results in bloat due to dead rows, this would instead look like this:

How fast does autovacuum run?

A VACUUM process that was started by autovacuum is artificially throttled in the default PostgreSQL configuration, so it doesn't fully utilize the CPU and I/O available.

That is the correct way to operate for most systems, as you wouldn't want VACUUM to slow down application queries during business hours.

The system that Postgres follows for this is that every VACUUM operation accumulates cost, which you can think of as points that get added up:

vacuum_cost_page_hit (cost for vacuuming a page found in the buffer cache, default 1)
vacuum_cost_page_miss (cost for vacuuming a page retrieved from disk, default 10)
vacuum_cost_page_dirty (cost for writing back a modified page to disk, default 20)

Once the sum of costs has reached autovacuum_vacuum_cost_limit (which by default falls back to vacuum_cost_limit, 200), the VACUUM process will sleep and do nothing for autovacuum_vacuum_cost_delay (default 20 ms). For a manual VACUUM this cost-based delay is disabled by default.

With the default parameters, that means that autovacuum will at most write 4MB/s to disk, and read 8MB/s from disk or the OS page cache.
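
If you want to see where those numbers come from, or play with other settings, the arithmetic can be sketched in a few lines of Ruby. This is a rough upper bound only: it assumes the default 8 kB page size, treats 1 MB as 1000 kB, and ignores the time VACUUM spends doing actual work between sleeps. The method name max_mb_per_sec is made up for this example.

```ruby
PAGE_SIZE_KB  = 8     # default Postgres page size (assumed unchanged)
COST_LIMIT    = 200   # cost budget per round
COST_DELAY_MS = 20    # sleep after each round

# Upper bound in MB/s if every page processed incurred the given cost.
# (Hypothetical helper, not part of any library.)
def max_mb_per_sec(page_cost, limit: COST_LIMIT, delay_ms: COST_DELAY_MS)
  rounds_per_sec = 1000.0 / delay_ms
  pages_per_sec  = rounds_per_sec * limit / page_cost
  pages_per_sec * PAGE_SIZE_KB / 1000.0
end

puts "read from buffer cache (cost 1):  #{max_mb_per_sec(1)} MB/s"
puts "read from disk         (cost 10): #{max_mb_per_sec(10)} MB/s"
puts "write dirty pages      (cost 20): #{max_mb_per_sec(20)} MB/s"
```

Raising the limit or lowering the delay scales these bounds linearly, for example max_mb_per_sec(20, limit: 1000) gives five times the write budget.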

How far has this VACUUM made progress?

VACUUM runs through three different major phases as part of its operation:

Scanning Heap
Vacuuming Indices
Vacuuming Heap

As well as a few minor phases that are usually really quick.

The "Vacuuming Indices" and "Vacuuming Heap" phases might run multiple times if autovacuum_work_mem is set so low that not all dead tuples can be held in memory at once.

Based on sampling pg_stat_progress_vacuum we can visualize in detail what goes on:

This works even whilst an autovacuum or manual VACUUM is still running, and so we can get a visual indication of how long we will roughly have to wait for it to finish.

What should I tune first?

In general, one might think that VACUUM is an expensive operation, and you'd want to only run it infrequently, maybe even as a nightly maintenance task.

That however is often the wrong way to approach it, as rarely run VACUUMs are much more expensive since they have more work to do, and it also means your system will spend more time in a sub-optimal state.

Instead, try to have VACUUM run more often, in proportion to UPDATEs and DELETEs your application performs. Frequently run VACUUMs will be faster, as there is less work to perform.

There are two primary tunings you should consider on production Postgres databases:

1) Lower autovacuum_vacuum_scale_factor on tables with old, inactive data

For tables with a lot of old, inactive data, consider lowering the threshold by which autovacuum is triggered. Since the calculation is based on the number of total rows in the table, autovacuum will not notice if most recent rows have been modified, since the overall number of dead rows will still be way below the default threshold of 20%.

However, you will see the impact of dead rows on your query performance, as the dead rows have to be scanned over when reading data. Reducing the scale factor to keep down the total number of dead rows can make sense in such cases.

2) Adjust autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay for bigger machines

The default settings for throttling are quite conservative on modern systems. Unless you run on the smallest instance type, or with the cheapest storage, it often makes sense to speed up autovacuum a bit.

In addition, for small tables that have a lot of updates/deletes, it can happen that autovacuum is not able to keep up, and that you will see new VACUUMs start pretty much right after the previous one was finished. In such cases adjusting the throttling on a per-table basis might also make sense.

Note that most autovacuum configuration settings can be overridden on a per-table basis:

ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0.05);

It often makes sense to review that table's particular statistics, e.g. how often is the table updated and how many dead tuples does it accumulate, before modifying autovacuum settings.

The visualizations shown in this post are based on real data, and are now available for early access to all pganalyze customers on the Scale plan and higher.

Reach out to have this feature enabled for your account - we'd be happy to walk you through it, and help you tune autovacuum on your database.


Whats New in Postgres 10: Monitoring Improvements

Lukas Fittl — Wed, 04 Oct 2017 00:00:00 GMT

Postgres 10 has been stamped on Monday, and will most likely be released this week, so this seems like a good time to review what this new release brings in terms of Monitoring functionality built into the database.

In this post you'll see a few things that we find exciting about the new release, as well as some tips on what to adjust, whether you use a hosted Postgres monitoring tool like pganalyze, or if you've written your own scripts.

New "pg_monitor" Monitoring Role

Most users of Postgres obviously don't want to give monitoring tools superuser access, but in the past this was often required, as many Postgres statistics views (e.g. pg_stat_statements) only show the values for the current user, unless you are superuser.

This meant that you had to work around this with SECURITY DEFINER functions that query the statistics views as superuser, but can be called from a restricted user.

Now, you can use the monitoring role in Postgres 10 to instead give a user specific access to monitor statistics views, without giving out any other access.

It's as simple as:

GRANT pg_monitor TO monitoring_user;

And afterwards that user can simply access statistics views without running into <insufficient privilege> issues like before.

This also works with pganalyze out of the box, so once you upgrade to 10 you can simply grant the monitoring role to the pganalyze user, and drop the helper functions we've previously asked you to create.

A subset of often used views that the monitoring role now grants you access to:

pg_stat_statements
pg_stat_activity
pg_stat_replication
pg_stat_progress_vacuum
.. and more

Note that there are more fine-grained roles you can assign, should you want to.

Renaming of "xlog" to "wal", and "location" to "lsn"

If you've written your own monitoring scripts to check replication lag, and other statistics that have to do with WAL or LSNs, you'll need to update some function names.

In this new release, besides the WAL directory being renamed from "pg_xlog" to "pg_wal", all system administration functions have also been renamed to match this change. In addition, where previously functions had the name "location" in them, it now refers to "lsn".

You are most likely going to run into this with the often used pg_current_xlog_location (now pg_current_wal_lsn), as well as the helper method pg_xlog_location_diff (now pg_wal_lsn_diff).

Also note that the sent_location, write_location, etc fields in pg_stat_replication have been renamed to sent_lsn, write_lsn and so forth.
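
If you have a handful of monitoring queries to migrate, a small search-and-replace helper can take care of the renames mentioned above. The following Ruby sketch only covers the names listed in this post, so you'd want to extend the table for any other functions or columns your scripts use. The helper name upgrade_monitoring_sql is made up for this example.

```ruby
# Old name => new name, limited to the renames mentioned in this post.
RENAMES = {
  "pg_current_xlog_location" => "pg_current_wal_lsn",
  "pg_xlog_location_diff"    => "pg_wal_lsn_diff",
  "sent_location"            => "sent_lsn",
  "write_location"           => "write_lsn"
}.freeze

# Matches any old name as a whole word, so longer identifiers that merely
# contain one of them are left alone.
RENAME_PATTERN = /\b(?:#{RENAMES.keys.map { |name| Regexp.escape(name) }.join("|")})\b/

# Rewrites pre-10 names in a SQL string to their Postgres 10 equivalents.
# (Hypothetical helper, not part of any library.)
def upgrade_monitoring_sql(sql)
  sql.gsub(RENAME_PATTERN) { |old_name| RENAMES.fetch(old_name) }
end

old_sql = "SELECT pg_xlog_location_diff(pg_current_xlog_location(), sent_location) " \
          "FROM pg_stat_replication"
puts upgrade_monitoring_sql(old_sql)
```

This is plain text replacement, so it's worth reviewing the result by hand, especially if you have your own columns or aliases with the same names.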

Wait Events & Non-Client Connections in pg_stat_activity

The pg_stat_activity view and underlying data structure has been thoroughly improved this release, and now shows not just client connections and autovacuum, but also other background workers that are running in the system:

SELECT pid, backend_type, backend_start FROM pg_stat_activity WHERE backend_type != 'client backend';

 pid |    backend_type     |         backend_start         
-----+---------------------+-------------------------------
  58 | autovacuum launcher | 2017-10-03 21:02:45.458053+00
  60 | background worker   | 2017-10-03 21:02:45.459172+00
  56 | background writer   | 2017-10-03 21:02:45.457657+00
  55 | checkpointer        | 2017-10-03 21:02:45.457491+00
  57 | walwriter           | 2017-10-03 21:02:45.457817+00

If you have previously written monitoring scripts that rely on counting the number of entries in pg_stat_activity, you should filter the view by backend_type = 'client backend', or switch to using numbackends from pg_stat_database.

In addition to this, the new release also brings an additional 115 wait events (visible in wait_event_type and wait_event in pg_stat_activity), in particular more than 60 new I/O related events which help you understand better what a query is busy with.

You can find the full list of wait events in the Postgres documentation.

amcheck

Last but not least, a useful feature for consistency checking got added in this release. Initially developed by Peter Geoghegan and battle-tested at Heroku Postgres, this new tool allows you to check a B-Tree index for corruption as well as verify that invariants in the structure of the index are as expected.

It first needs to be created as CREATE EXTENSION amcheck and can then be run by a superuser like this:

SELECT bt_index_check('my_test_index');

 bt_index_check
----------------

An empty result indicates that the index is consistent, as would be expected.

Note that amcheck accesses the index through the shared buffer cache, so it might not show problems at the disk level right away. See more details on its documentation page.

This concludes a short overview of new monitoring functionality in Postgres 10.

Note that there are many other amazing new features like parallel query, logical replication and declarative partitioning that are not covered in this post.

If this article proved useful to you, you might also be interested in our Postgres Log Monitoring 101 article where we take a closer look at Deadlocks, Checkpoint Tuning, and Blocked Queries.


Introducing pg_query: Parse PostgreSQL queries in Ruby

Lukas Fittl — Tue, 17 Jun 2014 00:00:00 GMT

In this article we'll take a look at the new pg_query Ruby library.

pg_query is a Ruby library I wrote to help you parse SQL queries and work with the PostgreSQL parse tree. We use this extension inside pganalyze to provide contextual information for each query and find columns which might need an index.

At the end of this article you'll also find monitor.rb - a ready-to-use example that filters pg_stat_statements output and restricts it to only show a specific table.

Existing Solutions to Parse SQL Queries

After a longer period of research on this problem, we've come to a few realizations:

Obviously, using regular expressions for parsing any complex language is a bad idea.
None of the existing parsers work really well, or are maintained. For example sqlparse is focused on re-indenting and beautifying SQL - not for actually working with the query.
Writing and maintaining our own SQL parser is a bad idea. SQL is complex, even for simple things like SELECT. And don't get me started on Common Table Expressions, sub-queries and other fun features.

Our conclusion: The only way to correctly parse all valid SQL queries that PostgreSQL understands, now and in the future, is to use PostgreSQL itself.

And in general, PostgreSQL turns out to have a pretty good SQL parser - other SQL databases even use it as a reference implementation.

So we've pretty much determined that we wanted to use the PostgreSQL parser itself - but how do we access it?

Accessing the PostgreSQL Parser

Let's get the PostgreSQL server source, go down the rabbit hole and find what we need:

/*
 * raw_parser
 * Given a query in string form, do lexical
 * and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.
 */
List *
raw_parser(const char *str)
{
	...
}

This is the C function that takes a query and returns a parse tree as C structs.

Luckily this function is fairly independent. It does not need pg_catalog access (tables, indices, statistics, etc) since it runs before the query is rewritten, planned and executed.

Unfortunately raw_parser(...) is not exposed or included in any of the PostgreSQL libraries - and it's quite difficult to extract the parser from PostgreSQL without taking a whole lot of other code with you.

The pgpool project has actually done this, but they do need to update that code for every new major release. We've therefore turned to a slightly different approach:

We use the PostgreSQL server code directly - by statically linking the code into our own shared library. Through a bit of linking magic, we simply call the internal parser functions, and expose that function through a Ruby interface, to be used like this:

require 'pg_query'

pp PgQuery.parse("SELECT 1")
#<PgQuery:0x007f8cdaa8f8b8
 @parsetree=
  [{"SELECT"=>
     {"distinctClause"=>nil,
      "intoClause"=>nil,
      "targetList"=>
       [{"RESTARGET"=>
          {"name"=>nil,
           "indirection"=>nil,
           "val"=>{"A_CONST"=>{"val"=>1, "location"=>7}},
           "location"=>7}}],
      "fromClause"=>nil,
      "whereClause"=>nil,
      "groupClause"=>nil,
      "havingClause"=>nil,
      "windowClause"=>nil,
      "valuesLists"=>nil,
      "sortClause"=>nil,
      "limitOffset"=>nil,
      "limitCount"=>nil,
      "lockingClause"=>nil,
      "withClause"=>nil,
      "op"=>0,
      "all"=>false,
      "larg"=>nil,
      "rarg"=>nil}}],
 @query="SELECT 1",
 @warnings=[]>

The result is a PostgreSQL parse tree as used by PostgreSQL internally.

Parsing Normalized Queries

Now, to the interesting part. Assume we collect pg_stat_statements queries like this one:

SELECT "users".* FROM "users" WHERE "users"."id" = ?

Note that the actual value has been replaced by the ? character. Unfortunately, the PostgreSQL parser can't parse queries normalized in this manner. It would simply return a syntax error.

At first, we simply replaced all occurrences of ? with $0 (a parameter reference) before parsing, so that the query can be parsed correctly.

There are however a few problems with that kind of "dumb" string replacement - most prominently, we're breaking all operators containing ?, like for example those for JSONB in 9.4.
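
To illustrate the problem, here is a small Ruby sketch of that naive replacement. It does not use pg_query at all, and the docs table and data column in the second query are made up for this example.

```ruby
# Naive fix-up: turn every "?" into the parameter reference "$0".
# (Hypothetical helper that mirrors the first approach described above.)
def naive_replace(query)
  query.gsub("?", "$0")
end

plain = 'SELECT "users".* FROM "users" WHERE "users"."id" = ?'
jsonb = "SELECT * FROM docs WHERE data ? 'tags' AND id = ?"

puts naive_replace(plain)
puts naive_replace(jsonb)
```

The first query comes out fine, but in the second one the JSONB ? operator is rewritten as well, leaving data $0 'tags', which is no longer valid SQL.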

Our improved solution to this: We've patched the PostgreSQL parser to support ? as a parameter reference (identical with $0).

require 'pg_query'

pp PgQuery.parse("SELECT * FROM x WHERE y = ?")
#<PgQuery:0x007f8cdaaaae10
 @parsetree=
  [{"SELECT"=>
     {"distinctClause"=>nil,
      "intoClause"=>nil,
      "targetList"=>
       [{"RESTARGET"=>
          {"name"=>nil,
           "indirection"=>nil,
           "val"=>{"COLUMNREF"=>{"fields"=>[{"A_STAR"=>{}}], "location"=>7}},
           "location"=>7}}],
      "fromClause"=>
       [{"RANGEVAR"=>
          {"schemaname"=>nil,
           "relname"=>"x",
           "inhOpt"=>2,
           "relpersistence"=>"p",
           "alias"=>nil,
           "location"=>14}}],
      "whereClause"=>
       {"AEXPR"=>
         {"name"=>["="],
          "lexpr"=>{"COLUMNREF"=>{"fields"=>["y"], "location"=>22}},
          "rexpr"=>{"PARAMREF"=>{"number"=>0, "location"=>26}},
          "location"=>24}},
      "groupClause"=>nil,
      "havingClause"=>nil,
      "windowClause"=>nil,
      "valuesLists"=>nil,
      "sortClause"=>nil,
      "limitOffset"=>nil,
      "limitCount"=>nil,
      "lockingClause"=>nil,
      "withClause"=>nil,
      "op"=>0,
      "all"=>false,
      "larg"=>nil,
      "rarg"=>nil}}],
 @query="SELECT * FROM x WHERE y = ?",
 @warnings=[]>

Unfortunately, right now, this parser change limits the usage of ? in operators to those in core - specifically JSONB and geometric operators. If you use third-party extensions or custom operators that contain ?, pg_query likely won't be able to parse those queries.

The Result

As a proof of concept, I wrote monitor.rb, a Ruby script that shows the current information stored inside pg_stat_statements in a top-like manner, filtered by a specific table:

monitor.rb -d sampledb -t users

AVG     | QUERY
--------------------------------------------------------------------------------
1.5ms   | SELECT "users".* FROM "users"
0.1ms   | SELECT "users".* FROM "users" WHERE "users"."id" = ? ORDER BY "users"."id" ASC LIMIT ?
0.1ms   | UPDATE "users" SET "fullname" = $1, "updated_at" = $2 WHERE "users"."id" = ?
0.0ms   | SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1

This could be easily extended to highlight queries accessing large tables, potentially missing indices, etc.
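
To give an idea of how a table filter like this can work, here is a minimal sketch that walks a parse tree and collects the table names it references. This is not the actual monitor.rb code, just one way to do it. It works on a trimmed copy of the hash shown earlier, so it runs without pg_query installed, and it assumes the tree layout from this post, which may differ in other versions. The method name table_names is made up for this example.

```ruby
# Collects every "relname" found under a "RANGEVAR" node, at any depth.
# (Hypothetical helper, not part of pg_query.)
def table_names(node, found = [])
  case node
  when Hash
    node.each do |key, value|
      found << value["relname"] if key == "RANGEVAR" && value.is_a?(Hash)
      table_names(value, found)
    end
  when Array
    node.each { |child| table_names(child, found) }
  end
  found.uniq
end

# Trimmed copy of the parse tree printed earlier for
# "SELECT * FROM x WHERE y = ?".
PARSETREE = [{ "SELECT" => {
  "fromClause"  => [{ "RANGEVAR" => { "schemaname" => nil, "relname" => "x", "location" => 14 } }],
  "whereClause" => { "AEXPR" => {
    "name"  => ["="],
    "lexpr" => { "COLUMNREF" => { "fields" => ["y"], "location" => 22 } },
    "rexpr" => { "PARAMREF" => { "number" => 0, "location" => 26 } }
  } }
} }].freeze

p table_names(PARSETREE)
```

A filter can then keep only those pg_stat_statements entries whose table list includes the table you're interested in.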

Going Forward

As you can see, PostgreSQL parse trees are quite useful - and there are many more analysis/grouping options that could be explored.

If you enjoyed reading this, please give pg_query a try. Simply install it using:

gem install pg_query

During installation of the library a full PostgreSQL server is compiled, so it might take 5-10 minutes. Using a gem cache is advised for deployment.

Interested in support for other languages? Drop me a line and I'd love to chat about how we can add support for Python, Perl, you name it.

Furthermore, we'll try to get some of our patches upstream for PostgreSQL 9.5 - this specifically relates to our changes in outfuncs.c, supporting additional query nodes and JSON output. Your help and feedback is appreciated.

And of course, if you build something cool with this, let us know! :)
