PostgreSQL: A Set of Practices

I started working with PostgreSQL three years ago. During this time, I have managed to collect every bug you could imagine. To tell the truth, if I had known three years ago what sad experience awaited me, my life would have been much easier. That’s exactly why I have decided to write a completely subjective article with the set of rules I stick to when using PostgreSQL. Perhaps this article will help you avoid the bugs I came across (and meet some other ones, ha ha!).

Safety Measures

The List of Rules

Almost every point below has a sad story behind it, a story full of suffering and effort. The word “Pain!” marks the scariest situations, the ones that still make me shudder at night.

1. Version the Database Schema

A database schema is code you have written. It should live in the version control system and be versioned together with the rest of the project. For PostgreSQL, I like Pyrseas most of all for this purpose. It converts a schema, with all its PostgreSQL-specific objects, into a YAML file that can be versioned. Such a file, unlike pure SQL, makes it convenient to work in branches and merge changes. As the final step, the YAML file is compared with the database schema, and an SQL migration is generated automatically.
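
The round trip looks roughly like this (a sketch of Pyrseas’s dbtoyaml and yamltodb command-line tools; mydb and the file names are placeholders):

    # dump the current schema to versionable YAML
    dbtoyaml mydb > schema.yaml

    # after editing or merging schema.yaml, generate the migration SQL
    yamltodb mydb schema.yaml > migration.sql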

2. Pain! Never Apply Changes Directly to the Production Database

Even if the change is simple, extremely urgent, and you really, really want to make it. First apply it to the developers’ database, commit the change, and apply it to the trunk database (which is identical to the production database). Only when everything is fine in trunk should you apply the change to the production database. The process is long, but it will save you from lots of problems.

3. Pain! Write where Before Writing delete or update

Before you run the code, take a deep breath, count to three, and make sure you’re in a session of the right database. It’s especially important with truncate, which has no where clause to save you. Say a prayer, and then run the code. Amen!

UPD. Some people say it’s even more useful to turn autocommit off for the current session, or else to write begin before every update and delete.
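
A minimal sketch of the defensive ritual, assuming a hypothetical accounts table:

    begin;                                  -- nothing is permanent until commit
    update accounts
       set balance = balance - 100
     where id = 42;                         -- the where clause was written first!

    select * from accounts where id = 42;   -- inspect the result

    commit;                                 -- or rollback; if something looks wrong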

4. Test Driven Development

You should always write tests first and only then create database objects. I am talking about any objects: schemas, tables, functions, types, extensions, with no exceptions! It may seem too hard at first, but you will be really grateful afterwards. It’s easy to miss something even during the initial creation of the schema, and when you refactor a table half a year later, only the tests you wrote will save you from shooting yourself in the foot. For PostgreSQL, there’s a great extension, pgTAP. For each schema, I recommend creating an additional schema, schema_name_tap, where you write the testing functions. Then simply run the tests through pg_prove.
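
A minimal pgTAP sketch, assuming a hypothetical billing schema with a customers table:

    -- requires: create extension pgtap;
    -- run with: pg_prove -d mydb tests/billing.sql
    begin;
    select plan(2);

    select has_table('billing', 'customers', 'table billing.customers exists');
    select has_column('billing', 'customers', 'id', 'customers has an id column');

    select * from finish();
    rollback;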

5. Don’t Forget to Set Up PITR

I’m afraid of becoming Captain Obvious here, but every database should have backups, and you must be able to restore the database to any point in time. In the PostgreSQL world, we have barman for this purpose.
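
Day to day it looks roughly like this (a sketch; main is a placeholder server name from barman’s configuration):

    barman check main          # verify the server is set up correctly
    barman backup main         # take a base backup
    barman list-backup main    # see what can be restored
    barman recover main latest /var/lib/postgresql/restore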

6. Data Consistency

Inconsistent data in a database has never led to anything good. Even a small amount of it can turn the entire database into garbage. Therefore, never neglect normalization and constraints such as foreign keys and checks. Use a denormalized form (e.g. jsonb) only after making sure the schema can’t be implemented in a normalized form with an acceptable level of complexity and performance, because a denormalized form always carries the potential for inconsistent data. No matter what arguments the fans of denormalization offer, reply that normalization was invented for a reason, then fall silent with a meaningful expression on your face.
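
For illustration, a sketch of the constraints in question (tables hypothetical):

    create table customers (
        id   bigserial primary key,
        name text not null check (length(name) > 0)
    );

    create table orders (
        id          bigserial primary key,
        customer_id bigint  not null references customers (id),
        amount      numeric not null check (amount > 0)
    );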

7. Create Foreign Keys deferrable initially deferred

This defers the constraint check to the end of the transaction, which lets you get away with inconsistency during its execution (in the end, everything will either be consistent or throw an error). Moreover, by switching the constraint to immediate inside the transaction (set constraints ... immediate), you can force the check to run at a moment of your choosing.
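
A minimal sketch of the mechanics (parent and child are hypothetical):

    create table parent (id bigint primary key);
    create table child (
        parent_id bigint not null
            references parent (id) deferrable initially deferred
    );

    begin;
    insert into child (parent_id) values (1);  -- no parent yet: allowed, check is deferred
    insert into parent (id) values (1);        -- consistency restored before the end
    set constraints all immediate;             -- optional: run the check right now
    commit;                                    -- otherwise it would run here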

UPD. Some commenters note that deferrable is a controversial practice: it simplifies a number of import tasks but complicates debugging inside the transaction and is a bad habit for novice developers. Even so, I think it’s better to have deferrable keys than not to have them.

8. Do Not Use public schema

It’s a special schema for functions from extensions. Create separate schemas for your purposes. Treat them like modules and create a new schema for each logically separate set of entities.
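
Something like this (billing is a placeholder module name):

    create schema billing;
    set search_path to billing, public;  -- extension functions in public stay visible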

9. A Separate Schema for API

For functions called from the application side, you can create a separate schema, api_v_version_number. This lets you control exactly where the functions that serve as the interface to your database live. For naming functions in this schema, you can use the template entity_get/post/patch/delete_arguments.
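
A sketch of what that might look like (api_v1 and the billing.customers table are hypothetical):

    create schema api_v1;

    create function api_v1.customer_get_by_id(p_id bigint)
    returns setof billing.customers
    language sql stable
    as $$
        select * from billing.customers where id = p_id;
    $$;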

10. Triggers for the Audit

Triggers are best suited for auditing activity. I also recommend creating a multipurpose trigger function that can log the actions of any table. To do this, extract the data about the target table’s structure from information_schema and decide whether the new or the old row should be recorded, depending on the operation performed. This solution makes the code more maintainable.
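
A minimal sketch of such a trigger; for brevity it leans on to_jsonb instead of walking information_schema, but the idea is the same (the audit schema and the target table are hypothetical):

    create schema audit;
    create table audit.log (
        at         timestamptz not null default now(),
        table_name text        not null,
        action     text        not null,
        row_data   jsonb       not null
    );

    create function audit.log_change() returns trigger
    language plpgsql as $$
    begin
        insert into audit.log (table_name, action, row_data)
        values (tg_table_name, tg_op,
                to_jsonb(case when tg_op = 'DELETE' then old else new end));
        return null;  -- fired after the fact, the return value is ignored
    end;
    $$;

    create trigger trg_audit
        after insert or update or delete on billing.customers
        for each row execute procedure audit.log_change();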

If you plan to use triggers to maintain running totals (an accumulation register), be careful with the logic: one mistake can lead to inconsistent data. They say it’s a very dangerous kung fu.

11. Pain! Importing Data to a New Schema

It’s the worst, yet a regular, event in the life of a database developer. Foreign data wrappers (FDW) really help in PostgreSQL, and they were leveled up in 9.6 (if its developer wishes, an FDW can now build part of the plan on the remote side). By the way, there’s the handy import foreign schema command, which saves us from declaring wrappers for loads of tables by hand. It is also good practice to keep a set of functions storing the SQL commands for dropping and restoring the foreign and primary keys that exist in the database. Before performing the import, I recommend writing a set of views whose output is identical in structure to the target tables, and then inserting with copy (not insert!). It is better to store the entire sequence of SQL commands in a separate, versioned file and run it via psql with the -1 flag (in a single transaction). By the way, an import is the only case when we may disable fsync in PostgreSQL, after making a backup, fingers crossed.
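
A sketch of the FDW side of this workflow with postgres_fdw (host, database, and schema names are placeholders):

    create extension if not exists postgres_fdw;

    create server legacy_db
        foreign data wrapper postgres_fdw
        options (host 'legacy.example.com', dbname 'old_crm');

    create user mapping for current_user
        server legacy_db
        options (user 'migrator', password 'secret');

    create schema legacy;
    import foreign schema public
        from server legacy_db into legacy;

Then run the whole migration file in one transaction with psql -1 -f import.sql.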

12. Do Not Write in SQL:1999

I mean, really, a lot has happened since then: an entire generation has finished school, and cell phones have become supercomputers. Anyway, you shouldn’t write the way our fathers did. Use with; it makes the code clearer, so we can read it from top to bottom instead of wandering among blocks of joins. If you join on columns with the same name, it makes more sense to use using rather than on. It is also worth noting that you should never use offset in production: it still reads and throws away all the skipped rows. And we too often forget about such an amazing thing as join lateral. Use it.
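
All three in one sketch (tables hypothetical):

    with recent as (
        select customer_id, max(created_at) as last_order_at
        from orders
        group by customer_id
    )
    select c.name, r.last_order_at, last3.id as recent_order_id
    from customers c
    join recent r using (customer_id)   -- using, not on
    cross join lateral (                -- "for each customer, run this subquery"
        select id
        from orders o
        where o.customer_id = c.customer_id
        order by o.created_at desc
        limit 3
    ) as last3;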

UPD. When using with, do not forget that (before PostgreSQL 12) the result of its execution is materialized as a CTE, which eats some memory and cannot use the indexes of the underlying tables when you query it. So abundant and inappropriate use of with can hurt query performance, which is why you should always analyze the query with explain. A CTE is especially useful when you need a result set that is used in different ways in several parts of the query below it. Keep in mind that with radically improves the readability of a query and gets more efficient with every new version of PostgreSQL, so, all other things being equal, prefer this construction.

13. Temporary Tables

If you can write a query without temporary tables, don’t hesitate to do so! As a rule, a CTE created with a with clause is an acceptable alternative. The thing is that PostgreSQL creates a temporary file on disk for each new temporary table… Yet another sad thing.

14. Pain! The Worst Anti-pattern in SQL

Never use structures like

    select myfunc() from mytable;

The execution time of such a query grows linearly with the number of rows. It can always be rewritten into something without a function applied to every row, winning back the execution speed.
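
For example, if the function hides a per-row lookup, the same result usually falls out of a join (all names hypothetical):

    -- slow: get_customer_name() runs a hidden query once per row
    select o.id, get_customer_name(o.customer_id)
    from orders o;

    -- fast: the same lookup expressed as a join
    select o.id, c.name
    from orders o
    join customers c on c.id = o.customer_id;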

15. The Main Secret of Queries

If your query runs slowly on the test machine, it won’t run any faster in production. The best analogy here is roads and cars. The test machine is a one-lane road; the production server is a ten-lane highway. At rush hour, ten lanes will let far more cars through without traffic jams than a single lane. But if your car is an old rust bucket, it will never be as fast as a Ferrari, no matter how many free lanes you give it.

16. Use The Index, Luke!

Whether a query is processed in a fraction of a second or in a minute depends on how properly you create and use indexes. I recommend getting acquainted with Markus Winand’s website, use-the-index-luke.com, for its explanation of the structure of B-tree indexes. It’s the best publicly available explanation of balanced trees I have ever seen. His book is also pretty cool.
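
And the habit worth building, sketched on a hypothetical orders table:

    create index orders_customer_created_idx
        on orders (customer_id, created_at desc);

    explain (analyze, buffers)
    select *
    from orders
    where customer_id = 42
    order by created_at desc
    limit 10;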

17. group by or window function?

It might seem that window functions can do more, but sometimes an aggregation can be computed either way. In such cases I follow this rule: if the aggregation can be computed through covering indexes alone, use group by; if there are no covering indexes, we can try a window function.
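
Both shapes side by side, on a hypothetical orders table:

    -- group by: one row per customer
    select customer_id, sum(amount) as total
    from orders
    group by customer_id;

    -- window function: every row keeps its identity, the total rides along
    select id, customer_id, amount,
           sum(amount) over (partition by customer_id) as total
    from orders;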

18. set_config

set_config can be used not only to change postgresql.conf settings for the duration of a transaction but also to pass a user-defined variable (if it is pre-defined in postgresql.conf) into the transaction. By using such variables in a transaction, we can affect the behavior of the functions it calls.
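
A sketch (myapp.current_user is a made-up variable name; the third argument true makes the setting local to the transaction):

    begin;
    select set_config('myapp.current_user', 'alice', true);
    select current_setting('myapp.current_user');  -- 'alice', visible to called functions
    commit;                                        -- the setting evaporates here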

19. FTS and Trigrams

They’re amazing! They give us full-text and fuzzy search without losing any of the power of SQL. Don’t forget to use them.
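
Two reminders in miniature (tables hypothetical):

    create extension if not exists pg_trgm;

    -- fuzzy search: % is pg_trgm's similarity operator
    select name from customers where name % 'jonh';

    -- full-text search
    select title from articles
    where to_tsvector('english', body) @@ to_tsquery('english', 'index & btree');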

20. Raising Our Own Exceptions

In a big project, we often have to raise lots of exceptions, each with its own code and message. To avoid getting confused among them, we can create a separate registry of code/message pairs for exceptions, along with functions for raising them (a wrapper over raise) and for adding and removing them. And if you have covered all database objects with tests, you cannot accidentally remove an exception code that is still used somewhere else.
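
A sketch of such a registry and wrapper (all names hypothetical):

    create table app_errors (
        code    text primary key,  -- a five-character, SQLSTATE-style code
        message text not null
    );

    create function raise_app_error(p_code text) returns void
    language plpgsql as $$
    declare
        v_message text;
    begin
        select message into strict v_message
        from app_errors where code = p_code;
        raise exception using errcode = p_code, message = v_message;
    end;
    $$;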

21. There’s Always Something to Take Care of

Don’t forget to set up ACLs for your tables and to run functions with the SECURITY DEFINER attribute. When a function only reads data, you’d better mark it stable.
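
In SQL terms (the roles and objects are hypothetical):

    revoke all on billing.customers from public;
    grant select on billing.customers to reporting_role;

    create function api_v1.customer_count() returns bigint
    language sql stable security definer
    as $$ select count(*) from billing.customers $$;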

22. Pain! Icing on the Cake

UPD. You should never route application users straight through the server to the database with a one-to-one mapping of application users to database users. Even if it seems that you can configure security for users and user groups by regular PostgreSQL means, don’t you ever do so, it’s a trap! With this scheme, you cannot use connection pools, and every connected application user eats up a resource-intensive database connection. A database handles hundreds of connections, while application servers handle thousands, which is exactly why we use load balancers and connection pools. With a one-to-one user mapping, growing load will force you to break the whole scheme and rewrite everything.
