[Ruby] Multi-tenant using PostgreSQL’s Row Level Security Policy with Rails

6 minute read

We are making a SaaS called Clipkit. Although it is a multi-schema system multi-tenant system, the number of tenants is approaching 1,000 and it is becoming difficult, so other methods are being verified.

LD;TR

I tried implementing multi-tenant using PostgreSQL’s Row Level Security Policy.

I could implement it without any problems and it seemed to work well. But in the end, I decided not to use it. RDB’s multi-tenant method has advantages and disadvantages, and is difficult.

Personally, I initially thought that I should start with a multi-schema method instead of RLS.

Introduction

In SaaS type Web services, there is a method to make independent applications for each customer live in one system. This is called multi-tenant.

RDB multi-tenant method

The usual idea is to come up with a design that mixes data from multiple tenants into a table. However, in that case, if there is a bug in the program, there is a possibility that it will cause a very big security problem such as seeing the data of other tenants.

Therefore, it is necessary to think of a method for separating the data for each tenant so that crosstalk will never occur.

There are roughly three methods to realize multi-tenant with RDB.

Multi-instance (Silo)

Use an independent DB instance (such as a virtual machine) for each tenant. Independence is high, but cost and maintainability are small.

Single-instance multi-schema (bridge)

Prepare a schema for each tenant in a single DB instance. Management of table definition is complicated because each tenant has an independent table.

Single schema (pool)

Mix data for all tenants in a table in a single schema. Most resource efficient, but if there is a bug in the program, there is a big risk that the data of other tenants will be crossed.

Easy to realize is multi-schema method

Rails has a gem called Apartment. With this, you can also perform simultaneous migration to all tenants.

Disadvantages of multi-schema method

By separating the schema for each tenant, when changing the structure of the table, it is necessary to perform the same migration for all schemas. Even if the migration process takes 2 to 3 seconds, it will become severe as the number of tenants exceeds several thousand. There will also be high administrative costs to ensure that the migration can be completed for all tenants.

Therefore, I feel that the single schema method is ideal if access control can be done with certainty.

Single schema method using Row Level Security Policy

Overview

PostgeeSQL 9.5 and later has a feature called “Row Level Security Policy (RLS)”. This is a function that allows access only to the row of the condition specified in advance according to the user’s role and runtime parameters.

Setting method

Specifically, set as follows.

Example) I want to make the tenant_id column of the users table invisible except for records with a specific value.

Set RLS. (This is a setting that controls according to the runtime parameter)

ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_isolation_policy ON users FOR ALL USING (tenant_id = current_setting('tenant.id')::BIGINT);

After that, if you set the run-time parameter as follows, you can only access records with tenant_id=999.

SET tenant.id = 999;

Implementation with Rails (draft)

Create tenants table (Tenant model) that manages tenants. (* Since it is an example for explanation, the definition of table is omitted.)

Implement so that the tenant can be switched with the Tenant#switch method. Furthermore, it is convenient to be able to get the current tenant with Tenant.current.

class Tenant <ApplicationRecord
  def switch
    ActiveRecord::Base.connection.execute("SET tenant.id = #{id}")
  end
  def self.current
    find(ActiveRecord::Base.connection.execute('SHOW tenant.id').getvalue(0, 0))
  end
end

Make the tenant switch according to the request domain in the before_action of ApplicationController.

class ApplicationController <ActionController::API
  before_action :switch_tenant

  def switch_tenant
    Tenant.find_by(domain: request.host).switch
  end
end

Now you can only touch the data of your tenant.

However, when adding data, you must enter the tenant_id yourself. Since this is troublesome, implement it in the base class (ApplicationRecord) of Model so that it will be automatically included.

class ApplicationRecord <ActiveRecord::Base
  self.abstract_class = true
  after_initialize :set_tenant_id

  def set_tenant_id
    if new_record?
      if has_attribute?(:tenant_id)
        self.tenant_id = Tenant.current.id
      end
    end
  end
end

Now you can transparently access the data without being aware of the tenant.

important point

RLS works only for general users

The RLS restrictions are invalid for the user who created the CREATE TABLE and SUPER USER. Therefore, migration is executed by SUPER USER and the application is started by a general user. And so on.

Grant the necessary privileges to general users as follows.

GRANT SELECT, UPDATE, INSERT, DELETE ON ALL TABLES IN SCHEMA public TO PUBLIC;
GRANT SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA public TO PUBLIC;

Be careful of UNIQUE constraints

UNIQUE constraint must be a composite index with tenant_id. (Since it is not visible from the application, validation does not need to be a compound condition)

Notes on INSERT

SELECT only sees the transparently constrained records, but when you do an INSERT you have to set the tenant_id yourself. (In the above implementation plan, it is automatically entered using the base class ApplicationRecord of Model)

Migration notes

CREATE POLICY, which is required when adding a table, makes me want to do it by migration, but in that case it is not reflected in schema.rb, so db:reset/db:setup cannot be used. (db:migrate:reset is ok)

Demerit

This time, I have not actually operated with this implementation, but I will raise the possible disadvantages.

table bloats

RDB becomes difficult to handle when the number of records is huge. Even if the index is set, it doesn’t fit in the memory and it suddenly becomes heavy.

Therefore, we will consider using the partitioning (table division) function. There is a method called list partition that divides according to the value of the column, so you will use it.

Split the table for each tenant_id. I can think of such a strategy first, but it seems that it is generally not supposed to create more than 100 child tables in partitioning, and there are reports that there is a performance problem (I actually tried it Not there). This approach doesn’t seem very realistic.

The strategy seems to be to manually divide the data as the data increases. It’s troublesome.

Deleting tenants is a hassle

It seems to be difficult because I have to erase the records in all the tables. In the case of the multi-schema method, it was easy because you just deleted the schema.

Data migration from other environments is difficult

Although it is a SaaS type service, it is also offered on-premises. In that case, it seems to be difficult when you need to migrate data from on-premises to SaaS. This is because the id of each table changes. In the case of the multi-schema method, all you have to do is dump and restore.

RLS decided to put off

After all, the table is bloated. I could not dispel the concern.

Migrating with the multi-schema method is a problem only at the time of deployment, and I feel that it is far better than daily performance.

There is almost no problem with the migration if the number of tenants is about several hundreds, so depending on the assumption of the number of tenants, it may be possible to use the multi-schema method at the startup stage. I thought.

Apartment has a function that can change the DB server depending on the tenant, etc., and this is more secure in terms of performance.

Other solutions

Citus https://www.citusdata.com/

An extension of PostgreSQL that makes multi-tenant feel nice.

It’s OSS, so you can install it on EC2, but you can’t use it on RDS…

From 2016, it seems that there was a service called Citus Cloud that provides managed services on AWS.

However, Microsoft acquired Citus in 2019. Citus Cloud is over. It seems that it can be used on Azure instead. Ah”~

AWS is a must, so it’s tough…

Apartment development is stagnant

Okay, I’ll be going to Apartment again! So I tried to use it in a new project (Rails 6), but it didn’t work and that was it.I think it’s a major gem, but it’s still not compatible with Rails 6 at this point (July 2020).

There is an actively maintained Fork version, so it seems to be okay if you use this for the time being.