[JAVA] Mask confidential information in log messages with Logback

Overview

Provided the text to be masked can be matched with a regular expression, you can replace it using **%replace** in the Logback settings.

By combining Java regular expression features with references to captured substrings, you can mask under more demanding conditions than a plain search-and-replace. Below is an example with the actual code.

Example

In the following log, the user ID, token, and resource ID are *all output in the same format* (32 hexadecimal digits). Of these, I want to mask **only the token**.

Log example


2020-11-14T09:30:52.774+09:00 [main] INFO com.example.Main - UserID: 35f44b06a3cf8dab8355eb8ba5844c73, Token: b9656056c799ab9ba19cebe12b49992b, ResourceID: 945c4f63c61f1bc7ba632fe0ce25aa0d

Logback settings


<configuration debug="true">
	<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
		<encoder>
			<pattern>%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} [%thread] %level %logger - %message%n</pattern>
		</encoder>
	</appender>

	<root level="INFO">
		<appender-ref ref="STDOUT" />
	</root>
</configuration>

Method

To pick out only the token with a regular expression, this example relies on the fact that the word "Token" is written just before the value. (A regular expression alone is not enough to make a strict judgment in every situation.)

Rewrite %message as follows.

%replace(%message){'((?i:token).{0,10}?)\b\p{XDigit}{32}\b','$1****'}

The log then looks like this. Only the token is replaced with ****; everything else is unchanged.

2020-11-14T09:43:31.724+09:00 [main] INFO com.example.Main - UserID: 5457645aaa75b97eb9e2c7b0aec79ca6, Token: ****, ResourceID: c194b0155ac7ece290092c1ee2a73948

%replace takes a parenthesized sub-pattern and two arguments. You can think of them as corresponding to the receiver and the two arguments of [String#replaceAll()](https://docs.oracle.com/javase/jp/8/docs/api/java/lang/String.html#replaceAll-java.lang.String-java.lang.String-).
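
To make that correspondence concrete, here is a minimal sketch in plain Java, outside Logback. The class name ReplaceAllDemo and its two helper methods are made up for illustration. The message plays the role of the receiver, and the regular expression and replacement string are the two arguments. A naive pattern without the "Token" prefix is included for comparison, to show why the captured prefix is needed.

```java
class ReplaceAllDemo {
    // Same regular expression and replacement as in the %replace setting.
    static final String REGEX = "((?i:token).{0,10}?)\\b\\p{XDigit}{32}\\b";
    static final String REPLACEMENT = "$1****";

    // Hypothetical helper: roughly what %replace(%message){REGEX, REPLACEMENT} does.
    static String mask(String message) {
        return message.replaceAll(REGEX, REPLACEMENT);
    }

    // Naive variant for comparison: masks every 32-digit hex string.
    static String maskAllHex(String message) {
        return message.replaceAll("\\b\\p{XDigit}{32}\\b", "****");
    }

    public static void main(String[] args) {
        String message = "UserID: 35f44b06a3cf8dab8355eb8ba5844c73, "
                + "Token: b9656056c799ab9ba19cebe12b49992b, "
                + "ResourceID: 945c4f63c61f1bc7ba632fe0ce25aa0d";

        // Only the value after "Token: " is masked; $1 puts the "Token: " prefix back.
        System.out.println(mask(message));

        // All three values are masked, which is not what we want here.
        System.out.println(maskAllHex(message));
    }
}
```

Group 1 captures "Token" plus the short separator, so `$1` restores that prefix and only the 32 hex digits are replaced.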

With some extra effort in the regular expression, you can also do things like "leave the first and last 4 digits of the token visible."
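
One way the "leave 4 digits" idea might look is sketched below in plain Java. The pattern is an assumption of mine, not taken from the original experiment: it splits the 32 digits into a captured first 4, an uncaptured middle 24, and a captured last 4. The class and method names are hypothetical.

```java
class PartialMaskDemo {
    // Group 1: "Token" and separator, group 2: first 4 digits, group 3: last 4 digits.
    static final String REGEX =
            "((?i:token).{0,10}?)\\b(\\p{XDigit}{4})\\p{XDigit}{24}(\\p{XDigit}{4})\\b";
    static final String REPLACEMENT = "$1$2****$3";

    // Hypothetical helper: keeps the first and last 4 digits of the token.
    static String maskKeepingEnds(String message) {
        return message.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        String message = "UserID: 35f44b06a3cf8dab8355eb8ba5844c73, "
                + "Token: b9656056c799ab9ba19cebe12b49992b, "
                + "ResourceID: 945c4f63c61f1bc7ba632fe0ce25aa0d";

        // The token becomes b965****992b; the other two IDs are untouched.
        System.out.println(maskKeepingEnds(message));
    }
}
```

Since %replace behaves like String#replaceAll(), the same regular expression and replacement should carry over to the Logback pattern with single backslashes, but that combination was only checked here as plain Java, not inside Logback.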

(Supplement) Details of regular expressions
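
The pattern used for masking can be read piece by piece. The sketch below writes the same regular expression with Pattern.COMMENTS, which ignores whitespace and treats # as the start of a comment, so each part can be annotated. The class name TokenRegexExplained is made up for illustration.

```java
import java.util.regex.Pattern;

class TokenRegexExplained {
    // Same expression as ((?i:token).{0,10}?)\b\p{XDigit}{32}\b, one part per line.
    static final Pattern TOKEN = Pattern.compile(
            "(                 # group 1: the prefix, restored by $1 in the replacement\n"
          + "  (?i:token)      #   the word 'token', matched case-insensitively\n"
          + "  .{0,10}?        #   0 to 10 arbitrary characters, as few as possible\n"
          + ")\n"
          + "\\b               # word boundary just before the value\n"
          + "\\p{XDigit}{32}   # exactly 32 hexadecimal digits\n"
          + "\\b               # word boundary just after the value\n",
            Pattern.COMMENTS);

    // Replaces the 32 hex digits after "token" with ****, keeping the prefix.
    static String mask(String message) {
        return TOKEN.matcher(message).replaceAll("$1****");
    }

    public static void main(String[] args) {
        System.out.println(mask("Token: b9656056c799ab9ba19cebe12b49992b"));
        System.out.println(mask("TOKEN=b9656056c799ab9ba19cebe12b49992b"));
    }
}
```

The lazy `.{0,10}?` allows a short separator such as ": " or "=" between the word and the value, and the two `\b` anchors keep the pattern from matching 32 digits taken out of a longer hex string.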

When converting to JSON format with Logstash

(Logstash appears to have its own setting for masking, but I have not looked into it yet.)

Because this part of the Logback settings is written as JSON, you need to **escape the backslashes**. (Otherwise, \b is interpreted as a backspace and \p is rejected as an illegal escape.)
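
The escaping rule can be applied mechanically. The sketch below is a hypothetical helper, not part of Logback or logstash-logback-encoder, that turns the plain pattern into the form needed inside a JSON string value by doubling backslashes and escaping double quotes.

```java
class JsonPatternEscapeDemo {
    // Hypothetical helper: escapes text so it can be placed inside a JSON string value.
    static String toJsonStringContent(String raw) {
        return raw.replace("\\", "\\\\")   // backslashes first: \ becomes \\
                  .replace("\"", "\\\"");  // then double quotes: " becomes \"
    }

    public static void main(String[] args) {
        // The pattern as written in the plain (non-JSON) Logback setting.
        String plain = "%replace(%message){'((?i:token).{0,10}?)\\b\\p{XDigit}{32}\\b','$1****'}";

        // Prints the pattern with \\b and \\p, ready to paste into the JSON setting.
        System.out.println(toJsonStringContent(plain));
    }
}
```

Backslashes have to be handled before quotes; otherwise the backslash added in front of each quote would itself be doubled.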

Settings (partial)


{
	"timestamp": "%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}",
	"thread": "%thread",
	"level": "%level",
	"logger": "%logger",
	"message": "%replace(%message){'((?i:token).{0,10}?)\\b\\p{XDigit}{32}\\b','$1****'}"
}

Log


{"timestamp":"2020-11-14T11:31:38.259+09:00","thread":"main","level":"INFO","logger":"com.example.Main","message":"UserID: c610e22e634ed2ff9f1bb27afc81e638, Token: ****, ResourceID: de343ea6405a8c559043c3e3e84f9bcd"}

(Appendix) Experimental code

The code used for this experiment is as follows.

pom.xml


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>org.example</groupId>
	<artifactId>logback-sample</artifactId>
	<version>1.0-SNAPSHOT</version>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<source>8</source>
					<target>8</target>
				</configuration>
			</plugin>
		</plugins>
	</build>

	<dependencies>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-classic</artifactId>
			<version>1.2.3</version>
		</dependency>
		<dependency>
			<groupId>net.logstash.logback</groupId>
			<artifactId>logstash-logback-encoder</artifactId>
			<version>6.4</version>
		</dependency>
	</dependencies>
</project>

src/main/resources/logback.xml


<configuration debug="true">
	<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
		<encoder>
			<pattern>%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} [%thread] %level %logger - %replace(%message){'((?i:token).{0,10}?)\b\p{XDigit}{32}\b','$1****'}%n</pattern>
		</encoder>
	</appender>

	<appender name="STDOUT_JSON" class="ch.qos.logback.core.ConsoleAppender">
		<!-- https://github.com/logstash/logstash-logback-encoder -->
		<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
			<providers>
				<pattern>
					<pattern>
						{
						"timestamp": "%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}",
						"thread": "%thread",
						"level": "%level",
						"logger": "%logger",
						"message": "%replace(%message){'((?i:token).{0,10}?)\\b\\p{XDigit}{32}\\b','$1****'}"
						}
					</pattern>
				</pattern>
			</providers>
		</encoder>
	</appender>

	<root level="INFO">
		<appender-ref ref="STDOUT" />
		<appender-ref ref="STDOUT_JSON" />
	</root>
</configuration>

src/main/java/com/example/Main.java


package com.example;

public class Main {
	private static final org.slf4j.Logger log =
			org.slf4j.LoggerFactory.getLogger(Main.class);

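	public static void main(String[] args) {
		// Log three random values in the same 32-digit hex format; only the token should be masked.
		log.info("UserID: {}, Token: {}, ResourceID: {}", hex(), hex(), hex());
	}

	// Generates 16 random byte values and formats them as 32 lowercase hexadecimal digits.
	private static String hex() {
		return new java.util.Random().ints(16, 0, 256)
				.mapToObj(x -> String.format("%02x", x))
				.reduce("", (a, b) -> a + b);
	}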
}
