Recently, my team and I ran into a subtle problem in our application's request handling. After some investigation, I found the root cause.
The application's request-processing thread pool was full and could not accept additional requests. A thread dump showed the thread stacks blocked in the log-writing code. Threads blocking on log writes was not a new symptom, but in the past it had always turned out to be a side effect of something else, so based on that experience we assumed the problem was not in the log-writing path.
After a lot of frustrating troubleshooting without finding the root cause, we turned to the source code for clues. I found that the logging code takes a lock, and behind that lock the thread stacks showed threads blocked in ArrayBlockingQueue.put. Further investigation revealed that the queue was an ArrayBlockingQueue with a capacity of 1024. This means that once 1024 objects are in the queue, every subsequent put blocks.
The programmer who wrote this code apparently intended to handle the case where the BlockingQueue is full. The code in question looked roughly like this (the capacity method is remainingCapacity; the original snippet misnamed it):

    if (blockingQueue.remainingCapacity() < 1) {
        // todo
    }
    blockingQueue.put(item);
There are two main problems here.
First, after the capacity check, control falls through to put unconditionally; there is no else branch, so the check changes nothing.
Second, the handling for a full queue was never written; it is still just // todo.
The code above suggests this programmer was not familiar with the BlockingQueue interface. There is no need to check the remaining capacity first to get this behavior. A better way is to use blockingQueue.offer: if it returns false, you implement the relevant exception handling.
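A minimal sketch of the offer-based approach described above (the queue capacity here is shrunk to 2 for illustration; the production queue held 1024):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class OfferExample {
    public static void main(String[] args) {
        // Bounded queue, capacity 2 for illustration.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);

        System.out.println(queue.offer("a")); // true
        System.out.println(queue.offer("b")); // true

        // Queue is now full: offer returns false immediately instead of
        // blocking the caller, so we can fail fast with our own handling.
        if (!queue.offer("c")) {
            System.out.println("queue full, rejecting item");
        }
    }
}
```

Unlike put, offer never blocks, so a full queue can no longer stall the request-processing threads.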
BlockingQueue is a commonly used data structure in the producer/consumer pattern. The most commonly used implementations are ArrayBlockingQueue, LinkedBlockingQueue, and SynchronousQueue.
The main difference between ArrayBlockingQueue and LinkedBlockingQueue is how elements are stored: one is backed by an array, the other by a linked list. Other differences are noted in the Javadoc:
Linked queues typically have higher throughput than array-based queues, but less predictable performance in most concurrent applications.
SynchronousQueue is a special BlockingQueue with no internal capacity. An offer fails unless another thread is currently blocked in take or poll, and likewise a poll fails unless another thread is simultaneously offering: every transfer is a direct hand-off between threads. This mode suits latency-sensitive work handed to non-fixed (cached) thread pools, where a task that cannot be handed off immediately triggers a new thread instead of sitting in a queue.
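A small sketch of the hand-off behavior just described: an offer with no waiting consumer fails, while an offer with a consumer blocked in take succeeds.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class SyncQueueExample {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();

        // No consumer is waiting in take()/poll(), so a non-blocking offer
        // fails immediately -- the queue itself holds nothing.
        System.out.println(queue.offer(1)); // false

        // Once a consumer thread is blocked in take(), the hand-off succeeds.
        Thread consumer = new Thread(() -> {
            try {
                queue.take();
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        // A timed offer gives the consumer time to reach take().
        System.out.println(queue.offer(1, 1, TimeUnit.SECONDS)); // true
        consumer.join();
    }
}
```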
Online business scenarios require a timeout mechanism at every point where a request can block on concurrency or an external call. I have lost count of how many times a missing timeout turned out to be the root cause of a serious online outage. Online business is about processing and completing requests quickly, so failing fast is the most important principle in the design and coding of online systems. By that principle, the most obvious mistake in the code above is using put instead of the timed variant of offer. Alternatively, in non-critical paths, the untimed offer can be used directly, with a false result throwing an exception or at least being logged.
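The timed offer mentioned above bounds how long a producer can be stalled; here is a sketch with an intentionally full queue and a short, illustrative 50 ms budget:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedOfferExample {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("pending"); // fill the queue

        // Timed offer: wait at most 50 ms, then give up and report failure
        // instead of hanging the request thread indefinitely like put would.
        boolean accepted = queue.offer("next", 50, TimeUnit.MILLISECONDS);
        if (!accepted) {
            System.out.println("queue still full after 50 ms, failing fast");
        }
    }
}
```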
For BlockingQueue scenarios, you must bound the queue length in addition to using timeouts. Otherwise, with LinkedBlockingQueue's no-argument constructor, the capacity defaults to Integer.MAX_VALUE, and a producer-side bug can grow the queue until the process runs out of memory.
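The default-capacity pitfall can be seen directly: the no-argument constructor reports a remaining capacity of Integer.MAX_VALUE, while an explicit bound keeps the queue finite.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class CapacityExample {
    public static void main(String[] args) {
        // No-arg constructor: capacity defaults to Integer.MAX_VALUE --
        // effectively unbounded, so a runaway producer can exhaust the heap.
        LinkedBlockingQueue<byte[]> unbounded = new LinkedBlockingQueue<>();
        System.out.println(unbounded.remainingCapacity()); // 2147483647

        // Always pass an explicit capacity in production code.
        LinkedBlockingQueue<byte[]> bounded = new LinkedBlockingQueue<>(1024);
        System.out.println(bounded.remainingCapacity()); // 1024
    }
}
```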
Since we are talking about BlockingQueue, it is worth mentioning its most common home: the thread pool.
ThreadPoolExecutor takes a BlockingQueue parameter. If an ArrayBlockingQueue or LinkedBlockingQueue is used and corePoolSize is smaller than maximumPoolSize, then once all core threads are busy, new tasks are first offered to the queue; if the offer succeeds, the task simply waits there, and only if it fails does the pool create threads beyond the core size. This behavior does not always fit the needs of online business.
Online businesses operate in a fast-paced environment and need requests processed quickly, not parked in queues; ideally, requests are not queued at all. A system of this kind degrades badly once overloaded, and a relatively simple but effective defense is to directly reject any request that exceeds the system's capacity and return an error message. It is a blunt mechanism, but it protects the system.
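One way to get this reject-on-overload behavior is a ThreadPoolExecutor with a small bounded queue and the built-in AbortPolicy rejection handler; the pool sizes below are hypothetical, chosen only to make the rejection easy to trigger.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FailFastPool {
    public static void main(String[] args) {
        // Bounded queue + AbortPolicy: once core threads, queue slots, and
        // extra threads are all used up, execute() throws
        // RejectedExecutionException instead of queueing work silently.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,                        // core and max pool size (illustrative)
                60, TimeUnit.SECONDS,        // idle-thread keep-alive
                new ArrayBlockingQueue<>(8), // bounded work queue
                new ThreadPoolExecutor.AbortPolicy());

        try {
            for (int i = 0; i < 100; i++) {
                pool.execute(() -> {
                    try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
                });
            }
        } catch (RejectedExecutionException e) {
            System.out.println("request rejected: system at capacity");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

CallerRunsPolicy or a custom RejectedExecutionHandler can be substituted when an error message to the caller is not the desired response.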
Keep in mind that when writing highly concurrent, distributed code, it is important to pay attention not only to the system design but also to details like these in the code. That is how you avoid problems such as the one above.