Basics of Quantum Information Theory: Trace Distance

\def\bra#1{\mathinner{\left\langle{#1}\right|}} \def\ket#1{\mathinner{\left|{#1}\right\rangle}} \def\braket#1#2{\mathinner{\left\langle{#1}\middle|#2\right\rangle}}

Introduction

The previous article (https://qiita.com/SamN/items/634b11b6faf8d713cff1) took up fidelity as a measure of the difference between quantum states, so this time I will study another such measure, the trace distance. After explaining its definition and properties, I would like to compute it in practice and confirm one of its important properties using the quantum computing simulator qlazy.

The following documents were used as references.

  1. Nielsen, Chuang "Quantum Computer and Quantum Communication (3)" Ohmsha (2005)
  2. Ishizaka, Ogawa, Kawachi, Kimura, Hayashi "Introduction to Quantum Information Science" Kyoritsu Shuppan (2012)
  3. Tomita "Quantum Information Engineering" Morikita Publishing (2017)

What is the trace distance?

Definition

Given the states $\rho$ and $\sigma$, the trace distance is defined as follows:

D(\rho,\sigma) = \frac{1}{2} ||\rho-\sigma|| = \frac{1}{2} Tr|\rho-\sigma|  \tag{1}

$\rho-\sigma$ is Hermitian (though not necessarily positive), so, writing its eigenvalues as $\lambda_{i}$ and its eigenvectors as $\ket{i}$,

D(\rho,\sigma) = \frac{1}{2} Tr \sum_{i} |\lambda_{i}| \ket{i} \bra{i} = \frac{1}{2} \sum_{i} |\lambda_{i}|  \tag{2}

That is, it can be calculated as half the sum of the absolute values of the eigenvalues of $\rho-\sigma$ [^1].

[^1]: If $\rho$ and $\sigma$ commute, they can be diagonalized simultaneously, and their diagonal components are probabilities, so the trace distance equals the distance between probability distributions in the classical sense.
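As a rough illustration of equation (2), the following sketch computes the trace distance from the eigenvalues of $\rho-\sigma$ with NumPy. The function name `trace_distance` is my own choice for this example and is not part of qlazy; the two test cases (orthogonal pure states and a pair of commuting diagonal states) are also just assumptions picked for illustration.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the sum of |eigenvalues| of the Hermitian matrix rho - sigma (Eq. (2))."""
    eigvals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.sum(np.abs(eigvals))

# Orthogonal pure states |0><0| and |1><1|
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
sigma = np.array([[0.0, 0.0], [0.0, 1.0]])
print("orthogonal states:", trace_distance(rho, sigma))

# Commuting (diagonal) states: compare with the classical distance
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])
d_quantum = trace_distance(np.diag(p), np.diag(q))
d_classical = 0.5 * np.sum(np.abs(p - q))
print("diagonal states  :", d_quantum, d_classical)
```

For the diagonal pair the two printed numbers should agree, which is the statement of footnote 1 in numerical form.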

It can also be expressed using a projection operator or, more generally, a positive operator $P$:

D(\rho,\sigma) = \max_{P} Tr(P(\rho-\sigma))  \tag{3}

From this equation, the trace distance can be interpreted physically as the maximum difference in the probability of a measurement outcome between the states $\rho$ and $\sigma$, taken over all possible measurements (projective or POVM). Since equation (3) is used repeatedly when discussing the properties in the next section, I would like to prove it. Before that, it is necessary to prove that the following holds for any Hermitian operator $A$, so I will deal with this first.

Tr(A_{+}) = \max_{0 \leq P \leq I} Tr(PA)  \tag{4}

Here $P$ is a positive operator satisfying $P \leq I$. Writing the spectral decomposition of $A$ with eigenvalues $a_{i}$ and eigenvectors $\ket{i}$, we define

\begin{align}
A &= \sum_{i} a_{i} \ket{i} \bra{i} \\
A_{+} &= \sum_{i (a_i > 0)} a_{i} \ket{i} \bra{i} \\
A_{-} &= \sum_{i (a_i \leq 0)} |a_{i}| \ket{i} \bra{i} \tag{5}
\end{align}

$A_{+}$ and $A_{-}$ are called the positive part and the negative part of the Hermitian operator $A$.
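The decomposition in equation (5) can be sketched numerically as follows. This is only a minimal example under my own assumptions: the helper name `positive_negative_parts` and the 2x2 test matrix are hypothetical, chosen so that the eigenvalues are easy to check by hand.

```python
import numpy as np

def positive_negative_parts(a):
    """Split a Hermitian matrix A into (A_+, A_-) following Eq. (5)."""
    w, v = np.linalg.eigh(a)                       # A = sum_i w_i |i><i|
    a_plus = (v * np.where(w > 0, w, 0.0)) @ v.conj().T
    a_minus = (v * np.where(w <= 0, -w, 0.0)) @ v.conj().T
    return a_plus, a_minus

# Example matrix whose eigenvalues are +sqrt(5) and -sqrt(5)
a = np.array([[1.0, 2.0], [2.0, -1.0]])
a_plus, a_minus = positive_negative_parts(a)

print("A = A+ - A-    :", np.allclose(a, a_plus - a_minus))
print("A+ A- = 0      :", np.allclose(a_plus @ a_minus, 0.0))
print("eigenvalues A+ :", np.linalg.eigvalsh(a_plus))
print("eigenvalues A- :", np.linalg.eigvalsh(a_minus))
```

Both parts come out positive semidefinite and have orthogonal supports, which is what the proof relies on.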

[Proof]

\begin{align}
Tr(PA_{-}) &= Tr(P \sum_{i(a_i \leq 0)} |a_i| \ket{i} \bra{i}) \\
&= \sum_{i(a_i \leq 0),j} \bra{j} |a_i| P \ket{i} \braket{i}{j} \\
&= \sum_{i(a_i \leq 0)} |a_i| \bra{i} P \ket{i} \geq 0  \tag{6}
\end{align}

Since $A = A_{+}-A_{-}$,

Tr(PA) = Tr(PA_{+}) - Tr(PA_{-}) \leq Tr(PA_{+}) \leq Tr(A_{+})  \tag{7}

holds. The last inequality uses $Tr((I-P) A_{+}) \geq 0$, which follows from $I-P \geq 0$. From this,

Tr(A_{+}) = \max_{0 \leq P \leq I} Tr(PA)  \tag{8}

holds. Here, the maximum is attained when $P$ satisfies $PA = A_{+}$, for example when $P$ is the projector onto the eigenvectors of $A$ with positive eigenvalues. (End of proof)
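Equation (8) can be spot-checked numerically. The sketch below is an assumption-laden illustration rather than a proof: it draws one random Hermitian matrix, builds the projector onto its positive eigenvectors, and compares $Tr(PA)$ for that projector with $Tr(PA)$ for many randomly generated $P$ satisfying $0 \leq P \leq I$. All helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2

def positive_part_trace(a):
    """Tr(A_+): the sum of the positive eigenvalues of a Hermitian matrix."""
    w = np.linalg.eigvalsh(a)
    return np.sum(w[w > 0])

def optimal_projector(a):
    """Projector onto the eigenvectors of A with positive eigenvalue."""
    w, v = np.linalg.eigh(a)
    vp = v[:, w > 0]
    return vp @ vp.conj().T

def random_effect(dim):
    """Random operator P with 0 <= P <= I (eigenvalues drawn from [0, 1])."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)                  # q is unitary
    return q @ np.diag(rng.uniform(0.0, 1.0, size=dim)) @ q.conj().T

a = random_hermitian(4)
target = positive_part_trace(a)
best = np.trace(optimal_projector(a) @ a).real
others = [np.trace(random_effect(4) @ a).real for _ in range(1000)]

print("Tr(A+)                 :", target)
print("Tr(PA), optimal P      :", best)
print("largest Tr(PA), random :", max(others))
```

The projector reproduces $Tr(A_{+})$, while random choices of $P$ stay below it.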

Next, we prove equation (3).

[Proof]

For a Hermitian operator $A$,

\begin{align}
A &= A_{+} - A_{-} \\
|A| &= A_{+} + A_{-} \tag{9}
\end{align}

Taking the trace of both equations and solving for $Tr(A_{+})$ and $Tr(A_{-})$ gives

\begin{align}
Tr(A_{+}) &= \frac{1}{2} (Tr|A| + Tr(A)) \\
Tr(A_{-}) &= \frac{1}{2} (Tr|A| - Tr(A)) \tag{10}
\end{align}

Using equation (10) and the fact that the trace of $\rho-\sigma$ is 0,

\begin{align}
D(\rho,\sigma) &= \frac{1}{2} Tr|\rho-\sigma| \\
&= Tr(\rho-\sigma)_{+} - \frac{1}{2} Tr(\rho-\sigma) \\
&= Tr(\rho-\sigma)_{+}  \tag{11}
\end{align}

Applying equation (8) with $A = \rho-\sigma$ to this gives

D(\rho,\sigma) = \max_{0 \leq P \leq I} Tr(P(\rho-\sigma))  \tag{12}

(End of proof)
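To see equations (11) and (12) at work, here is a small numerical sketch. It assumes random density matrices generated as $GG^{\dagger}/Tr(GG^{\dagger})$ (my own choice, not something from the references) and compares the definition (1) with $Tr(P(\rho-\sigma))$ for the projector $P$ onto the positive part of $\rho-\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density(dim):
    """Random density matrix G G^dagger / Tr(G G^dagger) (illustrative choice)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

rho, sigma = random_density(4), random_density(4)

w, v = np.linalg.eigh(rho - sigma)
vp = v[:, w > 0]
proj = vp @ vp.conj().T                        # projector onto the positive part

d_def = trace_distance(rho, sigma)             # Eq. (1)
d_meas = np.trace(proj @ (rho - sigma)).real   # Eq. (12) with the optimal P

print("Tr(rho - sigma)        :", np.sum(w))   # should be numerically zero
print("(1/2) Tr|rho - sigma|  :", d_def)
print("Tr(P (rho - sigma))    :", d_meas)
```

The last two values coincide, which is the operational reading of the trace distance: $P$ is the measurement outcome that distinguishes the two states best.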

Properties

The trace distance has the following five properties [^2]: (1) symmetry, (2) non-negativity (with maximum value 1), (3) the triangle inequality, (4) contractivity, and (5) strong convexity.

[^2]: Each reference classifies the properties differently, but in this article I have organized them into five. I intend to cover (almost) all the properties described in the references.

Properties (1), (2) and (3) are the three conditions that a "distance" must satisfy [^3]. Property (4) shows that the trace distance never increases under a physical process, that is, states become harder to distinguish through interaction with the surrounding environment.

[^3]: Fidelity, on the other hand, does not satisfy the conditions for a distance.

Let's check in order.

(1) Symmetry

D(\rho,\sigma) = D(\sigma,\rho)  \tag{13}

holds. This is clear from the definition.

(2) Non-negativity (maximum value is 1)

0 \leq D(\rho,\sigma) \leq 1  \tag{14}

holds. In particular, the minimum and maximum are attained in the following cases.

\begin{align}
\rho = \sigma &\Leftrightarrow D(\rho,\sigma) = 0 \\
\rho \sigma = 0 &\Leftrightarrow D(\rho,\sigma) = 1 \tag{15}
\end{align}

[Proof]

First, the left inequality is clear from the definition (see equation (2)), and equality holds only for $\rho = \sigma$.

Next is the right inequality. Purify $\rho, \sigma$ as follows.

\begin{align}
\rho &\rightarrow \ket{\phi_{\rho}} \\
\sigma &\rightarrow \ket{\phi_{\sigma}} \tag{16}
\end{align}

In general, the trace distance between the purified states is at least as large as the trace distance before purification [^4], so

[^4]: I am quietly using the contractivity of property (4) here. From property (4), it follows that the trace distance of a subsystem is no larger than the trace distance of the whole system (reference: Nielsen and Chuang).

\begin{align}
D(\rho,\sigma) &\leq  \frac{1}{2} || \ket{\phi_{\rho}} \bra{\phi_{\rho}} - \ket{\phi_{\sigma}} \bra{\phi_{\sigma}} || \\
&= \frac{1}{2} Tr | \ket{\phi_{\rho}} \bra{\phi_{\rho}} - \ket{\phi_{\sigma}} \bra{\phi_{\sigma}} |   \tag{17}
\end{align}

I want to evaluate the trace on the right-hand side explicitly to show that its upper bound is 1, and for that I need an orthonormal basis. So we use Gram-Schmidt orthogonalization. Since the operator has rank at most 2, only the first two orthonormal vectors are needed:

\begin{align}
\ket{u} &= \ket{\phi_{\rho}} \\
\ket{v} &= \frac{\ket{\phi_{\sigma}} - \braket{\phi_{\rho}}{\phi_{\sigma}} \ket{\phi_{\rho}}}{\sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}}}  \tag{18}
\end{align}

Therefore,

\begin{align}
\ket{\phi_{\rho}} &= \ket{u} \\
\ket{\phi_{\sigma}} &= \braket{\phi_{\rho}}{\phi_{\sigma}} \ket{u} + \sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \ket{v} \tag{19}
\end{align}

From this, working out the operator inside the trace of equation (17) term by term gives

\begin{align}
&\ket{\phi_{\rho}} \bra{\phi_{\rho}} - \ket{\phi_{\sigma}} \bra{\phi_{\sigma}} \\
&= (1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}) \ket{u} \bra{u} - \sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \braket{\phi_{\rho}}{\phi_{\sigma}} \ket{u} \bra{v} - \sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \braket{\phi_{\sigma}}{\phi_{\rho}} \ket{v} \bra{u} - (1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}) \ket{v} \bra{v}  \tag{20}
\end{align}

Then, setting $\alpha = \braket{\phi_{\rho}}{\phi_{\sigma}}$ and using the basis $\{\ket{u}, \ket{v}\}$,

\begin{align}
&\ket{\phi_{\rho}} \bra{\phi_{\rho}} - \ket{\phi_{\sigma}} \bra{\phi_{\sigma}} \\
&= (1-|\alpha|^{2}) \ket{u}\bra{u} - \alpha \sqrt{1-|\alpha|^{2}} \ket{u}\bra{v} - \alpha^{*} \sqrt{1-|\alpha|^{2}} \ket{v}\bra{u} - (1-|\alpha|^{2}) \ket{v}\bra{v} \\
&=
\begin{pmatrix}
1-|\alpha|^{2} & - \alpha \sqrt{1-|\alpha|^{2}} \\
- \alpha^{*} \sqrt{1-|\alpha|^{2}} & - (1-|\alpha|^{2}) 
\end{pmatrix}  \tag{21}
\end{align}

It remains to diagonalize this matrix (that is, solve the eigenvalue problem) and sum the absolute values of the eigenvalues obtained. Writing the eigenvalue as $\lambda$, the characteristic equation is

\lambda^{2} - (1-|\alpha|^{2}) = 0  \tag{22}

Solving this, the two eigenvalues are

\begin{align}
&\lambda_1 = \sqrt{1-|\alpha|^{2}} = \sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \\
&\lambda_2 = -\sqrt{1-|\alpha|^{2}} = -\sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \tag{23}
\end{align}

so equation (17) finally becomes

D(\rho,\sigma) \leq \sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}} \leq 1 \tag{24}

Here, equality holds only when $\braket{\phi_{\rho}}{\phi_{\sigma}} = 0$, that is, when the states are orthogonal, that is, when $\rho \sigma = 0$. (End of proof)
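The closed form obtained for pure states, $\sqrt{1-|\braket{\phi_{\rho}}{\phi_{\sigma}}|^{2}}$, can be compared against a direct eigenvalue computation. The following is a minimal sketch with two randomly drawn state vectors; the dimension 8 and the helper names are arbitrary assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_pure_state(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

phi, psi = random_pure_state(8), random_pure_state(8)
rho = np.outer(phi, phi.conj())      # |phi><phi|
sigma = np.outer(psi, psi.conj())    # |psi><psi|

overlap = np.vdot(phi, psi)          # <phi|psi> (vdot conjugates its first argument)
d_numeric = trace_distance(rho, sigma)
d_formula = np.sqrt(1.0 - abs(overlap) ** 2)

eigvals = np.linalg.eigvalsh(rho - sigma)
print("nonzero eigenvalues:", eigvals[0], eigvals[-1])   # compare with Eq. (23)
print("numeric  :", d_numeric)
print("formula  :", d_formula)
```

Only the smallest and largest eigenvalues are nonzero, and they have equal magnitude and opposite sign, in line with equation (23).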

(3) Triangle inequality

D(\rho,\sigma) \leq D(\rho,\tau) + D(\tau,\sigma)  \tag{25}

holds.

[Proof]

From equation (3), there exists a $P$ that satisfies

D(\rho,\sigma) = Tr(P(\rho-\sigma))  \tag{26}

For an arbitrary state $\tau$, this can be rewritten as

D(\rho,\sigma) = Tr(P(\rho-\sigma)) = Tr(P(\rho-\tau)) + Tr(P(\tau-\sigma))  \tag{27}

Since equation (3) is a maximum over $P$, each term on the right-hand side is bounded above by the corresponding trace distance, so

D(\rho,\sigma) \leq D(\rho,\tau) + D(\tau,\sigma)  \tag{25}

holds. (End of proof)
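As a sanity check of the triangle inequality (25), the sketch below samples random triples of density matrices and counts how often the inequality fails beyond a small numerical tolerance. The sampling scheme and the tolerance `1e-12` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

violations = 0
for _ in range(1000):
    rho, sigma, tau = (random_density(4) for _ in range(3))
    lhs = trace_distance(rho, sigma)
    rhs = trace_distance(rho, tau) + trace_distance(tau, sigma)
    if lhs > rhs + 1e-12:
        violations += 1

print("violations of the triangle inequality:", violations)
```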

(4) Contractivity

D(\rho,\sigma) \geq D(\Gamma(\rho),\Gamma(\sigma))  \tag{28}

holds, where $\Gamma$ is a CPTP map.

[Proof]

For states $\rho, \sigma$ of the system of interest A, use a state $\tau_{R}$ of a reference system R to extend them to the composite system AR as follows.

\rho_{AR} = \rho \otimes \tau_{R}, \space \sigma_{AR} = \sigma \otimes \tau_{R}  \tag{29}

Note that the trace distance is unchanged when both states are extended with the same $\tau_{R}$ in this way. That is,

\begin{align}
D(\rho_{AR}, \sigma_{AR}) &= \frac{1}{2} ||\rho_{AR}-\sigma_{AR}|| = \frac{1}{2} ||(\rho-\sigma) \otimes \tau_{R}|| \\
&= \frac{1}{2} ||\rho-\sigma|| \cdot ||\tau_{R}|| = \frac{1}{2} ||\rho-\sigma|| \\
&= D(\rho,\sigma)  \tag{30}
\end{align}

Also, a CPTP map $\Gamma$ acting on $\rho, \sigma$ can be written, using a suitable unitary operator $U$ on the composite system, as

\begin{align}
\Gamma(\rho) &= Tr_{R} (U \rho_{AR} U^{\dagger})  \\
\Gamma(\sigma) &= Tr_{R} (U \sigma_{AR} U^{\dagger})  \tag{31}
\end{align}

Then

\begin{align}
||\rho-\sigma|| &= ||\rho_{AR}-\sigma_{AR}|| = ||U(\rho_{AR}-\sigma_{AR})U^{\dagger}|| \\
&= \max_{V} |Tr(U(\rho_{AR}-\sigma_{AR})U^{\dagger}V)| \\
&\geq \max_{V_A} |Tr(U(\rho_{AR}-\sigma_{AR})U^{\dagger}(V_A \otimes I_R))| \\
&= \max_{V_A} |Tr((Tr_{R}(U\rho_{AR}U^{\dagger}) - Tr_{R}(U\sigma_{AR}U^{\dagger}))V_A)| \\
&= \max_{V_A} |Tr((\Gamma(\rho)-\Gamma(\sigma))V_A)| \\
&= ||\Gamma(\rho)-\Gamma(\sigma)||  \tag{32}
\end{align}

Here we used the fact that, for a linear operator $A$,

||A|| = \max_{V} |Tr(AV)|  \tag{33}

holds, where the maximum is taken over all unitary operators $V$. Therefore, from equation (32),

D(\rho,\sigma) \geq D(\Gamma(\rho),\Gamma(\sigma))  \tag{28}

holds. (End of proof)
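The construction used in this proof (equations (29) to (31)) can be sketched in plain NumPy. This is a minimal illustration under my own assumptions: the system sizes, the random unitary built from a QR decomposition (not claimed to be exactly Haar-distributed), and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def random_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_unitary(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def partial_trace_r(op_ar, dim_a, dim_r):
    """Trace out R from an operator on A (x) R (A is the first tensor factor)."""
    return np.trace(op_ar.reshape(dim_a, dim_r, dim_a, dim_r), axis1=1, axis2=3)

def channel(rho, u, tau_r):
    """Gamma(rho) = Tr_R( U (rho (x) tau_R) U^dagger ), as in Eq. (31)."""
    dim_a, dim_r = rho.shape[0], tau_r.shape[0]
    rho_ar = np.kron(rho, tau_r)                   # Eq. (29)
    return partial_trace_r(u @ rho_ar @ u.conj().T, dim_a, dim_r)

dim_a, dim_r = 2, 4
u = random_unitary(dim_a * dim_r)
tau_r = random_density(dim_r)
rho, sigma = random_density(dim_a), random_density(dim_a)

d_before = trace_distance(rho, sigma)
d_extended = trace_distance(np.kron(rho, tau_r), np.kron(sigma, tau_r))  # Eq. (30)
d_after = trace_distance(channel(rho, u, tau_r), channel(sigma, u, tau_r))

print("D(rho, sigma)               :", d_before)
print("D(rho_AR, sigma_AR)         :", d_extended)
print("D(Gamma(rho), Gamma(sigma)) :", d_after)
```

The first two values agree (equation (30)), and the third does not exceed them (equation (28)).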

(5) Strong convexity

D(\sum_{i} p_{i} \rho_{i}, \sum_{i} q_{i} \sigma_{i}) \leq D(p_{i},q_{i}) + \sum_{i} p_{i} D(\rho_{i},\sigma_{i})  \tag{34}

holds.

[Proof]

\begin{align}
&D(\sum_{i} p_{i} \rho_{i}, \sum_{i} q_{i} \sigma_{i}) \\
&= \frac{1}{2} Tr|\sum_{i} p_{i} \rho_{i} - \sum_{i} q_{i} \sigma_{i}| \\
&= \max_{0 \leq P \leq I} Tr(P(\sum_{i} p_{i} \rho_{i} - \sum_{i} q_{i} \sigma_{i})) \tag{35}
\end{align}

Here, taking $P$ to be an operator that attains the maximum, we can proceed as follows.

\begin{align}
&D(\sum_{i} p_{i} \rho_{i}, \sum_{i} q_{i} \sigma_{i}) \\
&= Tr(P(\sum_{i} p_{i} \rho_{i} - \sum_{i} q_{i} \sigma_{i})) \\
&= Tr(P \sum_{i} p_{i} \rho_{i}) - Tr(P \sum_{i} q_{i} \sigma_{i}) \\
&= \sum_{i} p_{i} Tr(P\rho_{i}) - \sum_{i} q_{i} Tr(P\sigma_{i}) \\
&= \sum_{i} p_{i} Tr(P(\rho_{i}-\sigma_{i})) + \sum_{i} (p_i - q_i) Tr(P\sigma_{i})  \\
&\leq \sum_{i} p_{i} D(\rho_{i},\sigma_{i}) + \sum_{i} (p_i - q_i) Tr(P\sigma_{i})  \tag{36}
\end{align}

Now, in order to relate the second term of the last line to the distance between classical probability distributions, consider the trace distance when $\rho$ and $\sigma$ commute. Since they commute, they can be diagonalized simultaneously as follows.

\begin{align}
\rho &= \sum_{i} p_{i} \ket{i} \bra{i} \\
\sigma &= \sum_{i} q_{i} \ket{i} \bra{i} \tag{37}
\end{align}

When calculating the trace distance under this condition,

\begin{align}
D(\rho,\sigma) &= \frac{1}{2} Tr|\sum_{i} (p_i - q_i) \ket{i} \bra{i}| \\
&= \frac{1}{2} \sum_{i} |p_i - q_i| \equiv D(p_i,q_i) \\
&= \max_{P} Tr(P(\rho-\sigma)) \\
&= \max_{P} Tr(P(\sum_{i} p_{i} \ket{i} \bra{i} - \sum_{i} q_{i} \ket{i} \bra{i})) \\
&= \max_{P} \sum_{i} (p_i - q_i) Tr (P \ket{i} \bra{i}) \\
&= \max_{P} \sum_{i} (p_i - q_i) Tr (P \sigma_{i}) \\
&\geq \sum_{i} (p_i -q_i) Tr(P\sigma_{i})  \tag{38}
\end{align}

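can be derived, where $D(p_i,q_i)$ is the distance between the classical probability distributions. (The final bound can also be seen directly: since $0 \leq Tr(P\sigma_{i}) \leq 1$, the sum $\sum_{i} (p_i - q_i) Tr(P\sigma_{i})$ is at most the sum of the positive values of $p_i - q_i$, which equals $D(p_i,q_i)$.) Substituting equation (38) into equation (36) gives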

D(\sum_{i} p_{i} \rho_{i}, \sum_{i} q_{i} \sigma_{i}) \leq D(p_i,q_i) + \sum_{i} p_{i} D(\rho_{i},\sigma_{i}) \tag{34}

so the claim holds. (End of proof)

By the way, if $p_i = q_i$,

D(\sum_{i} p_{i} \rho_{i}, \sum_{i} p_{i} \sigma_{i}) \leq \sum_{i} p_{i} D(\rho_{i},\sigma_{i})  \tag{39}

holds, and this is called "joint convexity".
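Strong convexity (34) and joint convexity (39) can also be spot-checked numerically. The sketch below assumes three random density matrices per ensemble and probability vectors drawn from a flat Dirichlet distribution; these choices and the helper names are mine, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def mixture(probs, states):
    return sum(w * s for w, s in zip(probs, states))

n, dim = 3, 4
p = rng.dirichlet(np.ones(n))
q = rng.dirichlet(np.ones(n))
rhos = [random_density(dim) for _ in range(n)]
sigmas = [random_density(dim) for _ in range(n)]

weighted = sum(pi * trace_distance(r, s) for pi, r, s in zip(p, rhos, sigmas))
classical = 0.5 * np.sum(np.abs(p - q))        # D(p_i, q_i)

# Eq. (34): strong convexity
lhs = trace_distance(mixture(p, rhos), mixture(q, sigmas))
rhs = classical + weighted
print("strong convexity:", lhs, "<=", rhs)

# Eq. (39): joint convexity (same weights on both sides)
joint_lhs = trace_distance(mixture(p, rhos), mixture(p, sigmas))
print("joint convexity :", joint_lhs, "<=", weighted)
```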

Check with the simulator

Now, let's focus on the fourth property, contractivity, and check with the simulator whether it really holds. Specifically, two density operators are created at random as pure states of the target system plus a reference system, an environment system initialized to $\ket{0}$ is attached, a random unitary transformation is applied to the whole system, and finally everything except the target system is traced out. Let's see that the trace distance between the resulting states has indeed contracted (decreased).

The whole Python code is below.

import numpy as np
from scipy.stats import unitary_group
from qlazypy import QState, DensOp

def random_densop(qnum_tar,qnum_ref,qnum_env):
    """Random pure state on target+reference, tensored with |0...0> on the environment."""

    dim_pur = 2**(qnum_tar+qnum_ref)
    vec_pur = np.array([0.0]*dim_pur)
    vec_pur[0] = 1.0
    mat_pur = unitary_group.rvs(dim_pur)
    vec_pur = np.dot(mat_pur, vec_pur)

    dim_env = 2**qnum_env
    vec_env = np.array([0.0]*dim_env)
    vec_env[0] = 1.0

    vec_whole = np.kron(vec_pur,vec_env)

    qs = QState(vector=vec_whole)
    de = DensOp(qstate=[qs],prob=[1.0])

    qs.free()
    return de

def random_unitary(qnum):
    """Random unitary matrix acting on qnum qubits (dimension 2**qnum)."""

    dim = 2**qnum
    mat = unitary_group.rvs(dim)
    
    return mat
    
if __name__ == '__main__':

    # settings
    qnum_tar = 2  # system A : target system
    qnum_ref = 2  # system R : reference system
    qnum_env = 2  # system E : environment system
    qnum_whole = qnum_tar + qnum_ref + qnum_env

    # two random states in system A+R+E (A+R:set randomly, E:set |0> initially)
    de1_whole = random_densop(qnum_tar,qnum_ref,qnum_env)
    de2_whole = random_densop(qnum_tar,qnum_ref,qnum_env)

    # two states in system A (trace out R+E)
    de1_ini = de1_whole.partial(id=list(range(qnum_tar)))
    de2_ini = de2_whole.partial(id=list(range(qnum_tar)))

    # trace distance for initial states
    dis_ini = de1_ini.distance(de2_ini)

    # unitary transformation for whole system
    U = random_unitary(qnum_whole)
    de1_whole.apply(U)
    de2_whole.apply(U)

    # two states in system A (trace out R+E)
    de1_fin = de1_whole.partial(id=list(range(qnum_tar)))
    de2_fin = de2_whole.partial(id=list(range(qnum_tar)))
    
    # trace distance for final states
    dis_fin = de1_fin.distance(de2_fin)

    # result
    print("* trace distance(ini) =", dis_ini)
    print("* trace distance(fin) =", dis_fin)

    if dis_ini >= dis_fin:
        print("OK!")
    else:
        print("NG!")

    # free memory
    de1_whole.free()
    de2_whole.free()
    de1_ini.free()
    de2_ini.free()
    de1_fin.free()
    de2_fin.free()

I have only replaced the method that calculates fidelity in the previous article with the method distance, which calculates the trace distance, so I won't explain the code in detail.

The execution result is as follows.

* trace distance(ini) = 0.5235122473095817
* trace distance(fin) = 0.26385081827631107
OK!

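So you can see that the trace distance has shrunk. I ran it many times, and it contracted (decreased) every time. Note, however, that here $U$ also acts on the reference system R, which starts out correlated with A, so the induced change of the state of A is not strictly a CPTP map of the initial state of A alone. The check is therefore illustrative rather than a strict test of equation (28).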

Conclusion

This article has become quite long because I worked through the derivations and proofs carefully. The references prove the key points, but many steps are omitted. I think it is important to fill in the gaps between the lines myself, and this time that happened to add up to a lot. Most of it is basic linear algebra, but I learned a lot (I should have done this back in college...).

As for next time, I am currently thinking of covering entropy. It is a topic that cannot be left out of the basics of quantum information theory. So we are finally entering the territory of information theory proper.

That's all.
