Introduction

Purpose of this article

Following on from the previous article, [What is standard input / standard output?](/Angel_p_57/items/03582181e9f7a69f8168), this is an attempt to explain concepts and basics that Linux / UNIX beginners may find difficult to grasp. This time the topic is **environment variables**. Windows uses the same term, and I think the concept is similar, but this article covers only Linux / UNIX.

Caution

Explanations of environment variables often focus on how to use them, such as "if you add a directory to PATH, you can execute the commands in it" or "set the language with LANG". This article says little about that. For the standard environment variables, the Environment Variables chapter of the POSIX standard may be helpful (in English).

What are environment variables?

Super overview

Simply put, environment variables are **a kind of parameter for adjusting the behavior of a program**, and they are **held and managed per process**. Knowledge of environment variables is needed to work with Linux / UNIX simply because **there are situations where the behavior of a program is adjusted through environment variables**.

Adjusting the behavior of the program

By the way, the "process" mentioned above refers to the execution unit [^1] of a program on the OS. As you may know, in Linux / UNIX the same program can be executed many times, sometimes simultaneously, and the result obtained each time depends on the situation [^2].

The factors that change the result include the standard input / output explained in [What is standard input / standard output?](/Angel_p_57/items/03582181e9f7a69f8168), and depending on the program, various configuration files may also be involved. There are many such factors. However, from the OS's point of view, **there are only two types of parameters specified when a program is executed, excluding the program file name** [^3]. One is the **command line arguments** and the other is the **environment variables**.

As a reminder, the command line arguments are literally **the command string itself that was specified to run the program**. For example, when you run the command `ls -l`, it is passed as array data [^4] decomposed into two strings, `ls` and `-l`.
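
As a rough illustration of that decomposition, the following minimal sketch uses Python's `shlex` module to split a command string roughly the way a POSIX shell would. It only approximates what the shell does before passing the array to the OS. It does not handle expansions such as globbing or variable substitution.

```python
import shlex

# The single string a user types at the prompt.
command_line = "ls -l"

# Split it into words following POSIX shell quoting rules;
# the resulting list corresponds to the argument array.
argv = shlex.split(command_line)
print(argv)  # ['ls', '-l']
```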

Parameters for each process

Since both command line arguments and environment variables are specified when the program is executed, it naturally follows that not only command line arguments but also **environment variables are managed per process**.

The former probably feels natural, but some people may find the latter less intuitive.

Some people may be wondering about multithreaded programs. Environment variables are common within a process; that is, they are **data shared between threads**.

And this process-specific information can be referenced through the proc filesystem for Linux.

For example, suppose you have the same sleep program running two processes (PID 63, 64) as follows:

`Status of sleep command`


$ ps -fC sleep
UID        PID  PPID  C STIME TTY          TIME CMD
angel       63     5  0 17:51 pts/0    00:00:00 sleep 300
angel       64     5  0 17:51 pts/0    00:00:00 sleep 600

At this time, you can find out the command line arguments of each process and the **environment variables specified at runtime** by reading the files /proc/PID/cmdline and /proc/PID/environ [^5].

The phrase "specified at runtime" is meant to indicate that **even if the content changes afterwards, the change is not reflected in this file** [^6].

Since the inside of the file is separated by NUL characters (ASCII code 0), it will be easier to read if you change it to a line break with the tr command. (You can check that the NUL character is used with the od command.)

`Example of getting information from the proc file system`


$ od -An -tc /proc/63/cmdline
   s   l   e   e   p  \0   3   0   0  \0
$ tr \\0 \\n < /proc/63/cmdline
sleep
300
$ tr \\0 \\n < /proc/64/cmdline
sleep
600
$ od -An -tc /proc/63/environ
   P   A   T   H   =   /   b   i   n  \0   H   O   G   E   =   h
   o   g   e  \0
$ tr \\0 \\n < /proc/63/environ
PATH=/bin
HOGE=hoge
$ tr \\0 \\n < /proc/64/environ
PATH=/bin
UGE=uge

Therefore, you can see that the sleep program with PID 63 was executed with the two items `sleep` and `300` as command line arguments and the two items `PATH=/bin` and `HOGE=hoge` as environment variables, while PID 64 was executed with `sleep`, `600` and `PATH=/bin`, `UGE=uge`.

Differences between command line arguments and environment variables

So far, we have explained that there are command line arguments and environment variables as parameters to be specified when executing a program. All of these are for adjusting the behavior of the program. But why are there two types? The reason is not clear (because it is a design issue as a UNIX operating system). But you can guess. We will approach by looking at the difference between the two.

Difference in data format

First is the difference in data format. Here again is an excerpt from the example of reading the /proc/PID/cmdline and /proc/PID/environ files.

`Excerpt from an example of getting information from the proc file system`


$ tr \\0 \\n < /proc/63/cmdline
sleep
300
$ tr \\0 \\n < /proc/63/environ
PATH=/bin
HOGE=hoge

You can see that the command line arguments are just multiple strings, whereas the environment variables are a collection of name=value pairs [^7]. As you can imagine from this, there are the following differences between the two.

- Command line arguments: an **array** consisting of one or more strings (the order is significant)
- Environment variables: an **associative array** in which a value (string) corresponds to each key (name) (in no particular order; duplicate keys are not expected)

An "associative array" is what some programming languages call a hash or a dictionary. In other words, if the environment variables contain the data HOGE=hoge, the value hoge is associated with the key HOGE.
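
To make the array / associative array distinction concrete, here is a minimal sketch that parses NUL-separated data of the kind found in /proc/PID/cmdline and /proc/PID/environ. The function names `parse_cmdline` and `parse_environ` are made up for this illustration, and the sample bytes simply reproduce the values from the example above instead of reading a real process.

```python
def _split_nul(raw):
    """Split NUL-terminated entries, dropping only the final terminator."""
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return raw.split(b"\x00") if raw else []


def parse_cmdline(raw):
    """Command line arguments: an ordered list of strings."""
    return [entry.decode(errors="replace") for entry in _split_nul(raw)]


def parse_environ(raw):
    """Environment variables: a mapping from name to value."""
    env = {}
    for entry in _split_nul(raw):
        # Everything before the first '=' is the name, the rest is the value.
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


# Sample data matching the PID 63 example in this article.
print(parse_cmdline(b"sleep\x00300\x00"))
print(parse_environ(b"PATH=/bin\x00HOGE=hoge\x00"))
```

On Linux, the same functions could be applied to the bytes read from `/proc/self/cmdline` and `/proc/self/environ` (opened in binary mode) to inspect the current process.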

Surprisingly, there are no particular restrictions on the characters, except that the NUL character (ASCII code 0) and `=` cannot be used in keys. However, problems may arise in practice [^8], so it is safer to limit keys to ASCII alphanumeric characters and underscores. (By convention, uppercase letters are used.)
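
The "safe" naming rule can be written down as a small check. This is only a sketch: the function name `is_portable_name` is hypothetical, and the extra rule that a name must not start with a digit is an assumption added here because shells do not accept such names as variables.

```python
import re

# ASCII letters, digits and underscores, not starting with a digit
# (the "no leading digit" part is an assumption based on shell variable rules).
_PORTABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_portable_name(name):
    """Return True if the name is safe to use as an environment variable name."""
    return _PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["HOGE", "MY_VAR2", "2FAST", "A=B", "my-var"]:
    print(candidate, is_portable_name(candidate))
```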

By the way, the state where a given key does not exist in the environment variables and the state where the key exists but its value is empty (a zero-length string) are different. (Some programs may treat the two the same way.) The following is an example of running the date command without the key TZ, which indicates the time zone, and with the key present but its value empty. The former uses the system default time zone and the latter uses UTC.

`Difference in date behavior due to TZ (time zone information)`


$ date  #No key TZ, if system default is JST
Sunday, December 2, 2018 20:35:01 JST
$ TZ= date  #Specify an empty string for key TZ and treat it as UTC
Sunday, December 2, 2018 11:35:08 UTC
$ date -u #Specify UTC as a command line argument
Sunday, December 2, 2018 11:35:12 UTC

By the way, as mentioned above, the environment variables are a collection of multiple pieces of data, but saying "the value associated with the key TZ in the environment variables" every time is cumbersome, and no one does. So people generally say "the TZ environment variable (and its value)" as if each were an individual piece of data, and TZ is called an "environment variable name" rather than a "key".

Difference in usage

Next is the difference in usage between the two. As with the TZ environment variable and the `-u` command line argument of date, an environment variable and a command line argument can have the same effect, and it is not uncommon for a program to offer both for exactly the same purpose.

In other words, the choice between the two is not determined by such "effects". It is made rather for administrative reasons.

Information redundancy

The command line arguments are array data, while the environment variables are an associative array. With the former, any unnecessary information that is passed in is troublesome to sort out; with the latter, unused keys are simply ignored.

In other words, **command line arguments specify only the information that is really needed**, while environment variables **tolerate redundancy: the information intended for various programs can be merged and passed along together in a catch-all fashion**.

Separation of common processing and individual processing

Now look back at the TZ environment variable again.

I gave an example with the date command earlier, but TZ in fact also affects various other programs that deal with time information, such as the file timestamps shown by ls and stat.

`Effect of TZ environment variable on ls`


$ ls -l test.txt  # Timestamp shown in the system default time zone (JST in this case)
-rwx------ 1 angel angel 398 December 2 20:52 test.txt
$ TZ= ls -l test.txt  # Timestamp shown in UTC
-rwx------ 1 angel angel 398 December 2 11:52 test.txt

This is because the library function localtime, which calculates the year, month, day, hour, minute, and second from a timestamp (UNIX time [^9]), changes its behavior depending on the TZ environment variable. Put the other way around, programs such as ls are not themselves aware of this environment variable.
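
This can be observed from Python, whose `time.localtime` is a thin wrapper around the C library. The following is a minimal sketch, assuming a Unix system (`time.tzset` is not available elsewhere). The helper name `local_hour_at_epoch` is made up, and the POSIX-style TZ values `UTC0` and `JST-9` are used so that no time zone database is needed. Note that `time.localtime` takes no time zone argument; only the environment variable changes between the two calls.

```python
import os
import time


def local_hour_at_epoch(tz):
    """Return the local hour of UNIX time 0 under the given TZ value."""
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: ask the C library to re-read TZ
    return time.localtime(0).tm_hour


print(local_hour_at_epoch("UTC0"))   # 0
print(local_hour_at_epoch("JST-9"))  # 9
```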

Let's think about it here. When writing a program, it is rare to write all of its functions from scratch; most of the processing uses some kind of library. What if you tried to control **with command line arguments** every adjustable aspect of those libraries, even aspects that are trivial to the main purpose of the program? Interpreting the command line arguments alone would become terribly complicated, and even a small change to a library would have far-reaching effects.

Therefore, it makes sense to divide the roles: **command line arguments for the parts related to the functions of an individual program, and environment variables for common functions such as libraries**.

In some cases, even outside of libraries, a common environment variable is used to control similar functions across multiple programs. A typical example is the LANG environment variable [^10], which controls which language, such as English or Japanese, is used.

Of course, that **does not mean you should not control the functions of an individual program with environment variables**. Such usage is also common.
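
The division of roles might look like the following minimal sketch. Everything here is hypothetical: `greeting` stands in for a library function, `GREETING_STYLE` is a made-up environment variable name, and `main` stands in for a program that only knows about its own command line argument.

```python
import argparse
import os


def greeting():
    """Hypothetical 'library' function: reads its own setting from the environment."""
    # GREETING_STYLE is a made-up variable name used only for this sketch.
    if os.environ.get("GREETING_STYLE") == "formal":
        return "Good day"
    return "Hi"


def main(argv):
    """Hypothetical program: parses only the argument it cares about."""
    parser = argparse.ArgumentParser()
    parser.add_argument("name")
    args = parser.parse_args(argv)
    # The program never mentions GREETING_STYLE; the library handles it.
    return greeting() + ", " + args.name


print(main(["Alice"]))
```

If the library later gains another adjustable setting, the program's argument parsing does not have to change at all.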

Handling and role of environment variables

So far, I have explained that environment variables can be used in the following ways.

- Various information, including information that a given program may not use, can be merged and passed along together in a catch-all fashion.
- The behavior of common functions, such as those of individual libraries, can be adjusted.

In addition, in standard libraries and various programming languages, the APIs that control program execution are often designed so that the program being started inherits the environment variables of the current process. [^11]

This is not a matter of the OS, but **a matter of convention in libraries and the like**.

Due to the above properties, once you set the LANG environment variable to ja_JP.UTF-8 (Japanese, UTF-8), for example, every process started from there inherits it, and each program changes its behavior to output Japanese (as long as the program supports it). Since various programs thus behave as if they share the same parameters, environment variables are used **as if they were global settings that control the execution environment of various programs**.

But we have to be careful here. Environment variables are simply information passed on from the starting process through the "inherit by default" behavior. **The OS does not distribute common settings by some mysterious power.** So the contents of the environment variables may be completely different depending on the starting process: for example, when you log in with ssh, when a job is run from cron, or when a server-side application is launched from a web server.

Therefore, regarding environment variables, it is better to be aware of how and from which program the information is inherited and the extent of its influence.
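
The "inherit by default" behavior, and the fact that the starting process decides what is passed on, can be sketched with Python's `subprocess` module. The helper `child_sees` is made up for this illustration, and `HOGE` is just the sample variable name used in this article. Passing `env=None` (the default) means the child inherits the environment of the current process.

```python
import os
import subprocess
import sys

# Code run by the child process: report the HOGE value it received.
CHILD_CODE = "import os; print(os.environ.get('HOGE', '(unset)'))"


def child_sees(env=None):
    """Start a child Python process and return the HOGE value it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,  # None means "inherit this process's environment"
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ["HOGE"] = "hoge"
print(child_sees())  # hoge

# The parent can just as well decide not to pass HOGE on.
without_hoge = {k: v for k, v in os.environ.items() if k != "HOGE"}
print(child_sees(env=without_hoge))  # (unset)
```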

Manipulating environment variables

Up to this point, we have only talked about passing environment variables when a program is executed, but **even after a program has started, it can manipulate its own environment variables (add or delete keys, or change the value associated with a key)**.

In the following, we will look at the operation of environment variables in each situation.

Who manages environment variables

When the program starts, it receives environment variables from the OS, but the dedicated C language API manages the subsequent operations. The OS is not involved.

Therefore, the results of these operations are not reflected in the /proc/PID/environ file.
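
A minimal sketch of this, assuming Linux with the proc filesystem mounted: a variable added after startup is visible to the process itself but does not appear in /proc/self/environ. The helper name `in_initial_environ` and the variable name `ADDED_AFTER_START` are made up for this illustration, and the variable is assumed not to be set beforehand.

```python
import os


def in_initial_environ(name):
    """Check /proc/self/environ (Linux only) for a variable name.

    Returns None when the proc filesystem is not available.
    """
    try:
        with open("/proc/self/environ", "rb") as f:
            entries = f.read().split(b"\x00")
    except OSError:
        return None
    prefix = name.encode() + b"="
    return any(entry.startswith(prefix) for entry in entries)


# ADDED_AFTER_START is a made-up name, assumed not to be set already.
os.environ["ADDED_AFTER_START"] = "1"
print("ADDED_AFTER_START" in os.environ)        # True
print(in_initial_environ("ADDED_AFTER_START"))  # expected to be False on Linux
```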

The typical API is as follows.

- getenv: Get the value corresponding to the specified key
- putenv, setenv: Add the specified key, or change the value associated with the specified key
- unsetenv: Delete the specified key
- clearenv: Delete all keys

Note that **changing environment variables is not thread-safe**. In a multithreaded program, the environment variables are data shared between threads, so changing them without considering consistency between threads can cause malfunctions. It also means that different threads cannot use different environment variables.
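
That the environment is shared between threads can be seen with a minimal sketch like the following. `SHARED_DEMO` is a made-up variable name. The `join()` call is what keeps this particular example free of races; without that kind of coordination, concurrent modification is exactly the situation the warning above is about.

```python
import os
import threading


def worker():
    # SHARED_DEMO is a made-up variable name for this sketch.
    os.environ["SHARED_DEMO"] = "set by worker thread"


t = threading.Thread(target=worker)
t.start()
t.join()  # wait, so that the read below does not race with the write

# The main thread sees the change made by the other thread.
print(os.environ.get("SHARED_DEMO"))  # set by worker thread
```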

Manipulate environment variables in the shell etc.

Since the shell's primary role is to control program execution, I think that manipulating environment variables in the shell is the most typical method.

And in the case of shells, **environment variables can be handled as an extension of shell variables**.

As a result, some people may even think that environment variables are a shell-specific feature.

Manipulating environment variables in the shell

Specifically, in the case of bash, you can reference and change environment variables in exactly the same way as normal variables [^12]. The one difference is whether the variable carries the "environment variable" attribute. You can check whether a variable is a normal variable or an environment variable with the `declare` command, which can also be used to add the environment variable attribute (although the `export` command is more commonly used).

`Variable operation example`


$ declare -p SHELL  # SHELL is normally an environment variable (shown by -x)
declare -x SHELL="bash"
$ echo $SHELL  # Can be referenced as $name
bash
$ HOGE=hoge  # Simply defining a variable creates a normal variable by default
$ declare -p HOGE  # No -x, so it is a normal variable
declare -- HOGE="hoge"
$ echo $HOGE  # Can be referenced as $name
hoge
$ declare -x HOGE  # Add the environment variable attribute (use +x to remove it)
$ declare -p HOGE  # It has become an environment variable
declare -x HOGE="hoge"
$ HOGE=hogehoge    # Once the attribute is attached, it stays even if the value is changed
$ declare -p HOGE  # Still an environment variable
declare -x HOGE="hogehoge"
$ UGE=uge; export UGE  # export also adds the attribute (definition and export can be combined in one command)
$ declare -p UGE  # It is an environment variable
declare -x UGE="uge"
$ unset UGE  # Deletion works the same for normal and environment variables
$ declare -p UGE  # An error, because it has been deleted
bash: declare: UGE: not found

Environment variables set in this way affect the behavior of the shell itself, as well as all commands executed from the shell.

If you want to see all the information of environment variables, execute the declare -x command without specifying the variable name. Alternatively, you can run the printenv command.

Setting temporary environment variables

You can also set an environment variable temporarily just for the execution of a command, or execute a command with the environment variables cleared (so that they are not inherited).

`Temporary variable changes`


$ declare -p LANG TZ   # LANG is Japanese UTF-8; TZ is not set (default time zone)
declare -x LANG="ja_JP.UTF-8"
bash: declare: TZ: not found
$ date -d 2018-12-25T00:00+0900
Tuesday, December 25, 2018 00:00:00 JST
$ LANG= TZ=America/Los_Angeles date -d 2018-12-25T00:00+0900  # Temporarily set LANG and TZ
Mon Dec 24 07:00:00 PST 2018
$ declare -p LANG TZ   # The environment variables of the shell itself have not changed
declare -x LANG="ja_JP.UTF-8"
bash: declare: TZ: not found
$ ( exec -c date -d 2018-12-25T00:00+0900 )  # Run with the environment cleared; the output is no longer in Japanese
Tue Dec 25 00:00:00 JST 2018
$

In the above example, the `exec` command is enclosed in parentheses so that it runs in a subshell, because `exec` has the effect of **executing the program by replacing the current shell**. The parentheses are not required if the shell does not need to keep running. ~~Alternatively, simply executing `exec -c` without the parentheses also clears the environment variables of the running shell.~~ *I'm sorry, this description appears to have been a misunderstanding, so I retract it.

However, in bash, the temporary setting and the temporary clearing of environment variables cannot be combined. If you want to combine them, you can use the `env` command.

`env command example`


$ date  # Unmodified state
Sunday, December 2, 2018 20:41:02 JST
$ env TZ=America/Los_Angeles date  # Set TZ only
Sunday, December 2, 2018 03:41:12 PST
$ env - /bin/date  # Clear the environment variables and run
Sun Dec  2 20:41:30 JST 2018
$ env - PATH=/bin:/usr/bin TZ=America/Los_Angeles date  # Clear the environment variables, then set PATH and TZ
Sun Dec  2 03:41:44 PST 2018

In this way, you can temporarily set environment variables, clear them, or do both. Note, however, that clearing the environment variables also clears the PATH environment variable, so the program file is no longer searched for automatically from the command name. You therefore need to specify the file name, such as /bin/date, or set the PATH environment variable again.
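
The same two patterns, temporary override and clearing, can be sketched from a program using Python's `subprocess` module. This assumes a Linux / UNIX system with a `date` command under /bin or /usr/bin. The helper name `zone_name_from_date` is made up, and the POSIX-style TZ values `UTC0` and `JST-9` are used so that no time zone database is needed.

```python
import os
import subprocess


def zone_name_from_date(env):
    """Run `date +%Z` with the given environment and return its output."""
    result = subprocess.run(
        ["date", "+%Z"], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Similar in spirit to:  TZ=UTC0 date +%Z
# (a copy of the current environment with one variable overridden)
overridden = dict(os.environ, TZ="UTC0")
print(zone_name_from_date(overridden))  # UTC

# Similar in spirit to:  env - PATH=/bin:/usr/bin TZ=JST-9 date +%Z
# (nothing is inherited; only the listed variables are passed)
cleared = {"PATH": "/bin:/usr/bin", "TZ": "JST-9"}
print(zone_name_from_date(cleared))  # JST
```

In both cases `os.environ` of the calling process is left untouched, just as the shell's own variables were unchanged in the examples above.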

Examples of other tools

As another example, I will introduce make, which treats variables managed in the tool and environment variables closely.

make is a tool traditionally used to build programs (compile and link programs in multiple files, install them in system directories).

The following is an example of creating a file named Makefile, which controls the behavior of make, and using it to output the contents of a make variable and of an environment variable. I won't go into how to use make here, but I hope you can see that the two are handled in a similar way.

`Example of using variables by make itself`


$ cat > Makefile   # Create the Makefile first; end the input with Ctrl-D
HOGE ?= hoge # Set the variable (only if not already set); reference it with $(name)
default:
# Output the contents of the variable
	@echo $(HOGE)
# Output the environment variable (if set)
	@printenv HOGE
^D
$ make  # printenv fails because a variable defined inside make is not an environment variable
hoge
Makefile:4: recipe for target 'default' failed
make: *** [default] Error 1
$ HOGE=hogehoge make  # If it is set as an environment variable, it is also treated as a make variable
hogehoge
hogehoge
$ make HOGE=hogehoge  # If specified as a command line argument, it is also set as an environment variable
hogehoge
hogehoge

Manipulate environment variables in a scripting language

Next, I will touch on how to manipulate environment variables in various scripting languages (Perl, Python, Ruby).

In these languages, you can manipulate environment variables as if you were manipulating an associative array (dictionary / hash). For Perl it is `%ENV`, for Python `os.environ` in the os module, and for Ruby `ENV`.

These are not ordinary variables; they are special data whose operations invoke C language APIs such as getenv and putenv.

The scripts that set the TZ environment variable and reflect it in the current time display are shown below.

`date.pl`


use POSIX;

print strftime("%c",localtime),"\n";
$ENV{TZ}="America/Los_Angeles";
print strftime("%c",localtime),"\n";

`date.py`


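import datetime
import os
import time

print(datetime.datetime.now().strftime("%c"))
os.environ["TZ"]="America/Los_Angeles"
time.tzset()  # Unix only: have the C library re-read TZ; without this the change is not guaranteed to be picked up
print(datetime.datetime.now().strftime("%c"))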

`date.rb`


puts Time.now.strftime("%c")
ENV["TZ"]="America/Los_Angeles"
puts Time.now.strftime("%c")

All three specify the same output format through the same API, strftime. You can see that the displayed time shifts by the time zone difference before and after the environment variable is changed.

`Execution of each script`


$ unset LANG
$ perl date.pl
Sun Dec  2 18:47:35 2018
Sun Dec  2 01:47:35 2018
$ python3 date.py
Sun Dec  2 18:47:40 2018
Sun Dec  2 01:47:40 2018
$ ruby date.rb
Sun Dec  2 18:47:44 2018
Sun Dec  2 01:47:44 2018

Precautions for environment variable operation

There is one caveat when working with environment variables during program execution.

Namely, **changing an environment variable does not always take effect immediately**. This is because some libraries look at environment variables only once, when the program starts and initialization is performed, and may not follow later changes.

In fact, the LANG environment variable that adjusts the language also has this aspect. As in the following Perl example, the change does not take effect unless you explicitly call `POSIX::setlocale` after changing the environment variable [^13].

`Reflection of LANG in Perl`


$ # If LANG is set from the beginning, the output is in Japanese (shown here in translation)
$ LANG=ja_JP.UTF-8 perl -CO -MPOSIX -E 'say strftime("%c",localtime)'
December 02, 2018 19:55:54
$ # Just changing LANG after startup does not switch the output to Japanese
$ LANG=C perl -CO -MPOSIX -E '$ENV{LANG}="ja_JP.UTF-8"; say strftime("%c",localtime)'
Sun Dec  2 19:56:21 2018
$ # The LANG change is reflected once setlocale is called
$ LANG=C perl -CO -MPOSIX -E '$ENV{LANG}="ja_JP.UTF-8"; setlocale(LC_ALL,""); say strftime("%c",localtime)'
December 02, 2018 19:56:34

For environment variables that are meant to be inherited by another program you start, you do not need to worry much. But if you want to change the behavior of your own process, pay attention to when the library picks up the change.

In conclusion

Summary

To summarize:

- What are environment variables?
  - A kind of parameter for adjusting the behavior of a program
  - Held and managed per process
  - In a multithreaded program, they are data shared between threads
- Differences from command line arguments
  - Both command line arguments and environment variables are parameters specified when a program is started.
  - Command line arguments correspond to an array of strings, and environment variables correspond to an associative array of strings.
  - Command line arguments are program-specific parameters and basically contain no superfluous information. Environment variables, on the other hand, are parameters referenced by various libraries and common functions rather than by the program itself, and may contain information that is not used (and is ignored).
- Handling of environment variables
  - As with command line arguments, the OS is only involved in passing the environment variables to the program at execution time.
  - However, the libraries that control program execution by default pass the environment variables held by the current process on to the program being executed.
  - As a result, various programs behave as if they share the same parameters, so environment variables are used as settings that control the execution environment of various programs.
  - However, the contents of the environment variables can differ greatly depending on what the starting program is, so it is better to be aware of where they are inherited from and how far their influence extends.
- Manipulating environment variables
  - Each process can add, change, and delete environment variables even after the program has started.
  - Environment variables are manipulated through the C language API as library functions (the OS is not involved).
  - Tools such as the shell and make can handle environment variables as an extension of the variables managed by the tool itself.
  - In scripting languages such as Perl, Python, and Ruby, environment variables can be manipulated as a kind of associative array (hash / dictionary).

[^1]: **Process**: Processes can be listed with the ps or top command, and they are distinguished by an ID called the PID.
[^2]: Some commands, such as /bin/true, always give the same result unless there is a system error, but such commands are not very useful.
[^3]: **Parameters to be specified when executing the program**: For details, see the execve(2) man page. execve is an OS API (system call) that executes programs.
[^4]: **Decomposed array data**: Of course, in the shell the user inputs the command as a single character string, but it is decomposed and passed to the OS as array data.
[^5]: **Read file**: The command line argument file /proc/PID/cmdline can be read by anyone, but the environment variable file /proc/PID/environ can only be read by the user running the process or by root.
[^6]: **Not reflected**: To tell the truth, it is not that there is no way to change the content afterwards, but I will omit it.
[^7]: **Name=value pair**: It is possible to pass data that is not in name=value format (the OS just passes on whatever data is specified). However, it is unclear what effect that would have, and probably nothing is guaranteed.
[^8]: **Possible problems**: Simply put, such environment variables can no longer be manipulated in the shell.
[^9]: **UNIX time**: UNIX time is time expressed as the number of seconds elapsed since 1970/1/1 00:00 (UTC), called the Epoch.
[^10]: **LANG environment variable**: That said, the LANG environment variable matters mainly in that it affects the language setting made by the library function setlocale, so it may arguably belong on the library side.
[^11]: **Standard library that controls execution**: This refers to the so-called exec family of library functions. For details, refer to the exec(3) man page. Of course, there are APIs (`execle`, `execvpe`) that let you specify the environment variables, but the other APIs inherit them.
[^12]: **Extension of normal variables**: In the case of csh / tcsh, the interface for setting and changing is separate (for environment variables, the `setenv` and `unsetenv` commands). However, as in bash, the contents can be referenced with `$name`.
[^13]: **Not reflected unless setlocale is called**: Conversely, you could say that `setlocale` is called only at program startup to set the language automatically. However, if some processing unrelated to `setlocale` is affected by the `LANG` environment variable, a change may take effect there immediately. If you do not need other programs to inherit the environment variable, it is quicker to specify the language directly with `setlocale` without resetting the `LANG` environment variable.

What are environment variables? (Linux)