Continuing from the previous article, [What is standard input / standard output?](/Angel_p_57/items/03582181e9f7a69f8168), this is an attempt to explain concepts and basics that Linux / UNIX beginners may find difficult to grasp. This time, the topic is **environment variables**. Windows uses the same term, and I think it is conceptually similar, but this article covers only Linux / UNIX.
Explanations of environment variables that you often see tend to focus on usage, such as "if you add a directory to PATH, you can execute the commands in it" or "set the language with LANG", but this article does not really cover that. For standard environment variables, the Environment Variables chapter of the POSIX standard may be helpful (in English).
Simply put, environment variables are **a type of parameter for adjusting the behavior of a program**, and they are **held and managed per process**. You need to know about environment variables to work with Linux / UNIX simply because **there are situations where a program's behavior is adjusted through them**.
By the way, the term "process" used above refers to the unit of execution [^1] of a program on the OS. As you may know, in Linux / UNIX the same program can be executed many times, sometimes simultaneously, and the result obtained each time depends on the circumstances [^2].
The factors that change the result include the standard input / output explained in [What is standard input / standard output?](/Angel_p_57/items/03582181e9f7a69f8168), and, depending on the program, various configuration files. There are many such factors. However, from the OS's point of view, **only two types of parameters are specified when a program is executed, aside from the program file name** [^3]. One is the **command line arguments** and the other is the **environment variables**.
As a reminder, the command line arguments are literally **the command string itself that was specified to run the program**. For example, when you run the command `ls -l`, they are the array data [^4] obtained by decomposing it into the two strings `ls` and `-l`.
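As a rough illustration of this decomposition, the following Python sketch splits a command string roughly the way a POSIX shell would, using the standard `shlex` module. The helper name `to_argv` is made up for this example, and a real shell does more (globbing, variable expansion) than is shown here.

```python
import shlex


def to_argv(command_line):
    """Split a command string into the array of arguments a shell would pass on."""
    return shlex.split(command_line)


if __name__ == "__main__":
    print(to_argv("ls -l"))                # ['ls', '-l']
    print(to_argv("echo 'hello world'"))   # ['echo', 'hello world']
```

Note that quoting affects where the string is split, which is why the second example yields two elements rather than three.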
Since both command line arguments and environment variables are specified when a program is executed, it follows naturally that **environment variables, like command line arguments, are managed per process**.
Some readers may wonder about multithreaded programs. Environment variables are common within the process, that is, they are **data shared between threads**.
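To make the "shared between threads" point concrete, here is a small Python sketch. The function name `set_in_thread` and the variable name `HOGE` are just illustrative choices; the only claim is that a value set from one thread is visible from another thread of the same process.

```python
import os
import threading


def set_in_thread(name, value):
    """Set an environment variable from a separate worker thread."""
    def worker():
        os.environ[name] = value

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()


if __name__ == "__main__":
    set_in_thread("HOGE", "hoge")
    # The main thread sees the value that the worker thread set.
    print(os.environ["HOGE"])
```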
On Linux, this per-process information can be viewed through the proc filesystem.
For example, suppose two processes (PID 63 and 64) are running the same sleep program, as follows:
Status of sleep command
$ ps -fC sleep
UID PID PPID C STIME TTY TIME CMD
angel 63 5 0 17:51 pts/0 00:00:00 sleep 300
angel 64 5 0 17:51 pts/0 00:00:00 sleep 600
At this point, you can find out the command line arguments of each process and the **environment variables specified at execution time** by reading the files `/proc/PID/cmdline` and `/proc/PID/environ` [^5].
The contents of these files are separated by NUL characters (ASCII code 0), so they are easier to read if you convert the NULs to line breaks with the tr command. (You can confirm that NUL characters are used with the od command.)
Example of getting information from the proc file system
$ od -An -tc /proc/63/cmdline
s l e e p \0 3 0 0 \0
$ tr \\0 \\n < /proc/63/cmdline
sleep
300
$ tr \\0 \\n < /proc/64/cmdline
sleep
600
$ od -An -tc /proc/63/environ
P A T H = / b i n \0 H O G E = h
o g e \0
$ tr \\0 \\n < /proc/63/environ
PATH=/bin
HOGE=hoge
$ tr \\0 \\n < /proc/64/environ
PATH=/bin
UGE=uge
From this you can see that the sleep program with PID 63 was executed with the two strings `sleep` and `300` as command line arguments and the two entries `PATH=/bin` and `HOGE=hoge` as environment variables, while PID 64 was executed with `sleep`, `600` and `PATH=/bin`, `UGE=uge`.
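The same information can be read from a program. The following is a minimal Python sketch, assuming a Linux system with /proc mounted; the helper names `split_nul` and `read_proc` are my own and not part of any library.

```python
from pathlib import Path


def split_nul(data):
    """Split NUL-separated bytes (as found in /proc/PID/cmdline) into strings."""
    if not data:
        return []
    if data.endswith(b"\0"):
        data = data[:-1]          # drop the terminator of the last entry
    return [part.decode(errors="replace") for part in data.split(b"\0")]


def read_proc(pid, name):
    """Read /proc/<pid>/<name>. Linux only; environ needs suitable permissions."""
    return split_nul(Path(f"/proc/{pid}/{name}").read_bytes())


if __name__ == "__main__":
    if Path("/proc/self/cmdline").exists():
        print(read_proc("self", "cmdline"))
        print(read_proc("self", "environ")[:3])   # first few NAME=value entries
```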
So far, I have explained that command line arguments and environment variables are the parameters specified when a program is executed. Both exist to adjust the behavior of the program. So why are there two types? I do not know the actual reason (it is a matter of how UNIX was designed as an OS), but we can make a guess by looking at the differences between the two.
First is the difference in data format. Here again is an excerpt from the earlier example of reading the `/proc/PID/cmdline` and `/proc/PID/environ` files.
Excerpt from an example of getting information from the proc file system
$ tr \\0 \\n < /proc/63/cmdline
sleep
300
$ tr \\0 \\n < /proc/63/environ
PATH=/bin
HOGE=hoge
You can see that the command line arguments are just a sequence of strings, whereas the environment variables are a collection of `name=value` pairs [^7]. In other words, command line arguments are an array, while environment variables are an associative array.
An "associative array" is what some programming languages call a hash or a dictionary. That is, if the environment variables contain the data `HOGE=hoge`, the value `hoge` is associated with the key `HOGE`.
Surprisingly, there are no particular restrictions on the characters used, except that keys cannot contain the NUL character (ASCII code 0) or `=`. However, unusual keys can cause problems in practice [^8], so it is safer to stick to ASCII alphanumeric characters and underscores. (Uppercase letters are customary.)
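One way to see why `=` cannot appear in a key is to look at how an entry is taken apart. The sketch below (the names `parse_entry` and `to_mapping` are hypothetical) splits each entry at the first `=` only, so a value may contain `=` but a key may not.

```python
def parse_entry(entry):
    """Split one NAME=value entry at the first '=' only."""
    name, sep, value = entry.partition("=")
    if not sep or not name:
        raise ValueError(f"not a NAME=value entry: {entry!r}")
    return name, value


def to_mapping(entries):
    """Turn a list of NAME=value strings into an associative array (dict)."""
    return dict(parse_entry(entry) for entry in entries)


if __name__ == "__main__":
    print(to_mapping(["PATH=/bin", "HOGE=hoge"]))
    print(parse_entry("OPTS=a=b"))   # the value keeps its '='
```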
By the way, a key being absent from the environment variables and a key being present with an empty value (a zero-length string) are different states. (Some programs treat the two the same way, though.)
The following example runs the date command without the key `TZ`, which indicates the time zone, and then with the key present but its value empty. The difference is that the former uses the system default time zone and the latter uses UTC.
Difference in date behavior due to TZ (time zone information)
$ date #No key TZ, if system default is JST
Sunday, December 2, 2018 20:35:01 JST
$ TZ= date #Specify an empty string for key TZ and treat it as UTC
Sunday, December 2, 2018 11:35:08 UTC
$ date -u #Specify UTC as a command line argument
Sunday, December 2, 2018 11:35:12 UTC
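The distinction between "no key" and "key with an empty value" is also visible from a program. In this Python sketch, the function `describe` is made up for illustration; it accepts any mapping so that it can be tried without touching the real environment.

```python
import os


def describe(name, env=os.environ):
    """Report whether a key is absent, empty, or set to a non-empty value."""
    value = env.get(name)
    if value is None:
        return f"{name} is not set"
    if value == "":
        return f"{name} is set to an empty string"
    return f"{name} is set to {value!r}"


if __name__ == "__main__":
    print(describe("TZ"))               # depends on how this script was started
    print(describe("TZ", {"TZ": ""}))   # TZ is set to an empty string
```

Running it as `TZ= python3 script.py` and as plain `python3 script.py` should give different first lines, matching the two `date` cases above.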
By the way, as described above, the environment variables are a single collection of multiple entries, but saying something like "the value associated with the key `TZ` in the environment variables" is cumbersome, and nobody does. Instead, people generally say "the (value of the) `TZ` environment variable", as if each entry were an individual piece of data, and call `TZ` an "environment variable name" rather than a "key".
Next is the difference in usage between the two.
Some have the same kind of effect, like the `TZ` environment variable and the `-u` command line argument of `date`, and it is not uncommon for a program to offer both an environment variable and a command line argument with exactly the same effect.
In other words, the two are not chosen according to their "effect". The choice is rather made for administrative reasons.
Command line arguments are array data, while environment variables are an associative array. With the former, any unnecessary information that gets mixed in is troublesome to sort out, whereas with the latter, keys that a program does not use can simply be ignored.
That is, **command line arguments are used to specify only the information that is really needed**, while environment variables **can be used with some redundancy, merging the information intended for various programs and specifying it all together**.
Now let's look at the `TZ` environment variable again.
I gave an example with the `date` command earlier, but in fact it also affects various other programs that deal with time information, such as the file timestamps shown by `ls` and `stat`.
Effect of TZ environment variable on ls
$ ls -l test.txt # Timestamp output in the system default (JST in this case)
-rwx------ 1 angel angel 398 December 2 20:52 test.txt
$ TZ= ls -l test.txt # Timestamp output in UTC
-rwx------ 1 angel angel 398 December 2 11:52 test.txt
This is because the library function `localtime`, which calculates the year, month, day, hour, minute, and second from a timestamp (UNIX time [^9]), changes its behavior depending on the `TZ` environment variable. Put the other way around, programs such as `ls` are not themselves aware of this environment variable.
Let's think about this. When writing a program, it is rare to write all of its functionality from scratch; most of the processing relies on some library. What if every adjustable aspect of those libraries, however trivial to the program's main purpose, had to be adjusted **through command line arguments**? Interpreting the command line arguments alone would become terribly complicated, and even a small change to a library would have far-reaching consequences.
It therefore makes sense to divide the roles: **command line arguments for what relates to an individual program's own functions, and environment variables for common functions such as libraries**.
In some cases, even outside libraries, a common environment variable is used to control similar functions across multiple programs. The `LANG` environment variable [^10], which controls which language to use (English, Japanese, and so on), is a typical example.
So far, I have explained that environment variables are suited to adjusting functions that many programs have in common.
In addition, in standard libraries and various programming languages, the APIs that launch programs are often designed so that the launched program inherits the environment variables of the current process by default. [^11]
Because of these properties, once you set the `LANG` environment variable to, for example, `ja_JP.UTF-8` (Japanese, UTF-8), every process (program) started from there inherits the environment variable and changes its behavior to output Japanese (as long as it supports it).
Since various programs thus behave as if they shared the same parameters, environment variables are used **as if they were global settings that control the execution environment of all sorts of programs**.
But we have to be careful here. Environment variables are merely information handed over from the launching process through this "inherit by default" behavior. **The OS does not distribute common settings by some mysterious power**. So when you log in with ssh, when a job is run by cron, or when a server-side application is launched from a web server, the contents of the environment variables may differ completely depending on the launching process.
Therefore, regarding environment variables, it is better to be aware of how and from which program the information is inherited and the extent of its influence.
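The "inherit by default" behavior can be observed with a short Python sketch. It uses the standard `subprocess` module to start a child Python process; the helper `child_sees` and the variable name `HOGE` are only illustrative. With `env=None` the child inherits the current environment, and passing an explicit mapping shows that what the child receives is entirely up to the launching process.

```python
import os
import subprocess
import sys

# Code run by the child process: report what it finds under the key HOGE.
CHILD = "import os; print(os.environ.get('HOGE', '<unset>'))"


def child_sees(env=None):
    """Start a child Python process and return what it reports for HOGE."""
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ["HOGE"] = "hoge"
    print(child_sees())            # hoge: inherited from this process
    without = {k: v for k, v in os.environ.items() if k != "HOGE"}
    print(child_sees(without))     # <unset>: the launcher did not pass it on
```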
Up to this point, I have only talked about passing environment variables when a program is executed, but **a program can also manipulate its own environment variables after it has started (adding or deleting keys, or changing the value associated with a key)**.
In the following, we will look at the operation of environment variables in each situation.
When a program starts, it receives its environment variables from the OS, but subsequent operations are handled by a dedicated C language API. The OS is not involved. Consequently, changes made after startup are not reflected in the `/proc/PID/environ` file [^6]. Typical APIs include `getenv`, `setenv`, `putenv`, and `unsetenv`.
Note that **changing environment variables is not thread-safe**. In a multithreaded program, the environment variables are data shared between threads, so changing them without considering consistency between threads can cause malfunctions, and different threads cannot use different environment variables.
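As a rough check that changes made after startup stay inside the process and are not managed by the OS, the following Python sketch sets a variable and then looks for it in `/proc/self/environ`. It assumes Linux with /proc mounted, and the function name `visible_in_proc` and the variable name are hypothetical. On a typical Linux system I would expect the last line to print `False`, but treat that as an expectation rather than a guarantee.

```python
import os
from pathlib import Path


def visible_in_proc(name):
    """Return whether NAME appears in /proc/self/environ (None if unavailable)."""
    try:
        data = Path("/proc/self/environ").read_bytes()
    except OSError:
        return None        # not Linux, or /proc is not readable here
    prefix = name.encode() + b"="
    return any(entry.startswith(prefix) for entry in data.split(b"\0"))


if __name__ == "__main__":
    os.environ["HOGE_ADDED_LATER"] = "hoge"       # handled by the C-level API
    print(os.environ["HOGE_ADDED_LATER"])         # visible inside the process
    print(visible_in_proc("HOGE_ADDED_LATER"))    # what /proc shows for this process
```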
Since the shell's primary role is to control the execution of programs, I think manipulating environment variables in the shell is the most typical way to do it.
In a shell, **environment variables are handled as an extension of shell variables**.
Specifically, in bash, you can refer to and change environment variables in exactly the same way as ordinary variables [^12]. The only difference is whether the variable carries the attribute that marks it as an environment variable.
You can check whether a variable is an ordinary variable or an environment variable with the `declare` command. The `declare` command can also be used to add the environment variable attribute. (The `export` command is the more common way, though.)
Variable operation example
$ declare -p SHELL # SHELL is usually an environment variable (shown by -x)
declare -x SHELL="bash"
$ echo $SHELL # Can be referenced as $variable_name
bash
$ HOGE=hoge # A variable that is simply defined is an ordinary variable by default
$ declare -p HOGE # No -x, so it is an ordinary variable
declare -- HOGE="hoge"
$ echo $HOGE # Can be referenced as $variable_name
hoge
$ declare -x HOGE # Add the environment variable attribute (use +x to remove it)
$ declare -p HOGE # It has changed to an environment variable
declare -x HOGE="hoge"
$ HOGE=hogehoge # Once the attribute is set, it stays even if the value changes
$ declare -p HOGE # Still an environment variable
declare -x HOGE="hogehoge"
$ UGE=uge; export UGE # export also adds the environment variable attribute (the definition can be done in the same command)
$ declare -p UGE # It is an environment variable
declare -x UGE="uge"
$ unset UGE # Deletes ordinary variables and environment variables alike
$ declare -p UGE # Error, because it has been deleted
bash: declare: UGE: not found
Environment variables set in this way affect the behavior of the shell itself, as well as all commands executed from the shell.
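The difference the attribute makes can be checked from outside the shell as well. The Python sketch below runs a small `sh` script that assigns `HOGE` and then asks a child `sh` what it sees; the helper name `grandchild_sees` is made up, and it assumes an `sh` command is available on PATH.

```python
import os
import subprocess


def grandchild_sees(assignment):
    """Run `assignment` in sh, then have a child sh print its view of HOGE."""
    # The inner script is single-quoted so that only the child sh expands $HOGE.
    script = assignment + "; sh -c 'echo \"[$HOGE]\"'"
    env = {k: v for k, v in os.environ.items() if k != "HOGE"}
    result = subprocess.run(["sh", "-c", script], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(grandchild_sees("HOGE=hoge"))                # []: ordinary shell variable
    print(grandchild_sees("HOGE=hoge; export HOGE"))   # [hoge]: environment variable
```

Only the exported variable reaches the child, which is the practical meaning of "adding the environment variable attribute".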
If you want to see all of the environment variables, run the `declare -x` command without specifying a variable name. Alternatively, you can run the `printenv` command.
You can also set an environment variable temporarily just for one command, or run a command with the environment variables removed (so that they are not inherited).
Temporary variable changes
$ declare -p LANG TZ # LANG is Japanese UTF-8, TZ is unset (default time zone)
declare -x LANG="ja_JP.UTF-8"
bash: declare: TZ: not found
$ date -d 2018-12-25T00:00+0900
Tuesday, December 25, 2018 00:00:00 JST
$ LANG= TZ=America/Los_Angeles date -d 2018-12-25T00:00+0900 # Temporarily set LANG and TZ
Mon Dec 24 07:00:00 PST 2018
$ declare -p LANG TZ # The shell's own environment variables have not changed
declare -x LANG="ja_JP.UTF-8"
bash: declare: TZ: not found
$ ( exec -c date -d 2018-12-25T00:00+0900 ) # Run with environment variables cleared; output is no longer in Japanese
Tue Dec 25 00:00:00 JST 2018
$
In the above example, the `exec` command is wrapped in parentheses and run in a subshell because `exec` has the effect of **executing the program in a way that replaces the current shell**. The parentheses are not needed if the shell does not have to keep running afterwards. ~~Alternatively, simply executing `exec -c` without the parentheses also clears the environment variables of the running shell.~~ *I'm sorry, this description appears to have been a misunderstanding on my part, so I retract it.
However, in bash, temporarily setting environment variables and temporarily clearing them cannot be combined. If you want to do both, you can use the `env` command.
env command example
$ date # Unmodified state
Sunday, December 2, 2018 20:41:02 JST
$ env TZ=America/Los_Angeles date # Set TZ only
Sunday, December 2, 2018 03:41:12 PST
$ env - /bin/date # Clear the environment variables and run
Sun Dec 2 20:41:30 JST 2018
$ env - PATH=/bin:/usr/bin TZ=America/Los_Angeles date # Clear the environment variables, then set PATH and TZ
Sun Dec 2 03:41:44 PST 2018
In this way, you can simply set environment variables temporarily, clear them, or use both.
However, clearing the environment variables also clears the `PATH` environment variable, so the program file is no longer searched for automatically from the command name. You therefore need to specify the file name, such as `/bin/date`, or set the `PATH` environment variable again.
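For comparison, here is how the same two ideas (temporary setting and a cleared environment) might look when launching a program from Python rather than from the shell. This is only a sketch: the helpers `run_child` and `only` are names I made up, and the second call keeps `PATH` and `LD_LIBRARY_PATH` (if present) purely so that the child interpreter can still start on most systems.

```python
import os
import subprocess
import sys

# Code run by the child process: report its view of TZ and HOGE.
CHILD = "import os; print(os.environ.get('TZ'), os.environ.get('HOGE'))"


def run_child(env):
    """Start a child Python process with exactly the given environment."""
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def only(names, **overrides):
    """Build an environment holding just the named variables plus overrides."""
    env = {name: os.environ[name] for name in names if name in os.environ}
    env.update(overrides)
    return env


if __name__ == "__main__":
    os.environ["HOGE"] = "hoge"
    # Similar to `TZ=America/Los_Angeles command`: inherit everything, override TZ.
    print(run_child(dict(os.environ, TZ="America/Los_Angeles")))
    # Similar to `env - PATH="$PATH" TZ=... command`: pass only what is listed.
    print(run_child(only(["PATH", "LD_LIBRARY_PATH"], TZ="America/Los_Angeles")))
```

In both cases the environment of the launching process itself is left unchanged.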
As another example, I will introduce `make`, which treats the variables managed inside the tool and environment variables as closely related.
`make` is a tool traditionally used for building programs (compiling and linking programs spread over multiple files, installing them into system directories, and so on).
The following example creates a `Makefile`, the file that controls the behavior of `make`, and uses it to output the contents of a variable and of an environment variable. I don't have room to explain how to use make itself, but I hope you can see how the two are handled in a similar way.
Example of using variables by make itself
$ cat > Makefile # Create the Makefile first; end input with Ctrl-D
HOGE ?= hoge # Set a variable (only if not already set); refer to it with $(variable name)
default:
# Output the contents of the variable
	@echo $(HOGE)
# Output the environment variable (if set)
	@printenv HOGE
^D
$ make # printenv fails because a variable inside make is not an environment variable
hoge
Makefile:4: recipe for target 'default' failed
make: *** [default] Error 1
$ HOGE=hogehoge make # A variable set as an environment variable is treated like a make variable
hogehoge
hogehoge
$ make HOGE=hogehoge # A variable given as a command line argument is also set as an environment variable
hogehoge
hogehoge
Next, I will touch on how to manipulate environment variables in various scripting languages (Perl, Python, Ruby).
In these languages, you can manipulate environment variables as if you were manipulating associative arrays (dictionaries / hashes).
They are `%ENV` in Perl, `os.environ` of the os module in Python, and `ENV` in Ruby.
These are not ordinary variables; they are special data whose operations invoke C language APIs such as `getenv` and `putenv`.
The scripts below set the `TZ` environment variable and show it being reflected in the display of the current time.
date.pl
use POSIX;
print strftime("%c",localtime),"\n";
$ENV{TZ}="America/Los_Angeles";
print strftime("%c",localtime),"\n";
date.py
import datetime
import os
import time
print(datetime.datetime.now().strftime("%c"))
os.environ["TZ"]="America/Los_Angeles"
time.tzset()  # re-read TZ; recent Python versions may not pick up the change otherwise
print(datetime.datetime.now().strftime("%c"))
date.rb
puts Time.now.strftime("%c")
ENV["TZ"]="America/Los_Angeles"
puts Time.now.strftime("%c")
They all specify the same output format through the same API, `strftime`. You can see that the displayed time shifts by the time zone difference before and after the environment variable is changed.
Execution of each script
$ unset LANG
$ perl date.pl
Sun Dec 2 18:47:35 2018
Sun Dec 2 01:47:35 2018
$ python3 date.py
Sun Dec 2 18:47:40 2018
Sun Dec 2 01:47:40 2018
$ ruby date.rb
Sun Dec 2 18:47:44 2018
Sun Dec 2 01:47:44 2018
There is one caveat when manipulating environment variables while a program is running.
That is, **changing an environment variable does not always take effect immediately**. Some libraries look at environment variables only during initialization at program startup and may not follow later changes.
In fact, the `LANG` environment variable, which adjusts the language, has this aspect too.
As the following Perl example shows, the change does not take effect unless you explicitly call `POSIX::setlocale` after changing the environment variable [^13].
Reflection of LANG in Perl
$ #If LANG is set from the beginning, it will be output in Japanese
$ LANG=ja_JP.UTF-8 perl -CO -MPOSIX -E 'say strftime("%c",localtime)'
December 02, 2018 19:55:54
$ #Just changing LANG after startup does not make it Japanese
$ LANG=C perl -CO -MPOSIX -E '$ENV{LANG}="ja_JP.UTF-8"; say strftime("%c",localtime)'
Sun Dec 2 19:56:21 2018
$ #LANG changes are reflected by setlocale
$ LANG=C perl -CO -MPOSIX -E '$ENV{LANG}="ja_JP.UTF-8"; setlocale(LC_ALL,""); say strftime("%c",localtime)'
December 02, 2018 19:56:34
For environment variables that are meant to be inherited by other programs you launch, you don't need to worry much about this, but if you want to change the behavior of your own process, it is a good idea to pay attention to when the library picks up the change.
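A similar "tell the library to look again" step exists for the time zone in Python. The sketch below is Unix only and uses the standard `time.tzset`; the Python documentation says not to rely on a `TZ` change being picked up without it. The helper name `switch_timezone` is my own, and `UTC0` is a POSIX-style time zone string meaning "UTC with offset 0".

```python
import os
import time


def switch_timezone(tz):
    """Change TZ and ask the C library to re-read it (Unix only)."""
    os.environ["TZ"] = tz
    time.tzset()   # without this call, the old time zone rules may stay in effect


if __name__ == "__main__":
    print(time.strftime("%c %Z"))
    switch_timezone("UTC0")
    print(time.strftime("%c %Z"), time.timezone)   # offset from UTC is now 0 seconds
```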
So it is a summary.
[^1]: **Process**: You can list processes with the ps or top command; they are the things distinguished by an ID called the PID.
[^2]: Some commands, such as `/bin/true`, always give the same result unless there is a system-level error, but such commands do not do anything useful.
[^3]: **Parameters specified when executing a program**: For details, see the execve(2) man page. execve is the OS API (system call) that executes programs.
[^4]: **Decomposed array data**: Of course, in the shell the user types the command as a single string, but it is decomposed and passed to the OS as array data.
[^5]: **Reading the files**: `/proc/PID/cmdline` (command line arguments) can be read by anyone, but `/proc/PID/environ` (environment variables) can be read only by the user running the process or by root.
[^6]: **Not reflected**: Strictly speaking, it is not that there is no way to change the contents afterwards, but I will omit that here.
[^7]: **name=value pairs**: It is possible to pass data that is not in `name=value` form (the OS simply passes along whatever it is given). However, what effect that has is unclear, and probably nothing is guaranteed.
[^8]: **Possible problems**: Simply put, such environment variables can no longer be manipulated from the shell.
[^9]: **UNIX time**: UNIX time represents time as the number of seconds elapsed since 1970-01-01 00:00 (UTC), known as the Epoch.
[^10]: **LANG environment variable**: That said, the `LANG` environment variable matters because it affects the language setting made by the library function `setlocale`, so it may really be aimed at libraries after all.
[^11]: **Standard library that controls execution**: This refers to the so-called exec family of library functions. For details, see the exec(3) man page. Of course, there are APIs (`execle`, `execvpe`) that let you specify the environment variables, but the others inherit the current environment variables.
[^12]: **Extension of ordinary variables**: In csh / tcsh, the interface for setting and changing them is separate (the `setenv` and `unsetenv` commands for environment variables). However, as in bash, you can refer to the contents with `$variable_name`.
[^13]: **Not reflected unless setlocale is called**: Conversely, you could say that the language is set automatically because `setlocale` is called just once at program startup. However, processing that is affected by the `LANG` environment variable without going through `setlocale` may pick up a change immediately. If you do not need other programs to inherit the environment variable, it is quicker to specify the language directly with `setlocale` rather than resetting the `LANG` environment variable.