10 key tips for efficient Linux administrators

Good system administrators are distinguished by their efficiency. If an efficient system administrator can finish in 10 minutes a task that takes others 2 hours, that administrator should be rewarded (paid more), because the company saves time, and time is money. The following techniques can save you time. Even if you are not paid more for being efficient, you will at least have more free time.

Tip 1: Unmount the unresponsive DVD drive

A newcomer’s experience: when he pressed the Eject button on the DVD drive of a server running a certain Redmond-based operating system, the tray popped out immediately. He then complained that on most enterprise Linux servers the disk will not eject if a process is running in the mounted directory. For a long time as a Linux administrator, I would reboot the machine and then eject the disk whenever I could not work out what was running and why it would not release the DVD drive. But this is very inefficient.

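Here is how to find the process that is holding the DVD drive and eject the disk with little effort. First, simulate the problem. Put a disk in the DVD drive, open a terminal, mount the drive (for example with # mount /media/cdrom), change into the mounted directory, and leave a long-running command going there.

Now open a second terminal and try to eject the disk with # eject.

You will get a message saying that the device is busy.

Before forcing the device free, find out which processes are using it with # fuser /media/cdrom.

The process is still running, so it is actually our own fault that the disk cannot be ejected. If you are the root user, you can terminate the process at will with # fuser -k /media/cdrom.

Now you can finally unmount and eject the disk by running # eject again.

fuser is a handy tool.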
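
The steps above can be collected into a small helper. The sketch below is only an illustration: the function names, the DRY_RUN switch, and the /media/cdrom default are assumptions, and it uses fuser’s -m flag so that every process using the mounted file system is covered, not only those sitting in its top-level directory. By default it only prints the commands it would run.

```shell
# Sketch: free a busy mount point and eject the disk.
# MOUNT_POINT and DRY_RUN are illustrative names, not from the article.
MOUNT_POINT="${MOUNT_POINT:-/media/cdrom}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = really run them

run() {
    # In dry-run mode show the command; otherwise execute it as given.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

release_and_eject() {
    run fuser -vm "$1"    # list the processes using the mounted file system
    run fuser -km "$1"    # kill them (SIGKILL by default; needs root)
    run eject "$1"        # unmount and eject the disk
}

release_and_eject "$MOUNT_POINT"
```

Setting DRY_RUN=0 as root runs the commands for real. Check the process list first, because fuser -k does not ask for confirmation.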

Tip 2: Restore a garbled terminal

Try dumping a binary file to the terminal, for example with # cat /bin/cat.

Beware: the terminal now looks like garbage, and everything you type comes out scrambled. So what do you do?

Type reset. But wait, typing reset feels too close to typing reboot or shutdown, and your palms start to sweat, especially when you are doing this on a production machine.

Don’t worry, the machine will not restart when you do this. Go ahead and run # reset.

The screen is now back to normal. This is much better than closing the window and logging in again, especially if you had to hop through five machines over SSH to reach this one.

Tip 3: Screen collaboration

David, a senior maintenance user from product engineering, calls and says: “Why can’t I compile supercode.c on these new machines you deployed?”

You ask him: “What machine are you on?”

David replies: “Posh.” (This fictitious company named its five production servers after the Spice Girls.) Now you can show off your skills. Without leaving your own machine, become David with # su - david.

Log in to posh with # ssh posh.

Once there, start a named screen session with # screen -S foo.

Then call David: “David, run the command # screen -x foo in your terminal.”

At this point your session and David’s are joined in the same Linux shell. You can type and so can he, and each of you sees what the other is doing. This saves you a walk to another floor, and both parties have equal control. The added benefit is that David can watch your troubleshooting skills and see exactly how you solve the problem.

In the end, you both see the problem: David’s compile script hard-coded an old directory that does not exist on this new server. You mount it and compile again, the problem is solved, and David goes back to work. You can go back to whatever you were doing before.

One caveat of this technique is that both parties need to be logged in as the same user. The screen command can do more, such as multiple windows and split screens. Read the man page for details.

I have one last trick for screen sessions. To detach from a session and leave it running, press Ctrl-A D (that is, hold down the Ctrl key and press A, then press D). You can then reattach by running the screen -x foo command again.

Tip 4: Recover the root password

If you forget the root password, you might think you have to reinstall the entire machine. Sadly, many people do exactly that. But booting the machine and changing the password is very simple. This does not work in all situations (for example, if you set a GRUB password and forgot that too), but here is a CentOS Linux example to illustrate the general procedure.

First, reboot the system. As it restarts, the GRUB screen shown in Figure 1 appears. Press the arrow keys so that you stay on this screen instead of proceeding to a normal boot.

[Figure 1: The GRUB screen]

Then use the arrow keys to select the kernel you want to boot, and press E to edit the entry. You will then see the screen shown in Figure 2:

[Figure 2: The selected boot entry opened for editing]

Use the arrow keys again to highlight the line that begins with kernel, and press E to edit the kernel parameters. When you reach the screen shown in Figure 3, append the number 1 to the parameters shown there:

[Figure 3: Editing the kernel parameters]

Then press Enter and then B, and the kernel boots into single-user mode. Once there, run # passwd to change the root user’s password.

Now you can reboot, and the machine will start up with the new password.

Tip 5: SSH backdoor

Many times I have been at a site where I needed remote support from someone who was blocked on the outside by the company firewall. Few people realize that if you can get out to the world through a firewall, it is relatively easy to open a hole so that the world can come in to you. In its crudest form, this is called “poking a hole in the firewall.” I call it an SSH backdoor. To use it, you need a machine on the Internet that can act as an intermediary. In this example, that machine is called blackbox.example.com. The machine behind the company firewall is called ginger. The machine from which the support person works is called tech. Figure 4 shows how this is set up.

[Figure 4: The SSH backdoor setup with ginger, blackbox, and tech]

Here are the steps:

1. Check what you are allowed to do, and make sure you ask the right people. Most people will cringe at the idea of opening the firewall, but what they do not understand is that the connection is fully encrypted. Moreover, someone would have to hack the outside machine to get into the company. However, you may be the type who acts first and asks forgiveness later. Use your own judgment, and do not blame anyone else if it does not go your way.

2. From ginger, connect to blackbox.example.com over SSH with the -R flag. Assume you are the root user on ginger and that tech will need the root user ID to help with the system. The -R flag forwards port 2222 on blackbox to port 22 on ginger, which sets up the SSH tunnel. Note that only SSH traffic can reach ginger this way: you are not exposing ginger to the unprotected Internet. With your own blackbox account in place of user, the command is # ssh -R 2222:localhost:22 user@blackbox.example.com.

3. Once you are on blackbox, you only need to stay logged in. I usually run a small loop there, for example one that prints the date every few minutes, to keep the session busy. Then I minimize the window.

4. Now have your friend at tech connect to blackbox over SSH without any special SSH flags. You will have to give them the password for the blackbox account.

5. From blackbox, tech then connects to ginger through the forwarded port with # ssh -p 2222 root@localhost.

6. Tech is prompted for a password. The root password of ginger should be entered. Now you and the support person at tech can work together and solve the problem. You may even want to use screen together (see Tip 3).
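
As a rough sketch, the commands typed on each of the three machines can be summarized as below. The account name user on blackbox is a placeholder (the text does not name one), and the helper function names are purely illustrative. Each function only prints the command for its host, so the sketch is safe to run anywhere.

```shell
# Sketch of the SSH backdoor: who types what, and where.
RELAY="user@blackbox.example.com"   # "user" is a placeholder account name

# On ginger: forward port 2222 on blackbox back to ginger's own sshd (port 22).
on_ginger() { echo "ssh -R 2222:localhost:22 $RELAY"; }

# On tech: an ordinary login to blackbox, no special flags.
on_tech() { echo "ssh $RELAY"; }

# On blackbox (inside tech's login): enter the tunnel, which ends on ginger.
on_blackbox() { echo "ssh -p 2222 root@localhost"; }

printf 'ginger$   %s\n' "$(on_ginger)"
printf 'tech$     %s\n' "$(on_tech)"
printf 'blackbox$ %s\n' "$(on_blackbox)"
```

With the default sshd configuration, a port forwarded with -R listens only on the loopback interface of blackbox, which is why tech logs in to blackbox first and connects to localhost from there.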

Tip 6: Remote VNC sessions through an SSH tunnel

VNC, or virtual network computing, has been around for a long time. I typically find myself needing it when a remote server has some graphical program that is available only on that server.

For example, suppose that in Tip 5, ginger is a storage server. Many storage devices come with a GUI program to manage the storage controllers. These GUI management tools often need a direct network connection to the storage, and that network is sometimes kept in a private subnet. Therefore, the GUI can be reached only through ginger.

You can try connecting to ginger over SSH with the -X option and launching the GUI that way, but this uses a lot of bandwidth and you will suffer some painful waiting. VNC is a much more network-friendly tool and is available for nearly all operating systems.

Assume that the setup is the same as in Tip 5, but you want tech to get VNC access instead of SSH. In this case you do something similar but forward the VNC port instead. Perform the following steps:

Start a VNC server session on ginger, for example by running # vncserver -geometry 1024x768 -depth 24 :99.

These options tell the server to start with a resolution of 1024×768 and a color depth of 24 bits per pixel. On a really slow connection, a depth of 8 may be a better choice. The :99 specifies the display on which the VNC server is reachable. VNC port numbers start at 5900, so display :99 means the server is accessible on port 5999.
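
The display-to-port arithmetic is simple enough to check in the shell. The vnc_port helper below is an illustrative name, not part of any VNC package.

```shell
# TCP port for a VNC display number: 5900 + display.
vnc_port() {
    echo $((5900 + $1))
}

vnc_port 99   # display :99 -> 5999
vnc_port 2    # display :2  -> 5902
```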

When you start the session, you are asked to set a password. The user ID is that of the user who started the VNC server (root in this example).

Make an SSH connection from ginger to blackbox.example.com that forwards port 5999 on blackbox to ginger. On ginger, with your blackbox account in place of user, run # ssh -R 5999:localhost:5999 user@blackbox.example.com.

After running this command, you need to keep this SSH session open to keep the port forwarded to ginger. At this point, if you were on blackbox, you could reach the VNC session on ginger by running # vncviewer localhost:99.

That would forward the port through SSH to ginger. But we want tech to get VNC access to ginger, and for that a second tunnel is needed. On tech, open a tunnel over SSH that forwards local port 5999 to port 5999 on blackbox by running # ssh -L 5999:localhost:5999 user@blackbox.example.com.

This time the SSH flag is -L. Instead of pushing port 5999 to blackbox, it pulls the port from blackbox. Once on blackbox, you need to leave this session open. Now you are ready to use VNC from tech.

On tech, connect VNC to ginger by running # vncviewer localhost:99.

Tech now has a VNC session directly to ginger. Although the setup is a bit cumbersome, it beats running around to fix a storage array. It also becomes easier after you have practiced it a few times.

Let me add one more twist to this trick: if tech is running the Windows® operating system and has no command-line SSH client, tech can use PuTTY. PuTTY can be set up to forward SSH ports through the options in its sidebar. If the port were 5902 instead of 5999 as in this example, you would enter the settings shown in Figure 5.

[Figure 5: PuTTY port-forwarding settings]

With this setup, tech can point VNC at localhost:2 just as if tech were running the Linux operating system.

Tip 7: Check bandwidth

Imagine: Company A has a storage server named ginger that is NFS-mounted by a client node named beckham. Company A has decided that it needs more bandwidth out of ginger because many nodes need to NFS-mount ginger’s shared file system.

The most common and cheapest way to do this is to bond two Gigabit Ethernet NICs together. It is the cheapest because you usually have a spare NIC and a spare port available.

So they take this approach. But now the question is: how much bandwidth do they really get?

The theoretical limit of Gigabit Ethernet is about 128 MB/s. Where does this number come from? Look at this calculation:

1 Gb = 1024 Mb; 1024 Mb / 8 = 128 MB; “b” = “bits”, “B” = “bytes”

(Strictly speaking, network rates use decimal prefixes, so the line rate is 1000 Mb/s, or 125 MB/s. The binary figure is a slight overestimate.)
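
The same arithmetic can be checked in the shell. The sketch below also shows the decimal variant, since network line rates are normally quoted with decimal prefixes (1 Gb/s = 1000 Mb/s). The function name is an illustrative choice.

```shell
# Convert a link rate in megabits per second to megabytes per second.
mbit_to_mbyte() {
    echo $(($1 / 8))
}

mbit_to_mbyte 1024   # binary-style gigabit, as in the text -> 128
mbit_to_mbyte 1000   # decimal gigabit                      -> 125
```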

But what do you actually see, and what is a good way to measure it? One tool I suggest is iperf. It is packaged by most distributions and can also be built from source.

The tool needs to be installed on a shared file system that both ginger and beckham can see, or compiled and installed on both nodes. I compiled it in the home directory of the bob user, which is visible on both nodes.

On ginger, start iperf in server mode with # iperf -s -f M.

This machine acts as the server and prints the measured rates in MB/s.

On the beckham node, start the client against ginger, for example with # iperf -c ginger -P 4 -f M -w 256k -t 60.

Both screens then report the speed. On a normal server with Gigabit adapters, you will probably see about 112 MB/s. This is the normal bandwidth once the TCP stack and physical cables are taken into account. By connecting two servers back to back, each with two bonded Ethernet cards, I got about 220 MB/s.

In practice, NFS over a bonded network delivers around 150–160 MB/s. That still means the bonding is working as expected. If you see something much lower, you should check for a problem.

I recently ran into a case in which the bonding driver was used to bond two NICs that used different drivers. The performance was extremely poor, with about 20 MB/s of bandwidth, which is less than the NICs would have delivered without bonding!

Tip 8: Command line scripts and utilities

Linux system administrators can become more efficient by using command-line scripting with authority. This includes crafting loops and knowing how to parse data with utilities such as awk, grep, and sed. Doing so usually means fewer keystrokes and fewer chances for user error.

For example, suppose you need to generate a new /etc/hosts file for a Linux cluster that you are about to install. The long way would be to add the IP addresses in vi or another text editor. However, you can do it by taking the existing /etc/hosts file and appending the output of a short loop run on the command line.
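
A loop along these lines would do it. This is a sketch: it prints the entries to standard output so that they can be inspected first, and the gen_hosts function name is an illustrative choice.

```shell
# Print host entries for n001..n200 on 192.168.99.1..192.168.99.200.
gen_hosts() {
    ip=1
    for i in $(seq -w 200); do      # seq -w pads to equal width: 001..200
        echo "192.168.99.$ip n$i"
        ip=$((ip + 1))
    done
}

gen_hosts | head -n 3
```

Once the output looks right, it can be appended as root with gen_hosts >> /etc/hosts.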

This creates 200 host names (n001 to n200) with the IP addresses 192.168.99.1 to 192.168.99.200. Filling in a file like this by hand risks creating duplicate IP addresses or host names, so this is a good example of using the command line to eliminate user error. Note that this is done in the bash shell, the default on most Linux distributions.

As another example, suppose you want to check that the memory size is the same on every compute node of a Linux cluster. In most cases, a distributed or parallel shell would be the best tool. But for the sake of illustration, SSH is used here. Assume that SSH is set up to authenticate without a password. A single command line can then loop over the nodes, run free on each one, and collapse the results.
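
A command line consistent with the walkthrough that follows might look like the sketch below. The n001 to n200 host names come from the text. The check_cluster_mem and fake_node names and the REMOTE override are illustrative additions that let the pipeline be tried without a cluster.

```shell
# Ask each node n001..n200 for its total memory in MB and collapse duplicates.
# REMOTE defaults to ssh; override it only to test the pipeline locally.
check_cluster_mem() {
    for num in $(seq -w 200); do
        ${REMOTE:-ssh} "n$num" free -m | grep Mem | awk '{print $2}'
    done | sort | uniq
}

# Stub standing in for ssh: ignores its arguments and prints canned
# `free -m` style output (the numbers are made up).
fake_node() {
    printf '%s\n' \
      '       total   used   free' \
      'Mem:    7982   1024   6958' \
      'Swap:   2047      0   2047'
}

REMOTE=fake_node check_cluster_mem   # prints 7982 once: all "nodes" agree
```

On a real cluster, calling check_cluster_mem without the override runs the loop over SSH.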

Such a command line is quite terse. (It can be worse if you put regular expressions in it.) Let’s pick it apart and look at each piece.

First you loop from 001 to 200. The -w option of the seq command pads the numbers with leading zeros. You then substitute the num variable to build the name of the host to connect to over SSH. Once you have the target host, you issue a command to it. In this example:

1. The command says: use the free command to get the memory size in megabytes.

2. Take the output of that command and use grep to pick out the line containing the string Mem.

3. Take that line and use awk to print the second field, which is the total memory in the node. This is done on every node.

After the command has run on every node, the combined output of all 200 nodes is piped (|) to the sort command so that all the memory values are sorted. Finally, the uniq command removes the duplicates. The command results in one of the following cases:

1. If all nodes (n001 to n200) have the same memory size, only one number is displayed. This is the memory size as seen by each operating system.

2. If the nodes have different memory sizes, you will see several memory size values.

3. Finally, if SSH fails on a node, you will see some error messages.

This command is not perfect. If you find a memory value different from what you expect, you will not know which node has the problem or how many such nodes there are. Another command needs to be issued for that.
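
One possible follow-up is to group the hosts by reported size. The sketch below assumes that each node’s value has first been tagged with its host name, for example by echoing the host name next to the value inside the loop. The group_by_mem name and the sample values are made up for illustration.

```shell
# Read "host total_mb" pairs on stdin; for each distinct size print the
# size, how many nodes reported it, and which nodes they were.
group_by_mem() {
    awk '{ count[$2]++; hosts[$2] = hosts[$2] " " $1 }
         END { for (m in count) print m, count[m] ":" hosts[m] }' | sort -n
}

# Made-up sample: two nodes agree, one is short on memory.
printf '%s\n' 'n001 7982' 'n002 7982' 'n003 3955' | group_by_mem
```

For this sample the output is one line per size: 3955 with one node (n003), then 7982 with two nodes (n001 and n002).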

What this technique gives you is a quick way to check something and to learn immediately if something is wrong. Its value lies in the quick check.

Tip 9: Console reconnaissance

Some software prints error messages to the console, and these may not show up in your SSH session. The vcs devices let you check them. From within an SSH session, run the following command on the remote server: # cat /dev/vcs1. This shows you what is on the first console. You can also look at the other virtual terminals by using 2, 3, and so on. If a user is typing on the remote system, you will see what he typed.

In most data centers, a remote terminal server, KVM, or even Serial Over LAN is the best way to view this information, and these also provide the benefit of out-of-band viewing. The vcs devices offer a fast in-band method that can save you a trip to the machine room to look at the console.

Tip 10: Random system information collection

Tip 8 showed an example of using the command line to get information about the total memory in a system. In this tip, I offer a few other ways to collect important information from a system that you need to verify, troubleshoot, or support remotely.

First, gather information about the processor. This is easily done by running # cat /proc/cpuinfo.

This command gives information about the speed, number, and model of the processors. In many cases, grep can pull out the value you want. A check I often make is to determine the number of processors in the system. So if I have bought a dual-processor, quad-core server, I count the processor entries in /proc/cpuinfo.
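
A minimal sketch of that count follows. The count_cpus function name and its optional file argument are illustrative additions so that the count can be tried against a saved copy of the file. By default it reads /proc/cpuinfo, which exists only on Linux.

```shell
# Count logical processors by counting "processor" entries in cpuinfo.
# An optional argument names a different file to read.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Only run the live count where /proc/cpuinfo is available.
[ -r /proc/cpuinfo ] && count_cpus
```

The result counts logical processors, so hardware threads are included where the CPU provides them.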

The value I expect to see is 8. If it is not, I call the vendor and ask them to send me another processor.

Another piece of information I may need is disk information. It can be obtained with the df command. I usually add the -h flag so that the output is shown in gigabytes or megabytes. # df -h also shows how the disk is partitioned.

Last on the list is a way to look at the firmware of your system: a method to get the BIOS level and the firmware on the NIC.

To check the BIOS version, you can run the dmidecode command. Unfortunately, you cannot easily grep for the information, so paging through the output by hand is the less efficient fallback. On my Lenovo T61 laptop, the BIOS Information section of the output lists the vendor, version, and release date.

This is much more efficient than rebooting the machine and looking at the POST output. To check the driver and firmware versions of your Ethernet adapter, run ethtool with the -i flag, for example # ethtool -i eth0.

Concluding remarks

There are many tricks you can learn from someone who is proficient at the command line. The best ways to learn are:

1. Work with other people. Share screen sessions and watch how others work; you will discover new ways of doing things. You may need to swallow your pride and let others drive, but you can usually learn a lot.

2. Read the man pages. Reading man pages carefully, even for commands you know well, gives you deeper insight. For example, you might not have known that you can do network programming with awk.

3. Solve problems. As a system administrator, you are always solving problems, whether they were created by you or by others. This is called experience, and experience makes you better and more efficient.

The best administrators are more laid-back, because they find the fastest way to get a task done and finish it quickly, which leaves them more time for leisure.

Written by

Digital Nomad
