Monday 9 December 2013

Server Hardware’s Fault ! - diagnosis & troubleshooting

I will cover some of the common hardware failure you might run into, along with the steps to troubleshoot and confirm them. 

Environment: CentOS_6.3 (32-bit)

Few of the more common pieces of hardware that fail are 
1. Hard drive(HDD).
2. RAM. 
3. Network card failures.
4. Server's temperature.

1. HDD

Hard drive manufacturers all have their own hard drive testing tools, modern hard drives should also support SMART(Self-Monitoring, Analysis and Reporting Technology; often written as SMART), which monitors the overall health of the hard drive and can alert you when the drive will fail soon. 

Install package smarttools, pass the -H option to smartctl to check the health of a drive:

# smartctl -H /dev/sda
smartctl 5.40 2010-10-16 r3189 [x86_64-redhat-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen,

SMART Health Status: OK 

The test has been passed without any errors. If it would be there then it would have displayed the WARNING/ERROR 

You can also pull much more information about a hard drive using smartctl with the -a option. That option will pull out all of the SMART information 
about the drive

#smartctl -a /dev/sda


Tool for detecting memory would be Memtest86  which needs to be installed additionally which is added automatically to your GRUB configurations.

No matter how you invoke it at boot time, once you start Memtest86, it will immediately launch and start scanning your RAM.

If there are found to be few errors, try to change a new RAM and again re-run the test.

3. Network card failures

When a network card or some other network component to which your host is connected starts to fail, you can see it in packet errors on your system. The 
ifconfig command you may have used for network troubleshooting before can also tell you about TX (transmit) or RX (receive) errors for a card.

# ifconfig eth0 | egrep "RX|TX"
          RX packets:2144865318 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2339638100 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000

If you start to see a lot of errors here, then it's worth troubleshooting your physical net-work components. It's possible a network card, cable, or switch port is going bad.

4. Server Temperature

A poorly cooled server can cause premature hard drive failure and premature failure in the rest of the server components as well.  

Linux provides tools that allow you to probe CPU and motherboard temperatures, and, in some cases, the temperatures of PCI devices and even fan speeds. All of this support is provided by the lm-sensors package.

Once the lm-sensors package is installed, run the sensors-detect as root :


This interactive script will probe the hardware on your system so it knows
how to query for temperature. 
Once the sensors-detect script is completed, you can pull data about your server by running the command sensors.

# sensors
Adapter: ISA adapter
Core 0:       +46.0°C  (high = +84.0°C, crit = +100.0°C)

Adapter: ISA adapter
Core 1:       +42.0°C  (high = +84.0°C, crit = +100.0°C)

Adapter: ISA adapter
Core 2:       +42.0°C  (high = +84.0°C, crit = +100.0°C)

Adapter: ISA adapter
Core 3:       +42.0°C  (high = +84.0°C, crit = +100.0°C)

Adapter: ISA adapter
Ch. 0 DIMM 0:  +58.0°C  (low  = +118.0°C, high = +124.0°C)
Ch. 1 DIMM 0:  +64.0°C  (low  = +118.0°C, high = +124.0°C)
Ch. 2 DIMM 0:  +72.0°C  (low  = +118.0°C, high = +124.0°C)
Ch. 3 DIMM 0:  +64.0°C  (low  = +118.0°C, high = +124.0°C)

You could sue sensors -f for the temperatures in Fahrenheit.

Examine the airflow around the server and make sure the vents in and out of the server aren't clogged with dust.If you have the bad habit of not rack mounting your servers but instead installing a shelf and stacking servers one on top of the other, that will also contribute to poor airflow and overheating. 

I would like to conclude the article by saying that these are the few of the common failures for the hardware faults.

Wednesday 16 October 2013

Recovering volume group metadata in LVM with UUID disk #CentOS_6_RHEL

Objective: Recover volume group metadata in LVM.

Environment: CentOS_6.3 (32-bit)

LVM version: 2.02.95

Brief introduction on LVM:

To use the device for an LVM logical volume the device must be initialized as a physical volume (PV), which on initializing places a label near the start of the device. By default, the LVM label is placed in the second 512-byte sector. The LVM label identifies the device as an LVM physical volume which contains the UUID for the physical name. It also stores the size of block devices in bytes, and LVM metadata stored on the disk.

The LVM metadata contains the configuration details of the LVM volume groups on your system. By default, an identical copy of the metadata is maintained in every metadata area in every physical volume within the volume group.

Error which I had received when accessing an disk, where metadata was corrupted.

# pvs -o pv_name,uuid
  Couldn't find device with uuid U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy.
- We would be able to find the UUID for the PV in /etc/lvm/archive directory, deactivating the volume and setting the partial(-P) argument will
enable you to find the UUID of the missing corrupted physical volume.

# vgchange -an --partial
  Partial mode. Incomplete logical volumes will be processed.
  Couldn't find device with uuid U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy.
  0 logical volume(s) in volume group "testvg" now active

# vgck testvg
  Couldn't find device with uuid U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy.
  The volume group is missing 1 physical volumes.

# pvck /dev/sdd
  Could not find LVM label on /dev/sdd

- pvcreate comes in handy restores the physical volume label with the metadata information contained in VG group(testvg). The restore file argument instructs the   pvcreate command to make the new physical volume compatible with the old one on the volume group, ensuring that the new metadata will not be placed where the old physical volume contained data.

NOTE: The pvcreate command overwrites only the LVM metadata areas and does not affect the existing data areas.

# pvcreate --uuid "U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy" --restore /etc/lvm/backup/testvg /dev/sdd
  Couldn't find device with uuid U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy.
  Writing physical volume data to disk "/dev/sdd"
  Physical volume "/dev/sdd" successfully created

- vgcfgrestore command to restore the volume group's metadata

# vgcfgrestore testvg
  Restored volume group testvg

# pvs -o pv_name,uuid | grep "U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy"
  /dev/sdd   U3qzRV-jakK-0JFC-zVmq-F1aa-nxrg-9JYsXy

Recovered volume metadata from LVM.

Thursday 10 October 2013

Disk Encryption - eCryptfs & dm-crypt + LUKS - #CentOS_6/RHEL_6

Objective :- Disk encryption techniques.

Environment :- CentOS release 6.3(32-bit)

I would like to discuss few techniques available in linux for cryptographically protecting a logical part of a storage disk(folder, partition, whole disk, ...), so that all data that is written to it is automatically encrypted, and decrypted on-the-fly when read again.

Below two methods are discussed as below :-

1.  eCryptfs 
2.  dm-crypt with LUKS


I would describe the basic use of eCryptfs, which will guide through the process of creating a secure and a private directory which can store your sensitive and private data.

This doesn't require special on-disk storage allocation effort, such as seperate partition, you can mount eCryptfs on top of any single directory to protect it.All cryptographic metadata is stored in the headers of files, so encrypted data can be easily moved, stored for backup and recovered. 

There are few of the drawbacks, for instance eCryptfs is not suitable for encrypting complete partitions, however instead you can combine it with dm-crypt. But in this article I would be combining dm-crypt with LUKS mechanism and demonstrate.

On summarizing, eCryptfs, a "pseudo-file system" which provides data and filename encryption on a per-file basis, it is a file system layer that resides on top of an actual file system,  providing encryption capabilities.

- Install the package.
#yum install ecryptfs-utils -y

Assume while creating a new file system(/db), I had created another file system layer that which resides on the top of the actual file system.
# df -h /db

Filesystem            Size  Used Avail Use% Mounted on
                      485M   11M  449M   3% /db

# mount -t ecryptfs /db /db  
Select key type to use for newly created files:
1) tspi
2) openssl
3) passphrase  
Selection: 3  
Select cipher:  
1) aes: blocksize = 16;min keysize = 16; max keysize = 32 (not loaded)  
2) blowfish: blocksize = 16; min keysize = 16; max keysize = 56 (not loaded)  
3) des3_ede: blocksize = 8; min keysize = 24; max keysize = 24 (not loaded)  
4) twofish: blocksize = 16; min keysize = 16; max keysize = 32 (not loaded)  
5) cast6: blocksize = 16; min keysize = 16; max keysize = 32 (not loaded)  
6) cast5: blocksize = 8; min keysize = 5; max keysize = 16 (not loaded)  
Selection [aes]: 
Select key bytes:  
1) 16  
2) 32  
3) 24  
Selection [16]:  
Enable plaintext passthrough (y/n) [n]:  
Enable filename encryption (y/n) [n]:  
Attempting to mount with the following options:  
Mounted eCryptfs  
- Now there are two /db file system, one which is an actual file system, another which resides on the top of the file system.
# df -h | grep db
                      485M   11M  449M   3% /db
/db                   485M   11M  449M   3% /db

- Any thing written on the /db file system will now be encrypted.
:/db]# cat >encrypt_file
All files and directories will be encrypted in this file system.

- I would now unmount /db, which is encrypted file system. viewing the actual file system will result in garbled random looking characters.
Reading the encrypted sectors without permission will return garbled random-looking data instead of the actual files.
#umount /db 

:/db]# tail encrypt_file

fÓxpu;¬0éb¶EyºlH$ÖR/'úÚÿ±' £¢?ß¹[]v­*ã*Ʀ¡²Ò
Æéþ½ rll9æ½vÿså¢GÙMøåÇûÀÅë·ÙúKoí:/db]#

Hence we have successfully encrypted the files in the file system.

Snap is provided which is self-explanatory.

dm-crypt with LUKS

dm-crypt is the standard device-mapper encryption functionality provided by the Linux kernel. It can be used directly by those who like to have full control over all aspects of partition and key management.

LUKS(Linux Unified Keystartup) is an additional convenience layer which stores all of the needed setup information for dm-crypt on the disk itself and abstracts partition and key management in an attempt to improve ease of use.

Summarizing, LUKS is a standard format for device encryption. LUKS encrypts the parttion or volume, the volume must be decrypted before the file system in it can be mounted.

- Create a new partiton on the disk.

- Encrypt the new partition and set the decryption password.
# cryptsetup luksFormat /dev/sdd1

This will overwrite data on /dev/sdd1 irrevocably.
Are you sure? (Type uppercase yes): YES
Enter LUKS passphrase:
Verify passphrase:

- You need to unlock the encrypted volume.
# cryptsetup luksOpen /dev/sdd1 cryptvg-cryptlv
Enter passphrase for /dev/sdd1:

- Create an ext4 file system on the decrypted volume.
[root@kickstart ~]# mkfs -t ext4 /dev/mapper/cryptvg-cryptlv

- Create a directory mount point and mount the file system.
# mount /dev/mapper/cryptvg-cryptlv /secret #

- When finished, unmount and lock the encrypted volume.
# umount /secret/

# cryptsetup luksClose /dev/mapper/cryptvg-cryptlv #

Persistant Mount Encrypted partition.

- Add your list of devices to be unlocked during the system startup.
# cat /etc/crypttab
Name         Device         /path/to/password
cryptvg-cryptlv /dev/sdd1       /root/encdisk

- Create an entry in fstab.
# tail -1 /etc/fstab
/dev/mapper/cryptvg-cryptlv     /secret         ext4    defaults        1 2

- Create a keyfile that includes the password. Make sure it is owned by the root and the mode is 600. Add the key for LKS.
#echo -n "passphrase" > /root/encdisk
#chown root /root/encdisk
#chmod 600 /root/encdisk
#cryptsetup luksAddKey /dev/sdd1 /root/encdisk

Reboot your system, it should not ask for any passphrase during the boot. Check your file system should be mounted automatically without providing the passphrase.

Providing the snap, which is self-explanatory.

NOTE: The device in the /etc/fstab and the /etc/crypttab should be the same. Since I made a mistake in these two files which made system to PANIC. This would be kernel was unable to boot with the name as in fstab.


Monday 30 September 2013

PXE boot[Installation/Rescue-Environment]with kickstart configurations - #linux_CentOS/Redhat

Objective: How to automate installation/Resuce(PXE using kickstart) CentOS.

I have already discussed about the kickstart installation in my previous post  CentOS kickstart

In my earlier post, I had generated kickstart file then boot from CD/DVD where I pointed the location of kickstart file either from HTTP/FTP/NFS method. But in this post I have used the same kickstart file to NIC of the destination server which acts like a boot device, eliminating CD/DVD.

Environment: RHEL/CentOS 6.3 (i386)

What is PXE ?

Preboot eXecution Environment is an environment to boot computers using a network interface independently of data storage devices (like hard disks) or installed operating systems.

PXE works with NIC of the system making it function like a boot device. PXE enabled NIC of the client sends out broadcast request to DHCP server, which returns with the IP address of the client along with the address of the TFTP server, and the location of the boot files on the TFTP server.

In detail :

1. Destination server is booted
2. NIC of the destination triggers DHCP request.
3. DHCP server intercepts the request and responds with the standard information(IP, subnet, gateway, etc). In addition it provides information about the location of a TFTP server and boot image(pxelinux.0)
4. When client receives this information, it contacts TFTP server for obtaining the boot image.
5. TFTP server sends the boot image(pxelinux.0) and client executes it.
6. By default, boot image searches the pxelinux.cfg directory in TFTP server.
7. The destination server downloads all the files it needs(kernel and root file system), and then loads them.

Make sure below packages are installed and services are started. [ Skipping package installation ]
1. tftp-server 
2. syslinux
3. dhcp

#grep disable /etc/xinetd.d/tftp
disable                 = no

#cp /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot/
#cp /usr/share/syslinux/menu.c32 /var/lib/tftpboot/
#cp /usr/share/syslinux/memdisk /var/lib/tftpboot/
#cp /usr/share/syslinux/mboot.c32 /var/lib/tftpboot/
#cp /usr/share/syslinux/chain.c32 /var/lib/tftpboot/
#mkdir /var/lib/tftpboot/pxelinux.cfg
#mkdir -p /var/lib/tftpboot/images/centos/i386/6.3
#cp /var/ftp/pub/images/pxeboot/initrd.img /var/lib/tftpboot/images/centos/i386/6.3
#cp /var/ftp/pub/isolinux/vmlinuz /var/lib/tftpboot/images/centos/i386/6.3

DHCP configuration file is as below..

# cat /etc/dhcp/dhcpd.conf

option domain-name      "";
option domain-name-servers;
default-lease-time 600;
max-lease-time 7200;
#################The followings are mandatory to boot from PXE ###################
allow booting;
allow bootp;
option option-128 code 128 = string;
option option-129 code 129 = text;
filename "/pxelinux.0";
subnet netmask {
        range dynamic-bootp;
        option broadcast-address;
        option routers;

# cat /var/lib/tftpboot/pxelinux.cfg/default

default menu.c32
prompt 0
timeout 30


LABEL CentsOS_6.3_i386
    MENU LABEL CentOS 6.3 i386
    KERNEL images/centos/i386/6.3/vmlinuz
    APPEND initrd=images/centos/i386/6.3/initrd.img ks=ftp://<IPaddress of kickstart file server>/pub/ks.cfg ramdisk_size=100000

LABEL CentsOS_6.3_i386_Rescue
    MENU LABEL CentOS 6.3 i386_Rescue
    KERNEL images/centos/i386/6.3/vmlinuz
    APPEND initrd=images/centos/i386/6.3/initrd.img ramdisk_size=100000 repo=ftp://<IPaddress of kickstart file server>/pub lang=en_US.UTF-8 keymap=us rescue 

This completes all your configuration, before this works make sure all your services are running and persistant.

#service xinetd start
#chkconfig xinetd on
#service dhcpd start
#chkconfig dhcpd on

you need to check in what method are you trying to kickstart your installation based on which your services should be up.
#service vsftpd/httpd/nfs start
#chkconfig vsftpd/httpd/nfs on

NOTE: Make sure your first boot device is network.

I have chosen to change the password in the rescue environment as a test and is successful.


Saturday 14 September 2013

automated installations - kickstart

Objective: How to automate installation(kickstart) CentOS

Environment: RHEL/CentOS 6.3 (i386)

What is kickstart ?

I had preferred to use an automated installation of CentOS-6.3 on my system. Redhat created kickstart installation method. I had created a kickstart file from system-config-kickstart utility and customized with my requirements to the file. This single file containing the answers to all the questions that would normally be asked during a typical installation.

This file is being kept in a single server and read by individual servers during installations. 

How did I perform kickstart ?

firstly, created a kickstart file.
secondly, I have made kickstart file available over the network.
thirdly, make the installation tree available.
lastly, started the installation.

I would not be describing here about how to create kickstart file with the GUI mode, I would leave it as an exercise for the reader. 
How to create kickstart file in GUI mode ? click here

I have tried using kickstart over network in below ways all of which were successful :-
1. FTP
3. NFS

Summary on kickstart script :-

- This was an text installation, non-interactive over FTP, NFS. HTTP, in which firewall and SeLinux was disabled. 

- There are two network interfaces one assigned with DHCP and another with static, in which host names are assigned.

- Grub password has been password protected.

- HDD will be erased and partitioned with LVM on which /, /tmp, /var, /usr, /home, and swap formatted with ext4 system. 

- Created user and assigned a password.

- Unnecessary services were been stopped

- %package% section all your required packages will be installed.

- %post% section can be executed with all your post installation scripts.
I have used a script to get all details of the system which is installed, Download

I had placed the script on a folder which was shared with NFS. I had to mount to the client, execute and stored result in a file.

- Once after the installation, CD/DVD will be ejected automatically.

Installation successful.

Kick-start installation 


Download the kickstart file. 
copy kick-start file to /var/ftp/pub. boot with CD/DVD, and with boot prompt, specify the method to be used for kickstart which is read by anaconda installer for installation.

boot: linux ks=ftp://<IPaddress of kickstart file server>/pub/ks.cfg 


Download the kickstart file.
copy to /var/www/html rootDirectory of HTTP server. Boot the CD/DVD and with boot prompt, specify the method to be used for kickstart which is read by anaconda installer for installation.

boot: linux ks=http://<IPaddress of kickstart file server>/ks.cfg


Download the kickstart file.
Copy to some of the director and share file to /etc/exports. 
Boot the CD/DVD and with boot prompt, specify the method to be used for kickstart which is read by anaconda installer for installation.

boot: linux ks=nfs:192.XXX.XXX.XXX:/share/ks.cfg

I here by like to stretch this topic to configure PXE so that I will eliminate booting from CD/DVD. 

I will cover PXE - automated kickstart installation for CentOS/Redhat in next post.

Tuesday 20 August 2013

Shorten your log analysis

Bigger the log file, it will be difficult for anyone to search during analysis, hence wanted to shorten log file based on the Day, Month, and most importantly time(Hour).

I have written script based on the hourly basics, which will retrieve your logs for requested day of the month.

I have tried on Redhat/CentOS.

Script can be found here, Download


echo "Enter the time stamp to search in log files"

read -p "Day: " DAY
read -p "Month[Eg aug..etc]: " MONTH
read -p "Hour[Eg 02, 10..etc]: " HOUR

echo -e "\e[00;31mLogs which occured in mentioned timestamp: $DAY"-"$MONTH"-"$HOUR":00" \e[00m"

if [ $DAY -lt 9 ]
cat $LOGFILE | grep "$HOUR:[0-5][0-9]" | grep -i -n "$MONTH $BLANK$DAY"
cat $LOGFILE | grep "$HOUR:[0-5][0-9]" | grep -i -n "$MONTH $DAY"

Monday 19 August 2013

Get System Information

This script was made to make my work easy. Let me explain that I needed to pull system information for the servers which I own. Since executing each and every command was an tedious job hence automated in the cron, which will email be once it collects the data. 

Since knowing all information at once was helpful, developed script. I will call this only as pre-release, because I think code must be still revised and should be much more short in lines which could save execution time.

Since I had created Github account long time, I took today to upload the script and sharing for all.

How to create git account from command line, click here
Code can be copy/pasted from the link,

This was tested only in CentOS/Redhat/OpenSUSE.
My first blog to get into Bash via GitHub. 

Thursday 8 August 2013

Backup/Restore your hard-drive - Linux

I had chosen to backup my hard-drive in-case of any failures.Below are the steps which was made to restore from the hard disk image which was chosen. 

I would here brief you about how could one create an image of an primary disk(i.e here in this case hard disk sda) was backed up to the other remote server.

I have used Power of Clonezilla

as backup and recovery tool.



Here is what it follows for cloning the image of the HDD(sda). Few snaps are omitted as it would be self explanatory.

1. Download an ISO image of CloneZilla and boot clonezilla live media

2. When asked to choose between starting Clonezilla and entering login shell, start clonezilla

3. I am selecting disk-partition as an image to backup.

4. Save an image to remote server.

5. Configure dynamic ip address on the current system. Later configure your IP address, PATH to remote host for saving the disk backup image.

6. Using the provided SSH server information, Clonezilla will use sshfs to mount the remote SSH server's destination directory locally. Press ENTER to continue.

7. Next, you will be asked to choose a cloning mode. Choose "Beginner mode" which accept all default options.

8. In the next screen, decide whether to back up a whole disk or a specific partition in the disk. Here I choose an entire disk.

9. Next, type in the name of a Clonezilla image to be created, and mark the disk to back up

Next, use "check saved image" option if you want to check whether or not an image is restorable after the image is generated.
Finally, Clonezilla will start generating a (compressed) image of a chosen disk drive, and transfer it to a remote SSH server.

11. You will see the following screen after the image has been successfully generated and verified. Press ENTER to continue.

Now you can 
power off the computer, reboot, or start over the same backup procedure for other disk drives/partitions.


Follow the same steps as above, choose "restoredisk" option to restore a whole disk from the image that was created in the past.

Next, choose a Clonezilla image to restore
and the destination disk.

Rest all taken care by Clonezilla.

Successful : I was able to boot to the server after restoration of the partition.