Setting up my Raspberry Pi to be the home server

When I first had a broadband connection, the cable company provided a modem, but it was up to me to provide the router. Not long after, I was having regular failures from the small proprietary routers that you could buy; a detailed examination showed they could not keep up with the massive amount of low-level ethernet protocol traffic coming from the cable side (which looks like one large ethernet community), apparently caused by a virus somebody on that network had. I put a Linux box in to see if I could get more information, and it was rock solid, staying up for almost a year before a power failure caused a reboot. However, keeping it running was crucial to my whole house's internet connectivity. Recent disk failures in this PC have highlighted its vulnerability, and I decided to see if I could do something about it.
Continue reading “Setting up my Raspberry Pi to be the home server”

Keeping my personal data backed up

I just read a post on Slashdot asking how to keep personal data safe. The questioner had just returned from Mexico with lots (he said 16GB worth) of photographs. Other people's comments about offsite backup and keeping too much data made me realise that some experiences I have had over the last couple of weeks were worthy of comment.

I have been looking at both offsite backup and reviewing my archived data.

Continue reading “Keeping my personal data backed up”

The Squeeze between a Rock and a Hard Place (installing a Debian Squeeze system)

I have just re-installed my Debian MythTV server now that I have bought bigger disks for it. In fact, with two new 1TB (i.e. approx 1000GB) disks, I have put them together as a RAID1 pair and use this box as my new home internet gateway.

I wanted to use Debian Squeeze because it will soon (sic) become Debian Stable, and it is the first release that supports the ext4 filesystem, which I feel is necessary for this important set of files.

However, I do not have a CD-ROM drive on this machine, so this is the story of installing Debian Squeeze using just USB memory. With the ease and availability of USB to SD card converters, it does not matter whether you use a memory stick or an SD card – the process is the same.

I have also decided to use extlinux as the boot loader. I have simple boot requirements, and I am nervous that using GRUB (the Debian standard) means relying on a program to generate the configuration file. What happens when I need to do something manually?

This blog entry does two things.

  1. It describes how to turn a USB memory stick into a bootable installation disk
  2. It describes how to set up extlinux as the boot loader

Creating a USB installation memory stick

The first job is to prepare the USB memory stick (or SD card). The majority will already be formatted with a bootable boot sector and a single partition formatted as FAT. If yours is not, it is straightforward under Linux to format it yourself. I tend to use cfdisk as my disk partitioning program because I find it easier to use, but it doesn't matter which you choose. Just mark the first partition bootable, and then format it using the command

mkfs -t vfat /dev/sdX1

where /dev/sdX is the device name of your USB stick when plugged in. You can also use

mkdosfs /dev/sdX1

directly if you prefer (I never remember the second, so I always use the first). mkdosfs is found in the dosfstools Debian package.
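If the stick needs repartitioning first, the same job can be done non-interactively with sfdisk instead of cfdisk. This is only a minimal sketch; the partition type c (W95 FAT32 LBA) is my assumption, and /dev/sdX is again your USB device, so double-check it before running anything destructive.

echo ',,c,*' | sfdisk /dev/sdX   # one bootable FAT32 partition covering the whole stick
mkfs -t vfat /dev/sdX1           # then format it as above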

The second step is to locate and download the latest installer images. The Debian installer consists of a kernel image and an initrd image. The kernel is obvious; what is less clear is that the initrd image is the software that controls the install process. Start by navigating to http://www.debian.org/devel/debian-installer/ and looking for the different architecture images listed under the heading “other images (netboot, USB stick, etc)” (at the time of writing the Squeeze alpha images are available, as are the daily build images – I have used the daily builds without problems).

Whichever of these two you choose, you will then need to go down one further level into the “hd-media” directory. In there you will find a vmlinuz image (the Linux kernel) and an initrd.gz file. Download both of them.

You will also need a .iso image, because when the installer starts it searches all the available partitions for a .iso image in the root directory. Obviously having it in the same root as the installer is ideal. If you have sufficient space to store a full CD image (approx 700MB) that is the best image to download; if not, you will have to choose one of the smaller images (netinst or business card). I found I needed the full CD image to support my ethernet hardware, so you may be forced to go for the large image anyway.

You can get the CD images from the same installer page – but then select the weekly snapshot CD image for your architecture as the one to download. Download CD1 – you don’t need any of the others.

Copy all three files (vmlinuz, initrd.gz and the .iso) to the USB drive once it has been mounted.
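As a small sketch of that step (the mount point /mnt is an assumption, and the .iso name is whichever image you actually downloaded):

mount /dev/sdX1 /mnt
cp vmlinuz initrd.gz debian-*-CD-1.iso /mnt/    # installer kernel, initrd and the CD image
# leave it mounted for the moment - syslinux.cfg is created next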

You are also going to put syslinux on the USB stick; however, for now just create a file syslinux.cfg in the root of the partition with the following contents

default vmlinuz
append initrd=initrd.gz

which will tell syslinux to boot the vmlinuz image with initrd=initrd.gz appended to the command line.

Finally, unmount the USB stick and then type

syslinux /dev/sdX1

to put the syslinux image on the partition's boot sector.

You should now have an installation USB stick 🙂

Installing extlinux as the boot loader
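As a minimal, hedged outline of what setting up extlinux involves once the system is installed (the device name /dev/sda, the root device /dev/md0, the kernel and initrd paths, and the location of mbr.bin below are all placeholders – check them against your own system before running anything):

apt-get install extlinux
mkdir -p /boot/extlinux
extlinux --install /boot/extlinux            # write the extlinux boot code into the directory
cat > /boot/extlinux/extlinux.conf <<'EOF'
# placeholder kernel/initrd names and root device - adjust to match your system
default linux
label linux
    kernel /vmlinuz
    append initrd=/initrd.img root=/dev/md0 ro
EOF
# chain-load from a standard MBR to the bootable partition
dd if=/usr/lib/extlinux/mbr.bin of=/dev/sda bs=440 count=1 conv=notrunc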

An update on Asterisk

This is just a quick blog entry to note that I now have a working environment of a local telephone system, including three telephones inside the house and two located with my daughter some way away.

I initially purchased a Linksys SPA3102 as a first attempt to get ordinary telephones working with Asterisk, but after that I discovered the Linksys PAP2T, which provides two analogue phone circuits per unit. The whole setup works fine.

I have even managed to get Asterisk to set up conference rooms and voicemail. Quality is really good.

Playing with Asterisk

Asterisk is the open source PBX software that can manage telephone lines. As well as classical hardware phones, and more importantly for me, it also manages VoIP traffic. I decided to give it a go after I received our last quarterly telephone bill and discovered that we had spent £110 talking to my daughter in that period.

An initial trial last week allowed my daughter to call in from outside and have a voice conversation with me inside the house. This was with two firewalls (one at her place and one at mine) in the path. Unfortunately, there was a problem with my microphone.

Over the weekend I extended that further, connecting to Free World Dialup and using their echo service to take a call and bounce it back to me.

I have also ordered a Linksys SPA3102 so that I can try connecting a real phone into the process.

I was worried that yet another application running on my server would bog it down, but it doesn't appear to be a problem: CPU load rises from an average of around 2% to about 5% when Asterisk is bridging a call and changing codecs (and therefore presumably decoding and then re-encoding the voice on the channel).

I still have a problem with sound quality, but it seems to get better with every tweak I make. Let's hope over the next week I can get it working fully.

Update to Tomcat 5.5

Last night I updated the underlying server for this web site from Tomcat 5 to Tomcat 5.5.

I wanted to make use of the ability of Tomcat 5.5 to take the context.xml file from within the application's .war file and automatically use it. This means that I can keep the development of this project together (under the git version control system) much more easily than having to separately edit a configuration file for each application added (or updated).

I had two rather subtle problems that are worth describing here: one to do with a change in the way Postgres works, and the second because the Tomcat documentation doesn't really explain fully what happens and there is a bug in the Debian distribution (Etch at the time of writing – the unstable version has the bug fixed).

Firstly the Postgres problem. I had previously created a database user called tomcat4 and another called apache2 to access the user database: tomcat4 was for the Java application, and apache2 was for the security access to this site managed by Apache. I tried to change these to tomcat and apache respectively so that I didn't have to change these parts of the application just because I upgraded a version. So I created new roles (users) in Postgres, only to find that the whole process of logging in stopped working. I eventually discovered that Postgres 8.1 had introduced the concept of roles that may or may not be allowed to log in, and that if I was not careful the default was no login (it's more complicated than that, but if it applies to you, you can read the documentation). Once I changed these values this part was working again.
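As a hedged sketch of the change involved (using the role names above, run as the postgres superuser; the exact statements you need may differ, so check the Postgres documentation):

su - postgres -c 'psql -c "ALTER ROLE tomcat LOGIN;"'    # allow the role to log in again
su - postgres -c 'psql -c "ALTER ROLE apache LOGIN;"'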

The second problem is exactly how Tomcat deals with the context.xml file it finds. It actually physically copies the context.xml file from the .war file to $CATALINA_HOME/conf/$ENGINE_NAME/$HOST_NAME/$APP_NAME.xml. Unfortunately the Debian distribution runs Tomcat under the tomcat55 username but makes the $CATALINA_HOME/conf directory owned by root with 755 access rights. So the file permissions were wrong, the context was not set up, and the resources to access the database were not created. As a result, as soon as the application tried to access the database it failed with a JDBC error.
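A minimal sketch of the sort of workaround this needs (the engine and host names Catalina and localhost are the usual defaults, but treat the exact path as an assumption for your own installation):

mkdir -p $CATALINA_HOME/conf/Catalina/localhost
chown -R tomcat55: $CATALINA_HOME/conf/Catalina/localhost    # let the tomcat55 user write its copied context files
chmod 755 $CATALINA_HOME/conf/Catalina/localhost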

Anyway, it is working now – I hope. So I am ready for the next step (my bug reporting section called iTrac).

Disaster Strikes

I ought to know better, there are loads of sayings that I should take heed of.

“Don’t fix what ain’t broke” is probably one of them, followed very closely by “If it can go wrong, it will go wrong”.

It was started by my daughter, who recently received some money for her 21st birthday and decided to build herself a new computer with some TV tuners, so she could watch TV whilst she was working at university.

I decided to try to improve the security of my hard drives. I was seduced by the thought that I could buy a 160GB hard drive for about £40, and by the long-term plan to add TV tuners to the server to provide the backbone of a TV server for the house. This would of course need storage for all the video files that we would collect.

But the little voice in my head tried to seduce me even more: “Hard drives are so cheap now that you could store your data archives on them rather than CD-ROMs”. So the plan was hatched. Buy two SATA hard drives and a PCI SATA controller and add them to the server as a new storage capability. Use some space for video, but set up the remainder as RAID and make secure storage for archive data. Order placed, parts delivered.

First problem: I installed the PCI card in the server, and whilst it would recognise the drives, if any drives were connected to it the machine would not get past the BIOS checks and would fail to boot. Calls to the hard drive manufacturer were no help – their advice was to buy a different card. Just before I did, I tried the card in my workstation PC, but no joy there either – the BIOS (at the latest release) would not boot up.

So, since the first SATA PCI card cost only a few pounds, I decided to take the plunge and buy another one. Bringing it back home, I tried it in the server – still no joy, it could not get past boot. I tried it in the workstation and it worked great, no problems.

So back to the drawing board for a rethink. If I took my four largest IDE drives from both machines (200, 80, 60 and 40GB) and put them in the server, I could have a complicated arrangement of RAID1, RAID5 and no RAID, with a liberal sprinkling of LVM, to create sufficient space to handle video, image and audio files for all the family, while the server continued its existing role as web, mail and internet gateway and as a network server for the home LAN. I could put the two 160GB SATA drives into a RAID1 mirror and use them for the workstation. The archive would have to stay on the workstation (where it had been in the old arrangement – that is, until it was written to CD-ROM after sitting there for 6 months).

In addition, I worked out a regime of backups, so that all key data stores on any one machine had a recent backup copy on the other, and the most important data store (my online database) would also have versioned backups stored in the archive.

Second problem: Goodness knows why, but this seemed a great time to move my server, which had been Sarge based, over to Etch. Why? Well, I had this idea to move to Tomcat 5 for my backend application server, and this was not supported on the old release. But I should have known better. I chose a time when there was a bug in libdevmapper which somehow prevented me accessing LVM volumes, so in the midst of all the potential turmoil of moving disks around, I suddenly couldn't boot the system. I think it took a whole weekend to sort that one out.

Third problem: All this rearrangement required a complicated movement of data between machines, as the IDE drives were all to be moved over to the server, reordered as to which IDE channel they were on, and their partition sizes changed. Still, I had mostly used LVM for my filesystems, so it should be easy to clear down spare partitions, turn them into new PV devices with pvcreate, add them to a volume group with vgextend, and use the pvmove command to move the data onto this spare space. Finally, the old location could be removed from the volume group with vgreduce, and the old partition released with pvremove. This should be the ideal way to clear down a drive and move it somewhere else.
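As a minimal sketch of that sequence (the device names and the volume group name vg0 are illustrative, not my actual layout):

pvcreate /dev/sdb1          # turn the spare partition into an LVM physical volume
vgextend vg0 /dev/sdb1      # add it to the existing volume group
pvmove /dev/sda3            # migrate all allocated extents off the old partition
vgreduce vg0 /dev/sda3      # drop the emptied partition from the volume group
pvremove /dev/sda3          # wipe the LVM label so the partition can be reused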

Wrong. The problem started when I was in the middle of a pvmove, went to another (pseudo) terminal and attempted to start a second pvmove. Unfortunately, at this point I started to learn the hard way that pvmove is a very fragile operation, and any LVM activity in parallel has a tendency to lock things solid, leaving corrupted metadata and sometimes (at what would be crucial times in my operations) lost data.

Fortunately, I have always been very careful with backups. I have had a regime of always having a backup copy, less than one day old, on another hard disk, so that in the event of a problem I could recover. This was one of the times that such a backup was invaluable. But another few valuable hours were wasted recovering the data, and I was now starting to feel vulnerable, because as I squeezed things into ever smaller partitions I had to switch off backups to avoid them filling up the space I was trying to free. Still, it would be finished in the next day or so – or would it?

Fourth problem: I had almost finished rebuilding my systems under RAID, moving data between machines, partitions, RAID arrays and LVM physical volumes.

I had organised my workstation to have 4 partitions on each of the disks, which apart from swap space were all formed into 3 RAID1 arrays. Root and /boot were mounted directly on RAID devices; the last RAID device was a huge LVM volume group from which I could allocate extra space as needed.

The server was a little trickier, with different sized disks and the need to allocate 100GB for video storage. In the end I have about 40GB in pairs (RAID1 arrays) for root, /boot and some LVM volumes holding key data; 120GB of RAID5 space built from 60GB on each of three disks; two 500MB RAID0 partitions for the log files; and about 110GB as a single LVM volume spread over the remaining small holes on all the disks. Because of its size, the 200GB disk was involved in lots of these and needed more than four partitions, so it needed some logical partitions. This was almost working, and needed a last couple of pushes to get there. And it was here that I made the crucial mistake. I was going to have to delete the backup of some data that I was about to move into the new partitions. Instead of moving it elsewhere to ensure I still had a backup copy, I just deleted the backup and started to move the real data into the space it had just occupied.

Guess what – no sooner had I moved the data than the disk complained that there was a problem with the partitions (in fact only one) in the logical partition area, and try as I might, there was no recovery and no backup.

This just happened to be the area that held

  • Web site, both the static and dynamic content
  • The photo gallery for the web site
  • The database behind the web site
  • The git repositories that are presented to the public

Fortunately, I have managed to recover most of this data from source. The web site and dynamic content applications were available from my development environment on the workstation, and a month-old copy of the database was also available there for testing purposes.

I do have the pictures to put back in the gallery (although I have lost the comments that I put against each picture), but I have to recreate the theme to match it into the rest of the web site.

The git repositories can just be recovered from my workstation repositories, except for some minor issues that will take a little time to resolve. These are:-

  • Specialist hooks, that created tarballs of releases for the download area
  • Gitweb configuration to make it work within my web site environment

So I am almost up and running again, and will get the rest up in a short while. I am currently engaged in another project (see articles to follow shortly) that is distracting me.

But what I do have now is a much stronger configuration than before. All key data is stored on mirrored disks, even less important data is stored on RAID5 (which requires only 2 out of 3 disks to be working), and all important data stores are backed up on the other machine.

But if I had known how hard this simple activity would be, and how long it would actually take, I would never have started it!

Backup and Archiving at Home

I have several computers at home, and it is important that they are properly backed up so that I do not lose data. I want to show an example of how this is done, but first a number of preliminaries.

  1. I have defined that backups should, where possible, be placed on a different disk to the source. Thus I should not lose data if I have a disk corruption or a hardware failure.
  2. There are certain directories (for example /etc, and the subdirectory mydocs in my home directory) in which I am changing files and would like to keep old versions, so that I can revert a change or ensure that when I delete a file a copy is archived for posterity.
  3. I break down my file layout into separate filesystems, and in particular, I have separated out:-
    • the backup directory (well it is on another disk)
    • my home directory
    • certain directories (particularly on my server) which are likely to contain massive amounts of data (such as /var/lib/svn where all the svn repositories lie)
  4. Where possible I am using LVM to manage most partitions as logical volumes, so that creating, deleting and resizing them is easy.
  5. Once a file changes in one of the special directories (such as /etc), the copied file is stored in one of several snapshot directories relating to points back in time. I have
    1. the latest snapshots
    2. daily snapshots from yesterday – up until one week old
    3. weekly snapshots up to one month old
    4. monthly snapshots up to 6 months old
    5. snapshots older than six months, which are assumed to be queued for eventual manual writing to CD, to be kept for ever.

So how do I do it?

Firstly, simple backup is done using rsync with the -aHxq and --delete switches. This causes the destination directory (and subdirectories) to become a copy (i.e. a backup) of the source directory (and subdirectories). The -x switch limits this to a single filesystem. Where I need to keep the changes to a specific directory, I also use the --backup-dir switch to write them into the latest snapshot directory.
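As a hedged example of what one of those rsync calls looks like (the source and destination paths here are illustrative, not my actual layout):

SNAP=/bak/archive/snap
# mirror /home onto the backup disk, staying on one filesystem, and divert any
# replaced or deleted files into the latest snapshot directory
rsync -aHxq --delete --backup --backup-dir=$SNAP/home /home/ /bak/home/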

Archiving the snapshot directory is done daily, just before the backup (so it's actually part of a daily backup script installed as /etc/cron.daily/backup). The snapshot is turned into the daily snapshot simply by using mv to rename the directory from snap to daily.1 (of course daily.1 should already have been renamed to daily.2 beforehand). Similar scripts for archiving only are placed in /etc/cron.weekly and /etc/cron.monthly.

The interesting trick comes when merging a daily snapshot into an already existing weekly snapshot (or weekly into monthly, or monthly into the CD archive). Using cp -alf just makes an additional hard link in the weekly snapshot to the file already in the daily snapshot (so it happens fast, as there is no file copying). Where a file already exists in the weekly snapshot it is replaced by the link (effectively overwriting the old version); where a file didn't already exist, a new link is simply created. If the old daily snapshot is removed at this point, this just unlinks the file from the daily snapshot but leaves it in the weekly one.

So here is the relevant code from the files

/etc/cron.daily/backup

#!/bin/sh

logger -t "Backup:" "Daily backup started"
ARCH=/bak/archive

if [ -d $ARCH/daily.6 ] ; then
    if [ ! -d $ARCH/weekly.1 ] ; then mkdir -p $ARCH/weekly.1 ; fi
    # Now merge in stuff here with what might already be there using hard links
    cp -alf $ARCH/daily.6/* $ARCH/weekly.1
    # Finally lose the rest
    rm -rf $ARCH/daily.6
fi
# Shift along snapshots
if [ -d $ARCH/daily.5 ] ; then mv $ARCH/daily.5 $ARCH/daily.6 ; fi
if [ -d $ARCH/daily.4 ] ; then mv $ARCH/daily.4 $ARCH/daily.5 ; fi
if [ -d $ARCH/daily.3 ] ; then mv $ARCH/daily.3 $ARCH/daily.4 ; fi
if [ -d $ARCH/daily.2 ] ; then mv $ARCH/daily.2 $ARCH/daily.3 ; fi
if [ -d $ARCH/daily.1 ] ; then mv $ARCH/daily.1 $ARCH/daily.2 ; fi
if [ -d $ARCH/snap ] ; then mv $ARCH/snap $ARCH/daily.1 ; fi

# Collect new snapshot archive stuff doing daily backup on the way

mkdir -p $ARCH/snap
...

/etc/cron.weekly/backup

#!/bin/sh
#	AKC - see below for history

ARCH=/bak/archive
if [ -d $ARCH/weekly.5 ] ; then
    # if any of the files only have one hard link, it needs to be passed on
    if [ ! -d $ARCH/monthly.1 ] ; then mkdir -p $ARCH/monthly.1 ; fi
    # Merge into monthly archive
    cp -alf $ARCH/weekly.5/* $ARCH/monthly.1
    # Finally lose the oldest weekly snapshot
    rm -rf $ARCH/weekly.5
fi

# Shift along snapshots
if [ -d $ARCH/weekly.4 ] ; then mv $ARCH/weekly.4 $ARCH/weekly.5 ; fi
if [ -d $ARCH/weekly.3 ] ; then mv $ARCH/weekly.3 $ARCH/weekly.4 ; fi
if [ -d $ARCH/weekly.2 ] ; then mv $ARCH/weekly.2 $ARCH/weekly.3 ; fi
if [ -d $ARCH/weekly.1 ] ; then mv $ARCH/weekly.1 $ARCH/weekly.2 ; fi
...

/etc/cron.monthly/backup

#!/bin/sh
#	AKC - see below for history

ARCH=/bak/archive
CDARCH=/bak/archive/CDarch-`date +%Y`
MACH=piglet

if [ -d $ARCH/monthly.6 ] ; then
    if [ ! -d $CDARCH ] ; then mkdir -p $CDARCH ; fi
    cp -alf $ARCH/monthly.6/* $CDARCH
    rm -rf $ARCH/monthly.6
fi

# Shift along snapshots

if [ -d $ARCH/monthly.5 ] ; then mv $ARCH/monthly.5 $ARCH/monthly.6 ; fi
if [ -d $ARCH/monthly.4 ] ; then mv $ARCH/monthly.4 $ARCH/monthly.5 ; fi
if [ -d $ARCH/monthly.3 ] ; then mv $ARCH/monthly.3 $ARCH/monthly.4 ; fi
if [ -d $ARCH/monthly.2 ] ; then mv $ARCH/monthly.2 $ARCH/monthly.3 ; fi
if [ -d $ARCH/monthly.1 ] ; then mv $ARCH/monthly.1 $ARCH/monthly.2 ; fi

...

UPDATE: As of 26th February 2011 the basic mechanisms shown in this post are still in use. However, some of the detail (the disk layout and partitions) is now wrong; nothing that detracts from the basic message. See also my recent post about keeping personal data backed up.

Open File Formats

The state of Massachusetts is mandating that all government documents should be in an open format. Quite right too. Any government department should ensure that all documents are produced in a form in which anyone can read the data – for ever, and without payment of licence fees to any third party.

The problem with the approach that they are taking is that it defines the Microsoft Office XML formats as open. Microsoft also appears to be offering a licence to read these documents that conforms to this definition of openness. However, I think this openness is illusory and should not be allowed. For me the key reasons are

  • The licence is extremely tightly worded to imply that whilst you might be able to develop software to read these formats, you can’t distribute this software to others
  • You can’t write software to write to these formats
  • The formats are defined arbitrarily by Microsoft and are not guaranteed not to change.

All this means that sometime in the future it could well be that future generations do not have access to applications which can read and manipulate these documents.

There is a standard, OASIS Open Document Format for Office Applications (OpenDocument), that can be used to exchange documents across the network. The next release of OpenOffice.org (2.0) will support this as its default format.

I would like to encourage everyone to adopt this standard as their default exchange mechanism. If we can build up enough momentum behind it, then a few years down the line we will have a standardised mechanism everyone can use – and hopefully prevent archive material from disappearing, never to be readable again.