If we want to develop our own method of secure storage for encrypted data, we need to understand how data is stored on a hard disk. This section explains in detail how the hard disk works and how DOS takes care of tracking files on disk.
I've purposely chosen DOS for this section as it offers many advantages over Windows and Linux for storing encrypted information, especially if you want to develop your own routines from scratch.
It's certainly possible to develop secure filing systems under Windows and Linux, but it involves much more hard work to ensure that you are completely safe. For example, you need to write additional code to ensure the swap files are clean after you have viewed any encrypted data. That's just a single example; there are many more I could give.
If you don't have the need for extreme encryption, then use whatever program you want under Windows or Linux. However, if you have data that's extremely sensitive... you had better read this and the following sections ;)
OK, even if you don't have a DOS system, then go out and buy another hard drive. They're cheap enough. Put DOS on it and follow the rest of this guide and you'll end up with a system that's secure enough to beat the most determined of people who might take an interest in your files ;)
You don't need another computer for this as DOS will quite happily exist on another drive alongside Windows or Linux. The
Let's face it, most of the information you might want to hide won't be specific to Windows or Linux, so use DOS to hide it and view it !
This section explains in detail how DOS works from startup. The
Unfortunately things get a bit heavy from now on. We need to take a look at hardware in detail. We also need to dabble in assembly language, or at least be able to understand it. Anyway, take a deep breath and go for it. I'm always willing to answer questions via email if you get stuck.
This section is quite large, so you might want to save this page to disk so you can read it at your leisure.
2 Physical Disk Structure
If you have ever taken a look at your BIOS Setup screens, you might have noticed disk parameters such as Cylinders, Heads, Sectors, Mode=LBA. You might also have come across the terminology of Sides, Tracks, Tracks per Side and Sectors Per Track.
So...What does this all mean ?
OK, let's start from the basics and work our way up. We will start with a 1.44MB floppy disk as an example.
The floppy disk has two sides (also referred to as heads). These are side (head) 0 and side (head) 1.
Each side has 80 concentric tracks starting at track 0 from the outer edge of the disk through to track 79 at the innermost part of the disk. Given the disk has two sides (heads), this means the disk has 2 x 80 tracks in total, i.e. 160 tracks.
The concept of the cylinder is especially important to speed up the reading and writing to hard disks with multiple platters, ie having many sides. The cylinder can be thought of as a vertical set of all tracks with the same number on each side. So when writing a file, the operating system will try to fill all tracks on a cylinder with the file to avoid the need to move the heads excessively, thus also reducing wear and tear on the disk as well as speeding up the operation. So for the floppy disk cylinder 0 is track 0 on sides 0 and 1, cylinder 1 is track 1 on sides 0 and 1 etc.
Tracks are split into sectors. For a 1.44MB floppy disk, each track has 18 sectors, with each sector holding 512 bytes of information.
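Therefore the size of a disk can be determined by the following formula :-
Disk size = Cylinders x Heads x Sectors per Track x Bytes per Sector
For the 1.44MB floppy disk this is :-
80 x 2 x 18 x 512 = 1,474,560 bytes (1440KB)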
Cylinder and Head numbers always start from 0, but sector numbers always start from 1. So for the floppy disk, it has 80 cylinders numbered 0 to 79, and 18 sectors per track numbered 1 to 18.
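As a quick check of the arithmetic, here is a minimal sketch in Python (not the assembly used elsewhere in this guide). The function name disk_capacity_bytes is just an illustrative choice.

```python
def disk_capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Capacity = cylinders x heads x sectors per track x bytes per sector."""
    return cylinders * heads * sectors_per_track * bytes_per_sector

# 1.44MB floppy: 80 cylinders (0-79), 2 heads (0-1), 18 sectors per track (1-18)
floppy = disk_capacity_bytes(cylinders=80, heads=2, sectors_per_track=18)
print(floppy)          # 1474560 bytes
print(floppy / 1024)   # 1440.0 KB, which is where the "1.44MB" label comes from
```

Note that the "1.44MB" name mixes units: it is 1440 blocks of 1024 bytes, not 1.44 million bytes.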
3 BIOS and Operating System Limits
In the early days of the PC hard disks were expensive and only had storage capacities of typically 20 to 40MB. There weren't many disk manufacturers and the range of disks available was so small that it was possible to just give a particular disk a type number. When you installed your hard disk, you didn't have to worry about the physical structure of the disk, you just selected its type number. The list is still there in today's BIOSes for backward compatibility, although I can't see that anyone running a 750MHz Pentium is going to be using a 20MB hard disk!
The problems began as hard disks fell in price and became available in ever increasing capacities. Neither the BIOS nor DOS software writers ever expected GB disks to become available, so there were limitations on the size of disk that could be recognised depending on which version of DOS you were running and which BIOS version you had.
Disk manufacturers often bundled low level software to enable the user to get the full capacity of the disk. There were other problems with DOS which also meant you had to partition your hard disk to make several smaller logical hard drives.
Even the last issued version of MS-DOS had problems. Although it could use big hard drives, you still had to partition drives into several smaller ones to enable the full disk capacity to be used as it had a maximum cluster value of 65535 and a maximum FAT size of 256 sectors.
The BIOS problem is still with us now. It only allows 10 bits to specify a cylinder, meaning we can have a maximum of 1024 cylinders. It only allows 6 bits to specify a sector number, thus limiting sectors per track to 63 (OK 6 bits gives 0-63, ie 64 discrete numbers, but sectors always have to start at 1, so that gives 63 sectors per track max). The odd thing is that the BIOS has always allowed 8 bits for the head number, giving up to 256 heads.
There aren't any disk manufacturers out there that produce hard disks with anywhere near that number of heads ! (or at least none that are available to the average PC user)
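To see where these limits lead, here is a rough Python sketch of how a CHS address is squeezed into the registers used by the INT 13h read and write calls (CH, CL and DH), and of the largest disk those bit widths can describe. The sector field is six bits wide and shares the CL register with the top two cylinder bits. The helper name pack_chs is made up for illustration.

```python
MAX_CYLINDERS = 1 << 10        # 10 bits: cylinders 0..1023
MAX_HEADS = 1 << 8             # 8 bits: heads 0..255
MAX_SECTORS = (1 << 6) - 1     # 6 bits, but numbering starts at 1: sectors 1..63

def pack_chs(cylinder, head, sector):
    """Return (CH, CL, DH) register values for an INT 13h CHS call."""
    if not (0 <= cylinder < MAX_CYLINDERS and 0 <= head < MAX_HEADS
            and 1 <= sector <= MAX_SECTORS):
        raise ValueError("CHS address outside BIOS limits")
    ch = cylinder & 0xFF                           # low 8 bits of the cylinder
    cl = ((cylinder >> 8) & 0x03) << 6 | sector    # top 2 cylinder bits + sector
    dh = head
    return ch, cl, dh

limit = MAX_CYLINDERS * MAX_HEADS * MAX_SECTORS * 512
print(limit)                     # 8455716864 bytes, ie the 8.4GB barrier
print(pack_chs(1023, 255, 63))   # (255, 255, 255)
```

Every bit of all three registers is set for the last addressable sector, which is why nothing beyond 8.4GB can be reached this way.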
The problem was partially solved in 1994 when BIOS manufacturers developed a method of addressing the disk called Logical Block Addressing (LBA). This still had the 1024 cylinder limit, however, the number of heads was artificially increased to enable the full disk capacity to be reached. The number of cylinders, heads and sectors stored by the BIOS for LBA mode does not reflect the true physical structure of the disk, however it is used to calculate a logical address, ie the position on the disk where the data is stored.
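Through the standard BIOS calls, LBA translation only works for disks up to 8.4GB (1024 cylinders x 256 heads x 63 sectors x 512 bytes). Most recent BIOS versions also have a mode called LARGE, which once again is a translation mode. Disks greater than 8.4GB can only be reached in full through the extended INT 13h calls, which take a linear sector number rather than a cylinder, head and sector.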
This won't affect our plans for providing an encryption storage method as we will be using BIOS calls to access the data, so we can use whatever LBA or LARGE parameters the BIOS uses when it does an auto-detect on the disk when it is first installed. All we will need to do is to supply a cylinder, head and sector number to any BIOS calls we make, and let the BIOS do the translation.
Incidentally, on the subject of disk sizes, the hard disk manufacturers define 1MB as being 1,000,000 bytes, whereas many operating systems and other disk software define 1MB as being 1,048,576 bytes which is why for example your BIOS will report your disk as being of a different size than CHKDSK (and many other disk utilities) do.
4 Formatting - An Overview
Formatting is a complex procedure, involving three key stages. First of all, the disk is low level (hard) formatted by the manufacturer before it leaves the factory. This low level format writes the actual sector ID fields on the media. The user of a new disk then has to perform stages two and three.
The second stage involves building up a partition table for the disk using the FDISK utility that comes with both DOS and Linux. The partition process basically lays down information on the hard disk to allow the BIOS and operating system to identify individual partitions, ie where they start and end, and what operating system is used on that partition. It also places a Master Boot Record (MBR) on the disk if the disk has an active partition, ie is bootable.
The third stage is slightly different under DOS, Linux and other operating systems. This involves storing the information needed by the operating system to boot the machine, store and index data files etc. Under DOS this is performed using the FORMAT command. With Linux, one uses the mkfs (make file system) command.
We will be taking an in depth look at what happens when you use FDISK and FORMAT with the DOS operating system to show the sort of things you need to consider when developing your own storage method. We will be developing our own versions of FDISK and FORMAT in following sections to allow us to create our own secure filing system.
5 The Low Level Format
As mentioned before, the low level format is carried out by the manufacturer before the hard disk leaves the factory. Most current BIOS versions do have this as an option, however, not all hard disks respond to this. In some cases, the disk can actually be rendered unusable, so take care to read the manual that came with your disk if you want to experiment with low level formatting. In any event, you don't have any control over settings for this, other than the chance to select an interleave, so it's not really worth considering using the BIOS low level format program. I might add a section on this at a later date after I have perfected my routines for rendering a disk apparently unreadable, even by the hard disk manufacturers and expert data recovery agencies ;)
I've been working on this on and off for about 18 months now, so don't hold your breath !
During the low level format, the manufacturer uses software to write directly to the disk controller rather than to the BIOS. This sets up the disk so that sectors can be recognised. You might have read that a sector consists of 512 bytes, however it is actually longer than this. The 512 bytes is the actual data area for the sector. There are other bytes either side of the data area which belong to that sector. These bytes aren't accessible to DOS, although the BIOS can access some of them via diagnostic calls through INT 13h.
The additional bytes are used for synchronisation and error control to ensure the controller can recognise the exact start and end of each sector.
The low level format also decides upon the actual sector numbering, and uses such methods as interleaving, track and cylinder skewing and multiple zone recording to get the absolute maximum capacity possible from the disk with the best possible access times. These techniques along with advancements in RLL encoding have led to the availability of massive hard drives over the past few years.
6 FDISK - Partition Table and Master Boot Record
6.1 Introduction
The first sector of a hard disk is always reserved for the partition table. This is common to all operating systems. It also contains the Master Boot Record (MBR) if the disk is bootable. The DOS FDISK command is used to partition the disk and write the MBR.
From now on I'll be using C to denote cylinder, H for head and S for sector.
The MBR starts at CHS=0,0,1 and the partition table is at offset 01BEh (The h stands for hexadecimal).
When you switch your PC on, the BIOS program goes through various initialisation and system testing routines, then it loads the first sector from either the hard disk or a bootable floppy disk to memory segment 0000h, offset 7C00h.
In the case of a hard drive, this is the MBR (also containing the partition table). For a floppy disk (under DOS), this sector is the boot sector. Either way, the BIOS then passes control to the code starting at 0000h:7C00h.
6.2 Master Boot Record Code
Here's what happens when you boot up your PC with a hard disk MBR created by DOS :-
The error messages start at 068Bh (008Bh in the MBR) and are as follows :-
6.3 Partition Table Entries
The partition table itself starts at 07BEh (01BEh in the MBR). There are four consecutive partition table entries, each consisting of 10h bytes, followed by the boot signature 55AAh. The format of each partition table entry is as follows :-
Although the MBR/Partition table for DOS only occupies a single sector, the complete track 0 (ie CHS=0,0,1 to CHS=0,0,sectors per track) is reserved as other operating systems may use more than just one sector. These additional sectors have been used in the past by disk manufacturers to hold translation code to get around DOS and BIOS limits for disk sizes.
It's also an area that has been used by virus writers in the past to ensure that their code gets executed every time you boot your PC !
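For readers who would rather poke at a saved copy of the MBR than read assembly, here is a hedged Python sketch that pulls the four entries out of a 512 byte sector. The field layout assumed here is the conventional PC one (boot flag, start head/sector/cylinder, system ID, end head/sector/cylinder, first sector, sector count). The sample values are made up for illustration: system ID 06h, 255 heads and the end CHS are assumptions chosen to fit a partition of 4192902 sectors that starts after a 63 sector track 0.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE   # offset of the first entry within the MBR
ENTRY_SIZE = 0x10                # each entry is 16 bytes

def parse_partition_table(mbr):
    """Return the non-empty partition entries of a 512 byte MBR sector."""
    if len(mbr) != 512 or mbr[0x1FE:0x200] != b"\x55\xAA":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        (boot, s_head, s_sec, s_cyl, sys_id,
         e_head, e_sec, e_cyl, first_sector, total) = struct.unpack(
            "<8B2I", mbr[start:start + ENTRY_SIZE])
        if sys_id == 0:              # system ID 0 marks an unused entry
            continue
        entries.append({
            "active": boot == 0x80,
            "system_id": sys_id,
            # the top two bits of the sector byte are cylinder bits 8 and 9
            "start_chs": (s_cyl | (s_sec & 0xC0) << 2, s_head, s_sec & 0x3F),
            "end_chs": (e_cyl | (e_sec & 0xC0) << 2, e_head, e_sec & 0x3F),
            "first_sector": first_sector,
            "total_sectors": total,
        })
    return entries

# Hypothetical MBR holding one active partition; the boot code area is left as zeros.
entry = struct.pack("<8B2I", 0x80, 1, 1, 0, 0x06,
                    254, 63 | (260 >> 8) << 6, 260 & 0xFF, 63, 4192902)
mbr = bytes(PARTITION_TABLE_OFFSET) + entry + bytes(3 * ENTRY_SIZE) + b"\x55\xAA"
parts = parse_partition_table(mbr)
print(parts[0]["start_chs"], parts[0]["end_chs"])   # (0, 1, 1) (260, 254, 63)
```

The signature is checked as the byte pair 55h, AAh at offsets 1FEh and 1FFh, which is how it appears in a hex dump of the sector.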
7 FORMAT - High Level Formatting
7.1 Introduction
OK. We've discussed what FDISK does, and have plowed through the assembly code for the MBR. As we found, the MBR loads the DOS boot sector to 0000h:7C00h (assuming the MBR and partition table pass a few tests).
Now we need to look at the DOS boot sector, which is placed on disk via the FORMAT command. We also need to take a close look at what else FORMAT does as it provides an indexing method to enable data files to be found on disk. If we can understand what is needed to do this by studying FORMAT in detail, we will be well on the way to understanding how we can hide encrypted data on disk by developing our own bespoke code to do this without it being public knowledge ;)
Unfortunately, this section involves further understanding of assembly code. Once again, I have presented assembly code with liberal comments.
So, let us continue and see how DOS stores and indexes information.
FORMAT does the following :-
7.2 Logical Sectors and Clusters
We have already seen that any sector on a hard disk is referenced by a CHS number. This is inconvenient for DOS as disks come in many different sizes with different CHS numbers. DOS uses a logical sector number format to index every sector on the disk. Logical sector numbering starts at 0 rather than 1, starting from CHS=0,0,1 (ie the MBR/Partition Table). Logical sectors are numbered sequentially, filling complete cylinders at a time to avoid unnecessary head movement when writing to sequential sectors on disk.
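The numbering scheme described above (sector 0 at CHS=0,0,1, then along the track, then across the heads of a cylinder, then on to the next cylinder) can be sketched in Python as follows. The function names are illustrative only, and the example uses the floppy geometry from earlier in this section.

```python
def chs_to_logical(c, h, s, heads, sectors_per_track):
    """Logical sector number, counting from 0 at CHS=0,0,1."""
    return (c * heads + h) * sectors_per_track + (s - 1)

def logical_to_chs(n, heads, sectors_per_track):
    """Inverse of chs_to_logical."""
    c, rem = divmod(n, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1          # sector numbers start at 1

# 1.44MB floppy: 2 heads, 18 sectors per track
print(chs_to_logical(0, 0, 1, 2, 18))   # 0
print(chs_to_logical(0, 1, 1, 2, 18))   # 18 (side 1 of the same cylinder comes next)
print(chs_to_logical(1, 0, 1, 2, 18))   # 36 (only then do the heads move)
print(logical_to_chs(2879, 2, 18))      # (79, 1, 18), the last sector on the disk
```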
The smallest piece of data the disk controller (through the BIOS routines) can read or write is a complete sector of 512 bytes. For floppy disks, DOS keeps track of all sectors on the disk. For disks of large capacity, this would involve maintaining a massive FAT. DOS uses multiple sector units called clusters to keep the size of the FAT down. This is a compromise however, and for large disks, leads to wasted disk space.
For example, on my 4.3GB drive, DOS has a cluster size of 64 sectors, ie 32KB. This means that if you write a 1KB file to disk, DOS uses a single cluster of 32KB for the file, thus wasting 31KB of disk space. This wasted space is often referred to as slack space. Wasting 31KB might not sound like much, but if you have say 2000 files on your disk which are 1KB in size, this actually uses 62.5MB disk space rather than 1.95MB !
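The slack space figures are easy to reproduce. This is a small Python sketch, with space_on_disk as a made-up helper name, that rounds a file size up to whole clusters.

```python
SECTOR_SIZE = 512

def space_on_disk(file_size, sectors_per_cluster):
    """Bytes actually consumed by a file once rounded up to whole clusters."""
    cluster_size = sectors_per_cluster * SECTOR_SIZE
    clusters = -(-file_size // cluster_size)      # ceiling division
    return clusters * cluster_size

used = space_on_disk(1024, 64)       # a 1KB file with 64 sectors per cluster
print(used - 1024)                   # 31744 bytes of slack, ie 31KB
print(2000 * used / 2**20)           # 62.5 MB consumed by 2000 such files
print(2000 * 1024 / 2**20)           # 1.953125 MB of real data
```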
Other operating systems allow you to choose the smallest unit of storage space, so if you know you are going to have many small files, you can choose a small cluster size. This is particularly important for Linux for example, where you might have several thousand small script files.
One way round this problem in DOS is to partition your drive, leaving a large partition for standard files and creating smaller partitions to store small files. DOS reduces the number of Sectors per Cluster for small partitions. The following table shows what cluster sizes DOS uses depending on the size of the disk
Note that for disks greater than 2GB, DOS needs you to create multiple partitions as it allows a maximum FAT of 256 sectors and a maximum of 65535 clusters.
Cluster numbering starts at 2, the first cluster being located at the beginning of the Data Area.
7.3 The Boot Sector
7.3.1 Introduction
The Boot Sector under DOS occupies the first sector immediately after the end of track 0, ie at CHS=0,1,1. It is copied from the hard disk to memory location 0000h:7C00h by the MBR on system startup and can be divided into 6 main areas, as indicated by the following table.
The second area is the only part that is machine specific as it contains information relating to your own hard drive. All other areas should be the same from machine to machine.
I'll be using the Boot Sector from my 4.3GB hard drive to show what everything means as we look at the boot sector in detail. All address offsets will be in hex, and will reflect the 0000h:7C00h memory location that the boot sector is loaded to.
One thing to remember when you try to decipher the boot record is that data is stored in what is known as "Little Endian" format, ie you need to reverse the order of the bytes to get the real value !
The following table shows you how to do this :-
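As a small illustration of the byte reversal, here is how it looks in Python. The four bytes used are the value 4192902 (the sector count of the drive used in this section) written out in little endian order; the two byte value is simply 512.

```python
import struct

raw = bytes([0x86, 0xFA, 0x3F, 0x00])      # as the bytes appear in a hex dump
print(int.from_bytes(raw, "little"))       # 4192902, ie 003FFA86h
print(hex(int.from_bytes(raw, "little")))  # 0x3ffa86

# struct does the same job: "<" means little endian, "H" is a two byte value
print(struct.unpack("<H", bytes([0x00, 0x02]))[0])   # 512, ie 0200h
```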
7.3.2 Assembly Language Jump to Boot Code
Nice and easy to understand. The MBR (which we looked at previously) loads the boot sector to 0000h:7C00h and passes control to it. The boot sector then jumps directly to the code that will load DOS.
We'll now take a look at the second area, which contains disk parameters DOS needs to refer to.
7.3.3 Disk Parameters Needed by DOS
As mentioned before, the second section will be specific to your own hard drive. In this example I am going to use the boot sector from my 4.3GB hard drive. As such, I need to explain how I set this up using the BIOS and FDISK to allow you to understand the second section of the boot sector.
FDISK wouldn't let me partition the lot as a single drive as it would exceed DOS 6.22's limit for a 256 sector FAT !
FDISK allowed me a primary partition of :-
OK, let's take a look at the disk parameters :-
Most of the table is self explanatory, however, there are a few entries that need to be explained :-
Offset 7C13 has the sectors on disk as being 0. Going back to early versions of DOS, there was a maximum partition size of 32MB. Sectors on disk could fit into this entry quite easily. Anything bigger than a 32MB partition under DOS requires more than two bytes to give the size. If the entry at 7C13 is zero, then the entry at 7C20 gives the number of sectors on the disk, ie 4192902 in the case of my hard drive.
Offset 7C1C talks about hidden sectors. All it means is that the MBR occupies a single sector, but the rest of the track is reserved before the start of the boot sector. In the case of this hard disk, the MBR occupies CHS=0,0,1 to CHS=0,0,63. A total of 63 sectors are therefore used for the MBR, hence these are called hidden sectors.
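Here is a hedged Python sketch of reading this parameter area from a saved copy of the boot sector. It assumes the usual DOS layout starting at offset 0Bh in the sector (7C0Bh in memory), which agrees with the 7C13, 7C1C and 7C20 offsets discussed above. The media descriptor (F8h) and head count (255) in the sample data are assumptions, not values taken from the author's disk.

```python
import struct
from collections import namedtuple

BPB = namedtuple("BPB", "bytes_per_sector sectors_per_cluster reserved_sectors "
                        "fat_count root_entries total_sectors_16 media_descriptor "
                        "sectors_per_fat sectors_per_track heads hidden_sectors "
                        "total_sectors_32")
BPB_FORMAT = "<HBHBHHBHHHII"   # little endian, no padding, 25 bytes in all
BPB_OFFSET = 0x0B              # 7C0Bh once the sector is loaded at 0000h:7C00h

def parse_bpb(boot_sector):
    size = struct.calcsize(BPB_FORMAT)
    return BPB._make(struct.unpack(BPB_FORMAT,
                                   boot_sector[BPB_OFFSET:BPB_OFFSET + size]))

def total_sectors(bpb):
    """Use the two byte count at 13h unless it is zero, then the one at 20h."""
    return bpb.total_sectors_16 or bpb.total_sectors_32

# Sample sector built from the figures quoted in this section (plus assumptions)
fields = (512, 64, 1, 2, 512, 0, 0xF8, 256, 63, 255, 63, 4192902)
sector = bytes(BPB_OFFSET) + struct.pack(BPB_FORMAT, *fields)
sector += bytes(512 - len(sector))

bpb = parse_bpb(sector)
print(bpb.hidden_sectors)    # 63
print(total_sectors(bpb))    # 4192902
```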
We also need to take a diversion here to look at how DOS uses information from this section of the boot sector.
If you have calculated the number of sectors on disk from the CHS parameters, you will have noticed that this does not match up to the Big Total Sectors value that DOS puts into the boot sector.
DOS does not count the number of hidden sectors, hence the 63 difference.
The number of clusters can be calculated from information in the boot sector. We need to find out how many sectors are used for the boot sector, the two FATs and the root directory. For my hard drive that is 1 + (2 x 256) + 32 = 545 sectors.
The number of clusters is :-
(4192902 - 545) / 64 = 65505.58
Clusters have to be an integer number so we discard the number after the decimal point, so we have 65505 clusters.
If you add up the system area sectors and cluster sectors you will find that there are 37 sectors left over. These sectors can never be used by DOS as they are insufficient to form a cluster, so are wasted space.
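The cluster count and the 37 wasted sectors can be checked with a few lines of Python, using the figures quoted in this section (one boot sector, two FATs of 256 sectors, 512 root entries of 32 bytes, 64 sectors per cluster).

```python
total_sectors = 4192902              # Big Total Sectors from the boot sector
boot_sectors = 1
fat_sectors = 2 * 256                # two copies of a 256 sector FAT
root_dir_sectors = 512 * 32 // 512   # 512 entries of 32 bytes = 32 sectors
sectors_per_cluster = 64

system_area = boot_sectors + fat_sectors + root_dir_sectors
clusters, leftover = divmod(total_sectors - system_area, sectors_per_cluster)
print(system_area)   # 545
print(clusters)      # 65505
print(leftover)      # 37 sectors too few to make another cluster
```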
7.3.4 Assembly Language Boot Code
Here's what the assembly boot code looks like :-
I bet you're glad that's finished ;)
So what the boot sector does is primarily to check if we have a DOS operating system present (on the basis that IO.SYS is where it should be), and if so, it copies the first 4 sectors of IO.SYS to 0070h:0000h and passes control to IO.SYS.
IO.SYS continues with its own setup procedure then passes control to MSDOS.SYS, which also has its own stuff to process before passing control to COMMAND.COM which ends up giving you your C:\ prompt (after processing CONFIG.SYS and AUTOEXEC.BAT if they are present).
I could go further and put up the code for IO.SYS, MSDOS.SYS and COMMAND.COM, but this would end up as a MASSIVE page if I did, and it's big enough already !
We did need to look at the boot record code in detail as we will be modifying it in later sections as part of our intended encryption procedures.
7.3.5 Error Message
Here's the error message :-
Basically, if anything goes wrong, you are prompted to put in a DOS disk and press any key for the "red hot" reboot.
This can easily happen for example if you have a non system disk in drive A and have that drive setup in the BIOS as primary choice for bootup.
7.3.6 System File Names
System file names are padded with spaces to use the DOS 8.3 (filename.extension) format :-
Note the full stop is missing between the filename and extension. DOS automatically assigns this.
DOS expects these files to be the first files on the disk, ie that IO.SYS will start at cluster 2, and MSDOS.SYS will start immediately after IO.SYS.
If not, then DOS will complain and won't be able to boot !
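The padding rule is simple enough to sketch in Python: the name is padded with spaces to 8 characters, the extension to 3, and the full stop is dropped. The function names below are illustrative only.

```python
def to_directory_name(filename):
    """Turn 'IO.SYS' into the 11 character padded form stored on disk."""
    name, _, ext = filename.upper().partition(".")
    if not 1 <= len(name) <= 8 or len(ext) > 3:
        raise ValueError("not a valid 8.3 name")
    return name.ljust(8) + ext.ljust(3)

def from_directory_name(field):
    """Turn the padded 11 character form back into 'NAME.EXT'."""
    name, ext = field[:8].rstrip(), field[8:11].rstrip()
    return name + "." + ext if ext else name

print(repr(to_directory_name("IO.SYS")))      # 'IO      SYS'
print(repr(to_directory_name("MSDOS.SYS")))   # 'MSDOS   SYS'
print(from_directory_name("IO      SYS"))     # IO.SYS
```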
7.3.7 DOS Signature
The usual DOS signature (55AAh) has to be at location 7DFE for DOS to boot.
7.4 The File Allocation Table
7.4.1 Introduction
Finally we are about to find out how DOS keeps track of files on your computer.
OK. So far we have the MBR/Partition Table at CHS=0,0,1. We also have the boot record at CHS=0,1,1 (remember that the whole of track 0 is reserved for the MBR even though it usually only uses a single sector).
The start of the first copy of the FAT comes immediately after this, ie at CHS=0,1,2. The second copy of the FAT follows the first, so in the case of my hard drive which has a 256 sector FAT, the second copy starts at CHS=0,5,6.
When a file is saved to disk, its starting cluster number is stored in its directory entry as a two byte number (see the next section on Root Directory and Sub Directory entries).
This starting cluster number serves as a pointer to the FAT entry that contains the number of the second cluster.
7.4.2 Calculating Clusters
7.4.3 Example
Let's take IO.SYS as an example. According to its directory entry, the starting cluster number is 02. So, we multiply this by 2 to get the FAT offset for the next cluster number, ie it should be at FAT offset 04. When we look at this offset, we find the cluster number is 03, so we know where to look for this part of IO.SYS. We now multiply the cluster number by 2 again to find the next entry in the FAT, ie 06. So we now take a look at FAT offset 06 to find the next cluster number. This is listed as FFFFh.
FFFFh signifies the end of the file, ie there are no more clusters used.
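The walk through the FAT can be sketched in Python as below. It follows the rule from the example (FAT offset = cluster number x 2, two byte little endian entries). It also assumes the usual 16 bit FAT conventions, which are not spelled out in the example itself: any value from FFF8h to FFFFh ends the chain, and FFF0h to FFF7h are reserved or bad cluster marks. The sample FAT is made up to match the IO.SYS example.

```python
import struct

END_OF_FILE = 0xFFF8     # assumption: FFF8h-FFFFh all mean "no more clusters"
FIRST_RESERVED = 0xFFF0  # assumption: FFF0h-FFF7h are reserved or bad clusters

def cluster_chain(fat, start_cluster):
    """Return the list of clusters used by a file, in order."""
    chain = []
    cluster = start_cluster
    while cluster < END_OF_FILE:
        if cluster < 2 or cluster >= FIRST_RESERVED or cluster in chain:
            raise ValueError("corrupt FAT chain")
        chain.append(cluster)
        # the entry for this cluster lives at FAT offset cluster * 2
        (cluster,) = struct.unpack_from("<H", fat, cluster * 2)
    return chain

# Entries 0 and 1 are reserved; entry 2 points to cluster 3; entry 3 ends the file
fat = struct.pack("<4H", 0xFFF8, 0xFFFF, 0x0003, 0xFFFF)
print(cluster_chain(fat, 2))   # [2, 3]
```

The loop check on repeated clusters is there because a damaged FAT can point back into its own chain, which would otherwise never terminate.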
7.4.4 Reserved Entries
OK. That's the way DOS handles FAT entries. However, you may have noticed that in the example of finding clusters for IO.SYS, I said that FFFFh signified the end of the file, but in the above table, I've said that anything from FFF8h upwards is the end of the file.
There are a few number ranges reserved for FAT entries. These are as follows :-
7.5 The Root Directory Entry
7.5.1 Introduction
As we have seen from the data area of the boot sector, the Root Directory has a maximum of 200h (512d) entries. Each entry in the Root Directory has 20h (32d) bytes, which describe the filename, date/time, start cluster number etc.
The root directory is the last part of the system area and starts immediately after the second copy of the FAT. In the case of my hard drive, it starts at CHS=0,9,A
For all hard drives under DOS the root directory occupies 32 sectors to accommodate the 512 entries.
This means that it is important you make liberal use of subdirectories, as you will run out of root directory entries rapidly if you just use the root directory to store files. If you don't create any subdirectories, DOS will treat the root directory as full once it holds 512 entries, no matter how much free space is left on the disk !
The root directory is fixed in this respect, however, subdirectories are dynamic and can expand or contract to hold as many files as your disk can take.
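Putting the pieces of the system area together, this Python sketch works out where each part starts on the drive used in this section: 63 sectors per track, track 0 reserved, one boot sector, two FATs of 256 sectors and a 32 sector root directory. The head count of 255 is an assumption (any value of 10 or more gives the same answers here), and the helper names are made up. The same arithmetic gives the first sector of any cluster, since cluster numbering starts at 2 at the beginning of the data area.

```python
SECTORS_PER_TRACK = 63
HEADS = 255                 # assumption, not taken from the author's disk
SECTORS_PER_CLUSTER = 64

def to_chs(absolute_sector):
    """Absolute sector number (0 = CHS 0,0,1) to a (C, H, S) tuple."""
    c, rem = divmod(absolute_sector, HEADS * SECTORS_PER_TRACK)
    h, s0 = divmod(rem, SECTORS_PER_TRACK)
    return c, h, s0 + 1

boot = 63                   # first sector after the reserved track 0
fat1 = boot + 1
fat2 = fat1 + 256
root = fat2 + 256
data = root + 32            # 512 entries x 32 bytes = 32 sectors

def cluster_to_sector(cluster):
    """Absolute sector number of the first sector of a cluster (2 = first)."""
    return data + (cluster - 2) * SECTORS_PER_CLUSTER

print(to_chs(boot))   # (0, 1, 1)
print(to_chs(fat1))   # (0, 1, 2)
print(to_chs(fat2))   # (0, 5, 6)
print(to_chs(root))   # (0, 9, 10), ie sector Ah
print(to_chs(data))   # (0, 9, 42)
print(to_chs(cluster_to_sector(3)))   # (0, 10, 43)
```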
7.5.2 Structure
As mentioned before, each directory entry contains 20h (32d) bytes. The meaning of each byte in the entry is given by the following table :-
7.5.3 Sub Directories
The same format is also used to create other directory structures throughout the disk. The root directory is limited to 512 entries maximum as it is in the last part of the system area. Other directories are not, as they are held in the data area.
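A 32 byte directory entry can be unpacked with a few lines of Python. This sketch assumes the standard DOS layout: 8 bytes of name, 3 of extension, 1 attribute byte, 10 reserved bytes, then two byte time, date and start cluster fields and a four byte file size. The sample entry is invented for illustration: the attribute byte, timestamp and size are hypothetical, while the start cluster of 2 follows the IO.SYS example above.

```python
import struct

ENTRY_FORMAT = "<8s3sB10sHHHI"     # 8+3+1+10+2+2+2+4 = 32 bytes

def parse_directory_entry(raw):
    (name, ext, attr, _reserved, time, date,
     start_cluster, size) = struct.unpack(ENTRY_FORMAT, raw)
    return {
        "name": name.decode("ascii", errors="replace").rstrip(),
        "ext": ext.decode("ascii", errors="replace").rstrip(),
        "attributes": attr,
        # time: 5 bits hours, 6 bits minutes, 5 bits of two second units
        "time": (time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2),
        # date: 7 bits years since 1980, 4 bits month, 5 bits day
        "date": (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F),
        "start_cluster": start_cluster,
        "size": size,
    }

# Hypothetical entry for IO.SYS: read-only + hidden + system, 31 May 1994 06:22
raw = struct.pack(ENTRY_FORMAT, b"IO      ", b"SYS", 0x07, bytes(10),
                  6 << 11 | 22 << 5, 14 << 9 | 5 << 5 | 31, 2, 40774)
entry = parse_directory_entry(raw)
print(entry["name"], entry["ext"], entry["start_cluster"])   # IO SYS 2
print(entry["date"], entry["time"])   # (1994, 5, 31) (6, 22, 0)
```

The start cluster read here is the number that gets multiplied by 2 to find the file's first FAT entry.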
7.6 The Data Area
The Data Area is the part of the disk that DOS stores files in.
If you use DOS to format a floppy disk, it overwrites the complete data area with F6h.
For a hard drive, DOS does nothing at all to the data area.
8 Conclusion
OK, after studying what DOS does upon bootup, we are now in a position to develop our own extremely secure filing system, that can be accessed via DOS.
To summarise what we have learned so far for a hard disk formatted under DOS:-
We are now in a position to develop our own highly secure indexing system for DOS.
I hope you enjoy the next section ;)