
			    ********

       READ THIS FILE CAREFULLY BEFORE INSTALLING E2COMPR.

      PLEASE, READ AND FOLLOW VERY CAREFULLY THE INSTALLATION
 INSTRUCTIONS, *EVEN* IF YOU HAVE ALREADY INSTALLED E2COMPR ONCE.

      I TAKE NO RESPONSIBILITY FOR WHATEVER MAY ARISE BECAUSE
          OF THE USE OF THIS CODE. USE AT YOUR OWN RISK.

			    ********

                 PLEASE, READ THE COPYRIGHT FILE.

			    ********

This file gives some explanations about the e2compr patch.  It
describes what the patch is for, how it works, how to install it, and
how you can compress your files.  Here is the table of contents:

	1. What is e2compr?
	    1.1 What does e2compr do?
	    1.2 Why was e2compr written?
	2. How do I install e2compr?
	3. How do I use e2compr?
            3.1 Examples
            3.2 Current limitations
	4. Compression turned off
	5. About efficiency
	6. Authorship

			    ********

1. What is e2compr?
----------------------

  1.1 What does e2compr do?
  ----------------------------

E2compr is a small patch against the ext2 filesystem that allows
on-the-fly compression and decompression.  It compresses only regular
files; the administrative data (superblock, inodes, directory contents etc.)
are not compressed (mainly for safety reasons).  Access to compressed
blocks is provided for both read and write operations.  The
compression algorithm and cluster size are specified on a per-file
basis.
Directories can also be marked for compression, in which case every
newly created file in the directory will be automatically compressed
with the same cluster size and the same algorithm that was specified
for the directory.

E2compr is not a new filesystem.  It is only a patch to the ext2
filesystem made to support the EXT2_COMPR flag.  It does not require
you to make a new partition, and will continue to read or write
existing ext2 filesystems.  You should think of it simply as a way
for the read and write routines to access files that could have been
created by a utility similar to gzip or compress.  Compressed
and uncompressed files will coexist nicely on your ext2 partitions.

  1.2 Why was e2compr written?
  ----------------------------

When I got my laptop and installed Linux on it, it was pretty obvious
that the hard drive was too small (240MB) to get a complete
installation of all the tools I needed, while still having a DOS
partition (games for my children) and some space for users.  Buying a
new drive is quite expensive and not very practical for a laptop.  

That's why I immediately looked for tools that could do on-the-fly
compression and decompression.  I installed zlibc and tcx, and used
them for a while.  Though these are nice tools, there are some
limitations.  For instance, you can't write to compressed files
through zlibc; and tcx works only for executables.  Of course these
are minor problems.  These tools are doing what they were written for.
I used them for some months, until one day, while reinstalling some
packages on my system, I realised that each time I reinstalled
something I had to compress the manuals and binaries by hand again.
I found this very painful, and I started to dream about a file system
that would do it in a transparent way.  What I was looking for was a
system that could provide write access to compressed files, and for
which I wouldn't need to compress newly installed files myself.

Someone alerted me to the existence of DouBle by Jean-Marc
Verbavatz (ftp://achaz.saclay.cea.fr/pub/double/double-X.X.tar.gz).
I got it and looked at how it did its work.  There was however one thing that
made me feel a little insecure: administrative data is compressed as
well.  This means that if your system crashes you may lose much more
than a few blocks.  Maybe I am a little overanxious, but I couldn't bear
the idea of my son pressing the power off button while I was
working.  I felt this was a major problem for me.  I was only ready to
use DouBle instead of zlibc for files I could easily recover.
[I can confirm that a filesystem compressed with DouBle can run into
big problems if the machine is powered off in the middle of a write --
pjm.]

I then started to think about what I would really like to have on my
machine.  The result is the e2compr patch for which you are now
reading the documentation.  The ideal tool would meet the following
requirements:

	a) automatic compression
	b) automatic decompression
	c) file access in both read and write modes
	d) flexible configuration
	e) safety

Of the three tools mentioned above, DouBle is probably the best one.
It meets requirements a), b), c) and d).  Unfortunately I can't live
with the idea that inodes are blindly compressed.  Both tcx and zlibc
meet requirements b) and e).  I don't consider them very flexible,
because there is no way to tune them.  Note also that they will both
fail to uncompress data on a completely full file system.

E2compr obviously meets requirements a), b), c), d) and e).  And
that's why I wrote it.

			    ********

2. How do I install e2compr?
----------------------------

Installation instructions are located in the INSTALL file.  Read it
very carefully before doing anything. 

			    ********

3. How do I use e2compr?
------------------------

There is almost nothing to do.  Basically, a file is compressed with
the command 'chattr +c filename', and uncompressed with the command
'chattr -c filename'.  You can also do a 'chattr +c' over a directory.
In such a case any newly created file in the directory will inherit
the compression flag and will automatically be compressed.  You can
also use 'lsattr' to see if a file is compressed or not.

  3.1 Examples
  ------------

Here are a few examples you may want to type to experiment with
e2compr.  I will assume you have installed the updated versions of
chattr and lsattr, and that you are still in the directory created by
unpacking the archive.

First we create some temporary directories, and copy some files to
them.  tmp1 will be the reference directory (Note: file lengths may
be different when you do the test):

> mkdir tmp1
> cp /sbin/lsattr HOWTO lsattr.c e2compr.patch tmp1
> ls -l tmp1
total 144
-rw-r--r--   1 root     root        17864 Mar 25 23:20 HOWTO
-rw-r--r--   1 root     root        67149 Mar 25 23:20 e2compr.patch
-rwxr-xr-x   1 root     root        51524 Mar 25 23:20 lsattr
-rw-r--r--   1 root     root         5811 Mar 25 23:20 lsattr.c
> du tmp1
145     tmp1
> cp -r tmp1 tmp2

Then we compress some files in directory tmp2:

> chattr +c tmp2/*
> lsattr tmp2
--c----  8 lzv1    HOWTO
--c----  8 lzv1    e2compr.patch
--c----  8 lzv1    lsattr
--c----  8 lzv1    lsattr.c
           ^^^^--------------- algorithm
         ^-------------------- cluster size (here 8 blocks per cluster)
  ^--------------------------- file is compressed 
> du tmp2
94      tmp2

OK, we gained 51 blocks.  The default cluster size is 8, and the
default algorithm is LZV1 (Note: file sizes will probably be different
on your system).  Now we uncompress the files we just compressed:

> chattr -c tmp2/*
> lsattr tmp2
-------  - -       HOWTO
-------  - -       e2compr.patch
-------  - -       lsattr
-------  - -       lsattr.c
> du tmp2
145     tmp2

We can use different cluster sizes with the -b option.
(Supported values are 4, 8, 16 or 32 blocks.)

> chattr +c -b 16 tmp2/*
> lsattr tmp2
--c---- 16 lzv1    HOWTO
--c---- 16 lzv1    e2compr.patch
--c---- 16 lzv1    lsattr
--c---- 16 lzv1    lsattr.c
> du tmp2
88      tmp2

Of course, lsattr reports the new cluster size.  As you can see, a
bigger cluster size seems to compress better (but it is slower,
particularly for random access).  Now we uncompress the files again
and compress them with another algorithm, using the '-m' option
(Note: this will not work if your kernel was not configured to use
LZRW3A.  Also note that if you have an older version of chattr that
doesn't support '-m', then you should use '-A' in its place.):

> chattr -c tmp2/*
> chattr +c -bm 16 lzrw3a tmp2/*
> lsattr tmp2
--c---- 16 lzrw3a  HOWTO
--c---- 16 lzrw3a  e2compr.patch
--c---- 16 lzrw3a  lsattr
--c---- 16 lzrw3a  lsattr.c
> du tmp2
81      tmp2

Well, lzrw3a compresses better.  Now we create another directory and
mark it as compressed:

> chattr +c -b 16 tmp3
> lsattr -d tmp3
--c---- 16 lzv1    tmp3
> cp tmp1/* tmp3
> lsattr tmp3
--c---- 16 lzv1    HOWTO
--c---- 16 lzv1    e2compr.patch
--c---- 16 lzv1    lsattr
--c---- 16 lzv1    lsattr.c

Files in tmp3 have automatically been compressed with the same cluster
size and algorithm as tmp3.  Good.

For the sceptical:

> diff tmp1/HOWTO tmp3/HOWTO
> ./tmp3/lsattr tmp3
--c---- 16 lzv1    HOWTO
--c---- 16 lzv1    e2compr.patch
--c---- 16 lzv1    lsattr
--c---- 16 lzv1    lsattr.c

Compressed files and uncompressed ones hold the same data!  Compressed
binaries behave just like uncompressed ones.
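As an aside, the cluster-size effect observed above can be
illustrated outside the kernel.  The following Python sketch uses
zlib as a stand-in for the kernel's compression code (it is *not* the
LZV1 or LZRW3A algorithm e2compr actually uses, and the 1KB block
size is only an assumption matching a typical ext2 filesystem of the
time).  Because each cluster is compressed independently, smaller
clusters pay the per-cluster overhead more often and cannot reuse
matches across cluster boundaries:

```python
import zlib

BLOCK_SIZE = 1024  # assumed ext2 block size; e2compr clusters span 4-32 blocks

def compressed_size(data, blocks_per_cluster):
    """Compress `data` one cluster at a time, as e2compr does
    conceptually, and return the total compressed size in bytes."""
    cluster = blocks_per_cluster * BLOCK_SIZE
    return sum(len(zlib.compress(data[i:i + cluster]))
               for i in range(0, len(data), cluster))

# Moderately compressible sample data: a repeated sentence of text.
data = (b"E2compr is a small patch against the ext2 filesystem "
        b"that allows on-the-fly compression and decompression. ") * 2000

for bpc in (4, 8, 16, 32):
    print("%2d blocks/cluster -> %6d bytes" % (bpc, compressed_size(data, bpc)))
```

On this sample the total shrinks steadily as the cluster grows, which
matches the du figures in the transcripts above; the price, as noted,
is that random access to a large cluster must decompress more data.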

And now, the ultimate test:

# cd /usr/src/linux-1.1.76
# make clean
# time make depend
81.66user 50.67system 2:24.37elapsed 91%CPU ...
# du -s
8753    .
# time make
1006.18user 138.49system 19:37.92elapsed 97%CPU ...
# du -s
11842   .  

# chattr +c -Rbm 16 lzrw3a /usr/src/linux-1.1.76
# cd /usr/src/linux-1.1.76
# make clean
# time make depend
77.27user 70.95system 2:45.50elapsed 89%CPU ...
# du -s
4269    .
# time make
1002.23user 173.16system 20:01.90elapsed 97%CPU ...
# du -s
6449    .

  3.2 Current limitations
  -----------------------

In the current version, you cannot change the cluster size once a
file has been compressed.  You must uncompress the file first, and
make sure no process has the file still open before compressing it
again with the new cluster size.

In the current version, if you change the compression algorithm for a
compressed file, already compressed clusters will not be recompressed
with the new algorithm.  If you want to do so, you should uncompress
the file first, and make sure no process still has the file open
before compressing it again with the new algorithm.

In versions 0.1a and 0.1b, only the first 16 clusters of a file are
compressed.  The remaining clusters stay uncompressed.  The reason is
that supporting more would require patching e2fsck in order to store
a bigger bitmap for compressed clusters (without that patch, e2fsck
would reclaim the block where we would store the bitmap, because it
would fail to see that it is used).  This limitation no longer
applies in version 0.1c and later, provided you patched and installed
the new e2fsck.


			    ********

4. Compression turned off
-------------------------

If the read/write routine in the kernel finds an error when accessing
compressed clusters, the compression is automatically disabled for the
file.  In such a case the file can't be accessed any more.  For such
files, lsattr shows an 'E' (error) instead of 'c'. 

You can clear the 'E' flag using 'chattr -E', but chances are that it
will be set again the next time the file is accessed, unless you found
the origin of the error and corrected it.

Of course, the data is not lost.  It is possible to read the raw
compressed data and try to uncompress it.  This should be done by a
small utility running in user space.  Unfortunately no such tool
exists yet.  It would not be difficult to write, but I was never
really motivated to do it, because I never encountered such errors.
Maybe this will be done in the next version (if you don't know what
to do, and want to help ... email me).  Hint: this tool should open
the file and then set the EXT2_NOCOMPR_FL flag.  This will allow
access to the raw compressed data (see above for the format).  Of
course the NOCOMPR flag should be cleared when done.
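For what it's worth, here is a rough, untested Python sketch of how
such a tool might start out.  The ioctl numbers are the 64-bit
encodings of _IOR('f', 1, long) and _IOW('f', 2, long), and the
EXT2_NOCOMPR_FL value is an assumption taken from the ext2 headers;
check include/linux/ext2_fs.h in your patched kernel tree before
trusting any of them:

```python
import fcntl
import struct

# Assumed values -- verify against include/linux/ext2_fs.h in a
# kernel tree carrying the e2compr patch.  The ioctl numbers differ
# on 32-bit architectures (the size field encodes sizeof(long)).
EXT2_IOC_GETFLAGS = 0x80086601
EXT2_IOC_SETFLAGS = 0x40086602
EXT2_NOCOMPR_FL = 0x00000400

def read_raw(path):
    """Read a file's contents without decompression by temporarily
    setting EXT2_NOCOMPR_FL, restoring the original flags afterwards."""
    with open(path, 'rb') as f:
        buf = fcntl.ioctl(f.fileno(), EXT2_IOC_GETFLAGS, struct.pack('l', 0))
        (flags,) = struct.unpack('l', buf)
        fcntl.ioctl(f.fileno(), EXT2_IOC_SETFLAGS,
                    struct.pack('l', flags | EXT2_NOCOMPR_FL))
        try:
            raw = f.read()
        finally:
            # Always clear NOCOMPR again, as recommended above.
            fcntl.ioctl(f.fileno(), EXT2_IOC_SETFLAGS, struct.pack('l', flags))
    return raw
```

A real recovery tool would then feed the raw clusters to a user-space
decompressor; that part depends on the on-disk cluster format and is
left out here.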

			    ********

5. About efficiency
-------------------

The table below gives some idea of the efficiency of the various
algorithm/cluster size pairs.  (The data were collected on a 66MHz 486
with the BIOS set with all the slow options, by the way.)

The information could certainly be improved.  (E.g. what effect does
file size have?  What about writing?  (I'd hoped `chattr +c' would
give this information, but I note that the times for tar and `chattr
-c' do not correlate as well as I'd expected.  I wonder if tar is
reading 4KB at a time while `chattr -c' a cluster at a time?  Is
st_blksize set to cluster size for compressed files?))  If you have
the curiosity to do some timing of your own, send me the results
(together with the script for generating them).  Two hints on timing:
(i) Don't use the machine for anything else while timing.  It does
make a difference.  (Kill off daemons if you're really keen.)  (ii)
Make sure you haven't got any energy-saving things active in your BIOS
setup.


for a in gzip9 gzip6 gzip3 lzrw3a lzv1; do
  for b in 32 16 8 4; do
    for t in 1 2; do
      clear-disk-cache
      time chattr +c -R -m $a -b $b /usr/doc; clear-disk-cache
      time du -s /usr/doc
      time tar cf - /usr/doc | md5sum; clear-disk-cache
      time chattr -c -R /usr/doc
      echo done $b $a
    done
    echo
  done
done

The table below has been formatted, and sorted by compression ratio.
(The md5 sums were identical, you'll be happy to know.)

 ch+                du               tar (read)        ch-              cs alg  
~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ ~~ ~~~~~
0.16u 203.88s 3:51 2257 0.16u 0.84s 0.59u 93.88s 1:49 0.17u 20.09s 0:36 32 gzip9
0.20u 204.00s 3:50 2257 0.21u 0.81s 0.62u 93.75s 1:49 0.20u 19.43s 0:36 32 gzip9

0.17u 127.32s 2:33 2260 0.15u 0.80s 0.56u 93.86s 1:48 0.29u 20.36s 0:38 32 gzip6
0.11u 127.39s 2:35 2260 0.14u 0.81s 0.65u 93.68s 1:48 0.10u 20.23s 0:36 32 gzip6

0.17u  66.70s 1:31 2400 0.13u 0.76s 0.46u 95.83s 1:50 0.19u 20.31s 0:36 32 gzip3
0.23u  66.49s 1:32 2400 0.17u 0.69s 0.66u 95.67s 1:50 0.26u 20.06s 0:38 32 gzip3

0.16u 145.30s 2:52 2406 0.18u 0.64s 0.61u 58.82s 1:13 0.25u 20.24s 0:38 16 gzip9
0.22u 145.18s 2:53 2406 0.14u 0.80s 0.47u 58.00s 1:13 0.24u 20.26s 0:38 16 gzip9

0.17u 107.45s 2:14 2409 0.17u 0.72s 0.69u 57.87s 1:13 0.24u 20.10s 0:38 16 gzip6
0.28u 107.00s 2:12 2409 0.12u 0.82s 0.53u 58.11s 1:13 0.33u 20.07s 0:38 16 gzip6

0.15u  64.22s 1:29 2525 0.18u 0.78s 0.65u 58.80s 1:14 0.16u 20.40s 0:39 16 gzip3
0.16u  63.75s 1:29 2525 0.17u 0.85s 0.70u 58.85s 1:15 0.23u 20.49s 0:41 16 gzip3

0.22u 109.34s 2:22 2681 0.15u 0.80s 0.73u 37.23s 0:54 0.15u 21.65s 0:46 08 gzip9

0.30u  92.93s 2:06 2682 0.13u 0.78s 0.65u 38.03s 0:56 0.17u 21.38s 0:50 08 gzip6

0.21u  63.44s 1:35 2757 0.15u 0.79s 0.71u 37.62s 0:55 0.18u 21.51s 0:46 08 gzip3

0.15u  28.97s 0:53 2898 0.20u 0.80s 0.56u 41.78s 0:58 0.13u 13.61s 0:32 32 lzrw3a
0.15u  29.02s 0:52 2898 0.17u 0.96s 0.69u 41.64s 0:58 0.19u 13.99s 0:32 32 lzrw3a

0.22u  93.60s 2:11 2919 0.21u 0.64s 0.76u 25.62s 0:45 0.22u 24.51s 1:04 04 gzip9

0.17u  87.74s 2:07 2919 0.17u 0.78s 0.72u 25.54s 0:46 0.25u 24.16s 1:03 04 gzip6

0.17u  68.00s 1:45 2983 0.21u 0.68s 0.67u 25.99s 0:46 0.22u 25.17s 1:04 04 gzip3

0.16u  29.01s 0:53 3080 0.14u 0.70s 0.56u 28.95s 0:47 0.20u 13.88s 0:34 16 lzrw3a
0.22u  28.88s 0:53 3080 0.19u 0.74s 0.67u 28.96s 0:47 0.18u 13.75s 0:33 16 lzrw3a

0.12u  17.61s 0:39 3179 0.17u 0.76s 0.56u 23.81s 0:41 0.18u  8.87s 0:28 32 lzv1
0.21u  17.54s 0:41 3179 0.14u 0.83s 0.67u 23.82s 0:42 0.19u  8.59s 0:30 32 lzv1

0.16u  17.86s 0:38 3310 0.11u 0.85s 0.63u 16.75s 0:35 0.15u  9.00s 0:30 16 lzv1
0.18u  17.71s 0:41 3310 0.15u 0.95s 0.75u 16.75s 0:37 0.21u  8.98s 0:29 16 lzv1

0.17u  29.93s 1:03 3385 0.20u 0.80s 0.58u 21.87s 0:42 0.18u 14.49s 0:40 08 lzrw3a

0.19u  18.02s 0:58 3559 0.17u 0.74s 0.66u 12.92s 0:37 0.11u  9.14s 0:40 08 lzv1

0.19u  32.25s 1:10 3965 0.16u 0.71s 0.64u 17.71s 0:40 0.21u 16.35s 0:58 04 lzrw3a

0.17u  19.30s 1:00 4025 0.11u 0.74s 0.66u 10.36s 0:34 0.26u  9.33s 0:45 04 lzv1

			    ********

6. Authorship
-------------

This document was originally written by:

> Antoine Dumesnil de Maricourt			E-mail:	dumesnil@etca.fr 
> ETCA CREA-SP				  	Phone:	(33) 1 42 31 96 68
> 16 bis, avenue Prieur de la Cote d'Or	  	Fax:	(33) 1 42 31 99 64
> 94114 Arcueil Cedex FRANCE	 	 	
>
> http://www.etca.fr/Users/Antoine%20de%20Maricourt/

First person should be taken as referring to Antoine, except in
section 5 and in comments tagged with `pjm'.

Some anglicisation, updating and the like, as well as the timing
statistics were done by the current maintainer, Peter Moulder
<reiter@netspace.net.au>.


The home page for this project (where updates can be found) is
<http://netspace.net.au/~reiter/e2compr.html>. 
