
Here are a few questions that were asked by early testers, and that
may be of interest to everybody.  I didn't know where else to put
them, so I decided an FAQ list would be a good place....  If you have
any questions that you consider worth including here, send them to
<reiter@netspace.net.au> (if possible with answers...).

=======================================================================

 SUMMARY.

 Section 1: e2compr with other packages

   1.1  How does e2compr interact with e2fsck ?
   1.2  Can I still use a defragmenter ?
   1.3  Can I still use e2dump ?
   1.4  Can I still use zlibc ?
   1.5  Can I still use DouBle ?

 Section 2: e2compr at work

   2.1  When does de/compression really occur ?
   2.2  Are uncompressed data cached ?
   2.3  What if I uncompress a file on a full device ?
   2.4  How do I see that a file has not been uncompressed ?
   2.5  Is it possible to compress a swap file ?
   2.6  It seems to me that files will be fragmented a lot.
   2.7  Can I make a compressed boot/root floppy ?
   2.8  What files should(n't) I compress ?
   2.9  What algorithms should(n't) I use ?
   2.10 How do I uninstall e2compr ?

 Section 3: e2compr troubles

   3.1  I get the error 'Filesystem full', even if there is obviously
        a lot of space on the device (as reported by df).
   3.2  I get the error 'File busy'.
   3.3  I get the error 'I/O Error'.
   3.4  When reading, I see strange data in my file.
   3.5  E2compr doesn't seem to compress as well as gzip.

=======================================================================

 Section 1: e2compr with other packages
 -----------------------------------------

   1.1  How does e2compr interact with e2fsck ?
   ---------------------------------------------

In the restricted version (IND_BITMAP not defined), there is no
problem because the indirect bitmap is not used.  In the full version,
you absolutely must use the patched version of e2fsck.

The rule is the following: if you run a kernel that has been compiled
with IND_BITMAP and you have used 'chattr +c' at least once with such
a kernel, then you must use the modified version of e2fsck and never
use the original version again.
[pjm: I believe this only applies to files that have compressed
clusters after the first 16 clusters of the file.]

If you run a kernel compiled without IND_BITMAP, or compiled with
IND_BITMAP but without ever having used 'chattr +c', then you can use
either the original version of e2fsck or the patched one, as you wish.

   1.2  Can I still use a defragmenter ?
   -------------------------------------

DON'T DO THAT.  In fact, I don't know if this is risky (I didn't try
it and I didn't spend time looking at the sources), but chances are
that the defragmenter will fail to manage the indirect bitmap for
compressed files.  It depends on the way it is written (i.e. does it
compute the list of free blocks itself, or does it use the one
provided by the file system ?)

If you know for sure it can be used, please let me know.

   1.3  Can I still use e2dump ?
   -------------------------------

Well, you certainly can, but don't expect restore to work on
compressed files!  E2dump has (still) not been modified to save the
information that certain clusters are compressed.  Thus, when
restoring compressed files, it will not restore the per-cluster bits
that say whether they are compressed or not.  Data won't really be
lost, but you will have direct access to the raw compressed data, and
transparent decompression will not occur.  This is almost certainly
not what you want.

I'm sorry but you'll have to use something else to make your backups,
or wait until e2dump is modified, or do it yourself ...

   1.4  Can I still use zlibc ?
   ----------------------------

Yes, there is no problem.  Beware, however, of some oddities in
zlibc.  For instance, when you stat a (zlibc) compressed file, you
will probably see it as a regular file.  When you open it for reading,
you can get a pipe instead of a truly regular file.  Programs (such as
chattr) that set inode flags don't like that.

   1.5  Can I still use DouBle?
   ----------------------------

Yes, there is no problem.  But if you mount DouBle over an ext2 file,
be very careful not to turn on the 'c' flag for this file.  You will
probably get into trouble (see also question 2.5).  This is because
mmap() is not implemented for write access.

=======================================================================

 Section 2: e2compr at work
 ---------------------------

Implementation details, including an introductory text, are included
in the file /usr/src/linux/fs/ext2/Readme.e2compr (once you have
applied the patch; alternatively, read the file as it appears in the
patch).


   2.1  When does de/compression really occur ?
   --------------------------------------------

Compression really occurs when the inode is put, i.e. closed by every
process that has a reference to it.  Writing to a compressed file is
done as if the file were not compressed (uncompressed data is written
to the disk), but the file is marked dirty.  When the inode is put,
and if the file is dirty, the kernel scans every cluster and
compresses those that are not compressed.  Unneeded blocks are then
freed again.

Decompression occurs when needed, i.e. every time the kernel wants to
read a compressed cluster.  Of course, the cluster remains compressed
on the disk.  But this means that we will have to uncompress the
cluster every time it is accessed.

   2.2  Are uncompressed data cached ?
   -----------------------------------

No.  Only compressed data are cached, and there is a price to pay for
that.  This is probably something to change, but it is not really easy
to do.

One easy way to proceed would be to have a private cache.  I may
experiment with that soon.

The right way would be to use the kernel buffer cache, but I really
don't know how to do that.  We could create a special device that
would provide some uncompressed blocks.  The trouble is to find an
efficient way to map <inode + compressed cluster number> in the ext2
filesystem into some block number for this special device.  I have no
idea how to do that (that is, no simple idea).

Another idea would be to have two versions of each compressed file on the
disk.  The normal, user visible, version would be compressed, and there
would be a hidden uncompressed version lying into some special
directory.  Every compressed file would have a reference to its
uncompressed version.  When accessing a compressed file, we could check
if it is already uncompressed and have fast accesses if yes.  Of course
the uncompressed versions should be garbage-collected when there is no
more space on the device.  This idea is interesting because it would
optimize the disk usage, and provide a kind of persistent cache.

Maybe this special directory containing uncompressed versions of files
could be in a ram disk?  This would be a way to use the kernel buffer
cache.

   2.3  What if I uncompress a file on a full device ?
   ---------------------------------------------------

In such a case, the file is simply not uncompressed, and remains as it
is.  Note that the 'c' flag (as reported by lsattr) will be
cleared.  Clearing the 'c' flag (with chattr -c) is just a request
saying that the normal state of the file is to be uncompressed.  It is
not a guarantee that the file will be stored uncompressed.  If there are
no more blocks on the device, it is obvious that the file can't be
uncompressed, even if the 'c' flag is cleared. 

However, the file will be uncompressed after free blocks become
available.  This is done by marking the file as dirty (which means not
in normal state) as explained in question 2.1.  New attempts to
uncompress a dirty file are made when the file is next accessed.  (A
simple touch or ls -l should be enough for that.)

This could also be done with a simple daemon, but it is not ... 
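For instance, assuming an e2compr kernel and a dirty file named
/tmp/foo (both are just examples here), the following ordinary
commands are enough to trigger a new uncompression attempt:

```shell
# Accessing the file makes the kernel look at it again; on an e2compr
# kernel, a dirty file is then uncompressed if free blocks have become
# available.  'touch' and 'ls' themselves are standard commands.
touch /tmp/foo
ls -l /tmp/foo
```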

   2.4  How do I see that a file has not been uncompressed ?
   ---------------------------------------------------------

You can see if a file is dirty (cf. questions 2.1 and 2.3) with
lsattr.  A file is dirty if the 'c' flag is cleared, but some
compressed clusters remain in the file.  In such a case, lsattr will
not show the 'c' flag, but will still show a cluster size.  Here are
some examples.

Here, the file is compressed:

  > lsattr foo
  --c---- 16 lzrw3a  foo

We try to uncompress it.  The 'c' flag is cleared, but the cluster size
is still there.  This means that the file still holds some compressed
clusters:

  > chattr -c foo
  > lsattr foo
  ------- 16 lzrw3a  foo

In the next example, the cluster size is no longer displayed after the
file is uncompressed.  The uncompression has succeeded:

  > lsattr foo
  --c---- 16 -       foo
  > chattr -c foo
  > lsattr foo
  -------  - -       foo

   2.5  Is it possible to compress a swap file ?
   ---------------------------------------------

It is possible to do a 'chattr +c' on an active swap file, but it will
do nothing, and moreover I don't recommend doing it.  The reason is
that compression occurs only when the inode is put (see question 2.1).
This is just a guess, but it seems to me that a swap file will be
opened when swap is enabled and put when swap is disabled.  The
compression will thus only occur when the swap file is not used
anymore, which is certainly not what you want.

It is certainly possible to make compression occur when the file is
accessed, and not only when it is closed.  But remember that e2compr
works by freeing blocks.  When you create a swap file, the first thing
to do is to allocate the blocks (using dd if=/dev/zero for instance)
so that the kernel will be sure to find these blocks when it needs
them.  Allowing e2compr to free some blocks in a swap file could be
dangerous because the kernel could at some point need to reallocate
some blocks while swapping.  The question is of course: what if the
device is full ?
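For completeness, here is a sketch of that usual way of pre-allocating
a swap file with dd; the path and the 8MB size are arbitrary examples,
and mkswap/swapon (shown only as comments) require root privileges:

```shell
# Write zeros into every block of an 8 MB file so that all of its
# blocks are really allocated on the device; the kernel then never has
# to allocate blocks while swapping.  Path and size are examples only.
dd if=/dev/zero of=/tmp/swapfile bs=1024 count=8192

# Then initialise and enable it (as root), and do NOT set the 'c'
# flag on it:
#   mkswap /tmp/swapfile
#   swapon /tmp/swapfile
```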

   2.6  It seems to me that files will be fragmented a lot.
   --------------------------------------------------------

Then you are right ...  Instead of freeing unneeded blocks, we could
free the complete cluster and then reallocate only the really needed
blocks.  Preliminary tests showed me that this does not improve the
speed if the machine is not highly loaded, but it certainly would if
it were.  It also certainly depends on the speed of your disk.

If this is a problem, you can use e2compress to compress your files
instead of chattr.  This will of course be useful if the file is used
only in read mode, but even if it is sometimes written it will
probably reduce the fragmentation.

Please, DON'T USE A DEFRAGMENTER unless you're prepared to lose your
data (see question 1.2).

   2.7  Can I make a compressed boot/root floppy ?
   -----------------------------------------------

No problem, and you can put a lot of things on it.  The only
precaution is, of course, to make sure that you don't compress files
that are needed before the kernel is fully loaded and before the ext2
code can be used!  You thus shouldn't compress your kernel image
(which is probably already compressed) or any file in the /boot
directory.  But you can compress dynamic libraries and binaries
without any problem.

However, if you're reading that floppy into a RAM disk, and memory
consumption isn't too much of a problem, then you will be better off
using the compressed ramdisk feature of the kernel instead of e2compr.

   2.8  What files should(n't) I compress ?
   ----------------------------------------

   2.9  What algorithms should(n't) I use ?
   ----------------------------------------

The INSTALL file has a little information on this.

   2.10 How do I uninstall e2compr ?
   ---------------------------------

Why do you want to do that?  Anyway, it is very simple: uncompress
every file with 'chattr -c', then make sure that the files really are
uncompressed (cf. question 2.4), and finally restore your original
kernel and the original copies of e2fsck, lsattr and chattr.
 
=======================================================================

 Section 3: e2compr troubles
 -------------------------------

   3.1  I get the error 'Filesystem full'.
   ---------------------------------------

[even if there is obviously a lot of space on the device]

This was reported by C. Niemi after he made an ext2 partition on a
floppy disk.  He got the message while df reported about 900KB free on
the floppy.  The explanation he found was that, because of the
compression, it was possible to put many more files than usual on the
disk.  As a consequence, there were no inodes left, even though 900KB
of space was still available.

In such a case, I suggest that you increase the number of inodes when
you create the file system on the floppy.  See the manual page for
mke2fs.
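To see whether it is inodes (rather than blocks) that ran out, 'df -i'
reports inode usage, and mke2fs's -i option lowers the bytes-per-inode
ratio so that more inodes are created.  A sketch (the mount point,
device name and ratio below are only examples):

```shell
# Report inode usage: a full IUse% column combined with free blocks
# in plain 'df' means you ran out of inodes, not of space.
df -i /

# When (re)creating the floppy's filesystem, ask for one inode per
# 2048 bytes instead of the default.  WARNING: this destroys all data
# on the device (device name is an example only):
#   mke2fs -i 2048 /dev/fd0
```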

   3.2  I get the error 'File busy'.
   ---------------------------------

Then do a 'lsattr' on the file.  It will probably display `*' in the
place where you would have expected `c'.  This means that the file is
in a special state where automatic compression is turned off (for
instance the temporary file created by e2compress).  In such a case,
only one process can open the file (for safety reasons), the others
being told that the file is busy.  Programs that turn off automatic
compression should be written with care....

   3.3  I get the error 'I/O Error'.
   ---------------------------------

Then do a 'lsattr' on the file.  It will probably display `E' in the
place where you would have expected `c'.  This means the file is in a
special state where access is disallowed, the reason being that an
error (related to compression) occurred earlier.  The file is flagged
so that the chances of losing more data are minimized.

Currently, there is however no tool that could recover this file.  Such
an option should be added to e2uncompress.

   3.4  When reading, I see strange data in my file.
   -------------------------------------------------

Then do a 'lsattr' on the file.  It will probably display `*' in the
place where you would have expected `c'.  The difference from question
3.2 is that nobody else has the file open.  One way to re-enable
automatic compression is to do a 'chattr -X' on the file.

   3.5  E2compr doesn't seem to compress as well as gzip.
   ------------------------------------------------------

First of all, check that the `best' (i.e. most compressing)
compression algorithm and cluster size are being used.

  $ chattr -c myfile
  $ df myfile    # Just check that we didn't run out of space
                 # Another, surer way of doing this if we have the e2compr
                 # version of lsattr is `lsattr myfile' and see if an
                 # algorithm name is still displayed.
 
  Filesystem         1024-blocks  Used Available Capacity Mounted on
  /dev/hda2             149806  124412    17658     88%   /

                 #                        ^^^^^
                 # No, we probably didn't run out of space.

  $ chattr +c -m gzip9 -b 32 myfile


Now take a look at how it compares with straight `gzip -9'.

  $ e2ratio -l myfile
  1488    721      48.5%  myfile
  $ gzip -9 < myfile > myfile.gz
  $ du -k myfile.gz
  601     myfile.gz

There is still a difference (721 versus 601 1KB-blocks).  This difference
is because e2compr divides a file into sections (called `clusters') of a
fixed size (in this case 32KB) and compresses each of those clusters
independently: any similarity between the contents of different clusters
will be ignored.  The decision to do things this way was made in order to
reduce the amount of time it takes for random access.  (With gzip, there's
no way of accessing the last byte of the uncompressed stream without
decompressing the whole stream.  Other compression methods, more suitable
to random access, are conceivable, but designing a good compression method
is a serious undertaking.)
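You can reproduce this effect in user space by splitting a file into
32KB pieces and gzipping each piece independently, which is roughly
what e2compr does with a 32KB cluster size (the file name and contents
below are arbitrary examples):

```shell
# Build a compressible test file (contents are arbitrary).
seq 1 20000 > /tmp/myfile

# Compress the whole file in one stream, as plain 'gzip -9' would.
gzip -9 < /tmp/myfile > /tmp/whole.gz

# Split the file into 32KB "clusters" and compress each independently.
mkdir -p /tmp/clusters
rm -f /tmp/clusters/part.*
split -b 32768 /tmp/myfile /tmp/clusters/part.
gzip -9 /tmp/clusters/part.*

# Compare the sizes: the per-cluster total is at least as large,
# because similarities between different clusters cannot be exploited.
wc -c < /tmp/whole.gz
cat /tmp/clusters/part.*.gz | wc -c
```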
