
  What is currently implemented
  -----------------------------

Everything described in the README has been implemented, except for
the following limitation:

In versions 0.1a and 0.1b, the indirect bitmap is not implemented.
Supporting it would require a patch to e2fsck, so that e2fsck does not
reclaim the bitmap block, which it believes to be free.  As a
consequence, only the first 16 clusters of a file can be compressed.
With 4KB clusters, this means only the first 64KB is eligible for
compression; with 32KB clusters (the maximum possible), the first
512KB is eligible.
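As a quick sanity check on these figures (a sketch only; the names
`CLUSTERS' and `eligible_bytes' are illustrative, not e2compr code):

```python
# Without the indirect bitmap, only the first 16 clusters of a file
# can be compressed, regardless of cluster size.
CLUSTERS = 16

def eligible_bytes(cluster_size_bytes):
    # Total bytes of the file that are eligible for compression.
    return CLUSTERS * cluster_size_bytes

print(eligible_bytes(4 * 1024) // 1024)   # 64  (KB, with 4KB clusters)
print(eligible_bytes(32 * 1024) // 1024)  # 512 (KB, with 32KB clusters)
```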

In versions 0.1c and later, the indirect bitmap is partially
implemented and may be enabled provided that an e2fsck customised for
e2compr is used (see the installation instructions).  `Partially':
only the first level of the indirect bitmap is provided.  This means
that a file can store (X - 8) * 8 + 16 bits in the bitmap (where X is
the block size in bytes).  For instance, assuming 1KB blocks, a file
can hold 8144 compressed clusters, that is about 32MB of compressed
data with a cluster size of 4 blocks, or about 260MB with a cluster
size of 32.


  What is still to be done
  ------------------------

+ Add a flag to force immediate recompression after write access?
  (Currently we wait for the inode to be put.)  I don't know if this is
  really important.  The advantage of waiting is that it reduces the
  possible races.

  When I was thinking about the ideal compressed file system, I
  imagined we could wait a little longer before really compressing
  files that have been accessed.  Since access to compressed clusters
  is slower, we could uncompress them and mark the file dirty, but
  instead of compressing it again when the inode is put, just link it
  into a special directory that would hold all dirty files.  Files in
  this directory could be compressed again after a certain amount of
  time, or when we start to run short of free blocks.  This is a
  feature I liked in tcx.  It is no more than a cache in which the
  uncompressed blocks are stored on disk, and which persists even
  after the machine has been stopped.

+ Support the SYNC flag.
+ Support the SECRM flag.

+ Support mmap.  (This is already done for binaries, but not for write
  access.)

+ Free preallocated blocks when we fail to uncompress a cluster.

+ Make a little private cache for uncompressed data: in the current
  version, the working area is dimensioned for the worst case, but
  parts of it are probably never used in the average case.  They could
  be used to provide a small cache for free.

+ Try to reduce fragmentation.

+ Patch tune2fs so that it can dynamically tune the default cluster
  size or the default algorithm?

+ Allow modification of cluster size even for already compressed
  files.

+ Recompress the whole file when the algorithm is changed?

+ Better provision for logfiles, where we'd like to compress all but the
  last (incomplete) cluster.  (If the last cluster is compressed then we have
  to uncompress and recompress on every write -- and remember that logfiles
  are usually sync'ed after every line.)

+ Add some mount options?  Anything useful come to mind?

+ Make the algorithms into kernel modules.  This would reduce physical
  memory usage, and may have other advantages, e.g. adding a new algorithm
  without rebooting, or allowing a proprietary compression method
  (where the owner would not allow source code to be distributed).

+ Make an e2compr kernel module?  The aim would be that people can
  insmod e2compr into a kernel even if that kernel already has the
  ext2 fs (without e2compr) built in.  Useful if you don't have a
  choice of the base kernel, as may be the case when upgrading some
  Linux distributions.

+ Get e2uncompress to work even under a kernel without e2compr
  support, so that compressed files stored on a filesystem written by
  a patched kernel can be uncompressed.  It should also be usable to
  recover data from files that have the ECOMPR flag turned on.

  On an unpatched kernel, we don't have access to the bitmap showing
  which clusters are compressed, nor to the cluster size.  Thus we
  have to guess by looking for cluster heads.  We assume that anything
  that looks like a compressed cluster and can be successfully
  decompressed does correspond to a cluster marked as compressed in
  the cluster bitmap.  In practice, the only case where this
  assumption will not hold is if an archive file (e.g. a tar file)
  contains an e2compressed file as read with a kernel not supporting
  e2compr.  (The chances of a random bit pattern looking like a
  compressed cluster are negligible, and insignificant compared to the
  chances of error due to hardware failure.)  For any other clusters
  (e.g. uncompressed clusters and clusters with compression errors),
  we copy the raw data across.
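  The guessing pass might look roughly like the sketch below.  To be
  clear: the MAGIC value, the header layout, and the use of zlib are
  all placeholders for illustration; e2compr's actual cluster-head
  format and compression algorithms differ.

```python
import zlib

# Placeholder cluster-head magic -- NOT e2compr's real format.
MAGIC = b"\xc0\x4e"

def recover_cluster(raw):
    """Return the uncompressed data for one cluster, or the raw bytes
    unchanged if it does not look like (or fails to decode as) a
    compressed cluster."""
    if raw[:len(MAGIC)] == MAGIC:
        try:
            # Anything that looks like a cluster head AND decompresses
            # cleanly is assumed to be a genuinely compressed cluster.
            return zlib.decompress(raw[len(MAGIC):])
        except zlib.error:
            pass  # false positive, or a cluster with errors
    # Uncompressed cluster (or undecodable one): copy raw data across.
    return raw
```

  The key design point is the fall-through: a cluster that merely
  resembles a head but fails to decompress is copied verbatim rather
  than lost, which is what makes the tool usable for recovering files
  with the ECOMPR flag turned on.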

+ Add more compression algorithms.  LZO (``real-time compression'',
  <http://www.infosys.tuwien.ac.at/Staff/lux/marco/lzo.html>) looks
  interesting.

+ After compression, fill the last partial block with zeros.  (Is this
  really useful?  The only time I think it could be mildly helpful is
  for recovering a compressed cluster with errors in it.  Not worth it
  for that alone.)

                         -------------


If you have ideas that are not listed in the TODO file, or opinions
about how things that are listed should be done, please write to the
maintainer (<reiter@netspace.net.au>).
