Computers


The Java JVM has the -Xnoclassgc argument to inhibit class garbage collection. That is so 2006. If you have a long-running server JVM, this is most likely going to leak memory.

Specifically, if your process

  • Uses serialization
  • Uses reflection
  • Can have a remote debugger or JConsole attached
  • Uses any dynamically generated classes
  • Uses any 3rd party jars that do any of the above

then you have a slow, psssssssssssssssssssss sound coming from your JVM.

Do you seriously think that in 2008 you know that none of the jars on your classpath use reflection or serialization?

We live in a world now where JRuby comes along and may generate holder and wrapper classes during runtime and do all sorts of stuff to the internals trying to get things to byte-compile down. Only geniuses can truly understand this stuff.

Rather than trying to save a few milliseconds over a month of server JVM run time by turning off class GC, let the Concurrent Mark Sweep collector unload classes (these flags only take effect when CMS itself is selected with -XX:+UseConcMarkSweepGC):


java -server -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

and let’s relegate -Xnoclassgc to the scrap-heap. It probably doesn’t do what you think it does.
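One way to check whether classes are actually being unloaded is to ask the JVM itself. The following is a minimal sketch using the standard java.lang.management API; the class name ClassUnloadReport is made up for illustration.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports class loading counters for the running JVM.
public class ClassUnloadReport {

    // Builds a one-line summary of loaded vs. unloaded class counts.
    public static String summary() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return "loaded now=" + bean.getLoadedClassCount()
                + " loaded total=" + bean.getTotalLoadedClassCount()
                + " unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Logged periodically from a long-running server, a total that keeps growing while the unloaded count stays at zero would be a hint that class metadata is accumulating.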

It all started with Log4j — Java, log4j and I sort of grew up together. I loved log4j. But along came j.c.l, jdk1.4 logging and I had to try them all out. When building a library that other people are supposed to embed into something of theirs, though, turning the logging choice over to them makes a lot of sense. As our little project grew and needed to bring in other components we always appreciated those that did logging the way we wanted it done.

Jetty was a standout in that area — I first learned the coolness of dependency injection from studying Jetty. Jetty provided our first introduction to Slf4j. Slf4j is sort of the ultimately flexible logging system. You build to the Slf4j API and drop in the statically bound jar file for the back end implementation you want (e.g. log4j, logback, log4juli). This is a fantastic idea and thanks are due to Ceki Gülcü for log4j, logback and slf4j.

Quite some time back we felt the pain of just trying out a new logging system. Since every file has imports and a factory method call to get the logger, it is painful to switch. So we insulated ourselves from that by creating our own Logger API, an abstract base class that pretty closely matched what we needed and were using from log4j. We had a Logger.getLogger(String) method, debug, info, warn, error and fatal levels. And we had a LogManager that acted as the factory for it all, returning a wrapper around log4j loggers that met our API. Life was good.

We added Slf4j into the mix at first only to get logging from some Jakarta components that were using j.c.l out into the log4j backend where we wanted them. Slf4j was fantastic at that.

Then the haunting question — why aren’t we doing in our own library the very thing that we enjoy about the logging capabilities of Jetty? Jetty has a soft dependency on Slf4j and falls back to some type of logging to stdout when it isn’t present. Cool. We can do that too.
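A soft dependency of this kind boils down to probing for a class at runtime and falling back when it is missing. Here is a rough sketch of the idea; SoftDependency, isPresent and chooseBackend are hypothetical names, not anything from Jetty or Slf4j.

```java
// Hypothetical helper illustrating a "soft" dependency check.
public class SoftDependency {

    // Returns true when the named class can be found at runtime.
    public static boolean isPresent(String className) {
        try {
            ClassLoader cl = Thread.currentThread().getContextClassLoader();
            if (cl == null) {
                Class.forName(className);
            } else {
                // 'false' avoids running static initializers just to probe
                Class.forName(className, false, cl);
            }
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    // Picks a backend name: Slf4j when available, otherwise plain stdout.
    public static String chooseBackend() {
        return isPresent("org.slf4j.LoggerFactory") ? "slf4j" : "stdout";
    }
}
```

The probe never references the optional class directly, so the code compiles and loads whether or not the jar is on the classpath.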

The friction came when I realized that our nice little log4j-based API abstraction needed NDC. The Slf4j API does not support NDC. Major bummer. We find this incredibly helpful in debugging a complex multi-threaded system. We have pushContext(String), popContext() and clearContext() methods in our logging abstraction and I don’t really want to give them up.

But hey! Why not make a soft dependency on those features just like we are doing for Slf4j? So in the new Slf4jLogger that we use to encapsulate our Slf4j capability, we can check for the presence of org.apache.log4j.Logger and grab those methods we need for NDC. If they aren’t there, ok, we drop the NDC information, but if they are there, then we have it all!

Here’s some code to add the soft dependency on NDC. First, in the static initializer of our Slf4jLogger class:

 // Requires: import java.lang.reflect.Method;
 // Support methods for NDC capability if log4j is available
 private static Method ndcClear;
 private static Method ndcPop;
 private static Method ndcPush;

   static {
            /* Not shown - Slf4j soft dependency loading */
            String NDC = "org.apache.log4j.NDC";
            try
            {
                ClassLoader cl = Thread.currentThread().getContextClassLoader();
                Class<?> ndcz = cl == null ? Class.forName(NDC) : cl.loadClass(NDC);
                ndcClear = ndcz.getMethod("clear");
                ndcPop = ndcz.getMethod("pop");
                ndcPush = ndcz.getMethod("push", String.class);
            } catch (Throwable t) {
                // log4j is not on the classpath: leave the handles null
            }
   }

So that grabs the underlying methods we need when they are present and doesn’t carp when they aren’t.
We can wrap that up in a simple function to let us know when NDC capability is present:

    protected static boolean isNDCEnabled ()
    {
         return ndcPush != null;
    }

And protect the actual methods that are the NDC part of the logger API:

    /**
     * Push the string on the ndc context stack if the capability is supported
     * @param s the new context
     */
    public void pushContext(String s)
    {
        if (isNDCEnabled())
        {
            try
            {
                ndcPush.invoke(null,s);
            }
            catch (Exception e) { /* e.printStackTrace(); */}
        }
    }
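The other two NDC methods can be guarded the same way. This is a stand-alone sketch rather than our actual Slf4jLogger: the class name NdcBridge, the static methods and the String return type of popContext are assumptions made so the example compiles on its own.

```java
import java.lang.reflect.Method;

// Hypothetical stand-alone NDC bridge; works with or without log4j present.
public class NdcBridge {
    private static Method ndcClear;
    private static Method ndcPop;

    static {
        try {
            Class<?> ndc = Class.forName("org.apache.log4j.NDC");
            ndcClear = ndc.getMethod("clear");
            ndcPop = ndc.getMethod("pop");
        } catch (Throwable t) {
            // log4j is absent or incomplete: make every call a no-op
            ndcClear = null;
            ndcPop = null;
        }
    }

    public static boolean isNDCEnabled() {
        return ndcPop != null && ndcClear != null;
    }

    // Pops the most recent context, or returns null when NDC is unavailable.
    public static String popContext() {
        if (!isNDCEnabled()) {
            return null;
        }
        try {
            return (String) ndcPop.invoke(null);
        } catch (Exception e) {
            return null;
        }
    }

    // Clears the whole context stack if the capability is supported.
    public static void clearContext() {
        if (!isNDCEnabled()) {
            return;
        }
        try {
            ndcClear.invoke(null);
        } catch (Exception e) {
            // ignore: losing context information must never break logging
        }
    }
}
```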

Since we are pushing the context into the actual Log4j NDC object, we didn’t have to reinvent anything. If Log4j is on the runtime classpath the features will be found. If Slf4j is configured to actually use Log4j as the backend for logging, then the NDC will actually pop out of the log and we still don’t need Log4j to be on the classpath at compile time!

So have your Slf4j and your NDC too!


The enterprise and presentation layers of one of the VLDB Data Warehouse (DW) projects that I work on are date range partitioned. This allows us to easily create a sliding window of data that is as big as we can handle given the storage constraints. Because we do partition maintenance operations quite often, we don’t use any global indexes; all of our indexes are local prefixed indexes. This impacts primary key index generation.

If the table is created with a primary key constraint


    CREATE TABLE mydata (
         part    TIMESTAMP NOT NULL,
         id        NUMBER NOT NULL,
         title    VARCHAR2(256)
     );
     ALTER TABLE mydata ADD
         (CONSTRAINT pk_mydata PRIMARY KEY(part,id));

Then what you get is a global non-partitioned primary key. One common piece of DW advice is to just not use primary keys at all, but this removes some of the self-documentation that exists in the schema that would be available to both the Cost Based Optimizer and any data modeling tools that might be used by a future DBA or programmer.

Here is what I have come up with that preserves the documentation and creates a LOCAL partitioned index for the primary key fields (Oracle 10g).


   CREATE TABLE mydata (
         part    TIMESTAMP NOT NULL,
         id        NUMBER NOT NULL,
         title    VARCHAR2(256)
     );
    ALTER TABLE mydata ADD
      (CONSTRAINT pk_mydata  PRIMARY KEY (part,id)
      DISABLE NOVALIDATE);
    ALTER TABLE mydata MODIFY CONSTRAINT pk_mydata RELY;
    CREATE INDEX mydata_pk_idx
         ON mydata(part,id) LOCAL
         COMPUTE STATISTICS PARALLEL
/

So now the primary key is there in the user_constraints table for everyone to find. It can be used by the optimizer if needed, but it is not enforced. The index created over the columns that I intend to function as primary key columns is created as a local index and each local index chunk lives in the partition with the data that it indexes. This allows the partition maintenance operations to slide the window forward one date range increment as needed without any indexes becoming unusable.

Bill: What happens if I specify -Xmx multiple times on the command line?
Me: I don’t know, let’s find out.


java -Xms1024m -Xmx99999 -Xmx1024m -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode)

java -Xms1024m -Xmx1024m -Xmx99999 -version
Error occurred during initialization of VM
Incompatible initial and maximum heap sizes specified
Abort trap

Me: The last one always wins.
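The -version trick only tells you whether the JVM starts. To see which value actually took effect, a tiny program can print the limit the runtime reports. A minimal sketch (the class name MaxHeap is made up):

```java
// Hypothetical probe: prints the maximum heap the running JVM reports.
public class MaxHeap {

    // Maximum heap the JVM will attempt to use, in megabytes.
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap ~ " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx256m -Xmx512m MaxHeap should report a figure close to the second value, though some collectors report slightly less than the configured size.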

Suppose you have a Java interface


   public interface Filter {
       int filter(Payload p);
       int filter(java.util.Collection<Payload> plist);
   }

and an abstract class that implements the interface:


   public abstract class AbstractFilter implements Filter {
        public int filter(java.util.Collection<Payload> plist) {
             // Assumed aggregation: sum the per-payload results
             int total = 0;
             for (Payload p : plist) {
                  total += filter(p);
              }
             return total;
         }
     }

I sometimes do stuff like this, providing convenience methods in an
abstract base class to enrich an API without putting a burden
on those wishing to implement it, and not tempting anyone to
cut-and-paste a bunch of code around.

Since the filter(Payload p ) method is declared on the interface and
because AbstractFilter is, well, Abstract, I don’t need to mention
filter(Payload p).

But suppose someone wants to provide an implementation:


   public class FooFilter extends AbstractFilter {
      @Override
      public int filter(Payload p) {
          ...
      }
   }

This fails on JDK 1.5 with “method does not override a method from its superclass”.
This will compile without warning on JDK 1.6, and we could just run around and tell
everyone not to use @Override in that situation, but, it’s clearer and better for everyone
and compiles without warnings in both JDK 1.5 and 1.6 if we just add this line to
the AbstractFilter class


    public abstract int filter(Payload p);

This is nice because it gives a place to put some javadoc for the AbstractFilter class,
returns search results if someone is grepping the source for filter methods, and it
allows the @Override to work as it should.

So +3 and -0, on that. It is an all around winner.
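Put together, the whole arrangement might look like the sketch below. Payload, SizeFilter and the summing behaviour are stand-ins invented for the example, and everything is nested in one class only so that it compiles as a single file.

```java
import java.util.Arrays;
import java.util.Collection;

public class OverrideDemo {

    // Stand-in for the real Payload type, whose shape is not shown here.
    static class Payload {
        final int size;
        Payload(int size) { this.size = size; }
    }

    interface Filter {
        int filter(Payload p);
        int filter(Collection<Payload> plist);
    }

    static abstract class AbstractFilter implements Filter {
        // Redeclared so subclasses can use @Override on JDK 1.5 as well.
        public abstract int filter(Payload p);

        // Assumption: the collection result is the sum of the single results.
        public int filter(Collection<Payload> plist) {
            int total = 0;
            for (Payload p : plist) {
                total += filter(p);
            }
            return total;
        }
    }

    static class SizeFilter extends AbstractFilter {
        @Override
        public int filter(Payload p) {
            return p.size;
        }
    }

    // Runs the convenience method through the interface type.
    public static int demo() {
        Filter f = new SizeFilter();
        return f.filter(Arrays.asList(new Payload(2), new Payload(3)));
    }
}
```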

Thanks to a little help from the JRuby user mailing list, I learned that the behavior we had been seeing is not a bug. Local variables that come from an eval are created in ThreadLocal storage. I think I already should have known this as I was not having similar issues with global variables.

But in order to keep multiple users from interfering with each other and never being able to use local variables, I changed our RubyConsole object to run on its own thread. It now implements the HttpSessionBindingListener interface so that I can properly clean up when the session expires. So there is happiness in JRuby integration land again.

I also learned how to capture the stdout and stderr from the JRuby environment and stuff them into ByteArrayOutputStream instances so that I can capture the data and put it on the HTML response along with the toString of any actual RubyObject that comes back from the eval call. So our JRuby console over AJAX HTTP looks a lot more like what happens in irb on a local terminal. This isn’t currently possible to do with pure BSF, but with a little help from JRuby objects it works if you set things up like this when a new BSFManager is created.

            protected ByteArrayOutputStream stdout = new ByteArrayOutputStream();
            Ruby runtime = Ruby.getDefaultInstance();
            IRubyObject out = new RubyIO( runtime, stdout );
            manager.declareBean("stdout", out, RubyIO.class);
            manager.declareBean("defout", out, RubyIO.class);
            manager.declareBean(">", out, RubyIO.class);

And then after every eval call, we can get any available stdout data by calling

            stdout.toString();
            stdout.reset();
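The toString-then-reset step is worth wrapping so the two calls cannot get separated. Below is a plain-Java sketch of that drain pattern, with no JRuby or BSF involved; CapturedOutput and drain are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical wrapper around the capture buffer used between eval calls.
public class CapturedOutput {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintStream stream = new PrintStream(buffer, true);

    // Stream to hand to whatever produces the output.
    public PrintStream stream() {
        return stream;
    }

    // Returns everything written since the last call and empties the buffer.
    public synchronized String drain() {
        stream.flush();
        String text = buffer.toString();
        buffer.reset();
        return text;
    }
}
```

Calling drain() once after each eval gives exactly the output of that eval, and nothing from earlier ones leaks into the next response.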

There are some other nice ways to do this without BSF and I still need to investigate whether JSR 223 support in JDK 6 would make this easier. But it works really nicely. Way to go JRuby guys!

Mac OS X Leopard (10.5) has some interesting new output from the ls(1) command. Some files include a “@” or a “+” at the end of the permission string. Like this:
-rw-rw-r--@ 1 12345 Jan 2 file.txt
This is different from the “@” used after the filename to note a symbolic link when using the -F flag. This new marker indicates:

  • @ - the presence of extended metadata, see it with “ls -l@”
  • + - the presence of security ACL info, see it with “ls -le”

The new mdls(1) command might also be of interest for another view of the metadata. On filesystems that cannot store extended attributes natively, the metadata is kept in a file whose name is ._ (dot underscore) followed by the normal filename. So the metadata for file.txt would be found in ._file.txt.

The new flags for ls are listed in the Leopard version of the ls man page, but if you’ve upgraded from Tiger you may not see any new man pages. The new man pages are delivered in gzip compressed format but the old man pages are not deleted — apparently a bug in the upgrade installer.

If you want to remove just those man pages that also have a (most likely newer) gzip version, then you can run this as root:

   cd /usr/share/man
   for f in `find . -name "*.gz" -print` ; do
       o=`echo "$f" | sed 's/\.gz$//'`
       if [ -f "$o" ] ; then
           echo rm "$o"
       fi
   done

Once you are happy with the output of that, remove the echo statement and the old man pages will be deleted. This isn’t perfect: it still leaves old man pages lying around for things that were part of Tiger but are not part of Leopard. There will still be a man page for niutil(1), for example, which is obsolete in Leopard. But I think I can live with that.
