Computers


The enterprise and presentation layers of one of the VLDB Data Warehouse (DW) projects that I work on are date range partitioned. This lets us easily maintain a sliding window of data that is as big as the storage constraints allow. Because we do partition maintenance operations quite often, we don’t use any global indexes; all of our indexes are local prefixed indexes. This affects how the primary key index is generated.

If the table is created with a primary key constraint in the usual way:


    CREATE TABLE mydata (
         part    TIMESTAMP NOT NULL,
         id        NUMBER NOT NULL,
         title    VARCHAR2(256)
     );
     ALTER TABLE mydata ADD
         (CONSTRAINT pk_mydata PRIMARY KEY(part,id));

Then what you get is a global, non-partitioned index backing the primary key. One common piece of DW advice is to just not use primary keys at all, but this removes some of the self-documentation in the schema that would otherwise be available to both the Cost Based Optimizer and any data modeling tools that a future DBA or programmer might use.

Here is what I have come up with that preserves the documentation and creates a LOCAL partitioned index for the primary key fields (Oracle 10g).


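    -- LOCAL requires mydata to be partitioned; the PARTITION BY RANGE (part)
    -- clause is omitted here for brevity.
    CREATE TABLE mydata (
         part    TIMESTAMP NOT NULL,
         id      NUMBER NOT NULL,
         title   VARCHAR2(256)
    );
    -- Declare the key for documentation only: it is neither enforced nor validated.
    ALTER TABLE mydata ADD
         CONSTRAINT pk_mydata PRIMARY KEY (part, id)
         DISABLE NOVALIDATE;
    -- RELY tells the optimizer it may trust the constraint even though it is disabled.
    ALTER TABLE mydata MODIFY CONSTRAINT pk_mydata RELY;
    CREATE INDEX mydata_pk_idx
         ON mydata(part, id) LOCAL
         COMPUTE STATISTICS PARALLEL
    /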

So now the primary key is there in the user_constraints view for everyone to find. The optimizer can use it if needed, but it is not enforced. The index over the columns that I intend to function as primary key columns is created as a local index, and each local index partition lives alongside the data that it indexes. This allows the partition maintenance operations to slide the window forward one date range increment as needed without any indexes becoming unusable.

Bill: What happens if I specify -Xmx multiple times on the command line?
Me: I don’t know, let’s find out.


java -Xms1024m -Xmx99999 -Xmx1024m -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode)

java -Xms1024m -Xmx1024m -Xmx99999 -version
Error occurred during initialization of VM
Incompatible initial and maximum heap sizes specified
Abort trap

Me: The last one always wins.
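One way to confirm which value the JVM actually settled on is to ask the runtime itself. The sketch below is not from the original exchange, and the class name `HeapCheck` is made up for illustration. It prints `Runtime.maxMemory()`, which reports the maximum heap the JVM will attempt to use and should roughly track the last `-Xmx` on the command line.

```java
// Hypothetical helper (not part of the original post): reports the heap limit
// the running JVM actually settled on.
class HeapCheck {
    /** Maximum heap in bytes, as reported by the runtime. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mb = maxHeapBytes() / (1024L * 1024L);
        System.out.println("max heap: " + mb + " MB");
    }
}
```

Running it as `java -Xmx256m -Xmx512m HeapCheck` should report a figure close to 512 MB. The number can come out slightly lower than the flag, depending on the garbage collector in use.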

Suppose you have a Java interface:


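   public interface Filter {
       int filter(Payload p);
       int filter(java.util.Collection<Payload> plist);
   }

and an abstract class that implements the interface:


   public abstract class AbstractFilter implements Filter {
        // Convenience overload: applies filter(Payload) to every element.
        // Assumption: the result is the sum of the per-item results.
        public int filter(java.util.Collection<Payload> plist) {
             int total = 0;
             for (Payload p : plist) {
                  total += filter(p);
             }
             return total;
        }
   }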

I sometimes do stuff like this, providing convenience methods in an
abstract base class to enrich an API without putting a burden
on those wishing to implement it, and without tempting anyone to
cut and paste a bunch of code around.

Since the filter(Payload p) method is declared on the interface, and
because AbstractFilter is, well, abstract, I don’t need to mention
filter(Payload p) in the class at all.

But suppose someone wants to provide an implementation:


   public class FooFilter extends AbstractFilter {
      @Override
      public int filter(Payload p) {
          ...
      }
   }

This fails on JDK 1.5 with “method does not override a method from its superclass”.
It compiles without warning on JDK 1.6, so we could just run around and tell
everyone not to use @Override in that situation. But it is clearer and better for everyone,
and compiles without warnings on both JDK 1.5 and 1.6, if we just add this line to
the AbstractFilter class:


    public abstract int filter(Payload p);

This is nice because it gives a place to put some javadoc for the AbstractFilter class,
returns search results if someone is grepping the source for filter methods, and it
allows the @Override to work as it should.

So +3 and -0 on that. It is an all-around winner.
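Putting the pieces together, the whole hierarchy might look like the sketch below. `Payload` is never defined in this post, so the stub here is an assumption, as are the body of `FooFilter` and the choice to have the collection overload sum the per-item results. The classes are package-private only so that they fit in one file.

```java
import java.util.Arrays;
import java.util.Collection;

// Payload is referenced but never defined in the post; this stub is an assumption.
class Payload {
    final String body;
    Payload(String body) { this.body = body; }
}

interface Filter {
    int filter(Payload p);
    int filter(Collection<Payload> plist);
}

abstract class AbstractFilter implements Filter {
    /** Redeclared so that @Override in subclasses compiles on JDK 1.5 as well as 1.6. */
    public abstract int filter(Payload p);

    // Assumption: the collection overload sums the per-item results.
    public int filter(Collection<Payload> plist) {
        int total = 0;
        for (Payload p : plist) {
            total += filter(p);
        }
        return total;
    }
}

// Hypothetical implementation: returns 1 for a non-empty payload, 0 otherwise.
class FooFilter extends AbstractFilter {
    @Override
    public int filter(Payload p) {
        return p.body.length() > 0 ? 1 : 0;
    }
}

class FilterDemo {
    public static void main(String[] args) {
        Filter f = new FooFilter();
        int kept = f.filter(Arrays.asList(new Payload("a"), new Payload(""), new Payload("b")));
        System.out.println(kept); // prints 2
    }
}
```

The convenience overload carries no @Override annotation of its own, because it implements an interface method directly and JDK 1.5 would reject the annotation there.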

Thanks to a little help from the JRuby user mailing list, I learned that the behavior we had been seeing is not a bug. Local variables that come from an eval are created in ThreadLocal storage. I think I should already have known this, as I was not having similar issues with global variables.

But to keep multiple users from interfering with each other, and to avoid losing the ability to use local variables altogether, I changed our RubyConsole object to run on its own thread. It now implements the HttpSessionBindingListener interface so that I can properly clean up when the session expires. So there is happiness in JRuby integration land again.
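The idea of pinning each console to one thread can be shown without JRuby or the servlet API. In this rough sketch everything is hypothetical: `PinnedConsole` stands in for the real RubyConsole (which is not shown here), and a `ThreadLocal<StringBuilder>` stands in for the interpreter's thread-local variable storage. Each console owns a single-thread executor, so every call for that console lands on the same thread and sees the same state.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the console; not the real RubyConsole.
class PinnedConsole {
    // Stand-in for the interpreter's thread-local variable storage.
    private static final ThreadLocal<StringBuilder> LOCALS = new ThreadLocal<StringBuilder>() {
        @Override
        protected StringBuilder initialValue() {
            return new StringBuilder();
        }
    };

    // One dedicated worker thread per console.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Appends text to this console's thread-local state and returns everything seen so far. */
    String eval(final String text) {
        try {
            return worker.submit(new Callable<String>() {
                public String call() {
                    return LOCALS.get().append(text).toString();
                }
            }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Call when the owning session goes away, for example from valueUnbound(). */
    void close() {
        worker.shutdownNow();
    }

    public static void main(String[] args) {
        PinnedConsole a = new PinnedConsole();
        PinnedConsole b = new PinnedConsole();
        a.eval("x=1;");
        b.eval("y=2;");
        System.out.println(a.eval("x+=1;")); // a still sees its earlier state
        System.out.println(b.eval(""));      // b never sees a's state
        a.close();
        b.close();
    }
}
```

Shutting the executor down in `close()` matters: its worker is a non-daemon thread, so a console that is never closed would keep a thread alive after its session has expired.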

I also learned how to capture the stdout and stderr from the JRuby environment and stuff them into ByteArrayOutputStream instances, so that I can put the data on the HTML response along with the toString of any actual RubyObject that comes back from the eval call. So our JRuby console over AJAX HTTP looks a lot more like what happens in irb on a local terminal. This isn’t currently possible to do with pure BSF, but with a little help from JRuby objects it works by setting things up like this when a new BSFManager is created:

            protected ByteArrayOutputStream stdout = new ByteArrayOutputStream();
            Ruby runtime = Ruby.getDefaultInstance();
            IRubyObject out = new RubyIO( runtime, stdout );
            manager.declareBean("stdout", out, RubyIO.class);
            manager.declareBean("defout", out, RubyIO.class);
            manager.declareBean(">", out, RubyIO.class);

And then after every eval call, we can get any available stdout data by calling

            stdout.toString();
            stdout.reset();

There are some other nice ways to do this without BSF, and I still need to investigate whether there are improvements in JSR 223 in JDK 6 that would make this easier. But it works really nicely. Way to go JRuby guys!
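The drain-and-reset step is worth wrapping in a small helper so that the two calls never get separated. This is a plain Java sketch with no JRuby or BSF involved; the class name `OutputCapture` and its methods are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper showing the drain-and-reset pattern on its own.
class OutputCapture {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Autoflush, so printed text reaches the buffer promptly.
    private final PrintStream stream = new PrintStream(buffer, true);

    /** The stream to hand to whatever produces the output. */
    PrintStream stream() {
        return stream;
    }

    /** Returns everything written since the last call, then clears the buffer. */
    String drain() {
        stream.flush();
        String text = buffer.toString();
        buffer.reset();
        return text;
    }

    public static void main(String[] args) {
        OutputCapture capture = new OutputCapture();
        capture.stream().print("hello");
        System.out.println(capture.drain()); // hello
        capture.stream().print("world");
        System.out.println(capture.drain()); // world
    }
}
```

Calling `drain()` after every eval keeps output from one request from leaking into the next response.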

Mac OS X Leopard (10.5) has some interesting new output from the ls(1) command. Some files include a “@” or a “+” at the end of the permission string. Like this:
    -rw-rw-r--@ 1 12345 Jan 2 file.txt
This is different from the “@” used after the filename to note a symbolic link when using the -F flag. This new marker indicates:

  • @ – the presence of extended metadata, see it with “ls -@”
  • + – the presence of security ACL info, see it with “ls -e”

The new mdls(1) command might also be of interest for another view of the metadata. On file systems that lack native support for extended attributes, the metadata is stored in a file that begins with ._ (dot underscore) followed by the normal filename. So the metadata for file.txt would be found in ._file.txt.

The new flags for ls are listed in the Leopard version of the ls man page, but if you’ve upgraded from Tiger you may not see any new man pages. The new man pages are delivered in gzip compressed format but the old man pages are not deleted — apparently a bug in the upgrade installer.

If you want to remove just those man pages that also have a (most likely newer) gzip version, then you can run this as root:

   cd /usr/share/man
   # For every gzipped man page, look for an uncompressed copy with the same name.
   for f in `find . -name "*.gz" -print` ; do
       o=`echo "$f" | sed 's/\.gz$//'`
       if [ -f "$o" ] ; then
           echo rm "$o"
       fi
   done

Once you are happy with the output of that, remove the echo and the old man pages will be deleted. This isn’t perfect: it still leaves old man pages lying around for things that were part of Tiger but are not part of Leopard. There will still be a man page for niutil(1), for example, which was dropped in Leopard. But I think I can live with that.
