db4o Developer Community
Developer Community db4o open source object database, native to Java and .NET
Abysmal Query Performance! Am I doing something wrong?
Last Post 22 Mar 2010 11:08 AM by amercieca. 6 Replies.
amercieca (New Member, Posts: 3)
07 Mar 2010 10:37 PM

 Hi,

 

I'm currently evaluating db4o.

The API is very neat and I can see developer productivity getting a massive boost using an OODBMS with such a clean API.

Anyway, I got excited about db4o and did some tests... which resulted in disappointment. The query performance is bad; really bad. I must be doing something wrong!

My test is based on one persistent POJO:

 

package db4ostress;

import com.db4o.config.annotations.Indexed;

public class Person {

    @Indexed
    private String name;

    @Indexed
    private String surname;

    @Indexed
    int age;

    public Person(String name, String surname, int age) {
        this.name = name;
        this.surname = surname;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public String getSurname() {
        return surname;
    }

    public int getAge() {
        return age;
    }

    @Override
    public String toString() {
        return String.format("%s %s: %d", name, surname, age);
    }
}

 

I used a tool to generate a CSV file with realistic data for this class. I had it generate 300,000 records.

Then I wrote a simple import routine that read the data from the CSV into a db4o ObjectContainer. I had problems with this too: I could not load all the objects in one transaction because the VM kept running out of heap space. I had to load the data using transactions limited to 1000 new objects each. This is something that we might live with, but it's still a limitation. By the way, loading this data took more than 5 minutes, which is a bit slow.
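The batching described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the original import routine: the `Sink` interface and the `importAll` helper are hypothetical names introduced here. With db4o, the two operations would presumably map to `ObjectContainer.store(...)` and `ObjectContainer.commit()`.

```java
final class BatchedImport {

    /** Hypothetical stand-in for the two container operations the import needs. */
    interface Sink {
        void store(Object o);
        void commit();
    }

    /**
     * Stores every item and commits after each full batch, plus once at the
     * end for a partial batch. Returns the number of commits performed.
     */
    static int importAll(Iterable<?> items, Sink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (Object item : items) {
            sink.store(item);
            if (++pending == batchSize) {
                sink.commit();   // end the current transaction
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {       // flush the last, partial batch
            sink.commit();
            commits++;
        }
        return commits;
    }
}
```

With a batch size of 1000, importing 300,000 objects this way would issue 300 commits.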

 

Anyway, the major problem I hit was when I tried to retrieve all Person instances having an age >= 25. The query (using SODA) takes about 15 seconds! That is far too long to be feasible in a production environment. I had first tried Native Queries using a Predicate, but this was much worse: it seemed to be walking through the whole list of Person instances and invoking my predicate for each one, ignoring the indexes I had set up (using the @Indexed annotation). So after reading the docs I tried SODA, which improved things but is still far from what I would consider acceptable. The same query against the same data on a MySQL database takes about 250ms.

One thing I noticed was that if I changed the query to find objects having age == 25, the query executed in approx. 1.2 seconds! Surely this must indicate that something is wrong, no?

This is the code for my SODA query:

 

        ObjectSet r;
        Query q = oc.query(); //oc is the ObjectContainer
        q.constrain(Person.class);
        q.descend("age").constrain(new Integer(24)).greater();
        long now = System.currentTimeMillis();
        System.out.println("Starting query...");
        r = q.execute();
        System.out.println("Query Duration: " + (System.currentTimeMillis() - now));

 

Please tell me that I am doing something wrong! I am impressed with the db4o API and how easy it is to work with. But such poor performance is surely a stumbling block for our adoption of it.

 

I did the tests with v7.12.

 

Tks.

- Adrian.

Carl Rosenberger (Veteran Member, Posts: 2122)
09 Mar 2010 11:54 AM
db4o query processing runs in two steps.
In a first step the index processor creates a set of candidate objects that the query starts out with.
In a second step all constraints are processed against the candidate objects.

Step 1 is very fast. It uses BTree logic and "BTree pointers", so very little needs to get loaded into memory.
Step 2 is based on an old SODA processor from the beginnings of db4o and it is comparatively slow.

That means that if the set of objects determined by the index processor is large, queries are also comparatively slow.
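The two steps can be pictured with a toy model. This is only a sketch of the idea and not db4o's actual implementation: a `TreeMap` stands in for the BTree index, and the class, its fields and the `evaluations` counter are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TwoStepQueryModel {

    /** Sorted map standing in for the index on 'age': age -> ids of objects with that age. */
    private final TreeMap<Integer, List<Integer>> ageIndex = new TreeMap<>();

    /** ages.get(id) is the age of the object with that id. */
    private final List<Integer> ages = new ArrayList<>();

    /** Number of constraint evaluations performed in step 2 so far. */
    int evaluations;

    void add(int age) {
        int id = ages.size();
        ages.add(age);
        ageIndex.computeIfAbsent(age, k -> new ArrayList<>()).add(id);
    }

    List<Integer> queryAgeAtLeast(int minAge) {
        // Step 1: the index produces the candidate set (cheap range lookup).
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> ids : ageIndex.tailMap(minAge, true).values()) {
            candidates.addAll(ids);
        }
        // Step 2: every candidate is checked against the constraint again,
        // so the cost grows with the number of candidates.
        List<Integer> result = new ArrayList<>();
        for (int id : candidates) {
            evaluations++;
            if (ages.get(id) >= minAge) {
                result.add(id);
            }
        }
        return result;
    }
}
```

In this model, a query that matches few objects performs few evaluations in step 2, while a query that matches most of the data evaluates the constraint once per matching object.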

From your results I assume that you don't have an even distribution of data and your query returns a large number of objects.
To check the difference you may like to try a value where you know you have little or no results:
q.descend("age").constrain(new Integer(444));

We have a pending optimisation for the query processor: If all querying work has been done by the index processor the old SODA processor does not need to run. You can vote for the issue here:
http://tracker.db4o.com/browse/COR-1133

If the expected set of result objects is small and if indexes can be used well, the query processor should be very fast as is.

amercieca (New Member, Posts: 3)
11 Mar 2010 07:46 AM

 Hi,

 

Thanks for your answer.

 

I have actually analysed the data and found that it is evenly distributed with regard to the 'age' field. There are roughly 3000 records for each age, e.g. about 3000 each for age 12, 13, 14 etc. up to 99.

 

I have experimented with what you said by trying queries for age >= 99; this query returns its results very fast. When I ran the query for age >= 95, it slowed down significantly. The lower the age I specify (and hence the larger the result set), the longer the query takes to execute.

 

This, imho, is somewhat of a flaw. With a B-tree index, when one queries the data in the above scenario for, say, age >= 45, locating the first record with age >= 45 via the 'age' index should be very fast; B-trees are very fast. Having located this first matching record, there is no need to re-analyse the candidates to its right in the tree, because by the B-tree's ordering every record to the right of the matching record is also >= 45. The flaw, therefore, is re-checking all the candidate records to the right for age >= 45 when this is already known from the index.

I might be wrong, but this correlates perfectly with my finding that the lower the age I specify, the longer the query takes to return.
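The argument about sorted order can be illustrated with a plain sorted array instead of a B-tree. This is a minimal sketch under that simplifying assumption; `SortedRangeScan`, `lowerBound` and `atLeast` are hypothetical names, and nothing here reflects db4o internals.

```java
import java.util.Arrays;

final class SortedRangeScan {

    /** Returns the index of the first element >= key in a sorted array (the array length if none). */
    static int lowerBound(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Returns all elements >= key. Only the binary search compares values;
     * everything to the right of the found position matches by ordering,
     * so no element is re-checked.
     */
    static int[] atLeast(int[] sorted, int key) {
        return Arrays.copyOfRange(sorted, lowerBound(sorted, key), sorted.length);
    }
}
```

For the sorted ages `{12, 12, 30, 45, 45, 60, 99}`, `atLeast(ages, 45)` yields `{45, 45, 60, 99}` after a logarithmic number of comparisons, regardless of how many elements match.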

Is this something that you would consider looking at?

Surely others here must have come across this issue.

 

Any comments?

 

Tks.

- Adrian.

 

 

 

Carl Rosenberger (Veteran Member, Posts: 2122)
15 Mar 2010 12:41 PM
Yes, absolutely, this is an issue that we want to improve.
That's why the Jira issue is there:
http://tracker.db4o.com/browse/COR-1133

mnemosyn (New Member, Posts: 23)
16 Mar 2010 10:12 AM
I guess I encountered the same issue just now, only via .NET/LINQ:

// This query is blazing fast (86ms on 200k objects), because the where clause will only result in a few hundred objects
var rpoListOrdered = (from ReallyPlainObject o in _context.Client where o.Ticks > lastMonth.Ticks orderby o.Ticks select o).Take(100).ToList();

// This query takes about 6s, i.e. it's roughly 70 times slower
var rpoListOrderedAll = (from ReallyPlainObject o in _context.Client orderby o.Ticks select o).Take(100).ToList();

First I thought this was related to some kind of Linq translation issue, but it seems that is not the case from what I read here and from what SODA queries suggest.

I wonder how much priority this task has for you and if it is complex in terms of implementation or not -- more to the point, if we can expect this feature in the near future or not?

In order to find e.g. the most recent posts of a forum, I'd now come up with a very strange solution that loops over queries and expands the range of items in the where clause, but that seems really, really hacky. It will still be the fastest solution, however. I see that posts in a forum are not exactly the typical use case for an object database, but such queries also occur in my complex domain model, where I need to sort large chunks of data and return the top troublemakers...
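The looping workaround mentioned above might look roughly like this. It is only a sketch of the idea: a `NavigableSet` stands in for the indexed `Ticks` field, each `tailSet` call stands in for one range query with a where clause, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class ExpandingWindowQuery {

    /**
     * Returns up to n of the most recent tick values, newest first, by
     * running range queries over a window that doubles until enough hits
     * are found or the window covers all data.
     * Assumes non-negative tick values well below Long.MAX_VALUE / 2.
     */
    static List<Long> mostRecent(NavigableSet<Long> ticksIndex, long now,
                                 long initialWindow, int n) {
        if (initialWindow <= 0) {
            throw new IllegalArgumentException("initialWindow must be positive");
        }
        List<Long> out = new ArrayList<>();
        if (ticksIndex.isEmpty() || n <= 0) {
            return out;
        }
        long oldest = ticksIndex.first();
        long window = initialWindow;
        while (true) {
            long from = now - window;
            // Stand-in for one small query: "where o.Ticks > from"
            NavigableSet<Long> hits = ticksIndex.tailSet(from, false);
            if (hits.size() >= n || from < oldest) {
                for (Long t : hits.descendingSet()) {
                    if (out.size() == n) {
                        break;
                    }
                    out.add(t);
                }
                return out;
            }
            window *= 2; // not enough hits yet: widen the range and retry
        }
    }
}
```

Each iteration keeps the candidate set small, which is the point of the workaround; the cost is a handful of extra queries when the first window is too narrow.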

Otherwise, using db4o with Linq is real fun because it is nicely encapsulated, fast and easy. Rocks!

Cheers, Chris

Carl Rosenberger (Veteran Member, Posts: 2122)
16 Mar 2010 11:27 AM

Performance improvements like this one certainly have priority in the near future. If you vote for the issue, you can increase the likelihood that we do it:

http://tracker.db4o.com/browse/COR-1133

So far there is only one vote.

amercieca (New Member, Posts: 3)
22 Mar 2010 11:08 AM

Phew! Good to hear I'm not the only one who has come across this.

I've just voted for it to be fixed.

Tks - Adrian.

Copyright ©2000-2010 by Versant Corp.