Subject: [google-appengine] Re: Many-to-many JOIN with the Datastore
From: Filip (fili@gmail.com)
Date: 05/15/2008 05:01:28 AM
List: com.googlegroups.google-appengine

Another method is to unwind links into a single Expando model. Often, you need a record which has a fixed set of references in a list, and each referenced record has a set of properties. Instead of retrieving the set of referenced records for the record every time, you could try to include the referenced records in the parent record (Expando) by including ref1_prop1, ref1_prop2, etc., where prop1 and prop2 are replaced by actual property names. That way, you can substantially reduce the number of datastore lookups needed to build a single webpage.
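A rough sketch of that unwinding, using plain dicts as stand-ins for the parent entity and its referenced records so it runs without the App Engine SDK. The function name, the sample data, and the numbering from 1 are all assumptions made for illustration. On a real Expando the same keys would presumably be set as dynamic properties.

```python
def unwind_references(parent, referenced):
    """Return a copy of parent with each referenced record's
    properties added under refN_<property> keys (N starts at 1)."""
    flat = dict(parent)
    for index, record in enumerate(referenced, start=1):
        for prop, value in record.items():
            flat["ref%d_%s" % (index, prop)] = value
    return flat


# Hypothetical sample data: one book held by two libraries.
book = {"title": "Dune"}
libraries = [
    {"name": "Central", "city": "Ghent"},
    {"name": "Harbour", "city": "Antwerp"},
]

flat_book = unwind_references(book, libraries)
```

Rendering a page then needs only the one flattened record, at the price of keeping the copied values up to date.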

Now, I wonder if it would be possible to write a plugin script that automatically converts those ref1, ref2, etc. back into a list you can loop through as if they were real references. I'll try that.
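One way such a conversion might look, again on a plain dict rather than a real Expando entity. The refN_ key pattern follows the naming above; the function name and the sample record are hypothetical.

```python
import re

# Matches keys such as "ref1_name": group 1 is the index, group 2 the property.
REF_KEY = re.compile(r"^ref(\d+)_(.+)$")


def rewind_references(flat):
    """Group refN_<property> keys back into a list of dicts, ordered by N."""
    grouped = {}
    for key, value in flat.items():
        match = REF_KEY.match(key)
        if match:
            index = int(match.group(1))
            grouped.setdefault(index, {})[match.group(2)] = value
    return [grouped[i] for i in sorted(grouped)]


# Hypothetical flattened record.
flat_book = {
    "title": "Dune",
    "ref1_name": "Central",
    "ref1_city": "Ghent",
    "ref2_name": "Harbour",
    "ref2_city": "Antwerp",
}

references = rewind_references(flat_book)
for library in references:
    print(library["name"], library["city"])
```

Keys that do not match the pattern (here, title) are simply left out of the list.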

Is anybody aware of limitations on the number of fields an Expando model can have?

On 15 May, 10:15, Andrew Fong <Fong@gmail.com> wrote:

Hmmm, so maybe the proper way to approach the datastore is to think of it as a pseudo-cache. Let's say we start with a more or less normalized datastore and we do all the joins through a ReferenceProperty -- and if we notice we're frequently using that reference, we "cache" the referenced values in the referencing entity. And we treat updates to the referenced attribute using the same strategies we use for updates to any item that's cached -- e.g. wait for the values to propagate via some background task (speaking of which, how are people doing background tasks in GAE?), whether that's one that runs periodically or whenever certain kinds of entities are updated.
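A minimal sketch of that "cache plus delayed propagation" idea, with in-memory dicts standing in for datastore entities. Every name here (the dicts, the stale set, both functions) is an assumption for illustration, and how the refresh pass would actually be scheduled on GAE is left open.

```python
# Stand-ins for datastore entities, keyed by entity key.
libraries = {"lib1": {"name": "Central"}}
books = {
    "b1": {"library": "lib1", "library_name": "Central"},  # cached copy
    "b2": {"library": "lib1", "library_name": "Central"},  # cached copy
}

# Keys of libraries whose cached copies are out of date.
stale = set()


def rename_library(key, new_name):
    """Update the source record only and mark its copies as stale."""
    libraries[key]["name"] = new_name
    stale.add(key)


def refresh_cached_names():
    """Background pass: copy current names into the referencing books.
    Returns the number of books that were refreshed."""
    updated = 0
    for book in books.values():
        if book["library"] in stale:
            book["library_name"] = libraries[book["library"]]["name"]
            updated += 1
    stale.clear()
    return updated


rename_library("lib1", "Central Library")
before = books["b1"]["library_name"]   # still the old cached value
count = refresh_cached_names()
```

Between the rename and the refresh pass, readers see the old cached name, which is the usual trade-off with any cache.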

It seems to me that a large part of this could be automated though. I really like how the datastore indices are automatically generated in the index.yaml file without any action on the developer's part. I'm new to Python and GAE, but how feasible would it be to write a plugin that automatically does this sort of "caching"?

-- Andrew

On May 14, 5:01 pm, "Brett Morgan" <bret@gmail.com> wrote:

On Thu, May 15, 2008 at 9:15 AM, Andrew Fong <Fong@gmail.com> wrote:

I still have issues with denormalization. It's not just a space issue. The reason normalized databases don't repeat records is to avoid some confusion down the road. For example, what happens if, in the LibraryBook example, the Library changes its name? In a normalized database, you would only have to update one record. Under a denormalized database, would that entail finding every LibraryBook that referenced that particular Library and updating it?

If so, it seems that the more denormalized a database is, the more expensive updates are (even if the reads are fast).

Furthermore, it would require anyone trying to update an entity to understand the structure of all the entities that referenced this entity. In the LibraryBook example, updating the name attribute for Library requires knowing that there is a libraryname attribute in LibraryBook. Not a big deal for one model, but as the number of models increases, it's going to get difficult keeping track of which entities referencing Library have a libraryname attribute, which have a libraryaddress attribute, and which ones might not have any such attribute at all -- especially on a multi-person project.
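One possible way to keep track of this is a single registry that records where each attribute has been copied. The sketch below is only an illustration: Library, LibraryBook, libraryname and libraryaddress come from the example above, while LibraryEvent, the registry, and the helper function are hypothetical.

```python
# Maps (source model, source attribute) to every (model, attribute)
# that holds a denormalized copy of it.
DENORMALIZED_COPIES = {
    ("Library", "name"): [
        ("LibraryBook", "libraryname"),
        ("LibraryEvent", "libraryname"),    # hypothetical second model
    ],
    ("Library", "address"): [
        ("LibraryEvent", "libraryaddress"),
    ],
}


def fields_to_update(model, attribute):
    """List the copies that must change when model.attribute changes."""
    return DENORMALIZED_COPIES.get((model, attribute), [])


for target_model, target_attribute in fields_to_update("Library", "name"):
    print("update %s.%s" % (target_model, target_attribute))
```

The registry does not make the updates any cheaper, but it gives a multi-person project one place to look up which copies exist.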

Am I missing something?

-- Andrew

Yes, all of the above concerns are valid. Yes, denormalisation hurts, both on disk space, and on correctness.

The reason we are doing this is to achieve scale. At scale you wind up doing a bunch of things that seem wrong, but that are required by the numbers we are running. Go watch the eBay talks. Or read the posts about how many database instances Facebook is running.

The simple truth is, what we learned about in uni was great for the business automation apps of small to medium enterprises, where the load was predictable, and there was money enough to buy the server required to handle the load of 50 people doing data entry into an accounts or business planning and control app.

On the web, we are in a different world. If you get successful, you'll get slashdotted. Well, these days it's probably more correct to call it reddited. Or boing boinged. And suddenly you have to go from four servers to forty, to four hundred, to four thousand. Read up on the story about the iLike guys. They wrote an app that went viral on FB. And they melted. Needed servers. Yesterday.

What GAE gives you is the ability to handle this, easily. All the things that GAE makes you do are done with this end game in mind. You have to write your code such that it can run on 400 app servers spread across the globe, on Google's infrastructure. You have to deal with the fact that the transaction engine is distributed. You have to deal with the fact that queries are slow, and you should really be publishing entities that match one to one with your popular pages. And that you need to hide your updates using ajax. It's better to give the user a progress bar than a white screen of death, anyways.

If you aren't interested in serving millions of customers, then this is likely overkill for you. But if you are, then you have to go through this world view change. And yes it hurts. I'm not going to say it's easy. It hurt me when I had to go through it back in 2000. It actually took me about four attempts (aka, webapps that melted under load) before I got it. But, once you make the leap, and understand that we are breaking rules for a reason, then you'll understand where and when to do it. Every choice has costs and benefits. Understanding when GAE makes sense is part of this journey of discovery.

And if any of the above doesn't make sense, feel free to come back with more questions. =)

--

Brett Morgan
http://brett.morgan.googlepages.com/
