Re: [geonode] How much layer metadata should we replicate in the
Django models?
- From:
- David Winslow
- Date:
- 2010-06-14 @ 18:45
I don't think we are yet at the point where we need to start worrying
about RAM caches; there is still a lot of room for using the database in
a smarter (or rather, less braindead) way. In the specific example of
displaying GeoNetwork search results, we are hitting the database
separately for each search result to grab the metadata for it. Luke has
implemented a workaround that cleverly sidesteps this issue by holding
off on loading data from the database until the user expands a JS
widget, but really we should be able to work out a way to batch up those
requests if we think about it for a minute.
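
To make the batching idea concrete, here is a minimal sketch. It is not GeoNode code: the stdlib sqlite3 module stands in for Django's ORM, and the ``layer`` table, its columns, and both function names are made up for illustration. In Django the batched version would presumably be a single ``filter(uuid__in=...)`` query, assuming the layer model exposes a uuid field.

```python
import sqlite3

# Hypothetical schema: one row of metadata per layer, keyed by the
# UUID that each search result would carry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layer (uuid TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO layer VALUES (?, ?)",
    [("a1", "Roads"), ("b2", "Rivers"), ("c3", "Parcels")],
)


def metadata_one_by_one(uuids):
    """The pattern described above: one query per search result."""
    found = {}
    for uuid in uuids:
        row = conn.execute(
            "SELECT uuid, title FROM layer WHERE uuid = ?", (uuid,)
        ).fetchone()
        if row:
            found[row[0]] = row[1]
    return found


def metadata_batched(uuids):
    """Batched: a single IN (...) query for the whole page of results."""
    uuids = list(uuids)
    if not uuids:
        return {}
    placeholders = ", ".join("?" for _ in uuids)
    rows = conn.execute(
        "SELECT uuid, title FROM layer WHERE uuid IN (%s)" % placeholders,
        uuids,
    )
    return dict(rows)
```

Both functions return the same mapping; the second one costs one round trip no matter how many results are on the page.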
Conceptually, I'm not sure we even need to hit Django's DB for this at
all; all the metadata needed to display search results could probably
live in GeoNetwork (and the more metadata that lives in GeoNetwork, the
better, since GeoNetwork has its own search interface and can be
federated with other GeoNetworks). However, mirroring this stuff in
Django is going to be fairly important if we want to do the kinds of
things we've been talking up for GeoNode - having user profiles
influence layer metadata, etc.
I suppose one formulation of the problem is in a use case:
Jorge has uploaded several dozen layers to GeoNode. Since he has
filled out the profile for his GeoNode account, he has been able to
avoid a lot of repetitive work filling out the descriptions for each
of these layers. Now, however, he's been promoted and needs to
change his title from Data Wrangler to Poobah of Informatology ...
on 200 layers. GeoNode to the rescue! He simply edits his profile
and GeoNode updates all the metadata documents that reference him as
provider or metadata maintainer with current contact information.
So, we need some sort of data architecture that can
* figure out which layers need updating after a user profile changes
* update just the fields corresponding to that user profile (actually,
GN is basically storing the metadata documents as blobs so we will have
to overwrite everything... but we need to make sure that we don't
clobber the fields that aren't being modified)
One possible implementation would be to have a more relational model in
GeoNode and use the typical "WHERE owner.uid = updated_profile.uid" kind
of query to figure out what documents to update, and then just generate
entire new metadata documents to clobber the pre-existing ones. To
preserve the fields that aren't coming from GeoNetwork, we'd probably
want to store everything in the layer's Django representation.
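
A rough sketch of that implementation, under loud assumptions: sqlite3 stands in for the Django models, plain dicts stand in for the XML metadata documents, and every table, column, and function name here is hypothetical. The point is only that the owner join finds the affected layers and that each document is regenerated whole from the locally stored fields, so nothing unrelated to the profile gets lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (uid INTEGER PRIMARY KEY, name TEXT, title TEXT);
    CREATE TABLE layer (
        id INTEGER PRIMARY KEY,
        owner_uid INTEGER REFERENCES profile(uid),
        title TEXT,
        abstract TEXT
    );
    INSERT INTO profile VALUES (1, 'Jorge', 'Data Wrangler');
    INSERT INTO profile VALUES (2, 'Ana', 'Cartographer');
    INSERT INTO layer VALUES (10, 1, 'Roads', 'Road centerlines');
    INSERT INTO layer VALUES (11, 1, 'Rivers', 'Major rivers');
    INSERT INTO layer VALUES (12, 2, 'Parcels', 'Land parcels');
""")


def documents_for_profile(uid):
    """Rebuild the full metadata document for every layer owned by uid."""
    rows = conn.execute(
        "SELECT layer.id, layer.title, layer.abstract, "
        "       profile.name, profile.title "
        "FROM layer JOIN profile ON layer.owner_uid = profile.uid "
        "WHERE profile.uid = ? ORDER BY layer.id",
        (uid,),
    )
    # Each dict is a complete document: layer-specific fields come from
    # the layer row, contact fields from the (just updated) profile row.
    return {
        layer_id: {
            "title": title,
            "abstract": abstract,
            "contact": {"name": name, "position": position},
        }
        for layer_id, title, abstract, name, position in rows
    }


# Jorge's promotion: one profile edit, then regenerate his documents.
conn.execute(
    "UPDATE profile SET title = ? WHERE uid = ?",
    ("Poobah of Informatology", 1),
)
docs = documents_for_profile(1)
```

Only Jorge's two layers come back, each carrying the new title alongside its original abstract.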
--
David Winslow
OpenGeo - http://opengeo.org/
On 06/14/2010 12:06 PM, Ariel Nunez wrote:
> Short story:
> http://github.com/sebleier/django-redis-cache
>
> Long story:
> IMHO, the best idea is to just cache the metadata in RAM, unlike
> memcached, Redis also writes a backup periodically to the disk and is
> able to maintain the data between restarts. What we would do then is
> either write the key, value pairs or just store a geojson dict for a
> given layer.
>
> Here is some code I wrote a while ago that uses redis in a very simple
> yet effective way to cache an expensive operation:
>
> http://github.com/ingenieroariel/dondevoto/blob/master/server.py#L16
>
Re: [geonode] How much layer metadata should we replicate in the
Django models?
- From:
- Ariel Nunez
- Date:
- 2010-06-14 @ 20:51
>
> So, we need some sort of data architecture that can
> * figure out which layers need updating after a user profile changes
I suggest we hook up the post_save[1] signal handler for the Profile
model and take ``self.user.layer_set.all()`` as the list of layers to
be updated. If we can do bulk updates to GeoNetwork, it would be great
to add that as a LayerManager method.
Which makes me think: is this update operation expected to be
expensive (in terms of time)? If it is, then we had better take it out
of the request/response cycle, for example by creating an
``update_geonetwork`` management command that runs every minute and
checks whether there are pending updates (by looking at a
PendingUpdates table or similar).
[1]
http://docs.djangoproject.com/en/dev/ref/signals/#django.db.models.signals.post_save
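
A minimal sketch of the queueing idea, with assumptions stated up front: this is plain Python with sqlite3 rather than Django, ``on_profile_saved`` is a stand-in for what a post_save handler on Profile might do, ``run_update_geonetwork`` for the body of the management command, and ``push`` is a hypothetical callable that would write one layer's metadata to GeoNetwork. Table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE layer (id INTEGER PRIMARY KEY, owner_id INTEGER);
    CREATE TABLE pending_update (layer_id INTEGER PRIMARY KEY);
    INSERT INTO layer VALUES (10, 1);
    INSERT INTO layer VALUES (11, 1);
    INSERT INTO layer VALUES (12, 2);
""")


def on_profile_saved(user_id):
    """Cheap work done inside the request: only enqueue the user's layers."""
    conn.execute(
        "INSERT OR IGNORE INTO pending_update (layer_id) "
        "SELECT id FROM layer WHERE owner_id = ?",
        (user_id,),
    )


def run_update_geonetwork(push):
    """Periodic job: push each pending layer, then remove it from the queue."""
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT layer_id FROM pending_update ORDER BY layer_id"
        )
    ]
    for layer_id in pending:
        push(layer_id)  # hypothetical: writes one document to GeoNetwork
        conn.execute(
            "DELETE FROM pending_update WHERE layer_id = ?", (layer_id,)
        )
    return pending


on_profile_saved(1)
on_profile_saved(1)  # saving the profile twice does not queue duplicates
pushed = []
processed = run_update_geonetwork(pushed.append)
```

Because a layer is dequeued only after ``push`` returns, a failure partway through leaves the remaining layers queued for the next run.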
> * update just the fields corresponding to that user profile (actually, GN is
> basically storing the metadata documents as blobs so we will have to
> overwrite everything... but we need to make sure that we don't clobber the
> fields that aren't being modified)
From your comments I gather that this is not feasible. Am I correct?
Which one is supposed to be the authoritative data source for
metadata, our GeoNode or GeoNetwork? Can we safely assume that every
GeoNode instance starts off with a fresh GeoNetwork?
> One possible implementation would be to have a more relational model in
> GeoNode and use the typical "WHERE owner.uid = updated_profile.uid" kind of
> query to figure out what documents to update, and then just generate entire
> new metadata documents to clobber the pre-existing ones. To preserve the
> fields that aren't coming from GeoNetwork, we'd probably want to store
> everything in the layer's Django representation.
BTW, if we are going to replicate a lot of the GeoNetwork metadata in
the Django db, I wonder why we still need GeoNetwork. Only for
searching?
Ariel
Re: [geonode] How much layer metadata should we replicate in the
Django models?
- From:
- David Winslow
- Date:
- 2010-06-14 @ 21:17
On 06/14/2010 04:51 PM, Ariel Nunez wrote:
>> So, we need some sort of data architecture that can
>> * figure out which layers need updating after a user profile changes
>>
> I suggest we hook up the post_save[1] signal handler for the Profile
> model and take ``self.user.layer_set.all()`` as the list of layers to
> be updated. If we can do bulk updates to GeoNetwork, it would be great
> to add that as a LayerManager method.
>
> Which makes me think: Is this update operation expected to be
> expensive(in terms of time) ? If it is, then we better take it out of
> the request/response cycle, for example creating a
> ``update_geonetwork`` management command that runs every minute and
> sees if there are pending updates (by checking a PendingUpdates table
> or similar).
>
> [1]
> http://docs.djangoproject.com/en/dev/ref/signals/#django.db.models.signals.post_save
>
>
>
Yes, something like this would make sense. Updating GeoNetwork would
likely take some time, since afaik we can only write metadata documents
one at a time (a separate HTTP request per document). At some point we
might want to modify GeoNetwork with some facilities for better
supporting our usage, depending on how receptive the GeoNetwork project
is to such changes.
>> * update just the fields corresponding to that user profile (actually, GN is
>> basically storing the metadata documents as blobs so we will have to
>> overwrite everything... but we need to make sure that we don't clobber the
>> fields that aren't being modified)
>>
> From your comments I get that this is not feasible, am I correct?
> Which one is supposed to be the authoritative data source for
> metadata, our GeoNode or GeoNetwork? Can we safely assume that every
> GeoNode instance starts off with a fresh GeoNetwork?
>
>
I think we can for now. Later, a mass import to "upgrade" a standalone
GeoNetwork to a GeoNode site will be very desirable. We will need to at
least be able to handle 'foreign' layers reasonably well, so that if you
have a federated GeoNetwork setup, GeoNode doesn't try to modify metadata
for layers for which it is not the authoritative provider.
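
One way the 'foreign' layer guard could look, as a sketch only: it assumes each record carries some marker of which site published it. The ``source_site`` field, the ``LOCAL_SITE`` value, and the function name are all hypothetical; nothing here reflects how GeoNetwork actually labels harvested records.

```python
from dataclasses import dataclass


@dataclass
class LayerRecord:
    uuid: str
    source_site: str  # hypothetical field: which site published the record


LOCAL_SITE = "geonode-local"  # hypothetical identifier for this GeoNode


def layers_we_may_update(records, local_site=LOCAL_SITE):
    """Keep only the records this site is the authoritative provider for."""
    return [r for r in records if r.source_site == local_site]


records = [
    LayerRecord("a1", "geonode-local"),
    LayerRecord("b2", "partner-geonetwork"),  # mirrored from a federated site
    LayerRecord("c3", "geonode-local"),
]
ours = layers_we_may_update(records)
```

Any bulk metadata rewrite would then run over ``ours`` rather than over everything the catalogue returns.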
>> One possible implementation would be to have a more relational model in
>> GeoNode and use the typical "WHERE owner.uid = updated_profile.uid" kind of
>> query to figure out what documents to update, and then just generate entire
>> new metadata documents to clobber the pre-existing ones. To preserve the
>> fields that aren't coming from GeoNetwork, we'd probably want to store
>> everything in the layer's Django representation.
>>
> BTW, If we are going to replicate a lot of the GeoNetwork metadata in
> the Django db, I wonder why we still need GeoNetwork, only for
> searching?
>
Yes, searching. GeoNetwork is basically playing the role of a
BBOX-aware full-text search engine for us right now. It could also be a
mechanism for publishing GeoNode data to other GeoNetwork and GeoNode
sites via CSW federation (one GeoNetwork instance can crawl another and
mirror the metadata records).
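
For anyone unsure what "BBOX-aware full-text search" amounts to, here is a toy illustration. It says nothing about how GeoNetwork implements search; the record layout and function names are invented, and a substring match stands in for real full-text indexing.

```python
def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, text, bbox):
    """Titles of records whose text matches and whose extent meets bbox."""
    needle = text.lower()
    return [
        r["title"]
        for r in records
        if needle in (r["title"] + " " + r["abstract"]).lower()
        and bbox_intersects(r["bbox"], bbox)
    ]


# Hypothetical catalogue entries.
RECORDS = [
    {"title": "Roads", "abstract": "Road centerlines", "bbox": (0, 0, 10, 10)},
    {"title": "Rivers", "abstract": "Major rivers", "bbox": (20, 20, 30, 30)},
    {"title": "Trails", "abstract": "Unpaved roads", "bbox": (40, 40, 50, 50)},
]
```

A query for "road" restricted to the box (5, 5, 15, 15) matches only the first record, since the third matches the text but lies outside the box.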