Fwd: [CAPRA Development] Update on geonetwork's fine grained edits
From: Gabriel Roldan
Date: 2010-05-17 @ 15:57
Hi Seb, sorry I'm so late on this.
(It's gonna be a lengthy email, so if you're not in the mood just jump to the
last paragraph; we discussed this in person anyway :)
So I spent quite a bit of time trying to figure out how to perform
finer-grained updates to geonetwork metadata records, and honestly I
don't see an easy way out.
I think if we finally decide we really need it, we should contact Jeroen
as he or some of his developers might well have some
more ideas on how to do this.
The problem is not that it's impossible to implement from the CSW API
point of view, but that I don't quite see how such an implementation could avoid sucking.
Maybe I'm wrong and this is just the nature of XML-based storage. What
pisses me off is that if, for example, you only want to update the title
of a metadata record, you need to read the full record into a DOM,
update the title, and serialize the DOM back to the database.
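To make that read-modify-write cycle concrete, here's a minimal Python sketch that uses the standard library's ElementTree as a stand-in for the DOM. It assumes an ISO 19139 record (the gmd/gco namespace URIs are the real ISO ones, but the record is a trimmed-down, made-up example) and it is not how geonetwork itself does it; the point is only that changing one title still means parsing and re-serializing the whole document.

```python
import xml.etree.ElementTree as ET

# Real ISO 19139 namespace URIs; the record itself is a trimmed-down, made-up example.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep gmd:/gco: prefixes when serializing

# Where the title lives inside an ISO 19139 record (relative to gmd:MD_Metadata).
TITLE_PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)


def update_title(record_xml: str, new_title: str) -> str:
    """Change one element, but parse and re-serialize the whole document to do it."""
    root = ET.fromstring(record_xml)  # full record -> in-memory tree
    title = root.find(TITLE_PATH, NS)
    if title is None:
        raise ValueError("record has no title element")
    title.text = new_title  # the only change we actually wanted
    return ET.tostring(root, encoding="unicode")  # full record goes back to storage


RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>abc-123</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Old title</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

updated = update_title(RECORD, "New title")
```

Everything outside the title (here just the fileIdentifier, in a real record a lot more) is parsed and written back untouched.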
Even if we disregard that and assume it's the only way to update an element of
a metadata record, I still see a couple of issues:
- concurrency: care should be taken that no two concurrent threads try
to update the same record at the same time, either by geonetwork itself or by
some sort of database locking, if supported.
- capability: I clearly don't have the XML/XSL skills or the deep knowledge of
the thousands of xml/xsl files geonetwork uses to reliably assess how I would
append content to a record. Moreover, say we have some sort of
socialMetadata element that obviously should have cardinality 0..N, and
we want to add a new entry to it. I guess the only way would be for the
update request to contain the aggregated elements for the already
existing entries as well as the new one. Otherwise I don't know how the
"update" semantics would apply: if we send only the element we want to
"append" to the record, that would work more like a bastardized insert
than like an update. Update means replacing the value of a given element,
so how can we tell whether the client means that, or just "add this new
entry to the list of entries for the given element"?
- Even if we do so (sending the whole thing together), say two
requests come in, each with the aggregate of the entries the record had
plus a new one: the one that finally persists is whichever is processed
later, and the earlier one's new entry is lost.
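That last point is the classic "lost update" problem. The Python sketch below replays it deterministically with an in-memory stand-in for a record (the version counter and the simplified <record>/<socialMetadata> layout are assumptions for illustration, not part of geonetwork's schema or table). It then shows one common mitigation, optimistic concurrency control: each writer states which version it read, and a write against a newer version is rejected so the client can re-read and retry.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class StoredRecord:
    data: str     # the whole XML document, like the 'data' column
    version: int  # hypothetical version counter, not a column of geonetwork's table


class VersionConflict(Exception):
    """Raised when the record changed between our read and our write."""


def read(store):
    return store.data, store.version


def append_entry(xml, text):
    # Parse the full document, append one (hypothetical) socialMetadata entry, serialize it all back.
    root = ET.fromstring(xml)
    ET.SubElement(root, "socialMetadata").text = text
    return ET.tostring(root, encoding="unicode")


def entries(xml):
    return [e.text for e in ET.fromstring(xml).findall("socialMetadata")]


def blind_write(store, xml):
    # "Last write wins": whatever was stored is simply replaced.
    store.data = xml
    store.version += 1


def checked_write(store, xml, expected_version):
    # Optimistic concurrency: only write if nobody else wrote since we read.
    # In a database, check and write would be one conditional UPDATE ... WHERE version = ?
    if store.version != expected_version:
        raise VersionConflict(f"expected version {expected_version}, found {store.version}")
    store.data = xml
    store.version += 1


START = "<record><socialMetadata>first</socialMetadata></record>"

# Two requests read the same snapshot, then each writes back the aggregate it built.
lost = StoredRecord(START, 0)
snap_a, _ = read(lost)
snap_b, _ = read(lost)
blind_write(lost, append_entry(snap_a, "from A"))
blind_write(lost, append_entry(snap_b, "from B"))
print(entries(lost.data))  # ['first', 'from B']  (A's entry is gone)

# Same interleaving, but the second writer notices the conflict and retries on fresh data.
safe = StoredRecord(START, 0)
snap_a, ver_a = read(safe)
snap_b, ver_b = read(safe)
checked_write(safe, append_entry(snap_a, "from A"), ver_a)
try:
    checked_write(safe, append_entry(snap_b, "from B"), ver_b)
except VersionConflict:
    snap_b, ver_b = read(safe)
    checked_write(safe, append_entry(snap_b, "from B"), ver_b)
print(entries(safe.data))  # ['first', 'from A', 'from B']
```

The retry keeps both entries without holding a lock, at the cost of redoing the parse/append/serialize work whenever two requests collide.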
Hmm... I realize this might be getting confusing.
Let's see: geonetwork stores each metadata record in a database table field
(the data field in the DDL below) as a complete XML document.
Any operation on the document hence requires
loading the whole document into memory as a DOM and manipulating it.
Inserts and updates (in their current state) are easy, because they mean
either "insert this new record" or "replace this record with this other
complete XML document".
Querying is also easy (though less so), because depending on the
level of detail requested, geonetwork uses one XSL transformation
or another to squeeze out the record's contents.
Sub-element updates, on the other hand, seem harder to do, though they'd
be possible using similar techniques: upon an update request, load the
document into a DOM and apply some kind of XSL transformation that replaces
an element with another one (though I'm not sure how I would add an element in
the correct place if it didn't already exist and still respect the
metadata schema!).
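Here's a rough Python sketch of the "add it in the correct place" part, assuming we already know the order the schema imposes on a parent's children. CHILD_ORDER and the <record> layout are hypothetical; in a real ISO 19139 record that ordering comes from the xs:sequence definitions in the schema files, and nesting makes it considerably hairier.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified xs:sequence for <record>: children must appear in this order.
CHILD_ORDER = ["title", "abstract", "socialMetadata", "keywords"]


def insert_in_schema_order(parent, new_child, order=CHILD_ORDER):
    """Insert new_child after the last sibling that the sequence allows before it."""
    rank = {tag: i for i, tag in enumerate(order)}
    new_rank = rank[new_child.tag]
    position = 0
    for i, child in enumerate(list(parent)):
        # Siblings ranked at or before the new element (including earlier entries
        # of the same 0..N element) must stay in front of it.
        if rank.get(child.tag, len(order)) <= new_rank:
            position = i + 1
    parent.insert(position, new_child)


root = ET.fromstring("<record><title>Roads</title><keywords>transport</keywords></record>")
for text in ("Nice layer", "Agreed"):
    entry = ET.Element("socialMetadata")
    entry.text = text
    insert_in_schema_order(root, entry)

print([c.tag for c in root])  # ['title', 'socialMetadata', 'socialMetadata', 'keywords']
```

Appending at the end instead would produce a document that no longer validates, which is exactly the trap described above.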
Geonetwork metadata table:
CREATE TABLE metadata
(
  id integer NOT NULL,
  uuid character varying(250) NOT NULL,
  schemaid character varying(32) NOT NULL,
  istemplate character(1) NOT NULL DEFAULT 'n'::bpchar,
  isharvested character(1) NOT NULL DEFAULT 'n'::bpchar,
  createdate character varying(24) NOT NULL,
  changedate character varying(24) NOT NULL,
  data text NOT NULL, -- <--- the complete XML document
  source character varying(250) NOT NULL,
  title character varying(255),
  root character varying(255),
  harvestuuid character varying(250) DEFAULT NULL::character varying,
  "owner" integer NOT NULL,
  groupowner integer,
  harvesturi character varying(255) DEFAULT NULL::character varying,
  rating integer NOT NULL DEFAULT 0,
  popularity integer NOT NULL DEFAULT 0,
  displayorder integer
);
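For reference, a small Python/SQLite sketch of why whole-document inserts and replacements are the easy case with this layout. It keeps only id, uuid and data from the table above (SQLite rather than PostgreSQL, and the function names are made up): inserting or replacing a record is one statement, while looking at even a single element means fetching and parsing the whole data value. The XSL-based querying geonetwork does is not shown, since Python's standard library has no XSLT support.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed-down SQLite stand-in for the metadata table above (only three columns kept).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE, data TEXT NOT NULL)"
)


def insert_record(conn, uuid, xml):
    conn.execute("INSERT INTO metadata (uuid, data) VALUES (?, ?)", (uuid, xml))


def replace_record(conn, uuid, xml):
    # A whole-document update is one statement: the new XML replaces the old one.
    conn.execute("UPDATE metadata SET data = ? WHERE uuid = ?", (xml, uuid))


def get_element_text(conn, uuid, path):
    # Reading even one element means fetching and parsing the whole document.
    (xml,) = conn.execute("SELECT data FROM metadata WHERE uuid = ?", (uuid,)).fetchone()
    return ET.fromstring(xml).findtext(path)


insert_record(conn, "abc-123", "<record><title>Roads</title></record>")
replace_record(conn, "abc-123", "<record><title>Main roads</title></record>")
print(get_element_text(conn, "abc-123", "title"))  # Main roads
```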
All this is to say that I'm starting to question the viability of
storing social metadata inside the metadata records themselves, and wondering
if we couldn't just handle it easily in Django, keeping users' comments
separate from the layers' metadata as a GeoNode-specific thing?
Mostly because I don't know yet what that social metadata would
look like or how valuable it is to have it as part of the layer's
standard metadata record, though I reckon it's elegant.
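To illustrate how much simpler the separate-storage route would be, here's a minimal sketch with a plain comments table keyed by the layer's metadata UUID. SQLite stands in for whatever Django would use (in Django this would be a model with a field holding the UUID), and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE layer_comment (
           id INTEGER PRIMARY KEY,
           layer_uuid TEXT NOT NULL,  -- uuid of the layer's GeoNetwork metadata record
           author TEXT NOT NULL,
           body TEXT NOT NULL
       )"""
)


def add_comment(conn, layer_uuid, author, body):
    # Appending a comment is a single INSERT: the XML record is never touched,
    # and two concurrent comments simply become two rows.
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO layer_comment (layer_uuid, author, body) VALUES (?, ?, ?)",
            (layer_uuid, author, body),
        )


def comments_for(conn, layer_uuid):
    rows = conn.execute(
        "SELECT author, body FROM layer_comment WHERE layer_uuid = ? ORDER BY id",
        (layer_uuid,),
    )
    return rows.fetchall()


add_comment(conn, "abc-123", "alice", "Looks good")
add_comment(conn, "abc-123", "bob", "Agreed")
print(comments_for(conn, "abc-123"))  # [('alice', 'Looks good'), ('bob', 'Agreed')]
```

The trade-off is that the comments would no longer travel with the standard metadata record when it is harvested or exported.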
And if needed, we'd better consult the true geonetwork experts, but I
wouldn't hold my breath expecting the solution to be either elegant or
well performing. I hope I'm wrong, though.
A compromise solution, on the other hand, could be that when you need to
add a socialMetadata entry to a layer's metadata record, we take care of
the syncing in the Django application, something like:
- user adds a comment for the layer and submits
- in Django we receive the request and somehow lock the layer's metadata record
for editing, e.g. by keeping a global queue of UUIDs of the layers being
processed
- we get the metadata record from GeoNetwork
- unmarshal it into a DOM in Django
- add the XML elements to the DOM
- update the metadata record in GeoNetwork, replacing the whole record
(as will happen anyway)
- unlock the layer (by removing its UUID from the global queue)
Does that sound too crazy? It won't be super performant, but it's at least
doable.
* On second thought, maybe it's just better to create a specific REST
endpoint in GeoNetwork that performs the above without pushing all the
complexity into the Django app, so that it simply calls a <url>/addComment
REST endpoint with some parameters, and the restlet takes care of
appending the comment to the metadata record and ensuring synchronization?
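On the Django side, that would reduce to a single HTTP call. Here's a minimal sketch using only urllib, where the /addComment path follows the idea above but the uuid and comment parameter names, the form encoding and the example base URL are all assumptions (no such endpoint exists yet):

```python
import urllib.parse
import urllib.request


def build_add_comment_request(base_url, uuid, comment):
    """Build (but don't send) a POST to a hypothetical <url>/addComment endpoint."""
    body = urllib.parse.urlencode({"uuid": uuid, "comment": comment}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/addComment",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical base URL; urllib.request.urlopen(req) would actually send it.
req = build_add_comment_request("http://example.org/geonetwork", "abc-123", "Nice layer")
print(req.get_method(), req.full_url)  # POST http://example.org/geonetwork/addComment
```

The locking and XML manipulation would then live inside the restlet, next to the code that already knows the schemas.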
What do you think?
Cheers,
Gabriel
--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.
--
Archive:
http://lists.opengeo.org/capra-development/archive/2010/05/1273873081184