On Monday 24 January 2005, 17:09, Kake L Pugh wrote:
On Sun 23 Jan 2005, Kjetil Kjernsmo <kjetil@kjernsmo.net> wrote:
In other cases, you can't reasonably expect to get just a fragment; you'll get the whole document even if you're only interested in a fragment. I'd like to create a Formatter around HTML::Tidy RSN, and while libtidy can format fragments, HTML::Tidy can't yet, AFAIK.
I'm not sure that the limitations of a helper module are a good thing to base an API on. You seem to be saying that because HTML::Tidy can't be told not to add <head>, <body>, etc to things, then a Formatter using it as a helper module must always return a full HTML document regardless of whether this is appropriate.
Ah, not really. The key is just that you need to know what the Formatter will do. Currently, what the format method returns is undefined, and that's not good enough. So it is not the shortcomings of a helper module as such; it is the different use-cases combined with the limitations one has to expect from different helper modules.
That's why I propose to have a document and a fragment method. In the case of HTML::Tidy, format will naturally return a document, and so will document, whereas fragment will return undef. In the case of Textile, format will naturally return a fragment, and so will fragment, while document will return the fragment wrapped in a minimal document: <html><head><title>$self->title</title></head><body>$self->fragment</body></html>
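To make the proposal a bit more concrete, here is a minimal sketch in Perl. The package names My::Formatter::FragmentNative and My::Formatter::DocumentNative are made up for illustration, and the conversions inside them are trivial stand-ins: nothing here calls the real Textile or HTML::Tidy modules, and no escaping is done.

```perl
use strict;
use warnings;

# Hypothetical formatter whose helper naturally produces a fragment
# (a stand-in for something Textile-like; not the real module).
package My::Formatter::FragmentNative;

sub new {
    my ($class, %args) = @_;
    return bless { text => $args{text}, title => $args{title} }, $class;
}

sub title { return $_[0]{title} }

# Stand-in conversion: a real formatter would call its helper module here.
sub fragment {
    my ($self) = @_;
    return '<p>' . $self->{text} . '</p>';
}

# format returns whatever is natural for this formatter: a fragment.
sub format { return $_[0]->fragment }

# document wraps the fragment in a very basic HTML skeleton.
sub document {
    my ($self) = @_;
    return '<html><head><title>' . $self->title . '</title></head><body>'
         . $self->fragment . '</body></html>';
}

# Hypothetical formatter whose helper can only produce whole documents
# (a stand-in for an HTML::Tidy-based formatter; not the real module).
package My::Formatter::DocumentNative;

sub new {
    my ($class, %args) = @_;
    return bless { text => $args{text}, title => $args{title} }, $class;
}

sub title { return $_[0]{title} }

# Stand-in: assume the helper always emits a complete document.
sub document {
    my ($self) = @_;
    return '<html><head><title>' . $self->title . '</title></head><body>'
         . $self->{text} . '</body></html>';
}

# format returns whatever is natural for this formatter: a document.
sub format { return $_[0]->document }

# No fragment available: return undef so the caller can find out.
sub fragment { return undef }

package main;

my @formatters = (
    My::Formatter::FragmentNative->new(text => 'Hello', title => 'Demo'),
    My::Formatter::DocumentNative->new(text => '<p>Hello</p>', title => 'Demo'),
);

for my $formatter (@formatters) {
    my $frag = $formatter->fragment;
    print ref($formatter), ': ',
        (defined $frag ? 'fragment available' : 'document only'), "\n";
}
```

The point is only that a caller can test the return value of fragment and fall back to document when it is undef, so the behaviour is defined either way.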
Or perhaps I'm misunderstanding, since you mention the "fragment" method. But if you're going to provide a method to return a fragment, the limitations of whatever module you get to do the heavy lifting aren't relevant, surely.
Well, they could be: if it can't return a fragment, the behaviour has to be defined...
I've just read the HTML::Tidy docs at http://cpan.uwinnipeg.ca/htdocs/HTML-Tidy/HTML/Tidy.html and as far as I can tell what it does is check the syntax of an HTML document. I can't see how to make it spit out HTML. What am I missing?
Ah, I think that's an old version...? Anyway, the HTML::Tidy clean method can clean up the string.
For an AxKit Provider, you would usually want to return a document since, in principle, it can be served directly to the user.
I don't actually know how AxKit works, but it seems to me that a Formatter can only ever create a really basic HTML document.
Yup.
How is it to know things like keywords, description, stylesheet, RDF auto-discovery whatsit, etc, to put in the <head>?
It can't... But often, that's the best you can do with the information you have available.
What can it be reasonably expected to do other than stick in a really basic <html><head><title>Whatever the title was set as in the input</title></head><body> at the start of the output and </body></html> at the end? Is that ever really going to be useful?
If it is the best you've got, it is better than nothing... :-)
Basically, what I do in AxKit is record a lot of metadata and classification information independently of the text itself. So that is not really the issue. The issue is just that I need to know whether I get just the fragment (something that can be inserted in the body element) or the entire document.
In my case, I will always need just the fragment, since all the metadata, titles, authors and stuff come from other sources. In the cases where I can't get just the fragment, I want to know, because in those cases I'll use XSLT to extract the contents of the body element. I just need to know what I get. As for the AxKit provider, I found that it is probably easier to always have it return a document; people can use that standalone if they wish, and I'll always use XSLT to get my fragment. But it is really quite irrelevant... :-)
This is not meant to pour scorn on your idea, but to explain how I see it at the moment and hopefully provoke comments from others.
Oh, no problem! Thanks for the opportunity to clarify the ideas!
Cheers,
Kjetil