[cgi-wiki-dev] Formatters reorg

Kake L Pugh cgi-wiki-dev@earth.li
Thu, 30 Sep 2004 04:58:17 +0100


This is long, but I think worth reading.

I had this mail from a chap called Kjetil Kjernsmo (who is now on this list):

> I'm developing a system I call TABOO, a little something based on
> AxKit.  I've got a small demo on http://demo.kjernsmo.net/ Please feel
> free to play with anything. You can log in as foo or bar with password
> trustno1 :-) bar is an editor.
> 
> Currently, I'm working on the problem you've been addressing with your
> CGI::Wiki::Formatter::* People will submit things to the site, either
> as stories or as comments (and in the future, longer articles). Most
> users will not know HTML, and they are probably unwilling to learn any
> markup language, although some may do things like putting some
> *asterisks* here and there...
> 
> I think you have something good going with the CGI::Wiki::Formatter::*
> classes there, and I wouldn't want to reinvent your wheel. I wouldn't
> want to have a lot of dependencies, but I see you're not actually
> use-ing CGI::Wiki in those, and package them separately, which looks great!
>
> So, before I go around hacking on it, I figured I'd ask you if you
> think that it would be straightforward to integrate
> CGI::Wiki::Formatter:: modules outside of CGI::Wiki?

and I replied:

> On Sat 25 Sep 2004, Kjetil Kjernsmo <kjetil@kjernsmo.net> wrote:
>> So, before I go around hacking on it, I figured I'd ask you if you
>> think that it would be straightforward to integrate
>> CGI::Wiki::Formatter:: modules outside of CGI::Wiki?
> 
> It's perfectly possible.  I used CGI::Wiki::Formatter::UseMod in a
> half-assed bulletin board thing I wrote for work.  Just instantiate
> the formatter and call ->format
> [...]
> I'd be happy to give you a hand with your formatter.  Maybe this might
> be a nice way to start moving CGI::Wiki things to the Wiki::* space.
> I'd really like CGI::Wiki::Formatter::UseMod to be called something
> like Wiki::Formatter::UseMod, so you might want to be thinking about
> using that namespace.
> 
> In terms of how to get started, I'm unsure that Text::WikiFormat is
> the best route any more, since I've been hitting its limitations and
> chromatic is far too busy to do much with it.  I quite liked the
> original (ie pre-obscure code) Kwiki approach of having little methods
> to handle each bit of markup.

Then there was lots of discussion on #OpenGuides.  IRC log with some
stuff cut out follows - I don't think I cut out anything important.  I
did cut out some suggestions where the suggester later decided it was
a bad idea, just because there was so much of it.

21:40 <KjetilK> basically, I have written a Wiki::Formatter::Textile, but
      instead of using it with CGI::Wiki
21:40 <KjetilK> I'm using it with my TABOO project...
21:40 <KjetilK> which is something I build on the top of AxKit
21:51 <jerakeen> ooooh, I was going to write W::F::T.
21:51 <jerakeen> how are you doing page linking?
21:52 <KjetilK> I'm just calling Text::Textile to do the hard work, and it
      does that, apparently
21:53 <KjetilK> I don't want to upload it to CPAN before I have talked with
      the rest of the interested folks, but I'll put it up on my webserver
      in a minute, you can have a look
21:53 <KjetilK> (it is very simple, really...)
21:53 <jerakeen> for the backlinking tracking, etc, CGI::Wiki likes to ask for
      a list of other wiki pages you're linking to. You are returning an
      empty list?
21:54 <KjetilK> http://dev.kjernsmo.net/tmp/Wiki-Formatter-Textile-0.01.tar.gz
21:55 <KjetilK> no, that's one of the things I didn't know that I suspect I
      should learn about... :-)
21:55 <KjetilK> is that only internal links or also external links?
21:56 <jerakeen> internal links only.
21:56 <jerakeen> see the find_internal_links method in
      http://search.cpan.org/src/KAKE/CGI-Wiki-0.59/lib/CGI/Wiki/Formatter/Default.pm
21:59 <jerakeen> linking, though - normally wikis use a different linking style
      to link to internal pages than to link to external ones
22:03 <jerakeen> you could subclass T::T instead of merely calling it, and
      override the format_link method..
22:04 <jerakeen> it's more that you don't want to express internal wiki links
      as http://site/wiki.cgi?node=Whatever
22:04 <jerakeen> the wiki should be generating the links itself, so you can
      move the site and it remains internally consistent.
22:04 <jerakeen> so maybe something like "this":WikiPage
22:04 <jerakeen> of course, that stuffs the url discovery regexp.
22:05 <jerakeen> and you can't have spaces in page names.
22:05 <jerakeen> I'd be tempted to use the [[Old Style]] links on _top_ of
      normal textile markup
22:05 <jerakeen> feels slightly ugly
22:05 <KjetilK> another thing is that I would hope to make it independent of
      wikis
22:06 <jerakeen> there's not a lot of space to be in that's bigger than
      Text::Textile and smaller than a wiki formatter.
22:06 <KjetilK> my site isn't a wiki in the "editable by anyone" sense
22:08 <jerakeen> what do you want to do over T::T?
22:08 <KjetilK> not much, really
22:09 <KjetilK> basically, the main idea is to be able to use any formatters
      CGI::Wiki people can come up with...
22:09 <KjetilK> but Textile seems to be a nice default for submitted stories...
22:10 <jerakeen> the underlying problem is that there are plenty of plain text
      => html formatting modules, and none of them have the same api as any
      other one.
22:10 <jerakeen> so you want to use the CGI::Wiki::Formatter::* namespace as
      an api
22:10 <KjetilK> exactly!
22:11 <jerakeen> hmm, this feels like a very good thing to do, but the
      approach is wrong. I'd rather see a formatter module compatibility
      layer, and a CGI::Wiki::Formatter module that will use any of the
      compatible formatters.
22:12 <jerakeen> KjetilK: I suggest you make up a vaguely sensible api, then
      mail the owners of all the text formatting modules complaining until
      they implement it.
22:12 <jerakeen> the ability to get a list of outgoing links should be in this
      api.
22:12 <KjetilK> I guess it is why Kake suggested we move into Wiki:: namespace
22:13 <jerakeen> KjetilK: but it's not a wiki feature, it's a really useful
      real world feature I'd _love_ to see.
22:13 <KjetilK> yep
22:13 <perigrin> same here.
22:14 <jerakeen> alternatively, lets pick a nice namespace we can
      automatically traverse with Module::Pluggable, say Text::Formatter::* or
      something, and write lots of compatibility layers in that namespace, so
      you can programmatically discover formatters.
22:14 <KjetilK> sounds nice
22:15 <KjetilK> how easy is it to get a lot of module owners to support a
      common API?
22:15 <jerakeen> if you explicitly restrict yourself to 'pretty plain text' ->
      html modules, you still have quite a few, and that's probably the most
      useful sort of formatter anyway.
22:15 <jerakeen> KjetilK: T::T will be easy. What other ones are you looking
      at?
22:15 <KjetilK> POD, usual wiki formatting...
22:16 <KjetilK> BBcode, but I looked at it, and I don't think I can be
      bothered... :-)
22:16 <KjetilK> HTML::Tidy->clean...
22:16 <jerakeen> I think modules in a dedicated traversable namespace is
      probably the way to go, the more I think about it, I'd really like the
      ability to programmatically discover these things.
22:18 <KjetilK> BTW, I think we wouldn't want to constrain ourselves to HTML,
      FOAF for example might need some of this stuff...
22:19 <KjetilK> eh, not only text->html but also text->rdf (FOAF == Friend of
      a Friend, a RDF vocab)
22:19 <jerakeen> you're becoming dangerously generalized. text->html is a nice
      niche that I need about once a week. FOAF I never touch.
22:19 <jerakeen> rdf is also somewhat specialist, and doesn't need a lot of
      different text-based source formats.
22:20 <perigrin> KjetilK: solve the first problem ... and if you find you
      still need/want the second you have half of its solution there.
22:20 <KjetilK> ok
22:21 <perigrin> it's easier to go from a working text->html to
      text->[xml|rdf] than it is from text->[something unspecific]
22:21 <KjetilK> I guess CGI::Wiki::Formatter::Default is a good start for an
      API...?
22:21 <jerakeen> the simplest possible API is a known method name for 'here's
      text. give me html'
22:22 <jerakeen> assume we'll require an oo model, for the sake of modules
      that like that.
22:22 <jerakeen> because we're a wiki, we need link tracking, so let's add
      that as a requirement.
22:23 <jerakeen> we need a namespace. I can't find anything I really like.
22:23 <perigrin> Text::Formatter::* is nice and descriptive.
22:23 <jerakeen> perigrin: I worry about being too vague. I'd like 'html' in
      there somewhere.
22:24 <KjetilK> if we're talking HTML, we're pretty much tied to the web so,
      how about 
22:24 <KjetilK> WWW::Formatter
22:24 <KjetilK> WWW::TextFormatter
22:24 <KjetilK> WWW::Text::Formatter
22:24 <jerakeen> HTML::Formatter
22:24 <jerakeen> WWW is evil
22:25 <jerakeen> KjetilK: personal opinion leaking there a little...
22:26 <perigrin> WWW tends to be consumers of ... not producers of ... web
      content too.
22:26 <KjetilK> but if we're talking XHTML:...
22:26 <jerakeen> perigrin: Aaah, much more rational reason
22:26 <perigrin> I live to justify.
22:26 <KjetilK> ok, peri, you're right (as usual)
22:27 # KjetilK doesn't really like to tie it too tightly to HTML...
22:27 <jerakeen> KjetilK: the problem with not, is that you don't really have
      a use for a non-html target right now.
22:28 <perigrin> KjetilK, later when you generalize you can have
      XML::Formatter
22:28 <jerakeen> and without a concrete use, you can't come up with a list of
      things that you want.
22:28 <jerakeen> I can write HTML::Formatter::Textile in about 30 seconds, as
      soon as we agree on method names.
22:29 <KjetilK> yeah, but the reason why it took me so long to discover the
      good things that is in C:W:F:* is that it began with CGI and I'm not
      interested in CGI...
22:29 <knewt> how about Formatter::Textile::HTML, or Formatter::HTML::Textile
      ?
22:29 <jerakeen> scary top-level namespace..
22:30 <jerakeen> but very tempting
22:30 <jerakeen> specifically, Formatter::HTML::Textile
22:30 <KjetilK> Perhaps Text::Formatter: is good...
22:30 <jerakeen> or more generally, Formatter::<to>::<from>
22:30 <perigrin> hrm ... rpn namespaces ... what a concept.
22:31 <jerakeen> perigrin: it's a practical thing. Most of the time, you care
      much more about what you want than what you have.
22:31 <jerakeen> perigrin: 'reverse polish notation'. fule.
22:34 <KjetilK> hm... isn't <from>::<to> better...?
22:34 <perigrin> No, because you want the thing you change most often on the
      end.
22:35 <jerakeen> and it's way easier to search for modules given a prefix
22:35 <jerakeen> getting Formatter::HTML::* is easier than Formatter::*::HTML
22:35 <KjetilK> ok
22:35 <perigrin> method names
22:35 <jerakeen> see Module::Pluggable
22:35 <KjetilK> how about convert()
22:36 <KjetilK> to do the actual conversion?
22:36 <jerakeen> format? We are Formatter.
22:36 <KjetilK> yeah, but you do convert something?
22:36 <perigrin> No, you format it.
22:37 <jerakeen> personally I'd also like some methods like title() in there
      as well.
22:37 <jerakeen> I suggest ->new to get a new object, then ->format($text) to
      return a lump of HTML
22:37 <perigrin> format in this usage is a synonym for convert
22:38 <jerakeen> also format_file($filename). :-)
22:38 <jerakeen> then, after a format, methods like title() should work.
22:38 <knewt> general formatting options supplied to new, possibly overridable
      at the format call? ->format($text [,$options]) 
22:38 <jerakeen> and links()
22:38 <KjetilK> ok, I tried to envision having a Convert:: namespace and
      decided it didn't make sense... :-)
22:39 <jerakeen> knewt: tricky, options hare to make portable.
22:39 <jerakeen> hare/hard
22:40 <knewt> jerakeen: have ->title and such in the original class (eg,
      Formatter::HTML::Textile), or return a new class from ->format that has
      the converted stuff in it?
22:40 <jerakeen> knewt: format() returns formatted text. keep the simple bit
      of the API simple.
22:40 <jerakeen> but it changes the state of the object so that the title(),
      etc, methods will work.
22:44 <jerakeen> I would say $html = Class->new( $text )->format should work,
      and $html = Class->new->format($text) should work, and both should do
      the same thing
22:49 <jerakeen> current Text::Textile API: use Text::Textile qw( textile );
      print textile($string);
22:50 <jerakeen> maximum complication of new API: use Formatter::HTML::Class;
      print Formatter::HTML::Class->format($text);
22:50 <jerakeen> in its simplest form, we're just turning text into html.
      it's really, really, simple.
22:50 <knewt> yes, that looks nice
22:50 # KjetilK nods
22:51 <jerakeen> complicated things should also be possible, but don't make
      the simple api need chained method or scalar references or anything
      nasty like that.
22:52 <jerakeen> perigrin: Rather than have Text::Carrots and
      Formatter::HTML::Carrots be a thin wrapper for it, I'd just write
      Formatter::HTML::Carrots and not write Text::Carrots at all.
22:54 <KjetilK> also, it would be nice if Formatter.pm could say something
      like "you don't have a formatter for this", if that is the case...
22:54 <jerakeen> knewt: actually, tempting is
      Formatter->new($text)->from("Textile")->to("HTML");
22:55 <jerakeen> and Formatter::HTML->new($text)->from("Textile")->format
22:55 <jerakeen> or Formatter::HTML->new()->from("Textile")->format($text)
22:55 <jerakeen> but this is all Phase 2. :-)
22:55 <knewt> not    Formatter->from("Textile")->to("HTML")->format($text)  ?
22:56 <jerakeen> knewt: put a new() in there, and yep, I like that too.
22:56 <jerakeen> can we decide on the phase 1 api now? I need to go to sleep
      soon. :-)
22:57 <perigrin> so ... new() and format()
22:57 <KjetilK> right!
22:57 <KjetilK> what does title() do?
22:57 <jerakeen> and links(), at least
22:57 <jerakeen> anyone not like title()
22:57 <jerakeen> ?
22:58 <perigrin> well ... fragments that don't have a title()
22:58 <KjetilK> I didn't get what it was supposed to do?
22:58 <jerakeen> ..can return "".
22:58 <perigrin> also is title() equivalent to the html <title> or <h1>
22:58 <jerakeen> KjetilK: if I format pod for, say, Text::Textile, to HTML,
      I'd like title to return "Text::Textile".
22:58 <knewt> jerakeen: i think undef, to distinguish from something that
      actually has a title, but it's empty
22:58 <jerakeen> knewt: good call.
23:00 <KjetilK> uhm, POD, T::T, HTML...
23:01 <jerakeen> aah, sorry, bad module name as an example. :-)
23:02 <KjetilK> OK, I'll rewrite Wiki::Formatter::Textile to the new API! :-)
23:03 <jerakeen>
      https://dev.jerakeen.org/svn/tomi/Projects/Formatter-HTML-Textile/lib/Formatter/HTML/Textile.pm
23:04 <jerakeen> now it has a title method.
23:11 <jerakeen>
      https://dev.jerakeen.org/svn/tomi/Projects/Formatter-HTML-HTML # 2 down.
      :-)
23:14 <knewt> how about Formatter::XHTML::HTML - might actually be useful
23:14 <jerakeen> ow, now _there's_ a hard problem.
23:14 <jerakeen> would be really handy, though.
23:14 <perigrin> HTML::HTML should automatically tidy the html though.
23:15 <jerakeen> ooh! ooh! chained formatters! If you can go from
      Textile->HTML, and HTML->say... .png or something, then
      Formatter->from("Textile")->to("png") should magically do the Right
      Thing!!!!!
23:15 <knewt> oooh, that'd be cool
23:15 <jerakeen> probably possible, too.
23:15 <jerakeen> certainly from Textile to XHTML would be useful.
23:16 # KjetilK is working on HTML::Tidy...
23:16 <KjetilK> er, F::H:T
23:16 # perigrin ponders Formatter::PNG::HTML
23:16 <knewt> have to figure out how to resolve multiple same-length paths
23:16 <jerakeen> I'll do POD tomorrow, I think.
23:16 <jerakeen> KjetilK: from Tidy? Doesn't make much sense..
23:16 <KjetilK> er
23:16 <KjetilK> er, Formatter::Tidy::HTML
23:17 <jerakeen> _to_ Tidy doesn't make much more sense.
23:17 <KjetilK> well, guess it should really be in HTML::HTML 
23:17 <jerakeen> yeah, mine was more a joke than anything else, steal the name
23:18 <jerakeen> I'll do POD tomorrow, because pod is easy, and I'll look at
      gutting C::W::F::UseMod/Default or something after that.
23:19 <KjetilK> hehe, so when Kake returns, it is all uploaded to CPAN and
      working fine... :-)
23:19 <jerakeen> KjetilK: I'm not going to actually upload things until I have
      a second opinion on the top-level namespace, it still scares me a little.
23:20 <KjetilK> ask modules@cpan.org?
23:20 <jerakeen> I was going to ask Dr Nick at work
23:20 <perigrin> I think I can get Formatter::PNG::HTML working tonight.
23:21 <jerakeen> perigrin: EVIL EVIL EVIL. And cool.
23:21 <perigrin> Formatter is actually intelligent and descriptive.
23:23 <jerakeen> well, that's that uploaded.
04:09 <perigrin> Hrm, well my ideas/plans for Formatter::PNG::HTML isn't
      panning out. I can't get Gtk2::MozEmbed to build on my laptop.
04:10 # KjetilK thought about doing F::HTML::Pre but I'm too tired. Think I'm
      going to bed soon
04:10 <perigrin> Pre?
04:10 <KjetilK> yeah, just stick the text in <pre></pre>
04:11 <KjetilK> and possibly scan for URLs


I am not very well at the moment, so my inclination is to give you guys my
blessing to just get on with it.  So here it is :)

Kake