Re: latex <-> xhtml


Subject: Re: latex <-> xhtml
From: oldo (oldo@CoLi.Uni-SB.DE)
Date: Thu Apr 26 2001 - 18:42:39 CDT


Hello Dom,

[ Cc: to mates and (former) colleagues who might get involved later ;-) ]

Thanks for your friendly reply. I must admit I haven't used irc very much
and I want to start out with an email exchange. Then we can meet on irc if
we have common ground and a target to move toward. Otherwise these online
chats tend to confuse and distract me. Call me old-fashioned ;-)

My name is Oliver. I am a free software enthusiast, active "ecologist"
(wwf, "friends of the earth", ...) and enjoy jumping around at grunge,
punk, indie, "Gustav Mahler" and that sort of enthusiastic music. ;-]

I use SuSE Linux 6.4 at home ("schumann11.de" by ippp) with FVWM2 and GNU
EMACS, W3M, pine (not yet mutt..) and some apps from the two major
DESKTOPs: one of them is ABIWORD, which I find quite exciting in its
concepts.

At work I try to ;-) admin a Debian/GNU system but I'm fairly new in that
area (just got to grips with some "apt & deb" ideas). I have Java
experience and developed a little mySQL+JDBC+jserv+Apache library
service: http://bib-term.math.uni-sb.de/.

I am a student at the "University of the Saarland" and study
"German Literature and Language", "Computational Linguistics" and mathematics.
For slightly more info on my "background" you could check out my homepage
at http://www.coli.uni-sb.de/~oldo/ ... (just a short page)

I'm currently writing my final thesis on a subject concerned with
literature, and there are some format requirements for the work (as
usual).

To keep CONTENT (i.e. "plain" textual data), STRUCTURE (="the DOM") and
LAYOUT ("style") separated, I used a subset of strict (X)HTML first,
(without any tags or attributes that would control the layout, but with an
optional "stylesheet" in mind (XSL or CSS)).

I only used ... :

- rarely some anchors

- heading tags <h1>..</h1> ... <h6>..</h6>
- paragraph enclosings <p>
                           ..
                         </p>
- logic text annotation: <em>..</em>, <strong>..</strong>
- citations: <cite>..</cite>
- quotations: <cite>
                           ..
                         </cite>

I distinguish the latter two for certain reasons: There are citations in
the flow of text within a paragraph and (bigger) quotations that usually
appear indented as their own "paragraphs" (from a layout point of view). (The
distinction is not really relevant in HTML, but it is in LaTeX) ...
  
With some pain and not for long I used the <br/>, but it is
certainly a hack and doesn't fit into the principles I mentioned
above and disappears from XML anyway ... &->
                  
So far, fairly basic!

I used W3M and graphical browsers (opera, konqueror, AMAYA(!)) first,
to browse my chapters, improve my ideas and get a grip on the subject of
my thesis. I was looking at the html files as local files
("files:/home/oldo/...").
 
Now - as I am happy with the content being split into a reasonable
chapter structure, I am tackling the layout issue: with LaTeX.

So I have a conversion job here! I tried to do it with abiword:
XHTML import and LaTeX export. It produced valid LaTeX but most of the
structure was gone or at least: not VISIBLE.

Since I've heard that abiword uses XML with a DTD for its document
representation, I wondered whether the big parallels between
LaTeX and XHTML (especially in subsets like the one I use) will be
reflected intuitively in future releases, thanks to a DTD that already has
these two very important formats in mind?!

I have learned the LaTeX basics quite quickly and it appeared to me
that there is an immediate/obvious mapping between the enclosing HTML tag
pairs I use and corresponding LaTeX commands:

- h1..h6  : \chapter{}, \section{}, \subsection{}, \subsubsection{} ..
- p       : [CR/LF] [CR/LF]     # so simple :-)
- em      : \emph{..}           # again
- cite(1) : \emph{".."} | ".."  # I'm still thinking about this

- cite(2) : \begin{quotation}..\end{quotation} or
            \begin{quote}..\end{quote}  # the important case

- strong  : \textbf{..} ????    # I don't like this
- br/     : \\                  # I don't like this at all %-[
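
To make the mapping above concrete, here is a minimal sketch of such a converter. It is only an illustration under some assumptions: the input is well-formed XHTML restricted to the subset listed earlier, Python's standard xml.etree.ElementTree does the parsing, and the names INLINE, HEADINGS, escape, inline, block and convert are made up for this example. It uses the standard LaTeX names (\textbf, the quotation environment), treats a <cite> inside a paragraph as \emph{..}, and treats a <cite> at block level as a quotation.

```python
import re
import xml.etree.ElementTree as ET

# Assumed mapping tables for the XHTML subset (illustrative names).
# Note: \chapter needs a document class such as report or book.
HEADINGS = {"h1": "chapter", "h2": "section", "h3": "subsection",
            "h4": "subsubsection", "h5": "paragraph", "h6": "subparagraph"}
INLINE = {"em": r"\emph{%s}", "strong": r"\textbf{%s}", "cite": r"\emph{%s}"}

# Characters that LaTeX treats specially in running text.
SPECIAL = {"\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
           "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
           "~": r"\textasciitilde{}", "^": r"\textasciicircum{}"}


def escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(SPECIAL.get(ch, ch) for ch in text)


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}p' down to 'p'."""
    return tag.split("}")[-1]


def inline(elem):
    """Convert the text and inline children of elem to a LaTeX string."""
    parts = [escape(elem.text or "")]
    for child in elem:
        name = local(child.tag)
        if name == "br":
            parts.append(r"\\ ")  # forced line break, the 'hack' case
        else:
            # Unknown inline tags (e.g. anchors) keep only their text.
            parts.append(INLINE.get(name, "%s") % inline(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def block(elem):
    """Convert one block-level element (heading, p, or block-level cite)."""
    name = local(elem.tag)
    body = re.sub(r"\s+", " ", inline(elem)).strip()
    if name in HEADINGS:
        return "\\%s{%s}" % (HEADINGS[name], body)
    if name == "cite":  # cite(2): a quotation standing as its own paragraph
        return "\\begin{quotation}\n%s\n\\end{quotation}" % body
    return body  # p: a plain paragraph, separated by blank lines


def convert(xhtml):
    """Convert an XHTML document or <body> fragment to LaTeX body text."""
    root = ET.fromstring(xhtml)
    body = next((e for e in root.iter() if local(e.tag) == "body"), root)
    return "\n\n".join(block(child) for child in body)


if __name__ == "__main__":
    sample = ("<body><h1>Intro</h1>"
              "<p>Some <em>fine</em> text &amp; more.</p>"
              "<cite>A long quote.</cite></body>")
    print(convert(sample))
```

The sketch emits only the document body; the preamble (\documentclass, \begin{document}) would still have to be written by hand or added by a wrapper.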

For the time being I am still checking out some "html2latex" command line
tools, but I am not satisfied. Their HTML model seems outdated (no XHTML).

A much nicer and more general approach would be using LaTeX macros. This
would make LaTeX perfectly readable for HTML-to-LaTeX users and yield some
"natural" XML element names for the XML DTD of this approach when coming
from the LaTeX-to-HTML direction (the DTD of abiword-XML, for example). This
seems to be of much value to me, because LaTeX is so widely used among the
developers and techies in the community that great compatibility with it
should be a first priority, even internally:

\ \documentclass[]{}
 \ \newcommand{xyz} ...
  >                      % macro definitions for all .bla.
  > ------------------>  % xyz-commands or
  >                      % xyz.environments
 /                       % where xyz is an XHTML-tag
/
\begin{document}
.. >-------------->  \tag{..}
\
 \begin{tag}
 ..  > --------------------->  ..
/
\end{tag}
and  > --------------->  \single-tag      % mirrored! .-)
and
 / \begin{latex-command}
/  ..   .. < ----------------<
   \end{latex-command}
\  OR
 \ \latex-command{ .. }

Of course there is the monster solution with an XML/DOM (dom? .. ;-))
parser in Java like Xalan or other languages and the XSLT technology for
transformation based on DTDs (am I right?) but I don't want to have too
much trouble with this little thing: I probably will write a tiny shell or
PERL script to do the conversion, IF THE STYLES IN CURRENT ABIWORD (and
the filters respecting them) can't do it ...

Concrete QUESTIONs:

* What do you know about the XHTML <-> XML <-> LaTeX conversions in
  Abiword?

* Is there this (maybe tough) requirement to distinguish the "DOM",
  "content" and "style" using XML and XSL in the project?

* Who has been primarily involved in developing the corresponding
  architecture (the DTD, the filter logic) so far?

I wrote a long mail. If you reached here, I hope you understand my
concern. :-)

looking forward to your reply

cheers
Oliver

--
Request your IT advice from: <11serv11@gmx.net>
--
.Org's smile: Free Software is Communism that works ... :-)
.Com's think: Open Source is a community that works for us ... 8-|
--


