Last modified: 2013-07-12 17:19:45 UTC
e.g. "Lorem [[File:Blargh|thumb]] ipsum" should come out as <p>Lorem</p> <!--Thumbnail goes here--> <p>ipsum</p> We currently output Lorem <!--Thumbnail--> ipsum which is useless.
Related URL: https://gerrit.wikimedia.org/r/60378 (Gerrit Change I47ac2c0e73331bb6829427d58138984337df14e0)
Ignore this -- I used the wrong bug id in the commit message.
There is a trade-off between round-tripping and ease of editing / conformance to the PHP parser behavior here. You are right that the PHP parser creates two paragraphs, so it might make sense to emulate that behavior. We will however need to find a way to still cleanly round-trip content like this. To me it is also not self-evident why the current behavior is 'useless'. Could you elaborate on that?
It's useless because the figure will break out of the inline context and do unexpected things. Sorry for being less than clear about it. Adding InezK because he was the original reporter IIRC.
I agree with what Mark said in the first comment: "Lorem [[File:Blargh|thumb]] ipsum" should produce <p>Lorem</p> <!--Thumbnail goes here--> <p>ipsum</p> Actually the most common usage that I noticed is putting thumb image at the beginning or the end of the paragraph, for instance: "[[File:Wiki.png|thumb]] Lorem ipsum" which in my opinion definitely should produce two block elements (as siblings).
I tried ignoring figure and figcaption elements in the ParagraphWrapper: var isBlockToken = Util.isBlockToken(token) && token.name !== 'figure' && token.name !== 'figcaption'; This wraps the entire line into a single paragraph, but the treebuilder (sadly not our own, but FF's) then breaks it into <p>foo </p><figure></figure> bar<p></p>.
Doesn't the tree builder do this automagically if we use <figure> elements for the image?
See my comment above: Ours doesn't (HTML5 lib), FF's does.
This should be possible to fix -- but requires additional tweaking of the p-wrapper. Tweaking the 'isBlockToken' by itself is insufficient since the output is wrapping everything in a single para whereas we effectively need p-breaks before and after the figure. We will probably need a special case in the p-wrapper for <figure>. But, the question is if we need <p>Lorem</p> <figure> .. </figure> <p>Ipsum</p> really if browsers treat the <figure> tag as a block element. Doesn't "Lorem <figure> .. </figure> Ipsum"? give you the same thing since the browser should break the flow before/after figure? Or is figure a flow element?
In VE we create new DOMDocument from htmldom that we receive from Parsoid and we don't want browser to modify it (break the flow) cause based on that DOMDocument we create our data model - then we create view nodes and render them. Btw. as far as I know breaking the flow before/after figure will vary in different browser and also based on CSS (display: inline, etc.).
(In reply to comment #10) > In VE we create new DOMDocument from htmldom that we receive from Parsoid and > we don't want browser to modify it (break the flow) cause based on that > DOMDocument we create our data model - then we create view nodes and render > them. > > Btw. as far as I know breaking the flow before/after figure will vary in > different browser and also based on CSS (display: inline, etc.). Our current output for this example is plain text (not inside a paragraph) interspersed with the figure. This is not broken up in the DOM by any browser. Can you describe why this result is not acceptable for VE?
I so no reason to keep Parsoid and Parser outputs inconsistent for that case. Output "123<figure/>456" and "<p>123</p><figure/><p>456</p>" will render differently in browser - and we want to avoid that.
(In reply to comment #12) > I so no reason to keep Parsoid and Parser outputs inconsistent for that case. > > Output "123<figure/>456" and "<p>123</p><figure/><p>456</p>" will render > differently in browser - and we want to avoid that. The difference is hardly noticeable unless there is custom styling though. Without the paragraphs, this round-trips already and is valid HTML.
To further clarify: 123 <figure/> 456 and <p>123</p> <figure/> <p>456</p> render identically in the browser modulo any CSS styles set on p-tags. We verified this on en.wp.org and the difference is in a small bottom margin on p-tags. However, this can fixed in two ways: 1. gwicke indicated that VE wraps all plain text into p-tags anyway, so would this take care of it? Can we verify this on the VE end? 2. Since figure only shows up in Parsoid's output, not PHP parser output, could this be equivalently fixed by adding some appropriate top/bottom margin on figure as required? We could fix this in the p-wrapper, but it seems unnecessary and we are keen to avoid adding special cases in addition to what already exists in the code to deal with PHP parser oddities and old hacks. Special cases add longer-term maintenance headaches. So, if that could be avoided by (1) and/or (2) above, that would be the best solution all around given that current Parsoid output RTs correctly and is valid HTML as gwicke indicated.
Please reopen if this is still an issue.
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
Adopting this bug, just in case it is re-opened. I am king of all images!