Last modified: 2014-09-23 20:00:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32042, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30042 - Form inputs need to reject problem characters
Form inputs need to reject problem characters
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
SemanticForms (Other open bugs)
unspecified
All All
: Unprioritized normal (vote)
: ---
Assigned To: Yaron Koren
: patch, patch-reviewed
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-24 23:27 UTC by badon
Modified: 2014-09-23 20:00 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch for rejecting characters: {|} (3.68 KB, patch)
2011-10-03 19:33 UTC, Van de Bugger
Details
Patch for rejecting {{, |, }}; v2. (5.51 KB, patch)
2011-11-16 00:12 UTC, Van de Bugger
Details

Description badon 2011-07-24 23:27:07 UTC
Described here:

http://old.nabble.com/Semantic-Forms%3A-Need-better-input-checking.-tt32128157.html

These characters cause problems on form inputs, and need to be rejected:

#<>[]|{}
Comment 1 Yaron Koren 2011-07-25 19:34:40 UTC
Actually, I believe it's only the "|" that's problematic. Still, there should be Javascript validation for it.
Comment 2 s7eph4n 2011-07-25 20:18:17 UTC
And probably "}}" without (or with unbalanced) opening braces.
Are there other cases?

Oh, and probably be careful not to reject pipes inside correctly balanced braces. There will be people who want to be able to insert wikitext into fields.
Comment 3 Yaron Koren 2011-07-25 20:35:58 UTC
Oh yeah, "{{" and "}}". And it's true that pipes within parser functions and the like can be fine, in theory, although I think SF handles these incorrectly, the next time it goes to edit the page. Or maybe that bug has been fixed? I should really test this stuff out...
Comment 4 s7eph4n 2011-07-25 20:39:23 UTC
Just tried it. Seems to be fixed.
Comment 5 Yaron Koren 2011-07-25 21:07:50 UTC
Oh, that's great to hear. Although it seems to make this feature request a good deal more difficult to implement, since the Javascript would need to parse through the input string, to determine that pipes were always within double braces, and that double braces always got closed...
Comment 6 badon 2011-07-25 21:43:29 UTC
Do you have to reinvent the wheel on that, or is there a nice off the shelf parser that can do such a common task for you? It seems like I ran across one that was very sophisticated while still be very simple to use, but it was for PHP, not JavaScript.
Comment 7 Yaron Koren 2011-07-25 21:52:39 UTC
If you find a Javascript one, let me know.
Comment 8 badon 2011-07-25 23:55:26 UTC
CakePHP had the best parser functions I found for PHP:

http://cakephp.org/

Maybe they could work for you? I think it can do some trickery with forms and JavaScript too, so it might be worth a look. This might be a good place to start:

http://book.cakephp.org/view/1143/Data-Validation

I have never used CakePHP proper, since what I do is much too simple to need the whole thing. But, I did once find the a very good text truncation algorithm that I needed, and I was able to use that as a pattern for my own project.

It has some fancy AJAX functions too, so maybe it'll work that way, instead of with JavaScript.

I may be off-base with this suggestion, but 9 times out of 10 I find the perfect solution on the 10th try :)
Comment 9 s7eph4n 2011-07-26 23:03:01 UTC
Just had a quick look. None of their pre-made rules checks wikitext. Custom-made rules can either be given as regexps or as functions. But for that we do not need a framework.

I think if you want to do this right, you will end up halfway into building a wikitext parser. I wonder if the MW parser could be employed somehow.
Comment 10 badon 2011-07-27 03:58:50 UTC
That sounds like a promising avenue of investigation. But, the Mediawiki parser does not parse nested tags correctly, so it's kind of buggy on its own:

https://bugzilla.wikimedia.org/show_bug.cgi?id=29889

Still, it might be the basis for a solution, even if it has to be improved also...unless another approach turns out to be more expedient.
Comment 11 Van de Bugger 2011-07-27 18:28:32 UTC
Hi guys,

I see you start thinking about quite complicated stuff like using parser functions and templates within an input field...

Please wait for a moment. I think usage of these complicated constructs is very limited, it will be needed in very rare cases. In 99% of cases input field serves for entering simple strings like "John, Dow" or "1100 Fox Drive", or "yellow". 

Let us implement very strict rules: characters {|} (and, probably, []) are rejected. This strict rule will be a "right thing" in 99% of cases. Then, if someone requests allowing complicated stuff (like using parser functions or templates), it can be implemented by a parameter, e. g. 

{{{field|Name|input type=text|validate=off}}}
Comment 12 s7eph4n 2011-07-27 18:41:55 UTC
You can do that already using input type regexp, I think: http://www.mediawiki.org/wiki/Extension:Semantic_Forms_Inputs#Regular_expression_filter.

Needs the SFI extension, though.
Comment 13 badon 2011-07-27 18:45:18 UTC
Validation is something that should be built-in if the consequences are severe enough.
Comment 14 Dan Bolser 2011-10-01 09:24:42 UTC

about the above... can you just make an api query to pass the value to 'expand templates'? ... Hmm... can't find that just now, but in general, the JS can call the MW api to check the values for 'valid' wiki text?

I think using the existing parser (if possible) via a JS api call is much better than building a JS MW parser from scratch (or even using an existing framework, because the MW syntax may evolve over time).

I agree that, in the first instance, just banning pipes and braces is enough.
Comment 15 Van de Bugger 2011-10-03 19:33:32 UTC
Created attachment 9152 [details]
Patch for rejecting characters: {|}

This is a simple patch for rejecting characters by javascript code.
Comment 16 Sumana Harihareswara 2011-11-14 17:00:47 UTC
Hope the SMW folks don't mind -- marking this with the "patch" & "need-review" keywords to signal that Van contributed a patch here.  Thanks, Van.
Comment 17 Jeroen De Dauw 2011-11-15 21:03:28 UTC
Patch looks valid. One remark though: it seems to disallow any occurrence of { and }, while I think it only needs to disallow {{ and }}.
Comment 18 s7eph4n 2011-11-15 21:48:09 UTC
As Van already stated, this is a simple patch. IMHO it is too simple.

It rejects any input that has a { or | or }, although this might be perfectly valid. I agree that for normal text inputs usage of these characters is unusual. The situation is probably different for textareas.

What makes it unacceptable for me is, that it changes the behavior of SF so that old forms may not work as expected with new versions of SF. If you are really intent on implementing this, introduce a parameter in input definitions that needs to be set to turn this validation ON and is OFF by default.

For me this whole thing is an enhancement, not a bug.
Comment 19 s7eph4n 2011-11-15 22:19:29 UTC
I apologize if this came over harsh. I do appreciate improvements of SF and I do agree that there is a problem here. But if possible we should avoid implementing solutions that leave people out in the rain, even if these people are only 1% of the users and it would help the other 99%.
Comment 20 Van de Bugger 2011-11-16 00:11:01 UTC
@Sumana Harihareswara: Thanks for attracting attention to the problem.

> it seems to disallow any occurrence of { and }, while I think it only needs to
> disallow {{ and }}.

Hmm… May be. Is it the only issue? Will attach new patch soon.

> It rejects any input that has a { or | or }, although this might be perfectly
> valid. I agree that for normal text inputs usage of these characters is
> unusual. The situation is probably different for textareas.

Proposed patch does not affect textareas.

Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will interpret them as template/magic word/parser function invocation? We already talked about it. 

Currently editing a page trough form is "fragile" -- a user editing a page through a form can easily break code. The page will be displayed incorrectly, and even editing the page through the form will not work as expected -- some values will be lost. An experienced user can easily fix the problem by editing the page source, but an inexperienced user will be very frustrated. To me this is unacceptable. Forms must not accept invalid input.

I implemented a simple patch, so user will be warned about invalid characters. Yes, users can not use templates, magic words and other stuff in form fields. But do you really think all this stuff is actually required in forms? If somebody wants to use templates, it means this person is aware about template syntax, and, therefore, can relatively easily edit pages directly. 

Having a script which can recognize perfectly valid wikitext code means somebody should reimplement Mediawiki parser (~11000 lines of PHP code) in JavaScript. Will you? I won't.
Comment 21 Van de Bugger 2011-11-16 00:12:12 UTC
Created attachment 9462 [details]
Patch for rejecting {{, |, }}; v2.
Comment 22 badon 2011-11-16 06:17:01 UTC
I agree with Van that this issue is important, and not just an enhancement. Forms ideally should make things simple for anyone and everyone, by eliminating the kinds of manual code editing that will be required in some cases without fixing the issue.

However, if possible, let's not leave anyone out in the rain :) I think it is possible, so let's make sure nothing too unusual occurs once the patch is applied that might disrupt current forms that are in use.
Comment 23 s7eph4n 2011-11-16 09:32:09 UTC
(In reply to comment #20)
> Proposed patch does not affect textareas.

Right. Sorry, did not read the code correctly.


> Perfectly valid? What do you mean? Balanced braces so Mediawiki parser will
> interpret them as template/magic word/parser function invocation? We already
> talked about it. 

Yes, that's what I mean. And yes, we already talked about it. But no, there was no agreed way to go forward.


> Forms must not accept invalid input.

But they must accept valid input. Especially if it was accepted before.


> I implemented a simple patch, so user will be warned about invalid characters.

The user will not only be warned, they will be denied the usage of these characters.


> Yes, users can not use templates, magic words and other stuff in form fields.
> But do you really think all this stuff is actually required in forms?

Yes.


> If somebody wants to use templates, it means this person is aware about template
> syntax, and, therefore, can relatively easily edit pages directly. 

That's exactly what I am saying: This patch implements something for the sake of the majority and in it's course reduces functionality for a minority. And there would even be a solution that does not involve rewriting the MW parser in JS: Introduce a parameter in input definitions
that needs to be set to turn this validation ON and is OFF by default. You even proposed something similar yourself, it could look like this:

{{{field|Name|input type=text|validate=on}}}

It would be even better if you implemented it in a way that other validators could be plugged in, e.g.

{{{field|Name|input type=text|validate=nobraces}}}

Somebody else could then contribute other validators, e.g. "telephone", "email"


Or you could implement a new input type that only accepts simple text, e.g.

{{{field|Name|input type=simple text}}}


Or you could parse the input text and ensure that braces are balanced, no need to completely recreate the MW parser.


Or you could actually send the values to the wiki for validation as Dan proposed.
Comment 24 badon 2011-11-16 23:42:36 UTC
Here's an example of a situation where it might be desirable to optionally enable these problem characters to be input (Login with Demo/test):

http://www.coincompendium.com/wiki/index.php?title=File:IMG_0629.JPG&action=formedit

Granted, it's being done in the textarea that's already excluded from this patch, but it could just as easily appear in other areas of the form, so is still a handy example. Notice two things:

1. Lupo's ImageAnnotator uses template formatting. A future version of the ImageAnnotator might be made compatible with Semantic Forms (Note 0). That could raise the possibility that annotation data with curly braces and possibly pipes would appear in form field, perhaps automatically using ImageAnnotator's JavaScript.

2. The form provides very simple bolded instructions that suggests users try doing a template substitution. Template substitutions are very easy ways to ask a form's users to do something unusual in a form field, without any effort. Almost anybody is capable of following such simple instructions in order to access sophisticated features that would otherwise be far too complicated for an ordinary user.

I like f.trott's idea for a validate parameter on form fields. That would allow form creators to tailor the form for their needs. Indeed, it opens the possibility of generalizing the validation code such that ANY type of validation can be done. 

For example, a form field that inputs monetary amounts may require that the currency be selected in another field. By allowing only numbers, it will serve to prevent symbols from being entered. Here's an example of that exact use case, where the best I've been able to do is tell users to enter numbers only, and not to enter symbols:

http://www.coincompendium.com/wiki/index.php/Special:FormEdit/Sighting/CCS2



Note 0. If you're impressed by Lupo's ImageAnnotator, have a look here for the suggestion to incorporate it into an extension and then SMW:

https://www.mediawiki.org/wiki/User:Badon/Extension:Semantic_MediaWiki/Manual#Future_TODO
Comment 25 Van de Bugger 2011-11-23 18:34:15 UTC
I believe that Forms *must* *not* generate invalid code. The simplest method to achieve this -- deny {{, |, and }}, which is now implemented and available as a patch. The patch fulfils my current needs. I do not object if somebody implements any of advanced validation techniques mentioned above.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links