Last modified: 2014-07-16 12:27:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T65224, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 63224 - review backend part of entity suggester (php code)
review backend part of entity suggester (php code)
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
WikidataRepo (Other open bugs)
master
All All
: Highest major (vote)
: ---
Assigned To: Wikidata bugs
u=dev c=backend p=8 s=2014-05-20
: performance
Depends on: 66380 66381
Blocks: 46555 64956
  Show dependency treegraph
 
Reported: 2014-03-28 11:27 UTC by Lydia Pintscher
Modified: 2014-07-16 12:27 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Lydia Pintscher 2014-03-28 11:27:15 UTC
The entity suggester is getting ready for a first deployment. The code at https://github.com/Wikidata-lib/PropertySuggester/pull/44 needs review.
Comment 1 Thiemo Mättig 2014-04-01 13:49:13 UTC
Note: There is also Python code to review at https://github.com/Wikidata-lib/PropertySuggester-Python
Comment 2 tobias.gritschacher 2014-04-01 14:35:16 UTC
(In reply to Thiemo Mättig from comment #1)
> Note: There is also Python code to review at
> https://github.com/Wikidata-lib/PropertySuggester-Python

Review of the python code is bug 63368.
Comment 3 Christopher Johnson 2014-05-23 09:40:12 UTC
Posted potential issue with JS module here:
https://github.com/Wikidata-lib/PropertySuggester/issues/67
Comment 4 tobias.gritschacher 2014-06-03 14:17:46 UTC
Review is done from Wikidata-Team side.
Pending WMF review now.
Comment 5 Chris Steipp 2014-06-03 22:52:19 UTC
In general, please follow the style guide (unless Wikibase has it's own)

The build directory shouldn't be web accessible-- is this meant to be removed in production? Otherwise, add an htaccess rule for it.

The code uses PropertyId::newFromNumber, which looks like isn't supposed to be used.


./PropertySuggesterHooks.php
* It seems like the JS doesn't need to be added on every page view

./maintenance/UpdateTable.php
* Not a security issue, but using do { } while would make the code more readable.

./src/PropertySuggester/Suggesters/SimpleSuggester.php
Line 75 - please escape $minProbability
Please add profiling

./src/PropertySuggester/ResultBuilder.php
Nitpick, but the "createJSON" function returns an array. Naming consistency makes reviewing easier.

./src/PropertySuggester/GetSuggestions.php
The performance of using continue + limit as your db limit seems like it may cause issues. It's efficient, but running profiling will tell if it's a serious issue.

./src/PropertySuggester/UpdateTable/Importer/MySQLImporter.php
* For production, imports need to be broken up into chunks. Talk with Sean, but Asher used to recommend importing in blocks of about 10k to avoid overloading the slaves.

./src/PropertySuggester/SuggesterParamsParser.php
Line 48 - test is_null first

./PropertySuggester.php
Please don't require_once /vendor/autoload.php
Comment 6 Moritz Finke 2014-06-10 14:45:08 UTC
Thanks for the review - by now we should have adressed all of the important issues. For more details you might want to have a look at this issue
https://github.com/Wikidata-lib/PropertySuggester/issues/70
Comment 7 Chris Steipp 2014-06-11 18:54:16 UTC
Moritz, please keep discussion here so we have the conversation all in one place.

(In reply to Chris Steipp from comment #5)
> In general, please follow the style guide (unless Wikibase has it's own)

This wasn't copied to https://github.com/Wikidata-lib/PropertySuggester/issues/70, and I'm guessing wasn't addressed.


> The build directory shouldn't be web accessible-- is this meant to be
> removed in production? Otherwise, add an htaccess rule for it.

Fixed.

> The code uses PropertyId::newFromNumber, which looks like isn't supposed to
> be used.

Dacry said, "PropertyId::newFromNumber is not supposed to be used, because usually one shouldn't care about (numeric) ids but we naturally do - so i think we agreed that it should be ok in our special case ..."

Are wikidata folks ok with that? I want to make sure the Wikibase interface doesn't change in a way that will break this, if no one is supposed to be using that function.

> ./PropertySuggesterHooks.php
> * It seems like the JS doesn't need to be added on every page view

Fixed (I'll assume that's the right way in wikibase to do that)
 
> ./maintenance/UpdateTable.php
> * Not a security issue, but using do { } while would make the code more
> readable.

Fixed.

> ./src/PropertySuggester/Suggesters/SimpleSuggester.php
> Line 75 - please escape $minProbability

xchrdw said "minProbability is checked to be float in line 56", which isn't the point. Please follow [[Security_for_developers#Demonstrable_security]] and escape as close to the output as possible. Additionally, is_float may not be sufficient (I honestly don't have time to try and exploit your implementation).

> Please add profiling

fixed

> ./src/PropertySuggester/ResultBuilder.php
> Nitpick, but the "createJSON" function returns an array. Naming consistency
> makes reviewing easier.

fixed

> ./src/PropertySuggester/GetSuggestions.php
> The performance of using continue + limit as your db limit seems like it may
> cause issues. It's efficient, but running profiling will tell if it's a
> serious issue.

Dacry said, "I do not really see issues caused by using continue + limit as your db limit. We did already do some profiling and performance seems to be fine"

Can you point add or point to your profiling data?


> ./src/PropertySuggester/UpdateTable/Importer/MySQLImporter.php
> * For production, imports need to be broken up into chunks. Talk with Sean,
> but Asher used to recommend importing in blocks of about 10k to avoid
> overloading the slaves.

xchrdw said, "the basic importer uses chunks and is the default importer now".

That's fine. Are the deployment issues documented anywhere, so that whoever at the wmf/wmde who is deploying this will know not to override the default?


> ./src/PropertySuggester/SuggesterParamsParser.php
> Line 48 - test is_null first

Fixed

> ./PropertySuggester.php
> Please don't require_once /vendor/autoload.php

Dacry said, "requiring autoload.php ist needed for testing purposes and it is only included if an autoloader exists. Is the way of doing it problematic / does a better way exist? ;)"

If it's only needed for testing, can you also check that PHP_SAPI == 'cli'? We should minimize places where an exploit that allows creating a new file can be turned into remote code execution.
Comment 8 Moritz Finke 2014-06-12 09:10:12 UTC
Wikibase has it's own style guide and we followed it
Comment 9 Jeroen De Dauw 2014-06-12 11:24:13 UTC
The usages of PropertyId::newFromNumber are fine, expect for the one I just pointed out here: https://github.com/Wikidata-lib/PropertySuggester/pull/44
Comment 10 Jeroen De Dauw 2014-06-12 11:28:02 UTC
> ./PropertySuggester.php
> Please don't require_once /vendor/autoload.php

> If it's only needed for testing, can you also check that PHP_SAPI == 'cli'?

Good point here. You indeed do not need to have this in the enty point file at all. Including it in the test bootstarp is better. Example:

https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/bootstrap.php
Comment 11 Christian Dullweber 2014-06-12 13:09:56 UTC
I'm sorry we forgot to mention the autoloader include is also required when the extension is not installed via composer as we need to include the generated psr-4 loader and the classmap. 

But e.g. WikibaseLib.php or ValueView.php do this the same way. 

I also think that someone who is able to create files in a directory could just delete existing files and replace them?
Comment 12 Jeroen De Dauw 2014-06-12 16:20:13 UTC
> when the extension is not installed via composer

Is that something we actually need to support? We do not support it for Wikibase itself...
Comment 13 Lydia Pintscher 2014-06-12 17:08:30 UTC
(In reply to Jeroen De Dauw from comment #12)
> > when the extension is not installed via composer
> 
> Is that something we actually need to support? We do not support it for
> Wikibase itself...

IMHO not. The extension isn't useful without Wikibase at this point.
Comment 14 Christian Dullweber 2014-06-12 17:34:27 UTC
So the solution is to just remove the file_exists condition?

All other issues should be fixed now. Changes are here: https://github.com/Wikidata-lib/PropertySuggester/compare/2c409e676...269a1734a
Comment 15 Chris Steipp 2014-06-13 00:07:10 UTC
(In reply to Christian Dullweber from comment #14)
> So the solution is to just remove the file_exists condition?
> 
> All other issues should be fixed now. Changes are here:
> https://github.com/Wikidata-lib/PropertySuggester/compare/2c409e676...
> 269a1734a

Regarding the escaping in SimpleSuggester.php, can you use addQuotes()? The threat is an attacker finds a string that satisfies is_float criteria, but allows adding extra commands when it's cast back to a string. Scientific notation is handled correctly, but I'm not sure if php accepts other formats that might include a space, and strencode won't help in that situation.
Comment 16 Christian Dullweber 2014-06-17 07:33:12 UTC
I tried to use addQuotes() but it didn't work with sqlite. shouldn't is_float check the variable to be a float and not a string that looks like a float? I could add an additional cast to float as safety?
Comment 17 Thiemo Mättig 2014-06-17 08:56:02 UTC
(In reply to Christian Dullweber from comment #16)
> I tried to use addQuotes() but it didn't work with sqlite.

addQuotes to what? Field names? This can't work in SQLite. addQuotes is for values, not for identifiers. There are other methods like addIdentifierQuotes that may be more suitable.

> shouldn't is_float check the variable to be a float and not a string that
> looks like a float?

Yes, it does. Chris seems to confuse this with is_numeric. To be sure you can always add an extra floatval( $var ) or (float)$var to the places where the variable is used inside of a string, especially if it's a possible SQL injection.

(In reply to Chris Steipp from comment #15)
> I'm not sure if php accepts other formats that might include a space

Simple answer: No. http://php.net/language.types.float.php
Comment 18 Christian Dullweber 2014-06-17 09:36:08 UTC
(In reply to Thiemo Mättig from comment #17)
> addQuotes to what? Field names? This can't work in SQLite. addQuotes is for
> values, not for identifiers. There are other methods like
> addIdentifierQuotes that may be more suitable.
I added addQuotes to $minProbability in our HAVING clause:
'HAVING'   => "prob > " . $dbr->addQuotes($minProbability)

I will change it floatval to put the escaping as near to the statement as possible:
'HAVING'   => "prob > " . floatval($minProbability)
Comment 19 Christian Dullweber 2014-06-17 09:49:44 UTC
I fixed the autoloader include and changed the way $minProbability is escaped:
https://github.com/Wikidata-lib/PropertySuggester/pull/72
Comment 20 Christian Dullweber 2014-06-17 09:50:37 UTC
(In reply to Christian Dullweber from comment #19)
> I fixed the autoloader include and changed the way $minProbability is
> escaped:
> https://github.com/Wikidata-lib/PropertySuggester/pull/72

oops, wrong url:
https://github.com/Wikidata-lib/PropertySuggester/pull/76

(is it impossible to delete/edit a comment?)
Comment 21 Chris Steipp 2014-06-17 16:01:29 UTC
(In reply to Thiemo Mättig from comment #17)
> Yes, it does. Chris seems to confuse this with is_numeric. To be sure you
> can always add an extra floatval( $var ) or (float)$var to the places where
> the variable is used inside of a string, especially if it's a possible SQL
> injection.

Not confusion, but concern that it might suffer from the same issues.
 
> (In reply to Chris Steipp from comment #15)
> > I'm not sure if php accepts other formats that might include a space
> 
> Simple answer: No. http://php.net/language.types.float.php

Thanks for the link, that does clarify my concern was invalid. So yes, floatval looks like it should be fine.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links