Last modified: 2014-10-20 11:44:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T74158, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 72158 - deep copy of Entity is made when calling Revision::getContent()
deep copy of Entity is made when calling Revision::getContent()
Status: ASSIGNED
Product: MediaWiki extensions
Classification: Unclassified
WikidataRepo (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Wikidata bugs
u=dev c=backend p=0
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-10-16 22:55 UTC by Aude
Modified: 2014-10-20 11:44 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Aude 2014-10-16 22:55:44 UTC
a deep copy "of Entity is made whenever calling Revision::getContent().  the deep copy is done with "unserialize( serialize( $this ) );" which is expensive, time and memory wise.

When saving an item, getContent() is called at least 15 times:

5x - WikiPage::getContent
1x - WikiPage::doEditContent
1x - WikiPage::prepareContentForEdit
1x - RepoHooks::onNewRevisionFromEditComplete
1x - WikiPage::isRedirect
1x - WikiPage::isCountable
2x - ChangeNotifier::notifyOnPageModification
2x - Revision::getText
2x - EchoDiscussionParser::getChangeInterpretationForRevision
1x - AbuseFilterHooks::filterEdit
1x - EntityChange::setRevisionInfo
1x - RepoHooks::notifyEntityStoreWatchersOnUpdate
1x - WikiPage::doEditUpdates
1x - Revision::checkContentModel
1x - WikiPage::updateRevisionOn

getContent might be called more times in various other extensions that I do not have enabled currently.

When viewing or purging the item, getContent is called 5 times:

1x - WikiPage::isRedirect
1x - RepoHooks::onOutputPageBodyAttributes
1x - ViewEntityAction::show (via ContentRetriever::getContentForRequest)
1x - ViewEntityAction::isPlainView (again via ContentRetriever::getContentForRequest)
1x - Article::fetchContentObject

possible action items are:

1) reduce the number of times getContent() is called.

2) maybe lazy initialize the full entity parameter? perhaps we can have some sort of stubbed Entity / EntityContent object, with the entity serialization as a variable, and then full serialize when getEntity is called? stuff like redirect target and entity id and type might be made more easily accessible without needing the entire entity object.

3) investigate alternatives for deep cloning the entity that are better performance-wise, not needing to do full serialize/unserialize.
Comment 1 Aude 2014-10-16 23:01:18 UTC
with corrected indentation:

5x - WikiPage::getContent
* 1x - WikiPage::doEditContent
* 1x - WikiPage::prepareContentForEdit
* 1x - RepoHooks::onNewRevisionFromEditComplete
* 1x - WikiPage::isRedirect
* 1x - WikiPage::isCountable
2x - ChangeNotifier::notifyOnPageModification
2x - Revision::getText
* 2x - EchoDiscussionParser::getChangeInterpretationForRevision
1x - AbuseFilterHooks::filterEdit
1x - EntityChange::setRevisionInfo
1x - RepoHooks::notifyEntityStoreWatchersOnUpdate
1x - WikiPage::doEditUpdates
1x - Revision::checkContentModel
1x - WikiPage::updateRevisionOn
Comment 2 Gerrit Notification Bot 2014-10-16 23:04:22 UTC
Change 167116 had a related patch set uploaded by Aude:
Remove duplicate getContentForRequest call in ViewEntityAction

https://gerrit.wikimedia.org/r/167116
Comment 3 Gerrit Notification Bot 2014-10-16 23:43:35 UTC
Change 167127 had a related patch set uploaded by Aude:
Don't getContent in OutputPageBodyAttributes handler

https://gerrit.wikimedia.org/r/167127
Comment 4 Daniel Kinzler 2014-10-17 10:10:22 UTC
For cases where getContent() is called to look at the EntityContent, but getEntity() is never called on it, we could could give EntityContent a flag that tells it to copy the Entity object when getEntity() is called (or even copy the Entity every time). EntityContent::copy could then just return $this.

We could also defer the deserialization of the Entity until getEntity is called (EntityContent would get some sort of SerializedEntity object instead of an actual Entity). 

We could further have special logic in EntityContent::getEntityId() that would extract the ID from the serialized version without actually unserializing the full Entity.
Comment 5 Gerrit Notification Bot 2014-10-17 10:18:01 UTC
Change 167116 merged by jenkins-bot:
Remove duplicate getContentForRequest call in ViewEntityAction

https://gerrit.wikimedia.org/r/167116
Comment 6 Gerrit Notification Bot 2014-10-17 10:57:53 UTC
Change 167127 merged by jenkins-bot:
Don't getContent in OutputPageBodyAttributes handler

https://gerrit.wikimedia.org/r/167127
Comment 7 Aude 2014-10-20 11:44:28 UTC
what's still missing?

"a deep copy "of Entity is made whenever calling Revision::getContent().  the deep copy is done with "unserialize( serialize( $this ) );" which is expensive, time and memory wise."

: daniel is making a patch to help avoid this when the entity is not needed. we need to see if/how much this helps.

"When saving an item, getContent() is called at least 15 times:

5x - WikiPage::getContent
 1x - WikiPage::doEditContent
 1x - WikiPage::prepareContentForEdit
 1x - RepoHooks::onNewRevisionFromEditComplete
 1x - WikiPage::isRedirect
 1x - WikiPage::isCountable
2x - ChangeNotifier::notifyOnPageModification
2x - Revision::getText
 2x - EchoDiscussionParser::getChangeInterpretationForRevision
1x - AbuseFilterHooks::filterEdit
1x - EntityChange::setRevisionInfo
1x - RepoHooks::notifyEntityStoreWatchersOnUpdate
1x - WikiPage::doEditUpdates
1x - Revision::checkContentModel
1x - WikiPage::updateRevisionOn

getContent might be called more times in various other extensions that I do not have enabled currently."

"When viewing or purging the item, getContent is called 5 times:

1x - WikiPage::isRedirect
1x - RepoHooks::onOutputPageBodyAttributes"
: removed!
"1x - ViewEntityAction::show (via ContentRetriever::getContentForRequest)
1x - ViewEntityAction::isPlainView (again via ContentRetriever::getContentForRequest)"
: removed!
"1x - Article::fetchContentObject

possible action items are:

1) reduce the number of times getContent() is called."

: removed 2 places, but more to investigate and do.

"2) maybe lazy initialize the full entity parameter? perhaps we can have some sort of stubbed Entity / EntityContent object, with the entity serialization as a variable, and then full serialize when getEntity is called? stuff like redirect target and entity id and type might be made more easily accessible without needing the entire entity object."

: in progress

"3) investigate alternatives for deep cloning the entity that are better performance-wise, not needing to do full serialize/unserialize."

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links