Last modified: 2013-02-20 16:32:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T46447, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 44447 - Top-level block-wise tokenization for better performance


Summary:	Top-level block-wise tokenization for better performance

Status:	RESOLVED FIXED

Product:	Parsoid
Classification:	Unclassified
Component:	tokenizer (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low normal
Target Milestone:	---
Assigned To:	ssastry

URL:
Whiteboard:
Keywords:	performance

Depends on:
Blocks:

Reported:	2013-01-29 00:14 UTC by Gabriel Wicke
Modified:	2013-02-20 16:32 UTC
CC List:	3 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Gabriel Wicke 2013-01-29 00:14:06 UTC

The current tokenizer performs a single pass and does not yield to other async tasks until it is done, so a large number of tokens and async actions are queued up at once. It would be more efficient to cooperatively yield after each top-level block (or every few blocks), so that async processing can start as soon as the data becomes available. A simple process.nextTick call after each top-level block, plus a new offset parameter that lets the tokenizer restart tokenization at a given offset, is probably all that is needed to achieve this.
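
A minimal sketch of this scheme in Node.js, under stated assumptions: a "top-level block" is approximated as text ending in a blank line, and findBlockEnd, tokenizeBlock and tokenizeAsync are hypothetical names, not Parsoid's tokenizer API. One deliberate swap from the proposal: the sketch yields with setImmediate rather than process.nextTick, because in current Node.js a chain of process.nextTick callbacks is drained completely before the event loop moves on to pending I/O, whereas setImmediate hands control back to the event loop after each block.

```javascript
'use strict';

// Hypothetical block splitter: a "top-level block" is approximated as a run
// of text ending in a blank line ("\n\n"). A real wikitext tokenizer would
// find block boundaries from its grammar; this is only a stand-in.
function findBlockEnd(text, offset) {
  const idx = text.indexOf('\n\n', offset);
  return idx === -1 ? text.length : idx + 2;
}

// Hypothetical per-block tokenizer. It takes the offset at which to restart
// (mirroring the proposed offset parameter) and returns the tokens for one
// block plus the offset where that block ended.
function tokenizeBlock(text, offset) {
  const end = findBlockEnd(text, offset);
  const content = text.slice(offset, end).trim();
  const tokens = content ? [{ type: 'block', text: content }] : [];
  return { tokens, end };
}

// Tokenize one top-level block per turn of the event loop. Each block's
// tokens are passed to onTokens right away; the rest of the input is
// tokenized later, so other async work can run in between blocks.
function tokenizeAsync(text, onTokens, onEnd, offset = 0) {
  if (offset >= text.length) {
    onEnd();
    return;
  }
  const { tokens, end } = tokenizeBlock(text, offset);
  onTokens(tokens);
  // Assumption: setImmediate instead of process.nextTick (see lead-in).
  setImmediate(() => tokenizeAsync(text, onTokens, onEnd, end));
}
```

With input such as 'a\n\nb', the tokens for "a" are delivered synchronously and those for "b" on a later turn of the event loop, which is what lets downstream async handlers start on the first block before the whole page has been tokenized.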

Comment 1 Gabriel Wicke 2013-02-06 21:25:21 UTC

The tsr values on tokens also need to be updated for subsequent blocks, because the tokenizer's internal offset will always be zero-based.
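
A minimal sketch of that adjustment, assuming each token stores its tsr as a [start, end] pair of character offsets relative to the text the tokenizer was given. tokenizeWords and shiftTsr are hypothetical helpers, not Parsoid's API: each block is tokenized on its own (so its ranges start at zero) and the ranges are then shifted by the block's start offset in the full source.

```javascript
'use strict';

// Hypothetical zero-based tokenizer: one token per whitespace-separated word,
// with tsr = [start, end] relative to the start of blockText.
function tokenizeWords(blockText) {
  const tokens = [];
  const re = /\S+/g;
  let m;
  while ((m = re.exec(blockText)) !== null) {
    tokens.push({ text: m[0], tsr: [m.index, m.index + m[0].length] });
  }
  return tokens;
}

// Shift block-relative tsr ranges so they index into the full source text.
// Tokens without a tsr array are returned unchanged.
function shiftTsr(tokens, blockStart) {
  return tokens.map((tok) => {
    if (!Array.isArray(tok.tsr)) return tok;
    return { ...tok, tsr: [tok.tsr[0] + blockStart, tok.tsr[1] + blockStart] };
  });
}

// Example: the second block of 'foo bar\n\nbaz' starts at offset 9.
const source = 'foo bar\n\nbaz';
const blockStart = 9;
const shifted = shiftTsr(tokenizeWords(source.slice(blockStart)), blockStart);
// shifted[0].tsr is [9, 12], and source.slice(9, 12) is 'baz'.
```

Shifting after tokenization keeps the block tokenizer itself unaware of where it sits in the page; the only extra state the driver needs is the start offset of each block.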

Comment 2 ssastry 2013-02-20 16:32:02 UTC

Implemented in https://gerrit.wikimedia.org/r/#/c/49856/ and several related patches before this final one.  Currently being RT-tested.   Looks good so far.  Closing.  Reopen if any significant concerns surface.
