Last modified: 2012-05-05 07:21:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T32906, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 30906 - Store DjVu extracted text in a structured table instead of img_metadata


Summary:	Store DjVu extracted text in a structured table instead of img_metadata

Status:	NEW

Product:	MediaWiki
Classification:	Unclassified
Component:	File management
Version:	1.20.x
Hardware:	All All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Wikisource 21062 30751

Reported:	2011-09-14 22:34 UTC by Brion Vibber
Modified:	2012-05-05 07:21 UTC
CC List:	5 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Brion Vibber 2011-09-14 22:34:07 UTC

When DjVu files contain text layers, we currently extract the text and store it in the file's metadata blob, so that it's available to extensions like ProofreadPage.

Unfortunately, this *massively* increases the size of the file object -- which holds the uncompressed serialized metadata blob in memory -- leading to errors like bug 30751, where an API request runs out of memory when loading a bunch of file objects at once.

In addition, it's a bit awkward to access the text from other places; things like search indexing (bug 6421) would benefit from having a more standard place to get at extracted text, and the same place could also be used for other file formats.
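
To illustrate the idea, here is a minimal sketch in Python with SQLite, not MediaWiki code. The table name file_text, its ft_* columns, and the helper functions are all hypothetical and are not taken from the MediaWiki schema. The sketch stores one row per page, so a caller can fetch a single page's text without loading the text of the whole file.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical per-page text table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_text ("
        " ft_name TEXT NOT NULL,"      # file name (assumed key)
        " ft_page INTEGER NOT NULL,"   # 1-based page number
        " ft_text TEXT NOT NULL,"      # extracted text for that page
        " PRIMARY KEY (ft_name, ft_page))"
    )
    return db


def store_text(db, name, pages):
    """Store one row per page; page numbers start at 1."""
    db.executemany(
        "INSERT OR REPLACE INTO file_text (ft_name, ft_page, ft_text)"
        " VALUES (?, ?, ?)",
        [(name, number, text) for number, text in enumerate(pages, start=1)],
    )
    db.commit()


def page_text(db, name, page):
    """Return the text of one page, or None if nothing is stored for it."""
    row = db.execute(
        "SELECT ft_text FROM file_text WHERE ft_name = ? AND ft_page = ?",
        (name, page),
    ).fetchone()
    return row[0] if row else None
```

With a layout along these lines, the file object itself would not need to carry the text at all, and a search indexer or an extension could query the table directly.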

Comment 1 Brion Vibber 2011-09-14 22:36:43 UTC

Changing deps from bug 6421 (DjVu-only) to bug 21062 (which also covers PDF etc.), so we cover a wider space.

Comment 2 Bawolff (Brian Wolff) 2011-09-15 05:08:42 UTC

Perhaps (as an interim solution) we shouldn't be loading file metadata unless a method is called that specifically needs it. I imagine most of the time you don't need the metadata (otoh, maybe you need it more nowadays, now that we check whether JPEGs need to be rotated).
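
The interim approach suggested here amounts to lazy loading. A minimal sketch in Python follows, assuming a hypothetical LazyFile class and a caller-supplied load_metadata callable; neither corresponds to an actual MediaWiki class or method. Cheap fields are set up front, and the metadata blob is fetched only on first access and then cached.

```python
class LazyFile:
    """File record that defers loading its metadata blob until first use."""

    def __init__(self, name, width, height, load_metadata):
        self.name = name
        self.width = width
        self.height = height
        # Hypothetical loader: a callable taking the file name and
        # returning the (potentially large) metadata.
        self._load_metadata = load_metadata
        self._metadata = None
        self._loaded = False

    @property
    def metadata(self):
        """Load the metadata on first access and cache it afterwards."""
        if not self._loaded:
            self._metadata = self._load_metadata(self.name)
            self._loaded = True
        return self._metadata
```

Code that only needs the name or dimensions never triggers the loader, so listing many files at once would not pull every metadata blob into memory.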
