Last modified: 2014-11-06 21:24:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T74931, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72931 - Add el_timestamp to the externallinks table
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: easy, schema-change
Depends on:
Blocks:
 
Reported: 2014-11-04 00:53 UTC by Betacommand
Modified: 2014-11-06 21:24 UTC
CC List: 2 users

Description Betacommand 2014-11-04 00:53:36 UTC
Betacommand	I've got an interesting idea for the externallinks table. What about including the timestamp at which a link was added? Like what happens with cl_timestamp?
legoktm	what's your usecase
Reedy	it's doable, but ^
Betacommand	legoktm: tracking when links are added, so batch requests for archival (i.e. a partnership with IA) can be done
Betacommand	or tracking how long a link has been in an article without having to check every diff
Betacommand	tracking overall external link volume over time
Betacommand	or within a given time span
legoktm	sounds useful
legoktm	file a bug?
Betacommand	legoktm: I was thinking about it but wanted a sounding board first

This issue came up as I was thinking about external link recovery (preventing link rot). Right now there is no way to find external links that have been added within the last X amount of time. This means that any attempt at proactive archiving of URLs must be done via database dumps, by diffing the externallinks table between two dumps.

While that may be feasible for smaller wikis, any type of diffing on a large scale easily becomes unmanageable. Being able to do a select based on a given time range would solve this and would allow nightly incremental dumps that could then be passed to archival sites, so they can take proactive steps to avoid link rot.
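
To make the request concrete, here is a minimal sketch of the kind of incremental select described above. It is an illustration only, not a proposed patch: it uses SQLite from Python as a stand-in for the production database, a simplified externallinks table, and assumes the new el_timestamp column would hold 14-digit YYYYMMDDHHMMSS strings. The helper names (create_schema, add_link, links_added_between) and the index name el_timestamp_idx are hypothetical.

```python
import sqlite3


def create_schema(conn):
    # Simplified stand-in for the externallinks table.
    # el_timestamp is the column proposed in this bug; it does not exist yet.
    conn.execute(
        """
        CREATE TABLE externallinks (
            el_id INTEGER PRIMARY KEY,
            el_from INTEGER NOT NULL,   -- page the link appears on
            el_to TEXT NOT NULL,        -- the external URL
            el_timestamp TEXT NOT NULL  -- assumed format: YYYYMMDDHHMMSS
        )
        """
    )
    # Hypothetical index name; range queries need an index to stay cheap.
    conn.execute(
        "CREATE INDEX el_timestamp_idx ON externallinks (el_timestamp)"
    )


def add_link(conn, page_id, url, timestamp):
    # Record a link together with the time it was added.
    conn.execute(
        "INSERT INTO externallinks (el_from, el_to, el_timestamp) "
        "VALUES (?, ?, ?)",
        (page_id, url, timestamp),
    )


def links_added_between(conn, start, end):
    # Return URLs added in the half-open interval [start, end).
    # Fixed-width digit strings compare correctly as plain text.
    rows = conn.execute(
        "SELECT el_to FROM externallinks "
        "WHERE el_timestamp >= ? AND el_timestamp < ? "
        "ORDER BY el_timestamp, el_id",
        (start, end),
    )
    return [row[0] for row in rows]
```

A nightly job could then call links_added_between with the previous day's boundaries and hand the resulting URL list to an archival service, instead of diffing two full dumps.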
Comment 1 Marc A. Pelletier 2014-11-04 01:59:29 UTC
Sounds sane; the actual cost of adding a timestamp should be essentially nil, and I can think of a couple of use cases in patrolling for spam links where it would be easier than trawling the RC.

That said, the column would be nearly useless without an index and I know there's a cost for /that/, so someone more versed in performance will need to chime in.
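
As a rough illustration of the index point, the sketch below asks SQLite for its query plan on a time-range select, once with and once without an index on the proposed column. This only shows the general effect. SQLite is a stand-in here, the table is simplified, and the names query_plan and el_timestamp_idx are hypothetical; the production planner and the write cost of maintaining the index would need to be evaluated separately.

```python
import sqlite3

# Range query on the proposed (not yet existing) el_timestamp column.
QUERY = (
    "SELECT el_to FROM externallinks "
    "WHERE el_timestamp >= ? AND el_timestamp < ?"
)


def query_plan(with_index):
    # Build an empty, simplified externallinks table in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE externallinks ("
        "el_id INTEGER PRIMARY KEY, "
        "el_from INTEGER NOT NULL, "
        "el_to TEXT NOT NULL, "
        "el_timestamp TEXT NOT NULL)"
    )
    if with_index:
        conn.execute(
            "CREATE INDEX el_timestamp_idx ON externallinks (el_timestamp)"
        )
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY,
        ("20141103000000", "20141104000000"),
    ).fetchall()
    conn.close()
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("with index:   ", query_plan(True))
    print("without index:", query_plan(False))
```

With the index, the plan should report a search using el_timestamp_idx; without it, the plan falls back to scanning the whole table, which is what would make the column nearly useless for this purpose on a large wiki.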
Comment 2 Sam Reed (reedy) 2014-11-04 15:15:53 UTC
(In reply to Marc A. Pelletier from comment #1)
> Sounds sane; the actual cost of adding a timestamp should be essentially
> nil, and I can think of a couple use cases when patrolling for spam links
> that make it easier than trawling the RC.
> 
> That said, the column would be nearly useless without an index and I know
> there's a cost for /that/, so someone more versed in performance will need
> to chime in.

I think it should be alright. Any indexing has some cost, and we regularly index many tables/columns with timestamps. The cost of doing so should be fine, as there's a reasonable use case rather than a generic "this might be useful".
