Last modified: 2014-11-20 14:47:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T75611, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 73611 - Cache multimedia limn JSON datasources
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: wmf-deployment
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://multimedia-metrics.wmflabs.org...
: analytics
Depends on:
Blocks:
Reported: 2014-11-19 15:01 UTC by Nemo
Modified: 2014-11-20 14:47 UTC (History)

Description Nemo 2014-11-19 15:01:42 UTC
Second view shouldn't reload all that JSON stuff: http://www.webpagetest.org/result/141119_VN_MP8/6/details/cached/
Comment 1 Tisza Gergő 2014-11-19 19:57:52 UTC
Is that a feature in limn that we could enable by getting the configuration right? Or a webserver configuration issue? Or neither, in which case this should be a feature request (which would probably get ignored as limn is on its way out)?
Comment 2 Dan Andreescu 2014-11-19 20:04:54 UTC
limn right now just cache-busts every datasource (at the time, that's what everyone using it wanted).  But yeah, it would be pretty simple to add options to the datasource that would make it stop cache-busting.  Or maybe we can do like daily cache-busting.  If you can describe the feature very clearly, and if this is really useful to someone, I would probably be able to do it in my volunteer time.
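
The daily cache-busting idea mentioned above could look roughly like the sketch below. It is a hypothetical illustration in Python, not limn's code: the helper name `daily_cache_buster_url` and the `_` query parameter are assumptions. The token changes only when the UTC date changes, so repeat views on the same day can be answered from the browser cache.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def daily_cache_buster_url(url, now=None, param="_"):
    """Append a token that changes once per UTC day (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    token = now.astimezone(timezone.utc).strftime("%Y%m%d")
    parts = urlsplit(url)
    # Drop any earlier cache-buster so that tokens do not pile up.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Two requests made on the same UTC day get the same URL, while the first request after midnight UTC gets a new one and therefore skips the cache.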
Comment 3 Nemo 2014-11-19 20:30:08 UTC
Thanks both.

Krinkle> General practice: Either make sure requests for data have a version or timestamp in them and cache very long (e.g. 30+ days), or purge it when you detect a change server side (ideal), or cache 5-10 minutes server side (s-maxage, Varnish) and client-side

$ curl -I http://multimedia-metrics.wmflabs.org/dashboards/mmv
HTTP/1.1 200 OK
Server: nginx/1.5.0

Just add something to your nginx config à la

location ~* \.(json|csv)$ {
    expires 10m;
}

?

Though, you could even do "expires @1h00m;" or something like that, since the data only needs updating after a cronjob, doesn't it?
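
Independently of which server ends up sending the headers, the rule of thumb quoted from Krinkle can be sketched as a small decision function. This is a minimal sketch in Python under stated assumptions: the function name and its defaults (30 days, 10 minutes) are illustrative, and the purge-on-change option is left out because it depends on the server setup.

```python
def cache_control_header(versioned_url, long_days=30, short_minutes=10):
    """Pick a Cache-Control value following the quoted rule of thumb."""
    if versioned_url:
        # A versioned or timestamped URL always names the same content,
        # so it can be cached for a long time.
        return f"public, max-age={long_days * 24 * 60 * 60}"
    # Unversioned data: short lifetime for browsers (max-age) and for
    # shared caches such as Varnish (s-maxage).
    seconds = short_minutes * 60
    return f"public, max-age={seconds}, s-maxage={seconds}"
```

A versioned URL gets a 30-day lifetime, while an unversioned one is kept for 10 minutes both client-side and in shared caches.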
Comment 4 Dan Andreescu 2014-11-19 20:40:44 UTC
That particular solution does not apply here: nginx is just a proxy, and the content is served through Apache from Node.js. Besides, limn bypasses anything the server does by cache-busting on the client side. But as I say, the fix is not too bad and I can do it if this is important for someone (keep in mind a lot of people ask for a lot of important stuff).
Comment 5 Tisza Gergő 2014-11-19 23:06:40 UTC
The slowness of limn is a major usability problem; if we are going to use it for a long time, I think it's important to improve it. My understanding is that it is going to be replaced soon-ish, though, in which case I don't think it's worth spending time on fixing it. (Also, I don't know how much the lack of caching contributes to the performance problems, although the 20 sec request linked by Nemo sounds pretty bad.)

As for implementing caching, retrieving the data does not seem to be much of a performance concern. In the network log linked by Nemo, only the last request (the tsv file) contains actual data; all others are limn configuration files. Those are always local so limn could just use their last modification date as a cache-buster string. I don't think it is a big deal to leave the tsv file uncached (it is generally updated once a day, but when working on the dataset-generating code, not being able to see updates immediately would be a major inconvenience), although maybe the cache buster could be removed so that normal ETag- or Last-Modified-based caching works.
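
Using a local file's last modification date as the cache-buster string could look like the sketch below. It is written in Python for illustration only and is not limn's code: the helper name `mtime_cache_buster` and the `v` parameter are assumptions. The URL stays the same until the file on disk changes, so the browser cache is reused between edits.

```python
import os
from urllib.parse import urlencode


def mtime_cache_buster(url, local_path, param="v"):
    """Return url with the file's last-modification time as a version token."""
    # Whole seconds are enough here: the token only has to change on edits.
    mtime = int(os.stat(local_path).st_mtime)
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({param: mtime})}"
```

Editing the configuration file changes its modification time, which changes the URL and forces a fresh fetch.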
Comment 6 Dan Andreescu 2014-11-19 23:19:28 UTC
The slowness of limn is a complicated problem.  It's not the caching, it has more to do with how it renders all the graphs even if they're not on a visible tab.  I've tried to solve this but it leads to other problems.  Limn will be replaced by a dashboarding system that we're trying to design right now.

So it doesn't sound like there's anything simple we can do right now to make anyone's life better in the short term.  One suggestion, though, is that we might want to try and make the metadata inferring logic in Limn a little smarter and able to handle a few more parameters.

Right now, most graphs are added as a graphId to the dashboard.  This graphId is looked up on the server, fetched and retrieved.  It then loads one or more datasources which are fetched and retrieved.  Those each load a datafile and that's why Limn takes forever.

If you add "some valid URL" instead of a graphId to the dashboard, limn will infer the graph and datasource metadata making it much faster.  So, one thing we could do is instead of just the URL we could pass some bare minimum parameters to make it draw what we want like:

{url: '...', type: 'bar|world|line', title: 'Custom'}

Like I say, we're working on a better way to dashboard, but if that's happening too slowly, this is the most bang for our buck and I'd be happy to help make it happen.
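
The proposal above, accepting either a bare URL or a small descriptor with a few parameters, could be sketched as follows. This is a hypothetical illustration in Python, not limn's actual inference logic: the function name `describe_graph`, the default type `line`, and deriving the title from the file name are all assumptions. Only the three graph types named in the comment are accepted.

```python
from posixpath import basename, splitext
from urllib.parse import urlsplit

GRAPH_TYPES = ("bar", "world", "line")  # the types named in the comment


def describe_graph(entry, default_type="line"):
    """Expand a dashboard entry (URL string or small dict) into full metadata."""
    if isinstance(entry, str):
        entry = {"url": entry}
    url = entry["url"]
    graph_type = entry.get("type", default_type)
    if graph_type not in GRAPH_TYPES:
        raise ValueError(f"unsupported graph type: {graph_type!r}")
    # Assumed fallback: use the data file's base name as the title.
    inferred_title = splitext(basename(urlsplit(url).path))[0]
    return {
        "url": url,
        "type": graph_type,
        "title": entry.get("title", inferred_title),
    }
```

With this shape, a dashboard could list plain URLs where the defaults are good enough and descriptors where a graph needs a specific type or title, without extra round trips for graph and datasource metadata.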
