Last modified: 2014-09-22 15:21:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72682, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70682 - Implement extracts in Pywikibot
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: General
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized enhancement
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks:
Reported: 2014-09-10 19:48 UTC by Maarten Dammers
Modified: 2014-09-22 15:21 UTC (History)

Description Maarten Dammers 2014-09-10 19:48:22 UTC
MediaWiki has the extracts API module (prop=extracts). It should be implemented in Pywikibot too.

* prop=extracts (ex) *
  Returns plain-text or limited HTML extracts of the given page(s)
  https://www.mediawiki.org/wiki/Extension:TextExtracts#API

This module requires read rights
Parameters:
  exchars             - How many characters to return, actual text returned might be slightly longer.
                        The value must be no less than 1
  exsentences         - How many sentences to return
                        The value must be between 1 and 10
  exlimit             - How many extracts to return
                        No more than 20 (20 for bots) allowed
                        Default: 1
  exintro             - Return only content before the first section
  explaintext         - Return extracts as plaintext instead of limited HTML
  exsectionformat     - How to format sections in plaintext mode:
                         plain - No formatting
                         wiki - Wikitext-style formatting == like this ==
                         raw - This module's internal representation (section titles prefixed with <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>)
                        One value: plain, wiki, raw
                        Default: wiki
  excontinue          - When more results are available, use this to continue
  exvariant           - Convert content into this language variant
Example:
  Get a 175-character extract:
    api.php?action=query&prop=extracts&exchars=175&titles=Therion

https://nl.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=175&titles=Nicolaas_IJzendoorn&format=json
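A minimal sketch of how such a request could be built and its JSON response read in Python, using only the parameters listed above. The helper names build_extracts_url and extracts_from_response are hypothetical and are not part of Pywikibot; the response layout (query -> pages -> page id -> extract) is assumed to be the usual format=json shape for action=query.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_URL = "https://nl.wikipedia.org/w/api.php"


def build_extracts_url(title, exchars=None, exsentences=None,
                       exintro=False, explaintext=False, api_url=API_URL):
    """Build a prop=extracts query URL (hypothetical helper, not a Pywikibot API)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",
    }
    if exchars is not None:
        params["exchars"] = exchars
    if exsentences is not None:
        params["exsentences"] = exsentences
    # Flag parameters are switched on by being present; the value is left empty.
    if exintro:
        params["exintro"] = ""
    if explaintext:
        params["explaintext"] = ""
    return api_url + "?" + urlencode(params)


def extracts_from_response(data):
    """Map page title -> extract text from an already decoded JSON response.

    Pages without an "extract" key (for example missing pages) are skipped.
    """
    pages = data.get("query", {}).get("pages", {})
    return {page["title"]: page["extract"]
            for page in pages.values() if "extract" in page}
```

The URL can be fetched with any HTTP client and the body decoded with json.loads before passing it to extracts_from_response. Keeping URL building and response parsing separate means both parts can be checked without network access.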
Comment 1 John Mark Vandenberg 2014-09-18 09:43:35 UTC
How do you intend to use this?
Comment 2 Maarten Dammers 2014-09-20 09:02:24 UTC
I'm already using it to extract date of birth and date of death. Extracts already gets rid of the infobox template or image so I don't have to do that myself.
Comment 3 John Mark Vandenberg 2014-09-21 12:44:13 UTC
Why not extract those dates from the infobox?
Comment 4 Maarten Dammers 2014-09-22 15:21:52 UTC
A lot of articles don't have an infobox with this information.
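A rough sketch of the use case discussed in the comments above: guessing birth and death years from a plaintext extract. It assumes the lead sentence carries the dates in its first parenthesised group, which will not hold for every article; the function name and the sample sentence are invented for illustration and do not come from the reporter's script.

```python
import re

# Four-digit years from 1000 to 2099; an assumption that suits most biographies.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def guess_birth_death_years(extract):
    """Return (birth_year, death_year) guessed from the first "(...)" group.

    Either value is None when no matching year is found.
    """
    match = re.search(r"\(([^)]*)\)", extract)
    if not match:
        return (None, None)
    years = [int(y) for y in YEAR_RE.findall(match.group(1))]
    birth = years[0] if years else None
    death = years[1] if len(years) > 1 else None
    return (birth, death)


# Invented sample sentence, not taken from a real article.
sample = "Jan Voorbeeld (Amsterdam, 3 maart 1901 - Utrecht, 5 mei 1975) was een schilder."
print(guess_birth_death_years(sample))
```

Requesting the extract with explaintext set keeps HTML tags out of the input, so the regular expressions only have to deal with prose.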
