Batch Editing in Dokuwiki #edit #regexp

Tom H
 

Civihosting responded privately to my post in the main group about problems with punctuation and other characters in page names and media filenames. That prompted some experimenting with my Dokuwiki trial, from which I've learned some useful things. I must give Civihosting credit for going through multiple re-migrations over the weekend as they tackled one problem after another. Are we there yet? It's too soon for me to say, as my time to inspect is limited. Meanwhile, I want to record the lessons I've learned on the user side, which I have been exploring in parallel with their revisions to their migration tools.

Dokuwiki has optional plugins

Batch Editing with BatchEdit

I wanted to fix the broken links to images stored on postimages.org servers, as reported. That can't be done in bulk on Wikispaces. It can be done with Notepad++ on the exported files, but those had already been imported into Civihosting's Dokuwiki. The default configuration has no global search & replace tool, but Support suggested I look at BatchEdit. Alternatively, I could download the pages by FTP (they are .txt files), batch edit them locally with Notepad++ and upload the revised files; however, that method would not record the changes in the page histories.
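For anyone who prefers a script to Notepad++ for the local route, here is a minimal Python sketch of the same idea. It assumes the pages have already been downloaded into a local folder; the function names and the dry-run switch are my own invention for illustration, not part of Dokuwiki or any plugin. Like the Notepad++ route, it would not record anything in the page histories.

```python
from pathlib import Path


def replace_in_pages(root, old, new, dry_run=True):
    """Return {path: match_count} for every .txt page under root that contains old.

    With dry_run=True nothing is written; the files are only counted.
    """
    hits = {}
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        count = text.count(old)  # plain string match, no regular expression
        if count:
            hits[path] = count
            if not dry_run:
                path.write_text(text.replace(old, new), encoding="utf-8")
    return hits


def summarize(hits):
    """Format the totals as a short report, e.g. '32 matches in 15 files'."""
    return f"{sum(hits.values())} matches in {len(hits)} files"
```

Calling replace_in_pages("pages", "postimg.org", "postimg.cc") on a hypothetical local "pages" folder and passing the result to summarize() gives the totals without touching any file; dry_run=False would then write the changes.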

So I plunged ahead on the Admin page and learned how to search for and install an extension (or plugin). BatchEdit is a regular expression pattern matching tool, much more powerful than a simple string matcher but also more difficult to use. All I wanted to do was replace all instances of "postimg.org" with "postimg.cc". With BatchEdit, the search term has to be expressed as "/postimg\.org/", sans quotes. The forward slashes delimit the pattern to be matched, while the backslash escapes the period so that it is taken literally rather than as a wild card for any character. The replacement term, in this case, is simply "postimg.cc", but it could include variables representing sub-pattern matches from the search.
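To see what the escaped period buys you, here is a small Python sketch of the same search and replace. Python's re module is not what BatchEdit runs on, so treat this only as an illustration of the pattern; the sample link is made up, and Python writes the pattern without the surrounding forward slashes.

```python
import re

# Hypothetical sample text: one real-looking link plus a decoy string.
sample = "{{https://s1.postimg.org/abc/pic.png}} and postimgXorg"

# Escaped period: only a literal "postimg.org" is replaced.
fixed = re.sub(r"postimg\.org", "postimg.cc", sample)

# Unescaped period: "." matches any character, so the decoy is hit too.
too_greedy = re.sub(r"postimg.org", "postimg.cc", sample)

# A sub-pattern match reused in the replacement (Python spells it \1).
with_group = re.sub(r"(s\d+)\.postimg\.org", r"\1.postimg.cc", sample)

print(fixed)
print(too_greedy)
print(with_group)
```

The first and third results leave "postimgXorg" alone, while the unescaped version rewrites it as well, which is exactly the accident the backslash prevents.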

But then I was confronted with an odd result: BatchEdit found only 9 matches in 2 files, while Notepad++ found 32 matches in 15 files of the original export and the same number in the migrated pages, which I downloaded for comparison. I learned the following from the plugin developer, who responded within an hour:

The plugin does not scan file system for pages like grep or Notepad++ do. Instead it uses DW page index, so it can find only those pages that DW search can find. It also takes into account DW ACL to ensure that users can modify only the pages they allowed to edit.

With a little more research, I learned that Dokuwiki builds its full text search index incrementally as users access pages, not all at once. As Civihosting had just done a fresh migration, only 2 of the 15 pages had been touched by anyone.

So how could I build a complete index short of browsing every page?

Batch Indexing with SearchIndex Manager

To the rescue, I found the SearchIndex Manager plugin. Launching it, you see a progress report of what page number it is working on out of the total to be indexed. It's not blazingly fast which makes me wonder if it is basically browsing each page in turn to trigger the indexer, not actually controlling the indexer directly. Perhaps there is a more direct way for those knowledgeable in the workings of Dokuwiki and PHP.

After SearchIndex completed, voila! BatchEdit found the 32 instances. A downside of BatchEdit is that you then have to select each instance you want replaced by checking a box, one at a time. Perhaps it is understandable that the developer wants users to be very deliberate in their actions by inspecting the before and after states for every match, but it seems a nuisance when you are accustomed to all-or-nothing decisions with Notepad++.

Tom H
 

Buoyed by my success, I then went on to see if I could delete all the "Table of Contents" sections that came from the Wikispaces ToC widget. They are badly rendered in Dokuwiki, are static, and aren't needed because the Dokuwiki template includes a dynamic ToC on every page. Fortunately, I had used the widget on only some 30 of the 265 pages. So, if I could come up with a reliable search expression, BatchEdit could take care of it for me without too much clicking of checkboxes, and a darned sight faster than editing each page.

The magical search term is "/====== Table of Contents ======\n(\[\[#[^\]]+\]\]){0,}/" sans quotes.
The replacement field is simply left empty.

The same expression without the quotes and forward slashes also works in Notepad++ on the Dokuwiki export; this may be the better place to strip out the static ToC in case your migration has to be repeated.
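Here is a quick Python sketch for trying the expression out before letting it loose on real pages. The sample page is invented (I am assuming the widget leaves a heading followed by a run of [[#anchor|label]] links), so check it against your own export first.

```python
import re

# Same expression as above, without the quotes and forward slashes.
# "{0,}" means "zero or more", i.e. the same as "*".
TOC = re.compile(r"====== Table of Contents ======\n(\[\[#[^\]]+\]\]){0,}")

# Hypothetical migrated page: static ToC heading plus two anchor links.
page = (
    "====== Table of Contents ======\n"
    "[[#Overview|Overview]][[#Sources|Sources]]\n"
    "====== Overview ======\n"
    "Body text.\n"
)

# Replace the match with nothing, as with the empty replacement field.
stripped = TOC.sub("", page)
print(stripped)
```

The heading and both links are removed; the line break that followed the links is left behind as a blank line, which is harmless in the rendered page.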

Creuset
 

https://www.patreon.com/posts/find-and-replace-8484591

There is a "find and replace" function in this plugin by the main dokuwiki developer. Haven't tried it yet but thought you, or others, might find it helpful.