[Mesa-dev] [PATCH 02/16] docs: Add python script that converts html to rst.

Laura Ekstrand laura at jlekstrand.net
Sat May 26 02:58:48 UTC 2018


I specifically tried forcing a rename earlier, but it didn't work: the
converted files differ too much from the HTML for Git to pair them as a
rename.  The only way I could get it to work was to rename the HTML files
to .rst manually first, commit that, and then convert the contents to rst.
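Git treats a deleted file plus an added file as a rename only when their contents are similar enough (50% by default for `git diff -M`). As a rough, hypothetical way to see how far a converted page has drifted from its HTML source, the sketch below compares the two files line by line with Python's `difflib`. The `similarity` helper is an illustration only, and its score is not the one Git computes.

```python
import difflib


def similarity(old_path, new_path):
    """Rough line-based similarity of two files: 0.0 shares nothing, 1.0 is identical.

    Uses difflib.SequenceMatcher, not Git's own rename scoring, so treat the
    result only as an indicator of how much a converted page changed.
    """
    with open(old_path, encoding="utf-8") as f:
        old_lines = f.readlines()
    with open(new_path, encoding="utf-8") as f:
        new_lines = f.readlines()
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()
```

For example, `similarity("index.html", "index.rst")` gives a ratio to compare against Git's default 50%. When a deletion and an addition do land in the same commit, `git diff -M30%` (or `--find-renames=30%`) lowers the threshold, which may let a heavily rewritten page still show up as a rename during review.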

The problem with renaming first is that the Pandoc command for converting
to rst then doesn't make sense.  (.rst to .rst?  What?)
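One possible workaround, offered here as an untested assumption rather than something tried in this thread: pandoc's `--from`/`--to` options (`-f`/`-t`) name the input and output formats explicitly, so a file already renamed to `.rst` that still holds HTML can be converted in place by piping its contents through pandoc. The `convert_in_place` helper is hypothetical.

```python
import subprocess
from pathlib import Path


def convert_in_place(path):
    """Convert a file that still contains HTML (but already has a .rst name) to reST."""
    path = Path(path)
    html = path.read_text(encoding="utf-8")
    # --from/--to state the formats explicitly, so the .rst extension on the
    # input no longer matters. With no input file, pandoc reads stdin and
    # writes the converted document to stdout.
    result = subprocess.run(
        ["pandoc", "--from", "html", "--to", "rst"],
        input=html,
        capture_output=True,
        encoding="utf-8",
        check=True,  # stop on a pandoc error instead of writing empty output
    )
    path.write_text(result.stdout, encoding="utf-8")
```

With this, the rename commit and the conversion commit can stay separate, e.g. `git mv index.html index.rst`, commit, then `convert_in_place("index.rst")`.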

Laura

On Fri, May 25, 2018, 4:26 AM Eric Engestrom <eric.engestrom at intel.com> wrote:

> On Thursday, 2018-05-24 17:27:05 -0700, Laura Ekstrand wrote:
> > Use Beautiful Soup to fix bad html, then use pandoc for converting to
> > rst.
> > ---
> >  docs/rstConverter.py | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> >  create mode 100755 docs/rstConverter.py
> >
> > diff --git a/docs/rstConverter.py b/docs/rstConverter.py
> > new file mode 100755
> > index 0000000000..5321fdde8b
> > --- /dev/null
> > +++ b/docs/rstConverter.py
> > @@ -0,0 +1,23 @@
> > +#!/usr/bin/python3
> > +import glob
> > +import subprocess
> > +from bs4 import BeautifulSoup
> > +
> > +pages = glob.glob("*.html")
> > +pages += glob.glob("relnotes/*.html")
> > +for filename in pages:
> > +    # Fix some annoyingly bad html.
> > +    with open(filename) as f:
> > +        soup = BeautifulSoup(f, 'html5lib')
> > +    soup.find("div", "header").extract() # Get rid of old header
> > +    soup.iframe.extract() # Get rid of old contents bar.
> > +    soup.find("div", "content").unwrap() # Strip the content div.
>
> Good call on using beautifulsoup to clean the html before converting it!
>
> > +
> > +    # Write out the better html.
> > +    with open(filename, 'wt') as f:
> > +        f.write(str(soup))
> > +
> > +    # Convert to rst with pandoc.
> > +    name = filename.split(".html")[0]
> > +    bashCmd = "pandoc " + filename + " -o " + name + ".rst"
> > +    subprocess.run(bashCmd.split())
>
> Idea: remove the old html at the same time as we introduce the rst
> (commit-wise), so that git picks it up as a rename with changes, which
> hopefully would be easier to check as a 1:1 of any given conversion?
>
> (In case this is as unclear as I think it is, I'm thinking about how we
> can review individual pages conversions; say index.html -> index.rst, to
> see that no release has been dropped in the process. If git shows this
> as a rename with changes, I expect it will be easier to check than if
> one commit creates all the rst files and another deletes all the html)
>
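For comparison, here is a sketch of the same conversion loop with a few defensive changes. It is an editorial variant, not the code posted in the patch, and the helper names are hypothetical. It builds the pandoc command as an argument list instead of splitting a string (so paths containing spaces survive), derives the output name with `os.path.splitext`, passes `check=True` so a pandoc failure stops the run, and skips any cleanup whose element is missing from a page, where the patch would raise `AttributeError` on `None`.

```python
import glob
import os
import subprocess


def rst_path(html_path):
    """Map e.g. relnotes/18.1.0.html to relnotes/18.1.0.rst."""
    root, _ext = os.path.splitext(html_path)
    return root + ".rst"


def pandoc_cmd(src, dst):
    # An argument list keeps file names with spaces intact, unlike str.split().
    return ["pandoc", src, "-o", dst]


def strip_site_chrome(soup):
    """Remove the old header, contents iframe and content wrapper, if present."""
    header = soup.find("div", "header")
    if header is not None:
        header.extract()
    if soup.iframe is not None:
        soup.iframe.extract()
    content = soup.find("div", "content")
    if content is not None:
        content.unwrap()
    return soup


def convert_all():
    # Third-party dependencies, as in the patch: beautifulsoup4 and html5lib.
    from bs4 import BeautifulSoup

    pages = glob.glob("*.html") + glob.glob("relnotes/*.html")
    for filename in pages:
        with open(filename, encoding="utf-8") as f:
            soup = strip_site_chrome(BeautifulSoup(f, "html5lib"))
        # Overwrite the page with the cleaned HTML, as the patch does.
        with open(filename, "w", encoding="utf-8") as f:
            f.write(str(soup))
        subprocess.run(pandoc_cmd(filename, rst_path(filename)), check=True)
```

Running `convert_all()` from the `docs/` directory corresponds to running the patch's script there.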