Post by James Addison
Thanks Holger,
Post by Holger Wansing
Hi,
Post by James Addison
From some testing of these: the search results have a problem in that they
hyperlink to a language-less .html URL, meaning that clicking a result link in
the DE-language search results takes the user to an EN-language page.
Yes, good catch.
However it's even worse: if you view a German page and use the Search function,
the search index for English is used.
Wait, I'm confused. Not on your site, though. There you have the per-language
search indices.
And in fact, when deployed to the debian.org website, requested-language search
(mapping of the browser language to an appropriate searchindex.*.js file) could
be (and is already) a better user experience. If you hypothetically send me a
hyperlink to a policy section auf Deutsch, that's fine, but when I search for
'configuration' after reading that, I might want my browser settings to have
been respected, in terms of what is searched.
Post by Holger Wansing
I have tried to deal with this by some adaptations in the cronjob - see the
first two additions in my patch: change all links to search.html into
search.$lang.html, and rename the language-specific searchindex files into
searchindex.$lang.js.
However, that does not seem to be enough.
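To make the two cronjob additions described above concrete, here is a minimal Python sketch of the same idea. It is not the actual cronjob code; the function names and the regular expression are assumptions made for illustration only.

```python
import re
from pathlib import Path

# Assumption: links to the search page appear as href="...search.html" or
# action="...search.html" attributes, optionally followed by a query or fragment.
_SEARCH_LINK = re.compile(r'((?:href|action)="(?:[^"]*/)?search)\.html(["?#])')


def localize_search_links(html: str, lang: str) -> str:
    """Rewrite links to search.html so that they point at search.<lang>.html."""
    return _SEARCH_LINK.sub(rf"\1.{lang}.html\2", html)


def index_name(lang: str) -> str:
    """Name of the per-language search index, following searchindex.$lang.js."""
    return f"searchindex.{lang}.js"


def rename_search_index(build_dir: Path, lang: str) -> Path:
    """Rename searchindex.js inside build_dir to its per-language name."""
    source = build_dir / "searchindex.js"
    target = build_dir / index_name(lang)
    source.rename(target)
    return target
```

Note that this only touches the static pages and the index filename; it does not change the document names stored inside the index itself.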
When you say it is not enough: could I check what you mean by that?
Post by Holger Wansing
Post by James Addison
The _other_ hyperlinks in the static content are replaced as part of the
cronjob[1] - but that doesn't work for items in the searchindex.js file.
Since we use content negotiation on the Debian website (so the pages are
delivered in the correct language according to the user's browser setting), we
After Sphinx' make there are separate directories for every language,
and they contain everything for that language, including the searchindex.js
file. And in that structure it works fine.
On the Debian website we put everything in one directory, adding the language
code into the filename in front of the .html extension.
While this works fine for static content, it does not work for the search
function here.
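The naming scheme described above can be sketched as follows, to show one reason why the search files need separate treatment: every per-language directory contains a file with the same name, searchindex.js, and that name has no .html extension to carry the language code. The helper names and the example file list are hypothetical.

```python
def flat_name(relative_path: str, lang: str) -> str:
    """Insert the language code before a trailing .html; leave other names as-is."""
    suffix = ".html"
    if relative_path.endswith(suffix):
        return relative_path[: -len(suffix)] + f".{lang}{suffix}"
    return relative_path


def flatten(per_language_files: dict) -> dict:
    """Map each flat filename to the (lang, original name) pairs that produce it."""
    flat = {}
    for lang, names in per_language_files.items():
        for name in names:
            flat.setdefault(flat_name(name, lang), []).append((lang, name))
    return flat


def collisions(flat: dict) -> dict:
    """Return only the flat filenames that more than one source file maps to."""
    return {name: sources for name, sources in flat.items() if len(sources) > 1}


# Hypothetical build output: one directory per language, as produced by make.
build = {
    "de": ["index.html", "searchindex.js"],
    "en": ["index.html", "searchindex.js"],
}
clashes = collisions(flatten(build))
print(sorted(clashes))  # prints ['searchindex.js']
```

The HTML pages end up with distinct names (index.de.html, index.en.html), while the search index collides unless it is renamed per language.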
I think this is a reasonable solution; serving the content from a single
directory is simple and logical because the permissions and content should be
the same; the latter only differs as a result of locale and therefore
translation.
Post by Holger Wansing
Post by James Addison
Fortunately I think there might be a better way to do this. Sphinx itself has
an HTML builder option 'html_file_suffix' and I think we could use that instead
to define the filenames. That option is respected by the search JavaScript
using a template variable[3] in the documentation_options.js file.
We should be careful of other side-effects if making that change, but it
would remove a deployment transformation step on the static content, and I
think that's beneficial.
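As a rough sketch of what that could look like in a project's conf.py: 'language' and 'html_file_suffix' are existing Sphinx configuration values, but the DOC_LANG environment variable and the suffix_for helper are assumptions made for this example, and the real build scripts may well pass the language in a different way.

```python
import os

# Assumption: the build script exports the target language in an environment
# variable named DOC_LANG (a hypothetical name), falling back to English.
language = os.environ.get("DOC_LANG", "en")


def suffix_for(lang: str) -> str:
    """Build-time file suffix carrying the language code, e.g. '.de.html'."""
    return f".{lang}.html"


# Sphinx writes pages as <docname><html_file_suffix>, so a German build would
# produce foo.de.html directly, without a rename step after the build.
html_file_suffix = suffix_for(language)
```

With this in place, a build run with DOC_LANG=de would be expected to emit the language-suffixed filenames itself; any side-effects on other builders or on existing links would still need checking.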
I don't understand how that could affect our search function problem,
but I could give it a try.
The main change that it would introduce is that the dynamic results that
appear on the search page (as gathered by the JavaScript) have hyperlinks that
include the build-time suffix in the filename. So in the example above, you have
linked me to a German-language dokumentation page, and when I search from
that page, I find (based on a DE search index) and am linked to (based on DE
file suffixes) Deutsch results; foo.de.html instead of foo.html for example.
I'm in two minds about this: if my browser settings say that my locale is en-150
and I land on a de-DE page, what language should search be performed in, and
what language should the results link to?
An answer that I find straightforward is that if the page is de-DE -- which your
hypothetical link to me was -- then because everything on that page _should_
(with sufficient translation availability) be in German, then I would expect to
search and be linked to pages accordingly. If you'd linked to a language-less
URL, then that would (a) have been thoughtful if you suspected that I did not
comprehend Deutsch, and (b) also be provided in my default locale, with search
and results taking place accordingly, and without any specific locale in the
result hyperlinks (because the server will select a resource to serve).
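The rule proposed above (an explicitly localized page decides the search language; a language-less URL falls back to the browser's preferences) could be sketched like this. It is only an illustration in Python: in practice the logic would live in the webserver's content negotiation or in the search JavaScript, and the function names and the simplified Accept-Language parsing are assumptions.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags from an Accept-Language header, best q-value first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def _match(tag: str, available: set):
    """Match a tag such as 'de-de' against available codes such as 'de'."""
    tag = tag.lower()
    if tag in available:
        return tag
    primary = tag.split("-")[0]
    return primary if primary in available else None


def search_language(page_lang, accept_language: str, available: set, default: str = "en") -> str:
    """Pick the language of the search index (and of result links) to use."""
    if page_lang:
        matched = _match(page_lang, available)
        if matched:
            return matched  # an explicitly localized page wins
    for tag in parse_accept_language(accept_language):
        matched = _match(tag, available)
        if matched:
            return matched  # otherwise respect the browser settings
    return default
```

For example, search_language("de-DE", "en-150,en;q=0.9", {"de", "en"}) selects "de", while the same header with no page language selects "en".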
Note also: there does _not_ appear to be an equivalent to the 'html_file_suffix'
config setting to adjust the search index filename.
Regards,
James
I don't think that I communicated clearly, which means that I should have
taken more time before adding a reply; I'd also like to apologize for the
erratic formatting of my messages.
My understanding is that we want to build a single RST project into multiple
languages in HTML format (multi-page, currently), and that each of those
per-language sites should be internally consistent.
We'd like some language-agnostic URLs to exist for those resources, and when
those are used, the webserver should select the appropriate files to serve;
currently this uses language-to-filename mapping, and that seems reasonable.
Per-language search is important; precisely how this should function may or may
not have been specified.
Currently Debian uses some custom scripting to build its documentation using
Sphinx into multiple languages, and this includes some post-processing of the
results -- outside of standard .deb packaging -- before they reach the
webserver.
Sphinx itself has an open feature request for multi-language builds[2]; Debian
should not bind itself to any solution proposed there, but may be able to offer
constructive feedback. Similarly, Sphinx dev/users may have stories about what
has worked well (or not) for them.
We would like the documentation to build using both Sphinx as packaged in
Debian stable (currently v5.3.0) and also testing (currently v7.2.6).
Tangentially, there may be some cases where existing Sphinx-built Debian
HTML documentation contains hyperlink/reference consistency problems[1].
My hope is that I will be able to attend next month's MiniDebConf, and if so
then I would like to work on trying to clarify and improve the situation here,
to the benefit of both Debian and Sphinx.
[1] - https://lists.debian.org/debian-www/2024/04/msg00041.html
[2] - https://github.com/sphinx-doc/sphinx/issues/788