4 changes: 3 additions & 1 deletion .github/CONTRIBUTING.md
@@ -1 +1,3 @@
-Make sure that you generate site HTML with `jekyll build`, and include the changes to the HTML in your pull request also. See README.md for more information.
+Build and test your changes locally according to the instructions in [README](../README.md).
+
+Once you've done that, submit a pull request with your changes. You only need to commit your changes to the source. A GitHub Actions workflow will [generate the corresponding HTML and push it for you](./workflows/html-push.yml).
1 change: 0 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md

This file was deleted.

76 changes: 0 additions & 76 deletions .github/workflows/doc_gen.yml

This file was deleted.

21 changes: 21 additions & 0 deletions .github/workflows/html-build.yml
@@ -0,0 +1,21 @@
+name: Build HTML
+
+on:
+  pull_request:
+    branches:
+      - asf-site
+
+jobs:
+  build:
+    name: Build HTML
+    runs-on: ubuntu-24.04
+    steps:
+      - name: Checkout Spark Website repository
+        uses: actions/checkout@v7
+      - name: Set up Ruby and Bundler
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '3.4'
+          bundler-cache: true
+      - name: Run documentation build
+        run: bundle exec jekyll build
46 changes: 46 additions & 0 deletions .github/workflows/html-push.yml
@@ -0,0 +1,46 @@
+name: Build and Push HTML
+
+on:
+  push:
+    branches:
+      - asf-site
+      # TODO: Remove after testing.
+      - automated-html
+
+jobs:
+  commit:
+    name: Build and commit HTML to `asf-site`
+    # This condition is important. We don't want to trigger this job if the last
+    # commit was created _by_ this job!
+    if: "!contains(github.event.head_commit.message, '[html]')"
+    # Not technically necessary, but helps avoid spurious failures if multiple
+    # commits are pushed in rapid succession.
+    concurrency:
+      group: html-push-${{ github.ref }}
+      cancel-in-progress: true
+    runs-on: ubuntu-24.04
+    permissions:
+      contents: write
+    steps:
+      - name: Checkout Spark Website repository
+        uses: actions/checkout@v7
+      - name: Set up Ruby and Bundler
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '3.4'
+          bundler-cache: true
+      - name: Run documentation build
+        run: bundle exec jekyll build
+      - name: Commit and push generated HTML
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          # `-f` because we told git to otherwise ignore `site/`
+          git add -f site/
+          if git diff --cached --quiet; then
+            echo "No changes to commit."
+          else
+            COMMIT_TITLE=$(git log -1 --pretty=%s)
+            git commit -m "[html] $COMMIT_TITLE"
+            git push
+          fi
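
Review note on the loop guard in `html-push.yml`: the job skips itself when the head commit message contains `[html]`, and the commit it creates is titled `[html] <title of the source commit>`, so the push made by the job should not trigger another run. A minimal Python sketch of that logic follows. The function names are hypothetical and exist only for illustration. The lowercasing reflects the documented behavior that `contains` in GitHub Actions expressions is not case sensitive.

```python
# Sketch only: mirrors the workflow's `if:` guard and generated commit title.
# `should_run_job` and `generated_commit_message` are hypothetical names,
# not part of the workflow or of any GitHub API.

HTML_MARKER = "[html]"


def should_run_job(head_commit_message: str) -> bool:
    """Mirror `!contains(github.event.head_commit.message, '[html]')`.

    The comparison is done on lowercased strings because the `contains`
    expression function ignores case.
    """
    return HTML_MARKER.lower() not in head_commit_message.lower()


def generated_commit_message(source_title: str) -> str:
    """Mirror `git commit -m "[html] $COMMIT_TITLE"`."""
    return f"{HTML_MARKER} {source_title}"


if __name__ == "__main__":
    title = "Update the downloads page"
    # A normal source commit triggers the build-and-push job.
    print(should_run_job(title))
    # The commit created by the job carries the marker, so the job is skipped.
    print(should_run_job(generated_commit_message(title)))
```

One consequence of this design: any contributor commit whose message happens to include `[html]` would also skip the job, so the marker is best kept out of hand-written commit messages.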
2 changes: 1 addition & 1 deletion .gitignore
@@ -4,4 +4,4 @@ target/
 .jekyll-cache/
 .jekyll-metadata
 .local_ruby_bundle
-site/python
+site/
28 changes: 10 additions & 18 deletions README.md
@@ -1,4 +1,12 @@
-## Generating the website HTML
+# Apache Spark Main Website
+
+This repository captures the main Apache Spark website located at https://spark.apache.org. The programming docs are [in the main Spark repo][1], not here.
+
+[1]: https://github.com/apache/spark/tree/master/docs
+
+To contribute changes, see [CONTRIBUTING](.github/CONTRIBUTING.md).
+
+## Generating the website HTML locally

 In this directory you will find text files formatted using Markdown, with an `.md` suffix.

@@ -28,22 +36,6 @@ of Spark from the Spark source repository and then copied to the website under t
 directory. See the instructions for building those in the readme in the Spark
 project's `/docs` directory.

-## Rouge and Pygments
-
-We also use [Rouge](https://github.com/rouge-ruby/rouge) for syntax highlighting in documentation Markdown pages.
-Its HTML output is compatible with CSS files designed for [Pygments](https://pygments.org/).
-
-To mark a block of code in your Markdown to be syntax highlighted by `jekyll` during the
-compile phase, use the following syntax:
-
-{% highlight scala %}
-// Your Scala code goes here, you can replace Scala with many other
-// supported languages too.
-{% endhighlight %}
-
-You probably don't need to install that unless you want to regenerate the Pygments CSS file.
-It requires Python, and can be installed by running `sudo easy_install Pygments`.
-
 ## Merge PR

-To merge pull request, use the `merge_pr.py` script which also squashes the commits.
+To merge a pull request, use the `merge_pr.py` script. This script also squashes the commits.
2 changes: 1 addition & 1 deletion _config.yml
@@ -4,6 +4,6 @@ kramdown:
 entity_output: symbol
 permalink: none
 destination: site
-exclude: ['README.md', 'content', 'LICENSE', 'merge_pr.py', 'Gemfile', 'Gemfile.lock']
+exclude: ['README.md', 'LICENSE', 'merge_pr.py', 'Gemfile', 'Gemfile.lock']
 keep_files: ['docs', 'static', 'llms.txt']
 url: https://spark.apache.org
1 change: 0 additions & 1 deletion content

This file was deleted.

39 changes: 18 additions & 21 deletions site/sitemap.xml
@@ -1163,7 +1163,10 @@
 <loc>https://spark.apache.org/releases/spark-release-0-3.html</loc>
 <changefreq>weekly</changefreq>
 </url>
-
+<url>
+<loc>https://spark.apache.org/</loc>
+<changefreq>weekly</changefreq>
+</url>
 <url>
 <loc>https://spark.apache.org/404.html</loc>
 <changefreq>weekly</changefreq>
@@ -1205,70 +1208,65 @@
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/history.html</loc>
-<changefreq>weekly</changefreq>
-</url>
-<url>
-<loc>https://spark.apache.org/improvement-proposals.html</loc>
+<loc>https://spark.apache.org/graphx/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/spark-connect/</loc>
+<loc>https://spark.apache.org/history.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/pandas-on-spark/</loc>
+<loc>https://spark.apache.org/improvement-proposals.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/graphx/</loc>
+<loc>https://spark.apache.org/mailing-lists.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
 <loc>https://spark.apache.org/mllib/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/streaming/</loc>
+<loc>https://spark.apache.org/news/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/news/</loc>
+<loc>https://spark.apache.org/pandas-on-spark/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/screencasts/</loc>
+<loc>https://spark.apache.org/powered-by.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/sql/</loc>
+<loc>https://spark.apache.org/release-process.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/</loc>
+<loc>https://spark.apache.org/research.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/mailing-lists.html</loc>
+<loc>https://spark.apache.org/screencasts/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/powered-by.html</loc>
+<loc>https://spark.apache.org/security.html</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/release-process.html</loc>
+<loc>https://spark.apache.org/spark-connect/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/research.html</loc>
+<loc>https://spark.apache.org/sql/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/security.html</loc>
+<loc>https://spark.apache.org/streaming/</loc>
 <changefreq>weekly</changefreq>
 </url>
-
 <url>
 <loc>https://spark.apache.org/third-party-projects.html</loc>
 <changefreq>weekly</changefreq>
@@ -1281,5 +1279,4 @@
 <loc>https://spark.apache.org/versioning-policy.html</loc>
 <changefreq>weekly</changefreq>
 </url>
-
 </urlset>
15 changes: 12 additions & 3 deletions sitemap.xml
@@ -151,9 +151,18 @@ sitemap: false
 <changefreq>weekly</changefreq>
 </url>
 {% endfor %}
-{% for page in site.pages %}{% if page.sitemap != false %}<url>
+{%- comment -%}
+Explicitly sort `site.pages` so that the order is consistent and we don't get spurious git diffs.
+`site.posts` doesn't have this issue because it's already sorted.
+See: https://jekyllrb.com/docs/variables/#site-variables
+{%- endcomment -%}
+{%- assign sorted_pages = site.pages | sort: "url" -%}
+{%- for page in sorted_pages -%}
+{%- if page.sitemap != false -%}
+<url>
 <loc>{{ site.url }}{{ page.url }}</loc>
 <changefreq>weekly</changefreq>
-</url>{% endif %}
-{% endfor %}
+</url>
+{% endif %}
+{%- endfor -%}
 </urlset>
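
Review note on the `sitemap.xml` change: sorting `site.pages` by `url` before the loop makes the generated file independent of the order in which Jekyll happens to enumerate pages, which is what keeps the automated `[html]` commits free of spurious diffs. The idea can be sketched in Python as below. The page dictionaries and the `sitemap_urls` helper are hypothetical stand-ins for Jekyll's page objects, not Jekyll's actual data model.

```python
# Sketch only: a stand-in for the Liquid loop, assuming each page is a dict
# with a "url" key and an optional "sitemap" key (hypothetical structure).


def sitemap_urls(pages, site_url):
    """Return absolute URLs for pages not opted out, sorted by page URL."""
    # Mirrors `page.sitemap != false`: a missing key counts as included.
    included = [p for p in pages if p.get("sitemap") is not False]
    # Mirrors `site.pages | sort: "url"`.
    return [site_url + p["url"] for p in sorted(included, key=lambda p: p["url"])]


if __name__ == "__main__":
    pages = [
        {"url": "/graphx/"},
        {"url": "/sitemap.xml", "sitemap": False},
        {"url": "/"},
        {"url": "/404.html"},
    ]
    # The same pages in a different enumeration order yield the same output.
    assert sitemap_urls(pages, "https://spark.apache.org") == sitemap_urls(
        list(reversed(pages)), "https://spark.apache.org"
    )
    for url in sitemap_urls(pages, "https://spark.apache.org"):
        print(url)
```

The sketch filters before sorting while the template sorts before filtering. The result is the same either way, since filtering does not change the relative order of the remaining pages.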