Branch: feature/277-url-returns-empty → hotfix/2.4.2
Issue: #277
Scope: 7 files, +88 / −205 lines
Tests: All 19 pass on the PR branch

Summary

This PR fixes gallery handling by reading gallery_data and media_metadata from the Reddit JSON API instead of scraping gallery HTML with htmlparser2. That removes a fragile second HTTP request, drops a dependency, and should directly fix the empty Url / Gallery bug.

Root cause (pre-existing)

The old code had a logic bug on top of the brittle scraping approach:

if (randomData.url.includes("/gallery")) {
    const galleryImage = await ImageHelper.FetchImageFromRedditGallery(`https://reddit.com${randomData.permalink}`);

    if (!galleryImage) {
        // ... error path
    }

    url = galleryImage[0];
    gallery = galleryImage;

FetchImageFromRedditGallery returns string[]. An empty array [] is truthy in JavaScript, so !galleryImage is false when scraping fails. The code then sets url = undefined and gallery = [] — matching issue #277.

The new approach avoids that path entirely.

What works well

Correct data source — Gallery image URLs belong in the listing JSON (gallery_data, media_metadata, is_gallery), not in scraped HTML.
Performance — One request instead of two per gallery post.
Simpler dependency tree — Removing htmlparser2 and its DOM subtree is a clear win.
HTML entity decoding — convertRedditUrl correctly turns & into & in preview URLs.
Tests updated — Gallery tests now assert real metadata URLs and entity decoding; obsolete imageHelper tests are removed with the dead code.
Typed contract — IFetchResult extended with the Reddit fields the code actually uses.

Concerns and suggestions

1. Fallback may still return a non-image URL (minor)

When gallery metadata is missing, the PR falls back to randomData.url and returns success:

if (gallery.length == 0) {
    gallery = [randomData.url];
}

For galleries that is often something like https://i.redd.it/gallery/cr8xudsnkgua1, which is not a direct image. That is better than Url: undefined and Gallery: [], but consumers expecting embeddable image URLs may still fail in rare metadata-missing cases.

Suggestion: Acceptable for this hotfix. If you want stricter behavior later, only fall back for non-gallery posts, or keep returning NoImageResultsFound when is_gallery is true but metadata yields no URLs.

2. Only `metadata.s.u` is used (minor)

getGalleryImageUrls reads only the s (source/preview) entry. Gallery items that are videos or lack s.u are skipped. If every item is skipped, you hit the fallback above.

Suggestion: Fine for a rabbit-image bot focused on still images. No change needed unless video galleries matter.

3. Gallery detection tightened — generally good

Filter changed from url.includes("/gallery/") to is_gallery == true. That aligns with the API. Legacy posts without is_gallery would be excluded, but that field has been standard on Reddit gallery posts for years.

4. Helper functions live in `index.ts` (nit)

convertRedditUrl and getGalleryImageUrls are reasonable as module-private helpers. Extracting them would only help if you want isolated unit tests; not required here.

5. Missing edge-case test (nit)

Worth adding a test where one gallery item has metadata and another does not, to document partial-gallery behavior:

// gallery_data has 2 items, only one has media_metadata entry
// expect Gallery length === 1

Pre-existing issues (not introduced by this PR)

Image filter still only matches .jpg, .png, or galleries — .gif / .webp single-image posts are still excluded.
Old gallery test mocked FetchImageFromRedditGallery with a string instead of string[]; that test is gone with the helper, so this is moot.

Verdict

Approve with minor notes. The fix is sound, targets the real failure mode, simplifies the codebase, and tests pass. The metadata-based approach is the right long-term design for Reddit galleries.

The only behavioral change to be aware of: gallery posts with missing metadata now succeed with a gallery slug URL instead of erroring. That is a reasonable hotfix tradeoff and still better than returning empty values.

Suggested test plan before merge

Unit tests pass (yarn test)
Manual check against a live subreddit known to have gallery posts (e.g. r/rabbits) — confirm Url and Gallery are populated with preview.redd.it URLs
Confirm a normal .jpg post still works unchanged
Verify CLI / JSON output consumers handle the new gallery URL format

## Code Review **Branch:** `feature/277-url-returns-empty` → `hotfix/2.4.2` **Issue:** #277 **Scope:** 7 files, +88 / −205 lines **Tests:** All 19 pass on the PR branch --- ### Summary This PR fixes gallery handling by reading `gallery_data` and `media_metadata` from the Reddit JSON API instead of scraping gallery HTML with `htmlparser2`. That removes a fragile second HTTP request, drops a dependency, and should directly fix the empty `Url` / `Gallery` bug. --- ### Root cause (pre-existing) The old code had a logic bug on top of the brittle scraping approach: ```typescript if (randomData.url.includes("/gallery")) { const galleryImage = await ImageHelper.FetchImageFromRedditGallery(`https://reddit.com${randomData.permalink}`); if (!galleryImage) { // ... error path } url = galleryImage[0]; gallery = galleryImage; ``` `FetchImageFromRedditGallery` returns `string[]`. An empty array `[]` is truthy in JavaScript, so `!galleryImage` is `false` when scraping fails. The code then sets `url = undefined` and `gallery = []` — matching issue #277. The new approach avoids that path entirely. --- ### What works well 1. **Correct data source** — Gallery image URLs belong in the listing JSON (`gallery_data`, `media_metadata`, `is_gallery`), not in scraped HTML. 2. **Performance** — One request instead of two per gallery post. 3. **Simpler dependency tree** — Removing `htmlparser2` and its DOM subtree is a clear win. 4. **HTML entity decoding** — `convertRedditUrl` correctly turns `&` into `&` in preview URLs. 5. **Tests updated** — Gallery tests now assert real metadata URLs and entity decoding; obsolete `imageHelper` tests are removed with the dead code. 6. **Typed contract** — `IFetchResult` extended with the Reddit fields the code actually uses. --- ### Concerns and suggestions #### 1. Fallback may still return a non-image URL (minor) When gallery metadata is missing, the PR falls back to `randomData.url` and returns success: ```typescript if (gallery.length == 0) { gallery = [randomData.url]; } ``` For galleries that is often something like `https://i.redd.it/gallery/cr8xudsnkgua1`, which is not a direct image. That is better than `Url: undefined` and `Gallery: []`, but consumers expecting embeddable image URLs may still fail in rare metadata-missing cases. **Suggestion:** Acceptable for this hotfix. If you want stricter behavior later, only fall back for non-gallery posts, or keep returning `NoImageResultsFound` when `is_gallery` is true but metadata yields no URLs. #### 2. Only `metadata.s.u` is used (minor) `getGalleryImageUrls` reads only the `s` (source/preview) entry. Gallery items that are videos or lack `s.u` are skipped. If every item is skipped, you hit the fallback above. **Suggestion:** Fine for a rabbit-image bot focused on still images. No change needed unless video galleries matter. #### 3. Gallery detection tightened — generally good Filter changed from `url.includes("/gallery/")` to `is_gallery == true`. That aligns with the API. Legacy posts without `is_gallery` would be excluded, but that field has been standard on Reddit gallery posts for years. #### 4. Helper functions live in `index.ts` (nit) `convertRedditUrl` and `getGalleryImageUrls` are reasonable as module-private helpers. Extracting them would only help if you want isolated unit tests; not required here. #### 5. Missing edge-case test (nit) Worth adding a test where one gallery item has metadata and another does not, to document partial-gallery behavior: ```typescript // gallery_data has 2 items, only one has media_metadata entry // expect Gallery length === 1 ``` --- ### Pre-existing issues (not introduced by this PR) - Image filter still only matches `.jpg`, `.png`, or galleries — `.gif` / `.webp` single-image posts are still excluded. - Old gallery test mocked `FetchImageFromRedditGallery` with a string instead of `string[]`; that test is gone with the helper, so this is moot. --- ### Verdict **Approve with minor notes.** The fix is sound, targets the real failure mode, simplifies the codebase, and tests pass. The metadata-based approach is the right long-term design for Reddit galleries. The only behavioral change to be aware of: gallery posts with missing metadata now succeed with a gallery slug URL instead of erroring. That is a reasonable hotfix tradeoff and still better than returning empty values. --- ### Suggested test plan before merge - [x] Unit tests pass (`yarn test`) - [ ] Manual check against a live subreddit known to have gallery posts (e.g. `r/rabbits`) — confirm `Url` and `Gallery` are populated with `preview.redd.it` URLs - [ ] Confirm a normal `.jpg` post still works unchanged - [ ] Verify CLI / JSON output consumers handle the new gallery URL format

Smithy-bot commented

2026-06-08 20:30:59 +01:00

Member

Test results for review findings

Validated on branch feature/277-url-returns-empty using yarn test and yarn start.

Unit tests

yarn test
# 3 suites, 19 tests — all passed

Finding 1: Root cause (`![]` is falsy) — CONFIRMED

![] = false
g[0] = undefined

When the old HTML scraper returned [], the if (!galleryImage) guard did not trigger, producing Url: undefined and Gallery: []. This matches #277.

Finding 2: Gallery metadata extraction — CONFIRMED (`yarn start --json`)

Mocked Reddit listing with gallery_data + media_metadata:

{
  "Url": "https://preview.redd.it/image-1.jpg?width=640&crop=smart&auto=webp&s=1",
  "Gallery": [
    "https://preview.redd.it/image-1.jpg?width=640&crop=smart&auto=webp&s=1",
    "https://preview.redd.it/image-2.jpg?width=640&crop=smart&auto=webp&s=2"
  ]
}

Url and Gallery are populated (no empty values)
& correctly decoded to & in URLs
All gallery entries are preview.redd.it image URLs

Finding 3: Normal `.jpg` post unchanged — CONFIRMED (`yarn start --json`)

{
  "Url": "https://i.redd.it/testbunny.jpg",
  "Gallery": ["https://i.redd.it/testbunny.jpg"]
}

Finding 4: Fallback when metadata missing — CONFIRMED (`yarn start --json`)

Gallery post with is_gallery: true but empty media_metadata:

{
  "Url": "https://i.redd.it/gallery/cr8xudsnkgua1",
  "Gallery": ["https://i.redd.it/gallery/cr8xudsnkgua1"]
}

Returns success (not an error), but the URL is a gallery slug — not a direct image. As noted in the review, this is acceptable for the hotfix but worth knowing for consumers.

Finding 5: Partial gallery (one item missing metadata) — CONFIRMED (`yarn start --json`)

2 gallery items, only 1 has media_metadata:

{
  "Url": "https://preview.redd.it/partial.jpg?width=640&s=1",
  "Gallery": ["https://preview.redd.it/partial.jpg?width=640&s=1"]
}

Missing items are skipped; valid items are still returned.

Live Reddit (`yarn start --json`) — BLOCKED from test runner

yarn start --json
# Failed to fetch result from Reddit (exit 1)
# got-cjs: Response code 403 (Blocked)

Reddit blocks unauthenticated requests from this environment. Live verification against r/rabbits should be done manually by @VylpesTester from a network Reddit allows.

Verdict after testing

All review findings that can be exercised locally are confirmed. The PR fix resolves the empty Url/Gallery bug; gallery posts now return proper preview.redd.it URLs via JSON metadata. No regressions observed for standard .jpg posts.

Remaining manual step: one live yarn start / yarn start --json run against r/rabbits from an unblocked network.

## Test results for review findings Validated on branch `feature/277-url-returns-empty` using `yarn test` and `yarn start`. --- ### Unit tests ``` yarn test # 3 suites, 19 tests — all passed ``` --- ### Finding 1: Root cause (`![]` is falsy) — **CONFIRMED** ``` ![] = false g[0] = undefined ``` When the old HTML scraper returned `[]`, the `if (!galleryImage)` guard did not trigger, producing `Url: undefined` and `Gallery: []`. This matches #277. --- ### Finding 2: Gallery metadata extraction — **CONFIRMED** (`yarn start --json`) Mocked Reddit listing with `gallery_data` + `media_metadata`: ```json { "Url": "https://preview.redd.it/image-1.jpg?width=640&crop=smart&auto=webp&s=1", "Gallery": [ "https://preview.redd.it/image-1.jpg?width=640&crop=smart&auto=webp&s=1", "https://preview.redd.it/image-2.jpg?width=640&crop=smart&auto=webp&s=2" ] } ``` - `Url` and `Gallery` are populated (no empty values) - `&` correctly decoded to `&` in URLs - All gallery entries are `preview.redd.it` image URLs --- ### Finding 3: Normal `.jpg` post unchanged — **CONFIRMED** (`yarn start --json`) ```json { "Url": "https://i.redd.it/testbunny.jpg", "Gallery": ["https://i.redd.it/testbunny.jpg"] } ``` --- ### Finding 4: Fallback when metadata missing — **CONFIRMED** (`yarn start --json`) Gallery post with `is_gallery: true` but empty `media_metadata`: ```json { "Url": "https://i.redd.it/gallery/cr8xudsnkgua1", "Gallery": ["https://i.redd.it/gallery/cr8xudsnkgua1"] } ``` Returns success (not an error), but the URL is a gallery slug — not a direct image. As noted in the review, this is acceptable for the hotfix but worth knowing for consumers. --- ### Finding 5: Partial gallery (one item missing metadata) — **CONFIRMED** (`yarn start --json`) 2 gallery items, only 1 has `media_metadata`: ```json { "Url": "https://preview.redd.it/partial.jpg?width=640&s=1", "Gallery": ["https://preview.redd.it/partial.jpg?width=640&s=1"] } ``` Missing items are skipped; valid items are still returned. --- ### Live Reddit (`yarn start --json`) — **BLOCKED from test runner** ``` yarn start --json # Failed to fetch result from Reddit (exit 1) # got-cjs: Response code 403 (Blocked) ``` Reddit blocks unauthenticated requests from this environment. Live verification against `r/rabbits` should be done manually by @VylpesTester from a network Reddit allows. --- ### Verdict after testing All review findings that can be exercised locally are **confirmed**. The PR fix resolves the empty `Url`/`Gallery` bug; gallery posts now return proper `preview.redd.it` URLs via JSON metadata. No regressions observed for standard `.jpg` posts. **Remaining manual step:** one live `yarn start` / `yarn start --json` run against `r/rabbits` from an unblocked network.

Vylpes added 1 commit

2026-06-08 20:39:23 +01:00

Fix Reddit fetch failures by using old.reddit.com with session cookies.

Test / build (push) Successful in 1m1s

Details

931834e22a

Reddit now returns 403 for unauthenticated reddit.com JSON requests, so bootstrap a session via old.reddit.com before fetching listings.

Co-authored-by: Cursor <cursoragent@cursor.com>

Vylpes dismissed Smithy-bot's review

2026-06-08 20:39:23 +01:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Vylpes added 1 commit

2026-06-08 20:43:32 +01:00

Extract Reddit and gallery logic into helpers with unit tests.

Test / build (push) Successful in 35s

Details

37d68349fb

Move listing fetch and gallery URL parsing out of index.ts into dedicated helper classes to keep the entry point focused on orchestration.

Co-authored-by: Cursor <cursoragent@cursor.com>

VylpesTester approved these changes

2026-06-08 20:45:37 +01:00

Vylpes merged commit 1ee07ba332 into hotfix/2.4.2

2026-06-08 20:49:09 +01:00

Vylpes deleted branch feature/277-url-returns-empty

2026-06-08 20:49:10 +01:00

Vylpes referenced this pull request from a commit

2026-06-08 20:49:11 +01:00

Merge pull request 'Fix the url and gallery fetcher returning empty' (#279) from feature/277-url-returns-empty into hotfix/2.4.2

No reviewers

No milestone

No project

No assignees

3 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference

RabbitLabs/random-bunny!279

No description provided.

Rows
Columns

Fix the url and gallery fetcher returning empty #279

Code Review