Fix the url and gallery fetcher returning empty #279
No reviewers
Labels
No labels
blocked
duplicate
needs
approval
needs
criteria
needs
estimate
needs
tests
question
type
admin
type
alert
type
bug
type
change
type
defect
type
dependencies
type
epic
type
idea
type
incident
type
investigation
type
spike
type
story
won't fix
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
RabbitLabs/random-bunny!279
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/277-url-returns-empty"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
#277
Code Review
Branch:
feature/277-url-returns-empty→hotfix/2.4.2Issue: #277
Scope: 7 files, +88 / −205 lines
Tests: All 19 pass on the PR branch
Summary
This PR fixes gallery handling by reading
gallery_dataandmedia_metadatafrom the Reddit JSON API instead of scraping gallery HTML withhtmlparser2. That removes a fragile second HTTP request, drops a dependency, and should directly fix the emptyUrl/Gallerybug.Root cause (pre-existing)
The old code had a logic bug on top of the brittle scraping approach:
FetchImageFromRedditGalleryreturnsstring[]. An empty array[]is truthy in JavaScript, so!galleryImageisfalsewhen scraping fails. The code then setsurl = undefinedandgallery = []— matching issue #277.The new approach avoids that path entirely.
What works well
gallery_data,media_metadata,is_gallery), not in scraped HTML.htmlparser2and its DOM subtree is a clear win.convertRedditUrlcorrectly turns&into&in preview URLs.imageHelpertests are removed with the dead code.IFetchResultextended with the Reddit fields the code actually uses.Concerns and suggestions
1. Fallback may still return a non-image URL (minor)
When gallery metadata is missing, the PR falls back to
randomData.urland returns success:For galleries that is often something like
https://i.redd.it/gallery/cr8xudsnkgua1, which is not a direct image. That is better thanUrl: undefinedandGallery: [], but consumers expecting embeddable image URLs may still fail in rare metadata-missing cases.Suggestion: Acceptable for this hotfix. If you want stricter behavior later, only fall back for non-gallery posts, or keep returning
NoImageResultsFoundwhenis_galleryis true but metadata yields no URLs.2. Only
metadata.s.uis used (minor)getGalleryImageUrlsreads only thes(source/preview) entry. Gallery items that are videos or lacks.uare skipped. If every item is skipped, you hit the fallback above.Suggestion: Fine for a rabbit-image bot focused on still images. No change needed unless video galleries matter.
3. Gallery detection tightened — generally good
Filter changed from
url.includes("/gallery/")tois_gallery == true. That aligns with the API. Legacy posts withoutis_gallerywould be excluded, but that field has been standard on Reddit gallery posts for years.4. Helper functions live in
index.ts(nit)convertRedditUrlandgetGalleryImageUrlsare reasonable as module-private helpers. Extracting them would only help if you want isolated unit tests; not required here.5. Missing edge-case test (nit)
Worth adding a test where one gallery item has metadata and another does not, to document partial-gallery behavior:
Pre-existing issues (not introduced by this PR)
.jpg,.png, or galleries —.gif/.webpsingle-image posts are still excluded.FetchImageFromRedditGallerywith a string instead ofstring[]; that test is gone with the helper, so this is moot.Verdict
Approve with minor notes. The fix is sound, targets the real failure mode, simplifies the codebase, and tests pass. The metadata-based approach is the right long-term design for Reddit galleries.
The only behavioral change to be aware of: gallery posts with missing metadata now succeed with a gallery slug URL instead of erroring. That is a reasonable hotfix tradeoff and still better than returning empty values.
Suggested test plan before merge
yarn test)r/rabbits) — confirmUrlandGalleryare populated withpreview.redd.itURLs.jpgpost still works unchangedTest results for review findings
Validated on branch
feature/277-url-returns-emptyusingyarn testandyarn start.Unit tests
Finding 1: Root cause (
![]is falsy) — CONFIRMEDWhen the old HTML scraper returned
[], theif (!galleryImage)guard did not trigger, producingUrl: undefinedandGallery: []. This matches #277.Finding 2: Gallery metadata extraction — CONFIRMED (
yarn start --json)Mocked Reddit listing with
gallery_data+media_metadata:UrlandGalleryare populated (no empty values)&correctly decoded to&in URLspreview.redd.itimage URLsFinding 3: Normal
.jpgpost unchanged — CONFIRMED (yarn start --json)Finding 4: Fallback when metadata missing — CONFIRMED (
yarn start --json)Gallery post with
is_gallery: truebut emptymedia_metadata:Returns success (not an error), but the URL is a gallery slug — not a direct image. As noted in the review, this is acceptable for the hotfix but worth knowing for consumers.
Finding 5: Partial gallery (one item missing metadata) — CONFIRMED (
yarn start --json)2 gallery items, only 1 has
media_metadata:Missing items are skipped; valid items are still returned.
Live Reddit (
yarn start --json) — BLOCKED from test runnerReddit blocks unauthenticated requests from this environment. Live verification against
r/rabbitsshould be done manually by @VylpesTester from a network Reddit allows.Verdict after testing
All review findings that can be exercised locally are confirmed. The PR fix resolves the empty
Url/Gallerybug; gallery posts now return properpreview.redd.itURLs via JSON metadata. No regressions observed for standard.jpgposts.Remaining manual step: one live
yarn start/yarn start --jsonrun againstr/rabbitsfrom an unblocked network.New commits pushed, approval review dismissed automatically according to repository settings