Aussie living in the San Francisco Bay Area.
Coding since 1998.
.NET Foundation member. C# fan
https://d.sb/
Mastodon: @dan@d.sb

  • 0 Posts
  • 29 Comments
Joined 2 years ago
Cake day: June 14th, 2023


  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    I haven’t looked into paperless-ai yet, but I hope my machine would be beefy enough for this task

    You need a GPU with a decent amount of VRAM to get LLMs working well locally. I don’t have a new enough GPU to be useful - my server just has the Intel iGPU, and my desktop PC only has a GTX 1080, which is from before Nvidia added Tensor cores for AI.


  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    And that sticker also has the ASN in human readable form?

    Yes! They look like this:

    So you would then add many documents at once to the feeder, and Paperless will read the QR and also split documents whenever a new code appears? What about documents you don’t want to keep physically? Is there a way to get Paperless to split them automatically as well if you add many to the feeder?

    Paperless supports two different splitting methods:

    • If it encounters an ASN QR code, it’ll split at that point and keep the page with the barcode
    • If it encounters a special barcode that’s used as a separator sheet, it’ll split at that point and delete the page with the barcode. By default it looks for a “Patch T” barcode, and you can get a page with a Patch T barcode from https://www.alliancegroup.co.uk/patch-codes.htm

    So all you need to do is have a “Patch T” page between each document and it’ll split them automatically.

    Docs: https://docs.paperless-ngx.com/advanced_usage/#document-splitting
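
    To make the two splitting rules concrete, here is a rough Python sketch of the behaviour described above. It is not Paperless's actual code: the Page class and function names are made up for illustration, and the barcode strings ("ASN" prefix, "PATCHT" separator) are assumptions about the default settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one scanned page."""
    text: str
    barcode: Optional[str] = None  # decoded barcode value, if the page has one


# Assumed defaults; check your own Paperless configuration.
ASN_PREFIX = "ASN"
SEPARATOR_CODE = "PATCHT"


def split_batch(pages: List[Page]) -> List[List[Page]]:
    """Split one scanned batch into documents using the two rules above."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if page.barcode == SEPARATOR_CODE:
            # Separator sheet: close the current document and drop this page.
            if current:
                documents.append(current)
            current = []
        elif page.barcode is not None and page.barcode.startswith(ASN_PREFIX):
            # ASN sticker: close the current document and keep this page
            # as the first page of the next one.
            if current:
                documents.append(current)
            current = [page]
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    batch = [
        Page("invoice p1", "ASN00001"),
        Page("invoice p2"),
        Page("separator sheet", "PATCHT"),
        Page("letter p1"),
        Page("letter p2"),
        Page("contract p1", "ASN00002"),
    ]
    for doc in split_batch(batch):
        print([p.text for p in doc])
```

    The separator sheet never ends up in a document, while each ASN page becomes the first page of its document.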

    I’m also using paperless-ai to automatically tag and set a title for scanned documents. Very useful. I’d love to run my own AI locally using ollama, but I don’t have good enough hardware so for now I’m using Google’s Gemini 2.0 Flash. I trust Google’s privacy policy far more than OpenAI’s, Google Gemini is very cheap, and if you use the paid version they don’t retain any of your data nor use it for training.


  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    a VM with torrent client and a killswitched VPN

    You can use Docker for the same setup using the --network container:vpn flag to docker run or network_mode: "container:vpn" option in docker-compose.yml where vpn is the name of the container to route through. This makes one Docker container use the network of another (the VPN one), so both containers will share the same internal IP address, and you’ll have to map any ports on the VPN container rather than the torrent/whatever one. This is just as safe as a killswitched VPN.
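
    As a rough sketch of how the two docker run invocations fit together, the Python below just builds the argument lists. The image names (example/vpn-client, example/torrent-client) and the port are placeholders I made up, and a real VPN image will almost certainly need extra options (credentials, capabilities) that are left out here.

```python
import shlex
from typing import List


def vpn_run_command(vpn_name: str, vpn_image: str, ports: List[str]) -> List[str]:
    """Build `docker run` args for the VPN container.

    Ports are published here, because the other container shares this
    container's network stack and cannot publish ports itself.
    """
    cmd = ["docker", "run", "-d", "--name", vpn_name]
    for port in ports:
        cmd += ["-p", port]
    cmd.append(vpn_image)
    return cmd


def client_run_command(name: str, image: str, vpn_name: str) -> List[str]:
    """Build `docker run` args for a container routed through the VPN one."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "--network", f"container:{vpn_name}",  # reuse the VPN container's network
        image,
    ]


if __name__ == "__main__":
    # Placeholder images and port: swap in whatever you actually run.
    print(shlex.join(vpn_run_command("vpn", "example/vpn-client", ["8080:8080"])))
    print(shlex.join(client_run_command("torrent", "example/torrent-client", "vpn")))
```

    The VPN container has to be started first, since the second command refers to it by name.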

    Unraid has a nice UI for it when editing a Docker container:

    also meant if it ever got virused I could just roll it back

    Consider using a file system that has snapshots, like ZFS. Then you can get this same behaviour for your whole system rather than just a VM :)

    is it ok to sit on the perpetual license (for a few years at a time), or are the updates really required?

    I’m not sure, as the new licensing model is pretty new. I purchased Unraid in 2023, and back then, all licenses included lifetime updates. They switched to a subscription model to make the business more viable long-term and afford to hire more developers, which I definitely understand.

    It supports GPU passthrough right

    It does. You can pass through any PCIe device, so for example if you have multiple network cards, you can pass one directly to a VM (it’s a bit more efficient than using a virtual Ethernet adapter).


  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    ScanSnap iX1600. I bought mine from B&H: https://www.bhphotovideo.com/c/product/1615326-REG/fujitsu_pa03770_b635_scansnap_ix1600_document_scanner.html. There are two scanners that usually get recommended for paperless: this one, and a cheaper (but not as nice) Brother one.

    It’s a really compact unit - smaller than I thought it’d be! You can put up to 50 sheets in the feeder and it scans them all, on both sides (no need to manually flip the pages). It can scan 40 pages per minute.

    I’ve combined it with ASN (archive serial number) QR code stickers for documents that I need to keep a physical copy of. I’m using Avery 5267 stickers + Avery’s online designer site to design and print them. If I need to keep a physical copy of the document, I stick a sticker on the document, scan it, and Paperless automatically detects the QR code and sets the ASN. Then I keep all the physical copies in a binder, ordered by ASN. If I need to locate a physical document, I find it in Paperless, check the ASN, then go to the right document in the binder (easy to find the right place since they’re all in order).
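
    If you'd rather generate the sticker contents yourself, the text encoded in each QR code is just a prefix plus a number. A minimal Python sketch, assuming the "ASN" prefix and five-digit zero padding (both are assumptions, so check them against your Paperless barcode settings); the function names are made up for illustration:

```python
from typing import List, Optional


def asn_labels(start: int, count: int, prefix: str = "ASN", digits: int = 5) -> List[str]:
    """Return the payload strings for `count` consecutive ASN stickers."""
    if start < 1 or count < 0:
        raise ValueError("start must be >= 1 and count must be >= 0")
    return [f"{prefix}{n:0{digits}d}" for n in range(start, start + count)]


def parse_asn(payload: str, prefix: str = "ASN") -> Optional[int]:
    """Return the serial number from a scanned payload, or None if it is not an ASN."""
    if not payload.startswith(prefix):
        return None
    number = payload[len(prefix):]
    if number.isascii() and number.isdigit():
        return int(number)
    return None


if __name__ == "__main__":
    for label in asn_labels(start=1, count=3):
        print(label, "->", parse_asn(label))
```

    Each string goes into the QR code and is also printed next to it as the human-readable number, which is what you look up in the binder.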

    There are just a few minor issues with the scanner, but otherwise it’s perfect:

    • It was a bit expensive, at $400 in the USA.
    • You need a Windows or macOS system to do the initial setup. Setting it up is done through a desktop app rather than through the touchscreen on the device.
    • Some of the options need a computer connected to the scanner via USB, or signing up to their cloud service. However, it does support scanning to an SMB share without a computer connected, which is all I needed. I have my paperless-ngx “consume” directory shared via Samba. You just need to delete the default scanning profiles and add a network scan (SMB) one.

  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    Should/Could I be hosting anything else?

    If you deal with a lot of paperwork, paperless-ngx and paperless-ai are very good for managing it. I bought a good scanner (edit: it’s a ScanSnap iX1600) and have been digitizing a bunch of paperwork. I feel like a proper adult now lol

    Maybe something for recipe management - Mealie or Tandoor?

    Audiobookshelf for audiobooks and podcasts.

    Healthchecks and Uptime Kuma for monitoring and alerting when things go down.


  • dan@upvote.au to Selfhosted@lemmy.world · New to self-hosting · 3 days ago

    I used to use WireGuard, but Tailscale is a lot easier and has a lot of useful features. Tailscale is built on top of WireGuard but automates all the configuration - all you need to do is install it and log in on all devices. It handles NAT traversal using techniques like UDP hole punching, so you don’t need to configure port forwarding and it works behind firewalls.

    What do you want to run in a VM that can’t run in Docker? If you’re using a VPN for torrents or whatever, you can easily use Gluetun and configure the Docker containers so that only some of them use Gluetun’s VPN connection, while the other containers directly connect to the internet.

    I like Unraid. It supports Docker, VMs (via KVM), and Linux containers (via LXC), and has a nice UI to configure them. It’s a paid piece of software, but works very well. Proxmox is also very good and free, but it doesn’t directly support Docker.



  • I was pretty impressed with the Samsung Gear VR (and Google Cardboard before it) when it was first released back in 2015. Instead of having to spend a lot on a fancy computer system and headset to experience VR, you could just stick your phone close to your eyes. Of course, it wasn’t as good as an actual VR headset, but it was the first VR experience that was easily approachable for ‘regular’ people, and was a lot better than I thought it’d be.



  • Some people think the big tech companies literally sell your data though, so IMO it’s important to clarify.

    There are companies that do that though, like Acxiom, LiveRamp, CoreLogic, etc. With Acxiom at least, you can buy lists like “high net worth individuals who are likely to buy a new car in the next 6 months” and get a list of names, phone numbers, and email addresses, based on data they’ve collected from both public and private sources.

    Those data broker companies collect data from things like supermarket loyalty programs (to determine consumer spending patterns) and other companies who are willing to sell data about you, and compile them into profiles.



  • For what it’s worth, Q4 always has higher ad revenue because of Black Friday and Christmas.

    I think the cost per ad went up too (that’s also in the presentation). Google and Facebook both mostly use an auction system for ads, so the price is based on the market. Out of all the possible ads a user can see (active ads targeting their demographic), the one with the highest bid will be shown.
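
    Here's a toy Python sketch of that selection step, only modelling the simplified description above (eligible ads for the user's demographic, highest bid wins). The Ad class, field names, and numbers are invented for illustration, and the real systems are generally understood to weigh more than just the raw bid.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Ad:
    """Hypothetical ad record for this sketch."""
    name: str
    bid: float                          # what the advertiser is willing to pay
    target_demographics: FrozenSet[str]
    active: bool = True


def pick_ad(ads: Iterable[Ad], user_demographic: str) -> Optional[Ad]:
    """Return the active ad targeting this demographic with the highest bid."""
    eligible = [
        ad for ad in ads
        if ad.active and user_demographic in ad.target_demographics
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda ad: ad.bid)


if __name__ == "__main__":
    ads = [
        Ad("car dealership", 2.50, frozenset({"25-34", "35-44"})),
        Ad("toy store", 4.00, frozenset({"25-34"})),
        Ad("paused campaign", 9.00, frozenset({"25-34"}), active=False),
    ]
    winner = pick_ad(ads, "25-34")
    print(winner.name if winner else "no eligible ad")
```

    Because the winner depends on who else is bidding, prices rise when more advertisers compete for the same audience, which fits the Q4 effect mentioned above.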


  • Take the superbowl for example. It’s usually the most viewed event every year in the US

    Interesting that you mention the Super Bowl, since one of the techniques that sales reps at large digital ad networks (like Google and Facebook) use to sell ads to large advertisers is comparing their reach to the Super Bowl’s.

    This year’s Super Bowl had a viewership of around 127 million people. In comparison, 194 million Americans use Facebook at least once per day and 267 million use Google, so your ads on those platforms will have a wider potential audience than the Super Bowl, while being much more cost-effective since you can run the ad to a more specific audience rather than to every single person watching TV at that time.




  • dan@upvote.au to Asklemmy@lemmy.ml · How does social media generate revenue? · 7 days ago

    selling your personal data

    The major tech companies like Google, Facebook, etc. don’t sell user data. That’s a common misconception. The data is what makes the company valuable - nobody else has it. It wouldn’t make sense for them to sell it, because they’d lose their competitive advantage over other companies.

    Advertisers can target ads based on the data, but the advertiser never actually sees user data.