My Bluesky Mirrors Explained (even if you don’t know Python)

As you may know, I have a Bluesky mirror script, and some people have asked me how it works. Well, here it is!

The Flowchart (how data moves)

flowchart TD
    subgraph A["Startup and Configuration"]
        A1["Load environment vars"]
        A2["Configure logging"]
        A3["Init Twitter Async client"]
        A4["Init Bluesky client"]
    end
    subgraph B["Twitter Monitoring"]
        B1["Sign in to Twitter"]
        B2["Get target user info"]
        B3["Fetch tweets for user"]
        B4["Check latest tweet ID"]
        B5["If new tweet then process"]
    end
    subgraph C["Process Tweet"]
        C1["Clean tweet text"]
        C2["Download tweet media"]
        C3["Build Bluesky post"]
        C4["Upload media to Bluesky"]
        C5["Send post to Bluesky"]
        C6["Optional: Translate and reply"]
        C7["Remove old downloaded media"]
    end
    subgraph D["Loop and Sleep"]
        D1["Wait x seconds"]
        D2["Repeat fetch next tweets"]
    end
    A1 --> A2 --> A3 --> A4 --> B1 --> B2 --> B3
    B3 --> B4 --> B5
    B5 -->|new tweet| C1 --> C2 --> C3 --> C4 --> C5 --> C6 --> C7 --> D1 --> B2
    B5 -->|no new tweet| D1 --> B2

That is the most straightforward opening I can give. But yes, it’s also the most boring because only people who know the code would understand what each step means in detail. I’ll be your tour guide throughout this mess I kind of cleaned up!

(1) Startup & Configuration

I can easily break this down into two parts here, setup and final setup; it’ll make sense soon.

Initial Setup (Load the passwords)

First things first: if you want to read from Twitter/X and post on Bluesky, you need an account on both ends. That’s where we start, getting all the core information out of the way and ready for the script to use.

load_dotenv()

So simple! Well, for simplicity, I am hiding the fact that the loading part is spread out, but really, this is the most important function; it loads your environment variables. You can look at the full code if you want to read along.
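
For anyone following along without the full code, here is a minimal sketch of what reading the credentials can look like once load_dotenv() has copied the .env file into the environment. The helper name read_required_env is made up for illustration and is not part of my script; the variable names are the ones you will see later in this post.

```python
import os

def read_required_env(names):
    """Return a dict of environment variables, failing early if any are missing.

    Hypothetical helper for illustration only; it is not taken from the script.
    """
    values = {name: os.getenv(name) for name in names}
    # Treat unset and empty values the same way: both are unusable credentials.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values

# Usage, after load_dotenv() has run:
# creds = read_required_env(["TWITTER_USERNAME", "TWITTER_PASSWORD"])
```

Failing at startup like this is just one design choice; it means a typo in the .env file shows up immediately instead of halfway through a login.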

Final Setup (Loading the connection)

Here is the “cooler” code for initializing the Bluesky and Twitter/X clients that will do the posting and fetching.

app = TwitterAsync("session")
username = os.getenv("TWITTER_USERNAME") # thanks to load_dotenv(), we can get the saved hidden USERNAME
password = os.getenv("TWITTER_PASSWORD") # and password
await app.sign_in(username, password) # setting up twitter/x
bluesky_client = init_bluesky_client() # setting up bluesky

And if you’re curious about the Bluesky initialization code:

def init_bluesky_client() -> Client:
    client = Client()

    session_string = get_session() # trying to get old session (saves time)
    if session_string:
        process('Reusing session')
        try:
            client.login(session_string=session_string)
            return client # connects through older session
        ...  

    process('Creating new session') # creating new client from scratch
    bluesky_username = os.getenv("BLUESKY_USERNAME") # same env you saw earlier
    bluesky_password = os.getenv("BLUESKY_PASSWORD")
    ...

    client.login(bluesky_username, bluesky_password)

    return client # client is initialized!

This is just a tiny bit more complicated than the Twitter/X side. The Twitter/X client has a built-in way to grab an old session, which is why its setup is shorter. That’s it for step 1.
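
If you are wondering what a get_session() helper could look like, here is a minimal sketch that keeps the session string in a text file. The file name session.txt and the save_session function are assumptions made for this example, not necessarily what my script does.

```python
from pathlib import Path
from typing import Optional

SESSION_FILE = Path("session.txt")  # hypothetical location, not taken from the script

def get_session(path: Path = SESSION_FILE) -> Optional[str]:
    """Return the saved session string, or None if nothing usable is saved."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return text or None  # an empty file counts as "no session"

def save_session(session_string: str, path: Path = SESSION_FILE) -> None:
    """Store the session string so the next start can skip the password login."""
    path.write_text(session_string, encoding="utf-8")
```

With atproto, the string to save would typically come from the client's export_session_string() method after a successful login.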

(2) Twitter/X Monitoring

Understanding this part is like understanding the difference between downloading a file and opening it.

For example, you can download the VLC media player, but if you don’t have it installed, you can’t run it or watch any videos at all. Hope that makes sense to you.

Going to target profile

This is the download part; we are going to the profile by simply running this extremely long async function…

# Heavily shrunk down for the record
async def monitor_tweets(app, bluesky_client, target_username: str, check_interval: int, enable_translation: bool, from_lang: str, to_lang: str):
    ...
    try:
        user = await app.get_user_info(target_username) # that's really it: this goes to the target profile

I know, right? So anticlimactic, but it’s really that simple. Of course, the Python library I use did that, so thank you, Tayyab Hussain. But the very next lines after that code are exactly the next step.

Retrieve latest post

...
all_tweets = await get_tweets_with_retry(app, user) # gets the tweets now
...
if all_tweets:
    latest_tweet = None
    for tweet in all_tweets: # loops through all tweets and grabs the latest post, saving it as latest_tweet
        if hasattr(tweet, 'id'):
            latest_tweet = tweet
            break
...

This is the core of the script: every x seconds, it checks whether latest_tweet is the same as last time. If it is the same, the script does nothing and waits again. If latest_tweet is different from the one it saw before, well, then it is time for the main course.
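
The check-and-wait idea can be sketched on its own, without any Twitter/X code. In this example, fetch_latest_id and on_new are hypothetical async functions standing in for the tweet fetch and for process_tweet, and max_checks exists only so the example stops; the real script keeps looping. The sketch also treats the very first tweet it sees as new, which is an assumption and may not match what my script does on startup.

```python
import asyncio

async def watch_for_new_ids(fetch_latest_id, on_new, check_interval, max_checks):
    """Call on_new(tweet_id) whenever the newest ID differs from the last one seen."""
    last_tweet_id = None
    for _ in range(max_checks):
        tweet_id = await fetch_latest_id()
        # A failed fetch (None) is ignored; only a different ID counts as new.
        if tweet_id is not None and tweet_id != last_tweet_id:
            last_tweet_id = tweet_id
            await on_new(tweet_id)
        await asyncio.sleep(check_interval)  # wait x seconds before checking again
```

Comparing IDs instead of tweet text is what makes this cheap: the script never has to look inside a tweet it has already mirrored.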

(3) Process Tweet (juicy stuff)

Bringing all the stuff in

Just 10 lines down the code from the previous snippet, we have where the code starts processing everything.

if tweet_id != last_tweet_id:
  last_tweet_id = tweet_id
  ...
  await process_tweet(latest_tweet, bluesky_client,...) # calls the main function that will handle everything; I just hid the translation part for simplicity

Well, what does the process_tweet function do? A lot.

Reads tweet text (finally, we are looking at what we downloaded)
Cleans tweet text (will come back to this)
Downloads media (videos/images) if available
Posts to Bluesky
Deletes the downloaded media (if available)

That is a lot of stuff to do in a short amount of time, but I have never actually benchmarked or tested it, so maybe it’s faster than I think.

Removing the junk

When the script runs and gets the new tweet, the text is so much messier than it normally looks. So we need to clean it up. Here is what an example response looks like before and after the cleaning process.

Before:
fallen in love with this library so fast https://t.co/17cikzlYZO

After:
fallen in love with this library so fast

One thing I can say about this: back when I was making this in late 2024, the raw output was a bit messier, but at least now the library I use has cleaned up most of it!

So now the clean tweet text function is much shorter than before.

def clean_tweet_text(text: str) -> str: # -> means this function returns a string
    # Use the remove_tco_links function to clean the text
    text = remove_tco_links(text)
    # Replace 'RT ' at the beginning with '🔁'
    text = re.sub(r'^RT ', '🔁 ', text) # swaps out the RT for a repost symbol
    return text

So just in case the post is a repost, it looks nicer now. Here is the remove_tco_links function for the curious:

def remove_tco_links(text: str) -> str:
    # Updated pattern to match all HTTP/HTTPS links
    pattern = r'https?://\S+' # whole lotta regex no one understands
    cleaned_text = re.sub(pattern, '', text) 
    cleaned_text = ' '.join(cleaned_text.split())
    return cleaned_text
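
If you want to try the cleaning step yourself, here is a small self-contained version of the two functions above with the import they need. The regex is less scary than it looks: https? matches "http" or "https", :// matches itself, and \S+ matches everything up to the next space. The second sample tweet is made up to show the repost case.

```python
import re

def remove_tco_links(text: str) -> str:
    cleaned_text = re.sub(r'https?://\S+', '', text)  # drop every http/https link
    return ' '.join(cleaned_text.split())  # collapse leftover whitespace

def clean_tweet_text(text: str) -> str:
    text = remove_tco_links(text)
    return re.sub(r'^RT ', '🔁 ', text)  # only matches "RT " at the very start

print(clean_tweet_text("fallen in love with this library so fast https://t.co/17cikzlYZO"))
# prints: fallen in love with this library so fast
print(clean_tweet_text("RT @someone: hello https://t.co/abc123"))
# prints: 🔁 @someone: hello
```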

Wonderful, now we have a clean text that is ready to be posted on Bluesky. It gets a bit more advanced from here; I will try my best.

(4) Posting to Bluesky

Downloading the media

Huge behemoth of a function here as well, so let’s shrink it down to the utmost core.

async def download_tweet_media(tweet):
    images = [] # creates a list to store the images
    videos = [] # creates a list to store the videos
    if hasattr(tweet, 'media') and tweet.media:
        for index, media in enumerate(tweet.media): # loops through the media
            try:
                media_type = media.type if hasattr(media, 'type') else 'photo' # checks if the media is a video or an image
                if media_type == "video": # if the media is a video
                    best_stream = await media.best_stream()
                    if best_stream: # if the best stream is available
                        video_path = await best_stream.download(filename=f"video{index}.mp4") # downloads the video
                        if video_path: # if the video is downloaded
                            videos.append(video_path) # adds the video to the list
                else: # if the media is an image
                    image_path = await media.download(filename=f"image{index}.jpg") # downloads the image
                    if image_path: # if the image is downloaded
                        images.append(image_path) # adds the image to the list
            except Exception: # simplified for this post: skip any media that fails to download
                continue
    return images, videos

So much code for just some JPEGs and MP4s, but it’s worth it. This was also the hardest part to code, for sure.

Checks if the tweet has media
Checks if the media is a video and/or an image
Downloads the available media
Returns the images and videos as a group to the function calling it

That’s the best I can do to shorten it down into 4 key steps. Now let’s move aside the media and come back to it later.
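
The sorting part of those steps can be tried without downloading anything. This sketch uses plain stand-in objects instead of real tweet media, and plan_media_filenames is a hypothetical helper that only mirrors the type check and file naming shown above.

```python
from types import SimpleNamespace

def plan_media_filenames(media_items):
    """Split media into image and video filenames without downloading anything."""
    images, videos = [], []
    for index, media in enumerate(media_items):
        media_type = getattr(media, "type", "photo")  # no type attribute: treat as photo
        if media_type == "video":
            videos.append(f"video{index}.mp4")
        else:
            images.append(f"image{index}.jpg")
    return images, videos

fake_media = [SimpleNamespace(type="photo"), SimpleNamespace(type="video"), SimpleNamespace()]
print(plan_media_filenames(fake_media))
# prints: (['image0.jpg', 'image2.jpg'], ['video1.mp4'])
```

Notice that the index is shared between images and videos, so the numbering can have gaps. That is harmless here because the files are deleted right after posting.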

Building the Bluesky post

The library I use for Bluesky is called atproto; it is a wonderful tool for easily interacting with the Bluesky API with Python instead of having to use the API directly. At first when making this, I had an annoying time (it is my fault) with the post builder the library had built in. But now it is 100% easier. First we will check if the post will have images or videos:

async def post_to_bluesky(bluesky_client, post_text: str, images, videos,...):
    try:
        if images or videos:
            image_objects = [] # creates a list to store the images
            for image_path in images:
                embed = await upload_media(bluesky_client, image_path, "image") # uploads the image to Bluesky
                if embed:
                    image_objects.append(embed)

            video_embeds = [] # creates a list to store the videos
            for video_path in videos:
                embed = await upload_media(bluesky_client, video_path, "video")
                if embed:
                    video_embeds.append(embed) # adds the video to the list
                    ...

Uploading the media

We have to upload the media to Bluesky first, basically preloading the media so it can be posted. Then we can send the post to Bluesky. Uploading the media is done by the upload_media function:

async def upload_media(bluesky_client, media_path, media_type): # takes in the client, media location and type
    try:
        with open(media_path, 'rb') as f:
            media_data = f.read() # reads the media data

        # Upload the media to Bluesky
        upload_response = bluesky_client.com.atproto.repo.upload_blob(media_data) # the magic happens here, it uploads the media to Bluesky
        if upload_response and hasattr(upload_response, "blob"): # if the media is uploaded successfully
            if media_type == "video":
                return VideoEmbed(video=upload_response.blob,alt="Video uploaded from tweet") # creates a video embed
            elif media_type == "image":
                return Image(alt="Image uploaded from tweet",image=upload_response.blob) # creates an image embed
    except Exception: # simplified for this post: a failed upload returns nothing
        return None

Now that Bluesky has the media, we can send the text alongside it quickly.

Send post to Bluesky

So let’s go back to the post_to_bluesky function because I hid the last part for dramatic effect.

...
if image_objects: # if there are images, we send them to Bluesky
    image_embed = ImageEmbed(images=image_objects) # creates an image embed
    response = bluesky_client.send_post(text=post_text,embed=image_embed) # sends the post to Bluesky with the images
    ...

if video_embeds: # if there are videos, we send them to Bluesky
    for video_embed in video_embeds:
        response = bluesky_client.send_post(text=post_text,embed=video_embed) # sends the post to Bluesky with the videos
        ...
else: # if there is no media, we just send the text
    response = bluesky_client.send_post(text=post_text) # sends the post to Bluesky
    ...

That seemed like a mess to read through, because it kind of is. But there it is; the post is sent to Bluesky for the world to see. Time to clean up the mess with the media we downloaded.
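
To make the three cases easier to see, here is a sketch that only plans the posts and makes no network calls. plan_posts is a hypothetical helper, and it assumes that a text-only post is sent only when there is no media at all; the parts I hid with "..." are not reproduced here.

```python
def plan_posts(post_text, image_objects, video_embeds):
    """Return a list of (text, embed_description) pairs, one per post to send."""
    posts = []
    if image_objects:
        # All images travel together in a single post.
        posts.append((post_text, f"{len(image_objects)} image(s)"))
    for _ in video_embeds:
        # Each video gets a post of its own.
        posts.append((post_text, "1 video"))
    if not posts:
        # No media at all: send the text by itself.
        posts.append((post_text, None))
    return posts
```

For example, plan_posts("hi", ["a", "b"], []) returns one post with both images, while plan_posts("hi", [], []) returns a single text-only post.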

os.remove(image_path) # removes the image
os.remove(video_path) # removes the video

Now we are done; time to wait for the next new tweet.
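
One thing worth sketching about the cleanup: os.remove raises an error if the file is already gone, so a small wrapper can make it safe to call more than once. remove_downloaded_media is a hypothetical helper written for this post, not a function from my script.

```python
import os

def remove_downloaded_media(paths):
    """Delete each downloaded file, skipping any that are already gone.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # nothing to clean up for this path
    return removed
```

Calling something like this from a finally block is one way to make sure the files disappear even when posting fails.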

Lesson of the day

Back in 2024, I suffered making this mirror, and I’m glad at least I don’t have to work on it every day. Now it runs on my self-hosted server, and I forget about it because it just works, even if it does have a few quirks and is super messy.

Check out two mirrors I run (as of writing) here:

Check out the sidebar for links to my socials and BMAC.