As you may know, I have a Bluesky mirror script, and some people ask me how it works. Well, here it is!
The Flowchart (how data moves)
That is the most straightforward opening I can give. But yes, it’s also the most boring because only people who know the code would understand what each step means in detail. I’ll be your tour guide throughout this mess I kind of cleaned up!
(1) Startup & Configuration
I can easily break this down into two parts: initial setup and final setup. It’ll make sense soon.
Initial Setup (Load the passwords)
First things first: if you want to read from Twitter/X and post on Bluesky, you need an account on both ends. That’s where we start, getting all the core information out of the way and ready for the script to use.
load_dotenv()
So simple! Well, for simplicity, I am hiding the fact that the loading part is spread out, but really, this is the most important function; it loads your environment variables. You can look at the full code if you want to read along.
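A missing variable usually only shows up later as a failed login, so it can help to check for it right after loading. This is a minimal sketch, not code from the script: `require_env` is a hypothetical helper, and it only reads `os.environ`, which is where `load_dotenv()` puts the values from your `.env` file.

```python
import os

# The four variable names the script reads with os.getenv().
REQUIRED_VARS = (
    "TWITTER_USERNAME",
    "TWITTER_PASSWORD",
    "BLUESKY_USERNAME",
    "BLUESKY_PASSWORD",
)


def require_env(names=REQUIRED_VARS) -> dict:
    """Hypothetical helper: return the requested variables or fail loudly."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env()` once after `load_dotenv()` would turn a typo in `.env` into an immediate, readable error.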
Final Setup (Loading the connection)
Here is the “cooler” code for initializing the Bluesky and Twitter/X clients that will do the posting and fetching.
app = TwitterAsync("session")
username = os.getenv("TWITTER_USERNAME") # thanks to load_dotenv(), we can get the saved hidden USERNAME
password = os.getenv("TWITTER_PASSWORD") # and password
await app.sign_in(username, password) # setting up twitter/x
bluesky_client = init_bluesky_client() # setting up bluesky
And if you’re curious about the Bluesky initialization code:
def init_bluesky_client() -> Client:
    client = Client()
    session_string = get_session() # trying to get old session (saves time)
    if session_string:
        process('Reusing session')
        try:
            client.login(session_string=session_string)
            return client # connects through older session
        ...
    process('Creating new session') # creating new client from scratch
    bluesky_username = os.getenv("BLUESKY_USERNAME") # same env you saw earlier
    bluesky_password = os.getenv("BLUESKY_PASSWORD")
    ...
    client.login(bluesky_username, bluesky_password)
    return client # client is initialized!
That one is just a tiny bit more complicated. The Twitter/X client has a built-in way to reuse an old session, which is why its setup is shorter. That’s it for step 1.
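The "try the old session, otherwise log in from scratch" pattern can be sketched without either library. This is a minimal sketch, not the script's code: `connect`, `load_saved_session`, and the `session.txt` path are hypothetical names, and `resume` and `fresh_login` are stand-ins for the real client calls.

```python
from pathlib import Path
from typing import Callable, Optional

SESSION_FILE = Path("session.txt")  # hypothetical location for the saved session


def load_saved_session(path: Path = SESSION_FILE) -> Optional[str]:
    """Return the saved session string, or None if nothing usable is stored."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip() or None
    return None


def connect(
    resume: Callable[[str], None],
    fresh_login: Callable[[], str],
    path: Path = SESSION_FILE,
) -> str:
    """Try the saved session first; fall back to a full login and save the result."""
    saved = load_saved_session(path)
    if saved:
        try:
            resume(saved)
            return "reused"
        except Exception:
            pass  # saved session is stale or invalid, so log in from scratch
    new_session = fresh_login()
    path.write_text(new_session, encoding="utf-8")
    return "created"
```

The return value only exists so the caller can log which path was taken, similar to the two `process(...)` messages above.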
(2) Twitter/X Monitoring
To understand this part, think of the difference between downloading a file and opening it.
For example, you can download the VLC media player, but until you install and open it, you can’t watch any videos at all. Hope that makes sense to you.
Going to target profile
This is the download part; we go to the profile by simply running this extremely long async function…
# Heavily shrunk down for the record
async def monitor_tweets(app, bluesky_client, target_username: str, check_interval: int, enable_translation: bool, from_lang: str, to_lang: str):
    ...
    try:
        user = await app.get_user_info(target_username) # that's really it: going to the target profile
I know, right? So anticlimactic, but it’s really that simple. Of course, the Python library I use did that, so thank you, Tayyab Hussain. But the very next lines after that code are exactly the next step.
Retrieve latest post
        ...
        all_tweets = await get_tweets_with_retry(app, user) # gets the tweets now
        ...
        if all_tweets:
            latest_tweet = None
            for tweet in all_tweets: # loops through the tweets and saves the first one that has an id as latest_tweet
                if hasattr(tweet, 'id'):
                    latest_tweet = tweet
                    break
        ...
This is the core of the script: every check_interval seconds, it checks whether latest_tweet is the same as last time. If it is the same, the script does nothing and waits again. If it is different, well, then it is time for the main course.
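That check-and-wait loop can be sketched on its own. This is a minimal sketch under assumptions, not the script's loop: `poll_for_new_posts`, `fetch_latest_id`, `on_new_post`, and `max_checks` are hypothetical names, and the very first post seen is treated as new, which may not match what the real script does.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def poll_for_new_posts(
    fetch_latest_id: Callable[[], Awaitable[Optional[str]]],
    on_new_post: Callable[[str], Awaitable[None]],
    check_interval: float,
    max_checks: int,
) -> None:
    """Call on_new_post each time the newest post ID changes."""
    last_seen: Optional[str] = None
    for _ in range(max_checks):  # a long-running mirror would loop forever instead
        latest = await fetch_latest_id()
        if latest is not None and latest != last_seen:
            last_seen = latest
            await on_new_post(latest)
        await asyncio.sleep(check_interval)  # wait before checking again
```

Comparing IDs rather than text keeps the check cheap, and a failed fetch (returning `None` here) simply means "nothing new this round".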
(3) Process Tweet (juicy stuff)
Bringing all the stuff in
Just 10 lines below the previous snippet is where the code starts processing everything.
if tweet_id != last_tweet_id:
    last_tweet_id = tweet_id
    ...
    await process_tweet(latest_tweet, bluesky_client,...) # calls the main function that will handle everything, I just left out the translation part for simplicity
So what does the process_tweet function do? Well, a lot.
- Reads the tweet text (we are finally looking at what we downloaded)
- Cleans the tweet text (will come back to this)
- Downloads media (videos/images) if available
- Posts to Bluesky
- Deletes the downloaded media (if available)
That is a lot of stuff to do in a short amount of time, but I have never actually benchmarked or tested it, so maybe it’s faster than I think.
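The order of those five steps can be sketched with stand-in callables. This is a rough sketch, not the real process_tweet: `mirror_one_post` and all of its parameters are hypothetical, and deleting the media in a `finally` block (so it happens even when posting fails) is my own assumption about a sensible design.

```python
import asyncio
from typing import Awaitable, Callable, List


async def mirror_one_post(
    raw_text: str,
    clean: Callable[[str], str],
    download: Callable[[], Awaitable[List[str]]],
    publish: Callable[[str, List[str]], Awaitable[None]],
    delete: Callable[[str], None],
) -> None:
    """Run the five steps in order; every argument is a stand-in callable."""
    text = clean(raw_text)          # steps 1-2: read and clean the text
    files = await download()        # step 3: fetch media, may be an empty list
    try:
        await publish(text, files)  # step 4: post to Bluesky
    finally:
        for path in files:          # step 5: delete media even if posting failed
            delete(path)
```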
Removing the junk
When the script runs and gets the new tweet, the text is so much messier than it normally looks. So we need to clean it up. Here is what an example response looks like before and after the cleaning process.
Before:
fallen in love with this library so fast https://t.co/17cikzlYZO
After:
fallen in love with this library so fast
One note on this: back when I first made this in late 2024, the raw output was a bit messier, but the library I use now cleans up most of it!
So now the clean tweet text function is much shorter than before.
def clean_tweet_text(text: str) -> str: # -> means this function returns a string
    # Use the remove_tco_links function to clean the text
    text = remove_tco_links(text)
    # Replace 'RT ' at the beginning with '🔁 '
    text = re.sub(r'^RT ', '🔁 ', text) # swaps out the RT for a repost symbol
    return text
So just in case the post is a repost, it looks nicer now. Here is the remove_tco_links function for the curious:
def remove_tco_links(text: str) -> str:
    # Updated pattern to match all HTTP/HTTPS links
    pattern = r'https?://\S+' # whole lotta regex no one understands
    cleaned_text = re.sub(pattern, '', text)
    cleaned_text = ' '.join(cleaned_text.split())
    return cleaned_text
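To check the cleanup against the before/after example above, the two functions can be run on their own. The block below restates them in compact form with `import re` added; the second input is a made-up repost, not a real tweet.

```python
import re


def remove_tco_links(text: str) -> str:
    cleaned_text = re.sub(r'https?://\S+', '', text)  # drop every http(s) link
    return ' '.join(cleaned_text.split())             # collapse leftover whitespace


def clean_tweet_text(text: str) -> str:
    text = remove_tco_links(text)
    return re.sub(r'^RT ', '🔁 ', text)  # a leading "RT " becomes a repost symbol


print(clean_tweet_text("fallen in love with this library so fast https://t.co/17cikzlYZO"))
# fallen in love with this library so fast
print(clean_tweet_text("RT @someone: look at this https://t.co/abc123"))
# 🔁 @someone: look at this
```

Note that the pattern removes every link, not only t.co ones, so a tweet that is nothing but a link ends up as an empty string.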
Wonderful, now we have clean text that is ready to be posted on Bluesky. It gets a bit more advanced from here; I will try my best.
(4) Posting to Bluesky
Downloading the media
This is a huge behemoth of a function as well, so let’s shrink it down to its core.
async def download_tweet_media(tweet):
    images = [] # creates a list to store the images
    videos = [] # creates a list to store the videos
    if hasattr(tweet, 'media') and tweet.media:
        for index, media in enumerate(tweet.media): # loops through the media
            try:
                media_type = media.type if hasattr(media, 'type') else 'photo' # checks if the media is a video or an image
                if media_type == "video": # if the media is a video
                    best_stream = await media.best_stream()
                    if best_stream: # if the best stream is available
                        video_path = await best_stream.download(filename=f"video{index}.mp4") # downloads the video
                        if video_path: # if the video is downloaded
                            videos.append(video_path) # adds the video to the list
                else: # if the media is an image
                    image_path = await media.download(filename=f"image{index}.jpg") # downloads the image
                    if image_path: # if the image is downloaded
                        images.append(image_path) # adds the image to the list
            ... # error handling trimmed
    return images, videos
So much code for just some JPEGs and MP4s, but it’s worth it. It was also the hardest part to code, for sure.
- Checks if the tweet has media
- Checks whether each media item is a video or an image
- Downloads the available media
- Returns the images and videos as a group to the function calling it
That’s the best I can do to shorten it down into 4 key steps. Now let’s set the media aside and come back to it later.
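The type check in the second step can be tried without downloading anything. This is a minimal sketch, assuming media objects expose a `type` attribute as in the snippet above: `FakeMedia` and `split_media` are hypothetical names, and the function only decides filenames instead of fetching files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeMedia:
    """Stand-in for a media object; only the attributes used here."""
    url: str
    type: str = "photo"


def split_media(media_items) -> Tuple[List[str], List[str]]:
    """Return (image_names, video_names) using the same naming as the snippet."""
    images, videos = [], []
    for index, media in enumerate(media_items):
        # getattr with a default is the short form of the hasattr check
        media_type = getattr(media, "type", "photo")
        if media_type == "video":
            videos.append(f"video{index}.mp4")
        else:  # anything that is not a video is treated as an image
            images.append(f"image{index}.jpg")
    return images, videos
```

Because the index comes from `enumerate`, a tweet with a photo, a video, and another photo would give `image0.jpg`, `video1.mp4`, and `image2.jpg`.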
Building the Bluesky post
The library I use for Bluesky is called atproto; it is a wonderful tool for interacting with the Bluesky API from Python instead of having to call the API directly. When I first made this, I had an annoying time (my fault) with the library’s built-in post builder. But now it is 100% easier. First, we check whether the post will have images or videos:
async def post_to_bluesky(bluesky_client, post_text: str, images, videos,...):
    try:
        if images or videos:
            image_objects = [] # creates a list to store the image embeds
            for image_path in images:
                embed = await upload_media(bluesky_client, image_path, "image") # uploads the image to Bluesky
                if embed:
                    image_objects.append(embed)
            video_embeds = [] # creates a list to store the video embeds
            for video_path in videos:
                embed = await upload_media(bluesky_client, video_path, "video")
                if embed:
                    video_embeds.append(embed) # adds the video embed to the list
            ...
Uploading the media
We have to upload the media to Bluesky first, basically preloading the media so it can be posted. Then we can send the post to Bluesky. Uploading the media is done by the upload_media function:
async def upload_media(bluesky_client, media_path, media_type): # takes in the client, media location and type
    try:
        with open(media_path, 'rb') as f:
            media_data = f.read() # reads the media data
        # Upload the media to Bluesky
        upload_response = bluesky_client.com.atproto.repo.upload_blob(media_data) # the magic happens here, it uploads the media to Bluesky
        if upload_response and hasattr(upload_response, "blob"): # if the media is uploaded successfully
            if media_type == "video":
                return VideoEmbed(video=upload_response.blob, alt="Video uploaded from tweet") # creates a video embed
            elif media_type == "image":
                return Image(alt="Image uploaded from tweet", image=upload_response.blob) # creates an image embed
    ... # error handling trimmed
Now that Bluesky has the media, we can send the text alongside it quickly.
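The "upload first, post second" idea can be shown with a toy in-memory server. This is only an illustration of the two phases and is not the atproto API: `FakeBlobStore` and both of its methods are hypothetical, and using a SHA-256 hash as the reference is just a convenient stand-in for a real blob reference.

```python
import hashlib
from typing import Dict, List


class FakeBlobStore:
    """In-memory stand-in for the server side; not the atproto API."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.posts: List[dict] = []

    def upload_blob(self, data: bytes) -> str:
        """Phase 1: store the bytes and hand back a short reference."""
        ref = hashlib.sha256(data).hexdigest()
        self.blobs[ref] = data
        return ref

    def send_post(self, text: str, blob_refs: List[str]) -> dict:
        """Phase 2: the post carries references, never the raw bytes."""
        unknown = [ref for ref in blob_refs if ref not in self.blobs]
        if unknown:
            raise ValueError("post references media that was never uploaded")
        post = {"text": text, "media": list(blob_refs)}
        self.posts.append(post)
        return post
```

The point of the split is that the post itself stays small: it only points at media the server already holds.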
Send post to Bluesky
So let’s go back to the post_to_bluesky function because I hid the last part for dramatic effect.
            ...
            if image_objects: # if there are images, we send them to Bluesky
                image_embed = ImageEmbed(images=image_objects) # creates an image embed
                response = bluesky_client.send_post(text=post_text, embed=image_embed) # sends the post to Bluesky with the images
                ...
            if video_embeds: # if there are videos, we send them to Bluesky
                for video_embed in video_embeds:
                    response = bluesky_client.send_post(text=post_text, embed=video_embed) # sends the post to Bluesky with the videos
                ...
        else: # if there is no media, we just send the text
            response = bluesky_client.send_post(text=post_text) # sends the post to Bluesky
        ...
That seemed like a mess to read through, because it kind of is. But there it is; the post is sent to Bluesky for the world to see. Time to clean up the mess with the media we downloaded.
os.remove(image_path) # removes the image
os.remove(video_path) # removes the video
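A plain `os.remove` raises an error if the file is already gone, so a slightly more forgiving cleanup can be handy. This is a minimal sketch, not the script's code: `remove_files` is a hypothetical helper.

```python
import os
from typing import Iterable, List


def remove_files(paths: Iterable[str]) -> List[str]:
    """Delete each path, skip ones already gone, and return what was removed."""
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # nothing to clean up for this path
    return removed
```

It could be called once with both lists, for example `remove_files(images + videos)`.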
Now we are done; time to wait for the next new tweet.
Lesson of the day
Back in 2024, I suffered making this mirror, and I’m glad at least I don’t have to work on it every day. Now it runs on my self-hosted server, and I forget about it because it just works, even if it does have a few quirks and is super messy.
Check out two mirrors I run (as of writing) here:
Check out the sidebar for links to my socials and BMAC.