There can be only one

When I got my new SSD I decided it was time to be brave and try to boot Linux from a USB. In case you were not aware, at the beginning of the year (2013) there were reports that booting Ubuntu was bricking some Samsung laptops. Of course I did not want to risk it with my new toy. In the end the problem was not Linux but a bug in the Samsung BIOS related to UEFI boot. My laptop was not listed as one of the affected models; however, not being listed did not mean (to me) being out of the woods. Moreover, with UEFI boot deactivated nothing bad should happen, but once again we don’t live in Should-land, and at the time the laptop was my only computer (not any more).

So I started Kubuntu from a USB and the sky did not fall on my head; that was a sign: there can be only one. Windows had to go. I grabbed a live CD (Parted Magic) and started my journey.

Part -I: Before the beginning.

If you plan to install a new operating system, consider secure-erasing the disk first; this is not for security reasons but for performance. Tutorials using different tools are listed in the references under Secure Erase.

Be aware that if your SSD is old (like the one in my Asus Eee PC 900) it may not support the ATA command issued to clear the memory. In that case, if you figure out what to do, tell me.
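For drives that do support it, here is a minimal sketch of the ATA Secure Erase procedure with hdparm, run from a live CD; /dev/sdX and the password p4ss are placeholders, and the drive has to report “not frozen” (suspend and resume the machine if it is frozen):

$ sudo hdparm -I /dev/sdX | grep -i frozen                        # must say "not frozen"
$ sudo hdparm --user-master u --security-set-pass p4ss /dev/sdX   # set a throwaway password
$ sudo hdparm --user-master u --security-erase p4ss /dev/sdX      # the actual erase
$ sudo hdparm -I /dev/sdX | grep -i enabled                       # should be back to "not enabled"

If anything looks off, follow one of the linked tutorials instead of improvising: a botched secure erase can leave the drive locked.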

Part I: prepare yourself.

Proper SSD alignment is essential. Should you be wondering “alignment to what?”, then first read the SSD basics or, even better, read the mother of all SSD guides.

We will have to align everything (partitions, filesystems, LVM) to the erase block size, so the first order of business is to obtain the erase block and page size of your SSD (or SD card). Try the website and forums of your drive manufacturer; if you don’t find anything, try Google (or any other search engine); if you are still out of luck, contact the manufacturer; if everything fails, it’s time for DIY.

The DIY way. There is a small tool called flashbench that can help determine the erase block and page size of flash-based drives. You probably want to read the theoretical explanation first (which comes with nice tables and many flash memory models). Steps to install and execute:

  1. Make sure you have the essential tools to compile source code:
    (debian based) $ sudo apt-get install build-essential
    (archlinux) # pacman -S base-devel
  2. Install git:
    (debian based) $ sudo apt-get install git
    (archlinux) # pacman -S git
  3. Create a working directory
    $ mkdir ~/projects; cd ~/projects
  4. Obtain the source code:
    $ git clone git://git.linaro.org/people/arnd/flashbench.git
  5. Compile
    $ cd flashbench
    $ make
  6. Now you can execute it and “guess” the values for your drive.

My Kingston HyperX 3K was not giving any meaningful results, so I will show instead the results for the ASUS-PHISON SSD of my Eee PC 900:
$ sudo ./flashbench -a /dev/sdb --blocksize=1024
align 4294967296 pre 743µs on 964µs post 635µs diff 275µs
align 2147483648 pre 772µs on 1ms post 633µs diff 300µs
align 1073741824 pre 777µs on 1.01ms post 631µs diff 309µs
align 536870912 pre 743µs on 977µs post 603µs diff 304µs
align 268435456 pre 739µs on 976µs post 635µs diff 288µs
align 134217728 pre 715µs on 947µs post 605µs diff 287µs
align 67108864 pre 620µs on 856µs post 614µs diff 239µs
align 33554432 pre 620µs on 847µs post 602µs diff 236µs
align 16777216 pre 619µs on 849µs post 605µs diff 237µs
align 8388608 pre 628µs on 856µs post 603µs diff 241µs
align 4194304 pre 621µs on 854µs post 606µs diff 241µs
align 2097152 pre 622µs on 850µs post 599µs diff 239µs
align 1048576 pre 608µs on 801µs post 618µs diff 189µs
align 524288 pre 608µs on 797µs post 619µs diff 184µs
align 262144 pre 609µs on 798µs post 620µs diff 183µs
align 131072 pre 607µs on 799µs post 619µs diff 186µs
align 65536 pre 605µs on 795µs post 613µs diff 185µs
align 32768 pre 617µs on 780µs post 590µs diff 176µs
align 16384 pre 617µs on 791µs post 601µs diff 182µs
align 8192 pre 616µs on 793µs post 600µs diff 185µs
align 4096 pre 606µs on 610µs post 611µs diff 2.01µs
align 2048 pre 604µs on 606µs post 605µs diff 1.5µs

I will not explain the numbers above; for that, read the flashbench README and also this link. The theory is in both of those links; the hands-on part is as follows: taking into account that erase blocks are usually between 256 kB and 8 MB, look at the third column from top to bottom; the first “big” gap within that range (256 kB – 8 MB) is likely to be the erase block, and the second “big” gap is likely to be the page size. From the above results:

  • Erase Block: 2097152 = 2 MB
  • Page size: 8192 = 8 kB

As I said, my Kingston HyperX 3K SSD did not produce any meaningful results, so I contacted customer service and they gave me the following values (which coincidentally are the same):

  • Erase block: 2 MB
  • Page size: 8 kB

 

Part II: Party-tioning

This part can be rather painful if you think you should leave some unallocated space on your drive. Yes, I said to leave some space unused. Why on earth would I do that when SSDs are so expensive? Well, it turns out that SSDs need free space in order to function correctly (you should know this already). To that end, some manufacturers reserve part of the memory for the internal use of the SSD; this is called over-provisioning. For instance, my Kingston is advertised as 240 GB but in reality has 256 GB, so 16 GB are reserved for the internal usage of the SSD, that is, 6.25% of the total capacity.

There are quite reliable sources stating that for optimal performance 10-30% of the total capacity should be left unused for the SSD, with 25% being the recommended amount. So, since my SSD has only 6.25% reserved, should I leave out more free space? That point is not clear and people have different opinions in the forums; for instance, this user says Kingston told him it is not necessary to leave unallocated space. Just in case, I will leave some space unallocated, up to 10%.

Would you like to know more? Here, here and here, but always take into account the dates of the articles.

That was the painful part: deciding to leave 10 GB unallocated. Let’s move on. When you create the partitions, the most important thing is to make sure they are all aligned to the erase block; that means that every partition’s starting offset should be divisible by the erase block size. So, for example, for my SSD the first partition should start at (have an offset of) 2 MB, or 2,097,152 bytes, and the rest of the partitions should start at any multiple of 2 MB.

Some guides on how to achieve this are listed in the references under Alignments.

My layout will be very simple: one single partition of 230 GB starting at 2 MB. In my system, the resulting partition is named /dev/sda1.
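For reference, this is a minimal sketch of how such a layout could be created with parted; the GPT label, the 230GiB end point and /dev/sda are my choices, adapt them to your disk:

# parted /dev/sda
(parted) mklabel gpt
(parted) mkpart primary ext4 2MiB 230GiB
(parted) unit B
(parted) print
(parted) quit

The Start value reported by print (in bytes) should be exactly 2097152, and any further partition you add should start at a multiple of that number.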

By the way, watch out if you are using Windows 7 because by default it aligns the partitions to 1 MB.

Part III: A new filesystem is born

For my installation I will use LVM and ext4. How many logical volumes should I have? I will keep it simple and have the basics:

  • / (root)
  • /home
  • /var
  • swap

Remember we have to align the LVM volumes to the erase block (2 MB); for that we need to specify ‘--dataalignment’ when creating the physical volumes and make sure that the ‘--physicalextentsize’ parameter of the volume group is set to a multiple of the erase block. From the LVM documentation:

(pvcreate) --dataalignment alignment
     Align  the  start of the data to a multiple of this number.  You
     should also specify an appropriate PhysicalExtentSize when  cre-
     ating the Volume Group with vgcreate.

     To  see the location of the first Physical Extent of an existing
     Physical Volume use pvs -o +pe_start .  It will be a multiple of
     the  requested  alignment.   In  addition  it  may be shifted by
     alignment_offset   from   data_alignment_offset_detection    (if
     enabled in lvm.conf) or --dataalignmentoffset.
(vgcreate) -s, --physicalextentsize PhysicalExtentSize[bBsSkKmMgGtTpPeE]    
     Sets the physical extent size on physical volumes of this volume
     group.  A size suffix (k for kilobytes up to t for terabytes) is
     optional, megabytes is the default if no suffix is present.  The
     default is 4 MB and it must be at least 1 KB and a power of 2.

     Once this value has been set, it is difficult to change it with-
     out recreating the volume group which would involve  backing  up
     and  restoring  data  on  any  logical  volumes.  However, if no
     extents need moving for the  new  value  to  apply,  it  can  be
     altered using vgchange -s.

     If  the volume group metadata uses lvm1 format, extents can vary
     in size from 8KB to 16GB and there is a limit of  65534  extents
     in  each logical volume.  The default of 4 MB leads to a maximum
     logical volume size of around 256GB.

     If the volume group metadata uses lvm2 format those restrictions
     do  not  apply,  but  having a large number of extents will slow
     down the tools but have no impact on I/O performance to the log-
     ical volume.  The smallest PE is 1KB.

     The 2.4 kernel has a limitation of 2TB per block device.

You can read an interesting forum post about LVM alignment.

As you see, the documentation states that the default physical extent size is 4 MB, which is already a multiple of my erase block; however, I will make it explicit. But should we use the default value or is there a better one? I have not found any guide about it; the most relevant information is what the manual already says: in the previous format (lvm1) the extent size limited the maximum volume size, but nowadays (lvm2) that restriction does not apply, and bigger extents do not penalize I/O operations but only slow down the LVM tools (like lvresize).

The great Markus Gattol uses 128 MB for his multimedia volumes to improve seek times for mp3s/movies; however, he makes clear that this affects only regular HDDs and not SSDs. In this link the author chooses the extent size according to the expected maximum volume size in order not to penalize the LVM tools. And that’s it, no more than that.

Just be a Rambo and create the damned thing.
Create the physical volumes:
# pvcreate --dataalignment 2M /dev/sda1
# vgcreate vgssd --physicalextentsize 4M /dev/sda1
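To double-check the result, the man page excerpt above already hints at the way: look at where the first physical extent starts and at the extent size of the volume group.

# pvs -o +pe_start /dev/sda1      # "1st PE" should be a multiple of 2.00m
# vgs -o +vg_extent_size vgssd    # "Ext" should report 4.00m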

Create the logical volumes:
# lvcreate -C y -L 2G vgssd -n lvswap
# lvcreate -L 30G vgssd -n lvroot
# lvcreate -L 20G vgssd -n lvvar
# lvcreate -l +100%FREE vgssd -n lvhome

Next, we have to format them (except the swap, which only needs mkswap; see the sketch after the mkfs commands below). I will use ext4. The ext4 stride and stripe-width will be computed as follows:

  • Filesystem block = 4 kB
  • Stride = Page size / Filesystem Block = 8 kB / 4 kB = 2
  • Stripe-width = Erase Block / Filesystem Block = 2 MB / 4 kB = 512

Format the logical volumes with ext4:
# mkfs.ext4 -b 4096 -E stride=2,stripe-width=512 -L ROOT /dev/mapper/vgssd-lvroot
# mkfs.ext4 -b 4096 -E stride=2,stripe-width=512 -L VAR /dev/mapper/vgssd-lvvar
# mkfs.ext4 -b 4096 -E stride=2,stripe-width=512 -L HOME /dev/mapper/vgssd-lvhome
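The swap logical volume is not formatted with ext4, but it still has to be initialized; a minimal sketch (the SWAP label is just my convention):

# mkswap -L SWAP /dev/mapper/vgssd-lvswap
# swapon /dev/mapper/vgssd-lvswap    # optional, to start using it right away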

By the way, it does not seem worthwhile to disable journaling on the filesystem; the gain is not big enough for the risk of potential data loss.

I am not planning to use LUKS but, if you do, once more you will have to align to the erase block. Excerpt from the Java Hamster blog:

6. Creating encrypted containers
Next we setup an encrypted volume above logical volume. According to this mail the encrypted container also should be aligned. To do so we pass --align-payload option which value is in 512 bytes sectors. To align to 4 MiB we do:

cryptsetup luksFormat --align-payload=8192 /dev/vg_main/lv_ubuntu

Part IV: Distroland, install to your taste.

With all the partitions set, now it is time to install the distro you like/hate the most. Of course, do not partition again; simply select the already created partitions and assign the correct mount points.

I will install Ubuntu Server 12.04 LTS. On my desktop I use Arch Linux, but for my laptop I don’t want a rolling-release distribution; let’s say that when I am on the road I don’t want to face any kind of problem caused by upgrades. Don’t get me wrong: if you always check the Arch Linux announcements you will likely not have any problems when updating, but it still needs periodic maintenance, which I don’t want to provide for the laptop.

I don’t particularly like Unity, and I am not a fan of Gnome either; my desktop is KDE, but I don’t like that Ubuntu is always bloated with so much crap that I don’t know what it does (besides sucking up my resources). But Ubuntu Server is very light and from there I will build a “minimalistic” KDE desktop, so Ubuntu Server it is. By the way, the Kubuntu team is doing a very good job.

Part V: My Name Is Optimus Prime

You have your system; is it over? Nooooooooooooooo, now come the details, the optimization.

Ext4 mount flags noatime vs relatime

Both flags reduce the amount of data written. Every time a file is read, its metadata is updated with the access time; that means that every time you open a file (or any program reads a file from disk) there will be a write to disk to update the metadata. Well, this info, IMHO, is useless for the majority of users and it uses up our precious SSD writes, so let’s ban it. Now we have two options:

  • noatime: never update the access time of the files.
  • relatime: update the access time only if it is older than the modification time (and, on recent kernels, at most once a day).

Rumor has it that some programs rely on the access time of the files and therefore relatime should be used instead of noatime. However, it seems that “some programs” means Mutt (an email client) or at least no other program is mentioned.

Linus says relatime is cool, but I prefer noatime; if one day I find a program that does not work properly because of the access time, then I will change it.

Would you like to know more? For you, one and two.

TRIM, TRIM, TRIM, reclaim your space 

As you already know, when a file is deleted the filesystem does not inform the SSD, so the SSD has no way of knowing whether the data stored in it is still valid or not. The TRIM command informs the SSD which data is no longer valid and can therefore be discarded. For TRIM to work, all the layers (and the hardware) have to support and enable it; that is, if you are using ext4, LVM and LUKS, then you have to enable TRIM in all three of them. Note that activating TRIM on LUKS containers may raise some security concerns.

There are a few ways to issue the TRIM command: let the filesystem handle it automatically, execute the TRIM command explicitly by hand, or set up a periodic (cron) job that runs it regularly (every day, every week, etc.).

  • Let ext4 handle TRIM
    Edit /etc/fstab and add ‘discard’ to the list of options for every mount point on the SSD. For example:
    UUID=d3a47006-a714-4786-a766-8bc2101c45fd /   ext4   noatime,discard,errors=remount-ro 0 1
    UUID=58497f0c-e9b1-41fc-830f-71f727e9f56a /home  ext4    defaults,discard,noatime 0 2
    UUID=963f0dc4-9f49-415a-98cc-c9b577febd6d /var ext4    defaults,discard,noatime 0 2
  • Manually execute TRIM
    As root execute fstrim on the desired mount points, e.g.
    # for mountPoint in / /home /var; do fstrim -v $mountPoint ; done
  • Add a cron job
    As root, create /etc/cron.daily/trim or /etc/cron.weekly/trim, set execution permissions and write this content:

    #!/bin/sh
    LOG=/var/log/trim.log
    echo "*** $(date -R) ***" >> $LOG
    for mountPoint in / /home /var; do
        fstrim -v $mountPoint >> $LOG
    done

There is no consensus about which option is the best approach. The Arch Linux wiki recommends using ‘discard’, but other sites (and this one) disagree. The Gentoo wiki talks about both options:

You don't want to use -o discard on a rootfs mount. [..] Having this running potentially constantly may cause performance degredation, and there are articles to suggest this all over the web.

[..] If you're going to have a low write directory mounted on SSD, using "discard" option will be fine in fstab. [..] If you're going to mount a database to an SSD, you probably want timed TRIM commands rather than the discard option.

In my opinion it is not worth using the ‘discard’ option in the fstab; I think it is an unnecessary load, so I will use a weekly cron job.

Finally don’t forget to enable TRIM in the rest of the layers:

  • For LVM enable issue_discards option in /etc/lvm/lvm.conf
    --- UPDATE: make sure the LVM version is at least 2.02.85; previous versions do not support TRIM. Ubuntu 12.04 LTS ships version 2.02.66, so the option will have no effect there. ---
  • All LUKS containers must be opened with the option allow-discards; therefore edit /etc/crypttab and add that option to each encrypted SSD volume, and of course add the same option for the volumes opened at boot time, in GRUB:
    GRUB_CMDLINE_LINUX="cryptdevice=/dev/disk/by-uuid/96b2fe8d-dbde-4225-85a3-baee3bb193d5:cryptlvm:allow-discards"

And rebuild the initramfs:
(debian based) $ sudo update-initramfs -u -k all
(archlinux) # mkinitcpio -p linux

Webupd8 has an excellent article about how to enable TRIM including the support for LUKS.
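Before going through all of this, it is worth confirming that the drive and the kernel actually support TRIM; a quick sketch (replace sdX with your device):

$ sudo hdparm -I /dev/sdX | grep -i trim             # should mention "Data Set Management TRIM supported"
$ cat /sys/block/sdX/queue/discard_max_bytes         # greater than 0 means the kernel can send discards to it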

Schedule your I/O

The default I/O scheduler is CFQ, but for SSDs NOOP or deadline are preferred. Phoronix has a performance review of the different schedulers for kernel 3.4 and there are also other tests. The most commonly recommended scheduler for SSDs is deadline because it guarantees read responsiveness under heavy writes.

You can check which scheduler is in use by running:
$ cat /sys/block/sdX/queue/scheduler
noop deadline [cfq]

The value inside square brackets is the algorithm in use. If you want to change the value then, as root, echo the new scheduler to the desired drive (sdX):
# echo deadline > /sys/block/sdX/queue/scheduler

If you only have SSDs and you want to permanently change the scheduler, then add the ‘elevator’ kernel parameter. For GRUB these are the steps:

  1. Edit /etc/default/grub and add elevator=deadline to GRUB_CMDLINE_LINUX_DEFAULT; it should look like:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"
  2. Update grub:
    (debian based)$ sudo update-grub
    (archlinux)# grub-mkconfig -o /boot/grub/grub.cfg
  3. Restart

If you have a mixed environment of regular HDDs and SSDs, then you should specify the scheduler per disk and not via the kernel parameter. Follow these instructions for Ubuntu and for Arch Linux; a sketch of the udev-rule approach follows below.
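One common way to set it per disk is a udev rule keyed on the rotational flag; a minimal sketch, where the file name is just my choice:

# /etc/udev/rules.d/60-ssd-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"

Non-rotational devices (SSDs) get deadline, while the rotating disks keep the default scheduler.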

SWAP

swap is my enemy, I will buy more RAM
swap is my enemy, I will buy more RAM
swap is my enemy, I will buy more RAM

Repeat this mantra three times a day until you buy more RAM. Anyhow, we are going to try not to use the swap at all: we should have a swap partition, but only use it when we have exhausted the RAM. To command the kernel to obey your wishes (at least regarding the swap), as root execute:
sysctl -w vm.swappiness=1
sysctl -w vm.vfs_cache_pressure=50

For this change to be permanent, edit /etc/sysctl.conf and set those values:
vm.swappiness=1
vm.vfs_cache_pressure=50

Surfing Fast: Put Web Browsers Profiles on RAM

Web browsers maintain user data in what is known as the user profile; browsers are constantly writing to it, in other words, it is a source of constant and undesired writes. Fortunately, graysky2 has developed a program to keep the profiles in RAM and periodically synchronize them to disk.

Install Profile-sync-daemon; the instructions are on the web. For Ubuntu:
sudo add-apt-repository ppa:graysky/utils
sudo apt-get update
sudo apt-get install profile-sync-daemon

Have a look at the manual, but if you are one of those who only read the manual when everything else fails then, at the very least, edit /etc/psd.conf and add the users you want. For Ubuntu you can find more information in the Webupd8 article.
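As a rough sketch of what that edit looks like; the variable names below are the ones I have seen in /etc/psd.conf, but double-check your version, and the user and browser list are of course placeholders:

# /etc/psd.conf (excerpt)
USERS="yourusername"
BROWSERS="firefox chromium"    # optional; if unset, all detected browsers are handled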

Live large: Put Everything on RAM

Well, this is a bit of an exaggeration; replace ‘everything’ with ‘anything’. The idea is to have not only the web browser profiles in RAM but whichever directories you want. This approach will decrease SSD writes at the expense of risking data loss in case of a sudden shutdown (like when the cat bites the plug). The following directories would be a good choice:
/var/log
/var/tmp

Once more graysky2 provides the solution for Arch Linux: Anything-sync-daemon [manual]. And for Ubuntu (deb package), wor has developed goanysync (which also has an Arch version).
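To give an idea, keeping those two directories in RAM with Anything-sync-daemon boils down to listing them in its configuration file; this is only a sketch assuming the /etc/asd.conf layout of the Arch package, so check your own copy for the exact syntax:

# /etc/asd.conf (excerpt)
WHATTOSYNC=('/var/log' '/var/tmp')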

Compiling in tmpfs

Another piece of good advice from Arch Linux: if you are going to compile, then do it in RAM. Read the instructions, but basically add the following line to /etc/fstab to create a RAM disk (substitute 7G with the desired size):
tmpfs /scratch tmpfs nodev,nosuid,size=7G 0 0
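The mount point has to exist before the line takes effect; /scratch is just the name used above:

# mkdir -p /scratch
# mount /scratch        # or reboot; df -h /scratch should then show a 7G tmpfs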

Final touch: Use compressed RAM, Compcache / Zram

My old and limited Eee PC 900 has only 1 GB of RAM, and the use of zram (and preload) makes a big difference. I am not sure whether it positively affects the performance of systems with more than 4 GB (like my laptop, which has 6 GB; yes, you read correctly, 6 GB, not 4 and not 8 but 6. Why? I don’t know, ask Samsung), but the Arch Linux wiki recommends it for SSDs because it reduces writes to disk.

If you want to install it on Arch Linux, follow the wiki instructions; for Ubuntu:
 (ubuntu)$ sudo apt-get install zram-config
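After a reboot you can check that the zram swap devices are actually in use; with zram-config there is typically one per CPU core:

$ cat /proc/swaps       # should list /dev/zram0, /dev/zram1, ... besides the regular swap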

And that’s it... well, one more thing. Have a look at preloading applications; personally I like ‘preload’, but be aware of the drawbacks.

Part VI: No More Emacs

There is no more; what else do you want? Here are some interesting links and references, now leave me alone.

Essential:
The SSD Anthology: Understanding SSDs and New Drives from OCZ
(2009 March 18)
Flash memory card design (linaro)

Important:
The Impact of Spare Area on SandForce, More Capacity At No Performance Loss? (2010 May 03)
Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs (2012 December 04)
ADATA XPG SX900 (128GB) Review: Maximizing SandForce Capacity (2012 June 08) 
New to SSDs? Read this first before asking questions! (2011 July 07)

Wikipedia pages:
Solid-state drive 
Write amplification
TRIM
Deadline scheduler
Noop scheduler

Other wikis:
(Arch Linux) Solid State Drives
(Arch Linux) Maximizing Performance
(Debian) SSDOptimization
(Gentoo) SSD

Alignments:
Aligning Filesystems to an SSD’s Erase Block Size (2009 February 20)
Ensuring SSD alignment with parted tool (2009 July 20)
Aligning partitions, lvm and encrypted volumes on SSD (2012 April 04)
Linux SSD partition alignment tips (2012 August 09)
Speed Up Your SSD By Correctly Aligning Your Partitions (2011 September 06)
SSD Alignment Calculator
Partition Alignment (online tool)

Optimization guides:
How to maximise SSD performance with Linux (2012 March 06)
SSD optimisations for Ubuntu (2012 August 26)
Speed Up Applications Load Time in Ubuntu – Preload (2009 August)
Solid State Drive (SSD): optimize it for Ubuntu, Linux Mint and Debian
Optimizing fs on sd-card for Linux/Fedora on Dreamplug  (2012 January 14) 

Secure Erase:
How to use HDDErase  (2010 June 11)
How to: Secure Erase your Solid State Drive (SSD) with Parted Magic (2012 March 11)
ATA Secure Erase (SE) and hdparm

Garbage Collector:
Enable TRIM On SSD (Solid-State Drives) In Ubuntu For Better Performance (2013 January 15)
How to Activate TRIM on LUKS Encrypted Partitions in Ubuntu & Debian (2012 April 13)
TRIM & dm-crypt … problems? (2011 August 14)
Impact of ext4’s discard option on my SSD (2011 July 08)
Should You Care About Over-Provisioning On A SandForce-Based SSD? (2012 November 23)

Reducing writes:
Keep Your Browser Profiles In tmpfs (RAM) For Reduced Disk Writes And Increased Performance With Profile Sync Daemon (2013 February 14)

Schedulers:
Linux I/O Scheduler Comparison On The Linux 3.4 Desktop (2012 May 11)
Effects Of Linux IO Scheduler On SSD Performance (2012 May 18)

LVM:
Markus Gattol on Logical Volume Management (2013 April 23)
Configure LVM for Data Storage (2010 September 30)

Forums and emails:
Drawbacks of using preload? Why isn’t it included by default? (2012 Mar 05)
Kingston said over provisioning is not needed on my Kingston HyperX 3K (2013 April 26)
Relatime vs noatime for flash drives (2012 May 23)
Re: How to optimise encrypted filesystems on an SSD? (2010 February 06)
Linus Torvalds on relatime (2007 August 04)
SSD, Erase Block Size & LVM: PV on raw device, Alignment (2012 February 03)

Bricking Samsungs:
Booting Linux using UEFI can brick Samsung laptops (2013 January 30)
Samsung laptop bug is not Linux specific (2013 February 08)

Tools and programs:
Parted Magic
goanysync
Anything-sync-daemon (source)
Profile-sync-daemon (source)
flashbench doc
GParted Manual


3 Responses to There can be only one

  1. Costas says:

    I am a monkey

  2. djk says:

    Hi

    A nice set of articles on the Kingston SSD. One thing – had you considered any calculations for the geometry in fdisk? eg http://blog.nuclex-games.com/2009/12/aligning-an-ssd-on-linux/ or the one you quoted http://www.thomas-krenn.com/en/wiki/Partition_Alignment#fdisk_from_Version_2.17.1 ???

    For my 240GB HyperX, I’m contemplating using the following command to set up the partitions on 2M boundaries as per the erase size you discovered:
    # fdisk -S32 -H128 -u=cylinders -c /dev/sda

    • thepadawan42 says:

      Hi, thank you for your comment.

      Modifying the disk geometry (like in fdisk) was, in the past, the only way to properly align the partitions to the erase blocks; at that time support for SSDs was quite limited. Fortunately, the partitioning tools have evolved and nowadays they allow us to achieve the same result without “faking” a disk geometry. I would discourage the use of fake disk geometry; in my opinion it is more complicated and error-prone (plus it seems incompatible with Windows dual boot).

      Keep in mind that, in the end, the only thing you need is to start each partition at a multiple of the erase block.
