List vs Vector in 2022 :: Mar 15, 2022

It’s hard to choose between Scala’s List and Vector.

On one hand, List is the most popular sequence. It is the default Seq, and most libraries including the Scala 3 compiler use it extensively. List is very fast in the usecases it is designed for, which are O(n) traversals, head/tail destructuring, and construction with prepend.

Vector is theoretically better than List. It adds O(1) length, O(1) access, O(1) update, and O(1) append methods. Vector is more complicated than List, which could affect its real-world performance. Of course, List is not so simple either.

Hypothesis: If Vector can match List’s performance in the common usecases, then we should prefer Vector.

Usecase 1: O(n) Traversals

Given

case class Cls(i: Int)

val list: List[Cls] = ???
val vector: Vector[Cls] = ???

Which is faster?

@Benchmark
def listSum: Long =
  list.filter(_.i > 0).map(_.i.toLong).sum

@Benchmark
def vectorSum: Long =
  vector.filter(_.i > 0).map(_.i.toLong).sum


If you have one million elements, Vector can do this 4 milliseconds faster. Not a very significant difference. So, Vector meets our requirement for O(n) traversals.

Usecase 2: Construction with Prepend + Traversals

Of course, every collection needs to be constructed before being used. So lets look at construction followed by traversal.

If we construct using prepend, List wins.

@Benchmark
def listPrependAndSum: Long =
  var lst = List.empty[Cls]
  for cls <- data do lst = cls :: lst
  lst.filter(_.i > 0).map(_.i.toLong).sum

@Benchmark
def vectorPrependAndSum: Long =
  var vec = Vector.empty[Cls]
  for cls <- data do vec = cls +: vec
  vec.filter(_.i > 0).map(_.i.toLong).sum


List is 28 ms faster in the case of 1 million elements. Which is starting to get significant; don’t use Vector as a ‘functional Stack’.

Usecase 3: Construction with Append + Traversals

In-order construction with append is more common in everyday code. Lists are O(n) in this case so the only way to fairly compare is with their mutable builders:

@Benchmark
def listBuilderAndSum: Long =
  val builder = List.newBuilder[Cls]
  for cls <- data do builder.addOne(cls)
  val list = builder.result()

  list.filter(_.i > 0).map(_.i.toLong).sum


@Benchmark
def vectorBuilderAndSum: Long =
  val builder = Vector.newBuilder[Cls]
  for i <- data do builder.addOne(i)
  val vec = builder.result()

  vec.filter(_.i > 0).map(_.i.toLong).sum


Besides being faster than List’s builder, the Vector builder is about 20% faster than building a List with prepend. It’s also much faster than using :+ on Vector:


So use the builder when constructing a Vector item by item.

Do Views help?

Each filter and map create a new collection. Will using a view help?

@Benchmark
def listViewSum: Long =
  list.view.filter(_.i > 0).map(_.i.toLong).sum

@Benchmark
def vectorViewSum: Long =
  vector.view.filter(_.i > 0).map(_.i.toLong).sum

Not significantly.


Views are lazy, which makes debugging & resource safety harder. The performance difference on million-element collections is only a few milliseconds. So don’t use them unless you measure.

Conclusions

If you are using List like a functional Stack, keep using it.

Otherwise, Vector is just as fast and more flexible.

JMH Benchmark

Results are using JDK 18 preview, Scala 3.1.1, Linux.

import org.openjdk.jmh.annotations.{Benchmark, BenchmarkMode, Fork, Measurement, Mode, Param, Scope, State, Warmup}

import java.util.ArrayDeque
import scala.util.Random
import org.openjdk.jmh.annotations.Setup
import scala.jdk.StreamConverters.*

case class Cls(i: Int)

@BenchmarkMode(Array(Mode.Throughput))
@State(Scope.Benchmark)
@Measurement(time = 1, iterations = 5)
@Warmup(time = 1, iterations = 5)
@Fork(value = 2)
class Bench:

  @Param(Array("100", "10000", "100000", "1000000"))
  var size: Int = _

  var data: Array[Cls] = _
  var list: List[Cls] = _
  var vector: Vector[Cls] = _

  @Setup
  def setup(): Unit =
    data = Array.fill(size)(Cls(Random.nextInt))
    list = List.from(data)
    vector = Vector.from(data)

  @Benchmark
  def listBuilderAndSum: Long =
    val builder = List.newBuilder[Cls]
    for cls <- data do builder.addOne(cls)
    val list = builder.result()

    list.filter(_.i > 0).map(_.i.toLong).sum


  @Benchmark
  def vectorBuilderAndSum: Long =
    val builder = Vector.newBuilder[Cls]
    for i <- data do builder.addOne(i)
    val vec = builder.result

    vec.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def vectorBuildDirectAndSum: Long =
    var vec = Vector.empty[Cls]
    for cls <- data do vec = vec :+ cls

    vec.filter(_.i > 0).map(_.i.toLong).sum


  @Benchmark
  def listPrependAndSum: Long =
    var lst = List.empty[Cls]
    for cls <- data do lst = cls :: lst
    lst.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def vectorPrependAndSum: Long =
    var vec = Vector.empty[Cls]
    for cls <- data do vec = cls +: vec
    vec.filter(_.i > 0).map(_.i.toLong).sum
  
  @Benchmark
  def listSum: Long =
    list.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def vectorSum: Long =
    vector.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def listViewSum: Long =
    list.view.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def vectorViewSum: Long =
    vector.view.filter(_.i > 0).map(_.i.toLong).sum

  @Benchmark
  def listJavaStreamSum: Long =
    list.asJavaSeqStream.filter(_.i > 0).mapToLong(_.i.toLong).sum

  @Benchmark
  def vectorJavaStreamSum: Long =
    vector.asJavaSeqStream.filter(_.i > 0).mapToLong(_.i.toLong).sum