ETS is an exceptional tool that I feel is greatly under-used in the Elixir world.
This talk by Claudio from Erlang Solutions highlights a very good pattern for achieving concurrent reads together with serialized writes using ETS.
```elixir
defmodule Cache do
  use GenServer

  @table :cache
  @name __MODULE__

  def start_link() do
    GenServer.start_link(@name, [], name: @name)
  end

  def init(_) do
    # :public and :read_concurrency are very important
    :ets.new(@table, [:named_table, :public, :read_concurrency])
    {:ok, []}
  end

  def get(key) do
    # Runs in the calling process, not in the GenServer
    case :ets.lookup(@table, key) do
      [{^key, val}] -> val
      [] -> nil
    end
  end

  def set(key, val) do
    GenServer.cast(__MODULE__, {:set, key, val})
  end

  def handle_cast({:set, key, val}, state) do
    :ets.insert(@table, {key, val})
    {:noreply, state}
  end
end
```
Since we have :public and :read_concurrency set in our table options, we can offload the reading of the ETS data to the calling process, leaving the Cache GenServer free to focus solely on inserts.
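To illustrate the split (a minimal sketch, not from the talk): writes are casts that queue up in the Cache mailbox, while reads go straight to the shared table from whatever process calls them. The `:sys.get_state/1` call here is only a demo trick to make sure the earlier cast has been processed before we read.

```elixir
{:ok, _pid} = Cache.start_link()

# Writes are casts: they land in the Cache mailbox and are applied in order.
Cache.set(:answer, 42)

# Synchronously round-trip through the GenServer so we know the cast
# above has been handled (demo only; not needed in real code).
:sys.get_state(Cache)

# Reads never touch the GenServer: this lookup runs in the calling
# process, directly against the :public table.
:ets.lookup(:cache, :answer)
# => [{:answer, 42}]
```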
However, you really need to understand your data for this to be effective: because reads and writes are decoupled, the pattern isn't suited to data that gets read, modified, and written back, as that opens the door to race conditions.
For example, let's say you get a flood of data that needs to be written to the cache, e.g. 10,000,000 structs. Each of these structs contains a tracking counter field of some sort.
Let's say that during these writes, a request comes in to increment the counter on :foo. So, something like this might happen:
```elixir
iex> foo = Cache.get(:foo)
iex> foo.counter
# => 10
iex> foo = %Struct{foo | counter: foo.counter + 1}
# => %Struct{counter: 11, ...}
iex> Cache.set(:foo, foo)
```
Nothing wrong with this in isolation - it all looks fine. The problem appears when you consider what is already sitting in the Cache mailbox.
Let's say that of those 10 million write requests, request #2,900,000 was to do the same thing: increment the :foo counter by one.
Since that first update may not have been applied by the time the second increment request reads the cache, Cache.get(:foo) will still return the old value of 10, when it really should be 11.
This means the cache receives two write requests for :foo, each with a counter value of 11, rather than one setting it to 11 and the next setting it to 12. One increment is silently lost.
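One way to make increments safe under this design (a sketch, not something prescribed by the talk) is to keep plain reads lock-free but route any read-modify-write through the GenServer. Because a call is queued behind every write already in the mailbox, it can never observe a stale counter. The increment/1 function below is a hypothetical addition to the Cache module above, assuming the same @table :cache and the process registered as __MODULE__:

```elixir
# Hypothetical additions to the Cache module from earlier:

def increment(key) do
  GenServer.call(__MODULE__, {:increment, key})
end

# Runs in the Cache process, strictly after every write already queued
# in its mailbox, so the read-modify-write is serialized.
def handle_call({:increment, key}, _from, state) do
  [{^key, val}] = :ets.lookup(@table, key)
  updated = %{val | counter: val.counter + 1}
  :ets.insert(@table, {key, updated})
  {:reply, updated, state}
end
```

For plain integer counters stored as tuple elements (rather than inside a struct), Erlang also offers :ets.update_counter/3, which performs the increment atomically and avoids the GenServer round-trip entirely.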